Building Real-Time Data Pipelines

The book:

HTAP / Hybrid Transactional/Analytical Processing

OLTP: trades speed for transactional properties

OLAP: online, for ad-hoc queries; not batch query jobs in the Hive / MapReduce style

Big gap between OLAP and OLTP: separate data silos, and the headache of syncing data (batch transfer from OLTP to OLAP, i.e. the ETL process).

Data pattern ~ TSDB

Drawbacks of traditional RDB systems:

stream processing problem: hard to keep context / state, e.g. for unique counts or attribution; need to rely on an external NoSQL store (the CEP mode: trade data structure for speed).

pattern: real-time data pipeline + NoSQL for historical lookup / context bookkeeping.
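
A minimal sketch of this pattern, not taken from the book: a stream consumer that keeps no state of its own and pushes all context (unique users, last click per user) into a key-value store. `InMemoryStore`, `process`, and the event fields are hypothetical names for illustration; the in-memory class only stands in for a real external NoSQL store.

```python
from collections import defaultdict


class InMemoryStore:
    """Hypothetical stand-in for an external NoSQL key-value store."""

    def __init__(self):
        self._sets = defaultdict(set)
        self._values = {}

    def add_to_set(self, key, member):
        # Set semantics give a unique count without rescanning history.
        self._sets[key].add(member)

    def set_size(self, key):
        return len(self._sets[key])

    def put(self, key, value):
        self._values[key] = value

    def get(self, key, default=None):
        return self._values.get(key, default)


def process(events, store):
    """Consume events one at a time, keeping all context in the store."""
    attributed = []
    for ev in events:
        user, kind = ev["user"], ev["type"]
        if kind == "click":
            # Bookkeeping: remember the last campaign this user clicked.
            store.put(f"last_click:{user}", ev["campaign"])
            store.add_to_set("uniq_users", user)
        elif kind == "purchase":
            # Attribution: historical lookup of the earlier click.
            campaign = store.get(f"last_click:{user}", "unattributed")
            attributed.append(
                {"user": user, "amount": ev["amount"], "campaign": campaign}
            )
    return attributed


events = [
    {"user": "u1", "type": "click", "campaign": "spring"},
    {"user": "u2", "type": "click", "campaign": "summer"},
    {"user": "u1", "type": "click", "campaign": "summer"},
    {"user": "u1", "type": "purchase", "amount": 30},
    {"user": "u3", "type": "purchase", "amount": 5},
]
store = InMemoryStore()
attributed = process(events, store)
unique_users = store.set_size("uniq_users")
```

Here the purchase by `u1` is attributed to the most recent click, and `u3` has no click on record, so it falls back to `"unattributed"`. Last-click attribution is only one possible rule, chosen here for brevity.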

custom aggregation / preprocessing layer, requires business logic specific design
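
One possible shape for such a layer, as a sketch under assumptions: events are rolled up into fixed (tumbling) time windows per key before being stored. The window size, the field names `ts`, `key`, `amount`, and the choice of a sum are all illustrative; the real logic depends on the business case.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed window size, purely illustrative


def aggregate(events, window=WINDOW_SECONDS):
    """Sum `amount` per (window start, key) over a batch of events."""
    totals = defaultdict(float)
    for ev in events:
        # Align the timestamp to the start of its window.
        window_start = ev["ts"] - ev["ts"] % window
        totals[(window_start, ev["key"])] += ev["amount"]
    return dict(totals)


events = [
    {"ts": 5, "key": "a", "amount": 1.0},
    {"ts": 59, "key": "a", "amount": 2.0},
    {"ts": 60, "key": "a", "amount": 4.0},
    {"ts": 61, "key": "b", "amount": 8.0},
]
rollup = aggregate(events)
```

The first two events fall into the window starting at 0 and the last two into the window starting at 60, so the store receives three pre-aggregated rows instead of four raw events.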

semi-structured data schema design
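
One common way to approach this, sketched here as an assumption rather than the book's design: promote a few well-known fields to fixed columns and keep everything else in a single JSON column. `FIXED_COLUMNS`, `to_row`, and the sample fields are hypothetical.

```python
import json

# Assumed set of fields promoted to real columns; everything else stays JSON.
FIXED_COLUMNS = ("event_id", "ts", "user")


def to_row(raw):
    """Split one raw JSON event into fixed columns plus an `extras` blob."""
    record = json.loads(raw)
    # Missing fixed fields become None instead of failing the whole event.
    row = {col: record.pop(col, None) for col in FIXED_COLUMNS}
    # Whatever is left is kept verbatim as serialized JSON.
    row["extras"] = json.dumps(record, sort_keys=True)
    return row


raw_event = (
    '{"event_id": 7, "ts": 100, "user": "u1", '
    '"device": "ios", "ab": {"v": 2}}'
)
row = to_row(raw_event)
```

The fixed columns stay cheap to filter and index, while new or rarely used fields can appear in `extras` without a schema change.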

insight: simplicity leads to efficiency

deployment: orchestration frameworks