Skip to content

Real-time Analytics — Notes#

Functional#

  • Sessionization, funnels, retention.
  • Per-event dashboards & ad-hoc queries.
  • Real-time alerts.
  • Export to warehouse for SQL.

Non-functional#

  • Pipeline lag < 5 s.
  • Sub-second dashboard for last 24h.
  • Petabytes long-term in lake.

Trade-offs#

  • Druid/Pinot for high-QPS dashboards; ClickHouse for ad-hoc SQL.
  • Lambda vs Kappa: kappa simpler if stream-first.
  • Exactly-once is expensive; design idempotent counters.

Refs#

  • Druid, Pinot, ClickHouse docs.
  • "Streaming Systems" Tyler Akidau.
  • Snowplow / Heap engineering posts.