Distributed Logging — Notes#

Functional#

  • Collect logs from every host / service.
  • Parse, enrich, route by tags.
  • Index for free-text search.
  • Dashboards + alerts.
  • Tiered retention (hot / warm / cold).
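
The parse, enrich, and route steps above can be sketched as a small pipeline. This is a minimal illustration, not any particular agent's API: the key=value line format, the HOST_METADATA table, the tag names, and the sink names (hot-index, warm-index, cold-archive) are all hypothetical.

```python
import shlex

# Hypothetical static metadata used for enrichment; a real pipeline might
# look this up from a service registry instead.
HOST_METADATA = {
    "web-1": {"team": "storefront", "env": "prod"},
    "db-1": {"team": "data", "env": "prod"},
}


def parse(line: str) -> dict:
    """Parse a 'key=value key="quoted value"' line into a dict."""
    event = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            event[key] = value
    return event


def enrich(event: dict) -> dict:
    """Attach team/env tags based on the emitting host (empty if unknown)."""
    tags = HOST_METADATA.get(event.get("host", ""), {})
    return {**event, "tags": dict(tags)}


def route(event: dict) -> str:
    """Pick a destination from tags and level (sink names are made up)."""
    tags = event.get("tags", {})
    if tags.get("env") != "prod":
        return "cold-archive"
    if event.get("level") in {"error", "warn"}:
        return "hot-index"
    return "warm-index"


if __name__ == "__main__":
    raw = 'host=web-1 level=error msg="upstream timeout"'
    event = enrich(parse(raw))
    print(route(event), event)
```

Keeping the three stages as separate functions mirrors how most collectors chain independent processors, so a routing rule can change without touching the parser.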

Non-functional#

  • Sustain 100k+ events/s of ingest for large estates.
  • p99 indexing latency (event emitted to searchable) < 30 s.
  • 99.9% availability for ingest.
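
Two of the targets above translate directly into numbers that can be checked. A rough sketch, assuming a 30-day month and a nearest-rank definition of p99; the function names and the idea of feeding in sampled latencies are illustrative, not taken from a specific monitoring tool.

```python
import math


def downtime_budget_minutes(availability: float, days: int = 30) -> float:
    """Allowed downtime in minutes over `days` for an availability target."""
    return (1.0 - availability) * days * 24 * 60


def percentile_nearest_rank(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the value at rank ceil(pct * n / 100)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_latency_slo(samples: list[float], limit_s: float = 30.0) -> bool:
    """True if the p99 of the sampled indexing latencies is under the limit."""
    return percentile_nearest_rank(samples, 99) < limit_s


if __name__ == "__main__":
    # 99.9% over 30 days leaves roughly 43.2 minutes of ingest downtime.
    print(round(downtime_budget_minutes(0.999), 1))
```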

Capacity#

  • Logs are the most expensive observability pillar; set ingest budgets per team.
  • ES hot tier: ~1 KB/event × 100M events/day ≈ 100 GB/day per tenant (raw event bytes).
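
The arithmetic in the last bullet, spelled out. The event size and daily volume come from the note above; the retention window and replica count are hypothetical knobs added only to show how the raw figure scales, and index overhead and compression are ignored.

```python
GB = 10**9  # decimal gigabyte; 1 KB is taken as 1000 bytes


def daily_raw_bytes(events_per_day: int, bytes_per_event: int) -> int:
    """Raw bytes ingested per day, before replication or index overhead."""
    return events_per_day * bytes_per_event


def hot_tier_bytes(daily_bytes: int, retention_days: int, replicas: int) -> int:
    """Bytes held in the hot tier: one primary copy plus `replicas` copies."""
    return daily_bytes * retention_days * (1 + replicas)


if __name__ == "__main__":
    daily = daily_raw_bytes(events_per_day=100_000_000, bytes_per_event=1_000)
    print(daily / GB)  # 100.0 GB/day per tenant

    # Assumed values, not from the notes: 7 days hot, 1 replica.
    print(hot_tier_bytes(daily, retention_days=7, replicas=1) / GB)  # 1400.0

    # Average rate implied by 100M events/day.
    print(round(100_000_000 / 86_400))  # 1157 events/s
```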

Trade-offs#

  • ES inverted index = great search, expensive disk.
  • Loki indexes labels only = cheap storage, weaker search (grep/regex scan over the log content of label-selected streams).
  • Push vs pull at sources (the log analogue of CDC vs polling): agents tail local files and push.
  • Structured JSON logs vs free text: enforce JSON at the source and redact sensitive fields before indexing.
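
One way to read the JSON and redaction bullet is a gate at the agent or collector that rejects non-JSON lines and masks sensitive values before anything is indexed. A minimal sketch under that assumption; the field names in SENSITIVE_KEYS, the email pattern, and the [REDACTED] placeholder are illustrative choices, not a complete PII policy.

```python
import json
import re

# Hypothetical policy: keys whose values are always masked.
SENSITIVE_KEYS = {"password", "token", "authorization"}
# Deliberately simple email matcher; real policies need broader patterns.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
MASK = "[REDACTED]"


def redact(value):
    """Recursively mask sensitive keys and email-like substrings."""
    if isinstance(value, dict):
        return {
            key: MASK if key.lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        return EMAIL_RE.sub(MASK, value)
    return value


def accept(line: str):
    """Return a redacted event dict, or None if the line is not a JSON object."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict):
        return None
    return redact(event)


if __name__ == "__main__":
    print(accept('{"msg": "login by a@b.com", "password": "hunter2"}'))
    print(accept("plain text line"))
```

Rejected lines return None here; whether to drop them, count them, or send them to a quarantine stream is a policy decision the notes leave open.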

Refs#

  • ELK / EFK stack docs; Loki paper.
  • "Honeycomb: How we built our datastore" blogs.
  • Vector + OpenTelemetry Collector docs.