Metrics & Monitoring — Detailed

flowchart TB
  subgraph Sources
    APPS[Apps - instrumented]
    NODE[Node exporters]
    CADV[cAdvisor / kubelet]
    BB[Blackbox probes]
  end

  subgraph Collection
    PROM[Prometheus scrapers]
    OTEL[OTel Collector]
    PUSH[Pushgateway]
  end

  subgraph Store[Storage layer]
    LOCAL[Per-Prom WAL + blocks]
    LONG[(Long-term: Mimir / Thanos / Cortex / VictoriaMetrics)]
    S3[(Object storage)]
    DOWN[Downsampled tiers]
  end

  subgraph Query
    PQ[PromQL / Flux / MetricsQL]
    GRAF[Grafana dashboards]
    REC[Recording rules]
    ALERTR[Alert rules]
  end

  subgraph Alert
    AM[Alertmanager / route + dedup]
    SILENCE[Silences / Maintenance]
    PD[PagerDuty / Opsgenie]
    SLACK[Slack]
  end

  subgraph SLO
    SLI[SLI definitions]
    SLOC[SLOs + budgets]
    BURN[Multi-window burn-rate alerts]
  end

  Sources --> Collection --> Store
  Store --> Query --> GRAF
  Query --> Alert --> PD
  SLO --- Query

    classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
    classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
    classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
    classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
    classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
    classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
    classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
    classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
    classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
    classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
    class APPS,NODE,CADV,BB,PUSH,LOCAL,DOWN,REC,SILENCE,SLACK,SLOC service;
    class LONG datastore;
    class S3 storage;
    class PROM,OTEL,PQ,GRAF,ALERTR,AM,PD,SLI,BURN obs;
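
The SLO subgraph above names "Multi-window burn-rate alerts" without showing the arithmetic behind them. The following is a minimal sketch of that check, not this design's actual rule set. The function names, the 99.9% target, and the 14.4 threshold (a commonly cited value for a 1 h long window paired with a 5 m short window against a 30-day budget) are assumptions made for illustration. In practice the two error ratios would come from PromQL queries or recording rules.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how many times faster than 'exactly on budget' errors are accruing.

    error_ratio: observed fraction of bad events in a window (0.0 to 1.0).
    slo_target:  SLO as a fraction, e.g. 0.999 for 99.9% (assumed value).
    """
    budget = 1.0 - slo_target  # allowed fraction of bad events
    if not 0.0 < budget < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return error_ratio / budget


def should_page(
    long_window_error_ratio: float,
    short_window_error_ratio: float,
    slo_target: float = 0.999,   # hypothetical SLO
    threshold: float = 14.4,     # assumed threshold for a 1h/5m window pair
) -> bool:
    """Fire only when BOTH windows burn faster than the threshold.

    The long window shows the burn is significant; the short window shows
    it is still happening, so the alert resets soon after recovery.
    """
    long_burn = burn_rate(long_window_error_ratio, slo_target)
    short_burn = burn_rate(short_window_error_ratio, slo_target)
    return long_burn >= threshold and short_burn >= threshold


if __name__ == "__main__":
    # Sustained incident: 2% errors over the last hour, 3% over the last 5 minutes.
    print(should_page(0.02, 0.03))
    # Recovered incident: the hour still looks bad, the last 5 minutes are clean.
    print(should_page(0.02, 0.0005))
```

Requiring both windows to exceed the threshold is the design choice that matters here: a long window alone keeps paging well after recovery, and a short window alone pages on brief blips.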

Glossary & fundamentals

Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.

| Tag | Concept | What it is | Page |
| --- | --- | --- | --- |
| HLD | LSM vs B-Tree engines | WAL, memtable, SSTables, compaction | storage-engines-lsm-btree |
| HLD | Observability | metrics, logs, traces, SLOs | observability |