Metrics & Monitoring — Detailed
flowchart TB
subgraph Sources
APPS[Apps - instrumented]
NODE[Node exporters]
CADV[cAdvisor / kubelet]
BB[Blackbox probes]
end
subgraph Collection
PROM[Prometheus scrapers]
OTEL[OTel Collector]
PUSH[Pushgateway]
end
subgraph Store[Storage layer]
LOCAL[Per-Prom WAL + blocks]
LONG[(Long-term: Mimir / Thanos / Cortex / VictoriaMetrics)]
S3[(Object storage)]
DOWN[Downsampled tiers]
end
subgraph Query
PQ[PromQL / Flux / MetricsQL]
GRAF[Grafana dashboards]
REC[Recording rules]
ALERTR[Alert rules]
end
subgraph Alert
AM[Alertmanager / route + dedup]
SILENCE[Silences / Maintenance]
PD[PagerDuty / Opsgenie]
SLACK[Slack]
end
subgraph SLO
SLI[SLI definitions]
SLOC[SLOs + budgets]
BURN[Multi-window burn-rate alerts]
end
Sources --> Collection --> Store
Store --> Query --> GRAF
Query --> Alert --> PD
SLO --- Query
classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
class APPS,NODE,CADV,BB,PUSH,LOCAL,DOWN,REC,SILENCE,SLACK,SLOC service;
class LONG datastore;
class S3 storage;
class PROM,OTEL,PQ,GRAF,ALERTR,AM,PD,SLI,BURN obs;
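
The SLO block in the diagram ends in multi-window burn-rate alerts. As a rough illustration of that idea, the sketch below computes a burn rate (observed error ratio divided by the error budget) and fires only when both a long and a short window exceed the same threshold. The names `BurnWindow`, `burn_rate`, and `firing_windows` are hypothetical, the window pairs and thresholds (14.4 for 1h/5m, 6 for 6h/30m) are assumed values of the kind commonly quoted for a 30-day SLO period, and the error ratios are made-up inputs; none of these come from this design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurnWindow:
    """One long/short window pair sharing a burn-rate threshold (assumed values)."""
    long_label: str
    short_label: str
    threshold: float


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how many times faster than 'exactly on budget' errors are being spent."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def firing_windows(error_ratios, slo_target, windows):
    """Return the window pairs whose long AND short burn rates both exceed the threshold."""
    fired = []
    for w in windows:
        long_burn = burn_rate(error_ratios[w.long_label], slo_target)
        short_burn = burn_rate(error_ratios[w.short_label], slo_target)
        # Requiring the short window too lets the alert clear soon after recovery.
        if long_burn > w.threshold and short_burn > w.threshold:
            fired.append(w)
    return fired


# Assumed window pairs and thresholds; tune them to the real SLO period.
WINDOWS = [
    BurnWindow("1h", "5m", 14.4),
    BurnWindow("6h", "30m", 6.0),
]

# Made-up error ratios per window, as a recording rule might provide them.
observed = {"1h": 0.02, "5m": 0.03, "6h": 0.004, "30m": 0.02}
fired = firing_windows(observed, slo_target=0.999, windows=WINDOWS)
for w in fired:
    print(f"page: burn rate above {w.threshold}x over {w.long_label} and {w.short_label}")
```

In a setup like the one in the diagram, the per-window error ratios would more likely come from recording rules and the comparison would live in an alert rule; the Python form is only meant to make the arithmetic explicit.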
Glossary & fundamentals
Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.
| Tag | Concept | What it is | Page |
|---|---|---|---|
| HLD | LSM vs B-Tree engines | WAL, memtable, SSTables, compaction | storage-engines-lsm-btree |
| HLD | Observability | metrics, logs, traces, SLOs | observability |