Observability — Simple

Problem statement (interviewer prompt)

Design the observability stack for a microservice deployment with 200 services. Cover metrics, structured logs, distributed traces, and continuous profiling; explain SLO/SLI/error budget; design alerts that page on user-visible breakage, not on internal CPU spikes.
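
To make the SLO/SLI/error budget terms in the prompt concrete, here is a minimal sketch of the arithmetic. The names `Slo`, `sli`, `error_budget_remaining`, `burn_rate`, and `should_page` are hypothetical helpers, not part of any library, and the 30-day window and the 14.4 burn-rate threshold are assumptions (14.4 is a commonly cited fast-burn value: sustained for one hour, it spends about 2% of a 30-day budget).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """A request-based SLO (hypothetical helper for illustration)."""
    target: float          # e.g. 0.999 means 99.9% of requests must be good
    window_days: int = 30  # assumed rolling window length

    def __post_init__(self) -> None:
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be strictly between 0 and 1")


def sli(good: int, total: int) -> float:
    """SLI: fraction of good events; treated as 1.0 when there was no traffic."""
    return 1.0 if total == 0 else good / total


def error_budget_remaining(slo: Slo, good: int, total: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    if total == 0:
        return 1.0
    allowed_bad = (1.0 - slo.target) * total  # bad events the SLO tolerates
    actual_bad = total - good
    return 1.0 - actual_bad / allowed_bad


def burn_rate(slo: Slo, good: int, total: int) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    1.0 means the budget would be spent exactly at the end of the window.
    """
    return (1.0 - sli(good, total)) / (1.0 - slo.target)


def should_page(slo: Slo, good: int, total: int, threshold: float = 14.4) -> bool:
    """Page only when users are burning budget fast; CPU is not an input here."""
    return burn_rate(slo, good, total) >= threshold
```

With a 99.9% target, 500 failed requests out of 1,000,000 leave about half the budget, while a short window with 2% failures gives a burn rate of about 20 and would page. The alert condition depends only on what users experienced, which is the property the prompt asks for.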

flowchart LR
  APP[Service] --> M[(Metrics<br/>Prometheus)]
  APP --> L[(Logs<br/>Loki / ELK)]
  APP --> T[(Traces<br/>Jaeger / Tempo)]
  M --> G[Dashboards / Alerts]
  L --> G
  T --> G

    classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
    classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
    classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
    classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
    classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
    classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
    classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
    classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
    classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
    classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
    class APP service;
    class M,L,T datastore;
    class G obs;

Three pillars: Metrics (numeric time series), Logs (timestamped events), Traces (causal chains of spans across services).
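
As a rough illustration of how one request feeds all three pillars, the sketch below records a counter increment, a structured log line, and a span for each call. `Telemetry` and `record_request` are hypothetical names, and the in-memory containers are stand-ins for real backends such as those in the diagram; a real service would use a client library rather than this code.

```python
import json
import time
import uuid
from collections import Counter
from typing import Optional


class Telemetry:
    """In-memory stand-in for the three backends (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.request_count = Counter()  # metrics: (route, status) -> count
        self.log_lines = []             # logs: one JSON object per event
        self.spans = []                 # traces: spans linked by trace_id / parent_id

    def record_request(self, route: str, status: int, duration_s: float,
                       trace_id: Optional[str] = None,
                       parent_id: Optional[str] = None) -> dict:
        # A new trace starts when the caller did not propagate a trace_id.
        trace_id = trace_id or uuid.uuid4().hex

        # Trace: one span per unit of work, pointing at its parent span.
        span = {
            "trace_id": trace_id,
            "span_id": uuid.uuid4().hex[:16],
            "parent_id": parent_id,
            "name": route,
            "duration_s": duration_s,
        }
        self.spans.append(span)

        # Metric: a number aggregated over time, with low-cardinality labels only.
        self.request_count[(route, status)] += 1

        # Log: a discrete event; carrying trace_id lets a log line lead to its trace.
        self.log_lines.append(json.dumps({
            "ts": time.time(),
            "level": "error" if status >= 500 else "info",
            "route": route,
            "status": status,
            "trace_id": trace_id,
        }))
        return span


telemetry = Telemetry()
root = telemetry.record_request("/checkout", 200, 0.120)
child = telemetry.record_request("/payments/charge", 502, 0.080,
                                 trace_id=root["trace_id"],
                                 parent_id=root["span_id"])
```

The shared `trace_id` is the design choice worth noting: the metric shows that errors rose, the log line says which request failed, and the trace shows where in the call chain it failed.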