
Resilience Patterns: Detailed

```mermaid
flowchart TB
  subgraph Stability[Stability patterns]
    TO[Timeout<br/>connect + read]
    RT[Retry with backoff + jitter]
    CB[Circuit Breaker<br/>closed -> open -> half-open]
    BH[Bulkhead<br/>per-dependency pools]
    SH[[Shed Load<br/>queue length / cpu]]
    RL[Rate Limit / Quota]
    HD[Hedged Requests]
    DG[Graceful Degradation]
    FB[Fallback / cached / static]
  end

  subgraph Pressure[Backpressure]
    QS[[Bounded queues]]
    REJ[Reject on full<br/>fail fast]
    BP[Reactive backpressure<br/>RxJava / Reactive Streams]
    FLOW[gRPC / HTTP2 flow control]
  end

  subgraph Failure[Failure injection]
    CHAOS[Chaos Monkey / GameDays]
    DELAY[Latency injection]
    NETWORK[Network partition]
    CRASH[Process kill]
  end

  subgraph Detect[Health & detection]
    LIVE[Liveness probe]
    READY[Readiness probe]
    HC[Active health checks]
    HK[Heartbeats]
    PHI[Phi accrual]
    CIRC[Circuit metrics]
  end

  subgraph Recover[Recovery]
    REST[Restart / kill switch]
    FAIL[Failover replicas]
    SPILL[Spillover region]
    REPLAY[[Replay from log / queue]]
  end

  Caller --> TO --> RT --> CB --> BH --> Downstream
  CB --> FB
  SH --> REJ
  RL --> SH
  HD --> Downstream
  QS --> REJ
  CHAOS -.exercise.-> Detect
  Detect --> Recover

  classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
  classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
  classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
  classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
  classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
  classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
  classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
  classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
  classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
  classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
  class TO,RT,CB,BH,RL,HD,DG,FB,REJ,BP,FLOW,CHAOS,DELAY,NETWORK,CRASH,LIVE,READY,HC,HK,PHI,REST,FAIL,SPILL service;
  class SH,QS,REPLAY queue;
  class CIRC obs;
```

Circuit breaker states

  • Closed: requests flow; failures are counted in a rolling window.
  • Open: requests are rejected immediately (or sent to a fallback), shielding the downstream.
  • Half-open: once the open timeout expires, allow N probe requests; if they pass → closed, otherwise → open again.

Thresholds, e.g.: a 50% failure rate over at least 20 requests in a 10 s rolling window → open for 30 s.
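The three states and the threshold rule can be sketched as a small state machine. This is a minimal, single-threaded illustration, assuming a failure-rate trip rule over a time-based rolling window with the example thresholds as defaults; `CircuitBreaker`, `CircuitOpenError`, and all parameter names are hypothetical. A production breaker (for example Resilience4j's) would also need locking and a cap on concurrent half-open probes.

```python
import time
from collections import deque
from typing import Callable


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without touching the dependency."""


class CircuitBreaker:
    """Single-threaded sketch: closed -> open -> half-open -> closed/open."""

    def __init__(self, failure_rate=0.5, min_calls=20, window_s=10.0,
                 open_s=30.0, probes=3, clock: Callable[[], float] = time.monotonic):
        self.failure_rate, self.min_calls = failure_rate, min_calls
        self.window_s, self.open_s, self.probes = window_s, open_s, probes
        self.clock = clock  # injectable so tests can control time
        self.state = "closed"
        self.opened_at = 0.0
        self.outcomes = deque()  # (timestamp, ok) pairs inside the rolling window
        self.probe_successes = 0

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.open_s:
                raise CircuitOpenError("open: failing fast")
            self.state, self.probe_successes = "half_open", 0  # cool-down over
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record(ok=False)
            raise
        self._record(ok=True)
        return result

    def _trip(self):
        self.state, self.opened_at = "open", self.clock()
        self.outcomes.clear()

    def _record(self, ok: bool):
        if self.state == "half_open":
            if not ok:
                self._trip()  # a failed probe re-opens the circuit
                return
            self.probe_successes += 1
            if self.probe_successes >= self.probes:
                self.state = "closed"  # enough probes passed
            return
        now = self.clock()
        self.outcomes.append((now, ok))
        while now - self.outcomes[0][0] > self.window_s:  # evict expired outcomes
            self.outcomes.popleft()
        failures = sum(1 for _, good in self.outcomes if not good)
        if (len(self.outcomes) >= self.min_calls
                and failures / len(self.outcomes) >= self.failure_rate):
            self._trip()
```

Injecting the clock keeps the state machine deterministic to test; the minimum-volume check (`min_calls`) stops a single early failure from tripping the breaker.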

Bulkhead

  • Separate thread/connection pool per dependency.
  • A slow analytics API can't starve payment calls of threads.
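The isolation idea can be sketched with one concurrency cap per dependency. This sketch swaps the dedicated thread/connection pool for the simpler semaphore variant (the same idea as Hystrix's semaphore isolation or Resilience4j's `SemaphoreBulkhead`); `Bulkhead`, `BulkheadFullError`, and the pool sizes are hypothetical.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when a dependency's compartment has no free slot."""


class Bulkhead:
    """Semaphore-style bulkhead: caps concurrent calls to one dependency and
    rejects extra callers instead of letting them queue."""

    def __init__(self, name: str, max_concurrent: int):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        # Non-blocking acquire: if every slot is busy, fail fast.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(f"{self.name}: no free slots")
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()  # always return the slot, even on error


# One compartment per dependency; names and sizes are hypothetical.
BULKHEADS = {
    "payments": Bulkhead("payments", max_concurrent=20),
    "analytics": Bulkhead("analytics", max_concurrent=2),
}
```

When both analytics slots are held by slow calls, a third analytics call is rejected immediately, while `BULKHEADS["payments"]` still has all 20 of its slots free.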

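Timeouts

  • Always set both a connect timeout and a read timeout: the connect timeout catches unreachable hosts, the read timeout catches a connected server that never answers.
  • If the whole request has a budget t and you allow N retries, there are 1 + N attempts, so the per-attempt timeout must be ≤ t / (1 + N) (less once backoff sleeps are counted).
  • Tail latency dominates: set timeouts near the p99 latency of the healthy dependency, not the average.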
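The budget arithmetic combines naturally with the diagram's "retry with backoff + jitter". A minimal asyncio sketch, assuming an async operation that signals transient failure with `ConnectionError`; `per_attempt_timeout` and `call_with_retries` are hypothetical names, and the backoff uses the "full jitter" scheme (a uniform sleep in [0, base · 2^attempt]).

```python
import asyncio
import random
import time


def per_attempt_timeout(total_s: float, retries: int) -> float:
    """Split the caller's budget across 1 + retries attempts (backoff sleeps not counted)."""
    return total_s / (1 + retries)


async def call_with_retries(op, total_s: float = 1.0, retries: int = 2,
                            base_backoff_s: float = 0.05):
    """Retry an async callable with per-attempt timeouts, full-jitter backoff,
    and an overall deadline so the whole call never exceeds `total_s`."""
    deadline = time.monotonic() + total_s
    attempt_timeout = per_attempt_timeout(total_s, retries)
    last_exc = None
    for attempt in range(retries + 1):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # budget exhausted; don't start another attempt
        try:
            return await asyncio.wait_for(op(), timeout=min(attempt_timeout, remaining))
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_exc = exc  # treat timeouts and connection errors as transient
        if attempt < retries:
            # Full jitter: sleep uniformly in [0, base * 2**attempt], never past the deadline.
            backoff = random.uniform(0, base_backoff_s * 2 ** attempt)
            await asyncio.sleep(max(0.0, min(backoff, deadline - time.monotonic())))
    raise TimeoutError(f"no success within {total_s}s budget") from last_exc
```

Here `asyncio.wait_for` bounds the whole attempt; separate connect and read timeouts are set on the client itself (in the `requests` library, for instance, `timeout=(connect_s, read_s)`). Retries should only wrap idempotent operations.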

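Hedged requests

  • If the first request has not answered by the p95 latency, send a second copy (ideally to another replica); use whichever response arrives first and cancel the slower one.
  • Used by Google (Bigtable) and AWS (S3 hot reads).
  • Risk: hedging amplifies load. Use it sparingly and cap it with a budget; hedging only after p95 limits the extra load to roughly 5% of requests.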
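A hedge can be sketched with asyncio tasks. This is a minimal illustration, assuming `primary` and `backup` are async callables (for example, the same read against two replicas) and that `hedge_after_s` is fed from a separately measured p95; `hedged` is a hypothetical helper and does not enforce a hedge budget.

```python
import asyncio


async def hedged(primary, backup, hedge_after_s: float):
    """Start `primary`; if it is still running after `hedge_after_s`, start
    `backup` too. Return the first successful result, cancel the other copy."""
    first = asyncio.create_task(primary())
    done, _ = await asyncio.wait({first}, timeout=hedge_after_s)
    if done:
        return first.result()  # answered before the hedge threshold (re-raises on error)
    second = asyncio.create_task(backup())
    pending = {first, second}
    try:
        while pending:
            done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                if task.exception() is None:
                    return task.result()  # first successful copy wins
        raise first.exception()  # both copies failed
    finally:
        for task in pending:
            task.cancel()  # cancel the slower copy
```

Most requests take the fast path and never send a second copy, which is why a p95 threshold keeps the extra load small.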

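Backpressure

  • Bounded queues are mandatory; an unbounded queue is an out-of-memory time bomb.
  • Better to reject fast (HTTP 429) than to queue forever.
  • Mechanisms: Reactive Streams (subscribers signal demand with request(n)), gRPC / HTTP/2 flow-control windows, and Kafka consumers, which pull at their own poll cadence.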
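"Bounded queue, reject on full" can be sketched with the standard library's `queue.Queue`. A minimal sketch; `submit` and the HTTP-style `(status, body)` return value are hypothetical stand-ins for a real request handler.

```python
import queue
from typing import Any, Tuple


def submit(q: queue.Queue, job: Any) -> Tuple[int, str]:
    """Enqueue without blocking; turn a full queue into a fast 429."""
    try:
        q.put_nowait(job)
    except queue.Full:
        # Fail fast: the client can back off and retry later.
        return 429, "Too Many Requests"
    return 202, "Accepted"


# Bounded work queue feeding a worker pool (size is illustrative).
WORK_QUEUE: queue.Queue = queue.Queue(maxsize=100)
```

Rejecting with `put_nowait` pushes the overload back to the client; a blocking `q.put(job, timeout=...)` would instead apply backpressure by slowing the producer.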

Graceful degradation examples

  • Cache-only mode if the DB is down.
  • Serve stale recommendations if the model server is down.
  • Disable comments if the comment service is down (rather than failing the whole page).
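The last two examples can be sketched as a fallback chain (live → last known good → static), matching the diagram's "Fallback / cached / static". All names here are hypothetical: `fetch_live` and `fetch_comments` stand in for the model server and comment service, and `POPULAR_ITEMS` is an assumed static fallback list.

```python
from typing import Callable, Dict, List, Tuple

POPULAR_ITEMS = ["top-1", "top-2", "top-3"]  # hypothetical static fallback


def get_recommendations(user_id: str,
                        fetch_live: Callable[[str], List[str]],
                        last_good: Dict[str, List[str]]) -> Tuple[List[str], str]:
    """Return (items, mode): live from the model server, else the last good
    (possibly stale) answer for this user, else a static list."""
    try:
        items = fetch_live(user_id)
    except Exception:
        if user_id in last_good:
            return last_good[user_id], "stale"
        return POPULAR_ITEMS, "static"
    last_good[user_id] = items  # refresh the fallback on every success
    return items, "live"


def render_page(article: str, fetch_comments: Callable[[str], List[str]]) -> Dict[str, object]:
    """Comments are optional: if the comment service fails, omit the section."""
    page: Dict[str, object] = {"article": article}
    try:
        page["comments"] = fetch_comments(article)
    except Exception:
        page["comments"] = None  # show "comments unavailable" instead of failing the page
    return page
```

Returning the mode alongside the data lets callers log or meter how often each degraded path is taken.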

Glossary & fundamentals

Concepts referenced in this design. The Page column names each concept's canonical page; the Tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.

| Tag | Concept | What it is | Page |
|---|---|---|---|
| HLD | Pub/Sub & message brokers | topics, consumer groups, delivery semantics | pub-sub-pattern |
| HLD | Leader/follower replication | sync/semi-sync/async replication, failover | replication-leader-follower |
| HLD | Idempotency & retries | safe re-execution, backoff + jitter | idempotency-retries |
| HLD | Resilience patterns | timeout, retry, breaker, bulkhead, backpressure | resilience-patterns |
| HLD | Observability | metrics, logs, traces, SLOs | observability |
| HLD | HTTP / TLS protocols | HTTP 1.1/2/3, QUIC, TLS 1.3 | http-protocols |
| LLD | REST API design | verbs, statuses, pagination, errors | rest-api-design |
| LLD | Async models | futures / async-await / coroutines / actors | async-models |