# Resilience Patterns: Detailed
```mermaid
flowchart TB
subgraph Stability[Stability patterns]
TO[Timeout<br/>connect + read]
RT[Retry with backoff + jitter]
CB[Circuit Breaker<br/>closed -> open -> half-open]
BH[Bulkhead<br/>per-dependency pools]
SH[[Shed Load<br/>queue length / cpu]]
RL[Rate Limit / Quota]
HD[Hedged Requests]
DG[Graceful Degradation]
FB[Fallback / cached / static]
end
subgraph Pressure[Backpressure]
QS[[Bounded queues]]
REJ[Reject on full<br/>fail fast]
BP[Reactive backpressure<br/>RxJava / Reactive Streams]
FLOW[gRPC / HTTP2 flow control]
end
subgraph Failure[Failure injection]
CHAOS[Chaos Monkey / GameDays]
DELAY[Latency injection]
NETWORK[Network partition]
CRASH[Process kill]
end
subgraph Detect[Health & detection]
LIVE[Liveness probe]
READY[Readiness probe]
HC[Active health checks]
HK[Heartbeats]
PHI[Phi accrual]
CIRC[Circuit metrics]
end
subgraph Recover[Recovery]
REST[Restart / kill switch]
FAIL[Failover replicas]
SPILL[Spillover region]
REPLAY[[Replay from log / queue]]
end
Caller --> TO --> RT --> CB --> BH --> Downstream
CB --> FB
SH --> REJ
RL --> SH
HD --> Downstream
QS --> REJ
CHAOS -.exercise.-> Detect
Detect --> Recover
classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
class TO,RT,CB,BH,RL,HD,DG,FB,REJ,BP,FLOW,CHAOS,DELAY,NETWORK,CRASH,LIVE,READY,HC,HK,PHI,REST,FAIL,SPILL service;
class SH,QS,REPLAY queue;
class CIRC obs;
```
## Circuit breaker states
- Closed: requests flow; failures counted in rolling window.
- Open: requests rejected immediately; downstream is shielded.
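- Half-open: once the open period expires, allow N probe requests through; if they succeed → closed, else → open.
Example thresholds: trip to open once a 10 s rolling window holds at least 20 requests and 50% or more of them failed; stay open for 30 s before probing.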
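A minimal single-threaded sketch of the state machine above, assuming the example thresholds (50% failure rate, at least 20 requests in a 10 s window, 30 s open period). `CircuitBreaker`, `CircuitOpenError`, and the injectable `clock` are illustrative names, not a specific library's API; production breakers (e.g., Resilience4j) also need thread safety and a cap on concurrent half-open probes.

```python
import time
from collections import deque


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without touching the dependency."""


class CircuitBreaker:
    # Single-threaded sketch; all names here are illustrative, not a library API.
    def __init__(self, failure_rate=0.5, min_requests=20, window_s=10.0,
                 open_s=30.0, probes=1, clock=time.monotonic):
        self.failure_rate = failure_rate
        self.min_requests = min_requests
        self.window_s = window_s
        self.open_s = open_s
        self.probes = probes
        self.clock = clock
        self.state = "closed"
        self.results = deque()          # (timestamp, ok) pairs in the rolling window
        self.opened_at = 0.0
        self.probe_successes = 0

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.open_s:
                raise CircuitOpenError("circuit open: failing fast")
            self.state = "half-open"    # open period elapsed: let probes through
            self.probe_successes = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record(ok=False)
            raise
        self._record(ok=True)
        return result

    def _record(self, ok):
        now = self.clock()
        if self.state == "half-open":
            if not ok:
                self._trip(now)         # a failed probe reopens the circuit
            else:
                self.probe_successes += 1
                if self.probe_successes >= self.probes:
                    self.state = "closed"   # enough probes passed: resume traffic
                    self.results.clear()
            return
        self.results.append((now, ok))
        while self.results and now - self.results[0][0] > self.window_s:
            self.results.popleft()      # drop outcomes older than the window
        failures = sum(1 for _, passed in self.results if not passed)
        if (len(self.results) >= self.min_requests
                and failures / len(self.results) >= self.failure_rate):
            self._trip(now)

    def _trip(self, now):
        self.state = "open"
        self.opened_at = now
        self.results.clear()
```

Injecting `clock` keeps the open-period logic testable without real sleeps.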
## Bulkhead
- Separate thread/connection pool per dependency.
- A slow `analytics` API can't starve `payment` calls of threads.
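A minimal sketch of the bulkhead idea using one bounded semaphore per dependency, a lighter-weight variant of the separate thread pools named above: each dependency gets a fixed number of concurrent slots, and a call that finds its bulkhead full is rejected immediately instead of waiting. The `Bulkhead` class, the dependency names, and the slot counts are hypothetical.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when a dependency's slots are all in use."""


class Bulkhead:
    """Caps concurrent calls to one dependency; excess calls are rejected, not queued."""

    def __init__(self, name, max_concurrent):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        # Non-blocking acquire: a saturated dependency fails fast instead of
        # tying up the caller's threads.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(f"{self.name} bulkhead full")
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()


# One bulkhead per dependency (hypothetical names and sizes): a stuck
# analytics API can use at most 4 slots, leaving payment's 16 untouched.
bulkheads = {
    "analytics": Bulkhead("analytics", max_concurrent=4),
    "payment": Bulkhead("payment", max_concurrent=16),
}
```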
## Timeouts
- Always set both connect and read timeouts.
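- Give each request a total timeout (deadline) `t`. With `N` retries, `1 + N` attempts share that budget, so the per-attempt timeout must be ≤ `t / (1 + N)` (less still once backoff sleeps are subtracted).
- Tail latency dominates: set timeouts near the p99 latency of a healthy dependency, not the average.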
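A sketch of splitting one deadline `t` across `1 + N` attempts, combined with exponential backoff and "full jitter" (a random sleep between 0 and the capped exponential delay). It assumes `fn` is a hypothetical callable that enforces the per-attempt timeout it receives (for example, by passing it to its HTTP client) and raises on failure. Treating every exception as retryable is a simplification; real code should retry only idempotent operations and transient errors.

```python
import random
import time


class RetryBudgetExhausted(Exception):
    """Raised when all attempts failed or the overall deadline ran out."""


def call_with_deadline(fn, total_timeout_s, max_retries, base_backoff_s=0.05,
                       max_backoff_s=1.0, clock=time.monotonic, sleep=time.sleep):
    """Call fn(timeout_s) up to 1 + max_retries times within one overall deadline."""
    deadline = clock() + total_timeout_s
    attempts = 1 + max_retries
    per_attempt = total_timeout_s / attempts          # t / (1 + N)
    last_error = None
    for attempt in range(attempts):
        remaining = deadline - clock()
        if remaining <= 0:
            break
        try:
            return fn(min(per_attempt, remaining))    # never exceed the remaining budget
        except Exception as exc:                      # simplification: all errors retryable
            last_error = exc
        if attempt == attempts - 1:
            break
        # Full jitter: sleep uniformly in [0, min(cap, base * 2^attempt)].
        pause = random.uniform(0, min(max_backoff_s, base_backoff_s * 2 ** attempt))
        if clock() + pause >= deadline:
            break                                     # no budget left for another try
        sleep(pause)
    raise RetryBudgetExhausted("retries or deadline exhausted") from last_error
```

Jitter spreads retries from many clients over time, so a brief outage does not turn into synchronized retry waves.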
## Hedged requests
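- If no response has arrived by the p95 latency, send a second copy of the request (typically to another replica); use whichever response arrives first and cancel the slower one.
- Used by Google (Bigtable) and AWS (S3 hot reads).
- Risk: hedging amplifies load (a p95 trigger hedges about 5% of requests when the backend is healthy, and many more when it is slow overall), so use it sparingly and cap it with a budget.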
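A minimal `asyncio` sketch of a hedged request: start one attempt, and only if it is still pending after `hedge_after_s` (set near the observed p95) start a second, return the first success, and cancel the loser. `make_request` is a hypothetical coroutine factory (e.g., one read against a replica); this sketch does not enforce the hedge budget, and an error on the fast path is simply raised rather than hedged.

```python
import asyncio


async def hedged(make_request, hedge_after_s):
    """Return the first successful result of at most two copies of a request."""
    first = asyncio.create_task(make_request())
    done, _ = await asyncio.wait({first}, timeout=hedge_after_s)
    if done:
        return first.result()               # fast path: no hedge sent
    second = asyncio.create_task(make_request())
    pending = {first, second}
    try:
        while pending:
            done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                if task.exception() is None:
                    return task.result()    # first success wins
        return first.result()               # both failed: re-raise the first error
    finally:
        for task in pending:
            task.cancel()                   # cancel the slower copy
```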
## Backpressure
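- Bounded queues are mandatory; an unbounded queue is an OOM time bomb.
- Better to reject (HTTP 429 Too Many Requests) than to queue forever.
- Built-in mechanisms: the Reactive Streams spec (subscribers signal demand with `request(n)`), gRPC / HTTP/2 flow-control windows, and Kafka consumers, which pull at their own `poll()` cadence.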
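A minimal sketch of "reject on full" admission using Python's `queue.Queue` with a `maxsize`: `put_nowait` raises `queue.Full` instead of blocking, which this hypothetical `BoundedIntake` wrapper maps to an HTTP-style 429. The class name and status codes are illustrative; a real server would return them from its request handler.

```python
import queue


class BoundedIntake:
    """Admits work into a fixed-size queue; rejects immediately when full."""

    def __init__(self, max_pending):
        self._q = queue.Queue(maxsize=max_pending)

    def submit(self, job):
        """Return 202 if the job was accepted, 429 if it was shed."""
        try:
            self._q.put_nowait(job)
            return 202
        except queue.Full:
            return 429  # fail fast; the caller should back off and retry later

    def take(self, timeout_s=None):
        """Worker side: block until a job is available (raises queue.Empty on timeout)."""
        return self._q.get(timeout=timeout_s)
```

Memory use is capped at `max_pending` jobs no matter how fast producers push, which is the point of bounding the queue.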
## Graceful degradation examples
- Cache-only mode if the DB is down.
- Serve stale recommendations if the model server is down.
- Disable comments if the comment service is down (rather than failing the whole page).
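One way to wire the "serve stale if down" examples, sketched under assumptions: `fetch` is a hypothetical call to the live dependency, `cache` is a plain dict of last-known-good values with timestamps, and `static_default` is what to show when nothing fresher exists. Returning the source alongside the value lets the page render a degraded mode (e.g., hide a widget) rather than fail.

```python
import time


def with_fallback(fetch, cache, key, static_default, max_stale_s=3600.0, clock=time.time):
    """Try the live dependency; on failure serve a stale cached value, else a static default."""
    try:
        value = fetch(key)
        cache[key] = (value, clock())       # refresh the last-known-good copy
        return value, "live"
    except Exception:
        cached = cache.get(key)
        if cached is not None and clock() - cached[1] <= max_stale_s:
            return cached[0], "stale"       # degraded but still useful
        return static_default, "static"    # last resort: generic content
```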
## Glossary & fundamentals
Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.
| Tag | Concept | What it is | Page |
|---|---|---|---|
| HLD | Pub/Sub & message brokers | topics, consumer groups, delivery semantics | pub-sub-pattern |
| HLD | Leader/follower replication | sync/semi-sync/async replication, failover | replication-leader-follower |
| HLD | Idempotency & retries | safe re-execution, backoff + jitter | idempotency-retries |
| HLD | Resilience patterns | timeout, retry, breaker, bulkhead, backpressure | resilience-patterns |
| HLD | Observability | metrics, logs, traces, SLOs | observability |
| HLD | HTTP / TLS protocols | HTTP 1.1/2/3, QUIC, TLS 1.3 | http-protocols |
| LLD | REST API design | verbs, statuses, pagination, errors | rest-api-design |
| LLD | Async models | futures / async-await / coroutines / actors | async-models |