Idempotency, Retries, Backoff — Detailed
flowchart TB
subgraph Idem[Idempotency]
KEY([Client supplies key<br/>e.g. UUID per intent])
STORE[(Idempotency table<br/>key, request_hash, response, status, expires)]
LOCK[Acquire row lock by key]
INPROG[State: in_progress]
DONE[State: completed]
FAIL[State: failed]
DUP[Dup detection on retry]
TTL[24-48h TTL default]
MISMATCH[Hash mismatch -> 422]
end
subgraph Retry[Retry policy]
POL[Retryable error classification<br/>5xx, 429, network, 408]
NO[Non-retryable: validation, auth]
BACK[Exponential backoff base * 2^n]
JIT[Full jitter / decorrelated jitter]
MAXR[Max attempts cap]
BUDGET[Retry budget - server-side<br/>cap retry % of traffic]
end
subgraph Patterns
PUT[Use PUT not POST when possible]
PG[Idempotent operations natively<br/>SET, DELETE]
DEDUP[Dedup at consumer via msg id]
CO[Conditional update<br/>If-Match etag]
end
subgraph Pitfalls
AMP[Retry amplification - thundering herd]
DUPC[Double charge<br/>missing key]
PART[Partial side-effect before crash]
STAL[Stale response on timeout]
end
KEY --> STORE
STORE --> LOCK --> INPROG
INPROG --> DONE
INPROG --> FAIL
DUP --> STORE
POL --> BACK --> JIT
BACK --> MAXR
BUDGET -.shed.-> MAXR
AMP -. mitigate .-> JIT
AMP -. mitigate .-> BUDGET
DUPC -. mitigate .-> KEY
PART -. mitigate .-> Outbox
classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
class KEY client;
class LOCK,INPROG,DONE,FAIL,DUP,TTL,MISMATCH,POL,NO,BACK,JIT,MAXR,BUDGET,PUT,PG,DEDUP,CO,AMP,DUPC,PART,STAL service;
class STORE datastore;
Idempotency table flow
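BEGIN
row = SELECT * WHERE key = :key FOR UPDATE  # may return no row
if row exists and row.request_hash != :request_hash: return 422  # key reused for a different request
if row exists and row.status == 'completed': return row.response  # replay
if row exists and row.status == 'in_progress': return 409  # still running
INSERT (new key) or UPDATE (failed row) with status='in_progress'
-- do business work in same tx if possible
UPDATE status='completed', response=...
COMMIT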
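The flow above can be sketched in Python as follows. This is a minimal illustration, not a production design: `IdempotencyStore`, `Record`, `handle`, and `work` are hypothetical names, a dict stands in for the idempotency table, and a `threading.Lock` stands in for the row lock. It also assumes the `in_progress` state is recorded before the business work runs (so a concurrent duplicate sees 409), rather than holding one transaction open for the whole request.

```python
import hashlib
import json
import threading
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass
class Record:
    request_hash: str
    status: str                      # "in_progress" | "completed" | "failed"
    response: Optional[Any] = None


class IdempotencyStore:
    """In-memory stand-in for the idempotency table (hypothetical)."""

    def __init__(self) -> None:
        self._rows: Dict[str, Record] = {}
        self._lock = threading.Lock()    # plays the role of the row lock

    def handle(self, key: str, payload: dict,
               work: Callable[[dict], Any]) -> Tuple[int, Any]:
        # Hash a canonical form of the request so key reuse can be detected.
        request_hash = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest()

        with self._lock:
            row = self._rows.get(key)
            if row is not None:
                if row.request_hash != request_hash:
                    return 422, "key reused with a different request"
                if row.status == "completed":
                    return 200, row.response      # replay stored response
                if row.status == "in_progress":
                    return 409, "still running"
                # status == "failed": fall through and run the work again
            self._rows[key] = Record(request_hash, "in_progress")

        try:
            response = work(payload)              # business side effect
        except Exception:
            with self._lock:
                self._rows[key].status = "failed"
            raise

        with self._lock:
            self._rows[key] = Record(request_hash, "completed", response)
        return 200, response
```

Calling `handle` twice with the same key and payload runs `work` once and replays the stored response on the second call; reusing the key with a different payload yields 422. TTL-based expiry of rows is left out of the sketch.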
Backoff math
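Exponential: delay = base · 2^n, where n is the attempt number starting at 0 (e.g. base = 100 ms gives 100, 200, 400, 800 ms).
Add jitter so that clients which failed together do not retry in lockstep:
Full jitter: delay = rand(0, base·2^n).
Decorrelated jitter (AWS): delay = min(cap, rand(base, prev·3)), where prev is the previous delay (initially base).
Cap the delay at e.g. 30 s and the number of attempts at 5–10.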
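The three formulas above translate directly into code. The sketch below is illustrative only: the function names and the choice of milliseconds as the unit are assumptions, and the random generator is passed in so results can be seeded.

```python
import random


def exponential(base: float, attempt: int, cap: float) -> float:
    """Plain exponential backoff: base * 2^attempt, clipped to cap."""
    return min(cap, base * (2 ** attempt))


def full_jitter(base: float, attempt: int, cap: float,
                rng: random.Random) -> float:
    """Full jitter: uniform in [0, capped exponential delay]."""
    return rng.uniform(0.0, exponential(base, attempt, cap))


def decorrelated_jitter(base: float, prev: float, cap: float,
                        rng: random.Random) -> float:
    """Decorrelated jitter: uniform in [base, prev * 3], clipped to cap.

    `prev` is the previous delay; pass `base` on the first retry.
    """
    return min(cap, rng.uniform(base, prev * 3))


if __name__ == "__main__":
    rng = random.Random()
    base_ms, cap_ms = 100, 30_000
    prev = base_ms
    for attempt in range(5):
        prev = decorrelated_jitter(base_ms, prev, cap_ms, rng)
        print(attempt,
              exponential(base_ms, attempt, cap_ms),
              round(full_jitter(base_ms, attempt, cap_ms, rng)),
              round(prev))
```

With `base = 100` and `cap = 30_000` (both in ms), `exponential` yields 100, 200, 400, 800 for attempts 0 to 3, matching the example above, and never exceeds the cap.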
What is retryable
| Status | Retry? |
| --- | --- |
| 5xx | yes (with backoff) |
| 429 | yes, respect Retry-After |
| 408 / timeout | yes, with idempotency key |
| 4xx (400, 401, 404) | no, fix the request |
| 409 | depends; refresh state first |
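The classification above can be encoded as a small decision function. This is a sketch under stated assumptions: `classify` and its return values (`"retry"`, `"refresh"`, `"no"`) are hypothetical names, `status=None` is used here to represent a network error or client-side timeout with no HTTP response, and `retry_after_seconds` handles only the delay-in-seconds form of the `Retry-After` header, not the HTTP-date form.

```python
from typing import Optional


def classify(status: Optional[int], has_idempotency_key: bool) -> str:
    """Map an outcome to 'retry', 'refresh', or 'no'.

    status=None means no HTTP response was received (assumption:
    network failure or client-side timeout).
    """
    if status is None or status == 408:
        # The request may have been applied; only safe with a key.
        return "retry" if has_idempotency_key else "no"
    if status == 429 or 500 <= status <= 599:
        return "retry"          # with backoff; honor Retry-After on 429
    if status == 409:
        return "refresh"        # re-read state before deciding
    return "no"                 # 400, 401, 404, ...: fix the request


def retry_after_seconds(header: Optional[str]) -> Optional[int]:
    """Parse Retry-After when it is a non-negative integer of seconds."""
    if header is None:
        return None
    value = header.strip()
    return int(value) if value.isdigit() else None
```

A caller would typically combine the two: on `"retry"`, wait for the larger of the computed backoff delay and any `Retry-After` value before the next attempt.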
Server-side retry budget
Limit retries to a fixed fraction of normal traffic, e.g. 20% of base RPS, so that retries cannot overload the service during an incident.
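One way to enforce such a limit is a credit-based budget: every first-attempt request earns a fraction of a retry, and each retry spends a whole one. The sketch below is an assumption about how this could be built, not a description of any particular library; `RetryBudget`, `record_request`, and `try_retry` are hypothetical names. Credits are kept as integers (hundredths of a retry) to avoid floating-point drift, and the class is not thread-safe or time-windowed.

```python
class RetryBudget:
    """Allow retries up to roughly `retry_percent` % of first attempts."""

    COST = 100  # credits spent per retry

    def __init__(self, retry_percent: int = 20,
                 max_credits: int = 10_000) -> None:
        self.retry_percent = retry_percent
        self.max_credits = max_credits   # bounds how much budget can be banked
        self.credits = 0

    def record_request(self) -> None:
        """Call once per first-attempt request; earns partial retry credit."""
        self.credits = min(self.max_credits,
                           self.credits + self.retry_percent)

    def try_retry(self) -> bool:
        """Return True and spend credit if a retry is allowed right now."""
        if self.credits >= self.COST:
            self.credits -= self.COST
            return True
        return False             # budget exhausted: shed the retry
```

With the default 20%, ten first attempts bank enough credit for two retries; a third retry is refused until more first-attempt traffic arrives.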
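Hedged requests: if the first copy has not answered within the p95 latency, send a second copy and cancel whichever is slower. Risk: hedging amplifies load, and it is only safe for idempotent operations.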
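A hedged request can be sketched with `asyncio` as below. The helper name `hedged` and its parameters are hypothetical; `hedge_after` is assumed to be the observed p95 latency in seconds, and `call` is assumed to be an idempotent operation that is safe to issue twice.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def hedged(call: Callable[[], Awaitable[T]], hedge_after: float) -> T:
    """Run `call`; if it is still pending after `hedge_after` seconds,
    start a second copy and return whichever finishes first."""
    first = asyncio.ensure_future(call())
    done, _ = await asyncio.wait({first}, timeout=hedge_after)
    if done:
        return first.result()            # fast path: no hedge was needed

    second = asyncio.ensure_future(call())
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()                    # cancel the slower copy
    return next(iter(done)).result()
```

In this sketch an exception from the first copy to finish propagates to the caller; a fuller version might wait for the other copy instead, and would gate hedges behind the same retry budget so they cannot amplify load during an incident.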
Where this matters in this repo
Payment gateway, digital wallet, Splitwise, e-commerce checkout, message queue,
webhooks system, distributed unique ID (request dedupe), notification system.
Glossary & fundamentals
Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.
| Tag | Concept | What it is | Page |
| --- | --- | --- | --- |
| HLD | CAP / PACELC | C vs A under partition; L vs C otherwise | cap-pacelc |
| HLD | Idempotency & retries | safe re-execution, backoff + jitter | idempotency-retries |
| HLD | Resilience patterns | timeout, retry, breaker, bulkhead, backpressure | resilience-patterns |
| LLD | REST API design | verbs, statuses, pagination, errors | rest-api-design |