
Idempotency, Retries, Backoff: Detailed

flowchart TB
  subgraph Idem[Idempotency]
    KEY([Client supplies key<br/>e.g. UUID per intent])
    STORE[(Idempotency table<br/>key, request_hash, response, status, expires)]
    LOCK[Acquire row lock by key]
    INPROG[State: in_progress]
    DONE[State: completed]
    FAIL[State: failed]
    DUP[Dup detection on retry]
    TTL[24-48h TTL default]
    MISMATCH[Hash mismatch -> 422]
  end

  subgraph Retry[Retry policy]
    POL[Retryable error classification<br/>5xx, 429, network, idempotent 4xx]
    NO[Non-retryable: validation, auth]
    BACK[Exponential backoff base * 2^n]
    JIT[Full jitter / decorrelated jitter]
    MAXR[Max attempts cap]
    BUDGET[Retry budget - server-side<br/>cap retry % of traffic]
  end

  subgraph Patterns
    PUT[Use PUT not POST when possible]
    PG[Idempotent operations natively<br/>SET, DELETE]
    DEDUP[Dedup at consumer via msg id]
    CO[Conditional update<br/>If-Match etag]
  end

  subgraph Pitfalls
    AMP[Retry amplification - thundering herd]
    DUPC[Double charge<br/>missing key]
    PART[Partial side-effect before crash]
    STAL[Stale response on timeout]
  end

  KEY --> STORE
  STORE --> LOCK --> INPROG
  INPROG --> DONE
  INPROG --> FAIL
  DUP --> STORE
  POL --> BACK --> JIT
  BACK --> MAXR
  BUDGET -.shed.-> MAXR
  AMP -. mitigate .-> JIT
  AMP -. mitigate .-> BUDGET
  DUPC -. mitigate .-> KEY
  PART -. mitigate .-> Outbox

    classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
    classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
    classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
    classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
    classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
    classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
    classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
    classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
    classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
    classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
    class KEY client;
    class LOCK,INPROG,DONE,FAIL,DUP,TTL,MISMATCH,POL,NO,BACK,JIT,MAXR,BUDGET,PUT,PG,DEDUP,CO,AMP,DUPC,PART,STAL service;
    class STORE datastore;

Idempotency table flow

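BEGIN
  row = SELECT * FROM idempotency WHERE key = :key FOR UPDATE
  if row is missing:
    INSERT (key, request_hash, status='in_progress', expires)   # unique key guards racing inserts
  else if row.request_hash != :request_hash: return 422         # same key, different payload
  else if row.status == 'completed': return row.response        # replay
  else if row.status == 'in_progress': return 409               # still running
  else: UPDATE status='in_progress'                             # earlier attempt failed; run again
  -- do business work in same tx if possible
  UPDATE status='completed', response=...
COMMIT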
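
The flow above can be sketched end to end with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions, not a production design: the table layout, the helper names (`open_store`, `request_hash`, `handle`) and the `do_work` callback are hypothetical, the `expires` column and TTL cleanup are left out, and SQLite's `BEGIN IMMEDIATE` stands in for a row lock because SQLite has no `SELECT ... FOR UPDATE`. The hash mismatch is checked before the replay, so a reused key with a different payload never receives the stored response.

```python
import hashlib
import json
import sqlite3


def open_store() -> sqlite3.Connection:
    # isolation_level=None disables implicit transactions, so BEGIN/COMMIT are explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE idempotency ("
        " key TEXT PRIMARY KEY, request_hash TEXT NOT NULL,"
        " status TEXT NOT NULL, response TEXT)"
    )
    return conn


def request_hash(payload: dict) -> str:
    # Canonical JSON so that key order does not change the hash.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def _handle_locked(conn, key, digest, payload, do_work):
    row = conn.execute(
        "SELECT request_hash, status, response FROM idempotency WHERE key = ?",
        (key,),
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO idempotency (key, request_hash, status)"
            " VALUES (?, ?, 'in_progress')",
            (key, digest),
        )
    else:
        stored_hash, status, response = row
        if stored_hash != digest:
            return 422, "idempotency key reused with a different payload"
        if status == "completed":
            return 200, response  # replay the stored response
        if status == "in_progress":
            return 409, "original request still running"
        # Any other status (e.g. 'failed'): run the work again.
        conn.execute(
            "UPDATE idempotency SET status = 'in_progress' WHERE key = ?", (key,)
        )
    response = do_work(payload)  # business work inside the same transaction
    conn.execute(
        "UPDATE idempotency SET status = 'completed', response = ? WHERE key = ?",
        (response, key),
    )
    return 200, response


def handle(conn, key, payload, do_work):
    """Return (http_status, body) for one request carrying an idempotency key."""
    digest = request_hash(payload)
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        result = _handle_locked(conn, key, digest, payload, do_work)
        conn.execute("COMMIT")
        return result
    except Exception:
        conn.execute("ROLLBACK")  # the key row disappears with the failed work
        raise
```

Because the work runs inside the same transaction in this sketch, other connections never observe `in_progress`; that branch matters when the work runs outside the transaction and the row is committed first.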

Backoff math

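  • Exponential: delay = base · 2^n, where n is the zero-based attempt number (e.g. 100, 200, 400, 800 ms for base = 100 ms).
  • Add jitter to avoid lockstep retries:
      ◦ Full jitter: delay = rand(0, min(cap, base·2^n)).
      ◦ Decorrelated jitter (AWS): delay = min(cap, rand(base, prev·3)), where prev is the previous delay and starts at base.
  • Cap the delay at e.g. 30 s; cap attempts at 5–10.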
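
The formulas above translate almost directly into code. The following is a small sketch; the function names and the `rng` parameter are illustrative choices, and delays are expressed in milliseconds to match the example values.

```python
import random


def exponential(base: float, attempt: int, cap: float) -> float:
    """Plain exponential backoff: base * 2^attempt, limited to cap."""
    return min(cap, base * 2 ** attempt)


def full_jitter(base: float, attempt: int, cap: float, rng=random) -> float:
    """Uniform delay between 0 and the capped exponential value."""
    return rng.uniform(0.0, exponential(base, attempt, cap))


def decorrelated_jitter(base: float, prev: float, cap: float, rng=random) -> float:
    """Next delay drawn from [base, prev * 3], limited to cap."""
    return min(cap, rng.uniform(base, prev * 3))


if __name__ == "__main__":
    base_ms, cap_ms = 100, 30_000
    print([exponential(base_ms, n, cap_ms) for n in range(4)])  # 100, 200, 400, 800
    prev = base_ms
    for attempt in range(5):
        prev = decorrelated_jitter(base_ms, prev, cap_ms)
        print(attempt, round(full_jitter(base_ms, attempt, cap_ms)), round(prev))
```

Full jitter needs only the attempt number, whereas decorrelated jitter carries the previous delay forward as state, so the caller has to keep `prev` between attempts.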

What is retryable

| Status | Retry? |
|---|---|
| 5xx | yes (with backoff) |
| 429 | yes, respect Retry-After |
| 408 / timeout | yes, with idempotency key |
| 4xx (400, 401, 404) | no; fix the request |
| 409 | depends; refresh state first |
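
The table can be encoded as a small decision function. This is a sketch of one possible reading: the names `decide` and `retry_after_seconds` and the three outcomes (`"retry"`, `"refresh"`, `"fail"`) are assumptions made for illustration. `Retry-After` may carry either a number of seconds or an HTTP date, so both forms are handled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def decide(status: int, *, has_idempotency_key: bool) -> str:
    """Map an HTTP status to 'retry', 'refresh' or 'fail'."""
    if status >= 500 or status == 429:
        return "retry"  # back off; for 429 honour Retry-After
    if status == 408:
        # A timeout may have been processed already; only safe with a key.
        return "retry" if has_idempotency_key else "fail"
    if status == 409:
        return "refresh"  # re-read state, then decide whether to resend
    return "fail"  # other 4xx: the request itself must change


def retry_after_seconds(header: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Parse Retry-After as delta-seconds or an HTTP date; None if unusable."""
    if header is None:
        return None
    header = header.strip()
    if header.isdigit():
        return float(header)
    try:
        when = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A caller would typically wait for the larger of the parsed `Retry-After` value and its own backoff delay before the next attempt.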

Server-side retry budget

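  • Limit retries to a fixed fraction of base traffic (e.g. 20% of base RPS) so that retries cannot multiply load during an incident.
  • Hedged requests: if the first copy has not responded by the p95 latency, send a second copy and cancel whichever is slower. Only safe for idempotent operations. Risk: amplifies load.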
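
One way to enforce such a budget is to count requests and retries over a sliding window and refuse a retry once the ratio would be exceeded. The class below is a sketch under assumptions: the name `RetryBudget`, the 10-second window and the injectable `clock` are illustrative, and a real service would likely share this state across workers.

```python
import time
from collections import deque


class RetryBudget:
    """Allow retries only while they stay under ratio * recent requests."""

    def __init__(self, ratio: float = 0.2, window_s: float = 10.0,
                 clock=time.monotonic):
        self.ratio = ratio
        self.window_s = window_s
        self.clock = clock
        self._requests = deque()  # timestamps of first attempts
        self._retries = deque()   # timestamps of admitted retries

    def _expire(self, now: float) -> None:
        # Drop timestamps that have fallen out of the window.
        for q in (self._requests, self._retries):
            while q and now - q[0] > self.window_s:
                q.popleft()

    def record_request(self) -> None:
        now = self.clock()
        self._expire(now)
        self._requests.append(now)

    def allow_retry(self) -> bool:
        now = self.clock()
        self._expire(now)
        if len(self._retries) + 1 > self.ratio * len(self._requests):
            return False  # budget exhausted: shed this retry
        self._retries.append(now)
        return True
```

With `ratio=0.2`, ten first attempts in the window admit two retries and reject the third; when the window empties, no retries are admitted until new traffic arrives.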

Where this matters in this repo

  • Payment gateway, digital wallet, splitwise, e-commerce checkout, message queue, webhooks system, distributed unique id (request dedupe), notification system.

Glossary & fundamentals

Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.

| Tag | Concept | What it is | Page |
|---|---|---|---|
| HLD | CAP / PACELC | C vs A under partition; L vs C otherwise | cap-pacelc |
| HLD | Idempotency & retries | safe re-execution, backoff + jitter | idempotency-retries |
| HLD | Resilience patterns | timeout, retry, breaker, bulkhead, backpressure | resilience-patterns |
| LLD | REST API design | verbs, statuses, pagination, errors | rest-api-design |