Rate Limiter — Detailed#

flowchart TB
  subgraph Client[Clients]
    A([App / Browser])
    M([Mobile])
    P((Partner API))
  end

  subgraph Edge[Edge Tier]
    LB[L7 LB / Envoy]
    GW[API Gateway]
  end

  subgraph RL[Rate Limit Layer]
    direction TB
    EX([Identity extractor<br/>API key / user / IP / tenant])
    POL[Policy lookup<br/>per route + plan]
    ALG[Algorithm<br/>token bucket / sliding window]
    LOCAL[Local in-memory<br/>per-pod allowance]
    SYNC[Async sync to central]
    HDR[Set headers<br/>X-RateLimit-Remaining<br/>Retry-After]
  end

  subgraph Central[Central Counter Store]
    R1[(Redis cluster<br/>Lua atomic ops)]
    R2[(Redis replica)]
  end

  subgraph Plans
    PDB[(Plans / Quotas DB)]
    CFG[Config service /<br/>dynamic update]
  end

  subgraph Algos[Algorithm catalog]
    TB[Token Bucket<br/>refill r tokens/s, cap b]
    LB1[Leaky Bucket<br/>smooths bursts]
    FW[Fixed Window<br/>simple, boundary spikes]
    SW[Sliding Log<br/>exact, memory heavy]
    SWC[Sliding Window Counter<br/>weighted approx]
  end

  subgraph Resp
    OK[200 / 2xx]
    DENY[429 Too Many Requests]
    DEG[Degrade<br/>shed lower priority]
  end

  A --> LB --> GW --> EX
  M --> LB
  P --> LB
  EX --> POL --> ALG
  PDB -.policies.-> POL
  CFG -.dynamic.-> POL
  ALG --> LOCAL
  LOCAL -. periodic sync .-> SYNC --> R1
  R1 --- R2
  ALG -->|allow| OK
  ALG -->|over limit| DENY
  ALG -->|priority| DEG
  ALG --> HDR
  Algos --- ALG

    classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
    classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
    classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
    classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
    classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
    classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
    classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
    classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
    classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
    classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
    class A,M,EX client;
    class LB,GW edge;
    class POL,SYNC,HDR,CFG,FW,SW,SWC,OK,DENY,DEG service;
    class PDB datastore;
    class LOCAL,R1,R2 cache;
    class ALG,TB,LB1 storage;
    class P external;

Algorithm cheat sheet#

Algo	State per key	Allows burst	Notes
Token bucket	`(tokens, last_ts)`	yes (up to bucket size)	Most common; cheap
Leaky bucket	queue + drain rate	no, smooths	Like TB with bucket=1
Fixed window	count + window	spike at boundary	Easy
Sliding log	timestamps array	exact	Memory O(N)
Sliding counter	weighted avg of two fixed windows	approx, cheap	Cloudflare style

Distributed correctness#

Local-only counters drift; cross-pod inconsistencies allow short bursts.
Use Redis with Lua for atomic check-and-decrement.
Pre-fetch tokens in batches (e.g. 10 tokens/sync) to amortize Redis RTT.
Two-tier: local soft limit + central hard limit.

Identity / scope#

Per-IP (anti-abuse), per-API-key (paid plans), per-user, per-tenant, per-route, per-method.
Compound key: route:plan:user.

Failure mode#

Redis down → fail open (serve traffic) or fail closed (deny). Choose per route.
Stampede: many synced bursts — add jitter to Retry-After.

Glossary & fundamentals#

Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.

Tag	Concept	What it is	Page
`HLD`	Load balancer / GSLB	L4/L7 traffic distribution and failover	load-balancer
`HLD`	API gateway / BFF	single ingress, auth, rate limit, routing	api-gateway
`HLD`	CAP / PACELC	C vs A under partition; L vs C otherwise	cap-pacelc
`HLD`	Leader/follower replication	sync/semi-sync/async replication, failover	replication-leader-follower
`HLD`	Idempotency & retries	safe re-execution, backoff + jitter	idempotency-retries
`LLD`	Concurrency primitives	mutex, semaphore, RW lock, atomic, CAS	concurrency-primitives