Rate Limiter — Detailed
flowchart TB
subgraph Client[Clients]
A([App / Browser])
M([Mobile])
P((Partner API))
end
subgraph Edge[Edge Tier]
LB[L7 LB / Envoy]
GW[API Gateway]
end
subgraph RL[Rate Limit Layer]
direction TB
EX([Identity extractor<br/>API key / user / IP / tenant])
POL[Policy lookup<br/>per route + plan]
ALG[Algorithm<br/>token bucket / sliding window]
LOCAL[Local in-memory<br/>per-pod allowance]
SYNC[Async sync to central]
HDR[Set headers<br/>X-RateLimit-Remaining<br/>Retry-After]
end
subgraph Central[Central Counter Store]
R1[(Redis cluster<br/>Lua atomic ops)]
R2[(Redis replica)]
end
subgraph Plans
PDB[(Plans / Quotas DB)]
CFG[Config service /<br/>dynamic update]
end
subgraph Algos[Algorithm catalog]
TB[Token Bucket<br/>refill r tokens/s, cap b]
LB1[Leaky Bucket<br/>smooths bursts]
FW[Fixed Window<br/>simple, boundary spikes]
SW[Sliding Log<br/>exact, memory heavy]
SWC[Sliding Window Counter<br/>weighted approx]
end
subgraph Resp
OK[200 / 2xx]
DENY[429 Too Many Requests]
DEG[Degrade<br/>shed lower priority]
end
A --> LB --> GW --> EX
M --> LB
P --> LB
EX --> POL --> ALG
PDB -.policies.-> POL
CFG -.dynamic.-> POL
ALG --> LOCAL
LOCAL -. periodic sync .-> SYNC --> R1
R1 --- R2
ALG -->|allow| OK
ALG -->|over limit| DENY
ALG -->|priority| DEG
ALG --> HDR
Algos --- ALG
classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
class A,M,EX client;
class LB,GW edge;
class POL,SYNC,HDR,CFG,FW,SW,SWC,OK,DENY,DEG service;
class PDB datastore;
class LOCAL,R1,R2 cache;
class ALG,TB,LB1 storage;
class P external;
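The request path in the diagram (identity extractor, policy lookup, algorithm, headers, response) can be sketched roughly as follows. This is an illustration under assumptions, not the design's actual code: the `POLICIES` table, the `Decision` type, the `X-Api-Key` and `X-User-Id` header names, and the default limit are all hypothetical, and a plain counter with no window reset stands in for the algorithm node.

```python
from dataclasses import dataclass

# Hypothetical policy table: (route, plan) -> allowed requests per window.
POLICIES = {
    ("/search", "free"): 2,
    ("/search", "pro"): 100,
}
DEFAULT_LIMIT = 1  # assumed fallback when no policy matches


@dataclass
class Decision:
    allowed: bool
    remaining: int
    retry_after_s: int


def extract_identity(headers: dict, remote_ip: str) -> str:
    """Prefer the API key, then the user id, then fall back to the client IP."""
    if "X-Api-Key" in headers:
        return "key:" + headers["X-Api-Key"]
    if "X-User-Id" in headers:
        return "user:" + headers["X-User-Id"]
    return "ip:" + remote_ip


def lookup_policy(route: str, plan: str) -> int:
    """Policy lookup per route + plan."""
    return POLICIES.get((route, plan), DEFAULT_LIMIT)


def check(counters: dict, identity: str, limit: int, window_s: int = 60) -> Decision:
    """Stand-in for the algorithm node: a plain counter with no window reset."""
    used = counters.get(identity, 0)
    if used >= limit:
        return Decision(False, 0, window_s)
    counters[identity] = used + 1
    return Decision(True, limit - used - 1, 0)


def to_response(decision: Decision) -> tuple:
    """Map a decision to (status, headers), as in the Resp and HDR nodes."""
    headers = {"X-RateLimit-Remaining": str(decision.remaining)}
    if decision.allowed:
        return 200, headers
    headers["Retry-After"] = str(decision.retry_after_s)
    return 429, headers


def handle(counters: dict, headers: dict, remote_ip: str, route: str, plan: str) -> tuple:
    identity = extract_identity(headers, remote_ip)
    limit = lookup_policy(route, plan)
    return to_response(check(counters, identity, limit))
```

With the assumed free-plan limit of 2 on `/search`, two calls to `handle` for the same API key return 200 and the third returns 429 with a `Retry-After` header.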
Algorithm cheat sheet
| Algo | State per key | Allows burst | Notes |
|---|---|---|---|
| Token bucket | (tokens, last_ts) | yes (up to bucket size) | Most common; cheap |
| Leaky bucket | queue + drain rate | no; smooths to the drain rate | Like TB with bucket=1 |
| Fixed window | count + window | yes; up to 2x the limit across a window boundary | Easy |
| Sliding log | timestamps array | no; exact over any window | Memory O(N) |
| Sliding counter | weighted avg of two fixed windows | approximate; cheap | Cloudflare style |
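Two rows of the cheat sheet are easier to follow in code. The sketch below shows a token bucket holding exactly the (tokens, last_ts) state with lazy refill, and the weighted estimate behind the sliding counter. Names and signatures are illustrative assumptions; time is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    rate: float      # r: tokens added per second
    capacity: float  # b: maximum tokens, i.e. the largest burst
    tokens: float    # current token count
    last_ts: float   # time of the last refill, in seconds

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Lazy refill: credit tokens for the elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last_ts)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_ts = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


def sliding_counter_estimate(prev_count: int, curr_count: int,
                             elapsed_in_window: float, window: float) -> float:
    """Estimate requests in the trailing window from two fixed windows.

    The previous window is weighted by the fraction of it that still
    overlaps the trailing window; the current window counts in full.
    """
    overlap = 1.0 - elapsed_in_window / window
    return prev_count * overlap + curr_count
```

A bucket created as `TokenBucket(rate=1.0, capacity=3.0, tokens=3.0, last_ts=0.0)` admits three requests at t=0, rejects the fourth, and admits one more at t=1. For the sliding counter, 10 requests in the previous 60 s window and 4 in the current one, 15 s in, give an estimate of 10 * 0.75 + 4 = 11.5.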
Distributed correctness
- Local-only counters drift: each pod sees only its own traffic, so a key spread across N pods can briefly exceed the global limit.
- Use Redis with Lua for atomic check-and-decrement.
- Pre-fetch tokens in batches (e.g. 10 tokens/sync) to amortize Redis RTT.
- Two-tier: local soft limit + central hard limit.
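The batching and two-tier points above can be sketched as follows. This is a simplified model, not a Redis client: `CentralStore` is an in-memory stand-in whose lock plays the role of the Lua script (one atomic check-and-decrement), `LocalAllowance` and `round_trips` are hypothetical names, and the quota is fixed with no refill over time.

```python
import threading


class CentralStore:
    """In-memory stand-in for the central counter store (hard limit)."""

    def __init__(self, limits: dict):
        self._remaining = dict(limits)
        self._lock = threading.Lock()

    def take(self, key: str, want: int) -> int:
        """Atomically grant up to `want` tokens; return the number granted."""
        with self._lock:
            available = self._remaining.get(key, 0)
            granted = min(want, available)
            self._remaining[key] = available - granted
            return granted


class LocalAllowance:
    """Per-pod allowance that refills from the central store in batches.

    Not thread-safe; a real pod would guard `local` as well.
    """

    def __init__(self, central: CentralStore, batch: int = 10):
        self.central = central
        self.batch = batch
        self.local = {}       # key -> tokens currently held by this pod
        self.round_trips = 0  # central calls, to show the amortization

    def allow(self, key: str) -> bool:
        if self.local.get(key, 0) == 0:
            # Local allowance exhausted: one round trip fetches a whole batch.
            self.round_trips += 1
            self.local[key] = self.central.take(key, self.batch)
        if self.local[key] > 0:
            self.local[key] -= 1
            return True
        return False  # the central hard limit is exhausted
```

With a central quota of 25 and a batch of 10, a pod serves 25 requests with 3 central calls instead of 25. The trade-off is that tokens pre-fetched by a pod that then goes idle are unavailable to other pods until they are returned or expire, which this sketch does not model.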
Identity / scope
- Per-IP (anti-abuse), per-API-key (paid plans), per-user, per-tenant, per-route, per-method.
- Compound key: route:plan:user.
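A compound key of this shape might be built as below. The function name is hypothetical, and the percent-escaping of `:` is an added assumption (not part of the original design) so that a colon inside one part cannot shift the boundaries between parts.

```python
def rate_limit_key(route: str, plan: str, user: str) -> str:
    """Build a route:plan:user counter key.

    ':' and '%' inside a part are percent-escaped (an assumption of this
    sketch) so that two different triples never map to the same key.
    """
    parts = (route, plan, user)
    return ":".join(p.replace("%", "%25").replace(":", "%3A") for p in parts)
```

For example, `rate_limit_key("/v1/search", "pro", "u42")` yields `/v1/search:pro:u42`.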
Failure mode
- Redis down → fail open (serve traffic) or fail closed (deny). Choose per route.
- Stampede: clients throttled at the same moment retry at the same moment; add jitter to Retry-After.
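Both failure-mode points can be sketched together. Everything named here is an assumption for illustration: the `FAIL_MODE` table, the `StoreUnavailable` exception, the fail-closed default for unlisted routes, and the 20% jitter spread. Jitter is applied upward only, so a client never retries before the base delay has passed.

```python
import math
import random

# Hypothetical per-route choice for when the central store is unreachable.
FAIL_MODE = {
    "/search": "open",      # keep serving; limits are briefly unenforced
    "/payments": "closed",  # deny rather than risk overloading the backend
}


class StoreUnavailable(Exception):
    """Raised by the (hypothetical) central-store client on timeout or error."""


def decide(route: str, check_central) -> bool:
    """Return True to serve the request, False to reject it with 429."""
    try:
        return check_central()
    except StoreUnavailable:
        # Unlisted routes fail closed here; that default is an assumption.
        return FAIL_MODE.get(route, "closed") == "open"


def retry_after_with_jitter(base_s: float, spread: float = 0.2, rng=random) -> int:
    """Spread retries over [base, base * (1 + spread)], in whole seconds."""
    jittered = base_s * (1.0 + rng.uniform(0.0, spread))
    return max(1, math.ceil(jittered))
```

With a base delay of 10 s and the default spread, clients receive Retry-After values between 10 and 12 seconds instead of all retrying at the same instant.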
Glossary & fundamentals
Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.
| Tag | Concept | What it is | Page |
|---|---|---|---|
| HLD | Load balancer / GSLB | L4/L7 traffic distribution and failover | load-balancer |
| HLD | API gateway / BFF | single ingress, auth, rate limit, routing | api-gateway |
| HLD | CAP / PACELC | C vs A under partition; L vs C otherwise | cap-pacelc |
| HLD | Leader/follower replication | sync/semi-sync/async replication, failover | replication-leader-follower |
| HLD | Idempotency & retries | safe re-execution, backoff + jitter | idempotency-retries |
| LLD | Concurrency primitives | mutex, semaphore, RW lock, atomic, CAS | concurrency-primitives |