Skip to content

Rate Limiter — Detailed#

flowchart TB
  subgraph Client[Clients]
    A([App / Browser])
    M([Mobile])
    P((Partner API))
  end

  subgraph Edge[Edge Tier]
    LB[L7 LB / Envoy]
    GW[API Gateway]
  end

  subgraph RL[Rate Limit Layer]
    direction TB
    EX([Identity extractor<br/>API key / user / IP / tenant])
    POL[Policy lookup<br/>per route + plan]
    ALG[Algorithm<br/>token bucket / sliding window]
    LOCAL[Local in-memory<br/>per-pod allowance]
    SYNC[Async sync to central]
    HDR[Set headers<br/>X-RateLimit-Remaining<br/>Retry-After]
  end

  subgraph Central[Central Counter Store]
    R1[(Redis cluster<br/>Lua atomic ops)]
    R2[(Redis replica)]
  end

  subgraph Plans
    PDB[(Plans / Quotas DB)]
    CFG[Config service /<br/>dynamic update]
  end

  subgraph Algos[Algorithm catalog]
    TB[Token Bucket<br/>refill r tokens/s, cap b]
    LB1[Leaky Bucket<br/>smooths bursts]
    FW[Fixed Window<br/>simple, boundary spikes]
    SW[Sliding Log<br/>exact, memory heavy]
    SWC[Sliding Window Counter<br/>weighted approx]
  end

  subgraph Resp
    OK[200 / 2xx]
    DENY[429 Too Many Requests]
    DEG[Degrade<br/>shed lower priority]
  end

  A --> LB --> GW --> EX
  M --> LB
  P --> LB
  EX --> POL --> ALG
  PDB -.policies.-> POL
  CFG -.dynamic.-> POL
  ALG --> LOCAL
  LOCAL -. periodic sync .-> SYNC --> R1
  R1 --- R2
  ALG -->|allow| OK
  ALG -->|over limit| DENY
  ALG -->|priority| DEG
  ALG --> HDR
  Algos --- ALG

    classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
    classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
    classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
    classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
    classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
    classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
    classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
    classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
    classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
    classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
    class A,M,EX client;
    class LB,GW edge;
    class POL,SYNC,HDR,CFG,FW,SW,SWC,OK,DENY,DEG service;
    class PDB datastore;
    class LOCAL,R1,R2 cache;
    class ALG,TB,LB1 storage;
    class P external;

Algorithm cheat sheet#

Algo State per key Allows burst Notes
Token bucket (tokens, last_ts) yes (up to bucket size) Most common; cheap
Leaky bucket queue + drain rate no, smooths Like TB with bucket=1
Fixed window count + window spike at boundary Easy
Sliding log timestamps array exact Memory O(N)
Sliding counter weighted avg of two fixed windows approx, cheap Cloudflare style

Distributed correctness#

  • Local-only counters drift; cross-pod inconsistencies allow short bursts.
  • Use Redis with Lua for atomic check-and-decrement.
  • Pre-fetch tokens in batches (e.g. 10 tokens/sync) to amortize Redis RTT.
  • Two-tier: local soft limit + central hard limit.

Identity / scope#

  • Per-IP (anti-abuse), per-API-key (paid plans), per-user, per-tenant, per-route, per-method.
  • Compound key: route:plan:user.

Failure mode#

  • Redis down → fail open (serve traffic) or fail closed (deny). Choose per route.
  • Stampede: many synced bursts — add jitter to Retry-After.

Glossary & fundamentals#

Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.

Tag Concept What it is Page
HLD Load balancer / GSLB L4/L7 traffic distribution and failover load-balancer
HLD API gateway / BFF single ingress, auth, rate limit, routing api-gateway
HLD CAP / PACELC C vs A under partition; L vs C otherwise cap-pacelc
HLD Leader/follower replication sync/semi-sync/async replication, failover replication-leader-follower
HLD Idempotency & retries safe re-execution, backoff + jitter idempotency-retries
LLD Concurrency primitives mutex, semaphore, RW lock, atomic, CAS concurrency-primitives