Skip to content

Rate Limiter — Notes#

Functional#

  • Per-key allowance with refill rate r and burst capacity b.
  • Multiple scopes (IP, user, key, tenant, route).
  • Plan-aware quotas (free 100/min, pro 1000/min).
  • Standard headers: X-RateLimit-Limit/Remaining/Reset, Retry-After.

Non-functional#

  • Latency: < 1 ms added.
  • Throughput: 100k+ checks/s per node.
  • Centralized counters consistent within ~100 ms.

Token bucket pseudocode (atomic)#

allow(key, r, b, cost=1):
  now = T
  state = redis.HGETALL key  # tokens, last_ts
  delta = (now - last_ts) * r
  tokens = min(b, tokens + delta)
  if tokens >= cost:
     tokens -= cost
     redis.HSET key tokens=tokens last_ts=now
     return true
  return false

Run in a Lua script for atomicity.

Memory budget#

  • 50 bytes / key. 10M unique keys = 500 MB → fits in one Redis shard.

Algorithms — when to pick#

  • Token bucket: burst-friendly APIs (Twitter, GitHub).
  • Leaky bucket: queueing systems (postal, network shaping).
  • Sliding window counter: large keyspace, cheap, "good enough" (Cloudflare, Stripe).
  • Sliding log: low-volume, exact requirements (audit).

Pitfalls#

  • Clock skew → token-bucket time math drift. Use single source (Redis TIME).
  • "Thundering retries" at Retry-After — jitter is mandatory.
  • IP-based limits killed by NAT / CGNAT.
  • Don't rate-limit health checks or critical control plane.

Refs#

  • Stripe rate-limiter blog post, Cloudflare sliding window blog, GCMD Token Bucket RFC2697, System Design Interview Vol 1 ch.4 (Alex Xu).