Rate Limiter — Notes#
Functional#
- Per-key allowance with refill rate r and burst capacity b.
- Multiple scopes (IP, user, key, tenant, route).
- Plan-aware quotas (free 100/min, pro 1000/min).
- Standard headers:
X-RateLimit-Limit/Remaining/Reset,Retry-After.
Non-functional#
- Latency: < 1 ms added.
- Throughput: 100k+ checks/s per node.
- Centralized counters consistent within ~100 ms.
Token bucket pseudocode (atomic)#
allow(key, r, b, cost=1):
now = T
state = redis.HGETALL key # tokens, last_ts
delta = (now - last_ts) * r
tokens = min(b, tokens + delta)
if tokens >= cost:
tokens -= cost
redis.HSET key tokens=tokens last_ts=now
return true
return false
Run in a Lua script for atomicity.
Memory budget#
- 50 bytes / key. 10M unique keys = 500 MB → fits in one Redis shard.
Algorithms — when to pick#
- Token bucket: burst-friendly APIs (Twitter, GitHub).
- Leaky bucket: queueing systems (postal, network shaping).
- Sliding window counter: large keyspace, cheap, "good enough" (Cloudflare, Stripe).
- Sliding log: low-volume, exact requirements (audit).
Pitfalls#
- Clock skew → token-bucket time math drift. Use single source (Redis TIME).
- "Thundering retries" at
Retry-After— jitter is mandatory. - IP-based limits killed by NAT / CGNAT.
- Don't rate-limit health checks or critical control plane.
Refs#
- Stripe rate-limiter blog post, Cloudflare sliding window blog, GCMD Token Bucket RFC2697, System Design Interview Vol 1 ch.4 (Alex Xu).