Distributed Cache — Notes
Functional
- O(1) GET/SET; rich data structures.
- TTL, LRU/LFU eviction.
- Horizontal scale via sharding.
- Optional replication + persistence.
Non-functional
- p99 GET < 1 ms LAN; throughput 50–200k ops/s/node.
- 99.99% via cluster + replicas.
Capacity
- 1 TB hot data → 30 shards × 32 GB (3 replicas each) = 90 instances.
- Throughput 1M ops/s → 10 shards if 100k ops each.
Patterns
- Cache-aside (most common).
- Read-through, write-through, write-back (rare in practice for KV cache).
- Locks, counters, leaderboards, queues, pub/sub bus.
Operational concerns
- BGSAVE forks: COW memory doubling risk.
- AOF rewrite IO spikes.
- Replication lag during big-load on primary.
- Cluster split-brain — Sentinel quorum size critical (3+).
Trade-offs
- Persistence on = recovery without re-warm but write amplification.
- Multiple data structures vs Memcached's pure KV simplicity.
- Client-side hashing (Memcached) vs server-side cluster (Redis Cluster) — Redis handles migrations better.
- TLS / authn in latency-sensitive cache = significant overhead.
Refs
- Facebook memcached at scale (NSDI '13), Redis docs (replication, cluster),
Netflix EVCache, DynamoDB DAX, Twitter Twemcache.