Distributed Cache — Notes

Functional

  • O(1) GET/SET; rich data structures.
  • TTL, LRU/LFU eviction.
  • Horizontal scale via sharding.
  • Optional replication + persistence.
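
To make the TTL and LRU bullets concrete, here is a minimal single-process sketch. It is not how any particular cache server implements eviction; the class name `LRUCacheWithTTL` and its `clock` parameter are made up for illustration. An `OrderedDict` keeps GET and SET at amortized O(1), and expiry is checked lazily on access.

```python
import time
from collections import OrderedDict


class LRUCacheWithTTL:
    """Illustrative in-process cache: LRU eviction plus optional per-key TTL."""

    def __init__(self, capacity, clock=time.monotonic):
        self.capacity = capacity
        self.clock = clock          # injectable so expiry can be exercised without sleeping
        self._data = OrderedDict()  # key -> (value, expires_at or None), oldest first

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self.clock() + ttl
        self._data[key] = (value, expires_at)
        self._data.move_to_end(key)          # a write makes the key most recently used
        while len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the least recently used entry

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        value, expires_at = item
        if expires_at is not None and self.clock() >= expires_at:
            del self._data[key]              # lazy expiry: drop the key when it is read
            return default
        self._data.move_to_end(key)          # a hit refreshes recency
        return value
```

A real server would also expire keys in the background and typically approximates LRU by sampling rather than keeping an exact ordering.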

Non-functional

  • p99 GET latency < 1 ms on a LAN; throughput 50k–200k ops/s per node.
  • 99.99% availability via clustering plus replicas.
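
As a quick sanity check on what the 99.99% target allows, the sketch below converts an availability figure into a downtime budget. The function name is illustrative; the arithmetic is simply (1 - availability) times the length of the period.

```python
def downtime_budget_minutes(availability, period_days=365):
    """Minutes of allowed unavailability over the period for a given availability target."""
    return (1.0 - availability) * period_days * 24 * 60


print(round(downtime_budget_minutes(0.9999), 1))      # budget per year, in minutes
print(round(downtime_budget_minutes(0.9999, 30), 1))  # budget per 30 days, in minutes
```

At 99.99% the budget works out to roughly 52.6 minutes per year, so failover has to be automatic; a manual recovery would consume most of the yearly allowance in a single incident.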

Capacity

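  • Memory: 1 TB hot data → ~30 shards × 32 GB (≈ 960 GB; a full 1 TB needs 32 shards, before per-key overhead). With 3 copies per shard (1 primary + 2 replicas) that is 90 instances.
  • Throughput: 1M ops/s → 10 shards at 100k ops/s each, so memory, not throughput, sets the shard count.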
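
The sizing above can be reproduced with a few lines of arithmetic. This is a sketch under simplifying assumptions: every byte of a 32 GB shard is usable, 1 TB is taken as 1000 GB, and "3 copies" means one primary plus two replicas. The function and parameter names are invented for the example. With exact ceilings the note's round figure of 30 shards becomes 32.

```python
import math


def plan_shards(hot_data_gb, shard_gb, total_ops_per_s, ops_per_shard, copies_per_shard):
    """Return shard and instance counts; the larger of the two constraints wins."""
    by_memory = math.ceil(hot_data_gb / shard_gb)
    by_throughput = math.ceil(total_ops_per_s / ops_per_shard)
    shards = max(by_memory, by_throughput)
    return {
        "by_memory": by_memory,
        "by_throughput": by_throughput,
        "shards": shards,
        "instances": shards * copies_per_shard,
    }


plan = plan_shards(
    hot_data_gb=1000,
    shard_gb=32,
    total_ops_per_s=1_000_000,
    ops_per_shard=100_000,
    copies_per_shard=3,  # assumption: 1 primary + 2 replicas
)
print(plan)
```

In practice a shard cannot be filled to 100%, since headroom is needed for per-key overhead, fragmentation, and fork-time copy-on-write, so the real count would be higher still.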

Patterns

  • Cache-aside (most common).
  • Read-through, write-through, write-back (rare in practice for KV cache).
  • Non-cache uses of the same store: locks, counters, leaderboards, queues, pub/sub bus.
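
Cache-aside, the most common pattern listed above, can be sketched as follows. Plain dicts stand in for both the cache client and the database, and the class name `CacheAsideRepository` is hypothetical. The point is the control flow: the application checks the cache, falls back to the store on a miss, and invalidates on write.

```python
class CacheAsideRepository:
    """Cache-aside: the application, not the cache, talks to the backing store."""

    def __init__(self, cache, store):
        self.cache = cache      # dict standing in for a cache client
        self.store = store      # dict standing in for the database
        self.store_reads = 0    # counts misses, for illustration

    def get(self, key):
        if key in self.cache:
            return self.cache[key]        # hit: the store is not touched
        self.store_reads += 1
        value = self.store.get(key)       # miss: read the source of truth
        if value is not None:
            self.cache[key] = value       # populate for later readers
        return value

    def update(self, key, value):
        self.store[key] = value           # write the store first
        self.cache.pop(key, None)         # then invalidate rather than update the cache


repo = CacheAsideRepository(cache={}, store={"user:1": "Ada"})
repo.get("user:1")             # miss, loads from the store
repo.get("user:1")             # hit
repo.update("user:1", "Grace")
print(repo.get("user:1"), repo.store_reads)
```

Invalidating on write keeps the sketch simple. A production version would normally also set a TTL on cached entries to bound staleness when an invalidation is lost.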

Operational concerns

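  • BGSAVE forks the process; with copy-on-write, a write-heavy workload can push memory use toward 2× during the snapshot, so leave headroom.
  • AOF rewrite causes disk I/O spikes.
  • Replication lag grows while the primary is under heavy or bulk write load.
  • Split-brain: with Sentinel, run at least 3 Sentinels so that a majority can authorize failover; Redis Cluster likewise needs a majority of primaries to promote a replica.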
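
The "3+" guidance follows from majority arithmetic, which the sketch below spells out. The helper names are made up for illustration and the model is deliberately simplified: it only asks which side of a network partition still holds a strict majority of voters and could therefore authorize a failover.

```python
def majority(total):
    """Smallest number of voters that is more than half of the total."""
    return total // 2 + 1


def tolerated_failures(total):
    """How many voters can be lost while a majority is still reachable."""
    return total - majority(total)


def sides_able_to_fail_over(partition_sizes):
    """Given voter counts on each side of a partition, return the sides holding a majority."""
    total = sum(partition_sizes)
    return [size for size in partition_sizes if size >= majority(total)]


for n in (2, 3, 5):
    print(n, majority(n), tolerated_failures(n))

print(sides_able_to_fail_over([2, 1]))  # a 3-voter deployment split 2 | 1
print(sides_able_to_fail_over([1, 1]))  # a 2-voter deployment split 1 | 1
```

With two voters, losing either one blocks failover entirely; with three, one can be lost, and at most one side of any partition can hold a majority, which is what prevents two primaries from being promoted at once.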

Trade-offs

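  • Persistence on: recovery without re-warming the cache, at the cost of extra disk writes (write amplification).
  • Multiple data structures (Redis) vs Memcached's pure KV simplicity.
  • Client-side hashing (Memcached) vs cluster-managed hash slots (Redis Cluster): Redis Cluster can migrate slots between nodes online, so resharding is less disruptive.
  • TLS and authentication add overhead (handshakes, per-byte encryption) that is noticeable in a latency-sensitive cache.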
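
For the client-side hashing option, a common technique is a consistent hash ring with virtual nodes. The sketch below is a generic illustration, not the algorithm of any specific Memcached client; `HashRing`, `vnodes`, and the node names are assumptions made for the example. It shows the key property: adding a node moves only the keys that land on the new node.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a 64-bit point on the ring."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes (client-side sharding sketch)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # First ring point at or after the key's hash, wrapping around at the end.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


keys = [f"key{i}" for i in range(2000)]
ring = HashRing(["cache-a", "cache-b", "cache-c"])
before = {k: ring.node_for(k) for k in keys}
ring.add("cache-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to the new node")
```

Keys that move still start cold on the new node. That is the gap that server-coordinated slot migration closes, because the data is transferred along with the ownership change.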

Refs

  • "Scaling Memcache at Facebook" (NSDI '13); Redis docs (replication, cluster); Netflix EVCache; Amazon DynamoDB Accelerator (DAX); Twitter Twemcache.