Skip to content

Database Sharding — Notes#

When to shard#

  • Single primary saturated on writes (CPU, IOPS, replication lag).
  • Dataset > working set fits in RAM, page-cache misses dominate.
  • 1–5 TB / 10–100k QPS write threshold (rule of thumb; varies).

Choosing the shard key#

  • High cardinality, even distribution.
  • Co-locates rows accessed together (queries hit 1 shard).
  • Stable (rarely changes) to avoid row moves.
  • Examples: user_id, tenant_id, hash(order_id).

Anti-patterns#

  • Monotonic key (autoinc, timestamp) → write hotspot.
  • Cross-shard joins on the hot path.
  • 2PC at scale.

Capacity rule of thumb#

  • 1 shard = 1 primary box, target ~70% headroom.
  • Per shard: 1–4 TB data, 5–20k write QPS (SSD, b-tree).

Routing approaches#

  • Client-side (Cassandra driver): driver hashes the key.
  • Proxy-side (Vitess, ProxySQL): SQL router parses statement.
  • Coordinator (MongoDB mongos): central router.

Trade-offs#

  • More shards = more parallelism, more failure points, more ops.
  • Replicas = read scale + HA; need replica lag handling.
  • Re-sharding online is the hardest ongoing problem.

Refs#

  • Vitess (YouTube), Citus (Postgres), MongoDB sharding, Cassandra (token rings), CockroachDB, Spanner (range-sharded, Paxos-replicated).