Database Sharding — Notes#
When to shard#
- Single primary saturated on writes (CPU, IOPS, replication lag).
- Dataset > working set fits in RAM, page-cache misses dominate.
-
1–5 TB / 10–100k QPS write threshold (rule of thumb; varies).
Choosing the shard key#
- High cardinality, even distribution.
- Co-locates rows accessed together (queries hit 1 shard).
- Stable (rarely changes) to avoid row moves.
- Examples:
user_id,tenant_id,hash(order_id).
Anti-patterns#
- Monotonic key (autoinc, timestamp) → write hotspot.
- Cross-shard joins on the hot path.
- 2PC at scale.
Capacity rule of thumb#
- 1 shard = 1 primary box, target ~70% headroom.
- Per shard: 1–4 TB data, 5–20k write QPS (SSD, b-tree).
Routing approaches#
- Client-side (Cassandra driver): driver hashes the key.
- Proxy-side (Vitess, ProxySQL): SQL router parses statement.
- Coordinator (MongoDB
mongos): central router.
Trade-offs#
- More shards = more parallelism, more failure points, more ops.
- Replicas = read scale + HA; need replica lag handling.
- Re-sharding online is the hardest ongoing problem.
Refs#
- Vitess (YouTube), Citus (Postgres), MongoDB sharding, Cassandra (token rings), CockroachDB, Spanner (range-sharded, Paxos-replicated).