Database Sharding — Detailed#
flowchart TB
subgraph App[Application Tier]
A1[App Pod 1]
A2[App Pod 2]
A3[App Pod N]
end
subgraph Routing[Routing Layer]
SR[Shard Router /<br/>Proxy: Vitess vtgate,<br/>ProxySQL, Mongos]
SM[(Shard Map<br/>etcd / ZK)]
end
subgraph Strategy[Sharding Strategies]
H1[Range<br/>userId 0-1M]
H2[Hash<br/>hash key mod N]
H3[Consistent Hash<br/>tokens / rings]
H4[Directory / Lookup<br/>tenant -> shard]
H5[Geo / Tenant]
end
subgraph Shards[Physical Shards]
direction LR
subgraph SH1[Shard 1]
P1[(Primary)]
R11[(Replica)]
R12[(Replica)]
end
subgraph SH2[Shard 2]
P2[(Primary)]
R21[(Replica)]
R22[(Replica)]
end
subgraph SHN[Shard N]
PN[(Primary)]
RN1[(Replica)]
RN2[(Replica)]
end
end
subgraph Ops[Operational Tooling]
RS[Resharder /<br/>Range splitter]
BACK([Online Backfill])
DBL[Dual-write / Verify]
BAL[Auto-Balancer<br/>move hot ranges]
BR[Backup / PITR]
MON[Metrics: QPS,<br/>per-shard load]
end
A1 --> SR
A2 --> SR
A3 --> SR
SM -.shard map.-> SR
SR --> SH1
SR --> SH2
SR --> SHN
SR -.uses.-> Strategy
P1 -. async repl .-> R11
P1 -. async repl .-> R12
P2 -. async repl .-> R21
P2 -. async repl .-> R22
PN -. async repl .-> RN1
PN -. async repl .-> RN2
RS -. split/merge .-> Shards
BACK -. copy chunks .-> Shards
DBL -.verify.-> Shards
BAL -.rebalance.-> Shards
Shards -.WAL.-> BR
Shards -.stats.-> MON
classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
class A1,A2,A3,H1,H2,H3,H4,H5,RS,DBL,BAL,BR service;
class SR,SM,P1,R11,R12,P2,R21,R22,PN,RN1,RN2 datastore;
class BACK compute;
class MON obs;
Key choice#
- Range: ordered scans easy, prone to hotspots on monotonic keys.
- Hash: even distribution, but range queries scatter.
- Consistent hash: minimal data movement when adding/removing shards.
- Directory: flexible (per-tenant placement), extra hop.
Cross-shard operations#
- Joins → denormalize, or use scatter-gather (expensive).
- Transactions → 2PC, Sagas, or co-locate by shard key.
- Aggregations → pre-compute in OLAP store.
Resharding#
- Add shards: split ranges, dual-write old+new, backfill, cutover.
- Vitess vreplication, MongoDB chunk migration, Citus shard rebalancer.
Glossary & fundamentals#
Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.
| Tag | Concept | What it is | Page |
|---|---|---|---|
HLD |
Sharding | horizontal partitioning across nodes | database-sharding |
HLD |
Consistent hashing | key placement with minimal remap | consistent-hashing |
HLD |
Leader/follower replication | sync/semi-sync/async replication, failover | replication-leader-follower |
HLD |
LSM vs B-Tree engines | WAL, memtable, SSTables, compaction | storage-engines-lsm-btree |
HLD |
Distributed transactions | 2PC, TCC, sagas, outbox/inbox | distributed-transactions |
HLD |
Observability | metrics, logs, traces, SLOs | observability |
LLD |
Structural patterns | Adapter, Decorator, Facade, Proxy, Composite | structural-patterns |
LLD |
Behavioural patterns | Strategy, Observer, State, Command, Chain | behavioral-patterns |