Skip to content

Database Sharding — Detailed#

flowchart TB
  subgraph App[Application Tier]
    A1[App Pod 1]
    A2[App Pod 2]
    A3[App Pod N]
  end

  subgraph Routing[Routing Layer]
    SR[Shard Router /<br/>Proxy: Vitess vtgate,<br/>ProxySQL, Mongos]
    SM[(Shard Map<br/>etcd / ZK)]
  end

  subgraph Strategy[Sharding Strategies]
    H1[Range<br/>userId 0-1M]
    H2[Hash<br/>hash key mod N]
    H3[Consistent Hash<br/>tokens / rings]
    H4[Directory / Lookup<br/>tenant -> shard]
    H5[Geo / Tenant]
  end

  subgraph Shards[Physical Shards]
    direction LR
    subgraph SH1[Shard 1]
      P1[(Primary)]
      R11[(Replica)]
      R12[(Replica)]
    end
    subgraph SH2[Shard 2]
      P2[(Primary)]
      R21[(Replica)]
      R22[(Replica)]
    end
    subgraph SHN[Shard N]
      PN[(Primary)]
      RN1[(Replica)]
      RN2[(Replica)]
    end
  end

  subgraph Ops[Operational Tooling]
    RS[Resharder /<br/>Range splitter]
    BACK([Online Backfill])
    DBL[Dual-write / Verify]
    BAL[Auto-Balancer<br/>move hot ranges]
    BR[Backup / PITR]
    MON[Metrics: QPS,<br/>per-shard load]
  end

  A1 --> SR
  A2 --> SR
  A3 --> SR
  SM -.shard map.-> SR
  SR --> SH1
  SR --> SH2
  SR --> SHN
  SR -.uses.-> Strategy
  P1 -. async repl .-> R11
  P1 -. async repl .-> R12
  P2 -. async repl .-> R21
  P2 -. async repl .-> R22
  PN -. async repl .-> RN1
  PN -. async repl .-> RN2
  RS -. split/merge .-> Shards
  BACK -. copy chunks .-> Shards
  DBL -.verify.-> Shards
  BAL -.rebalance.-> Shards
  Shards -.WAL.-> BR
  Shards -.stats.-> MON

    classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
    classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
    classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
    classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
    classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
    classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
    classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
    classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
    classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
    classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
    class A1,A2,A3,H1,H2,H3,H4,H5,RS,DBL,BAL,BR service;
    class SR,SM,P1,R11,R12,P2,R21,R22,PN,RN1,RN2 datastore;
    class BACK compute;
    class MON obs;

Key choice#

  • Range: ordered scans easy, prone to hotspots on monotonic keys.
  • Hash: even distribution, but range queries scatter.
  • Consistent hash: minimal data movement when adding/removing shards.
  • Directory: flexible (per-tenant placement), extra hop.

Cross-shard operations#

  • Joins → denormalize, or use scatter-gather (expensive).
  • Transactions → 2PC, Sagas, or co-locate by shard key.
  • Aggregations → pre-compute in OLAP store.

Resharding#

  • Add shards: split ranges, dual-write old+new, backfill, cutover.
  • Vitess vreplication, MongoDB chunk migration, Citus shard rebalancer.

Glossary & fundamentals#

Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.

Tag Concept What it is Page
HLD Sharding horizontal partitioning across nodes database-sharding
HLD Consistent hashing key placement with minimal remap consistent-hashing
HLD Leader/follower replication sync/semi-sync/async replication, failover replication-leader-follower
HLD LSM vs B-Tree engines WAL, memtable, SSTables, compaction storage-engines-lsm-btree
HLD Distributed transactions 2PC, TCC, sagas, outbox/inbox distributed-transactions
HLD Observability metrics, logs, traces, SLOs observability
LLD Structural patterns Adapter, Decorator, Facade, Proxy, Composite structural-patterns
LLD Behavioural patterns Strategy, Observer, State, Command, Chain behavioral-patterns