Skip to content

Document Database (MongoDB-style) — Detailed#

flowchart TB
  subgraph Client[App tier]
    APP[App]
    DRV([Smart driver<br/>topology aware])
  end

  subgraph Routing[Routing - mongos]
    MONGOS[Router]
    CFG[(Config servers<br/>3-5 Raft / RS)]
    BAL[Balancer<br/>chunk migration]
  end

  subgraph Shard1[Shard 1 - replica set]
    P1[Primary]
    S1A[Secondary]
    S1B[Secondary]
    ARB1[Arbiter]
  end

  subgraph Shard2[Shard 2 - replica set]
    P2[Primary]
    S2A[Secondary]
    S2B[Secondary]
  end

  subgraph Shard3[Shard 3 - replica set]
    P3[Primary]
    S3A[Secondary]
    S3B[Secondary]
  end

  subgraph Engine[Storage engine - WiredTiger]
    BPTREE[B-Tree default]
    LSMOPT[LSM optional]
    MMAP[mmap legacy]
    CACHE[Page cache]
    JOURNAL[WAL / Journal]
    SNAP[MVCC snapshots]
  end

  subgraph Repl[Replication]
    OPLOG[Oplog idempotent ops]
    HEARTB[Heartbeats every 2s]
    ELECT[Election via Raft]
    READPREF[Read preferences<br/>primary / nearest / secondary]
    WC[Write concern w-majority]
    RC[Read concern majority / linearizable]
  end

  subgraph Index[Indexing]
    BIDX[B-tree single field]
    COMP[Compound]
    TEXT[Text index]
    GEO[2dsphere geo]
    PART[Partial]
    HASHED[Hashed]
    TTL[TTL]
    VEC[Vector / Atlas Search]
  end

  subgraph CDC[Change Streams]
    OPL2[Oplog tail]
    CHG[Change Streams API]
    DOWN[Downstream: search, cache, analytics]
  end

  subgraph Features
    AGG[Aggregation framework]
    JOIN[$lookup join]
    TX[Multi-doc transactions<br/>snapshot isolation across shards]
    GRIDFS[GridFS - large files]
  end

  APP --> DRV --> MONGOS
  MONGOS --- CFG
  MONGOS --> Shard1
  MONGOS --> Shard2
  MONGOS --> Shard3
  BAL -.chunk move.-> Shard1
  P1 -. oplog .-> S1A
  P1 -. oplog .-> S1B
  P2 -. oplog .-> S2A
  P3 -. oplog .-> S3A
  Engine --- Shard1
  Repl --- Shard1
  Index --- Shard1
  OPL2 --- P1
  OPL2 --> CHG --> DOWN

    classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
    classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
    classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
    classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
    classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
    classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
    classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
    classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
    classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
    classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
    class DRV client;
    class APP,MONGOS,BAL,P1,S1A,S1B,ARB1,P2,S2A,S2B,P3,S3A,S3B,BPTREE,LSMOPT,MMAP,JOURNAL,SNAP,OPLOG,HEARTB,ELECT,READPREF,WC,RC,BIDX,COMP,TEXT,GEO,PART,HASHED,TTL,VEC,OPL2,CHG,DOWN,AGG,JOIN,TX,GRIDFS service;
    class CFG datastore;
    class CACHE cache;

Sharding#

  • Shard key chosen at collection level (hashed or ranged).
  • Data split into chunks (~64–128 MB); balancer migrates chunks for even load.
  • Cross-shard queries scatter-gather; targeted queries hit one shard if shard key supplied.

Replication & elections#

  • Replica set = 3 to 7 nodes; one primary, rest secondaries.
  • Oplog is a capped collection containing idempotent ops.
  • Elections via Raft-like protocol; replica with highest oplog wins.

Transactions#

  • Multi-document, multi-collection transactions across shards (since 4.2).
  • Snapshot isolation; cost is non-trivial compared to single-doc updates.

Indexes#

  • Lots of types; choice dominates query latency.
  • Indexes are per-shard; a global secondary index requires app-level work.

Change Streams (built-in CDC)#

  • Tails the oplog and exposes a typed cursor.
  • Use for cache invalidation, search sync, downstream services.

Glossary & fundamentals#

Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.

Tag Concept What it is Page
HLD Sharding horizontal partitioning across nodes database-sharding
HLD Cache strategies cache-aside, read/write-through, eviction caching-strategies
HLD CAP / PACELC C vs A under partition; L vs C otherwise cap-pacelc
HLD Raft / Paxos consensus replicated state machine via majority quorum consensus-raft-paxos
HLD Leader/follower replication sync/semi-sync/async replication, failover replication-leader-follower
HLD LSM vs B-Tree engines WAL, memtable, SSTables, compaction storage-engines-lsm-btree
HLD MVCC & isolation levels snapshot isolation, serializability, vacuum mvcc-isolation-levels
HLD Change Data Capture WAL/binlog tailing, outbox publishing change-data-capture
LLD REST API design verbs, statuses, pagination, errors rest-api-design