Document Database — Notes#

Functional#

  • Schema-flexible JSON / BSON documents.
  • Rich query language with secondary indexes.
  • Sharding + replication built-in.
  • Aggregation pipelines.
  • Multi-document transactions.
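
To make the aggregation pipeline item concrete, here is a minimal sketch. The pipeline is written as the list of stage documents a driver would send, and a toy evaluator applies the same logic to in-memory dicts so the example runs without a server. The `orders` data, field names, and `run_pipeline` helper are hypothetical. The evaluator handles only equality `$match` and `$group` with `$sum`, and is not how the server executes pipelines.

```python
# A pipeline is an ordered list of stage documents.
pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$customerId", "total": {"$sum": "$amount"}}},
]

# Hypothetical sample documents.
orders = [
    {"_id": 1, "customerId": "c1", "status": "paid", "amount": 30},
    {"_id": 2, "customerId": "c2", "status": "paid", "amount": 15},
    {"_id": 3, "customerId": "c1", "status": "open", "amount": 99},
    {"_id": 4, "customerId": "c1", "status": "paid", "amount": 12},
]


def run_pipeline(docs, stages):
    """Toy evaluator: equality $match and $group with $sum only."""
    for stage in stages:
        (op, spec), = stage.items()  # each stage has exactly one operator
        if op == "$match":
            docs = [d for d in docs
                    if all(d.get(k) == v for k, v in spec.items())]
        elif op == "$group":
            key_field = spec["_id"][1:]  # "$customerId" -> "customerId"
            groups = {}
            for d in docs:
                out = groups.setdefault(d[key_field], {"_id": d[key_field]})
                for name, acc in spec.items():
                    if name == "_id":
                        continue
                    source = acc["$sum"][1:]  # "$amount" -> "amount"
                    out[name] = out.get(name, 0) + d[source]
            docs = list(groups.values())
        else:
            raise ValueError(f"unsupported stage: {op}")
    return docs


result = run_pipeline(orders, pipeline)
```

Each stage consumes the output of the previous one, which is why putting `$match` first keeps the later stages cheap.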

Non-functional#

  • 10–50k ops/s/shard (varies with workload).
  • Replica set HA: automatic failover, typically within seconds.
  • Linearizable reads on request (read concern "linearizable", served by the primary; pair with write concern "majority").
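
A sketch of how the read-consistency choice appears at the command level, assuming MongoDB's `find` command shape. The `find_command` helper and the `accounts` collection are hypothetical. The read concern level names are MongoDB's. The `maxTimeMS` bound is included because a linearizable read can block while a majority of the replica set is unreachable.

```python
READ_CONCERN_LEVELS = {"local", "available", "majority", "linearizable", "snapshot"}


def find_command(collection, filter_doc, level="majority", max_time_ms=None):
    """Build a `find` command document with an explicit read concern."""
    if level not in READ_CONCERN_LEVELS:
        raise ValueError(f"unknown read concern level: {level}")
    cmd = {
        "find": collection,
        "filter": filter_doc,
        "readConcern": {"level": level},
    }
    if max_time_ms is not None:
        cmd["maxTimeMS"] = max_time_ms  # bound how long the read may wait
    return cmd


# Strongest option: linearizable read, with a timeout as a safety net.
strict_read = find_command("accounts", {"_id": 42},
                           level="linearizable", max_time_ms=5000)

# Cheaper option for queries that tolerate slightly stale data.
relaxed_read = find_command("accounts", {"_id": 42}, level="local")
```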

Capacity#

  • Per shard: 1–4 TB working set; up to 64 TB+ disk.
  • Cluster sizes routinely 10–100 shards.
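
A back-of-the-envelope sizing sketch using the per-shard figures above. The defaults (2 TB and 20k ops/s per shard) are picked from inside the quoted ranges, and the 70% headroom factor is an assumption rather than a recommendation from any vendor.

```python
import math


def shards_needed(total_data_tb, peak_ops_per_s,
                  data_per_shard_tb=2.0, ops_per_shard=20_000, headroom=0.7):
    """Estimate shard count as the larger of the data and throughput bounds.

    `headroom` is the fraction of a shard's nominal capacity we plan to use.
    """
    by_data = math.ceil(total_data_tb / (data_per_shard_tb * headroom))
    by_ops = math.ceil(peak_ops_per_s / (ops_per_shard * headroom))
    return max(by_data, by_ops, 1)


# Hypothetical workload: 60 TB of data, 400k ops/s at peak.
estimate = shards_needed(total_data_tb=60, peak_ops_per_s=400_000)
```

With these inputs the data bound (60 / 1.4, rounded up to 43) exceeds the throughput bound (400,000 / 14,000, rounded up to 29), so storage drives the shard count.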

API#

  • Drivers (MongoDB, Couchbase) speak a native binary protocol; in MongoDB's case, OP_MSG (legacy OP_QUERY is deprecated).
  • HTTP layer via Atlas Data API or REST gateways.
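
To show roughly what "native protocol" means for MongoDB, the sketch below frames a `ping` command as an OP_MSG message by hand: a 16-byte header (message length, request id, response-to, opcode 2013), a flags word, then one kind-0 section holding a BSON document. The helper names are hypothetical, and the BSON encoder covers only int32 and string fields. Real drivers do this internally, so this is for illustration only.

```python
import struct

OP_MSG = 2013  # opcode for OP_MSG in the MongoDB wire protocol


def bson_int32(name, value):
    # Element: type byte 0x10, cstring name, little-endian int32 value.
    return b"\x10" + name.encode() + b"\x00" + struct.pack("<i", value)


def bson_string(name, value):
    # Element: type byte 0x02, cstring name, int32 byte length
    # (including the trailing NUL), then the UTF-8 bytes and NUL.
    data = value.encode("utf-8") + b"\x00"
    return b"\x02" + name.encode() + b"\x00" + struct.pack("<i", len(data)) + data


def bson_document(elements):
    # Document: int32 total size, the elements, then a terminating NUL.
    body = b"".join(elements) + b"\x00"
    return struct.pack("<i", len(body) + 4) + body


def op_msg(request_id, command_doc):
    flag_bits = struct.pack("<I", 0)   # no flags set
    section = b"\x00" + command_doc    # kind 0: a single body document
    payload = flag_bits + section
    header = struct.pack("<iiii", 16 + len(payload), request_id, 0, OP_MSG)
    return header + payload


# {"ping": 1, "$db": "admin"} as BSON, wrapped in an OP_MSG frame.
ping = bson_document([bson_int32("ping", 1), bson_string("$db", "admin")])
message = op_msg(request_id=1, command_doc=ping)
```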

Schema (conceptual)#

  • db.collection holds documents {_id, ...}.
  • Shard key is chosen when the collection is sharded and is hard to change (MongoDB 5.0+ can reshard, but it rewrites the collection).
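
A small sketch of why the shard key choice matters, assuming MongoDB's `shardCollection` admin command shape. The `shop.orders` namespace, the `customerId` key, and the `is_targeted` helper are hypothetical. The helper is a deliberately conservative approximation of router behavior: it treats a query as single-shard only when every shard-key field is matched by equality.

```python
def shard_collection_command(namespace, key):
    """Build the admin command that shards `namespace` on `key`."""
    return {"shardCollection": namespace, "key": key}


def is_targeted(filter_doc, shard_key):
    """Rough check: equality on every shard-key field means one shard.

    Anything else is treated here as a scatter-gather query.
    """
    return all(
        field in filter_doc and not isinstance(filter_doc[field], dict)
        for field in shard_key
    )


shard_key = {"customerId": "hashed"}
command = shard_collection_command("shop.orders", shard_key)

by_customer = is_targeted({"customerId": "c1", "status": "paid"}, shard_key)
by_status = is_targeted({"status": "paid"}, shard_key)
by_range = is_targeted({"customerId": {"$gt": "c1"}}, shard_key)
```

Queries like `by_status` fan out to every shard, so the key should match the filters used on hot paths.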

Trade-offs#

  • Flexible schema = velocity at start, debt over time. Add JSON Schema validators ($jsonSchema) to contain it.
  • Embedded vs referenced docs: embed for 1-to-few; reference for many-to-many.
  • Sharded transactions are real but slow; design hot paths to stay single-shard.
  • Eventually consistent secondary reads vs majority reads: pick per query.
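
The validator and embed-versus-reference points can be sketched together, assuming MongoDB's `create` command with a `$jsonSchema` validator. The `students` collection is hypothetical: `contacts` is embedded and capped (1-to-few), while `courseIds` holds references into a separate `courses` collection (many-to-many). The `missing_required` helper is a local illustration and checks only top-level required fields.

```python
student_schema = {
    "bsonType": "object",
    "required": ["_id", "name", "contacts", "courseIds"],
    "properties": {
        "name": {"bsonType": "string"},
        # Embedded 1-to-few: small, bounded, always read with the parent.
        "contacts": {
            "bsonType": "array",
            "maxItems": 5,
            "items": {
                "bsonType": "object",
                "required": ["kind", "value"],
                "properties": {
                    "kind": {"bsonType": "string"},
                    "value": {"bsonType": "string"},
                },
            },
        },
        # Referenced many-to-many: ids pointing into `courses`.
        "courseIds": {
            "bsonType": "array",
            "items": {"bsonType": "objectId"},
        },
    },
}

create_command = {
    "create": "students",
    "validator": {"$jsonSchema": student_schema},
    "validationLevel": "strict",   # validate all inserts and updates
    "validationAction": "error",   # reject documents that fail
}


def missing_required(doc, schema):
    """Return top-level required fields absent from `doc`."""
    return [f for f in schema.get("required", []) if f not in doc]


draft = {"_id": 1, "name": "Ada", "contacts": []}
problems = missing_required(draft, student_schema)
```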

Refs#

  • MongoDB docs (sharding, replication, oplog, change streams).
  • "Designing Data-Intensive Applications" (Martin Kleppmann), Chapter 2: document data model.
  • Couchbase architecture papers; Azure Cosmos DB internals.