Document Database — Notes
Functional
- Schema-flexible JSON / BSON documents.
- Rich query language with secondary indexes.
- Sharding + replication built-in.
- Aggregation pipelines.
- Multi-document transactions.
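To show what a pipeline computes, here is a toy in-memory sketch of a two-stage pipeline, roughly the MongoDB pipeline [{"$match": {"status": "paid"}}, {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}}]. The function names and sample data are hypothetical. A real server streams documents through stages and can use indexes for a leading $match; it does not build Python lists.

```python
from collections import defaultdict

def match(docs, predicate):
    # $match-like stage (equality only): keep docs where every field equals the given value
    return [d for d in docs if all(d.get(k) == v for k, v in predicate.items())]

def group_sum(docs, key, field):
    # $group-like stage: {_id: "$<key>", total: {$sum: "$<field>"}}, results sorted by _id
    totals = defaultdict(int)
    for d in docs:
        totals[d[key]] += d.get(field, 0)
    return [{"_id": k, "total": v} for k, v in sorted(totals.items())]

def run_pipeline(docs, stages):
    # Each stage consumes the previous stage's output
    for stage in stages:
        docs = stage(docs)
    return docs

orders = [
    {"_id": 1, "status": "paid", "customer": "a", "amount": 30},
    {"_id": 2, "status": "paid", "customer": "b", "amount": 20},
    {"_id": 3, "status": "open", "customer": "a", "amount": 99},
    {"_id": 4, "status": "paid", "customer": "a", "amount": 5},
]

result = run_pipeline(orders, [
    lambda ds: match(ds, {"status": "paid"}),
    lambda ds: group_sum(ds, "customer", "amount"),
])
print(result)  # [{'_id': 'a', 'total': 35}, {'_id': 'b', 'total': 20}]
```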
Non-functional
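- Throughput: roughly 10–50k ops/s per shard (highly workload-dependent).
- Replica set HA: automatic failover within seconds (MongoDB's default electionTimeoutMillis is 10 s).
- Linearizable reads on request: readConcern "linearizable" (primary only, reads that target a single document). readConcern "majority" returns majority-committed data but is not linearizable on its own.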
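Majority reads only return data at or below the replica set's majority commit point: the newest operation applied on a majority of voting members. A minimal sketch of that computation, assuming each member's last-applied position is a plain integer (real MongoDB optimes are timestamp/term pairs); the function name is hypothetical.

```python
def majority_commit_point(applied_optimes):
    """Highest optime that at least a majority of voting members have applied."""
    majority = len(applied_optimes) // 2 + 1
    # Sorted newest-first, the majority-th entry is applied on >= `majority` members
    ranked = sorted(applied_optimes, reverse=True)
    return ranked[majority - 1]

# 5-member set: primary at 120, secondaries lagging to different degrees
print(majority_commit_point([120, 118, 117, 90, 40]))  # 117
```

A write at optime 119 is visible to readConcern "local" on the primary but not to "majority" reads until a third member applies it. Roughly speaking, "linearizable" goes further and waits on a majority round trip so that a deposed primary cannot serve the read.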
Capacity
- Per shard: 1–4 TB working set; up to 64 TB+ disk.
- Cluster sizes routinely 10–100 shards.
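A back-of-envelope shard count, taking the per-shard figures above as inputs (for example 10k ops/s and 2 TB of working set per shard, the conservative ends of the ranges). The cluster needs enough shards for whichever constraint binds first. The function names and the 70% utilization ceiling are hypothetical planning assumptions; integer arithmetic avoids float rounding at ceiling boundaries.

```python
def ceil_div(a, b):
    # Integer ceiling division for non-negative ints
    return -(-a // b)

def shards_needed(target_ops, ops_per_shard, working_set_gb, ws_per_shard_gb, max_util_pct=70):
    # Size each shard to run at no more than max_util_pct of its assumed capacity
    by_throughput = ceil_div(target_ops * 100, ops_per_shard * max_util_pct)
    by_working_set = ceil_div(working_set_gb * 100, ws_per_shard_gb * max_util_pct)
    return max(by_throughput, by_working_set)

print(shards_needed(300_000, 10_000, 40_000, 2_000))  # 43 (throughput-bound)
print(shards_needed(100_000, 10_000, 40_000, 2_000))  # 29 (working-set-bound)
```

Both results fall in the 10–100 shard range above; which term dominates indicates whether the bottleneck is per-node throughput or memory.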
API
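- Drivers speak each database's native binary protocol: MongoDB's wire protocol (OP_MSG since 3.6; legacy OP_QUERY is deprecated), Couchbase's memcached-derived binary protocol for KV operations.
- HTTP access via REST gateways or the Atlas Data API (which MongoDB has deprecated).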
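For a concrete view of the MongoDB wire protocol, an OP_MSG frame is a 16-byte header (messageLength, requestID, responseTo, opCode 2013), a 32-bit flagBits field, then sections; a kind-0 section carries one BSON command document. The sketch below hand-encodes {ping: 1, $db: "admin"} using only struct. The helper names are hypothetical, and the tiny BSON encoder handles only int32 and string fields; it is illustrative and no substitute for the bson package.

```python
import struct

OP_MSG = 2013

def bson_encode(doc):
    # Minimal BSON: int -> int32 (type 0x10), str -> UTF-8 string (type 0x02)
    body = b""
    for key, value in doc.items():
        name = key.encode() + b"\x00"
        if isinstance(value, bool) or not isinstance(value, (int, str)):
            raise TypeError(f"unsupported type for {key!r}")
        if isinstance(value, int):
            body += b"\x10" + name + struct.pack("<i", value)
        else:
            s = value.encode() + b"\x00"
            body += b"\x02" + name + struct.pack("<i", len(s)) + s
    # Document length counts its own 4-byte prefix and the trailing NUL
    return struct.pack("<i", len(body) + 5) + body + b"\x00"

def op_msg(request_id, command):
    # The command name must be the first key of `command`
    flag_bits = 0  # no checksum, no moreToCome
    payload = struct.pack("<I", flag_bits) + b"\x00" + bson_encode(command)  # kind-0 section
    header = struct.pack("<iiii", 16 + len(payload), request_id, 0, OP_MSG)
    return header + payload

msg = op_msg(1, {"ping": 1, "$db": "admin"})
print(len(msg))  # 51
```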
Schema (conceptual)
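- Each db.collection holds documents {_id, ...}; _id is unique within the collection and defaults to an ObjectId when omitted.
- Shard key chosen when the collection is sharded, and hard to change: refineCollectionShardKey (4.4+) only adds suffix fields, and a real change means resharding (reshardCollection, 5.0+), which rewrites the collection.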
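With range sharding, the router keeps a sorted table of chunk boundaries on the shard key (each chunk covers [min, max)) and sends a query to the shard owning the matching range. That table is why the key is costly to change: a new key means a new table and moving most documents. A minimal sketch using bisect; the class, boundaries, and shard names are hypothetical. Hashed sharding would hash the key first and then do the same lookup.

```python
import bisect

class RangeRouter:
    def __init__(self, split_points, shards):
        # split_points[i] is the inclusive lower bound of chunk i+1;
        # chunk 0 covers everything below split_points[0]
        assert len(shards) == len(split_points) + 1
        self.split_points = split_points
        self.shards = shards

    def shard_for(self, key):
        # bisect_right: a key equal to a split point belongs to the chunk it starts
        return self.shards[bisect.bisect_right(self.split_points, key)]

    def shards_for_range(self, lo, hi):
        # Shards a range query on [lo, hi] must visit
        first = bisect.bisect_right(self.split_points, lo)
        last = bisect.bisect_right(self.split_points, hi)
        return sorted(set(self.shards[first:last + 1]))

router = RangeRouter(["g", "p"], ["shard0", "shard1", "shard2"])
print(router.shard_for("alice"))          # shard0
print(router.shard_for("g"))              # shard1
print(router.shards_for_range("h", "q"))  # ['shard1', 'shard2']
```

A query that omits the shard key cannot use the table and is broadcast to every shard (scatter-gather), which is the main runtime cost of a poorly chosen key.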
Trade-offs
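- Flexible schema = velocity at the start, debt over time. Add JSON Schema validators ($jsonSchema) to stop document shapes drifting.
- Embedded vs referenced docs: embed for one-to-few; reference for unbounded one-to-many and many-to-many (a single document is capped at 16 MB).
- Cross-shard transactions work (MongoDB 4.2+) but pay for two-phase commit across shards; design hot paths to stay single-shard.
- Eventually consistent secondary reads vs majority-committed reads: pick per query via readPreference and readConcern.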
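The validator advice can be illustrated with a toy checker for two $jsonSchema keywords, required and per-field bsonType under properties. The function name and the bsonType-to-Python mapping are assumptions for illustration. In MongoDB the rules live server-side in the collection's validator option as {"$jsonSchema": {...}}, and the server supports many more keywords.

```python
# Hypothetical mapping from a few BSON type names to Python types
BSON_TYPES = {"string": str, "int": int, "double": float, "bool": bool,
              "object": dict, "array": list}

def validate(doc, schema):
    """Return violations of `required` and per-field `bsonType` (toy subset)."""
    errors = []
    for field in schema.get("required", []):
        if field not in doc:
            errors.append(f"missing required field {field!r}")
    for field, rules in schema.get("properties", {}).items():
        if field not in doc or "bsonType" not in rules:
            continue
        expected = BSON_TYPES[rules["bsonType"]]
        value = doc[field]
        # bool is a subclass of int in Python, so reject it explicitly for "int"
        ok = isinstance(value, expected) and not (expected is int and isinstance(value, bool))
        if not ok:
            errors.append(f"{field!r} should be {rules['bsonType']}")
    return errors

order_schema = {
    "required": ["customer", "amount"],
    "properties": {"customer": {"bsonType": "string"}, "amount": {"bsonType": "int"}},
}
print(validate({"customer": "a", "amount": 30}, order_schema))  # []
print(validate({"customer": 7}, order_schema))
# ["missing required field 'amount'", "'customer' should be string"]
```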
Refs
- MongoDB docs (sharding, replication, oplog, change streams).
- Kleppmann, "Designing Data-Intensive Applications", ch. 2 (relational vs document model).
- Couchbase architecture papers; Azure Cosmos DB internals.