Skip to content

Distributed Key-Value Store — Notes#

Goals (Dynamo paper)#

  • Always writable (AP).
  • Symmetric, decentralized nodes.
  • Incremental scalability.
  • Heterogeneous hardware support.

Data model#

  • key → opaque blob (or column family).
  • Sometimes secondary indexes (materialized views in Cassandra).

Quorum tuning (N, R, W)#

  • N = replicas, R = read consensus, W = write consensus.
  • R + W > N → linearizability-like guarantee for last-writer wins.
  • Common: N=3, R=W=2 → strong-ish, tolerate 1 down.
  • Lower W=1 for fast writes (less durable).

Conflict resolution#

  • Last-writer-wins (server clock) — easy, can lose data.
  • Vector clocks (Dynamo) — keep siblings, app reconciles.
  • CRDT counters / sets — automatic merge.

Capacity#

  • 1 PB across 100 nodes ≈ 10 TB/node.
  • 100k ops/s/node common with SSD.
  • Compaction can double IO; size accordingly.

Trade-offs#

  • Tunable consistency is great but operationally complex.
  • Wide rows in Cassandra → range scans cheap but hotspot risk.
  • TTL columns → tombstones → compaction pressure.
  • LWTs (Paxos) much slower than normal write.

Refs#

  • Dynamo paper (SOSP '07), Cassandra design doc, ScyllaDB blog, Riak vs Cassandra comparisons, RocksDB tuning guide.