
Consensus: Raft / Paxos - Detailed

flowchart TB
  subgraph Roles
    LR[Leader]
    FOL[Followers]
    CAND[Candidate]
  end

  subgraph Phases[Raft Phases]
    LE[Leader Election<br/>RequestVote RPC]
    LR2[Log Replication<br/>AppendEntries RPC]
    SAFE[Safety: Election restriction +<br/>commit only current-term entries]
    SNAP[Log Compaction /<br/>Snapshots]
    CFG[Membership Change<br/>joint consensus]
  end

  subgraph State[Persistent State]
    PT[currentTerm]
    PV[votedFor]
    LOG[(log entries<br/>term, index, cmd)]
  end

  subgraph Replication[Replication]
    Q[Majority quorum]
    COMMIT[Commit Index]
    APPLY[Apply to State Machine]
  end

  subgraph Variants[Variants]
    BP[Basic Paxos]
    MP[Multi-Paxos]
    EPX[EPaxos - leaderless]
    FPX[Fast Paxos]
    ZB[Zab - ZooKeeper]
    VR[Viewstamped Replication]
  end

  subgraph Failures
    SP[Split vote]
    NP[Network partition]
    DL[Delayed RPC]
  end

  Client[Client] --> LR
  LR --> LR2
  LR2 --> FOL
  FOL -.ack.-> LR
  LR --> Q --> COMMIT --> APPLY
  LR -. heartbeat .-> FOL
  FOL -. timeout .-> CAND --> LE --> LR
  LR --> SNAP
  Variants --- LE
  Failures --- LE

    classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
    classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
    classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
    classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
    classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
    classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
    classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
    classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
    classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
    classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
    class Client client;
    class LR,FOL,CAND,LE,SNAP,CFG,PT,PV,Q,COMMIT,APPLY,BP,MP,EPX,FPX,ZB,VR,SP,NP,DL service;
    class LR2,SAFE,LOG datastore;

Raft cheat sheet

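  • Term: monotonically increasing logical clock; at most one leader per term.
  • RequestVote: candidate asks for votes; a voter grants its vote only if it has not yet voted in that term and the candidate's log is at least as up-to-date as its own (higher last term, or same last term and equal or longer log).
  • AppendEntries: leader replicates entries; serves as heartbeat when empty.
  • Commit rule: the leader commits an entry from its current term once it is stored on a majority; entries from earlier terms are never committed by counting replicas, only indirectly when a current-term entry after them commits.
  • Membership change: joint configuration C_old,new, then C_new.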
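
The RequestVote and commit rules above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a full Raft node: the names Entry, log_is_up_to_date, grant_vote and advance_commit_index are hypothetical, log indexes are 1-based, and match_index is assumed to include the leader's own last index.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    term: int
    cmd: str


def log_is_up_to_date(cand_last_term: int, cand_last_index: int,
                      my_log: List[Entry]) -> bool:
    """True if the candidate's log is at least as up-to-date as the voter's."""
    my_last_index = len(my_log)                      # 0 means an empty log
    my_last_term = my_log[-1].term if my_log else 0
    if cand_last_term != my_last_term:
        return cand_last_term > my_last_term         # later last term wins
    return cand_last_index >= my_last_index          # same term: longer log wins


def grant_vote(current_term: int, voted_for: Optional[str], my_log: List[Entry],
               cand_id: str, cand_term: int, cand_last_term: int,
               cand_last_index: int) -> Tuple[bool, int, Optional[str]]:
    """Voter side of RequestVote; returns (granted, new_term, new_voted_for)."""
    if cand_term < current_term:
        return False, current_term, voted_for        # stale candidate
    if cand_term > current_term:
        current_term, voted_for = cand_term, None    # newer term resets the vote
    if voted_for in (None, cand_id) and log_is_up_to_date(
            cand_last_term, cand_last_index, my_log):
        return True, current_term, cand_id
    return False, current_term, voted_for


def advance_commit_index(log: List[Entry], match_index: List[int],
                         commit_index: int, current_term: int) -> int:
    """Leader side: highest index stored on a majority AND from the current term."""
    majority = len(match_index) // 2 + 1
    for n in range(len(log), commit_index, -1):
        replicated = sum(1 for m in match_index if m >= n)
        if replicated >= majority and log[n - 1].term == current_term:
            return n                                 # earlier entries commit with it
    return commit_index
```

In this sketch a term-4 leader whose log holds only term-1 and term-2 entries does not advance its commit index even when those entries sit on 3 of 5 servers; once it replicates one term-4 entry to a majority, everything up to that entry commits together.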

Paxos vs Raft

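  • Paxos: the classical protocol; decouples roles (proposer/acceptor/learner) and is notoriously hard to implement correctly.
  • Multi-Paxos: Basic Paxos run once per log slot under a stable elected leader; in practice it ends up close to Raft.
  • Raft simplifies via a strong leader (entries flow only from leader to followers) and a contiguous log with no holes.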
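
To make the Paxos role split concrete, here is a small sketch of a single-decree (Basic Paxos) acceptor plus the proposer's value-selection rule. It is an illustration under assumptions, not a production implementation: the class Acceptor, its fields, and choose_value are hypothetical names, ballots are plain integers, and networking, persistence and learners are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                    # highest ballot promised so far
    accepted_ballot: int = -1             # ballot of the accepted value, if any
    accepted_value: Optional[str] = None

    def on_prepare(self, ballot: int) -> Tuple[bool, int, Optional[str]]:
        """Phase 1b: promise to ignore lower ballots; report any accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted_ballot, self.accepted_value
        return False, self.accepted_ballot, self.accepted_value

    def on_accept(self, ballot: int, value: str) -> bool:
        """Phase 2b: accept unless a higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return True
        return False


def choose_value(promises: List[Tuple[int, Optional[str]]], own_value: str) -> str:
    """Proposer rule: adopt the value with the highest accepted ballot, if any."""
    best = max(promises, key=lambda p: p[0], default=(-1, None))
    return best[1] if best[0] >= 0 else own_value
```

The proposer must re-propose any value it learns about in phase 1; that constraint is what keeps a chosen value stable across competing proposers, and it is the part Raft folds into its leader election restriction.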

Performance

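  • Latency: a write commits after about 1 RTT from the leader to a majority. With 5 nodes, that is the RTT to the slowest member of the fastest 3 (leader included), so the 2 slowest followers do not delay commits.
  • Throughput is typically bound by the leader's fsync.
  • Optimizations: pipelined AppendEntries, batched fsync, read leases / read-index.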
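
The latency rule above can be checked with a small calculation. The helper below is a hypothetical sketch that assumes the leader counts its own log write toward the majority and ignores fsync and processing time; the RTT figures in the usage note are made up for illustration.

```python
from typing import List


def commit_latency_ms(follower_rtts_ms: List[float]) -> float:
    """Estimate commit latency as the RTT to the slowest follower the leader must wait for."""
    cluster_size = len(follower_rtts_ms) + 1     # followers plus the leader
    majority = cluster_size // 2 + 1
    acks_needed = majority - 1                   # the leader already holds the entry
    if acks_needed == 0:
        return 0.0                               # single-node cluster
    return float(sorted(follower_rtts_ms)[acks_needed - 1])
```

For a 5-node cluster with follower RTTs of 2, 5, 40 and 120 ms, the leader needs 2 follower acks, so this estimate is 5 ms; the 40 ms and 120 ms followers are off the critical path.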

Where it's used

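  • Raft: etcd, Consul, CockroachDB, TiKV, Kafka KRaft, RethinkDB; MongoDB replica sets use a Raft-like election protocol.
  • Paxos: Google Chubby, Spanner (one Paxos group per shard).
  • Aurora: quorum-based storage replication, listed here for comparison rather than as a Raft or Paxos log.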

Glossary & fundamentals

Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.

| Tag | Concept | What it is | Page |
|-----|---------|------------|------|
| HLD | Pub/Sub & message brokers | topics, consumer groups, delivery semantics | pub-sub-pattern |
| HLD | Raft / Paxos consensus | replicated state machine via majority quorum | consensus-raft-paxos |
| HLD | Leader/follower replication | sync/semi-sync/async replication, failover | replication-leader-follower |
| HLD | LSM vs B-Tree engines | WAL, memtable, SSTables, compaction | storage-engines-lsm-btree |
| LLD | State machines | FSM, HSM, transitions, guards | state-machines |
| LLD | Testing strategy | pyramid, doubles, TDD, contracts | testing-strategy |
| LLD | Behavioural patterns | Strategy, Observer, State, Command, Chain | behavioral-patterns |