
Leader/Follower Replication: Detailed

flowchart TB
  subgraph App[Application]
    W[Write traffic]
    R[Read traffic]
  end

  subgraph Primary[Primary / Leader]
    PRI[(Primary DB)]
    WAL[WAL / Binlog]
    POS[LSN / GTID position]
  end

  subgraph Sync[Sync Replicas - same DC]
    SR1[(Sync Replica 1)]
    SR2[(Sync Replica 2)]
  end

  subgraph Async[Async Replicas - cross-region/read scaling]
    AR1[(Async Replica<br/>read-only)]
    AR2[(Async Replica<br/>analytics)]
    AR3[(Cascading Replica)]
  end

  subgraph Failover[Failover & HA]
    SENT([Sentinel / Orchestrator])
    VIP[Virtual IP / DNS swap]
    FENCE[Fencing / STONITH]
    LAG[Replica lag monitor]
  end

  subgraph Modes
    SYNC[Synchronous: ack on N replicas]
    SEMI[Semi-sync: ack 1 replica]
    ASYNC[Async: ack from leader only]
  end

  W --> PRI
  PRI --> WAL
  WAL -. ship .-> SR1
  WAL -. ship .-> SR2
  WAL -. ship .-> AR1
  AR1 -. cascade .-> AR3
  WAL -. ship .-> AR2
  R -->|read-your-write| PRI
  R -->|eventually consistent| AR1
  R -->|eventually consistent| AR2
  SENT -.health.-> PRI
  SENT -.health.-> SR1
  SENT -. promote .-> SR1
  SENT --> VIP
  SENT --> FENCE
  LAG -.alert.-> SENT

    classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
    classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
    classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
    classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
    classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
    classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
    classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
    classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
    classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
    classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
    class VIP edge;
    class W,R,WAL,POS,FENCE,LAG,SYNC,SEMI,ASYNC service;
    class PRI,SR1,SR2,AR1,AR2,AR3 datastore;
    class SENT compute;
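
The three modes in the diagram differ only in how many replica acknowledgements the leader waits for before it confirms a commit. A minimal Python sketch of that rule follows. The `Mode` enum and the `required_acks` and `can_confirm` helpers are hypothetical names chosen for illustration and are not taken from any database's API. It also assumes that "synchronous" means waiting for a configured number of replicas.

```python
from enum import Enum


class Mode(Enum):
    SYNC = "sync"        # wait for N replicas (N is configured)
    SEMI_SYNC = "semi"   # wait for exactly one replica
    ASYNC = "async"      # wait for no replica; the leader's local commit is enough


def required_acks(mode: Mode, sync_replicas: int) -> int:
    """Number of replica acks the leader waits for before confirming a commit."""
    if mode is Mode.SYNC:
        return sync_replicas
    if mode is Mode.SEMI_SYNC:
        return 1
    return 0


def can_confirm(mode: Mode, sync_replicas: int, acks_received: int) -> bool:
    """True once enough replicas have acknowledged the log record."""
    return acks_received >= required_acks(mode, sync_replicas)
```

The trade-off follows directly from the count: with zero required acks a confirmed write can be lost if the leader fails before shipping it, while a higher count adds commit latency and can stall writes when a replica is unavailable.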

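Replication mechanics

  • Statement-based: the replica re-executes the SQL text. Compact on the wire, but non-deterministic statements (NOW(), RAND()) can yield different results on each replica unless the engine logs extra context for them.
  • Row-based: ships the changed row images. Deterministic, but produces a larger log volume than statement-based.
  • WAL/redo shipping (Postgres physical, Oracle redo): byte-level changes to data pages, so the replica is a physical copy of the leader.
  • Logical/CDC (Postgres logical, Debezium): decoded row-level change events for downstream consumers.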
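
To make the statement-based versus row-based difference concrete, here is a toy Python model. It is a sketch under simplifying assumptions: `Node`, `apply_statement`, and `apply_row_event` are hypothetical names, each node evaluates NOW() against its own clock, and no extra context is logged with the statement.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy database node with its own wall clock (seconds)."""
    clock: float
    rows: dict = field(default_factory=dict)

    def now(self) -> float:
        return self.clock


def apply_statement(node: Node, key: str) -> None:
    # Statement-based: each node evaluates NOW() itself.
    node.rows[key] = node.now()


def apply_row_event(node: Node, key: str, value: float) -> None:
    # Row-based: the node stores the value the leader already computed.
    node.rows[key] = value


leader = Node(clock=100.0)
replica = Node(clock=100.7)  # the replica's clock differs from the leader's

# Statement-based: "UPDATE t SET updated_at = NOW() WHERE id = 'a'"
apply_statement(leader, "a")
apply_statement(replica, "a")
statement_diverged = leader.rows["a"] != replica.rows["a"]

# Row-based: ship the resulting row image instead of the SQL text.
apply_statement(leader, "b")
apply_row_event(replica, "b", leader.rows["b"])
row_consistent = leader.rows["b"] == replica.rows["b"]
```

In this model the statement-based row diverges because the two clocks disagree, while the row-based row matches because the replica never re-evaluates the function.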

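Replication topologies

  • Single leader (most common): all writes go through one node.
  • Multi-leader (geo-distributed): several nodes accept writes, so conflict resolution is needed (LWW, CRDT, or app-defined).
  • Leaderless (Dynamo-style): clients write to W of N replicas and read from R of N. Choosing R + W > N guarantees that every read set overlaps the most recent write set.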
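
The quorum condition for the leaderless case can be checked by brute force on small clusters. The sketch below uses a hypothetical helper, `quorums_overlap`, that enumerates every possible write set and read set and confirms they share a replica. It only illustrates the overlap argument and is not how a real system evaluates quorums.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """Check by brute force that every write set of size w and every
    read set of size r over n replicas share at least one replica."""
    nodes = range(n)
    return all(
        set(ws) & set(rs)
        for ws in combinations(nodes, w)
        for rs in combinations(nodes, r)
    )
```

For example, `quorums_overlap(3, 2, 2)` holds because 2 + 2 > 3, whereas `quorums_overlap(3, 1, 2)` does not: a write that lands on one replica can be missed by a read of the other two.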

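Failover steps

  1. Detect that the leader is down (missed heartbeats, confirmed by a quorum check).
  2. Pick the most up-to-date replica (highest LSN / GTID position).
  3. Fence the old leader (STONITH) so it can no longer accept writes.
  4. Promote the new leader and repoint the app or VIP.
  5. Re-attach the remaining replicas to the new leader; rebuild any that have diverged.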
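
The steps above can be sketched as a small planning function. This is an illustrative outline rather than the logic of Sentinel, Orchestrator, or any other tool. `Replica`, `pick_candidate`, and `plan_failover` are hypothetical names, log positions are modeled as plain integers, and the plan is returned as a list of strings instead of being executed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    applied_lsn: int   # highest log position applied (hypothetical integer LSN)
    healthy: bool = True


def pick_candidate(replicas: List[Replica]) -> Optional[Replica]:
    """Step 2: choose the healthy replica with the highest applied position."""
    healthy = [r for r in replicas if r.healthy]
    return max(healthy, key=lambda r: r.applied_lsn, default=None)


def plan_failover(leader_alive: bool, old_leader: str,
                  replicas: List[Replica]) -> List[str]:
    """Return the ordered failover actions, or an empty plan if no
    failover is needed or no healthy candidate exists."""
    if leader_alive:                       # step 1: nothing to do
        return []
    candidate = pick_candidate(replicas)   # step 2
    if candidate is None:
        return []
    plan = [
        f"fence {old_leader}",             # step 3: fence before promoting
        f"promote {candidate.name}",       # step 4
        f"repoint VIP to {candidate.name}",
    ]
    # Step 5: the remaining healthy replicas follow the new leader.
    plan += [f"reattach {r.name} to {candidate.name}"
             for r in replicas if r.healthy and r is not candidate]
    return plan
```

Fencing is placed before promotion on purpose: if the old leader is only partitioned rather than dead, promoting first would leave two nodes accepting writes.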

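Replication lag

  • Causes: long-running transactions, single-threaded apply (older MySQL), network latency or congestion.
  • Symptoms: stale reads immediately after a write, broken pagination (pages served from replicas at different positions skip or repeat rows).
  • Mitigation: read from the primary on critical paths, keep each session on replicas at least as fresh as what it has already seen (monotonic reads), enable a parallel applier.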
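
The first two mitigations amount to a routing rule: send a read to a replica only if that replica has applied everything the session has written or already observed. A minimal sketch follows, assuming integer log positions and a hypothetical `Session` / `choose_node` interface. A real router would also need the replicas' positions to be reasonably current.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Session:
    last_write_lsn: int = 0   # position of this session's latest write
    last_read_lsn: int = 0    # highest position this session has already observed


def choose_node(session: Session, replica_lsn: Dict[str, int],
                critical: bool = False) -> str:
    """Return a replica name that is fresh enough for this session, else 'primary'."""
    if critical:
        return "primary"
    # Read-your-writes and monotonic reads: the replica must have applied
    # at least everything the session has written or already seen.
    floor = max(session.last_write_lsn, session.last_read_lsn)
    fresh = [name for name, lsn in replica_lsn.items() if lsn >= floor]
    if not fresh:
        return "primary"
    # Prefer the most caught-up replica; ties resolve by name for determinism.
    chosen = max(fresh, key=lambda name: (replica_lsn[name], name))
    session.last_read_lsn = max(session.last_read_lsn, replica_lsn[chosen])
    return chosen
```

Falling back to the primary when no replica is fresh enough trades read scaling for correctness, which is usually acceptable because only sessions that have written recently are affected.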

Glossary & fundamentals

Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.

| Tag | Concept | What it is | Page |
|-----|---------|------------|------|
| HLD | Load balancer / GSLB | L4/L7 traffic distribution and failover | load-balancer |
| HLD | Raft / Paxos consensus | replicated state machine via majority quorum | consensus-raft-paxos |
| HLD | Leader/follower replication | sync/semi-sync/async replication, failover | replication-leader-follower |
| HLD | LSM vs B-Tree engines | WAL, memtable, SSTables, compaction | storage-engines-lsm-btree |
| HLD | CRDTs | commutative replicated data types | crdts |
| HLD | Change Data Capture | WAL/binlog tailing, outbox publishing | change-data-capture |
| HLD | Multi-region & DR | RTO / RPO, active-active, failover | multi-region-dr |