Leader/Follower Replication: Detailed

flowchart TB
  subgraph App[Application]
    W[Write traffic]
    R[Read traffic]
  end

  subgraph Primary[Primary / Leader]
    PRI[(Primary DB)]
    WAL[WAL / Binlog]
    POS[LSN / GTID position]
  end

  subgraph Sync[Sync Replicas - same DC]
    SR1[(Sync Replica 1)]
    SR2[(Sync Replica 2)]
  end

  subgraph Async[Async Replicas - cross-region/read scaling]
    AR1[(Async Replica<br/>read-only)]
    AR2[(Async Replica<br/>analytics)]
    AR3[(Cascading Replica)]
  end

  subgraph Failover[Failover & HA]
    SENT([Sentinel / Orchestrator])
    VIP[Virtual IP / DNS swap]
    FENCE[Fencing / STONITH]
    LAG[Replica lag monitor]
  end

  subgraph Modes
    SYNC[Synchronous: ack on N replicas]
    SEMI[Semi-sync: ack 1 replica]
    ASYNC[Async: ack from leader only]
  end

  W --> PRI
  PRI --> WAL
  WAL -. ship .-> SR1
  WAL -. ship .-> SR2
  WAL -. ship .-> AR1
  AR1 -. cascade .-> AR3
  WAL -. ship .-> AR2
  R -->|read-your-writes| PRI
  R -->|eventually consistent| AR1
  R -->|eventually consistent| AR2
  SENT -.health.-> PRI
  SENT -.health.-> SR1
  SENT -. promote .-> SR1
  SENT --> VIP
  SENT --> FENCE
  LAG -.alert.-> SENT

    classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
    classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
    classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
    classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
    classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
    classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
    classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
    classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
    classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
    classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
    class VIP edge;
    class W,R,WAL,POS,FENCE,LAG,SYNC,SEMI,ASYNC service;
    class PRI,SR1,SR2,AR1,AR2,AR3 datastore;
    class SENT compute;
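
The three acknowledgement modes in the diagram differ only in how many replica confirmations the leader waits for before answering the client. A minimal sketch of that rule follows; `Mode`, `required_acks`, and `can_ack_client` are hypothetical names for illustration, not any database's API.

```python
from enum import Enum


class Mode(Enum):
    SYNC = "sync"        # wait for every configured sync replica
    SEMI_SYNC = "semi"   # wait for one replica
    ASYNC = "async"      # the leader's local commit is enough


def required_acks(mode: Mode, sync_replicas: int) -> int:
    """Replica acks the leader waits for before answering the client."""
    if mode is Mode.SYNC:
        return sync_replicas
    if mode is Mode.SEMI_SYNC:
        return min(1, sync_replicas)
    return 0


def can_ack_client(mode: Mode, sync_replicas: int, acks_received: int) -> bool:
    """True once enough replicas have confirmed the write."""
    return acks_received >= required_acks(mode, sync_replicas)
```

The trade-off is visible in the return values: the more acks required, the smaller the window for losing an acknowledged write on failover, and the higher the commit latency.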

Replication mechanics

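Statement-based: the SQL text is replayed on each replica. Compact, but non-deterministic statements (NOW(), RAND(), UUID()) can produce different results on each replica unless the engine records extra context such as the timestamp or seed.
Row-based: the serialized before/after image of each changed row is shipped. Deterministic, but larger log volume.
WAL/redo shipping (Postgres physical, Oracle redo): byte-level changes to storage pages, so the replica is a physical copy of the leader.
Logical/CDC (Postgres logical, Debezium): decoded row-level change events for downstream consumers.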
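
The difference between statement-based and row-based replication can be illustrated with a toy simulation. This is a sketch under stated assumptions, not how any particular engine works: tables are plain dicts, and `apply_statement` and `apply_row_event` are hypothetical helpers standing in for re-executing SQL versus applying a shipped row image.

```python
import random


def apply_statement(table: dict, key: str, rng: random.Random) -> None:
    """Simulates INSERT ... VALUES (key, RAND()): the value is computed locally."""
    table[key] = rng.random()


def apply_row_event(table: dict, key: str, value: float) -> None:
    """Applies a row image shipped by the leader: the value is already fixed."""
    table[key] = value


# Each server has its own random source, as separate machines would.
leader_rng = random.Random(1)
replica_rng = random.Random(2)

leader: dict = {}
stmt_replica: dict = {}
row_replica: dict = {}

apply_statement(leader, "k1", leader_rng)          # leader executes the SQL
apply_statement(stmt_replica, "k1", replica_rng)   # statement-based: re-execute
apply_row_event(row_replica, "k1", leader["k1"])   # row-based: copy the result

print(stmt_replica == leader)  # False: the replica computed its own value
print(row_replica == leader)   # True: the replica received the leader's value
```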

Replication topologies

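Single leader: all writes go through one node (most common).
Multi-leader: typically one leader per region; concurrent writes need conflict resolution (last-write-wins, CRDTs, or application-defined merge).
Leaderless (Dynamo-style): each write is sent to the N replicas and succeeds once W acknowledge; each read waits for R responses. Choosing R + W > N makes every read quorum overlap every write quorum.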
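
The quorum condition for the leaderless case can be checked mechanically. The sketch below compares the R + W > N rule against a brute-force enumeration of all write and read sets; the function names are illustrative only.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """Quorum rule: every write set shares at least one node with every read set."""
    return r + w > n


def overlap_by_enumeration(n: int, w: int, r: int) -> bool:
    """Brute-force check of the same property over all node subsets."""
    nodes = range(n)
    return all(
        set(ws) & set(rs)  # an empty intersection is falsy
        for ws in combinations(nodes, w)
        for rs in combinations(nodes, r)
    )


for n, w, r in [(3, 2, 2), (3, 1, 1), (5, 3, 3), (5, 2, 3)]:
    assert quorums_overlap(n, w, r) == overlap_by_enumeration(n, w, r)
    print(n, w, r, quorums_overlap(n, w, r))
```

With N = 3, W = 2, R = 2 any read touches at least one replica that holds the latest acknowledged write; with W = 1, R = 1 a read can miss it entirely.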

Failover steps

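Detect that the leader is down (missed heartbeats, confirmed by a quorum of observers to avoid false positives).
Pick the most up-to-date replica (highest LSN / GTID position).
Fence the old leader (STONITH) so it cannot accept writes after being replaced.
Promote the new leader and repoint the application (VIP or DNS swap).
Re-attach the remaining replicas to the new leader; rebuild any that have diverged.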
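
The steps above can be sketched as a small planning function. This is an illustrative outline, not an orchestrator implementation: `Replica`, `applied_lsn`, `pick_candidate`, and `plan_failover` are hypothetical names, and the plan is returned as a list of strings rather than executed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    healthy: bool
    applied_lsn: int  # hypothetical: last log position this replica has applied


def pick_candidate(replicas: List[Replica]) -> Optional[Replica]:
    """Most up-to-date healthy replica, or None if no replica is healthy."""
    healthy = [r for r in replicas if r.healthy]
    return max(healthy, key=lambda r: r.applied_lsn, default=None)


def plan_failover(old_leader: str, down_votes: int, observers: int,
                  replicas: List[Replica]) -> List[str]:
    """Ordered failover actions, or an empty list if failover should not run."""
    if down_votes * 2 <= observers:  # no majority agrees the leader is down
        return []
    candidate = pick_candidate(replicas)
    if candidate is None:
        return []
    plan = [
        f"fence {old_leader}",               # fence before promoting
        f"promote {candidate.name}",
        f"repoint VIP to {candidate.name}",
    ]
    for r in replicas:
        if r is candidate or not r.healthy:
            continue
        plan.append(f"re-attach {r.name} to {candidate.name}")
    return plan
```

Fencing comes before promotion in the plan so that two nodes never accept writes at the same time.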

Replication lag

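Causes: long-running transactions, single-threaded apply (older MySQL), network latency or limited bandwidth.
Symptoms: stale reads right after a write, broken pagination (rows skipped or repeated when successive pages come from replicas at different positions).
Mitigation: read from the primary on critical paths, track a per-session position for monotonic reads, enable a parallel applier.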
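
One way to implement the first two mitigations is to route each read by log position. The sketch below assumes the application remembers the position of the session's last write and can learn each replica's replayed position; `choose_read_target` and `observe` are hypothetical helpers, and positions are plain integers.

```python
from typing import Dict


def choose_read_target(session_lsn: int, replica_lsn: Dict[str, int],
                       primary: str = "primary") -> str:
    """Pick a replica that has replayed at least the session's last write;
    fall back to the primary when no replica has caught up."""
    caught_up = [name for name, lsn in replica_lsn.items() if lsn >= session_lsn]
    if not caught_up:
        return primary
    # Prefer the most advanced replica among those that qualify.
    return max(caught_up, key=lambda name: replica_lsn[name])


def observe(session_lsn: int, served_lsn: int) -> int:
    """Monotonic reads: remember the newest position this session has seen."""
    return max(session_lsn, served_lsn)
```

For example, with replicas at positions `{"ar1": 90, "ar2": 105}`, a session whose last write was at 100 is routed to `ar2`, while a session at 110 falls back to the primary. Feeding the served position back through `observe` keeps later reads in the same session from going backwards in time.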

Glossary & fundamentals

Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.

Tag	Concept	What it is	Page
`HLD`	Load balancer / GSLB	L4/L7 traffic distribution and failover	load-balancer
`HLD`	Raft / Paxos consensus	replicated state machine via majority quorum	consensus-raft-paxos
`HLD`	Leader/follower replication	sync/semi-sync/async replication, failover	replication-leader-follower
`HLD`	LSM vs B-Tree engines	WAL, memtable, SSTables, compaction	storage-engines-lsm-btree
`HLD`	CRDTs	conflict-free replicated data types	crdts
`HLD`	Change Data Capture	WAL/binlog tailing, outbox publishing	change-data-capture
`HLD`	Multi-region & DR	RTO / RPO, active-active, failover	multi-region-dr