Leader/Follower Replication (Detailed)
flowchart TB
subgraph App[Application]
W[Write traffic]
R[Read traffic]
end
subgraph Primary[Primary / Leader]
PRI[(Primary DB)]
WAL[WAL / Binlog]
POS[LSN / GTID position]
end
subgraph Sync[Sync Replicas - same DC]
SR1[(Sync Replica 1)]
SR2[(Sync Replica 2)]
end
subgraph Async[Async Replicas - cross-region/read scaling]
AR1[(Async Replica<br/>read-only)]
AR2[(Async Replica<br/>analytics)]
AR3[(Cascading Replica)]
end
subgraph Failover[Failover & HA]
SENT([Sentinel / Orchestrator])
VIP[Virtual IP / DNS swap]
FENCE[Fencing / STONITH]
LAG[Replica lag monitor]
end
subgraph Modes
SYNC[Synchronous: ack on N replicas]
SEMI[Semi-sync: ack 1 replica]
ASYNC[Async: ack from leader only]
end
W --> PRI
PRI --> WAL
WAL -. ship .-> SR1
WAL -. ship .-> SR2
WAL -. ship .-> AR1
AR1 -. cascade .-> AR3
WAL -. ship .-> AR2
R -->|read-your-write| PRI
R -->|eventually consistent| AR1
R -->|eventually consistent| AR2
SENT -.health.-> PRI
SENT -.health.-> SR1
SENT -. promote .-> SR1
SENT --> VIP
SENT --> FENCE
LAG -.alert.-> SENT
classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
class VIP edge;
class W,R,WAL,POS,FENCE,LAG,SYNC,SEMI,ASYNC service;
class PRI,SR1,SR2,AR1,AR2,AR3 datastore;
class SENT compute;
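The three acknowledgement modes in the diagram differ only in how many replica acks the leader waits for before confirming a commit to the client. A minimal sketch of that rule follows; the function names and the `sync_replicas` parameter are hypothetical and do not correspond to any particular database's API.

```python
def required_replica_acks(mode: str, sync_replicas: int) -> int:
    """Number of replica acks the leader waits for before acking the client."""
    if mode == "sync":
        return sync_replicas      # ack on N replicas
    if mode == "semi-sync":
        return 1                  # ack from one replica is enough
    if mode == "async":
        return 0                  # ack from the leader only
    raise ValueError(f"unknown mode: {mode}")


def can_ack_client(mode: str, sync_replicas: int, acks_received: int) -> bool:
    """True once enough replicas have confirmed the write for this mode."""
    return acks_received >= required_replica_acks(mode, sync_replicas)
```

With two sync replicas, `can_ack_client("semi-sync", 2, 1)` is true while `can_ack_client("sync", 2, 1)` is not, which is the latency versus durability trade-off between the modes.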
Replication mechanics
- Statement-based: the SQL text is replayed on the replica. Compact, but non-deterministic functions (NOW(), RAND()) can make replicas diverge.
- Row-based: serialized row changes. Deterministic, but larger log volume.
- WAL/redo shipping (Postgres physical, Oracle redo): byte-level.
- Logical/CDC (Postgres logical, Debezium): row events for downstream consumers.
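The difference between statement-based and row-based replication can be illustrated with a toy model. This is only a sketch: the dictionaries stand in for tables, and the two differently seeded generators are an assumption used to mimic RAND() returning different values on leader and replica.

```python
import random


def apply_statement(table, key, rng):
    # Statement-based: the replica re-executes the expression itself.
    table[key] = rng.random()


def apply_row_event(table, key, value):
    # Row-based: the replica copies the value the leader already computed.
    table[key] = value


leader, stmt_replica, row_replica = {}, {}, {}
leader_rng = random.Random(1)
replica_rng = random.Random(2)  # different state, like RAND() on another host

apply_statement(leader, "k", leader_rng)
apply_statement(stmt_replica, "k", replica_rng)   # re-runs the "SQL"
apply_row_event(row_replica, "k", leader["k"])    # ships the resulting row

statement_diverged = stmt_replica["k"] != leader["k"]
row_matches = row_replica["k"] == leader["k"]
```

The statement-based replica ends up with a different value than the leader, while the row-based replica matches it, at the cost of shipping the full row value instead of a short statement.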
Replication topologies
- Single leader (most common).
- Multi-leader (geo, conflict resolution needed: LWW, CRDT, app-defined).
- Leaderless (Dynamo-style): clients write to W and read from R of the N replicas, with R + W > N so that every read quorum overlaps every write quorum.
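The quorum condition for leaderless replication can be checked by brute force. The sketch below (with a hypothetical helper name) enumerates every possible write set of size W and read set of size R over N replicas and confirms that they always share at least one replica exactly when R + W > N.

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """True if every W-sized write set intersects every R-sized read set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# N=3, W=2, R=2: any read touches at least one replica holding the latest write.
overlap_322 = quorums_always_overlap(3, 2, 2)
# N=3, W=1, R=1: a read can miss the only replica that took the write.
overlap_311 = quorums_always_overlap(3, 1, 1)
```

Overlap only guarantees that a read contacts some replica with the latest write; the reader still needs versions or timestamps to tell which response is newest.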
Failover steps
- Detect leader down (heartbeat, quorum check).
- Pick most-up-to-date replica.
- Fence old leader (STONITH).
- Promote new leader, repoint app/VIP.
- Re-attach stale replicas (rebuild if diverged).
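The steps above can be sketched as a small orchestration routine. Everything here is illustrative: `Replica`, `leader_is_down`, `failover`, and the `fence` and `repoint` callbacks are hypothetical names, and a real orchestrator would also handle timeouts, retries, and split-brain checks.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    applied_lsn: int      # last log position this replica has applied
    healthy: bool = True


def leader_is_down(missed_heartbeats, threshold, observers_agreeing, observers_total):
    # Step 1: require both missed heartbeats and a majority of observers.
    return missed_heartbeats >= threshold and observers_agreeing > observers_total // 2


def pick_candidate(replicas):
    # Step 2: the healthy replica with the highest applied position loses least data.
    healthy = [r for r in replicas if r.healthy]
    if not healthy:
        raise RuntimeError("no healthy replica to promote")
    return max(healthy, key=lambda r: r.applied_lsn)


def failover(old_leader, replicas, fence, repoint):
    candidate = pick_candidate(replicas)
    # Step 3: never promote unless the old leader is confirmed fenced.
    if not fence(old_leader):
        raise RuntimeError("fencing failed; refusing to promote")
    # Step 4: promote and repoint the app/VIP at the new leader.
    repoint(candidate.name)
    # Step 5: every other replica must be re-attached (or rebuilt if diverged).
    reattach = [r.name for r in replicas if r is not candidate]
    return candidate, reattach
```

Refusing to promote when fencing fails is a deliberate choice in this sketch: a short outage is usually preferable to two nodes accepting writes at once.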
Replication lag
- Causes: long-running tx, single-threaded apply (older MySQL), network.
- Symptoms: stale reads immediately after a write, inconsistent pagination.
- Mitigation: read from primary for critical paths, monotonic-read session, parallel applier.
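One way to combine the first two mitigations is to track a minimum log position per session and only read from replicas that have reached it. The sketch below assumes the application can see each commit's LSN and each replica's applied LSN; the `Session` class and its method names are hypothetical.

```python
class Session:
    """Tracks the oldest log position this session may safely read from."""

    def __init__(self):
        self.min_lsn = 0

    def note_write(self, commit_lsn):
        # Read-your-writes: later reads must include this commit.
        self.min_lsn = max(self.min_lsn, commit_lsn)

    def route_read(self, replica_lsns, primary_lsn):
        # Prefer the first replica that has caught up to the session's position.
        for name, applied_lsn in replica_lsns.items():
            if applied_lsn >= self.min_lsn:
                # Monotonic reads: never read from an older position afterwards.
                self.min_lsn = applied_lsn
                return name
        # No replica is fresh enough, so fall back to the primary.
        self.min_lsn = max(self.min_lsn, primary_lsn)
        return "primary"
```

A session that has just committed at LSN 50 is routed to the primary while all replicas are behind 50, and moves back to a replica once one has applied at least that position.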
Glossary & fundamentals
Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.
| Tag | Concept | What it is | Page |
|---|---|---|---|
| HLD | Load balancer / GSLB | L4/L7 traffic distribution and failover | load-balancer |
| HLD | Raft / Paxos consensus | replicated state machine via majority quorum | consensus-raft-paxos |
| HLD | Leader/follower replication | sync/semi-sync/async replication, failover | replication-leader-follower |
| HLD | LSM vs B-Tree engines | WAL, memtable, SSTables, compaction | storage-engines-lsm-btree |
| HLD | CRDTs | commutative replicated data types | crdts |
| HLD | Change Data Capture | WAL/binlog tailing, outbox publishing | change-data-capture |
| HLD | Multi-region & DR | RTO / RPO, active-active, failover | multi-region-dr |