# Leader/Follower Replication: Notes
## Why
- HA: survive primary loss.
- Read scale: serve reads off followers.
- Analytics isolation: dedicated replica for OLAP.
- Geo: place followers near readers.
## Sync vs. async vs. semi-sync
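| Mode | Commit waits for | Durability on primary loss | Write latency |
|---|---|---|---|
| Async | local fsync only | last committed txs may be lost | low (1 local fsync) |
| Semi-sync | local fsync + ack from 1 follower | tx survives on at least 1 follower | +1 RTT to the acking follower |
| Sync | local fsync + acks from all N followers | tx survives on all N followers | +1 RTT to the slowest follower; blocks if any follower is down |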
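A common production setup is one synchronous follower plus the rest asynchronous (what DDIA calls semi-synchronous), e.g. Postgres with one standby in `synchronous_standby_names`, or MySQL semi-sync replication. Aurora and MySQL Group Replication work differently: they commit once a quorum or group majority agrees (Aurora: 4 of 6 storage copies across 3 AZs), not on one designated follower.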
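To make the table concrete, here is a toy latency/durability model. It is a sketch under simplifying assumptions, not how any particular database schedules its I/O: the primary fsyncs locally, ships the transaction to all followers in parallel, and replies once `acks_required` followers (any of them, ignoring priority lists) have acknowledged. `commit` and `CommitOutcome` are hypothetical names, and the ack times are made-up inputs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CommitOutcome:
    latency_ms: float  # time until the client sees COMMIT OK
    copies: int        # nodes known to hold the tx at that moment (primary + acking followers)


def commit(local_fsync_ms: float, follower_ack_ms: List[float], acks_required: int) -> CommitOutcome:
    """Toy model: fsync locally, ship to all followers in parallel, then wait for
    `acks_required` acks. 0 = async, 1 = semi-sync, len(follower_ack_ms) = sync."""
    if not 0 <= acks_required <= len(follower_ack_ms):
        raise ValueError("acks_required must be between 0 and the number of followers")
    # The k-th fastest ack gates the commit; slower followers add no latency.
    wait_ms = sorted(follower_ack_ms)[acks_required - 1] if acks_required else 0.0
    # Assumption: the replication wait is added after the local fsync (serial, as in the table).
    return CommitOutcome(local_fsync_ms + wait_ms, 1 + acks_required)


acks = [2.0, 5.0, 40.0]  # made-up per-follower round-trip + ack times (ms)
for mode, k in [("async", 0), ("semi-sync", 1), ("sync", len(acks))]:
    print(mode, commit(1.0, acks, k))
```

With a 1 ms local fsync and followers acking at 2, 5 and 40 ms, async commits in 1 ms, semi-sync in 3 ms and sync in 41 ms: in fully synchronous mode the slowest follower sets the pace, which is why one sync follower plus async for the rest is the popular compromise.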
## Lag management
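- Monitor lag: `Seconds_Behind_Master` on the replica (MySQL `SHOW SLAVE STATUS`; `Seconds_Behind_Source` in `SHOW REPLICA STATUS` on newer versions) or `pg_stat_replication` on the primary (Postgres; e.g. `replay_lag`, or `sent_lsn` vs. `replay_lsn`).
- Backpressure writes if lag > threshold.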
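- Session-bound routing for read-after-write consistency: after a session writes, send its reads to the primary for the next K seconds (K should exceed typical replication lag).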
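The lag-management rules can be combined in one small routing component. The sketch below assumes a hypothetical `ReadRouter` that receives lag samples from a separate monitoring loop and takes an injectable clock so it is deterministic in tests; `pin_seconds` plays the role of K, and picking the least-lagged healthy follower (rather than, say, a random healthy one) is an arbitrary choice for illustration.

```python
import time
from typing import Callable, Dict


class ReadRouter:
    """Hypothetical router: read-after-write pinning plus lag-aware follower choice."""

    def __init__(self, pin_seconds: float, max_lag_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.pin_seconds = pin_seconds          # "K" from the notes
        self.max_lag_seconds = max_lag_seconds  # threshold for follower reads and for backpressure
        self.clock = clock
        self.last_write: Dict[str, float] = {}  # session id -> time of its most recent write
        self.lag: Dict[str, float] = {}         # follower name -> latest lag sample (seconds)

    def record_write(self, session: str) -> None:
        self.last_write[session] = self.clock()

    def report_lag(self, follower: str, lag_seconds: float) -> None:
        # Assumed to be called by a monitoring loop (e.g. polling pg_stat_replication).
        self.lag[follower] = lag_seconds

    def route_read(self, session: str) -> str:
        wrote_at = self.last_write.get(session)
        if wrote_at is not None and self.clock() - wrote_at < self.pin_seconds:
            return "primary"  # recent write: followers may not have replayed it yet
        healthy = [f for f, lag_s in self.lag.items() if lag_s <= self.max_lag_seconds]
        if not healthy:
            return "primary"  # every follower is too far behind
        return min(healthy, key=self.lag.__getitem__)  # least-lagged healthy follower

    def throttle_writes(self) -> bool:
        # Backpressure: ask writers to slow down while any follower exceeds the threshold.
        return any(lag_s > self.max_lag_seconds for lag_s in self.lag.values())


now = [0.0]  # fake clock so the example is deterministic
router = ReadRouter(pin_seconds=5.0, max_lag_seconds=2.0, clock=lambda: now[0])
router.report_lag("f1", 0.5)
router.report_lag("f2", 3.0)
router.record_write("alice")
print(router.route_read("alice"), router.route_read("bob"), router.throttle_writes())  # primary f1 True
now[0] = 6.0
print(router.route_read("alice"))  # f1
```

Keeping `max_lag_seconds` at or below `pin_seconds` means a session released from its pin only reaches followers whose reported lag is under K. Lag samples are themselves slightly stale, so the guarantee stays approximate; tracking the session's last write position (LSN) is the stricter alternative.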
## Failover patterns
- Manual (safest) or automated (Patroni for Postgres, Orchestrator for MySQL, Redis Sentinel).
- Split-brain prevention: fencing tokens, etcd lease, STONITH.
- Client redirection: VIP swap vs. DNS swap vs. proxy (PgBouncer/ProxySQL).
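Fencing tokens can be sketched as follows, following the scheme described in DDIA (ch. 8): each time the leader lease is granted, the lock service hands out a strictly larger token, and the storage layer rejects any write whose token is smaller than the largest it has already seen. `LeaseService` and `FencedStore` are hypothetical single-process stand-ins; a real deployment would take the token from its coordination service (for example a revision number from etcd) rather than a local counter.

```python
import itertools
from typing import Dict


class LeaseService:
    """Hypothetical lock service: each leader-lease grant carries a strictly larger token."""

    def __init__(self) -> None:
        self._tokens = itertools.count(1)

    def grant(self) -> int:
        return next(self._tokens)


class FencedStore:
    """Hypothetical storage node that refuses writes from a deposed leader."""

    def __init__(self) -> None:
        self.max_token_seen = 0
        self.data: Dict[str, str] = {}

    def write(self, token: int, key: str, value: str) -> bool:
        if token < self.max_token_seen:
            return False  # token predates the current lease: fenced off
        self.max_token_seen = token
        self.data[key] = value
        return True


leases = LeaseService()
store = FencedStore()

token_a = leases.grant()  # node A is primary with token 1
token_b = leases.grant()  # A stalls; failover promotes B with token 2

print(store.write(token_b, "x", "from B"))  # True
print(store.write(token_a, "x", "from A"))  # False: A woke up but is fenced off
print(store.data["x"])                       # from B
```

The check has to live on the receiving side: node A cannot tell it has been deposed (say, after a long GC pause or a partition), so only the component accepting its writes can refuse them.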
## Refs
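- Kleppmann, Designing Data-Intensive Applications, ch. 5 (Replication).
- PostgreSQL documentation: streaming replication.
- MySQL Reference Manual: replication.
- Verbitski et al., "Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases", SIGMOD '17.
- Orchestrator (GitHub; MySQL HA and topology management).
- Patroni (Zalando; Postgres HA).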