Leader/Follower Replication: Notes

Why

  • HA: survive primary loss.
  • Read scale: serve reads off followers.
  • Analytics isolation: dedicated replica for OLAP.
  • Geo: place followers near readers.

Sync vs async vs semi-sync

Mode       Durability on primary loss               Write latency
Async      last transactions may be lost            low (1 local fsync)
Semi-sync  safe once 1 replica has acknowledged     +1 network RTT
Sync       safe on all N replicas                   bounded by the slowest replica

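Many production deployments run one synchronous replica plus several asynchronous ones (e.g. Postgres with synchronous_standby_names, MySQL semi-sync replication). MySQL Group Replication and Aurora instead commit on a quorum: a majority of group members, or 4 of 6 storage copies in Aurora.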
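The latency column of the table can be made concrete with a toy model. This is only a sketch under stated assumptions: the primary fsyncs locally and then waits for replica acknowledgements, each replica is described by a single round-trip time, and the function name commit_latency_ms and its parameters are hypothetical, not taken from any database.

```python
from typing import Sequence


def commit_latency_ms(
    mode: str,
    local_fsync_ms: float,
    replica_rtts_ms: Sequence[float],
) -> float:
    """Estimate commit latency in a simplified replication model.

    Assumption: the primary fsyncs first, then waits for acknowledgements.
    Real systems often overlap the two, so treat this as an upper bound.
    """
    if mode == "async":
        # The primary never waits for a replica.
        return local_fsync_ms
    if not replica_rtts_ms:
        raise ValueError("semi-sync and sync need at least one replica")
    if mode == "semi-sync":
        # One acknowledgement is enough, so the fastest replica decides.
        return local_fsync_ms + min(replica_rtts_ms)
    if mode == "sync":
        # Every replica must acknowledge, so the slowest replica decides.
        return local_fsync_ms + max(replica_rtts_ms)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    rtts = [5.0, 40.0]  # hypothetical: one nearby replica, one remote
    for mode in ("async", "semi-sync", "sync"):
        print(mode, commit_latency_ms(mode, 2.0, rtts))
```

With one nearby and one remote replica, semi-sync pays only for the nearby one, which is why a single synchronous replica is a common compromise.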

Lag management

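  • Monitor lag: Seconds_Behind_Master in MySQL (Seconds_Behind_Source since 8.0.22), pg_stat_replication (e.g. replay_lag) in Postgres.
  • Apply backpressure to writes when lag exceeds a threshold.
  • Session-bound routing: for read-after-write consistency, send a session's reads to the primary for K seconds after its last write.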
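The last two bullets can be sketched in a few lines. This is a minimal illustration, not a production router: SessionRouter, should_throttle_writes, and the default values are hypothetical, lag is assumed to be measured elsewhere, and timestamps are passed in explicitly so the behaviour is easy to reason about.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SessionRouter:
    """Routes reads to the primary for K seconds after a session's write."""

    sticky_seconds: float = 5.0  # the "K" from the notes; value is illustrative
    _last_write: Dict[str, float] = field(default_factory=dict)

    def record_write(self, session_id: str, now: float) -> None:
        # Remember when this session last wrote.
        self._last_write[session_id] = now

    def route_read(self, session_id: str, now: float) -> str:
        last = self._last_write.get(session_id)
        if last is not None and now - last < self.sticky_seconds:
            # A follower may not have replayed the write yet.
            return "primary"
        return "follower"


def should_throttle_writes(lag_seconds: float, threshold_seconds: float) -> bool:
    """Backpressure check: slow down writers when followers fall too far behind."""
    return lag_seconds > threshold_seconds


if __name__ == "__main__":
    router = SessionRouter(sticky_seconds=5.0)
    router.record_write("s1", now=100.0)
    print(router.route_read("s1", now=102.0))  # inside the sticky window
    print(router.route_read("s1", now=106.0))  # window has expired
    print(should_throttle_writes(lag_seconds=3.0, threshold_seconds=2.0))
```

A fixed K only helps while lag stays below K, so the window is usually chosen from the monitored lag rather than guessed.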

Failover patterns

  • Promotion: manual (safest, but slowest) or automated (Patroni for Postgres, Orchestrator for MySQL, Redis Sentinel).
  • Split-brain prevention: fencing tokens, etcd lease, STONITH.
  • Client redirection: VIP swap vs DNS swap vs proxy repointing (PgBouncer/ProxySQL).
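Fencing tokens, one of the split-brain defences listed above, can be illustrated as follows. This is a sketch under assumptions: LeaseService is an in-memory stand-in for a coordination service such as etcd, FencedStore stands in for whatever resource the primary writes to, and all names are hypothetical. The idea is that each newly elected primary receives a strictly larger token, and the store rejects any write carrying a token older than the highest it has seen.

```python
from typing import Dict


class StaleTokenError(Exception):
    """Raised when a deposed primary tries to write with an old token."""


class LeaseService:
    """In-memory stand-in for a lease service; hands out increasing tokens."""

    def __init__(self) -> None:
        self._token = 0

    def grant(self) -> int:
        self._token += 1
        return self._token


class FencedStore:
    """Accepts a write only if its token is at least the highest seen so far."""

    def __init__(self) -> None:
        self.highest_token = 0
        self.data: Dict[str, str] = {}

    def write(self, token: int, key: str, value: str) -> None:
        if token < self.highest_token:
            raise StaleTokenError(
                f"token {token} is older than {self.highest_token}"
            )
        self.highest_token = token
        self.data[key] = value


if __name__ == "__main__":
    leases = LeaseService()
    store = FencedStore()

    old_primary = leases.grant()
    store.write(old_primary, "k", "v1")

    new_primary = leases.grant()  # failover: a new primary is elected
    store.write(new_primary, "k", "v2")

    try:
        store.write(old_primary, "k", "stale")  # old primary wakes up late
    except StaleTokenError as exc:
        print("rejected:", exc)
```

The check lives in the store rather than in the primary because a paused or partitioned primary cannot be trusted to notice that its lease has expired.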

Refs

  • Designing Data-Intensive Applications, ch. 5.
  • Postgres streaming replication docs, MySQL replication docs, Aurora paper (SIGMOD '17).
  • Orchestrator (GitHub MySQL HA), Patroni (Zalando).