Health Check / Heartbeat: Notes

Functional

  • Periodic active and/or passive checks.
  • Distributed agreement on liveness.
  • Feed results into routing, load balancing (LB), and alerting.
  • Dependency-aware (don't page if upstream is the real cause).
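The dependency-aware point can be illustrated with a small sketch. It assumes a hypothetical `depends_on` map (service name to its direct upstream services) and a set of currently unhealthy services; both names and the paging rule are illustrative, not taken from any particular tool. A service is paged only if none of its upstreams, direct or transitive, is also unhealthy.

```python
def has_unhealthy_upstream(service, unhealthy, depends_on):
    """Return True if any direct or transitive dependency of `service` is unhealthy."""
    seen = set()
    stack = list(depends_on.get(service, ()))
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue
        seen.add(dep)
        if dep != service and dep in unhealthy:
            return True
        stack.extend(depends_on.get(dep, ()))
    return False


def services_to_page(unhealthy, depends_on):
    """Keep only likely root causes: unhealthy services with no unhealthy upstream.

    Assumes the dependency graph is acyclic; in a cycle of unhealthy
    services every member would be suppressed.
    """
    return {
        s for s in unhealthy
        if not has_unhealthy_upstream(s, unhealthy, depends_on)
    }


# Hypothetical topology: web -> api -> db
depends_on = {"web": ["api"], "api": ["db"], "db": []}
print(services_to_page({"web", "api", "db"}, depends_on))
```

With all three services failing, only `db` is selected for paging; the alerts for `web` and `api` would be suppressed or downgraded.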

Non-functional

  • Sub-second failure detection is the ideal.
  • Avoid flapping (debounce, hysteresis).
  • Survive network partitions safely (prefer caution).
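One common way to avoid flapping is to require several consecutive contradicting results before changing state, with separate thresholds for going down and coming back up. The sketch below is a minimal illustration; the class name and the default thresholds (`fail_threshold=3`, `pass_threshold=2`) are assumptions chosen for the example.

```python
class HysteresisState:
    """Debounced health state: flips only after N consecutive contradicting checks."""

    def __init__(self, fail_threshold=3, pass_threshold=2):
        self.fail_threshold = fail_threshold  # failures needed to mark unhealthy
        self.pass_threshold = pass_threshold  # successes needed to mark healthy again
        self.healthy = True
        self._streak = 0  # consecutive results that contradict the current state

    def observe(self, ok):
        """Record one check result and return the (possibly updated) state."""
        if ok == self.healthy:
            # Result agrees with the current state: reset the streak.
            self._streak = 0
        else:
            self._streak += 1
            limit = self.fail_threshold if self.healthy else self.pass_threshold
            if self._streak >= limit:
                self.healthy = not self.healthy
                self._streak = 0
        return self.healthy


state = HysteresisState()
results = [False, True, False, False, False, True, True]
print([state.observe(ok) for ok in results])
```

A single failed probe followed by a success leaves the state unchanged; only three failures in a row mark the target unhealthy, and two successes in a row restore it.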

Trade-offs

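  • Frequent probes: faster detection, but more load on targets and the network.
  • Phi accrual: a graded suspicion level instead of a binary up/down verdict, at the cost of tuning effort.
  • Multi-checker quorum: keeps a single observer from triggering a global change, at the cost of extra checkers and coordination.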
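The phi-accrual and quorum ideas can be sketched together. The detector below computes phi as the negative base-10 logarithm of the probability that a heartbeat is still pending after the elapsed time. For simplicity it assumes exponentially distributed heartbeat intervals, which gives phi = elapsed / (mean interval × ln 10); real implementations differ in the distribution they assume. The window size, the threshold of 8, and the function names are illustrative assumptions.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Simplified phi-accrual detector (assumes exponential heartbeat intervals)."""

    def __init__(self, window=100):
        self.intervals = deque(maxlen=window)  # recent inter-arrival times
        self.last_heartbeat = None

    def heartbeat(self, now):
        """Record a heartbeat received at time `now` (seconds)."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        """Suspicion level: -log10(P(no heartbeat for this long))."""
        if not self.intervals:
            return 0.0  # not enough history to judge
        mean = sum(self.intervals) / len(self.intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self.last_heartbeat
        return elapsed / (mean * math.log(10))


def quorum_suspects(phis, threshold=8.0):
    """True if a strict majority of checkers report phi at or above the threshold."""
    votes = sum(1 for p in phis if p >= threshold)
    return votes > len(phis) // 2


detector = PhiAccrualDetector()
for t in (0.0, 1.0, 2.0, 3.0):
    detector.heartbeat(t)
print(detector.phi(4.0), detector.phi(25.0))
print(quorum_suspects([9.0, 9.0, 1.0]), quorum_suspects([9.0, 1.0, 1.0]))
```

Raising the threshold trades slower detection for fewer false suspicions, and requiring a majority of independent checkers means one observer with a bad network path cannot mark the target down on its own.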

Refs

  • Cassandra & Akka phi-accrual.
  • Consul / etcd / ZK health docs.
  • "The Tail at Scale" Dean & Barroso.