Health Check / Heartbeat — Notes
Functional
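- Periodic checks: active (probe the target) and/or passive (observe heartbeats or live traffic).
- Checkers agree on liveness among themselves instead of each acting on its own view.
- Health state feeds routing, load balancing, and alerting.
- Dependency-aware: don't page for a service when a failed upstream is the real cause.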
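Sketch of the dependency-aware bullet, assuming an acyclic dependency map is available. The names pages_to_send, unhealthy, and depends_on are hypothetical, not taken from any tool in Refs. Only unhealthy services with no unhealthy upstream (direct or transitive) get paged.

```python
from typing import Dict, List, Set


def pages_to_send(unhealthy: Set[str], depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return the unhealthy services that have no unhealthy upstream.

    depends_on maps a service to the services it calls (its upstreams).
    Assumption: the graph is acyclic; the seen set only guards against
    revisiting a node, it does not make cycles meaningful.
    """

    def has_unhealthy_upstream(service: str, seen: Set[str]) -> bool:
        for dep in depends_on.get(service, []):
            if dep in seen:
                continue
            seen.add(dep)
            # Direct or transitive: a healthy middle tier does not hide a dead root.
            if dep in unhealthy or has_unhealthy_upstream(dep, seen):
                return True
        return False

    return {s for s in unhealthy if not has_unhealthy_upstream(s, {s})}


if __name__ == "__main__":
    graph = {"web": ["api"], "api": ["db"], "db": []}
    print(pages_to_send({"web", "api", "db"}, graph))
```

With web -> api -> db and all three failing, only db is returned; web and api are treated as symptoms.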
Non-functional
- Sub-second failure detection is the ideal.
- Avoid flapping (debounce, hysteresis).
- Survive network partitions safely: when a checker cannot tell a dead node from an unreachable one, prefer the cautious action.
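Sketch of the debounce/hysteresis bullet: state flips only after a run of consecutive disagreeing results, with separate thresholds for going down and coming back up. The class name and the fall/rise parameters and their defaults are assumptions for illustration.

```python
class HysteresisState:
    """Debounced health state with asymmetric thresholds (hysteresis)."""

    def __init__(self, fall: int = 3, rise: int = 2, healthy: bool = True):
        self.fall = fall        # consecutive failures needed to mark unhealthy
        self.rise = rise        # consecutive successes needed to mark healthy
        self.healthy = healthy
        self._streak = 0        # consecutive results that disagree with current state

    def observe(self, ok: bool) -> bool:
        """Record one check result and return the (possibly updated) state."""
        if ok == self.healthy:
            # Result agrees with current state: any pending flip is cancelled.
            self._streak = 0
        else:
            self._streak += 1
            threshold = self.rise if ok else self.fall
            if self._streak >= threshold:
                self.healthy = ok
                self._streak = 0
        return self.healthy


if __name__ == "__main__":
    state = HysteresisState(fall=3, rise=2)
    for result in [False, False, True, False, False, False, True, True]:
        print(result, "->", state.observe(result))
```

A single success in the middle of a failure run resets the streak, so an isolated blip in either direction never changes state.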
Trade-offs
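- Frequent probes: faster detection, but more load on targets and the network.
- Phi accrual: a continuous suspicion level instead of a binary up/down verdict, but the threshold and window need tuning.
- Multi-checker quorum: keeps a single observer from triggering a global change, but needs several checkers and a vote.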
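Sketch of the last two bullets together. Phi here uses a simplifying assumption that heartbeat intervals are exponentially distributed, so phi = elapsed / (mean_interval * ln 10); the original phi accrual paper fits a normal distribution instead. The class and function names, the window size, and the threshold of 8.0 are illustrative assumptions to tune, not any library's API.

```python
import math
from collections import deque
from typing import Deque, Iterable, Optional


class PhiAccrualDetector:
    """Per-observer suspicion level from heartbeat arrival times."""

    def __init__(self, window: int = 100):
        self._intervals: Deque[float] = deque(maxlen=window)  # recent gaps, seconds
        self._last: Optional[float] = None                    # last heartbeat time

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat that arrived at time now."""
        if self._last is not None:
            self._intervals.append(now - self._last)
        self._last = now

    def phi(self, now: float) -> float:
        """Suspicion at time now; 0.0 until at least two heartbeats were seen."""
        if self._last is None or not self._intervals:
            return 0.0
        mean = sum(self._intervals) / len(self._intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self._last
        # phi = -log10(P(gap > elapsed)) with P = exp(-elapsed / mean)
        return elapsed / (mean * math.log(10))


def quorum_suspects(phis: Iterable[float], threshold: float = 8.0) -> bool:
    """True only if a strict majority of observers report phi >= threshold."""
    values = list(phis)
    votes = sum(1 for p in values if p >= threshold)
    return votes > len(values) // 2


if __name__ == "__main__":
    d = PhiAccrualDetector()
    for t in (0.0, 1.0, 2.0, 3.0):
        d.heartbeat(t)
    print(d.phi(4.0), d.phi(25.0))
    print(quorum_suspects([9.0, 9.5, 1.0]), quorum_suspects([9.0, 1.0, 1.0]))
```

Each checker runs its own detector; the global change is made only when quorum_suspects is true, so one observer with a bad network path cannot convict a node alone.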
Refs
- Cassandra and Akka phi accrual failure detectors.
- Consul / etcd / ZooKeeper health-check docs.
- "The Tail at Scale", Dean & Barroso.