Load Balancer: Notes

Functional requirements

  • Distribute incoming traffic across N backends.
  • Detect and bypass unhealthy backends within seconds.
  • Support multiple algorithms (RR, WRR, least-conn, consistent hash).
  • Terminate TLS (optional) and route by host/path (L7).
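
Three of the algorithms listed above can be sketched in a few lines. This is a minimal illustration, not a production picker: the `Backend` class and its `current` and `active_conns` fields are hypothetical names, and the weighted variant is the "smooth" weighted round-robin scheme (raise every score by its weight, pick the highest, subtract the total from the winner), which is one of several ways to implement WRR.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Backend:
    # Hypothetical record used only for this sketch.
    addr: str
    weight: int = 1
    active_conns: int = 0  # consulted by least-conn
    current: int = 0       # running score used by smooth WRR


def round_robin(backends):
    """Return an endless iterator that rotates through backends, ignoring weights."""
    return cycle(backends)


def smooth_wrr_pick(backends):
    """Smooth weighted round-robin.

    Each call raises every backend's score by its weight, picks the highest
    score, then lowers the winner by the total weight. Over sum(weights)
    calls each backend is chosen exactly `weight` times, without bursts.
    """
    total = sum(b.weight for b in backends)
    for b in backends:
        b.current += b.weight
    best = max(backends, key=lambda b: b.current)  # ties go to the earliest backend
    best.current -= total
    return best


def least_conn_pick(backends):
    """Pick the backend with the fewest in-flight connections."""
    return min(backends, key=lambda b: b.active_conns)
```

With weights 5/1/1 for backends a, b, and c, seven consecutive calls to `smooth_wrr_pick` select a, a, b, a, c, a, a, so the heavy backend is interleaved with the others rather than hit five times in a row.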

Non-functional requirements

  • Throughput: 1M+ RPS per LB pair (commodity NIC).
  • Latency overhead: < 1 ms p99 added by LB.
  • Availability: 99.99%+; no single LB node may be a single point of failure (SPOF).
  • Horizontal scale: ECMP on the upstream routers spreads flows across multiple LB nodes.
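
To make the availability target concrete, the arithmetic below converts a percentage into a yearly downtime budget and shows why redundancy is needed to reach it. It is a rough sketch: the function names are made up for illustration, and the redundancy formula assumes independent failures and instant failover, which is optimistic for nodes sharing a rack, a config push, or a software bug.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (non-leap year)


def downtime_budget_minutes(availability: float) -> float:
    """Minutes of downtime per year allowed by an availability target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def parallel_availability(node_availability: float, nodes: int) -> float:
    """Availability of N redundant nodes where any one node can carry the load.

    Assumes failures are independent and failover is instantaneous.
    """
    return 1.0 - (1.0 - node_availability) ** nodes


if __name__ == "__main__":
    print(f"99.99% budget: {downtime_budget_minutes(0.9999):.2f} min/year")
    print(f"two 99.9% nodes: {parallel_availability(0.999, 2):.6f}")
```

Under these assumptions 99.99% allows about 52.6 minutes of downtime per year, and two independent 99.9% nodes together reach roughly 99.9999%.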

Capacity estimation (example)

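  • 100k RPS, 1 KB request, 10 KB response → ~8 Gbps egress payload (100k × 10 KB = 1 GB/s), so roughly a full 10 Gbps link once protocol overhead and headroom are added; ingress is ~0.8 Gbps.
  • Connections: by Little's law, 100k new connections/s × 0.5 s average connection lifetime = 50k concurrent (assuming one request per connection).
  • File descriptors per LB box: 200k+ (a proxying LB holds two sockets per connection, so 50k concurrent ≈ 100k FDs before headroom; tune ulimit and the ephemeral port range).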
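
The estimates above can be reproduced with a few lines of arithmetic. The sketch assumes decimal units (1 KB = 1,000 bytes), one request per connection, and a proxying LB that holds one client-side and one backend-side socket per connection; a packet-forwarding L4 LB would not consume file descriptors this way. Payload alone comes to 8 Gbps egress, so a 10 Gbps link leaves little headroom.

```python
RPS = 100_000
REQ_BYTES = 1_000            # 1 KB request (decimal)
RESP_BYTES = 10_000          # 10 KB response (decimal)
AVG_CONN_LIFETIME_S = 0.5    # average time a connection stays open
FDS_PER_PROXIED_CONN = 2     # assumption: client socket + backend socket


def gbps(bytes_per_second: float) -> float:
    """Convert bytes per second to gigabits per second."""
    return bytes_per_second * 8 / 1e9


ingress_gbps = gbps(RPS * REQ_BYTES)
egress_gbps = gbps(RPS * RESP_BYTES)

# Little's law: concurrency = arrival rate x average time in system.
concurrent_conns = RPS * AVG_CONN_LIFETIME_S
fds_needed = concurrent_conns * FDS_PER_PROXIED_CONN

if __name__ == "__main__":
    print(f"ingress ~{ingress_gbps:.1f} Gbps, egress ~{egress_gbps:.1f} Gbps")
    print(f"concurrent connections ~{concurrent_conns:,.0f}")
    print(f"file descriptors ~{fds_needed:,.0f} before headroom")
```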

API surface

  • Control plane: add_backend(host, weight), drain(host), set_health(host).
  • Data plane: transparent to clients; they send to the VIP and the LB forwards to a healthy backend.
  • xDS (Envoy) for dynamic config push.
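
A minimal in-memory sketch of the control-plane calls might look like the following. It is illustrative only: the notes give `set_health(host)` with a single argument, so the `healthy` flag, the `eligible()` helper, and the rule that a draining backend ignores health updates are all assumptions made here, not part of any real product's API.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    UP = "UP"
    DOWN = "DOWN"
    DRAIN = "DRAIN"


@dataclass
class BackendEntry:
    host: str
    weight: int
    state: State = State.UP


class ControlPlane:
    """Toy registry of backends; a real one would persist and push config."""

    def __init__(self):
        self._backends = {}  # host -> BackendEntry, insertion-ordered

    def add_backend(self, host: str, weight: int = 1) -> None:
        if weight <= 0:
            raise ValueError("weight must be positive")
        self._backends[host] = BackendEntry(host, weight)

    def drain(self, host: str) -> None:
        # Stop sending new connections; existing ones finish on their own.
        self._backends[host].state = State.DRAIN

    def set_health(self, host: str, healthy: bool) -> None:
        entry = self._backends[host]
        if entry.state is State.DRAIN:
            return  # assumption: operator intent overrides health checks
        entry.state = State.UP if healthy else State.DOWN

    def eligible(self):
        """Hosts the data plane may pick for new connections (hypothetical helper)."""
        return [e.host for e in self._backends.values() if e.state is State.UP]
```

Keeping DRAIN distinct from DOWN lets an operator retire a backend gracefully without a passing health check silently putting it back into rotation.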

Data model

  • Pool{ id, algo, hc_config }
  • Backend{ id, pool_id, addr, weight, state(UP/DOWN/DRAIN) }
  • Listener{ vip, port, tls_cert, route_rules[] }
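
The three records above translate directly into dataclasses. The sketch below also shows one way `hc_config` could drive the UP/DOWN state: the notes do not say what `hc_config` contains, so the `interval_s`, `timeout_s`, `rise`, and `fall` fields, the `streak` counter, and the `record_probe` function are assumptions chosen to illustrate threshold-based health checking.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class BackendState(Enum):
    UP = "UP"
    DOWN = "DOWN"
    DRAIN = "DRAIN"


@dataclass
class HealthCheckConfig:
    # Hypothetical fields; the notes only name "hc_config".
    interval_s: float = 2.0
    timeout_s: float = 1.0
    rise: int = 2   # consecutive successes needed to go DOWN -> UP
    fall: int = 3   # consecutive failures needed to go UP -> DOWN


@dataclass
class Pool:
    id: str
    algo: str
    hc_config: HealthCheckConfig = field(default_factory=HealthCheckConfig)


@dataclass
class Backend:
    id: str
    pool_id: str
    addr: str
    weight: int = 1
    state: BackendState = BackendState.UP
    streak: int = 0  # consecutive probes that contradict the current state


@dataclass
class Listener:
    vip: str
    port: int
    tls_cert: Optional[str] = None
    route_rules: List[str] = field(default_factory=list)


def record_probe(backend: Backend, ok: bool, hc: HealthCheckConfig) -> BackendState:
    """Apply one probe result; flip state only after `fall` or `rise` in a row."""
    if backend.state is BackendState.DRAIN:
        return backend.state  # draining backends are not revived by probes
    is_up = backend.state is BackendState.UP
    backend.streak = backend.streak + 1 if ok != is_up else 0
    if backend.streak >= (hc.fall if is_up else hc.rise):
        backend.state = BackendState.DOWN if is_up else BackendState.UP
        backend.streak = 0
    return backend.state
```

Requiring several consecutive results before changing state keeps a single dropped probe from ejecting a healthy backend, at the cost of a detection delay of roughly `fall × interval_s`.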

Trade-offs

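  • L4 is the cheapest and fastest option but cannot see application-layer content (hosts, paths, headers); L7 adds richer routing features at roughly 2–5× the CPU cost.
  • DNS LB is simple but fails over slowly, because resolvers and clients cache records until the TTL expires; multi-region setups need GSLB.
  • Sticky sessions simplify legacy apps but pin load to specific backends and hinder rebalancing; prefer stateless sessions (e.g., JWT).
  • Active-active anycast scales out but requires BGP; VRRP active-passive is easier to operate.
  • TLS termination at the edge offloads crypto CPU from the backends but leaves traffic in plaintext inside the DC; mTLS to the backends closes that gap at the cost of extra handshakes and certificate management.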

Real-world refs

  • Google Maglev (consistent hash + ECMP), Facebook Katran (XDP/eBPF L4), AWS NLB (L4) / ALB (L7), Cloudflare Unimog, Envoy + Istio.
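
Consistent hashing, mentioned alongside Maglev above, can be illustrated with a classic hash ring with virtual nodes. This is explicitly not Maglev's lookup-table algorithm; it is the simpler textbook ring, and the `HashRing` class, the `vnodes` parameter, and the choice of SHA-256 are assumptions made for the sketch.

```python
import bisect
import hashlib


class HashRing:
    """Textbook consistent-hash ring with virtual nodes (not Maglev hashing)."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self._vnodes = vnodes
        self._points = []   # sorted hash positions on the ring
        self._owner = {}    # position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        # First 8 bytes of SHA-256 give a stable 64-bit ring position.
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self._vnodes):
            point = self._hash(f"{node}#{i}")
            if point in self._owner:
                continue  # extremely unlikely collision: keep the existing owner
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get(self, key: str) -> str:
        """Return the node owning the first ring position clockwise from the key."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

The property that matters for a load balancer is that removing a backend only remaps the keys that backend owned; every other key keeps its assignment, which limits connection resets and cache misses during failover.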