WhatsApp / Messenger — Notes
Functional
- 1:1 and group chat (text, voice notes, media, location).
- Delivery + read receipts, typing indicators, presence (online / last seen).
- Voice + video calls.
- Multi-device, E2E encryption.
- Push notifications when offline.
Non-functional
- p99 send latency < 500 ms.
- 2B users; 100B msgs/day.
- 99.99% uptime; survive regional outage.
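A quick check of what the 99.99% target implies as a yearly downtime budget. This is a rough sketch: the function name is made up for illustration, and it assumes a 365-day year with no exclusions for planned maintenance.

```python
# Downtime budget implied by an availability target.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (assumes a non-leap year)


def downtime_budget_minutes(availability: float) -> float:
    """Minutes per year the service may be down at the given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


budget = downtime_budget_minutes(0.9999)
print(f"99.99% uptime allows about {budget:.1f} min/year of downtime")
```

Under an hour per year leaves little room for manual failover, which is why the regional-outage requirement sits on the same line.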
Capacity
- 100B msgs/day → ~1.2M/s avg, 5M/s peak.
- Avg msg + envelope ~1 KB → ~100 TB/day of message traffic, before any E2E encryption overhead.
- Long-lived WS connections: 2B users / 100k connections per box = 20k gateway boxes if every user is online with one device; multi-device pushes this higher.
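The capacity figures above can be reproduced with a few lines of arithmetic. This is only a back-of-envelope sketch: the variable names are illustrative, and the gateway count assumes the worst case of every user connected at once with a single device.

```python
# Back-of-envelope capacity check using the figures from these notes.
SECONDS_PER_DAY = 24 * 60 * 60           # 86,400

msgs_per_day = 100e9                      # 100B msgs/day
avg_msg_bytes = 1_000                     # ~1 KB per message incl. envelope
peak_msgs_per_sec = 5e6                   # peak figure taken from the notes
users = 2e9                               # 2B users
conns_per_box = 100_000                   # long-lived WS connections per box

avg_msgs_per_sec = msgs_per_day / SECONDS_PER_DAY
peak_factor = peak_msgs_per_sec / avg_msgs_per_sec
traffic_tb_per_day = msgs_per_day * avg_msg_bytes / 1e12
gateway_boxes = users / conns_per_box     # upper bound: everyone online at once

print(f"avg rate:    {avg_msgs_per_sec / 1e6:.2f}M msgs/s")
print(f"peak factor: {peak_factor:.1f}x")
print(f"traffic:     {traffic_tb_per_day:.0f} TB/day")
print(f"gateways:    {gateway_boxes:,.0f} boxes")
```

The exact average is closer to 1.16M/s, so the 5M/s peak corresponds to a bit over 4x headroom.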
API (simplified)
WS up: hello + auth token + device id
WS msg: send(chat_id, payload, msg_id, prekey?)
WS evt: delivered(msg_id) / read(msg_id) / typing(chat_id)
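One way the three frame kinds above might look on the wire, as a minimal sketch. JSON framing, the "type" discriminator field, and the class and function names are all assumptions for illustration; the notes do not specify an encoding, and a production system would more likely use a compact binary format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Union


@dataclass
class Hello:                      # WS up: hello + auth token + device id
    auth_token: str
    device_id: str
    type: str = "hello"


@dataclass
class Send:                       # WS msg: send(chat_id, payload, msg_id, prekey?)
    chat_id: str
    payload: str                  # opaque ciphertext; the server never decrypts it
    msg_id: str                   # supplied by the client, usable for retry dedupe
    prekey: Optional[str] = None  # only present when starting a new E2E session
    type: str = "send"


@dataclass
class Event:                      # WS evt: delivered / read / typing
    type: str
    msg_id: Optional[str] = None  # set for delivered and read
    chat_id: Optional[str] = None # set for typing


Frame = Union[Hello, Send, Event]
EVENT_TYPES = ("delivered", "read", "typing")


def encode(frame: Frame) -> str:
    """Serialize a frame to JSON, omitting unset optional fields."""
    fields = {k: v for k, v in asdict(frame).items() if v is not None}
    return json.dumps(fields, sort_keys=True)


def decode(raw: str) -> Frame:
    """Parse a JSON frame and dispatch on its (assumed) "type" field."""
    fields = json.loads(raw)
    kind = fields.get("type")
    if kind == "hello":
        return Hello(**fields)
    if kind == "send":
        return Send(**fields)
    if kind in EVENT_TYPES:
        return Event(**fields)
    raise ValueError(f"unknown frame type: {kind!r}")


wire = encode(Send(chat_id="chat-42", payload="b64-ciphertext", msg_id="m-001"))
print(wire)
print(decode(wire))
```

Having the client supply msg_id means a resend after a dropped connection carries the same id, so the server can drop the duplicate instead of delivering twice.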
Schema
users(phone, id, public_key)
devices(user_id, device_id, push_token, key_bundle)
groups(id, [member_id...])
inbox(user_id, msg_id, payload, ts): stored in Cassandra, rows expire via TTL after delivery
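The inbox row above implies a store-and-forward flow: hold the message until the recipient acknowledges it, then drop it. Below is an in-memory sketch of that flow. The class and method names are hypothetical, a dict stands in for Cassandra, and an immediate delete on ack stands in for the TTL.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class Inbox:
    """In-memory stand-in for the inbox table (not a Cassandra client)."""

    def __init__(self) -> None:
        # user_id -> {msg_id: (ts, payload)}
        self._rows: Dict[str, Dict[str, Tuple[int, str]]] = defaultdict(dict)

    def enqueue(self, user_id: str, msg_id: str, payload: str, ts: int) -> None:
        # setdefault keeps the first copy, so a retried send is a no-op.
        self._rows[user_id].setdefault(msg_id, (ts, payload))

    def pending(self, user_id: str) -> List[Tuple[str, str]]:
        """Undelivered (msg_id, payload) pairs, oldest first."""
        rows = self._rows.get(user_id, {})
        ordered = sorted(rows.items(), key=lambda item: item[1][0])
        return [(msg_id, payload) for msg_id, (_ts, payload) in ordered]

    def ack(self, user_id: str, msg_id: str) -> None:
        # Delivery confirmed: drop the row (the real table would rely on TTL).
        self._rows.get(user_id, {}).pop(msg_id, None)


inbox = Inbox()
inbox.enqueue("bob", "m-002", "second", ts=20)
inbox.enqueue("bob", "m-001", "first", ts=10)
print(inbox.pending("bob"))   # drained in timestamp order when bob reconnects
inbox.ack("bob", "m-001")
print(inbox.pending("bob"))
```

With multi-device, the key would probably need to be per device rather than per user, so that one device's ack does not clear a message another device has not yet received.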
Trade-offs
- Sticky WS gateway simplifies routing, but a gateway failure forces all of its clients to reconnect.
- Server-deletes-after-delivery (WhatsApp) vs full server history (Messenger): privacy vs sync convenience.
- E2E prevents server-side content moderation; user-initiated (flag-based) reporting partially compensates.
- Multi-device pairing: device tree (one master, others linked) vs full mesh.
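The sticky-gateway trade-off can be made concrete with a small routing registry. This is a sketch under assumptions: the names are hypothetical, a dict stands in for whatever shared session store a real deployment would use, and the "inbox+push" fallback is one plausible reading of the offline path in these notes.

```python
from typing import Dict, List, Optional, Tuple

DeviceKey = Tuple[str, str]  # (user_id, device_id)


class SessionRegistry:
    """Tracks which gateway currently holds each device's WS connection."""

    def __init__(self) -> None:
        self._gateway_of: Dict[DeviceKey, str] = {}

    def connect(self, user_id: str, device_id: str, gateway_id: str) -> None:
        # Sticky: the device stays pinned to this gateway until it reconnects.
        self._gateway_of[(user_id, device_id)] = gateway_id

    def route(self, user_id: str, device_id: str) -> Optional[str]:
        return self._gateway_of.get((user_id, device_id))

    def gateway_down(self, gateway_id: str) -> List[DeviceKey]:
        """Drop every session on a failed gateway; those devices must reconnect."""
        dropped = [key for key, gw in self._gateway_of.items() if gw == gateway_id]
        for key in dropped:
            del self._gateway_of[key]
        return dropped


def delivery_path(registry: SessionRegistry, user_id: str, device_id: str) -> str:
    """Live connection: push over WS. Otherwise: store in inbox and send a push."""
    gateway = registry.route(user_id, device_id)
    return f"ws:{gateway}" if gateway else "inbox+push"


registry = SessionRegistry()
registry.connect("alice", "phone", "gw-a")
print(delivery_path(registry, "alice", "phone"))   # routed to gw-a
registry.gateway_down("gw-a")
print(delivery_path(registry, "alice", "phone"))   # offline path until reconnect
registry.connect("alice", "phone", "gw-b")
print(delivery_path(registry, "alice", "phone"))   # routed to gw-b
```

Routing is a single lookup, which is the simplification the sticky design buys. The cost shows up in gateway_down: every session on the failed box is lost at once, so reconnects arrive as a burst.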
Refs
- WhatsApp engineering talks (Erlang scaling stories).
- Signal Protocol specifications (X3DH, Double Ratchet).
- Facebook Messenger architecture blog posts.
- Alex Xu, System Design Interview Vol 1, "Design a Chat System."