Email Service — Notes
Functional
- Receive (SMTP), store, deliver.
- Send via SMTP outbound.
- Per-user folders/labels; threads.
- Full-text search.
- Spam, malware, phishing detection.
- IMAP/POP for clients; web UI.
- Attachments, signatures (S/MIME, PGP optional).
Non-functional
- 99.99% availability for inbound. Sending MTAs queue and retry, so a short outage delays mail; a message lost after acceptance is lost for good.
- Deliver within seconds normally; minutes during incidents.
- 1B+ mailboxes, tens of billions of msgs/day.
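Sketch: what the 99.99% target above allows in downtime per year. The function name is made up for illustration, and a 365-day year is assumed.

```python
# Yearly downtime budget for an availability target (illustrative only).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def downtime_minutes_per_year(availability: float) -> float:
    """Return the downtime allowed per year, in minutes, for a ratio like 0.9999."""
    return (1.0 - availability) * MINUTES_PER_YEAR


budget = downtime_minutes_per_year(0.9999)
print(f"{budget:.1f} min/year")  # a bit under an hour per year
```

A budget this small leaves little room for manual failover, which is one reason inbound acceptance tends to be kept as simple as possible.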
Capacity
- 50B msgs/day × 50 KB avg = 2.5 PB/day raw.
- Per user: avg 5–10 GB mailbox.
- Attachment dedup saves 30–50%.
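Sketch: the capacity arithmetic above, spelled out. Assumptions not in the notes: decimal units (1 KB = 1e3 bytes, 1 PB = 1e15 bytes), and the 30–50% dedup saving is applied to the whole raw volume, since the notes don't say what it is a percentage of.

```python
# Back-of-envelope ingest volume, using the figures from the Capacity notes.
MSGS_PER_DAY = 50e9        # 50B msgs/day
AVG_MSG_BYTES = 50e3       # 50 KB average (decimal units assumed)
SECONDS_PER_DAY = 86_400
PB = 1e15

raw_pb_per_day = MSGS_PER_DAY * AVG_MSG_BYTES / PB   # 2.5 PB/day raw
avg_msgs_per_sec = MSGS_PER_DAY / SECONDS_PER_DAY    # average rate; peaks are higher


def after_dedup(raw_pb: float, savings: float) -> float:
    """Stored volume after dedup, assuming `savings` applies to all raw bytes."""
    return raw_pb * (1.0 - savings)


best_case = after_dedup(raw_pb_per_day, 0.50)   # 50% saved
worst_case = after_dedup(raw_pb_per_day, 0.30)  # 30% saved

print(f"raw: {raw_pb_per_day} PB/day, avg rate: {avg_msgs_per_sec:,.0f} msgs/s")
print(f"after dedup: {best_case:.2f} to {worst_case:.2f} PB/day")
```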
Schema (per-user mailbox)
messages(user_id, msg_id, thread_id, headers, body_ref, labels[], read)
threads(user_id, thread_id, subject, last_msg_ts)
index_terms(user_id, term, msg_ids[])  (posting list per user and term)
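Sketch: the messages and index_terms shapes above as in-memory Python, to show how a per-user posting list answers a search. This is an illustration only, not how any real mail store is built. MailboxIndex, the tokenizer, and the blob:// refs are hypothetical; a real index would live in the partitioned store, not in a dict.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Message:
    user_id: str
    msg_id: str
    thread_id: str
    headers: dict
    body_ref: str                      # pointer into a blob store, not the body itself
    labels: set = field(default_factory=set)
    read: bool = False


class MailboxIndex:
    """Inverted index keyed by (user_id, term) -> set of msg_ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def _tokenize(text: str) -> list:
        # Naive tokenizer: lowercase alphanumeric runs, no stemming or stop words.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, msg: Message, body_text: str) -> None:
        subject = msg.headers.get("Subject", "")
        for term in self._tokenize(subject + " " + body_text):
            self._postings[(msg.user_id, term)].add(msg.msg_id)

    def search(self, user_id: str, query: str) -> set:
        """AND query: intersect the posting lists of every query term."""
        terms = self._tokenize(query)
        if not terms:
            return set()
        result = None
        for term in terms:
            ids = self._postings.get((user_id, term), set())
            result = set(ids) if result is None else result & ids
        return result


index = MailboxIndex()
m1 = Message("u1", "m1", "t1", {"Subject": "Quarterly report"}, "blob://1")
m2 = Message("u1", "m2", "t2", {"Subject": "Lunch"}, "blob://2")
m3 = Message("u2", "m3", "t3", {"Subject": "Quarterly report"}, "blob://3")
index.add(m1, "Numbers attached for the quarterly review.")
index.add(m2, "Report back on where to eat.")
index.add(m3, "Same subject, different user.")

hits = index.search("u1", "quarterly report")
print(hits)
```

Because user_id is part of every key, a query only ever touches one user's postings, which is the isolation property the per-user layout is chosen for.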
Trade-offs
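- Per-user partitioning simplifies isolation and search but makes cross-user aggregate analytics hard (every query fans out across partitions).
- Bigtable / Cassandra preferred over a relational DB at this scale; transactions only need to span one user's data, so cross-partition transactions can be avoided.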
- Threading by Subject can over-merge unrelated mails; modern Gmail threads on the References / In-Reply-To headers.
- Server-side rules vs client filters: server-side rules apply to every client and scale, but the UX is harder to get right.
- End-to-end encryption (S/MIME, PGP) cripples server-side search and spam filtering, since the server cannot read message bodies; niche.
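Sketch: threading on the References / In-Reply-To headers instead of Subject, per the trade-off above. This is a minimal illustration, not Gmail's actual algorithm. assign_thread and the `known` map are hypothetical, and every message is assumed to carry a Message-ID.

```python
from email import message_from_string


def assign_thread(raw_message: str, known: dict) -> str:
    """Return a thread id for a raw RFC 5322 message.

    `known` maps Message-ID -> thread id for mail already filed. A message joins
    the thread of the first referenced id we have seen; otherwise it starts a
    new thread named after its own Message-ID. Subject is never consulted.
    """
    msg = message_from_string(raw_message)
    own_id = (msg.get("Message-ID") or "").strip()
    # Both headers hold whitespace-separated message ids.
    refs = ((msg.get("References") or "") + " " + (msg.get("In-Reply-To") or "")).split()
    thread_id = next((known[r] for r in refs if r in known), own_id)
    if own_id:
        known[own_id] = thread_id
    return thread_id


known = {}
original = "Message-ID: <a@x.example>\nSubject: Hello\n\nfirst\n"
reply = (
    "Message-ID: <b@x.example>\nIn-Reply-To: <a@x.example>\n"
    "References: <a@x.example>\nSubject: Re: Hello\n\nreply\n"
)
unrelated = "Message-ID: <c@y.example>\nSubject: Hello\n\nsame subject, other sender\n"

t_original = assign_thread(original, known)
t_reply = assign_thread(reply, known)
t_unrelated = assign_thread(unrelated, known)
print(t_original, t_reply, t_unrelated)
```

The third message shares the Subject "Hello" but carries no reference to the first, so it lands in its own thread; Subject-only threading would have merged them.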
Refs
- Gmail architecture talks (Bigtable + search)
- RFC 5321 (SMTP)
- RFC 5322 (Internet Message Format)
- DMARC.org docs
- Gmail outage post-mortems