Email Service — Notes

Functional

  • Receive (SMTP), store, deliver.
  • Send via SMTP outbound.
  • Per-user folders/labels; threads.
  • Full-text search.
  • Spam, malware, phishing detection.
  • IMAP/POP for clients; web UI.
  • Attachments, signatures (S/MIME, PGP optional).

Non-functional

  • 99.99% availability for inbound; losing an accepted message means lost mail, so durability comes first.
  • Deliver within seconds normally; minutes during incidents.
  • 1B+ mailboxes; tens of billions of messages per day.
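
To make the 99.99% target concrete, the allowed downtime can be worked out directly. This is a minimal sketch that assumes a 365-day year and a 30-day month; the function name is illustrative only.

```python
# Downtime allowed by an availability target (assumes a 365-day year).
MINUTES_PER_YEAR = 365 * 24 * 60
MINUTES_PER_30_DAYS = 30 * 24 * 60


def downtime_budget_minutes(availability: float, period_minutes: float) -> float:
    """Return the minutes of downtime a given availability allows per period."""
    return (1.0 - availability) * period_minutes


yearly = downtime_budget_minutes(0.9999, MINUTES_PER_YEAR)
monthly = downtime_budget_minutes(0.9999, MINUTES_PER_30_DAYS)
print(f"99.99% -> {yearly:.2f} min/year, {monthly:.2f} min per 30 days")
```

Under these assumptions the budget is roughly 52.6 minutes per year, or about 4.3 minutes per 30-day month, which is why "minutes during incidents" already consumes a meaningful share of it.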

Capacity

  • 50B msgs/day × 50 KB avg = 2.5 PB/day raw.
  • Per user: avg 5–10 GB mailbox.
  • Attachment dedup saves 30–50%.
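
The figures above can be checked with a short back-of-the-envelope script. It is only a sketch: it uses decimal units (1 KB = 1,000 bytes), takes the 1B mailbox count from the non-functional targets, and assumes, for illustration, that the 30–50% dedup saving applies to the whole raw daily volume.

```python
# Back-of-the-envelope capacity check (decimal units throughout).
MSGS_PER_DAY = 50e9       # 50B msgs/day, from the notes
AVG_MSG_BYTES = 50e3      # 50 KB average message size
MAILBOXES = 1e9           # 1B mailboxes, from the non-functional targets
SECONDS_PER_DAY = 86_400

raw_bytes_per_day = MSGS_PER_DAY * AVG_MSG_BYTES
raw_pb_per_day = raw_bytes_per_day / 1e15
avg_msgs_per_sec = MSGS_PER_DAY / SECONDS_PER_DAY
avg_ingest_gb_per_sec = raw_bytes_per_day / SECONDS_PER_DAY / 1e9

# Assumption: the dedup saving is applied to the full raw volume.
after_dedup_pb = [raw_pb_per_day * (1 - saving) for saving in (0.30, 0.50)]

# Total stored data if every mailbox holds the 5-10 GB average.
total_mailbox_eb = [MAILBOXES * gb * 1e9 / 1e18 for gb in (5, 10)]

print(f"raw ingest: {raw_pb_per_day:.2f} PB/day")
print(f"average rate: {avg_msgs_per_sec:,.0f} msgs/s, {avg_ingest_gb_per_sec:.1f} GB/s")
print(f"after dedup: {after_dedup_pb[1]:.2f}-{after_dedup_pb[0]:.2f} PB/day")
print(f"total mailbox storage: {total_mailbox_eb[0]:.0f}-{total_mailbox_eb[1]:.0f} EB")
```

These are averages; peak traffic would be higher, and the notes do not state a peak-to-average ratio.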

Schema (per-user mailbox)

  • messages(user_id, msg_id, thread_id, headers, body_ref, labels[], read)
  • threads(user_id, thread_id, subject, last_msg_ts)
  • index_terms(user_id, term, msg_ids[]) posting list
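
As an illustration of how the messages row and the index_terms posting lists fit together, here is a toy in-memory sketch. The class names, the whitespace tokenizer, and the `blob://` references are assumptions made for the example; a real system would keep these rows in a partitioned store and use a proper analyzer.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Message:
    user_id: str
    msg_id: str
    thread_id: str
    headers: dict[str, str]
    body_ref: str                       # pointer to the stored body, not the body itself
    labels: list[str] = field(default_factory=list)
    read: bool = False


class MailboxIndex:
    """Toy inverted index keyed by (user_id, term) -> set of msg_ids."""

    def __init__(self) -> None:
        self._postings: dict[tuple[str, str], set[str]] = defaultdict(set)

    def add(self, msg: Message, body_text: str) -> None:
        # Naive tokenizer: lowercase and split on whitespace.
        text = msg.headers.get("Subject", "") + " " + body_text
        for term in text.lower().split():
            self._postings[(msg.user_id, term)].add(msg.msg_id)

    def search(self, user_id: str, query: str) -> set[str]:
        """AND query: intersect the posting lists of every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        lists = [self._postings.get((user_id, t), set()) for t in terms]
        return set.intersection(*lists)


idx = MailboxIndex()
m1 = Message("u1", "m1", "t1", {"Subject": "Quarterly report"}, "blob://m1")
m2 = Message("u1", "m2", "t2", {"Subject": "Lunch"}, "blob://m2")
idx.add(m1, "draft of the report attached")
idx.add(m2, "report to the lobby at noon")

print(idx.search("u1", "quarterly report"))  # only m1 contains both terms
print(idx.search("u2", "report"))            # another user's index is never touched
```

Because every key starts with user_id, a search reads only one user's posting lists, which is the isolation property that per-user partitioning buys.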

Trade-offs

  • Per-user partitioning simplifies isolation and search but makes cross-user aggregate analytics hard.
  • Bigtable / Cassandra preferred over a relational DB at scale; transactions only need to span a single user's data.
  • Threading by Subject can over-merge unrelated mails; modern Gmail uses the References header (Refs).
  • Server-side rules vs client filters: server-side scales, but the UX is harder to get right.
  • End-to-end encryption (S/MIME, PGP) cripples server-side search and spam filtering, so it remains niche.
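
The threading trade-off can be shown with a small sketch using Python's standard `email` parser. This is not Gmail's actual algorithm, only a simplified comparison of the two strategies; the function names and sample messages are made up for the example, and messages are assumed to arrive in order.

```python
import re
from email import message_from_string


def normalize_subject(subject: str) -> str:
    """Drop leading Re:/Fw:/Fwd: prefixes and lowercase the rest."""
    stripped = re.sub(r"^\s*((re|fwd?)\s*:\s*)+", "", subject, flags=re.IGNORECASE)
    return stripped.strip().lower()


def assign_threads(raw_messages: list[str], by: str = "references") -> dict[str, str]:
    """Map each Message-ID to a thread id (the Message-ID of the thread root)."""
    thread_of: dict[str, str] = {}
    subject_thread: dict[str, str] = {}
    for raw in raw_messages:  # assumed to be in arrival order
        msg = message_from_string(raw)
        msg_id = msg["Message-ID"].strip()
        if by == "subject":
            key = normalize_subject(msg.get("Subject", ""))
            thread_of[msg_id] = subject_thread.setdefault(key, msg_id)
        else:
            refs = (msg.get("References", "") + " " + msg.get("In-Reply-To", "")).split()
            parent = next((r for r in refs if r in thread_of), None)
            thread_of[msg_id] = thread_of[parent] if parent else msg_id
    return thread_of


RAW = [
    "Message-ID: <a@x>\nSubject: Invoice\n\nfirst",
    "Message-ID: <b@x>\nSubject: Re: Invoice\nReferences: <a@x>\nIn-Reply-To: <a@x>\n\nreply",
    "Message-ID: <c@y>\nSubject: Invoice\n\nunrelated sender, same subject",
]

by_refs = assign_threads(RAW, by="references")
by_subject = assign_threads(RAW, by="subject")
print(len(set(by_refs.values())))     # number of threads found via References
print(len(set(by_subject.values())))  # number of threads found via Subject
```

With these three messages, header-based threading keeps the unrelated "Invoice" mail in its own thread, while subject-based threading merges all three into one.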

Refs

  • Gmail architecture talks (Bigtable + search); RFC 5321 (SMTP); RFC 5322 (Internet Message Format); DMARC.org docs; Gmail outage post-mortems.