Gmail-style Email Service: Detailed
```mermaid
flowchart TB
subgraph Inbound[Inbound path]
DNS[MX records]
MTA_IN[Inbound MTA<br/>SMTP 25/465/587]
TLS[STARTTLS / MTA-STS]
AUTHV[SPF / DKIM / DMARC / ARC]
SPAM([Spam classifier<br/>Bayes + ML])
VIRUS[Malware scan]
GREY[Greylisting / rate limit]
ROUTE[Address rewriter / aliases]
end
subgraph Storage[Storage Layer]
BIG[(Bigtable / KV<br/>per-user mailbox)]
OBJ[(Attachment store)]
DEDUP[Content dedup<br/>shared blob refs]
META[(Mailbox metadata)]
LABEL[Labels / Folders model]
end
subgraph Search
IDX[(Inverted Index<br/>per user)]
REALTIME[Real-time indexer]
end
subgraph User[User access]
WEB([Web UI])
IMAP[IMAP / POP / JMAP]
API[Gmail API]
PUSH((Push - APNS / FCM))
end
subgraph Outbound
COMPOSE[Compose]
QUEUE[[Outbound queue]]
MTA_OUT[Outbound MTA pool]
DKIM_SIGN[DKIM signing]
BOUNCE[Bounce / DSN handling]
REPUTE[IP / domain reputation]
end
subgraph Features
THREAD[Threading - by Subject / Refs]
PRIO[[Priority Inbox]]
SMART[Smart Reply / Smart Compose]
PHISH[Phishing detection]
LABEL2[Filter / Rule engine]
end
subgraph Ops
QUOTA([Per-user quota])
RETN[Retention / Deletion]
AUDIT[Audit log]
end
Internet --> DNS --> MTA_IN
MTA_IN --> TLS --> GREY --> AUTHV --> SPAM --> VIRUS --> ROUTE --> BIG
BIG --> REALTIME --> IDX
BIG --> META
OBJ --- BIG
DEDUP --- OBJ
EndUser((End user)) --> WEB
WEB --> BIG
WEB --> IDX
COMPOSE --> QUEUE --> DKIM_SIGN --> MTA_OUT --> Internet
MTA_OUT --> BOUNCE
BOUNCE --> COMPOSE
REPUTE --- MTA_OUT
Features --- BIG
Ops --- BIG
classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
class EndUser,WEB,QUOTA client;
class DNS,MTA_IN,TLS,AUTHV,VIRUS,GREY,ROUTE,DEDUP,LABEL,REALTIME,IMAP,API,COMPOSE,MTA_OUT,DKIM_SIGN,BOUNCE,REPUTE,THREAD,SMART,PHISH,LABEL2,RETN service;
class BIG,OBJ,META,IDX datastore;
class QUEUE,PRIO queue;
class SPAM compute;
class PUSH external;
class AUDIT obs;
```
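The greylisting stage on the inbound path can be sketched as follows. The first delivery attempt from an unseen (client IP, envelope sender, envelope recipient) triplet is deferred with a transient reply, and a retry that arrives after a minimum delay is accepted. This is a minimal in-memory sketch: the class name `Greylist`, the 300-second default delay, and the "DEFER"/"ACCEPT" return values are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple

# (client IP, envelope sender, envelope recipient)
Triplet = Tuple[str, str, str]

class Greylist:
    """Hypothetical greylist: defers the first attempt from an unseen triplet and
    accepts a retry that arrives at least min_delay seconds later. Compliant MTAs
    retry after a 4xx (transient) reply; many bulk spam senders never do."""

    def __init__(self, min_delay: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_delay = min_delay  # assumed default; tune per deployment
        self.clock = clock          # injectable so tests can control time
        self.first_seen: Dict[Triplet, float] = {}

    def check(self, ip: str, sender: str, rcpt: str) -> str:
        """Return "DEFER" (answer with a 4xx at RCPT TO) or "ACCEPT"."""
        key = (ip, sender.lower(), rcpt.lower())
        now = self.clock()
        first = self.first_seen.setdefault(key, now)  # record first sighting
        if now - first < self.min_delay:
            return "DEFER"
        return "ACCEPT"

# Usage with a fake clock.
fake_now = [0.0]
gl = Greylist(min_delay=300, clock=lambda: fake_now[0])
print(gl.check("203.0.113.5", "a@example.org", "b@example.com"))  # DEFER
fake_now[0] = 400.0
print(gl.check("203.0.113.5", "a@example.org", "b@example.com"))  # ACCEPT
```

A production version would also expire stale triplets. It may also key on a network prefix rather than the exact IP, because large senders often retry from a different host.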
Mailbox storage
- Each user's mailbox lives in a sharded key-value store (Gmail historically on Bigtable; Yahoo on a Cassandra-like store).
- Each message is keyed by `(user_id, msg_id)`. The body is immutable, while flags and labels are mutable.
- Attachments are stored once in an object store and referenced by content hash (dedup).
- The search index is a per-user inverted index.
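The storage layout above can be sketched as follows: an immutable body, mutable per-message state, and attachments deduplicated by content hash. In-memory dicts stand in for the sharded KV and the object store. The names `MailStore`, `MessageBody`, and `MessageState`, and the choice of SHA-256 as the content hash, are hypothetical and used only for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple

Key = Tuple[str, str]  # (user_id, msg_id)

@dataclass(frozen=True)
class MessageBody:
    """Immutable after delivery; attachments are referenced by content hash."""
    headers: str
    text: str
    attachment_refs: Tuple[str, ...]

@dataclass
class MessageState:
    """Mutable per-message state (labels, read flag) that users change often."""
    labels: Set[str] = field(default_factory=set)
    read: bool = False

class MailStore:
    def __init__(self) -> None:
        self.bodies: Dict[Key, MessageBody] = {}
        self.state: Dict[Key, MessageState] = {}
        self.blobs: Dict[str, bytes] = {}    # sha256 hex -> attachment bytes
        self.refcount: Dict[str, int] = {}   # sha256 hex -> live references

    def _put_blob(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # store each distinct blob once
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest

    def deliver(self, user_id: str, msg_id: str, headers: str, text: str,
                attachments: Iterable[bytes]) -> None:
        key = (user_id, msg_id)
        if key in self.bodies:  # redelivery of the same key is a no-op
            return
        refs = tuple(self._put_blob(a) for a in attachments)
        self.bodies[key] = MessageBody(headers, text, refs)
        self.state[key] = MessageState(labels={"INBOX"})

    def add_label(self, user_id: str, msg_id: str, label: str) -> None:
        # Only the mutable state changes; the body is never rewritten.
        self.state[(user_id, msg_id)].labels.add(label)
```

Splitting the record this way means label and flag updates never rewrite the (potentially large) body. The reference count is what allows a shared blob to be garbage-collected once the last message pointing at it is deleted.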
Threading
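- The RFC 2822 (now RFC 5322) `In-Reply-To`/`References` headers form the thread graph; when they are missing, fall back to the normalized Subject.
- Gmail-style model: threads organized by labels (a message can carry many labels) vs Outlook-style folder model (each message sits in exactly one folder).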
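The header-first, subject-fallback rule can be sketched as follows. A message joins the thread of its nearest known ancestor from `References`/`In-Reply-To`. If no ancestor is known, it joins the thread of an earlier message with the same normalized subject. Otherwise it starts a new thread. This is a simplification, not the full JWZ threading algorithm and not Gmail's actual logic. `Threader`, `normalize_subject`, and the prefix list (`Re`, `Fw`, `Fwd`) are assumptions.

```python
import re
from typing import Dict, List, Optional

# Leading "Re:", "Fw:", "Fwd:" prefixes, possibly repeated ("Re: Fwd: Re: ...").
_REPLY_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)

def normalize_subject(subject: str) -> str:
    """Strip reply/forward prefixes, collapse whitespace, lowercase."""
    stripped = _REPLY_PREFIX.sub("", subject)
    return " ".join(stripped.split()).lower()

class Threader:
    def __init__(self) -> None:
        self.thread_by_msgid: Dict[str, int] = {}
        self.thread_by_subject: Dict[str, int] = {}
        self._next = 0

    def assign(self, message_id: str, subject: str,
               in_reply_to: Optional[str] = None,
               references: Optional[List[str]] = None) -> int:
        # References is ordered oldest-first and In-Reply-To is the direct
        # parent, so append it and scan from the newest ancestor backwards.
        ancestors = list(references or []) + ([in_reply_to] if in_reply_to else [])
        thread = next((self.thread_by_msgid[m] for m in reversed(ancestors)
                       if m in self.thread_by_msgid), None)
        key = normalize_subject(subject)
        if thread is None and key:
            thread = self.thread_by_subject.get(key)  # header-less fallback
        if thread is None:
            thread, self._next = self._next, self._next + 1
        self.thread_by_msgid[message_id] = thread
        if key:
            self.thread_by_subject.setdefault(key, thread)
        return thread
```

The subject fallback trades precision for recall. Two unrelated messages titled "Hello" would be merged. A stricter variant would only use the fallback when the subject carried a reply prefix.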
Anti-spam stack
- Connection-time checks (RBL/DNS blocklist lookups, per-IP rate limits, greylisting).
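- Authentication: SPF (is the connecting IP authorized to send for the envelope sender's domain?), DKIM (does the signature verify against the signing domain's published key?), DMARC (does SPF or DKIM pass for a domain aligned with the visible From domain, and what does that domain's policy say on failure?), ARC (preserves earlier authentication results across forwarders and mailing lists).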
- Content classifier (Bayes + ML + reputation).
- User feedback ("Report spam") fed back into models.
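The DMARC step of the authentication checks can be sketched as follows. Under RFC 7489, a message passes DMARC if SPF passes for a domain aligned with the From domain, or if DKIM passes for an aligned `d=` domain. Alignment is either strict (exact match) or relaxed (same organizational domain). This is a simplified sketch. `org_domain` uses a last-two-labels shortcut in place of the Public Suffix List, ARC is not modeled, and the names `AuthResults`, `dmarc_pass`, and `disposition` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

def org_domain(domain: str) -> str:
    # SIMPLIFICATION (assumption): keep the last two labels. Real DMARC uses
    # the Public Suffix List, so this is wrong for e.g. "example.co.uk".
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])

def aligned(a: str, b: str, strict: bool) -> bool:
    if strict:
        return a.lower().rstrip(".") == b.lower().rstrip(".")
    return org_domain(a) == org_domain(b)

@dataclass
class AuthResults:
    from_domain: str            # domain of the visible From header
    spf_pass: bool
    spf_domain: str             # envelope (MAIL FROM) domain SPF was checked for
    dkim_pass: bool
    dkim_domain: Optional[str]  # d= of a verifying DKIM signature, if any

def dmarc_pass(r: AuthResults, aspf_strict: bool = False,
               adkim_strict: bool = False) -> bool:
    spf_ok = r.spf_pass and aligned(r.spf_domain, r.from_domain, aspf_strict)
    dkim_ok = (r.dkim_pass and r.dkim_domain is not None
               and aligned(r.dkim_domain, r.from_domain, adkim_strict))
    return spf_ok or dkim_ok

def disposition(r: AuthResults, policy: str) -> str:
    # policy is the p= tag published by the From domain. Mapping "quarantine"
    # to the spam folder is an assumed, though common, interpretation.
    if dmarc_pass(r):
        return "deliver"
    return {"none": "deliver", "quarantine": "spam", "reject": "reject"}[policy]
```

Only one of the two mechanisms has to pass with alignment. This is why a correctly DKIM-signed message can still pass DMARC after a forwarder breaks SPF.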
Outbound reputation
- Warm up new sending IPs gradually by ramping daily volume; publish SPF records covering every sending IP and DKIM-sign every outgoing message.
- Process bounces (DSNs) and suppress addresses that fail permanently; this list hygiene protects sender reputation.
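Bounce-driven list hygiene can be sketched as a suppression list keyed on the enhanced status code carried in each DSN (RFC 3463). In that scheme, class 5 is a permanent failure, class 4 is a transient one, and class 2 is success. In this minimal sketch, a permanent failure suppresses the address immediately. Transient failures are tolerated until a streak reaches a threshold, and a success resets the streak. `SuppressionList` and the threshold of 3 are hypothetical choices.

```python
from collections import defaultdict
from typing import DefaultDict, Set

SOFT_BOUNCE_LIMIT = 3  # hypothetical threshold for consecutive transient failures

class SuppressionList:
    def __init__(self, soft_limit: int = SOFT_BOUNCE_LIMIT) -> None:
        self.soft_limit = soft_limit
        self.soft_streak: DefaultDict[str, int] = defaultdict(int)
        self.suppressed: Set[str] = set()

    def record_dsn(self, recipient: str, status: str) -> None:
        """status is an enhanced status code from the DSN, e.g. "5.1.1"."""
        rcpt = recipient.lower()
        status_class = status.split(".", 1)[0]
        if status_class == "5":          # permanent failure: stop sending now
            self.suppressed.add(rcpt)
        elif status_class == "4":        # transient failure: tolerate a few
            self.soft_streak[rcpt] += 1
            if self.soft_streak[rcpt] >= self.soft_limit:
                self.suppressed.add(rcpt)
        elif status_class == "2":        # delivered: reset the streak
            self.soft_streak.pop(rcpt, None)

    def may_send(self, recipient: str) -> bool:
        return recipient.lower() not in self.suppressed
```

For example, `5.1.1` (bad destination mailbox) suppresses the address right away. Repeated `4.2.2` (mailbox full) replies suppress it only after the streak reaches the threshold.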
Search
- Real-time indexing on receive (within seconds).
- A separate inverted index per user means a query can only touch that user's postings, which rules out cross-tenant leaks by construction.
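A per-user inverted index with indexing on receive can be sketched as follows. Each user gets an independent term-to-message postings map. `search` requires a `user_id` and returns only messages containing every query term (AND semantics). `PerUserIndex` and the `\w+` tokenizer are illustrative assumptions; ranking (e.g. BM25) is omitted.

```python
import re
from collections import defaultdict
from typing import DefaultDict, Dict, Set

_TOKEN = re.compile(r"\w+")

def tokenize(text: str) -> Set[str]:
    return {t.lower() for t in _TOKEN.findall(text)}

class PerUserIndex:
    """One postings map per user; every query is scoped by user_id, so it can
    only ever read that user's postings."""

    def __init__(self) -> None:
        self._shards: Dict[str, DefaultDict[str, Set[str]]] = {}

    def index(self, user_id: str, msg_id: str, text: str) -> None:
        # Called at delivery time so new mail is searchable immediately.
        postings = self._shards.setdefault(user_id, defaultdict(set))
        for term in tokenize(text):
            postings[term].add(msg_id)

    def search(self, user_id: str, query: str) -> Set[str]:
        postings = self._shards.get(user_id)
        terms = tokenize(query)
        if postings is None or not terms:
            return set()
        # AND semantics: intersect the postings of all query terms.
        return set.intersection(*(postings.get(t, set()) for t in terms))
```

Because the shard is selected before any term lookup, a bug in query parsing cannot widen a search to another user's mail.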
Scale notes
- Billions of mailboxes; some > 10 GB.
- Attachment dedup saves significant storage, because the same attachment recurs across forwarded chains and across messages sent to many recipients.
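- IMAP IDLE (RFC 2177) keeps each client connection open to receive new-mail notifications, so the access tier must sustain a very large number of concurrent sockets.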
Glossary & fundamentals
Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.
| Tag | Concept | What it is | Page |
|---|---|---|---|
| HLD | Search internals | inverted index, BM25, embeddings, ANN | search-internals |
| LLD | Immutability | immutable types, persistent collections | immutability |