Google Drive / Dropbox — Detailed#
flowchart TB
subgraph Clients
DESK([Desktop client])
MOB([Mobile])
WEB([Web])
OFF[Office plugins]
end
subgraph Edge
LB
GW[API Gateway]
end
subgraph Sync[Sync Engine]
WATCH[Filesystem watcher / FSEvents / inotify]
DIFF[Local change diff]
CHUNK[Content-defined chunking<br/>Rabin / FastCDC, ~4 MB]
HASH[SHA-256 per chunk]
DEDUP([Block-level dedup<br/>against user + global])
COMP[Compression]
ENC([Client-side / At-rest encryption])
UP[Resumable upload]
DOWN[Range download]
end
subgraph Services
META[Metadata Service]
NSP([Namespace Service<br/>per user / shared drives])
NOTIF[[Notification fan-out]]
ACL[Permissions / Sharing]
SEARCH([Search / Tika OCR + ML])
PREVIEW[Preview / Thumbnail]
OFFLINE[[Offline sync queue]]
VER[Versioning / Snapshots]
TRASH[Trash / Restore]
end
subgraph Storage
BLOB[(Block store<br/>S3 / Colossus / Magic Pocket)]
META_DB[(Metadata SQL/Spanner)]
INDEX[(Search index)]
AUDIT[(Audit log)]
end
subgraph Realtime
LP[Long-poll / SSE / WS]
PUSH((APNS / FCM))
end
subgraph Office[Docs integration]
DOC[Docs / Sheets / Slides]
LIVE[Live collab CRDT/OT]
end
subgraph Safety
AV[Virus scan]
DLP([DLP scanner])
LEGAL[Legal hold]
end
Clients --> LB --> GW --> Sync
Sync --> BLOB
Sync --> META
META --> META_DB
META --> NOTIF
NOTIF --> LP --> Clients
NOTIF --> PUSH
ACL --- META
SEARCH --- INDEX
PREVIEW --- BLOB
VER --- META
TRASH --- META
Office --- META
Office --- LIVE
Safety --- BLOB
classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
class DESK,MOB,WEB,DEDUP,ENC,NSP client;
class GW edge;
class OFF,WATCH,DIFF,CHUNK,HASH,COMP,UP,DOWN,META,ACL,PREVIEW,VER,LP,DOC,LIVE,AV,LEGAL service;
class TRASH,META_DB,INDEX,AUDIT datastore;
class NOTIF,OFFLINE queue;
class SEARCH,DLP compute;
class BLOB storage;
class PUSH external;
Block-level sync#
- Watcher detects change.
- Chunker (FastCDC) splits file into ~4 MB content-defined chunks.
- SHA-256 per chunk → lookup in dedup table.
- Only NEW chunks are uploaded.
- Metadata service updates the `file → [chunk hashes]` mapping.
- Notifier wakes other clients on the same file/namespace.
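The steps above can be sketched as a single function. This is a minimal illustration, not the actual Drive or Dropbox protocol: `sync_file`, `block_store`, and `manifest` are hypothetical names, the chunks are assumed to come from a chunker that has already run, and a plain dict stands in for both the dedup table and the remote block store.

```python
import hashlib

def sync_file(file_id, chunks, block_store, manifest):
    """Upload only unseen chunks, then record the file's ordered chunk list.

    block_store: dict hash -> bytes (assumed stand-in for dedup table + blob store)
    manifest:    dict file_id -> [chunk hashes] (assumed stand-in for metadata)
    Returns the number of chunks that actually had to be uploaded.
    """
    hashes = []
    uploaded = 0
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()   # SHA-256 per chunk
        hashes.append(digest)
        if digest not in block_store:                # dedup lookup
            block_store[digest] = chunk              # stands in for a resumable upload
            uploaded += 1
    manifest[file_id] = hashes                       # file -> [chunk hashes]
    return uploaded
```

Syncing a second version that changes one chunk out of three uploads a single chunk; the other two are found by hash and skipped.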
Why content-defined chunking#
- An insertion or deletion in the middle of a file shifts every later byte, so fixed-size chunking produces different chunks from the edit onward and re-uploads all of them.
- Rolling-hash boundaries align to content, so only changed regions create new chunks.
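The difference can be shown with a small experiment. The chunker below is a deliberately simplified gear-hash chunker, not FastCDC itself (FastCDC adds minimum and maximum chunk sizes and normalized chunking), and the chunk sizes are tiny so the example runs quickly. All function names and parameters are illustrative assumptions.

```python
import hashlib
import random

_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]   # one random value per byte value
MASK64 = (1 << 64) - 1

def fixed_chunks(data, size=256):
    """Split at fixed offsets, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def cdc_chunks(data, avg_bits=8):
    """Cut wherever the low avg_bits of a gear rolling hash are all zero.

    The average chunk is roughly 2**avg_bits bytes. The cut decision depends
    only on the most recent bytes, so boundaries follow the content.
    """
    mask = (1 << avg_bits) - 1
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64
        if h & mask == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def new_chunk_count(old, new):
    """Number of chunks in `new` whose hash does not appear in `old`."""
    known = {hashlib.sha256(c).digest() for c in old}
    return sum(hashlib.sha256(c).digest() not in known for c in new)

original = bytes(_rng.getrandbits(8) for _ in range(64 * 1024))
edited = original[:1000] + b"!" + original[1000:]    # one-byte insertion

fixed_new = new_chunk_count(fixed_chunks(original), fixed_chunks(edited))
cdc_new = new_chunk_count(cdc_chunks(original), cdc_chunks(edited))
print(f"fixed-size: {fixed_new} new chunks, content-defined: {cdc_new} new chunks")
```

With fixed-size chunking, nearly every chunk after the insertion is new. With content-defined boundaries, only the few chunks around the edit change, because later cut points are found at the same content as before.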
Dedup scopes#
- Per-user dedup: safe by default.
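- Cross-user (global) dedup: large storage savings, but a skipped upload reveals that someone else already stored the same content, which opens a side channel. Mitigating this requires proof of ownership; Dropbox abandoned cross-user dedup years ago.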
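The two scopes differ only in what the dedup index is keyed on. The sketch below is an assumption-laden illustration (`dedup_key`, `needs_upload`, and the `scope` parameter are hypothetical names, and a Python set stands in for the dedup table); it shows why a global index leaks information: the second uploader can observe that the upload was skipped.

```python
import hashlib

def dedup_key(user_id, chunk, scope="user"):
    """Key a chunk by (user, hash) for per-user dedup, or by hash alone for global."""
    digest = hashlib.sha256(chunk).hexdigest()
    return (user_id, digest) if scope == "user" else (None, digest)

def needs_upload(index, user_id, chunk, scope="user"):
    """Return True if the chunk must be uploaded; record it in the index if so."""
    key = dedup_key(user_id, chunk, scope)
    if key in index:
        return False          # already stored within this scope
    index.add(key)
    return True

secret = b"contents of a confidential document"

per_user = set()
needs_upload(per_user, "alice", secret)                 # alice uploads
bob_uploads_per_user = needs_upload(per_user, "bob", secret)

global_index = set()
needs_upload(global_index, "alice", secret, scope="global")
bob_uploads_global = needs_upload(global_index, "bob", secret, scope="global")
```

Under per-user scope, bob still uploads his copy, so he learns nothing about alice. Under global scope, bob's upload is skipped, which tells him the content already exists somewhere in the system.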
Sharing & permissions#
- Permission models: per-file ACL, link-based (capability tokens), domain-restricted.
- Permission inheritance and explicit overrides.
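Inheritance with explicit overrides can be sketched as a walk from the item up to the root. This is one possible resolution rule (the nearest explicit entry wins), assumed for illustration; the source does not specify the rule, and `effective_role` and the `acl` layout are hypothetical. Link-based and domain-restricted sharing are not modeled here.

```python
from pathlib import PurePosixPath

def effective_role(acl, user, path):
    """Return the user's role on `path`, or None if access is denied.

    acl maps a folder or file path to {user: role}. A role of None is an
    explicit override that revokes whatever a parent folder granted.
    """
    node = PurePosixPath(path)
    while True:
        entry = acl.get(str(node), {})
        if user in entry:
            return entry[user]        # nearest explicit entry wins
        if node == node.parent:       # reached the root without a match
            return None
        node = node.parent

acl = {
    "/team": {"alice": "editor", "bob": "viewer"},
    "/team/private": {"bob": None},   # explicit override: bob loses access here
}
```

With this `acl`, bob inherits `viewer` on `/team/docs/plan.txt` but is denied on anything under `/team/private`, while alice keeps `editor` everywhere under `/team`.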
Versioning#
- Keep historical chunk lists per file version.
- Deleting a version never deletes a chunk until no remaining version references it (tracked by refcount or found by GC).
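A minimal refcount-based sketch of this rule follows. `VersionStore` and its fields are hypothetical names, and an in-memory set stands in for the block store; a production system might instead run a periodic mark-and-sweep GC.

```python
from collections import Counter

class VersionStore:
    def __init__(self):
        self.versions = {}          # (file_id, version) -> [chunk hashes]
        self.refcount = Counter()   # chunk hash -> number of versions referencing it
        self.chunks = set()         # hashes currently held in the block store

    def add_version(self, file_id, version, chunk_hashes):
        """Record a version's chunk list and take one reference per distinct chunk."""
        self.versions[(file_id, version)] = list(chunk_hashes)
        for h in set(chunk_hashes):
            self.refcount[h] += 1
            self.chunks.add(h)

    def delete_version(self, file_id, version):
        """Drop a version; free a chunk only when its last reference is gone."""
        for h in set(self.versions.pop((file_id, version))):
            self.refcount[h] -= 1
            if self.refcount[h] == 0:
                del self.refcount[h]
                self.chunks.discard(h)
```

If version 1 is `[a, b]` and version 2 is `[a, c]`, deleting version 1 frees only `b`; chunk `a` survives because version 2 still references it.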
Real-time notification#
- Long-poll or WebSocket per client; namespace = `(user_id, root_dir)`.
- Mobile fallback to push.
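The long-poll path can be sketched with a per-namespace change journal and a client-held cursor. This is an in-process stand-in for an HTTP long-poll endpoint, under the assumption that clients resume from the last cursor they saw; `NamespaceFeed`, `publish`, and `long_poll` are hypothetical names.

```python
import threading

class NamespaceFeed:
    def __init__(self):
        self._cond = threading.Condition()
        self._journal = {}    # (user_id, root_dir) -> list of change records

    def publish(self, namespace, change):
        """Append a change to the namespace journal and wake waiting pollers."""
        with self._cond:
            self._journal.setdefault(namespace, []).append(change)
            self._cond.notify_all()

    def long_poll(self, namespace, cursor, timeout=30.0):
        """Block until the namespace has entries past `cursor`, or time out.

        Returns (changes, new_cursor); changes is empty on timeout.
        """
        with self._cond:
            self._cond.wait_for(
                lambda: len(self._journal.get(namespace, [])) > cursor,
                timeout,
            )
            changes = self._journal.get(namespace, [])[cursor:]
            return changes, cursor + len(changes)

feed = NamespaceFeed()
ns = ("user-1", "/")                      # namespace = (user_id, root_dir)
feed.publish(ns, "report.docx modified")
changes, cursor = feed.long_poll(ns, 0, timeout=0.1)
```

Keying the journal by namespace means a client is woken only by changes to roots it actually syncs, and the cursor lets it catch up on anything it missed while disconnected.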
Capacity#
- Dropbox: ~700 M registered users, exabyte-scale storage (Magic Pocket).
- Drive: file data is stored on Colossus, Google's distributed file system.
Glossary & fundamentals#
Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.
| Tag | Concept | What it is | Page |
|---|---|---|---|
| HLD | API gateway / BFF | single ingress, auth, rate limit, routing | api-gateway |
| HLD | CRDTs | commutative replicated data types | crdts |
| HLD | Realtime protocols | WS / SSE / polling / gRPC streaming | realtime-protocols |
| HLD | Search internals | inverted index, BM25, embeddings, ANN | search-internals |
| LLD | REST API design | verbs, statuses, pagination, errors | rest-api-design |
| LLD | Async models | futures / async-await / coroutines / actors | async-models |
| LLD | OOP pillars | encapsulation, abstraction, inheritance, polymorphism | oop-pillars |