Google Drive / Dropbox — Notes
Functional
- Multi-device file sync (incremental).
- Sharing (per-file ACL, link sharing).
- Versioning + restore.
- Preview, search.
- Office integration / live collab.
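Incremental sync can be sketched as block-level diffing: split the file into blocks, hash each block, and upload only the blocks the server does not already hold. This is a minimal sketch that assumes fixed-size 4 MiB blocks and SHA-256. The block size, the hash choice and the helper names `block_hashes` / `blocks_to_upload` are assumptions for illustration, not a description of either product.

```python
import hashlib

# Assumption: fixed-size 4 MiB blocks hashed with SHA-256 (both are tunable;
# content-defined chunking is an alternative not shown here).
BLOCK_SIZE = 4 * 1024 * 1024


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split a file into fixed-size blocks and return their SHA-256 digests, in order."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_upload(data: bytes, server_has: set[str],
                     block_size: int = BLOCK_SIZE) -> dict[str, bytes]:
    """Return only the blocks whose hash the server does not already store."""
    missing: dict[str, bytes] = {}
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        h = hashlib.sha256(block).hexdigest()
        if h not in server_has and h not in missing:  # skip known and repeated blocks
            missing[h] = block
    return missing
```

After uploading the missing blocks, the client would commit the new ordered hash list as the file's `chunks[]`, so an edit that touches one block re-sends one block rather than the whole file.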
Non-functional
- Eventual consistency is acceptable: devices converge within seconds.
- Durability: eleven nines, 99.999999999% (S3-class).
- Conflict resolution: rename-on-conflict by default.
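One way to implement rename-on-conflict is an optimistic version check: each commit carries the `version` the client last synced, and a mismatch means another device committed first, so the losing edit is saved as a separate "conflicted copy". A minimal in-memory sketch, assuming a single folder keyed by file name; `Folder`, `FileEntry`, `conflicted_name` and the copy-name format are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


def conflicted_name(name: str, device: str, day: date) -> str:
    """Build a conflicted-copy name (the format here is illustrative)."""
    stem, dot, ext = name.rpartition(".")
    if not dot or not stem:  # no extension, or a dotfile like ".bashrc"
        stem, dot, ext = name, "", ""
    return f"{stem} ({device}'s conflicted copy {day.isoformat()}){dot}{ext}"


@dataclass
class FileEntry:
    version: int
    chunks: list[str]


@dataclass
class Folder:
    files: dict[str, FileEntry] = field(default_factory=dict)

    def commit(self, name: str, base_version: int, chunks: list[str],
               device: str, day: date) -> str:
        """Commit an edit made against base_version; return the name it was saved under."""
        current = self.files.get(name)
        if current is None:
            self.files[name] = FileEntry(1, chunks)  # brand-new file
            return name
        if current.version == base_version:
            # No concurrent writer since this client last synced: bump in place.
            self.files[name] = FileEntry(current.version + 1, chunks)
            return name
        # Another device committed first: keep its version, save ours beside it.
        # (Sketch assumes the conflicted-copy name is not already taken.)
        copy = conflicted_name(name, device, day)
        self.files[copy] = FileEntry(1, chunks)
        return copy
```

Nothing is lost and no merge logic is needed, which is why this default suits opaque binary files; live-collab documents need OT/CRDT instead.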
Capacity
- Avg user: 10 GB stored, hundreds of files.
- Dedup savings: 25–50% of bytes within a single user; lower if data is encrypted client-side, since the server can no longer compare plaintext blocks.
- 1B users × 10 GB = 10 EB logical; with 25–50% dedup → ~5–7.5 EB of unique data, before replication/erasure-coding overhead.
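The capacity estimate is easy to re-derive and to vary. The sketch below uses decimal units (1 EB = 10^9 GB) and treats the dedup figure as the fraction of bytes removed; `physical_eb` is a hypothetical helper name.

```python
USERS = 1_000_000_000
GB_PER_USER = 10
GB_PER_EB = 1_000_000_000  # decimal units: 1 EB = 10^9 GB

logical_eb = USERS * GB_PER_USER / GB_PER_EB  # 10.0 EB before dedup


def physical_eb(logical: float, dedup_savings: float) -> float:
    """Unique bytes left after dedup removes `dedup_savings` of the data (no replication)."""
    return logical * (1 - dedup_savings)


for savings in (0.25, 0.50):
    print(f"{savings:.0%} dedup -> {physical_eb(logical_eb, savings):.1f} EB")
```

Both results are unique bytes only; replication or erasure coding for the durability target multiplies the raw disk needed on top of this.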
Schema
files(id, owner, parent_id, name, mime, version, size, chunks[])
chunks(hash PK, refcount, blob_url) — global or per-user table
acl(file_id, principal_id, role)
events(user_id, ts, type, file_id) for change feed
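A minimal SQLite rendering of the schema, chosen only because `sqlite3` ships with Python. SQLite has no array type, so `files.chunks[]` becomes a hypothetical `file_chunks` table; the helpers `add_chunk_ref`, `drop_chunk_ref` and `changes_since` are illustrative. The sketch shows refcounted dedup (the first reference inserts a chunk, the last dereference frees it) and a cursor-based change feed. The `ON CONFLICT ... DO UPDATE` upsert requires SQLite 3.24 or newer.

```python
import sqlite3

SCHEMA = """
CREATE TABLE files (id INTEGER PRIMARY KEY, owner INTEGER, parent_id INTEGER,
                    name TEXT, mime TEXT, version INTEGER, size INTEGER);
-- Hypothetical normalization of files.chunks[]: one row per block, in order.
CREATE TABLE file_chunks (file_id INTEGER, version INTEGER, seq INTEGER, hash TEXT,
                          PRIMARY KEY (file_id, version, seq));
CREATE TABLE chunks (hash TEXT PRIMARY KEY, refcount INTEGER NOT NULL, blob_url TEXT);
CREATE TABLE acl (file_id INTEGER, principal_id INTEGER, role TEXT,
                  PRIMARY KEY (file_id, principal_id));
CREATE TABLE events (user_id INTEGER, ts INTEGER, type TEXT, file_id INTEGER);
"""


def add_chunk_ref(db: sqlite3.Connection, h: str, blob_url: str) -> None:
    """Dedup: the first reference inserts the row, later ones only bump refcount."""
    db.execute(
        "INSERT INTO chunks (hash, refcount, blob_url) VALUES (?, 1, ?) "
        "ON CONFLICT(hash) DO UPDATE SET refcount = refcount + 1",
        (h, blob_url),
    )


def drop_chunk_ref(db: sqlite3.Connection, h: str) -> bool:
    """Decrement refcount; return True when the blob is unreferenced and can be GC'd."""
    db.execute("UPDATE chunks SET refcount = refcount - 1 WHERE hash = ?", (h,))
    (rc,) = db.execute("SELECT refcount FROM chunks WHERE hash = ?", (h,)).fetchone()
    if rc == 0:
        db.execute("DELETE FROM chunks WHERE hash = ?", (h,))
        return True  # caller would now delete the blob at blob_url
    return False


def changes_since(db: sqlite3.Connection, user_id: int, cursor_ts: int) -> list:
    """Change feed: events for a user after the client's cursor.

    Lossless only if ts is strictly increasing per user; a sequence number is safer.
    """
    return db.execute(
        "SELECT ts, type, file_id FROM events WHERE user_id = ? AND ts > ? ORDER BY ts",
        (user_id, cursor_ts),
    ).fetchall()
```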
Trade-offs
- Block-level dedup is the killer feature (only new blocks cross the wire), but hashing every block costs client CPU.
- Change notification via long-poll vs WebSocket vs push: combine them, and favor reliability over elegance.
- Conflict resolution: rename-on-conflict is simple; OT/CRDT for live collab on docs.
- Server-side vs client-side encryption: client-side breaks server-side search, preview, and cross-user dedup.
- Magic Pocket (Dropbox) / Colossus (Google) vs S3: owning the storage infrastructure at exabyte scale can cost substantially less than renting it.
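Long-poll with a cursor is the reliable fallback because the server's state is a per-user sequence number rather than a stream of messages: a client that dropped its WebSocket or missed a push still sees the change on its next poll. A minimal in-process sketch using `threading.Condition`; `ChangeNotifier` and the per-user sequence counter are hypothetical.

```python
import threading


class ChangeNotifier:
    """Hypothetical in-process long-poll hub: clients block until their cursor is stale."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._seq: dict[int, int] = {}  # user_id -> latest change sequence number

    def publish(self, user_id: int) -> int:
        """Record a change for user_id, wake all waiting pollers, return the new seq."""
        with self._cond:
            self._seq[user_id] = self._seq.get(user_id, 0) + 1
            self._cond.notify_all()
            return self._seq[user_id]

    def long_poll(self, user_id: int, cursor: int, timeout: float) -> bool:
        """Return True if changes exist past `cursor`, False on timeout (client re-polls)."""
        with self._cond:
            # wait_for re-checks the predicate on every wakeup, so a change that
            # landed before the client started waiting is never missed.
            return self._cond.wait_for(
                lambda: self._seq.get(user_id, 0) > cursor, timeout
            )
```

On `True` the client would fetch the actual changes from the change feed and advance its cursor; on `False` it simply re-polls.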
Refs
- Dropbox engineering blog: Magic Pocket, "Why we built our own…", FSEvents chains.
- Google Drive APIs + Docs collab papers.
- Casado: "Towards Internet-scale conflict-free replicated data types".
- ByteByteGo "Design Google Drive", Alex Xu, System Design Interview Vol 1.