Google Drive / Dropbox: Notes

Functional

  • Multi-device file sync (incremental).
  • Sharing (per-file ACL, link sharing).
  • Versioning + restore.
  • Preview, search.
  • Office integration / live collab.
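Incremental sync can be illustrated with a minimal sketch: hash each block of the file and upload only the blocks the server has not seen. The fixed block size, the `block_hashes` and `blocks_to_upload` names, and the use of SHA-256 are all assumptions made for illustration; the notes do not specify a chunking scheme.

```python
import hashlib

# Tiny block size so the example is easy to follow; this value is an
# assumption, and real clients would use much larger blocks.
BLOCK_SIZE = 4


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and return each block's SHA-256 digest."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_upload(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return indices of blocks in `new` whose hash is absent from `old`."""
    known = set(block_hashes(old, block_size))
    return [
        index
        for index, digest in enumerate(block_hashes(new, block_size))
        if digest not in known
    ]


if __name__ == "__main__":
    previous = b"aaaabbbbcccc"
    edited = b"aaaaXXXXcccc"
    print(blocks_to_upload(previous, edited))
```

Fixed-size blocks are the simplest choice, but an insertion near the start of a file shifts every later block boundary; content-defined chunking is a common alternative when that matters.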

Non-functional

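  • Eventual consistency is acceptable: a change may take seconds to reach other devices.
  • Durability: eleven nines (99.999999999%), matching S3's design target.
  • Conflict resolution: rename-on-conflict by default (the losing write is kept as a separate conflicted copy).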
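Rename-on-conflict can be sketched as an optimistic version check: a commit names the version it was based on, and a stale commit is stored under a new name instead of overwriting. The `Folder` and `ServerFile` classes, the `commit` signature, and the exact "conflicted copy" naming pattern are hypothetical and chosen only for illustration.

```python
import os
from dataclasses import dataclass, field


@dataclass
class ServerFile:
    name: str
    version: int
    content: bytes


def conflicted_name(name: str, device: str, taken) -> str:
    """Build an unused 'conflicted copy' name, keeping the file extension."""
    stem, ext = os.path.splitext(name)
    candidate = f"{stem} (conflicted copy from {device}){ext}"
    n = 2
    while candidate in taken:
        candidate = f"{stem} (conflicted copy from {device} {n}){ext}"
        n += 1
    return candidate


@dataclass
class Folder:
    files: dict[str, ServerFile] = field(default_factory=dict)

    def commit(self, name: str, base_version: int, content: bytes, device: str) -> str:
        """Apply a write; return the name the content was actually stored under."""
        current = self.files.get(name)
        if current is None:
            # First upload of this name.
            self.files[name] = ServerFile(name, 1, content)
            return name
        if current.version == base_version:
            # Writer saw the latest version: fast-forward in place.
            current.version += 1
            current.content = content
            return name
        # Writer was based on a stale version: keep both copies.
        new_name = conflicted_name(name, device, self.files)
        self.files[new_name] = ServerFile(new_name, 1, content)
        return new_name
```

Neither write is lost: the user sees two files and merges by hand, which is acceptable for opaque files but not for live document editing.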

Capacity

  • Avg user: 10 GB stored, hundreds of files.
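  • Dedup savings: 25–50% within a single user's files; lower when data is encrypted client-side, since identical plaintext typically no longer yields identical stored blocks.
  • 1B users × 10 GB = 10 EB logical; at 25–50% dedup that is roughly 5–7.5 EB physical, before any replication or erasure-coding overhead.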
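The back-of-envelope numbers can be checked with a few lines of arithmetic. This sketch uses decimal units (1 GB = 10^9 bytes, 1 EB = 10^18 bytes) and treats the dedup ratio as the fraction of bytes saved; the function name `physical_bytes` is illustrative, and replication or erasure-coding overhead is deliberately left out because the notes do not give a figure for it.

```python
GB = 10**9   # decimal gigabyte
EB = 10**18  # decimal exabyte

users = 1_000_000_000
avg_bytes_per_user = 10 * GB

# Logical footprint: what users believe they are storing.
logical_bytes = users * avg_bytes_per_user


def physical_bytes(logical: int, dedup_ratio: float) -> float:
    """Bytes left after dedup, where dedup_ratio is the fraction saved."""
    return logical * (1 - dedup_ratio)


if __name__ == "__main__":
    print(f"logical: {logical_bytes / EB:.1f} EB")
    for ratio in (0.25, 0.50):
        print(f"dedup {ratio:.0%}: {physical_bytes(logical_bytes, ratio) / EB:.1f} EB")
```

The "~5 EB" figure corresponds to the optimistic 50% end of the range; at 25% the estimate is 7.5 EB.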

Schema

  • files(id, owner, parent_id, name, mime, version, size, chunks[])
  • chunks(hash PK, refcount, blob_url): one global table (dedup across users) or one table per user (dedup within a user only)
  • acl(file_id, principal_id, role)
  • events(user_id, ts, type, file_id) for change feed
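How `files.chunks[]` and `chunks.refcount` interact can be sketched with an in-memory store: writing a file increments the refcount of each chunk it references, and deleting a file decrements it, so a blob becomes reclaimable only when no file points at it. The `Store` class, its method names, and the `blob://` URL scheme are hypothetical; this models the global-table variant and omits most columns.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Chunk:
    refcount: int
    blob_url: str


@dataclass
class FileRow:
    id: int
    owner: str
    name: str
    chunks: list[str]  # ordered chunk hashes


class Store:
    def __init__(self) -> None:
        self.chunks: dict[str, Chunk] = {}  # keyed by content hash (the PK)
        self.files: dict[int, FileRow] = {}
        self._next_id = 1

    def put_file(self, owner: str, name: str, blocks: list[bytes]) -> FileRow:
        hashes = []
        for block in blocks:
            digest = hashlib.sha256(block).hexdigest()
            row = self.chunks.get(digest)
            if row is None:
                # New content: a real system would upload the blob here.
                self.chunks[digest] = Chunk(1, f"blob://{digest}")
            else:
                # Already stored: just add a reference.
                row.refcount += 1
            hashes.append(digest)
        file_row = FileRow(self._next_id, owner, name, hashes)
        self.files[file_row.id] = file_row
        self._next_id += 1
        return file_row

    def delete_file(self, file_id: int) -> None:
        file_row = self.files.pop(file_id)
        for digest in file_row.chunks:
            row = self.chunks[digest]
            row.refcount -= 1
            if row.refcount == 0:
                # No file references this chunk; the blob can be reclaimed.
                del self.chunks[digest]
```

In a real database the refcount update and the file-row write would need to be in one transaction (or reconciled by a background sweep) so that a crash cannot leak or prematurely free a chunk.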

Trade-offs

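  • Block-level dedup is the killer feature (less upload bandwidth and less storage), paid for with client CPU spent on chunking and hashing.
  • Long-poll vs WebSocket vs push notifications: combine them with fallbacks; favor reliable delivery over a single elegant mechanism.
  • Conflict resolution: rename-on-conflict is simple and works for opaque files; use OT/CRDT for live collaboration on docs.
  • Server-side vs client-side encryption: client-side breaks search and preview because the server never sees plaintext.
  • Magic Pocket / Colossus vs S3: running your own storage infrastructure can be substantially cheaper at very large scale.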
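One way to combine long-poll, WebSocket, and push without depending on any of them for correctness is to treat each as a wake-up signal only, and have the client always pull from the `events` feed using a cursor. The sketch below assumes a monotonically increasing `seq` column in place of `ts` (timestamps can collide); `ChangeFeed`, `since`, and the cursor semantics are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int      # assumed monotonic sequence number used as the cursor
    user_id: str
    type: str
    file_id: int


class ChangeFeed:
    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, user_id: str, type: str, file_id: int) -> Event:
        event = Event(len(self._events) + 1, user_id, type, file_id)
        self._events.append(event)
        return event

    def since(self, user_id: str, cursor: int, limit: int = 100):
        """Return (events after `cursor` for this user, new cursor)."""
        batch = [
            e for e in self._events
            if e.user_id == user_id and e.seq > cursor
        ][:limit]
        new_cursor = batch[-1].seq if batch else cursor
        return batch, new_cursor


if __name__ == "__main__":
    feed = ChangeFeed()
    feed.append("u1", "add", 1)
    feed.append("u2", "add", 2)
    feed.append("u1", "modify", 1)
    events, cursor = feed.since("u1", 0)
    print([e.type for e in events], cursor)
```

Because the cursor is the source of truth, a dropped WebSocket message or a missed push costs latency, not correctness: the next pull from the stored cursor catches up.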

Refs

  • Dropbox engineering blog: Magic Pocket, "Why we built our own…", FSEvents chains.
  • Google Drive APIs + Docs collab papers.
  • Casado: "Towards Internet-scale conflict-free replicated data types".
  • ByteByteGo "Design Google Drive"; Alex Xu, System Design Interview Vol 1.