
Google Drive / Dropbox — Detailed#

flowchart TB
  subgraph Clients
    DESK([Desktop client])
    MOB([Mobile])
    WEB([Web])
    OFF[Office plugins]
  end

  subgraph Edge
    LB
    GW[API Gateway]
  end

  subgraph Sync[Sync Engine]
    WATCH[Filesystem watcher / FSEvents / inotify]
    DIFF[Local change diff]
    CHUNK[Content-defined chunking<br/>Rabin / FastCDC, ~4 MB]
    HASH[SHA-256 per chunk]
    DEDUP([Block-level dedup<br/>against user + global])
    COMP[Compression]
    ENC([Client-side / At-rest encryption])
    UP[Resumable upload]
    DOWN[Range download]
  end

  subgraph Services
    META[Metadata Service]
    NSP([Namespace Service<br/>per user / shared drives])
    NOTIF[[Notification fan-out]]
    ACL[Permissions / Sharing]
    SEARCH([Search / Tika OCR + ML])
    PREVIEW[Preview / Thumbnail]
    OFFLINE[[Offline sync queue]]
    VER[Versioning / Snapshots]
    TRASH[Trash / Restore]
  end

  subgraph Storage
    BLOB[(Block store<br/>S3 / Colossus / Magic Pocket)]
    META_DB[(Metadata SQL/Spanner)]
    INDEX[(Search index)]
    AUDIT[(Audit log)]
  end

  subgraph Realtime
    LP[Long-poll / SSE / WS]
    PUSH((APNS / FCM))
  end

  subgraph Office[Docs integration]
    DOC[Docs / Sheets / Slides]
    LIVE[Live collab CRDT/OT]
  end

  subgraph Safety
    AV[Virus scan]
    DLP([DLP scanner])
    LEGAL[Legal hold]
  end

  Clients --> LB --> GW --> Sync
  Sync --> BLOB
  Sync --> META
  META --> META_DB
  META --> NOTIF
  NOTIF --> LP --> Clients
  NOTIF --> PUSH
  ACL --- META
  SEARCH --- INDEX
  PREVIEW --- BLOB
  VER --- META
  TRASH --- META
  Office --- META
  DOC --- LIVE
  Safety --- BLOB

    classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
    classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
    classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
    classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
    classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
    classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
    classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
    classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
    classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
    classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
    class DESK,MOB,WEB,DEDUP,ENC,NSP client;
    class GW edge;
    class OFF,WATCH,DIFF,CHUNK,HASH,COMP,UP,DOWN,META,ACL,PREVIEW,VER,LP,DOC,LIVE,AV,LEGAL service;
    class TRASH,META_DB,INDEX,AUDIT datastore;
    class NOTIF,OFFLINE queue;
    class SEARCH,DLP compute;
    class BLOB storage;
    class PUSH external;

Block-level sync#

  1. Watcher detects change.
  2. Chunker (FastCDC) splits file into ~4 MB content-defined chunks.
  3. SHA-256 per chunk → lookup in dedup table.
  4. Only chunks whose hashes are missing from the dedup table are uploaded.
  5. Metadata service updates file → [chunk hashes] mapping.
  6. Notifier wakes other clients on same file/namespace.
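
The steps above can be sketched as a small in-memory model. This is an illustration under assumptions, not either product's implementation: `SyncServer`, `sync_file`, and `split_fixed` are hypothetical names, the block store and metadata service are plain dictionaries, and the placeholder chunker cuts at fixed offsets only to keep the example short (a real client would plug in a content-defined chunker).

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SyncServer:
    """Hypothetical in-memory stand-in for the block store and metadata service."""
    blocks: dict = field(default_factory=dict)     # chunk hash -> chunk bytes
    manifests: dict = field(default_factory=dict)  # file path -> ordered chunk hashes


def split_fixed(data: bytes, size: int = 4) -> list:
    """Placeholder chunker (fixed offsets); swap in a content-defined chunker."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def sync_file(server: SyncServer, path: str, data: bytes, chunker=split_fixed) -> int:
    """Upload only chunks the server has not seen, then record the chunk list.

    Returns the number of chunks actually uploaded.
    """
    uploaded = 0
    manifest = []
    for chunk in chunker(data):
        digest = hashlib.sha256(chunk).hexdigest()
        manifest.append(digest)
        if digest not in server.blocks:      # dedup lookup
            server.blocks[digest] = chunk    # "upload" the new chunk
            uploaded += 1
    server.manifests[path] = manifest        # file -> [chunk hashes]
    return uploaded


server = SyncServer()
first = sync_file(server, "a.txt", b"aaaabbbbcccc")   # all three chunks are new
second = sync_file(server, "a.txt", b"aaaabbbbdddd")  # only the last chunk changed
print(first, second)
```

The file can be rebuilt at any time by concatenating `server.blocks[h]` for each hash in its manifest, which is why the metadata update in step 5 only needs the ordered hash list.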

Why content-defined chunking#

  • Inserting or deleting bytes mid-file shifts every later byte, so fixed-size chunking changes (and re-uploads) every chunk after the edit.
  • Rolling-hash boundaries align to content, so only changed regions create new chunks.
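
A toy comparison makes the difference concrete. The sketch below uses a simplified gear-style rolling hash, which is the family FastCDC builds on, but it is not FastCDC itself: there is no normalized chunking, the cut mask is just the low bits, and chunk sizes are tens of bytes rather than ~4 MB. All names (`cdc_chunks`, `fixed_chunks`, `new_chunk_count`) and parameters are assumptions made for illustration.

```python
import hashlib
import random

_table_rng = random.Random(42)
GEAR = [_table_rng.getrandbits(64) for _ in range(256)]  # fixed random value per byte
MASK64 = (1 << 64) - 1


def cdc_chunks(data: bytes, min_size: int = 16, avg_bits: int = 6, max_size: int = 256) -> list:
    """Cut where the rolling hash's low bits are zero, so boundaries follow content."""
    cut_mask = (1 << avg_bits) - 1
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64
        length = i - start + 1
        if (length >= min_size and (h & cut_mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0  # begin a new chunk with a fresh hash
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def fixed_chunks(data: bytes, size: int = 80) -> list:
    """Cut at fixed offsets regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def new_chunk_count(old: bytes, new: bytes, chunker) -> int:
    """Count chunks of `new` whose hash does not appear among the chunks of `old`."""
    known = {hashlib.sha256(c).digest() for c in chunker(old)}
    return sum(hashlib.sha256(c).digest() not in known for c in chunker(new))


data_rng = random.Random(0)
original = bytes(data_rng.getrandbits(8) for _ in range(20_000))
edited = original[:10_000] + b"!" + original[10_000:]  # insert one byte mid-file

fixed_new = new_chunk_count(original, edited, fixed_chunks)
cdc_new = new_chunk_count(original, edited, cdc_chunks)
print(f"fixed-size: {fixed_new} new chunks, content-defined: {cdc_new} new chunks")
```

With fixed offsets, every chunk from the insertion point onward is new. With content-defined boundaries, the chunker resynchronizes shortly after the edit, so only the chunks around the inserted byte need uploading.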

Dedup scopes#

  • Per-user dedup: safe by default.
  • Cross-user (global) dedup: large savings, but a skipped upload reveals that someone else already stored the same data (a side channel), so proof of ownership is needed; Dropbox abandoned cross-user dedup years ago.
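
The two scopes differ only in the key used for the dedup lookup. A minimal sketch, assuming a hypothetical `DedupIndex` class backed by an in-memory set:

```python
import hashlib


class DedupIndex:
    """Hypothetical dedup table; `scope` selects per-user or global keys."""

    def __init__(self, scope: str):
        if scope not in ("user", "global"):
            raise ValueError("scope must be 'user' or 'global'")
        self.scope = scope
        self._seen = set()

    def needs_upload(self, user_id: str, chunk: bytes) -> bool:
        """Return True if the chunk must be uploaded, and record it as stored."""
        digest = hashlib.sha256(chunk).hexdigest()
        # Per-user scope includes the owner in the key; global scope does not.
        key = (user_id, digest) if self.scope == "user" else digest
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


chunk = b"quarterly-report-contents"

per_user = DedupIndex("user")
alice_user = per_user.needs_upload("alice", chunk)   # True: first copy for alice
bob_user = per_user.needs_upload("bob", chunk)       # True: bob's scope is separate

shared = DedupIndex("global")
alice_global = shared.needs_upload("alice", chunk)   # True: first copy anywhere
bob_global = shared.needs_upload("bob", chunk)       # False: bob learns it already exists
```

The `False` returned to bob under global scope is the observable signal behind the side channel: the client can tell that some other account already holds identical data.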

Sharing & permissions#

  • Permission models: per-file ACL, link-based (capability tokens), domain-restricted.
  • Permissions inherit from parent folders; an explicit entry on a file or subfolder overrides the inherited one.
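
One way to resolve inheritance with overrides is to walk from the item up to the root and stop at the nearest explicit entry. The sketch below assumes that "nearest explicit entry wins" rule, a hypothetical `Node` type, and three illustrative role names; real systems layer groups, link tokens, and domain restrictions on top.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical file or folder with an optional explicit ACL."""
    name: str
    parent: Optional["Node"] = None
    acl: dict = field(default_factory=dict)  # principal -> role ("viewer", "editor", ...)


def effective_role(node: Node, principal: str) -> str:
    """Return the role from the nearest ancestor (or the node) with an explicit entry."""
    current = node
    while current is not None:
        if principal in current.acl:
            return current.acl[principal]  # explicit entry overrides anything above it
        current = current.parent
    return "none"  # no entry anywhere on the path to the root


team = Node("team", acl={"bob": "editor"})
finance = Node("finance", parent=team, acl={"bob": "viewer"})  # explicit override
budget = Node("budget.xlsx", parent=finance)
notes = Node("notes.txt", parent=team)

print(effective_role(budget, "bob"))  # inherited from the finance override
print(effective_role(notes, "bob"))   # inherited from team
print(effective_role(budget, "eve"))  # no entry on the path
```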

Versioning#

  • Keep historical chunk lists per file version.
  • A chunk is deleted only after no version references it (tracked by refcount or GC).
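
The refcount variant can be sketched as follows. `VersionStore` is a hypothetical in-memory model: a set of hashes stands in for the block store, and a chunk is freed only when the last version referencing it is removed.

```python
from collections import Counter


class VersionStore:
    """Hypothetical store of per-version chunk lists with reference counting."""

    def __init__(self):
        self.versions = {}         # (path, version) -> ordered chunk hashes
        self.refcount = Counter()  # chunk hash -> number of references
        self.blocks = set()        # stand-in for the block store

    def add_version(self, path: str, version: int, chunk_hashes: list) -> None:
        self.versions[(path, version)] = list(chunk_hashes)
        for h in chunk_hashes:
            self.refcount[h] += 1
            self.blocks.add(h)

    def delete_version(self, path: str, version: int) -> list:
        """Drop one version and return the chunk hashes that became unreferenced."""
        freed = []
        for h in self.versions.pop((path, version)):
            self.refcount[h] -= 1
            if self.refcount[h] == 0:
                del self.refcount[h]
                self.blocks.discard(h)  # safe to delete: no version points here
                freed.append(h)
        return freed


store = VersionStore()
store.add_version("report.doc", 1, ["a", "b"])
store.add_version("report.doc", 2, ["a", "c"])  # chunk "a" is shared by both versions

freed_v1 = store.delete_version("report.doc", 1)  # only "b" loses its last reference
freed_v2 = store.delete_version("report.doc", 2)  # now "a" and "c" can go
```

A GC-based design would skip the counters and instead periodically mark every hash reachable from a live version, then sweep the rest.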

Real-time notification#

  • Each client holds one long-poll or WebSocket connection and subscribes per namespace, where namespace = (user_id, root_dir).
  • Mobile clients fall back to push notifications (APNS / FCM).
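
The long-poll side can be sketched with a per-namespace change journal and a cursor. This is an in-process model under assumptions: `ChangeFeed`, `publish`, and `poll` are hypothetical names, and a `threading.Condition` stands in for the held-open HTTP request.

```python
import threading
from collections import defaultdict


class ChangeFeed:
    """Hypothetical per-namespace journal that clients long-poll with a cursor."""

    def __init__(self):
        self._cond = threading.Condition()
        self._journal = defaultdict(list)  # namespace -> ordered change events

    def publish(self, namespace, event) -> None:
        """Append an event and wake every waiting poller."""
        with self._cond:
            self._journal[namespace].append(event)
            self._cond.notify_all()

    def poll(self, namespace, cursor: int, timeout: float = 30.0):
        """Block until events exist past `cursor` or the timeout expires.

        Returns (events, new_cursor); events is empty on timeout.
        """
        with self._cond:
            self._cond.wait_for(
                lambda: len(self._journal[namespace]) > cursor, timeout=timeout
            )
            events = self._journal[namespace][cursor:]
            return events, cursor + len(events)


feed = ChangeFeed()
namespace = ("user-1", "/")  # (user_id, root_dir)

# Another client's upload lands shortly after this client starts polling.
writer = threading.Timer(0.05, feed.publish, args=(namespace, "a.txt changed"))
writer.start()
events, cursor = feed.poll(namespace, cursor=0, timeout=2.0)
writer.join()

idle_events, idle_cursor = feed.poll(namespace, cursor=cursor, timeout=0.01)
```

Because the client sends its cursor on every poll, a dropped connection loses nothing: the next poll returns whatever was published in the meantime.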

Capacity#

  • Dropbox: ~700 M registered users, exabyte-scale storage on Magic Pocket, its in-house block store.
  • Drive: tightly integrated with Colossus, Google's distributed file system.

Glossary & fundamentals#

Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.

| Tag | Concept | What it is | Page |
|-----|---------|------------|------|
| HLD | API gateway / BFF | single ingress, auth, rate limit, routing | api-gateway |
| HLD | CRDTs | conflict-free replicated data types | crdts |
| HLD | Realtime protocols | WS / SSE / polling / gRPC streaming | realtime-protocols |
| HLD | Search internals | inverted index, BM25, embeddings, ANN | search-internals |
| LLD | REST API design | verbs, statuses, pagination, errors | rest-api-design |
| LLD | Async models | futures / async-await / coroutines / actors | async-models |
| LLD | OOP pillars | encapsulation, abstraction, inheritance, polymorphism | oop-pillars |