Google Photos — Detailed#

flowchart TB
  subgraph Client[Clients]
    PH([Phone])
    WEB([Web])
    DESK[Desktop sync]
  end

  subgraph Upload[Upload Pipeline]
    PRE[Pre-signed URL]
    CHUNK[Chunked + resumable upload]
    HASH[SHA-256 + perceptual hash]
    DEDUP([Dedup against user library])
    ENC[Encrypt at rest]
  end

  subgraph Storage[Storage Tiers]
    HOT[(Hot tier - SSD)]
    COLD[(Cold tier - HDD/Tape)]
    ORIG[(Originals)]
    DERIV[(Derivatives<br/>thumbs 320/720/1080/4K)]
    CDN[Photos CDN]
  end

  subgraph Process[Processing]
    THUMB[Thumbnailer]
    EXIF([EXIF parser<br/>date, GPS, camera])
    LIVE[Live photo / motion]
    HEVC([HEIC → JPEG/WebP transcoder])
    VIDEO([Video transcoder ladder])
  end

  subgraph ML[ML Tagging]
    FACE([Face clustering<br/>per-user-only])
    OBJ[Object detection<br/>cat, dog, beach, food]
    OCR([OCR + scene text])
    LANDMARK[Landmark recognition]
    SAFE([NSFW / CSAM scan])
    EMB([Image embeddings])
  end

  subgraph Search[Search]
    INV[(Inverted index<br/>per user)]
    NLQ["Natural language query<br/>beach 2019 to labels + date"]
    PEOPLE[People search]
  end

  subgraph Albums[Albums & Memories]
    ALB[Album service]
    MEM([Memories generator<br/>this day, smart highlights])
    SHARE[Shared albums]
    PARTNER([Partner sharing])
  end

  subgraph Sync[Sync & Backup]
    SYNCSVC([Sync engine])
    NOTIF[Realtime notify]
    OFFLINE[[Offline queue]]
  end

  subgraph Meta
    META[(Metadata DB)]
    AUDIT[(Audit log)]
    QUOTA[Quota / Plan]
  end

  Client --> PRE --> CHUNK --> ORIG
  CHUNK --> HASH --> DEDUP
  CHUNK --> ENC
  ORIG --> THUMB --> DERIV --> CDN
  ORIG --> EXIF --> META
  ORIG --> HEVC --> DERIV
  ORIG --> VIDEO --> DERIV
  ORIG --> ML
  ML --> EMB --> Search
  ML --> META
  Client --> Search
  Search --> INV
  Client --> Albums
  Albums --> META
  Sync --- Client
  Sync --- ORIG
  Sync --- META
  ORIG -. age out .-> COLD

    classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
    classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
    classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
    classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
    classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
    classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
    classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
    classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
    classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
    classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
    class PH,WEB,DESK client;
    class CDN edge;
    class PRE,CHUNK,HASH,DEDUP,ENC,THUMB,LIVE,OBJ,LANDMARK,NLQ,PEOPLE,ALB,SHARE,PARTNER,NOTIF,QUOTA service;
    class HOT,COLD,ORIG,DERIV,INV,META,AUDIT datastore;
    class OFFLINE queue;
    class EXIF,HEVC,VIDEO,FACE,OCR,SAFE,EMB,MEM,SYNCSVC compute;

Storage tiering#

  • Originals are always retained, subject to the user's storage quota.
  • Derivatives are regenerated lazily when missing, which is cheaper than permanently storing every size × format combination.
  • Old photos with a low access probability age out to the cold tier (HDD/tape).
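
A minimal sketch of lazy derivative regeneration, assuming an in-memory store. `DerivativeStore`, `fake_render`, and the `(photo_id, size, format)` key are hypothetical names for illustration; a real thumbnailer would decode, resize, and re-encode the image.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical derivative key: (photo_id, size, format), e.g. ("p1", 720, "webp").
DerivKey = Tuple[str, int, str]


@dataclass
class DerivativeStore:
    """Toy derivative cache that rebuilds missing entries from the original."""

    originals: Dict[str, bytes]
    render: Callable[[bytes, int, str], bytes]  # stand-in for thumbnailer/transcoder
    derivatives: Dict[DerivKey, bytes] = field(default_factory=dict)
    renders: int = 0  # number of regenerations performed so far

    def get(self, photo_id: str, size: int, fmt: str) -> bytes:
        key = (photo_id, size, fmt)
        if key not in self.derivatives:
            # Miss: regenerate from the original, which is always retained.
            self.derivatives[key] = self.render(self.originals[photo_id], size, fmt)
            self.renders += 1
        return self.derivatives[key]

    def evict(self, photo_id: str) -> None:
        """Drop every derivative of one photo (assumed policy, e.g. for rarely viewed photos)."""
        for key in [k for k in self.derivatives if k[0] == photo_id]:
            del self.derivatives[key]


def fake_render(original: bytes, size: int, fmt: str) -> bytes:
    # Placeholder output; it only tags the first bytes of the original.
    return f"{fmt}:{size}:".encode() + original[:8]
```

Calling `store.get("p1", 720, "webp")` twice renders once; after `store.evict("p1")` the next `get` pays the render cost again, which is the trade made in exchange for not storing every variant.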

Face clustering (privacy-aware)#

  • Faces are detected and grouped only within a single user's library, never across users.
  • The user confirms or merges clusters; they are not used for cross-user identification.
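
The per-user scoping of face clustering can be sketched as below. This is an illustration under assumptions, not the production algorithm: greedy threshold clustering on cosine similarity stands in for whatever clustering is actually used, and `cluster_faces`, `cluster_library`, `merge_clusters`, and the 0.9 threshold are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_faces(faces: Dict[str, Vector], threshold: float = 0.9) -> List[List[str]]:
    """Greedy clustering: a face joins the first cluster whose seed face is similar enough."""
    clusters: List[List[str]] = []
    for face_id, emb in faces.items():
        for cluster in clusters:
            if cosine(emb, faces[cluster[0]]) >= threshold:
                cluster.append(face_id)
                break
        else:
            clusters.append([face_id])  # no match: start a new cluster
    return clusters


def cluster_library(all_faces: Dict[str, Dict[str, Vector]], user_id: str) -> List[List[str]]:
    # Only this user's faces reach the clusterer, so no cross-user comparison can occur.
    return cluster_faces(all_faces.get(user_id, {}))


def merge_clusters(clusters: List[List[str]], i: int, j: int) -> List[List[str]]:
    """User-confirmed merge of two distinct clusters i and j."""
    merged = clusters[i] + clusters[j]
    return [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
```

Two users with near-identical face embeddings still end up in separate cluster sets, because `cluster_library` never sees more than one library at a time.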

Search#

  • Combine an inverted index over labels (cat, beach) with structured filters (date, location).
  • Embedding search: text query → embedding from a CLIP-like model (shared text-image space) → ANN lookup over image embeddings.
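
A minimal sketch of the label index plus structured filter, assuming one index per user and a year-only date filter. `Photo`, `UserIndex`, and `search` are hypothetical names; the embedding/ANN path is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, FrozenSet, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Photo:
    photo_id: str
    taken: date
    labels: FrozenSet[str]


class UserIndex:
    """Per-user inverted index: label -> set of photo ids, plus a metadata lookup."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)
        self.photos: Dict[str, Photo] = {}

    def add(self, photo: Photo) -> None:
        self.photos[photo.photo_id] = photo
        for label in photo.labels:
            self.postings[label].add(photo.photo_id)

    def search(self, labels: Iterable[str], year: Optional[int] = None) -> List[str]:
        # Intersect the posting sets of all requested labels.
        sets = [self.postings.get(label, set()) for label in labels]
        candidates = set.intersection(*sets) if sets else set(self.photos)
        # Apply the structured filter (here only the capture year).
        if year is not None:
            candidates = {p for p in candidates if self.photos[p].taken.year == year}
        return sorted(candidates)
```

A query such as "beach 2019" would map to `index.search(["beach"], year=2019)` once the natural-language layer has split it into labels and a date.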

Sharing#

  • Each shared album gets its own share token (a capability URL); access is checked on every view.
  • Partner sharing: bi-directional auto-share of selected albums.
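
A capability URL can be sketched as an unguessable random token mapped to an album. The host `photos.example.com`, the `ShareService` class, and the choice to store only a SHA-256 digest of each token are assumptions for illustration.

```python
import hashlib
import secrets
from typing import Dict, Optional

BASE_URL = "https://photos.example.com/share/"  # hypothetical host


class ShareService:
    """Keeps only a digest of each token, so a leaked table does not expose live links."""

    def __init__(self) -> None:
        self._album_by_digest: Dict[str, str] = {}

    @staticmethod
    def _digest(token: str) -> str:
        return hashlib.sha256(token.encode()).hexdigest()

    def create_link(self, album_id: str) -> str:
        token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe encoding
        self._album_by_digest[self._digest(token)] = album_id
        return BASE_URL + token

    def resolve(self, url: str) -> Optional[str]:
        """Checked on every view: return the album id for a valid link, else None."""
        if not url.startswith(BASE_URL):
            return None
        return self._album_by_digest.get(self._digest(url[len(BASE_URL):]))

    def revoke(self, album_id: str) -> None:
        """Invalidate all outstanding links for one album."""
        self._album_by_digest = {
            d: a for d, a in self._album_by_digest.items() if a != album_id
        }
```

Possession of the URL is the permission, so revocation has to delete the token mapping rather than edit an access list.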

Capacity#

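  • Many billions of photos at ~5 MB each on average; every 200 billion photos is about 1 EB of originals, so the total is on the order of exabytes.
  • Dedup saves significant storage cost when users re-upload the same files from multiple devices.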
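
The arithmetic and the exact-match dedup can be sketched as follows. Photo counts passed to `originals_in_eb` are illustrative inputs, not figures from this design, and `DedupStore` is a hypothetical in-memory stand-in that keys blobs by (user, SHA-256) and leaves out perceptual hashing.

```python
import hashlib
from typing import Dict, Tuple

MB = 10**6
EB = 10**18


def originals_in_eb(photo_count: int, avg_bytes: int = 5 * MB) -> float:
    """Raw size of originals in exabytes, before dedup or replication."""
    return photo_count * avg_bytes / EB


class DedupStore:
    """Stores each distinct (user, SHA-256) blob once."""

    def __init__(self) -> None:
        self.blobs: Dict[Tuple[str, str], bytes] = {}
        self.bytes_saved = 0

    def upload(self, user_id: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        key = (user_id, digest)
        if key in self.blobs:
            # Same user, same bytes: nothing new is written.
            self.bytes_saved += len(data)
        else:
            self.blobs[key] = data
        return digest
```

Here `originals_in_eb(200 * 10**9)` evaluates to 1.0, matching the 200-billion-photos-per-exabyte rule of thumb, and a second upload of the same bytes by the same user only increments `bytes_saved`.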

Glossary & fundamentals#

Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.

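| Tag | Concept          | What it is                             | Page             |
|-----|------------------|----------------------------------------|------------------|
| HLD | CDN              | edge caching for static assets         | cdn              |
| HLD | Search internals | inverted index, BM25, embeddings, ANN  | search-internals |
| LLD | REST API design  | verbs, statuses, pagination, errors    | rest-api-design  |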