
TikTok — Detailed#

flowchart TB
  subgraph Creator
    APP([Creator App])
    CAM[Camera + Effects]
  end

  subgraph Upload[Upload Pipeline]
    PRE[Pre-signed URL]
    OBJ[(Origin S3)]
    META[Video Metadata Service]
    TRANS([Transcoder<br/>H.264 + H.265 + AV1<br/>multi-ladder HLS/DASH])
    THUMB[Thumbnail / Cover gen]
    AUD[Audio extract + fingerprint]
    AI([AI moderation<br/>NSFW, violence, copyright, OCR])
    HASH[Perceptual hash<br/>dedup]
  end

  subgraph Distribution
    CDN[Global CDN]
    EDGE[Edge caches]
  end

  subgraph Recommend[Recommendation - the core]
    CG([Candidate Generation<br/>recall: ANN + collaborative])
    RANK1([Coarse ranker<br/>fast DNN])
    RANK2([Fine ranker<br/>heavy DNN multitask])
    RERANK([Diversity + business reranker])
    FRESH[Cold-start exploration]
    NEG[Negative signals dedup]
  end

  subgraph Signals
    WATCH[Watch time / completion]
    REWATCH[Rewatches / loops]
    LIKE[Like / share / save / follow]
    SCROLL[Scroll-past time]
    AUDIO[Audio engagement]
    CMT[Comments / sentiment]
  end

  subgraph Stores
    PDB[(Video metadata)]
    USER[(User profile / interest)]
    EMB[(Embeddings store)]
    GRAPH[(Follow / interest graph)]
    HIST[(Watch history)]
    COMM[(Comments)]
    LIKES[(Likes)]
  end

  subgraph ML[ML Platform]
    FE[Feature store - real-time]
    TR[Training pipelines]
    DEPLOY([Model server fleet])
    AB[A/B exp framework]
  end

  subgraph Social
    DM[Direct Messages]
    LIVE[Live streaming]
    DUET[Duets / Stitches]
  end

  APP --> PRE --> OBJ
  APP --> META --> PDB
  META --> TRANS --> CDN
  TRANS --> THUMB
  TRANS --> AUD
  TRANS --> AI --> HASH
  Viewer[Viewer] --> CDN
  Viewer --> CG
  CG --> RANK1 --> RANK2 --> RERANK --> Viewer
  Signals --> FE
  FE --> CG
  FE --> RANK1
  FE --> RANK2
  EMB --> CG
  HIST --> CG
  GRAPH --> CG
  AB --> DEPLOY
  DEPLOY --- RANK1
  DEPLOY --- RANK2
  Social --- Viewer

    classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
    classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
    classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
    classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
    classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
    classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
    classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
    classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
    classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
    classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
    class APP,Viewer client;
    class CDN edge;
    class CAM,PRE,META,THUMB,AUD,HASH,FRESH,NEG,WATCH,REWATCH,LIKE,SCROLL,AUDIO,CMT,FE,TR,AB,DM,LIVE,DUET service;
    class PDB,USER,EMB,GRAPH,HIST,COMM,LIKES datastore;
    class EDGE cache;
    class TRANS,AI,CG,RANK1,RANK2,RERANK,DEPLOY compute;
    class OBJ storage;
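
The "Perceptual hash dedup" node in the upload pipeline can be illustrated with a minimal sketch. It assumes an average hash over a tiny grayscale frame and a Hamming-distance threshold; the function names, the 4x4 frame size, and `max_distance=2` are hypothetical choices for illustration, not details from this design.

```python
from typing import List

Frame = List[List[int]]  # grayscale pixel values, row by row


def average_hash(pixels: Frame) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the frame mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where the two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: int, b: int, max_distance: int = 2) -> bool:
    """Treat two frames as the same content if their hashes nearly match."""
    return hamming(a, b) <= max_distance


# A checkerboard frame, a uniformly brightened copy, and unrelated content.
original = [
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
]
reencoded = [[p + 3 for p in row] for row in original]
different = [[10] * 4, [10] * 4, [200] * 4, [200] * 4]

h_original = average_hash(original)
h_reencoded = average_hash(reencoded)
h_different = average_hash(different)
```

The brightened copy hashes identically because every pixel stays on the same side of the mean, so it is flagged as a duplicate, while the unrelated frame differs in 8 of 16 bits.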

What makes TikTok different#

  • Feed is recommendation-first: most videos shown aren't from followed accounts.
  • Heavy reliance on watch-time and completion signals; a like is a weaker signal.
  • Recall stage uses two-tower embeddings (user × video) with ANN (HNSW/ScaNN).
  • Fine ranker is a multi-task DNN with one prediction head per action (like, share, follow, complete); the head outputs are combined into a single ranking score.
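
The recall and fine-ranking stages above can be sketched roughly as follows. This is a toy under stated assumptions: exact cosine top-k stands in for an ANN index such as HNSW or ScaNN, the embeddings and head predictions are made-up numbers rather than model outputs, and the weighted sum in `combine_heads` with its `HEAD_WEIGHTS` is a hypothetical way to combine heads (the design only says the heads are combined).

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between a user embedding and a video embedding."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall(user_vec: Vector, video_vecs: Dict[str, Vector], k: int) -> List[str]:
    """Exact top-k by similarity; an ANN index would approximate this at scale."""
    ordered = sorted(
        video_vecs,
        key=lambda vid: cosine(user_vec, video_vecs[vid]),
        reverse=True,
    )
    return ordered[:k]


# Hypothetical weights: completion counts most, a like counts least.
HEAD_WEIGHTS = {"complete": 0.5, "share": 0.2, "follow": 0.2, "like": 0.1}


def combine_heads(preds: Dict[str, float]) -> float:
    """Collapse per-action probabilities into one ranking score."""
    return sum(w * preds.get(head, 0.0) for head, w in HEAD_WEIGHTS.items())


def rank(candidates: List[str], head_preds: Dict[str, Dict[str, float]]) -> List[str]:
    """Order recalled candidates by their combined multi-task score."""
    return sorted(candidates, key=lambda vid: combine_heads(head_preds[vid]), reverse=True)


user = [1.0, 0.0]
videos = {"cooking": [0.9, 0.1], "dance": [0.1, 0.9], "recipes": [0.8, 0.3]}
candidates = recall(user, videos, k=2)  # ["cooking", "recipes"]

head_preds = {
    "cooking": {"complete": 0.3, "share": 0.1, "follow": 0.0, "like": 0.9},
    "recipes": {"complete": 0.8, "share": 0.2, "follow": 0.1, "like": 0.2},
}
feed = rank(candidates, head_preds)  # ["recipes", "cooking"]
```

With these weights the video with high predicted completion outranks the one with a high predicted like rate, which mirrors the point that likes are the weaker signal.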

Cold start#

  • New videos enter an exploration bucket; if early CTR is good, distribution is broadened.
  • New users get videos that are popular in their region and language until a profile builds up.
  • Audio fingerprinting identifies music in uploads so the track can be licensed, muted, or monetized.
  • A DMCA pipeline handles copyright takedown requests.
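
The exploration-bucket rule for new videos can be sketched as a tiered promotion check. The tier sizes in `TIERS`, the `MIN_CTR` threshold, and the `VideoStats` shape are all hypothetical; the design only states that good early CTR broadens distribution.

```python
from dataclasses import dataclass


@dataclass
class VideoStats:
    impressions: int  # times the video was shown in the current tier
    clicks: int       # positive early interactions counted toward CTR


# Hypothetical audience sizes per exploration tier.
TIERS = [100, 1_000, 10_000, 100_000]
# Hypothetical CTR a video must reach to move up a tier.
MIN_CTR = 0.10


def next_tier(stats: VideoStats, current_tier: int) -> int:
    """Return the tier index the video should be distributed to next."""
    if stats.impressions < TIERS[current_tier]:
        return current_tier  # not enough data yet, keep exploring
    ctr = stats.clicks / stats.impressions
    if ctr >= MIN_CTR and current_tier + 1 < len(TIERS):
        return current_tier + 1  # early CTR is good, broaden distribution
    return current_tier


promoted = next_tier(VideoStats(impressions=100, clicks=15), current_tier=0)
held = next_tier(VideoStats(impressions=100, clicks=5), current_tier=0)
waiting = next_tier(VideoStats(impressions=40, clicks=30), current_tier=0)
```

Requiring a full tier of impressions before judging CTR avoids promoting a video on a handful of lucky early views.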

Real-time#

  • Watch event → Kafka → feature store → next session feed.
  • Latency budget for the feature push: under 1 minute from a watch event to the point where it can influence the next feed request.
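
A minimal sketch of the watch-event path, assuming an in-memory store and a plain list standing in for the Kafka topic (no Kafka client is used here). `RecentWatchFeatures`, its method names, and the two feature names are hypothetical; only the 60-second budget comes from the text above.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple

FRESHNESS_BUDGET_S = 60.0  # "< 1 min" budget for the feature push


class RecentWatchFeatures:
    """Keeps the most recent watch events per user for the next feed request."""

    def __init__(self, max_events: int = 50) -> None:
        self._events: Dict[str, Deque[Tuple[str, float]]] = defaultdict(
            lambda: deque(maxlen=max_events)
        )

    def ingest(self, user_id: str, video_id: str, completion: float,
               event_ts: float, now: Optional[float] = None) -> float:
        """Store one watch event and return its push lag in seconds."""
        now = time.time() if now is None else now
        self._events[user_id].append((video_id, completion))
        return now - event_ts

    def features(self, user_id: str) -> Dict[str, float]:
        """Features the recall and ranking stages would read on the next call."""
        events = self._events.get(user_id, ())
        if not events:
            return {"recent_count": 0.0, "avg_completion": 0.0}
        total = sum(completion for _, completion in events)
        return {"recent_count": float(len(events)),
                "avg_completion": total / len(events)}


store = RecentWatchFeatures()
# Stand-in for messages consumed from the watch-event topic.
stream = [
    {"user": "u1", "video": "v1", "completion": 1.0, "ts": 1000.0},
    {"user": "u1", "video": "v2", "completion": 0.5, "ts": 1002.0},
]
lags = [
    store.ingest(e["user"], e["video"], e["completion"], e["ts"], now=1005.0)
    for e in stream
]
within_budget = all(lag <= FRESHNESS_BUDGET_S for lag in lags)
feed_features = store.features("u1")
```

Returning the lag from `ingest` makes the budget measurable: a consumer could alert whenever a lag exceeds `FRESHNESS_BUDGET_S`.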

Scale#

  • 1B+ MAU, hundreds of millions of new videos / day.
  • Multi-region storage; edge caches keep regionally popular videos close to viewers, which suits the region-biased feed served to cold-start users.
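
A back-of-envelope calculation shows what the upload figure implies for ingest. It takes 100 million videos per day as the lower bound of "hundreds of millions" and assumes an average raw upload of 10 MB; the 10 MB figure is a hypothetical value chosen for illustration, not a number from this design.

```python
SECONDS_PER_DAY = 24 * 60 * 60

videos_per_day = 100_000_000  # lower bound of "hundreds of millions"
avg_upload_mb = 10            # hypothetical average raw upload size

# Average upload rate, ignoring daily peaks.
uploads_per_second = videos_per_day / SECONDS_PER_DAY  # about 1157

# Raw ingest bandwidth and daily storage, before transcoding ladders.
ingest_gb_per_second = uploads_per_second * avg_upload_mb / 1_000
raw_pb_per_day = videos_per_day * avg_upload_mb / 1_000_000_000  # 1.0
```

Under these assumptions the origin store grows by about 1 PB of raw uploads per day, and each extra codec or bitrate ladder from the transcoder adds to that.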

Glossary & fundamentals#

Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.

| Tag | Concept | What it is | Page |
| --- | --- | --- | --- |
| HLD | CDN | edge caching for static assets | cdn |
| HLD | Pub/Sub & message brokers | topics, consumer groups, delivery semantics | pub-sub-pattern |
| HLD | Search internals | inverted index, BM25, embeddings, ANN | search-internals |
| HLD | Multi-region & DR | RTO / RPO, active-active, failover | multi-region-dr |