TikTok — Detailed
flowchart TB
subgraph Creator
APP([Creator App])
CAM[Camera + Effects]
end
subgraph Upload[Upload Pipeline]
PRE[Pre-signed URL]
OBJ[(Origin S3)]
META[Video Metadata Service]
TRANS([Transcoder<br/>H.264 + H.265 + AV1<br/>multi-ladder HLS/DASH])
THUMB[Thumbnail / Cover gen]
AUD[Audio extract + fingerprint]
AI([AI moderation<br/>NSFW, violence, copyright, OCR])
HASH[Perceptual hash<br/>dedup]
end
subgraph Distribution
CDN[Global CDN]
EDGE[Edge caches]
end
subgraph Recommend[Recommendation - the core]
CG([Candidate Generation<br/>recall: ANN + collaborative])
RANK1([Coarse ranker<br/>fast DNN])
RANK2([Fine ranker<br/>heavy DNN multitask])
RERANK([Diversity + business reranker])
FRESH[Cold-start exploration]
NEG[Negative signals dedup]
end
subgraph Signals
WATCH[Watch time / completion]
REWATCH[Rewatches / loops]
LIKE[Like / share / save / follow]
SCROLL[Scroll-past time]
AUDIO[Audio engagement]
CMT[Comments / sentiment]
end
subgraph Stores
PDB[(Video metadata)]
USER[(User profile / interest)]
EMB[(Embeddings store)]
GRAPH[(Follow / interest graph)]
HIST[(Watch history)]
COMM[(Comments)]
LIKES[(Likes)]
end
subgraph ML[ML Platform]
FE[Feature store - real-time]
TR[Training pipelines]
DEPLOY([Model server fleet])
AB[A/B exp framework]
end
subgraph Social
DM[Direct Messages]
LIVE[Live streaming]
DUET[Duets / Stitches]
end
APP --> PRE --> OBJ
APP --> META --> PDB
META --> TRANS --> CDN
TRANS --> THUMB
TRANS --> AUD
TRANS --> AI --> HASH
Viewer[Viewer] --> CDN
Viewer --> CG
CG --> RANK1 --> RANK2 --> RERANK --> Viewer
Signals --> FE
FE --> CG
FE --> RANK1
FE --> RANK2
EMB --> CG
HIST --> CG
GRAPH --> CG
AB --> DEPLOY
DEPLOY --- RANK1
DEPLOY --- RANK2
Social --- Viewer
classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
class APP,Viewer client;
class CDN edge;
class CAM,PRE,META,THUMB,AUD,HASH,FRESH,NEG,WATCH,REWATCH,LIKE,SCROLL,AUDIO,CMT,FE,TR,AB,DM,LIVE,DUET service;
class PDB,USER,EMB,GRAPH,HIST,COMM,LIKES datastore;
class EDGE cache;
class TRANS,AI,CG,RANK1,RANK2,RERANK,DEPLOY compute;
class OBJ storage;
What makes TikTok different
- The feed is recommendation-first: most videos shown are not from followed accounts.
- Ranking relies heavily on watch-time and completion signals; a like is a weaker signal.
- The recall stage uses two-tower embeddings (user × video) retrieved with ANN search (e.g. HNSW or ScaNN).
- The fine ranker is a multi-task DNN with one prediction head per action (like, share, follow, complete); the head outputs are combined into a single ranking score.
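The recall and fine-ranking steps above can be sketched roughly as follows. This is a minimal illustration, not TikTok's implementation: exact cosine search stands in for an ANN index such as HNSW or ScaNN, the head outputs are combined with a plain weighted sum, and the names `recall_top_k`, `combine_heads`, and `HEAD_WEIGHTS` (including the weight values) are assumptions made for the example.

```python
import heapq
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)


def recall_top_k(user_vec, video_vecs, k):
    """Return the k video ids whose embeddings are closest to the user embedding.

    Exact cosine search over every video; a production recall stage would
    query an ANN index instead of scanning the whole corpus.
    """
    u = normalize(user_vec)
    scored = ((dot(u, normalize(v)), vid) for vid, v in video_vecs.items())
    return [vid for _, vid in heapq.nlargest(k, scored)]


# Hypothetical weights: completion dominates, a like counts least.
HEAD_WEIGHTS = {"complete": 0.5, "share": 0.2, "follow": 0.2, "like": 0.1}


def combine_heads(head_probs, weights=HEAD_WEIGHTS):
    """Collapse per-action predictions into one ranking score (weighted sum)."""
    return sum(weights[name] * head_probs[name] for name in weights)


# Toy 2-dimensional embeddings, as if produced by the video tower.
video_vecs = {
    "v1": [0.9, 0.1],
    "v2": [0.1, 0.9],
    "v3": [0.7, 0.3],
}
candidates = recall_top_k([1.0, 0.0], video_vecs, k=2)
score = combine_heads({"complete": 0.8, "share": 0.1, "follow": 0.05, "like": 0.4})
```

Keeping the weights outside the model means the balance between actions can be tuned without retraining; whether the real system combines heads this way is not stated in the source.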
Cold start
- New videos enter an exploration bucket; if early CTR is good, distribution is broadened.
- New users are served videos that are popular in their region and language until an interest profile builds up.
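A minimal sketch of both cold-start rules, assuming hypothetical thresholds (`MIN_IMPRESSIONS`, `CTR_THRESHOLD`, `BROADEN_FACTOR`, `min_history`) and a hypothetical per-video audience cap; the source does not give concrete values or describe the real mechanism.

```python
from dataclasses import dataclass

# Hypothetical tuning values, chosen only for illustration.
MIN_IMPRESSIONS = 100   # impressions needed before early CTR is trusted
CTR_THRESHOLD = 0.05    # early CTR considered "good"
BROADEN_FACTOR = 10     # how much to widen distribution on success


@dataclass
class VideoStats:
    impressions: int
    clicks: int
    audience_cap: int  # maximum number of viewers the video may reach right now


def next_audience_cap(stats: VideoStats) -> int:
    """Widen distribution only once enough impressions show a good early CTR."""
    if stats.impressions < MIN_IMPRESSIONS:
        return stats.audience_cap  # still in the exploration bucket
    ctr = stats.clicks / stats.impressions
    if ctr >= CTR_THRESHOLD:
        return stats.audience_cap * BROADEN_FACTOR
    return stats.audience_cap


def feed_source(history_len: int, min_history: int = 20) -> str:
    """New users get region/language popularity until a profile has built up."""
    return "personalized" if history_len >= min_history else "popular_by_locale"
```

For example, `next_audience_cap(VideoStats(200, 20, 500))` widens the cap because the early CTR is 10%, while a video with only 50 impressions keeps its current cap.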
Audio + copyright
- Audio fingerprinting identifies the music in a video; the match drives a license, mute, or monetization decision.
- A DMCA pipeline handles takedown notices.
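One way to picture the fingerprint lookup is the sketch below. It assumes fingerprints are fixed-width bit strings compared by Hamming distance, and that each catalog entry carries a rights-holder policy; the catalog contents, `MAX_DISTANCE`, and `match_policy` are hypothetical, and real audio fingerprinting is considerably more involved.

```python
# Hypothetical catalog: (fingerprint bits, policy chosen by the rights holder).
CATALOG = [
    (0b1111000011110000, "license"),
    (0b0000111100001111, "mute"),
    (0b1100110011001100, "monetize"),
]
MAX_DISTANCE = 2  # assumed tolerance, in differing bits, for a match


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def match_policy(fingerprint: int):
    """Return the policy of the closest catalog entry, or None if nothing is close enough."""
    best_bits, best_policy = min(CATALOG, key=lambda entry: hamming(fingerprint, entry[0]))
    if hamming(fingerprint, best_bits) <= MAX_DISTANCE:
        return best_policy
    return None
```

A fuzzy match matters because re-encoded or slightly edited audio rarely produces a bit-identical fingerprint.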
Real-time
- Watch event → Kafka → feature store → next-session feed.
- Latency budget for the feature push: under 1 minute for a recent watch to influence the next feed request.
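The event path above can be sketched in-process as follows. A `deque` stands in for the Kafka topic and a dictionary for the feature store; the `FeatureStore` class, the event fields, and `drain` are assumptions made for the illustration, and only the 60-second budget comes from the text.

```python
from collections import defaultdict, deque

FRESHNESS_BUDGET_S = 60  # "under 1 minute" from the latency budget


class FeatureStore:
    """In-memory stand-in for a real-time feature store."""

    def __init__(self):
        self._recent = defaultdict(list)  # user_id -> [(video_id, watch_ratio)]
        self.last_lag_s = {}              # user_id -> seconds from event to ingest

    def ingest(self, event, now):
        user = event["user_id"]
        self._recent[user].append((event["video_id"], event["watch_ratio"]))
        self.last_lag_s[user] = now - event["ts"]

    def recent_watches(self, user_id):
        return list(self._recent[user_id])

    def within_budget(self, user_id):
        return self.last_lag_s[user_id] <= FRESHNESS_BUDGET_S


def drain(queue, store, now):
    """Consumer loop body: move every queued watch event into the feature store."""
    while queue:
        store.ingest(queue.popleft(), now)


# Producer side: the client reports a watch event (timestamps are plain seconds).
topic = deque()
topic.append({"user_id": "u1", "video_id": "v42", "watch_ratio": 0.9, "ts": 1000.0})

store = FeatureStore()
drain(topic, store, now=1012.5)
```

After the drain, the next feed request for `u1` can read `store.recent_watches("u1")`, and `store.within_budget("u1")` reports whether the push met the one-minute budget.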
Scale
- 1B+ MAU, hundreds of millions of new videos / day.
- Multi-region storage; edge caches hold regionally popular videos, which matches the region-biased cold-start feed.
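To put the upload figure in per-second terms, a quick back-of-the-envelope calculation helps. The daily volumes below are illustrative points within "hundreds of millions", not measured numbers, and the result is an average that ignores daily peaks.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def uploads_per_second(videos_per_day: int) -> float:
    """Average upload rate implied by a daily upload volume."""
    return videos_per_day / SECONDS_PER_DAY


# Assumed sample volumes spanning "hundreds of millions of new videos / day".
for daily in (100_000_000, 300_000_000, 500_000_000):
    rate = uploads_per_second(daily)
    print(f"{daily:,} videos/day -> {rate:,.0f} uploads/s on average")
```

Under these assumptions the upload pipeline has to sustain an average in the low thousands of uploads per second, before any transcoding fan-out into multiple codecs and bitrate ladders.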
Glossary & fundamentals
Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.
| Tag | Concept | What it is | Page |
|---|---|---|---|
| HLD | CDN | edge caching for static assets | cdn |
| HLD | Pub/Sub & message brokers | topics, consumer groups, delivery semantics | pub-sub-pattern |
| HLD | Search internals | inverted index, BM25, embeddings, ANN | search-internals |
| HLD | Multi-region & DR | RTO / RPO, active-active, failover | multi-region-dr |