TikTok — Notes
Functional
- Short-video upload + edit + publish.
- Personalized "For You" feed.
- Follow / DM / comments / duets / stitches.
- Live streams.
- Music / sound library.
Non-functional
- p99 video first frame < 1 s.
- 1B+ MAU, multi-region.
- Real-time-ish personalization (< 1 min).
Capacity
- Uploads: 100M+ videos/day, avg 30 s × ~5 MB encoded = ~500 PB/yr storage with multi-ladder.
- Views: trillions/yr.
Schema highlights
videos(id, author, audio_id, ts, duration, hashtags[], language, region, status)
watch_events(user, video, ts, watch_ms, completion, skipped)
embeddings(user_vec, video_vec) in Faiss/ScaNN
audio_fingerprints(audio_id, hash)
Trade-offs
- Recommendation-first loosens dependency on social graph → cold-start friendlier.
- Heavy ML stack = big infra cost; gains in engagement justify it.
- Watch-time signal rewards retention but can incentivize manipulative content; needs safety brake.
- Region segregation for compliance / latency.
Refs
- ByteDance engineering posts, "Monolith" (TikTok's recommendation system paper),
Facebook's two-tower retrieval paper, FAISS/HNSW papers,
ByteByteGo "Design TikTok", Alex Xu Vol 2.