Skip to content

TikTok — Notes#

Functional#

  • Short-video upload + edit + publish.
  • Personalized "For You" feed.
  • Follow / DM / comments / duets / stitches.
  • Live streams.
  • Music / sound library.

Non-functional#

  • p99 video first frame < 1 s.
  • 1B+ MAU, multi-region.
  • Real-time-ish personalization (< 1 min).

Capacity#

  • Uploads: 100M+ videos/day, avg 30 s × ~5 MB encoded = ~500 PB/yr storage with multi-ladder.
  • Views: trillions/yr.

Schema highlights#

  • videos(id, author, audio_id, ts, duration, hashtags[], language, region, status)
  • watch_events(user, video, ts, watch_ms, completion, skipped)
  • embeddings(user_vec, video_vec) in Faiss/ScaNN
  • audio_fingerprints(audio_id, hash)

Trade-offs#

  • Recommendation-first loosens dependency on social graph → cold-start friendlier.
  • Heavy ML stack = big infra cost; gains in engagement justify it.
  • Watch-time signal rewards retention but can incentivize manipulative content; needs safety brake.
  • Region segregation for compliance / latency.

Refs#

  • ByteDance engineering posts, "Monolith" (TikTok's recommendation system paper), Facebook's two-tower retrieval paper, FAISS/HNSW papers, ByteByteGo "Design TikTok", Alex Xu Vol 2.