YouTube — Notes
Functional
- Resumable video upload.
- Multi-resolution + codec transcode.
- Global delivery via CDN with ABR.
- Discovery: home, search, related, subscriptions, shorts.
- Engagement: likes, comments, subs, notifications, playlists.
- Live streaming + DVR.
- Ads, Premium (ad-free), monetization for creators.
- Content ID copyright management.
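
To make the ABR bullet concrete, here is a minimal sketch of throughput-based rendition selection. The ladder values, the `safety` factor, and the names `Rendition`, `LADDER`, and `pick_rendition` are illustrative assumptions, not YouTube's actual player logic; production players typically also weigh buffer level and screen size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_kbps: int  # assumed average bitrate of this rung


# Hypothetical ladder; real ladders vary by codec and content.
LADDER = [
    Rendition("240p", 400),
    Rendition("480p", 1200),
    Rendition("720p", 2500),
    Rendition("1080p", 5000),
]


def pick_rendition(throughput_kbps: float, safety: float = 0.8) -> Rendition:
    """Pick the highest rung whose bitrate fits within a safety
    fraction of measured throughput; fall back to the lowest rung."""
    budget = throughput_kbps * safety
    fitting = [r for r in LADDER if r.bitrate_kbps <= budget]
    if fitting:
        return max(fitting, key=lambda r: r.bitrate_kbps)
    return min(LADDER, key=lambda r: r.bitrate_kbps)
```

With a measured 4000 kbps and the default safety factor, the budget is 3200 kbps, so the sketch selects 720p rather than 1080p.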
Non-functional
- 2.5B MAU; 500 h of video uploaded per minute.
- p99 first frame < 2 s globally.
- 99.99% playback availability.
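
As a quick check on what the availability target allows, the sketch below converts an availability fraction into a downtime budget. The function name `downtime_budget_minutes` is made up for illustration.

```python
def downtime_budget_minutes(availability: float, period_days: float = 365.0) -> float:
    """Minutes of allowed unavailability over the period."""
    return (1.0 - availability) * period_days * 24 * 60


per_year = downtime_budget_minutes(0.9999)           # about 52.6 minutes
per_30_days = downtime_budget_minutes(0.9999, 30.0)  # about 4.3 minutes
```

At 99.99%, playback can be unavailable for under an hour per year in total, which is why regional failover has to be automatic rather than operator-driven.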
Capacity
- 500h/min uploads × 60 min × 24h = 720k h/day.
- Avg 1 GB/h raw → 720 TB/day raw → ~263 PB/yr raw.
- Transcoded ladders multiply that 3-5× in storage (~0.8-1.3 EB/yr); offset by aggressive cold-tiering.
- View traffic: tens of exabytes/year egress.
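
The upload-side arithmetic can be reproduced with a short script. It uses only the inputs stated in these notes (500 h/min, 1 GB/h raw, 3-5× for transcoded ladders) and assumes decimal units (1 TB = 1000 GB, 1 PB = 1000 TB); the variable names are illustrative.

```python
UPLOAD_HOURS_PER_MIN = 500   # from the non-functional requirements
RAW_GB_PER_HOUR = 1          # assumed average raw size
LADDER_MULTIPLIER = (3, 5)   # transcoded ladders, low and high estimate

hours_per_day = UPLOAD_HOURS_PER_MIN * 60 * 24             # video-hours per day
raw_tb_per_day = hours_per_day * RAW_GB_PER_HOUR / 1000    # GB -> TB
raw_pb_per_year = raw_tb_per_day * 365 / 1000              # TB -> PB
stored_pb_per_year = tuple(raw_pb_per_year * m for m in LADDER_MULTIPLIER)

print(f"{hours_per_day:,} h/day, {raw_tb_per_day:.0f} TB/day raw, "
      f"{raw_pb_per_year:.1f} PB/yr raw, "
      f"{stored_pb_per_year[0]:.0f}-{stored_pb_per_year[1]:.0f} PB/yr with ladders")
```

Under these inputs, raw ingest is on the order of hundreds of petabytes per year, and the ladders bring stored volume to roughly one exabyte per year before cold-tiering.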
Schema highlights
videos(id, channel_id, title, desc, ts, lang, ladders[], status)
engagements(video_id, user_id, type, ts) for watch/like
subs(viewer, channel)
embeddings(video_id, vec)
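
One way to pin down the field types is to mirror the four tables as Python dataclasses. Field names follow the notes; the types, the `VideoStatus` values, and the class names are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class VideoStatus(Enum):  # hypothetical lifecycle states
    UPLOADING = "uploading"
    TRANSCODING = "transcoding"
    READY = "ready"
    BLOCKED = "blocked"


class EngagementType(Enum):  # the notes name watch and like
    WATCH = "watch"
    LIKE = "like"


@dataclass
class Video:
    id: str
    channel_id: str
    title: str
    desc: str
    ts: datetime
    lang: str
    ladders: list[str] = field(default_factory=list)  # e.g. rendition names
    status: VideoStatus = VideoStatus.UPLOADING


@dataclass(frozen=True)
class Engagement:
    video_id: str
    user_id: str
    type: EngagementType
    ts: datetime


@dataclass(frozen=True)
class Sub:
    viewer: str
    channel: str


@dataclass
class Embedding:
    video_id: str
    vec: list[float]
```

Engagements and subs are modeled as immutable events here because they are append-heavy; video rows are mutable since `status` and `ladders` change as transcoding completes.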
Trade-offs
- Resumable chunked upload is mandatory: creator files are large and connections are spotty.
- VP9 / AV1 save egress but cost more CPU time to encode; roll out in tiers.
- Pre-push vs on-demand fill at the edge: pre-push videos that demand signals predict will be popular; cold videos are pulled on first miss.
- Optimizing for watch time raised dwell time but drew moderation backlash → balance it against safety metrics.
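
The resumable-upload trade-off can be sketched as follows, assuming an offset-based protocol in which the server is the source of truth for how many bytes are committed. `UploadSession`, `upload`, and the `fail_at` failure injection are hypothetical names for an in-memory simulation, not a real upload API.

```python
class UploadSession:
    """In-memory stand-in for server-side upload state (hypothetical)."""

    def __init__(self, total_size: int) -> None:
        self.total_size = total_size
        self.buf = bytearray()

    @property
    def offset(self) -> int:
        """Number of bytes committed so far."""
        return len(self.buf)

    @property
    def complete(self) -> bool:
        return self.offset == self.total_size

    def put_chunk(self, start: int, chunk: bytes) -> int:
        """Commit a chunk only if it starts at the current offset.
        Returns the offset after the call so the client can resync."""
        if start != self.offset:
            return self.offset
        if start + len(chunk) > self.total_size:
            raise ValueError("chunk exceeds declared size")
        self.buf.extend(chunk)
        return self.offset


def upload(data: bytes, session: UploadSession, chunk_size: int,
           fail_at: frozenset = frozenset()) -> int:
    """Send data chunk by chunk, resuming from the server's offset
    after each simulated failure. Returns the number of attempts."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    attempts = 0
    already_failed = set()
    while not session.complete:
        start = session.offset  # ask the server where to resume
        attempts += 1
        if start in fail_at and start not in already_failed:
            already_failed.add(start)  # simulate one dropped connection
            continue
        session.put_chunk(start, data[start:start + chunk_size])
    return attempts
```

A dropped connection costs one retry of a single chunk rather than a restart of the whole file, which is the property that matters for multi-gigabyte uploads.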
Refs
- "Deep Neural Networks for YouTube Recommendations" (Covington et al., RecSys '16).
- "Vitess" architecture (YouTube's MySQL scaler).
- Google's CDN / Edge engineering posts; Open Connect (Netflix) for comparison.
- ByteByteGo "Design YouTube"; Alex Xu, "System Design Interview", Vol. 1.