Functional
- Post tweet (text/media, ≤ 280 chars).
- Follow / unfollow.
- Home timeline (chronological + For-You).
- Search, hashtag/trend, mentions.
- Like, retweet, reply, DM.
- Notifications.
Non-functional
- 500M tweets/day, 5B reads/day.
- p99 home timeline open < 200 ms.
- 99.99% availability.
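The 99.99% target is easier to reason about as a downtime budget. Below is a small sketch of that arithmetic. It assumes a 365-day year and a 30-day month, and `downtime_budget_minutes` and the `MINUTES_*` constants are illustrative names.

```python
# Back-of-envelope error budget for an availability target.
# Assumption: a 365-day year and a 30-day month; leap years are ignored.
MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600
MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200


def downtime_budget_minutes(availability: float, period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Minutes of allowed unavailability in the period for the given availability fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction in [0, 1]")
    return (1.0 - availability) * period_minutes


if __name__ == "__main__":
    target = 0.9999  # 99.99% from the non-functional requirements
    print(f"per year:  {downtime_budget_minutes(target):.1f} min")
    print(f"per month: {downtime_budget_minutes(target, MINUTES_PER_MONTH):.1f} min")
```

At 99.99%, this works out to roughly 52.6 minutes per year, or about 4.3 minutes per 30-day month.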
Capacity
- 500M tweets/day → 5,800/s avg, 50k/s peak.
- 5B reads/day → 58k/s avg, 500k/s peak.
- Average follower count ~200; fan-out writes = 500M × 200 = 100B timeline inserts/day ≈ 1.2M/s avg.
- Tweet storage: 500M × 365 × 1 KB ≈ 180 TB/yr of text and metadata; media is stored separately in S3.
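The capacity figures can be re-derived with a few lines of arithmetic. This sketch uses only the inputs above. It treats 1 KB as 1,000 bytes and a day as 86,400 s, and the variable names are illustrative.

```python
# Reproduces the capacity arithmetic from the notes.
# Assumption: "1 KB" means 1,000 bytes (decimal units), as in the ~180 TB estimate.
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365

TWEETS_PER_DAY = 500_000_000
READS_PER_DAY = 5_000_000_000
AVG_FOLLOWERS = 200
TWEET_BYTES = 1_000


def per_second(per_day: float) -> float:
    """Convert a daily volume into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


write_qps = per_second(TWEETS_PER_DAY)              # tweet writes/s
read_qps = per_second(READS_PER_DAY)                # timeline reads/s
fanout_per_day = TWEETS_PER_DAY * AVG_FOLLOWERS     # push-model timeline inserts/day
fanout_qps = per_second(fanout_per_day)             # timeline inserts/s
storage_tb_per_year = TWEETS_PER_DAY * DAYS_PER_YEAR * TWEET_BYTES / 1e12

if __name__ == "__main__":
    print(f"writes:  {write_qps:,.0f}/s avg")
    print(f"reads:   {read_qps:,.0f}/s avg")
    print(f"fan-out: {fanout_per_day:,} inserts/day = {fanout_qps:,.0f}/s avg")
    print(f"storage: {storage_tb_per_year:.1f} TB/yr")
```

Averages come out at about 5,787 writes/s, 57,870 reads/s and 1,157,407 fan-out inserts/s. Raw tweet storage is 182.5 TB/yr, before replication or indexes.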
Schema
tweets(id PK, author_id, text, created_at, lang, media_ids[])
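follow(follower_id, followee_id, created_at), stored twice: one copy sharded by follower_id (whom do I follow?) and one sharded by followee_id (who follows me?).
home_timeline(user_id, [(ts, tweet_id), ...]): one Redis ZSET per user (score = ts, member = tweet_id), capped at 800 entries.
likes(tweet_id, user_id); per-tweet like counts cached in Redis.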
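Here is a toy, in-memory model of the follow table (stored in both directions) and the capped home_timeline ZSET, with push-style fan-out on write. It is a sketch, not a Redis client. `TimelineStore` and its methods are hypothetical, and a sorted Python list stands in for the ZSET. The equivalent Redis commands appear only in the docstring, and the `home_timeline:{user_id}` key name is assumed.

```python
from __future__ import annotations

import bisect
from collections import defaultdict

TIMELINE_CAP = 800  # "capped at 800" from the schema


class TimelineStore:
    """In-memory stand-in for the follow tables and the Redis home_timeline ZSETs.

    Against real Redis, each per-follower insert would roughly be:
        ZADD home_timeline:{user_id} <ts> <tweet_id>
        ZREMRANGEBYRANK home_timeline:{user_id} 0 -801   # keep the newest 800
    """

    def __init__(self, cap: int = TIMELINE_CAP) -> None:
        self.cap = cap
        # Two copies of each follow edge, one per sharding direction.
        self.following: dict[int, set[int]] = defaultdict(set)  # keyed by follower_id
        self.followers: dict[int, set[int]] = defaultdict(set)  # keyed by followee_id
        # user_id -> [(ts, tweet_id), ...] sorted oldest-first, like ZSET rank order.
        # Unlike a ZSET, re-adding the same tweet_id here would duplicate it.
        self.timelines: dict[int, list[tuple[int, int]]] = defaultdict(list)

    def follow(self, follower_id: int, followee_id: int) -> None:
        self.following[follower_id].add(followee_id)
        self.followers[followee_id].add(follower_id)

    def unfollow(self, follower_id: int, followee_id: int) -> None:
        self.following[follower_id].discard(followee_id)
        self.followers[followee_id].discard(follower_id)

    def fan_out(self, author_id: int, tweet_id: int, ts: int) -> None:
        """Push model: insert the new tweet into every follower's home timeline."""
        for user_id in self.followers[author_id]:
            timeline = self.timelines[user_id]
            bisect.insort(timeline, (ts, tweet_id))
            # Trim from the oldest end once the cap is exceeded.
            if len(timeline) > self.cap:
                del timeline[: len(timeline) - self.cap]

    def home(self, user_id: int, limit: int = 20) -> list[int]:
        """Newest-first tweet_ids, like ZREVRANGE home_timeline:{user_id} 0 limit-1."""
        newest = self.timelines[user_id][-limit:] if limit > 0 else []
        return [tweet_id for _, tweet_id in reversed(newest)]


if __name__ == "__main__":
    store = TimelineStore(cap=3)
    store.follow(1, 10)
    store.follow(2, 10)
    for ts in range(5):
        store.fan_out(author_id=10, tweet_id=100 + ts, ts=ts)
    print(store.home(1))  # [104, 103, 102]: only the newest 3 survive the cap
```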
Trade-offs
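- Push (fan-out on write) gives the fastest reads but explodes for celebrities: one tweet from an account with tens of millions of followers triggers tens of millions of timeline writes.
- Pull-only (fan-out on read) is simpler but adds read latency and concentrates load on the authors' tweet shards.
- Hybrid (push for ordinary accounts, pull for celebrities, merged at read time) is the usual production answer; the read-time merge adds complexity.
- A ranked (ML) feed drives engagement, but at a compute cost and with feedback-loop hazards.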
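Below is a minimal sketch of the hybrid approach. It makes several assumptions. `CELEB_THRESHOLD` is a hypothetical cutoff, since the notes give no number. Celebrities' recent tweets are assumed to be fetched from their own tweet shards, which a plain dict stands in for here. Ranking is not modeled. The read path is a newest-first k-way merge built on `heapq.merge`.

```python
import heapq
from typing import Dict, Iterable, List, Tuple

# Hypothetical cutoff: authors at or above it are NOT fanned out on write.
CELEB_THRESHOLD = 100_000

Entry = Tuple[int, int]  # (ts, tweet_id)


def use_push(follower_count: int, threshold: int = CELEB_THRESHOLD) -> bool:
    """Write path: push (fan out on write) only for non-celebrity authors."""
    return follower_count < threshold


def merged_home_timeline(
    pushed: List[Entry],
    celeb_recent: Dict[int, List[Entry]],
    followed_celebs: Iterable[int],
    limit: int = 20,
) -> List[int]:
    """Read path: merge the precomputed timeline with celebrities' recent tweets.

    Every input list must already be sorted newest-first, which is what
    heapq.merge(..., reverse=True) requires.
    """
    sources = [pushed] + [celeb_recent.get(a, []) for a in followed_celebs]
    out: List[int] = []
    seen = set()
    for _, tweet_id in heapq.merge(*sources, reverse=True):
        if len(out) >= limit:
            break
        if tweet_id in seen:  # same tweet reached via both paths, e.g. after a threshold change
            continue
        seen.add(tweet_id)
        out.append(tweet_id)
    return out


if __name__ == "__main__":
    pushed = [(9, 901), (5, 501), (1, 101)]
    celeb_recent = {42: [(8, 801), (2, 201)]}
    print(merged_home_timeline(pushed, celeb_recent, followed_celebs=[42], limit=4))
    # [901, 801, 501, 201]
```

The merge step is where the hybrid's extra complexity lives. A production version would also need pagination cursors across sources and, for For-You, a ranking pass on top. This sketch handles neither.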