Reddit / Quora / Stack Overflow — Notes
Functional
- Post / question with title, body, tags, media.
- Threaded comments / answers (deeply nested).
- Voting (up/down or accept).
- Listings: hot, new, top, controversial.
- Per-user feed, notifications, mentions.
- Moderation (community + global).
Non-functional
- Read-heavy 50:1; comment voting bursts.
- p99 listing < 200 ms.
- Eventual consistency on vote counts OK; user must see own vote immediately.
Capacity (Reddit-scale)
- 50M+ DAU, 1B+ posts/comments total.
- ~2B page views/day → ~25k/s avg.
- Storage: text ~2 KB avg × 1B posts = 2 TB + comments many TB.
Schema highlights
posts(id PK, subreddit_id, author, ts, title, body, score, num_comments)
comments(id, post_id, parent_id, path, author, body, score) materialized path
votes(user_id, target_id, value, ts) Cassandra; counters in Redis
subreddits(id, name, type, rules)
Trade-offs
- Materialized-path comments simplify thread queries; cost: rewrite path on move.
- Eventually consistent vote counters allow huge scale; show user-personal vote from local cache.
- Personalized vs algorithm-only: SO uses simple formulas (signals trust); Quora goes full ML.
- Sharding by subreddit localizes hot subs but can hotspot one shard.
Refs
- Reddit "Hot" ranking blog post (Amir Salihefendic).
- Quora engineering posts (Webnode, Quora Search).
- Stack Overflow architecture (Nick Craver's blog series).
- ByteByteGo "Design Reddit".