Reddit / Quora / Stack Overflow — Detailed
flowchart TB
subgraph Clients
Web
Mobile
API
end
subgraph Edge
CDN
LB
GW[API Gateway]
end
subgraph Write
POST[Post / Question Service]
COM[Comment / Answer Service]
VOTE[Vote Service]
EDIT[Edit history / versioning]
FLAG[Flag / Report]
end
subgraph Stores
PDB[(Posts Postgres / Cassandra<br/>sharded by subreddit / topic)]
CDB[(Comments tree per post)]
VDB[(Votes - Redis counters)]
USER[(Users)]
META[(Communities / Tags)]
HIST[(Edit history)]
SEARCH[(Elasticsearch)]
end
subgraph Read
LIST[Listings: hot / new / top]
THREAD[Thread / Question view]
USERFEED[Home feed]
NOTIF[[Notifications / Inbox]]
end
subgraph Ranking[Ranking Algorithms]
HOT[Hot - Reddit log decay]
BEST[Best - Wilson confidence]
CON[Controversial]
TOP[Top in window]
SOFT[SO score - score + age decay]
QPER[Quora personalized DNN]
end
subgraph Async
K[[Kafka]]
SC[Score recomputer]
SPAM([Spam classifier])
BAD[Bad actor detector]
EMB([Embeddings job])
REC([Recommendations])
end
subgraph Cache
R1[(Hot post cache)]
R2[(Vote count cache)]
R3[(Comment tree cache)]
end
Clients --> CDN --> LB --> GW
GW --> POST --> PDB
GW --> COM --> CDB
GW --> VOTE --> VDB
POST --> K
COM --> K
VOTE --> K
K --> SC --> R2
K --> SPAM
K --> BAD
K --> EMB --> REC
POST --> SEARCH
GW --> LIST --> R1
R1 -. miss .-> PDB
LIST --> Ranking
GW --> THREAD --> R3
R3 -. miss .-> CDB
GW --> USERFEED --> REC
GW --> SRCH[Search] --> SEARCH
Ranking --> SC
FLAG --> SPAM
classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
class Web,Mobile,API client;
class CDN,LB,GW edge;
class POST,COM,VOTE,EDIT,FLAG,LIST,THREAD,USERFEED,HOT,BEST,CON,TOP,SOFT,QPER,SC,BAD,SRCH service;
class PDB,CDB,USER,META,HIST,SEARCH datastore;
class VDB,R1,R2,R3 cache;
class NOTIF,K queue;
class SPAM,EMB,REC compute;
Ranking formulas
- Reddit Hot: `score = log10(max(|ups - downs|, 1)) + sign × seconds_since_epoch / 45000`, where `sign` is the sign of `ups - downs` and `seconds_since_epoch` is the post's creation time.
- Reddit Best: Wilson lower bound of the upvote ratio, used for comment ranking.
- Stack Overflow: `(upvotes - downvotes) + acceptance + age decay`.
- Quora: personalized DNN over content + viewer features.
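The two Reddit formulas above can be sketched in a few lines. This is a minimal illustration, not Reddit's production code: `hot_score` transcribes the Hot formula exactly as written in this section and assumes plain Unix seconds for the creation time, and `wilson_lower_bound` uses `z = 1.96` (a 95% interval), which is an assumed default rather than something this design specifies.

```python
import math
from datetime import datetime, timezone

# Constant from the formula above: 45000 s (12.5 h) of age is worth
# one order of magnitude of net votes.
DECAY_SECONDS = 45000


def hot_score(ups: int, downs: int, created: datetime) -> float:
    """Hot score, transcribed from the formula in this section."""
    net = ups - downs
    order = math.log10(max(abs(net), 1))
    sign = (net > 0) - (net < 0)  # 1, 0 or -1
    seconds = created.timestamp()  # assumption: plain Unix epoch seconds
    return order + sign * seconds / DECAY_SECONDS


def wilson_lower_bound(ups: int, downs: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote ratio."""
    n = ups + downs
    if n == 0:
        return 0.0  # no votes yet: rank at the bottom
    p = ups / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)
```

With these definitions, a post created 45000 seconds later needs ten times fewer net upvotes to tie an older one, and a comment with a single upvote (ratio 1.0) ranks below a comment with 90 up and 10 down, because the Wilson bound penalizes small samples.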
Comment tree
- Stored as a materialized path, or as an adjacency list with the path encoded on each row.
- Paginated with "load more replies" so huge trees are never fetched in one request.
- Vote counts are cached separately from the tree, so a vote is an O(1) counter update.
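The materialized-path and "load more replies" ideas above can be sketched as follows. The fixed-width, dot-separated path format, the `Comment` class, and `render_thread` are all hypothetical names chosen for illustration. Siblings are ordered by path here for simplicity; a real thread view would likely order them by rank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    # Hypothetical encoding: zero-padded segments joined by ".",
    # e.g. "0001.0002" is the second reply to top-level comment 0001.
    path: str
    body: str

    @property
    def parent_path(self) -> str:
        return self.path.rpartition(".")[0]  # "" for top-level comments


def render_thread(comments, max_children=2):
    """Return (visible comments in depth-first order,
    number of hidden direct replies per parent path)."""
    shown = {}    # parent path -> replies already shown
    hidden = {}   # parent path -> replies behind "load more replies"
    visible = []
    skip_prefix = None  # path of the most recently hidden comment
    # Sorting fixed-width paths lexicographically yields depth-first order.
    for c in sorted(comments, key=lambda c: c.path):
        if skip_prefix is not None and c.path.startswith(skip_prefix + "."):
            continue  # descendant of a hidden comment
        skip_prefix = None
        parent = c.parent_path
        if shown.get(parent, 0) >= max_children:
            hidden[parent] = hidden.get(parent, 0) + 1
            skip_prefix = c.path
            continue
        shown[parent] = shown.get(parent, 0) + 1
        visible.append(c)
    return visible, hidden
```

The design choice worth noting is that a single sort on the path column reconstructs the whole tree order, so the read path needs no recursive queries. The `hidden` counts are what a client would use to label its "load more replies" links.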
Subreddit / community
- Sharded by `subreddit_id` for locality: a community's listings and posts live on the same shard.
- Hot listings per community are precomputed every few seconds.
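A minimal sketch of both points above, under stated assumptions: `NUM_SHARDS`, `shard_for`, and `precompute_hot_listings` are hypothetical names, hash-modulo routing is only one possible placement scheme, and the input tuples assume a hot score has already been computed for each post.

```python
import hashlib
import heapq
from collections import defaultdict

NUM_SHARDS = 16  # hypothetical shard count


def shard_for(subreddit_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Route every row of a community to the same shard."""
    digest = hashlib.sha256(subreddit_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def precompute_hot_listings(posts, top_n=25):
    """posts: iterable of (subreddit_id, post_id, hot_score) tuples.

    Returns {subreddit_id: [post_id, ...]} with the highest scores first,
    which is the shape a periodic job could write into the hot post cache.
    """
    by_community = defaultdict(list)
    for subreddit_id, post_id, score in posts:
        by_community[subreddit_id].append((score, post_id))
    return {
        sid: [pid for _, pid in heapq.nlargest(top_n, scored)]
        for sid, scored in by_community.items()
    }
```

Plain modulo hashing remaps most keys when the shard count changes, so a design that expects to reshard might prefer consistent hashing or a lookup directory instead.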
Search
- Elasticsearch index per content type (posts, comments).
- Near-real-time CDC from the Postgres write-ahead log (WAL), via logical decoding.
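One way the CDC stream could feed the per-type indexes is by translating each change event into Elasticsearch bulk-API lines. The sketch below only builds the newline-delimited request body and makes no network calls. The event shape (`op`, `table`, `id`, `row`) and the index names are assumptions for illustration, not the output format of any particular CDC tool.

```python
import json

# Hypothetical mapping from source table to index name.
INDEX_FOR_TABLE = {"posts": "posts", "comments": "comments"}


def to_bulk_lines(event: dict) -> list[str]:
    """Turn one change event into bulk action (and source) lines."""
    meta = {"_index": INDEX_FOR_TABLE[event["table"]], "_id": str(event["id"])}
    if event["op"] == "delete":
        return [json.dumps({"delete": meta})]
    # Inserts and updates both become full-document "index" actions,
    # which keeps replays idempotent.
    return [json.dumps({"index": meta}), json.dumps(event["row"])]


def to_bulk_body(events) -> str:
    """Build the NDJSON body for a bulk request; it must end with a newline."""
    lines = [line for e in events for line in to_bulk_lines(e)]
    return "\n".join(lines) + "\n" if lines else ""
```

Using the row's primary key as `_id` means a redelivered event overwrites the same document instead of creating a duplicate.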
Spam / abuse
- ML classifier on submit + behavioral signals (vote ring detection).
- Shadow-ban + report queues.
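As an illustration of one behavioral signal, vote rings can be approximated by looking for pairs of accounts whose upvote sets overlap almost completely. This is a deliberately simple heuristic, not the classifier this design refers to. `suspicious_pairs` and both thresholds are hypothetical, and the pairwise loop is O(n²) in the number of accounts, so it would only be practical on a pre-filtered candidate set.

```python
from itertools import combinations


def suspicious_pairs(votes, min_shared=3, min_jaccard=0.8):
    """votes: dict mapping user id -> set of post ids that user upvoted.

    Flags pairs that share at least `min_shared` upvoted posts and whose
    Jaccard similarity (shared / union) is at least `min_jaccard`.
    """
    flagged = []
    for a, b in combinations(sorted(votes), 2):
        shared = votes[a] & votes[b]
        union = votes[a] | votes[b]
        if len(shared) >= min_shared and len(shared) / len(union) >= min_jaccard:
            flagged.append((a, b))
    return flagged
```

Flagged pairs would be inputs to the report queue or the bad actor detector rather than an automatic ban, since heavy overlap can also occur between legitimate users of a small community.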
Glossary & fundamentals
Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.
| Tag | Concept | What it is | Page |
|---|---|---|---|
| HLD | CDN | edge caching for static assets | cdn |
| HLD | API gateway / BFF | single ingress, auth, rate limit, routing | api-gateway |
| HLD | Sharding | horizontal partitioning across nodes | database-sharding |
| HLD | Pub/Sub & message brokers | topics, consumer groups, delivery semantics | pub-sub-pattern |
| HLD | Leader/follower replication | sync/semi-sync/async replication, failover | replication-leader-follower |
| HLD | Change Data Capture | WAL/binlog tailing, outbox publishing | change-data-capture |