Reddit / Quora / Stack Overflow: Detailed

flowchart TB
  subgraph Clients
    Web
    Mobile
    API
  end

  subgraph Edge
    CDN
    LB
    GW[API Gateway]
  end

  subgraph Write
    POST[Post / Question Service]
    COM[Comment / Answer Service]
    VOTE[Vote Service]
    EDIT[Edit history / versioning]
    FLAG[Flag / Report]
  end

  subgraph Stores
    PDB[(Posts Postgres / Cassandra<br/>sharded by subreddit / topic)]
    CDB[(Comments tree per post)]
    VDB[(Votes - Redis counters)]
    USER[(Users)]
    META[(Communities / Tags)]
    HIST[(Edit history)]
    SEARCH[(Elasticsearch)]
  end

  subgraph Read
    LIST[Listings: hot / new / top]
    THREAD[Thread / Question view]
    USERFEED[Home feed]
    NOTIF[[Notifications / Inbox]]
  end

  subgraph Ranking[Ranking Algorithms]
    HOT[Hot - Reddit log decay]
    BEST[Best - Wilson confidence]
    CON[Controversial]
    TOP[Top in window]
    SOFT[SO score - score + age decay]
    QPER[Quora personalized DNN]
  end

  subgraph Async
    K[[Kafka]]
    SC[Score recomputer]
    SPAM([Spam classifier])
    BAD[Bad actor detector]
    EMB([Embeddings job])
    REC([Recommendations])
  end

  subgraph Cache
    R1[(Hot post cache)]
    R2[(Vote count cache)]
    R3[(Comment tree cache)]
  end

  Clients --> CDN --> LB --> GW
  GW --> POST --> PDB
  GW --> COM --> CDB
  GW --> VOTE --> VDB
  POST --> K
  COM --> K
  VOTE --> K
  K --> SC --> R2
  K --> SPAM
  K --> BAD
  K --> EMB --> REC
  POST --> SEARCH
  GW --> LIST --> R1
  R1 -. miss .-> PDB
  LIST --> Ranking
  GW --> THREAD --> R3
  R3 -. miss .-> CDB
  GW --> USERFEED --> REC
  GW --> SRCH[Search] --> SEARCH
  Ranking --> SC
  FLAG --> SPAM

    classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
    classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
    classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
    classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
    classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
    classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
    classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
    classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
    classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
    classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
    class Web,Mobile,API client;
    class CDN,LB,GW edge;
    class POST,COM,VOTE,EDIT,FLAG,LIST,THREAD,USERFEED,HOT,BEST,CON,TOP,SOFT,QPER,SC,BAD,SRCH service;
    class PDB,CDB,USER,META,HIST,SEARCH,R1,R2,R3 datastore;
    class VDB cache;
    class NOTIF,K queue;
    class SPAM,EMB,REC compute;

Ranking formulas

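  • Reddit Hot: score = sign × log10(max(|ups - downs|, 1)) + (created_at - 1134028003) / 45000, where sign is +1, 0, or -1 according to the sign of ups - downs, and created_at is the submission time in Unix seconds.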
  • Reddit Best: lower bound of the Wilson score confidence interval on the upvote fraction, used for comment ranking.
  • Stack Overflow: net score (upvotes - downvotes), plus an accepted-answer boost and an age decay.
  • Quora: personalized DNN over content + viewer features.
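
The Hot and Best formulas can be sketched in a few lines of Python. This is a minimal sketch that follows Reddit's open-sourced sort code, in which the sign multiplies the log term and time is measured from the offset 1134028003. The function names and the default `z = 1.96` (a 95% interval) are assumptions for illustration; production systems may use a different confidence level.

```python
import math
from datetime import datetime, timezone

# Offset subtracted from the submission time in Reddit's open-sourced sort code.
REDDIT_EPOCH_OFFSET = 1134028003


def hot(ups: int, downs: int, created: datetime) -> float:
    """Hot score: signed log of net votes plus a linear recency term."""
    net = ups - downs
    order = math.log10(max(abs(net), 1))
    sign = (net > 0) - (net < 0)  # +1, 0, or -1
    seconds = created.timestamp() - REDDIT_EPOCH_OFFSET
    return round(sign * order + seconds / 45000, 7)


def wilson_lower_bound(ups: int, downs: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote fraction."""
    n = ups + downs
    if n == 0:
        return 0.0
    p = ups / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)
```

Because the recency term grows by 1 every 45000 seconds (12.5 hours), a post needs roughly ten times the net votes to outrank one submitted 12.5 hours later. The Wilson bound rewards sample size: a comment at 10 up / 0 down ranks above one at 1 up / 0 down even though both have a 100% upvote fraction.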

Comment tree

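  • Stored as a materialized path, or as an adjacency list with the path encoded on each row.
  • Paginated with "load more replies" so that a large thread is never fetched in one response.
  • Vote counts are cached separately from comment bodies, so a vote is an O(1) counter update.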
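
A minimal in-memory sketch of these three points follows. It assumes zero-padded path segments so that string order matches thread order; the `Comment` type, the helper names, and the four-digit segment width are hypothetical, and a real store would run the prefix filter as an indexed range query rather than a scan.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    comment_id: int
    path: str  # materialized path, e.g. "0001.0004"
    body: str


def child_path(parent_path: str, sibling_index: int) -> str:
    """Append a zero-padded segment so string order matches tree order."""
    segment = f"{sibling_index:04d}"
    return f"{parent_path}.{segment}" if parent_path else segment


def depth(path: str) -> int:
    return path.count(".") + 1


def page_of_replies(comments, parent_path, limit):
    """Return up to `limit` direct replies and a flag for 'load more replies'."""
    prefix = parent_path + "." if parent_path else ""
    want_depth = depth(parent_path) + 1 if parent_path else 1
    direct = sorted(
        (c for c in comments
         if c.path.startswith(prefix) and depth(c.path) == want_depth),
        key=lambda c: c.path,
    )
    return direct[:limit], len(direct) > limit


# Vote counts live in their own map, so a vote never rewrites a comment row.
vote_counts: dict[int, int] = {}


def apply_vote(comment_id: int, delta: int) -> int:
    """O(1) counter update; returns the new count."""
    vote_counts[comment_id] = vote_counts.get(comment_id, 0) + delta
    return vote_counts[comment_id]
```

The boolean returned by `page_of_replies` is what a client would use to decide whether to render a "load more replies" link.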

Subreddit / community

  • Sharded by subreddit_id, so a community's posts and listings are co-located on one shard.
  • Hot listings per community precomputed every few seconds.
  • Elasticsearch index per content type (posts, comments).
  • Real-time CDC from the Postgres WAL via logical replication.
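
The sharding and precomputation bullets can be illustrated with a small sketch. It assumes a fixed shard count and simple hash-modulo routing; `NUM_SHARDS`, `shard_for`, and `precompute_hot_listing` are hypothetical names, and the periodic scheduling ("every few seconds") is left out.

```python
import hashlib
import heapq

NUM_SHARDS = 16  # hypothetical shard count


def shard_for(subreddit_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Stable hash, so every post of a community lands on the same shard."""
    digest = hashlib.sha256(subreddit_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def precompute_hot_listing(posts, limit=25):
    """posts: iterable of (post_id, hot_score). Returns post ids, best first."""
    top = heapq.nlargest(limit, posts, key=lambda p: p[1])
    return [post_id for post_id, _ in top]
```

Hash-modulo routing is the simplest choice but forces most keys to move when the shard count changes; consistent hashing or a lookup table are common alternatives when resharding is expected.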

Spam / abuse

  • ML classifier on submit + behavioral signals (vote ring detection).
  • Shadow-ban + report queues.
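
As one illustration of a behavioral signal, the sketch below flags pairs of accounts whose upvote sets overlap almost completely. This is an assumed heuristic (Jaccard similarity with hypothetical thresholds), not the classifier the design describes, and the pairwise loop is quadratic in the number of users, so it only suits small batches.

```python
from collections import defaultdict
from itertools import combinations


def suspicious_pairs(votes, min_shared=3, min_jaccard=0.8):
    """votes: iterable of (user_id, post_id) upvotes. Returns flagged user pairs."""
    posts_by_user = defaultdict(set)
    for user_id, post_id in votes:
        posts_by_user[user_id].add(post_id)

    flagged = []
    for a, b in combinations(sorted(posts_by_user), 2):
        shared = posts_by_user[a] & posts_by_user[b]
        union = posts_by_user[a] | posts_by_user[b]
        # Require both enough shared posts and a high overlap ratio.
        if len(shared) >= min_shared and len(shared) / len(union) >= min_jaccard:
            flagged.append((a, b))
    return flagged
```

Flagged pairs would feed a report queue for review rather than trigger an automatic ban, since genuine users with similar tastes can also overlap heavily.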

Glossary & fundamentals

Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.

| Tag | Concept | What it is | Page |
| --- | --- | --- | --- |
| HLD | CDN | edge caching for static assets | cdn |
| HLD | API gateway / BFF | single ingress, auth, rate limit, routing | api-gateway |
| HLD | Sharding | horizontal partitioning across nodes | database-sharding |
| HLD | Pub/Sub & message brokers | topics, consumer groups, delivery semantics | pub-sub-pattern |
| HLD | Leader/follower replication | sync/semi-sync/async replication, failover | replication-leader-follower |
| HLD | Change Data Capture | WAL/binlog tailing, outbox publishing | change-data-capture |