Capacity Planning — Detailed#

Numbers every developer should know#

Operation	Time
L1 cache reference	0.5 ns
Branch mispredict	5 ns
L2 cache reference	7 ns
Mutex lock/unlock	25 ns
Main memory reference	100 ns
Compress 1 KB with Zippy (Snappy)	3,000 ns (3 µs)
Send 1 KB over 1 Gbps net	10,000 ns (10 µs)
Read 4 KB random from SSD	150,000 ns (150 µs)
Read 1 MB sequential from RAM	250,000 ns (250 µs)
Round trip in same DC	500,000 ns (0.5 ms)
Read 1 MB sequential from SSD	1,000,000 ns (1 ms)
HDD seek	10,000,000 ns (10 ms)
Cross-continent round trip	150,000,000 ns (150 ms)

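(Memorise the orders of magnitude. Interviewers often probe the gap between RAM and disk: a main memory reference at 100 ns is roughly 1,500× faster than a random SSD read and 100,000× faster than an HDD seek.)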
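One way to use the table is to add up the latency of a request path. The sketch below does that for a made-up path; the helper `budget_ns` and the operation counts are illustrative assumptions, and only the per-operation latencies come from the table.

```python
# Latencies in nanoseconds, taken from the table above.
LATENCY_NS = {
    "main_memory_reference": 100,
    "ssd_random_read_4k": 150_000,
    "same_dc_round_trip": 500_000,
    "hdd_seek": 10_000_000,
    "cross_continent_round_trip": 150_000_000,
}


def budget_ns(steps: dict[str, int]) -> int:
    """Total latency of a strictly sequential path, given {operation: count}."""
    return sum(LATENCY_NS[op] * count for op, count in steps.items())


# Hypothetical request path (an assumption for illustration only).
request_path = {
    "same_dc_round_trip": 2,
    "ssd_random_read_4k": 3,
    "main_memory_reference": 1_000,
}

total_ns = budget_ns(request_path)
print(f"{total_ns / 1e6:.2f} ms")  # 1.55 ms
```

The two network hops dominate this budget, which is typical: one same-DC round trip costs as much as 5,000 main memory references.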

Common throughput envelopes#

Resource	Order of magnitude
One commodity server CPU	~100k simple req/s
10 GbE NIC	~1 GB/s (8 Gbps usable)
NVMe SSD	1–7 GB/s, 1M IOPS
Postgres single primary	5–20k writes/s, much more reads via replicas
Redis single node	100k–1M ops/s
Kafka partition	10 MB/s comfortably
One WS gateway	100k concurrent conns
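These envelopes turn into unit counts by dividing load by per-unit capacity and rounding up. The sketch below is a rough illustration: `units_needed` is a hypothetical helper, the loads are made-up inputs, and the capacities use the low end of each range in the table.

```python
import math

# Per-unit envelopes from the table above (low end where a range is given).
KAFKA_PARTITION_MB_PER_S = 10
POSTGRES_PRIMARY_WRITES_PER_S = 5_000
REDIS_NODE_OPS_PER_S = 100_000


def units_needed(load: float, per_unit_capacity: float) -> int:
    """Smallest whole number of units whose combined capacity covers the load."""
    if per_unit_capacity <= 0:
        raise ValueError("per_unit_capacity must be positive")
    return math.ceil(load / per_unit_capacity)


# Hypothetical loads, chosen only to exercise the helper.
print(units_needed(35, KAFKA_PARTITION_MB_PER_S))           # 4 partitions
print(units_needed(35_000, POSTGRES_PRIMARY_WRITES_PER_S))  # 7 primaries (shards)
print(units_needed(250_000, REDIS_NODE_OPS_PER_S))          # 3 nodes
```

These counts cover raw capacity only; failover headroom comes on top.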

Little's Law#

L = λ · W

L = concurrent items in system.
λ = arrival rate (req/s).
W = average time in system (seconds).

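If average latency W = 200 ms and the target is 1,000 req/s, concurrency L = 1,000 × 0.2 = 200 in-flight requests. Little's Law uses the mean; plugging in p99 instead overestimates the average concurrency.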
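A minimal sketch of Little's Law in both directions: concurrency from arrival rate, and sustainable throughput from a fixed concurrency limit. The function names and the pool size of 50 are assumptions for illustration.

```python
def concurrency(arrival_rate_per_s: float, avg_time_in_system_s: float) -> float:
    """Little's Law: L = lambda * W."""
    return arrival_rate_per_s * avg_time_in_system_s


def max_throughput(concurrency_limit: float, avg_time_in_system_s: float) -> float:
    """Rearranged: lambda = L / W, e.g. for a fixed-size worker or connection pool."""
    return concurrency_limit / avg_time_in_system_s


# 1,000 req/s at 200 ms average latency -> about 200 requests in flight.
print(concurrency(1_000, 0.2))

# A hypothetical pool of 50 workers at 200 ms each sustains about 250 req/s.
print(max_throughput(50, 0.2))
```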

The 5-step back-of-envelope#

flowchart LR
  U[1 - User volume]
  Q[2 - QPS - avg + peak]
  S[3 - Storage / yr]
  B[4 - Bandwidth]
  C[5 - Servers needed]
  U --> Q --> S --> B --> C

  classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
  classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
  class U client;
  class Q,S,B,C service;

Worked example — design a chat app#

Users: 200M MAU. ~25M DAU (≈ 12.5% of MAU).
Activity: 30 msgs / user / day → 750M msgs/day.
Avg QPS = 750M / 86,400 ≈ 8,700/s.
Peak ≈ 3–5× avg; taking 4× gives ≈ 35,000/s.
Storage: 1 KB / msg × 750M msgs/day × 365 days ≈ 270 TB/year. With media + indices, ×3 → ~800 TB/yr.
Bandwidth: 35k peak × 1 KB = 35 MB/s ingest. Egress (fan-out 1:1 chat) similar.
Servers: WS gateways for connections. Worst case, all 25M DAU are connected at once: 25M / 100k conns per box = 250 boxes (with HA + headroom → 350).
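The arithmetic above can be reproduced in a few lines. This is a sketch under stated assumptions: `estimate_chat` and `Estimate` are hypothetical names, 1 KB is treated as 1,000 bytes, the peak factor of 4 is one point in the 3–5× range, and the gateway count assumes every DAU is connected at the same time.

```python
import math
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Estimate:
    avg_qps: float
    peak_qps: float
    storage_tb_per_year: float
    peak_ingest_mb_per_s: float
    gateways: int


def estimate_chat(dau: int, msgs_per_user_per_day: int, msg_bytes: int,
                  peak_factor: float, conns_per_gateway: int) -> Estimate:
    msgs_per_day = dau * msgs_per_user_per_day
    avg_qps = msgs_per_day / SECONDS_PER_DAY
    peak_qps = avg_qps * peak_factor
    # Text messages only; media, indices and replication are not included here.
    storage_tb = msgs_per_day * msg_bytes * 365 / 1e12
    ingest_mb_per_s = peak_qps * msg_bytes / 1e6
    # Worst case: every daily active user holds a connection simultaneously.
    gateways = math.ceil(dau / conns_per_gateway)
    return Estimate(avg_qps, peak_qps, storage_tb, ingest_mb_per_s, gateways)


e = estimate_chat(dau=25_000_000, msgs_per_user_per_day=30, msg_bytes=1_000,
                  peak_factor=4, conns_per_gateway=100_000)
print(f"avg {e.avg_qps:,.0f}/s, peak {e.peak_qps:,.0f}/s")
print(f"storage {e.storage_tb_per_year:.0f} TB/yr, ingest {e.peak_ingest_mb_per_s:.1f} MB/s")
print(f"gateways {e.gateways}")
```

The unrounded results (about 8,681/s average, 34,722/s peak, 274 TB/yr) match the rounded figures in the worked example.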

Headroom & growth#

Always size to peak, not average.
Keep 30–50% headroom for failover (e.g. losing one region, or two AZs).
Plan for 1 year of growth; revisit quarterly.
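The rules above can be combined into one sizing helper. This is a sketch, not a standard formula: `servers_needed` is a hypothetical function, headroom is interpreted as extra capacity on top of the peak requirement, and the 50% annual growth rate is an assumed input. Percentages are kept as integers to avoid floating-point rounding surprises near whole numbers.

```python
import math


def servers_needed(peak_load: float, per_server_capacity: float,
                   headroom_pct: int = 40, annual_growth_pct: int = 0,
                   years: int = 1) -> int:
    """Servers for peak load after compound growth, plus failover headroom."""
    grown = peak_load
    for _ in range(years):
        grown = grown * (100 + annual_growth_pct) / 100
    base = math.ceil(grown / per_server_capacity)
    return math.ceil(base * (100 + headroom_pct) / 100)


# Today: 25M peak connections, 100k per box, 40% headroom.
print(servers_needed(25_000_000, 100_000))  # 350

# One year out with an assumed 50% growth.
print(servers_needed(25_000_000, 100_000, annual_growth_pct=50))  # 525
```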

Storage tiers#

Tier	Cost/GB/mo	Latency
RAM	$1+	ns
NVMe SSD	$0.10	µs
Standard cloud SSD	$0.10	ms
Object store (hot)	$0.02	10s of ms
Cold archive (Glacier)	$0.004	minutes to hours
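To see what the tier prices mean at scale, the sketch below prices a dataset under two placements. The helper `monthly_cost`, the 800 TB dataset size, and the 10% hot / 90% cold split are assumptions; the per-GB prices come from the table, with RAM set to its $1 lower bound.

```python
# $/GB/month from the table above; RAM uses the table's lower bound.
COST_PER_GB_MONTH = {
    "ram": 1.00,
    "nvme_ssd": 0.10,
    "cloud_ssd": 0.10,
    "object_hot": 0.02,
    "cold_archive": 0.004,
}
GB_PER_TB = 1_000  # decimal units, good enough for an estimate


def monthly_cost(tb_by_tier: dict[str, float]) -> float:
    """Monthly storage bill in dollars for {tier: terabytes}."""
    return sum(tb * GB_PER_TB * COST_PER_GB_MONTH[tier]
               for tier, tb in tb_by_tier.items())


all_hot = monthly_cost({"object_hot": 800})
tiered = monthly_cost({"object_hot": 80, "cold_archive": 720})
print(f"all hot: ${all_hot:,.0f}/month")
print(f"tiered:  ${tiered:,.0f}/month")
```

Under these assumptions, moving 90% of the data to cold archive cuts the bill from about $16,000 to about $4,480 per month, at the cost of minutes-to-hours retrieval for the archived part.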

Queueing theory primer#

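M/M/1: W = 1 / (μ − λ), where μ = service rate and λ = arrival rate (both in req/s); W is the average time in system.
Utilisation ρ = λ / μ. At ρ > 70% the queue grows noticeably; at ρ > 90% latency explodes.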
Engineer for ≤ 70% steady-state.
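The effect of utilisation on latency is easy to tabulate from the M/M/1 formula. The sketch below assumes a single server with a hypothetical service rate of 1,000 req/s; real systems rarely match M/M/1 assumptions exactly, so treat the output as a shape, not a prediction.

```python
def mm1_time_in_system(service_rate: float, arrival_rate: float) -> float:
    """Average time in system for an M/M/1 queue: W = 1 / (mu - lambda)."""
    if not 0 <= arrival_rate < service_rate:
        raise ValueError("need 0 <= arrival_rate < service_rate for a stable queue")
    return 1.0 / (service_rate - arrival_rate)


SERVICE_RATE = 1_000  # hypothetical: one server handling 1,000 req/s

for utilisation in (0.5, 0.7, 0.9, 0.99):
    w = mm1_time_in_system(SERVICE_RATE, utilisation * SERVICE_RATE)
    print(f"rho={utilisation:.2f}  W={w * 1000:.1f} ms")
```

With a 1 ms bare service time, W is roughly 2 ms at 50% utilisation, 3.3 ms at 70%, 10 ms at 90% and 100 ms at 99%.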

Common mistakes#

Confusing bytes vs bits (network speeds).
Forgetting replication factor when sizing storage.
Computing QPS from the average only; peak is what breaks the system.
Ignoring the read:write ratio (often 100:1 for content).
Lumping everything into a single "DB" box; split storage per workload.
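Two of these mistakes come down to a single multiplication or division. A small sketch, with hypothetical helper names and an assumed replication factor of 3:

```python
def gbps_to_mb_per_s(gbps: float) -> float:
    """Convert gigabits per second to megabytes per second (decimal units)."""
    return gbps * 1_000 / 8


def raw_storage_tb(logical_tb: float, replication_factor: int) -> float:
    """Physical storage needed once every byte is stored replication_factor times."""
    return logical_tb * replication_factor


print(gbps_to_mb_per_s(1))     # 125.0 MB/s, not 1,000
print(gbps_to_mb_per_s(10))    # 1250.0 MB/s theoretical line rate
print(raw_storage_tb(800, 3))  # 2400 TB on disk for 800 TB of logical data
```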

Glossary & fundamentals#

Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.

Tag	Concept	What it is	Page
`HLD`	Pub/Sub & message brokers	topics, consumer groups, delivery semantics	pub-sub-pattern
`HLD`	Leader/follower replication	sync/semi-sync/async replication, failover	replication-leader-follower
`HLD`	Capacity planning	BOE, Little's Law, queueing	capacity-planning
`LLD`	Concurrency primitives	mutex, semaphore, RW lock, atomic, CAS	concurrency-primitives