URL Shortener — Notes
Functional
- Shorten arbitrary URL → ≤ 7-char code.
- Redirect on GET.
- Optional custom alias.
- Optional TTL / expiry.
- Click analytics.
Non-functional
- 99.99% availability on redirect.
- < 100 ms p99 redirect latency.
- Read:Write ≈ 100:1.
Capacity (back-of-envelope)
- 100M new URLs/day ÷ 86,400 s ≈ 1,160 writes/s avg, ~5k peak.
- 100:1 reads → ~116k reads/s avg, ~500k peak.
- 5 years × 365 days × 100M/day × 500 B ≈ 91 TB; round up to ~100 TB storage.
- Hot working set ~10 GB cache.
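The averages above can be re-derived with a few lines of arithmetic. This is only a sketch of the back-of-envelope math; the 500 B record size and the 5-year horizon come from the notes, and the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

new_urls_per_day = 100_000_000
read_write_ratio = 100          # reads per write
bytes_per_record = 500          # assumed average row size, from the notes
retention_years = 5

avg_writes_per_s = new_urls_per_day / SECONDS_PER_DAY
avg_reads_per_s = avg_writes_per_s * read_write_ratio
total_urls = new_urls_per_day * 365 * retention_years
storage_tb = total_urls * bytes_per_record / 1e12  # decimal terabytes

if __name__ == "__main__":
    print(f"writes/s avg: {avg_writes_per_s:,.0f}")
    print(f"reads/s avg:  {avg_reads_per_s:,.0f}")
    print(f"total URLs:   {total_urls:,}")
    print(f"storage:      {storage_tb:.2f} TB")
```

The exact figures (about 1,157 writes/s, 115,741 reads/s, 182.5B URLs, 91.25 TB) sit slightly off the rounded numbers in the list, which is fine for sizing purposes.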
ID design math
- Base62 with 7 chars: 62^7 ≈ 3.5 × 10^12 codes. Plenty for the ~180B URLs created over 5 years (about 5% of the space).
- Pre-allocate ranges of 100k IDs per app instance from a ZooKeeper counter, so ID generation needs no per-request coordination.
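A minimal sketch of the two ideas above: fixed-width base62 encoding and a per-instance range allocator. The names `encode_base62`, `RangeAllocator`, and `fetch_range_start` are hypothetical; `fetch_range_start` stands in for whatever atomic increment the shared counter provides, and no real ZooKeeper client call is shown. The digit order of the alphabet is also an assumption.

```python
import string
import threading

# Assumed alphabet order: 0-9, a-z, A-Z (62 symbols).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)
CODE_LEN = 7
RANGE_SIZE = 100_000


def encode_base62(n: int) -> str:
    """Encode an integer ID as a zero-padded 7-char base62 code."""
    if not 0 <= n < BASE ** CODE_LEN:
        raise ValueError("id does not fit in a 7-char base62 code")
    chars = []
    for _ in range(CODE_LEN):
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


class RangeAllocator:
    """Hands out IDs from a locally held block of RANGE_SIZE IDs.

    fetch_range_start is a hypothetical callable that atomically reserves
    the next block on the shared counter and returns its first ID. It is
    called only when the local block is exhausted.
    """

    def __init__(self, fetch_range_start):
        self._fetch = fetch_range_start
        self._lock = threading.Lock()
        self._next = 0
        self._end = 0  # empty block forces a fetch on first use

    def next_id(self) -> int:
        with self._lock:
            if self._next >= self._end:
                self._next = self._fetch()
                self._end = self._next + RANGE_SIZE
            value = self._next
            self._next += 1
            return value


if __name__ == "__main__":
    starts = iter([0, 100_000])  # stand-in for the shared counter
    allocator = RangeAllocator(lambda: next(starts))
    print([encode_base62(allocator.next_id()) for _ in range(3)])
```

With this layout an instance contacts the coordinator once per 100k IDs. IDs left in a block when an instance restarts are simply lost, which the size of the code space can absorb.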
API
POST /v1/shorten body={url, alias?, ttl_days?}
GET /{code} -> 301 or 302 redirect (see Trade-offs)
GET /v1/{code} -> { url, created_at, clicks }
DELETE /v1/{code} (owner only)
Data model
url_map(code PK, long_url, owner, expires_at) — Cassandra / DynamoDB.
clicks — ClickHouse (event store).
meta — Postgres (users, plans, billing).
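To make the API and the url_map row concrete, here is an in-memory sketch of the shorten, resolve, and delete paths. It is not tied to Cassandra or DynamoDB; `InMemoryShortener`, `AliasTaken`, and the `next_code` callable are hypothetical names, and a dict stands in for the url_map table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, Optional


@dataclass
class UrlRecord:
    code: str                        # primary key
    long_url: str
    owner: str
    expires_at: Optional[datetime]   # None means the link never expires


class AliasTaken(Exception):
    """Raised when a requested custom alias already exists."""


class InMemoryShortener:
    def __init__(self, next_code: Callable[[], str]):
        self._next_code = next_code  # hypothetical generator of fresh codes
        self._url_map: Dict[str, UrlRecord] = {}

    def shorten(self, url, owner, alias=None, ttl_days=None, now=None):
        """Roughly POST /v1/shorten."""
        now = now or datetime.now(timezone.utc)
        if alias is not None:
            if alias in self._url_map:
                raise AliasTaken(alias)
            code = alias
        else:
            code = self._next_code()
            while code in self._url_map:  # skip codes taken by aliases
                code = self._next_code()
        expires_at = None
        if ttl_days is not None:
            expires_at = now + timedelta(days=ttl_days)
        record = UrlRecord(code, url, owner, expires_at)
        self._url_map[code] = record
        return record

    def resolve(self, code, now=None) -> Optional[str]:
        """Roughly GET /{code}: target URL, or None if unknown or expired."""
        now = now or datetime.now(timezone.utc)
        record = self._url_map.get(code)
        if record is None:
            return None
        if record.expires_at is not None and record.expires_at <= now:
            return None
        return record.long_url

    def delete(self, code, requester) -> bool:
        """Roughly DELETE /v1/{code}: only the owner may delete."""
        record = self._url_map.get(code)
        if record is None or record.owner != requester:
            return False
        del self._url_map[code]
        return True
```

Expiry is checked at read time here; a real store could instead rely on a native TTL feature, which this sketch does not model.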
Trade-offs
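- 301 vs 302: a 301 is cached by browsers → fewer hits on the service, but cached clicks never reach analytics; a 302 hits the server on every click. Prefer 302 when click analytics matter.
- Counter-based codes are short but sequential, hence enumerable; to deter scraping, scramble the counter before base62 encoding or derive the code from a salted hash.
- Hash-based dedup maps the same URL to the same code, which avoids duplicate rows but complicates per-user codes, because one code can carry only one owner and one expiry.
- Eventually consistent stats are acceptable; redirect lookups must be strongly consistent, or at least read-your-writes within a session, so a freshly created link resolves immediately.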
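The hash-based option and its dedup trade-off can be sketched as follows. This is one possible construction, not a prescribed one: SHA-256, the `salt|attempt|url` layout, and the names `hash_code` and `assign_code` are all assumptions, and a dict stands in for url_map.

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
CODE_LEN = 7


def hash_code(long_url: str, salt: str = "", attempt: int = 0) -> str:
    """Derive a 7-char base62 code from a salted hash of the URL."""
    material = f"{salt}|{attempt}|{long_url}".encode("utf-8")
    digest = hashlib.sha256(material).digest()
    # Reduce the first 8 bytes into the 62**7 code space.
    n = int.from_bytes(digest[:8], "big") % (62 ** CODE_LEN)
    chars = []
    for _ in range(CODE_LEN):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def assign_code(long_url: str, salt: str, existing: dict) -> str:
    """Pick a code for long_url, retrying on collisions.

    existing maps code -> long_url (hypothetical stand-in for url_map).
    The same URL with the same salt returns the same code (dedup).
    """
    attempt = 0
    while True:
        code = hash_code(long_url, salt, attempt)
        stored = existing.get(code)
        if stored is None:
            existing[code] = long_url
            return code
        if stored == long_url:
            return code          # already shortened: reuse the code
        attempt += 1             # different URL holds this code: rehash
```

With an empty salt, every user shortening the same URL shares one code. Passing a per-user salt (for example a user ID) gives each user their own code at the cost of losing global dedup, which is the tension noted in the list above.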
Refs
- bit.ly engineering blog
- TinyURL/Bitly architecture talks
- ByteByteGo URL shortener video
- Grokking SDI