
URL Shortener — Notes#

Functional#

  • Shorten arbitrary URL → ≤ 7-char code.
  • Redirect on GET.
  • Optional custom alias.
  • Optional TTL / expiry.
  • Click analytics.

Non-functional#

  • 99.99% availability on redirect.
  • < 100 ms p99 redirect latency.
  • Read:Write ≈ 100:1.

Capacity (back-of-envelope)#

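  • 100M new URLs/day ÷ 86,400 s ≈ 1,160 writes/s avg; ~5k/s peak.
  • 100:1 read ratio → ~116k reads/s avg; ~500k/s peak.
  • 5 years × 365 days × 100M URLs/day ≈ 183B URLs; at 500 B each ≈ 91 TB, so budget ~100 TB storage.
  • Hot working set: ~10 GB in cache (≈ 20M records at 500 B each).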
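
The arithmetic behind these estimates can be re-derived with a short script. This is only a sketch: the variable names are illustrative, and the inputs (100M URLs/day, 100:1 ratio, 500 B per record, 5 years) are the figures assumed in the list above.

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the notes above.
new_urls_per_day = 100_000_000
read_write_ratio = 100
bytes_per_record = 500
retention_years = 5
cache_bytes = 10 * 10**9  # ~10 GB hot set

writes_per_sec = new_urls_per_day / SECONDS_PER_DAY
reads_per_sec = writes_per_sec * read_write_ratio
total_urls = new_urls_per_day * 365 * retention_years
storage_tb = total_urls * bytes_per_record / 1e12
cached_records = cache_bytes // bytes_per_record

print(f"writes/s avg : {writes_per_sec:,.0f}")   # 1,157
print(f"reads/s avg  : {reads_per_sec:,.0f}")    # 115,741
print(f"total URLs   : {total_urls:,}")          # 182,500,000,000
print(f"storage (TB) : {storage_tb:.2f}")        # 91.25
print(f"cached rows  : {cached_records:,}")      # 20,000,000
```

Peak figures (5k writes/s, ~500k reads/s) are not derived here; they correspond to a peak-to-average factor of roughly 4×.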

ID design math#

  • Base62 with 7 chars: 62^7 ≈ 3.5 × 10^12 codes. Plenty for the ~183B URLs created over 5 years (~19× headroom).
  • Each app instance pre-allocates a range of 100k IDs from a ZooKeeper-backed counter, so no coordination is needed per request.
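
A minimal sketch of the two ideas above: base62 encoding of a numeric ID, and an allocator that only contacts the coordinator when its local range runs out. The names `encode_base62`, `decode_base62`, and `RangeAllocator` are hypothetical, and the ZooKeeper counter is replaced by an in-process stand-in (`itertools.count`), so this shows the shape of the approach rather than a real ZooKeeper integration.

```python
import itertools
import threading

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)          # 62
CODE_LEN = 7
RANGE_SIZE = 100_000          # IDs reserved per coordinator round-trip


def encode_base62(n: int) -> str:
    """Encode a non-negative ID as base62 (at most CODE_LEN chars)."""
    if not 0 <= n < BASE ** CODE_LEN:
        raise ValueError("ID outside the 7-char base62 code space")
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits)) or ALPHABET[0]


def decode_base62(code: str) -> int:
    """Inverse of encode_base62."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


class RangeAllocator:
    """Hands out IDs from a locally held range; calls reserve_range()
    (the coordinator round-trip) only when the range is exhausted."""

    def __init__(self, reserve_range):
        self._reserve_range = reserve_range  # returns the start of a fresh range
        self._lock = threading.Lock()
        self._next = 0
        self._end = 0

    def next_id(self) -> int:
        with self._lock:
            if self._next >= self._end:
                start = self._reserve_range()
                self._next, self._end = start, start + RANGE_SIZE
            value = self._next
            self._next += 1
            return value


# Stand-in for the shared counter: each call yields the next range start.
_shared_counter = itertools.count(0, RANGE_SIZE)
instance_a = RangeAllocator(lambda: next(_shared_counter))
instance_b = RangeAllocator(lambda: next(_shared_counter))

print(encode_base62(instance_a.next_id()))
print(encode_base62(instance_b.next_id()))
```

With this scheme an instance that crashes loses the unused remainder of its range; given the headroom in the code space, that waste is usually tolerable.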

API#

POST /v1/shorten        body={url, alias?, ttl_days?}
GET  /{code}            -> 301 or 302 redirect (see Trade-offs)
GET  /v1/{code}         -> { url, created_at, clicks }
DELETE /v1/{code}       (owner only)
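
As an illustration of the `POST /v1/shorten` body, here is a rough validation sketch. The field names `url`, `alias`, and `ttl_days` come from the API above; the function name `validate_shorten_request` and the specific rules (http/https only, alias of 3 to 32 URL-safe characters, positive integer TTL) are assumptions made for the example, not part of these notes.

```python
import re
from urllib.parse import urlsplit

# Assumed alias rule: 3-32 characters from a URL-safe set.
ALIAS_RE = re.compile(r"[0-9A-Za-z_-]{3,32}")


def validate_shorten_request(body: dict) -> dict:
    """Return a normalized request dict or raise ValueError."""
    url = body.get("url")
    if not isinstance(url, str):
        raise ValueError("url is required")
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("url must be an absolute http(s) URL")

    alias = body.get("alias")
    if alias is not None and not (isinstance(alias, str) and ALIAS_RE.fullmatch(alias)):
        raise ValueError("alias must be 3-32 URL-safe characters")

    ttl_days = body.get("ttl_days")
    if ttl_days is not None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(ttl_days, bool) or not isinstance(ttl_days, int) or ttl_days <= 0:
            raise ValueError("ttl_days must be a positive integer")

    return {"url": url, "alias": alias, "ttl_days": ttl_days}


print(validate_shorten_request({"url": "https://example.com/a/long/path", "ttl_days": 30}))
```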

Data model#

  • url_map(code PK, long_url, owner, expires_at) — Cassandra / DynamoDB.
  • clicks — ClickHouse (event store).
  • meta — Postgres (users, plans, billing).
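
The `url_map` row and the redirect lookup could be sketched as below. The column names follow the model above; `UrlMapping`, `resolve`, the in-memory dict standing in for Cassandra / DynamoDB, and the choice of 404 for unknown codes and 410 for expired ones are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class UrlMapping:
    code: str                        # primary key
    long_url: str
    owner: Optional[str]
    expires_at: Optional[datetime]   # None means the link never expires

    def is_expired(self, now: datetime) -> bool:
        return self.expires_at is not None and now >= self.expires_at


def resolve(store: Dict[str, UrlMapping], code: str, now: datetime) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location) for GET /{code}."""
    row = store.get(code)
    if row is None:
        return 404, None
    if row.is_expired(now):
        return 410, None             # assumed: expired links answer 410 Gone
    return 302, row.long_url


now = datetime.now(timezone.utc)
store = {
    "abc1234": UrlMapping("abc1234", "https://example.com/page", "user-1", now + timedelta(days=30)),
    "old0001": UrlMapping("old0001", "https://example.com/old", None, now - timedelta(days=1)),
}
print(resolve(store, "abc1234", now))
print(resolve(store, "old0001", now))
print(resolve(store, "missing", now))
```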

Trade-offs#

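  • 301 vs 302: browsers cache a 301, so repeat visits skip the service (fewer hits, but undercounted clicks). Most services use 302 so that every click is recorded.
  • Counter-based codes are short but sequential, so the keyspace can be enumerated and scraped. To prevent this, scramble the counter with a secret salt before base62-encoding it, or derive codes from a hash.
  • Hash-based codes deduplicate repeated shortens of the same URL, but complicate per-user codes, since each user may need a distinct code for the same URL.
  • Eventually consistent stats are acceptable; redirects need strong consistency, or at least read-your-writes within a session, so that a newly created code resolves immediately.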
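
One way to make counter-based codes non-sequential is to pass the counter through a secret one-to-one mapping before encoding. The sketch below uses an affine map modulo 62^7, which is a simple stand-in for "base62 with random salt"; `MULT`, `OFFSET`, `scramble`, and `to_code` are hypothetical names and values. This is light obfuscation only: an affine map can be recovered from a handful of observed codes, so a keyed permutation (for example a Feistel construction) would be the stronger choice if scraping is a real threat.

```python
from math import gcd

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
CODE_LEN = 7
M = 62 ** CODE_LEN               # size of the code space

# Assumed secrets. MULT must be coprime to M (= 2^7 * 31^7) so the map is a bijection.
MULT = 2_654_435_761
OFFSET = 1_234_567_890_123
assert gcd(MULT, M) == 1
MULT_INV = pow(MULT, -1, M)      # modular inverse (Python 3.8+)


def scramble(n: int) -> int:
    """Map a sequential ID to a scattered ID in [0, M)."""
    return (n * MULT + OFFSET) % M


def unscramble(value: int) -> int:
    """Inverse of scramble."""
    return ((value - OFFSET) * MULT_INV) % M


def to_code(n: int) -> str:
    """Scramble, then encode as a fixed-width 7-char base62 code."""
    value = scramble(n)
    chars = []
    for _ in range(CODE_LEN):
        value, rem = divmod(value, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


# Consecutive counter values no longer yield adjacent codes.
for counter in range(3):
    print(counter, to_code(counter))
```

Because the map is a bijection, two distinct counter values never collide, so the uniqueness guarantee of the counter is preserved.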
