Skip to content

Search Autocomplete — Detailed#

flowchart TB
  subgraph Client[Client]
    BR([Browser / App])
    DBN[Debounce 100ms<br/>cancel on new key]
    CACHE[Client cache<br/>per session]
  end

  subgraph Edge
    GW[API Gateway / CDN edge]
    RL([Rate limit per user])
  end

  subgraph Serve[Suggest Service]
    PARSE[Normalize: lowercase,<br/>diacritics, language]
    SPL[Speller / fuzzy match<br/>edit distance ≤ 2]
    LOOK[Top-k Lookup]
    PERS([Personalize<br/>user history + locale])
    RANK([Ranker])
    AB[A/B exp]
  end

  subgraph Store[Index Store]
    TRIE[(Compressed Trie / DAWG<br/>per-prefix top-k materialized)]
    REDIS[(Redis Sorted Sets<br/>key = prefix, score = freq)]
    ES[(Elasticsearch<br/>edge_ngram fallback)]
  end

  subgraph Build[Offline Build Pipeline]
    QL[[(Query logs - Kafka)]]
    BATCH([Spark / Flink job<br/>n-gram counts, daily])
    DECAY[Time decay + spike detection]
    TRENDS[[Trending topics signal]]
    FILT[Safety / profanity filter]
    DEPLOY[Blue/green index swap]
  end

  subgraph Personal[Personalization]
    HIST([User history KV])
    LOC[Geo / language]
    SES[Session intent]
  end

  BR --> DBN --> CACHE
  CACHE -. miss .-> GW --> RL --> Serve
  Serve --> PARSE --> SPL --> LOOK
  LOOK --> TRIE
  LOOK --> REDIS
  LOOK -. fallback .-> ES
  PERS --> RANK
  HIST --> PERS
  LOC --> PERS
  SES --> PERS
  LOOK --> RANK --> AB --> BR
  BR -.queries+clicks.-> QL
  QL --> BATCH --> DECAY --> FILT --> DEPLOY
  TRENDS --> DEPLOY
  DEPLOY --> TRIE
  DEPLOY --> REDIS

    classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
    classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
    classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
    classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
    classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
    classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
    classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
    classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
    classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
    classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
    class BR,RL,PERS,HIST client;
    class GW edge;
    class DBN,PARSE,SPL,LOOK,AB,DECAY,FILT,DEPLOY,LOC,SES service;
    class TRIE,ES,QL datastore;
    class CACHE,REDIS cache;
    class TRENDS queue;
    class RANK,BATCH compute;

Trie design#

  • Each node: char + top-k results (id + score) materialized.
  • Lookup = walk N chars (N ≈ query length), return top-k at terminal node.
  • Update via offline batch (recompute) — cheap reads, daily/hourly refresh.

Latency target#

  • < 100 ms end-to-end (UI feels instant).
  • Debounce ≥ 80 ms in client; cache on client too.

Spell correction#

  • Symspell / BK-tree precomputed; runtime fuzzy match within edit distance ≤ 2.
  • Real-time stream finds spikes (e.g. breaking news) → boost score immediately via Redis overlay.

Index swap#

  • Build new trie / Redis snapshot offline; atomic swap by flipping a pointer / namespace.

Glossary & fundamentals#

Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.

Tag Concept What it is Page
HLD CDN edge caching for static assets cdn
HLD API gateway / BFF single ingress, auth, rate limit, routing api-gateway
HLD Pub/Sub & message brokers topics, consumer groups, delivery semantics pub-sub-pattern
HLD Batch & stream processing Lambda vs Kappa, watermarks, windows batch-stream-processing
LLD Data structures & complexity Big-O, common DS, latency numbers data-structures-complexity
LLD Concurrency primitives mutex, semaphore, RW lock, atomic, CAS concurrency-primitives