Skip to content

Probabilistic Data Structures — Simple#

Problem statement (interviewer prompt)

You need to know membership ("have we seen this URL?"), frequency ("top hashtags right now"), and cardinality ("unique users today") on streams of billions of events with only megabytes of memory. Pick the right probabilistic structure for each and explain the trade-offs.

flowchart LR
  K[Key / event]
  BF[Bloom Filter<br/>membership]
  CMS[Count-Min Sketch<br/>frequency]
  HLL[HyperLogLog<br/>cardinality]
  TD[t-digest / KLL<br/>quantiles]
  CK[Cuckoo Filter<br/>membership + delete]
  K --> BF
  K --> CMS
  K --> HLL
  K --> TD
  K --> CK

    classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
    classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
    classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
    classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
    classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
    classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
    classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
    classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
    classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
    classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
    class K,BF,CMS,HLL,TD,CK service;

Trade exact answers for huge memory savings + O(1) updates.