
Search Internals — Simple

flowchart LR
  Doc[Doc / Item]
  Tok[Tokeniser]
  IDX[Inverted Index]
  Q[Query]
  Score[BM25 / TF-IDF<br/>or vector cosine]
  Top[Top-k]
  Doc --> Tok --> IDX
  Q --> IDX
  IDX --> Score --> Top

  classDef p fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
  classDef s fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
  class Doc,Q,Top p;
  class Tok,IDX,Score s;

    classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
    classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
    classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
    classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
    classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
    classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
    classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
    classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
    classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
    classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
    class Doc,Tok,Q,Score,Top service;
    class IDX datastore;

Search falls into two families: lexical (tokens → inverted index → BM25 / TF-IDF) and vector (embed → ANN index → cosine similarity). Many modern systems combine both ("hybrid search").
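To make the two paths concrete, here is a minimal Python sketch under stated assumptions. The function names (`tokenise`, `build_index`, `bm25_top_k`, `cosine`) are hypothetical and not taken from any library. The tokeniser is a plain lowercase whitespace split, the BM25 parameters `k1=1.2` and `b=0.75` are commonly used defaults rather than values given here, and the vector side shows only the cosine scoring step, with no embedding model or ANN index.

```python
import math
from collections import Counter, defaultdict


def tokenise(text):
    """Lowercase and split on whitespace (a stand-in for a real tokeniser)."""
    return text.lower().split()


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    lengths = []  # token count per document, needed for length normalisation
    for doc_id, text in enumerate(docs):
        tokens = tokenise(text)
        lengths.append(len(tokens))
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, lengths


def bm25_top_k(query, index, lengths, k=3, k1=1.2, b=0.75):
    """Score only documents that share a term with the query; return top-k."""
    n_docs = len(lengths)
    avg_len = sum(lengths) / n_docs
    scores = defaultdict(float)
    for term in tokenise(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term appears in no document
        df = len(postings)
        # Rarer terms get a larger weight (inverse document frequency).
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings.items():
            # Longer-than-average documents are penalised via b.
            norm = tf + k1 * (1 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [
    "the quick brown fox",
    "the lazy dog sleeps",
    "quick quick fox jumps over the lazy dog",
]
index, lengths = build_index(docs)
print(bm25_top_k("quick fox", index, lengths))
```

The lexical path touches only the postings lists of the query terms, which is what keeps an inverted index fast on large collections. A vector system would replace `bm25_top_k` with a nearest-neighbour lookup over embeddings, using `cosine` (or an approximation of it) as the score.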