A/B Testing Platform — Detailed

```mermaid
flowchart TB
  subgraph Author[Experiment authoring]
    UI([Web UI / YAML definition])
    METR[Metric registry]
    GUARD[Guardrail metrics]
    REVIEW[Review + approval]
  end

  subgraph Cfg[Config plane]
    CFG[(Experiment registry)]
    GIT[Git source of truth]
    PUB([Push to edge / SDK])
  end

  subgraph Assign[Assignment]
    SDK([Client SDK / server])
    HASH[Hash user_id + exp -> bucket]
    OVR[Force overrides]
    EXCL[Mutual exclusion groups]
    CONS[Sticky consistent assignment]
  end

  subgraph Events
    EXP_EVT[Exposure events]
    METR_EVT[Metric events]
    KAFKA[[Kafka]]
    LAKE[Data lake]
  end

  subgraph Analysis
    JOB[Daily / streaming jobs]
    CUPED[CUPED variance reduction]
    SEQ[Sequential testing]
    P[p-values + effects]
    DASH[Dashboard]
    PEEK[Peeking protection]
  end

  subgraph Safety
    GUARDR[Guardrail breach alerts]
    KILL[Kill switch]
    SRM[Sample ratio mismatch detector]
  end

  Author --> Cfg --> Assign
  Assign --> EXP_EVT --> KAFKA --> LAKE --> Analysis
  SDK --> METR_EVT --> KAFKA
  Safety --- Cfg

  classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
  classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
  classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
  classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
  classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
  classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
  classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
  classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
  classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
  classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
  class UI,PUB,SDK client;
  class REVIEW,GIT,OVR,EXCL,CONS,EXP_EVT,CUPED,SEQ,P,PEEK,KILL,SRM service;
  class CFG datastore;
  class KAFKA queue;
  class HASH,JOB compute;
  class LAKE storage;
  class METR,GUARD,METR_EVT,DASH,GUARDR obs;
```

Assignment

Deterministic hash of (user_id, experiment_id) → bucket.
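Mutually exclusive experiments share a salt (one per exclusion group) and own disjoint bucket ranges, so a user lands in at most one of them.
Sticky across sessions because the hash is deterministic; new users get a sticky cookie ID that serves as the hash key.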
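A minimal Python sketch of the assignment rules above. It assumes SHA-256 as the hash, 10,000 buckets per salt, and an evaluation order of kill switch, then force override, then exclusion group, then hash; `Experiment`, `bucket`, `assign`, and `BUCKETS` are hypothetical names chosen for illustration, not the platform's actual API or config schema. Experiments in one exclusion group hash with the group's shared salt and own disjoint bucket ranges; the variant itself is drawn with a separate per-experiment salt so the split does not correlate with group membership.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional

BUCKETS = 10_000  # hypothetical resolution: each bucket is 0.01% of traffic


def bucket(salt: str, unit_id: str) -> int:
    """Deterministically map (salt, unit_id) to a bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(f"{salt}:{unit_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


@dataclass
class Experiment:
    exp_id: str
    variants: list[str]                    # variants[0] is treated as control
    group: Optional[str] = None            # mutual-exclusion group; None = independent
    bucket_range: range = range(BUCKETS)   # slice of the group's buckets this experiment owns
    overrides: dict[str, str] = field(default_factory=dict)  # unit_id -> forced variant
    killed: bool = False                   # kill switch set in the config plane


def assign(exp: Experiment, unit_id: str) -> Optional[str]:
    """Return the variant for unit_id, or None if the unit is not in the experiment.

    unit_id is the logged-in user_id, or the sticky cookie ID for new users.
    """
    if exp.killed:
        return exp.variants[0]             # kill switch: everyone gets control
    if unit_id in exp.overrides:
        return exp.overrides[unit_id]      # force override (e.g. QA accounts)
    if exp.group is not None:
        # Shared group salt + disjoint ranges: a unit falls into at most one
        # experiment of the group.
        if bucket(exp.group, unit_id) not in exp.bucket_range:
            return None
    # Per-experiment salt for an equal variant split, independent of the group hash.
    return exp.variants[bucket(exp.exp_id, unit_id) % len(exp.variants)]


# Two hypothetical mutually exclusive checkout experiments splitting one group 50/50.
checkout_a = Experiment("checkout_a", ["control", "treatment"], group="checkout",
                        bucket_range=range(0, 5_000))
checkout_b = Experiment("checkout_b", ["control", "treatment"], group="checkout",
                        bucket_range=range(5_000, BUCKETS))
print(assign(checkout_a, "user-42"), assign(checkout_b, "user-42"))
```

Because assignment is a pure function of the ID and the pushed config, the SDK or server can evaluate it locally with no per-request lookup, and stickiness needs no stored assignment table. The flip side is that changing a salt or bucket range mid-experiment reshuffles users.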

Stats

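SRM (sample ratio mismatch) detector: a chi-squared goodness-of-fit test compares per-variant exposure counts with the configured split and flags broken randomization or exposure logging when the p-value is very small (e.g. < 0.001).
CUPED reduces variance by adjusting each user's metric with the same metric measured before the experiment: Y' = Y - θ(X - mean(X)), where θ = cov(X, Y) / var(X). Because X predates exposure, the adjustment leaves the treatment-effect estimate unbiased.
Sequential testing or always-valid p-values avoid the false-positive inflation caused by peeking, i.e. repeatedly checking a fixed-horizon test and stopping at the first significant result.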
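The three checks above can be sketched with the Python standard library alone. This is illustrative, not the platform's implementation, and the function names (`chi2_sf`, `srm_pvalue`, `cuped`, `msprt_pvalues`) are hypothetical. Assumptions: the SRM check is a chi-squared goodness-of-fit test against the configured weights; CUPED's θ is estimated on whatever data is passed in (in practice, data pooled across arms); and the always-valid p-value uses a normal-mixture SPRT (mSPRT), one well-known construction, under the simplifying assumption of matched treatment/control pairs with known variance `sigma2` and an analyst-chosen mixing variance `tau2`.

```python
import math
import random
from statistics import mean, pvariance


def chi2_sf(x: float, df: int) -> float:
    """P(X >= x) for a chi-squared variable with df degrees of freedom (stdlib only)."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(n, y) = exp(-y) * sum_{j<n} y^j / j!
        return math.exp(-y) * sum(y**j / math.factorial(j) for j in range(df // 2))
    # Odd df: Q(1/2, y) = erfc(sqrt(y)); then Q(a+1, y) = Q(a, y) + y^a exp(-y) / Gamma(a+1).
    total = math.erfc(math.sqrt(y))
    for j in range(df // 2):
        a = j + 0.5
        total += y**a * math.exp(-y) / math.gamma(a + 1)
    return total


def srm_pvalue(counts: list[int], weights: list[float]) -> float:
    """SRM check: chi-squared goodness-of-fit of exposure counts vs the configured split."""
    n, w_total = sum(counts), sum(weights)
    expected = [n * w / w_total for w in weights]
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    return chi2_sf(stat, df=len(counts) - 1)


def cuped(y: list[float], x: list[float]) -> list[float]:
    """CUPED: y' = y - theta * (x - mean(x)) with theta = cov(x, y) / var(x).

    x is each user's pre-experiment value of the metric. Pass data pooled
    across arms so every arm is adjusted with the same theta.
    """
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)
    theta = cov / pvariance(x, mu=mx)
    return [yi - theta * (xi - mx) for xi, yi in zip(x, y)]


def msprt_pvalues(diffs: list[float], sigma2: float, tau2: float) -> list[float]:
    """Always-valid p-values for H0: theta = 0 from a normal-mixture SPRT (mSPRT).

    diffs[i] is treatment minus control for the i-th matched pair, assumed
    N(theta, sigma2) with sigma2 known; tau2 is the variance of the N(0, tau2)
    mixing distribution over theta.
    """
    p, total, out = 1.0, 0.0, []
    for n, d in enumerate(diffs, start=1):
        total += d
        mean_d = total / n
        denom = sigma2 + n * tau2
        # log of the mixture likelihood ratio Lambda_n; log space avoids overflow.
        log_lam = 0.5 * math.log(sigma2 / denom) + (n * n * tau2 * mean_d**2) / (2 * sigma2 * denom)
        p = min(p, math.exp(-log_lam))  # running minimum of 1 / Lambda_n, capped at 1
        out.append(p)
    return out


# Illustrative usage on synthetic data (not real results).
random.seed(0)
print("SRM p-value:", srm_pvalue([50_120, 49_880], [0.5, 0.5]))

pre = [random.gauss(10, 3) for _ in range(10_000)]
post = [x + random.gauss(0.5, 1) for x in pre]  # post-period metric tracks pre-period
print("variance raw vs CUPED:", pvariance(post), pvariance(cuped(post, pre)))

diffs = [random.gauss(0.5, math.sqrt(2.0)) for _ in range(1_000)]
print("always-valid p after 1000 pairs:", msprt_pvalues(diffs, sigma2=2.0, tau2=1.0)[-1])
```

Because the mSPRT p-value is a running minimum that stays valid at every sample size, it can be checked after each batch and the test stopped as soon as it drops below α without inflating the false-positive rate; a fixed-horizon p-value monitored the same way has no such guarantee.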

Glossary & fundamentals

Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.

| Tag | Concept | What it is | Page |
|---|---|---|---|
| `HLD` | Pub/Sub & message brokers | topics, consumer groups, delivery semantics | pub-sub-pattern |
| `HLD` | Observability | metrics, logs, traces, SLOs | observability |