A/B Testing Platform: Detailed
flowchart TB
subgraph Author[Experiment authoring]
UI([Web UI / yaml definition])
METR[Metric registry]
GUARD[Guardrail metrics]
REVIEW[Review + approval]
end
subgraph Cfg[Config plane]
CFG[(Experiment registry)]
GIT[Git source of truth]
PUB([Push to edge / SDK])
end
subgraph Assign[Assignment]
SDK([Client SDK / server])
HASH[Hash user_id + exp -> bucket]
OVR[Force overrides]
EXCL[Mutual exclusion groups]
CONS[Sticky consistent assignment]
end
subgraph Events
EXP_EVT[Exposure events]
METR_EVT[Metric events]
KAFKA[[Kafka]]
LAKE[Data lake]
end
subgraph Analysis
JOB[Daily / streaming jobs]
CUPED[CUPED variance reduction]
SEQ[Sequential testing]
P[p-values + effects]
DASH[Dashboard]
PEEK[Peeking protection]
end
subgraph Safety
GUARDR[Guardrail breaches alert]
KILL[Kill switch]
SRM[Sample ratio mismatch detector]
end
Author --> Cfg --> Assign
Assign --> EXP_EVT --> KAFKA --> LAKE --> Analysis
Safety --- Cfg
classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
class UI,PUB,SDK client;
class REVIEW,GIT,OVR,EXCL,CONS,EXP_EVT,LAKE,JOB,CUPED,SEQ,P,PEEK,KILL,SRM service;
class CFG datastore;
class KAFKA queue;
class HASH storage;
class METR,GUARD,METR_EVT,DASH,GUARDR obs;
Assignment
- Deterministic hash of `(user_id, experiment_id)` → bucket.
- Mutually exclusive experiments share a salt to avoid overlap.
- Assignment is sticky across sessions; new users get a sticky cookie.
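
The bullets above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's actual implementation: the SHA-256 hash, the 10,000-bucket resolution, and the names `bucket`, `assign_variant`, and `exclusive_experiment` are all assumptions made for the example. It also assumes one common reading of "share a salt": users are hashed once with the group's salt, and each experiment in the group owns a disjoint bucket range.

```python
import hashlib

NUM_BUCKETS = 10_000  # assumed resolution (0.01% granularity); not from the source


def bucket(user_id: str, salt: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map (user_id, salt) to a stable bucket in [0, num_buckets)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # The first 8 bytes give a 64-bit integer; modulo bias is negligible here.
    return int.from_bytes(digest[:8], "big") % num_buckets


def assign_variant(user_id, experiment_id, variants, overrides=None):
    """Pick a variant for a user.

    variants:  list of (name, weight) pairs whose weights sum to 1.
    overrides: optional {user_id: variant_name} map for forced assignment.
    """
    if overrides and user_id in overrides:
        return overrides[user_id]
    # Using the experiment id as the salt decorrelates experiments.
    b = bucket(user_id, experiment_id)
    upper = 0.0
    for name, weight in variants:
        upper += weight * NUM_BUCKETS
        if b < upper:
            return name
    return variants[-1][0]  # guard against floating-point rounding


def exclusive_experiment(user_id, group_salt, slices):
    """Return the one experiment in an exclusion group that owns this user.

    slices: list of (experiment_id, start_bucket, end_bucket) with
            non-overlapping half-open ranges. Returns None if the user's
            bucket is not allocated to any experiment.
    """
    b = bucket(user_id, group_salt)
    for experiment_id, start, end in slices:
        if start <= b < end:
            return experiment_id
    return None
```

Because the bucket depends only on the inputs, the same user always lands in the same variant without any stored state; a sticky cookie is only needed to give anonymous users a stable identifier to hash.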
Stats
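- The SRM detector flags broken randomization by comparing observed variant counts with the configured traffic split.
- CUPED reduces metric variance by adjusting each user's metric with a pre-experiment covariate.
- Sequential testing or always-valid p-values avoid the false-positive inflation caused by peeking at results early.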
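
As a rough sketch of the first two bullets, the snippet below shows a two-arm SRM check and a CUPED adjustment using only the standard library. The function names and the two-arm restriction are assumptions for illustration; the SRM check is a chi-square goodness-of-fit test with one degree of freedom, and the CUPED step uses the standard estimate `theta = cov(x, y) / var(x)`.

```python
import math
from statistics import fmean


def srm_p_value(n_control, n_treatment, expected_control_share=0.5):
    """Chi-square goodness-of-fit p-value for a two-arm split (1 d.o.f.)."""
    total = n_control + n_treatment
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_treatment - expected_treatment) ** 2 / expected_treatment)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def cuped_adjust(y, x):
    """Return CUPED-adjusted metric values.

    y: in-experiment metric per user.
    x: the same users' pre-experiment covariate (same order as y).
    theta should be estimated on data pooled across all variants so the
    adjustment does not bias the treatment effect.
    """
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:
        return list(y)  # covariate carries no information
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    theta = cov_xy / var_x
    # Subtracting theta * (x - mean_x) keeps the mean and shrinks the variance.
    return [yi - theta * (xi - mean_x) for xi, yi in zip(x, y)]
```

A very small SRM p-value (teams often use a strict threshold such as 0.001) suggests the assignment or logging pipeline is broken, in which case the experiment's results should not be trusted until the cause is found.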
Glossary & fundamentals
Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.
| Tag | Concept | What it is | Page |
|---|---|---|---|
| HLD | Pub/Sub & message brokers | topics, consumer groups, delivery semantics | pub-sub-pattern |
| HLD | Observability | metrics, logs, traces, SLOs | observability |