Notification System — Detailed
flowchart TB
subgraph Producers
APP[App events]
BATCH[Marketing batch]
TXN[Transactional<br/>signup, OTP]
SCHED([Scheduler<br/>reminders])
end
subgraph API[Notification API]
GW[Send API<br/>POST /notify]
TMPL[Template / Personalization<br/>Handlebars / MJML]
DEDUP[Idempotency key]
PRIO[Priority tier<br/>txn / promo / digest]
end
subgraph Queue[Async Bus]
K1[[Kafka: txn topic]]
K2[[Kafka: promo topic]]
K3[[Kafka: digest topic]]
DLQ[[Dead-letter topic]]
end
subgraph Router[Routing]
PREF([User preferences<br/>opt-in / quiet hours])
LIM([Rate limit per user<br/>throttle frequency])
AB[A/B router]
ROUTE[Channel selector<br/>push -> sms -> email fallback]
end
subgraph Channels[Channel Workers]
PW([Push worker])
SW([SMS worker])
EW([Email worker])
IW([In-app worker])
WB([Web Push / WebSocket])
end
subgraph Providers
APNS((Apple APNS))
FCM((Google FCM))
HMS((Huawei HMS))
TW[[Twilio / SNS / MSG91]]
SG((SendGrid / SES / Mailgun))
end
subgraph Devices
iOS
AND[Android]
PH([Phone])
EM[[Email Inbox]]
WEB([Browser])
end
subgraph Storage
PRDB[(User & Device DB<br/>tokens, prefs)]
TPDB[(Templates<br/>versioned)]
HIST[(History / Inbox<br/>Cassandra)]
AUD[(Audit log)]
end
subgraph Feedback
REC[Delivery receipts]
BNC[Bounce / unsubscribe]
ENG[Open / click events]
ML[ML send-time optimizer]
end
Producers --> GW --> DEDUP --> TMPL --> PRIO
PRIO --> K1
PRIO --> K2
PRIO --> K3
K1 --> Router
K2 --> Router
K3 --> Router
PREF -.lookup.-> Router
LIM -.lookup.-> Router
Router --> Channels
Channels --> Providers
Providers --> Devices
Providers -.receipts.-> REC
REC --> ENG --> ML --> PRIO
Channels --> HIST
Channels -.failure.-> DLQ
PRDB --- Router
TPDB --- TMPL
Channels --> AUD
BNC -.update prefs.-> PRDB
classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
class PREF,LIM,WB,PH,WEB client;
class APP,BATCH,TXN,GW,TMPL,DEDUP,PRIO,AB,ROUTE,AND,REC,BNC,ENG,ML service;
class PRDB,TPDB,HIST,AUD datastore;
class K1,K2,K3,DLQ,TW,EM queue;
class SCHED,PW,SW,EW,IW compute;
class APNS,FCM,HMS,SG external;
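The channel selector in the diagram (push -> sms -> email fallback) can be sketched as a loop over an ordered channel list. This is a minimal illustration, not the system's actual router: the `Sender` signature, `select_and_send`, and the `opted_in` set are hypothetical names, and a real router would also consult rate limits and quiet hours.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical sender signature: returns True when the provider accepted the message.
Sender = Callable[[str, str], bool]

# Fallback order taken from the diagram: push, then SMS, then email.
FALLBACK_ORDER: List[str] = ["push", "sms", "email"]


def select_and_send(user_id: str, body: str,
                    senders: Dict[str, Sender],
                    opted_in: Set[str]) -> Optional[str]:
    """Try each channel in fallback order; return the channel that succeeded."""
    for channel in FALLBACK_ORDER:
        if channel not in opted_in or channel not in senders:
            continue  # user opted out, or no worker registered for this channel
        try:
            if senders[channel](user_id, body):
                return channel
        except Exception:
            pass  # treat a worker error as a failed attempt and fall through
    return None  # every channel failed; the caller would publish to the DLQ
```

Returning the winning channel lets the caller record it in history; returning `None` is the signal to hand the message to the dead-letter topic.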
Delivery guarantees
- At-least-once with idempotency key (tenant, event_id).
- Receipts close the loop; without them, the system can't optimize send-time.
- Quiet hours, frequency capping per user.
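As a rough sketch of how at-least-once delivery and the (tenant, event_id) idempotency key fit together: a consumer claims the key before sending and releases it if the send fails, so a redelivered message is retried but a completed one is not sent twice. The `IdempotencyStore` class below is a hypothetical in-memory stand-in; a real deployment would presumably use a shared store with TTLs, and the 24-hour TTL is an assumption.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class IdempotencyStore:
    """In-memory stand-in for a shared key store with expiry (assumption)."""

    def __init__(self, ttl_seconds: float = 86400.0) -> None:
        self._ttl = ttl_seconds
        self._expires: Dict[Tuple[str, str], float] = {}

    def claim(self, tenant: str, event_id: str, now: Optional[float] = None) -> bool:
        """Return True if this (tenant, event_id) has not been claimed yet."""
        now = time.time() if now is None else now
        key = (tenant, event_id)
        expiry = self._expires.get(key)
        if expiry is not None and expiry > now:
            return False  # duplicate delivery of an event already handled
        self._expires[key] = now + self._ttl
        return True

    def release(self, tenant: str, event_id: str) -> None:
        """Forget a claim so a redelivered message can be retried."""
        self._expires.pop((tenant, event_id), None)


def handle(store: IdempotencyStore, tenant: str, event_id: str,
           send: Callable[[], None]) -> bool:
    """Send once per key; return False when the event is a duplicate."""
    if not store.claim(tenant, event_id):
        return False
    try:
        send()
    except Exception:
        store.release(tenant, event_id)  # let the broker's redelivery retry it
        raise
    return True
```

Releasing the claim on failure keeps the at-least-once property: the message stays eligible for redelivery instead of being silently dropped as a duplicate.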
Templating
- Source of truth in template service (versioned, localized).
- Compile templates once; pass user data context.
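The "compile once, pass user data context" idea might look like the sketch below. It swaps in Python's standard-library `string.Template` as a stand-in for Handlebars or MJML, and the `TEMPLATES` dictionary, its (name, version, locale) key, and the sample strings are all hypothetical placeholders for the template service.

```python
from functools import lru_cache
from string import Template
from typing import Dict, Mapping, Tuple

# Hypothetical template source keyed by (name, version, locale).
TEMPLATES: Dict[Tuple[str, int, str], str] = {
    ("welcome", 2, "en"): "Hi $first_name, welcome to $product!",
    ("welcome", 2, "es"): "Hola $first_name, te damos la bienvenida a $product.",
}


@lru_cache(maxsize=1024)
def compile_template(name: str, version: int, locale: str) -> Template:
    """Build the template object once per (name, version, locale) and cache it."""
    return Template(TEMPLATES[(name, version, locale)])


def render(name: str, version: int, locale: str, context: Mapping[str, str]) -> str:
    """Render a cached template; a missing placeholder raises KeyError."""
    return compile_template(name, version, locale).substitute(context)


print(render("welcome", 2, "en", {"first_name": "Ada", "product": "Acme"}))
```

Keying the cache on the version means publishing a new template version never serves a stale compiled copy; old entries simply age out of the LRU cache.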
Scale knobs
- Separate topics per priority — txn never blocked by marketing batches.
- Per-provider client pools with circuit breakers.
- Backoff and retry on provider 429/5xx; send non-retryable 4xx and exhausted retries to the DLQ.
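A minimal sketch of backoff plus dead-lettering for a single provider call follows. It assumes only 429 and 5xx responses are worth retrying and that other failures go straight to the dead-letter topic; the `deliver` function, the `RETRYABLE` set, the attempt limit, and the list standing in for the DLQ are illustrative choices, not part of the original design.

```python
import random
import time
from typing import Callable, List

# Assumption: statuses treated as transient and therefore retried.
RETRYABLE = {429, 500, 502, 503, 504}


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def deliver(message: dict,
            call_provider: Callable[[dict], int],
            dead_letters: List[dict],
            max_attempts: int = 4,
            sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call the provider with retries; append to dead_letters on final failure."""
    for attempt in range(max_attempts):
        status = call_provider(message)
        if 200 <= status < 300:
            return True
        if status not in RETRYABLE:
            break  # permanent failure (for example a bad token): do not retry
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    dead_letters.append(message)  # stand-in for publishing to the DLQ topic
    return False
```

Passing `sleep` as a parameter keeps the function testable without real waiting; a circuit breaker would wrap `call_provider` so that a failing provider is skipped entirely for a cool-down period.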
Glossary & fundamentals
Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.
| Tag | Concept | What it is | Page |
|---|---|---|---|
| HLD | Pub/Sub & message brokers | topics, consumer groups, delivery semantics | pub-sub-pattern |
| HLD | Idempotency & retries | safe re-execution, backoff + jitter | idempotency-retries |
| HLD | Resilience patterns | timeout, retry, breaker, bulkhead, backpressure | resilience-patterns |
| HLD | Realtime protocols | WS / SSE / polling / gRPC streaming | realtime-protocols |
| LLD | REST API design | verbs, statuses, pagination, errors | rest-api-design |
| LLD | Async models | futures / async-await / coroutines / actors | async-models |