Job / Task Scheduler: Detailed (Airflow / distributed cron / Temporal-style)
flowchart TB
subgraph Author
DAG[DAG / workflow definition]
GIT[Git repo]
UI([Web UI])
end
subgraph Sched[Scheduler]
PARSE([DAG parser])
PLAN([Run planner<br/>schedules + backfill])
LEAD[Leader election - HA]
LOCK[Distributed lock per DAG run]
TIME([Cron evaluator])
end
subgraph Queue
Q[[Queue per worker pool]]
PRIO[[Priority queues]]
DLQ[Dead-letter]
end
subgraph Workers
W1([Worker / executor])
W2([Worker])
KEXEC[K8s executor]
CEXEC[Celery executor]
SUB[Sub-process / pod per task]
end
subgraph State[State + History]
DB[(Metadata DB)]
LOG[Task logs]
TS([Trigger / event store])
ART[Artifacts]
end
subgraph Reliability
RETRY[Retry policies + backoff]
SLA[SLA miss alerts]
IDEMP[Idempotency for tasks]
CKPT[Checkpoint long tasks]
end
subgraph Trigger
CRON([Cron schedule])
SENS([Sensors / external triggers])
API([On-demand trigger API])
end
Author --> Sched
Sched --> Queue --> Workers
Workers --> State
Reliability --- Sched
Trigger --- Sched
classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
class UI client;
class DAG,GIT,LEAD,LOCK,KEXEC,CEXEC,SUB,RETRY,IDEMP,CKPT service;
class DB,TS datastore;
class Q,PRIO,DLQ queue;
class PARSE,PLAN,TIME,W1,W2,CRON,SENS,API compute;
class ART storage;
class LOG,SLA obs;
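The run planner in the diagram handles both regular schedules and backfill. One way to picture backfill is as enumerating every schedule interval between a DAG's start date and now that has no recorded run yet. The sketch below assumes a fixed `timedelta` interval rather than a full cron expression, and treats an interval as runnable only once it has fully elapsed. `plan_runs` and its parameters are hypothetical illustrations, not Airflow's API.

```python
from datetime import datetime, timedelta, timezone


def plan_runs(start: datetime, interval: timedelta, now: datetime,
              existing: set[datetime]) -> list[datetime]:
    """Return the start of every fully elapsed interval in [start, now) with no run yet.

    Hypothetical convention: the interval [t, t + interval) is scheduled only once
    it has ended, so the data it covers is complete.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    runs = []
    t = start
    while t + interval <= now:
        if t not in existing:  # skip intervals already run, so re-planning is idempotent
            runs.append(t)
        t += interval
    return runs


if __name__ == "__main__":
    utc = timezone.utc
    missing = plan_runs(
        start=datetime(2024, 1, 1, tzinfo=utc),
        interval=timedelta(days=1),
        now=datetime(2024, 1, 4, 12, tzinfo=utc),
        existing={datetime(2024, 1, 2, tzinfo=utc)},  # Jan 2 already ran
    )
    print(missing)  # Jan 1 and Jan 3; Jan 4 has not finished yet
```

Because already-run intervals are skipped, the planner can be re-invoked after a crash or failover without creating duplicate runs. This is the same property the per-run lock protects.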
Correctness patterns
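- Singleton scheduling: leader election across an HA scheduler pair, plus a distributed lock per DAG run, ensures each run is created only once and avoids double-runs.
- Idempotent tasks: each task attempt is identified by (dag, run_id, task_id, attempt), but side effects should be deduplicated on (dag, run_id, task_id) so that a retry or duplicate delivery does not apply them twice.
- Workers ack work on completion: if a worker's heartbeat is lost before the ack, its task is re-queued, so delivery is at-least-once and every task must tolerate running more than once.
- Backfill: scheduling runs for historical intervals, e.g. when a DAG is deployed with a start date in the past or after a scheduler outage.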
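The idempotency and at-least-once points combine as follows. The queue may deliver the same task twice, for example after a heartbeat timeout. To stay safe, the task checks a deduplication key before applying its side effect. In this sketch the key is (dag, run_id, task_id), and the attempt number is kept only for tracing. This is a minimal in-memory sketch: `IdempotentRunner` is hypothetical, and its `done` set stands in for a durable table with a unique constraint.

```python
from dataclasses import dataclass, field
from typing import Callable

# (dag_id, run_id, task_id): attempt is deliberately NOT part of the dedup key.
TaskKey = tuple[str, str, str]


@dataclass
class IdempotentRunner:
    # Stand-in for a durable table with a unique constraint on TaskKey.
    done: set[TaskKey] = field(default_factory=set)
    side_effects: list[str] = field(default_factory=list)

    def handle(self, key: TaskKey, attempt: int, work: Callable[[], str]) -> bool:
        """Process one (possibly duplicate) delivery; return True if the work ran."""
        if key in self.done:
            return False  # duplicate delivery of a finished task: just ack it
        result = work()
        # In a real system, recording the key and applying the side effect must be
        # atomic (e.g. one DB transaction); otherwise a crash in between re-runs it.
        self.side_effects.append(f"{result} (attempt {attempt})")
        self.done.add(key)
        return True


if __name__ == "__main__":
    runner = IdempotentRunner()
    key = ("daily_etl", "2024-01-01", "load")
    print(runner.handle(key, 1, lambda: "loaded"))  # first delivery runs
    print(runner.handle(key, 2, lambda: "loaded"))  # redelivery after heartbeat loss is skipped
    print(runner.side_effects)
```

Leaving `attempt` out of the key is the design choice that matters here. If the attempt number were part of the key, every redelivery would look new and the side effect would be applied again.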
Workflow systems vs cron
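- Simple cron: a timer that fires a job command on a schedule, with no built-in dependencies between jobs, retries, or durable run state.
- Workflow systems (Airflow / Argo / Temporal / Cadence) add dependency graphs (DAGs in Airflow and Argo, code-defined workflows in Temporal and Cadence), retries, sensors and external triggers, observability, and durable state.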
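The gap between cron and a workflow system shows up even in a toy runner. Tasks run in dependency order, and a failing task is retried with capped exponential backoff and full jitter instead of simply failing until the next tick. The sketch makes several assumptions: tasks are plain Python callables run in-process, and `run_dag`, `backoff_delay`, `max_attempts`, `base_delay` and `cap` are hypothetical names. `sleep` is injectable so the sketch can be tested without waiting. Real systems dispatch tasks to workers and persist state between attempts.

```python
import random
import time
from graphlib import TopologicalSorter
from typing import Callable


def backoff_delay(attempt: int, base: float, cap: float) -> float:
    """Full jitter: a uniform delay in [0, min(cap, base * 2**(attempt - 1))]."""
    return random.uniform(0, min(cap, base * 2 ** (attempt - 1)))


def run_dag(tasks: dict[str, Callable[[], None]],
            deps: dict[str, set[str]],
            max_attempts: int = 3,
            base_delay: float = 1.0,
            cap: float = 30.0,
            sleep: Callable[[float], None] = time.sleep) -> list[str]:
    """Run tasks in dependency order and return the order they completed in.

    deps maps a task name to the set of upstream task names it waits for.
    Raises RuntimeError if a task still fails after max_attempts.
    """
    # Include every task, even ones with no upstreams, in the graph.
    graph = {name: deps.get(name, set()) for name in tasks}
    order = []
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                tasks[name]()
                break
            except Exception as exc:
                if attempt == max_attempts:
                    raise RuntimeError(f"{name} failed after {attempt} attempts") from exc
                sleep(backoff_delay(attempt, base_delay, cap))
        order.append(name)
    return order
```

Full jitter spreads retries from many failed tasks across the backoff window, so they do not hit a recovering dependency at the same instant. The cap keeps late attempts from waiting unboundedly.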
Glossary & fundamentals
Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.
| Tag | Concept | What it is | Page |
|---|---|---|---|
| HLD | Pub/Sub & message brokers | topics, consumer groups, delivery semantics | pub-sub-pattern |
| HLD | Raft / Paxos consensus | replicated state machine via majority quorum | consensus-raft-paxos |
| HLD | Idempotency & retries | safe re-execution, backoff + jitter | idempotency-retries |
| HLD | Observability | metrics, logs, traces, SLOs | observability |
| HLD | Event sourcing + CQRS | commands -> events; separate read model | event-sourcing-cqrs |
| LLD | Creational patterns | Singleton, Factory, Builder, Prototype | creational-patterns |
| LLD | Behavioural patterns | Strategy, Observer, State, Command, Chain | behavioral-patterns |