Batch vs Stream Processing: Simple

flowchart LR
  E[Events]
  B[Batch: Spark / Airflow<br/>periodic jobs]
  S[Stream: Flink / Kafka Streams<br/>continuous]
  DW[(Data warehouse)]
  RT[(Real-time view)]
  E --> B --> DW
  E --> S --> RT

  classDef p fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
  classDef s fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
  classDef r fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
  class E p;
  class B,S s;
  class DW,RT r;

    classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
    classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
    classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
    classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
    classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
    classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
    classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
    classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
    classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
    classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
    class E service;
    class DW,RT datastore;
    class S queue;
    class B compute;

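Batch processing handles bounded chunks of data on a schedule, for example hourly or nightly (cheap and easy to operate, but results are only as fresh as the last run). Stream processing handles events continuously as they arrive, typically one at a time (fresh results, but state, ordering, and failure recovery are harder). Many systems run both paths side by side, which is the Lambda architecture. Increasingly, teams use the Kappa architecture instead: a single streaming pipeline serves both needs, and reprocessing is done by replaying the event log.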
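To make the contrast concrete, here is a minimal sketch in plain Python that computes the same per-user total in both styles. It does not use Spark, Flink, or any real engine; the `EVENTS` sample data and the function names `batch_totals` and `stream_totals` are assumptions made up for illustration.

```python
from __future__ import annotations

from typing import Iterable, Iterator

# Hypothetical sample events: (user_id, amount) pairs.
EVENTS = [("alice", 5), ("bob", 3), ("alice", 2), ("carol", 7), ("bob", 1)]


def batch_totals(events: list[tuple[str, int]]) -> dict[str, int]:
    """Batch: the whole bounded dataset is available; compute once, emit once."""
    totals: dict[str, int] = {}
    for user, amount in events:
        totals[user] = totals.get(user, 0) + amount
    return totals


def stream_totals(events: Iterable[tuple[str, int]]) -> Iterator[dict[str, int]]:
    """Stream: keep running state and emit an updated view after every event."""
    totals: dict[str, int] = {}
    for user, amount in events:
        totals[user] = totals.get(user, 0) + amount
        yield dict(totals)  # snapshot of the real-time view so far


if __name__ == "__main__":
    print("batch :", batch_totals(EVENTS))
    for view in stream_totals(EVENTS):
        print("stream:", view)
```

The batch function produces one answer after reading everything, while the streaming generator produces an answer after every event. Once the input is exhausted, the last streamed view matches the batch result, which is the property Kappa-style designs rely on when they replace the batch path with a replay of the log.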