
Netflix - Detailed

flowchart TB
  subgraph Devices
    TV([Smart TV / Roku])
    PH([Phone])
    BR([Browser])
    CON([Console])
  end

  subgraph Edge[Open Connect - Netflix CDN]
    OCA[Open Connect Appliances<br/>installed at ISPs]
    OCBackbone[Netflix backbone]
    PEERED[Peered exchange + transit]
  end

  subgraph Control[Control Plane on AWS]
    GW[Zuul API Gateway]
    DISCOVERY[Eureka service discovery]
    META[(Catalog Metadata)]
    ACCT[Account / Auth]
    BILL[Billing]
    PERS[Personalization Service]
    PLAYBACK[Playback API]
    LICENSE[DRM License<br/>Widevine / FairPlay]
    SUB[Subtitles]
    TRACK[Telemetry / QoE]
  end

  subgraph Encode[Content Pipeline]
    INGEST([Studio master ingest])
    CO[Color / mastering tools]
    PER_TITLE[Per-title encoding<br/>complexity-aware]
    LADDER[Adaptive bitrate ladders<br/>H.264, HEVC, VP9, AV1]
    AUDS[Audio: AAC, EAC3, Atmos]
    SUBS[Subtitles, dubs, multi-lang]
    DRM_PACK[DRM packaging / CMAF]
    QC[Automated QC]
  end

  subgraph Microservices[Microservice mesh - 1000s services]
    HOME[Home rows]
    SEARCH[Search]
    EVIDENCE[Artwork personalization]
    RANK([Ranker DNN])
    AB[A/B tests + Spinnaker canary rollouts]
    HYS[Hystrix circuit breakers]
  end

  subgraph Data
    EVCACHE[(EVCache - memcached based)]
    CASS[(Cassandra clusters)]
    DYNOMITE[Dynomite - Redis fronted]
    BIGDATA[(S3 + Iceberg + Spark)]
    KEYSTONE[[Keystone - Kafka + Flink]]
  end

  subgraph Chaos
    SIM[Simian Army<br/>Chaos Monkey/Kong/Gorilla]
    GAME[Game days]
  end

  Devices --> OCA
  OCA -. miss .-> OCBackbone --> ORIG[(Origin S3)]
  Devices --> GW
  GW --> ACCT
  GW --> PERS
  GW --> PLAYBACK
  PLAYBACK --> LICENSE
  PLAYBACK --> OCA
  PLAYBACK --> TRACK --> KEYSTONE --> BIGDATA
  Microservices --> Data
  Encode --> ORIG
  ORIG -. fill nightly .-> OCA
  DISCOVERY -.- Microservices
  Chaos --- Microservices
  RANK --> EVCACHE
  RANK --> CASS
  EVIDENCE --> RANK
  AB --- Microservices

    classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
    classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
    classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
    classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
    classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
    classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
    classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
    classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
    classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
    classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
    class TV,PH,BR,CON client;
    class GW edge;
    class OCA,OCBackbone,PEERED,DISCOVERY,ACCT,BILL,PERS,PLAYBACK,LICENSE,SUB,CO,PER_TITLE,LADDER,AUDS,SUBS,DRM_PACK,QC,HOME,SEARCH,EVIDENCE,AB,HYS,SIM,GAME service;
    class META,CASS datastore;
    class EVCACHE,DYNOMITE cache;
    class KEYSTONE queue;
    class INGEST,RANK compute;
    class BIGDATA,ORIG storage;
    class TRACK obs;

Open Connect (Netflix's CDN)

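  • Open Connect Appliances (OCAs): purpose-built cache servers installed inside ISP networks and at internet exchange points.
  • ISPs save upstream transit capacity because traffic is served locally; Netflix avoids most transit and third-party CDN egress costs.
  • Content is pre-positioned from origin during off-peak fill windows; the most popular titles are pinned on the appliances.
  • No anycast: for each play request the control-plane steering service returns a ranked list of OCA URLs, and the client picks among them.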
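Control-plane steering can be sketched as filtering and ranking candidate caches for one play request. This is a minimal sketch. The `OCA` record, its fields, the 0.9 load cutoff and the ranking order (in-ISP first, then hop count, then load) are all hypothetical. The real steering service weighs more signals than this.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OCA:
    """Hypothetical view of an Open Connect Appliance as seen by the steering service."""
    name: str
    healthy: bool
    has_file: bool   # does this cache hold the requested encode?
    same_isp: bool   # embedded inside the client's ISP network?
    hops: int        # rough network distance to the client
    load: float      # 0.0 (idle) .. 1.0 (saturated)

def rank_ocas(candidates: list[OCA], max_results: int = 3) -> list[str]:
    """Return a ranked list of OCA names for one play request (illustrative heuristic)."""
    # Drop caches that are down, lack the file, or are nearly saturated (assumed cutoff).
    usable = [o for o in candidates if o.healthy and o.has_file and o.load < 0.9]
    # Prefer in-ISP caches, then shorter paths, then lighter load.
    usable.sort(key=lambda o: (not o.same_isp, o.hops, o.load))
    return [o.name for o in usable[:max_results]]

ocas = [
    OCA("isp-a-01", True, True, True, 2, 0.70),
    OCA("isp-a-02", True, False, True, 2, 0.10),  # lacks the file
    OCA("ix-fra-07", True, True, False, 4, 0.20),
    OCA("isp-a-03", True, True, True, 2, 0.95),   # overloaded
]
print(rank_ocas(ocas))  # ['isp-a-01', 'ix-fra-07']
```

The client would then try the returned URLs in order, falling back to the next one if a download stalls.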

Per-title encoding

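  • Complexity-aware: each title is trial-encoded at several resolutions and bitrates, and its rate-quality convex hull determines a title-specific bitrate ladder instead of one fixed ladder for all content.
  • Saves roughly 20-50% bandwidth at equal quality compared with a fixed ladder; the gain depends on how complex the title is.
  • Today: the per-shot Dynamic Optimizer, which tunes encoding parameters for each shot rather than for the whole title.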
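The core of per-title encoding can be illustrated as keeping only the trial encodes on the upper convex hull of a title's (bitrate, quality) points. This is a simplified sketch. The `Encode` record, the VMAF-like 0-100 quality scale and the sample numbers are made up. Andrew's monotone chain is used here for the hull. The production pipeline is considerably more involved.

```python
from typing import NamedTuple

class Encode(NamedTuple):
    resolution: str  # e.g. "1080p"
    kbps: int        # average bitrate of one trial encode
    quality: float   # perceptual quality score (assumed VMAF-like, 0-100)

def _cross(o: Encode, a: Encode, b: Encode) -> float:
    """Cross product of o->a and o->b in (kbps, quality) space; >= 0 means a is on/below chord o->b."""
    return (a.kbps - o.kbps) * (b.quality - o.quality) - (a.quality - o.quality) * (b.kbps - o.kbps)

def per_title_ladder(encodes: list[Encode]) -> list[Encode]:
    """Keep only the encodes on the upper convex hull of this title's rate-quality points."""
    # At each bitrate, keep only the best-quality resolution.
    best: dict[int, Encode] = {}
    for e in encodes:
        if e.kbps not in best or e.quality > best[e.kbps].quality:
            best[e.kbps] = e
    hull: list[Encode] = []
    for p in sorted(best.values(), key=lambda e: e.kbps):
        # Upper hull: pop points that lie on or below the chord to the new point.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    if not hull:
        return []
    # Past the quality peak, extra bits buy nothing, so truncate there.
    peak = max(range(len(hull)), key=lambda i: hull[i].quality)
    return hull[: peak + 1]

trials = [
    Encode("480p", 500, 60.0), Encode("480p", 1000, 70.0),
    Encode("720p", 1000, 72.0), Encode("720p", 2000, 85.0),
    Encode("1080p", 2000, 80.0), Encode("1080p", 3000, 86.0),
    Encode("1080p", 4000, 93.0), Encode("1080p", 6000, 94.0),
]
for rung in per_title_ladder(trials):
    print(rung.resolution, rung.kbps, rung.quality)
```

For this made-up title, 720p beats 1080p at 2000 kbps. The 1080p encode at 3000 kbps falls below the hull and is dropped. The surviving rungs are 480p@500, 720p@1000, 720p@2000, 1080p@4000 and 1080p@6000.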

Microservices in AWS

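  • Hundreds to thousands of microservices running on EC2.
  • EVCache (memcached-based, replicated across availability zones) is the primary cache layer.
  • Cassandra for online operational (OLTP-style) data; S3 + Iceberg + Spark for batch analytics.
  • Keystone (Kafka + Flink) for real-time event streaming, including playback telemetry.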
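EVCache sits in front of stores such as Cassandra as a look-aside cache. The sketch below assumes the following behaviour: reads go to the local zone's replica first, then to another zone, then to the database. On a full miss, the value is written to every zone. The `ZoneReplica` and `LookAsideCache` classes and the in-memory dicts are hypothetical stand-ins, not the EVCache client API.

```python
import time
from typing import Callable, Optional

class ZoneReplica:
    """Hypothetical stand-in for one memcached cluster in one availability zone."""
    def __init__(self) -> None:
        self._data: dict[str, tuple[float, bytes]] = {}  # key -> (expiry time, value)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._data.get(key)
        if entry is None or entry[0] < time.monotonic():
            return None  # missing or expired
        return entry[1]

    def set(self, key: str, value: bytes, ttl_s: float) -> None:
        self._data[key] = (time.monotonic() + ttl_s, value)

class LookAsideCache:
    """Cache-aside reads over zone replicas, falling back to a backing store."""
    def __init__(self, local: ZoneReplica, remotes: list[ZoneReplica],
                 load_from_store: Callable[[str], bytes], ttl_s: float = 300.0) -> None:
        self.local, self.remotes = local, remotes
        self.load_from_store = load_from_store
        self.ttl_s = ttl_s

    def get(self, key: str) -> bytes:
        value = self.local.get(key)
        if value is not None:
            return value
        for replica in self.remotes:  # local miss: try another zone before the database
            value = replica.get(key)
            if value is not None:
                self.local.set(key, value, self.ttl_s)  # repair the local copy
                return value
        value = self.load_from_store(key)  # every replica missed: read the source of truth
        for replica in [self.local, *self.remotes]:  # populate all zones
            replica.set(key, value, self.ttl_s)
        return value

db_reads: list[str] = []

def read_cassandra(key: str) -> bytes:  # hypothetical backing-store read
    db_reads.append(key)
    return f"row:{key}".encode()

zone_a, zone_b = ZoneReplica(), ZoneReplica()
cache = LookAsideCache(zone_a, [zone_b], read_cassandra)
cache.get("profile:42")  # miss everywhere: one store read, written to both zones
cache.get("profile:42")  # served from zone_a
print(db_reads)          # ['profile:42']
```

Writing to every zone on a miss trades extra write traffic for reads that rarely have to leave the local zone.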

QoE / playback

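  • Adaptive bitrate (ABR): the client picks each segment's bitrate from the ladder based on its buffer level and a running bandwidth estimate.
  • Client playback telemetry (e.g. startup time, rebuffers, bitrate switches) flows through Keystone and continuously feeds QoE dashboards and recommendation signals.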
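Buffer- and bandwidth-driven ABR can be sketched as a generic hybrid heuristic. An EWMA throughput estimate is scaled by a safety factor that depends on buffer level. The result picks the highest affordable rung. The EWMA weight, the 10 s / 30 s buffer thresholds and the safety factors are hypothetical. This document does not describe Netflix's production ABR algorithm.

```python
from dataclasses import dataclass

@dataclass
class AbrState:
    throughput_kbps: float = 0.0  # EWMA of measured segment download throughput

def update_throughput(state: AbrState, bits: float, seconds: float, alpha: float = 0.3) -> None:
    """Fold one segment download into the EWMA throughput estimate."""
    sample = bits / 1000 / seconds  # kbps
    if state.throughput_kbps == 0.0:
        state.throughput_kbps = sample  # first sample seeds the estimate
    else:
        state.throughput_kbps = alpha * sample + (1 - alpha) * state.throughput_kbps

def choose_bitrate(ladder_kbps: list[int], throughput_kbps: float, buffer_s: float,
                   low_buffer_s: float = 10.0, high_buffer_s: float = 30.0) -> int:
    """Pick the highest ladder rung that fits a buffer-dependent share of the estimate."""
    ladder = sorted(ladder_kbps)
    if buffer_s < low_buffer_s:
        safety = 0.5  # buffer draining: be conservative to avoid a rebuffer
    elif buffer_s > high_buffer_s:
        safety = 1.0  # plenty buffered: spend the full estimate
    else:
        safety = 0.8
    budget = safety * throughput_kbps
    affordable = [r for r in ladder if r <= budget]
    return affordable[-1] if affordable else ladder[0]  # never go below the lowest rung

state = AbrState()
update_throughput(state, bits=8_000_000, seconds=2.0)  # 8 Mb in 2 s -> 4000 kbps
ladder = [235, 750, 1750, 3000, 5800]
print(choose_bitrate(ladder, state.throughput_kbps, buffer_s=5.0))   # 1750
print(choose_bitrate(ladder, state.throughput_kbps, buffer_s=20.0))  # 3000
```

Scaling the estimate by buffer level keeps the player conservative when a stall is close. It lets the player spend its full estimate once a healthy buffer absorbs throughput dips.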

Chaos engineering

  • Random instance termination in prod (Chaos Monkey).
  • Region-evacuation game days that simulate the loss of an entire AWS region (Chaos Kong).
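A single Chaos Monkey round can be sketched as follows. On a weekday during working hours, each eligible group gets a chance to lose one randomly chosen instance. Several details here are hypothetical: the `terminate` callback, the 9:00-15:00 window, the kill probability, the opt-out set and the rule that skips single-instance groups. The sketch does not call any cloud API.

```python
import random
from datetime import datetime
from typing import Callable, Optional

def chaos_monkey_round(groups: dict[str, list[str]],
                       terminate: Callable[[str], None],
                       now: datetime,
                       opted_out: frozenset = frozenset(),
                       kill_probability: float = 0.5,
                       rng: Optional[random.Random] = None) -> list[str]:
    """Terminate at most one random instance per eligible group; return the victims."""
    rng = rng or random.Random()
    # Only run on weekdays during working hours, when engineers can respond (assumed window).
    if now.weekday() >= 5 or not 9 <= now.hour < 15:
        return []
    victims: list[str] = []
    for group, instances in sorted(groups.items()):
        if group in opted_out or len(instances) < 2:
            continue  # skip opted-out groups and single-instance groups (hypothetical rule)
        if rng.random() < kill_probability:
            victim = rng.choice(instances)
            terminate(victim)  # hypothetical hook: would call the cloud provider in practice
            victims.append(victim)
    return victims

killed: list[str] = []
groups = {"api": ["i-1", "i-2", "i-3"], "billing": ["i-4", "i-5"], "solo": ["i-6"]}
victims = chaos_monkey_round(groups, killed.append, datetime(2024, 5, 15, 10, 0),  # a Wednesday
                             opted_out=frozenset({"billing"}), kill_probability=1.0,
                             rng=random.Random(7))
print(victims)  # exactly one instance from the "api" group
```

Running only during staffed hours means any weakness the terminations expose surfaces while engineers are on hand to fix it.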

Glossary & fundamentals

Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.

| Tag | Concept | What it is | Page |
|-----|---------|------------|------|
| HLD | Load balancer / GSLB | L4/L7 traffic distribution and failover | load-balancer |
| HLD | CDN | edge caching for static assets | cdn |
| HLD | API gateway / BFF | single ingress, auth, rate limit, routing | api-gateway |
| HLD | Pub/Sub & message brokers | topics, consumer groups, delivery semantics | pub-sub-pattern |
| HLD | Resilience patterns | timeout, retry, breaker, bulkhead, backpressure | resilience-patterns |
| HLD | Service mesh | sidecar mesh, mTLS, traffic policy | service-mesh |
| HLD | Batch & stream processing | Lambda vs Kappa, watermarks, windows | batch-stream-processing |