Zoom / Google Meet — Detailed

flowchart TB
  subgraph Clients
    APP([Native client])
    WEB([Web - WebRTC])
    PSTN[Dial-in PSTN]
  end

  subgraph Signaling
    SIG[Signaling Service<br/>WebSocket]
    JOIN[Join / Meeting service]
    AUTH[Auth / SSO]
    DIR[Directory / Calendar]
  end

  subgraph NAT[NAT traversal]
    STUN[STUN servers]
    TURN([TURN relay servers])
    ICE[ICE candidate gathering]
  end

  subgraph Media[Media plane]
    SFU[SFU Selective Forwarding Unit]
    MCU[MCU Multipoint Control Unit<br/>for big rooms / mix-down]
    SIM[Simulcast / SVC ladders]
    SR[Server-side recording]
    LIVE[[Live stream out RTMP]]
    BG[Virtual background / blur ML]
    NS[Noise suppression ML]
    JIT[Jitter buffer / FEC / RED]
    BW[Bandwidth estimation - GCC / TWCC]
  end

  subgraph Webinar[Webinar / Large events]
    PRES([Presenter set])
    AUD([Audience receive-only])
    CDN([HLS fallback for huge audiences])
  end

  subgraph Features
    SHARE[Screen share]
    CHAT[In-meeting chat]
    REACT[Reactions]
    POLL[Polls / breakout rooms]
    WB[Whiteboard]
    CAP[Live captions / translation]
    REC[Cloud recording + transcript]
  end

  subgraph Storage
    REC_S3[(Recording S3)]
    TRANS[(Transcripts)]
    META[(Meetings metadata)]
  end

  subgraph Crypto
    E2E[E2E option<br/>MLS / SFrame]
    DTLS[DTLS-SRTP per hop]
  end

  Clients --> AUTH --> SIG --> JOIN
  Clients --> ICE
  ICE --> STUN
  ICE -. relay .-> TURN
  Clients --> SFU
  SFU --> BG
  SFU --> NS
  SFU --> SIM
  SFU --> JIT
  SFU --> BW
  SFU --> SR --> REC_S3
  SR --> TRANS
  Webinar --- SFU
  Webinar --- CDN
  Features --- SFU
  PSTN --> MCU
  Crypto --- Media

    classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
    classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
    classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
    classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
    classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
    classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
    classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
    classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
    classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
    classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
    class APP,WEB,PRES,AUD,CDN client;
    class PSTN,SIG,JOIN,AUTH,DIR,STUN,ICE,SFU,MCU,SIM,SR,BG,NS,JIT,BW,SHARE,CHAT,REACT,POLL,WB,CAP,REC,E2E,DTLS service;
    class TRANS,META datastore;
    class LIVE queue;
    class TURN compute;
    class REC_S3 storage;

SFU vs MCU

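  • SFU (Selective Forwarding Unit): forwards each participant's stream to the others without decoding it; each client renders the mosaic itself. Server CPU cost is low, but each participant's downlink grows O(N) with room size and total server egress grows O(N²).
  • MCU (Multipoint Control Unit): the server decodes and mixes all streams into one outgoing stream, so per-client bandwidth stays fixed, at the cost of heavy server CPU and added latency.
  • Modern stacks default to SFU; an MCU is used for PSTN bridging or very large webinars.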
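As a rough illustration of the bandwidth trade-off above, the sketch below compares per-participant downlink and total server egress for the two topologies. The function names and the 1.5 Mbps per-stream bitrate are assumptions made for this example, and the model ignores simulcast, which in practice lowers the SFU figures.

```python
def sfu_load(n: int, stream_mbps: float) -> dict:
    """SFU: each participant uploads one stream and receives the other n-1."""
    downlink = (n - 1) * stream_mbps
    return {
        "uplink_per_client": stream_mbps,
        "downlink_per_client": downlink,   # grows O(N)
        "server_egress": n * downlink,     # grows O(N^2)
    }


def mcu_load(n: int, stream_mbps: float) -> dict:
    """MCU: each participant uploads one stream and receives one mixed stream."""
    return {
        "uplink_per_client": stream_mbps,
        "downlink_per_client": stream_mbps,  # fixed, independent of N
        "server_egress": n * stream_mbps,    # grows O(N), but the server must decode and mix
    }


if __name__ == "__main__":
    for n in (4, 10, 25):
        print(n, sfu_load(n, 1.5), mcu_load(n, 1.5))
```

The model only counts bits; it does not capture the MCU's decode, mix, and re-encode cost, which is the reason SFUs are the default despite the higher egress.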

Simulcast / SVC

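  • Simulcast: the sender encodes 2-3 independent versions of the same video at different resolutions (e.g., 1080p, 720p, 360p).
  • The SFU forwards to each receiver the layer that fits its available bandwidth and display size.
  • SVC (Scalable Video Coding) sends a single layered stream instead; the SFU drops enhancement layers for constrained receivers.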
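The per-receiver layer choice can be sketched as follows. This is a minimal illustration, not how any particular SFU implements it: the `Layer` type, the ladder bitrates, and `pick_layer` are hypothetical names and values, and the ladder is assumed to be sorted from highest to lowest quality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    height: int  # vertical resolution in pixels
    kbps: int    # approximate bitrate of this encoding


# Hypothetical simulcast ladder, highest quality first.
LADDER = [Layer("high", 1080, 2500), Layer("mid", 720, 1200), Layer("low", 360, 300)]


def pick_layer(estimated_kbps: int, viewport_height: int, ladder=LADDER) -> Layer:
    """Pick the best layer that fits the bandwidth estimate and the receiver's tile."""
    # Smallest layer that still covers the tile; anything larger wastes bandwidth.
    covering = [l for l in ladder if l.height >= viewport_height]
    max_useful = min(covering, key=lambda l: l.height) if covering else ladder[0]
    for layer in ladder:  # highest quality first
        if layer.height <= max_useful.height and layer.kbps <= estimated_kbps:
            return layer
    return ladder[-1]  # nothing fits the estimate: fall back to the lowest layer


if __name__ == "__main__":
    print(pick_layer(3000, 1080).name)  # full-screen view on a good link
    print(pick_layer(3000, 300).name)   # thumbnail tile
    print(pick_layer(1000, 1080).name)  # full-screen view on a constrained link
```

A production SFU would also smooth layer switches over time and wait for a keyframe before switching, which this sketch leaves out.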

NAT traversal

  1. Each client gathers ICE candidates: host, server-reflexive (STUN), relayed (TURN).
  2. Exchange candidates via signaling.
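  3. Run connectivity checks on candidate pairs; pick the highest-priority pair that succeeds (direct paths rank above relayed ones).
  4. Fall back to a TURN relay when no direct pair works, e.g. behind a symmetric NAT or a firewall that blocks UDP.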
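The ranking in steps 3 and 4 can be sketched with the candidate and pair priority formulas from RFC 8445, using the type preferences that RFC recommends (host 126, peer-reflexive 110, server-reflexive 100, relayed 0). The `best_working_pair` helper and its `is_reachable` callback are hypothetical stand-ins for real STUN connectivity checks.

```python
# Recommended type preferences from RFC 8445.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535, component_id: int = 1) -> int:
    """priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)


def pair_priority(controlling: int, controlled: int) -> int:
    """pair priority = 2^32 * min(G, D) + 2 * max(G, D) + (1 if G > D else 0)."""
    g, d = controlling, controlled
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


def best_working_pair(pairs, is_reachable):
    """Return the highest-priority (local_type, remote_type) pair that passes its check.

    is_reachable is a hypothetical callback standing in for a STUN connectivity check.
    """
    working = [p for p in pairs if is_reachable(p)]
    return max(
        working,
        key=lambda p: pair_priority(candidate_priority(p[0]), candidate_priority(p[1])),
        default=None,
    )


if __name__ == "__main__":
    pairs = [("host", "host"), ("srflx", "srflx"), ("relay", "relay")]
    # Simulate a NAT that blocks the direct host-to-host path.
    print(best_working_pair(pairs, lambda p: p[0] != "host"))
```

Because relayed candidates carry a type preference of 0, a TURN pair is selected only when every higher-priority pair has failed its check, which matches the fallback behaviour in step 4.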

E2E encryption

  • Per-meeting key, ratcheted via MLS / Sender Keys.
  • SFrame encrypts each media frame end-to-end; the SFU still routes packets using the unencrypted headers, without decrypting the payload.
  • Trade-off: server-side features that need plaintext media, such as recording and captions, are limited.
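To make the idea of a ratcheted per-meeting key concrete, here is a deliberately simplified sketch using HMAC-SHA256 from the Python standard library. It is not the MLS key schedule or the SFrame wire format; the function names, labels, and the `commit_secret` input are assumptions made for illustration only.

```python
import hashlib
import hmac


def derive(secret: bytes, label: bytes) -> bytes:
    """One-way derivation: the output reveals nothing useful about `secret`."""
    return hmac.new(secret, label, hashlib.sha256).digest()


def next_epoch(epoch_secret: bytes, commit_secret: bytes) -> bytes:
    """Advance the meeting secret, e.g. when membership changes.

    commit_secret is fresh material shared only with current members (assumed
    here), so a departed participant cannot follow the ratchet forward.
    """
    return derive(epoch_secret, b"epoch:" + commit_secret)


def sender_key(epoch_secret: bytes, sender_id: str) -> bytes:
    """Derive a distinct media key for each sender from the current epoch secret."""
    return derive(epoch_secret, b"sender:" + sender_id.encode())


if __name__ == "__main__":
    epoch0 = bytes(32)  # placeholder; a real secret comes from the key agreement
    epoch1 = next_epoch(epoch0, b"fresh-commit-material")
    print(sender_key(epoch1, "alice").hex())
```

The SFU never sees any of these secrets; it only forwards the ciphertext, which is why recording and captioning need a different arrangement when E2E is enabled.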

Recording / captioning

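  • Server-side: a recorder fed by the SFU writes the raw frames and a mixed composite to S3.
  • Captions: real-time ASR (automatic speech recognition) on the audio stream, with multilingual translation applied live.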

Capacity

  • Each SFU handles 1k–5k participants depending on machine and bitrates.
  • At 1M concurrent meetings, this implies thousands of SFU nodes deployed globally.
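The arithmetic behind that estimate can be sketched as below. The average of 4 participants per meeting and the 30% headroom are assumptions made for this example; the source only gives the 1k to 5k participants-per-SFU range and the 1M meeting count.

```python
import math


def sfu_nodes_needed(
    concurrent_meetings: int,
    avg_participants: float,
    participants_per_sfu: int,
    headroom: float = 0.3,
) -> int:
    """Estimate the SFU node count, keeping `headroom` of each node's capacity free."""
    total_participants = concurrent_meetings * avg_participants
    usable_capacity = participants_per_sfu * (1 - headroom)
    return math.ceil(total_participants / usable_capacity)


if __name__ == "__main__":
    # Assumed: 4 participants per meeting on average, 30% headroom per node.
    for capacity in (1000, 5000):
        print(capacity, sfu_nodes_needed(1_000_000, 4, capacity))
```

Under these assumptions the result is roughly 1,100 to 5,700 nodes, consistent with the "thousands" figure; regional redundancy and uneven geographic load would push the real number higher.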

Glossary & fundamentals

Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.

Tag | Concept               | What it is                                | Page
HLD | CDN                   | edge caching for static assets            | cdn
HLD | CAP / PACELC          | C vs A under partition; L vs C otherwise  | cap-pacelc
HLD | Idempotency & retries | safe re-execution, backoff + jitter       | idempotency-retries
HLD | Realtime protocols    | WS / SSE / polling / gRPC streaming       | realtime-protocols