Service Mesh: Detailed
```mermaid
flowchart TB
subgraph DataPlane[Data plane - per pod sidecar]
direction LR
P1([Sidecar A<br/>Envoy / linkerd2-proxy])
P2([Sidecar B])
P3([Sidecar C])
P1 --> P2
P2 --> P3
end
subgraph ControlPlane[Control plane]
XDS[xDS server<br/>route + cluster + listener config]
CA[Cert authority<br/>SPIFFE / SPIRE]
POLICY[Policy engine<br/>OPA / native]
TELEM[Telemetry collector]
end
subgraph Features[Cross-cutting features]
MTLS[mTLS everywhere]
RETRY[Retries + timeouts + circuit breaker]
TRAFFIC[Traffic split / canary / mirroring]
AUTHZ[L7 authz]
OBS[Distributed tracing + metrics]
end
ControlPlane -.config.-> DataPlane
DataPlane --- Features
classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
class P1 edge;
class P2,P3,XDS,CA,POLICY,MTLS,TRAFFIC,AUTHZ service;
class RETRY datastore;
class TELEM,OBS obs;
```
Why it exists
Resilience and security logic ends up reimplemented in every service, in every language. A mesh moves these concerns out of the application into a sidecar proxy (or a per-host agent) and enforces them uniformly:
- mTLS between every pair of services, with certificates rotated automatically.
- Retries with a retry budget, timeouts, circuit breakers, and outlier detection.
- Traffic shifting for canary, blue-green, and A/B releases.
- Authorization by service identity (SPIFFE ID), HTTP method, and headers.
- Telemetry: golden signals and traces without in-app metrics code. Apps must still forward trace-context headers (e.g. `traceparent`) so the proxies' spans join into a single trace.
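To make identity-based authorization concrete, here is a minimal Python sketch of a default-deny policy check. It assumes the proxy has already verified the caller's mTLS certificate and extracted its SPIFFE ID. The `Rule` and `Request` shapes, `is_allowed`, and the example IDs (written in Istio's `spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>` pattern) are hypothetical; real meshes express this as declarative policy, not application code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """One allow rule (hypothetical shape): caller identity, methods, path prefix, headers."""
    principals: frozenset          # SPIFFE IDs allowed to call
    methods: frozenset             # HTTP methods, e.g. frozenset({"GET"})
    path_prefix: str = "/"         # naive prefix match on the request path
    required_headers: tuple = ()   # (name, value) pairs that must match exactly


@dataclass
class Request:
    peer_id: str   # SPIFFE ID taken from the caller's verified mTLS certificate
    method: str
    path: str
    headers: dict = field(default_factory=dict)


def is_allowed(req: Request, rules: list) -> bool:
    """Default deny: allow only if at least one rule matches every condition."""
    return any(
        req.peer_id in rule.principals
        and req.method in rule.methods
        and req.path.startswith(rule.path_prefix)
        and all(req.headers.get(k) == v for k, v in rule.required_headers)
        for rule in rules
    )


FRONTEND = "spiffe://cluster.local/ns/shop/sa/frontend"
rules = [Rule(principals=frozenset({FRONTEND}), methods=frozenset({"GET"}), path_prefix="/api/orders")]

print(is_allowed(Request(FRONTEND, "GET", "/api/orders/42"), rules))   # True
print(is_allowed(Request(FRONTEND, "POST", "/api/orders/42"), rules))  # False: method not allowed
print(is_allowed(Request("spiffe://cluster.local/ns/shop/sa/batch", "GET", "/api/orders/42"), rules))  # False: unknown caller
```

Rules are OR-ed, conditions inside a rule are AND-ed, and an empty rule list denies everything. The plain prefix match is deliberately naive: `/api/orders` would also match `/api/orders-admin`.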
Architecture choices
| | Sidecar (per pod) | Per-host agent | Sidecarless (eBPF) |
|---|---|---|---|
| Examples | Istio, Linkerd, Consul Connect | early Linkerd 1.x | Cilium Service Mesh |
| Resource cost | +1 container per pod | one shared agent per host | eBPF in the kernel, plus a shared per-node proxy for L7 features |
| Maturity | mature | aging | newer |
| Granularity | per app | per host | per socket |
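The resource-cost row is simple arithmetic, but it is worth running for your own fleet because the sidecar column scales with pods while the per-host column scales with nodes. This sketch uses the 50–200 MB per-sidecar range quoted under pitfalls; the fleet size (500 pods on 20 nodes) and the 500 MB per-host agent are made-up placeholders, not measurements.

```python
def sidecar_overhead_mb(pods: int, per_sidecar_mb: int) -> int:
    """Sidecar model: one proxy per pod, so memory scales with pod count."""
    return pods * per_sidecar_mb


def per_host_overhead_mb(nodes: int, per_agent_mb: int) -> int:
    """Per-host model: one shared agent per node, so memory scales with node count."""
    return nodes * per_agent_mb


# Hypothetical fleet: 500 pods packed onto 20 nodes.
PODS, NODES = 500, 20

low = sidecar_overhead_mb(PODS, 50)        # low end of the 50-200 MB range
high = sidecar_overhead_mb(PODS, 200)      # high end of the range
shared = per_host_overhead_mb(NODES, 500)  # assumes a 500 MB agent per node (placeholder)
print(f"sidecars: {low:,}-{high:,} MB, per-host agents: {shared:,} MB")
# sidecars: 25,000-100,000 MB, per-host agents: 10,000 MB
```

Doubling pod density on the same nodes doubles the sidecar total but leaves the per-host total unchanged, which is the main trade-off the table summarizes.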
Istio data flow
```mermaid
sequenceDiagram
participant A as Service A
participant SA as Envoy A
participant SB as Envoy B
participant B as Service B
A->>SA: HTTP/gRPC localhost
SA->>SB: mTLS, retries, tracing
SB->>B: HTTP/gRPC localhost
B-->>SB: response
SB-->>SA: response
SA-->>A: response
```
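On the Envoy A → Envoy B hop, the client-side proxy decides whether to retry. A retry budget caps retries as a fraction of original traffic, so a struggling Service B is not hit with a retry storm. The sketch below is a simplified, hypothetical model loosely following the ratio-plus-floor idea behind Linkerd's retry budgets; it uses cumulative counters instead of a time window and treats any 5xx as retryable, both of which are assumptions.

```python
from typing import Callable


class RetryBudget:
    """Ratio-based retry budget (simplified: cumulative counters, no time window).

    Retries are allowed while total retries <= min_retries + ratio * requests,
    so upstream calls stay below (1 + ratio) * requests + min_retries.
    """

    def __init__(self, ratio: float = 0.2, min_retries: int = 10):
        self.ratio = ratio              # retries allowed per original request
        self.min_retries = min_retries  # floor so low-traffic callers can still retry
        self.requests = 0
        self.retries = 0

    def record_request(self) -> None:
        self.requests += 1

    def try_spend(self) -> bool:
        """Consume one retry if the budget allows it."""
        if self.retries + 1 <= self.min_retries + self.ratio * self.requests:
            self.retries += 1
            return True
        return False


def call_with_retries(send: Callable[[], int], budget: RetryBudget, max_attempts: int = 3) -> int:
    """Client-side proxy loop: retry 5xx responses up to max_attempts, if the budget allows."""
    budget.record_request()
    status = send()
    attempts = 1
    while status >= 500 and attempts < max_attempts and budget.try_spend():
        status = send()
        attempts += 1
    return status


calls = []


def always_503() -> int:
    calls.append(1)  # count every call that reaches the upstream
    return 503


budget = RetryBudget(ratio=0.5, min_retries=0)
for _ in range(4):
    call_with_retries(always_503, budget)
print(budget.retries, len(calls))  # 2 6
```

Without the budget, four failing requests with `max_attempts=3` would send 12 calls upstream; the budget holds it to 6.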
Ingress + mesh
The mesh handles east-west traffic (service↔service). For north-south traffic (internet ↔ service) you still need an ingress controller or API gateway. Many meshes ship an ingress gateway component that is itself just another proxy (in Istio, another Envoy).
When to use one
- 30 services and growing.
- Polyglot stack — Java, Go, Python, Node.
- Zero-trust requirement (mTLS + identity-based authz).
- You want canary / traffic shifting without app changes.
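Traffic shifting without app changes comes down to a weighted choice the proxy makes for each request. A minimal sketch, assuming two hypothetical subsets, `v1` (stable) and `v2` (canary), plus a hypothetical `x-canary` header that pins testers to the canary; real meshes configure these as route weights and header matches rather than code.

```python
import random
from collections import Counter


def pick_subset(weights: dict, rng: random.Random) -> str:
    """Weighted random choice: each subset wins in proportion to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def route(headers: dict, weights: dict, rng: random.Random) -> str:
    """Header match first (hypothetical x-canary header), then the weighted split."""
    if headers.get("x-canary") == "true":
        return "v2"
    return pick_subset(weights, rng)


split = {"v1": 90, "v2": 10}  # 90/10 canary
rng = random.Random(7)        # seeded so the run is reproducible
counts = Counter(route({}, split, rng) for _ in range(10_000))
print(counts["v1"], counts["v2"])  # roughly a 90/10 split
```

Promoting the canary is then just a sequence of weight changes (90/10, 50/50, 0/100) pushed to the proxies, with no redeploy of either version.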
When to skip
- Monolith or a handful of services: library-based resilience (Resilience4j, Polly) is simpler.
- Tight latency budget: the sidecars add roughly 0.5–2 ms per hop (each call passes through the caller's and the callee's proxy).
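The per-hop cost compounds along a sequential call chain. A back-of-the-envelope sketch using the 0.5–2 ms range above; the chain depth (A → B → C → D → E, four hops) and the 50 ms end-to-end budget are hypothetical.

```python
def added_latency_ms(hops: int, per_hop_ms: float) -> float:
    """Extra latency on a sequential call chain: the mesh cost is paid on every hop."""
    return hops * per_hop_ms


HOPS = 4           # hypothetical chain A -> B -> C -> D -> E
BUDGET_MS = 50.0   # hypothetical end-to-end latency budget

low, high = added_latency_ms(HOPS, 0.5), added_latency_ms(HOPS, 2.0)
print(f"{low:.1f}-{high:.1f} ms added, {low / BUDGET_MS:.0%}-{high / BUDGET_MS:.0%} of the budget")
# 2.0-8.0 ms added, 4%-16% of the budget
```

Calls made in parallel overlap, so only the longest sequential path through the call graph counts toward this total.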
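Common pitfalls
- mTLS misconfigurations silently drop traffic: in strict mode, callers without a sidecar (and so without a mesh certificate) are rejected. Start in permissive mode, which accepts both mTLS and plaintext, and enforce strict only once every caller is in the mesh.
- Resource overhead: 50–200 MB RAM per sidecar at scale.
- Debug visibility: the extra proxy hop changes how `curl localhost` behaves.
- Upgrades: control plane and proxy versions must stay compatible, so upgrade them together and canary the rollout.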
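The permissive-first advice can be modeled as a two-mode decision in the server-side proxy. The mode names mirror the `PERMISSIVE` and `STRICT` modes of Istio's PeerAuthentication; the functions, the sample traffic, and the readiness rule are hypothetical simplifications.

```python
from enum import Enum


class MtlsMode(Enum):
    PERMISSIVE = "PERMISSIVE"  # accept mTLS and plaintext (migration phase)
    STRICT = "STRICT"          # accept mTLS only


def accept_inbound(mode: MtlsMode, is_mtls: bool) -> bool:
    """Simplified server-side sidecar decision for one inbound connection."""
    return is_mtls or mode is MtlsMode.PERMISSIVE


def ready_for_strict(observed_is_mtls: list) -> bool:
    """Flip to STRICT only after permissive mode has seen traffic and none of it was plaintext."""
    return bool(observed_is_mtls) and all(observed_is_mtls)


# Hypothetical sample: one caller (e.g. a pod without a sidecar) still sends plaintext.
sample = [True, True, False, True]
print([accept_inbound(MtlsMode.PERMISSIVE, c) for c in sample])  # [True, True, True, True]
print([accept_inbound(MtlsMode.STRICT, c) for c in sample])      # [True, True, False, True]
print(ready_for_strict(sample))                                  # False
```

In practice the "observed" signal would come from mesh telemetry reporting whether each inbound connection used mTLS.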
Glossary & fundamentals
Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.
| Tag | Concept | What it is | Page |
|---|---|---|---|
| HLD | Load balancer / GSLB | L4/L7 traffic distribution and failover | load-balancer |
| HLD | API gateway / BFF | single ingress, auth, rate limit, routing | api-gateway |
| HLD | Idempotency & retries | safe re-execution, backoff + jitter | idempotency-retries |
| HLD | Resilience patterns | timeout, retry, breaker, bulkhead, backpressure | resilience-patterns |
| HLD | Observability | metrics, logs, traces, SLOs | observability |
| HLD | Service mesh | sidecar mesh, mTLS, traffic policy | service-mesh |
| LLD | Structural patterns | Adapter, Decorator, Facade, Proxy, Composite | structural-patterns |