Object Storage (S3 / GCS / Azure Blob) - Detailed
flowchart TB
subgraph Client[Clients]
SDK([SDK / CLI])
BR([Browser pre-signed])
APP[App service]
end
subgraph Edge[Edge]
DNS[Regional endpoint DNS]
LB[L7 LB]
GW[REST Frontend<br/>SigV4 auth, range, multipart]
end
subgraph Control[Control Plane]
AUTH[IAM / SigV4 verify]
POL[Bucket policy / ACL]
LIFECYCLE[Lifecycle rules<br/>tier transition / expiry]
VERSION[Versioning manager]
REPL[Cross-region replication]
EVENTS[[Event notifications<br/>SNS / Lambda]]
end
subgraph Index[Metadata / Index]
KEYS[(Key index<br/>distributed B-tree)]
PART[Partitioning by key prefix]
HOT[Hot-shard splitter]
end
subgraph Data[Data Plane - Storage Nodes]
PLACE[Placement service]
SHARD[Erasure coding<br/>e.g. 10+4 RS]
CHUNK[Chunks 4-64 MB]
NODE1[Storage node 1]
NODE2[Storage node 2]
NODE3[Storage node 3]
NODEN[Storage node N]
SCRUB[Scrubber / repair]
BG[Background rebalance]
end
subgraph Tiering
STANDARD[Standard tier - SSD/HDD mix]
IA[Infrequent access]
GLACIER[Cold / Glacier / Archive<br/>tape, minutes-to-hours retrieval]
end
subgraph Features
MULTI[Multipart upload]
PRES[Pre-signed URLs]
CMK[KMS / SSE / CSE encryption]
LOCK[Object lock / WORM / Legal hold]
LOG[Server access logs]
RR[Requester-pays]
SELECT[S3 Select / pushdown]
end
Client --> DNS --> LB --> GW --> AUTH --> POL
GW --> KEYS
GW --> PLACE --> SHARD --> CHUNK
CHUNK --> NODE1
CHUNK --> NODE2
CHUNK --> NODE3
CHUNK --> NODEN
SCRUB -.repair.-> Data
BG -.move.-> Data
LIFECYCLE -.transition.-> Tiering
VERSION --- KEYS
REPL -.async.-> Data
EVENTS -.fire.-> Client
Features --- GW
classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
class SDK,BR client;
class DNS,LB edge;
class APP,GW,AUTH,LIFECYCLE,VERSION,REPL,PART,HOT,PLACE,SHARD,CHUNK,NODE1,NODE2,NODE3,NODEN,SCRUB,BG,STANDARD,IA,MULTI,PRES,CMK,LOCK,RR service;
class KEYS,GLACIER datastore;
class EVENTS queue;
class POL,SELECT storage;
class LOG obs;
Architecture beats
- Flat keyspace keyed by `bucket + object_key`; no real directories.
- Metadata (index) is separate from data (blobs). The index is itself a distributed sorted structure.
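- Erasure coding: each object is split into chunks; each chunk is encoded into N+K shards (e.g. Reed-Solomon 10+4) spread across nodes/racks/AZs. Any N shards rebuild the chunk, so 10+4 survives 4 lost shards at 1.4× raw storage.
- Durability of 11 nines (99.999999999%) comes from EC + cross-AZ placement + scrubbing.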
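To make the erasure-coding bullet concrete, here is a minimal sketch. It does not implement Reed-Solomon; as a deliberately simplified stand-in it uses a single XOR parity shard (an N+1 code that survives exactly one lost shard), because that fits in a few lines. The names `storage_overhead`, `encode_xor`, and `recover_xor` are hypothetical and not part of any provider's API.

```python
from functools import reduce
from typing import List, Optional


def storage_overhead(data_shards: int, parity_shards: int) -> float:
    """Raw bytes stored per logical byte for an N+K code."""
    return (data_shards + parity_shards) / data_shards


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode_xor(chunk: bytes, data_shards: int) -> List[bytes]:
    """Split a chunk into N equal data shards and append one XOR parity shard.

    Simplified stand-in for Reed-Solomon: tolerates exactly one lost shard.
    """
    shard_len = -(-len(chunk) // data_shards)  # ceiling division
    padded = chunk.ljust(shard_len * data_shards, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(data_shards)]
    shards.append(reduce(_xor, shards))  # parity = XOR of all data shards
    return shards


def recover_xor(shards: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the single missing shard (marked None) by XOR-ing the survivors."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost shard")
    repaired = list(shards)
    if missing:
        survivors = [s for s in shards if s is not None]
        repaired[missing[0]] = reduce(_xor, survivors)
    return repaired


if __name__ == "__main__":
    shards = encode_xor(b"hello object storage", data_shards=4)
    shards[2] = None  # simulate a failed storage node
    restored = recover_xor(shards)
    print(b"".join(restored[:4]))
    print(storage_overhead(10, 4))
```

A production code such as 10+4 Reed-Solomon generalizes the same idea to K parity shards, and the original chunk length would be kept in metadata rather than inferred from padding.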
Consistency
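- Modern S3 (since December 2020): strong read-after-write consistency for new objects, overwrites, deletes, and list operations, within every region. Cross-region replication remains asynchronous, so a replica bucket can lag its source.
- Historically, overwrites and deletes were only eventually consistent.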
Multipart upload
- Initiate → upload parts (parallel) → complete (server assembles).
- Allows resumable uploads of TB-scale objects: a failed part is retried on its own, without resending the parts already stored.
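The flow above can be sketched from the client side as a part-planning step. This is an illustration only, with no network calls: the helper names `plan_parts` and `remaining_parts` are hypothetical, and the limits used (5 MiB minimum part size except for the last part, at most 10,000 parts) are the ones S3 documents; other providers may differ.

```python
from typing import Iterable, List, Tuple

MIN_PART = 5 * 1024**2   # S3 minimum part size (the last part may be smaller)
MAX_PARTS = 10_000       # S3 maximum number of parts per upload

Part = Tuple[int, int, int]  # (part_number, offset, length)


def plan_parts(object_size: int, part_size: int = 64 * 1024**2) -> List[Part]:
    """Return (part_number, offset, length) tuples covering the whole object.

    Part numbers start at 1. The part size is raised when needed so the
    object fits within MAX_PARTS.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    part_size = max(part_size, MIN_PART)
    min_size_for_limit = -(-object_size // MAX_PARTS)  # ceiling division
    part_size = max(part_size, min_size_for_limit)

    parts: List[Part] = []
    offset, number = 0, 1
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


def remaining_parts(plan: List[Part], uploaded_numbers: Iterable[int]) -> List[Part]:
    """Resume support: parts still to send, given the part numbers already stored."""
    done = set(uploaded_numbers)
    return [p for p in plan if p[0] not in done]


if __name__ == "__main__":
    plan = plan_parts(200 * 1024**2)
    print(len(plan), "parts planned")
    print(remaining_parts(plan, uploaded_numbers=[1, 3]))
```

Because each tuple is independent, the parts can be uploaded by parallel workers in any order; the complete step then sends the ordered list of part numbers so the server can assemble the object.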
Lifecycle & tiering
- Transition rules: `> 30 d` to IA, `> 90 d` to Glacier, `> 365 d` delete.
- Restore from cold tiers takes minutes to hours; pay-per-restore.
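The transition rules can be read as an age-based lookup. The sketch below assumes the 30 / 90 / 365-day thresholds from the bullet; the tier labels and the function name `lifecycle_action` are hypothetical. Real lifecycle engines also filter by prefix, tags, and object version, which is omitted here.

```python
from datetime import datetime, timedelta, timezone

# (age threshold in days, action), checked from the oldest threshold down.
RULES = [
    (365, "DELETE"),
    (90, "GLACIER"),
    (30, "IA"),
]


def lifecycle_action(created_at: datetime, now: datetime) -> str:
    """Return the tier an object belongs in (or DELETE) based on its age."""
    age_days = (now - created_at).days
    for threshold, action in RULES:
        if age_days > threshold:  # strictly older than the threshold
            return action
    return "STANDARD"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for days in (10, 45, 120, 400):
        print(days, lifecycle_action(now - timedelta(days=days), now))
```

Checking the largest threshold first keeps the rules mutually exclusive, so an object older than 365 days is deleted rather than moved to a colder tier first.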
Glossary & fundamentals
Concepts referenced in this design. Each row links to its canonical page; the tag column shows whether it is a high-level (HLD) or low-level (LLD) concept.
| Tag | Concept | What it is | Page |
|---|---|---|---|
| HLD | Load balancer / GSLB | L4/L7 traffic distribution and failover | load-balancer |
| HLD | Sharding | horizontal partitioning across nodes | database-sharding |
| HLD | Pub/Sub & message brokers | topics, consumer groups, delivery semantics | pub-sub-pattern |
| HLD | Leader/follower replication | sync/semi-sync/async replication, failover | replication-leader-follower |
| HLD | LSM vs B-Tree engines | WAL, memtable, SSTables, compaction | storage-engines-lsm-btree |
| HLD | Realtime protocols | WS / SSE / polling / gRPC streaming | realtime-protocols |
| HLD | Multi-region & DR | RTO / RPO, active-active, failover | multi-region-dr |
| LLD | REST API design | verbs, statuses, pagination, errors | rest-api-design |