Distributed File System — Notes

Goal

A single namespace usable by data-processing frameworks, with bandwidth-saturating sequential reads/writes and rack-aware placement.
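A minimal sketch of what rack-aware placement could look like. The notes do not say which policy is intended, so this assumes the commonly described HDFS-style default for three replicas: first copy on the writer's node, second on a node in a different rack, third on another node in that same remote rack. `place_replicas`, the `topology` dict, and the rack/node names are hypothetical and only for illustration.

```python
import random


def place_replicas(topology, writer_rack, writer_node, rng=random):
    """Pick three (rack, node) targets for one block.

    topology maps a rack name to the list of node names on that rack.
    Assumed policy (not taken from the notes): local node, then two
    distinct nodes on one remote rack.
    """
    # Replica 1: the writer's own node, so the first copy is a local write.
    first = (writer_rack, writer_node)

    # Replicas 2 and 3 go to one remote rack, so a failure of the writer's
    # rack cannot lose the block and only one transfer crosses racks.
    remote_racks = [
        rack for rack, nodes in topology.items()
        if rack != writer_rack and len(nodes) >= 2
    ]
    if not remote_racks:
        raise ValueError("need a second rack with at least two nodes")
    remote_rack = rng.choice(remote_racks)

    # Two distinct nodes on the chosen remote rack.
    second_node, third_node = rng.sample(topology[remote_rack], 2)
    return [first, (remote_rack, second_node), (remote_rack, third_node)]


# Hypothetical two-rack cluster; the remote nodes picked vary per run.
topology = {"rack-a": ["a1", "a2"], "rack-b": ["b1", "b2", "b3"]}
print(place_replicas(topology, "rack-a", "a1"))
```

The block survives the loss of either rack, and a reader on either rack has a rack-local copy. A real placement policy would also weigh free space and node load, which this sketch ignores.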

Append-only writes suit analytics but not OLTP-like workloads (no in-place updates) → split storage by purpose.
Single master is simpler but becomes a metadata bottleneck → federation / Colossus-style separation.
Large blocks give great sequential bandwidth but handle many tiny files poorly (NameNode RAM pressure: every file and block is tracked in memory).
3× replication vs erasure coding (EC): replication gives faster recovery and reads; EC lowers storage cost for cold data.
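The last two trade-offs can be made concrete with back-of-envelope arithmetic. The sketch below rests on two assumptions that are not in the notes: roughly 150 bytes of NameNode heap per namespace object (a commonly cited rule of thumb, not a measured value) and a Reed-Solomon layout with 6 data and 3 parity units for the erasure-coded case. All function names are hypothetical.

```python
def namenode_heap_bytes(num_files, blocks_per_file=1, bytes_per_object=150):
    """Rough NameNode heap estimate.

    Assumes each file and each block is one in-memory object of about
    bytes_per_object bytes (rule-of-thumb figure, an assumption here).
    """
    objects = num_files * (1 + blocks_per_file)
    return objects * bytes_per_object


def replication_profile(copies=3):
    """Raw bytes stored per logical byte, and how many lost copies are survivable."""
    return {"overhead": float(copies), "tolerated_losses": copies - 1}


def erasure_coding_profile(data_units=6, parity_units=3):
    """Same two numbers for a (data_units, parity_units) Reed-Solomon layout."""
    overhead = (data_units + parity_units) / data_units
    return {"overhead": overhead, "tolerated_losses": parity_units}


# 100 million single-block files, in GB of heap.
print(namenode_heap_bytes(100_000_000) / 1e9)  # 30.0
print(replication_profile())      # {'overhead': 3.0, 'tolerated_losses': 2}
print(erasure_coding_profile())   # {'overhead': 1.5, 'tolerated_losses': 3}
```

Under these assumptions, tiny files cost metadata memory regardless of how little data they hold, and the 6+3 layout halves raw storage relative to 3× replication. The price is that rebuilding a lost unit means reading several surviving units instead of copying one replica, which fits the faster recovery the notes credit to replication.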