Distributed File System (HDFS / GFS) — Simple

Problem statement (interviewer prompt)

Design a GFS/HDFS-style distributed file system for analytics workloads. Files are large (GB–TB), writes are append-only, reads are bandwidth-bound, and the namespace must survive single-machine failures. Cover the NameNode/DataNode split (master/chunkserver in GFS terms), chunk placement, and recovery.

flowchart LR
  C([Client])
  NN[NameNode<br/>metadata + namespace]
  DN1[(DataNode 1)]
  DN2[(DataNode 2)]
  DN3[(DataNode 3)]
  C -->|open file| NN
  NN -->|chunk locations| C
  C -->|read/write| DN1
  C --> DN2
  C --> DN3
  DN1 <-. replicate .-> DN2

    classDef client fill:#dbeafe,stroke:#1e40af,stroke-width:1px,color:#0f172a;
    classDef edge fill:#cffafe,stroke:#0e7490,stroke-width:1px,color:#0f172a;
    classDef service fill:#fef3c7,stroke:#92400e,stroke-width:1px,color:#0f172a;
    classDef datastore fill:#fee2e2,stroke:#991b1b,stroke-width:1px,color:#0f172a;
    classDef cache fill:#fed7aa,stroke:#9a3412,stroke-width:1px,color:#0f172a;
    classDef queue fill:#ede9fe,stroke:#5b21b6,stroke-width:1px,color:#0f172a;
    classDef compute fill:#d1fae5,stroke:#065f46,stroke-width:1px,color:#0f172a;
    classDef storage fill:#e5e7eb,stroke:#374151,stroke-width:1px,color:#0f172a;
    classDef external fill:#fce7f3,stroke:#9d174d,stroke-width:1px,color:#0f172a;
    classDef obs fill:#f3e8ff,stroke:#6b21a8,stroke-width:1px,color:#0f172a;
    class C client;
    class NN service;
    class DN1,DN2,DN3 datastore;
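
The flow in the diagram can be sketched in a few lines of Python. This is a minimal in-memory illustration, not HDFS or GFS code: the class names `NameNode` and `DataNode` mirror the diagram, while `open_file`, `append_chunk`, `read_file`, and the `alive` flag are hypothetical names chosen for this example. It assumes the client writes each replica directly, whereas real systems typically forward data along a replication pipeline between DataNodes. The point it shows is the split itself: the NameNode returns only chunk locations, and file bytes move between the client and the DataNodes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Holds chunk bytes keyed by chunk id (in-memory stand-in for a disk)."""
    name: str
    alive: bool = True  # hypothetical liveness flag, flipped by hand in this sketch
    chunks: dict[str, bytes] = field(default_factory=dict)


@dataclass
class NameNode:
    """Holds metadata only: file -> ordered chunk ids, chunk id -> replica holders."""
    file_chunks: dict[str, list[str]] = field(default_factory=dict)
    chunk_locations: dict[str, list[DataNode]] = field(default_factory=dict)

    def open_file(self, path: str) -> list[tuple[str, list[DataNode]]]:
        # Return chunk ids with their replica locations; no file data passes through.
        return [(cid, self.chunk_locations[cid]) for cid in self.file_chunks[path]]


def append_chunk(nn: NameNode, path: str, chunk_id: str, data: bytes,
                 replicas: list[DataNode]) -> None:
    # Simplification: write every replica directly, then record the metadata.
    for dn in replicas:
        dn.chunks[chunk_id] = data
    nn.file_chunks.setdefault(path, []).append(chunk_id)
    nn.chunk_locations[chunk_id] = list(replicas)


def read_file(nn: NameNode, path: str) -> bytes:
    out = bytearray()
    for chunk_id, replicas in nn.open_file(path):
        # Read each chunk from the first replica that is still alive.
        live = next((dn for dn in replicas if dn.alive), None)
        if live is None:
            raise IOError(f"no live replica for chunk {chunk_id}")
        out += live.chunks[chunk_id]
    return bytes(out)


if __name__ == "__main__":
    dn1, dn2, dn3 = DataNode("dn1"), DataNode("dn2"), DataNode("dn3")
    nn = NameNode()
    append_chunk(nn, "/logs/a", "c0", b"hello ", [dn1, dn2])
    append_chunk(nn, "/logs/a", "c1", b"world", [dn2, dn3])
    dn1.alive = False  # simulate a single-machine failure
    print(read_file(nn, "/logs/a"))
```

With two replicas per chunk, marking `dn1` as failed still lets the read succeed, because the client falls back to `dn2` for the first chunk. Losing every replica of a chunk raises an error, which is the case re-replication during recovery is meant to prevent.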