Hadoop Distributed File System and Google File System

Google File System (GFS) (S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google File System,” Symposium on Operating Systems Principles ’03, 2003) is a proprietary distributed file system developed by Google. It is also the inspiration for the Hadoop Distributed File System (HDFS), a distributed file system developed as an Apache project.

Note: The basic design principles are similar for these two systems, with some small differences.

The core requirements of distributed file systems are as follows:

Fault tolerance

The system should continue to function despite any node failures.

Scalability

The system should be able to scale to huge volumes of stored information.

Optimized for batch operations

The system should be optimized for use-cases that involve batch operations, such as applications that perform processing and analysis of huge datasets. This implies that throughput is more important than latency, and most of the files are expected to be mutated by appending data rather than overwriting existing data.

GFS architecture

The following illustration gives an overview of the GFS architecture:

Note: In the original research paper describing GFS, the authors use the term “master”, but we’ll use the term “GFS manager,” “manager node,” or simply “manager” to refer to the same thing.

A GFS cluster consists of a single manager node and multiple chunk server nodes.

  • Chunk server nodes store and serve the data of the files.

  • The manager node maintains the file system metadata, informing clients about which chunk servers store a specific part of a file and performing necessary administration tasks, such as garbage collection of orphaned chunks or data migration during failures.

Note: The HDFS architecture is similar, but the manager node is called the NameNode, and the chunk server nodes are called DataNodes.
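The minimal Python sketch below illustrates this division of responsibilities. The class names, fields, and direct method calls (standing in for RPC) are assumptions for illustration, not the actual GFS or HDFS interfaces: the client contacts the manager only for metadata and then reads file data directly from a chunk server.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # GFS chunks are 64 MB

class ChunkServer:
    """Stores chunk data; in a real system each chunk is a file on local disk."""
    def __init__(self, name):
        self.name = name
        self.chunks = {}  # chunk handle -> bytes

    def read_chunk(self, handle, offset, length):
        return self.chunks[handle][offset:offset + length]

class Manager:
    """Keeps only metadata; file contents never pass through it."""
    def __init__(self):
        self.file_chunks = {}      # file name -> [chunk handle, ...]
        self.chunk_locations = {}  # chunk handle -> [ChunkServer, ...] (replicas)

    def lookup(self, filename, byte_offset):
        chunk_index = byte_offset // CHUNK_SIZE
        handle = self.file_chunks[filename][chunk_index]
        return handle, self.chunk_locations[handle]

def client_read(manager, filename, byte_offset, length):
    # Control path: ask the manager which chunk holds this offset and where it lives.
    handle, replicas = manager.lookup(filename, byte_offset)
    # Data path: read directly from a chunk server (here simply the first
    # replica; a real client prefers a nearby one and retries on failure).
    return replicas[0].read_chunk(handle, byte_offset % CHUNK_SIZE, length)

# Tiny example: one file with a single chunk stored on one server.
cs = ChunkServer("cs-01")
cs.chunks[0x1A] = b"hello, gfs"
manager = Manager()
manager.file_chunks["/logs/app.log"] = [0x1A]
manager.chunk_locations[0x1A] = [cs]
print(client_read(manager, "/logs/app.log", 0, 5))  # -> b'hello'
```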

Partitioning and replication

Each file is divided into fixed-size chunks, identified by an immutable and globally unique 64-bit chunk handle assigned by the manager during chunk creation.
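For a rough sense of the numbers, the sketch below shows how a file maps onto fixed-size chunks. It assumes the 64 MB chunk size from the original GFS paper and uses random 64-bit values as a stand-in for handle assignment; the real manager assigns handles itself and guarantees global uniqueness.

```python
import secrets

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks in GFS (HDFS blocks default to 128 MB)

def chunks_needed(file_size_bytes):
    # Number of fixed-size chunks a file occupies; the last chunk may be partial.
    return (file_size_bytes + CHUNK_SIZE - 1) // CHUNK_SIZE

def new_chunk_handle():
    # Hypothetical handle minting: a random 64-bit value.
    return secrets.randbits(64)

# Example: a 1 GiB file occupies 16 chunks of 64 MB each.
print(chunks_needed(1 * 1024**3))  # -> 16
print(hex(new_chunk_handle()))     # e.g. 0x9f3c...
```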

Chunk servers store chunks on local disks as regular files. The system employs both partitioning and replication:

  • It partitions files across different chunk servers.
  • It replicates each chunk on multiple chunk servers.

The former improves performance and the latter improves availability and data reliability.
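To make the two techniques concrete, here is a purely illustrative metadata snapshot (the file name, handles, and hostnames are made up): the chunks of one file are spread across different chunk servers (partitioning), and every chunk is stored on three of them (replication).

```python
# Hypothetical manager metadata for a single file.
file_chunks = {"/logs/web.log": [0x1A, 0x1B, 0x1C]}

chunk_locations = {
    0x1A: ["cs-01", "cs-07", "cs-12"],  # every chunk has three replicas
    0x1B: ["cs-03", "cs-07", "cs-15"],
    0x1C: ["cs-01", "cs-05", "cs-09"],  # different chunks land on different servers
}
```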

Note: To ensure good performance and scalability, the manager node is not involved in the transfer of file data.

Network topology

The system takes into account the network topology of a data center, which usually consists of multiple racks of servers. This has several implications: for example, bandwidth into or out of a rack may be less than the aggregate bandwidth of all the machines within the rack, and the failure of a single shared resource in a rack (a network switch or power circuit) can take down all the machines in that rack.

Balancing disk and network bandwidth

When GFS creates a new chunk and places its initially empty replicas, the manager tries to use chunk servers with below-average disk space utilization. It also prefers chunk servers with a low number of recent chunk creations, because chunk creation reliably predicts imminent heavy write traffic. In this way, the manager attempts to balance disk and network bandwidth utilization across the cluster.
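A sketch of this heuristic might look like the following; the scoring function and the example numbers are illustrative, not the actual GFS implementation, which weighs additional factors such as spreading replicas across racks.

```python
from dataclasses import dataclass

@dataclass
class ServerStats:
    name: str
    disk_utilization: float  # fraction of local disk space in use
    recent_creations: int    # chunks created on this server recently

def pick_servers_for_new_chunk(servers, num_replicas=3):
    avg_util = sum(s.disk_utilization for s in servers) / len(servers)

    def score(s):
        # Prefer below-average disk utilization and few recent creations,
        # since a freshly created chunk is likely to receive heavy writes soon.
        return (s.disk_utilization - avg_util, s.recent_creations)

    return [s.name for s in sorted(servers, key=score)[:num_replicas]]

servers = [
    ServerStats("cs-01", 0.40, 5),
    ServerStats("cs-02", 0.72, 1),
    ServerStats("cs-03", 0.35, 9),
    ServerStats("cs-04", 0.38, 2),
]
print(pick_servers_for_new_chunk(servers))  # -> ['cs-03', 'cs-04', 'cs-01']
```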

Placing replicas

When deciding where to place the replicas, the manager also follows a configurable chunk replica placement policy. By default, it attempts to store two replicas on two different nodes within the same rack and the third replica on a node in a separate rack. This policy is a trade-off between high network bandwidth (writes within a rack are cheaper) and data reliability (a whole-rack failure cannot destroy every replica).
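A simplified sketch of such a default policy could look like this; the rack and node names are hypothetical, and a production placer would also consider node load and available space.

```python
import random

def place_replicas(nodes_by_rack):
    """Pick two nodes in one rack and a third node in a different rack.

    nodes_by_rack maps a rack id to the list of chunk server names in it.
    """
    primary_candidates = [r for r, nodes in nodes_by_rack.items() if len(nodes) >= 2]
    primary_rack = random.choice(primary_candidates)
    other_racks = [r for r, nodes in nodes_by_rack.items()
                   if r != primary_rack and nodes]

    first_two = random.sample(nodes_by_rack[primary_rack], 2)
    third = random.choice(nodes_by_rack[random.choice(other_racks)])
    return first_two + [third]

# Example: three racks with two chunk servers each.
cluster = {"rack-a": ["a1", "a2"], "rack-b": ["b1", "b2"], "rack-c": ["c1", "c2"]}
print(place_replicas(cluster))  # e.g. ['b1', 'b2', 'c2']
```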
