Creating and Reading Files

Let's understand how clients can create files and read data from files in a distributed file system.

The clients can create and read files from the distributed file system using a GFS client library linked to the application that abstracts some implementation details.

For example, the applications can operate based on byte offsets of files. The client library can translate these byte offsets to the associated chunk index and communicates with the manager to retrieve the chunk handle for the provided chunk index and the location of associated chunk servers. Finally, It contacts the appropriate chunkserver (most likely the closest one) to retrieve the data.

Create operation

The manager node maintains the metadata about the filesystem. As a result, an operation that creates a file needs only to contact the manager node, which creates the file locally.

The manager node uses locking while creating new files to handle the concurrent requests safely. More specifically, a read lock is acquired on the directory name, and a write lock is acquired on the file name.

Read operation

The following illustration displays the workflow for a read operation:

Responsibilities of clients

Clients cache the metadata for chunk locations locally, so they only have to contact the manager for new chunks or when the cache has expired.

During the migration of chunks due to failures, clients organically request fresh data from the manager when they realize the old chunk servers cannot serve the data for the specified chunk anymore.

Note: On the other hand, clients do not cache the actual chunk data since they are expected to stream through huge files and have working sets that are too large to benefit from caching.

Responsibilities of the manager

The manager stores:

  • the file and chunk namespaces
  • the mapping from files to chunks
  • the chunk locations

All metadata is stored in the manager’s memory. The namespaces and the mappings are also kept persistent by logging mutating operations (e.g. file creation, renaming etc.) to an operation log that is stored on the manager’s local disk and replicated on remote machines. This is shown in the following illustration:

The manager node also checkpoints its memory state to the disk when the log grows significantly. As a result, in case of the manager’s failure, the image of the filesystem can be reconstructed by loading the last checkpoint in memory and replaying the operation log from this point forward.

Get hands-on with 1400+ tech skills courses.