Writing and Deleting Files

Learn how clients write and delete files in a distributed file system.

Clients write and delete files in the distributed file system through a GFS client library linked into the application, which abstracts away some implementation details.

For example, applications can operate on files in terms of byte offsets. The client library translates these byte offsets to the associated chunk index and asks the manager for the chunk handle of that chunk index, as well as the locations of the chunk servers holding it. Finally, it contacts the appropriate chunk server (most likely the closest one) to retrieve the data.
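As a rough illustration, the offset-to-chunk translation is simple arithmetic over the fixed chunk size (64 MB in the original GFS design). The function names below are hypothetical, not part of any real client library:

```go
package main

import "fmt"

// chunkSize is the fixed chunk size (64 MB in the original GFS design).
const chunkSize = 64 * 1024 * 1024

// chunkIndexFor translates a byte offset within a file into the chunk
// index that the client library sends to the manager node.
func chunkIndexFor(offset int64) int64 {
	return offset / chunkSize
}

// offsetWithinChunk is the position of that byte inside the chunk.
func offsetWithinChunk(offset int64) int64 {
	return offset % chunkSize
}

func main() {
	// A read at byte offset 150 MB falls inside chunk index 2,
	// at offset 150 MB - 128 MB = 22 MB within that chunk.
	offset := int64(150 * 1024 * 1024)
	fmt.Println(chunkIndexFor(offset), offsetWithinChunk(offset))
}
```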

Write operations

The following are some write operations that clients can perform in a GFS system:

Atomic and linearizable file namespace mutations

File namespace mutations are atomic and linearizable. This is achieved by executing these operations on a single node, the manager node. The operation log defines a total global order for these operations. The manager node also uses read-write locks on the associated namespace nodes to properly serialize any concurrent mutations.
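The sketch below, assuming a hypothetical in-memory lock table, shows the idea: a mutation on a path takes read locks on every ancestor directory and a write lock on the path itself, so concurrent mutations on the same node are serialized while mutations elsewhere in the tree proceed in parallel:

```go
package main

import (
	"path"
	"sync"
)

// A hypothetical sketch of the manager's namespace locking: each path in
// the namespace tree has its own read-write lock.
type namespace struct {
	mu    sync.Mutex
	locks map[string]*sync.RWMutex
}

func (ns *namespace) lockFor(p string) *sync.RWMutex {
	ns.mu.Lock()
	defer ns.mu.Unlock()
	if ns.locks == nil {
		ns.locks = map[string]*sync.RWMutex{}
	}
	if ns.locks[p] == nil {
		ns.locks[p] = &sync.RWMutex{}
	}
	return ns.locks[p]
}

// mutate serializes a namespace mutation: read locks on every ancestor
// directory (taken root-first, so the order is consistent across callers)
// and a write lock on the target path itself.
func (ns *namespace) mutate(target string, apply func()) {
	var ancestors []string
	for dir := path.Dir(target); ; dir = path.Dir(dir) {
		ancestors = append([]string{dir}, ancestors...) // prepend: root ends up first
		if dir == "/" {
			break
		}
	}
	var release []func()
	for _, a := range ancestors {
		l := ns.lockFor(a)
		l.RLock()
		release = append(release, l.RUnlock)
	}
	l := ns.lockFor(target)
	l.Lock()
	release = append(release, l.Unlock)

	apply() // e.g., append the mutation to the operation log, then update metadata

	for i := len(release) - 1; i >= 0; i-- {
		release[i]()
	}
}

func main() {
	var ns namespace
	ns.mutate("/home/user/file", func() { /* create the file's metadata */ })
}
```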

Concurrency support

GFS supports multiple concurrent writers for a single file. The following steps illustrate how this works:

1. Identifying chunk servers

The client communicates with the manager node to identify the chunk servers that contain the relevant chunks.
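A minimal sketch of this lookup, with hypothetical request and reply types rather than the real GFS RPC schema, might look like the following. In the original design the client also caches this metadata so it can skip the manager for subsequent writes to the same chunk:

```go
package main

import "fmt"

// Hypothetical request/reply types for step 1; field names are illustrative.
type lookupRequest struct {
	FileName   string
	ChunkIndex int64 // derived from the byte offset, as shown earlier
}

type lookupReply struct {
	ChunkHandle uint64   // globally unique id for the chunk
	Primary     string   // replica currently holding the lease
	Replicas    []string // addresses of all chunk servers storing the chunk
}

// The client caches replies so later writes to the same chunk can skip the
// manager until the cache entry expires.
var chunkCache = map[lookupRequest]lookupReply{}

func lookup(req lookupRequest) lookupReply {
	if r, ok := chunkCache[req]; ok {
		return r
	}
	// In a real client this would be an RPC to the manager node.
	r := lookupReply{ChunkHandle: 42, Primary: "cs-1:7070",
		Replicas: []string{"cs-1:7070", "cs-2:7070", "cs-3:7070"}}
	chunkCache[req] = r
	return r
}

func main() {
	fmt.Println(lookup(lookupRequest{FileName: "/logs/web-00", ChunkIndex: 2}))
}
```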

2. Pushing data to all replicas

The client starts pushing the data to all the replicas using some form of chain replication.

The chunk servers are put in a chain depending on the network topology and data is pushed linearly along the chain.

For instance, the client pushes the data to the first chunk server in the chain, which pushes the data to the second chunk server, and so on. This helps fully utilize each machine’s network bandwidth and avoids bottlenecks at any single node.

The manager grants a lease for each chunk to one of the chunk servers, which is nominated as the primary replica and is responsible for serializing all mutations on that chunk.
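The sketch below shows the shape of this chain push from the client's perspective; the server addresses are made up, and a real implementation would pipeline the forwarding rather than print it:

```go
package main

import "fmt"

// A hypothetical sketch of step 2: the client pushes the data along a chain
// of replicas instead of fanning out to all of them at once, so each
// machine's outbound bandwidth is used exactly once.
func pushAlongChain(chain []string, data []byte) {
	for i, server := range chain {
		// In a real client only the first hop is sent by the client; each
		// chunk server forwards the bytes to the next server in the chain
		// as soon as it starts receiving them (pipelined, not store-and-forward).
		next := "none"
		if i+1 < len(chain) {
			next = chain[i+1]
		}
		fmt.Printf("buffer %d bytes on %s, forward to %s\n", len(data), server, next)
	}
}

func main() {
	// The chain is ordered by network distance, e.g. the closest replica first.
	pushAlongChain([]string{"cs-2:7070", "cs-1:7070", "cs-3:7070"}, make([]byte, 1<<20))
}
```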

3. Writing data to all replicas

After all the data is pushed to the chunk servers, the client sends a write request to the primary replica, identifying the data pushed earlier.

The primary assigns consecutive serial numbers to all the mutations, applies them locally, and then forwards the write request to all secondary replicas, which apply the mutations in the same serial-number order imposed by the primary.
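A simplified sketch of the primary's role in this step, with hypothetical types, might look like this:

```go
package main

import "fmt"

// A hypothetical sketch of step 3 from the primary replica's point of view:
// it assigns consecutive serial numbers to incoming mutations so every
// replica applies them in the same order.
type mutation struct {
	Serial int64
	Data   []byte
	Offset int64
}

type primary struct {
	nextSerial  int64
	secondaries []string
}

func (p *primary) applyWrite(data []byte, offset int64) mutation {
	p.nextSerial++
	m := mutation{Serial: p.nextSerial, Data: data, Offset: offset}
	// 1. Apply the mutation to the local chunk replica in serial-number order.
	fmt.Printf("primary applies mutation #%d locally\n", m.Serial)
	// 2. Forward the write request (the data was already pushed in step 2) to
	//    every secondary, which must apply it in the same serial order.
	for _, s := range p.secondaries {
		fmt.Printf("forward mutation #%d to %s\n", m.Serial, s)
	}
	return m
}

func main() {
	p := primary{secondaries: []string{"cs-2:7070", "cs-3:7070"}}
	p.applyWrite([]byte("record-1"), 0)
	p.applyWrite([]byte("record-2"), 8)
}
```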

4. Acknowledging the write to the client

After all the secondary replicas have acknowledged the write to the primary replica, the primary replica acknowledges the write to the client.
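A sketch of this final step on the primary might look like the following; in the original GFS design, any secondary failure is reported back to the client, which retries the write:

```go
package main

import (
	"errors"
	"fmt"
)

// A hypothetical sketch of step 4: the primary only acknowledges the write
// to the client once every secondary has acknowledged it. Any secondary
// error is surfaced to the client instead.
func acknowledge(secondaryErrs []error) error {
	for _, err := range secondaryErrs {
		if err != nil {
			return fmt.Errorf("write failed at a secondary, client should retry: %w", err)
		}
	}
	return nil // safe to acknowledge the write to the client
}

func main() {
	fmt.Println(acknowledge([]error{nil, nil}))                            // success: <nil>
	fmt.Println(acknowledge([]error{nil, errors.New("replica timeout")})) // failure is reported
}
```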

Delete operations

Delete operations are also initially executed only at the manager node, which uses the same locking scheme as when creating a file. Instead of removing the file immediately, the manager node renames it into a hidden namespace, where it can still be accessed (and undeleted). If the file still remains in this hidden namespace after a specific period of time, it is permanently removed along with all the associated chunk references. The actual content of the chunks is removed lazily later on by the chunk servers through a garbage collection process.
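The sketch below captures this two-phase deletion on the manager node, assuming a hypothetical metadata structure; the grace period is configurable (three days in the original GFS paper):

```go
package main

import (
	"fmt"
	"time"
)

// A hypothetical sketch of the delete path on the manager node: deletion is
// a rename into a hidden name carrying a deletion timestamp; a background
// scan permanently removes entries older than the retention period.
const retention = 3 * 24 * time.Hour // grace period before permanent removal

type fileMeta struct {
	name      string
	deletedAt time.Time // zero value means the file is live
}

func markDeleted(f *fileMeta, now time.Time) {
	// The file stays readable (and can be undeleted) under its hidden name.
	f.name = fmt.Sprintf(".deleted-%d-%s", now.Unix(), f.name)
	f.deletedAt = now
}

func garbageCollect(files []*fileMeta, now time.Time) []*fileMeta {
	var live []*fileMeta
	for _, f := range files {
		if !f.deletedAt.IsZero() && now.Sub(f.deletedAt) > retention {
			// Drop the metadata and chunk references; chunk servers erase
			// the orphaned chunk data lazily during their own GC.
			continue
		}
		live = append(live, f)
	}
	return live
}

func main() {
	f := &fileMeta{name: "logs/web-00"}
	markDeleted(f, time.Now().Add(-4*24*time.Hour))              // deleted four days ago
	fmt.Println(len(garbageCollect([]*fileMeta{f}, time.Now()))) // 0: permanently removed
}
```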

Partial failures in write workflow

The write workflow described above is vulnerable to partial failures.

Consider the scenario where the primary replica crashes in the middle of performing a write. After the lease expires, a secondary replica can request the lease and start imposing a serial-number order that disagrees with writes already applied by other replicas. As a result, a write might be persisted on only some of the replicas, or it might be persisted in different orders on different replicas.

Note: GFS provides a custom consistency model for write operations which we will discuss in the next lesson.
