Hard to Guarantee Atomicity

Let's check out why atomicity is hard to achieve in general and how it becomes even more difficult in distributed systems.

The second challenging aspect of transactions, and especially distributed transactions, is atomicity.

One of the benefits of grouping operations inside a transaction is the guarantee that either all of them will be performed, or none of them will be. As a result, the application developer does not need to think about scenarios of partial failures, where the transaction fails after some of the operations were already performed.

Similar to the isolation guarantees, the atomicity guarantee makes it easier to develop applications. It delegates some of the complexity of handling these scenarios to the persistence layer, e.g., a datastore that provides atomicity guarantees to the application.


Guaranteeing atomicity is a hard problem

Guaranteeing atomicity is hard in general, and not only in distributed systems.

The reason is that components can fail unexpectedly regardless of whether they are software or hardware components.

According to Pillai et al. (T. S. Pillai, V. Chidambaram, R. Alagappan, S. Al-Kiswany, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau, “All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications,” Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, 2014), “Even the simple act of writing some bytes to a file requires extra work to ensure it will be performed in an atomic way and the file will not end up in a corrupted state if the disk fails while executing part of the operation”.

A way to achieve atomicity

One common way of achieving atomicity in this case is through journaling, also known as write-ahead logging. In this technique, metadata about the operation is first written to a separate file, along with markers that denote whether the operation has been completed or not.

Based on this data, the system can identify which operations were in progress when a failure happened, and drive them to completion either by undoing their effects and aborting them, or by completing the remaining part and committing them. This approach is used extensively in file systems and databases.
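The recovery logic above can be sketched as follows. This is a minimal, illustrative write-ahead log in Python, not a production implementation: the log file name, record format, and redo-only recovery policy are all assumptions made for the example.

```python
import json
import os

LOG_PATH = "wal.log"  # hypothetical log file name chosen for this sketch

def log_append(record):
    """Append a record to the write-ahead log and force it to disk."""
    with open(LOG_PATH, "a") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()
        os.fsync(log.fileno())  # the record must be durable before we proceed

def atomic_write(path, data):
    """Write `data` to `path`, logging intent first so a crash can be recovered."""
    # 1. Record what we are about to do, before touching the real file.
    log_append({"op": "write", "path": path, "data": data, "status": "begin"})
    # 2. Perform the actual operation.
    with open(path, "w") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # 3. Mark the operation as completed.
    log_append({"op": "write", "path": path, "status": "commit"})

def recover():
    """On restart, drive interrupted operations to completion (redo)."""
    if not os.path.exists(LOG_PATH):
        return
    pending = {}
    with open(LOG_PATH) as log:
        for line in log:
            rec = json.loads(line)
            if rec["status"] == "begin":
                pending[rec["path"]] = rec       # operation was in progress
            elif rec["status"] == "commit":
                pending.pop(rec["path"], None)   # operation finished safely
    # Any record still pending has a "begin" marker but no "commit" marker:
    # the failure happened mid-operation, so we redo it and commit it.
    for rec in pending.values():
        with open(rec["path"], "w") as f:
            f.write(rec["data"])
        log_append({"op": "write", "path": rec["path"], "status": "commit"})
```

A real system would also handle undo (aborting), log truncation, and torn writes within the log itself, but the core idea is the same: the commit marker is what makes the multi-step operation appear atomic.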

Atomicity in a distributed system

The issue of atomicity in a distributed system becomes even more complicated because components (nodes in this context) are separated by a network that is slow and unreliable.

Furthermore, we do not only need to make sure that an operation is performed atomically in a node. In most cases, we need to ensure that an operation is performed atomically across multiple nodes. This means that the operation needs to take effect either at all the nodes or at none of them. This problem is also known as atomic commit.
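The all-or-none requirement can be illustrated with a toy in-memory sketch. The `Node` class and its prepare/commit/abort methods below are hypothetical names invented for this example; real participants would communicate over a network and face the failure modes that the upcoming protocols are designed to handle.

```python
class Node:
    """A toy in-memory participant in an atomic commit (illustrative only)."""
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy  # simulates whether the node can take part
        self.value = None       # the committed state visible to clients
        self.staged = None      # tentatively prepared state, not yet visible

    def prepare(self, value):
        # A node can only promise to commit if it is able to do so.
        if not self.healthy:
            return False
        self.staged = value
        return True

    def commit(self):
        self.value = self.staged

    def abort(self):
        self.staged = None

def atomic_commit(nodes, value):
    """Apply `value` at all of the nodes, or at none of them."""
    if all(node.prepare(value) for node in nodes):
        for node in nodes:
            node.commit()
        return True
    # At least one node could not prepare: undo everywhere.
    for node in nodes:
        node.abort()
    return False
```

Note what this toy version ignores: messages can be lost, and the caller itself can crash between the prepare and commit phases, leaving some nodes committed and others not. Handling those failures correctly is exactly what the atomic commit protocols discussed next are for.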

The upcoming lessons on 2PC, 3PC, and the quorum-based commit protocol will look at how we can achieve atomicity in distributed settings. The algorithms are discussed in chronological order so that we can understand the pitfalls of each one and how subsequent algorithms addressed them.
