Introduction to Distributed Transactions
Get an introduction to distributed transactions.
One of the most common problems faced when moving from a centralized to a distributed system is performing operations across multiple nodes in an atomic way. We call this a distributed transaction.
In the next three chapters, we explore all the complexities involved in performing a distributed transaction, and examine several available solutions for each one as well as their pitfalls.
Before diving into the available solutions, let’s first learn about transactions and their properties, and what distinguishes distributed transactions from them.
Transaction
A transaction is a unit of work performed in a database system that represents a change potentially composed of multiple operations.
Database transactions are an abstraction invented to simplify engineers’ work and relieve them of dealing with all the possible failures that the inherent unreliability of hardware introduces.
Guarantees provided by database transactions
As we have learned, the acronym ACID sums up the major guarantees that database transactions provide. As a reminder, ACID stands for the following:
- Atomicity
- Consistency
- Isolation
- Durability
Each transaction TX comprises multiple operations (op1, op2, op3, …), and multiple transactions (TX1, TX2, etc.,) are executed simultaneously in a database.
Atomicity
Atomicity is the property that guarantees that either all of the operations of a transaction completed successfully or none of them take effect.
In other words, there are no situations where the transaction “partially fails,” by performing only some of the operations.
Consistency
Consistency ensures that a transaction transitions the database from a valid state to another valid state, while maintaining all the invariants that the application defines.
As an example, a financial application defines an invariant that states that the balance of every account should always be positive. The database then ensures that this invariant is maintained at all times while executing transactions.
Isolation
Isolation guarantees that transactions are executed concurrently, without interfering with each other.
Durability
Durability guarantees that once a transaction is committed, it remains committed even in the case of a system failure (i.e., power outage).
All these properties transfer a set of responsibilities from the application to the database, simplify the development of applications, and reduce any potential errors due to software bugs or hardware failures.
What ACID means in an application
Atomicity implies that our application does not have to take care of all possible failures, but has the conditional logic to bring the database back to a consistent state in the case of a partial failure by rolling back operations that should not have occured.
Consistency allows us to state the invariants of our application in a declarative way and remove redundant code from our application and allow the database to perform these checks whenever necessary.
Isolation allows our applications to leverage concurrency and serve multiple requests by executing transactions in parallel, with the certainty that the database prevents any bugs due to the concurrent execution.
Durability guarantees that when the database declares a transaction committed, it is a final declaration that cannot be reverted. This relieves our application of complicated logic again.
These aspects of consistency and durability do not require special treatment in distributed systems and are relatively straightforward, so there is no separate analysis for them in this course.
Achieving consistency and durability
Databases provide many different mechanisms that maintain consistency. This includes constraints, cascades, triggers, etc. The application is responsible for defining any constraints through these mechanisms. Meanwhile, the database is responsible for checking these conditions while executing transactions and aborting any transactions that violate them.
Durability is guaranteed by persisting transactions at non-volatile storage when they commit.
In distributed systems, this might be a bit more nuanced. This is because the system should ensure that it stores the results of a transaction in more than one node so that the system keeps functioning if a single node fails. In fact, this is reasonable, because availability is one of the main benefits of a distributed system. We can achieve this via replication.
A database transaction is quite a powerful abstraction that greatly simplifies how applications are built. Considering the inherent complexity in distributed systems, we can easily deduce that transactional semantics are even more useful in them.
Distributed transaction
A distributed transaction is a transaction that takes place in two or more different nodes.
There are two slightly different variants of distributed transactions.
-
The first variant is one where the same piece of data needs to be updated in multiple replicas. This is the case where the whole database is essentially duplicated in multiple nodes, and a transaction needs to update all of them in an atomic way.
-
The second variant is one where different pieces of data that reside in different nodes need to be updated atomically. For instance, a financial application may use a partitioned database for the accounts of customers, where the balance of user A resides in node n1. In contrast, the balance of user B resides in node n2, and we want to transfer some money from user A to user B. We need to do this in an atomic way so that data is not lost (i.e., removed from user A, but not added in user B, because the transaction fails midway).
The second variant is the most common use of distributed transactions, while primary-backup synchronous replication mostly tackles the first variant.
Atomicity and isolation in distributed transactions
The aspects of atomicity and isolation are significantly more complex and require us to consider more things in the context of distributed transactions.
For instance, partial failures make it much harder to guarantee atomicity. Meanwhile, the concurrency and network asynchrony present in distributed systems make it challenging to preserve isolation between transactions running in different nodes.
Furthermore, atomicity and isolation have far-reaching implications for the performance and the availability of a distributed system, as we see later in this course.
This course contains dedicated sections for atomicity and isolation, that analyze their characteristics and some techniques.
Note that some of these techniques are used internally by database systems, and we might want to use them from our application to store and access data. This means that everything may be hidden from us behind a high-level interface. This interface lets us build an application without having to know how the database provides all these capabilities under the hood.
However, it is still useful to understand these techniques well, so we can make better decisions about which database systems to use. There can also be cases where we need to use some of these techniques directly at the application level to achieve properties that the database system doesn’t provide.
Get hands-on with 1400+ tech skills courses.