3-Phase Commit (3PC)
In this lesson, we will look into how the 3-phase commit protocol solves the problem of the 2-phase commit protocol.
The problem with 2-phase commit protocol
As we described previously, the main weakness of the 2-phase commit protocol is that a failure of the coordinator can leave the system in a blocked state.
Ideally, we would like the participants to be able to take the lead in some way and continue the execution of the protocol in this case, but this is not so easy.
The underlying reason is that, in the commit phase, the participants are not aware of the state of the other participants; only the coordinator is. So, taking the lead without waiting for the coordinator can break the atomicity property.
For instance, imagine the following scenario: in the commit phase of the protocol, the coordinator manages to send a commit (or abort) message to one of the participants but then fails, and this participant also fails. If one of the other participants takes the lead, it will only be able to query the live participants. So, it will be unable to make the right decision without waiting for the failed participant (or the coordinator) to recover.
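The blocking scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `State` enum and the `decide_2pc_recovery` function are hypothetical names, and the sketch assumes every live participant has already voted "Yes".

```python
from enum import Enum


class State(Enum):
    READY = "ready"          # voted "Yes", still waiting for the decision
    COMMITTED = "committed"  # received a commit message
    ABORTED = "aborted"      # received an abort message


def decide_2pc_recovery(live_states):
    """Decision a participant taking the lead can safely reach in 2PC.

    It can only query the live participants. Returns None when no safe
    decision exists and the participant has to block.
    """
    if State.COMMITTED in live_states:
        return State.COMMITTED
    if State.ABORTED in live_states:
        return State.ABORTED
    # Every live participant is still READY. The failed participant may
    # have received either a commit or an abort, so any choice is a guess.
    return None


# The failed participant's state is unknown, so the leader must block.
blocked = decide_2pc_recovery([State.READY, State.READY])
```

Here `blocked` is `None`: with only `READY` states visible, the participant taking the lead cannot rule out that the failed participant already committed or aborted.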
Tackling the 2PC problem with 3-phase commit protocol
The 2-phase commit problem can be tackled by splitting the first round (voting phase) into 2 sub-rounds: the coordinator first communicates the result of the vote to the nodes, waits for their acknowledgments, and only then proceeds with the commit or abort message.
In this case, the participants know the result of the vote and can complete the protocol independently if the coordinator fails. This is essentially the 3-phase commit protocol.
Wikipedia contains a detailed description of the various stages of the protocol and a nice visual demonstration. Feel free to refer to this resource for additional study on the protocol.
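As a rough illustration, the three rounds from the coordinator's point of view might look like the following sketch. It is a failure-free, in-process simplification: the `Participant` class, its method names, and `run_three_phase_commit` are assumptions made for this example, and real implementations exchange network messages with timeouts.

```python
class Participant:
    """Hypothetical in-memory participant; there is no networking here."""

    def __init__(self, name, vote=True):
        self.name = name
        self.vote = vote
        self.state = "init"

    def on_can_commit(self):
        # Round 1: vote on the transaction.
        self.state = "ready" if self.vote else "aborted"
        return self.vote

    def on_prepare_to_commit(self):
        # Round 2: learn that everyone voted "Yes" and acknowledge.
        self.state = "prepared"
        return True

    def on_commit(self):
        # Round 3: apply the final decision.
        self.state = "committed"

    def on_abort(self):
        self.state = "aborted"


def run_three_phase_commit(participants):
    # Round 1: collect the votes.
    votes = [p.on_can_commit() for p in participants]
    if not all(votes):
        for p in participants:
            p.on_abort()
        return "aborted"
    # Round 2: share the result of the vote and wait for acknowledgments.
    acks = [p.on_prepare_to_commit() for p in participants]
    if not all(acks):
        for p in participants:
            p.on_abort()
        return "aborted"
    # Round 3: send the commit message.
    for p in participants:
        p.on_commit()
    return "committed"


nodes = [Participant("p1"), Participant("p2"), Participant("p3")]
result = run_three_phase_commit(nodes)
```

The extra round is what lets a participant in the `prepared` state know that every other participant voted "Yes".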
Benefit of 3PC
The main benefit of this protocol is that the coordinator stops being a single point of failure.
In case of a coordinator failure, the participants are able to take over and complete the protocol.
A participant taking over can commit the transaction if it has received a prepare-to-commit message, since it then knows that all the participants have voted "Yes". If it has not received a prepare-to-commit message, it can abort the transaction, since no participant can have committed before all the participants have received a prepare-to-commit message.
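The takeover rule can be written as a small function. This is a minimal sketch, assuming the participant taking over can see the states of the participants it is able to reach (itself included); the state names and `takeover_decision` are hypothetical.

```python
def takeover_decision(reachable_states):
    """Decide the outcome after a coordinator failure in 3PC.

    reachable_states holds the state of every participant the new
    leader can reach, itself included.
    """
    # A "prepared" (or already "committed") participant proves that
    # every participant voted "Yes", so committing is safe.
    if any(s in ("prepared", "committed") for s in reachable_states):
        return "commit"
    # Otherwise no reachable participant saw a prepare-to-commit, and the
    # leader assumes nobody has committed yet.
    return "abort"


decision_with_prepare = takeover_decision(["ready", "prepared"])
decision_without_prepare = takeover_decision(["ready", "ready"])
```

Note that the rule only considers the participants that are reachable, which is exactly the assumption that breaks down when the network is partitioned.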
As a result, the 3PC protocol increases availability compared to 2PC.
However, this comes at the cost of correctness, since the protocol is vulnerable to failures such as network partitions.
Network partition failure in 3PC
An example of such a failure case is shown in the following illustration.
In this example, a network partition occurs at a point where the coordinator has managed to send a prepare-to-commit message to only some of the participants. The coordinator fails right after this point, so the participants time out and have to complete the protocol on their own.
In this case, the participants on one side of the partition have received a prepare-to-commit message and proceed to commit the transaction. However, the participants on the other side of the partition have not received a prepare-to-commit message and, thus, unilaterally abort the transaction.
This might seem like a failure case that is very unlikely to happen. However, the consequences are disastrous if it does, since the system is left in an inconsistent state after the network partition is fixed: the atomicity property of the transaction has been violated.
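The partition scenario can be reproduced with a toy simulation. The participant names, the state labels, and the `decide` and `simulate_partition` helpers are assumptions made for illustration; each side of the partition applies the takeover rule using only the participants it can reach.

```python
def decide(reachable_states):
    # Takeover rule assumed here: commit only if someone reachable has
    # received a prepare-to-commit message.
    return "commit" if "prepared" in reachable_states else "abort"


def simulate_partition(side_a, side_b):
    """Each side decides on its own after the coordinator fails."""
    outcome_a = decide(side_a.values())
    outcome_b = decide(side_b.values())
    outcomes = {name: outcome_a for name in side_a}
    outcomes.update({name: outcome_b for name in side_b})
    return outcomes


# The coordinator reached p1 and p2 with prepare-to-commit, but not p3.
side_a = {"p1": "prepared", "p2": "prepared"}
side_b = {"p3": "ready"}

outcomes = simulate_partition(side_a, side_b)
# Atomicity holds only if every participant reached the same outcome.
atomic = len(set(outcomes.values())) == 1
```

In this run, `p1` and `p2` commit while `p3` aborts, so `atomic` is `False`.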
Conclusion
The 3PC protocol satisfies the liveness property that ensures it will always make progress, at the cost of potentially violating the safety property of atomicity under failures such as network partitions.