Intricacies of Paxos

Let's look into the intricacies of Paxos that happen while solving leader election problem and handling partial failures.

The Paxos protocol was considered to be difficult to understand by many people.

One of the reasons for this is the inherent complexity of the consensus problem, which in turn originates from the increased concurrency and large state space of distributed systems.

This lesson will cover some edge cases and how Paxos handles them. Of course, we will not be able to cover all the possible cases since that would be a much bigger undertaking. The examples presented in this lesson will help us understand the basic parts of the protocol and give us a starting point for exploring any other cases we might think of.

For all of the examples presented in this lesson, we will assume that the nodes play all the roles of the protocol, thus being proposers, acceptors, and learners at the same time, to simplify our explanations.

Keep in mind that this is a realistic assumption since many implementations of the Paxos protocol follow this approach.

Paxos solving leader election problem

Paxos can be used to solve the leader election problem.

Paxos itself needs to elect a leader in order to reach consensus, which seems like a catch-22.

The Paxos protocol resolves this paradox, by allowing multiple leaders to be elected, thus not needing to reach consensus for the leader itself.

It still has to guarantee that there will be a single decision, even though multiple nodes might be proposing different values.

Let’s examine how Paxos achieves that and what are some of the consequences.

When a proposer receives a response to a prepare message from a majority of nodes, it considers itself the (temporary) leader and proceeds with a proposal. If no other proposer has attempted to become the leader in the meanwhile, its proposal will be accepted. However, if another proposer managed to become a leader, the accept requests of the initial node will be rejected. This prevents multiple values to be chosen by the proposals of both nodes.

Dueling proposers

The above solution results in a situation, where proposers are continuously dueling each other, thus not making any progress, as we can see in the following illustration.

vNv_N is a value as per the definition of accept (N,v)(N,v) request at Phase 2((b)) of the Paxos protocol in the The Paxos Algorithm lesson.

There are many ways to avoid getting into this infinite loop.

A way to handle dueling proposers

The most basic way is to force the proposers to use random delays or exponential back-off every time they get their accept messages rejected and have to send a new prepare request. In this way, they give more time to the node that is currently leading to complete the protocol, by making a successful proposal, instead of competing.

Paxos handling partial failures

Another interesting aspect of the Paxos protocol is how it handles partial failures gracefully, maintaining safety at all times.

In this context, by partial failures, we refer to cases where a node sends a message to multiple nodes (i.e., accept messages as part of Phase 2.a, and only some of them are delivered either due to node failures or network issues.

Let’s examine an extreme case, where multiple proposers attempt to propose different values, but only one of their accept messages gets delivered to the acceptors of the majority quorum. The following illustration provides a visualization of the execution of the protocol to aid comprehension.

  • Every row represents a different round of the protocol
  • The dashed box shows which nodes were included in the majority quorum of Phase 1.
  • The text inside every node displays any proposal that has already been accepted in the form (nn, vv), where nn is the proposal number, and vv is the value of the proposal.
  • The bold text represents the values that have been accepted in that round.

As we can see, every proposer manages to deliver an accept message to only one acceptor at every round.

For the first three rounds, none of the nodes in the majority quorum have accepted any value, so proposers are free to propose their own value.

In rounds four and five, proposers have to propose the value of the highest-numbered proposal that has been accepted by the acceptors included in Phase one’s majority quorum. This is A for round four and B for round five.

As it’s demonstrated for round six, at this point, the behavior depends partially on the quorum that will be used. For example, if the next proposer selects the yellow quorum, value C will be proposed, while value B will be proposed if the green quorum is used instead.

However, there is one important thing to note: as soon as the system recovers from failures and a proposer manages to get a proposal accepted by a majority quorum, then this value is chosen, and it cannot be changed.

The reason is that any subsequent proposer will need to get a majority quorum for Phase 1 of the protocol. This majority will have to contain at least 1 node from the majority that has accepted the aforementioned proposal, thus transferring the accepted proposal to the prospective leader.

Furthermore, it’s guaranteed this will be the highest-numbered proposal, which means any subsequent proposer can only propagate the chosen value to the acceptors that might not have it yet.

Get hands-on with 1400+ tech skills courses.