The Paxos Algorithm

In this lesson, we will examine how the Paxos algorithm solves the consensus problem.

Some algorithms could arguably be applied as solutions to the consensus problem.

For instance, the 2-phase commit protocol could be used, where the coordinator would drive the voting process.

However, such a protocol would have very limited fault tolerance, since the failure of a single node (the coordinator) could bring the whole system to a halt.

The obvious next step is to allow multiple nodes to inherit the role of the coordinator in these failure cases. This would then mean that there might be multiple primaries that might produce conflicting results.

This phenomenon is demonstrated in the lesson multi-primary replication and the 3-phase commit lesson.

One of the first algorithms that could solve the consensus problem safely under these failures is the Paxos algorithm.

Story of the Paxos algorithm

This algorithm guarantees that the system will come to an agreement on a single value and tolerate the failure of any number of nodes (potentially all of them) as long as more than half the nodes work properly at any time, which is a significant improvement.

Funnily enough, this algorithm was invented by Leslie Lamport during his attempt to prove that this is actually impossible!

He decided to explain the algorithm in terms of a parliamentary procedure used in an ancient, fictional Greek island called Paxos.

Despite being elegant and highly entertaining, this first paperL. Lamport, “The Part-time Parliament,” ACM Transactions on Computer Systems (TOCS), 1998. was not well received by the academic community, who found it extremely complicated and could not discern its applicability in the field of distributed systems.

A few years later and after several successful attempts to use the algorithm in real-life systems, Lamport decided to publish a second paperL. Lamport, “Paxos Made Simple,” ACM SIGACT News (Distributed Computing Column) 32, 4 (Whole Number 121, December 2001), 2001., explaining the algorithm in simpler terms and demonstrating how it can be used to build an actual, highly available distributed system.

A historical residue of all this is the fact that the Paxos algorithm is regarded as a rather complicated algorithm until today. Hopefully, this section will help dispel this myth.

Roles

The Paxos algorithm defines three different roles:

  • Proposers
  • Acceptors
  • Learners

Every node in the system can potentially play multiple roles.

Proposer

A proposer is responsible for proposing values (potentially received from clients’ requests) to the acceptors and trying to persuade them to accept their value to arrive at a common decision.

Acceptor

An acceptor is responsible for receiving these proposals and replying with their decision on whether this value can be chosen or not.

Learners

The learners are responsible for learning the outcome of the consensus, storing it (in a replicated way) and potentially acting on it, by either notifying clients about the result or performing actions.

The following illustration contains a visual overview of these roles and how they interact with the clients.

Phases

The Paxos algorithm is split into two phases, each of which contains two parts:

Phase 1 (a)

A proposer selects a number NN and sends a prepare request with this number prepare(NN) to at least a majority of the acceptors.

NN is the round Identifier, which has two interesting properties:

  1. The round identifier has to be bigger than any previous round identifier used by any other proposer in our Paxos cluster. This is achieved by incrementing a counter i++.
  2. Make round identifiers unique because we never want two proposers to come up with the same round identifier and reuse it. Reusing identifiers would break the protocol. To achieve this we append node_number to the counter.

We will use a function cat(i++, node_number) that appends node_number to the counter i++.

Phase 1 (b)

When receiving a prepare request, an acceptor has the following options:

  • If it has not already responded to another prepare request of a higher number NN, it responds to the request with a promise not to accept any more proposals that are numbered less than NN. It also returns the highest-numbered proposal it has accepted, if any (Note: the definition of a proposal follows).
  • Otherwise, if it has already accepted a prepare request with a higher number, it rejects this prepare request. This ideally gives a hint to the proposer about the number of the other prepare request it has already responded to.

Phase 2 (a)

If the proposer receives a response to its prepare(NN) requests from a majority of acceptors, then it sends an accept(NN, vv) request to these acceptors for a proposal numbered NN with a value vv. The value is selected according to the following logic:

  • If any of the acceptors had already accepted another proposal and included that in its response, then the proposer uses the value of the highest-numbered proposal among these responses. Essentially, this means that the proposer attempts to bring the latest proposal to conclusion.
  • Otherwise, if none of the acceptors had accepted any other proposal, then the proposer is free to select any desired value. This value is usually selected based on the clients’ requests.

Phase 2 (b)

If the acceptor receives an accept(NN, vv) request for a proposal numbered NN, it accepts the proposal, unless it has already responded to a prepare(kk) request of a higher number (k>Nk > N).

Furthermore, as the acceptors accept proposals, they also announce their acceptance to the learners. When a learner receives an acceptance from a majority of acceptors, it knows that a value has been chosen. This is the most basic version of the Paxos protocol.

Nodes can play multiple roles for practical reasons and this is usually the case in real-life systems.

Example

As an example, we can observe that the proposers can play the role of learners as well, since they receive some of these accept responses anyway, minimize traffic, and improve the performance of the system.

During Phase 1 (a) of the protocol, the proposers have to select a proposal number NN. These numbers must be unique for the protocol to maintain its correctness properties. This is so that acceptors are always able to compare two prepare messages.

This can be achieved in several ways, but the easiest one is to compose these numbers out of two parts, the one being an integer and the second one being a unique identifier of the proposer (i.e., the IP address of the node). In this way, proposers can draw numbers from the same set.

As we have insinuated at the beginning of this section, multiple proposers can initiate concurrent prepare requests. The proposer that receives a response to its prepare request from a majority of acceptors is essentially elected as the current (but temporary) leader.

As a result, it can proceed with making a proposal request. The value of this proposal will be the chosen one unless a majority of acceptors have failed (and did not reply to the proposal) or another leader stepped up, becoming the temporary leader in the meanwhile (in which case, the acceptors will reject this proposal).

Basic ingredient of the Paxos protocol

The basic ingredient of the Paxos protocol is a concept we have already seen, namely the quorum. More specifically, the Paxos protocol makes use of majority quorums.

A majority quorum consists of more than half of the nodes of the system, such as at least k+1k+1 nodes in a system of 2k2k nodes.

This protocol guarantees that there can’t be two different proposers that complete both phases of the protocol concurrently because proposers require a majority quorum to proceed with a proposal. As a result, only a single value can be chosen, satisfying the agreement property of consensus.

Create a free account to view this lesson.

Continue your learning journey with a 14-day free trial.

By signing up, you agree to Educative's Terms of Service and Privacy Policy