As we have seen so far, the Paxos protocol is well-specified. However, there are some small details and optimizations that the original paper could not cover. Some of these topics were covered in subsequent papersT. D. Chandra, R. Griesemer, and J. Redstone, “Paxos Made Live: An Engineering Perspective,” Proceedings of the Twenty-sixth Annual ACM Symposium on Principles of Distributed Computing, 2007.. This lesson will cover some of these topics as briefly as possible.

Towards running multiple instances of Paxos

The basic Paxos protocol describes how a distributed system of multiple nodes can decide on a single value.

However, just choosing a single value would have limited practical applications on its own.

To build more useful systems, we need to be able to continuously select values.

This can be achieved by running multiple instances of Paxos, where an instance is an execution of the protocol that leads to a decision on a single value. These instances can run independently and in parallel, but they also have to be numbered.

Depending on the functionality needed, there can be several rules applied, such as not returning the result of an instance to the client, unless all the previous instances have been completed as well.

We will elaborate more on this topic in the next lesson.

Paxos returning current state of the system

Another common need is the ability to query the current state of the system.

Of course, the clients of the system learn the chosen values, so they could keep track of the state on their side. But, there will always be cases, where some clients need to retrieve some of the values chosen in the past, i.e. because they are clients that have just been brought into operation.

Read operation

Paxos should also support read operations that return the decisions of previously completed instances alongside write operations that start new instances of the protocol.

These read operations have to be routed to the current leader of the system, which is essentially the node that completed successfully the last proposal.

It’s important to note that this node cannot reply to the client using its local copy.

The reason for this is that another node might have done a proposal in the meanwhile (becoming the new leader), thus meaning that the reply will not reflect the latest state of the system.

This would mean that the read/write consensus operations would not be linearizable. Note that in the context of consensus, operations such as proposals are considered single-object operations. As a result, there is no need for isolation guarantees.

As a result, that node will have to perform a read from a majority of nodes, essentially seeing any potential new proposal from another node.

We should be able to understand how a majority quorum can guarantee that by now. If not, it would probably be a good idea to revisit the lesson about quorums and their intersection properties.

This means that reads can become quite slow since they will have to execute in two phases.

Leader leases

According to LampsonB. W. Lampson, “How to Build a Highly Available System Using Consensus,” Proceedings of the 10th International Workshop on Distributed Algorithms, 1996., "An alternative option that works as optimization is to make use of the technique called leases.

Using this approach, a node can take a lease, by running a Paxos instance, establishing a point in timeThis point in time is essentially the time of the proposal (a timestamp that can be part of the proposal’s value) plus a pre-defined time period, which corresponds to the duration of the lease., until which it’s guaranteed to be considered the leader and no other node can challenge it. This means that this node can then serve read operations locally.

However, one has to take clock skew into account in the implementation of this approach and keep in mind it will be safe only if there’s an upper bound in the clock skew.

Problem while using multiple instances of Paxos

By the same logic, one could argue that electing a leader in every instance of the Paxos protocol is not as efficient as possible and degrades performance significantly under normal conditions without many failures.

Indeed, that is true!

Solution with Multi-Paxos

There is a slightly adjusted implementation of Paxos, called Multi-Paxos by David et al.H. Du, J. S. David, and Hilaire, “Multi-Paxos: An Implementation and Evaluation,” 2009. that mitigates this issue.

In this approach, the node that has performed the last successful proposal is considered the current distinguished proposer. This means that a node can perform a full instance of Paxos and then it can proceed straight to the second phase for the subsequent instances, using the same proposal number that has been accepted previously.

The rest of the nodes know which node is currently the leader based on which node made the last successful proposal. They can perform periodic health checks. If they believe this node has crashed, they can initiate a prepare request to perform a successful proposal and become the distinguished proposer.

Essentially, this means that the protocol is much more efficient under stable conditions since it has only one phase. When failures occur, the protocol falls back to plain Paxos.

Dynamically updating the nodes

Another common need is a way to update the nodes that are members of the system dynamically. The answer to this requirement might sound familiar thanks to the elegance of the protocol; membership information can just be propagated as a new Paxos proposal!

The nodes that are members of the system can have their own way of identifying failures of other nodes (i.e., periodic health checks) and the corresponding policies on when a node should be considered dead. When a node is considered dead, one of the nodes that have identified it can trigger a new Paxos instance. Then it proposes a new membership list, which is the previous one minus the dead node. As soon as this proposal completes, all the subsequent instances of Paxos should make use of the updated membership list.

Get hands-on with 1400+ tech skills courses.