Replicated State Machine via Consensus
In this lesson, we will explain how a consensus algorithm could be used to build a replicated state machine.
At the beginning of this chapter, we briefly described how a consensus algorithm could be used to solve a wide variety of problems.
This is not a coincidence: all these problems share a fundamental characteristic, namely that they can, to some extent, be modeled as a state machine. This is also why they are easier to solve in a centralized setting, but become much harder when we want to solve them in a distributed setting in order to increase availability.
Building a replicated state machine using a consensus algorithm
Using a consensus algorithm, we can build a replicated state machine: a set of nodes, each of which receives commands and executes them, transitioning between states.
If all the nodes use the same state machine, all we need is to ensure that all the nodes receive the same inputs in the same order, and then we can guarantee that all the nodes will make the same transitions. This would mean that the distributed system would look similar to a single server from the outside.
As a result, one could achieve all the benefits of a distributed system while maintaining a simple programming model.
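The core requirement above is determinism: if every replica applies the same commands in the same order, every replica ends up in the same state. A minimal sketch in Python (the `KVStateMachine` class and command format are hypothetical, chosen only for illustration):

```python
# A deterministic key-value state machine. Two replicas that apply the
# same command log in the same order end up in identical states.

class KVStateMachine:
    def __init__(self):
        self.state = {}

    def apply(self, command):
        # Commands are (op, key, value) tuples; execution is deterministic.
        op, key, value = command
        if op == "set":
            self.state[key] = value
        elif op == "delete":
            self.state.pop(key, None)

# The agreed-upon, ordered log of commands (what consensus provides).
log = [("set", "x", 1), ("set", "y", 2), ("delete", "x", None)]

replica_a, replica_b = KVStateMachine(), KVStateMachine()
for cmd in log:
    replica_a.apply(cmd)
    replica_b.apply(cmd)

assert replica_a.state == replica_b.state  # same inputs, same order -> same state
```

Note that the consensus layer only has to agree on the log; the state machines themselves never need to communicate.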
The following illustration contains a high-level overview of such a system.
The top layer receives requests from the clients. It creates proposals for the consensus layer, which conducts the necessary coordination between the other nodes of the system and propagates the chosen values to the lower layer, which receives these values as inputs and executes the necessary state transitions.
Using Paxos at the consensus layer
Let’s elaborate a bit more on what that would entail, assuming Paxos is used as the consensus layer of the system.
Essentially, the clients would send regular requests to the system, depending on the system's domain. These requests could be either commands to the system or requests to inspect its internal state.
These requests would be dispatched to the system's current leader, which is determined based on previous instances of the consensus.
If a node that is not a leader receives a request, it can return the leader’s address so the client can reroute it.
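This redirection logic can be sketched as follows (the `Node` class, field names, and response format are hypothetical assumptions, not part of Paxos itself):

```python
# Hypothetical sketch: a non-leader node redirects clients to the leader.

class Node:
    def __init__(self, node_id, leader_id, leader_addr):
        self.node_id = node_id
        self.leader_id = leader_id      # learned from previous consensus instances
        self.leader_addr = leader_addr

    def handle_request(self, command):
        if self.node_id != self.leader_id:
            # Not the leader: tell the client where to retry.
            return {"status": "redirect", "leader": self.leader_addr}
        # Leader: the command would be proposed in a new consensus instance.
        return {"status": "accepted", "command": command}

follower = Node(node_id=2, leader_id=1, leader_addr="10.0.0.1:9000")
response = follower.handle_request(("set", "x", 1))
```

Here `response` would carry the leader's address, so the client can reroute its request there.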
When the system is bootstrapped, and no consensus instances have been run yet, the leader can be determined from a configuration file, or the nodes can compete with each other for the leader role.
Every time the leader node receives a new command, it attempts to execute a new instance of consensus, increasing the instance number every time.
To achieve satisfactory performance, multiple consensus instances can be run in parallel. However, the necessary serialization must be performed in some places to ensure correctness.
For instance, the lower layer should process the decision of a consensus instance only when it has processed all the previous instances to ensure that all state machines perform the same transitions.
Similarly, after an instance completes, the leader should reply to the associated client only once all the previous instances have also completed.
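With parallel instances, decisions can arrive out of order, so the lower layer must buffer them and apply them strictly by instance number. A minimal sketch of that serialization (the `OrderedApplier` class is a hypothetical helper, not part of any standard Paxos implementation):

```python
# Hypothetical sketch: decisions from parallel consensus instances may
# arrive out of order; the lower layer buffers them and applies them to
# the state machine strictly in instance-number order.

class OrderedApplier:
    def __init__(self, apply_fn):
        self.apply_fn = apply_fn   # callback that executes a decided command
        self.next_instance = 0     # lowest instance number not yet applied
        self.pending = {}          # decided, but earlier instances still missing

    def on_decided(self, instance, command):
        self.pending[instance] = command
        # Apply every consecutive instance that is now ready.
        while self.next_instance in self.pending:
            self.apply_fn(self.pending.pop(self.next_instance))
            self.next_instance += 1

applied = []
applier = OrderedApplier(applied.append)
applier.on_decided(2, "c")   # buffered: instances 0 and 1 still missing
applier.on_decided(0, "a")   # applies instance 0 only
applier.on_decided(1, "b")   # applies instances 1 and 2
assert applied == ["a", "b", "c"]
```

The same gating applies to client replies: the leader can acknowledge an instance's client only once `next_instance` has advanced past it.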
When the current leader is unstable and other nodes start making proposals, contention for individual instances may increase, creating significant delays even for subsequent instances that might already have completed.
A dummy value can be proposed by the nodes in these cases, which essentially represents a no-op, rejecting the client’s operation.
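The effect of such a no-op is that the instance's slot in the log is consumed without any state transition, so later instances are unblocked. A small self-contained sketch (the `NOOP` sentinel and log format are illustrative assumptions):

```python
# Hypothetical sketch: a NOOP value fills an instance slot so later
# instances can be applied, while causing no state transition.

NOOP = None  # sentinel chosen for illustration

# An agreed-upon log where instance 1 was decided as a no-op.
log = [("set", "x", 1), NOOP, ("set", "y", 2)]

state = {}
for command in log:
    if command is NOOP:
        continue  # slot consumed, no state change
    op, key, value = command
    state[key] = value

assert state == {"x": 1, "y": 2}
```

From the client's perspective, the operation that lost its slot to a no-op is simply rejected and can be retried.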
This abstraction of a replicated state machine is quite powerful and could potentially be used to implement solutions for many common problems in the distributed systems area.