Defining the Consensus Problem
Let's look into the consensus and its use-cases.
Most problems in distributed systems space share a common trait that characterizes most, if not all, of them.
It’s the fact that the various nodes of a distributed system try to reach an agreement on a specific thing.
- In the case of a distributed transaction, it’s whether a transaction has been committed or not.
- In the case of message delivery, it’s whether a message has been delivered or not.
This underlying property is common in many more problems in the distributed systems space.
As a result, researchers formally defined this problem and researched possible solutions, since these can then be used as building blocks for more complicated problems. This is known as the consensus problem.
Formal definition
Assume we have a distributed system that consists of k nodes (, , …, ), where each one can propose a different value .
Consensus is the problem of making all these nodes agree on a single value .
The following properties must also be satisfied:
- Termination: Every non-faulty node must eventually decide.
- Agreement: The final decision of every non-faulty node must be identical.
- Validity: The agreed value must have been proposed by one of the nodes.
Some use-cases of consensus
As explained before, consensus lies beneath many other common problems in the distributed systems space. We will now visit some of them and discuss how they relate to the consensus problem.
Leader election
It is a widespread problem where the nodes that are part of a distributed system need to elect one node from amongst them to act as their leader, and coordinate the operation of the whole system.
Example
An example of this problem is the primary-backup replication scheme.
This scheme is based on the fact that one node, designated as primary, will be responsible for performing operations that update data. The other nodes, designated as secondaries, will follow up with the same operations.
However, the system first needs to select the primary node through a process called leader election. Since all the nodes are practically agreeing on a single value, the identity of the leader, this problem can easily be modeled as a consensus problem.
Distributed locking
One more common problem is distributed locking. Most distributed systems receive multiple concurrent requests and need to perform concurrency control to prevent data inconsistencies because of interference between these requests.
One of these concurrency control methods is locking. However using locking in the context of a distributed system comes with a lot of edge-cases that add a lot of risks.
Of course, distributed locking can also be modeled as a consensus problem, where the nodes of the system agree on a single value, which is the node that holds the lock.
Atomic broadcast
Another commonly cited problem is atomic broadcast, which is concerned with allowing a set of nodes to concurrently broadcast messages while ensuring that all destinations consistently deliver them in the same sequence despite the possible presence of faulty nodes.
This problem is also equivalent to consensus, as also demonstrated in previous research by
The reason we described these problems and demonstrated how they could be modelled as a consensus problem is so that we can appreciate the value of this abstraction, and understand that solving the consensus problem can provide solutions to many more problems.
Get hands-on with 1400+ tech skills courses.