An introduction to Raft

Let's investigate why the Raft algorithm was created when there was already a Paxos algorithm for solving the consensus problem.

Motivation

Paxos has been the canonical solution to the consensus problem. However, the initial specification of the algorithm did not cover some aspects that were crucial in implementing the algorithm in practice, some of these aspects were covered in subsequent papers.

The Paxos algorithm is also widely considered hard to understand.

As a response to these issues, researchers decided to create a new algorithm with the goals of improved understandability and ease of implementation. This algorithm, proposed by Ongaro et al.D. Ongaro and J. Ousterhout, “In Search of an Understandable Consensus Algorithm,” Proceedings of the 2014 USENIX Conference on USENIX Annual Technical Conference, 2014., is called Raft.

We will briefly examine the Raft algorithm in this lesson since it has provided a good foundation for many practical implementations. It nicely demonstrates how the various aspects described before can be consolidated in a single protocol.

Getting started with Raft

Raft establishes the concept of a replicated state machine and the associated replicated log of commands as first-class citizens and supports by default multiple consecutive rounds of consensus by default.

It requires a set of nodes that form the consensus group, which is referred to as the Raft cluster. Each of these nodes can be in one of the three states:

  • Leader
  • Follower
  • Candidate

Node states

One node is elected the leader and is responsible for receiving log entries from clients (proposals) and replicating them to the other follower nodes to reach consensus.

The leader is responsible for sending heartbeats to the other nodes in order to maintain its leadership.

Any node that hasn’t heard from the leader for a while will assume the leader has crashed; it will enter the candidate state and attempt to become the leader by triggering a new election.

On the other hand, if a previous leader identifies another node has gained leadership, it falls back to a follower state.

The following illustration illustrates the behavior of the nodes depending on their state.

Preventing two leaders from operating concurrently

To prevent two leaders from operating concurrently, Raft has the temporal concept of terms.

Terms

Time is divided into terms, which are numbered with consecutive integers.

Each term begins with an election where one or more candidates attempt to become leaders.

To become a leader, a candidate needs to receive votes from a majority of nodes. Each node votes for at most one node per term on a first-come-first-served basis. Consequently, at most, one node can win the election for a specific term since two different majorities would conflict in at least one node.

If a candidate wins the election, it serves as the leader for the rest of the term.

Any leader from previous terms will not be able to replicate any new log entries across the group since the voters of the new leader will be rejecting its requests, and it will eventually discover it has been deposed.

If none of the candidates manages to get a majority of votes in a term, this term ends with no leader, and a new term (with a new election) begins straight after.

Get hands-on with 1400+ tech skills courses.