Total and Partial Ordering

This lesson will explain the order types with examples. Also, we will explore the ordering used by distributed systems to order events.

Determining the order of events is a common problem that needs to be solved in software systems.

However, there are two different possible types of ordering: total ordering and partial ordering.

Total ordering

A total order is a binary relation that compares any two elements of a set with each other. As a direct consequence of this property, we can use this relation to derive only a single order for all the elements in a set. This is why it’s called a total order.

Example

If we totally order the set of unique integers {7,9,3,2,6}\lbrace 7, 9, 3, 2, 6 \rbrace using the relation less than <<, the associated total order is [2,3,6,7,9][2, 3, 6, 7, 9].

Partial ordering

A partial order is a binary relation that can be used to compare only some of the elements of a set with each other. Consequently, we can use this relation to derive multiple, valid orders for all the elements in the set.

Example

The set of the following sets of integers {{0},{1},{2},{0,1},{1,2}}\lbrace\lbrace0\rbrace,\lbrace1\rbrace,\lbrace2\rbrace,\lbrace0,1\rbrace,\lbrace1,2 \rbrace\rbrace can only be partially ordered, using the subset relation ⊆\subseteq. If we pick the elements {0}\lbrace0\rbrace and {0,1}\lbrace0,1\rbrace, we can order them since {0}⊆{0,1}\lbrace0\rbrace\subseteq\lbrace0,1\rbrace. However, the elements {0,1}\lbrace0,1\rbrace and {1,2}\lbrace1,2\rbrace cannot be ordered using this relation, since neither {0,1}⊆{1,2}\lbrace0,1\rbrace\subseteq\lbrace1,2\rbrace nor {1,2}⊆{0,1}\lbrace1,2\rbrace\subseteq\lbrace0,1\rbrace holds.

As a result, both [{0},{1},{2},{0,1},{1,2}][\lbrace0\rbrace,\lbrace1\rbrace,\lbrace2\rbrace,\lbrace0,1\rbrace,\lbrace1,2\rbrace] and [{2},{1},{0},{1,2},{0,1}][\lbrace2\rbrace,\lbrace1\rbrace,\lbrace0\rbrace,\lbrace1,2\rbrace,\lbrace0,1\rbrace] would be valid partial orderings.

There could be more sequences other than these two.

Total ordering events in single-node systems

In systems comprised of a single node, it’s easy and intuitive to determine a total ordering of events. The main reason is that there is a single actor where all the events happen, so this actor can impose a total order on these events as they occur.

Total orderings also make it much simpler to build protocols and algorithms.

Partial ordering events in distributed systems

In a distributed system it’s not that straightforward to impose a total order on events. This is because there are multiple nodes in the system, and events might be happening concurrently on different nodes. As a result, a distributed system can use any valid partial ordering of the events occurring if there is no strict need for a total ordering.

The following illustration shows why total ordering is much harder to determine in a distributed system.

In a system comprising of a single node that can only execute events serially, it’s easy to define a total order on all the events happening. This is because, between two events (e1e_1, e2e_2), one of them starts after the other one finishes. On the other hand, in a distributed system with multiple nodes where events are happening concurrently, it’s much harder to determine a total order. This is because there might be pairs of events that cannot be ordered.

We can make an interesting and important observation from the above discussion. The fact that these operations have some duration and are interleaved with each other is not the only problem that makes total ordering hard to achieve. Even if these operations are instantaneous (or linearizable), total ordering is still non-trivial to achieve as there is no global clock.

As a result, even if each node can assign unique timestamps to the events happening, these timestamps will come from clocks running at different rates, thus making it harder to compare them. This is demonstrated in the following illustration.

The above illustration shows that any clock errors can be ignored in a single-node system since there is only one clock in use. This makes it possible to assign timestamps to events as if they are instantaneous and establish a total order.

However, in a distributed system, there are multiple clocks in play that run at different rates and can have different errors. This means that the errors need to be taken into account when comparing timestamps between different nodes. This makes it harder to establish a total order since there will be pairs of events where we cannot know which one happened first.

Note that for ease of understanding, the illustration shows the clocks of all the nodes having the same error, which is not realistic.

Get hands-on with 1400+ tech skills courses.