Coordination Service

Let's have an overview of coordination in distributed systems.

It must have become evident by now that coordination is a central aspect of distributed systems.

Even though each component of a distributed system might function correctly in isolation, one needs to ensure that they will also function correctly when operating simultaneously. This can be achieved through some form of coordination between these components.

Coordination as an API

As illustrated in the chapter about consensus, this coordination can be quite complicated, with many edge cases. Consequently, implementing these coordination algorithms from scratch in every new system would be inefficient and would also introduce a significant risk of bugs.

By contrast, if there were a separate system that provided this form of coordination as an API, other systems could offload their coordination functions to it much more easily.
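To make the idea of "coordination as an API" more concrete, here is a minimal sketch of what a client-facing lock API might look like. The class name `CoordinationService` and its methods `try_acquire` and `release` are hypothetical and are not taken from Chubby, Zookeeper, or etcd. The sketch also keeps all state in a single process, whereas a real service would replicate it across several nodes using a consensus protocol.

```python
import threading


class CoordinationService:
    """Hypothetical in-memory stand-in for a coordination service.

    Assumption: a single process holds all state. A real service would
    replicate this state across nodes via a consensus protocol.
    """

    def __init__(self):
        self._guard = threading.Lock()  # protects the table below
        self._locks = {}                # lock name -> current owner

    def try_acquire(self, name, owner):
        """Grant the named lock to `owner` if it is free or already theirs."""
        with self._guard:
            holder = self._locks.get(name)
            if holder is None or holder == owner:
                self._locks[name] = owner
                return True
            return False

    def release(self, name, owner):
        """Release the named lock, but only if `owner` currently holds it."""
        with self._guard:
            if self._locks.get(name) == owner:
                del self._locks[name]
                return True
            return False


# Two clients compete for the same lock, for example to decide
# which of them acts as the leader.
service = CoordinationService()
first = service.try_acquire("leader", "node-a")
second = service.try_acquire("leader", "node-b")
```

In this sketch, the clients do not need to implement any agreement logic themselves. They simply call the service and act on the answer, which is the kind of offloading described above.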

Several systems were born out of this need. They are listed below:

Chubby

Chubby (M. Burrows, “The Chubby Lock Service for Loosely-coupled Distributed Systems,” Proceedings of the 7th Symposium on Operating Systems Design and Implementation, 2006) was a system implemented internally at Google and used by several different systems for coordination purposes.

Zookeeper

Zookeeper (P. Hunt, M. Konar, F. P. Junqueira, and B. Reed, “ZooKeeper: Wait-free Coordination for Internet-scale Systems,” Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference, 2010) is a system that was partially inspired by Chubby. It was originally developed at Yahoo and later became an Apache project.

Many companies have used it widely for coordination in distributed systems, including some systems that are part of the Hadoop ecosystem.

etcd

etcd is another system that implements similar coordination primitives. It later formed the basis of Kubernetes’ control plane.

These three systems have a lot of similarities, but they also differ slightly from each other.

This chapter will focus on Zookeeper, providing an overview of its design. It will also explain how the other two systems differ from it where relevant.
