Some More Things to Discover
Besides the topics highlighted in the previous lesson, here are some more things to discover.
At the risk of being unfair to other systems and material out there, we would like to mention CockroachDB as one system with a lot of public material demonstrating how its developers have applied theoretical concepts in practice. Concrete examples include the implementation of pipelined consensus and a parallelized version of two-phase commit that requires a single round trip instead of two before acknowledging a commit. Some resources that contain a lot of practical information on building and operating distributed systems are the Amazon Builders' Library and papers by
The chapters on practices and patterns discussed how systems can deal with failure. Unfortunately, two types of failure are frequently neglected when building or operating distributed systems, even though they are quite common:
Gray failures (P. Huang et al., “Gray Failure: The Achilles’ Heel of Cloud-Scale Systems,” Proceedings of the 16th Workshop on Hot Topics in Operating Systems, 2017)
Partial failures (C. Lou, P. Huang, and S. Smith, “Understanding, Detecting and Localizing Partial Failures in Large System Software,” 17th USENIX Symposium on Networked Systems Design and Implementation, 2020)
Gray failures do not manifest cleanly as a binary indication. They are more subtle and can be observed differently by different parts of a system. Partial failures are those in which only some parts of a system fail, but in a way whose consequences are as serious as a full failure of the system, sometimes because of a defect in the design.
These types of failures can be very common in distributed systems because of their many moving parts. Since their consequences can be serious, it is essential for people who build and run distributed systems to internalize these concepts and look out for them in the systems they build and operate.
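To make "observed differently by different parts of a system" concrete, here is a minimal sketch of a gray failure. It assumes a hypothetical replica whose process stays alive while its disk is degraded: a heartbeat-based failure detector keeps seeing it as healthy, while clients whose writes must reach the disk see it as degraded. The class, the 80% write-failure rate and the 99% success threshold (SUCCESS_SLO) are all illustrative assumptions, not taken from the cited papers.

```python
import random
from dataclasses import dataclass

SUCCESS_SLO = 0.99  # hypothetical: fraction of writes clients expect to succeed


@dataclass
class Replica:
    """Hypothetical replica whose process is alive but whose disk may be degraded."""
    name: str
    disk_degraded: bool = False

    def heartbeat(self) -> bool:
        # The heartbeat handler only touches memory, so it keeps succeeding
        # even while the disk is struggling.
        return True

    def handle_write(self, rng: random.Random) -> bool:
        # Writes must reach the disk; when it is degraded, most of them time out.
        if self.disk_degraded:
            return rng.random() >= 0.8  # assumption: roughly 80% of writes fail
        return True


def failure_detector_view(replica: Replica) -> str:
    """Verdict of a detector that only looks at heartbeats."""
    return "healthy" if replica.heartbeat() else "failed"


def client_view(replica: Replica, requests: int, seed: int = 0) -> str:
    """Verdict of clients that judge the replica by their own write success rate."""
    rng = random.Random(seed)
    successes = sum(replica.handle_write(rng) for _ in range(requests))
    return "healthy" if successes / requests >= SUCCESS_SLO else "degraded"


if __name__ == "__main__":
    replica = Replica("replica-1", disk_degraded=True)
    print("failure detector:", failure_detector_view(replica))
    print("clients:", client_view(replica, requests=1000))
```

Run against the degraded replica, the failure detector reports it as healthy while the clients classify it as degraded. The system therefore keeps routing work to a node that is failing for its users. This gap between what different observers see is what the Gray Failure paper cited above calls differential observability.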
Note: Another important topic that we did not cover at all is the formal verification of systems.
We can use many formal verification techniques and tools to prove safety and liveness properties of systems with
It is important to note that users of these formal verification methods have acknowledged publicly that these methods not only helped them discover bugs in their designs but also helped them reason about the behavior of their systems significantly better.
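To give a feel for what checking a safety property ("something bad never happens") involves, here is a toy explicit-state model checker. It is a minimal sketch under stated assumptions, not how any particular tool works. The model is a hypothetical two-process lock protocol, and the search visits every reachable interleaving, checking mutual exclusion in each state. A check-then-set lock is compared with a test-and-set lock. The names (successors, mutual_exclusion, check) are illustrative. Real tools work on dedicated specification languages and also handle liveness ("something good eventually happens"), which this sketch does not. Exhaustively exploring one small finite model is also weaker than a proof for every system size.

```python
from collections import deque
from typing import Callable, Iterable, Optional

State = tuple  # (pc of process 0, pc of process 1, lock flag)
INITIAL: State = ("idle", "idle", False)


def _with_pc(state: State, i: int, pc: str, flag: bool) -> State:
    """Return a copy of `state` where process i moves to `pc` and the flag becomes `flag`."""
    pcs = list(state[:2])
    pcs[i] = pc
    return (pcs[0], pcs[1], flag)


def successors(state: State, atomic: bool) -> Iterable[State]:
    """All states reachable in one step by either process (interleaving semantics)."""
    flag = state[2]
    for i in range(2):
        pc = state[i]
        if pc == "idle" and not flag:
            if atomic:
                # Test-and-set: check the flag and take the lock in one indivisible step.
                yield _with_pc(state, i, "critical", True)
            else:
                # Check-then-set: only observe that the lock looks free for now.
                yield _with_pc(state, i, "saw_free", flag)
        elif pc == "saw_free":
            # Second half of check-then-set: take the lock, whatever others did meanwhile.
            yield _with_pc(state, i, "critical", True)
        elif pc == "critical":
            # Leave the critical section and release the lock.
            yield _with_pc(state, i, "idle", False)


def mutual_exclusion(state: State) -> bool:
    """Safety property: never both processes in the critical section."""
    return not (state[0] == "critical" and state[1] == "critical")


def check(initial: State,
          next_states: Callable[[State], Iterable[State]],
          invariant: Callable[[State], bool]) -> Optional[list]:
    """Breadth-first search over all reachable states.

    Returns a shortest trace from `initial` to a state violating `invariant`,
    or None if every reachable state satisfies it.
    """
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in next_states(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


if __name__ == "__main__":
    naive = check(INITIAL, lambda s: successors(s, atomic=False), mutual_exclusion)
    print("check-then-set counterexample:", naive)
    safe = check(INITIAL, lambda s: successors(s, atomic=True), mutual_exclusion)
    print("test-and-set counterexample:", safe)
```

For the check-then-set lock, the search returns a five-state counterexample. In it, both processes see the lock as free before either one takes it, and both end up in the critical section. For the test-and-set lock, it returns None, because no reachable state violates the invariant. This kind of concrete counterexample trace is one reason people report that such tools help them find design bugs.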
Next steps
Now that you’ve learned what distributed systems are, we recommend applying that knowledge by taking the following two courses:
Grokking Modern System Design Interview for Engineers and Managers: This course teaches you to design large-scale distributed systems by employing building blocks such as load balancers, rate limiters, messaging queues, CDNs, etc. You will learn how to design 13 real-world systems and evaluate your understanding simultaneously.
Grokking the Product Architecture Design Interview: Product architecture design is essential to ensure that the services you build can effectively communicate with other systems. Mastering API design principles is critical for creating reliable and compatible APIs that support modern software architectures.