Distributed Algorithms: An Intuitive Approach by Wan Fokkink
I kept coming across descriptions of various algorithms from distributed computing during my reading, such as this one on Paxos, but felt that I’d like to read a book that gives an overview of the many clever algorithms that exist out there. This book is excellent, covering lots of algorithms, often offering an intuitive explanation for the correctness of the algorithm and with a good set of examples and exercises to get you familiar with the algorithms.
It is divided into two sections – message passing and shared memory. In the message passing model the processes communicate via messages that pass across a network which can have various topologies in various algorithms from ring to directed to undirected and which can be FIFO or random order across the various links. The shared memory section looks at algorithms that depend on variants of the atomic test-and-set operations.
The first section breaks the algorithms into the various groupings. Snapshots – how do we capture the state of the processes and the messages in transit at a given moment, so that we could potentially debug or restart the distributed system. Waves – algorithms for visiting every node in the network, where each node only knows about various neighbours. Deadlock detection – getting a global wait graph from the network which can be used to detect circular wait patterns (and hence deadlock). Termination detection – discovering that a launched activity has finished. Garbage collection in the presence of inter-node references, and an explanation of the close relationship between GC and some of the earlier algorithms. Routing – how the network can adapt to knowledge about best paths.
The first section then continues with the more typical distributed system algorithms I have seen mentioned before. Election of a leader by a group of nodes, anonymous networks and how this affects the election process. Synchronous networks, where each process needs to take a step before all processes take another step. Handling crash and byzantine failures – in the former the process just stops communicating, in the latter it can continue behaving but in an illegal manner. The last chapter of the first part covers mutual exclusion, various types of locking in a distributed system and ways to ensure fairness.
At times one can get a little lost in the details of the algorithms which are just listed one after another in the chapters, but I found the exercises a good way to step back and think about the differences between the algorithms.
Section two has chapters of algorithms for processes running in shared memory. Chapter one covers the usual Peterson’s and Bakery algorithms, but then goes into a bit more details about memory coherence and avoids many flying cache lines by using test-and-test-and-set. Chapter two covers barriers and the next chapter covers self-stabilization. The last chapter covers online scheduling, considering the various scheduling policies that a task may have as a hard deadline.
The book is a great introduction to vast range of clever algorithms for doing operations in a distributed setting. It is well written and you are guaranteed to learn something from it.