Programming Concurrency on the JVM by Venkat Subramaniam
I’ve always been fascinated by the connection between hardware and software. In a programming language it shows up when we have to start thinking about memory barriers and atomic compare-and-set instructions, details that are usually hidden behind abstractions such as locks, semaphores and monitors. It is an interesting area, but I’ve always found this level of programming really hard and error-prone.
I remember the horror of a colleague a few years back when he showed me some code in which one thread made a call and then set a static field, while another thread waited for the field to be set. Even though the first thread finished, the second thread never noticed the change. I pointed out that he needed to mark the field volatile; without it, the compiler is free to let the second thread keep using a cached copy of the value it had read. That such a simple program requires the field to be marked specially is a shock to many people. Locks have the same feel to them, and there’s a good paper by Edward Lee which argues that the non-determinism offered by the thread abstraction is completely unintuitive, forcing us to hack that non-determinism away with locks until we get back to a point where we can reason about the program again. Somehow this feels wrong. The usual mantra of development is “get it working, get it right, get it fast”, and we would prefer to reason about the program being right first and then add extra code to make it fast, rather than obscure the algorithm with tons of locking just to get it right in the first place.
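To make the colleague’s bug concrete, here is a minimal Java sketch of that hand-off. The class and field names are hypothetical. The volatile modifier on result is what guarantees that the waiting thread eventually sees the write; remove it and the JIT is allowed to hoist the read out of the loop, so the loop may spin forever (whether it actually does depends on the JVM and the hardware).

```java
// Sketch of the hand-off described above; names are hypothetical.
final class VolatileHandoff {
    // Without `volatile`, the JIT may hoist the read in waitForResult() out of
    // the loop, and the waiting thread might never observe the write.
    private static volatile String result;

    static String waitForResult() {
        result = null;
        Thread producer = new Thread(() -> {
            // Stand-in for "made a call": do some work, then publish the answer.
            result = "done";
        });
        producer.start();
        // The waiting thread (here, the caller) spins until the write is visible.
        while (result == null) {
            Thread.yield();
        }
        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(waitForResult()); // prints "done"
    }
}
```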
This book looks at concurrency from three perspectives: shared mutability (mutable fields and locks), isolated mutability (using actors) and pure immutability (using STM). It looks at implementations of these three techniques on the JVM in three languages: Scala, mainly for the Akka framework, which offers Actors and STM; Clojure, which offers STM and a version of Actors called Agents; and Java, for the thread pool and Java 7 fork-join APIs.
Part one is a great introduction to concurrency. It first looks at a couple of the perils of concurrency and then at how we can introduce concurrency to speed up an I/O-bound application and a computationally intensive one. In both cases we need to break the work into independent tasks whose results then have to be brought back together. This breakdown introduces the ideas of state and scalability, which leads to a discussion of ensuring visibility through memory barriers and preserving invariants through atomicity, using locks and the synchronized keyword.
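As a rough illustration of the computationally intensive case, the sketch below splits a hypothetical prime-counting job into independent ranges, runs them on a fixed thread pool and adds up the results. The workload and names are my own assumptions rather than the book’s code. Each task owns its own range, so the only coordination needed is collecting results through Future.get(), which also guarantees the results are visible to the calling thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelPrimeCount {
    // Naive primality test: deliberately CPU-bound.
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Count primes in [from, to); each task works only on its own range.
    static int countRange(int from, int to) {
        int count = 0;
        for (int n = from; n < to; n++) {
            if (isPrime(n)) count++;
        }
        return count;
    }

    // Split [0, limit) into `parts` independent ranges and sum their counts.
    static int countPrimes(int limit, int parts) {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            int chunk = (limit + parts - 1) / parts;
            for (int start = 0; start < limit; start += chunk) {
                final int from = start;
                final int to = Math.min(limit, start + chunk);
                tasks.add(() -> countRange(from, to));
            }
            int total = 0;
            // invokeAll waits for every task; Future.get() establishes
            // happens-before, so each task's result is safely visible here.
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(100_000, 4)); // 9592
    }
}
```

Because the tasks share no mutable state, no locks appear anywhere. For an I/O-bound job the same structure applies, but you would typically use more threads than cores, since most of them spend their time waiting.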
Part two deals with software transactional memory, first from the perspective of Clojure, a language in which all of the core data structures are immutable and persistent, and then from the point of view of Scala and the Akka library. In this second case the programmer has to be much more careful to make sure that the objects managed by the transaction are immutable, so that side effects don’t leak out of the transaction. STM looks like a great solution when there are many reads and fairly infrequent writes; with too many writes we may spend a lot of time retrying transactions, only to have them aborted again after they have burned lots of CPU cycles. STM is clever in that it leads to code that can be composed without the programmer having to introduce any extra synchronisation.
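Plain Java has no STM, so the sketch below swaps in something simpler: an optimistic update of a single AtomicReference holding an immutable value, roughly what Clojure’s atoms do. It is not STM (there is only one reference and nothing composes), but it shows the retry behaviour described above. Each writer reads a snapshot, computes a new value, and throws that work away and tries again if another writer committed first. The Balances type and the transfer workload are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Optimistic single-reference update (not STM): snapshot, compute, publish-or-retry.
final class OptimisticUpdate {
    // Hypothetical immutable state: two balances that must move together.
    static final class Balances {
        final int checking;
        final int savings;

        Balances(int checking, int savings) {
            this.checking = checking;
            this.savings = savings;
        }
    }

    // Counts attempts that were thrown away because another writer won.
    static final AtomicInteger retries = new AtomicInteger();

    static Balances update(AtomicReference<Balances> ref, UnaryOperator<Balances> f) {
        while (true) {
            Balances before = ref.get();
            // f may run many times, so it must not have side effects.
            Balances after = f.apply(before);
            // Publish only if nobody replaced `before` since we read it.
            if (ref.compareAndSet(before, after)) {
                return after;
            }
            retries.incrementAndGet();
        }
    }

    static Balances runTransfers(int threads, int perThread) {
        AtomicReference<Balances> ref = new AtomicReference<>(new Balances(1_000_000, 0));
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    // Move one unit from checking to savings as a single atomic step.
                    update(ref, b -> new Balances(b.checking - 1, b.savings + 1));
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return ref.get();
    }

    public static void main(String[] args) {
        Balances b = runTransfers(4, 10_000);
        // checking + savings is always 1,000,000; the retry count varies between runs.
        System.out.println("checking=" + b.checking + " savings=" + b.savings
                + " retries=" + retries.get());
    }
}
```

The function passed to update may run several times, which is why it has to be free of side effects. Under contention the retry counter tends to climb as more writers compete, and that wasted work is the cost that makes write-heavy workloads a poor fit.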
Every time I read about STM, I forget that it doesn’t necessarily make it appear to the participants that they live in a world where all transactions are serializable. By default the two implementations covered in the text provide snapshot isolation, though both offer a way around the problems this can introduce: reads can be recorded in the transaction log so that the transaction fails to commit if a value it read has been modified by another transaction that committed in the meantime. Clojure has the ensure function for these logged reads, and Akka allows a transaction to be configured so that all reads are logged. You can see the Clojure implementation here. It’s not clear to me how many programs will be broken because programmers forget to upgrade some reads into effective writes.
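To see what can go wrong, here is a toy, single-threaded model of snapshot isolation (my own simplification, not how Clojure or Akka implement it). Two transactions start from the same snapshot of two accounts, and each withdraws from a different account after checking that the combined balance stays non-negative. Because their write sets don’t overlap, both commit and the invariant breaks, an anomaly known as write skew. Marking the read of the other account with a hypothetical ensure method, in the spirit of Clojure’s ensure, adds that key to the commit-time conflict check, so the second transaction fails.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy, single-threaded model of snapshot isolation; not Clojure's or Akka's implementation.
final class WriteSkewDemo {
    // Each key has a value and a version that is bumped on every committed write.
    static final class Store {
        final Map<String, Integer> values = new HashMap<>();
        final Map<String, Integer> versions = new HashMap<>();

        void put(String key, int value) {
            values.put(key, value);
            versions.merge(key, 1, Integer::sum);
        }
    }

    static final class Txn {
        final Store store;
        final Map<String, Integer> snapshot;      // values as of the start of the transaction
        final Map<String, Integer> startVersions; // versions as of the start
        final Map<String, Integer> writes = new HashMap<>();
        final Set<String> ensured = new HashSet<>();

        Txn(Store store) {
            this.store = store;
            this.snapshot = new HashMap<>(store.values);
            this.startVersions = new HashMap<>(store.versions);
        }

        int read(String key) {
            return writes.getOrDefault(key, snapshot.get(key));
        }

        // Hypothetical analogue of Clojure's ensure: the read joins the conflict check.
        int ensure(String key) {
            ensured.add(key);
            return read(key);
        }

        void write(String key, int value) {
            writes.put(key, value);
        }

        // Fails if any key we wrote or ensured was changed by a commit since we started.
        boolean commit() {
            Set<String> checked = new HashSet<>(writes.keySet());
            checked.addAll(ensured);
            for (String key : checked) {
                if (!store.versions.get(key).equals(startVersions.get(key))) {
                    return false;
                }
            }
            writes.forEach((key, value) -> store.put(key, value));
            return true;
        }
    }

    // Withdraw only if checking + savings stays non-negative.
    static void withdraw(Txn t, String from, String other, int amount, boolean useEnsure) {
        int mine = t.read(from);
        int theirs = useEnsure ? t.ensure(other) : t.read(other);
        if (mine + theirs - amount >= 0) {
            t.write(from, mine - amount);
        }
    }

    // Returns checking + savings after two overlapping withdrawals.
    static int run(boolean useEnsure) {
        Store store = new Store();
        store.put("checking", 100);
        store.put("savings", 100);
        Txn t1 = new Txn(store);
        Txn t2 = new Txn(store); // same snapshot as t1
        withdraw(t1, "checking", "savings", 150, useEnsure);
        withdraw(t2, "savings", "checking", 150, useEnsure);
        t1.commit();
        t2.commit(); // without ensure this succeeds: its write set doesn't overlap t1's
        return store.values.get("checking") + store.values.get("savings");
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // -100: both withdrawals commit
        System.out.println(run(true));  // 50: t2 fails because checking changed
    }
}
```

In a real STM the failed transaction would be retried rather than simply dropped; on the retry it would see the reduced balance and refuse the withdrawal.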
Part three looks at Actors. These are one technique for isolating mutability: a piece of state can only be changed from a single thread, which also serializes access to that state by requiring that requests to access it are pushed into a mailbox processed by the controlling thread. The book looks at untyped Actors, in which messages are handled in an untyped receive method that the Actor implements, and typed Actors, where the messages are strongly typed to correspond to methods defined on a particular interface. In a true Actor model the messages need to be immutable, and in both Java and Scala it is the user’s responsibility to ensure that this is the case.
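The mailbox idea is simple enough to hand-roll. The sketch below is a hypothetical counter actor in plain Java, not Akka’s API. Senders only ever enqueue immutable messages, and a single worker thread takes them off a LinkedBlockingQueue one at a time, so the count field needs no locks even though four threads send to it concurrently.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical hand-rolled actor; all mutable state is confined to one thread.
final class CounterActor {
    interface Message {}

    // Immutable message: how much to add.
    static final class Increment implements Message {
        final int by;
        Increment(int by) { this.by = by; }
    }

    // Request for the current value; the future is a one-shot, thread-safe reply channel.
    static final class Get implements Message {
        final CompletableFuture<Integer> reply = new CompletableFuture<>();
    }

    private final BlockingQueue<Message> mailbox = new LinkedBlockingQueue<>();
    private final Thread worker = new Thread(this::loop);
    private int count; // isolated state: only `worker` reads or writes it

    static CounterActor spawn() {
        CounterActor actor = new CounterActor();
        actor.worker.setDaemon(true); // don't keep the JVM alive
        actor.worker.start();
        return actor;
    }

    // One-way send: enqueue the message and return immediately.
    void tell(Message m) {
        mailbox.add(m);
    }

    // One thread drains the mailbox, one message at a time, so `count` needs no lock.
    private void loop() {
        try {
            while (true) {
                Message m = mailbox.take();
                if (m instanceof Increment) {
                    count += ((Increment) m).by;
                } else if (m instanceof Get) {
                    ((Get) m).reply.complete(count);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // asked to stop
        }
    }

    static int demo() {
        CounterActor counter = spawn();
        Thread[] senders = new Thread[4];
        for (int i = 0; i < senders.length; i++) {
            senders[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    counter.tell(new Increment(1));
                }
            });
            senders[i].start();
        }
        for (Thread t : senders) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        // Every Increment is already queued ahead of this Get, so the reply sees all of them.
        Get get = new Get();
        counter.tell(get);
        return get.reply.join();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 4000
    }
}
```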
Actors help us to isolate the places that change the state, giving us thread safety without the need to introduce synchronisation. As the author points out, you can still get deadlocks, particularly in the Akka model, which offers send-and-wait-for-reply messages as well as the traditional fire-and-forget send. Even so, Actors at least provide a clear boundary across which controlled state is accessed.
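The send-and-wait deadlock is easy to reproduce. In this sketch each "actor" is modelled as a single-threaded executor (so it handles one message at a time), and a blocking request is modelled as submit(...).get() with a timeout; this is an analogy of my own, not Akka code. A, while handling a message, asks B and waits; B, while handling that request, asks A and waits, but A’s only thread is already busy waiting for B.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Each "actor" is a single-threaded executor; a blocking ask is submit(...).get().
final class AskDeadlock {
    static boolean deadlocks() {
        ExecutorService a = Executors.newSingleThreadExecutor();
        ExecutorService b = Executors.newSingleThreadExecutor();
        try {
            // A handles a message by asking B and waiting for the reply;
            // B handles that request by asking A, whose only thread is busy.
            Future<Integer> outer = a.submit(() ->
                    b.submit(() ->
                            a.submit(() -> 42).get() // queued behind A's current message
                    ).get());
            outer.get(1, TimeUnit.SECONDS);
            return false;
        } catch (TimeoutException e) {
            return true; // A waits for B, B waits for A: nobody can make progress
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            // Interrupt the blocked threads so the JVM can exit.
            a.shutdownNow();
            b.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(deadlocks() ? "deadlocked" : "completed"); // deadlocked
    }
}
```

The usual way out is to avoid blocking inside a message handler: A sends a one-way message and treats B’s reply as just another message in its mailbox.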
In order to get actors to cooperate, Akka has a notion of transactors – objects responsible for getting a set of actors to do a series of actions inside a transaction.
Part four summarises what we’ve learned: the less mutable state there is, the easier it is to manage the concurrency of an application. I think the book is a really good overview of concurrency and of how there may be better ways to handle it than locks and atomic instructions. As a side effect, it made me keen to look more closely at Scala and the Akka framework.