I’ve been following the Mechanical Sympathy blog for some time. It hits the sweet spot of one of my interests in computer science – the use of virtual machines and their interaction with the hardware when running with multiple concurrent operations over shared resources. Locks are the traditional way to control concurrency, serializing the access to resources to make it easy to calculate the effects of multiple completing readers and writers, but their use may require entry to the kernel of the operating system with potential context switches and dirtying of the cache. These disadvantages can lead to a large loss in performance when compared against lower level but harder to use implementation mechanisms such as memory barriers and CAS (check and set) instructions that are build into most processor architectures.
The blog’s author has spent a lot of time trying to get producers and consumers interacting at very high message rates as part of the implementation of a trading system. Disruptor, his component programming framework, is the result of this work. There’s a very good technical paper here together with a whole set of links to other blog posts and articles including this one on the badness of locks and a quick overview of memory barriers.
The real trick is to use a circular buffer for the buffering – this avoids the jitter of allocation of more dynamic schemes. Producers can use a CAS operation to reserve a slot within the buffer which they can fill at their leisure (within certain bounds), and they can then publish this change using another CAS. The consumers can check for the availability of new data using only a memory barrier rather than requiring the use of a potentially expensive lock.
Over the course of the work, the author and his collaborators have found various inefficiencies in the existing JVM implementations, and have managed to get fixes into particular high performance systems such as Azul’s Zing architecture.
The blog is good because the author goes out of his way to write micro-benchmarks that demonstrate the problems. For example, he has examples that show how adding padding can eliminate false sharing in the cache lines of one benchmark. The implementation is full of clever ideas and tricks and is well worth a read.