In my quest to understand memory barriers better, I read my way through Appendix C of Is Parallel Programming Hard, And, If So, What Can You Do About It?, which covers memory barriers on modern CPU architectures. That appendix alone is a very informative read, and it makes things clearer than a good many other articles I’ve read in the past.
After reading that, I started on the rest of the book, which is a great overview of writing scalable parallel algorithms – using techniques like code locking, data locking, and breaking data into independent chunks with later synchronisation. Moreover, it covers an alternative to reader-writer locking called RCU (read-copy update), which the author helped make a fundamental construct in the scalable Linux 2.6 kernel. Paul McKenney’s PhD thesis covers the implementation in detail, as well as contrasting it with alternatives.
RCU is a technique that allows readers access to a data structure while simultaneously allowing writers to mutate copies of it and then publish the updated copy to make the change visible. The key is then to avoid freeing any memory until the system can guarantee that no readers are still using the old version of the object. RCU manages this by scoping read access to a structure between quiescent states of the CPUs, having readers delimit the sections in which they use the structure, and allowing actions to be queued against the old structure so that they execute once a suitable quiescent state has been reached. It is a very clever idea, which scales very well, and which is used many times inside the Linux kernel.
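The read–copy–publish–defer-free cycle can be sketched in userspace C11. This is only a toy illustration, not the kernel’s implementation: the names (struct config, read_lock, update_threshold) are mine, and the grace period is faked with a shared reader counter, where real RCU instead infers grace periods from per-CPU quiescent states precisely to avoid any such shared counter on the read side.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared structure protected by the RCU-style pattern. */
struct config {
    int threshold;
    int limit;
};

/* Readers dereference this pointer; writers swap in a fresh copy. */
static _Atomic(struct config *) current;

/* Crude grace-period tracking: a count of in-flight readers.
 * Real RCU has no such shared counter; it watches for CPUs
 * passing through quiescent states instead. */
static atomic_int active_readers;

/* Begin a read-side critical section and take a snapshot. */
static struct config *read_lock(void) {
    atomic_fetch_add(&active_readers, 1);
    return atomic_load(&current);
}

/* End the read-side critical section. */
static void read_unlock(void) {
    atomic_fetch_sub(&active_readers, 1);
}

/* Writer: read the old version, copy it, update the copy,
 * publish it, then free the old version after a grace period. */
static void update_threshold(int new_threshold) {
    struct config *old = atomic_load(&current);
    struct config *copy = malloc(sizeof *copy);
    memcpy(copy, old, sizeof *copy);   /* read ... copy ... */
    copy->threshold = new_threshold;   /* ... update the copy ... */
    atomic_store(&current, copy);      /* ... and publish it */

    /* Simplified stand-in for synchronize_rcu(): spin until no
     * readers remain. This over-waits (it also waits for readers
     * of the new copy), which is safe but slower than real RCU. */
    while (atomic_load(&active_readers) != 0)
        ;
    free(old);  /* no reader can still hold the old version */
}
```

A reader calls read_lock(), uses the snapshot it returns, and then calls read_unlock(); concurrent calls to update_threshold() never block readers, and a reader holding the old snapshot keeps it valid until it drops out of its critical section.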