You’re doing it upside down

Programming Concurrency on the JVM by Venkat Subramamiam

I’ve always been fascinated by the connection between the hardware and the software, which manifests itself in a programming language when we start having to think about memory barriers and atomic check-and-set instructions which are often hidden behind the abstraction of locks, semaphores and monitors. It is an interesting area but I’ve always found this level of programming to be really hard and error prone.

I remember the horror of a colleague a few years back when he showed me some code that had one thread that made a call and then set a static field and another thread that waited for the thread to be set. Despite the first thread finishing, the second thread didn’t notice the change. I pointed out that he needed to use the volatile keyword to stop the compiler allowing the second thread to cache the value that was read from the field. That such a simple program requires the field to be marked specially is a shock to many people. Locks have the same feel to them, and there’s a good paper by Edward Lee which argues that the non-determinism that the thread abstraction offers is completely non-intuitive, leaving us in a position where we are forced to hack away the non-determinism by using locks to get us to a position where we can start reasoning about the program again. Somehow this feels wrong – the usual mantra of development is “get it working, get it right, get it fast”, and we would often prefer to be in a position where we can reason about the program being right and then add extra code to get it fast rather than having to obscure the algorithm using tons of locking to get it right in the first place.

This book looks at concurrency from three perspectives: shared mutability (mutable fields and locks), isolated mutability (using actors) and pure immutability (using STM). It looks at implementations of these three techniques running on the JVM, using the languages Scala mainly for the Akka framework which offers Actors and STM, Clojure which offers STM and a version of Actors called Agents, and Java for the thread pool and Java 7 fork-join APIs.

Part one is a great introduction to concurrency. It first looks at a couple of the perils of concurrency and continues by looking at how we can introduce concurrency to speed up an I/O bound and a computationally intensive application. In these two cases, we need to break the work into independent tasks which then need to synchronise their results with each other. This breakdown introduces the ideas of state and scalability, allowing a discussion around ensuring visibility via the use of memory barriers and the preservation of invariants via atomicity using locks and synchronised.

Part two deals with software transactional memory, first from the perspective of Clojure, a language in which all of the datatypes are immutably persistent and then from the point of view of Scala and the Akka library. In this second case, the programmer has to be a lot more careful to make sure that the objects managed by the transaction are immutable to avoid side-effects leaking through the transaction. STM looks like a great solution for the cases where there are many reads and fairly infrequent writes – too many writes and we may spent a lot of time retrying transactions, only to have them aborted again after they have burned lots of CPU cycles. STM is clever in that it leads to code that can be composed, without the programmer having to introduce any extra synchronisation.

Every time I see STM, I forget that we’re not really talking about making it appear to the participants that they live in a world where all results can be serialized. By default the two implementations covered in the text implement snapshot isolation, though both offer a means to work around problems that this might introduce by allowing reads to be included in the transaction log in such a way that the transaction will fail to commit if a value it read is modified by a previously committed transaction – Clojure has the ensure function to do logging reads and Akka allows the transaction to be configured so that all reads are logged. You can see the Clojure implementation here. It’s not clear to me how many programs will be broken if programmers forget to upgrade some reads into effective writes.

Part three looks at Actors. These are one technique for isolating mutability – changes to some state can only be made from a single thread which also serializes access to the state by demanding the requests to access the state are pushed into a mailbox which is processed by the controlling thread. The book looks at untyped Actors, in which the processing happens in an untyped receive method that the Actor implements, and typed Actors where the messages are strongly typed to correspond to methods defined on a particular interface. In a true Actor model, the messages need to be immutable, and in both Java and Scala it is the responsibility of the user to ensure that this is the case.

Actors help us to isolate the places that change the state, giving us thread safety without the need to introduce synchronisation. As the author points out, you can still get deadlocks happening, particularly in the Akka model which offers not only the traditional send-and-forget message send but also the send-and-wait-for-reply message, but at least Actors provide a nice boundary across which you access controlled state.

In order to get actors to cooperate, Akka has a notion of transactors – objects responsible for getting a set of actors to do a series of actions inside a transaction.

Part four summaries what we’ve learned, that the less mutable state, the easier it is to manage the concurrency of an application. I think the book is a really good overview of concurrency and how there may be better ways for handling it than via the use of locks and atomic instructions. As a side effect this book made me keen to look more closely as Scala and the Akka framework.

Posted in Books | Leave a comment

That’s magic, that is

Magical Mathematics: The Mathematical Ideas That Animate Great Magic Tricks by Persi Diaconis and Ron Graham

A great read which takes some principles from combinatorics and shows how they can be used to produce some fantastic magic tricks. This includes the Gilbreath Principle which can be used to maintain some order in a deck when it is shuffled.

The book contains lots of interesting tricks, mathematical ideas, some juggling and lots of interesting stories behind some of the magicians of the day. I hadn’t realised before that there are people who can do a perfect shuffle on demand – take a pack of cards and spilt it in half, and then riffle shuffle the two parts back together so that the final order has alternate cards from the left and right hands.

Posted in Books | Leave a comment

The sooner the better

It’s nice to see that Microsoft have made some additions to .NET 4.5 to try to use information recorded during one run of the application to improve the performance in subsequent runs. Multicore JIT is going to allow the JIT to be compiling methods on a background thread before the main thread of execution reaches them, hopefully improving the start up time. All you need to do is provide a directory where the CLR can store the information it accumulates, and name the profile when you tell the system to start profiling.

profile

If the named file already exists, because of a previous recorded run, then the existing information is used to start background jitting of the recorded methods, and details from the latest run are used to overwrite the file.

There’s also a way to run your application through scenarios while collecting profile data which can be used by the NGEN compiler to optimise the final assembly – profile guided optimisation. This isn’t as dynamic as the above approach, but the NGEN compiler is able to spend a lot more time optimising than the JIT compiler and hence use more traditional compiler optimisations.

It’s going to interesting to try these features out on our .NET applications to see what performance improvements we can get.

Oh, and on a performance related theme, there’s a good talk on the benefits of lock free programming here on InfoQ, which has a pointer to the following paper on memory ordering on Intel processors and details of the cache hit/miss information and cycles per instruction data that Intel processors make available by (unfortunately) Ring 0 instructions.

Posted in Computers and Internet | Leave a comment

Surely performant isn’t a word

High Performance JavaScript by Nicholas C Zakas

My loan of this from the library was almost finished before I’d had time to finish it – I had the choice of reading it in one day or not reading it at all. Thankfully I chose to read it in a couple of sittings, and I’m really glad I did.

The book consists of ten chapters on different aspects of writing high performance JavaScript enabled web sites and covers a range of issues around this topic. I’d heard bits and pieces of the advice from various sources such as blogs in the past, but it was nice to find the information all in one place.

Chapter one offers advice for getting the page to render quickly, predominantly by moving the script elements to the end of the page near the closing of the body element. Other techniques are also covered: deferred scripts in IE, and delayed loading of script using XHR or other methods and then loading the new code using a <script> element which is added to the DOM.

Chapter two drills into the implementation of JavaScript, in particular scope chains for the activation records and prototype inheritance for the objects, and uses this to explain why some coding patterns are slower than others. The book is really good in measuring the performance improvements across a number of browsers and displaying the speed different using graphs to give the reader a clear feel for the impact of the suggestions.

Chapter three covers the implementation of the DOM, emphasising that the DOM and the JavaScript engine often live in different dlls with a fairly slow communication path between thees two parts. The chapter looks at the actions that cause repaints and reflows by the browser, and DOM queries that are blocked while this happens. This leads to a discussion of techniques for batching activities to avoid the engine having to wait for the DOM layout.

Chapters 4 and 5 look at slow language constructs including loop performance, if-then versus switch and lookup tables. There’s a discussion of stack overflow and the stack sizes in the various browsers, and some techniques to avoid it happening. Writing efficient regular expressions is covered in some detail.

Chapter 6 looks at how the single-threadedness of a browser affects the perceived performance. The UI rendering and the execution share a single thread in most browser implementations. If the JavaScript blocks for more than 100 milliseconds, then this is going to cause the user to feel that the page is unresponsive. The chapter looks at how we can use setTimeout to schedule a long activity as a set of tasks that can be executed part by part whilst allowing the browser to catch up with rendering between the tasks. The behaviours of the different browsers is covered – information that I’ve not seen before. The chapter also contains details of the originally HTML5 web workers proposal which allows long running activities to happen on their own thread with a mechanism to communicate results between the main thread and the worker thread.

Chapter 7 looks at Ajax and the different in performance when using the data formats XML, JSON, JSON-P, HTML compared to custom data transmission formats. Response caching using HTTP headers is also covered.

Chapter 8 covers practices to get good performance including lazy loading and using native methods for good performance.

Chapters 9 and 10 round off the book by talking about the tooling for building and deploying the sites, and for analysing the performance of both the JavaScript (using a profiler) and the page download (by looking at the order of resource download and the techniques for compressing and merging the code files).

The book is fairly short but covers the material with a good amount of detail, making it both a quick and an informative read.

Posted in Books | Leave a comment

Entomology for beginners

A Bug Hunter’s Diary by Tobias Klein

This is a book that takes 7 examples of the author inspecting an application, finding an exploit and then exploiting it to take control of the machine. The examples cover a number of different machine architectures and operating systems. The appendices cover various technical items such as how to set up a windows kernel for debugging, notes on type conversions in C and an explanation of the ELF format global offset table.

Each exploit is covered in its own chapter. The author typically traces input from the user, and uses C reverse generated from x86 assembler to show how the target application doesn’t correctly use this input data. By crafting a suitable set of data, the author shows how the instruction pointer (EIP) can be manipulated, and then explains how this could be used to take control of the machine. On Windows, one of the examples is the exploitation of a virus checker which installs a device driver, accessible by everyone, which has an IOCTL call which copies data from a source location in user space to a target location that can be in either user or kernel space. By using a suitable crafted input request packet, this piece of code can be coerced into writing over its own code so that later execution of the virus checker will be running in kernel mode but will jump to an address that the writer has set up. Hence the user has elevated their privilege and has effectively got control of the machine.

The author makes the exploits look really easy- he explains that there are numerous fuzzing tools in existence that automatically alter the input to some piece of code as a means of finding out if the code can be coerced into altering something it shouldn’t.

Moreover, every chapter has a fairly extensive set of references to interesting papers and web pages that explain some of the tricks in more detail. For example, there are references to a paper explaining a Windows or Linux hack to enable a command shell and a very interesting article on how to construct a Rootkit and how this uses DKOM to hide the fact that it is on the machine.

The appendices cover some of the more modern defences against the techniques that are outlined – address space randomisation, security cookies (to prevent stack buffer overflows, and using the CPU no-execute bit. A very good read!

Posted in Books | Leave a comment

Let’s get it going

I recently gave a lightning talk at work on the subject of Erlang. In five minutes of slides and code examples I tried to give a feel for several interesting features that the language and its runtime offers – lightweight processes using an Actor like model for communication, linking processes together to achieve fault tolerance and dynamic code updating.

Posted in Computers and Internet | Leave a comment

There’s nothing to see here

The release of Windows 8 is getting closer and closer, and there’s a lot of new material to understand around this new platform. Here are a few articles that have recently caught my eye.

An msdn article on App contracts and extensions – Windows 8 Metro style apps are fairly isolated from each other, and to get a unified experience the runtime manages the communication between applications via contracts.

Metro is a very grid based layout, and so HTML5 style applications are going to need to use CSS3 grid layouts to get their interfaces to fit in to the Metro style. This article is a good introduction to the new layout.

So just how secure is the code of an application which is written for the WinRT platform. This article covers reverse engineering of such applications.

There seems to be an association in the WinRT world between DirectX and native applications, but of course you can use libraries like SharpDx to do DirectX from managed code. This article goes in to some of the details.

Posted in Computers and Internet | Leave a comment