I spent some more time at the weekend reading up on some of the multi-core work that Microsoft are doing.
First I watched a video about the Orleans research project and read the associated paper. Orleans is basically what, in the old days, would have been called an application server along the lines of COM+/EJB/CORBA, but with some novel ideas. What we used to call components are now known as grains. Grains are single-threaded in their execution: they take part in one orchestrated activity at a time, and can only be reused once that activity has finished. This is one interesting angle in the recent search for scalability. Programming gets easier when threading happens at the application server level rather than at the level of user objects. Grains can be persisted, and they can call other grains via the object references they are passed. The infrastructure can create multiple instances of a particular grain, but only a single instance can take part in a given activity, which occasionally means that a transaction has to be rolled back if two instances of the same grain are inadvertently pulled into the same activity.
Transactions can be aborted and restarted, and the infrastructure is responsible for controlling the amount of concurrency in order to meet service level requirements.
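To make the "single-threaded grain" idea concrete, here is a minimal sketch in Python rather than .NET. It is not the Orleans API: `CounterGrain`, its mailbox and `run_demo` are hypothetical names of my own, and the sketch leaves out persistence, multiple instances and transactions. It only shows how state that is touched by a single worker needs no locks, even when many threads call in.

```python
import queue
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class CounterGrain:
    """Hypothetical grain: one worker thread drains a mailbox, one request at a time."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # only ever touched by the worker thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._mailbox.get()
            if item is None:  # shutdown signal
                break
            amount, reply = item
            self._count += amount  # no lock needed: single consumer
            reply.set_result(self._count)

    def add(self, amount):
        """Queue a request and hand the caller a future for its result."""
        reply = Future()  # created by hand here purely for illustration
        self._mailbox.put((amount, reply))
        return reply

    def stop(self):
        self._mailbox.put(None)
        self._thread.join()


def run_demo(callers=8, calls_per_caller=250):
    grain = CounterGrain()

    def caller(_):
        return [grain.add(1) for _ in range(calls_per_caller)]

    # Many threads call the grain concurrently; the grain still runs serially.
    with ThreadPoolExecutor(max_workers=callers) as pool:
        batches = list(pool.map(caller, range(callers)))

    final = max(f.result() for batch in batches for f in batch)
    grain.stop()
    return final
```

Calling `run_demo()` with the defaults makes 2000 increments from eight threads, and the largest count handed back is 2000 because no update is lost.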
This is all implemented in managed code, and the use of this actor model really simplifies the programming. At the user code level, the result of a call to a grain appears as a future. These futures are built on top of the TPL and can be chained like TPL Tasks, using ContinueWith and the various Wait operations.
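The chaining of futures can be illustrated with a rough Python analogue. Python's `concurrent.futures` has no ContinueWith, so the `then` helper below is a hypothetical stand-in that I have built on `add_done_callback`; it is a sketch of the idea, not how Orleans or the TPL implement it.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def then(future, fn):
    """Hypothetical ContinueWith-style helper: returns a future for fn(result)."""
    chained = Future()

    def _on_done(done):
        try:
            chained.set_result(fn(done.result()))
        except Exception as exc:  # propagate failures down the chain
            chained.set_exception(exc)

    future.add_done_callback(_on_done)
    return chained


def run_chain():
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(lambda: 20)           # stands in for a call to a grain
        second = then(first, lambda x: x + 1)     # continuation on the first result
        third = then(second, lambda x: x * 2)     # continuation on the second
        return third.result()                     # blocking wait, like a Wait call
```

Here `run_chain()` returns 42: the first future yields 20, and the two continuations add 1 and then double.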
The Task Parallel Library is a great .NET library for handling concurrent tasks, and there is now a dataflow library written on top of it. There’s a great overview document on TPL Dataflow. The library suits applications that can be structured as a dataflow pipeline: you build pipelines of blocks that take values, process them, and push the results out to other consumers. A consumer can take input values from several different sources, and a two-phase commit protocol makes this possible. There are two .NET interfaces, one for sources and one for targets, which the library user can implement, and there are a number of pre-supplied components which look really useful. There are some similarities between this and the Rx library, though the focus of the two libraries is a little different: Rx is aimed at enabling LINQ manipulation of event streams rather than pipelined data processing.
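As a rough picture of the pipeline shape, here is a toy version in Python. The class and method names loosely echo the block vocabulary of TPL Dataflow, but these Python classes are hypothetical and much simpler than the real thing: they push values synchronously, with no buffering, no concurrency and no two-phase commit for multi-input consumers.

```python
class TransformBlock:
    """Hypothetical block: both a target (post) and a source (link_to)."""

    def __init__(self, fn):
        self._fn = fn
        self._target = None

    def link_to(self, target):
        self._target = target
        return target  # returned so that links can be chained

    def post(self, value):
        result = self._fn(value)
        if self._target is not None:
            self._target.post(result)


class ActionBlock:
    """Hypothetical terminal block: a target only, it consumes values."""

    def __init__(self, action):
        self._action = action

    def post(self, value):
        self._action(value)


def run_pipeline(values):
    collected = []
    parse = TransformBlock(int)               # stage 1: text to number
    square = TransformBlock(lambda n: n * n)  # stage 2: square it
    sink = ActionBlock(collected.append)      # stage 3: collect the output
    parse.link_to(square).link_to(sink)
    for value in values:
        parse.post(value)
    return collected
```

Posting `["1", "2", "3"]` through `run_pipeline` yields `[1, 4, 9]`. The point of the sketch is only the source/target split: anything with `post` can be a target, and anything with `link_to` can be a source.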
The first release of TPL Dataflow was part of the recent Async CTP. There are some great PowerPoint slides on the design of the async functionality which I hadn’t noticed before.
Talking of scalability, there’s also an interview with Gil Tene concerning Azul’s highly scalable Java technology, including their pauseless garbage collector.