You can have your own CLR too

I was surprised when the source to the CoreCLR was published on GitHub a few weeks ago. Over the weekend I thought I’d start to build it and begin looking through the code in detail. I was amazed at how easy it is to build.

First you check out the sources from GitHub. Next you download the latest version of CMake and put it on your path. Then you run the build.cmd script. Your PC gets busy for 30 minutes or so, and then you have your own version of the CLR and the associated mscorlib.dll. [There is one gotcha concerning the DIA SDK that the script will warn you about – the fix is to copy it from an older Visual Studio installation into the newest one.]
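In outline (assuming the repository URL and layout at the time of writing):

git clone https://github.com/dotnet/coreclr.git
cd coreclr
build.cmd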

You can then use your existing C# compiler to compile against this version of the CLR.

c:\Windows\Microsoft.NET\Framework\v4.0.30319\csc.exe /nostdlib /r:D:\git\coreclr\binaries\Product\x64\debug\mscorlib.dll test.cs

and then run it using CoreRun

D:\git\coreclr\binaries\Product\x64\debug\CoreRun.exe test.exe
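Here test.cs can be as minimal as a hello world (my example, assuming System.Console is present in the freshly built mscorlib):

using System;

class Program
{
    static void Main()
    {
        Console.WriteLine("Hello from my own CLR");
    }
}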

Brilliantly painless.


JavaScript Ninja Secrets

Secrets of the JavaScript Ninja by John Resig and Bear Bibeault

This is a very good book which is so much more than a simple JavaScript language text. Sure, it covers some of the subtle points of the JavaScript language, and you are expected to have knowledge of JavaScript before reading the book, but it also offers a lot of information about some of the practical problems you will meet as a JavaScript programmer.

It starts in a slightly odd manner. There is a chapter discussing the benefits of cross-browser development, followed by a chapter in which the author writes a simple testing framework. This framework allows you to embed tests inside an HTML page, and displays the results of the assertions inside the tests in an easy to read manner. Many of the explanations that follow in the book are expressed as assertions in the language of this testing framework, and this approach works very well.

The next section of the book, Apprentice Training, consists of six chapters, five of which focus on the subtle parts of the JavaScript language. The book concentrates on the functional nature of JavaScript, discusses why it is important, and then gives a very good explanation of scope and of the four ways you can invoke a function object – as a function, as a method, as a constructor and via apply and call. There is a good discussion of recursion, followed by an example of memoization that shows the big benefits of pure functional code. This is followed by a chapter on closures, which describes what they are and then shows how they can be used to implement partial application and temporary scopes with private local variables. We then go object-oriented with a discussion of prototypes and how they can be used to get features like those of standard OO languages. The author builds a JavaScript mini-framework for defining inheritance hierarchies that support a super operation for accessing base types. There is then a chapter on regular expressions, showing their great power (though you’ll need to use them responsibly).

The last chapter of this section talks about the single-threaded nature of browsers, the typical JavaScript host, and discusses how timers can be set and cleared. There is also a lot of cross-browser detail on how these things are implemented, and the book covers some of the common gotchas.

The next section, Ninja Training, starts with runtime code evaluation, first using eval and the Function constructor to make new function objects, then moving on to the built-in decompilation (you can ask a Function for its source code). There is then an interesting section showing some of the uses of these techniques.

The next chapter looks at with statements, a feature of JavaScript that you either love or hate. There is a small, concise example of a powerful micro-templating engine written in a tiny amount of code, which demonstrates the power of the features covered earlier.

There are then two chapters on cross-browser strategies for using the DOM. There is lots of practical discussion of the differences between browsers, which taught me a lot about the DOM and how it should be used.

The last section, Master Training, continues with browser differences, this time in the event model and in CSS selectors.

The book is a really good read. It explains the parts of JavaScript that it covers really well, and you’ll learn loads about browsers and the DOM, which is very interesting and useful if you do cross-browser work. The browsers it discusses are a little behind the times, but that is probably the only complaint I have.


Understanding the CLR via its assembly code

Expert .NET 2.0 IL Assembler by Serge Lidin

I have read large parts of this book multiple times, but have never sat down and read it cover to cover before. It’s a great book for understanding the CLR. The book isn’t just about IL assembly language as a programming language: it also covers the compiled form of the IL, discussing the format and sections of a PE file, including the metadata tables for the various kinds of item (think assembly, module or class) and the heaps. It also gives a great explanation of the various IL instructions and discusses the semantics of generics and exceptions. All in all, I (re)learn something every time I reread it.

For example, when you want to call methods on a value type via an interface, the value type typically has to be boxed. If the struct is mutable this is usually not the semantics you want (the methods run against the boxed copy), and the boxing has a cost in any case. When a generic class is instantiated with a value type argument, you would like interface methods to be invoked on the value directly without any boxing, so version 2 of the CLR got a new IL prefix instruction, constrained., to deal with this.

// IFoo is implied by the constraint and by the IL below (ConsoleApplication10.IFoo::Increment)
interface IFoo
{
    void Increment();
}

class Holder<T> where T : IFoo
{
    T field;

    public Holder(T x)
    {
        field = x;
    }

    public void DoInc()
    {
        // Called through the IFoo constraint; no boxing when T is a value type
        field.Increment();
    }
}

leads to the following code for the DoInc method

IL_0000: nop
IL_0001: ldarg.0                  // load 'this'
IL_0002: ldflda !0 class ConsoleApplication10.Holder`1<!T>::'field'  // push the address of the field rather than a copy
IL_0007: constrained. !T          // constrain the following callvirt: when !T is a value type that implements the method, call it directly with no boxing
IL_000d: callvirt instance void ConsoleApplication10.IFoo::Increment()
IL_0012: nop
IL_0013: ret

The book also points out some of the places where C# isn’t quite the assembly language of .NET… if you implement an interface member with a non-virtual method, the C# compiler silently emits it as sealed virtual, and if you nest a non-generic class inside a generic class, the compiler silently makes the nested class generic so that it has access to the enclosing type parameter.
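A minimal sketch of the first case (the type names are mine, not from the book): viewed through ILDASM, Bar.Close below comes out as a sealed virtual method even though the C# source never says virtual.

interface IClosable
{
    void Close();
}

class Bar : IClosable
{
    // Looks non-virtual in C#, but the compiler emits it as
    // '.method public final hidebysig newslot virtual' so that it can fill the interface slot.
    public void Close()
    {
    }
}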

The book teaches you loads of miscellaneous facts about the CLR – debug information and the various security attributes, for example, and how mixed-mode (managed plus native) assemblies work. The emphasis on using ILDASM/ILASM to round-trip code, modifying it along the way, is also very informative.

A thoroughly great book!


Category theory seems to pop up everywhere

Basic Category Theory For Computer Scientists by Benjamin C. Pierce

I have looked at this book before, but decided to give it another read over the holiday. The content is rather dry, but for each of the concepts the author provides a mass of examples that make it easier to understand.

The book is fairly short, with only four chapters, and the fourth chapter is just descriptions of additional material, with a paragraph describing each item’s relevance. However, it does what it says on the cover – it provides a way to understand terms that seem to crop up all over the place in Computer Science these days. Chapter three, on applications, covers some domain theory and Cartesian Closed Categories in a few pages.

I’m not sure I’d recommend this as an end to end read, but more as something to dip into when terms need to be understood.


Writing high-performance .NET code can be hard

Writing High-Performance .NET Code by Ben Watson

A really good book from someone who has clearly worked hard in the past to maximize the performance of .NET applications. The book has a strong emphasis on using Event Tracing for Windows (ETW) to get information about the behaviour of various components of the CLR (such as jitting or garbage collection), and it also makes a lot of good observations on measurement, arguing that averages are often not the best measurement and that one should instead target a particular percentile (something that Gil Tene discusses in his talk on understanding latency).
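As a tiny illustration of the percentile point (my own sketch, not code from the book): if 95 requests take 10 ms and 5 take 2000 ms, the mean is about 110 ms, while the 99th percentile is 2000 ms – much closer to what those unlucky users actually experienced. A nearest-rank percentile is easy to compute:

using System;
using System.Linq;

static class LatencyStats
{
    // Nearest-rank percentile; p is in the range (0, 100].
    public static double Percentile(double[] samples, double p)
    {
        double[] sorted = samples.OrderBy(x => x).ToArray();
        int rank = (int)Math.Ceiling(p / 100.0 * sorted.Length);
        return sorted[Math.Max(rank - 1, 0)];
    }
}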

Chapter one covers performance measurement and tools. Here the author describes ETW (by way of example), perfmon and performance counters, Windbg and its associated SOS extension for CLR debugging, decompilation using ILSpy/Reflector, the Sysinternals tools such as VMMap, and also mentions profilers such as the one that comes with Visual Studio. The author tends to shy away from profilers because of their invasiveness; he is more concerned with measuring performance on running systems without having to restart the application to get the measurements. Certainly it would be hard work to profile a production system using an IL-rewriting .NET profiler, and ETW is positioned as a low-overhead logging framework that still offers great detail about the application being logged.

Chapter two covers garbage collection and is a very good overview. It has clear explanations of the various garbage collection modes – server and workstation, with and without background collection (of generation two) – and explains how the chosen mode maps to the number of threads the system will use. When to force full garbage collections (hardly ever), when to compact the large object heap, when to handle the notification that a full collection is pending, and how to track down fragmentation (using Windbg) are all covered, as well as how to monitor the performance of the memory system.
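As one concrete example of the knobs the chapter talks about (my sketch of the relevant APIs, not code from the book), from .NET 4.5.1 onwards the large object heap can be compacted as part of the next full, blocking collection:

using System;
using System.Runtime;

static class GcTuning
{
    public static void CompactLargeObjectHeapOnce()
    {
        // Request LOH compaction on the next blocking full collection.
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;

        // Forcing a full collection like this should be a rare, deliberate act.
        GC.Collect();
    }
}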

Chapter three covers the JIT. When to use NGEN and when to use the background JIT compilation added in .NET 4.5 are covered. The author also discusses the C# language constructs that generate masses of code – LINQ and dynamic in particular – and again covers how to monitor what the JIT is doing.
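The background JIT mode is switched on from code via System.Runtime.ProfileOptimization – roughly like this (the folder and profile name are made up for the example):

using System.Runtime;

class Program
{
    static void Main()
    {
        // Record a profile of which methods are jitted during start-up, and on later runs
        // use it to compile those methods on a background thread ahead of time.
        ProfileOptimization.SetProfileRoot(@"C:\MyApp\JitProfiles");   // hypothetical folder
        ProfileOptimization.StartProfile("Startup.profile");           // hypothetical profile name

        // ... the rest of start-up ...
    }
}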

Chapter four covers asynchronous programming and is a collection of observations the author has gathered while writing his own systems. There is an emphasis on using Tasks as the abstraction for chunks of work, given that they support cancellation and can be composed in interesting ways using the various combinators. PLINQ, timers and await are also covered in some of the observations.
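A minimal sketch of the Task-plus-cancellation pattern the chapter builds on (my own example, not the author’s code):

using System;
using System.Threading;
using System.Threading.Tasks;

class TaskSketch
{
    static void Main()
    {
        // Cancel the work automatically after two seconds.
        var cts = new CancellationTokenSource(TimeSpan.FromSeconds(2));

        // A chunk of work expressed as a Task so that it can be cancelled and composed.
        Task<long> summing = Task.Run(() =>
        {
            long total = 0;
            for (int i = 0; i < int.MaxValue; i++)
            {
                cts.Token.ThrowIfCancellationRequested();
                total += i;
            }
            return total;
        }, cts.Token);

        try
        {
            Console.WriteLine(summing.Result);
        }
        catch (AggregateException)
        {
            Console.WriteLine("The work was cancelled before it finished");
        }
    }
}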

Chapter five, on general coding and design, looks at various things: class versus struct and the massive savings in memory usage that structs can offer; sealing, and virtual dispatch contrasted with interface dispatch at a polymorphic call site; avoiding boxing and casting; and using exceptions for general control flow. There’s also an interesting note on dynamic code generation.

Chapter six looks at parts of the .NET Framework itself, and points out that you should understand the cost of every API call you make – particularly as the framework often provides many different ways to do the same thing (parsing XML, for example). The chapter looks at some miscellaneous items that are commonly misused.

The next two chapters look at performance counters in more detail and then at writing your own trace events into the ETW logs.
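From .NET 4.5 onwards, writing your own ETW events goes through System.Diagnostics.Tracing.EventSource – something like this sketch (the provider name and events are made-up examples):

using System.Diagnostics.Tracing;

[EventSource(Name = "MyCompany-MyApp")]   // hypothetical provider name
sealed class AppEventSource : EventSource
{
    public static readonly AppEventSource Log = new AppEventSource();

    [Event(1, Level = EventLevel.Informational)]
    public void RequestStarted(string url) { WriteEvent(1, url); }

    [Event(2, Level = EventLevel.Informational)]
    public void RequestCompleted(string url, long elapsedMilliseconds) { WriteEvent(2, url, elapsedMilliseconds); }
}

// Elsewhere: AppEventSource.Log.RequestStarted("/home");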

The final two chapters discuss how you can make your team more performance-focused and how to avoid the various performance traps.

I liked this book because of its practical focus. The explanations of the various technologies are really good, and the text is peppered with stories of where the observations actually helped in practice. ETW is pushed hard as the technology to use for logging, and it is really good to find a book that focuses on using it.


Some useful distributed algorithms

Distributed Algorithms: An Intuitive Approach by Wan Fokkink

I kept coming across descriptions of various algorithms from distributed computing during my reading, such as this one on Paxos, but felt that I’d like to read a book that gives an overview of the many clever algorithms that are out there. This book is excellent, covering lots of algorithms, often offering an intuitive explanation of why each one is correct, and with a good set of examples and exercises to get you familiar with them.

It is divided into two sections – message passing and shared memory. In the message-passing model the processes communicate via messages sent across a network whose topology varies from algorithm to algorithm (ring, directed or undirected graph) and whose links may deliver messages in FIFO or arbitrary order. The shared memory section looks at algorithms that depend on variants of the atomic test-and-set operation.

The first section breaks the algorithms into various groupings. Snapshots – how do we capture the state of the processes and the messages in transit at a given moment, so that we could potentially debug or restart the distributed system. Waves – algorithms for visiting every node in the network, where each node only knows about its neighbours. Deadlock detection – building a global wait-for graph for the network, which can be used to detect circular wait patterns (and hence deadlock). Termination detection – discovering that a launched activity has finished. Garbage collection in the presence of inter-node references, with an explanation of the close relationship between GC and some of the earlier algorithms. Routing – how the network can adapt to knowledge about the best paths.

The first section then continues with the more typical distributed system algorithms I have seen mentioned before. Election of a leader by a group of nodes, anonymous networks and how anonymity affects the election process. Synchronous networks, where every process takes a step before any process takes the next one. Handling crash and Byzantine failures – in the former the process just stops communicating, in the latter it carries on running but can behave in an arbitrary, illegal manner. The last chapter of the first part covers mutual exclusion, various types of locking in a distributed system and ways to ensure fairness.

At times one can get a little lost in the details of the algorithms, which are just listed one after another in the chapters, but I found the exercises a good way to step back and think about the differences between them.

Section two has chapters of algorithms for processes running in shared memory. Its first chapter covers the usual Peterson’s and bakery algorithms, then goes into a bit more detail about memory coherence and shows how test-and-test-and-set avoids bouncing cache lines between cores. The next chapter covers barriers, and the one after that self-stabilization. The last chapter covers online scheduling, considering the various scheduling policies for tasks with hard deadlines.
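The test-and-test-and-set idea is easy to sketch in C# (my illustration, not the book’s pseudocode): spin on an ordinary read while the lock looks held, and only attempt the atomic exchange when it appears free, so the cache line isn’t invalidated on every iteration of the spin.

using System.Threading;

// A simple test-and-test-and-set spin lock.
sealed class TtasSpinLock
{
    int taken;   // 0 = free, 1 = held

    public void Enter()
    {
        while (true)
        {
            // "Test": a read-only spin keeps the cache line in the shared state while the lock is held.
            while (Volatile.Read(ref taken) == 1) { }

            // "Test-and-set": only now pay for the atomic exchange.
            if (Interlocked.Exchange(ref taken, 1) == 0)
                return;
        }
    }

    public void Exit()
    {
        Volatile.Write(ref taken, 0);
    }
}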

The book is a great introduction to a vast range of clever algorithms for doing operations in a distributed setting. It is well written and you are guaranteed to learn something from it.


It’s the season to watch loads of videos

Loads of conferences seem to have recently published videos of their talks – Strange Loop, Clojure Conj, React 2014, ICFP, the ML Workshop – so here is a list of a few I found particularly interesting.

Inside Transducers, and a talk from the previous year, Transducers, both by Rich Hickey, discuss a new construct that has been added to the latest version of Clojure. Transducers offer a means of doing algorithmic transformations as part of a pipeline and integrate nicely with Clojure’s async infrastructure. Yet again Clojure seems to have taken ideas from other languages and made them into something slightly different but very practical.

I’ve been a fan of Functional Reactive Programming ever since I came across the Fran library in Haskell many years ago, but have never had the chance to use such ideas in a real commercial application. I found this talk by Paul Betts very interesting, in which he discusses the use of such ideas in implementing the GitHub client. This discussion of the different formulations of FRP is also very good. There is also a talk here on the React framework, which uses ideas from functional programming to control mutation of the DOM in browser applications.

Memory management has always been an interest of mine, and this talk on Shenandoah is a good introduction to some of the issues of multicore garbage collection and its trade-offs. There was also a talk on why deterministic memory management could be useful on the JVM.

I very much enjoyed these talks on JavaScript. The first discusses the benefits of implementing JavaScript in JavaScript (so the core library benefits from the same optimisations that are applied to user code), and the second discusses the implementation of Chakra, IE’s JavaScript engine.

This discussion of the merits and problems of type systems was also very pragmatic, and it led into some of the interesting talks at ICFP on dependent types and on the use of OCaml for cloud programming on top of the Mirage operating system. There were also plenty of interesting talks at the associated Haskell workshop, such as a talk on Core, GHC’s intermediate language. Lenses are another idea that seems to be making its way into loads of functional languages.
