Pro .NET Performance by Sasha Goldshtein with Dima Zurbalev and Ido Flatow
This is a really excellent book if you want to understand what is happening at the lower levels of the CLR and the base class library. It contains lots of information that I have only previously seen spread across lots of disparate blog posts and organises it into chapters that focus on individual performance concerns. Better still, it contains lots of information that I haven’t seen before – some of it comes from the author’s experience of the BCL and some comes from experiments that he has carried out on the various versions of the CLR. The author isn’t afraid to get down into the actual assembly code when discussing some of the issues (such as the implementation of virtual dispatch), and has chosen x86 assembly code as the means of doing this. (This is fine by me, as x86 is a language I’ve programmed in for a long time, but it might have been nice to discuss the x64 architecture at some point.)
Chapters 1 and 2 of the book set the scene with a discussion of performance metrics (what we are actually trying to measure) and a look at some of the tools that can be used to do this measurement. These chapters contain a great introduction to ETW and the xperf tool that can be used to capture the logging information, following this with a look at the Visual Studio performance and memory allocation profilers, the concurrency profiler and the profiling tools available from Red Gate. These chapters stress the need for accurate measurement, and spend a little time dissecting a micro-benchmark which leads to a wrong conclusion owing to the optimisations that are carried out by the JIT.
Chapter 3, Type Internals, covers the in-memory layout of the data structures that the CLR uses at runtime, including the layout of the method tables, the management of sync blocks, which are the underlying objects used for locking, and the use of value types to avoid the allocations that happen because of boxing. There’s a good discussion of the IL that is generated when you make a call on a value type via an interface, and how you can use generics to avoid the boxing that would normally happen when you do this – i.e. why there is boxing in the first call to Boo in the following code, but not in the second.
The IL for Main looks like the following; notice the use of the box before the first call.
The important point to notice is the use of the constrained IL instruction in the MakeCall static method.
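As a rough sketch of the sort of code under discussion (the names IBoo, Foo and CallViaInterface are my own placeholders rather than the book's listing, and the ref parameter on MakeCall is added only so that the difference is observable), something like the following shows the two kinds of call:

```csharp
using System;

// Hypothetical interface and value type, for illustration only.
public interface IBoo
{
    void Boo();
}

public struct Foo : IBoo
{
    public int Calls;
    public void Boo() { Calls++; }
}

public static class Program
{
    // Interface-typed parameter: the caller has to box a Foo to pass it,
    // so Boo runs against a copy on the heap.
    public static void CallViaInterface(IBoo boo)
    {
        boo.Boo();
    }

    // Generic version: the C# compiler emits the "constrained." prefix
    // before callvirt, so Foo.Boo can be called on the value itself
    // without a box.
    public static void MakeCall<T>(ref T boo) where T : IBoo
    {
        boo.Boo();
    }

    public static void Main()
    {
        var foo = new Foo();

        CallViaInterface(foo);          // boxed: the increment hits the copy
        Console.WriteLine(foo.Calls);   // 0

        MakeCall(ref foo);              // not boxed: the increment hits foo
        Console.WriteLine(foo.Calls);   // 1
    }
}
```

The counter is just a convenient way of seeing the box: a mutation made through the boxed copy never reaches the original struct, whereas the constrained call operates on the struct in place.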
Boxing can be a killer for some applications and is worth being aware of. The author shows the more modern way to do object equality using the IEquatable<> interface, which allows a non-boxing call to the equality method, and contrasts this with the standard Object.Equals, which requires boxing of the argument (as its parameter is of type object). In a later chapter he completes the discussion by showing how generic types like List<> use an EqualityComparer<> class to generate a non-boxing equality check when the type supports IEquatable<>.
EqualityComparer<> does this by offering a Default property which caches the equality comparer for the type. This comparer checks to see what interfaces the type supports, and if it supports IEquatable<>, then the trick above is used to avoid the boxing.
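To make the equality discussion concrete, here is a minimal sketch, assuming a made-up Point struct (not taken from the book), of a value type that implements IEquatable<> and is then compared through EqualityComparer<>.Default and List<>:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value type, for illustration only.
public struct Point : IEquatable<Point>
{
    public readonly int X;
    public readonly int Y;

    public Point(int x, int y) { X = x; Y = y; }

    // Strongly typed equality: the argument is passed by value, no boxing.
    public bool Equals(Point other) => X == other.X && Y == other.Y;

    // Object.Equals takes an object, so a Point argument must be boxed.
    public override bool Equals(object obj) => obj is Point p && Equals(p);

    public override int GetHashCode() => (X * 397) ^ Y;
}

public static class Program
{
    public static void Main()
    {
        var a = new Point(1, 2);
        var b = new Point(1, 2);

        // Default is created once per T and cached; because Point implements
        // IEquatable<Point>, the comparison goes through Equals(Point).
        EqualityComparer<Point> comparer = EqualityComparer<Point>.Default;
        Console.WriteLine(comparer.Equals(a, b));

        // List<T>.Contains relies on the same default comparer.
        var points = new List<Point> { a };
        Console.WriteLine(points.Contains(b));
    }
}
```

Overriding Object.Equals and GetHashCode alongside the typed Equals keeps the two notions of equality consistent for callers that only have an object reference.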
Chapter four is a brilliant introduction to Garbage Collection and the variations of it that are implemented in the CLR. It covers the use of generations and how these are mapped to segments in virtual memory, the need for the threads to be brought to safe points before the garbage collection can occur, the costs of pinning, and gives really good coverage of the many flavours of garbage collection that the CLR offers. There is also discussion of the large object heap, the effects that allocating large objects can have on performance and how you might use unmanaged memory instead of heap memory to avoid some of the associated overheads. There is also a brief mention of object pooling and a pointer to the System.ServiceModel.Channels.BufferManager abstract class which implements the facade for such a pool infrastructure for use by WCF.
Chapter five covers generics and the collection types, looking in some detail at the performance of the various collection types and contrasting C# generics with C++ templates and Java generics. There are a couple of good pages that look at cache considerations, and which point out that iterating through an array in the wrong way can be vastly inefficient because the code ends up working against the cache.
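The cache point is easy to try for yourself. The following is a small sketch of my own (not the book's example, and the array size is an arbitrary choice) that sums the same rectangular array in two different orders; on most machines the second traversal is noticeably slower, though the exact figures depend on the hardware and the JIT:

```csharp
using System;
using System.Diagnostics;

public static class Program
{
    const int N = 2048; // arbitrary size, large enough to exceed typical caches

    // Row by row: consecutive accesses are adjacent in memory, because
    // C# rectangular arrays are stored in row-major order.
    public static long SumRowMajor(int[,] m)
    {
        long sum = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += m[i, j];
        return sum;
    }

    // Column by column: each access jumps a whole row ahead in memory,
    // so most reads touch a different cache line.
    public static long SumColumnMajor(int[,] m)
    {
        long sum = 0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                sum += m[i, j];
        return sum;
    }

    public static void Main()
    {
        var m = new int[N, N];
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                m[i, j] = 1;

        var sw = Stopwatch.StartNew();
        long a = SumRowMajor(m);
        Console.WriteLine($"row-major:    {sw.ElapsedMilliseconds} ms (sum {a})");

        sw.Restart();
        long b = SumColumnMajor(m);
        Console.WriteLine($"column-major: {sw.ElapsedMilliseconds} ms (sum {b})");
    }
}
```

Both methods do exactly the same arithmetic and return the same sum; only the order in which memory is touched differs.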
Chapter six looks at concurrency and parallelism as a means of improving performance. It starts off by considering the use of the ThreadPool and TPL Tasks, moving on to Parallel.For and Parallel.ForEach and PLINQ. The chapter then moves on to memory models, the built-in Windows synchronisation mechanisms and more discussion of caches and false sharing and its performance problems. The chapter finishes off by looking at GPU computing, a means to get dramatic performance improvements for certain kinds of algorithms.
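As a flavour of the Parallel.For style of code, here is a minimal sketch of my own (the SumOfSquares method is a made-up example, not one from the book) that uses the overload with per-task local state, so that each task accumulates privately and only touches the shared total once at the end:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Program
{
    // Hypothetical workload: sum of i*i for i in [0, n).
    public static long SumOfSquares(int n)
    {
        long total = 0;

        Parallel.For(0, n,
            () => 0L,                                     // initial value of each task's local sum
            (i, state, local) => local + (long)i * i,     // loop body works on the local sum only
            local => Interlocked.Add(ref total, local));  // merge into the shared total once per task

        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumOfSquares(1000));
    }
}
```

Keeping the running sum in a task-local variable avoids having every iteration contend for the same shared memory location, which is the general direction the chapter's material on caches and sharing points in.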
Chapter seven is a set of paragraphs describing various ideas around asynchronous IO, IO completion ports, which are supported by the ThreadPool.BindHandle mechanism, and the .NET Thread Pool.
The discussion moves on to various problems in the area of socket IO, including message chunking, chatty protocols and the costs of (de)serialization. The configuration parameters for WCF are also covered.
Chapter eight is a brilliant set of observations and comments about unsafe code and interoperability with the unmanaged world, via native code and via COM. It goes down to a very low level, discussing the costs of P/Invoke in great detail, including some material on marshaller stubs that I haven’t seen before. This was a very good chapter and very interesting, giving a good feeling for the overhead of using unmanaged calls in your .NET application.
Chapter nine looks at algorithms and their costs, covering a few well-known algorithms.
Chapter ten covers a miscellaneous set of other optimisations around the JIT, looking at the types of optimisations the JIT carries out (when range checking is eliminated, for example), though this revolves around the author’s experiments as the behaviour isn’t documented by Microsoft. There’s then some clever material on processor-specific optimisation, looking at certain instructions that are available on some processor variants (such as the SIMD extensions to the x86 instruction set), with examples showing how .NET code can call into an assembled series of bytes via a delegate generated using Marshal.GetDelegateForFunctionPointer. The chapter finishes with a look at Reflection and code generation, and the TypedReference type.
Chapter eleven looks at a series of optimisations for ASP.NET web applications.
The book is really good indeed, covering all of the topics in enough detail to give the reader a good understanding, and offering pointers to more material. The coverage of tools for measuring the performance is brilliant, particularly the ETW material, and I loved the low level details about interoperating with unmanaged code. If you like low level details about the CLR, the book is a must read.