Stunned–I could hardly get going again

Stunning CSS3: Project-based Guide to the Latest in CSS by Zoe Mickley Gillenwater

I keep hearing about CSS 3 and HTML5, which seem to be offering a load of features aimed at making web pages more and more interactive – canvas for drawing, better layout with CSS and richer UIs with animations and transitions implemented in a declarative way without the need for programming in JavaScript.  This book comes at CSS3 from the point of view of the designer, illustrating the new features by means of a series of worked examples. I must admit that I didn’t actually work through the examples, but read the book as a means of getting familiar with the new CSS features.

Chapter one is a great introduction to the state of the world, clearly describing vendor prefixes, progressive enhancement and the state of browser support for CSS 3, with additional discussion of how to structure your CSS file to avoid slowing the browser down.

The next chapters consider the graphical effects that CSS 3 enables, including things such as rounded corners, and some tricks such as generating triangles using the borders of a zero size content div (see here for an example of what this can achieve). Along with the examples, the text describes the steps that can be taken to get the same effect (when possible) on downlevel browsers, often by using a JavaScript library to polyfill the missing functionality. We see examples of border-radius, transparent backgrounds, web fonts, text shadows and gradients, all of which are demonstrated by progressively enhancing a rather basic start page. There is also information on the new attribute selectors of CSS3, and some use of the new pseudo classes (for example nth-child or nth-child-of-type, and target for the ID matching a fragment identifier)  and pseudo elements (first-line, first-letter, before and after). I particularly liked the sections on transitions and animations, which can declaratively add some spectacular effects to a page. There is a good demo page for animations here and the specification here.

Chapter 6 discusses different screen size, illustrating the use of media queries to target CSS at particular types of device, and the final chapter discusses the new layouts of CSS 3 including the flexible box model.

I thought that reading this book was a really good way to catch up with the new features of CSS 3, and get a clear idea of how these features could be used for real. The examples are good and thorough (even though I didn’t download them and actually do the exercises) and the book is really well written. Brilliant!

Posted in Books | Leave a comment

Where I come from

Before Eric Lippert cancelled his blog series on tail calls in .NET, which started with this post on the 3rd June, the content of the post really brought home one very important point about call stacks – they show you where the code will go next, and don’t necessarily show you where you came from.

Just prior to this post, I’d been debugging a problem at work where all we had was a call stack that went through some third party code, and I’d been using Reflector to try to figure out the flow of the code. It was clear that there were some jumps in the stack trace, with the trace missing certain calls. In the past, I’d been led to believe that the C# compiler never outputs tail calls, and that any tail optimisation was the work of the 64 JIT compiler, which sometimes decides to emit a tail call in place of a standard call in release mode (whereas the 32 bit version of the JIT never does this). Eric’s original post confirmed this behaviour.

Of course, IL does have a tail call prefix that can be placed on call instructions, as a means of requesting that the JIT compiler does its best to generate a tail call, and this is indeed used by the F# compiler to get the kinds of stack space guarantees that users expect of a functional language – though the F# compiler also optimises some self tail calls into loops in the front end of the compiler so that it doesn’t need to rely on the JIT to carry out this optimisation.

You can see the difference in behaviour when running a program like the following:

class Program
{
    static void Main(string[] args)
    {
        Doit(0);
    }

    static void Doit(int x)
    {
        Console.WriteLine(x);
        Doit(x + 1);
    }
}

Running as Release/x86

171790
Process is terminated due to StackOverflowException.

Running as Release/x64 it keeps going. If you disassembly the code, you can see the jmp

0:003> .loadby sos clr
0:003> !Name2EE ConsoleApplication18.exe ConsoleApplication18.Program
  …
MethodTable: 000007fccfe23940
0:003> !DumpMT -md 000007fccfe23940
  …
000007fccfff4d50 000007fccfe23928    JIT ConsoleApplication18.Program.Doit(Int32)
0:003> !DumpMD 000007fccfe23928   
  …
CodeAddr:     000007fccfff4d50
0:003> u 000007fccfff4d50
000007fc`cfff4d50 53              push    rbx
000007fc`cfff4d51 4883ec20        sub     rsp,20h
000007fc`cfff4d55 8bd9            mov     ebx,ecx
000007fc`cfff4d57 baae000000      mov     edx,0AEh
  …
000007fc`cfff4dad ffc3            inc     ebx
000007fc`cfff4daf ebbf            jmp     000007fc`cfff4d70

This blog series looked like it was going to be informative and interesting and its a shame that Eric has cancelled it.

Posted in Computers and Internet | Leave a comment

No change here

The BCL immutable collections were released on Nuget some time ago, but I’ve only just got to play with them. There has recently been an accompanying Channel 9 interview discussing the implementation.

So called persistent collections have been the main data structure in the Clojure language for quite some time. Clojure emphasis functional paradigms, and its implementation of persistent datatypes offer good performance while keeping the benefits of no mutation – friendliness to concurrency and removal of the need for compensation actions if an exception is thrown when state is being modified.

Persistent trees seem to have become popular at Microsoft because of the Roslyn project. This project is a rewrite of the C# compiler in C#, and it is in this scenario that there is a need to efficiently modify large abstract syntax trees without copying, a scenario that perfectly matches the persistent types.

It’s easy to add the immutable collections to your project.
    Install-Package Microsoft.Bcl.Immutable –Pre

And then you can start making instances using the slightly strange pattern.
    var x = ImmutableList.Create<int>();

The pattern follows the example of the Tuple type in .NET 4. C# cannot infer the instantiation required for the type parameters of a generic class at the class level, but it can do so for method level type parameters. This syntax avoids the need to specify the type parameters in examples like:
    var x2 = ImmutableList.Create(1, 2, 3);

Using Reflector it is interesting to look at the implementation of the type itself which is based on a binary tree. Such an implementation makes it possible to frequently share most of the tree except nodes along the spine that is modified by an operation.

There are a few optimisations in the implementation. Immutable trees can be converted to mutable trees, which can implement some operations more efficiently. These can then be converted back into immutable trees, as in the following example.

var x = ImmutableList.Create<int>();
var y = x.AddRange(new [] {1,2,3,4});
var z = y.ToBuilder();
z.Add(10);
var z2 = z.ToImmutable();

The idea is that we can locally make things mutable, in order to improve efficiency, and then convert back to immutable before passing the value out of the local context. This kind of idea is used in F#, where lots of implementation is locally mutable, but non-locally things look very functional and are therefore easier to reason about.

The implementation of Sort, as shown by Reflector, pushes the contents into an Array, sorts the array and then reproduces an immutable tree.

internal ImmutableList<T>.Node Sort(int index, int count, IComparer<T> comparer)
{
    Requires.Range(index >= 0, “index”, null);
    Requires.Range(count >= 0, “count”, null);
    Requires.Argument((index + count) <= this.Count);
    Requires.NotNull<IComparer<T>>(comparer, “comparer”);
    T[] array = new T[this.Count];
    this.CopyTo(array);
    Array.Sort<T>(array, index, count, comparer);
    return ImmutableList<T>.Node.NodeTreeFromList(array.AsOrderedCollection<T>(), 0, this.Count);
}

In contrast, the Reverse operation, uses the representation,

internal ImmutableList<T>.Node Reverse(int index, int count)
{
    Requires.Range(index >= 0, “index”, null);
    Requires.Range(count >= 0, “count”, null);
    Requires.Range((index + count) <= this.Count, “index”, null);
    int num = index;
    for (int i = (index + count) – 1; num < i; i–)
    {
        T local = node[num];
        T local2 = node[i];
        node = node.ReplaceAt(i, local).ReplaceAt(num, local2);
        num++;
    }
    return node;
}

relying on the fact that ReplaceAt uses a Mutate operation which can work slightly more efficiently if the tree is mutable (ie the frozen property is set to false). This frozen properly requires that the whole tree is traversed in order to set the frozen property of each node in it.

private ImmutableList<T>.Node Mutate(ImmutableList<T>.Node left = null, ImmutableList<T>.Node right = null)
{
    if (this.frozen)
    {
        return new ImmutableList<T>.Node(this.key, left ?? this.left, right ?? this.right, false);
    }
    if (left != null)
    {
        this.left = left;
    }
    if (right != null)
    {
        this.right = right;
    }
    this.height = 1 + Math.Max(this.left.height, this.right.height);
    this.count = (1 + this.left.count) + this.right.count;
    return (ImmutableList<T>.Node) this;
}

To see how the tree gets rebalanced, look at the RotateLeft and RotateRight on the Node inner class in the ImmutableList<T> class.

The assembly also contains an immutable stack, which has Peek and Pop operations, and a pair of such stacks are used to implement immutable queues in the classic way (We push enqueued items on to one stack, and dequeue elements by popping – when there are no more elements to pop, we take the push stack and reverse it and then construct a new pop stack from the resulting elements). There are also DebuggerProxy implementations which allow Visual Studio’s visualizers to present a clearer view of the datatypes when debugging. They also throw in Dictionary, Set and Sorted Dictionaries giving a very useful set of datatypes which I’m dying to use for real.

Posted in Computers and Internet | Leave a comment

That’s too complex for me

P, NP and NP-Completeness: The basis of Computational Complexity by Oded Goldreich

I wanted a book that would take me through computational complexity in a fairly detailed mathematical way while still giving more of an idea of the motivation behind the various concepts. This book does just that! It is designed for teaching and at the start of each of the five chapters, there is a good overview discussing how the material can be taught, the motivating examples and what parts of the chapter could be left for a second reading. There’s an appendix at the end which gives a brief history of complexity theory, and the contributions of the three early discoverers – Cook, Karp and Levin.

At college I sat through the usual lectures on this subject, with its introduction of Turing machines, the demonstration that they have equivalent computational power to Register machines, and then the introduction of non-deterministic Turing machines and the class NP. This book very nearly leaves out non-deterministic Turing machines – it introduces the difference between verification (the class P) and proving (the class NP) via proof systems, and relates these two classes to the more modern search formulations via the PC and PF classes. I’d not realised before that while Cook was formulating P/NP via decision problems, Levin in the East was coming at the same ideas by way of search problems (which are the kind of problems you find more often day to day).

The book is brilliant. It discusses why polynomial time problems are considered tractable – most problems can be mapped between the different models of computation (say between one tape and two tape Turing machines) in a polynomial way, the philosophical implications of P and NP being the same, and makes a big noise about the existence of NP-Complete problems. I remember, when we studied this at university, being completely amazed that such a class exists. There is a lot of other material – the existence of problems in NP that aren’t NP-complete (assuming that P is not equal to NP as expected), and some discussion of promise problems and zero knowledge proofs. There are illuminating proofs of some NP-complete problems from first principles (using the target object to simulate a Turing machine), as well as examples of reducing one problem to another.

A fascinating book full of good discussion as well as a rigorous mathematical treatment of the subject area.

Posted in Books | Leave a comment

Stop darting about

The usual mass of interesting of talks from Google I/O. The V8 guys have done an amazing job at improving the performance of JavaScript over the years – this talk discusses the ideas that they recycled in order to make the massive leaps in performance. This case study discusses how the performance of a browser based application was improved using the logging that V8 provides – the material covered brings up a number of interesting issues (that I won’t reveal here). It’s hard to know if we’ve reaching the limits of optimisation for JavaScript, and whether we need to move to a language with semantics that make it easier to deal with. Some of DART’s benefits are covered here. Alternatively, we could just demand that runtimes optimise certain specific subsets of JavaScript.

There are also some good talks on optimising the performance of rendering – Jank free rendering in Chrome, Android graphics performance and web page design with the GPU in mind. All have the theme that we all need to understand more about how the GPU is being used to compose graphics layers.

Posted in Computers and Internet | Leave a comment

MLWorks–yes it does!

In 1989 I started my first full time programming job at a software company called Harlequin, based near Cambridge, UK. I’d had summer jobs and an industrial year before this, working for ICL in Manchester and had heard some really good things about a small software house of really smart people who were doing some great work on compilers and programming environments for the Common Lisp language. So I applied. After an interview, where the boss gave me and my bike a lift home, they offered me the job.

During my first year there, I first worked on the Common Lisp system, improving the performance of the system on various numerical benchmarks and adding some missing parts to the runtime system. After some time, I was offered the chance to work on an ML compiler that was being being written as a contract job for RSRE. There were three of us working on this – the runtime work was done by Colin who did the parser and type checker, while I wrote the lexer and the code generator. The target language was Ten15, a high level virtual machine that ran on the PERQ computer, and it was this technology that got me interested in virtual machines.

I left a year or so later to start a Phd at Cambridge, and when I returned to Harlequin the ML compiler was being rewritten to target SPARC, as the MLWorks project. I joined this team for a while, helping to finish off the runtime system and writing an initial sampling profiler. I later started to add CORBA support to the system. I left to join another team when offered the chance to port the LispWorks system to the DEC Alpha, which was fantastic opportunity to get deep knowledge about a platform (and code generators and link loaders and shared libraries and memory management and cross platform compilers).

The MLWorks system was developed over the years by various people and it is nice to see that the code has just been released as open source. I’m not quite sure how relevant the technology is now – this was in the days of single threaded runtimes and machines that were considerably slower than today’s, but it will be nice seeing the system build and get some use. I’ll just be digging through the change logs to see what code changes I made in those days.

Posted in Computers and Internet | Leave a comment

Windbg and the case of the unfreed garbage

Virtual machine technology has brought many benefits. The runtime can be much more featured than a standard runtime library of a language such as C, offering memory management, exceptions and thread management at a level that is easy to program, and in a manner that is much more cross platform. There are trade-offs at work here, as there are in all implementations. In particular, the garbage collector is implemented in user mode code (as is the whole CLR) and you get a complete garbage collectible heap in every process that is running managed code. Moreover, garbage collection is driven by allocation failure (or explicit user calls to the GC.Collect methods), waiting until there is no space in a generation before scheduling a collection, and using a number of heuristics in order to decide which of the generations should be collected. These heuristics also control when memory is returned back to the operating system. Typically, the larger the heap, the more efficient the garbage collection process, so memory can be held for longer than you might naively expect.

We’ve all used the task manager to look at the size of managed processes. When the process is blocked, no allocations means that no garbage collections are scheduled, and hence there is nothing which is going to release unused pages back to the operating system.

Of course it would be nice if the garbage collector could respond to the OS when memory gets low. But just does it do this? We might not have reflection in the unmanaged world, but a combination of symbols from a symbol server, and other tools makes it possible to investigate.

First, I knew that you can subscribe to the OS event that is raised when memory is running low via the function CreateMemoryResourceNotification. The first thing to check is whether the standard dlls that make up the CLR import this function from the OS interface libraries. Doing dumbin on clr and some other key dlls

   dumpbin c:\windows\Microsoft.NET\Framework\v4.0.30319\clr.dll /imports

listed imports for a number of libraries, but not this function.

Right, time to get dynamic. I wrote a quick console application that blocked inside Console.ReadLine() and then started it under WinDbg. Setting things up to use the Microsoft symbol server, and then using the examine command to find the symbol

  .symfix c:\localsymbols
  x *!*CreateMemoryResource*

found the symbol, so I could then breakpoint it.

   74870b4e          KERNELBASE!CreateMemoryResourceNotification
   bp KERNELBASE!CreateMemoryResourceNotification

We hit the symbol and I stepped up. This left the stack looking like this.

0:000> k
ChildEBP RetAddr 
00dbfa14 64773428 clr!WKS::gc_heap::initialize_gc+0xb0
00dbfa1c 6475e0da clr!WKS::GCHeap::Initialize+0x2f
00dbfa24 6476ae02 clr!ExecuteDLL+0x36d
00dbfb9c 6475ba2f clr!EEStartupHelper+0×846
00dbfbe4 647744fe clr!EEStartup+0x1e
00dbfc28 64801f39 clr!EnsureEEStarted+0xea
00dbfc68 64804162 clr!_CorExeMainInternal+0x8f
00dbfca4 71fff5a3 clr!_CorExeMain+0x4d
00dbfce0 72077f16 mscoreei!_CorExeMain+0x10a
00dbfcf0 72074de3 MSCOREE!ShellShim__CorExeMain+0×99
00dbfcf8 74e68543 MSCOREE!_CorExeMain_Exported+0×8
00dbfd04 7725bf39 KERNEL32!BaseThreadInitThunk+0xe
00dbfd48 7725bf0c ntdll!__RtlUserThreadStart+0×72
00dbfd60 00000000 ntdll!_RtlUserThreadStart+0x1b

The return register contains a handle

0:000> r eax
eax=00000120
0:000> !handle 120 f
Handle 120
  Type             Event
  Attributes       0×10
  GrantedAccess    0×100001:
         Synch
         QueryState
  HandleCount      22
  PointerCount     4573373
  Name             \KernelObjects\LowMemoryCondition
  Object Specific Information
    Event Type Manual Reset
    Event is Waiting

So the question is what waits on the handle? If we continue the application, we get to a point where there are only two managed threads running.

0:004> .loadby sos clr
0:004> !Threads
ThreadCount:      2
UnstartedThread:  0
BackgroundThread: 1
PendingThread:    0
DeadThread:       0
Hosted Runtime:   no
                                                                         Lock 
       ID OSID ThreadOBJ    State GC Mode     GC Alloc Context  Domain   Count Apt Exception
   0    1 2210 0119d608     2a020 Preemptive  02E34188:00000000 011655b0 1     MTA
   2    2 23c0 0116d458     2b220 Preemptive  00000000:00000000 011655b0 0     MTA (Finalizer)

The main thread is waiting inside a call to Console.Readline, The finalizer thread would be a potential thread that might wait on our event. After all, it spends most of its time waiting. So we can switch to it.

0:004> ~2s
eax=00000000 ebx=64cdf388 ecx=00000000 edx=00000000 esi=02cff850 edi=00000000
eip=7722e1a4 esp=02cff728 ebp=02cff8a8 iopl=0         nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
ntdll!NtWaitForMultipleObjects+0xc:
7722e1a4 c21400          ret     14h
0:002> k
ChildEBP RetAddr 
02cff724 7484c752 ntdll!NtWaitForMultipleObjects+0xc
02cff8a8 64787bc5 KERNELBASE!WaitForMultipleObjectsEx+0x10b
02cff8f4 647885f6 clr!WKS::WaitForFinalizerEvent+0xbe
02cff934 646e9843 clr!WKS::GCHeap::FinalizerThreadWorker+0x6e
02cff9c4 646e98fd clr!REGUTIL::EnvGetString+0xfc
02cff9d0 646e998a clr!SHash<StringSHashTraits<_ConfigStringKeyValuePair,unsigned short,CaseSensitiveStringCompareHash<unsigned short> > >::Lookup+0×11
02cffa58 647741c6 clr!EEConfig::GetConfiguration_DontUse_+0x1b0
02cffad4 647ca0c1 clr!WKS::GCHeap::FinalizerThreadStart+0×198
02cffb6c 74e68543 clr!Thread::intermediateThreadProc+0x4d
02cffb78 7725bf39 KERNEL32!BaseThreadInitThunk+0xe
02cffbbc 7725bf0c ntdll!__RtlUserThreadStart+0×72
02cffbd4 00000000 ntdll!_RtlUserThreadStart+0x1b

Now dump the stack to try to find parameters to the function we just returned from.

0:002> dd esp
02cff728  7484c752 00000003 64cdf388 00000001
02cff738  00000000 00000000 525d7525 011a6f40
02cff748  00000000 74e5002e 00000004 00000018
02cff758  40000060 00000000 7484ee9c 767f3b30
02cff768  00000000 0000020c 00000000 00000004
02cff778  64735f28 64735f28 7681301e 011b2ad0
02cff788  011b2abc 7722dc34 74841129 00000158
02cff798  00000000 74841151 525d7a75 000007d0

Looking at the values pointed to by the third item on the stack, it looks like a series of handles, the first of which is the handle to the notification object.

0:002> dd 64cdf388
64cdf388  00000120 00000158 000001b8 011a6f20
64cdf398  00000000 00000000 6475d2f4 00000005

So it looks like the handles are passed into the kernel wait method, and we need to see what happens to the return value from WaitForMultipleObjectsEx. I happen to know that it returns the index of the handle that was signalled, so we need to see what happens if the value zero is returned.

0:002> u 64787bc5
clr!WKS::WaitForFinalizerEvent+0xbe:
64787bc5 03c6            add     eax,esi
64787bc7 83e800          sub     eax,0
64787bca 0f844b451d00    je      clr!WKS::WaitForFinalizerEvent+0x12a (6495c11b)
64787bd0 83e802          sub     eax,2
64787bd3 7589            jne     clr!WKS::WaitForFinalizerEvent+0×292 (64787b5e)
64787bd5 33f6            xor     esi,esi
64787bd7 c7442414ffffffff mov     dword ptr [esp+14h],0FFFFFFFFh
64787bdf 89742418        mov     dword ptr [esp+18h],esi

We test and branch if the result is zero to the following code.

0:002> u 6495c11b
clr!WKS::WaitForFinalizerEvent+0x12a:
6495c11b 8b0d18f6cd64    mov     ecx,dword ptr [clr!GCHeap::FinalizerThread (64cdf618)]
6495c121 c7410801000000  mov     dword ptr [ecx+8],1
6495c128 833d84f4cd6400  cmp     dword ptr [clr!g_TrapReturningThreads (64cdf484)],0
6495c12f 7405            je      clr!WKS::WaitForFinalizerEvent+0×145 (6495c136)
6495c131 e8d324efff      call    clr!Thread::RareDisablePreemptiveGC (6484e609)
6495c136 8b0d88f4cd64    mov     ecx,dword ptr [clr!g_pGCHeap (64cdf488)]
6495c13c 6a02            push    2
6495c13e 8b01            mov     eax,dword ptr [ecx]
6495c140 6a01            push    1
6495c142 6a00            push    0
6495c144 ff5064          call    dword ptr [eax+64h]
6495c147 8b0d18f6cd64    mov     ecx,dword ptr [clr!GCHeap::FinalizerThread (64cdf618)]
6495c14d c7410800000000  mov     dword ptr [ecx+8],0
6495c154 8b4104          mov     eax,dword ptr [ecx+4]
6495c157 a85f            test    al,5Fh
6495c159 7405            je      clr!WKS::WaitForFinalizerEvent+0x16f (6495c160)

The target address is fetched by indirection, so we have to look at the relevant memory locations.

0:002> dd 64cdf488
64cdf488  0116c760 64ce6d70 646b1cec 00000000

0:002> dd 0116c760
0116c760  6475e0e8 abababab abababab feeefeee

0:002> dd 6475e0e8 +64h
6475e14c  6478c77f 64794b2c 64a09013 647877ac

And so we arrive at a function named GarbageCollect, which probably does what it says on the can.

0:002> u 6478c77f
clr!WKS::GCHeap::GarbageCollect:
6478c77f 55              push    ebp
6478c780 8bec            mov     ebp,esp
6478c782 51              push    ecx
6478c783 53              push    ebx
6478c784 56              push    esi
6478c785 8b7508          mov     esi,dword ptr [ebp+8]
6478c788 57              push    edi
6478c789 894dfc          mov     dword ptr [ebp-4],ecx

In summary then, low memory notifications cause our process to do a garbage collection, presumably with the intention of releasing memory back to the system. Though we are missing metadata, as long as we don’t need to go too deep into the code of an unmanaged method, it is possible to figure out what is going on.

Posted in Computers and Internet | Leave a comment