C# fun with structs

I’ve just spent a while catching up with all the changes to value types in C# 7.x. As someone who rarely uses the struct keyword in C#, I find it quite amazing how much the language has changed in this area. This talk is a very good summary.

The key point about structs is that they allow you to avoid some of the heap allocation that can be very bad for certain types of application. Recent versions of C# have changed to let you hold a reference (an alias) to a struct, and hence work with that rather than a copy of the original struct.

The classic example is something like the following (where we use a local function to make the example easier to read).

  var data = new[] {1, 2, 3, 4, 5};
  ref int second = ref GetItem(data);
  second = 5;                        // writes through the alias into data[1]

  ref int GetItem(int[] incoming)
  {
    ref int x = ref incoming[1];     // alias to the second element of the array
    return ref x;
  }

In the example, the local variable second is an alias for an item in the data array, so a change made through either name is visible when viewing the data via the other. The first thing one notices is the addition of a lot of “ref” keywords to make it clear that the local variable is holding an alias, and that the result of the method call is an alias to something else. It’s a shame that there wasn’t a better syntax for this.
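
To make the aliasing concrete, here is a minimal check on the snippet above (assuming Debug from System.Diagnostics):

  Debug.Assert(second == 5);    // reading through the alias sees the new value
  Debug.Assert(data[1] == 5);   // ...and so does reading the array element directly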

There are two other changes in this area: the “in” parameter modifier and the ability to declare a struct as readonly.

In the classic struct-as-a-parameter case, the struct is copied on the way into the method. Hence, for the example struct,

struct A
{
  public int _x;
  public void Increment() => _x++;
}

we get the following behaviour:

  A a = new A();
  Increment(a);

  void Increment(A x)
  {
    x.Increment();
    x.Increment();
    Debug.Assert(x._x == 2);
  }

  Debug.Assert(a._x == 0);

We can avoid the copy by passing the argument by ref, but then any mutation inside the method affects the caller.

  A a = new A();
  Increment(ref a);

  void Increment(ref A x)
  {
    x.Increment();
    x.Increment();
    Debug.Assert(x._x == 2);
  }

  Debug.Assert(a._x == 2);

We can instead pass the argument as an “in” parameter, which passes it by read-only reference and so avoids the copy on entry to the method.

  A a = new A();
  Increment(in a);

  void Increment(in A x)
  {
    x.Increment();
    x.Increment();
    Debug.Assert(x._x == 0);
  }

  Debug.Assert(a._x == 0);

Now, of course, we have to answer the question: how is it that we don’t see the parameter x being changed after the calls to x.Increment()? The answer is that the compiler takes a defensive copy each time it makes the call. You can see this in the IL that is generated (and this is all covered really well by this blog post).
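
In C# terms, the defensive copies behave roughly as if the compiler had rewritten the body like this (a sketch of the idea, not the literal expansion it emits):

  void Increment(in A x)
  {
    A copy1 = x;        // defensive copy for the first call
    copy1.Increment();  // mutates the copy, not x (or the caller's a)
    A copy2 = x;        // a fresh copy is taken for each call
    copy2.Increment();
    Debug.Assert(x._x == 0);
  }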

In the above code, the IL generated for the two calls to x.Increment() inside the method is

    L_0000: nop 
    L_0001: ldarg.0 
    L_0002: ldobj ConsoleApp15.Program/A
    L_0007: stloc.0 
    L_0008: ldloca.s a
    L_000a: call instance void ConsoleApp15.Program/A::Increment()
    L_000f: nop 
    L_0010: ldarg.0 
    L_0011: ldobj ConsoleApp15.Program/A
    L_0016: stloc.0 
    L_0017: ldloca.s a
    L_0019: call instance void ConsoleApp15.Program/A::Increment()
    L_001e: nop 

Changing the definition of A to

readonly struct A
{
  public readonly int _x;
  // a readonly struct cannot mutate its own state, so the method now just reads _x
  public void Increment() => Console.WriteLine(_x);
}

the compiler notices that the copy can be avoided and hence the IL changes to the more expected

    L_0000: nop 
    L_0001: ldarg.0 
    L_0002: call instance void ConsoleApp15.Program/A::Increment()

It all goes to show that value types are a bit confusing in C#, and it is hard to know when you need to optimise their use. You really need to do performance measurements to tell whether the extra copying actually makes a difference to your application.
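
As an illustration, a minimal sketch of such a measurement using the BenchmarkDotNet package (the struct and method names here are made up; the one-int struct A above is too small to show much, so a deliberately larger struct is used):

  using BenchmarkDotNet.Attributes;
  using BenchmarkDotNet.Running;

  public struct BigStruct
  {
    // eight longs = 64 bytes, large enough for the copy cost to register
    public long F0, F1, F2, F3, F4, F5, F6, F7;
  }

  public class CopyBenchmark
  {
    private BigStruct _value;

    [Benchmark]
    public long ByValue() => SumByValue(_value);   // copies 64 bytes per call

    [Benchmark]
    public long ByIn() => SumByIn(in _value);      // passes a read-only reference

    private static long SumByValue(BigStruct s) =>
      s.F0 + s.F1 + s.F2 + s.F3 + s.F4 + s.F5 + s.F6 + s.F7;

    private static long SumByIn(in BigStruct s) =>
      s.F0 + s.F1 + s.F2 + s.F3 + s.F4 + s.F5 + s.F6 + s.F7;
  }

  public class Program
  {
    public static void Main() => BenchmarkRunner.Run<CopyBenchmark>();
  }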

I got interested in the struct-related changes after looking into the implementation of Span, which is implemented as a “ref struct”. Span is a powerful new abstraction over data such as arrays, and allows you to slice them without copying and without performance-sapping view types. To implement such a thing, the view, the Span instance, needs to be stack allocated and guaranteed never to escape to the heap, being de-allocated when the stack frame is unwound – this is a new idea for the CLR, which has never previously made such guarantees about stack-allocated values.
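
The “ref struct” modifier is what gives the compiler enough information to enforce that confinement. A minimal sketch (the type names here are made up):

  ref struct StackOnly
  {
    public int Value;
  }

  class Holder
  {
    // StackOnly _field;   // compile error: a ref struct cannot be a field of a class
  }

  static void Use()
  {
    var s = new StackOnly();   // fine: the instance lives on the stack
    s.Value = 1;
    // object boxed = s;       // compile error: a ref struct cannot be boxed
  }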

You can play with Span using the pre-release System.Memory NuGet package.

  var data = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
  var subdata = data.AsSpan().Slice(3, 4);

The subdata item is a Span<int>, and looking at it in the debugger’s raw view shows that it has the fields:

   _byteOffset 0x00000010 System.IntPtr
   _length 4 int 
   _pinnable {int[9]} System.Pinnable 

This efficient implementation means that the Span instance must be confined to the stack, which is all covered in this proposal document. Span (and Memory, the version that can live on the heap) are likely to make their way into many framework classes in the coming releases because of how much they can reduce allocation in cases where data is pulled from a buffer. The System.Memory package targets .NET Standard and so is already available on a large number of platforms.
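
Because the slice is a view rather than a copy, writes through it are visible in the original array (continuing the subdata example above):

  subdata[0] = 42;
  Debug.Assert(data[3] == 42);   // the slice starts at index 3 of the original array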


Not sure I love it, but I do understand it better

There was a recent talk at NDC entitled How to stop worrying and love msbuild by Daniel Plaisted. It was an interesting talk that discussed the history of the changes to the csproj file that made it a lot smaller and tidier. The new format makes the project file look like the following, which is a massive contrast to the old multi-line mess.

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp2.0</TargetFramework>
  </PropertyGroup>
</Project>

Of course, we have to ask where all of the existing XML has gone. The build system still needs that information from somewhere, and in other parts of the talk the speaker discusses various debugging tools that give you a better idea of what is going on.

In the past I’ve always found it really hard to debug msbuild project files. There seem to be masses of files that get imported, and the trick has always been to pick a suitable target and work from that, using verbose logging to track what the build system does as it works.

The first trick that the talk mentions is to use the pre-processor to get all of the content into a single text file.

msbuild /pp:out.txt

That works really well. Our small project file above expands to around 10000 lines of text, which is commented so that you can see the various imports and what those imports contain. It’s really interesting looking through it to see what the build system defines.

To understand the dynamic side of things, there is a new binary logging format that the build system can dump. You can then load this into a tool to search through the execution of the build.

msbuild /bl

The structured log viewer tool makes it really easy to find your way around the build. You can search for a text string, which makes it easy to find tasks and properties, and there is a timeline that tells you when the various targets ran and how long they took to execute. It is fascinating to see how much work the system does before it actually calls the Csc task to compile a single file of C#.

I also notice that the documentation about msbuild has got better. A good starting point is the page that talks about the difference between Items and Properties.
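
As a rough illustration of the distinction (the element names here are made up): a property holds a single named value, while an item type holds a list of values that can carry metadata.

<PropertyGroup>
  <MyOutputName>tool</MyOutputName>
</PropertyGroup>

<ItemGroup>
  <MyExtraFile Include="readme.txt" />
  <MyExtraFile Include="notes.txt" />
</ItemGroup>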

msbuild has always felt like the kind of technology I would love. I like programming systems with a small core language (for msbuild that is items, properties, targets and tasks) which then build abstractions on top of that small core. This approach leaves you needing to understand only the small core language, and lets you explore the abstractions that have been built around it. For many Lisp and Smalltalk systems it is this exploration that is made easy by the tooling (the development environment), and this has always felt like the part the msbuild ecosystem was missing. Maybe these new tools will take that pain away at long last, though there still seems to be no way to single-step through the execution of a build in some kind of build debugger that would let the user change things dynamically to see the effects. [Apparently you could at one point debug build scripts using Visual Studio, though the debug flag doesn’t seem to be available any more.]


Some recent C# learnings

There are a couple of C#-related things that I’ve come across recently, and also some interesting C#-related talks from NDC, which I’ll detail below.

The first interesting thing I came across is that the names of your method parameters are semantically meaningful. I should probably have realised this in the past, but didn’t until someone at work showed me how to use them to resolve an ambiguous call.

The example was something like:

interface IA { }
interface IB { }
class C : IA, IB { }

static void Method(IA a) { }  // two overloads that are equally good matches
static void Method(IB b) { }

static void Main()
{
   Method(new C());
}

The call to Method in Main cannot be resolved to either of the possible overloads. However, if you change it to

Method(a: new C());

or

Method(b: new C());

then the call is no longer ambiguous.

I’d obviously realised in the past that you had to be careful with optional parameters, as their default values are compiled into the call site, potentially across assemblies, but I had never seen overload resolution changed by naming the parameters.
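
The optional-parameter point is worth spelling out: the default value is baked into the call site, so it can go stale across assemblies. A sketch with made-up names:

  // In LibraryAssembly, version 1:
  public static class Logger
  {
    public static void Log(string message, int level = 1) { /* ... */ }
  }

  // In CallerAssembly, the call Logger.Log("hi") is compiled as Logger.Log("hi", 1).
  // If a later version of the library changes the default to 2, already-compiled
  // callers keep passing 1 until they are rebuilt against the new version.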

The second observation is to do with inlining in a Parallel.ForEach. I had some code that simplifies to roughly the following.

static void Main(string[] args)
{
  // each call starts a Parallel.ForEach over a single-element array, which recurses
  void Recurse(int x) =>
    Parallel.ForEach(new int[] { x }, Recurse);
  Recurse(1);
}

In the real code I’d added a check, using a [ThreadStatic] variable to record the context on the thread, to make sure that we didn’t end up calling a Parallel.ForEach from inside another Parallel.ForEach.
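
The guard was something along these lines (my reconstruction rather than the original code):

  static class ForEachGuard
  {
    [ThreadStatic]
    private static bool _insideForEach;

    public static void GuardedForEach<T>(IEnumerable<T> source, Action<T> body)
    {
      if (_insideForEach)
        throw new InvalidOperationException("Nested Parallel.ForEach on this thread");

      _insideForEach = true;
      try
      {
        Parallel.ForEach(source, body);
      }
      finally
      {
        _insideForEach = false;
      }
    }
  }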

If you run the above code with a breakpoint on the Recurse method, you’ll see it reusing the stack of the original call, because the system tries to run the call inline [see TaskScheduler.TryRunInline]. However, rather than ending in a StackOverflowException, the computation is eventually moved to a different thread.

Poking around in Reflector you can find the class System.Threading.Tasks.StackGuard, which has a method named CheckForSufficientStack that uses Win32 methods to see how much stack space is left on the current thread. If there isn’t enough stack space then the recursive call isn’t inlined, but is instead moved to another thread.

I’ve also seen some good C# talks: C# 7.1 and 7.2 from NDC; high performance C# code from experience working on Bing; msbuild, which is a great introduction and covers the history of the transition to json project files and back; what is .NET Standard?; more C# 7 related performance changes; and a discussion of the future of Rx and the associated Reactor project.


So, what types of reflection are there?

I can’t remember how it came up in conversation, but the other day I remembered the famous paper by Brian Smith that introduced the idea of the reflective tower of interpreters. The paper is a little hard to read these days, but it does try to explain the notions of structural and behavioural reflection in programming languages.

In order to understand a little more deeply, I turned to two other Scheme-related papers for help. The first talks about reflection, and this is extended in the second paper to give a different semantics for the reflective tower. There is a slightly more modern implementation of the tower in Common Lisp, and some good explanations in this paper.

I also came across a number of related papers – a general discussion of reflection, and a discussion of how to allow compiled methods in this reflective-tower world. A lot of this more recent work is also connected to multi-stage programming and partial evaluation.


How do you nest your virtualization?

I was looking through some of the YouTube talks from Ignite when I came across this interesting talk on nested virtualization in Hyper-V. Since September you have been able to provision virtual machines on Azure that support nested virtualization. This is obviously a very powerful feature, and enables many scenarios (such as testing) which you couldn’t easily do before.

This made me start thinking about how you get nested virtualization to work on other platforms such as AWS. I’d come across virtualization using binary translation in the past (as that was the way that VMware did its thing back in the day), and found this fairly recent paper that talks about the technique and how the resulting virtualization can run in a cloud environment.

That then leads to the question of whether a software implementation can compare with hardware-assisted virtualization, and there are papers such as this one that study the problem. Hardware support on Intel requires the so-called VT-x extensions, which are available on more modern processors and which make things a lot easier for the implementation.


Bitcoin – where does it go next?

Attack of the 50 Foot Blockchain: Bitcoin, Blockchain, Ethereum & Smart Contracts by David Gerard

I did a fair bit of reading about Bitcoin in the past (after my interest had been piqued by the Coursera course on the topic), and have spent some time following the various newsgroups and issues, but have been troubled about whether Bitcoin stands a chance of succeeding in the real world.

This book is really good. It takes a strongly anti-Bitcoin (and associated technologies) stance, and puts forward good arguments about why Bitcoin is a massive fad. As usual the truth is probably somewhere in the middle, but the author’s arguments about the troubles of scaling Bitcoin really make it seem useless – it can handle 7 transactions a second compared to Visa’s 50,000, many vendors gave up on it because of lack of interest, and it takes so long to verify a transaction that it is an impractical way of buying things for many types of purchase.

The author also gives some examples of where smart contracts have turned out to be anything but smart. He points out that legal contracts all suffer from interpretation and the regular need for arbitration, and so any kind of contract whose meaning is defined by a segment of code is never going to work at the edges where the contract depends on inputs from the real world.

There is also a good set of arguments as to why private blockchains fail to hit the mark. The current blockchain burns as much power as Ireland, with the proof of work essentially being used to randomise which miner gets to add the next block. Once you move to a private blockchain, you reintroduce the centralisation that blockchains were meant to remove, losing one of their main selling points, so you might just as well go back to a database instead… indeed, lots of people do not want their transactions listed in detail on a public medium, so any kind of global ledger is unlikely to gain traction.

In summary, a thought-provoking, short read. Like many things, the technology is clever and it is really a question of whether there is a place for it in the real world.


Some books since last time

For some reason I just haven’t got around to blogging for a long while, but fortunately I have had time to read a fair number of computing-related books, which I thought I would write up here. There are a couple of management-related books thrown in for good measure.

Managing Humans by Michael Lopp

This is a collection of stories about the author’s experiences managing development teams. A fun, humorous read, which made it even clearer that management is a lot about applying common sense to a range of activities.

Troubleshooting with the Windows Sysinternals Tools by Mark Russinovich and Aaron Margosis

The sysinternals tools are amazing, with specific tools offering information that would be hard to dig out yourself. This book takes you through the various tools one-by-one and tells you many of the lesser known features of the tool. The book contains a large section of case studies on how the tools were used to diagnose a wide range of problems.

Smalltalk-80: Bits of History, Words of Advice edited by Glenn Krasner

I remember how much I enjoyed this book when I first read it 30 years ago. Lots of chapters written by different people on topics around the early Smalltalk systems, including details about implementations, porting efforts to get the standard image to run on diverse sets of hardware, and discussions of improvements to the system for the future.

Understanding Computation by Tom Stuart

This book uses implementation in Ruby as a way of understanding computation, from formal semantics to automata theory to Turing machines. And you get to learn some Ruby along the way. I really enjoyed this book. The writing is engaging, and you never really understand something until you implement it.

The One Device – the secret history of the iPhone by Brian Merchant

A very interesting read on the history of the iPhone. The book has various chapters on the various components, such as the battery and the screen, giving lots of interesting background about each area.

Hit Refresh – The Quest to Rediscover Microsoft’s Soul and Imagine a Better Future for Everyone by Satya Nadella

An interesting, part-autobiographical read about Microsoft’s CEO. We learn some details about Nadella’s early years and the groups that he worked with when he joined Microsoft. There are chapters that talk about where Microsoft is going in the future – mixed reality, artificial intelligence and quantum computing. I’m not sure I learned as much from the book as I had hoped, but it is worth a quick read.

Debugging Applications for Microsoft .NET and Microsoft Windows by John Robbins

This book is now fairly old and I was lucky to find a copy for one pound in a charity shop. Some good advice on debugging, and some nice debugging war stories, from a renowned conference speaker. Some of the .NET related material is a little out of date, but there’s still enough information to make this a fun read.

Developing Windows NT Device Drivers by Edward Dekker and Joseph Newcomer

Again a fairly old book that you can pick up quite cheaply. Lots of insights into the world of device drivers, and along the way into the implementation of the Windows operating system. A very good read.

Type-driven Development with Idris by Edwin Brady

There are more and more discussions of dependent types on the various news feeds that I subscribe to. Idris is a Haskell-like language designed for type-driven development, a style in which dependent types are used to specify the desired solution and the system attempts some primitive theorem proving to verify the dependent-type constraints. The book is a good introduction to the idea and to the language itself.

