Stack allocated closures make it into C#

Back in the days when I worked on Lisp compilers, I remember adding stack allocation of the closure records when calling local functions. In C# 7, we now have local functions and it is interesting to look at the optimisations that are applied for these.

As a baseline, let’s have a quick look at the implementation of lambda expressions that close over variables of the current method. The implementation, which has been around for a long time, re-homes the locals into a heap-allocated (Display) class. The reference from the lambda expression then governs the lifetime of the variables, so they can outlive the call that created them.

        static Func<int,int,int> Check(int a, int b)
        {
            return (x, y) => (x + y + a + b);
        }

This is converted into code that has the following form. “a” and “b” have been re-homed into the heap-allocated instance.

private static Func<int, int, int> Check(int a, int b)
{
    <>c__DisplayClass1_0 class_ = new <>c__DisplayClass1_0();
    class_.a = a;
    class_.b = b;
    return new Func<int, int, int>(class_.b__0);
}

The DisplayClass has the following definition, where we see the fields corresponding to the captured variables. The body of the lambda is emitted as a method of this class too.

[CompilerGenerated]
private sealed class <>c__DisplayClass1_0
{
    public int a;
    public int b;

    internal int b__0(int x, int y)
    {
        return (((x + y) + this.a) + this.b);
    }
}
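
One observable consequence of this re-homing is that a captured variable is a field on a shared object rather than a copy. The following is a minimal sketch of that behaviour; the names `SharedCaptureDemo`, `MakeCounter` and `Run` are mine and do not come from the compiler output.

```csharp
using System;

static class SharedCaptureDemo
{
    // Both lambdas capture the same local, so `count` is re-homed into one
    // heap-allocated display class instance that both delegates reference.
    public static (Action Increment, Func<int> Read) MakeCounter()
    {
        int count = 0;
        return (() => count++, () => count);
    }

    public static int Run()
    {
        var (increment, read) = MakeCounter();
        increment();
        increment();
        // `count` has outlived MakeCounter's stack frame and saw both updates.
        return read();
    }
}
```

Calling `SharedCaptureDemo.Run()` returns 2, which would not be possible if each lambda held its own copy of the variable.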

Local functions take us to code that has the following form.

        static Func<int, int, int> Check2(int a, int b)
        {
            return Local;
            
            int Local(int x, int y)
            {
                return (x + y + a + b);
            }
        }

The generated code is only slightly different: the delegate now points at a method named after the local function.

private static Func<int, int, int> Check2(int a, int b)
{
    <>c__DisplayClass2_0 class_ = new <>c__DisplayClass2_0();
    class_.a = a;
    class_.b = b;
    return new Func<int, int, int>(class_.g__Local|0);
}

We have the same style of DisplayClass, with the body of the local added as a method (as expected).

[CompilerGenerated]
private sealed class <>c__DisplayClass2_0
{
    public int a;
    public int b;

    internal int g__Local|0(int x, int y)
    {
        return (((x + y) + this.a) + this.b);
    }
}

However, there are now more optimisation possibilities. First, if the local function is only ever called directly within the method in which it is defined, and is never converted to a delegate, then it would be good to avoid the heap allocation.

        static int Check3(int a, int b)
        {
            return Local(1,2) + Local(3,4);

            int Local(int x, int y)
            {
                return (x + y + a + b);
            }
        }

This is indeed what happens.

private static int Check3(int a, int b)
{
    <>c__DisplayClass3_0 class_;
    class_.a = a;
    class_.b = b;
    return (g__Local|3_0(1, 2, ref class_) + g__Local|3_0(3, 4, ref class_));
}

The DisplayClass has been optimised to a struct

[CompilerGenerated]
private struct <>c__DisplayClass3_0
{
    public int a;
    public int b;
}

and the body has been added as a static method of the class containing the method that declares the local

[CompilerGenerated]
internal static int g__Local|3_0(int x, int y, ref <>c__DisplayClass3_0 class_Ref1)
{
    return (((x + y) + class_Ref1.a) + class_Ref1.b);
}
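
To make the shape of this lowering easier to read, here is a hand-written equivalent using legal C# identifiers. It is a sketch of the pattern only: `ManualLowering`, `Env` and `Local` are names I have made up as stand-ins for the compiler-generated ones.

```csharp
static class ManualLowering
{
    // Stand-in for the compiler-generated struct holding the captured values.
    private struct Env
    {
        public int a;
        public int b;
    }

    // Stand-in for g__Local|3_0: a static method that receives the
    // environment by reference instead of through `this`.
    private static int Local(int x, int y, ref Env env)
    {
        return x + y + env.a + env.b;
    }

    public static int Check3(int a, int b)
    {
        Env env;      // a local of struct type, so no `new` and no heap object
        env.a = a;
        env.b = b;
        return Local(1, 2, ref env) + Local(3, 4, ref env);
    }
}
```

`ManualLowering.Check3(10, 20)` evaluates to 70, the same value the local-function version produces. Passing the struct by `ref` matters: it lets the local function see (and, if it wrote to them, update) the caller's variables rather than a copy.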

The compiler has essentially noticed that the local method cannot escape from the method that uses it, and hence we can try to avoid the heap allocation.
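
If you want to see the difference on your own machine, one rough way is to compare the bytes allocated by the two styles. The sketch below assumes a runtime that provides `GC.GetAllocatedBytesForCurrentThread` (.NET Core 3.0 or later, or .NET Framework 4.8); the `AllocationProbe` class and its method names are hypothetical.

```csharp
using System;

static class AllocationProbe
{
    // Local function that is only called directly: candidate for the
    // struct-based lowering described above.
    public static int WithLocalFunction(int a, int b)
    {
        return Local(1, 2) + Local(3, 4);

        int Local(int x, int y) => x + y + a + b;
    }

    // Capturing lambda stored in a delegate: uses a display class.
    public static int WithLambda(int a, int b)
    {
        Func<int, int, int> f = (x, y) => x + y + a + b;
        return f(1, 2) + f(3, 4);
    }

    // Returns the managed bytes allocated on this thread by one call to `body`.
    public static long Measure(Func<int> body)
    {
        body(); // warm up first so one-time setup is not counted
        long before = GC.GetAllocatedBytesForCurrentThread();
        body();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

For example, compare `AllocationProbe.Measure(() => AllocationProbe.WithLocalFunction(10, 20))` with `AllocationProbe.Measure(() => AllocationProbe.WithLambda(10, 20))`. I would expect the first to report fewer bytes than the second, but the exact numbers depend on the runtime and JIT version, so treat them as indicative rather than definitive.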

We should also quickly look at the case where the local method doesn’t capture any locals.

        static int Check4(int a, int b)
        {
            return Local(1, 2) + Local(3, 4);

            int Local(int x, int y)
            {
                return (x + y);
            }
        }

In this case, the method compiles to the following

private static int Check4(int a, int b)
{
    return (g__Local|4_0(1, 2) + g__Local|4_0(3, 4));
}

and the local method is simply defined as a static method in the defining class

[CompilerGenerated]
internal static int g__Local|4_0(int x, int y)
{
    return (x + y);
}

While we are here we could quickly cover one memory management gotcha around closures and their implementation.

        static (Func<int,int,int>, Func<int,int,int>) Check(int a, int b)
        {
            return ((x, y) => (x + y + a), (x, y) => (x + y + b));
        }

The implementation decides to put the local variables into a single DisplayClass

private static ValueTuple<Func<int, int, int>, Func<int, int, int>> Check(int a, int b)
{
    <>c__DisplayClass1_0 class_ = new <>c__DisplayClass1_0();
    class_.a = a;
    class_.b = b;
    return new ValueTuple<Func<int, int, int>, Func<int, int, int>>(new Func<int, int, int>(class_.b__0), new Func<int, int, int>(class_.b__1));
}

This means that if either of the returned lambda expressions is alive (from the point of view of the GC), then both variables “a” and “b” are still alive, even though each lambda uses only one of them. This might not seem to matter too much for a pair of ints, but if “a” and “b” were references to large objects (for example), it does mean that their lifetime can be extended further than you might expect.
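
The effect can be observed with a weak reference. This is a sketch under the assumption that the compiler places both captures in one display class, as in the output above; `SharedLifetimeDemo` and its members are names I have invented for the illustration.

```csharp
using System;
using System.Runtime.CompilerServices;

static class SharedLifetimeDemo
{
    // Same shape as the Check method above, but `a` is a large array.
    static (Func<int>, Func<int>) Make(byte[] a, int b)
    {
        return (() => a.Length, () => b);
    }

    // Kept out of line so no local in the caller's frame refers to the array.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static (WeakReference, Func<int>) MakeAndDropFirst()
    {
        var big = new byte[10_000_000];
        var (_, second) = Make(big, 42);
        // Only the lambda that never touches `a` is handed back.
        return (new WeakReference(big), second);
    }

    public static bool BigArrayStillAlive()
    {
        var (weak, second) = MakeAndDropFirst();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        bool alive = weak.IsAlive;
        GC.KeepAlive(second); // `second` is reachable for the whole test
        return alive;
    }
}
```

`SharedLifetimeDemo.BigArrayStillAlive()` returns true: the surviving delegate references the shared display class, which in turn still references the array. If this matters, one workaround is to copy the values each lambda needs into variables declared in separate scopes, so that they end up in separate closure objects.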
