Designing systems for scalability

I’ve been doing some reading on designing systems for scalability, and I thought I could quickly post some of the useful YouTube videos that I have found. There are numerous system design problems and solutions that have videos on YouTube, but I haven’t included the ones that I have watched.

Eventually I came across this video on system design, which gives a good list of the various technologies that are used in some of the most scalable applications available today.

This is an introduction to how Twitter is implemented, and mentions ideas like fanning-out to Redis and Memcached. There are also videos about Facebook and Instagram.

The choice of database is obviously important, and it is useful to understand the in-memory databases like Redis. Transactions also come up, via myths and surprises, and how the transaction levels relate to the CAP theorem.

Uber deal with some of the reliability data by storing data on their drivers’ mobile phones.

GraphQL came up several times as an alternative to REST APIs. It often requires fewer round trips, and makes tool support easy by using a schema. There is an introduction here and the coding of a server (which explains what you can do about the N+1 problem using an online demo system).
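To make the N+1 problem concrete, here is a small Python sketch. It is not taken from the video: `FakeDb`, its two methods and the sample data are hypothetical stand-ins for real database calls. Resolving each post's author individually costs one query per post, whereas collecting the keys first and resolving them in one batch costs a single query.

```python
class FakeDb:
    """Hypothetical in-memory database that counts round trips."""

    def __init__(self, authors):
        self.authors = authors
        self.queries = 0

    def author(self, author_id):
        # One query per call: this is what produces the N in N+1.
        self.queries += 1
        return self.authors[author_id]

    def authors_by_ids(self, author_ids):
        # One query for a whole batch of keys.
        self.queries += 1
        return {i: self.authors[i] for i in author_ids}


def resolve_naive(db, posts):
    """Look up the author of each post separately."""
    return [db.author(p["author_id"]) for p in posts]


def resolve_batched(db, posts):
    """Collect the distinct keys, fetch them once, then fan the results out."""
    by_id = db.authors_by_ids({p["author_id"] for p in posts})
    return [by_id[p["author_id"]] for p in posts]


AUTHORS = {1: "ada", 2: "grace", 3: "linus"}
POSTS = [{"id": 10, "author_id": 1}, {"id": 11, "author_id": 2},
         {"id": 12, "author_id": 1}, {"id": 13, "author_id": 3}]
```

Both functions return the same list of names; only the number of round trips differs.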

There is a good general talk about lessons learned here.

I had heard about Bloom Filters before, but hadn’t come across the Count-min sketch algorithm.
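For anyone else meeting it for the first time, here is a minimal Python sketch of a Count-min sketch. The width, depth and the use of salted SHA-256 hashes are my own illustrative choices rather than anything from the video. Each item increments one counter per row, and the estimate is the minimum of those counters, so collisions can make the answer too high but never too low.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: estimates never undercount, but may overcount."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One column per row, derived from a hash salted with the row number.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._columns(item):
            self.rows[row][col] += count

    def estimate(self, item):
        # Collisions only ever inflate a cell, so the minimum is the best guess.
        return min(self.rows[row][col] for row, col in self._columns(item))
```

The memory used is fixed by `width * depth` regardless of how many distinct items are added, which is the attraction for high-volume streams.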

 

Posted in Uncategorized | Leave a comment

Kubernetes is the winner

Kubernetes: Up & Running by Kelsey Hightower, Brendan Burns and Joe Beda

Everywhere you go these days, it’s all about containers and how they should be orchestrated. Software Engineering Daily had a great series about several container management systems, and so it was time to get the book about Kubernetes, by several of the founders of the project. There is a recent blog post on the history of the project here.

The book itself is really good. It explains the need for an orchestration framework, and demonstrates the various parts of the Kubernetes system. It starts by showing you how to deploy a Kubernetes cluster and works through the use of the kubectl commands. It moves on to explain pods, and the labels and annotations that you can attach to the pods and other objects that are being managed. This is very hands-on, working against a demonstration container that the authors have made available.

The following chapters cover service discovery, ReplicaSets, DaemonSets, Jobs and ConfigMaps, and then there is a chapter that covers deployments and upgrades. The last two chapters cover how you integrate storage with your applications and how to deploy some real-world applications.

The book, as you would expect, covers the material really well. If you want to try the material out on the Azure cloud, the Azure documentation contains some worked tutorials.

If you need to understand Docker a little better, then I found this post useful. Ben Hall also did a recent talk on other container technologies. A competing idea is serverless, and there is a recent paper that looks at the implementation behind this for the three major cloud platforms.

 


What are micro-tasks in the browser all about?

I gave a quick lightning talk at work about micro-tasks in the browser, based on a recent talk by Jake Archibald. The slides are available here.
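The core rule is easy to model outside the browser. The following Python toy is my own sketch, not code from the talk or the slides, and `ToyEventLoop` is a hypothetical name. It runs one task at a time and drains the whole micro-task queue before moving on to the next task, which is why promise callbacks run before a zero-delay timeout queued earlier.

```python
from collections import deque


class ToyEventLoop:
    """Toy model: run one task, then drain every queued micro-task."""

    def __init__(self):
        self.tasks = deque()
        self.microtasks = deque()

    def queue_task(self, fn):        # roughly setTimeout(fn, 0)
        self.tasks.append(fn)

    def queue_microtask(self, fn):   # roughly Promise.resolve().then(fn)
        self.microtasks.append(fn)

    def run(self):
        while self.tasks:
            self.tasks.popleft()()
            # Micro-tasks queued while draining also run before the next task.
            while self.microtasks:
                self.microtasks.popleft()()


log = []
loop = ToyEventLoop()


def script():
    log.append("script start")
    loop.queue_task(lambda: log.append("timeout"))
    loop.queue_microtask(lambda: log.append("promise 1"))
    loop.queue_microtask(lambda: log.append("promise 2"))
    log.append("script end")


loop.queue_task(script)
loop.run()
print(log)
```

The log ends up as script start, script end, promise 1, promise 2, timeout. A real browser has several task sources and rendering steps that this toy ignores.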


Let’s get started with Docker

Essential Docker for ASP.NET Core MVC by Adam Freeman

We are allowed to spend time at work on a Friday afternoon exploring new technologies, so a colleague and I decided to work through this book. Microsoft have recently started supporting Docker running on Windows, and I thought this would be an interesting way to see how well the Windows Docker ecosystem has been progressing. Also, this book targets ASP.NET Core 1.1 and I wanted to see if things were easier with the latest 2.1 version of ASP.NET Core.

The first two chapters in the book are a really brief introduction to Docker, followed by a list of the docker utility’s commands.

Installing Docker on Windows was really easy, requiring us only to run an installer. We did have to turn on Hyper-V for Docker to use. This clashed with the Oracle VirtualBox that we typically use for testing, but fortunately I had a spare machine on which I could leave it turned on.

In chapter four of the book you write a fairly simple ASP.NET Core application which you then publish.

dotnet publish --framework netcoreapp2.0 --configuration Release --output dist

This application is then copied into a Docker image as part of the Dockerfile

FROM microsoft/aspnetcore:2.0.3
COPY dist /app
WORKDIR /app
EXPOSE 80/tcp
ENV ASPNETCORE_URLS http://+:80
ENTRYPOINT ["dotnet", "dockerplay.dll"]

which we can then use to build a Docker image.

docker build . -t apress/exampleapp -f Dockerfile

The next chapter of the book deals with Volumes and Software Defined Networking. Volumes allow you to define some storage which can be attached to a container – this allows the container to run an application that writes to the file system to store its state, say a database. When we need to rebuild the container we can then re-attach the file system to the new container, and hence not lose any data.

This is where we diverged a little from the book. The book aims at Linux and MySQL, whereas we wanted to use SQL Server running on Windows.

For this we pulled a pre-built image containing SQL Server.

docker pull microsoft/mssql-server-windows-express

And then used a volume to store the state.

docker volume create --name testdata

docker run -d -p 7002:1433 -e sa_password=ffddfdfdfdfd -e ACCEPT_EULA=Y -v testdata:c:\data microsoft/mssql-server-windows-express

The book moves on to SDN and the demo application uses two different network segments – one for the frontend and one for the backend. In the book, a proxy is used to load balance across the three servers that are set up.

Unfortunately there was no HAProxy build that would run in a Windows container, so we decided to use NGINX. Again, we had to build our own container for this, and I couldn’t build on Nano Server (because my Windows drive had become corrupted).

 

FROM microsoft/windowsservercore
ENV VERSION 1.13.9

SHELL ["powershell", "-command"]
RUN Invoke-WebRequest -Uri http://nginx.org/download/nginx-1.13.9.zip -OutFile c:\nginx-$ENV:VERSION-win64.zip; \
	Expand-Archive -Path C:\nginx-$ENV:VERSION-win64.zip -DestinationPath C:\ -Force; \
	Remove-Item -Path c:\nginx-$ENV:VERSION-win64.zip -Confirm:$False; \
	Rename-Item -Path c:\nginx-$ENV:VERSION -NewName nginx

# Make sure that Docker always uses the default DNS servers, which are hosted by dockerd.exe
RUN Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\Dnscache\Parameters' -Name ServerPriorityTimeLimit -Value 0 -Type DWord; \
	Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\Dnscache\Parameters' -Name ScreenDefaultServers -Value 0 -Type DWord; \
	Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\Dnscache\Parameters' -Name ScreenUnreachableServers -Value 0 -Type DWord
	
# Shorten DNS cache times
RUN Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\Dnscache\Parameters' -Name MaxCacheTtl -Value 30 -Type DWord; \
	Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Services\Dnscache\Parameters' -Name MaxNegativeCacheTtl -Value 30 -Type DWord

COPY nginx.conf c:/nginx/conf

WORKDIR /nginx
EXPOSE 80
CMD ["nginx", "-g", "\"daemon off;\""]

We had to write a config file that knew about the three instances that we wanted to load balance across:

#user  nobody;
worker_processes  1;

error_log  logs/error.log;
error_log  logs/error.log  notice;
error_log  logs/error.log  info;

#pid        logs/nginx.pid;


events {
    worker_connections  1024;
}


http {
    upstream myapp1 {
        server dockerplay_mvc_1;
        server dockerplay_mvc_2;
        server dockerplay_mvc_3;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://myapp1;
        }
    }
}

We could run the various commands documented in the book to start the instances and add them to the right network. We could then load balance using NGINX, and refreshing the web page showed that requests were being served by different instances at different times.

[There is a little too much hardwired in by name for my taste. The SDN inside docker runs a DNS that lets you look up other containers by name to get their IP address]

The next chapter of the book looks at Docker Compose. This gives you a way to wire things up using a single configuration file.

version: "3"

volumes:
  testdata:

networks:
  frontend2:
  backend2:

services:

  sqlexpress2:
    image: "microsoft/mssql-server-windows-express"
    volumes: 
      - testdata:c:\data
    networks: 
      - backend2
    environment:
      - sa_password=fddfdfdfsff
      - ACCEPT_EULA=Y

  dbinit:
    build:
      context: .
      dockerfile: Dockerfile
    networks:
      - backend2
    environment:
      - INITDB=true
      - DBHOST=sqlexpress2
      - DBPORT=1433
    depends_on:
      - sqlexpress2

  mvc:
    build:
      context: .
      dockerfile: Dockerfile
    networks:
      - backend2
      - frontend2
    environment:
      - DBHOST=sqlexpress2
      - DBPORT=1433
    depends_on:
      - sqlexpress2
    ports: 
      - 4020:4020 
      - 4021:4021

  loadbalancer:
    image: nginx
    build:
      context: ..\nginx
      dockerfile: Dockerfile
    ports: 
      - 8112:80
    networks:
      - frontend2

This is a really neat technology, allowing you to scale the various components up and down. Unfortunately for us, we didn’t have an easy way to reconfigure the load balancer when the scaling happens. In the book, the load balancer configuration has “links” and “volumes” lines that allow Compose to pass details of the instances of the load balanced service. We didn’t have time to look into this.
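One low-tech way around this, which we did not try, would be to regenerate the upstream block whenever the number of instances changes. The Python sketch below is only an illustration under two assumptions: the `render_upstream` and `instance_names` helpers are hypothetical, and the instances are assumed to follow the project_service_N naming seen in the nginx.conf above. Writing the file out and reloading NGINX are left out.

```python
def render_upstream(name, servers):
    """Return an nginx upstream block listing one server per instance."""
    if not servers:
        raise ValueError("an upstream block needs at least one server")
    lines = [f"upstream {name} {{"]
    lines += [f"    server {host};" for host in servers]
    lines.append("}")
    return "\n".join(lines)


def instance_names(project, service, count):
    """Assumed naming scheme: project_service_1 .. project_service_count."""
    return [f"{project}_{service}_{n}" for n in range(1, count + 1)]


# Regenerate the block for a service that has been scaled to three instances.
print(render_upstream("myapp1", instance_names("dockerplay", "mvc", 3)))
```

Generating the text is the easy half; the harder part is noticing that the scale changed, which is what the book's approach gets from Compose.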

The next chapter in the book looks at Docker Swarm. There was no equivalent on Windows, so we didn’t try it.

The last chapter of the book looks at allowing debugger access into the container. Visual Studio can do this if you run the appropriate components, but we didn’t try too hard to get this working. Later versions of Visual Studio can build containers and automatically configure them to allow debugger access.

I think our main observation was that Windows Docker seems to be a long way behind Docker on Linux.

The book was good as a set of instructions to follow, with the brief explanations helping a little to understand what was going on. Using a book that was a version behind was a good way of forcing us to debug and understand what was happening a little better.

On a related note, there’s an interview that discusses Service Fabric which is used to run loads of the Azure infrastructure.


Reactive extensions in action

Rx.NET In Action by Tamir Dresher

The reactive extensions have been around for a long time. I remember coming across them in C# something like a decade ago, but I don’t think I’ve seen a book or documentation that covers the whole of the implementation – sure, people spend a lot of time talking about the various combinators and hot and cold observables, but they don’t spend much time talking about schedulers and the threading model that sits below the system.

Part one of the book, which consists of three chapters, gives a basic introduction to Reactive programming, and also covers some of the C# you need to make use of the Rx libraries.

By way of some examples, the first chapter introduces us to the idea of making events a first class object,  the IObservable interface and its duality to the IEnumerable, and points out the differences between the push and pull models of event delivery. The author goes on to look at the properties described in the Reactive Manifesto. We are also introduced to marble diagrams, which allow us to visualise the various interactions.

Chapter two takes us through a “Hello, Rx” application. This time it isn’t Google suggest, which used to be the canonical example that is used in various write ups. In this book we look at a stock tracker application. This allows the author to cover how standard classes of .NET events can be converted easily into event streams, and the author gets a chance to talk a little about the threading concerns. I think that’s great, as threading is often hidden under the covers in tutorials, but as soon as you want the events to be processed by a GUI you get into the GUI library’s threading requirements.

Chapter three covers functional thinking in C#. Rx.NET encourages you to handle a pipeline for processing, with events feeding into the top of the pipeline, various filtering and processing happening in the middle, and then elements subscribing to the resulting output of the pipeline. This is a mechanism that the functional style handles very well.

The second part of the book has chapters on various Rx.NET concepts.

Chapter four starts with creating observables, which is demonstrated by writing an observer that logs the received events to the console (and we’ll use this observer throughout the book). Of course writing things yourself gives you a chance to break the protocol of the IObservable – in particular the protocol that the messages flow as:

(OnNext) * (OnError | OnCompleted)
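The grammar above can be checked mechanically. As a language-neutral illustration, here is a small Python sketch (the function name and the string encoding of notifications are my own, not part of Rx) that accepts any number of OnNext messages followed by at most one terminal message, and rejects anything that arrives after the stream has terminated. A sequence that has not terminated yet is treated as valid so far.

```python
def follows_rx_grammar(notifications):
    """True if the sequence is OnNext* optionally ended by OnError or OnCompleted."""
    terminated = False
    for kind in notifications:
        if terminated:
            return False          # nothing may follow a terminal message
        if kind in ("OnError", "OnCompleted"):
            terminated = True
        elif kind != "OnNext":
            raise ValueError(f"unknown notification: {kind}")
    return True
```

A hand-written observable that calls OnNext after OnCompleted, or calls both OnError and OnCompleted, would fail this check.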

It is therefore often better to use the Rx library’s helpers for defining your own classes, so the author points to the ObservableBase class which makes it easy for you to define your own named types, or better still there are many overloads on Observable.Create to avoid the need to name a new type.

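            // The element type has to be given explicitly to Observable.Create.
            var ob = Observable.Create<int>(observer => 
            {
                Console.WriteLine("Started");
                // The delay must be awaited, otherwise the value is pushed straight away.
                Task.Run(async () => { await Task.Delay(TimeSpan.FromSeconds(2)); observer.OnNext(2); observer.OnCompleted(); });
                // The returned action runs when the subscription is disposed.
                return () => { Console.WriteLine("Finished"); };
            });

            ob.Subscribe(x => Console.WriteLine(x));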

This chapter also looks at converting the various .NET event styles to observables, and looks at converting from Enumerable to Observable and back again. We also see some of the more primitive observables that handle looping and single values.

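            // Generate takes: initial state, continue condition, next state, result selector.
            var evensBelow50 = Observable.Generate(0, x => x < 50, state => state + 2, v => v);
            var singleValue = Observable.Return(10);
            // No value fixes the element type for these three, so int is assumed here.
            var neverFinish = Observable.Never<int>();
            var empty = Observable.Empty<int>();
            var _ = Observable.Throw<int>(new Exception("Bang"));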

Chapter five covers how you make observables from asynchronous code. It starts with looking at async friendly versions of Observable.Create

            var ob = Observable.Create<int>(async (observer, ct) => 
            {
                Console.WriteLine("Started");
                ct.Register(() => Console.WriteLine("Finished"));
                await Task.Delay(TimeSpan.FromSeconds(2));
                ct.ThrowIfCancellationRequested();
                observer.OnNext(2);
                ct.ThrowIfCancellationRequested();
                observer.OnCompleted(); 
            });

And then looks at the conversions between Task and Observable handled by the ToObservable method, and how SelectMany and Concat can be used to link different computations together.

Chapter six looks at the observer/observable relationship, in particular how to delay and re-subscribe to the observable. We walk through the DelaySubscription method and various other operators like SkipWhile and TakeUntil.

            var ob = Observable.Range(1, 5).Do(x => Console.WriteLine(x));

Some of the ideas are put together in a drawing application where the code tracks the mouse, and the mouse button up and down events lead to event streams starting and ending.

Chapter seven looks at controlling the temperature of observables. Observables can be categorised as hot or cold. Here cold means that the observable replays a set of events to each subscriber, whereas a hot observable only plays new events. In the case of the hot observable, if you weren’t subscribed when the event happened, then you don’t get to see it.
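The distinction is easy to see in a stripped-down model. This Python sketch is not Rx and not from the book; `ColdObservable` and `HotObservable` are hypothetical classes that ignore completion, errors and unsubscription, and only show who gets to see which events.

```python
class ColdObservable:
    """Replays the same stored events to every subscriber."""

    def __init__(self, events):
        self._events = list(events)

    def subscribe(self, on_next):
        for event in self._events:
            on_next(event)


class HotObservable:
    """Pushes each event only to whoever is subscribed at that moment."""

    def __init__(self):
        self._observers = []

    def subscribe(self, on_next):
        self._observers.append(on_next)

    def emit(self, event):
        for on_next in list(self._observers):
            on_next(event)


cold = ColdObservable([1, 2, 3])
early, late = [], []
cold.subscribe(early.append)
cold.subscribe(late.append)      # still sees 1, 2, 3

hot = HotObservable()
hot_early, hot_late = [], []
hot.subscribe(hot_early.append)
hot.emit(1)
hot.subscribe(hot_late.append)   # subscribed too late to see event 1
hot.emit(2)
```

Both cold subscribers receive the full sequence, while the late hot subscriber only receives the event emitted after it subscribed.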

We start with an ISubject. Instances of this interface can act as an observer and as an observable, and Rx provides four types that implement this interface – Subject, AsyncSubject, ReplaySubject and BehaviorSubject. The book covers what all of these subjects do, and how they can be used to proxy hot and cold observables to give you something with various interesting behaviours.
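As a rough guide to how three of these differ for a late subscriber, here is a simplified Python model. It is my own sketch rather than the library's implementation: completion and errors are omitted, AsyncSubject (which only delivers the final value once the stream completes) is left out, and `late_subscriber_sees` is a hypothetical helper.

```python
class Subject:
    """Forwards only the values that arrive after you subscribe."""

    def __init__(self):
        self._observers = []

    def subscribe(self, on_next):
        self._observers.append(on_next)

    def on_next(self, value):
        for observer in list(self._observers):
            observer(value)


class ReplaySubject(Subject):
    """Remembers past values and replays them to late subscribers."""

    def __init__(self):
        super().__init__()
        self._history = []

    def subscribe(self, on_next):
        for value in self._history:
            on_next(value)
        super().subscribe(on_next)

    def on_next(self, value):
        self._history.append(value)
        super().on_next(value)


class BehaviorSubject(Subject):
    """Hands the latest value (or the initial one) to each new subscriber."""

    def __init__(self, initial):
        super().__init__()
        self._latest = initial

    def subscribe(self, on_next):
        on_next(self._latest)
        super().subscribe(on_next)

    def on_next(self, value):
        self._latest = value
        super().on_next(value)


def late_subscriber_sees(subject):
    """Push 1 and 2, subscribe, then push 3; return what the subscriber saw."""
    seen = []
    subject.on_next(1)
    subject.on_next(2)
    subject.subscribe(seen.append)
    subject.on_next(3)
    return seen
```

With this helper a plain Subject yields only 3, a ReplaySubject yields 1, 2, 3 and a BehaviorSubject yields 2, 3.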

Chapters eight and nine go through the many operators, from Max and Count all the way through to operators for partitioning an incoming event stream into a set of windowed buffers.
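To give a flavour of the buffering operators, here is a count-based buffer written as a Python generator. This is a pull-based sketch over a finite iterable in the spirit of Rx's Buffer operator, not the operator itself, and `buffer_with_count` is a name I made up.

```python
def buffer_with_count(events, count):
    """Split events into consecutive lists of `count`; the last may be shorter."""
    if count < 1:
        raise ValueError("count must be at least 1")
    window = []
    for event in events:
        window.append(event)
        if len(window) == count:
            yield window
            window = []
    if window:            # flush the partial window when the input ends
        yield window
```

For example, buffering the numbers 1 to 5 in twos gives [1, 2], [3, 4] and then [5]. The real operators also offer time-based and sliding variants, which need a scheduler and are beyond this sketch.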

Chapter ten talks about concurrency and synchronisation, and is the best explanation I have read of this side of the Rx world. There are many types of IScheduler that are implemented by the library, ranging from a scheduler that uses threads from the thread pool to schedulers that hijack the current thread and don’t return until a series of actions have finished.

The last chapter talks about error handling and recovery, and it also touches on the subject of backpressure. It is also very good and informative.

It is worth also mentioning that the book has three appendixes – some general coverage of asynchronous programming in .NET, a section on the Disposables that the Rx library offers and a section on testing Rx which talks about how you might Unit Test your code and use test schedulers to control the execution.

It was really good to have a single place that covered all of this material. Typically you can find some of this in blog posts spread all over the internet, but having it in a consistent story that develops over eleven chapters is brilliant.

I also noticed that the pre-release System.Reactive NuGet package contains code around the IQbservable interface. It will be interesting to see where that goes in the future.


Stack allocated closures make it into C#

Back in the days when I worked on Lisp compilers, I remember adding stack allocation of the closure records when calling local functions. In C# 7, we now have local functions and it is interesting to look at the optimisations that are applied for these.

Just for a baseline, let’s have a quick look at the implementation of lambda expressions which close over variables in the current method. The implementation, which has been around for a long time, re-homes the locals into a heap allocated (Display) class. This extends the lifetime of the variables, allowing the reference from the lambda expression to govern their lifetime.

        static Func<int,int,int> Check(int a, int b)
        {
            return (x, y) => (x + y + a + b);
        }

This is converted into code that has the following form. “a” and “b” have been re-homed into the heap allocated instance.

private static Func<int, int, int> Check(int a, int b)
{
    <>c__DisplayClass1_0 class_ = new <>c__DisplayClass1_0();
    class_.a = a;
    class_.b = b;
    return new Func<int, int, int>(class_.<Check>b__0);
}

The DisplayClass has the following definition, where we see the fields corresponding to the captured variables. The body of the lambda is encoded as a method of this class too.

[CompilerGenerated]
private sealed class <>c__DisplayClass1_0
{
    public int a;
    public int b;

    internal int <Check>b__0(int x, int y)
    {
        return (((x + y) + this.a) + this.b);
    }
}

Local functions take us to code that has the following form.

        static Func<int, int, int> Check2(int a, int b)
        {
            return Local;
            
            int Local(int x, int y)
            {
                return (x + y + a + b);
            }
        }

The code generated for this is slightly different:

private static Func<int, int, int> Check2(int a, int b)
{
    <>c__DisplayClass2_0 class_ = new <>c__DisplayClass2_0();
    class_.a = a;
    class_.b = b;
    return new Func<int, int, int>(class_.<Check2>g__Local|0);
}

We have the same style of DisplayClass, with the body of the local function added as a method (as expected).

[CompilerGenerated]
private sealed class <>c__DisplayClass2_0
{
    public int a;
    public int b;

    internal int <Check2>g__Local|0(int x, int y)
    {
        return (((x + y) + this.a) + this.b);
    }
}

However, there are now more optimisation possibilities. First, if the local function is scoped to the method in which it is defined, then it would be good to avoid the heap allocation.

        static int Check3(int a, int b)
        {
            return Local(1,2) + Local(3,4);

            int Local(int x, int y)
            {
                return (x + y + a + b);
            }
        }

This is indeed what happens.

private static int Check3(int a, int b)
{
    <>c__DisplayClass3_0 class_;
    class_.a = a;
    class_.b = b;
    return (<Check3>g__Local|3_0(1, 2, ref class_) + <Check3>g__Local|3_0(3, 4, ref class_));
}

The DisplayClass has been optimised to a struct

[CompilerGenerated]
private struct <>c__DisplayClass3_0
{
    public int a;
    public int b;
}

and the body has been added as a static method of the class containing the method that declares the local function

[CompilerGenerated]
internal static int <Check3>g__Local|3_0(int x, int y, ref <>c__DisplayClass3_0 class_Ref1)
{
    return (((x + y) + class_Ref1.a) + class_Ref1.b);
}

The compiler has essentially noticed that the local method cannot escape from the method that uses it, and hence we can try to avoid the heap allocation.

We should also quickly look at the case where the local method doesn’t capture any locals.

        static int Check4(int a, int b)
        {
            return Local(1, 2) + Local(3, 4);

            int Local(int x, int y)
            {
                return (x + y);
            }
        }

In this case, the method compiles to the following

private static int Check4(int a, int b)
{
    return (<Check4>g__Local|4_0(1, 2) + <Check4>g__Local|4_0(3, 4));
}

and the local method is simply defined as a static method in the defining class

[CompilerGenerated]
internal static int <Check4>g__Local|4_0(int x, int y)
{
    return (x + y);
}

While we are here we could quickly cover one memory management gotcha around closures and their implementation.

        static (Func<int,int,int>, Func<int,int,int>) Check(int a, int b)
        {
            return ((x, y) => (x + y + a), (x, y) => (x + y + b));
        }

The implementation decides to put the local variables into a single DisplayClass

private static ValueTuple<Func<int, int, int>, Func<int, int, int>> Check(int a, int b)
{
    <>c__DisplayClass1_0 class_ = new <>c__DisplayClass1_0();
    class_.a = a;
    class_.b = b;
    return new ValueTuple<Func<int, int, int>, Func<int, int, int>>(new Func<int, int, int>(class_.<Check>b__0), new Func<int, int, int>(class_.<Check>b__1));
}

This means that if either of the returned lambda expressions is alive (from the point of view of the GC), then the variables “a” and “b” are still alive. This might not seem to matter too much, but if “a” and “b” were large objects (for example), it does mean that their lifetime can be extended further than you might expect.
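The effect can be reproduced in a small model. The Python sketch below is my own illustration, not C# and not compiler output: `DisplayClass` is written by hand to mirror the single shared closure record, because Python's own closures capture each variable in a separate cell and so would not show the problem. A weak reference lets us observe when the second captured object is really released.

```python
import gc
import weakref


class Big:
    """Stand-in for a large object captured by a lambda."""


class DisplayClass:
    """Hand-written model of the single shared closure record."""

    def __init__(self, a, b):
        self.a = a
        self.b = b

    def uses_a(self):
        return self.a

    def uses_b(self):
        return self.b


def check(a, b):
    env = DisplayClass(a, b)          # one record for both "lambdas"
    return env.uses_a, env.uses_b     # each bound method keeps env alive


def b_outlives_its_own_lambda():
    a, b = Big(), Big()
    b_ref = weakref.ref(b)
    first, second = check(a, b)
    del b, second                     # nothing refers to b directly any more
    gc.collect()
    alive_while_first_held = b_ref() is not None
    del first                         # dropping the last lambda frees the record
    gc.collect()
    return alive_while_first_held, b_ref() is None
```

The function returns (True, True): “b” stays reachable through the lambda that only uses “a”, and is released only when that lambda goes away too.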


Progressive Web Applications

Building Progressive Web Apps: Bringing The Power of Native to the Web by Tal Ater

I was interested in reading this book because I’d been hearing a lot about service workers and wanted to understand what they were about. The whole Progressive Web Application idea has been something that Google has been pushing, and it has taken a while for other browser vendors like Microsoft and Apple to declare their support. However, with Microsoft’s recent inclusion of service workers in the latest developer release of Edge, this seems to be a technology that is going to be big – there will be talks on PWAs at //BUILD, and there are rumours that PWAs may replace UWP as one of the implementation mechanisms behind applications in the Windows Store.

The book is a really good introduction to PWAs.

The author’s GitHub repository contains a lot of related material, but for the book he has chapter-oriented versions of a web application, Gotham Imperial Hotel, which he converts into a progressive web application (see the branches corresponding to the various chapters). This is a really good learning mechanism, and it is nice to be able to try and step through the code to see what is happening. Google Chrome, for example, lets you fake internet disconnection so that you can try out the service worker code that handles the offline state.

After a brief introduction to service workers, the first chapter of the book takes us through service workers as a proxy and how they can intercept HTTP requests and hence act as a caching mechanism. Most importantly, the service worker still runs even if the browser is offline, and given that the service worker can determine that it is running offline, it can return different HTML and hence give the user a usable experience even in this case.

The next chapter deals with the installation of a service worker. A new version of a service worker is installed in the background, but it waits and only takes control once there are no tabs in the browser still using the previous version – it would obviously be confusing if different versions of a service worker were running in the different tabs targeting a particular site. The chapter explains the lifecycle, which includes an install state where the service worker can cache resources without being expected to handle any request interception. If the worker can’t get the data it needs, it can fail its installation to signal that it shouldn’t be used, and the browser continues with the existing worker. All of this state transition code is asynchronous, and uses Promises to avoid blocking.

The next chapter demonstrates how service workers let us write offline-first applications. In this model we are really using PWAs as a deployment model. The chapter covers various patterns of caching – for example, you might try to pre-cache pages, or might instead cache on first access, but then you might also check for updates when the cache is hit and refresh the cache with the new version of the page. The author talks through the various patterns.
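Two of these patterns can be sketched without a browser. The Python below is my own simplification, not the book's service worker code: the cache is a plain dict, `fetch` is any callable standing in for the network, and being offline is modelled by that callable raising ConnectionError. The first function caches on first access; the second answers from the cache and then refreshes it.

```python
def cache_first(url, cache, fetch):
    """Serve from the cache when possible; otherwise fetch and remember."""
    if url not in cache:
        cache[url] = fetch(url)
    return cache[url]


def stale_while_revalidate(url, cache, fetch):
    """Answer with the cached copy (if any), then refresh the cache."""
    cached = cache.get(url)
    try:
        fresh = fetch(url)           # a real worker would do this in the background
        cache[url] = fresh
    except ConnectionError:
        fresh = None                 # offline: keep whatever was cached
    return cached if cached is not None else fresh
```

With the second pattern the user sees the old page immediately and the new version on the following visit, which is the trade-off the chapter discusses.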

The next chapter looks at IndexedDB. This is a local database, implemented in the browser, that allows you to store JavaScript objects, and offers cursors to walk through the various object stores and indexes. There are various patterns you may use to update the database between versions of the service worker.

The service worker isn’t just a proxy for intercepting web requests. The next chapter covers background sync. If our application has been working offline, when we next go back online we’d like to resync with the application’s web site. The service worker is able to receive callbacks from a sync manager, implemented in the browser, which can take care of tracking the synchronisation steps that need to be carried out and maintain the list of pending synchronisations. This service will take care of starting up our service worker if it isn’t running, and will do so at a time where we are online.

The service worker is acting as a proxy between a site and the many possible tabs in the browser talking to that site. You might therefore expect there to be a mechanism for the various tabs to talk to each other (so that they can coordinate between themselves over state), and this is what the next chapter covers. Messages can be posted, and message handlers can be registered.

Installation of a PWA, to give it a more permanent presence in the browser, is covered in the next chapter. Many web sites these days nag you to install the mobile application version of their web site via interstitials, and this chapter talks about the heuristic-driven way a browser allows you to install a PWA. In Chrome, for example, there are a number of things that trigger installation – regular use of the site for example.

Once your PWA is installed as an application, you might need a way for the site to notify you of interesting things that have happened. The next chapter covers the two different push notification mechanisms that you can use. There appears to be more standardisation work required in this area.

The book finishes with a couple of chapters on UX for Progressive Web Applications and where PWAs are going in the future.

The author has done a good job of breaking down the material into chapter-sized chunks, and having a GitHub application that can show you the changes related to each chapter is a really good learning mechanism. PWAs seem to have many uses, from pseudo-applications that are easy to install, to a richer way to allow web applications to run offline.
