Some more books

As usual, I got a load of interesting books that I bought with credits I was given for Xmas.

gRPC Up and Running: Building Cloud Native Applications with Go and Java for Docker and Kubernetes by Kasun Indrasiri and Danesh Kuruppu

This is a fairly short book at less than 200 pages, but it gives a good introduction to gRPC, including many worked examples in both Go and Java. It talks you through some of the different ways that applications can interact, including brief coverage of Thrift and GraphQL, and then jumps into a worked example implemented in both Go and Java so you can see how the various interface specifications are mapped into the two languages. gRPC lets clients and servers stream values as part of a method invocation, so these streams need to manifest themselves naturally in the implementation language. The book then discusses how everything is implemented on top of the HTTP/2 protocol, and then looks at some advanced features like interceptors (for before and after send and receive actions like authorization), load balancing and deadlines (you can fail a call if it takes too long). There are then chapters on securing the communication channel, testing via a CI pipeline and some other useful projects.

C++ Move Semantics: The Complete Guide by Nicolai Josuttis

Move semantics seem like a valuable optimization in a call-by-value language, though the way the moved-from object is left in a valid but unspecified state (often an empty or default one) is still very weird to me. Anyway, this book does a good job of describing the semantics and the reasons behind them. I enjoyed it, but it all feels very complicated, and it appears to be easy to stray into undefined behaviour.

The Go Programming Language by Alan Donovan and Brian Kernighan

I have read this book in the past, but have just started writing Go at work, so thought I should give it another read. The book is a little old now and doesn’t cover parts of Go like the module system, though there are many blog posts that explain this part of the language. The book is rich with examples that help communicate the style of Go programming, and the authors are happy to express strong views on aspects such as testing and interfaces. This is a really good read and covers the language really well.

What We Cannot Know by Marcus du Sautoy

This book explores where human understanding currently ends and how far it might expand in the future in a number of domains, from quantum physics and cosmology to logic and artificial intelligence. It’s a good read and very interesting.

Posted in Uncategorized | Leave a comment

Getting up to speed on K8s

I’ve just moved teams at work, and that has given me a chance to work on a system that runs on top of Kubernetes. It’s been a while, maybe two years, since I last did anything serious with Kubernetes, so it was time to get out some of the old books.

Kubernetes Up and Running by Kelsey Hightower, Brendan Burns and Joe Beda is a great introduction to the main ideas behind Kubernetes, and the main components. The book is fairly short, but is full of simple examples to demonstrate ideas such as Services and Pods, and gives a great overview of Kubernetes and how to use it.

The one thing that it misses (for me) is how it all works under the covers, and I enjoyed Managing Kubernetes by Brendan Burns and Craig Tracey, which gives you some insight into this with chapters on installing Kubernetes on a cluster, the API server and the scheduler, all of which put things into perspective. The book is also aimed at describing how you maintain a real-world K8s cluster, and so there are additional chapters on user management, authorization, networking and disaster recovery.

The next important thing to learn about is Operators, and how you can use custom resource definitions and the desired state reconciliation loop to define your own higher-level concepts. There are blog posts like this one that help to explain this.
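To make the reconciliation idea a bit more concrete, here is a minimal sketch in Python. It is not the Kubernetes API or any operator SDK: `reconcile`, `apply_actions` and the replica-count dictionaries are names I have made up for illustration. The point is only the shape of the loop, which compares the desired state with the observed state and works out the actions needed to converge.

```python
def reconcile(desired, observed):
    """Return the actions needed to move observed state towards desired state.

    Both arguments map a (hypothetical) resource name to a replica count.
    """
    actions = []
    for name, want in sorted(desired.items()):
        have = observed.get(name, 0)
        if have < want:
            actions.append(("scale_up", name, want - have))
        elif have > want:
            actions.append(("scale_down", name, have - want))
    # Anything running that is no longer declared should be removed.
    for name in sorted(observed.keys() - desired.keys()):
        actions.append(("delete", name, observed[name]))
    return actions


def apply_actions(actions, observed):
    """Apply actions to a copy of the observed state (a stand-in for real API calls)."""
    state = dict(observed)
    for kind, name, count in actions:
        if kind == "scale_up":
            state[name] = state.get(name, 0) + count
        elif kind == "scale_down":
            state[name] -= count
        else:  # delete
            del state[name]
    return state
```

A real controller would run this repeatedly, so that once the observed state matches the desired state the list of actions comes back empty.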

You’re going to need some understanding of Docker to really understand how it all hangs together, and I enjoyed Docker Up and Running by Kane and Matthias as a way to understand this. This talks a lot about how to build reliable containers for production use, and goes into some of the low-level details of debugging containers.

It’s also worthwhile looking at some of the issues people have had with Kubernetes in the past like these failure stories – I ran into the missing anti-affinity rules problem in the past when I worked on a K8s system. There’s also an interesting article on how it might be designed now.

There are a couple of additional books, such as Designing Distributed Systems by Brendan Burns, which talks about some of the common patterns such as sidecars, which are included in pods to add extra functionality. I also read Practical Microservices with Dapr and .NET by Davide Bedin, which offers a more Microsoft-focused view of using Kubernetes and the Distributed Application Runtime. This book covers the major ideas of Dapr like service discovery, state management and virtual actors, and shows their use in the context of .NET Core applications.


MapReduce for the win

MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems by Donald Miner and Adam Shook

This book is fairly old now – the version I read was published in December 2012, and I was lucky enough to find it in a pile of books that I had been intending to read for some time. I’ve also been doing a lot of reading about processing Big Data using streaming systems, and have become interested in the so-called Stream-Table duality. These kinds of systems are intended to be more real time than older systems designed using MapReduce, and so it’s good to read a book that highlights all of the great things that MapReduce can do.

The book is full of practical examples. It is aimed at Hadoop (1.0.3) and uses a Stack Overflow dataset for the example implementation of many of the patterns.

The first chapter of the book covers a brief history of MapReduce, and contains a pointer to the original paper, MapReduce: Simplified Data Processing on Large Clusters. The example in this chapter is the canonical word count in Hadoop.

This is then followed by chapters on various patterns.

Summarization patterns looks at ways of counting in order to analyse data – various metrics like average, min and max values and standard deviation, as well as generating inverted indexes. This chapter also mentions the use of a combiner – using a reduce-like operation before the data has been shuffled in order to improve performance and cut down on the amount of data that needs to be transferred.
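As a rough illustration of what the combiner buys you, here is a small in-memory word count in Python. This is my own sketch rather than code from the book or the Hadoop API, and the function names are made up; each inner list of lines stands in for the input of one mapper.

```python
from collections import Counter, defaultdict


def map_words(line):
    """Mapper: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: sum the counts locally, before anything is shuffled."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def shuffle(pairs):
    """Shuffle: group all values for the same key together."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_counts(groups):
    """Reducer: sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(splits, use_combiner):
    """Run the job over the input splits; return (result, number of pairs shuffled)."""
    shuffled = []
    for split in splits:  # each split stands in for one mapper's input
        pairs = [pair for line in split for pair in map_words(line)]
        shuffled.extend(combine(pairs) if use_combiner else pairs)
    return reduce_counts(shuffle(shuffled)), len(shuffled)
```

The result is the same either way, but with the combiner switched on fewer pairs cross the (pretend) network, which is the whole point. It only works this simply because addition is associative and commutative; something like an average needs more care.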

Filtering patterns looks at efficiently slimming down the data, and includes discussions on generating Top-N and using Bloom filters to make it possible to quickly get rid of data we are not interested in. For the latter pattern, we need to build the filter and then make it accessible to the machines doing the processing.
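For anyone who hasn’t met Bloom filters before, here is a minimal sketch in Python of how the filtering step might look. It is not the book’s Hadoop code: the `BloomFilter` class, its sizes and the `filter_records` helper are my own illustrative choices, and I use one byte per bit to keep the code short.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive several hash positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def filter_records(records, key, bloom):
    """Map-side filter: drop records whose key is definitely not in the set."""
    return [record for record in records if bloom.might_contain(record[key])]
```

The filter is built once from the keys of interest and then shipped to every mapper. A record that passes might still be a false positive, so a later exact check is needed if that matters, but nothing we care about is ever thrown away.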

Data organisation patterns looks at efficiently partitioning, binning and sorting the data.

Join patterns looks at how we efficiently implement the various kinds of join on top of MapReduce. This includes joining against the data set that we are processing, as well as joining against external static data.

There are then two chapters that talk more about the meta issues: how jobs can be chained in Hadoop, and also how we can join different MapReduce jobs together to make things more efficient. There is also a chapter on various input and output patterns in Hadoop.

The book was a quick but enjoyable read, and it makes it really clear how powerful the MapReduce model is, as it allows a vast number of processing methods to be expressed using the same framework.


Effective STL

Effective STL – 50 Specific Ways To Improve Your Use of the Standard Template Library by Scott Meyers

While I was at Facebook, I had the chance to write some C++ code, and C++ seems to have really moved on since I last wrote it. In particular, facilities like move semantics took some work to understand. The language now has some nice features like lambda expressions (though the lack of garbage collection makes them a little trickier to use than in C#).

This book is a little out of date, but the content was still good for helping me understand the STL, though I think I gained most of my C++ knowledge from watching lots of CppCon talks and reading various posts on this blog, which includes this guide to a feature of C++20.


Links aplenty

When I was at Facebook, I didn’t really have time to blog about interesting blog posts that I’d come across, so I’ve accumulated a few interesting reads in my browser bookmarks.

An introduction to SSA and the phi function
Paxos explained
Tracing in Linux using eBPF
The complexity of sliding block puzzles
Some notes on proving the independence of the Continuum Hypothesis
Issues writing a Linux kernel module (and RCU)
Some issues with Nagle’s algorithm in this post and this one about delayed Ack
ARM processors, lock-free and branch processing
Modern storage and the failure of the supplied APIs
Python at scale using strict modules
Module initializers in C#
Avoiding iCache misses
Linear types in Haskell
The security of Helm charts
Perceus: Strict reference counting for the Koka language
How Apache Flink snapshots state so that it can restore on failure
How debugging Blazor WebAssembly works
Compile time dependency injection for C# – a way to improve application startup

There’s an interesting course on reinforcement learning on YouTube with the slides here. This is a great explanation starting from Markov Processes and building up to the systems that DeepMind used to solve many hard problems.

And lastly a plan for preparing for a software engineering interview at Facebook.


That’s the way to do it

The Design of Web APIs by Arnaud Lauret

API first seems to be the way that people are writing applications these days.

This book is aimed at people developing REST APIs backed by an OpenAPI schema, and does a good job of covering a number of issues.

The first part deals with designing an API from the point of view of the user, emphasising that the API should make it easy for the user to solve the tasks that they want to do, rather than simply exposing the implementation – just making the implementation available is easy for the writer, but can make the API confusing to use and can force the user to learn names for internal implementation details. The next part of the book covers REST, and goes through the various ways that an API could be broken down into resources and the verbs that you could use to manipulate them. That is followed by a section on how to describe the API using OpenAPI.

The author then discusses how to make the API predictable, straightforward to use and secure, before discussing how to make the API evolvable by versioning, as well as how you go about documenting it for clients.

The discussions are good with lots of useful ideas, and I learned a few HTTP headers and codes that I didn’t know before – the Sunset header and the 207 status code, for example. There is also material on server-sent events and gRPC.
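For reference, here is a small sketch in Python of how those two might show up in a response. It uses only the standard library; the helper names and the idea of summarising a batch request with 207 are my own illustrative assumptions rather than anything taken from the book. The Sunset header carries an HTTP date saying when an endpoint is expected to be retired, and 207 (Multi-Status) lets a single response report a different status for each item.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from http import HTTPStatus


def sunset_header(retire_at):
    """Build a Sunset header saying when an endpoint is expected to go away."""
    in_utc = retire_at.astimezone(timezone.utc)
    return {"Sunset": format_datetime(in_utc, usegmt=True)}


def batch_response(results):
    """Summarise a batch of (item_id, status) results.

    If every item ended with the same status, that status is used for the
    whole response; otherwise the overall status is 207 Multi-Status.
    """
    statuses = {int(status) for _, status in results}
    overall = statuses.pop() if len(statuses) == 1 else int(HTTPStatus.MULTI_STATUS)
    body = [{"id": item_id, "status": int(status)} for item_id, status in results]
    return overall, body
```

So a batch where one item is created and another fails validation comes back as a 207 whose body lists the individual 201 and 400.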


Data Science from scratch

Data Science From Scratch: First Principles with Python by Joel Grus

I absolutely loved this book. It has chapters introducing many aspects of data science, followed by lots of code examples that implement the machine learning algorithms it describes. For me, seeing the implementation details of the various algorithms made everything fall into place. I should also say that the author has a great writing style with loads of witty remarks.

The book starts with a quick introduction to Python and then shows you how to use the matplotlib Python library to visualize your data. This is followed by chapters that give a quick introduction to linear algebra, statistics, probability and hypothesis testing, all with examples in Python. The next chapter looks at gradient descent which is going to be used in many of the chapters that follow. This is followed by two practical chapters on how to actually get hold of data from files and via the web, and parts of Python like namedtuples that will help you work with it effectively.
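Gradient descent is simple enough to sketch in a few lines of plain Python. This is my own minimal version, not the book’s code, and the function names, learning rate and step count are arbitrary choices: at each step you move a little way against the gradient, so that the value of the function goes down.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    point = list(start)
    for _ in range(steps):
        grad = gradient(point)
        point = [x - learning_rate * g for x, g in zip(point, grad)]
    return point


def sum_of_squares_gradient(point):
    """Gradient of f(v) = sum(x ** 2 for x in v), which is minimised at the origin."""
    return [2 * x for x in point]
```

Calling `gradient_descent(sum_of_squares_gradient, [3.0, -4.0])` ends up very close to the origin, because with these settings each step shrinks every coordinate by a factor of 0.8.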

The following chapters then go through the various machine learning algorithms: k-nearest neighbours, naive Bayes, linear regression, multiple regression, logistic regression, decision trees, neural networks, deep learning, and clustering. This is followed by chapters on natural language processing, network analysis and recommender systems, and then some final chapters cover databases and SQL, MapReduce and data ethics.
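To give a flavour of the from-scratch style, here is a minimal k-nearest neighbours classifier in Python. Again this is my own sketch rather than the author’s code, and `knn_classify` is a made-up name: a new point is labelled by a majority vote among the k labelled points closest to it.

```python
from collections import Counter
import math


def knn_classify(k, labelled_points, new_point):
    """Label new_point by majority vote among its k nearest neighbours.

    labelled_points is a list of (point, label) pairs, where each point is a
    tuple of numbers with the same length as new_point.
    """
    by_distance = sorted(
        labelled_points,
        key=lambda pair: math.dist(pair[0], new_point),
    )
    votes = Counter(label for _, label in by_distance[:k])
    # On a tie, most_common keeps the label that was seen first, i.e. the
    # one belonging to the nearer neighbour.
    return votes.most_common(1)[0][0]
```

There is no training step at all, which is what makes it such a nice first algorithm: all of the work happens at prediction time.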

I think that the book is just right. The material explains the algorithms to just the right depth, and the Python code makes it easy to see how you actually implement them.


Streaming Systems

Streaming Systems: The what, where and how of large-scale data processing by Tyler Akidau, Slava Chernyak and Reuven Lax

When I was doing a lot of reading about systems design, there were several times when the Lambda Architecture was mentioned. In several videos I watched on YouTube, the presenter would suggest that you use streaming and approximation algorithms for top-N to get real time performance, and then use MapReduce for the batch processing of data as an overnight job to get the correct results (in non-real time).

This book discusses the many issues with this architecture. For example, it can be quite hard to get the same result from the two approaches, and doing things this way means that you are effectively implementing the processing twice. The authors then push the idea that streaming systems have now improved to the point where you don’t need the batch processing side of things.

The authors take us through the concepts behind the streaming implementations, using Apache Beam as the implementation for the demonstrations. They take us through bounded and unbounded datasets, windows, triggers and the difference between event time and processing time. In order to handle failure, the streaming system also needs to provide exactly-once semantics, which requires persistent state (say via snapshots, as in Flink).
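The difference between event time and processing time is easiest to see with a tiny example. The sketch below is plain Python rather than Beam, and `tumbling_window_sums` is a name I have made up. Events arrive in processing-time order, but each one is assigned to a fixed-size window based on the time at which it actually happened.

```python
from collections import defaultdict


def tumbling_window_sums(events, window_size):
    """Sum values per fixed-size event-time window.

    events is an iterable of (event_time, value) pairs in arrival
    (processing-time) order, which need not match event-time order.
    """
    sums = defaultdict(int)
    for event_time, value in events:
        # The window is chosen by when the event happened, not when it arrived.
        window_start = (event_time // window_size) * window_size
        sums[window_start] += value
    return dict(sorted(sums.items()))
```

With 60 second windows, an event stamped 58 that turns up after one stamped 61 still lands in the first window. Because this sketch sees the whole input before answering, it sidesteps the hard part that triggers and watermarks address, which is deciding when a window over an unbounded stream is complete enough to emit.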

In the second part of the book, the authors look at how to extend SQL to allow users to express queries using joins. This is preceded by a discussion of the duality between tables and streams, and how you can think of streams as data in motion compared to tables which are a snapshot of the stream’s state at a moment in time.
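The duality can be shown with two small functions, here in Python. This is an illustrative sketch of the idea rather than anything from the book, and using `None` as a delete marker is my own convention: folding a stream of changes gives you a table, and comparing two snapshots of a table gives you back a stream of changes.

```python
def table_from_stream(changes):
    """Fold a changelog stream of (key, value) updates into a table."""
    table = {}
    for key, value in changes:
        if value is None:
            table.pop(key, None)  # None is used here as a delete marker
        else:
            table[key] = value
    return table


def stream_from_tables(before, after):
    """Emit the changes that turn one table snapshot into another."""
    changes = []
    for key in sorted(before.keys() | after.keys()):
        if before.get(key) != after.get(key):
            changes.append((key, after.get(key)))
    return changes
```

Replaying the changes produced by `stream_from_tables` on top of the earlier snapshot reproduces the later one, which is really all that the duality is saying.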

At the very end of the book there is a chapter on the history of large-scale data processing, which starts with MapReduce and looks at things that came after.

I really liked this book. It goes through the concepts, using a series of diagrams to explain how various mixes of triggers and windows would generate results for a long-running example. I liked the examples in the Beam DSL, and I enjoyed the history and the discussion of extending SQL’s relationships to allow joins across streams. The book also contains references to loads of interesting papers that I will now have to read.


That’s very unlikely

The Art of Statistics: Learning from Data by David Spiegelhalter

This book was a brilliant refresher on statistics, aimed at people without a mathematical background. It considers a number of questions, often based on clickbait newspaper headlines, and shows how the question should really be analysed using statistical methods. It’s a really good read.

Rather strangely, I was reading this paper on a probabilistic programming language and the author contributed to one of the cited papers, though the book does talk about using simulation as a technique for using distributions so I can see how they are related.

I’ve also been doing some reading on Haskell. For quite some time I’ve been trying to understand how you can extend the standard Hindley-Milner type inference to handle some of the more interesting features like Phantom types and GADTs. At long last I came across this paper which describes how to do it. This also helps to explain some of the type checker messages that I see from time to time. While doing some reading, I also came across this article on how the IO Monad is implemented, and why you don’t get the same kind of guarantees from the unsafe functions for performing IO.

Last, two great videos on .NET: Performance improvements in .NET 5 by Stephen Toub, which talks about recent optimisations – I hadn’t come across some of them before, such as being able to turn off the zero initialization of local variables – and What’s so hard about pinning? by Maoni Stephens, which goes into some implementation details about the .NET garbage collector.

I’ve also been reading some posts on how Linux debuggers work – these two talk about getting access to registers and this paper talks about how to stop the breakpoints stopping the target process for long. While we are talking about Linux, this post goes into detail about the durability guarantees behind various Linux file system operations.

I’ve also been doing more reading on Category Theory, and have again wondered about the proof that polymorphic functions in Haskell correspond to the natural transformations in the relevant category.


And a final batch of books

Working from home for six months has made it really easy to get a lot of reading done. This is the final list of books that I’ve finished during the period.

Good Strategy/Bad Strategy: The difference and why it matters by Richard Rumelt

We went through this book as part of the reading group for Tech Leads at work. The book looks at what a strategy is and contrasts it to the usual set of motivational ideas that we are often told is a strategy. The book emphasises a logical plan based on an analysis of the problem together with reasoning as to why a particular item was targeted. A mix of common sense and good tricks to get a good plan together.

An Elegant Puzzle: Systems of Engineering Management by Will Larson

This again was going to be the object of a reading group; however, the switch to working from home meant that this never happened, but I read my way through the book anyway. I must admit that I found the book hard going.

How Linux Works by Brian Ward

I’m going to be moving back to using Linux day to day, and wanted a refresher on the lower level details of Linux. This book is brilliant for that purpose. As the blurb says, it covers all of the basics, though it does this in lots of interesting detail.

Webpack 5 Up and Running: A quick and practical introduction to the JavaScript application bundler by Tom Owens

I wanted to get up to speed with the new version of webpack. To be honest, this book just feels like cut and pasted parts of the existing documentation, and it really feels like the book could do with some good proofreading to correct the typos and misspellings.

The Daemon, the Gnu, and the Penguin by Peter H Salus

A potted history of Unix and Linux. A quick read but interesting from a historical point of view.

Einstein’s Unfinished Revolution: The Search for What Lies Beyond the Quantum by Lee Smolin

Lee Smolin has written loads of books over the years about the search for a unified theory of physics. This is another one that talks about his more recent ideas around quantum mechanics and how we can give it a realist interpretation. I must admit that I have enjoyed all of his books.

Programming Rust: Fast, Safe Systems Development by Jim Blandy and Jason Orendorff

There are several people at work who are massive fans of Rust, and this book does a really good job both of explaining the language and discussing the benefits of using Rust to get runtime safety. The book is really good and I will certainly be using the language in the future.
