Effective STL

Effective STL – 50 Specific Ways To Improve Your Use of the Standard Template Library by Scott Meyers

While I was at Facebook, I had the chance to write some C++ code, and C++ seems to have really moved on since I last wrote it. In particular, facilities like move semantics took some work to understand. The language now has some nice features like lambda expressions (though the lack of garbage collection makes them a little trickier to use than in C#).

This book is a little out of date, but the content still helped me to understand the STL, though I think I learned most of my C++ from watching lots of CppCon talks and reading various posts on this blog, which includes this guide to a feature of C++20.

Posted in Uncategorized | Leave a comment

Links aplenty

When I was at Facebook, I didn’t really have time to blog about interesting blog posts that I’d come across, so I’ve accumulated a few interesting reads in my browser bookmarks.

An introduction to SSA and the phi function.
Paxos explained
Tracing in Linux using eBPF
The complexity of sliding block puzzles
Some notes on proving the independence of the Continuum Hypothesis
Issues writing a Linux kernel module (and RCU)
Some issues with Nagle’s algorithm in this post and this one about delayed Ack
ARM processors, lock-free and branch processing
Modern storage and the failure of the supplied APIs
Python at scale using strict modules
Module initializers in C#
Avoiding iCache misses
Linear types in Haskell
The security of Helm charts
Perceus: Strict reference counting for the Koka language
How Apache Flink snapshots state so that it can restore on failure
How debugging Blazor WebAssembly works
Compile time dependency injection for C# – a way to improve application startup

There’s an interesting course on reinforcement learning on YouTube with the slides here. This is a great explanation starting from Markov Processes and building up to the systems that DeepMind used to solve many hard problems.

And lastly a plan for preparing for a software engineering interview at Facebook.


That’s the way to do it

The Design of Web APIs by Arnaud Lauret

API first seems to be the way that people are writing applications these days.

This book is aimed at people developing REST APIs backed by an OpenAPI schema, and does a good job of covering a number of issues.

The first part deals with designing an API from the point of view of the user, emphasising that the API should make it easy for the user to solve the tasks that they want to do, rather than simply exposing the implementation – just making the implementation available is easy for the writer, but can make the API confusing to use and can force the user to learn names for internal implementation details. The next part of the book covers REST, and goes through the various ways that an API could be broken down into resources and the verbs that you could use to manipulate them. That is followed by a section on how to describe the API using OpenAPI.

The author then discusses how to make the API predictable, straightforward to use and secure, before discussing how to make the API evolvable by versioning as well as how you go about documenting it for clients.

The discussions are good with lots of useful ideas, and I learned a few HTTP headers and codes that I didn’t know before – the Sunset header and the 207 status code for example. There is also material on server-sent events and gRPC.


Data Science from scratch

Data Science From Scratch: First Principles with Python by Joel Grus

I absolutely loved this book. It has chapters introducing many aspects of data science, followed by lots of code examples that implement the machine learning algorithms. For me, seeing the implementation details of the various algorithms made everything fit into place. I should also say that the author has a great writing style with loads of witty remarks.

The book starts with a quick introduction to Python and then shows you how to use the matplotlib Python library to visualize your data. This is followed by chapters that give a quick introduction to linear algebra, statistics, probability and hypothesis testing, all with examples in Python. The next chapter looks at gradient descent which is going to be used in many of the chapters that follow. This is followed by two practical chapters on how to actually get hold of data from files and via the web, and parts of Python like namedtuples that will help you work with it effectively.
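To give a flavour of the gradient descent idea mentioned above, here is a minimal sketch in plain Python. It is not taken from the book: the function names, the learning rate and the number of steps are all assumptions chosen for illustration, and the function being minimised is simply the sum of squares, whose gradient is easy to write down by hand.

```python
from typing import Callable, List

Vector = List[float]


def gradient_step(v: Vector, gradient: Vector, step_size: float) -> Vector:
    """Move from v along the gradient by step_size (negative to descend)."""
    return [v_i + step_size * g_i for v_i, g_i in zip(v, gradient)]


def sum_of_squares_gradient(v: Vector) -> Vector:
    """Gradient of f(v) = sum(v_i ** 2), which is 2 * v_i in each coordinate."""
    return [2 * v_i for v_i in v]


def minimize(gradient_fn: Callable[[Vector], Vector],
             start: Vector,
             learning_rate: float = 0.1,   # assumed value, not tuned
             steps: int = 200) -> Vector:  # assumed value, not tuned
    """Repeatedly step against the gradient, starting from `start`."""
    v = list(start)
    for _ in range(steps):
        v = gradient_step(v, gradient_fn(v), -learning_rate)
    return v


# Starting from an arbitrary point, the result should end up very close to
# the minimum of the sum of squares, which is the zero vector.
result = minimize(sum_of_squares_gradient, [3.0, -4.0, 5.0])
print(result)
```

The same loop works for any function as long as you can supply (or estimate) its gradient, which is why the technique shows up again and again in the later chapters.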

The following chapters then go through the various machine learning algorithms: k-nearest neighbours, naive Bayes, linear regression, multiple regression, logistic regression, decision trees, neural networks, deep learning, and clustering. This is followed by chapters on natural language processing, network analysis and recommender systems, and some final chapters cover databases and SQL, MapReduce and data ethics.

I think that the book is just right. The material explains each algorithm to just the right depth, and the Python code makes it easy to see how you actually implement it.


Streaming Systems

Streaming Systems: The what, where and how of large-scale data processing by Tyler Akidau, Slava Chernyak and Reuven Lax

When I was doing a lot of reading about systems design, there were several times when the Lambda Architecture was mentioned. In several videos I watched on YouTube, the presenter would suggest that you use streaming and approximation algorithms for top-N to get real time performance, and then use MapReduce for the batch processing of data as an overnight job to get the correct results (in non-real time).

This book discusses the many issues with this architecture. For example, it can be quite hard to get the same result from the two approaches, and doing things this way means that you are effectively implementing the processing twice. The authors then argue that streaming systems have now improved to the point where you don’t need the batch processing side of things.

The authors take us through the concepts behind the streaming implementations, using Apache Beam as the implementation for the demonstrations. They take us through bounded and unbounded datasets, windows, triggers and the difference between event time and processing time. In order to handle failure, the streaming system also needs to support exactly-once semantics, which requires persistent state (say via snapshots, as Flink does).
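The difference between event time and processing time is easier to see with a small example. The sketch below is plain Python rather than the Beam API, and everything in it (the event tuples, the `tumbling_window_sums` name, the 60 second window size) is an assumption made up for illustration. It assigns each event to a fixed window based on when the event happened, not when it arrived, and it ignores triggers and watermarks entirely.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (event_time_in_seconds, key, value)
Event = Tuple[int, str, int]


def tumbling_window_sums(events: Iterable[Event],
                         window_size: int) -> Dict[Tuple[int, str], int]:
    """Sum values per key within fixed, non-overlapping event-time windows.

    The window is chosen from the event's own timestamp, so an event that
    arrives late still lands in the window where it logically belongs.
    """
    sums: Dict[Tuple[int, str], int] = defaultdict(int)
    for event_time, key, value in events:
        window_start = (event_time // window_size) * window_size
        sums[(window_start, key)] += value
    return dict(sums)


# The list order stands in for processing (arrival) order. The third event
# arrives after the others but carries an earlier event time.
events = [
    (12, "team_a", 5),
    (65, "team_a", 3),
    (8, "team_a", 4),   # late arrival, belongs to the window starting at 0
    (61, "team_b", 7),
]

totals = tumbling_window_sums(events, window_size=60)
print(totals)
```

A real streaming system cannot wait for all of the input like this, which is where triggers come in: they decide when a window’s result is emitted and whether it is later refined as late data turns up.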

In the second part of the book, the authors look at how to extend SQL to allow users to express queries using joins. This is preceded by a discussion of the duality between tables and streams, and how you can think of streams as data in motion compared to tables which are a snapshot of the stream’s state at a moment in time.

At the very end of the book there is a chapter on the history of large-scale data processing, which starts with MapReduce and looks at things that came after.

I really liked this book. It goes through the concepts, using a series of diagrams to explain how various mixes of triggers and windows would generate results for a long running example. I liked the examples in the Beam DSL, and I enjoyed the history and the discussion of extending SQL’s relations to allow joins across streams. The book also contains references to loads of interesting papers that I will now have to read.


That’s very unlikely

The Art of Statistics: Learning from Data by David Spiegelhalter

This book was a brilliant refresher on statistics, aimed at people without a mathematical background. It considers a number of questions, often based on clickbait newspaper headlines, and shows how each question should really be analysed using statistical methods. It’s a really good read.

Rather strangely, I was reading this paper on a probabilistic programming language and the author contributed to one of the cited papers, though the book does talk about using simulation as a technique for using distributions so I can see how they are related.

I’ve also been doing some reading on Haskell. For quite some time I’ve been trying to understand how you can extend the standard Hindley-Milner type inference to handle some of the more interesting features like phantom types and GADTs. At long last I came across this paper which describes how to do it. This also helps to explain some of the type checker messages that I see from time to time. While doing some reading, I also came across this article on how the IO monad is implemented, and why you don’t get the same kind of guarantees from the unsafe functions for performing IO.

Last, two great videos on .NET: Performance improvements in .NET 5 by Stephen Toub, which talks about recent optimisations (I hadn’t come across some of them before, such as being able to turn off the zero initialization of local variables), and What’s so hard about pinning? by Maoni Stephens, which goes into some implementation details about the .NET garbage collector.

I’ve also been reading some posts on how Linux debuggers work – these two talk about getting access to registers and this paper talks about how to avoid breakpoints stopping the target process for long. While we are talking about Linux, this post goes into detail about the durability guarantees behind various Linux file system operations.

I’ve also been doing more reading on Category Theory, and have again wondered about the proof that polymorphic functions in Haskell correspond to the natural transformations in the relevant category.


And a final batch of books

Working from home for six months has made it really easy to get a lot of reading done. This is the final list of books that I’ve finished during the period.

Good Strategy/Bad Strategy: The difference and why it matters by Richard Rumelt

We went through this book as part of the reading group for Tech Leads at work. The book looks at what a strategy is and contrasts it to the usual set of motivational ideas that we are often told is a strategy. The book emphasises a logical plan based on an analysis of the problem together with reasoning as to why a particular item was targeted. A mix of common sense and good tricks to get a good plan together.

An Elegant Puzzle: Systems of Engineering Management by Will Larson

This again was going to be the object of a reading group; however, the switch to working from home meant that this never happened, but I read my way through the book anyway. I must admit that I found the book hard going.

How Linux Works by Brian Ward

I’m going to be moving back to using Linux day to day, and wanted a refresher on the lower level details of Linux. This book is brilliant for that purpose. As the blurb says, it covers all of the basics, though it does this in lots of interesting detail.

Webpack 5 Up and Running: A quick and practical introduction to the JavaScript application bundler by Tom Owens

I wanted to get up to speed with the new version of webpack. To be honest this book just feels like cut and pasted parts of the existing documentation, and it really feels like the book could do with some good proofreading to correct the typos and misspellings.

The Daemon, the Gnu, and the Penguin by Peter H Salus

A potted history of Unix and Linux. A quick read but interesting from a historical point of view.

Einstein’s Unfinished Revolution: The Search for What Lies Beyond the Quantum by Lee Smolin

Lee Smolin has written loads of books over the years about the search for a unified theory of physics. This is another one that talks about his more recent ideas around quantum mechanics and how we can give it a realist interpretation. I must admit that I have enjoyed all of his books.

Programming Rust: Fast, Safe Systems Development by Jim Blandy

There are several people at work who are massive fans of Rust, and this book does a really good job both of explaining the language and discussing the benefits of using Rust to get runtime safety. The book is really good and I will certainly be using the language in the future.


Haskell’s Type System is really amazing

I guess it’s no surprise, but the Haskell type system is really amazing. The support for GADTs and other interesting type level ideas like phantom types makes the language a fascinating place to understand how much types can help with the construction of only valid programs.

I’ve recently been reading Sandy Maguire’s Thinking With Types which is, as the author claims, the comprehensive manual for type level programming.

The book starts from first principles, with Sum/Product and Exponential types and a discussion of the difference between terms, types and kinds. It then goes into detail about all of the type related extensions that GHC supports including GADTs and constraints, Rank-N and existential types. This is followed by a section on computing at the type level – understanding Haskell’s dependent type support was one of the main reasons that I bought this book. This also covers type families and defunctionalization.

The book is great but it is going to take a couple of reads to understand the material.

I also read Haskell Design Patterns: Take your Haskell and functional programming skills to the next level by exploring new idioms and design patterns by Ryan Lemmer

This book is also a quick read and about half of it is also connected to using the type system to make your programs more correct. However, it also has some great sections on Haskell ideas like traversals, monads and monad transformers, as well as a discussion of different styles of functional I/O.

At the recent Haskell Love online conference, the talk on Haskell to Core helped me to understand some of these ideas, as the presenter covers how some of the type equality constraints are represented in the desugared intermediate language that GHC uses. Richard Eisenberg’s Parameters of Many Flavours also discusses the different types of implicit parameters that are carried by a Haskell function definition (and also discusses what is erased and what is available at runtime).


OAuth 2 in Action

I’ve never really understood where authentication fits into the OAuth protocol – I kept finding articles that mentioned that OpenID Connect was an extension of OAuth that did this, but I was still confused as there is often an authentication step when you use OAuth to delegate your authorization decision. How is this not authentication?

I recently found a book that explained this really well – OAuth 2 In Action by Justin Richer and Antonio Sanso.

To me this was the perfect developer book. The authors explain what OAuth is, and go through each of the flows that are supported – OAuth works across many types of device and configuration, from simple delegation of authority between two web sites to handling a mobile device or desktop application talking to a REST backend. Most importantly they have examples in JavaScript of each of these scenarios and take you through the JavaScript code you need to implement to support the protocol. After covering OAuth the authors talk through the extensions which are built on top of OAuth. One of these extensions is OpenID Connect. Again, we get to implement this, which is definitely the way to get a good understanding of anything.

The book has an associated GitHub repository for the code examples here.

I did a lightning talk on this at work recently and there’s a good talk on YouTube that explains all of this.


System Design Interviews

I’ve become really interested in the design of big systems, in particular how you can scale using commodity machines to planet size applications. There are lots of sites around that offer material for this purpose (and many YouTube channels too). In particular in the past I’d done a fair amount of reading on the system design primer site. I recently noticed a book on Amazon that looked good for this subject.

System Design Interview: An Insider’s Guide by Alex Xu

This book offers some chapters that describe good ways of tackling system design interview questions, including an introductory chapter that takes a web application for a single machine with a small number of users and describes the techniques for scaling that up. After that the book contains 12 example system designs, ranging from consistent hashing and rate limiting, to designing Google Drive. The chapters are short and to the point. There are numerous typos and sentences that don’t quite scan properly, but all in all the book is a really good read which covers the key points for such interviews. Also each of the chapters has a set of 10+ links to additional material such as papers and engineering blogs of the various companies that implemented the problem. This makes it a quick but informative read, and a good way to review the material around this area.
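As a reminder of what one of these designs looks like in practice, here is a minimal sketch of consistent hashing in Python. It is not the book’s code: the `HashRing` class, the use of MD5 and the figure of 100 virtual nodes per server are all assumptions picked for illustration. Each server is placed on the ring many times, and a key is owned by the first server point at or after the key’s own hash, wrapping around at the end.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


class HashRing:
    """A toy consistent hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes: Iterable[str], replicas: int = 100) -> None:
        self._replicas = replicas                # virtual nodes per server
        self._ring: List[Tuple[int, str]] = []   # sorted (hash, node) points
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        # MD5 is used only to spread points evenly, not for security.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node: str) -> None:
        """Place `replicas` points for this node on the ring."""
        for i in range(self._replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        """Drop every point belonging to this node."""
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def lookup(self, key: str) -> str:
        """Return the node owning the first point at or after the key's hash."""
        if not self._ring:
            raise ValueError("the ring has no nodes")
        index = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["server-a", "server-b", "server-c"])
keys = [f"user-{i}" for i in range(1000)]
before = {key: ring.lookup(key) for key in keys}

ring.remove("server-c")
after = {key: ring.lookup(key) for key in keys}

# Only keys that lived on the removed server should have changed owner.
moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

The point of the design is in the last few lines: taking a server away only relocates the keys that server owned, whereas a naive `hash(key) % number_of_servers` scheme would reshuffle most of them.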
