I’ve been doing loads of reading about the x86/x64 architecture recently, mainly as a result of coming across the fantastic papers written by Agner Fog on his optimisation page. The microarchitecture paper is a really good read. It discusses how instructions are broken down into micro-ops on the various Intel processors, and how this relates to instruction scheduling, branch prediction and speculative execution. It was very interesting to see a clear explanation of how instructions are scheduled across the many different execution units on the chip, and the trade-off between fully utilising all of the on-chip execution resources and the amount of logic the chip needs in order to schedule all of the potential work.
There are a number of papers on optimisation at the assembler and higher levels. This one discusses CPU dispatching by the Intel and GNU C++ compilers. I hadn’t come across this idea before – the compiler generates several specialisations of the code, each customised to use the facilities of a particular CPU, and the program detects the CPU it is running on and selects the appropriate version at run time. This reminded me of one of the arguments for JIT compilers – the JIT certainly knows the architecture it is running on and can target its special instructions. There is obviously a trade-off here though: the language the JIT compiles from is often at a much higher level than the CPU, so high-performance applications still need some way to reach CPU-specific instructions.
Chips seem to be offering more and more hardware support for things that are traditionally implemented in software libraries. This Intel blog post talks about the hardware transactional memory of the current Intel chip generation. Agner Fog has a few posts lamenting the fragmentation caused by hardware vendors competing to add different instructions to their chips.