Usually you write your VM, debug it, then write your JIT and spend a lot of time debugging that. It is often very hard to get the interpreted and JIT-compiled versions of the code to behave identically. Laurence Tratt, who has been writing a VM for a language he developed, has managed to avoid the second half of this workflow. In this post, he documents a rather novel approach to adding a JIT to his VM. Amazingly, he got the JIT with almost no work (or, more precisely, without writing any code that emits machine-code instructions).
All he had to do was write the VM in RPython, a restricted subset of Python, and then run it through the toolchain developed by the PyPy project. I remember coming across PyPy many years ago; it is a project to implement Python in Python. Writing a virtual machine in a high-level language is good for all the usual reasons we use high-level languages: productivity through easier debugging and more concise expression. Writing a VM in the language it implements is a long-established technique. The Smalltalk-80 virtual machine was originally specified in Smalltalk itself, and the Squeak Smalltalk implementation is bootstrapped by translating its VM into C, which is then compiled into a standalone virtual machine, while the VM itself can still be debugged with the standard Smalltalk tools.

The PyPy developers have now extended this idea to generate a tracing JIT as well. Their meta-tracer traces the interpreter as it executes the user's program, so in effect it traces through both the interpreter's code and the code the interpreter is running. This has two consequences. First, the interpreter must pass extra information (hints) to the tracer so that the tracer can tell when it has reached the same point in the user's program again. Second, the tracer does not trace the generated C code; it traces a bytecode representation of the RPython code from which the C was generated. Despite this, the process is largely automatic.
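To make the "extra information" concrete, here is a minimal sketch, in plain Python, of the shape an interpreter takes when it is written for a meta-tracer. RPython interpreters typically declare a `JitDriver` from `rpython.rlib.jit` with `greens` (variables that together identify a position in the user's program) and `reds` (everything else that is live), and call `jit_merge_point` at the top of the dispatch loop and `can_enter_jit` on backward jumps. The `StubJitDriver` class, the toy opcodes and the `threshold` parameter below are hypothetical stand-ins so that the example runs under ordinary Python: the stub does no tracing at all, it only counts how often each user-level loop header is reached, which is the information a real meta-tracer uses to decide that it is looking at the same code again.

```python
from collections import Counter


class StubJitDriver:
    """Hypothetical stand-in for RPython's JitDriver (no tracing, no compilation).

    It only counts how often each user-level loop header, identified by the
    values of the green variables, is reached.
    """

    def __init__(self, greens, reds, threshold=1000):
        self.greens = greens
        self.reds = reds
        self.threshold = threshold  # hypothetical hotness threshold
        self.loop_headers = Counter()
        self.hot_loops = set()

    def _check(self, live_vars):
        # Expect every green and red variable to be passed by name.
        assert set(live_vars) == set(self.greens) | set(self.reds)

    def jit_merge_point(self, **live_vars):
        # Marks the top of the interpreter's dispatch loop.
        self._check(live_vars)

    def can_enter_jit(self, **live_vars):
        # Called on a backward jump in the *user* program. The greens form the
        # key meaning "we are back at the same user-level loop".
        self._check(live_vars)
        key = tuple(live_vars[name] for name in self.greens)
        self.loop_headers[key] += 1
        if self.loop_headers[key] >= self.threshold:
            self.hot_loops.add(key)  # a real meta-tracer would start tracing here


def make_sum_program(n):
    # Toy bytecode computing n + (n-1) + ... + 1 in register 0 (assumes n >= 1).
    return (
        ("LOAD_CONST", 0, 0),       # 0: acc = 0
        ("LOAD_CONST", 1, n),       # 1: i = n
        ("ADD", 0, 1),              # 2: acc += i   (loop header)
        ("DEC", 1),                 # 3: i -= 1
        ("JUMP_IF_NONZERO", 1, 2),  # 4: if i != 0: goto 2
        ("HALT",),                  # 5: return acc
    )


def interpret(program, driver):
    pc = 0
    regs = [0, 0]
    while True:
        # pc and program are green (they locate us in the user's program);
        # regs is red (its contents change on every iteration).
        driver.jit_merge_point(pc=pc, program=program, regs=regs)
        op = program[pc]
        name = op[0]
        if name == "LOAD_CONST":
            regs[op[1]] = op[2]
            pc += 1
        elif name == "ADD":
            regs[op[1]] += regs[op[2]]
            pc += 1
        elif name == "DEC":
            regs[op[1]] -= 1
            pc += 1
        elif name == "JUMP_IF_NONZERO":
            if regs[op[1]] != 0:
                backward = op[2] < pc
                pc = op[2]
                if backward:
                    # Tell the driver we may be closing a user-level loop.
                    driver.can_enter_jit(pc=pc, program=program, regs=regs)
            else:
                pc += 1
        elif name == "HALT":
            return regs[0]
        else:
            raise ValueError("unknown opcode: %r" % (name,))
```

For example, `interpret(make_sum_program(5), StubJitDriver(greens=["pc", "program"], reds=["regs"], threshold=3))` returns 15, and the loop header at `pc == 2` is counted four times. The interpreter itself contains no JIT logic; it only annotates where user-level loops begin, which is why the same source can be run and debugged as ordinary Python and still give the toolchain enough information to build a tracing JIT.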
Tratt uses this technique to get a VM with good performance very quickly, and it’s an impressive piece of work.
As a side note, tracing JITs seem to have recently fallen out of favour in some circles. They rely on execution repeatedly following the same paths through the code, and on those paths not growing too long before a loop is detected. The Mozilla developers, for instance, have gone back to a more conventional method-based JIT.