Threaded Interpretive Languages by R.G.Loeliger
I found this book in Oxfam at the weekend and was immediately hooked. Written in 1981, it documents the implementation of FORTH on a Z80 processor. It contains the code for the full implementation in a combination of Z80 assembler and FORTH in FORTH. The book is really interesting, and answers a number of questions I’ve had about the implementation since the early-80s when I had FORTH running on my BBC Micro.
FORTH, in case you haven’t seen it before, is a language that was designed to be easily portable and was targeted at embedded languages. It is a stack based language, with operations acting on the stack. To do a calculation such as (2+3)*5, you could enter the following expression at the FORTH system, noting that . pops the top value off the stack and prints it to the console.
5 2 3 + * .
which would respond
There are many operations that you can apply to the stack. DUP duplicates the top item on the stack, so we could define an operation to cube the top value on the stack by defining
: TRIPLE DUP DUP * * ;
and then execute this using
The FORTH system is written using a small amount of assembler, with most of the operations being written in FORTH. The system from the book uses what is known as indirect threaded interpretation – the code is more compact that true compilation, but runs faster than byte code. To get a feel for how this works, we can look at how the definition of the TRIPLE function above is compiled. Definitions are stored in a block of memory, the dictionary, which is available to the running FORTH code. A definition takes the form of 16 bit words:
(1) the name – on the old 16 bit system this was encoded as a length byte together with the first three characters
(2) link to the previous dictionary entry
(3) pointer to the handler for the body
(4) data corresponding the function definition
In the case of TRIPLE we’d have
(1) the bytes 6TRI
(2) link to previous entry
(3) address of in-built assembler stub named COLON
(4) pointers to the (3) fields of the dictionary entries for DUP, DUP, *, * followed by the address of the built-in routine SEMI.
The in-built COLON routine is responsible for working through the items in the (4) block and calling them one-by-one. SEMI is responsible for doing a return back to the caller.
Some base set of primitives in the system is implemented in assembler. These take the same format, but the (3) block points to the start of the (4) block, and this block can contain the actual assembler code for the primitive. This keeps the calling sequence the same. This mechanism around the COLON routine is called the inner interpreter.
There is an outer interpreter that takes care of reading a line from the console and converting it into code to be executed. This outer interpreter simply takes the next chunk of characters and looks them up in the dictionary. Then, depending on the mode, the address is either written to the next free word in the dictionary or the primitive is executed. This allows the :, defining keyword, to be implemented in FORTH itself. : is an immediate keyword – when the outer loop sees it, it executes it immediately, transitioning the system into compile mode after the : handler has read the name of the function from the command line and written the encoded name together with the (2) and (3) fields to the end of the dictionary. Subsequent tokens will be looked up and their addresses will be added to the dictionary. When the end of definition “;” keyword is reached, this is again an immediate keyword which will be executed and which will reset the mode back to the normal execute mode after adding a pointer to the internal SEMI routine to the words for the current definition. Numbers are also handled specially by this outer interpreter – if a lookup fails in the dictionary, the system tries to parse the characters as a number, in which case a pointer to the built-in constant handler and then the value of the constant are pushed on to the end of the dictionary.
All of the code for manipulating the dictionary can be written in FORTH itself. Primitives (that are in the dictionary and have assembler associated with their execution fields) can expose all of the various internal state to the system itself, and FORTH code can write and read from memory, so the system is very flexible.
The book is a joy to read. The author covers all of the material connected with implementing the system, starting with a high level view and moving down to a complete Z80 assembler implementation. Having the assembler answers any lingering questions that the author hasn’t covered in his very good explanations, and the author’s enthusiasm is infectious. I may well have to write a simple FORTH system in the near future!