It’s not the parsing, but a question of semantics

One very neat feature of Lisp based languages is that you don’t really need a parser. Lisp systems provide a facility called the Reader which is responsible for taking a string and converting it into a “parse tree”. The neat thing is that the form of this parsed representation is trivial to express in the datatypes that you use for doing your normal programming in the language – this language idea is called homoiconicity. Hence the Reader itself can be very simple, and it certainly makes it easy to do some forms of metaprogramming such as macros for syntax tree transformations. However, when it comes down to actually compiling some source code, systems like Clojure still need a tree representation that is annotated with semantic information about the various forms.

I’ve recently been having a play with a couple of GitHub Clojure projects that take care of this expansion, and these have been used in a number of recent language analysis tools. Such tools typically need some form of tree that better expresses the dataflow of the resulting code.

The projects are a base analyser and separate extensions for JVM and JavaScript (ClojureScript) backends.

A call to analyse  macroexpands the supplied form, and returns a tree representation of the semantic tree.
(def form (ana.jvm/analyze '(fn [x] (+ x 1))))

We can convert the tree back into source code using the  emit-form function
user=> ( form)
(fn* ([x] (clojure.lang.Numbers/add x 1)))

The top level form that we processed above is tagged as a function definition.
user=> (-> form :tag)

The function has a number of overloaded methods associated with it. We can take the first (of one) and look to see what its body looks like.
user=> ( (-> form :methods first :body))
(clojure.lang.Numbers/add x 1)

The body is a do (as each method body is contained inside an implicit do)
user=> (-> form :methods first :body :op)
user=> (-> form :methods first :body keys)
(:o-tag :tag :body? :op :env :form :statements :ret :children)

This post lists the various tree types that turn up in the annotated tree and the unit tests show the kind of items you expect in maps that represent the nodes, and this video by Timothy Baldridge covers some of the potential uses. It is very useful to have a library that will take care of macroexpanding a given form, and then express its meaning as a tree of a limited node types. To actually compile things we simply need to be able to code generate these node types, knowing that more complicated expressions define their semantics in terms of these simple nodes. Sometimes, the beauty of the ideas behind Lisp astound me.

This entry was posted in Computers and Internet. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s