Macro mayhem

Macros are one of the things that set Lisp-based languages apart from the rest, and Clojure has them too. A macro is a function that the compiler calls while it is compiling code; the function returns more code, which the compiler then compiles in place of the original.

A simple example would be something like the following:
user=> (defmacro myMacro[] (println "expansion") '(print x))
#'user/myMacro

Our use of the defmacro macro makes things look complicated. However, if we expand the definition we made, we see that defmacro expands into the normal defn which simply defines a function, then goes on to call the setMacro method on the Var, and finally returns the new Var.
user=> (macroexpand '(defmacro myMacro[] (println "expansion") '(print x)))
(do (clojure.core/defn myMacro [] (println "expansion") (quote (print x))) (. (var myMacro) (setMacro)) (var myMacro))

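The setMacro method simply adds the :macro metadata to the Var. (The ^ reader macro used here to read metadata belongs to early Clojure releases; later releases write (meta #'myMacro) instead.)
user=> ^#'myMacro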
{:macro true, :ns #<Namespace user>, :name myMacro, :file "NO_SOURCE_PATH", :line 127, :arglists ([])}

We can call this macro function and see that all it does is return the expansion.
user=> (#'myMacro)
expansion
(print x)
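
The transcript above comes from an early Clojure release. As far as I know, in releases since 1.2 the function behind a macro takes two extra leading arguments, &form and &env, so calling it directly needs two placeholder values. Here is a minimal sketch under that assumption (my-macro, macro-flag and direct-expansion are just illustrative names, not part of the original session):

```clojure
;; A macro with no declared parameters; its expansion is the quoted form.
(defmacro my-macro []
  '(print x))

;; setMacro marks the Var through its metadata.
(def macro-flag (:macro (meta #'my-macro)))

;; Call the underlying function directly. The two nils stand in for the
;; implicit &form and &env arguments that the compiler normally supplies.
(def direct-expansion (#'my-macro nil nil))

(println macro-flag)        ; prints true
(println direct-expansion)  ; prints (print x)
```

Going through the Var with #' matters here: writing (my-macro) would be treated as a macro use and expanded by the compiler instead of being called.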

When the compiler processes a form, one of the checks it performs is whether the first element of a sequence names a macro. If it does, the compiler expands the macro and continues processing the result of the expansion. Hence the use of a macro looks like a function call, but the effect is very different. In the following we see that “expansion” is printed while the compiler processes the code, and that the resulting form, which prints 20, is patched into the definition.
user=> (defn macroUser[] (let [x 20] (myMacro) (myMacro)))
expansion
expansion
#'user/macroUser
user=> (macroUser)
2020nil
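
The same point can be made without relying on printed output. This is only a sketch with made-up names (expansion-count, twenty, add-twenties): an atom counts how often the macro function runs, and the count is compared before and after calling the compiled function.

```clojure
;; Counts how many times the macro function itself runs.
(def expansion-count (atom 0))

(defmacro twenty []
  (swap! expansion-count inc)  ; side effect at expansion time only
  20)                          ; the code spliced into the caller

;; Compiling this definition is what expands the two uses of (twenty).
(defn add-twenties []
  (+ (twenty) (twenty)))

(def count-after-compile @expansion-count)

;; Calling the compiled function does not run the macro again.
(def first-result (add-twenties))
(def second-result (add-twenties))

(def count-after-calls @expansion-count)

(println first-result second-result)                ; prints 40 40
(println (= count-after-compile count-after-calls)) ; prints true
```

A macro with a side effect like this is for demonstration only; it goes against the suggested practice described later.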

Of course, if we are going to be returning code, we need a templating method for generating chunks of source code. This is what the syntax quote offers us. Syntax quote behaves a lot like quote, but we’ll see in a moment that it allows value substitution. The main difference is that the symbols in the generated body are qualified with the current namespace.
user=> '(foo 2)
(foo 2)
user=> `(foo 2)
(user/foo 2)

The main advantage of syntax quote is that it allows substitution. The use of ~ causes the following form to be evaluated, with the result being substituted into the generated form. Typically the form being substituted is a variable, using syntax like ~x, but any form can follow the ~, so we could use something like ~(print x) to get the side-effect. The ~@ variant expects the following form to evaluate to a list and splices the elements of that list into the generated code.

user=> (let [x 2] `(foo ~x))
(user/foo 2)

user=> (let [x 2] `(foo ~x ~x ~x))
(user/foo 2 2 2)

user=> (let [x '(2 3 4)] `(foo 1 ~@x 5))
(user/foo 1 2 3 4 5)

user=> (let [x '(2 3 4)] `(foo ~(println x) ~x ~@x))
(2 3 4)
(user/foo nil (2 3 4) 2 3 4)
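
Syntax quote is not tied to macros; it is just a way of building forms as data. As a small sketch (make-sum-form and sum-form are hypothetical names), an ordinary function can use ~ and ~@ to build a form, and eval can then run it:

```clojure
;; Builds the form (clojure.core/+ n arg1 arg2 ...) as data.
;; ~n substitutes one value; ~@args splices a whole collection.
(defn make-sum-form [n args]
  `(+ ~n ~@args))

(def sum-form (make-sum-form 1 [2 3]))

(println sum-form)         ; prints (clojure.core/+ 1 2 3)
(println (eval sum-form))  ; prints 6
```

Note that + is qualified to clojure.core/+ because that is where the symbol resolves, while a symbol that resolves to nothing, like foo above, gets the current namespace.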

The syntax quote actually generates code to produce the expansion, so that side-effects can happen. Quoting the result allows us to see what the expansion is.

user=> (let [x 2] '`(foo ~x))
(clojure.core/seq (clojure.core/concat (clojure.core/list (quote user/foo)) (clojure.core/list x)))

The big advantage of macros is that they allow us to write constructs that don’t evaluate arguments in the standard order. If we want to implement something like if,
    (if (= 1 2) (print "a") (print "b"))
then we want to evaluate the condition and only one of the branches. If “if” were an ordinary Clojure function, all of its arguments would be evaluated before the code for “if” ran (assuming no exceptions). Using a macro, we can expand into existing special forms to avoid this.
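
To make the difference concrete, here is a rough sketch that compares a function and a macro with the same shape. All the names (evaluated, note, if-fn, if-macro) are invented for the illustration; note records which branch expressions actually ran.

```clojure
;; Records which branch expressions were actually evaluated.
(def evaluated (atom []))

(defn note [tag]
  (swap! evaluated conj tag)
  tag)

;; Function version: both branches are evaluated before the call.
(defn if-fn [test then else]
  (if test then else))

;; Macro version: expands to the special form if, so only one branch runs.
(defmacro if-macro [test then else]
  `(if ~test ~then ~else))

(if-fn (= 1 2) (note :a) (note :b))
(def fn-evaluated @evaluated)

(reset! evaluated [])
(if-macro (= 1 2) (note :a) (note :b))
(def macro-evaluated @evaluated)

(println fn-evaluated)     ; prints [:a :b]
(println macro-evaluated)  ; prints [:b]
```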

One gotcha in Common Lisp is that you have to be careful that bindings introduced by a macro’s expansion do not shadow variables from the outer lexical scope. So, for example, if you write a macro to time something

(defmacro get-execution-time [& body]
  `(let [before (. System (nanoTime))]
     (do ~@body)
     (- (. System (nanoTime)) before)))

you’d get incorrect behaviour if you tried to

(let [before 10]
  (get-execution-time
    (print before)))

as this would expand to

(let [before 10]
  (let [before (. System (nanoTime))]
    (do (print before))
    (- (. System (nanoTime)) before)))

and hence the (print before) would print the wrong value, as the before in the print gets redirected to point to the start time instead of the outer definition.

The way that Clojure’s syntax quote namespace qualifies the symbols, and the rule that namespace qualified symbols cannot be locally bound, means that the macro as given above fails in Clojure.

user=> (defmacro get-execution-time [& body]
  `(let [before (. System (nanoTime))]
     (do ~@body)
     (- (. System (nanoTime)) before)))
#'user/get-execution-time
user=> (get-execution-time (. Thread (sleep 100)))
java.lang.Exception: Can't let qualified name: user/before (NO_SOURCE_FILE:152)
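
If you do want the capturing behaviour, for instance to reproduce the Common Lisp problem, Clojure makes you ask for it. A common idiom, not otherwise used in this post, is ~'before: unquoting a quoted symbol so that the syntax quote leaves it unqualified. The sketch below uses invented names (capturing-execution-time, seen-by-body) and writes the time call as (System/nanoTime) rather than the dot form used above:

```clojure
;; ~'before inserts the plain symbol before, so the let is legal and
;; deliberately shadows any outer binding of the same name.
(defmacro capturing-execution-time [& body]
  `(let [~'before (System/nanoTime)]
     (do ~@body)
     (- (System/nanoTime) ~'before)))

(def seen-by-body (atom nil))

(let [before 10]
  (capturing-execution-time
    (reset! seen-by-body before)))

;; The body saw the macro's start time rather than the outer 10.
(println (= 10 @seen-by-body))  ; prints false
```

So the default in Clojure is the safe one, and capture has to be spelled out explicitly.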

Inside the syntax quote, we can generate new non-namespaced symbols by suffixing a symbol with a hash. The reader generates a fresh symbol and substitutes it for the hash-suffixed one. Most importantly, every use of the same hash-suffixed symbol within a single syntax-quoted form maps to the same generated symbol, so it can be bound in one place and referenced in another.

user=> `(a# a#)
(a__116__auto__ a__116__auto__)
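
A quick way to check the scope of these generated symbols is to compare them across forms. This sketch (same-form and other-form are just names for the example) shows that the mapping holds within one syntax-quoted form but not between two separate ones:

```clojure
;; Two uses of tmp# in one syntax-quoted form share a generated symbol.
(def same-form `(tmp# tmp#))

;; A separate syntax-quoted form gets its own generated symbol.
(def other-form `(tmp#))

(println (= (first same-form) (second same-form)))  ; prints true
(println (= (first same-form) (first other-form)))  ; prints false

;; The generated symbol carries no namespace, so it can be let-bound.
(println (namespace (first same-form)))             ; prints nil
```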

We can use this to easily fix our macro.

user=> (defmacro get-execution-time [& body]
  `(let [before# (. System (nanoTime))]
     (do ~@body)
     (- (. System (nanoTime)) before#)))
#'user/get-execution-time
user=> (get-execution-time (. Thread (sleep 100)))
99666927
user=> (get-execution-time (. Thread (sleep 101)))
100825314
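
We can also confirm that the fixed macro no longer interferes with an outer binding called before. This is a sketch of the same macro, written with (System/nanoTime) instead of the dot form, plus two made-up names (outer-seen, elapsed) to capture what the body saw:

```clojure
(defmacro get-execution-time [& body]
  `(let [before# (System/nanoTime)]
     (do ~@body)
     (- (System/nanoTime) before#)))

(def outer-seen (atom nil))

;; The body refers to the outer before; the macro's own binding uses a
;; generated name, so the two cannot collide.
(def elapsed
  (let [before 10]
    (get-execution-time
      (reset! outer-seen before))))

(println @outer-seen)         ; prints 10
(println (integer? elapsed))  ; prints true
```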

All of this is handled in the LispReader.java code, in particular the two reader macros

macros['`'] = new SyntaxQuoteReader();
macros['~'] = new UnquoteReader();

SyntaxQuoteReader sets up a map from each hash-suffixed symbol to the generated symbol (“gensym”) that replaces it. The reader is then called recursively to read the form that follows the “`” character. This recursive read returns ~ and ~@ as uses of the symbols

static Symbol UNQUOTE = Symbol.create("clojure.core", "unquote");
static Symbol UNQUOTE_SPLICING = Symbol.create("clojure.core", "unquote-splicing");

and the code in SyntaxQuoteReader walks the returned form to get rid of uses of these symbols, converting the code into a sequence of calls to do the actual construction as we saw in the example above.
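
None of this needs the Java source to observe. As a small sketch, read-string lets us run the reader on its own and look at what comes back (the three def names are mine):

```clojure
;; Outside a syntax quote, the reader leaves ~ and ~@ as ordinary lists
;; headed by the unquote symbols.
(def unquote-form (read-string "~x"))
(def splice-form (read-string "~@xs"))

;; Reading a syntax-quoted form yields the code that will build it.
(def builder-form (read-string "`(foo ~x)"))

(println unquote-form)          ; prints (clojure.core/unquote x)
(println splice-form)           ; prints (clojure.core/unquote-splicing xs)
(println (first builder-form))  ; prints clojure.core/seq
```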

Suggested practice is that macros are deterministic and hygienic: expanding them has no side effects, and the expansion does not capture names from the calling code.
Also, macros that look like normal function calls should evaluate their arguments in the normal order, and should evaluate each argument only once. Hence a macro like
  user=> (defmacro broken[x y] `(do ~y ~x ~x))
  #'user/broken
is bad because it doesn’t behave much like a normal function with regard to argument evaluation. We can see this by using a side-effecting function as an argument.
user=> (broken (print 1) (print 2))
211nil
where a normal function would give
user=> (defn foo[x y])
#'user/foo
user=> (foo (print 1) (print 2))
12nil

The macro should be corrected to
user=> (defmacro brokenNot[x y]
  `(let [x# ~x y# ~y]
     (do y# x# x#)))
#'user/brokenNot
user=> (brokenNot (print 1) (print 2))
12nil
user=> (macroexpand '(brokenNot (print 1) (print 2)))
(let* [x__185__auto__ (print 1) y__186__auto__ (print 2)] (do y__186__auto__ x__185__auto__ x__185__auto__))

The above also highlights that the best way to debug a macro expansion is often to call macroexpand and macroexpand-1. macroexpand-1 checks the first element of the sequence to see if it is a macro and performs a single expansion if it is. macroexpand calls macroexpand-1 repeatedly until no more expansion is possible. Both are useful when debugging macros.
user=> (macroexpand '(brokenNot (print 1) (print 2)))
(let* [x__185__auto__ (print 1) y__186__auto__ (print 2)] (do y__186__auto__ x__185__auto__ x__185__auto__))

user=> (macroexpand-1 '(brokenNot (print 1) (print 2)))
(clojure.core/let [x__185__auto__ (print 1) y__186__auto__ (print 2)] (do y__186__auto__ x__185__auto__ x__185__auto__))
user=> (macroexpand-1 *1)
(let* [x__185__auto__ (print 1) y__186__auto__ (print 2)] (do y__186__auto__ x__185__auto__ x__185__auto__))

Macros are often used to implement internal DSLs and as such can be a very powerful aid to abstraction.
