I’ll have to check if you are my type

I was looking through the F# specification the other day with an eye to seeing how the type inferencing works.
 
I’ve implemented a number of compilers for functional programming languages in the past, and have reasonnable knowledge about how they are implemented. To my surprise, the type interfencing is defined in terms of constraint solving. I remembered that there were a number of papers that had passed by on Lambda the Ultimate which reformulated the Milner-Hindley method in terms of constraints, so I dug out the first paper to get a feel for the area.
 
 
The paper covers type inference for a small language with a few base types (int/bool/string) and a very simple grammar containing just variable values, applications, lambda expressions and let bindings. Types consists of type variables, bases types or functions from types to types. There is also the notion of a type scheme which is the generalisation that happens in let bindings. For example, in the code
  let id = fn x -> x in (id 1, id true)
the function bound to id is given the type
  forall a. a -> a
The values in the tuple expression are then type checked using instances of this type scheme, allowing the two uses of id to have different type instantiations.
 
The way I have seen this implemented in the past is by variants of the W algorithm. This algorithm  uses unification to resolve two types to get a most general type of which the two types are an instance.
Let’s look at the expression
let id = fn x -> let y = x in y
in id id
Algorithm W takes a type environment (mapping program variables to types) and an expression and returns a  substitution (for type variables) and a type for the expression. It would type check our example expression  as follows:
–> In empty type environment let id = fn x -> let y = x in y in id id
—-> In empty type environment fn x -> let y = x in y
—-> We generate a fresh type variable representing the type of x
——> In type environment x:b1, let y = x in y
——–> In type environment x:b1, x
——–< Return no substitution and type b1 for the expression
——> We generalize the result type, b1, for x, in an environment containing x. This generalization doesn’t effect
——> b1 in this case as this type is bound further out.
——–> In the type environment x:b1, y:b1 we process y
——–< Return no substitution and type b1
——< We return no substitution and the type b1
—-< We return no substitution and type b1 -> b1
–< We now know that id has type b1 -> b1. Because this is a let binding though, we generalise this
–< to forall b1. b1 -> b1. Hence uses of id inside the let body are going to allow separate instantiations.
—-> In type environment id: forall b1. b1 -> b1 process id id
——> In type environment forall b1. b1-> b1 process id
——< We instantiate the schema with new type variables and return no substitution and b2 -> b2
——> In type environment forall b1. b1-> b1 process id
——< We instantiate the schema with new type variables and return no substitution and b3 -> b3
—-> The applications causes a new type variable b4 to be generated for the result type of (id id)
—-> and we need to unify (b2-> b2) and (b3 -> b3) -> b4
—-> This gives substitions b2 = b3 -> b3 and b4 = b3 -> b3 and the result type is b4, which is transformed
—-> to b3 -> b3
–> The final type of the expression is b3 -> b3.
The constraint approach works very differently; it happens in two phases. First a set of constriants is generated. Then the constraints are solved; it is at this point that the unification happens. Breaking things into two phases offers the possibility for generating much better type error messages.
The constraints take three forms.
A = B reflects that A and B should be unified at a later stage of the type inferencing.
A <= B reflects that A should be a generic instance of B.
To handle the let bound polymorphism, there is a third constraint type,
A <{M}= B which reflects that the variables M should remain monomorphic during this unification.
Let’s run through the same process on let id = fn x -> let y = x in y in id id

–> let id = fn x -> let y = x in y in id id
—-> fn x -> let y = x in y in id id
——> let y = x in y
——–> x
——–< Generate a new type variable, b1, and return the environment x:b1 and type b1
——–> y
——–< Generate a new type variable, b2, and return the environment y:b2 and type b2
——< Return the environment x:b2 and generate the constaints b2 <{b1}= b1
—-< Introduce a fresh type variable b3 for the result type
—-< Generate constraints b2=b3
—-< and return type b3->b2 and an empty enivronment
–> Continue the body of the let
—-> Process (id id)
——> Process id
——< Return a fresh type variable b5 and environment id:b5
——> Process id
——< Return a fresh type variable b6 and environment id:b6
—-< Add constraints b5 = b6 -> b7 for a fresh b7, and result type b7 in type environment id:b5 id:b6
–< Result type is b7 and constraints are b5 <{}= b3 -> b2 and b6 <{}= b3 -> b2

Now running the rules to solve the equations gives the same result, forall a. a -> a.
b2 <{b1}= b1 forces b2=b1 and then the key is that b5 <{}= b3->b2 can only be solved after the  constraint b2=b3 has been processed, showing that b5 <{}= b1->b1 ie that b5 has the polymorphic type
forall b1. b1 -> b1.
The constraint framework is neat in that we can see how the types flow while the result is constrained, potentially allowing better debugging of the result. Now to investigate the F# type inferencing in more detail.
 
Advertisements
This entry was posted in Computers and Internet. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s