F# can even take out the garbage

With all this talk of modelling at the talks I’ve recently atttended, I figured that it might be time to put some of this in to practice. F# supports the notion of meta-programming quite nicely; there is a means to get hold of a typed abstract syntax tree for expressions that one defines in the language. In principle therefore, one could write a garbage collector in F#, getting the benefits of high level language debugging of the algorithms used and then one could use the quotation support to code generate a garbage collector in C. This kind of idea has been used before – the Squeak smalltalk virtual machine was debugged in Smalltalk and then the VM itself was code generated as C from this object model. Unfortunately, I haven’t yet had time to complete this project. I have finished an implementation of a rudimentary garbage collector, but haven’t yet done the interesting code generation part. I figured that I might as well blog about that part and hope to get more time soon to progress further.
 
The type of garbage collector I am interested in code generating is a mark/sweep collector using pointer reversal. Mark/sweep is one of the very early garbage collection algorithms (from the late 1950’s) and without the complication of generations it is fairly easy to implement. Pointer reversal is a technique for storing the stack, which accumulates as we mark through objects, inside the heap objects themselves, avoiding the need for a garbage collector stack or mark map. It is a technique that has fallen out of favour, one reason being that the multiple writes to objects when we store the reverse pointers means that a lot of pages get touched multiple times, which might cause the virtual memory system some problems.
 
We’ll build heap allocated objects on top of this type. Objects on the heap have a flag to say if they are marked (Marked), can be initialized to safe values (Initialize),  can list the pointers that they contain (Fields) and allow us to store a pointer and offset in them (StorePointer) which we’ll need when we store the revese pointers into the objects. Fields takes an offset into the object and returns the next offset and value of any embedded pointer, and returns None if there isn’t one. When we store a pointer (StorePointer) into an object at a given offset, we return the old contents of that pointer holding field and old stored offset. By doing this overlaying, the only extra space we need per-object is space for the mark bit and the offset value.
 
type GCable() =
  class
    abstract Marked : bool with get, set
    abstract StorePointer : option<GCable * int> * int -> option<GCable * int>
    abstract Initialize : unit -> unit
    abstract Fields : int -> option<GCable * int>
  end
 
We can now define an IntCell which has a head storing an int and a tail that may contain either another IntCell or None. Class types in F# don’t have a null element so we need to use an option type to give us None to represent this null case.
 
type IntCell() =
  inherit GCable()
  let mutable marked = false
  let mutable offset = -1
  let mutable head = 0
  let mutable tail = None
  override me.Marked with get() = marked and set(value) = marked <- value
  override me.StorePointer (reverseData,y) =
    if y <> 1 then failwith "Invalid offset to store"
    let toReturn =
      if offset = -1
      then None
      else
        match tail with
        | None -> failwith "Invalid reverse tail"
        | Some datum -> Some(datum, offset) 
    match reverseData with
      | Some (x, storedOffset) ->
          offset <- storedOffset
          tail <- Some x
      | None ->
          offset <- -1
          tail <- None
    toReturn
  override me.Initialize() =
      head <- 0
      tail <- None
  override me.Fields(index) =
    match tail with
    | None -> None
    | Some value -> if index = 0 then Some (value,1) else None
  member me.head with get() = head and set(x) = head <- x
  member me.tail
     with get() = (match tail with None -> None | Some y -> Some(y :?> IntCell))
     and set(x : option<IntCell>) = tail <- (match x with None -> None | Some y -> Some (y :> GCable));;
 
Our heap will support the following interface.
 
type IGCHeap =
  interface
    abstract Allocate : System.Type -> option<GCable>
     abstract FreeSpace : int with get
    abstract Allocated : unit -> List<GCable>
    abstract MakeFree: GCable -> unit
  end;;
 
Allocate will try to allocate another object of the given type. It will return None if there is no space left. The heap will track the allocated objects (Allocated) and we have access to a MakeFree function for releasing objects. FreeSpace gives us a measure of the free space available.
 
type TestHeap(size) =
  let allocatedItems = new List<GCable>()
  let freespace() = size – allocatedItems.Count
  interface IGCHeap with
    member me.Allocate(objType) =
            if freespace() = 0
            then None
            else
              let newItem = System.Activator.CreateInstance(objType) :?> GCable
              newItem.Initialize()
              allocatedItems.Add(newItem)
              Some newItem
    member me.FreeSpace with get() = freespace()
    member me.Allocated() =
      let result = new List<GCable>()
      for item in allocatedItems do
        result.Add(item)
      result
    member me.MakeFree (item) =
      item.Marked <- false
      allocatedItems.Remove(item) |> ignore
  end;;
 
We can do some simple tests against the implementation.
let Check x = if not x then failwith "Violation"
let heap1 = new TestHeap(10) :> IGCHeap;;
Check (heap1.FreeSpace = 10);;
Check (heap1.GCCount = 0);;
let newCell (heap : IGCHeap) =
  match heap.Allocate(typeof<IntCell>) with
  | None -> failwith "no memory left"
  | Some datum -> datum :?> IntCell;;
let obj1 = newCell(heap1);;
Check (heap1.FreeSpace = 9);;
Check (heap1.GCCount = 0);;
Check (obj1.Fields(0) = None);;
obj1.head <- 50;;
Check (obj1.Fields(0) = None);;
obj1.tail <- Some obj1;;
Check (obj1.Fields(0) <> None);;
obj1.tail <- None;;
Check (obj1.Fields(0) = None);;
 
Now we can get to implementing a collector that frees items on this heap. This is the mark routine that does the pointer reversing marking.
markThrough is called for each object – it marks the object and then enters the scan function which is responsible for marking through all of the fields of the object.
The routines pass the reverse pointers as reverseData which points to the field in the previous object in the stack – this may be None which is the case for the top level entry into the marker.
let rec markThrough reverseData (obj : GCable) =
  obj.Marked <- true
  scan obj 0 reverseData
 
and scan obj index reverseData =
    match obj.Fields(index) with
    | None -> unwind reverseData obj
    | Some (target, field) ->
      if target.Marked
      then scan obj (index+1) reverseData
      else
        let result = obj.StorePointer (reverseData, field)
        markThrough (Some(obj, field)) target
 
and unwind reverseData obj =
      match reverseData with
      | None -> ()
      | Some (item, offset) ->
         let previous = item.StorePointer (Some (obj,0) , offset) 
         scan item offset previous
 
We can generate some cells and give it a quick check.
let heap2 = new TestHeap(10) :> IGCHeap;;
let obj2 = newCell heap2;;
let obj3 = newCell heap2;;
Check(not obj2.Marked);;
Check(not obj3.Marked);;
obj2.tail <- Some obj3;;
markThrough None (obj2 :> GCable);;
Check(obj2.Marked);;
Check(obj3.Marked);;
Check((fun (Some datum) -> LanguagePrimitives.PhysicalEquality datum obj3) obj2.tail);;
Check(obj3.tail = None);;
 
Now we need to sweep up after the mark phase.
let sweep (heap :IGCHeap) =
  for item in heap.Allocated() do
    if item.Marked
    then item.Marked <- false
    else heap.MakeFree item
 
And we can quickly check that it works to some degree.
 
Check (heap2.FreeSpace = 8);;
sweep heap2;;
Check (heap2.FreeSpace = 8);;
obj2.tail <- None;;
markThrough None (obj2 :> GCable);;
sweep heap2;;
Check (heap2.FreeSpace = 9);;
 
Pointer reversal is easy to get wrong and the bugs often leave the heap in a very corrupted state. By working at this high level, we can unit test the mark routine itself, by defining dummy object types with interesting pointer maps.
 
type TestCell(fields) =
  inherit GCable()
  let mutable marked = false
  let mutable offset = -1
  let mutable stored = None
  override me.Marked with get() = marked and set(value) = marked <- value
  override me.StorePointer (reverseData,y) =
    printfn "Store into %d" y
    let toReturn =
      if offset = -1
      then None
      else
        match fields.[y-1] with
        | None -> failwith "Invalid reverse lookup"
        | Some datum -> Some(datum, offset) 
    match reverseData with
      | Some (x, storedOffset) ->
          offset <- storedOffset
          fields.[y-1] <- Some x
      | None ->
          fields.[y-1] <- None
          offset <- -1
    toReturn
  override me.Initialize() = ()
  override me.Fields(index) =
     printfn "Fields %d" index
     if index >= Array.length(fields)
     then None
     else
       match fields.[index] with
       | None -> me.Fields(index+1)
       | Some other -> Some (other, index+1);;
 
We can then verify (in this case by eye) that the expected series of actions are carried out.
let a = new IntCell() :> GCable;; (a :?> IntCell).head <- 1;;
let b = new IntCell() :> GCable;; (b :?> IntCell).head <- 2;;
let c = new IntCell() :> GCable;; (c :?> IntCell).head <- 3;;
let d = new IntCell() :> GCable;; (d :?> IntCell).head <- 4;;
let fields = [| Some a;Some b; Some b;Some c; Some b; Some d |];;
let testcell = new TestCell(fields);;
Check(not a.Marked);;
Check(not b.Marked);;
Check(not c.Marked);;
Check(not d.Marked);;
markThrough None (testcell :> GCable);;
Check(a.Marked);;
Check(b.Marked);;
Check(c.Marked);;
Check(d.Marked);;
Check(fields.[0] = Some a);;
Check(fields.[1] = Some b);;
Check(fields.[2] = Some b);;
Check(fields.[3] = Some c);;
Check(fields.[4] = Some b);;
Check(fields.[5] = Some d);;
 
And now we’ve prototyped, let’s define the real thing. We’re going to push the garbage collector implementation into the heap itself. The runtime will need to communicate pointer maps to the collector, so we’ll allow the runtime to push and pop collections of roots (PushRoots/PopRoots). In practice these maps would be generated lazily.
type IGCHeap =
  interface
    abstract Allocate : System.Type -> option<GCable>
    abstract FreeSpace : int with get
    abstract PushRoots: GCable list -> unit
    abstract PopRoots : unit -> unit
    abstract Collect : unit -> unit
  end;;
 
The heap definition now includes the code for the collector.
 
type TestHeap (size) =
  let allocatedItems = new List<GCable>()
  let freespace() = size – allocatedItems.Count
  let mutable rootStack = []
  let allocate (objType : System.Type) =
              let newItem = System.Activator.CreateInstance(objType) :?> GCable
              newItem.Initialize()
              allocatedItems.Add(newItem)
              Some newItem
  let allocated() =
      let result = new List<GCable>()
      for item in allocatedItems do
        result.Add(item)
      result
  let makeFree (item : GCable) =
      item.Marked <- false
      allocatedItems.Remove(item) |> ignore
  let collector (this) =
    List.iter (fun roots -> (List.iter (fun root -> markThrough None root) roots)) rootStack
    for item in allocated() do
      if item.Marked
      then item.Marked <- false
      else makeFree item
  interface IGCHeap with
    member me.Allocate(objType) =
            if freespace() = 0
            then
               collector me
               if freespace() = 0
               then None
               else allocate(objType)
            else allocate(objType)
    member me.FreeSpace with get() = freespace()
    member me.PushRoots frameRoots = rootStack <- frameRoots :: rootStack
    member me.PopRoots () = rootStack <-  List.tl rootStack
    member me.Collect () = collector me
  end;;
 
We can then define a new heap and a utility function for allocating data.
let heap3 = new TestHeap(10) :> IGCHeap;;
let newCell () =
  match heap3.Allocate(typeof<IntCell>) with
  | None -> failwith "no memory left"
  | Some datum -> datum :?> IntCell;;
 
And then write some test functions to exercise it.
let rec test (level : int) =
 if level > 0
 then
  printfn "On entry %d" heap3.FreeSpace
  let cell1 = newCell()
  try
      heap3.PushRoots [ (cell1 :> GCable) ]
      cell1.head <- 1
      newCell() |> ignore
      newCell() |> ignore
      printfn "Before collect %d" heap3.FreeSpace
      heap3.Collect()
      printfn "After collect %d" heap3.FreeSpace
      test (level – 1)
  finally
      heap3.PopRoots();;
 
Notice how the roots are communicated to the heap by pushing a new contour after a sequence of allocations.
We can test the heap by calling this function
 
heap3.FreeSpace;;
heap3.Collect();;
heap3.FreeSpace;;
test 5;;
heap3.FreeSpace;;
heap3.Collect();;
heap3.FreeSpace;;
 
So now I think we are at the stage were we can try to code generate something, but that’s going to have to wait until next time.
 
 
Advertisements
This entry was posted in Computers and Internet. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s