Tuesday, March 17, 2015

Abstract Binding Trees, an addendum

It struck me after the last post that it might be helpful to give an example using abstract binding trees in a more nontrivial way. The pure lambda calculus has a very simple binding structure, and pretty much anything you do will work out. So I decided to show how ABTs can be used to easily support a much more involved form of binding -- namely, pattern matching.

This makes for a nice example, because it is a very modular and well-structured language feature that naturally has a rather complex binding structure. However, as we will see, it fits very easily into an ABT-based API. I will work through it assuming the implementation in the previous post, focusing only on the use of ABTs.

As is usual, we will need to introduce a signature module for the language.

 module Lambda = struct
Since pattern matching really only makes sense with a richer set of types, let's start by adding sums and products to the type language.
   type tp = Arrow of tp * tp | One | Prod of tp * tp | Sum of tp * tp 

Next, we'll give a datatype of patterns. The PWild constructor is the wildcard pattern $\_$, the PVar constructor is a variable pattern, the PUnit constructor matches the unit value, the PPair(p, p') constructor is the pair pattern, and the PInl p and PInr p constructors are the patterns for the left and right injections of the sum type.

   type pat = PWild | PVar | PUnit | PPair of pat * pat 
            | PInl of pat | PInr of pat 

Now, we can give the datatype for the language signature itself. We add pairs, units, and sums to the language, as well as a case expression which takes a scrutinee, and a list of branches (a list of patterns and the corresponding case arms).

   type 'a t = Lam of 'a | App of 'a * 'a | Annot of tp * 'a
             | Unit | Pair of 'a * 'a | Inl of 'a | Inr of 'a
             | Case of 'a * (pat * 'a) list

One thing worth noting is that the datatype of patterns has no binding information in it at all. The basic idea is that we will represent (say) a pair elimination $\letv{(x,y)}{e}{e'}$ with the constructor Case(e, [(PPair(PVar, PVar), Abs(x, Abs(y, e')))]) (suppressing useless Tm constructors). So the PVar constructor is merely an indication that the arm will be wrapped in an abstraction, with the number of abstractions determined by the number of PVars in the pattern. This representation was first documented in Rob Simmons's paper Structural Focalization.
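
To make the binder-counting concrete, here is a small self-contained sketch. The binders function is my own illustrative helper (not part of the post's code); it computes how many Abs wrappers a pattern's arm must carry:

```ocaml
(* The number of Abs binders wrapped around a case arm equals the
   number of PVar occurrences in its pattern. *)
type pat = PWild | PVar | PUnit | PPair of pat * pat
         | PInl of pat | PInr of pat

let rec binders = function
  | PWild | PUnit -> 0
  | PVar -> 1
  | PPair (p, p') -> binders p + binders p'
  | PInl p | PInr p -> binders p

let () =
  (* the pattern (x, inl y) binds two variables, so its arm has the
     shape Abs (x, Abs (y, e')) *)
  assert (binders (PPair (PVar, PInl PVar)) = 2)
```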

This is really the key thing that gives the ABT interface its power: binding trees have only one binding form, and we never introduce any others.

Now we can give the map and join operations for this signature.

   let map f = function
     | Lam x        -> Lam (f x)
     | App (x, y)   -> App(f x, f y)
     | Annot(t, x)  -> Annot(t, f x)
     | Unit         -> Unit 
     | Pair(x, y)   -> Pair(f x, f y)
     | Inl x        -> Inl (f x)
     | Inr x        -> Inr (f x)
     | Case(x, pys) -> Case(f x, List.map (fun (p, y) -> (p, f y)) pys)

   let join m = function
     | Lam x -> x
     | App(x, y) -> m.join x y
     | Annot(_, x) -> x
     | Unit        -> m.unit
     | Pair(x,y)   -> m.join x y
     | Inl x       -> x
     | Inr x       -> x
     | Case(x, pys) -> List.fold_right (fun (_, y) -> m.join y) pys x 
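
To see what join buys us, here is a self-contained sketch (duplicating the definitions above) of how the free-variable monoid collects the variable sets of a Case node's immediate subterms, exactly the way the Abt functor will use it:

```ocaml
type tp = Arrow of tp * tp | One | Prod of tp * tp | Sum of tp * tp
type pat = PWild | PVar | PUnit | PPair of pat * pat
         | PInl of pat | PInr of pat
type 'a t = Lam of 'a | App of 'a * 'a | Annot of tp * 'a
          | Unit | Pair of 'a * 'a | Inl of 'a | Inr of 'a
          | Case of 'a * (pat * 'a) list

type 'a monoid = {unit : 'a; join : 'a -> 'a -> 'a}

let join m = function
  | Lam x -> x
  | App (x, y) -> m.join x y
  | Annot (_, x) -> x
  | Unit -> m.unit
  | Pair (x, y) -> m.join x y
  | Inl x | Inr x -> x
  | Case (x, pys) -> List.fold_right (fun (_, y) -> m.join y) pys x

module V = Set.Make (String)
let m = {unit = V.empty; join = V.union}

let () =
  (* a Case node whose scrutinee mentions "s", and whose two arms
     mention "a" and "b" respectively *)
  let node = Case (V.singleton "s",
                   [(PVar, V.singleton "a"); (PWild, V.singleton "b")]) in
  assert (V.elements (join m node) = ["a"; "b"; "s"])
```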

As usual, we construct the syntax by applying the Abt functor.

 module Syntax = Abt(Lambda)

We can now define a bidirectional typechecker for this language. Much of the infrastructure is the same as in the previous post.

 module Bidir = struct
   open Lambda
   open Syntax
  type ctx = (var * tp) list

We do, however, extend the is_check and is_synth functions to handle the new operations in the signature. Note that case statements are viewed as checking forms.

  let is_check = function 
    | Tm (Lam _) | Tm Unit | Tm (Pair(_, _))
    | Tm (Inl _) | Tm (Inr _) | Tm (Case(_, _))-> true
    | _ -> false
  let is_synth e = not(is_check e)

  let unabs e =
    match out e with
    | Abs(x, e) -> (x, e)
    | _ -> assert false

When we reach the typechecker itself, most of it --- the check and synth functions --- is unchanged. We have to add new cases to check to handle the new value forms (injections, pairs, and units), but they are pretty straightforward.

  let rec check ctx e tp =
    match out e, tp with
    | Tm (Lam t), Arrow(tp1, tp') ->
      let (x, e') = unabs t in
      check ((x, tp1) :: ctx) e' tp'
    | Tm (Lam _), _               -> failwith "expected arrow type"
    | Tm Unit, One  -> ()
    | Tm Unit, _ -> failwith "expected unit type"
    | Tm (Pair(e, e')), Prod(tp, tp') ->
      let () = check ctx e tp in
      check ctx e' tp'
    | Tm (Pair(_, _)), _ -> failwith "expected product type"
    | Tm (Inl e), Sum(tp, _) -> check ctx e tp
    | Tm (Inr e), Sum(_, tp) -> check ctx e tp
    | Tm (Inl _), _
    | Tm (Inr _), _          -> failwith "expected sum type"

The big difference is in checking the Case form. Now, we need to synthesize a type for the scrutinee, and then check that each branch is well-typed. This works by calling the check_branch function on each branch, passing it the pattern, the arm, the scrutinee's type, and the result type as arguments. For reasons that will become apparent, the pattern and its type are passed in as singleton lists. (I don't do coverage checking here, only because it doesn't interact with binding in any way.)

    | Tm (Case(e, branches)), tp ->
      let tp' = synth ctx e in
      List.iter (fun (p,e) -> check_branch ctx [p] e [tp'] tp) branches 
    | body, _ when is_synth body ->
      if tp = synth ctx e then () else failwith "Type mismatch"
    | _ -> assert false

  and synth ctx e =
    match out e with
    | Var x -> (try List.assoc x ctx with Not_found -> failwith "unbound variable")
    | Tm(Annot(tp, e)) -> let () = check ctx e tp in tp 
    | Tm(App(f, e))  ->
      (match synth ctx f with
       | Arrow(tp, tp') -> let () = check ctx e tp in tp'
       | _ -> failwith "Applying a non-function!")
    | body when is_check body -> failwith "Cannot synthesize type for checking term"
    | _ -> assert false

The way that branch checking works is by steadily deconstructing a list of patterns (and their types) into smaller lists.

  and check_branch ctx ps e tps tp_result =
    match ps, tps with

If there are no more patterns and their types, we are done, and can check that the arm has the right type.

    | [], []-> check ctx e tp_result

If we have a variable pattern, we unabstract the arm, and bind that variable to type, and recur on the smaller list of patterns and types. Note that this is the only place we have to do anything at all with binding, and it's trivial!

    | (PVar :: ps), (tp :: tps)
      -> let (x, e) = unabs e in
         check_branch ((x, tp) :: ctx) ps e tps tp_result

Wildcards and unit patterns work the same way, except that they don't bind anything.

    | (PWild :: ps), (tp :: tps)
      -> check_branch ctx ps e tps tp_result
    | (PUnit :: ps), (One :: tps)
      -> check_branch ctx ps e tps tp_result
    | (PUnit :: ps), (tp :: tps)
      -> failwith "expected term of unit type"

Pair patterns are deconstructed into their two subpatterns, and the product type they are checked against is broken into its two components; both lists are then lengthened with the resulting subproblems.

    | (PPair(p, p') :: ps), (Prod(tp, tp') :: tps) 
      -> check_branch ctx (p :: p' :: ps) e (tp :: tp' :: tps) tp_result
    | (PPair(p, p') :: ps), (_ :: tps) 
      -> failwith "expected term of product type"

Sum patterns work by recurring on the sub-pattern, dropping the left or right part of the sum type, as appropriate.

    | (PInl p :: ps), (Sum(tp, _) :: tps)
      -> check_branch ctx (p :: ps) e (tp :: tps) tp_result
    | (PInl p :: ps), (_ :: tps)
      -> failwith "expected sum type"
    | (PInr p :: ps), (Sum(_, tp) :: tps)
      -> check_branch ctx (p :: ps) e (tp :: tps) tp_result
    | (PInr p :: ps), (_ :: tps)
      -> failwith "expected sum type"
    | _ -> assert false
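
The lockstep list discipline can be distilled into a binding-free sketch. The pats_ok function below is a hypothetical helper of my own that checks only pattern/type compatibility, ignoring terms, contexts, and binding, but it deconstructs the two lists exactly as check_branch does:

```ocaml
type tp = Arrow of tp * tp | One | Prod of tp * tp | Sum of tp * tp
type pat = PWild | PVar | PUnit | PPair of pat * pat
         | PInl of pat | PInr of pat

let rec pats_ok ps tps =
  match ps, tps with
  | [], [] -> true
  | PVar :: ps, _ :: tps -> pats_ok ps tps
  | PWild :: ps, _ :: tps -> pats_ok ps tps
  | PUnit :: ps, One :: tps -> pats_ok ps tps
  | PPair (p, p') :: ps, Prod (tp, tp') :: tps ->
    pats_ok (p :: p' :: ps) (tp :: tp' :: tps)
  | PInl p :: ps, Sum (tp, _) :: tps -> pats_ok (p :: ps) (tp :: tps)
  | PInr p :: ps, Sum (_, tp) :: tps -> pats_ok (p :: ps) (tp :: tps)
  | _ -> false

let () =
  (* a pair pattern grows both lists in lockstep *)
  assert (pats_ok [PPair (PVar, PInl PWild)] [Prod (One, Sum (One, One))]);
  assert (not (pats_ok [PUnit] [Prod (One, One)]))
```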

That's it! So I hope it's clear that ABTs can handle complex binding forms very gracefully.

Friday, March 13, 2015

Abstract Binding Trees

On reddit, Paul Chiusano asked me:
I have a somewhat related question - do you know if it is possible / easy to implement one or both typechecking algorithms using a representation of terms / types where polymorphic recursion is used to enforce proper variable scoping, a la bound.

The answer to this question is no; I haven't thought about this at all. Except when I'm using a proper proof assistant, I don't bother with using types to track scoping. In fact, I tend to view term representations with highly-structured syntax as strong evidence for layering violations: the parser shouldn't know too much about the binding or type structure of a program, because conceptually binding structure is determined after we produce the syntax tree. This is made evident when you look at languages like ML or Haskell, which have features which make it impossible to know which names are in scope prior to doing significant amounts of work (because of import in Haskell, or open in ML).

That said, the problem which motivates types-for-binding is a real one: how can we make substitution safe, once and for all?

The technique I favor is what Bob Harper has dubbed "abstract binding trees", and documented in Chapter 1 of his book. I think it originates in Nuprl, and is also closely related to Martin-Löf's "system of arities" (see Chapter 3 of Programming in Martin-Löf's Type Theory).

To understand abstract binding trees, it helps to start thinking about ordinary abstract syntax trees. ASTs are really just constants $c$, or applications of a term constructor $f$ to some sequence of terms $f(t_1, \ldots t_n)$. (If you are a lisp programmer, you can think of these as s-expressions, and Prolog programmers can think of these as the ground elements of the Herbrand universe.)

Then, to get ABTs, we extend ASTs with two constructors, one for variables $x$, and one for abstractions $x.\;t$. Abstractions are purely binders, with no reduction rules associated with them, and abstract binding trees are simply equated up to alpha-equivalence.

By programming this idea up, we can simply parameterize over the set of term constructors, and have everything to do with scoping and substitution be programmed once and for all. So we end up in a situation where everything Just Works, and all the types are satisfyingly uncomplicated. Furthermore, the implementation is simple enough to fit into a blog post. (All the code in this post is in this gist.)

I'll do it in OCaml, and so we'll need to define some signatures that Haskell programmers get out of the box. First, we need a signature for functorial types. This signature is satisfied by any parametric datatype with a map functional. It also needs to send identities to identities and respect composition, but we can't represent that unless we're in Agda or Coq.

 module type FUNCTOR = sig
   type 'a t
   val map : ('a -> 'b) -> 'a t -> 'b t

Next, we'll introduce a type of monoids, with a unit and combining operation. I'll just use a record here, since that's a lot lighter weight than ML's modules.

 type 'a monoid = {unit : 'a ; join : 'a -> 'a -> 'a}
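
For instance (these particular instances are just my illustrations), integer addition and set union both fit this record type; the latter is exactly the monoid we'll use for free-variable sets below:

```ocaml
type 'a monoid = {unit : 'a ; join : 'a -> 'a -> 'a}

(* Integers under addition: *)
let sum = {unit = 0; join = ( + )}

(* Sets of strings under union -- the free-variable monoid used later: *)
module V = Set.Make (String)
let vars = {unit = V.empty; join = V.union}

let () =
  assert (List.fold_left sum.join sum.unit [1; 2; 3] = 6);
  assert (V.elements (vars.join (V.singleton "x") (V.singleton "y"))
          = ["x"; "y"])
```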

With these preliminaries out of the way, we can move on towards giving the meat of the representation. First, I'll give a type of variables (and sets of variables). This time, instead of being abstract, I'll just be concrete and use strings as the representation of variables. This is partly because this is what I actually do, but partly also for the rhetorical effect of emphasising that an often-maligned representation technique is A-ok.

 type var = string
 module V = Set.Make(struct type t = var let compare = compare end)

Now, I'll give the signature for abstract binding trees.

 module type ABT = sig
   type 'a signature
   type 'a f = Var of var | Abs of var * 'a | Tm of 'a signature
   val map : ('a -> 'b) -> 'a f -> 'b f
   type t
   val into : t f -> t
   val out  : t -> t f
   val freevars : t -> V.t
   val var : V.elt -> t
   val abs : V.elt * t -> t
   val tm : t signature -> t
   val subst : t -> var -> t -> t

Here, the type 'a signature tells us which term constructors we have. We augment this with the 'a f type, which adds a new term former for variables Var of var, as well as a term constructor Abs of var * 'a for scoping a variable. Then we have an abstract type of terms t, which we can wrap and unwrap one level using the into and out functions.

We can also ask for the free variables of a term with freevars, and we have some convenience constructors for variables, abstractions and terms, as well as the subst function, which does alpha-equivalence-respecting substitution. We could also put in other operations (such as equality up to alpha-equivalence), but substitution suffices to illustrate the technique.

We'll also say just what a signature is, now. Basically, it's any functor which can collect up the elements of a monoid. This idea is from Björn Bringert and Aarne Ranta's paper A Pattern for Almost-Compositional Functions.

 module type SIGNATURE = sig
   include FUNCTOR
   val join : 'a monoid -> 'a t -> 'a 

We are finally in a position to give the implementation of ABTs. It's an OCaml functor taking a signature F, and returning an instance of the ABT signature where the 'a signature type is identified with the type of the argument.

module Abt(F : SIGNATURE) : ABT with type 'a signature := 'a F.t = struct
  type 'a f =
    | Var of var
    | Abs of var * 'a
    | Tm of 'a F.t

  let map f = function
    | Var x -> Var x
    | Abs(x, e) -> Abs(x, f e)
    | Tm t -> Tm (F.map f t)

As you can see, the type 'a f is just the same as in the signature, and the map function just lifts the signature type's map operation to the type augmented with binding structure.

  type t = In of V.t * t f

The concrete representation of terms is just the obvious recursive type built over the signature, augmented with a set of free variables at each node. This makes computing the free variables very easy:

  let freevars (In(vs, _)) = vs

In addition, unrolling the type t simply throws away the cached variable set and returns the subterm:

  let out (In(_, t)) = t 

On the other hand, building the embedding function with smart constructors is a little more work. First, we note that sets of variables form a monoid, with the empty set as the unit and the set union as the join operation.

  let m = {unit = V.empty; join = V.union}

We can use this to define the smart constructors. The var function says that a variable's only free variable is itself, and in abs(z, e), the free variables are the free variables of e minus z. For the Tm t case, we use the join operation that we required the signature to possess.

  let var x = In(V.singleton x, Var x)
  let abs(z, e) = In(V.remove z (freevars e), Abs(z, e))
  let tm t = In(F.join m (F.map freevars t), Tm t)

Then we can define into just by using these constructors.

  let into = function
    | Var x -> var x
    | Abs(x, e) -> abs(x, e)
    | Tm t -> tm t 

Now all that remains is to implement capture-avoiding substitution. First, we give a fresh function which returns a new renaming of v which is not in a set of names vs. It does this by sticking primes onto v until it finds one that isn't in vs. It's possible to do something smarter, but it serves for a blog post.

  let rec fresh vs v =
    if V.mem v vs then fresh vs (v ^ "'") else v 
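
As a quick check of the priming behaviour (self-contained, with V specialised to string sets):

```ocaml
module V = Set.Make (String)

let rec fresh vs v =
  if V.mem v vs then fresh vs (v ^ "'") else v

let () =
  let vs = V.of_list ["x"; "x'"] in
  assert (fresh vs "x" = "x''");   (* both "x" and "x'" are taken *)
  assert (fresh vs "y" = "y")      (* unused names come back unchanged *)
```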

We also need a variable renaming function. rename x y e renames x to y in the body e, and just walks down the term structurally. If it hits an abstraction, it stops if the name is shadowed, and otherwise continues on down. Here, we rely on the functoriality of the signature to handle the Tm case.

  let rec rename x y (In(fvs, t)) =
    match t with
    | Var z -> if x = z then var y else var z
    | Abs(z, e) -> if x = z then abs(z, e) else abs(z, rename x y e)
    | Tm v -> tm (F.map (rename x y) v)

We can now give the definition of capture-avoiding substitution. At each abstraction, it does the usual thing of choosing a fresh identifier and renaming. It's easy to do this more efficiently (for instance with a parallel substitution), but it's not worth bothering with until it becomes a bottleneck.

  let rec subst t x body = 
    match out body with
    | Var z when x = z -> t
    | Var z            -> var z 
    | Abs(z, e) ->
      let z' = fresh (V.union (freevars t) (freevars body)) z in
      let e' = subst t x (rename z z' e) in 
      abs(z', e')
    | Tm body -> tm (F.map (subst t x) body)
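
To see capture avoidance in action, here is a self-contained condensation of the machinery above, specialised by hand (without the functor) to a signature with just Lam and App. Substituting y for x under a binder named y forces the binder to be renamed:

```ocaml
module V = Set.Make (String)
type var = string

type t = In of V.t * t f
and 'a f = Var of var | Abs of var * 'a | Tm of 'a sg
and 'a sg = Lam of 'a | App of 'a * 'a

let freevars (In (vs, _)) = vs
let out (In (_, t)) = t

(* Smart constructors, caching free variables at each node: *)
let var x = In (V.singleton x, Var x)
let abs (z, e) = In (V.remove z (freevars e), Abs (z, e))
let tm = function
  | Lam e -> In (freevars e, Tm (Lam e))
  | App (e1, e2) -> In (V.union (freevars e1) (freevars e2), Tm (App (e1, e2)))

let rec fresh vs v = if V.mem v vs then fresh vs (v ^ "'") else v

let rec rename x y (In (_, t)) =
  match t with
  | Var z -> if x = z then var y else var z
  | Abs (z, e) -> if x = z then abs (z, e) else abs (z, rename x y e)
  | Tm (Lam e) -> tm (Lam (rename x y e))
  | Tm (App (e1, e2)) -> tm (App (rename x y e1, rename x y e2))

let rec subst t x body =
  match out body with
  | Var z -> if x = z then t else var z
  | Abs (z, e) ->
    let z' = fresh (V.union (freevars t) (freevars body)) z in
    abs (z', subst t x (rename z z' e))
  | Tm (Lam e) -> tm (Lam (subst t x e))
  | Tm (App (e1, e2)) -> tm (App (subst t x e1, subst t x e2))

let () =
  let e = abs ("y", var "x") in              (* the term y. x *)
  match out (subst (var "y") "x" e) with
  | Abs (z, body) ->
    assert (z = "y'");                       (* the binder got renamed... *)
    assert (out body = Var "y")              (* ...so y is not captured *)
  | _ -> assert false
```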

I'll finish off with a simple example of how we might use it. Let's proceed by giving a syntax for the simply-typed lambda calculus, in a bidirectional style. Note that the signature type 'a t simply has one constructor for each of the term formers of the lambda calculus, and it has no binding information in it at all – all binding is handled through the ABT interface. So the constructor for functions, Lam of 'a, has a single subterm — we'll use typechecking to ensure that the subterm is of the form Abs(z, e).

module Lambda = struct
  type tp = Base | Arrow of tp * tp 
  type 'a t = Lam of 'a | App of 'a * 'a | Let of 'a * 'a | Annot of tp * 'a
  let map f = function
    | Lam x      -> Lam (f x)
    | App (x, y) -> App(f x, f y)
    | Let (x, y) -> Let(f x, f y)
    | Annot(t, x) -> Annot(t, f x)

  let join m = function
    | Lam x -> x
    | App(x, y) -> m.join x y
    | Let(x, y) -> m.join x y
    | Annot(_, x) -> x

With this signature in hand, we can create our syntax:

module Syntax = Abt(Lambda)

Now, here's a simple bidirectional typechecker for this calculus.

module Bidir = struct
  open Lambda
  open Syntax

Contexts are just lists of variables and their types.

  type ctx = (var * tp) list

In a bidirectional typechecker, every term is either used to synthesize a type, or is checked against a type. Here, we give predicates to check that.

  let is_synth = function 
    | Tm (Lam _) | Tm (Let (_, _)) -> false
    | _ -> true
  let is_check e = not(is_synth e)

Now, we give the unabs function, which deconstructs an abstraction and fails with an assertion error if it isn't handed an abstraction.

  let unabs e =
    match out e with
    | Abs(x, e) -> (x, e)
    | _ -> assert false

The reasoning can be seen in the definition of the check ctx e tp function, which checks that term e has type tp in context ctx.

  let rec check ctx e tp =
    match out e, tp with
    | Tm (Lam t), Arrow(tp1, tp') ->
      let (x, e') = unabs t in
      check ((x, tp1) :: ctx) e' tp'

In particular, in the lambda case, we check that the outer constructor is Lam, and then unabstract the body to get the binder and subterm. This should always succeed (or else we've been passed an ill-formed binding tree). It might be possible to tighten up the definition of signatures, but since ABTs are constructed by the parser, errors are likely to be few (and easily detected by Quickcheck-y techniques).

    | Tm (Lam _), _               -> failwith "expected arrow type"
    | Tm (Let(e', t)), _ ->
      let (x, e'') = unabs t in 
      let tp1 = synth ctx e' in 
      check ((x, tp1) :: ctx) e'' tp

The let case works similarly to the lambda case.

    | body, _ when is_synth body ->
      if tp = synth ctx e then () else failwith "Type mismatch"
    | _ -> assert false

If we know a term is synthesizing, we can ask for its type, and then see if it's equal to the type we are checking against. Otherwise, we have ill-formed syntax.

  and synth ctx e =
    match out e with
    | Var x -> (try List.assoc x ctx with Not_found -> failwith "unbound variable")

If we want to synthesize a type for a variable, we just look it up in the context.

    | Tm(Annot(tp, e)) -> let () = check ctx e tp in tp

If we have a type-annotated term, we can use that to synthesize a type for it.

    | Tm(App(f, e))  ->
      (match synth ctx f with
       | Arrow(tp, tp') -> let () = check ctx e tp in tp'
       | _ -> failwith "Applying a non-function!")

To infer a type for an application, we infer a function type for the head, and then check the argument against the argument type of the function, returning the result type of the function.

    | body when is_check body -> failwith "Cannot synthesize type for checking term"
    | _ -> assert false

As you can see, this is more or less what you'd write without any binding machinery – there are a few more annoying Tm constructors, but that's about it. The key point is that the type of substitution is pure, without the need for a context monad, name supply, or similar. That leaves us free to engineer the context monad as needed for the typechecker, without having to work around the requirements of the term representation.

Monday, March 2, 2015

New Draft: Sound and Complete Bidirectional Typechecking for Higher-Rank Polymorphism and Indexed Types

Together with Joshua Dunfield, I have a new draft on bidirectional type inference to announce:

Sound and Complete Bidirectional Typechecking for Higher-Rank Polymorphism and Indexed Types

Bidirectional typechecking, in which terms either synthesize a type or are checked against a known type, has become popular for its scalability, its error reporting, and its ease of implementation. Following principles from proof theory, bidirectional typing can be applied to many type constructs. The principles underlying a bidirectional approach to indexed types (generalized algebraic datatypes) are less clear. Building on proof-theoretic treatments of equality, we give a declarative specification of typing based on focalization. This approach permits declarative rules for coverage of pattern matching, as well as support for first-class existential types using a focalized subtyping judgment. We use refinement types to avoid explicitly passing equality proofs in our term syntax, making our calculus close to languages such as Haskell and OCaml. An explicit rule deduces when a type is principal, leading to reliable substitution principles for a rich type system with significant type inference.

We also give a set of algorithmic typing rules, and prove that it is sound and complete with respect to the declarative system. The proof requires a number of technical innovations, including proving soundness and completeness in a mutually-recursive fashion.

If you are very interested in the details, we have a long (153 pages!) technical report with all of the proofs.

Don't be put off by the proofs, though -- the algorithmic type system is only modestly more complex than our earlier paper on bidirectional typechecking, and this go-around we add a lot of new features, such as intersection types, existential types, GADT-style equality constraints, and a full account of how to do pattern matching (including coverage checking) with all of these features.

Logically speaking, the punchline of the paper is that proof theory knows of two formulations of equality -- the Martin-Löf identity type, and the Girard/Schroeder-Heister equality. (Hilariously, these two notions of equality are not the same.) Furthermore, GADTs work exactly like the Girard/Schroeder-Heister equality. Everything works uncannily well, and our rules (even the algorithmic ones) are much simpler than many of the rules you'll find in the literature.

Thursday, February 26, 2015

Midlands Graduate School 2015

Registration for the Midlands Graduate School is open! I'll be lecturing again this year on functional reactive programming, and am very much looking forward to meeting the participants.


The Midlands Graduate School in the Foundations of Computing Science (MGS) was established in 1999 as a collaboration between researchers at the Universities of Birmingham, Leicester, Nottingham, and later Sheffield. It has two main goals: to equip PhD students with a sound basis for their research by deepening their knowledge on the mathematical and conceptual foundations of computing; and to provide a platform for making contacts with established researchers in the field and with their peers who are at a similar stage in their research careers.

This year's MGS is hosted by the Department of Computer Science at the University of Sheffield. It will start on April 07 and finish on April 11.

Information about previous events can be found at the MGS web site.


MGS 2015 consists of nine courses, each with four or five hours of lectures and exercise sessions. Three of the courses are introductory or core; they should be taken by all participants. The other courses are more advanced or specialised. Participants may select them depending on their interests.

This year the invited lectures will be given by Prof Jeremy Gibbons, Oxford.

In addition there will be early evening sessions in which participants can briefly present and discuss their own research.

Core Courses

Category Theory Roy Crole, Leicester
Typed Lambda Calculus Paul Blain Levy, Birmingham
Patterns in Functional Programming Jeremy Gibbons, Oxford

Advanced Courses

Homotopy Type Theory Thorsten Altenkirch, Nottingham
Infinite Data Structures Venanzio Capretta, Nottingham
Security Protocol Verification Eike Ritter, Birmingham
Functional Reactive Programming Neelakantan Krishnaswami, Birmingham
Building Verification Tools with Isabelle Georg Struth, Sheffield

More information about the course schedule will be given shortly.


The registration deadline for MGS 2015 is Monday March 16. The registration fee is £460.

This fee includes 5 nights of accommodation and breakfasts at Hotel Ibis in Sheffield (from Monday April 06 evening to Saturday April 11 morning) as well as lunches, coffee breaks and the conference dinner.

A reduced fee without accommodation is available on request.

Participants need to register through the University of Sheffield Online Store. Payment is by debit or credit card.


MGS 2015 takes place in the Sir Frederick Mappin Building of the University of Sheffield, which is located on Mappin Street. Information on traveling to Sheffield and finding the Mappin Building is given here. A schedule of lectures and details about the lecture halls will be announced shortly.

Accommodation for MGS 2015 is at Hotel Ibis in the city centre of Sheffield.

The city centre of Sheffield is very compact. It takes about 10 minutes to walk from the train station to Hotel Ibis and about 20 min from the train station or from Hotel Ibis to Mappin Building.

There are many restaurants and pubs in the near vicinity of Mappin Building around Division Street and West Street. More restaurants can be found in the Hunter's Bar area of Sheffield along Ecclesall Road, or along London Road.


Georg Struth (G.Struth@dcs.shef.ac.uk)

Friday, December 12, 2014

Garbage collection and purity

One of my favorite things in language design is when a high-level language feature lets you change your low-level runtime system in interesting ways. In this post, I'll talk about one such idea, which I probably won't have the time to work out in detail any time soon, but which is pretty enough that I want to offer it to the interweb.

In 2004, David Bacon, Perry Cheng, and V.T. Rajan published their paper A Unified Theory of Garbage Collection, which showed that tracing collectors and reference counting were the two extreme ends of a spectrum of memory management strategies.

Consider Cheney's two-space collector. When performing a garbage collection, it will start with the root objects, copy them from old-space to new-space, and then recursively traverse the objects reachable from the roots, moving them from old-space to new-space as it goes.

An important property of Cheney's algorithm is that it never touches any object that needs to be freed; it only follows pointers from live objects to live objects.

On the other hand, consider the naive reference counting algorithm. When the reference count on an object goes to zero, the algorithm will decrement the reference counts of everything the object points to, recursively invoke itself on all of the objects whose reference counts also went to zero, and then free the original object.
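
As a toy sketch of that description (all names here are made up for illustration, and cycles are deliberately ignored):

```ocaml
(* A toy model of naive reference counting over an explicit object graph. *)
type obj = { mutable rc : int; mutable children : obj list; mutable freed : bool }

let node () = { rc = 1; children = []; freed = false }

(* Decrement the count; on reaching zero, decrement the children
   (recursively freeing any that also hit zero), then free this object. *)
let rec release o =
  o.rc <- o.rc - 1;
  if o.rc = 0 then begin
    List.iter release o.children;
    o.freed <- true
  end

let () =
  let c = node () and b = node () and a = node () in
  a.children <- [b]; b.children <- [c];
  release a;                        (* the whole chain a -> b -> c dies *)
  assert (a.freed && b.freed && c.freed)
```

Note that release only ever touches objects that are dying, which is exactly the "traverses dead objects" property discussed below.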

Bacon, Cheng, and Rajan observed that this reference counting algorithm has the opposite property to Cheney's algorithm -- it only traverses dead objects, and never follows a pointer inside a live object.

When object lifetimes are very short, tracing collectors beat reference counting, because very few objects will be live at gc time, and tracing only follows live objects. On the other hand, when object lifetimes are very long, reference counting wins, because very few objects are dead at each collection, and reference counting only follows dead objects.

So (a) the best memory management strategy to use depends on the lifetime of the object in question, and (b) every memory management algorithm can be seen as a hybrid of reference counting and tracing, based on which objects it chooses to follow.

In 2003, Stephen Blackburn and Kathryn McKinley gave an incredibly slick application of this idea in their paper Ulterior Reference Counting. (Yes, the chronology is backwards: research is asynchronous.)

Most production garbage collectors are based on what is called "the generational hypothesis". This says that in typical programs, most objects have a very short lifetime, and only a few have a long lifetime. So it's a good idea to allocate objects into a region of memory called "the nursery", and when it fills up, to copy live objects out of it. Because of the generational hypothesis, most objects in the nursery will be dead, and so collecting the nursery will be very fast.

Blackburn and McKinley observed that the generational hypothesis also implies that if an object survives a young collection, it's likely to live for a very long time. So in their algorithm, they have a nursery as usual for tracing collectors. But for objects copied out of the nursery, they use reference counting. That is, for objects likely to have a short lifetime, they use tracing, and for objects likely to have a long lifetime, they use reference counting!

Now, if you're a functional programmer, the mere mention of reference counting very likely rings some alarm bells --- what about cycles? Reference counting is, after all, notorious for its inability to handle cyclic data.

Blackburn and McKinley handle this issue with a backup mark-and-sweep algorithm that is periodically run on the old generation. But wouldn't it be nice if we could just know that there isn't any cyclic data in the heap? Then we could do away with the backup collector, and implement a very simple "trace young, reference count old" collector.

Surprisingly, this is possible! If we program in a pure functional language, then under the usual implementation strategies, there will never be nontrivial cycles in the heap. The only way a cyclic reference could occur is in the closure of a recursive function definition, and we can simply mark such recursive pointers as something the gc doesn't need to follow.

So a very high-level property (purity) seems to imply something about our low-level runtime (the gc strategy)! Proving this works (and benchmarking it) is something I don't have room on my plate for, but it's something I wish I could do...

Tuesday, December 9, 2014

Quote of the Day

From the introduction to Jean-Louis Krivine's paper, Realizability Algebras: A Program to Well-Order $\mathbb{R}$:
Indeed, when we realize usual axioms of mathematics, we need to introduce, one after the other, the very standard tools in system programming: for the law of Peirce, these are continuations (particularly useful for exceptions); for the axiom of dependent choice, these are the clock and the process numbering; for the ultrafilter axiom and the well ordering of $\mathbb{R}$, these are no less than read and write instructions on a global memory, in other words assignment.
A better example of consilience I cannot imagine!

Thursday, November 13, 2014

Curry-Howard for GUIs: Or, User Interfaces via Linear Temporal, Classical Linear Logic

Together with Jennifer Paykin and Steve Zdancewic, we have written a short note about the next phase of the long project to make GUI programming intellectually manageable.

Essentially, the idea is that the natural language of graphical user interfaces is $\pi$-calculus, typed using classical linear logic (plus some temporal modalities). Furthermore, we think that the implementation strategy of callbacks and event loops can be understood in terms of Paul-André Melliès' tensorial logic. So we think we can:

  1. Explain how there are secretly beautiful logical abstractions inside the apparent horror of windowing toolkits;
  2. Illustrate how to write higher-order programs which automatically maintain complex imperative invariants, and
  3. Write some Javascript programs which we think are actually $\pi$-calculus terms in disguise.
You can download the draft here. If you want to read something with more theorems than this note, I'd suggest looking at Jennifer and Steve's paper about their Linear/Producer/Consumer calculus for CLL.