I have a somewhat related question - do you know if it is possible / easy to implement one or both typechecking algorithms using a representation of terms / types where polymorphic recursion is used to enforce proper variable scoping, a la bound.

The answer to this questions is no, I haven't thought about this at
all. Except when I'm using a proper proof assistant, I don't bother
with using types to track scoping. In fact, I tend to view term
representations with highly-structured syntax as strong evidence for
layering violations: the parser *shouldn't* know too much about
the binding or type structure of a program, because conceptually
binding structure is determined *after* we produce the syntax
tree. This is made evident when you look at languages like ML or
Haskell, which have features which make it impossible to know which
names are in scope prior to doing significant amounts of work (because
of `import`

in Haskell, or `open`

in ML).

That said, the *problem* which motivates types-for-binding
is a real one: how can we make substitution safe, once and for all?

The technique I favor is what Bob Harper has dubbed "abstract binding trees", and documented in Chapter 1 of his book. I think it originates in Nuprl, and is also closely related to Martin-Löf's "system of arities" (see Chapter 3 of Programming in Martin-Löf's Type Theory).

To understand abstract binding trees, it helps to start thinking about ordinary abstract syntax trees. ASTs are really just constants $c$, or applications of a term constructor $f$ to some sequence of terms $f(t_1, \ldots t_n)$. (If you are a lisp programmer, you can think of these as s-expressions, and Prolog programmers can think of these as the ground elements of the Herbrand universe.)

Then, to get ABTs, we extend ASTs with two constructors, one for
variables $x$, and one for abstractions $x.\;t$. Abstractions
are *purely* binders, with no reduction rules associated with
them, and abstract binding trees are simply equated up to
alpha-equivalence.

By programming this idea up, we can simply parameterize over the set of term constructors, and have everything to do with scoping and substitution be programmed once and for all. So we end up in a situation where everything Just Works, and all the types are satisfyingly uncomplicated. Furthermore, the implementation is simple enough to fit into a blog post. (All the code in this post in in this gist.)

I'll do it in Ocaml, and so we'll need to define some signatures
that Haskell programmers get out of the box. First, we need a
signature for functorial types. This signature is satisfied by any
parametric datatype with a `map`

functional. It also needs
to send identities to identities and respect composition, but we can't
represent that unless we're in Agda or Coq.

module type FUNCTOR = sig type 'a t val map : ('a -> 'b) -> 'a t -> 'b t end

Next, we'll introduce a type of monoids, with a unit and combining operation. I'll just use a record here, since that's a lot lighter weight than ML's modules.

type 'a monoid = {unit : 'a ; join : 'a -> 'a -> 'a}

With these preliminaries out of the way, we can move on towareds giving the meat of the representation. First, I'll give a type of variables (and sets of variables). This time, instead of being abstract, I'll just be concrete and use strings as the representation of variables. This is partly because this is what I actually do, but partly also for the rhetorical effect of emphasising that an often-maligned representation technique is A-ok.

type var = string module V = Set.Make(struct type t = var let compare = compare end)Now, I'll give the signature for abstract binding trees.

module type ABT = sig type 'a signature type 'a f = Var of var | Abs of var * 'a | Tm of 'a signature val map : ('a -> 'b) -> 'a f -> 'b f type t val into : t f -> t val out : t -> t f val freevars : t -> V.t val var : V.elt -> t val abs : V.elt * t -> t val tm : t signature -> t val subst : t -> var -> t -> t end

Here, the type `'a signature`

tells us what the term constructors
we have are. We augment this with the `'a f`

type, which adds a
new term former for variables `Var of var`

, as well as a term
constructor `Abs of var * 'a`

for scoping a variable.
Then we have an abstract type of terms `t`

, which we can wrap
and unwrap one level using the `into`

and `out`

functions.

We can also have ask for the free variables of a type
with `freevars`

, and we have some convenience constructors
for variables, abstractions and terms, as well as
the `subst`

function which does the alpha-equivalence
respecting substitution. We could also put in other operations (such
as equality up to alpha-equivalence), but substitution suffices to illustrate
the technique.

We'll also say just what a signature is, now. Basically, it's any
functor which can collect up the elements of a monoid. This idea is
from Björn Bringert and Aarne Ranta's
paper *A
Pattern for Almost-Compositional Functions*.

module type SIGNATURE = sig include FUNCTOR val join : 'a monoid -> 'a t -> 'a end

We are finally in a position to give the implementation of ABTs. It's an Ocaml
functor taking a signature `F`

, and returning an instance of the `ABT`

signature where the `'a signature`

type is identified with the type of the argument.

module Abt(F : SIGNATURE) : ABT with type 'a signature := 'a F.t = struct type 'a f = | Var of var | Abs of var * 'a | Tm of 'a F.t let map f = function | Var x -> Var x | Abs(x, e) -> Abs(x, f e) | Tm t -> Tm (F.map f t)

As you can see, the type `'a f`

is just the same as in the signature, and
the `map`

function just lifts the signature type's map operation to the
type augmented with binding structure.

type t = In of V.t * t f

The concrete representation of terms is just the obvious recursive type built over the signature, augmented with a set of free variables at each node. This makes computing the free variables very easy:

let freevars (In(vs, _)) = vs

In addition, unrolling the type `t`

simply throws away some variables and
returns the subterm:

let out (In(_, t)) = t

On the other hand, building the embedding function with smart constructors is a little more work. First, we note that sets of variables form a monoid, with the empty set as the unit and the set union as the join operation.

let m = {unit = V.empty; join = V.union}We can use this to define the smart constructors. The

`var`

function simply says that its free variables are its one variable
itself, and in `abs(z,e)`

, the free variables are the free
variables of `e`

minus `z`

. For the ```
Tm
t
```

case, we use the `join`

operation that we required
the signature to possess.
let var x = In(V.singleton x, Var x) let abs(z, e) = In(V.remove z (freevars e), Abs(z, e)) let tm t = In(F.join m (F.map freevars t), Tm t)Then we can define

`into`

just by using these constructors.
let into = function | Var x -> var x | Abs(x, e) -> abs(x, e) | Tm t -> tm t

Now all that remains is to implement capture-avoiding
substitution. First, we give a
`fresh`

function which returns a new renaming of `v`

which is
not in a set of names `vs`

. It does this by sticking primes onto `v`

until it finds one that isn't in `vs`

. It's possible to do something smarter, but it serves for a blog post.

let rec fresh vs v = if V.mem v vs then fresh vs (v ^ "'") else vWe also need a variable renaming function.

`rename x y e`

renames `x`

to `y`

in the body `e`

, and just walks down the term structurally. If it hits an abstractor, it will stop if the name is shadowed, and otherwise continue on down. Here, we rely on the functoriality of the signature to handle the `Tm`

case.
let rec rename x y (In(fvs, t)) = match t with | Var z -> if x = z then var y else var z | Abs(z, e) -> if x = z then abs(z, e) else abs(z, rename x y e) | Tm v -> tm (F.map (rename x y) v)

We can now give the definition of capture-avoiding substitution. At each abstraction, it does the usual thing of choosing a fresh identifierand renaming. It's easy to do this more efficiently (for instance with a parallel substitution), but it's not worth bothering with until it becomes a bottleneck.

let rec subst t x body = match out body with | Var z when x = z -> t | Var z -> var z | Abs(x, e) -> let x' = fresh (V.union (freevars t) (freevars body)) x in let e' = subst t x (rename x x' e) in abs(x', e') | Tm body -> tm (F.map (subst t x) body) end

I'll finish off with a simple example of how we might use it. Let's proceed by giving a syntax for the simply-typed lambda calculus, in a bidirectional style. Note that the signature type `'a t`

simply has one constructor for each of the term formers of the lambda calculus, and it has *no* binding information in it at all – all binding is handled through the ABT interface. So the constructor for functions `Lam of 'a`

, has a single subterm — we'll use typechecking to ensure that the subterm is of the form `Abs(z, e)`

.

module Lambda = struct type tp = Base | Arrow of tp * tp type 'a t = Lam of 'a | App of 'a * 'a | Let of 'a * 'a | Annot of tp * 'a let map f = function | Lam x -> Lam (f x) | App (x, y) -> App(f x, f y) | Let (x, y) -> Let(f x, f y) | Annot(t, x) -> Annot(t, f x) let join m = function | Lam x -> x | App(x, y) -> m.join x y | Let(x, y) -> m.join x y | Annot(_, x) -> x end

With this signature in hand, we can create our syntax:

module Syntax = Abt(Lambda)Now, here's a simple bidirectional typechecker for this calculus.

module Bidir = struct open Lambda open SyntaxContexts are just lists of variables and their types.

type ctx = (var * tp) listIn a bidirectional typechecker, every term is either used to synthesize a type, or is checked against a type. Here, we give predicates to check that.

let is_synth = function | Tm (Lam _) | Tm (Let (_, _)) -> false | _ -> true let is_check e = not(is_synth e)Now, we give the

`unabs`

function, which deconstructs an abstraction and fails with an assertion error if it isn't handed an abstraction.
let unabs e = match out e with | Abs(x, e) -> (x, e) | _ -> assert false

The reasoning can be seen in the definition of the `check ctx e tp`

function, which checks that term `e`

has type `tp`

in context `ctx`

.

let rec check ctx e tp = match out e, tp with | Tm (Lam t), Arrow(tp1, tp') -> let (x, e') = unabs t in check ((x, tp1) :: ctx) e' tp'In particular, in the lambda case, we check that the outer constructor is

`Lam`

, and then unabstract the body to get the binder and subterm. This should always succeed (or else we've been passed an ill-formed binding tree). It might be possible to tighten up the definition of signatures, but since ABTs are constructed by the parser, errors are likely to be few (and easily detected by Quickcheck-y techniques).
| Tm (Lam _), _ -> failwith "expected arrow type" | Tm (Let(e', t)), _ -> let (x, e'') = unabs t in let tp1 = synth ctx e' in check ((x, tp1) :: ctx) e'' tp

The let case works similarly to the lambda case.

| body, _ when is_synth body -> if tp = synth ctx e then () else failwith "Type mismatch" | _ -> assert false

If we know a term is synthesizing, we can ask for its type, and then see if it's equal to the type we are checking against. Otherwise, we have ill-formed syntax.

and synth ctx e = match out e with | Var x -> (try List.assoc x ctx with Not_found -> failwith "unbound variable")

If we want to synthesize a type for a variable, we just look it up in the context.

| Tm(Annot(tp, e)) -> let () = check ctx e tp in tp

If we have a type-annotated term, we can use that to synthesize a type for it.

| Tm(App(f, e)) -> (match synth ctx f with | Arrow(tp, tp') -> let () = check ctx e tp in tp' | _ -> failwith "Applying a non-function!")

To infer a type for an application, we infer a function type for the head, and then check the argument against the argument type of the function, returning the result type of the function.

| body when is_check body -> failwith "Cannot synthesize type for checking term" | _ -> assert false end

As you can see, this is more or less what you'd write without any
binding machinery – there are a few more annoying `Tm`

constructors, but that's about it. The key point is that the type of
substitution is pure, without the need for a context monad, name
supply, or similar. That leaves us free to engineer the context monad
as needed for the typechecker, without having to work around the requirements
of the term representation.

Thanks for putting this post together. Since posting my question I did read up on ABTs.

ReplyDelete> That said, the problem which motivates types-for-binding is a real one: how can we make substitution safe, once and for all?

Well, it's more than that. Using types for binding ensures that *every* operation that manipulates terms is well-formed with respect to variable binding, and the typechecker ensures this. With ABTs, we get no such guarantees - you could easily implement subst incorrectly in a way that still typechecks! What ABTs buy you is the ability to define (some of) these operations once and for all, for any collection of operators. There may be operations that aren't ABT-generic but must be implemented specific to the functor chosen (again without much help from the typechecker), whereas with using types-for-binding a la bound, you have much fewer opportunities to screw up implementations (possibly at a cost of having to 'prove' more things to the compiler). It seems like it will depend on what you are doing.

You might be interested in this discussion [1] and these examples [2].

> In fact, I tend to view term representations with highly-structured syntax as strong evidence for layering violations: the parser shouldn't know too much about the binding or type structure of a program, because conceptually binding structure is determined after we produce the syntax tree.

I see what you are saying. However this (desirable) separation of layers can be handled quite nicely with the types-for-binders approach. The way this is handled is by having the parser produce terms whose free variables are of type string (essentially unresolved references). A separate phase 'resolves' these variables to produce a closed expression. That is, the term is of type `Term a`, where `a` is the type of free variables in the expression, and the parser produces a `Term String`, and we have a function like `resolve :: Term String -> Maybe (Term a)`, which converts to the resolved expression. Functions like `resolve` have a very nice signature, and again, they are very difficult to screw up due to the types. Check out the parser for the Ermine compiler [3] for an example of this.

[1]: http://pchiusano.github.io/2015-02-25/dumb-data.html#comment-1876005495

[2]: https://gist.github.com/pchiusano/687abb31875e058bfff5

[3]: https://github.com/ermine-language/ermine/blob/master/src/Ermine/Parser/Term.hs#L55

Fyi, I ported your post to Haskell.

ReplyDeleteThanks for the post. But Harper's chapter notes don’t cite Nuprl, but AUTOMATH and LF, so I always guessed that ABTs are HOAS done on paper (and without reusing binding from the metalanguage, which in fairness is the distinguishing feature of HOAS—but not the only one).

ReplyDeleteThe difference is that HOAS uses the metalanguage instead of making you write the ABT module, but you should get similar guarantees anyway. I imagine the relation is somewhere in the proof of adequacy of HOAS, and it should boil down to “you can reuse binding from the metalanguage, because the metalanguage is a lambda calculus so it uses the same definitions”.

Two critical properties are shared among your code and HOAS (I quote you):

- "Binding trees have only one binding form, and we never introduce any others."

- "Abstractions are purely binders, with no reduction rules associated with them, and abstract binding trees are simply equated up to alpha-equivalence."

Dear Paolo, there is a close connection between HOAS and ABTs, but they aren't quite the same.

ReplyDeleteWith higher-order abstract syntax, we use meta-level lambda-abstraction to represent binding. So we work modulo the alpha-beta-eta theory of the lambda calculus. With ABTs, we work modulo *only* the alpha-equivalence induced by the binding structure.

Because the typed lambda calculus is strongly normalizing, we can prove that an ABT representation of a calculus with binding is isomorphic to the the beta-normal, eta-long forms of a HOAS representation. Another way of putting it is that ABTs is HOAS minus the ability to do partial application or lambda-abstraction, so the only equational theory we need to consider for them is alpha-equivalence.

Neel, could you show how you'd do let rec with this scheme? I can't quite see how it would work, since each binding in the let rec needs to be able to refer to all other bindings in the block.

ReplyDeleteBy the way, thanks very much for these posts, they have been super helpful to me!

Given the lack of answers, here's my 2 cents: you should be able to reuse encodings of letrec for HOAS. Googling `letrec hoas` shows that solutions exist, but I can't survey them yet — https://gist.github.com/t0yv0/834748 is not obviously elegant.

ReplyDeleteMeanwhile, I'm porting your post to Scala (https://github.com/Blaisorblade/abt); I'm seeing what I can improve using extractors, and I'd like to wrap other implementation techniques behind the ABT interface (or is that stupid?). Let's see where I get.

I must be out of my mind. What is the Tm constructor in the ABT module supposed to be? I don't see this in Harper's book. Where did it come from? What is it abbreviating? Does this have to do with "symbols" which are introduced near the end of section 1.2?

ReplyDeleteHi Evan, the ABT functor takes a "signature" as an argument. A signature is basically a datatype containing of all the term formers in the language -- eg, for lambda calculus with booleans, it would be lambdas, applications, boolean constants, and conditionals. The ABT module then gives you a recursive type with those term formers, plus variable references and variable abstractions. We have to write Tm because of the way ML datatypes work -- ideally we'd like to say that Tm has all the constructors of the signature plus variables and abstractions, but that can't be expressed in plain ML. So we wrap all of F.t's constructors in a Tm, which is okay.

ReplyDelete