It struck me after the last post that it might be helpful to give an example using abstract binding trees in a more nontrivial way. The pure lambda calculus has a very simple binding structure, and pretty much anything you do will work out. So I decided to show how ABTs can be used to easily support a much more involved form of binding -- namely, pattern matching.
This makes for a nice example, because it is a very modular and well-structured language feature, that naturally has a rather complex binding structure. However, as we will see, it very easily into an ABT-based API. I will do so assuming the implementation in the previous post, focusing only on the use of ABTs.
As is usual, we will need to introduce a signature module for the language.
module Lambda = structSince pattern matching really only makes sense with a richer set of types, let's start by adding sums and products to the type language.
type tp = Arrow of tp * tp | One | Prod of tp * tp | Sum of tp * tp
Next, we'll give a datatype of patterns. The PWild
constructor is the wildcard pattern $\_$, the PVar
constructor is a variable pattern, the PPair(p, p')
constructor is the pair pattern, and the Inr p
and
Inl p
patterns are the patterns for the left and right
injections of the sum type.
type pat = PWild | PVar | PUnit | PPair of pat * pat | PInl of pat | PInr of pat
Now, we can give the datatype for the language signature itself. We add pairs, units, and sums to the language, as well as a case expression which takes a scrutinee, and a list of branches (a list of patterns and the corresponding case arms).
type 'a t = Lam of 'a | App of 'a * 'a | Annot of tp * 'a | Unit | Pair of 'a * 'a | Inl of 'a | Inr of 'a | Case of 'a * (pat * 'a) list
One thing worth noting is that the datatype of patterns
has no binding information in it at all. The basic idea is
that if we will represent (say) a pair elimination
$\letv{(x,y)}{e}{e'}$ as the constructor Case(e, [PPair(PVar,
PVar), Abs(x, Abs(y, e'))])
(suppressing
useless Tm
constructors). So the PVar
constructor is merely an indication that the term be an abstraction,
with the number of abstractions determined by the shape of the
pattern. This representation is first documented in Rob Simmons's
paper Structural
Focalization.
This is really the key thing that gives the ABT interface its power: binding trees have only one binding form, and we never introduce any others.
Now we can give the map
and join
operations for this
signature.
let map f = function | Lam x -> Lam (f x) | App (x, y) -> App(f x, f y) | Annot(t, x) -> Annot(t, f x) | Unit -> Unit | Pair(x, y) -> Pair(f x, f y) | Inl x -> Inl (f x) | Inr x -> Inr (f x) | Case(x, pys) -> Case(f x, List.map (fun (p, y) -> (p, f y)) pys) let join m = function | Lam x -> x | App(x, y) -> m.join x y | Annot(_, x) -> x | Unit -> m.unit | Pair(x,y) -> m.join x y | Inl x -> x | Inr x -> x | Case(x, pys) -> List.fold_right (fun (_, y) -> m.join y) pys x end
As usual, we construct the syntax by applying the Abt
functor.
module Syntax = Abt(Lambda)
We can now define a bidirectional typechecker for this language. Much of the infrastructure is the same as in the previous post.
module Bidir = struct open Lambda open Syntax type ctx = (var * tp) list
We do, however, extend the is_check
and is_synth
functions to handle the new operations in the signature. Note that case statements are viewed as checking forms.
let is_check = function | Tm (Lam _) | Tm Unit | Tm (Pair(_, _)) | Tm (Inl _) | Tm (Inr _) | Tm (Case(_, _))-> true | _ -> false let is_synth e = not(is_check e) let unabs e = match out e with | Abs(x, e) -> (x, e) | _ -> assert false
When we reach the typechecker itself, most of it --- the check
and synth
functions --- are unchanged. We have to add new cases to check
to handle the new value forms (injections, pairs, and units), but they are pretty straightforward.
let rec check ctx e tp = match out e, tp with | Tm (Lam t), Arrow(tp1, tp') -> let (x, e') = unabs t in check ((x, tp1) :: ctx) e' tp' | Tm (Lam _), _ -> failwith "expected arrow type" | Tm Unit, One -> () | Tm Unit, _ -> failwith "expected unit type" | Tm (Pair(e, e')), Prod(tp, tp') -> let () = check ctx e tp in let () = check ctx e' tp' in () | Tm (Pair(_, _)), _ -> failwith "expected product type" | Tm (Inl e), Sum(tp, _) -> check ctx e tp | Tm (Inr e), Sum(_, tp) -> check ctx e tp | Tm (Inl _), _ | Tm (Inr _), _ -> failwith "expected sum type"
The big difference is in checking the Case
form. Now, we need to
synthesize a type for the scrutinee, and then check that each branch is well-typed.
This works by calling the check_branch
function on each branch, passing
it the pattern, arm, type and result type as arguments. For reasons that will become
apparent, the pattern and its type are passed in as singleton lists.
(I don't do coverage checking here, only because it doesn't interact with binding
in any way.)
| Tm (Case(e, branches)), tp -> let tp' = synth ctx e in List.iter (fun (p,e) -> check_branch ctx [p] e [tp'] tp) branches | body, _ when is_synth body -> if tp = synth ctx e then () else failwith "Type mismatch" | _ -> assert false and synth ctx e = match out e with | Var x -> (try List.assoc x ctx with Not_found -> failwith "unbound variable") | Tm(Annot(tp, e)) -> let () = check ctx e tp in tp | Tm(App(f, e)) -> (match synth ctx f with | Arrow(tp, tp') -> let () = check ctx e tp in tp' | _ -> failwith "Applying a non-function!") | body when is_check body -> failwith "Cannot synthesize type for checking term" | _ -> assert falseThe way that branch checking works is by steadily deconstructing a list of patterns (and their types) into smaller lists.
and check_branch ctx ps e tps tp_result = match ps, tps with
If there are no more patterns and their types, we are done, and can check that the arm has the right type.
| [], []-> check ctx e tp_result
If we have a variable pattern, we unabstract the arm, and bind that variable to type, and recur on the smaller list of patterns and types. Note that this is the only place we have to do anything at all with binding, and it's trivial!
| (PVar :: ps), (tp :: tps) -> let (x, e) = unabs e in check_branch ((x, tp) :: ctx) ps e tps tp_result
Wildcards and unit patterns work the same way, except that they don't bind anything.
| (PWild :: ps), (tp :: tps) -> check_branch ctx ps e tps tp_result | (PUnit :: ps), (One :: tps) -> check_branch ctx ps e tps tp_result | (PUnit :: ps), (tp :: tps) -> failwith "expected term of unit type"
Pair patterns are deconstructed into two smaller types, and the product type they are checked against is broken into its two subterms, and then the list is lengthened with all of the subderivations.
| (PPair(p, p') :: ps), (Prod(tp, tp') :: tps) -> check_branch ctx (p :: p' :: ps) e (tp :: tp' :: tps) tp_result | (PPair(p, p') :: ps), (_ :: tps) -> failwith "expected term of product type"
Sum patterns work by recurring on the sub-pattern, dropping the left or right part of the sum type, as appropriate.
| (PInl p :: ps), (Sum(tp, _) :: tps) -> check_branch ctx (p :: ps) e (tp :: tps) tp_result | (PInl p :: ps), (_ :: tps) -> failwith "expected sum type" | (PInr p :: ps), (Sum(_, tp) :: tps) -> check_branch ctx (p :: ps) e (tp :: tps) tp_result | (PInr p :: ps), (_ :: tps) -> failwith "expected sum type" | _ -> assert false end
That's it! So I hope it's clear that ABTs can handle complex binding forms very gracefully.