Wednesday, August 8, 2012

An ML implementation of match compilation

In this post, I'll give an OCaml implementation of the pattern match compiler from my previous post. Just for fun, it will also pattern compilation into a series of tests.

We begin by giving some some basic types. The type var of variables is just strings, and we have a type tp of types, which are just units, sums, products, void types, and a default Other to handle all non-matchable types (e.g., functions). The type pat is the type of patterns. We have patterns for variables, wildcards, units, pairs, and left and right injections.

Finally, we have the type exp of expressions. We've thrown in the minimum number of constructs necessary to elaborate pattern matching. Namely, we have variables, let-binding, unit and pair elimination (both given in the "open-scope" style), an abort (for the empty type), and a case expression for sums.

type var = string

type tp = One | Times of tp * tp | Zero | Sum of tp * tp | Other

type pat = PVar of var
         | PWild
         | PUnit
         | PPair of pat * pat
         | PInl of pat
         | PInr of pat

type exp = EVar of var
         | ELet of var * exp * exp
         | ELetUnit of exp * exp
         | ELetPair of var * var * exp * exp
         | EAbort of exp 
         | ECase of exp * var * exp * var * exp 
First, we give some utility operations. gensym does fresh name generation. This is an effect, but a simple one, and should be easy enough to transcribe into (say) monadic Haskell.
let gensym =
  let r = ref 0 in fun () -> incr r; Printf.sprintf "u%d" !r

The filtermap operation maps an optional function over a list, returning the non-None values. I really should work out the general story for this someday -- there's obviously a nice story involving distributive laws at work here.

(* filtermap : ('a -> 'b option) -> 'a list -> 'b list *)

let rec filtermap f = function 
  | [] -> []
  | x :: xs -> (match f x with
                | None -> filtermap f xs
                | Some y -> y :: filtermap f xs)
Now, we have the simplification routines -- these correspond to the transformations of $\overrightarrow{ps \Rightarrow e}$ to $\overrightarrow{ps' \Rightarrow e'}$ in the inference rules I gave.
let simplify_unit u = (function
    | (PVar x :: ps, e) -> (ps, ELet(x, EVar u, e))
    | (PWild :: ps, e)  -> (ps, e)
    | (PUnit :: ps, e)  -> (ps, e)
    | (_, e) -> assert false)

let simplify_pair u = (function
    | (PVar x :: ps, e) -> (PWild :: PWild :: ps, ELet(x, EVar u, e))
    | (PWild :: ps, e)  -> (PWild :: PWild :: ps, e)
    | (PPair(p1, p2) :: ps, e) -> (p1 :: p2 :: ps, e)
    | (_, e) -> assert false)

let simplify_other u = (function
    | (PVar x :: ps, e) -> (ps, ELet(x, EVar u, e))
    | (PWild :: ps, e)  -> (ps, e)
    | (_, e) -> assert false)

Unlike the simplify_unit and simplify_pair routines, the simplification routines for sums return an option, which we filtermap over. This is because when we want the inl patterns, the inr patterns are well-typed, but not needed. So we need to filter them out, but it's not actually an error to see them in the branch list.

let simplify_inl u =
  filtermap (function 
    | (PVar x :: ps, e) -> Some (PWild :: ps, ELet(x, EVar u, e))
    | (PWild :: ps, e)  -> Some (PWild :: ps, e)
    | (PInl p :: ps, e) -> Some (p :: ps, e)
    | (PInr _ :: _, _)  -> None
    | (_, e) -> assert false)

let simplify_inr u =
  filtermap (function 
    | (PVar x :: ps, e) -> Some (PWild :: ps, ELet(x, EVar u, e))
    | (PWild :: ps, e)  -> Some (PWild :: ps, e)
    | (PInr p :: ps, e) -> Some (p :: ps, e)
    | (PInl _ :: _, _)  -> None
    | (_, e) -> assert false)

Now we reach coverage checking proper. Note that this is basically just a transcription of the inference rules in my previous post.

let rec cover arms tps =
  match arms, tps with
  | arms, [] -> snd (List.hd arms)
  | arms, (u, One) :: tps ->
      let e = cover (simplify_unit u arms) tps in
      ELetUnit(EVar u, e)
  | arms, (u, Times(tp1, tp2)) :: tps ->
      let u1 = gensym() in                                 
      let u2 = gensym() in                                 
      let arms = simplify_pair u arms in 
      let e = cover arms ((u1, tp1) :: (u2, tp2) :: tps) in
      ELetPair(u1, u2, EVar u, e)
  | arms, (u, Zero) :: tps ->
      EAbort (EVar u)
  | arms, (u, Sum(tp1, tp2)) :: tps ->
      let u1 = gensym() in
      let u2 = gensym() in
      let e1 = cover (simplify_inl u arms) ((u1, tp1) :: tps) in
      let e2 = cover (simplify_inr u arms) ((u2, tp2) :: tps) in
      ECase(EVar u, u1, e1, u2, e2)
  | arms, (u, Other) :: tps ->
      cover (simplify_other u arms) tps 


  1. Re. filter_map: there is a natural mapping from ('a option) to ('a list) -- for example if you consider the embedding of the tuples of fixed size 'a^n into 'a list, and see ('a option) as the disjoint union of 'a^0 and 'a^1. Once you've mapped ('a -> 'b option) into ('a -> 'b list), you have the "bind" for the list monad.

    Regarding the algorithm per se: I feel there is some redundancy in the frequent repetition of a transformation from (PVar x) to (ELet(x, EVar u,, e)) in the simplification functions. Maybe you could rephrase it instead as a rewriting step turning PVar into PWild in a principled way, before doing the simplification?

  2. Hi Gabriel, the original code I wrote had a separate simplification step, but I took it out, since it didn't end up saving any space. I generally only introduce functions either when they come from the structure of the correctness proof, or when it makes the program shorter.