Semantic Domain: May 2019

Monday, May 13, 2019

Implementing Inverse Bidirectional Typechecking

In my last post, I remarked that the inverse bidirectional type system was obviously algorithmic. In this post, let's implement it! What follows is a bit of OCaml code implementing the type system of the previous post.

First, let's give a data type to represent the types of the linear type system. As usual, we will have a datatype tp with one constructor for each grammatical production. In the comment next to each constructor, I'll give the type that the constructor corresponds to.

type tp = 
  | One                (* represents 1     *)
  | Tensor of tp * tp  (* represents A ⊗ B *) 
  | Lolli of tp * tp   (* represents A ⊸ B *)
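
As a quick illustration of how the grammar maps onto this datatype, here is a small sketch that builds one type and prints it. The show_tp helper and the ASCII spellings "*" and "-o" are assumptions made for this example; they are not part of the original code.

```ocaml
type tp =
  | One                (* represents 1     *)
  | Tensor of tp * tp  (* represents A ⊗ B *)
  | Lolli of tp * tp   (* represents A ⊸ B *)

(* Hypothetical helper, not from the post: render a type with every
   binary connective fully parenthesized, so no precedence rules are
   needed. *)
let rec show_tp = function
  | One -> "1"
  | Tensor (a, b) -> "(" ^ show_tp a ^ " * " ^ show_tp b ^ ")"
  | Lolli (a, b) -> "(" ^ show_tp a ^ " -o " ^ show_tp b ^ ")"

(* The type (1 ⊗ 1) ⊸ 1 *)
let example : tp = Lolli (Tensor (One, One), One)

let () = print_endline (show_tp example)
```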

Now, we can give a datatype to represent expressions. We'll represent variables with strings, and use the datatype exp to represent expressions. As before, there is a comment connecting the datatype to the expressions of the grammar.

type var = string

type exp = 
  | Unit                               (* represents ()                  *)
  | LetUnit of exp * exp               (* represents let () = e in e'    *)
  | Pair of exp * exp                  (* represents (e, e')             *)
  | LetPair of var * var * exp * exp   (* represents let (x,y) = e in e' *)
  | Lam of var * exp                   (* represents λx. e               *)
  | App of exp * exp                   (* represents e e'                *)
  | Var of var                         (* represents x                   *)
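
To make the encoding concrete, the sketch below writes two terms as exp values and counts free occurrences of a variable, which is the quantity that linearity constrains. The occurrences function and the example terms ex1 and ex2 are hypothetical and exist only for this illustration.

```ocaml
type var = string

type exp =
  | Unit
  | LetUnit of exp * exp
  | Pair of exp * exp
  | LetPair of var * var * exp * exp
  | Lam of var * exp
  | App of exp * exp
  | Var of var

(* λf. let () = f () in () *)
let ex1 = Lam ("f", LetUnit (App (Var "f", Unit), Unit))

(* let (x, y) = p in (x, y), where p is free *)
let ex2 = LetPair ("x", "y", Var "p", Pair (Var "x", Var "y"))

(* Hypothetical helper: count the free occurrences of x in e.
   Binders (Lam and LetPair) shadow x inside their bodies. *)
let rec occurrences (x : var) (e : exp) : int =
  match e with
  | Unit -> 0
  | Var y -> if x = y then 1 else 0
  | LetUnit (e1, e2) | Pair (e1, e2) | App (e1, e2) ->
    occurrences x e1 + occurrences x e2
  | Lam (y, body) -> if x = y then 0 else occurrences x body
  | LetPair (y, z, e1, e2) ->
    occurrences x e1
    + (if x = y || x = z then 0 else occurrences x e2)
```

Here p occurs once in ex2, while f is bound in ex1 and so has no free occurrences there.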

Now we have to do something annoying, and implement some functions on the option datatype which really should be in the standard library. Basically we just want the standard functional programming structure on option types -- folds, maps, and monadic structure -- so we just go ahead and implement it.

module Option = struct
  type 'a t = 'a option

  let map f = function
    | None -> None
    | Some x -> Some (f x)


  let return x = Some x 

  let fail = None

  let (>>=) m f = 
    match m with
    | None -> None
    | Some x -> f x

  let fold some none = function
    | None -> none
    | Some x -> some x
end
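
For readers on newer compilers: since OCaml 4.08 the standard library ships an Option module with map, bind and fold, so roughly the same structure is available without hand-rolling it. The sketch below uses those stdlib functions. The safe_div function is a made-up example, and the >>= operator is defined locally because the stdlib does not export one for options. Note that the stdlib fold takes labeled arguments, unlike the positional version above.

```ocaml
(* Stdlib.Option is spelled out in full so that this snippet means the
   same thing even where a local module named Option is in scope. *)
let ( >>= ) = Stdlib.Option.bind

(* Hypothetical example function: division that fails on zero. *)
let safe_div x y = if y = 0 then None else Some (x / y)

(* map applies a function under Some *)
let r1 = Stdlib.Option.map (fun x -> x + 1) (Some 41)

(* bind chains computations that may fail: 100 / 5 = 20, then 20 / 2 = 10 *)
let r2 = safe_div 100 5 >>= fun q -> safe_div q 2

(* a failure anywhere in the chain makes the whole chain fail *)
let r3 = safe_div 1 0 >>= fun q -> safe_div q 2

(* fold eliminates an option, with a default for the None case *)
let r4 = Stdlib.Option.fold ~none:0 ~some:(fun x -> x * 2) r2
```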

Now, we can actually implement the bidirectional typechecker. To understand the implementation, it's actually helpful to understand the interface, first.

module type TYPING =  sig
  type ctx = (var * tp option) list
  type 'a t = ctx -> ('a * ctx) option 

  val map : ('a -> 'b) -> 'a t -> 'b t 
  val return : 'a -> 'a t
  val ( >>= ) : 'a t -> ('a -> 'b t) -> 'b t

  val synth : exp -> tp t
  val check : exp -> tp -> unit t

The basic structure of our typechecker is to give a pair of operations check and synth, which respectively check that an expression e has a type tp, and infer a type for an expression. These functions will be written in a monadic style, so we also have a type constructor 'a t for typechecking computations, and the usual assortment of functorial (map) and monadic (return and >>=) structure for this type.

The monadic type constructor 'a t is a pretty basic state-and-exception monad. It plumbs the context (of type ctx) through the computation, and can either return a value and an updated context, or it will fail.

An interesting feature of this context representation is that it does not map variables to types – it maps them to the option type tp option. This is because of the way that the moding will work out; the type is an output of the typing relation, and so when we put a variable into the context, we will not give it a type, and use the computation to ascribe it a type, which will be reflected in the output context. This is also why we use a full state monad rather than a reader monad for the context – we are basically implementing part of Prolog's substitution threading here.

We will also need a number of operations to implement the typechecker.

  val fail : 'a t
  val checkvar : var -> tp -> unit t
  val lookup : var -> tp t 
  val withvar : var -> 'a t -> 'a t
  val tp_eq : tp -> tp -> unit t
end

We will need to fail in order to judge programs ill-typed. The checkvar x tp operation gives the variable x the type tp. The lookup x operation will look in the context to find a type for x, failing if x has not yet been given a type. The operation withvar x m will run the monadic computation m in a context extended with the variable x. (No type is given for the variable, because it's the job of m to give the variable a type.) Finally, there's an equality test tp_eq tp1 tp2, that acts as a guard, failing if the two arguments are unequal.

Now, we can move on to the actual implementation.

module Typing : TYPING = struct
  type ctx = (var * tp option) list 

  type 'a t = ctx -> ('a * ctx) option 

  let map f m ctx = 
    let open Option in 
    m ctx >>= fun (x, ctx) -> 
    return (f x, ctx)

  let return x = fun ctx -> Some(x, ctx)

  let (>>=) m f = fun ctx -> 
    let open Option in 
    m ctx >>= fun (a, ctx') -> 
    f a ctx'

As promised, the computation type is a state-and-exception monad, and the implementation of map and the monadic unit and bind are pretty unsurprising. More interesting are the implementations of the actual operations in the monadic interface.

  let fail : 'a t = fun ctx -> None

Failure is easy to implement – it just ignores the context, and then returns None.

  let rec checkvar (x : var) (tp : tp) : unit t = fun ctx -> 
    let open Option in 
    match ctx with
    | [] -> fail 
    | (y, None) :: rest when x = y -> return ((), (y, Some tp) :: rest)
    | (y, Some _) :: rest when x = y -> fail 
    | h :: rest -> checkvar x tp rest >>= fun ((), rest') -> 
                   return ((), h :: rest')

The way that checkvar x tp works is that it iterates through the variables in the context, looking for the hypothesis which matches the variable x. When it finds it, it returns an updated context with the type of x set to Some tp. If the variable is already set, then that means that this is the second use of the variable, and so checkvar fails – this enforces the property that variables are used at most one time. If the variable isn't in the context, then checkvar also fails, because this is an out-of-scope variable reference. All other hypotheses are left unchanged.
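
The three outcomes described above can be seen in a standalone sketch. It repeats the logic of checkvar with a plain match in place of the Option monad so that it runs on its own. The context ctx0 and the names first_use, second_use and out_of_scope are made up for the example.

```ocaml
type tp = One | Tensor of tp * tp | Lolli of tp * tp
type var = string
type ctx = (var * tp option) list

(* Same case analysis as checkvar above, written without the Option
   monad so the snippet stands alone. *)
let rec checkvar (x : var) (tp : tp) (ctx : ctx) : (unit * ctx) option =
  match ctx with
  | [] -> None                                   (* x is out of scope *)
  | (y, None) :: rest when x = y ->
    Some ((), (y, Some tp) :: rest)              (* first use: record the type *)
  | (y, Some _) :: _ when x = y -> None          (* second use: reject *)
  | h :: rest ->
    (match checkvar x tp rest with
     | None -> None
     | Some ((), rest') -> Some ((), h :: rest'))

(* Two variables in scope, neither used yet. *)
let ctx0 : ctx = [ ("x", None); ("y", None) ]

let first_use = checkvar "y" One ctx0

let second_use =
  match first_use with
  | Some ((), ctx1) -> checkvar "y" One ctx1
  | None -> None

let out_of_scope = checkvar "z" One ctx0
```

The first use succeeds and leaves the entry for x untouched, while the second use and the out-of-scope reference both fail.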

  let lookup x (ctx : ctx) = 
    match List.assoc_opt x ctx with
    | None -> Option.fail
    | Some None -> Option.fail
    | Some (Some tp) -> Option.return(tp, ctx)

The lookup x computation is even simpler – it returns tp if (x, Some tp) is in the context, and fails otherwise.

  let withvar (type a) (x : var) (m : a t) : a t = fun ctx -> 
    let open Option in 
    m ((x, None) :: ctx) >>= function
    | (r, (y, Some _) :: ctx') when x = y -> return (r, ctx') 
    | (r, (y, None) :: ctx') when x = y -> fail 
    | _ -> assert false

The withvar x m operation extends the context with the variable x, and then runs m in the extended context.

An invariant our context representation maintains is that the output context has exactly the same variables in exactly the same order as the input context, and so we just pop off the first variable of the output context before returning, checking to make sure that the type of the variable has been set (i.e., Some _) to ensure that the variable was used at least one time. In conjunction with checkvar ensuring that the variable is used at most one time, this will ensure that each variable is used exactly one time.

If the first variable of the output context isn't x, or if the output context is empty, then our invariant is broken, and so we signal an assertion failure.
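
Here is a minimal sketch of how withvar enforces "at least once" while a checkvar-style operation enforces "at most once". To keep it short, it uses a simplified stand-in called use (an assumption of this example) that only ever looks at the innermost variable. The real checkvar searches the whole context.

```ocaml
type tp = One | Tensor of tp * tp | Lolli of tp * tp
type var = string
type ctx = (var * tp option) list
type 'a t = ctx -> ('a * ctx) option

let return (x : 'a) : 'a t = fun ctx -> Some (x, ctx)

let ( >>= ) (m : 'a t) (f : 'a -> 'b t) : 'b t = fun ctx ->
  match m ctx with
  | None -> None
  | Some (a, ctx') -> f a ctx'

(* Simplified stand-in for checkvar: it only handles the innermost
   variable, which is all this demo needs. *)
let use (x : var) (tp : tp) : unit t = function
  | (y, None) :: rest when x = y -> Some ((), (y, Some tp) :: rest)
  | _ -> None

(* Same case analysis as withvar above, without the Option monad. *)
let withvar (x : var) (m : 'a t) : 'a t = fun ctx ->
  match m ((x, None) :: ctx) with
  | None -> None
  | Some (r, (y, Some _) :: ctx') when x = y -> Some (r, ctx')
  | Some (_, (y, None) :: _) when x = y -> None   (* x was never used *)
  | Some _ -> assert false                        (* invariant broken *)

let used_once = withvar "x" (use "x" One) []
let never_used = withvar "x" (return ()) []
let used_twice = withvar "x" (use "x" One >>= fun () -> use "x" One) []
```

Only used_once succeeds. The other two computations fail, one because the variable is never given a type and one because it is given a type twice.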

  let tp_eq (tp1 : tp) (tp2 : tp) = if tp1 = tp2 then return () else fail

The tp_eq tp1 tp2 function just turns a boolean test into a guard. Now, we can go through the synthesis and checking functions clause-by-clause:

  let rec synth = function
    | Unit -> return One

We synthesize the unit type for the unit value.

    | Pair(e1, e2) -> synth e1 >>= fun tp1 -> 
                      synth e2 >>= fun tp2 -> 
                      return (Tensor(tp1, tp2))

To synthesize a type for a pair, we synthesize types for each of the components, and then return their tensor product.

    | Lam(x, e) -> withvar x (synth e >>= fun ret_tp -> 
                              lookup x >>= fun arg_tp -> 
                              return (Lolli(arg_tp, ret_tp)))

Functions are interesting, because we need to deal with variables, and evaluation order plays out in a neat way here. We infer a type ret_tp for the body e, and then we look up the type arg_tp that the body e ascribed to the variable x. This lets us give a type Lolli(arg_tp, ret_tp) for the whole function.

    | LetUnit(e, e') -> check e One >>= fun () -> 
                        synth e'

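To synthesize a type for unit elimination, we check that the scrutinee has the unit type One, and then synthesize a type for the body. (The typing rule lists these premises in the opposite order, but neither step consumes the other's output, so either order works.)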

    | LetPair(x, y, e, e') -> 
       withvar y (withvar x (synth e' >>= fun res_tp -> 
                             lookup x >>= fun tp1 -> 
                             lookup y >>= fun tp2 -> 
                             check e (Tensor(tp1, tp2)) >>= fun () -> 
                             return res_tp))

To eliminate a pair, we introduce (using withvar) scopes for the variables x and y, and then:

We synthesize a type res_tp for the continuation e'.
Since e' used x and y, we can look up the types they were used at (binding the type of x to tp1 and the type of y to tp2).
Then, we check that the scrutinee e has the type Tensor(tp1, tp2).
Finally, we return the type res_tp for the type of the whole expression.

    | App(_, _) -> fail 
    | Var _ -> fail

Since applications and variable references are checking, not synthesizing, we fail if we see one of them in synthesizing position. If they are in checking position, we can use the check function to typecheck them:

  and check (e : exp) (tp : tp) : unit t = 
    match e with 
    | Var x -> checkvar x tp

The variable case simply uses checkvar.

    | App(e1, e2) -> synth e2 >>= fun tp_arg -> 
                     check e1 (Lolli(tp_arg, tp))

To check an application e1 e2 at a type tp, we first synthesize the argument type by inferring a type tp_arg for e2, and then we check that e1 has the function type Lolli(tp_arg, tp).

    | e -> synth e >>= tp_eq tp
end

Finally, when we find a synthesizing term in checking position, we infer a type for it and then see if it is equal to what we expected.
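
To see the checker in action, the sketch below restates the code above in condensed form so that it can run on its own, and then tries it on a few terms. It is not a replacement for the module above. It drops the Option helper module and the signature, collapses the assert false case of withvar into ordinary failure, and adds a hypothetical entry point called infer that runs synth in the empty context.

```ocaml
type tp = One | Tensor of tp * tp | Lolli of tp * tp
type var = string
type exp =
  | Unit
  | LetUnit of exp * exp
  | Pair of exp * exp
  | LetPair of var * var * exp * exp
  | Lam of var * exp
  | App of exp * exp
  | Var of var

type ctx = (var * tp option) list
type 'a t = ctx -> ('a * ctx) option

let return (x : 'a) : 'a t = fun ctx -> Some (x, ctx)
let fail : 'a t = fun _ -> None
let ( >>= ) (m : 'a t) (f : 'a -> 'b t) : 'b t = fun ctx ->
  match m ctx with None -> None | Some (a, ctx') -> f a ctx'

let rec checkvar x tp : unit t = function
  | [] -> None
  | (y, None) :: rest when x = y -> Some ((), (y, Some tp) :: rest)
  | (y, Some _) :: _ when x = y -> None
  | h :: rest ->
    (match checkvar x tp rest with
     | None -> None
     | Some ((), rest') -> Some ((), h :: rest'))

let lookup x : tp t = fun ctx ->
  match List.assoc_opt x ctx with
  | Some (Some tp) -> Some (tp, ctx)
  | _ -> None

(* Simplification: every non-success case is treated as failure. *)
let withvar x (m : 'a t) : 'a t = fun ctx ->
  match m ((x, None) :: ctx) with
  | Some (r, (y, Some _) :: ctx') when x = y -> Some (r, ctx')
  | _ -> None

let rec synth : exp -> tp t = function
  | Unit -> return One
  | Pair (e1, e2) ->
    synth e1 >>= fun t1 -> synth e2 >>= fun t2 -> return (Tensor (t1, t2))
  | Lam (x, e) ->
    withvar x (synth e >>= fun ret -> lookup x >>= fun arg ->
               return (Lolli (arg, ret)))
  | LetUnit (e, e') -> check e One >>= fun () -> synth e'
  | LetPair (x, y, e, e') ->
    withvar y (withvar x (
      synth e' >>= fun res -> lookup x >>= fun t1 -> lookup y >>= fun t2 ->
      check e (Tensor (t1, t2)) >>= fun () -> return res))
  | App _ | Var _ -> fail

and check (e : exp) (tp : tp) : unit t =
  match e with
  | Var x -> checkvar x tp
  | App (e1, e2) -> synth e2 >>= fun arg -> check e1 (Lolli (arg, tp))
  | e -> synth e >>= fun tp' -> if tp = tp' then return () else fail

(* Hypothetical entry point: infer a type for a closed term. *)
let infer (e : exp) : tp option =
  match synth e [] with Some (tp, _) -> Some tp | None -> None

(* λx. let () = x in ()  synthesizes 1 ⊸ 1 *)
let unit_id = infer (Lam ("x", LetUnit (Var "x", Unit)))

(* λx. x is rejected: the body is a variable in synthesizing position *)
let identity = infer (Lam ("x", Var "x"))

(* λx. () is rejected: x is never used *)
let unused = infer (Lam ("x", Unit))

(* λp. let (x, y) = p in let () = x in let () = y in () *)
let split = infer (Lam ("p", LetPair ("x", "y", Var "p",
                     LetUnit (Var "x", LetUnit (Var "y", Unit)))))

(* let () = (λx. let () = x in ()) () in ()  contains a β-redex *)
let redex = infer (LetUnit (App (Lam ("x", LetUnit (Var "x", Unit)), Unit), Unit))
```

Under this sketch, unit_id, split and redex are accepted, while identity and unused are rejected. The identity and redex examples show that a β-normal term can fail to typecheck and a term containing a β-redex can succeed.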

This code is, at best, lightly tested, but I knocked together a small GitHub repository with the code. Enjoy!

Friday, May 10, 2019

Inverting Bidirectional Typechecking

\[\newcommand{\bnfalt}{\;\;|\;\;} \newcommand{\To}{\Rightarrow} \newcommand{\From}{\Leftarrow} \newcommand{\rule}[2]{\frac{\displaystyle \begin{matrix} #1 \end{matrix}}{\displaystyle #2}} \newcommand{\check}[3]{{#1} \vdash {#2} \From {#3}} \newcommand{\synth}[3]{{#1} \vdash {#2} \To {#3}} \newcommand{\lam}[1]{\lambda {#1}.\,} \newcommand{\inj}[2]{\mathsf{in}_{#1}({#2})} \newcommand{\case}[5]{\mathsf{case}({#1}, \inj{1}{#2} \to {#3}, \inj{2}{#4} \to {#5})} \newcommand{\match}[2]{\mathsf{match}\;{#1}\;\mathsf{of}\;[ #2 ]} \newcommand{\arm}[2]{{#1} \to {#2}} \newcommand{\downshift}[1]{\downarrow{#1}} \newcommand{\upshift}[1]{\uparrow{#1}} \newcommand{\thunk}[1]{\left\{{#1}\right\}} \newcommand{\return}[1]{\mathsf{return}\;#1} \newcommand{\fun}[1]{\lambda\;#1} \newcommand{\checkn}[3]{{#1} \rhd {#2} \From {#3}} \newcommand{\checkp}[2]{{#1} \leadsto {#2}} \newcommand{\spine}[4]{{#1} \vdash {#2} : {#3} \gg {#4}} \newcommand{\tensor}{\otimes} \newcommand{\lolli}{\multimap} \newcommand{\letv}[2]{\mathsf{let}\;{#1}={#2}\;\mathsf{in}\;} \newcommand{\unit}{\langle\rangle} \newcommand{\letunit}[1]{\letv{\unit}{#1}} \newcommand{\pair}[2]{\left\langle{#1}, {#2}\right\rangle} \newcommand{\letpair}[3]{\letv{\pair{#1}{#2}}{#3}}\]

In the traditional recipe for bidirectional typechecking, introduction forms are checked, and the principal subterm of each elimination form is inferred. However, a while back Noam Zeilberger remarked to me that in multiplicative linear logic, bidirectional typechecking worked just as well if you did it backwards. It is worth spelling out the details of this remark, hence this blog post.

First, let's give the types and grammar of multiplicative linear logic.

\[ \begin{array}{llcl} \mbox{Types} & A & ::= & 1 \bnfalt A \tensor B \bnfalt A \lolli B \\ \mbox{Terms} & e & ::= & x \bnfalt \lam{x}{e} \bnfalt e\,e' \\ & & | & \unit \bnfalt \letunit{e}{e'} \\ & & | & \pair{e}{e'} \bnfalt \letpair{x}{y}{e}{e'} \\ \mbox{Contexts} & \Gamma & ::= & \cdot \bnfalt \Gamma, x \From A \\ \end{array} \]

Our types are the unit type \(1\), the tensor product \(A \tensor B\), and the linear function space \(A \lolli B\). The unit and pair have the expected introduction forms \(\unit\) and \(\pair{e}{e'}\), and they have "pattern matching" style elimination forms. Functions are introduced with lambdas \(\lam{x}{e}\) and eliminated with applications \(e\,e'\) as usual, and of course there are variable references \(x\). Contexts are a bit unusual -- they pair together variables and their types as usual, but instead of treating a variable as a placeholder for a synthesizing term, we treat variables as placeholders for checking terms.

Now, let's go through the typing rules. First, we give the introduction and elimination rules for the unit type.

\[ \begin{array}{ll} \rule{ } { \synth{\cdot}{\unit}{1} } & \rule{ \synth{\Delta}{e'}{A} & \check{\Gamma}{e}{1} } { \synth{\Gamma, \Delta}{\letunit{e}{e'}}{A} } \\[1em] \end{array} \]

The introduction rule says that in an empty context, the unit value \(\unit\) synthesizes the type \(1\). The pattern-matching style elimination \(\letunit{e}{e'}\) typechecks as follows. First, we infer a type \(A\) for the body \(e'\), and then we check that the scrutinee \(e\) has the unit type \(1\).

This order is backwards from traditional bidirectional systems -- we synthesize a type for the continuation first, before checking the type of the data we are eliminating. In the case of units, this is a mere curiosity, but it gets more interesting with the tensor product type \(A \tensor B\).

\[ \begin{array}{ll} \rule{ \synth{\Gamma}{e}{A} & \synth{\Delta}{e'}{B} } { \synth{\Gamma, \Delta}{\pair{e}{e'}}{A \tensor B} } & \rule{ \synth{\Gamma, x \From A, y \From B}{e'}{C} & \check{\Delta}{e}{A \tensor B} } { \synth{\Gamma, \Delta}{\letpair{x}{y}{e}{e'}}{C} } \end{array} \]

Now, the synthesis for pairs remains intuitive. For a pair \(\pair{e}{e'}\), we first infer a type \(A\) for \(e\), and a type \(B\) for \(e'\), and then conclude that the pair has the type \(A \tensor B\). However, the typing of the pair elimination \(\letpair{x}{y}{e}{e'}\) is much wilder.

In this rule, we first infer a type \(C\) for the continuation \(e'\), and then we learn from typechecking \(e'\) that \(x\) and \(y\) were required to have had types \(A\) and \(B\) respectively. This gives us the data that we need to check that \(e\) has the type \(A \tensor B\).

The linear function type \(A \lolli B\) has a similar character:

\[ \begin{array}{ll} \rule{ \synth{\Gamma, x \From A}{e}{B} } { \synth{\Gamma}{\lam{x}{e}}{A \lolli B} } & \rule{ \synth{\Gamma}{e'}{A} & \check{\Delta}{e}{A \lolli B} } { \check{\Gamma, \Delta}{e\,e'}{B} } \end{array} \]

Here, to infer a type for the introduction form \(\lam{x}{e}\), we infer a type \(B\) for the body \(e\), and then look up what type \(A\) the parameter \(x\) was required to be for the body to typecheck. To check that an application \(e\,e'\) has the type \(B\), we infer a type \(A\) for the argument \(e'\), and then check that the function \(e\) has the function type \(A \lolli B\).

Again, the checking/synthesis modes of these rules are precisely reversed from usual bidirectional type systems. We can see how this reversal plays out for variables below:

\[ \begin{array}{ll} \rule{ } { \check{x \From A}{x}{A} } & \rule{ \synth{\Gamma}{e}{A} & A = B} { \check{\Gamma}{e}{B} } \end{array} \]

Here, when we check that the variable \(x\) has the type \(A\), the context must be such that it demands \(x\) to have the type \(A\). (However, the switch between checking and synthesis is the same as ever.)
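
As a small worked instance of these rules (my own example, using only the rules stated so far), here is a derivation showing that \(\lam{x}{\letunit{x}{\unit}}\) synthesizes the type \(1 \lolli 1\). Reading it bottom-up: the lambda rule asks us to synthesize a type for the body, the unit elimination rule synthesizes \(1\) for the continuation \(\unit\) in the empty context, and the variable rule is what records the demand \(x \From 1\) in the context.

\[
\rule{
  \rule{
    \rule{ }{ \synth{\cdot}{\unit}{1} }
    &
    \rule{ }{ \check{x \From 1}{x}{1} }
  }{
    \synth{x \From 1}{\letunit{x}{\unit}}{1}
  }
}{
  \synth{\cdot}{\lam{x}{\letunit{x}{\unit}}}{1 \lolli 1}
}
\]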

If you are used to regular bidirectional systems, the information flow in the variable rule (as well as for pattern matching for pairs and lambda-abstraction for functions) is a bit unexpected. We are used to having a context tell us what types each variable has. However, in this case we are not doing that! Instead, we use it to record the types that the rest of the program requires variables to have.

This is still a "well-moded" program in the sense of logic programming. However, the moding is a bit more exotic now -- within a context, the variables are inputs, but their types are outputs. This is a bit fancier than the mode systems that usual logic programming languages have, but people have studied mode systems which can support this (such as Uday Reddy's A Typed Foundation for Directional Logic Programming).

As far as the metatheory of this system goes, I don't know very much about it. Substitution works fine -- you can easily prove a theorem of the form:

Theorem (Substitution) If \(\check{\Delta}{e}{A}\), then

If \(\check{\Gamma, x \From A, \Theta}{e'}{C}\) then \(\check{\Gamma, \Delta, \Theta}{[e/x]e'}{C}\).

If \(\synth{\Gamma, x \From A, \Theta}{e'}{C}\) then \(\synth{\Gamma, \Delta, \Theta}{[e/x]e'}{C}\).

However, I don't presently know a good characterization of the kind of terms that are typable under this discipline. E.g., in the standard bidirectional presentation, the annotation-free terms are precisely the \(\beta\)-normal terms. However, in the reverse bidirectional system, that is not the case.

Two papers that seem closely related to this system are:

Adam Chlipala, Leaf Petersen, and Robert Harper's TLDI 2005 paper, Strict Bidirectional Type Checking, and
Ningning Xie and Bruno C. d. S. Oliveira's ESOP 2018 paper, Let Arguments Go First.

Adam and company's paper includes the traditional synthesizing bidirectional hypotheses, as well as checking hypotheses very similar to the ones in this post, but inspired by relevance logic rather than linear logic. The basic idea is that if a hypothesis is relevant, then it is okay to let checking determine its type, since we are guaranteed that the variable will appear in some checking context (which will tell us what type it should have). The same idea applies here, since linearity necessarily implies relevance.

Ningning and Bruno's paper has an application rule that looks exactly like the one in this post -- argument types are synthesized, which permits inferring the type of a function head in a checking context. However, their system is focused on inferring polymorphic types, which makes the precise relationship a bit unclear to me.

The implementation of reverse bidirectionality is just as easy as traditional bidirectional systems, but I will leave that for my next post.