Monday, August 12, 2019

Michael Arntzenius on the Future of Coding Podcast

My PhD student Michael Arntzenius is on the Future of Coding podcast, talking about the things he has learned while developing Datafun. In the course of a PhD, people tend to work on extremely refractory problems, which is a useful discipline for developing and sharpening your worldview. Since Michael is a particularly thoughtful person, he's developed a particularly thoughtful view of computing, and so the interview is well worth reading or listening to.

This episode explores the intersections between various flavors of math and programming, and the ways in which they can be mixed, matched, and combined. Michael Arntzenius, "rntz" for short, is a PhD student at the University of Birmingham building a programming language that combines some of the best features of logic, relational, and functional programming. The goal of the project is "to find a sweet spot of something that is more powerful than Datalog, but still constrained enough that we can apply existing optimizations to it and imitate what has been done in the database community and the Datalog community." The challenge is combining the key part of Datalog (simple relational computations without worrying too much underlying representations) and of functional programming (being able to abstract out repeated patterns) in a way that is reasonably performant.

This is a wide-ranging conversation including: Lisp macros, FRP, Eve, miniKanren, decidability, computability, higher-order logics and their correspondence to higher-order types, lattices, partial orders, avoiding logical paradoxes by disallowing negation (or requiring monotonicity) in self reference (or recursion), modal logic, CRDTS (which are semi-lattices), and the place for formalism is programming. This was a great opportunity for me to brush up on (or learn for the first time) some useful mathematical and type theory key words. Hope you get a lot out of it as well -- enjoy!

I'd like to thank Steve Krouse for interviewing Michael. This interview series is a really nice idea -- a lot of the intuition or worldview underpinning research programs never makes it into a paper, but can be brought out nicely in a conversation.

Also, if you prefer reading to listening, the link also has a a transcript of the podcast available.

Thursday, July 11, 2019

All Comonads, All The Time

(If you are on the POPL 2020 program committee, please stop reading until after your reviews are done.)

It's been a few months since I've last blogged, so I figure I should say what I have been up to lately. The answer seems to be comonads. I have two new draft papers, both of which involve comonadic type theories in a fairly critical way. (I suppose that this makes me a type-theoretic hipster, who doesn't do monads now that they're popular.)

Monday, May 13, 2019

Implementing Inverse Bidirectional Typechecking

In my last post, I remarked that the inverse bidirectional type system was obviously algorithmic. In

In my last post, I remarked that the inverse bidirectional type system was obviously algorithmic. In this post, let's implement it! What follows is a bit of OCaml code implementing the type system of the previous post.

First, let's give a data type to represent the types of the linear type system. As usual, we will have a datatype tp with one constructor for each grammatical production. In the comment next to each constructor, I'll give the term that the constructor corresponds to.

type tp = 
  | One                (* represents 1     *)
  | Tensor of tp * tp  (* represents A ⊗ B *) 
  | Lolli of tp * tp   (* represents A ⊸ B *) 

Now, we can give a datatype to represent expressions. We'll represent variables with strings, and use the datatype exp to represent expressions. As before, there is a comment connecting the datatype to the expressions of the grammar.

type var = string

type exp = 
  | Unit                               (* represents ()                  *)
  | LetUnit of exp * exp               (* represents let () = e in e'    *)
  | Pair of exp * exp                  (* represents (e, e')             *)
  | LetPair of var * var * exp * exp   (* represents let (x,y) = e in e' *)
  | Lam of var * exp                   (* represents λx. e               *)
  | App of exp * exp                   (* represents e e'                *)
  | Var of var                         (* represents x                   *)

Now we have to do something annoying, and implement some functions on the option datatype which really should be in the standard library. Basically we just want the standard functional programming structure on option types -- folds, maps, and monadic structure -- so we just go ahead an implement it.

module Option = struct
  type 'a t = 'a option

  let map f = function
    | None -> None
    | Some x -> Some (f x)

  let return x = Some x 

  let fail = None

  let (>>=) m f = 
    match m with
    | None -> None
    | Some x -> f x

   let fold some none = function
     | None -> none
     | Some x -> some x  

Now, we can actually implement the bidirectional typechecker. To understand the implementation, it's actually helpful to understand the interface, first.

module type TYPING =  sig
  type ctx = (var * tp option) list
  type 'a t = ctx -> ('a * ctx) option 

  val map : ('a -> 'b) -> 'a t -> 'b t 
  val return : 'a -> 'a t
  val ( >>= ) : 'a t -> ('a -> 'b t) -> 'b t

  val synth : exp -> tp t
  val check : exp -> tp -> unit t

The basic structure of our typechecker is to give a pair of operations check and synth, which respectively check that an expression e has a type tp, and infer a type for an expression. This function will be written in a monadic style, so we also have a type constructor 'a t for typechecking computations, and the usual assortment of functorial (map) and monadic (return and >>=) structure for this type.

The monadic type constructor 'a t is a pretty basic state-and-exception monad. It plumbs the context (of type ctx) through the computation, and can either return a value and an updated context, or it will fail.

An interesting feature of this context representation is that it does not map variables to types – it maps them to the option type tp option. This is because of the way that the moding will work out; the type is an output of the typing relation, and so when we put a variable into the context, we will not give it a type, and use the computation to ascribe it a type, which will be reflected in the output context. This is also why we use a full state monad rather than a reader monad for the context – we are basically implementing part of Prolog's substitution threading here.

We will also need a number of operations to implement the typechecker.

  val fail : 'a t
  val checkvar : var -> tp -> unit t
  val lookup : var -> tp t 
  val withvar : var -> 'a t -> 'a t
  val tp_eq : tp -> tp -> unit t

We will need to fail in order to judge programs ill-typed. The checkvar x tp operation gives the variable x the type tp. The lookup x operation will look in the context to find a a type for x, failing if x has not yet been given a type. The operation withvar x m will run the monadic computation m in a context extended with the variable x. (No type is given for the variable, because it's the job of m to give the variable a type.) Finall, there's an equality test tp_eq tp1 tp2, that acts as a guard, failing if the two arguments are unequal.

Now, we can move on to the actual implementation.

module Typing : TYPING = struct
  type ctx = (var * tp option) list 

  type 'a t = ctx -> ('a * ctx) option 

  let map f m ctx = 
    let open Option in 
    m ctx >>= fun (x, ctx) -> 
    return (f x, ctx)

  let return x = fun ctx -> Some(x, ctx)

  let (>>=) m f = fun ctx -> 
    let open Option in 
    m ctx >>= fun (a, ctx') -> 
    f a ctx'

As promised, the computation type is a state-and-exception monad, and the implementation of map and the monadic unit and bind are pretty unsurprising. More interesting are the implementations of the actual operations in the monadic interface.

  let fail : 'a t = fun ctx -> None 

Failure is easy to implement – it just ignores the context, and then returns None.

  let rec checkvar (x : var) (tp : tp) : unit t = fun ctx -> 
    let open Option in 
    match ctx with
    | [] -> fail 
    | (y, None) :: rest when x = y -> return ((), (y, Some tp) :: rest)
    | (y, Some _) :: rest when x = y -> fail 
    | h :: rest -> checkvar x tp rest >>= fun ((), rest') -> 
                   return ((), h :: rest')

The way that checkvar x tp works is that it iterates through the variables in the context, looking for the hypothesis which matches the variable x. When it finds it, it returns an updated context with the type of x set to Some tp. If the variable is already set, then that means that this is the second use of the variable, and so checkvar fails – this enforces the property that variables are used at most one time. If the variable isn't in the context, then checkvar also fails, because this is an out-of-scope variable reference. All other hypotheses are left unchanged.

  let lookup x (ctx : ctx) = 
    match List.assoc_opt x ctx with
    | None ->
    | Some None ->
    | Some (Some tp) -> Option.return(tp, ctx)

The lookup x computation is even simpler – it returns tp if (x, Some tp) is in the context, and fails otherwise.

  let withvar (type a) (x : var) (m : a t) : a t = fun ctx -> 
    let open Option in 
    m ((x, None) :: ctx) >>= function
    | (r, (y, Some _) :: ctx') when x = y -> return (r, ctx') 
    | (r, (y, None) :: ctx') when x = y -> fail 
    | _ -> assert false

The withvar x m operation extends the context with the variable x, and then runs m in the extended context.

An invariant our context representation maintains is that the output context has exactly the same variables in exactly the same order as the input context, and so we just pop off the first variable of the output context before returning, checking to make sure that the type of the variable has been set (i.e., Some _) to ensure that the variable was used at least one time. In conjunction with checkvar ensuring that the variable is used at most one time, this will ensure that each variable is used exactly one time.

If the first variable of the output context isn't x, or if the output context is empty, then our invariant is broken, and so we signal an assertion failure.

  let tp_eq (tp1 : tp) (tp2 : tp) = if tp1 = tp2 then return () else fail 

The type_eq tp1 tp2 function just turns a boolean test into a guard. Now, we can go through the synthesis and checking functions clause-by-clause:

  let rec synth = function
    | Unit -> return One

We synthesize the unit type for the unit value.

    | Pair(e1, e2) -> synth e1 >>= fun tp1 -> 
                      synth e2 >>= fun tp2 -> 
                      return (Tensor(tp1, tp2))

To synthesize a type for a pair, we synthesize types for each of the components, and then return their tensor product.

    | Lam(x, e) -> withvar x (synth e >>= fun ret_tp -> 
                              lookup x >>= fun arg_tp -> 
                              return (Lolli(arg_tp, ret_tp)))

Functions are interesting, because we need to deal with variables, and evaluation order plays out in a neat way here. We infer a type ret_tp for the body e, and then we look up the type tp_arg that the body e ascribed to the variable x. This lets us give a type Lolli(tp_arg, tp_ret) for the whole function.

    | LetUnit(e, e') -> check e One >>= fun () -> 
                        synth e'

To synthesize a type for unit elimination, we synthesize a type for the body, and check that the scrutinee has the unit type One.

    | LetPair(x, y, e, e') -> 
       withvar y (withvar x (synth e' >>= fun res_tp -> 
                             lookup x >>= fun tp1 -> 
                             lookup y >>= fun tp2 -> 
                             check e (Tensor(tp1, tp2)) >>= fun () -> 
                             return res_tp))

To eliminate a pair, we introduce (using withvar) scopes for the variables x and y, and then:

  1. We synthesize a type res_tp for the continuation e'.
  2. Since e' used x and y, we can look up the types they were used at (binding the type of x to tp1 and the type of y to tp2).
  3. Then, we check that the scrutinee e has the type Tensor(tp1, tp2).
  4. Finally, we return the type res_tp for the type of the whole expression.
    | App(_, _) -> fail 
    | Var _ -> fail 

Since applications and variable references are checking, not synthesizing, we fail if we see one of them in synthesizing position. If they are in checking position, we can use the check function to typecheck them:

  and check (e : exp) (tp : tp) : unit t = 
    match e with 
    | Var x -> checkvar x tp 

The variable case simply uses checkvar.

    | App(e1, e2) -> synth e2 >>= fun tp_arg -> 
                     check e1 (Lolli(tp_arg, tp))

To check an application e1 e2 at a type tp, we first synthesize the argument type by inferring a type tp_arg for e2, and then we check that e1 has the function type Lolli(tp_arg, tp).

    | e -> synth e >>= tp_eq tp

Finally, when we find a synthesizing term in checking position, we infer a type for it and then see if it is equal to what we expected.

This code is, at-best, lightly-tested, but I knocked together a small Github repository with the code. Enjoy!

Friday, May 10, 2019

Inverting Bidirectional Typechecking

\[\newcommand{\bnfalt}{\;\;|\;\;} \newcommand{\To}{\Rightarrow} \newcommand{\From}{\Leftarrow} \newcommand{\rule}[2]{\frac{\displaystyle \begin{matrix} #1 \end{matrix}}{\displaystyle #2}} \newcommand{\check}[3]{{#1} \vdash {#2} \From {#3}} \newcommand{\synth}[3]{{#1} \vdash {#2} \To {#3}} \newcommand{\lam}[1]{\lambda {#1}.\,} \newcommand{\inj}[2]{\mathsf{in}_{#1}({#2})} \newcommand{\case}[5]{\mathsf{case}({#1}, \inj{1}{#2} \to {#3}, \inj{2}{#4} \to {#5})} \newcommand{\match}[2]{\mathsf{match}\;{#1}\;\mathsf{of}\;[ #2 ]} \newcommand{\arm}[2]{{#1} \to {#2}} \newcommand{\downshift}[1]{\downarrow{#1}} \newcommand{\upshift}[1]{\uparrow{#1}} \newcommand{\thunk}[1]{\left\{{#1}\right\}} \newcommand{\return}[1]{\mathsf{return}\;#1} \newcommand{\fun}[1]{\lambda\;#1} \newcommand{\checkn}[3]{{#1} \rhd {#2} \From {#3}} \newcommand{\checkp}[2]{{#1} \leadsto {#2}} \newcommand{\spine}[4]{{#1} \vdash {#2} : {#3} \gg {#4}} \newcommand{\tensor}{\otimes} \newcommand{\lolli}{\multimap} \newcommand{\letv}[2]{\mathsf{let}\;{#1}={#2}\;\mathsf{in}\;} \newcommand{\unit}{\langle\rangle} \newcommand{\letunit}[1]{\letv{\unit}{#1}} \newcommand{\pair}[2]{\left\langle{#1}, {#2}\right\rangle} \newcommand{\letpair}[3]{\letv{\pair{#1}{#2}}{#3}}\]

In the traditional recipe for bidirectional typechecking, introduction forms are checked, and the principal subterm of elimination forms are inferred. However, a while back Noam Zeilberger remarked to me that in multiplicative linear logic, bidirectional typechecking worked just as well if you did it backwards. It is worth spelling out the details of this remark, and so this blog post.

First, let's give the types and grammar of multiplicative linear logic.

\[ \begin{array}{llcl} \mbox{Types} & A & ::= & 1 \bnfalt A \tensor B \bnfalt A \lolli B \\ \mbox{Terms} & e & ::= & x \bnfalt \lam{x}{e} \bnfalt e\,e' \\ & & | & \unit \bnfalt \letunit{e}{e'} \\ & & | & \pair{e}{e'} \bnfalt \letpair{x}{y}{e}{e'} \\ \mbox{Contexts} & \Gamma & ::= & \cdot \bnfalt \Gamma, x \From A \\ \end{array} \]

Our types are the unit type \(1\), the tensor product \(A \tensor B\), and the linear function space \(A \lolli B\). The unit and pair have the expected introduction forms \(\unit\) and \(\pair{e}{e'}\), and they have "pattern matching" style elimination forms. Functions are introduced with lambdas \(\lam{x}{e}\) and eliminated with applications \(e\,e'\) as usual, and of course variable references \(x\) as usual. Contexts are a bit unusual -- they pair together variables and their types as usual, but instead of treating a variable as a placeholder for a synthesizing term, we treat variables as placeholders for checking terms.

Now, let's go through the typing rules. First, we give the introduction and elimination rules for the unit type.

\[ \begin{array}{ll} \rule{ } { \synth{\cdot}{\unit}{1} } & \rule{ \synth{\Delta}{e'}{A} & \check{\Gamma}{e}{1} } { \synth{\Gamma, \Delta}{\letunit{e}{e'}}{A} } \\[1em] \end{array} \]

The introduction rule says that in an empty context, the unit value \(\unit\) synthesizes the type \(1\). The pattern-matching style elimination \(\letunit{e}{e'}\) typechecks as follows. First, we infer a type \(A\) for the body \(e'\), and then we check that the scrutinee \(e\) has the unit type \(1\).

This order is backwards from traditional bidirectional systems -- we synthesize a type for the continuation first, before checking the type of the data we are eliminating. In the case of units, this is a mere curiosity, but it gets more interesting with the tensor product type \(A \tensor B\).

\[ \begin{array}{ll} \rule{ \synth{\Gamma}{e}{A} & \synth{\Delta}{e'}{B} } { \synth{\Gamma, \Delta}{\pair{e}{e'}}{A \tensor B} } & \rule{ \synth{\Gamma, x \From A, y \From B}{e'}{C} & \check{\Delta}{e}{A \tensor B} } { \synth{\Gamma, \Delta}{\letpair{x}{y}{e}{e'}}{C} } \end{array} \]

Now, the synthesis for pairs remains intuitive. For a pair \(\pair{e}{e'}\), we first infer a type \(A\) for \(e\), and a type \(B\) for \(e'\), and then conclude that the pair has the type \(A \tensor B\). However, the typing of the pair elimination \(\letpair{x}{y}{e}{e'}\) is much wilder.

In this rule, we first check that the continuation \(e'\) has the type \(C\), and then we learn from typechecking \(e'\) that \(x\) and \(y\) were required to have had types \(A\) and \(B\) respectively. This gives us the data that we need to check that \(e\) has the type \(A \tensor B\).

The linear function type \(A \lolli B\) has a similar character:

\[ \begin{array}{ll} \rule{ \synth{\Gamma, x \From A}{e}{B} } { \synth{\Gamma}{\lam{x}{e}}{A \lolli B} } & \rule{ \synth{\Gamma}{e'}{A} & \check{\Delta}{e}{A \lolli B} } { \check{\Gamma, \Delta}{e\,e'}{B} } \end{array} \]

Here, to infer at type for the introduction form \(\lam{x}{e}\), we infer a type \(B\) for the body \(e\), and then look up what type \(A\) the parameter \(x\) was required to be for the body to typecheck. To check that an application \(e\,e'\) has the type \(B\), we infer a type \(A\) for the argument \(e'\), and then check that the function \(e\) has the function type \(A \lolli B\).

Again, the checking/synthesis mode of thse rules are precisely reversed from usual bidirectional type systems. We can see how this reversal plays out for variables below:

\[ \begin{array}{ll} \rule{ } { \check{x \From A}{x}{A} } & \rule{ \synth{\Gamma}{e}{A} & A = B} { \check{\Gamma}{e}{B} } \end{array} \]

Here, when we check that the variable \(x\) has the type \(A\), the context must be such that it demands \(x\) to have the type \(A\). (However, the switch between checking and synthesis is the same as ever.)

If you are used to regular bidirectional systems, the information flow in the variable rule (as well as for pattern matching for pairs and lambda-abstraction for functions) is a bit unexpected. We are used to having a context tell us what types each variable has. However, in this case we are not doing that! Instead, we use it to record the types that the rest of the program requires
variables to have.

This is still a "well-moded" program in the sense of logic programming. However, the moding is a bit more exotic now -- within a context, the variables are inputs, but their types are outputs. This is a bit fancier than the mode systems that usual logic programming languages have, but people have studied mode systems which can support this (such as Uday Reddy's A Typed Foundation for Directional Logic Programming).

As far as the metatheory of this system goes, I don't know very much about it. Substitution works fine -- you can easily prove a theorem of the form:

Theorem (Substitution) If \(\check{\Delta}{e}{A}\), then

  1. If \(\check{\Gamma, x \From A, \Theta}{e'}{C}\) then \(\check{\Gamma, \Delta, \Theta}{[e/x]e'}{C}\).
  2. If \(\synth{\Gamma, x \From A, \Theta}{e'}{C}\) then \(\synth{\Gamma, \Delta, \Theta}{[e/x]e'}{C}\)

However, I don't presently know a good characterization of the kind of terms are typable under this discipline. E.g., in the standard bidirectional presentation, the annotation-free terms are precisely the \(\beta\)-normal terms. However, in the reverse bidirectional system, that is not the case.

Two papers that seem closely related to this system are:

  1. Adam Chlipala, Leaf Petersen, and Robert Harper's TLDI 2005 paper, Strict Bidirectional Type Checking, and

  2. Ningning Xie and Bruno C. d. S. Oliveira's ESOP 2018 paper, Let Arguments Go First.

Adam and company's paper includes the traditional synthesizing bidirectional hypotheses, as well as checking hypotheses very similar to the ones in this post, but inspired by relevance logic rather than linear logic. The basic idea is that if a hypothesis is relevant, then it is okay to let checking determine its type, since we are guaranteed that the variable will appear in some checking context (which will tell us what type it should have). The same idea applies here, since linearity necessarily implies relevance.

Ningning and Bruno's paper has an application rule that looks exactly like the one in this paper -- argument types are synthesized, which permits inferring the type of a function head in a checking context. However, their system is focused on inferring polymorphic types, which makes the precise relationship a bit unclear to me.

The implementation of reverse bidirectionality is just as easy as traditional bidirectional systems, but I will leave that for my next post.

Tuesday, April 2, 2019

Some Paper Announcements

It has been a busy year for me so far, and it's nice to be able to tell the world about some of the fruits of that labour. In the next few months my coauthors and I will be presenting a few papers at PLDI and ECOOP. All of these papers will probably change a bit more before assuming their final forms, but I'd like to circulate their existence to give non-reviewers a chance to say their bit.

  • A Typed, Algebraic Approach to Parsing, Neel Krishnaswami and Jeremy Yallop. Draft, accepted for publication at PLDI 2019.

    In this paper, we recall the definition of the context-free expressions (or µ-regular expressions), an algebraic presentation of the context-free languages. Then, we define a core type system for the context-free expressions which gives a compositional criterion for identifying those context-free expressions which can be parsed unambiguously by predictive algorithms in the style of recursive descent or LL(1).

    Next, we show how these typed grammar expressions can be used to derive a parser combinator library which both guarantees linear-time parsing with no backtracking and single-token lookahead, and which respects the natural denotational semantics of context-free expressions. Finally, we show how to exploit the type information to write a staged version of this library, which produces dramatic increases in performance, even outperforming code generated by the standard parser generator tool ocamlyacc.

    In this paper we redo some classical results in parsing theory using the tools of type theory and semantics, and the results is very satisfying. Partly, this is due to the fact that the techniques we used are actually all known, but merely by chance happened to fall out of fashion.

    It was also nice seeing just how big a win staging is, and how comprehensively it erases the costs of using combinators. I'd really like to take another look at using staged type systems as the basis of a macro system a la MacroML.

  • NumLin: Linear Types for Linear Algebra, Dhruv Makwana and Neel Krishnaswmi. Draft, conditionally accepted for publication at ECOOP 2019.

    We present NumLin, a functional programming language designed to express the APIs of low-level linear algebra libraries (such as BLAS/LAPACK) safely and explicitly, through a brief description of its key features and several illustrative examples. We show that NumLin’s type system is sound and that its implementation improves upon naïve implementations of linear algebra programs, almost towards C-levels of performance. Lastly, we contrast it to other recent developments in linear types and show that using linear types and fractional permissions to express the APIs of low-level linear algebra libraries is a simple and effective idea.

    One of the things that has always surprised me about linear types is that it has always seemed to me like they would be a natural fit for working with array/matrix programs, and yet if you look around there is surprisingly little work on this. In this paper Dhruv Makwana and I decided to try it to see if it could actually worked, and we learned two things:

    1. Linear types are indeed a good fit for array programming.
    2. But really making it work calls for fractional permissions, and to date the type systems for working with fractions have been surprisingly complicated.
    3. However, we came up with a trick for drastically reducing the implementation complexity of fractions, to the point that it's barely more complicated than standard HM type inference.
  • A Program Logic for First-Order Encapsulated WebAssembly, Conrad Watt, Petar Maksimov, Neel Krishnaswami, Phillipa Gardner. Draft, accepted for publication at ECOOP 2019.

    We introduce Wasm Logic, a sound program logic for first-order, encapsulated WebAssembly.

    We design a novel assertion syntax, tailored to WebAssembly's stack-based semantics and the strong guarantees given by WebAssembly's type system, and show how to adapt the standard separation logic triple and proof rules in a principled way to capture WebAssembly's uncommon structured control flow. Using Wasm Logic, we specify and verify a simple WebAssembly B-tree library, giving abstract specifications independent of the underlying implementation.

    We mechanise Wasm Logic and its soundness proof in full in Isabelle/HOL. As part of the soundness proof, we formalise and fully mechanise a novel, big-step semantics of WebAssembly, which we prove equivalent, up to transitive closure, to the original WebAssembly small-step semantics.

    For me, there was a bit of nostalgia in this paper: I started out my research career working on separation logic, and it was nice to be able to do that stuff again. Also, it was nice to find out what the heck Webassembly actually is!

Friday, August 10, 2018

Polarity and bidirectional typechecking

This past July, Max New asked me about the relationship between bidirectional typechecking and semantic notions like polarization and call-by-push-value. I told him it was complicated, and that I would try to write something about the relationship. I was reminded of this since a couple of days ago, Conor McBride wrote a blog post where he laid out how he approaches bidirectional type systems. This is a fantastic post, and you should read it (probably twice or more, actually).

In his post, he remarks:

I like my type systems to be bidirectional. Different people mean different things by the word “bidirectional”, and I’m one of them. Some people relate the notion to “polarity”, and I’m not one of them.

I'm someone who would like to join the club Conor abjures, but I don't know how to!

It sure looks like bidirectional typechecking has a deep relationship to polarization or call-by-push-value, but I don't know how to make it work correctly. So, in this post is to explain why people think it does some deep semantic import, and then I'll talk about the mismatch that creates a problem I don't know the right way to handle. (Also, I apologize in advance for the lack of citations -- I really want to get this post out the door. If I didn't cite a paper you like (or even wrote), please link to it in the comments.)

The reason that people think that bidirectional typechecking must have a deep semantic meaning arises from how it works out in the simply-typed lambda calculus. Let's try writing some things down and seeing what happens. First, recall that bidirectional typechecking categorizes all terms as either "checking" or "synthesizing" and introduces two mutually recursive typing judgements for them.

\[\newcommand{\bnfalt}{\;\;|\;\;} \newcommand{\To}{\Rightarrow} \newcommand{\From}{\Leftarrow} \newcommand{\rule}[2]{\frac{\displaystyle \begin{matrix} #1 \end{matrix}}{\displaystyle #2}} \newcommand{\check}[3]{{#1} \vdash {#2} \From {#3}} \newcommand{\synth}[3]{{#1} \vdash {#2} \To {#3}} \newcommand{\lam}[1]{\lambda {#1}.\;} \newcommand{\inj}[2]{\mathsf{in}_{#1}({#2})} \newcommand{\case}[5]{\mathsf{case}({#1}, \inj{1}{#2} \to {#3}, \inj{2}{#4} \to {#5})} \newcommand{\match}[2]{\mathsf{match}\;{#1}\;\mathsf{of}\;[ #2 ]} \newcommand{\arm}[2]{{#1} \to {#2}} \newcommand{\downshift}[1]{\downarrow{#1}} \newcommand{\upshift}[1]{\uparrow{#1}} \newcommand{\thunk}[1]{\left\{{#1}\right\}} \newcommand{\return}[1]{\mathsf{return}\;#1} \newcommand{\fun}[1]{\lambda\;#1} \newcommand{\checkn}[3]{{#1} \rhd {#2} \From {#3}} \newcommand{\checkp}[2]{{#1} \leadsto {#2}} \newcommand{\spine}[4]{{#1} \vdash {#2} : {#3} \gg {#4}} \newcommand{\unit}{()} \]

\[ \begin{array}{llcl} \mbox{Types} & A & ::= & b \bnfalt A \to B \\ \mbox{Contexts} & \Gamma & ::= & \cdot \bnfalt \Gamma, x:A \\ \mbox{Checking} & e & ::= & \lam{x}{e} \bnfalt t \\ \mbox{Synthesizing} & t & ::= & x \bnfalt t\;e \bnfalt e : A \\ \end{array} \]

We'll start with a pretty minimal calculus -- we've got a base type \(b\), and functions. Contexts work as usual, giving variables their types, but terms are divided into checking terms and synthesizing terms. Checking terms are either introduction forms (just lambda-abstraction, here) or any synthesizing term (the intuition is that if we can infer a type for a term, we can surely check its type as well). Synthesizing terms are either variables (we can just look up their type in the context) applications, or explicitly annotated terms.

\[\begin{matrix} \rule{ \check{\Gamma, x:A}{e}{B} } { \check{\Gamma}{\lam{x}{e}}{A \to B} } & \rule{ \synth{\Gamma}{t}{A \to B} & \check{\Gamma}{e}{A} } { \synth{\Gamma}{t\;e}{B} } \\[1em] \rule{ \synth{\Gamma}{t}{A} & A = B } { \check{\Gamma}{t}{B} } & \rule{ \check{\Gamma}{e}{A} } { \synth{\Gamma}{e : A}{A} } \\[1em] \rule{ x:A \in \Gamma } { \synth{\Gamma}{x}{A} } & \end{matrix}\]

You can see the intro/elim pattern in the first line -- when we check a lambda-abstraction (an introduction form) against a type \(A \to B\), we put the variable \(x\) in the context at type \(A\), and check the body at type \(B\). When we apply a function, we first infer a type \(A \to B\) for the function expression, which gives us a type \(A\) to check the argument against.

So far, this is pretty standard stuff. Now, let's tweak the rules slightly.

\[\begin{matrix} \rule{ \check{\Gamma, x:A}{e}{B} } { \check{\Gamma}{\lam{x}{e}}{A \to B} } & \rule{ \synth{\Gamma}{t}{A \to B} & \check{\Gamma}{e}{A} } { \synth{\Gamma}{t\;e}{B} } \\[1em] \rule{ \color{red}{\synth{\Gamma}{t}{b}} } { \color{red}{\check{\Gamma}{t}{b}} } & \color{red}{\mbox{(No annotation rule)}} \\[1em] \rule{ x:A \in \Gamma } { \synth{\Gamma}{x}{A} } & \end{matrix}\]

Now, we've made two changes. First, we've deleted the annotation rule, and second, we've restricted the checking/synthesis switch to only occur at base types \(b\).

  1. The effect of deleting the annotation rule is that it is not longer possible to write beta-reducible terms. A term in the simply-typed lambda-calculus written \((\lam{x:b}{x})\;y\) which reduces to \(y\) would be written \((\lam{x}{x} : b \to b)\;y\) in a bidirectional system -- but without annotations these terms can no longer be written.

  2. The effect of restricting the check/synth switch rule is that it is no longer possible to write eta-expandable terms. If \(f : b \to b \to b\) and \(x : b\), then the term \(f\;x\) would synthesize the type \(b \to b\) in the original system. However, it no longer typechecks in our restricted system. To make it work, we have to eta-expand the term, so that we write \(\lam{y}{f\;x\;y}\) instead. This now checks against \(b \to b\) as we expect.

So the joint-effect of these two restrictions is that only beta-normal, eta-long terms typecheck. The reason these terms are so important is that any two beta-eta equal terms will have the same normal form. So having a simple, easy characterization of these normal forms is really great! Moreover, this characterization is easy to extend to products:

\[\begin{matrix} \rule{ \check{\Gamma, x:A}{e}{B} } { \check{\Gamma}{\lam{x}{e}}{A \to B} } & \rule{ \synth{\Gamma}{t}{A \to B} & \check{\Gamma}{e}{A} } { \synth{\Gamma}{t\;e}{B} } \\[1em] \rule{ } { \check{\Gamma}{()}{1} } & \mbox{(No unit elimination rule)} \\[1em] \rule{ \check{\Gamma}{e_1}{A_1} & \check{\Gamma}{e_2}{A_2} } { \check{\Gamma}{(e_1, e_2)}{A_1 \times A_2} } & \rule{ \synth{\Gamma}{e}{A_1 \times A_2} & i \in \{1,2\} } { \synth{\Gamma}{\pi_i(e)}{A_i} } \\[1em] \rule{ \synth{\Gamma}{t}{b} } { \check{\Gamma}{t}{b} } & \mbox{(No annotation rule)} \\[1em] \rule{ x:A \in \Gamma } { \synth{\Gamma}{x}{A} } & \end{matrix}\]

This type system still characterizes normal forms in the STLC with units and products. Adding these constructors starts to give us a pattern:

  1. Introduction forms (lambda-abstractions, pairs, units) are checking.
  2. Elimination forms (applications, projections) are synthesizing.

Since units + pairs + functions are syntax for everything in cartesian closed categories, this is actually pretty wonderful. We seem to have a simple rule for characterizing beta-normal, eta-long forms.

But what happens when we try to add sums to the language? Let's try to follow our recipe, and see what happens:

\[\begin{matrix} \rule{ \check{\Gamma}{e}{A_i} & i \in \{1,2\} } { \check{\Gamma}{\inj{i}{e}}{A_1 + A_2} } & \rule{ \synth{\Gamma}{t}{A_1 + A_2} & \check{\Gamma, x_1:A_1}{e_1}{C} & \check{\Gamma, x_2:A_2}{e_2}{C} } { \check{\Gamma}{\case{t}{x_1}{e_1}{x_2}{e_2}}{C} } \end{matrix}\]

The introduction form seems to work. The elimination form is a bit more complicated -- it's the same syntax as always, but the checking/synthesis moding is a bit subtle. The expectation created by units/pairs/functions would be that both the scrutinee and the whole case form should synthesize, but expecting two branches with different contexts (i.e., \(\Gamma, x_1:A_1\) for \(e_1\) and \(\Gamma, x_2:A_2\) for \(e_2\)) to synthesize the same type is a morally dubious expectation (eg, it would not make sense in a dependently-typed setting). So we are led to say that case is checking, but that the scrutinee is synthesizing.

This imposes some restrictions on what does and doesn't count as a typeable term. For example, because \(\mathsf{case}\) is checking rather than synthesizing, we can never write an expression like:

\[a:((b \to A) + (b \to A)), x:b \vdash \case{a}{f}{f}{g}{g}\;x \From A\]

Instead of applying an argument to a case expression of function type, we have to push the arguments into the branches:

\[a:((b \to A) + (b \to A)), x:b \vdash \case{a}{f}{f\;x}{g}{g\;x} \From A\]

From the point of view of typing normal forms, this actually doesn't seem too bad, because most people would consider the second term simpler than the first, and so this gives us a "nicer" notion of normal form. However, this doesn't seem like a real explanation, since our rules permit things like the following:

\[\synth{f : b \to b, x:b+b}{f\;\case{x}{y}{y}{z}{z}}{b}\]

To get to a better explanation before the heat death of the universe, I'm going to skip over about 20 years of research, and jump straight to polarized type theory.

\[\begin{matrix} \mbox{Positive Types} & P & ::= & 1 \bnfalt P \times Q \bnfalt P + Q \bnfalt \downshift{N} \\ \mbox{Negative Types} & N & ::= & P \to N \bnfalt \upshift{P} \\ \mbox{Values} & v & ::= & () \bnfalt (v,v) \bnfalt \inj{i}{v} \bnfalt \thunk{t} \\ \mbox{Spines} & s & ::= & \cdot \bnfalt v\;s \\[1em] \mbox{Terms} & t & ::= & \return{v} \bnfalt \fun{\overrightarrow{\arm{p_i}{t_i}}} \bnfalt \match{x \cdot s}{\overrightarrow{\arm{p_i}{t_i}}} \\ \mbox{Patterns} & p & ::= & () \bnfalt (p,p') \bnfalt \inj{i}{p} \bnfalt \thunk{x} \\[1em] \mbox{Contexts} & \Gamma,\Delta & ::= & \cdot \bnfalt \Gamma, x:N \\ \mbox{Typing Values} & & & \check{\Gamma}{v}{P} \\ \mbox{Typing Spines} & & & \spine{\Gamma}{s}{N}{M} \\ \mbox{Typing Terms} & & & \checkn{\Gamma}{t}{N} \\ \mbox{Typing Patterns} & & & \checkp{p:P}{\Delta} \\ \end{matrix}\]

The key idea in polarized type theory is to divide types into two categories, the positive types (sums, strict products, and suspended computations, denoted by \(P\)) and the negative types (basically, functions, denoted by \(N\)). Positive types are basically those that are eliminated with pattern matching, and the negative types are the ones that are eliminated by supplying arguments. Negatives types can be embedded into positive types using the "downshift" type \(\downshift{N}\) (representing suspended computations) and positive types can be embedded into the negatives using the "upshift" \(\upshift{P}\) (denoting computations producing \(P\)'s).

The funny thing about this setup is that despite arising from meditation upon invariants of proof theory, we end up with a syntax that is much closer to practical functional programming languages than the pure typed lambda calculus! For example, syntax for polarized calculi tends to have pattern matching. However, one price of this is a proliferation of judgements. We usually end up introducing separate categories of values (for introducing positive types) and spines (argument lists for elminating negative types), as well as terms (how to put values and spines together in computations, as well as introducing negative types) and patterns (how to eliminate positive types).

Now, let's talk through the rules. First up is the \(\check{\Gamma}{v}{P}\) judgement for checking the type of positive values.

\[\begin{matrix} \rule{} { \check{\Gamma}{()}{1} } & \rule{ \check{\Gamma}{v}{P} & \check{\Gamma}{v'}{Q} } { \check{\Gamma}{(v,v')}{P \times Q} } \\[1em] \rule{ \check{\Gamma}{v}{P_i} & i \in \{1,2\} } { \check{\Gamma}{\inj{i}{v}}{P_1 + P_2} } & \rule{ \checkn{\Gamma}{t}{N} } { \check{\Gamma}{\thunk{t}}{\downshift{N}} } \end{matrix}\]

The rules for units, pairs and sums are the same as always. The rule for downshift says that if a term \(t\) checks at a negative type \(N\), then the thunked term \(\thunk{t}\) will check against the downshifted type \(\downshift{N}\).

We'll see the rules for terms in a bit, but next will come the rules for spines, in the judgement \(\spine{\Gamma}{s}{N}{M}\). This judgement says that if the spine \(s\) is applied to a head of type \(N\), it will produce a result of type \(M\). In this judgement, the type \(N\) is an algorithmic input, and the type \(M\) is an output.

\[\begin{matrix} \rule{ } { \spine{\Gamma}{\cdot}{N}{N} } \qquad \rule{ \check{\Gamma}{v}{P} & \spine{\Gamma}{s}{N}{M} } { \spine{\Gamma}{v\;s}{P \to N}{M} } % \\[1em] % \rule{ \forall i < n.\; & \checkp{p_i:P}{\Delta_i} & \checkn{\Gamma, \Delta_i}{t_i}{M} } % { \spine{\Gamma}{\match{x \cdot s}{\overrightarrow{\arm{p_i}{t_i}}^{i < n}}}{\upshift{P}}{M} } \end{matrix}\]

The first rule says that if you have an empty argument list then the result is the same as the input, and the second rule says that if \(v\) is a value of type \(P\), and \(s\) is an argument list sending \(N\) to \(M\), then the extended argument list \(v\;s\) sends the function type \(P \to N\) to \(M\).

With values and spines in hand, we can talk about terms, in the term typing judgement \(\checkn{\Gamma}{t}{N}\), which checks that a term \(t\) has the type \(N\).

\[\begin{matrix} \rule{ \check{\Gamma}{v}{P} } { \checkn{\Gamma}{\return{v}}{\upshift{P}} } \qquad \rule{ \forall i < n.\; & \checkp{p_i:P}{\Delta_i} & \checkn{\Gamma, \Delta_i}{t_i}{N} } { \checkn{\Gamma}{\fun{\overrightarrow{\arm{p_i}{t_i}}^{i < n}}}{P \to N} } \\[1em] \rule{ x:M \in \Gamma & \spine{\Gamma}{s}{M}{\upshift{Q}} & \forall i < n.\; \checkp{p_i:Q}{\Delta_i} & \checkn{\Gamma, \Delta_i}{t_i}{\upshift{P}} } { \checkn{\Gamma}{\match{x \cdot s}{\overrightarrow{\arm{p_i}{t_i}}^{i < n}}}{\upshift{P}} } \end{matrix}\]

The rule for \(\return{v}\) says that we can embed a value \(v\) of type \(P\) into the upshift type \(\upshift{P}\) by immediately returning it. Lambda abstractions are pattern-style -- instead of a lambda binder \(\lam{x}{t}\), we give a whole list of patterns and branches \(\fun{\overrightarrow{\arm{p_i}{t_i}}}\) to check at the type \(P \to N\). As a result, we need a judgement \(\checkp{p_i:P}{\Delta_i}\) which gives the types of the bindings \(\Delta_i\) of the pattern \(p_i\), and then we check each \(t_i\) against the result type \(N\).

The match statement \(\match{x\cdot s}{\overrightarrow{\arm{p_i}{t_i}}}\) also has similar issues in its typing rule. First, it finds a variable in the context, applies some arguments to it to find a value result of type \(\upshift{Q}\), and then pattern matches against type \(Q\). So we check that the spine \(s\) sends \(M\) to the type \(\upshift{Q}\), and then check that the patterns \(p_i\) yield variables \(\Delta_i\) at the type \(Q\), and then check the \(t_i\) against the type \(\upshift{P}\).

Restricting the type at which we can match forces us to eta-expand terms of function type. Also, these rules omit a side-condition for pattern coverage. (I have an old blog post about how to do that if you are curious.)

Both lambda-abstraction and application/pattern-matching need the judgement \(\checkp{p:P}{\Delta}\) to find the types of the bindings. The rules for these are straightforward: \[\begin{matrix} \rule{ } { \checkp{\thunk{x} {:} \downshift{\!N}}{x:N} } & \rule{ } { \checkp{\unit : 1}{\cdot} } \\[1em] \rule{ \checkp{p_1 : P_1}{\Delta_1} & \checkp{p_2 : P_2}{\Delta_2} } { \checkp{(p_1,p_2) : P_1 \times P_2}{\Delta_1, \Delta_2} } & \rule{ \checkp{p:P_i}{\Delta} & i \in \{1,2\} } { \checkp{\inj{i}{p} : P_1 + P_2}{\Delta} } \end{matrix}\]

Units yield no variables at type \(1\), pair patterns \((p_1, p_2)\) return the variables of each component, coproduct injections \(\inj{i}{p}\) return the variables of the sub-pattern \(p\), and thunk patterns \(\thunk{x}\) at type \(\downshift{N}\) return that variable \(x\) at type \(N\).

At this point, it sure looks like we have a perfect bidirectional type system for a polarized calculus. What's the problem? The problem is that I palmed a card! Here's the relevant bit of the grammar I kicked things off with: \[\begin{matrix} \ldots \\ \mbox{Contexts} & \Gamma,\Delta & ::= & \cdot \bnfalt \Gamma, \color{red}{x:N} \\ \ldots \end{matrix}\]

The context \(\Gamma\) has been restricted to only contain variables of negative type. It doesn't allow variables of positive type! And, I don't know how to add it in the "right" way. If we wanted positive variables (in fact, call-by-push-value only has positive variables), we could add them in the following way:

\[\begin{matrix} \mbox{Values} & v & ::= & () \bnfalt (v,v) \bnfalt \inj{i}{v} \bnfalt \thunk{t} \bnfalt \color{red}{u} \\ \mbox{Contexts} & \Gamma,\Delta & ::= & \cdot \bnfalt \Gamma, x:N \bnfalt \Gamma, \color{red}{u:P} \\ \mbox{Patterns} & p & ::= & () \bnfalt (p,p') \bnfalt \inj{i}{p} \bnfalt \thunk{x} \bnfalt \color{red}{u} \\[1em] \end{matrix}\] So we add value variables \(u\) to the syntax of values, and so we have to also add them to contexts, and also extend pattern matching with them to bind values. Then, the rules for these things would look like the following: \[\begin{matrix} \rule{ } {\checkp{u:P}{u:P}} & \rule{ u:Q \in \Gamma & Q = P } { \check{\Gamma}{u}{P} } \end{matrix}\]

So a variable pattern at value type simply binds the variable at that type, and when we use a value variable we have the check that the type in the context matches the type that we're checking the term at.

And that's the wrong thing to do! The bidirectional recipe says that we should check equality of types only when we switch between checking and synthesis, and so while this rule might or might not work, it's clearly not arranged the information flow properly, since we have a random-looking subsumption test in the value variable rule.

Some additional idea is needed, and I'm not sure what, yet.

Monday, August 6, 2018

Category Theory in PL research

Over on Twitter, Max New wrote:

Maybe the reason category theory seems useless for PL to many people is that it only gets really useful when you start doing "sophisticated" stuff like internal, enriched and fibered categories and the intros for computer scientists don't get to those things.

Max is a very sharp person, but I do not agree with this!

The most useful applications of category theory in my own research have all been very, very elementary, to the point that people who (unlike me) are genuinely good at category theory are always a little surprised that I can get away with such primitive machinery.

Basically, the key idea is to formulate models of languages/libraries in terms of categories of structured sets with structure-preserving maps (aka homomorphisms). And, well, that's it!

I don't have a really crisp explanation of the process of figuring out how to go from "I have a property I'd like to represent mathematically" to "sets should have the following structure and the following properties should be preserved by the homomorphisms", but I can give an example to illustrate the recipe.

  1. Let's suppose we want to control the memory usage of programs.

  2. Think of a type as a set, together with some operations talking about the property in question. In this case, let's say that a type is a pair \((X \in \mathrm{Set}, w_X : X \to \mathbb{N})\), where \(X\) are the elements of the type and the "weight function" \(w_X\) gives you a number for each value telling you how much memory gets used.

  3. Now, let's define what the functions \(f : (X, w_X) \to (Y, w_Y)\) should be. Since values are sets, obviously a function between two types should be a function \(f : X \to Y\) on the underlying sets. However, we have a weight function for each type, and so we should do something sensible them. The first, most obvious idea is to say that maps should preserve the weight exactly -- i.e., that \(w_Y(f(x)) = w_X(x)\).

    This means that memory is never allocated, but it also means that memory is never de-allocated. So since we probably do want to allow deallocation, it makes sense to weaken this condition to something more like \(w_Y(f(x)) \leq w_X(x)\).

  4. Now, we go off and prove things about this language. In short order we will discover that this is a monoidal closed category, and so we can give a linearly-typed language for memory-controlled programs.

This is basically the idea in the late Martin Hofmann's 2003 paper, Linear types and non-size-increasing polynomial time computation. He called this category the category of "length spaces". This paper was a revelation to me, because it was the first time that I really understood how plain old ordinary mathematics could be used to model genuinely computational phenomena.

There are a lot of examples like this -- here are four more I like:

  • Domains and continuous functions

    The great grand-parent of using structured sets to model computations is, of course, domain theory. In this setting, a type is a set along with a complete partial order on it, and functions must be continuous with respect to this order. The intuition is that an always-nonterminating program is the least informative program, and that as programs loop on fewer and fewer inputs, they get more and more defined. Continuity basically means that giving a program a more-defined input will never make it loop more often.

    Steve Vickers's book Topology via Logic was how I learned to think about domains in a way that brought the computational intuition to the forefront.

  • Nominal sets and equivariant maps

    In languages like ML, Haskell and Java, it is permissible to create pointers and test them for pointer equality, but it is not permissible to compare pointers for order (you can use == but you can't use <). This means that the order in which pointers are created is not observable, a fact which is very important for both the equational theory and optimization of programs.

    To model this, we can introduce a set of locations (aka names, or atoms). To model the fact that we can't observe the order, we can say that we want the meanings of programs to be independent of permutations of the names -- as long as we don't identify two locations, the program can't tell if we reorder the locations in the heap.

    So, we can abstractly model the idea of reordering locations by saying that a type is a pair \((X, a : Perm \times X \to X)\), where \(X\) is a set of values, and the "action" \(a\) tells you how to rename all the values in the set when you are given a permutation. Then, a morphism \(f : (X, a) \to (Y, b)\) will be a function such that \(b(\pi, f(x)) = f(a(\pi, x))\) -- that is, we get the same answer whether we rename the argument to \(f\) or apply the function and then rename.

    I learned this from Andy Pitts's work on nominal logic -- here is a nice introduction for SIGLOG news which he wrote.

  • Partial orders and monotone maps

    My student Michael Arntzenius has been looking at higher-order database query languages, and one of the ideas we have been working with (in the language Datafun), has been inspired by database query languages and dataflow analysis, where recursive queries are formulated as fixed points of monotone functions on partial orders.

  • Ultrametric spaces and non-expansive maps

    In joint work with Nick Benton, we used the category of (wait for it) complete 1-bounded bisected ultrametric spaces and non-expansive maps to model reactive computations.

    Our goal was to model reactive programs, which have the property of causality. That is, an interactive system can only produce an output at time \(n\) from the first \(n\) inputs -- it can't look into the future to make an action.

    We began with this standard definition of causality, which was defined on streams, and then -- after reading the classic paper of Alpern and Schneider Defining Liveness -- we noticed that it corresponded to nonexpansiveness when streams are given the Cantor metric (i.e., the distance between two streams is \(2^{-n}\) when they first differ at the \(n\)-th position). This led us to look at categories of metric spaces, and we were able to adapt some work of Lars Birkedal and his collaborators to devise a new language for reactive programming.

One point I want to make is that while this is not a mechanical process, it is also "ordinary mathematics". In the last two cases, we had to try very hard to understand the problems we were trying to solve, but we didn't need vast amounts of categorical machinery. We really only needed elementary definitions from category theory, and we needed it because it was the technology that let us recognise when we had actually found a nice algebraic solution to our problem.