## Friday, August 10, 2018

### Polarity and bidirectional typechecking

This past July, Max New asked me about the relationship between bidirectional typechecking and semantic notions like polarization and call-by-push-value. I told him it was complicated, and that I would try to write something about the relationship. I was reminded of this since a couple of days ago, Conor McBride wrote a blog post where he laid out how he approaches bidirectional type systems. This is a fantastic post, and you should read it (probably twice or more, actually).

In his post, he remarks:

I like my type systems to be bidirectional. Different people mean different things by the word “bidirectional”, and I’m one of them. Some people relate the notion to “polarity”, and I’m not one of them.

I'm someone who would like to join the club Conor abjures, but I don't know how to!

It sure looks like bidirectional typechecking has a deep relationship to polarization or call-by-push-value, but I don't know how to make it work correctly. So, in this post is to explain why people think it does some deep semantic import, and then I'll talk about the mismatch that creates a problem I don't know the right way to handle. (Also, I apologize in advance for the lack of citations -- I really want to get this post out the door. If I didn't cite a paper you like (or even wrote), please link to it in the comments.)

The reason that people think that bidirectional typechecking must have a deep semantic meaning arises from how it works out in the simply-typed lambda calculus. Let's try writing some things down and seeing what happens. First, recall that bidirectional typechecking categorizes all terms as either "checking" or "synthesizing" and introduces two mutually recursive typing judgements for them.


$\begin{array}{llcl} \mbox{Types} & A & ::= & b \bnfalt A \to B \\ \mbox{Contexts} & \Gamma & ::= & \cdot \bnfalt \Gamma, x:A \\ \mbox{Checking} & e & ::= & \lam{x}{e} \bnfalt t \\ \mbox{Synthesizing} & t & ::= & x \bnfalt t\;e \bnfalt e : A \\ \end{array}$

We'll start with a pretty minimal calculus -- we've got a base type $$b$$, and functions. Contexts work as usual, giving variables their types, but terms are divided into checking terms and synthesizing terms. Checking terms are either introduction forms (just lambda-abstraction, here) or any synthesizing term (the intuition is that if we can infer a type for a term, we can surely check its type as well). Synthesizing terms are either variables (we can just look up their type in the context) applications, or explicitly annotated terms.

$\begin{matrix} \rule{ \check{\Gamma, x:A}{e}{B} } { \check{\Gamma}{\lam{x}{e}}{A \to B} } & \rule{ \synth{\Gamma}{t}{A \to B} & \check{\Gamma}{e}{A} } { \synth{\Gamma}{t\;e}{B} } \\[1em] \rule{ \synth{\Gamma}{t}{A} & A = B } { \check{\Gamma}{t}{B} } & \rule{ \check{\Gamma}{e}{A} } { \synth{\Gamma}{e : A}{A} } \\[1em] \rule{ x:A \in \Gamma } { \synth{\Gamma}{x}{A} } & \end{matrix}$

You can see the intro/elim pattern in the first line -- when we check a lambda-abstraction (an introduction form) against a type $$A \to B$$, we put the variable $$x$$ in the context at type $$A$$, and check the body at type $$B$$. When we apply a function, we first infer a type $$A \to B$$ for the function expression, which gives us a type $$A$$ to check the argument against.

So far, this is pretty standard stuff. Now, let's tweak the rules slightly.

$\begin{matrix} \rule{ \check{\Gamma, x:A}{e}{B} } { \check{\Gamma}{\lam{x}{e}}{A \to B} } & \rule{ \synth{\Gamma}{t}{A \to B} & \check{\Gamma}{e}{A} } { \synth{\Gamma}{t\;e}{B} } \\[1em] \rule{ \color{red}{\synth{\Gamma}{t}{b}} } { \color{red}{\check{\Gamma}{t}{b}} } & \color{red}{\mbox{(No annotation rule)}} \\[1em] \rule{ x:A \in \Gamma } { \synth{\Gamma}{x}{A} } & \end{matrix}$

Now, we've made two changes. First, we've deleted the annotation rule, and second, we've restricted the checking/synthesis switch to only occur at base types $$b$$.

1. The effect of deleting the annotation rule is that it is not longer possible to write beta-reducible terms. A term in the simply-typed lambda-calculus written $$(\lam{x:b}{x})\;y$$ which reduces to $$y$$ would be written $$(\lam{x}{x} : b \to b)\;y$$ in a bidirectional system -- but without annotations these terms can no longer be written.

2. The effect of restricting the check/synth switch rule is that it is no longer possible to write eta-expandable terms. If $$f : b \to b \to b$$ and $$x : b$$, then the term $$f\;x$$ would synthesize the type $$b \to b$$ in the original system. However, it no longer typechecks in our restricted system. To make it work, we have to eta-expand the term, so that we write $$\lam{y}{f\;x\;y}$$ instead. This now checks against $$b \to b$$ as we expect.

So the joint-effect of these two restrictions is that only beta-normal, eta-long terms typecheck. The reason these terms are so important is that any two beta-eta equal terms will have the same normal form. So having a simple, easy characterization of these normal forms is really great! Moreover, this characterization is easy to extend to products:

$\begin{matrix} \rule{ \check{\Gamma, x:A}{e}{B} } { \check{\Gamma}{\lam{x}{e}}{A \to B} } & \rule{ \synth{\Gamma}{t}{A \to B} & \check{\Gamma}{e}{A} } { \synth{\Gamma}{t\;e}{B} } \\[1em] \rule{ } { \check{\Gamma}{()}{1} } & \mbox{(No unit elimination rule)} \\[1em] \rule{ \check{\Gamma}{e_1}{A_1} & \check{\Gamma}{e_2}{A_2} } { \check{\Gamma}{(e_1, e_2)}{A_1 \times A_2} } & \rule{ \synth{\Gamma}{e}{A_1 \times A_2} & i \in \{1,2\} } { \synth{\Gamma}{\pi_i(e)}{A_i} } \\[1em] \rule{ \synth{\Gamma}{t}{b} } { \check{\Gamma}{t}{b} } & \mbox{(No annotation rule)} \\[1em] \rule{ x:A \in \Gamma } { \synth{\Gamma}{x}{A} } & \end{matrix}$

This type system still characterizes normal forms in the STLC with units and products. Adding these constructors starts to give us a pattern:

1. Introduction forms (lambda-abstractions, pairs, units) are checking.
2. Elimination forms (applications, projections) are synthesizing.

Since units + pairs + functions are syntax for everything in cartesian closed categories, this is actually pretty wonderful. We seem to have a simple rule for characterizing beta-normal, eta-long forms.

But what happens when we try to add sums to the language? Let's try to follow our recipe, and see what happens:

$\begin{matrix} \rule{ \check{\Gamma}{e}{A_i} & i \in \{1,2\} } { \check{\Gamma}{\inj{i}{e}}{A_1 + A_2} } & \rule{ \synth{\Gamma}{t}{A_1 + A_2} & \check{\Gamma, x_1:A_1}{e_1}{C} & \check{\Gamma, x_2:A_2}{e_2}{C} } { \check{\Gamma}{\case{t}{x_1}{e_1}{x_2}{e_2}}{C} } \end{matrix}$

The introduction form seems to work. The elimination form is a bit more complicated -- it's the same syntax as always, but the checking/synthesis moding is a bit subtle. The expectation created by units/pairs/functions would be that both the scrutinee and the whole case form should synthesize, but expecting two branches with different contexts (i.e., $$\Gamma, x_1:A_1$$ for $$e_1$$ and $$\Gamma, x_2:A_2$$ for $$e_2$$) to synthesize the same type is a morally dubious expectation (eg, it would not make sense in a dependently-typed setting). So we are led to say that case is checking, but that the scrutinee is synthesizing.

This imposes some restrictions on what does and doesn't count as a typeable term. For example, because $$\mathsf{case}$$ is checking rather than synthesizing, we can never write an expression like:

$a:((b \to A) + (b \to A)), x:b \vdash \case{a}{f}{f}{g}{g}\;x \From A$

Instead of applying an argument to a case expression of function type, we have to push the arguments into the branches:

$a:((b \to A) + (b \to A)), x:b \vdash \case{a}{f}{f\;x}{g}{g\;x} \From A$

From the point of view of typing normal forms, this actually doesn't seem too bad, because most people would consider the second term simpler than the first, and so this gives us a "nicer" notion of normal form. However, this doesn't seem like a real explanation, since our rules permit things like the following:

$\synth{f : b \to b, x:b+b}{f\;\case{x}{y}{y}{z}{z}}{b}$

To get to a better explanation before the heat death of the universe, I'm going to skip over about 20 years of research, and jump straight to polarized type theory.

$\begin{matrix} \mbox{Positive Types} & P & ::= & 1 \bnfalt P \times Q \bnfalt P + Q \bnfalt \downshift{N} \\ \mbox{Negative Types} & N & ::= & P \to N \bnfalt \upshift{P} \\ \mbox{Values} & v & ::= & () \bnfalt (v,v) \bnfalt \inj{i}{v} \bnfalt \thunk{t} \\ \mbox{Spines} & s & ::= & \cdot \bnfalt v\;s \\[1em] \mbox{Terms} & t & ::= & \return{v} \bnfalt \fun{\overrightarrow{\arm{p_i}{t_i}}} \bnfalt \match{x \cdot s}{\overrightarrow{\arm{p_i}{t_i}}} \\ \mbox{Patterns} & p & ::= & () \bnfalt (p,p') \bnfalt \inj{i}{p} \bnfalt \thunk{x} \\[1em] \mbox{Contexts} & \Gamma,\Delta & ::= & \cdot \bnfalt \Gamma, x:N \\ \mbox{Typing Values} & & & \check{\Gamma}{v}{P} \\ \mbox{Typing Spines} & & & \spine{\Gamma}{s}{N}{M} \\ \mbox{Typing Terms} & & & \checkn{\Gamma}{t}{N} \\ \mbox{Typing Patterns} & & & \checkp{p:P}{\Delta} \\ \end{matrix}$

The key idea in polarized type theory is to divide types into two categories, the positive types (sums, strict products, and suspended computations, denoted by $$P$$) and the negative types (basically, functions, denoted by $$N$$). Positive types are basically those that are eliminated with pattern matching, and the negative types are the ones that are eliminated by supplying arguments. Negatives types can be embedded into positive types using the "downshift" type $$\downshift{N}$$ (representing suspended computations) and positive types can be embedded into the negatives using the "upshift" $$\upshift{P}$$ (denoting computations producing $$P$$'s).

The funny thing about this setup is that despite arising from meditation upon invariants of proof theory, we end up with a syntax that is much closer to practical functional programming languages than the pure typed lambda calculus! For example, syntax for polarized calculi tends to have pattern matching. However, one price of this is a proliferation of judgements. We usually end up introducing separate categories of values (for introducing positive types) and spines (argument lists for elminating negative types), as well as terms (how to put values and spines together in computations, as well as introducing negative types) and patterns (how to eliminate positive types).

Now, let's talk through the rules. First up is the $$\check{\Gamma}{v}{P}$$ judgement for checking the type of positive values.

$\begin{matrix} \rule{} { \check{\Gamma}{()}{1} } & \rule{ \check{\Gamma}{v}{P} & \check{\Gamma}{v'}{Q} } { \check{\Gamma}{(v,v')}{P \times Q} } \\[1em] \rule{ \check{\Gamma}{v}{P_i} & i \in \{1,2\} } { \check{\Gamma}{\inj{i}{v}}{P_1 + P_2} } & \rule{ \checkn{\Gamma}{t}{N} } { \check{\Gamma}{\thunk{t}}{\downshift{N}} } \end{matrix}$

The rules for units, pairs and sums are the same as always. The rule for downshift says that if a term $$t$$ checks at a negative type $$N$$, then the thunked term $$\thunk{t}$$ will check against the downshifted type $$\downshift{N}$$.

We'll see the rules for terms in a bit, but next will come the rules for spines, in the judgement $$\spine{\Gamma}{s}{N}{M}$$. This judgement says that if the spine $$s$$ is applied to a head of type $$N$$, it will produce a result of type $$M$$. In this judgement, the type $$N$$ is an algorithmic input, and the type $$M$$ is an output.

$\begin{matrix} \rule{ } { \spine{\Gamma}{\cdot}{N}{N} } \qquad \rule{ \check{\Gamma}{v}{P} & \spine{\Gamma}{s}{N}{M} } { \spine{\Gamma}{v\;s}{P \to N}{M} } % \\[1em] % \rule{ \forall i < n.\; & \checkp{p_i:P}{\Delta_i} & \checkn{\Gamma, \Delta_i}{t_i}{M} } % { \spine{\Gamma}{\match{x \cdot s}{\overrightarrow{\arm{p_i}{t_i}}^{i < n}}}{\upshift{P}}{M} } \end{matrix}$

The first rule says that if you have an empty argument list then the result is the same as the input, and the second rule says that if $$v$$ is a value of type $$P$$, and $$s$$ is an argument list sending $$N$$ to $$M$$, then the extended argument list $$v\;s$$ sends the function type $$P \to N$$ to $$M$$.

With values and spines in hand, we can talk about terms, in the term typing judgement $$\checkn{\Gamma}{t}{N}$$, which checks that a term $$t$$ has the type $$N$$.

$\begin{matrix} \rule{ \check{\Gamma}{v}{P} } { \checkn{\Gamma}{\return{v}}{\upshift{P}} } \qquad \rule{ \forall i < n.\; & \checkp{p_i:P}{\Delta_i} & \checkn{\Gamma, \Delta_i}{t_i}{N} } { \checkn{\Gamma}{\fun{\overrightarrow{\arm{p_i}{t_i}}^{i < n}}}{P \to N} } \\[1em] \rule{ x:M \in \Gamma & \spine{\Gamma}{s}{M}{\upshift{Q}} & \forall i < n.\; \checkp{p_i:Q}{\Delta_i} & \checkn{\Gamma, \Delta_i}{t_i}{\upshift{P}} } { \checkn{\Gamma}{\match{x \cdot s}{\overrightarrow{\arm{p_i}{t_i}}^{i < n}}}{\upshift{P}} } \end{matrix}$

The rule for $$\return{v}$$ says that we can embed a value $$v$$ of type $$P$$ into the upshift type $$\upshift{P}$$ by immediately returning it. Lambda abstractions are pattern-style -- instead of a lambda binder $$\lam{x}{t}$$, we give a whole list of patterns and branches $$\fun{\overrightarrow{\arm{p_i}{t_i}}}$$ to check at the type $$P \to N$$. As a result, we need a judgement $$\checkp{p_i:P}{\Delta_i}$$ which gives the types of the bindings $$\Delta_i$$ of the pattern $$p_i$$, and then we check each $$t_i$$ against the result type $$N$$.

The match statement $$\match{x\cdot s}{\overrightarrow{\arm{p_i}{t_i}}}$$ also has similar issues in its typing rule. First, it finds a variable in the context, applies some arguments to it to find a value result of type $$\upshift{Q}$$, and then pattern matches against type $$Q$$. So we check that the spine $$s$$ sends $$M$$ to the type $$\upshift{Q}$$, and then check that the patterns $$p_i$$ yield variables $$\Delta_i$$ at the type $$Q$$, and then check the $$t_i$$ against the type $$\upshift{P}$$.

Restricting the type at which we can match forces us to eta-expand terms of function type. Also, these rules omit a side-condition for pattern coverage. (I have an old blog post about how to do that if you are curious.)

Both lambda-abstraction and application/pattern-matching need the judgement $$\checkp{p:P}{\Delta}$$ to find the types of the bindings. The rules for these are straightforward: $\begin{matrix} \rule{ } { \checkp{\thunk{x} {:} \downshift{\!N}}{x:N} } & \rule{ } { \checkp{\unit : 1}{\cdot} } \\[1em] \rule{ \checkp{p_1 : P_1}{\Delta_1} & \checkp{p_2 : P_2}{\Delta_2} } { \checkp{(p_1,p_2) : P_1 \times P_2}{\Delta_1, \Delta_2} } & \rule{ \checkp{p:P_i}{\Delta} & i \in \{1,2\} } { \checkp{\inj{i}{p} : P_1 + P_2}{\Delta} } \end{matrix}$

Units yield no variables at type $$1$$, pair patterns $$(p_1, p_2)$$ return the variables of each component, coproduct injections $$\inj{i}{p}$$ return the variables of the sub-pattern $$p$$, and thunk patterns $$\thunk{x}$$ at type $$\downshift{N}$$ return that variable $$x$$ at type $$N$$.

At this point, it sure looks like we have a perfect bidirectional type system for a polarized calculus. What's the problem? The problem is that I palmed a card! Here's the relevant bit of the grammar I kicked things off with: $\begin{matrix} \ldots \\ \mbox{Contexts} & \Gamma,\Delta & ::= & \cdot \bnfalt \Gamma, \color{red}{x:N} \\ \ldots \end{matrix}$

The context $$\Gamma$$ has been restricted to only contain variables of negative type. It doesn't allow variables of positive type! And, I don't know how to add it in the "right" way. If we wanted positive variables (in fact, call-by-push-value only has positive variables), we could add them in the following way:

$\begin{matrix} \mbox{Values} & v & ::= & () \bnfalt (v,v) \bnfalt \inj{i}{v} \bnfalt \thunk{t} \bnfalt \color{red}{u} \\ \mbox{Contexts} & \Gamma,\Delta & ::= & \cdot \bnfalt \Gamma, x:N \bnfalt \Gamma, \color{red}{u:P} \\ \mbox{Patterns} & p & ::= & () \bnfalt (p,p') \bnfalt \inj{i}{p} \bnfalt \thunk{x} \bnfalt \color{red}{u} \\[1em] \end{matrix}$ So we add value variables $$u$$ to the syntax of values, and so we have to also add them to contexts, and also extend pattern matching with them to bind values. Then, the rules for these things would look like the following: $\begin{matrix} \rule{ } {\checkp{u:P}{u:P}} & \rule{ u:Q \in \Gamma & Q = P } { \check{\Gamma}{u}{P} } \end{matrix}$

So a variable pattern at value type simply binds the variable at that type, and when we use a value variable we have the check that the type in the context matches the type that we're checking the term at.

And that's the wrong thing to do! The bidirectional recipe says that we should check equality of types only when we switch between checking and synthesis, and so while this rule might or might not work, it's clearly not arranged the information flow properly, since we have a random-looking subsumption test in the value variable rule.

Some additional idea is needed, and I'm not sure what, yet.

1. This seems like a cut-off sentence (or paragraph): "When we apply a term"

2. Thanks, fixed:

"When we apply a term"

-->

"When we apply a function, we first *infer* a type $A \to B$ for the function expression, which gives us a type $A$ to *check* the argument against."

3. "The bidirectional recipe says that we should check equality of types only when we switch between checking and synthesis."

Can you explain why that should be such a stringent requirement?

From your post the purpose of a bidirectional type system is to only type beta-short eta-long normal forms, and I would guess that introducing positive variables does not break that.

Another feature of bidirectional typing rules is that they remove choice points in typing rules without needing annotated terms: like the domain type in the application rule. Introducing "u" does not seem to break that either.

4. Another, possibly Immortal (not sure yet), variant of bidirectional typechecking is where at negative types, you NEVER infer. Neutrals of negative type appear only in checking mode...

I haven't fully worked out the details of this approach, but it seems to give a more natural account of subtyping which automatically justifies eta expansion and contraction. I'll let you know when I have more.

I think what we're seeing is that there's something closely connected to polarity in bidirectional type checking, but we shouldn't be adopt dogmatism about what bidirectional type checking means. In my opinion, it should be *determined* by the logical structure which we observe from proof theoretic concerns, rather than being some immovable thing which we struggle to make fit with other notions. Dogmatism about these things, however useful in the short term, is a funnel into weakness.

- Jon Sterling

1. How would the application rule look like then?

5. Andrea: Part of the point is that I can't give a good explanation!

1. Bidirectional typechecking is not a single thing -- it is a set of heuristics about how to design type systems, for which people have varying intuitions.
2. There seems to be a close connection between polarization and bidirectional typechecking.
3. However, the connection is not tight enough for me to feel confident in saying "bidirectional typechecking is polarization" or anything even remotely close to that.
4. For example, in a polarized type theory, you carefully restrict things to only permit negative variables. However, the syntax of call-by-push-value, which has exactly the same type structure as polarization, carefully restricts things so you have only positive variables.
5. So the semantics of types isn't enough to tell us what the syntax of the bidirectional inference rules should be. This is annoying, but in retrospect it shouldn't be a huge surprise, since polarization
can be seen as either a refinement of Moggi's monadic metalanguage or the exponential of linear logic, and neither of those are canonical (ie, you can have two different, non-isomorphic monads or bangs in your language).
6. As a result, we have to try and figure things out from purely proof-theoretic considerations, which is always a harder row to hoe than when you can use both proof theory and semantics to constrain the search space.
7. This means that we have to find the "right" set of metatheorems we want to hold for a bidirectional type system to assess candidate type systems.
8. We don't have any consensus for that set of metatheorems.
9. As a result, bidirectional typechecking is not a single thing -- it is a set of heuristics about how to design type systems, for which people have varying intuitions.

As a result, I strongly encourage you to design type systems that I don't like. If you can prove theorems that I do like about them, then perhaps I will change my mind. :)

1. I see. I guess even if the rule for positive variables does not break relevant metatheorems it would already depart from polarized type systems.

By the way, to clarify things to myself, for polarization to be a semantic notion you need semantics that distinguish between computations and values, right?

6. @Jon: immortal?

1. What I mean by that word is something which will stand the test of time, not in the sense of being used permanently, but in the sense of being distinguished as a "nodal point" in the design space, a moment of local harmony between theory and practice.

7. "So a variable pattern at value type simply binds the variable at that type, and when we use a value variable we have the check that the type in the context matches the type that we're checking the term at.

"And that's the wrong thing to do! The bidirectional recipe says that we should check equality of types only when we switch between checking and synthesis..."

Matt Oliveri: But wait, let me see if I understand.

Variables' types are always synthesized, since they're looked up in the context. So you're switching between checking and synthesis in that rule. Are you saying there should've been a separate rule just for the switch, as usual? But isn't that actually the only rule with a direction switch? It lets you avoid eta expanding at positive types. (You can pass along a value without matching against it.)

8. It should be
P,Q ::= ..
P ::= ..
right?

Also excellent post.

1. Yes, that's right.

9. Correct me if I'm wrong, but I think the fact that CBPV only has positive vars and this polarized type theory has only negative vars is not a semantic difference, but only a difference in how much focussing discipline the two calculi enforce.

I.e. it looks like we can translate a judgment x:N |> t <= N' in this polarized calculus to a judgment x:UN |- t : N' in CBPV. We never have positive variables in the fully focussed polarized world because when we have a function type P -> N we cannot simply pop the argument off of the stack into the current environment, we have to immediately deeply pattern match on it, whereas in CBPV we have the rule

G, x : P |- M : N
-------------------------
G |- lambda x. M : P -> N

Need to look in more detail to see how this would affect the bidirectional stuff, though.

10. Hi Max, as far as I can tell, it's not a semantic difference. However, in order to make progress on sorting out what bidirectional typechecking actually is, and how it relates to polarization and call-by-push-value, we need to make it into a semantic difference!

Eg, it's folklore that for MLL, you can invert the checking and synthesis rules (so lambdas synthesise and applications check). This fact will remain merely a curiosity until we have a semantics for bidirectional typechecking.

Luckily, Google tells me that Andreas Abel and Christian Sattler have just released a draft, Normalization by Evaluation for Call-by-Push-Value and Polarized Lambda-Calculus, which seems likely to help!