## Tuesday, May 7, 2013

### Strong Normalization without Logical Relations

Proof theorists often observe that linear logic is like regular logic, only everything works better.

In this post, I'll give a very striking instance of this maxim, by taking second-order multiplicative-additive linear logic (MALL) and showing how to prove strong normalization for it, without using a logical relation. That is, there is a purely inductive argument that a powerful second-order logic normalizes! Girard originally proved this for classical linear logic, using proof nets (the "small normalization theorem", Corollary 4.22, page 71 of the original Linear Logic paper), but it turns out that a variant works for lambda-terms and intuitionistic linear logic as well. $$\newcommand{\lolli}{\multimap} \newcommand{\tensor}{\otimes} \newcommand{\with}{\&} \newcommand{\all}[2]{\forall {#1}.\;{#2}} \newcommand{\fix}[2]{\mu {#1}.\;{#2}} \newcommand{\letv}[3]{\mathsf{let}\,{#1} = {#2}\;\mathsf{in}\;{#3}} \newcommand{\Fun}[2]{\Lambda {#1}.\;{#2}} \newcommand{\fun}[2]{\lambda {#1}.\;{#2}} \newcommand{\unit}{\left(\right)} \newcommand{\letunit}[2]{\letv{\unit}{#1}{#2}} \newcommand{\pair}[2]{\left({#1},{#2}\right)} \newcommand{\letpair}[4]{\letv{\pair{#1}{#2}}{#3}{#4}} \newcommand{\unita}{\left[\right]} \newcommand{\paira}[2]{\left[{#1}, {#2}\right]} \newcommand{\fst}[1]{\mathsf{fst}\,{#1}} \newcommand{\snd}[1]{\mathsf{snd}\,{#1}} \newcommand{\inl}[1]{\mathsf{inl}\,{#1}} \newcommand{\inr}[1]{\mathsf{inr}\,{#1}} \newcommand{\case}[5]{\mathsf{case}({#1}, \inl{#2} \to {#3}, \inr{#4} \to {#5})} \newcommand{\abort}[1]{\mathsf{abort}\,{#1}} \newcommand{\fold}[1]{\mathsf{in}\,{#1}} \newcommand{\unfold}[1]{\mathsf{out}\,{#1}} \newcommand{\judge}[4]{{#1};{#2} \vdash {#3} : {#4}} \newcommand{\judgetp}[2]{{#1} \vdash {#2} : \mathsf{type}} \newcommand{\judgectx}[2]{{#1} \vdash {#2} : \mathsf{ctx}} %% Rule names \newcommand{\rulename}[3]{{#2}{\mathrm{#3}}_{\mathrm{#1}}} \newcommand{\intro}[2][]{\rulename{#1}{#2}{I}} \newcommand{\elim}[2][]{\rulename{#1}{#2}{E}} \newcommand{\Var}{\rulename{}{}{Var}} \newcommand{\AllI}{\intro{\forall}} \newcommand{\AllE}{\elim{\forall}} \newcommand{\LolliI}{\intro{\lolli}} \newcommand{\LolliE}{\elim{\lolli}} \newcommand{\UnitI}{\intro{1}} \newcommand{\UnitE}{\elim{1}} \newcommand{\TensorI}{\intro{\tensor}} \newcommand{\TensorE}{\elim{\tensor}} \newcommand{\TopI}{\intro{\top}} \newcommand{\TopE}{\elim{\top}} \newcommand{\WithI}{\intro{\with}} \newcommand{\WithEFst}{\elim[fst]{\with}} \newcommand{\WithESnd}{\elim[snd]{\with}} \newcommand{\ZeroI}{\intro{0}} \newcommand{\ZeroE}{\elim{0}} \newcommand{\SumIInl}{\intro[inl]{\oplus}} \newcommand{\SumIInr}{\intro[inr]{\oplus}} \newcommand{\SumE}{\elim{\oplus}} \newcommand{\MuI}{\intro{\mu}} \newcommand{\MuE}{\elim{\mu}} \newcommand{\size}[1]{\left|#1\right|} \newcommand{\inferrule}[3][]{\frac{#2}{#3}\;{#1}}$$

First, let's recall the type system for second order MALL. $$\begin{array}{llcl} \mbox{Types} & A & ::= & \all{\alpha}{A} \bnfalt \alpha \bnfalt A \lolli B \bnfalt 1 \bnfalt A \tensor B \bnfalt \top \bnfalt A \with B \bnfalt 0 \bnfalt A \oplus B \\[1em] \mbox{Type Contexts} & \Delta & ::= & \cdot \bnfalt \Delta, \alpha \\[1em] \mbox{Contexts} & \Gamma & ::= & \cdot \bnfalt \Gamma, x:A \\[1em] \mbox{Terms} & e & ::= & x \bnfalt \Fun{\alpha}{e} \bnfalt e\;A \bnfalt \fun{x:A}{e} \bnfalt e\;e \\ & & | & \unit \bnfalt \letunit{e}{e'} \bnfalt \pair{e}{e'} \bnfalt \letpair{x}{y}{e}{e'}\\ & & | & \unita \bnfalt \paira{e}{e'} \bnfalt \fst{e} \bnfalt \snd{e} \\ & & | & \inl{e} \bnfalt \inr{e} \bnfalt \case{e}{x}{e'}{x}{e''} \bnfalt \abort{e} \end{array}$$

The simple types are linear functions $A \lolli B$, tensor products $A \tensor B$, Cartesian products $A \with B$, and coproducts $A \oplus B$. We also have quantified types $\all{\alpha}{A}$ and variable references $\alpha$. We have two kinds of contexts. $\Delta$ is the context of type variables, and $\Gamma$ gives types to term variables. Below, we give the judgement form for the well-formedness of types, $\judgetp{\Delta}{A}$, which says that $A$'s free variables all lie in $\Delta$. $$\boxed{\judgetp{\Delta}{A}} \\ \inferrule[] {\alpha \in \Delta} {\judgetp{\Delta}{\alpha}} \\ \begin{array}{ll} \inferrule[] {\judgetp{\Delta, \alpha}{A}} {\judgetp{\Delta}{\all{\alpha}{A}}} & \inferrule[] {\judgetp{\Delta}{A} \\ \judgetp{\Delta}{B}} {\judgetp{\Delta}{A \lolli B}} \\[2em] \inferrule[] { } {\judgetp{\Delta}{1}} & \inferrule[] {\judgetp{\Delta}{A} \\ \judgetp{\Delta}{B}} {\judgetp{\Delta}{A \tensor B}} \\[2em] \inferrule[] { } {\judgetp{\Delta}{\top}} & \inferrule[] {\judgetp{\Delta}{A} \\ \judgetp{\Delta}{B}} {\judgetp{\Delta}{A \with B}} \\[2em] \inferrule[] { } {\judgetp{\Delta}{0}} & \inferrule[] {\judgetp{\Delta}{A} \\ \judgetp{\Delta}{B}} {\judgetp{\Delta}{A \oplus B}} \end{array}$$

Once we know how to assert that types are well-formed, we can lift this to a well-formedness judgement for contexts $\judgectx{\Delta}{\Gamma}$. $$\boxed{\judgectx{\Delta}{\Gamma}} \\ \inferrule[] { } {\judgectx{\Delta}{\cdot}} \qquad \inferrule[] {\judgectx{\Delta}{\Gamma} \\ \judgetp{\Delta}{A}} {\judgectx{\Delta}{\Gamma, x:A}}$$

Now, we can give the typing rules for the terms of second-order MALL. The typing judgement $\judge{\Delta}{\Gamma}{e}{A}$ says that assuming that the free type variables are in $\Delta$, andall the term variables have types given by $\Gamma$, then the term $e$ has the type $A$.

Variable references $x$ are just looked up in the environment by the rule $\Var$. Note that the context contains only a single binding for the variable $x$. This is demanded by linearity, since if there were other variables in the context, the expression $x$ could not use them!

Type abstractions are introduced with the $\AllI$ rule, and simply typecheck the body of $\Fun{\alpha}{e}$ in a context extended by $\alpha$. A type application checked by $\AllE$ simply substitutes a type for a quantified variable. The linear functions have the usual introduction rule in $\LolliI$, and the application rule $\LolliE$ checks that the function has type $A \lolli B$ and the argument has type $A$. Note that the elimination rule splits the context into two parts, with one set of variables being used by the function and the other by the argument --- this enforces the linear use of variables.

We introduce units with the $\UnitI$ rule --- it requires no resources to create a unit, since it has no free variables in it. However, eliminating a unit in $\UnitE$ does divide the unit, since we may produce a unit-typed value from a term like $f x : 1$, where $f : P \lolli 1$ and $x : P$. Tensor products are introduced with the $\TensorI$ rule, which splits the variables in the context between the two sides of the pair $\pair{e}{e'}$. Likewise, the elimination form $\TensorE$ uses the variables in $\Gamma$ to produce a term of type $A \tensor B$, and then binds the variables $x:A$ and $y:B$ to typecheck the continuation expression $e'$.

We can contrast tensor products $A \tensor B$ with the Cartesian product $A \with B$. The introduction rule $\WithI$ reuses the same context $\Gamma$ when typechecking the two branches of $\paira{e}{e'}$. As a result, there are two projective eliminations $\WithEFst$ and $\WithESnd$, each of which consumes a pair $A \with B$ to produce either an $A$, or a $B$. Intuitively, a Cartesian pair is lazy, and the projections $\fst{e}$ and $\snd{e}$ let the programmer choose which of the two possibilities they want. In contrast, the tensor product $A \tensor B$ is strict, and so you get access to both components (at the price of not allowing the two sides to share variables).

Finally, we also have the coproduct $A \oplus B$. This works much as it does in the intuitionistic case: $\SumIInl$ takes an $A$ and gives an $A \oplus B$, and $\SumIInr$ does the same with a $B$. The case statement takes an expression $e$ of type $A \oplus B$, and then branches to $e'$ or $e''$ depending on whether or not it is the left or right branch. Note that $e$ has a disjoint set of variables $\Gamma$ from the case branches, but that $e'$ and $e''$ share the same set of variables $\Gamma''$ since only one of them will run. $$\boxed{\judge{\Delta}{\Gamma}{e}{A}} \\ \inferrule[\Var] { } {\judge{\Delta}{x:A}{x}{A}} \\ \begin{array}{cc} \inferrule[\AllI] {\judge{\Delta, \alpha}{\Gamma}{e}{A}} {\judge{\Delta}{\Gamma}{\Fun{\alpha}{e}}{\all{\alpha}{A}}} & \inferrule[\AllE] {\judge{\Delta}{\Gamma}{e}{\all{\alpha}{B}} \\ \judgetp{\Delta}{A}} {\judge{\Delta}{\Gamma}{e\;A}{[A/\alpha]B}} \\[2em] \inferrule[\LolliI] {\judge{\Delta}{\Gamma, x:A}{e}{B}} {\judge{\Delta}{\Gamma}{\fun{x:A}{e}}{A \to B}} & \inferrule[\LolliE] {\judge{\Delta}{\Gamma}{e}{A \to B} \\ \judge{\Delta}{\Gamma'}{e'}{A}} {\judge{\Delta}{\Gamma,\Gamma'}{e\;e'}{B}} \\[2em] \inferrule[\UnitI] { } {\judge{\Delta}{\cdot}{\unit}{1}} & \inferrule[\UnitE] {\judge{\Delta}{\Gamma}{e}{1} \\ \judge{\Delta}{\Gamma'}{e'}{C}} {\judge{\Delta}{\Gamma,\Gamma'}{\letunit{e}{e'}}{C}} \\[2em] \inferrule[\TensorI] {\judge{\Delta}{\Gamma}{e}{A} \\ \judge{\Delta}{\Gamma'}{e'}{B} } {\judge{\Delta}{\Gamma,\Gamma'}{\pair{e}{e'}}{A \tensor B}} & \inferrule[\TensorE] {\judge{\Delta}{\Gamma}{e}{A \tensor B} \\\\ \judge{\Delta}{\Gamma', x:A, y:B}{e'}{C}} {\judge{\Delta}{\Gamma,\Gamma'}{\letpair{x}{y}{e}{e'}}{C}} \\[2em] \inferrule[\TopI] { } {\judge{\Delta}{\Gamma}{\unita}{\top}} & \\[1em] \inferrule[\WithI] {\judge{\Delta}{\Gamma}{e}{A} \\ \judge{\Delta}{\Gamma}{e'}{B} } {\judge{\Delta}{\Gamma}{\paira{e}{e'}}{A \with B}} & \begin{array}{l} \inferrule[\WithEFst] {\judge{\Delta}{\Gamma}{e}{A \with B}} {\judge{\Delta}{\Gamma}{\fst{e}}{A}} \\[1em] \inferrule[\WithESnd] {\judge{\Delta}{\Gamma}{e}{A \with B}} {\judge{\Delta}{\Gamma}{\snd{e}}{B}} \end{array} \\[3em] \begin{array}{l} \inferrule[\SumIInl] {\judge{\Delta}{\Gamma}{e}{A }} {\judge{\Delta}{\Gamma}{\inl{e}}{A \oplus B}} \\[1em] \inferrule[\SumIInr] {\judge{\Delta}{\Gamma}{e}{ B}} {\judge{\Delta}{\Gamma}{\inr{e}}{A \oplus B}} \end{array} & \begin{array}{l} \inferrule[\SumE] {\judge{\Delta}{\Gamma}{e}{A \oplus B} \\\\ \judge{\Delta}{\Gamma', x:A}{e'}{C} \\\\ \judge{\Delta}{\Gamma', y:B}{e''}{C} } {\judge{\Delta}{\Gamma, \Gamma'}{\case{e}{x}{e'}{y}{e''}}{C}} \end{array} \\[3em] & \inferrule[\ZeroE] {\judge{\Delta}{\Gamma}{e}{0}} {\judge{\Delta}{\Gamma}{\abort{e}}{C}} \end{array}$$

Now, we can define a size function $\size{e}$ on lambda-terms. $$\begin{array}{lcl} \size{x} & = & 0 \\ \size{\all{\alpha}{e}} & = & \size{e} \\ \size{e\;A} & = & 1 + \size{e} \\ \size{\fun{x:A}{e}} & = & \size{e} \\ \size{e\;e'} & = & 1 + \size{e} + \size{e'} \\ \size{\unit} & = & 1 \\ \size{\letv{\unit}{e}{e'}} & = & \size{e} + \size{e'} \\ \size{\pair{e}{e'}} & = & 1 + \size{e} + \size{e'} \\ \size{\letv{\pair{x}{y}}{e}{e'}} & = & \size{e} + \size{e'} \\ \size{\unita} & = & 0\\ \size{\paira{e}{e'}} & = & \max(\size{e}, \size{e'}) \\ \size{\fst{e}} & = & 1 + \size{e} \\ \size{\snd{e}} & = & 1 + \size{e} \\ \size{\inl{e}} & = & 1 + \size{e} \\ \size{\inr{e}} & = & 1 + \size{e} \\ \size{\case{e}{x}{e'}{y}{e''}} & = & \size{e} + \max(\size{e'}, \size{e''}) \\ \end{array}$$

This function basically just counts the size of the term, adding an extra one for either the introduction or elimination for each connective. (I followed the pattern of adding 1 to the introduction rule for positive connectives, and 1 to the elimination for negative connectives, but really it doesn't seem to matter.)

Now, we can prove a lemma showing how sizes interact with substitution. Namely, linearity means that the size can never get larger when you substitute a term for a variable.

Size-respecting substitution. Suppose $\judgectx{\Delta}{\Gamma}$ and $\judge{\Delta}{\Gamma}{e}{A}$ and $\judge{\Delta}{\Gamma', x:A}{e'}{C}$. Then $\size{[e/x]e'} \leq \size{e} + \size{e'}$.
Proof This follows by a structural induction on the derivation of $e'$. Rather than giving the full proof, I'll just make the observations needed to prove this.

In the $\WithI$ and $\SumE$ cases, we need the following fact: $$\max(a + x, b + x) = \max(a,b) + x$$ This is because $e$ can be substituted into multiple branches in these two rules.

In other cases, such as $\TensorI$, we have two subterms of $e'$ (so $e' = \pair{e'_1}{e'_2}$). Then typing indicates that $x$ can occur in only one of the branches. As a result, $[e/x]\pair{e'_1}{e'_2}$ will equal either $\pair{[e/x]e'_1}{e'_2}$ or $\pair{e'_1}{[e/x]e'_2}$, and then the size will be less than or equal to $1 + \size{e} + \size{e'_1} + \size{e'_2} = \size{e} + \size{e'}$.

The less than or equal to arises because of the presence of the unit $\top$, which ignores variables in its proof term $\unita$. In its absence substitution would preserve size exactly.

We also need to prove the same thing for type substitution.

Size-respecting type substitution. Suppose $\judgectx{\Delta}{\Gamma}$ and $\judge{\Delta, \alpha}{\Gamma}{e}{C}$ and $\judgetp{\Delta}{A}$. Then $\size{[A/\alpha]e} = \size{e}$.
Proof This is a routine induction, with no surprises, since types never affect the size of a term.
Next, let's write down the beta reductions: $$\begin{array}{lcl} (\Fun{\alpha}{e})\;A & \mapsto & [A/\alpha]e \\ (\fun{x:A}{e})\;e' & \mapsto & [e'/x]e \\ \letv{\unit}{\unit}{e'} & \mapsto & e' \\ \letv{\pair{x}{y}}{\pair{e_1}{e_2}}{e'} & \mapsto & [e_1/x, e_2/y]e' \\ \fst{\paira{e}{e'}} & \mapsto & e \\ \snd{\paira{e}{e'}} & \mapsto & e' \\ \case{\inl{e}}{x}{e'}{y}{e''} & \mapsto & [e/x]e' \\ \case{\inr{e}}{x}{e'}{y}{e''} & \mapsto & [e/y]e'' \\ \end{array}$$

Now, we can verify that $\beta$-reduction reduces size:

Shrinking. If $\judgectx{\Delta}{\Gamma}$ and $\judge{\Delta}{\Gamma}{e}{A}$ and $e\;\mapsto e'$, then $\size{e'} < \size{e}$.
Proof The proof of this is essentially nothing but the observation that every $\beta$-reduction removes an introduction, removes an elimination, and performs a subsitution, and so the size of the result term must get strictly smaller.

As a result, we know that all reduction strategies lead to a normal form. Since $\beta$-reduction makes a term strictly smaller, the process must eventually stop, no matter what order you perform the reductions in. So we (almost) have strong normalization. The missing piece is that we also need the Church-Rosser property, to ensure that all sequences of reductions eventually reach the same normal form. But this post is getting quite long enough, so I won't prove that -- I'll just assert it. :)

I will, however, offer one last reward for those of you who persevered to the end. Let's add unrestricted recursive types to our language: $$\inferrule[\MuI] {\judge{\Delta}{\Gamma}{e}{[\fix{\alpha}{A}/\alpha]A}} {\judge{\Delta}{\Gamma}{\fold{e}}{\fix{\alpha}{A}}} \qquad \inferrule[\MuE] {\judge{\Delta}{\Gamma}{e}{\fix{\alpha}{A}}} {\judge{\Delta}{\Gamma}{\unfold{e}}{[\fix{\alpha}{A}/\alpha]A}}$$

The $\beta$-rule is the obvious one you might expect: $$\begin{array}{lcl} \unfold{(\fold{e})} & \mapsto & e \end{array}$$

And let's extend the size function, too: $$\begin{array}{lcl} \size{\fold{e}} & = & 1 + \size{e} \\ \size{\unfold{e}} & = & \size{e} \end{array}$$

Now, note that all of the lemmas we used continue to hold. So we can add unrestricted recursive types, with no restrictions on negative occurences of type variables, to second-order MALL and the language is still strongly normalizing!

Another way of putting things is that contraction (i.e., using variables more than once) is essential to programming things like the Y combinator. Without it, recursive types are not enough to enable dangerous self-references like Russell's paradox.

Acknowledgements. I'd like to thank Stéphane Gimenez for directing me to Girard's small normalization theorem.

1. Great post! Is it really linearity that makes this possible? How close can one get to adding delta : a -> a (X) a and e : a -> I before the substitution lemma breaks down? e seems safe -- maybe there's a modified delta that still allows for a nice proof.

2. Another way of putting things is that contraction (i.e., using variables more than once) is essential to programming things like the Y combinator. Without it, recursive types are not enough to enable dangerous self-references like Russell's paradox.

This is a really interesting punchline. If you decided to put in reference types, would linearity rule out things like Landin's knot?

3. Lindsey: yes, it would, but for a boring reason -- there's really no useful way to distinguish a linear reference to a value from the value itself! If you have the only reference to the value, then that's basically the same as being the only person with the value itself.

In stuff like the work on the L3 language, you can see that they make references interesting by distinguishing between pointers (which can be duplicated) and the capability to dereference a pointer (which is linear). However, even adding this much duplication to the system means that recursive types permit them to code up Landin's knot. Their system does, however, have the really nice property that higher-order store does not permit Landin's knot unless you add recursive types.

Darius: Yes, you can add weakening without breaking anything -- affine typing is fine.

It's possible to support limited forms of contraction, but it's extremely delicate. Most of the work on this is in the area called "implicit complexity theory", and even very limited forms of contraction very, very rapidly increases the amount of computational power you have available. See Martin Hoffmann's Linear types and non-size-increasing polynomial time computation for a good introduction to this, especially the section on "restricted duplication".