Proof theorists often observe that linear logic is like regular logic, only everything works better.
In this post, I'll give a very striking instance of this maxim, by taking second-order multiplicative-additive linear logic (MALL) and showing how to prove strong normalization for it, without using a logical relation. That is, there is a purely inductive argument that a powerful second-order logic normalizes! Girard originally proved this for classical linear logic, using proof nets (the "small normalization theorem", Corollary 4.22, page 71 of the original Linear Logic paper), but it turns out that a variant works for lambda-terms and intuitionistic linear logic as well. $$ \newcommand{\bnfalt}{\;\mid\;} \newcommand{\lolli}{\multimap} \newcommand{\tensor}{\otimes} \newcommand{\with}{\&} \newcommand{\all}[2]{\forall {#1}.\;{#2}} \newcommand{\fix}[2]{\mu {#1}.\;{#2}} \newcommand{\letv}[3]{\mathsf{let}\,{#1} = {#2}\;\mathsf{in}\;{#3}} \newcommand{\Fun}[2]{\Lambda {#1}.\;{#2}} \newcommand{\fun}[2]{\lambda {#1}.\;{#2}} \newcommand{\unit}{\left(\right)} \newcommand{\letunit}[2]{\letv{\unit}{#1}{#2}} \newcommand{\pair}[2]{\left({#1},{#2}\right)} \newcommand{\letpair}[4]{\letv{\pair{#1}{#2}}{#3}{#4}} \newcommand{\unita}{\left[\right]} \newcommand{\paira}[2]{\left[{#1}, {#2}\right]} \newcommand{\fst}[1]{\mathsf{fst}\,{#1}} \newcommand{\snd}[1]{\mathsf{snd}\,{#1}} \newcommand{\inl}[1]{\mathsf{inl}\,{#1}} \newcommand{\inr}[1]{\mathsf{inr}\,{#1}} \newcommand{\case}[5]{\mathsf{case}({#1}, \inl{#2} \to {#3}, \inr{#4} \to {#5})} \newcommand{\abort}[1]{\mathsf{abort}\,{#1}} \newcommand{\fold}[1]{\mathsf{in}\,{#1}} \newcommand{\unfold}[1]{\mathsf{out}\,{#1}} \newcommand{\judge}[4]{{#1};{#2} \vdash {#3} : {#4}} \newcommand{\judgetp}[2]{{#1} \vdash {#2} : \mathsf{type}} \newcommand{\judgectx}[2]{{#1} \vdash {#2} : \mathsf{ctx}} \newcommand{\rulename}[3]{{#2}{\mathrm{#3}}_{\mathrm{#1}}} \newcommand{\intro}[2][]{\rulename{#1}{#2}{I}} \newcommand{\elim}[2][]{\rulename{#1}{#2}{E}} \newcommand{\Var}{\rulename{}{}{Var}} \newcommand{\AllI}{\intro{\forall}} \newcommand{\AllE}{\elim{\forall}} \newcommand{\LolliI}{\intro{\lolli}} \newcommand{\LolliE}{\elim{\lolli}} \newcommand{\UnitI}{\intro{1}} \newcommand{\UnitE}{\elim{1}} \newcommand{\TensorI}{\intro{\tensor}} \newcommand{\TensorE}{\elim{\tensor}} \newcommand{\TopI}{\intro{\top}} \newcommand{\TopE}{\elim{\top}} \newcommand{\WithI}{\intro{\with}} \newcommand{\WithEFst}{\elim[fst]{\with}} \newcommand{\WithESnd}{\elim[snd]{\with}} \newcommand{\ZeroI}{\intro{0}} \newcommand{\ZeroE}{\elim{0}} \newcommand{\SumIInl}{\intro[inl]{\oplus}} \newcommand{\SumIInr}{\intro[inr]{\oplus}} \newcommand{\SumE}{\elim{\oplus}} \newcommand{\MuI}{\intro{\mu}} \newcommand{\MuE}{\elim{\mu}} \newcommand{\size}[1]{\left|#1\right|} \newcommand{\inferrule}[3][]{\frac{#2}{#3}\;{#1}} $$
First, let's recall the type system for second order MALL. $$ \begin{array}{llcl} \mbox{Types} & A & ::= & \all{\alpha}{A} \bnfalt \alpha \bnfalt A \lolli B \bnfalt 1 \bnfalt A \tensor B \bnfalt \top \bnfalt A \with B \bnfalt 0 \bnfalt A \oplus B \\[1em] \mbox{Type Contexts} & \Delta & ::= & \cdot \bnfalt \Delta, \alpha \\[1em] \mbox{Contexts} & \Gamma & ::= & \cdot \bnfalt \Gamma, x:A \\[1em] \mbox{Terms} & e & ::= & x \bnfalt \Fun{\alpha}{e} \bnfalt e\;A \bnfalt \fun{x:A}{e} \bnfalt e\;e \\ & & | & \unit \bnfalt \letunit{e}{e'} \bnfalt \pair{e}{e'} \bnfalt \letpair{x}{y}{e}{e'}\\ & & | & \unita \bnfalt \paira{e}{e'} \bnfalt \fst{e} \bnfalt \snd{e} \\ & & | & \inl{e} \bnfalt \inr{e} \bnfalt \case{e}{x}{e'}{x}{e''} \bnfalt \abort{e} \end{array} $$
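To make the grammar concrete, here is one way it might be rendered as datatypes. This is a sketch of my own, not part of the original development; the constructor names, and the use of strings for type and term variables, are arbitrary choices:

```ocaml
(* Types of second-order MALL, one constructor per production. *)
type ty =
  | TVar of string            (* α *)
  | All of string * ty        (* ∀α. A *)
  | Lolli of ty * ty          (* A ⊸ B *)
  | One                       (* 1 *)
  | Tensor of ty * ty         (* A ⊗ B *)
  | Top                       (* ⊤ *)
  | With of ty * ty           (* A & B *)
  | Zero                      (* 0 *)
  | Sum of ty * ty            (* A ⊕ B *)

(* Terms, likewise following the grammar. *)
type term =
  | Var of string
  | TyLam of string * term                     (* Λα. e *)
  | TyApp of term * ty                         (* e A *)
  | Lam of string * ty * term                  (* λx:A. e *)
  | App of term * term                         (* e e' *)
  | Unit                                       (* () *)
  | LetUnit of term * term                     (* let () = e in e' *)
  | Pair of term * term                        (* (e, e') *)
  | LetPair of string * string * term * term   (* let (x,y) = e in e' *)
  | UnitA                                      (* [] *)
  | PairA of term * term                       (* [e, e'] *)
  | Fst of term
  | Snd of term
  | Inl of term
  | Inr of term
  | Case of term * string * term * string * term
  | Abort of term
```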
The simple types are linear functions $A \lolli B$, tensor products $A \tensor B$ with unit $1$, Cartesian products $A \with B$ with unit $\top$, and coproducts $A \oplus B$ with unit $0$. We also have quantified types $\all{\alpha}{A}$ and type variables $\alpha$. We have two kinds of contexts: $\Delta$ is the context of type variables, and $\Gamma$ gives types to term variables. Below, we give the judgement form for the well-formedness of types, $\judgetp{\Delta}{A}$, which says that $A$'s free type variables all lie in $\Delta$. $$ \boxed{\judgetp{\Delta}{A}} \\ \inferrule[] {\alpha \in \Delta} {\judgetp{\Delta}{\alpha}} \\ \begin{array}{ll} \inferrule[] {\judgetp{\Delta, \alpha}{A}} {\judgetp{\Delta}{\all{\alpha}{A}}} & \inferrule[] {\judgetp{\Delta}{A} \\ \judgetp{\Delta}{B}} {\judgetp{\Delta}{A \lolli B}} \\[2em] \inferrule[] { } {\judgetp{\Delta}{1}} & \inferrule[] {\judgetp{\Delta}{A} \\ \judgetp{\Delta}{B}} {\judgetp{\Delta}{A \tensor B}} \\[2em] \inferrule[] { } {\judgetp{\Delta}{\top}} & \inferrule[] {\judgetp{\Delta}{A} \\ \judgetp{\Delta}{B}} {\judgetp{\Delta}{A \with B}} \\[2em] \inferrule[] { } {\judgetp{\Delta}{0}} & \inferrule[] {\judgetp{\Delta}{A} \\ \judgetp{\Delta}{B}} {\judgetp{\Delta}{A \oplus B}} \end{array} $$
Once we know how to assert that types are well-formed, we can lift this to a well-formedness judgement for contexts $\judgectx{\Delta}{\Gamma}$. $$ \boxed{\judgectx{\Delta}{\Gamma}} \\ \inferrule[] { } {\judgectx{\Delta}{\cdot}} \qquad \inferrule[] {\judgectx{\Delta}{\Gamma} \\ \judgetp{\Delta}{A}} {\judgectx{\Delta}{\Gamma, x:A}} $$
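These two judgements read off directly as a checker. Again a sketch of mine, reusing the ty type from above: $\Delta$ becomes a list of type-variable names, and $\Gamma$ a list of (variable, type) bindings:

```ocaml
(* Δ ⊢ A : type — every free type variable of A appears in delta. *)
let rec wf_ty (delta : string list) : ty -> bool = function
  | TVar a -> List.mem a delta
  | All (a, body) -> wf_ty (a :: delta) body
  | Lolli (a, b) | Tensor (a, b) | With (a, b) | Sum (a, b) ->
      wf_ty delta a && wf_ty delta b
  | One | Top | Zero -> true

(* Δ ⊢ Γ : ctx — every type in the context is well-formed. *)
let wf_ctx (delta : string list) (gamma : (string * ty) list) : bool =
  List.for_all (fun (_, a) -> wf_ty delta a) gamma
```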
Now, we can give the typing rules for the terms of second-order MALL. The typing judgement $\judge{\Delta}{\Gamma}{e}{A}$ says that, assuming the free type variables are in $\Delta$ and all the term variables have the types given by $\Gamma$, the term $e$ has the type $A$.
Variable references $x$ are just looked up in the environment by the rule $\Var$. Note that the context contains only a single binding for the variable $x$. This is demanded by linearity, since if there were other variables in the context, the expression $x$ could not use them!
Type abstractions are introduced with the $\AllI$ rule, and simply typecheck the body of $\Fun{\alpha}{e}$ in a context extended by $\alpha$. A type application checked by $\AllE$ simply substitutes a type for a quantified variable. The linear functions have the usual introduction rule in $\LolliI$, and the application rule $\LolliE$ checks that the function has type $A \lolli B$ and the argument has type $A$. Note that the elimination rule splits the context into two parts, with one set of variables being used by the function and the other by the argument --- this enforces the linear use of variables.
We introduce units with the $\UnitI$ rule --- it requires no resources to create a unit, since it has no free variables in it. However, eliminating a unit in $\UnitE$ does split the context, since we may produce a unit-typed value from a term like $f\;x : 1$, where $f : P \lolli 1$ and $x : P$. Tensor products are introduced with the $\TensorI$ rule, which splits the variables in the context between the two sides of the pair $\pair{e}{e'}$. Likewise, the elimination form $\TensorE$ uses the variables in $\Gamma$ to produce a term of type $A \tensor B$, and then binds the variables $x:A$ and $y:B$ to typecheck the continuation expression $e'$.
We can contrast tensor products $A \tensor B$ with the Cartesian product $A \with B$. The introduction rule $\WithI$ reuses the same context $\Gamma$ when typechecking the two branches of $\paira{e}{e'}$. Dually, there are two projection eliminations $\WithEFst$ and $\WithESnd$, each of which consumes a pair of type $A \with B$ to produce either an $A$ or a $B$. Intuitively, a Cartesian pair is lazy, and the projections $\fst{e}$ and $\snd{e}$ let the programmer choose which of the two possibilities they want. In contrast, the tensor product $A \tensor B$ is strict, and so you get access to both components (at the price of not allowing the two sides to share variables).
Finally, we also have the coproduct $A \oplus B$. This works much as it does in the intuitionistic case: $\SumIInl$ takes an $A$ and gives an $A \oplus B$, and $\SumIInr$ does the same with a $B$. The case statement takes an expression $e$ of type $A \oplus B$, and then branches to $e'$ or $e''$ depending on whether it was built with the left or the right injection. Note that $e$ uses a set of variables $\Gamma$ disjoint from the case branches, but that $e'$ and $e''$ share the same set of variables $\Gamma'$, since only one of them will run. $$ \boxed{\judge{\Delta}{\Gamma}{e}{A}} \\ \inferrule[\Var] { } {\judge{\Delta}{x:A}{x}{A}} \\ \begin{array}{cc} \inferrule[\AllI] {\judge{\Delta, \alpha}{\Gamma}{e}{A}} {\judge{\Delta}{\Gamma}{\Fun{\alpha}{e}}{\all{\alpha}{A}}} & \inferrule[\AllE] {\judge{\Delta}{\Gamma}{e}{\all{\alpha}{B}} \\ \judgetp{\Delta}{A}} {\judge{\Delta}{\Gamma}{e\;A}{[A/\alpha]B}} \\[2em] \inferrule[\LolliI] {\judge{\Delta}{\Gamma, x:A}{e}{B}} {\judge{\Delta}{\Gamma}{\fun{x:A}{e}}{A \lolli B}} & \inferrule[\LolliE] {\judge{\Delta}{\Gamma}{e}{A \lolli B} \\ \judge{\Delta}{\Gamma'}{e'}{A}} {\judge{\Delta}{\Gamma,\Gamma'}{e\;e'}{B}} \\[2em] \inferrule[\UnitI] { } {\judge{\Delta}{\cdot}{\unit}{1}} & \inferrule[\UnitE] {\judge{\Delta}{\Gamma}{e}{1} \\ \judge{\Delta}{\Gamma'}{e'}{C}} {\judge{\Delta}{\Gamma,\Gamma'}{\letunit{e}{e'}}{C}} \\[2em] \inferrule[\TensorI] {\judge{\Delta}{\Gamma}{e}{A} \\ \judge{\Delta}{\Gamma'}{e'}{B} } {\judge{\Delta}{\Gamma,\Gamma'}{\pair{e}{e'}}{A \tensor B}} & \inferrule[\TensorE] {\judge{\Delta}{\Gamma}{e}{A \tensor B} \\\\ \judge{\Delta}{\Gamma', x:A, y:B}{e'}{C}} {\judge{\Delta}{\Gamma,\Gamma'}{\letpair{x}{y}{e}{e'}}{C}} \\[2em] \inferrule[\TopI] { } {\judge{\Delta}{\Gamma}{\unita}{\top}} & \\[1em] \inferrule[\WithI] {\judge{\Delta}{\Gamma}{e}{A} \\ \judge{\Delta}{\Gamma}{e'}{B} } {\judge{\Delta}{\Gamma}{\paira{e}{e'}}{A \with B}} & \begin{array}{l} \inferrule[\WithEFst] {\judge{\Delta}{\Gamma}{e}{A \with B}} {\judge{\Delta}{\Gamma}{\fst{e}}{A}} \\[1em] \inferrule[\WithESnd] {\judge{\Delta}{\Gamma}{e}{A \with B}} {\judge{\Delta}{\Gamma}{\snd{e}}{B}} \end{array} \\[3em] \begin{array}{l} \inferrule[\SumIInl] {\judge{\Delta}{\Gamma}{e}{A}} {\judge{\Delta}{\Gamma}{\inl{e}}{A \oplus B}} \\[1em] \inferrule[\SumIInr] {\judge{\Delta}{\Gamma}{e}{B}} {\judge{\Delta}{\Gamma}{\inr{e}}{A \oplus B}} \end{array} & \begin{array}{l} \inferrule[\SumE] {\judge{\Delta}{\Gamma}{e}{A \oplus B} \\\\ \judge{\Delta}{\Gamma', x:A}{e'}{C} \\\\ \judge{\Delta}{\Gamma', y:B}{e''}{C} } {\judge{\Delta}{\Gamma, \Gamma'}{\case{e}{x}{e'}{y}{e''}}{C}} \end{array} \\[3em] & \inferrule[\ZeroE] {\judge{\Delta}{\Gamma}{e}{0}} {\judge{\Delta}{\Gamma}{\abort{e}}{C}} \end{array} $$
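One subtlety if you want to implement these rules: the context splits $\Gamma, \Gamma'$ in the elimination rules are not syntax-directed, so a checker cannot guess them. A standard workaround is "leftover typing": thread the entire context through, and have the checker return the bindings its argument did not consume. Here is my sketch of the idea on the $\lolli$ fragment (not code from the post; the other connectives follow the same pattern):

```ocaml
exception TypeError of string

(* infer delta gamma e returns (A, gamma') where A is e's type and
   gamma' is the part of gamma that e did not consume. *)
let rec infer (delta : string list) (gamma : (string * ty) list)
    : term -> ty * (string * ty) list = function
  | Var x ->
      (match List.assoc_opt x gamma with
       | Some a -> (a, List.remove_assoc x gamma)
       | None -> raise (TypeError ("unbound variable " ^ x)))
  | Lam (x, a, body) ->
      let (b, leftover) = infer delta ((x, a) :: gamma) body in
      (* Linearity: the λ-bound variable must have been consumed. *)
      if List.mem_assoc x leftover
      then raise (TypeError ("unused linear variable " ^ x))
      else (Lolli (a, b), leftover)
  | App (f, arg) ->
      (match infer delta gamma f with
       | (Lolli (a, b), gamma') ->
           let (a', gamma'') = infer delta gamma' arg in
           (* Structural equality stands in for α-equivalence here. *)
           if a = a' then (b, gamma'')
           else raise (TypeError "argument type mismatch")
       | _ -> raise (TypeError "applied a non-function"))
  | _ -> raise (TypeError "constructor not covered in this sketch")
```

With this discipline, a rule like $\WithI$ checks both components against the same input context and insists they return the same leftovers, mirroring how the declarative rule shares $\Gamma$.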
Now, we can define a size function $\size{e}$ on lambda-terms. $$ \begin{array}{lcl} \size{x} & = & 0 \\ \size{\Fun{\alpha}{e}} & = & \size{e} \\ \size{e\;A} & = & 1 + \size{e} \\ \size{\fun{x:A}{e}} & = & \size{e} \\ \size{e\;e'} & = & 1 + \size{e} + \size{e'} \\ \size{\unit} & = & 1 \\ \size{\letv{\unit}{e}{e'}} & = & \size{e} + \size{e'} \\ \size{\pair{e}{e'}} & = & 1 + \size{e} + \size{e'} \\ \size{\letv{\pair{x}{y}}{e}{e'}} & = & \size{e} + \size{e'} \\ \size{\unita} & = & 0\\ \size{\paira{e}{e'}} & = & \max(\size{e}, \size{e'}) \\ \size{\fst{e}} & = & 1 + \size{e} \\ \size{\snd{e}} & = & 1 + \size{e} \\ \size{\inl{e}} & = & 1 + \size{e} \\ \size{\inr{e}} & = & 1 + \size{e} \\ \size{\case{e}{x}{e'}{y}{e''}} & = & \size{e} + \max(\size{e'}, \size{e''}) \\ \size{\abort{e}} & = & \size{e} \\ \end{array} $$
This function basically just counts the size of the term, adding an extra one for either the introduction or the elimination form of each connective. (I followed the pattern of adding 1 to the introduction rule for positive connectives, and 1 to the elimination for negative connectives, but really it doesn't seem to matter. Since $0$ is positive and has no introduction, $\abort{e}$ adds nothing.)
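Transcribed into OCaml (reusing the term type from the first sketch), the size function is a direct fold over the syntax:

```ocaml
(* |e|, following the equations above case for case. *)
let rec size = function
  | Var _ -> 0
  | TyLam (_, e) -> size e
  | TyApp (e, _) -> 1 + size e
  | Lam (_, _, e) -> size e
  | App (e, e') -> 1 + size e + size e'
  | Unit -> 1
  | LetUnit (e, e') -> size e + size e'
  | Pair (e, e') -> 1 + size e + size e'
  | LetPair (_, _, e, e') -> size e + size e'
  | UnitA -> 0
  | PairA (e, e') -> max (size e) (size e')
  | Fst e | Snd e -> 1 + size e
  | Inl e | Inr e -> 1 + size e
  | Case (e, _, e', _, e'') -> size e + max (size e') (size e'')
  | Abort e -> size e
```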
Now, we can prove a lemma showing how sizes interact with substitution. Namely, linearity means that the size can never get larger when you substitute a term for a variable.
Size-respecting substitution. Suppose $\judgectx{\Delta}{\Gamma}$ and $\judge{\Delta}{\Gamma}{e}{A}$ and $\judge{\Delta}{\Gamma', x:A}{e'}{C}$. Then $\size{[e/x]e'} \leq \size{e} + \size{e'}$.
Proof This follows by a structural induction on the derivation of $e'$. Rather than giving the full proof, I'll just make the observations needed to prove it. In the $\WithI$ and $\SumE$ cases, we need the following fact: $$ \max(a + x, b + x) = \max(a,b) + x $$ This is because $e$ can be substituted into multiple branches in these two rules.
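For example, in the $\WithI$ case we have $e' = \paira{e'_1}{e'_2}$, and $x$ may occur in both components. Applying the induction hypothesis to each component gives $$ \size{[e/x]\paira{e'_1}{e'_2}} = \max(\size{[e/x]e'_1}, \size{[e/x]e'_2}) \leq \max(\size{e} + \size{e'_1}, \size{e} + \size{e'_2}) = \size{e} + \size{\paira{e'_1}{e'_2}} $$ using the fact above in the last step.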
In other cases, such as $\TensorI$, we have two subterms of $e'$ (so $e' = \pair{e'_1}{e'_2}$). Then typing indicates that $x$ can occur in only one of the branches. As a result, $[e/x]\pair{e'_1}{e'_2}$ will equal either $\pair{[e/x]e'_1}{e'_2}$ or $\pair{e'_1}{[e/x]e'_2}$, and then the size will be less than or equal to $1 + \size{e} + \size{e'_1} + \size{e'_2} = \size{e} + \size{e'}$.
The inequality (rather than an equality) arises from the presence of the unit $\top$, whose proof term $\unita$ can silently discard the variables in the context. In its absence, substitution would preserve size exactly.
We also need to prove the same thing for type substitution.
Size-respecting type substitution. Suppose $\judgectx{\Delta}{\Gamma}$ and $\judge{\Delta, \alpha}{\Gamma}{e}{C}$ and $\judgetp{\Delta}{A}$. Then $\size{[A/\alpha]e} = \size{e}$.
Proof This is a routine induction, with no surprises, since types never affect the size of a term.

Next, let's write down the $\beta$-reductions: $$ \begin{array}{lcl} (\Fun{\alpha}{e})\;A & \mapsto & [A/\alpha]e \\ (\fun{x:A}{e})\;e' & \mapsto & [e'/x]e \\ \letv{\unit}{\unit}{e'} & \mapsto & e' \\ \letv{\pair{x}{y}}{\pair{e_1}{e_2}}{e'} & \mapsto & [e_1/x, e_2/y]e' \\ \fst{\paira{e}{e'}} & \mapsto & e \\ \snd{\paira{e}{e'}} & \mapsto & e' \\ \case{\inl{e}}{x}{e'}{y}{e''} & \mapsto & [e/x]e' \\ \case{\inr{e}}{x}{e'}{y}{e''} & \mapsto & [e/y]e'' \\ \end{array} $$
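In code, the $\beta$-rules become a head-reduction step function. This sketch assumes capture-avoiding substitution functions, subst (computing $[e'/x]e$) and subst_ty (computing $[A/\alpha]e$); both are hypothetical helpers, not shown here:

```ocaml
(* One step of head reduction, following the β-rules above.
   subst e' x e and subst_ty ty a e are assumed, not defined here. *)
let step = function
  | TyApp (TyLam (a, e), ty) -> Some (subst_ty ty a e)   (* (Λα.e) A   *)
  | App (Lam (x, _, e), e') -> Some (subst e' x e)       (* (λx:A.e) e' *)
  | LetUnit (Unit, e') -> Some e'
  | LetPair (x, y, Pair (e1, e2), e') ->
      Some (subst e2 y (subst e1 x e'))
  | Fst (PairA (e, _)) -> Some e
  | Snd (PairA (_, e')) -> Some e'
  | Case (Inl e, x, e', _, _) -> Some (subst e x e')
  | Case (Inr e, _, _, y, e'') -> Some (subst e y e'')
  | _ -> None                                            (* not a redex *)
```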
Now, we can verify that $\beta$-reduction reduces size:
Shrinking. If $\judgectx{\Delta}{\Gamma}$ and $\judge{\Delta}{\Gamma}{e}{A}$ and $e \mapsto e'$, then $\size{e'} < \size{e}$.
Proof The proof of this is essentially nothing but the observation that every $\beta$-reduction removes an introduction, removes an elimination, and performs a substitution, and so the size of the result term must get strictly smaller.
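For instance, $(\fun{x:1}{x})\;\unit$ has size $1 + 0 + 1 = 2$, and it $\beta$-reduces to $[\unit/x]x = \unit$, which has size $1$.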
As a result, we know that all reduction strategies lead to a normal form. Since $\beta$-reduction makes a term strictly smaller, the process must eventually stop, no matter what order you perform the reductions in. So we (almost) have strong normalization. The missing piece is that we also need the Church-Rosser property, to ensure that all sequences of reductions eventually reach the same normal form. But this post is getting quite long enough, so I won't prove that -- I'll just assert it. :)
I will, however, offer one last reward for those of you who persevered to the end. Let's add unrestricted recursive types to our language: $$ \inferrule[\MuI] {\judge{\Delta}{\Gamma}{e}{[\fix{\alpha}{A}/\alpha]A}} {\judge{\Delta}{\Gamma}{\fold{e}}{\fix{\alpha}{A}}} \qquad \inferrule[\MuE] {\judge{\Delta}{\Gamma}{e}{\fix{\alpha}{A}}} {\judge{\Delta}{\Gamma}{\unfold{e}}{[\fix{\alpha}{A}/\alpha]A}} $$
The $\beta$-rule is the obvious one you might expect: $$ \begin{array}{lcl} \unfold{(\fold{e})} & \mapsto & e \end{array} $$
And let's extend the size function, too: $$ \begin{array}{lcl} \size{\fold{e}} & = & 1 + \size{e} \\ \size{\unfold{e}} & = & \size{e} \end{array} $$
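As a quick check, the new $\beta$-rule is still shrinking: $\size{\unfold{(\fold{e})}} = 1 + \size{e} > \size{e}$.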
Now, note that all of the lemmas we used continue to hold. So we can add unrestricted recursive types, with no restrictions on negative occurrences of type variables, to second-order MALL and the language is still strongly normalizing!
Another way of putting things is that contraction (i.e., using variables more than once) is essential to programming things like the Y combinator. Without it, recursive types are not enough to enable dangerous self-references like Russell's paradox.
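To see concretely what goes wrong, recall the usual recipe for building a fixed-point combinator from recursive types: introduce a type of self-applicable terms $\fix{\alpha}{\alpha \lolli A}$, and write the self-application function $$ \omega = \fun{x:{\fix{\alpha}{\alpha \lolli A}}}{(\unfold{x})\;x} $$ The body of $\omega$ uses $x$ twice (once as the function, and once as its argument), so the $\LolliE$ rule rejects it: there is no way to split the context so that the single binding for $x$ is available on both sides.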
Acknowledgements. I'd like to thank Stéphane Gimenez for directing me to Girard's small normalization theorem.