Proof theorists often observe that linear logic is like regular logic, only
*everything works better*.

In this post, I'll give a very striking instance of this maxim, by
taking second-order multiplicative-additive linear logic (MALL) and
showing how to prove strong normalization for it, *without*
using a logical relation. That is, there is a **purely inductive**
argument that a powerful second-order logic normalizes! Girard
originally proved this for classical linear logic, using proof nets
(the "small normalization theorem", Corollary 4.22, page 71 of the original
Linear Logic paper), but it turns out that a variant works for
lambda-terms and intuitionistic linear logic as well.
$$
\newcommand{\lolli}{\multimap}
\newcommand{\tensor}{\otimes}
\newcommand{\with}{\&}
\newcommand{\all}[2]{\forall {#1}.\;{#2}}
\newcommand{\fix}[2]{\mu {#1}.\;{#2}}
\newcommand{\letv}[3]{\mathsf{let}\,{#1} = {#2}\;\mathsf{in}\;{#3}}
\newcommand{\Fun}[2]{\Lambda {#1}.\;{#2}}
\newcommand{\fun}[2]{\lambda {#1}.\;{#2}}
\newcommand{\unit}{\left(\right)}
\newcommand{\letunit}[2]{\letv{\unit}{#1}{#2}}
\newcommand{\pair}[2]{\left({#1},{#2}\right)}
\newcommand{\letpair}[4]{\letv{\pair{#1}{#2}}{#3}{#4}}
\newcommand{\unita}{\left[\right]}
\newcommand{\paira}[2]{\left[{#1}, {#2}\right]}
\newcommand{\fst}[1]{\mathsf{fst}\,{#1}}
\newcommand{\snd}[1]{\mathsf{snd}\,{#1}}
\newcommand{\inl}[1]{\mathsf{inl}\,{#1}}
\newcommand{\inr}[1]{\mathsf{inr}\,{#1}}
\newcommand{\case}[5]{\mathsf{case}({#1}, \inl{#2} \to {#3}, \inr{#4} \to {#5})}
\newcommand{\abort}[1]{\mathsf{abort}\,{#1}}
\newcommand{\fold}[1]{\mathsf{in}\,{#1}}
\newcommand{\unfold}[1]{\mathsf{out}\,{#1}}
\newcommand{\judge}[4]{{#1};{#2} \vdash {#3} : {#4}}
\newcommand{\judgetp}[2]{{#1} \vdash {#2} : \mathsf{type}}
\newcommand{\judgectx}[2]{{#1} \vdash {#2} : \mathsf{ctx}}
%% Rule names
\newcommand{\rulename}[3]{{#2}{\mathrm{#3}}_{\mathrm{#1}}}
\newcommand{\intro}[2][]{\rulename{#1}{#2}{I}}
\newcommand{\elim}[2][]{\rulename{#1}{#2}{E}}
\newcommand{\Var}{\rulename{}{}{Var}}
\newcommand{\AllI}{\intro{\forall}}
\newcommand{\AllE}{\elim{\forall}}
\newcommand{\LolliI}{\intro{\lolli}}
\newcommand{\LolliE}{\elim{\lolli}}
\newcommand{\UnitI}{\intro{1}}
\newcommand{\UnitE}{\elim{1}}
\newcommand{\TensorI}{\intro{\tensor}}
\newcommand{\TensorE}{\elim{\tensor}}
\newcommand{\TopI}{\intro{\top}}
\newcommand{\TopE}{\elim{\top}}
\newcommand{\WithI}{\intro{\with}}
\newcommand{\WithEFst}{\elim[fst]{\with}}
\newcommand{\WithESnd}{\elim[snd]{\with}}
\newcommand{\ZeroI}{\intro{0}}
\newcommand{\ZeroE}{\elim{0}}
\newcommand{\SumIInl}{\intro[inl]{\oplus}}
\newcommand{\SumIInr}{\intro[inr]{\oplus}}
\newcommand{\SumE}{\elim{\oplus}}
\newcommand{\MuI}{\intro{\mu}}
\newcommand{\MuE}{\elim{\mu}}
\newcommand{\size}[1]{\left|#1\right|}
\newcommand{\inferrule}[3][]{\frac{#2}{#3}\;{#1}}
$$

First, let's recall the type system for second order MALL.
$$
\begin{array}{llcl}
\mbox{Types} &
A & ::= & \all{\alpha}{A} \bnfalt \alpha \bnfalt A \lolli B \bnfalt 1 \bnfalt A \tensor B \bnfalt \top \bnfalt A \with B \bnfalt
0 \bnfalt A \oplus B
\\[1em]
\mbox{Type Contexts} &
\Delta & ::= & \cdot \bnfalt \Delta, \alpha
\\[1em]
\mbox{Contexts} &
\Gamma & ::= & \cdot \bnfalt \Gamma, x:A
\\[1em]
\mbox{Terms} &
e & ::= & x \bnfalt \Fun{\alpha}{e} \bnfalt e\;A \bnfalt \fun{x:A}{e} \bnfalt e\;e \\
& & | & \unit \bnfalt \letunit{e}{e'} \bnfalt \pair{e}{e'} \bnfalt \letpair{x}{y}{e}{e'}\\
& & | & \unita \bnfalt \paira{e}{e'} \bnfalt \fst{e} \bnfalt \snd{e} \\
& & | & \inl{e} \bnfalt \inr{e} \bnfalt \case{e}{x}{e'}{x}{e''} \bnfalt \abort{e}
\end{array}
$$

The simple types are linear functions $A \lolli B$, tensor products $A
\tensor B$, Cartesian products $A \with B$, and coproducts $A \oplus
B$. We also have quantified types $\all{\alpha}{A}$ and variable
references $\alpha$. We have two kinds of contexts. $\Delta$ is the
context of type variables, and $\Gamma$ gives types to term variables.
Below, we give the judgement form for the well-formedness of types,
$\judgetp{\Delta}{A}$, which says that $A$'s free variables all lie in
$\Delta$.
$$
\boxed{\judgetp{\Delta}{A}}
\\
\inferrule[]
{\alpha \in \Delta}
{\judgetp{\Delta}{\alpha}}
\\
\begin{array}{ll}
\inferrule[]
{\judgetp{\Delta, \alpha}{A}}
{\judgetp{\Delta}{\all{\alpha}{A}}}
&
\inferrule[]
{\judgetp{\Delta}{A} \\ \judgetp{\Delta}{B}}
{\judgetp{\Delta}{A \lolli B}}
\\[2em]
\inferrule[]
{ }
{\judgetp{\Delta}{1}}
&
\inferrule[]
{\judgetp{\Delta}{A} \\ \judgetp{\Delta}{B}}
{\judgetp{\Delta}{A \tensor B}}
\\[2em]
\inferrule[]
{ }
{\judgetp{\Delta}{\top}}
&
\inferrule[]
{\judgetp{\Delta}{A} \\ \judgetp{\Delta}{B}}
{\judgetp{\Delta}{A \with B}}
\\[2em]
\inferrule[]
{ }
{\judgetp{\Delta}{0}}
&
\inferrule[]
{\judgetp{\Delta}{A} \\ \judgetp{\Delta}{B}}
{\judgetp{\Delta}{A \oplus B}}
\end{array}
$$

Once we know how to assert that types are well-formed, we can
lift this to a well-formedness judgement for contexts $\judgectx{\Delta}{\Gamma}$.
$$
\boxed{\judgectx{\Delta}{\Gamma}}
\\
\inferrule[]
{ }
{\judgectx{\Delta}{\cdot}}
\qquad
\inferrule[]
{\judgectx{\Delta}{\Gamma} \\ \judgetp{\Delta}{A}}
{\judgectx{\Delta}{\Gamma, x:A}}
$$

Now, we can give the typing rules for the terms of second-order
MALL. The typing judgement $\judge{\Delta}{\Gamma}{e}{A}$ says that
assuming that the free type variables are in $\Delta$, andall the term
variables have types given by $\Gamma$, then the term $e$ has the type
$A$.

Variable references $x$ are just looked up in the environment
by the rule $\Var$. Note that the context contains only a single
binding for the variable $x$. This is demanded by linearity, since
if there were other variables in the context, the expression $x$
could not use them!

Type abstractions are introduced with the $\AllI$ rule, and simply
typecheck the body of $\Fun{\alpha}{e}$ in a context extended by
$\alpha$. A type application checked by $\AllE$ simply substitutes a
type for a quantified variable. The linear functions have the usual
introduction rule in $\LolliI$, and the application rule $\LolliE$
checks that the function has type $A \lolli B$ and the argument has
type $A$. Note that the elimination rule splits the context into two
parts, with one set of variables being used by the function and the
other by the argument --- this enforces the linear use of variables.

We introduce units with the $\UnitI$ rule --- it requires no resources
to create a unit, since it has no free variables in it. However, eliminating
a unit in $\UnitE$ *does* divide the unit, since we may produce a unit-typed
value from a term like $f x : 1$, where $f : P \lolli 1$ and $x : P$.
Tensor products are introduced with the $\TensorI$ rule, which splits the
variables in the context between the two sides of the pair $\pair{e}{e'}$.
Likewise, the elimination form $\TensorE$ uses the variables in $\Gamma$ to produce
a term of type $A \tensor B$, and then binds the variables $x:A$ and $y:B$ to
typecheck the continuation expression $e'$.

We can contrast tensor products $A \tensor B$ with the Cartesian product
$A \with B$. The introduction rule $\WithI$ reuses the *same* context $\Gamma$
when typechecking the two branches of $\paira{e}{e'}$. As a result, there
are two projective eliminations $\WithEFst$ and $\WithESnd$, each of which
consumes a pair $A \with B$ to produce either an $A$, or a $B$. Intuitively,
a Cartesian pair is *lazy*, and the projections $\fst{e}$ and
$\snd{e}$ let the programmer *choose* which of the two possibilities
they want. In contrast, the tensor product $A \tensor B$ is strict, and
so you get access to both components (at the price of not allowing the
two sides to share variables).

Finally, we also have the coproduct $A \oplus B$. This works much as
it does in the intuitionistic case: $\SumIInl$ takes an $A$ and gives
an $A \oplus B$, and $\SumIInr$ does the same with a $B$. The case
statement takes an expression $e$ of type $A \oplus B$, and then
branches to $e'$ or $e''$ depending on whether or not it is the left
or right branch. Note that $e$ has a disjoint set of variables
$\Gamma$ from the case branches, but that $e'$ and $e''$ share the
same set of variables $\Gamma''$ since only one of them will run.
$$
\boxed{\judge{\Delta}{\Gamma}{e}{A}}
\\
\inferrule[\Var]
{ }
{\judge{\Delta}{x:A}{x}{A}}
\\
\begin{array}{cc}
\inferrule[\AllI]
{\judge{\Delta, \alpha}{\Gamma}{e}{A}}
{\judge{\Delta}{\Gamma}{\Fun{\alpha}{e}}{\all{\alpha}{A}}}
&
\inferrule[\AllE]
{\judge{\Delta}{\Gamma}{e}{\all{\alpha}{B}} \\
\judgetp{\Delta}{A}}
{\judge{\Delta}{\Gamma}{e\;A}{[A/\alpha]B}}
\\[2em]
\inferrule[\LolliI]
{\judge{\Delta}{\Gamma, x:A}{e}{B}}
{\judge{\Delta}{\Gamma}{\fun{x:A}{e}}{A \to B}}
&
\inferrule[\LolliE]
{\judge{\Delta}{\Gamma}{e}{A \to B} \\
\judge{\Delta}{\Gamma'}{e'}{A}}
{\judge{\Delta}{\Gamma,\Gamma'}{e\;e'}{B}}
\\[2em]
\inferrule[\UnitI]
{ }
{\judge{\Delta}{\cdot}{\unit}{1}}
&
\inferrule[\UnitE]
{\judge{\Delta}{\Gamma}{e}{1} \\
\judge{\Delta}{\Gamma'}{e'}{C}}
{\judge{\Delta}{\Gamma,\Gamma'}{\letunit{e}{e'}}{C}}
\\[2em]
\inferrule[\TensorI]
{\judge{\Delta}{\Gamma}{e}{A} \\
\judge{\Delta}{\Gamma'}{e'}{B} }
{\judge{\Delta}{\Gamma,\Gamma'}{\pair{e}{e'}}{A \tensor B}}
&
\inferrule[\TensorE]
{\judge{\Delta}{\Gamma}{e}{A \tensor B} \\\\
\judge{\Delta}{\Gamma', x:A, y:B}{e'}{C}}
{\judge{\Delta}{\Gamma,\Gamma'}{\letpair{x}{y}{e}{e'}}{C}}
\\[2em]
\inferrule[\TopI]
{ }
{\judge{\Delta}{\Gamma}{\unita}{\top}}
&
\\[1em]
\inferrule[\WithI]
{\judge{\Delta}{\Gamma}{e}{A} \\
\judge{\Delta}{\Gamma}{e'}{B} }
{\judge{\Delta}{\Gamma}{\paira{e}{e'}}{A \with B}}
&
\begin{array}{l}
\inferrule[\WithEFst]
{\judge{\Delta}{\Gamma}{e}{A \with B}}
{\judge{\Delta}{\Gamma}{\fst{e}}{A}}
\\[1em]
\inferrule[\WithESnd]
{\judge{\Delta}{\Gamma}{e}{A \with B}}
{\judge{\Delta}{\Gamma}{\snd{e}}{B}}
\end{array}
\\[3em]
\begin{array}{l}
\inferrule[\SumIInl]
{\judge{\Delta}{\Gamma}{e}{A }}
{\judge{\Delta}{\Gamma}{\inl{e}}{A \oplus B}}
\\[1em]
\inferrule[\SumIInr]
{\judge{\Delta}{\Gamma}{e}{ B}}
{\judge{\Delta}{\Gamma}{\inr{e}}{A \oplus B}}
\end{array}
&
\begin{array}{l}
\inferrule[\SumE]
{\judge{\Delta}{\Gamma}{e}{A \oplus B} \\\\
\judge{\Delta}{\Gamma', x:A}{e'}{C} \\\\
\judge{\Delta}{\Gamma', y:B}{e''}{C} }
{\judge{\Delta}{\Gamma, \Gamma'}{\case{e}{x}{e'}{y}{e''}}{C}}
\end{array}
\\[3em]
&
\inferrule[\ZeroE]
{\judge{\Delta}{\Gamma}{e}{0}}
{\judge{\Delta}{\Gamma}{\abort{e}}{C}}
\end{array}
$$

Now, we can define a *size function* $\size{e}$ on lambda-terms.
$$
\begin{array}{lcl}
\size{x} & = & 0 \\
\size{\all{\alpha}{e}} & = & \size{e} \\
\size{e\;A} & = & 1 + \size{e} \\
\size{\fun{x:A}{e}} & = & \size{e} \\
\size{e\;e'} & = & 1 + \size{e} + \size{e'} \\
\size{\unit} & = & 1 \\
\size{\letv{\unit}{e}{e'}} & = & \size{e} + \size{e'} \\
\size{\pair{e}{e'}} & = & 1 + \size{e} + \size{e'} \\
\size{\letv{\pair{x}{y}}{e}{e'}} & = & \size{e} + \size{e'} \\
\size{\unita} & = & 0\\
\size{\paira{e}{e'}} & = & \max(\size{e}, \size{e'}) \\
\size{\fst{e}} & = & 1 + \size{e} \\
\size{\snd{e}} & = & 1 + \size{e} \\
\size{\inl{e}} & = & 1 + \size{e} \\
\size{\inr{e}} & = & 1 + \size{e} \\
\size{\case{e}{x}{e'}{y}{e''}} & = & \size{e} + \max(\size{e'}, \size{e''}) \\
\end{array}
$$

This function basically just counts the size of the term, adding an extra one
for either the introduction or elimination for each connective. (I followed the
pattern of adding 1 to the introduction rule for positive connectives, and 1 to
the elimination for negative connectives, but really it doesn't seem to matter.)

Now, we can prove a lemma showing how sizes interact with substitution. Namely,
linearity means that the size can never get larger when you substitute a term for
a variable.

**Size-respecting substitution.**
Suppose $\judgectx{\Delta}{\Gamma}$ and $\judge{\Delta}{\Gamma}{e}{A}$ and $\judge{\Delta}{\Gamma', x:A}{e'}{C}$. Then
$\size{[e/x]e'} \leq \size{e} + \size{e'}$.

*Proof*
This follows by a structural induction on the derivation of $e'$. Rather than giving
the full proof, I'll just make the observations
needed to prove this.
In the $\WithI$ and $\SumE$ cases, we need the following fact:
$$
\max(a + x, b + x) = \max(a,b) + x
$$
This is because $e$ can be substituted into multiple branches in these two rules.

In other cases, such as $\TensorI$, we have two subterms of $e'$
(so $e' = \pair{e'_1}{e'_2}$). Then typing indicates that $x$ can occur in
only one of the branches. As a result, $[e/x]\pair{e'_1}{e'_2}$ will
equal either $\pair{[e/x]e'_1}{e'_2}$ or $\pair{e'_1}{[e/x]e'_2}$, and
then the size will be less than or equal to $1 + \size{e} +
\size{e'_1} + \size{e'_2} = \size{e} + \size{e'}$.

The less than or equal to arises because of the presence of the unit $\top$, which
ignores variables in its proof term $\unita$. In its absence substitution would preserve size
exactly.

We also need to prove the same thing for type substitution.

**Size-respecting type substitution.**
Suppose $\judgectx{\Delta}{\Gamma}$ and $\judge{\Delta, \alpha}{\Gamma}{e}{C}$ and $\judgetp{\Delta}{A}$. Then
$\size{[A/\alpha]e} = \size{e}$.

*Proof*
This is a routine induction, with no surprises, since types never affect the size of a term.

Next, let's write down the beta reductions:
$$
\begin{array}{lcl}
(\Fun{\alpha}{e})\;A & \mapsto & [A/\alpha]e \\
(\fun{x:A}{e})\;e' & \mapsto & [e'/x]e \\
\letv{\unit}{\unit}{e'} & \mapsto & e' \\
\letv{\pair{x}{y}}{\pair{e_1}{e_2}}{e'} & \mapsto & [e_1/x, e_2/y]e' \\
\fst{\paira{e}{e'}} & \mapsto & e \\
\snd{\paira{e}{e'}} & \mapsto & e' \\
\case{\inl{e}}{x}{e'}{y}{e''} & \mapsto & [e/x]e' \\
\case{\inr{e}}{x}{e'}{y}{e''} & \mapsto & [e/y]e'' \\
\end{array}
$$

Now, we can verify that $\beta$-reduction reduces size:

**Shrinking.**
If $\judgectx{\Delta}{\Gamma}$ and $\judge{\Delta}{\Gamma}{e}{A}$ and $e\;\mapsto e'$, then $\size{e'} < \size{e}$.

*Proof*
The proof of this is essentially nothing but the observation that every $\beta$-reduction removes an
introduction, removes an elimination, and performs a subsitution, and so the size of the result term must
get strictly smaller.

As a result, we know that all reduction strategies lead to a normal
form. Since $\beta$-reduction makes a term strictly smaller, the
process must eventually stop, no matter what order you perform the
reductions in. So we (almost) have strong normalization. The missing
piece is that we also need the Church-Rosser property, to ensure that
all sequences of reductions eventually reach the *same* normal
form. But this post is getting quite long enough, so I won't prove
that -- I'll just assert it. :)

I will, however, offer one last reward for those of you who
persevered to the end. Let's add unrestricted recursive types to our
language:
$$
\inferrule[\MuI]
{\judge{\Delta}{\Gamma}{e}{[\fix{\alpha}{A}/\alpha]A}}
{\judge{\Delta}{\Gamma}{\fold{e}}{\fix{\alpha}{A}}}
\qquad
\inferrule[\MuE]
{\judge{\Delta}{\Gamma}{e}{\fix{\alpha}{A}}}
{\judge{\Delta}{\Gamma}{\unfold{e}}{[\fix{\alpha}{A}/\alpha]A}}
$$

The $\beta$-rule is the obvious one you might expect:
$$
\begin{array}{lcl}
\unfold{(\fold{e})} & \mapsto & e
\end{array}
$$

And let's extend the size function, too:
$$
\begin{array}{lcl}
\size{\fold{e}} & = & 1 + \size{e} \\
\size{\unfold{e}} & = & \size{e}
\end{array}
$$

Now, note that all of the lemmas we used continue to hold. So
we can add unrestricted recursive types, with no restrictions on
negative occurences of type variables, to second-order MALL and the
language is *still strongly normalizing!*

Another way of putting things is that contraction (i.e., using
variables more than once) is *essential* to programming things
like the Y combinator. Without it, recursive types are not enough to
enable dangerous self-references like Russell's paradox.

**Acknowledgements.** I'd like to thank Stéphane
Gimenez for directing me to Girard's small normalization theorem.