Tuesday, May 7, 2013

Strong Normalization without Logical Relations

Proof theorists often observe that linear logic is like regular logic, only everything works better.

In this post, I'll give a very striking instance of this maxim, by taking second-order multiplicative-additive linear logic (MALL) and showing how to prove strong normalization for it, without using a logical relation. That is, there is a purely inductive argument that a powerful second-order logic normalizes! Girard originally proved this for classical linear logic, using proof nets (the "small normalization theorem", Corollary 4.22, page 71 of the original Linear Logic paper), but it turns out that a variant works for lambda-terms and intuitionistic linear logic as well. $$ \newcommand{\lolli}{\multimap} \newcommand{\tensor}{\otimes} \newcommand{\with}{\&} \newcommand{\all}[2]{\forall {#1}.\;{#2}} \newcommand{\fix}[2]{\mu {#1}.\;{#2}} \newcommand{\letv}[3]{\mathsf{let}\,{#1} = {#2}\;\mathsf{in}\;{#3}} \newcommand{\Fun}[2]{\Lambda {#1}.\;{#2}} \newcommand{\fun}[2]{\lambda {#1}.\;{#2}} \newcommand{\unit}{\left(\right)} \newcommand{\letunit}[2]{\letv{\unit}{#1}{#2}} \newcommand{\pair}[2]{\left({#1},{#2}\right)} \newcommand{\letpair}[4]{\letv{\pair{#1}{#2}}{#3}{#4}} \newcommand{\unita}{\left[\right]} \newcommand{\paira}[2]{\left[{#1}, {#2}\right]} \newcommand{\fst}[1]{\mathsf{fst}\,{#1}} \newcommand{\snd}[1]{\mathsf{snd}\,{#1}} \newcommand{\inl}[1]{\mathsf{inl}\,{#1}} \newcommand{\inr}[1]{\mathsf{inr}\,{#1}} \newcommand{\case}[5]{\mathsf{case}({#1}, \inl{#2} \to {#3}, \inr{#4} \to {#5})} \newcommand{\abort}[1]{\mathsf{abort}\,{#1}} \newcommand{\fold}[1]{\mathsf{in}\,{#1}} \newcommand{\unfold}[1]{\mathsf{out}\,{#1}} \newcommand{\judge}[4]{{#1};{#2} \vdash {#3} : {#4}} \newcommand{\judgetp}[2]{{#1} \vdash {#2} : \mathsf{type}} \newcommand{\judgectx}[2]{{#1} \vdash {#2} : \mathsf{ctx}} %% Rule names \newcommand{\rulename}[3]{{#2}{\mathrm{#3}}_{\mathrm{#1}}} \newcommand{\intro}[2][]{\rulename{#1}{#2}{I}} \newcommand{\elim}[2][]{\rulename{#1}{#2}{E}} \newcommand{\Var}{\rulename{}{}{Var}} \newcommand{\AllI}{\intro{\forall}} \newcommand{\AllE}{\elim{\forall}} \newcommand{\LolliI}{\intro{\lolli}} \newcommand{\LolliE}{\elim{\lolli}} \newcommand{\UnitI}{\intro{1}} \newcommand{\UnitE}{\elim{1}} \newcommand{\TensorI}{\intro{\tensor}} \newcommand{\TensorE}{\elim{\tensor}} \newcommand{\TopI}{\intro{\top}} \newcommand{\TopE}{\elim{\top}} \newcommand{\WithI}{\intro{\with}} \newcommand{\WithEFst}{\elim[fst]{\with}} \newcommand{\WithESnd}{\elim[snd]{\with}} \newcommand{\ZeroI}{\intro{0}} \newcommand{\ZeroE}{\elim{0}} \newcommand{\SumIInl}{\intro[inl]{\oplus}} \newcommand{\SumIInr}{\intro[inr]{\oplus}} \newcommand{\SumE}{\elim{\oplus}} \newcommand{\MuI}{\intro{\mu}} \newcommand{\MuE}{\elim{\mu}} \newcommand{\size}[1]{\left|#1\right|} \newcommand{\inferrule}[3][]{\frac{#2}{#3}\;{#1}} $$

First, let's recall the type system for second order MALL. $$ \begin{array}{llcl} \mbox{Types} & A & ::= & \all{\alpha}{A} \bnfalt \alpha \bnfalt A \lolli B \bnfalt 1 \bnfalt A \tensor B \bnfalt \top \bnfalt A \with B \bnfalt 0 \bnfalt A \oplus B \\[1em] \mbox{Type Contexts} & \Delta & ::= & \cdot \bnfalt \Delta, \alpha \\[1em] \mbox{Contexts} & \Gamma & ::= & \cdot \bnfalt \Gamma, x:A \\[1em] \mbox{Terms} & e & ::= & x \bnfalt \Fun{\alpha}{e} \bnfalt e\;A \bnfalt \fun{x:A}{e} \bnfalt e\;e \\ & & | & \unit \bnfalt \letunit{e}{e'} \bnfalt \pair{e}{e'} \bnfalt \letpair{x}{y}{e}{e'}\\ & & | & \unita \bnfalt \paira{e}{e'} \bnfalt \fst{e} \bnfalt \snd{e} \\ & & | & \inl{e} \bnfalt \inr{e} \bnfalt \case{e}{x}{e'}{x}{e''} \bnfalt \abort{e} \end{array} $$

The simple types are linear functions $A \lolli B$, tensor products $A \tensor B$, Cartesian products $A \with B$, and coproducts $A \oplus B$. We also have quantified types $\all{\alpha}{A}$ and variable references $\alpha$. We have two kinds of contexts. $\Delta$ is the context of type variables, and $\Gamma$ gives types to term variables. Below, we give the judgement form for the well-formedness of types, $\judgetp{\Delta}{A}$, which says that $A$'s free variables all lie in $\Delta$. $$ \boxed{\judgetp{\Delta}{A}} \\ \inferrule[] {\alpha \in \Delta} {\judgetp{\Delta}{\alpha}} \\ \begin{array}{ll} \inferrule[] {\judgetp{\Delta, \alpha}{A}} {\judgetp{\Delta}{\all{\alpha}{A}}} & \inferrule[] {\judgetp{\Delta}{A} \\ \judgetp{\Delta}{B}} {\judgetp{\Delta}{A \lolli B}} \\[2em] \inferrule[] { } {\judgetp{\Delta}{1}} & \inferrule[] {\judgetp{\Delta}{A} \\ \judgetp{\Delta}{B}} {\judgetp{\Delta}{A \tensor B}} \\[2em] \inferrule[] { } {\judgetp{\Delta}{\top}} & \inferrule[] {\judgetp{\Delta}{A} \\ \judgetp{\Delta}{B}} {\judgetp{\Delta}{A \with B}} \\[2em] \inferrule[] { } {\judgetp{\Delta}{0}} & \inferrule[] {\judgetp{\Delta}{A} \\ \judgetp{\Delta}{B}} {\judgetp{\Delta}{A \oplus B}} \end{array} $$

Once we know how to assert that types are well-formed, we can lift this to a well-formedness judgement for contexts $\judgectx{\Delta}{\Gamma}$. $$ \boxed{\judgectx{\Delta}{\Gamma}} \\ \inferrule[] { } {\judgectx{\Delta}{\cdot}} \qquad \inferrule[] {\judgectx{\Delta}{\Gamma} \\ \judgetp{\Delta}{A}} {\judgectx{\Delta}{\Gamma, x:A}} $$

Now, we can give the typing rules for the terms of second-order MALL. The typing judgement $\judge{\Delta}{\Gamma}{e}{A}$ says that assuming that the free type variables are in $\Delta$, andall the term variables have types given by $\Gamma$, then the term $e$ has the type $A$.

Variable references $x$ are just looked up in the environment by the rule $\Var$. Note that the context contains only a single binding for the variable $x$. This is demanded by linearity, since if there were other variables in the context, the expression $x$ could not use them!

Type abstractions are introduced with the $\AllI$ rule, and simply typecheck the body of $\Fun{\alpha}{e}$ in a context extended by $\alpha$. A type application checked by $\AllE$ simply substitutes a type for a quantified variable. The linear functions have the usual introduction rule in $\LolliI$, and the application rule $\LolliE$ checks that the function has type $A \lolli B$ and the argument has type $A$. Note that the elimination rule splits the context into two parts, with one set of variables being used by the function and the other by the argument --- this enforces the linear use of variables.

We introduce units with the $\UnitI$ rule --- it requires no resources to create a unit, since it has no free variables in it. However, eliminating a unit in $\UnitE$ does divide the unit, since we may produce a unit-typed value from a term like $f x : 1$, where $f : P \lolli 1$ and $x : P$. Tensor products are introduced with the $\TensorI$ rule, which splits the variables in the context between the two sides of the pair $\pair{e}{e'}$. Likewise, the elimination form $\TensorE$ uses the variables in $\Gamma$ to produce a term of type $A \tensor B$, and then binds the variables $x:A$ and $y:B$ to typecheck the continuation expression $e'$.

We can contrast tensor products $A \tensor B$ with the Cartesian product $A \with B$. The introduction rule $\WithI$ reuses the same context $\Gamma$ when typechecking the two branches of $\paira{e}{e'}$. As a result, there are two projective eliminations $\WithEFst$ and $\WithESnd$, each of which consumes a pair $A \with B$ to produce either an $A$, or a $B$. Intuitively, a Cartesian pair is lazy, and the projections $\fst{e}$ and $\snd{e}$ let the programmer choose which of the two possibilities they want. In contrast, the tensor product $A \tensor B$ is strict, and so you get access to both components (at the price of not allowing the two sides to share variables).

Finally, we also have the coproduct $A \oplus B$. This works much as it does in the intuitionistic case: $\SumIInl$ takes an $A$ and gives an $A \oplus B$, and $\SumIInr$ does the same with a $B$. The case statement takes an expression $e$ of type $A \oplus B$, and then branches to $e'$ or $e''$ depending on whether or not it is the left or right branch. Note that $e$ has a disjoint set of variables $\Gamma$ from the case branches, but that $e'$ and $e''$ share the same set of variables $\Gamma''$ since only one of them will run. $$ \boxed{\judge{\Delta}{\Gamma}{e}{A}} \\ \inferrule[\Var] { } {\judge{\Delta}{x:A}{x}{A}} \\ \begin{array}{cc} \inferrule[\AllI] {\judge{\Delta, \alpha}{\Gamma}{e}{A}} {\judge{\Delta}{\Gamma}{\Fun{\alpha}{e}}{\all{\alpha}{A}}} & \inferrule[\AllE] {\judge{\Delta}{\Gamma}{e}{\all{\alpha}{B}} \\ \judgetp{\Delta}{A}} {\judge{\Delta}{\Gamma}{e\;A}{[A/\alpha]B}} \\[2em] \inferrule[\LolliI] {\judge{\Delta}{\Gamma, x:A}{e}{B}} {\judge{\Delta}{\Gamma}{\fun{x:A}{e}}{A \to B}} & \inferrule[\LolliE] {\judge{\Delta}{\Gamma}{e}{A \to B} \\ \judge{\Delta}{\Gamma'}{e'}{A}} {\judge{\Delta}{\Gamma,\Gamma'}{e\;e'}{B}} \\[2em] \inferrule[\UnitI] { } {\judge{\Delta}{\cdot}{\unit}{1}} & \inferrule[\UnitE] {\judge{\Delta}{\Gamma}{e}{1} \\ \judge{\Delta}{\Gamma'}{e'}{C}} {\judge{\Delta}{\Gamma,\Gamma'}{\letunit{e}{e'}}{C}} \\[2em] \inferrule[\TensorI] {\judge{\Delta}{\Gamma}{e}{A} \\ \judge{\Delta}{\Gamma'}{e'}{B} } {\judge{\Delta}{\Gamma,\Gamma'}{\pair{e}{e'}}{A \tensor B}} & \inferrule[\TensorE] {\judge{\Delta}{\Gamma}{e}{A \tensor B} \\\\ \judge{\Delta}{\Gamma', x:A, y:B}{e'}{C}} {\judge{\Delta}{\Gamma,\Gamma'}{\letpair{x}{y}{e}{e'}}{C}} \\[2em] \inferrule[\TopI] { } {\judge{\Delta}{\Gamma}{\unita}{\top}} & \\[1em] \inferrule[\WithI] {\judge{\Delta}{\Gamma}{e}{A} \\ \judge{\Delta}{\Gamma}{e'}{B} } {\judge{\Delta}{\Gamma}{\paira{e}{e'}}{A \with B}} & \begin{array}{l} \inferrule[\WithEFst] {\judge{\Delta}{\Gamma}{e}{A \with B}} {\judge{\Delta}{\Gamma}{\fst{e}}{A}} \\[1em] \inferrule[\WithESnd] {\judge{\Delta}{\Gamma}{e}{A \with B}} {\judge{\Delta}{\Gamma}{\snd{e}}{B}} \end{array} \\[3em] \begin{array}{l} \inferrule[\SumIInl] {\judge{\Delta}{\Gamma}{e}{A }} {\judge{\Delta}{\Gamma}{\inl{e}}{A \oplus B}} \\[1em] \inferrule[\SumIInr] {\judge{\Delta}{\Gamma}{e}{ B}} {\judge{\Delta}{\Gamma}{\inr{e}}{A \oplus B}} \end{array} & \begin{array}{l} \inferrule[\SumE] {\judge{\Delta}{\Gamma}{e}{A \oplus B} \\\\ \judge{\Delta}{\Gamma', x:A}{e'}{C} \\\\ \judge{\Delta}{\Gamma', y:B}{e''}{C} } {\judge{\Delta}{\Gamma, \Gamma'}{\case{e}{x}{e'}{y}{e''}}{C}} \end{array} \\[3em] & \inferrule[\ZeroE] {\judge{\Delta}{\Gamma}{e}{0}} {\judge{\Delta}{\Gamma}{\abort{e}}{C}} \end{array} $$

Now, we can define a size function $\size{e}$ on lambda-terms. $$ \begin{array}{lcl} \size{x} & = & 0 \\ \size{\all{\alpha}{e}} & = & \size{e} \\ \size{e\;A} & = & 1 + \size{e} \\ \size{\fun{x:A}{e}} & = & \size{e} \\ \size{e\;e'} & = & 1 + \size{e} + \size{e'} \\ \size{\unit} & = & 1 \\ \size{\letv{\unit}{e}{e'}} & = & \size{e} + \size{e'} \\ \size{\pair{e}{e'}} & = & 1 + \size{e} + \size{e'} \\ \size{\letv{\pair{x}{y}}{e}{e'}} & = & \size{e} + \size{e'} \\ \size{\unita} & = & 0\\ \size{\paira{e}{e'}} & = & \max(\size{e}, \size{e'}) \\ \size{\fst{e}} & = & 1 + \size{e} \\ \size{\snd{e}} & = & 1 + \size{e} \\ \size{\inl{e}} & = & 1 + \size{e} \\ \size{\inr{e}} & = & 1 + \size{e} \\ \size{\case{e}{x}{e'}{y}{e''}} & = & \size{e} + \max(\size{e'}, \size{e''}) \\ \end{array} $$

This function basically just counts the size of the term, adding an extra one for either the introduction or elimination for each connective. (I followed the pattern of adding 1 to the introduction rule for positive connectives, and 1 to the elimination for negative connectives, but really it doesn't seem to matter.)

Now, we can prove a lemma showing how sizes interact with substitution. Namely, linearity means that the size can never get larger when you substitute a term for a variable.

Size-respecting substitution. Suppose $\judgectx{\Delta}{\Gamma}$ and $\judge{\Delta}{\Gamma}{e}{A}$ and $\judge{\Delta}{\Gamma', x:A}{e'}{C}$. Then $\size{[e/x]e'} \leq \size{e} + \size{e'}$.
Proof This follows by a structural induction on the derivation of $e'$. Rather than giving the full proof, I'll just make the observations needed to prove this.

In the $\WithI$ and $\SumE$ cases, we need the following fact: $$ \max(a + x, b + x) = \max(a,b) + x $$ This is because $e$ can be substituted into multiple branches in these two rules.

In other cases, such as $\TensorI$, we have two subterms of $e'$ (so $e' = \pair{e'_1}{e'_2}$). Then typing indicates that $x$ can occur in only one of the branches. As a result, $[e/x]\pair{e'_1}{e'_2}$ will equal either $\pair{[e/x]e'_1}{e'_2}$ or $\pair{e'_1}{[e/x]e'_2}$, and then the size will be less than or equal to $1 + \size{e} + \size{e'_1} + \size{e'_2} = \size{e} + \size{e'}$.

The less than or equal to arises because of the presence of the unit $\top$, which ignores variables in its proof term $\unita$. In its absence substitution would preserve size exactly.

We also need to prove the same thing for type substitution.

Size-respecting type substitution. Suppose $\judgectx{\Delta}{\Gamma}$ and $\judge{\Delta, \alpha}{\Gamma}{e}{C}$ and $\judgetp{\Delta}{A}$. Then $\size{[A/\alpha]e} = \size{e}$.
Proof This is a routine induction, with no surprises, since types never affect the size of a term.
Next, let's write down the beta reductions: $$ \begin{array}{lcl} (\Fun{\alpha}{e})\;A & \mapsto & [A/\alpha]e \\ (\fun{x:A}{e})\;e' & \mapsto & [e'/x]e \\ \letv{\unit}{\unit}{e'} & \mapsto & e' \\ \letv{\pair{x}{y}}{\pair{e_1}{e_2}}{e'} & \mapsto & [e_1/x, e_2/y]e' \\ \fst{\paira{e}{e'}} & \mapsto & e \\ \snd{\paira{e}{e'}} & \mapsto & e' \\ \case{\inl{e}}{x}{e'}{y}{e''} & \mapsto & [e/x]e' \\ \case{\inr{e}}{x}{e'}{y}{e''} & \mapsto & [e/y]e'' \\ \end{array} $$

Now, we can verify that $\beta$-reduction reduces size:

Shrinking. If $\judgectx{\Delta}{\Gamma}$ and $\judge{\Delta}{\Gamma}{e}{A}$ and $e\;\mapsto e'$, then $\size{e'} < \size{e}$.
Proof The proof of this is essentially nothing but the observation that every $\beta$-reduction removes an introduction, removes an elimination, and performs a subsitution, and so the size of the result term must get strictly smaller.

As a result, we know that all reduction strategies lead to a normal form. Since $\beta$-reduction makes a term strictly smaller, the process must eventually stop, no matter what order you perform the reductions in. So we (almost) have strong normalization. The missing piece is that we also need the Church-Rosser property, to ensure that all sequences of reductions eventually reach the same normal form. But this post is getting quite long enough, so I won't prove that -- I'll just assert it. :)

I will, however, offer one last reward for those of you who persevered to the end. Let's add unrestricted recursive types to our language: $$ \inferrule[\MuI] {\judge{\Delta}{\Gamma}{e}{[\fix{\alpha}{A}/\alpha]A}} {\judge{\Delta}{\Gamma}{\fold{e}}{\fix{\alpha}{A}}} \qquad \inferrule[\MuE] {\judge{\Delta}{\Gamma}{e}{\fix{\alpha}{A}}} {\judge{\Delta}{\Gamma}{\unfold{e}}{[\fix{\alpha}{A}/\alpha]A}} $$

The $\beta$-rule is the obvious one you might expect: $$ \begin{array}{lcl} \unfold{(\fold{e})} & \mapsto & e \end{array} $$

And let's extend the size function, too: $$ \begin{array}{lcl} \size{\fold{e}} & = & 1 + \size{e} \\ \size{\unfold{e}} & = & \size{e} \end{array} $$

Now, note that all of the lemmas we used continue to hold. So we can add unrestricted recursive types, with no restrictions on negative occurences of type variables, to second-order MALL and the language is still strongly normalizing!

Another way of putting things is that contraction (i.e., using variables more than once) is essential to programming things like the Y combinator. Without it, recursive types are not enough to enable dangerous self-references like Russell's paradox.

Acknowledgements. I'd like to thank Stéphane Gimenez for directing me to Girard's small normalization theorem.

Monday, April 29, 2013

John C. Reynolds, June 1, 1935 - April 28, 2013

Yesterday, John Reynolds passed away. I had the privilege of being one of his graduate students, and much of what I know about what it means to be a scientist is due to his example. I just want to share a few small anecdotes about him.

  1. When I was a new graduate student, I began working on the project which would eventually become my thesis. When I described one of my early approaches to this problem, he asked a series of questions about how I would handle one sticky technical point after another, and I showed him how the language and its semantics were carefully set up so that this issue in question could not arise. After I finished my explanation, he nodded thoughtfully, and told me, "Neel, you've done some very good engineering, but engineering is not science! Good engineers develop a sense of how to avoid obstacles -- but a scientist's job is to face them head-on and flatten them."

    This made a really big impression on me. As academic computer scientists, we have two great luxuries: we can choose what to study, and we can spend as long as necessary studying it. As a result, our ethical obligation is one step removed from solving problems: our role is to produce understanding, so that engineers (indeed, all of society) have new methods to solve new classes of problems.

  2. Once, John told me about some of his work at Argonne National Laboratory, where he designed the COGENT programming language, which he told me was what convinced him that he should leave physics to do computer science. He said that while today he could (indeed, many people could) design much better languages, he would always regard it fondly, since it was his first successful language design.

    I asked him what made it successful, and he told me that there was another scientist at Argonne, who used it to write some simulation code. When John read the program listing, he realized that he could not understand how the program worked, since it relied domain knowledge that he lacked. So this was John's definition of a successful language design: if you have a user who has used it to write a program you couldn't have, your language has succeeded, since it has helped a fellow human being solve one of their own particular problems.

    I've always liked his definition, since it manages to avoid an obsession with nose-counting popularity metrics, while still remembering the essentially social purpose of language design.

  3. Another time, I asked John what he thought about object-oriented programming. He told me he thought it was too soon to tell. I burst out laughing, since I thought he was joking: Simula 67 is considerably older than I am, and I thought that surely that was sufficient time to form an opinion.

    But he insisted that, no, he was not kidding! From a formal, mathematical, perspective, he did not like object-oriented programming very much: he had studied various formalizations of objects in his work on Idealized Algol and on intersection types, and found they introduced some unwanted complexities. However, so many excellent engineers still thought that objects were a good idea, that he was unwilling to dismiss the possibility that there was a beautiful mathematical theory that we haven't discovered yet.

    I thought this was a great demonstration of Keats's negative capability, but what makes it really impressive was that John married it to a profound intellectual rigour: I've never known anyone who could produce counterexamples faster than John could. While he tested every idea ruthlessly, he never closed himself off from the world, giving him a rare ability to make judgements without passing judgement.

I haven't really said anything about John's enormous scientific achievements, because the deepest lesson I learned from him is not any one of his theorems, but rather his fundamentally humane vision of science, as something we do for and with our fellow scientists and community.

Ars longa, vita brevis, but together we make a chain through the ages --- and more links of that chain than most can claim were forged by you, John.

Wednesday, April 10, 2013

Thanks to Olle Fredriksson

Joshua and I thought our new typechecking algorithm for higher-rank polymorphism was easy to implement, but I have to admit that we weren't 100% sure about this, since we've been immersed in this work for months.

However, Olle Fredriksson wrote us within one day of the draft going online with a link to his Haskell implementation. This was immensely reassuring, since (a) he implemented it so quickly, and (b) he didn't need to ask us how to implement it -- this means we didn't leave anything important out in the description!

In fact, Olle is technically the first person to implement the algorithm in the paper, since both Joshua and I have actually implemented variants of this algorithm for different languages. I'll try to write a small ML implementation to go along with Olle's Haskell code in the next few weeks. If you want to beat me to it, though, I have no objections!

Thursday, April 4, 2013

Spring Things

Part of the reason for the silence on this blog is that it's been a very busy spring for me. I've got three new drafts to announce, each of which I played at least some role in:
  • Complete and Easy Bidirectional Typechecking for Higher-Rank Polymorphism, Joshua Dunfield and Neelakantan R. Krishnaswami.
    Bidirectional typechecking, in which terms either synthesize a type or are checked against a known type, has become popular for its scalability (unlike Damas-Milner type inference, bidirectional typing remains decidable even for very expressive type systems), its error reporting, and its relative ease of implementation. Following design principles from proof theory, bidirectional typing can be applied to many type constructs. The principles underlying a bidirectional approach to polymorphism, however, are less obvious. We give a declarative, bidirectional account of higher-rank polymorphism, grounded in proof theory; this calculus enjoys many properties such as η-reduction and predictability of annotations. We give an algorithm for implementing the declarative system; our algorithm is remarkably simple and well-behaved, despite being both sound and complete.

    In this paper, we show how to extend bidirectional typechecking to support higher-rank polymorphism. Bidirectional typechecking is attractive in part because it is very easy to implement, and happily our extension remains simple -- it turns out the fanciest data structure you need to support higher-rank polymorphism is an ordered list!

  • Mtac: A Monad for Typed Tactic Programming in Coq, Beta Ziliani, Derek Dreyer, Neelakantan R. Krishnaswami, Aleksandar Nanevski, Viktor Vafeiadis.

    Effective support for custom proof automation is essential for large-scale interactive proof development. However, existing languages for automation via tactics either (a) provide no way to specify the behavior of tactics within the base logic of the accompanying theorem prover, or (b) rely on advanced type-theoretic machinery that is not easily integrated into established theorem provers.

    We present Mtac, a lightweight but powerful extension to Coq that supports dependently-typed tactic programming. Mtac tactics have access to all the features of ordinary Coq programming, as well as a new set of typed tactical primitives. We avoid the need to touch the trusted kernel typechecker of Coq by encapsulating uses of these new tactical primitives in a monad, and instrumenting Coq so that it executes monadic tactics during type inference.

    You can find the software at the Mtac web page.

    Since I wasn't the lead author, I see no need to be modest: this is really, really, good. The power-to-weight ratio of this approach to tactics is incredibly high, and I seriously think that every dependently typed language implementor ought to consider adding something like Mtac to their system.

  • Higher-Order Reactive Programming without Spacetime Leaks, Neelakantan R. Krishnaswami.

    Functional reactive programming (FRP) is an elegant approach to declaratively specify reactive systems. However, the powerful abstractions of FRP have historically made it difficult to predict and control the resource usage of programs written in this style.

    In this paper, we give a new language for higher-order reactive programming. This language generalizes and simplifies prior type systems for reactive programming, supporting the use of first-class streams, such as streams of streams; first-class functions and higher-order operations; and permits encoding many temporal operations beyond streams, such as terminatable streams, events, and even resumptions with first-class schedulers. Furthermore, our language supports an efficient implementation strategy permitting us to eagerly deallocate old values and statically rule out spacetime leaks, a notorious source of inefficiency in reactive programs. Furthermore, these memory guarantees are achieved without the use of a complex substructural type discipline.

    We also show that our implementation strategy of eager deallocation is safe, by showing the soundness of our type system with a novel step-indexed Kripke logical relation.

    You can download an almost completely undocumented implementation to play with this kind of FRP language. It's also good if you want to see what a language with linear types looks like. Also, I use the type inference algorithm described in the first draft. This doesn't have dependent types (yet?), so no Mtac, though. :)

Thursday, March 7, 2013

Interview with John Reynolds

I've been ridiculously busy the last few months, so this is just a quick link: John retired recently, so I should probably post something in his honor. When I have time, I'll post a little something about some of John's work (with Peter O'Hearn) on how to use functor-category semantics as a guide to programmming with polymorphism. If you don't want to wait, you can look at their paper From Idealized Algol to Polymorphic Linear Lambda Calculus.

Monday, December 3, 2012

Total Functional Programming in a Partial Impure Language

In my last post, I showed how you could implement the Int construction over resumptions, to derive a higher-order programming language from a first-order one. However, this is not yet a practical tool. The code I gave presented the language as a bunch of combinators, which rapidly get quite painful to write serious programs. Most of us prefer the lambda calculus to SKI combinators for good reason --- programs with variables and binders are much easier to read, since the code that actually does work in not obscured in a rat's nest of plumbing combinators that route data from one combinator to another.

I had hoped to rectify that in this post, by showing how to compile the linear lambda calculus into combinator terms. But when I started to write it, I realized that I needed to write a warmup post first, which would introduce the basic ideas in a simpler setting. So I'll defer compiling linear types for a little while, and show how to embed total functional programming into OCaml.

Wednesday, November 28, 2012

The Geometry of Interaction, as an OCaml program

In this post, I'll show how to turn the Geometry of Interaction construction --- a model of linear logic --- into a runnable Ocaml code. . The basic idea behind this paper is that we can view higher-order behavior, as a disciplined mode of use of a first-order language with recursion --- basically, if you fix a calling convention, then you can compile first-class functions directly and compositionally into a first-order program.

This is interesting because there are a lot of combinator libraries out there which can be seen as DSLs with a natural notion of "first-order program", but which lack a good notion of "higher-order program". (For example, most Haskell libraries based on arrows qualify.) As a programming model, this is very unsatisfying, since this restriction basically says that the author of the library can build interesting abstractions, but clients of the library can't. Wouldn't it be nice if there were a systematic way to turn any first-order language (with a bit of structure) into a higher-order one?

That's exactly what the G construction (also called the Int construction) does. It says that if you have a first-order language with sequential and parallel composition, plus feedback, then you've really got a higher-order language. Stated more formally, in categorical jargon, it says that there is a way of constructing a monoidal closed category from any symmetric monoidal category with a trace operator. (Monoidal products mean parallel composition, categories give you sequential composition, and trace is how category theorists pronounce recursion.)

Instead of giving the categorical derivation, I'll work very concretely, following the same example Abramsky gave in his classic paper Retracing Some Paths in Process Algebra.

First, let's get some Ocaml preliminaries out of the way. I'm just defining function composition and empty and sum types, and their operations. These should really be in the Ocaml Pervasives environment, but they aren't. The Haskell prelude is nicer in this regard.

let ($) f g x = f (g x)

type ('a,'b) sum = Inl of 'a | Inr of 'b
let inl x = Inl x
let inr y = Inr y
let case f g = function Inl x -> f x | Inr y -> g y
let sum f g = case (inl $ f) (inr $ g)
let swap v = case inr inl v 

type zero = Void of zero
let rec abort (Void v) = abort v

Our running example will come from dataflow and visual programming languages like Simulink. These languages have a model of programming in which you have a bunch of stateful components (adders, multipliers, accumulators), which are connected together by means of wires. This is naturally a first-order formalism, in that it really only makes sense to send data over the wires --- you can't send components over the wire. One way of modeling these kinds of stateful components is by means of resumptions.

A resumption from $I$ (read "input") to $O$ (read "output") is essentially an element of the solution to the domain equation $R = I \to (O \times R)$. You can think of a gadget of this type as something which takes an input, and then gives you an output, plus a new resumption to use on the next input. So it's a bit like a state machine, but formulated in such a way that you don't have to give an explicit state set. It's easy to turn this domain equation into a recursive ML type:

module Resumption =
struct 

  type ('i,'o) r = R of ('i -> 'o * ('i,'o) r)

The identity resumption just echoes its input as its output.

  (* val id : ('a, 'a) r *)
  let rec id = R(fun x -> (x, id))

To compose two resumptions $f$ and $g$, we just feed an input into $f$, and take $f$'s output as the input into $g$.

  (* val ( >> ) : ('a, 'b) r -> ('b, 'c) r -> ('a, 'c) r *)
  let rec (>>) (R f) (R g) = R(fun x ->
    let (y, f') = f x in
    let (z, g') = g y in
    (z, f' >> g'))

Note that with composition, we have a category $R$, the category of resumptions. The objects are types, a morphisms $f : I \to O$ is a resumption taking $I$ inputs and producing $O$ outputs.

This category is also "monoidal closed", with the tensor product $A \otimes C$ getting defined as the sum type $A + C$. The intuition is that if we think of $A$ and $C$ as message types, we may want to consider how to react to $A$ and $C$ messages simultaneously. In particular, if we have two machines $f : A \to B$ and $g : C \to D$, we can combine them by taking an $A+C$ message and dispatching to either $f$ or $g$ respectively.

  (* val ( ** ) : ('a, 'b) r -> ('c, 'd) r -> (('a, 'c) sum, ('b, 'd) sum) r *)
  let rec ( ** ) (R f) (R g) = R(function
    | Inl x -> let (y, f') = f x in (inl y, f' ** (R g))
    | Inr x -> let (y, g') = g x in (inr y, (R f) ** g'))

We can also implement feedback --- if we have an $f : A \otimes B \to C \otimes B$, then we can construct a map $A \to C$ by feeding $f$ the $A$, and repeatedly sending the output back into the input until $f$ coughs up a $C$ rather than a $B$:

  (* val trace : (('a, 'b) sum, ('c, 'b) sum) r -> ('a, 'c) r *)
  let rec trace f = R(fun a ->
    let rec loop (R f) v =
      match f v with
      | (Inl c, f') -> (c, trace f')
      | (Inr b, f') -> loop f' (inr b)
    in
    loop f (inl a))

Now, we can also implement various associativity and commutativity properties. The names are not particularly great (they are just the standard names for the natural transformations associated with a tensor product in category theory), but all the real action is in the types.

  (* val sym : (('a, 'b) sum, ('b, 'a) sum) r *)
  let rec sym = R(fun v -> (swap v, sym))
  
  (* val rho : (('a, zero) sum, 'a) r *)
  let rec rho : (('a,zero) sum, 'a) r = R(function
    | Inl a -> (a, rho)
    | Inr z -> abort z)

  (* val rho' : ('a, ('a, zero) sum) r *)
  let rec rho' : ('a, ('a, zero) sum) r = R(fun a -> (inl a, rho'))

  (* val lambda : ((zero, 'a) sum, 'a) r *)
  let rec lambda : ((zero,'a) sum, 'a) r = R(function
    | Inl z -> abort z
    | Inr a -> (a, lambda))

  (* val lambda' : ('a, (zero, 'a) sum) r *)
  let rec lambda' : ('a, (zero, 'a) sum) r = R(fun a -> (inr a, lambda'))

  (* val alpha : ((('a, 'b) sum, 'c) sum, ('a, ('b, 'c) sum) sum) r *)
  let rec alpha : ((('a,'b) sum, 'c) sum, ('a, ('b,'c) sum) sum) r = R(function
    | Inl (Inl a) -> (inl a, alpha)
    | Inl (Inr b) -> (inr (inl b), alpha)
    | Inr c       -> (inr (inr c), alpha))
 
  (* val alpha' : (('a, ('b, 'c) sum) sum, (('a, 'b) sum, 'c) sum) r *)
  let rec alpha' : (('a, ('b,'c) sum) sum, (('a,'b) sum, 'c) sum) r = R(function
    | Inl a       -> (inl (inl a), alpha')
    | Inr (Inl b) -> (inl (inr b), alpha')
    | Inr (Inr c) -> (inr c, alpha'))
end 

Now, we give the G construction in terms of the resumption module we've just defined. In the G construction, we build a new category $G(R)$ from the old one we've just defined. The basic idea here is to talk about bidirectional communication, and so for objects in our new category $G$ will be pairs of objects from our old one.

Objects of $G$ will be pairs $(A^+, A^-)$, where you can think of $A^+$ as a type of messages sent out, and the type $A^-$ as the type of messages that are accepted.

A morphisms of $G$, $f : (A^+,A^-) \to (B^+, B^-)$ will be resumption of type $A^+ \otimes B^- \to A^- \otimes B^+$. Note that the polarities of $A$ and $B$ are flipped in this mapping -- we are transforming $A$ messages out to $B$ messages out, and we need to accept $B$ messages in order to transform them into $A$ messages in.

module G =
struct
  open Resumption

Now, let's define the type of morphisms in $G(R)$. Since we don't have type-level pairing in ML, we end up having four type parameters. We use a dummy G constructor to get the type system to produce more useful inferred types.

  type ('aplus, 'aminus, 'bplus, 'bminus) map =
      G of (('aplus, 'bminus) sum, ('aminus, 'bplus) sum) r

The identity map in $G(R)$ is just the symmetry map in $R$.

  let gid = G sym 
Composition is trickier. If we have a map $f : (A^+,A^-) \to (B^+, B^-)$ and $g : (B^+,B^-) \to (C^+,C^-)$, we can compose it by feeding $f$'s $B$-output into $g$'s input, and vice-versa, using the trace operator. It's best visualized with the following diagram: Layer 1 f g A A B B B B C C + + + + - - - -

Actually doing so requires applying the associativity and commutativity properties of sum types $A + B \simeq B + A$ and $(A + B) + C \simeq A + (B + C)$.

  let rec assoc : ((('aplus,'cminus) sum, ('bminus,'bplus) sum) sum,
                  (('aplus, 'bminus) sum, ('bplus, 'cminus) sum) sum) r =
    R(function
      | Inl (Inl aplus)  -> (inl (inl aplus), assoc)
      | Inl (Inr cminus) -> (inr (inr cminus), assoc)
      | Inr (Inl bminus) -> (inl (inr bminus), assoc)
      | Inr (Inr bplus)  -> (inr (inl bplus), assoc))

  let rec assoc2 : ((('aminus, 'bplus) sum, ('bminus, 'cplus) sum) sum,
                   (('aminus, 'cplus) sum, ('bminus, 'bplus) sum) sum) r  =
    R(function
      | Inl (Inl aminus) -> (inl (inl aminus), assoc2)
      | Inr (Inr cplus)  -> (inl (inr cplus), assoc2)
      | Inl (Inr bplus)  -> (inr (inr bplus), assoc2)
      | Inr (Inl bminus) -> (inr (inl bminus), assoc2))

  (* val gcompose :
      ('a, 'b, 'c, 'd) map -> ('c, 'd, 'e, 'f) map -> ('a, 'b, 'e, 'f) map *)
  let gcompose (G f) (G g) = G (trace (assoc >> (f ** g) >> assoc2))

Once we have composition, we can define a tensor product on the $G(R)$ category as well, so that $(A^+,A^-) \oplus (B^+,B^-) = (A^+ + B^+, A^- + B^-)$. Again, implementing the functorial action requires implementing the associativity maps.

  (* (A+, A-) * (X+, X-) = (A+ + X+, A- + A-) 
     f : A+ + B- -> A- + B+
     g : X+ + Y- -> X- + Y+
     f ** g : (A+ + B-) + (X+ + Y-) -> (A- + B+) + (X- + Y+)
     r  : (A+ + X+) + (B- + Y-) -> (A+ + B-) + (X+ + Y-)
     r' : (A- + B+) + (X- + Y+) -> (A- + X-) + (B+ + Y+)

  *)

  (* val ( + ) :
      ('a, 'b, 'c, 'd) map ->
      ('e, 'f, 'g, 'h) map ->
      (('a, 'e) sum, ('b, 'f) sum, ('c, 'g) sum, ('d, 'h) sum) map *)
  let (+) (G f) (G g) =
    let rec r = R(function
      | Inl (Inl aplus)  -> (inl (inl aplus), r)
      | Inl (Inr xplus)  -> (inr (inl xplus), r)
      | Inr (Inl bminus) -> (inl (inr bminus), r)
      | Inr (Inr yminus) -> (inr (inr yminus), r))
    in
    let rec r' = R(function
      | Inl (Inl aminus) -> (inl (inl aminus), r')
      | Inl (Inr bplus)  -> (inr (inl bplus),  r')
      | Inr (Inl xminus) -> (inl (inr xminus), r')
      | Inr (Inr yplus)  -> (inr (inr yplus),  r'))
    in
    G (r >> f ** g >> r')

One cool thing about this category (which leads to the coolest fact about it) is that it naturally supports a dualization, which sends $(A^+,A^-)^\ast \triangleq (A^-,A^+)$. This also has an action on morphisms, which flips around the direction, so that from $f : (A^+,A^-) \to (B^+,B^-)$ we can get $f^\ast : (B^+,B^-)^\ast \to (A^+,A^-)^\ast$.

  (* val dualize : ('a, 'b, 'c, 'd) map -> ('d, 'c, 'b, 'a) map *)
  let dualize (G (R f)) =
    let rec dual f = R(fun v ->
      let (v', f') = f (swap v) in
      (swap v', dual f))
    in
    G (dual f)

Now, here's what I consider the neatest fact about the Int construction -- it turns a traced monoidal category into a monoidal closed category. So we can define the linear exponential $(A^+,A^-) \multimap (B^+,B^-) \triangleq (A^+,A^-)^\ast \otimes (B^+,B^-)$.

 (* Here's a bit of unrolling type definitions to make things easier to grok. 

    (A+, A-) -o (B+, B-) = (A+, A-)* * (B+, B-)
                         = (A-, A+) * (B+, B-)
                         = (A- + B+, A+ + B-)
    
    Hom((A+,A-) * (B+,B-), (C+, C-)) 
        = Hom((A+ + B+, A- + B-), (C+, C-))
        = ((A+, B+) sum, (A- + B-) sum, C+, C-) map 
        = (((A+, B+) sum, C+) sum, ((A- + B-) sum, C-) sum) r

    Hom((A+,A-), (B+,B-) -o (C+,C-)) 
        = Hom((A+,A-), (B- + C+, B+ + C-))
        = (A+, A-, (B-, C+) sum, (B+, C-) sum) map 
        = ((A+, (B+, C-) sum) sum, (A-, (B-, C+) sum) sum) r 
 *)

  (* val curry :
      (('a, 'b) sum, ('c, 'd) sum, 'e, 'f) map ->
      ('a, 'c, ('d, 'e) sum, ('b, 'f) sum) map *)
  let curry (G f) =
    let rec curry (R f) = R(fun v ->
      let (v', f') = f (case (inl $ inl) (case (inl $ inr) inr) v) in
      let v'' = case (case inl (inr $ inl)) (inr $ inr) v' in
      (v'', curry f'))
    in
    G(curry f)

  (* val uncurry :
      ('a, 'b, ('c, 'd) sum, ('e, 'f) sum) map ->
      (('a, 'e) sum, ('b, 'c) sum, 'd, 'f) map *)
  let uncurry (G f) =
    let rec uncurry (R f) = R(fun v ->
      let (v', f') = f (case (case inl (inr $ inl)) (inr $ inr) v) in
      let v'' = case (inl $ inl) (case (inl $ inr) inr) v' in
      (v'', uncurry f'))
    in
    G (uncurry f)
end

The ML types for curry and uncurry somewhat obscure the fact that they implement the isomorphism $$\mathrm{Hom}((A^+,A^-) \otimes (B^+,B^-), (C^+,C^-)) \simeq \mathrm{Hom}((A^+,A^-), (B^+,B^-) \multimap (C^+,C^-))$$

However, now that we have this isomorphism, it's possible to compile arbitrary lambda-terms in multiplicative linear logic down into compositions of resumptions. But that's a story for another post!

In the meantime, if this intrigues you, you can play with the Verity compiler by Dan Ghica and his students, as well as the IntML compiler by Ulrich Schöpp and Ugo Dal Lago, both of which are compilers using the GoI interpretation. Verity is aimed at distributed computing, and IntML is a programming language whose type system allows you to write only (and all) the LOGSPACE programs. As you can see, there are a lot of potential applications...