Friday, December 20, 2013

PhD Opportunities at the University of Birmingham

My university, the University of Birmingham, is looking for applicants to the CS PhD program. I'm putting our advertisement on my blog, in case you (or your students, if you're a professor) are looking for a graduate program -- well, we're looking for students!

Let me just say that my colleagues are a terrific bunch, who know and do work on all kinds of fantastic things. I am very happy to be able to work with them, and can confirm that this is a very exciting group to be a part of.


We invite applications for PhD study at the University of Birmingham.

We are a group of (mostly) theoretical computer scientists who explore fundamental concepts in computation and programming language semantics. This often involves profound and surprising connections between different areas of computer science and mathematics. From category theory to lambda-calculus and computational effects, from topology to constructive mathematics, from game semantics to program compilation, this is a diverse field of research that continues to provide new insight and underlying structure.

  • See our webpage, with links to individual researchers, here:

    http://www.cs.bham.ac.uk/research/groupings/theory/

  • Information about PhD applications may be found here:

    http://www.cs.bham.ac.uk/admissions/postgraduate-research/

  • If you are considering applying, please contact any of us. We will be very happy to discuss the opportunities available.
    • Martin Escardo (topology, computation with infinite objects, constructive mathematics, intuitionistic type theory)
    • Dan Ghica (game semantics, heterogeneous computing, model checking)
    • Achim Jung (mathematical structures in the foundations of computing: logic, topology, order)
    • Neel Krishnaswami (type theory, verification, substructural logic, interactive computation)
    • Paul Levy (denotational semantics, lambda-calculus with effects, nondeterminism, category theory, game semantics)
    • Uday Reddy (semantics of state, separation logic)
    • Eike Ritter (security protocol verification)
    • Hayo Thielecke (abstract machines, concurrent and functional programming, software security)
    • Steve Vickers (constructive mathematics and topology, category theory and toposes)

Thursday, November 7, 2013

Antimirov Derivatives for Regular Expressions

Brzozowski derivatives are one of the shibboleths of functional programming: if you ask someone about implementing regular expressions, and you get back an answer involving derivatives of regular expressions, then you have almost surely identified a functional programmer.

The reason that functional programmers like derivatives so much is that they offer an elegantly algebraic and inductive approach to string matching. The derivative function $\delta_c$ takes a character $c$ and a regular expression $r$, and returns a new regular expression $r'$, which accepts a string $s$ if and only if $r$ accepts $c\cdot s$. This function can be defined as follows: $$ \begin{array}{lcl} \delta_c(c) & = & \epsilon \\ \delta_c(c') & = & \bot \qquad \mbox{if } c' \neq c \\ \delta_c(\epsilon) & = & \bot \\ \delta_c(r_1 \cdot r_2) & = & \left\{ \begin{array}{ll} \delta_c(r_1) \cdot r_2 \vee \delta_c(r_2) & \mbox{if } r_1 \mbox{ nullable} \\ \delta_c(r_1) \cdot r_2 & \mbox{otherwise} \end{array} \right. \\ \delta_c(\bot) & = & \bot \\ \delta_c(r_1 \vee r_2) & = & \delta_c(r_1) \vee \delta_c(r_2) \\ \delta_c(r\ast) & = & \delta_c(r)\cdot r\ast \end{array} $$

Now we can turn the 1-character derivative into a word derivative as follows: $$ \begin{array}{lcl} \delta_\epsilon(r) & = & r \\ \delta_{c\cdot w}(r) & = & \delta_w(\delta_c(r)) \end{array} $$

Again, this is a simple inductive definition. So to match a string we just call the 1-character derivative for each character, and then check to see if the final regular expression is nullable. However, there's a catch! As you can see from the definition of $\delta_c$, there is no guarantee that the size of the derivative is bounded. So matching a long string can potentially lead to the construction of very large derivatives. This is especially unfortunate, since it is possible (using DFAs) to match strings using regular expressions in constant space.
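
To see the blow-up concretely, here is a small worked example, obtained just by unfolding the definition above with no simplification: $$ \begin{array}{lcl} \delta_a(a\ast \cdot a\ast) & = & (\epsilon \cdot a\ast)\cdot a\ast \vee \epsilon \cdot a\ast \\ \delta_a(\delta_a(a\ast \cdot a\ast)) & = & ((\bot \cdot a\ast \vee \epsilon \cdot a\ast)\cdot a\ast \vee \epsilon \cdot a\ast) \vee (\bot \cdot a\ast \vee \epsilon \cdot a\ast) \end{array} $$ Each further derivative by $a$ piles up more alternations, even though every one of these expressions is semantically just $a\ast$.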

In his 1962 paper, Brzozowski showed that the number of derivatives of a regular expression is finite, if you quotient by the associativity, commutativity, unit, and idempotence (ACUI) axioms for union $r_1 \vee r_2$. That is, suppose we consider regular expressions modulo an equivalence relation $\equiv$ such that: $$ \begin{array}{lcl} r \vee r & \equiv & r \\ r \vee \bot & \equiv & r \\ r \vee r' & \equiv & r' \vee r \\ r_1 \vee (r_2 \vee r_3) & \equiv & (r_1 \vee r_2) \vee r_3 \\ \end{array} $$

Then his theorem guarantees that the set of derivatives is finite. However, computing with derivatives up to this equivalence is rather painful. Even computing equality and ordering modulo these axioms is tricky, since unions can occur anywhere inside a regular expression, and without a cheap equality and ordering it's difficult to implement higher-level data structures such as sets of regular expressions.

So for the most part, derivatives have remained a minor piece of functional programming folklore: they are cute, but a good implementation of derivatives is not really simple enough to displace the usual Dragon Book automata constructions.

However, in 1995 Valentin Antimirov introduced the notion of a partial derivative of a regular expression. The idea is that if you have a regular expression $r$ and a character $c$, then a partial derivative is a regular expression $r'$ such that if $r'$ accepts a word $s$, then $r$ accepts $c \cdot s$. Unlike a derivative, this is only an if-then relationship, and not an if-and-only-if relationship.

Then, rather than taking regular expressions modulo the ACUI equations, we can construct sets of partial derivatives, which collectively accept the same strings as the Brzozowski derivative. The idea is that we can use the algebraic properties of sets to model the effect of the ACUI equations. Below, I give the partial derivative function: $$ \begin{array}{lcl} \alpha_c(c) & = & \{\epsilon\} \\ \alpha_c(c') & = & \emptyset \qquad \mbox{if } c' \neq c \\ \alpha_c(\epsilon) & = & \emptyset \\ \alpha_c(r_1 \cdot r_2) & = & \left\{ \begin{array}{ll} \{\, r \cdot r_2 \mid r \in \alpha_c(r_1) \,\} \cup \alpha_c(r_2) & \mbox{if } r_1 \mbox{ nullable} \\ \{\, r \cdot r_2 \mid r \in \alpha_c(r_1) \,\} & \mbox{otherwise} \end{array} \right. \\ \alpha_c(\bot) & = & \emptyset \\ \alpha_c(r_1 \vee r_2) & = & \alpha_c(r_1) \cup \alpha_c(r_2) \\ \alpha_c(r\ast) & = & \{\, r' \cdot r\ast \mid r' \in \alpha_c(r) \,\} \end{array} $$

Partial derivatives can be lifted to words just as ordinary derivatives can be, and it is relatively easy to prove that the set of partial word derivatives of a regular expression is finite. We can show this even without taking a quotient, and so partial derivatives lend themselves even more neatly to an efficient implementation than Brzozowski derivatives do.
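
For instance, taking the regular expression $a\ast \cdot a\ast$ and unfolding the definition of $\alpha_a$ (no simplification needed this time): $$ \begin{array}{lcl} \alpha_a(a\ast \cdot a\ast) & = & \{(\epsilon \cdot a\ast)\cdot a\ast,\; \epsilon \cdot a\ast\} \\ \alpha_a((\epsilon \cdot a\ast)\cdot a\ast) & = & \{(\epsilon \cdot a\ast)\cdot a\ast,\; \epsilon \cdot a\ast\} \\ \alpha_a(\epsilon \cdot a\ast) & = & \{\epsilon \cdot a\ast\} \end{array} $$ So every partial word derivative of $a\ast \cdot a\ast$ is drawn from a fixed three-element set of expressions, with no quotienting required.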

I'll illustrate this point by using Antimirov derivatives to construct a DFA-based regular expression matcher in ~50 lines of code. You can find the Ocaml source code here.

First, let's define a datatype for regular expressions.

type re = C of char | Nil | Seq of re * re 
        | Bot | Alt of re * re | Star of re

So C is the constructor for single-character regular expressions, Nil and Seq correspond to $\epsilon$ and $r\cdot r'$, Bot and Alt correspond to $\bot$ and $r \vee r'$, and Star is the Kleene star $r\ast$.

Next, we'll define the nullability function, which returns true if the regular expression accepts the empty string, and false if it doesn't. It's the obvious recursive definition.

let rec null = function
  | C _ | Bot -> false
  | Nil | Star _ -> true
  | Alt(r1, r2) -> null r1 || null r2
  | Seq(r1, r2) -> null r1 && null r2
Now comes some scaffolding code. The R module is the type of finite sets of regular expressions, the M module is finite maps keyed by sets of regexps, and I is the type of finite sets of ints. The rmap function maps a function over a set of regexps, building a new set of regexps. (I don't know why this is not in the standard Set signature.)
module R = Set.Make(struct type t = re let compare = compare end)
let rmap f rs = R.fold (fun r -> R.add (f r)) rs R.empty

module M = Map.Make(R)
module I = Set.Make(struct type t = int let compare = compare end)
The aderiv function implements the Antimirov derivative function $\alpha_c$ described above. It's basically just a direct transcription of the mathematical definition into Ocaml. The deriv function applies the derivative to a whole set of regular expressions and takes the union.
let rec aderiv c = function
  | C c' when c = c' -> R.singleton Nil 
  | C _ | Nil | Bot -> R.empty
  | Alt(r, r') -> R.union (aderiv c r) (aderiv c r')
  | Seq(r1, r2) -> R.union (rmap (fun r1' -> Seq(r1', r2)) (aderiv c r1))
                           (if null r1 then aderiv c r2 else R.empty)
  | Star r -> rmap (fun r' -> Seq(r', Star r)) (aderiv c r)

let deriv c rs = R.fold (fun r acc -> R.union (aderiv c r) acc) rs R.empty
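
As an aside, deriv is already enough to match strings directly, with no DFA construction at all: keep a set of regular expressions as the current state and step it through the string. The function below (which I'll call direct_match; it is a sketch, not part of the linked source code) accepts if any member of the final set is nullable.

(* Sketch of a direct, NFA-style matcher: the state is a set of regexps,
   advanced one character at a time with deriv. *)
let direct_match r s =
  let rec go rs i =
    if i < String.length s then go (deriv s.[i] rs) (i+1)
    else R.exists null rs
  in go (R.singleton r) 0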

Since the set of partial derivatives is finite, this means that the powerset of this set is also finite, and so by treating sets of partial derivatives as states, we can construct a deterministic finite-state automaton for matching regular expressions. Let's define an Ocaml type for DFAs:

type dfa = {size : int; fail : int; 
            trans : (int * char * int) list; final : int list}

Here, size is the number of states, and we'll use integers in the range [0,size) to label the states. We'll use fail to label the sink state for non-matching strings, and take trans to be the list of transitions. The final field is a list of accepting states for the DFA.

Now, we'll need a little more scaffolding. The enum function is a "functional for-loop" looping from i to max. We use this to write charfold, which lets us fold over all of the ASCII characters.

let rec enum f v i max = if i < max then enum f (f i v) (i+1) max else v
let charfold f init  = enum (fun i -> f (Char.chr i)) init 0 256

We use this to define the dfa function, which constructs a DFA from a regular expression:

let find rs (n,m) = try M.find rs m, (n,m) with _ -> n, (n+1, M.add rs n m)

let dfa r =
  let rec loop s v t f rs =
    let (x, s) = find rs s in
    if I.mem x v then (s, v, t, f)
    else charfold (fun c (s, v, t, f) ->
                     let rs' = deriv c rs in
                     let (y, s) = find rs' s in
                     loop s v ((x,c,y) :: t) f rs')
           (s, I.add x v, t, if R.exists null rs then x :: f else f) in
  let (s, v, t, f) = loop (0, M.empty) I.empty [] [] (R.singleton r) in
  let (fail, (n, m)) = find R.empty s in 
  { size = n; fail = fail; trans = t; final = f }

The find function takes a set of regular expressions and returns a numeric index for it, allocating a fresh index if the set has not been seen before. To do this, it uses a state (n, m), where n is the counter for generating fresh indices, and m is the map we use to send sets of regular expressions to their indices.

The main work happens in the loop function. The s parameter is the state parameter for find, and v is the visited set storing the states we have previously visited. The t parameter is the list of transitions built to date, and f is the list of final states found so far.

The rs parameter is the current set of regular expressions. We first look up its index x, and if we have visited it, we return. Otherwise, we add the current state to the visited set (and to the final set if any of its elements are nullable), and iterate over the ASCII characters. For each character c, we take the derivative of rs and find its index y. Then, we add the transition (x, c, y) to t, and loop on the derivative. Essentially, this is a depth-first traversal of the DFA's state graph.

We then kick things off with an empty find state, visited set, transition list, and final-state list, together with the singleton set containing the argument regexp r, and build a DFA from the return value. Note that the failure state is simply the index corresponding to the empty set of partial derivatives. The initial state will always be 0, since the first state we find is the singleton set containing r.

Since we have labelled states by integers from 0 to size, we can easily build a table-based matcher from our dfa type.

type table = { m : int array array; accept : bool array; error : int }

let table d = 
  { error = d.fail;
    accept = Array.init d.size (fun i -> List.mem i d.final);
    m = (let a = Array.init d.size (fun _ -> Array.make 256 0) in
         List.iter (fun (x, c, y) -> a.(x).(Char.code c) <- y) d.trans; a) }

Here, the table type has a field m for the transition table, an array of booleans indicating whether the state accepts or not, and the error state. We build the table in the obvious way in the table function, by building an array of array of integers and initializing it using the list of transitions.

Then, we can give the matching function. The matches' function takes a table t, a string s, an index i, and a current state x. It will just trundle along updating its state, as long as we have not yet reached the end of the string (or hit the error state), and then it will report whether it ends in an accepting state or not. The re_match function just calls matches' at index 0, in state 0.

let rec matches' t s i x =
  if i < String.length s && x != t.error 
  then matches' t s (i+1) t.m.(x).(Char.code s.[i])
  else t.accept.(x)

let re_match t s = matches' t s 0 0
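
To tie things together, here is a small usage sketch (not part of the code above; the regexp is just an arbitrary example): build the table-based matcher for $(a \vee b)\ast \cdot c$ and try it on a couple of strings.

(* Example: a matcher for (a|b)*c. *)
let () =
  let r = Seq(Star (Alt (C 'a', C 'b')), C 'c') in
  let t = table (dfa r) in
  assert (re_match t "ababc");
  assert (not (re_match t "abab"))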

And that's it! Obviously more work is needed to make this a real regexp implementation, but the heart of the story is here. It's also possible to improve this implementation a bit further by adding a DFA minimization step, but that's a tale for another day. (DFA minimization is a very pretty little coinductive algorithm, and so makes a nice companion to the inductive story here.)

Tuesday, July 23, 2013

What Declarative Languages Are

On his blog, Bob Harper asks what, if anything, a declarative language is. He notes that "declarative" is often used to mean "logic or functional programming", and is (justly) skeptical that this is a useful pairing.

However, there's actually a surprisingly simple and useful definition of declarative language: a declarative language is any language whose semantics has some nontrivial existential quantifiers in it. To illustrate this definition, let's begin by looking at some examples of declarative languages:

  • Regular expressions
  • Context-free grammars
  • Database query languages based on relational algebra (eg, SQL)
  • Logic programming languages (eg, Prolog)
  • Constraint-based languages for layout (eg, CSS)

The common factor is that all of these languages have a semantics which relies on some existential quantifiers which it is not immediately obvious how to discharge. For example, the semantics of regular expressions can be explained in terms of string membership, by giving a judgement $w \in r$ meaning that the string $w$ is an inhabitant of the regular expression $r$. $$ \begin{array}{c} \frac{\displaystyle } {\displaystyle \cdot \in \epsilon} \\[1em] \frac{\displaystyle } {\displaystyle c \in c} \\[1em] \frac{\displaystyle w \in r_i \qquad i \in \{1,2\} } {\displaystyle w \in r_1 \vee r_2} \\[1em] \mbox{(no rule for $w \in \bot$)} \\[1em] \frac{\displaystyle \exists w_1, w_2.\; w = w_1 \cdot w_2 \qquad w_1 \in r_1 \qquad w_2 \in r_2 } {\displaystyle w \in r_1 \cdot r_2} \\[1em] \frac{\displaystyle } {\displaystyle \cdot \in r*} \\[1em] \frac{\displaystyle \exists w_1, w_2.\; w = w_1 \cdot w_2 \qquad w_1 \in r \qquad w_2 \in r* } {\displaystyle w \in r*} \end{array} $$

In particular, note the appearance of an existential quantifier in the premises of the sequential composition and Kleene star cases, and note the nondeterministic choice of a branch in the alternation case. So read as a logic program, this semantics is not well-moded.
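
To make this concrete, here is an illustrative checker (a sketch reusing the re datatype from the Antimirov post above, not something you would actually ship) that discharges the existentials by brute-force search: for sequential composition it tries every split of the word, and for Kleene star it tries every nonempty prefix, which does not change the language but keeps the search terminating.

(* Brute-force membership: the existentials in the declarative semantics
   become a search over split points of the word. *)
let rec mem w = function
  | C c -> w = String.make 1 c
  | Nil -> w = ""
  | Bot -> false
  | Alt(r1, r2) -> mem w r1 || mem w r2
  | Seq(r1, r2) ->
      let n = String.length w in
      let rec split i =
        i <= n && ((mem (String.sub w 0 i) r1 && mem (String.sub w i (n-i)) r2)
                   || split (i+1))
      in split 0
  | Star r ->
      w = "" ||
      (let n = String.length w in
       let rec split i =
         i <= n && ((mem (String.sub w 0 i) r && mem (String.sub w i (n-i)) (Star r))
                    || split (i+1))
       in split 1)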

Of course, to implement a regular expression matcher, you need an operational semantics which is well-moded and functional, which we get by constructing a finite state machine from the regular expression. But now we have a proof obligation to show that the operational semantics agrees with the declarative semantics.

We can make a similar distinction for each of the examples above. Context-free languages also have a declarative membership relation, but are recognized with parsing algorithms such as CYK and Earley parsing. Query languages have a declarative semantics in terms of the relational algebra, where relational composition is hiding an existential, but are implemented in terms of nested loops and indices. Logic programming has a declarative semantics in terms of the model theory of first-order logic, and an implementation in terms of backtracking and unification. Constraint languages have a declarative semantics in terms of simultaneous satisfaction of a collection of equations, with the existentials lurking in the values assigned to the free variables, but are implemented in terms of simplification and propagation through imperative graphs.

This also lets us make the prediction that the least-loved features of any declarative language will be the ones that expose the operational model, and break the declarative semantics. So we can predict that people will dislike (a) backreferences in regular expressions, (b) ordered choice in grammars, (c) row IDs in query languages, (d) cut in Prolog, (e) constraint priorities in constraint languages. And lo and behold, these are indeed the features which programmers are encouraged to avoid as much as possible!

This definition also lets us say that functional programming is not a declarative language -- the reduction relation of the lambda calculus is well-moded, in that we do not need to guess any part of a term to figure out how to reduce it. (And if you fix an evaluation order, the reduction relation is even deterministic.) The same is true for imperative languages like ML or Haskell: now we're just additionally threading a store through the reduction relation.

Wednesday, June 19, 2013

Internalizing Parametricity at CSL 2013

My paper with Derek Dreyer, Internalizing Relational Parametricity in the Extensional Calculus of Constructions, has been accepted to CSL 2013!

I've never been to Torino before, nor published at CSL. I especially like the fact that the CSL proceedings are Creative Commons-licensed.

Monday, June 10, 2013

Papers at ICFP 2013

All three of the papers my coauthors and I submitted this year were accepted! I'll add links, once we've revised the papers to take into account the comments and suggestions the reviewers made.

Tuesday, May 7, 2013

Strong Normalization without Logical Relations

Proof theorists often observe that linear logic is like regular logic, only everything works better.

In this post, I'll give a very striking instance of this maxim, by taking second-order multiplicative-additive linear logic (MALL) and showing how to prove strong normalization for it, without using a logical relation. That is, there is a purely inductive argument that a powerful second-order logic normalizes! Girard originally proved this for classical linear logic, using proof nets (the "small normalization theorem", Corollary 4.22, page 71 of the original Linear Logic paper), but it turns out that a variant works for lambda-terms and intuitionistic linear logic as well. $$ \newcommand{\lolli}{\multimap} \newcommand{\tensor}{\otimes} \newcommand{\with}{\&} \newcommand{\all}[2]{\forall {#1}.\;{#2}} \newcommand{\fix}[2]{\mu {#1}.\;{#2}} \newcommand{\letv}[3]{\mathsf{let}\,{#1} = {#2}\;\mathsf{in}\;{#3}} \newcommand{\Fun}[2]{\Lambda {#1}.\;{#2}} \newcommand{\fun}[2]{\lambda {#1}.\;{#2}} \newcommand{\unit}{\left(\right)} \newcommand{\letunit}[2]{\letv{\unit}{#1}{#2}} \newcommand{\pair}[2]{\left({#1},{#2}\right)} \newcommand{\letpair}[4]{\letv{\pair{#1}{#2}}{#3}{#4}} \newcommand{\unita}{\left[\right]} \newcommand{\paira}[2]{\left[{#1}, {#2}\right]} \newcommand{\fst}[1]{\mathsf{fst}\,{#1}} \newcommand{\snd}[1]{\mathsf{snd}\,{#1}} \newcommand{\inl}[1]{\mathsf{inl}\,{#1}} \newcommand{\inr}[1]{\mathsf{inr}\,{#1}} \newcommand{\case}[5]{\mathsf{case}({#1}, \inl{#2} \to {#3}, \inr{#4} \to {#5})} \newcommand{\abort}[1]{\mathsf{abort}\,{#1}} \newcommand{\fold}[1]{\mathsf{in}\,{#1}} \newcommand{\unfold}[1]{\mathsf{out}\,{#1}} \newcommand{\judge}[4]{{#1};{#2} \vdash {#3} : {#4}} \newcommand{\judgetp}[2]{{#1} \vdash {#2} : \mathsf{type}} \newcommand{\judgectx}[2]{{#1} \vdash {#2} : \mathsf{ctx}} %% Rule names \newcommand{\rulename}[3]{{#2}{\mathrm{#3}}_{\mathrm{#1}}} \newcommand{\intro}[2][]{\rulename{#1}{#2}{I}} \newcommand{\elim}[2][]{\rulename{#1}{#2}{E}} \newcommand{\Var}{\rulename{}{}{Var}} \newcommand{\AllI}{\intro{\forall}} \newcommand{\AllE}{\elim{\forall}} \newcommand{\LolliI}{\intro{\lolli}} \newcommand{\LolliE}{\elim{\lolli}} \newcommand{\UnitI}{\intro{1}} \newcommand{\UnitE}{\elim{1}} \newcommand{\TensorI}{\intro{\tensor}} \newcommand{\TensorE}{\elim{\tensor}} \newcommand{\TopI}{\intro{\top}} \newcommand{\TopE}{\elim{\top}} \newcommand{\WithI}{\intro{\with}} \newcommand{\WithEFst}{\elim[fst]{\with}} \newcommand{\WithESnd}{\elim[snd]{\with}} \newcommand{\ZeroI}{\intro{0}} \newcommand{\ZeroE}{\elim{0}} \newcommand{\SumIInl}{\intro[inl]{\oplus}} \newcommand{\SumIInr}{\intro[inr]{\oplus}} \newcommand{\SumE}{\elim{\oplus}} \newcommand{\MuI}{\intro{\mu}} \newcommand{\MuE}{\elim{\mu}} \newcommand{\size}[1]{\left|#1\right|} \newcommand{\inferrule}[3][]{\frac{#2}{#3}\;{#1}} $$

First, let's recall the type system for second-order MALL. $$ \begin{array}{llcl} \mbox{Types} & A & ::= & \all{\alpha}{A} \mid \alpha \mid A \lolli B \mid 1 \mid A \tensor B \mid \top \mid A \with B \mid 0 \mid A \oplus B \\[1em] \mbox{Type Contexts} & \Delta & ::= & \cdot \mid \Delta, \alpha \\[1em] \mbox{Contexts} & \Gamma & ::= & \cdot \mid \Gamma, x:A \\[1em] \mbox{Terms} & e & ::= & x \mid \Fun{\alpha}{e} \mid e\;A \mid \fun{x:A}{e} \mid e\;e \\ & & | & \unit \mid \letunit{e}{e'} \mid \pair{e}{e'} \mid \letpair{x}{y}{e}{e'}\\ & & | & \unita \mid \paira{e}{e'} \mid \fst{e} \mid \snd{e} \\ & & | & \inl{e} \mid \inr{e} \mid \case{e}{x}{e'}{y}{e''} \mid \abort{e} \end{array} $$

The simple types are linear functions $A \lolli B$, tensor products $A \tensor B$, Cartesian products $A \with B$, and coproducts $A \oplus B$. We also have quantified types $\all{\alpha}{A}$ and variable references $\alpha$. We have two kinds of contexts. $\Delta$ is the context of type variables, and $\Gamma$ gives types to term variables. Below, we give the judgement form for the well-formedness of types, $\judgetp{\Delta}{A}$, which says that $A$'s free variables all lie in $\Delta$. $$ \boxed{\judgetp{\Delta}{A}} \\ \inferrule[] {\alpha \in \Delta} {\judgetp{\Delta}{\alpha}} \\ \begin{array}{ll} \inferrule[] {\judgetp{\Delta, \alpha}{A}} {\judgetp{\Delta}{\all{\alpha}{A}}} & \inferrule[] {\judgetp{\Delta}{A} \\ \judgetp{\Delta}{B}} {\judgetp{\Delta}{A \lolli B}} \\[2em] \inferrule[] { } {\judgetp{\Delta}{1}} & \inferrule[] {\judgetp{\Delta}{A} \\ \judgetp{\Delta}{B}} {\judgetp{\Delta}{A \tensor B}} \\[2em] \inferrule[] { } {\judgetp{\Delta}{\top}} & \inferrule[] {\judgetp{\Delta}{A} \\ \judgetp{\Delta}{B}} {\judgetp{\Delta}{A \with B}} \\[2em] \inferrule[] { } {\judgetp{\Delta}{0}} & \inferrule[] {\judgetp{\Delta}{A} \\ \judgetp{\Delta}{B}} {\judgetp{\Delta}{A \oplus B}} \end{array} $$

Once we know how to assert that types are well-formed, we can lift this to a well-formedness judgement for contexts $\judgectx{\Delta}{\Gamma}$. $$ \boxed{\judgectx{\Delta}{\Gamma}} \\ \inferrule[] { } {\judgectx{\Delta}{\cdot}} \qquad \inferrule[] {\judgectx{\Delta}{\Gamma} \\ \judgetp{\Delta}{A}} {\judgectx{\Delta}{\Gamma, x:A}} $$

Now, we can give the typing rules for the terms of second-order MALL. The typing judgement $\judge{\Delta}{\Gamma}{e}{A}$ says that, assuming the free type variables are in $\Delta$ and the term variables have the types given by $\Gamma$, the term $e$ has type $A$.

Variable references $x$ are just looked up in the environment by the rule $\Var$. Note that the context contains only a single binding for the variable $x$. This is demanded by linearity, since if there were other variables in the context, the expression $x$ could not use them!

Type abstractions are introduced with the $\AllI$ rule, and simply typecheck the body of $\Fun{\alpha}{e}$ in a context extended by $\alpha$. A type application checked by $\AllE$ simply substitutes a type for a quantified variable. The linear functions have the usual introduction rule in $\LolliI$, and the application rule $\LolliE$ checks that the function has type $A \lolli B$ and the argument has type $A$. Note that the elimination rule splits the context into two parts, with one set of variables being used by the function and the other by the argument --- this enforces the linear use of variables.

We introduce units with the $\UnitI$ rule --- it requires no resources to create a unit, since it has no free variables in it. However, eliminating a unit in $\UnitE$ does split the context, since we may produce a unit-typed value from a term like $f\;x : 1$, where $f : P \lolli 1$ and $x : P$. Tensor products are introduced with the $\TensorI$ rule, which splits the variables in the context between the two sides of the pair $\pair{e}{e'}$. Likewise, the elimination form $\TensorE$ uses the variables in $\Gamma$ to produce a term of type $A \tensor B$, and then binds the variables $x:A$ and $y:B$ to typecheck the continuation expression $e'$.

We can contrast tensor products $A \tensor B$ with the Cartesian product $A \with B$. The introduction rule $\WithI$ reuses the same context $\Gamma$ when typechecking the two branches of $\paira{e}{e'}$. As a result, there are two projective eliminations $\WithEFst$ and $\WithESnd$, each of which consumes a pair $A \with B$ to produce either an $A$, or a $B$. Intuitively, a Cartesian pair is lazy, and the projections $\fst{e}$ and $\snd{e}$ let the programmer choose which of the two possibilities they want. In contrast, the tensor product $A \tensor B$ is strict, and so you get access to both components (at the price of not allowing the two sides to share variables).
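
A tiny example may help fix the difference (this is just an instance of the rules below): the judgement $\judge{\cdot}{x:A}{\paira{x}{x}}{A \with A}$ is derivable, because $\WithI$ checks both components of $\paira{x}{x}$ against the same context $x:A$, but $\judge{\cdot}{x:A}{\pair{x}{x}}{A \tensor A}$ is not, because $\TensorI$ must split the single hypothesis $x:A$ between the two components, and the $\Var$ rule requires each component to use exactly the variables it is given.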

Finally, we also have the coproduct $A \oplus B$. This works much as it does in the intuitionistic case: $\SumIInl$ takes an $A$ and gives an $A \oplus B$, and $\SumIInr$ does the same with a $B$. The case statement takes an expression $e$ of type $A \oplus B$, and then branches to $e'$ or $e''$ depending on whether it is the left or right branch. Note that $e$ has a disjoint set of variables $\Gamma$ from the case branches, but that $e'$ and $e''$ share the same set of variables $\Gamma'$ since only one of them will run. $$ \boxed{\judge{\Delta}{\Gamma}{e}{A}} \\ \inferrule[\Var] { } {\judge{\Delta}{x:A}{x}{A}} \\ \begin{array}{cc} \inferrule[\AllI] {\judge{\Delta, \alpha}{\Gamma}{e}{A}} {\judge{\Delta}{\Gamma}{\Fun{\alpha}{e}}{\all{\alpha}{A}}} & \inferrule[\AllE] {\judge{\Delta}{\Gamma}{e}{\all{\alpha}{B}} \\ \judgetp{\Delta}{A}} {\judge{\Delta}{\Gamma}{e\;A}{[A/\alpha]B}} \\[2em] \inferrule[\LolliI] {\judge{\Delta}{\Gamma, x:A}{e}{B}} {\judge{\Delta}{\Gamma}{\fun{x:A}{e}}{A \lolli B}} & \inferrule[\LolliE] {\judge{\Delta}{\Gamma}{e}{A \lolli B} \\ \judge{\Delta}{\Gamma'}{e'}{A}} {\judge{\Delta}{\Gamma,\Gamma'}{e\;e'}{B}} \\[2em] \inferrule[\UnitI] { } {\judge{\Delta}{\cdot}{\unit}{1}} & \inferrule[\UnitE] {\judge{\Delta}{\Gamma}{e}{1} \\ \judge{\Delta}{\Gamma'}{e'}{C}} {\judge{\Delta}{\Gamma,\Gamma'}{\letunit{e}{e'}}{C}} \\[2em] \inferrule[\TensorI] {\judge{\Delta}{\Gamma}{e}{A} \\ \judge{\Delta}{\Gamma'}{e'}{B} } {\judge{\Delta}{\Gamma,\Gamma'}{\pair{e}{e'}}{A \tensor B}} & \inferrule[\TensorE] {\judge{\Delta}{\Gamma}{e}{A \tensor B} \\\\ \judge{\Delta}{\Gamma', x:A, y:B}{e'}{C}} {\judge{\Delta}{\Gamma,\Gamma'}{\letpair{x}{y}{e}{e'}}{C}} \\[2em] \inferrule[\TopI] { } {\judge{\Delta}{\Gamma}{\unita}{\top}} & \\[1em] \inferrule[\WithI] {\judge{\Delta}{\Gamma}{e}{A} \\ \judge{\Delta}{\Gamma}{e'}{B} } {\judge{\Delta}{\Gamma}{\paira{e}{e'}}{A \with B}} & \begin{array}{l} \inferrule[\WithEFst] {\judge{\Delta}{\Gamma}{e}{A \with B}} {\judge{\Delta}{\Gamma}{\fst{e}}{A}} \\[1em] \inferrule[\WithESnd] {\judge{\Delta}{\Gamma}{e}{A \with B}} {\judge{\Delta}{\Gamma}{\snd{e}}{B}} \end{array} \\[3em] \begin{array}{l} \inferrule[\SumIInl] {\judge{\Delta}{\Gamma}{e}{A }} {\judge{\Delta}{\Gamma}{\inl{e}}{A \oplus B}} \\[1em] \inferrule[\SumIInr] {\judge{\Delta}{\Gamma}{e}{ B}} {\judge{\Delta}{\Gamma}{\inr{e}}{A \oplus B}} \end{array} & \begin{array}{l} \inferrule[\SumE] {\judge{\Delta}{\Gamma}{e}{A \oplus B} \\\\ \judge{\Delta}{\Gamma', x:A}{e'}{C} \\\\ \judge{\Delta}{\Gamma', y:B}{e''}{C} } {\judge{\Delta}{\Gamma, \Gamma'}{\case{e}{x}{e'}{y}{e''}}{C}} \end{array} \\[3em] & \inferrule[\ZeroE] {\judge{\Delta}{\Gamma}{e}{0}} {\judge{\Delta}{\Gamma}{\abort{e}}{C}} \end{array} $$

Now, we can define a size function $\size{e}$ on lambda-terms. $$ \begin{array}{lcl} \size{x} & = & 0 \\ \size{\Fun{\alpha}{e}} & = & \size{e} \\ \size{e\;A} & = & 1 + \size{e} \\ \size{\fun{x:A}{e}} & = & \size{e} \\ \size{e\;e'} & = & 1 + \size{e} + \size{e'} \\ \size{\unit} & = & 1 \\ \size{\letv{\unit}{e}{e'}} & = & \size{e} + \size{e'} \\ \size{\pair{e}{e'}} & = & 1 + \size{e} + \size{e'} \\ \size{\letv{\pair{x}{y}}{e}{e'}} & = & \size{e} + \size{e'} \\ \size{\unita} & = & 0\\ \size{\paira{e}{e'}} & = & \max(\size{e}, \size{e'}) \\ \size{\fst{e}} & = & 1 + \size{e} \\ \size{\snd{e}} & = & 1 + \size{e} \\ \size{\inl{e}} & = & 1 + \size{e} \\ \size{\inr{e}} & = & 1 + \size{e} \\ \size{\case{e}{x}{e'}{y}{e''}} & = & \size{e} + \max(\size{e'}, \size{e''}) \\ \end{array} $$

This function basically just counts the size of the term, adding an extra one for either the introduction or elimination for each connective. (I followed the pattern of adding 1 to the introduction rule for positive connectives, and 1 to the elimination for negative connectives, but really it doesn't seem to matter.)

Now, we can prove a lemma showing how sizes interact with substitution. Namely, linearity means that the size can never get larger when you substitute a term for a variable.

Size-respecting substitution. Suppose $\judgectx{\Delta}{\Gamma}$ and $\judge{\Delta}{\Gamma}{e}{A}$ and $\judge{\Delta}{\Gamma', x:A}{e'}{C}$. Then $\size{[e/x]e'} \leq \size{e} + \size{e'}$.
Proof This follows by a structural induction on the derivation of $e'$. Rather than giving the full proof, I'll just make the observations needed to prove this.

In the $\WithI$ and $\SumE$ cases, we need the following fact: $$ \max(a + x, b + x) = \max(a,b) + x $$ This is because $e$ can be substituted into multiple branches in these two rules.

In other cases, such as $\TensorI$, we have two subterms of $e'$ (so $e' = \pair{e'_1}{e'_2}$). Then typing indicates that $x$ can occur in only one of the branches. As a result, $[e/x]\pair{e'_1}{e'_2}$ will equal either $\pair{[e/x]e'_1}{e'_2}$ or $\pair{e'_1}{[e/x]e'_2}$, and then the size will be less than or equal to $1 + \size{e} + \size{e'_1} + \size{e'_2} = \size{e} + \size{e'}$.

The less than or equal to arises because of the presence of the unit $\top$, which ignores variables in its proof term $\unita$. In its absence substitution would preserve size exactly.

We also need to prove the same thing for type substitution.

Size-respecting type substitution. Suppose $\judgectx{\Delta}{\Gamma}$ and $\judge{\Delta, \alpha}{\Gamma}{e}{C}$ and $\judgetp{\Delta}{A}$. Then $\size{[A/\alpha]e} = \size{e}$.
Proof This is a routine induction, with no surprises, since types never affect the size of a term.
Next, let's write down the beta reductions: $$ \begin{array}{lcl} (\Fun{\alpha}{e})\;A & \mapsto & [A/\alpha]e \\ (\fun{x:A}{e})\;e' & \mapsto & [e'/x]e \\ \letv{\unit}{\unit}{e'} & \mapsto & e' \\ \letv{\pair{x}{y}}{\pair{e_1}{e_2}}{e'} & \mapsto & [e_1/x, e_2/y]e' \\ \fst{\paira{e}{e'}} & \mapsto & e \\ \snd{\paira{e}{e'}} & \mapsto & e' \\ \case{\inl{e}}{x}{e'}{y}{e''} & \mapsto & [e/x]e' \\ \case{\inr{e}}{x}{e'}{y}{e''} & \mapsto & [e/y]e'' \\ \end{array} $$

Now, we can verify that $\beta$-reduction reduces size:

Shrinking. If $\judgectx{\Delta}{\Gamma}$ and $\judge{\Delta}{\Gamma}{e}{A}$ and $e\;\mapsto e'$, then $\size{e'} < \size{e}$.
Proof The proof of this is essentially nothing but the observation that every $\beta$-reduction removes an introduction form, removes an elimination form, and performs a substitution, and so the size of the result term must get strictly smaller.
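
For instance (just instantiating the substitution lemma), consider the $\beta$-rule for functions: $$ \size{(\fun{x:A}{e})\;e'} \;=\; 1 + \size{e} + \size{e'} \;>\; \size{e} + \size{e'} \;\geq\; \size{[e'/x]e}. $$ The other cases are similar, with the extra 1 coming from whichever of the introduction or elimination form carries it.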

As a result, we know that all reduction strategies lead to a normal form. Since $\beta$-reduction makes a term strictly smaller, the process must eventually stop, no matter what order you perform the reductions in. So we (almost) have strong normalization. The missing piece is that we also need the Church-Rosser property, to ensure that all sequences of reductions eventually reach the same normal form. But this post is getting quite long enough, so I won't prove that -- I'll just assert it. :)

I will, however, offer one last reward for those of you who persevered to the end. Let's add unrestricted recursive types to our language: $$ \inferrule[\MuI] {\judge{\Delta}{\Gamma}{e}{[\fix{\alpha}{A}/\alpha]A}} {\judge{\Delta}{\Gamma}{\fold{e}}{\fix{\alpha}{A}}} \qquad \inferrule[\MuE] {\judge{\Delta}{\Gamma}{e}{\fix{\alpha}{A}}} {\judge{\Delta}{\Gamma}{\unfold{e}}{[\fix{\alpha}{A}/\alpha]A}} $$

The $\beta$-rule is the obvious one you might expect: $$ \begin{array}{lcl} \unfold{(\fold{e})} & \mapsto & e \end{array} $$

And let's extend the size function, too: $$ \begin{array}{lcl} \size{\fold{e}} & = & 1 + \size{e} \\ \size{\unfold{e}} & = & \size{e} \end{array} $$

Now, note that all of the lemmas we used continue to hold. So we can add unrestricted recursive types, with no restrictions on negative occurrences of type variables, to second-order MALL and the language is still strongly normalizing!

Another way of putting things is that contraction (i.e., using variables more than once) is essential to programming things like the Y combinator. Without it, recursive types are not enough to enable dangerous self-references like Russell's paradox.
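
To see concretely where linearity blocks the usual construction, recall the standard self-application trick, written here with the recursive types just added (this is the familiar construction from the non-linear setting, shown only to point out where it fails): taking $R = \fix{\alpha}{\alpha \lolli A}$, one would like to define $\omega = \fun{x:R}{(\unfold{x})\;x}$ and then form $\omega\;(\fold{\omega})$. But the body $(\unfold{x})\;x$ uses $x$ twice: by $\LolliE$ the context must split, with one part typing $\unfold{x}$ and the other typing the argument $x$, and a single linear hypothesis $x:R$ cannot appear in both. So the term is not well-typed here, and the usual route to non-normalizing terms is closed off.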

Acknowledgements. I'd like to thank Stéphane Gimenez for directing me to Girard's small normalization theorem.

Monday, April 29, 2013

John C. Reynolds, June 1, 1935 - April 28, 2013

Yesterday, John Reynolds passed away. I had the privilege of being one of his graduate students, and much of what I know about what it means to be a scientist is due to his example. I just want to share a few small anecdotes about him.

  1. When I was a new graduate student, I began working on the project which would eventually become my thesis. When I described one of my early approaches to this problem, he asked a series of questions about how I would handle one sticky technical point after another, and I showed him how the language and its semantics were carefully set up so that the issues in question could not arise. After I finished my explanation, he nodded thoughtfully, and told me, "Neel, you've done some very good engineering, but engineering is not science! Good engineers develop a sense of how to avoid obstacles -- but a scientist's job is to face them head-on and flatten them."

    This made a really big impression on me. As academic computer scientists, we have two great luxuries: we can choose what to study, and we can spend as long as necessary studying it. As a result, our ethical obligation is one step removed from solving problems: our role is to produce understanding, so that engineers (indeed, all of society) have new methods to solve new classes of problems.

  2. Once, John told me about some of his work at Argonne National Laboratory, where he designed the COGENT programming language, which he told me was what convinced him that he should leave physics to do computer science. He said that while today he could (indeed, many people could) design much better languages, he would always regard it fondly, since it was his first successful language design.

    I asked him what made it successful, and he told me that there was another scientist at Argonne, who used it to write some simulation code. When John read the program listing, he realized that he could not understand how the program worked, since it relied on domain knowledge that he lacked. So this was John's definition of a successful language design: if you have a user who has used it to write a program you couldn't have, your language has succeeded, since it has helped a fellow human being solve one of their own particular problems.

    I've always liked his definition, since it manages to avoid an obsession with nose-counting popularity metrics, while still remembering the essentially social purpose of language design.

  3. Another time, I asked John what he thought about object-oriented programming. He told me he thought it was too soon to tell. I burst out laughing, since I thought he was joking: Simula 67 is considerably older than I am, and I thought that surely that was sufficient time to form an opinion.

    But he insisted that, no, he was not kidding! From a formal, mathematical, perspective, he did not like object-oriented programming very much: he had studied various formalizations of objects in his work on Idealized Algol and on intersection types, and found they introduced some unwanted complexities. However, so many excellent engineers still thought that objects were a good idea, that he was unwilling to dismiss the possibility that there was a beautiful mathematical theory that we haven't discovered yet.

    I thought this was a great demonstration of Keats's negative capability, but what makes it really impressive was that John married it to a profound intellectual rigour: I've never known anyone who could produce counterexamples faster than John could. While he tested every idea ruthlessly, he never closed himself off from the world, giving him a rare ability to make judgements without passing judgement.

I haven't really said anything about John's enormous scientific achievements, because the deepest lesson I learned from him is not any one of his theorems, but rather his fundamentally humane vision of science, as something we do for and with our fellow scientists and community.

Ars longa, vita brevis, but together we make a chain through the ages --- and more links of that chain than most can claim were forged by you, John.

Wednesday, April 10, 2013

Thanks to Olle Fredriksson

Joshua and I thought our new typechecking algorithm for higher-rank polymorphism was easy to implement, but I have to admit that we weren't 100% sure about this, since we've been immersed in this work for months.

However, Olle Fredriksson wrote us within one day of the draft going online with a link to his Haskell implementation. This was immensely reassuring, since (a) he implemented it so quickly, and (b) he didn't need to ask us how to implement it -- this means we didn't leave anything important out in the description!

In fact, Olle is technically the first person to implement the algorithm in the paper, since both Joshua and I have actually implemented variants of this algorithm for different languages. I'll try to write a small ML implementation to go along with Olle's Haskell code in the next few weeks. If you want to beat me to it, though, I have no objections!

Thursday, April 4, 2013

Spring Things

Part of the reason for the silence on this blog is that it's been a very busy spring for me. I've got three new drafts to announce, each of which I played at least some role in:
  • Complete and Easy Bidirectional Typechecking for Higher-Rank Polymorphism, Joshua Dunfield and Neelakantan R. Krishnaswami.
    Bidirectional typechecking, in which terms either synthesize a type or are checked against a known type, has become popular for its scalability (unlike Damas-Milner type inference, bidirectional typing remains decidable even for very expressive type systems), its error reporting, and its relative ease of implementation. Following design principles from proof theory, bidirectional typing can be applied to many type constructs. The principles underlying a bidirectional approach to polymorphism, however, are less obvious. We give a declarative, bidirectional account of higher-rank polymorphism, grounded in proof theory; this calculus enjoys many properties such as η-reduction and predictability of annotations. We give an algorithm for implementing the declarative system; our algorithm is remarkably simple and well-behaved, despite being both sound and complete.

    In this paper, we show how to extend bidirectional typechecking to support higher-rank polymorphism. Bidirectional typechecking is attractive in part because it is very easy to implement, and happily our extension remains simple -- it turns out the fanciest data structure you need to support higher-rank polymorphism is an ordered list!

  • Mtac: A Monad for Typed Tactic Programming in Coq, Beta Ziliani, Derek Dreyer, Neelakantan R. Krishnaswami, Aleksandar Nanevski, Viktor Vafeiadis.

    Effective support for custom proof automation is essential for large-scale interactive proof development. However, existing languages for automation via tactics either (a) provide no way to specify the behavior of tactics within the base logic of the accompanying theorem prover, or (b) rely on advanced type-theoretic machinery that is not easily integrated into established theorem provers.

    We present Mtac, a lightweight but powerful extension to Coq that supports dependently-typed tactic programming. Mtac tactics have access to all the features of ordinary Coq programming, as well as a new set of typed tactical primitives. We avoid the need to touch the trusted kernel typechecker of Coq by encapsulating uses of these new tactical primitives in a monad, and instrumenting Coq so that it executes monadic tactics during type inference.

    You can find the software at the Mtac web page.

    Since I wasn't the lead author, I see no need to be modest: this is really, really good. The power-to-weight ratio of this approach to tactics is incredibly high, and I seriously think that every dependently typed language implementor ought to consider adding something like Mtac to their system.

  • Higher-Order Reactive Programming without Spacetime Leaks, Neelakantan R. Krishnaswami.

    Functional reactive programming (FRP) is an elegant approach to declaratively specify reactive systems. However, the powerful abstractions of FRP have historically made it difficult to predict and control the resource usage of programs written in this style.

    In this paper, we give a new language for higher-order reactive programming. This language generalizes and simplifies prior type systems for reactive programming, supporting the use of first-class streams, such as streams of streams; first-class functions and higher-order operations; and permits encoding many temporal operations beyond streams, such as terminatable streams, events, and even resumptions with first-class schedulers. Furthermore, our language supports an efficient implementation strategy permitting us to eagerly deallocate old values and statically rule out spacetime leaks, a notorious source of inefficiency in reactive programs. Furthermore, these memory guarantees are achieved without the use of a complex substructural type discipline.

    We also show that our implementation strategy of eager deallocation is safe, by showing the soundness of our type system with a novel step-indexed Kripke logical relation.

    You can download an almost completely undocumented implementation to play with this kind of FRP language. It's also good if you want to see what a language with linear types looks like. Also, I use the type inference algorithm described in the first draft. This doesn't have dependent types (yet?), so no Mtac, though. :)

Thursday, March 7, 2013

Interview with John Reynolds

I've been ridiculously busy the last few months, so this is just a quick link: John retired recently, so I should probably post something in his honor. When I have time, I'll post a little something about some of John's work (with Peter O'Hearn) on how to use functor-category semantics as a guide to programmming with polymorphism. If you don't want to wait, you can look at their paper From Idealized Algol to Polymorphic Linear Lambda Calculus.