Wednesday, November 17, 2021

POPL till you DROPL

My PhD student Faustyna Krawiec, along with me and several other coauthors, has a new paper out, which is due to appear at POPL 2022:

Provably Correct, Asymptotically Efficient, Higher-Order Reverse-Mode Automatic Differentiation

Faustyna Krawiec, Simon Peyton-Jones, Neel Krishnaswami, Tom Ellis, Richard A. Eisenberg, and Andrew Fitzgibbon.

In this paper, we give a simple and efficient implementation of reverse-mode automatic differentiation, which both extends easily to higher-order functions, and has run time and memory consumption linear in the run time of the original program. In addition to a formal description of the translation, we also describe an implementation of this algorithm, and prove its correctness by means of a logical relations argument.

We give an incredibly simple implementation of reverse-mode AD, which works in a language with higher-order functions, and is asymptotically efficient. When we started the work I was hoping it would be a good excuse to learn some differential geometry, but it turned out there is such a simple algorithm that we didn't need any fancy machinery! So I still don't know any differential geometry, but we've got a cool AD algorithm to share.

That's the only POPL paper this year with my name on it, but it turns out that a lot of students and close collaborators have papers without me as well! It might not matter for my academic prestige, but it makes me happy, especially when I see how interesting the work is!

  1. My PhD student Dima Szamozvancev and his other advisor, Marcelo Fiore, have a paper Formal Metatheory of Second-Order Abstract Syntax. They reformulated Marcelo's work on second-order abstract syntax to fit into a constructive type theory, and then mechanised it in Agda. As a result, you can give a description of a syntax, and automatically get out the substitution operators and their correctness property. One particularly powerful feature is that it doesn't just handle binding, but also metavariables. This is essential if you want to handle things like inference rules as first-class values!

  2. My newest PhD student, Ilya Kaysin, has a paper, The Leaky Semicolon (no link yet), with several other people, giving a compositional semantics to weak memory. As a fun fact, Ilya is entering his PhD with more POPL publications than I exited with. :)

  3. Vikraman Chaudhury is a (former) PhD student of Amr Sabry's, but he spent a year visiting me, and we wrote a paper together on the semantics of capabilities. At this POPL, he has a paper with Amr Sabry and Jacek Karwowski, Symmetries in Reversible Programming: From Symmetric Rig Groupoids to Reversible Programming Languages, which gives a nice semantics for reversible programming languages.

Tuesday, September 28, 2021

Design Benchmarks

A few weeks ago, Shriram Krishnamurthi asked an interesting question on Twitter:

It's common in PL to have benchmarks for performance. But does anyone know of PL design benchmarks?

This caught my attention because my longest-running research program -- my work on functional reactive programming, with 7 papers so far -- has been explicitly guided by a handful of design benchmarks.

I've never written them down explicitly, so I thought I'd take the opportunity to do so.

  1. The language must validate the full beta- and eta-laws for all of the type constructors in the language.

    This is a proxy benchmark, of course. My actual goal is to make languages which are easy to program with and reason about.

    However, this is an open-ended and vague goal, and so I needed to find a specific, concrete operationalisation of it. There are many possible proxies for this goal, and the one I picked was to look for programming languages which validate a rich collection of semantically-justified equations. This specific choice was made for three reasons.

    • First, this goal pushes me away from spending time on quality-of-life features like rich standard libraries. These features obviously do make a language easier to use, but they do not directly touch upon any FRP-specific problems.

    • Second, the search for well-behaved equational properties pushes me towards taking features known to be useful, and trying to decompose them into smaller type-theoretic components.

      As Jeff Raskin once noted, in user interfaces "intuitive" and "familiar" are basically synonyms, which means it is good to have a design benchmark which does not penalise doing things that are different from the old standard.

    • Third, déformation professionnelle. This particular goal is the one that the toolbox I've acquired over the years -- expertise in semantics and type theory -- is best-adapted to move me towards.

      If my expertise lay in running user studies, writing model checkers, or programming with continuations, then I would probably approach the problem differently! For example, the work on the Garnet toolkit is both extremely good and very influential, and very different in character from anything I have pursued.

      Obviously, I could learn how to write a constraint solver or run a user study, but the art is long, and life short. I think it's OK for people to focus on different things, and to learn from each other. :)

  2. I have a small collection of tiny programs which are intended to model various common UI patterns, and the languages I design must express these programs.

    The purpose of these programs is basically to prevent over-simplifying away real complexity. If we're trying to optimise a language design for ease of reasoning, the easiest way of "achieving" the goal is to design an impoverished language.

    For example, I check whether I can write programs which dynamically create new widgets, to ensure that I can't design something which restricts programmers to a fixed dataflow graph. Likewise, a program with two widgets sending notifications "sideways" to each other keeps me from trying to rely on hierarchical encapsulation. (This is super tempting, but as far as I can tell, no actual program consistently does this.)

  3. The language must admit a simple implementation strategy.

    This design principle arose as a reaction to a negative lesson.

    My first paper on higher-order FRP, Ultrametric Semantics of Reactive Programs, used a system of dynamic, incremental dependency propagation (a baby version of the stuff Matt Hammer and Umut Acar used to do), and it ended up requiring a really weird implementation strategy for closures. In addition to being able to call them, I also needed to be able to move them forward in time, and this required a really weird set of hacks to go into a closure and move everything in its captured environment forward in time as well.

    This was painful to implement, difficult to prove correct, and it leaked memory like a sieve. Trying to deal with these issues led directly to my ICFP 2013 paper on FRP, which is basically a type system for immediate mode GUI programming.

    Note that I don't mind if the implementation strategy is challenging to prove correct (eg, my POPL 2021 paper with Simon Spies and Derek Dreyer is about inventing some needed machinery for modelling retained-mode GUI implementations), but the actual running code has to be simple.

    This means I haven't looked at any dataflow or constraint stuff in years, but sometimes a design rule is there to prevent you from doing certain things.

Thursday, July 8, 2021

Postdoctoral opening in the TypeFoundry project

I have an opening for a postdoctoral position in Cambridge with the European ERC Consolidator Grant project TypeFoundry. This is a 2-year position, with the possibility of extension.

The TypeFoundry project aims to use recent developments in proof theory and semantics, such as polarized type theory and call-by-push-value, to identify the theoretical structures underpinning bidirectional type inference.

Informal enquiries are welcome; please contact me at: <nk480@cl.cam.ac.uk>.

More details and an application form can be found at <https://www.jobs.cam.ac.uk/job/30485/>.

Monday, February 15, 2021

Five (and a Half) Derivatives in Language Theory

Thanks to the influence of machine learning, differentiating programs has become a big business. However, there are a lot of other things in programming language theory which are also derivatives, and people sometimes get confused about the inter-relationships. So I decided to lay out some of them and some of their interconnections.

1. Brzozowski Derivatives

The first kind of derivative is the humble, yet beloved, Brzozowski derivative. Given a character set \(\Sigma\) and a regular language \(L \subseteq \Sigma^\ast\), the Brzozowski derivative of \(L\) with respect to the character \(c \in \Sigma\) is

\[ \delta_c(L) = \left\{ w \in \Sigma^\ast \mid c \cdot w \in L \right\} \]

That is, it's the set of words of \(L\) that start with \(c\), with that leading \(c\) removed. It can be defined as a transformation on regular expressions \(r\) as follows:

\[ \begin{array}{lcll} \delta_c(0) & = & \bot & \\ \delta_c(r_1 + r_2) & = & \delta_c(r_1) + \delta_c(r_2) & \\ \delta_c(c') & = & \epsilon & \mbox{when } c = c'\\ \delta_c(c') & = & \bot & \mbox{when } c \neq c' \\ \delta_c(\epsilon) & = & \bot & \\ \delta_c(r_1 \cdot r_2) & = & \delta_c(r_1) \cdot r_2 & \mbox{when $r_1$ not nullable} \\ \delta_c(r_1 \cdot r_2) & = & \delta_c(r_1) \cdot r_2 + \delta_c(r_2) & \mbox{when $r_1$ nullable} \\ \delta_c(r \ast) & = & \delta_c(r) \cdot r\ast & \\ \end{array} \]
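
To make these rules concrete, here is a minimal Haskell sketch; the Regex datatype and the function names are just illustrative, not taken from any particular library.

-- A small regex datatype mirroring the grammar above.
data Regex c
  = Empty                    -- 0, the empty language
  | Eps                      -- ε, the language containing only the empty string
  | Chr c                    -- a single character
  | Alt (Regex c) (Regex c)  -- r1 + r2
  | Seq (Regex c) (Regex c)  -- r1 · r2
  | Star (Regex c)           -- r*

-- Does the language of r contain the empty string?
nullable :: Regex c -> Bool
nullable Empty       = False
nullable Eps         = True
nullable (Chr _)     = False
nullable (Alt r1 r2) = nullable r1 || nullable r2
nullable (Seq r1 r2) = nullable r1 && nullable r2
nullable (Star _)    = True

-- The Brzozowski derivative δ_c(r), clause by clause as above.
deriv :: Eq c => c -> Regex c -> Regex c
deriv _ Empty       = Empty
deriv _ Eps         = Empty
deriv c (Chr c')    = if c == c' then Eps else Empty
deriv c (Alt r1 r2) = Alt (deriv c r1) (deriv c r2)
deriv c (Seq r1 r2)
  | nullable r1     = Alt (Seq (deriv c r1) r2) (deriv c r2)
  | otherwise       = Seq (deriv c r1) r2
deriv c (Star r)    = Seq (deriv c r) (Star r)

-- Matching is just iterated differentiation followed by a nullability test:
-- matches (Seq (Chr 'a') (Star (Chr 'b'))) "abb" == True
matches :: Eq c => Regex c -> [c] -> Bool
matches r = nullable . foldl (flip deriv) r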

The Brzozowski derivative is called a derivative for two reasons. First, it is linear with respect to \(+\) on regular expressions -- i.e., \(\delta_c(r_1 + r_2) = \delta_c(r_1) + \delta_c(r_2)\). Second, it satisfies the derivative rule for monomials \(\delta_c(c^{n+1}) = c^n\). You may recall that the high-school rule for derivatives says \(\delta_c(c^{n+1}) = (n+1) \times c^n\), but recall that in regular expressions \(r + r = r\), so \(n \times r = r\) for any \(n \geq 1\) and the coefficient vanishes.

However, this is a weak notion of derivative, which shares few properties with other notions of derivative. The reason for this is that the rule for products does not match the product rule for high-school derivatives -- if the Brzozowski derivative satisfied the high-school rule, it would be

\[ \delta_c(r_1 \cdot r_2) = \delta_c(r_1) \cdot r_2 + r_1 \cdot \delta_c(r_2) \]

2. Differential algebras

If we try to change the definition of the Brzozowski derivative to match the high-school rule, we get what is called a differential algebra. A differential algebra is a semiring \(S\), plus a collection of unary functions \(\delta_i : S \to S\) which are called the derivations. The axioms for derivations mirror the rules for differentiating polynomials -- the motivating model for differential algebras is a semiring of polynomials, with addition and multiplication of polynomials as the semiring structure, and with the derivation being differentiation with respect to the polynomial's variable. The rules we get are:

\[ \begin{array}{lcll} \delta_c(0) & = & 0 & \\ \delta_c(r_1 + r_2) & = & \delta_c(r_1) + \delta_c(r_2) & \\ \delta_c(c') & = & \epsilon & \mbox{when } c = c'\\ \delta_c(c') & = & 0 & \mbox{when } c \neq c' \\ \delta_c(\epsilon) & = & 0 & \\ \delta_c(r_1 \cdot r_2) & = & \delta_c(r_1) \cdot r_2 + r_1 \cdot \delta_c(r_2) & \\ \delta_c(r \ast) & = & r\ast \cdot \delta_c(r) \cdot r\ast & \\ \end{array} \]

Note that the only things that changed are the clauses for concatenation and the Kleene star. The axioms of a differential algebra actually say nothing about the Kleene star, because they are specified for semirings, but to handle full regular languages you need to extend the definition so that the differential respects the equation \(r \ast = \epsilon + r \cdot r\ast\). If memory serves, Dexter Kozen has studied this extension. (Even if memory doesn't serve, he's a safe bet...!)

If you look at this definition for a while, you realize that the original Brzozowski derivative removes a single character \(c\) from the front of a string, and the differential removes a single \(c\) from anywhere in the string.
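
Reusing the Regex datatype from the earlier sketch, these rules translate into a Haskell function that differs from deriv only in the product and star clauses; it deletes one occurrence of c anywhere in the word rather than only at the front. (As before, this is just an illustrative sketch.)

-- The differential-algebra derivative: remove one c from anywhere.
-- (Reuses Regex, Empty, Eps, Chr, Alt, Seq, Star from the earlier sketch.)
differential :: Eq c => c -> Regex c -> Regex c
differential _ Empty       = Empty
differential _ Eps         = Empty
differential c (Chr c')    = if c == c' then Eps else Empty
differential c (Alt r1 r2) = Alt (differential c r1) (differential c r2)
differential c (Seq r1 r2) = Alt (Seq (differential c r1) r2)
                                 (Seq r1 (differential c r2))
differential c (Star r)    = Seq (Star r) (Seq (differential c r) (Star r))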

For regular languages, this is a peculiar thing to want to do, but it suddenly becomes a lot more interesting when we move from semirings of polynomials to semiring categories of polynomial functors.

3. Derivatives of Data Types

One interesting factoid is that sets "look like" a semiring. The Cartesian product \(A \times B\) and the disjoint union \(A + B\) obviously don't satisfy the semiring equations on the nose, but if you replace the equality symbol with isomorphisms, then they do. If you take this analogy seriously and try to categorify the notion of a semiring, we get what is variously called a rig category, semiring category, or bimonoidal category.

Instead of a set plus two binary operations, we have a category with two monoidal structures with the semiring equations becoming isomorphisms. The exact commuting diagrams needed were worked out in the early 70s, independently by Laplaza and Kelly.

One semiring category that is very useful for programming is the category of polynomial functors. Most inductive datatypes, like lists, binary trees, and so on, can be understood as least fixed points of sums and products. We can formalise this idea by giving a grammar of sums, products, constants, and the identity:

\[ \begin{array}{lcll} F,G & ::= & \mathrm{Id} & \\ & | & \underline{0} \\ & | & F \oplus G & \\ & | & \underline{1} \\ & | & F \otimes G & \\ & | & K(A) & \mbox{where } A \in \mathrm{Set} \\ \end{array} \]

Every term of this grammar can be interpreted as a functor:

\[ \newcommand{\interp}[1]{[\![ #1 ]\!]} \begin{array}{lcl} \interp{F} & : & Set \to Set \\ \interp{\mathrm{Id}} & = & X \mapsto X \\ \interp{\underline{0}} & = & X \mapsto 0 \\ \interp{F \oplus G} & = & X \mapsto \interp{F}\,X + \interp{G}\,X \\ \interp{\underline{1}} & = & X \mapsto 1 \\ \interp{F \otimes G} & = & X \mapsto \interp{F}\,X \times \interp{G}\,X \\ \interp{K(A)} & = & X \mapsto A \end{array} \]

So you can see that \(\oplus\) is interpreted as the coproduct functor, \(\otimes\) is the product functor, \(K(A)\) is the constant functor, and \(\mathrm{Id}\) is interpreted as the identity functor.

The least fixed point of one of these polynomials is an inductive type. So to model lists of type X, we would define the list functor

\[ \mathrm{ListF} \triangleq K(1) \oplus (K(X) \otimes \mathrm{Id}) \]

and then take its least fixed point.
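
Here is a quick Haskell sketch of this, using Haskell types as a stand-in for Set; the names are illustrative. Sums ⊕ become a choice of constructors, products ⊗ become multiple fields, K(A) becomes a constant field, and the least fixed point is a recursive newtype.

{-# LANGUAGE DeriveFunctor #-}

-- Least fixed point of a functor.
newtype Mu f = In (f (Mu f))

-- ListF = K(1) ⊕ (K(X) ⊗ Id), with Id as the last type parameter.
data ListF x r = NilF | ConsF x r
  deriving Functor

type List x = Mu (ListF x)

nil :: List x
nil = In NilF

cons :: x -> List x -> List x
cons x xs = In (ConsF x xs)

-- Folding over the least fixed point.
cata :: Functor f => (f a -> a) -> Mu f -> a
cata alg (In t) = alg (fmap (cata alg) t)

-- Example: the length of a list, as a fold.
len :: List x -> Int
len = cata alg
  where alg NilF        = 0
        alg (ConsF _ n) = 1 + n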

Since we have a semiring category, we can ask if it has a categorical analogue of a derivation, to give it differential structure. And these polynomials do! The derivative with respect to the variable \(\mathrm{Id}\) can be defined as

\[ \begin{array}{lcll} \delta(K(A)) & = & 0 & \\ \delta(\underline{0}) & = & 0 & \\ \delta(F \oplus G) & = & \delta(F) \oplus \delta(G) & \\ \delta(\mathrm{Id}) & = & 1 & \\ \delta(\underline{1}) & = & 0 & \\ \delta(F \otimes G) & = & \delta(F) \otimes G \oplus F \otimes \delta(G) & \\ \end{array} \]

If you have a functor for a binary tree:

\[ \mathrm{TreeF} \triangleq \underline{1} \oplus (\mathrm{Id} \otimes \mathrm{Id}) \]

Then the differential is: \[ \delta(\mathrm{TreeF}) = (\underline{1} \otimes \mathrm{Id}) \oplus (\mathrm{Id} \otimes \underline{1}) \]

This tells you whether the left or the right position was chosen. The utility of this is that given an element of \(\delta(F)\,X\) and an element \(x \in X\), you can "fill in the gap" to get an element of \(F\,X\). That is, you can define:

\[ \mathit{fill}_F : \delta(F)\,X \times X \to F\,X \]

If you take \(X\) to be the type of trees \(\mu(\mathrm{TreeF})\), this gives you an indication of whether to take the left or right subtree, plus the tree you didn't follow. Then a list of these things gives you a path into a tree -- which is just Huet's famous zipper.
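
Here is a hedged Haskell sketch of this for the binary-tree functor above; the datatype and function names are my own.

-- TreeF = 1 ⊕ (Id ⊗ Id), and its least fixed point.
data TreeF r = LeafF | NodeF r r
data Tree    = Leaf  | Node Tree Tree

-- δ(TreeF) ≅ (1 ⊗ Id) ⊕ (Id ⊗ 1): which child the hole is in, plus the other child.
data DTreeF r = HoleLeft r | HoleRight r

-- fill_F : δ(F) X × X → F X
fillTreeF :: DTreeF r -> r -> TreeF r
fillTreeF (HoleLeft  right) x = NodeF x right
fillTreeF (HoleRight left)  x = NodeF left x

-- A path into a tree is a list of one-hole contexts (innermost frame first):
-- Huet's zipper.
type Context = [DTreeF Tree]

-- Plug a subtree back into its context to recover the whole tree.
plug :: Context -> Tree -> Tree
plug ctx t = foldl step t ctx
  where step sub (HoleLeft  r) = Node sub r
        step sub (HoleRight l) = Node l sub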

The exposition so far largely glosses Conor McBride's famous paper The Derivative of a Regular Type is the Type of its One-Hole Contexts.

The precise connection to linearity and derivatives was explored by Abbott, Altenkirch, Ghani and McBride in their paper Derivatives of Containers, using significantly more categorical machinery. In their paper, Altenkirch et al explain how the \(\mathit{fill}\) operation is really a linear function

\[ \mathit{fill}_F : \delta(F)\,X \otimes X \multimap F\,X \]

and how \(\delta(F)\) can be viewed as giving a kind of tangent for \(F\) at \(X\).

4. Differential Linear Logic

The appearance of linear types here should be delightful rather than shocking. We think of the usual derivative as associating a tangent vector space to each point of an original space, and some of the primordial models of linear type theory are finite-dimensional vector spaces and linear transformations (a model of the multiplicative-additive fragment of classical linear logic, and of the exponentials too when the underlying field is finite), and Banach spaces and short maps (a simple model of intuitionistic linear logic, including the full exponential).

As a result, we should expect to be able to make differentiation make sense within linear type theory, and since a type theory is the initial model of a class of models, this means that derivative-like constructions are something we should look for any time linearity arises.

The pure form of this recipe can be found in Ehrhard and Regnier's differential linear logic. One of the really cool features of this line of work is that derivatives arise as an extension of the usual structural rules -- you just adjust how variables are treated, and derivatives appear like magic. One downside, though, is that this line of work has tended to focus on proof nets rather than the sequent calculus, which made it less accessible than one would like. What solved this issue for me was Marie Kerjean's A Logical Account for Linear Partial Differential Equations.

I do want to admit that despite starting off this section saying you shouldn't be shocked, actually I am -- this stuff really seems like sorcery to me. So even though I believe intellectually that everything should work out like this, I'm still astounded that it does. (It's rather like the feeling I get whenever I run a program I did a machine-checked correctness proof of.)

5. Automatic Differentiation

These days, the most famous kind of differentiation in language theory is, of course, automatic differentiation, which takes a program and rewrites it to produce a new program computing the derivative of the original program. Automatic differentiation comes in two main varieties, called forward mode and reverse mode.

The idea of forward mode is incredibly simple -- you just treat real numbers as an abstract datatype, and replace your original real number implementation with a new one based on dual numbers. Intuitively, a dual number is a pair \(a + b \epsilon\), where \(\epsilon\) is just a formal infinitesimal which is "nilpotent" (i.e., \(\epsilon^2 = 0\)).

Then the data abstraction properties of the typed lambda calculus ensure that when you run the program with the dual number implementation of reals, you are returned a pair of the original value and its partial derivatives (i.e. the Jacobian matrix). Dan Piponi blogged about this over 15 years ago(!), but even though I read it at the time, the algorithm was so simple I failed to recognise its importance!
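
Here is a minimal Haskell sketch of the dual-number trick, specialised to the one-variable case; the names are illustrative. Because the overloading goes through the Num interface, any function written against that interface can be differentiated without changing its code.

-- Dual numbers a + b·ε with ε² = 0: a primal value plus a tangent.
data Dual = Dual { primal :: Double, tangent :: Double }

instance Num Dual where
  Dual a b + Dual c d = Dual (a + c) (b + d)
  Dual a b * Dual c d = Dual (a * c) (a * d + b * c)  -- the b·d term dies because ε² = 0
  negate (Dual a b)   = Dual (negate a) (negate b)
  abs    (Dual a b)   = Dual (abs a) (b * signum a)
  signum (Dual a _)   = Dual (signum a) 0
  fromInteger n       = Dual (fromInteger n) 0

-- Differentiate a function by seeding the tangent with 1.
diff :: (Dual -> Dual) -> Double -> Double
diff f x = tangent (f (Dual x 1))

-- Example: diff (\x -> x*x*x + 2*x) 3 == 29.0   (i.e., 3x² + 2 at x = 3)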

Reverse mode is more complicated -- it comes from the observation that in machine learning, you are generally optimising a function \(R^n \to R\), where the result is some scoring function. Since \(n\) might be very large and the dimensionality of the result is just 1, you can achieve superior computational complexity by computing the transpose of the Jacobian rather than the Jacobian itself.

When you do so, the resulting syntactic program transformations all look like they are doing fancy continuation manipulations, and their semantic counterparts all look like manipulations of the dual space. So the syntactic continuation/double-negation operations are reflecting the fact that finite-dimensional vector spaces are isomorphic to their double duals (i.e., \(V^{\ast\ast} \simeq V\)).
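
To see the continuation flavour concretely, here is a deliberately naive Haskell sketch of reverse mode: each value carries a backpropagator mapping an output sensitivity to gradient contributions for the named inputs. (This sketch recomputes shared subterms, so it is not asymptotically efficient; it only illustrates the continuation/backpropagator structure, and the string-named inputs are just an illustrative choice.)

import qualified Data.Map as M

-- A value paired with its backpropagator.
data Rev = Rev { val :: Double, backprop :: Double -> M.Map String Double }

instance Num Rev where
  Rev a ka + Rev b kb = Rev (a + b) (\s -> M.unionWith (+) (ka s) (kb s))
  Rev a ka * Rev b kb = Rev (a * b) (\s -> M.unionWith (+) (ka (s * b)) (kb (s * a)))
  negate (Rev a ka)   = Rev (negate a) (ka . negate)
  abs    (Rev a ka)   = Rev (abs a) (\s -> ka (s * signum a))
  signum (Rev a _)    = Rev (signum a) (const M.empty)
  fromInteger n       = Rev (fromInteger n) (const M.empty)

-- A named input variable.
var :: String -> Double -> Rev
var x v = Rev v (\s -> M.singleton x s)

-- Run the backward pass with output sensitivity 1.
gradient :: Rev -> M.Map String Double
gradient r = backprop r 1

-- Example: gradient (var "x" 3 * var "y" 4 + var "x" 3)
--   == fromList [("x",5.0),("y",3.0)]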

In addition to being really neat, this is actually a really nice problem to look at. People basically care about reverse mode AD because it goes fast, which means you get this lovely interplay between operational concerns and semantic models. A second bonus is that AD offers computer scientists a good reason to learn some differential geometry, which in turn is a good excuse to learn general relativity.

6? Incremental Computation

After all this talk of differentiation, you might wonder -- what about finite differences?

It turns out that they have a nice interpretation in lambda calculus, too! In 2015, Yufei Cai, Paolo Giarusso, Tillman Rendel and Klaus Ostermann introduced the incremental lambda calculus.

The basic idea behind this is that for each type \(X\), you introduce a type of "changes" \(\Delta X\) for it too. A morphism between \((X, \Delta X)\) and \((Y, \Delta Y)\) is a pair of functions \(f : X \to Y\) and \(\mathit{df} : X \to \Delta X \to \Delta Y\), where \(\mathit{df}\) is the incrementalisation -- \(\mathit{df}\,x\,\mathit{dx}\) tells you how to change the output of \(f\,x\) to account for the change \(\mathit{dx}\) to the input.
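
As a toy Haskell sketch (my own example, not from the paper): take X = Int with ΔX = Int, change application given by addition, and incrementalise squaring.

-- A change structure for Int: changes are additive deltas.
applyChange :: Int -> Int -> Int
applyChange x dx = x + dx

-- The function f and one possible incrementalisation df.
f :: Int -> Int
f x = x * x

df :: Int -> Int -> Int
df x dx = 2 * x * dx + dx * dx   -- (x + dx)² − x²

-- Correctness: f (applyChange x dx) == applyChange (f x) (df x dx).
-- For example, with x = 10 and dx = 3: f 13 = 169 and 100 + 69 = 169.

Any df satisfying that equation will do -- the naive \x dx -> f (applyChange x dx) - f x is another -- which is exactly the non-uniqueness discussed next.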

This looks a lot like what you get out of the tangent space functor, but it turns out that the incremental lambda calculus is not simply a derivative a la the differential lambda calculus, because the derivative of a function is unique, and these incrementalisations don't have to be. Michael Arntzenius and I used the ILC in our paper on extending seminaive evaluation from Datalog to a higher-order language, and it was critical for the efficiency of our result that we had this freedom.

Mario Alvarez-Picallo worked out answers to a lot of the semantic questions underlying this in his PhD thesis, Change actions: from incremental computation to discrete derivatives, showing how the incremental lambda calculus arises as a generalisation of the models of the differential lambda calculus.

He noted that the "real" semantics for change types should be 2-categorical: each type is a category where the objects are values, and a morphism \(\delta : v_1 \to v_2\) is evidence that \(v_1\) can be changed into \(v_2\). On the grounds that 2-categories are painful, Mario worked with change monoids (i.e., no typing, but yes to identities and composition), and the Giarusso-style change structures that Michael and I used retained the typed changes but discarded all the categorical axioms. So there's more to be done here!

Monday, December 14, 2020

TypeFoundry: new ERC Consolidator Grant

I am very pleased to have received an ERC Consolidator Grant for my TypeFoundry proposal.

This will be a five year project to develop the foundations of bidirectional type inference. If you are interested in pursuing a PhD in this area, or conversely, are finishing a PhD in this area, please get in touch!

Many modern programming languages, whether developed in industry, like Rust or Java, or in academia, like Haskell or Scala, are typed. All the data in a program is classified by its type (e.g., as strings or integers), and at compile-time programs are checked for consistent usage of types, in a process called type checking. Thus, the expression 3 + 4 will be accepted, since the + operator takes two numbers as arguments, but the expression 3 + "hello" will be rejected, as it makes no sense to add a number and a string. Though this is a simple idea, sophisticated type systems can track properties like algorithmic complexity, data-race freedom, differential privacy, and data abstraction.

In general, programmers must annotate programs to tell compilers the types to check. In theoretical calculi, it is easy to demand enough annotations to trivialize typechecking, but this can make the annotation burden unbearable: often larger than the program itself! So, to transfer results from formal calculi to new programming languages, we need type inference algorithms, which reconstruct missing data from partially-annotated programs.

However, the practice of type inference has outpaced its theory. Compiler authors have implemented many type inference systems, but the algorithms are often ad-hoc or folklore, and the specifications they are meant to meet are informal or nonexistent. This makes it hard to learn how to implement type inference, hard to build alternative implementations (whether for new compilers or analysis engines for IDEs), and hard for programmers to predict if refactorings will preserve typability.

In TypeFoundry, we will use recent developments in proof theory and semantics (like polarized type theory and call-by-push-value) to identify the theoretical structure underpinning type inference, and use this theory to build a collection of techniques for type inference capable of scaling up to the advanced type system features in both modern and future languages.

One of the things that makes me happy about this (beyond the obvious benefits of research funding and the recognition from my peers) is that it shows off the international character of science. I'm an Indian-American researcher, working in the UK, and being judged and funded by researchers in Europe. There has been a truly worrying amount of chauvinism and jingoism in public life recently, and the reminder that cosmopolitanism and universalism are real things too is very welcome.

Also, if you are planning on submitting a Starting Grant or Consolidator proposal to the ERC in the coming year about programming languages, verification or the like, please feel free to get in touch, and I'll be happy to share advice.

Wednesday, December 2, 2020

Church Encodings, Inductive Types, and Relational Parametricity

My blogging has been limited this past year due to RSI, but I do not want to leave things entirely fallow, and last year I wrote an email which can be edited into a decent enough blog post.

Quite often, people will hear that System F, the polymorphic lambda calculus, satisfies a property called relational parametricity. We also often hear people say that the parametricity property of System F lets you prove that a Church encoding of an inductive datatype actually satisfies all the equational properties we expect of inductive types.

But it's surprisingly hard to find an explicit account of how you go from the basic properties of parametricity (the relational interpretation of System F and the identity extension principle) to the proof that inductive types are definable. I learned how this works from Bob Atkey's paper Relational Parametricity for Higher Kinds, but the proofs are kind of terse, since his page count was mostly focused on the new stuff he had invented.

In the sequel, I'm going to assume that you know what the relational model of system F looks like, that the term calculus satisfies the abstraction theorem (i.e., the fundamental theorem of logical relations), and also that the model satisfies the identity extension property -- if you take the relational interpretation of a type B(α), and fill in the type variable α with the equality relation for the type A, then the relational interpretation of B[A/α] will be the equality relation for the type B[A/α]. In what follows I will often write Id[X] to mean the equality relation for the type X.

For completeness' sake, I also give a quick summary of the model at the end of the post.

Haskell-style functors in System F

Given all this preface, we now define a functor F as a type expression F(-) with a hole, along with an operation fmap : ∀a,b. (a → b) → F(a) → F(b), such that

fmap _ _ id = id
fmap _ _ f ∘ fmap _ _ g = fmap _ _ (f ∘ g)

I've written _ to indicate System F type arguments I'm suppressing. Basically think of these as Haskell-style functors.

Next, we can define the inductive type μF using the usual Church encoding as:

μF = ∀a. (F(a) → a) → a

foldF : ∀a. (F(a) → a) → μF → a
foldF = Λa. λf : F(a) → a. λx:μF. x [a] f

inF : F(μF) → μF
inF x = Λa. λf : F(a) → a. 
          let g : μF → a       = foldF [a] f in
          let h : F(μF) → F(a) = fmap [μF] [a] g in
          let v : F(a)          = h x in
          f v

I've written out inF with type-annotated local bindings to make it easier to read. If you inlined all the local bindings and suppressed type annotations, then it would read:

inF : F(μF) → μF
inF x = Λa. λf : F(a) → a. 
          f (fmap (foldF f) x)
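
For readers who prefer Haskell to raw System F, here is the same construction using RankNTypes. It is only a sketch: Haskell is not System F, so the parametricity reasoning below is carried out in the System F model, not in Haskell.

{-# LANGUAGE RankNTypes #-}

-- The Church encoding: μF = ∀a. (F(a) → a) → a.
newtype Church f = Church { runChurch :: forall a. (f a -> a) -> a }

foldChurch :: (f a -> a) -> Church f -> a
foldChurch alg m = runChurch m alg

inChurch :: Functor f => f (Church f) -> Church f
inChurch x = Church (\alg -> alg (fmap (foldChurch alg) x))

-- Example instance: lists of x, via ListF = K(1) ⊕ (K(x) ⊗ Id).
data ListF x r = NilF | ConsF x r

instance Functor (ListF x) where
  fmap _ NilF        = NilF
  fmap g (ConsF x r) = ConsF x (g r)

nil :: Church (ListF x)
nil = inChurch NilF

cons :: x -> Church (ListF x) -> Church (ListF x)
cons x xs = inChurch (ConsF x xs)

toList :: Church (ListF x) -> [x]
toList = foldChurch alg
  where alg NilF         = []
        alg (ConsF x xs) = x : xs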

F-algebra Homomorphisms and the Graph Lemma

Now, we can prove a lemma about F-algebra homomorphisms, which is the tool we will use to prove initiality. But what are F-algebra homomorphisms?

An F-algebra is a pair (X, g : F(X) → X). An F-algebra homomorphism between two F-algebras (X, k : F(X) → X) and (Y, k' : F(Y) → Y) is a function f : X → Y such that the following ASCII-art diagram commutes:

 F(X) — k  → X
  |           |
 fmap X Y f   f
  ↓           ↓
 F(Y) — k' → Y

That is, for all u : F(X), we want f(k u) = k'(fmap [X] [Y] f u)

Before we prove something about homomorphisms, we need to prove a technical lemma called the graph lemma, which will let us connect parametricity with our functorial definitions.

Graph Lemma: Let (F, fmap) be a functor, and h : A → B be a function. Define the graph relation <h> to be the relation {(a,b) | b = h(a) }. Then F<h> ⊆ <fmap [A] [B] h>.

Proof:

  1. First, note that fmap has the type ∀a b. (a → b) → F(a) → F(b).
  2. Since fmap is parametric, we can instantiate its quantified type variables a and b with the relations <h> and Id[B], giving us:

    (fmap [A] [B], fmap [B] [B]) ∈ (<h> → Id[B]) → (F<h> → F(Id[B]))

    By identity extension, F(Id[B]) is the equality relation Id[F(B)].
  3. Note that (h, id) ∈ <h> → Id[B].
  4. Hence (fmap [A] [B] h, fmap [B] [B] id) ∈ (F<h> → Id[F(B)])
  5. By functoriality (fmap of the identity is the identity), (fmap [A] [B] h, id) ∈ (F<h> → Id[F(B)])
  6. Assume (x, y) ∈ F<h>.
    1. Hence (fmap [A] [B] h x, id y) ∈ Id[F(B)]
    2. So fmap [A] [B] h x = y.
  7. Therefore (x, y) ∈ <fmap [A] [B] h>.
  8. Therefore F<h> ⊆ <fmap [A] [B] h>.

Graph relations are just functions viewed as relations, and the graph lemma tells us that the relational semantics of a type constructor applied to a graph relation <h> will behave like the implementation of the fmap term. In other words, it connects the relational semantics to the code implementing functoriality. (As an aside, this feels like it should be an equality, but I only see how to prove the inclusion.)

We use this lemma in the proof of the homomorphism lemma, which we state below:

Homomorphism Lemma: Let (F, fmap) be a functor. Given two F-algebras (A, k : F(A) → A) and (B, k' : F(B) → B), and an F-algebra homomorphism h : (A,k) → (B,k'), then for all e : μF, we have

e [B] k' = h (e [A] k)

Proof:

  1. First, by the parametricity of e, we know that

     (e [A], e [B]) ∈ (F<h> → <h>) → <h>
  2. We want to apply the arguments (k, k') to (e [A], e [B]), so we have to show that (k, k') ∈ F<h> → <h>.
  3. To show this, assume that (x, y) ∈ F<h>.

    1. Now we have to show that (k x, k' y) ∈ <h>.
    2. Unfolding the definition of <h>, we need (h (k x), k' y) ∈ Id[B].
    3. Since h is an F-algebra homomorphism, we have that h (k x) = k' (fmap [A] [B] h x).
    4. So we need to show (k' (fmap [A] [B] h x), k' y) ∈ Id[B].
    5. Now, we know that (x, y) ∈ F<h>.
    6. By the graph lemma, we know F<h> ⊆ <fmap [A] [B] h>.
    7. So (x, y) ∈ <fmap [A] [B] h>.
    8. Unfolding the definition, we know y = fmap [A] [B] h x.
    9. So we want (k' (fmap [A] [B] h x), k' (fmap [A] [B] h x)) ∈ Id[B].
    10. Since Id[B] is an equality relation, this is immediate.
  4. Hence (k, k') ∈ F<h> → <h>.
  5. Therefore (e [A] k, e [B] k') ∈ <h>.
  6. Unfolding the definition of <h>, we know e [B] k' = h(e [A] k). This is what we wanted to show.

Discussion:

The whole machinery of F-algebra homomorphisms basically exists to phrase the commuting conversions in a nice way. We just proved that for e : μF, we have

 e [B] k' = h (e [A] k)

Recall that for Church encoded inductive types, the fold is basically the identity, so this result is equivalent (up to beta) to:

 foldF [B] k' e = h (foldF [A] k e)

So this lets us shift contexts out of iterators if they are F-algebra homomorphisms. Note that this proof was also the one where the graph lemma is actually used.

The Beta and Eta Rules for Inductive Types

Next, we'll prove the beta- and eta-rules for inductive types. I'll do it in three steps:

  • The beta rule (eg, let (x,y) = (e1, e2) in e' == [e1/x, e2/y]e'),
  • The basic eta rule (eg, let (x,y) = e in (x,y) == e)
  • The commuting conversions (eg, C[e] = let (x,y) = e in C[(x,y)])

Theorem (beta rule): If k : F(A) → A and e : F(μF), then

 foldF [A] k (inF e) = k (fmap [μF] [A] (foldF [A] k) e)

Proof: This follows by unfolding the definitions and beta-reducing. (You don't even need parametricity for this part!)

This shows that for any F-algebra (A, k), foldF [A] k is an F-algebra homomorphism from (μF, inF) to (A, k).

Theorem (basic eta): For all e : μF, we have e = e [μF] inF

Proof:

  1. Assume an arbitrary B and k : F(B) → B.
    1. Note h = foldF [B] k is an F-algebra homomorphism from (μF, inF) to (B, k) by the beta rule.
    2. Then by our lemma, e [B] k = h (e [μF] inF).
    3. Unfolding, h (e [μF] inF) = foldF [B] k (e [μF] inF).
    4. Unfolding, foldF [B] k (e [μF] inF) = e [μF] inF [B] k.
    5. So e [B] k = e [μF] inF [B] k.
  2. So for all B and k : F(B) → B, we have e [B] k = e [μF] inF [B] k.
  3. By extensionality, e = e [μF] inF.

Theorem (commuting conversions): If k : F(A) → A and f : μF → A and f is an F-algebra homomorphism from (μF, inF) to (A, k), then f = foldF [A] k.

Proof:

  1. We want to show f = foldF [A] k.
  2. Unfolding foldF, we want to show f = λx:μF. x [A] k
  3. Assume e : μF.
    1. Now we will show f e = e [A] k.
    2. By the homomorphism lemma, and the fact that f : (μF, inF) → (A, k) we know that e [A] k = f (e [μF] inF).
    3. By the basic eta rule, e [A] k = f e
  4. So for all e, we have f e = e [A] k.
  5. Hence by extensionality f = λx:μF. x [A] k.

Note that what I (as a type theorist) called "commuting conversions" is exactly the same as "initiality" (to a category theorist), so now we have shown that the Church encoding of a functor actually lets us define the initial algebra.

Thus, we know that inductive types are definable!

Appendix: the Relational Model of System F

In this appendix, I'll quickly sketch one particular model of System F, basically inspired by the PER model of Longo and Moggi. What we will do is to define types as partial equivalence relations (PERs for short) on terms. Recall that a partial equivalence relation R on a set X is a relation on X which is (a) symmetric (i.e., if (x, x') ∈ R then (x',x) ∈ R), and (b) transitive (if (x,y) ∈ R and (y,z) ∈ R then (x,z) ∈ R).

We take the set of semantic types to be the set of PERs on terms which are closed under evaluation in either direction:

Type = { X ∈ PER(Term) | ∀x, x', y, y'. 
                           if x ↔∗ x' and y ↔∗ y' 
                           then (x,y) ∈ X ⇔ (x',y') ∈ X}

PERs form a complete lattice, with the meet given by intersection, which is the property which lets us interpret the universal quantifier of System F. We can then take a type environment θ to be a map from variables to semantic types, and interpret types as follows:

〚a〛θ ≜ θ(a)
〚A → B〛θ ≜ { (e, e') ∈ Term × Term | ∀ (t,t') ∈ 〚A〛θ. 
                                          (e t,e' t') ∈ 〚B〛θ }
〚∀a. A〛θ ≜ ⋂_{X ∈ Type} 〚A〛(θ, X/a)

This is the PER model of System F, which is quite easy to work with but not quite strong enough to model parametricity. To model parametricity, we need a second semantics for System F, the so-called relational semantics.

A relation R between two PERs A and B over a set X is a relation on X which respects A and B. Suppose that (a,a') ∈ A and (b,b') ∈ B. Then a relation R respects A and B when (a, b) ∈ R if and only if (a', b') ∈ R.

Now, suppose we have two environments θ₁ and θ₂ sending type variables to types, and a relation environment ρ that sends each type variable a to a relation respecting θ₁(a) and θ₂(a). Then we can define the relational interpretation of System F types as follows:

R〚a〛θ₁ θ₂ ρ ≜ ρ(a)
R〚A → B〛θ₁ θ₂ ρ ≜ { (e, e') ∈ Term × Term | ∀ (t,t') ∈ R〚A〛θ₁ θ₂ ρ. 
                                          (e t,e' t') ∈ R〚B〛θ₁ θ₂ ρ }
R〚∀a. A〛θ₁ θ₂ ρ ≜ ⋂_{X, Y ∈ Type, R ∈ Rel(X,Y)} 
                  R〚A〛(θ₁, X/a) (θ₂, Y/a) (ρ, R/a)

Additionally, we need to redefine the PER model's interpretation of the ∀a.A case, to use the relational model:

〚∀a. A〛θ ≜ ⋂_{X, Y ∈ Type, R ∈ Rel(X,Y)} 
                 R〚A〛(θ, X/a) (θ, Y/a) (Id[θ], R/a)

Here, to give a PER interpretation for the forall type, we use the relational interpretation, duplicating the type environment and using the identity relation for each variable (written Id[θ]). By identity relation, we mean that given a PER A, the PER A is a relation between A and A which respects A.

This modified definition ensures the model satisfies the identity extension property -- if you take the relational interpretation of a type B(α), and fill in the type variable α with the equality relation for the type A, then the relational interpretation of B[A/α] will be the equality relation for the type B[A/α]. As before, Id[X] means the equality relation for the type X.

The term calculus for System F also satisfies the abstraction property (aka the fundamental property of logical relations), which says that given a well-typed term Θ; · ⊢ e : A, two type environments θ₁ and θ₂, and a relation environment ρ between them, e is related to itself in R〚A〛θ₁ θ₂ ρ.

Friday, June 19, 2020

PLDI 2020 Conference Report

I just finished "attending" PLDI 2020, a programming languages conference. Like many conferences in computer science, due to COVID-19, on short notice the physical conference had to be cancelled and replaced with an online virtual conference. Talks were streamed to Youtube, questions were posted to a custom Slack channel, there was a startling variety of online chat applications to hold discussions with, and so on.

It was obviously an enormous amount of work to put together (there was even a custom chat application, Clowdr, written specifically for SIGPLAN(?) conferences), which makes me feel very unkind to report that for me, the online conference experience was a complete waste of time.

So, what went wrong?

To understand this, it is worth thinking about what the purpose of a conference is. The fundamental purpose of a conference is to meet people with shared interests and talk to them. The act of talking to people is fundamental, since it is how (a) you get to see new perspectives about subjects you are interested in, and (b) how you build the social networks that make it possible for you to become and remain an expert. (Recall the 19th century economist Alfred Marshall's description of this process: "The mysteries of the trade become no mysteries; but are as it were in the air, and children learn many of them unconsciously.”)

Even talks -- the things we build conferences around -- exist to facilitate this process of conversation. Talks at conferences really have two important functions: first, the topic of a talk is a filtering device, which helps identify the subset of the audience which is interested in this topic. This means that in the break after the talk, it is now much easier to find people to talk to who share interests with you.

Second, talks supply Schelling-style focal points: you and your interlocutor have common knowledge that you are both interested in the topic of the session, and you both also know that you saw the talk, which gives you a subject of conversation. (Note: as a speaker, you do communicate with your audience, but indirectly, as your talk becomes the seed of a conversation between audience members, which they will use to develop their own understandings.)

The fundamental problem with PLDI 2020 was that it was frustratingly hard to actually talk to people. The proliferation of timezones and chat applications meant that I found it incredibly difficult to actually find someone to talk to -- at least one of the apps (gather.town) just never worked for me, sending me into an infinite sign-up loop, and the others were either totally empty of people who were actually there, or did not seem suited for conversation. (One key issue is that many people log in to an app, and then stop paying attention to it, which makes presence indications useless.)

So if social distancing and travel restrictions due to COVID-19 remain in force (as seems likely), I think it would be better to simply convert our PL conferences fully into journals, and look for other ways to make online networking happen. The sheer amount of labour going into PLDI 2020 supplies strong evidence that simply trying to replicate a physical conference online cannot be made to work with any amount of effort.

However, before I convince everyone that I am totally right about everything, there are still online conferences that will happen -- for example, ICFP in August. So to make sure that I can look forward to the experience rather than dreading it, I will need to make a plan to actually make sure I talk to people.

So, if you are reading this, and are (or know) a researcher in PL who would like to talk, then give me a ping and we can arrange something for ICFP.

  • I am explicitly happy to talk to graduate students and postdocs about their research.

  • I know the most about dependent types, linear and modal types, type inference, separation logic, and program verification.

  • I know less about (but still like talking about) compiler stuff, theorem proving, SMT, Datalog, macros, and parsing.

  • This list is not an exhaustive list of things I'm interested in. One of the nice things about conversations is that you can use shared touchstones to discover how to like new things.

Once you get in touch, I'll put you on a list, and then once the conference talks are listed, we can organize a time to have a chat after we have all seen a talk we are interested in. (Having seen a talk means we will all have shared common knowledge fresh in our minds.) Assuming enough people are interested, I will aim for meetings of 3-5 people, including myself.