Thursday, March 17, 2022

Fold Considered Annoying

I recently read Shriram Krishnamurthi and Kathi Fisler's ICER 2021 paper, Developing Behavioral Concepts of Higher-Order Functions. In this paper, they study not the theory of higher-order functions, but rather the pedagogy of higher-order functions: how do we teach students how to program with them?

In particular, they studied how students understood different higher-order functions -- e.g., map, filter, fold and co. -- by asking them to perform tasks like classifying which functions were similar, deciding which example inputs/outputs could conceivably be produced by particular HOFs, and so on.

Since this is a blog, I can say that one thing I really liked about this paper is that it confirmed my prejudices! In particular, they found that fold was much harder for students to understand than any of the other higher-order functions they studied.

This is something I have long believed, both from anecdotal evidence and for mathematical reasons. I'll pass over the anecdotes (since the paper has actual data), and focus instead on what I see as the core mathematical issue: the specification of fold is dramatically more complex than the specification of map or filter.

Both map and filter have extremely simple equational specifications. For example, map is characterised by the following three laws:

map f []         = []
map f (xs ++ ys) = (map f xs) ++ (map f ys)
map f [x]        = [f x]

Basically, this says that map takes nils to nils, distributes over appends, and applies f to elements pointwise. All three of these equations are universally-quantified, unconditional equations, which is about as simple as a specification can get.

The specification for filter is only a little more complex:

filter p []         = []
filter p (xs ++ ys) = (filter p xs) ++ (filter p ys)
filter p [x]        = if p(x) then [x] else []

filter does nothing to nils, distributes over appends, and for each element either keeps it or discards it. Again, all three of these equations are universally-quantified, unconditional equations.
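To make these laws concrete, here is a small self-contained check of all six equations against Haskell's standard map and filter. (A sketch: the names mapLaws and filterLaws are mine, and the particular function and predicate used in any check are arbitrary sample choices.)

```haskell
-- Check the three map laws on the given sample inputs.
mapLaws :: (Int -> Int) -> [Int] -> [Int] -> Bool
mapLaws f xs ys =
     map f ([] :: [Int]) == []                         -- nils to nils
  && map f (xs ++ ys) == map f xs ++ map f ys          -- distributes over append
  && and [ map f [x] == [f x] | x <- xs ++ ys ]        -- pointwise action

-- Check the three filter laws on the given sample inputs.
filterLaws :: (Int -> Bool) -> [Int] -> [Int] -> Bool
filterLaws p xs ys =
     filter p ([] :: [Int]) == []
  && filter p (xs ++ ys) == filter p xs ++ filter p ys
  && and [ filter p [x] == (if p x then [x] else []) | x <- xs ++ ys ]
```

Of course, checking the laws on a few samples is no substitute for the proof, but it shows how directly the specifications translate into executable properties.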

On the other hand, what is the specification for fold? That is given by its universal property, which reads:

 [h [] = nil ∧
  (∀x, xs. cons x (h xs) = h (x :: xs))]  // Uh-oh!
    ⟺
 (fold nil cons = h)

This is not an equational specification at all! It is a bi-implication, and while the right-to-left direction is basically the definition of fold, the left-to-right direction contains a formula with a logical quantifier in the premise. We know from nearly a century of experience with high school calculus that quantifier alternations are a giant obstacle to understanding: every year thousands of students find themselves utterly baffled by epsilon-delta arguments, and here we have something harder – we have an implication with a universally quantified premise.
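To see the right-to-left direction in action: any h satisfying the two premises is forced to be the fold. For instance, a hand-written sum satisfies them with nil = 0 and cons = (+), so the universal property says it coincides with foldr (+) 0. (A small sketch; mySum and agreesWithFold are just illustrative names.)

```haskell
-- mySum satisfies  mySum [] = 0  and  mySum (x:xs) = (+) x (mySum xs),
-- which are exactly the two premises with nil = 0 and cons = (+).
-- The universal property then forces  mySum = foldr (+) 0.
mySum :: [Int] -> Int
mySum []       = 0
mySum (x : xs) = x + mySum xs

agreesWithFold :: [Int] -> Bool
agreesWithFold xs = mySum xs == foldr (+) 0 xs
```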

So I've always been very careful whenever I teach students about the fold function, because I think it has a very challenging spec for students to wrap their heads around.

If you're watching out for quantifiers, that also suggests that take-while will be one of the more challenging functions for students – take-while p xs returns the longest prefix of xs satisfying the predicate p, and "longest prefix" is going to involve another higher-rank quantified formula.
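To make that quantifier explicit, here is one way of writing the "longest prefix" specification directly (a sketch; isLongestGoodPrefix is my name for it). Note the quantification over all prefixes hiding in the maximality clause:

```haskell
import Data.List (inits, isPrefixOf)

-- tw is the longest prefix of xs whose elements all satisfy p:
-- (1) it is a prefix, (2) its elements satisfy p, and (3) every
-- strictly longer prefix contains an element that fails p.
-- Clause (3) quantifies over all prefixes of xs.
isLongestGoodPrefix :: (a -> Bool) -> [a] -> [a] -> Bool
isLongestGoodPrefix p xs tw =
     tw `isPrefixOf` xs
  && all p tw
  && all (\pre -> length pre <= length tw || not (all p pre)) (inits xs)
```

On finite inputs the quantifier is checkable by enumeration, but as a formula to reason with, it has exactly the alternation structure that makes fold's specification hard.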

The evidence in the paper seems weakly consistent with this idea: they report that take-while does seem harder for students, but not by as much as I would have expected.

Anyway, I found this a really interesting paper!

Thursday, March 3, 2022

Simple Type Inference for System F

Henry Mercer, Cameron Ramsay, and I have a new draft paper on type inference out! Check out Implicit Polarized F: Local Type Inference for Impredicativity.

System F, the polymorphic lambda calculus, features the principle of impredicativity: polymorphic types may be explicitly instantiated at other types, enabling many powerful idioms such as Church encoding and data abstraction. Unfortunately, type applications need to be implicit for a language to be human-usable, and the problem of inferring all type applications in System F is undecidable. As a result, language designers have historically avoided impredicative type inference.

We reformulate System F in terms of call-by-push-value, and study type inference for it. Surprisingly, this new perspective yields a novel type inference algorithm which is extremely simple to implement (not even requiring unification), infers many types, and has a simple declarative specification. Furthermore, our approach offers type theoretic explanations of how many of the heuristics used in existing algorithms for impredicative polymorphism arise.

This algorithm is absurdly easy to implement, too.

Because we don't need unification, it also looks like this kind of algorithm ought to play very nicely with dependent types. My PhD student Ilya Kaysin is looking at this problem now, so stay tuned for even more.

Monday, January 17, 2022

Static Typing vs. Dynamic Typing vs. Dana Scott

If you read about programming languages on the Internet, you'll inevitably run into (or even participate in) long arguments about whether dynamic or static typing is better. A long time ago, before I went to grad school, I enjoyed arguing with people on the Internet about this very topic.

But I no longer find myself interested in that debate. This is for two reasons. One, like so many addicts before me, I've moved on to harder drugs, such as frequentism versus Bayesianism. (Frequentism is good, actually, and p-values are one of the great inventions of science. Just so you know.)

Two, I learned that these debates largely end up being echoes of an argument Dana Scott had with himself. You can tell he's a genius, since he ended up producing many of the strongest arguments for and many of the strongest arguments against!

One of the best arguments against grounding semantics on the untyped lambda calculus is Dana Scott's unpublished paper from 1969, A type-theoretical alternative to ISWIM, CUCH, OWHY. In this paper, he argues that untyped syntactic models are difficult to reason about, and we should design meta-languages with a clear mathematical status, in order to facilitate reasoning about languages and programs.

But why is it unpublished? Let's see what Scott himself has to say:

In particular, it was recognized that there were posets whose continuous function spaces of self-maps could be identified with the given posets themselves. And so there actually were “traditional” models of lambda-calculus that could be defined without first going through the proof theory of the formal system itself (and which could be related to many other mathematically meaningful structures and theories as well).

This revelation was both gratifying and embarrassing. After writing with tiresome sarcasm about the lack of meaning in the type-free lambda-calculus introduced only formally, the author himself found an interesting kind of semantical interpretation for the “type-free” language.

The result that a universal domain can be constructed from these lattice theoretic models of the lambda calculus is described in his epochal 1971 paper Continuous Lattices.

Indeed, Scott's results continue to have a lot of influence today. This goes well beyond being a technical tool in the toolbox: the sensibility and style of thinking he invented continue to pervade our research culture and shape the latest methods.

  1. The invention of the universal domain taught us that an inconsistent logic can still have a meaningful semantics. Prior to this work, it was believed that inconsistent logics were degenerate (due to everything following from false), but the D∞ model disproves that — it dramatically expands the kinds of arguments that we can model mathematically.

    This is incredibly philosophically important, since you can't use mathematics to study something you don't have a mathematical model of! And since as mathematicians we spend a lot of our time making wrong arguments, it's very valuable to be able to understand them.

    (As an unrelated aside, this is one of the things that I as a proof theorist most admire about finite model theory – things like conflict-driven clause learning in SAT solvers exploit the idea that being wrong can be productive in a really elegant way.)

  2. Having a universal domain makes it possible to formalise the intuition that a type is a subset of the set of all possible values in a mathematically coherent way. Naively, this idea is problematic, but with a universal domain a la Scott the approach works ("types as retracts" in the jargon).

    This approach underpins the functioning of program logics like Hoare logic (see Mellies and Zeilberger's paper Functors are Type Refinement Systems), is used to define the semantics of dependent type theories like Nuprl, and is also important for formulating the semantics of System F, the polymorphic lambda calculus.

  3. The construction of the universal domain as the limit of a sequence of finitary approximations lets you derive the abstract domains of abstract interpretation by taking Scott's inverse limit construction and cutting off the sequence at a finite point rather than going all the way to infinity. This is important for understanding how static analysis fits into the landscape of semantics -- see David Schmidt's paper Abstract Interpretation from a Denotational Semantics Perspective.

  4. In the domain-theoretic model, the idea of finitary information is realised by modelling a program as either giving you an answer or going into an infinite loop. If you change the idea of no-answer from looping to signalling an error (say, with an exception), then you can similarly derive gradual typing!

    Max S. New and Amal Ahmed's Graduality from Embedding-projection Pairs really nails this idea down, but it has been one of the driving intuitions since the very beginning of work on the topic. Fritz Henglein's original paper explicitly formulates the rules for casts in analogy with the embedding-projection pairs found in the inverse limit construction.

    There's an important lesson here about how analogies can be productive for a long time before we give a completely formal expression of them -- Henglein wrote his paper in 1994!

  5. Since Dana Scott has very good taste, his emphasis was on the algebraic properties of his model construction, and so he quickly moved from complete lattices to domain theory.

    But this also meant that he laid out the path for people to look at other ways of obtaining universal domains. One super-important approach was pioneered by Maurice Nivat and the "Dutch school" of denotational semantics in the late 1970s and early 1980s, where instead of domains people looked at metric spaces and Banach's fixed-point theorem. Bakker and Zucker's 1982 paper is a still-readable account of the early thinking on this topic, and the approach reached a mature form with the work of America and Rutten.

    This technique is still extremely heavily used today, except that for various reasons we have renamed it "step-indexing". This is the key technical mechanism which lets us apply universal domain constructions to give a good semantics to assembly language, because the ability to pun data and code by setting the PC and jumping needs the same mathematical machinery as interpreting the untyped lambda calculus.

  6. This point about the equivocation of code and data in assembly language holds more generally. For example, Löb's theorem can be understood as a modal variant of the derivation of the Y-combinator, which illustrates how the interaction of self-reference and quotation resembles the ideas behind the D∞-style structure. This idea hasn't been worked out as nicely as New and Ahmed worked things out for gradual types, but what's a blog post without a provocation to future research?
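The embedding-projection pairs from point 4 are easy to sketch: embedding a typed value into a "dynamic" universe never fails, while projecting back out may signal a cast error. (A toy illustration, not New and Ahmed's actual development; Dyn here is a made-up two-constructor universe.)

```haskell
-- A tiny "dynamic" universe and an embedding-projection pair for Int.
data Dyn = DInt Int | DBool Bool deriving (Eq, Show)

-- Embedding: always succeeds.
embedInt :: Int -> Dyn
embedInt = DInt

-- Projection: Nothing models a runtime cast error.
projectInt :: Dyn -> Maybe Int
projectInt (DInt n) = Just n
projectInt _        = Nothing
```

The retraction law projectInt (embedInt n) == Just n says a round-trip through the embedding never fails; casts can only fail in the projection direction, e.g. projectInt (DBool True) == Nothing.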

Thinking of "typed versus untyped languages" sort of misses the point, theoretically. We can neither pull them apart, nor do we really want to. To quote Heraclitus (as translated by Kathleen Freeman):

They do not understand how that which differs with itself is in agreement: harmony consists of opposing tension, like that of the bow and the lyre.

Thursday, December 30, 2021

What is a programming language?

One of the more surprising facts about the discipline of programming language theory is that it is actually possible to define what programming languages are in a reasonably mathematically satisfying way.

  1. A language is a presentation of a (generalised) algebraic theory. Basically, think of a language as a set of generators and relations in the style of abstract algebra.

    You need to beef up the universal algebra story a bit to handle variables and binding (e.g., see the work of Fiore and Hamana on higher-order algebraic theories), but the core intuition that a language is a set of generators for terms, plus a set of equations these terms satisfy is exactly the right one.

    For example:

    1. the simply-typed lambda calculus
    2. regular expressions
    3. relational algebra
  2. A model of a language is literally just any old mathematical structure which supports the generators of the language and respects the equations.

    For example:

    1. we can model the typed lambda calculus using sets for types and mathematical functions for terms,
    2. we can model regular expressions as denoting particular languages (ie, sets of strings)
    3. we can model relational algebra expressions as sets of tuples
  3. A model of computation or machine model is basically a description of an abstract machine that we think can be implemented with physical hardware, at least in principle. So these are things like finite state machines, Turing machines, Petri nets, pushdown automata, register machines, circuits, and so on. Basically, think of models of computation as the things you study in a computability class.

    Nearly all models of computation are transition systems, where you have a set of states, and a transition function which tells you how a computation can change one state into another. There are lots of generalisations of this (relations instead of functions, I/O, system calls, message passing communication, etc.), and the slick way of accounting for this generality is by modelling transition systems as coalgebras (i.e., a set S paired with a coalgebra map step : S → F(S), where F is a functor).

    In addition to this, every model of computation also comes with a notion of observation – in its most basic form, how do you tell when a program has finished executing? Observations can get quite intricate: to model computational complexity, you have to be able to observe runtime and memory usage, and for concurrent programs you can observe intermediate program states (due to the possibility of concurrent interference). More practically, the compiler correctness theorem for something like CompCert says that the sequence of system calls the compiled binary produces will be the same as the one the C abstract machine will produce.

    I don't know a nice algebraic/categorical characterisation of the notion of observation, though probably Alexandra Silva does.

    The Church-Turing thesis characterises which abstract machines we think it is possible to physically implement.

    Some examples of models of computation are:

    1. Turing machines,
    2. finite state machines
    3. RAM machine programs
  4. A language is a programming language when you can give at least one model of the language via a machine model.

    For example:

    1. the types of the lambda calculus can be viewed as partial equivalence relations over Gödel codes for some universal Turing machine, and the terms of a type can be assigned to equivalence classes of the corresponding PER.

    2. Regular expressions can be interpreted into finite state machines quotiented by bisimulation.

    3. A set in relational algebra can be realised as equivalence classes of B-trees, and relational algebra expressions as nested loops walking over these trees.

    Note that in all three cases we have to quotient the states of the machine model by a suitable equivalence relation to preserve the equations of the language's theory.

    This quotient is very important, and is the source of a lot of confusion. It hides the equivalences the language theory wants to deny, but that is not always what the programmer wants – e.g., is merge sort equal to bubble sort? As mathematical functions, they surely are, but if you consider them as operations running on an actual computer, then we will have strong preferences! (Conversely, the ability to equate radically different machine configurations is the reason why optimising compilers and refactoring are things.)

    Moreover, we have to set up this quotient so it agrees with the machine model's notion of observation – for example, we want exactly and only the strings accepted by a regular expression to end in accepting states of the finite state machine. This property is called adequacy, and is what justifies the claim that this is a reasonable implementation of the programming language.

In other words, a programming language is a language which has an adequate implementation in a model of computation. Note that this definition does not mandate the implementation strategy!

A common source of confusion arises from the fact that if you have a nice type-theoretic language (like the STLC), then:

  1. the term model of this theory will be the initial model in the category of models, and
  2. you can turn the terms into a machine model by orienting some of the equations the lambda-theory satisfies and using them as rewrites.

These oriented equations are where operational semantics comes from, at least for purely functional languages. As a result we often abuse language to say the simply typed lambda calculus "is" a programming language.

But in fact, the lambda calculus is a presentation of an algebraic theory, and it is the same programming language regardless of whether it is implemented by term rewriting or by compilation to x86 machine code.
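To make the regular-expression thread of examples concrete, here is a sketch of an adequate implementation via Brzozowski derivatives (my choice of machine model here; nothing above mandates it). The machine's states are regexes themselves, and syntactically different states such as Alt r r and r accept exactly the same strings – the quotient by this equivalence is what recovers the equations of the theory:

```haskell
-- Regular expressions as syntax; the "machine" runs by taking
-- derivatives, and acceptance is observed via nullability.
data Re = Empty | Eps | Chr Char | Alt Re Re | Seq Re Re | Star Re
  deriving (Eq, Show)

-- Does the regex accept the empty string?
nullable :: Re -> Bool
nullable Empty     = False
nullable Eps       = True
nullable (Chr _)   = False
nullable (Alt r s) = nullable r || nullable s
nullable (Seq r s) = nullable r && nullable s
nullable (Star _)  = True

-- The transition function: deriv c r accepts { w | r accepts c:w }.
deriv :: Char -> Re -> Re
deriv _ Empty     = Empty
deriv _ Eps       = Empty
deriv c (Chr c')  = if c == c' then Eps else Empty
deriv c (Alt r s) = Alt (deriv c r) (deriv c s)
deriv c (Seq r s)
  | nullable r    = Alt (Seq (deriv c r) s) (deriv c s)
  | otherwise     = Seq (deriv c r) s
deriv c (Star r)  = Seq (deriv c r) (Star r)

-- Run the machine along the string, then observe acceptance.
matches :: Re -> String -> Bool
matches r = nullable . foldl (flip deriv) r
```

Here matches (Alt r r) and matches r are different machine states with identical observable behaviour, which is exactly the equivalence the adequacy condition asks us to quotient by.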

Wednesday, December 1, 2021

Obliteratingly Fast Parser Combinators

Jeremy Yallop, Ningning Xie, and I have a new draft out about combinator parsing, entitled Fusing Lexing and Parsing. The abstract is short, but sweet:

Lexers and parsers are typically defined separately and connected by a token stream. This separate definition is important for modularity, but harmful for performance.

We show how to fuse together separately-defined lexers and parsers, drastically improving performance without compromising modularity. Our staged parser combinator library, flap, provides a standard parser combinator interface, but generates specialized token-free code that runs several times faster than ocamlyacc on a range of benchmarks.

I'm excited about this for three reasons.

First, even though the performance is way better this time around, going from competitive with lex+yacc to 3-9 times faster, the implementation is actually simpler than in our 2019 paper – we don't need a complex staging pipeline because we introduce an IR which makes all the invariants needed for fast code generation lexically apparent.

Second, really fast parsers are potentially a powerful tool for applications like IDEs. If you are fast enough, you are free from having to mess around with things like incremental parsing, because if you can parse a whole editor buffer in a millisecond it's fine to just reparse everything on every keystroke. Imagine what could happen if we could fuse in typechecking, too! That's currently just wishful thinking (aka "future work"), of course, but since typechecking is mostly a tree traversal, it's not totally hopeless.

Finally, at a meta-level this is a good illustration of the incorrectness of the standard engineering proverb "fast, good, cheap – pick two". This advice presumes that we lie on the Pareto frontier, so the best you can do is manage tradeoffs. But that's actually a false assumption – it is possible to do things which are unconditionally better along all three dimensions.

No source code is ready for release yet, but Jeremy is likely to release an OPAM package for it in due course.

Wednesday, November 17, 2021

POPL till you DROPL

My PhD student Faustyna Krawiec, along with me and several other coauthors, has a new paper out, which is due to appear at POPL 2022:

Provably Correct, Asymptotically Efficient, Higher-Order Reverse-Mode Automatic Differentiation

Faustyna Krawiec, Simon Peyton-Jones, Neel Krishnaswami, Tom Ellis, Richard A. Eisenberg, and Andrew Fitzgibbon.

In this paper, we give a simple and efficient implementation of reverse-mode automatic differentiation, which both extends easily to higher-order functions, and has run time and memory consumption linear in the run time of the original program. In addition to a formal description of the translation, we also describe an implementation of this algorithm, and prove its correctness by means of a logical relations argument.

We give an incredibly simple implementation of reverse-mode AD, which works in a language with higher-order functions, and is asymptotically efficient. When we started the work I was hoping it would be a good excuse to learn some differential geometry, but it turned out there is such a simple algorithm that we didn't need any fancy machinery! So I still don't know any differential geometry, but we've got a cool AD algorithm to share.
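The paper is the place to read the actual algorithm, but the general shape of reverse mode fits in a few lines: pair every value with a backpropagator that maps the sensitivity of the final output to gradient contributions for the inputs. (A toy sketch of my own, not the paper's construction – in particular it retraverses shared subexpressions, so it lacks the linear-time guarantee the paper is about. All names here are made up for illustration.)

```haskell
-- Gradient contributions, keyed by input variable name.
type Grad = [(String, Double)]

-- Merge gradients, summing contributions to the same variable.
addGrad :: Grad -> Grad -> Grad
addGrad = foldl insert
  where
    insert [] (x, d) = [(x, d)]
    insert ((y, e) : g) (x, d)
      | x == y    = (y, d + e) : g
      | otherwise = (y, e) : insert g (x, d)

-- A value paired with its backpropagator: given the output
-- sensitivity, return gradient contributions for the inputs.
data D = D { primal :: Double, backprop :: Double -> Grad }

var :: String -> Double -> D
var x v = D v (\s -> [(x, s)])

constD :: Double -> D
constD v = D v (const [])

-- Sum rule: sensitivity flows unchanged to both summands.
addD :: D -> D -> D
addD (D a fa) (D b fb) = D (a + b) (\s -> addGrad (fa s) (fb s))

-- Product rule: sensitivity to each factor is scaled by the other.
mulD :: D -> D -> D
mulD (D a fa) (D b fb) = D (a * b) (\s -> addGrad (fa (s * b)) (fb (s * a)))
```

For f(x, y) = x·y + x at (3, 4), seeding the output sensitivity with 1 gives ∂f/∂x = 5 and ∂f/∂y = 3.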

That's the only POPL paper this year with my name on it, but it turns out that a lot of students and close collaborators have papers without me as well! It might not matter for my academic prestige, but it makes me happy, especially when I see how interesting the work is!

  1. My PhD student Dima Szamozvancev and his other advisor Marcelo Fiore have a paper, Formal Metatheory of Second-Order Abstract Syntax. They reformulated Marcelo's work on second-order abstract syntax to fit into a constructive type theory, and then mechanised it in Agda. As a result, you can give a description of a syntax, and automatically get out the substitution operators and their correctness properties. One particularly powerful feature is that it doesn't just handle binding, but also metavariables. This is essential if you want to handle things like inference rules as first-class values!

  2. My newest PhD student, Ilya Kaysin, has a paper with several other people, The Leaky Semicolon (no link yet), giving a compositional semantics to weak memory. As a fun fact, Ilya is entering his PhD with more POPL publications than I exited with. :)

  3. Vikraman Chaudhury is a (former) PhD student of Amr Sabry's, but he spent a year visiting me and we wrote a paper together on the semantics of capabilities. At this POPL, he has a paper with Amr Sabry and Jacek Karwowski, Symmetries in Reversible Programming: From Symmetric Rig Groupoids to Reversible Programming Languages, which gives a nice semantics for reversible programming languages.

Tuesday, September 28, 2021

Design Benchmarks

A few weeks ago, Shriram Krishnamurthi asked an interesting question on Twitter:

It's common in PL to have benchmarks for performance. But does anyone know of PL design benchmarks?

This caught my attention because my longest-running research program -- my work on functional reactive programming, with 7 papers so far -- has been explicitly guided by a handful of design benchmarks.

I've never written them down explicitly, so I thought I'd take the opportunity to do so.

  1. The language must validate the full beta- and eta-laws for all of the type constructors in the language.

    This is a proxy benchmark, of course. My actual goal is to make languages which are easy to program with and reason about.

    However, this is an open-ended and vague goal, and so I needed to find a specific, concrete operationalisation of it. There are many possible proxies for this goal, and the one I picked was to look for programming languages which validate a rich collection of semantically-justified equations. This specific choice was made for three reasons.

    • First, this goal pushes me away from spending time on quality-of-life features like rich standard libraries. These features obviously do make a language easier to use, but they do not directly touch upon any FRP-specific problems.

    • Second, the search for well-behaved equational properties pushes me towards taking features known to be useful, and trying to decompose them into smaller type-theoretic components.

      As Jeff Raskin once noted, in user interfaces "intuitive" and "familiar" are basically synonyms, which means it is good to have a design benchmark which does not penalise doing things that are different from the old standard.

    • Third, déformation professionnelle. This particular goal is the one that the toolbox I've acquired over the years -- expertise in semantics and type theory -- is best-adapted to move me towards.

      If my expertise lay in running user studies, writing model checkers, or programming with continuations, then I would probably approach the problem differently! For example, the work on the Garnet toolkit is both extremely good and very influential, and very different in character from anything I have pursued.

      Obviously, I could learn how to write a constraint solver or run a user study, but the art is long, and life short. I think it's OK for people to focus on different things, and to learn from each other. :)

  2. I have a small collection of tiny programs which are intended to model various common UI patterns, and the languages I design must express these programs.

    The purpose of these programs is basically to prevent over-simplifying away real complexity. If we're trying to optimise a language design for ease of reasoning, the easiest way of "achieving" the goal is to design an impoverished language.

    For example, I check whether I can write programs which dynamically create new widgets, to ensure that I can't design something which restricts programmers to a fixed dataflow graph. Likewise, a program with two widgets sending notifications "sideways" to each other keeps me from trying to rely on hierarchical encapsulation. (This is super tempting, but as far as I can tell, no actual program consistently does this.)

  3. The language must admit a simple implementation strategy.

    This design principle arose as a reaction to a negative lesson.

    My first paper on higher-order FRP, Ultrametric Semantics of Reactive Programs, used a system of dynamic, incremental dependency propagation (a baby version of the stuff Matt Hammer and Umut Acar used to do), and it ended up requiring a really weird implementation strategy for closures. In addition to being able to call them, I also needed to be able to move them forward in time, and this required a really weird set of hacks to go into a closure and move everything in its captured environment forward in time as well.

    This was painful to implement, difficult to prove correct, and it leaked memory like a sieve. Trying to deal with these issues led directly to my ICFP 2013 paper on FRP, which is basically a type system for immediate mode GUI programming.

    Note that I don't mind if the implementation strategy is challenging to prove correct (eg, my POPL 2021 paper with Simon Spies and Derek Dreyer is about inventing some needed machinery for modelling retained-mode GUI implementations), but the actual running code has to be simple.

    This means I haven't looked at any dataflow or constraint stuff in years, but sometimes a design rule is there to prevent you from doing certain things.
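As a footnote to benchmark 1: the beta- and eta-laws are easy to state concretely. For pairs, beta says projecting out of a constructed pair gives back the component, and eta says every value of pair type is the pair of its projections. (A minimal illustration; in Haskell these particular laws are checkable pointwise.)

```haskell
-- beta for pairs: fst (a, b) = a
betaFst :: Int -> Int -> Bool
betaFst a b = fst (a, b) == a

-- eta for pairs: p = (fst p, snd p)
etaPair :: (Int, Int) -> Bool
etaPair p = p == (fst p, snd p)
```

The design benchmark asks for laws of this shape, for every type constructor in the language, to hold up to the language's semantic equality.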