Thursday, March 17, 2022

Fold Considered Annoying

I recently read Shriram Krishnamurthi and Kathi Fisler's ICER 2021 paper, Developing Behavioral Concepts of Higher-Order Functions. In this paper, they study not the theory of higher-order functions, but rather the pedagogy of higher-order functions: how do we teach students how to program with them?

In particular, they studied how students understood different higher-order functions -- e.g., map, filter, fold and co. -- by asking them to perform tasks like classifying which functions were similar, judging which example input/outputs could conceivably be produced by particular HOFs, and so on.

Since this is a blog, I can say that one thing I really liked about this paper is that it confirmed my prejudices! In particular, they found that fold was much harder for students to understand than any of the other higher-order functions they studied.

This is something I have long believed, both from anecdotal evidence and for mathematical reasons. I'll pass over the anecdotes (since the paper has actual data), and focus instead on what I see as the core mathematical issue: the specification of fold is dramatically more complex than the specification of map or filter.

Both map and filter have extremely simple equational specifications. For example, map is characterised by the following three laws:

map f []         = []
map f (xs ++ ys) = (map f xs) ++ (map f ys)
map f [x]        = [f x]

Basically, this says that map takes nils to nils, distributes over appends, and applies f to elements pointwise. All three of these equations are universally-quantified, unconditional equations, which is about as simple as a specification can get.
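
These laws can be spot-checked on concrete inputs. Here's a quick Haskell sketch (the name mapLaws and the choice of succ as the example f are mine, purely for illustration):

```haskell
-- Spot-check the three map laws on sample inputs (a sketch, not a proof).
mapLaws :: Bool
mapLaws =
     map succ ([] :: [Int]) == []                                  -- nil law
  && map succ ([1,2] ++ [3]) == map succ [1,2] ++ map succ [3]     -- append law
  && map succ [5] == [succ 5]                                      -- singleton law
```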

The specification for filter is only a little more complex:

filter p []         = []
filter p (xs ++ ys) = (filter p xs) ++ (filter p ys)
filter p [x]        = if p(x) then [x] else []

filter does nothing to nils, distributes over appends, and for each element either keeps it or discards it. Again, all three of these equations are universally-quantified, unconditional equations.
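
The same kind of spot-check works here; a small Haskell sketch, using even as an example predicate (the name filterLaws is mine):

```haskell
-- Spot-check the three filter laws with p = even (a sketch, not a proof).
filterLaws :: Bool
filterLaws =
     filter even ([] :: [Int]) == []                                       -- nil law
  && filter even ([1,2] ++ [3,4]) == filter even [1,2] ++ filter even [3,4] -- append law
  && filter even [3] == (if even 3 then [3] else [])                       -- singleton law
```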

On the other hand, what is the specification for fold? That is given by its universal property, which reads:

 [h [] = nil ∧
  (∀x, xs. cons x (h xs) = h (x :: xs))]  // Uh-oh!
          iff 
(fold nil cons = h)

This is not an equational specification at all! It is a bi-implication, and while the right-to-left direction is basically the definition of fold, the left-to-right direction contains a formula with a logical quantifier in the premise. We know from nearly a century of experience with high school calculus that quantifier alternations are a giant obstacle to understanding: every year thousands of students find themselves utterly baffled by epsilon-delta arguments, and here we have something harder – we have an implication with a universally quantified premise.
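
To make the right-to-left direction concrete, here's a small Haskell sketch (the name foldCheck is mine, and note that Haskell's foldr takes the cons argument before the nil). Taking h = sum, nil = 0, and cons = (+), both premises hold, so the universal property says sum must coincide with the fold:

```haskell
-- h = sum satisfies both premises with nil = 0 and cons = (+),
-- so by the universal property it must equal foldr (+) 0.
foldCheck :: Bool
foldCheck =
     sum ([] :: [Int]) == 0                            -- first premise
  && all (\(x, xs) -> x + sum xs == sum (x : xs))
         [(1, [2,3]), (4, []), (5, [6,7])]             -- spot-check the ∀ premise
  && sum [1,2,3] == foldr (+) 0 [1,2,3]                -- the conclusion
```

Of course, the code can only sample the universally quantified premise at a few points, which is itself a symptom of the problem: the quantifier is doing real logical work.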

So I've always been very careful whenever I teach students about the fold function, because I think it has a very challenging spec for students to wrap their heads around.

If you're watching out for quantifiers, that also suggests that take-while will be one of the more challenging functions for students – take-while p xs returns the longest prefix of xs satisfying the predicate p, and "longest prefix" is going to involve another higher-rank quantified formula.
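
One way to render the "longest prefix" condition is: every element of the result satisfies p, and the first element past the result (if any) does not. A Haskell sketch of that check (the name longestPrefix is mine):

```haskell
-- Spot-check the "longest prefix" spec for takeWhile: the result
-- satisfies p pointwise, and the element just past it (if any) fails p.
longestPrefix :: (Int -> Bool) -> [Int] -> Bool
longestPrefix p xs = all p pre && (null rest || not (p (head rest)))
  where
    pre  = takeWhile p xs
    rest = drop (length pre) xs
```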

The evidence in the paper seems weakly consistent with this idea: they report that take-while does seem harder for students, though not by as much as I would have expected.

Anyway, I found this a really interesting paper!

Thursday, March 3, 2022

Simple Type Inference for System F

Henry Mercer, Cameron Ramsay, and I have a new draft paper on type inference out! Check out Implicit Polarized F: Local Type Inference for Impredicativity.

System F, the polymorphic lambda calculus, features the principle of impredicativity: polymorphic types may be explicitly instantiated at other types, enabling many powerful idioms such as Church encoding and data abstraction. Unfortunately, type applications need to be implicit for a language to be human-usable, and the problem of inferring all type applications in System F is undecidable. As a result, language designers have historically avoided impredicative type inference.

We reformulate System F in terms of call-by-push-value, and study type inference for it. Surprisingly, this new perspective yields a novel type inference algorithm which is extremely simple to implement (not even requiring unification), infers many types, and has a simple declarative specification. Furthermore, our approach offers type theoretic explanations of how many of the heuristics used in existing algorithms for impredicative polymorphism arise.

This algorithm is absurdly easy to implement, too.

Because we don't need unification, it also looks like this kind of algorithm ought to play very nicely with dependent types. My PhD student Ilya Kaisin is looking at this problem now, so stay tuned for even more.