Thursday, March 17, 2022

Fold Considered Annoying

I recently read Shriram Krishamurthi and Kathi Fisler's ICER 2021 paper, Developing Behavioral Concepts of Higher-Order Functions. In this paper, they study not the theory of higher-order functions, bur rather the pedagogy of higher-order functions: how do we teach students how to program with them?

In particular, they studied how students understood different higher-order functions -- e.g., map, filter, fold and co. -- by asking them to perform tasks like classifying which functions were similar, which example input/outputs could conceivably be produced by particular HOFs, and so on.

Since this is a blog, I can say that one thing I really liked about this paper is that it confirmed my prejudices! In particular, they found that fold was much harder for students to understand than any of the other higher-order functions they studied.

This is something I have long believed, both from anecdotal evidence and for mathematical reasons. I'll pass over the anecdotes (since the paper has actual data), and focus instead on what I see as the core mathematical issue: the specification of fold is dramatically more complex than the specification of map or filter.

Both map and filter have extremely simple equational specifications. For example, map is characterised by the following three laws:

map f []         = []
map f (xs ++ ys) = (map f xs) ++ (map f ys)
map f [x]        = [f x]

Basically, this says that map takes nils to nils, distributes over appends, and applies f to elements pointwise. All three of these equations are universally-quantified, unconditional equations, which is about as simple as a specification can get.

The specification for filter is only a little more complex:

filter p []         = []
filter p (xs ++ ys) = (filter p xs) ++ (filter p ys)
filter p [x]        = if p(x) then [x] else []

filter does nothing to nils, distributes over appends, and for each element either keeps it or discards it. Again, All three of these equations are universally-quantified, unconditional equations.

On the other hand, what is the specification for fold? That is given by its universal property, which reads:

 [h [] = nil ∧
  (∀x, xs. cons x (h xs) = h (x :: xs))]  // Uh-oh!
          iff 
(fold nil cons = h)

This is not an equational specification at all! It is a bi-implication, and while the right-to-left direction is basically the definition of fold, the left-to-right direction contains a formula with a logical quantifier in the premise. We know from nearly a century of experience with high school calculus that quantifier alternations are a giant obstacle to understanding: every year thousands of students find themselves utterly baffled by epsilon-delta arguments, and here we have something harder – we have an implication with a universally quantified premise.

So I've always been very careful whenever I teach students about the fold function, because I think it has a very challenging spec for students to wrap their heads around.

If you're watching out for quantifiers, that also suggests that take-while will also be one of the more challenging for functions students – takewhile p xs returns the longest prefix of xs satisfying the predicate p, and "longest prefix" is going to be involve another higher-rank quantified formula.

The evidence in the paper seems weakly consistent with this idea, in that they report that take-while does seem harder for students, but it does not seem to be as much harder as I would have expected.

Anyway, I found this a really interesting paper!

5 comments:

  1. I think I'm missing something. Your "specifications" for map and filter seem indistinguishable from the way those functions are implemented. Correspondingly, why can't I just present the following analogous "specification" for fold?

    fold z s [] = z
    fold z s (x::xs) = s x (fold z s xs)

    I don't see why this "specification" is any harder to understand than the ones for map and filter.

    ReplyDelete
    Replies
    1. Hi, Derek!

      The specifications for map and filter are not implementations: the ++ symbol means the append function, and you can't pattern-match on function calls like that. Append and nil form the monoid structure of a free monoid, and the equations I give assert that map and filter are monoid homomorphisms: and that's enough to fully characterise them.

      The definition of fold you give is actually the right-to-left direction of the bi-implication I wrote. Assume fold nil cons = h, and then after you replace h with fold nil cons, you get exactly the equations you wrote down. fold has a stronger spec than that, though -- it is the unique solution to those equations. So for any h such that h [] = nil and h (x :: xs) = cons x (h xs), we know that h = fold nil cons.

      Delete
  2. Ah, that explains why I always tend to use foldMap! It has a similarly simple equational specification.

    ReplyDelete
  3. Oh, but it is really easy to teach fold. You just draw a syntax tree for a list, and explain that fold is going to replace every cons with some function f and the (sole) nil with some value a. Sure, this is not the universal property or anything, but understanding how a function works maybe is not the same as fully grasping its spec...?

    ReplyDelete

  4. Computer Science | Gyansetu offers top-notch courses that cater to every level of expertise. With knowledgeable instructors and a variety of subjects, from programming to cybersecurity, it's a perfect platform to expand your tech skills and stay ahead in the ever-evolving world of computer science.
    For more info:- https://www.gyansetu.in/blogs/how-can-a-b-com-graduate-become-a-data-scientist/

    ReplyDelete