One of the most practically useful features of functional programming
languages is pattern matching, and one of the most important
quality-of-implementation features for an implementation is the
exhaustiveness checking for pattern matches. In this post, I will give
the simplest algorithm for coverage checking that I know.
I previously gave one in my paper Focusing
on Pattern Matching. I was never completely satisfied with
that algorithm, since it was harder to understand than I liked. It
worked by giving a judgment to prove that no match would fail, rather
than directly proving that all matches covered. The negation made
thinking about extensions (such as pattern matching for dependent
types) harder than it should be, and it also made it harder to get a
match compilation algorithm.
So in this blog post, I'll give a better algorithm. It's
super-simple, and is about as naive as it should be -- there are only
six rules in total. It's not totally industrial strength, but (a) it's
certainly good enough to use in a prototype compiler, and (b) it's easy
enough to beef up.
There's also actually some cool proof theory yet to be worked out lurking
here. I'll point it out as we go along.
$$
\newcommand{\cover}[2]{{#1} \;\mathsf{covers}\;{#2}}
\newcommand{\elaborate}[3]{{#1} \;\mathsf{covers}\;{#2} \leadsto {#3}}
\newcommand{\To}{\Rightarrow}
$$
First, I'll give the syntax.
$$
\begin{array}{llcl}
\mbox{Types} &
A & ::= & 1 \bnfalt A \times B \bnfalt A + B \bnfalt A \to B \bnfalt o
\\
\mbox{Patterns} &
p & ::= & () \bnfalt (p, p) \bnfalt \inl(p) \bnfalt \inr(p) \bnfalt \_ \bnfalt x
\\
\mbox{Pattern Lists} &
ps & ::= & \cdot \bnfalt p, ps
\\
\mbox{Branch Lists} &
\Pi & ::= & \cdot \bnfalt ps \To e; \Pi
\\
\mbox{Contexts} &
\Gamma & ::= & \cdot \bnfalt u:A, \Gamma
\end{array}
$$
The types are sums and products, plus a type $o$ to indicate functions and
other non-matchable types. The patterns are just what you'd expect -- $()$ for
the unit pattenrn, $(p, p')$ for pairs, and $\inl(p)$ and $\inr(p)$ for the
left and right branches of sums. (Of course, there is no pattern for the
empty type!) We also have wildcard $\_$ and variable $x$ patterns.
Since our algorithm works inductively, we will destructure patterns into their
components, and we will need to keep track of lists of patterns. A pattern
list (with metavariable $ps$) is just the obvious thing --- a comma-separated
list of patterns.
We also need branch lists -- a pattern match clause consists of a list of
of patterns and expressions. So we use branch lists to say what body is associated
with each pattern list (a pattern list, since we are going to destructure the
pattern given for each arm).
We will also write $\overrightarrow{ps \To e}$ to indicate branch lists. If I write
a list like $\overrightarrow{(p_1, p_2), ps \To e}$, then I mean that every element
of the branch list starts with a pair pattern, and similarly for the other
constructors.
This algorithm keeps track of the type of the value being destructured, so we
introduce a typed context $\Gamma$ to keep track of the types of the expressions
being destructured.
Without further delay, here's the coverage judgment $\cover{\Pi}{\Gamma}$. You
can read this as algorithm bottom-up, in the usual logic-programming style.
$$
\array{
\rule{ \cover{\overrightarrow{ps' \To e'}}{\Gamma} }
{ \cover{\overrightarrow{\_, ps' \To e'}}{u:A, \Gamma} }
\\[1em]
\rule{ \cover{ \overrightarrow{ps' \To e'},\;\; (\_, ps \To e), \overrightarrow{ps'' \To e''} }{u:A, \Gamma} }
{ \cover{ \overrightarrow{ps' \To e'},\;\; (x, ps \To e), \overrightarrow{ps'' \To e''} }{u:A, \Gamma} }
\\[1em]
\rule{ \cover{ \overrightarrow{ps \To e},\;\; \overrightarrow{ps' \To e'} }{\Gamma}}
{ \cover{ \overrightarrow{(), ps \To e},\;\; \overrightarrow{\_, ps' \To e'} }{u:1, \Gamma}}
\\[1em]
\rule{ \cover{ \overrightarrow{p_1, p_2, ps \To e},\;\; \overrightarrow{\_, \_, ps' \To e'} }{a:A, b:B, \Gamma}}
{ \cover{ \overrightarrow{(p_1,p_2), ps \To e},\;\; \overrightarrow{\_, ps' \To e'} }{u:A \times B, \Gamma}}
\\[1em]
\rule{ }
{ \cover{ \overrightarrow{\_, ps'' \To e''} }{u:0, \Gamma}}
\\[1em]
\rule{ \cover{ \overrightarrow{p, ps \To e},\;\; \overrightarrow{\_, ps'' \To e''}}{a : A, \Gamma} \\
\cover{ \overrightarrow{p', ps' \To e'},\;\; \overrightarrow{\_, ps'' \To e''}}{b : B, \Gamma}}
{ \cover{ \overrightarrow{\inl(p), ps \To e},\;\; \overrightarrow{\inr(p'), ps' \To e'},\;\; \overrightarrow{\_, ps'' \To e''} }
{u:A + B, \Gamma}}
}
$$
The first two rules handle wildcards and variables. The first rule
says that if the lead pattern of every pattern list in the branch set
is a wildcard, we can just drop it and keep going. The second rule
says that you can turn any lead variable pattern into a wildcard
pattern.
The third rule handles the unit pattern. It is nice and easy --- it
says that if every lead pattern is either a unit or wildcard pattern,
we can drop it and keep going.
The fourth rule handles tuples, and is almost as easy. It says that
if we have a lead pattern $(p_1, p_2), ps \To e$, we can rewrite it to
$p_1, p_2, ps \To e$. We also have to remember to double any lead
wildcard pattern $\_, ps' \To e'$ into $\_, \_, ps' \To e'$, since we
are destructuring a product into two subpatterns.
The fifth rule handles the void type. It just succeeds, since there
are no patterns of empty type!
The sixth rule handles sums. It says that if we are scrutinizing a
value of type $A + B$, then we can separate the branch list into those
branches handling the left case $\inl(p), ps \To e$ and those branches
handling the right case $\inr(p'), ps' \To e'$, and then check then at
$A$ and $B$ respectively. Here, we have to remember to send wildcards
$\_, ps'' \To e''$ to both sides, since a wildcard can match
both the left- and right-cases.
And that's it!
The open problem in proof theory is a bit of slight of hand in my
notation. You understand exactly what I mean when I take a
meta-pattern like $\overrightarrow{(p_1, p_2), ps \To e}$, and turn it into $\overrightarrow{p_1, p_2,
ps \To e}$. However, formalizing this is trickier than it looks, and
it's not known how to give a proof-theoretic characterization of this
kind of pattern (according to Rob Simmons, who I asked early this year
about it).
As a bonus, I'll also show how coverage checking can also give you
a compilation algorithm for pattern matching. The judgment $\elaborate{\Pi}{\Gamma}{t}$
tells you how to turn a list of branches $\Pi$ into a nested sequence of tuple
destructurings and case-statements.
$$
\array{
\rule{ \elaborate{\overrightarrow{ps' \To e'}}{\Gamma}{t} }
{ \elaborate{\overrightarrow{\_, ps' \To e'}}{u:A, \Gamma}{t} }
\\[1em]
\rule{ \elaborate{\overrightarrow{ps' \To e'}}{\Gamma}{t} }
{ \elaborate{\overrightarrow{\_, ps' \To e'}}{u:A, \Gamma}{t} }
\\[1em]
\rule{ \elaborate{ \overrightarrow{ps' \To e'},\;\; (\_, ps \To \letv{x}{u}{e}), \overrightarrow{ps'' \To e''} }{\Gamma}{t} }
{ \elaborate{ \overrightarrow{ps' \To e'},\;\; (x, ps \To e), \overrightarrow{ps'' \To e''} }{\Gamma}{t} }
\\[1em]
\rule{ \elaborate{ \overrightarrow{ps \To e},\;\; \overrightarrow{ps' \To e'} }{\Gamma}{t} }
{ \elaborate{ \overrightarrow{(), ps \To e},\;\; \overrightarrow{\_, ps' \To e'} }{u:1, \Gamma}{\letv{()}{u}{t}} }
\\[1em]
\rule{ \elaborate{ \overrightarrow{p_1, p_2, ps \To e},\;\; \overrightarrow{\_, \_, ps' \To e'} }{a:A, b:B, \Gamma}{t} }
{ \elaborate{ \overrightarrow{(p_1,p_2), ps \To e},\;\; \overrightarrow{\_, ps' \To e'} }{u:A \times B, \Gamma}{\letv{(a,b)}{u}{t}} }
\\[1em]
\rule{ }
{ \elaborate{ \overrightarrow{\_, ps'' \To e''} }{u:0, \Gamma}{\abort(u)} }
\\[1em]
\rule{ \elaborate{ \overrightarrow{p, ps \To e},\;\; \overrightarrow{\_, ps'' \To e''}}{a : A, \Gamma}{t_1} \\
\elaborate{ \overrightarrow{p', ps' \To e'},\;\; \overrightarrow{\_, ps'' \To e''}}{b : B, \Gamma}{t_2} }
{ \elaborate{ \overrightarrow{\inl(p), ps \To e},\;\; \overrightarrow{\inr(p'), ps' \To e'},\;\; \overrightarrow{\_, ps'' \To e''} }
{u:A + B, \Gamma}
{\case{u}{a}{t_1}{b}{t_2}} }
}
$$
Note that it is exactly the same judgment as before --- coverage
checking computes exactly the information we need, and so code generation
comes along for the ride!