tag:blogger.com,1999:blog-80684660358755897912023-09-29T02:36:08.626-07:00Semantic DomainNeel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.comBlogger102125tag:blogger.com,1999:blog-8068466035875589791.post-92194493018832410492023-09-21T09:21:00.001-07:002023-09-21T09:21:41.970-07:00Actually, We Will Read Your Thesis<p>One of the weirder (and wronger) bits of "folk wisdom" surrounding the PhD is that no one reads PhD dissertations. If you look online, you'll find many posts <a href="https://academia.stackexchange.com/questions/124689/whats-the-point-of-phd-theses-if-nobody-reads-them">like this one</a>, which ask why anyone writes PhD theses when no one reads them.</p>
<p>This would be a good question, if it were based on a true premise. In fact, if you write even a moderately decent thesis, it will probably be read by many researchers in your field. This is for a simple reason: a thesis has a much higher page limit than any of the papers upon which it is based. As a result, there's more room for the author to explain their results. Furthermore, because it is the author's second attempt at explaining the material, they will also have a better idea of <em>how</em> to explain their results.</p>
<p>When a PhD student makes a discovery or an invention, they and their collaborators will write a paper about it. Papers are targeted at the leading experts in the field, and since the leading experts (pretty much in virtue of being experts) already understand all the basics and know where the state-of-the-art is, research papers are usually diffs. We explain only the novelties of the work, leaving the shared basics of understanding unexplained and tacit.</p>
<p>While this may be fine for experts, it makes accessing the literature difficult for novices (or even experts in related areas). It's like joining a long-running conversation in the middle: there will be a lot of shared context that the new reader lacks.</p>
<p>This is what makes PhD theses so valuable. Because there is much more space, the author of a PhD thesis has the room to explain what they did, from whatever point they think makes sense, rather than solely in terms of what the delta is. So whenever I find a paper I don't understand, I start looking for the PhD thesis based on it. Nine times out of ten, the thesis is vastly more understandable: "obvious" lemmas will have explicit proofs, algorithms will have detailed pseudocode, and the right intuitions and perspectives to take about the topic will be spelled out.</p>
<p>So why is there so much negative folklore around it? My hunch is that it is a way for students to psyche themselves into actually submitting their dissertation. Many PhD students are perfectionists (since they are generally bright people with high standards), and they find the thought of their dissertations becoming part of the scientific record very stressful. By telling themselves no one will read it, they can overcome their perfectionism and actually submit the document once it is done.</p>
<p>Writing a thesis is hard work, so I don't begrudge anyone the psychological tricks they might need to actually finish. But I think anyone who has finished writing a dissertation should feel proud of their work, and accept that many people in their field will, in fact, read it. (And if you are a researcher not yet in the habit of reading theses, let me suggest that you start -- it is a superpower for understanding things!)</p>
Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com0tag:blogger.com,1999:blog-8068466035875589791.post-42714794720905177052023-07-21T09:17:00.004-07:002023-08-02T03:12:24.391-07:00Linear-time parser combinators <style type="text/css">code{white-space: pre;}</style>
<style type="text/css">
div.sourceCode { overflow-x: auto; }
table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
margin: 0; padding: 0; vertical-align: baseline; border: none; }
table.sourceCode { width: 100%; line-height: 100%; }
td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
td.sourceCode { padding-left: 5px; }
code > span.kw { color: #007020; font-weight: bold; } /* Keyword */
code > span.dt { color: #902000; } /* DataType */
code > span.dv { color: #40a070; } /* DecVal */
code > span.bn { color: #40a070; } /* BaseN */
code > span.fl { color: #40a070; } /* Float */
code > span.ch { color: #4070a0; } /* Char */
code > span.st { color: #4070a0; } /* String */
code > span.co { color: #60a0b0; font-style: italic; } /* Comment */
code > span.ot { color: #007020; } /* Other */
code > span.al { color: #ff0000; font-weight: bold; } /* Alert */
code > span.fu { color: #06287e; } /* Function */
code > span.er { color: #ff0000; font-weight: bold; } /* Error */
code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
code > span.cn { color: #880000; } /* Constant */
code > span.sc { color: #4070a0; } /* SpecialChar */
code > span.vs { color: #4070a0; } /* VerbatimString */
code > span.ss { color: #bb6688; } /* SpecialString */
code > span.im { } /* Import */
code > span.va { color: #19177c; } /* Variable */
code > span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code > span.op { color: #666666; } /* Operator */
code > span.bu { } /* BuiltIn */
code > span.ex { } /* Extension */
code > span.pp { color: #bc7a00; } /* Preprocessor */
code > span.at { color: #7d9029; } /* Attribute */
code > span.do { color: #ba2121; font-style: italic; } /* Documentation */
code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
</style>
<p>My birthday just passed, and to relax I wrote a parser combinator library.</p>
<p>Over the last few years, I have worked quite a bit with <a href="https://xnning.github.io/">Ningning Xie</a> and <a href="https://www.cl.cam.ac.uk/~jdy22/">Jeremy Yallop</a> on parser combinators, which has led to <a href="https://github.com/yallop/ocaml-flap">a family of parser combinators</a> which have optimal linear-time performance in theory, and which are many times faster than lex+yacc in practice.</p>
<p>But these use advanced multistage programming techniques, and I wondered how well we can do without staging. So in this blog post I will describe the implementation of a non-staged, but still efficient, parser combinator library in plain OCaml. This will be a lot slower than <code>flap</code>, or even <code>menhir/ocamlyacc</code>, but will still have pretty good linear-time performance.</p>
<p>The basic trick we developed is to check that the grammar that a user defines with our parser combinator library falls in (morally) the class of LL(1) grammars. The LL(1) grammars have the property that they can be parsed with a <em>non-backtracking</em> recursive descent algorithm, in linear time with only a single token of lookahead.</p>
<p>So how do we check this property? Well, first, let's think about how we can form grammars from smaller ones. In a BNF grammar, we define a set</p>
<pre><code>E ::= num
| '(' E ')'
| E '+' E
| E '*' E
| E '-' E </code></pre>
<p>Here. we can see that there are two main forms of building grammatical expressions.</p>
<ol style="list-style-type: decimal">
<li><p>We can build grammatical expressions by <em>concatenation</em> -- the clause <code>E '+' E</code> means that an expression can be a subexpression <code>E</code>, followed by the plus sign <code>+</code>, finally followed by another subexpression <code>E</code>.</p></li>
<li><p>We can build grammatical expression by <em>alternation</em> -- the clauses separated by a vertical bar <code>|</code> are alternatives, and the whole string matches if it matches any of the alternative expressions. So an expression can be <em>either</em> a number <code>num</code>, a parenthesized expression <code>( E )</code>, or a binary arithmetic operator.</p></li>
</ol>
<p>Both of these forms of composition can introduce non-determinism. For example, if you see the expression <code>3 + 4</code>, a recursive descent parser cannot tell that it should take the addition branch (<code>E '+' E</code>) rather than the multiplication branch (<code>E '*' E</code>) without doing more than one token of lookahead (the first token is a <code>3</code>, which could be followed by anything).</p>
<p>Sequencing can also introduce nondeterminism. For example, the expression <code>3 + 4 + 5</code> can be parsed as either <code>Add(3, Add(4, 5))</code> or as <code>Add(Add(3, 4), 5)</code> -- this grammar does not tell us whether we should stop parsing the first subexpression at <code>3</code>, or continue to <code>3 + 4</code>.</p>
<p>So this grammar is not suitable for being parsed by recursive descent. To identify the grammars that can be so parsed, we need to do some <em>static analysis</em> of the grammar -- in fact, a type system. The components of the static analysis we use are intended to handle the two kinds of nondeterminism above.</p>
<p>To handle the nondeterminism introduced by alternation, we will compute <em>first sets</em> for our grammars. That is, we will work out which characters can occur as the first character of a string recognised by the grammar. So if our grammar recognises the set of strings <code>{foo, bar, barton, foobar, quux}</code>, then its first set will be <code>{f, b, q}</code> -- these are the first characters of the strings in the language.</p>
<p>We can use the first sets to rule out alternation nondeterminism, by requiring that each alternative in a grammar rule have a disjoint first set.</p>
<p>Handling sequential nondeterminism is a bit trickier. To do this, we have find pairs of words in the language, in which the first is a prefix of the second (for example, <code>foo</code> and <code>foobar</code>). Then, we return the first character of each suffix (for <code>foo</code> and <code>foobar</code>, we return <code>b</code>). This set of characters is called the <em>follow set</em> of the language -- it is the set of characters which can follow a word in the language, which might be characters of a suffix of that word. So in the language above, the follow set is <code>{b, t}</code>, due to <code>foo/foobar</code> and <code>bar/barton</code>. Because <code>foo</code> can be followed by the string <code>bar</code> to produce the word <code>foobar</code>, <code>b</code> is one of the characters in the follow set, since <code>b</code> is the first character of <code>bar</code>. Similarly, because <code>bar</code> can followed by <code>ton</code> to produce the word <code>barton</code>, <code>t</code> (as the first character of <code>ton</code>) is one of the characters in the follow set.</p>
<p>Now, when we concatenate two grammatical expressions <code>A B</code>, we check to see if the follow set of <code>A</code> and the first set of <code>B</code> are disjoint. This way, after parsing an <code>A</code>-word, it is unambiguous whether to keep parsing an <code>A</code> or switch to parsing a <code>B</code> because the first character will tell us to either continue parsing <code>A</code> (if it is in the follow set of <code>A</code>) or switch to <code>B</code> (if it is in the first set of <code>B</code>).</p>
<p>(ASIDE: Technically, this is not the FOLLOW set as defined in (say) the Dragon book -- that is a property of grammars, not languages. The definition I use here is a superior one (sometimes called the follow-last set), which was invented by Derick Wood and Anne Brüggemann-Klein in their 1992 paper <a href="https://link.springer.com/chapter/10.1007/3-540-55210-3_182"><em>Deterministic Regular Languages</em></a>. What makes it better is that it is a property of languages, not grammars, and is hence much better-behaved mathematically.)</p>
<p>Finally, we also track whether the language recognises the empty string or not. If both <code>A</code> and <code>B</code> can match the empty string, then <code>A | B</code> is ambiguous – an empty string can match either one.</p>
<p>These three facts – the first set, the follow set, and nullability – are the static facts we will build for every parser combinator, and we will use it to (a) optimise the implementation of conforming combinators, and (b) reject combinators which don't satisfy the restrictions. We could also statically track whether the grammar is left-recursive or not – see my 2019 paper with Jeremy Yallop for how – but for this blog post I'll skip that.</p>
<p>Now, let's talk about the implementation. Since we want to calculate a first set and a follow set for each parser, we need a datatype to represent sets of characters. The <code>Set</code> module in the OCaml standard library is a perfectly serviceable balanced binary tree, but because we want to work with sets of characters, there is a much more compact and faster representation: a bit set.</p>
<p>Since an 8-bit character only has 256 possible states, we can represent a set of characters with 256 bits (32 bytes), with each bit set to 0 if the character is absent from the set, and set to 1 if it is present. This does presume we look at the input at a time -- if we wanted to handle Unicode, likely the best approach would be to work with a canonicalised UTF-8 representation. I'll skip over that for now, since we can deal solely with ASCII for the moment.</p>
<p>This turns out to be pretty much the only optimisation needed to get decent performance -- the library below runs between <strong>one-half and two-thirds the speed of lex+menhir on a simple s-expression parsing benchmark.</strong></p>
<p>Unlike many other parser combinator libraries, it does no backtracking (ie, runs in time linear in the input size). Furthermore, it also has a clean declarative reading of the language recognized -- ie, A | B and B | A recognise the same language, unlike in (eg) PEGs, or parser combinators with committed-choice operators. (If you are a parsing nerd, basically we are restricting to LL(1) grammars, modulo the difference between FOLLOW and FOLLOWLAST.)</p>
<p><a href="https://gist.github.com/neel-krishnaswami/b1594c57433b7df2a143634a2fff3544">Here is a link to a gist with the implementation</a>. It's really short – only 300 lines of code for a full combinator library, with about another 150 lines to demonstrate and benchmark it.</p>
<h1 id="implementation">Implementation</h1>
<h2 id="character-sets">Character Sets</h2>
<p>Below, I give the signature and implementation of an OCaml bitset library.</p>
<h3 id="character-set-interface">Character Set Interface</h3>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">module</span> C : <span class="kw">sig</span>
<span class="kw">type</span> t
<span class="kw">val</span> make : (<span class="dt">char</span> -> <span class="dt">bool</span>) -> t
<span class="kw">val</span> empty : t
<span class="kw">val</span> one : <span class="dt">char</span> -> t
<span class="kw">val</span> union : t -> t -> t
<span class="kw">val</span> inter : t -> t -> t
<span class="kw">val</span> top : t
<span class="kw">val</span> negate : t -> t
<span class="kw">val</span> mem : <span class="dt">char</span> -> t -> <span class="dt">bool</span>
<span class="kw">val</span> equal : t -> t -> <span class="dt">bool</span>
<span class="kw">val</span> is_empty : t -> <span class="dt">bool</span>
<span class="kw">val</span> disjoint : t -> t -> <span class="dt">bool</span>
<span class="kw">val</span> fold : (<span class="dt">char</span> -> 'a -> 'a) -> 'a -> t -> 'a
<span class="kw">val</span> of_string : <span class="dt">string</span> -> t
<span class="kw">end</span> = </code></pre></div>
<p>This interface has an abstract type <code>t</code> of bitsets, and a function <code>make</code> which converts a predicate on characters into a bitset, <code>one</code> which constructs a singleton set, along with the rest of the standard operations of boolean algebra (such as intersection, union, and complement). We also have a membership test <code>mem</code>, as well as operations to test whether two sets are <code>equal</code>, or whether they are empty (<code>is_empty</code>) or nonoverlapping (<code>disjoint</code>).</p>
<h3 id="character-set-implementation">Character Set Implementation</h3>
<p>The implementation of this is quite fun -- I don't have many chances to do bit-fiddling, so I cherish the ones that arise. I represent sets using OCaml's <code>bytes</code> type, which are used, reasonably enough, to represent consecutive sequences of bytes.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">struct</span>
<span class="kw">type</span> t = bytes
<span class="kw">let</span> make f =
<span class="kw">let</span> <span class="kw">open</span> Int <span class="kw">in</span>
<span class="kw">let</span> <span class="kw">open</span> <span class="dt">Char</span> <span class="kw">in</span>
Bytes.init <span class="dv">32</span> (<span class="kw">fun</span> i ->
<span class="kw">let</span> b0 = shift_left (Bool.to_int (f (chr (i * <span class="dv">8</span> + <span class="dv">0</span>)))) <span class="dv">0</span> <span class="kw">in</span>
<span class="kw">let</span> b1 = shift_left (Bool.to_int (f (chr (i * <span class="dv">8</span> + <span class="dv">1</span>)))) <span class="dv">1</span> <span class="kw">in</span>
<span class="kw">let</span> b2 = shift_left (Bool.to_int (f (chr (i * <span class="dv">8</span> + <span class="dv">2</span>)))) <span class="dv">2</span> <span class="kw">in</span>
<span class="kw">let</span> b3 = shift_left (Bool.to_int (f (chr (i * <span class="dv">8</span> + <span class="dv">3</span>)))) <span class="dv">3</span> <span class="kw">in</span>
<span class="kw">let</span> b4 = shift_left (Bool.to_int (f (chr (i * <span class="dv">8</span> + <span class="dv">4</span>)))) <span class="dv">4</span> <span class="kw">in</span>
<span class="kw">let</span> b5 = shift_left (Bool.to_int (f (chr (i * <span class="dv">8</span> + <span class="dv">5</span>)))) <span class="dv">5</span> <span class="kw">in</span>
<span class="kw">let</span> b6 = shift_left (Bool.to_int (f (chr (i * <span class="dv">8</span> + <span class="dv">6</span>)))) <span class="dv">6</span> <span class="kw">in</span>
<span class="kw">let</span> b7 = shift_left (Bool.to_int (f (chr (i * <span class="dv">8</span> + <span class="dv">7</span>)))) <span class="dv">7</span> <span class="kw">in</span>
<span class="kw">let</span> (||) = logor <span class="kw">in</span>
<span class="dt">Char</span>.chr (b7 || b6 || b5 || b4 || b3 || b2 || b1 || b0))</code></pre></div>
<p>The constructor <code>make</code> takes a predicate <code>f : char -> bool</code>, and then builds a bitset which contains exactly those elements which make <code>f</code> true. To do this, we build a 32-byte array (since <code>8 * 32 = 256</code>). The <code>Bytes.init</code> function initialises a byte at a time, so the initaliser has to make 8 calls to <code>f</code> (at indices <code>i*8 + 0</code> to <code>i*8 + 7</code>) to figure out what each bit of the byte is. Then, we convert each boolean to an integer, and do a left shift to create an integer where the correct position is set to 1, and all the other bits are 0. Finally we take all these integers and OR them together, to get the byte that we want.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> mem c s =
<span class="kw">let</span> b = (<span class="dt">Char</span>.code c) / <span class="dv">8</span> <span class="kw">in</span>
<span class="kw">let</span> i = (<span class="dt">Char</span>.code c) <span class="kw">mod</span> <span class="dv">8</span> <span class="kw">in</span>
<span class="kw">let</span> w = <span class="dt">Char</span>.code (Bytes.get s b) <span class="kw">in</span>
Int.logand (Int.shift_left <span class="dv">1</span> i) w > <span class="dv">0</span> </code></pre></div>
<p>The <code>mem</code> function also does a bit of bit-fiddling. We look up the correct byte and bit-offset corresponding to the input character, and check to see if that bit is set in the corresponding byte of the bitset.</p>
<p>The rest of the operations can be defined in terms of <code>make</code> and <code>mem</code> -- they are the only ones which need any bit-twiddling. ~<sub>~</sub> {.ocaml} let empty = make (fun c -> false) let top = make (fun c -> true) let one c = make (fun c' -> c = c') let union s1 s2 = make (fun c -> mem c s1 || mem c s2) let inter s1 s2 = make (fun c -> mem c s1 && mem c s2) let negate s = make (fun c -> not (mem c s))</p>
<p>let equal s1 s2 = let rec loop i acc = if i = 32 then acc else loop (i+1) (acc && (Bytes.get s1 i = Bytes.get s2 i)) in loop 0 true</p>
<p>let is_empty s = equal s empty</p>
<p>let disjoint s1 s2 = is_empty (inter s1 s2) ~<sub>~</sub></p>
<p>We do some trivial bit-twiddling anyway with the <code>equal</code> function: rather than comparing character by character, we just check to see if the two bitsets are equal with a byte-by-byte comparison.</p>
<p>Emptiness checking <code>is_empty</code> and disjointness checking <code>disjoint</code> can now be defined in terms of the Boolean operations and equality. We can also define a <code>fold</code> function for looping over a bitset, and we also have a <code>of_string</code> function to build sets from strings.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> fold f init s =
<span class="kw">let</span> <span class="kw">rec</span> loop i acc =
<span class="kw">if</span> i > <span class="dv">255</span> <span class="kw">then</span>
acc
<span class="kw">else</span>
<span class="kw">let</span> c = <span class="dt">Char</span>.chr i <span class="kw">in</span>
<span class="kw">if</span> mem c s <span class="kw">then</span>
loop (i+<span class="dv">1</span>) (f c acc)
<span class="kw">else</span>
loop (i+<span class="dv">1</span>) acc
<span class="kw">in</span>
loop <span class="dv">0</span> init
<span class="kw">let</span> of_string str =
<span class="kw">let</span> p c = <span class="dt">String</span>.contains str c <span class="kw">in</span>
make p
<span class="kw">end</span></code></pre></div>
<h2 id="grammar-types">Grammar Types</h2>
<h3 id="grammar-type-interface">Grammar Type Interface</h3>
<p>This is enough to let us implement the <code>Tp</code> module for parser types.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">module</span> Tp : <span class="kw">sig</span>
<span class="kw">type</span> t = { null : <span class="dt">bool</span>; first : C.t; follow : C.t; }
<span class="kw">exception</span> TypeError <span class="kw">of</span> <span class="dt">string</span>
<span class="kw">val</span> <span class="dt">char</span> : <span class="dt">char</span> -> t
<span class="kw">val</span> eps : t
<span class="kw">val</span> seq : t -> t -> t
<span class="kw">val</span> charset : C.t -> t
<span class="kw">val</span> <span class="dt">string</span> : <span class="dt">string</span> -> t
<span class="kw">val</span> alt : t -> t -> t
<span class="kw">val</span> bot : t
<span class="kw">val</span> equal : t -> t -> <span class="dt">bool</span>
<span class="kw">val</span> fix : (t -> t) -> t
<span class="kw">val</span> print : <span class="dt">Format</span>.formatter -> t -> <span class="dt">unit</span>
<span class="kw">end</span>
<span class="kw">end</span> = </code></pre></div>
<p>A type is a triple of a nullability predicate, a first set, and a follow set. This API has a bunch of operators for building semantic types:</p>
<ul>
<li><code>char c</code> yields the semantic type for the singleton language containing the character <code>c</code>,</li>
<li><code>eps</code> is the type of the language containing only the empty string,</li>
<li><code>seq t1 t2</code> takes two types <code>t1</code> and <code>t2</code>, and returns the type corresponding to the sequencing of two languages with those two types</li>
<li><code>bot</code> is the type of the empty language,</li>
<li><code>alt t1 t2</code> takes two types <code>t1</code> and <code>t2</code>, and returns the type corresponding to the union of the languages typed by <code>t1</code> and <code>t2</code></li>
<li><code>fix f</code> takes a function <code>f : t -> t</code>, and returns the smallest type <code>t</code> such that <code>f(t) = t</code>.</li>
<li><code>charset cs</code> returns the type of the language recognising exactly the single-character strings in <code>cs</code>.</li>
</ul>
<p>Some of these combinations are impossible due to our ambiguity rules. For example, we don't want to admit the union of two languages both containing the empty string, because then it would be ambiguous whether to take the left or right branch.</p>
<p>So we also have a <code>TypeError</code> exception, which we use to signal when we are combining two incompatible types.</p>
<h3 id="grammar-type-implementation">Grammar Type Implementation</h3>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">struct</span>
<span class="kw">type</span> t = {
null : <span class="dt">bool</span>;
first : C.t;
follow : C.t;
}
<span class="kw">exception</span> TypeError <span class="kw">of</span> <span class="dt">string</span>
<span class="kw">let</span> <span class="dt">char</span> c = {
null = <span class="kw">false</span>;
first = C.one c;
follow = C.empty;
}</code></pre></div>
<p><code>char c</code> returns a record, with false nullability (since the language <code>{"c"}</code> has no empty strings), a first set of <code>{c}</code> (since the first character of the only string in the language is <code>c</code>), and an empty follow set (since there are no nonempty strings you can append to <code>"c"</code> to get a string in <code>{"c"}</code>).</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> eps = {
null = <span class="kw">true</span>;
first = C.empty;
follow = C.empty;
}</code></pre></div>
<p><code>eps</code> returns the type of the language containing only the empty string <code>{""}</code>. It is nullable, since it obviously contains the empty string, but the first and follow sets are empty since it has no strings with any characters in them.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> seq t1 t2 =
<span class="kw">let</span> separate t1 t2 =
<span class="dt">not</span> t1.null
&&
C.disjoint t1.follow t2.first
<span class="kw">in</span>
<span class="kw">if</span> separate t1 t2 <span class="kw">then</span>
{ null = <span class="kw">false</span>;
first = t1.first;
follow = C.union t2.follow (<span class="kw">if</span> t2.null
<span class="kw">then</span> (C.union t2.first t1.follow)
<span class="kw">else</span> C.empty);
}
<span class="kw">else</span>
<span class="dt">raise</span> (TypeError <span class="st">"ambiguous sequencing"</span>)</code></pre></div>
<p>Sequencing <code>seq t1 t2</code> is interesting, because it is the first operator we have seen which can fail. We want to allow parsing the sequencing two languages <code>L</code> and <code>M</code> (with types <code>t1</code> and <code>t2</code>) using a greedy strategy -- we want to match as much of <code>L</code> as we can, and then switch to looking for <code>M</code> strings. A way to test if this is ok is to check if (a) <code>L</code> does not contain the empty string, and (b) <code>L</code>'s follow set is disjoint from <code>M</code>'s first set. Because <code>L</code> has no empty strings, we can immediately start parsing for <code>L</code>, and because the first and follow sets are disjoint, we can always choose to keep parsing <code>L</code> as long as there are valid continuation characters for it.</p>
<p>This is what the <code>separate</code> function tests for.</p>
<p>The resulting composite type has <code>false</code> nullability, because every string in the concatenated language has one <code>L</code>-part and and one <code>M</code>-part, and the <code>L</code>-part is always nonempty. The first set is the same as <code>t1</code>'s first set, since every string in the concatenated language starts with an <code>L</code>-string.</p>
<p>The follow set is a little tricky. A word <code>w</code> from the concatenation of <code>L</code> and <code>M</code> always consists of an <code>L</code>-part <code>l</code>, and an <code>M</code>-part <code>m</code>, so that <code>w = append(l, m)</code>. Obviously any extension of <code>m</code> to a longer <code>m'</code> is still a word of <code>L ; M</code>, so the combined follow set of has to at least contain the follow set of <code>M</code> -- i.e., <code>t2.follow</code>.</p>
<p>However, if <code>M</code> contains the empty string, then <code>L</code> is a subset of <code>L;M</code>, and so all the follow characters of <code>L</code> -- <code>t1.follow</code> -- also have to be included, as well as the first characters of <code>M</code>.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> alt t1 t2 =
<span class="kw">let</span> nonoverlapping t1 t2 =
<span class="dt">not</span> (t1.null && t2.null)
&&
C.disjoint t1.first t2.first
<span class="kw">in</span>
<span class="kw">if</span> nonoverlapping t1 t2 <span class="kw">then</span>
{
null = t1.null || t2.null;
first = C.union t1.first t2.first;
follow = C.union t1.follow t2.follow;
}
<span class="kw">else</span>
<span class="dt">raise</span> (TypeError <span class="st">"ambiguous alternation"</span>)
<span class="kw">let</span> bot = {
null = <span class="kw">false</span>;
first = C.empty;
follow = C.empty;
}</code></pre></div>
<p>If <code>t1</code> and <code>t2</code> are types for languages <code>L</code> and <code>M</code>, then <code>alt t1 t2</code> is the type of the union language <code>L ∪ M</code>. Again, typing can fail if the types might correspond to overlapping languages. We rule this out with the <code>nonoverlapping</code> predicate – it checks that at most one language can contain the empty string (i.e., at most one of <code>t1.null</code> and <code>t2.null</code> hold), and that <code>t1.first</code> and <code>t2.first</code> are disjoint – if the first sets are disjoint, then a word in <code>L</code> must start with different character than any character in <code>M</code>. If this condition is satisfied, then the resulting type just takes the union of the first and follow sets, and the OR of the two nullability predicates.</p>
<p>We also define the <code>bot</code> type, for the empty language <code>{}</code>. Since it has no strings in it, the <code>bot.null</code> field is false, and the first and follow sets are empty.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> charset cs =
<span class="kw">if</span> C.is_empty cs <span class="kw">then</span>
bot
<span class="kw">else</span>
{ null = <span class="kw">false</span>;
first = cs;
follow = C.empty;
}
<span class="kw">let</span> <span class="dt">string</span> s =
<span class="kw">if</span> <span class="dt">String</span>.length s = <span class="dv">0</span> <span class="kw">then</span>
eps
<span class="kw">else</span>
<span class="dt">char</span> s.[<span class="dv">0</span>]</code></pre></div>
<p>We can use this to define some convenience functions. <code>charset cs</code> returns the type of the languages containing one-character strings for each character in <code>cs</code>. If the set is empty, we return <code>bot</code>, and otherwise we return a type where the first set is <code>cs</code>, the follow set is empty, and the nullability predicate is false.</p>
<p><code>string s</code> returns a type for the language <code>{s}</code> containing the single string <code>s</code>. If <code>s</code> is the empty string, we return <code>eps</code>, and otherwise we can return <code>char s.[0]</code>, since the first set is the first character of <code>s</code>, the language is not nullable, and the follow set is empty since there are no possible continuations to <code>s</code>.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> equal t1 t2 =
t1.null = t2.null
&& C.equal t1.first t2.first
&& C.equal t1.follow t2.follow
<span class="kw">let</span> fix f =
<span class="kw">let</span> <span class="kw">rec</span> loop t =
<span class="kw">let</span> t' = f t <span class="kw">in</span>
<span class="kw">if</span> equal t t' <span class="kw">then</span>
t'
<span class="kw">else</span>
loop t'
<span class="kw">in</span>
loop bot</code></pre></div>
<p>Now, we can define the type equality function <code>equal</code>, and use it to define the fixed point function <code>fix f</code>, which returns the least fixed point of the function <code>f</code>. It works in the usual way – we start with the smallest type <code>bot</code>, and then repeatedly apply <code>f</code> to it until it stops changing. If <code>f</code> is monotone, then this loop has to terminate, because on each iteration, we must have added at least one character to <code>follow</code> or <code>first</code>, or changed <code>null</code> from false to true. As a result, there can be at most 511 iterations (= 255 + 255 + 1) before the loop cannot increase any more.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml">
<span class="kw">let</span> print out t =
<span class="kw">let</span> p fmt = <span class="dt">Format</span>.fprintf out fmt <span class="kw">in</span>
<span class="kw">let</span> print_set cs =
C.fold (<span class="kw">fun</span> c () -> p <span class="st">"%c"</span> c) () cs
<span class="kw">in</span>
<span class="kw">let</span> print_bool = <span class="kw">function</span>
| <span class="kw">true</span> -> p <span class="st">"true"</span>
| <span class="kw">false</span> -> p <span class="st">"false"</span>
<span class="kw">in</span>
p <span class="st">"{</span><span class="ch">\n</span><span class="st">"</span>;
p <span class="st">" null = "</span>; print_bool t.null; p <span class="st">";</span><span class="ch">\n</span><span class="st">"</span>;
p <span class="st">" first = C.of_string </span><span class="ch">\"</span><span class="st">"</span>; print_set t.first; p <span class="st">"</span><span class="ch">\"</span><span class="st">;</span><span class="ch">\n</span><span class="st">"</span>;
p <span class="st">" follow = C.of_string </span><span class="ch">\"</span><span class="st">"</span>; print_set t.follow; p <span class="st">"</span><span class="ch">\"</span><span class="st">;</span><span class="ch">\n</span><span class="st">"</span>;
p <span class="st">"}</span><span class="ch">\n</span><span class="st">"</span>
<span class="kw">end</span></code></pre></div>
<p>We can also define a <code>print</code> function for debugging purposes.</p>
<h2 id="parsers">Parsers</h2>
<h3 id="parser-interface">Parser Interface</h3>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">module</span> Parser: <span class="kw">sig</span>
<span class="kw">type</span> 'a t
<span class="kw">exception</span> ParseFailure <span class="kw">of</span> <span class="dt">int</span>
<span class="kw">val</span> (<span class="kw">let</span>+) : 'a t -> ('a -> 'b) -> 'b t
<span class="kw">val</span> map : ('a -> 'b) -> 'a t -> 'b t
<span class="kw">val</span> <span class="dt">unit</span> : <span class="dt">unit</span> t
<span class="kw">val</span> return : 'a -> 'a t
<span class="kw">val</span> seq : 'a t -> 'b t -> ('a * 'b) t
<span class="kw">val</span> (<span class="kw">and</span>+) : 'a t -> 'b t -> ('a * 'b) t
<span class="kw">val</span> <span class="dt">char</span> : <span class="dt">char</span> -> <span class="dt">unit</span> t
<span class="kw">val</span> charset : C.t -> <span class="dt">char</span> t
<span class="kw">val</span> <span class="dt">string</span> : <span class="dt">string</span> -> <span class="dt">unit</span> t
<span class="kw">val</span> fail : 'a t
<span class="kw">val</span> (||) : 'a t -> 'a t -> 'a t
<span class="kw">val</span> any : 'a t <span class="dt">list</span> -> 'a t
<span class="kw">val</span> fix : ('a t -> 'a t) -> 'a t
<span class="kw">val</span> parse : 'a t -> <span class="dt">string</span> -> <span class="dt">int</span> -> <span class="dt">int</span> * 'a
<span class="kw">end</span> = </code></pre></div>
<p>Above, we give the signature to the <code>Parser</code> module, which actually constructs the parsers. There is an abstract type <code>'a t</code>, which is the type of parsers which return a value of type <code>'a</code>, and an exception <code>ParseFailure</code>, which returns the location at which parsing failed.</p>
<p>The parser type <code>'a t</code> is a functor, which is why it supports a <code>map</code> operation.</p>
<p>The applicative (or Cartesian monoidal, if you are a semanticist) structure is exhibited by the <code>unit</code> and <code>seq</code> operations. <code>unit</code> is the parser that recognizes the empty string and immediately returns unit, and <code>seq p1 p2</code> will try to run <code>p1</code>, then run <code>p2</code>, and return their paired return values.</p>
<p><code>map</code> has the synonym <code>let+</code>, and <code>seq</code> has a synonym <code>(and+)</code>, which together lets us use Ocaml's equivalent of Haskell do-notation for applicatives. So we can write a parser like:</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span>+ x = p1
<span class="kw">and</span>+ y = p2
<span class="kw">and</span>+ z = p3
<span class="kw">in</span>
x + y + z </code></pre></div>
<p>This will run <code>p1</code>, then <code>p2</code>, then <code>p3</code>, and finally return the sum of the three parser values.</p>
<p>The <code>char c</code>, <code>charset cs</code>, and <code>string s</code> parsers build parsers for recognizing individual characters, sets of characters, or single strings. <code>charset</code> and <code>string</code> can be defined from <code>char</code> and the other combinators, but it is convenient to have them as part of the API.</p>
<p><code>fail</code> is the parser that always fails, never recognising anything, and <code>(||)</code> is the binary alternation operator – <code>p1 || p2</code> recognises the union of the languages <code>p1</code> and <code>p2</code> do. <code>any ps</code> is the n-ary version – it recognises any string that any of the parsers in the list <code>ps</code> do.</p>
<p>Finally, <code>fix f</code> is used to define grammars recursively. If <code>f : 'a t -> 'a t</code>, then <code>fix f</code> recognises the least fixed point of <code>f</code>.</p>
<p>All of these operations are allowed to signal a <code>TypeError</code> if a client tries to combine them in a way which would lead to ambiguity.</p>
<h3 id="parser-implementation">Parser Implementation</h3>
<p>Below, I give the implementation of the <code>Parser</code> module.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">struct</span>
<span class="kw">type</span> 'a t = { tp : Tp.t; parse : <span class="dt">string</span> -> <span class="dt">int</span> -> <span class="dt">int</span> * 'a }
<span class="kw">exception</span> ParseFailure <span class="kw">of</span> <span class="dt">int</span></code></pre></div>
<p>The implementation type of a parser is actually a pair, consisting of a parser type (as defined in the <code>Tp</code> module), and a parsing function of type <code>string -> int -> (int * 'a)</code>. So <code>p.parse s i</code> will take a string <code>s</code>, and begin parsing at <code>i</code>. If it succeeds, it will return <code>(j,v)</code>, where <code>j</code> is where it finished parsing, and <code>v</code> is the return value of the parser. If the run of the parser failed, then it will raise the <code>ParseFailure</code> exception.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> <span class="dt">char</span> c =
<span class="kw">let</span> p s i =
<span class="kw">if</span> i < <span class="dt">String</span>.length s && s.[i] = c <span class="kw">then</span>
(i+<span class="dv">1</span>, ())
<span class="kw">else</span>
<span class="dt">raise</span> (ParseFailure i)
<span class="kw">in</span>
{ tp = Tp.<span class="dt">char</span> c; parse = p }</code></pre></div>
<p>The <code>char c</code> parser is one of the simplest. It constructs its type with a call to the <code>Tp.char</code> function, and constructs a parse function which checks if the next character is <code>c</code>, and returns unit on success and signals failure if there is no next character or it isn't <code>c</code>.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> (<span class="kw">let</span>+) p f =
<span class="kw">let</span> p' s i =
<span class="kw">let</span> (i, v) = p.parse s i <span class="kw">in</span>
(i, f v)
<span class="kw">in</span>
{tp = p.tp; parse = p'}
<span class="kw">let</span> map f p =
<span class="kw">let</span>+ x = p <span class="kw">in</span> f x </code></pre></div>
<p>As we said before, the parser type <code>'a t</code> is functorial -- it supports a <code>map</code> operation. All <code>map f p</code> has to do is to write a new parse function which runs <code>p.parse</code>, and then on success wraps the return value with a call to <code>f</code>.</p>
<p>We also define a synonym <code>(let+)</code> to enable the applicative syntax.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> <span class="dt">unit</span> = { tp = Tp.eps; parse = <span class="kw">fun</span> s i -> (i, ()) }
<span class="kw">let</span> return x =
<span class="kw">let</span>+ () = <span class="dt">unit</span> <span class="kw">in</span> x
<span class="kw">let</span> (<span class="kw">and</span>+) p1 p2 =
<span class="kw">let</span> p' s i =
<span class="kw">let</span> (i, a) = p1.parse s i <span class="kw">in</span>
<span class="kw">let</span> (i, b) = p2.parse s i <span class="kw">in</span>
(i, (a,b))
<span class="kw">in</span>
{ tp = Tp.seq p1.tp p2.tp; parse = p' }
<span class="kw">let</span> seq = (<span class="kw">and</span>+)</code></pre></div>
<p><code>unit</code> is the simplest parser – it always succeeds without moving the position of the index into the string (ie, it recognises the empty string) and returns unit. <code>return x</code> is almost as simple: it also immediately succeeds, but returns the value <code>x</code>.</p>
<p>The <code>seq</code> (aka <code>and+</code>) operator sequences two parsers <code>p1</code> and <code>p2</code>. Its parse function runs <code>p1.parse</code>, yielding an updated index and a value <code>a</code>, and then <code>p1.parse</code> is run starting from the new index, yielding the value <code>b</code>. Then, the final index is returned, alongside the pair <code>(a,b)</code>.</p>
<p>Note that this function never backtracks: by construction, all of our parsers will turn out to be greedy, but this is ok since the follow set of <code>p1</code> and the first set of <code>p2</code> are known to be disjoint – there is never any risk that <code>p1</code> will consume characters that are actually part of the start of <code>p2</code>. So we do not have to (as PCRE-style regexps must) do any backtracking.</p>
<p>The returned type simply uses <code>Tp.seq</code> on the two argument parsers' types.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> charset cs =
<span class="kw">let</span> p s i =
<span class="kw">if</span> i < <span class="dt">String</span>.length s && C.mem s.[i] cs <span class="kw">then</span>
(i+<span class="dv">1</span>, s.[i])
<span class="kw">else</span>
<span class="dt">raise</span> (ParseFailure i)
<span class="kw">in</span>
{tp = Tp.charset cs; parse = p }
<span class="kw">let</span> <span class="dt">string</span> str =
<span class="kw">let</span> p s i =
<span class="kw">if</span> i + <span class="dt">String</span>.length str < <span class="dt">String</span>.length s <span class="kw">then</span>
<span class="kw">let</span> <span class="kw">rec</span> loop j =
<span class="kw">if</span> j < <span class="dt">String</span>.length str <span class="kw">then</span>
<span class="kw">if</span> str.[j] = s.[i + j] <span class="kw">then</span>
loop (j+<span class="dv">1</span>)
<span class="kw">else</span>
<span class="dt">raise</span> (ParseFailure (i+j))
<span class="kw">else</span>
(i+j, ())
<span class="kw">in</span>
loop <span class="dv">0</span>
<span class="kw">else</span>
<span class="dt">raise</span> (ParseFailure i)
<span class="kw">in</span>
{tp = Tp.<span class="dt">string</span> str; parse = p} </code></pre></div>
<p><code>charset cs</code> and <code>string s</code> are just convenience parsers, which just check if the first character of the input is in the set <code>cs</code>, or whether the prefix of the output is <code>s</code>, respectively.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> fail =
{ tp = Tp.bot;
parse = <span class="kw">fun</span> s i -> <span class="dt">raise</span> (ParseFailure i) }</code></pre></div>
<p>The <code>fail</code> parser always fails, no matter what the input is. This corresponds to recognising the empty language.</p>
<p>The alternation operator <code>p1 || p2</code> below is the most complex parser definition.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> (||) p1 p2 =
<span class="kw">let</span> p' s i =
<span class="kw">if</span> i < <span class="dt">String</span>.length s <span class="kw">then</span>
<span class="kw">if</span> C.mem s.[i] p1.tp.Tp.first <span class="kw">then</span>
p1.parse s i
<span class="kw">else</span> <span class="kw">if</span> C.mem s.[i] p2.tp.Tp.first <span class="kw">then</span>
p2.parse s i
<span class="kw">else</span> <span class="kw">if</span> p1.tp.Tp.null <span class="kw">then</span>
p1.parse s i
<span class="kw">else</span> <span class="kw">if</span> p2.tp.Tp.null <span class="kw">then</span>
p2.parse s i
<span class="kw">else</span>
<span class="dt">raise</span> (ParseFailure i)
<span class="kw">else</span> <span class="kw">if</span> p1.tp.Tp.null <span class="kw">then</span>
p1.parse s i
<span class="kw">else</span> <span class="kw">if</span> p2.tp.Tp.null <span class="kw">then</span>
p2.parse s i
<span class="kw">else</span>
<span class="dt">raise</span> (ParseFailure i)
<span class="kw">in</span>
{tp = Tp.alt p1.tp p2.tp; parse = p' }
<span class="kw">let</span> any ps = <span class="dt">List</span>.fold_left (||) fail ps </code></pre></div>
<p>Alternation is the only parser combinator that uses the information from the types to decide how to proceed. The reason is that we do not want our parsers to backtrack – if they did, performance would switch from worst-case linear to worst-case exponential. (This is the reason that classic parser combinators have such dreadful performance.)</p>
<p>However, this means that when we see an alternation, we have to decide whether to parse with <code>p1</code> or <code>p2</code>. In traditional LL(k) style, We do this by looking at the first character of the input. If it is in <code>p1</code>'s first set, we parse with <code>p1</code>, and if it is in <code>p2</code>'s first set, we parse with <code>p2</code>.</p>
<p>This is easy – except for the corner cases, of which there are two main ones. The first corner case is what we do when the input string is exhausted (ie, if the position we are to start looking at is at the end of the string). In this case, the only string that could match is the empty string, so we check to see whether either <code>p1</code> or <code>p2</code>is nullable, and then we invoke the nullable parser (if any).</p>
<p>The other corner case is when there is still input left, but its first character doesn't match the first set of either parser. In this case, again the only possible matching string is the empty string, and so we check nullability again.</p>
<p>If none of these conditions match, then the string can't be parsed and we signal an error.</p>
<p>Once we have alternation, we can use it and <code>fail</code> to define the <code>any</code> combinator by folding over the list of parsers with <code>||</code>.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> fix f =
<span class="kw">let</span> g t = (f {fail <span class="kw">with</span> tp = t}).tp <span class="kw">in</span>
<span class="kw">let</span> r = <span class="dt">ref</span> (fail.parse) <span class="kw">in</span>
<span class="kw">let</span> p = f {tp = (Tp.fix g); parse = <span class="kw">fun</span> s i -> !r s i} <span class="kw">in</span>
r := p.parse;
p
<span class="kw">let</span> parse p s i = p.parse s i
<span class="kw">end</span></code></pre></div>
<p>Finally, the fixed point <code>fix f</code> is a little annoying to define, because <code>f : 'a t -> 'a t</code>, and this means that it takes a pair of a type and a parse function, and returns a pair of a type and a parse function. The type-level fixed point <code>Tp.fix</code> wants an argument of type <code>Tp.t -> Tp.t</code>, and the <code>g</code> function in the body turns <code>f</code> into this kind of function using a dummy parser argument to <code>f</code>.</p>
<p>To actually implement the parsing function, we turn to a trick called Landin's knot – we manually implement recursion by backpatching. First, we define <code>r</code>, a pointer to a failing parse function. Then, we invoke <code>f</code> with a parser which takes the type of the fixed point (<code>Tp.fix g</code>), and a parse function which just invokes the function stored in <code>r</code>. This yields a new parser, which we assign to <code>r</code> to tie the knot on the recursion.</p>
<p>The reason we do this, instead of defining the parse function <code>q</code> as</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> <span class="kw">rec</span> q s i = (f {tp = Tp.fix g; parse = q}).parse s i </code></pre></div>
<p>is that this definition re-invokes <code>f</code> on each recursive step, whereas the imperative definition only invokes <code>f</code> once.</p>
<p>That's the whole implementation of the parser library! It's just over 300 lines of code.</p>
<h2 id="a-sample-parser">A Sample Parser</h2>
<p>Below, I give the implementation of a simple s-expression parser, which recognises parenthesized sequences of words, like <code>(foo bar (baz (quux) ()))</code>.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">module</span> Test = <span class="kw">struct</span>
<span class="kw">open</span> Parser
<span class="kw">let</span> letter = charset (C.of_string <span class="st">"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"</span>)
<span class="kw">let</span> digit = charset (C.of_string <span class="st">"0123456789"</span>)
<span class="kw">let</span> whitespace = charset (C.of_string <span class="st">" </span><span class="ch">\t\n</span><span class="st">"</span>)</code></pre></div>
<p>We can use our parsers to define basic parsers that recognise single letters, digits, or whitespace.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="co">(* val (==>) : 'a t -> ('a -> 'b) -> 'b t </span>
<span class="co"> let (==>) p f = let+ x = p in f x </span>
<span class="co"> (* val (>>) : 'a t -> 'b t -> 'a t *)</span>
<span class="co"> let (>>) p1 p2 = </span>
<span class="co"> let+ x = p1 </span>
<span class="co"> and+ _ = p2 in</span>
<span class="co"> x</span></code></pre></div>
<p>Next, I define a couple of utility combinators. <code>==></code> is an infix spelling of <code>map</code>, whose utility will become clear shortly.</p>
<p><code>p1 >> p2</code> is like <code>seq</code>, parsing with <code>p1</code> and then <code>p2</code>, except that in this case the value returned by <code>p2</code> is discarded. This is useful for doing things like discarding trailing whitespace.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="co">(* val star : 'a t -> 'a list t *)</span>
<span class="kw">let</span> star p =
fix (<span class="kw">fun</span> r ->
any [
<span class="dt">unit</span> ==> (<span class="kw">fun</span> () -> []);
(seq p r) ==> (<span class="kw">fun</span> (x, xs) -> x :: xs)
])
<span class="co">(* val starskip : 'a t -> unit t *)</span>
<span class="kw">let</span> starskip p =
fix (<span class="kw">fun</span> r ->
any [ <span class="dt">unit</span> ==> (<span class="kw">fun</span> () -> ());
(seq p r) ==> (<span class="kw">fun</span> _ -> ()) ])</code></pre></div>
<p>The <code>star p</code> combinator parses with <code>p</code> as many times as it can, and then returns a list of all the <code>p</code>-values that it was able to parse. <code>starskip p</code> is similar, except that it ignores its return value, always returning unit <code>()</code>.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="co">(* val symbol : string t *)</span>
<span class="kw">let</span> symbol =
<span class="kw">let</span>+ c = letter
<span class="kw">and</span>+ cs = star letter
<span class="kw">and</span>+ _ = starskip whitespace
<span class="kw">in</span>
<span class="kw">let</span> b = <span class="dt">Buffer</span>.create <span class="dv">0</span> <span class="kw">in</span>
<span class="dt">List</span>.iter (<span class="dt">Buffer</span>.add_char b) (c :: cs);
<span class="dt">Buffer</span>.contents b
<span class="co">(* val paren : 'a t -> 'a t *)</span>
<span class="kw">let</span> paren p =
<span class="kw">let</span>+ () = <span class="dt">char</span> <span class="ch">'('</span> >> starskip whitespace
<span class="kw">and</span>+ x = p
<span class="kw">and</span>+ () = <span class="dt">char</span> <span class="ch">')'</span> >> starskip whitespace
<span class="kw">in</span>
x</code></pre></div>
<p>The <code>symbol</code> parser parses a nonzero list of letters, followed by any amount of whitespace, and converts the list of characters into a string. This does allocate a lot more than it should, with one allocation per character – but for a blog post it's fine.</p>
<p>The <code>paren p</code> combinator call wraps the language recognised by <code>p</code> in parentheses. We also follow the calls to <code>char '('</code> and <code>char ')'</code> with a <code>starskip whitespace</code>. This is one of the annoying things about writing scannerless parsers – you have to handle whitespace manually.</p>
<p>The discipline I use is to follow each logical token with trailing whitespace, but to never precede logical tokens with whitespace. This ensures that whitespace is never in the first set of a grammatical production, which makes sequencing work out reasonably well. It's still a PITA relative to explicitly tokenized lexer/parser pairs, though.</p>
<p>Finally, we can define the <code>sexp</code> type and parser.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">type</span> sexp =
| Sym <span class="kw">of</span> <span class="dt">string</span>
| Seq <span class="kw">of</span> sexp <span class="dt">list</span>
<span class="co">(* val sexp : sexp t *)</span>
<span class="kw">let</span> sexp =
fix (<span class="kw">fun</span> r ->
any [
symbol ==> (<span class="kw">fun</span> s -> Sym s);
paren (star r) ==> (<span class="kw">fun</span> xs -> Seq xs)
])
<span class="kw">end</span></code></pre></div>
<p>An s-expression is either a symbol <code>Sym s</code>, or a list of sexpressions <code>Seq ss</code>. The parser for this is literally this definition – it is either symbol (which returns <code>Sym s</code>), or it is a parenthesized list of s-expressions (derived using <code>star</code> on the <code>r</code> recursive combinator.</p>
<p>This also demonstrates why I like the infix map <code>==></code> – it looks quite a bit like clausal definitions in pattern matching, or the parser actions in Menhir or Yacc.</p>
<h2 id="performance">Performance</h2>
<p>So how fast does it run? Well, to test this out I generated some large s-expressions and parsed them. This was all benchmarked on an Intel i7-6700 running at 3.4 GHz, running Ubuntu 22.04, with the OCaml code all compiled to native code with <code>ocamlopt</code>.</p>
<p>When I run the parser combinator library on a 4.5 megabyte string, I get the following performance numbers.</p>
<ul>
<li>String length: 4498771 bytes</li>
<li>Parser elapsed time: 0.736 sec</li>
<li>Parsing rate: 6.12e+06 bytes/sec</li>
</ul>
<p>On a string 10x bigger, the performance of the combinators is slightly faster than a linear speedup would predict.</p>
<ul>
<li>String length: 45007848 bytes</li>
<li>Parser elapsed time: 5.571 sec</li>
<li>Parsing rate: 8.08e+06 bytes/sec</li>
</ul>
<p>I don't know why the 45 megabyte string parsers <em>faster</em> than the 4.5 megabyte one. There shouldn't be any fixed latencies of any significance, so I can only chalk it up to garbage collection weirdness.</p>
<p>I also wrote a small ocamllex/Menhir parser, and benchmarked it. The Menhir parser showed similar tendencies, though its absolute speed is higher. For the 4.5 megabyte s-expression, we get:</p>
<ul>
<li>String length: 4498771 bytes</li>
<li>Parser elapsed time: 0.492 sec</li>
<li>Parsing rate: 9.14e+06 bytes/sec</li>
</ul>
<p>For the 45 megabyte string, we get:</p>
<ul>
<li>String length: 45007848 bytes</li>
<li>Parser elapsed time: 2.989 sec</li>
<li>Parsing rate: 1.51e+07 bytes/sec</li>
</ul>
<p>Again, parsing a longer string shows a more-than-linear speedup. Since LR(1) parsing, like LL(1), is a linear time algorithm, this indicates some stuff going on in the runtime.</p>
<p>Comparing like-for-like, this little parser combinator library is 2/3 the speed of Menhir for the 4.5 megabyte file, and slightly more than 1/2 the speed on the 45 megabyte file.</p>
<p>This is much better performance than I expected! Because parser combinators are essentially big nested closures, I expected there to be a heavy performance penalty because we basically have to do an indirect jump for every character in the input.</p>
<p>This should be a recipe for Python-esque slowness, but I guess the processor architects at Intel really know how to design indirect branch predictors! I checked <a href="https://www.agner.org/optimize/microarchitecture.pdf">Agner Fog's optimisation guide</a>, but apparently the design of the branch predictor in Skylake is undocumented. All Fog can say is that "Indirect jumps and indirect calls are predicted well."</p>
<p>The net result of this is that I have to update my beliefs: before my birthday, I thought that parser combinators were basically a bad idea. But now I think they can be a very reasonable parsing technique, even without doing fun stuff like <a href="https://www.cl.cam.ac.uk/~nk480/flap.pdf">using staging to achieve parser/lexer fusion</a>!</p>
Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com8tag:blogger.com,1999:blog-8068466035875589791.post-54617419293368955342022-11-15T08:04:00.003-08:002022-11-15T08:04:28.530-08:00CN: Verifying Systems C Code with Separation-Logic Refinement TypesWe have a new paper on combining separation logic and refinement types to verify C code, appearing at POPL 2023 in a couple of months. It's called <em><a href="https://www.cl.cam.ac.uk/~nk480/cn.pdf">CN: Verifying Systems C Code with Separation-Logic Refinement Types</a></em>, and it's by Christopher Pulte, Dhruv C. Makwana, Thomas Sewell, Kayvan Memarian, Peter Sewell, and me.
<blockquote>
<p>Despite significant progress in the verification of
hypervisors, operating systems, and compilers, and in
verification tooling, there exists a wide gap
between the approaches used in verification projects
and conventional development of systems software. We
see two main challenges in bringing these closer
together: verification handling the complexity of
code and semantics of conventional systems software,
and verification usability.</p>
<p>We describe an experiment in verification tool
design aimed at addressing some aspects of both: we
design and implement CN, a separation-logic
refinement type system for C systems software, aimed
at predictable proof automation, based on a realistic
semantics of ISO C. CN reduces refinement typing to
decidable propositional logic reasoning, uses
first-class resources to support pointer aliasing and
pointer arithmetic, features resource inference for
iterated separating conjunction, and uses a novel
syntactic restriction of ghost variables in
specifications to guarantee their successful
inference. We implement CN and formalise key aspects
of the type system, including a soundness proof of
type checking. To demonstrate the usability of CN we
use it to verify a substantial component of Google’s
pKVM hypervisor for Android.</p>
</blockquote>
There's some cool separation logic trickery that might become a blogpost, and of course the CN tool is open source, and is publically available as a backend of <a href="https://github.com/rems-project/cerberus">the Cerberus executable C semantics</a>.
Naturally I (and most of my coauthors) will be in Boston this January for POPL, and the talk about CN will be public then. If you can't wait, earlier this summer I gave <a href="https://www.youtube.com/watch?v=og_MR9ykYh8">a talk about CN which was recorded and on Youtube.</a>Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com0tag:blogger.com,1999:blog-8068466035875589791.post-7453857346306576702022-11-04T10:56:00.001-07:002022-11-05T16:19:00.703-07:00Two Papers about Refinement Types <p>The rule of thumb I use is that <a href="https://noamz.org">Noam Zeilberger</a> is generally five to ten years ahead of me in identifying interesting problems. A decade ago he was working intensively on the semantics of refinement types, and lo and behold, in the last couple of years so have I. So I'd like tto tell you about two draft papers developing both the theory of refinement types, and how to apply them to verification problems!</p>
<ul>
<li><p><a href="https://www.cl.cam.ac.uk/~nk480/liq.pdf"><i>Focusing on Liquid Refinement Typing</i></a>, Dimitrios J. Economou, Neel Krishnaswami, Jana Dunfield.</p>
<blockquote>
<p>We present a foundation systematizing, in a way that works for any evaluation order, the variety of mechanisms for SMT constraint generation found in index refinement and liquid type systems. Using call-by-push-value, we design a polarized subtyping relation allowing us to prove that our logically focused typing algorithm is sound, complete, and decidable, even in cases seemingly likely to produce constraints with existential variables. We prove type soundness with respect to an elementary domain-theoretic denotational semantics. Soundness implies, relatively simply, our system’s totality and logical consistency.</p>
</blockquote>
<p>In this paper, my coauthors and I decided to take a look at Liquid Types, which is one of the most successful offshoots of the refinement types family tree. Two things make it particularly powerful.</p>
<p>First, Liquid Types are engineered from the ground up for decidable typechecking. This ensures you never have to pray that a fragile pile of heuristics will be good enough to typecheck your program. Second, they extend the DML/Stardust style of refinement types with the idea of “measure functions”, which let you compute SMT predicates by recursion on the structure of your program data. This is a really potent extension which brings all kinds of deep functional correctness properties into reach for automated verification.</p>
<p>So we decided to build a small refinement type system on top of a call-by-push-value calculus that exemplifies these two properties. We then give an algorithmic type system (proved sound and complete with respect to the declarative spec), and then give a denotational semantics to our calculus, which shows that our refinement type system is consistent.</p></li>
<li><p><a href="https://www.cl.cam.ac.uk/~nk480/ert.pdf"><i>Explicit Refinement Types</i></a>, Jad Ghalayini and Neel Krishnaswami.</p>
<blockquote>
<p>We present 𝜆ert, a type theory supporting refinement types with explicit proofs. Instead of solving refinement constraints with an SMT solver like DML and Liquid Haskell, our system requires and permits programmers to embed proofs of properties within the program text, letting us support a rich logic of properties including quantifiers and induction. We show that the type system is sound by showing that every refined program erases to a simply-typed program, and by means of a denotational semantics, we show that every erased program has all of the properties demanded by its refined type. All of our proofs are formalised in Lean 4.</p>
</blockquote>
<p>In this paper, we take the opposite tack from the previous paper, in which we carefully restricted the system of refinements to ensure that an SMT solver (like Z3) could always discharge the constraints. SMT solvers, while powerful, struggle with logical features (such as quantifiers and induction) that are outside the core decidable fragment. (In fact, they basically have to struggle, because full first-order logic is undecidable!)</p>
<p>So in this paper, Jad and I designed a refinement type system where there are no solvers at all – all logical obligations are discharged with explicit proofs. As a result, our language of refinements easily extends to quantified formulas (eg, we can express that a function is monotonic very easily).</p></li>
</ul>
<p>One interesting thing is that both of these papers use very low-tech denotational semantics. The first paper uses elementary domain theory, and the second is even simper – it uses plain old sets ands functions. This was partly done to maximise the accessibility of our results – it’s sort of folklore that refinement types have to have complicated semantics, and I hoped we could push back against that belief.</p>
<p>However, Paul-André Melliès and Noam Zeilberger have a beautiful paper, <a href="http://noamz.org/papers/funts.pdf"><i>Functors are type refinement systems</i></a>, in which they give a uniform, structured treatment of refinement types in terms of fibrations. Our semantics in the two papers above can really be seen as what you get if you inlined all the abstractions in Noam and Paul-André’s paper. This makes our results elementary, but at the price of generality. Particularly if we want to pursue mechanised proofs further, then it would be a good idea to work explicitly with the fibrational interface, because that could let us change the semantics without having to go back and re-prove all of the properties relating syntax to semantics.</p>
Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com0tag:blogger.com,1999:blog-8068466035875589791.post-79402047839742501502022-09-13T09:41:00.000-07:002022-09-13T09:41:59.518-07:00The Golden Age of PL Research<p>I was chatting with a PhD student while back, who remarked to me that many senior PL researchers seemed stuck in a defensive crouch. I though that was quite a striking observation, because (a) he was not wrong, but (b) people his age don't have or need that posture because <strong>we are currently in a golden age for PL research.</strong></p>
<p>What I mean by that is that there was a period where the research tradition in programming languages nearly died outright in the US, and a lot of our more senior researchers remember this very keenly. Basically, AI winter and cheap microcomputers killed the Lisp machine, and as collateral damage managed to nearly wipe out PL research in America -- many PL researchers in the US ended up moving overseas or changing their research focus.</p>
<p>However, over the course of about a decade (from the mid 1990s to the mid 2000s), the field made the transition from being on life support to being very vigorous. I started grad school towards the end of this transition period, and so I ended up with first hand view of the change.</p>
<p>Here are some of the factors that triggered the transition, both practical and theoretical. (Note that the thing about a first-hand view is that it is not a global view: I am sure other people will have different opinions about what happened!)</p>
<ol style="list-style-type: decimal">
<li><p><strong>Syntactic Type Soundness</strong> Before Felleisen and Wright's 1994 paper <a href="https://dl.acm.org/doi/10.1006/inco.1994.1093"><em>A Syntactic Approach to Type Soundness</em></a> appeared, proving type safety was surprisingly hard.</p>
<p>E.g., if you look at Milner's <a href="https://homepages.inf.ed.ac.uk/wadler/papers/papers-we-love/milner-type-polymorphism.pdf"><em>A Theory of Type Polymorphism in Programming</em></a>, you will see that what he does to prove type safety is:</p>
<ol style="list-style-type: lower-alpha">
<li>Solve a domain equation for a universal domain containing all the features of mini-ML</li>
<li>Give a denotational interpretation of untyped mini-ML</li>
<li>Define a logical relation as a family of retracts of the universal domain which exclude "wrong"</li>
<li>Prove the fundamental theorem of logical relations.</li>
</ol>
<p>This is a lot more work than the Wright-Felleisen approach, in which you just write down the typing rules, the reduction relation, and prove a handful of lemmas about it.</p>
<p>The technical simplicity of the Wright-Felleisen approach meant that it was very easy to apply PL design techniques to nearly arbitrary domains, and that made it possible to quickly build formal models of industrially interesting languages. Igarashi et al's <a href="https://dl.acm.org/doi/10.1145/503502.503505">Featherweight Java</a> is probably one of the earliest example of this, and more recently <a href="https://research.google/pubs/pub48341/">WebAssembly</a> has demonstrated that it is possible to build the formal spec in sync with the language design effort.</p></li>
<li><p><strong>Java</strong> Java was released in 1996, and its immediate success had a critical impact in popularising garbage collection. Before Java, using languages with garbage collection for application programming was an extremely controversial idea among working programmers. Guy Steele (who co-authored the Java spec) remarked about this on the <code>ll1-discuss</code> mailing list:</p>
<blockquote>
<p>We were not out to win over the Lisp programmers; we were after the C++ programmers. We managed to drag a lot of them about halfway to Lisp.</p>
</blockquote>
<p>Java also ended up dragging programmers halfway to ML! In 1998, Gilad Bracha, Martin Odersky, David Stoutamire and Phil Wadler proposed adding parametric polymorphism to Java, and in 2004 after 5 years of arguments it was added in Java 5. This also made a big impact in popularising advanced type systems features.</p></li>
<li><p><strong>Dynamic Languages</strong> Languages like Perl, Python, Ruby, and Javascript (in roughly that order) began achieving serious popularity in the late 90s/early 2000s, primarily as tools for building websites. Like Java, all of these languages had garbage collection (even if only reference counting), which helped normalise it.</p>
<p>But in addition to automatic memory management, these languages all supported first-class functions of one variety or another, and so gave <em>literally</em> millions of programmers their first introduction to functional programming.</p>
<p>Younger programmers will be surprised by just how controversial an addition first-class functions were! Java omitted it initially, despite its designers being expert Lisp and Smalltalk programmers. Even C#, which was released 6 years after Java, did not support them at first. IMO it was only because of the experience people had with dynamic languages that created the pressure to add them.</p>
<p>Nowadays, thanks to the influence of dynamic languages on the design zeitgeist, even languges whose design aesthetic is deliberately conservative, such as Go, include first-class functions without a second thought.</p></li>
<li><p><strong>SAT and SMT solvers</strong> CDCL-based SAT solvers was invented in the mid 1990s, and it revolutionised SAT solving. A few years later, by the early 2000s, the lazy approach to SMT was invented, which made it possible to reliably check whether quantifier-free formulas from all kinds of mixed domains were valid.</p>
<p>This has proved to be an immense boon for PL research across a wild variety of different research areas, from verification of operatiing systems code like the <a href="https://www.microsoft.com/en-us/research/publication/safe-to-the-last-instruction-automated-verification-of-a-type-safe-operating-system/">Verve project</a>, the <a href="https://klee.github.io/">KLEE symbolic execution engine</a>, and refinement type systems like <a href="https://ucsd-progsys.github.io/liquidhaskell-blog/">Liquid Haskell</a>.</p></li>
<li><p><strong>Separation logic</strong> Hoare logic, invented in 1968, was one of the first systematic approaches to proving the correctness of imperative programs. But despite serious efforts to use it in verification, such as <a href="https://dl.acm.org/doi/10.1145/390019.808307">the Euclid system</a>, it never quite caught on.</p>
<p>The major reason for this was that it was very difficult to reason about pointers and aliasing with Hoare logic. Every Hoare logic assertion talks about the whole state, and so you need to write down explicit disjointness assertions to make clear that modifying one pointer will not change the value of a different pointer. So if you have <code>n</code> pointers, then you need <code>O(n^2)</code> disjointness assertions.</p>
<p><a href="https://www.cs.cmu.edu/~jcr/seplogic.pdf">Separation logic</a>, invented by John Reynolds and Peter O'Hearn, solved this problem, by formalising the intuition that an assertion could really only be talking about <em>part</em> of the heap. This made modular reasoning about imperative programs radically easier, and has been used in numerous verification efforts since.</p>
<p>One minor fact about separation logic. John C. Reynolds invented separation logic when was 65. At the time that most people start thinking about retirement, he was making yet another giant contribution to the whole field!</p></li>
<li><p><strong>Mature Proof Assistants</strong> Work on the Coq proof assistant began in 1984, but it went through multiple successive rounds of reimplementation as it influenced and was influenced by competing systems like LEGO, HOL and Isabelle. It probably achieved a recognizably modern form in 2002, when the 7.0 release was made.</p>
<p>This was a milestone in terms of quality of implementation, since this is when (I think -- I wasn't there!) term normalisation started to happen via a bytecode VM. This made many things (like reflective proof) much faster, which made it feasible to build really large proofs in Coq. Indeed, Georges Gonthier reports that <a href="https://www.cl.cam.ac.uk/~lp15/Pages/4colproof.pdf">his 2005 mechanised proof of the 4-colour theorem</a> began as an experiment to stress test the new fast VM!</p>
<p>The result that made CS researchers outside of PL stand up and take notice was not the four color theorem, though. It was Xavier Leroy's correctness proof of a C compiler in the <a href="https://compcert.org/">CompCert project</a>, in 2006. People have been talking about program verification since the 1960s, and this was one of the first big programs that was actually verified -- and in a proof assistant, no less.</p>
<p>Nowadays, you can go to a systems conference like SOSP and find <a href="https://sosp19.rcs.uwaterloo.ca/program.html">whole sessions about verification</a>, which would have been unthinkable when I was a student.</p>
<p>Coq is not the only proof assistant, of course, and all its competitor systems (e.g., Isabelle) have seen substantial improvements in their maturity as well.</p></li>
</ol>
<p>Some other important, but maybe less vital, factors:</p>
<ol start="4" style="list-style-type: decimal">
<li><p><strong>Step-indexing</strong> In 2001, <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.5695">McAllester and Appel</a> showed how you could build a logical relations model using "step-indexing" as an alternative to the domain theoretic approach. This is a less general, but more elementary and flexible, approach to modelling the semantics of recursion than techniques like domain theory. The flexibility meant that languages with features like <a href="https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.906.2181&rep=rep1&type=pdf">ML-style modules</a>, <a href="http://nickbenton.name/hlsl.pdf">machine code</a>, and <a href="http://sro.sussex.ac.uk/id/eprint/1428/1/ICALP05.pdf">self-modifying code</a> could all be given straightforward models.</p>
<p>One small thing that makes me smile is that this is now the modern approach to proving type safety for fancy languages -- and that it is very similar to the approach the Wright/Felleisen approach displaced! I feel (i.e., believe but don't have a real argument for) that you need a big enough community or ecosystem to support a technically sophisticated research tradition, and now that PL is a bigger field again we can once more consider using advanced techniques.</p></li>
<li><p><a href="https://www.seas.upenn.edu/~sweirich/papers/lx/lxpaper.pdf"><em>Flexible Type Analysis, Crary and Weirich, ICFP 1999</em></a> Compared to some of the other papers I've linked to, this paper is mostly forgotten -- according to Google Scholar, it has less than 10 citations in the last five years. However, IMO it really is a pivotal paper.</p>
<p>What makes it so very important is that after writing this paper, <a href="https://www.cis.upenn.edu/~sweirich/">Stephanie Weirich</a> began collaborating with <a href="https://simon.peytonjones.org/">Simon Peyton-Jones</a>. Stephanie, Simon, and their students and collaborators, began systematically extending GHC Haskell with the features described in this paper. And once they finished with that they carried on with many more extensions!</p>
<p>This has been an <em>immensely</em> productive collaboration. It's why Haskell has grown so many "fancy" type system features, and in turn it has made it possible to actually try out many things in the literature in actual programming. The value of this can't be overstated: a huge number of PL researchers got their first introduction to what type theory can do from the advanced extensions to GHC.</p></li>
<li><p><strong>Model Checking</strong> This goes under "smaller factors" not because it's actually a smaller factor, but because I am very ignorant about the state of the art! It's long been important in hardware, but in software, the biggest success I know of is Microsoft's <a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/slam-sdvrp.pdf">Static Driver Verifier</a>, which is used to model check drivers before they are certified for Windows. This is obviously a big deal, but I can't really contextualise this because I don't know much about model checking. However, I am under the impression that SAT/SMT-based bounded model checking supplanted the older BDD-based approach, which would mean that model checking also benefited from the SAT/SMT revolution.</p></li>
</ol>
<p>The net effect of all this is that a motivated, ambitious student can aim at problems which <strong>simultaneously</strong> have significant theoretical depth and substantial practical impact -- stuff that would have been field-shaking results 20 years ago are almost routine now.</p>
<p>For example, <a href="https://uk.linkedin.com/in/maja-trela-75a216172">Maja Trela</a>, one of my recently-graduated Part III (i.e., Master's) students here at Cambridge, just completed her dissertation under the primary supervision of <a href="https://www.cl.cam.ac.uk/~caw77/">Conrad Watt</a>. In her dissertation, <a href="https://www.cl.cam.ac.uk/~nk480/maja-diss.pdf">she used a separation logic library for Isabelle to verify the correctness of an efficient Webassembly interpreter</a>, which is now being used as the reference interpreter by <a href="https://bytecodealliance.org/">the Bytecode Alliance</a>!</p>
Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com7tag:blogger.com,1999:blog-8068466035875589791.post-44538585683365336972022-03-17T10:53:00.001-07:002022-03-17T10:53:38.932-07:00Fold Considered Annoying<p>I recently read Shriram Krishamurthi and Kathi Fisler's ICER 2021 paper, <a href="https://dl.acm.org/doi/abs/10.1145/3446871.3469739"><em>Developing Behavioral Concepts of Higher-Order Functions</em></a>. In this paper, they study not the <em>theory</em> of higher-order functions, bur rather the <em>pedagogy</em> of higher-order functions: how do we teach students how to program with them?</p>
<p>In particular, they studied how students understood different higher-order functions -- e.g., <code>map</code>, <code>filter</code>, <code>fold</code> and co. -- by asking them to perform tasks like classifying which functions were similar, which example input/outputs could conceivably be produced by particular HOFs, and so on.</p>
<p>Since this is a blog, I can say that one thing I really liked about this paper is that it confirmed my prejudices! In particular, they found that <code>fold</code> was much harder for students to understand than any of the other higher-order functions they studied.</p>
<p>This is something I have long believed, both from anecdotal evidence and for mathematical reasons. I'll pass over the anecdotes (since the paper has actual data), and focus instead on what I see as the core mathematical issue: the specification of <code>fold</code> is <em>dramatically</em> more complex than the specification of <code>map</code> or <code>filter</code>.</p>
<p>Both <code>map</code> and <code>filter</code> have extremely simple equational specifications. For example, <code>map</code> is characterised by the following three laws:</p>
<pre><code>map f [] = []
map f (xs ++ ys) = (map f xs) ++ (map f ys)
map f [x] = [f x]</code></pre>
<p>Basically, this says that map takes nils to nils, distributes over appends, and applies <code>f</code> to elements pointwise. All three of these equations are universally-quantified, unconditional equations, which is about as simple as a specification can get.</p>
<p>The specification for filter is only a little more complex:</p>
<pre><code>filter p [] = []
filter p (xs ++ ys) = (filter p xs) ++ (filter p ys)
filter p [x] = if p(x) then [x] else []</code></pre>
<p><code>filter</code> does nothing to nils, distributes over appends, and for each element either keeps it or discards it. Again, All three of these equations are universally-quantified, unconditional equations.</p>
<p>On the other hand, what is the specification for <code>fold</code>? That is given by its universal property, which reads:</p>
<pre><code> [h [] = nil ∧
(∀x, xs. cons x (h xs) = h (x :: xs))] // Uh-oh!
iff
(fold nil cons = h)</code></pre>
<p>This is not an equational specification at all! It is a bi-implication, and while the right-to-left direction is basically the definition of <code>fold</code>, the left-to-right direction contains a formula with a logical quantifier in the premise. We know from nearly a century of experience with high school calculus that quantifier alternations are a giant obstacle to understanding: every year thousands of students find themselves utterly baffled by epsilon-delta arguments, and here we have something <em>harder</em> – we have an implication with a universally quantified premise.</p>
<p>So I've always been very careful whenever I teach students about the <code>fold</code> function, because I think it has a very challenging spec for students to wrap their heads around.</p>
<p>If you're watching out for quantifiers, that also suggests that <code>take-while</code> will also be one of the more challenging for functions students – <code>takewhile p xs</code> returns the longest prefix of <code>xs</code> satisfying the predicate <code>p</code>, and "longest prefix" is going to be involve another higher-rank quantified formula.</p>
<p>The evidence in the paper seems weakly consistent with this idea, in that they report that <code>take-while</code> does seem harder for students, but it does not seem to be as much harder as I would have expected.</p>
<p>Anyway, I found this a really interesting paper!</p>
Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com4tag:blogger.com,1999:blog-8068466035875589791.post-69454257984319358762022-03-03T09:19:00.000-08:002022-03-03T09:19:22.472-08:00Simple Type Inference for System F<p>Henry Mercer, Cameron Ramsay, and I have a new draft paper on type inference out! Check out <a href="https://www.cl.cam.ac.uk/~nk480/implicit-polarized-f.pdf"><em>Implicit Polarized F: Local Type Inference for Impredicativity</em></a>.
<blockquote>
<p>
System F, the polymorphic lambda calculus, features
the principle of impredicativity: polymorphic types
may be explicitly instantiated at other types,
enabling many powerful idioms such as Church encoding
and data abstraction. Unfortunately, type applications
need to be implicit for a language to be human-usable,
and the problem of inferring all type applications in
System F is undecidable. As a result, language
designers have historically avoided impredicative type
inference.
</p>
<p>
We reformulate System F in terms of
call-by-push-value, and study type inference for
it. Surprisingly, this new perspective yields a novel
type inference algorithm which is extremely simple to
implement (not even requiring unification), infers
many types, and has a simple declarative
specification. Furthermore, our approach offers type
theoretic explanations of how many of the heuristics
used in existing algorithms for impredicative
polymorphism arise.
</p>
</blockquote>
<p>This algorithm is <em>absurdly</em> easy to implement, too.
<p>Because we don't need unification, it also looks like this kind of algorithm ought to play very nicely with dependent types. My PhD student <a href="https://www.cst.cam.ac.uk/people/ik404">Ilya Kaisin</a> is looking at this problem now, so stay tuned for even more. Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com0tag:blogger.com,1999:blog-8068466035875589791.post-19538007368238197482022-01-17T05:30:00.002-08:002022-01-17T05:30:17.789-08:00Static typing vs. Dynamic Typing vs. Dana Scott<p>If you read about programming languages on the Internet, you'll inevitably run into (or even participate in) long arguments about whether dynamic or static typing is better. A long time ago, before I went to grad school, I enjoyed arguing with people on the Internet about this very topic.</p>
<p>But, I no longer find myself interested in that debate. This is for two reasons. One, like so many addicts before me, I've moved onto harder drugs, such as frequentism versus Bayesianism. (Frequentism is good, actually, and p-values are one of the great inventions of science. Just so you know.)</p>
<p>Two, I learned that these debates largely end up being echoes of an argument Dana Scott had with himself. You can tell he's a genius, since he ended up producing many of the strongest arguments for <strong>and</strong> many of the strongest arguments against!</p>
<p>One of the best arguments against grounding semantics on the untyped lambda calculus is Dana Scott's unpublished paper from 1969, <a href="https://www.sciencedirect.com/science/article/pii/030439759390095B/"><em>A type-theoretical alternative to ISWIM, CUCH, OWHY</em></a>. In this paper, he argues that untyped syntactic models are difficult to reason about, and we should design meta-languages with a clear mathematical status, in order to facilitate reasoning about languages and programs.</p>
<p>But why is it unpublished? Let's see what Scott himself has to say:</p>
<blockquote>
<p>In particular, it was recognized that there were posets whose continuous function spaces of self-maps could be identified with the given posets themselves. And so there actually were “traditional” models of lambda-calculus that could be defined without first going through the proof theory of the formal system itself (and which could be related to many other mathematically meaningful structures and theories as well).</p>
<p>This revelation was both gratifying and embarrassing. After writing with tiresome sarcasm about the lack of meaning in the type-free lambda-calculus introduced only formally, the author himself found an interesting kind of semantical interpretation for the “type-free” language.</p>
</blockquote>
<p>The result that a universal domain can be constructed from these lattice theoretic models of the lambda calculus is described in his epochal 1971 paper <a href="https://www.cs.ox.ac.uk/files/3229/PRG07.pdf"><em>Continuous Lattices</em></a>.</p>
<p>Indeed, Scott's results continue to have a lot of influence today. This goes well beyond being a technical tool in the toolbox: the sensibility and style of thinking of he invented continues to pervade our research culture and shapes the latest methods.</p>
<ol style="list-style-type: decimal">
<li><p>The invention of the universal domain taught us that an inconsistent logic can still have a meaningful semantics. Prior to this work, it was believed that inconsistent logics were degenerate (due to everything following from false), but the D∞ model disproves that — it dramatically expands the kinds of arguments that we can model mathematically.</p>
<p>This is incredibly philosophically important, since you can't use mathematics to study something you don't have a mathematical model of! And since as mathematicians we spend a lot of our time making wrong arguments, it's very valuable to be able to understand them.</p>
<p>(As an unrelated aside, this is one of the things that I as a proof theorist most admire about finite model theory – things like conflict-driven clause learning in SAT solvers exploit the idea that being wrong can be productive in a really elegant way.)</p></li>
<li><p>Having a universal domain makes it possible to formalise the intuition that a type is a subset of the set of all possible values in a mathematically coherent way. Naively, this idea is problematic, but with a universal domain a la Scott the approach works ("types as retracts" in the jargon).</p>
<p>This approach underpins the functioning of program logics like Hoare logic (see Mellies and Zeilberger's paper <a href="http://noamz.org/papers/funts.pdf"><em>Functors are Type Refinement Systems</em></a>), is used to define the semantics of dependent type theories like Nuprl, and is also important for formulating the semantics of System F, the polymorphic lambda calculus.</p></li>
<li><p>The construction of the universal domain as the limit of a sequence of finitary approximations lets you derive the abstract domains of abstract interpretation by taking Scott's inverse limit construction and cutting off the sequence at a finite point rather than going all the way to infinity. This is important for understanding how static analysis fits into the landscape of semantics -- see David Schmidt's paper <a href="http://santos.cs.ksu.edu/schmidt/MFPS09/paper.pdf"><em>Abstract Interpretation from a Denotational Semantics Perspective</em></a>.</p></li>
<li><p>In the domain-theoretic model, the idea of finitary information is realised by modelling a program as either giving you an answer or going into an infinite loop. If you change the idea of no-answer from looping to signalling an error (say, with an exception), then you can similarly derive gradual typing!</p>
<p>Max S. New's and Amal Ahmed's <a href="https://arxiv.org/abs/1807.02786"><em>Graduality from Embedding-projection Pairs</em></a> really nails this idea down, but it has been one of the driving intuitions since the very beginning of work on the topic. <a href="https://www.sciencedirect.com/science/article/pii/0167642394000042">Fritz Henglein's original paper</a> explicitly formulates the rules for casts in analogy with the embedding-projection pairs found in the inverse limit construction.</p>
<p>There's an important lesson here about how analogies can be productive for a long time before we give a completely formal expression of them -- Henglein wrote his paper in <strong>1994!</strong></p></li>
<li><p>Since Dana Scott has very good taste, his emphasis was on the algebraic properties of his model construction, and so he quickly moved from complete lattices to <a href="https://www.cs.bham.ac.uk/~axj/pub/papers/handy1.pdf">domain theory</a>.</p>
<p>But this also meant that he laid out the path for people to look at other ways of obtaining universal domains. One super-important approach was pioneered by Maurice Nivat and the "Dutch school" of denotational semantics in the late 1970s and early 1980s, where instead of domains people looked at metric spaces and Banach's theorem. Bakker and Zucker's <a href="https://dl.acm.org/doi/abs/10.1145/800070.802188">1982 paper is still quite readable</a> is a still-readable account of the early thinking on this topic, and it reached a mature form with <a href="https://www.cwi.nl/~janr/papers/files-of-papers/1989-JCSS.pdf">the work of America and Rutten</a>.</p>
<p>This technique still extremely heavily used today, except for various reasons we have renamed it <a href="http://www.ccs.neu.edu/home/amal/ahmedsthesis.pdf">"step-indexing"</a>. This is the key technical mechanism which lets us apply universal domain constructions to give a <a href="https://nickbenton.name/hlsl.pdf">good semantics to assembly language</a>, because the ability to pun data and code by setting the PC and jumping needs the same mathematical machinery as interpreting the untyped lambda calculus.</p></li>
<li><p>This point about the equivocation of code and data in assembly language holds more generally. For example, <a href="https://semantic-domain.blogspot.com/2016/05/lobs-theorem-is-almost-y-combinator.html">Löb's theorem can be understood as a modal variant of the derivation of the Y-combinator</a>, illustrates how the interaction of self-reference and quotation resembles the ideas behind the D∞-style structure. This idea hasn't been worked out as nicely as New and Ahmed worked things out for gradual types, but what's a blog post without a provocation to future research?</p></li>
</ol>
<p>Thinking of "typed versus untyped languages" sort of misses the point, theoretically. We can neither pull them apart, nor do we really want to. To quote Heraclitus (as translated by Kathleen Freeman):</p>
<blockquote>
<p>They do not understand how that which differs with itself is in agreement: harmony consists of opposing tension, like that of the bow and the lyre.</p>
</blockquote>
<!--
#### Postscript
This would make a great place to stop, but I won't!
Everything I've written is written from the perspective of a specific
sensibility and value system – namely, my own. Paul-Andre Mellies once
remarked to me that it was a shame no one had discovered the
proof-theoretic invariants of assembly language, and I found his
assertion so striking that I have remembered it ever since. It's
really not a very obvious thing to think, and so it has stayed with me
a reminder that the unity of theory and practice is not a fact, but an
act.
No one else is obliged to share my views, and mostly, they don't. If
you went and talked to, say, [Jan Vitek](http://janvitek.org/), he
would be motivated by roughly none of what I have written here. He
works on dynamic languages because they are used by many programmers
writing many important programs, and making life better for them is
good!
This is not intended as a slight, but rather as a reminder that what
gets people out of bed in the morning varies.
-->
Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com0tag:blogger.com,1999:blog-8068466035875589791.post-49138725166782788202021-12-30T23:51:00.001-08:002021-12-30T23:51:52.714-08:00What is a programming language?<p>One of the more surprising facts about the discipline of programming language theory is that it is actually possible to define what programming languages are in a reasonably mathematically satisfying way.</p>
<ol style="list-style-type: decimal">
<li><p>A <em>language</em> is a presentation of a (generalised) algebraic theory. Basically, think of a language as a set of generators and relations in the style of abstract algebra.</p>
<p>You need to beef up the universal algebra story a bit to handle variables and binding (e.g., see the work of Fiore and Hamana on higher-order algebraic theories), but the core intuition that a language is a set of generators for terms, plus a set of equations these terms satisfy is exactly the right one.</p>
<p>For example:</p>
<ol style="list-style-type: decimal">
<li>the simply-typed lambda calculus</li>
<li>regular expressions</li>
<li>relational algebra</li>
</ol></li>
<li><p>A <em>model</em> of a a language is literally just any old mathematical structure which supports the generators of the language and respects the equations.</p>
<p>For example:</p>
<ol style="list-style-type: decimal">
<li>we can model the typed lambda calculus using sets for types and mathematical functions for terms,</li>
<li>we can model regular expressions as denoting particular languages (ie, sets of strings)</li>
<li>we can model relational algebra expressions as sets of tuples</li>
</ol></li>
<li><p>A <em>model of computation</em> or <em>machine model</em> is basically a description of an abstract machine that we think can be implemented with physical hardware, at least in principle. So these are things like finite state machines, Turing machines, Petri nets, pushdown automata, register machines, circuits, and so on. Basically, think of models of computation as the things you study in a computability class.</p>
<p>Nearly all models of computation are transition systems, where you have a set of states, and a transition function which tells you how a computation can change one state into another. There are lots of generalisations of this (relations instead of functions, I/O, system calls, message passing communication, etc.), and the slick way of accounting for this generality is by modelling transition systems as coalgebras (i.e., the set <code>S</code> paired with a coalgebra map <code>step : S → F(S)</code>, where <code>F</code> is a functor.</p>
<p>In addition to this, every model of computation also comes with a notion of <em>observation</em> – in its most basic form, how do you tell when a program has finished executing? Observations can get quite intricate: to model computational complexity, you have to be able to observe runtime and memory usage, and for concurrent programs you can observe intermediate program states (due to the possibility of concurrent interference). More practically, the compiler correctness theorem for something like CompCert says that the sequence of system calls the compiled binary produces will be the same as the one the C abstract machine will produce.</p>
<p>I don't know a nice algebraic/categorical characterisation of the notion of observation, though probably Alexandra Silva does.</p>
<p>The Church-Turing thesis characterises which abstract machines we think it is possible to physically implement.</p>
<p>Some examples of models of computation are:</p>
<ol style="list-style-type: decimal">
<li>Turing machines,</li>
<li>finite state machines</li>
<li>RAM machine programs</li>
</ol></li>
<li><p>A language is a <em>programming language</em> when you can give at least one model of the language via a machine model.</p>
<p>For example:</p>
<ol style="list-style-type: decimal">
<li><p>the types of the lambda calculus can be viewed as partial equivalence relations over Gödel codes for some universal turing machine, and the terms of a type can be assigned to equivalence classes of the corresponding PER.</p></li>
<li><p>Regular expressions can be interpreted into finite state machines quotiented by bisimulation.</p></li>
<li><p>A set in relational algebra can be realised as equivalence classes of B-trees, and relational algebra expressions as nested loops walking over these trees.</p></li>
</ol>
<p>Note that in all three cases we have to quotient the states of the machine model by a suitable equivalence relation to preserve the equations of the language's theory.</p>
<p>This quotient is <em>very</em> important, and is the source of a lot of confusion. It hides the equivalences the language theory wants to deny, but that is not always what the programmer wants – e.g., is merge sort equal to bubble sort? As mathematical functions, they surely are, but if you consider them as operations running on an actual computer, then we will have strong preferences! (Conversely, the ability to equate radically different machine configurations is the reason why optimising compilers and refactoring are things.)</p>
<p>Moreover, we have to set up this quotient so it agrees with the machine model's notion of observation – for example, we want exactly and only the strings accepted by a regular expression to end in accepting states of the finite state machine. This property is called <em>adequacy</em>, and is what justifies the claim that this is a reasonable implementation of the programming language.</p></li>
</ol>
<p>In other words, a programming language is a <em>language</em> which has an <em>adequate implementation</em> in a <em>model of computation</em>. Note that this definition does not mandate the implementation strategy!</p>
<p>A common source of confusion arises from the fact that if you have a nice type-theoretic language (like the STLC), then:</p>
<ol style="list-style-type: decimal">
<li>the term model of this theory will be the initial model in the category of models, and</li>
<li>you can turn the terms into a machine model by orienting some of the equations the lambda-theory satisfies and using them as rewrites.</li>
</ol>
<p>These oriented equations are where operational semantics comes from, at least for purely functional languages. As a result we often abuse language to say the simply typed lambda calculus "is" a programming language.</p>
<p>But in fact, the lambda calculus is a presentation of an algebraic theory, and it is the same programming language regardless of whether it is implemented by term rewriting or by compilation to x86 machine code.</p>
Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com5tag:blogger.com,1999:blog-8068466035875589791.post-2570335456367009942021-12-01T06:11:00.003-08:002021-12-06T01:42:46.708-08:00Obliteratingly Fast Parser Combinators
<p><a href="https://www.cl.cam.ac.uk/~jdy22/">Jeremy Yallop</a>, <a href="https://xnning.github.io/">Ningning Xie</a>, and I have a new draft out about combinator parsing, entitled <a href="https://www.cl.cam.ac.uk/~nk480/flap.pdf"><em>Fusing Lexing and Parsing</em></a>. The abstract is short, but sweet:</p>
<blockquote>
<p>Lexers and parsers are typically defined separately and connected by a token stream. This separate definition is important for modularity, but harmful for performance.</p>
<p>We show how to fuse together separately-defined lexers and parsers, drastically improving performance without compromising modularity. Our staged parser combinator library, flap, provides a standard parser combinator interface, but generates psecialized token-free code that runs several times faster than ocamlyacc on a range of benchmarks.</p>
</blockquote>
<p>I'm excited about this for three reasons.</p>
<p>First, even though the performance is way better this time around, going from competitive with lex+yacc to <strong>3-9 times faster,</strong> the implementation is actually <em>simpler</em> than in <a href="https://www.cl.cam.ac.uk/~nk480/parsing.pdf">our 2019 paper</a> – we don't need a complex staging pipeline because we introduce an IR which makes all the invariants needed for fast code generation lexically apparent.</p>
<p>Second, really fast parsers are potentially a powerful tool for applications like IDEs. If you are fast enough, you are free from having to mess around with things like incremental parsing, because if you can parse a whole editor buffer in a millisecond it's fine to just reparse everything on every keystroke. Imagine what could happen if we could fused in typechecking, too! That's currently just wishful thinking (aka "future work"), of course, but since typechecking is mostly a tree traversal, it's not totally hopeless. </p>
<p>Finally, at a meta-level this is a good illustration of the incorrectness of the standard engineer proverb "fast, good, cheap – pick two". This advices presumes that we lie on <a href="https://en.wikipedia.org/wiki/Pareto_efficiency#Use_in_engineering">the Pareto frontier</a>, so the best you can do is manage tradeoffs. But that's actually a false assumption – it is possible to do things which are unconditionally better along all three dimensions.</p>
<p>No source code is ready for release yet, but Jeremy is likely to
release an OPAM package for it in due course. </p>Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com7tag:blogger.com,1999:blog-8068466035875589791.post-85382534689480693152021-11-17T04:18:00.000-08:002021-11-17T04:18:27.385-08:00POPL till you DROPL <p>My PhD student Faustyna Krawiec, along with me and several other coauthors, has a new paper out, which is due to appear at <a href="">POPL 2022</a>:</p>
<blockquote>
<p><a href="https://www.cl.cam.ac.uk/~nk480/higher-order-ad.pdf"><em><strong>Provably Correct, Asymptotically Efficient, Higher-Order Reverse-Mode Automatic Differentiation</strong></em></a></p>
<p>Faustyna Krawiec, Simon Peyton-Jones, Neel Krishnaswami, Tom Ellis, Richard A. Eisenberg, and Andrew Fitzgibbon.</p>
<p>In this paper, we give a simple and efficient implementation of reverse-mode automatic differentiation, which both extends easily to higher-order functions, and has run time and memory consumption linear in the run time of the original program. In addition to a formal description of the translation, we also describe an implementation of this algorithm, and prove its correctness by means of a logical relations argument.</p>
</blockquote>
<p>We give an incredibly simple implementation of reverse-mode AD, which works in a language with higher-order functions, and is asymptotically efficient. When we started the work I was hoping it would be a good excuse to learn some differential geometry, but it turned out there is such a simple algorithm that we didn't need any fancy machinery! So I still don't know any differential geometry, but we've got a cool AD algorithm to share.</p>
<p>That's the only POPL paper this year with my name on it, but it turns out that a lot of students and close collaborators have papers without me as well! It might not matter for my academic prestige, but it makes me happy, especially when I see how interesting the work is!</p>
<ol style="list-style-type: decimal">
<li><p>My Phd student, <a href="https://www.cl.cam.ac.uk/~ds709/">Dima Szamozvancev</a> and his <em>other</em> advisor <a href="https://www.cl.cam.ac.uk/~mpf23/">Marcelo Fiore</a>, have a paper <a href="https://www.cl.cam.ac.uk/~ds709/agda-soas/"><em>Formal Metatheory of Second-Order Abstract Syntax</em></a>. They reformulated Marcelo's work on second-order abstract syntax to fit into a constructive type theory, and then mechanised it in Agda. As a result, you can give a description of a syntax, and automatically get out the substitution operators and their correctness property. One particularly powerful feature is that it doesn't just handle binding, but also metavariables. This is essential if you want handle things like inference rules as first-class values!</p></li>
<li><p>My newest PhD student, <a href="https://www.cst.cam.ac.uk/people/ik404">Ilya Kaysin</a>, has a paper, <em>The Leaky Semicolon</em> (no link yet) with several other people giving a compositional semantics to weak memory. As a fun fact, Ilya is entering his PhD with more POPL publications than I exited with. :)</p></li>
<li><p><a href="https://vikraman.org/">Vikraman Chaudhury</a>, is a (former) PhD student of <a href="https://amr-sabry.luddy.indiana.edu">Amr Sabry</a>'s, but he spent a year visiting me and we wrote a paper together on <a href="https://arxiv.org/abs/1907.07283">the semantics of capabilities</a>. At this POPL, he has a paper with Amr Sabry and Jacek Karwowski, <a href="https://arxiv.org/abs/2110.0540"><em>Symmetries in Reversible Programming: From Symmetric Rig Groupoids to Reversible Programming Languages</em></a>4, which gives a nice semantics for reversible programming languages.</p></li>
</ol>
Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com0tag:blogger.com,1999:blog-8068466035875589791.post-7841902141749349732021-09-28T05:36:00.000-07:002021-09-28T05:36:23.562-07:00Design Benchmarks<p>A few weeks ago, <a href="https://parentheticallyspeaking.org/">Shriram Krishnamurthi</a> asked <a href="https://twitter.com/ShriramKMurthi/status/1435707726676561935">an interesting question on Twitter</a>:</p>
<blockquote>
<p>It's common in PL to have benchmarks for <em>performance</em>. But does anyone know of PL <em>design</em> benchmarks?</p>
</blockquote>
<p>This caught my attention because my longest-running research program -- my work on functional reactive programming, with 7 papers so far -- has been explicitly guided by a handful of design benchmarks.</p>
<p>I've never written them down explcitly, so I thought I'd take the opportunity to do so.</p>
<ol style="list-style-type: decimal">
<li><p><strong>The language must validate the full beta- and eta-laws for all of the type constructors in the language.</strong></p>
<p>This is a proxy benchmark, of course. My <em>actual</em> goal is to make languages which are easy to program with and reason about.</p>
<p>However, this is an open-ended and vague goal, and so I needed to find a specific, concrete operationalisation of it. There are many possible proxies for this goal, and the one I picked was to look for programming languages which validate a rich collection of semantically-justified equations. This specific choice was made for three reasons.</p>
<ul>
<li><p>First, this goal it pushes me away from spending time on quality-of-life features like rich standard libraries. These features obviously do make a language easier to use, but they do not directly touch upon any FRP-specific problems.</p></li>
<li><p>Second, the search for well-behaved equational properties pushes me towards taking features known to be useful, and trying to decompose them into smaller type-theoretic components.</p>
<p><a href="https://www.asktog.com/papers/raskinintuit.html">As Jeff Raskin once noted</a>, in user interfaces "intuitive" and "familiar" are basically synonyms, which means it is good to have a design benchmark which does not penalise doing things that are different from the old standard.</p></li>
<li><p>Third, <a href="https://en.wikipedia.org/wiki/D%C3%A9formation_professionnelle"><em>déformation professionnelle</em></a>. This particular goal is the one that the toolbox I've acquired over the years -- expertise in semantics and type theory -- is best-adapted to move me towards.</p>
<p>If my expertise lay in running user studies, writing model checkers, or programming with continuations, then I would probably approach the problem differently! For example, <a href="http://www.cs.cmu.edu/~garnet/garnetIEEE.pdf">the work on the Garnet toolkit</a> is both extremely good and very influential, and very different in character from anything I have pursued.</p>
<p>Obviously, I could learn how to write a constraint solver or run a user study, but the art is long, and life short. I think it's OK for people to focus on different things, and to learn from each other. :)</p></li>
</ul></li>
<li><p><strong>I have a small collection of tiny programs which are intended to model various common UI patterns, and the languages I design must express these programs.</strong></p>
<p>The purpose of these programs is basically to prevent over-simplifying away real complexity. If we're trying to optimise a language design for ease of reasoning, the easiest way of "achieving" the goal is to design an impoverished language.</p>
<p>For example, I check whether I can write programs which dynamically create new widgets, to ensure that I can't design something which restricts programmers to a fixed dataflow graph. Likewise, a program with two widgets sending notifications "sideways" to each other keeps me from trying to rely on hierarchical encapsulation. (This is super tempting, but as far as I can tell, no actual program consistently does this.)</p></li>
<li><p><strong>The language must admit a simple implementation strategy.</strong></p>
<p>This design principle arose as a reaction to a negative lesson.</p>
<p>My first paper on higher-order FRP, <a href="https://www.cl.cam.ac.uk/~nk480/frp-lics11.pdf">Ultrametric Semantics of Reactive Programs</a>, used a system of dynamic, incremental dependency propagation (a baby version of the stuff <a href="http://adapton.org/">Matt Hammer</a> and <a href="http://www.umut-acar.org/publications-by-topic#TOC-Self-Adjusting-Computation">Umut Acar</a> used to do), and it ended up requiring a really weird implementation strategy for closures. In addition to being able to call them, I also needed to be able to move forward them in time, and this required a really weird set of hacks to go into a closure and move everything in its captured environment forward in time as well.</p>
<p>This was painful to implement, difficult to prove correct, and it leaked memory like a sieve. Trying to deal with these issues led directly to <a href="https://www.cl.cam.ac.uk/~nk480/simple-frp.pdf">my ICFP 2013 paper on FRP</a>, which is basically a type system for immediate mode GUI programming.</p>
<p>Note that I don't mind if the implementation strategy is challenging to prove correct (eg, my <a href="https://www.cl.cam.ac.uk/~nk480/transfinite-step-indexing.pdf">POPL 2021 paper with Simon Spies and Derek Dreyer</a> is about inventing some needed machinery for modelling retained-mode GUI implementations), but the actual running code has to be simple.</p>
<p>This means I haven't looked at any dataflow or constraint stuff in years, but sometimes a design rule is there to prevent you from doing certain things.</p></li>
</ol>
Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com0tag:blogger.com,1999:blog-8068466035875589791.post-43668596562236030242021-07-08T15:54:00.002-07:002021-07-08T15:54:58.974-07:00Postdoctoral opening in the TypeFoundry project<p>I have an opening for a
postdoctoral position in Cambridge with the European ERC Consolidator Grant project
TypeFoundry. This is a 2-year position, with the possibility of extension. <br /></p><p>The TypeFoundry project aims to use recent developments in
proof theory and semantics, such as polarized type theory and
call-by-push-value, to identify the theoretical structures underpinning
bidirectional type inference.<br /><br />Informal enquiries are welcome; please contact me at: <a href="mailto:nk480@cl.cam.ac.uk"><nk480@cl.cam.ac.uk></a>.<br /><br />More details and an application form can be found at <<a href="https://www.jobs.cam.ac.uk/job/30485/">https://www.jobs.cam.ac.uk/job/30485/</a>>.<br /><br /></p>Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com0tag:blogger.com,1999:blog-8068466035875589791.post-27851696254136863782021-02-15T14:24:00.000-08:002021-02-15T14:24:41.735-08:00Five (and a Half) Derivatives in Language Theory<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_CHTML-full" type="text/javascript"></script>
<p>Thanks to the influence of machine learning, differentiating programs has become a big business. However, there are a lot of other things in programming language theory which are also derivatives, and people sometimes get confused about the inter-relationships. So I decided to lay out some of them and some of their interconnections.</p>
<h3 id="brzozowski-derivatives">1. Brzozowski Derivatives</h3>
<p>The first kind of derivative is the humble, yet beloved, <a href="https://en.wikipedia.org/wiki/Brzozowski_derivative">Brzozowski derivative</a>. Given a character set <span class="math inline">\(\Sigma\)</span> and a regular language <span class="math inline">\(L \subseteq \Sigma^\ast\)</span>, the Brzozowski derivative of <span class="math inline">\(L\)</span> with respect to the character <span class="math inline">\(c \in \Sigma\)</span> is</p>
<p><span class="math display">\[
\delta_c(L) = \left\{ w \in \Sigma^\ast \;\mid|\; c \cdot w \in L \right\}
\]</span></p>
<p>That is, it's the subset of <span class="math inline">\(L\)</span> prefixed with <span class="math inline">\(c\)</span>, minus the leading character. It can be defined as a transformations on regular expressions <span class="math inline">\(r\)</span> as follows:</p>
<p><span class="math display">\[
\begin{array}{lcll}
\delta_c(0) & = & \bot & \\
\delta_c(r_1 + r_2) & = & \delta_c(r_1) + \delta_c(r_2) & \\
\delta_c(c') & = & \epsilon & \mbox{when } c = c'\\
\delta_c(c') & = & \bot & \mbox{when } c \neq c' \\
\delta_c(\epsilon) & = & \bot & \\
\delta_c(r_1 \cdot r_2) & = & \delta_c(r_1) \cdot r_2 & \mbox{when $r_1$ not nullable} \\
\delta_c(r_1 \cdot r_2) & = & \delta_c(r_1) \cdot r_2 + \delta_c(r_2) & \mbox{when $r_1$ nullable} \\
\delta_c(r \ast) & = & \delta_c(r) \cdot r\ast & \\
\end{array}
\]</span></p>
<p>The Brzozowski derivative is called a derivative for two reasons. First, it is linear with respect to <span class="math inline">\(+\)</span> on regular expressions -- i.e., <span class="math inline">\(\delta_c(r_1 + r_2) = \delta_c(r_1) + \delta_c(r_2)\)</span>. Second, it satisfies the derivative rule for monomials <span class="math inline">\(\delta_c(c^{n+1}) = c^n\)</span>. You may recall that the high-school rule for derivatives says <span class="math inline">\(\delta_c(c^{n+1}) = n \times c^n\)</span>, but recall that in regular expressions, <span class="math inline">\(r + r = r\)</span>, so <span class="math inline">\(n \times r = r\)</span>.</p>
<p>However, this is a weak notion of derivative, which satisfies few of properties with other derivatives. The reason for this is that the rule for products does not match the product rule for high-school derivatives -- if the Brzozowski derivative satisfied the high school rule, it would be</p>
<p><span class="math display">\[
\delta_c(r_1 \cdot r_2) = \delta_c(r_1) \cdot r_2 + r_1 \cdot \delta_c(r_2)
\]</span></p>
<h3 id="differential-algebras">2. Differential algebras</h3>
<p>If we try to change the definition of Brzozowski derivative to match the high school rule, we get what is called a <a href="https://en.wikipedia.org/wiki/Differential_algebra"><em>differential algebra</em></a>. A differential algebra is a semiring <span class="math inline">\(S\)</span>, plus a collection of unary functions <span class="math inline">\(\delta_i : S \to S\)</span> which are called the <em>derivations</em>. The axioms for derivations mirror the rules for differentiating polynomials -- the motivating model for differential algebras is a semiring of polynomials with addition and multiplication of polynomials as the semiring structure, and the derivation is the derivatives with respect to the polynomial's variable. The rules we get are:</p>
<p><span class="math display">\[
\begin{array}{lcll}
\delta_c(0) & = & 0 & \\
\delta_c(r_1 + r_2) & = & \delta_c(r_1) + \delta_c(r_2) & \\
\delta_c(c') & = & \epsilon & \mbox{when } c = c'\\
\delta_c(c') & = & \bot & \mbox{when } c \neq c' \\
\delta_c(\epsilon) & = & 0 & \\
\delta_c(r_1 \cdot r_2) & = & \delta_c(r_1) \cdot r_2 + r_1 \cdot \delta_c(r_2) & \\
\delta_c(r \ast) & = & r\ast \cdot \delta_c(r) \cdot r\ast & \\
\end{array}
\]</span></p>
<p>Note that the only things that changed is the clause for concatenation and the Kleene star. The axioms of a differential algebra actually say nothing about Kleene star, because they are specified for semirings, but to handle full regular languages you need to change the definition so that the differential respects the equation <span class="math inline">\(r \ast = \epsilon + r \cdot r\ast\)</span>. If memory serves, Dexter Kozen has studied this extension. (Even if memory doesn't serve, he's a safe bet...!)</p>
<p>If you look at this definition for a while, you realize that the original Brzozowski derivative removes a single character <span class="math inline">\(c\)</span> from the <em>front</em> of a string, and the differential removes a single <span class="math inline">\(c\)</span> from <em>anywhere</em> in the string.</p>
<p>For regular algebras, this is a peculiar thing to want to do, but it suddenly becomes a lot more interesting when we move from semirings of polynomials to semiring categories of polynomial functors.</p>
<h3 id="derivatives-of-data-types">3. Derivatives of Data Types</h3>
<p>One interesting factoid is that sets "look like" a semiring. The Cartesian product <span class="math inline">\(A \times B\)</span> and the disjoint union <span class="math inline">\(A + B\)</span> obviously don't satisfy the semiring equations, but if you replace the equality symbol with isomorphisms, then they do. If you take this analogy seriously and try categorify the notion of a semiring, we get what is variously called a <a href="https://ncatlab.org/nlab/show/rig+category"><em>rig category</em>, <em>semiring category</em>, or <em>bimonoidal category</em></a>.</p>
<p>Instead of a set plus two binary operations, we have a category with two monoidal structures with the semiring equations becoming isomorphisms. The exact commuting diagrams needed were worked out in the early 70s, independently by Laplaza and Kelly.</p>
<p>One semiring category very useful category for programming is the category of <a href="https://ncatlab.org/nlab/show/polynomial+functor"><em>polynomial functors</em></a>. Most inductive datatypes like lists, binary trees, and so on can be understood as the least fixed point of sum and products. We can formalise this idea by giving a grammar of sums, products and</p>
<p><span class="math display">\[
\begin{array}{lcll}
F,G & ::= & \mathrm{Id} & \\
& | & \underline{0} \\
& | & F \oplus G & \\
& | & \underline{1} \\
& | & F \otimes G & \\
& | & K(A) & \mbox{where } A \in \mathrm{Set} \\
\end{array}
\]</span></p>
<p>Every term of this grammar can be interpreted as a functor:</p>
<p><span class="math display">\[
\newcommand{\interp}[1]{[\![ #1 ]\!]}
\begin{array}{lcll}
\interp{F} & : & Set \to Set \\
\interp{\mathrm{Id}} & = & X \mapsto X \\
\interp{\underline{0}} & = & X \mapsto
\interp{F \oplus G} & = & X \mapsto \interp{F}\,X + \interp{G}\,X \\
\interp{F \otimes G} & = & X \mapsto \interp{F}\,X \times \interp{G}\,X \\
\interp{K(A)} & = & X \mapsto A
\end{array}
\]</span></p>
<p>So you can see that <span class="math inline">\(\oplus\)</span> is interpreted as the coproduct functor, <span class="math inline">\(\otimes\)</span> is the product functor, <span class="math inline">\(K(A)\)</span> is the constant functor, and <span class="math inline">\(\mathrm{Id}\)</span> is interpreted as the identity functor.</p>
<p>The least fixed point of one of these polynomials is an inductive type. So to model lists of type X, we would define the list functor</p>
<p><span class="math display">\[
\mathrm{ListF} \triangleq K(1) \oplus (K(X) \otimes \mathrm{Id})
\]</span></p>
<p>and then take its least fixed point.</p>
<p>Since we have a semiring category, we can ask if it has a categorical analogue of a derivation, to give it differential structure. And these polynomials do! The derivative with respect to the variable <span class="math inline">\(\mathrm{Id}\)</span> can be defined as</p>
<p><span class="math display">\[
\begin{array}{lcll}
\delta(K(A)) & = & 0 & \\
\delta(\underline{0}) & = & 0 & \\
\delta(F \oplus G) & = & \delta(F) \oplus \delta(G) & \\
\delta(\mathrm{Id}) & = & 1 & \\
\delta(\underline{1}) & = & 0 & \\
\delta(F \otimes G) & = & \delta(F) \otimes G \oplus F \otimes \delta(G) & \\
\end{array}
\]</span></p>
<p>If you have a functor for a binary tree:</p>
<p><span class="math display">\[
\mathrm{TreeF} \triangleq \underline{1} \oplus (\mathrm{Id} \otimes \mathrm{Id})
\]</span></p>
<p>Then the differential is: <span class="math display">\[
\delta(\mathrm{TreeF}) = (\underline{1} \otimes \mathrm{Id}) \oplus (\mathrm{Id} \otimes \underline{1})
\]</span></p>
<p>This tells you whether the left or the right position was chosen. The utility of this is that given an element <span class="math inline">\(x \in X\)</span>, and you have an element of <span class="math inline">\(\delta(F)\,X\)</span>, then you can "fill in the gap" to get an element of X. That is, you can define:</p>
<p><span class="math display">\[
\mathit{fill}_F : \delta(F)\,X \times X \to F\,X
\]</span></p>
<p>If you take <span class="math inline">\(X\)</span> to be the type of trees <span class="math inline">\(\mu(\mathrm{TreeF})\)</span>, this gives you an indication of whether to take the left or right subtree, plus the tree you didn't follow. Then a list of these things gives you a path into a tree -- which is just Huet's famous <em>zipper</em>.</p>
<p>The exposition so far largely glosses Conor McBride's famous paper <a href="http://strictlypositive.org/diff.pdf"><em>The Derivative of a Regular Type is the Type of its One-Hole Contexts</em></a>.</p>
<p>The precise connection to linearity and derivatives was explored by Abbot, Altenkirch, Ghani and McBride in their paper <a href=""><em>Derivatives of Containers</em></a>, using significantly more categorical machinery. In their paper, Altenkirch <em>et al</em> explain how the <span class="math inline">\(\mathit{fill}\)</span> operation is really a <em>linear</em> function</p>
<p><span class="math display">\[
\mathit{fill}_F : \delta(F)\,X \otimes X \multimap F\,X
\]</span></p>
<p>and how <span class="math inline">\(\delta(F)\)</span> can be viewed as giving a kind of tangent for <span class="math inline">\(F\)</span> at <span class="math inline">\(X\)</span>.</p>
<h3 id="differential-linear-logic">4. Differential Linear Logic</h3>
<p>The appearance of linear types here should be delightful rather than shocking. We think of the ususal derivative as associating a tangent vector space to each point of an original space, and some of the primordial models of linear type theory are finite-dimensional vector spaces and linear transformations (a model of the multiplicative-additive fragment of classical linear logic, indeed also of the exponentials when the field the vector space is over is finite), and Banach spaces and short maps, a simple model of intuitionistic linear logic including the full exponential.</p>
<p>As a result, we should expect that we should be able to find a way of making differentiation makes sense within linear type theory, and since a type theory is the initial model of a class of models, this means that derivative-like constructions are something we should look for any time linearity arises.</p>
<p>The pure form of this recipe can be found in Erhard and Regnier's <a href="https://arxiv.org/abs/1606.01642">differential linear logic</a>. One of the really cool features of this line of work is that derivatives arise as an extension to the usual structural rules for the derivatives -- you just adjust how to treat variables and derivatives appear like magic. One fact about this line of work is that it has tended to focus on proof nets rather than sequent calculus, which made it less accessible than one would like. What solved this issue for me was Marie Kerjean's <a href="https://lipn.univ-paris13.fr/~kerjean/papers/EDP18.pdf"><em>A Logical Account for Linear Partial Differential Equations</em></a>.</p>
<p>I do want to admit that despite starting off this section saying you shouldn't be shocked, actually I am -- this stuff really seems like sorcery to me. So even though I believe intellectually that everything should work out like this, I'm still astounded that it does. (It's rather like the feeling I get whenever I run a program I did a machine-checked correctness proof of.)</p>
<h3 id="automatic-differentiation">5. Automatic Differentiation</h3>
<p>These days, the most famous kind of differentiation in language theory is, of course, <a href="https://en.wikipedia.org/wiki/Dual_number"><em>automatic differentiation</em></a>, which takes a program and rewrites it to produce a new program computing the derivative of the original program. Automatic differentiation comes in two main varieties, called <em>forward mode</em> and <em>reverse mode</em>.</p>
<p>The idea of forward mode is incredibly simple -- you just treat real numbers as an abstract datatype, and replace your original real number implementation with a new one based on <a href=""><em>dual numbers</em></a>. Intuitively, a dual number is a pair <span class="math inline">\(a + b \epsilon\)</span>, where <span class="math inline">\(\epsilon\)</span> is just a formal infinitesimal which is "nilpotent" (i.e., <span class="math inline">\(\epsilon^2 = 0\)</span>).</p>
<p>Then the data abstraction properties of the typed lambda calculus ensure that when you run the program with the dual number implementation of reals, you are returned a pair of the original value and its partial derivatives (i.e. the Jacobian matrix). <a href="http://blog.sigfpe.com/2005/07/automatic-differentiation.html">Dan Piponi blogged about this over 15 years ago(!)</a>, but even though I read it at the time, the algorithm was so simple I failed to recognise its importance!</p>
<p>Reverse mode is more complicated -- it comes from the observation that in machine learning, you are generally optimising a function <span class="math inline">\(R^n \to R\)</span>, where the result is some scoring function. Since <span class="math inline">\(n\)</span> might be very large and the dimensionality of the result is just 1, you can achieve superior computational complexity by computing the transpose of the Jacobian rather than the Jacobian itself.</p>
<p>When you do so, the resulting syntactic program transformations all look like they are doing fancy continuation manipulations, and their semantic counterparts all look like manipulations of the dual space. So the syntatic continuation/double-negation operations are reflecting the fact that that finite-dimensional vector spaces are isomorphic to their double-duals (i.e., <span class="math inline">\(V^{\ast\ast} \simeq V\)</span>).</p>
<p>In addition to being really neat, this is actually a really nice problem to look at. People basically care about reverse mode AD because it goes fast, which means you get this lovely interplay between operational concerns and semantic models. A second bonus is that AD offers computer scientists a good reason to learn some differential geometry, which in turn is a good excuse to learn general relativity.</p>
<h3 id="incremental-computation">6? Incremental Computation</h3>
<p>After all this talk of differentiation, you might wonder -- what about finite differences?</p>
<p>It turns out that they have a nice interpretation in lambda calculus, too! In 2015, Yufei Cai, Paolo Giarusso, Tillman Rendel and Klaus Ostermann introduced <a href="https://inc-lc.github.io/"><em>the incremental lambda calculus</em></a>.</p>
<p>The basic idea behind this is that for each type <span class="math inline">\(X\)</span>, you introduce a type of "changes" <span class="math inline">\(\Delta X\)</span> for it too. A morphism between <span class="math inline">\((X, \Delta X)\)</span> and <span class="math inline">\((Y, \Delta Y)\)</span> is a pair of functions <span class="math inline">\(f : X \to Y\)</span> and <span class="math inline">\(\mathit{df} : X \to \Delta X \to \Delta Y\)</span>, where <span class="math inline">\(\mathit{df}\)</span> is the incrementalization -- <span class="math inline">\(df\,x\,\mathit{dx}\)</span> tells you how the to change the output of <span class="math inline">\(f\)</span>,x to account for the change <span class="math inline">\(\mathit{dx}\)</span> to the input.</p>
<p>This looks a lot like what you get out of the tangent space functor, but it turns out that the incremental lambda calculus is <em>not</em> simply a derivative a la the differential lambda calculus, because the derivative of a function is unique, and these incrementalisations don't have to be. Michael Arntzenius and I used the ILC <a href="https://www.cl.cam.ac.uk/~nk480/seminaive-datafun.pdf">in our paper on extending seminaive evaluation from Datalog to a higher-order language</a>, and it was critical for the efficiency of our result that we had this freedom.</p>
<p>Mario Picallo-Alvarez worked out answers to a lot of the semantics questions underlying this in <a href="https://arxiv.org/abs/2002.05256">his PhD thesis, <em>Change actions: from incremental computation to discrete derivatives</em></a>, showing how incremental lambda calculus arises as a generalisation of the models of the differential lambda calculus.</p>
<p>He noted that the "real" semantics for change types should look like a category, in which a type is a category where the objects are values, and a morphisms <span class="math inline">\(\delta : v_1 \to v_2\)</span> is evidence that <span class="math inline">\(v_1\)</span> can be changed into <span class="math inline">\(v_2\)</span>. On the grounds that 2-categories are painful, Mario worked with change <em>monoids</em> (i.e., no typing, but yes to identities and composition), and the Giarusso-style change structures that Michael and I used retained the typed changes, but discarded all the categorical axioms. So there's more to be done here!</p>
Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com1tag:blogger.com,1999:blog-8068466035875589791.post-52808302031203874662020-12-14T05:30:00.002-08:002020-12-14T05:30:42.470-08:00TypeFoundry: new ERC Consolidator Grant<p>I am very pleased to have <a href="https://erc.europa.eu/news/CoG-recipients-2020">received an ERC Consolidator Grant</a> for my TypeFoundry proposal.
<p>This will be a five year project to develop the foundations of bidirectional type inference. If you are interested in pursuing a PhD in this area, or conversely, are finishing a PhD in this area, please get in touch!
<blockquote>
<p> Many modern programming languages, whether developed in industry, like Rust or Java, or in academia, like Haskell or Scala, are typed. All the data in a program is classified by its type (e.g., as strings or integers), and at compile-time programs are checked for consistent usage of types, in a process called type checking. Thus, the expression <code>3 + 4</code> will be accepted, since the + operator takes two numbers as arguments, but the expression <code>3 + "hello"</code> will be rejected, as it makes no sense to add a number and a string. Though this is a simple idea, sophisticated type system can track properties like algorithmic complexity, data-race freedom, differential privacy, and data abstraction.</p>
<p>In general, programmers must annotate programs to tell compilers the types to check. In theoretical calculi, it is easy to demand enough annotations to trivialize typechecking, but this can make the annotation burden unbearable: often larger than the program itself! So, to transfer results from formal calculi to new programming languages, we need type inference algorithms, which reconstruct missing data from partially-annotated programs.</p>
<p>However, the practice of type inference has outpaced its theory. Compiler authors have implemented many type inference systems, but the algorithms are often ad-hoc or folklore, and the specifications they are meant to meet are informal or nonexistent. The makes it hard to learn how to implement type inference, hard to build alternative implementations (whether for new compilers or analysis engines for IDEs), and hard for programmers to predict if refactorings will preserve typability.</p>
<p>In <b>TypeFoundry</b>, we will use recent developments in proof theory and semantics (like polarized type theory and call-by-push-value) to identify the theoretical structure underpinning type inference, and use this theory to build a collection of techniques for type inference capable of scaling up to the advanced type system features in both modern and future languages. </p>
</blockquote>
<p> One of the things that makes me happy about this (beyond the obvious benefits of research funding and the recognition from my peers) is that it shows off the international character of science. I'm an Indian-American researcher, working in the UK, and being judged and funded by researchers in Europe. There has been a truly worrying amount of chauvinism and jingoism in public life recently, and the reminder that cosmopolitanism and universalism are real things too is very welcome.
<p>Also, if you are planning on submitting a Starting Grant or Consolidator proposal to the ERC in the coming year about programming languages, verification or the like, please feel free to get in touch, and I'll be happy to share advice. Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com4tag:blogger.com,1999:blog-8068466035875589791.post-80031034353817676972020-12-02T14:05:00.002-08:002020-12-02T14:06:42.197-08:00Church Encodings, Inductive Types, and Relational Parametricity<p>My blogging has been limited this past year due to RSI, but I do not want to leave things entirely fallow, and last year I wrote an email which can be edited into a decent enough blog post.</p>
<p>Quite often, people will hear that System F, the polymorphic lambda calculus, satisfies a property called <em>relational parametricity</em>. We also often hear people say that the parametricity property of System F lets you prove that a Church encoding of an inductive datatypes actually satisfies all the equational properties we expect of inductive types.</p>
<p>But it's surprisingly hard to find an explicit account of how you go from the basic properties of parametricy (the relational interpretation of System F and the identity extension principle) to the proof that inductive types are definable. I learned how this works from Bob Atkey's paper <a href="https://bentnib.org/fomega-parametricity.html"><em>Relational Parametricity for Higher Kinds</em></a>, but the proofs are kind of terse, since his page count was mostly focused on the new stuff he had invented.</p>
<p>In the sequel, I'm going to assume that you know what the relational model of system F looks like, that the term calculus satisfies <em>the abstraction theorem</em> (i.e., the fundamental theorem of logical relations), and also that the model satisfies the <em>identity extension property</em> -- if you take the relational interpretation of a type <code>B(α)</code>, and fill in the type variable <code>α</code> with the equality relation for the type <code>A</code>, then the relational interpretation of <code>B[A/α]</code> will the equality relation for the type <code>B[A/α]</code>. In what follows I will often write <code>Id[X]</code> to mean the equality relation for the type <code>X</code>.</p>
<p>For completeness' sake, I also give a quick summary of the model at the end of the post.</p>
<h2 id="haskell-style-functors-in-system-f">Haskell-style functors in System F</h2>
<p>Given all this preface, we now define a <em>functor</em> <code>F</code> as a type expression <code>F(-)</code> with a hole, along with an operation <code>fmap : ∀a,b. (a → b) → F(a) → F(b)</code>, such that</p>
<pre><code>fmap _ _ id = id
fmap _ _ f ∘ fmap _ _ g = fmap _ _ (f ∘ g)</code></pre>
<p>I've written _ to indicate System F type arguments I'm suppressing. Basically think of these as Haskell-style functors.</p>
<p>Next, we can define the inductive type <code>μF</code> using the usual Church encoding as:</p>
<pre><code>μF = ∀a. (F(a) → a) → a
foldF : ∀a. (F(a) → a) → μF → a
foldf = Λa λf : F(a) → a. λx:μF. x [a] f
inF : F(μF) → μF
inF x = Λa. λf : F(a) → a.
let g : μF → a = fold [μF] [a] f in
let h : F(μF) → F(a) = fmap [μF] [a] g in
let v : F(a) = h x in
f v</code></pre>
<p>I've written out <code>inF</code> with type-annotated local bindings to make it easier to read. If you inlined all the local bindings and suppressed type annotations, then it would read:</p>
<pre><code>inF : F(μF) → μF
inF x = Λa. λf : F(a) → a.
f (fmap (fold f) x)</code></pre>
<h2 id="f-algebra-homomorphisms-and-the-graph-lemma">F-algebra Homomorphisms and the Graph Lemma</h2>
<p>Now, we can prove a lemma about F-algebra homomorphisms, which is the tool we will use to prove initiality. But what are F-algebra homomorphisms?</p>
<p>An <em>F-algebra</em> is a pair <code>(X, g : F(X) → X)</code>. An <em>F-algebra homomorphism</em> between two F-algebras <code>(X, k : F(X) → X)</code> and <code>(Y, k' : F(Y) → Y)</code> is a function <code>f : X → Y</code> such that the following ASCII-art diagram commutes:</p>
<pre><code> F(X) — k → X
| |
fmap X Y f f
↓ ↓
F(Y) — k' → Y</code></pre>
<p>That is, for all <code>u : F(X)</code>, we want <code>f(k u) = k'(fmap [X] [Y] f u)</code></p>
<p>Before we prove something about homomorphisms, we need to prove a technical lemma called the <em>graph lemma</em>, which will let us connect parametricity with our functorial definitions.</p>
<p><strong>Graph Lemma</strong>: Let <code>(F, fmap)</code> be a functor, and <code>h : A → B</code> be a function. Define the <em>graph relation</em> <code><h></code> to be the relation <code>{(a,b) | b = f(a) }</code>. Then <code>F<h> ⊆ <fmap [A] [B] h></code>.</p>
<p>Proof:</p>
<ol style="list-style-type: decimal">
<li>First, note that <code>fmap</code> has the type <code>∀a b. (a → b) → F(a) → F(b)</code>.</li>
<li><p>Since <code>fmap</code> is parametric, we can choose the relations <code><h></code> and the identity to instantiate <code>F</code> with, giving us:</p>
<pre><code>(fmap [A] [B], fmap [B] [B]) ∈ (<h> → Id[B]) → (F<h> → Id<B>)</code></pre></li>
<li>Note that <code>(h, id) ∈ <h> → Id[B]</code>.</li>
<li>Hence <code>(fmap [A] [B] h, fmap [B] [B] id) ∈ (F<h> → Id<B>)</code></li>
<li>By functoriality, <code>(fmap [A] [B] h, id) ∈ (F<h> → Id<B>)</code></li>
<li>Assume <code>(x, y) ∈ F<h></code>.
<ol style="list-style-type: decimal">
<li>Hence <code>(fmap [A] [B] h x, id y) ∈ Id<B>)</code></li>
<li>So <code>fmap [A] [B] h x = y</code>.</li>
</ol></li>
<li>Therefore <code>(x, y) ∈ <fmap [A] [B] h></code>.</li>
<li><p>Therefore <code>F<h> ⊆ <fmap [A] [B] h></code>.</p></li>
</ol>
<p>Graph relations are just functions viewed as relations, and the graph lemma tells us that the relational semantics of a type constructor applied to a graph relation <code><h></code> will behave like the implementation of the <code>fmap</code> term. In other words, it connects the relational semantics to the code implementing functoriality. (As an aside, this feels like it should be an equality, but I only see how to prove the inclusion.)</p>
<p>We use this lemma in the proof of the homomorphism lemma, which we state below:</p>
<p><strong>Homomorphism Lemma</strong>: Let <code>(F, fmap)</code> be a functor. Given two F-algebras <code>(A, k : F(A) → A)</code> and <code>(B, k' : F(B) → B)</code>, and an F-algebra homomorphism <code>h : (A,k) → (B,k')</code>, then for all <code>e : μF</code>, we have</p>
<pre><code>e [B] k' = h (e [A] k)</code></pre>
<p>Proof:</p>
<ol style="list-style-type: decimal">
<li><p>First, by the parametricity of <code>e</code>, we know that</p>
<pre><code> (e [A], e [B]) ∈ (F<h> → <h>) → <h></code></pre></li>
<li>We want to apply the arguments <code>(k, k')</code> to <code>(e [A], e [B])</code>, so we have to show that <code>(k, k') ∈ F<h> → <h></code>.</li>
<li><p>To show this, assume that <code>(x, y) ∈ F<h></code>.</p>
<ol style="list-style-type: decimal">
<li>Now we have to show that <code>(k x, k' y) ∈ <h></code>.</li>
<li>Unfolding the definition of <code><h></code>, we need <code>(h (k x), k' y) ∈ Id[B]</code>.</li>
<li>Since <code>h</code> is an F-algebra homomorphism, we have that <code>h (k x) = k' (fmap [A] [B] h x)</code>.</li>
<li>So we need to show <code>(k' (fmap A B h x), k' y) ∈ Id[B]</code>.</li>
<li>Now, we know that <code>(x, y) ∈ F<h></code>.</li>
<li>By the graph lemma, we know <code>F<h> ⊆ <fmap [A] [B] h></code>.</li>
<li>So <code>(x, y) ∈ <fmap [A] [B] h></code>.</li>
<li>Unfolding the definition, we know <code>y = fmap [A] [B] h x</code>.</li>
<li>So we want <code>(k' (fmap [A] [B] h x), k' (fmap [A] [B] h x)) ∈ Id[B]</code>.</li>
<li>Since <code>Id[B]</code> is an equality relation, this is immediate.</li>
</ol></li>
<li>Hence <code>(k, k') ∈ F<h> → <h></code>.</li>
<li>Therefore <code>(e [A] k, e [B] k') ∈ <h></code>.</li>
<li><p>Unfolding the definition of <code><h></code>, we know <code>e [B] k' = h(e [A] k)</code>. This is what we wanted to show.</p></li>
</ol>
<h3 id="discussion">Discussion:</h3>
<p>The whole machinery of F-algebra homomorphisms basically exists to phrase the commuting conversions in a nice way. We just proved that for <code>e : μF</code>, we have</p>
<pre><code> e [B] k' = h (e [A] k)</code></pre>
<p>Recall that for Church encoded inductive types, the fold is basically the identity, so this result is equivalent (up to beta) to:</p>
<pre><code> foldF [B] k' e = h (foldF [A] k' e)</code></pre>
<p>So this lets us shifts contexts out of iterators if they are F-algebra homomorphisms. Note that this proof was also the one where the graph lemma is actually used.</p>
<h2 id="the-beta-and-eta-rules-for-inductive-types">The Beta and Eta Rules for Inductive Types</h2>
<p>Next, we'll prove the beta- and eta-rules for inductive types. I'll do it in three steps:</p>
<ul>
<li>The beta rule (eg, <code>let (x,y) = (e1, e2) in e' == [e1/x, e2/y]e')</code>,</li>
<li>The basic eta rule (eg, <code>let (x,y) = e in (x,y) == e)</code></li>
<li>The commuting conversions (eg, <code>C[e] = let (x,y) = e in C[(x,y)])</code></li>
</ul>
<p><strong>Theorem (beta rule)</strong>: If <code>k : F(A) → A</code> and <code>e : F(μF)</code>, then</p>
<pre><code> foldF [A] k (inF e) = k (fmap [μF] [A] (foldF [A] k) e)</code></pre>
<p>Proof: This follows by unfolding the definitions and beta-reducing. (You don't even need parametricity for this part!)</p>
<p>This shows that for any F-algebra <code>(A, k), foldF [A] k</code> is an F-algebra homomorphism from <code>(μF, inF)</code> to <code>(A, k)</code>.</p>
<p><strong>Theorem (basic eta)</strong> For all <code>e : μF</code>, we have <code>e = e [μF] inF</code></p>
<p>Proof:</p>
<ol style="list-style-type: decimal">
<li>Assume an arbitrary <code>B</code> and <code>k : F(B) → B</code>.
<ol style="list-style-type: decimal">
<li>Note <code>h = foldF [B] k</code> is an F-algebra homomorphism from <code>(μF, inF)</code> to <code>(B, k)</code> by the beta rule.</li>
<li>Then by our lemma, <code>e [B] k = h (e [μF] inF)</code>.</li>
<li>Unfolding, <code>h (e [μF] inF) = foldF [B] k (e [μF] inF)</code>.</li>
<li>Unfolding, <code>foldF [B] k (e [μF] inF) = e [μF] inF [B] k</code>.</li>
<li>So <code>e [B] k = e [μF] inF [B] k</code>.</li>
</ol></li>
<li>So for all <code>B</code> and <code>k : F(B) → B</code>, we have <code>e [B] k = e [μF] inF [B] k</code>.</li>
<li>By extensionality, <code>e = e [μF] inF</code>.</li>
</ol>
<p><strong>Theorem (commuting conversions)</strong>: If <code>k : F(A) → A</code> and <code>f : μF → A</code> and <code>f</code> is an F-algebra homomorphism from <code>(μF, inF)</code> to <code>(A, k)</code>, then <code>f = foldF [A] k</code>.</p>
<p>Proof:</p>
<ol style="list-style-type: decimal">
<li>We want to show <code>f = foldF [A] k</code>.</li>
<li>Unfolding <code>foldF</code>, we want to show <code>f = λx:μF. x [A] k</code></li>
<li>Assume <code>e : μF</code>.
<ol style="list-style-type: decimal">
<li>Now we will show f e = e [A] k.</li>
<li>By the homomorphism lemma, and the fact that <code>f : (μF, inF) → (A, k)</code> we know that <code>e [A] k = f (e [μF] inF)</code>.</li>
<li>By the basic eta rule, <code>e [A] k = f e</code></li>
</ol></li>
<li>So for all <code>e</code>, we have <code>f e = e [A] k</code>.</li>
<li>Hence by extensionality <code>f = λx:μF. x [A] k</code>.</li>
</ol>
<p>Note that what I (as a type theorist) called "commuting conversions" is exactly the same as "initiality" (to a category theorist), so now we have shown that the Church encoding of a functor actually lets us define the initial algebra.</p>
<p>Thus, we know that inductive types are definable!</p>
<h2 id="appendix-the-relational-model-of-system-f">Appendix: the Relational Model of System F</h2>
<p>In this appendix, I'll quickly sketch one particular model of System F, basically inspired from the PER model of Longo and Moggi. What we will do is to define types as <em>partial equivalence relations</em> (PERs for short) on terms. Recall that a partial equivalence relation <code>A</code> on a set <code>X</code> is a relation on <code>X</code>, which is (a) symmetric (i.e., if <code>(x, x') ∈ R</code> then <code>(x',x) ∈ R</code>), and (b) transitive (if <code>(x,y) ∈ X</code> and <code>(y,z) ∈ R</code> then <code>(x,z) ∈ R</code>).</p>
<p>We take the set of semantic types to be the set of PERs on terms which are closed under evalation in either direction:</p>
<pre><code>Type = { X ∈ PER(Term) | ∀x, x', y, y'.
if x ↔∗ x' and y ↔∗ y'
then (x,y) ∈ X ⇔ (x',y') ∈ X}</code></pre>
<p>PERs form a complete lattice, with the meet given by intersection, which is the property which lets us interpret the universal quantifier of System F. We can then take a <em>type environment</em> <code>θ</code> to be a map from variables to semantic types, and interpret types as follows:</p>
<pre><code>〚a〛θ ≜ θ(α)
〚A → B〛θ ≜ { (e, e') ∈ Term × Term | ∀ (t,t') ∈ 〚A〛θ.
(e t,e' t') ∈ 〚B〛θ }
〚∀a. A〛θ ≜ ⋂_{X ∈ Type} 〚A〛(θ, X/α) </code></pre>
<p>This is the <em>PER model of System F</em>, which is quite easy to work with but not quite strong enough to model parametricity. To model parametricity, we need a <em>second</em> semantics for System F, the so-called relational semantics.</p>
<p>A relation <code>R</code> between two PERs <code>A</code> and <code>B</code> over a set <code>X</code>, is a a relation on <code>X</code> which respects <code>A</code> and <code>B</code>. Suppose that <code>(a,a') ∈ A</code> and <code>(b,b') ∈ B</code>. Then, a relation <code>R</code> respects <code>A</code> and <code>B</code> when <code>(a, b) ∈ R</code> if and only if <code>(a', b') ∈ R</code>.</p>
<p>Now, suppose we have two environments <code>θ₁</code> and <code>θ₂</code> sending type variables to types, and a relation environment <code>ρ</code> that sends each type variable <code>a</code> to a relation respecting <code>θ₁(a)</code> and <code>θ₂(a)</code>. Then we can define the <em>relational interpretation</em> of System F types as follows:</p>
<pre><code>R〚a〛θ₁ θ₂ ρ ≜ θ(α)
R〚A → B〛θ ≜ { (e, e') ∈ Term × Term | ∀ (t,t') ∈ R〚A〛θ₁ θ₂ ρ.
(e t,e' t') ∈ R〚B〛θ₁ θ₂ ρ }
R〚∀a. A〛θ ≜ ⋂_{X, Y ∈ Type, R ∈ Rel(X,Y)}
R〚A〛(θ₁, X/a) (θ₂, Y/a) (ρ, R/a)</code></pre>
<p>Additionally, we need to redefine the PER model's interpretation of the <code>∀a.A</code> case, to use the relational model:</p>
<pre><code>〚∀a. A〛θ ≜ ⋂_{X, Y ∈ Type, R ∈ Rel(X,Y)}
R〚A〛θ θ (Id[θ], R/a)</code></pre>
<p>Here, to give a PER interpetation for the forall type, we use the relational interpretation, duplicating the type environment and using the identity relation for each variable (written <code>Id(θ)</code>). By identity relation, we mean that given a PER <code>A</code>, the PER <code>A</code> is a relation between between <code>A</code> and <code>A</code> which respects <code>A</code>.</p>
<p>This modified definition ensures the model satisfies the <em>identity extension property</em> -- if you take the relational interpretation of a type <code>B(α)</code>, and fill in the type variable <code>α</code> with the equality relation for the type <code>A</code>, then the relational interpretation of <code>B[A/α]</code> will the equality relation for the type <code>B[A/α]</code>. In what follows I will often write <code>Id[X]</code> to mean the equality relation for the type <code>X</code>.</p>
<p>The term calculus for System F also satisfies <em>the abstraction property</em>, (aka the fundamental property of logical relations), which says that given a well typed term <code>Θ; · ⊢ e : A</code> and two type environments <code>θ₁</code> and <code>θ₂</code>, and a relation environment <code>ρ</code> between them, then <code>e</code> is related to itself in <code>R〚A〛θ θ ρ</code>.</p>
Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com1tag:blogger.com,1999:blog-8068466035875589791.post-39340577551217828612020-06-19T14:51:00.002-07:002020-06-19T14:51:40.674-07:00PLDI 2020 Conference Report<p>I just finished "attending" <a href="https://conf.researchr.org/home/pldi-2020">PLDI 2020</a>, a programming languages conference. Like many conferences in computer science, due to COVID-19, on short notice the physical conference had to be cancelled and replaced with an online virtual conference. Talks were streamed to Youtube, questions were posted to a custom Slack channel, there was a startling variety of online chat applications to hold discussions with, and so on.</p>
<p>It was obviously an enormous amount of work to put together (there was even a custom chat application, <a href="https://www.clowdr.org">Clowdr</a>, written specifically for SIGPLAN(?) conferences), which makes me feel very unkind to report that for me, the online conference experience was a complete waste of time.</p>
<p>So, what went wrong?</p>
<p>To understand this, it is worth thinking about what the purpose of a conference is. The fundamental purpose of a conference is to meet people with shared interests and talk to them. The act of talking to people is fundamental, since it is how (a) you get to see new perspectives about subjects you are interested in, and (b) how you build the social networks that make it possible for you to become and remain an expert. (Recall the 19th century economist Alfred Marshall's description of this process: "The mysteries of the trade become no mysteries; but are as it were in the air, and children learn many of them unconsciously.”)</p>
<p>Even talks -- the things we build conferences around -- exist to facilitate this process of conversation. Talks at conferences really have two important functions: first, the topic of a talk is a filtering device, which helps identify the subset of the audience which is interested in this topic. This means that in the break after the talk, it is now much easier to find people to talk to who share interests with you.</p>
<p>Second, talks supply Schelling-style focal points: you and your interlocutor have common knowledge that you are both interested in the topic of the session, and you both also know that you saw the talk, which gives you a subject of conversation. (Note: as a speaker, you do communicate with your audience, but indirectly, as your talk becomes the seed of a conversation between audience members, which they will use to develop their own understandings.)</p>
<p>The fundamental problem with PLDI 2020 was that it was frustratingly hard to <em>actually talk to people</em>. The proliferation of timezones and chat applications meant that I found it incredibly difficult to actually find someone to talk to -- at least one of the apps (gather.town) just never worked for me, sending me into an infinite sign-up loop, and the others were either totally empty of people who were actually there, or did not seem suited for conversation. (One key issue is that many people log in to an app, and then stop paying attention to it, which makes presence indications useless.)</p>
<p>So if social distancing and travel restrictions due to COVID-19 remain in force (as seems likely), I think it would be better to simply convert our PL conferences fully into journals, and look for other ways to make online networking happen. The sheer amount of labour going into PLDI 2020 supplies strong evidence that simply trying to replicate a physical conference online cannot be made to work with any amount of effort.</p>
<p>However, before I convince everyone that I am totally right about everything, there are still online conferences that will happen -- for example, ICFP in August. So to make sure that I can look forward to the experience rather than dreading it, I will need to make a plan to actually make sure I talk to people.</p>
<p>So, if you are reading this, and are (or know) a researcher in PL who would like to talk, then give me a ping and we can arrange something for ICFP.</p>
<ul>
<li><p>I am explicitly happy to talk to graduate students and postdocs about their research.</p></li>
<li><p>I know the most about dependent types, linear and modal types, type inference, separation logic, and program verification.</p></li>
<li><p>I know less about (but still like talking about) compiler stuff, theorem proving, SMT, Datalog, macros, and parsing.</p></li>
<li><p>This list is not an exhaustive list of things I'm interested in. One of the nice things about conversations is that you can use shared touchstones to discover how to like new things.</p></li>
</ul>
<p>Once you get in touch, I'll put you on a list, and then once the conference talks are listed, we can organize a time to have a chat after we have all seen a talk we are interested in. (Having seen a talk means we will all have shared common knowledge fresh in our minds.) Assuming enough people are interested, I will aim for meetings of 3-5 people, including myself.</p>
Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com3tag:blogger.com,1999:blog-8068466035875589791.post-11869787107565173312020-02-13T14:56:00.000-08:002020-02-13T14:56:04.438-08:00Thought Experiment: An Introductory Compilers Class<p>Recently, I read a blog post in which Ben Karel <a href="https://eschew.wordpress.com/2020/01/26/undergrad-compilers-from-the-hive-mind/">summarized the reaction to a request John Regehr made about how to teach compilers</a>, and as one might predict, the Internet was in agreement that the answer was "absolutely everything". Basically, everyone has a different perspective on what the most important thing is, and so the union of everyone's most important thing is everything.</p>
<p>In fact, from a certain point of view, I don't disagree. From parsing to typechecking to intermediate representations to dataflow analysis to register allocation to instruction selection to data representations to stack layouts to garbage collectors to ABIs to debuggers, basically everything in compilers is packed to the gills with absolutely gorgeous algorithms and proofs. The sheer density of wonderful stuff in compilers is just out of this world.</p>
<p>However, John specified an <em>introductory</em> course. Alas, this makes "everything" the wrong answer -- he is asking for a pedagody-first answer which fits coherently into a smallish number of lectures. This means that we have to start with the question of what we want the students to learn, and then build the curriculum around that.</p>
<h1 id="the-goal-of-teaching-compilers">The Goal of Teaching Compilers</h1>
<p>So, what should students have learned after taking a compilers class?</p>
<p>The obvious (but wrong) answer is "how to write a compiler", because in all likelihood they will forget most of the techniques for implementing compilers soon after the course ends. This is because most of them – unless they catch the compiler bug – will not write very many more compilers. So if we want them to come away from a compilers class with some lasting knowledge, we have to think in terms of deeper programming techniques which (a) arise naturally when writing compilers, but (b) apply in more contexts than just compilers.</p>
<p>There are two obvious candidates for such techniques:</p>
<ol style="list-style-type: decimal">
<li>Computing values by recursion over inductively-defined structures.</li>
<li>Computing values least fixed points of monotone functions by bottom-up iteration.</li>
</ol>
<p>Both of these are fundamental algorithmic techniques. However, I think that recursion over inductive structure is much more likely to be taught outside of a compilers course, especially considering that modern functional programming (i.e., datatypes and pattern matching) is now much more often taught elsewhere in the curriculum than it used to be.</p>
<p>This leaves fixed-point computations.</p>
<p>So let's choose the goal of an intro compilers class to be teaching fixed point computations many times over, in many different contexts, with the idea that the general technique of fixed point computation can be learned via generalizing from many specific examples. The fact that they are building a compiler is significant primarily because it means that the different examples can all be seen in a motivating context.</p>
<p>So, this drives some choices for the design of the course.</p>
<ol style="list-style-type: decimal">
<li><p>Our goal is to show how fixed point computations arise in many different contexts, and the stages of the compilation pipeline naturally supply a non-contrived collection of very different-looking contexts. Consequently, we should design the course in a full front-to-back style, covering all the phases of the compilation pipeline.</p>
<p>However, this style of course has the real weakness that it can result in shallow coverage of each topic. Mitigating this weakness drives the design of the rest of the course.</p></li>
<li><p>In particular, the compiler structure should be very opinionated.</p>
<p>The thing we <em>want</em> to do is to present a variety of options (LR versus LL parsing, graph colouring versus linear scan register allocation, CPS versus SSA for IRs, etc), and then present their strengths and weaknesses, and the engineering and design considerations which will lead you to favour one choice over the other.</p>
<p>But for this course, we won't do any of that. As an instructor, we will simply choose the algorithm, and in particular choose the one that most benefits from fixed point computations.</p>
<p>The existence of alternatives should be gestured at in lecture, but the student-facing explanation for why we are not exploring them is for their first compiler we are aiming for good choices but biased much more towards implementation simplicity than gcc or LLVM will. (This will be an easier pitch if there is an advanced compilers course, or if students have a final year project where they can write a fancy compiler.)</p></li>
<li><p>We will also need to bite the bullet and de-emphasize the aspects of compilation where fixed point computations are less relevant.</p>
<p>Therefore, we will not cover runtime systems and data structure layouts in any great detail. This substantially affects the design of the language to compile -- in particular, we should not choose a language that has closures or objects. Furthemore, we will just tell students what stack layout to use, and memory management wil be garbage collection via the Boehm gc.</p></li>
</ol>
<h1 id="coursecompiler-structure">Course/Compiler Structure</h1>
<p>For the rest of this note, I'll call the language to compile Introl, because I always liked the Algol naming convention. Introl is a first-order functional language, basically what you get if you removed nested functions, lambda expressions, references, and exceptions from OCaml. I've attached a sketch of this language at the end of the post.</p>
<p>This language has two features which are "hard" -- polymorphism and datatypes. The reason for the inclusion of polymorphism is that it makes formulating type inference as a fixed point problem interesting, and the reason datatypes exist is because (a) match statements offer interesting control flow, and (b) they really show off what sparse conditional constrant propagation can do.</p>
<p>So the course will then follow the top-down lexing to code generation approach that so many bemoan, but which (in the context of our goals) is actually totally appropriate.</p>
<h2 id="lexing-and-parsing">Lexing and Parsing</h2>
<p>For lexing, I would start with the usual regular expressions and NFAs thing, but then take a bit of a left turn. First, I would show them that state set sizes could explode, and then introduce them to <a href="https://www.sciencedirect.com/science/article/pii/S0890540197926882">Brüggeman-Klein and Derick Wood's <em>deterministic regular expressions</em></a> as a way of preventing this explosion.</p>
<p>The reason for this is that the conditions essentially check whether a regular expression is parseable by recursive descent without backtracking -- i.e., you calculate NULLABLE, FIRST and (a variant of) FOLLOW sets for the regular expression. This lets you explain what these sets mean in a context without recursion or fixed points, which makes it easy to transition to LL(1) grammars, which are fundamentally just deterministic regular languages plus recursion.</p>
<p>So then the calculation of these sets as a fixed point equation is very easy, and using the deterministic regular languages means that the explanation of what these sets mean can be decoupled from how to compute via a fixed point computation.</p>
<p>Naturally, this means that the grammar of Introl must be LL(1).</p>
<h2 id="typechecking">Typechecking</h2>
<p>Typechecking for this kind of language is pretty routine, but in this case it should be phrased as an abstract interpretation, in the style of Cousot's <a href="https://www.irif.fr/~mellies/mpri/mpri-ens/articles/cousot-types-as-abstract-interpretations.pdf"><em>Types as Abstract Interpretation</em></a>.</p>
<p>The interesting thing here is that polymorphism can be presented via what Cousot called "the Herbrand abstraction". The idea is that the abstract elements are monotypes with unification variables in them, with the partial order that <code>t₂ ≥ t₁</code> if there is a substitution <code>σ</code> such that <code>t₂ = σ(t₁)</code>, and the unification algorithm itself as an <em>attempt</em> to calculate the substitution witnessing the join of two terms. I say attempt, since unification can fail. So this is a kind of <em>partial</em> join operation -- in the case you cannot join the two terms, there must have been a type error!</p>
<p>In Introl as presented, top-level functions have a type annotation, and so it will work out that you end up not needing to do a serious fixed point computation to infer types. Indeed, even if you omitted annotations, the fact that unification is calculating a most general unifier means that the fixed point iteration for recursive functions terminates in one iteration. (This should not be too surprising since the Dama-Milner algorithm only needs one traversal of the syntax.)</p>
<p>This fact is worth working out, because a great deal of life in static analysis involves trying to find excuses to iterate less. Indeed, this is what motivates the move to SSA – which will be the very next phase of the compiler.</p>
<h2 id="conversion-to-ssacps">Conversion to SSA/CPS</h2>
<p>The choice of IR is always a fraught one in a compiler class. In this course, I would use a static single assignment (SSA) representation. SSA is a good choice because (a) it simplifies implementing all kinds of dataflow optimizations, (b) generating it also needs dataflow analyses, and (c) it is the IR of choice for serious compilers. (a) and (b) mean we get to do lots of fixed point computations, and (c) means it won't feel artificial to today's yoof.</p>
<p>However, I wouldn't go directly to SSA, since I find φ-functions very difficult to explain directly. IMO, it is worth pulling a trick out of <a href="https://www.cs.purdue.edu/homes/suresh/502-Fall2008/papers/kelsey-ssa-cps.pdf">Richard Kelsey's hat</a>, and exploiting the correspondence between continuation-passing style (CPS), and static single assignment form (SSA).</p>
<p>CPS often comes with all sorts of mysticism and higher-order functions, but in this case, it works out very simply. Basically, we can let-normalize the program, and then transform the surface language into <em>basic blocks with arguments</em> (or if you prefer, a bunch of functions tail-calling each other). This is the version of SSA that the MLton compiler used, and which (if memory serves) the Swift SIL interemediate representation uses as well.</p>
<p>Roughly speaking, each basic block in the program will end up have zero or more formal arguments, and jumps to that block have arguments which fill in the arguments. This ends up making it super easy to explain, because the program just goes into let-normal form, and all tail calls get compiled to jumps-with-arguments, and non-tail calls use a call IR instruction.</p>
<p>Now, if we do this maximally naively – which we <strong>absolutely</strong> want to do – we will get nearly pessimal SSA, since each basic block will take as arguments all the variables in its scope. The reason we do this is to make the need for SSA minimization obvious: we just want to shrink each block's parameter lists so it doesn't take parameters for variables which either don't vary or don't get used.</p>
<p>Doing this will take us to something close to "pruned SSA" form, which would normally be overkill for a first compiler, except that we want to use it to motivate the computation of the dominator tree. Because of the either-or nature of the shrinking, we can do this with two analyses, a liveness analysis and a constancy analysis.</p>
<p>Both of these are dataflow analyses, and we can show how to use the dominator tree to order the work to calculate the calculate the fixed point faster. This will justify doing the fixed point computation to calculate the dominance information we need.</p>
<p>There is a bit of redundancy here, which is deliberate – I think it's important to have done two analyses before calculating dominators, because doing one fixed point computation to speed up one fixed point computation may feel artificial. But doing one computation that speeds up two computations is easy to see the benefit of, especially if we phrase the fixed point computation as taking the transfer function plus the dominance information.</p>
<p>I would avoid using one of the fast dominance algorithms, because pedagogically it's easier to explain the calculation when it is close to the definition.</p>
<p>The other nice thing here is that typechecking got accelerated by a good choice of abstract domain, and flow analysis gets accelerated by a good choice of iteration order.</p>
<p>Once the students have some experience manipulating this representation, I would probably switch to the traditional SSA form. The main benefit of this is just teaching the ability to read the existing literature more.</p>
<h2 id="high-level-optimizations">High-level Optimizations</h2>
<p>One thing worth spelling out is that this language (like MLton's SSA) has a high-level SSA representation – switches, data constructors, and field selectors and things like that will all be part of the SSA IR. This makes it possible to do optimizations while thinking at the language level, rather than the machine level. And furthermore, since students have a decent SSA representation, we certainly should use it to do all the "easy" SSA optimizations, like copy propagation and loop-invariant code motion.</p>
<p>All of these are unconditional rewrites when in SSA form, so they are all easy to implement, and will get the students comfortable with manipulating the SSA representation. The next step is to implement dead code elimination, which is both a nice optimization, and also requires them to re-run liveness analysis. This will open the door to understanding that compilation passes may have to be done repeatedly.</p>
<p>Once the students have done that, they will be warmed up for the big "hero optimization" of the course, which will be <a href="http://www.cs.wustl.edu/~cytron/531Pages/f11/Resources/Papers/cprop.pdf">sparse conditional constant propagation</a>. Because we are working with a high-level IR, sparse conditional constant propagation ought to yield even more impressive results than usual, with lots of tuple creation/pattern matching and cases on constructors disappearing in a really satisfying way.</p>
<p>In particular, good examples ought to arise from erasing the <code>None</code> branches from code using the option monad for safe division <code>safediv : int → int → option int</code>, if there is a test that the divisors are nonzero dominating the divisions.</p>
<h2 id="lowering-and-register-allocation">Lowering and Register Allocation</h2>
<p>As I mentioned before, the big optimization was performed on a high-level SSA form: switch statements and data constructors and selectors are still present. Before we can generate machine code, they will have to be turned into lower-level operations. So we can define a "low-level" SSA, where the selections and so on are turned into memory operations, and then translate the high-level IR into the low-level IR.</p>
<p>This ought to be done pretty naively, with each high-level operation turning into its low-level instruction sequence in a pretty direct way. Since the course design was to do as many dataflow optimizations as we could, to bring out the generic structure, a good test the development is smooth enough is whether the flow analyses are sufficiently parameterized that we can change the IR and lattices and still get decent code reuse. Then we can lean on these analyses to clean up the low-level instruction sequences.</p>
<p>If more abstract code is too hard to understand, I would skip most of the low-level optimizations. The only must-do analysis at the low-level representation is to re-reun liveness analysis, so that we can do register allocation using <a href="https://publikationen.bibliothek.kit.edu/1000007166/6532">the SSA-form register allocation algorithms of Hack</a>. This (a) lets us avoid graph colouring, (b) while still getting good results, and (c) depends very obviously on the results of a dataflow analysis. It also makes register coalescing (normally an advanced topic) surprisingly easy.</p>
<p>Code generation is then pretty easy – once we've done register allocation, we can map the low-level IR operations to loads, stores, and jumps in the dirt-obvious way; calls and returns doing the obvious stack manipulations; and foreign calls to the Boehm gc to allocate objects. This is not optimal, but most of the techniques I know of in this space don't use fixed points much, and so are off-piste relative to the course aim.</p>
<h1 id="perspectives">Perspectives</h1>
<p>The surprising (to me, anyway) thing is that the choice of focusing on fixed point computations plus the choice to go SSA-only, lays out a path to a surprisingly credible compiler. Before writing this note I would have thought this was definitely too ambitious a compiler for a first compiler! But now, I think it only may be too ambitious.</p>
<p>Obviously, we benefit a lot from working with a first-order functional langage: in many ways, is just an alternate notation for SSA. However, it <em>looks</em> different from SSA, and it <em>looks</em> like a high-level language, which means the students ought to feel like they have accomplished something.</p>
<p>But before I would be willing to try teaching it this way, I would want to program up a compiler or two in this style. This would either confirm that the compiler is tractable, or would show that the compiler is just too big to be teachable. If it is too big, I'd probably change the surface language to only support booleans and integers as types. Since these values all fit inside a word, we could drop the lowering pass altogether.</p>
<p>Another thing I like is that we are able to do a plain old dataflow analysis to parse, and then a dataflow analysis with a clever representation of facts when typechecking, and then dataflow analyses with a clever iteration scheme when doing optimization/codegen.</p>
<p>However, if this ends up being too cute and confusion-prone, another alternative would be to drop type inference by moving to a simply typed language, and then drop SSA in favour of the traditional Kildall/Allen dataflow approach. You could still teach the use of acceleration structures by computing def-use chains and using them for (e.g.) liveness. This would bring out the parallels between parsing and flow analysis.</p>
<p>Relative to an imaginary advanced compilers course, we're missing high-level optimizations like inlining, partial redundancy elimination, and fancy loop optimizations, as well as low-level things like instruction scheduling. We're also missing an angle for tackling control-flow analysis for higher-order language featuring objects or closures.</p>
<p>But since we were able to go SSA straight away, the students will be in a good position to learn this on their own.</p>
<h1 id="introl-sketch">Introl sketch</h1>
<h2 id="type-structure">Type structure</h2>
<p>The type language has</p>
<ul>
<li>Integers <code>int</code></li>
<li>Unit type <code>unit</code></li>
<li>Tuples <code>T_1 * ... * T_n</code></li>
<li><p>Sum types via datatype declarations:</p>
<pre><code>datatype list a = Nil | Cons of a * list a </code></pre></li>
</ul>
<h2 id="program-structure">Program structure</h2>
<h3 id="top-level-definitions">Top level definitions</h3>
<p>All function declarations are top level. Nested functions and anonymous lambda-expressions are not allowed. Function declarations have a type scheme, which may be polymorphic:</p>
<pre><code> val foo : forall a_1, ..., a_n. (T_1, ..., T_n) -> T
def foo(x_1, ..., x_n) = e </code></pre>
<h3 id="expressions">Expressions</h3>
<p>Expressions are basically the usual things:</p>
<ul>
<li>variables <code>x</code></li>
<li>constants <code>3</code></li>
<li>arithmetic <code>e_1 + e_2</code></li>
<li>unit <code>()</code></li>
<li>tuples <code>(e_1, ..., e_n)</code></li>
<li>data constructors <code>Foo(e)</code></li>
<li>variable bindings <code>let p = e_1; e_2</code></li>
<li>direct function calls <code>f(e_1, ..., e_n)</code>, where <code>f</code> is a top-level function name</li>
<li>match statements <code>match e { p_1 -> e_1 | ... p_n -> e_n }</code></li>
</ul>
<p>Note the absence of control structures like <code>while</code> or <code>for</code>; we will use tail-recursion for that. Note also that there are no explicit type arguments in calls to polymorphic functions.</p>
<h3 id="patterns">Patterns</h3>
<p>Patterns exist, but nested patterns do not, in order to avoid having to deal with pattern compilation. They are:</p>
<ul>
<li>variables <code>x</code></li>
<li>constructor patterns <code>Foo(x_1, ..., x_n)</code></li>
<li>tuple patterns <code>(x_1, ..., x_n)</code></li>
</ul>
<h3 id="example">Example</h3>
<p>Here's a function that reverses a list:</p>
<pre><code>val reverse_acc : forall a. (list a, list a) -> list a
def reverse_acc(xs, acc) =
match xs {
| Nil -> acc
| Cons(x, xs) -> reverse_acc(xs, Cons(x, acc))
}
val reverse : forall a. list a -> list a
def reverse(xs) = reverse_acc(xs, Nil)</code></pre>
<p>Here's a function to zip two lists together:</p>
<pre><code>val zip : forall a, b. (list a, list b) -> list (a * b)
def zip(as, bs) =
match as {
| Nil -> Nil
| Cons(a, as') ->
match bs {
| Nil -> Nil
| Cons(b, bs') -> Cons((a,b), zip(as',bs'))
}
}</code></pre>
<p>Note the lack of nested patterns vis-a-vis ML.</p>
Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com5tag:blogger.com,1999:blog-8068466035875589791.post-53437500462566506842019-10-30T03:07:00.000-07:002019-10-30T03:12:26.080-07:00Every Finite Automaton has a Corresponding Regular Expression <style type="text/css">code{white-space: pre;}</style>
<style type="text/css">
div.sourceCode { overflow-x: auto; }
table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
margin: 0; padding: 0; vertical-align: baseline; border: none; }
table.sourceCode { width: 100%; line-height: 100%; background-color: #f8f8f8; }
td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
td.sourceCode { padding-left: 5px; }
pre, code { background-color: #f8f8f8; }
code > span.kw { color: #204a87; font-weight: bold; } /* Keyword */
code > span.dt { color: #204a87; } /* DataType */
code > span.dv { color: #0000cf; } /* DecVal */
code > span.bn { color: #0000cf; } /* BaseN */
code > span.fl { color: #0000cf; } /* Float */
code > span.ch { color: #4e9a06; } /* Char */
code > span.st { color: #4e9a06; } /* String */
code > span.co { color: #8f5902; font-style: italic; } /* Comment */
code > span.ot { color: #8f5902; } /* Other */
code > span.al { color: #ef2929; } /* Alert */
code > span.fu { color: #000000; } /* Function */
code > span.er { color: #a40000; font-weight: bold; } /* Error */
code > span.wa { color: #8f5902; font-weight: bold; font-style: italic; } /* Warning */
code > span.cn { color: #000000; } /* Constant */
code > span.sc { color: #000000; } /* SpecialChar */
code > span.vs { color: #4e9a06; } /* VerbatimString */
code > span.ss { color: #4e9a06; } /* SpecialString */
code > span.im { } /* Import */
code > span.va { color: #000000; } /* Variable */
code > span.cf { color: #204a87; font-weight: bold; } /* ControlFlow */
code > span.op { color: #ce5c00; font-weight: bold; } /* Operator */
code > span.pp { color: #8f5902; font-style: italic; } /* Preprocessor */
code > span.ex { } /* Extension */
code > span.at { color: #c4a000; } /* Attribute */
code > span.do { color: #8f5902; font-weight: bold; font-style: italic; } /* Documentation */
code > span.an { color: #8f5902; font-weight: bold; font-style: italic; } /* Annotation */
code > span.cv { color: #8f5902; font-weight: bold; font-style: italic; } /* CommentVar */
code > span.in { color: #8f5902; font-weight: bold; font-style: italic; } /* Information */
</style>
<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_CHTML-full" type="text/javascript"></script>
<!--
To compile this file, use:
pandoc -s --highlight-style tango --mathjax --filter graphviz.py kleene.md -o kleene.html
-->
<p>Today I'd like to blog about the nicest proof of Kleene's theorem that regular expressions and finite automata are equivalent that I know. One direction of this proof -- that every regular expression has a corresponding finite automaton -- is very well-known to programmers, since <a href="https://en.wikipedia.org/wiki/Thompson%27s_construction">Thompson's construction</a>, which converts regular expressions to nondeterministic automta, and <a href="https://en.wikipedia.org/wiki/Powerset_construction">the subset construction</a>, which turns nondeterministic automta into deterministic automta, are the basis of every lexer and regexp engine there is.</p>
<p>However, the proof of the other direction -- that every automaton has a corresponding regexp -- is often stated, but seldom proved. This is a shame, since it has an elegant formulation in terms of linear algebra(!).</p>
<h3 id="the-construction">The Construction</h3>
<p>FIrst, let's assume we have a regular language <span class="math inline">\((R, 1, \cdot, 0, +, \ast)\)</span>. Now, define the category <span class="math inline">\(C\)</span> as follows:</p>
<ul>
<li>The objects of <span class="math inline">\(C\)</span> are the natural numbers.</li>
<li>A morphism <span class="math inline">\(f : n \to m\)</span> is an <span class="math inline">\(n \times m\)</span> matrix of elements of <span class="math inline">\(R\)</span>. We will write <span class="math inline">\(f_{(i,j)}\)</span> to get the element of <span class="math inline">\(R\)</span> in the <span class="math inline">\(i\)</span>-th row and <span class="math inline">\(j\)</span>-th column.</li>
<li>The identity morphism <span class="math inline">\(id : n \to n\)</span> is the <span class="math inline">\(n \times n\)</span> matrix which is <span class="math inline">\(1\)</span> on the diagonal and <span class="math inline">\(0\)</span> off the diagonal</li>
<li>The composition <span class="math inline">\(f;g\)</span> of <span class="math inline">\(f : n \to m\)</span> and <span class="math inline">\(g : m \to o\)</span> is given by <em>matrix multiplication</em>:</li>
</ul>
<p><span class="math display">\[
(f;g)_{i, j} = \sum_{k \in 0\ldots m} f_{(i,k)} \cdot g_{(k,j)}
\]</span></p>
<p>This may seem like a rather peculiar thing to do -- matrices filled with regular expressions? However, to understand why this is potentially a sensible idea, let's look at a generic automaton of dimension 2.</p>
<p style="text-align:center;"><svg width="159pt" height="79pt"
viewBox="0.00 0.00 159.00 79.13" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 75.1253)">
<title>G</title>
<polygon fill="#ffffff" stroke="transparent" points="-4,4 -4,-75.1253 155,-75.1253 155,4 -4,4"/>
<!-- 0 -->
<g id="node1" class="node">
<title>0</title>
<ellipse fill="none" stroke="#000000" cx="27" cy="-20.1253" rx="27" ry="18"/>
<text text-anchor="middle" x="27" y="-16.4253" font-family="Times,serif" font-size="14.00" fill="#000000">0</text>
</g>
<!-- 0->0 -->
<g id="edge1" class="edge">
<title>0->0</title>
<path fill="none" stroke="#000000" d="M18.4307,-37.5352C16.8311,-47.2132 19.6875,-56.1253 27,-56.1253 31.5703,-56.1253 34.4,-52.644 35.489,-47.6702"/>
<polygon fill="#000000" stroke="#000000" points="35.5693,-37.5352 39.9898,-47.5706 35.5297,-42.535 35.49,-47.5349 35.49,-47.5349 35.49,-47.5349 35.5297,-42.535 30.9901,-47.4991 35.5693,-37.5352 35.5693,-37.5352"/>
<text text-anchor="middle" x="27" y="-59.9253" font-family="Times,serif" font-size="14.00" fill="#000000">a</text>
</g>
<!-- 1 -->
<g id="node2" class="node">
<title>1</title>
<ellipse fill="none" stroke="#000000" cx="124" cy="-20.1253" rx="27" ry="18"/>
<text text-anchor="middle" x="124" y="-16.4253" font-family="Times,serif" font-size="14.00" fill="#000000">1</text>
</g>
<!-- 0->1 -->
<g id="edge2" class="edge">
<title>0->1</title>
<path fill="none" stroke="#000000" d="M54.0199,-20.1253C64.1304,-20.1253 75.8016,-20.1253 86.6794,-20.1253"/>
<polygon fill="#000000" stroke="#000000" points="96.7886,-20.1253 86.7886,-24.6254 91.7886,-20.1253 86.7886,-20.1254 86.7886,-20.1254 86.7886,-20.1254 91.7886,-20.1253 86.7885,-15.6254 96.7886,-20.1253 96.7886,-20.1253"/>
<text text-anchor="middle" x="75.5" y="-23.9253" font-family="Times,serif" font-size="14.00" fill="#000000">b</text>
</g>
<!-- 1->0 -->
<g id="edge3" class="edge">
<title>1->0</title>
<path fill="none" stroke="#000000" d="M104.6499,-7.5523C95.0326,-2.7357 83.1161,1.1253 72,-1.1253 67.2623,-2.0844 62.4035,-3.5584 57.7123,-5.2704"/>
<polygon fill="#000000" stroke="#000000" points="48.4111,-9.0301 55.9959,-1.1104 53.0467,-7.1563 57.6823,-5.2825 57.6823,-5.2825 57.6823,-5.2825 53.0467,-7.1563 59.3687,-9.4546 48.4111,-9.0301 48.4111,-9.0301"/>
<text text-anchor="middle" x="75.5" y="-4.9253" font-family="Times,serif" font-size="14.00" fill="#000000">c</text>
</g>
<!-- 1->1 -->
<g id="edge4" class="edge">
<title>1->1</title>
<path fill="none" stroke="#000000" d="M115.4307,-37.5352C113.8311,-47.2132 116.6875,-56.1253 124,-56.1253 128.5703,-56.1253 131.4,-52.644 132.489,-47.6702"/>
<polygon fill="#000000" stroke="#000000" points="132.5693,-37.5352 136.9898,-47.5706 132.5297,-42.535 132.49,-47.5349 132.49,-47.5349 132.49,-47.5349 132.5297,-42.535 127.9901,-47.4991 132.5693,-37.5352 132.5693,-37.5352"/>
<text text-anchor="middle" x="124" y="-59.9253" font-family="Times,serif" font-size="14.00" fill="#000000">d</text>
</g>
</g>
</svg></p>
<p>Given such a picture, we can give the transition matrix for this as follows:</p>
<p><span class="math display">\[
A =
\begin{array}{l|cc}
& 0 & 1 \\\hline
0 & a & b \\
1 & c & d \\
\end{array}
\]</span></p>
<p>So the <span class="math inline">\((i,j)\)</span>-th entry tells us the label of the edge from node <span class="math inline">\(i\)</span> to node <span class="math inline">\(j\)</span>. In our category, this is an endomap <span class="math inline">\(A : 2 \to 2\)</span>. What happens if we compose <span class="math inline">\(A\)</span> with itself? Since composition is just matrix multiplication, we get:</p>
<p><span class="math display">\[
\left[\begin{array}{cc} a & b \\ c & d \end{array}\right]
\cdot
\left[\begin{array}{cc} a & b \\ c & d \end{array}\right]
=
\left[\begin{array}{cc} aa + bc & ab + bd \\ ca + dc & cb + dd \end{array}\right]
\]</span></p>
<p>Notice that the <span class="math inline">\((i,j)\)</span>-th entry is <em>a regular expression describing the paths of length two</em>! So composition of arrows corresponds to path composition. The identity matrix are the paths of length 0, the transition matrix gives the paths of length 1, and matrix multiplication gives path concatentation. So it's easy to get a regexp for any bounded length path. Indeed, it is easy to see that the endomaps of <span class="math inline">\(n\)</span> form a semiring.</p>
<p>So the endomaps of <span class="math inline">\(n\)</span> are <span class="math inline">\(n \times n\)</span> matrixes, and we can turn them into a semiring <span class="math inline">\((S, 1_S, \cdot_S, 0_S, +_S)\)</span> as follows:</p>
<ul>
<li>The carrier <span class="math inline">\(S\)</span> is <span class="math inline">\(\mathrm{Hom}(n,n)\)</span>, the set of <span class="math inline">\(n \times n\)</span> matrices over <span class="math inline">\(R\)</span>.</li>
<li>The unit <span class="math inline">\(1_S\)</span> is the identity map <span class="math inline">\(\mathrm{id} : n \to n\)</span></li>
<li>The multiplication <span class="math inline">\(\cdot_S\)</span> is composition of morphisms (i.e., matrix multiplication). Since these are endomorphisms (ie, square matrices), the result of composition (i.e., multiplication of square matrices) is another endomorphism (ie, square matrix of the same dimension).</li>
<li>The zero <span class="math inline">\(0_S\)</span> is the matrix with all 0 entries.</li>
<li>The addition <span class="math inline">\(f + g\)</span> is the pointwise addition of matrices -- i.e., <span class="math inline">\((f + g)_{i, j} = f_{i,j} + g_{i,j}\)</span></li>
</ul>
<p>Now we can ask if we can equip endomaps with a Kleene star operation as well? Recall that the defining equation for the Kleene star is</p>
<p><span class="math display">\[
s\ast = 1 + s\cdot s\ast
\]</span></p>
<p>which says that we are looking for a <em>transitive closure</em> operation. That is, <span class="math inline">\(s\)</span> consists of 0 length paths, or anything in <span class="math inline">\(s\)</span> plus anything in <span class="math inline">\(s\ast\)</span>. So we want a definition of the Kleene star which satisfies this equation. Let's consider the case of the size 2 automaton again:</p>
<p style="text-align:center;"><svg width="159pt" height="79pt"
viewBox="0.00 0.00 159.00 79.13" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 75.1253)">
<title>G</title>
<polygon fill="#ffffff" stroke="transparent" points="-4,4 -4,-75.1253 155,-75.1253 155,4 -4,4"/>
<!-- 0 -->
<g id="node1" class="node">
<title>0</title>
<ellipse fill="none" stroke="#000000" cx="27" cy="-20.1253" rx="27" ry="18"/>
<text text-anchor="middle" x="27" y="-16.4253" font-family="Times,serif" font-size="14.00" fill="#000000">0</text>
</g>
<!-- 0->0 -->
<g id="edge1" class="edge">
<title>0->0</title>
<path fill="none" stroke="#000000" d="M18.4307,-37.5352C16.8311,-47.2132 19.6875,-56.1253 27,-56.1253 31.5703,-56.1253 34.4,-52.644 35.489,-47.6702"/>
<polygon fill="#000000" stroke="#000000" points="35.5693,-37.5352 39.9898,-47.5706 35.5297,-42.535 35.49,-47.5349 35.49,-47.5349 35.49,-47.5349 35.5297,-42.535 30.9901,-47.4991 35.5693,-37.5352 35.5693,-37.5352"/>
<text text-anchor="middle" x="27" y="-59.9253" font-family="Times,serif" font-size="14.00" fill="#000000">a</text>
</g>
<!-- 1 -->
<g id="node2" class="node">
<title>1</title>
<ellipse fill="none" stroke="#000000" cx="124" cy="-20.1253" rx="27" ry="18"/>
<text text-anchor="middle" x="124" y="-16.4253" font-family="Times,serif" font-size="14.00" fill="#000000">1</text>
</g>
<!-- 0->1 -->
<g id="edge2" class="edge">
<title>0->1</title>
<path fill="none" stroke="#000000" d="M54.0199,-20.1253C64.1304,-20.1253 75.8016,-20.1253 86.6794,-20.1253"/>
<polygon fill="#000000" stroke="#000000" points="96.7886,-20.1253 86.7886,-24.6254 91.7886,-20.1253 86.7886,-20.1254 86.7886,-20.1254 86.7886,-20.1254 91.7886,-20.1253 86.7885,-15.6254 96.7886,-20.1253 96.7886,-20.1253"/>
<text text-anchor="middle" x="75.5" y="-23.9253" font-family="Times,serif" font-size="14.00" fill="#000000">b</text>
</g>
<!-- 1->0 -->
<g id="edge3" class="edge">
<title>1->0</title>
<path fill="none" stroke="#000000" d="M104.6499,-7.5523C95.0326,-2.7357 83.1161,1.1253 72,-1.1253 67.2623,-2.0844 62.4035,-3.5584 57.7123,-5.2704"/>
<polygon fill="#000000" stroke="#000000" points="48.4111,-9.0301 55.9959,-1.1104 53.0467,-7.1563 57.6823,-5.2825 57.6823,-5.2825 57.6823,-5.2825 53.0467,-7.1563 59.3687,-9.4546 48.4111,-9.0301 48.4111,-9.0301"/>
<text text-anchor="middle" x="75.5" y="-4.9253" font-family="Times,serif" font-size="14.00" fill="#000000">c</text>
</g>
<!-- 1->1 -->
<g id="edge4" class="edge">
<title>1->1</title>
<path fill="none" stroke="#000000" d="M115.4307,-37.5352C113.8311,-47.2132 116.6875,-56.1253 124,-56.1253 128.5703,-56.1253 131.4,-52.644 132.489,-47.6702"/>
<polygon fill="#000000" stroke="#000000" points="132.5693,-37.5352 136.9898,-47.5706 132.5297,-42.535 132.49,-47.5349 132.49,-47.5349 132.49,-47.5349 132.5297,-42.535 127.9901,-47.4991 132.5693,-37.5352 132.5693,-37.5352"/>
<text text-anchor="middle" x="124" y="-59.9253" font-family="Times,serif" font-size="14.00" fill="#000000">d</text>
</g>
</g>
</svg></p>
<p>Now, let's think about the <em>loop-free</em> paths between 0 and 0. There are two classes of them:</p>
<ol style="list-style-type: decimal">
<li>The single step <span class="math inline">\(a\)</span>.</li>
<li>We can take a step from 0 to 1 via <span class="math inline">\(b\)</span>, take some number of steps from 1 to itself via <span class="math inline">\(d\)</span>, and then back to 0 via <span class="math inline">\(c\)</span>.</li>
</ol>
<p>So we can describe the loop-free paths from 0 to 0 via the regular expression</p>
<p><span class="math display">\[
F \triangleq a + b(d\ast)c
\]</span></p>
<p>Similarly, the loop-free paths from 1 to 1 are either:</p>
<ol style="list-style-type: decimal">
<li>The single step <span class="math inline">\(d\)</span>.</li>
<li>We can take a step from 1 to 0 via <span class="math inline">\(c\)</span>, take some number of steps from 1 to itself via <span class="math inline">\(a\)</span>, and then back to 1 via <span class="math inline">\(b\)</span>.</li>
</ol>
<p>So these paths are described by <span class="math display">\[
G \triangleq d + c(a\ast)b
\]</span></p>
<p>So the set of all paths from 0 to 0 is <span class="math inline">\(F\ast\)</span>, and the set of all paths from <span class="math inline">\(1\)</span> to <span class="math inline">\(1\)</span> is <span class="math inline">\(G\ast\)</span>. This lets us write down the</p>
<p><span class="math display">\[
A\ast =
\begin{array}{l|cc}
& 0 & 1 \\\hline
0 & F\ast & {F\ast} \cdot b \cdot {G\ast} \\
1 & {G\ast} \cdot c \cdot {F\ast} & G\ast \\
\end{array}
\]</span></p>
<p>Proving that this is a correct definition requires a bit of work, so I will skip over it. That's because I want to get to the cool bit, which is that this construction is enough to define the Kleene star for <em>any</em> dimension <span class="math inline">\(n\)</span>! We'll prove this by well-founded induction. First, suppose we have an <span class="math inline">\(n \times n\)</span> transition matrix <span class="math inline">\(T\)</span> for <span class="math inline">\(n > 2\)</span>. Then, we can divide the matrix into blocks, with <span class="math inline">\(A\)</span> and <span class="math inline">\(D\)</span> as square submatrices:</p>
<p><span class="math display">\[
T = \left[ \begin{array}{c|c}
A & B \\\hline
C & D
\end{array}
\right]
\]</span></p>
<p>and then define <span class="math inline">\(T\ast\)</span> as:</p>
<p><span class="math display">\[
T\ast =
\left[
\begin{array}{c|c}
F\ast & {F\ast} \cdot B \cdot {G\ast} \\ \hline
{G\ast} \cdot C \cdot {F\ast} & G\ast \\
\end{array}
\right]
\]</span></p>
<p>where</p>
<p><span class="math display">\[
\begin{array}{lcl}
F & \triangleq & A + B(D\ast)C \\
G & \triangleq & D + C(A\ast)B \\
\end{array}
\]</span></p>
<p>By induction, we know that square submatrices of dimension smaller than <span class="math inline">\(n\)</span> is a Kleene algebra, so we know that <span class="math inline">\(A\)</span> and <span class="math inline">\(D\)</span> have Kleene algebra structure. The reason for defining the categorical structure also becomes apparent here -- even though <span class="math inline">\(A\)</span> and <span class="math inline">\(D\)</span> are square, <span class="math inline">\(B\)</span> and <span class="math inline">\(C\)</span> may not be, and we will need to make use of the fact we defined composition on non-square matrices for them.</p>
<h3 id="references-and-miscellany">References and Miscellany</h3>
<p>I have, as is traditional for bloggers, failed to prove any of my assertions. Luckily other people have done the work! Two papers that I particularly like are:</p>
<ul>
<li><p>Dexter Kozen's paper <a href="https://www.cs.cornell.edu/~kozen/Papers/ka.pdf"><em>A Completeness Theorem for Kleene Algebras and the Algebra of Regular Events</em></a></p></li>
<li><p>Stephen Dolan's ICFP 2013 paper, <a href="http://stedolan.net/research/semirings.pdf"><em>Fun with semirings: a functional pearl on the abuse of linear algebra</em></a>.</p></li>
</ul>
<p>One fact which didn't fit into the narrative above, but which I think is extremely neat, is about the linear-algebraic view of the Kleene star. To see what it is, recall the definition of the <em>matrix exponential</em> for a matrix <span class="math inline">\(A\)</span>:</p>
<p><span class="math display">\[
e^A \triangleq \sum_{k = 0}^\infty \frac{A^k}{k!} = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \ldots
\]</span></p>
<p>Now, note that</p>
<ol style="list-style-type: decimal">
<li><span class="math inline">\(k!\)</span> is an integer</li>
<li>any integer can be expressed as a formal sum <span class="math inline">\(n = \overbrace{1 + \ldots + 1}^{n \,\mbox{times}}\)</span>,</li>
<li>and in regular languages, we have <span class="math inline">\(r + r = r\)</span>.</li>
</ol>
<p>Hence <span class="math inline">\(k! = 1\)</span>, and since <span class="math inline">\(A/1 = A\)</span>, we have</p>
<p><span class="math display">\[
e^A \triangleq \sum_{k = 0}^\infty A^k = I + A + A^2 + A^3 + \ldots
\]</span></p>
<p>But this is precisely the definition of <span class="math inline">\(A\ast\)</span>! In other words, for square matrices over Kleene algebras, Kleene iteration is precisely the same thing as the matrix exponential.</p>
<h3 id="the-implementation">The Implementation</h3>
<p>The nice thing about this construction is that you can turn it into running code more or less directly, which leads to an implementation of the DFA-to-regexp algorithm in less than 100 lines of code. I'll give an implementation in OCaml.</p>
<p>First, let's give a signature for Kleene algebras. All the constructions we talked about are parameterized in the Kleene algebra, so our implementation will be too.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">module</span> <span class="kw">type</span> KLEENE = <span class="kw">sig</span>
<span class="kw">type</span> t
<span class="kw">val</span> zero : t
<span class="kw">val</span> ( ++ ) : t -> t -> t
<span class="kw">val</span> one : t
<span class="kw">val</span> ( * ) : t -> t -> t
<span class="kw">val</span> star : t -> t
<span class="kw">end</span></code></pre></div>
<p>You can see here that we have <code>zero</code> and <code>one</code> as the unit for the addition <code>++</code> and multiplication <code>*</code> respectively, and we also have a <code>star</code> operation representing the Kleene star.</p>
<p>Similarly, we can give a signature for matrices over an element type.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">module</span> <span class="kw">type</span> MATRIX = <span class="kw">sig</span>
<span class="kw">type</span> elt
<span class="kw">type</span> t
<span class="kw">val</span> dom : t -> <span class="dt">int</span>
<span class="kw">val</span> cod : t -> <span class="dt">int</span>
<span class="kw">val</span> get : t -> <span class="dt">int</span> -> <span class="dt">int</span> -> elt
<span class="kw">val</span> id : <span class="dt">int</span> -> t
<span class="kw">val</span> compose : t -> t -> t
<span class="kw">val</span> zero : <span class="dt">int</span> -> <span class="dt">int</span> -> t
<span class="kw">val</span> (++) : t -> t -> t
<span class="kw">val</span> make : <span class="dt">int</span> -> <span class="dt">int</span> -> (<span class="dt">int</span> -> <span class="dt">int</span> -> elt) -> t
<span class="kw">end</span></code></pre></div>
<p>So this signature requires an element type <code>elt</code>, a matrix type <code>t</code>, and a collection of operations. We can the number of rows in the matrix with <code>dom</code> and the columns with <code>cod</code>, with the names coming from the thought of dimensions and matrices forming a category. Similarly this gives us a notion of identity with <code>id</code>, which takes an integer and returns the identity matrix of that size. Matrix multiplication is just composition, which leads to the name <code>compose</code>.</p>
<p>We also want the ability to create zero matrices with the <code>zero</code> function, and the ability to do pointwise addition of matrices with the <code>(++)</code> operation. Neither of these operations need square matrices, but to add matrices pointwise requires compatible dimensions.</p>
<p>Finally we want a way to construct a matrix with <code>make</code>, which is given the dimensions and a function to produce the elements.</p>
<p>Now, since the matrix construction was parameterized by a Kleene algebra, we can write a functor <code>Matrix</code> in Ocaml which takes a module of type <code>KLEENE</code> to produce an implementation of the <code>MATRIX</code> signature with elements of the Kleene algebra as the element type:</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">module</span> Matrix(R : KLEENE) : MATRIX <span class="kw">with</span> <span class="kw">type</span> elt = R.t =
<span class="kw">struct</span>
<span class="kw">type</span> elt = R.t
<span class="kw">type</span> t = {
row : <span class="dt">int</span>;
col: <span class="dt">int</span>;
data : <span class="dt">int</span> -> <span class="dt">int</span> -> R.t
}</code></pre></div>
<p>We take the element type to be elements of the Kleene algebra, and take a matrix to be a record with the number of rows, the number of columns, and a function to produce the entries. This is not a high-performance choice, but it works fine for illustration purposes.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> make row col data = {row; col; data}
<span class="kw">let</span> dom m = m.row
<span class="kw">let</span> cod m = m.col
<span class="kw">let</span> get m i j =
<span class="kw">assert</span> (<span class="dv">0</span> <= i && i < m.row);
<span class="kw">assert</span> (<span class="dv">0</span> <= j && j < m.col);
m.data i j </code></pre></div>
<p>To <code>make</code> a matrix, we just use the data from the constructor to build the appropriate record, and to get the <code>dom</code>ain and <code>cod</code>omain we just select the row or column fields. To get an element we just call the underlying function, with an assertion check to ensure the arguments are in bounds.</p>
<p>This is a point where a more dependently-typed language would be nice: if the dimensions of the matrix were tracked in the type then we could omit the assertion checks. (Basically, the element type would become a type constructor <code>t : nat -> nat -> type</code>, which would be a type constructor corresponding to the hom-set,)</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> id n = make n n (<span class="kw">fun</span> i j -> <span class="kw">if</span> i = j <span class="kw">then</span> R.one <span class="kw">else</span> R.zero)
<span class="kw">let</span> compose f g =
<span class="kw">assert</span> (cod f = dom g);
<span class="kw">let</span> data i j =
<span class="kw">let</span> <span class="kw">rec</span> loop k acc =
<span class="kw">if</span> k < cod f <span class="kw">then</span>
loop (k+<span class="dv">1</span>) R.(acc ++ (get f i k * get g k j))
<span class="kw">else</span>
acc
<span class="kw">in</span>
loop <span class="dv">0</span> R.zero
<span class="kw">in</span>
make (dom f) (cod g) data</code></pre></div>
<p>The <code>id</code>entity function takes an integer, and returns a square matrix that is <code>R.one</code> on the diagonal and <code>R.zero</code> off-diagonal. If you want to <code>compose</code> two matrices, the routine checks that the dimension match, and then implements a very naive row-vector times column-vector calculation for each indexing. What makes this naive is that it will get recalculated each time!</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> zero row col = make row col (<span class="kw">fun</span> i j -> R.zero)
<span class="kw">let</span> (++) m1 m2 =
<span class="kw">assert</span> (dom m1 = dom m2);
<span class="kw">assert</span> (cod m1 = cod m2);
<span class="kw">let</span> data i j = R.(++) (get m1 i j) (get m2 i j) <span class="kw">in</span>
make (dom m1) (cod m1) data
<span class="kw">end</span> </code></pre></div>
<p>Finally we can implement the <code>zero</code> matrix operation, and matrix addition, both of which work the expected way.</p>
<p>We will use all these operations to show that square matrices form a Kleene algebra. To show this, we will define a functor <code>Square</code> which is parameterized by</p>
<ol style="list-style-type: decimal">
<li>A Kleene algebra <code>R</code>.</li>
<li>An implementation of matrices with element type <code>R.t</code></li>
<li>A size (i.e., an integer)</li>
</ol>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">module</span> Square(R : KLEENE)
(M : MATRIX <span class="kw">with</span> <span class="kw">type</span> elt = R.t)
(Size : <span class="kw">sig</span> <span class="kw">val</span> size : <span class="dt">int</span> <span class="kw">end</span>) : KLEENE <span class="kw">with</span> <span class="kw">type</span> t = M.t =
<span class="kw">struct</span>
<span class="kw">type</span> t = M.t </code></pre></div>
<p>The implementation type is the matrix type <code>M.t</code>, and the semiring operations are basically inherited from the module <code>M</code>:</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> (++) = M.(++)
<span class="kw">let</span> ( * ) = M.compose
<span class="kw">let</span> one = M.id Size.size
<span class="kw">let</span> zero = M.zero Size.size Size.size</code></pre></div>
<p>The only changes are that <code>one</code> and <code>zero</code> are restricted to square matrices of size <code>Size.size</code>, and so we don't need to pass them dimension arguments.</p>
<p>All of the real work is handled in the implementation of the Kleene star. We can follow the construction very directly. First, we can define a <code>split</code> operation which splits a square matrix into four smaller pieces:</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> split i m =
<span class="kw">let</span> <span class="kw">open</span> M <span class="kw">in</span>
<span class="kw">assert</span> (i >= <span class="dv">0</span> && i < dom m);
<span class="kw">let</span> k = dom m - i <span class="kw">in</span>
<span class="kw">let</span> shift m x y = <span class="kw">fun</span> i j -> get m (i + x) (j + y) <span class="kw">in</span>
<span class="kw">let</span> a = make i i (shift m <span class="dv">0</span> <span class="dv">0</span>) <span class="kw">in</span>
<span class="kw">let</span> b = make i k (shift m <span class="dv">0</span> i) <span class="kw">in</span>
<span class="kw">let</span> c = make k i (shift m i <span class="dv">0</span>) <span class="kw">in</span>
<span class="kw">let</span> d = make k k (shift m i i) <span class="kw">in</span>
(a, b, c, d)</code></pre></div>
<p>This divides a matrix into four, by finding the split point <code>i</code>, and then making a new matrix that uses the old matrix's <code>data</code> with a suitable offset. So <code>b</code>'s x-th row and y-th column will be the x-th row and (y+i)-th column of the original, for exmample.</p>
<p>Symmetrically, we need a <code>merge</code> operation.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> merge (a, b, c, d) =
<span class="kw">assert</span> (M.dom a = M.dom b && M.cod a = M.cod c);
<span class="kw">assert</span> (M.dom d = M.dom c && M.cod d = M.cod b);
<span class="kw">let</span> get i j =
<span class="kw">match</span> i < M.dom a, j < M.cod a <span class="kw">with</span>
| <span class="kw">true</span>, <span class="kw">true</span> -> M.get a i j
| <span class="kw">true</span>, <span class="kw">false</span> -> M.get b i (j - M.cod a)
| <span class="kw">false</span>, <span class="kw">true</span> -> M.get c (i - M.dom a) j
| <span class="kw">false</span>, <span class="kw">false</span> -> M.get d (i - M.dom a) (j - M.cod a)
<span class="kw">in</span>
M.make (M.dom a + M.dom d) (M.cod a + M.cod d) get </code></pre></div>
<p>To <code>merge</code> four matrices, their dimensions need to match up properly, and then the lookup operation <code>get</code> needs to use the indices to decide which quadrant to look up the answer in.</p>
<p>This can be used to write the <code>star</code> function:</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> <span class="kw">rec</span> star m =
<span class="kw">match</span> M.dom m <span class="kw">with</span>
| <span class="dv">0</span> -> m
| <span class="dv">1</span> -> M.make <span class="dv">1</span> <span class="dv">1</span> (<span class="kw">fun</span> _ _ -> R.star (M.get m <span class="dv">0</span> <span class="dv">0</span>))
| n -> <span class="kw">let</span> (a, b, c, d) = split (n / <span class="dv">2</span>) m <span class="kw">in</span>
<span class="kw">let</span> fstar = star (a ++ b * star d * c) <span class="kw">in</span>
<span class="kw">let</span> gstar = star (d ++ c * star a * b) <span class="kw">in</span>
merge (fstar, fstar * b * gstar,
gstar * c * fstar, gstar)
<span class="kw">end</span></code></pre></div>
<p>This function looks at the dimension of the matrix. If the matrix is dimension 1, we can just use the underlying Kleene star on the singleton entry in the matrix. If the square matrix is bigger, we can split the matrix in four and then directly apply the construction we saw earlier. We try to divide the four quadrants as nearly in half as we can, because this reduces the number of times we need to make a recursive call.</p>
<p>As an example of using this, let's see how we can convert automata given by state-transition functions to regular expressions.</p>
<p>First, we define a trivial Kleene algebra using a datatype of regular expressions, and use this to set up a module of matrices:</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">module</span> Re = <span class="kw">struct</span>
<span class="kw">type</span> t =
| Chr <span class="kw">of</span> <span class="dt">char</span>
| Seq <span class="kw">of</span> t * t
| Eps
| Bot
| Alt <span class="kw">of</span> t * t
| Star <span class="kw">of</span> t
<span class="kw">let</span> zero = Bot
<span class="kw">let</span> one = Eps
<span class="kw">let</span> ( * ) r1 r2 = Seq(r1, r2)
<span class="kw">let</span> ( ++ ) r1 r2 = Alt(r1, r2)
<span class="kw">let</span> star r = Star r
<span class="kw">end</span>
<span class="kw">module</span> M = Matrix(Re)</code></pre></div>
<p>Then, we can construct a Kleene algebra of matrices of size 2, and then give a 2-state automaton by its transition function.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">module</span> T = Square(Re)(M)(<span class="kw">struct</span> <span class="kw">let</span> size = <span class="dv">2</span> <span class="kw">end</span>)
<span class="kw">let</span> t =
M.make <span class="dv">2</span> <span class="dv">2</span> (<span class="kw">fun</span> i j ->
Re.Chr (<span class="kw">match</span> i, j <span class="kw">with</span>
| <span class="dv">0</span>, <span class="dv">0</span> -> <span class="ch">'a'</span>
| <span class="dv">0</span>, <span class="dv">1</span> -> <span class="ch">'b'</span>
| <span class="dv">1</span>, <span class="dv">0</span> -> <span class="ch">'c'</span>
| <span class="dv">1</span>, <span class="dv">1</span> -> <span class="ch">'d'</span>
| _ -> <span class="kw">assert</span> <span class="kw">false</span>))</code></pre></div>
<p>Now, we can use the Kleene star on <code>T</code> to compute the transitive closure of this transition function:</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"># M.get tstar <span class="dv">0</span> <span class="dv">1</span>;;
- : M.elt =
Seq (Star (Alt (Chr <span class="ch">'a'</span>, Seq (Chr <span class="ch">'b'</span>, Seq (Star (Chr <span class="ch">'d'</span>), Chr <span class="ch">'c'</span>)))),
Seq (Chr <span class="ch">'b'</span>,
Star (Alt (Chr <span class="ch">'d'</span>, Seq (Chr <span class="ch">'c'</span>, Seq (Star (Chr <span class="ch">'a'</span>), Chr <span class="ch">'b'</span>))))))</code></pre></div>
<p>Voila! A regexp corresponding to the possible paths from 0 to 1.</p>
<p><a href="href=%22https://gist.github.com/neel-krishnaswami/c809b12bfeb5e4ed7e4444cbdae43f23%22">The code is available as a Github gist.</a> Bugfixes welcome!</p>
Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com4tag:blogger.com,1999:blog-8068466035875589791.post-52048439877338372292019-08-23T03:17:00.000-07:002019-08-23T03:17:00.227-07:00New Draft Paper: Survey on Bidirectional Typechecking
<p>Along with J. Dunfield, I have written a <a href="https://www.cl.cam.ac.uk/~nk480/bidir-survey.pdf">new survey paper on bidirectional typechecking</a>.</p>
<blockquote>
<p>Bidirectional typing combines two modes of typing: type checking, which checks that a program satisfies a known type, and type synthesis, which determines a type from the program. Using checking enables bidirectional typing to break the decidability barrier of Damas-Milner approaches; using synthesis enables bidirectional typing to avoid the large annotation burden of explicitly typed languages. In addition, bidirectional typing improves error locality. We highlight the design principles that underlie bidirectional type systems, survey the development of bidirectional typing from the prehistoric period before Pierce and Turner's local type inference to the present day, and provide guidance for future investigations.</p>
</blockquote>
<p>This is the first survey paper I've tried to write, and an interesting thing about the process is that it forces you to organize your thinking, and that can lead to you changing your mind. For example, before we started this paper I was convinced that bidirectional typechecking was obviously a manifestation of focalization. When we attempted to make this argument clearly in the survey, that led me to realize I didn't believe it! Now I think that the essence of bidirectional typechecking arises from paying close attention to the mode discipline (in the logic programming sense) of the typechecking relations.</p>
<p>Also, if you think we have overlooked some papers on the subject, please let us know. Bidirectional typechecking is a piece of implementation folklore that is perhaps over three decades old, which makes it very hard for us to feel any certainty about having found all the relevant literature.</p>
Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com7tag:blogger.com,1999:blog-8068466035875589791.post-67450544936494016452019-08-21T06:42:00.000-07:002019-08-21T06:42:48.565-07:00On the Relationship Between Static Analysis and Type Theory<p>Some months ago, <a href="https://www.cc.gatech.edu/~rmangal3/">Ravi Mangal</a> sent me an email with a very interesting question, which I reproduce in part below:</p>
<blockquote>
<p>The question that has been bothering me is, "How are static analyses and type systems related?". Besides Patrick Cousot's <a href="https://www.irif.fr/~mellies/mpri/mpri-ens/articles/cousot-types-as-abstract-interpretations.pdf"><em>Types as abstract interpretations</em></a>, I have failed to find other material directly addressing this. Perhaps, the answer is obvious, or maybe it has just become such a part of the community's folklore that it hasn't been really written down. However, to me, it's not clear how one might even phrase this question in a precise mathematical manner.</p>
</blockquote>
<p>The reason this question is so interesting is because some very beautiful answers to it have been worked out, but have for some reason never made it into our community's folklore. Indeed, I didn't know the answers to this question until quite recently.</p>
<p>The short answer is that there are two views of what type systems are, and in one view type systems are just another static analysis, and in another they form the substrate upon which static analyses can be built -- and both views are true at the same time.</p>
<h2 id="static-analysis-including-types-as-properties">Static Analysis (including Types) as Properties</h2>
<p>One valid view of a type system, is that it is just another species of static analysis. At a high level, the purpose of a static analysis is to answer the question, "Can I tell what properties this program satisfies <strong>without</strong> running it?"</p>
<p>Abstract interpretation, model checking, dataflow analysis and type inference are all different ways of answering this question, each with somewhat different emphases and priorities. IMO, to understand the relations between these styles of analysis, it's helpful to think about the two main styles of semantics: operational semantics and denotational semantics.</p>
<p>A denotational semantics turns a program into a mathematical object (sets, domains, metric spaces, etc). This has the benefit that it lets you turn the full power of mathematics towards reasoning about programs, but it has the cost that deducing properties of arbitrary mathematical gadgets can require arbitrary mathematical reasoning.</p>
<p>Operational semantics, on the other hand, turns a program into a state machine. This has the benefit that this is relatively easy to do, and close to how computers work. It has the cost that it only gives semantics to whole programs, and in reality people tend to write libraries and other partial programs. (Also, infinite-state operational semantics doesn't make things obviously easier to work with than a denotational semantics.)</p>
<p>Operational semantics gives rise to model checking: if you have a program, you can view the sequence of states it takes on as a concrete model of temporal logic. So model checking then checks whether a temporal properties overapproximates or refines a concrete model.</p>
<p>So basically all static analyses work by finding some efficiently representable approximation of the meaning of a program, and then using that to define an algorithm to analyse the program, typically by some form of model checking. For type systems, the approximations are the types, and in abstract interpretation, this is what the abstract domains are.</p>
<p>David A. Schmidt's paper <a href="http://santos.cs.ksu.edu/schmidt/MFPS09/paper.pdf"><em>Abstract Interpretation from a Denotational Semantics Perspective</em></a> shows how you can see abstract domains as arising from domain theory, by noting that domains arise as the infinite limit of a series of finite approximations, and so you can get an abstract domain by picking a finite approximation rather than going all the way to the limit.</p>
<p>Thomas Jensen's PhD thesis, <a href="http://www.irisa.fr/lande/jensen/papers/thesis.pdf"><em>Abstract Interpretation in Logical Form</em></a>, shows how an abstract interpretation of a higher-order functional language corresponds to a certain intersection type system for it. This helps illustrate the idea that abstract domains and types are both abstractions of a program. (FWIW, finding equivalences between abstract domains and type systems used to be big topic back in the 90s, but seems to have disappeared from view. I don't know why!)</p>
<p>Now, model checking an infinite-state system will obviously fail to terminate in general. However, it can then be made effective and algorithmic by using your abstractions (either from types or from abstract interpretation) to turn the model into a finitary one that can be exhaustively searched.</p>
<p>For dataflow analysis, David A. Schmidt works out this idea in detail in his paper, <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.70.9327&rep=rep1&type=pdf"><em>Data Flow Analysis is Model Checking of Abstract Interpretations</em></a>, where he shows how if you build an abstraction of program traces and model-check against mu-calculus formulas, you can get back most classical dataflow analyses. (If one lesson you draw is that you should read everything David Schmidt has written, well, you won't be wrong.)</p>
<p>It's less common to formulate type-based analyses in this form, but some people have looked at it.</p>
<ol style="list-style-type: lower-alpha">
<li><p>There is a line of work on "higher-order model checking" which works by taking higher-order programs, using type inference for special type systems to generate abstractions for them which lie in special subclasses of pushdown systems which are decidable for the properties of interest for program analysis (eg, emptiness testing). Luke Ong wrote an overview of the area for LICS 2015, in his paper <a href="http://www.cs.ox.ac.uk/people/luke.ong/personal/publications/LICS15.pdf"><em>Higher-Order Model Checking: an Overview</em></a>.</p></li>
<li><p>Dave Clarke and Ilya Sergey have a nice IPL 2012 paper, <a href="https://www.sciencedirect.com/science/article/pii/S0020019011002791"><em>A correspondence between type checking via reduction and type checking via evaluation</em></a>, in which they show how you can view types as abstract values, and use this perspective to derive an abstract interpreter for a program which operates on these abstract values. They don't state it explicitly, since it was too obvious to mention, but you can check that a program <code>e</code> has the type <code>int</code> by asserting that <code>e</code> will eventually evaluate to the abstract value <code>int</code>. This is easy to check since every program in their abstract semantics terminates -- and so you can just evaluate <code>e</code> and see if the result is <code>int</code>.</p></li>
</ol>
<h2 id="types-as-grammar">Types as Grammar</h2>
<p>However, while types can be seen as a way of stating program properties, they have a second life as a way of defining <em>what the valid programs are</em>. A type system can also be seen as a way of statically ruling out certain programs from consideration. That is, an expression like <code>3("hi")</code>, which applies the argument <code>"hi"</code> to the number <code>3</code>, is valid Javascript or Python code, but is not a valid program in OCaml or Java, as it would be rejected at compile time by the typechecker.</p>
<p>As a result, type systems can also be used to simplify the semantics: if a term is statically ruled out, you don't have to give semantics to it. The benefit of doing this is that this can dramatically simplify your static analyses.</p>
<p>For example, if you have a typed functional language with integers and function types, then building an abstract interpretation to do a sign analysis is simplified by the fact that you don't have to worry about function values flowing into the integer domain. You can just take the obvious abstract domains for each type and then rely on the type system to ensure that only programs where everything fits together the way you want are ever allowed, freeing you of the need for a jumbo abstract domain that abstracts all values.</p>
<p>For cultural reasons, this methodology is often used in type systems research, but relatively rarely in static analysis work. Basically, if you tell a type systems researcher that you are changing the language to make it easier to analyze, they will say "oh, okay", but if you tell a static analysis researcher the same thing, they may express a degree of skepticism about your ability to update all Java code everywhere. Naturally, neither reaction is right or wrong: it's contextual. After all, there <em>are</em> billions of lines of Java code that won't change, <strong>and also</strong> the Java language <em>does</em> change from version to version.</p>
<p>However, as researchers we don't just pick up context from our problems, but also from our peers, and so the culture of type systems research tends to privilege new languages and the culture of static analysis research tends to privilege applicability to existing codebases, even though there is no fundamental technical reason for this linkage. After all, all static analyses exploit the structure of the language whenever they can in order to improve precision and scalability.</p>
<p>For example, in a typical intraprocedural dataflow analysis, we initialize all the local variables to the bottom element of the dataflow lattice. This is a sound thing to do, <em>because</em> the language is constrained so that local variables are freshly allocated on each procedure invocation. If, on the other hand, we were doing dataflow analysis of assembly code, different subroutines would all simultaneously affect the dataflow values assigned to each register.</p>
<p>As a consequence, one thing I'm looking forward to is finding out what static analysis researchers will do when they start taking a serious look at Rust. Its type system enforces really strong invariants about aliasing and data races, which ought to open the door to some really interesting analyses.</p>
<h2 id="the-relationship-between-the-two-views">The Relationship Between the Two Views</h2>
<p>As you can see, grammaticality constraints and program analysis are not opposed, and can overlap in effect. Furthermore, this means people writing papers on type systems are often blurry about which perspective they are taking -- in one section of a paper they may talk about types-as-properties, and in another they may talk about types-as-grammar. This ambiguity is technically harmless, but can be confusing to a reader coming to type systems from other areas within formal methods.</p>
<p>The reason this ambiguity is mostly harmless is because for well-designed languages there will be a nice relationship between these two views of what types are. John C Reynolds explained this, in his lovely paper <a href="https://kilthub.cmu.edu/articles/The_Meaning_of_Types_From_Intrinsic_to_Extrinsic_Semantics/6610688"><em>The Meaning of Types: From Intrinsic to Extrinsic Semantics</em></a>. (In his language, "intrinsic types" are types-as-grammar, and "extrinsic types" are types-as-property.)</p>
<p>Basically, he noted that when you give a denotational semantics under a types-as-grammar view, you interpret <em>typing derivations</em>, and when you give a denotational semantics in the types-as-property view, you interpret <em>program terms</em>. For types-as-grammar to make sense, its semantics must be <em>coherent</em>: two typing derivations of the same program must have the same semantics. (Otherwise the semantics of a program can be ambiguous.) But once coherence holds, it becomes feasible to prove the equivalence between the intrinsic and extrinsic semantics, because there is a tight connection between the syntax of the language and the typing derivations.</p>
<p>Noam Zeilberger and Paul-Andre Melliès wrote a POPL 2015 paper, <a href="https://www.irif.fr/~mellies/papers/functors-are-type-refinement-systems.pdf"><em>Functors are Type Refinement Systems</em></a>, where they work out the general semantic structure underpinning Reynolds' work, in terms of the relationship between intersection types (or program logic) and a base language.</p>
<h2 id="compositionality">Compositionality</h2>
<p>Another cultural effect is the importance type systems research places on compositionality. It is both absolutely the case that there is no <em>a priori</em> reason a type system has to be compositional, and also that most type systems researchers -- including myself! -- will be very doubtful of any type system which is not compositional.</p>
<p>To make this concrete, here's an example of a type system which is pragmatic, reasonable, and yet which someone like me would still find distasteful.</p>
<p>Consider a Java-like language whose type system explicitly marks nullability. We will say that method argument <code>(x : C)</code> means that <code>x</code> is a possibly <code>null</code> value of type <code>C</code>, and an argument <code>x : C!</code> means that <code>x</code> is definitely of class <code>C</code>, with no <code>null</code> value. Now, suppose that in this language, we decide that null tests should refine the type of a variable:</p>
<pre><code> // x : C outside the if-block
if (x !== null) {
doSomething(x); // x : C! inside the if-block
} </code></pre>
<p>Basically, we checked that <code>x</code> was not <code>null</code>, so we promoted its type from <code>C</code> to <code>C!</code>. This is a reasonable thing to do, which industrial languages like <a href="https://kotlinlang.org">Kotlin</a> support, and yet you will be hard-pressed to find many formal type theories supporting features like this.</p>
<p>The reason is that the culture of type theory says that type systems should satisfy a <em>substitution principle</em>: a variable stands in for a term, and so substituting a term of the same type as the variable should result in a typeable term. In this case, suppose that <code>e</code> is a very complicated expression of type <code>C</code>. Then if we do a substitution of <code>e</code> for <code>x</code>, we get:</p>
<pre><code> if (e !== null) {
doSomething(e); // e : C! -- but how?
} </code></pre>
<p>Now we somehow have to ensure that <code>e</code> has the type <code>C!</code> inside the scope of the conditional. For variables this requires updating the context, but for arbitrary terms it will be much harder, if it is even possible at all!</p>
<p>The culture of static analysis says that since it's easy for variables, we should handle that case, whereas the culture of type theory says that anything you do for variables should work for general terms. Again, which view is right or wrong is contextual: it's genuinely useful for Kotlin programmers to have dataflow-based null-tracking, but at the same time it's much easier to reason about programs if substitution works.</p>
<p>The reason type theorists have this culture is that type theory has two parents. On one side are Scott and Strachey, who consciously and explicitly based denotational semantics upon the compositionality principle; and the other side comes from the structural proof theory originated by Gerhard Gentzen, which was invented precisely to prove cut-eliminability (which basically just is a substitution principle). As a result, the type-theoretic toolbox is full of tools which both assume and maintain compositionality. So giving that up (or moving past it, depending on how you want to load the wording emotionally) is something which requires very careful judgement.</p>
<p>For example, the <a href="https://docs.racket-lang.org/ts-guide/index.html">Typed Racket</a> language extends the dynamically-typed <a href="https://docs.racket-lang.org/">Racket</a> language with types, and to support interoperation with idiomatic Racket code (which uses control-flow sensitive type tests), they do flow-sensitive typing. At ESOP 2011, Arjun Guha, Claudiu Saftoiu, and Shriram Krishnamurthi wrote a paper about a particularly elegant way of combining flow analysis and typechecking, <a href="https://plasma-seminar.github.io/readings/guha-esop2011.pdf"><em>Typing Local Control and State Using Flow Analysis</em></a>.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Finally, one might wonder whether it is worth it to try developing a unified perspective on type theory and static analysis. It's a lot of work to learn any one of these things, and (for example) David Schmidt's papers, while deeply rewarding, are not for the faint of heart. So maybe we should all specialize and not worry about what members of cousin tribes are doing?</p>
<p>All I can say is that I think I benefited tremendously from the fact that (largely by accident) I ended up learning separation logic, denotational semantics and proof theory as part of my PhD. There actually is a wholeness or consilience to the variety of techniques that the semantics community has devised, and insights apply across domains in surprising ways.</p>
<p>My <a href="https://www.cl.cam.ac.uk/~nk480/parsing.pdf">recent PLDI paper</a> with <a href="https://www.cl.cam.ac.uk/~jdy22/">Jeremy Yallop</a> is a nice example of this. Very little in this paper is hard, but what makes it easy is that we felt free to switch perspectives as we needed.</p>
<p>Because I had learned to like automata theory from reading papers on model checking, I ended up reading about Brüggemann-Klein and Wood's characterization of <a href="http://tr.informatik.uni-freiburg.de/reports/report38/report38.ps.gz">the deterministic regular languages</a>. Then, because I share the aesthetics of denotational semantics, I realized that they invented a superior, compositional characterization of the FOLLOW sets used in LL(1) parsing. Then, because I knew about the importance of substitution from type theory, this lead me to switch from the BNF presentation of grammars, to the mu-regular expressions. This let us offer the traditional parser combinator API, while under the hood having a type system which identified just the LL(1)-parseable grammars. Moreoever, the type inference algorithm we use in the implementation is basically a dataflow analysis. And that set things up in a form to which multistage programming techniques were applicable, letting us derive parser combinators that run faster than yacc.</p>
<p>But beyond the direct applications like that, it is just good to be able to go to a conference and be interested by what other people are doing. Everyone attending has developed deep expertise in some very esoteric subjects, and learning enough to learn from them is valuable. Research communities are long-running conversations, and being able to hold up your end of the conversation makes the community work better.</p>
Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com4tag:blogger.com,1999:blog-8068466035875589791.post-68646940393294910772019-08-12T05:26:00.001-07:002019-08-12T05:26:09.503-07:00Michael Arntzenius on the Future of Coding PodcastMy PhD student <a href="http://www.rntz.net">Michael Arntzenius</a> is on the <a href="https://futureofcoding.org/episodes/040"><em>Future of Coding</em></a> podcast, talking about the things he has learned while developing <a href="http://www.rntz.net/datafun">Datafun</a>. In the course of a PhD, people tend to work on extremely refractory problems, which is a useful discipline for developing and sharpening your worldview. Since Michael is a particularly thoughtful person, he's developed a particularly thoughtful view of computing, and so the interview is well worth reading or listening to.
<blockquote>
<p>This episode explores the intersections between various flavors of math and programming, and the ways in which they can be mixed, matched, and combined. Michael Arntzenius, "rntz" for short, is a PhD student at the University of Birmingham building a programming language that combines some of the best features of logic, relational, and functional programming. The goal of the project is "to find a sweet spot of something that is more powerful than Datalog, but still constrained enough that we can apply existing optimizations to it and imitate what has been done in the database community and the Datalog community." The challenge is combining the key part of Datalog (simple relational computations without worrying too much underlying representations) and of functional programming (being able to abstract out repeated patterns) in a way that is reasonably performant.
<p>This is a wide-ranging conversation including: Lisp macros, FRP, Eve, miniKanren, decidability, computability, higher-order logics and their correspondence to higher-order types, lattices, partial orders, avoiding logical paradoxes by disallowing negation (or requiring monotonicity) in self reference (or recursion), modal logic, CRDTS (which are semi-lattices), and the place for formalism is programming. This was a great opportunity for me to brush up on (or learn for the first time) some useful mathematical and type theory key words. Hope you get a lot out of it as well -- enjoy!
</blockquote>
<p>I'd like to thank <a href="https://stevekrouse.com/">Steve Krouse</a> for interviewing Michael. This interview series is a really nice idea -- a lot of the intuition or worldview underpinning research programs never makes it into a paper, but can be brought out nicely in a conversation.
<p>Also, if you prefer reading to listening, the link also has a <a href="https://futureofcoding.org/episodes/040">a transcript of the podcast</a> available. Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com1tag:blogger.com,1999:blog-8068466035875589791.post-83779206727721221612019-07-11T08:15:00.000-07:002019-07-11T08:16:21.572-07:00All Comonads, All The Time<p>(If you are on the POPL 2020 program committee, please stop reading until after your reviews are done.)</p>
<p>It's been a few months since I've last blogged, so I figure I should say what I have been up to lately. The answer seems to be <strong>comonads</strong>. I have two new draft papers, both of which involve comonadic type theories in a fairly critical way. (I suppose that this makes me a type-theoretic hipster, who doesn't do monads now that they're popular.)</p>
<a name='more'></a>
<ul>
<li><p><a href="https://www.cl.cam.ac.uk/~nk480/popl20-cap-submission.pdf"><em>Recovering Purity with Comonads and Capabilities</em></a></p>
<p>The first paper, written with <a href="https://vikraman.org/">Vikraman Chaudhury</a> has the lovely subtitle <em>The marriage of purity and comonads</em>.</p>
<p>Folk wisdom has it that once your functional language supports impure features, that's game over: there's no way to recover a sublanguage which supports reasoning as if programs are pure, since higher-order functions mean you never know if a function variable could have an effect. So this is why people have historically turned to pure languages, and then isolate effects via a monadic type discipline (like Haskell) or effect system (like <a href="https://koka-lang.github.io/koka/doc/kokaspec.html">Koka</a>).</p>
<p>In this paper, we show that folk wisdom jumped the gun! We start with a small impure language, and then extend it by <em>adding</em> a comonadic type constructor for pure terms. This lets us do the opposite of Haskell: effects are pervasive, except in the bits where we <em>deny</em> the capability to perform them. Interestingly, the word "capability" is critical -- our denotational semantics formalizes the notion of <em>object capabilities</em>, and we show that our type system doesn't just offer the ability to enforce purity, but also notions intermediate between purity and pervasive effects, such as <em>capability safety</em>.</p>
<p>I'm really pleased with this paper, since it pulls together ideas people working in areas as far apart as operating systems and categorical proof theory, and does so with really elementary techniques in semantics. It builds on ideas that Martin Hofmann invented in his 2003 paper <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.478.5395&rep=rep1&type=pdf"><em>Linear Types and Non-Size-Increasing Polynomial Time Computation</em></a>, which is one of the papers that had a really big impact on me as a grad student.</p></li>
<li><p><a href="https://www.cl.cam.ac.uk/~nk480/seminaive-datafun.pdf"><em>Seminaïve Evaluation for a Higher-Order Functional Language</em></a></p>
<p>The next paper was written with <a href="http://www.rntz.net">Michael Arntzenius</a>.</p>
<p>For the last few years, we've been working on a language called <a href="http://www.rntz.net/datafun/"><em>Datafun</em></a>, which is a higher-order version of <a href="https://en.wikipedia.org/wiki/Datalog">Datalog</a>, which is either a query language akin to SQL plus recursion, or a simple bottom-up logic programming language.</p>
<p>We developed <a href="https://www.cl.cam.ac.uk/~nk480/datafun.pdf">a really nice semantics for Datafun</a>, but the thing about query languages is that people want them to go fast. The most important technique for making Datalog go fast is an optimization called <em>semi-naive evaluation</em>, but it was designed for a first-order logic programming language, and Datafun isn't that. So in this paper we worked out how seminaive evaluation should work for higher-order languages, and the answer involves a big old bag of tricks, starting with the <a href="https://www.informatik.uni-marburg.de/~pgiarrusso/ILC/"><em>incremental lambda calculus</em></a>, but also requiring a pretty intricate use of comonads to control how recursion is used and to propagate derivative information throughout the program. (Interestingly, Martin Hofmann also worked on using comonads to stratify recursion in his paper <a href="http://www.dcs.ed.ac.uk/home/mxh/csl97.ps.gz"><em>A Mixed Modal/Linear Lambda Calculus With Applications To Bellantoni-Cook Safe Recursion</em></a>!)</p>
<p>It is super satisfying to develop optimizations that make programs go <em>asymptotically</em> faster, and on top of that this technique feels like the tip of the iceberg -- the design space feels wide open to me, and I suspect static incrementalization techniques are potentially going to end up being as big a hammer as continuation-passing style.</p></li>
</ul>
Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com2tag:blogger.com,1999:blog-8068466035875589791.post-89071138803262310642019-05-13T09:18:00.000-07:002019-05-14T06:33:42.911-07:00Implementing Inverse Bidirectional Typechecking <style type="text/css">code{white-space: pre;}</style>
<style type="text/css">
div.sourceCode { overflow-x: auto; }
table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
margin: 0; padding: 0; vertical-align: baseline; border: none; }
table.sourceCode { width: 100%; line-height: 100%; }
td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
td.sourceCode { padding-left: 5px; }
code > span.kw { color: #007020; font-weight: bold; } /* Keyword */
code > span.dt { color: #902000; } /* DataType */
code > span.dv { color: #40a070; } /* DecVal */
code > span.bn { color: #40a070; } /* BaseN */
code > span.fl { color: #40a070; } /* Float */
code > span.ch { color: #4070a0; } /* Char */
code > span.st { color: #4070a0; } /* String */
code > span.co { color: #60a0b0; font-style: italic; } /* Comment */
code > span.ot { color: #007020; } /* Other */
code > span.al { color: #ff0000; font-weight: bold; } /* Alert */
code > span.fu { color: #06287e; } /* Function */
code > span.er { color: #ff0000; font-weight: bold; } /* Error */
code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
code > span.cn { color: #880000; } /* Constant */
code > span.sc { color: #4070a0; } /* SpecialChar */
code > span.vs { color: #4070a0; } /* VerbatimString */
code > span.ss { color: #bb6688; } /* SpecialString */
code > span.im { } /* Import */
code > span.va { color: #19177c; } /* Variable */
code > span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code > span.op { color: #666666; } /* Operator */
code > span.bu { } /* BuiltIn */
code > span.ex { } /* Extension */
code > span.pp { color: #bc7a00; } /* Preprocessor */
code > span.at { color: #7d9029; } /* Attribute */
code > span.do { color: #ba2121; font-style: italic; } /* Documentation */
code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
</style>
<p>In my last post, I remarked that the inverse bidirectional type system was obviously algorithmic. In <p>In my last post, I remarked that the inverse bidirectional type system was obviously algorithmic. In this post, let's implement it! What follows is a bit of OCaml code implementing the type system of the previous post.</p>
<p>First, let's give a data type to represent the types of the linear type system. As usual, we will have a datatype <code>tp</code> with one constructor for each grammatical production. In the comment next to each constructor, I'll give the term that the constructor corresponds to.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">type</span> tp =
| One <span class="co">(* represents 1 *)</span>
| Tensor <span class="kw">of</span> tp * tp <span class="co">(* represents A ⊗ B *)</span>
| Lolli <span class="kw">of</span> tp * tp <span class="co">(* represents A ⊸ B *)</span> </code></pre></div>
<p>Now, we can give a datatype to represent expressions. We'll represent variables with strings, and use the datatype <code>exp</code> to represent expressions. As before, there is a comment connecting the datatype to the expressions of the grammar.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">type</span> var = <span class="dt">string</span>
<span class="kw">type</span> <span class="dt">exp</span> =
| Unit <span class="co">(* represents () *)</span>
| LetUnit <span class="kw">of</span> <span class="dt">exp</span> * <span class="dt">exp</span> <span class="co">(* represents let () = e in e' *)</span>
| Pair <span class="kw">of</span> <span class="dt">exp</span> * <span class="dt">exp</span> <span class="co">(* represents (e, e') *)</span>
| LetPair <span class="kw">of</span> var * var * <span class="dt">exp</span> * <span class="dt">exp</span> <span class="co">(* represents let (x,y) = e in e' *)</span>
| Lam <span class="kw">of</span> var * <span class="dt">exp</span> <span class="co">(* represents λx. e *)</span>
| App <span class="kw">of</span> <span class="dt">exp</span> * <span class="dt">exp</span> <span class="co">(* represents e e' *)</span>
| Var <span class="kw">of</span> var <span class="co">(* represents x *)</span></code></pre></div>
<p>Now we have to do something annoying, and implement some functions on the option datatype which really should be in the standard library. Basically we just want the standard functional programming structure on option types -- folds, maps, and monadic structure -- so we just go ahead an implement it.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">module</span> Option = <span class="kw">struct</span>
<span class="kw">type</span> 'a t = 'a <span class="dt">option</span>
<span class="kw">let</span> map f = <span class="kw">function</span>
| <span class="dt">None</span> -> <span class="dt">None</span>
| <span class="dt">Some</span> x -> <span class="dt">Some</span> (f x)
<span class="kw">let</span> return x = <span class="dt">Some</span> x
<span class="kw">let</span> fail = <span class="dt">None</span>
<span class="kw">let</span> (>>=) m f =
<span class="kw">match</span> m <span class="kw">with</span>
| <span class="dt">None</span> -> <span class="dt">None</span>
| <span class="dt">Some</span> x -> f x
<span class="kw">let</span> fold some none = <span class="kw">function</span>
| <span class="dt">None</span> -> none
| <span class="dt">Some</span> x -> some x
<span class="kw">end</span></code></pre></div>
<p>Now, we can actually implement the bidirectional typechecker. To understand the implementation, it's actually helpful to understand the interface, first.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">module</span> <span class="kw">type</span> TYPING = <span class="kw">sig</span>
<span class="kw">type</span> ctx = (var * tp <span class="dt">option</span>) <span class="dt">list</span>
<span class="kw">type</span> 'a t = ctx -> ('a * ctx) <span class="dt">option</span>
<span class="kw">val</span> map : ('a -> 'b) -> 'a t -> 'b t
<span class="kw">val</span> return : 'a -> 'a t
<span class="kw">val</span> ( >>= ) : 'a t -> ('a -> 'b t) -> 'b t
<span class="kw">val</span> synth : <span class="dt">exp</span> -> tp t
<span class="kw">val</span> check : <span class="dt">exp</span> -> tp -> <span class="dt">unit</span> t</code></pre></div>
<p>The basic structure of our typechecker is to give a pair of operations <code>check</code> and <code>synth</code>, which respectively check that an expression <code>e</code> has a type <code>tp</code>, and infer a type for an expression. This function will be written in a monadic style, so we also have a type constructor <code>'a t</code> for typechecking computations, and the usual assortment of functorial (<code>map</code>) and monadic (<code>return</code> and <code>>>=</code>) structure for this type.</p>
<p>The monadic type constructor <code>'a t</code> is a pretty basic state-and-exception monad. It plumbs the context (of type <code>ctx</code>) through the computation, and can either return a value and an updated context, or it will fail.</p>
<p>An interesting feature of this context representation is that it does not map variables to types – it maps them to the option type <code>tp option</code>. This is because of the way that the moding will work out; the type is an <em>output</em> of the typing relation, and so when we put a variable into the context, we will not give it a type, and use the computation to ascribe it a type, which will be reflected in the output context. This is also why we use a full state monad rather than a reader monad for the context – we are basically implementing part of Prolog's substitution threading here.</p>
<p>We will also need a number of operations to implement the typechecker.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">val</span> fail : 'a t
<span class="kw">val</span> checkvar : var -> tp -> <span class="dt">unit</span> t
<span class="kw">val</span> lookup : var -> tp t
<span class="kw">val</span> withvar : var -> 'a t -> 'a t
<span class="kw">val</span> tp_eq : tp -> tp -> <span class="dt">unit</span> t
<span class="kw">end</span></code></pre></div>
<p>We will need to <code>fail</code> in order to judge programs ill-typed. The <code>checkvar x tp</code> operation gives the variable <code>x</code> the type <code>tp</code>. The <code>lookup x</code> operation will look in the context to find a a type for <code>x</code>, failing if <code>x</code> has not yet been given a type. The operation <code>withvar x m</code> will run the monadic computation <code>m</code> in a context extended with the variable <code>x</code>. (No type is given for the variable, because it's the job of <code>m</code> to give the variable a type.) Finall, there's an equality test <code>tp_eq tp1 tp2</code>, that acts as a guard, failing if the two arguments are unequal.</p>
<p>Now, we can move on to the actual implementation.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">module</span> Typing : TYPING = <span class="kw">struct</span>
<span class="kw">type</span> ctx = (var * tp <span class="dt">option</span>) <span class="dt">list</span>
<span class="kw">type</span> 'a t = ctx -> ('a * ctx) <span class="dt">option</span>
<span class="kw">let</span> map f m ctx =
<span class="kw">let</span> <span class="kw">open</span> Option <span class="kw">in</span>
m ctx >>= <span class="kw">fun</span> (x, ctx) ->
return (f x, ctx)
<span class="kw">let</span> return x = <span class="kw">fun</span> ctx -> <span class="dt">Some</span>(x, ctx)
<span class="kw">let</span> (>>=) m f = <span class="kw">fun</span> ctx ->
<span class="kw">let</span> <span class="kw">open</span> Option <span class="kw">in</span>
m ctx >>= <span class="kw">fun</span> (a, ctx') ->
f a ctx'</code></pre></div>
<p>As promised, the computation type is a state-and-exception monad, and the implementation of <code>map</code> and the monadic unit and bind are pretty unsurprising. More interesting are the implementations of the actual operations in the monadic interface.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> fail : 'a t = <span class="kw">fun</span> ctx -> <span class="dt">None</span> </code></pre></div>
<p>Failure is easy to implement – it just ignores the context, and then returns <code>None</code>.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> <span class="kw">rec</span> checkvar (x : var) (tp : tp) : <span class="dt">unit</span> t = <span class="kw">fun</span> ctx ->
<span class="kw">let</span> <span class="kw">open</span> Option <span class="kw">in</span>
<span class="kw">match</span> ctx <span class="kw">with</span>
| [] -> fail
| (y, <span class="dt">None</span>) :: rest <span class="kw">when</span> x = y -> return ((), (y, <span class="dt">Some</span> tp) :: rest)
| (y, <span class="dt">Some</span> _) :: rest <span class="kw">when</span> x = y -> fail
| h :: rest -> checkvar x tp rest >>= <span class="kw">fun</span> ((), rest') ->
return ((), h :: rest')</code></pre></div>
<p>The way that <code>checkvar x tp</code> works is that it iterates through the variables in the context, looking for the hypothesis which matches the variable <code>x</code>. When it finds it, it returns an updated context with the type of <code>x</code> set to <code>Some tp</code>. If the variable is already set, then that means that this is the second use of the variable, and so <code>checkvar</code> fails – this enforces the property that variables are used <em>at most</em> one time. If the variable isn't in the context, then <code>checkvar</code> also fails, because this is an out-of-scope variable reference. All other hypotheses are left unchanged.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> lookup x (ctx : ctx) =
<span class="kw">match</span> <span class="dt">List</span>.assoc_opt x ctx <span class="kw">with</span>
| <span class="dt">None</span> -> Option.fail
| <span class="dt">Some</span> <span class="dt">None</span> -> Option.fail
| <span class="dt">Some</span> (<span class="dt">Some</span> tp) -> Option.return(tp, ctx)</code></pre></div>
<p>The <code>lookup x</code> computation is even simpler – it returns <code>tp</code> if <code>(x, Some tp)</code> is in the context, and fails otherwise.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> withvar (<span class="kw">type</span> a) (x : var) (m : a t) : a t = <span class="kw">fun</span> ctx ->
<span class="kw">let</span> <span class="kw">open</span> Option <span class="kw">in</span>
m ((x, <span class="dt">None</span>) :: ctx) >>= <span class="kw">function</span>
| (r, (y, <span class="dt">Some</span> _) :: ctx') <span class="kw">when</span> x = y -> return (r, ctx')
| (r, (y, <span class="dt">None</span>) :: ctx') <span class="kw">when</span> x = y -> fail
| _ -> <span class="kw">assert</span> <span class="kw">false</span></code></pre></div>
<p>The <code>withvar x m</code> operation extends the context with the variable <code>x</code>, and then runs <code>m</code> in the extended context.</p>
<p>An invariant our context representation maintains is that the output context has exactly the same variables in exactly the same order as the input context, and so we just pop off the first variable of the output context before returning, checking to make sure that the type of the variable has been set (i.e., <code>Some _</code>) to ensure that the variable was used <em>at least</em> one time. In conjunction with <code>checkvar</code> ensuring that the variable is used <em>at most</em> one time, this will ensure that each variable is used exactly one time.</p>
<p>If the first variable of the output context isn't <code>x</code>, or if the output context is empty, then our invariant is broken, and so we signal an assertion failure.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> tp_eq (tp1 : tp) (tp2 : tp) = <span class="kw">if</span> tp1 = tp2 <span class="kw">then</span> return () <span class="kw">else</span> fail </code></pre></div>
<p>The <code>type_eq tp1 tp2</code> function just turns a boolean test into a guard. Now, we can go through the synthesis and checking functions clause-by-clause:</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> <span class="kw">rec</span> synth = <span class="kw">function</span>
| Unit -> return One</code></pre></div>
<p>We synthesize the unit type for the unit value.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> | Pair(e1, e2) -> synth e1 >>= <span class="kw">fun</span> tp1 ->
synth e2 >>= <span class="kw">fun</span> tp2 ->
return (Tensor(tp1, tp2))</code></pre></div>
<p>To synthesize a type for a pair, we synthesize types for each of the components, and then return their tensor product.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> | Lam(x, e) -> withvar x (synth e >>= <span class="kw">fun</span> ret_tp ->
lookup x >>= <span class="kw">fun</span> arg_tp ->
return (Lolli(arg_tp, ret_tp)))</code></pre></div>
<p>Functions are interesting, because we need to deal with variables, and evaluation order plays out in a neat way here. We infer a type <code>ret_tp</code> for the body <code>e</code>, and then we look up the type <code>tp_arg</code> that the body <code>e</code> ascribed to the variable <code>x</code>. This lets us give a type <code>Lolli(tp_arg, tp_ret)</code> for the whole function.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> | LetUnit(e, e') -> check e One >>= <span class="kw">fun</span> () ->
synth e'</code></pre></div>
<p>To synthesize a type for unit elimination, we synthesize a type for the body, and check that the scrutinee has the unit type <code>One</code>.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> | LetPair(x, y, e, e') ->
withvar y (withvar x (synth e' >>= <span class="kw">fun</span> res_tp ->
lookup x >>= <span class="kw">fun</span> tp1 ->
lookup y >>= <span class="kw">fun</span> tp2 ->
check e (Tensor(tp1, tp2)) >>= <span class="kw">fun</span> () ->
return res_tp))</code></pre></div>
<p>To eliminate a pair, we introduce (using <code>withvar</code>) scopes for the variables <code>x</code> and <code>y</code>, and then:</p>
<ol style="list-style-type: decimal">
<li>We synthesize a type <code>res_tp</code> for the continuation <code>e'</code>.</li>
<li>Since <code>e'</code> used <code>x</code> and <code>y</code>, we can look up the types they were used at (binding the type of <code>x</code> to <code>tp1</code> and the type of <code>y</code> to <code>tp2</code>).</li>
<li>Then, we check that the scrutinee <code>e</code> has the type <code>Tensor(tp1, tp2)</code>.</li>
<li>Finally, we return the type <code>res_tp</code> for the type of the whole expression.</li>
</ol>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> | App(_, _) -> fail
| Var _ -> fail </code></pre></div>
<p>Since applications and variable references are checking, not synthesizing, we <code>fail</code> if we see one of them in synthesizing position. If they are in checking position, we can use the <code>check</code> function to typecheck them:</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">and</span> check (e : <span class="dt">exp</span>) (tp : tp) : <span class="dt">unit</span> t =
<span class="kw">match</span> e <span class="kw">with</span>
| Var x -> checkvar x tp </code></pre></div>
<p>The variable case simply uses <code>checkvar</code>.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> | App(e1, e2) -> synth e2 >>= <span class="kw">fun</span> tp_arg ->
check e1 (Lolli(tp_arg, tp))</code></pre></div>
<p>To check an application <code>e1 e2</code> at a type <code>tp</code>, we first synthesize the argument type by inferring a type <code>tp_arg</code> for <code>e2</code>, and then we check that <code>e1</code> has the function type <code>Lolli(tp_arg, tp)</code>.</p>
<div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> | e -> synth e >>= tp_eq tp
<span class="kw">end</span></code></pre></div>
<p>Finally, when we find a synthesizing term in checking position, we infer a type for it and then see if it is equal to what we expected.</p>
<p>This code is, at-best, lightly-tested, but I knocked together a <a href="https://github.com/neel-krishnaswami/inverse-bidirectional-typechecking">small Github repository</a> with the code. Enjoy!</p>
Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com2tag:blogger.com,1999:blog-8068466035875589791.post-28549785976383842052019-05-10T15:45:00.000-07:002019-05-16T03:32:28.319-07:00Inverting Bidirectional Typechecking<script type="text/javascript" async
src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-MML-AM_CHTML">
</script>
<p><span class="math display">\[\newcommand{\bnfalt}{\;\;|\;\;}
\newcommand{\To}{\Rightarrow}
\newcommand{\From}{\Leftarrow}
\newcommand{\rule}[2]{\frac{\displaystyle \begin{matrix} #1 \end{matrix}}{\displaystyle #2}}
\newcommand{\check}[3]{{#1} \vdash {#2} \From {#3}}
\newcommand{\synth}[3]{{#1} \vdash {#2} \To {#3}}
\newcommand{\lam}[1]{\lambda {#1}.\,}
\newcommand{\inj}[2]{\mathsf{in}_{#1}({#2})}
\newcommand{\case}[5]{\mathsf{case}({#1}, \inj{1}{#2} \to {#3}, \inj{2}{#4} \to {#5})}
\newcommand{\match}[2]{\mathsf{match}\;{#1}\;\mathsf{of}\;[ #2 ]}
\newcommand{\arm}[2]{{#1} \to {#2}}
\newcommand{\downshift}[1]{\downarrow{#1}}
\newcommand{\upshift}[1]{\uparrow{#1}}
\newcommand{\thunk}[1]{\left\{{#1}\right\}}
\newcommand{\return}[1]{\mathsf{return}\;#1}
\newcommand{\fun}[1]{\lambda\;#1}
\newcommand{\checkn}[3]{{#1} \rhd {#2} \From {#3}}
\newcommand{\checkp}[2]{{#1} \leadsto {#2}}
\newcommand{\spine}[4]{{#1} \vdash {#2} : {#3} \gg {#4}}
\newcommand{\tensor}{\otimes}
\newcommand{\lolli}{\multimap}
\newcommand{\letv}[2]{\mathsf{let}\;{#1}={#2}\;\mathsf{in}\;}
\newcommand{\unit}{\langle\rangle}
\newcommand{\letunit}[1]{\letv{\unit}{#1}}
\newcommand{\pair}[2]{\left\langle{#1}, {#2}\right\rangle}
\newcommand{\letpair}[3]{\letv{\pair{#1}{#2}}{#3}}\]</span></p>
<p>In the traditional recipe for bidirectional typechecking, introduction forms are checked, and the principal subterm of elimination forms are inferred. However, a while back <a href="https://noamz.org">Noam Zeilberger</a> remarked to me that in multiplicative linear logic, bidirectional typechecking worked just as well if you did it backwards. It is worth spelling out the details of this remark, and so this blog post.</p>
<p>First, let's give the types and grammar of multiplicative linear logic.</p>
<p><span class="math display">\[
\begin{array}{llcl}
\mbox{Types} & A & ::= & 1 \bnfalt A \tensor B \bnfalt A \lolli B \\
\mbox{Terms} & e & ::= & x \bnfalt \lam{x}{e} \bnfalt e\,e' \\
& & | & \unit \bnfalt \letunit{e}{e'} \\
& & | & \pair{e}{e'} \bnfalt \letpair{x}{y}{e}{e'} \\
\mbox{Contexts} & \Gamma & ::= & \cdot \bnfalt \Gamma, x \From A \\
\end{array}
\]</span></p>
<p>Our types are the unit type <span class="math inline">\(1\)</span>, the tensor product <span class="math inline">\(A \tensor B\)</span>, and the linear function space <span class="math inline">\(A \lolli B\)</span>. The unit and pair have the expected introduction forms <span class="math inline">\(\unit\)</span> and <span class="math inline">\(\pair{e}{e'}\)</span>, and they have "pattern matching" style elimination forms. Functions are introduced with lambdas <span class="math inline">\(\lam{x}{e}\)</span> and eliminated with applications <span class="math inline">\(e\,e'\)</span> as usual, and of course variable references <span class="math inline">\(x\)</span> as usual. Contexts are a bit unusual -- they pair together variables and their types as usual, but instead of treating a variable as a placeholder for a <em>synthesizing</em> term, we treat variables as placeholders for <em>checking</em> terms.</p>
<p>Now, let's go through the typing rules. First, we give the introduction and elimination rules for the unit type.</p>
<p><span class="math display">\[
\begin{array}{ll}
\rule{ }
{ \synth{\cdot}{\unit}{1} }
&
\rule{ \synth{\Delta}{e'}{A} &
\check{\Gamma}{e}{1}
}
{ \synth{\Gamma, \Delta}{\letunit{e}{e'}}{A} }
\\[1em]
\end{array}
\]</span></p>
<p>The introduction rule says that in an empty context, the unit value <span class="math inline">\(\unit\)</span> <em>synthesizes</em> the type <span class="math inline">\(1\)</span>. The pattern-matching style elimination <span class="math inline">\(\letunit{e}{e'}\)</span> typechecks as follows. First, we infer a type <span class="math inline">\(A\)</span> for the body <span class="math inline">\(e'\)</span>, and then we check that the scrutinee <span class="math inline">\(e\)</span> has the unit type <span class="math inline">\(1\)</span>.</p>
<p>This order is <em>backwards</em> from traditional bidirectional systems -- we synthesize a type for the continuation first, before checking the type of the data we are eliminating. In the case of units, this is a mere curiosity, but it gets more interesting with the tensor product type <span class="math inline">\(A \tensor B\)</span>.</p>
<p><span class="math display">\[
\begin{array}{ll}
\rule{ \synth{\Gamma}{e}{A} &
\synth{\Delta}{e'}{B} }
{ \synth{\Gamma, \Delta}{\pair{e}{e'}}{A \tensor B} }
&
\rule{ \synth{\Gamma, x \From A, y \From B}{e'}{C} &
\check{\Delta}{e}{A \tensor B} }
{ \synth{\Gamma, \Delta}{\letpair{x}{y}{e}{e'}}{C} }
\end{array}
\]</span></p>
<p>Now, the synthesis for pairs remains intuitive. For a pair <span class="math inline">\(\pair{e}{e'}\)</span>, we first infer a type <span class="math inline">\(A\)</span> for <span class="math inline">\(e\)</span>, and a type <span class="math inline">\(B\)</span> for <span class="math inline">\(e'\)</span>, and then conclude that the pair has the type <span class="math inline">\(A \tensor B\)</span>. However, the typing of the pair elimination <span class="math inline">\(\letpair{x}{y}{e}{e'}\)</span> is much wilder.</p>
<p>In this rule, we <em>first</em> check that the continuation <span class="math inline">\(e'\)</span> has the type <span class="math inline">\(C\)</span>, and then we learn from typechecking <span class="math inline">\(e'\)</span> that <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span> were required to have had types <span class="math inline">\(A\)</span> and <span class="math inline">\(B\)</span> respectively. This gives us the data that we need to check that <span class="math inline">\(e\)</span> has the type <span class="math inline">\(A \tensor B\)</span>.</p>
<p>The linear function type <span class="math inline">\(A \lolli B\)</span> has a similar character:</p>
<p><span class="math display">\[
\begin{array}{ll}
\rule{ \synth{\Gamma, x \From A}{e}{B} }
{ \synth{\Gamma}{\lam{x}{e}}{A \lolli B} }
&
\rule{ \synth{\Gamma}{e'}{A} &
\check{\Delta}{e}{A \lolli B} }
{ \check{\Gamma, \Delta}{e\,e'}{B} }
\end{array}
\]</span></p>
<p>Here, to infer at type for the introduction form <span class="math inline">\(\lam{x}{e}\)</span>, we infer a type <span class="math inline">\(B\)</span> for the body <span class="math inline">\(e\)</span>, and then look up what type <span class="math inline">\(A\)</span> the parameter <span class="math inline">\(x\)</span> was required to be for the body to typecheck. To check that an application <span class="math inline">\(e\,e'\)</span> has the type <span class="math inline">\(B\)</span>, we infer a type <span class="math inline">\(A\)</span> for the argument <span class="math inline">\(e'\)</span>, and then check that the function <span class="math inline">\(e\)</span> has the function type <span class="math inline">\(A \lolli B\)</span>.</p>
<p>Again, the checking/synthesis mode of thse rules are precisely reversed from usual bidirectional type systems. We can see how this reversal plays out for variables below:</p>
<p><span class="math display">\[
\begin{array}{ll}
\rule{ }
{ \check{x \From A}{x}{A} }
&
\rule{ \synth{\Gamma}{e}{A} & A = B}
{ \check{\Gamma}{e}{B} }
\end{array}
\]</span></p>
<p>Here, when we check that the variable <span class="math inline">\(x\)</span> has the type <span class="math inline">\(A\)</span>, the context must be such that it demands <span class="math inline">\(x\)</span> to have the type <span class="math inline">\(A\)</span>. (However, the switch between checking and synthesis is the same as ever.)</p>
<p>If you are used to regular bidirectional systems, the information flow in the variable rule (as well as for pattern matching for pairs and lambda-abstraction for functions) is a bit unexpected. We are used to having a context tell us what types each variable has. However, in this case we are not doing that! Instead, we use it to record the types that the rest of the program requires<br />
variables to have.</p>
<p>This is still a "well-moded" program in the sense of logic programming. However, the moding is a bit more exotic now -- within a context, the variables are <em>inputs</em>, but their types are <em>outputs</em>. This is a bit fancier than the mode systems that usual logic programming languages have, but people have studied mode systems which can support this (such as Uday Reddy's <a href="http://www.cs.bham.ac.uk/~udr/papers/directional.pdf"><em>A Typed Foundation for Directional Logic Programming</em></a>).</p>
<p>As far as the metatheory of this system goes, I don't know very much about it. Substitution works fine -- you can easily prove a theorem of the form:</p>
<blockquote>
<p><strong>Theorem</strong> <em>(Substitution)</em> If <span class="math inline">\(\check{\Delta}{e}{A}\)</span>, then</p>
<ol style="list-style-type: decimal">
<li>If <span class="math inline">\(\check{\Gamma, x \From A, \Theta}{e'}{C}\)</span> then <span class="math inline">\(\check{\Gamma, \Delta, \Theta}{[e/x]e'}{C}\)</span>.</li>
<li>If <span class="math inline">\(\synth{\Gamma, x \From A, \Theta}{e'}{C}\)</span> then <span class="math inline">\(\synth{\Gamma, \Delta, \Theta}{[e/x]e'}{C}\)</span></li>
</ol>
</blockquote>
<p>However, I don't presently know a good characterization of the kind of terms are typable under this discipline. E.g., in the standard bidirectional presentation, the annotation-free terms are precisely the <span class="math inline">\(\beta\)</span>-normal terms. However, in the reverse bidirectional system, that is not the case. <!-- In particular, an application like <span class="math inline">\((\lam{x}{x})\,\unit\)</span> will check at the type <span class="math inline">\(1\)</span>, and this is an obvious <span class="math inline">\(\beta\)</span>-redex.</p> -->
<p>Two papers that seem closely related to this system are:</p>
<ol style="list-style-type: decimal">
<li><p>Adam Chlipala, Leaf Petersen, and Robert Harper's TLDI 2005 paper, <a href="http://adam.chlipala.net/papers/StrictTLDI05/"><em>Strict Bidirectional Type Checking</em></a>, and</p></li>
<li><p>Ningning Xie and Bruno C. d. S. Oliveira's ESOP 2018 paper, <a href="https://xnning.github.io/papers/let-arguments-go-first.pdf"><em>Let Arguments Go First</em></a>.</p></li>
</ol>
<p>Adam and company's paper includes the traditional synthesizing bidirectional hypotheses, as well as checking hypotheses very similar to the ones in this post, but inspired by relevance logic rather than linear logic. The basic idea is that if a hypothesis is relevant, then it is okay to let checking determine its type, since we are guaranteed that the variable will appear in some checking context (which will tell us what type it should have). The same idea applies here, since linearity necessarily implies relevance.</p>
<p>Ningning and Bruno's paper has an application rule that looks exactly like the one in this paper -- argument types are synthesized, which permits inferring the type of a function head in a checking context. However, their system is focused on inferring polymorphic types, which makes the precise relationship a bit unclear to me.</p>
<p>The implementation of reverse bidirectionality is just as easy as traditional bidirectional systems, but I will leave that for my next post.</p>
Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com3