Semantic Domain — Neel Krishnaswami (http://www.blogger.com/profile/06853898957395028131)

TypeFoundry: new ERC Consolidator Grant (2020-12-14)<p>I am very pleased to have <a href="https://erc.europa.eu/news/CoG-recipients-2020">received an ERC Consolidator Grant</a> for my TypeFoundry proposal. <p>This will be a five year project to develop the foundations of bidirectional type inference. If you are interested in pursuing a PhD in this area, or conversely, are finishing a PhD in this area, please get in touch! <blockquote><p> Many modern programming languages, whether developed in industry, like Rust or Java, or in academia, like Haskell or Scala, are typed. All the data in a program is classified by its type (e.g., as strings or integers), and at compile-time programs are checked for consistent usage of types, in a process called type checking. Thus, the expression <code>3 + 4</code> will be accepted, since the + operator takes two numbers as arguments, but the expression <code>3 + "hello"</code> will be rejected, as it makes no sense to add a number and a string. Though this is a simple idea, sophisticated type systems can track properties like algorithmic complexity, data-race freedom, differential privacy, and data abstraction.</p> <p>In general, programmers must annotate programs to tell compilers the types to check. In theoretical calculi, it is easy to demand enough annotations to trivialize typechecking, but this can make the annotation burden unbearable: often larger than the program itself! So, to transfer results from formal calculi to new programming languages, we need type inference algorithms, which reconstruct missing data from partially-annotated programs.</p> <p>However, the practice of type inference has outpaced its theory. 
Compiler authors have implemented many type inference systems, but the algorithms are often ad-hoc or folklore, and the specifications they are meant to meet are informal or nonexistent. This makes it hard to learn how to implement type inference, hard to build alternative implementations (whether for new compilers or analysis engines for IDEs), and hard for programmers to predict whether refactorings will preserve typability.</p> <p>In <b>TypeFoundry</b>, we will use recent developments in proof theory and semantics (like polarized type theory and call-by-push-value) to identify the theoretical structure underpinning type inference, and use this theory to build a collection of techniques for type inference capable of scaling up to the advanced type system features in both modern and future languages. </p></blockquote> <p> One of the things that makes me happy about this (beyond the obvious benefits of research funding and the recognition from my peers) is that it shows off the international character of science. I'm an Indian-American researcher, working in the UK, and being judged and funded by researchers in Europe. There has been a truly worrying amount of chauvinism and jingoism in public life recently, and the reminder that cosmopolitanism and universalism are real things too is very welcome. <p>Also, if you are planning on submitting a Starting Grant or Consolidator proposal to the ERC in the coming year about programming languages, verification or the like, please feel free to get in touch, and I'll be happy to share advice. 
Church Encodings, Inductive Types, and Relational Parametricity (2020-12-02)<p>My blogging has been limited this past year due to RSI, but I do not want to leave things entirely fallow, and last year I wrote an email which can be edited into a decent enough blog post.</p><p>Quite often, people will hear that System F, the polymorphic lambda calculus, satisfies a property called <em>relational parametricity</em>. We also often hear people say that the parametricity property of System F lets you prove that a Church encoding of an inductive datatype actually satisfies all the equational properties we expect of inductive types.</p><p>But it's surprisingly hard to find an explicit account of how you go from the basic properties of parametricity (the relational interpretation of System F and the identity extension principle) to the proof that inductive types are definable. I learned how this works from Bob Atkey's paper <a href="https://bentnib.org/fomega-parametricity.html"><em>Relational Parametricity for Higher Kinds</em></a>, but the proofs are kind of terse, since his page count was mostly focused on the new stuff he had invented.</p><p>In the sequel, I'm going to assume that you know what the relational model of System F looks like, that the term calculus satisfies <em>the abstraction theorem</em> (i.e., the fundamental theorem of logical relations), and also that the model satisfies the <em>identity extension property</em> -- if you take the relational interpretation of a type <code>B(α)</code>, and fill in the type variable <code>α</code> with the equality relation for the type <code>A</code>, then the relational interpretation of <code>B[A/α]</code> will be the equality relation for the type <code>B[A/α]</code>. 
In what follows I will often write <code>Id[X]</code> to mean the equality relation for the type <code>X</code>.</p><p>For completeness' sake, I also give a quick summary of the model at the end of the post.</p><h2 id="haskell-style-functors-in-system-f">Haskell-style functors in System F</h2><p>Given all this preface, we now define a <em>functor</em> <code>F</code> as a type expression <code>F(-)</code> with a hole, along with an operation <code>fmap : ∀a,b. (a → b) → F(a) → F(b)</code>, such that</p><pre><code>fmap _ _ id = id<br />fmap _ _ f ∘ fmap _ _ g = fmap _ _ (f ∘ g)</code></pre><p>I've written _ to indicate System F type arguments I'm suppressing. Basically think of these as Haskell-style functors.</p><p>Next, we can define the inductive type <code>μF</code> using the usual Church encoding as:</p><pre><code>μF = ∀a. (F(a) → a) → a<br /><br />foldF : ∀a. (F(a) → a) → μF → a<br />foldF = Λa. λf : F(a) → a. λx:μF. x [a] f<br /><br />inF : F(μF) → μF<br />inF x = Λa. λf : F(a) → a. <br />  let g : μF → a = foldF [a] f in<br />  let h : F(μF) → F(a) = fmap [μF] [a] g in<br />  let v : F(a) = h x in<br />  f v</code></pre><p>I've written out <code>inF</code> with type-annotated local bindings to make it easier to read. If you inlined all the local bindings and suppressed type annotations, then it would read:</p><pre><code>inF : F(μF) → μF<br />inF x = Λa. λf : F(a) → a. <br />  f (fmap (foldF f) x)</code></pre><h2 id="f-algebra-homomorphisms-and-the-graph-lemma">F-algebra Homomorphisms and the Graph Lemma</h2><p>Now, we can prove a lemma about F-algebra homomorphisms, which is the tool we will use to prove initiality. But what are F-algebra homomorphisms?</p><p>An <em>F-algebra</em> is a pair <code>(X, g : F(X) → X)</code>. 
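<p>To make the encoding concrete, here is a hypothetical Python sketch (not part of the original development; all names are mine) of the Church encoding above, instantiated at the list functor <code>F(a) = 1 + Int × a</code>, together with an example F-algebra that sums a list:</p>

```python
# Hypothetical rendering of the Church encoding above, at F(a) = 1 + Int * a.
# An F(a) value is either None (the "nil" case) or a pair (n, a).

def fmap(f, fa):
    # Functorial action: apply f at the recursive position, if there is one.
    return None if fa is None else (fa[0], f(fa[1]))

# A value of muF = forall a. (F(a) -> a) -> a is a function awaiting an algebra.
def nil():
    return lambda k: k(None)

def cons(n, xs):
    return lambda k: k((n, xs(k)))

def foldF(k, e):
    # foldF k e = e [a] k: the fold is basically the identity.
    return e(k)

def inF(x):
    # inF x = /\a. lambda k. k (fmap (foldF k) x), as in the post.
    return lambda k: k(fmap(lambda e: foldF(k, e), x))

# An example F-algebra (Int, sum_alg : F(Int) -> Int).
def sum_alg(fa):
    return 0 if fa is None else fa[0] + fa[1]

xs = cons(1, cons(2, cons(3, nil())))
print(foldF(sum_alg, xs))  # 6
```

<p>The beta rule proved later in the post can also be checked directly on this sketch: <code>foldF(sum_alg, inF((4, xs)))</code> and <code>sum_alg(fmap(lambda e: foldF(sum_alg, e), (4, xs)))</code> both evaluate to 10.</p>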
An <em>F-algebra homomorphism</em> between two F-algebras <code>(X, k : F(X) → X)</code> and <code>(Y, k' : F(Y) → Y)</code> is a function <code>f : X → Y</code> such that the following ASCII-art diagram commutes:</p><pre><code>F(X) ——— k ——→ X<br />  |            |<br />fmap X Y f     f<br />  ↓            ↓<br />F(Y) ——— k' —→ Y</code></pre><p>That is, for all <code>u : F(X)</code>, we want <code>f(k u) = k'(fmap [X] [Y] f u)</code>.</p><p>Before we prove something about homomorphisms, we need to prove a technical lemma called the <em>graph lemma</em>, which will let us connect parametricity with our functorial definitions.</p><p><strong>Graph Lemma</strong>: Let <code>(F, fmap)</code> be a functor, and <code>h : A → B</code> be a function. Define the <em>graph relation</em> <code><h></code> to be the relation <code>{(a,b) | b = h(a) }</code>. Then <code>F<h> ⊆ <fmap [A] [B] h></code>.</p><p>Proof:</p><ol style="list-style-type: decimal"><li>First, note that <code>fmap</code> has the type <code>∀a b. (a → b) → F(a) → F(b)</code>.</li><li><p>Since <code>fmap</code> is parametric, we can instantiate its type variables <code>a</code> and <code>b</code> with the relations <code><h></code> and <code>Id[B]</code>, giving us:</p><pre><code>(fmap [A] [B], fmap [B] [B]) ∈ (<h> → Id[B]) → (F<h> → Id[F(B)])</code></pre><p>(Here we use identity extension to identify <code>F Id[B]</code> with <code>Id[F(B)]</code>.)</p></li><li>Note that <code>(h, id) ∈ <h> → Id[B]</code>.</li><li>Hence <code>(fmap [A] [B] h, fmap [B] [B] id) ∈ (F<h> → Id[F(B)])</code></li><li>By functoriality, <code>fmap [B] [B] id = id</code>, and so <code>(fmap [A] [B] h, id) ∈ (F<h> → Id[F(B)])</code></li><li>Assume <code>(x, y) ∈ F<h></code>. 
<ol style="list-style-type: decimal"><li>Hence <code>(fmap [A] [B] h x, id y) ∈ Id[F(B)]</code></li><li>So <code>fmap [A] [B] h x = y</code>.</li></ol></li><li>Therefore <code>(x, y) ∈ <fmap [A] [B] h></code>.</li><li><p>Therefore <code>F<h> ⊆ <fmap [A] [B] h></code>.</p></li></ol><p>Graph relations are just functions viewed as relations, and the graph lemma tells us that the relational semantics of a type constructor applied to a graph relation <code><h></code> will behave like the implementation of the <code>fmap</code> term. In other words, it connects the relational semantics to the code implementing functoriality. (As an aside, this feels like it should be an equality, but I only see how to prove the inclusion.)</p><p>We use this lemma in the proof of the homomorphism lemma, which we state below:</p><p><strong>Homomorphism Lemma</strong>: Let <code>(F, fmap)</code> be a functor. Given two F-algebras <code>(A, k : F(A) → A)</code> and <code>(B, k' : F(B) → B)</code>, and an F-algebra homomorphism <code>h : (A,k) → (B,k')</code>, then for all <code>e : μF</code>, we have</p><pre><code>e [B] k' = h (e [A] k)</code></pre><p>Proof:</p><ol style="list-style-type: decimal"><li><p>First, by the parametricity of <code>e</code>, we know that</p><pre><code> (e [A], e [B]) ∈ (F<h> → <h>) → <h></code></pre></li><li>We want to apply the arguments <code>(k, k')</code> to <code>(e [A], e [B])</code>, so we have to show that <code>(k, k') ∈ F<h> → <h></code>.</li><li><p>To show this, assume that <code>(x, y) ∈ F<h></code>.</p><ol style="list-style-type: decimal"><li>Now we have to show that <code>(k x, k' y) ∈ <h></code>.</li><li>Unfolding the definition of <code><h></code>, we need <code>(h (k x), k' y) ∈ Id[B]</code>.</li><li>Since <code>h</code> is an F-algebra homomorphism, we have that <code>h (k x) = k' (fmap [A] [B] h x)</code>.</li><li>So we need to show <code>(k' (fmap [A] [B] h x), k' y) ∈ Id[B]</code>.</li><li>Now, we know that <code>(x, y) ∈ F<h></code>.</li><li>By the 
graph lemma, we know <code>F<h> ⊆ <fmap [A] [B] h></code>.</li><li>So <code>(x, y) ∈ <fmap [A] [B] h></code>.</li><li>Unfolding the definition, we know <code>y = fmap [A] [B] h x</code>.</li><li>So we want <code>(k' (fmap [A] [B] h x), k' (fmap [A] [B] h x)) ∈ Id[B]</code>.</li><li>Since <code>Id[B]</code> is an equality relation, this is immediate.</li></ol></li><li>Hence <code>(k, k') ∈ F<h> → <h></code>.</li><li>Therefore <code>(e [A] k, e [B] k') ∈ <h></code>.</li><li><p>Unfolding the definition of <code><h></code>, we know <code>e [B] k' = h(e [A] k)</code>. This is what we wanted to show.</p></li></ol><h3 id="discussion">Discussion:</h3><p>The whole machinery of F-algebra homomorphisms basically exists to phrase the commuting conversions in a nice way. We just proved that for <code>e : μF</code>, we have</p><pre><code> e [B] k' = h (e [A] k)</code></pre><p>Recall that for Church encoded inductive types, the fold is basically the identity, so this result is equivalent (up to beta) to:</p><pre><code> foldF [B] k' e = h (foldF [A] k e)</code></pre><p>So this lets us shift contexts out of iterators if they are F-algebra homomorphisms. Note that this proof was also the one where the graph lemma is actually used.</p><h2 id="the-beta-and-eta-rules-for-inductive-types">The Beta and Eta Rules for Inductive Types</h2><p>Next, we'll prove the beta- and eta-rules for inductive types. I'll do it in three steps:</p><ul><li>The beta rule (eg, <code>let (x,y) = (e1, e2) in e' == [e1/x, e2/y]e'</code>),</li><li>The basic eta rule (eg, <code>let (x,y) = e in (x,y) == e</code>)</li><li>The commuting conversions (eg, <code>C[e] = let (x,y) = e in C[(x,y)]</code>)</li></ul><p><strong>Theorem (beta rule)</strong>: If <code>k : F(A) → A</code> and <code>e : F(μF)</code>, then</p><pre><code> foldF [A] k (inF e) = k (fmap [μF] [A] (foldF [A] k) e)</code></pre><p>Proof: This follows by unfolding the definitions and beta-reducing. 
(You don't even need parametricity for this part!)</p><p>This shows that for any F-algebra <code>(A, k)</code>, <code>foldF [A] k</code> is an F-algebra homomorphism from <code>(μF, inF)</code> to <code>(A, k)</code>.</p><p><strong>Theorem (basic eta)</strong>: For all <code>e : μF</code>, we have <code>e = e [μF] inF</code></p><p>Proof:</p><ol style="list-style-type: decimal"><li>Assume an arbitrary <code>B</code> and <code>k : F(B) → B</code>. <ol style="list-style-type: decimal"><li>Note <code>h = foldF [B] k</code> is an F-algebra homomorphism from <code>(μF, inF)</code> to <code>(B, k)</code> by the beta rule.</li><li>Then by our lemma, <code>e [B] k = h (e [μF] inF)</code>.</li><li>Unfolding, <code>h (e [μF] inF) = foldF [B] k (e [μF] inF)</code>.</li><li>Unfolding, <code>foldF [B] k (e [μF] inF) = e [μF] inF [B] k</code>.</li><li>So <code>e [B] k = e [μF] inF [B] k</code>.</li></ol></li><li>So for all <code>B</code> and <code>k : F(B) → B</code>, we have <code>e [B] k = e [μF] inF [B] k</code>.</li><li>By extensionality, <code>e = e [μF] inF</code>.</li></ol><p><strong>Theorem (commuting conversions)</strong>: If <code>k : F(A) → A</code> and <code>f : μF → A</code> and <code>f</code> is an F-algebra homomorphism from <code>(μF, inF)</code> to <code>(A, k)</code>, then <code>f = foldF [A] k</code>.</p><p>Proof:</p><ol style="list-style-type: decimal"><li>We want to show <code>f = foldF [A] k</code>.</li><li>Unfolding <code>foldF</code>, we want to show <code>f = λx:μF. x [A] k</code></li><li>Assume <code>e : μF</code>. <ol style="list-style-type: decimal"><li>Now we will show <code>f e = e [A] k</code>.</li><li>By the homomorphism lemma, and the fact that <code>f : (μF, inF) → (A, k)</code>, we know that <code>e [A] k = f (e [μF] inF)</code>.</li><li>By the basic eta rule, <code>e [A] k = f e</code></li></ol></li><li>So for all <code>e</code>, we have <code>f e = e [A] k</code>.</li><li>Hence by extensionality <code>f = λx:μF. 
x [A] k</code>.</li></ol><p>Note that what I (as a type theorist) called "commuting conversions" is exactly the same as "initiality" (to a category theorist), so now we have shown that the Church encoding of a functor actually lets us define the initial algebra.</p><p>Thus, we know that inductive types are definable!</p><h2 id="appendix-the-relational-model-of-system-f">Appendix: the Relational Model of System F</h2><p>In this appendix, I'll quickly sketch one particular model of System F, basically inspired by the PER model of Longo and Moggi. What we will do is to define types as <em>partial equivalence relations</em> (PERs for short) on terms. Recall that a partial equivalence relation <code>A</code> on a set <code>X</code> is a relation on <code>X</code> which is (a) symmetric (i.e., if <code>(x, x') ∈ A</code> then <code>(x',x) ∈ A</code>), and (b) transitive (if <code>(x,y) ∈ A</code> and <code>(y,z) ∈ A</code> then <code>(x,z) ∈ A</code>).</p><p>We take the set of semantic types to be the set of PERs on terms which are closed under evaluation in either direction:</p><pre><code>Type = { X ∈ PER(Term) | ∀x, x', y, y'. <br /> if x ↔∗ x' and y ↔∗ y' <br /> then (x,y) ∈ X ⇔ (x',y') ∈ X}</code></pre><p>PERs form a complete lattice, with the meet given by intersection, which is what lets us interpret the universal quantifier of System F. We can then take a <em>type environment</em> <code>θ</code> to be a map from variables to semantic types, and interpret types as follows:</p><pre><code>〚a〛θ ≜ θ(a)<br />〚A → B〛θ ≜ { (e, e') ∈ Term × Term | ∀ (t,t') ∈ 〚A〛θ. <br /> (e t,e' t') ∈ 〚B〛θ }<br />〚∀a. A〛θ ≜ ⋂_{X ∈ Type} 〚A〛(θ, X/a) </code></pre><p>This is the <em>PER model of System F</em>, which is quite easy to work with but not quite strong enough to model parametricity. 
To model parametricity, we need a <em>second</em> semantics for System F, the so-called relational semantics.</p><p>A relation <code>R</code> between two PERs <code>A</code> and <code>B</code> over a set <code>X</code> is a relation on <code>X</code> which respects <code>A</code> and <code>B</code>: that is, for all <code>(a,a') ∈ A</code> and <code>(b,b') ∈ B</code>, we have <code>(a, b) ∈ R</code> if and only if <code>(a', b') ∈ R</code>.</p><p>Now, suppose we have two environments <code>θ₁</code> and <code>θ₂</code> sending type variables to types, and a relation environment <code>ρ</code> that sends each type variable <code>a</code> to a relation respecting <code>θ₁(a)</code> and <code>θ₂(a)</code>. Then we can define the <em>relational interpretation</em> of System F types as follows:</p><pre><code>R〚a〛θ₁ θ₂ ρ ≜ ρ(a)<br />R〚A → B〛θ₁ θ₂ ρ ≜ { (e, e') ∈ Term × Term | ∀ (t,t') ∈ R〚A〛θ₁ θ₂ ρ. <br /> (e t,e' t') ∈ R〚B〛θ₁ θ₂ ρ }<br />R〚∀a. A〛θ₁ θ₂ ρ ≜ ⋂_{X, Y ∈ Type, R ∈ Rel(X,Y)} <br /> R〚A〛(θ₁, X/a) (θ₂, Y/a) (ρ, R/a)</code></pre><p>Additionally, we need to redefine the PER model's interpretation of the <code>∀a.A</code> case, to use the relational model:</p><pre><code>〚∀a. A〛θ ≜ ⋂_{X, Y ∈ Type, R ∈ Rel(X,Y)} <br /> R〚A〛θ θ (Id[θ], R/a)</code></pre><p>Here, to give a PER interpretation for the forall type, we use the relational interpretation, duplicating the type environment and using the identity relation for each variable (written <code>Id[θ]</code>). 
By identity relation, we mean that given a PER <code>A</code>, the PER <code>A</code> is itself a relation between <code>A</code> and <code>A</code> which respects <code>A</code>.</p><p>This modified definition ensures the model satisfies the <em>identity extension property</em> -- if you take the relational interpretation of a type <code>B(α)</code>, and fill in the type variable <code>α</code> with the equality relation for the type <code>A</code>, then the relational interpretation of <code>B[A/α]</code> will be the equality relation for the type <code>B[A/α]</code>. In what follows I will often write <code>Id[X]</code> to mean the equality relation for the type <code>X</code>.</p><p>The term calculus for System F also satisfies <em>the abstraction property</em> (aka the fundamental property of logical relations), which says that given a well typed term <code>Θ; · ⊢ e : A</code>, two type environments <code>θ₁</code> and <code>θ₂</code>, and a relation environment <code>ρ</code> between them, <code>e</code> is related to itself in <code>R〚A〛θ₁ θ₂ ρ</code>.</p>PLDI 2020 Conference Report (2020-06-19)<p>I just finished "attending" <a href="https://conf.researchr.org/home/pldi-2020">PLDI 2020</a>, a programming languages conference. Like many conferences in computer science, due to COVID-19, on short notice the physical conference had to be cancelled and replaced with an online virtual conference. Talks were streamed to Youtube, questions were posted to a custom Slack channel, there was a startling variety of online chat applications to hold discussions with, and so on.</p><p>It was obviously an enormous amount of work to put together (there was even a custom chat application, <a href="https://www.clowdr.org">Clowdr</a>, written specifically for SIGPLAN(?) 
conferences), which makes me feel very unkind to report that for me, the online conference experience was a complete waste of time.</p><p>So, what went wrong?</p><p>To understand this, it is worth thinking about what the purpose of a conference is. The fundamental purpose of a conference is to meet people with shared interests and talk to them. The act of talking to people is fundamental, since it is how (a) you get to see new perspectives about subjects you are interested in, and (b) how you build the social networks that make it possible for you to become and remain an expert. (Recall the 19th century economist Alfred Marshall's description of this process: "The mysteries of the trade become no mysteries; but are as it were in the air, and children learn many of them unconsciously.”)</p><p>Even talks -- the things we build conferences around -- exist to facilitate this process of conversation. Talks at conferences really have two important functions: first, the topic of a talk is a filtering device, which helps identify the subset of the audience which is interested in this topic. This means that in the break after the talk, it is now much easier to find people to talk to who share interests with you.</p><p>Second, talks supply Schelling-style focal points: you and your interlocutor have common knowledge that you are both interested in the topic of the session, and you both also know that you saw the talk, which gives you a subject of conversation. (Note: as a speaker, you do communicate with your audience, but indirectly, as your talk becomes the seed of a conversation between audience members, which they will use to develop their own understandings.)</p><p>The fundamental problem with PLDI 2020 was that it was frustratingly hard to <em>actually talk to people</em>. 
The proliferation of timezones and chat applications meant that I found it incredibly difficult to actually find someone to talk to -- at least one of the apps (gather.town) just never worked for me, sending me into an infinite sign-up loop, and the others were either totally empty of people who were actually there, or did not seem suited for conversation. (One key issue is that many people log in to an app, and then stop paying attention to it, which makes presence indications useless.)</p><p>So if social distancing and travel restrictions due to COVID-19 remain in force (as seems likely), I think it would be better to simply convert our PL conferences fully into journals, and look for other ways to make online networking happen. The sheer amount of labour going into PLDI 2020 supplies strong evidence that simply trying to replicate a physical conference online cannot be made to work with any amount of effort.</p><p>However, before I convince everyone that I am totally right about everything, there are still online conferences that will happen -- for example, ICFP in August. So to make sure that I can look forward to the experience rather than dreading it, I will need to make a plan to actually make sure I talk to people.</p><p>So, if you are reading this, and are (or know) a researcher in PL who would like to talk, then give me a ping and we can arrange something for ICFP.</p><ul><li><p>I am explicitly happy to talk to graduate students and postdocs about their research.</p></li><li><p>I know the most about dependent types, linear and modal types, type inference, separation logic, and program verification.</p></li><li><p>I know less about (but still like talking about) compiler stuff, theorem proving, SMT, Datalog, macros, and parsing.</p></li><li><p>This list is not an exhaustive list of things I'm interested in. 
One of the nice things about conversations is that you can use shared touchstones to discover how to like new things.</p></li></ul><p>Once you get in touch, I'll put you on a list, and then once the conference talks are listed, we can organize a time to have a chat after we have all seen a talk we are interested in. (Having seen a talk means we will all have shared common knowledge fresh in our minds.) Assuming enough people are interested, I will aim for meetings of 3-5 people, including myself.</p>Thought Experiment: An Introductory Compilers Class (2020-02-13)<p>Recently, I read a blog post in which Ben Karel <a href="https://eschew.wordpress.com/2020/01/26/undergrad-compilers-from-the-hive-mind/">summarized the reaction to a request John Regehr made about how to teach compilers</a>, and as one might predict, the Internet was in agreement that the answer was "absolutely everything". Basically, everyone has a different perspective on what the most important thing is, and so the union of everyone's most important thing is everything.</p><p>In fact, from a certain point of view, I don't disagree. From parsing to typechecking to intermediate representations to dataflow analysis to register allocation to instruction selection to data representations to stack layouts to garbage collectors to ABIs to debuggers, basically everything in compilers is packed to the gills with absolutely gorgeous algorithms and proofs. The sheer density of wonderful stuff in compilers is just out of this world.</p><p>However, John specified an <em>introductory</em> course. Alas, this makes "everything" the wrong answer -- he is asking for a pedagogy-first answer which fits coherently into a smallish number of lectures. 
This means that we have to start with the question of what we want the students to learn, and then build the curriculum around that.</p><h1 id="the-goal-of-teaching-compilers">The Goal of Teaching Compilers</h1><p>So, what should students have learned after taking a compilers class?</p><p>The obvious (but wrong) answer is "how to write a compiler", because in all likelihood they will forget most of the techniques for implementing compilers soon after the course ends. This is because most of them – unless they catch the compiler bug – will not write very many more compilers. So if we want them to come away from a compilers class with some lasting knowledge, we have to think in terms of deeper programming techniques which (a) arise naturally when writing compilers, but (b) apply in more contexts than just compilers.</p><p>There are two obvious candidates for such techniques:</p><ol style="list-style-type: decimal"><li>Computing values by recursion over inductively-defined structures.</li><li>Computing least fixed points of monotone functions by bottom-up iteration.</li></ol><p>Both of these are fundamental algorithmic techniques. However, I think that recursion over inductive structure is much more likely to be taught outside of a compilers course, especially considering that modern functional programming (i.e., datatypes and pattern matching) is now much more often taught elsewhere in the curriculum than it used to be.</p><p>This leaves fixed-point computations.</p><p>So let's choose the goal of an intro compilers class to be teaching fixed point computations many times over, in many different contexts, with the idea that the general technique of fixed point computation can be learned via generalizing from many specific examples. 
The fact that they are building a compiler is significant primarily because it means that the different examples can all be seen in a motivating context.</p><p>So, this drives some choices for the design of the course.</p><ol style="list-style-type: decimal"><li><p>Our goal is to show how fixed point computations arise in many different contexts, and the stages of the compilation pipeline naturally supply a non-contrived collection of very different-looking contexts. Consequently, we should design the course in a full front-to-back style, covering all the phases of the compilation pipeline.</p><p>However, this style of course has the real weakness that it can result in shallow coverage of each topic. Mitigating this weakness drives the design of the rest of the course.</p></li><li><p>In particular, the compiler structure should be very opinionated.</p><p>The thing we <em>want</em> to do is to present a variety of options (LR versus LL parsing, graph colouring versus linear scan register allocation, CPS versus SSA for IRs, etc), and then present their strengths and weaknesses, and the engineering and design considerations which will lead you to favour one choice over the other.</p><p>But for this course, we won't do any of that. As an instructor, we will simply choose the algorithm, and in particular choose the one that most benefits from fixed point computations.</p><p>The existence of alternatives should be gestured at in lecture, but the student-facing explanation for why we are not exploring them is that for their first compiler we are aiming for good choices, biased much more towards implementation simplicity than gcc or LLVM would be. 
(This will be an easier pitch if there is an advanced compilers course, or if students have a final year project where they can write a fancy compiler.)</p></li><li><p>We will also need to bite the bullet and de-emphasize the aspects of compilation where fixed point computations are less relevant.</p><p>Therefore, we will not cover runtime systems and data structure layouts in any great detail. This substantially affects the design of the language to compile -- in particular, we should not choose a language that has closures or objects. Furthermore, we will just tell students what stack layout to use, and memory management will be garbage collection via the Boehm GC.</p></li></ol><h1 id="coursecompiler-structure">Course/Compiler Structure</h1><p>For the rest of this note, I'll call the language to compile Introl, because I always liked the Algol naming convention. Introl is a first-order functional language, basically what you get if you removed nested functions, lambda expressions, references, and exceptions from OCaml. I've attached a sketch of this language at the end of the post.</p><p>This language has two features which are "hard" -- polymorphism and datatypes. The reason for the inclusion of polymorphism is that it makes formulating type inference as a fixed point problem interesting, and the reason datatypes exist is because (a) match statements offer interesting control flow, and (b) they really show off what sparse conditional constant propagation can do.</p><p>So the course will then follow the top-down lexing to code generation approach that so many bemoan, but which (in the context of our goals) is actually totally appropriate.</p><h2 id="lexing-and-parsing">Lexing and Parsing</h2><p>For lexing, I would start with the usual regular expressions and NFAs thing, but then take a bit of a left turn. 
First, I would show them that state set sizes could explode, and then introduce them to <a href="https://www.sciencedirect.com/science/article/pii/S0890540197926882">Brüggemann-Klein and Derick Wood's <em>deterministic regular expressions</em></a> as a way of preventing this explosion.</p><p>The reason for this is that the conditions essentially check whether a regular expression is parseable by recursive descent without backtracking -- i.e., you calculate NULLABLE, FIRST and (a variant of) FOLLOW sets for the regular expression. This lets you explain what these sets mean in a context without recursion or fixed points, which makes it easy to transition to LL(1) grammars, which are fundamentally just deterministic regular languages plus recursion.</p><p>So then the calculation of these sets as a fixed point equation is very easy, and using the deterministic regular languages means that the explanation of what these sets mean can be decoupled from how to compute them via a fixed point computation.</p><p>Naturally, this means that the grammar of Introl must be LL(1).</p><h2 id="typechecking">Typechecking</h2><p>Typechecking for this kind of language is pretty routine, but in this case it should be phrased as an abstract interpretation, in the style of Cousot's <a href="https://www.irif.fr/~mellies/mpri/mpri-ens/articles/cousot-types-as-abstract-interpretations.pdf"><em>Types as Abstract Interpretation</em></a>.</p><p>The interesting thing here is that polymorphism can be presented via what Cousot called "the Herbrand abstraction". The idea is that the abstract elements are monotypes with unification variables in them, with the partial order that <code>t₂ ≥ t₁</code> if there is a substitution <code>σ</code> such that <code>t₂ = σ(t₁)</code>, and the unification algorithm itself as an <em>attempt</em> to calculate the substitution witnessing the join of two terms. I say attempt, since unification can fail. 
So this is a kind of <em>partial</em> join operation -- if you cannot join the two terms, there must have been a type error!</p><p>In Introl as presented, top-level functions have a type annotation, and so it will work out that you end up not needing to do a serious fixed point computation to infer types. Indeed, even if you omitted annotations, the fact that unification is calculating a most general unifier means that the fixed point iteration for recursive functions terminates in one iteration. (This should not be too surprising since the Damas-Milner algorithm only needs one traversal of the syntax.)</p><p>This fact is worth working out, because a great deal of life in static analysis involves trying to find excuses to iterate less. Indeed, this is what motivates the move to SSA – which will be the very next phase of the compiler.</p><h2 id="conversion-to-ssacps">Conversion to SSA/CPS</h2><p>The choice of IR is always a fraught one in a compiler class. In this course, I would use a static single assignment (SSA) representation. SSA is a good choice because (a) it simplifies implementing all kinds of dataflow optimizations, (b) generating it also needs dataflow analyses, and (c) it is the IR of choice for serious compilers. (a) and (b) mean we get to do lots of fixed point computations, and (c) means it won't feel artificial to today's yoof.</p><p>However, I wouldn't go directly to SSA, since I find φ-functions very difficult to explain directly. IMO, it is worth pulling a trick out of <a href="https://www.cs.purdue.edu/homes/suresh/502-Fall2008/papers/kelsey-ssa-cps.pdf">Richard Kelsey's hat</a>, and exploiting the correspondence between continuation-passing style (CPS) and static single assignment form (SSA).</p><p>CPS often comes with all sorts of mysticism and higher-order functions, but in this case, it works out very simply.
Basically, we can let-normalize the program, and then transform the surface language into <em>basic blocks with arguments</em> (or if you prefer, a bunch of functions tail-calling each other). This is the version of SSA that the MLton compiler used, and which (if memory serves) the Swift SIL intermediate representation uses as well.</p><p>Roughly speaking, each basic block in the program will end up having zero or more formal arguments, and jumps to that block pass arguments which fill in those formals. This ends up making it super easy to explain, because the program just goes into let-normal form, and all tail calls get compiled to jumps-with-arguments, and non-tail calls use a call IR instruction.</p><p>Now, if we do this maximally naively – which we <strong>absolutely</strong> want to do – we will get nearly pessimal SSA, since each basic block will take as arguments all the variables in its scope. The reason we do this is to make the need for SSA minimization obvious: we just want to shrink each block's parameter lists so it doesn't take parameters for variables which either don't vary or don't get used.</p><p>Doing this will take us to something close to "pruned SSA" form, which would normally be overkill for a first compiler, except that we want to use it to motivate the computation of the dominator tree. Because of the either-or nature of the shrinking, we can do this with two analyses, a liveness analysis and a constancy analysis.</p><p>Both of these are dataflow analyses, and we can show how to use the dominator tree to order the work so as to calculate the fixed point faster. This will justify doing the fixed point computation to calculate the dominance information we need.</p><p>There is a bit of redundancy here, which is deliberate – I think it's important to have done two analyses before calculating dominators, because doing one fixed point computation to speed up one fixed point computation may feel artificial.
But doing one computation that speeds up two computations is easy to see the benefit of, especially if we phrase the fixed point computation as taking the transfer function plus the dominance information.</p><p>I would avoid using one of the fast dominance algorithms, because pedagogically it's easier to explain the calculation when it is close to the definition.</p><p>The other nice thing here is that typechecking got accelerated by a good choice of abstract domain, and flow analysis gets accelerated by a good choice of iteration order.</p><p>Once the students have some experience manipulating this representation, I would probably switch to the traditional SSA form. The main benefit of this is to make it easier for them to read the existing literature.</p><h2 id="high-level-optimizations">High-level Optimizations</h2><p>One thing worth spelling out is that this language (like MLton's SSA) has a high-level SSA representation – switches, data constructors, and field selectors and things like that will all be part of the SSA IR. This makes it possible to do optimizations while thinking at the language level, rather than the machine level. And furthermore, since students have a decent SSA representation, we certainly should use it to do all the "easy" SSA optimizations, like copy propagation and loop-invariant code motion.</p><p>All of these are unconditional rewrites when in SSA form, so they are all easy to implement, and will get the students comfortable with manipulating the SSA representation. The next step is to implement dead code elimination, which is both a nice optimization, and also requires them to re-run liveness analysis.
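Dead code elimination itself is another little fixed point: a pure definition is dead unless some effectful instruction transitively uses its result. Here is one way to sketch it over a hypothetical mini-IR (the record type and the example program are mine, purely illustrative, not the course's actual representation):

```ocaml
(* A sketch of dead code elimination as a fixed point: iterate a
   "neededness" analysis until no new variables get marked, then drop
   pure instructions whose results are not needed.  The IR is hypothetical. *)
type instr = {
  def : string option;   (* variable defined, if any *)
  uses : string list;    (* variables read *)
  effectful : bool;      (* must be kept regardless (e.g. print, store) *)
}

let eliminate_dead (prog : instr list) =
  let needed = Hashtbl.create 16 in
  let keep i =
    i.effectful ||
    (match i.def with Some x -> Hashtbl.mem needed x | None -> false)
  in
  let changed = ref true in
  while !changed do
    changed := false;
    List.iter
      (fun i ->
        if keep i then
          List.iter
            (fun u ->
              if not (Hashtbl.mem needed u) then begin
                Hashtbl.replace needed u ();
                changed := true
              end)
            i.uses)
      prog
  done;
  List.filter keep prog

let () =
  let prog = [
    { def = Some "a"; uses = [];    effectful = false };  (* a := 1        *)
    { def = Some "b"; uses = ["a"]; effectful = false };  (* b := a + 1    *)
    { def = Some "c"; uses = [];    effectful = false };  (* c := 7 (dead) *)
    { def = None;     uses = ["b"]; effectful = true  };  (* print b       *)
  ] in
  let live = eliminate_dead prog in
  (* c := 7 is eliminated; everything feeding the print survives. *)
  assert (List.length live = 3);
  assert (not (List.exists (fun i -> i.def = Some "c") live))
```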
This will open the door to understanding that compilation passes may have to be done repeatedly.</p><p>Once the students have done that, they will be warmed up for the big "hero optimization" of the course, which will be <a href="http://www.cs.wustl.edu/~cytron/531Pages/f11/Resources/Papers/cprop.pdf">sparse conditional constant propagation</a>. Because we are working with a high-level IR, sparse conditional constant propagation ought to yield even more impressive results than usual, with lots of tuple creation/pattern matching and cases on constructors disappearing in a really satisfying way.</p><p>In particular, good examples ought to arise from erasing the <code>None</code> branches from code using the option monad for safe division <code>safediv : int → int → option int</code>, if there is a test that the divisors are nonzero dominating the divisions.</p><h2 id="lowering-and-register-allocation">Lowering and Register Allocation</h2><p>As I mentioned before, the big optimization was performed on a high-level SSA form: switch statements and data constructors and selectors are still present. Before we can generate machine code, they will have to be turned into lower-level operations. So we can define a "low-level" SSA, where the selections and so on are turned into memory operations, and then translate the high-level IR into the low-level IR.</p><p>This ought to be done pretty naively, with each high-level operation turning into its low-level instruction sequence in a pretty direct way. Since the course design was to do as many dataflow optimizations as we could, to bring out the generic structure, a good test of whether the development is smooth enough is whether the flow analyses are sufficiently parameterized that we can change the IR and lattices and still get decent code reuse. Then we can lean on these analyses to clean up the low-level instruction sequences.</p><p>If more abstract code is too hard to understand, I would skip most of the low-level optimizations.
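The sort of parameterization I have in mind can be sketched as a generic worklist solver, where the lattice operations and the transfer function are the inputs and the driver is shared between analyses; the names and the toy liveness instance below are mine, purely for illustration:

```ocaml
(* A sketch of a generic worklist fixed-point solver.  An analysis is a
   lattice (bottom/join/equal) plus a per-block transfer function, so one
   driver serves liveness, constancy, etc.  All names are illustrative. *)
module IntSet = Set.Make (Int)

type 'fact analysis = {
  bottom : 'fact;
  join : 'fact -> 'fact -> 'fact;
  equal : 'fact -> 'fact -> bool;
  transfer : int -> 'fact -> 'fact;  (* block -> incoming fact -> outgoing fact *)
}

(* [solve a deps n] finds the least fixed point over blocks 0..n-1, where
   [deps b] lists the blocks whose facts flow into [b] (the successors, for
   a backward analysis such as liveness). *)
let solve a deps n =
  let facts = Array.make n a.bottom in
  let work = Queue.create () in
  for b = 0 to n - 1 do Queue.add b work done;
  while not (Queue.is_empty work) do
    let b = Queue.pop work in
    let input =
      List.fold_left (fun acc d -> a.join acc facts.(d)) a.bottom (deps b) in
    let out = a.transfer b input in
    if not (a.equal out facts.(b)) then begin
      facts.(b) <- out;
      (* Requeue every block that consumes b's fact. *)
      for b' = 0 to n - 1 do
        if List.mem b (deps b') then Queue.add b' work
      done
    end
  done;
  facts

let () =
  (* Liveness on a two-block CFG: block 0 defines x, then jumps to block 1,
     which uses x and y.  Variables are numbered x = 0, y = 1. *)
  let use = [| IntSet.empty; IntSet.of_list [0; 1] |] in
  let def = [| IntSet.singleton 0; IntSet.empty |] in
  let succs = function 0 -> [1] | _ -> [] in
  let liveness = {
    bottom = IntSet.empty;
    join = IntSet.union;
    equal = IntSet.equal;
    transfer =
      (fun b live_out -> IntSet.union use.(b) (IntSet.diff live_out def.(b)));
  } in
  let live_in = solve liveness succs 2 in
  assert (IntSet.equal live_in.(1) (IntSet.of_list [0; 1]));
  (* Only y is live into block 0, since block 0 defines x itself. *)
  assert (IntSet.equal live_in.(0) (IntSet.singleton 1))
```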
The only must-do analysis at the low-level representation is to re-run liveness analysis, so that we can do register allocation using <a href="https://publikationen.bibliothek.kit.edu/1000007166/6532">the SSA-form register allocation algorithms of Hack</a>. This (a) lets us avoid graph colouring, (b) while still getting good results, and (c) depends very obviously on the results of a dataflow analysis. It also makes register coalescing (normally an advanced topic) surprisingly easy.</p><p>Code generation is then pretty easy – once we've done register allocation, we can map the low-level IR operations to loads, stores, and jumps in the dirt-obvious way; calls and returns doing the obvious stack manipulations; and foreign calls to the Boehm gc to allocate objects. This is not optimal, but most of the techniques I know of in this space don't use fixed points much, and so are off-piste relative to the course aim.</p><h1 id="perspectives">Perspectives</h1><p>The surprising (to me, anyway) thing is that the choice of focusing on fixed point computations plus the choice to go SSA-only lays out a path to a surprisingly credible compiler. Before writing this note I would have thought this was definitely too ambitious a compiler for a first compiler! But now, I think it only may be too ambitious.</p><p>Obviously, we benefit a lot from working with a first-order functional language: in many ways, it is just an alternate notation for SSA. However, it <em>looks</em> different from SSA, and it <em>looks</em> like a high-level language, which means the students ought to feel like they have accomplished something.</p><p>But before I would be willing to try teaching it this way, I would want to program up a compiler or two in this style. This would either confirm that the compiler is tractable, or would show that the compiler is just too big to be teachable. If it is too big, I'd probably change the surface language to only support booleans and integers as types.
Since these values all fit inside a word, we could drop the lowering pass altogether.</p><p>Another thing I like is that we are able to do a plain old dataflow analysis to parse, and then a dataflow analysis with a clever representation of facts when typechecking, and then dataflow analyses with a clever iteration scheme when doing optimization/codegen.</p><p>However, if this ends up being too cute and confusion-prone, another alternative would be to drop type inference by moving to a simply typed language, and then drop SSA in favour of the traditional Kildall/Allen dataflow approach. You could still teach the use of acceleration structures by computing def-use chains and using them for (e.g.) liveness. This would bring out the parallels between parsing and flow analysis.</p><p>Relative to an imaginary advanced compilers course, we're missing high-level optimizations like inlining, partial redundancy elimination, and fancy loop optimizations, as well as low-level things like instruction scheduling. We're also missing an angle for tackling control-flow analysis for higher-order languages featuring objects or closures.</p><p>But since we were able to go SSA straight away, the students will be in a good position to learn this on their own.</p><h1 id="introl-sketch">Introl sketch</h1><h2 id="type-structure">Type structure</h2><p>The type language has</p><ul><li>Integers <code>int</code></li><li>Unit type <code>unit</code></li><li>Tuples <code>T_1 * ... * T_n</code></li><li><p>Sum types via datatype declarations:</p><pre><code>datatype list a = Nil | Cons of a * list a </code></pre></li></ul><h2 id="program-structure">Program structure</h2><h3 id="top-level-definitions">Top level definitions</h3><p>All function declarations are top level. Nested functions and anonymous lambda-expressions are not allowed. Function declarations have a type scheme, which may be polymorphic:</p><pre><code> val foo : forall a_1, ..., a_n. 
(T_1, ..., T_n) -> T <br /> def foo(x_1, ..., x_n) = e </code></pre><h3 id="expressions">Expressions</h3><p>Expressions are basically the usual things:</p><ul><li>variables <code>x</code></li><li>constants <code>3</code></li><li>arithmetic <code>e_1 + e_2</code></li><li>unit <code>()</code></li><li>tuples <code>(e_1, ..., e_n)</code></li><li>data constructors <code>Foo(e)</code></li><li>variable bindings <code>let p = e_1; e_2</code></li><li>direct function calls <code>f(e_1, ..., e_n)</code>, where <code>f</code> is a top-level function name</li><li>match statements <code>match e { p_1 -> e_1 | ... p_n -> e_n }</code></li></ul><p>Note the absence of control structures like <code>while</code> or <code>for</code>; we will use tail-recursion for that. Note also that there are no explicit type arguments in calls to polymorphic functions.</p><h3 id="patterns">Patterns</h3><p>Patterns exist, but nested patterns do not, in order to avoid having to deal with pattern compilation. They are:</p><ul><li>variables <code>x</code></li><li>constructor patterns <code>Foo(x_1, ..., x_n)</code></li><li>tuple patterns <code>(x_1, ..., x_n)</code></li></ul><h3 id="example">Example</h3><p>Here's a function that reverses a list:</p><pre><code>val reverse_acc : forall a. (list a, list a) -> list a <br />def reverse_acc(xs, acc) = <br /> match xs {<br /> | Nil -> acc<br /> | Cons(x, xs) -> reverse_acc(xs, Cons(x, acc))<br /> }<br /><br />val reverse : forall a. list a -> list a <br />def reverse(xs) = reverse_acc(xs, Nil)</code></pre><p>Here's a function to zip two lists together:</p><pre><code>val zip : forall a, b. 
(list a, list b) -> list (a * b)<br />def zip(as, bs) = <br /> match as { <br /> | Nil -> Nil<br /> | Cons(a, as') -> <br /> match bs { <br /> | Nil -> Nil<br /> | Cons(b, bs') -> Cons((a,b), zip(as',bs'))<br /> }<br /> }</code></pre><p>Note the lack of nested patterns vis-a-vis ML.</p>Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com5tag:blogger.com,1999:blog-8068466035875589791.post-53437500462566506842019-10-30T03:07:00.000-07:002019-10-30T03:12:26.080-07:00Every Finite Automaton has a Corresponding Regular Expression <style type="text/css">code{white-space: pre;}</style> <style type="text/css">div.sourceCode { overflow-x: auto; } table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode { margin: 0; padding: 0; vertical-align: baseline; border: none; } table.sourceCode { width: 100%; line-height: 100%; background-color: #f8f8f8; } td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; } td.sourceCode { padding-left: 5px; } pre, code { background-color: #f8f8f8; } code > span.kw { color: #204a87; font-weight: bold; } /* Keyword */ code > span.dt { color: #204a87; } /* DataType */ code > span.dv { color: #0000cf; } /* DecVal */ code > span.bn { color: #0000cf; } /* BaseN */ code > span.fl { color: #0000cf; } /* Float */ code > span.ch { color: #4e9a06; } /* Char */ code > span.st { color: #4e9a06; } /* String */ code > span.co { color: #8f5902; font-style: italic; } /* Comment */ code > span.ot { color: #8f5902; } /* Other */ code > span.al { color: #ef2929; } /* Alert */ code > span.fu { color: #000000; } /* Function */ code > span.er { color: #a40000; font-weight: bold; } /* Error */ code > span.wa { color: #8f5902; font-weight: bold; font-style: italic; } /* Warning */ code > span.cn { color: #000000; } /* Constant */ code > span.sc { color: #000000; } /* SpecialChar */ code > span.vs { color: #4e9a06; } /* VerbatimString */ code > span.ss { color: 
#4e9a06; } /* SpecialString */ code > span.im { } /* Import */ code > span.va { color: #000000; } /* Variable */ code > span.cf { color: #204a87; font-weight: bold; } /* ControlFlow */ code > span.op { color: #ce5c00; font-weight: bold; } /* Operator */ code > span.pp { color: #8f5902; font-style: italic; } /* Preprocessor */ code > span.ex { } /* Extension */ code > span.at { color: #c4a000; } /* Attribute */ code > span.do { color: #8f5902; font-weight: bold; font-style: italic; } /* Documentation */ code > span.an { color: #8f5902; font-weight: bold; font-style: italic; } /* Annotation */ code > span.cv { color: #8f5902; font-weight: bold; font-style: italic; } /* CommentVar */ code > span.in { color: #8f5902; font-weight: bold; font-style: italic; } /* Information */ </style> <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_CHTML-full" type="text/javascript"></script><!-- To compile this file, use: pandoc -s --highlight-style tango --mathjax --filter graphviz.py kleene.md -o kleene.html --><p>Today I'd like to blog about the nicest proof of Kleene's theorem that regular expressions and finite automata are equivalent that I know. One direction of this proof -- that every regular expression has a corresponding finite automaton -- is very well-known to programmers, since <a href="https://en.wikipedia.org/wiki/Thompson%27s_construction">Thompson's construction</a>, which converts regular expressions to nondeterministic automata, and <a href="https://en.wikipedia.org/wiki/Powerset_construction">the subset construction</a>, which turns nondeterministic automata into deterministic automata, are the basis of every lexer and regexp engine there is.</p><p>However, the proof of the other direction -- that every automaton has a corresponding regexp -- is often stated, but seldom proved.
This is a shame, since it has an elegant formulation in terms of linear algebra(!).</p><h3 id="the-construction">The Construction</h3><p>First, let's assume we have a Kleene algebra <span class="math inline">\((R, 1, \cdot, 0, +, \ast)\)</span> -- for example, the regular expressions over some alphabet. Now, define the category <span class="math inline">\(C\)</span> as follows:</p><ul><li>The objects of <span class="math inline">\(C\)</span> are the natural numbers.</li><li>A morphism <span class="math inline">\(f : n \to m\)</span> is an <span class="math inline">\(n \times m\)</span> matrix of elements of <span class="math inline">\(R\)</span>. We will write <span class="math inline">\(f_{(i,j)}\)</span> to get the element of <span class="math inline">\(R\)</span> in the <span class="math inline">\(i\)</span>-th row and <span class="math inline">\(j\)</span>-th column.</li><li>The identity morphism <span class="math inline">\(id : n \to n\)</span> is the <span class="math inline">\(n \times n\)</span> matrix which is <span class="math inline">\(1\)</span> on the diagonal and <span class="math inline">\(0\)</span> off the diagonal.</li><li>The composition <span class="math inline">\(f;g\)</span> of <span class="math inline">\(f : n \to m\)</span> and <span class="math inline">\(g : m \to o\)</span> is given by <em>matrix multiplication</em>:</li></ul><p><span class="math display">\[ (f;g)_{i, j} = \sum_{k \in 0\ldots m} f_{(i,k)} \cdot g_{(k,j)} \]</span></p><p>This may seem like a rather peculiar thing to do -- matrices filled with regular expressions?
However, to understand why this is potentially a sensible idea, let's look at a generic automaton of dimension 2.</p><p style="text-align:center;"><svg width="159pt" height="79pt" viewBox="0.00 0.00 159.00 79.13" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 75.1253)"><title>G</title><polygon fill="#ffffff" stroke="transparent" points="-4,4 -4,-75.1253 155,-75.1253 155,4 -4,4"/><!-- 0 --><g id="node1" class="node"><title>0</title><ellipse fill="none" stroke="#000000" cx="27" cy="-20.1253" rx="27" ry="18"/><text text-anchor="middle" x="27" y="-16.4253" font-family="Times,serif" font-size="14.00" fill="#000000">0</text></g><!-- 0->0 --><g id="edge1" class="edge"><title>0->0</title><path fill="none" stroke="#000000" d="M18.4307,-37.5352C16.8311,-47.2132 19.6875,-56.1253 27,-56.1253 31.5703,-56.1253 34.4,-52.644 35.489,-47.6702"/><polygon fill="#000000" stroke="#000000" points="35.5693,-37.5352 39.9898,-47.5706 35.5297,-42.535 35.49,-47.5349 35.49,-47.5349 35.49,-47.5349 35.5297,-42.535 30.9901,-47.4991 35.5693,-37.5352 35.5693,-37.5352"/><text text-anchor="middle" x="27" y="-59.9253" font-family="Times,serif" font-size="14.00" fill="#000000">a</text></g><!-- 1 --><g id="node2" class="node"><title>1</title><ellipse fill="none" stroke="#000000" cx="124" cy="-20.1253" rx="27" ry="18"/><text text-anchor="middle" x="124" y="-16.4253" font-family="Times,serif" font-size="14.00" fill="#000000">1</text></g><!-- 0->1 --><g id="edge2" class="edge"><title>0->1</title><path fill="none" stroke="#000000" d="M54.0199,-20.1253C64.1304,-20.1253 75.8016,-20.1253 86.6794,-20.1253"/><polygon fill="#000000" stroke="#000000" points="96.7886,-20.1253 86.7886,-24.6254 91.7886,-20.1253 86.7886,-20.1254 86.7886,-20.1254 86.7886,-20.1254 91.7886,-20.1253 86.7885,-15.6254 96.7886,-20.1253 96.7886,-20.1253"/><text text-anchor="middle" x="75.5" y="-23.9253" font-family="Times,serif" 
font-size="14.00" fill="#000000">b</text></g><!-- 1->0 --><g id="edge3" class="edge"><title>1->0</title><path fill="none" stroke="#000000" d="M104.6499,-7.5523C95.0326,-2.7357 83.1161,1.1253 72,-1.1253 67.2623,-2.0844 62.4035,-3.5584 57.7123,-5.2704"/><polygon fill="#000000" stroke="#000000" points="48.4111,-9.0301 55.9959,-1.1104 53.0467,-7.1563 57.6823,-5.2825 57.6823,-5.2825 57.6823,-5.2825 53.0467,-7.1563 59.3687,-9.4546 48.4111,-9.0301 48.4111,-9.0301"/><text text-anchor="middle" x="75.5" y="-4.9253" font-family="Times,serif" font-size="14.00" fill="#000000">c</text></g><!-- 1->1 --><g id="edge4" class="edge"><title>1->1</title><path fill="none" stroke="#000000" d="M115.4307,-37.5352C113.8311,-47.2132 116.6875,-56.1253 124,-56.1253 128.5703,-56.1253 131.4,-52.644 132.489,-47.6702"/><polygon fill="#000000" stroke="#000000" points="132.5693,-37.5352 136.9898,-47.5706 132.5297,-42.535 132.49,-47.5349 132.49,-47.5349 132.49,-47.5349 132.5297,-42.535 127.9901,-47.4991 132.5693,-37.5352 132.5693,-37.5352"/><text text-anchor="middle" x="124" y="-59.9253" font-family="Times,serif" font-size="14.00" fill="#000000">d</text></g></g></svg></p><p>Given such a picture, we can give the transition matrix for this as follows:</p><p><span class="math display">\[ A = \begin{array}{l|cc} & 0 & 1 \\\hline 0 & a & b \\ 1 & c & d \\ \end{array} \]</span></p><p>So the <span class="math inline">\((i,j)\)</span>-th entry tells us the label of the edge from node <span class="math inline">\(i\)</span> to node <span class="math inline">\(j\)</span>. In our category, this is an endomap <span class="math inline">\(A : 2 \to 2\)</span>. What happens if we compose <span class="math inline">\(A\)</span> with itself? 
Since composition is just matrix multiplication, we get:</p><p><span class="math display">\[ \left[\begin{array}{cc} a & b \\ c & d \end{array}\right] \cdot \left[\begin{array}{cc} a & b \\ c & d \end{array}\right] = \left[\begin{array}{cc} aa + bc & ab + bd \\ ca + dc & cb + dd \end{array}\right] \]</span></p><p>Notice that the <span class="math inline">\((i,j)\)</span>-th entry is <em>a regular expression describing the paths of length two</em>! So composition of arrows corresponds to path composition. The identity matrix gives the paths of length 0, the transition matrix gives the paths of length 1, and matrix multiplication gives path concatenation. So it's easy to get a regexp for any bounded length path. Indeed, it is easy to see that the endomaps of <span class="math inline">\(n\)</span> form a semiring.</p><p>So the endomaps of <span class="math inline">\(n\)</span> are <span class="math inline">\(n \times n\)</span> matrices, and we can turn them into a semiring <span class="math inline">\((S, 1_S, \cdot_S, 0_S, +_S)\)</span> as follows:</p><ul><li>The carrier <span class="math inline">\(S\)</span> is <span class="math inline">\(\mathrm{Hom}(n,n)\)</span>, the set of <span class="math inline">\(n \times n\)</span> matrices over <span class="math inline">\(R\)</span>.</li><li>The unit <span class="math inline">\(1_S\)</span> is the identity map <span class="math inline">\(\mathrm{id} : n \to n\)</span></li><li>The multiplication <span class="math inline">\(\cdot_S\)</span> is composition of morphisms (i.e., matrix multiplication).
Since these are endomorphisms (ie, square matrices), the result of composition (i.e., multiplication of square matrices) is another endomorphism (ie, square matrix of the same dimension).</li><li>The zero <span class="math inline">\(0_S\)</span> is the matrix with all 0 entries.</li><li>The addition <span class="math inline">\(f + g\)</span> is the pointwise addition of matrices -- i.e., <span class="math inline">\((f + g)_{i, j} = f_{i,j} + g_{i,j}\)</span></li></ul><p>Now we can ask whether we can equip endomaps with a Kleene star operation as well. Recall that the defining equation for the Kleene star is</p><p><span class="math display">\[ s\ast = 1 + s\cdot s\ast \]</span></p><p>which says that we are looking for a <em>reflexive-transitive closure</em> operation. That is, <span class="math inline">\(s\ast\)</span> consists of the length-0 paths, together with anything in <span class="math inline">\(s\)</span> followed by anything in <span class="math inline">\(s\ast\)</span>. So we want a definition of the Kleene star which satisfies this equation.
Let's consider the case of the size 2 automaton again:</p><p style="text-align:center;"><svg width="159pt" height="79pt" viewBox="0.00 0.00 159.00 79.13" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"><g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 75.1253)"><title>G</title><polygon fill="#ffffff" stroke="transparent" points="-4,4 -4,-75.1253 155,-75.1253 155,4 -4,4"/><!-- 0 --><g id="node1" class="node"><title>0</title><ellipse fill="none" stroke="#000000" cx="27" cy="-20.1253" rx="27" ry="18"/><text text-anchor="middle" x="27" y="-16.4253" font-family="Times,serif" font-size="14.00" fill="#000000">0</text></g><!-- 0->0 --><g id="edge1" class="edge"><title>0->0</title><path fill="none" stroke="#000000" d="M18.4307,-37.5352C16.8311,-47.2132 19.6875,-56.1253 27,-56.1253 31.5703,-56.1253 34.4,-52.644 35.489,-47.6702"/><polygon fill="#000000" stroke="#000000" points="35.5693,-37.5352 39.9898,-47.5706 35.5297,-42.535 35.49,-47.5349 35.49,-47.5349 35.49,-47.5349 35.5297,-42.535 30.9901,-47.4991 35.5693,-37.5352 35.5693,-37.5352"/><text text-anchor="middle" x="27" y="-59.9253" font-family="Times,serif" font-size="14.00" fill="#000000">a</text></g><!-- 1 --><g id="node2" class="node"><title>1</title><ellipse fill="none" stroke="#000000" cx="124" cy="-20.1253" rx="27" ry="18"/><text text-anchor="middle" x="124" y="-16.4253" font-family="Times,serif" font-size="14.00" fill="#000000">1</text></g><!-- 0->1 --><g id="edge2" class="edge"><title>0->1</title><path fill="none" stroke="#000000" d="M54.0199,-20.1253C64.1304,-20.1253 75.8016,-20.1253 86.6794,-20.1253"/><polygon fill="#000000" stroke="#000000" points="96.7886,-20.1253 86.7886,-24.6254 91.7886,-20.1253 86.7886,-20.1254 86.7886,-20.1254 86.7886,-20.1254 91.7886,-20.1253 86.7885,-15.6254 96.7886,-20.1253 96.7886,-20.1253"/><text text-anchor="middle" x="75.5" y="-23.9253" font-family="Times,serif" font-size="14.00" fill="#000000">b</text></g><!-- 1->0 --><g 
id="edge3" class="edge"><title>1->0</title><path fill="none" stroke="#000000" d="M104.6499,-7.5523C95.0326,-2.7357 83.1161,1.1253 72,-1.1253 67.2623,-2.0844 62.4035,-3.5584 57.7123,-5.2704"/><polygon fill="#000000" stroke="#000000" points="48.4111,-9.0301 55.9959,-1.1104 53.0467,-7.1563 57.6823,-5.2825 57.6823,-5.2825 57.6823,-5.2825 53.0467,-7.1563 59.3687,-9.4546 48.4111,-9.0301 48.4111,-9.0301"/><text text-anchor="middle" x="75.5" y="-4.9253" font-family="Times,serif" font-size="14.00" fill="#000000">c</text></g><!-- 1->1 --><g id="edge4" class="edge"><title>1->1</title><path fill="none" stroke="#000000" d="M115.4307,-37.5352C113.8311,-47.2132 116.6875,-56.1253 124,-56.1253 128.5703,-56.1253 131.4,-52.644 132.489,-47.6702"/><polygon fill="#000000" stroke="#000000" points="132.5693,-37.5352 136.9898,-47.5706 132.5297,-42.535 132.49,-47.5349 132.49,-47.5349 132.49,-47.5349 132.5297,-42.535 127.9901,-47.4991 132.5693,-37.5352 132.5693,-37.5352"/><text text-anchor="middle" x="124" y="-59.9253" font-family="Times,serif" font-size="14.00" fill="#000000">d</text></g></g></svg></p><p>Now, let's think about the <em>loop-free</em> paths between 0 and 0. 
There are two classes of them (where "loop-free" means that the path does not revisit its endpoint in the middle):</p><ol style="list-style-type: decimal"><li>The single step <span class="math inline">\(a\)</span>.</li><li>We can take a step from 0 to 1 via <span class="math inline">\(b\)</span>, take some number of steps from 1 to itself via <span class="math inline">\(d\)</span>, and then back to 0 via <span class="math inline">\(c\)</span>.</li></ol><p>So we can describe the loop-free paths from 0 to 0 via the regular expression</p><p><span class="math display">\[ F \triangleq a + b(d\ast)c \]</span></p><p>Similarly, the loop-free paths from 1 to 1 are either:</p><ol style="list-style-type: decimal"><li>The single step <span class="math inline">\(d\)</span>.</li><li>We can take a step from 1 to 0 via <span class="math inline">\(c\)</span>, take some number of steps from 0 to itself via <span class="math inline">\(a\)</span>, and then back to 1 via <span class="math inline">\(b\)</span>.</li></ol><p>So these paths are described by <span class="math display">\[ G \triangleq d + c(a\ast)b \]</span></p><p>So the set of all paths from 0 to 0 is <span class="math inline">\(F\ast\)</span>, and the set of all paths from <span class="math inline">\(1\)</span> to <span class="math inline">\(1\)</span> is <span class="math inline">\(G\ast\)</span>. This lets us write down the Kleene star of <span class="math inline">\(A\)</span>:</p><p><span class="math display">\[ A\ast = \begin{array}{l|cc} & 0 & 1 \\\hline 0 & F\ast & {F\ast} \cdot b \cdot {G\ast} \\ 1 & {G\ast} \cdot c \cdot {F\ast} & G\ast \\ \end{array} \]</span></p><p>Proving that this is a correct definition requires a bit of work, so I will skip over it. That's because I want to get to the cool bit, which is that this construction is enough to define the Kleene star for <em>any</em> dimension <span class="math inline">\(n\)</span>! We'll prove this by well-founded induction. First, suppose we have an <span class="math inline">\(n \times n\)</span> transition matrix <span class="math inline">\(T\)</span> for <span class="math inline">\(n > 2\)</span>.
Then, we can divide the matrix into blocks, with <span class="math inline">\(A\)</span> and <span class="math inline">\(D\)</span> as square submatrices:</p><p><span class="math display">\[ T = \left[ \begin{array}{c|c} A & B \\\hline C & D \end{array} \right] \]</span></p><p>and then define <span class="math inline">\(T\ast\)</span> as:</p><p><span class="math display">\[ T\ast = \left[ \begin{array}{c|c} F\ast & {F\ast} \cdot B \cdot {G\ast} \\ \hline {G\ast} \cdot C \cdot {F\ast} & G\ast \\ \end{array} \right] \]</span></p><p>where</p><p><span class="math display">\[ \begin{array}{lcl} F & \triangleq & A + B(D\ast)C \\ G & \triangleq & D + C(A\ast)B \\ \end{array} \]</span></p><p>By induction, we know that the square matrices of dimension smaller than <span class="math inline">\(n\)</span> form a Kleene algebra, so we know that <span class="math inline">\(A\)</span> and <span class="math inline">\(D\)</span> have Kleene algebra structure. The reason for defining the categorical structure also becomes apparent here -- even though <span class="math inline">\(A\)</span> and <span class="math inline">\(D\)</span> are square, <span class="math inline">\(B\)</span> and <span class="math inline">\(C\)</span> may not be, and we will need to make use of the fact that we defined composition on non-square matrices for them.</p><h3 id="references-and-miscellany">References and Miscellany</h3><p>I have, as is traditional for bloggers, failed to prove any of my assertions. Luckily other people have done the work!
Two papers that I particularly like are:</p><ul><li><p>Dexter Kozen's paper <a href="https://www.cs.cornell.edu/~kozen/Papers/ka.pdf"><em>A Completeness Theorem for Kleene Algebras and the Algebra of Regular Events</em></a></p></li><li><p>Stephen Dolan's ICFP 2013 paper, <a href="http://stedolan.net/research/semirings.pdf"><em>Fun with semirings: a functional pearl on the abuse of linear algebra</em></a>.</p></li></ul><p>One fact which didn't fit into the narrative above, but which I think is extremely neat, is about the linear-algebraic view of the Kleene star. To see what it is, recall the definition of the <em>matrix exponential</em> for a matrix <span class="math inline">\(A\)</span>:</p><p><span class="math display">\[ e^A \triangleq \sum_{k = 0}^\infty \frac{A^k}{k!} = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \ldots \]</span></p><p>Now, note that</p><ol style="list-style-type: decimal"><li><span class="math inline">\(k!\)</span> is an integer</li><li>any integer can be expressed as a formal sum <span class="math inline">\(n = \overbrace{1 + \ldots + 1}^{n \,\mbox{times}}\)</span>,</li><li>and in regular languages, we have <span class="math inline">\(r + r = r\)</span>.</li></ol><p>Hence <span class="math inline">\(k! = 1\)</span>, and since <span class="math inline">\(A/1 = A\)</span>, we have</p><p><span class="math display">\[ e^A \triangleq \sum_{k = 0}^\infty A^k = I + A + A^2 + A^3 + \ldots \]</span></p><p>But this is precisely the definition of <span class="math inline">\(A\ast\)</span>! In other words, for square matrices over Kleene algebras, Kleene iteration is precisely the same thing as the matrix exponential.</p><h3 id="the-implementation">The Implementation</h3><p>The nice thing about this construction is that you can turn it into running code more or less directly, which leads to an implementation of the DFA-to-regexp algorithm in less than 100 lines of code. 
I'll give an implementation in OCaml.</p><p>First, let's give a signature for Kleene algebras. All the constructions we talked about are parameterized in the Kleene algebra, so our implementation will be too.</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">module</span> <span class="kw">type</span> KLEENE = <span class="kw">sig</span><br /> <span class="kw">type</span> t <br /> <span class="kw">val</span> zero : t <br /> <span class="kw">val</span> ( ++ ) : t -> t -> t <br /> <span class="kw">val</span> one : t<br /> <span class="kw">val</span> ( * ) : t -> t -> t <br /> <span class="kw">val</span> star : t -> t <br /><span class="kw">end</span></code></pre></div><p>You can see here that we have <code>zero</code> and <code>one</code> as the unit for the addition <code>++</code> and multiplication <code>*</code> respectively, and we also have a <code>star</code> operation representing the Kleene star.</p><p>Similarly, we can give a signature for matrices over an element type.</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">module</span> <span class="kw">type</span> MATRIX = <span class="kw">sig</span><br /> <span class="kw">type</span> elt<br /> <span class="kw">type</span> t <br /><br /> <span class="kw">val</span> dom : t -> <span class="dt">int</span><br /> <span class="kw">val</span> cod : t -> <span class="dt">int</span><br /> <span class="kw">val</span> get : t -> <span class="dt">int</span> -> <span class="dt">int</span> -> elt<br /><br /> <span class="kw">val</span> id : <span class="dt">int</span> -> t <br /> <span class="kw">val</span> compose : t -> t -> t <br /><br /> <span class="kw">val</span> zero : <span class="dt">int</span> -> <span class="dt">int</span> -> t <br /> <span class="kw">val</span> (++) : t -> t -> t <br /><br /> <span class="kw">val</span> make : <span class="dt">int</span> -> <span class="dt">int</span> -> (<span 
class="dt">int</span> -> <span class="dt">int</span> -> elt) -> t <br /><span class="kw">end</span></code></pre></div><p>So this signature requires an element type <code>elt</code>, a matrix type <code>t</code>, and a collection of operations. We can get the number of rows in the matrix with <code>dom</code> and the columns with <code>cod</code>, with the names coming from thinking of dimensions and matrices as forming a category. Similarly, this gives us a notion of identity with <code>id</code>, which takes an integer and returns the identity matrix of that size. Matrix multiplication is just composition, which leads to the name <code>compose</code>.</p><p>We also want the ability to create zero matrices with the <code>zero</code> function, and the ability to do pointwise addition of matrices with the <code>(++)</code> operation. Neither of these operations needs square matrices, but adding matrices pointwise does require compatible dimensions.</p><p>Finally, we want a way to construct a matrix with <code>make</code>, which is given the dimensions and a function to produce the elements.</p><p>Now, since the matrix construction was parameterized by a Kleene algebra, we can write a functor <code>Matrix</code> in OCaml which takes a module of type <code>KLEENE</code> to produce an implementation of the <code>MATRIX</code> signature with elements of the Kleene algebra as the element type:</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">module</span> Matrix(R : KLEENE) : MATRIX <span class="kw">with</span> <span class="kw">type</span> elt = R.t = <br /><span class="kw">struct</span><br /> <span class="kw">type</span> elt = R.t <br /> <span class="kw">type</span> t = {<br /> row : <span class="dt">int</span>; <br /> col: <span class="dt">int</span>; <br /> data : <span class="dt">int</span> -> <span class="dt">int</span> -> R.t<br /> }</code></pre></div><p>We take the element type to be elements of the Kleene algebra, and
take a matrix to be a record with the number of rows, the number of columns, and a function to produce the entries. This is not a high-performance choice, but it works fine for illustration purposes.</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> make row col data = {row; col; data}<br /> <span class="kw">let</span> dom m = m.row<br /> <span class="kw">let</span> cod m = m.col<br /><br /> <span class="kw">let</span> get m i j = <br /> <span class="kw">assert</span> (<span class="dv">0</span> <= i && i < m.row);<br /> <span class="kw">assert</span> (<span class="dv">0</span> <= j && j < m.col);<br /> m.data i j </code></pre></div><p>To <code>make</code> a matrix, we just use the data from the constructor to build the appropriate record, and to get the <code>dom</code>ain and <code>cod</code>omain we just select the row or column fields. To get an element we just call the underlying function, with an assertion check to ensure the arguments are in bounds.</p><p>This is a point where a more dependently-typed language would be nice: if the dimensions of the matrix were tracked in the type then we could omit the assertion checks. 
(Basically, the matrix type would become a type constructor <code>t : nat -> nat -> type</code>, corresponding to the hom-sets of the category.)</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> id n = make n n (<span class="kw">fun</span> i j -> <span class="kw">if</span> i = j <span class="kw">then</span> R.one <span class="kw">else</span> R.zero)<br /><br /> <span class="kw">let</span> compose f g = <br /> <span class="kw">assert</span> (cod f = dom g);<br /> <span class="kw">let</span> data i j = <br /> <span class="kw">let</span> <span class="kw">rec</span> loop k acc = <br /> <span class="kw">if</span> k < cod f <span class="kw">then</span><br /> loop (k+<span class="dv">1</span>) R.(acc ++ (get f i k * get g k j))<br /> <span class="kw">else</span><br /> acc<br /> <span class="kw">in</span><br /> loop <span class="dv">0</span> R.zero<br /> <span class="kw">in</span><br /> make (dom f) (cod g) data</code></pre></div><p>The <code>id</code>entity function takes an integer, and returns a square matrix that is <code>R.one</code> on the diagonal and <code>R.zero</code> off-diagonal. If you want to <code>compose</code> two matrices, the routine checks that the dimensions match, and then implements a very naive row-vector times column-vector calculation for each indexing.
What makes this naive is that it will get recalculated each time!</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> zero row col = make row col (<span class="kw">fun</span> i j -> R.zero)<br /><br /> <span class="kw">let</span> (++) m1 m2 = <br /> <span class="kw">assert</span> (dom m1 = dom m2);<br /> <span class="kw">assert</span> (cod m1 = cod m2);<br /> <span class="kw">let</span> data i j = R.(++) (get m1 i j) (get m2 i j) <span class="kw">in</span><br /> make (dom m1) (cod m1) data <br /><span class="kw">end</span> </code></pre></div><p>Finally we can implement the <code>zero</code> matrix operation, and matrix addition, both of which work the expected way.</p><p>We will use all these operations to show that square matrices form a Kleene algebra. To show this, we will define a functor <code>Square</code> which is parameterized by</p><ol style="list-style-type: decimal"><li>A Kleene algebra <code>R</code>.</li><li>An implementation of matrices with element type <code>R.t</code></li><li>A size (i.e., an integer)</li></ol><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">module</span> Square(R : KLEENE)<br /> (M : MATRIX <span class="kw">with</span> <span class="kw">type</span> elt = R.t)<br /> (Size : <span class="kw">sig</span> <span class="kw">val</span> size : <span class="dt">int</span> <span class="kw">end</span>) : KLEENE <span class="kw">with</span> <span class="kw">type</span> t = M.t = <br /><span class="kw">struct</span><br /> <span class="kw">type</span> t = M.t </code></pre></div><p>The implementation type is the matrix type <code>M.t</code>, and the semiring operations are basically inherited from the module <code>M</code>:</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> (++) = M.(++)<br /> <span class="kw">let</span> ( * ) = M.compose<br /> <span 
class="kw">let</span> one = M.id Size.size<br /> <span class="kw">let</span> zero = M.zero Size.size Size.size</code></pre></div><p>The only changes are that <code>one</code> and <code>zero</code> are restricted to square matrices of size <code>Size.size</code>, and so we don't need to pass them dimension arguments.</p><p>All of the real work is handled in the implementation of the Kleene star. We can follow the construction very directly. First, we can define a <code>split</code> operation which splits a square matrix into four smaller pieces:</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> split i m = <br /> <span class="kw">let</span> <span class="kw">open</span> M <span class="kw">in</span> <br /> <span class="kw">assert</span> (i >= <span class="dv">0</span> && i < dom m);<br /> <span class="kw">let</span> k = dom m - i <span class="kw">in</span> <br /> <span class="kw">let</span> shift m x y = <span class="kw">fun</span> i j -> get m (i + x) (j + y) <span class="kw">in</span> <br /> <span class="kw">let</span> a = make i i (shift m <span class="dv">0</span> <span class="dv">0</span>) <span class="kw">in</span><br /> <span class="kw">let</span> b = make i k (shift m <span class="dv">0</span> i) <span class="kw">in</span><br /> <span class="kw">let</span> c = make k i (shift m i <span class="dv">0</span>) <span class="kw">in</span><br /> <span class="kw">let</span> d = make k k (shift m i i) <span class="kw">in</span> <br /> (a, b, c, d)</code></pre></div><p>This divides a matrix into four, by finding the split point <code>i</code>, and then making a new matrix that uses the old matrix's <code>data</code> with a suitable offset. 
So <code>b</code>'s x-th row and y-th column will be the x-th row and (y+i)-th column of the original, for example.</p><p>Symmetrically, we need a <code>merge</code> operation.</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> merge (a, b, c, d) = <br /> <span class="kw">assert</span> (M.dom a = M.dom b && M.cod a = M.cod c);<br /> <span class="kw">assert</span> (M.dom d = M.dom c && M.cod d = M.cod b);<br /> <span class="kw">let</span> get i j = <br /> <span class="kw">match</span> i < M.dom a, j < M.cod a <span class="kw">with</span><br /> | <span class="kw">true</span>, <span class="kw">true</span> -> M.get a i j <br /> | <span class="kw">true</span>, <span class="kw">false</span> -> M.get b i (j - M.cod a)<br /> | <span class="kw">false</span>, <span class="kw">true</span> -> M.get c (i - M.dom a) j <br /> | <span class="kw">false</span>, <span class="kw">false</span> -> M.get d (i - M.dom a) (j - M.cod a)<br /> <span class="kw">in</span><br /> M.make (M.dom a + M.dom d) (M.cod a + M.cod d) get </code></pre></div><p>To <code>merge</code> four matrices, their dimensions need to match up properly, and then the lookup operation <code>get</code> needs to use the indices to decide which quadrant to look up the answer in.</p><p>This can be used to write the <code>star</code> function:</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> <span class="kw">rec</span> star m = <br /> <span class="kw">match</span> M.dom m <span class="kw">with</span><br /> | <span class="dv">0</span> -> m <br /> | <span class="dv">1</span> -> M.make <span class="dv">1</span> <span class="dv">1</span> (<span class="kw">fun</span> _ _ -> R.star (M.get m <span class="dv">0</span> <span class="dv">0</span>))<br /> | n -> <span class="kw">let</span> (a, b, c, d) = split (n / <span class="dv">2</span>) m <span class="kw">in</span><br /> <span class="kw">let</span> 
fstar = star (a ++ b * star d * c) <span class="kw">in</span> <br /> <span class="kw">let</span> gstar = star (d ++ c * star a * b) <span class="kw">in</span> <br /> merge (fstar, fstar * b * gstar,<br /> gstar * c * fstar, gstar)<br /><span class="kw">end</span></code></pre></div><p>This function looks at the dimension of the matrix. If the matrix is dimension 1, we can just use the underlying Kleene star on the singleton entry in the matrix. If the square matrix is bigger, we can split the matrix in four and then directly apply the construction we saw earlier. We try to divide the four quadrants as nearly in half as we can, because this reduces the number of times we need to make a recursive call.</p><p>As an example of using this, let's see how we can convert automata given by state-transition functions to regular expressions.</p><p>First, we define a trivial Kleene algebra using a datatype of regular expressions, and use this to set up a module of matrices:</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">module</span> Re = <span class="kw">struct</span><br /> <span class="kw">type</span> t = <br /> | Chr <span class="kw">of</span> <span class="dt">char</span><br /> | Seq <span class="kw">of</span> t * t <br /> | Eps<br /> | Bot <br /> | Alt <span class="kw">of</span> t * t <br /> | Star <span class="kw">of</span> t <br /><br /> <span class="kw">let</span> zero = Bot<br /> <span class="kw">let</span> one = Eps<br /> <span class="kw">let</span> ( * ) r1 r2 = Seq(r1, r2)<br /> <span class="kw">let</span> ( ++ ) r1 r2 = Alt(r1, r2)<br /> <span class="kw">let</span> star r = Star r <br /><span class="kw">end</span><br /><br /><span class="kw">module</span> M = Matrix(Re)</code></pre></div><p>Then, we can construct a Kleene algebra of matrices of size 2, and then give a 2-state automaton by its transition function.</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span 
class="kw">module</span> T = Square(Re)(M)(<span class="kw">struct</span> <span class="kw">let</span> size = <span class="dv">2</span> <span class="kw">end</span>)<br /><span class="kw">let</span> t = <br /> M.make <span class="dv">2</span> <span class="dv">2</span> (<span class="kw">fun</span> i j -> <br /> Re.Chr (<span class="kw">match</span> i, j <span class="kw">with</span><br /> | <span class="dv">0</span>, <span class="dv">0</span> -> <span class="ch">'a'</span><br /> | <span class="dv">0</span>, <span class="dv">1</span> -> <span class="ch">'b'</span><br /> | <span class="dv">1</span>, <span class="dv">0</span> -> <span class="ch">'c'</span><br /> | <span class="dv">1</span>, <span class="dv">1</span> -> <span class="ch">'d'</span><br /> | _ -> <span class="kw">assert</span> <span class="kw">false</span>))</code></pre></div><p>Now, we can use the Kleene star on <code>T</code> to compute the transitive closure of this transition function, by defining <code>let tstar = T.star t</code>:</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"># M.get tstar <span class="dv">0</span> <span class="dv">1</span>;;<br />- : M.elt =<br />Seq (Star (Alt (Chr <span class="ch">'a'</span>, Seq (Chr <span class="ch">'b'</span>, Seq (Star (Chr <span class="ch">'d'</span>), Chr <span class="ch">'c'</span>)))),<br /> Seq (Chr <span class="ch">'b'</span>,<br /> Star (Alt (Chr <span class="ch">'d'</span>, Seq (Chr <span class="ch">'c'</span>, Seq (Star (Chr <span class="ch">'a'</span>), Chr <span class="ch">'b'</span>))))))</code></pre></div><p>Voila!
A regexp corresponding to the possible paths from 0 to 1.</p><p><a href="https://gist.github.com/neel-krishnaswami/c809b12bfeb5e4ed7e4444cbdae43f23">The code is available as a GitHub gist.</a> Bugfixes welcome!</p><h2 id="new-draft-paper-survey-on-bidirectional-typechecking">New Draft Paper: Survey on Bidirectional Typechecking</h2><p><em>Neel Krishnaswami, 2019-08-23</em></p><p>Along with J. Dunfield, I have written a <a href="https://www.cl.cam.ac.uk/~nk480/bidir-survey.pdf">new survey paper on bidirectional typechecking</a>.</p><blockquote><p>Bidirectional typing combines two modes of typing: type checking, which checks that a program satisfies a known type, and type synthesis, which determines a type from the program. Using checking enables bidirectional typing to break the decidability barrier of Damas-Milner approaches; using synthesis enables bidirectional typing to avoid the large annotation burden of explicitly typed languages. In addition, bidirectional typing improves error locality. We highlight the design principles that underlie bidirectional type systems, survey the development of bidirectional typing from the prehistoric period before Pierce and Turner's local type inference to the present day, and provide guidance for future investigations.</p></blockquote><p>This is the first survey paper I've tried to write, and an interesting thing about the process is that it forces you to organize your thinking, and that can lead you to change your mind. For example, before we started this paper I was convinced that bidirectional typechecking was obviously a manifestation of focalization. When we attempted to make this argument clearly in the survey, that led me to realize I didn't believe it!
Now I think that the essence of bidirectional typechecking arises from paying close attention to the mode discipline (in the logic programming sense) of the typechecking relations.</p><p>Also, if you think we have overlooked some papers on the subject, please let us know. Bidirectional typechecking is a piece of implementation folklore that is perhaps over three decades old, which makes it very hard for us to feel any certainty about having found all the relevant literature.</p><h2 id="on-the-relationship-between-static-analysis-and-type-theory">On the Relationship Between Static Analysis and Type Theory</h2><p><em>Neel Krishnaswami, 2019-08-21</em></p><p>Some months ago, <a href="https://www.cc.gatech.edu/~rmangal3/">Ravi Mangal</a> sent me an email with a very interesting question, which I reproduce in part below:</p><blockquote><p>The question that has been bothering me is, "How are static analyses and type systems related?". Besides Patrick Cousot's <a href="https://www.irif.fr/~mellies/mpri/mpri-ens/articles/cousot-types-as-abstract-interpretations.pdf"><em>Types as abstract interpretations</em></a>, I have failed to find other material directly addressing this. Perhaps, the answer is obvious, or maybe it has just become such a part of the community's folklore that it hasn't been really written down. However, to me, it's not clear how one might even phrase this question in a precise mathematical manner.</p></blockquote><p>The reason this question is so interesting is that some very beautiful answers to it have been worked out, but have for some reason never made it into our community's folklore.
Indeed, I didn't know the answers to this question until quite recently.</p><p>The short answer is that there are two views of what type systems are, and in one view type systems are just another static analysis, and in another they form the substrate upon which static analyses can be built -- and both views are true at the same time.</p><h2 id="static-analysis-including-types-as-properties">Static Analysis (including Types) as Properties</h2><p>One valid view of a type system is that it is just another species of static analysis. At a high level, the purpose of a static analysis is to answer the question, "Can I tell what properties this program satisfies <strong>without</strong> running it?"</p><p>Abstract interpretation, model checking, dataflow analysis and type inference are all different ways of answering this question, each with somewhat different emphases and priorities. IMO, to understand the relations between these styles of analysis, it's helpful to think about the two main styles of semantics: operational semantics and denotational semantics.</p><p>A denotational semantics turns a program into a mathematical object (sets, domains, metric spaces, etc.). This has the benefit that it lets you turn the full power of mathematics towards reasoning about programs, but it has the cost that deducing properties of arbitrary mathematical gadgets can require arbitrary mathematical reasoning.</p><p>Operational semantics, on the other hand, turns a program into a state machine. This has the benefit that it is relatively easy to do, and close to how computers work. It has the cost that it only gives semantics to whole programs, and in reality people tend to write libraries and other partial programs.
(Also, infinite-state operational semantics doesn't make things obviously easier to work with than a denotational semantics.)</p><p>Operational semantics gives rise to model checking: if you have a program, you can view the sequence of states it takes on as a concrete model of temporal logic. Model checking then checks whether a temporal property overapproximates or refines a concrete model.</p><p>So basically all static analyses work by finding some efficiently representable approximation of the meaning of a program, and then using that to define an algorithm to analyse the program, typically by some form of model checking. For type systems, the approximations are the types, and in abstract interpretation, this is what the abstract domains are.</p><p>David A. Schmidt's paper <a href="http://santos.cs.ksu.edu/schmidt/MFPS09/paper.pdf"><em>Abstract Interpretation from a Denotational Semantics Perspective</em></a> shows how you can see abstract domains as arising from domain theory, by noting that domains arise as the infinite limit of a series of finite approximations, and so you can get an abstract domain by picking a finite approximation rather than going all the way to the limit.</p><p>Thomas Jensen's PhD thesis, <a href="http://www.irisa.fr/lande/jensen/papers/thesis.pdf"><em>Abstract Interpretation in Logical Form</em></a>, shows how an abstract interpretation of a higher-order functional language corresponds to a certain intersection type system for it. This helps illustrate the idea that abstract domains and types are both abstractions of a program. (FWIW, finding equivalences between abstract domains and type systems used to be a big topic back in the 90s, but seems to have disappeared from view. I don't know why!)</p><p>Now, model checking an infinite-state system will obviously fail to terminate in general.
However, it can then be made effective and algorithmic by using your abstractions (either from types or from abstract interpretation) to turn the model into a finitary one that can be exhaustively searched.</p><p>For dataflow analysis, David A. Schmidt works out this idea in detail in his paper, <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.70.9327&rep=rep1&type=pdf"><em>Data Flow Analysis is Model Checking of Abstract Interpretations</em></a>, where he shows how if you build an abstraction of program traces and model-check against mu-calculus formulas, you can get back most classical dataflow analyses. (If one lesson you draw is that you should read everything David Schmidt has written, well, you won't be wrong.)</p><p>It's less common to formulate type-based analyses in this form, but some people have looked at it.</p><ol style="list-style-type: lower-alpha"><li><p>There is a line of work on "higher-order model checking" which works by taking higher-order programs, using type inference for special type systems to generate abstractions for them which lie in special subclasses of pushdown systems which are decidable for the properties of interest for program analysis (eg, emptiness testing). Luke Ong wrote an overview of the area for LICS 2015, in his paper <a href="http://www.cs.ox.ac.uk/people/luke.ong/personal/publications/LICS15.pdf"><em>Higher-Order Model Checking: an Overview</em></a>.</p></li><li><p>Dave Clarke and Ilya Sergey have a nice IPL 2012 paper, <a href="https://www.sciencedirect.com/science/article/pii/S0020019011002791"><em>A correspondence between type checking via reduction and type checking via evaluation</em></a>, in which they show how you can view types as abstract values, and use this perspective to derive an abstract interpreter for a program which operates on these abstract values. 
They don't state it explicitly, since it was too obvious to mention, but you can check that a program <code>e</code> has the type <code>int</code> by asserting that <code>e</code> will eventually evaluate to the abstract value <code>int</code>. This is easy to check since every program in their abstract semantics terminates -- and so you can just evaluate <code>e</code> and see if the result is <code>int</code>.</p></li></ol><h2 id="types-as-grammar">Types as Grammar</h2><p>However, while types can be seen as a way of stating program properties, they have a second life as a way of defining <em>what the valid programs are</em>. A type system can also be seen as a way of statically ruling out certain programs from consideration. That is, an expression like <code>3("hi")</code>, which applies the argument <code>"hi"</code> to the number <code>3</code>, is valid Javascript or Python code, but is not a valid program in OCaml or Java, as it would be rejected at compile time by the typechecker.</p><p>As a result, type systems can also be used to simplify the semantics: if a term is statically ruled out, you don't have to give semantics to it. The benefit of doing this is that this can dramatically simplify your static analyses.</p><p>For example, if you have a typed functional language with integers and function types, then building an abstract interpretation to do a sign analysis is simplified by the fact that you don't have to worry about function values flowing into the integer domain. You can just take the obvious abstract domains for each type and then rely on the type system to ensure that only programs where everything fits together the way you want are ever allowed, freeing you of the need for a jumbo abstract domain that abstracts all values.</p><p>For cultural reasons, this methodology is often used in type systems research, but relatively rarely in static analysis work. 
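To make the "types as abstract values" perspective from Clarke and Sergey's paper concrete, here is a minimal sketch in OCaml. The toy language and all the names in it are my own illustration, not code from their paper: the point is just that checking that a program has type <code>int</code> amounts to evaluating it over abstract values which are themselves types, and that this abstract evaluation always terminates.

```ocaml
(* A sketch of "type checking via evaluation": the abstract values are
   the types themselves, and abstract evaluation always terminates.
   This toy language and its names are illustrative assumptions. *)
type ty = Int | Str

type exp =
  | Num of int
  | Lit of string
  | Plus of exp * exp     (* integer addition *)
  | Concat of exp * exp   (* string concatenation *)

(* Abstract evaluation: run the program over abstract values (types). *)
let rec aeval (e : exp) : ty =
  match e with
  | Num _ -> Int
  | Lit _ -> Str
  | Plus (e1, e2) ->
    (match aeval e1, aeval e2 with
     | Int, Int -> Int
     | _ -> failwith "type error: + expects integers")
  | Concat (e1, e2) ->
    (match aeval e1, aeval e2 with
     | Str, Str -> Str
     | _ -> failwith "type error: ^ expects strings")

(* Checking e : int is asserting that e abstractly evaluates to Int. *)
let has_type_int e = (aeval e = Int)
```

For instance, <code>has_type_int (Plus (Num 1, Num 2))</code> returns <code>true</code>, while abstractly evaluating <code>Plus (Num 3, Lit "hi")</code> fails, which is this sketch's analogue of rejecting <code>3 + "hello"</code> at compile time.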
Basically, if you tell a type systems researcher that you are changing the language to make it easier to analyze, they will say "oh, okay", but if you tell a static analysis researcher the same thing, they may express a degree of skepticism about your ability to update all Java code everywhere. Naturally, neither reaction is right or wrong: it's contextual. After all, there <em>are</em> billions of lines of Java code that won't change, <strong>and also</strong> the Java language <em>does</em> change from version to version.</p><p>However, as researchers we don't just pick up context from our problems, but also from our peers, and so the culture of type systems research tends to privilege new languages and the culture of static analysis research tends to privilege applicability to existing codebases, even though there is no fundamental technical reason for this linkage. After all, all static analyses exploit the structure of the language whenever they can in order to improve precision and scalability.</p><p>For example, in a typical intraprocedural dataflow analysis, we initialize all the local variables to the bottom element of the dataflow lattice. This is a sound thing to do, <em>because</em> the language is constrained so that local variables are freshly allocated on each procedure invocation. If, on the other hand, we were doing dataflow analysis of assembly code, different subroutines would all simultaneously affect the dataflow values assigned to each register.</p><p>As a consequence, one thing I'm looking forward to is finding out what static analysis researchers will do when they start taking a serious look at Rust. 
Its type system enforces really strong invariants about aliasing and data races, which ought to open the door to some really interesting analyses.</p><h2 id="the-relationship-between-the-two-views">The Relationship Between the Two Views</h2><p>As you can see, grammaticality constraints and program analysis are not opposed, and can overlap in effect. Furthermore, this means people writing papers on type systems are often blurry about which perspective they are taking -- in one section of a paper they may talk about types-as-properties, and in another they may talk about types-as-grammar. This ambiguity is technically harmless, but can be confusing to a reader coming to type systems from other areas within formal methods.</p><p>The reason this ambiguity is mostly harmless is because for well-designed languages there will be a nice relationship between these two views of what types are. John C Reynolds explained this, in his lovely paper <a href="https://kilthub.cmu.edu/articles/The_Meaning_of_Types_From_Intrinsic_to_Extrinsic_Semantics/6610688"><em>The Meaning of Types: From Intrinsic to Extrinsic Semantics</em></a>. (In his language, "intrinsic types" are types-as-grammar, and "extrinsic types" are types-as-property.)</p><p>Basically, he noted that when you give a denotational semantics under a types-as-grammar view, you interpret <em>typing derivations</em>, and when you give a denotational semantics in the types-as-property view, you interpret <em>program terms</em>. For types-as-grammar to make sense, its semantics must be <em>coherent</em>: two typing derivations of the same program must have the same semantics. (Otherwise the semantics of a program can be ambiguous.) 
But once coherence holds, it becomes feasible to prove the equivalence between the intrinsic and extrinsic semantics, because there is a tight connection between the syntax of the language and the typing derivations.</p><p>Noam Zeilberger and Paul-Andre Melliès wrote a POPL 2015 paper, <a href="https://www.irif.fr/~mellies/papers/functors-are-type-refinement-systems.pdf"><em>Functors are Type Refinement Systems</em></a>, where they work out the general semantic structure underpinning Reynolds' work, in terms of the relationship between intersection types (or program logic) and a base language.</p><h2 id="compositionality">Compositionality</h2><p>Another cultural effect is the importance type systems research places on compositionality. It is both absolutely the case that there is no <em>a priori</em> reason a type system has to be compositional, and also that most type systems researchers -- including myself! -- will be very doubtful of any type system which is not compositional.</p><p>To make this concrete, here's an example of a type system which is pragmatic, reasonable, and yet which someone like me would still find distasteful.</p><p>Consider a Java-like language whose type system explicitly marks nullability. We will say that method argument <code>(x : C)</code> means that <code>x</code> is a possibly <code>null</code> value of type <code>C</code>, and an argument <code>x : C!</code> means that <code>x</code> is definitely of class <code>C</code>, with no <code>null</code> value. Now, suppose that in this language, we decide that null tests should refine the type of a variable:</p><pre><code> // x : C outside the if-block<br /> if (x !== null) { <br /> doSomething(x); // x : C! inside the if-block<br /> } </code></pre><p>Basically, we checked that <code>x</code> was not <code>null</code>, so we promoted its type from <code>C</code> to <code>C!</code>. 
This is a reasonable thing to do, which industrial languages like <a href="https://kotlinlang.org">Kotlin</a> support, and yet you will be hard-pressed to find many formal type theories supporting features like this.</p><p>The reason is that the culture of type theory says that type systems should satisfy a <em>substitution principle</em>: a variable stands in for a term, and so substituting a term of the same type as the variable should result in a typeable term. In this case, suppose that <code>e</code> is a very complicated expression of type <code>C</code>. Then if we do a substitution of <code>e</code> for <code>x</code>, we get:</p><pre><code> if (e !== null) { <br /> doSomething(e); // e : C! -- but how?<br /> } </code></pre><p>Now we somehow have to ensure that <code>e</code> has the type <code>C!</code> inside the scope of the conditional. For variables this requires updating the context, but for arbitrary terms it will be much harder, if it is even possible at all!</p><p>The culture of static analysis says that since it's easy for variables, we should handle that case, whereas the culture of type theory says that anything you do for variables should work for general terms. Again, which view is right or wrong is contextual: it's genuinely useful for Kotlin programmers to have dataflow-based null-tracking, but at the same time it's much easier to reason about programs if substitution works.</p><p>The reason type theorists have this culture is that type theory has two parents. On one side are Scott and Strachey, who consciously and explicitly based denotational semantics upon the compositionality principle; and the other side comes from the structural proof theory originated by Gerhard Gentzen, which was invented precisely to prove cut-eliminability (which basically just is a substitution principle). As a result, the type-theoretic toolbox is full of tools which both assume and maintain compositionality. 
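</p><p>To make the substitution failure concrete, here is a minimal sketch in OCaml (my own illustration, not code from any of the systems mentioned): a toy checker with just a nullable type <code>C</code> and a non-null type <code>C!</code>, where a null test narrows the type of its scrutinee only when the scrutinee is a variable.</p>

```ocaml
(* A toy occurrence-typing sketch: [Nullable] plays the role of C, and
   [NonNull] plays the role of C!. *)
type tp = Nullable | NonNull

type exp =
  | Var of string
  | Null                         (* the null literal *)
  | Obj                          (* some definitely-non-null value *)
  | IfNonNull of exp * exp       (* models "if (e != null) { body }" *)

(* Contexts map variables to types. *)
type ctx = (string * tp) list

let rec synth (ctx : ctx) : exp -> tp option = function
  | Var x -> List.assoc_opt x ctx
  | Null -> Some Nullable
  | Obj -> Some NonNull
  | IfNonNull (Var x, body) ->
    (* The narrowing case: the scrutinee is a variable, so we can promote
       its entry in the context before checking the body. *)
    (match List.assoc_opt x ctx with
     | Some _ -> synth ((x, NonNull) :: ctx) body
     | None -> None)
  | IfNonNull (_, body) ->
    (* A non-variable scrutinee has no context entry to update, so no
       narrowing happens here. *)
    synth ctx body

(* With a variable scrutinee, the body sees the narrowed type: *)
let narrowed = synth [ ("x", Nullable) ] (IfNonNull (Var "x", Var "x"))
(* = Some NonNull *)

(* Substituting a nullable term for x loses the narrowing: *)
let e = IfNonNull (Obj, Null)
let substituted = synth [] (IfNonNull (e, e))
(* = Some Nullable, not Some NonNull *)
```

<p>The same program synthesizes a different type before and after substituting a term for the variable, which is exactly the failure of the substitution principle just described.</p><p>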
So giving that up (or moving past it, depending on how you want to load the wording emotionally) is something which requires very careful judgement.</p><p>For example, the <a href="https://docs.racket-lang.org/ts-guide/index.html">Typed Racket</a> language extends the dynamically-typed <a href="https://docs.racket-lang.org/">Racket</a> language with types, and to support interoperation with idiomatic Racket code (which uses control-flow sensitive type tests), they do flow-sensitive typing. At ESOP 2011, Arjun Guha, Claudiu Saftoiu, and Shriram Krishnamurthi wrote a paper about a particularly elegant way of combining flow analysis and typechecking, <a href="https://plasma-seminar.github.io/readings/guha-esop2011.pdf"><em>Typing Local Control and State Using Flow Analysis</em></a>.</p><h2 id="conclusion">Conclusion</h2><p>Finally, one might wonder whether it is worth it to try developing a unified perspective on type theory and static analysis. It's a lot of work to learn any one of these things, and (for example) David Schmidt's papers, while deeply rewarding, are not for the faint of heart. So maybe we should all specialize and not worry about what members of cousin tribes are doing?</p><p>All I can say is that I think I benefited tremendously from the fact that (largely by accident) I ended up learning separation logic, denotational semantics and proof theory as part of my PhD. There actually is a wholeness or consilience to the variety of techniques that the semantics community has devised, and insights apply across domains in surprising ways.</p><p>My <a href="https://www.cl.cam.ac.uk/~nk480/parsing.pdf">recent PLDI paper</a> with <a href="https://www.cl.cam.ac.uk/~jdy22/">Jeremy Yallop</a> is a nice example of this. 
Very little in this paper is hard, but what makes it easy is that we felt free to switch perspectives as we needed.</p><p>Because I had learned to like automata theory from reading papers on model checking, I ended up reading about Brüggemann-Klein and Wood's characterization of <a href="http://tr.informatik.uni-freiburg.de/reports/report38/report38.ps.gz">the deterministic regular languages</a>. Then, because I share the aesthetics of denotational semantics, I realized that they invented a superior, compositional characterization of the FOLLOW sets used in LL(1) parsing. Then, because I knew about the importance of substitution from type theory, this led me to switch from the BNF presentation of grammars to mu-regular expressions. This let us offer the traditional parser combinator API, while under the hood having a type system which identified just the LL(1)-parseable grammars. Moreover, the type inference algorithm we use in the implementation is basically a dataflow analysis. And that set things up in a form to which multistage programming techniques were applicable, letting us derive parser combinators that run faster than yacc.</p><p>But beyond the direct applications like that, it is just good to be able to go to a conference and be interested in what other people are doing. Everyone attending has developed deep expertise in some very esoteric subjects, and learning enough to learn from them is valuable. 
Research communities are long-running conversations, and being able to hold up your end of the conversation makes the community work better.</p>Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com4tag:blogger.com,1999:blog-8068466035875589791.post-68646940393294910772019-08-12T05:26:00.001-07:002019-08-12T05:26:09.503-07:00Michael Arntzenius on the Future of Coding PodcastMy PhD student <a href="http://www.rntz.net">Michael Arntzenius</a> is on the <a href="https://futureofcoding.org/episodes/040"><em>Future of Coding</em></a> podcast, talking about the things he has learned while developing <a href="http://www.rntz.net/datafun">Datafun</a>. In the course of a PhD, people tend to work on extremely refractory problems, which is a useful discipline for developing and sharpening your worldview. Since Michael is a particularly thoughtful person, he's developed a particularly thoughtful view of computing, and so the interview is well worth reading or listening to. <blockquote> <p>This episode explores the intersections between various flavors of math and programming, and the ways in which they can be mixed, matched, and combined. Michael Arntzenius, "rntz" for short, is a PhD student at the University of Birmingham building a programming language that combines some of the best features of logic, relational, and functional programming. The goal of the project is "to find a sweet spot of something that is more powerful than Datalog, but still constrained enough that we can apply existing optimizations to it and imitate what has been done in the database community and the Datalog community." The challenge is combining the key parts of Datalog (simple relational computations without worrying too much about underlying representations) and of functional programming (being able to abstract out repeated patterns) in a way that is reasonably performant. 
<p>This is a wide-ranging conversation including: Lisp macros, FRP, Eve, miniKanren, decidability, computability, higher-order logics and their correspondence to higher-order types, lattices, partial orders, avoiding logical paradoxes by disallowing negation (or requiring monotonicity) in self-reference (or recursion), modal logic, CRDTs (which are semi-lattices), and the place for formalism in programming. This was a great opportunity for me to brush up on (or learn for the first time) some useful mathematical and type theory keywords. Hope you get a lot out of it as well -- enjoy! </blockquote> <p>I'd like to thank <a href="https://stevekrouse.com/">Steve Krouse</a> for interviewing Michael. This interview series is a really nice idea -- a lot of the intuition or worldview underpinning research programs never makes it into a paper, but can be brought out nicely in a conversation. <p>Also, if you prefer reading to listening, the link also has <a href="https://futureofcoding.org/episodes/040">a transcript of the podcast</a> available. Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com1tag:blogger.com,1999:blog-8068466035875589791.post-83779206727721221612019-07-11T08:15:00.000-07:002019-07-11T08:16:21.572-07:00All Comonads, All The Time<p>(If you are on the POPL 2020 program committee, please stop reading until after your reviews are done.)</p><p>It's been a few months since I last blogged, so I figure I should say what I have been up to lately. The answer seems to be <strong>comonads</strong>. I have two new draft papers, both of which involve comonadic type theories in a fairly critical way. 
(I suppose that this makes me a type-theoretic hipster, who doesn't do monads now that they're popular.)</p><a name='more'></a> <ul><li><p><a href="https://www.cl.cam.ac.uk/~nk480/popl20-cap-submission.pdf"><em>Recovering Purity with Comonads and Capabilities</em></a></p><p>The first paper, written with <a href="https://vikraman.org/">Vikraman Chaudhury</a> has the lovely subtitle <em>The marriage of purity and comonads</em>.</p><p>Folk wisdom has it that once your functional language supports impure features, that's game over: there's no way to recover a sublanguage which supports reasoning as if programs are pure, since higher-order functions mean you never know if a function variable could have an effect. So this is why people have historically turned to pure languages, and then isolate effects via a monadic type discipline (like Haskell) or effect system (like <a href="https://koka-lang.github.io/koka/doc/kokaspec.html">Koka</a>).</p><p>In this paper, we show that folk wisdom jumped the gun! We start with a small impure language, and then extend it by <em>adding</em> a comonadic type constructor for pure terms. This lets us do the opposite of Haskell: effects are pervasive, except in the bits where we <em>deny</em> the capability to perform them. Interestingly, the word "capability" is critical -- our denotational semantics formalizes the notion of <em>object capabilities</em>, and we show that our type system doesn't just offer the ability to enforce purity, but also notions intermediate between purity and pervasive effects, such as <em>capability safety</em>.</p><p>I'm really pleased with this paper, since it pulls together ideas people working in areas as far apart as operating systems and categorical proof theory, and does so with really elementary techniques in semantics. 
It builds on ideas that Martin Hofmann invented in his 2003 paper <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.478.5395&rep=rep1&type=pdf"><em>Linear Types and Non-Size-Increasing Polynomial Time Computation</em></a>, which is one of the papers that had a really big impact on me as a grad student.</p></li><li><p><a href="https://www.cl.cam.ac.uk/~nk480/seminaive-datafun.pdf"><em>Seminaïve Evaluation for a Higher-Order Functional Language</em></a></p><p>The next paper was written with <a href="http://www.rntz.net">Michael Arntzenius</a>.</p><p>For the last few years, we've been working on a language called <a href="http://www.rntz.net/datafun/"><em>Datafun</em></a>, which is a higher-order version of <a href="https://en.wikipedia.org/wiki/Datalog">Datalog</a>, which is either a query language akin to SQL plus recursion, or a simple bottom-up logic programming language.</p><p>We developed <a href="https://www.cl.cam.ac.uk/~nk480/datafun.pdf">a really nice semantics for Datafun</a>, but the thing about query languages is that people want them to go fast. The most important technique for making Datalog go fast is an optimization called <em>semi-naive evaluation</em>, but it was designed for a first-order logic programming language, and Datafun isn't that. So in this paper we worked out how seminaive evaluation should work for higher-order languages, and the answer involves a big old bag of tricks, starting with the <a href="https://www.informatik.uni-marburg.de/~pgiarrusso/ILC/"><em>incremental lambda calculus</em></a>, but also requiring a pretty intricate use of comonads to control how recursion is used and to propagate derivative information throughout the program. 
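</p><p>For readers unfamiliar with the optimization being generalized: in first-order Datalog, semi-naive evaluation avoids re-deriving known facts by joining only the facts that are <em>new</em> at each iteration (the "delta") against the base relations. Here is a small self-contained OCaml sketch of that classic first-order technique, computing transitive closure -- my own illustration of the background, not the paper's higher-order algorithm.</p>

```ocaml
(* Semi-naive transitive closure of an edge relation: each round joins
   only the delta (newly derived pairs) against the base edges, instead
   of re-joining the entire path relation. *)
module S = Set.Make (struct
  type t = int * int
  let compare = compare
end)

let trans_closure (edges : (int * int) list) : S.t =
  let edge = S.of_list edges in
  (* One semi-naive step: join the delta against the base edges,
     keeping only genuinely new pairs. *)
  let step path delta =
    S.fold
      (fun (x, y) acc ->
        S.fold
          (fun (y', z) acc ->
            if y = y' && not (S.mem (x, z) path) then S.add (x, z) acc
            else acc)
          edge acc)
      delta S.empty
  in
  (* Iterate to a fixed point: stop when no new facts are derived. *)
  let rec loop path delta =
    if S.is_empty delta then path
    else
      let delta' = step path delta in
      loop (S.union path delta') delta'
  in
  loop edge edge
```

<p>Naive evaluation would re-join the whole <code>path</code> relation every round, re-deriving everything it already knows; threading only the delta is the optimization the paper extends to higher-order programs by way of the incremental lambda calculus.</p><p>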
(Interestingly, Martin Hofmann also worked on using comonads to stratify recursion in his paper <a href="http://www.dcs.ed.ac.uk/home/mxh/csl97.ps.gz"><em>A Mixed Modal/Linear Lambda Calculus With Applications To Bellantoni-Cook Safe Recursion</em></a>!)</p><p>It is super satisfying to develop optimizations that make programs go <em>asymptotically</em> faster, and on top of that this technique feels like the tip of the iceberg -- the design space feels wide open to me, and I suspect static incrementalization techniques are potentially going to end up being as big a hammer as continuation-passing style.</p></li></ul>Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com2tag:blogger.com,1999:blog-8068466035875589791.post-89071138803262310642019-05-13T09:18:00.000-07:002019-05-14T06:33:42.911-07:00Implementing Inverse Bidirectional Typechecking <style type="text/css">code{white-space: pre;}</style> <style type="text/css">div.sourceCode { overflow-x: auto; } table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode { margin: 0; padding: 0; vertical-align: baseline; border: none; } table.sourceCode { width: 100%; line-height: 100%; } td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; } td.sourceCode { padding-left: 5px; } code > span.kw { color: #007020; font-weight: bold; } /* Keyword */ code > span.dt { color: #902000; } /* DataType */ code > span.dv { color: #40a070; } /* DecVal */ code > span.bn { color: #40a070; } /* BaseN */ code > span.fl { color: #40a070; } /* Float */ code > span.ch { color: #4070a0; } /* Char */ code > span.st { color: #4070a0; } /* String */ code > span.co { color: #60a0b0; font-style: italic; } /* Comment */ code > span.ot { color: #007020; } /* Other */ code > span.al { color: #ff0000; font-weight: bold; } /* Alert */ code > span.fu { color: #06287e; } /* Function */ code > span.er { color: #ff0000; font-weight: bold; } /* Error */ 
code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */ code > span.cn { color: #880000; } /* Constant */ code > span.sc { color: #4070a0; } /* SpecialChar */ code > span.vs { color: #4070a0; } /* VerbatimString */ code > span.ss { color: #bb6688; } /* SpecialString */ code > span.im { } /* Import */ code > span.va { color: #19177c; } /* Variable */ code > span.cf { color: #007020; font-weight: bold; } /* ControlFlow */ code > span.op { color: #666666; } /* Operator */ code > span.bu { } /* BuiltIn */ code > span.ex { } /* Extension */ code > span.pp { color: #bc7a00; } /* Preprocessor */ code > span.at { color: #7d9029; } /* Attribute */ code > span.do { color: #ba2121; font-style: italic; } /* Documentation */ code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */ code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */ code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */ </style><p>In my last post, I remarked that the inverse bidirectional type system was obviously algorithmic. In this post, let's implement it! What follows is a bit of OCaml code implementing the type system of the previous post.</p><p>First, let's give a data type to represent the types of the linear type system. As usual, we will have a datatype <code>tp</code> with one constructor for each grammatical production. 
In the comment next to each constructor, I'll give the term that the constructor corresponds to.</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">type</span> tp = <br /> | One <span class="co">(* represents 1 *)</span><br /> | Tensor <span class="kw">of</span> tp * tp <span class="co">(* represents A ⊗ B *)</span> <br /> | Lolli <span class="kw">of</span> tp * tp <span class="co">(* represents A ⊸ B *)</span> </code></pre></div><p>Now, we can give a datatype to represent expressions. We'll represent variables with strings, and use the datatype <code>exp</code> to represent expressions. As before, there is a comment connecting the datatype to the expressions of the grammar.</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">type</span> var = <span class="dt">string</span><br /><br /><span class="kw">type</span> <span class="dt">exp</span> = <br /> | Unit <span class="co">(* represents () *)</span><br /> | LetUnit <span class="kw">of</span> <span class="dt">exp</span> * <span class="dt">exp</span> <span class="co">(* represents let () = e in e' *)</span><br /> | Pair <span class="kw">of</span> <span class="dt">exp</span> * <span class="dt">exp</span> <span class="co">(* represents (e, e') *)</span><br /> | LetPair <span class="kw">of</span> var * var * <span class="dt">exp</span> * <span class="dt">exp</span> <span class="co">(* represents let (x,y) = e in e' *)</span><br /> | Lam <span class="kw">of</span> var * <span class="dt">exp</span> <span class="co">(* represents λx. 
 e *)</span><br /> | App <span class="kw">of</span> <span class="dt">exp</span> * <span class="dt">exp</span> <span class="co">(* represents e e' *)</span><br /> | Var <span class="kw">of</span> var <span class="co">(* represents x *)</span></code></pre></div><p>Now we have to do something annoying, and implement some functions on the option datatype which really should be in the standard library. Basically we just want the standard functional programming structure on option types -- folds, maps, and monadic structure -- so we just go ahead and implement it.</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">module</span> Option = <span class="kw">struct</span><br /> <span class="kw">type</span> 'a t = 'a <span class="dt">option</span><br /><br /> <span class="kw">let</span> map f = <span class="kw">function</span><br /> | <span class="dt">None</span> -> <span class="dt">None</span><br /> | <span class="dt">Some</span> x -> <span class="dt">Some</span> (f x)<br /><br /><br /> <span class="kw">let</span> return x = <span class="dt">Some</span> x <br /><br /> <span class="kw">let</span> fail = <span class="dt">None</span><br /><br /> <span class="kw">let</span> (>>=) m f = <br /> <span class="kw">match</span> m <span class="kw">with</span><br /> | <span class="dt">None</span> -> <span class="dt">None</span><br /> | <span class="dt">Some</span> x -> f x<br /><br /> <span class="kw">let</span> fold some none = <span class="kw">function</span><br /> | <span class="dt">None</span> -> none<br /> | <span class="dt">Some</span> x -> some x <br /><span class="kw">end</span></code></pre></div><p>Now, we can actually implement the bidirectional typechecker. 
To understand the implementation, it's actually helpful to understand the interface, first.</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">module</span> <span class="kw">type</span> TYPING = <span class="kw">sig</span><br /> <span class="kw">type</span> ctx = (var * tp <span class="dt">option</span>) <span class="dt">list</span><br /> <span class="kw">type</span> 'a t = ctx -> ('a * ctx) <span class="dt">option</span> <br /><br /> <span class="kw">val</span> map : ('a -> 'b) -> 'a t -> 'b t <br /> <span class="kw">val</span> return : 'a -> 'a t<br /> <span class="kw">val</span> ( >>= ) : 'a t -> ('a -> 'b t) -> 'b t<br /><br /> <span class="kw">val</span> synth : <span class="dt">exp</span> -> tp t<br /> <span class="kw">val</span> check : <span class="dt">exp</span> -> tp -> <span class="dt">unit</span> t</code></pre></div><p>The basic structure of our typechecker is to give a pair of operations <code>check</code> and <code>synth</code>, which respectively check that an expression <code>e</code> has a type <code>tp</code>, and infer a type for an expression. This function will be written in a monadic style, so we also have a type constructor <code>'a t</code> for typechecking computations, and the usual assortment of functorial (<code>map</code>) and monadic (<code>return</code> and <code>>>=</code>) structure for this type.</p><p>The monadic type constructor <code>'a t</code> is a pretty basic state-and-exception monad. It plumbs the context (of type <code>ctx</code>) through the computation, and can either return a value and an updated context, or it will fail.</p><p>An interesting feature of this context representation is that it does not map variables to types – it maps them to the option type <code>tp option</code>. 
This is because of the way that the moding will work out; the type is an <em>output</em> of the typing relation, and so when we put a variable into the context, we will not give it a type, but instead use the computation to ascribe it a type, which will be reflected in the output context. This is also why we use a full state monad rather than a reader monad for the context – we are basically implementing part of Prolog's substitution threading here.</p><p>We will also need a number of operations to implement the typechecker.</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">val</span> fail : 'a t<br /> <span class="kw">val</span> checkvar : var -> tp -> <span class="dt">unit</span> t<br /> <span class="kw">val</span> lookup : var -> tp t <br /> <span class="kw">val</span> withvar : var -> 'a t -> 'a t<br /> <span class="kw">val</span> tp_eq : tp -> tp -> <span class="dt">unit</span> t<br /><span class="kw">end</span></code></pre></div><p>We will need to <code>fail</code> in order to judge programs ill-typed. The <code>checkvar x tp</code> operation gives the variable <code>x</code> the type <code>tp</code>. The <code>lookup x</code> operation will look in the context to find a type for <code>x</code>, failing if <code>x</code> has not yet been given a type. The operation <code>withvar x m</code> will run the monadic computation <code>m</code> in a context extended with the variable <code>x</code>. (No type is given for the variable, because it's the job of <code>m</code> to give the variable a type.) 
Finally, there's an equality test <code>tp_eq tp1 tp2</code>, which acts as a guard, failing if the two arguments are unequal.</p><p>Now, we can move on to the actual implementation.</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span class="kw">module</span> Typing : TYPING = <span class="kw">struct</span><br /> <span class="kw">type</span> ctx = (var * tp <span class="dt">option</span>) <span class="dt">list</span> <br /><br /> <span class="kw">type</span> 'a t = ctx -> ('a * ctx) <span class="dt">option</span> <br /><br /> <span class="kw">let</span> map f m ctx = <br /> <span class="kw">let</span> <span class="kw">open</span> Option <span class="kw">in</span> <br /> m ctx >>= <span class="kw">fun</span> (x, ctx) -> <br /> return (f x, ctx)<br /><br /> <span class="kw">let</span> return x = <span class="kw">fun</span> ctx -> <span class="dt">Some</span>(x, ctx)<br /><br /> <span class="kw">let</span> (>>=) m f = <span class="kw">fun</span> ctx -> <br /> <span class="kw">let</span> <span class="kw">open</span> Option <span class="kw">in</span> <br /> m ctx >>= <span class="kw">fun</span> (a, ctx') -> <br /> f a ctx'</code></pre></div><p>As promised, the computation type is a state-and-exception monad, and the implementation of <code>map</code> and the monadic unit and bind are pretty unsurprising. 
More interesting are the implementations of the actual operations in the monadic interface.</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> fail : 'a t = <span class="kw">fun</span> ctx -> <span class="dt">None</span> </code></pre></div><p>Failure is easy to implement – it just ignores the context, and then returns <code>None</code>.</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> <span class="kw">rec</span> checkvar (x : var) (tp : tp) : <span class="dt">unit</span> t = <span class="kw">fun</span> ctx -> <br /> <span class="kw">let</span> <span class="kw">open</span> Option <span class="kw">in</span> <br /> <span class="kw">match</span> ctx <span class="kw">with</span><br /> | [] -> fail <br /> | (y, <span class="dt">None</span>) :: rest <span class="kw">when</span> x = y -> return ((), (y, <span class="dt">Some</span> tp) :: rest)<br /> | (y, <span class="dt">Some</span> _) :: rest <span class="kw">when</span> x = y -> fail <br /> | h :: rest -> checkvar x tp rest >>= <span class="kw">fun</span> ((), rest') -> <br /> return ((), h :: rest')</code></pre></div><p>The way that <code>checkvar x tp</code> works is that it iterates through the variables in the context, looking for the hypothesis which matches the variable <code>x</code>. When it finds it, it returns an updated context with the type of <code>x</code> set to <code>Some tp</code>. If the variable is already set, then that means that this is the second use of the variable, and so <code>checkvar</code> fails – this enforces the property that variables are used <em>at most</em> one time. If the variable isn't in the context, then <code>checkvar</code> also fails, because this is an out-of-scope variable reference. 
All other hypotheses are left unchanged.</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> lookup x (ctx : ctx) = <br /> <span class="kw">match</span> <span class="dt">List</span>.assoc_opt x ctx <span class="kw">with</span><br /> | <span class="dt">None</span> -> Option.fail<br /> | <span class="dt">Some</span> <span class="dt">None</span> -> Option.fail<br /> | <span class="dt">Some</span> (<span class="dt">Some</span> tp) -> Option.return(tp, ctx)</code></pre></div><p>The <code>lookup x</code> computation is even simpler – it returns <code>tp</code> if <code>(x, Some tp)</code> is in the context, and fails otherwise.</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> withvar (<span class="kw">type</span> a) (x : var) (m : a t) : a t = <span class="kw">fun</span> ctx -> <br /> <span class="kw">let</span> <span class="kw">open</span> Option <span class="kw">in</span> <br /> m ((x, <span class="dt">None</span>) :: ctx) >>= <span class="kw">function</span><br /> | (r, (y, <span class="dt">Some</span> _) :: ctx') <span class="kw">when</span> x = y -> return (r, ctx') <br /> | (r, (y, <span class="dt">None</span>) :: ctx') <span class="kw">when</span> x = y -> fail <br /> | _ -> <span class="kw">assert</span> <span class="kw">false</span></code></pre></div><p>The <code>withvar x m</code> operation extends the context with the variable <code>x</code>, and then runs <code>m</code> in the extended context.</p><p>An invariant our context representation maintains is that the output context has exactly the same variables in exactly the same order as the input context, and so we just pop off the first variable of the output context before returning, checking to make sure that the type of the variable has been set (i.e., <code>Some _</code>) to ensure that the variable was used <em>at least</em> one time. 
In conjunction with <code>checkvar</code> ensuring that the variable is used <em>at most</em> one time, this will ensure that each variable is used exactly one time.</p><p>If the first variable of the output context isn't <code>x</code>, or if the output context is empty, then our invariant is broken, and so we signal an assertion failure.</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> tp_eq (tp1 : tp) (tp2 : tp) = <span class="kw">if</span> tp1 = tp2 <span class="kw">then</span> return () <span class="kw">else</span> fail </code></pre></div><p>The <code>tp_eq tp1 tp2</code> function just turns a boolean test into a guard. Now, we can go through the synthesis and checking functions clause-by-clause:</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">let</span> <span class="kw">rec</span> synth = <span class="kw">function</span><br /> | Unit -> return One</code></pre></div><p>We synthesize the unit type for the unit value.</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> | Pair(e1, e2) -> synth e1 >>= <span class="kw">fun</span> tp1 -> <br /> synth e2 >>= <span class="kw">fun</span> tp2 -> <br /> return (Tensor(tp1, tp2))</code></pre></div><p>To synthesize a type for a pair, we synthesize types for each of the components, and then return their tensor product.</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> | Lam(x, e) -> withvar x (synth e >>= <span class="kw">fun</span> ret_tp -> <br /> lookup x >>= <span class="kw">fun</span> arg_tp -> <br /> return (Lolli(arg_tp, ret_tp)))</code></pre></div><p>Functions are interesting, because we need to deal with variables, and evaluation order plays out in a neat way here. 
We infer a type <code>ret_tp</code> for the body <code>e</code>, and then we look up the type <code>arg_tp</code> that the body <code>e</code> ascribed to the variable <code>x</code>. This lets us give a type <code>Lolli(arg_tp, ret_tp)</code> for the whole function.</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> | LetUnit(e, e') -> check e One >>= <span class="kw">fun</span> () -> <br /> synth e'</code></pre></div><p>To synthesize a type for unit elimination, we synthesize a type for the body, and check that the scrutinee has the unit type <code>One</code>.</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> | LetPair(x, y, e, e') -> <br /> withvar y (withvar x (synth e' >>= <span class="kw">fun</span> res_tp -> <br /> lookup x >>= <span class="kw">fun</span> tp1 -> <br /> lookup y >>= <span class="kw">fun</span> tp2 -> <br /> check e (Tensor(tp1, tp2)) >>= <span class="kw">fun</span> () -> <br /> return res_tp))</code></pre></div><p>To eliminate a pair, we introduce (using <code>withvar</code>) scopes for the variables <code>x</code> and <code>y</code>, and then:</p><ol style="list-style-type: decimal"><li>We synthesize a type <code>res_tp</code> for the continuation <code>e'</code>.</li><li>Since <code>e'</code> used <code>x</code> and <code>y</code>, we can look up the types they were used at (binding the type of <code>x</code> to <code>tp1</code> and the type of <code>y</code> to <code>tp2</code>).</li><li>Then, we check that the scrutinee <code>e</code> has the type <code>Tensor(tp1, tp2)</code>.</li><li>Finally, we return the type <code>res_tp</code> for the type of the whole expression.</li></ol><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> | App(_, _) -> fail <br /> | Var _ -> fail </code></pre></div><p>Since applications and variable references are checking, not synthesizing, we <code>fail</code> if we see one of them in synthesizing 
position. If they are in checking position, we can use the <code>check</code> function to typecheck them:</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> <span class="kw">and</span> check (e : <span class="dt">exp</span>) (tp : tp) : <span class="dt">unit</span> t = <br /> <span class="kw">match</span> e <span class="kw">with</span> <br /> | Var x -> checkvar x tp </code></pre></div><p>The variable case simply uses <code>checkvar</code>.</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> | App(e1, e2) -> synth e2 >>= <span class="kw">fun</span> tp_arg -> <br /> check e1 (Lolli(tp_arg, tp))</code></pre></div><p>To check an application <code>e1 e2</code> at a type <code>tp</code>, we first synthesize the argument type by inferring a type <code>tp_arg</code> for <code>e2</code>, and then we check that <code>e1</code> has the function type <code>Lolli(tp_arg, tp)</code>.</p><div class="sourceCode"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"> | e -> synth e >>= tp_eq tp<br /><span class="kw">end</span></code></pre></div><p>Finally, when we find a synthesizing term in checking position, we infer a type for it and then see if it is equal to what we expected.</p><p>This code is, at-best, lightly-tested, but I knocked together a <a href="https://github.com/neel-krishnaswami/inverse-bidirectional-typechecking">small Github repository</a> with the code. 
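</p><p>To close the loop, here is the whole development condensed into a single self-contained transcript, with the <code>Option</code> boilerplate and the module signature dropped, the monad inlined, and a couple of example runs appended. The examples, and any small deviations from the repository code, are mine.</p>

```ocaml
(* The inverse bidirectional typechecker from this post, condensed. *)
type tp = One | Tensor of tp * tp | Lolli of tp * tp

type var = string

type exp =
  | Unit
  | LetUnit of exp * exp
  | Pair of exp * exp
  | LetPair of var * var * exp * exp
  | Lam of var * exp
  | App of exp * exp
  | Var of var

type ctx = (var * tp option) list
type 'a t = ctx -> ('a * ctx) option   (* state-and-exception monad *)

let return (x : 'a) : 'a t = fun ctx -> Some (x, ctx)
let ( >>= ) (m : 'a t) (f : 'a -> 'b t) : 'b t = fun ctx ->
  match m ctx with None -> None | Some (a, ctx') -> f a ctx'
let fail : 'a t = fun _ -> None

(* Ascribe the type tp to x, failing on a second use of x. *)
let rec checkvar (x : var) (tp : tp) : unit t = fun ctx ->
  match ctx with
  | [] -> None
  | (y, None) :: rest when x = y -> Some ((), (y, Some tp) :: rest)
  | (y, Some _) :: _ when x = y -> None
  | h :: rest ->
    (match checkvar x tp rest with
     | None -> None
     | Some ((), rest') -> Some ((), h :: rest'))

let lookup (x : var) : tp t = fun ctx ->
  match List.assoc_opt x ctx with
  | Some (Some tp) -> Some (tp, ctx)
  | _ -> None

(* Run m with x in scope, insisting that m gave x a type. *)
let withvar (x : var) (m : 'a t) : 'a t = fun ctx ->
  match m ((x, None) :: ctx) with
  | Some (r, (y, Some _) :: ctx') when x = y -> Some (r, ctx')
  | Some (_, (y, None) :: _) when x = y -> None
  | Some _ -> assert false
  | None -> None

let tp_eq tp1 tp2 = if tp1 = tp2 then return () else fail

let rec synth : exp -> tp t = function
  | Unit -> return One
  | Pair (e1, e2) ->
    synth e1 >>= fun tp1 ->
    synth e2 >>= fun tp2 -> return (Tensor (tp1, tp2))
  | Lam (x, e) ->
    withvar x (synth e >>= fun ret_tp ->
               lookup x >>= fun arg_tp -> return (Lolli (arg_tp, ret_tp)))
  | LetUnit (e, e') -> check e One >>= fun () -> synth e'
  | LetPair (x, y, e, e') ->
    withvar y (withvar x (synth e' >>= fun res_tp ->
                          lookup x >>= fun tp1 ->
                          lookup y >>= fun tp2 ->
                          check e (Tensor (tp1, tp2)) >>= fun () ->
                          return res_tp))
  | App (_, _) | Var _ -> fail

and check (e : exp) (tp : tp) : unit t =
  match e with
  | Var x -> checkvar x tp
  | App (e1, e2) -> synth e2 >>= fun tp_arg -> check e1 (Lolli (tp_arg, tp))
  | e -> synth e >>= tp_eq tp

(* λx. let () = x in () synthesizes 1 ⊸ 1: the checking use of x is
   what determines the argument type. *)
let ex1 = synth (Lam ("x", LetUnit (Var "x", Unit))) []
(* = Some (Lolli (One, One), []) *)

(* An unused variable makes typing fail, since the binder is linear. *)
let ex2 = synth (Lam ("x", Unit)) []
(* = None *)
```

<p>One quirk worth noticing in this implementation: since a bare variable only checks, a term like <code>λx. x</code> does not synthesize -- a type for <code>x</code> has to flow in from some eliminating use, as in the first example.</p><p>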
Enjoy!</p>Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com2tag:blogger.com,1999:blog-8068466035875589791.post-28549785976383842052019-05-10T15:45:00.000-07:002019-05-16T03:32:28.319-07:00Inverting Bidirectional Typechecking<script type="text/javascript" async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-MML-AM_CHTML"></script><p><span class="math display">\[\newcommand{\bnfalt}{\;\;|\;\;} \newcommand{\To}{\Rightarrow} \newcommand{\From}{\Leftarrow} \newcommand{\rule}[2]{\frac{\displaystyle \begin{matrix} #1 \end{matrix}}{\displaystyle #2}} \newcommand{\check}[3]{{#1} \vdash {#2} \From {#3}} \newcommand{\synth}[3]{{#1} \vdash {#2} \To {#3}} \newcommand{\lam}[1]{\lambda {#1}.\,} \newcommand{\inj}[2]{\mathsf{in}_{#1}({#2})} \newcommand{\case}[5]{\mathsf{case}({#1}, \inj{1}{#2} \to {#3}, \inj{2}{#4} \to {#5})} \newcommand{\match}[2]{\mathsf{match}\;{#1}\;\mathsf{of}\;[ #2 ]} \newcommand{\arm}[2]{{#1} \to {#2}} \newcommand{\downshift}[1]{\downarrow{#1}} \newcommand{\upshift}[1]{\uparrow{#1}} \newcommand{\thunk}[1]{\left\{{#1}\right\}} \newcommand{\return}[1]{\mathsf{return}\;#1} \newcommand{\fun}[1]{\lambda\;#1} \newcommand{\checkn}[3]{{#1} \rhd {#2} \From {#3}} \newcommand{\checkp}[2]{{#1} \leadsto {#2}} \newcommand{\spine}[4]{{#1} \vdash {#2} : {#3} \gg {#4}} \newcommand{\tensor}{\otimes} \newcommand{\lolli}{\multimap} \newcommand{\letv}[2]{\mathsf{let}\;{#1}={#2}\;\mathsf{in}\;} \newcommand{\unit}{\langle\rangle} \newcommand{\letunit}[1]{\letv{\unit}{#1}} \newcommand{\pair}[2]{\left\langle{#1}, {#2}\right\rangle} \newcommand{\letpair}[3]{\letv{\pair{#1}{#2}}{#3}}\]</span></p><p>In the traditional recipe for bidirectional typechecking, introduction forms are checked, and the principal subterms of elimination forms are inferred.
However, a while back <a href="https://noamz.org">Noam Zeilberger</a> remarked to me that in multiplicative linear logic, bidirectional typechecking worked just as well if you did it backwards. It is worth spelling out the details of this remark, which is what this blog post does.</p><p>First, let's give the types and grammar of multiplicative linear logic.</p><p><span class="math display">\[ \begin{array}{llcl} \mbox{Types} & A & ::= & 1 \bnfalt A \tensor B \bnfalt A \lolli B \\ \mbox{Terms} & e & ::= & x \bnfalt \lam{x}{e} \bnfalt e\,e' \\ & & | & \unit \bnfalt \letunit{e}{e'} \\ & & | & \pair{e}{e'} \bnfalt \letpair{x}{y}{e}{e'} \\ \mbox{Contexts} & \Gamma & ::= & \cdot \bnfalt \Gamma, x \From A \\ \end{array} \]</span></p><p>Our types are the unit type <span class="math inline">\(1\)</span>, the tensor product <span class="math inline">\(A \tensor B\)</span>, and the linear function space <span class="math inline">\(A \lolli B\)</span>. The unit and pair have the expected introduction forms <span class="math inline">\(\unit\)</span> and <span class="math inline">\(\pair{e}{e'}\)</span>, and they have "pattern matching" style elimination forms. Functions are introduced with lambdas <span class="math inline">\(\lam{x}{e}\)</span> and eliminated with applications <span class="math inline">\(e\,e'\)</span> as usual, and of course there are variable references <span class="math inline">\(x\)</span>. Contexts are a bit unusual -- they pair together variables and their types as usual, but instead of treating a variable as a placeholder for a <em>synthesizing</em> term, we treat variables as placeholders for <em>checking</em> terms.</p><p>Now, let's go through the typing rules.
First, we give the introduction and elimination rules for the unit type.</p><p><span class="math display">\[ \begin{array}{ll} \rule{ } { \synth{\cdot}{\unit}{1} } & \rule{ \synth{\Delta}{e'}{A} & \check{\Gamma}{e}{1} } { \synth{\Gamma, \Delta}{\letunit{e}{e'}}{A} } \\[1em] \end{array} \]</span></p><p>The introduction rule says that in an empty context, the unit value <span class="math inline">\(\unit\)</span> <em>synthesizes</em> the type <span class="math inline">\(1\)</span>. The pattern-matching style elimination <span class="math inline">\(\letunit{e}{e'}\)</span> typechecks as follows. First, we infer a type <span class="math inline">\(A\)</span> for the body <span class="math inline">\(e'\)</span>, and then we check that the scrutinee <span class="math inline">\(e\)</span> has the unit type <span class="math inline">\(1\)</span>.</p><p>This order is <em>backwards</em> from traditional bidirectional systems -- we synthesize a type for the continuation first, before checking the type of the data we are eliminating. In the case of units, this is a mere curiosity, but it gets more interesting with the tensor product type <span class="math inline">\(A \tensor B\)</span>.</p><p><span class="math display">\[ \begin{array}{ll} \rule{ \synth{\Gamma}{e}{A} & \synth{\Delta}{e'}{B} } { \synth{\Gamma, \Delta}{\pair{e}{e'}}{A \tensor B} } & \rule{ \synth{\Gamma, x \From A, y \From B}{e'}{C} & \check{\Delta}{e}{A \tensor B} } { \synth{\Gamma, \Delta}{\letpair{x}{y}{e}{e'}}{C} } \end{array} \]</span></p><p>Now, the synthesis for pairs remains intuitive. For a pair <span class="math inline">\(\pair{e}{e'}\)</span>, we first infer a type <span class="math inline">\(A\)</span> for <span class="math inline">\(e\)</span>, and a type <span class="math inline">\(B\)</span> for <span class="math inline">\(e'\)</span>, and then conclude that the pair has the type <span class="math inline">\(A \tensor B\)</span>. 
However, the typing of the pair elimination <span class="math inline">\(\letpair{x}{y}{e}{e'}\)</span> is much wilder.</p><p>In this rule, we <em>first</em> synthesize a type <span class="math inline">\(C\)</span> for the continuation <span class="math inline">\(e'\)</span>, and then we learn from typechecking <span class="math inline">\(e'\)</span> that <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span> were required to have had types <span class="math inline">\(A\)</span> and <span class="math inline">\(B\)</span> respectively. This gives us the data that we need to check that <span class="math inline">\(e\)</span> has the type <span class="math inline">\(A \tensor B\)</span>.</p><p>The linear function type <span class="math inline">\(A \lolli B\)</span> has a similar character:</p><p><span class="math display">\[ \begin{array}{ll} \rule{ \synth{\Gamma, x \From A}{e}{B} } { \synth{\Gamma}{\lam{x}{e}}{A \lolli B} } & \rule{ \synth{\Gamma}{e'}{A} & \check{\Delta}{e}{A \lolli B} } { \check{\Gamma, \Delta}{e\,e'}{B} } \end{array} \]</span></p><p>Here, to infer a type for the introduction form <span class="math inline">\(\lam{x}{e}\)</span>, we infer a type <span class="math inline">\(B\)</span> for the body <span class="math inline">\(e\)</span>, and then look up what type <span class="math inline">\(A\)</span> the parameter <span class="math inline">\(x\)</span> was required to be for the body to typecheck. To check that an application <span class="math inline">\(e\,e'\)</span> has the type <span class="math inline">\(B\)</span>, we infer a type <span class="math inline">\(A\)</span> for the argument <span class="math inline">\(e'\)</span>, and then check that the function <span class="math inline">\(e\)</span> has the function type <span class="math inline">\(A \lolli B\)</span>.</p><p>Again, the checking/synthesis modes of these rules are precisely reversed from usual bidirectional type systems.
We can see how this reversal plays out for variables below:</p><p><span class="math display">\[ \begin{array}{ll} \rule{ } { \check{x \From A}{x}{A} } & \rule{ \synth{\Gamma}{e}{A} & A = B} { \check{\Gamma}{e}{B} } \end{array} \]</span></p><p>Here, when we check that the variable <span class="math inline">\(x\)</span> has the type <span class="math inline">\(A\)</span>, the context must be such that it demands <span class="math inline">\(x\)</span> to have the type <span class="math inline">\(A\)</span>. (However, the switch between checking and synthesis is the same as ever.)</p><p>If you are used to regular bidirectional systems, the information flow in the variable rule (as well as for pattern matching for pairs and lambda-abstraction for functions) is a bit unexpected. We are used to having a context tell us what types each variable has. However, in this case we are not doing that! Instead, we use it to record the types that the rest of the program requires variables to have.</p><p>This is still a "well-moded" program in the sense of logic programming. However, the moding is a bit more exotic now -- within a context, the variables are <em>inputs</em>, but their types are <em>outputs</em>. This is a bit fancier than the mode systems that usual logic programming languages have, but people have studied mode systems which can support this (such as Uday Reddy's <a href="http://www.cs.bham.ac.uk/~udr/papers/directional.pdf"><em>A Typed Foundation for Directional Logic Programming</em></a>).</p><p>As far as the metatheory of this system goes, I don't know very much about it.
Substitution works fine -- you can easily prove a theorem of the form:</p><blockquote><p><strong>Theorem</strong> <em>(Substitution)</em> If <span class="math inline">\(\check{\Delta}{e}{A}\)</span>, then</p><ol style="list-style-type: decimal"><li>If <span class="math inline">\(\check{\Gamma, x \From A, \Theta}{e'}{C}\)</span> then <span class="math inline">\(\check{\Gamma, \Delta, \Theta}{[e/x]e'}{C}\)</span>.</li><li>If <span class="math inline">\(\synth{\Gamma, x \From A, \Theta}{e'}{C}\)</span> then <span class="math inline">\(\synth{\Gamma, \Delta, \Theta}{[e/x]e'}{C}\)</span>.</li></ol></blockquote><p>However, I don't presently know a good characterization of the kinds of terms that are typable under this discipline. E.g., in the standard bidirectional presentation, the annotation-free terms are precisely the <span class="math inline">\(\beta\)</span>-normal terms. However, in the reverse bidirectional system, that is not the case.</p> <!-- In particular, an application like <span class="math inline">\((\lam{x}{x})\,\unit\)</span> will check at the type <span class="math inline">\(1\)</span>, and this is an obvious <span class="math inline">\(\beta\)</span>-redex. --><p>Two papers that seem closely related to this system are:</p><ol style="list-style-type: decimal"><li><p>Adam Chlipala, Leaf Petersen, and Robert Harper's TLDI 2005 paper, <a href="http://adam.chlipala.net/papers/StrictTLDI05/"><em>Strict Bidirectional Type Checking</em></a>, and</p></li><li><p>Ningning Xie and Bruno C. d. S. Oliveira's ESOP 2018 paper, <a href="https://xnning.github.io/papers/let-arguments-go-first.pdf"><em>Let Arguments Go First</em></a>.</p></li></ol><p>Adam and company's paper includes the traditional synthesizing bidirectional hypotheses, as well as checking hypotheses very similar to the ones in this post, but inspired by relevance logic rather than linear logic.
The basic idea is that if a hypothesis is relevant, then it is okay to let checking determine its type, since we are guaranteed that the variable will appear in some checking context (which will tell us what type it should have). The same idea applies here, since linearity necessarily implies relevance.</p><p>Ningning and Bruno's paper has an application rule that looks exactly like the one in this post -- argument types are synthesized, which permits inferring the type of a function head in a checking context. However, their system is focused on inferring polymorphic types, which makes the precise relationship a bit unclear to me.</p><p>The implementation of reverse bidirectionality is just as easy as that of traditional bidirectional systems, but I will leave that for my next post.</p>Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com3tag:blogger.com,1999:blog-8068466035875589791.post-39377271948198158182019-04-02T08:04:00.002-07:002019-04-02T08:04:34.800-07:00Some Paper Announcements<p>It has been a busy year for me so far, and it's nice to be able to tell the world about some of the fruits of that labour. In the next few months my coauthors and I will be presenting a few papers at PLDI and ECOOP. All of these papers will probably change a bit more before assuming their final forms, but I'd like to circulate their existence to give non-reviewers a chance to say their bit.</p><ul><li><p><a href="https://www.cl.cam.ac.uk/~nk480/parsing.pdf"><em>A Typed, Algebraic Approach to Parsing</em></a>, Neel Krishnaswami and Jeremy Yallop. Draft, accepted for publication at PLDI 2019.</p><blockquote><p>In this paper, we recall the definition of the context-free expressions (or µ-regular expressions), an algebraic presentation of the context-free languages.
Then, we define a core type system for the context-free expressions which gives a compositional criterion for identifying those context-free expressions which can be parsed unambiguously by predictive algorithms in the style of recursive descent or LL(1).</p><p>Next, we show how these typed grammar expressions can be used to derive a parser combinator library which both guarantees linear-time parsing with no backtracking and single-token lookahead, and which respects the natural denotational semantics of context-free expressions. Finally, we show how to exploit the type information to write a staged version of this library, which produces dramatic increases in performance, even outperforming code generated by the standard parser generator tool ocamlyacc.</p></blockquote><p>In this paper we redo some classical results in parsing theory using the tools of type theory and semantics, and the result is very satisfying. Partly, this is due to the fact that the techniques we used are actually all known, but merely by chance happened to fall out of fashion.</p><p>It was also nice seeing just how big a win staging is, and how comprehensively it erases the costs of using combinators. I'd really like to take another look at using staged type systems as the basis of a macro system a la MacroML.</p></li><li><p><a href="https://www.cl.cam.ac.uk/~nk480/numlin.pdf"><em>NumLin: Linear Types for Linear Algebra</em></a>, Dhruv Makwana and Neel Krishnaswami. Draft, conditionally accepted for publication at ECOOP 2019.</p><blockquote><p>We present NumLin, a functional programming language designed to express the APIs of low-level linear algebra libraries (such as BLAS/LAPACK) safely and explicitly, through a brief description of its key features and several illustrative examples. We show that NumLin’s type system is sound and that its implementation improves upon naïve implementations of linear algebra programs, almost towards C-levels of performance.
Lastly, we contrast it to other recent developments in linear types and show that using linear types and fractional permissions to express the APIs of low-level linear algebra libraries is a simple and effective idea.</p></blockquote><p>One of the things that has always surprised me about linear types is that they seem like a natural fit for working with array/matrix programs, and yet if you look around there is surprisingly little work on this. In this paper Dhruv Makwana and I decided to try it to see if it could actually work, and we learned three things:</p><ol style="list-style-type: decimal"><li>Linear types are indeed a good fit for array programming.</li><li>But really making it work calls for <em>fractional permissions</em>, and to date the type systems for working with fractions have been surprisingly complicated.</li><li>However, we came up with a trick for drastically reducing the implementation complexity of fractions, to the point that it's barely more complicated than standard HM type inference.</li></ol></li><li><p><a href="https://www.cl.cam.ac.uk/~nk480/wasm-pl.pdf"><em>A Program Logic for First-Order Encapsulated WebAssembly</em></a>, Conrad Watt, Petar Maksimović, Neel Krishnaswami, Philippa Gardner. Draft, accepted for publication at ECOOP 2019.</p><blockquote><p>We introduce Wasm Logic, a sound program logic for first-order, encapsulated WebAssembly.</p><p>We design a novel assertion syntax, tailored to WebAssembly's stack-based semantics and the strong guarantees given by WebAssembly's type system, and show how to adapt the standard separation logic triple and proof rules in a principled way to capture WebAssembly's uncommon structured control flow. Using Wasm Logic, we specify and verify a simple WebAssembly B-tree library, giving abstract specifications independent of the underlying implementation.</p><p>We mechanise Wasm Logic and its soundness proof in full in Isabelle/HOL.
As part of the soundness proof, we formalise and fully mechanise a novel, big-step semantics of WebAssembly, which we prove equivalent, up to transitive closure, to the original WebAssembly small-step semantics.</p></blockquote><p>For me, there was a bit of nostalgia in this paper: I started out my research career working on separation logic, and it was nice to be able to do that stuff again. Also, it was nice to find out what the heck WebAssembly actually is!</p></li></ul>Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com0tag:blogger.com,1999:blog-8068466035875589791.post-61029513934639255372018-08-10T07:41:00.000-07:002018-08-13T03:36:15.282-07:00Polarity and bidirectional typechecking <script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_CHTML-full" type="text/javascript"></script> <p>This past July, Max New asked me about the relationship between bidirectional typechecking and semantic notions like polarization and call-by-push-value. I told him it was complicated, and that I would try to write something about the relationship. I was reminded of this because, a couple of days ago, <a href="https://pigworker.wordpress.com/2018/08/06/basics-of-bidirectionalism/">Conor McBride wrote a blog post</a> where he laid out how he approaches bidirectional type systems. This is a fantastic post, and you should read it (probably twice or more, actually).</p><p>In his post, he remarks:</p><blockquote><p>I like my type systems to be bidirectional. Different people mean different things by the word “bidirectional”, and I’m one of them. Some people relate the notion to “polarity”, and I’m not one of them.</p></blockquote><p>I'm someone who would like to join the club Conor abjures, but I don't know how to!</p><p>It sure <em>looks</em> like bidirectional typechecking has a deep relationship to polarization or call-by-push-value, but I don't know how to make it work correctly.
So, my goal in this post is to explain why people think it has some deep semantic import, and then I'll talk about the mismatch that creates a problem I don't know the right way to handle. (Also, I apologize in advance for the lack of citations -- I really want to get this post out the door. If I didn't cite a paper you like (or even wrote), please link to it in the comments.)</p><p>The reason that people think that bidirectional typechecking must have a deep semantic meaning arises from how it works out in the simply-typed lambda calculus. Let's try writing some things down and seeing what happens. First, recall that bidirectional typechecking categorizes all terms as either "checking" or "synthesizing" and introduces <em>two</em> mutually recursive typing judgements for them.</p><p><span class="math display">\[\newcommand{\bnfalt}{\;\;|\;\;} \newcommand{\To}{\Rightarrow} \newcommand{\From}{\Leftarrow} \newcommand{\rule}[2]{\frac{\displaystyle \begin{matrix} #1 \end{matrix}}{\displaystyle #2}} \newcommand{\check}[3]{{#1} \vdash {#2} \From {#3}} \newcommand{\synth}[3]{{#1} \vdash {#2} \To {#3}} \newcommand{\lam}[1]{\lambda {#1}.\;} \newcommand{\inj}[2]{\mathsf{in}_{#1}({#2})} \newcommand{\case}[5]{\mathsf{case}({#1}, \inj{1}{#2} \to {#3}, \inj{2}{#4} \to {#5})} \newcommand{\match}[2]{\mathsf{match}\;{#1}\;\mathsf{of}\;[ #2 ]} \newcommand{\arm}[2]{{#1} \to {#2}} \newcommand{\downshift}[1]{\downarrow{#1}} \newcommand{\upshift}[1]{\uparrow{#1}} \newcommand{\thunk}[1]{\left\{{#1}\right\}} \newcommand{\return}[1]{\mathsf{return}\;#1} \newcommand{\fun}[1]{\lambda\;#1} \newcommand{\checkn}[3]{{#1} \rhd {#2} \From {#3}} \newcommand{\checkp}[2]{{#1} \leadsto {#2}} \newcommand{\spine}[4]{{#1} \vdash {#2} : {#3} \gg {#4}} \newcommand{\unit}{()} \]</span></p><blockquote><p><span class="math display">\[ \begin{array}{llcl} \mbox{Types} & A & ::= & b \bnfalt A \to B \\ \mbox{Contexts} & \Gamma & ::= & \cdot \bnfalt \Gamma, x:A \\ \mbox{Checking} & e & ::= & \lam{x}{e} \bnfalt t \\
\mbox{Synthesizing} & t & ::= & x \bnfalt t\;e \bnfalt e : A \\ \end{array} \]</span></p></blockquote><p>We'll start with a pretty minimal calculus -- we've got a base type <span class="math inline">\(b\)</span>, and functions. Contexts work as usual, giving variables their types, but terms are divided into <em>checking terms</em> and <em>synthesizing terms</em>. Checking terms are either introduction forms (just lambda-abstraction, here) or any synthesizing term (the intuition is that if we can infer a type for a term, we can surely check its type as well). Synthesizing terms are either variables (we can just look up their type in the context), applications, or explicitly annotated terms.</p><span class="math display">\[\begin{matrix} \rule{ \check{\Gamma, x:A}{e}{B} } { \check{\Gamma}{\lam{x}{e}}{A \to B} } & \rule{ \synth{\Gamma}{t}{A \to B} & \check{\Gamma}{e}{A} } { \synth{\Gamma}{t\;e}{B} } \\[1em] \rule{ \synth{\Gamma}{t}{A} & A = B } { \check{\Gamma}{t}{B} } & \rule{ \check{\Gamma}{e}{A} } { \synth{\Gamma}{e : A}{A} } \\[1em] \rule{ x:A \in \Gamma } { \synth{\Gamma}{x}{A} } & \end{matrix}\]</span><p>You can see the intro/elim pattern in the first line -- when we check a lambda-abstraction (an introduction form) against a type <span class="math inline">\(A \to B\)</span>, we put the variable <span class="math inline">\(x\)</span> in the context at type <span class="math inline">\(A\)</span>, and check the body at type <span class="math inline">\(B\)</span>. When we apply a function, we first <em>infer</em> a type <span class="math inline">\(A \to B\)</span> for the function expression, which gives us a type <span class="math inline">\(A\)</span> to <em>check</em> the argument against.</p><p>So far, this is pretty standard stuff.
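For concreteness, these rules can be transcribed into a few lines of OCaml (a sketch of my own, not production code), with <code>check</code> and <code>synth</code> as the two mutually recursive judgements:

```ocaml
(* The standard bidirectional rules for the STLC, as a sketch. *)
type tp = Base | Arrow of tp * tp

type tm =
  | Var of string
  | Lam of string * tm        (* checking: an introduction form *)
  | App of tm * tm            (* synthesizing: an elimination form *)
  | Annot of tm * tp          (* synthesizing: type annotation e : A *)

type ctx = (string * tp) list

let rec check (ctx : ctx) (e : tm) (tp : tp) : bool =
  match e, tp with
  | Lam (x, body), Arrow (a, b) -> check ((x, a) :: ctx) body b
  | Lam _, _ -> false
  | t, _ -> synth ctx t = Some tp      (* switch: synthesize, then compare *)

and synth (ctx : ctx) (t : tm) : tp option =
  match t with
  | Var x -> List.assoc_opt x ctx
  | Annot (e, tp) -> if check ctx e tp then Some tp else None
  | App (t1, e2) ->
      (match synth ctx t1 with
       | Some (Arrow (a, b)) -> if check ctx e2 a then Some b else None
       | _ -> None)
  | Lam _ -> None                      (* introductions do not synthesize *)

(* (λx. x : b → b) y synthesizes b when y : b: *)
let ex =
  synth [ ("y", Base) ]
    (App (Annot (Lam ("x", Var "x"), Arrow (Base, Base)), Var "y"))
```

Here <code>ex</code> evaluates to <code>Some Base</code>; note how the annotation on the redex is what lets the application synthesize.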
Now, let's tweak the rules slightly.</p><span class="math display">\[\begin{matrix} \rule{ \check{\Gamma, x:A}{e}{B} } { \check{\Gamma}{\lam{x}{e}}{A \to B} } & \rule{ \synth{\Gamma}{t}{A \to B} & \check{\Gamma}{e}{A} } { \synth{\Gamma}{t\;e}{B} } \\[1em] \rule{ \color{red}{\synth{\Gamma}{t}{b}} } { \color{red}{\check{\Gamma}{t}{b}} } & \color{red}{\mbox{(No annotation rule)}} \\[1em] \rule{ x:A \in \Gamma } { \synth{\Gamma}{x}{A} } & \end{matrix}\]</span><p>Now, we've made two changes. First, we've <em>deleted</em> the annotation rule, and second, we've restricted the checking/synthesis switch to only occur at base types <span class="math inline">\(b\)</span>.</p><ol style="list-style-type: decimal"><li><p>The effect of deleting the annotation rule is that it is no longer possible to write beta-reducible terms. A term in the simply-typed lambda-calculus written <span class="math inline">\((\lam{x:b}{x})\;y\)</span> which reduces to <span class="math inline">\(y\)</span> would be written <span class="math inline">\((\lam{x}{x} : b \to b)\;y\)</span> in a bidirectional system -- but without annotations these terms can no longer be written.</p></li><li><p>The effect of restricting the check/synth switch rule is that it is no longer possible to write eta-expandable terms. If <span class="math inline">\(f : b \to b \to b\)</span> and <span class="math inline">\(x : b\)</span>, then the term <span class="math inline">\(f\;x\)</span> would synthesize the type <span class="math inline">\(b \to b\)</span> in the original system. However, it no longer checks against <span class="math inline">\(b \to b\)</span> in our restricted system. To make it work, we have to <em>eta-expand</em> the term, so that we write <span class="math inline">\(\lam{y}{f\;x\;y}\)</span> instead. This now checks against <span class="math inline">\(b \to b\)</span> as we expect.</p></li></ol><p>So the joint effect of these two restrictions is that only beta-normal, eta-long terms typecheck.
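The same sketch with the two tweaks applied -- deleting <code>Annot</code> and switching modes only at the base type -- shows the eta-expansion requirement concretely (again my own transcription, redefining the datatypes without the annotation form):

```ocaml
(* The restricted rules: no annotations, mode switch only at base type. *)
type tp = Base | Arrow of tp * tp
type tm = Var of string | Lam of string * tm | App of tm * tm

let rec check ctx e tp =
  match e, tp with
  | Lam (x, body), Arrow (a, b) -> check ((x, a) :: ctx) body b
  | Lam _, _ -> false
  | t, Base -> synth ctx t = Some Base   (* switch only at the base type *)
  | _, Arrow _ -> false                  (* this is what forces eta-long forms *)

and synth ctx t =
  match t with
  | Var x -> List.assoc_opt x ctx
  | App (t1, e2) ->
      (match synth ctx t1 with
       | Some (Arrow (a, b)) -> if check ctx e2 a then Some b else None
       | _ -> None)
  | Lam _ -> None

(* With f : b → b → b and x : b, the application f x no longer checks
   against b → b, but its eta-expansion λy. f x y does: *)
let ctx = [ ("f", Arrow (Base, Arrow (Base, Base))); ("x", Base) ]
let bad  = check ctx (App (Var "f", Var "x")) (Arrow (Base, Base))
let good = check ctx (Lam ("y", App (App (Var "f", Var "x"), Var "y"))) (Arrow (Base, Base))
```

Here <code>bad</code> is <code>false</code> and <code>good</code> is <code>true</code>, matching the eta-expansion story above.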
The reason these terms are so important is that any two beta-eta equal terms will have the <em>same</em> normal form. So having a simple, easy characterization of these normal forms is really great! Moreover, this characterization is easy to extend to products:</p><span class="math display">\[\begin{matrix} \rule{ \check{\Gamma, x:A}{e}{B} } { \check{\Gamma}{\lam{x}{e}}{A \to B} } & \rule{ \synth{\Gamma}{t}{A \to B} & \check{\Gamma}{e}{A} } { \synth{\Gamma}{t\;e}{B} } \\[1em] \rule{ } { \check{\Gamma}{()}{1} } & \mbox{(No unit elimination rule)} \\[1em] \rule{ \check{\Gamma}{e_1}{A_1} & \check{\Gamma}{e_2}{A_2} } { \check{\Gamma}{(e_1, e_2)}{A_1 \times A_2} } & \rule{ \synth{\Gamma}{e}{A_1 \times A_2} & i \in \{1,2\} } { \synth{\Gamma}{\pi_i(e)}{A_i} } \\[1em] \rule{ \synth{\Gamma}{t}{b} } { \check{\Gamma}{t}{b} } & \mbox{(No annotation rule)} \\[1em] \rule{ x:A \in \Gamma } { \synth{\Gamma}{x}{A} } & \end{matrix}\]</span><p>This type system still characterizes normal forms in the STLC with units and products. Adding these constructors starts to give us a pattern:</p><ol style="list-style-type: decimal"><li>Introduction forms (lambda-abstractions, pairs, units) are <em>checking</em>.</li><li>Elimination forms (applications, projections) are <em>synthesizing</em>.</li></ol><p>Since units + pairs + functions are syntax for everything in cartesian closed categories, this is actually pretty wonderful. We seem to have a simple rule for characterizing beta-normal, eta-long forms.</p><p>But what happens when we try to add sums to the language? Let's try to follow our recipe, and see what happens:</p><span class="math display">\[\begin{matrix} \rule{ \check{\Gamma}{e}{A_i} & i \in \{1,2\} } { \check{\Gamma}{\inj{i}{e}}{A_1 + A_2} } & \rule{ \synth{\Gamma}{t}{A_1 + A_2} & \check{\Gamma, x_1:A_1}{e_1}{C} & \check{\Gamma, x_2:A_2}{e_2}{C} } { \check{\Gamma}{\case{t}{x_1}{e_1}{x_2}{e_2}}{C} } \end{matrix}\]</span><p>The introduction form seems to work. 
The elimination form is a bit more complicated -- it's the same syntax as always, but the checking/synthesis moding is a bit subtle. The expectation created by units/pairs/functions would be that both the scrutinee and the whole case form should synthesize, but expecting two branches with different contexts (i.e., <span class="math inline">\(\Gamma, x_1:A_1\)</span> for <span class="math inline">\(e_1\)</span> and <span class="math inline">\(\Gamma, x_2:A_2\)</span> for <span class="math inline">\(e_2\)</span>) to synthesize the same type is a morally dubious expectation (eg, it would not make sense in a dependently-typed setting). So we are led to say that case is checking, but that the scrutinee is synthesizing.</p><p>This imposes some restrictions on what does and doesn't count as a typeable term. For example, because <span class="math inline">\(\mathsf{case}\)</span> is checking rather than synthesizing, we can never write an expression like:</p><blockquote><p><span class="math display">\[a:((b \to A) + (b \to A)), x:b \vdash \case{a}{f}{f}{g}{g}\;x \From A\]</span></p></blockquote><p>Instead of applying an argument to a case expression of function type, we have to push the arguments into the branches:</p><blockquote><p><span class="math display">\[a:((b \to A) + (b \to A)), x:b \vdash \case{a}{f}{f\;x}{g}{g\;x} \From A\]</span></p></blockquote><p>From the point of view of typing normal forms, this actually doesn't seem too bad, because most people would consider the second term simpler than the first, and so this gives us a "nicer" notion of normal form. 
However, this doesn't seem like a real explanation, since our rules permit things like the following:</p><blockquote><p><span class="math display">\[\synth{f : b \to b, x:b+b}{f\;\case{x}{y}{y}{z}{z}}{b}\]</span></p></blockquote><p>To get to a better explanation before the heat death of the universe, I'm going to skip over about 20 years of research, and jump straight to polarized type theory.</p><span class="math display">\[\begin{matrix} \mbox{Positive Types} & P & ::= & 1 \bnfalt P \times Q \bnfalt P + Q \bnfalt \downshift{N} \\ \mbox{Negative Types} & N & ::= & P \to N \bnfalt \upshift{P} \\ \mbox{Values} & v & ::= & () \bnfalt (v,v) \bnfalt \inj{i}{v} \bnfalt \thunk{t} \\ \mbox{Spines} & s & ::= & \cdot \bnfalt v\;s \\[1em] \mbox{Terms} & t & ::= & \return{v} \bnfalt \fun{\overrightarrow{\arm{p_i}{t_i}}} \bnfalt \match{x \cdot s}{\overrightarrow{\arm{p_i}{t_i}}} \\ \mbox{Patterns} & p & ::= & () \bnfalt (p,p') \bnfalt \inj{i}{p} \bnfalt \thunk{x} \\[1em] \mbox{Contexts} & \Gamma,\Delta & ::= & \cdot \bnfalt \Gamma, x:N \\ \mbox{Typing Values} & & & \check{\Gamma}{v}{P} \\ \mbox{Typing Spines} & & & \spine{\Gamma}{s}{N}{M} \\ \mbox{Typing Terms} & & & \checkn{\Gamma}{t}{N} \\ \mbox{Typing Patterns} & & & \checkp{p:P}{\Delta} \\ \end{matrix}\]</span><p>The key idea in polarized type theory is to divide types into two categories, the <em>positive</em> types (sums, strict products, and suspended computations, denoted by <span class="math inline">\(P\)</span>) and the <em>negative</em> types (basically, functions, denoted by <span class="math inline">\(N\)</span>). Positive types are basically those that are eliminated with pattern matching, and the negative types are the ones that are eliminated by supplying arguments. 
Negative types can be embedded into positive types using the "downshift" type <span class="math inline">\(\downshift{N}\)</span> (representing suspended computations) and positive types can be embedded into the negatives using the "upshift" <span class="math inline">\(\upshift{P}\)</span> (denoting computations producing <span class="math inline">\(P\)</span>'s).</p><p>The funny thing about this setup is that despite arising from meditation upon invariants of proof theory, we end up with a syntax that is much closer to practical functional programming languages than the pure typed lambda calculus! For example, syntax for polarized calculi tends to have <em>pattern matching</em>. However, one price of this is a proliferation of judgements. We usually end up introducing separate categories of <em>values</em> (for introducing positive types) and <em>spines</em> (argument lists for eliminating negative types), as well as <em>terms</em> (how to put values and spines together in computations, as well as introducing negative types) and <em>patterns</em> (how to eliminate positive types).</p><p>Now, let's talk through the rules. First up is the <span class="math inline">\(\check{\Gamma}{v}{P}\)</span> judgement for checking the type of positive values.</p><span class="math display">\[\begin{matrix} \rule{} { \check{\Gamma}{()}{1} } & \rule{ \check{\Gamma}{v}{P} & \check{\Gamma}{v'}{Q} } { \check{\Gamma}{(v,v')}{P \times Q} } \\[1em] \rule{ \check{\Gamma}{v}{P_i} & i \in \{1,2\} } { \check{\Gamma}{\inj{i}{v}}{P_1 + P_2} } & \rule{ \checkn{\Gamma}{t}{N} } { \check{\Gamma}{\thunk{t}}{\downshift{N}} } \end{matrix}\]</span><p>The rules for units, pairs and sums are the same as always.
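As an aside, the polarized grammar and the value judgement transliterate quite directly into OCaml. This is a sketch of my own (not code from any paper), with the term judgement -- which thunks need -- abstracted as a parameter:

```ocaml
(* The polarized grammar as OCaml datatypes, plus the value judgement
   G |- v <= P. [check_term] stands in for the term judgement G |> t <= N. *)
type pos = One | Times of pos * pos | Sum of pos * pos | Down of neg
and neg = Arrow of pos * neg | Up of pos

type value = Unit | VPair of value * value | Inj of int * value | Thunk of term
and term =
  | Return of value
  | Fun of (pat * term) list                          (* fun [p1 -> t1, ...] *)
  | Match of string * value list * (pat * term) list  (* match x . s of [...] *)
and pat = PUnit | PPair of pat * pat | PInj of int * pat | PThunk of string

type ctx = (string * neg) list

let rec check_value (check_term : ctx -> term -> neg -> bool)
    (ctx : ctx) (v : value) (p : pos) : bool =
  match v, p with
  | Unit, One -> true
  | VPair (v1, v2), Times (p1, p2) ->
      check_value check_term ctx v1 p1 && check_value check_term ctx v2 p2
  | Inj (i, v), Sum (p1, p2) ->
      (i = 1 && check_value check_term ctx v p1)
      || (i = 2 && check_value check_term ctx v p2)
  | Thunk t, Down n -> check_term ctx t n   (* thunks defer to the term judgement *)
  | _ -> false
```

For example, with any <code>check_term</code>, the value <code>VPair (Unit, Inj (1, Unit))</code> checks against <code>Times (One, Sum (One, One))</code>, i.e. <code>((), in1 ())</code> at <code>1 × (1 + 1)</code>.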
The rule for downshift says that if a term <span class="math inline">\(t\)</span> checks at a negative type <span class="math inline">\(N\)</span>, then the thunked term <span class="math inline">\(\thunk{t}\)</span> will check against the downshifted type <span class="math inline">\(\downshift{N}\)</span>.</p><p>We'll see the rules for terms in a bit, but next will come the rules for spines, in the judgement <span class="math inline">\(\spine{\Gamma}{s}{N}{M}\)</span>. This judgement says that if the spine <span class="math inline">\(s\)</span> is applied to a head of type <span class="math inline">\(N\)</span>, it will produce a result of type <span class="math inline">\(M\)</span>. In this judgement, the type <span class="math inline">\(N\)</span> is an algorithmic input, and the type <span class="math inline">\(M\)</span> is an output.</p><span class="math display">\[\begin{matrix} \rule{ } { \spine{\Gamma}{\cdot}{N}{N} } \qquad \rule{ \check{\Gamma}{v}{P} & \spine{\Gamma}{s}{N}{M} } { \spine{\Gamma}{v\;s}{P \to N}{M} } % \\[1em] % \rule{ \forall i < n.\; & \checkp{p_i:P}{\Delta_i} & \checkn{\Gamma, \Delta_i}{t_i}{M} } % { \spine{\Gamma}{\match{x \cdot s}{\overrightarrow{\arm{p_i}{t_i}}^{i < n}}}{\upshift{P}}{M} } \end{matrix}\]</span><p>The first rule says that if you have an empty argument list then the result is the same as the input, and the second rule says that if <span class="math inline">\(v\)</span> is a value of type <span class="math inline">\(P\)</span>, and <span class="math inline">\(s\)</span> is an argument list sending <span class="math inline">\(N\)</span> to <span class="math inline">\(M\)</span>, then the extended argument list <span class="math inline">\(v\;s\)</span> sends the function type <span class="math inline">\(P \to N\)</span> to <span class="math inline">\(M\)</span>.</p><p>With values and spines in hand, we can talk about terms, in the term typing judgement <span class="math inline">\(\checkn{\Gamma}{t}{N}\)</span>, which checks 
that a term <span class="math inline">\(t\)</span> has the type <span class="math inline">\(N\)</span>.</p><span class="math display">\[\begin{matrix} \rule{ \check{\Gamma}{v}{P} } { \checkn{\Gamma}{\return{v}}{\upshift{P}} } \qquad \rule{ \forall i < n.\; & \checkp{p_i:P}{\Delta_i} & \checkn{\Gamma, \Delta_i}{t_i}{N} } { \checkn{\Gamma}{\fun{\overrightarrow{\arm{p_i}{t_i}}^{i < n}}}{P \to N} } \\[1em] \rule{ x:M \in \Gamma & \spine{\Gamma}{s}{M}{\upshift{Q}} & \forall i < n.\; \checkp{p_i:Q}{\Delta_i} & \checkn{\Gamma, \Delta_i}{t_i}{\upshift{P}} } { \checkn{\Gamma}{\match{x \cdot s}{\overrightarrow{\arm{p_i}{t_i}}^{i < n}}}{\upshift{P}} } \end{matrix}\]</span><p>The rule for <span class="math inline">\(\return{v}\)</span> says that we can embed a value <span class="math inline">\(v\)</span> of type <span class="math inline">\(P\)</span> into the upshift type <span class="math inline">\(\upshift{P}\)</span> by immediately returning it. Lambda abstractions are pattern-style -- instead of a lambda binder <span class="math inline">\(\lam{x}{t}\)</span>, we give a whole list of patterns and branches <span class="math inline">\(\fun{\overrightarrow{\arm{p_i}{t_i}}}\)</span> to check at the type <span class="math inline">\(P \to N\)</span>. As a result, we need a judgement <span class="math inline">\(\checkp{p_i:P}{\Delta_i}\)</span> which gives the types of the bindings <span class="math inline">\(\Delta_i\)</span> of the pattern <span class="math inline">\(p_i\)</span>, and then we check each <span class="math inline">\(t_i\)</span> against the result type <span class="math inline">\(N\)</span>.</p><p>The match statement <span class="math inline">\(\match{x\cdot s}{\overrightarrow{\arm{p_i}{t_i}}}\)</span> also has similar issues in its typing rule. 
First, it finds a variable in the context, applies some arguments to it to find a value result of type <span class="math inline">\(\upshift{Q}\)</span>, and then pattern matches against type <span class="math inline">\(Q\)</span>. So we check that the spine <span class="math inline">\(s\)</span> sends <span class="math inline">\(M\)</span> to the type <span class="math inline">\(\upshift{Q}\)</span>, and then check that the patterns <span class="math inline">\(p_i\)</span> yield variables <span class="math inline">\(\Delta_i\)</span> at the type <span class="math inline">\(Q\)</span>, and then check the <span class="math inline">\(t_i\)</span> against the type <span class="math inline">\(\upshift{P}\)</span>.</p><p>Restricting the type at which we can match forces us to eta-expand terms of function type. Also, these rules omit a side-condition for pattern coverage. (I have <a href="http://semantic-domain.blogspot.com/2012/08/pattern-compilation-made-easy.html">an old blog post</a> about how to do that if you are curious.)</p>Both lambda-abstraction and application/pattern-matching need the judgement <span class="math inline">\(\checkp{p:P}{\Delta}\)</span> to find the types of the bindings. 
The rules for these are straightforward: <span class="math display">\[\begin{matrix} \rule{ } { \checkp{\thunk{x} {:} \downshift{\!N}}{x:N} } & \rule{ } { \checkp{\unit : 1}{\cdot} } \\[1em] \rule{ \checkp{p_1 : P_1}{\Delta_1} & \checkp{p_2 : P_2}{\Delta_2} } { \checkp{(p_1,p_2) : P_1 \times P_2}{\Delta_1, \Delta_2} } & \rule{ \checkp{p:P_i}{\Delta} & i \in \{1,2\} } { \checkp{\inj{i}{p} : P_1 + P_2}{\Delta} } \end{matrix}\]</span><p>Units yield no variables at type <span class="math inline">\(1\)</span>, pair patterns <span class="math inline">\((p_1, p_2)\)</span> return the variables of each component, coproduct injections <span class="math inline">\(\inj{i}{p}\)</span> return the variables of the sub-pattern <span class="math inline">\(p\)</span>, and thunk patterns <span class="math inline">\(\thunk{x}\)</span> at type <span class="math inline">\(\downshift{N}\)</span> return that variable <span class="math inline">\(x\)</span> at type <span class="math inline">\(N\)</span>.</p>At this point, it sure looks like we have a perfect bidirectional type system for a polarized calculus. What's the problem? The problem is that I palmed a card! Here's the relevant bit of the grammar I kicked things off with: <span class="math display">\[\begin{matrix} \ldots \\ \mbox{Contexts} & \Gamma,\Delta & ::= & \cdot \bnfalt \Gamma, \color{red}{x:N} \\ \ldots \end{matrix}\]</span><p>The context <span class="math inline">\(\Gamma\)</span> has been restricted to <em>only</em> contain variables of negative type. It doesn't allow variables of positive type! And, I don't know how to add it in the "right" way. 
If we wanted positive variables (in fact, call-by-push-value <em>only</em> has positive variables), we could add them in the following way:</p><span class="math display">\[\begin{matrix} \mbox{Values} & v & ::= & () \bnfalt (v,v) \bnfalt \inj{i}{v} \bnfalt \thunk{t} \bnfalt \color{red}{u} \\ \mbox{Contexts} & \Gamma,\Delta & ::= & \cdot \bnfalt \Gamma, x:N \bnfalt \Gamma, \color{red}{u:P} \\ \mbox{Patterns} & p & ::= & () \bnfalt (p,p') \bnfalt \inj{i}{p} \bnfalt \thunk{x} \bnfalt \color{red}{u} \\[1em] \end{matrix}\]</span>So we add value variables <span class="math inline">\(u\)</span> to the syntax of values, which means we also have to add them to contexts, and to extend pattern matching with them to bind values. Then, the rules for these things would look like the following: <span class="math display">\[\begin{matrix} \rule{ } {\checkp{u:P}{u:P}} & \rule{ u:Q \in \Gamma & Q = P } { \check{\Gamma}{u}{P} } \end{matrix}\]</span><p>So a variable pattern at value type simply binds the variable at that type, and when we use a value variable we have to check that the type in the context matches the type that we're checking the term at.</p><p>And that's the wrong thing to do!
The bidirectional recipe says that we should check equality of types <em>only</em> when we switch between checking and synthesis, and so while this rule might or might not <em>work</em>, it clearly hasn't arranged the information flow properly, since we have a random-looking subsumption test in the value variable rule.</p><p>Some additional idea is needed, and I'm not sure what, yet.</p>Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com14tag:blogger.com,1999:blog-8068466035875589791.post-51236290188326368272018-08-06T08:05:00.000-07:002018-08-06T08:05:28.145-07:00Category Theory in PL research<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_CHTML-full" type="text/javascript"></script><p><a href="https://twitter.com/maxsnew/status/1024395703093813249">Over on Twitter, Max New wrote</a>:</p><blockquote><p>Maybe the reason category theory seems useless for PL to many people is that it only gets really useful when you start doing "sophisticated" stuff like internal, enriched and fibered categories and the intros for computer scientists don't get to those things.</p></blockquote><p>Max is a very sharp person, but I do not agree with this!</p><p>The most useful applications of category theory in my own research have all been very, very elementary, to the point that people who (unlike me) are genuinely good at category theory are always a little surprised that I can get away with such primitive machinery.</p><p>Basically, the key idea is to formulate models of languages/libraries in terms of categories of structured sets with structure-preserving maps (aka homomorphisms).
And, well, that's it!</p><p>I don't have a really crisp explanation of the process of figuring out how to go from "I have a property I'd like to represent mathematically" to "sets should have the following structure and the following properties should be preserved by the homomorphisms", but I can give an example to illustrate the recipe.</p><ol style="list-style-type: decimal"><li><p>Let's suppose we want to control the memory usage of programs.</p></li><li><p>Think of a type as a set, together with some operations talking about the property in question. In this case, let's say that a type is a pair <span class="math inline">\((X \in \mathrm{Set}, w_X : X \to \mathbb{N})\)</span>, where <span class="math inline">\(X\)</span> are the elements of the type and the "weight function" <span class="math inline">\(w_X\)</span> gives you a number for each value telling you how much memory gets used.</p></li><li><p>Now, let's define what the functions <span class="math inline">\(f : (X, w_X) \to (Y, w_Y)\)</span> should be. Since types are sets, obviously a function between two types should be a function <span class="math inline">\(f : X \to Y\)</span> on the underlying sets. However, we have a weight function for each type, and so we should do something sensible with them. The first, most obvious idea is to say that maps should preserve the weight exactly -- i.e., that <span class="math inline">\(w_Y(f(x)) = w_X(x)\)</span>.</p><p>This means that memory is never allocated, but it also means that memory is never <em>de</em>-allocated. So since we probably do want to allow deallocation, it makes sense to weaken this condition to something more like <span class="math inline">\(w_Y(f(x)) \leq w_X(x)\)</span>.</p></li><li><p>Now, we go off and prove things about this language.
In short order we will discover that this is a monoidal closed category, and so we can give a linearly-typed language for memory-controlled programs.</p></li></ol><p>This is basically the idea in the late Martin Hofmann's 2003 paper, <a href="https://www.sciencedirect.com/science/article/pii/S0890540103000099"><em>Linear types and non-size-increasing polynomial time computation</em></a>. He called this category the category of "length spaces". This paper was a revelation to me, because it was the first time that I really understood how plain old ordinary mathematics could be used to model genuinely computational phenomena.</p><p>There are a <em>lot</em> of examples like this -- here are four more I like:</p><ul><li><p>Domains and continuous functions</p><p>The great grand-parent of using structured sets to model computations is, of course, domain theory. In this setting, a type is a set along with a complete partial order on it, and functions must be continuous with respect to this order. The intuition is that an always-nonterminating program is the least informative program, and that as programs loop on fewer and fewer inputs, they get more and more defined. Continuity basically means that giving a program a more-defined input will never make it loop more often.</p><p><a href="https://www.cs.bham.ac.uk/~sjv/">Steve Vickers</a>'s book <a href="http://admin.cambridge.org/academic/subjects/computer-science/programming-languages-and-applied-logic/topology-logic"><em>Topology via Logic</em></a> was how I learned to think about domains in a way that brought the computational intuition to the forefront.</p></li><li><p>Nominal sets and equivariant maps</p><p>In languages like ML, Haskell and Java, it is permissible to create pointers and test them for pointer equality, but it is not permissible to compare pointers for order (you can use <code>==</code> but you can't use <code><</code>). 
This means that the order in which pointers are created is not observable, a fact which is very important for both the equational theory and optimization of programs.</p><p>To model this, we can introduce a set of locations (aka names, or atoms). To model the fact that we can't observe the order, we can say that we want the meanings of programs to be independent of permutations of the names -- as long as we don't identify two locations, the program can't tell if we reorder the locations in the heap.</p><p>So, we can abstractly model the idea of reordering locations by saying that a type is a pair <span class="math inline">\((X, a : Perm \times X \to X)\)</span>, where <span class="math inline">\(X\)</span> is a set of values, and the "action" <span class="math inline">\(a\)</span> tells you how to rename all the values in the set when you are given a permutation. Then, a morphism <span class="math inline">\(f : (X, a) \to (Y, b)\)</span> will be a function such that <span class="math inline">\(b(\pi, f(x)) = f(a(\pi, x))\)</span> -- that is, we get the same answer whether we rename the argument to <span class="math inline">\(f\)</span> or apply the function and then rename.</p><p>I learned this from <a href="https://www.cl.cam.ac.uk/~amp12">Andy Pitts</a>'s work on <em>nominal logic</em> -- here is <a href="https://www.cl.cam.ac.uk/~amp12/papers/nomt/nomt.pdf">a nice introduction for SIGLOG news</a> which he wrote.</p></li><li><p>Partial orders and monotone maps</p><p>My student <a href="http://www.rntz.net/">Michael Arntzenius</a> has been looking at higher-order database query languages, and one of the ideas we have been working with (in <a href="http://www.rntz.net/datafun/index.html">the language <em>Datafun</em></a>), has been inspired by database query languages and dataflow analysis, where recursive queries are formulated as fixed points of monotone functions on partial orders.</p></li><li><p>Ultrametric spaces and non-expansive maps</p><p>In joint work with 
<a href="https://research.fb.com/people/benton-nick/">Nick Benton</a>, we used the category of (wait for it) complete 1-bounded bisected ultrametric spaces and non-expansive maps to model reactive computations.</p><p>Our goal was to model reactive programs, which have the property of <em>causality</em>. That is, an interactive system can only produce an output at time <span class="math inline">\(n\)</span> from the first <span class="math inline">\(n\)</span> inputs -- it can't look into the future to make an action.</p><p>We began with this standard definition of causality, which was defined on streams, and then -- after reading the classic paper of Alpern and Schneider <a href="https://www.cs.cornell.edu/fbs/publications/defliveness.pdf"><em>Defining Liveness</em></a> -- we noticed that it corresponded to nonexpansiveness when streams are given the Cantor metric (i.e., the distance between two streams is <span class="math inline">\(2^{-n}\)</span> when they first differ at the <span class="math inline">\(n\)</span>-th position). This led us to look at categories of metric spaces, and we were able to adapt some work of <a href="https://cs.au.dk/~birke/">Lars Birkedal</a> and his collaborators to devise <a href="https://www.cl.cam.ac.uk/~nk480/frp-lics11.pdf">a new language for reactive programming</a>.</p></li></ul><p>One point I want to make is that while this is not a mechanical process, it is also "ordinary mathematics". In the last two cases, we had to try very hard to understand the problems we were trying to solve, but we didn't need vast amounts of categorical machinery. 
We really only needed elementary definitions from category theory, and we needed it because it was the technology that let us recognise when we had actually found a nice algebraic solution to our problem.</p>Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com2tag:blogger.com,1999:blog-8068466035875589791.post-82713496663560627642018-08-03T03:19:00.000-07:002018-08-03T03:19:43.533-07:00The Worm Ouroboros <div> <style type="text/css">code{white-space: pre;}</style> <style type="text/css">div.sourceCode { overflow-x: auto; } table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode { margin: 0; padding: 0; vertical-align: baseline; border: none; } table.sourceCode { width: 100%; line-height: 100%; } td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; } td.sourceCode { padding-left: 5px; } code > span.kw { color: #007020; font-weight: bold; } /* Keyword */ code > span.dt { color: #902000; } /* DataType */ code > span.dv { color: #40a070; } /* DecVal */ code > span.bn { color: #40a070; } /* BaseN */ code > span.fl { color: #40a070; } /* Float */ code > span.ch { color: #4070a0; } /* Char */ code > span.st { color: #4070a0; } /* String */ code > span.co { color: #60a0b0; font-style: italic; } /* Comment */ code > span.ot { color: #007020; } /* Other */ code > span.al { color: #ff0000; font-weight: bold; } /* Alert */ code > span.fu { color: #06287e; } /* Function */ code > span.er { color: #ff0000; font-weight: bold; } /* Error */ code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */ code > span.cn { color: #880000; } /* Constant */ code > span.sc { color: #4070a0; } /* SpecialChar */ code > span.vs { color: #4070a0; } /* VerbatimString */ code > span.ss { color: #bb6688; } /* SpecialString */ code > span.im { } /* Import */ code > span.va { color: #19177c; } /* Variable */ code > span.cf { color: #007020; font-weight: bold; } 
/* ControlFlow */ code > span.op { color: #666666; } /* Operator */ code > span.bu { } /* BuiltIn */ code > span.ex { } /* Extension */ code > span.pp { color: #bc7a00; } /* Preprocessor */ code > span.at { color: #7d9029; } /* Attribute */ code > span.do { color: #ba2121; font-style: italic; } /* Documentation */ code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */ code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */ code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */ </style><p>Here's a quick question: can the behaviour of the following bit of C code depend upon the semantics of the Python programming language?</p><div class="sourceCode"><table class="sourceCode c numberLines"><tr class="sourceCode"><td class="lineNumbers"><pre>1<br />2<br />3<br />4<br />5<br />6<br />7<br /></pre></td><td class="sourceCode"><pre><code class="sourceCode c"><span class="dt">int</span> sum(<span class="dt">size_t</span> len, <span class="dt">char</span> *array) { <br /> <span class="dt">int</span> total = <span class="dv">0</span>;<br /> <span class="cf">for</span> (<span class="dt">int</span> i = <span class="dv">0</span>; i < len; i++) { <br /> total += array[i];<br /> }<br /> <span class="cf">return</span> total;<br />}</code></pre></td></tr></table></div><p>Now, the very fact that I am asking this question probably makes you suspicious. (This suspicion is correct and natural, and incidentally also indicates that you basically understand Derrida's critique of the metaphysics of presence.) Luckily, as lecturers go, I'm a very generous grader: I'll accept both no <em>and</em> yes as answers!</p><p>The "no" answer is easy enough to explain: the C language specification defines an (informal) abstract machine, and the behaviour of the <code>sum</code> function can be explained entirely in terms of that abstract machine.
Python never shows up in this semantics, so how on Earth could it possibly be relevant to the behaviour of this code?</p><p>To get to "yes", let's look at another bit of C code (please forgive the total lack of error handling, as well as the absence of all includes):</p><div class="sourceCode"><table class="sourceCode c numberLines"><tr class="sourceCode"><td class="lineNumbers"><pre>1<br />2<br />3<br />4<br />5<br />6<br />7<br />8<br />9<br />10<br />11<br />12<br />13<br />14<br />15<br />16<br />17<br />18<br />19<br /></pre></td><td class="sourceCode"><pre><code class="sourceCode c"><span class="dt">int</span> foo(<span class="dt">char</span> *filename) { <br /> <span class="dt">int</span> fd = open(<span class="st">"foo.txt"</span>, O_RDONLY); <span class="co">// open a file</span><br /> <span class="kw">struct</span> stat s; <br /> fstat(fd, &s); <span class="co">// we only want s.st_size</span><br /> <span class="co">//</span><br /> <span class="co">// memory-mapped IO (every time, I remember Perlis epigram 11)</span><br /> <span class="co">//</span><br /> <span class="dt">char</span> *array = <br /> (<span class="dt">char</span> *) mmap(NULL, s.st_size, PROT_READ, MAP_SHARED, fd, <span class="dv">0</span>); <br /> <span class="co">//</span><br /> <span class="co">// Now we can add up the byte values:</span><br /> <span class="dt">int</span> total = sum(s.st_size, array);<br /> <span class="co">//</span><br /> <span class="co">// We're done with the file now.</span><br /> munmap(array, s.st_size);<br /> close(fd);<br /><br /> <span class="cf">return</span> total;<br />}</code></pre></td></tr></table></div><p>This function will open a file <code>"foo.txt"</code>, and then will get the file's size using <code>fstat</code>, and then use that size information to call <code>mmap</code>, mapping that file into memory, storing the start pointer at <code>array</code>. Then we'll call <code>sum</code> to total the bytes of the array.
But note that:</p><ol style="list-style-type: decimal"><li>The semantics of <code>array</code> depend on the semantics of the file system.</li><li>The file system might be a user-mode file system, implemented using something like <a href="https://github.com/libfuse/libfuse">FUSE</a> or <a href="http://www.secfs.net/winfsp/">WinFsp</a>.</li><li>You could write (and various people have written) user-mode filesystems to expose a <a href="https://www.mercurial-scm.org/">Mercurial</a> repository as a file system.</li><li>Mercurial is implemented in Python.</li></ol><p>Let me emphasize: on a modern computer, a pointer dereference could very well lead to the <em>execution of Python code</em>. Indeed, seeing an assembly listing, like this one:</p><div class="sourceCode"><table class="sourceCode nasm numberLines"><tr class="sourceCode"><td class="lineNumbers"><pre>1<br />2<br />3<br />4<br />5<br />6<br />7<br />8<br />9<br />10<br />11<br />12<br />13<br />14<br />15<br />16<br />17<br /></pre></td><td class="sourceCode"><pre><code class="sourceCode nasm"><span class="fu">sum:</span><br /> <span class="kw">test</span> <span class="kw">edi</span>, <span class="kw">edi</span><br /> <span class="kw">jle</span> .L4<br /> <span class="kw">mov</span> rdx, rsi<br /> <span class="kw">lea</span> <span class="kw">eax</span>, [rdi-<span class="dv">1</span>]<br /> <span class="kw">lea</span> rsi, [rsi+<span class="dv">1</span>+rax]<br /> <span class="kw">mov</span> <span class="kw">eax</span>, <span class="dv">0</span><br /><span class="fu">.L3:</span><br /> <span class="kw">movsx</span> <span class="kw">ecx</span>, <span class="dt">BYTE</span> <span class="dt">PTR</span> [rdx]<br /> <span class="kw">add</span> <span class="kw">eax</span>, <span class="kw">ecx</span><br /> <span class="kw">add</span> rdx, <span class="dv">1</span><br /> <span class="kw">cmp</span> rdx, rsi<br /> <span class="kw">jne</span> .L3<br /> rep <span class="kw">ret</span><br /><span class="fu">.L4:</span><br /> <span
class="kw">mov</span> <span class="kw">eax</span>, <span class="dv">0</span><br /> <span class="kw">ret</span></code></pre></td></tr></table></div><p>offers no guarantee that anything "close to the metal" will actually happen. This is the normal state of affairs, which is almost certainly true about every computer upon which this article is being read. (If it's not, you've probably somehow compiled your browser into a kernel module, which is security-wise an incredibly reckless thing to do.)</p><p>You might wonder, what's the moral of this story? Actually, there isn't a moral -- instead, there's a question. Namely, how can we prove that this kind of thing is a sensible thing to do, ever? The key intellectual difficulty is that the rationale for why this kind of program can work appears to be <em>circular</em>.</p><p>In general, the easy case of abstraction is when things build up in layers. We start with a piece of hardware, and on this hardware we run an operating system, which (by running in kernel mode to access various hardware primitives) implements address space virtualization. Since user-mode code can never touch the hardware primitives to manipulate things like page tables, we can conclude (assuming the implementation of virtual memory is correct) that the virtual memory abstraction is seamless -- user code can't tell (without relying on OS calls or side-channels like timing) how memory is implemented.</p><p>However, what memory-mapped IO and user-mode file systems let you do is to move part of the implementation of the memory abstraction into user space. Now the correctness of the Python implementation depends upon the correctness of the virtual memory, but the correctness of the virtual memory abstraction depends upon the correctness of the Python implementation!</p><p>If A implies B, and B implies A, then we are not licensed to conclude that A and B hold. (This is because it's bad reasoning!
Since false implies itself and false implies everything, if this principle were true in general then everything would be true.) And yet: something very close to this kind of reasoning is essential for building systems from modular components, or even writing concurrent algorithms. (We call it rely-guarantee, usually.)</p></div>Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com0tag:blogger.com,1999:blog-8068466035875589791.post-54084728187156713822018-07-25T07:19:00.001-07:002018-07-25T07:32:15.968-07:00A Typed, Algebraic Approach to Parsing<p>(<strong>Note:</strong> If you're on the POPL PC, please don't read this post until after your reviewing is done.)</p><hr /><p>It's been over 20 years since Graham Hutton and Erik Meijer wrote their wonderful JFP paper on monadic combinator parsing. Do we need another paper about it?</p><p>Jeremy Yallop and I think the answer is yes! We have a new draft paper out, <a href="http://www.cl.cam.ac.uk/~nk480/parsing.pdf"><em>A Typed, Algebraic Approach to Parsing</em></a>.</p><p>The fundamental issue is that parser combinators are very pretty, but suffer from two serious problems.</p><ol style="list-style-type: decimal"><li><p>First, parser combinator libraries tend to be <strong>implementation-defined</strong>.</p><p>Context-free grammars have the very nice property that it's very easy to exactly define the set of strings that a parser for that grammar should match. This, in turn, makes it easy to prove that there are many grammar transformations which do not change that set. For example, if the BNF production <code>A</code> recognizes a language, and <code>B</code> recognizes a different language, then the production <code>A | B</code> will recognize the union of the two languages recognized by <code>A</code> and <code>B</code>.
As a result, swapping the order of an alternative from <code>A | B</code> to <code>B | A</code> will not change the language recognized.</p><p>However, most implementations of parser combinators do not satisfy these kinds of properties! Swapping the order of the alternatives usually <em>can</em> change the parse tree returned from a parser built from parser combinators. This mismatch between the high-level explanation and what the code actually does makes debugging and refactoring harder, and also makes it hard to understand what a combinator parser does without knowing the actual implementation of the library.</p></li><li><p>Next, combinator parsing tends to be <strong>inefficient</strong>.</p><p>The standard parser combinator algorithm is usually some variant of backtracking recursive descent, and that means that parse times can become exponential in the length of the input. Basically, in the worst case you have to save a backtrack point after seeing each character, which gives you exponentially many branches you may have to explore.</p><p>There are two general classes of reaction to this problem. First, the combinator library can explicitly expose backtracking primitives to the user of the library, so that programmers can control when backtracking can occur. This does resolve the problem, but at a drastic price: there is basically no specification of the accepted language beyond the actual code of the parser implementation.</p><p>The second (and somewhat better) reaction is to switch to another parsing technology. The most notable such alternative is <a href="http://bford.info/packrat/">PEG (aka packrat) parsing</a>.
Now, the good thing about packrat parsing is that it actually does have predictable linear-time parsing, but that comes at a pretty high cost: the memory usage of PEG parsers is linear in the input, and choice is not commutative.</p></li></ol><p>Now, this all sounds like a pretty strong argument for sticking with traditional parser generators, whether LR (eg, yacc) or LL (eg, ANTLR). A BNF spec has a clear declarative reading, and the generated parsers are efficient, so what's not to like? Two things:</p><ol style="list-style-type: decimal"><li><p>Honestly, the BNF notation for context-free grammars has its own weaknesses. For example, BNF has no notion of binding or variable, and this makes it quite hard to build grammars in a modular fashion. For example, it's a good idea for all the sequence constructions in a language to uniformly treat delimiters as separators or terminators, but with BNF you have to manually define each sequence type, making it easy to screw this up. If you had the ability to build abstractions, you could define a single sequence construction primitive which would let you make the choice once and for all. This is easy to do with parser combinators, but missing from most parser generators (<a href="http://gallium.inria.fr/~fpottier/menhir/">Menhir</a> being a notable exception).</p></li><li><p>Furthermore, a problem common to all table-generating parser generators is that their error reporting is <em>horrible</em> -- when there is a bug in the grammar taking you out of the handled language class, the parser generator vomits its internal data structures over the programmer, who has to then pick through the chunks to figure out what the problems were. It would be much better if errors could be reported in terms of the actual grammar the programmer wrote!</p></li></ol><p>So Jeremy and I basically set out to fix all these problems.
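<p>(The non-commutativity of choice mentioned in point 1 is easy to see concretely. Here is a deliberately naive combinator sketch in Python -- a generic textbook-style construction of my own, not the library from our paper -- in which alternation commits to the first branch that succeeds, as many real combinator libraries do:</p>

```python
# A naive recursive-descent combinator sketch: a parser is a function from a
# string to a list of (result, remaining-input) pairs; alternation is
# left-biased, so the first succeeding branch wins.

def char(c):
    def parse(s):
        return [(c, s[1:])] if s[:1] == c else []
    return parse

def alt(p, q):
    def parse(s):
        return p(s) or q(s)          # q is only tried if p fails outright
    return parse

def seq(p, q):
    def parse(s):
        return [((x, y), rest2)
                for (x, rest1) in p(s)
                for (y, rest2) in q(rest1)]
    return parse

a = char("a")
ab = seq(char("a"), char("b"))

# Read as grammars, a | ab and ab | a denote the same language, but:
left = alt(a, ab)("ab")      # [('a', 'b')]        -- consumes only "a"
right = alt(ab, a)("ab")     # [(('a', 'b'), '')]  -- consumes all of "ab"
```

<p>Swapping the alternatives changes both the parse result and how much input is consumed, even though the underlying context-free language is unchanged.)</p>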
Our paper proceeds via the following steps:</p><ol style="list-style-type: decimal"><li><p>First, we found a better notation for context-free languages than BNF. Basically, if you take regular expressions and add a least fixed point operator to them, then you can recognize exactly the context-free languages. This is not a new observation; it's been known since at least the early 1970s. But the payoff of replacing nonterminals with fixed point operators is that there is now a very clear story on variable scope and binding.</p></li><li><p>As most of you know, we don't actually want to allow arbitrary context-free languages, since some of them are inherently ambiguous. So the next thing we do is we define a <em>type system</em> for our context-free expressions, which can statically identify unambiguous grammars which can be parsed efficiently (ie, by non-backtracking recursive descent with a single token of lookahead).</p><p>The benefit of this is that type checking is local and syntactic, and so all our errors can be reported in terms of the grammar that the programmer wrote.</p></li><li><p>Next, we define a family of parser combinators which operate on typed grammars. Our parser combinators have a very simple implementation story -- there's no backtracking and no fancy lookahead, so the type of parsers is as simple as can be. Moreover, we can exploit the type information when parsing alternatives, by using a bit of lookahead to decide which branch to take.</p></li><li><p>Our parser combinators have very predictable performance, but are still fairly slow, due to all the overhead of indirecting through function pointers (due to all the higher-order functions involved). So we use <a href="http://okmij.org/ftp/ML/MetaOCaml.html"><em>staged programming</em></a> to eliminate this overhead.
Basically, staging lets us eliminate all the overhead, leading to generated code which looks more or less exactly like the kinds of hand-coded recursive descent parsers that experts write by hand.</p></li></ol><p>The resulting parsers have excellent performance -- in our benchmarks, we outperform ocamlyacc-generated code.</p><p>So we get the abstraction benefits of parser combinators, good static error reporting about ill-formed grammars, and excellent performance to boot!</p>Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com3tag:blogger.com,1999:blog-8068466035875589791.post-37079744328254282782018-04-24T09:34:00.000-07:002018-04-30T02:37:27.577-07:00Are functional programs easier to verify than imperative programs?<p>Via <a href="https://twitter.com/arntzenius">Michael Arntzenius (@arntzenius) on Twitter</a>, I saw that <a href="https://twitter.com/Hillelogram">Hillel Wayne</a> posted the following remark: <blockquote><a href="https://twitter.com/Hillelogram/status/987432178840756225">Lots of people say "FP is easier to analyze than imperative code because of purity" but whenever I ask for evidence people look at me like I'm crazy. So I'd like to make a challenge: I'll provide three imperative functions, and your job is to convert them into pure functions.</a></blockquote> <p>If you follow the link, you'll see his three programs. Each of them is basically a small imperative program, of the kind you might see a Python programmer write. I'm not going to post any correctness proofs of them, since my interest is rather to comment upon the difficulty of verification. Since I am too bad at concision to fit thoughts into 280 characters, I'm doing this via a blog post. :) <p>The high-order bit is that the three programs he suggested are equally easy/hard to verify in imperative and functional styles. This is because none of the programs make any significant use of procedural abstraction. 
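<p>For a hypothetical example in the spirit of the challenge (my own, not one of Hillel's three): a loop that uses only local state, with no aliasing and no procedure calls, converts to a pure function almost mechanically, and the proof obligations line up one-for-one -- the loop invariant on the imperative side corresponds to an induction hypothesis on the fold.</p>

```python
from functools import reduce

def running_max_imperative(xs):
    # Imperative version: local mutable state, no aliasing, no calls.
    best = float("-inf")
    out = []
    for x in xs:
        if x > best:
            best = x
        out.append(best)
    return out

def running_max_pure(xs):
    # Pure version: the loop state (best, out) becomes a fold accumulator.
    def step(acc, x):
        best, out = acc
        best = max(best, x)
        return (best, out + [best])
    return reduce(step, xs, (float("-inf"), []))[1]

assert running_max_imperative([3, 1, 4, 1, 5]) == [3, 3, 4, 4, 5]
assert running_max_pure([3, 1, 4, 1, 5]) == [3, 3, 4, 4, 5]
```

<p>Verifying either version requires the same insight -- the invariant that <code>best</code> is the maximum of the prefix seen so far -- so purity alone neither creates nor removes the essential obligation here.</p>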
<p>The difficulty of imperative programming arises from the <em>combination</em> of state, aliasing and procedure calls. Any two of these features can be handled without <em>that</em> much difficulty, but the combination of all three effectively makes reasoning (nearly) as difficult as correct concurrent programming. <p>At a high level, reasoning is easy (more honestly, feasible) when each construct in your language has a compositional specification -- when you can reason about the behaviour of a subprogram just in terms of its specification, rather than looking at its full source code each time. <p>Hoare logic gives you a compositional technique for reasoning about imperative updates. When data can be aliased, a specification must also say something about the aliasing expected by your program. The current best technique for these specifications is a generalization of Hoare logic called separation logic. (I should say something about how it works, but this post is already too long!) <p>The addition of full procedures makes the problem much more challenging. This is because the specification of a procedure call must state its effect, without saying anything about the specific program points from which it will be called. Otherwise you might as well have just inlined the function, and proved the correctness of a fully procedure-free program. Indeed, Leslie Lamport advocates doing just this, in his paper <a href="https://www.microsoft.com/en-us/research/publication/composition-way-make-proofs-harder/"><em>Composition: A Way to Make Proofs Harder</em></a>. This is actually a reasonable position, if you are interested in verifying algorithms (which exist in isolation) rather than verifying programs (which always exist as parts of a larger program).
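To make the three-way interaction concrete, here is a small invented Python example. The specification of <code>process</code> ought to be "returns the minimum of the list", but whether that holds depends on the aliasing behaviour of whatever procedure it calls, not just on that procedure's input/output spec:

```python
# Hypothetical illustration: state + aliasing + an unknown procedure.
def process(entries, audit):
    entries.sort()          # establish an invariant: entries is sorted
    audit(entries)          # unknown code runs while holding an alias...
    return entries[0]       # ...so is the head still the minimum?

def polite_audit(entries):
    _ = sum(entries)        # reads through the alias, never writes

def rude_audit(entries):
    entries.insert(0, 999)  # writes through the alias: invariant broken
```

With <code>polite_audit</code> the sortedness invariant survives the call; with <code>rude_audit</code> it silently does not, even though both callbacks have the same type.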
<p>Intuitively, the difficulty is a lot like reasoning about cooperative concurrency -- when you call an arbitrary procedure, it's a bit like yielding to the scheduler and having it execute some other piece of code for a while before returning control to you. For this logical/conceptual concurrency to actually accomplish anything, the unknown code hidden in the procedure call has to do something to your data, but at the same time can't break any of your invariants. So you need to reason about data invariants, object ownership, and temporal protocols, all at the same time. <p>Matters get yet more complicated if you want to specify programs which pass around genuinely unknown functions (for instance, storing pointers to functions in the heap, a device which many programmers call "objects"). In this case, you <em>can't</em> inline the functions, because you don't know what they will be! <p>But simply rejecting OO (or higher-order state) doesn't work, because plenty of important programs rely upon it. For example, if you want to prove that your scheduler implementation implements a fair strategy, you have to come up with a protocol between the scheduler and the code that compiler writers generate. Similar difficulties arise if you want to prove that if your compiler generates a bunch of provably correct <code>.o</code> files, the whole program actually does work when they get linked together. (For example, when will a C program work, if you link it to a new <code>malloc/free</code> implementation?) <p>Removing any one of the three difficulties makes things easier, but still leads to interesting systems: <ul><li>a) State + procedure calls ==> Idealized Algol. <a href="https://www.cs.cmu.edu/~crary/819-f09/Reynolds81.ps"><em>Idealized Algol</em></a> is a language with higher-order functions and mutable variables, but lacking pointers.
John Reynolds invented a beautiful generalization of Hoare logic (called <a href="http://www.cs.cmu.edu/afs/cs/user/jcr/ftp/tempdir/speclogic.ps"><em>specification logic</em></a>) to reason about it in the early 80s. It's not well-known any more, but I really like it. <li>b) State + aliasing ==> basically, pointer programs without procedures. John Reynolds also invented the state-of-the-art technique for reasoning about these programs (called <em><a href="https://www.cs.cmu.edu/~jcr/seplogic.pdf">separation logic</a></em>) in the early 2000s. <li>c) Aliasing + procedures ==> purely functional programming. Basically, we don't have to care who else can see our data if they can never change it. In what you might see as a running theme, John Reynolds also devised many of the techniques we use for reasoning about purely functional programs (such as parametricity) in the late 1970s and early 1980s. </ul><p>Some related observations: <ol><li> This is, incidentally, why reasoning about the performance of Haskell code is much harder than reasoning about its functional correctness. The <em>value</em> of a Haskell program never depends on any other parts of the program, which gives you a simple compositional model for functional correctness. <p>But whether forcing a thunk takes time or not depends on whether other parts of your program have forced it already. So reasoning about the <em>performance</em> of Haskell code requires complex reasoning about aliasing. This is one reason why newer purely functional languages, such as <a href="https://www.idris-lang.org/">Idris</a>, <a href="http://www.purescript.org/">Purescript</a>, and <a href="https://www.microsoft.com/en-us/research/project/koka/">Koka</a>, are all call-by-value rather than call-by-need -- it makes their performance model compositional. <li><p>Note that for building a real program, procedural decomposition is not optional. You just can't do without being able to break your program into pieces.
Since this feature is basically non-negotiable, aliasable state becomes very dangerous. This is why people say that "imperative programming is harder than functional programming" -- the unstated assumption is that you have functions, and that state is aliasable pointer-like state. <p>However, by making the treatment of state a little less lax, you can retain the ease of reasoning while still permitting the controlled use of state. <li>The safe fragment of Rust is a mix of a) and c) -- it permits you to create aliases of pointers to data only when you don't use them to mutate. (Even though I doubt this was a direct inspiration, John Reynolds <em>also</em> pioneered this approach with his 1978 paper <a href="https://dl.acm.org/citation.cfm?id=512766"><em>Syntactic Control of Interference</em></a>. (ACM link, sorry.) Peter O'Hearn wrote a paper <a href="https://surface.syr.edu/cgi/viewcontent.cgi?article=1011&context=lcsmith_other"><em>Syntactic Control of Interference, Revisited</em></a> with a modernized approach to the semantics and typing rules.) <p>If you add unsafe, then you need very fancy modern variants of separation logic to reason about the result. See <a href="http://www.mpi-sws.org/~dreyer/papers/rustbelt/paper.pdf"><em>RustBelt: Securing the Foundations of the Rust Programming Language</em></a>, by Ralf Jung, Jacques-Henri Jourdan, Robbert Krebbers, Derek Dreyer. (Don't feel obliged to look at this unless you are very interested -- the work is excellent but unavoidably very technical.) <p>Indeed, semantically Haskell is very similar to Rust+unsafe in this regard -- the runST operator lets you create computations that use highly aliasable state, as long as that state doesn't escape the scope of the computation. And the same fancy separation logic that works for proving the safety of Rust is needed to show the safety of Haskell!
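As a loose illustration of the runST pattern (a Python analogue, not Haskell's actual typed encapsulation): inside the function the state is mutable and freely aliased, but no reference to it escapes, so callers can treat the whole function as pure.

```python
# Illustrative runST-style encapsulation: local mutable state that
# never escapes, so the function is observationally pure.
def sum_squares(n):
    buf = []                  # local mutable state
    alias = buf               # aliasing is harmless: nobody outside sees it
    for i in range(n):
        alias.append(i * i)
    return sum(buf)           # only an immutable value escapes the scope
```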
See <a href="https://lirias.kuleuven.be/bitstream/123456789/598529/1/paper.pdf"><em>A Logical Relation for Monadic Encapsulation of State: Proving contextual equivalences in the presence of runST</em></a> by Amin Timany, Leo Stefanesco, Morten Krogh-Jespersen, and Lars Birkedal. (Both of these papers appeared at the same conference!) </ol>Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com7tag:blogger.com,1999:blog-8068466035875589791.post-65760365602634492212016-09-18T07:05:00.001-07:002016-09-18T07:06:11.178-07:00HOPE 2016 Talk Summaries<p>I'm attending ICFP 2016 in Nara, Japan (near Osaka) this year. In previous years, I've really enjoyed <a href="http://ezyang.tumblr.com/">Edward Yang's conference talk summaries</a>, and decided that this year, I would take notes about the talks I attended and put them online myself.</p> <p>Today, I attended the <a href="http://conf.researchr.org/track/icfp-2016/hope-2016-papers">HOPE 2016</a> workshop on higher-order programming with effects, and took the following notes. These are all rough notes, taken in haste and based on a highly-jetlagged listener's poor listening comprehension. So assume all errors and misunderstandings are my fault! </p> <h2>Keynote (Leo White)</h2> <p>Due to jetlag, I overslept and missed it! Alas.</p> <h2>Effects</h2> <ul><li><p><em>Effects via capabilities</em>, Fengyin Liu (speaker)</p> <p>The idea behind this work is to introduce explicit capability values that enable or disable effects. So you might have a capability c : IO, which must be passed to all IO operations.
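(A rough sketch of the capability discipline in Python -- the names are invented, and Liu's actual design is a type system for Scala/Dotty, not a dynamic check:)

```python
# Hypothetical capability-passing sketch: every IO primitive demands a
# token witnessing the right to perform IO, and only main is handed one.

class IO:
    """A capability token for input/output effects."""

def echo(cap, msg, out):
    # analogue of print : IO -> string -> unit
    if not isinstance(cap, IO):
        raise TypeError("missing IO capability")
    out.append(msg)

def main(cap):
    out = []
    echo(cap, "hello", out)   # effectful call, capability threaded through
    return out
```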
If you don't have a capability to pass to an IO operation, you can't invoke it.</p> <p>print : IO -> string -> unit</p> <p>Naively, though, there is an issue arising from the ability of closures to capture effect capabilities.</p> <p>So two function spaces are introduced: A => B, which is the usual ML-style function space, and A -> B, which is a function space that cannot capture capabilities or ML-style functions.</p> <p>Interestingly, the full substitution lemma doesn't hold: you have to prove a value substitution theorem instead. This suggests to me that there might be a modal variant of this system, Davies-Pfenning style. </p> <p>Liu suggested some type equivalences like A -> (B => C) == A -> B -> C, because the outer -> prevents the capture of capabilities. They can't prove the soundness of this by induction, of course, so I suspect they'll eventually want a logical relation to establish soundness. (A logical relation for Dotty would be exciting -- maybe chat with Amal Ahmed & Nada Amin about this?)</p> <p>Polymorphism introduces some subtleties (i.e., you might not know whether a type variable will be instantiated to a pure type), and they also worry about verbosity in practice (which might hinder adoption by Scala programmers). </p></li><li><p><em>A Logical Account of a Type-and-Effect-System</em>, Kasper Svendsen (speaker)</p> <p>They have a parallel ML-style calculus with reads, writes and CAS. Locations are approximated in the type system with regions.</p> <ul><li><p>The region context is split into a public and private part, to control which locations you can get interference on.</p></li><li><p>Function types are annotated with which public, which private regions, and which read/write effects they have.</p></li><li><p>New regions can be introduced with dynamic extent. Private regions can be shared temporarily during parallel composition.</p></li><li><p>The type system makes region variables implicit -- this info is only in the typing derivation.
So all terms look like ML terms.</p></li></ul> <p>The goal is to prove the correctness of program transformations which rely on the semantics of effects. For example, a sequential composition can be equated to a parallel composition</p> <p>e; e' == e || e'</p> <p>when there is no interference between these two terms. </p> <p>He explains the intuition with a binary logical relation, formulated within the Iris program logic (paper at POPL 2015). Here, the logical relation is a per-type family of simulation relations, encoded as Hoare triples in a <em>unary</em> program logic.</p> <p>Here, they use a really neat feature of Iris. Since Iris allows any monoid as a resource, they can use simulation relations themselves as resources! Then they express the parallel composition via a separating conjunction of simulations, one for e and one for e'. The fact that e and e' don't interfere means that the steps of e' are stuttering steps for e, and vice-versa. They even let simulation relations share state, when the state is read-only.</p></li><li><p><em>Simple Dependent Polymorphic IO Effects</em>, Amin Timany (speaker)</p> <p>They want a logical relation for a type-and-effect system for IO. The IO effects are like regexps, but have effect variables, channels, and quantification over what is written -- e.g., cat would have the effect:</p> <p>(forall x. Rd&lt;a&gt;(x); Wr&lt;b&gt;(x))*</p> <p>to indicate that it writes what it reads from a to b. They want the relation to relate (say) that a buffered implementation of cat refines the unbuffered version, and also to validate the parallel composition transformation for non-interfering operations.</p> <p>The language of effects is <em>very</em> rich, so I don't have a good feel for what it can and can't express. E.g., the reasoning for the buffering optimization was <em>entirely</em> type-based, based on the effect subtyping order!
</p></li></ul> <h2>Verification</h2> <ul><li><p><em>Concurrent Data Structures Linked in Time</em>, German Delbianco (speaker)</p> <p>This talk was about Hoare-style verification of highly concurrent data structures, such as "concurrent snapshots". </p> <p>E.g., you have a shared array a, and a function write(i,x) which makes an atomic update to the array, and a function scan(), which returns a snapshot of the state of the array at a particular instant.</p> <p>He then presented Jayanti's algorithm for this, and gave a nice intuition for why the right notion of correctness for the snapshots was linearizability. What made the algorithm tricky is that its linearization points are not fixed: they can vary with the runtime inputs. </p> <p>Then he introduced the notion of events in the ghost state, which are organized for each thread into subject and environment events, and which are each connected to an instant of time. The variation in where linearization points occur could then arise via modifications of when each event happened -- which is an update on the "linked list" of time, which can be modelled as ghost state!</p> <p>This has all been verified in FCSL, a Coq-based program logic that Ilya Sergey and Aleks Nanevski are some of the main forces behind.</p> <p>Ralf Jung asked if they had verified any clients, and German said they had used it for a few small clients, but nothing big. Ilya said that people just assumed snapshots were "obviously" good, but no one had any really good/complex examples of them. He said that they hoped to figure some out now that they had a declarative spec of them.</p></li><li><p><em>Growing a Proof Assistant</em>, William Bowman (speaker)</p> <p>WB began with an example of some notations for an informal type soundness proof.
He pointed out that some of the notations (like BNF) are formal, and others (like defining let-binding in terms of lambdas) are informal.</p> <p>Then he pointed out that there are tools like Ott which compile BNF to Coq/Agda/LaTeX, and that most proof assistants (like Coq) have APIs to define notations, tactics, and plugins. Then he said that this wasn't enough, and that he wanted to build a proof assistant focused on extensibility.</p> <p>The core type theory was TT (basically MLTT with a countable universe hierarchy), and the extensibility framework was Racket (to reuse as much of the macro machinery as possible).</p> <p>Then he began explaining how Cur could be used to implement pattern matching. Things like recursive functions required compile-time state to communicate information from outside the local syntax tree.</p> <p>Then he moved on to defining tactics, and gave a small language for this. He needed to support fully elaborating terms (so that user tactics could be defined solely on kernel syntax), and also a way to run Cur code at expansion-time to support Mtac-style tactics. </p> <p>This made me wonder about phase separation, but then he immediately put this aside, along with several other issues (such as how to preserve properties like parametricity while supporting reflection on syntax).</p> <p>Someone asked about how to combine extensions. WB replied that macros support this because of the architecture of Racket macros, which essentially support open recursion to permit multiple macros to run at once. He also remarked that this creates classical problems from OO, like diamond inheritance. </p></li></ul> <h2>Compilation</h2> <ul><li><p><em>Algebraic Effects for Functional Programming</em>, Daan Leijen (speaker)</p> <p>Daan Leijen gave a talk about adding algebraic effects to Koka.
Koka had effect types based on monads, but (a) he wanted a better story for composition, and (b) he wanted to be able to handle exotic control operators (like yield, exceptions, and async/await) in a uniform way.</p> <p>He has a full compiler for effect handlers in Koka, with row-polymorphic effect typing. The compilation scheme uses a type-directed CPS translation to implement effect handlers targeting JS, .NET, and the JVM (which are rather direct-style). Then he talked a bit about the semantics and syntax of Koka to set the stage.</p> <p>Next, he talked about examples, starting with exceptions. He mentioned that row types in Koka permit duplicated labels. I wonder how Koka picks?</p> <p>I also wondered about using algebraic effects for the "mock object" pattern from OO testing.</p> <p>Then he showed iteration, and remarked that handlers permit the handler to control whether iteration stops or not. Then he showed asynchronous callbacks (Node-style), and showed that you could pass the handler's resume continuation as an argument.</p> <p>Then he talked about the compilation to managed code, where first-class continuations don't exist. This uses a type-directed style to avoid using CPS when it doesn't need it. The main issue is with effect polymorphism, which he resolves by compiling it both ways, and then using a worker-wrapper transformation to pick the right representation.</p> <p>SPJ asked if he depended upon a defined order of evaluation. Daan said this was about effects for Haskell, and handed off to Sam Lindley, who said that the situation in Haskell was the same as in Koka -- you have to say what the evaluation order is (?).</p></li><li><p><em>Administrative Normal Form, Continued</em>, Luke Maurer</p> <p>CPS is good for optimization, but due to the costs, not everyone uses it.
Flanagan pointed out that some CPS optimizations work in direct style, but not all of them do.</p> <p>E.g., case-of-case often creates values that go to known call sites, which shouldn't be allocated. This work extends ANF to support these, with significant effect on heap allocation rates.</p> <p>Case-of-case can be solved by pushing in the context, and while there are engineering issues of code duplication, it can often be helpful. Many compilers do this. </p> <p>However, it doesn't work for recursive functions. In CPS this transformation happens "almost for free". That is, you can do the static argument transformation, and then inline the static continuation to get the beta-reductions to fire. This is "contification".</p> <p>So how does contification work in ANF? He notes that the static argument transformation only worked because the program was a tail call. So they tag functions which are always tail-called (these are called "labels"). Then they add a new rewrite which moves outer cases into a label.</p> <p>He gave the operational semantics for join points, which are pretty sensible.</p> <p>Then he pointed out that it helps with list fusion. An unfold uses a step function <code>state -> option (a x state)</code>, which lets you eliminate intermediate lists, but introduces useless Just/Nothing cruft. But the new label/case-of-case machinery handles this and reduces many of them, and some benchmarks kill <em>all</em> of them.</p> <p>This is in a new branch of GHC.</p> <p>I asked how they would know if they had all the new constructs they needed. SPJ said he didn't know any CPS optimizations that could not be handled now, but that he didn't have a proof. Someone asked about one benchmark which allocated less but ran more slowly. I didn't understand the answer, though. :( Sam Lindley asked about the connection to SSA and phi-functions.
Luke said that labels are much like phi-functions in SSA.</p></li></ul> <h2>Semantics</h2> <ul><li><p><em>Functional models of full ground, and general, reference cells</em>, Ohad Kammar</p> <p>How can we understand reference cells? They vary along multiple axes:</p> <ul><li>Locality -- freshness of newly allocated cells (i.e., ref 0 == ref 0 is always false)</li><li>Ground vs non-ground -- can we point to computations which can modify the heap (ref (int -> int))?</li><li>Full storage -- semantics can depend on the shape of the heap: (ref ref bool)</li></ul> <p>This talk is about full ground state. The approach of the talk is denotational semantics for full ground state, to get an equational theory for programs. He wants to use it to analyze the value restriction, and relax it if possible. He also wants to validate runST.</p> <p>He found a bug in his semantics last night, so he will discuss the general techniques used rather than a particular model.</p> <ul><li><p>Levy introduced a trick to handle circular ground state, using <em>names</em> for reference types (à la class names in OO).</p></li><li><p>Kripke worlds are a finite set of locations, and a name for each location. A world morphism is a name-preserving injection. Worlds have monoidal structure. The language lives in the functor category over worlds, and so can interpret CCCs, and references at a world w are interpreted as the set of locations of that type.</p></li><li><p>To model computations we need a monad.</p></li></ul> <p>To decide which equations we need, we have the usual deref/assignment/allocation equations. We don't have a complete axiomatization of state, so we have to just pick some. So they picked a nontrivial one, the equation for runST.</p> <p>Then he gave the wrong monad, which used ends to model stores, but I don't have a good feel for them. It validated masking, but not dereference.
Then he gave a second monad which had too little structure, in that it modelled deref but not masking.</p> <p>Then he gave a small type system for runST. It interpreted each region label in runST as a world (as above).</p> <p>Lars Birkedal asked about which equations he was interested in, and Ohad named some of them. Daan Leijen said he had done an operational proof of runST and didn't like it, and encouraged Ohad to keep going.</p></li></ul> Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com2tag:blogger.com,1999:blog-8068466035875589791.post-57510263124793669082016-05-09T02:55:00.000-07:002016-05-31T02:50:36.589-07:00Löb's theorem is (almost) the Y combinatorAs a computer scientist, one of the stranger sociological facts you'll encounter is that regular mathematicians tend to dislike computer science (and its cousin, formal logic). The reason for that is nicely encapsulated in the following anecdote that <a href="http://longnow.org/essays/richard-feynman-connection-machine/">Danny Hillis tells about Richard Feynman and the design of the Connection Machine computer:</a> <blockquote><p>[...] Richard had completed his analysis of the behavior of the router, and much to our surprise and amusement, he presented his answer in the form of a set of partial differential equations. To a physicist this may seem natural, but to a computer designer, treating a set of boolean circuits as a continuous, differentiable system is a bit strange. Feynman's router equations were in terms of variables representing continuous quantities such as "the average number of 1 bits in a message address." <b>I was much more accustomed to seeing analysis in terms of inductive proof and case analysis than taking the derivative of "the number of 1's" with respect to time.</b> Our discrete analysis said we needed seven buffers per chip; Feynman's equations suggested that we only needed five. We decided to play it safe and ignore Feynman.
<p>The decision to ignore Feynman's analysis was made in September, but by next spring we were up against a wall. The chips that we had designed were slightly too big to manufacture and the only way to solve the problem was to cut the number of buffers per chip back to five. Since Feynman's equations claimed we could do this safely, his unconventional methods of analysis started looking better and better to us. We decided to go ahead and make the chips with the smaller number of buffers. <p>Fortunately, he was right. When we put together the chips the machine worked. </blockquote> <p>I've put the relevant bit in bold. To most mathematicians, inductive proofs are the tool of last resort. You always try to weave the known properties of your object together in a creative way, and do an exhaustive case analysis only if all else fails. However, to logicians and computer scientists, induction is the tool of first resort – as soon as we have an object, we grind it to dust with induction and reassemble it bit by bit with the properties we want. <p>Regular mathematicians find this process roughly as appealing as filling in tax forms, but it does come with its compensations. One of those compensations is that we get to collect a wide variety of fixed point theorems, to justify an equally wide variety of recursive constructions. <p>The modal version of <a href="https://en.wikipedia.org/wiki/L%C3%B6b's_theorem">Löb's theorem</a> is one of the more exotic fixed point theorems in logic. It is often glossed as representing the "abstract content" of Gödel's incompleteness theorem, which is the sort of explanation that conceals as much as it reveals. If you look at the proof, though, you'll see that its structure is a very close match for something very familiar to computer scientists – the proof of Löb's theorem is (almost) a derivation of the Y combinator!
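Before setting up the modal machinery, it may help to recall the ordinary construction. In a strict language the Y combinator must be eta-expanded (the so-called Z combinator, since otherwise the self-application diverges); a Python rendering, just for orientation:

```python
# The untyped fixed-point combinator, eta-expanded for a strict language.
# Z(f) returns a function satisfying Z(f) = f(Z(f)), up to eta.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(
              lambda x: f(lambda v: x(x)(v)))

# Tie a recursive knot without any explicit recursion:
fact = Z(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))
```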
<p>To understand this, let's try formulating the fragment of provability logic we need as a natural deduction system. The types we will use are the following: $$ \begin{array}{lcl} A & ::= & b \bnfalt A \to B \bnfalt \mu a.\,A(a) \bnfalt \Box A \end{array} $$ <p>This looks like the simply-typed lambda calculus, extended with recursive types and a special $\Box A$ type former. This is almost, but not quite, correct! $\Box A$ is indeed a modal operator -- in particular, it satisfies the axioms of the K4 modality. However, the recursive type $\mu a.\,A$ is not an ordinary recursive type, because unrestricted recursive types lead to logical inconsistency. Instead, $\mu a.\,A$ is a <em>modal</em> fixed point. That is, it satisfies the isomorphism: $$ \mu a.\,A \simeq A{[\Box(\mu a.\,A)/a]} $$ <p>Hopefully, this will become a bit clearer if we present typing and inference rules for this logic. We'll use a variant of Davies and Pfenning's judgemental formulation of modal logic, modified to support the K4 modality rather than the S4 modality. The basic judgement will be $\judge{\Delta; \Gamma}{e}{A}$, where $\Delta$ are modal variables and $\Gamma$ are ordinary variables. 
$$ \begin{array}{ll} \rule{ x:A \in \Gamma} { \judge{\Delta; \Gamma}{x}{A} } & \\[1em] \rule{ \judge{\Delta; \Gamma, x:A}{e}{B} } { \judge{\Delta; \Gamma}{\fun{x}{e}}{A \to B} } & \rule{ \judge{\Delta; \Gamma}{e}{A \to B} & \judge{\Delta; \Gamma}{e'}{A} } { \judge{\Delta; \Gamma}{e\;e'}{B} } \\[1em] \rule{ \judge{\Delta; \Gamma}{e}{A{[\Box(\mu a.\,A)/a]}} } { \judge{\Delta; \Gamma}{\fold(e)}{\mu a.\,A} } & \rule{ \judge{\Delta; \Gamma}{e}{\mu a.\,A} } { \judge{\Delta; \Gamma}{\unfold(e)}{A{[\Box(\mu a.\,A)/a]}} } \\[1em] \rule{ \judge{\Delta; \Delta}{e}{A} } { \judge{\Delta; \Gamma}{\box(e)}{\Box A} } & \rule{ \judge{\Delta; \Gamma}{e}{\Box A} & \judge{\Delta, x:A; \Gamma}{e'}{C} } { \judge{\Delta; \Gamma}{\letv{\box(x)}{e}{e'}}{C} } \end{array} $$ <p>The two key rules are the introduction and elimination rules for $\Box A$. <p>The elimination rule is the standard Davies-Pfenning rule. It says that if you can prove $\Box A$, you can add an $A$ to the modal context $\Delta$ and carry on with your proof. <p>The introduction rule is different, and weaker than the DP S4 rule. It says you can introduce a $\Box A$, if you can prove $A$ only using the modal hypotheses in $\Delta$. So the variables in $\Gamma$ are deleted, and the ones in $\Delta$ are copied into the ordinary hypothetical context. This setup is weaker than the DP introduction rule, and captures the fact that in K4 you can't derive the comonadic unit rule that $\Box A \to A$. <p>Once we have this, it becomes possible to derive Löb's theorem. Before we do that, let's look at the derivation of the Y combinator. To derive a fixed point at a type $P$, we introduce a recursive type $μa.\, a \to P$, and then we can add fold's and unfold's to the standard definition of the Y combinator: <pre><br />let y : (P → P) → P = <br /> λf. let f' = f in <br /> let g = <br /> (λr. let r' = r in <br /> f' (unfold r' (r')))<br /> in <br /> (g (fold g)) <br /></pre><p>You could also write this more compactly as <code>λf. 
let g = λx. f (unfold x x) in g (fold g)</code>, but I have introduced some spurious let-bindings for a reason which will become apparent in a second. <p>Namely, virtually the same construction works with Löb's theorem! We will take the <em>modal</em> recursive type $\mu a.\,a \to P$, and then add some boxing and unboxing to manage the boxes (I'll annotate the variables with their types so that you can see what's going on): <pre><br />let lob : □(□P → P) → □P = <br /> λf. let box(f') = f in // f' modal, □P → P<br /> let box(g) = // g modal, □(μa. a → P) → P<br /> box(λr. let box(r') = r in // r' modal, μa. a → P<br /> f' box(unfold r' (box r'))) // application type P<br /> in <br /> box(g (fold g)) <br /></pre> <p>Note that if you took $\Box$ to be the identity constructor, and erased all the $\box$'s from this term, then this proof of Löb's theorem is actually <em>exactly the same</em> as Curry's Y combinator! <p>However, people more typically take $\Box$ to mean something like "quotation" or "provability". What the proof shows is that there's nothing mysterious going on with self-reference and quotation -- or rather, it is precisely the same mystery as the universality of the lambda-calculus. <p><b>Postscript:</b> One of the nice things about blogging is that you can be more informal than in a paper, but it's also one of the pitfalls. I got an email asking about the history of this system, and of course a blog post doesn't have a related work section! Basically, a few years ago, I worked out a <a href="http://semantic-domain.blogspot.co.uk/2012/08/a-computational-lambda-calculus-for.html">calculus giving a syntax for applicative functors</a>. An applicative is basically a monoidal functor with a strength, and I noticed that <a href="http://plato.stanford.edu/entries/logic-modal/">the modal logic K</a> is basically a monoidal functor <em>without</em> a strength.
There matters stood until last year, when I read Kenneth Foner's Haskell Symposium paper <a href="http://bir.brandeis.edu/bitstream/handle/10192/30632/FonerThesis2015.pdf?sequence=3"><em>Getting a Quick Fix on Comonads</em></a>, which was about programming with Löb's theorem. This was interesting, and after looking at <a href="https://en.wikipedia.org/wiki/L%C3%B6b's_theorem">Wikipedia's proof of Löb's theorem</a>, I saw that (a) the proof was basically Curry's paradox, and (b) I could adapt the rules for applicatives to model the modal logic K4, which could (with a small bit of tinkering) have a very similar calculus to applicatives. (In fact, the proof I give of Löb's theorem is a transliteration of the proof on Wikipedia into the system in this blog post!) <p>I should emphasize that I haven't really proved anything about these systems: I think the proofs should be straightforward, but <em>I haven't done them</em>. Luckily, at least for K, I don't have to -- recently, <a href="https://www.cs.ox.ac.uk/people/alex.kavvos/">Alex Kavvos</a> has put a draft paper on the arXiv, <a href="http://arxiv.org/abs/1602.04860"><em>System K: a simple modal λ-calculus</em></a>, which works out the metatheory of K with a dual-zone Davies/Pfenning-style calculus. If you need syntax for applicative functors minus strength, do take a look!Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com7tag:blogger.com,1999:blog-8068466035875589791.post-66695070246923905602016-05-06T06:48:00.000-07:002016-05-06T06:48:02.055-07:00Life Update: Moving to Cambridge<p>This August I'll be changing jobs, leaving Birmingham and joining <a href="http://www.cl.cam.ac.uk">the Computer Laboratory</a> at the <a href="http://www.cam.ac.uk">University of Cambridge</a>.
<p>I lived in Cambridge when I did a postdoc with <a href="http://research.microsoft.com/en-us/um/people/nick/">Nick Benton</a> at <a href="http://research.microsoft.com/en-us/labs/cambridge/">Microsoft Research</a>, so it'll be nice to be able to reconnect with friends I haven't been able to see regularly in a while. At the same time, of course, I will miss the friends I have made here, such as <a href="http://www.cs.bham.ac.uk/~drg/">Dan Ghica</a> and <a href="http://www.cs.bham.ac.uk/~pbl/">Paul Levy</a>. Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com6tag:blogger.com,1999:blog-8068466035875589791.post-1608829185424313712016-04-21T04:25:00.000-07:002016-04-21T04:25:01.282-07:00The Essence of Event-Driven Programming<p>Together with <a href="http://www.cis.upenn.edu/~jpaykin/">Jennifer Paykin</a> and <a href="http://www.cis.upenn.edu/~stevez/">Steve Zdancewic</a>, I have a new draft paper, <em><a href="http://www.mpi-sws.org/~neelk/essence-of-events.pdf">The Essence of Event-Driven Programming</a></em>: <blockquote> Event-driven programming is based on a natural abstraction: an event is a computation that can eventually return a value. This paper exploits the intuition relating events and time by drawing a Curry-Howard correspondence between a functional event-driven programming language and a linear-time temporal logic. In this logic, the eventually proposition ◇A describes the type of events, and Girard’s linear logic describes the effectful and concurrent nature of the programs. The correspondence reveals many interesting insights into the nature of event-driven programming, including a generalization of selective choice for synchronizing events, and an implementation in terms of callbacks where ◇A is just ¬□¬A. 
</blockquote> <p>If you've been following the work on temporal logic and functional reactive programming that I and others (e.g., <a href="http://www.asaj.org/papers/lics14.pdf">Alan Jeffrey</a>, <a href="http://www.ioc.ee/%7Ewolfgang/research/mfps-2012-paper.pdf">Wolfgang Jeltsch</a>, <a href="http://cs.mcgill.ca/~acave1/papers/fair-reactive.pdf">Andrew Cave <em>et al</em></a>) have been pursuing for a while, then you'll have heard that there's a really compelling view of reactive programs as proof terms for linear temporal logic. <p>However, the programs that come out seem to be most naturally conceived of as a kind of "higher-order synchronous dataflow" (in the same space as languages like <a href="http://rml.lri.fr/">ReactiveML</a> or <a href="http://www.di.ens.fr/~pouzet/lucid-synchrone/">Lucid</a>). These languages are based on the <em>synchrony hypothesis</em>, which postulates that programs do little work relative to the clock rate of the event loop, and so a polling implementation where programs wake up once on each trip through the event loop is reasonable. (If you've done game programming, you may be familiar with the term <a href="http://mollyrocket.com/861"><em>immediate mode GUI</em></a>, which is basically the same idea.) <p>But most GUI toolkits are not implemented in this style. Instead, they work via <em>callbacks</em>, in an event-driven style. That is, you write little bits of code that get invoked for you whenever an event occurs. The advantage of this is that it lets your program sleep until an event actually happens, and when an event happens, only the relevant pieces of your program will run. For applications (like a text editor) where the display doesn't change often and events happen at a relatively low clock rate, you can avoid a lot of useless computation. <p>In this paper, we show that the event-based style <em>also</em> fits into the temporal logic framework! 
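<p>To give a concrete taste of the correspondence: read constructively, ¬□¬A is the type of things that accept an A-callback, and that is exactly a one-shot event or promise. Here is a minimal OCaml sketch of that reading -- the names <code>event</code>, <code>now</code>, <code>bind</code>, and <code>run</code> are mine for illustration, not the calculus from the paper:

```ocaml
(* Events-as-callbacks: reading the eventually modality ◇A as ¬□¬A,
   an event carrying an 'a is a function that accepts a callback for 'a.
   This is the continuation-passing encoding of a promise/future; the
   names below are illustrative, not the paper's actual primitives. *)
type 'a event = ('a -> unit) -> unit

(* An event that fires immediately with a value. *)
let now (x : 'a) : 'a event = fun k -> k x

(* Sequencing: wait for e to fire, then run the event produced by f. *)
let bind (e : 'a event) (f : 'a -> 'b event) : 'b event =
  fun k -> e (fun a -> f a k)

(* To observe an event, install a callback. *)
let run (e : 'a event) (k : 'a -> unit) : unit = e k
```

<p>In a real event loop, <code>now</code> would be replaced by registration with some event source; the monadic structure of <code>bind</code> is unchanged.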
Basically, we can encode events (aka promises or futures) using the double-negation encoding of the possibility monad from modal logic. And then the axioms of temporal logic (such as the linearity of time) work out to be natural primitives of event-based programming, such as the <code>select</code> operator. <p>Also, before you ask: an implementation is in progress, but not yet done. I'll post a link when it works!Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com0tag:blogger.com,1999:blog-8068466035875589791.post-67497556284978901702016-04-19T06:21:00.000-07:002016-04-19T06:21:03.642-07:00Postdoctoral Position in Recursion, Guarded Recursion and Computational Effects<div><p >We are looking for a Postdoctoral Research Fellow to work on an EPSRC-funded project <a href="http://gow.epsrc.ac.uk/NGBOViewGrant.aspx?GrantRef=EP/N023757/1"><em >Recursion, Guarded Recursion and Computational Effects</em></a>. </p><p >We shall be investigating fine-grained typed calculi and operational, denotational and categorical semantics for languages that combine either guarded or general recursion and type recursion with a range of computational effects. This will build on existing work such as call-by-push-value and Nakano's guarded recursion calculus. </p><p >The successful candidate will have a PhD in programming language semantics or a related discipline, and an ability to conduct collaborative mathematical research. Knowledge of operational and denotational semantics, category theory, type theory and related subjects are all desirable. </p><p >You will work with Paul Levy (principal investigator) and Neelakantan Krishnaswami (co-investigator) as part of the Theoretical Computer Science group at the University of Birmingham. </p><p >The position lasts from 1 October 2016 until 30 September 2019. 
</p><p >You can read more and apply for the job at: </p><ul ><li ><a href="http://tinyurl.com/projguarded">http://tinyurl.com/projguarded</a></li></ul> <p >Don't hesitate to contact us (particularly <a href="mailto:P.B.Levy@cs.bham.ac.uk" data-linkid="paul levy">Paul Levy</a>) informally to discuss this position. </p><p >The closing date is soon: 22 May 2016. </p><p >Reference: 54824. Grade: 7. Starting salary: £28,982 a year, in a range up to £37,768 a year. </p></div>Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com0tag:blogger.com,1999:blog-8068466035875589791.post-39701957700828749872016-03-21T09:39:00.000-07:002016-03-21T09:39:11.268-07:00Agda is not a purely functional language <p>One of the more surprising things I've learned as a lecturer is the importance of telling lies to students. Basically, the real story regarding nearly everything in computer science is <em>diabolically complicated</em>, and if we try to tell the whole truth up front, our students will get too confused to learn anything. <p>So what we have to do is to start by telling a simple story which is true in broad outline, but which may be false in some particulars. Then, when they understand the simple story, we can point out the places where the story breaks down, and then use that to tell a slightly less-false story, and so on and so on, until they have a collection of models, which range from the simple-but-inaccurate to the complex-but-accurate. Then our students can use the simplest model they can get away with to solve their problem. <p>The same thing happens when teaching physics, where we start with Newtonian mechanics, and then refine it with special relativity and quantum mechanics, and then refine it further (in two incompatible ways!) with general relativity and quantum field theories. The art of problem solving is to pick the least complicated model that will solve the problem. 
<p>But despite how essential lies are, there's always a risk that our students discover too early that we are telling lies -- because their understanding is fragile, they don't have a sense of the zone of applicability of the story, and so an early failure can erode the confidence they need to try tackling problems. Happily, though, one advantage that computer science has over physics is that computers are more programmable than reality: we can build artificial worlds which really do obey exactly the rules we lay out. <p>This is why my colleague <a href="http://cs.bham.ac.uk/~mhe">Martín Escardó</a> says that when teaching beginners, he prefers teaching Agda to teaching Haskell -- he has to tell fewer lies. The simple story that we want to start with when teaching functional programming is that data types are sets, and computer functions are mathematical functions. <p>In the presence of general recursion, though, this is a lie. Martín has done a lot of work on more accurate stories, such as <a href="http://cs.bham.ac.uk/~mhe/papers/entcs87.pdf">data types are topological spaces and functions are continuous maps</a>, but this is not what we want to tell someone who has never written a computer program before (unless they happen to have a PhD in mathematics)! <p>Agda's termination checking means that every definable function is total, and so in Agda it's much closer to true that types are sets and functions are functions. But it's not <em>quite</em> true! <p>Here's a little example, as an Agda program. First, let's get the mandatory bureaucracy out of the way -- we're defining the module name, and importing the boolean and identity type libraries. 
<pre><br /><span class="agda2-highlight-keyword">module</span> <span class="agda2-highlight-module">wat</span> <span class="agda2-highlight-keyword">where</span> <br /><br /><span class="agda2-highlight-keyword">open</span> <span class="agda2-highlight-keyword">import</span> <span class="agda2-highlight-module">Data.Bool</span><br /><span class="agda2-highlight-keyword">open</span> <span class="agda2-highlight-keyword">import</span> <span class="agda2-highlight-module">Relation.Binary.PropositionalEquality</span> <span class="agda2-highlight-keyword">using</span> <span class="agda2-highlight-symbol">(</span>_≡_<span class="agda2-highlight-symbol">;</span> refl<span class="agda2-highlight-symbol">)</span><br /></pre> <p>Next, let's define the identity function on booleans. <pre><br /><br /><span class="agda2-highlight-function">f</span> <span class="agda2-highlight-symbol">:</span> <span class="agda2-highlight-datatype">Bool</span> <span class="agda2-highlight-symbol">→</span> <span class="agda2-highlight-datatype">Bool</span><br /><span class="agda2-highlight-function">f</span> <span class="agda2-highlight-inductive-constructor">true</span> <span class="agda2-highlight-symbol">=</span> <span class="agda2-highlight-inductive-constructor">true</span><br /><span class="agda2-highlight-function">f</span> <span class="agda2-highlight-inductive-constructor">false</span> <span class="agda2-highlight-symbol">=</span> <span class="agda2-highlight-inductive-constructor">false</span><br /></pre><p>In fact, let's define it a second time! 
<pre><br /><span class="agda2-highlight-function">g</span> <span class="agda2-highlight-symbol">:</span> <span class="agda2-highlight-datatype">Bool</span> <span class="agda2-highlight-symbol">→</span> <span class="agda2-highlight-datatype">Bool</span><br /><span class="agda2-highlight-function">g</span> <span class="agda2-highlight-inductive-constructor">true</span> <span class="agda2-highlight-symbol">=</span> <span class="agda2-highlight-inductive-constructor">true</span><br /><span class="agda2-highlight-function">g</span> <span class="agda2-highlight-inductive-constructor">false</span> <span class="agda2-highlight-symbol">=</span> <span class="agda2-highlight-inductive-constructor">false</span><br /></pre><p>So now we have two definitions of the identity function, <code><span class="agda2-highlight-function">f</span></code> and <code><span class="agda2-highlight-function">g</span></code>, so they must be equal, right? Let's see if we can write a proof of this fact. <pre><br /><span class="agda2-highlight-function">wat</span> <span class="agda2-highlight-symbol">:</span> <span class="agda2-highlight-function">f</span> <span class="agda2-highlight-operator"><span class="agda2-highlight-datatype">≡</span></span> <span class="agda2-highlight-function">g</span> <br /><span class="agda2-highlight-function">wat</span> <span class="agda2-highlight-symbol">=</span> <span style="color:red">refl</span><br /></pre><p>Oops! We see that <code><span style="color:red">refl</span></code> is in red -- Agda doesn't think it's a proof. In other words, two functions which are identical except for their name are not considered equal -- α-equivalence is not a program equivalence as far as Agda is concerned! <p>This is somewhat baffling, but does have an explanation. Basically, an idea that was <em>positively useful</em> in non-dependent functional languages has turned toxic in Agda. 
Namely, type definitions in ML and Haskell are <em>generative</em> -- two declarations of the same type will create two fresh names which cannot be substituted for each other. For example, in Ocaml the following two type definitions <pre><br /><span class="tuareg-font-lock-governing">type</span> <span class="type">foo</span> <span class="tuareg-font-lock-operator">=</span> <span class="tuareg-font-lock-constructor">A</span> <span class="tuareg-font-lock-operator">|</span> <span class="tuareg-font-lock-constructor">B</span> <br /><span class="tuareg-font-lock-governing">type</span> <span class="type">bar</span> <span class="tuareg-font-lock-operator">=</span> <span class="tuareg-font-lock-constructor">A</span> <span class="tuareg-font-lock-operator">|</span> <span class="tuareg-font-lock-constructor">B</span> <br /></pre> <p>are not interchangeable -- Ocaml will complain if you try to use a <code>foo</code> where a <code>bar</code> is expected: <pre><br /><span class="tuareg-font-lock-interactive-output">OCaml version 4.02.3<br /><br /><span class="tuareg-font-lock-governing">let</span> <span class="variable-name">x</span> <span class="tuareg-font-lock-operator">:</span> <span class="type">foo </span><span class="tuareg-font-lock-operator">=</span> <span class="tuareg-font-lock-constructor">A</span> <br /><br /><span class="tuareg-font-lock-governing">let</span> <span class="function-name">bar_id</span><span class="variable-name"> </span><span class="tuareg-font-lock-operator">(</span><span class="variable-name">b </span><span class="tuareg-font-lock-operator">:</span><span class="type"> bar</span><span class="tuareg-font-lock-operator">)</span> <span class="tuareg-font-lock-operator">=</span> b<br /><br /><span class="tuareg-font-lock-governing">let</span> <span class="variable-name">_</span> <span class="tuareg-font-lock-operator">=</span> bar_id <span class="tuareg-font-lock-error">x</span><span class="tuareg-font-lock-operator">;;</span><br /><br /><span 
class="tuareg-font-lock-interactive-output">Characters 96-97:<br /> let _ = bar_id x;;<br /> ^<br /></span><span class="tuareg-font-lock-interactive-error">Error: This expression has type foo but an<br /> expression was expected of type bar</span><br /></pre> <p>In other words, type definitions have an <em>effect</em> -- they create fresh type names. Haskell alludes to this fact with the <code>newtype</code> keyword, and in fact that side-effect is essential to its design: the type class mechanism permits overloading by type, and the generativity of type declarations is the key feature that lets users define custom overloads for their application-specific types. <p>Basically, Agda inherited this semantics of definitions from Haskell and ML. Unfortunately, this creates problems for Agda, by complicating its equality story. So we can't quite say that Agda types are sets and Agda programs are functions. We <em>could</em> say that they are <a href="https://www.cl.cam.ac.uk/~amp12/papers/nomlfo/nomlfo.pdf">sets and functions internal to the Schanuel topos</a>, but (a) that's something we may not want to hit students with right off the bat, and (b) Agda doesn't actually include the axioms that would let you talk about this fact internally. (In fact, <a href="http://www.cl.cam.ac.uk/~amp12/">Andy Pitts</a> tells me that the right formulation of syntax for nominal dependent types is still an open research question.) Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com11tag:blogger.com,1999:blog-8068466035875589791.post-19513292774357013892016-03-17T09:46:00.002-07:002016-03-17T09:46:49.054-07:00Datafun: A Functional DatalogTogether with <a href="http://www.rntz.net">Michael Arntzenius</a>, I have a new draft paper, <a href="http://www.cs.bham.ac.uk/~krishnan/datafun.pdf"><em>Datafun: a Functional Datalog</em></a>: <blockquote> Datalog may be considered either an unusually powerful query language or a carefully limited logic programming language. 
It has been applied successfully in a wide variety of problem domains thanks to its "sweet spot" combination of expressivity, optimizability, and declarativeness. However, most use-cases require extending Datalog in an application-specific manner. In this paper we define Datafun, an analogue of Datalog supporting higher-order functional programming. The key idea is to <em>track monotonicity via types.</em> </blockquote> <p>I've always liked domain-specific languages, but have never perpetrated one before. Thanks to Michael, now I have! Even better, it's a higher-order version of Datalog, which is the language behind some of my favorite applications, such as John Whaley and Monica Lam's <a href="http://bddbddb.sourceforge.net/">BDDBDDB</a> tool for writing source code analyses. <p>You can find <a href="https://github.com/rntz/datafun/">the Github repo here</a>, as well. Michael decided to implement it in <a href="http://racket-lang.org">Racket</a>, which I had not looked at closely in several years. It's quite nice how little code it took to implement everything!Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com0tag:blogger.com,1999:blog-8068466035875589791.post-68361513503664211332015-11-18T02:42:00.000-08:002015-11-18T02:42:20.535-08:00Error Handling in MenhirMy favorite category of paper is the "cute trick" paper, where the author describes a programming trick that is both easy to implement and much better than the naive approach. One of my favorite cute trick papers is Clinton L. Jeffrey's 2003 ACM TOPLAS paper, <a href="http://people.via.ecp.fr/~stilgar/doc/compilo/parser/Generating%20LR%20Syntax%20Error%20Messages.pdf"><em>Generating LR Syntax Error Messages from Examples</em></a>. The abstract explains the idea: <blockquote>LR parser generators are powerful and well-understood, but the parsers they generate are not suited to provide good error messages. 
Many compilers incur extensive modifications to the source grammar to produce useful syntax error messages. Interpreting the parse state (and input token) at the time of error is a nonintrusive alternative that does not entangle the error recovery mechanism in error message production. Unfortunately, every change to the grammar may significantly alter the mapping from parse states to diagnostic messages, creating a maintenance problem. <p>Merr is a tool that allows a compiler writer to associate diagnostic messages with syntax errors by example, avoiding the need to add error productions to the grammar or interpret integer parse states. From a specification of errors and messages, Merr runs the compiler on each example error to obtain the relevant parse state and input token, and generates a yyerror() function that maps parse states and input tokens to diagnostic messages. Merr enables useful syntax error messages in LR-based compilers in a manner that is robust in the presence of grammar changes.</blockquote> Basically, you can take a list of syntactically incorrect programs and error messages for them, and the tool will run the parser to figure out which state the LR automaton will get stuck in for each broken program, and then ensure that whenever you hit that state, you report the error message you wrote. I like this idea an awful lot, because it means that even if you use the full power of LR parsing, you <em>still</em> have a better error message story than any other approach, including hand-coded recursive descent! <p>Recently, Francois Pottier has studied a generalization of this idea, in his new draft <a href="http://gallium.inria.fr/~fpottier/publis/fpottier-reachability.pdf"><em>Reachability and error diagnosis in LR(1) automata</em></a>. <blockquote>Given an LR(1) automaton, what are the states in which an error can be detected? For each such “error state”, what is a minimal input sentence that causes an error in this state? 
We propose an algorithm that answers these questions. Such an algorithm allows building a collection of pairs of an erroneous input sentence and a diagnostic message, ensuring that this collection covers every error state, and maintaining this property as the grammar evolves. We report on an application of this technique to the CompCert ISO C99 parser, and discuss its strengths and limitations. </blockquote> Basically, in Jeffrey's approach, you write broken programs and their error messages, and Pottier has worked out an algorithm to generate a covering set of broken programs. Even more excitingly, Pottier has <em>implemented this idea in a production parser generator</em> --- the most recent version of his <a href="http://gallium.inria.fr/~fpottier/menhir/"><em>Menhir</em></a> parser generator. <p>So now there's no excuse for using parser combinators, since they offer worse runtime performance and worse error messages! :)Neel Krishnaswamihttp://www.blogger.com/profile/06853898957395028131noreply@blogger.com1
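<p>To make the state-based idea concrete, here is a toy OCaml version of the artifact that a tool like Merr or Menhir generates from example-error pairs: a function from (parse state, lookahead token) to a hand-written diagnostic. The state numbers, token type, and messages below are all invented for illustration -- in real use, the tool derives the state numbers by running the parser on your example broken programs.

```ocaml
(* Toy illustration of error messages keyed by (parse state, lookahead).
   A generator produces a function of this shape from a list of
   (broken program, message) examples; here we write it by hand.
   States, tokens, and messages are made up for illustration. *)
type token = INT of int | PLUS | LPAREN | RPAREN | EOF

let syntax_error_message (state : int) (tok : token) : string =
  match state, tok with
  | 3, PLUS   -> "Expected an expression before '+'."
  | 7, RPAREN -> "This ')' has no matching '('."
  | _, EOF    -> "Unexpected end of input."
  | _         -> "Syntax error."
```

<p>The payoff is that the grammar itself stays free of error productions: when the grammar changes, only the state numbers in this table need regenerating, which is exactly the maintenance step these tools automate.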