Wednesday, December 1, 2021

Obliteratingly Fast Parser Combinators

Jeremy Yallop, Ningning Xie, and I have a new draft out about combinator parsing, entitled Fusing Lexing and Parsing. The abstract is short, but sweet:

Lexers and parsers are typically defined separately and connected by a token stream. This separate definition is important for modularity, but harmful for performance.

We show how to fuse together separately-defined lexers and parsers, drastically improving performance without compromising modularity. Our staged parser combinator library, flap, provides a standard parser combinator interface, but generates psecialized token-free code that runs several times faster than ocamlyacc on a range of benchmarks.

I'm excited about this for three reasons.

First, even though the performance is way better this time around, going from competitive with lex+yacc to 3-9 times faster, the implementation is actually simpler than in our 2019 paper – we don't need a complex staging pipeline because we introduce an IR which makes all the invariants needed for fast code generation lexically apparent.

Second, really fast parsers are potentially a powerful tool for applications like IDEs. If you are fast enough, you are free from having to mess around with things like incremental parsing, because if you can parse a whole editor buffer in a millisecond it's fine to just reparse everything on every keystroke. Imagine what could happen if we could fused in typechecking, too! That's currently just wishful thinking (aka "future work"), of course, but since typechecking is mostly a tree traversal, it's not totally hopeless.

Finally, at a meta-level this is a good illustration of the incorrectness of the standard engineer proverb "fast, good, cheap – pick two". This advices presumes that we lie on the Pareto frontier, so the best you can do is manage tradeoffs. But that's actually a false assumption – it is possible to do things which are unconditionally better along all three dimensions.

No source code is ready for release yet, but Jeremy is likely to release an OPAM package for it in due course.

8 comments:

  1. Hi Neel

    The URL to the draft seems incorrect. Also, is there a comparison with Menhir?

    ReplyDelete
  2. Hi, the URL works for me, but you can look at Jeremy's web site and Ningning's web site for alternate links.

    We didn't try comapring with Menhir, because it is usually a little slower than ocamlyacc, but you're not the only person who has asked this. So maybe we should!

    ReplyDelete
  3. Link doesn't work for me either - it is a weird looking link "https://semantic-domain.blogspot.com/ww.cl.cam.ac.uk/~nk480/flap.pdf". Perhaps only 1 of those two sites is meant? [I got the paper from Jeremy's site, thanks.]

    ReplyDelete
  4. Ah, I mistyped the URL in the HTML, and blogspot was "fixing" it. The real mystery is why it worked for me!

    ReplyDelete
  5. I'm curious, do you have any opinion on https://mpickering.github.io/papers/parsley-icfp.pdf paper? Do you have any plans to repeat parsley's approach (or at least can you evaluate it's applicability for OCaml)?

    ReplyDelete
  6. is the source code available somewhere?

    ReplyDelete
  7. The source code is available here: https://github.com/yallop/ocaml-flap

    ReplyDelete
  8. Gyansetu's Computer Science section is a knowledge hub for tech enthusiasts! Their content offers essential insights and guidance for navigating the vast world of computer science. Kudos for being a trusted resource in this ever-evolving field!


    For more info:- https://www.gyansetu.in/blogs/how-can-a-b-com-graduate-become-a-data-scientist/

    ReplyDelete