Tuesday, November 15, 2022

CN: Verifying Systems C Code with Separation-Logic Refinement Types

We have a new paper on combining separation logic and refinement types to verify C code, appearing at POPL 2023 in a couple of months. It's called CN: Verifying Systems C Code with Separation-Logic Refinement Types, and it's by Christopher Pulte, Dhruv C. Makwana, Thomas Sewell, Kayvan Memarian, Peter Sewell, and me.

Despite significant progress in the verification of hypervisors, operating systems, and compilers, and in verification tooling, there exists a wide gap between the approaches used in verification projects and conventional development of systems software. We see two main challenges in bringing these closer together: verification handling the complexity of code and semantics of conventional systems software, and verification usability.

We describe an experiment in verification tool design aimed at addressing some aspects of both: we design and implement CN, a separation-logic refinement type system for C systems software, aimed at predictable proof automation, based on a realistic semantics of ISO C. CN reduces refinement typing to decidable propositional logic reasoning, uses first-class resources to support pointer aliasing and pointer arithmetic, features resource inference for iterated separating conjunction, and uses a novel syntactic restriction of ghost variables in specifications to guarantee their successful inference. We implement CN and formalise key aspects of the type system, including a soundness proof of type checking. To demonstrate the usability of CN we use it to verify a substantial component of Google’s pKVM hypervisor for Android.

There's some cool separation logic trickery that might become a blogpost, and of course the CN tool is open source, and is publically available as a backend of the Cerberus executable C semantics. Naturally I (and most of my coauthors) will be in Boston this January for POPL, and the talk about CN will be public then. If you can't wait, earlier this summer I gave a talk about CN which was recorded and on Youtube.

Friday, November 4, 2022

Two Papers about Refinement Types

The rule of thumb I use is that Noam Zeilberger is generally five to ten years ahead of me in identifying interesting problems. A decade ago he was working intensively on the semantics of refinement types, and lo and behold, in the last couple of years so have I. So I'd like tto tell you about two draft papers developing both the theory of refinement types, and how to apply them to verification problems!

  • Focusing on Liquid Refinement Typing, Dimitrios J. Economou, Neel Krishnaswami, Jana Dunfield.

    We present a foundation systematizing, in a way that works for any evaluation order, the variety of mechanisms for SMT constraint generation found in index refinement and liquid type systems. Using call-by-push-value, we design a polarized subtyping relation allowing us to prove that our logically focused typing algorithm is sound, complete, and decidable, even in cases seemingly likely to produce constraints with existential variables. We prove type soundness with respect to an elementary domain-theoretic denotational semantics. Soundness implies, relatively simply, our system’s totality and logical consistency.

    In this paper, my coauthors and I decided to take a look at Liquid Types, which is one of the most successful offshoots of the refinement types family tree. Two things make it particularly powerful.

    First, Liquid Types are engineered from the ground up for decidable typechecking. This ensures you never have to pray that a fragile pile of heuristics will be good enough to typecheck your program. Second, they extend the DML/Stardust style of refinement types with the idea of “measure functions”, which let you compute SMT predicates by recursion on the structure of your program data. This is a really potent extension which brings all kinds of deep functional correctness properties into reach for automated verification.

    So we decided to build a small refinement type system on top of a call-by-push-value calculus that exemplifies these two properties. We then give an algorithmic type system (proved sound and complete with respect to the declarative spec), and then give a denotational semantics to our calculus, which shows that our refinement type system is consistent.

  • Explicit Refinement Types, Jad Ghalayini and Neel Krishnaswami.

    We present 𝜆ert, a type theory supporting refinement types with explicit proofs. Instead of solving refinement constraints with an SMT solver like DML and Liquid Haskell, our system requires and permits programmers to embed proofs of properties within the program text, letting us support a rich logic of properties including quantifiers and induction. We show that the type system is sound by showing that every refined program erases to a simply-typed program, and by means of a denotational semantics, we show that every erased program has all of the properties demanded by its refined type. All of our proofs are formalised in Lean 4.

    In this paper, we take the opposite tack from the previous paper, in which we carefully restricted the system of refinements to ensure that an SMT solver (like Z3) could always discharge the constraints. SMT solvers, while powerful, struggle with logical features (such as quantifiers and induction) that are outside the core decidable fragment. (In fact, they basically have to struggle, because full first-order logic is undecidable!)

    So in this paper, Jad and I designed a refinement type system where there are no solvers at all – all logical obligations are discharged with explicit proofs. As a result, our language of refinements easily extends to quantified formulas (eg, we can express that a function is monotonic very easily).

One interesting thing is that both of these papers use very low-tech denotational semantics. The first paper uses elementary domain theory, and the second is even simper – it uses plain old sets ands functions. This was partly done to maximise the accessibility of our results – it’s sort of folklore that refinement types have to have complicated semantics, and I hoped we could push back against that belief.

However, Paul-AndrĂ© Melliès and Noam Zeilberger have a beautiful paper, Functors are type refinement systems, in which they give a uniform, structured treatment of refinement types in terms of fibrations. Our semantics in the two papers above can really be seen as what you get if you inlined all the abstractions in Noam and Paul-AndrĂ©’s paper. This makes our results elementary, but at the price of generality. Particularly if we want to pursue mechanised proofs further, then it would be a good idea to work explicitly with the fibrational interface, because that could let us change the semantics without having to go back and re-prove all of the properties relating syntax to semantics.

Tuesday, September 13, 2022

The Golden Age of PL Research

I was chatting with a PhD student while back, who remarked to me that many senior PL researchers seemed stuck in a defensive crouch. I though that was quite a striking observation, because (a) he was not wrong, but (b) people his age don't have or need that posture because we are currently in a golden age for PL research.

What I mean by that is that there was a period where the research tradition in programming languages nearly died outright in the US, and a lot of our more senior researchers remember this very keenly. Basically, AI winter and cheap microcomputers killed the Lisp machine, and as collateral damage managed to nearly wipe out PL research in America -- many PL researchers in the US ended up moving overseas or changing their research focus.

However, over the course of about a decade (from the mid 1990s to the mid 2000s), the field made the transition from being on life support to being very vigorous. I started grad school towards the end of this transition period, and so I ended up with first hand view of the change.

Here are some of the factors that triggered the transition, both practical and theoretical. (Note that the thing about a first-hand view is that it is not a global view: I am sure other people will have different opinions about what happened!)

  1. Syntactic Type Soundness Before Felleisen and Wright's 1994 paper A Syntactic Approach to Type Soundness appeared, proving type safety was surprisingly hard.

    E.g., if you look at Milner's A Theory of Type Polymorphism in Programming, you will see that what he does to prove type safety is:

    1. Solve a domain equation for a universal domain containing all the features of mini-ML
    2. Give a denotational interpretation of untyped mini-ML
    3. Define a logical relation as a family of retracts of the universal domain which exclude "wrong"
    4. Prove the fundamental theorem of logical relations.

    This is a lot more work than the Wright-Felleisen approach, in which you just write down the typing rules, the reduction relation, and prove a handful of lemmas about it.

    The technical simplicity of the Wright-Felleisen approach meant that it was very easy to apply PL design techniques to nearly arbitrary domains, and that made it possible to quickly build formal models of industrially interesting languages. Igarashi et al's Featherweight Java is probably one of the earliest example of this, and more recently WebAssembly has demonstrated that it is possible to build the formal spec in sync with the language design effort.

  2. Java Java was released in 1996, and its immediate success had a critical impact in popularising garbage collection. Before Java, using languages with garbage collection for application programming was an extremely controversial idea among working programmers. Guy Steele (who co-authored the Java spec) remarked about this on the ll1-discuss mailing list:

    We were not out to win over the Lisp programmers; we were after the C++ programmers. We managed to drag a lot of them about halfway to Lisp.

    Java also ended up dragging programmers halfway to ML! In 1998, Gilad Bracha, Martin Odersky, David Stoutamire and Phil Wadler proposed adding parametric polymorphism to Java, and in 2004 after 5 years of arguments it was added in Java 5. This also made a big impact in popularising advanced type systems features.

  3. Dynamic Languages Languages like Perl, Python, Ruby, and Javascript (in roughly that order) began achieving serious popularity in the late 90s/early 2000s, primarily as tools for building websites. Like Java, all of these languages had garbage collection (even if only reference counting), which helped normalise it.

    But in addition to automatic memory management, these languages all supported first-class functions of one variety or another, and so gave literally millions of programmers their first introduction to functional programming.

    Younger programmers will be surprised by just how controversial an addition first-class functions were! Java omitted it initially, despite its designers being expert Lisp and Smalltalk programmers. Even C#, which was released 6 years after Java, did not support them at first. IMO it was only because of the experience people had with dynamic languages that created the pressure to add them.

    Nowadays, thanks to the influence of dynamic languages on the design zeitgeist, even languges whose design aesthetic is deliberately conservative, such as Go, include first-class functions without a second thought.

  4. SAT and SMT solvers CDCL-based SAT solvers was invented in the mid 1990s, and it revolutionised SAT solving. A few years later, by the early 2000s, the lazy approach to SMT was invented, which made it possible to reliably check whether quantifier-free formulas from all kinds of mixed domains were valid.

    This has proved to be an immense boon for PL research across a wild variety of different research areas, from verification of operatiing systems code like the Verve project, the KLEE symbolic execution engine, and refinement type systems like Liquid Haskell.

  5. Separation logic Hoare logic, invented in 1968, was one of the first systematic approaches to proving the correctness of imperative programs. But despite serious efforts to use it in verification, such as the Euclid system, it never quite caught on.

    The major reason for this was that it was very difficult to reason about pointers and aliasing with Hoare logic. Every Hoare logic assertion talks about the whole state, and so you need to write down explicit disjointness assertions to make clear that modifying one pointer will not change the value of a different pointer. So if you have n pointers, then you need O(n^2) disjointness assertions.

    Separation logic, invented by John Reynolds and Peter O'Hearn, solved this problem, by formalising the intuition that an assertion could really only be talking about part of the heap. This made modular reasoning about imperative programs radically easier, and has been used in numerous verification efforts since.

    One minor fact about separation logic. John C. Reynolds invented separation logic when was 65. At the time that most people start thinking about retirement, he was making yet another giant contribution to the whole field!

  6. Mature Proof Assistants Work on the Coq proof assistant began in 1984, but it went through multiple successive rounds of reimplementation as it influenced and was influenced by competing systems like LEGO, HOL and Isabelle. It probably achieved a recognizably modern form in 2002, when the 7.0 release was made.

    This was a milestone in terms of quality of implementation, since this is when (I think -- I wasn't there!) term normalisation started to happen via a bytecode VM. This made many things (like reflective proof) much faster, which made it feasible to build really large proofs in Coq. Indeed, Georges Gonthier reports that his 2005 mechanised proof of the 4-colour theorem began as an experiment to stress test the new fast VM!

    The result that made CS researchers outside of PL stand up and take notice was not the four color theorem, though. It was Xavier Leroy's correctness proof of a C compiler in the CompCert project, in 2006. People have been talking about program verification since the 1960s, and this was one of the first big programs that was actually verified -- and in a proof assistant, no less.

    Nowadays, you can go to a systems conference like SOSP and find whole sessions about verification, which would have been unthinkable when I was a student.

    Coq is not the only proof assistant, of course, and all its competitor systems (e.g., Isabelle) have seen substantial improvements in their maturity as well.

Some other important, but maybe less vital, factors:

  1. Step-indexing In 2001, McAllester and Appel showed how you could build a logical relations model using "step-indexing" as an alternative to the domain theoretic approach. This is a less general, but more elementary and flexible, approach to modelling the semantics of recursion than techniques like domain theory. The flexibility meant that languages with features like ML-style modules, machine code, and self-modifying code could all be given straightforward models.

    One small thing that makes me smile is that this is now the modern approach to proving type safety for fancy languages -- and that it is very similar to the approach the Wright/Felleisen approach displaced! I feel (i.e., believe but don't have a real argument for) that you need a big enough community or ecosystem to support a technically sophisticated research tradition, and now that PL is a bigger field again we can once more consider using advanced techniques.

  2. Flexible Type Analysis, Crary and Weirich, ICFP 1999 Compared to some of the other papers I've linked to, this paper is mostly forgotten -- according to Google Scholar, it has less than 10 citations in the last five years. However, IMO it really is a pivotal paper.

    What makes it so very important is that after writing this paper, Stephanie Weirich began collaborating with Simon Peyton-Jones. Stephanie, Simon, and their students and collaborators, began systematically extending GHC Haskell with the features described in this paper. And once they finished with that they carried on with many more extensions!

    This has been an immensely productive collaboration. It's why Haskell has grown so many "fancy" type system features, and in turn it has made it possible to actually try out many things in the literature in actual programming. The value of this can't be overstated: a huge number of PL researchers got their first introduction to what type theory can do from the advanced extensions to GHC.

  3. Model Checking This goes under "smaller factors" not because it's actually a smaller factor, but because I am very ignorant about the state of the art! It's long been important in hardware, but in software, the biggest success I know of is Microsoft's Static Driver Verifier, which is used to model check drivers before they are certified for Windows. This is obviously a big deal, but I can't really contextualise this because I don't know much about model checking. However, I am under the impression that SAT/SMT-based bounded model checking supplanted the older BDD-based approach, which would mean that model checking also benefited from the SAT/SMT revolution.

The net effect of all this is that a motivated, ambitious student can aim at problems which simultaneously have significant theoretical depth and substantial practical impact -- stuff that would have been field-shaking results 20 years ago are almost routine now.

For example, Maja Trela, one of my recently-graduated Part III (i.e., Master's) students here at Cambridge, just completed her dissertation under the primary supervision of Conrad Watt. In her dissertation, she used a separation logic library for Isabelle to verify the correctness of an efficient Webassembly interpreter, which is now being used as the reference interpreter by the Bytecode Alliance!

Thursday, March 17, 2022

Fold Considered Annoying

I recently read Shriram Krishamurthi and Kathi Fisler's ICER 2021 paper, Developing Behavioral Concepts of Higher-Order Functions. In this paper, they study not the theory of higher-order functions, bur rather the pedagogy of higher-order functions: how do we teach students how to program with them?

In particular, they studied how students understood different higher-order functions -- e.g., map, filter, fold and co. -- by asking them to perform tasks like classifying which functions were similar, which example input/outputs could conceivably be produced by particular HOFs, and so on.

Since this is a blog, I can say that one thing I really liked about this paper is that it confirmed my prejudices! In particular, they found that fold was much harder for students to understand than any of the other higher-order functions they studied.

This is something I have long believed, both from anecdotal evidence and for mathematical reasons. I'll pass over the anecdotes (since the paper has actual data), and focus instead on what I see as the core mathematical issue: the specification of fold is dramatically more complex than the specification of map or filter.

Both map and filter have extremely simple equational specifications. For example, map is characterised by the following three laws:

map f []         = []
map f (xs ++ ys) = (map f xs) ++ (map f ys)
map f [x]        = [f x]

Basically, this says that map takes nils to nils, distributes over appends, and applies f to elements pointwise. All three of these equations are universally-quantified, unconditional equations, which is about as simple as a specification can get.

The specification for filter is only a little more complex:

filter p []         = []
filter p (xs ++ ys) = (filter p xs) ++ (filter p ys)
filter p [x]        = if p(x) then [x] else []

filter does nothing to nils, distributes over appends, and for each element either keeps it or discards it. Again, All three of these equations are universally-quantified, unconditional equations.

On the other hand, what is the specification for fold? That is given by its universal property, which reads:

 [h [] = nil ∧
  (∀x, xs. cons x (h xs) = h (x :: xs))]  // Uh-oh!
          iff 
(fold nil cons = h)

This is not an equational specification at all! It is a bi-implication, and while the right-to-left direction is basically the definition of fold, the left-to-right direction contains a formula with a logical quantifier in the premise. We know from nearly a century of experience with high school calculus that quantifier alternations are a giant obstacle to understanding: every year thousands of students find themselves utterly baffled by epsilon-delta arguments, and here we have something harder – we have an implication with a universally quantified premise.

So I've always been very careful whenever I teach students about the fold function, because I think it has a very challenging spec for students to wrap their heads around.

If you're watching out for quantifiers, that also suggests that take-while will also be one of the more challenging for functions students – takewhile p xs returns the longest prefix of xs satisfying the predicate p, and "longest prefix" is going to be involve another higher-rank quantified formula.

The evidence in the paper seems weakly consistent with this idea, in that they report that take-while does seem harder for students, but it does not seem to be as much harder as I would have expected.

Anyway, I found this a really interesting paper!

Thursday, March 3, 2022

Simple Type Inference for System F

Henry Mercer, Cameron Ramsay, and I have a new draft paper on type inference out! Check out Implicit Polarized F: Local Type Inference for Impredicativity.

System F, the polymorphic lambda calculus, features the principle of impredicativity: polymorphic types may be explicitly instantiated at other types, enabling many powerful idioms such as Church encoding and data abstraction. Unfortunately, type applications need to be implicit for a language to be human-usable, and the problem of inferring all type applications in System F is undecidable. As a result, language designers have historically avoided impredicative type inference.

We reformulate System F in terms of call-by-push-value, and study type inference for it. Surprisingly, this new perspective yields a novel type inference algorithm which is extremely simple to implement (not even requiring unification), infers many types, and has a simple declarative specification. Furthermore, our approach offers type theoretic explanations of how many of the heuristics used in existing algorithms for impredicative polymorphism arise.

This algorithm is absurdly easy to implement, too.

Because we don't need unification, it also looks like this kind of algorithm ought to play very nicely with dependent types. My PhD student Ilya Kaisin is looking at this problem now, so stay tuned for even more.

Monday, January 17, 2022

Static typing vs. Dynamic Typing vs. Dana Scott

If you read about programming languages on the Internet, you'll inevitably run into (or even participate in) long arguments about whether dynamic or static typing is better. A long time ago, before I went to grad school, I enjoyed arguing with people on the Internet about this very topic.

But, I no longer find myself interested in that debate. This is for two reasons. One, like so many addicts before me, I've moved onto harder drugs, such as frequentism versus Bayesianism. (Frequentism is good, actually, and p-values are one of the great inventions of science. Just so you know.)

Two, I learned that these debates largely end up being echoes of an argument Dana Scott had with himself. You can tell he's a genius, since he ended up producing many of the strongest arguments for and many of the strongest arguments against!

One of the best arguments against grounding semantics on the untyped lambda calculus is Dana Scott's unpublished paper from 1969, A type-theoretical alternative to ISWIM, CUCH, OWHY. In this paper, he argues that untyped syntactic models are difficult to reason about, and we should design meta-languages with a clear mathematical status, in order to facilitate reasoning about languages and programs.

But why is it unpublished? Let's see what Scott himself has to say:

In particular, it was recognized that there were posets whose continuous function spaces of self-maps could be identified with the given posets themselves. And so there actually were “traditional” models of lambda-calculus that could be defined without first going through the proof theory of the formal system itself (and which could be related to many other mathematically meaningful structures and theories as well).

This revelation was both gratifying and embarrassing. After writing with tiresome sarcasm about the lack of meaning in the type-free lambda-calculus introduced only formally, the author himself found an interesting kind of semantical interpretation for the “type-free” language.

The result that a universal domain can be constructed from these lattice theoretic models of the lambda calculus is described in his epochal 1971 paper Continuous Lattices.

Indeed, Scott's results continue to have a lot of influence today. This goes well beyond being a technical tool in the toolbox: the sensibility and style of thinking of he invented continues to pervade our research culture and shapes the latest methods.

  1. The invention of the universal domain taught us that an inconsistent logic can still have a meaningful semantics. Prior to this work, it was believed that inconsistent logics were degenerate (due to everything following from false), but the D∞ model disproves that — it dramatically expands the kinds of arguments that we can model mathematically.

    This is incredibly philosophically important, since you can't use mathematics to study something you don't have a mathematical model of! And since as mathematicians we spend a lot of our time making wrong arguments, it's very valuable to be able to understand them.

    (As an unrelated aside, this is one of the things that I as a proof theorist most admire about finite model theory – things like conflict-driven clause learning in SAT solvers exploit the idea that being wrong can be productive in a really elegant way.)

  2. Having a universal domain makes it possible to formalise the intuition that a type is a subset of the set of all possible values in a mathematically coherent way. Naively, this idea is problematic, but with a universal domain a la Scott the approach works ("types as retracts" in the jargon).

    This approach underpins the functioning of program logics like Hoare logic (see Mellies and Zeilberger's paper Functors are Type Refinement Systems), is used to define the semantics of dependent type theories like Nuprl, and is also important for formulating the semantics of System F, the polymorphic lambda calculus.

  3. The construction of the universal domain as the limit of a sequence of finitary approximations lets you derive the abstract domains of abstract interpretation by taking Scott's inverse limit construction and cutting off the sequence at a finite point rather than going all the way to infinity. This is important for understanding how static analysis fits into the landscape of semantics -- see David Schmidt's paper Abstract Interpretation from a Denotational Semantics Perspective.

  4. In the domain-theoretic model, the idea of finitary information is realised by modelling a program as either giving you an answer or going into an infinite loop. If you change the idea of no-answer from looping to signalling an error (say, with an exception), then you can similarly derive gradual typing!

    Max S. New's and Amal Ahmed's Graduality from Embedding-projection Pairs really nails this idea down, but it has been one of the driving intuitions since the very beginning of work on the topic. Fritz Henglein's original paper explicitly formulates the rules for casts in analogy with the embedding-projection pairs found in the inverse limit construction.

    There's an important lesson here about how analogies can be productive for a long time before we give a completely formal expression of them -- Henglein wrote his paper in 1994!

  5. Since Dana Scott has very good taste, his emphasis was on the algebraic properties of his model construction, and so he quickly moved from complete lattices to domain theory.

    But this also meant that he laid out the path for people to look at other ways of obtaining universal domains. One super-important approach was pioneered by Maurice Nivat and the "Dutch school" of denotational semantics in the late 1970s and early 1980s, where instead of domains people looked at metric spaces and Banach's theorem. Bakker and Zucker's 1982 paper is still quite readable is a still-readable account of the early thinking on this topic, and it reached a mature form with the work of America and Rutten.

    This technique still extremely heavily used today, except for various reasons we have renamed it "step-indexing". This is the key technical mechanism which lets us apply universal domain constructions to give a good semantics to assembly language, because the ability to pun data and code by setting the PC and jumping needs the same mathematical machinery as interpreting the untyped lambda calculus.

  6. This point about the equivocation of code and data in assembly language holds more generally. For example, Löb's theorem can be understood as a modal variant of the derivation of the Y-combinator, illustrates how the interaction of self-reference and quotation resembles the ideas behind the D∞-style structure. This idea hasn't been worked out as nicely as New and Ahmed worked things out for gradual types, but what's a blog post without a provocation to future research?

Thinking of "typed versus untyped languages" sort of misses the point, theoretically. We can neither pull them apart, nor do we really want to. To quote Heraclitus (as translated by Kathleen Freeman):

They do not understand how that which differs with itself is in agreement: harmony consists of opposing tension, like that of the bow and the lyre.

Thursday, December 30, 2021

What is a programming language?

One of the more surprising facts about the discipline of programming language theory is that it is actually possible to define what programming languages are in a reasonably mathematically satisfying way.

  1. A language is a presentation of a (generalised) algebraic theory. Basically, think of a language as a set of generators and relations in the style of abstract algebra.

    You need to beef up the universal algebra story a bit to handle variables and binding (e.g., see the work of Fiore and Hamana on higher-order algebraic theories), but the core intuition that a language is a set of generators for terms, plus a set of equations these terms satisfy is exactly the right one.

    For example:

    1. the simply-typed lambda calculus
    2. regular expressions
    3. relational algebra
  2. A model of a a language is literally just any old mathematical structure which supports the generators of the language and respects the equations.

    For example:

    1. we can model the typed lambda calculus using sets for types and mathematical functions for terms,
    2. we can model regular expressions as denoting particular languages (ie, sets of strings)
    3. we can model relational algebra expressions as sets of tuples
  3. A model of computation or machine model is basically a description of an abstract machine that we think can be implemented with physical hardware, at least in principle. So these are things like finite state machines, Turing machines, Petri nets, pushdown automata, register machines, circuits, and so on. Basically, think of models of computation as the things you study in a computability class.

    Nearly all models of computation are transition systems, where you have a set of states, and a transition function which tells you how a computation can change one state into another. There are lots of generalisations of this (relations instead of functions, I/O, system calls, message passing communication, etc.), and the slick way of accounting for this generality is by modelling transition systems as coalgebras (i.e., the set S paired with a coalgebra map step : S → F(S), where F is a functor.

    In addition to this, every model of computation also comes with a notion of observation – in its most basic form, how do you tell when a program has finished executing? Observations can get quite intricate: to model computational complexity, you have to be able to observe runtime and memory usage, and for concurrent programs you can observe intermediate program states (due to the possibility of concurrent interference). More practically, the compiler correctness theorem for something like CompCert says that the sequence of system calls the compiled binary produces will be the same as the one the C abstract machine will produce.

    I don't know a nice algebraic/categorical characterisation of the notion of observation, though probably Alexandra Silva does.

    The Church-Turing thesis characterises which abstract machines we think it is possible to physically implement.

    Some examples of models of computation are:

    1. Turing machines,
    2. finite state machines
    3. RAM machine programs
  4. A language is a programming language when you can give at least one model of the language via a machine model.

    For example:

    1. the types of the lambda calculus can be viewed as partial equivalence relations over Gödel codes for some universal turing machine, and the terms of a type can be assigned to equivalence classes of the corresponding PER.

    2. Regular expressions can be interpreted into finite state machines quotiented by bisimulation.

    3. A set in relational algebra can be realised as equivalence classes of B-trees, and relational algebra expressions as nested loops walking over these trees.

    Note that in all three cases we have to quotient the states of the machine model by a suitable equivalence relation to preserve the equations of the language's theory.

    This quotient is very important, and is the source of a lot of confusion. It hides the equivalences the language theory wants to deny, but that is not always what the programmer wants – e.g., is merge sort equal to bubble sort? As mathematical functions, they surely are, but if you consider them as operations running on an actual computer, then we will have strong preferences! (Conversely, the ability to equate radically different machine configurations is the reason why optimising compilers and refactoring are things.)

    Moreover, we have to set up this quotient so it agrees with the machine model's notion of observation – for example, we want exactly and only the strings accepted by a regular expression to end in accepting states of the finite state machine. This property is called adequacy, and is what justifies the claim that this is a reasonable implementation of the programming language.

In other words, a programming language is a language which has an adequate implementation in a model of computation. Note that this definition does not mandate the implementation strategy!

A common source of confusion arises from the fact that if you have a nice type-theoretic language (like the STLC), then:

  1. the term model of this theory will be the initial model in the category of models, and
  2. you can turn the terms into a machine model by orienting some of the equations the lambda-theory satisfies and using them as rewrites.

These oriented equations are where operational semantics comes from, at least for purely functional languages. As a result we often abuse language to say the simply typed lambda calculus "is" a programming language.

But in fact, the lambda calculus is a presentation of an algebraic theory, and it is the same programming language regardless of whether it is implemented by term rewriting or by compilation to x86 machine code.