Thursday, December 30, 2021

What is a programming language?

One of the more surprising facts about the discipline of programming language theory is that it is actually possible to define what programming languages are in a reasonably mathematically satisfying way.

  1. A language is a presentation of a (generalised) algebraic theory. Basically, think of a language as a set of generators and relations in the style of abstract algebra.

    You need to beef up the universal algebra story a bit to handle variables and binding (e.g., see the work of Fiore and Hamana on higher-order algebraic theories), but the core intuition that a language is a set of generators for terms, plus a set of equations these terms satisfy, is exactly the right one.

    For example:

    1. the simply-typed lambda calculus
    2. regular expressions
    3. relational algebra
  2. A model of a language is literally just any old mathematical structure which supports the generators of the language and respects the equations.

    For example:

    1. we can model the typed lambda calculus using sets for types and mathematical functions for terms,
    2. we can model regular expressions as denoting particular languages (ie, sets of strings)
    3. we can model relational algebra expressions as sets of tuples
  3. A model of computation or machine model is basically a description of an abstract machine that we think can be implemented with physical hardware, at least in principle. So these are things like finite state machines, Turing machines, Petri nets, pushdown automata, register machines, circuits, and so on. Basically, think of models of computation as the things you study in a computability class.

    Nearly all models of computation are transition systems, where you have a set of states, and a transition function which tells you how a computation can change one state into another. There are lots of generalisations of this (relations instead of functions, I/O, system calls, message passing communication, etc.), and the slick way of accounting for this generality is by modelling transition systems as coalgebras (i.e., a set S paired with a coalgebra map step : S → F(S), where F is a functor).

    In addition to this, every model of computation also comes with a notion of observation – in its most basic form, how do you tell when a program has finished executing? Observations can get quite intricate: to model computational complexity, you have to be able to observe runtime and memory usage, and for concurrent programs you can observe intermediate program states (due to the possibility of concurrent interference). More practically, the compiler correctness theorem for something like CompCert says that the sequence of system calls the compiled binary produces will be the same as the one the C abstract machine will produce.

    I don't know a nice algebraic/categorical characterisation of the notion of observation, though probably Alexandra Silva does.

    The Church-Turing thesis characterises which abstract machines we think it is possible to physically implement.

    Some examples of models of computation are:

    1. Turing machines,
    2. finite state machines
    3. RAM machine programs
  4. A language is a programming language when you can give at least one model of the language via a machine model.

    For example:

    1. the types of the lambda calculus can be viewed as partial equivalence relations over Gödel codes for some universal Turing machine, and the terms of a type can be assigned to equivalence classes of the corresponding PER.

    2. Regular expressions can be interpreted into finite state machines quotiented by bisimulation.

    3. A set in relational algebra can be realised as equivalence classes of B-trees, and relational algebra expressions as nested loops walking over these trees.

    Note that in all three cases we have to quotient the states of the machine model by a suitable equivalence relation to preserve the equations of the language's theory.

    This quotient is very important, and is the source of a lot of confusion. It hides the equivalences the language theory wants to deny, but that is not always what the programmer wants – e.g., is merge sort equal to bubble sort? As mathematical functions, they surely are, but considered as operations running on an actual computer, we have strong preferences! (Conversely, the ability to equate radically different machine configurations is the reason why optimising compilers and refactoring are things.)

    Moreover, we have to set up this quotient so it agrees with the machine model's notion of observation – for example, we want exactly and only the strings accepted by a regular expression to end in accepting states of the finite state machine. This property is called adequacy, and is what justifies the claim that this is a reasonable implementation of the programming language.

In other words, a programming language is a language which has an adequate implementation in a model of computation. Note that this definition does not mandate the implementation strategy!
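To make the coalgebra-plus-observation picture concrete, here is a minimal OCaml sketch (the type and all the names are my own, not from any particular library): a deterministic machine is a coalgebra step : s -> bool * (char -> s), where the bool is the observation "is this state accepting?", and adequacy for a regular expression such as a* is the claim that accepts agrees with membership in the language it denotes.

```ocaml
(* A machine model as a coalgebra: step : s -> F s with F s = bool * (char -> s).
   The bool component is the observation: is this state accepting? *)
type 's machine = { start : 's; step : 's -> bool * (char -> 's) }

(* Run the machine: drive the coalgebra along the input and observe the final state. *)
let accepts (m : 's machine) (input : string) : bool =
  let final =
    String.fold_left (fun st c -> snd (m.step st) c) m.start input
  in
  fst (m.step final)

(* Example: a two-state machine adequate for the regular expression a* over {a,b}. *)
type st = Ok | Dead

let a_star : st machine =
  { start = Ok;
    step = function
      | Ok   -> (true,  fun c -> if c = 'a' then Ok else Dead)
      | Dead -> (false, fun _ -> Dead) }

let () =
  assert (accepts a_star "aaa");
  assert (not (accepts a_star "ab"))
```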

A common source of confusion arises from the fact that if you have a nice type-theoretic language (like the STLC), then:

  1. the term model of this theory will be the initial model in the category of models, and
  2. you can turn the terms into a machine model by orienting some of the equations the lambda-theory satisfies and using them as rewrites.

These oriented equations are where operational semantics comes from, at least for purely functional languages. As a result we often abuse language to say the simply typed lambda calculus "is" a programming language.

But in fact, the lambda calculus is a presentation of an algebraic theory, and it is the same programming language regardless of whether it is implemented by term rewriting or by compilation to x86 machine code.

Wednesday, December 1, 2021

Obliteratingly Fast Parser Combinators

Jeremy Yallop, Ningning Xie, and I have a new draft out about combinator parsing, entitled Fusing Lexing and Parsing. The abstract is short, but sweet:

Lexers and parsers are typically defined separately and connected by a token stream. This separate definition is important for modularity, but harmful for performance.

We show how to fuse together separately-defined lexers and parsers, drastically improving performance without compromising modularity. Our staged parser combinator library, flap, provides a standard parser combinator interface, but generates specialized token-free code that runs several times faster than ocamlyacc on a range of benchmarks.

I'm excited about this for three reasons.

First, even though the performance is way better this time around, going from competitive with lex+yacc to 3-9 times faster, the implementation is actually simpler than in our 2019 paper – we don't need a complex staging pipeline because we introduce an IR which makes all the invariants needed for fast code generation lexically apparent.

Second, really fast parsers are potentially a powerful tool for applications like IDEs. If you are fast enough, you are free from having to mess around with things like incremental parsing, because if you can parse a whole editor buffer in a millisecond it's fine to just reparse everything on every keystroke. Imagine what could happen if we could fuse in typechecking, too! That's currently just wishful thinking (aka "future work"), of course, but since typechecking is mostly a tree traversal, it's not totally hopeless.

Finally, at a meta-level this is a good illustration of the incorrectness of the standard engineering proverb "fast, good, cheap – pick two". This advice presumes that we already lie on the Pareto frontier, so that the best you can do is manage tradeoffs. But that's actually a false assumption – it is possible to do things which are unconditionally better along all three dimensions.

No source code is ready for release yet, but Jeremy is likely to release an OPAM package for it in due course.

Wednesday, November 17, 2021

POPL till you DROPL

My PhD student Faustyna Krawiec, along with me and several other coauthors, has a new paper out, which is due to appear at POPL 2022:

Provably Correct, Asymptotically Efficient, Higher-Order Reverse-Mode Automatic Differentiation

Faustyna Krawiec, Simon Peyton-Jones, Neel Krishnaswami, Tom Ellis, Richard A. Eisenberg, and Andrew Fitzgibbon.

In this paper, we give a simple and efficient implementation of reverse-mode automatic differentiation, which both extends easily to higher-order functions, and has run time and memory consumption linear in the run time of the original program. In addition to a formal description of the translation, we also describe an implementation of this algorithm, and prove its correctness by means of a logical relations argument.

We give an incredibly simple implementation of reverse-mode AD, which works in a language with higher-order functions, and is asymptotically efficient. When we started the work I was hoping it would be a good excuse to learn some differential geometry, but it turned out there is such a simple algorithm that we didn't need any fancy machinery! So I still don't know any differential geometry, but we've got a cool AD algorithm to share.

That's the only POPL paper this year with my name on it, but it turns out that a lot of students and close collaborators have papers without me as well! It might not matter for my academic prestige, but it makes me happy, especially when I see how interesting the work is!

  1. My PhD student Dima Szamozvancev and his other advisor Marcelo Fiore have a paper, Formal Metatheory of Second-Order Abstract Syntax. They reformulated Marcelo's work on second-order abstract syntax to fit into a constructive type theory, and then mechanised it in Agda. As a result, you can give a description of a syntax, and automatically get out the substitution operators and their correctness property. One particularly powerful feature is that it doesn't just handle binding, but also metavariables. This is essential if you want to handle things like inference rules as first-class values!

  2. My newest PhD student, Ilya Kaysin, has a paper, The Leaky Semicolon (no link yet), written with several other people, which gives a compositional semantics to weak memory. As a fun fact, Ilya is entering his PhD with more POPL publications than I exited with. :)

  3. Vikraman Chaudhury is a (former) PhD student of Amr Sabry's, but he spent a year visiting me and we wrote a paper together on the semantics of capabilities. At this POPL, he has a paper with Amr Sabry and Jacek Karwowski, Symmetries in Reversible Programming: From Symmetric Rig Groupoids to Reversible Programming Languages, which gives a nice semantics for reversible programming languages.

Tuesday, September 28, 2021

Design Benchmarks

A few weeks ago, Shriram Krishnamurthi asked an interesting question on Twitter:

It's common in PL to have benchmarks for performance. But does anyone know of PL design benchmarks?

This caught my attention because my longest-running research program -- my work on functional reactive programming, with 7 papers so far -- has been explicitly guided by a handful of design benchmarks.

I've never written them down explicitly, so I thought I'd take the opportunity to do so.

  1. The language must validate the full beta- and eta-laws for all of the type constructors in the language. (The function and pair instances are written out at the end of this post, for concreteness.)

    This is a proxy benchmark, of course. My actual goal is to make languages which are easy to program with and reason about.

    However, this is an open-ended and vague goal, and so I needed to find a specific, concrete operationalisation of it. There are many possible proxies for this goal, and the one I picked was to look for programming languages which validate a rich collection of semantically-justified equations. This specific choice was made for three reasons.

    • First, this goal pushes me away from spending time on quality-of-life features like rich standard libraries. These features obviously do make a language easier to use, but they do not directly touch upon any FRP-specific problems.

    • Second, the search for well-behaved equational properties pushes me towards taking features known to be useful, and trying to decompose them into smaller type-theoretic components.

      As Jeff Raskin once noted, in user interfaces "intuitive" and "familiar" are basically synonyms, which means it is good to have a design benchmark which does not penalise doing things that are different from the old standard.

    • Third, déformation professionnelle. This particular goal is the one that the toolbox I've acquired over the years -- expertise in semantics and type theory -- is best-adapted to move me towards.

      If my expertise lay in running user studies, writing model checkers, or programming with continuations, then I would probably approach the problem differently! For example, the work on the Garnet toolkit is both extremely good and very influential, and very different in character from anything I have pursued.

      Obviously, I could learn how to write a constraint solver or run a user study, but the art is long, and life short. I think it's OK for people to focus on different things, and to learn from each other. :)

  2. I have a small collection of tiny programs which are intended to model various common UI patterns, and the languages I design must express these programs.

    The purpose of these programs is basically to prevent over-simplifying away real complexity. If we're trying to optimise a language design for ease of reasoning, the easiest way of "achieving" the goal is to design an impoverished language.

    For example, I check whether I can write programs which dynamically create new widgets, to ensure that I can't design something which restricts programmers to a fixed dataflow graph. Likewise, a program with two widgets sending notifications "sideways" to each other keeps me from trying to rely on hierarchical encapsulation. (This is super tempting, but as far as I can tell, no actual program consistently does this.)

  3. The language must admit a simple implementation strategy.

    This design principle arose as a reaction to a negative lesson.

    My first paper on higher-order FRP, Ultrametric Semantics of Reactive Programs, used a system of dynamic, incremental dependency propagation (a baby version of the stuff Matt Hammer and Umut Acar used to do), and it ended up requiring a really weird implementation strategy for closures. In addition to being able to call them, I also needed to be able to move them forward in time, and this required a really weird set of hacks to go into a closure and move everything in its captured environment forward in time as well.

    This was painful to implement, difficult to prove correct, and it leaked memory like a sieve. Trying to deal with these issues led directly to my ICFP 2013 paper on FRP, which is basically a type system for immediate mode GUI programming.

    Note that I don't mind if the implementation strategy is challenging to prove correct (eg, my POPL 2021 paper with Simon Spies and Derek Dreyer is about inventing some needed machinery for modelling retained-mode GUI implementations), but the actual running code has to be simple.

    This means I haven't looked at any dataflow or constraint stuff in years, but sometimes a design rule is there to prevent you from doing certain things.
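For concreteness, here are the laws benchmark 1 asks for, instantiated at function and pair types (this is just the standard statement, written out; nothing here is FRP-specific):

\[ \begin{array}{lcll} (\lambda x.\, e)\, e' & \equiv & e[e'/x] & (\beta_{\to}) \\ \lambda x.\, f\, x & \equiv & f & (\eta_{\to}), \mbox{ where } x \mbox{ not free in } f \\ \pi_i\, (e_1, e_2) & \equiv & e_i & (\beta_{\times}) \\ (\pi_1\, e, \pi_2\, e) & \equiv & e & (\eta_{\times}) \end{array} \]

The benchmark is that every type constructor in the language comes with such a pair of laws, and that they hold with respect to the language's semantic equivalence.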

Thursday, July 8, 2021

Postdoctoral opening in the TypeFoundry project

I have an opening for a postdoctoral position in Cambridge with the European ERC Consolidator Grant project TypeFoundry. This is a 2-year position, with the possibility of extension.

The TypeFoundry project aims to use recent developments in proof theory and semantics, such as polarized type theory and call-by-push-value, to identify the theoretical structures underpinning bidirectional type inference.

Informal enquiries are welcome; please contact me at: <nk480@cl.cam.ac.uk>.

More details and an application form can be found at <https://www.jobs.cam.ac.uk/job/30485/>.

Monday, February 15, 2021

Five (and a Half) Derivatives in Language Theory

Thanks to the influence of machine learning, differentiating programs has become a big business. However, there are a lot of other things in programming language theory which are also derivatives, and people sometimes get confused about the inter-relationships. So I decided to lay out some of them and some of their interconnections.

1. Brzozowski Derivatives

The first kind of derivative is the humble, yet beloved, Brzozowski derivative. Given a character set \(\Sigma\) and a regular language \(L \subseteq \Sigma^\ast\), the Brzozowski derivative of \(L\) with respect to the character \(c \in \Sigma\) is

\[ \delta_c(L) = \left\{ w \in \Sigma^\ast \;\mid\; c \cdot w \in L \right\} \]

That is, it's the set of words you get by taking the words of \(L\) that start with \(c\) and dropping that leading character. It can be defined as a transformation on regular expressions \(r\) as follows:

\[ \begin{array}{lcll} \delta_c(0) & = & 0 & \\ \delta_c(r_1 + r_2) & = & \delta_c(r_1) + \delta_c(r_2) & \\ \delta_c(c') & = & \epsilon & \mbox{when } c = c'\\ \delta_c(c') & = & 0 & \mbox{when } c \neq c' \\ \delta_c(\epsilon) & = & 0 & \\ \delta_c(r_1 \cdot r_2) & = & \delta_c(r_1) \cdot r_2 & \mbox{when $r_1$ not nullable} \\ \delta_c(r_1 \cdot r_2) & = & \delta_c(r_1) \cdot r_2 + \delta_c(r_2) & \mbox{when $r_1$ nullable} \\ \delta_c(r \ast) & = & \delta_c(r) \cdot r\ast & \\ \end{array} \]

The Brzozowski derivative is called a derivative for two reasons. First, it is linear with respect to \(+\) on regular expressions -- i.e., \(\delta_c(r_1 + r_2) = \delta_c(r_1) + \delta_c(r_2)\). Second, it satisfies the derivative rule for monomials \(\delta_c(c^{n+1}) = c^n\). You may recall that the high-school rule for derivatives says \(\delta_c(c^{n+1}) = (n+1) \times c^n\), but recall that in regular expressions \(r + r = r\), so \((n+1) \times r = r\).

However, this is a weak notion of derivative, which shares few properties with other notions of derivative. The reason is that the rule for products does not match the product rule for high-school derivatives -- if the Brzozowski derivative satisfied the high-school rule, it would be

\[ \delta_c(r_1 \cdot r_2) = \delta_c(r_1) \cdot r_2 + r_1 \cdot \delta_c(r_2) \]
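As a small aside, the clauses above give a complete matcher: a word \(w\) is in the language of \(r\) exactly when the iterated derivative of \(r\) along \(w\) is nullable (accepts the empty string). Here is a minimal OCaml sketch of that algorithm (the constructor and function names are my own):

```ocaml
(* Regular expressions, with constructors mirroring the clauses above. *)
type re =
  | Zero                  (* the empty language, written 0 above *)
  | Eps                   (* the empty string, epsilon *)
  | Chr of char
  | Alt of re * re        (* r1 + r2 *)
  | Seq of re * re        (* r1 . r2 *)
  | Star of re            (* r* *)

(* nullable r is true iff r matches the empty string. *)
let rec nullable = function
  | Zero | Chr _ -> false
  | Eps | Star _ -> true
  | Alt (r1, r2) -> nullable r1 || nullable r2
  | Seq (r1, r2) -> nullable r1 && nullable r2

(* The Brzozowski derivative, clause by clause as in the table. *)
let rec delta c = function
  | Zero | Eps -> Zero
  | Chr c' -> if c = c' then Eps else Zero
  | Alt (r1, r2) -> Alt (delta c r1, delta c r2)
  | Seq (r1, r2) ->
      let d = Seq (delta c r1, r2) in
      if nullable r1 then Alt (d, delta c r2) else d
  | Star r -> Seq (delta c r, Star r)

(* w is in L(r) iff the derivative of r along w is nullable. *)
let matches r w =
  nullable (String.fold_left (fun r c -> delta c r) r w)
```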

2. Differential algebras

If we try to change the definition of the Brzozowski derivative to match the high-school rule, we get what is called a differential algebra. A differential algebra is a semiring \(S\), plus a collection of unary functions \(\delta_i : S \to S\) called derivations. The axioms for derivations mirror the rules for differentiating polynomials -- the motivating model for differential algebras is a semiring of polynomials, with addition and multiplication of polynomials as the semiring structure, and differentiation with respect to the polynomial's variable as the derivation. The rules we get are:

\[ \begin{array}{lcll} \delta_c(0) & = & 0 & \\ \delta_c(r_1 + r_2) & = & \delta_c(r_1) + \delta_c(r_2) & \\ \delta_c(c') & = & \epsilon & \mbox{when } c = c'\\ \delta_c(c') & = & 0 & \mbox{when } c \neq c' \\ \delta_c(\epsilon) & = & 0 & \\ \delta_c(r_1 \cdot r_2) & = & \delta_c(r_1) \cdot r_2 + r_1 \cdot \delta_c(r_2) & \\ \delta_c(r \ast) & = & r\ast \cdot \delta_c(r) \cdot r\ast & \\ \end{array} \]

Note that the only things that changed are the clauses for concatenation and the Kleene star. The axioms of a differential algebra actually say nothing about the Kleene star, because they are specified for semirings, but to handle full regular languages you need to change the definition so that the differential respects the equation \(r \ast = \epsilon + r \cdot r\ast\). If memory serves, Dexter Kozen has studied this extension. (Even if memory doesn't serve, he's a safe bet...!)

If you look at this definition for a while, you realize that the original Brzozowski derivative removes a single character \(c\) from the front of a string, and the differential removes a single \(c\) from anywhere in the string.
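For a concrete comparison (my own worked example, computed with the two sets of clauses above), take the single-word language \(a \cdot b \cdot a\):

\[ \begin{array}{lcll} \delta_a(a \cdot b \cdot a) & = & b \cdot a & \mbox{(Brzozowski: drop the leading $a$)} \\ \delta_a(a \cdot b \cdot a) & = & b \cdot a + a \cdot b & \mbox{(differential: drop one $a$, anywhere)} \end{array} \]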

For regular algebras, this is a peculiar thing to want to do, but it suddenly becomes a lot more interesting when we move from semirings of polynomials to semiring categories of polynomial functors.

3. Derivatives of Data Types

One interesting factoid is that sets "look like" a semiring. The Cartesian product \(A \times B\) and the disjoint union \(A + B\) obviously don't satisfy the semiring equations on the nose, but if you replace the equality symbol with isomorphisms, then they do. If you take this analogy seriously and try to categorify the notion of a semiring, you get what is variously called a rig category, semiring category, or bimonoidal category.

Instead of a set plus two binary operations, we have a category with two monoidal structures with the semiring equations becoming isomorphisms. The exact commuting diagrams needed were worked out in the early 70s, independently by Laplaza and Kelly.

One semiring category which is very useful for programming is the category of polynomial functors. Most inductive datatypes, like lists, binary trees, and so on, can be understood as least fixed points of sums and products. We can formalise this idea by giving a grammar of sums, products, constants, and the identity:

\[ \begin{array}{lcll} F,G & ::= & \mathrm{Id} & \\ & | & \underline{0} \\ & | & F \oplus G & \\ & | & \underline{1} \\ & | & F \otimes G & \\ & | & K(A) & \mbox{where } A \in \mathrm{Set} \\ \end{array} \]

Every term of this grammar can be interpreted as a functor:

\[ \newcommand{\interp}[1]{[\![ #1 ]\!]} \begin{array}{lcl} \interp{F} & : & \mathrm{Set} \to \mathrm{Set} \\ \interp{\mathrm{Id}} & = & X \mapsto X \\ \interp{\underline{0}} & = & X \mapsto 0 \\ \interp{F \oplus G} & = & X \mapsto \interp{F}\,X + \interp{G}\,X \\ \interp{\underline{1}} & = & X \mapsto 1 \\ \interp{F \otimes G} & = & X \mapsto \interp{F}\,X \times \interp{G}\,X \\ \interp{K(A)} & = & X \mapsto A \end{array} \]

So you can see that \(\oplus\) is interpreted as the coproduct functor, \(\otimes\) is the product functor, \(K(A)\) is the constant functor, and \(\mathrm{Id}\) is interpreted as the identity functor.

The least fixed point of one of these polynomials is an inductive type. So to model lists of type X, we would define the list functor

\[ \mathrm{ListF} \triangleq K(1) \oplus (K(X) \otimes \mathrm{Id}) \]

and then take its least fixed point.
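In code, this is the familiar two-step definition of lists; a minimal OCaml sketch (type and constructor names are my own):

```ocaml
(* ListF = K(1) (+) (K(elt) (x) Id), viewed as a functor in 'x. *)
type ('elt, 'x) list_f =
  | NilF                      (* the K(1) summand *)
  | ConsF of 'elt * 'x        (* the K(elt) (x) Id summand *)

(* The least fixed point mu(ListF): ordinary lists, up to the Fix wrapper. *)
type 'elt mylist = Fix of ('elt, 'elt mylist) list_f

let example : int mylist = Fix (ConsF (1, Fix (ConsF (2, Fix NilF))))
```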

Since we have a semiring category, we can ask if it has a categorical analogue of a derivation, to give it differential structure. And these polynomials do! The derivative with respect to the variable \(\mathrm{Id}\) can be defined as

\[ \begin{array}{lcll} \delta(K(A)) & = & 0 & \\ \delta(\underline{0}) & = & 0 & \\ \delta(F \oplus G) & = & \delta(F) \oplus \delta(G) & \\ \delta(\mathrm{Id}) & = & 1 & \\ \delta(\underline{1}) & = & 0 & \\ \delta(F \otimes G) & = & \delta(F) \otimes G \oplus F \otimes \delta(G) & \\ \end{array} \]

If you have a functor for a binary tree:

\[ \mathrm{TreeF} \triangleq \underline{1} \oplus (\mathrm{Id} \otimes \mathrm{Id}) \]

Then the differential is: \[ \delta(\mathrm{TreeF}) = (\underline{1} \otimes \mathrm{Id}) \oplus (\mathrm{Id} \otimes \underline{1}) \]

This tells you whether the left or the right position was chosen. The utility of this is that if you have an element \(x \in X\) and an element of \(\delta(F)\,X\), then you can "fill in the gap" to get an element of \(F\,X\). That is, you can define:

\[ \mathit{fill}_F : \delta(F)\,X \times X \to F\,X \]

If you take \(X\) to be the type of trees \(\mu(\mathrm{TreeF})\), this gives you an indication of whether to take the left or right subtree, plus the tree you didn't follow. Then a list of these things gives you a path into a tree -- which is just Huet's famous zipper.
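Here is that story for binary trees as a minimal OCaml sketch (type and constructor names are my own): one layer of one-hole context is exactly \(\delta(\mathrm{TreeF})\) applied to trees, \(\mathit{fill}\) plugs the hole one layer at a time, and a list of layers is the zipper.

```ocaml
(* Binary trees: the least fixed point of TreeF = 1 (+) (Id (x) Id). *)
type tree = Leaf | Node of tree * tree

(* One layer of one-hole context: delta(TreeF) = (1 (x) Id) (+) (Id (x) 1),
   instantiated at trees. *)
type frame =
  | HoleLeft of tree    (* hole in the left child; keep the right subtree *)
  | HoleRight of tree   (* hole in the right child; keep the left subtree *)

(* fill : delta(F) X * X -> F X, at X = tree, composed with the fixed point. *)
let fill (f : frame) (t : tree) : tree =
  match f with
  | HoleLeft right -> Node (t, right)
  | HoleRight left -> Node (left, t)

(* A list of frames is a path from the hole back to the root: Huet's zipper. *)
type zipper = frame list

let rec plug (z : zipper) (t : tree) : tree =
  match z with
  | [] -> t
  | f :: rest -> plug rest (fill f t)
```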

The exposition so far largely glosses Conor McBride's famous paper The Derivative of a Regular Type is the Type of its One-Hole Contexts.

The precise connection to linearity and derivatives was explored by Abbott, Altenkirch, Ghani and McBride in their paper Derivatives of Containers, using significantly more categorical machinery. In their paper, Altenkirch et al explain how the \(\mathit{fill}\) operation is really a linear function

\[ \mathit{fill}_F : \delta(F)\,X \otimes X \multimap F\,X \]

and how \(\delta(F)\) can be viewed as giving a kind of tangent for \(F\) at \(X\).

4. Differential Linear Logic

The appearance of linear types here should be delightful rather than shocking. We think of the usual derivative as associating a tangent vector space to each point of an original space, and some of the primordial models of linear type theory are built from exactly this kind of structure: finite-dimensional vector spaces and linear transformations (a model of the multiplicative-additive fragment of classical linear logic, and indeed of the exponentials too when the underlying field is finite), and Banach spaces and short maps (a simple model of intuitionistic linear logic including the full exponential).

As a result, we should expect to be able to find a way of making differentiation make sense within linear type theory, and since a type theory is the initial model of a class of models, derivative-like constructions are something we should look for any time linearity arises.

The pure form of this recipe can be found in Ehrhard and Regnier's differential linear logic. One of the really cool features of this line of work is that derivatives arise as an extension of the usual structural rules for the exponentials -- you just adjust how variables are treated, and derivatives appear like magic. One fact about this line of work is that it has tended to focus on proof nets rather than sequent calculus, which made it less accessible than one would like. What solved this issue for me was Marie Kerjean's A Logical Account for Linear Partial Differential Equations.

I do want to admit that despite starting off this section saying you shouldn't be shocked, actually I am -- this stuff really seems like sorcery to me. So even though I believe intellectually that everything should work out like this, I'm still astounded that it does. (It's rather like the feeling I get whenever I run a program I did a machine-checked correctness proof of.)

5. Automatic Differentiation

These days, the most famous kind of differentiation in language theory is, of course, automatic differentiation, which takes a program and rewrites it to produce a new program computing the derivative of the original program. Automatic differentiation comes in two main varieties, called forward mode and reverse mode.

The idea of forward mode is incredibly simple -- you just treat real numbers as an abstract datatype, and replace your original real number implementation with a new one based on dual numbers. Intuitively, a dual number is a pair \(a + b \epsilon\), where \(\epsilon\) is just a formal infinitesimal which is "nilpotent" (i.e., \(\epsilon^2 = 0\)).

Then the data abstraction properties of the typed lambda calculus ensure that when you run the program with the dual number implementation of reals, you are returned a pair of the original value and its partial derivatives (i.e. the Jacobian matrix). Dan Piponi blogged about this over 15 years ago(!), but even though I read it at the time, the algorithm was so simple I failed to recognise its importance!
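Here is roughly what that looks like in code -- a minimal OCaml sketch of dual-number forward mode (the module and function names are my own, and only addition and multiplication are shown):

```ocaml
(* Forward-mode AD with dual numbers: a + b*eps carries a value and its derivative. *)
module Dual = struct
  type t = { v : float; d : float }   (* value and derivative *)
  let const a = { v = a; d = 0. }     (* constants have derivative 0 *)
  let var   a = { v = a; d = 1. }     (* the variable we differentiate with respect to *)
  let ( + ) x y = { v = x.v +. y.v; d = x.d +. y.d }
  let ( * ) x y = { v = x.v *. y.v; d = x.d *. y.v +. x.v *. y.d }
end

(* f(x) = x^2 + 3x, written against the abstract "real" type. *)
let f x = Dual.(x * x + const 3. * x)

(* One evaluation at x = 2 returns both f(2) = 10 and f'(2) = 7. *)
let () =
  let open Dual in
  let r = f (var 2.) in
  assert (r.v = 10. && r.d = 7.)
```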

Reverse mode is more complicated -- it comes from the observation that in machine learning, you are generally optimising a function \(R^n \to R\), where the result is some scoring function. Since \(n\) might be very large and the dimensionality of the result is just 1, you can achieve superior computational complexity by computing the transpose of the Jacobian rather than the Jacobian itself.
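Spelled out, the counting argument is the standard one (nothing here is specific to any particular system): for \(f : R^n \to R\), one forward-mode pass computes a single directional derivative, i.e. one column of the Jacobian, so recovering the whole gradient takes \(n\) passes; one reverse-mode pass computes a vector-Jacobian product, and since the output is one-dimensional, a single pass with seed \(1\) already yields the whole gradient

\[ \nabla f(x) \;=\; J_f(x)^{\top} \cdot 1, \]

with each pass costing a constant factor times the cost of running \(f\) itself.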

When you do so, the resulting syntactic program transformations all look like they are doing fancy continuation manipulations, and their semantic counterparts all look like manipulations of the dual space. So the syntactic continuation/double-negation operations reflect the fact that finite-dimensional vector spaces are isomorphic to their double duals (i.e., \(V^{\ast\ast} \simeq V\)).

In addition to being really neat, this is actually a really nice problem to look at. People basically care about reverse mode AD because it goes fast, which means you get this lovely interplay between operational concerns and semantic models. A second bonus is that AD offers computer scientists a good reason to learn some differential geometry, which in turn is a good excuse to learn general relativity.

6? Incremental Computation

After all this talk of differentiation, you might wonder -- what about finite differences?

It turns out that they have a nice interpretation in lambda calculus, too! In 2015, Yufei Cai, Paolo Giarusso, Tillman Rendel and Klaus Ostermann introduced the incremental lambda calculus.

The basic idea behind this is that for each type \(X\), you introduce a type of "changes" \(\Delta X\) for it too. A morphism between \((X, \Delta X)\) and \((Y, \Delta Y)\) is a pair of functions \(f : X \to Y\) and \(\mathit{df} : X \to \Delta X \to \Delta Y\), where \(\mathit{df}\) is the incrementalisation -- \(\mathit{df}\,x\,\mathit{dx}\) tells you how to change the output \(f\,x\) to account for the change \(\mathit{dx}\) to the input.
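As a tiny illustration (my own encoding, not the one from the paper), here is such a pair for squaring integers, where a change to an integer is just an additive delta:

```ocaml
(* A map from (x, dx) to (y, dy): a function together with its incrementalisation.
   df x dx says how the output changes when the input x is changed by dx. *)
type ('x, 'dx, 'y, 'dy) inc_map =
  { f  : 'x -> 'y;
    df : 'x -> 'dx -> 'dy }

let square : (int, int, int, int) inc_map =
  { f  = (fun x -> x * x);
    (* (x + dx)^2 - x^2 = 2*x*dx + dx*dx *)
    df = (fun x dx -> 2 * x * dx + dx * dx) }

(* Instead of recomputing f (x + dx), patch the old output with the change. *)
let () =
  let x, dx = 10, 3 in
  assert (square.f (x + dx) = square.f x + square.df x dx)
```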

This looks a lot like what you get out of the tangent space functor, but it turns out that the incremental lambda calculus is not simply a derivative a la the differential lambda calculus, because the derivative of a function is unique, and these incrementalisations don't have to be. Michael Arntzenius and I used the ILC in our paper on extending seminaive evaluation from Datalog to a higher-order language, and it was critical for the efficiency of our result that we had this freedom.

Mario Alvarez-Picallo worked out answers to a lot of the semantic questions underlying this in his PhD thesis, Change actions: from incremental computation to discrete derivatives, showing how the incremental lambda calculus arises as a generalisation of the models of the differential lambda calculus.

He noted that the "real" semantics for change types should look like a category, in which a type is a category where the objects are values, and a morphism \(\delta : v_1 \to v_2\) is evidence that \(v_1\) can be changed into \(v_2\). On the grounds that 2-categories are painful, Mario worked with change monoids (i.e., no typing, but yes to identities and composition), and the Giarusso-style change structures that Michael and I used retained the typed changes but discarded all the categorical axioms. So there's more to be done here!