Monday, October 20, 2014

Focusing is not Call-by-Push-Value

Ever since I learned about them, I've thought of call-by-push-value and focusing (aka polarization) as essentially two different views of the same problem: they both give a fine-grained decomposition of higher-order effectful programs which permits preserving the full βη-theory of the language.

Until this morning, I had thought that the differences were merely cosmetic, with CBPV arising from Paul Levy's analysis of the relationship between denotational semantics and operational semantics, and focusing arising an analysis of the relationship between operational semantics and proof theory (a lot of people have looked at this, but I learned about it from Noam Zeilberger). Both systems decompose a Moggi-style computational monad into a pair of adjoint operators, which mediate between values and computations (in CBPV) and positive and negative types (in focusing). So I thought this meant that “value type” and “positive type” were synonyms, as were “computation type” and “negative type”.

This morning, I realized I was wrong! Focusing and call-by-push-value make precisely the opposite choices in their treatment of variables! To understand this point, let's first recall the syntax of types for a call-by-push-value (on top) and a polarized (on bottom) calculus.

\begin{mathpar}
\begin{array}{llcl}
\mbox{Value Types} &
X,Y,Z & ::= & \Val{A} \bnfalt 0 \bnfalt X + Y \bnfalt 1 \bnfalt X \times Y \\
\mbox{Computation Types} & 
A,B,C & ::= & \F{X} \bnfalt X \to A \bnfalt \top \bnfalt X \With Y \\[1em]

\mbox{Positive Types} & 
P,Q & ::= & \Down{N} \bnfalt 0 \bnfalt P + Q \bnfalt 1 \bnfalt P \times Q \\
\mbox{Computation Types} & 
M,N & ::= & \Up{P} \bnfalt P \to N \bnfalt \top \bnfalt M \With N \\
\end{array}
\end{mathpar}

At first glance, these two grammars look identical, save only for the renamings $\Val{-} \iff \Down{-}$ and $\F{-} \iff \Up{-}$. But this is misleading! If they are actually the same idea, the reason has to be much more subtle. The reason for this is that the typing judgements for these two systems are actually quite different.

In call-by-push-value, the idea is that $\Val{A}$ is a functor which is left adjoint to $\F{X}$. As a result, values are interpreted in a category of values $\ValueOp$, and computations are interpreted in a category of computations $\CompOp$. The adjunction between values and computations means that the hom-set $\VHom{X}{\Val{A}}$ is ismorphic to the hom-set $\CHom{\F{X}}{A}$. This adjunction gives rise to the two basic judgement forms of call-by-push-value, the value judgement $\judge{\Gamma}{v}{X}$ and the computation judgement $\judgec{\Gamma}{t}{A}$. The idea is that $\interp{\judgev{\Gamma}{v}{X}} \in \VHom{\Gamma}{X}$ and $\interp{\judgec{\Gamma}{t}{A}} \in \CHom{\F{\Gamma}}{A}$.

The key bit is in the interpretation of contexts in computations, so let me highlight that:

$\interp{\judgec{\Gamma}{t}{A}} \in \CHom{\F{\Gamma}}{A}$

Note that we interpret contexts as $\F{\Gamma}$, and so this says that variables refer to values.

However, in a polarized type theory, we observe that positive types are “left-invertible”, and negative types are “right-invertible”. In proof theory, a rule is invertibile when the conclusion implies the premise. For example, the right rule for implication introduction in intuitionistic logic reads

\begin{mathpar}
\inferrule*[]
          {\judgend{\Gamma, S}{T}}
          {\judgend{\Gamma}{S \to T}}
\end{mathpar}

This is invertible because you can prove, as a theorem, that

\begin{mathpar}
\inferrule*[]
          {\judgend{\Gamma}{S \to T}}
          {\judgend{\Gamma, S}{T}}
\end{mathpar}

is an admissible rule of the system. Similarly, sums have a left rule:

\begin{mathpar}
\inferrule*[]
          {\judgend{\Gamma, S}{Q} \\ \judgend{\Gamma, T}{Q}}
          {\judgend{\Gamma, S + T}{Q}}
\end{mathpar}

such that the following two rules are admissible:

\begin{mathpar}
\inferrule*[]
          {\judgend{\Gamma, S + T}{Q}}
          {\judgend{\Gamma, S}{Q}}

\and

\inferrule*[]
          {\judgend{\Gamma, S + T}{Q}}
          {\judgend{\Gamma, T}{Q}}
\end{mathpar}

The key idea behind polarization is that one should specify the calculus modulo the invertible rules. That is, the judgement on the right should fundamentally be a judgement that a term has a positive type, and the hypotheses in the context should be negative. That is, the two primary judgements of a polarized system are the positive introduction judgement

$\judge{\Gamma}{v}{P}$

which explains how introductions for positive types work, and the negative elimination (or spine judgement)

$\spine{\Gamma}{s}{N}{P}$

which explains how eliminations for negative types work. The eliminations for positive types are derived and the introductions for negative types are derived judgements (which end up being rules for pattern matching and lambda-abstractions) which make cut-elimination hold, plus a few book-keeping rules to hook these two judgements together. The critical point is that the grammar for $\Gamma$ consists of negative types:

$\Gamma ::= \cdot \bnfalt \Gamma, x:N$

This is because positive types are (by definition) left-invertible, and so there is no reason to permit them to appear as hypotheses. As a result, the context clearly has a very different character than in call-by-push-value.

I don't have a punchline for this post, in the sense of “and therefore the following weird things happen as a consequence”, but I would be astonished if there weren't some interesting consequences! Both focalization and call-by-push-value teach us that it pays large dividends to pay attention to the fine structure of computation, and it's really surprising that they are apparently not looking at the same fine structure, despite apparently arising from the same dichotomy at the type level.

14 comments:

  1. I believe you can account for this difference - and see focusing as an enrichment of CPBV, iv anything - by generalizing your view of what focusing is to the account I finally worked out in Structural Focalization. That account allows hypothetical contexts with hypotheses x:N which are discharged by cut admissibility and hypotheses z:<P> which are discharged by regular-old-bog-standard-substitution (of values for hypotheses, no less!). There's an interesting observation which I don't understand the significance of, that cut admissibility only works in the presence of atomic positive hypotheses, but if give yourself freedom to halt pattern matching for positive values (the admissibility of which turns out to be an identity principle rule in structural focalization).

    I originally came about this observation not because of the differences between CBPV and the usual (Laurent-Zeilberger-Liang-Miller) account of focusing. Rather, it grew out of worry about the treatment of atomic propositions in focusing - you're forced to extend the grammar of focusing to allow x:N and z:p+. And out of three years that I spent trying to understand ~50 lines of Twelf code Frank wrote one night.

    ReplyDelete
    Replies
    1. I didn't finish my thought... if you give yourself freedom to halt pattern matching for positive values, you can make terms in the focused system look (and, I believe, behave) just like CPBV terms.

      I also meant to add in the end: there's much to the distinctions you've noticed in terms of programming that hasn't been sufficiently explored at all! But I think Structural Focalization makes a good bit of headway on the proof theory side of the problem by giving an account of hypothetical values and their discharge.

      Delete
    2. Yeah, I was going to make a similar comment, that there is a concept of "weak focusing" (where the context is allowed to contain invertible hypotheses) which is a refinement of Andreoli's focusing, and that this is explained in your paper which Neel linked to. (I wouldn't go so far as to call this an "enrichment" of CBPV, because I think Paul Levy takes an equally broad view of what "CBPV" means.)


      I would also echo the sentiment that these distinctions should be further explored. From my perspective, one of the reasons why it is difficult to communicate in this area is that all of these distinctions (unfocused/focused/weakly-focused/etc) collapse in the traditional categorical models used to assign denotations to proofs, so we need new kinds of models.

      Delete
    3. Didn't see that Neel was linking to me already. Headdesk, red-face, etc.

      I've trained myself out of thinking about weak focusing, because what Frank and I call weak focusing was designed to avoid a problem that Structural Focalization resolves gracefully. Thanks for bringing it up here, Noam, it seems quite relevant, since you naturally end up with a context of strictly positive propositions.

      And your point is well taken about Levy taking an equally expansive view of CPBV. That was a graceless comment on my part.

      Delete
  2. This comment has been removed by the author.

    ReplyDelete
  3. Yeah... I *think* I agree with Rob and Noam here. To try to put it succinctly, isn't the story here just that CBPV is "concentrating and positive-one-step-inverting" whereas Andreoli is "fully concentrating and inverting"? (and ordinary natural deduction is "concentrating on negatives and positive-one-step-inverting"?

    ReplyDelete
  4. Saying "recall the syntax of types" is a little confusing: I don't recall seeing any presentation of CBPV with a connective called ∧. Levy (1999) has a "Π", which is arguably an n-ary intersection with explicit intro and elim constructs, so I guess that's what you mean?

    (This doesn't affect the rest of your post, since it mentions neither ∧/Π (CBPV) nor ∧ (focusing)…)

    ReplyDelete
  5. I suspect the observation that contexts in CPBV are values may be rather shallowly based on the particular (common) choice of presentation. For example, you asked that arrow have a positive type on the left, but is that an essential choice, or couldn't we just as well work with a CPBV-style system taking a negative on the left of the arrow? When I played with this design space (in the context of definition of realizability truth and falsity values for simply-typed-lambda-calculus), I had the impression that either design choices were possible and corresponded to different flavors of this familiar connective -- just as focused variants of . If all connectives require positives in their negative occurences, then a representation with only values in the context is possible, but as soon as one of them takes a computation this property (coincidence) fades away.

    I've been trying recently to work out the relation between Guillaume Munch-Maccagnoni's polarized presentation of System L and focusing for sequent calculi. Guillaume insists on studying the untyped calculus first, and that is an interesting (and unusual to me) experience; (untyped) normal forms have a phase-alternating structure that seems strongly related to focusing, but one cannot enforce what Jason calls "full concentration" above without type information, so that would be another form of weak focusing.

    ReplyDelete
    Replies
    1. CBPV is less about restricting the arrow to take a positive on the left than saying that it is indeed possible to have a negative on the left provided that it is mediated by U. In other words negative computations can be passed as values too. This is probably what happens in the realizability semantics you are thinking about.

      Reacting to the blog post, it would probably help to relate focusing, weak focusing, CBPV, etc. if we had a calculus X of which we could say, in a formal way, "CBPV is an indirect denotational semantics of X", "(strong) focusing is the study of (eta-long and) beta-normal forms of X"... Then there probably would be no surprise about both the similarity and the differences between focusing and CBPV.

      Delete
  6. Since I'm interested in writing elegant compilers, I'm curious on how this line of work can be used to improve on CPS. I've recently read "A dissection of L" and talked with its author Arnaud Spiwack; he argues that System L makes for a good compiler intermediate language, and I believe the reason is similar to why CPS is a good language (sometimes better than ANF, see "Compiling with Continuations, continued") — apparently, System L is (secretly?) "the language of CPS" (I hear that's a quote from Guillaume Munch). I've recently run into advantages of a variant of CBPV over ANF, so I might just use both together.

    Now I'd like to see whether CBPV relates to polarized System L like ANF relates to CPS.

    ReplyDelete
    Replies
    1. Dear Paolo, although I don't recognize this precise quote from my publications, I have indeed studied how CPS translations factor through abstract-machine-like calculi ("L" calculi) with polarities. This is not secret---I'll be happy to provide you with more detailed references in a private message---you can find it in my PhD thesis and my articles, along with other uses of abstract-machine-like calculi with polarities, as intermediate representations, not for compilers yet, but applied so far to logic and category theory, as well as programming languages (in Chapter III and to a certain extent Chapter IV).

      Moreover, in an upcoming article with Gabriel Scherer (which I expect won't remain "secret" for very long) we argue that abstract-machine-like calculi offer a principled reconstruction in direct style of CPS and defunctionalisation (you can read already Chapter I of my thesis for this perspective). This can be seen in the continuity of Danvy et al.'s works that relate abstract machines to CPS and defunctionalisation, and Hofmann and Streicher's work before them.

      In our article, a polarised abstract-machine-like calculus is used to model sums elegantly. The link with CBPV is meant to be that of a direct-style language for CBPV models---a precise correspondence is the subject of ongoing work (see already the Chapter II from my thesis).

      I hope this helps, and can only encourage you to integrate abstract-machine-like calculi in your investigations.

      Delete
  7. Hi Neel, any update on these thoughts on focusing/polarisation and CBPV? To be honest, I couldn't follow your entire argument. I did notice that back in 2012, Robert Harper made a comment that he and Paul though that focusing and CPBV were "essentially" the same thing.

    https://existentialtype.wordpress.com/2012/08/25/polarity-in-type-theory/#comment-1047

    ReplyDelete
    Replies
    1. Hi Steven, as an update to my comments above, I can now mention two relevant articles, [1] with Gabriel Scherer and [2] with Marcelo Fiore and Pierre-Louis Curien. The point of view of these articles is that CBPV and intuitionistic focusing are indeed based on the same model of higher-order computation with evaluation order.

      But claiming “Focusing is not Call-by-Push-Value” appeared to me a good reaction to the idea that they are the same. In fact, I would go even further than Neel. Although the two formalisms show striking similarities, they arise from very different perspectives (the study of normal forms for focusing, and monadic-style models for CBPV). Clarifying their resemblance and differences using formal comparisons, when they are already very similar formally, is missing the point. One cannot be satisfied by an explanation of CBPV as a form of weak focusing.

      In [1,2], we aim for a unified representation of the model of computation behind CBPV and intuitionistic focusing. But for this we need to be careful about further crucial distinctions. The most important is that polarities do not need to coincide with the focusing properties, but they do coincide with evaluation orders. In this approach, the structure of CBPV and (weak or strong) focusing can be recovered as various ways of eliminating binders in terms (corresponding to some monadic style, and to reduction to some normal forms). Thus, in line with Noam's comment, it means that one has to also introduce distinctions, between terms, that are traditionally collapsed, suggesting that we need new kinds of models, as he writes. In particular, I believe that everything in [1,2] should (linearity aside) fit within a theory of bi-cartesian closed (bi-)duploids [3] where “bi-cartesian closed” is currently being worked on, and the second “bi-” has yet to be investigated, but is strongly suggested by the syntax.

      [1] Polarised Intermediate Representation of Lambda Calculus with Sums, Proc. LICS 2015.

      [2] A Theory of Effects and Resources: Adjunction Models and Polarised Calculi, Proc. POPL 2016.

      [3] Models of a Non-Associative Composition, Proc FoSSaCS 2014.

      Delete
  8. Neel, should this post say instead that U is *right adjoint* to F?

    ReplyDelete