If you read about programming languages on the Internet, you'll inevitably run into (or even participate in) long arguments about whether dynamic or static typing is better. A long time ago, before I went to grad school, I enjoyed arguing with people on the Internet about this very topic.
But I no longer find myself interested in that debate. This is for two reasons. One, like so many addicts before me, I've moved on to harder drugs, such as frequentism versus Bayesianism. (Frequentism is good, actually, and p-values are one of the great inventions of science. Just so you know.)
Two, I learned that these debates largely end up being echoes of an argument Dana Scott had with himself. You can tell he's a genius, since he ended up producing many of the strongest arguments for and many of the strongest arguments against!
One of the best arguments against grounding semantics on the untyped lambda calculus is Dana Scott's unpublished paper from 1969, A type-theoretical alternative to ISWIM, CUCH, OWHY. In this paper, he argues that untyped syntactic models are difficult to reason about, and we should design meta-languages with a clear mathematical status, in order to facilitate reasoning about languages and programs.
But why is it unpublished? Let's see what Scott himself has to say:
In particular, it was recognized that there were posets whose continuous function spaces of self-maps could be identified with the given posets themselves. And so there actually were “traditional” models of lambda-calculus that could be defined without first going through the proof theory of the formal system itself (and which could be related to many other mathematically meaningful structures and theories as well).
This revelation was both gratifying and embarrassing. After writing with tiresome sarcasm about the lack of meaning in the type-free lambda-calculus introduced only formally, the author himself found an interesting kind of semantical interpretation for the “type-free” language.
The result that a universal domain can be constructed from these lattice-theoretic models of the lambda calculus is described in his epochal 1971 paper Continuous Lattices.
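To get a feel for how strange and wonderful this is, here is a rough illustration (not Scott's actual construction) in Haskell, whose types denote pointed CPOs: we can literally write down a type that is isomorphic to its own space of self-maps, and those two maps are enough to interpret untyped lambda terms. The names D, lam, app and omega below are mine, purely for illustration:

```haskell
-- Not Scott's construction, just an illustration: a Haskell type isomorphic
-- to its own space of self-maps, D ~ (D -> D).
newtype D = D { unD :: D -> D }

-- These two maps are enough to interpret any untyped lambda term.
lam :: (D -> D) -> D
lam = D

app :: D -> D -> D
app = unD

-- The classic looping term (\x. x x) (\x. x x) now typechecks;
-- forcing it diverges, i.e. it denotes the bottom element of D.
omega :: D
omega = app delta delta
  where delta = lam (\x -> app x x)
```

Forcing omega just loops, which is exactly the "no answer" reading of nontermination that comes up again below.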
Indeed, Scott's results continue to have a lot of influence today. This goes well beyond being a technical tool in the toolbox: the sensibility and style of thinking he invented continue to pervade our research culture and shape the latest methods.
The invention of the universal domain taught us that an inconsistent logic can still have a meaningful semantics. Prior to this work, it was believed that inconsistent logics were degenerate (due to everything following from false), but the D∞ model disproves that — it dramatically expands the kinds of arguments that we can model mathematically.
This is incredibly philosophically important, since you can't use mathematics to study something you don't have a mathematical model of! And since as mathematicians we spend a lot of our time making wrong arguments, it's very valuable to be able to understand them.
(As an unrelated aside, this is one of the things that I as a proof theorist most admire about finite model theory – things like conflict-driven clause learning in SAT solvers exploit the idea that being wrong can be productive in a really elegant way.)
Having a universal domain makes it possible to formalise, in a mathematically coherent way, the intuition that a type is a subset of the set of all possible values. Naively, this idea runs into trouble (the "set of all values" would have to contain its own function space), but with a universal domain à la Scott the approach works ("types as retracts" in the jargon).
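Concretely, here is a minimal sketch of what "types as retracts" means: fix one universal type U of values, and represent a type by a pair of maps into and out of U whose round trip is the identity. The toy U, Retract, rInt and rFun below are my own illustration (with `error` standing in for the error/bottom element), not definitions from Scott:

```haskell
-- A toy universal type of values; none of these names are Scott's.
data U = UNum Integer
       | UFun (U -> U)

-- A type, represented as a retract of U: project . embed = id.
data Retract a = Retract { embed :: a -> U, project :: U -> a }

-- The integers as a retract of U.
rInt :: Retract Integer
rInt = Retract UNum proj
  where proj (UNum n) = n
        proj _        = error "not a number"

-- Function types are retracts too, built from retracts for the
-- argument and result types.
rFun :: Retract a -> Retract b -> Retract (a -> b)
rFun ra rb = Retract emb proj
  where emb f         = UFun (embed rb . f . project ra)
        proj (UFun g) = project rb . g . embed ra
        proj _        = error "not a function"

-- The retraction equation in action: project rInt (embed rInt 42) == 42.
```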
This approach underpins program logics like Hoare logic (see Melliès and Zeilberger's paper Functors are Type Refinement Systems), is used to define the semantics of dependent type theories like Nuprl, and is also important for formulating the semantics of System F, the polymorphic lambda calculus.
Because the universal domain is constructed as the limit of a sequence of finitary approximations, you can derive the abstract domains of abstract interpretation by taking Scott's inverse limit construction and cutting the sequence off at a finite point rather than going all the way to infinity. This is important for understanding how static analysis fits into the landscape of semantics -- see David Schmidt's paper Abstract Interpretation from a Denotational Semantics Perspective.
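To give the flavour of this, here is a minimal sketch of a finite abstract domain -- the classic rule of signs -- for a tiny expression language. It shows the general idea of interpreting the same syntax over a small, finite approximation of the integers; it is not Schmidt's inverse-limit derivation, and all the names are mine:

```haskell
-- A tiny expression language, interpreted concretely and abstractly.
data Expr = Lit Integer | Neg Expr | Add Expr Expr | Mul Expr Expr

-- Concrete semantics over the integers.
eval :: Expr -> Integer
eval (Lit n)     = n
eval (Neg e)     = negate (eval e)
eval (Add e1 e2) = eval e1 + eval e2
eval (Mul e1 e2) = eval e1 * eval e2

-- A finite abstract domain: the sign of a number, plus a "don't know" element.
data Sign = Negative | Zero | Positive | AnySign deriving (Eq, Show)

alpha :: Integer -> Sign            -- abstraction of a concrete value
alpha n | n < 0     = Negative
        | n == 0    = Zero
        | otherwise = Positive

-- Abstract semantics: same structure as eval, but over Sign.
evalA :: Expr -> Sign
evalA (Lit n)     = alpha n
evalA (Neg e)     = negS (evalA e)
evalA (Add e1 e2) = addS (evalA e1) (evalA e2)
evalA (Mul e1 e2) = mulS (evalA e1) (evalA e2)

negS :: Sign -> Sign
negS Negative = Positive
negS Positive = Negative
negS s        = s

addS :: Sign -> Sign -> Sign
addS Zero s            = s
addS s    Zero         = s
addS Positive Positive = Positive
addS Negative Negative = Negative
addS _        _        = AnySign    -- mixed signs: precision is lost

mulS :: Sign -> Sign -> Sign
mulS Zero _    = Zero
mulS _    Zero = Zero
mulS AnySign _ = AnySign
mulS _ AnySign = AnySign
mulS s    t    = if s == t then Positive else Negative

-- Soundness: alpha (eval e) is approximated by evalA e,
-- where AnySign approximates every sign.
```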
In the domain-theoretic model, the idea of finitary information is realised by modelling a program as either giving you an answer or going into an infinite loop. If you change the idea of no-answer from looping to signalling an error (say, with an exception), then you can similarly derive gradual typing!
Max S. New and Amal Ahmed's Graduality from Embedding-projection Pairs really nails this idea down, but it has been one of the driving intuitions since the very beginning of work on the topic. Fritz Henglein's original paper explicitly formulates the rules for casts in analogy with the embedding-projection pairs found in the inverse limit construction.
There's an important lesson here about how analogies can be productive for a long time before we give a completely formal expression of them -- Henglein wrote his paper in 1994!
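Here is a minimal sketch of that analogy: a dynamic type Dyn, together with embedding-projection pairs in which the upcast into Dyn never fails and the downcast out of Dyn may signal a cast error. The names below (Dyn, EP, CastError, epFun) are my own toy versions, not the formal development in New and Ahmed's paper:

```haskell
-- Casts as embedding-projection pairs, with errors instead of looping.
data Dyn = DInt Integer
         | DBool Bool
         | DFun (Dyn -> Either CastError Dyn)

newtype CastError = CastError String

-- An ep-pair for a static type a: upcast never fails, downcast may.
data EP a = EP { upcast   :: a -> Dyn
               , downcast :: Dyn -> Either CastError a }

epInt :: EP Integer
epInt = EP DInt down
  where down (DInt n) = Right n
        down _        = Left (CastError "expected an Int")

epBool :: EP Bool
epBool = EP DBool down
  where down (DBool b) = Right b
        down _         = Left (CastError "expected a Bool")

-- Function casts are built contravariantly from casts on the pieces --
-- exactly the shape of the ep-pairs in the inverse limit construction.
epFun :: EP a -> EP b -> EP (a -> Either CastError b)
epFun ea eb = EP up down
  where up f          = DFun (\d -> downcast ea d >>= f >>= (Right . upcast eb))
        down (DFun g) = Right (\a -> g (upcast ea a) >>= downcast eb)
        down _        = Left (CastError "expected a function")
```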
Since Dana Scott has very good taste, his emphasis was on the algebraic properties of his model construction, and so he quickly moved from complete lattices to domain theory.
But this also meant that he laid out the path for people to look at other ways of obtaining universal domains. One super-important approach was pioneered by Maurice Nivat and the "Dutch school" of denotational semantics in the late 1970s and early 1980s, where instead of domains people looked at metric spaces and Banach's fixed-point theorem. De Bakker and Zucker's 1982 paper is a still-readable account of the early thinking on this topic, and the approach reached a mature form with the work of America and Rutten.
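"Banach's theorem" here is the fixed-point theorem for contractions on complete metric spaces: iterate a contraction from anywhere and you converge to its unique fixed point. As a toy reminder of the theorem itself (with no semantics attached; the names are mine), here is the iteration:

```haskell
-- A toy instance of the Banach fixed-point theorem: iterate a contraction
-- until the answer stops moving.
fixpoint :: (Double -> Double) -> Double -> Double
fixpoint f x
  | abs (fx - x) < 1e-12 = fx
  | otherwise            = fixpoint f fx
  where fx = f x

-- cos is a contraction near its fixed point, so iteration from anywhere
-- converges to the same answer (roughly 0.739085).
dottie :: Double
dottie = fixpoint cos 0.0
```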
This technique is still heavily used today, except that for various reasons we have renamed it "step-indexing". This is the key technical mechanism that lets us apply universal domain constructions to give a good semantics to assembly language, because the ability to pun data and code by setting the PC and jumping needs the same mathematical machinery as interpreting the untyped lambda calculus.
This point about punning code and data in assembly language holds more generally. For example, Löb's theorem, which can be understood as a modal variant of the derivation of the Y-combinator, illustrates how the interaction of self-reference and quotation resembles the ideas behind the D∞-style structure. This idea hasn't been worked out as nicely as New and Ahmed worked things out for gradual types, but what's a blog post without a provocation to future research?
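There is a well-known bit of Haskell folklore (due to Dan Piponi, I believe) that reads the type of Löb's theorem, □(□A → A) → □A, with the box as a functor; the inhabitant you get ties a recursive knot, Y-combinator style. This is only a suggestive sketch, not the modal story worked out:

```haskell
-- Haskell folklore: Löb's theorem with the box read as a Functor f.
loeb :: Functor f => f (f a -> a) -> f a
loeb fs = xs
  where xs = fmap ($ xs) fs

-- A tiny "spreadsheet": every cell is a function of the whole sheet.
sheet :: [[Double] -> Double]
sheet = [ const 1                   -- cell 0: the constant 1
        , \s -> s !! 0 + 1          -- cell 1: cell 0 plus one
        , \s -> s !! 0 + s !! 1 ]   -- cell 2: the sum of cells 0 and 1

-- loeb sheet == [1.0, 2.0, 3.0]
```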
Thinking of "typed versus untyped languages" sort of misses the point, theoretically. We can neither pull them apart, nor do we really want to. To quote Heraclitus (as translated by Kathleen Freeman):
They do not understand how that which differs with itself is in agreement: harmony consists of opposing tension, like that of the bow and the lyre.