(This is an extended blog post version of my talk “Ultraproducts as a Bridge Between Discrete and Continuous Analysis” that I gave at the Simons institute for the theory of computing at the workshop “Neo-Classical methods in discrete analysis“. Some of the material here is drawn from previous blog posts, notably “Ultraproducts as a bridge between hard analysis and soft analysis” and “Ultralimit analysis and quantitative algebraic geometry“‘. The text here has substantially more details than the talk; one may wish to skip all of the proofs given here to obtain a closer approximation to the original talk.)
Discrete analysis, of course, is primarily interested in the study of discrete (or “finitary”) mathematical objects: integers, rational numbers (which can be viewed as ratios of integers), finite sets, finite graphs, finite or discrete metric spaces, and so forth. However, many powerful tools in mathematics (e.g. ergodic theory, measure theory, topological group theory, algebraic geometry, spectral theory, etc.) work best when applied to continuous (or “infinitary”) mathematical objects: real or complex numbers, manifolds, algebraic varieties, continuous topological or metric spaces, etc. In order to apply results and ideas from continuous mathematics to discrete settings, there are basically two approaches. One is to directly discretise the arguments used in continuous mathematics, which often requires one to keep careful track of all the bounds on various quantities of interest, particularly with regard to various error terms arising from discretisation which would otherwise have been negligible in the continuous setting. The other is to construct continuous objects as limits of sequences of discrete objects of interest, so that results from continuous mathematics may be applied (often as a “black box”) to the continuous limit, which then can be used to deduce consequences for the original discrete objects which are quantitative (though often ineffectively so). The latter approach is the focus of this current talk.
The following table gives some examples of a discrete theory and its continuous counterpart, together with a limiting procedure that might be used to pass from the former to the latter:
|Ramsey theory||Topological dynamics||Compactness|
|Density Ramsey theory||Ergodic theory||Furstenberg correspondence principle|
|Graph/hypergraph regularity||Measure theory||Graph limits|
|Polynomial regularity||Linear algebra||Ultralimits|
|Structural decompositions||Hilbert space geometry||Ultralimits|
|Fourier analysis||Spectral theory||Direct and inverse limits|
|Quantitative algebraic geometry||Algebraic geometry||Schemes|
|Discrete metric spaces||Continuous metric spaces||Gromov-Hausdorff limits|
|Approximate group theory||Topological group theory||Model theory|
As the above table illustrates, there are a variety of different ways to form a limiting continuous object. Roughly speaking, one can divide limits into three categories:
- Topological and metric limits. These notions of limits are commonly used by analysts. Here, one starts with a sequence (or perhaps a net) of objects in a common space , which one then endows with the structure of a topological space or a metric space, by defining a notion of distance between two points of the space, or a notion of open neighbourhoods or open sets in the space. Provided that the sequence or net is convergent, this produces a limit object , which remains in the same space, and is “close” to many of the original objects with respect to the given metric or topology.
- Categorical limits. These notions of limits are commonly used by algebraists. Here, one starts with a sequence (or more generally, a diagram) of objects in a category , which are connected to each other by various morphisms. If the ambient category is well-behaved, one can then form the direct limit or the inverse limit of these objects, which is another object in the same category , and is connected to the original objects by various morphisms.
- Logical limits. These notions of limits are commonly used by model theorists. Here, one starts with a sequence of objects or of spaces , each of which is (a component of) a model for given (first-order) mathematical language (e.g. if one is working in the language of groups, might be groups and might be elements of these groups). By using devices such as the ultraproduct construction, or the compactness theorem in logic, one can then create a new object or a new space , which is still a model of the same language (e.g. if the spaces were all groups, then the limiting space will also be a group), and is “close” to the original objects or spaces in the sense that any assertion (in the given language) that is true for the limiting object or space, will also be true for many of the original objects or spaces, and conversely. (For instance, if is an abelian group, then the will also be abelian groups for many .)
The purpose of this talk is to highlight the third type of limit, and specifically the ultraproduct construction, as being a “universal” limiting procedure that can be used to replace most of the limits previously mentioned. Unlike the topological or metric limits, one does not need the original objects to all lie in a common space in order to form an ultralimit ; they are permitted to lie in different spaces ; this is more natural in many discrete contexts, e.g. when considering graphs on vertices in the limit when goes to infinity. Also, no convergence properties on the are required in order for the ultralimit to exist. Similarly, ultraproduct limits differ from categorical limits in that no morphisms between the various spaces involved are required in order to construct the ultraproduct.
With so few requirements on the objects or spaces , the ultraproduct construction is necessarily a very “soft” one. Nevertheless, the construction has two very useful properties which make it particularly useful for the purpose of extracting good continuous limit objects out of a sequence of discrete objects. First of all, there is Łos’s theorem, which roughly speaking asserts that any first-order sentence which is asymptotically obeyed by the , will be exactly obeyed by the limit object ; in particular, one can often take a discrete sequence of “partial counterexamples” to some assertion, and produce a continuous “complete counterexample” that same assertion via an ultraproduct construction; taking the contrapositives, one can often then establish a rigorous equivalence between a quantitative discrete statement and its qualitative continuous counterpart. Secondly, there is the countable saturation property that ultraproducts automatically enjoy, which is a property closely analogous to that of compactness in topological spaces, and can often be used to ensure that the continuous objects produced by ultraproduct methods are “complete” or “compact” in various senses, which is particularly useful in being able to upgrade qualitative (or “pointwise”) bounds to quantitative (or “uniform”) bounds, more or less “for free”, thus reducing significantly the burden of “epsilon management” (although the price one pays for this is that one needs to pay attention to which mathematical objects of study are “standard” and which are “nonstandard”). To achieve this compactness or completeness, one sometimes has to restrict to the “bounded” portion of the ultraproduct, and it is often also convenient to quotient out the “infinitesimal” portion in order to complement these compactness properties with a matching “Hausdorff” property, thus creating familiar examples of continuous spaces, such as locally compact Hausdorff spaces.
Ultraproducts are not the only logical limit in the model theorist’s toolbox, but they are one of the simplest to set up and use, and already suffice for many of the applications of logical limits outside of model theory. In this post, I will set out the basic theory of these ultraproducts, and illustrate how they can be used to pass between discrete and continuous theories in each of the examples listed in the above table.
Apart from the initial “one-time cost” of setting up the ultraproduct machinery, the main loss one incurs when using ultraproduct methods is that it becomes very difficult to extract explicit quantitative bounds from results that are proven by transferring qualitative continuous results to the discrete setting via ultraproducts. However, in many cases (particularly those involving regularity-type lemmas) the bounds are already of tower-exponential type or worse, and there is arguably not much to be lost by abandoning the explicit quantitative bounds altogether.
— 1. Basic theory —
To set up the machinery of ultraproducts (and nonstandard analysis) we will need to pick a non-principal ultrafilter on the natural numbers . By definition, this is a collection of subsets of the natural numbers that obeys the following axioms:
- (i) contains , but does not contain .
- (ii) If is in , then any subset of containing is in .
- (iii) If lie in , then also lies in .
- (iv) If , then exactly one of and lies in .
- (v) No finite set lies in .
A collection that obeys axioms (i)-(iii) is a filter, a collection that obeys (i)-(iv) is an ultrafilter, and a collection that obeys (i)-(v) is a non-principal ultrafilter. (One can certainly build ultrafilters on other sets than the natural numbers , but for the type of applications we have in mind, ultrafilters on are generally sufficient, much as how one can largely rely on sequences instead of nets when using considering topological or metric notions of a limit in concrete applications.)
If is an ultrafilter, a subset of is said to be -large if it lies in , and -small otherwise. To emphasise the limit interpretation of ultrafilters, we will also use “for sufficiently close to ” as a synonym for “for an -large set of “. Note from the above axioms that if is partitioned into finitely many components, then exactly one of these components is -large. One can thus think of an ultrafilter as a voting protocol, which converts the preferences of a countably infinite number of voters (indexed by the natural numbers) amongst a finite number of candidates into a “winner”, namely the candidate who amasses an -large set of votes; this voting protocol obeys all the hypotheses of Arrow’s theorem (rationality, independence of irrelevant alternatives, etc.) but not its conclusion (there are no dictators), due to the infinite number of voters. One can also view an ultrafilter as a finitely additive, -valued probability measure on the natural numbers, in which every subset of is measurable.
Every natural number defines a principal ultrafilter (that is, an ultrafilter that does not obey axiom (v)), defined by setting if and only if . By abuse of notation, we may identify with , so that the set of principal ultrafilters is identified with . The set of all ultrafilters (principal and non-principal) is denoted , and may be identified with the Stone-Cech compactification of the natural numbers; this space has a number of interesting structures on it (for instance, it is a left-continuous topological semigroup, which is a useful fact in Ramsey theory and topological dynamics, as discussed in this previous post), but we will not discuss these structures further here. In particular, the use of ultrafilters in establishing correspondence principles between discrete and continuous structures is unrelated to the use of ultrafilters (particularly minimal or idempotent ultrafilters) in Ramsey theory and topological dynamics that is discussed in that previous post. (However, it is certainly possible to use both types of ultrafilters in separate places for a single application; see this paper of Bergelson and myself for an example.)
The existence of a non-principal ultrafilter can be guaranteed by an application of Zorn’s lemma (basically by running the transfinite version of a greedy algorithm to build the collection of -large and -small sets):
Lemma 1 There exists a non-principal ultrafilter .
Proof: Define the Frechet filter to be the collection of all the cofinite subsets of , that is to say those subsets of whose complement is finite. This is a filter. A routine application of Zorn’s lemma shows that this filter must be contained in at least one maximal filter . If is not an ultrafilter, then there is a set which is neither -large or -small; if we then let be the filter generated by and (that is, consists of those subsets of that either lie in , or contain the intersection of with an -large set), then one checks that is a filter that is strictly larger than , contradicting the maximality of . Thus is an ultrafilter; since it contains , it is non-principal as required.
Remark 1 The above argument relied upon the axiom of choice. One can construct ultrafilters using slightly weaker set-theoretic axioms than the axiom of choice (e.g. the Boolean prime ideal theorem), but if one has no choice-like axioms whatsoever (or more precisely, one uses just the axioms ZF of Zermelo-Fraenkel set theory) then the existence of ultrafilters is undecidable. However, there is a result of Gödel that asserts that if a result is finitary in the sense that it can be phrased as a first-order statement in Peano Arithmetic, and it can be proven using the axiom of choice (or more precisely in ZFC set theory), then it can also be proven without the axiom of choice (i.e. in ZF set theory). In practical terms, this means that one can ignore the reliance on the axiom of choice for any finitary application of ultraproduct methods. Alternatively, instead of using Zorn’s lemma to construct a genuine ultrafilter, it is often possible (though messy) to build some sort of “partial” or “incomplete” ultrafilter which is still close enough to an ultrafilter to prove a desired result, but which does not need the full strength of the axiom of choice to construct. We will not discuss such methods here, but see e.g. this paper of Palmgren. (See also this previous blog post for a “cheap” version of nonstandard analysis which uses the Frechet filter rather than an ultrafilter.) In any event, we will freely rely on the axiom of choice in the rest of this post.
Henceforth, we select a single non-principal ultrafilter to use in the rest of the post. The choice of this ultrafilter is highly non-unique ( is uncountable), but for the purposes of establishing a correspondence between discrete and continuous mathematics, the precise choice of ultrafilter will not be relevant, as long as it is kept fixed throughout the discussion.
We can use the ultrafilter to form limits of sequences , even if these sequences have no limit in the topological or metric senses. For instance, if takes values in a single finite space for all natural numbers , then we can define the ultralimit to be the unique element of such that for sufficiently close to ; this element exists and is unique by the ultrafilter axioms (i)-(iv). For instance, the ultralimit of the sequence is either or , with the former occurring if the even numbers are -large, and the latter occurring if the odd numbers are -large. So the ultralimit here depends on , but once is fixed, the ultralimit is well-defined. One can also view as the outcome of an “election” among candidates in , in which each “voter” has as a preferred candidate, and the ultrafilter is used as the election protocol. (See this previous post for further discussion of this viewpoint.)
Now we consider the more general situation in which the to not lie in a single finite space , but instead each lies in a space which may depend on , and may also be infinite. Here, there does not necessarily exist a single object that is equal to the for sufficiently close to . The situation here is similar to that in metric spaces, in that a Cauchy sequence in a metric space need not be convergent if the space is infinite. In the latter case, the problem can be resolved by constructing the metric completion of , which can be defined (slightly non-rigorously) as the space of formal limits of Cauchy sequences , with two formal limits , declared to be equal if converges to zero (in the classical sense) as . To construct this Cauchy completion rigorous within the framework of set theory, one can define to be the set of equivalence classes of tuples of Cauchy sequences in , with the equivalence relation holding if and only if converges to zero as . In order to view as a subset of its completion , one usually identifies each point with the equivalence class of the corresponding constant sequence . With this construction, one can for instance build the real numbers as the metric completion of the rationals (and this is indeed one of the most popular ways to construct the reals, which is often covered in undergraduate analysis classes). Note though while one can use this set-theoretic machinery of equivalence classes of Cauchy sequences of rationals to construct the reals, for the purposes of using the real numbers for mathematical applications, this set-theoretic construction is not relevant; the only thing one needs to retain is that Cauchy sequences of rationals (or of reals) will converge to a real, and conversely every real is the limit of a Cauchy sequence of rationals.
We can take a similar tack to construct ultralimits of sequences in different spaces . If one is willing to be slightly non-rigorous, the definition is almost the same: we can view these ultralimits as formal objects, with two ultralimits and declared to be equal if and only if for sufficiently close to . To do things rigorously within the framework of set theory, one has to take a little extra care to avoid the set-theoretic issues associated to Russell’s paradox (or more precisely, avoiding the uses of proper classes that are too large to be sets). One way to deal with this is as follows. One can postulate the existence of a standard universe that contains all the objects of interest for one’s particular mathematical problem: for instance, if one is interested in a sequence of groups , then might contain all the elements of all of the . The only requirements we make of this standard universe are that (a) it is a set (rather than a proper class), and (b) it is fixed throughout the mathematical discussion. The former requirement means that , in general, cannot contain all spaces of a particular type; for instance, one cannot make contain all groups, all vector spaces, all graphs, etc., as one would soon run into Russell-type paradoxes if this occurred. However, in practice, for any given mathematical problem (other than those arising within mathematical logic), one usually does not need to work simultaneously with all spaces; usually a much smaller number of spaces, e.g. a countable or uncountable family of such spaces, will suffice. Henceforth we will take the existence of a standard universe for granted. Elements of this universe will be referred to as standard objects, and subsets of the universe as standard spaces; for instance, we will usually place the classical number systems , etc. inside this standard universe, so elements of become standard natural numbers, elements of become standard reals, and so forth. We then have:
Definition 2 (Formal definition of ultralimits and ultraproducts) Let be a sequence of standard objects, defined for an -large set (thus for all ). The ultralimit is then defined to be the equivalence class of tuples of standard objects defined on an -large set , such that for sufficiently close to in the common domain of definition. Such an ultralimit is called a nonstandard object, and the set of all ultralimits is called the nonstandard universe . We identify every standard object with the nonstandard object , thus embedding in .
Given a sequence of standard spaces (i.e. subsets of ) defined for sufficiently close to , the ultraproduct is defined as the set of all ultralimits , where lies in for an -large set of . Such an ultraproduct (which is a subset of ) will be called a nonstandard space (also known as an internal space). Similarly, the ultraproduct of a sequence of standard groups will be called a nonstandard group (or internal group), the ultraproduct of a sequence of standard fields will be called a nonstandard field (or internal field), and so forth.
If is a standard space, the nonstandard space (that is, the space of ultralimits of sequences in ) will be called the ultrapower of and is denoted ; note that embeds as a subset of . The ultrapower of the standard natural numbers will be referred to as the nonstandard natural numbers, the ultrapower of the standard reals will be referred to as the nonstandard real numbers (also known as the umber), and so forth.
One can verify that when is finite, the ultrapower is equal to itself, and that the notion of an ultralimit in the case when all the lie in corresponds to the notion of an ultralimit defined previously.
The basic ontological Venn diagram is this: the standard universe embeds into the larger nonstandard universe , and both exist inside the even larger “external universe” of ZFC set theory, which we will not give a name to (it is not a set, but is instead a proper class). This allows us to study various mathematical objects on three different levels: standard, nonstandard, and external. For instance, we have standard groups (groups in ) and nonstandard groups (ultraproducts of groups in ); both are examples of external groups (groups that are somewhere in the external universe), but there are certainly examples of external groups that are neither standard groups nor nonstandard groups. It is also worth cautioning that while we identify standard objects with their nonstandard counterparts , we do not identify standard spaces with their nonstandard counterparts ; in particular, we do not view standard groups as examples of nonstandard groups (except in the finite case), nor do we view standard fields as examples of nonstandard fields, etc. Instead, one can view the nonstandard space as a “completion” of the standard space , analogous to metric completion; see this previous blog post for further discussion of this perspective. A related perspective is to view as a double-precision version of , so for instance can be viewed as the space of “double-precision real numbers” as opposed to the “single-precision” real numbers in , or similarly may be viewed as a space of “long integers“, with being a space of “short integers”.
The analogues of standard groups, nonstandard groups, and external groups in finitary mathematics would be “groups that do not depend on an asymptotic parameter “, “groups that may depend on an asymptotic parameter “, and “informal group-like objects (such as the “group” of all “bounded” integers ) that may require asymptotic notation to define” respectively. The latter concept is hard to pin down rigorously in finitary mathematics, even though the intuition it captures (e.g. that the “set” of “bounded” integers should be closed under addition) is fairly clear; but once one uses ultraproducts to separate the three levels of standard, nonstandard, and external objects, it is possible to make rigorous sense of these sorts of intuitions, as we shall see later.
Given a sequence of standard functions between standard sets for an -large set of , we define the ultralimit of this sequence to be the function from the nonstandard space to the nonstandard space defined by the formula
for all sequences that lie in for an -large set of . It is easy to see that this ultralimit is well-defined; we refer to such functions as nonstandard functions.
From the above construction, we may now transfer operations and relations on standard spaces to their nonstandard counterparts. For instance, if is a sequence of standard groups, then one can take the ultralimit of the multiplication operations to obtain a multiplication operation on the ultraproduct ; similarly, the inversion operation on gives rise to an inversion operation on . Similarly, the arithmetic operations on the standard natural numbers can be transferred to arithmetic operations on the nonstandard natural numbers , and properties and relations on the natural numbers can similarly be transferred. For instance, a nonstandard number can be said to be prime if the are prime for an -large set of (i.e. the ultralimit of the truth value of “ is prime” is true); one nonstandard number can be said to be larger than another if one has for an -large set of ; and so forth.
One can then check by hand that statements that are true for standard objects are also true for their nonstandard counterparts. For instance, because the group operations of the standard groups are associative, the group operation on the nonstandard group is also associative, and in fact will also be a group (as viewed externally). Similarly, addition and multiplication in the nonstandard natural numbers obey the usual commutative, associative, and distributive laws, because these transfer from the corresponding laws for the standard natural numbers . For a more topical example, it is now a theorem that given any standard natural number , there exist standard primes such that ; it is an instructive exercise to check that the same result then automatically holds in the nonstandard setting, thus for every nonstandard natural number there exist nonstandard primes such that .
These facts are special cases of the transfer principle, and can be formalised by Łos’s theorem. Informally, this theorem asserts that if an “elementary” sentence holds for (an -large subsequence of) a sequence of standard objects, then it also holds in the ultralimit, and conversely. To make this assertion rigorous, we need some of the notation from first-order logic; this is somewhat formal and the reader may wish to skim briefly over the definitions and proofs that follow. To simplify the discussion, we restrict attention to one-sorted languages, although in practice one often works with many-sorted languages instead.
Definition 3 (Languages and structures, formal definition) A signature is a collection of formal operation symbols and relation symbols , each of which is assigned an arity (a non-negative integer). (For instance, the signature of the language of groups would consist of the -ary operation , the -ary operation , and the -ary operation .) -ary operations are also known as constant symbols. Given a signature and an infinite supply of variable symbols , define a term to be a formal string generated by the following rules:
- Every variable symbol is a term.
- If is a -ary operation and are terms, then is a term.
By abuse of notation, we often move operation symbols to more traditional locations, for instance writing as , as , etc.. By further abuse of notation, parentheses are usually omitted when there is no loss of ambiguity, e.g. is often abbreviated as .
A formula involving some finite number of distinct variable symbols (known as the free variables of the formula) for some is then a formal string generated by the following rules:
- If is a -ary relation and are terms which only involve variables from , then is a formula of .
- If are terms only involving variables from , then is a formula with free variables .
- If are formulae with free variables , then , and are formulae with free variables .
- If is a formula with free variables (or some permutation thereof), then and are formulae with free variables .
A formula with no free variables is known as a sentence. As before, we abuse notation by moving relation symbols to more traditional locations, e.g. writing as , and by removing parentheses whenever there is no loss of ambiguity; we also permit concatenation of quantifiers, for instance abbreviating as . Thus, for instance, in the language of the natural numbers,
is a formula of one variable , and
is a sentence (which, in this case, encodes the even Goldbach conjecture).
We refer to the collection of such formulae as the first-order language (or language, for short) associated with the given signature (and to the supply of variable symbols).
A structure for a first-order language is a set , together with a map for every -ary operation , and a boolean relation for every -ary relation . (For instance, a group is a structure for the first-order language of groups.) Any term of variables may be interpreted as a function by viewing the variable symbols as ranging in , and replacing each operation or relation with their respective interpretations . Similarly, any formula may be interpreted as a function , and in particular any sentence is interpreted to be either true or false (and we write if the former occurs), by interpreting quantifiers such as and as quantification over . (For instance, is interpreted as true for if there exists such that is true.)
Given a sequence of standard structures for a given first-order language , we can form the ultraproduct in the obvious manner, with the nonstandard interpretations of the operator and relation symbols being the ultralimits of their standard counterparts. We refer to this ultraproduct as a nonstandard structure for the first-order language ; both standard structures and nonstandard structures are examples of external structures, but as remarked before we do not consider standard structures as examples of nonstandard structures (except when the structure is finite).
Theorem 4 (Łos’s theorem) Let be a sequence of standard structures for a first-order language , and let be the associated nonstandard structure.
- (i) If is a term involving the variables , then is the ultralimit of the .
- (ii) If is a formula with free variables , then is the ultralimit of .
- (iii) If is a sentence, then is true if and only if is true for sufficiently close to .
Informally, this theorem asserts that first-order logic is continuous with respect to ultralimits.
Proof: The claim (i) is obtained by a routine induction on the length of the term and the ultrafilter axioms, while the claim (iii) is a consequence of (ii). The claim (ii) is similarly obtained by a routine induction on the length of the formula and the ultrafilter axioms; the only non-trivial claim to verify is that if the formula is the ultralimit of the formulae , then the formula is the ultralimit of the formulae , and similarly for the universal quantifier .
We just prove the claim for the existential quantifier , as the claim for is similar (and can be deduced from the existential case by rewriting in the equivalent form . Suppose first that for are such that holds for sufficiently close to ; by the axiom of choice, we may thus find such that holds for sufficiently close to . Taking ultralimits and using the induction hypothesis, we conclude that where , so that is true. Conversely, suppose that fails for sufficiently close to ; then for any in , fails for sufficiently close to , and hence on taking ultralimits fails also, and the claim follows.
We now give a classic consequence of Łos’s theorem to logic, namely the countable case of the compactness theorem:
Corollary 5 (Compactness theorem, countable case) Let be a first-order language, and let be a sequence of sentences. Suppose that for each natural number , there exists a structure for such that are satisfied by this structure. Then there exists a structure such that are simultaneously satisfied by this structure.
The general case of the compactness theorem may be established by using ultraproducts on more general sets than the natural numbers; we leave the verification of this fact to the reader.
Proof: We select the standard universe to contain all of the , so that the are all standard structures. If we then set to be the nonstandard structure , the claim follows from Łos’s theorem.
— 2. Rank decompositions —
We can now give a simple example of how ultralimits may be used to connect quantitative finitary statements to qualitative infinitary statements, by giving an easy case of the polyomial regularity lemma, introduced by Ben Green and myself. Given a finite-dimensional vector space over a field and a natural number , we define a polynomial of degree at most on to be a function of the form
for any basis of and some coefficients ; it is easy to see that the choice of basis is irrelevant for this definition. If is infinite dimensional instead of finite dimensional, we say that is a polynomial of degree at most if its restriction to any finite subspace is still a polynomial of degree at most . The space of such polynomials will be denoted ; this is itself a vector space over .
If for some , we define the rank to be the least integer such that we can find such that is a function of the , that is to say there exists a function such that for all . When is finite-dimensional, the rank is necessarily finite (just take to be the coordinate functions), but the rank may be infinite when is infinite dimensional. This notion of rank is analogous to the notion of the rank of a quadratic form (which is essentially the case of the above notion).
We then have the following simple fact:
Lemma 6 (Polynomial regularity lemma) Let be a non-negative integer, and let be a function. Then there exists a constant with the following properties: for any , any vector space over a field , and any polynomials , there exists polynomials for some and a quantity with the following properties:
- (i) (Representation modulo low rank) For each , there exists a representation , where and .
- (ii) (Independence modulo low rank) For every , not all zero, we have .
(In applications, one actually needs an iterated version of this lemma, in which the degree polynomials involved in are themselves regularised in terms of high rank or lower degree polynomials, and so on and so forth down to the linear polynomials; for simplicity we will not discuss this iterated version here, but see the above-mentioned paper of Ben and myself for more discussion.)
The standard proof of this lemma is not difficult, and can be described by the following algorithm:
- (Step 0) Initialise , for , and .
- (Step 1) If property (ii) holds then STOP. Otherwise, move on to Step 2.
- (Step 2) By the failure of (ii), and by permuting the if necessary, we may write as a linear combination of , plus a polynomial of rank at most . If we then delete , we see that every is still a linear combination of , plus an error of rank at most .
- (Step 3) Replace by (thus deleting ), replace by , and return to Step 1.
Since decreases by one at each loop of the algorithm, it loops at most times, and the claim follows.
The above quantitative lemma can be deduced instead from the following basic fact in linear algebra, a variant of the Steinitz exchange lemma:
Indeed, one just sets to be a basis of the linear span of the . The can also be constructed from the by an algorithm extremely similar to the one given above; we leave this to the reader as an exercise.
Corollary 8 (Qualitative polynomial regularity lemma) Let be a non-negative integer. Then for any , any vector space over a field , and any polynomials , there exists polynomials for some with the following properties:
- (i) (Representation modulo low rank) For each , there exists a representation , where and .
- (ii) (Independence modulo low rank) For every , not all zero, we have .
Proof: Let be the set of all polynomials in of finite rank; this is clearly a subspace of , so we may form the quotient space , which is still a vector space over . If we then apply Lemma 7 to the projection of the to this quotient space , we obtain the claim.
While Corollary 8 certainly looks very similar to Lemma 6, it appears at first glance that it cannot be used to prove Lemma 6, because the latter lemma has meaningful content when is finite-dimensional, whereas the former Corollary is trivial in that setting. However, this can be rectified by passing from the finitary setting to the infinitary setting. This can be done in a number of ways, but we will do this using the ultrafilter machinery just developed.
Proof: (Proof of Lemma 6 using Corollary 8.) We use the “compactness and contradiction” method. Suppose for contradiction that Lemma 6 failed. Carefully negating the quantifiers, and using the axiom of choice, this means that there exists and , with the property that for each , we can find a vector space over a field and polynomials , such that for any , there does NOT exist , , obeying the properties (i), (ii) of Lemma 6 with the given data.
We may place all of the objects just constructed in a standard universe . Taking ultralimits (and using Łos’s theorem repeatedly), we obtain a nonstandard field , a nonstandard vector space over that field, polynomials , with the property that for any standard integer , there does NOT exist , , obeying the properties (i), (ii) of Lemma 6 with the given data. However, if one applies Corollary 8, we can find obeying the conclusions (i), (ii) of Corollary 8. This implies the conclusions (i), (ii) of Lemma 6 if is chosen large enough, giving the required contradiction.
— 3. Saturation and colouring —
Ultraproducts enjoy a very useful compactness-type property, known as countable saturation (or more precisely, -saturation). This property may be phrased in many ways, but we will use the following version which emphasises the analogy with topological compactness.
Let be a nonstandard set, thus for some standard sets . We call an internal subset or nonstandard subset of to be any subset of of the form ; by Łos’s theorem we see that for sufficiently close to . For instance, in the nonstandard natural numbers , the nonstandard intervals are internal sets for any nonstandard natural number ; indeed, if , then is the ultraproduct of the standard intervals . More generally, from Łos’s theorem we see that any subset of that is defined by a first-order predicate is an internal set. On the other hand, as we shall see later, the standard natural numbers are not an internal subset of (and are thus merely an external subset of ).
From Łos’s theorem we see that the internal subsets of a nonstandard set form a boolean algebra. We then have the following compactness property:
Proposition 9 (Countable saturation) Every countable cover of by internal sets has a finite subcover. Equivalently, if are a countable sequence of internal sets such that is non-empty for each , then is also non-empty.
We remark that if we had used ultraproducts on larger index sets than the natural numbers, then we would be able to obtain stronger saturation properties than countable saturation, although it is never possible to get unlimited saturation (since every point in a nonstandard set is an internal set). In model theoretic applications it can be technically convenient to work with an extremely saturated model (e.g. a structure which is saturated up to an inaccessible cardinal), but for most applications outside of logic and set theory, such “monster models” are not necessary, and the more modest device of ultraproducts over countable index sets is often sufficient.
Proof: It suffices to prove the second claim, as the former follows by taking complements.
Write , then for each , is the ultraproduct of . In particular, is non-empty for all in an -large subset of the natural numbers. By shrinking the as necessary we may assume that they are monotone, thus , and such that does not contain any natural number less than ; in particular, . For any , let denote the largest natural number such that , and then let denote an element of . Then by construction, for each , we have for all , and thus the nonstandard object lies in for all , and thus is non-empty as required.
This shows for instance that (as claimed previously) is not an internal subset of , since it can be countably covered by the singleton sets for which are internal, but for which no finite cover exists. Among other things, this establishes the overspill principle: if a nonstandard formula holds for all standard natural numbers, then it must hold for at least one unbounded natural number (defined as a nonstandard natural number that is not a standard natural number).
Another immediate corollary of the above proposition is that if a nonstandard set is covered by a countable sequence of internal subsets, then the topology on generated by is compact (because it has a countable base of internal susbets, and Proposition 9 then gives countable compactness).
As an application of this compactness observation, we use ultraproducts to illustrate the topological dynamics version of the Furstenberg correspondence principle, that connects Ramsey theory to topological dynamics. More precisely, we consider the following results:
Theorem 11 (Topological multiple recurrence) Let be a non-empty compact topological space, and let be a homeomorphism. Then for any open cover of and any , there exists an open set in this cover, a point , and a positive integer such that .
It is not difficult to use Theorem 10 to establish Theorem 11. Conversely, we will use ultraproducts to show that Theorem 11 implies Theorem 10. (We will not prove either of these theorems here, but see e.g. this previous blog post for a proof.)
Again, we use compactness and contradiction. Suppose for contradiction that Theorem 10 failed. Then there exists such that for any standard , we may partition into colour classes , none of which contain any -term arithmetic progressions. Taking ultraproducts, we obtain a partition of the nonstandard interval associated to the unbounded natural number into internal colour classes , none of which contains a nonstandard -term arithmetic progression.
Now consider the shifted colour classes for and . These are a countable family of internal sets that cover , and so by countable saturation, this gives the structure of a compact topological space.
We would like to use the shift map on . Unfortunately, this map is not quite a bijection on . To get around this, we work in the slightly smaller space
A further application of countable saturation reveals that is still compact (with the induced topology from ). Furthermore, the shift map can be verified to be a homeomorphism on , and as is unbounded, we see from the overspill principle that is non-empty. It is covered by the open sets , so by Theorem 11, we can find and a (standard) positive integer such that lie in a single , thus contains a non-standard -term arithmetic progression, giving the required contradiction.
— 4. Algebraic geometry and ultraproducts —
Another use of ultraproducts is to provide a bridge between quantitative algebraic geometry and qualitative algebraic geometry. This topic is discussed in detail in this previous post; we will just give some key results here without proof.
Theorem 12 (Algebraic geometry is continuous with respect to ultralimits) Let be a sequence of algebraically closed fields, let be a natural number, and let be a (standard) affine algebraic set, that is to say the solution set of some finite number of polynomials . We define the complexity of an algebraic set to be the least such that is cut out by at most polynomials, each of degree at most .
Then the nonstandard set is an algebraic set if and only if the complexity of the are uniformly bounded for sufficiently close to . Furthermore, if is indeed an algebraic set, then:
A similar statement holds for constructive sets: an ultraproduct of constructive sets is constructive if and only if the original sets have uniformly bounded complexity for sufficiently close to . These claims are proven in the previous blog post; the basic idea is to express basic concepts such as dimension, irreducibility, and degree in first-order terms, so that Łos’s theorem may be applied.
Using this theorem and the compactness and contradiction method, one can easily convert a number of qualitative algebraic results into quantitative ones. For instance, it is a basic fact in qualitative algebraic geometry that every algebraic set decomposes into a finite number of irreducible sets. Using the above theorem (and a relationship between degree and complexity), one can then show that the number of such sets is bounded by a quantity depending only on the degree of the original set (and in particular, uniform in the choice of field); again, see the previous blog post for details. Such quantitative bounds can be obtained by other means (for instance, by using the machinery of schemes), but the ultrafilter approach, being so general in nature, requires almost no actual algebraic geometry to set up and use.
— 5. Metric completion and Gromov-Hausdorff limits —
We previously gave an analogy between ultraproducts and metric completion. It turns out that this analogy can be made substantially more precise. We first need some basic nonstandard analysis notation. Call a nonstandard real number bounded if one has for some standard , and infinitesimal if one has for all standard . For instance, if is an unbounded natural number, then is bounded (but not standard), and is infinitesimal. We write and to denote the assertions that is bounded and infinitesimal respectively.
Proof: Uniqueness is clear (as no non-zero standard real can be infinitesimal). For existence, we consider the “Dedekind cut” , which is a non-empty bounded half-line in . Taking to be the supremum of this cut, we easily verify that for any standard , and the claim follows.
One can also prove this theorem by the usual Bolzano-Weierstrass argument, trapping in a nested sequence of dyadic standard intervals; we leave this argument to the interested reader.
One can view the standard part operation algebraically, as a homomorphism from the bounded nonstandard reals to the standard reals with kernel ; note that the sets are perfectly well-defined as an (external) set in the nonstandard analysis formalism, even though it does not quite make rigorous set-theoretic sense in finitary mathematics (one cannot use finitary asymptotic notation such as inside set-builder notation). Thus for instance we can say that is an ideal in the ring , which is an assertion which is intuitively obvious in the finitary setting, but a bit hard to formalise properly. Note that can then be interpreted as the image under the standard part map of the bounded nonstandard rationals . This can in fact be used to give an alternative definition of the real numbers, which is the nonstandard version of the Cauchy completion construction; we leave this construction to the reader.
We can perform similar constructions with other metric spaces. Actually, in order to have a reaonable notion of boundedness, it is convenient to work in the category of pointed metric spaces , that is to say a metric space with a preferred “origin” . If one has a sequence of standard pointed metric spaces, one can take ultralimits and create a nonstandard pointed metric space . Here, is the ultraproduct of the , is a point in , and is the ultralimit of the .
At present, is a nonstandard metric space, but not an external metric space, because the nonstandard distance between two points in is a nonstandard real, and not necessarily a bounded one. To create an external metric space, we cut down the size of in two different ways. Firstly, we restrict attention to the bounded portion of . Next, we place an equivalence relation on by declaring if . The quotient space , which we will denote suggestively as can then be checked to be a metric space with metric given by the formula
whenever , with corresponding equivalence classes . The quotient map will be called the standard part map, as it generalises the map in Proposition 13.
We refer to the pointed external metric spaces as the nonstandard hull of the standard pointed metric spaces ; it is also known as the ultralimit of these spaces, although we avoid this terminology here as we are using the term “ultralimit” for a slightly different (although certainly related) operation. These spaces are automatically complete, as can easily be verified using countable saturation. If the spaces are asymptotically uniformly locally totally bounded in the sense that for every there exists such that for sufficiently close to , one can cover the balls by at most balls of radius , then the nonstandard hull is also locally totally bounded by Łos’s theorem, and thus locally compact by the Heine-Borel theorem. Furthermore, it is not difficult to show in this case that the converge along to in the pointed Gromov-Hausdorff sense, thus for every , the Gromov-Hausdorff distance between and is less than for sufficiently close to . In particular, this demonstrates the basic fact that any sequence of asymptotically uniformly locally totally bounded pointed metric spaces has a convergent subsequence in the pointed Gromov-Hausdorff sense.
There are several interesting special cases of the nonstandard hull construction. For instance, if the spaces are equal to a single locally totally bounded space, then the nonstandard hull becomes a metric completion of , with every bounded sequence in giving rise to a limit point in . When the metric spaces are normed vector spaces, in which nonstandard analysis can be used as a framework to describe the theory of concentration compactness; see this previous blog post for further discussion. Finally, if one starts with a finitely generated group with a word metric , then by taking the nonstandard hull of with a rescaled word metric for a well-chosen set of radii , one can obtain a useful metric space known as the asymptotic cone of , which for instance can be used to establish Gromov’s theorem on groups of polynomial growth; see this paper of van den Dries and Wilkie for details. Related to this, one can take the non-standard hull of a sequence of approximate groups (and the groups generated by such approximate groups) to obtain open neighbourhoods of locally compact groups, allowing one to use the theory of Hilbert’s fifth problem (which classifies the latter) to study approximate groups; see this text of mine for more discussion.
— 6. Loeb measure and regularity —
Measure theory, with its emphasis on countable additivity, is not obviously a first-order theory, and so it is not a priori obvious that it interacts well with ultraproducts. Nevertheless, thanks primarily to the countable saturation property of ultraproducts, one can often obtain a good notion of measure on nonstandard spaces, by carefully taking the ultralimit of measures on standard spaces. We give a fundamental such construction here, namely the Loeb measure construction, although for simplicity we restrict attention to the setting of nonstandard finite sets (also known as hyperfinite sets) to avoid some technicalities. Loeb measures are a special case of the more general concept of a Kiesler measure in model theory, but we will not discuss this more general concept here.
Let be a sequence of standard non-empty finite sets, so that the ultraproduct is a nonstandard non-empty finite set. On each of the , we can define the uniform probability measure , which assigns the measure to each subset of . Of course, is a finitely additive probability measure. Taking ultraproducts and then taking standard parts, we can then define a finitely additive probability measure on the boolean algebra of internal subsets of , by the formula
One easily verifies that this is a finitely additive probability measure. Furthermore, thanks to the countable saturation property, we see that is a premeasure, that is to say that one has the countable additivity property
whenever are a family of disjoint internal subsets of whose union is also an internal subset. Applying the Caratheodory extension theorem (or more precisely, the Hahn-Kolmogorov theorem), we can then extend to a countably additive probability measure on the -algebra generated by ; we refer to as a Loeb space, with being the Loeb probability measure and being the Loeb -algebra.
Remark 2 One technical caveat: in general, the Loeb -algebra will not be countably generated, and so in particular the Lebesgue spaces associated to Loeb spaces will not be separable. However, in most finitary applications, one can often work with countably generated sub--algebras of the Loeb -algebra, in which case separability can be restored.
The relationship between and is closely analogous to that between the elementary measure (or Jordan measure) of elementary subsets of (say) the unit cube, and that of Lebesgue measure of Borel subsets of that cube; see e.g. this text of mine for the basic theory here. For instance, it is not difficult to show that every Loeb measurable set is “almost internal” in the sense that for any standard , one can find an internal set which is -close to in the sense that the outer measure
of the symmetric difference is at most . Related to this, the Loeb measure and the outer measure agree on any Loeb measurable set, and the simple internal functions on (i.e. finite linear combinations of indicator functions of internal sets) will be dense in for any (standard) .
For instance, if is an unbounded natural number, then the set is not an internal subset of , but it is the intersection of the countable family of internal sets for . Each of the internal sets has measure , and so is a Loeb measurable set with Loeb measure zero. As another example, for any standard and , the residue class is an internal subset of of Loeb measure exactly (regardless of what the remainder of is modulo ).
As a quick application of Loeb measure, we give an instance of the Furstenberg correspondence principle, closely analogous to the topological dynamics example from a few sections earlier. Specifically, we connect the following two theorems:
Theorem 15 (Furstenberg multiple recurrence) Let be a probability space, and let be an almost everywhere defined bijection that is measure-preserving. Then for any set of positive measure and any , there exists a point , and a positive integer such that .
Again, it is not difficult to show that Theorem 14 implies Theorem 15. To show the converse, we again use the compactness and contradiction method. If Theorem 14 failed, then there exist and , and a sequence going to infinity, together with subsets of density at least which contain no -term arithmetic progressions. Taking ultralimits, we obtain an unbounded natural number and an internal subset of Loeb measure at least which contains no non-standard -term arithmetic progressions. The shift map is defined and bijective almost everywhere on , and is measure-preserving, and Theorem 15 then provides the required contradiction. (As remarked earlier, this probability space is likely to be inseparable, but one can make it separable simply by reducing to the -algebra generated by and its (standard) shifts .)
Remark 3 The ultralimit construction actually gives a lot more relevant structure on than just that of a measure-preserving system. For instance, one can view the space of parallelograms as a special subset of , which one can also endow with a Loeb measure, and these spaces (as well as higher-dimensional analogues) which can be used to analyse the further additive structure of (in a manner that closely parallels the structural theory of Host and Kra on measure-preserving systems); see this paper of Szegedy for some details of this approach.
The above example already shows the power of basic measure-theoretic tools (such as the Hahn-Kolmogorov theorem) in connecting discrete and continuous results together. Now we turn to some applications of other basic measure theory tools, such as product measure and conditional expectation. Specifically, we will use ultraproducts and measure theory to prove the following “strong” form of the Szemerédi regularity lemma:
- (Structure) takes values in , and there exists a partition such that is constant on for all .
- (Smallness) takes values in , and .
- (Pseudorandomness) For any , one has .
The connection between the regularity lemma and nonstandard measure theory was first obtained by Elek and Szegedy, in which the extension of the arguments here to hypergraphs was also obtained.
We will deduce Lemma 16 from the following infinitary version:
- (Structure) takes values in , and there exists a partition into finitely many internal sets such that is constant on for all .
- (Smallness) takes values in , and .
- (Pseudorandomness) For any measurable , one has .
We will prove Lemma 17 shortly, but let us first see how it implies Lemma 16. We once again use the compactness and contradiction method. If Lemma 16 fails, then there exists and , and for each there exists a finite non-empty set and a function , such that for no does there exist a decomposition
obeying the conclusions of Lemma 16. We then take ultralimits to obtain a nonstandard finite non-empty set and an internal function . The standard part is then a Loeb measurable function on taking values in . We apply Lemma 17 to obtain a decomposition
with the stated properties for some finite . Note that is already a simple internal function. The function need not be simple internal, but may be approximated to arbitrary accuracy in by a simple internal function. Thus we may find a decomposition
where is simple internal and is such that
for all measurable , and is simple internal, takes values in , and is such that
By Łos’s theorem, an analogous decomposition then holds for for sufficiently close to , but this contradicts the construction of the .
Now we prove Lemma 17. The proof hinges on a comparison between two spaces, the Loeb space
on the product space , and the subtly different probability space
that is the product of the Loeb space with itself, thus is the -algebra on generated by rectangles with Loeb measurable in , and is the product measure, which is the unique probability measure on such that
for all Loeb measurable .
It is not difficult to show that the former probability space is a refinement of the latter, thus contains , and and agree on their common domain of definition. However, the former space is larger in general; the basic reason for this is for large finite spaces , it is possible to find subsets of that are not well approximable by bounded boolean combinations of rectangles (e.g. random subsets of will have this property with high probability in the asymptotic limit when goes to infinity). Despite this disparity, we may still form the conditional expectation of , which one can define as the orthogonal projection of in the Hilbert space to the closed subspace . We then have a decomposition
where is orthogonal to all -measurable functions; in particular, it is orthogonal to for any Loeb measurable , so that
for all such .
Next, we know that is generated (up to null sets) by products of internal sets, so we may approximate to arbitary error in by a simple function that is a finite linear combination of indicators of rectangles; as takes values in , we can ensure that and . This gives a decomposition
and one easily verifies that all the required properties of Lemma 17 are obeyed.
Remark 4 It is instructive to deconstruct this proof and compare it with the standard “energy increment” proof of the regularity lemma. The energy increment process is now concealed within the standard proof of the existence of orthogonal projections in Hilbert spaces (see e.g. Proposition 1 of this previous blog post), so on some level the two proofs of the regularity lemma are essentially equivalent. However, the ultrafilter proof allows one to hide many of the ingredients of the argument into existing Hilbert space theory.
Remark 5 A similar construction allows one to extract a graph limit from a sequence of finite graphs , by first forming the ultraproduct and then the conditional expectation , which is an -measurable function from to . At this point we encounter the technical issue noted previously that is not countably generated. However, we may find a countably generated sub--algebra of with the property that is equal -almost everywhere with a -measurable function , by expressing as the almost everywhere limit of simple functions (bounded linear combinations of rectangles with ) and then letting be the -algebra generated by the . The function is then essentially a graphon, after lifting the separable probability space to the unit interval with uniform Lebesgue measure in the standard fashion. Among other things, this construction can give an alternate proof that every sequence of graphs has a subsequence converging to a graphon.
Remark 6 Although it is not needed here, one convenient fact about Loeb spaces on products is that they obey the following Fubini-Tonelli type theorem: if are nonstandard finite spaces and is Loeb measurable on , then for all , the function is Loeb measurable on , the integral is Loeb measurable on , and we have the identity
Similarly if the roles of and are swapped; in particular we have the familiar identity
These identities are obvious when is an internal simple function (as it follows from the corresponding claim in the finite case), and the general case follows from a standard approximation argument (using the monotone convergence theorem repeatedly). One can also obtain an analogous claim in the absolutely integrable category, in the spirit of Fubini’s theorem; we leave this to the interested reader.