You are currently browsing the tag archive for the ‘nonstandard analysis’ tag.
Many structures in mathematics are incomplete in one or more ways. For instance, the field of rationals or the reals are algebraically incomplete, because there are some non-trivial algebraic equations (such as in the case of the rationals, or in the case of the reals) which could potentially have solutions (because they do not imply a necessarily false statement, such as , just using the laws of algebra), but do not actually have solutions in the specified field.
Similarly, the rationals , when viewed now as a metric space rather than as a field, are also metrically incomplete, beause there exist sequences in the rationals (e.g. the decimal approximations of the irrational number ) which could potentially converge to a limit (because they form a Cauchy sequence), but do not actually converge in the specified metric space.
A third type of incompleteness is that of logical incompleteness, which applies now to formal theories rather than to fields or metric spaces. For instance, Zermelo-Frankel-Choice (ZFC) set theory is logically incomplete, because there exist statements (such as the consistency of ZFC) which could potentially be provable by the theory (because it does not lead to a contradiction, or at least so we believe, just from the axioms and deductive rules of the theory), but is not actually provable in this theory.
A fourth type of incompleteness, which is slightly less well known than the above three, is what I will call elementary incompleteness (and which model theorists call the failure of the countable saturation property). It applies to any structure that is describable by a first-order language, such as a field, a metric space, or a universe of sets. For instance, in the language of ordered real fields, the real line is elementarily incomplete, because there exists a sequence of statements (such as the statements for natural numbers ) in this language which are potentially simultaneously satisfiable (in the sense that any finite number of these statements can be satisfied by some real number ) but are not actually simultaneously satisfiable in this theory.
In each of these cases, though, it is possible to start with an incomplete structure and complete it to a much larger structure to eliminate the incompleteness. For instance, starting with an arbitrary field , one can take its algebraic completion (or algebraic closure) ; for instance, can be viewed as the algebraic completion of . This field is usually significantly larger than the original field , but contains as a subfield, and every element of can be described as the solution to some polynomial equation with coefficients in . Furthermore, is now algebraically complete (or algebraically closed): every polynomial equation in which is potentially satisfiable (in the sense that it does not lead to a contradiction such as from the laws of algebra), is actually satisfiable in .
Similarly, starting with an arbitrary metric space , one can take its metric completion ; for instance, can be viewed as the metric completion of . Again, the completion is usually much larger than the original metric space , but contains as a subspace, and every element of can be described as the limit of some Cauchy sequence in . Furthermore, is now a complete metric space: every sequence in which is potentially convergent (in the sense of being a Cauchy sequence), is now actually convegent in .
In a similar vein, we have the Gödel completeness theorem, which implies (among other things) that for any consistent first-order theory for a first-order language , there exists at least one completion of that theory , which is a consistent theory in which every sentence in which is potentially true in (because it does not lead to a contradiction in ) is actually true in . Indeed, the completeness theorem provides at least one model (or structure) of the consistent theory , and then the completion can be formed by interpreting every sentence in using to determine its truth value. Note, in contrast to the previous two examples, that the completion is usually not unique in any way; a theory can have multiple inequivalent models , giving rise to distinct completions of the same theory.
Finally, if one starts with an arbitrary structure , one can form an elementary completion of it, which is a significantly larger structure which contains as a substructure, and such that every element of is an elementary limit of a sequence of elements in (I will define this term shortly). Furthermore, is elementarily complete; any sequence of statements that are potentially simultaneously satisfiable in (in the sense that any finite number of statements in this collection are simultaneously satisfiable), will actually be simultaneously satisfiable. As we shall see, one can form such an elementary completion by taking an ultrapower of the original structure . If is the standard universe of all the standard objects one considers in mathematics, then its elementary completion is known as the nonstandard universe, and is the setting for nonstandard analysis.
As mentioned earlier, completion tends to make a space much larger and more complicated. If one algebraically completes a finite field, for instance, one necessarily obtains an infinite field as a consequence. If one metrically completes a countable metric space with no isolated points, such as , then one necessarily obtains an uncountable metric space (thanks to the Baire category theorem). If one takes a logical completion of a consistent first-order theory that can model true arithmetic, then this completion is no longer describable by a recursively enumerable schema of axioms, thanks to Gödel’s incompleteness theorem. And if one takes the elementary completion of a countable structure, such as the integers , then the resulting completion will necessarily be uncountable.
However, there are substantial benefits to working in the completed structure which can make it well worth the massive increase in size. For instance, by working in the algebraic completion of a field, one gains access to the full power of algebraic geometry. By working in the metric completion of a metric space, one gains access to powerful tools of real analysis, such as the Baire category theorem, the Heine-Borel theorem, and (in the case of Euclidean completions) the Bolzano-Weierstrass theorem. By working in a logically and elementarily completed theory (aka a saturated model) of a first-order theory, one gains access to the branch of model theory known as definability theory, which allows one to analyse the structure of definable sets in much the same way that algebraic geometry allows one to analyse the structure of algebraic sets. Finally, when working in an elementary completion of a structure, one gains a sequential compactness property, analogous to the Bolzano-Weierstrass theorem, which can be interpreted as the foundation for much of nonstandard analysis, as well as providing a unifying framework to describe various correspondence principles between finitary and infinitary mathematics.
In this post, I wish to expand upon these above points with regard to elementary completion, and to present nonstandard analysis as a completion of standard analysis in much the same way as, say, complex algebra is a completion of real algebra, or real metric geometry is a completion of rational metric geometry.
I have blogged a number of times in the past about the relationship between finitary (or “hard”, or “quantitative”) analysis, and infinitary (or “soft”, or “qualitative”) analysis. One way to connect the two types of analysis is via compactness arguments (and more specifically, contradiction and compactness arguments); such arguments can convert qualitative properties (such as continuity) to quantitative properties (such as bounded), basically because of the fundamental fact that continuous functions on a compact space are bounded (or the closely related fact that sequentially continuous functions on a sequentially compact space are bounded).
A key stage in any such compactness argument is the following: one has a sequence of “quantitative” or “finitary” objects or spaces, and one has to somehow end up with a “qualitative” or “infinitary” limit object or limit space. One common way to achieve this is to embed everything inside some universal space and then use some weak compactness property of that space, such as the Banach-Alaoglu theorem (or its sequential counterpart). This is for instance the idea behind the Furstenberg correspondence principle relating ergodic theory to combinatorics; see for instance this post of mine on this topic.
However, there is a slightly different approach, which I will call ultralimit analysis, which proceeds via the machinery of ultrafilters and ultraproducts; typically, the limit objects one constructs are now the ultraproducts (or ultralimits) of the original objects . There are two main facts that make ultralimit analysis powerful. The first is that one can take ultralimits of arbitrary sequences of objects, as opposed to more traditional tools such as metric completions, which only allow one to take limits of Cauchy sequences of objects. The second fact is Los’s theorem, which tells us that is an elementary limit of the (i.e. every sentence in first-order logic which is true for the for large enough, is true for ). This existence of elementary limits is a manifestation of the compactness theorem in logic; see this earlier blog post for more discussion. So we see that compactness methods and ultrafilter methods are closely intertwined. (See also my earlier class notes for a related connection between ultrafilters and compactness.)
Ultralimit analysis is very closely related to nonstandard analysis. I already discussed some aspects of this relationship in an earlier post, and will expand upon it at the bottom of this post. Roughly speaking, the relationship between ultralimit analysis and nonstandard analysis is analogous to the relationship between measure theory and probability theory.
To illustrate how ultralimit analysis is actually used in practice, I will show later in this post how to take a qualitative infinitary theory – in this case, basic algebraic geometry – and apply ultralimit analysis to then deduce a quantitative version of this theory, in which the complexity of the various algebraic sets and varieties that appear as outputs are controlled uniformly by the complexity of the inputs. The point of this exercise is to show how ultralimit analysis allows for a relatively painless conversion back and forth between the quantitative and qualitative worlds, though in some cases the quantitative translation of a qualitative result (or vice versa) may be somewhat unexpected. In an upcoming paper of myself, Ben Green, and Emmanuel Breuillard (announced in the previous blog post), we will rely on ultralimit analysis to reduce the messiness of various quantitative arguments by replacing them with a qualitative setting in which the theory becomes significantly cleaner.
For sake of completeness, I also redo some earlier instances of the correspondence principle via ultralimit analysis, namely the deduction of the quantitative Gromov theorem from the qualitative one, and of Szemerédi’s theorem from the Furstenberg recurrence theorem, to illustrate how close the two techniques are to each other.
One of the most basic theorems in linear algebra is that every finite-dimensional vector space has a finite basis. Let us give a statement of this theorem in the case when the underlying field is the rationals:
Theorem 1 (Finite generation implies finite basis, infinitary version) Let be a vector space over the rationals , and let be a finite collection of vectors in . Then there exists a collection of vectors in , with , such that
- ( generates ) Every can be expressed as a rational linear combination of the .
- ( independent) There is no non-trivial linear relation , among the (where non-trivial means that the are not all zero).
In fact, one can take to be a subset of the .
Proof: We perform the following “rank reduction argument”. Start with initialised to (so initially we have ). Clearly generates . If the are linearly independent then we are done. Otherwise, there is a non-trivial linear relation between them; after shuffling things around, we see that one of the , say , is a rational linear combination of the . In such a case, becomes redundant, and we may delete it (reducing the rank by one). We repeat this procedure; it can only run for at most steps and so terminates with obeying both of the desired properties.
In additive combinatorics, one often wants to use results like this in finitary settings, such as that of a cyclic group where is a large prime. Now, technically speaking, is not a vector space over , because one only multiply an element of by a rational number if the denominator of that rational does not divide . But for very large, “behaves” like a vector space over , at least if one restricts attention to the rationals of “bounded height” – where the numerator and denominator of the rationals are bounded. Thus we shall refer to elements of as “vectors” over , even though strictly speaking this is not quite the case.
On the other hand, saying that one element of is a rational linear combination of another set of elements is not a very interesting statement: any non-zero element of already generates the entire space! However, if one again restricts attention to rational linear combinations of bounded height, then things become interesting again. For instance, the vector can generate elements such as or using rational linear combinations of bounded height, but will not be able to generate such elements of as without using rational numbers of unbounded height.
For similar reasons, the notion of linear independence over the rationals doesn’t initially look very interesting over : any two non-zero elements of are of course rationally dependent. But again, if one restricts attention to rational numbers of bounded height, then independence begins to emerge: for instance, and are independent in this sense.
Thus, it becomes natural to ask whether there is a “quantitative” analogue of Theorem 1, with non-trivial content in the case of “vector spaces over the bounded height rationals” such as , which asserts that given any bounded collection of elements, one can find another set which is linearly independent “over the rationals up to some height”, such that the can be generated by the “over the rationals up to some height”. Of course to make this rigorous, one needs to quantify the two heights here, the one giving the independence, and the one giving the generation. In order to be useful for applications, it turns out that one often needs the former height to be much larger than the latter; exponentially larger, for instance, is not an uncommon request. Fortunately, one can accomplish this, at the cost of making the height somewhat large:
Theorem 2 (Finite generation implies finite basis, finitary version) Let be an integer, and let be a function. Let be an abelian group which admits a well-defined division operation by any natural number of size at most for some constant depending only on ; for instance one can take for a prime larger than . Let be a finite collection of “vectors” in . Then there exists a collection of vectors in , with , as well an integer , such that
- (Complexity bound) for some depending only on .
- ( generates ) Every can be expressed as a rational linear combination of the of height at most (i.e. the numerator and denominator of the coefficients are at most ).
- ( independent) There is no non-trivial linear relation among the in which the are rational numbers of height at most .
In fact, one can take to be a subset of the .
Proof: We perform the same “rank reduction argument” as before, but translated to the finitary setting. Start with initialised to (so initially we have ), and initialise . Clearly generates at this height. If the are linearly independent up to rationals of height then we are done. Otherwise, there is a non-trivial linear relation between them; after shuffling things around, we see that one of the , say , is a rational linear combination of the , whose height is bounded by some function depending on and . In such a case, becomes redundant, and we may delete it (reducing the rank by one), but note that in order for the remaining to generate we need to raise the height upper bound for the rationals involved from to some quantity depending on . We then replace by and continue the process. We repeat this procedure; it can only run for at most steps and so terminates with and obeying all of the desired properties. (Note that the bound on is quite poor, being essentially an -fold iteration of ! Thus, for instance, if is exponential, then the bound on is tower-exponential in nature.)
(A variant of this type of approximate basis lemma was used in my paper with Van Vu on the singularity probability of random Bernoulli matrices.)
Looking at the statements and proofs of these two theorems it is clear that the two results are in some sense the “same” result, except that the latter has been made sufficiently quantitative that it is meaningful in such finitary settings as . In this note I will show how this equivalence can be made formal using the language of non-standard analysis. This is not a particularly deep (or new) observation, but it is perhaps the simplest example I know of that illustrates how nonstandard analysis can be used to transfer a quantifier-heavy finitary statement, such as Theorem 2, into a quantifier-light infinitary statement, such as Theorem 1, thus lessening the need to perform “epsilon management” duties, such as keeping track of unspecified growth functions such as . This type of transference is discussed at length in this previous blog post of mine.
In this particular case, the amount of effort needed to set up the nonstandard machinery in order to reduce Theorem 2 from Theorem 1 is too great for this transference to be particularly worthwhile, especially given that Theorem 2 has such a short proof. However, when performing a particularly intricate argument in additive combinatorics, in which one is performing a number of “rank reduction arguments”, “energy increment arguments”, “regularity lemmas”, “structure theorems”, and so forth, the purely finitary approach can become bogged down with all the epsilon management one needs to do to organise all the parameters that are flying around. The nonstandard approach can efficiently hide a large number of these parameters from view, and it can then become worthwhile to invest in the nonstandard framework in order to clean up the rest of a lengthy argument. Furthermore, an advantage of moving up to the infinitary setting is that one can then deploy all the firepower of an existing well-developed infinitary theory of mathematics (in this particular case, this would be the theory of linear algebra) out of the box, whereas in the finitary setting one would have to painstakingly finitise each aspect of such a theory that one wished to use (imagine for instance trying to finitise the rank-nullity theorem for rationals of bounded height).
The nonstandard approach is very closely related to use of compactness arguments, or of the technique of taking ultralimits and ultraproducts; indeed we will use an ultrafilter in order to create the nonstandard model in the first place.
I will also discuss a two variants of both Theorem 1 and Theorem 2 which have actually shown up in my research. The first is that of the regularity lemma for polynomials over finite fields, which came up when studying the equidistribution of such polynomials (in this paper with Ben Green). The second comes up when is dealing not with a single finite collection of vectors, but rather with a family of such vectors, where ranges over a large set; this gives rise to what we call the sunflower lemma, and came up in this recent paper of myself, Ben Green, and Tamar Ziegler.
This post is mostly concerned with nonstandard translations of the “rank reduction argument”. Nonstandard translations of the “energy increment argument” and “density increment argument” were briefly discussed in this recent post; I may return to this topic in more detail in a future post.
In the course of the ongoing logic reading seminar at UCLA, I learned about the property of countable saturation. A model of a language is countably saturated if, every countable sequence of formulae in (involving countably many constants in ) which is finitely satisfiable in (i.e. any finite collection in the sequence has a solution in ), is automatically satisfiable in (i.e. there is a solution to all simultaneously). Equivalently, a model is countably saturated if the topology generated by the definable sets is countably compact.
Update, Nov 19: I have learned that the above terminology is not quite accurate; countable saturation allows for an uncountable sequence of formulae, as long as the constants used remain finite. So, the discussion here involves a weaker property than countable saturation, which I do not know the official term for. If one chooses a special type of ultrafilter, namely a “countably incomplete” ultrafilter, one can recover the full strength of countable saturation, though it is not needed for the remarks here. Most models are not countably saturated. Consider for instance the standard natural numbers as a model for arithmetic. Then the sequence of formulae “” for is finitely satisfiable in , but not satisfiable.
However, if one takes a model of and passes to an ultrapower , whose elements consist of sequences in , modulo equivalence with respect to some fixed non-principal ultrafilter , then it turns out that such models are automatically countably compact. Indeed, if are finitely satisfiable in , then they are also finitely satisfiable in (either by inspection, or by appeal to Los’s theorem and/or the transfer principle in non-standard analysis), so for each there exists which satisfies . Letting be the ultralimit of the , we see that satisfies all of the at once.
In particular, non-standard models of mathematics, such as the non-standard model of the natural numbers, are automatically countably saturated.
This has some cute consequences. For instance, suppose one has a non-standard metric space (an ultralimit of standard metric spaces), and suppose one has a standard sequence of elements of which are standard-Cauchy, in the sense that for any standard one has for all sufficiently large . Then there exists a non-standard element such that standard-converges to in the sense that for every standard one has for all sufficiently large . Indeed, from the standard-Cauchy hypothesis, one can find a standard for each standard that goes to zero (in the standard sense), such that the formulae “” are finitely satisfiable, and hence satisfiable by countable saturation. Thus we see that non-standard metric spaces are automatically “standardly complete” in some sense.
This leads to a non-standard structure theorem for Hilbert spaces, analogous to the orthogonal decomposition in Hilbert spaces:
Theorem 1 (Non-standard structure theorem for Hilbert spaces) Let be a non-standard Hilbert space, let be a bounded (external) subset of , and let . Then there exists a decomposition , where is “almost standard-generated by ” in the sense that for every standard , there exists a standard finite linear combination of elements of which is within of , and is “standard-orthogonal to ” in the sense that for all .
Proof: Let be the infimum of all the (standard) distances from to a standard linear combination of elements of , then for every standard one can find a standard linear combination of elements of which lie within of . From the parallelogram law we see that is standard-Cauchy, and thus standard-converges to some limit , which is then almost standard-generated by by construction. An application of Pythagoras then shows that is standard-orthogonal to every element of .
This is the non-standard analogue of a combinatorial structure theorem for Hilbert spaces (see e.g. Theorem 2.6 of my FOCS paper). There is an analogous non-standard structure theorem for -algebras (the counterpart of Theorem 3.6 from that paper) which I will not discuss here, but I will give just one sample corollary:
Theorem 2 (Non-standard arithmetic regularity lemma) Let be a non-standardly finite abelian group, and let be a function. Then one can split , where is standard-uniform in the sense that all Fourier coefficients are (uniformly) , and is standard-almost periodic in the sense that for every standard , one can approximate to error in norm by a standard linear combination of characters (which is also bounded).
This can be used for instance to give a non-standard proof of Roth’s theorem (which is not much different from the “finitary ergodic” proof of Roth’s theorem, given for instance in Section 10.5 of my book with Van Vu). There is also a non-standard version of the Szemerédi regularity lemma which can be used, among other things, to prove the hypergraph removal lemma (the proof then becomes rather close to the infinitary proof of this lemma in this paper of mine). More generally, the above structure theorem can be used as a substitute for various “energy increment arguments” in the combinatorial literature, though it does not seem that there is a significant saving in complexity in doing so unless one is performing quite a large number of these arguments.
One can also cast density increment arguments in a nonstandard framework. Here is a typical example. Call a non-standard subset of a non-standard finite set dense if one has for some standard .
Theorem 3 Suppose Szemerédi’s theorem (every set of integers of positive upper density contains an arithmetic progression of length ) fails for some . Then there exists an unbounded non-standard integer , a dense subset of with no progressions of length , and with the additional property that
for any subprogression of of unbounded size (thus there is no sizeable density increment on any large progression).
Proof: Let be a (standard) set of positive upper density which contains no progression of length . Let be the asymptotic maximal density of inside a long progression, thus . For any , one can then find a standard integer and a standard subset of of density at least such that can be embedded (after a linear transformation) inside , so in particular has no progressions of length . Applying the saturation property, one can then find an unbounded and a set of of density at least for every standard (i.e. of density at least ) with no progressions of length . By construction, we also see that for any subprogression of of unbounded size, hs density at most for any standard , thus has density at most , and the claim follows.
This can be used as the starting point for any density-increment based proof of Szemerédi’s theorem for a fixed , e.g. Roth’s proof for , Gowers’ proof for arbitrary , or Szemerédi’s proof for arbitrary . (It is likely that Szemerédi’s proof, in particular, simplifies a little bit when translated to the non-standard setting, though the savings are likely to be modest.)
I’m also hoping that the recent results of Hrushovski on the noncommutative Freiman problem require only countable saturation, as this makes it more likely that they can be translated to a non-standard setting and thence to a purely finitary framework.
Last year on this blog, I sketched out a non-rigorous probabilistic argument justifying the following well-known theorem:
Theorem 1. (Non-measurable sets exist) There exists a subset of the unit interval which is not Lebesgue-measurable.
The idea was to let E be a “random” subset of . If one (non-rigorously) applies the law of large numbers, one expects E to have “density” 1/2 with respect to every subinterval of , which would contradict the Lebesgue differentiation theorem.
I was recently asked whether I could in fact make the above argument rigorous. This turned out to be more difficult than I had anticipated, due to some technicalities in trying to make the concept of a random subset of (which requires an uncountable number of “coin flips” to generate) both rigorous and useful. However, there is a simpler variant of the above argument which can be made rigorous. Instead of letting E be a “random” subset of , one takes E to be an “alternating” set that contains “every other” real number in ; this again should have density 1/2 in every subinterval and thus again contradict the Lebesgue differentiation theorem.
Of course, in the standard model of the real numbers, it makes no sense to talk about “every other” or “every second” real number, as the real numbers are not discrete. If however one employs the language of non-standard analysis, then it is possible to make the above argument rigorous, and this is the purpose of my post today. I will assume some basic familiarity with non-standard analysis, for instance as discussed in this earlier post of mine.
This post is in some ways an antithesis of my previous postings on hard and soft analysis. In those posts, the emphasis was on taking a result in soft analysis and converting it into a hard analysis statement (making it more “quantitative” or “effective”); here we shall be focusing on the reverse procedure, in which one harnesses the power of infinitary mathematics – in particular, ultrafilters and nonstandard analysis – to facilitate the proof of finitary statements.
Arguments in hard analysis are notorious for their profusion of “epsilons and deltas”. In the more sophisticated arguments of this type, one can end up having an entire army of epsilons that one needs to manage, in particular choosing each epsilon carefully to be sufficiently small compared to other parameters (including other epsilons), while of course avoiding an impossibly circular situation in which a parameter is ultimately required to be small with respect to itself, which is absurd. This art of epsilon management, once mastered, is not terribly difficult – it basically requires one to mentally keep track of which quantities are “small”, “very small”, “very very small”, and so forth – but when these arguments get particularly lengthy, then epsilon management can get rather tedious, and also has the effect of making these arguments unpleasant to read. In particular, any given assertion in hard analysis usually comes with a number of unsightly quantifiers (For every there exists an N…) which can require some thought for a reader to parse. This is in contrast with soft analysis, in which most of the quantifiers (and the epsilons) can be cleanly concealed via the deployment of some very useful terminology; consider for instance how many quantifiers and epsilons are hidden within, say, the Heine-Borel theorem: “a subset of a Euclidean space is compact if and only if it is closed and bounded”.
For those who practice hard analysis for a living (such as myself), it is natural to wonder if one can somehow “clean up” or “automate” all the epsilon management which one is required to do, and attain levels of elegance and conceptual clarity comparable to those in soft analysis, hopefully without sacrificing too much of the “elementary” or “finitary” nature of hard analysis in the process.