One of the most basic theorems in linear algebra is that every finite-dimensional vector space has a finite basis. Let us give a statement of this theorem in the case when the underlying field is the rationals:
Theorem 1 (Finite generation implies finite basis, infinitary version) Let be a vector space over the rationals , and let be a finite collection of vectors in . Then there exists a collection of vectors in , with , such that
- ( generates ) Every can be expressed as a rational linear combination of the .
- ( independent) There is no non-trivial linear relation , among the (where non-trivial means that the are not all zero).
In fact, one can take to be a subset of the .
Proof: We perform the following “rank reduction argument”. Start with initialised to (so initially we have ). Clearly generates . If the are linearly independent then we are done. Otherwise, there is a non-trivial linear relation between them; after shuffling things around, we see that one of the , say , is a rational linear combination of the . In such a case, becomes redundant, and we may delete it (reducing the rank by one). We repeat this procedure; it can only run for at most steps and so terminates with obeying both of the desired properties.
In additive combinatorics, one often wants to use results like this in finitary settings, such as that of a cyclic group where is a large prime. Now, technically speaking, is not a vector space over , because one only multiply an element of by a rational number if the denominator of that rational does not divide . But for very large, “behaves” like a vector space over , at least if one restricts attention to the rationals of “bounded height” – where the numerator and denominator of the rationals are bounded. Thus we shall refer to elements of as “vectors” over , even though strictly speaking this is not quite the case.
On the other hand, saying that one element of is a rational linear combination of another set of elements is not a very interesting statement: any non-zero element of already generates the entire space! However, if one again restricts attention to rational linear combinations of bounded height, then things become interesting again. For instance, the vector can generate elements such as or using rational linear combinations of bounded height, but will not be able to generate such elements of as without using rational numbers of unbounded height.
For similar reasons, the notion of linear independence over the rationals doesn’t initially look very interesting over : any two non-zero elements of are of course rationally dependent. But again, if one restricts attention to rational numbers of bounded height, then independence begins to emerge: for instance, and are independent in this sense.
Thus, it becomes natural to ask whether there is a “quantitative” analogue of Theorem 1, with non-trivial content in the case of “vector spaces over the bounded height rationals” such as , which asserts that given any bounded collection of elements, one can find another set which is linearly independent “over the rationals up to some height”, such that the can be generated by the “over the rationals up to some height”. Of course to make this rigorous, one needs to quantify the two heights here, the one giving the independence, and the one giving the generation. In order to be useful for applications, it turns out that one often needs the former height to be much larger than the latter; exponentially larger, for instance, is not an uncommon request. Fortunately, one can accomplish this, at the cost of making the height somewhat large:
Theorem 2 (Finite generation implies finite basis, finitary version) Let be an integer, and let be a function. Let be an abelian group which admits a well-defined division operation by any natural number of size at most for some constant depending only on ; for instance one can take for a prime larger than . Let be a finite collection of “vectors” in . Then there exists a collection of vectors in , with , as well an integer , such that
- (Complexity bound) for some depending only on .
- ( generates ) Every can be expressed as a rational linear combination of the of height at most (i.e. the numerator and denominator of the coefficients are at most ).
- ( independent) There is no non-trivial linear relation among the in which the are rational numbers of height at most .
In fact, one can take to be a subset of the .
Proof: We perform the same “rank reduction argument” as before, but translated to the finitary setting. Start with initialised to (so initially we have ), and initialise . Clearly generates at this height. If the are linearly independent up to rationals of height then we are done. Otherwise, there is a non-trivial linear relation between them; after shuffling things around, we see that one of the , say , is a rational linear combination of the , whose height is bounded by some function depending on and . In such a case, becomes redundant, and we may delete it (reducing the rank by one), but note that in order for the remaining to generate we need to raise the height upper bound for the rationals involved from to some quantity depending on . We then replace by and continue the process. We repeat this procedure; it can only run for at most steps and so terminates with and obeying all of the desired properties. (Note that the bound on is quite poor, being essentially an -fold iteration of ! Thus, for instance, if is exponential, then the bound on is tower-exponential in nature.)
(A variant of this type of approximate basis lemma was used in my paper with Van Vu on the singularity probability of random Bernoulli matrices.)
Looking at the statements and proofs of these two theorems it is clear that the two results are in some sense the “same” result, except that the latter has been made sufficiently quantitative that it is meaningful in such finitary settings as . In this note I will show how this equivalence can be made formal using the language of non-standard analysis. This is not a particularly deep (or new) observation, but it is perhaps the simplest example I know of that illustrates how nonstandard analysis can be used to transfer a quantifier-heavy finitary statement, such as Theorem 2, into a quantifier-light infinitary statement, such as Theorem 1, thus lessening the need to perform “epsilon management” duties, such as keeping track of unspecified growth functions such as . This type of transference is discussed at length in this previous blog post of mine.
In this particular case, the amount of effort needed to set up the nonstandard machinery in order to reduce Theorem 2 from Theorem 1 is too great for this transference to be particularly worthwhile, especially given that Theorem 2 has such a short proof. However, when performing a particularly intricate argument in additive combinatorics, in which one is performing a number of “rank reduction arguments”, “energy increment arguments”, “regularity lemmas”, “structure theorems”, and so forth, the purely finitary approach can become bogged down with all the epsilon management one needs to do to organise all the parameters that are flying around. The nonstandard approach can efficiently hide a large number of these parameters from view, and it can then become worthwhile to invest in the nonstandard framework in order to clean up the rest of a lengthy argument. Furthermore, an advantage of moving up to the infinitary setting is that one can then deploy all the firepower of an existing well-developed infinitary theory of mathematics (in this particular case, this would be the theory of linear algebra) out of the box, whereas in the finitary setting one would have to painstakingly finitise each aspect of such a theory that one wished to use (imagine for instance trying to finitise the rank-nullity theorem for rationals of bounded height).
The nonstandard approach is very closely related to use of compactness arguments, or of the technique of taking ultralimits and ultraproducts; indeed we will use an ultrafilter in order to create the nonstandard model in the first place.
I will also discuss a two variants of both Theorem 1 and Theorem 2 which have actually shown up in my research. The first is that of the regularity lemma for polynomials over finite fields, which came up when studying the equidistribution of such polynomials (in this paper with Ben Green). The second comes up when is dealing not with a single finite collection of vectors, but rather with a family of such vectors, where ranges over a large set; this gives rise to what we call the sunflower lemma, and came up in this recent paper of myself, Ben Green, and Tamar Ziegler.
This post is mostly concerned with nonstandard translations of the “rank reduction argument”. Nonstandard translations of the “energy increment argument” and “density increment argument” were briefly discussed in this recent post; I may return to this topic in more detail in a future post.
— 1. Equivalence of Theorems 1 and 2 —
Both Theorem 1 and Theorem 2 are easy enough to prove. But we will now spend a certain amount of effort in showing that one can deduce each theorem from the other without actually going through the proof of either. This may not seem particularly worthwhile (or to be serious overkill) in the case of these two particular theorems, but the method of deduction is extremely general, and can be used to relate much more deep and difficult infinitary and finitary theorems to each other without a significant increase in effort. (This is closely related to various correspondence principles between combinatorics and parts of infinitary mathematics, such as ergodic theory; see also my post on soft and hard analysis for a closely related equivalence.)
Let’s first show why the finitary theorem, Theorem 2, implies Theorem 1. We argue by contradiction. If Theorem 1 failed, then we could find a vector space over the rationals, and a finite collection of vectors, for which no finite subcollection of the obeyed both the generation property and the linear independence property. In other words, whenever a subcollection happened to generate by rationals, then it must necessarily contain a linear dependence.
We use this to create a function as follows. Given any natural number , consider all the finite subcollections of which can generate the using rationals of height at most . By the above hypothesis, all such subcollections contain a linear dependence involving rationals of some finite height. There may be many such dependences; we pick one arbitrarily. We then choose to be any natural number larger than the heights of all the rationals involved in all the linear dependencies thus chosen. (Here we implicitly use the fact that there are only finitely many subcollections of the to search through.)
Having chosen this function , we then apply Theorem 2 to the vectors and this choice of function , to obtain a subcollection which generate the using rationals of height at most , and have no linear dependence involving rationals of height at most . But this contradicts the construction of , and gives the claim.
Remark 1 Note how important it is here that the growth function in Theorem 2 is not specified in advance, but is instead a parameter that can be set to be as “large” as needed. Indeed, for Theorem 2 for any fixed (e.g. exponential, tower-exponential, Ackermann, etc.) gives a statement which is strictly “weaker” than Theorem 1 in a sense that I will not try to make precise here; it is only the union of all these statements for all conceivable that gives the full strength of Theorem 1. A similar phenomenon occurs with the finite convergence principle. It is this “second order” nature of infinitary statements (they quantify not just over numerical parameters such as or , but also over functional parameters such as ) that make such statements appear deeper than finitary ones, but the distinction largely disappears if one is willing to perform such second-order quantifications.
Now we turn to the more interesting deduction, which is to obtain Theorem 2 from Theorem 1. Again, one argues by contradiction. Suppose that Theorem 2 failed. Carefully negating all the quantifiers (and using the axiom of choice), we conclude that there exists a function and a natural number with the following property: given any natural number , there exists an abelian group which is divisible up to height , and elements in such that there is no subcollection of the , together with an integer , such that generate using rationals of height at most , and such that the have no linear dependence using rationals of height at most .
We now perform an ultralimit as . We will not pause here to recall the machinery of ultrafilters, ultralimits, and ultraproducts, but refer the reader instead to this previous blog post of mine for discussion.
We pick a non-principal ultrafilter of the natural numbers. Starting with the “standard” abelian groups , we then form their ultraproduct , defined as the space of sequences with for each , modulo equivalence by ; thus two sequences and are considered equal if for a -large set of (i.e. for a set of that lies in ).
Now that non-standard objects are in play, we will need to take some care to distinguish between standard objects (e.g. standard natural numbers) and their nonstandard counterparts.
Since each of the are an abelian group, is also an abelian group (an easy special case of the transfer principle). Since each is divisible up to height , is divisible up to all (standard) heights; in other words, is actually a vector space over the (standard) rational numbers . The point is that while none of the are, strictly speaking, vector spaces over , they increasingly behave as if they were such spaces, and in the limit one recovers genuine vector space structure.
For each , one can take an ultralimit of the elements to generate an element of the ultraproduct . So now we have vectors of a vector space over – precisely the setting of Theorem 1! So we apply that theorem and obtain a subcollection of the , such that each can be generated from the using (standard) rationals, and such that the are linearly independent over the (standard) rationals.
Since all (standard) rationals have a finite height, one can find a (standard) natural number such that each of the can be generated from the using (standard) rationals of height at most . Undoing the ultralimit, we conclude that for a -large set of ‘s, all of the can be generated from the using rationals of height at most . But by hypothesis, this implies for all sufficiently large in this -large set, the contain a non-trivial rational dependence of height at most , thus
for some integers of magnitude at most , with the not all zero.
By the pigeonhole principle (and the finiteness of ), each of the is constant in on a -large set of . So if we take an ultralimit again to go back to the nonstandard world, the quantities , are standard integers (rather than merely nonstandard integers). Thus we have
with the not all zero, i.e. we have a linear dependence amongst the . But this contradicts Theorem 1.
— 2. Polynomials over finite fields —
Let a fixed finite field (e.g. the field of two elements), and consider a high-dimensional finite vector space over . A polynomial of degree can then be defined as a combination of monomials each of degree at most , or alternatively as a function whose derivative vanishes; see this previous blog post for some discussion of this equivalence.
We define the rank of a degree polynomial to be the least number of degree polynomials , such that is completely determined by , i.e. for some function . In the case when has degree , this concept is very close to the familiar rank of a quadratic form or matrix.
A generalisation of the notion of linear independence is that of linear independence modulo low rank. Let us call a collection of degree polynomials -linearly independent if every non-trivial linear combination with not all zero, has rank at least :
There is then the following analogue of Theorem 2:
Theorem 3 (Polynomial regularity lemma at one degree, finitary version) Let be integers, let be a finite field and let be a function. Let be a vector space over , and let be polynomials of degree . Then there exists a collection of polynomials of degree , with , as well an integer , such that
- (Complexity bound) for some depending only on .
- ( generates ) Every can be expressed as a -linear combination of the , plus an error which has rank at most .
- ( independent) There is no non-trivial linear relation among the in which has rank at most .
In fact, one can take to be a subset of the .
This theorem can be proven in much the same way as Theorem 2, and the reader is invited to do so as an exercise. The constant can in fact be taken to be independent of and , but this is not important to us here.
Roughly speaking, Theorem 3 asserts that a finite family of degree polynomials can be expressed as a linear combination of degree polynomials which are “linearly independent modulo low rank errors”, plus some lower rank objects. One can think of this as regularising the degree polynomials, modulo combinations of lower degree polynomials. For applications (and in particular, for understanding the equidistribution) one also needs to regularise the degree polynomials that arise this way, and so forth for increasingly lower degrees until all polynomials are regularised. (A similar phenomenon occurs for the hypergraph regularity lemma.)
When working with theorems like this, it is helpful to think conceptually of “quotienting out” by all polynomials of low rank. Unfortunately, in the finitary setting, the polynomials of low rank do not form a group, and so the quotient is ill-defined. However, this can be rectified by passing to the infinitary setting. Indeed, once one does so, one can quotient out the low rank polynomials, and Theorem 3 follows directly from Theorem 1 (or more precisely, the analogue of that theorem in which the field of rationals is replaced by the finite field ).
Let’s see how this works. To prove Theorem 3, suppose for contradiction that the theorem failed. Then one can find , such that for every natural , one can find a vector space and polynomials of degree , for which there do not exist polynomials with and an integer such that each can be expressed as a linear combination of the modulo an error of rank at most , and such that there are no nontrivial linear relations amongst the modulo errors of rank at most .
Taking an ultralimit as before, we end up with a (nonstandard) vector space over (which is likely to be infinite), and (nonstandard) polynomials of degree (here it is best to use the “local” definition of a polynomial of degree , as a (nonstandard) function whose derivative, but one can also view this as a (nonstandard) sum of monomials if one is careful).
The space of (nonstandard) degree polynomials on is a (nonstandard) vector space over . Inside this vector space, one has the subspace consisting of all polynomials whose rank is a standard integer (as opposed to a nonstandard integer); call these the bounded rank polynomials. This is easily seen to be a subspace of (although it is not a nonstandard or internal subspace, i.e. the ultralimit of subspaces of the ). As such, one can rigorously form the quotient space of degree polynomials, modulo bounded rank polynomials.
The polynomials then have representatives in this quotient space. Applying Theorem 1 (for the field ), one can then find a subcollection which are linearly independent in this space, which generate . Undoing the quotient, we see that the are linear combinations of the plus a bounded rank error, while no nontrivial linear combination of has bounded rank. Undoing the ultralimit as in the previous section, we obtain the desired contradiction.
We thus see that in the nonstandard world, the somewhat nonrigorous concepts of “low rank” and “high rank” can be formalised as that of “bounded rank” and “unbounded rank”. Furthermore, the former space forms a subspace, so in the nonstandard world one can rigorously talk about “quotienting out by bounded rank errors”. Thus we see that the algebraic machinery of quotient spaces can be applied in the nonstandard world directly, whereas in the finitary world it can only be applied heuristically. In principle, one could also start deploying more advanced tools of abstract algebra (e.g. exact sequences, cohomology, etc.) in the nonstandard setting, although this has not yet seriously begun to happen in additive combinatorics (although there are strong hints of some sort of “additive cohomology” emerging in the body of work surrounding the inverse conjecture for the Gowers norm, especially on the ergodic theory side of things).
— 3. Sunflowers —
Now we return to vector spaces (or approximate vector spaces) over the rationals, such as for a large prime . Instead of working with a single (small) tuple of vectors in , we now consider a family of such vectors in , where ranges over a large set, for instance a dense subset of the interval for some large . This situation happens to show up in our recent work on the inverse conjecture for the Gowers norm, where the represent the various “frequencies” that arise in a derivative of a function with respect to the shift . (This need to consider families is an issue that also comes up in the finite field ergodic theory analogue of the inverse conjectures, due to the unbounded number of generators in that case, but interestingly does not arise in the ergodic theory over .)
In Theorem 2, the main distinction was between linear dependence and linear independence of the tuple (or some reduction of this tuple, such as ). We will continue to be interested in the linear dependence or independence of the tuples for various . But we also wish to understand how the vary with as well. At one extreme (the “structured” case), there is no dependence on : for all and all . At the other extreme (the “pseudorandom” case), the are basically independent as varies; in particular, for (almost) all of the pairs , the tuples and are not just separately independent, but are jointly independent. One can think of and as being in “general position” relative to each other.
The sunflower lemma asserts that any family is basically a combination of the above scenarios, thus one can divide the family into a linearly independent core collection of vectors that do not depend on , together with petals , which are in “general position” in the above sense, relative to the core. However, as a price one pays for this, one has to refine to a dense subset of . This lemma, which significantly generalises Theorem 2, is formalised as follows:
Theorem 4 (Sunflower lemma, finitary version) Let be an integer, and let be a function. Let be an abelian group which admits a well-defined division operation by any natural number of size at most for some constant depending only on . Let be a finite set, and let be a collection of -tuples of vectors in indexed by . Then there exists a subset of , integers with , a collection of “core” vectors in for some , a collection of “petal” vectors for each , as well an integer , such that
- (Complexity bound) for some depending only on .
- ( dense) one has for some depending only on .
- ( generates ) Every with and can be expressed as a rational linear combination of the and of height at most .
- ( independent) There is no non-trivial rational linear relation among the of height at most .
- ( in general position relative to ) More generally, for of the pairs , there is no non-trivial linear relation among of height at most .
One can take the to be a subcollection of the , though this is not particularly useful in applications.
Proof: We perform a two-parameter “rank reduction argument”, where the rank is indexed by the pair (ordered lexicographically). We initially set , , , , and for .
At each stage of the iteration, will generate (at height ), and we will have some complexity bound on and some density bound on . So one needs to check the independence of and the general position of relative to .
If there is a linear relation of at height , then one can use this to reduce the size of the core by one, leaving the petal size unchanged, just as in the proof of Theorem 2. So let us move on, and suppose that there is no linear relation of at height , but instead there is a failure of the general position hypothesis. In other words, for at least pairs , one can find a relation of the form
where the are rationals of height at most , not all zero. The number of possible values for such rationals is bounded by some quantity depending on . Thus, by the pigeonhole principle, we can find pairs (i.e. at least pairs for some depending only on ) such that
for some fixed rationals of height at most . By the pigeonhole principle again, we can then find a fixed such that
for all in some subset of with , where
If the and all vanished then we would have a linear dependence amongst the core vectors, which we already know how to deal with. So suppose that we have at least one active petal coefficient, say . Then upon rearranging, we can express as some rational linear combination of the original core vectors , a new core vector , and the other petals , with heights bounded by . We may thus refine to , delete the petal vector , and add the vector to the core, thus decreasing by one and increasing by one. One still has the generation property so long as one replaces with a larger depending on .
Since each iteration of this process either reduces by one keeping fixed, or reduces by one increasing , we see that after at most steps, the process must terminate, when we have both the linear independence of the property and the general position of the property. (Note here that we are basically performing a proof by infinite descent.) At that stage, one easily verifies that we have obtained all the required conclusions of the theorem.
As one can see, this result is a little bit trickier to prove than Theorem 2. Let us now see how it will translate to the nonstandard setting, and see what the nonstandard analogue of Theorem 5 is. We will skip some details, and get to the point where we can motivate and prove this nonstandard analogue; this analogue does in fact imply Theorem 5 by repeating the arguments from previous sections, but we will leave this as an exercise for the interested reader.
As before, the starting point is to introduce a parameter , so that the approximate vector space now depends on (and becomes an actual vector space in the ultralimit ), and the parameter set now also depends on . We will think of as going to infinity as , as this is the most interesting case (for bounded , the result basically collapses back to Theorem 2). In that case, the ultralimit of the is a nonstandard finite set (i.e. an ultralimit of finite sets) whose (nonstandard) cardinality is an unbounded nonstandard integer: it is a nonstandard integer (indeed, it is the ultralimit of the ) which is larger than any standard integer. On the other hand, and remain standard (i.e. they do not involve ).
For each , one starts with a family of -tuples of vectors in . Taking ultralimits, one ends up with a family of -tuples of vectors in . Furthermore, for each , the maps are nonstandard (or internal) functions from to , i.e. they are ultralimits of maps from to . The internal nature of these maps (which is a kind of “measurability” condition on these functions) will be important later. Of course, and are also internal (being ultralimits of and respectively).
We say that a subset of is dense if it is an internal subset (i.e. it is the ultralimit of some subsets of ), and if for some standard (recall that are nonstandard integers). If an internal subset is not dense, we say that it is sparse, which in nonstandard asymptotic notation is equivalent to . If a statement holds on all in dense set of , we say that it holds for many ; if it holds for all outside of a sparse set, we say it holds for almost all . These are analogous to the more familiar concepts of “holding with positive probability” and “holding almost surely” in probability theory. For instance, if holds for many in , and holds for almost all in , then and jointly hold for many in . Note how all the epsilons have been neatly hidden away in this nonstandard framework.
Now we state the nonstandard analogue of Theorem 5.
Theorem 5 (Sunflower lemma, nonstandard version) Let be a (standard) integer, let be a (nonstandard) vector space over the standard rationals , and let be a (nonstandard) set. Let be a collection of -tuples of vectors in indexed by , such that all the maps for are internal. Then there exists a dense subset of , a bounded-dimensional subspace of , a (standard) integer with , and a collection of “petal” vectors for each , with the maps being internal for all , such that
- ( generates ) Every with and lies in the span of and the .
- ( in general position relative to ) For almost all of the pairs , the vectors are linearly independent modulo over .
Proof: Define a partial representation of the family to be a dense subset of , a bounded dimensional space , a standard integer with , and a collection of depending internally on that obeys the generation property (but not necessarily the general position property). Clearly we have at least one partial representation, namely the trivial one where is empty, , , and . Now, among all such partial representations, let us take a representation with the minimal value of . (Here we are of course using the well-ordering property of the standard natural numbers.) We claim that this representation enjoys the general position property, which will give the claim.
Indeed, suppose this was not the case. Then, for many pairs , the vectors have a linear dependence modulo over . (Actually, there is a technical “measurability” issue to address here, which I will return to later.) By symmetry and pigeonholing, we may assume that the coefficient of (say) of this dependence is non-zero. (Again, there is a measurability issue here.) Applying the pigeonhole principle, one can find such that
have a linear dependence over modulo for many . (Again, there is a measurability issue here.)
Fix . The number of possible linear combinations of is countable. Because of this (and using a “countable pigeonhole principle”) that I will address below), we can find a fixed rational linear combination of the such that
have a linear dependence over modulo for all in some dense subset of . But now one can pass from to the dense subset , delete the petal , and add the vector to the core space , thus creating a partial representation with a smaller value of , contradicting minimality, and we are done.
We remark here that whereas the finitary analogue of this result was proven using the method of infinite descent, the nonstandard version could instead be proven using the (equivalent) well-ordering principle. One could easily recast the nonstandard version in descent form also, but it is somewhat more difficult to cast the finitary argument using well-ordering due to the extra parameters and quantifiers in play.
Let us now address the measurability issues. The main problem here is that the property of having a linear dependence over the standard rationals is not an internal property, because it requires knowledge of what the standard rationals are, which is not an internal concept in the language of vector spaces. However, for each fixed choice of rational coefficients, the property of having a specific linear dependence with those selected coefficients is an internal concept (here we crucially rely on the hypothesis that the maps were internal), so really what we have here is a sort of “-internal” property (a countable union of internal properties). But this is good enough for many purposes. In particular, we have
Lemma 6 (Countable pigeonhole principle) Let be a nonstandardly finite set (i.e. the ultralimit of finite sets ), and for each standard natural number , let be an internal subset of . Then one of the following holds:
- (Positive density) There exists a natural number such that for many (i.e. is a dense subset of ).
- (Zero density) For almost all , one has for all . (In other words, the (external) set in is contained in a sparse subset of .)
This lemma is sufficient to resolve all the measurability issues raised in the previous proof. It is analogous to the trivial statement in measure theory that given a countable collection of measurable subsets of a space of positive measure, either one of the measurable sets has positive measure, or else their union has measure zero (i.e. the sets fail to cover almost all of the space).
Proof: If any of the are dense, we are done. So suppose this is not the case. Since is a definable subset of which is not dense, it is sparse, thus . Now it is convenient to undo the ultralimit and work in the finite sets that is the ultralimit of. Note that each , being internal, is also an ultralimit of some finite subsets of .
For each standard integer , the set is sparse in , and in particular has density less than . Thus, one can find a -large set such that
for all . One can arrange matters so that the are decreasing in . One then sets the set to equal , where is the smallest integer for which (or is empty if lies in all the , or in none), and let be the ultralimit of the . Then we see that for every standard , and so is a sparse subset of . Furthermore, contains for every standard , and so we are in the zero density conclusion of the argument.
(Curiously, I don’t see how to prove this lemma without unpacking the limit; it doesn’t seem to follow just from, say, the overspill principle. Instead, it seems to be exploiting the weak countable saturation property I mentioned in a previous post. But perhaps I missed a simple argument.)
— 4. Summary —
Let me summarise with a brief list of pros and cons of switching to a nonstandard framework. First, the pros:
- Many “first-order” parameters such as or disappear from view, as do various “negligible” errors. More importantly, “second-order” parameters, such as the function appearing in Theorem 2, also disappear from view. (In principle, third-order and higher parameters would also disappear, though I do not yet know of an actual finitary argument in my fields of study which would have used such parameters (with the exception of Ramsey theory, where such parameters must come into play in order to generate such enormous quantities as Graham’s number).) As such, a lot of tedious “epsilon management” disappears.
- Iterative (and often parameter-heavy) arguments can often be replaced by minimisation (or more generally, extremisation) arguments, taking advantage of such properties as the well-ordering principle, the least upper bound axiom, or compactness.
- The transfer principle lets one use “for free” any (first-order) statement about standard mathematics in the non-standard setting (provided that all objects involved are internal; see below).
- Mature and powerful theories from infinitary mathematics (e.g. linear algebra, real analysis, representation theory, topology, functional analysis, measure theory, Lie theory, ergodic theory, model theory, etc.) can be used rigorously in a nonstandard setting (as long as one is aware of the usual infinitary pitfalls, of course; see below).
- One can formally define terms that correspond to what would otherwise only be heuristic (or heavily parameterised and quantified) concepts such as “small”, “large”, “low rank”, “independent”, “uniformly distributed”, etc.
- The conversion from a standard result to its nonstandard counterpart, or vice versa, is fairly quick (but see below), and generally only needs to be done only once or twice per paper.
Next, the cons:
- Often requires the axiom of choice, as well as a certain amount of set theory. (There are however weakened versions of nonstandard analysis that can avoid choice that are still suitable for many applications.)
- One needs the machinery of ultralimits and ultraproducts to set up the conversion from standard to nonstandard structures.
- The conversion usually proceeds by a proof by contradiction, which (in conjunction with the use of ultralimits) may not be particularly intuitive.
- One cannot efficiently discern what quantitative bounds emerge from a nonstandard argument (other than by painstakingly converting it back to a standard one, or by applying the tools of proof mining). (On the other hand, in particularly convoluted standard arguments, the quantitative bounds are already so poor – e.g. of iterated tower-exponential type – that losing these bounds is no great loss.)
- One has to take some care to distinguish between standard and nonstandard objects (and also between internal and external sets and functions, which are concepts somewhat analogous to measurable and non-measurable sets and functions in measurable theory). More generally, all the usual pitfalls of infinitary analysis (e.g. interchanging limits, the need to ensure measurability or continuity) emerge in this setting, in contrast to the finitary setting where they are usually completely trivial.
- It can be difficult at first to conceptually visualise what nonstandard objects look like (although this becomes easier once one maps nonstandard analysis concepts to heuristic concepts such as “small” and “large” as mentioned earlier, thus for instance one can think of an unbounded nonstandard natural number as being like an incredibly large standard natural number).
- It is inefficient for both nonstandard and standard arguments to coexist within a paper; this makes things a little awkward if one for instance has to cite a result from a standard mathematics paper in a nonstandard mathematics one.
- There are philosophical objections to using mathematical structures that only exist abstractly, rather than corresponding to the “real world”. (Note though that similar objections were also raised in the past with regard to the use of, say, complex numbers, non-Euclidean geometries, or even negative numbers.)
- Formally, there is no increase in logical power gained by using nonstandard analysis (at least if one accepts the axiom of choice); anything which can be proven by nonstandard methods can also be proven by standard ones. In practice, though, the length and clarity of the nonstandard proof may be substantially better than the standard one.
In view of the pros and cons, I would not say that nonstandard analysis is suitable in all situations, nor is it unsuitable in all situations, but one needs to carefully evaluate the costs and benefits in a given setting; also, in some cases having both a finitary and infinitary proof side by side for the same result may be more valuable than just having one of the two proofs. My rule of thumb is that if a finitary argument is already spitting out iterated tower-exponential type bounds or worse in an argument, this is a sign that the argument “wants” to be infinitary, and it may be simpler to move over to an infinitary setting (such as the nonstandard setting).