One of the most basic theorems in linear algebra is that every finite-dimensional vector space has a finite basis. Let us give a statement of this theorem in the case when the underlying field is the rationals:

Theorem 1 (Finite generation implies finite basis, infinitary version) Let $V$ be a vector space over the rationals $\mathbb{Q}$, and let $v_1,\ldots,v_n$ be a finite collection of vectors in $V$. Then there exists a collection $w_1,\ldots,w_{n'}$ of vectors in $V$, with $n' \leq n$, such that

- ($w$ generates $v$) Every $v_j$ can be expressed as a rational linear combination of the $w_1,\ldots,w_{n'}$.
- ($w$ independent) There is no non-trivial linear relation $a_1 w_1 + \ldots + a_{n'} w_{n'} = 0$ among the $w_1,\ldots,w_{n'}$ (where non-trivial means that the rationals $a_1,\ldots,a_{n'}$ are not all zero).

In fact, one can take $w_1,\ldots,w_{n'}$ to be a subset of the $v_1,\ldots,v_n$.

*Proof:* We perform the following “rank reduction argument”. Start with $w_1,\ldots,w_{n'}$ initialised to $v_1,\ldots,v_n$ (so initially we have $n' = n$). Clearly $w$ generates $v$. If the $w_1,\ldots,w_{n'}$ are linearly independent then we are done. Otherwise, there is a non-trivial linear relation between them; after shuffling things around, we see that one of the $w_i$, say $w_{n'}$, is a rational linear combination of the $w_1,\ldots,w_{n'-1}$. In such a case, $w_{n'}$ becomes redundant, and we may delete it (reducing $n'$ by one). We repeat this procedure; it can only run for at most $n$ steps and so terminates with $w_1,\ldots,w_{n'}$ obeying both of the desired properties.
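To make the argument concrete, here is a small computational sketch (my own illustration, not from the post; the helper names are made up) that carries out the rank reduction for vectors with rational coordinates, using exact arithmetic to detect linear relations:

```python
from fractions import Fraction

def find_dependence(vectors):
    """Return coefficients (a_1,...,a_k) of a non-trivial rational relation
    a_1 v_1 + ... + a_k v_k = 0, or None if the v_i are independent over Q."""
    k, dim = len(vectors), len(vectors[0])
    rows = [[Fraction(x) for x in v] for v in vectors]
    # coeffs[i] records how rows[i] was built from the original vectors
    coeffs = [[Fraction(i == j) for j in range(k)] for i in range(k)]
    pivots = []  # (row, column) of pivots found so far
    for i in range(k):
        for (pr, pc) in pivots:  # eliminate existing pivot columns from row i
            f = rows[i][pc] / rows[pr][pc]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[pr])]
            coeffs[i] = [a - f * b for a, b in zip(coeffs[i], coeffs[pr])]
        col = next((j for j in range(dim) if rows[i][j] != 0), None)
        if col is None:
            return coeffs[i]  # row i vanished: a non-trivial relation
        pivots.append((i, col))
    return None

def reduce_to_basis(vectors):
    """Rank reduction: delete a redundant vector until none remain."""
    basis = list(vectors)
    while (rel := find_dependence(basis)) is not None:
        # delete the last vector appearing with a non-zero coefficient;
        # it is a rational combination of the remaining ones
        del basis[max(j for j, a in enumerate(rel) if a != 0)]
    return basis
```

Since each pass of the loop deletes one vector, it runs at most $n$ times, mirroring the termination count in the proof.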

In additive combinatorics, one often wants to use results like this in finitary settings, such as that of a cyclic group $\mathbb{Z}/p\mathbb{Z}$ where $p$ is a large prime. Now, technically speaking, $\mathbb{Z}/p\mathbb{Z}$ is not a vector space over $\mathbb{Q}$, because one can only multiply an element of $\mathbb{Z}/p\mathbb{Z}$ by a rational number if the denominator of that rational is not divisible by $p$. But for $p$ very large, $\mathbb{Z}/p\mathbb{Z}$ “behaves” like a vector space over $\mathbb{Q}$, at least if one restricts attention to the rationals of “bounded height” – where the numerator and denominator of the rationals are bounded. Thus we shall refer to elements of $\mathbb{Z}/p\mathbb{Z}$ as “vectors” over $\mathbb{Q}$, even though strictly speaking this is not quite the case.

On the other hand, saying that one element of $\mathbb{Z}/p\mathbb{Z}$ is a rational linear combination of another set of elements is not a very interesting statement: any non-zero element of $\mathbb{Z}/p\mathbb{Z}$ already generates the entire space! However, if one again restricts attention to rational linear combinations *of bounded height*, then things become interesting again. For instance, the vector $1$ can generate elements such as $2$ or $-3 = p-3$ using rational linear combinations of bounded height, but will not be able to generate such elements of $\mathbb{Z}/p\mathbb{Z}$ as $\lfloor \sqrt{p} \rfloor$ without using rational numbers of unbounded height.
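One can see this numerically. The following toy computation (my own example, with an arbitrarily chosen prime and height bound, not from the post) lists every residue that the vector $1$ generates at bounded height, and confirms that only a tiny fraction of $\mathbb{Z}/p\mathbb{Z}$ is reached:

```python
# Toy computation: which residues does the "vector" 1 generate in Z/pZ
# using rational coefficients a/b of height at most H?
p = 10007   # an arbitrarily chosen prime, large compared to H
H = 10      # height bound: |numerator| and denominator at most H

reachable = set()
for a in range(-H, H + 1):
    for b in range(1, H + 1):
        # (a/b) * 1 in Z/pZ: multiply a by the inverse of b mod p
        reachable.add(a * pow(b, -1, p) % p)

print(f"{len(reachable)} of {p} residues reachable at height {H}")
```

The number of reachable residues is bounded independently of $p$, while a “generic” residue such as $\lfloor\sqrt{p}\rfloor$ is missed.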

For similar reasons, the notion of linear independence over the rationals doesn’t initially look very interesting over $\mathbb{Z}/p\mathbb{Z}$: any two non-zero elements of $\mathbb{Z}/p\mathbb{Z}$ are of course rationally dependent. But again, if one restricts attention to rational numbers of bounded height, then independence begins to emerge: for instance, $1$ and $\lfloor \sqrt{p} \rfloor$ are independent in this sense.

Thus, it becomes natural to ask whether there is a “quantitative” analogue of Theorem 1, with non-trivial content in the case of “vector spaces over the bounded height rationals” such as $\mathbb{Z}/p\mathbb{Z}$, which asserts that given any bounded collection $v_1,\ldots,v_n$ of elements, one can find another collection $w_1,\ldots,w_{n'}$ which is linearly independent “over the rationals up to some height”, such that the $v_1,\ldots,v_n$ can be generated by the $w_1,\ldots,w_{n'}$ “over the rationals up to some height”. Of course to make this rigorous, one needs to quantify the two heights here, the one giving the independence, and the one giving the generation. In order to be useful for applications, it turns out that one often needs the former height to be much larger than the latter; exponentially larger, for instance, is not an uncommon request. Fortunately, one can accomplish this, at the cost of making the former height somewhat large:

Theorem 2 (Finite generation implies finite basis, finitary version) Let $n$ be an integer, and let $F: \mathbb{N} \to \mathbb{N}$ be a function. Let $V$ be an abelian group which admits a well-defined division operation by any natural number of size at most $C(F,n)$ for some constant $C(F,n)$ depending only on $F,n$; for instance one can take $V = \mathbb{Z}/p\mathbb{Z}$ for a prime $p$ larger than $C(F,n)$. Let $v_1,\ldots,v_n$ be a finite collection of “vectors” in $V$. Then there exists a collection $w_1,\ldots,w_{n'}$ of vectors in $V$, with $n' \leq n$, as well as an integer $M \geq 1$, such that

- (Complexity bound) $M \leq C(F,n)$ for some $C(F,n)$ depending only on $F$, $n$.
- ($w$ generates $v$) Every $v_j$ can be expressed as a rational linear combination of the $w_1,\ldots,w_{n'}$ of height at most $M$ (i.e. the numerator and denominator of the coefficients are at most $M$).
- ($w$ independent) There is no non-trivial linear relation $a_1 w_1 + \ldots + a_{n'} w_{n'} = 0$ among the $w_1,\ldots,w_{n'}$ in which the $a_1,\ldots,a_{n'}$ are rational numbers of height at most $F(M)$.

In fact, one can take $w_1,\ldots,w_{n'}$ to be a subset of the $v_1,\ldots,v_n$.

*Proof:* We perform the same “rank reduction argument” as before, but translated to the finitary setting. Start with $w_1,\ldots,w_{n'}$ initialised to $v_1,\ldots,v_n$ (so initially we have $n' = n$), and initialise $M = 1$. Clearly $w$ generates $v$ at this height. If the $w_1,\ldots,w_{n'}$ are linearly independent up to rationals of height $F(M)$ then we are done. Otherwise, there is a non-trivial linear relation between them; after shuffling things around, we see that one of the $w_i$, say $w_{n'}$, is a rational linear combination of the $w_1,\ldots,w_{n'-1}$, whose height is bounded by some quantity depending on $n'$ and $F(M)$. In such a case, $w_{n'}$ becomes redundant, and we may delete it (reducing $n'$ by one), but note that in order for the remaining $w_1,\ldots,w_{n'-1}$ to generate $v_1,\ldots,v_n$ we need to raise the height upper bound for the rationals involved from $M$ to some quantity $M'$ depending on $M$, $n'$, and $F(M)$. We then replace $M$ by $M'$ and continue the process. We repeat this procedure; it can only run for at most $n$ steps and so terminates with $w_1,\ldots,w_{n'}$ and $M$ obeying all of the desired properties. (Note that the bound on $M$ is quite poor, being essentially an $n$-fold iteration of $F$! Thus, for instance, if $F$ is exponential, then the bound on $M$ is tower-exponential in nature.)
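The loss in the complexity bound can be made concrete: each of the at most $n$ deletion steps replaces the current height bound by (at least) roughly $F$ of it, so the final bound is essentially an $n$-fold iteration of $F$. A minimal bookkeeping sketch (my own illustration, simplifying the true update, which also depends on $n$):

```python
# Final height bound after n rank-reduction steps, each of which
# replaces the current bound M by F(M).
def iterated_height(F, n, M=1):
    for _ in range(n):
        M = F(M)
    return M

print(iterated_height(lambda m: m + 2, 5))   # linear F: a modest bound
print(iterated_height(lambda m: 2 ** m, 4))  # exponential F: a tower
```

With $F(m) = 2^m$ the bound after only four steps is already $2^{2^{2^2}} = 65536$, illustrating the tower-exponential behaviour mentioned above.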

(A variant of this type of approximate basis lemma was used in my paper with Van Vu on the singularity probability of random Bernoulli matrices.)

Looking at the statements and proofs of these two theorems, it is clear that the two results are in some sense the “same” result, except that the latter has been made sufficiently quantitative that it is meaningful in such finitary settings as $\mathbb{Z}/p\mathbb{Z}$. In this note I will show how this equivalence can be made formal using the language of non-standard analysis. This is not a particularly deep (or new) observation, but it is perhaps the simplest example I know of that illustrates how nonstandard analysis can be used to transfer a quantifier-heavy finitary statement, such as Theorem 2, into a quantifier-light infinitary statement, such as Theorem 1, thus lessening the need to perform “epsilon management” duties, such as keeping track of unspecified growth functions such as $F$. This type of transference is discussed at length in this previous blog post of mine.

In this particular case, the amount of effort needed to set up the nonstandard machinery in order to deduce Theorem 2 from Theorem 1 is too great for this transference to be particularly worthwhile, especially given that Theorem 2 has such a short proof. However, when performing a particularly intricate argument in additive combinatorics, in which one is performing a number of “rank reduction arguments”, “energy increment arguments”, “regularity lemmas”, “structure theorems”, and so forth, the purely finitary approach can become bogged down with all the epsilon management one needs to do to organise all the parameters that are flying around. The nonstandard approach can efficiently hide a large number of these parameters from view, and it can then become worthwhile to invest in the nonstandard framework in order to clean up the rest of a lengthy argument. Furthermore, an advantage of moving up to the infinitary setting is that one can then deploy all the firepower of an existing well-developed infinitary theory of mathematics (in this particular case, this would be the theory of linear algebra) out of the box, whereas in the finitary setting one would have to painstakingly finitise each aspect of such a theory that one wished to use (imagine for instance trying to finitise the rank-nullity theorem for rationals of bounded height).

The nonstandard approach is very closely related to use of compactness arguments, or of the technique of taking ultralimits and ultraproducts; indeed we will use an ultrafilter in order to create the nonstandard model in the first place.

I will also discuss two variants of both Theorem 1 and Theorem 2 which have actually shown up in my research. The first is that of the *regularity lemma* for polynomials over finite fields, which came up when studying the equidistribution of such polynomials (in this paper with Ben Green). The second comes up when one is dealing not with a single finite collection $v_1,\ldots,v_n$ of vectors, but rather with a *family* $(v_1(h),\ldots,v_n(h))_{h \in H}$ of such collections, where $h$ ranges over a large set $H$; this gives rise to what we call the *sunflower lemma*, and came up in this recent paper of myself, Ben Green, and Tamar Ziegler.

This post is mostly concerned with nonstandard translations of the “rank reduction argument”. Nonstandard translations of the “energy increment argument” and “density increment argument” were briefly discussed in this recent post; I may return to this topic in more detail in a future post.

** — 1. Equivalence of Theorems 1 and 2 — **

Both Theorem 1 and Theorem 2 are easy enough to prove. But we will now spend a certain amount of effort in showing that one can deduce each theorem from the other without actually going through the proof of either. This may not seem particularly worthwhile (or to be serious overkill) in the case of these two particular theorems, but the method of deduction is extremely general, and can be used to relate much deeper and more difficult infinitary and finitary theorems to each other without a significant increase in effort. (This is closely related to various correspondence principles between combinatorics and parts of infinitary mathematics, such as ergodic theory; see also my post on soft and hard analysis for a closely related equivalence.)

Let’s first show why the finitary theorem, Theorem 2, implies Theorem 1. We argue by contradiction. If Theorem 1 failed, then we could find a vector space $V$ over the rationals, and a finite collection $v_1,\ldots,v_n$ of vectors, for which *no* finite subcollection $w_1,\ldots,w_{n'}$ of the $v_1,\ldots,v_n$ obeyed both the generation property and the linear independence property. In other words, whenever a subcollection $w_1,\ldots,w_{n'}$ happened to generate the $v_1,\ldots,v_n$ by rationals, then it must necessarily contain a linear dependence.

We use this to create a function $F: \mathbb{N} \to \mathbb{N}$ as follows. Given any natural number $M$, consider all the finite subcollections of $v_1,\ldots,v_n$ which can generate the $v_1,\ldots,v_n$ using rationals of height at most $M$. By the above hypothesis, all such subcollections contain a linear dependence involving rationals of some finite height. There may be many such dependences; we pick one arbitrarily. We then choose $F(M)$ to be any natural number larger than the heights of all the rationals involved in all the linear dependencies thus chosen. (Here we implicitly use the fact that there are only finitely many subcollections of the $v_1,\ldots,v_n$ to search through.)

Having chosen this function $F$, we then apply Theorem 2 to the vectors $v_1,\ldots,v_n$ and this choice of function $F$, to obtain a subcollection $w_1,\ldots,w_{n'}$ and a height $M$, such that the $w_1,\ldots,w_{n'}$ generate the $v_1,\ldots,v_n$ using rationals of height at most $M$, and have no linear dependence involving rationals of height at most $F(M)$. But this contradicts the construction of $F$, and gives the claim.

Remark 1 Note how important it is here that the growth function $F$ in Theorem 2 is not specified in advance, but is instead a parameter that can be set to be as “large” as needed. Indeed, Theorem 2 for any *fixed* $F$ (e.g. exponential, tower-exponential, Ackermann, etc.) gives a statement which is strictly “weaker” than Theorem 1 in a sense that I will not try to make precise here; it is only the union of all these statements for all conceivable $F$ that gives the full strength of Theorem 1. A similar phenomenon occurs with the finite convergence principle. It is this “second order” nature of infinitary statements (they quantify not just over numerical parameters such as $n$ or $M$, but also over functional parameters such as $F$) that makes such statements appear deeper than finitary ones, but the distinction largely disappears if one is willing to perform such second-order quantifications.

Now we turn to the more interesting deduction, which is to obtain Theorem 2 from Theorem 1. Again, one argues by contradiction. Suppose that Theorem 2 failed. Carefully negating all the quantifiers (and using the axiom of choice), we conclude that there exists a function $F: \mathbb{N} \to \mathbb{N}$ and a natural number $n$ with the following property: given any natural number $K$, there exists an abelian group $V_K$ which is divisible up to height $K$, and elements $v_{1,K},\ldots,v_{n,K}$ in $V_K$, such that there is *no* subcollection $w_{1,K},\ldots,w_{n',K}$ of the $v_{1,K},\ldots,v_{n,K}$, together with an integer $M \leq K$, such that $w_{1,K},\ldots,w_{n',K}$ generate $v_{1,K},\ldots,v_{n,K}$ using rationals of height at most $M$, and such that the $w_{1,K},\ldots,w_{n',K}$ have no linear dependence using rationals of height at most $F(M)$.

We now perform an ultralimit as $K \to \infty$. We will not pause here to recall the machinery of ultrafilters, ultralimits, and ultraproducts, but refer the reader instead to this previous blog post of mine for discussion.

We pick a non-principal ultrafilter $\alpha$ of the natural numbers. Starting with the “standard” abelian groups $V_K$, we then form their ultraproduct $V = \prod_{K \to \alpha} V_K$, defined as the space of sequences $(v_K)_{K \in \mathbb{N}}$ with $v_K \in V_K$ for each $K$, modulo equivalence by $\alpha$; thus two sequences $(v_K)_{K \in \mathbb{N}}$ and $(v'_K)_{K \in \mathbb{N}}$ are considered equal if $v_K = v'_K$ for an $\alpha$-large set of $K$ (i.e. for a set of $K$ that lies in $\alpha$).

Now that non-standard objects are in play, we will need to take some care to distinguish between standard objects (e.g. standard natural numbers) and their nonstandard counterparts.

Since each of the $V_K$ is an abelian group, $V$ is also an abelian group (an easy special case of the transfer principle). Since each $V_K$ is divisible up to height $K$, the ultraproduct $V$ is divisible up to all (standard) heights; in other words, $V$ is actually a vector space over the (standard) rational numbers $\mathbb{Q}$. The point is that while none of the $V_K$ are, strictly speaking, vector spaces over $\mathbb{Q}$, they increasingly behave as if they were such spaces, and in the limit one recovers genuine vector space structure.

For each $1 \leq j \leq n$, one can take an ultralimit $v_j = \lim_{K \to \alpha} v_{j,K}$ of the elements $v_{j,K}$ to generate an element $v_j$ of the ultraproduct $V$. So now we have $n$ vectors $v_1,\ldots,v_n$ of a vector space $V$ over $\mathbb{Q}$ – precisely the setting of Theorem 1! So we apply that theorem and obtain a subcollection $w_1,\ldots,w_{n'}$ of the $v_1,\ldots,v_n$, such that each $v_j$ can be generated from the $w_1,\ldots,w_{n'}$ using (standard) rationals, and such that the $w_1,\ldots,w_{n'}$ are linearly independent over the (standard) rationals.

Since all (standard) rationals have a finite height, one can find a (standard) natural number $M$ such that each of the $v_j$ can be generated from the $w_1,\ldots,w_{n'}$ using (standard) rationals of height at most $M$. Undoing the ultralimit, we conclude that for an $\alpha$-large set of $K$’s, all of the $v_{j,K}$ can be generated from the $w_{1,K},\ldots,w_{n',K}$ using rationals of height at most $M$. But by hypothesis, this implies for all sufficiently large $K$ in this $\alpha$-large set, the $w_{1,K},\ldots,w_{n',K}$ contain a non-trivial rational dependence of height at most $F(M)$; clearing denominators, this gives

$$ a_{1,K} w_{1,K} + \ldots + a_{n',K} w_{n',K} = 0 $$

for some integers $a_{1,K},\ldots,a_{n',K}$ of magnitude at most $F(M)^{n'}$, with the $a_{j,K}$ not all zero.

By the pigeonhole principle (and the finiteness of the number of possible coefficients), each of the $a_{j,K}$ is constant in $K$ on an $\alpha$-large set of $K$. So if we take an ultralimit again to go back to the nonstandard world, the quantities $a_j = \lim_{K \to \alpha} a_{j,K}$ are standard integers (rather than merely nonstandard integers). Thus we have

$$ a_1 w_1 + \ldots + a_{n'} w_{n'} = 0 $$

with the $a_j$ not all zero, i.e. we have a linear dependence amongst the $w_1,\ldots,w_{n'}$. But this contradicts Theorem 1.

** — 2. Polynomials over finite fields — **

Let $\mathbb{F}$ be a fixed finite field (e.g. the field $\mathbb{F}_2$ of two elements), and consider a high-dimensional finite vector space $V$ over $\mathbb{F}$. A polynomial $P: V \to \mathbb{F}$ of degree $\leq d$ can then be defined as a combination of monomials each of degree at most $d$, or alternatively as a function whose $(d+1)$-fold iterated derivatives (i.e. finite differences) all vanish; see this previous blog post for some discussion of this equivalence.

We define the *rank* of a degree $d$ polynomial $P$ to be the least number $r$ of degree $\leq d-1$ polynomials $Q_1,\ldots,Q_r$, such that $P$ is completely determined by $Q_1,\ldots,Q_r$, i.e. $P = f(Q_1,\ldots,Q_r)$ for some function $f: \mathbb{F}^r \to \mathbb{F}$. In the case when $P$ has degree $2$, this concept is very close to the familiar rank of a quadratic form or matrix.
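For intuition, the condition in this definition is easy to test by brute force in tiny examples. The sketch below (my own toy code over the two-element field, not from the post) checks whether a polynomial $P$ on $\mathbb{F}_2^k$ is completely determined by a given list of lower-degree polynomials:

```python
from itertools import product

def determined_by(P, Qs, k):
    """Check whether P = f(Q_1,...,Q_r) for some f, on F_2^k: P may not
    take two different values at points where all the Q_i agree."""
    seen = {}
    for x in product((0, 1), repeat=k):
        key = tuple(Q(x) for Q in Qs)
        if seen.setdefault(key, P(x)) != P(x):
            return False  # equal Q-values but different P-values
    return True

# P(x) = x1 x2 + x3 x4 over F_2, a degree-2 polynomial
P = lambda x: (x[0] * x[1] + x[2] * x[3]) % 2
coords = [lambda x, i=i: x[i] for i in range(4)]  # the four linear coordinate forms

print(determined_by(P, coords, 4))      # True: all four coordinates suffice
print(determined_by(P, coords[:2], 4))  # False: x1, x2 alone do not
```

This only tests one candidate family of lower-degree polynomials at a time; computing the rank itself would require searching over all such families.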

A generalisation of the notion of linear independence is that of linear independence modulo low rank. Let us call a collection $P_1,\ldots,P_m$ of degree $\leq d$ polynomials *$M$-linearly independent* if every non-trivial linear combination $c_1 P_1 + \ldots + c_m P_m$, with coefficients $c_1,\ldots,c_m \in \mathbb{F}$ not all zero, has rank at least $M$:

$$ \mathrm{rank}(c_1 P_1 + \ldots + c_m P_m) \geq M. $$
There is then the following analogue of Theorem 2:

Theorem 3 (Polynomial regularity lemma at one degree, finitary version) Let $d, m \geq 1$ be integers, let $\mathbb{F}$ be a finite field, and let $F: \mathbb{N} \to \mathbb{N}$ be a function. Let $V$ be a vector space over $\mathbb{F}$, and let $P_1,\ldots,P_m: V \to \mathbb{F}$ be polynomials of degree $\leq d$. Then there exists a collection $Q_1,\ldots,Q_{m'}: V \to \mathbb{F}$ of polynomials of degree $\leq d$, with $m' \leq m$, as well as an integer $M \geq 1$, such that

- (Complexity bound) $M \leq C(F,m,d)$ for some $C$ depending only on $F$, $m$, $d$.
- ($Q$ generates $P$) Every $P_j$ can be expressed as an $\mathbb{F}$-linear combination of the $Q_1,\ldots,Q_{m'}$, plus an error which has rank at most $M$.
- ($Q$ independent) There is no non-trivial linear relation $c_1 Q_1 + \ldots + c_{m'} Q_{m'} = E$ among the $Q_1,\ldots,Q_{m'}$ in which the error $E$ has rank at most $F(M)$.

In fact, one can take $Q_1,\ldots,Q_{m'}$ to be a subset of the $P_1,\ldots,P_m$.

This theorem can be proven in much the same way as Theorem 2, and the reader is invited to do so as an exercise. The constant $C(F,m,d)$ can in fact be taken to be independent of $d$ and of the field $\mathbb{F}$, but this is not important to us here.

Roughly speaking, Theorem 3 asserts that a finite family of degree $\leq d$ polynomials can be expressed as a linear combination of degree $\leq d$ polynomials which are “linearly independent modulo low rank errors”, plus some lower rank objects. One can think of this as *regularising* the degree $d$ polynomials, modulo combinations of lower degree polynomials. For applications (and in particular, for understanding equidistribution) one also needs to regularise the lower degree polynomials that arise this way, and so forth for increasingly lower degrees until all polynomials are regularised. (A similar phenomenon occurs for the hypergraph regularity lemma.)

When working with theorems like this, it is helpful to think conceptually of “quotienting out” by all polynomials of low rank. Unfortunately, in the finitary setting, the polynomials of low rank do not form a group, and so the quotient is ill-defined. However, this can be rectified by passing to the infinitary setting. Indeed, once one does so, one can quotient out the low rank polynomials, and Theorem 3 follows directly from Theorem 1 (or more precisely, the analogue of that theorem in which the field of rationals is replaced by the finite field $\mathbb{F}$).

Let’s see how this works. To prove Theorem 3, suppose for contradiction that the theorem failed. Then one can find $d$, $m$, $\mathbb{F}$, $F$, such that for every natural number $K$, one can find a vector space $V_K$ and polynomials $P_{1,K},\ldots,P_{m,K}: V_K \to \mathbb{F}$ of degree $\leq d$, for which there do not exist polynomials $Q_{1,K},\ldots,Q_{m',K}$ with $m' \leq m$ and an integer $M \leq K$ such that each $P_{j,K}$ can be expressed as a linear combination of the $Q_{1,K},\ldots,Q_{m',K}$ modulo an error of rank at most $M$, and such that there are no nontrivial linear relations amongst the $Q_{1,K},\ldots,Q_{m',K}$ modulo errors of rank at most $F(M)$.

Taking an ultralimit as before, we end up with a (nonstandard) vector space $V$ over $\mathbb{F}$ (which is likely to be infinite), and (nonstandard) polynomials $P_1,\ldots,P_m: V \to \mathbb{F}$ of degree $\leq d$ (here it is best to use the “local” definition of a polynomial of degree $\leq d$, as a (nonstandard) function whose $(d+1)$-fold iterated derivatives vanish, but one can also view this as a (nonstandard) sum of monomials if one is careful).

The space of (nonstandard) degree $\leq d$ polynomials on $V$ is a (nonstandard) vector space over $\mathbb{F}$. Inside this vector space, one has the subspace consisting of all polynomials whose rank is a standard integer (as opposed to a nonstandard integer); call these the *bounded rank* polynomials. This is easily seen to be a subspace (although it is not a *nonstandard* or *internal* subspace, i.e. the ultralimit of subspaces of the corresponding polynomial spaces over the $V_K$). As such, one can rigorously form the quotient space of degree $\leq d$ polynomials, modulo bounded rank polynomials.

The polynomials $P_1,\ldots,P_m$ then have representatives in this quotient space. Applying Theorem 1 (for the field $\mathbb{F}$), one can then find a subcollection $Q_1,\ldots,Q_{m'}$ whose representatives are linearly independent in this quotient space, and which generate the representatives of the $P_1,\ldots,P_m$. Undoing the quotient, we see that the $P_j$ are linear combinations of the $Q_1,\ldots,Q_{m'}$ plus a bounded rank error, while no nontrivial linear combination of $Q_1,\ldots,Q_{m'}$ has bounded rank. Undoing the ultralimit as in the previous section, we obtain the desired contradiction.

We thus see that in the nonstandard world, the somewhat nonrigorous concepts of “low rank” and “high rank” can be formalised as that of “bounded rank” and “unbounded rank”. Furthermore, the former space forms a subspace, so in the nonstandard world one can rigorously talk about “quotienting out by bounded rank errors”. Thus we see that the algebraic machinery of quotient spaces can be applied in the nonstandard world directly, whereas in the finitary world it can only be applied heuristically. In principle, one could also start deploying more advanced tools of abstract algebra (e.g. exact sequences, cohomology, etc.) in the nonstandard setting, although this has not yet seriously begun to happen in additive combinatorics (although there are strong hints of some sort of “additive cohomology” emerging in the body of work surrounding the inverse conjecture for the Gowers norm, especially on the ergodic theory side of things).

** — 3. Sunflowers — **

Now we return to vector spaces (or approximate vector spaces) over the rationals, such as $\mathbb{Z}/p\mathbb{Z}$ for a large prime $p$. Instead of working with a single (small) tuple $v_1,\ldots,v_n$ of vectors in $\mathbb{Z}/p\mathbb{Z}$, we now consider a *family* $(v_1(h),\ldots,v_n(h))_{h \in H}$ of such tuples in $\mathbb{Z}/p\mathbb{Z}$, where $h$ ranges over a large set $H$, for instance a dense subset of the interval $\{1,\ldots,N\}$ for some large $N$. This situation happens to show up in our recent work on the inverse conjecture for the Gowers norm, where the $v_1(h),\ldots,v_n(h)$ represent the various “frequencies” that arise in a derivative of a function with respect to the shift $h$. (This need to consider families is an issue that also comes up in the finite field ergodic theory analogue of the inverse conjectures, due to the unbounded number of generators in that case, but interestingly does not arise in the ergodic theory over $\mathbb{Z}$.)

In Theorem 2, the main distinction was between linear dependence and linear independence of the tuple $v_1,\ldots,v_n$ (or some reduction of this tuple, such as $w_1,\ldots,w_{n'}$). We will continue to be interested in the linear dependence or independence of the tuples $v_1(h),\ldots,v_n(h)$ for various $h$. But we also wish to understand how the $v_j(h)$ vary with $h$ as well. At one extreme (the “structured” case), there is no dependence on $h$: $v_j(h) = v_j$ for all $j$ and all $h$. At the other extreme (the “pseudorandom” case), the $v_j(h)$ are basically independent as $h$ varies; in particular, for (almost) all of the pairs $h, h' \in H$, the tuples $v_1(h),\ldots,v_n(h)$ and $v_1(h'),\ldots,v_n(h')$ are not just separately independent, but are *jointly* independent. One can think of $v_1(h),\ldots,v_n(h)$ and $v_1(h'),\ldots,v_n(h')$ as being in “general position” relative to each other.

The *sunflower lemma* asserts that any family $(v_1(h),\ldots,v_n(h))_{h \in H}$ is basically a combination of the above scenarios, thus one can divide the family into a linearly independent *core* collection $u_1,\ldots,u_{n_1}$ of vectors that do not depend on $h$, together with *petals* $w_1(h),\ldots,w_{n_2}(h)$, which are in “general position” in the above sense, relative to the core. However, as a price one pays for this, one has to refine $H$ to a dense subset $H'$ of $H$. This lemma, which significantly generalises Theorem 2, is formalised as follows:

Theorem 4 (Sunflower lemma, finitary version) Let $n$ be an integer, and let $F: \mathbb{N} \to \mathbb{N}$ be a function. Let $V$ be an abelian group which admits a well-defined division operation by any natural number of size at most $C(F,n)$ for some constant $C(F,n)$ depending only on $F,n$. Let $H$ be a finite set, and let $(v_1(h),\ldots,v_n(h))_{h \in H}$ be a collection of $n$-tuples of vectors in $V$ indexed by $H$. Then there exists a subset $H'$ of $H$, integers $n_1, n_2 \geq 0$ with $n_1 + n_2 \leq n$, a collection $u_1,\ldots,u_{n_1}$ of “core” vectors in $V$, a collection $w_1(h),\ldots,w_{n_2}(h)$ of “petal” vectors for each $h \in H'$, as well as an integer $M \geq 1$, such that

- (Complexity bound) $M \leq C(F,n)$ for some $C$ depending only on $F$, $n$.
- ($H'$ dense) One has $|H'| \geq c(F,n)|H|$ for some $c(F,n) > 0$ depending only on $F$, $n$.
- ($u, w$ generate $v$) Every $v_j(h)$ with $h \in H'$ and $1 \leq j \leq n$ can be expressed as a rational linear combination of the $u_1,\ldots,u_{n_1}$ and $w_1(h),\ldots,w_{n_2}(h)$ of height at most $M$.
- ($u$ independent) There is no non-trivial rational linear relation among the $u_1,\ldots,u_{n_1}$ of height at most $F(M)$.
- ($w$ in general position relative to $u$) More generally, for all but at most $\frac{1}{F(M)}|H'|^2$ of the pairs $(h,h') \in H' \times H'$, there is no non-trivial linear relation among $u_1,\ldots,u_{n_1}, w_1(h),\ldots,w_{n_2}(h), w_1(h'),\ldots,w_{n_2}(h')$ of height at most $F(M)$.

One can take the petal vectors $w_1(h),\ldots,w_{n_2}(h)$ to be a subcollection of the $v_1(h),\ldots,v_n(h)$, though this is not particularly useful in applications.

*Proof:* We perform a two-parameter “rank reduction argument”, where the rank is now indexed by the pair $(n_2, n_1)$ (ordered lexicographically). We initially set $H' = H$, $n_1 = 0$, $n_2 = n$, $M = 1$, and $w_j(h) = v_j(h)$ for $j = 1,\ldots,n$.

At each stage of the iteration, $u_1,\ldots,u_{n_1}$ and $w_1(h),\ldots,w_{n_2}(h)$ will generate $v_1(h),\ldots,v_n(h)$ for $h \in H'$ (at height $M$), and we will have some complexity bound on $M$ and some density bound on $H'$. So one needs to check the independence of the core and the general position of the petals relative to the core.

If there is a linear relation among $u_1,\ldots,u_{n_1}$ at height $F(M)$, then one can use this to reduce the size $n_1$ of the core by one, leaving the petal size $n_2$ unchanged, just as in the proof of Theorem 2. So let us move on, and suppose that there is no linear relation among the core at height $F(M)$, but instead there is a failure of the general position hypothesis. In other words, for at least $\frac{1}{F(M)}|H'|^2$ pairs $(h,h') \in H' \times H'$, one can find a relation of the form

$$ \sum_{j=1}^{n_2} a_j(h,h')\, w_j(h) + \sum_{j=1}^{n_2} b_j(h,h')\, w_j(h') + \sum_{j=1}^{n_1} c_j(h,h')\, u_j = 0 $$

where the $a_j(h,h'), b_j(h,h'), c_j(h,h')$ are rationals of height at most $F(M)$, not all zero. The number of possible values for such rationals is bounded by some quantity depending on $n$ and $F(M)$. Thus, by the pigeonhole principle, we can find $\gg |H'|^2$ pairs (i.e. at least $c|H'|^2$ pairs for some $c > 0$ depending only on $n$ and $F(M)$) such that

$$ \sum_{j=1}^{n_2} a_j w_j(h) + \sum_{j=1}^{n_2} b_j w_j(h') + \sum_{j=1}^{n_1} c_j u_j = 0 $$

for some fixed rationals $a_j, b_j, c_j$ of height at most $F(M)$. By the pigeonhole principle again, we can then find a fixed $h' \in H'$ such that

$$ \sum_{j=1}^{n_2} a_j w_j(h) = u $$

for all $h$ in some subset $H''$ of $H'$ with $|H''| \gg |H'|$, where

$$ u := -\sum_{j=1}^{n_2} b_j w_j(h') - \sum_{j=1}^{n_1} c_j u_j. $$

If the $a_j$ and $b_j$ all vanished then we would have a linear dependence amongst the core vectors, which we already know how to deal with. So suppose that we have at least one active petal coefficient, say $a_{n_2} \neq 0$ (swapping the roles of $h$ and $h'$ if necessary). Then upon rearranging, we can express $w_{n_2}(h)$ as some rational linear combination of the original core vectors $u_1,\ldots,u_{n_1}$, a new core vector $u$, and the other petals $w_1(h),\ldots,w_{n_2-1}(h)$, with heights bounded by some quantity depending on $n$ and $F(M)$. We may thus refine $H'$ to $H''$, delete the petal vector $w_{n_2}$, and add the vector $u$ to the core, thus decreasing $n_2$ by one and increasing $n_1$ by one. One still has the generation property so long as one replaces $M$ with a larger quantity $M'$ depending on $M$, $n$, and $F(M)$.

Since each iteration of this process either reduces $n_1$ by one keeping $n_2$ fixed, or reduces $n_2$ by one increasing $n_1$ by one, we see that after at most $2n$ steps, the process must terminate, when we have both the linear independence of the core property and the general position of the petals property. (Note here that we are basically performing a proof by infinite descent.) At that stage, one easily verifies that we have obtained all the required conclusions of the theorem.
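The termination count can be checked mechanically: abstracting the proof to its two moves on the pair $(n_1, n_2)$, an exhaustive search (my own sanity check, not from the post) confirms that no sequence of moves starting from $(0, n)$ lasts longer than $2n$ steps:

```python
# Move A: delete a core vector, n1 -> n1 - 1 (needs n1 > 0).
# Move B: convert a petal into a core vector, (n1, n2) -> (n1 + 1, n2 - 1).
def max_steps(n1, n2, memo={}):
    """Length of the longest possible sequence of moves from (n1, n2)."""
    if (n1, n2) not in memo:
        best = 0
        if n1 > 0:
            best = max(best, 1 + max_steps(n1 - 1, n2))
        if n2 > 0:
            best = max(best, 1 + max_steps(n1 + 1, n2 - 1))
        memo[(n1, n2)] = best
    return memo[(n1, n2)]

for n in range(1, 7):
    assert max_steps(0, n) == 2 * n  # at most n B-moves, hence at most n A-moves
print("verified: the iteration runs for at most 2n steps")
```

The reason is simple: $n_2$ only ever decreases, so there are at most $n$ B-moves, and since only B-moves increase $n_1$, there are at most $n$ A-moves as well.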

As one can see, this result is a little bit trickier to prove than Theorem 2. Let us now see how it will translate to the nonstandard setting, and see what the nonstandard analogue of Theorem 4 is. We will skip some details, and get to the point where we can motivate and prove this nonstandard analogue; this analogue does in fact imply Theorem 4 by repeating the arguments from previous sections, but we will leave this as an exercise for the interested reader.

As before, the starting point is to introduce a parameter $K$, so that the approximate vector space $V = V_K$ now depends on $K$ (and becomes an actual vector space in the ultralimit $K \to \alpha$), and the parameter set $H = H_K$ now also depends on $K$. We will think of $|H_K|$ as going to infinity as $K \to \infty$, as this is the most interesting case (for bounded $|H_K|$, the result basically collapses back to Theorem 2). In that case, the ultralimit $H$ of the $H_K$ is a nonstandard finite set (i.e. an ultralimit of finite sets) whose (nonstandard) cardinality $|H|$ is an *unbounded* nonstandard integer: it is a nonstandard integer (indeed, it is the ultralimit of the $|H_K|$) which is larger than any standard integer. On the other hand, $n$ and $F$ remain standard (i.e. they do not involve $K$).

For each $K$, one starts with a family $(v_1(h),\ldots,v_n(h))_{h \in H_K}$ of $n$-tuples of vectors in $V_K$. Taking ultralimits, one ends up with a family $(v_1(h),\ldots,v_n(h))_{h \in H}$ of $n$-tuples of vectors in $V$. Furthermore, for each $1 \leq j \leq n$, the maps $h \mapsto v_j(h)$ are *nonstandard* (or *internal*) functions from $H$ to $V$, i.e. they are ultralimits of maps from $H_K$ to $V_K$. The internal nature of these maps (which is a kind of “measurability” condition on these functions) will be important later. Of course, $H$ and $V$ are also internal (being ultralimits of $H_K$ and $V_K$ respectively).

We say that a subset $H'$ of $H$ is *dense* if it is an internal subset (i.e. it is the ultralimit of some subsets $H'_K$ of $H_K$), and if $|H'| \geq \varepsilon|H|$ for some standard $\varepsilon > 0$ (recall that $|H'|, |H|$ are nonstandard integers). If an internal subset is not dense, we say that it is *sparse*, which in nonstandard asymptotic notation is equivalent to $|H'| = o(|H|)$. If a statement $P(h)$ holds on all $h$ in a dense set of $H$, we say that it holds for *many* $h$; if it holds for all $h$ outside of a sparse set, we say it holds for *almost all* $h$. These are analogous to the more familiar concepts of “holding with positive probability” and “holding almost surely” in probability theory. For instance, if $P(h)$ holds for many $h$ in $H$, and $Q(h)$ holds for almost all $h$ in $H$, then $P(h)$ and $Q(h)$ jointly hold for many $h$ in $H$. Note how all the epsilons have been neatly hidden away in this nonstandard framework.

Now we state the nonstandard analogue of Theorem 4.

Theorem 5 (Sunflower lemma, nonstandard version) Let $n$ be a (standard) integer, let $V$ be a (nonstandard) vector space over the standard rationals $\mathbb{Q}$, and let $H$ be a (nonstandard) set. Let $(v_1(h),\ldots,v_n(h))_{h \in H}$ be a collection of $n$-tuples of vectors in $V$ indexed by $H$, such that all the maps $h \mapsto v_j(h)$ for $1 \leq j \leq n$ are internal. Then there exists a dense subset $H'$ of $H$, a bounded-dimensional subspace $U$ of $V$, a (standard) integer $n_2$ with $n_2 \leq n$, and a collection $w_1(h),\ldots,w_{n_2}(h)$ of “petal” vectors for each $h \in H'$, with the maps $h \mapsto w_j(h)$ being internal for all $1 \leq j \leq n_2$, such that

- ($U, w$ generate $v$) Every $v_j(h)$ with $h \in H'$ and $1 \leq j \leq n$ lies in the span of $U$ and the $w_1(h),\ldots,w_{n_2}(h)$.
- ($w$ in general position relative to $U$) For almost all of the pairs $(h,h') \in H' \times H'$, the vectors $w_1(h),\ldots,w_{n_2}(h), w_1(h'),\ldots,w_{n_2}(h')$ are linearly independent modulo $U$ over $\mathbb{Q}$.

Of course, using Theorem 1 one could obtain a basis $u_1,\ldots,u_{n_1}$ for $U$ with $n_1 = \dim(U)$, at which point the theorem more closely resembles Theorem 4.

*Proof:* Define a *partial representation* of the family to be a dense subset $H'$ of $H$, a bounded dimensional subspace $U$ of $V$, a standard integer $n_2$ with $n_2 \leq n$, and a collection of petals $w_1(h),\ldots,w_{n_2}(h)$ depending internally on $h \in H'$, that obeys the generation property (but not necessarily the general position property). Clearly we have at least one partial representation, namely the trivial one where $H' = H$, $U = \{0\}$, $n_2 = n$, and $w_j(h) = v_j(h)$ for all $j$. Now, among all such partial representations, let us take a representation with the minimal value of $n_2$. (Here we are of course using the well-ordering property of the standard natural numbers.) We claim that this representation enjoys the general position property, which will give the claim.

Indeed, suppose this was not the case. Then, for many pairs $(h,h') \in H' \times H'$, the vectors $w_1(h),\ldots,w_{n_2}(h), w_1(h'),\ldots,w_{n_2}(h')$ have a linear dependence modulo $U$ over $\mathbb{Q}$. (Actually, there is a technical “measurability” issue to address here, which I will return to later.) By symmetry and pigeonholing, we may assume that the coefficient of (say) $w_{n_2}(h)$ in this dependence is non-zero. (Again, there is a measurability issue here.) Applying the pigeonhole principle, one can find $h' \in H'$ such that the vectors

$$ w_1(h),\ldots,w_{n_2}(h) $$

have a linear dependence over $\mathbb{Q}$, with the coefficient of $w_{n_2}(h)$ non-zero, modulo $U$ and $w_1(h'),\ldots,w_{n_2}(h')$, for many $h \in H'$. (Again, there is a measurability issue here.)

Fix $h'$. The number of possible rational linear combinations of $w_1(h'),\ldots,w_{n_2}(h')$ is countable. Because of this (and using a “countable pigeonhole principle” that I will address below), we can find a *fixed* rational linear combination $u$ of the $w_1(h'),\ldots,w_{n_2}(h')$ such that

$$ w_1(h),\ldots,w_{n_2}(h), u $$

have a linear dependence over $\mathbb{Q}$ modulo $U$, with the coefficient of $w_{n_2}(h)$ non-zero, for all $h$ in some dense subset $H''$ of $H'$. But now one can pass from $H'$ to the dense subset $H''$, delete the petal $w_{n_2}$, and add the vector $u$ to the core space $U$ (enlarging it by at most one dimension), thus creating a partial representation with a smaller value of $n_2$, contradicting minimality, and we are done.

We remark here that whereas the finitary analogue of this result was proven using the method of infinite descent, the nonstandard version could instead be proven using the (equivalent) well-ordering principle. One could easily recast the nonstandard version in descent form also, but it is somewhat more difficult to cast the finitary argument using well-ordering due to the extra parameters and quantifiers in play.

Let us now address the measurability issues. The main problem here is that the property of having a linear dependence over the standard rationals is not an internal property, because it requires knowledge of what the standard rationals are, which is not an internal concept in the language of vector spaces. However, for each fixed choice of rational coefficients, the property of having a specific linear dependence with those selected coefficients *is* an internal concept (here we crucially rely on the hypothesis that the maps $h \mapsto w_j(h)$ were internal), so really what we have here is a sort of “$\sigma$-internal” property (a countable union of internal properties). But this is good enough for many purposes. In particular, we have

Lemma 6 (Countable pigeonhole principle) Let be a nonstandardly finite set (i.e. the ultralimit of finite sets ), and for each standard natural number , let be an internal subset of . Then one of the following holds:

- (Positive density) There exists a natural number such that for many (i.e. is a dense subset of ).
- (Zero density) For almost all , one has for all . (In other words, the (external) set in is contained in a sparse subset of .)

This lemma is sufficient to resolve all the measurability issues raised in the previous proof. It is analogous to the trivial statement in measure theory that given a countable collection of measurable subsets of a space of positive measure, either one of the measurable sets has positive measure, or else their union has measure zero (i.e. the sets fail to cover almost all of the space).
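As a numerical illustration of the measure-theoretic analogy (a toy Python sketch with hypothetical sets, not part of the argument): countably many sparse sets can only cover a correspondingly small portion of the space, by subadditivity.

```python
# Toy illustration of the subadditivity fact behind the analogy: pairwise
# disjoint sets E_n of density 2^-(n+1) in {0, ..., N-1}; their union has
# density at most the sum 1/2 + 1/4 + ... < 1, so it cannot cover everything.
N = 10_000
E = [set(range(2 ** n, N, 2 ** (n + 1))) for n in range(10)]

densities = [len(En) / N for En in E]
union = set().union(*E)

# Subadditivity: the union's density never exceeds the sum of the densities.
assert len(union) / N <= sum(densities) + 1e-12
# Each E_n is individually sparse, and the union still misses part of the space.
assert len(union) < N
```

The point of Lemma 6 is that the same dichotomy survives in the nonstandard setting, where “sparse” and “dense” replace “measure zero” and “positive measure”.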

*Proof:* If any of the are dense, we are done. So suppose this is not the case. Since is a definable subset of which is not dense, it is sparse, thus . Now it is convenient to undo the ultralimit and work with the finite sets of which is the ultralimit. Note that each , being internal, is also an ultralimit of some finite subsets of .

For each standard integer , the set is sparse in , and in particular has density less than . Thus, one can find a -large set such that

for all . One can arrange matters so that the are decreasing in . One then sets the set to equal , where is the smallest integer for which (or is empty if lies in all the , or in none), and lets be the ultralimit of the . Then we see that for every standard , and so is a sparse subset of . Furthermore, contains for every standard , and so we are in the zero density conclusion of the argument.

(Curiously, I don’t see how to prove this lemma without unpacking the limit; it doesn’t seem to follow just from, say, the overspill principle. Instead, it seems to be exploiting the weak countable saturation property I mentioned in a previous post. But perhaps I missed a simple argument.)

**— 4. Summary —**

Let me summarise with a brief list of pros and cons of switching to a nonstandard framework. First, the pros:

- Many “first-order” parameters such as or disappear from view, as do various “negligible” errors. More importantly, “second-order” parameters, such as the function appearing in Theorem 2, also disappear from view. (In principle, third-order and higher parameters would also disappear, though I do not yet know of an actual finitary argument in my fields of study which would have used such parameters (with the exception of Ramsey theory, where such parameters must come into play in order to generate such enormous quantities as Graham’s number).) As such, a lot of tedious “epsilon management” disappears.
- Iterative (and often parameter-heavy) arguments can often be replaced by minimisation (or more generally, extremisation) arguments, taking advantage of such properties as the well-ordering principle, the least upper bound axiom, or compactness.
- The transfer principle lets one use “for free” any (first-order) statement about standard mathematics in the non-standard setting (provided that all objects involved are *internal*; see below).
- Mature and powerful theories from infinitary mathematics (e.g. linear algebra, real analysis, representation theory, topology, functional analysis, measure theory, Lie theory, ergodic theory, model theory, etc.) can be used rigorously in a nonstandard setting (as long as one is aware of the usual infinitary pitfalls, of course; see below).
- One can formally define terms that correspond to what would otherwise only be heuristic (or heavily parameterised and quantified) concepts such as “small”, “large”, “low rank”, “independent”, “uniformly distributed”, etc.
- The conversion from a standard result to its nonstandard counterpart, or vice versa, is fairly quick (but see below), and generally only needs to be done once or twice per paper.

Next, the cons:

- Often requires the axiom of choice, as well as a certain amount of set theory. (There are however weakened versions of nonstandard analysis that can avoid choice that are still suitable for many applications.)
- One needs the machinery of ultralimits and ultraproducts to set up the conversion from standard to nonstandard structures.
- The conversion usually proceeds by a proof by contradiction, which (in conjunction with the use of ultralimits) may not be particularly intuitive.
- One cannot efficiently discern what quantitative bounds emerge from a nonstandard argument (other than by painstakingly converting it back to a standard one, or by applying the tools of *proof mining*). (On the other hand, in particularly convoluted standard arguments, the quantitative bounds are already so poor – e.g. of iterated tower-exponential type – that losing these bounds is no great loss.)
- One has to take some care to distinguish between standard and nonstandard objects (and also between internal and external sets and functions, which are concepts somewhat analogous to measurable and non-measurable sets and functions in measure theory). More generally, all the usual pitfalls of infinitary analysis (e.g. interchanging limits, the need to ensure measurability or continuity) emerge in this setting, in contrast to the finitary setting where they are usually completely trivial.
- It can be difficult at first to conceptually visualise what nonstandard objects look like (although this becomes easier once one maps nonstandard analysis concepts to heuristic concepts such as “small” and “large” as mentioned earlier, thus for instance one can think of an unbounded nonstandard natural number as being like an incredibly large standard natural number).
- It is inefficient for both nonstandard and standard arguments to coexist within a paper; this makes things a little awkward if one for instance has to cite a result from a standard mathematics paper in a nonstandard mathematics one.
- There are philosophical objections to using mathematical structures that only exist abstractly, rather than corresponding to the “real world”. (Note though that similar objections were also raised in the past with regard to the use of, say, complex numbers, non-Euclidean geometries, or even negative numbers.)
- Formally, there is no increase in logical power gained by using nonstandard analysis (at least if one accepts the axiom of choice); anything which can be proven by nonstandard methods can also be proven by standard ones. In practice, though, the length and clarity of the nonstandard proof may be substantially better than the standard one.

In view of the pros and cons, I would not say that nonstandard analysis is suitable in all situations, nor is it unsuitable in all situations; one needs to carefully evaluate the costs and benefits in a given setting. Also, in some cases having both a finitary and infinitary proof side by side for the same result may be more valuable than just having one of the two proofs. My rule of thumb is that if a finitary argument is already spitting out iterated tower-exponential type bounds or worse, this is a sign that the argument “wants” to be infinitary, and it may be simpler to move over to an infinitary setting (such as the nonstandard setting).

## 10 comments


15 December, 2009 at 12:18 am

Anonymous

Professor Tao, I am appalled by the way you define things. Definitions and sometimes basic concepts lack the necessary rigor. The problem, I believe, is you don’t even know what to make of these terms, since if you did, you would be in a position to masterfully describe them in simplistic terms without the need to compensate in rigor.

If things are pseudorandom, then they are pseudorandom, there’s no need for ” ” (quotations).

15 December, 2009 at 8:44 am

Anonymous

To the first post: can you name some of the rigor-lacking definitions or concepts that Prof. Tao defines?

15 December, 2009 at 9:15 am

Anonymous

In mathematics, there are vague ideas which defy quantification and definition. Describing such ideas requires a great amount of effort, in absorbing and synthesizing information. Thus, the sharing of such insights provides us a great benefit. In short, I see no problem with this (purposeful) choice of didactic structure. Moreover, such descriptions of these vague ideas demonstrate mastery of the subject material rather than ineptitude.

15 December, 2009 at 7:08 pm

Anonymous

Despite the downvotes of anon #1’s comments, I admittedly start to agree with various aspects. The quality has deteriorated somewhat.

16 December, 2009 at 9:43 am

Anonymous

All of you are idiots. Tao is human. Give him some slack.

16 December, 2009 at 4:21 pm

Terence Tao

My thoughts on these sorts of issues are given at

https://terrytao.wordpress.com/career-advice/there%E2%80%99s-more-to-mathematics-than-rigour-and-proofs/

(for the research aspects) and

https://terrytao.wordpress.com/career-advice/talks-are-not-the-same-as-papers/

(for the pedagogical aspects).

Generally speaking, rigorous details of various arguments given in my blog posts can often be found in the papers referenced as links inside such posts.

17 December, 2009 at 8:05 am

Anonymous

I’m appalled that some readers dared say they’re appalled at the way Prof. Tao writes his blog. I’ve been reading this blog for some time now and the quality is always top-notch. But even if the quality isn’t as high as that of a published paper, it does not matter, as I’m very grateful to Prof. Tao for taking the time to write all these posts that explain his highly technical research papers. This is a great service to the math community that very few other mathematicians, perhaps none, have sacrificed their time to do.

20 January, 2010 at 1:44 am

Approximate bases, sunflowers, and nonstandard analysis « A Dialogue on Infinity

[…] January 20, 2010 Posted by Alexandre Borovik in Uncategorized. trackback Just a link to Terry […]

30 January, 2010 at 7:08 pm

Ultralimit analysis, and quantitative algebraic geometry « What’s new

[…] analysis, ultrafilters, ultralimit analysis | by Terence Tao I have blogged a number of times in the past about the relationship between finitary (or “hard”, or “quantitative”) […]

19 March, 2010 at 9:53 pm

A computational perspective on set theory « What’s new

[…] mathematics (involving only finitely many points or numbers, etc.); I have explored this theme in a number of previous blog posts. So, one may ask: what is the finitary analogue of statements such as […]