One of the most basic theorems in linear algebra is that every finite-dimensional vector space has a finite basis. Let us give a statement of this theorem in the case when the underlying field is the rationals:

Theorem 1 (Finite generation implies finite basis, infinitary version) Let {V} be a vector space over the rationals {{\mathbb Q}}, and let {v_1,\ldots,v_n} be a finite collection of vectors in {V}. Then there exists a collection {w_1,\ldots,w_k} of vectors in {V}, with {1 \leq k \leq n}, such that

  • ({w} generates {v}) Every {v_j} can be expressed as a rational linear combination of the {w_1,\ldots,w_k}.
  • ({w} independent) There is no non-trivial linear relation {a_1 w_1 + \ldots + a_k w_k = 0}, {a_1,\ldots,a_k \in {\mathbb Q}} among the {w_1,\ldots,w_m} (where non-trivial means that the {a_i} are not all zero).

In fact, one can take {w_1,\ldots,w_m} to be a subset of the {v_1,\ldots,v_n}.

Proof: We perform the following “rank reduction argument”. Start with {w_1,\ldots,w_k} initialised to {v_1,\ldots,v_n} (so initially we have {k=n}). Clearly {w} generates {v}. If the {w_i} are linearly independent then we are done. Otherwise, there is a non-trivial linear relation between them; after shuffling things around, we see that one of the {w_i}, say {w_k}, is a rational linear combination of the {w_1,\ldots,w_{k-1}}. In such a case, {w_k} becomes redundant, and we may delete it (reducing the rank {k} by one). We repeat this procedure; it can only run for at most {n} steps and so terminates with {w_1,\ldots,w_m} obeying both of the desired properties. \Box

In additive combinatorics, one often wants to use results like this in finitary settings, such as that of a cyclic group {{\mathbb Z}/p{\mathbb Z}} where {p} is a large prime. Now, technically speaking, {{\mathbb Z}/p{\mathbb Z}} is not a vector space over {{\mathbb Q}}, because one only multiply an element of {{\mathbb Z}/p{\mathbb Z}} by a rational number if the denominator of that rational does not divide {p}. But for {p} very large, {{\mathbb Z}/p{\mathbb Z}} “behaves” like a vector space over {{\mathbb Q}}, at least if one restricts attention to the rationals of “bounded height” – where the numerator and denominator of the rationals are bounded. Thus we shall refer to elements of {{\mathbb Z}/p{\mathbb Z}} as “vectors” over {{\mathbb Q}}, even though strictly speaking this is not quite the case.

On the other hand, saying that one element of {{\mathbb Z}/p{\mathbb Z}} is a rational linear combination of another set of elements is not a very interesting statement: any non-zero element of {{\mathbb Z}/p{\mathbb Z}} already generates the entire space! However, if one again restricts attention to rational linear combinations of bounded height, then things become interesting again. For instance, the vector {1} can generate elements such as {37} or {\frac{p-1}{2}} using rational linear combinations of bounded height, but will not be able to generate such elements of {{\mathbb Z}/p{\mathbb Z}} as {\lfloor\sqrt{p}\rfloor} without using rational numbers of unbounded height.

For similar reasons, the notion of linear independence over the rationals doesn’t initially look very interesting over {{\mathbb Z}/p{\mathbb Z}}: any two non-zero elements of {{\mathbb Z}/p{\mathbb Z}} are of course rationally dependent. But again, if one restricts attention to rational numbers of bounded height, then independence begins to emerge: for instance, {1} and {\lfloor\sqrt{p}\rfloor} are independent in this sense.

Thus, it becomes natural to ask whether there is a “quantitative” analogue of Theorem 1, with non-trivial content in the case of “vector spaces over the bounded height rationals” such as {{\mathbb Z}/p{\mathbb Z}}, which asserts that given any bounded collection {v_1,\ldots,v_n} of elements, one can find another set {w_1,\ldots,w_k} which is linearly independent “over the rationals up to some height”, such that the {v_1,\ldots,v_n} can be generated by the {w_1,\ldots,w_k} “over the rationals up to some height”. Of course to make this rigorous, one needs to quantify the two heights here, the one giving the independence, and the one giving the generation. In order to be useful for applications, it turns out that one often needs the former height to be much larger than the latter; exponentially larger, for instance, is not an uncommon request. Fortunately, one can accomplish this, at the cost of making the height somewhat large:

Theorem 2 (Finite generation implies finite basis, finitary version) Let {n \geq 1} be an integer, and let {F: {\mathbb N} \rightarrow {\mathbb N}} be a function. Let {V} be an abelian group which admits a well-defined division operation by any natural number of size at most {C(F,n)} for some constant {C(F,n)} depending only on {F,n}; for instance one can take {V = {\mathbb Z}/p{\mathbb Z}} for {p} a prime larger than {C(F,n)}. Let {v_1,\ldots,v_n} be a finite collection of “vectors” in {V}. Then there exists a collection {w_1,\ldots,w_k} of vectors in {V}, with {1 \leq k \leq n}, as well an integer {M \geq 1}, such that

  • (Complexity bound) {M \leq C(F,n)} for some {C(F,n)} depending only on {F, n}.
  • ({w} generates {v}) Every {v_j} can be expressed as a rational linear combination of the {w_1,\ldots,w_k} of height at most {M} (i.e. the numerator and denominator of the coefficients are at most {M}).
  • ({w} independent) There is no non-trivial linear relation {a_1 w_1 + \ldots + a_k w_k = 0} among the {w_1,\ldots,w_k} in which the {a_1,\ldots,a_k} are rational numbers of height at most {F(M)}.

In fact, one can take {w_1,\ldots,w_k} to be a subset of the {v_1,\ldots,v_n}.

Proof: We perform the same “rank reduction argument” as before, but translated to the finitary setting. Start with {w_1,\ldots,w_k} initialised to {v_1,\ldots,v_n} (so initially we have {k=n}), and initialise {M=1}. Clearly {w} generates {v} at this height. If the {w_i} are linearly independent up to rationals of height {F(M)} then we are done. Otherwise, there is a non-trivial linear relation between them; after shuffling things around, we see that one of the {w_i}, say {w_k}, is a rational linear combination of the {w_1,\ldots,w_{k-1}}, whose height is bounded by some function depending on {F(M)} and {k}. In such a case, {w_k} becomes redundant, and we may delete it (reducing the rank {k} by one), but note that in order for the remaining {w_1,\ldots,w_{k-1}} to generate {v_1,\ldots,v_n} we need to raise the height upper bound for the rationals involved from {M} to some quantity {M'} depending on {M, F(M), k}. We then replace {M} by {M'} and continue the process. We repeat this procedure; it can only run for at most {n} steps and so terminates with {w_1,\ldots,w_m} and {M} obeying all of the desired properties. (Note that the bound on {M} is quite poor, being essentially an {n}-fold iteration of {F}! Thus, for instance, if {F} is exponential, then the bound on {M} is tower-exponential in nature.) \Box

(A variant of this type of approximate basis lemma was used in my paper with Van Vu on the singularity probability of random Bernoulli matrices.)

Looking at the statements and proofs of these two theorems it is clear that the two results are in some sense the “same” result, except that the latter has been made sufficiently quantitative that it is meaningful in such finitary settings as {{\mathbb Z}/p{\mathbb Z}}. In this note I will show how this equivalence can be made formal using the language of non-standard analysis. This is not a particularly deep (or new) observation, but it is perhaps the simplest example I know of that illustrates how nonstandard analysis can be used to transfer a quantifier-heavy finitary statement, such as Theorem 2, into a quantifier-light infinitary statement, such as Theorem 1, thus lessening the need to perform “epsilon management” duties, such as keeping track of unspecified growth functions such as {F}. This type of transference is discussed at length in this previous blog post of mine.

In this particular case, the amount of effort needed to set up the nonstandard machinery in order to reduce Theorem 2 from Theorem 1 is too great for this transference to be particularly worthwhile, especially given that Theorem 2 has such a short proof. However, when performing a particularly intricate argument in additive combinatorics, in which one is performing a number of “rank reduction arguments”, “energy increment arguments”, “regularity lemmas”, “structure theorems”, and so forth, the purely finitary approach can become bogged down with all the epsilon management one needs to do to organise all the parameters that are flying around. The nonstandard approach can efficiently hide a large number of these parameters from view, and it can then become worthwhile to invest in the nonstandard framework in order to clean up the rest of a lengthy argument. Furthermore, an advantage of moving up to the infinitary setting is that one can then deploy all the firepower of an existing well-developed infinitary theory of mathematics (in this particular case, this would be the theory of linear algebra) out of the box, whereas in the finitary setting one would have to painstakingly finitise each aspect of such a theory that one wished to use (imagine for instance trying to finitise the rank-nullity theorem for rationals of bounded height).

The nonstandard approach is very closely related to use of compactness arguments, or of the technique of taking ultralimits and ultraproducts; indeed we will use an ultrafilter in order to create the nonstandard model in the first place.

I will also discuss a two variants of both Theorem 1 and Theorem 2 which have actually shown up in my research. The first is that of the regularity lemma for polynomials over finite fields, which came up when studying the equidistribution of such polynomials (in this paper with Ben Green). The second comes up when is dealing not with a single finite collection {v_1,\ldots,v_n} of vectors, but rather with a family {(v_{h,1},\ldots,v_{h,n})_{h \in H}} of such vectors, where {H} ranges over a large set; this gives rise to what we call the sunflower lemma, and came up in this recent paper of myself, Ben Green, and Tamar Ziegler.

This post is mostly concerned with nonstandard translations of the “rank reduction argument”. Nonstandard translations of the “energy increment argument” and “density increment argument” were briefly discussed in this recent post; I may return to this topic in more detail in a future post.

— 1. Equivalence of Theorems 1 and 2 —

Both Theorem 1 and Theorem 2 are easy enough to prove. But we will now spend a certain amount of effort in showing that one can deduce each theorem from the other without actually going through the proof of either. This may not seem particularly worthwhile (or to be serious overkill) in the case of these two particular theorems, but the method of deduction is extremely general, and can be used to relate much more deep and difficult infinitary and finitary theorems to each other without a significant increase in effort. (This is closely related to various correspondence principles between combinatorics and parts of infinitary mathematics, such as ergodic theory; see also my post on soft and hard analysis for a closely related equivalence.)

Let’s first show why the finitary theorem, Theorem 2, implies Theorem 1. We argue by contradiction. If Theorem 1 failed, then we could find a vector space {V} over the rationals, and a finite collection {v_1,\ldots,v_n} of vectors, for which no finite subcollection {w_1,\ldots,w_k} of the {v_1,\ldots,v_n} obeyed both the generation property and the linear independence property. In other words, whenever a subcollection {w_1,\ldots,w_k} happened to generate {v_1,\ldots,v_n} by rationals, then it must necessarily contain a linear dependence.

We use this to create a function {F: {\mathbb N} \rightarrow {\mathbb N}} as follows. Given any natural number {M}, consider all the finite subcollections {w_1,\ldots,w_k} of {v_1,\ldots,v_n} which can generate the {v_1,\ldots,v_n} using rationals of height at most {M}. By the above hypothesis, all such subcollections contain a linear dependence involving rationals of some finite height. There may be many such dependences; we pick one arbitrarily. We then choose {F(M)} to be any natural number larger than the heights of all the rationals involved in all the linear dependencies thus chosen. (Here we implicitly use the fact that there are only finitely many subcollections of the {v_1,\ldots,v_n} to search through.)

Having chosen this function {F}, we then apply Theorem 2 to the vectors {v_1,\ldots,v_n} and this choice of function {F}, to obtain a subcollection {w_1,\ldots,w_k} which generate the {v_1,\ldots,v_n} using rationals of height at most {M}, and have no linear dependence involving rationals of height at most {F(M)}. But this contradicts the construction of {F}, and gives the claim.

Remark 1 Note how important it is here that the growth function {F} in Theorem 2 is not specified in advance, but is instead a parameter that can be set to be as “large” as needed. Indeed, for Theorem 2 for any fixed {F} (e.g. exponential, tower-exponential, Ackermann, etc.) gives a statement which is strictly “weaker” than Theorem 1 in a sense that I will not try to make precise here; it is only the union of all these statements for all conceivable {F} that gives the full strength of Theorem 1. A similar phenomenon occurs with the finite convergence principle. It is this “second order” nature of infinitary statements (they quantify not just over numerical parameters such as {N} or {\epsilon}, but also over functional parameters such as {F}) that make such statements appear deeper than finitary ones, but the distinction largely disappears if one is willing to perform such second-order quantifications.

Now we turn to the more interesting deduction, which is to obtain Theorem 2 from Theorem 1. Again, one argues by contradiction. Suppose that Theorem 2 failed. Carefully negating all the quantifiers (and using the axiom of choice), we conclude that there exists a function {F: {\mathbb N} \rightarrow {\mathbb N}} and a natural number {n} with the following property: given any natural number {K}, there exists an abelian group {V_K} which is divisible up to height {K}, and elements {v_{1,K},\ldots,v_{n,K}} in {V_K} such that there is no subcollection {w_{1,K},\ldots,w_{k,K}} of the {v_{1,K},\ldots,v_{n,K}}, together with an integer {M \leq K}, such that {w_{1,K},\ldots,w_{k,K}} generate {v_{1,K},\ldots,v_{n,K}} using rationals of height at most {M}, and such that the {w_{1,K},\ldots,w_{k,K}} have no linear dependence using rationals of height at most {F(M)}.

We now perform an ultralimit as {K \rightarrow \infty}. We will not pause here to recall the machinery of ultrafilters, ultralimits, and ultraproducts, but refer the reader instead to this previous blog post of mine for discussion.

We pick a non-principal ultrafilter {p} of the natural numbers. Starting with the “standard” abelian groups {V_K}, we then form their ultraproduct {V = \prod_K V_K / p}, defined as the space of sequences {v = (v_K)_{K \in {\mathbb N}}} with {v_K \in V_K} for each {K}, modulo equivalence by {p}; thus two sequences {v = (v_K)_{K \in {\mathbb N}}} and {v' = (v'_K)_{K \in {\mathbb N}}} are considered equal if {v_K = v'_K} for a {p}-large set of {K} (i.e. for a set of {K} that lies in {p}).

Now that non-standard objects are in play, we will need to take some care to distinguish between standard objects (e.g. standard natural numbers) and their nonstandard counterparts.

Since each of the {V_K} are an abelian group, {V} is also an abelian group (an easy special case of the transfer principle). Since each {V_K} is divisible up to height {K}, {V} is divisible up to all (standard) heights; in other words, {V} is actually a vector space over the (standard) rational numbers {{\mathbb Q}}. The point is that while none of the {V_K} are, strictly speaking, vector spaces over {{\mathbb Q}}, they increasingly behave as if they were such spaces, and in the limit one recovers genuine vector space structure.

For each {1 \leq i \leq n}, one can take an ultralimit of the elements {v_{i,K} \in V_K} to generate an element {v_i := (v_{i,K})_{K \in {\mathbb N}}} of the ultraproduct {V}. So now we have {n} vectors {v_1,\ldots,v_n} of a vector space {V} over {{\mathbb Q}} – precisely the setting of Theorem 1! So we apply that theorem and obtain a subcollection {w_1,\ldots,w_k \in V} of the {v_1,\ldots,v_n}, such that each {v_i} can be generated from the {w_1,\ldots,w_k} using (standard) rationals, and such that the {w_1,\ldots,w_k} are linearly independent over the (standard) rationals.

Since all (standard) rationals have a finite height, one can find a (standard) natural number {M} such that each of the {v_i} can be generated from the {w_1,\ldots,w_k} using (standard) rationals of height at most {M}. Undoing the ultralimit, we conclude that for a {p}-large set of {K}‘s, all of the {v_{i,K}} can be generated from the {w_{1,K},\ldots,w_{k,K}} using rationals of height at most {M}. But by hypothesis, this implies for all sufficiently large {K} in this {p}-large set, the {w_{1,K},\ldots,w_{k,K}} contain a non-trivial rational dependence of height at most {F(M)}, thus

\displaystyle  \frac{a_{1,K}}{q_{1,K}} w_{1,K} + \ldots + \frac{a_{k,K}}{q_{k,K}} w_{k,K} = 0

for some integers {a_{i,K},q_{i,K}} of magnitude at most {F(M)}, with the {a_{k,K}} not all zero.

By the pigeonhole principle (and the finiteness of {F(M)}), each of the {a_{i,K},q_{i,K}} is constant in {K} on a {p}-large set of {K}. So if we take an ultralimit again to go back to the nonstandard world, the quantities {a_i := (a_{i,K})_{K \in {\mathbb N}}}, {q_i := (q_{i,K})_{K \in {\mathbb N}}} are standard integers (rather than merely nonstandard integers). Thus we have

\displaystyle  \frac{a_{1}}{q_{1}} w_{1} + \ldots + \frac{a_{k}}{q_{k}} w_{k} = 0

with the {a_i} not all zero, i.e. we have a linear dependence amongst the {w_1,\ldots,w_k}. But this contradicts Theorem 1.

— 2. Polynomials over finite fields —

Let {{\Bbb F}} a fixed finite field (e.g. the field {{\Bbb F}_2} of two elements), and consider a high-dimensional finite vector space {V} over {{\Bbb F}}. A polynomial {P: {\Bbb F}^n \rightarrow {\Bbb F}} of degree {\leq d} can then be defined as a combination of monomials each of degree at most {d}, or alternatively as a function whose {d+1^{th}} derivative vanishes; see this previous blog post for some discussion of this equivalence.

We define the rank {rank_{\leq d-1}(P)} of a degree {\leq d} polynomial {P} to be the least number {k} of degree {\leq d-1} polynomials {Q_1,\ldots,Q_k}, such that {P} is completely determined by {Q_1,\ldots,Q_k}, i.e. {P = f( Q_1,\ldots,Q_k )} for some function {f: {\Bbb F}^k \rightarrow {\Bbb F}}. In the case when {P} has degree {\leq 2}, this concept is very close to the familiar rank of a quadratic form or matrix.

A generalisation of the notion of linear independence is that of linear independence modulo low rank. Let us call a collection {P_1,\ldots,P_n} of degree {\leq d} polynomials {M}-linearly independent if every non-trivial linear combination {a_1 P_1 + \ldots + a_n P_n} with {a_1,\ldots,a_n \in {\Bbb F}} not all zero, has rank at least {M}:

\displaystyle  rank_{\leq d-1}(a_1 P_1 + \ldots + a_n P_n) \leq M.

There is then the following analogue of Theorem 2:

Theorem 3 (Polynomial regularity lemma at one degree, finitary version) Let {n,d \geq 1} be integers, let {{\Bbb F}} be a finite field and let {F: {\mathbb N} \rightarrow {\mathbb N}} be a function. Let {V} be a vector space over {{\Bbb F}}, and let {P_1,\ldots,P_n: V \rightarrow F} be polynomials of degree {\leq d}. Then there exists a collection {Q_1,\ldots,Q_k: V \rightarrow F} of polynomials of degree {\leq d}, with {1 \leq k \leq n}, as well an integer {M \geq 1}, such that

  • (Complexity bound) {M \leq C(F,n,d,{\Bbb F})} for some {C(F,n,d,{\Bbb F})} depending only on {F, n, d, {\Bbb F}}.
  • ({Q} generates {P}) Every {P_j} can be expressed as a {{\Bbb F}}-linear combination of the {Q_1,\ldots,Q_k}, plus an error {E} which has rank {rank_{\leq d-1}(E)} at most {M}.
  • ({P} independent) There is no non-trivial linear relation {a_1 Q_1 + \ldots + a_k Q_k = E} among the {w_1,\ldots,w_m} in which {E} has rank {rank_{\leq d-1}(E)} at most {F(M)}.

In fact, one can take {Q_1,\ldots,Q_k} to be a subset of the {P_1,\ldots,P_n}.

This theorem can be proven in much the same way as Theorem 2, and the reader is invited to do so as an exercise. The constant {C(F,n,d,{\Bbb F})} can in fact be taken to be independent of {d} and {{\Bbb F}}, but this is not important to us here.

Roughly speaking, Theorem 3 asserts that a finite family of degree {\leq d} polynomials can be expressed as a linear combination of degree {\leq d} polynomials which are “linearly independent modulo low rank errors”, plus some lower rank objects. One can think of this as regularising the degree {\leq d} polynomials, modulo combinations of lower degree polynomials. For applications (and in particular, for understanding the equidistribution) one also needs to regularise the degree {\leq d-1} polynomials that arise this way, and so forth for increasingly lower degrees until all polynomials are regularised. (A similar phenomenon occurs for the hypergraph regularity lemma.)

When working with theorems like this, it is helpful to think conceptually of “quotienting out” by all polynomials of low rank. Unfortunately, in the finitary setting, the polynomials of low rank do not form a group, and so the quotient is ill-defined. However, this can be rectified by passing to the infinitary setting. Indeed, once one does so, one can quotient out the low rank polynomials, and Theorem 3 follows directly from Theorem 1 (or more precisely, the analogue of that theorem in which the field of rationals {{\mathbb Q}} is replaced by the finite field {{\Bbb F}}).

Let’s see how this works. To prove Theorem 3, suppose for contradiction that the theorem failed. Then one can find {F, n, d, {\Bbb F}}, such that for every natural {K}, one can find a vector space {V_K} and polynomials {P_{1,K},\ldots,P_{n,K}: V_K \rightarrow {\Bbb F}} of degree {\leq d}, for which there do not exist polynomials {Q_{1,K},\ldots,Q_{k,K}} with {k \leq n} and an integer {M \leq K} such that each {P_{j,K}} can be expressed as a linear combination of the {Q_{i,K}} modulo an error of rank at most {M}, and such that there are no nontrivial linear relations amongst the {Q_{i,K}} modulo errors of rank at most {F(M)}.

Taking an ultralimit as before, we end up with a (nonstandard) vector space {V} over {{\Bbb F}} (which is likely to be infinite), and (nonstandard) polynomials {P_1,\ldots,P_n: V \rightarrow {\Bbb F}} of degree {\leq d} (here it is best to use the “local” definition of a polynomial of degree {\leq d}, as a (nonstandard) function whose {d+1^{th}} derivative, but one can also view this as a (nonstandard) sum of monomials if one is careful).

The space {Poly_{\leq d}( V )} of (nonstandard) degree {\leq d} polynomials on {V} is a (nonstandard) vector space over {{\Bbb F}}. Inside this vector space, one has the subspace {Lowrank_{\leq d}(V)} consisting of all polynomials {P \in Poly_{\leq d}(V)} whose rank {rank_{\leq d-1}(V)} is a standard integer (as opposed to a nonstandard integer); call these the bounded rank polynomials. This is easily seen to be a subspace of {Poly_{\leq d}(V)} (although it is not a nonstandard or internal subspace, i.e. the ultralimit of subspaces of the {Poly_{\leq d}( V_K )}). As such, one can rigorously form the quotient space {Poly_{\leq d}(V) / Lowrank_{\leq d}(V)} of degree {\leq d} polynomials, modulo bounded rank {\leq d} polynomials.

The polynomials {P_1,\ldots,P_n} then have representatives {P_1,\ldots,P_n \hbox{ mod } Lowrank_{\leq d}(V)} in this quotient space. Applying Theorem 1 (for the field {{\Bbb F}}), one can then find a subcollection {Q_1,\ldots,Q_k \hbox{ mod } Lowrank_{\leq d}(V)} which are linearly independent in this space, which generate {P_1,\ldots,P_n}. Undoing the quotient, we see that the {P_1,\ldots,P_n} are linear combinations of the {Q_1,\ldots,Q_k} plus a bounded rank error, while no nontrivial linear combination of {Q_1,\ldots,Q_k} has bounded rank. Undoing the ultralimit as in the previous section, we obtain the desired contradiction.

We thus see that in the nonstandard world, the somewhat nonrigorous concepts of “low rank” and “high rank” can be formalised as that of “bounded rank” and “unbounded rank”. Furthermore, the former space forms a subspace, so in the nonstandard world one can rigorously talk about “quotienting out by bounded rank errors”. Thus we see that the algebraic machinery of quotient spaces can be applied in the nonstandard world directly, whereas in the finitary world it can only be applied heuristically. In principle, one could also start deploying more advanced tools of abstract algebra (e.g. exact sequences, cohomology, etc.) in the nonstandard setting, although this has not yet seriously begun to happen in additive combinatorics (although there are strong hints of some sort of “additive cohomology” emerging in the body of work surrounding the inverse conjecture for the Gowers norm, especially on the ergodic theory side of things).

— 3. Sunflowers —

Now we return to vector spaces (or approximate vector spaces) {V} over the rationals, such as {V = {\mathbb Z}/p{\mathbb Z}} for a large prime {p}. Instead of working with a single (small) tuple {v_1,\ldots,v_n} of vectors in {V}, we now consider a family {(v_{1,h},\ldots,v_{n,h})_{h \in H}} of such vectors in {V}, where {H} ranges over a large set, for instance a dense subset of the interval {X := [-N,N] = \{-N,\ldots,N\}} for some large {N}. This situation happens to show up in our recent work on the inverse conjecture for the Gowers norm, where the {v_{1,h},\ldots,v_{n,h}} represent the various “frequencies” that arise in a derivative {\Delta_h f} of a function {f} with respect to the shift {h}. (This need to consider families is an issue that also comes up in the finite field ergodic theory analogue of the inverse conjectures, due to the unbounded number of generators in that case, but interestingly does not arise in the ergodic theory over {{\mathbb Z}}.)

In Theorem 2, the main distinction was between linear dependence and linear independence of the tuple {v_{1},\ldots,v_{n}} (or some reduction of this tuple, such as {w_1,\ldots,w_k}). We will continue to be interested in the linear dependence or independence of the tuples {v_{1,h},\ldots,v_{n,h}} for various {h}. But we also wish to understand how the {v_{i,h}} vary with {h} as well. At one extreme (the “structured” case), there is no dependence on {h}: {v_{i,h} = v_i} for all {i} and all {h}. At the other extreme (the “pseudorandom” case), the {v_{i,h}} are basically independent as {h} varies; in particular, for (almost) all of the pairs {h, h' \in H}, the tuples {v_{1,h},\ldots,v_{n,h}} and {v_{1,h'},\ldots,v_{n,h'}} are not just separately independent, but are jointly independent. One can think of {v_{1,h},\ldots,v_{n,h}} and {v_{1,h'},\ldots,v_{n,h'}} as being in “general position” relative to each other.

The sunflower lemma asserts that any family {(v_{1,h},\ldots,v_{n,h})_{h \in H}} is basically a combination of the above scenarios, thus one can divide the family into a linearly independent core collection of vectors {(w_1,\ldots,w_m)} that do not depend on {h}, together with petals {(v'_{1,h},\ldots,v'_{k,h})_{h \in H'}}, which are in “general position” in the above sense, relative to the core. However, as a price one pays for this, one has to refine {H} to a dense subset {H'} of {H}. This lemma, which significantly generalises Theorem 2, is formalised as follows:

Theorem 4 (Sunflower lemma, finitary version) Let {n \geq 1} be an integer, and let {F: {\mathbb N} \rightarrow {\mathbb N}} be a function. Let {V} be an abelian group which admits a well-defined division operation by any natural number of size at most {C(F,n)} for some constant {C(F,n)} depending only on {F,n}. Let {H} be a finite set, and let {(v_{1,h},\ldots,v_{n,h})_{h \in H}} be a collection of {n}-tuples of vectors in {V} indexed by {H}. Then there exists a subset {H'} of {H}, integers {k,m \geq 0} with {m+k \leq n}, a collection {w_1,\ldots,w_m} of “core” vectors in {V} for some {m}, a collection of “petal” vectors {(v'_{1,h},\ldots,v'_{k,h})_{h \in H'}} for each {h \in H'}, as well an integer {M \geq 1}, such that

  • (Complexity bound) {M \leq C(F,n)} for some {C(F,n)} depending only on {F, n}.
  • ({H'} dense) one has {|H'| \geq c(F,n) |H|} for some {c(F,n) > 0} depending only on {F, n}.
  • ({w, v'} generates {v}) Every {v_{j,h}} with {1 \leq j \leq n} and {h \in H'} can be expressed as a rational linear combination of the {w_1,\ldots,w_m} and {v'_{1,h},\ldots,v'_{k,h}} of height at most {M}.
  • ({w} independent) There is no non-trivial rational linear relation among the {w_1,\ldots,w_m} of height at most {F(M)}.
  • ({v'} in general position relative to {w}) More generally, for {1-\frac{1}{F(M)}} of the pairs {(h,h') \in H' \times H'}, there is no non-trivial linear relation among {w_1,\ldots,w_m, v'_{1,h},\ldots,v'_{k,h},v'_{1,h'},\ldots,v'_{k,h'}} of height at most {F(M)}.

One can take the {v'_{1,h},\ldots,v'_{k,h}} to be a subcollection of the {v_{1,h},\ldots,v_{n,h}}, though this is not particularly useful in applications.

Proof: We perform a two-parameter “rank reduction argument”, where the rank is indexed by the pair {(k,m)} (ordered lexicographically). We initially set {m=0}, {k=n}, {H'=H}, {M=1}, and {v'_{i,h} = v_{i,h}} for {h \in H}.

At each stage of the iteration, {w, v'} will generate {v} (at height {M}), and we will have some complexity bound on {M, m} and some density bound on {H'}. So one needs to check the independence of {w} and the general position of {v'} relative to {w}.

If there is a linear relation of {w} at height {F(M)}, then one can use this to reduce the size {m} of the core by one, leaving the petal size {k} unchanged, just as in the proof of Theorem 2. So let us move on, and suppose that there is no linear relation of {w} at height {F(M)}, but instead there is a failure of the general position hypothesis. In other words, for at least {|H'|^2 / F(M)} pairs {(h,h') \in H' \times H'}, one can find a relation of the form

\displaystyle  a_{1,h,h'} w_1 + \ldots + a_{m,h,h'} w_m + b_{1,h,h'} v'_{1,h} + \ldots + b_{k,h,h'} v'_{k,h} + c_{1,h,h'} v'_{1,h'} + \ldots + c_{k,h,h'} v'_{k,h'} = 0

where the {a_{i,h,h'},b_{i,h,h'},c_{i,h,h'}} are rationals of height at most {F(M)}, not all zero. The number of possible values for such rationals is bounded by some quantity depending on {m,k,F(M)}. Thus, by the pigeonhole principle, we can find {\gg_{F(M),m,k} |H'|^2} pairs (i.e. at least {c(F(M),m,k) |H'|^2} pairs for some {c(F(M),m,k)>0} depending only on {F(M),m,k}) such that

\displaystyle  a_{1} w_1 + \ldots + a_{m} w_m + b_{1} v'_{1,h} + \ldots + b_{k} v'_{k,h} + c_{1} v'_{1,h'} + \ldots + c_{k} v'_{k,h'} = 0

for some fixed rationals {a_i,b_i,c_i} of height at most {F(M)}. By the pigeonhole principle again, we can then find a fixed {h_0 \in H'} such that

\displaystyle  a_{1} w_1 + \ldots + a_{m} w_m + b_{1} v'_{1,h} + \ldots + b_{k} v'_{k,h} = u_{h_0}

for all {h} in some subset {H''} of {H'} with {|H''| \gg_{F(M),m,k} |H'|}, where

\displaystyle  u_{h_0} := - c_{1} v'_{1,h_0} - \ldots - c_{k} v'_{k,h_0}.

If the {b_i} and {c_i} all vanished then we would have a linear dependence amongst the core vectors, which we already know how to deal with. So suppose that we have at least one active petal coefficient, say {b_k}. Then upon rearranging, we can express {v'_{k,h}} as some rational linear combination of the original core vectors {w_1,\ldots,w_m}, a new core vector {u_{h_0}}, and the other petals {v'_{1,h},\ldots,v'_{k-1,h}}, with heights bounded by {\ll_{F(M),k,m} 1}. We may thus refine {H'} to {H''}, delete the petal vector {v'_{k,h}}, and add the vector {u} to the core, thus decreasing {k} by one and increasing {m} by one. One still has the generation property so long as one replaces {M} with a larger {M'} depending on {M, F(M), k, m}.

Since each iteration of this process either reduces {m} by one keeping {k} fixed, or reduces {k} by one increasing {m}, we see that after at most {2n} steps, the process must terminate, when we have both the linear independence of the {w} property and the general position of the {v'} property. (Note here that we are basically performing a proof by infinite descent.) At that stage, one easily verifies that we have obtained all the required conclusions of the theorem. \Box

As one can see, this result is a little bit trickier to prove than Theorem 2. Let us now see how it will translate to the nonstandard setting, and see what the nonstandard analogue of Theorem 5 is. We will skip some details, and get to the point where we can motivate and prove this nonstandard analogue; this analogue does in fact imply Theorem 5 by repeating the arguments from previous sections, but we will leave this as an exercise for the interested reader.

As before, the starting point is to introduce a parameter {K}, so that the approximate vector space {V_K} now depends on {K} (and becomes an actual vector space in the ultralimit {V}), and the parameter set {H_K} now also depends on {K}. We will think of {|H_K|} as going to infinity as {K \rightarrow \infty}, as this is the most interesting case (for bounded {H_K}, the result basically collapses back to Theorem 2). In that case, the ultralimit {H} of the {H_K} is a nonstandard finite set (i.e. an ultralimit of finite sets) whose (nonstandard) cardinality {|H|} is an unbounded nonstandard integer: it is a nonstandard integer (indeed, it is the ultralimit of the {|H_K|}) which is larger than any standard integer. On the other hand, {n} and {F} remain standard (i.e. they do not involve {K}).

For each {K}, one starts with a family {(v_{1,h,K},\ldots,v_{n,h,K})_{h \in H_K}} of {n}-tuples of vectors in {V_K}. Taking ultralimits, one ends up with a family {(v_{1,h},\ldots,v_{n,h})_{h \in H}} of {n}-tuples of vectors in {V}. Furthermore, for each {1 \leq i \leq n}, the maps {h \mapsto v_{i,h}} are nonstandard (or internal) functions from {H} to {V}, i.e. they are ultralimits of maps from {H_K} to {V_K}. The internal nature of these maps (which is a kind of “measurability” condition on these functions) will be important later. Of course, {H} and {V} are also internal (being ultralimits of {H_K} and {V_K} respectively).

We say that a subset {H'} of {H} is dense if it is an internal subset (i.e. it is the ultralimit of some subsets {H'_K} of {H_K}), and if {|H'| \geq \epsilon |H|} for some standard {\epsilon > 0} (recall that {|H'|, |H|} are nonstandard integers). If an internal subset is not dense, we say that it is sparse, which in nonstandard asymptotic notation is equivalent to {|H'|=o(|H|)}. If a statement {P(h)} holds on all {h} in dense set of {H}, we say that it holds for many {h}; if it holds for all {h} outside of a sparse set, we say it holds for almost all {h}. These are analogous to the more familiar concepts of “holding with positive probability” and “holding almost surely” in probability theory. For instance, if {P(h)} holds for many {h} in {H}, and {Q(h)} holds for almost all {h} in {H}, then {P(h)} and {Q(h)} jointly hold for many {h} in {H}. Note how all the epsilons have been neatly hidden away in this nonstandard framework.

Now we state the nonstandard analogue of Theorem 5.

Theorem 5 (Sunflower lemma, nonstandard version) Let {n \geq 1} be a (standard) integer, let {V} be a (nonstandard) vector space over the standard rationals {{\mathbb Q}}, and let {H} be a (nonstandard) set. Let {(v_{1,h},\ldots,v_{n,h})_{h \in H}} be a collection of {n}-tuples of vectors in {V} indexed by {H}, such that all the maps {h \mapsto v_{i,h}} for {1 \leq i \leq n} are internal. Then there exists a dense subset {H'} of {H}, a bounded-dimensional subspace {W} of {V}, a (standard) integer {k \geq 0} with {\dim(W) + k \leq n}, and a collection of “petal” vectors {(v'_{1,h},\ldots,v'_{k,h})_{h \in H'}} for each {h \in H'}, with the maps {h \mapsto v'_{i,h}} being internal for all {1 \leq i \leq k}, such that

  • ({W, v'} generates {v}) Every {v_{j,h}} with {1 \leq j \leq n} and {h \in H'} lies in the span of {W} and the {v'_{1,h},\ldots,v'_{k,h}}.
  • ({v'} in general position relative to {W}) For almost all of the pairs {(h,h') \in H' \times H'}, the vectors {v'_{1,h},\ldots,v'_{k,h},v'_{1,h'},\ldots,v'_{k,h'}} are linearly independent modulo {W} over {{\mathbb Q}}.

Of course, using Theorem 1 one could obtain a basis {w_1,\ldots,w_m} for {W} with {m = \dim(W)}, at which point the theorem more closely resembles Theorem 5.

Proof: Define a partial representation of the family {(v_{1,h},\ldots,v_{n,h})} to be a dense subset {H'} of {H}, a bounded dimensional space {W}, a standard integer {k} with {\dim(W)+k \leq n}, and a collection of {(v'_{1,h},\ldots,v'_{k,h})_{h \in H'}} depending internally on {h} that obeys the generation property (but not necessarily the general position property). Clearly we have at least one partial representation, namely the trivial one where {W} is empty, {k=n}, {H' := H}, and {v'_{i,h} := v_{i,h}}. Now, among all such partial representations, let us take a representation with the minimal value of {k}. (Here we are of course using the well-ordering property of the standard natural numbers.) We claim that this representation enjoys the general position property, which will give the claim.

Indeed, suppose this was not the case. Then, for many pairs {(h,h') \in H' \times H'}, the vectors {v'_{1,h},\ldots,v'_{k,h},v'_{1,h'},\ldots,v'_{k,h'}} have a linear dependence modulo {W} over {{\mathbb Q}}. (Actually, there is a technical “measurability” issue to address here, which I will return to later.) By symmetry and pigeonholing, we may assume that the {v'_{k,h}} coefficient of (say) of this dependence is non-zero. (Again, there is a measurability issue here.) Applying the pigeonhole principle, one can find {h_0 \in H'} such that

\displaystyle  v'_{1,h},\ldots,v'_{k,h},v'_{1,h_0},\ldots,v'_{k,h_0}

have a linear dependence over {{\mathbb Q}} modulo {W} for many {h}. (Again, there is a measurability issue here.)

Fix {h_0}. The number of possible linear combinations of {v'_{1,h_0},\ldots,v'_{k,h_0}} is countable. Because of this (and using a “countable pigeonhole principle”) that I will address below), we can find a fixed rational linear combination {u_{h_0}} of the {v'_{1,h_0},\ldots,v'_{k,h_0}} such that

\displaystyle  v'_{1,h},\ldots,v'_{k,h}, u_{h_0}

have a linear dependence over {{\mathbb Q}} modulo {W} for all {h} in some dense subset {H''} of {H'}. But now one can pass from {H'} to the dense subset {H''}, delete the petal {v'_{k,h}}, and add the vector {u_{h_0}} to the core space {W}, thus creating a partial representation with a smaller value of {k}, contradicting minimality, and we are done. \Box

We remark here that whereas the finitary analogue of this result was proven using the method of infinite descent, the nonstandard version could instead be proven using the (equivalent) well-ordering principle. One could easily recast the nonstandard version in descent form also, but it is somewhat more difficult to cast the finitary argument using well-ordering due to the extra parameters and quantifiers in play.

Let us now address the measurability issues. The main problem here is that the property of having a linear dependence over the standard rationals {{\mathbb Q}} is not an internal property, because it requires knowledge of what the standard rationals are, which is not an internal concept in the language of vector spaces. However, for each fixed choice of rational coefficients, the property of having a specific linear dependence with those selected coefficients is an internal concept (here we crucially rely on the hypothesis that the maps {h \mapsto v_{i,h}} were internal), so really what we have here is a sort of “{\sigma}-internal” property (a countable union of internal properties). But this is good enough for many purposes. In particular, we have

Lemma 6 (Countable pigeonhole principle) Let {H} be a nonstandardly finite set (i.e. the ultralimit of finite sets {H_K}), and for each standard natural number {n}, let {E_n} be an internal subset of {H}. Then one of the following holds:

  • (Positive density) There exists a natural number {n} such that {h \in E_n} for many {h \in H} (i.e. {E_n} is a dense subset of {H}).
  • (Zero density) For almost all {h \in H}, one has {h \not \in E_n} for all {n}. (In other words, the (external) set {\bigcup_{n \in {\mathbb N}} E_n} in is contained in a sparse subset of {H}.)

This lemma is sufficient to resolve all the measurability issues raised in the previous proof. It is analogous to the trivial statement in measure theory that given a countable collection of measurable subsets of a space of positive measure, either one of the measurable sets has positive measure, or else their union has measure zero (i.e. the sets fail to cover almost all of the space).

Proof: If any of the {E_n} are dense, we are done. So suppose this is not the case. Since {E_n} is a definable subset of {H} which is not dense, it is sparse, thus {|E_n| = o(|H|)}. Now it is convenient to undo the ultralimit and work in the finite sets {H_K} that {H} is the ultralimit of. Note that each {E_n}, being internal, is also an ultralimit of some finite subsets {E_{n,K}} of {H_K}.

For each standard integer {M > 0}, the set {E_1 \cup \ldots \cup E_M} is sparse in {H}, and in particular has density less than {1/M}. Thus, one can find a {p}-large set {S_M \subset {\mathbb N}} such that

\displaystyle  |E_{1,K} \cup \ldots \cup E_{M,K}| \leq |H_K|/M

for all {K \in S_M}. One can arrange matters so that the {S_M} are decreasing in {M}. One then sets the set {E_K} to equal {E_{1,K} \cup \ldots \cup E_{M,K}}, where {M} is the smallest integer for which {K \in S_M} (or {E_K} is empty if {K} lies in all the {S_M}, or in none), and let {E} be the ultralimit of the {E_K}. Then we see that {|E| \leq |H|/M} for every standard {M}, and so {E} is a sparse subset of {H}. Furthermore, {E} contains {E_M} for every standard {M}, and so we are in the zero density conclusion of the argument. \Box

(Curiously, I don’t see how to prove this lemma without unpacking the limit; it doesn’t seem to follow just from, say, the overspill principle. Instead, it seems to be exploiting the weak countable saturation property I mentioned in a previous post. But perhaps I missed a simple argument.)

— 4. Summary —

Let me summarise with a brief list of pros and cons of switching to a nonstandard framework. First, the pros:

  • Many “first-order” parameters such as {\epsilon} or {N} disappear from view, as do various “negligible” errors. More importantly, “second-order” parameters, such as the function {F} appearing in Theorem 2, also disappear from view. (In principle, third-order and higher parameters would also disappear, though I do not yet know of an actual finitary argument in my fields of study which would have used such parameters (with the exception of Ramsey theory, where such parameters must come into play in order to generate such enormous quantities as Graham’s number).) As such, a lot of tedious “epsilon management” disappears.
  • Iterative (and often parameter-heavy) arguments can often be replaced by minimisation (or more generally, extremisation) arguments, taking advantage of such properties as the well-ordering principle, the least upper bound axiom, or compactness.
  • The transfer principle lets one use “for free” any (first-order) statement about standard mathematics in the non-standard setting (provided that all objects involved are internal; see below).
  • Mature and powerful theories from infinitary mathematics (e.g. linear algebra, real analysis, representation theory, topology, functional analysis, measure theory, Lie theory, ergodic theory, model theory, etc.) can be used rigorously in a nonstandard setting (as long as one is aware of the usual infinitary pitfalls, of course; see below).
  • One can formally define terms that correspond to what would otherwise only be heuristic (or heavily parameterised and quantified) concepts such as “small”, “large”, “low rank”, “independent”, “uniformly distributed”, etc.
  • The conversion from a standard result to its nonstandard counterpart, or vice versa, is fairly quick (but see below), and generally only needs to be done only once or twice per paper.

Next, the cons:

  • Often requires the axiom of choice, as well as a certain amount of set theory. (There are however weakened versions of nonstandard analysis that can avoid choice that are still suitable for many applications.)
  • One needs the machinery of ultralimits and ultraproducts to set up the conversion from standard to nonstandard structures.
  • The conversion usually proceeds by a proof by contradiction, which (in conjunction with the use of ultralimits) may not be particularly intuitive.
  • One cannot efficiently discern what quantitative bounds emerge from a nonstandard argument (other than by painstakingly converting it back to a standard one, or by applying the tools of proof mining). (On the other hand, in particularly convoluted standard arguments, the quantitative bounds are already so poor – e.g. of iterated tower-exponential type – that losing these bounds is no great loss.)
  • One has to take some care to distinguish between standard and nonstandard objects (and also between internal and external sets and functions, which are concepts somewhat analogous to measurable and non-measurable sets and functions in measurable theory). More generally, all the usual pitfalls of infinitary analysis (e.g. interchanging limits, the need to ensure measurability or continuity) emerge in this setting, in contrast to the finitary setting where they are usually completely trivial.
  • It can be difficult at first to conceptually visualise what nonstandard objects look like (although this becomes easier once one maps nonstandard analysis concepts to heuristic concepts such as “small” and “large” as mentioned earlier, thus for instance one can think of an unbounded nonstandard natural number as being like an incredibly large standard natural number).
  • It is inefficient for both nonstandard and standard arguments to coexist within a paper; this makes things a little awkward if one for instance has to cite a result from a standard mathematics paper in a nonstandard mathematics one.
  • There are philosophical objections to using mathematical structures that only exist abstractly, rather than corresponding to the “real world”. (Note though that similar objections were also raised in the past with regard to the use of, say, complex numbers, non-Euclidean geometries, or even negative numbers.)
  • Formally, there is no increase in logical power gained by using nonstandard analysis (at least if one accepts the axiom of choice); anything which can be proven by nonstandard methods can also be proven by standard ones. In practice, though, the length and clarity of the nonstandard proof may be substantially better than the standard one.

In view of the pros and cons, I would not say that nonstandard analysis is suitable in all situations, nor is it unsuitable in all situations, but one needs to carefully evaluate the costs and benefits in a given setting; also, in some cases having both a finitary and infinitary proof side by side for the same result may be more valuable than just having one of the two proofs. My rule of thumb is that if a finitary argument is already spitting out iterated tower-exponential type bounds or worse in an argument, this is a sign that the argument “wants” to be infinitary, and it may be simpler to move over to an infinitary setting (such as the nonstandard setting).