You are currently browsing the tag archive for the ‘nonstandard analysis’ tag.

One of the most basic theorems in linear algebra is that every finite-dimensional vector space has a finite basis. Let us give a statement of this theorem in the case when the underlying field is the rationals:

Theorem 1 (Finite generation implies finite basis, infinitary version) Let ${V}$ be a vector space over the rationals ${{\mathbb Q}}$, and let ${v_1,\ldots,v_n}$ be a finite collection of vectors in ${V}$. Then there exists a collection ${w_1,\ldots,w_k}$ of vectors in ${V}$, with ${1 \leq k \leq n}$, such that

• (${w}$ generates ${v}$) Every ${v_j}$ can be expressed as a rational linear combination of the ${w_1,\ldots,w_k}$.
• (${w}$ independent) There is no non-trivial linear relation ${a_1 w_1 + \ldots + a_k w_k = 0}$, ${a_1,\ldots,a_k \in {\mathbb Q}}$ among the ${w_1,\ldots,w_m}$ (where non-trivial means that the ${a_i}$ are not all zero).

In fact, one can take ${w_1,\ldots,w_m}$ to be a subset of the ${v_1,\ldots,v_n}$.

Proof: We perform the following “rank reduction argument”. Start with ${w_1,\ldots,w_k}$ initialised to ${v_1,\ldots,v_n}$ (so initially we have ${k=n}$). Clearly ${w}$ generates ${v}$. If the ${w_i}$ are linearly independent then we are done. Otherwise, there is a non-trivial linear relation between them; after shuffling things around, we see that one of the ${w_i}$, say ${w_k}$, is a rational linear combination of the ${w_1,\ldots,w_{k-1}}$. In such a case, ${w_k}$ becomes redundant, and we may delete it (reducing the rank ${k}$ by one). We repeat this procedure; it can only run for at most ${n}$ steps and so terminates with ${w_1,\ldots,w_m}$ obeying both of the desired properties. $\Box$

In additive combinatorics, one often wants to use results like this in finitary settings, such as that of a cyclic group ${{\mathbb Z}/p{\mathbb Z}}$ where ${p}$ is a large prime. Now, technically speaking, ${{\mathbb Z}/p{\mathbb Z}}$ is not a vector space over ${{\mathbb Q}}$, because one only multiply an element of ${{\mathbb Z}/p{\mathbb Z}}$ by a rational number if the denominator of that rational does not divide ${p}$. But for ${p}$ very large, ${{\mathbb Z}/p{\mathbb Z}}$ “behaves” like a vector space over ${{\mathbb Q}}$, at least if one restricts attention to the rationals of “bounded height” – where the numerator and denominator of the rationals are bounded. Thus we shall refer to elements of ${{\mathbb Z}/p{\mathbb Z}}$ as “vectors” over ${{\mathbb Q}}$, even though strictly speaking this is not quite the case.

On the other hand, saying that one element of ${{\mathbb Z}/p{\mathbb Z}}$ is a rational linear combination of another set of elements is not a very interesting statement: any non-zero element of ${{\mathbb Z}/p{\mathbb Z}}$ already generates the entire space! However, if one again restricts attention to rational linear combinations of bounded height, then things become interesting again. For instance, the vector ${1}$ can generate elements such as ${37}$ or ${\frac{p-1}{2}}$ using rational linear combinations of bounded height, but will not be able to generate such elements of ${{\mathbb Z}/p{\mathbb Z}}$ as ${\lfloor\sqrt{p}\rfloor}$ without using rational numbers of unbounded height.

For similar reasons, the notion of linear independence over the rationals doesn’t initially look very interesting over ${{\mathbb Z}/p{\mathbb Z}}$: any two non-zero elements of ${{\mathbb Z}/p{\mathbb Z}}$ are of course rationally dependent. But again, if one restricts attention to rational numbers of bounded height, then independence begins to emerge: for instance, ${1}$ and ${\lfloor\sqrt{p}\rfloor}$ are independent in this sense.

Thus, it becomes natural to ask whether there is a “quantitative” analogue of Theorem 1, with non-trivial content in the case of “vector spaces over the bounded height rationals” such as ${{\mathbb Z}/p{\mathbb Z}}$, which asserts that given any bounded collection ${v_1,\ldots,v_n}$ of elements, one can find another set ${w_1,\ldots,w_k}$ which is linearly independent “over the rationals up to some height”, such that the ${v_1,\ldots,v_n}$ can be generated by the ${w_1,\ldots,w_k}$ “over the rationals up to some height”. Of course to make this rigorous, one needs to quantify the two heights here, the one giving the independence, and the one giving the generation. In order to be useful for applications, it turns out that one often needs the former height to be much larger than the latter; exponentially larger, for instance, is not an uncommon request. Fortunately, one can accomplish this, at the cost of making the height somewhat large:

Theorem 2 (Finite generation implies finite basis, finitary version) Let ${n \geq 1}$ be an integer, and let ${F: {\mathbb N} \rightarrow {\mathbb N}}$ be a function. Let ${V}$ be an abelian group which admits a well-defined division operation by any natural number of size at most ${C(F,n)}$ for some constant ${C(F,n)}$ depending only on ${F,n}$; for instance one can take ${V = {\mathbb Z}/p{\mathbb Z}}$ for ${p}$ a prime larger than ${C(F,n)}$. Let ${v_1,\ldots,v_n}$ be a finite collection of “vectors” in ${V}$. Then there exists a collection ${w_1,\ldots,w_k}$ of vectors in ${V}$, with ${1 \leq k \leq n}$, as well an integer ${M \geq 1}$, such that

• (Complexity bound) ${M \leq C(F,n)}$ for some ${C(F,n)}$ depending only on ${F, n}$.
• (${w}$ generates ${v}$) Every ${v_j}$ can be expressed as a rational linear combination of the ${w_1,\ldots,w_k}$ of height at most ${M}$ (i.e. the numerator and denominator of the coefficients are at most ${M}$).
• (${w}$ independent) There is no non-trivial linear relation ${a_1 w_1 + \ldots + a_k w_k = 0}$ among the ${w_1,\ldots,w_k}$ in which the ${a_1,\ldots,a_k}$ are rational numbers of height at most ${F(M)}$.

In fact, one can take ${w_1,\ldots,w_k}$ to be a subset of the ${v_1,\ldots,v_n}$.

Proof: We perform the same “rank reduction argument” as before, but translated to the finitary setting. Start with ${w_1,\ldots,w_k}$ initialised to ${v_1,\ldots,v_n}$ (so initially we have ${k=n}$), and initialise ${M=1}$. Clearly ${w}$ generates ${v}$ at this height. If the ${w_i}$ are linearly independent up to rationals of height ${F(M)}$ then we are done. Otherwise, there is a non-trivial linear relation between them; after shuffling things around, we see that one of the ${w_i}$, say ${w_k}$, is a rational linear combination of the ${w_1,\ldots,w_{k-1}}$, whose height is bounded by some function depending on ${F(M)}$ and ${k}$. In such a case, ${w_k}$ becomes redundant, and we may delete it (reducing the rank ${k}$ by one), but note that in order for the remaining ${w_1,\ldots,w_{k-1}}$ to generate ${v_1,\ldots,v_n}$ we need to raise the height upper bound for the rationals involved from ${M}$ to some quantity ${M'}$ depending on ${M, F(M), k}$. We then replace ${M}$ by ${M'}$ and continue the process. We repeat this procedure; it can only run for at most ${n}$ steps and so terminates with ${w_1,\ldots,w_m}$ and ${M}$ obeying all of the desired properties. (Note that the bound on ${M}$ is quite poor, being essentially an ${n}$-fold iteration of ${F}$! Thus, for instance, if ${F}$ is exponential, then the bound on ${M}$ is tower-exponential in nature.) $\Box$

(A variant of this type of approximate basis lemma was used in my paper with Van Vu on the singularity probability of random Bernoulli matrices.)

Looking at the statements and proofs of these two theorems it is clear that the two results are in some sense the “same” result, except that the latter has been made sufficiently quantitative that it is meaningful in such finitary settings as ${{\mathbb Z}/p{\mathbb Z}}$. In this note I will show how this equivalence can be made formal using the language of non-standard analysis. This is not a particularly deep (or new) observation, but it is perhaps the simplest example I know of that illustrates how nonstandard analysis can be used to transfer a quantifier-heavy finitary statement, such as Theorem 2, into a quantifier-light infinitary statement, such as Theorem 1, thus lessening the need to perform “epsilon management” duties, such as keeping track of unspecified growth functions such as ${F}$. This type of transference is discussed at length in this previous blog post of mine.

In this particular case, the amount of effort needed to set up the nonstandard machinery in order to reduce Theorem 2 from Theorem 1 is too great for this transference to be particularly worthwhile, especially given that Theorem 2 has such a short proof. However, when performing a particularly intricate argument in additive combinatorics, in which one is performing a number of “rank reduction arguments”, “energy increment arguments”, “regularity lemmas”, “structure theorems”, and so forth, the purely finitary approach can become bogged down with all the epsilon management one needs to do to organise all the parameters that are flying around. The nonstandard approach can efficiently hide a large number of these parameters from view, and it can then become worthwhile to invest in the nonstandard framework in order to clean up the rest of a lengthy argument. Furthermore, an advantage of moving up to the infinitary setting is that one can then deploy all the firepower of an existing well-developed infinitary theory of mathematics (in this particular case, this would be the theory of linear algebra) out of the box, whereas in the finitary setting one would have to painstakingly finitise each aspect of such a theory that one wished to use (imagine for instance trying to finitise the rank-nullity theorem for rationals of bounded height).

The nonstandard approach is very closely related to use of compactness arguments, or of the technique of taking ultralimits and ultraproducts; indeed we will use an ultrafilter in order to create the nonstandard model in the first place.

I will also discuss a two variants of both Theorem 1 and Theorem 2 which have actually shown up in my research. The first is that of the regularity lemma for polynomials over finite fields, which came up when studying the equidistribution of such polynomials (in this paper with Ben Green). The second comes up when is dealing not with a single finite collection ${v_1,\ldots,v_n}$ of vectors, but rather with a family ${(v_{h,1},\ldots,v_{h,n})_{h \in H}}$ of such vectors, where ${H}$ ranges over a large set; this gives rise to what we call the sunflower lemma, and came up in this recent paper of myself, Ben Green, and Tamar Ziegler.

This post is mostly concerned with nonstandard translations of the “rank reduction argument”. Nonstandard translations of the “energy increment argument” and “density increment argument” were briefly discussed in this recent post; I may return to this topic in more detail in a future post.

In the course of the ongoing logic reading seminar at UCLA, I learned about the property of countable saturation. A model ${M}$ of a language ${L}$ is countably saturated if, every countable sequence ${P_1(x), P_2(x), \ldots}$ of formulae in ${L}$ (involving countably many constants in ${M}$) which is finitely satisfiable in ${M}$ (i.e. any finite collection ${P_1(x),\ldots,P_n(x)}$ in the sequence has a solution ${x}$ in ${M}$), is automatically satisfiable in ${M}$ (i.e. there is a solution ${x}$ to all ${P_n(x)}$ simultaneously). Equivalently, a model is countably saturated if the topology generated by the definable sets is countably compact.

Update, Nov 19: I have learned that the above terminology is not quite accurate; countable saturation allows for an uncountable sequence of formulae, as long as the constants used remain finite. So, the discussion here involves a weaker property than countable saturation, which I do not know the official term for. If one chooses a special type of ultrafilter, namely a “countably incomplete” ultrafilter, one can recover the full strength of countable saturation, though it is not needed for the remarks here. Most models are not countably saturated. Consider for instance the standard natural numbers ${{\Bbb N}}$ as a model for arithmetic. Then the sequence of formulae “${x > n}$” for ${n=1,2,3,\ldots}$ is finitely satisfiable in ${{\Bbb N}}$, but not satisfiable.

However, if one takes a model ${M}$ of ${L}$ and passes to an ultrapower ${*M}$, whose elements ${x}$ consist of sequences ${(x_n)_{n \in {\Bbb N}}}$ in ${M}$, modulo equivalence with respect to some fixed non-principal ultrafilter ${p}$, then it turns out that such models are automatically countably compact. Indeed, if ${P_1(x), P_2(x), \ldots}$ are finitely satisfiable in ${*M}$, then they are also finitely satisfiable in ${M}$ (either by inspection, or by appeal to Los’s theorem and/or the transfer principle in non-standard analysis), so for each ${n}$ there exists ${x_n \in M}$ which satisfies ${P_1,\ldots,P_n}$. Letting ${x = (x_n)_{n \in {\Bbb N}} \in *M}$ be the ultralimit of the ${x_n}$, we see that ${x}$ satisfies all of the ${P_n}$ at once.

In particular, non-standard models of mathematics, such as the non-standard model ${*{\Bbb N}}$ of the natural numbers, are automatically countably saturated.

This has some cute consequences. For instance, suppose one has a non-standard metric space ${*X}$ (an ultralimit of standard metric spaces), and suppose one has a standard sequence ${(x_n)_{n \in {\mathbb N}}}$ of elements of ${*X}$ which are standard-Cauchy, in the sense that for any standard ${\varepsilon > 0}$ one has ${d( x_n, x_m ) < \varepsilon}$ for all sufficiently large ${n,m}$. Then there exists a non-standard element ${x \in *X}$ such that ${x_n}$ standard-converges to ${x}$ in the sense that for every standard ${\varepsilon > 0}$ one has ${d(x_n, x) < \varepsilon}$ for all sufficiently large ${n}$. Indeed, from the standard-Cauchy hypothesis, one can find a standard ${\varepsilon(n) > 0}$ for each standard ${n}$ that goes to zero (in the standard sense), such that the formulae “${d(x_n,x) < \varepsilon(n)}$” are finitely satisfiable, and hence satisfiable by countable saturation. Thus we see that non-standard metric spaces are automatically “standardly complete” in some sense.

This leads to a non-standard structure theorem for Hilbert spaces, analogous to the orthogonal decomposition in Hilbert spaces:

Theorem 1 (Non-standard structure theorem for Hilbert spaces) Let ${*H}$ be a non-standard Hilbert space, let ${S}$ be a bounded (external) subset of ${*H}$, and let ${x \in H}$. Then there exists a decomposition ${x = x_S + x_{S^\perp}}$, where ${x_S \in *H}$ is “almost standard-generated by ${S}$” in the sense that for every standard ${\varepsilon > 0}$, there exists a standard finite linear combination of elements of ${S}$ which is within ${\varepsilon}$ of ${S}$, and ${x_{S^\perp} \in *H}$ is “standard-orthogonal to ${S}$” in the sense that ${\langle x_{S^\perp}, s\rangle = o(1)}$ for all ${s \in S}$.

Proof: Let ${d}$ be the infimum of all the (standard) distances from ${x}$ to a standard linear combination of elements of ${S}$, then for every standard ${n}$ one can find a standard linear combination ${x_n}$ of elements of ${S}$ which lie within ${d+1/n}$ of ${x}$. From the parallelogram law we see that ${x_n}$ is standard-Cauchy, and thus standard-converges to some limit ${x_S \in *H}$, which is then almost standard-generated by ${S}$ by construction. An application of Pythagoras then shows that ${x_{S^\perp} := x-x_S}$ is standard-orthogonal to every element of ${S}$. $\Box$

This is the non-standard analogue of a combinatorial structure theorem for Hilbert spaces (see e.g. Theorem 2.6 of my FOCS paper). There is an analogous non-standard structure theorem for ${\sigma}$-algebras (the counterpart of Theorem 3.6 from that paper) which I will not discuss here, but I will give just one sample corollary:

Theorem 2 (Non-standard arithmetic regularity lemma) Let ${*G}$ be a non-standardly finite abelian group, and let ${f: *G \rightarrow [0,1]}$ be a function. Then one can split ${f = f_{U^\perp} + f_U}$, where ${f_U: *G \rightarrow [-1,1]}$ is standard-uniform in the sense that all Fourier coefficients are (uniformly) ${o(1)}$, and ${f_{U^\perp}: *G \rightarrow [0,1]}$ is standard-almost periodic in the sense that for every standard ${\varepsilon > 0}$, one can approximate ${f_{U^\perp}}$ to error ${\varepsilon}$ in ${L^1(*G)}$ norm by a standard linear combination of characters (which is also bounded).

This can be used for instance to give a non-standard proof of Roth’s theorem (which is not much different from the “finitary ergodic” proof of Roth’s theorem, given for instance in Section 10.5 of my book with Van Vu). There is also a non-standard version of the Szemerédi regularity lemma which can be used, among other things, to prove the hypergraph removal lemma (the proof then becomes rather close to the infinitary proof of this lemma in this paper of mine). More generally, the above structure theorem can be used as a substitute for various “energy increment arguments” in the combinatorial literature, though it does not seem that there is a significant saving in complexity in doing so unless one is performing quite a large number of these arguments.

One can also cast density increment arguments in a nonstandard framework. Here is a typical example. Call a non-standard subset ${X}$ of a non-standard finite set ${Y}$ dense if one has ${|X| \geq \varepsilon |Y|}$ for some standard ${\varepsilon > 0}$.

Theorem 3 Suppose Szemerédi’s theorem (every set of integers of positive upper density contains an arithmetic progression of length ${k}$) fails for some ${k}$. Then there exists an unbounded non-standard integer ${N}$, a dense subset ${A}$ of ${[N] := \{1,\ldots,N\}}$ with no progressions of length ${k}$, and with the additional property that

$\displaystyle \frac{|A \cap P|}{|P|} \leq \frac{|A \cap [N]|}{N} + o(1)$

for any subprogression ${P}$ of ${[N]}$ of unbounded size (thus there is no sizeable density increment on any large progression).

Proof: Let ${B \subset {\Bbb N}}$ be a (standard) set of positive upper density which contains no progression of length ${k}$. Let ${\delta := \limsup_{|P| \rightarrow \infty} |B \cap P|/|P|}$ be the asymptotic maximal density of ${B}$ inside a long progression, thus ${\delta > 0}$. For any ${n > 0}$, one can then find a standard integer ${N_n \geq n}$ and a standard subset ${A_n}$ of ${[N_n]}$ of density at least ${\delta-1/n}$ such that ${A_n}$ can be embedded (after a linear transformation) inside ${B}$, so in particular ${A_n}$ has no progressions of length ${k}$. Applying the saturation property, one can then find an unbounded ${N}$ and a set ${A}$ of ${[N]}$ of density at least ${\delta-1/n}$ for every standard ${n}$ (i.e. of density at least ${\delta-o(1)}$) with no progressions of length ${k}$. By construction, we also see that for any subprogression ${P}$ of ${[N]}$ of unbounded size, ${A}$ hs density at most ${\delta+1/n}$ for any standard ${n}$, thus has density at most ${\delta+o(1)}$, and the claim follows. $\Box$

This can be used as the starting point for any density-increment based proof of Szemerédi’s theorem for a fixed ${k}$, e.g. Roth’s proof for ${k=3}$, Gowers’ proof for arbitrary ${k}$, or Szemerédi’s proof for arbitrary ${k}$. (It is likely that Szemerédi’s proof, in particular, simplifies a little bit when translated to the non-standard setting, though the savings are likely to be modest.)

I’m also hoping that the recent results of Hrushovski on the noncommutative Freiman problem require only countable saturation, as this makes it more likely that they can be translated to a non-standard setting and thence to a purely finitary framework.

Last year on this blog, I sketched out a non-rigorous probabilistic argument justifying the following well-known theorem:

Theorem 1. (Non-measurable sets exist) There exists a subset $E$ of the unit interval ${}[0,1]$ which is not  Lebesgue-measurable.

The idea was to let E be a “random” subset of ${}[0,1]$.  If one (non-rigorously) applies the law of large numbers, one expects E to have “density” 1/2 with respect to every subinterval of ${}[0,1]$, which would contradict the Lebesgue differentiation theorem.

I was recently asked whether I could in fact make the above argument rigorous. This turned out to be more difficult than I had anticipated, due to some technicalities in trying to make the concept of a random subset of ${}[0,1]$ (which requires an uncountable number of “coin flips” to generate) both rigorous and useful.  However, there is a simpler variant of the above argument which can be made rigorous.  Instead of letting E be a “random” subset of ${}[0,1]$, one takes E to be an “alternating” set that contains “every other” real number in ${}[0,1]$; this again should have density 1/2 in every subinterval and thus again contradict the Lebesgue differentiation theorem.

Of course, in the standard model of the real numbers, it makes no sense to talk about “every other” or “every second” real number, as the real numbers are not discrete.  If however one employs the language of non-standard analysis, then it is possible to make the above argument rigorous, and this is the purpose of my post today. I will assume some basic familiarity with non-standard analysis, for instance as discussed in this earlier post of mine.

This post is in some ways an antithesis of my previous postings on hard and soft analysis. In those posts, the emphasis was on taking a result in soft analysis and converting it into a hard analysis statement (making it more “quantitative” or “effective”); here we shall be focusing on the reverse procedure, in which one harnesses the power of infinitary mathematics – in particular, ultrafilters and nonstandard analysis – to facilitate the proof of finitary statements.

Arguments in hard analysis are notorious for their profusion of “epsilons and deltas”. In the more sophisticated arguments of this type, one can end up having an entire army of epsilons $\epsilon_1, \epsilon_2, \epsilon_3, \ldots$ that one needs to manage, in particular choosing each epsilon carefully to be sufficiently small compared to other parameters (including other epsilons), while of course avoiding an impossibly circular situation in which a parameter is ultimately required to be small with respect to itself, which is absurd. This art of epsilon management, once mastered, is not terribly difficult – it basically requires one to mentally keep track of which quantities are “small”, “very small”, “very very small”, and so forth – but when these arguments get particularly lengthy, then epsilon management can get rather tedious, and also has the effect of making these arguments unpleasant to read. In particular, any given assertion in hard analysis usually comes with a number of unsightly quantifiers (For every $\epsilon$ there exists an N…) which can require some thought for a reader to parse. This is in contrast with soft analysis, in which most of the quantifiers (and the epsilons) can be cleanly concealed via the deployment of some very useful terminology; consider for instance how many quantifiers and epsilons are hidden within, say, the Heine-Borel theorem: “a subset of a Euclidean space is compact if and only if it is closed and bounded”.

For those who practice hard analysis for a living (such as myself), it is natural to wonder if one can somehow “clean up” or “automate” all the epsilon management which one is required to do, and attain levels of elegance and conceptual clarity comparable to those in soft analysis, hopefully without sacrificing too much of the “elementary” or “finitary” nature of hard analysis in the process.