You are currently browsing the tag archive for the ‘nonstandard analysis’ tag.

I have blogged several times in the past about nonstandard analysis, which among other things is useful in allowing one to import tools from infinitary (or qualitative) mathematics in order to establish results in finitary (or quantitative) mathematics. One drawback, though, to using nonstandard analysis methods is that the bounds one obtains by such methods are usually ineffective: in particular, the conclusions of a nonstandard analysis argument may involve an unspecified constant ${C}$ that is known to be finite but for which no explicit bound is obviously available. (In many cases, a bound can eventually be worked out by performing proof mining on the argument, and in particular by carefully unpacking the proofs of all the various results from infinitary mathematics that were used in the argument, as opposed to simply using them as “black boxes”, but this is a time-consuming task and the bounds that one eventually obtains tend to be quite poor (e.g. tower exponential or Ackermann type bounds are not uncommon).)

Because of this fact, it would seem that quantitative bounds, such as polynomial type bounds ${X \leq C Y^C}$ that show that one quantity ${X}$ is controlled in a polynomial fashion by another quantity ${Y}$, are not easily obtainable through the ineffective methods of nonstandard analysis. Actually, this is not the case; as I will demonstrate by an example below, nonstandard analysis can certainly yield polynomial type bounds. The catch is that the exponent ${C}$ in such bounds will be ineffective; but nevertheless such bounds are still good enough for many applications.

Let us now illustrate this by reproving a lemma from this paper of Mei-Chu Chang (Lemma 2.14, to be precise), which was recently pointed out to me by Van Vu. Chang’s paper is focused primarily on the sum-product problem, but she uses a quantitative lemma from algebraic geometry which is of independent interest. To motivate the lemma, let us first establish a qualitative version:

Lemma 1 (Qualitative solvability) Let ${P_1,\ldots,P_r: {\bf C}^d \rightarrow {\bf C}}$ be a finite number of polynomials in several variables with rational coefficients. If there is a complex solution ${z = (z_1,\ldots,z_d) \in {\bf C}^d}$ to the simultaneous system of equations

$\displaystyle P_1(z) = \ldots = P_r(z) = 0,$

then there also exists a solution ${z \in \overline{{\bf Q}}^d}$ whose coefficients are algebraic numbers (i.e. they lie in the algebraic closure ${{\bf Q}}$ of the rationals).

Proof: Suppose there was no solution to ${P_1(z)=\ldots=P_r(z)=0}$ over ${\overline{{\bf Q}}}$. Applying Hilbert’s nullstellensatz (which is available as ${\overline{{\bf Q}}}$ is algebraically closed), we conclude the existence of some polynomials ${Q_1,\ldots,Q_r}$ (with coefficients in ${\overline{{\bf Q}}}$) such that

$\displaystyle P_1 Q_1 + \ldots + P_r Q_r = 1$

as polynomials. In particular, we have

$\displaystyle P_1(z) Q_1(z) + \ldots + P_r(z) Q_r(z) = 1$

for all ${z \in {\bf C}^d}$. This shows that there is no solution to ${P_1(z)=\ldots=P_r(z)=0}$ over ${{\bf C}}$, as required. $\Box$

Remark 1 Observe that in the above argument, one could replace ${{\bf Q}}$ and ${{\bf C}}$ by any other pair of fields, with the latter containing the algebraic closure of the former, and still obtain the same result.

The above lemma asserts that if a system of rational equations is solvable at all, then it is solvable with some algebraic solution. But it gives no bound on the complexity of that solution in terms of the complexity of the original equation. Chang’s lemma provides such a bound. If ${H \geq 1}$ is an integer, let us say that an algebraic number has height at most ${H}$ if its minimal polynomial (after clearing denominators) consists of integers of magnitude at most ${H}$.

Lemma 2 (Quantitative solvability) Let ${P_1,\ldots,P_r: {\bf C}^d \rightarrow {\bf C}}$ be a finite number of polynomials of degree at most ${D}$ with rational coefficients, each of height at most ${H}$. If there is a complex solution ${z = (z_1,\ldots,z_d) \in {\bf C}^d}$ to the simultaneous system of equations

$\displaystyle P_1(z) = \ldots = P_r(z) = 0,$

then there also exists a solution ${z \in \overline{{\bf Q}}^d}$ whose coefficients are algebraic numbers of degree at most ${C}$ and height at most ${CH^C}$, where ${C = C_{D, d,r}}$ depends only on ${D}$, ${d}$ and ${r}$.

Chang proves this lemma by essentially establishing a quantitative version of the nullstellensatz, via elementary elimination theory (somewhat similar, actually, to the approach I took to the nullstellensatz in my own blog post). She also notes that one could also establish the result through the machinery of Gröbner bases. In each of these arguments, it was not possible to use Lemma 1 (or the closely related nullstellensatz) as a black box; one actually had to unpack one of the proofs of that lemma or nullstellensatz to get the polynomial bound. However, using nonstandard analysis, it is possible to get such polynomial bounds (albeit with an ineffective value of the constant ${C}$) directly from Lemma 1 (or more precisely, the generalisation in Remark 1) without having to inspect the proof, and instead simply using it as a black box, thus providing a “soft” proof of Lemma 2 that is an alternative to the “hard” proofs mentioned above.

Here’s how the proof works. Informally, the idea is that Lemma 2 should follow from Lemma 1 after replacing the field of rationals ${{\bf Q}}$ with “the field of rationals of polynomially bounded height”. Unfortunately, the latter object does not really make sense as a field in standard analysis; nevertheless, it is a perfectly sensible object in nonstandard analysis, and this allows the above informal argument to be made rigorous.

We turn to the details. As is common whenever one uses nonstandard analysis to prove finitary results, we use a “compactness and contradiction” argument (or more precisely, an “ultralimit and contradiction” argument). Suppose for contradiction that Lemma 2 failed. Carefully negating the quantifiers (and using the axiom of choice), we conclude that there exists ${D, d, r}$ such that for each natural number ${n}$, there is a positive integer ${H^{(n)}}$ and a family ${P_1^{(n)}, \ldots, P_r^{(n)}: {\bf C}^d \rightarrow {\bf C}}$ of polynomials of degree at most ${D}$ and rational coefficients of height at most ${H^{(n)}}$, such that there exist at least one complex solution ${z^{(n)} \in {\bf C}^d}$ to

$\displaystyle P_1^{(n)}(z^{(n)}) = \ldots = P_r(z^{(n)}) = 0, \ \ \ \ \ (1)$

but such that there does not exist any such solution whose coefficients are algebraic numbers of degree at most ${n}$ and height at most ${n (H^{(n)})^n}$.

Now we take ultralimits (see e.g. this previous blog post of a quick review of ultralimit analysis, which we will assume knowledge of in the argument that follows). Let ${p \in \beta {\bf N} \backslash {\bf N}}$ be a non-principal ultrafilter. For each ${i=1,\ldots,r}$, the ultralimit

$\displaystyle P_i := \lim_{n \rightarrow p} P_i^{(n)}$

of the (standard) polynomials ${P_i^{(n)}}$ is a nonstandard polynomial ${P_i: {}^* {\bf C}^d \rightarrow {}^* {\bf C}}$ of degree at most ${D}$, whose coefficients now lie in the nonstandard rationals ${{}^* {\bf Q}}$. Actually, due to the height restriction, we can say more. Let ${H := \lim_{n \rightarrow p} H^{(n)} \in {}^* {\bf N}}$ be the ultralimit of the ${H^{(n)}}$, this is a nonstandard natural number (which will almost certainly be unbounded, but we will not need to use this). Let us say that a nonstandard integer ${a}$ is of polynomial size if we have ${|a| \leq C H^C}$ for some standard natural number ${C}$, and say that a nonstandard rational number ${a/b}$ is of polynomial height if ${a}$, ${b}$ are of polynomial size. Let ${{\bf Q}_{poly(H)}}$ be the collection of all nonstandard rationals of polynomial height. (In the language of nonstandard analysis, ${{\bf Q}_{poly(H)}}$ is an external set rather than an internal one, because it is not itself an ultraproduct of standard sets; but this will not be relevant for the argument that follows.) It is easy to see that ${{\bf Q}_{poly(H)}}$ is a field, basically because the sum or product of two integers of polynomial size, remains of polynomial size. By construction, it is clear that the coefficients of ${P_i}$ are nonstandard rationals of polynomial height, and thus ${P_1,\ldots,P_r}$ are defined over ${{\bf Q}_{poly(H)}}$.

Meanwhile, if we let ${z := \lim_{n \rightarrow p} z^{(n)} \in {}^* {\bf C}^d}$ be the ultralimit of the solutions ${z^{(n)}}$ in (1), we have

$\displaystyle P_1(z) = \ldots = P_r(z) = 0,$

thus ${P_1,\ldots,P_r}$ are solvable in ${{}^* {\bf C}}$. Applying Lemma 1 (or more precisely, the generalisation in Remark 1), we see that ${P_1,\ldots,P_r}$ are also solvable in ${\overline{{\bf Q}_{poly(H)}}}$. (Note that as ${{\bf C}}$ is algebraically closed, ${{}^*{\bf C}}$ is also (by Los’s theorem), and so ${{}^* {\bf C}}$ contains ${\overline{{\bf Q}_{poly(H)}}}$.) Thus, there exists ${w \in \overline{{\bf Q}_{poly(H)}}^d}$ with

$\displaystyle P_1(w) = \ldots = P_r(w) = 0.$

As ${\overline{{\bf Q}_{poly(H)}}^d}$ lies in ${{}^* {\bf C}^d}$, we can write ${w}$ as an ultralimit ${w = \lim_{n \rightarrow p} w^{(n)}}$ of standard complex vectors ${w^{(n)} \in {\bf C}^d}$. By construction, the coefficients of ${w}$ each obey a non-trivial polynomial equation of degree at most ${C}$ and whose coefficients are nonstandard integers of magnitude at most ${C H^C}$, for some standard natural number ${C}$. Undoing the ultralimit, we conclude that for ${n}$ sufficiently close to ${p}$, the coefficients of ${w^{(n)}}$ obey a non-trivial polynomial equation of degree at most ${C}$ whose coefficients are standard integers of magnitude at most ${C (H^{(n)})^C}$. In particular, these coefficients have height at most ${C (H^{(n)})^C}$. Also, we have

$\displaystyle P_1^{(n)}(w^{(n)}) = \ldots = P_r^{(n)}(w^{(n)}) = 0.$

But for ${n}$ larger than ${C}$, this contradicts the construction of the ${P_i^{(n)}}$, and the claim follows. (Note that as ${p}$ is non-principal, any neighbourhood of ${p}$ in ${{\bf N}}$ will contain arbitrarily large natural numbers.)

Remark 2 The same argument actually gives a slightly stronger version of Lemma 2, namely that the integer coefficients used to define the algebraic solution ${z}$ can be taken to be polynomials in the coefficients of ${P_1,\ldots,P_r}$, with degree and coefficients bounded by ${C_{D,d,r}}$.

One of the key difficulties in performing analysis in infinite-dimensional function spaces, as opposed to finite-dimensional vector spaces, is that the Bolzano-Weierstrass theorem no longer holds: a bounded sequence in an infinite-dimensional function space need not have any convergent subsequences (when viewed using the strong topology). To put it another way, the closed unit ball in an infinite-dimensional function space usually fails to be (sequentially) compact.

As compactness is such a useful property to have in analysis, various tools have been developed over the years to try to salvage some sort of substitute for the compactness property in infinite-dimensional spaces. One of these tools is concentration compactness, which was discussed previously on this blog. This can be viewed as a compromise between weak compactness (which is true in very general circumstances, but is often too weak for applications) and strong compactness (which would be very useful in applications, but is usually false), in which one obtains convergence in an intermediate sense that involves a group of symmetries acting on the function space in question.

Concentration compactness is usually stated and proved in the language of standard analysis: epsilons and deltas, limits and supremas, and so forth. In this post, I wanted to note that one could also state and prove the basic foundations of concentration compactness in the framework of nonstandard analysis, in which one now deals with infinitesimals and ultralimits instead of epsilons and ordinary limits. This is a fairly mild change of viewpoint, but I found it to be informative to view this subject from a slightly different perspective. The nonstandard proofs require a fair amount of general machinery to set up, but conversely, once all the machinery is up and running, the proofs become slightly shorter, and can exploit tools from (standard) infinitary analysis, such as orthogonal projections in Hilbert spaces, or the continuous-pure point decomposition of measures. Because of the substantial amount of setup required, nonstandard proofs tend to have significantly more net complexity than their standard counterparts when it comes to basic results (such as those presented in this post), but the gap between the two narrows when the results become more difficult, and for particularly intricate and deep results it can happen that nonstandard proofs end up being simpler overall than their standard analogues, particularly if the nonstandard proof is able to tap the power of some existing mature body of infinitary mathematics (e.g. ergodic theory, measure theory, Hilbert space theory, or topological group theory) which is difficult to directly access in the standard formulation of the argument.

Many structures in mathematics are incomplete in one or more ways. For instance, the field of rationals ${{\bf Q}}$ or the reals ${{\bf R}}$ are algebraically incomplete, because there are some non-trivial algebraic equations (such as ${x^2=2}$ in the case of the rationals, or ${x^2=-1}$ in the case of the reals) which could potentially have solutions (because they do not imply a necessarily false statement, such as ${1=0}$, just using the laws of algebra), but do not actually have solutions in the specified field.

Similarly, the rationals ${{\bf Q}}$, when viewed now as a metric space rather than as a field, are also metrically incomplete, beause there exist sequences in the rationals (e.g. the decimal approximations ${3, 3.1, 3.14, 3.141, \ldots}$ of the irrational number ${\pi}$) which could potentially converge to a limit (because they form a Cauchy sequence), but do not actually converge in the specified metric space.

A third type of incompleteness is that of logical incompleteness, which applies now to formal theories rather than to fields or metric spaces. For instance, Zermelo-Frankel-Choice (ZFC) set theory is logically incomplete, because there exist statements (such as the consistency of ZFC) which could potentially be provable by the theory (because it does not lead to a contradiction, or at least so we believe, just from the axioms and deductive rules of the theory), but is not actually provable in this theory.

A fourth type of incompleteness, which is slightly less well known than the above three, is what I will call elementary incompleteness (and which model theorists call the failure of the countable saturation property). It applies to any structure that is describable by a first-order language, such as a field, a metric space, or a universe of sets. For instance, in the language of ordered real fields, the real line ${{\bf R}}$ is elementarily incomplete, because there exists a sequence of statements (such as the statements ${0 < x < 1/n}$ for natural numbers ${n=1,2,\ldots}$) in this language which are potentially simultaneously satisfiable (in the sense that any finite number of these statements can be satisfied by some real number ${x}$) but are not actually simultaneously satisfiable in this theory.

In each of these cases, though, it is possible to start with an incomplete structure and complete it to a much larger structure to eliminate the incompleteness. For instance, starting with an arbitrary field ${k}$, one can take its algebraic completion (or algebraic closure) ${\overline{k}}$; for instance, ${{\bf C} = \overline{{\bf R}}}$ can be viewed as the algebraic completion of ${{\bf R}}$. This field is usually significantly larger than the original field ${k}$, but contains ${k}$ as a subfield, and every element of ${\overline{k}}$ can be described as the solution to some polynomial equation with coefficients in ${k}$. Furthermore, ${\overline{k}}$ is now algebraically complete (or algebraically closed): every polynomial equation in ${\overline{k}}$ which is potentially satisfiable (in the sense that it does not lead to a contradiction such as ${1=0}$ from the laws of algebra), is actually satisfiable in ${\overline{k}}$.

Similarly, starting with an arbitrary metric space ${X}$, one can take its metric completion ${\overline{X}}$; for instance, ${{\bf R} = \overline{{\bf Q}}}$ can be viewed as the metric completion of ${{\bf Q}}$. Again, the completion ${\overline{X}}$ is usually much larger than the original metric space ${X}$, but contains ${X}$ as a subspace, and every element of ${\overline{X}}$ can be described as the limit of some Cauchy sequence in ${X}$. Furthermore, ${\overline{X}}$ is now a complete metric space: every sequence in ${\overline{X}}$ which is potentially convergent (in the sense of being a Cauchy sequence), is now actually convegent in ${\overline{X}}$.

In a similar vein, we have the Gödel completeness theorem, which implies (among other things) that for any consistent first-order theory ${T}$ for a first-order language ${L}$, there exists at least one completion ${\overline{T}}$ of that theory ${T}$, which is a consistent theory in which every sentence in ${L}$ which is potentially true in ${\overline{T}}$ (because it does not lead to a contradiction in ${\overline{T}}$) is actually true in ${\overline{T}}$. Indeed, the completeness theorem provides at least one model (or structure) ${{\mathfrak U}}$ of the consistent theory ${T}$, and then the completion ${\overline{T} = \hbox{Th}({\mathfrak U})}$ can be formed by interpreting every sentence in ${L}$ using ${{\mathfrak U}}$ to determine its truth value. Note, in contrast to the previous two examples, that the completion is usually not unique in any way; a theory ${T}$ can have multiple inequivalent models ${{\mathfrak U}}$, giving rise to distinct completions of the same theory.

Finally, if one starts with an arbitrary structure ${{\mathfrak U}}$, one can form an elementary completion ${{}^* {\mathfrak U}}$ of it, which is a significantly larger structure which contains ${{\mathfrak U}}$ as a substructure, and such that every element of ${{}^* {\mathfrak U}}$ is an elementary limit of a sequence of elements in ${{\mathfrak U}}$ (I will define this term shortly). Furthermore, ${{}^* {\mathfrak U}}$ is elementarily complete; any sequence of statements that are potentially simultaneously satisfiable in ${{}^* {\mathfrak U}}$ (in the sense that any finite number of statements in this collection are simultaneously satisfiable), will actually be simultaneously satisfiable. As we shall see, one can form such an elementary completion by taking an ultrapower of the original structure ${{\mathfrak U}}$. If ${{\mathfrak U}}$ is the standard universe of all the standard objects one considers in mathematics, then its elementary completion ${{}^* {\mathfrak U}}$ is known as the nonstandard universe, and is the setting for nonstandard analysis.

As mentioned earlier, completion tends to make a space much larger and more complicated. If one algebraically completes a finite field, for instance, one necessarily obtains an infinite field as a consequence. If one metrically completes a countable metric space with no isolated points, such as ${{\bf Q}}$, then one necessarily obtains an uncountable metric space (thanks to the Baire category theorem). If one takes a logical completion of a consistent first-order theory that can model true arithmetic, then this completion is no longer describable by a recursively enumerable schema of axioms, thanks to Gödel’s incompleteness theorem. And if one takes the elementary completion of a countable structure, such as the integers ${{\bf Z}}$, then the resulting completion ${{}^* {\bf Z}}$ will necessarily be uncountable.

However, there are substantial benefits to working in the completed structure which can make it well worth the massive increase in size. For instance, by working in the algebraic completion of a field, one gains access to the full power of algebraic geometry. By working in the metric completion of a metric space, one gains access to powerful tools of real analysis, such as the Baire category theorem, the Heine-Borel theorem, and (in the case of Euclidean completions) the Bolzano-Weierstrass theorem. By working in a logically and elementarily completed theory (aka a saturated model) of a first-order theory, one gains access to the branch of model theory known as definability theory, which allows one to analyse the structure of definable sets in much the same way that algebraic geometry allows one to analyse the structure of algebraic sets. Finally, when working in an elementary completion of a structure, one gains a sequential compactness property, analogous to the Bolzano-Weierstrass theorem, which can be interpreted as the foundation for much of nonstandard analysis, as well as providing a unifying framework to describe various correspondence principles between finitary and infinitary mathematics.

In this post, I wish to expand upon these above points with regard to elementary completion, and to present nonstandard analysis as a completion of standard analysis in much the same way as, say, complex algebra is a completion of real algebra, or real metric geometry is a completion of rational metric geometry.

I have blogged a number of times in the past about the relationship between finitary (or “hard”, or “quantitative”) analysis, and infinitary (or “soft”, or “qualitative”) analysis. One way to connect the two types of analysis is via compactness arguments (and more specifically, contradiction and compactness arguments); such arguments can convert qualitative properties (such as continuity) to quantitative properties (such as bounded), basically because of the fundamental fact that continuous functions on a compact space are bounded (or the closely related fact that sequentially continuous functions on a sequentially compact space are bounded).

A key stage in any such compactness argument is the following: one has a sequence ${X_n}$ of “quantitative” or “finitary” objects or spaces, and one has to somehow end up with a “qualitative” or “infinitary” limit object ${X}$ or limit space. One common way to achieve this is to embed everything inside some universal space and then use some weak compactness property of that space, such as the Banach-Alaoglu theorem (or its sequential counterpart). This is for instance the idea behind the Furstenberg correspondence principle relating ergodic theory to combinatorics; see for instance this post of mine on this topic.

However, there is a slightly different approach, which I will call ultralimit analysis, which proceeds via the machinery of ultrafilters and ultraproducts; typically, the limit objects ${X}$ one constructs are now the ultraproducts (or ultralimits) of the original objects ${X_\alpha}$. There are two main facts that make ultralimit analysis powerful. The first is that one can take ultralimits of arbitrary sequences of objects, as opposed to more traditional tools such as metric completions, which only allow one to take limits of Cauchy sequences of objects. The second fact is Los’s theorem, which tells us that ${X}$ is an elementary limit of the ${X_\alpha}$ (i.e. every sentence in first-order logic which is true for the ${X_\alpha}$ for ${\alpha}$ large enough, is true for ${X}$). This existence of elementary limits is a manifestation of the compactness theorem in logic; see this earlier blog post for more discussion. So we see that compactness methods and ultrafilter methods are closely intertwined. (See also my earlier class notes for a related connection between ultrafilters and compactness.)

Ultralimit analysis is very closely related to nonstandard analysis. I already discussed some aspects of this relationship in an earlier post, and will expand upon it at the bottom of this post. Roughly speaking, the relationship between ultralimit analysis and nonstandard analysis is analogous to the relationship between measure theory and probability theory.

To illustrate how ultralimit analysis is actually used in practice, I will show later in this post how to take a qualitative infinitary theory – in this case, basic algebraic geometry – and apply ultralimit analysis to then deduce a quantitative version of this theory, in which the complexity of the various algebraic sets and varieties that appear as outputs are controlled uniformly by the complexity of the inputs. The point of this exercise is to show how ultralimit analysis allows for a relatively painless conversion back and forth between the quantitative and qualitative worlds, though in some cases the quantitative translation of a qualitative result (or vice versa) may be somewhat unexpected. In an upcoming paper of myself, Ben Green, and Emmanuel Breuillard (announced in the previous blog post), we will rely on ultralimit analysis to reduce the messiness of various quantitative arguments by replacing them with a qualitative setting in which the theory becomes significantly cleaner.

For sake of completeness, I also redo some earlier instances of the correspondence principle via ultralimit analysis, namely the deduction of the quantitative Gromov theorem from the qualitative one, and of Szemerédi’s theorem from the Furstenberg recurrence theorem, to illustrate how close the two techniques are to each other.

One of the most basic theorems in linear algebra is that every finite-dimensional vector space has a finite basis. Let us give a statement of this theorem in the case when the underlying field is the rationals:

Theorem 1 (Finite generation implies finite basis, infinitary version) Let ${V}$ be a vector space over the rationals ${{\mathbb Q}}$, and let ${v_1,\ldots,v_n}$ be a finite collection of vectors in ${V}$. Then there exists a collection ${w_1,\ldots,w_k}$ of vectors in ${V}$, with ${1 \leq k \leq n}$, such that

• (${w}$ generates ${v}$) Every ${v_j}$ can be expressed as a rational linear combination of the ${w_1,\ldots,w_k}$.
• (${w}$ independent) There is no non-trivial linear relation ${a_1 w_1 + \ldots + a_k w_k = 0}$, ${a_1,\ldots,a_k \in {\mathbb Q}}$ among the ${w_1,\ldots,w_m}$ (where non-trivial means that the ${a_i}$ are not all zero).

In fact, one can take ${w_1,\ldots,w_m}$ to be a subset of the ${v_1,\ldots,v_n}$.

Proof: We perform the following “rank reduction argument”. Start with ${w_1,\ldots,w_k}$ initialised to ${v_1,\ldots,v_n}$ (so initially we have ${k=n}$). Clearly ${w}$ generates ${v}$. If the ${w_i}$ are linearly independent then we are done. Otherwise, there is a non-trivial linear relation between them; after shuffling things around, we see that one of the ${w_i}$, say ${w_k}$, is a rational linear combination of the ${w_1,\ldots,w_{k-1}}$. In such a case, ${w_k}$ becomes redundant, and we may delete it (reducing the rank ${k}$ by one). We repeat this procedure; it can only run for at most ${n}$ steps and so terminates with ${w_1,\ldots,w_m}$ obeying both of the desired properties. $\Box$

In additive combinatorics, one often wants to use results like this in finitary settings, such as that of a cyclic group ${{\mathbb Z}/p{\mathbb Z}}$ where ${p}$ is a large prime. Now, technically speaking, ${{\mathbb Z}/p{\mathbb Z}}$ is not a vector space over ${{\mathbb Q}}$, because one only multiply an element of ${{\mathbb Z}/p{\mathbb Z}}$ by a rational number if the denominator of that rational does not divide ${p}$. But for ${p}$ very large, ${{\mathbb Z}/p{\mathbb Z}}$ “behaves” like a vector space over ${{\mathbb Q}}$, at least if one restricts attention to the rationals of “bounded height” – where the numerator and denominator of the rationals are bounded. Thus we shall refer to elements of ${{\mathbb Z}/p{\mathbb Z}}$ as “vectors” over ${{\mathbb Q}}$, even though strictly speaking this is not quite the case.

On the other hand, saying that one element of ${{\mathbb Z}/p{\mathbb Z}}$ is a rational linear combination of another set of elements is not a very interesting statement: any non-zero element of ${{\mathbb Z}/p{\mathbb Z}}$ already generates the entire space! However, if one again restricts attention to rational linear combinations of bounded height, then things become interesting again. For instance, the vector ${1}$ can generate elements such as ${37}$ or ${\frac{p-1}{2}}$ using rational linear combinations of bounded height, but will not be able to generate such elements of ${{\mathbb Z}/p{\mathbb Z}}$ as ${\lfloor\sqrt{p}\rfloor}$ without using rational numbers of unbounded height.

For similar reasons, the notion of linear independence over the rationals doesn’t initially look very interesting over ${{\mathbb Z}/p{\mathbb Z}}$: any two non-zero elements of ${{\mathbb Z}/p{\mathbb Z}}$ are of course rationally dependent. But again, if one restricts attention to rational numbers of bounded height, then independence begins to emerge: for instance, ${1}$ and ${\lfloor\sqrt{p}\rfloor}$ are independent in this sense.

Thus, it becomes natural to ask whether there is a “quantitative” analogue of Theorem 1, with non-trivial content in the case of “vector spaces over the bounded height rationals” such as ${{\mathbb Z}/p{\mathbb Z}}$, which asserts that given any bounded collection ${v_1,\ldots,v_n}$ of elements, one can find another set ${w_1,\ldots,w_k}$ which is linearly independent “over the rationals up to some height”, such that the ${v_1,\ldots,v_n}$ can be generated by the ${w_1,\ldots,w_k}$ “over the rationals up to some height”. Of course to make this rigorous, one needs to quantify the two heights here, the one giving the independence, and the one giving the generation. In order to be useful for applications, it turns out that one often needs the former height to be much larger than the latter; exponentially larger, for instance, is not an uncommon request. Fortunately, one can accomplish this, at the cost of making the height somewhat large:

Theorem 2 (Finite generation implies finite basis, finitary version) Let ${n \geq 1}$ be an integer, and let ${F: {\mathbb N} \rightarrow {\mathbb N}}$ be a function. Let ${V}$ be an abelian group which admits a well-defined division operation by any natural number of size at most ${C(F,n)}$ for some constant ${C(F,n)}$ depending only on ${F,n}$; for instance one can take ${V = {\mathbb Z}/p{\mathbb Z}}$ for ${p}$ a prime larger than ${C(F,n)}$. Let ${v_1,\ldots,v_n}$ be a finite collection of “vectors” in ${V}$. Then there exists a collection ${w_1,\ldots,w_k}$ of vectors in ${V}$, with ${1 \leq k \leq n}$, as well an integer ${M \geq 1}$, such that

• (Complexity bound) ${M \leq C(F,n)}$ for some ${C(F,n)}$ depending only on ${F, n}$.
• (${w}$ generates ${v}$) Every ${v_j}$ can be expressed as a rational linear combination of the ${w_1,\ldots,w_k}$ of height at most ${M}$ (i.e. the numerator and denominator of the coefficients are at most ${M}$).
• (${w}$ independent) There is no non-trivial linear relation ${a_1 w_1 + \ldots + a_k w_k = 0}$ among the ${w_1,\ldots,w_k}$ in which the ${a_1,\ldots,a_k}$ are rational numbers of height at most ${F(M)}$.

In fact, one can take ${w_1,\ldots,w_k}$ to be a subset of the ${v_1,\ldots,v_n}$.

Proof: We perform the same “rank reduction argument” as before, but translated to the finitary setting. Start with ${w_1,\ldots,w_k}$ initialised to ${v_1,\ldots,v_n}$ (so initially we have ${k=n}$), and initialise ${M=1}$. Clearly ${w}$ generates ${v}$ at this height. If the ${w_i}$ are linearly independent up to rationals of height ${F(M)}$ then we are done. Otherwise, there is a non-trivial linear relation between them; after shuffling things around, we see that one of the ${w_i}$, say ${w_k}$, is a rational linear combination of the ${w_1,\ldots,w_{k-1}}$, whose height is bounded by some function depending on ${F(M)}$ and ${k}$. In such a case, ${w_k}$ becomes redundant, and we may delete it (reducing the rank ${k}$ by one), but note that in order for the remaining ${w_1,\ldots,w_{k-1}}$ to generate ${v_1,\ldots,v_n}$ we need to raise the height upper bound for the rationals involved from ${M}$ to some quantity ${M'}$ depending on ${M, F(M), k}$. We then replace ${M}$ by ${M'}$ and continue the process. We repeat this procedure; it can only run for at most ${n}$ steps and so terminates with ${w_1,\ldots,w_m}$ and ${M}$ obeying all of the desired properties. (Note that the bound on ${M}$ is quite poor, being essentially an ${n}$-fold iteration of ${F}$! Thus, for instance, if ${F}$ is exponential, then the bound on ${M}$ is tower-exponential in nature.) $\Box$

(A variant of this type of approximate basis lemma was used in my paper with Van Vu on the singularity probability of random Bernoulli matrices.)

Looking at the statements and proofs of these two theorems it is clear that the two results are in some sense the “same” result, except that the latter has been made sufficiently quantitative that it is meaningful in such finitary settings as ${{\mathbb Z}/p{\mathbb Z}}$. In this note I will show how this equivalence can be made formal using the language of non-standard analysis. This is not a particularly deep (or new) observation, but it is perhaps the simplest example I know of that illustrates how nonstandard analysis can be used to transfer a quantifier-heavy finitary statement, such as Theorem 2, into a quantifier-light infinitary statement, such as Theorem 1, thus lessening the need to perform “epsilon management” duties, such as keeping track of unspecified growth functions such as ${F}$. This type of transference is discussed at length in this previous blog post of mine.

In this particular case, the amount of effort needed to set up the nonstandard machinery in order to reduce Theorem 2 from Theorem 1 is too great for this transference to be particularly worthwhile, especially given that Theorem 2 has such a short proof. However, when performing a particularly intricate argument in additive combinatorics, in which one is performing a number of “rank reduction arguments”, “energy increment arguments”, “regularity lemmas”, “structure theorems”, and so forth, the purely finitary approach can become bogged down with all the epsilon management one needs to do to organise all the parameters that are flying around. The nonstandard approach can efficiently hide a large number of these parameters from view, and it can then become worthwhile to invest in the nonstandard framework in order to clean up the rest of a lengthy argument. Furthermore, an advantage of moving up to the infinitary setting is that one can then deploy all the firepower of an existing well-developed infinitary theory of mathematics (in this particular case, this would be the theory of linear algebra) out of the box, whereas in the finitary setting one would have to painstakingly finitise each aspect of such a theory that one wished to use (imagine for instance trying to finitise the rank-nullity theorem for rationals of bounded height).

The nonstandard approach is very closely related to use of compactness arguments, or of the technique of taking ultralimits and ultraproducts; indeed we will use an ultrafilter in order to create the nonstandard model in the first place.

I will also discuss a two variants of both Theorem 1 and Theorem 2 which have actually shown up in my research. The first is that of the regularity lemma for polynomials over finite fields, which came up when studying the equidistribution of such polynomials (in this paper with Ben Green). The second comes up when is dealing not with a single finite collection ${v_1,\ldots,v_n}$ of vectors, but rather with a family ${(v_{h,1},\ldots,v_{h,n})_{h \in H}}$ of such vectors, where ${H}$ ranges over a large set; this gives rise to what we call the sunflower lemma, and came up in this recent paper of myself, Ben Green, and Tamar Ziegler.

This post is mostly concerned with nonstandard translations of the “rank reduction argument”. Nonstandard translations of the “energy increment argument” and “density increment argument” were briefly discussed in this recent post; I may return to this topic in more detail in a future post.

In the course of the ongoing logic reading seminar at UCLA, I learned about the property of countable saturation. A model ${M}$ of a language ${L}$ is countably saturated if, every countable sequence ${P_1(x), P_2(x), \ldots}$ of formulae in ${L}$ (involving countably many constants in ${M}$) which is finitely satisfiable in ${M}$ (i.e. any finite collection ${P_1(x),\ldots,P_n(x)}$ in the sequence has a solution ${x}$ in ${M}$), is automatically satisfiable in ${M}$ (i.e. there is a solution ${x}$ to all ${P_n(x)}$ simultaneously). Equivalently, a model is countably saturated if the topology generated by the definable sets is countably compact.

Update, Nov 19: I have learned that the above terminology is not quite accurate; countable saturation allows for an uncountable sequence of formulae, as long as the constants used remain finite. So, the discussion here involves a weaker property than countable saturation, which I do not know the official term for. If one chooses a special type of ultrafilter, namely a “countably incomplete” ultrafilter, one can recover the full strength of countable saturation, though it is not needed for the remarks here. Most models are not countably saturated. Consider for instance the standard natural numbers ${{\Bbb N}}$ as a model for arithmetic. Then the sequence of formulae “${x > n}$” for ${n=1,2,3,\ldots}$ is finitely satisfiable in ${{\Bbb N}}$, but not satisfiable.

However, if one takes a model ${M}$ of ${L}$ and passes to an ultrapower ${*M}$, whose elements ${x}$ consist of sequences ${(x_n)_{n \in {\Bbb N}}}$ in ${M}$, modulo equivalence with respect to some fixed non-principal ultrafilter ${p}$, then it turns out that such models are automatically countably compact. Indeed, if ${P_1(x), P_2(x), \ldots}$ are finitely satisfiable in ${*M}$, then they are also finitely satisfiable in ${M}$ (either by inspection, or by appeal to Los’s theorem and/or the transfer principle in non-standard analysis), so for each ${n}$ there exists ${x_n \in M}$ which satisfies ${P_1,\ldots,P_n}$. Letting ${x = (x_n)_{n \in {\Bbb N}} \in *M}$ be the ultralimit of the ${x_n}$, we see that ${x}$ satisfies all of the ${P_n}$ at once.

In particular, non-standard models of mathematics, such as the non-standard model ${*{\Bbb N}}$ of the natural numbers, are automatically countably saturated.

This has some cute consequences. For instance, suppose one has a non-standard metric space ${*X}$ (an ultralimit of standard metric spaces), and suppose one has a standard sequence ${(x_n)_{n \in {\mathbb N}}}$ of elements of ${*X}$ which are standard-Cauchy, in the sense that for any standard ${\varepsilon > 0}$ one has ${d( x_n, x_m ) < \varepsilon}$ for all sufficiently large ${n,m}$. Then there exists a non-standard element ${x \in *X}$ such that ${x_n}$ standard-converges to ${x}$ in the sense that for every standard ${\varepsilon > 0}$ one has ${d(x_n, x) < \varepsilon}$ for all sufficiently large ${n}$. Indeed, from the standard-Cauchy hypothesis, one can find a standard ${\varepsilon(n) > 0}$ for each standard ${n}$ that goes to zero (in the standard sense), such that the formulae “${d(x_n,x) < \varepsilon(n)}$” are finitely satisfiable, and hence satisfiable by countable saturation. Thus we see that non-standard metric spaces are automatically “standardly complete” in some sense.

This leads to a non-standard structure theorem for Hilbert spaces, analogous to the orthogonal decomposition in Hilbert spaces:

Theorem 1 (Non-standard structure theorem for Hilbert spaces) Let ${*H}$ be a non-standard Hilbert space, let ${S}$ be a bounded (external) subset of ${*H}$, and let ${x \in H}$. Then there exists a decomposition ${x = x_S + x_{S^\perp}}$, where ${x_S \in *H}$ is “almost standard-generated by ${S}$” in the sense that for every standard ${\varepsilon > 0}$, there exists a standard finite linear combination of elements of ${S}$ which is within ${\varepsilon}$ of ${S}$, and ${x_{S^\perp} \in *H}$ is “standard-orthogonal to ${S}$” in the sense that ${\langle x_{S^\perp}, s\rangle = o(1)}$ for all ${s \in S}$.

Proof: Let ${d}$ be the infimum of all the (standard) distances from ${x}$ to a standard linear combination of elements of ${S}$, then for every standard ${n}$ one can find a standard linear combination ${x_n}$ of elements of ${S}$ which lie within ${d+1/n}$ of ${x}$. From the parallelogram law we see that ${x_n}$ is standard-Cauchy, and thus standard-converges to some limit ${x_S \in *H}$, which is then almost standard-generated by ${S}$ by construction. An application of Pythagoras then shows that ${x_{S^\perp} := x-x_S}$ is standard-orthogonal to every element of ${S}$. $\Box$

This is the non-standard analogue of a combinatorial structure theorem for Hilbert spaces (see e.g. Theorem 2.6 of my FOCS paper). There is an analogous non-standard structure theorem for ${\sigma}$-algebras (the counterpart of Theorem 3.6 from that paper) which I will not discuss here, but I will give just one sample corollary:

Theorem 2 (Non-standard arithmetic regularity lemma) Let ${*G}$ be a non-standardly finite abelian group, and let ${f: *G \rightarrow [0,1]}$ be a function. Then one can split ${f = f_{U^\perp} + f_U}$, where ${f_U: *G \rightarrow [-1,1]}$ is standard-uniform in the sense that all Fourier coefficients are (uniformly) ${o(1)}$, and ${f_{U^\perp}: *G \rightarrow [0,1]}$ is standard-almost periodic in the sense that for every standard ${\varepsilon > 0}$, one can approximate ${f_{U^\perp}}$ to error ${\varepsilon}$ in ${L^1(*G)}$ norm by a standard linear combination of characters (which is also bounded).

This can be used for instance to give a non-standard proof of Roth’s theorem (which is not much different from the “finitary ergodic” proof of Roth’s theorem, given for instance in Section 10.5 of my book with Van Vu). There is also a non-standard version of the Szemerédi regularity lemma which can be used, among other things, to prove the hypergraph removal lemma (the proof then becomes rather close to the infinitary proof of this lemma in this paper of mine). More generally, the above structure theorem can be used as a substitute for various “energy increment arguments” in the combinatorial literature, though it does not seem that there is a significant saving in complexity in doing so unless one is performing quite a large number of these arguments.

One can also cast density increment arguments in a nonstandard framework. Here is a typical example. Call a non-standard subset ${X}$ of a non-standard finite set ${Y}$ dense if one has ${|X| \geq \varepsilon |Y|}$ for some standard ${\varepsilon > 0}$.

Theorem 3 Suppose Szemerédi’s theorem (every set of integers of positive upper density contains an arithmetic progression of length ${k}$) fails for some ${k}$. Then there exists an unbounded non-standard integer ${N}$, a dense subset ${A}$ of ${[N] := \{1,\ldots,N\}}$ with no progressions of length ${k}$, and with the additional property that

$\displaystyle \frac{|A \cap P|}{|P|} \leq \frac{|A \cap [N]|}{N} + o(1)$

for any subprogression ${P}$ of ${[N]}$ of unbounded size (thus there is no sizeable density increment on any large progression).

Proof: Let ${B \subset {\Bbb N}}$ be a (standard) set of positive upper density which contains no progression of length ${k}$. Let ${\delta := \limsup_{|P| \rightarrow \infty} |B \cap P|/|P|}$ be the asymptotic maximal density of ${B}$ inside a long progression, thus ${\delta > 0}$. For any ${n > 0}$, one can then find a standard integer ${N_n \geq n}$ and a standard subset ${A_n}$ of ${[N_n]}$ of density at least ${\delta-1/n}$ such that ${A_n}$ can be embedded (after a linear transformation) inside ${B}$, so in particular ${A_n}$ has no progressions of length ${k}$. Applying the saturation property, one can then find an unbounded ${N}$ and a set ${A}$ of ${[N]}$ of density at least ${\delta-1/n}$ for every standard ${n}$ (i.e. of density at least ${\delta-o(1)}$) with no progressions of length ${k}$. By construction, we also see that for any subprogression ${P}$ of ${[N]}$ of unbounded size, ${A}$ hs density at most ${\delta+1/n}$ for any standard ${n}$, thus has density at most ${\delta+o(1)}$, and the claim follows. $\Box$

This can be used as the starting point for any density-increment based proof of Szemerédi’s theorem for a fixed ${k}$, e.g. Roth’s proof for ${k=3}$, Gowers’ proof for arbitrary ${k}$, or Szemerédi’s proof for arbitrary ${k}$. (It is likely that Szemerédi’s proof, in particular, simplifies a little bit when translated to the non-standard setting, though the savings are likely to be modest.)

I’m also hoping that the recent results of Hrushovski on the noncommutative Freiman problem require only countable saturation, as this makes it more likely that they can be translated to a non-standard setting and thence to a purely finitary framework.

Last year on this blog, I sketched out a non-rigorous probabilistic argument justifying the following well-known theorem:

Theorem 1. (Non-measurable sets exist) There exists a subset $E$ of the unit interval ${}[0,1]$ which is not  Lebesgue-measurable.

The idea was to let E be a “random” subset of ${}[0,1]$.  If one (non-rigorously) applies the law of large numbers, one expects E to have “density” 1/2 with respect to every subinterval of ${}[0,1]$, which would contradict the Lebesgue differentiation theorem.

I was recently asked whether I could in fact make the above argument rigorous. This turned out to be more difficult than I had anticipated, due to some technicalities in trying to make the concept of a random subset of ${}[0,1]$ (which requires an uncountable number of “coin flips” to generate) both rigorous and useful.  However, there is a simpler variant of the above argument which can be made rigorous.  Instead of letting E be a “random” subset of ${}[0,1]$, one takes E to be an “alternating” set that contains “every other” real number in ${}[0,1]$; this again should have density 1/2 in every subinterval and thus again contradict the Lebesgue differentiation theorem.

Of course, in the standard model of the real numbers, it makes no sense to talk about “every other” or “every second” real number, as the real numbers are not discrete.  If however one employs the language of non-standard analysis, then it is possible to make the above argument rigorous, and this is the purpose of my post today. I will assume some basic familiarity with non-standard analysis, for instance as discussed in this earlier post of mine.

This post is in some ways an antithesis of my previous postings on hard and soft analysis. In those posts, the emphasis was on taking a result in soft analysis and converting it into a hard analysis statement (making it more “quantitative” or “effective”); here we shall be focusing on the reverse procedure, in which one harnesses the power of infinitary mathematics – in particular, ultrafilters and nonstandard analysis – to facilitate the proof of finitary statements.

Arguments in hard analysis are notorious for their profusion of “epsilons and deltas”. In the more sophisticated arguments of this type, one can end up having an entire army of epsilons $\epsilon_1, \epsilon_2, \epsilon_3, \ldots$ that one needs to manage, in particular choosing each epsilon carefully to be sufficiently small compared to other parameters (including other epsilons), while of course avoiding an impossibly circular situation in which a parameter is ultimately required to be small with respect to itself, which is absurd. This art of epsilon management, once mastered, is not terribly difficult – it basically requires one to mentally keep track of which quantities are “small”, “very small”, “very very small”, and so forth – but when these arguments get particularly lengthy, then epsilon management can get rather tedious, and also has the effect of making these arguments unpleasant to read. In particular, any given assertion in hard analysis usually comes with a number of unsightly quantifiers (For every $\epsilon$ there exists an N…) which can require some thought for a reader to parse. This is in contrast with soft analysis, in which most of the quantifiers (and the epsilons) can be cleanly concealed via the deployment of some very useful terminology; consider for instance how many quantifiers and epsilons are hidden within, say, the Heine-Borel theorem: “a subset of a Euclidean space is compact if and only if it is closed and bounded”.

For those who practice hard analysis for a living (such as myself), it is natural to wonder if one can somehow “clean up” or “automate” all the epsilon management which one is required to do, and attain levels of elegance and conceptual clarity comparable to those in soft analysis, hopefully without sacrificing too much of the “elementary” or “finitary” nature of hard analysis in the process.