Szemerédi’s theorem asserts that any subset of the integers of positive upper density contains arbitrarily large arithmetic progressions. Here is an equivalent quantitative form of this theorem:
for some depending only on .
The equivalence is basically thanks to an averaging argument of Varnavides; see for instance Chapter 11 of my book with Van Vu or this previous blog post for a discussion. We have removed the cases as they are trivial and somewhat degenerate.
There are now many proofs of this theorem. Some time ago, I took an ergodic-theoretic proof of Furstenberg and converted it to a purely finitary proof of the theorem. The argument used some simplifying innovations that had been developed since the original work of Furstenberg (in particular, deployment of the Gowers uniformity norms, as well as a “dual” norm that I called the uniformly almost periodic norm, and an emphasis on van der Waerden’s theorem for handling the “compact extension” component of the argument). But the proof was still quite messy. However, as discussed in this previous blog post, messy finitary proofs can often be cleaned up using nonstandard analysis. Thus, there should be a nonstandard version of the Furstenberg ergodic theory argument that is relatively clean. I decided (after some encouragement from Ben Green and Isaac Goldbring) to write down most of the details of this argument in this blog post, though for sake of brevity I will skim rather quickly over arguments that were already discussed at length in other blog posts. In particular, I will presume familiarity with nonstandard analysis (in particular, the notion of a standard part of a bounded real number, and the Loeb measure construction), see for instance this previous blog post for a discussion.
Here of course the averaging notation is interpreted internally.
Indeed, if Theorem 1 failed, one could create a sequence of functions of density at least for some fixed , and a fixed such that
taking ultralimits one can then soon obtain a counterexample to Theorem 2.
It remains to prove Theorem 2. Henceforth is a fixed nonstandard positive integer, and . By the Loeb measure construction (discussed in this previous blog post), one can give the structure of a probability space (the Loeb space of ), such that every internal subset of is (Loeb) measurable with
which implies that any bounded internal function has standard part which is (Loeb) measurable with
Conversely, a countable saturation argument shows that any function in is equal almost everywhere to the standard part of a bounded internal function.
From Hölder’s inequality we see that the -linear form
vanishes if one of the has standard part vanishing almost everywhere. As such, we can (by abuse of notation) extend this -linear form to functions that are elements of , rather than bounded internal functions. With this convention, we see that Theorem 2 is equivalent to the following assertion.
The next step is to introduce the Gowers-Host-Kra uniformity seminorms , defined for by the formula
where is any bounded internal function whose standard part equals almost everywhere. From Hölder’s inequality one can see that the exact choice of does not matter, so that this seminorm is well-defined. (It is indeed a seminorm, but we will not need this fact here.)
We have the following application of the van der Corput inequality:
Theorem 4 (Generalised von Neumann theorem) Let be standard. For any with for some , one has
This estimate is proven in numerous places in the literature (e.g. Lemma 11.4 of my book with Van Vu, or Exercise 23 of this blog post) and will not be repeated here. In particular, from multilinearity we see that
whenever with .
Dual to the Gowers norms are the uniformly almost periodic norms . Let us first define the internal version of these norms. We define to be the space of constant internal functions , with internal norm . Once is defined for some , we define to be the internal normed vector space of internal functions for which there exists a nonstandard real number , an internally finite non-empty set , an internal family of internal functions bounded in magnitude by one for each , and an internal family of internal functions in the unit ball of such that one had the representation
for all , where is the shift of by . The internal infimum of all such is then the norm of . This gives each of the the structure of an internal shift-invariant Banach algebra; see Section 5 of . The norms also controlled the supremum norm:
In particular, if we write for the space of standard parts of internal functions of bounded norm in , then is an (external) Banach algebra contained (as a real vector space) in . For , we can then define a factor of to be the probability space , where is the subalgebra of consisting of those sets such that lies in the closure of . This is easily seen to be a shift-invariant -algebra, and so is a factor.
We have the following key characteristic factor relationship:
Theorem 5 Let with . Then .
In fact the converse implication is true also (making the universal characteristic factor for the seminorm), but we will not need this direction of the implication.
Proof: Suppose for contradiction that ; we can normalise . Writing for some bounded internal , we then see that has a non-zero inner product with , where the dual function for is the bounded internal function
From the easily verified identity
and a routine induction, we see that lies in the unit ball of , and so is measurable with respect to . By hypothesis this implies that is orthogonal to , a contradiction.
We only apply this theorem in the case and , but for inductive purposes it is convenient to decouple the two parameters.
We prove Theorem 6 by induction on (allowing to be arbitrary). When , the claim is obvious for any because all functions in are essentially constant. Now suppose that and that the claim has already been proven for .
Let be a nonnegative function whose mean is positive; we may normalise to take values in . Let be standard, and let be a sufficiently small standard quantity depending on to be chosen later (one could for instance take , but we will not attempt to optimise in ). As is -measurable, one can find an internal function with and bounded norm such that . (Note though that while the norm of is bounded, this bound could be extremely large compared to , , .)
Set . We define the relative inner product for by the formula
and the relative norm
A crucial point is that the function is relatively almost periodic over the previous characteristic factor , in the following sense.
Proposition 7 (Relative almost periodicity) There exists a standard natural number and functions in the unit ball of with the following “relative total boundedness” property: for any , there exists a -measurable function such that almost everywhere (where is short-hand for ).
Proof: This will be a relative version of the standard analysis fact that integral operators on finite measure spaces with bounded kernel are in the Hilbert-Schmidt class, and thus compact.
By construction, there exists an internally finite non-empty set , an internal collection of internal functions that are uniformly bounded in , and an internal collection of internal functions that are uniformly bounded in , such that
for all . Note in particular that the all lie in a bounded subset of , and the all lie in a bounded subset of .
We give the -algebra generated from the standard parts of bounded internal functions such that the standard parts of all lie in a bounded subset of ; this gives a probability space that extends the product measure of and . We define an operator as follows. If , then is the standard part of some bounded internal function . We then define by the formula
This can easily be seen to not depend on the choice of , and defines a -linear operator (embedding into both and in the obvious fashion). Note that lies in the range of applied to a function in the unit ball of .
for all finite collections of functions that are relatively orthonormal over in the sense that
for all and . Indeed, the left-hand side of (4) may be expanded first as
for some sequence in with , and then as
where we use Loeb measure on and is the function , and are lifted up to in the obvious fashion. By Cauchy-Schwarz and the boundedness of , we can bound this by
But the are relatively orthonormal over (this reflects the relative orthogonality of and over ), so that
and the claim follows from the hypotheses on .
Using the relative spectral theorem for relative Hilbert-Schmidt operators (see Corollary 17 of this blog post), we may thus find relatively orthonormal systems in and respectively over and a non-increasing sequence of non-negative coefficients (the relative singular values) with almost everywhere, such that we have the spectral decomposition
wiht the sum converging in . (If were standard Borel spaces, one could deduce this theorem from the usual spectral theorem for Hilbert-Schmidt operators using disintegration. Loeb spaces are certainly not standard Borel, but as discussed in the linked blog post above, one can adapt the proof of the spectral theorem to the relative setting without using the device of disintegration.
Since and the are decreasing, one can find an such that almost everywhere for all . For in the unit ball of , this lets one approximate by the finite rank operator to within almost everywhere in norm. If one rounds to the nearest multiple of for each , and lets be the collection of linear combinations of the form with a multiple of , we obtain the claim.
We return to the proof of (2). Since and , we have
if is small enough. In particular there is a -measurable set of measure at least such that on . Since
on , where we write
almost everywhere on .
Let be a sufficiently large standard natural number (depending on and the quantity from Proposition 7), in fact it will essentially be a van der Waerden number of these inputs) to be chosen later. Applying the induction hypothesis, we have
In particular, there is a standard , such that for in a subset of of measure at least , we have
or equivalently that the set
has measure at least .
Let be as above, and let be the functions from Proposition 7. Then for , we can find a measurable function such that
almost everywhere on , hence by (5) we have
almost everywhere on . From this and the relative Hölder inequality, we see that
a.e. on whenever .
Now, for large enough, we see from van der Warden’s theorem that there exist measurable such that
almost everywhere in , and hence in (this can be seen by partitioning into finitely many pieces, with each of the constant on each of these pieces). For that choice of we have
almost everywhere on . But from (6) one has
a.e. on , so from Hölder’s inequality we have (for sufficiently small) that
From non-negativity of , this implies that
which on integrating in gives
Averaging in , we conclude that
Shifting by , we conclude that
Dilating by (and noting that the map is at most -to-one on ), we conclude that
and (2) follows.