Roth’s theorem on arithmetic progressions asserts that every subset of the integers of positive upper density contains infinitely many arithmetic progressions of length three. There are many versions and variants of this theorem. Here is one of them:
Theorem 1 (Roth’s theorem) Let be a compact abelian group, with Haar probability measure , which is -divisible (i.e. the map is surjective) and let be a measurable subset of with for some . Then we have
where denotes the bound for some depending only on .
This theorem is usually formulated in the case that is a finite abelian group of odd order (in which case the result is essentially due to Meshulam) or more specifically a cyclic group of odd order (in which case it is essentially due to Varnavides), but is also valid for the more general setting of -divisible compact abelian groups, as we shall shortly see. One can be more precise about the dependence of the implied constant on , but to keep the exposition simple we will work at the qualitative level here, without trying at all to get good quantitative bounds. The theorem is also true without the -divisibility hypothesis, but the proof we will discuss runs into some technical issues due to the degeneracy of the shift in that case.
We can deduce Theorem 1 from the following more general Khintchine-type statement. Let denote the Pontryagin dual of a compact abelian group , that is to say the set of all continuous homomorphisms from to the (additive) unit circle . Thus is a discrete abelian group, and functions have a Fourier transform defined by
If is -divisible, then is -torsion-free in the sense that the map is injective. For any finite set and any radius , define the Bohr set
where denotes the distance of to the nearest integer. We refer to the cardinality of as the rank of the Bohr set. We record a simple volume bound on Bohr sets:
Proof: We can cover the torus by translates of the cube . Then the sets form an cover of . But all of these sets lie in a translate of , and the claim then follows from the translation invariance of .
where is the constant such that
The function should be viewed as an -normalised “tent function” cutoff to . Note from Lemma 2 that
We then have the following sharper version of Theorem 1:
where denotes the convolution operation
A variant of this result (expressed in the language of ergodic theory) appears in this paper of Bergelson, Host, and Kra; a combinatorial version of the Bergelson-Host-Kra result that is closer to Theorem 3 subsequently appeared in this paper of Ben Green and myself, but this theorem arguably appears implicitly in a much older paper of Bourgain. To see why Theorem 3 implies Theorem 1, we apply the theorem with and equal to a small multiple of to conclude that there is a Bohr set with and such that
Below the fold, we give a short proof of Theorem 3, using an “energy pigeonholing” argument that essentially dates back to the 1986 paper of Bourgain mentioned previously (not to be confused with a later 1999 paper of Bourgain on Roth’s theorem that was highly influential, for instance in emphasising the importance of Bohr sets). The idea is to use the pigeonhole principle to choose the Bohr set to capture all the “large Fourier coefficients” of , but such that a certain “dilate” of does not capture much more Fourier energy of than itself. The bound (3) may then be obtained through elementary Fourier analysis, without much need to explicitly compute things like the Fourier transform of an indicator function of a Bohr set. (However, the bound obtained by this argument is going to be quite poor – of tower-exponential type.) To do this we perform a structural decomposition of into “structured”, “small”, and “highly pseudorandom” components, as is common in the subject (e.g. in this previous blog post), but even though we crucially need to retain non-negativity of one of the components in this decomposition, we can avoid recourse to conditional expectation with respect to a partition (or “factor”) of the space, using instead convolution with one of the considered above to achieve a similar effect.
— 1. Proof of theorem —
Fix . Let be a large integer (depending only on ) to be chosen later. We will need a sequence
of small parameters, with each assumed to be sufficiently small depending on all previous and on . (One should think of the as shrinking to zero incredibly fast, e.g. and for .)
We then define an increasing sequence
of finite sets of frequencies, as follows.
- (i) Initialise .
- (ii) Once has been selected for some , introduce the function
Note that this is a square-integrable function, thanks to (2).
- (iii) Define to be the set
Note from Plancherel’s theorem that this is a finite set.
- (iv) If , increment to and return to step (ii).
An easy induction using Plancherel’s theorem and (2) repeatedly gives the bounds
for all .
We now decompose
One can think of as the “low frequency” (or “structured”, or “quasiperiodic”), “medium frequency” (or “small”), and “high frequency” (or “highly pseudorandom”) components of , where the notion of what it means for a frequency to be high or low is not determined by some preconceived notion of magnitude, but is instead adapted to the function itself. We will think of as the “main” term, and as “errors”.
Now we give further bounds on . We begin with :
Proof: Let . From the definition of and elementary Fourier analysis we have
If , then from (5) we have
and the claim follows (since has total mass one). Now suppose instead that . Then from (4) we have
for all in the support of , and thus
and the claim again follows.
Now we control :
for such , so this contribution to (9) is certainly acceptable thanks to (6). Now suppose that . The argument in the proof of Lemma 4 shows that and , and so this contribution is again acceptable thanks to (6).
Finally, we record the key properties of :
whenever and .
Proof: The first two properties are obvious, so we turn to the third. Since and is bounded, it suffices to show that
whenever and . But from (1) and the triangle inequality we have
for such , and the claim follows from (2).
for some .
Let us consider a term for which . We can bound this term by
which is thanks to Lemma 5. Similar arguments hold for terms with or , after shifting by or .
Now let us consider a term for which , that is to say
A short Fourier-analytic calculation reveals that this integral may be written as a sum
From (2) we have
and thus by Plancherel
Similarly, from (6) we have
and hence by Cauchy-Schwarz and the -torsion-free nature of , one has
Similar arguments dispose of the terms for which or .
The only remaining term is when , that is to say
From several applications of Lemma 6 we see that
for in the support of this integrand. Thus this term may be written as
which on evaluating the integral simplifies to
but by the non-negativity of and Hölder’s inequality, this is at least
and the claim follows.