Roth’s theorem on arithmetic progressions asserts that every subset of the integers of positive upper density contains infinitely many arithmetic progressions of length three. There are many versions and variants of this theorem. Here is one of them:

Theorem 1 (Roth’s theorem)Let be a compact abelian group, with Haar probability measure , which is -divisible (i.e. the map is surjective) and let be a measurable subset of with for some . Then we havewhere denotes the bound for some depending only on .

This theorem is usually formulated in the case that is a finite abelian group of odd order (in which case the result is essentially due to Meshulam) or more specifically a cyclic group of odd order (in which case it is essentially due to Varnavides), but is also valid for the more general setting of -divisible compact abelian groups, as we shall shortly see. One can be more precise about the dependence of the implied constant on , but to keep the exposition simple we will work at the qualitative level here, without trying at all to get good quantitative bounds. The theorem is also true without the -divisibility hypothesis, but the proof we will discuss runs into some technical issues due to the degeneracy of the shift in that case.

We can deduce Theorem 1 from the following more general Khintchine-type statement. Let denote the Pontryagin dual of a compact abelian group , that is to say the set of all continuous homomorphisms from to the (additive) unit circle . Thus is a discrete abelian group, and functions have a Fourier transform defined by

If is -divisible, then is -torsion-free in the sense that the map is injective. For any finite set and any radius , define the *Bohr set*

where denotes the distance of to the nearest integer. We refer to the cardinality of as the *rank* of the Bohr set. We record a simple volume bound on Bohr sets:

Lemma 2 (Volume packing bound)Let be a compact abelian group with Haar probability measure . For any Bohr set , we have

*Proof:* We can cover the torus by translates of the cube . Then the sets form an cover of . But all of these sets lie in a translate of , and the claim then follows from the translation invariance of .

Given any Bohr set , we define a normalised “Lipschitz” cutoff function by the formula

where is the constant such that

thus

The function should be viewed as an -normalised “tent function” cutoff to . Note from Lemma 2 that

We then have the following sharper version of Theorem 1:

Theorem 3 (Roth-Khintchine theorem)Let be a -divisible compact abelian group, with Haar probability measure , and let . Then for any measurable function , there exists a Bohr set with and such thatwhere denotes the convolution operation

A variant of this result (expressed in the language of ergodic theory) appears in this paper of Bergelson, Host, and Kra; a combinatorial version of the Bergelson-Host-Kra result that is closer to Theorem 3 subsequently appeared in this paper of Ben Green and myself, but this theorem arguably appears implicitly in a much older paper of Bourgain. To see why Theorem 3 implies Theorem 1, we apply the theorem with and equal to a small multiple of to conclude that there is a Bohr set with and such that

But from (2) we have the pointwise bound , and Theorem 1 follows.

Below the fold, we give a short proof of Theorem 3, using an “energy pigeonholing” argument that essentially dates back to the 1986 paper of Bourgain mentioned previously (not to be confused with a later 1999 paper of Bourgain on Roth’s theorem that was highly influential, for instance in emphasising the importance of Bohr sets). The idea is to use the pigeonhole principle to choose the Bohr set to capture all the “large Fourier coefficients” of , but such that a certain “dilate” of does not capture much more Fourier energy of than itself. The bound (3) may then be obtained through elementary Fourier analysis, without much need to explicitly compute things like the Fourier transform of an indicator function of a Bohr set. (However, the bound obtained by this argument is going to be quite poor – of tower-exponential type.) To do this we perform a structural decomposition of into “structured”, “small”, and “highly pseudorandom” components, as is common in the subject (e.g. in this previous blog post), but even though we crucially need to retain non-negativity of one of the components in this decomposition, we can avoid recourse to conditional expectation with respect to a partition (or “factor”) of the space, using instead convolution with one of the considered above to achieve a similar effect.

** — 1. Proof of theorem — **

Fix . Let be a large integer (depending only on ) to be chosen later. We will need a sequence

of small parameters, with each assumed to be sufficiently small depending on all previous and on . (One should think of the as shrinking to zero incredibly fast, e.g. and for .)

We then define an increasing sequence

of finite sets of frequencies, as follows.

- (i) Initialise .
- (ii) Once has been selected for some , introduce the function
Note that this is a square-integrable function, thanks to (2).

- (iii) Define to be the set
Note from Plancherel’s theorem that this is a finite set.

- (iv) If , increment to and return to step (ii).

An easy induction using Plancherel’s theorem and (2) repeatedly gives the bounds

and

for all .

From Plancherel’s theorem we also have

As the are increasing, we thus see from the pigeonhole principle (if is sufficiently large depending on ) that there exists such that

We now decompose

where

One can think of as the “low frequency” (or “structured”, or “quasiperiodic”), “medium frequency” (or “small”), and “high frequency” (or “highly pseudorandom”) components of , where the notion of what it means for a frequency to be high or low is not determined by some preconceived notion of magnitude, but is instead adapted to the function itself. We will think of as the “main” term, and as “errors”.

Since is bounded in magnitude by , and are non-negative with total mass , we have the bounds

Now we give further bounds on . We begin with :

*Proof:* Let . From the definition of and elementary Fourier analysis we have

If , then from (5) we have

and the claim follows (since has total mass one). Now suppose instead that . Then from (4) we have

for all in the support of , and thus

and the claim again follows.

Now we control :

*Proof:* From the definition of and elementary Fourier analysis we have

We wish to bound this by . From (7) we see that the contribution of those in are acceptable. Now let us look at the contribution with . From (5) we have

for such , so this contribution to (9) is certainly acceptable thanks to (6). Now suppose that . The argument in the proof of Lemma 4 shows that and , and so this contribution is again acceptable thanks to (6).

Finally, we record the key properties of :

Lemma 6 ( continuous)is non-negative, has mean , and obeys the boundwhenever and .

*Proof:* The first two properties are obvious, so we turn to the third. Since and is bounded, it suffices to show that

whenever and . But from (1) and the triangle inequality we have

for such , and the claim follows from (2).

Now we can prove Theorem 3 with . Splitting as above, we can decompose the left-hand side of (3) into terms, each of the form

for some .

Let us consider a term for which . We can bound this term by

which is thanks to Lemma 5. Similar arguments hold for terms with or , after shifting by or .

Now let us consider a term for which , that is to say

A short Fourier-analytic calculation reveals that this integral may be written as a sum

From (2) we have

and thus by Plancherel

Similarly, from (6) we have

and

and hence by Cauchy-Schwarz and the -torsion-free nature of , one has

for all . Combining these estimates with Lemma 4, we conclude that this contribution to (3) is

Similar arguments dispose of the terms for which or .

The only remaining term is when , that is to say

From several applications of Lemma 6 we see that

for in the support of this integrand. Thus this term may be written as

which on evaluating the integral simplifies to

but by the non-negativity of and Hölder’s inequality, this is at least

and the claim follows.

## 7 comments

Comments feed for this article

28 April, 2014 at 12:45 am

mpetracheHello! I would like to ask a question: what fails to work for the estimates if one tries to get a result on lenght n>3 arithmetic progressions similar to this version of Roth-Kintchin theorem?

Also (probably this is a bit irrelevant) I guess in (5) there should be some “j” in the last term, right?

Thank you!

28 April, 2014 at 7:52 am

Terence TaoThanks for the correction!

With patterns of complexity greater than 1 (such as arithmetic progressions of length four and greater), what happens is that the Fourier pseudorandomness in Lemma 4 is no longer strong enough to control the contribution of (this is discussed extensively in Gowers’ paper on Szemeredi’s theorem for length four progressions). For k=4, it is possible to salvage this argument, but with all appearances of linear Fourier analysis arguments replaced by their quadratic Fourier analysis counterparts; this is done in my paper with Ben on the arithmetic regularity lemma that was linked to above. However, just as the k=3 argument relied on an analytic inequality, namely Holder’s inequality , the k=4 argument relies on an analytic inequality (roughly speaking, that when takes values in a compact abelian group with Haar probability measure) which does not have an analogue for k=5 and higher, and the Khintchine type theorem is in fact false in that case, as was essentially demonstrated by Ruzsa in an appendix to the paper of Bergelson, Host, and Kra linked to above. (Of course, Szemeredi’s theorem for k>=5 can be established by other means.)

28 April, 2014 at 6:31 pm

AnonymousI think the displayed equation (9) looks a little funny…

[Corrected, thanks - T.]29 April, 2014 at 6:16 am

AnonymousIs there a reason why $f_0,f_1,f_2$ are defined using double convolutions?

29 April, 2014 at 7:41 am

Terence TaoThe double comvolution obeys good bounds on the Fourier transform (i.e. Wiener algebra bounds), while only obeys good bounds on the Fourier transform. For the Fourier-analytic treatment of the terms, control of the Fourier transform becomes essential.

29 April, 2014 at 8:25 am

AnonymousI was curious about using the double convolution in the definition of the functions $f_0,f_1,f_2$.

It seems to me that $f_0$ is still sufficiently smooth with a single convolution, $f_1$ still has small $L^2$ norm, and $f_2$ still has small Fourier coefficients.

29 April, 2014 at 8:45 am

Terence TaoHmm, you’re right, the double convolution is not needed here, even though it is needed to constrain the shift. I’ve edited the text appropriately to reflect this simplification to the argument.