You are currently browsing the tag archive for the ‘Roth’s theorem’ tag.

Szemerédi’s theorem asserts that all subsets of the natural numbers of positive density contain arbitrarily long arithmetic progressions.  Roth’s theorem is the special case when one considers arithmetic progressions of length three.  Both theorems have many important proofs using tools from additive combinatorics, (higher order) Fourier analysis, (hyper) graph regularity theory, and ergodic theory.  However, the original proof by Endre Szemerédi, while extremely intricate, was purely combinatorial (and in particular “elementary”) and almost entirely self-contained, except for an invocation of the van der Waerden theorem.  It is also notable for introducing a prototype of what is now known as the Szemerédi regularity lemma.

Back in 2005, I rewrote Szemerédi’s original proof in order to understand it better, however my rewrite ended up being about the same length as the original argument and was probably only usable to myself.  In 2012, after Szemerédi was awarded the Abel prize, I revisited this argument with the intention to try to write up a more readable version of the proof, but ended up just presenting some ingredients of the argument in a blog post, rather than try to rewrite the whole thing.  In that post, I suspected that the cleanest way to write up the argument would be through the language of nonstandard analysis (perhaps in an iterated hyperextension that could handle various hierarchies of infinitesimals), but was unable to actually achieve any substantial simplifications by passing to the nonstandard world.

A few weeks ago, I participated in a week-long workshop at the American Institute of Mathematics on “Nonstandard methods in combinatorial number theory”, and spent some time in a working group with Shabnam Akhtari, Irfam Alam, Renling Jin, Steven Leth, Karl Mahlburg, Paul Potgieter, and Henry Towsner to try to obtain a manageable nonstandard version of Szemerédi’s original proof.  We didn’t end up being able to do so – in fact there are now signs that perhaps nonstandard analysis is not the optimal framework in which to place this argument – but we did at least clarify the existing standard argument, to the point that I was able to go back to my original rewrite of the proof and present it in a more civilised form, which I am now uploading here as an unpublished preprint.   There are now a number of simplifications to the proof.  Firstly, one no longer needs the full strength of the regularity lemma; only the simpler “weak” regularity lemma of Frieze and Kannan is required.  Secondly, the proof has been “factored” into a number of stand-alone propositions of independent interest, in particular involving just (families of) one-dimensional arithmetic progressions rather than the complicated-looking multidimensional arithmetic progressions that occur so frequently in the original argument of Szemerédi.  Finally, the delicate manipulations of densities and epsilons via double counting arguments in Szemerédi’s original paper have been abstracted into a certain key property of families of arithmetic progressions that I call the “double counting property”.

The factoring mentioned above is particularly simple in the case of proving Roth’s theorem, which is now presented separately in the above writeup.  Roth’s theorem seeks to locate a length three progression {(P(1),P(2),P(3)) = (a, a+r, a+2r)} in which all three elements lie in a single set.  This will be deduced from an easier variant of the theorem in which one locates (a family of) length three progressions in which just the first two elements {P(1), P(2)} of the progression lie in a good set (and some other properties of the family are also required).  This is in turn derived from an even easier variant in which now just the first element of the progression is required to be in the good set.

More specifically, Roth’s theorem is now deduced from

Theorem 1.5.  Let {L} be a natural number, and let {S} be a set of integers of upper density at least {1-1/10L}.  Then, whenever {S} is partitioned into finitely many colour classes, there exists a colour class {A} and a family {(P_l(1),P_l(2),P_l(3))_{l=1}^L} of 3-term arithmetic progressions with the following properties:

  1. For each {l}, {P_l(1)} and {P_l(2)} lie in {A}.
  2. For each {l}, {P_l(3)} lie in {S}.
  3. The {P_l(3)} for {l=1,\dots,L} are in arithmetic progression.

The situation in this theorem is depicted by the following diagram, in which elements of A are in blue and elements of S are in grey:

Theorem 1.5 is deduced in turn from the following easier variant:

Theorem 1.6.  Let {L} be a natural number, and let {S} be a set of integers of upper density at least {1-1/10L}.  Then, whenever {S} is partitioned into finitely many colour classes, there exists a colour class {A} and a family {(P_l(1),P_l(2),P_l(3))_{l=1}^L} of 3-term arithmetic progressions with the following properties:

  1. For each {l}, {P_l(1)} lie in {A}.
  2. For each {l}, {P_l(2)} and {P_l(3)} lie in {S}.
  3. The {P_l(2)} for {l=1,\dots,L} are in arithmetic progression.

The situation here is described by the figure below.

Theorem 1.6 is easy to prove.  To derive Theorem 1.5 from Theorem 1.6, or to derive Roth’s theorem from Theorem 1.5, one uses double counting arguments, van der Waerden’s theorem, and the weak regularity lemma, largely as described in this previous blog post; see the writeup for the full details.  (I would be interested in seeing a shorter proof of Theorem 1.5 though that did not go through these arguments, and did not use the more powerful theorems of  Roth or Szemerédi.)


Roth’s theorem on arithmetic progressions asserts that every subset of the integers {{\bf Z}} of positive upper density contains infinitely many arithmetic progressions of length three. There are many versions and variants of this theorem. Here is one of them:

Theorem 1 (Roth’s theorem) Let {G = (G,+)} be a compact abelian group, with Haar probability measure {\mu}, which is {2}-divisible (i.e. the map {x \mapsto 2x} is surjective) and let {A} be a measurable subset of {G} with {\mu(A) \geq \alpha} for some {0 < \alpha < 1}. Then we have

\displaystyle  \int_G \int_G 1_A(x) 1_A(x+r) 1_A(x+2r)\ d\mu(x) d\mu(r) \gg_\alpha 1,

where {X \gg_\alpha Y} denotes the bound {X \geq c_\alpha Y} for some {c_\alpha > 0} depending only on {\alpha}.

This theorem is usually formulated in the case that {G} is a finite abelian group of odd order (in which case the result is essentially due to Meshulam) or more specifically a cyclic group {G = {\bf Z}/N{\bf Z}} of odd order (in which case it is essentially due to Varnavides), but is also valid for the more general setting of {2}-divisible compact abelian groups, as we shall shortly see. One can be more precise about the dependence of the implied constant {c_\alpha} on {\alpha}, but to keep the exposition simple we will work at the qualitative level here, without trying at all to get good quantitative bounds. The theorem is also true without the {2}-divisibility hypothesis, but the proof we will discuss runs into some technical issues due to the degeneracy of the {2r} shift in that case.

We can deduce Theorem 1 from the following more general Khintchine-type statement. Let {\hat G} denote the Pontryagin dual of a compact abelian group {G}, that is to say the set of all continuous homomorphisms {\xi: x \mapsto \xi \cdot x} from {G} to the (additive) unit circle {{\bf R}/{\bf Z}}. Thus {\hat G} is a discrete abelian group, and functions {f \in L^2(G)} have a Fourier transform {\hat f \in \ell^2(\hat G)} defined by

\displaystyle  \hat f(\xi) := \int_G f(x) e^{-2\pi i \xi \cdot x}\ d\mu(x).

If {G} is {2}-divisible, then {\hat G} is {2}-torsion-free in the sense that the map {\xi \mapsto 2 \xi} is injective. For any finite set {S \subset \hat G} and any radius {\rho>0}, define the Bohr set

\displaystyle  B(S,\rho) := \{ x \in G: \sup_{\xi \in S} \| \xi \cdot x \|_{{\bf R}/{\bf Z}} < \rho \}

where {\|\theta\|_{{\bf R}/{\bf Z}}} denotes the distance of {\theta} to the nearest integer. We refer to the cardinality {|S|} of {S} as the rank of the Bohr set. We record a simple volume bound on Bohr sets:

Lemma 2 (Volume packing bound) Let {G} be a compact abelian group with Haar probability measure {\mu}. For any Bohr set {B(S,\rho)}, we have

\displaystyle  \mu( B( S, \rho ) ) \gg_{|S|, \rho} 1.

Proof: We can cover the torus {({\bf R}/{\bf Z})^S} by {O_{|S|,\rho}(1)} translates {\theta+Q} of the cube {Q := \{ (\theta_\xi)_{\xi \in S} \in ({\bf R}/{\bf Z})^S: \sup_{\xi \in S} \|\theta_\xi\|_{{\bf R}/{\bf Z}} < \rho/2 \}}. Then the sets {\{ x \in G: (\xi \cdot x)_{\xi \in S} \in \theta + Q \}} form an cover of {G}. But all of these sets lie in a translate of {B(S,\rho)}, and the claim then follows from the translation invariance of {\mu}. \Box

Given any Bohr set {B(S,\rho)}, we define a normalised “Lipschitz” cutoff function {\nu_{B(S,\rho)}: G \rightarrow {\bf R}} by the formula

\displaystyle  \nu_{B(S,\rho)}(x) = c_{B(S,\rho)} (1 - \frac{1}{\rho} \sup_{\xi \in S} \|\xi \cdot x\|_{{\bf R}/{\bf Z}})_+ \ \ \ \ \ (1)

where {c_{B(S,\rho)}} is the constant such that

\displaystyle  \int_G \nu_{B(S,\rho)}\ d\mu = 1,


\displaystyle c_{B(S,\rho)} = \left( \int_{B(S,\rho)} (1 - \frac{1}{\rho} \sup_{\xi \in S} \|\xi \cdot x\|_{{\bf R}/{\bf Z}})\ d\mu(x) \right)^{-1}.

The function {\nu_{B(S,\rho)}} should be viewed as an {L^1}-normalised “tent function” cutoff to {B(S,\rho)}. Note from Lemma 2 that

\displaystyle  1 \ll_{|S|,\rho} c_{B(S,\rho)} \ll_{|S|,\rho} 1. \ \ \ \ \ (2)

We then have the following sharper version of Theorem 1:

Theorem 3 (Roth-Khintchine theorem) Let {G = (G,+)} be a {2}-divisible compact abelian group, with Haar probability measure {\mu}, and let {\epsilon>0}. Then for any measurable function {f: G \rightarrow [0,1]}, there exists a Bohr set {B(S,\rho)} with {|S| \ll_\epsilon 1} and {\rho \gg_\epsilon 1} such that

\displaystyle  \int_G \int_G f(x) f(x+r) f(x+2r) \nu_{B(S,\rho)}*\nu_{B(S,\rho)}(r)\ d\mu(x) d\mu(r) \ \ \ \ \ (3)

\displaystyle  \geq (\int_G f\ d\mu)^3 - O(\epsilon)

where {*} denotes the convolution operation

\displaystyle  f*g(x) := \int_G f(y) g(x-y)\ d\mu(y).

A variant of this result (expressed in the language of ergodic theory) appears in this paper of Bergelson, Host, and Kra; a combinatorial version of the Bergelson-Host-Kra result that is closer to Theorem 3 subsequently appeared in this paper of Ben Green and myself, but this theorem arguably appears implicitly in a much older paper of Bourgain. To see why Theorem 3 implies Theorem 1, we apply the theorem with {f := 1_A} and {\epsilon} equal to a small multiple of {\alpha^3} to conclude that there is a Bohr set {B(S,\rho)} with {|S| \ll_\alpha 1} and {\rho \gg_\alpha 1} such that

\displaystyle  \int_G \int_G 1_A(x) 1_A(x+r) 1_A(x+2r) \nu_{B(S,\rho)}*\nu_{B(S,\rho)}(r)\ d\mu(x) d\mu(r) \gg \alpha^3.

But from (2) we have the pointwise bound {\nu_{B(S,\rho)}*\nu_{B(S,\rho)} \ll_\alpha 1}, and Theorem 1 follows.

Below the fold, we give a short proof of Theorem 3, using an “energy pigeonholing” argument that essentially dates back to the 1986 paper of Bourgain mentioned previously (not to be confused with a later 1999 paper of Bourgain on Roth’s theorem that was highly influential, for instance in emphasising the importance of Bohr sets). The idea is to use the pigeonhole principle to choose the Bohr set {B(S,\rho)} to capture all the “large Fourier coefficients” of {f}, but such that a certain “dilate” of {B(S,\rho)} does not capture much more Fourier energy of {f} than {B(S,\rho)} itself. The bound (3) may then be obtained through elementary Fourier analysis, without much need to explicitly compute things like the Fourier transform of an indicator function of a Bohr set. (However, the bound obtained by this argument is going to be quite poor – of tower-exponential type.) To do this we perform a structural decomposition of {f} into “structured”, “small”, and “highly pseudorandom” components, as is common in the subject (e.g. in this previous blog post), but even though we crucially need to retain non-negativity of one of the components in this decomposition, we can avoid recourse to conditional expectation with respect to a partition (or “factor”) of the space, using instead convolution with one of the {\nu_{B(S,\rho)}} considered above to achieve a similar effect.

Read the rest of this entry »

We now give a basic application of Fourier analysis to the problem of counting additive patterns in sets, namely the following famous theorem of Roth:

Theorem 1 (Roth’s theorem) Let {A} be a subset of the integers {{\bf Z}} whose upper density

\displaystyle  \overline{\delta}(A) := \limsup_{N \rightarrow \infty} \frac{|A \cap [-N,N]|}{2N+1}

is positive. Then {A} contains infinitely many arithmetic progressions {a, a+r, a+2r} of length three, with {a \in {\bf Z}} and {r>0}.

This is the first non-trivial case of Szemerédi’s theorem, which is the same assertion but with length three arithmetic progressions replaced by progressions of length {k} for any {k}.

As it turns out, one can prove Roth’s theorem by an application of linear Fourier analysis – by comparing the set {A} (or more precisely, the indicator function {1_A} of that set, or of pieces of that set) against linear characters {n \mapsto e(\alpha n)} for various frequencies {\alpha \in {\bf R}/{\bf Z}}. There are two extreme cases to consider (which are model examples of a more general dichotomy between structure and randomness). One is when {A} is aligned up almost completely with one of these linear characters, for instance by being a Bohr set of the form

\displaystyle  \{ n \in {\bf Z}: \| \alpha n - \theta \|_{{\bf R}/{\bf Z}} < \epsilon \}

or more generally of the form

\displaystyle  \{ n \in {\bf Z}: \alpha n \in U \}

for some multi-dimensional frequency {\alpha \in {\bf T}^d} and some open set {U}. In this case, arithmetic progressions can be located using the equidistribution theory of the previous set of notes. At the other extreme, one has Fourier-uniform or Fourier-pseudorandom sets, whose correlation with any linear character is negligible. In this case, arithmetic progressions can be produced in abundance via a Fourier-analytic calculation.

To handle the general case, one must somehow synthesise together the argument that deals with the structured case with the argument that deals with the random case. There are several known ways to do this, but they can be basically classified into two general methods, namely the density increment argument (or {L^\infty} increment argument) and the energy increment argument (or {L^2} increment argument).

The idea behind the density increment argument is to introduce a dichotomy: either the object {A} being studied is pseudorandom (in which case one is done), or else one can use the theory of the structured objects to locate a sub-object of significantly higher “density” than the original object. As the density cannot exceed one, one should thus be done after a finite number of iterations of this dichotomy. This argument was introduced by Roth in his original proof of the above theorem.

The idea behind the energy increment argument is instead to decompose the original object {A} into two pieces (and, sometimes, a small additional error term): a structured component that captures all the structured objects that have significant correlation with {A}, and a pseudorandom component which has no significant correlation with any structured object. This decomposition usually proceeds by trying to maximise the “energy” (or {L^2} norm) of the structured component, or dually by trying to minimise the energy of the residual between the original object and the structured object. This argument appears for instance in the proof of the Szemerédi regularity lemma (which, not coincidentally, can also be used to prove Roth’s theorem), and is also implicit in the ergodic theory approach to such problems (through the machinery of conditional expectation relative to a factor, which is a type of orthogonal projection, the existence of which is usually established via an energy increment argument). However, one can also deploy the energy increment argument in the Fourier analytic setting, to give an alternate Fourier-analytic proof of Roth’s theorem that differs in some ways from the density increment proof.

In these notes we give both two Fourier-analytic proofs of Roth’s theorem, one proceeding via the density increment argument, and the other by the energy increment argument. As it turns out, both of these arguments extend to establish Szemerédi’s theorem, and more generally in counting other types of patterns, but this is non-trivial (requiring some sort of inverse conjecture for the Gowers uniformity norms in both cases); we will discuss this further in later notes.

Read the rest of this entry »

In the previous lecture, we studied the recurrence properties of compact systems, which are systems in which all measurable functions exhibit almost periodicity – they almost return completely to themselves after repeated shifting. Now, we consider the opposite extreme of mixing systems – those in which all measurable functions (of mean zero) exhibit mixing – they become orthogonal to themselves after repeated shifting. (Actually, there are two different types of mixing, strong mixing and weak mixing, depending on whether the orthogonality occurs individually or on the average; it is the latter concept which is of more importance to the task of establishing the Furstenberg recurrence theorem.)

We shall see that for weakly mixing systems, averages such as \frac{1}{N} \sum_{n=0}^{N-1} T^n f \ldots T^{(k-1)n} f can be computed very explicitly (in fact, this average converges to the constant (\int_X f\ d\mu)^{k-1}). More generally, we shall see that weakly mixing components of a system tend to average themselves out and thus become irrelevant when studying many types of ergodic averages. Our main tool here will be the humble Cauchy-Schwarz inequality, and in particular a certain consequence of it, known as the van der Corput lemma.

As one application of this theory, we will be able to establish Roth’s theorem (the k=3 case of Szemerédi’s theorem).

Read the rest of this entry »

Earlier this month, in the previous incarnation of this page, I posed a question which I thought was unsolved, and obtained the answer (in fact, it was solved 25 years ago) within a week. Now that this new version of the page has better feedback capability, I am now tempted to try again, since I have a large number of such questions which I would like to publicise. (Actually, I even have a secret web page full of these somewhere near my home page, though it will take a non-trivial amount of effort to find it!)

Perhaps my favourite open question is the problem on the maximal size of a cap set – a subset of {\Bbb F}^n_3 ({\Bbb F}_3 being the finite field of three elements) which contains no lines, or equivalently no non-trivial arithmetic progressions of length three. As an upper bound, one can easily modify the proof of Roth’s theorem to show that cap sets must have size O(3^n/n) (see e.g. this paper of Meshulam). This of course is better than the trivial bound of 3^n once n is large. In the converse direction, the trivial example \{0,1\}^n shows that cap sets can be as large as 2^n; the current world record is (2.2174\ldots)^n, held by Edel. The gap between these two bounds is rather enormous; I would be very interested in either an improvement of the upper bound to o(3^n/n), or an improvement of the lower bound to (3-o(1))^n. (I believe both improvements are true, though a good friend of mine disagrees about the improvement to the lower bound.)

Read the rest of this entry »