We now give a basic application of Fourier analysis to the problem of counting additive patterns in sets, namely the following famous theorem of Roth:
Theorem 1 (Roth’s theorem) Let
be a subset of the integers
whose upper density
is positive. Then
contains infinitely many arithmetic progressions
of length three, with
and
.
This is the first non-trivial case of Szemerédi’s theorem, which is the same assertion but with length three arithmetic progressions replaced by progressions of length for any
.
As it turns out, one can prove Roth’s theorem by an application of linear Fourier analysis – by comparing the set (or more precisely, the indicator function
of that set, or of pieces of that set) against linear characters
for various frequencies
. There are two extreme cases to consider (which are model examples of a more general dichotomy between structure and randomness). One is when
is aligned up almost completely with one of these linear characters, for instance by being a Bohr set of the form
or more generally of the form
for some multi-dimensional frequency and some open set
. In this case, arithmetic progressions can be located using the equidistribution theory of the previous set of notes. At the other extreme, one has Fourier-uniform or Fourier-pseudorandom sets, whose correlation with any linear character is negligible. In this case, arithmetic progressions can be produced in abundance via a Fourier-analytic calculation.
To handle the general case, one must somehow synthesise together the argument that deals with the structured case with the argument that deals with the random case. There are several known ways to do this, but they can be basically classified into two general methods, namely the density increment argument (or increment argument) and the energy increment argument (or
increment argument).
The idea behind the density increment argument is to introduce a dichotomy: either the object being studied is pseudorandom (in which case one is done), or else one can use the theory of the structured objects to locate a sub-object of significantly higher “density” than the original object. As the density cannot exceed one, one should thus be done after a finite number of iterations of this dichotomy. This argument was introduced by Roth in his original proof of the above theorem.
The idea behind the energy increment argument is instead to decompose the original object into two pieces (and, sometimes, a small additional error term): a structured component that captures all the structured objects that have significant correlation with
, and a pseudorandom component which has no significant correlation with any structured object. This decomposition usually proceeds by trying to maximise the “energy” (or
norm) of the structured component, or dually by trying to minimise the energy of the residual between the original object and the structured object. This argument appears for instance in the proof of the Szemerédi regularity lemma (which, not coincidentally, can also be used to prove Roth’s theorem), and is also implicit in the ergodic theory approach to such problems (through the machinery of conditional expectation relative to a factor, which is a type of orthogonal projection, the existence of which is usually established via an energy increment argument). However, one can also deploy the energy increment argument in the Fourier analytic setting, to give an alternate Fourier-analytic proof of Roth’s theorem that differs in some ways from the density increment proof.
In these notes we give both two Fourier-analytic proofs of Roth’s theorem, one proceeding via the density increment argument, and the other by the energy increment argument. As it turns out, both of these arguments extend to establish Szemerédi’s theorem, and more generally in counting other types of patterns, but this is non-trivial (requiring some sort of inverse conjecture for the Gowers uniformity norms in both cases); we will discuss this further in later notes.
— 1. The density increment argument —
We begin with the density increment argument. We first rephrase Roth’s theorem in a finitary form:
Theorem 2 (Roth’s theorem, again) For every
, there exists an
, such that for every
, and every
with
,
contains an arithmetic progression of length three.
Exercise 3 Show that Theorem 1 and Theorem 2 are equivalent.
We prove Theorem 2 by a downward induction on the density parameter . Let
denote the proposition that Theorem 2 holds for that value of
(i.e. for sufficiently large
and all
with
,
contains an arithmetic progression of length three). Our objective is to show that
holds for all
.
Clearly, is (vacuously) true for
(and trivially true for
). It is also monotone in the sense that if
holds for some
, then
holds for all
. To downwardly induct on
, we will prove the following dichotomy:
Proposition 4 (Lack of progressions implies density increment) Let
, let
be sufficiently large depending on
, and let
be such that
. Then one of the following holds:
contains an arithmetic progression of length three; or
- there exists a subprogression
of
of length at least
such that
, where
goes to infinity as
, and
is bounded away from zero whenever
is bounded away from zero.
Let us see why Proposition 4 implies Theorem 2. It is slightly more convenient to use a “well-ordering principle” argument rather than an induction argument, though of course the two approaches are equivalent. Let be the infimum of all
for which
holds, thus
. If
then we are done, so suppose that
is non-zero. Then for any
,
is false, thus there exist arbitrarily large
and
with
with no progressions of length three. By Proposition 4, we can thus find a subprogression
of
of length at least
with
; if
is small enough, this implies that
. We then use an affine transformation to map
to
(noting crucially that the property of having no arithmetic progressions of a given length is preserved by affine transformations). As
can be arbitrarily large,
can be arbitrarily large also. Since
is true, we see that
contains an arithmetic progression of length three, hence
does also; which gives the desired contradiction.
It remains to prove Proposition 4. There are two main steps. The first relies heavily on the fact that the progressions only have length three, and is proven via Fourier analysis:
Proposition 5 (Lack of progressions implies correlation with a linear phase) Let
, let
be sufficiently large depending on
, let
be such that
for some
, with
containing no arithmetic progressions of length three. Then there exists
such that
.
Proof: In order to use Fourier analysis, it will be convenient to embed inside a cyclic group
, where
is equal to (say)
; the exact choice here is only of minor importance, though it will be convenient to take
to be odd. We introduce the trilinear form
for any functions ; we then observe that the quantity
(extending by zero outside of
) is equal to the number of arithmetic progressions
in
(counting the degenerate progressions in which
, and also allowing for
to be negative), divided by the normalising factor of
. On the other hand, by hypothesis,
contains no non-degenerate arithmetic progressions of length three, and clearly has
degenerate progressions; thus we have
On the other hand, from the Fourier inversion formula on the cyclic group we may write
for any function , where
are the Fourier coefficients
We may thus write as
Now observe that we have the identity
so the phase is trivial whenever
is of the form
, and so the expectation in (2) is equal to
. Conversely, if
is not of this form, then the phase is non-trivial, and from Fourier analysis we conclude that the expectation in (2) vanishes. We conclude that the left-hand side of (2) can be expressed as
Now using Plancherel’s theorem we have
(using normalised counting measure). Using this and Hölder’s inequality (and the fact that is odd), we obtain the bounds
and similarly for permutations of on the right-hand side.
We could apply this directly to , but this is not useful, since we seek a lower bound on this quantity rather than an upper bound. To get such a lower bound, we split
, where
is the mean zero portion of
, and use trilinearity to split
into a main term
, plus seven other error terms involving
and
, with each error term involving at least one copy of
. The main term can be computed explicitly as
Comparing this with (1), we conclude that one of the error terms must have magnitude also. For sake of concreteness, let us say that
the other cases are similar.
From the triangle inequality we see that have an
norm of
, and so from (3) one has
and so we conclude that
for some . (Similarly for other error terms, though sometimes onewill need a permutation of (3) instead of (3) itself.) The claim follows.
Remark 6 The above argument relied heavily on the fact that there was only a one-parameter family of linear relations between
. The same argument does not work directly for finding arithmetic progressions of length longer than three; we return to this point in later notes.
The second step converts correlation with a linear character into a density increment on a subprogression:
Proposition 7 (Fragmenting a linear character into progressions) Let
, let
, and let
be a linear phase. Then there exists
which goes to infinity as
for fixed
, and a partition
of
into arithmetic progressions
of length at least
, together with an error term
of cardinality at most
, such that
fluctuates by at most
on each progression
(i.e.
whenever
).
Proof: We may assume that is sufficiently large depending on
, as the claim is trivial otherwise (just set
).
Fix , and let
be a slowly growing function of
to be chosen later. By using recurrence for the linear phase
, we can find a shift
of size
such that
. We then partition
into
arithmetic progressions of spacing
, and then partition each of those progressions in turn into subprogressions
of spacing
and length
, plus an error of cardinality at most
, leading to an error set
of cardinality at most
. On each of the
,
fluctuates by at most
. The claim then follows by choosing
to be a sufficiently slowly growing function of
.
Now we can prove Proposition 4 (and thus Roth’s theorem). Let be as in Proposition 4. By Proposition 5 (if
is large enough), we can find
for which
We now let be a small quantity depending on
to be chosen later (actually it turns out that we can take
to be a small multiple of
) and apply Proposition 7 to decompose
into progressions
and an error term
with the stated properties. Then we have
Since fluctuates by at most
on
, we can apply the triangle inequality and conclude that
If is sufficiently small depending on
, we conclude that
On the other hand, as is the mean of
on
, we have
and thus
Adding this to (4) and noting that for real
, we conclude (for
small enough) that
and hence by the pigeonhole principle we can find such that
or in other words
for some absolute constant , and Proposition 4 follows.
It is possible to rewrite the above argument in the ultralimit setting, though it only makes the argument slightly shorter as a consequence. We sketch this alternate formulation below.
Exercise 8 Let
be as above.
- Show that if
is an unbounded limit natural number, and
is a limit subset whose density
is strictly greater than
, then
contains a (limit) arithmetic progression
of length three (with
).
- Show that there exists an unbounded limit natural number
and a limit subset
of density
, which does not contain any arithmetic progressions of length three.
Exercise 9 Show that if
is an unbounded limit natural number, and
is a limit subset of positive density
with no arithmetic progressions of length three, then there exists a limit real
such that
.
Exercise 10 If
is an unbounded limit natural number, and
is a limit real, show that one can partition
, where
is a limit natural number, the
are limit arithmetic subprogressions of
of unbounded length (with the map
being a limit function), such that
fluctuates by
on each
(uniformly in
), and
.
Exercise 11 Use the previous three exercises to reprove Roth’s theorem.
Exercise 12 (Roth’s theorem in bounded characteristic) Let
be a finite field, let
, and let
be a finite vector space. Show that if the dimension of
is sufficiently large depending on
, and if
is such that
, then there exists
with
such that
. (Hint: Mimic the above arguments (either finitarily, or with ultralimits), using hyperplanes as a substitute for subprogressions.)
Exercise 13 (Roth’s theorem in finite abelian groups) Let
be a finite abelian group, and let
. Show that if
is sufficiently large depending on
, and
is such that
, then there exists
with
such that
. (Hint: if there is an element of
of large order, one can use Theorem 2 and the pigeonhole principle. If all elements have bounded order, one can instead use Exercise 12.) This result (as well as the special case in the preceding exercise) was first established by Meshulam.
— 2. The energy increment argument —
Now we turn to the energy increment approach. This approach requires a bit more machinery to set up, but ends up being quite flexible and powerful (for instance, it is the starting point for my theorem with Ben Green establishing arbitrarily long progressions in the primes, which we do not know how to establish via density increment arguments).
Instead of passing from to a subprogression, we now instead coarsen
to some partition (or factor) of
, as follows. Define a factor of
to be a
-algebra of subsets
of
, or equivalently a partition of
into disjoint atoms or cells (with the elements of
then being the arbitary unions of atoms). Given a function
and a factor
, we define the conditional expectation
to be the function whose value at a given point
is given by the formula
where is the unique atom of
that contains
. One can view the map
as the orthogonal projection from
to
, where
is the space of functions
with the inner product
and is the subspace of functions in
which are measurable with respect to
, or equivalently are constant on each atom of
.
We say that one factor refines another
if
, or equivalently if every atom of
is a union of atoms of
, or if every atom of
is contained in an atom of
, or equivalently again if
. Given two factors
,
, one can define their join
to be their least common refinement, thus the atoms in
are the non-empty intersections of atoms in
with atoms in
.
The idea is to split a given function in
(and specifically, an indicator function
) into a projection
onto a “structured factor”
to obtain a “structured component”
, together with a “pseudorandom component”
that is essentially orthogonal to all structured functions. This decomposition is related to the classical decomposition of a vector in a Hilbert space into its orthogonal projection onto a closed subspace
, plus the complementary projection to the orthogonal complement
; we will see the relationship between the two decompositions later when we pass to the ultralimit.
We need to make the notion of “structured” more precise. We begin with some definitions. We say that a function has Fourier complexity at most
if it can be expressed as
for some and some complex numbers
of magnitude at most
, and some real numbers
. Note that from the Fourier inversion formula that every function will have some finite Fourier complexity, but typically one expects the complexity to grow with
; only a few special functions will have complexity bounded uniformly in
. Also note that if
have Fourier complexity
then
, or
all have Fourier complexity at most
; informally, the space of bounded complexity functions forms an algebra. (We will be able to formalise this statement after we take ultralimits.)
Ideally, we would like to take “functions of bounded Fourier complexity” as our class of structured functions. For technical reasons (related to our desire to use indicator functions as structured functions), we need to take an closure and work with the wider class of Fourier measurable functions as our structured class.
Definition 14 (Measurability) Let
be a function. We say that a function
is Fourier measurable with growth function
if, for every
, one can find a function
of Fourier complexity at most
such that
.
A subsetof
is Fourier measurable with growth function
if
is Fourier measurable with this growth function.
Exercise 15 Show that every interval
in
is Fourier measurable with some growth function
independent of
. (Hint: apply Fejér summation to
.)
Exercise 16 Let
be a Fourier-measurable function with some growth function
, which is bounded in magnitude by
. Show that for every
, one can find a function
which also is bounded in magnitude by
, and of Fourier complexity
, such that
. (Hint: start with the approximating function
from Definition 14, which is already bounded in magnitude by
, and then set
where
is a polynomial bounded in magnitude by
on the ball of radius
which is close to the identity function on the ball of radius
(such a function can be constructed via the Stone-Weierstrass theorem).)
Exercise 17 Show that if
are bounded in magnitude by
, and are Fourier measurable with growth functions
, then
,
, and
are Fourier measurable with some growth function
depending only on
and
.
Conclude that ifare Fourier-measurable with growth function
, then
,
, and
are Fourier-measurable with some growth function
depending only on
.
We thus see that Fourier-measurable sets morally form a Boolean algebra. (Again, we can formalise this assertion once we pass to the ultralimit.)
Now we make a key observation (cf. this paper by Reingold, Trevisan, Tulsiani, and Vadhan).
Lemma 18 (Correlation with a Fourier character implies correlation with a Fourier-measurable set) Let
be bounded in magnitude by
, and suppose that
for some
. Then there exists a Fourier-measurable set
with some growth function
depending on
, such that
.
Proof: By splitting into real and imaginary parts, we may assume without loss of generality that
is real. Rotating
, we may find a real number
such that
We then express
where
By Minkowski’s inequality, we thus have either
or
In the former case we are done (setting ), so suppose that the latter holds. If all the
were uniformly Fourier-measurable, we would now be done in this case also by the pigeonhole principle. This is not quite true; however, it turns out that most
are uniformly measurable, and this will be enough. More precisely, let
be a small parameter to be chosen later, and say that
is good if one has
for all . Let
be the set of all bad
. Observe that for each bad
, we have
, where
is the probability measure
and is the Hardy-Littlewood maximal function
Applying the Hardy-Littlewood maximal inequality, we conclude that . In particular, if
is small enough compared with
, we have
and so by the pigeonhole principle, there exists a good such that
It remains to verify that is good. For any
, we have (as
is good) that
Applying Urysohn’s lemma, we can thus find a smooth function with
for
and
for
such that
Using the Weierstrass approximation theorem, one can then approximate uniformly by
on
by a polynomial of degree
and coefficients
. This allows one to approximate
in
norm to an accuracy of
by a function of Fourier complexity
, and the claim follows.
Corollary 19 (Correlation implies energy increment) Let
, and let
be a factor generated by at most
atoms, each of which is Fourier-measurable with growth function
. Suppose that we have the correlation
for some
and
. Then there exists a refinement
generated by at most
atoms, each of which is Fourier-measurable with a growth function
depending only on
, such that
Proof: By Lemma 18, we can find a Fourier-measurable set with some growth function
depending on
, such that
We let be the factor generated by
and
. As
is measurable with respect to
, we may project onto
and conclude that
By Cauchy-Schwarz, we thus have
Squaring and using Pythagoras’ theorem, we obtain (5). The remaining claims in the corollary follow from Exercise 17.
We can then iterate this corollary via an energy increment argument to obtain
Proposition 20 (Weak arithmetic regularity lemma) Let
, and let
be a factor generated by at most
atoms, each of which is Fourier-measurable with growth function
. Let
. Then there exists an extension
of
generated by
atoms, each of which is Fourier-measurable with growth function
depending on
, such that
for all
.
Proof: We initially set equal to
. If (6) already holds, then we are done; otherwise, we invoke Corollary 19 to increase the “energy”
by
, at the cost of possibly doubling the number of atoms in
, and also altering the growth function somewhat. We iterate this procedure; as the energy
is bounded between zero and one, and increases by
at each step, the procedure must terminate in
steps, at which point the claim follows.
It turns out that the power of this lemma is amplified if we iterate one more time, to obtain
Theorem 21 (Strong arithmetic regularity lemma) Let
, let
, and let
be an arbitrary function. Then we can decompose
and find
such that
- (Nonnegativity)
take values in
, and
have mean zero;
- (Structure)
is Fourier-measurable with a growth function
that depends only on
;
- (Smallness)
has an
norm of at most
; and
- (Pseudorandomness) One has
for all
.
Proof: We recursively define a sequence by setting
and
(say). Applying Proposition 20 (starting with the trivial factor
), one can then find a nested sequence of refinements
, such that
for all and
, and such that each
consists of
atoms that are Fourier-measurable with some growth function depending on
(note that this quantity dominates
and
by construction). By Pythagoras’ theorem, the energies
are monotone increasing between
and
, so by the pigeonhole principle there exists
such that
and hence by Pythagoras
Setting ,
,
, we obtain the claim.
Remark 22 This result is essentially due to Green (though not quite in this language). Earlier related decompositions are due to Bourgain and to Green-Konyagin. The Szemerédi regularity lemma in graph theory can be viewed as the graph-theoretic analogue of this Fourier-analytic result; see this paper of mine (or my FOCS paper) for further discussion. The double iteration required to prove Theorem 21 means that the bounds here are quite poor (of tower-exponential type, in fact, when
is exponential, which is typical in applications), much as in the graph theory case; thus the use of this lemma, while technically quantitative in nature, gives bounds that are usually quite inferior to what is known or suspected to be true.
As with the Ratner-type theorems from the previous notes, it is crucial that the uniformityfor the pseudorandom component
is of an arbitrarily higher quality than the measurability of the structured component
.
Much as the Ratner-type theorems from the previous notes could be used to prove multiple recurrence theorems, the arithmetic regularity lemma can be used (among other things) to give a proof of Roth’s theorem. We do so as follows. Let be a large integer, and let
be a subset of
with
for some
. We consider the expression
, where
is the trilinear form
which implies that the number of all three-term arithmetic progressions in (including the degenerate ones with
) is
. For
sufficiently large depending on
, this number is larger than the number
of degenerate progressions, giving the theorem.
It remains to establish (7). We apply Theorem 21 with parameters ,
to be chosen later (they will depend on
) to obtain a quantity
and a decomposition
with the stated properties. This splits the left-hand side of (7) into terms. But we can eliminate several of these terms:
Exercise 23 Show that all of the terms in (7) which involve at least one copy of
are of size
. (Hint: Modify the proof of Proposition 5.)
From this exercise we see that
Now we need to deal with . A key point is the almost periodicity of
:
Lemma 24 (Almost periodicity) For
values of
, one has
(where we extend
by zero outside of
).
Proof: As is Fourier-measurable, we can approximate it to an error of
in
norm by a function
of Fourier complexity . From the smallness of
, we then have
(where we extend using (9) rather than by zero, with the error being
when
). We can use (9) and the triangle inequality to bound
Using multiple recurrence, we can find values of
such that
for all
. The claim follows.
Now we can finish the proof of Roth’s theorem. As has the same mean as
, we have
and hence by Hölder’s inequality (and the non-negativity of )
Now if is one of the periods in the above lemma, we have
and thus by shifting
and so by the triangle inequality
Putting all this together using the triangle and Hölder inequalities, we obtain
Thus, if is sufficiently small depending on
, we have
for values of
, and thus
if we then set to be a sufficiently rapidly growing function (depending on
), we obtain the claim from (8). This concludes the proof of Roth’s theorem.
Exercise 25 Use the energy increment method to establish a different proof of Exercise 13. (Hint: For the multiple recurrence step, use a pigeonhole principle argument rather than an appeal to equidistribution theory.)
We now briefly indicate how to translate the above arguments into the ultralimit setting. We first need to construct an important measure on limit sets, namely Loeb measure.
Exercise 26 (Construction of Loeb measure) Let
be an unbounded natural number. Define the Loeb measure
of a limit subset
of
to be the quantity
, thus for instance a set of cardinality
will have Loeb measure zero.
- Show that if a limit subset
of
is partitioned into countably many disjoint limit subsets
, that all but finitely many of the
are empty, and so
.
- Define the outer measure
of a subset
of
(not necessarily a limit subset) to be the infimum of
, where
is a countable family of limit subsets of
that cover
, and call a subset of
null if it has zero outer measure. Call a subset Loeb measurable if it differs from a limit set by a null set. Show that there is a unique extension of Loeb measure
from limit sets to Loeb measurable sets that is a countably additive probability measure on
. (Hint: use the Carathéodory extension theorem, see e.g. my 254B notes 0 or notes 0a.)
- If
is a limit function bounded in magnitude by some standard real
, show that
is a Loeb measurable function in
, with norm at most
.
- Show that there exists a unique trilinear form
, jointly continuous in the
topology for all three inputs, such that
for all bounded limit functions
.
- Show that Roth’s theorem is equivalent to the assertion that
whenever
is a bounded non-negative function with
.
Next, we develop the ultralimit analogue of Fourier measurability, which we will rename Kronecker measurability due to the close analogy with the Kronecker factor in ergodic theory.
Exercise 27 (Construction of the Kronecker factor) Let
be an unbounded natural number. We define a Fourier character to be a function in
of the form
for some limit real number
. We define a trigonometric polynomial to be any finite linear combination (over the standard complex numbers) of Fourier characters. Let
be the
-algebra of Loeb measurable sets generated by the Fourier characters; we refer to
as the Kronecker factor, and functions or sets measurable in this factor as Kronecker measurable functions and sets. Thus for instance all trigonometric polynomials are Kronecker measurable. We let
denote the orthogonal projection from
to
, i.e. the conditional expectation to the Kronecker factor.
- Show that if
is bounded in magnitude by
and
is a standard real, then there exists a trigonometric polynomial
which is also bounded in magnitude by
and is within
of
in
norm.
- Show that if
and
, then there exists a limit subset
of
of cardinality
such that
for all
(extending
by zero).
- Show that if
is non-negative with
, then
.
- Show that if
and
for at least one
, then
.
- Conclude the proof of Roth’s theorem using ultralimits.
Remark 28 Note how the (finitary) arithmetic regularity lemma has been replaced by the more familiar (infinitary) theory of conditional expectation to a factor, and the finitary notion of measurability has been replaced by a notion from the traditional (countably additive) infinitary theory of measurability. This is one of the key advantages of the ultralimit approach, namely that it allows one to exploit already established theories of infinitary mathematics (e.g. measure theory, ergodic theory, Hilbert space geometry, etc.) to prove a finitary result.
Exercise 29 Use the ultralimit energy increment method to establish yet another proof of Exercise 13.
— 3. More quantitative bounds (optional) —
The above proofs of Roth’s theorem (as formulated in, say, Theorem 2) were qualitative in the sense that they did not explicitly give a bound for in terms of
. Nevertheless, by analysing the finitary arguments more carefully, a bound can be extracted:
Exercise 30 Show that in Proposition 7, one can take
. Using this and the density increment argument, show that one can take
in Theorem 2. (To put it another way, subsets of
of density much larger than
will contain progressions of length three.)
Exercise 31 Show that in the energy increment proof of Roth’s theorem, one can take the growth functions
involved to be polynomial in
(but with the exponent growing exponentially with each refinement of the factor), and
can be taken to be an iterated exponential; thus ultimately allows one to take
to be a tower exponential of height
. (To put it another way, subsets of
of density much larger than
for some
will contain progressions of length three.) Thus we see that the energy increment argument, in the form presented here, provides much worse bounds than the density increment argument; but see below.
For the ultralimit arguments, it is significantly harder to extract a quantitative bound from the argument (basically one has to painstakingly “finitise” the argument first, essentially reaching the finitary counterparts of these arguments presented above). Thus we see that there is a tradeoff when using ultralimit analysis; the arguments become slightly cleaner (and one can deploy infinitary methods), but one tends to lose sight of what quantitative bounds the method establishes. (This is particularly the case if one begins to rely heavily on the axiom of choice (or on large cardinal axioms) once one takes ultralimits, although these axioms are not used in the examples above.)
It is possible to run the density increment argument more efficiently by combining it with some aspects of the energy increment argument. As described above, the density increment argument proceeds by locating a single large Fourier coefficient of
, and uses this to obtain a density increment on a relatively short subprogression of
(of length comparable to
, ignoring factors of
). One then has to iterate this about
times before one obtains a truly significant density increment (e.g. from
to
). It is this repeated passage from
to
which is ultimately responsible for the double exponential bound for
at the end of the day.
In an unpublished work, Endre Szemerédi observed that one can run this argument more efficiently by collecting several large Fourier coefficients of simultaneously (somewhat in the spirit of the energy increment argument), and only then passing to a subprogression on which all of the relevant Fourier characters are simultaneously close to constant. The subprogression obtained is smaller as a consequence, but the density increment is much more substantial. Using this strategy, Endre was able to improve the original Roth bound of
to the somewhat better
(or equivalently, he was able to establish length three progressions in any subset of
of density much larger than
for some
). By carefully optimising the choice of threshold for selecting the “large Fourier coefficients”, Szemerédi (unpublished) and Heath-Brown independently improved this method further to obtain
, or equivalently obtaining length three progressions in sets of density much larger than
. (This result was extended to arbitrary finite abelian groups by Meshulam.)
The next advance was by Bourgain, who realised that rather than pass to short subprogressions, it was more efficient to work on the significantly larger (but messier) Bohr sets , after ensuring that such Bohr sets were regular (this condition is closely related to the Fourier measurability condition used in the energy increment argument). With this modification to the original Roth argument, the bound was lowered to
, or equivalently obtaining length three progressions in sets of density much larger than
. Even more recently, this argument was combined with the Szemerédi-Heath-Brown argument by Bourgain (not yet published) to obtain the further improvement of
, or equivalently obtaining length three progressions in sets of density much larger than
. This is getting close to the
case of an old conjecture of Erdös that asserts that any subset of the natural numbers whose sums of reciprocals diverge should have infinitely many arithmetic progressions of length
for any
. To establish the
case from quantitative versions of Roth’s theorem, one would basically need a bound of the form
for some
(or the ability to obtain progressions in sets of density
).
On the other hand, there is an old counterexample of Behrend (based ultimately on the observation that a sphere in a high-dimensional lattice does not contain any arithmetic progressions of length three) which shows that
must be at least
(in particular, it must be super-polynomial in
); equivalently, it is known that there are subsets of
of density about
with no arithmetic progressions of length three. For the sharpest results in this direction, see this paper of Elkin (and a later exposition of Green and Wolf).
The question of refining the bounds is an important one, as it tends to improve the technological understanding of current methods, as well as shed light on their relative strengths and weaknesses. However, this comes at the cost of making the arguments somewhat more technical, and so we shall not focus on the sharpest quantitative results in these notes.
19 comments
Comments feed for this article
14 April, 2010 at 5:35 am
Joel Moreira
Dear Prof. Tao
In theorem 1, shouldn’t be limsup?
Also, after (1) (in the proof of proposition 4) I think it’s the cyclic group Z/N’Z.
And in the second equation in the proof of lemma 7, I believe that there is a sign missing (not that it matters).
[Corrected, thanks – T.]
Great Notes!
14 April, 2010 at 5:10 pm
Anonymous
The full entry shows up on the main page for your blog. Did you mean to have it all show, rather than just the beginning? Maybe it’s something weird on my side.
[Corrected, thanks – T.]
28 April, 2010 at 2:37 pm
254B, Notes 3: Linear patterns « What’s new
[…] von Neumann theorem, Gowers uniformity norms, linear patterns | by Terence Tao In the previous lecture notes, we used (linear) Fourier analysis to control the number of three-term arithmetic progressions in […]
20 May, 2010 at 9:45 pm
254B, Notes 5: The inverse conjecture for the Gowers norm I. The finite field case « What’s new
[…] proceed we need the following analogue of Proposition 5 of Notes 2: Exercise 6 (Fragmenting a polynomial into subspaces) Let be a classical polynomial of degree . […]
29 May, 2010 at 2:15 pm
254B, Notes 6: The inverse conjecture for the Gowers norm II. The integer case « What’s new
[…] with growth function depending only on , where Fourier measurability was defined in Notes 2. Exercise 18 Show that the class of nilsequences of degree does not change if we drop the […]
5 June, 2010 at 12:57 pm
254B, Lecture Notes 7: The transference principle, and linear equations in primes « What’s new
[…] should be compared with Roth’s theorem in the integers (Notes 2), which is the same statement but with the primes replaced by the integers (or natural numbers ). […]
5 September, 2011 at 1:17 pm
ylin2
Dear Professor Tao: In exercise 8, you said to consider a Fejer summation method. But Fejer kernel seems to be defined on the continuous torus, so how to adapt that method to the discrete setting as in this problem?
[Corrected, thanks – T.]
5 September, 2011 at 6:43 pm
Terence Tao
A function on the integers can be viewed as the restriction of a function on the reals.
7 January, 2012 at 9:35 am
Applications of Szemerédi’s regularity lemma: triangle removal lemma, Roth’s theorem, corner’s theorem and graph removal lemma « Disquisitiones Mathematicae
[…] regularity lemma, and now we give a few of its various applications: the triangle removal lemma, Roth’s theorem on the existence of arithmetic progressions of length in subsets of the integers with positive […]
14 May, 2018 at 12:51 pm
Finitary versions of theorems – Write down what you’ve done
[…] of Roth’s theorem on arithmetic progressions which I wanted to understand. I found it in Terence Tao’s lecture notes on higher order Fourier analysis. While working my way through this proof I tried to solve the exercises. However, already the first […]
20 January, 2020 at 3:38 pm
hc
I have a confusion: what does arithmetic regularity lemma buy us in the energy increment proof?
It seems like after Proposition 9, we already found the right sigma algebra
of Fourier measurable sets with complexity
(say we start with
being the trivial
-algebra). We can then write
Almost periodicity should hold for
as in your proof. By proposition 9, the residual
would have small
-norm, say
. It seems like one can already conclude Roth’s theorem here.
Thank you!
20 January, 2020 at 7:44 pm
Terence Tao
Almost periodicity will give a good contribution from
, but only on a relatively small number of shifts
(a lower bound of
. The
control of
is not strong enough to control its contribution on such a sparse set of shifts, so one cannot conclude any nontrivial lower bound on the number of 3-term APs this way.
22 January, 2020 at 8:12 am
hc
Thanks for your quick reply, this is very helpful!
I missed that
has another dependency on
(coming from
having
atoms), hence I thought I can still have enough shifts
independent of
:(
15 May, 2020 at 3:04 pm
S M
Dear Professor Tao, if you do not terribly mind a couple of stupid questions, could you please comment on how to actually prove that
is Fourier-measurable? I do not quite understand what Fejer summation means but do you mean that one should define take our
to be
where
denotes the Fejer kernel?
Secondly, I simply cannot wrap my head around what the definition of
is. Usually, if I had to calculate such an integral I would try to approximate such a function by simple functions and then using the monotone convergence theorem or something similar. However, I do not quite know how to work with this integral of
. Thank you for patiently reading my questions.
15 May, 2020 at 3:07 pm
S M
Sorry, I meant
and
.
17 May, 2020 at 10:07 am
S M
I think I (with some help from my teacher!) just figured out what the definition of
should naturally be
where we define
.
27 May, 2020 at 9:45 am
Abhishek Khetan
Dear Professor Tao,
In the section on the density increment argument, when you wrote “…conclude that the left-hand side of (1) can be expressed as…” I think you meant to refer to Equation (2) rather than Equation (1).
Thank you.
[Corrected, thanks – T.]
9 November, 2021 at 10:58 pm
Anonymous
Dear Prof. Tao,
In the corollary 19(here) or ( or cor. 1.2.9 from textbook)
What is meant by the term “generated”. You have used this term several times and I am unable to infer its meaning. The argument says:
1. there exists a refinement generated by
generated by at most
atoms.
2.We let
be the factor generated by
and
.
3. How does it follow from
and the fact that
is measurable with respect to to $\mathcal{B’}$ that

[See https://en.wikipedia.org/wiki/%CE%A3-algebra#%CF%83-algebra_generated_by_an_arbitrary_family -T]
9 November, 2021 at 11:00 pm
Anonymous
Can you please elaborate or give details on how to proceed with question 3? I am unable to follow directly. Thank you!