We now give a basic application of Fourier analysis to the problem of counting additive patterns in sets, namely the following famous theorem of Roth:

Theorem 1 (Roth's theorem) Let $A \subset {\bf Z}$ be a set of integers whose upper density

$\displaystyle \limsup_{N \rightarrow \infty} \frac{|A \cap [-N,N]|}{2N+1}$

is positive. Then $A$ contains infinitely many arithmetic progressions $a, a+r, a+2r$ of length three, with $a \in {\bf Z}$ and $r > 0$.
This is the first non-trivial case of Szemerédi’s theorem, which is the same assertion but with length three arithmetic progressions replaced by progressions of length $k$ for any $k \geq 3$.
As it turns out, one can prove Roth’s theorem by an application of linear Fourier analysis – by comparing the set $A$ (or more precisely, the indicator function $1_A$ of that set, or of pieces of that set) against linear characters $n \mapsto e(\alpha n)$ for various frequencies $\alpha \in {\bf R}/{\bf Z}$. There are two extreme cases to consider (which are model examples of a more general dichotomy between structure and randomness). One is when $A$ is aligned almost completely with one of these linear characters, for instance by being a Bohr set of the form

$\displaystyle \{ n : \| \alpha n - \theta \|_{{\bf R}/{\bf Z}} < \epsilon \}$

or more generally of the form

$\displaystyle \{ n : \alpha n \in U \hbox{ mod } {\bf Z}^k \}$

for some multi-dimensional frequency $\alpha \in ({\bf R}/{\bf Z})^k$ and some open set $U$. In this case, arithmetic progressions can be located using the equidistribution theory of the previous set of notes. At the other extreme, one has Fourier-uniform or Fourier-pseudorandom sets, whose correlation with any linear character is negligible. In this case, arithmetic progressions can be produced in abundance via a Fourier-analytic calculation.
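To get a feel for Fourier-pseudorandomness, here is a quick numerical sketch (the modulus, the density, and the random seed are arbitrary choices of mine, not part of the theory): for a typical random set of density $1/2$, the trivial Fourier coefficient records the density, while every non-trivial coefficient is smaller by an order of magnitude.

```python
import numpy as np

N = 1009  # a prime modulus, to avoid small-period artifacts
rng = np.random.default_rng(0)

# A "random" set of density 1/2: each n in [N] is kept independently.
A = rng.random(N) < 0.5
f = A.astype(float)

# Fourier coefficients \hat f(xi) = E_n f(n) e(-2 pi i xi n / N).
fhat = np.fft.fft(f) / N

trivial = abs(fhat[0])            # equals the density of A
nontrivial = np.abs(fhat[1:]).max()

print(trivial, nontrivial)
```

For a genuinely structured set (say, an interval or a Bohr set) the picture is reversed: a single non-trivial coefficient would be comparable to the trivial one.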
To handle the general case, one must somehow synthesise together the argument that deals with the structured case with the argument that deals with the random case. There are several known ways to do this, but they can be basically classified into two general methods, namely the density increment argument (or $L^\infty$ increment argument) and the energy increment argument (or $L^2$ increment argument).
The idea behind the density increment argument is to introduce a dichotomy: either the object being studied is pseudorandom (in which case one is done), or else one can use the theory of the structured objects to locate a sub-object of significantly higher “density” than the original object. As the density cannot exceed one, one should thus be done after a finite number of iterations of this dichotomy. This argument was introduced by Roth in his original proof of the above theorem.
The idea behind the energy increment argument is instead to decompose the original object $f$ into two pieces (and, sometimes, a small additional error term): a structured component that captures all the structured objects that have significant correlation with $f$, and a pseudorandom component which has no significant correlation with any structured object. This decomposition usually proceeds by trying to maximise the “energy” (or $L^2$ norm) of the structured component, or dually by trying to minimise the energy of the residual between the original object and the structured object. This argument appears for instance in the proof of the Szemerédi regularity lemma (which, not coincidentally, can also be used to prove Roth’s theorem), and is also implicit in the ergodic theory approach to such problems (through the machinery of conditional expectation relative to a factor, which is a type of orthogonal projection, the existence of which is usually established via an energy increment argument). However, one can also deploy the energy increment argument in the Fourier analytic setting, to give an alternate Fourier-analytic proof of Roth’s theorem that differs in some ways from the density increment proof.
In these notes we give two Fourier-analytic proofs of Roth’s theorem, one proceeding via the density increment argument, and the other by the energy increment argument. As it turns out, both of these arguments extend to establish Szemerédi’s theorem, and more generally to counting other types of patterns, but this is non-trivial (requiring some sort of inverse conjecture for the Gowers uniformity norms in both cases); we will discuss this further in later notes.
— 1. The density increment argument —
We begin with the density increment argument. We first rephrase Roth’s theorem in a finitary form:

Theorem 2 (Roth's theorem, finitary version) Let $\delta > 0$, and let $N$ be sufficiently large depending on $\delta$. Then every $A \subset [N]$ with $|A| \geq \delta N$ contains an arithmetic progression of length three.
We prove Theorem 2 by a downward induction on the density parameter $\delta$. Let $P(\delta)$ denote the proposition that Theorem 2 holds for that value of $\delta$ (i.e. for sufficiently large $N$ and all $A \subset [N]$ with $|A| \geq \delta N$, $A$ contains an arithmetic progression of length three). Our objective is to show that $P(\delta)$ holds for all $\delta > 0$.
Clearly, $P(\delta)$ is (vacuously) true for $\delta > 1$ (and trivially true for $\delta = 1$). It is also monotone in the sense that if $P(\delta)$ holds for some $\delta$, then $P(\delta')$ holds for all $\delta' > \delta$. To downwardly induct on $\delta$, we will prove the following dichotomy:

Proposition 3 (Dichotomy) Let $\delta > 0$, let $N$ be sufficiently large depending on $\delta$, and let $A \subset [N]$ be such that $|A| \geq \delta N$. Then at least one of the following holds:
- $A$ contains an arithmetic progression of length three; or
- there exists a subprogression $P$ of $[N]$ of length at least $N'$ such that $|A \cap P| \geq (\delta + c(\delta)) |P|$, where $N' = N'(N)$ goes to infinity as $N \rightarrow \infty$, and $c(\delta) > 0$ is bounded away from zero whenever $\delta$ is bounded away from zero.
Let us see why Proposition 3 implies Theorem 2. It is slightly more convenient to use a “well-ordering principle” argument rather than an induction argument, though of course the two approaches are equivalent. Let $\delta_*$ be the infimum of all $\delta$ for which $P(\delta)$ holds, thus $0 \leq \delta_* \leq 1$. If $\delta_* = 0$ then we are done, so suppose that $\delta_*$ is non-zero. Then for any $\epsilon > 0$, $P(\delta_* - \epsilon)$ is false, thus there exist arbitrarily large $N$ and $A \subset [N]$ with $|A| \geq (\delta_*-\epsilon) N$ with no progressions of length three. By Proposition 3, we can thus find a subprogression $P$ of $[N]$ of length at least $N'$ with $|A \cap P| \geq (\delta_*-\epsilon+c(\delta_*-\epsilon)) |P|$; if $\epsilon$ is small enough, this implies that $|A \cap P| \geq (\delta_*+\epsilon) |P|$. We then use an affine transformation to map $P$ to $[N']$ (noting crucially that the property of having no arithmetic progressions of a given length is preserved by affine transformations). As $N$ can be arbitrarily large, $N'$ can be arbitrarily large also. Since $P(\delta_*+\epsilon)$ is true, we see that the image of $A \cap P$ contains an arithmetic progression of length three, hence $A$ does also; which gives the desired contradiction.
It remains to prove Proposition 3. There are two main steps. The first relies heavily on the fact that the progressions only have length three, and is proven via Fourier analysis:
Proposition 4 (Lack of progressions implies correlation with a linear phase) Let $\delta > 0$, let $N$ be sufficiently large depending on $\delta$, let $A \subset [N]$ be such that $|A| = \delta' N$ for some $\delta' \geq \delta$, with $A$ containing no arithmetic progressions of length three. Then there exists $\alpha \in {\bf R}/{\bf Z}$ such that $|{\bf E}_{n \in [N]} (1_A(n) - \delta') e(-\alpha n)| \gg \delta^2$.
Proof: In order to use Fourier analysis, it will be convenient to embed $[N]$ inside a cyclic group ${\bf Z}/N'{\bf Z}$, where $N'$ is equal to (say) $2N+1$; the exact choice here is only of minor importance, though it will be convenient to take $N'$ to be odd. We introduce the trilinear form

$\displaystyle \Lambda(f,g,h) := {\bf E}_{n, r \in {\bf Z}/N'{\bf Z}} f(n) g(n+r) h(n+2r)$

for any functions $f, g, h: {\bf Z}/N'{\bf Z} \rightarrow {\bf C}$; we then observe that the quantity

$\displaystyle \Lambda(1_A, 1_A, 1_A)$

(extending $1_A$ by zero outside of $[N]$) is equal to the number of arithmetic progressions $n, n+r, n+2r$ in $A$ (counting the degenerate progressions in which $r = 0$, and also allowing for $r$ to be negative), divided by the normalising factor of $(N')^2$. On the other hand, by hypothesis, $A$ contains no non-degenerate arithmetic progressions of length three, and clearly has $|A| = \delta' N$ degenerate progressions; thus we have

$\displaystyle \Lambda(1_A, 1_A, 1_A) \leq \delta' N / (N')^2. \ \ \ \ \ (1)$
On the other hand, from the Fourier inversion formula on the cyclic group ${\bf Z}/N'{\bf Z}$ we may write

$\displaystyle f(n) = \sum_{\xi \in {\bf Z}/N'{\bf Z}} \hat f(\xi) e(\xi n / N')$

for any function $f: {\bf Z}/N'{\bf Z} \rightarrow {\bf C}$, where $\hat f(\xi)$ are the Fourier coefficients

$\displaystyle \hat f(\xi) := {\bf E}_{n \in {\bf Z}/N'{\bf Z}} f(n) e(-\xi n / N').$

We may thus write $\Lambda(f,g,h)$ as

$\displaystyle \sum_{\xi_1, \xi_2, \xi_3 \in {\bf Z}/N'{\bf Z}} \hat f(\xi_1) \hat g(\xi_2) \hat h(\xi_3)\ {\bf E}_{n,r \in {\bf Z}/N'{\bf Z}} e\left( \frac{\xi_1 n + \xi_2 (n+r) + \xi_3 (n+2r)}{N'} \right). \ \ \ \ \ (2)$
Now observe that we have the identity

$\displaystyle \xi_1 n + \xi_2 (n+r) + \xi_3 (n+2r) = (\xi_1 + \xi_2 + \xi_3) n + (\xi_2 + 2 \xi_3) r,$

so the phase is trivial whenever $(\xi_1,\xi_2,\xi_3)$ is of the form $(\xi, -2\xi, \xi)$, and in that case the expectation in (2) is equal to $1$. Conversely, if $(\xi_1,\xi_2,\xi_3)$ is not of this form, then the phase is non-trivial, and from Fourier analysis we conclude that the expectation in (2) vanishes. We conclude that the left-hand side of (1) can be expressed as

$\displaystyle \Lambda(1_A, 1_A, 1_A) = \sum_{\xi \in {\bf Z}/N'{\bf Z}} \widehat{1_A}(\xi)^2\, \widehat{1_A}(-2\xi).$
Now using Plancherel’s theorem and the Cauchy-Schwarz inequality, we have the bound

$\displaystyle |\Lambda(f,g,h)| \leq \left( \sup_\xi |\hat f(\xi)| \right) \|g\|_{L^2({\bf Z}/N'{\bf Z})} \|h\|_{L^2({\bf Z}/N'{\bf Z})} \ \ \ \ \ (3)$

and similarly for permutations of $f, g, h$ on the right-hand side.
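The Fourier expansion of the trilinear form, $\Lambda(f,g,h) = \sum_\xi \hat f(\xi) \hat g(-2\xi) \hat h(\xi)$, can be checked numerically. The following sketch (my own illustration; the modulus and test functions are arbitrary) compares a direct evaluation of $\Lambda$ over ${\bf Z}/N'{\bf Z}$ with the Fourier-side sum:

```python
import numpy as np

Np = 101  # an odd modulus N'
rng = np.random.default_rng(1)
f, g, h = (rng.random(Np) for _ in range(3))

# Direct evaluation: Lambda(f,g,h) = E_{n,r} f(n) g(n+r) h(n+2r) over Z/N'Z.
n = np.arange(Np)
direct = 0.0
for r in range(Np):
    direct += np.sum(f * g[(n + r) % Np] * h[(n + 2 * r) % Np])
direct /= Np * Np

# Fourier side: sum_xi fhat(xi) ghat(-2 xi) hhat(xi),
# with fhat(xi) = E_n f(n) e(-2 pi i xi n / N').
fhat = np.fft.fft(f) / Np
ghat = np.fft.fft(g) / Np
hhat = np.fft.fft(h) / Np
xi = np.arange(Np)
fourier = np.sum(fhat * ghat[(-2 * xi) % Np] * hhat[xi])

print(direct, fourier.real)
```

Note that the map $\xi \mapsto -2\xi$ is a bijection on ${\bf Z}/N'{\bf Z}$ precisely because $N'$ is odd, which is why the proof arranges for this.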
We could apply (3) directly to $\Lambda(1_A,1_A,1_A)$, but this is not useful, since we seek a lower bound on this quantity rather than an upper bound. To get such a lower bound, we split $1_A = \delta' 1_{[N]} + f$, where $f := 1_A - \delta' 1_{[N]}$ is the mean zero portion of $1_A$, and use trilinearity to split $\Lambda(1_A,1_A,1_A)$ into a main term $\Lambda(\delta' 1_{[N]}, \delta' 1_{[N]}, \delta' 1_{[N]})$, plus seven other error terms involving $\delta' 1_{[N]}$ and $f$, with each error term involving at least one copy of $f$. The main term can be computed explicitly as

$\displaystyle \Lambda(\delta' 1_{[N]}, \delta' 1_{[N]}, \delta' 1_{[N]}) = (\delta')^3 \Lambda(1_{[N]}, 1_{[N]}, 1_{[N]}) \gg (\delta')^3,$

since $[N]$ contains $\gg N^2$ arithmetic progressions of length three. Comparing this with (1), we conclude that one of the error terms must have magnitude $\gg (\delta')^3$ also. For sake of concreteness, let us say that

$\displaystyle |\Lambda(\delta' 1_{[N]}, \delta' 1_{[N]}, f)| \gg (\delta')^3;$

the other cases are similar.

From the triangle inequality we see that $\delta' 1_{[N]}$ and $f$ have an $L^2({\bf Z}/N'{\bf Z})$ norm of $O((\delta')^{1/2})$, and so from (3) one has

$\displaystyle |\Lambda(\delta' 1_{[N]}, \delta' 1_{[N]}, f)| \ll \left( \sup_\xi |\hat f(\xi)| \right) \delta'$

and so we conclude that

$\displaystyle \sup_\xi |\hat f(\xi)| \gg (\delta')^2 \geq \delta^2.$

Writing $\alpha := \xi/N'$ for a frequency $\xi$ attaining this supremum, and undoing the embedding of $[N]$ into ${\bf Z}/N'{\bf Z}$, we obtain the claim.
Remark 1 The above argument relied heavily on the fact that there was only a one-parameter family of linear relations between $n$, $n+r$, and $n+2r$. The same argument does not work directly for finding arithmetic progressions of length longer than three; we return to this point in later notes.
The second step converts correlation with a linear character into a density increment on a subprogression:
Proposition 5 (Fragmenting a linear character into progressions) Let $N \geq 1$, let $\epsilon > 0$, and let $\phi(n) := e(\alpha n)$ be a linear phase. Then there exists a partition

$\displaystyle [N] = P_1 \cup \ldots \cup P_k \cup E$

of $[N]$ into arithmetic progressions $P_1, \ldots, P_k$ of length at least $N'$, together with an error term $E$ of cardinality at most $\epsilon N$, such that $\phi$ fluctuates by at most $\epsilon$ on each progression (i.e. $|\phi(n) - \phi(m)| \leq \epsilon$ whenever $n, m \in P_i$), where $N' = N'(N)$ goes to infinity as $N \rightarrow \infty$ for fixed $\epsilon$.
Proof: We may assume that $N$ is sufficiently large depending on $\epsilon$, as the claim is trivial otherwise (just set $E := [N]$).
Fix $\alpha$, and let $M$ be a slowly growing function of $N$ to be chosen later. By using recurrence for the linear phase $n \mapsto \alpha n \hbox{ mod } 1$ (e.g. via the Dirichlet pigeonhole principle), we can find a shift $h \geq 1$ of size $h = O_M(1)$ such that $\| \alpha h \|_{{\bf R}/{\bf Z}} \leq 1/M^2$. We then partition $[N]$ into $h$ arithmetic progressions of spacing $h$, and then partition each of those progressions in turn into subprogressions of spacing $h$ and length $M$, plus an error of cardinality at most $M$, leading to an error set of cardinality at most $hM = O_M(1)$. On each of the subprogressions, $\alpha n \hbox{ mod } 1$ fluctuates by at most $1/M$. The claim then follows by choosing $M$ to be a sufficiently slowly growing function of $N$.
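The fragmenting step above can be carried out concretely. In this sketch (my own; the choices $N = 10^4$, $\alpha = \sqrt 2$, $M = 10$ are arbitrary) we find the shift $h$ by brute-force pigeonhole, chop $[N]$ into progressions of spacing $h$ and length at most $M$, and measure how much $e(\alpha n)$ actually fluctuates on each piece:

```python
import numpy as np

def frac_dist(x):
    """Distance from x to the nearest integer, i.e. ||x||_{R/Z}."""
    return abs(x - round(x))

N = 10_000
alpha = np.sqrt(2)   # an irrational frequency
M = 10

# Dirichlet pigeonhole: some 1 <= h <= M^2 has ||alpha h|| <= 1/M^2.
h = min(range(1, M * M + 1), key=lambda k: frac_dist(alpha * k))

# Partition [0, N) into progressions of spacing h and length <= M.
progressions = []
for a in range(h):                      # residue classes mod h
    ns = list(range(a, N, h))
    for i in range(0, len(ns), M):      # chop into blocks of length <= M
        progressions.append(ns[i:i + M])

# On each progression, e(alpha n) drifts by at most 2*pi*M*||alpha h||.
max_fluct = max(
    abs(np.exp(2j * np.pi * alpha * P[0]) - np.exp(2j * np.pi * alpha * n))
    for P in progressions for n in P
)
print(h, max_fluct)
```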
We now let $\epsilon > 0$ be a small quantity depending on $\delta$ to be chosen later (actually it turns out that we can take $\epsilon$ to be a small multiple of $\delta^2$) and apply Proposition 5 to decompose $[N]$ into progressions $P_1, \ldots, P_k$ and an error term $E$ with the stated properties. Then we have

$\displaystyle \delta^2 \ll |{\bf E}_{n \in [N]} (1_A - \delta')(n) e(-\alpha n)| \leq \frac{1}{N} \sum_{i=1}^k \left| \sum_{n \in P_i} (1_A - \delta')(n) e(-\alpha n) \right| + O(\epsilon).$

Since $e(-\alpha n)$ fluctuates by at most $\epsilon$ on $P_i$, we can apply the triangle inequality and conclude that

$\displaystyle \frac{1}{N} \sum_{i=1}^k \left| \sum_{n \in P_i} (1_A - \delta')(n) \right| \gg \delta^2. \ \ \ \ \ (4)$

On the other hand, as $\delta'$ is the mean of $1_A$ on $[N]$, we have

$\displaystyle \sum_{i=1}^k \sum_{n \in P_i} (1_A - \delta')(n) = O( \epsilon N ).$

Adding this to (4) and noting that $|x| + x = 2 \max(x,0)$ for real $x$, we conclude (for $\epsilon$ small enough) that

$\displaystyle \frac{1}{N} \sum_{i=1}^k \max\left( \sum_{n \in P_i} (1_A - \delta')(n), 0 \right) \gg \delta^2$

and hence by the pigeonhole principle we can find $i$ such that

$\displaystyle \sum_{n \in P_i} (1_A - \delta')(n) \gg \delta^2 |P_i|,$

or in other words

$\displaystyle |A \cap P_i| \geq (\delta' + c \delta^2) |P_i|$

for some absolute constant $c > 0$, and Proposition 3 follows.
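The resulting increment $\delta \mapsto \delta + c\delta^2$ drives the whole induction: since the density cannot exceed one, the iteration must stop after $O(1/\delta)$ steps. A quick simulation of this bookkeeping (the constant $c = 1/4$ here is an arbitrary stand-in):

```python
def increment_steps(delta0, c=0.25):
    """Iterate delta -> delta + c*delta^2 and count steps until delta exceeds 1."""
    delta, steps = delta0, 0
    while delta <= 1.0:
        delta += c * delta * delta
        steps += 1
    return steps

for d in (0.5, 0.1, 0.01):
    print(d, increment_steps(d))

# Doubling delta takes about 1/(c*delta) iterations, and the successive
# doublings from delta up to 1 form a geometric series, so the total is O(1/delta).
```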
It is possible to rewrite the above argument in the ultralimit setting, though in this case it only makes the argument slightly shorter. We sketch this alternate formulation below.
Exercise 2 Let $\delta_*$ be as above.

- Show that if $N$ is an unbounded limit natural number, and $A$ is a limit subset of $[N]$ whose density $|A|/N$ is strictly greater than $\delta_*$, then $A$ contains a (limit) arithmetic progression $n, n+r, n+2r$ of length three (with $r \neq 0$).
- Show that there exists an unbounded limit natural number $N$ and a limit subset $A$ of $[N]$ of density $\delta_*$, which does not contain any arithmetic progressions of length three.
Exercise 3 Show that if $N$ is an unbounded limit natural number, and $A$ is a limit subset of $[N]$ of positive density $\delta := \hbox{st}(|A|/N) > 0$ with no arithmetic progressions of length three, then there exists a limit real $\alpha$ such that $|{\bf E}_{n \in [N]} (1_A(n) - |A|/N) e(-\alpha n)| \gg \delta^2$.
Exercise 4 If $N$ is an unbounded limit natural number, and $\alpha$ is a limit real, show that one can partition $[N] = P_1 \cup \ldots \cup P_k \cup E$, where $k$ is a limit natural number, the $P_i$ are limit arithmetic subprogressions of $[N]$ of unbounded length (with the map $i \mapsto P_i$ being a limit function), such that $e(\alpha n)$ fluctuates by $o(1)$ on each $P_i$ (uniformly in $i$), and $|E| = o(N)$.
Exercise 5 Use the previous three exercises to reprove Roth’s theorem.
Exercise 6 (Roth’s theorem in bounded characteristic) Let $F$ be a finite field, let $\delta > 0$, and let $V$ be a finite vector space over $F$. Show that if the dimension of $V$ is sufficiently large depending on $F, \delta$, and if $A \subset V$ is such that $|A| \geq \delta |V|$, then there exist $x, r \in V$ with $r \neq 0$ such that $x, x+r, x+2r \in A$. (Hint: Mimic the above arguments (either finitarily, or with ultralimits), using hyperplanes as a substitute for subprogressions.)
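In small dimensions the finite-field statement can be explored by brute force. In $F_3^n$ a triple $x, x+r, x+2r$ with $r \neq 0$ is exactly an affine line, so a progression-free set is a “cap set”. The following sketch (my own; a tiny check, not part of the exercise) verifies that in $F_3^2$ the largest progression-free set has exactly $4$ of the $9$ points:

```python
from itertools import combinations, product

# Points of F_3^2, and the progressions x, x+r, x+2r with r != 0 (i.e. lines).
points = list(product(range(3), repeat=2))

def add(x, y):
    return tuple((a + b) % 3 for a, b in zip(x, y))

progressions = set()
for x in points:
    for r in points:
        if r != (0, 0):
            progressions.add(frozenset({x, add(x, r), add(x, add(r, r))}))

def is_free(S):
    """True if S contains no three-term progression."""
    s = set(S)
    return not any(p <= s for p in progressions)

best = max(k for k in range(1, 10)
           if any(is_free(c) for c in combinations(points, k)))
print(best)
```

The question of how the maximal cap-set size grows with the dimension $n$ is a famous problem in its own right.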
Exercise 7 (Roth’s theorem in finite abelian groups) Let $G$ be a finite abelian group, and let $\delta > 0$. Show that if $|G|$ is sufficiently large depending on $\delta$, and $A \subset G$ is such that $|A| \geq \delta |G|$, then there exist $x, r \in G$ with $r \neq 0$ such that $x, x+r, x+2r \in A$. (Hint: if there is an element of $G$ of large order, one can use Theorem 2 and the pigeonhole principle. If all elements have bounded order, one can instead use Exercise 6.) This result (as well as the special case in the preceding exercise) was first established by Meshulam.
— 2. The energy increment argument —
Now we turn to the energy increment approach. This approach requires a bit more machinery to set up, but ends up being quite flexible and powerful (for instance, it is the starting point for my theorem with Ben Green establishing arbitrarily long progressions in the primes, which we do not know how to establish via density increment arguments).
Instead of passing from $[N]$ to a subprogression, we now instead coarsen $[N]$ to some partition (or factor) of $[N]$, as follows. Define a factor of $[N]$ to be a $\sigma$-algebra ${\mathcal B}$ of subsets of $[N]$, or equivalently a partition of $[N]$ into disjoint atoms or cells (with the elements of ${\mathcal B}$ then being the arbitrary unions of atoms). Given a function $f: [N] \rightarrow {\bf C}$ and a factor ${\mathcal B}$, we define the conditional expectation ${\bf E}(f|{\mathcal B}): [N] \rightarrow {\bf C}$ to be the function whose value at a given point $x \in [N]$ is given by the formula

$\displaystyle {\bf E}(f|{\mathcal B})(x) := \frac{1}{|B(x)|} \sum_{y \in B(x)} f(y),$

where $B(x)$ is the unique atom of ${\mathcal B}$ that contains $x$. One can view the map $f \mapsto {\bf E}(f|{\mathcal B})$ as the orthogonal projection from $L^2([N])$ to $L^2({\mathcal B})$, where $L^2([N])$ is the space of functions $f: [N] \rightarrow {\bf C}$ with the inner product

$\displaystyle \langle f, g \rangle_{L^2([N])} := {\bf E}_{n \in [N]} f(n) \overline{g(n)}$

and $L^2({\mathcal B})$ is the subspace of functions in $L^2([N])$ which are measurable with respect to ${\mathcal B}$, or equivalently are constant on each atom of ${\mathcal B}$.

We say that one factor ${\mathcal B}'$ refines another ${\mathcal B}$ if ${\mathcal B} \subset {\mathcal B}'$, or equivalently if every atom of ${\mathcal B}$ is a union of atoms of ${\mathcal B}'$, or if every atom of ${\mathcal B}'$ is contained in an atom of ${\mathcal B}$, or equivalently again if $L^2({\mathcal B}) \subset L^2({\mathcal B}')$. Given two factors ${\mathcal B}$, ${\mathcal B}'$, one can define their join ${\mathcal B} \vee {\mathcal B}'$ to be their least common refinement, thus the atoms in ${\mathcal B} \vee {\mathcal B}'$ are the non-empty intersections of atoms in ${\mathcal B}$ with atoms in ${\mathcal B}'$.
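The two facts about conditional expectation that drive everything below are the Pythagoras identity $\|f\|_{L^2}^2 = \|{\bf E}(f|{\mathcal B})\|_{L^2}^2 + \|f - {\bf E}(f|{\mathcal B})\|_{L^2}^2$ and the monotonicity of the energy $\|{\bf E}(f|{\mathcal B})\|_{L^2}^2$ under refinement. Both are easy to see numerically; here is a minimal sketch (partition sizes and test function are arbitrary choices of mine):

```python
import numpy as np

N = 12
f = np.array([1.0] * 4 + [0.0] * 8)   # indicator of {0,1,2,3} in [12]

def cond_exp(f, atoms):
    """Conditional expectation: replace f by its average on each atom."""
    g = np.empty_like(f)
    for atom in atoms:
        g[atom] = f[atom].mean()
    return g

# A factor B and a refinement B' (every atom of B' sits inside an atom of B).
B  = [list(range(0, 6)), list(range(6, 12))]
Bp = [list(range(0, 3)), list(range(3, 6)),
      list(range(6, 9)), list(range(9, 12))]

Ef, Efp = cond_exp(f, B), cond_exp(f, Bp)

energy = lambda g: (g ** 2).mean()   # ||g||_{L^2([N])}^2

print(energy(Ef), energy(Efp), energy(f))
```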
The idea is to split a given function $f$ in $L^2([N])$ (and specifically, the indicator function $1_A$) into a projection ${\bf E}(f|{\mathcal B})$ onto a “structured factor” ${\mathcal B}$ to obtain a “structured component” ${\bf E}(f|{\mathcal B})$, together with a “pseudorandom component” $f - {\bf E}(f|{\mathcal B})$ that is essentially orthogonal to all structured functions. This decomposition is related to the classical decomposition of a vector in a Hilbert space into its orthogonal projection onto a closed subspace $V$, plus the complementary projection to the orthogonal complement $V^\perp$; we will see the relationship between the two decompositions later when we pass to the ultralimit.
We need to make the notion of “structured” more precise. We begin with some definitions. We say that a function $f: [N] \rightarrow {\bf C}$ has Fourier complexity at most $M$ if it can be expressed as

$\displaystyle f(n) = \sum_{i=1}^{M'} c_i e(\alpha_i n)$

for some $M' \leq M$ and some complex numbers $c_1, \ldots, c_{M'}$ of magnitude at most $M$, and some real numbers $\alpha_1, \ldots, \alpha_{M'}$. Note from the Fourier inversion formula that every function $f: [N] \rightarrow {\bf C}$ will have some finite Fourier complexity, but typically one expects the complexity to grow with $N$; only a few special functions will have complexity bounded uniformly in $N$. Also note that if $f, g$ have Fourier complexity at most $M$, then $f+g$, $f-g$, $\overline{f}$, or $fg$ all have Fourier complexity at most $O_M(1)$; informally, the space of bounded complexity functions forms an algebra. (We will be able to formalise this statement after we take ultralimits.)
Ideally, we would like to take “functions of bounded Fourier complexity” as our class of structured functions. For technical reasons (related to our desire to use indicator functions as structured functions), we need to take an $L^1$ closure and work with the wider class of Fourier measurable functions as our structured class.

Definition 6 (Fourier measurability) Let $F: {\bf R}^+ \rightarrow {\bf R}^+$ be a function. A function $f: [N] \rightarrow {\bf C}$ is Fourier measurable with growth function $F$ if, for every $K > 1$, one can find a function $f_K: [N] \rightarrow {\bf C}$ of Fourier complexity at most $F(K)$ such that ${\bf E}_{n \in [N]} |f(n) - f_K(n)| \leq 1/K$.

A subset $A$ of $[N]$ is Fourier measurable with growth function $F$ if $1_A$ is Fourier measurable with this growth function.
Exercise 8 Show that every interval $[a,b]$ in $[N]$ is Fourier measurable with some growth function $F$ independent of $N$. (Hint: apply Fejér summation to the Fourier series of $1_{[a,b]}$.)
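The Fejér summation in the hint can be watched in action. In the sketch below (my own; the length, interval, and cutoff choices are arbitrary), the Fejér mean of order $M$ keeps the frequencies $|\xi| < M$ with triangular weights, and the $L^1$ error of approximating the indicator of an interval decays as $M$ grows — exactly the kind of growth-function control that Definition 6 asks for:

```python
import numpy as np

N = 400
f = np.zeros(N)
f[:N // 2] = 1.0   # indicator of an interval of length N/2

fhat = np.fft.fft(f) / N   # \hat f(xi) = E_n f(n) e(-2 pi i xi n / N)

def fejer(f_hat, M, N):
    """Fejer mean: keep |xi| < M with triangular weights 1 - |xi|/M."""
    xi = np.minimum(np.arange(N), N - np.arange(N))   # |xi| on Z/NZ
    w = np.where(xi < M, 1.0 - xi / M, 0.0)
    return np.real(np.fft.ifft(f_hat * w) * N)

err = {M: np.mean(np.abs(f - fejer(fhat, M, N))) for M in (5, 20, 80)}
print(err)
```

The point of using Fejér rather than raw partial sums is that the Fejér weights are nonnegative, which keeps the approximant bounded and avoids the Gibbs phenomenon.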
Exercise 9 Let $f: [N] \rightarrow {\bf C}$ be a Fourier-measurable function with some growth function $F$, which is bounded in magnitude by $1$. Show that for every $K > 1$, one can find a function $g: [N] \rightarrow {\bf C}$ which also is bounded in magnitude by $1$, and of Fourier complexity $O_{K,F}(1)$, such that ${\bf E}_{n \in [N]} |f(n) - g(n)| \ll 1/K$. (Hint: start with the approximating function $f_K$ from Definition 6, which is already bounded in magnitude by $F(K)$, and then set $g := P(f_K, \overline{f_K})$ where $P$ is a polynomial bounded in magnitude by $1$ on the ball of radius $F(K)$ which is close to the identity function on the ball of radius $1$ (such a function can be constructed via the Stone-Weierstrass theorem).)

Conclude that if $A, B \subset [N]$ are Fourier-measurable with growth function $F$, then $[N] \backslash A$, $A \cup B$, and $A \cap B$ are Fourier-measurable with some growth function $F'$ depending only on $F$.
We thus see that Fourier-measurable sets morally form a Boolean algebra. (Again, we can formalise this assertion once we pass to the ultralimit.)
Now we make a key observation (cf. this paper by Reingold, Trevisan, Tulsiani, and Vadhan).
Lemma 7 (Correlation with a Fourier character implies correlation with a Fourier-measurable set) Let $f: [N] \rightarrow {\bf R}$ be bounded in magnitude by $1$, and suppose that $|{\bf E}_{n \in [N]} f(n) e(-\alpha n)| \geq \delta$ for some $\delta > 0$ and $\alpha \in {\bf R}$. Then there exists a Fourier-measurable set $E \subset [N]$ with some growth function $F$ depending on $\delta$, such that $|{\bf E}_{n \in [N]} f(n) 1_E(n)| \gg \delta$.
Proof: Rotating , we may find a real number such that
We then express
By Minkowski’s inequality, we thus have either
In the former case we are done (setting $E$ to be the relevant level set), so suppose that the latter holds. If all the level sets involved were uniformly Fourier-measurable, we would now be done in this case also by the pigeonhole principle. This is not quite true; however, it turns out that most of these level sets are uniformly measurable, and this will be enough. More precisely, let $\sigma > 0$ be a small parameter to be chosen later, and say that a threshold $t$ is good if one has
for all . Let be the set of all bad . Observe that for each bad , we have , where is the probability measure
and is the Hardy-Littlewood maximal function
Applying the Hardy-Littlewood maximal inequality, we conclude that . In particular, if is small enough compared with , we have
and so by the pigeonhole principle, there exists a good $t$ such that
It remains to verify that the corresponding level set is Fourier-measurable. For any $\epsilon > 0$, we have (as $t$ is good) that
Applying Urysohn’s lemma, we can thus find a smooth function $\eta$ which equals $1$ on one side of the threshold $t$ and vanishes on the other, such that

Using the Weierstrass approximation theorem, one can then approximate $\eta$ uniformly by a polynomial of degree $O_\epsilon(1)$ and coefficients $O_\epsilon(1)$. This allows one to approximate the indicator of the level set in $L^1$ norm to an accuracy of $O(\epsilon)$ by a function of Fourier complexity $O_\epsilon(1)$, and the claim follows.
Corollary 8 (Correlation implies energy increment) Let $f: [N] \rightarrow [0,1]$, and let ${\mathcal B}$ be a factor generated by at most $M$ atoms, each of which is Fourier-measurable with growth function $F$. Suppose that we have the correlation

$\displaystyle |{\bf E}_{n \in [N]} (f - {\bf E}(f|{\mathcal B}))(n) e(-\alpha n)| \geq \delta$

for some $\delta > 0$ and $\alpha \in {\bf R}$. Then there exists a refinement ${\mathcal B}'$ of ${\mathcal B}$ generated by at most $2M$ atoms, each of which is Fourier-measurable with a growth function $F'$ depending only on $F, \delta$, such that

$\displaystyle \| {\bf E}(f|{\mathcal B}') \|_{L^2([N])}^2 - \| {\bf E}(f|{\mathcal B}) \|_{L^2([N])}^2 \gg \delta^2. \ \ \ \ \ (5)$
Proof: By Lemma 7, we can find a Fourier-measurable set $E$ with some growth function $F'$ depending on $\delta$, such that

$\displaystyle |{\bf E}_{n \in [N]} (f - {\bf E}(f|{\mathcal B}))(n) 1_E(n)| \gg \delta.$

We let ${\mathcal B}'$ be the factor generated by ${\mathcal B}$ and $E$. As $1_E$ is measurable with respect to ${\mathcal B}'$, we may project $f - {\bf E}(f|{\mathcal B})$ onto $L^2({\mathcal B}')$ and conclude that

$\displaystyle |{\bf E}_{n \in [N]} ({\bf E}(f|{\mathcal B}') - {\bf E}(f|{\mathcal B}))(n) 1_E(n)| \gg \delta.$

By Cauchy-Schwarz, we thus have

$\displaystyle \| {\bf E}(f|{\mathcal B}') - {\bf E}(f|{\mathcal B}) \|_{L^2([N])} \gg \delta,$

and the claim (5) then follows from Pythagoras' theorem.
We can then iterate this corollary via an energy increment argument to obtain

Proposition 9 (Weak arithmetic regularity lemma) Let $f: [N] \rightarrow [0,1]$, and let ${\mathcal B}$ be a factor generated by at most $M$ atoms, each of which is Fourier-measurable with growth function $F$. Let $\delta > 0$. Then there exists an extension ${\mathcal B}'$ of ${\mathcal B}$ generated by at most $M 2^{O(1/\delta^2)}$ atoms, each of which is Fourier-measurable with a growth function $F'$ depending only on $F, \delta$, such that

$\displaystyle |{\bf E}_{n \in [N]} (f - {\bf E}(f|{\mathcal B}'))(n) e(-\alpha n)| \leq \delta \ \ \ \ \ (6)$

for all $\alpha \in {\bf R}$.
Proof: We initially set ${\mathcal B}'$ equal to ${\mathcal B}$. If (6) already holds, then we are done; otherwise, we invoke Corollary 8 to increase the “energy” $\| {\bf E}(f|{\mathcal B}') \|_{L^2}^2$ by $\gg \delta^2$, at the cost of possibly doubling the number of atoms in ${\mathcal B}'$, and also altering the growth function somewhat. We iterate this procedure; as the energy is bounded between zero and one, and increases by $\gg \delta^2$ at each step, the procedure must terminate in $O(1/\delta^2)$ steps, at which point the claim follows.
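The energy increment loop is short enough to implement directly. In this sketch (entirely my own setup: the structured sets are dyadic intervals rather than general Fourier-measurable sets, and all parameters are arbitrary) we repeatedly refine a partition by whichever structured set still correlates with the residual; Cauchy-Schwarz guarantees each refinement raises the energy by more than $\delta^2$, so the loop must stop within $1/\delta^2$ steps:

```python
import numpy as np

N = 256
rng = np.random.default_rng(2)
f = (rng.random(N) < 0.4).astype(float)   # indicator of a generic set

# Structured sets: dyadic intervals of [0, N).
dyadic = [np.arange(a, a + L) for L in (N, N//2, N//4, N//8, N//16)
          for a in range(0, N, L)]

def cond_exp(f, atoms):
    g = np.empty_like(f)
    for atom in atoms:
        g[atom] = f[atom].mean()
    return g

def refine(atoms, I):
    """Split every atom of the partition along the set I."""
    S = set(I.tolist())
    out = []
    for atom in atoms:
        inside = [n for n in atom if n in S]
        outside = [n for n in atom if n not in S]
        out += [np.array(a) for a in (inside, outside) if a]
    return out

delta = 0.05
atoms, steps = [np.arange(N)], 0
while True:
    g = f - cond_exp(f, atoms)
    corrs = [abs(np.mean(g * np.isin(np.arange(N), I))) for I in dyadic]
    if max(corrs) <= delta:
        break
    atoms = refine(atoms, dyadic[int(np.argmax(corrs))])
    steps += 1

print(steps, len(atoms))
```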
It turns out that the power of this lemma is amplified if we iterate one more time, to obtain

Theorem 10 (Strong arithmetic regularity lemma) Let $f: [N] \rightarrow [0,1]$, let $\epsilon > 0$, and let $F: {\bf R}^+ \rightarrow {\bf R}^+$ be an arbitrary growth function. Then there exists a quantity $M = O_{\epsilon,F}(1)$ and a decomposition

$\displaystyle f = f_{str} + f_{sml} + f_{psd}$

with the following properties:

- (Nonnegativity) $f_{str}$ and $f_{str} + f_{sml}$ take values in $[0,1]$, and $f_{sml}, f_{psd}$ have mean zero;
- (Structure) $f_{str}$ is Fourier-measurable with a growth function $F_M$ that depends only on $M$;
- (Smallness) $f_{sml}$ has an $L^2$ norm of at most $\epsilon$; and
- (Pseudorandomness) One has $|{\bf E}_{n \in [N]} f_{psd}(n) e(-\alpha n)| \leq 1/F(M)$ for all $\alpha \in {\bf R}$.
Proof: We recursively define a sequence $M_1 < M_2 < \ldots$ by setting $M_1 := 1$ and $M_{i+1} := \max(F(M_i), M_i) + 1$ (say). Applying Proposition 9 (starting with the trivial factor ${\mathcal B}_1 := \{ \emptyset, [N] \}$), one can then find a nested sequence of refinements ${\mathcal B}_1 \subset {\mathcal B}_2 \subset \ldots$, such that

$\displaystyle |{\bf E}_{n \in [N]} (f - {\bf E}(f|{\mathcal B}_{i+1}))(n) e(-\alpha n)| \leq 1/F(M_i)$

for all $i \geq 1$ and $\alpha \in {\bf R}$, and such that each ${\mathcal B}_i$ consists of $O_{M_i}(1)$ atoms that are Fourier-measurable with some growth function depending on $M_i$ (note that this quantity dominates $F(M_{i-1})$ and $M_{i-1}$ by construction). By Pythagoras’ theorem, the energies $\| {\bf E}(f|{\mathcal B}_i) \|_{L^2}^2$ are monotone increasing between $0$ and $1$, so by the pigeonhole principle there exists $i = O(1/\epsilon^2)$ such that

$\displaystyle \| {\bf E}(f|{\mathcal B}_{i+1}) \|_{L^2}^2 - \| {\bf E}(f|{\mathcal B}_i) \|_{L^2}^2 \leq \epsilon^2,$

and hence by Pythagoras

$\displaystyle \| {\bf E}(f|{\mathcal B}_{i+1}) - {\bf E}(f|{\mathcal B}_i) \|_{L^2} \leq \epsilon.$

Setting $f_{str} := {\bf E}(f|{\mathcal B}_i)$, $f_{sml} := {\bf E}(f|{\mathcal B}_{i+1}) - {\bf E}(f|{\mathcal B}_i)$, and $f_{psd} := f - {\bf E}(f|{\mathcal B}_{i+1})$, we obtain the claim.
Remark 2 This result is essentially due to Green (though not quite in this language). Earlier related decompositions are due to Bourgain and to Green-Konyagin. The Szemerédi regularity lemma in graph theory can be viewed as the graph-theoretic analogue of this Fourier-analytic result; see this paper of mine (or my FOCS paper) for further discussion. The double iteration required to prove Theorem 10 means that the bounds here are quite poor (of tower-exponential type, in fact, when $F$ is exponential, which is typical in applications), much as in the graph theory case; thus the use of this lemma, while technically quantitative in nature, gives bounds that are usually quite inferior to what is known or suspected to be true.
As with the Ratner-type theorems from the previous notes, it is crucial that the uniformity $1/F(M)$ for the pseudorandom component $f_{psd}$ is of an arbitrarily higher quality than the measurability of the structured component $f_{str}$.
Much as the Ratner-type theorems from the previous notes could be used to prove multiple recurrence theorems, the arithmetic regularity lemma can be used (among other things) to give a proof of Roth’s theorem. We do so as follows. Let $N$ be a large integer, and let $A$ be a subset of $[N]$ with $|A| = \delta N$ for some $\delta > 0$. We consider the expression $\Lambda(1_A, 1_A, 1_A)$, where $\Lambda$ is the trilinear form

$\displaystyle \Lambda(f,g,h) := {\bf E}_{n \in [N], r \in [-N,N]} f(n) g(n+r) h(n+2r)$

(extending the functions by zero outside of $[N]$). We will show that

$\displaystyle \Lambda(1_A, 1_A, 1_A) \gg_\delta 1, \ \ \ \ \ (7)$

which implies that the number of all three-term arithmetic progressions in $A$ (including the degenerate ones with $r = 0$) is $\gg_\delta N^2$. For $N$ sufficiently large depending on $\delta$, this number is larger than the number $O(N)$ of degenerate progressions, giving the theorem.
To prove (7), let $\epsilon > 0$ be a small quantity depending on $\delta$ to be chosen later, and let $F$ be a growth function depending on $\delta, \epsilon$ to be chosen later. We apply the arithmetic regularity lemma (Theorem 10) to $1_A$, obtaining a decomposition $1_A = f_{str} + f_{sml} + f_{psd}$ with the stated properties. This splits the left-hand side of (7) into $27$ terms. But we can eliminate several of these terms:

Exercise 10 Show that

$\displaystyle \Lambda(1_A, 1_A, 1_A) = \Lambda(f_{str} + f_{sml}, f_{str} + f_{sml}, f_{str} + f_{sml}) + O(\epsilon) + O(1/F(M)). \ \ \ \ \ (8)$
Now we need to deal with the remaining term $\Lambda(f_1, f_1, f_1)$, where $f_1 := f_{str} + f_{sml}$. A key point is the almost periodicity of $f_1$:
Lemma 11 (Almost periodicity) For $\gg_{\delta,\epsilon} N$ values of $r \in [-\epsilon N, \epsilon N]$, one has

$\displaystyle {\bf E}_{n \in [N]} |f_1(n+r) - f_1(n)| \ll \epsilon$

(where we extend $f_1$ by zero outside of $[N]$).
Proof: As $f_{str}$ is Fourier-measurable, we may approximate it to an accuracy of $\epsilon$ in $L^1([N])$ norm by a function

$\displaystyle g(n) = \sum_{i=1}^k c_i e(\alpha_i n)$

of Fourier complexity $k = O_\epsilon(1)$. From the smallness of $f_{sml}$, we then have

$\displaystyle {\bf E}_{n \in [N]} |f_1(n+r) - f_1(n)| \leq {\bf E}_{n \in [N]} |g(n+r) - g(n)| + O(\epsilon).$

Using multiple recurrence, we can find $\gg_{\delta,\epsilon} N$ values of $r \in [-\epsilon N, \epsilon N]$ such that $\| \alpha_i r \|_{{\bf R}/{\bf Z}} \ll_\epsilon \epsilon$ for all $i = 1, \ldots, k$. The claim follows.
Now we can finish the proof of Roth’s theorem. As $f_1$ has the same mean as $1_A$, we have

$\displaystyle {\bf E}_{n \in [N]} f_1(n) = \delta$

and hence by Hölder’s inequality (and the non-negativity of $f_1$)

$\displaystyle {\bf E}_{n \in [N]} f_1(n)^3 \geq \delta^3.$

Now if $r$ is one of the periods in the above lemma, we have

$\displaystyle {\bf E}_{n \in [N]} |f_1(n+r) - f_1(n)| \ll \epsilon$

and thus by shifting

$\displaystyle {\bf E}_{n \in [N]} |f_1(n+2r) - f_1(n+r)| \ll \epsilon$

and so by the triangle inequality

$\displaystyle {\bf E}_{n \in [N]} |f_1(n+2r) - f_1(n)| \ll \epsilon.$

Putting all this together using the triangle and Hölder inequalities, we obtain

$\displaystyle {\bf E}_{n \in [N]} f_1(n) f_1(n+r) f_1(n+2r) \geq \delta^3 - O(\epsilon).$

Thus, if $\epsilon$ is sufficiently small depending on $\delta$, we have

$\displaystyle {\bf E}_{n \in [N]} f_1(n) f_1(n+r) f_1(n+2r) \gg \delta^3$

for $\gg_{\delta,\epsilon} N$ values of $r$, and thus

$\displaystyle \Lambda(f_1, f_1, f_1) \gg_{\delta,\epsilon} 1;$
if we then set $F$ to be a sufficiently rapidly growing function (depending on $\delta, \epsilon$), we obtain the claim from (8). This concludes the proof of Roth’s theorem.
Exercise 12 Use the energy increment method to establish a different proof of Exercise 7. (Hint: For the multiple recurrence step, use a pigeonhole principle argument rather than an appeal to equidistribution theory.)
We now briefly indicate how to translate the above arguments into the ultralimit setting. We first need to construct an important measure on limit sets, namely Loeb measure.
Exercise 13 (Construction of Loeb measure) Let $N$ be an unbounded natural number. Define the Loeb measure $\mu(A)$ of a limit subset $A$ of $[N]$ to be the quantity $\hbox{st}(|A|/N)$, thus for instance a set of cardinality $o(N)$ will have Loeb measure zero.
- Show that if a limit subset $A$ of $[N]$ is partitioned into countably many disjoint limit subsets $A_1, A_2, \ldots$, then all but finitely many of the $A_i$ are empty, and so $\mu(A) = \sum_{i=1}^\infty \mu(A_i)$.
- Define the outer measure $\mu^*(E)$ of a subset $E$ of $[N]$ (not necessarily a limit subset) to be the infimum of $\sum_i \mu(A_i)$, where $A_1, A_2, \ldots$ is a countable family of limit subsets of $[N]$ that cover $E$, and call a subset of $[N]$ null if it has zero outer measure. Call a subset Loeb measurable if it differs from a limit set by a null set. Show that there is a unique extension of Loeb measure $\mu$ from limit sets to Loeb measurable sets that is a countably additive probability measure on $[N]$. (Hint: use the Carathéodory extension theorem, see e.g. my 254B notes 0 or notes 0a.)
- If $f: [N] \rightarrow {\bf C}$ is a limit function bounded in magnitude by some standard real $M$, show that $\hbox{st}(f)$ is a Loeb measurable function in $L^\infty(\mu)$, with norm at most $M$.
- Show that there exists a unique trilinear form $\Lambda: L^\infty(\mu) \times L^\infty(\mu) \times L^\infty(\mu) \rightarrow {\bf C}$, jointly continuous in the $L^3(\mu)$ topology for all three inputs, such that

$\displaystyle \Lambda(\hbox{st}(f), \hbox{st}(g), \hbox{st}(h)) = \hbox{st}\left( {\bf E}_{n \in [N], r \in [-N,N]} f(n) g(n+r) h(n+2r) \right)$

for all bounded limit functions $f, g, h: [N] \rightarrow {\bf C}$ (extending these functions by zero outside of $[N]$).
- Show that Roth’s theorem is equivalent to the assertion that $\Lambda(f,f,f) > 0$ whenever $f \in L^\infty(\mu)$ is a bounded non-negative function with $\int_{[N]} f\ d\mu > 0$.
Next, we develop the ultralimit analogue of Fourier measurability, which we will rename Kronecker measurability due to the close analogy with the Kronecker factor in ergodic theory.
Exercise 14 (Construction of the Kronecker factor) Let $N$ be an unbounded natural number. We define a Fourier character to be a function in $L^\infty(\mu)$ of the form $n \mapsto \hbox{st}(e(\alpha n))$ for some limit real number $\alpha$. We define a trigonometric polynomial to be any finite linear combination (over the standard complex numbers) of Fourier characters. Let ${\mathcal Z}$ be the $\sigma$-algebra of Loeb measurable sets generated by the Fourier characters; we refer to ${\mathcal Z}$ as the Kronecker factor, and functions or sets measurable in this factor as Kronecker measurable functions and sets. Thus for instance all trigonometric polynomials are Kronecker measurable. We let $f \mapsto {\bf E}(f|{\mathcal Z})$ denote the orthogonal projection from $L^2(\mu)$ to $L^2({\mathcal Z})$, i.e. the conditional expectation to the Kronecker factor.
- Show that if $f \in L^\infty({\mathcal Z})$ is bounded in magnitude by $1$ and $\epsilon > 0$ is a standard real, then there exists a trigonometric polynomial $P$ which is also bounded in magnitude by $1$ and is within $\epsilon$ of $f$ in $L^1(\mu)$ norm.
- Show that if $f$ is as in the previous part and $\epsilon > 0$ is standard, then there exists a limit subset $R$ of $[-N,N]$ of cardinality $\gg N$ such that $\| f(\cdot+r) - f \|_{L^1(\mu)} \leq \epsilon$ for all $r \in R$ (extending $f$ by zero).
- Show that if $f \in L^\infty(\mu)$ is non-negative with $\int_{[N]} f\ d\mu > 0$, then $\Lambda({\bf E}(f|{\mathcal Z}), {\bf E}(f|{\mathcal Z}), {\bf E}(f|{\mathcal Z})) > 0$.
- Show that if $f_1, f_2, f_3 \in L^\infty(\mu)$ and ${\bf E}(f_i|{\mathcal Z}) = 0$ for at least one $i$, then $\Lambda(f_1, f_2, f_3) = 0$.
- Conclude the proof of Roth’s theorem using ultralimits.
Remark 3 Note how the (finitary) arithmetic regularity lemma has been replaced by the more familiar (infinitary) theory of conditional expectation to a factor, and the finitary notion of measurability has been replaced by a notion from the traditional (countably additive) infinitary theory of measurability. This is one of the key advantages of the ultralimit approach, namely that it allows one to exploit already established theories of infinitary mathematics (e.g. measure theory, ergodic theory, Hilbert space geometry, etc.) to prove a finitary result.
Exercise 15 Use the ultralimit energy increment method to establish yet another proof of Exercise 7.
— 3. More quantitative bounds (optional) —
The above proofs of Roth’s theorem (as formulated in, say, Theorem 2) were qualitative in the sense that they did not explicitly give a bound for in terms of . Nevertheless, by analysing the finitary arguments more carefully, a bound can be extracted:
Exercise 16 Show that in Proposition 5, one can take the progressions to have length $N' \gg \epsilon \sqrt{N}$. Using this and the density increment argument, show that Theorem 2 holds whenever $N \geq \exp(\exp(C/\delta))$ for a sufficiently large absolute constant $C$. (To put it another way, subsets of $[N]$ of density much larger than $1/\log\log N$ will contain progressions of length three.)
Exercise 17 Show that in the energy increment proof of Roth’s theorem, one can take the growth functions involved to be polynomial (but with the exponent growing exponentially with each refinement of the factor), and $F$ can be taken to be an iterated exponential; thus this argument ultimately allows one to take $N$ in Theorem 2 to be a tower exponential of height $O_\delta(1)$. (To put it another way, subsets of $[N]$ of density much larger than $1/(\log_* N)^c$ for some $c > 0$ will contain progressions of length three, where $\log_* N$ is the inverse tower function.) Thus we see that the energy increment argument, in the form presented here, provides much worse bounds than the density increment argument; but see below.
For the ultralimit arguments, it is significantly harder to extract a quantitative bound from the argument (basically one has to painstakingly “finitise” the argument first, essentially reaching the finitary counterparts of these arguments presented above). Thus we see that there is a tradeoff when using ultralimit analysis; the arguments become slightly cleaner (and one can deploy infinitary methods), but one tends to lose sight of what quantitative bounds the method establishes. (This is particularly the case if one begins to rely heavily on the axiom of choice (or on large cardinal axioms) once one takes ultralimits, although these axioms are not used in the examples above.)
It is possible to run the density increment argument more efficiently by combining it with some aspects of the energy increment argument. As described above, the density increment argument proceeds by locating a single large Fourier coefficient of $1_A - \delta$, and uses this to obtain a density increment on a relatively short subprogression of $[N]$ (of length comparable to $\sqrt{N}$, ignoring factors of $\delta$). One then has to iterate this about $1/\delta$ times before one obtains a truly significant density increment (e.g. from $\delta$ to $2\delta$). It is this repeated passage from $N$ to $\sqrt{N}$ which is ultimately responsible for the double exponential bound for $N$ at the end of the day.
In an unpublished work, Endre Szemerédi observed that one can run this argument more efficiently by collecting several large Fourier coefficients of $1_A - \delta$ simultaneously (somewhat in the spirit of the energy increment argument), and only then passing to a subprogression on which all of the relevant Fourier characters are simultaneously close to constant. The subprogression obtained is smaller as a consequence, but the density increment is much more substantial. Using this strategy, he was able to improve the original Roth bound of $\exp(\exp(O(1/\delta)))$ to the somewhat better $\exp(\exp(O(\log^2(1/\delta))))$ (or equivalently, he was able to establish length three progressions in any subset of $[N]$ of density much larger than $\exp(-c\sqrt{\log\log N})$). By carefully optimising the choice of threshold for selecting the “large Fourier coefficients”, Szemerédi (unpublished) and Heath-Brown independently improved this method further to obtain a bound of $\exp(\delta^{-O(1)})$, or equivalently obtaining length three progressions in sets of density much larger than $1/\log^c N$ for some absolute constant $c > 0$. (This result was extended to arbitrary finite abelian groups by Meshulam.)
The next advance was by Bourgain, who realised that rather than pass to short subprogressions, it was more efficient to work on the significantly larger (but messier) Bohr sets $\{ n : \|\alpha n\|_{{\bf R}/{\bf Z}} \leq \epsilon \}$, after ensuring that such Bohr sets were regular (this condition is closely related to the Fourier measurability condition used in the energy increment argument). With this modification to the original Roth argument, the bound was lowered to $\exp(\delta^{-2} \log^{O(1)}(1/\delta))$, or equivalently obtaining length three progressions in sets of density much larger than $\sqrt{\log \log N / \log N}$. Even more recently, this argument was combined with the Szemerédi-Heath-Brown argument by Bourgain (not yet published) to obtain the further improvement of $\exp(\delta^{-3/2} \log^{O(1)}(1/\delta))$, or equivalently obtaining length three progressions in sets of density much larger than $(\log\log N)^{O(1)} / (\log N)^{2/3}$. This is getting close to the $k=3$ case of an old conjecture of Erdős that asserts that any subset of the natural numbers whose sums of reciprocals diverge should have infinitely many arithmetic progressions of length $k$ for any $k$. To establish the $k=3$ case from quantitative versions of Roth’s theorem, one would basically need the ability to obtain progressions in sets of density roughly $1/\log N$.
On the other hand, there is an old counterexample of Behrend (based ultimately on the observation that a sphere in a high-dimensional lattice does not contain any arithmetic progressions of length three) which shows that the $N$ needed in Theorem 2 must be at least $\exp(c \log^2(1/\delta))$ (in particular, it must be super-polynomial in $1/\delta$); equivalently, it is known that there are subsets of $[N]$ of density about $\exp(-c\sqrt{\log N})$ with no arithmetic progressions of length three. For the sharpest results in this direction, see this paper of Elkin (and a later exposition of Green and Wolf).
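Behrend's construction is concrete enough to run in miniature. The sketch below (my own, with tiny parameters $n = 3$, $d = 5$) takes the points of $\{0,\ldots,d-1\}^n$ lying on the sphere of a best-chosen radius and reads them off as base-$2d$ digit strings; since all digit sums stay below $2d$ there are no carries, so an integer progression $a + c = 2b$ forces a coordinatewise identity $x + z = 2y$, and strict convexity of the sphere then forces $x = z$. A brute-force check confirms the resulting set is progression-free:

```python
from itertools import product

def behrend_set(n=3, d=5):
    """Points of {0,...,d-1}^n on a best common sphere, read as base-2d digits."""
    base = 2 * d
    best = []
    for radius2 in range(n * (d - 1) ** 2 + 1):
        shell = [x for x in product(range(d), repeat=n)
                 if sum(t * t for t in x) == radius2]
        if len(shell) > len(best):
            best = shell
    return sorted(sum(x[i] * base ** i for i in range(n)) for x in best)

A = behrend_set()
S = set(A)
# Brute-force check: no nondegenerate progression a, a+r, a+2r inside A.
has_ap = any(a + r in S and a + 2 * r in S
             for a in A for r in range(1, max(A)))
print(len(A), has_ap)
```

Scaling the parameters up (with $d$ roughly $\exp(\sqrt{\log N})$) is what produces the quantitative density $\exp(-c\sqrt{\log N})$ quoted above.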
The question of refining the bounds is an important one, as it tends to improve the technical understanding of current methods, as well as shed light on their relative strengths and weaknesses. However, this comes at the cost of making the arguments somewhat more technical, and so we shall not focus on the sharpest quantitative results in these notes.