Ben Green and I have (finally!) uploaded to the arXiv our paper “New bounds for Szemerédi’s theorem, III: A polylogarithmic bound for {r_4(N)}“, submitted to Mathematika. This is the sequel to two previous papers (and an erratum to the former paper), concerning quantitative versions of Szemerédi’s theorem in the case of length four progressions. This sequel has been delayed for over a decade for a number of reasons, but we have finally managed to write the arguments up to our satisfaction and submit it (to a special issue of Mathematika honouring the work of Klaus Roth).

For any natural number {N}, define {r_4(N)} to be the largest cardinality of a subset {A} of {[N] = \{1,\dots,N\}} which does not contain any non-trivial arithmetic progressions {a, a+r, a+2r, a+3r} of length four (where “non-trivial” means that {r} is non-zero). Trivially we have {r_4(N) \leq N}. In 1969, Szemerédi showed that {r_4(N) = o(N)}. However, the decay rate that could be theoretically extracted from this argument (and from several subsequent proofs of this bound, including one by Roth) were quite poor. The first significant quantitative bound on this quantity was by Gowers, who showed that {r_4(N) \ll N (\log \log N)^{-c}} for some absolute constant {c>0}. In the second paper in the above-mentioned series, we managed to improve this bound to {r_4(N) \ll N \exp( - c \sqrt{\log \log N})}. In this paper, we improve the bound further to {r_4(N) \ll N (\log N)^{-c}}, which seems to be the limit of the methods. (We remark that if we could take {c} to be larger than one, this would imply the length four case of a well known conjecture of Erdös that any set of natural numbers whose sum of reciprocals diverges would contain arbitrarily long arithmetic progressions. Thanks to the work of Sanders and of Bloom, the corresponding case of the conjecture for length three conjectures is nearly settled, as it is known that for the analogous bound on {r_3(N)} one can take any {c} less than one.)

Most of the previous work on bounding {r_4(N)} relied in some form or another on the density increment argument introduced by Roth back in 1953; roughly speaking, the idea is to show that if a dense subset {A} of {[N]} fails to contain arithmetic progressions of length four, one seeks to then locate a long subprogression of {[N]} in which {A} has increased density. This was the basic method for instance underlying our previous bound {r_4(N) \ll N \exp( - c \sqrt{\log \log N})}, as well as a finite field analogue of the bound {r_4(N) \ll N (\log N)^{-c}}; however we encountered significant technical difficulties for several years in extending this argument to obtain the result of the current paper. Our method is instead based on “energy increment arguments”, and more specifically on establishing quantitative version of a Khintchine-type recurrence theorem, similar to the qualitative recurrence theorems established (in the ergodic theory context) by Bergelson-Host-Kra, and (in the current combinatorial context) by Ben Green and myself.

One way to phrase the latter recurrence theorem is as follows. Suppose that {A \subset [N]} has density {\delta}. Then one would expect a “randomly” selected arithmetic progression {{\bf a}, {\bf a}+{\bf r}, {\bf a}+2{\bf r}, {\bf a}+3{\bf r}} in {[N]} (using the convention that random variables will be in boldface) to be contained in {A} with probability about {\delta^4}. This is not true in general, however it was shown by Ben and myself that for any {\eta>0}, there was a set of shifts {r \in [-N,N]} of cardinality {\gg_{\delta,\eta} N}, such that for any such {r} one had

\displaystyle {\bf P}( {\bf a}, {\bf a}+r, {\bf a}+2r, {\bf a}+3r \in A ) \geq \delta^4 - \eta

if {{\bf a}} was chosen uniformly at random from {[N]}. This easily implies that {r_4(N) = o(N)}, but does not give a particularly good bound on the decay rate, because the implied constant in the cardinality lower bound {\gg_{\delta,\eta} N} is quite poor (in fact of tower-exponential type, due to the use of regularity lemmas!), and so one has to take {N} to be extremely large compared to {\delta,\eta} to avoid the possibility that the set of shifts in the above theorem consists only of the trivial shift {r=0}.

We do not know how to improve the lower bound on the set of shifts to the point where it can give bounds that are competitive with those in this paper. However, we can obtain better quantitative results if we permit ourselves to couple together the two parameters {{\bf a}} and {{\bf r}} of the length four progression. Namely, with {A}, {\delta}, {\eta} as above, we are able to show that there exist random variables {{\bf a}, {\bf r}}, not necessarily independent, such that

\displaystyle {\bf P}( {\bf a}, {\bf a}+{\bf r}, {\bf a}+2{\bf r}, {\bf a}+3{\bf r} \in A ) \geq \delta^4 - \eta \ \ \ \ \ (1)

and such that we have the non-degeneracy bound

\displaystyle {\bf P}( {\bf r} = 0 ) \ll \exp( \eta^{-O(1)} ) / N.

This then easily implies the main theorem.

The energy increment method is then deployed to locate a good pair {({\bf a}, {\bf r})} of random variables that will obey the above bounds. One can get some intuition on how to proceed here by considering some model cases. Firstly one can consider a “globally quadratically structured” case in which the indicator function {1_A} “behaves like” a globally quadratic function such as {F( \alpha n^2 )}, for some irrational {\alpha} and some smooth periodic function {F: {\bf R}/{\bf Z} \rightarrow {\bf R}} of mean {\delta}. If one then takes {{\bf a}, {\bf r}} to be uniformly distributed in {[N]} and {[-\varepsilon N, \varepsilon N]} respectively for some small {\varepsilon>0}, with no coupling between the two variables, then the left-hand side of (1) is approximately of the form

\displaystyle \int_{(x,y,z,w) \in ({\bf R}/{\bf Z})^4: x-3y+3z-w = 0} F(x) F(y) F(z) F(w) \ \ \ \ \ (2)

where the integral is with respect to the probability Haar measure, and the constraint {x-3y+3z-w=0} ultimately arises from the algebraic constraint

\displaystyle \alpha {\bf a}^2 - 3 \alpha ({\bf a}+{\bf r})^2 + 3 \alpha ({\bf a}+2{\bf r})^2 - \alpha ({\bf a}+3{\bf r})^2 = 0.

However, an application of the Cauchy-Schwarz inequality and Fubini’s theorem shows that the integral in (2) is at least {(\int_{{\bf R}/{\bf Z}} F)^4}, which (morally at least) gives (1) in this case.

Due to the nature of the energy increment argument, it also becomes necessary to consider “locally quadratically structured” cases, in which {[N]} is partitioned into some number of structured pieces {B_c} (think of these as arithmetic progressions, or as “Bohr sets), and on each piece {B_c}, {1_A} behaves like a locally quadratic function such as {F_c( \alpha_c n^2 )}, where {\alpha_c} now varies with {c}, and the mean of {F_c} will be approximately {\delta} on the average after averaging in {c} (weighted by the size of the pieces {B_c}). Now one should select {{\bf a}} and {{\bf r}} in the following coupled manner: first one chooses {{\bf a}} uniformly from {[N]}, then one defines {{\bf c}} to be the label {c} such that {{\bf a} \in B_c}, and then selects {{\bf r}} uniformly from a set {B_{c,\varepsilon}} which is related to {B_c} in much the same way that {[-\varepsilon N, \varepsilon N]} is related to {[N]}. If one does this correctly, the analogue of (2) becomes

\displaystyle {\bf E} \int_{(x,y,z,w) \in ({\bf R}/{\bf Z})^4: x-3y+3z-w = 0} F_{\mathbf c}(x) F_{\mathbf c}(y) F_{\mathbf c}(z) F_{\mathbf c}(w),

and one can again use Cauchy-Schwarz and Fubini’s theorem to conclude.

The general case proceeds, very roughly, by an iterative argument. At each stage of the iteration, one has some sort of quadratic model of {1_A} which involves a decomposition of {[N]} into structured pieces {B_c}, and a quadratic approximation to {1_A} on each piece. If this approximation is accurate enough (or more precisely, if a certain (averaged) local Gowers uniformity norm {U^3} of the error is small enough) to model the count in (1) (for random variables {{\bf a}, {\bf r}} determined by the above partition of {[N]} into pieces {B_c}), and if the frequencies (such as {\alpha_c}) involved in the quadratic approximation are “high rank” or “linearly independent over the rationals” in a suitably quantitative sense, then some version of the above arguments can be made to work. If there are some unwanted linear dependencies in the frequencies, we can do some linear algebra to eliminate one of the frequencies (using some geometry of numbers to keep the quantitative bounds under control) and continue the iteration. If instead the approximation is too inaccurate, then the error will be large in a certain averaged local Gowers uniformity norm {U^3}. A significant fraction of the paper is then devoted to establishing a quantitative inverse theorem for that norm that concludes (with good bounds) that the error must then locally correlate with locally quadratic phases, which can be used to refine the quadratic approximation to {1_A} in a manner that significantly increases its “energy” (basically an {L^2} norm). Such energy increments cannot continue indefinitely, and when they terminate we obtain the desired claim.

There are existing inverse theorems for {U^3} type norms in the literature, going back to the pioneering work of Gowers mentioned previously, and relying on arithmetic combinatorics tools such as Freiman’s theorem and the Balog-Szemerédi-Gowers lemma, which are good for analysing the “{1\%}-structured homomorphisms” that arise in Gowers’ argument. However, when we applied these methods to the local Gowers norms we obtained inferior quantitative results that were not strong enough for our application. Instead, we use arguments from a different paper of Gowers in which he tackled Szemerédi’s theorem for arbitrary length progressions. This method produces “{99\%}-structured homomorphisms” associated to any function with large Gowers uniformity norm; however the catch is that such homomorphisms are initially supported only on a sparse unstructured set, rather than a structured set such as a Bohr set. To proceed further, one first has to locate inside the sparse unstructured set a sparse pseudorandom subset of a Bohr set, and then use “error-correction” type methods (such as “majority-vote” based algorithms) to locally upgrade this {99\%}-structured homomorphism on pseudorandom subsets of Bohr sets to a {100\%}-structured homomorphism on the entirety of a Bohr set. It is then possible to use some “approximate cohomology” tools to “integrate” these homomorphisms (and discern a key “local symmetry” property of these homomorphisms) to locate the desired local quadratic structure (in much the same fashion that a {1}-form on {{\bf R}^n} that varies linearly with the coordinates can be integrated to be the derivative of a quadratic function if we know that the {1}-form is closed). These portions of the paper are unfortunately rather technical, but broadly follow the methods already used in previous literature.