
Ben Green and I have (finally!) uploaded to the arXiv our paper “New bounds for Szemerédi’s theorem, III: A polylogarithmic bound for {r_4(N)}“, submitted to Mathematika. This is the sequel to two previous papers (and an erratum to the former paper), concerning quantitative versions of Szemerédi’s theorem in the case of length four progressions. This sequel has been delayed for over a decade for a number of reasons, but we have finally managed to write the arguments up to our satisfaction and submit it (to a special issue of Mathematika honouring the work of Klaus Roth).

For any natural number {N}, define {r_4(N)} to be the largest cardinality of a subset {A} of {[N] = \{1,\dots,N\}} which does not contain any non-trivial arithmetic progressions {a, a+r, a+2r, a+3r} of length four (where “non-trivial” means that {r} is non-zero). Trivially we have {r_4(N) \leq N}. In 1969, Szemerédi showed that {r_4(N) = o(N)}. However, the decay rate that could theoretically be extracted from this argument (and from several subsequent proofs of this bound, including one by Roth) was quite poor. The first significant quantitative bound on this quantity was by Gowers, who showed that {r_4(N) \ll N (\log \log N)^{-c}} for some absolute constant {c>0}. In the second paper in the above-mentioned series, we managed to improve this bound to {r_4(N) \ll N \exp( - c \sqrt{\log \log N})}. In this paper, we improve the bound further to {r_4(N) \ll N (\log N)^{-c}}, which seems to be the limit of the methods. (We remark that if we could take {c} to be larger than one, this would imply the length four case of a well known conjecture of Erdös that any set of natural numbers whose sum of reciprocals diverges contains arbitrarily long arithmetic progressions. Thanks to the work of Sanders and of Bloom, the corresponding case of the conjecture for progressions of length three is nearly settled, as it is known that for the analogous bound on {r_3(N)} one can take any {c} less than one.)
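As a quick illustration of the quantity being bounded (this plays no role in the paper's arguments, and the brute-force search below, with function names of my own choosing, is only feasible for very small {N}):

```python
from itertools import combinations

def has_4ap(s):
    """Return True if s contains a non-trivial 4-term progression a, a+r, a+2r, a+3r (r != 0)."""
    s = set(s)
    return any(a + 2 * r in s and a + 3 * r in s
               for a in s for b in s
               for r in [b - a] if r > 0)

def r4(N):
    """r_4(N): largest cardinality of a subset of [N] = {1,...,N} with no non-trivial 4-term AP."""
    for size in range(N, 0, -1):
        # Search subsets in decreasing order of size; the first AP-free one is maximal.
        if any(not has_4ap(c) for c in combinations(range(1, N + 1), size)):
            return size
    return 0

print([r4(N) for N in range(1, 9)])  # [1, 2, 3, 3, 4, 5, 5, 6]
```

Already at this scale one sees that {r_4(N)} lags only slightly behind {N}; the theorems discussed above concern the asymptotic rate of decay of {r_4(N)/N}.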

Most of the previous work on bounding {r_4(N)} relied in some form or another on the density increment argument introduced by Roth back in 1953; roughly speaking, if a dense subset {A} of {[N]} fails to contain arithmetic progressions of length four, one seeks to locate a long subprogression of {[N]} on which {A} has significantly increased density. This was the basic method underlying, for instance, our previous bound {r_4(N) \ll N \exp( - c \sqrt{\log \log N})}, as well as a finite field analogue of the bound {r_4(N) \ll N (\log N)^{-c}}; however, we encountered significant technical difficulties for several years in extending this argument to obtain the result of the current paper. Our method is instead based on “energy increment arguments”, and more specifically on establishing a quantitative version of a Khintchine-type recurrence theorem, similar to the qualitative recurrence theorems established (in the ergodic theory context) by Bergelson-Host-Kra, and (in the current combinatorial context) by Ben Green and myself.

One way to phrase the latter recurrence theorem is as follows. Suppose that {A \subset [N]} has density {\delta}. Then one would expect a “randomly” selected arithmetic progression {{\bf a}, {\bf a}+{\bf r}, {\bf a}+2{\bf r}, {\bf a}+3{\bf r}} in {[N]} (using the convention that random variables will be in boldface) to be contained in {A} with probability about {\delta^4}. This is not true in general, however it was shown by Ben and myself that for any {\eta>0}, there was a set of shifts {r \in [-N,N]} of cardinality {\gg_{\delta,\eta} N}, such that for any such {r} one had

\displaystyle {\bf P}( {\bf a}, {\bf a}+r, {\bf a}+2r, {\bf a}+3r \in A ) \geq \delta^4 - \eta

if {{\bf a}} was chosen uniformly at random from {[N]}. This easily implies that {r_4(N) = o(N)}, but does not give a particularly good bound on the decay rate, because the implied constant in the cardinality lower bound {\gg_{\delta,\eta} N} is quite poor (in fact of tower-exponential type, due to the use of regularity lemmas!), and so one has to take {N} to be extremely large compared to {\delta,\eta} to avoid the possibility that the set of shifts in the above theorem consists only of the trivial shift {r=0}.

We do not know how to improve the lower bound on the set of shifts to the point where it can give bounds that are competitive with those in this paper. However, we can obtain better quantitative results if we permit ourselves to couple together the two parameters {{\bf a}} and {{\bf r}} of the length four progression. Namely, with {A}, {\delta}, {\eta} as above, we are able to show that there exist random variables {{\bf a}, {\bf r}}, not necessarily independent, such that

\displaystyle {\bf P}( {\bf a}, {\bf a}+{\bf r}, {\bf a}+2{\bf r}, {\bf a}+3{\bf r} \in A ) \geq \delta^4 - \eta \ \ \ \ \ (1)

and such that we have the non-degeneracy bound

\displaystyle {\bf P}( {\bf r} = 0 ) \ll \exp( - \eta^{-O(1)} ) / N.

This then easily implies the main theorem.
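To spell out that (standard) deduction, with the constants merely indicative:

```latex
% If A has no non-trivial four-term progressions, the event in (1) can only occur
% when r = 0, and hence
\delta^4 - \eta \leq {\bf P}( {\bf a}, {\bf a}+{\bf r}, {\bf a}+2{\bf r}, {\bf a}+3{\bf r} \in A )
  \leq {\bf P}( {\bf r} = 0 ) \ll \exp( - \eta^{-O(1)} ) / N.
% Choosing \eta := \delta^4/2, this forces
N \ll \delta^{-4} \exp( \delta^{-O(1)} ) \leq \exp( \delta^{-C} )
% for some constant C.  In other words, a progression-free set of density \delta can
% only exist for N up to \exp( \delta^{-C} ); equivalently \delta \ll (\log N)^{-1/C},
% which is the bound r_4(N) \ll N (\log N)^{-c}.
```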

The energy increment method is then deployed to locate a good pair {({\bf a}, {\bf r})} of random variables that will obey the above bounds. One can get some intuition on how to proceed here by considering some model cases. Firstly one can consider a “globally quadratically structured” case in which the indicator function {1_A} “behaves like” a globally quadratic function such as {F( \alpha n^2 )}, for some irrational {\alpha} and some smooth periodic function {F: {\bf R}/{\bf Z} \rightarrow {\bf R}} of mean {\delta}. If one then takes {{\bf a}, {\bf r}} to be uniformly distributed in {[N]} and {[-\varepsilon N, \varepsilon N]} respectively for some small {\varepsilon>0}, with no coupling between the two variables, then the left-hand side of (1) is approximately of the form

\displaystyle \int_{(x,y,z,w) \in ({\bf R}/{\bf Z})^4: x-3y+3z-w = 0} F(x) F(y) F(z) F(w) \ \ \ \ \ (2)

where the integral is with respect to the probability Haar measure, and the constraint {x-3y+3z-w=0} ultimately arises from the algebraic constraint

\displaystyle \alpha {\bf a}^2 - 3 \alpha ({\bf a}+{\bf r})^2 + 3 \alpha ({\bf a}+2{\bf r})^2 - \alpha ({\bf a}+3{\bf r})^2 = 0.

However, an application of the Cauchy-Schwarz inequality and Fubini’s theorem shows that the integral in (2) is at least {(\int_{{\bf R}/{\bf Z}} F)^4}, which (morally at least) gives (1) in this case.
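One concrete way to see this positivity is via Fourier expansion; this is a standard computation giving the same conclusion as the Cauchy-Schwarz/Fubini argument, though it is only a sketch and not the argument as organised in the paper:

```latex
% Expand F(x) = \sum_n \hat F(n) e(nx).  The characters of ({\bf R}/{\bf Z})^4 that are
% trivial on the subgroup \{ x - 3y + 3z - w = 0 \} are exactly the integer multiples
% of (1,-3,3,-1), so the integral (2) collapses to a single sum:
\int_{x-3y+3z-w=0} F(x) F(y) F(z) F(w)
  = \sum_{m \in {\bf Z}} \hat F(m)\, \hat F(-3m)\, \hat F(3m)\, \hat F(-m)
  = \sum_{m \in {\bf Z}} |\hat F(m)|^2\, |\hat F(3m)|^2,
% using \hat F(-m) = \overline{\hat F(m)} for real-valued F.  Every summand is
% non-negative, and the m = 0 term alone is |\hat F(0)|^4 = (\int_{{\bf R}/{\bf Z}} F)^4.
```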

Due to the nature of the energy increment argument, it also becomes necessary to consider “locally quadratically structured” cases, in which {[N]} is partitioned into some number of structured pieces {B_c} (think of these as arithmetic progressions, or as “Bohr sets”), and on each piece {B_c}, {1_A} behaves like a locally quadratic function such as {F_c( \alpha_c n^2 )}, where {\alpha_c} now varies with {c}, and the mean of {F_c} will be approximately {\delta} on the average after averaging in {c} (weighted by the size of the pieces {B_c}). Now one should select {{\bf a}} and {{\bf r}} in the following coupled manner: first one chooses {{\bf a}} uniformly from {[N]}, then one defines {{\bf c}} to be the label {c} such that {{\bf a} \in B_c}, and then selects {{\bf r}} uniformly from a set {B_{c,\varepsilon}} which is related to {B_c} in much the same way that {[-\varepsilon N, \varepsilon N]} is related to {[N]}. If one does this correctly, the analogue of (2) becomes

\displaystyle {\bf E} \int_{(x,y,z,w) \in ({\bf R}/{\bf Z})^4: x-3y+3z-w = 0} F_{\mathbf c}(x) F_{\mathbf c}(y) F_{\mathbf c}(z) F_{\mathbf c}(w),

and one can again use Cauchy-Schwarz and Fubini’s theorem to conclude.

The general case proceeds, very roughly, by an iterative argument. At each stage of the iteration, one has some sort of quadratic model of {1_A} which involves a decomposition of {[N]} into structured pieces {B_c}, and a quadratic approximation to {1_A} on each piece. If this approximation is accurate enough (or more precisely, if a certain (averaged) local Gowers uniformity norm {U^3} of the error is small enough) to model the count in (1) (for random variables {{\bf a}, {\bf r}} determined by the above partition of {[N]} into pieces {B_c}), and if the frequencies (such as {\alpha_c}) involved in the quadratic approximation are “high rank” or “linearly independent over the rationals” in a suitably quantitative sense, then some version of the above arguments can be made to work. If there are some unwanted linear dependencies in the frequencies, we can do some linear algebra to eliminate one of the frequencies (using some geometry of numbers to keep the quantitative bounds under control) and continue the iteration. If instead the approximation is too inaccurate, then the error will be large in a certain averaged local Gowers uniformity norm {U^3}. A significant fraction of the paper is then devoted to establishing a quantitative inverse theorem for that norm that concludes (with good bounds) that the error must then locally correlate with locally quadratic phases, which can be used to refine the quadratic approximation to {1_A} in a manner that significantly increases its “energy” (basically an {L^2} norm). Such energy increments cannot continue indefinitely, and when they terminate we obtain the desired claim.

There are existing inverse theorems for {U^3} type norms in the literature, going back to the pioneering work of Gowers mentioned previously, and relying on arithmetic combinatorics tools such as Freiman’s theorem and the Balog-Szemerédi-Gowers lemma, which are good for analysing the “{1\%}-structured homomorphisms” that arise in Gowers’ argument. However, when we applied these methods to the local Gowers norms we obtained inferior quantitative results that were not strong enough for our application. Instead, we use arguments from a different paper of Gowers in which he tackled Szemerédi’s theorem for arbitrary length progressions. This method produces “{99\%}-structured homomorphisms” associated to any function with large Gowers uniformity norm; however the catch is that such homomorphisms are initially supported only on a sparse unstructured set, rather than a structured set such as a Bohr set. To proceed further, one first has to locate inside the sparse unstructured set a sparse pseudorandom subset of a Bohr set, and then use “error-correction” type methods (such as “majority-vote” based algorithms) to locally upgrade this {99\%}-structured homomorphism on pseudorandom subsets of Bohr sets to a {100\%}-structured homomorphism on the entirety of a Bohr set. It is then possible to use some “approximate cohomology” tools to “integrate” these homomorphisms (and discern a key “local symmetry” property of these homomorphisms) to locate the desired local quadratic structure (in much the same fashion that a {1}-form on {{\bf R}^n} that varies linearly with the coordinates can be integrated to be the derivative of a quadratic function if we know that the {1}-form is closed). These portions of the paper are unfortunately rather technical, but broadly follow the methods already used in previous literature.

Szemerédi’s theorem asserts that any subset of the integers of positive upper density contains arbitrarily large arithmetic progressions. Here is an equivalent quantitative form of this theorem:

Theorem 1 (Szemerédi’s theorem) Let {N} be a positive integer, and let {f: {\bf Z}/N{\bf Z} \rightarrow [0,1]} be a function with {{\bf E}_{x \in {\bf Z}/N{\bf Z}} f(x) \geq \delta} for some {\delta>0}, where we use the averaging notation {{\bf E}_{x \in A} f(x) := \frac{1}{|A|} \sum_{x \in A} f(x)}, {{\bf E}_{x,r \in A} f(x) := \frac{1}{|A|^2} \sum_{x, r \in A} f(x)}, etc. Then for {k \geq 3} we have

\displaystyle  {\bf E}_{x,r \in {\bf Z}/N{\bf Z}} f(x) f(x+r) \dots f(x+(k-1)r) \geq c(k,\delta)

for some {c(k,\delta)>0} depending only on {k,\delta}.

The equivalence is basically thanks to an averaging argument of Varnavides; see for instance Chapter 11 of my book with Van Vu or this previous blog post for a discussion. We have removed the cases {k=1,2} as they are trivial and somewhat degenerate.
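For readers who have not seen the Varnavides argument, here is a compressed sketch of the direction that upgrades the qualitative theorem to Theorem 1 (the details may be found in the references just cited):

```latex
% Suppose the qualitative theorem supplies some M = M(k,\delta) such that every subset
% of [M] of density at least \delta/2 contains a k-term progression.  Given f as in
% Theorem 1, the set A := \{ x : f(x) \geq \delta/2 \} has |A| \geq (\delta/2) N, since
% f \leq 1.  A positive proportion (depending on k and \delta) of the \asymp N^2
% progressions P of length M in {\bf Z}/N{\bf Z} satisfy |A \cap P| \geq (\delta/2)|P|,
% and each such P contains a k-term progression inside A.  As a fixed k-term progression
% lies in only O_{k,M}(1) progressions of length M, the set A contains
% \gg_{k,\delta} N^2 progressions of length k; since f \geq \delta/2 on A, each
% contributes at least (\delta/2)^k / N^2 to the average in Theorem 1, giving
% c(k,\delta) \gg_{k,\delta} (\delta/2)^k.
```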

There are now many proofs of this theorem. Some time ago, I took an ergodic-theoretic proof of Furstenberg and converted it to a purely finitary proof of the theorem. The argument used some simplifying innovations that had been developed since the original work of Furstenberg (in particular, deployment of the Gowers uniformity norms, as well as a “dual” norm that I called the uniformly almost periodic norm, and an emphasis on van der Waerden’s theorem for handling the “compact extension” component of the argument). But the proof was still quite messy. However, as discussed in this previous blog post, messy finitary proofs can often be cleaned up using nonstandard analysis. Thus, there should be a nonstandard version of the Furstenberg ergodic theory argument that is relatively clean. I decided (after some encouragement from Ben Green and Isaac Goldbring) to write down most of the details of this argument in this blog post, though for sake of brevity I will skim rather quickly over arguments that were already discussed at length in other blog posts. In particular, I will presume familiarity with nonstandard analysis (in particular, the notion of a standard part of a bounded real number, and the Loeb measure construction), see for instance this previous blog post for a discussion.


Tamar Ziegler and I have just uploaded to the arXiv our joint paper “A multi-dimensional Szemerédi theorem for the primes via a correspondence principle“. This paper is related to an earlier result of Ben Green and mine in which we established that the primes contain arbitrarily long arithmetic progressions. Actually, in that paper we proved a more general result:

Theorem 1 (Szemerédi’s theorem in the primes) Let {A} be a subset of the primes {{\mathcal P}} of positive relative density, thus {\limsup_{N \rightarrow \infty} \frac{|A \cap [N]|}{|{\mathcal P} \cap [N]|} > 0}. Then {A} contains arbitrarily long arithmetic progressions.

This result was based in part on an earlier paper of Green that handled the case of progressions of length three. With the primes replaced by the integers, this is of course the famous theorem of Szemerédi.

Szemerédi’s theorem has now been generalised in many different directions. One of these is the multidimensional Szemerédi theorem of Furstenberg and Katznelson, who used ergodic-theoretic techniques to show that any dense subset of {{\bf Z}^d} necessarily contained infinitely many constellations of any prescribed shape. Our main result is to relativise that theorem to the primes as well:

Theorem 2 (Multidimensional Szemerédi theorem in the primes) Let {d \geq 1}, and let {A} be a subset of the {d^{th}} Cartesian power {{\mathcal P}^d} of the primes of positive relative density, thus

\displaystyle  \limsup_{N \rightarrow \infty} \frac{|A \cap [N]^d|}{|{\mathcal P}^d \cap [N]^d|} > 0.

Then for any {v_1,\ldots,v_k \in {\bf Z}^d}, {A} contains infinitely many “constellations” of the form {a+r v_1, \ldots, a + rv_k} with {a \in {\bf Z}^d} and {r} a positive integer.

In the case when {A} is itself a Cartesian product of one-dimensional sets (in particular, if {A} is all of {{\mathcal P}^d}), this result already follows from Theorem 1, but there does not seem to be a similarly easy argument to deduce the general case of Theorem 2 from previous results. Simultaneously with this paper, an independent proof of Theorem 2 using a somewhat different method has been established by Cook, Magyar, and Titichetrakun.

The result is reminiscent of an earlier result of mine on finding constellations in the Gaussian primes (or dense subsets thereof). That paper followed closely the arguments of my original paper with Ben Green, namely it first enclosed (a W-tricked version of) the primes or Gaussian primes (in a sieve-theoretic sense) by a slightly larger set (or more precisely, a weight function {\nu}) of almost primes or almost Gaussian primes, which one could then verify (using methods closely related to the sieve-theoretic methods in the ongoing Polymath8 project) to obey certain pseudorandomness conditions, known as the linear forms condition and the correlation condition. Very roughly speaking, these conditions assert statements of the following form: if {n} is a randomly selected integer, then the events of {n+h_1,\ldots,n+h_k} simultaneously being an almost prime (or almost Gaussian prime) are approximately independent for most choices of {h_1,\ldots,h_k}. Once these conditions are satisfied, one can then run a transference argument (initially based on ergodic-theory methods, but nowadays there are simpler transference results based on the Hahn-Banach theorem, due to Gowers and Reingold-Trevisan-Tulsiani-Vadhan) to obtain relative Szemerédi-type theorems from their absolute counterparts.

However, when one tries to adapt these arguments to sets such as {{\mathcal P}^2}, a new difficulty occurs: the natural analogue of the almost primes would be the Cartesian square {{\mathcal A}^2} of the almost primes – pairs {(n,m)} whose entries are both almost primes. (Actually, for technical reasons, one does not work directly with a set of almost primes, but would instead work with a weight function such as {\nu(n) \nu(m)} that is concentrated on a set such as {{\mathcal A}^2}, but let me ignore this distinction for now.) However, this set {{\mathcal A}^2} does not enjoy as many pseudorandomness conditions as one would need for a direct application of the transference strategy to work. More specifically, given any fixed {h, k}, and random {(n,m)}, the four events

\displaystyle  (n,m) \in {\mathcal A}^2

\displaystyle  (n+h,m) \in {\mathcal A}^2

\displaystyle  (n,m+k) \in {\mathcal A}^2

\displaystyle  (n+h,m+k) \in {\mathcal A}^2

do not behave independently (as they would if {{\mathcal A}^2} were replaced for instance by the Gaussian almost primes), because any three of these events imply the fourth. This blocks the transference strategy for constellations which contain some right angles (e.g. constellations of the form {(n,m), (n+r,m), (n,m+r)}), as such constellations soon turn into rectangles such as the one above after applying Cauchy-Schwarz a few times. (But a few years ago, Cook and Magyar showed that if one restricted attention to constellations which were in general position in the sense that any coordinate hyperplane contained at most one element in the constellation, then this obstruction does not occur and one can establish Theorem 2 in this case through the transference argument.) It’s worth noting that very recently, Conlon, Fox, and Zhao have succeeded in removing one of the pseudorandomness conditions (namely the correlation condition) from the transference principle, leaving only the linear forms condition as the remaining pseudorandomness condition to be verified, but unfortunately this does not completely solve the above problem because the linear forms condition also fails for {{\mathcal A}^2} (or for weights concentrated on {{\mathcal A}^2}) when applied to rectangular patterns.
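To spell out the dependence among the four events displayed above: membership in {{\mathcal A}^2} is determined coordinate-by-coordinate, so for instance

```latex
% The first three events assert precisely that
(n,m),\ (n+h,m),\ (n,m+k) \in {\mathcal A}^2
  \iff n,\ n+h \in {\mathcal A} \hbox{ and } m,\ m+k \in {\mathcal A},
% and the right-hand side already yields the fourth event:
n+h \in {\mathcal A} \hbox{ and } m+k \in {\mathcal A} \implies (n+h, m+k) \in {\mathcal A}^2.
% No analogous implication holds for a genuinely two-dimensional set such as the
% Gaussian almost primes; the obstruction is special to Cartesian products.
```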

There are now two ways known to get around this problem and establish Theorem 2 in full generality. The approach of Cook, Magyar, and Titichetrakun proceeds by starting with one of the known proofs of the multidimensional Szemerédi theorem – namely, the proof that proceeds through hypergraph regularity and hypergraph removal – and attaching pseudorandom weights directly within the proof itself, rather than trying to add the weights to the result of that proof through a transference argument. (A key technical issue is that weights have to be added to all the levels of the hypergraph – not just the vertices and top-order edges – in order to circumvent the failure of naive pseudorandomness.) As one has to modify the entire proof of the multidimensional Szemerédi theorem, rather than use that theorem as a black box, the Cook-Magyar-Titichetrakun argument is lengthier than ours; on the other hand, it is more general and does not rely on some difficult theorems about primes that are used in our paper.

In our approach, we continue to use the multidimensional Szemerédi theorem (or more precisely, the equivalent theorem of Furstenberg and Katznelson concerning multiple recurrence for commuting shifts) as a black box. The difference is that instead of using a transference principle to connect the relative multidimensional Szemerédi theorem we need to the multiple recurrence theorem, we instead proceed by a version of the Furstenberg correspondence principle, similar to the one that connects the absolute multidimensional Szemerédi theorem to the multiple recurrence theorem. I had discovered this approach many years ago in an unpublished note, but had abandoned it because it required an infinite number of linear forms conditions (in contrast to the transference technique, which only needed a finite number of linear forms conditions and (until the recent work of Conlon-Fox-Zhao) a correlation condition). The reason for this infinite number of conditions is that the correspondence principle has to build a probability measure on an entire {\sigma}-algebra; for this, it is not enough to specify the measure {\mu(A)} of a single set such as {A}, but one also has to specify the measure {\mu( T^{n_1} A \cap \ldots \cap T^{n_m} A)} of “cylinder sets” such as {T^{n_1} A \cap \ldots \cap T^{n_m} A} where {m} could be arbitrarily large. The larger {m} gets, the more linear forms conditions one needs to keep the correspondence under control.

With the sieve weights {\nu} we were using at the time, standard sieve theory methods could indeed provide a finite number of linear forms conditions, but not an infinite number, so my idea was abandoned. However, with my later work with Green and Ziegler on linear equations in primes (and related work on the Mobius-nilsequences conjecture and the inverse conjecture on the Gowers norm), Tamar and I realised that the primes themselves obey an infinite number of linear forms conditions, so one can basically use the primes (or a proxy for the primes, such as the von Mangoldt function {\Lambda}) as the enveloping sieve weight, rather than a classical sieve. Thus my old idea of using the Furstenberg correspondence principle to transfer Szemerédi-type theorems to the primes could actually be realised. In the one-dimensional case, this simply produces a much more complicated proof of Theorem 1 than the existing one; but it turns out that the argument works as well in higher dimensions and yields Theorem 2 relatively painlessly, except for the fact that it needs the results on linear equations in primes, the known proofs of which are extremely lengthy (and also require some of the transference machinery mentioned earlier). The problem of correlations in rectangles is avoided in the correspondence principle approach because one can compensate for such correlations by performing a suitable weighted limit to compute the measure {\mu( T^{n_1} A \cap \ldots \cap T^{n_m} A)} of cylinder sets, with each {m} requiring a different weighted correction. (This may be related to the Cook-Magyar-Titichetrakun strategy of weighting all of the facets of the hypergraph in order to recover pseudorandomness, although our contexts are rather different.)

Ben Green and I have just uploaded to the arXiv our paper “New bounds for Szemeredi’s theorem, Ia: Progressions of length 4 in finite field geometries revisited“, submitted to Proc. Lond. Math. Soc.. This is both an erratum to, and a replacement for, our previous paper “New bounds for Szemeredi’s theorem. I. Progressions of length 4 in finite field geometries“. The main objective in both papers is to bound the quantity {r_4(F^n)} for a vector space {F^n} over a finite field {F} of characteristic greater than {4}, where {r_4(F^n)} is defined as the cardinality of the largest subset of {F^n} that does not contain an arithmetic progression of length {4}. In our earlier paper, we gave two arguments that bounded {r_4(F^n)} in the regime when the field {F} was fixed and {n} was large. The first “cheap” argument gave the bound

\displaystyle  r_4(F^n) \ll |F|^n \exp( - c \sqrt{\log n} )

and the more complicated “expensive” argument gave the improvement

\displaystyle  r_4(F^n) \ll |F|^n n^{-c} \ \ \ \ \ (1)

for some constant {c>0} depending only on {F}.

Unfortunately, while the cheap argument is correct, we discovered a subtle but serious gap in our expensive argument in the original paper. Roughly speaking, the strategy in that argument is to employ the density increment method: one begins with a large subset {A} of {F^n} that has no arithmetic progressions of length {4}, and seeks to locate a subspace on which {A} has a significantly increased density. Then, by using a “Koopman-von Neumann theorem”, ultimately based on an iteration of the inverse {U^3} theorem of Ben and myself (and also independently by Samorodnitsky), one approximates {A} by a “quadratically structured” function {f}, which is (locally) a combination of a bounded number of quadratic phase functions, which one can prepare to be in a certain “locally equidistributed” or “locally high rank” form. (It is this reduction to the high rank case that distinguishes the “expensive” argument from the “cheap” one.) Because {A} has no progressions of length {4}, the count of progressions of length {4} weighted by {f} will also be small; by combining this with the theory of equidistribution of quadratic phase functions, one can then conclude that there will be a subspace on which {f} has increased density.

The error in the paper was to conclude from this that the original function {1_A} also had increased density on the same subspace; it turns out that the manner in which {f} approximates {1_A} is not strong enough to deduce this latter conclusion from the former. (One can strengthen the nature of approximation until one restores such a conclusion, but only at the price of deteriorating the quantitative bounds on {r_4(F^n)} one gets at the end of the day to be worse than the cheap argument.)

After trying unsuccessfully to repair this error, we eventually found an alternate argument, based on earlier papers of ourselves and of Bergelson-Host-Kra, that avoided the density increment method entirely and ended up giving a simpler proof of a stronger result than (1), and also gives the explicit value of {c = 2^{-22}} for the exponent {c} in (1). In fact, it gives the following stronger result:

Theorem 1 Let {A} be a subset of {F^n} of density at least {\alpha}, and let {\epsilon>0}. Then there is a subspace {W} of {F^n} of codimension {O( \epsilon^{-2^{20}})} such that the number of (possibly degenerate) progressions {a, a+r, a+2r, a+3r} in {A \cap W} is at least {(\alpha^4-\epsilon)|W|^2}.

The bound (1) is an easy consequence of this theorem after choosing {\epsilon := \alpha^4/2} and removing the degenerate progressions from the conclusion of the theorem.
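In a little more detail (with the implied constants merely indicative), the deduction runs as follows:

```latex
% Suppose A \subset F^n has density \alpha and no non-trivial four-term progressions.
% Apply Theorem 1 with \epsilon := \alpha^4/2 to obtain a subspace W of codimension
% O(\epsilon^{-2^{20}}) = O(\alpha^{-2^{22}}), on which A \cap W contains at least
(\alpha^4 - \epsilon)|W|^2 = \tfrac{1}{2} \alpha^4 |W|^2
% progressions.  All of these must be degenerate (r = 0), and there are at most |W|
% degenerate progressions, so
\tfrac{1}{2} \alpha^4 |W|^2 \leq |W|, \quad \hbox{i.e.} \quad \alpha^4 \leq 2 |W|^{-1}.
% Since |W| = |F|^{n - O(\alpha^{-2^{22}})}, this is impossible once
% \alpha \gg n^{-2^{-22}} (with suitable implied constants), giving (1) with c = 2^{-22}.
```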

The main new idea is to work with a local Koopman-von Neumann theorem rather than a global one, trading a relatively weak global approximation to {1_A} for a significantly stronger local approximation to {1_A} on a subspace {W}. This is somewhat analogous to how sometimes in graph theory it is more efficient (from the point of view of quantitative estimates) to work with a local version of the Szemerédi regularity lemma which gives just a single regular pair of cells, rather than attempting to regularise almost all of the cells. This local approach is well adapted to the inverse {U^3} theorem we use (which also has this local aspect), and also makes the reduction to the high rank case much cleaner. At the end of the day, one ends up with a fairly large subspace {W} on which {A} is quite dense (of density {\alpha-O(\epsilon)}) and which can be well approximated by a “pure quadratic” object, namely a function of a small number of quadratic phases obeying a high rank condition. One can then exploit a special positivity property of the count of length four progressions weighted by pure quadratic objects, essentially due to Bergelson-Host-Kra, which then gives the required lower bound.

A few days ago, Endre Szemerédi was awarded the 2012 Abel prize “for his fundamental contributions to discrete mathematics and theoretical computer science, and in recognition of the profound and lasting impact of these contributions on additive number theory and ergodic theory.” The full citation for the prize may be found here, and the written notes for a talk given by Tim Gowers on Endre’s work at the announcement may be found here (and video of the talk can be found here).

As I was on the Abel prize committee this year, I won’t comment further on the prize, but will instead focus on what is arguably Endre’s most well known result, namely Szemerédi’s theorem on arithmetic progressions:

Theorem 1 (Szemerédi’s theorem) Let {A} be a set of integers of positive upper density, thus {\limsup_{N \rightarrow \infty} \frac{|A \cap [-N,N]|}{|[-N,N]|} > 0}, where {[-N,N] := \{-N, -N+1,\ldots,N\}}. Then {A} contains an arithmetic progression of length {k} for any {k>1}.

Szemerédi’s original proof of this theorem is a remarkably intricate piece of combinatorial reasoning. Most proofs of theorems in mathematics – even long and difficult ones – generally come with a reasonably compact “high-level” overview, in which the proof is (conceptually, at least) broken down into simpler pieces. There may well be technical difficulties in formulating and then proving each of the component pieces, and then in fitting the pieces together, but usually the “big picture” is reasonably clear. To give just one example, the overall strategy of Perelman’s proof of the Poincaré conjecture can be briefly summarised as follows: to show that a simply connected three-dimensional manifold is homeomorphic to a sphere, place a Riemannian metric on it and perform Ricci flow, excising any singularities that arise by surgery, until the entire manifold becomes extinct. By reversing the flow and analysing the surgeries performed, obtain enough control on the topology of the original manifold to establish that it is a topological sphere.

In contrast, the pieces of Szemerédi’s proof are highly interlocking, particularly with regard to all the epsilon-type parameters involved; it takes quite a bit of notational setup and foundational lemmas before the key steps of the proof can even be stated, let alone proved. Szemerédi’s original paper contains a logical diagram of the proof (reproduced in Gowers’ recent talk) which already gives a fair indication of this interlocking structure. (Many years ago I tried to present the proof, but I was unable to find much of a simplification, and my exposition is probably not that much clearer than the original text.) Even the use of nonstandard analysis, which is often helpful in cleaning up armies of epsilons, turns out to be a bit tricky to apply here. (In typical applications of nonstandard analysis, one can get by with a single nonstandard universe, constructed as an ultrapower of the standard universe; but to correctly model all the epsilons occurring in Szemerédi’s argument, one needs to repeatedly perform the ultrapower construction to obtain a (finite) sequence of increasingly nonstandard (and increasingly saturated) universes, each one containing unbounded quantities that are far larger than any quantity that appears in the preceding universe, as discussed at the end of this previous blog post. This sequence of universes does end up concealing all the epsilons, but it is not so clear that this is a net gain in clarity for the proof; I may return to the nonstandard presentation of Szemerédi’s argument at some future juncture.)

Instead of trying to describe the entire argument here, I thought I would instead show some key components of it, with only the slightest hint as to how to assemble the components together to form the whole proof. In particular, I would like to show how two particular ingredients in the proof – namely van der Waerden’s theorem and the Szemerédi regularity lemma – become useful. For reasons that will hopefully become clearer later, it is convenient not only to work with ordinary progressions {P_1 = \{ a, a+r_1, a+2r_1, \ldots, a+(k_1-1)r_1\}}, but also progressions of progressions {P_2 := \{ P_1, P_1 + r_2, P_1+2r_2, \ldots, P_1+(k_2-1)r_2\}}, progressions of progressions of progressions, and so forth. (In additive combinatorics, these objects are known as generalised arithmetic progressions of rank one, two, three, etc., and play a central role in the subject, although the way they are used in Szemerédi’s proof is somewhat different from the way that they are normally used in additive combinatorics.) Very roughly speaking, Szemerédi’s proof begins by building an enormous generalised arithmetic progression of high rank containing many elements of the set {A} (arranged in a “near-maximal-density” configuration), and then steadily prunes this progression to improve the combinatorial properties of the configuration, until one ends up with a single rank one progression of length {k} that consists entirely of elements of {A}.

To illustrate some of the basic ideas, let us first consider a situation in which we have located a progression {P, P + r, \ldots, P+(k-1)r} of progressions of length {k}, with each progression {P+ir}, {i=0,\ldots,k-1} being quite long, and containing a near-maximal amount of elements of {A}, thus

\displaystyle  |A \cap (P+ir)| \approx \delta |P|

where {\delta := \lim \sup_{|P| \rightarrow \infty} \frac{|A \cap P|}{|P|}} is the “maximal density” of {A} along arithmetic progressions. (There are a lot of subtleties in the argument about exactly how good the error terms are in various approximations, but we will ignore these issues for the sake of this discussion and just use the imprecise symbols such as {\approx} instead.) By hypothesis, {\delta} is positive. The objective is then to locate a progression {a, a+r', \ldots,a+(k-1)r'} in {A}, with each {a+ir'} in {P+ir} for {i=0,\ldots,k-1}. It may help to view the progression of progressions {P, P + r, \ldots, P+(k-1)r} as a tall thin rectangle {P \times \{0,\ldots,k-1\}}.

If we write {A_i := \{ a \in P: a+ir \in A \}} for {i=0,\ldots,k-1}, then the problem is equivalent to finding a (possibly degenerate) arithmetic progression {a_0,a_1,\ldots,a_{k-1}}, with each {a_i} in {A_i}.
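This reformulation is easy to play with computationally. The following toy brute-force search (purely illustrative, and no part of Szemerédi's argument; the function name and test data are ours) looks for such a "transverse" progression through given slices:

```python
# Toy brute-force search for a (possibly degenerate) arithmetic progression
# a_0, ..., a_{k-1} with each a_i in the slice A_i.  Purely illustrative; the
# setup below is arbitrary test data, not part of Szemerédi's argument.

def find_transverse_progression(slices):
    """slices: list of k sets of integers.  Returns (a_0, step) or None."""
    k = len(slices)
    for a0 in sorted(slices[0]):
        for a1 in sorted(slices[1]):
            step = a1 - a0  # step may be zero or negative (degenerate allowed)
            if all(a0 + i * step in slices[i] for i in range(k)):
                return a0, step
    return None

# Dense random slices of {0,...,99} virtually always contain such a progression.
import random
random.seed(0)
slices = [set(x for x in range(100) if random.random() < 0.5) for _ in range(4)]
found = find_transverse_progression(slices)
```

The search is quadratic in the slice sizes, since the first two elements determine the whole progression; the point of the discussion that follows is to see why such progressions must exist at all when the slices are dense.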

By hypothesis, we know already that each set {A_i} has density about {\delta} in {P}:

\displaystyle  |A_i \cap P| \approx \delta |P|. \ \ \ \ \ (1)

Let us now make a “weakly mixing” assumption on the {A_i}, which roughly speaking asserts that

\displaystyle  |A_i \cap E| \approx \delta \sigma |P| \ \ \ \ \ (2)

for “most” subsets {E} of {P} of density {\approx \sigma} of a certain form to be specified shortly. This is a plausible type of assumption if one believes {A_i} to behave like a random set, and if the sets {E} are constructed “independently” of the {A_i} in some sense. Of course, we do not expect such an assumption to be valid all of the time, but we will postpone consideration of this point until later. Let us now see how this sort of weakly mixing hypothesis could help one count progressions {a_0,\ldots,a_{k-1}} of the desired form.

We will inductively consider the following (nonrigorously defined) sequence of claims {C(i,j)} for each {0 \leq i \leq j < k}:

  • {C(i,j)}: For most choices of {a_j \in P}, there are {\sim \delta^i |P|} arithmetic progressions {a_0,\ldots,a_{k-1}} in {P} with the specified choice of {a_j}, such that {a_l \in A_l} for all {l=0,\ldots,i-1}.

(Actually, to avoid boundary issues one should restrict {a_j} to lie in the middle third of {P}, rather than near the edges, but let us ignore this minor technical detail.) (The quantity {\delta^i |P|} is natural here, given that there are {\sim |P|} arithmetic progressions {a_0,\ldots,a_{k-1}} in {P} that pass through {a_j} in the {j^{th}} position, and that each one ought to have a probability of {\delta^i} or so that the events {a_0 \in A_0, \ldots, a_{i-1} \in A_{i-1}} simultaneously hold.) If one has the claim {C(k-1,k-1)}, then by selecting a typical {a_{k-1}} in {A_{k-1}}, we obtain a progression {a_0,\ldots,a_{k-1}} with {a_i \in A_i} for all {i=0,\ldots,k-1}, as required. (In fact, we obtain about {\delta^k |P|^2} such progressions by this method.)

We can heuristically justify the claims {C(i,j)} by induction on {i}. For {i=0}, the claims {C(0,j)} are clear just from direct counting of progressions (as long as we keep {a_j} away from the edges of {P}). Now suppose that {i>0}, and the claims {C(i-1,j)} have already been proven. For any {i \leq j < k} and for most {a_j \in P}, we have from the induction hypothesis that there are {\sim \delta^{i-1} |P|} progressions {a_0,\ldots,a_{k-1}} in {P} through {a_j} with {a_0 \in A_0,\ldots,a_{i-2}\in A_{i-2}}. Let {E = E(a_j)} be the set of all the values of {a_{i-1}} attained by these progressions; then {|E| \sim \delta^{i-1} |P|}. Invoking the weak mixing hypothesis, we (heuristically, at least) conclude that for most choices of {a_j}, we have

\displaystyle  |A_{i-1} \cap E| \sim \delta^i |P|

which then gives the desired claim {C(i,j)}.

The observant reader will note that we only needed the claim {C(i,j)} in the case {j=k-1} for the above argument, but for technical reasons, the full proof requires one to work with more general values of {j} (also the claim {C(i,j)} needs to be replaced by a more complicated version of itself, but let’s ignore this for sake of discussion).

We now return to the question of how to justify the weak mixing hypothesis (2). For a single block {A_i} of {A}, one can easily concoct a scenario in which this hypothesis fails, by choosing {E} to overlap with {A_i} too strongly, or to be too disjoint from {A_i}. However, one can do better if one can select {A_i} from a long progression of blocks. The starting point is the following simple double counting observation that gives the right upper bound:

Proposition 2 (Single upper bound) Let {P, P+r, \ldots, P+(M-1)r} be a progression of progressions {P+ir} for some large {M}. Suppose that for each {i=0,\ldots,M-1}, the set {A_i := \{ a \in P: a+ir \in A \}} has density {\approx \delta} in {P} (i.e. (1) holds). Let {E} be a subset of {P} of density {\approx \sigma}. Then (if {M} is large enough) one can find an {i = 0,\ldots,M-1} such that

\displaystyle  |A_i \cap E| \lessapprox \delta \sigma |P|.

Proof: The key is the double counting identity

\displaystyle  \sum_{i=0}^{M-1} |A_i \cap E| = \sum_{a \in E} |A \cap \{ a, a+r, \ldots, a+(M-1) r\}|.

Because {A} has maximal density {\delta} and {M} is large, we have

\displaystyle  |A \cap \{ a, a+r, \ldots, a+(M-1) r\}| \lessapprox \delta M

for each {a}, and thus

\displaystyle  \sum_{i=0}^{M-1} |A_i \cap E| \lessapprox \delta M |E|.

The claim then follows from the pigeonhole principle. \Box
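The double counting identity is just a two-ways count of the pairs {(a,i)} with {a \in E} and {a+ir \in A}, and can be sanity-checked numerically (all specific sets and parameters below are arbitrary test data of ours):

```python
# Numerical sanity check of the double counting identity
#   sum_i |A_i ∩ E| = sum_{a in E} |A ∩ {a, a+r, ..., a+(M-1)r}|,
# where A_i = {a in P : a + i*r in A}.  Both sides count pairs (a, i) with
# a in E and a + i*r in A, so the identity holds for any choice of sets.

import random

random.seed(1)
P = range(50)        # the base progression, identified with {0,...,49}
r, M = 1000, 8       # block spacing and number of blocks
A = set(x for x in range(50 + (M - 1) * r) if random.random() < 0.3)
E = set(a for a in P if random.random() < 0.4)

A_slices = [set(a for a in P if a + i * r in A) for i in range(M)]

lhs = sum(len(A_slices[i] & E) for i in range(M))
rhs = sum(sum(1 for i in range(M) if a + i * r in A) for a in E)
assert lhs == rhs
```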

Now suppose we want to obtain weak mixing not just for a single set {E}, but for a small number {E_1,\ldots,E_m} of such sets, i.e. we wish to find an {i} for which

\displaystyle  |A_i \cap E_j| \lessapprox \delta \sigma_j |P|. \ \ \ \ \ (3)

for all {j=1,\ldots,m}, where {\sigma_j} is the density of {E_j} in {P}. The above proposition gives, for each {j}, a choice of {i} for which (3) holds, but it could be a different {i} for each {j}, and so it is not immediately obvious how to use Proposition 2 to find an {i} for which (3) holds simultaneously for all {j}. However, it turns out that the van der Waerden theorem is the perfect tool for this amplification:

Proposition 3 (Multiple upper bound) Let {P, P+r, \ldots, P+(M-1)r} be a progression of progressions {P+ir} for some large {M}. Suppose that for each {i=0,\ldots,M-1}, the set {A_i := \{ a \in P: a+ir \in A \}} has density {\approx \delta} in {P} (i.e. (1) holds). For each {1 \leq j \leq m}, let {E_j} be a subset of {P} of density {\approx \sigma_j}. Then (if {M} is large enough depending on {m}) one can find an {i = 0,\ldots,M-1} such that

\displaystyle  |A_i \cap E_j| \lessapprox \delta \sigma_j |P|

simultaneously for all {1 \leq j \leq m}.

Proof: Suppose that the claim failed (for some suitably large {M}). Then, for each {i = 0,\ldots,M-1}, there exists {j \in \{1,\ldots,m\}} such that

\displaystyle  |A_i \cap E_j| \gg \delta \sigma_j |P|.

This can be viewed as a colouring of {\{0,\ldots,M-1\}} by {m} colours. If we take {M} large compared to {m}, van der Waerden’s theorem allows us to then find a long subprogression of {\{0,\ldots,M-1\}} which is monochromatic, so that {j} is constant on this progression. But then this will furnish a counterexample to Proposition 2. \Box
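Van der Waerden's theorem enters here purely as a pigeonholing device: any colouring of a long enough interval by a bounded number of colours contains a long monochromatic subprogression. A brute-force search for such a subprogression (a computational stand-in for the theorem, which guarantees success once the interval is long enough in terms of the number of colours and the desired length) might look as follows:

```python
# Brute-force search for a monochromatic arithmetic progression inside a colouring
# of {0,...,M-1}.  Van der Waerden's theorem guarantees this succeeds whenever M is
# sufficiently large depending on the number of colours and the desired length.

def monochromatic_progression(colouring, length):
    """colouring: list of colours indexed by 0..M-1.  Returns (start, step) or None."""
    M = len(colouring)
    for step in range(1, M):
        for start in range(M - (length - 1) * step):
            if all(colouring[start + i * step] == colouring[start]
                   for i in range(length)):
                return start, step
    return None

# The van der Waerden number W(2,3) is 9: every 2-colouring of a 9-element
# interval has a monochromatic 3-term progression.  We verify this exhaustively.
from itertools import product
assert all(monochromatic_progression(list(c), 3) is not None
           for c in product([0, 1], repeat=9))
```

For contrast, the 8-element colouring 0,0,1,1,0,0,1,1 has no monochromatic 3-term progression, showing the bound W(2,3) = 9 is sharp.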

One nice thing about this proposition is that the upper bounds can be automatically upgraded to an asymptotic:

Proposition 4 (Multiple mixing) Let {P, P+r, \ldots, P+(M-1)r} be a progression of progressions {P+ir} for some large {M}. Suppose that for each {i=0,\ldots,M-1}, the set {A_i := \{ a \in P: a+ir \in A \}} has density {\approx \delta} in {P} (i.e. (1) holds). For each {1 \leq j \leq m}, let {E_j} be a subset of {P} of density {\approx \sigma_j}. Then (if {M} is large enough depending on {m}) one can find an {i = 0,\ldots,M-1} such that

\displaystyle  |A_i \cap E_j| \approx \delta \sigma_j |P|

simultaneously for all {1 \leq j \leq m}.

Proof: By applying the previous proposition to the collection of sets {E_1,\ldots,E_m} and their complements {P\backslash E_1,\ldots,P \backslash E_m} (thus replacing {m} with {2m}), one can find an {i} for which

\displaystyle  |A_i \cap E_j| \lessapprox \delta \sigma_j |P|

and

\displaystyle  |A_i \cap (P \backslash E_j)| \lessapprox \delta (1-\sigma_j) |P|

which gives the claim. \Box

However, this improvement of Proposition 2 turns out to not be strong enough for applications. The reason is that the number {m} of sets {E_1,\ldots,E_m} for which mixing is established is too small compared with the length {M} of the progression one has to use in order to obtain that mixing. However, thanks to the magic of the Szemerédi regularity lemma, one can amplify the above proposition even further, to allow for a huge number of {E_i} to be mixed (at the cost of excluding a small fraction of exceptions):

Proposition 5 (Really multiple mixing) Let {P, P+r, \ldots, P+(M-1)r} be a progression of progressions {P+ir} for some large {M}. Suppose that for each {i=0,\ldots,M-1}, the set {A_i := \{ a \in P: a+ir \in A \}} has density {\approx \delta} in {P} (i.e. (1) holds). For each {v} in some (large) finite set {V}, let {E_v} be a subset of {P} of density {\approx \sigma_v}. Then (if {M} is large enough, but not dependent on the size of {V}) one can find an {i = 0,\ldots,M-1} such that

\displaystyle  |A_i \cap E_v| \approx \delta \sigma_v |P|

simultaneously for almost all {v \in V}.

Proof: We build a bipartite graph {G = (P, V, E)} connecting the progression {P} to the finite set {V} by placing an edge {(a,v)} between an element {a \in P} and an element {v \in V} whenever {a \in E_v}. The number {|E_v| \approx \sigma_v |P|} can then be interpreted as the degree of {v} in this graph, while the number {|A_i \cap E_v|} is the number of neighbours of {v} that land in {A_i}.

We now apply the regularity lemma to this graph {G}. Roughly speaking, what this lemma does is to partition {P} and {V} into almost equally sized cells {P = P_1 \cup \ldots \cup P_m} and {V = V_1 \cup \ldots \cup V_m} such that for most pairs {P_j, V_k} of cells, the graph {G} resembles a random bipartite graph of some density {d_{jk}} between these two cells. The key point is that the number {m} of cells here is bounded uniformly in the size of {P} and {V}. As a consequence of this lemma, one can show that for most vertices {v} in a typical cell {V_k}, the number {|E_v|} is approximately equal to

\displaystyle  |E_v| \approx \sum_{j=1}^m d_{jk} |P_j|

and the number {|A_i \cap E_v|} is approximately equal to

\displaystyle  |A_i \cap E_v| \approx \sum_{j=1}^m d_{jk} |A_i \cap P_j|.

The point here is that the {|V|} different statistics {|A_i \cap E_v|} are now controlled by a mere {m} statistics {|A_i \cap P_j|} (this is not unlike the use of principal component analysis in statistics, incidentally, but that is another story). Now, we invoke Proposition 4 to find an {i} for which

\displaystyle  |A_i \cap P_j| \approx \delta |P_j|

simultaneously for all {j=1,\ldots,m}, and the claim follows. \Box
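The structural point in this proof, that the huge family of statistics {|E_v|} collapses to a bounded number of block statistics, is easy to see in a model case where the bipartite graph really is random between cells (which is what the regularity lemma provides, up to small errors). The block sizes and densities below are of course arbitrary choices for illustration, not the regularity lemma itself:

```python
# Illustration of the key point in the proof of Proposition 5: if edges between
# cell P_j and cell V_k appear independently with probability d_{jk}, then every
# degree statistic |E_v| for v in V_k concentrates around sum_j d_{jk} |P_j|.
# (This is a planted block model; all parameters are arbitrary choices.)

import random

random.seed(2)
P_cells = [200, 200, 200]              # sizes |P_1|, |P_2|, |P_3|
d = [[0.1, 0.5, 0.9],                  # d[j][k]: density between P_j and V_k
     [0.3, 0.3, 0.3],
     [0.8, 0.2, 0.6]]

k = 1                                  # examine vertices in the cell V_k
predicted = sum(d[j][k] * P_cells[j] for j in range(3))

degrees = []
for _ in range(50):                    # sample 50 vertices v in V_k
    deg = sum(1 for j in range(3) for _ in range(P_cells[j])
              if random.random() < d[j][k])
    degrees.append(deg)
average = sum(degrees) / len(degrees)  # concentrates near `predicted` (about 200)
```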

This proposition now suggests a way forward to establish the type of mixing properties (2) needed for the preceding attempt at proving Szemerédi’s theorem to actually work. Whereas in that attempt, we were working with a single progression of progressions {P, P+r, \ldots, P+(k-1)r} containing a near-maximal density of elements of {A}, we will now have to work with a family {(P_\lambda, P_\lambda+r_\lambda,\ldots,P_\lambda+(k-1)r_\lambda)_{\lambda \in \Lambda}} of such progressions of progressions, where {\Lambda} ranges over some suitably large parameter set. Furthermore, in order to invoke Proposition 5, this family must be “well-arranged” in some arithmetic sense; in particular, for a given {i}, it should be possible to find many reasonably large subfamilies of this family for which the {i^{th}} terms {P_\lambda + i r_\lambda} of the progressions of progressions in this subfamily are themselves in arithmetic progression. (Also, for technical reasons having to do with the fact that the sets {E_v} in Proposition 5 are not allowed to depend on {i}, one also needs the progressions {P_\lambda + i' r_\lambda} for any given {0 \leq i' < i} to be “similar” in the sense that they intersect {A} in the same fashion (thus the sets {A \cap (P_\lambda + i' r_\lambda)} as {\lambda} varies need to be translates of each other).) If one has this sort of family, then Proposition 5 allows us to “spend” some of the degrees of freedom of the parameter set {\Lambda} in order to gain good mixing properties for at least one of the sets {P_\lambda +i r_\lambda} in the progression of progressions.

Of course, we still have to figure out how to get such large families of well-arranged progressions of progressions. Szemerédi’s solution was to begin by working with generalised progressions of a much larger rank {d} than the rank {2} progressions considered here; roughly speaking, to prove Szemerédi’s theorem for length {k} progressions, one has to consider generalised progressions of rank as high as {2^k+1}. It is possible by a reasonably straightforward (though somewhat delicate) “density increment argument” to locate a huge generalised progression of this rank which is “saturated” by {A} in a certain rather technical sense (related to the concept of “near maximal density” used previously). Then, by another reasonably elementary argument, it is possible to locate inside a suitable large generalised progression of some rank {d}, a family of large generalised progressions of rank {d-1} which inherit many of the good properties of the original generalised progression, and which have the arithmetic structure needed for Proposition 5 to be applicable, at least for one value of {i}. (But getting this sort of property for all values of {i} simultaneously is tricky, and requires many careful iterations of the above scheme; there is also the problem that by obtaining good behaviour for one index {i}, one may lose good behaviour at previous indices, leading to a sort of “Tower of Hanoi” situation which may help explain the exponential factor in the rank {2^k+1} that is ultimately needed. It is an extremely delicate argument; all the parameters and definitions have to be set very precisely in order for the argument to work at all, and it is really quite remarkable that Endre was able to see it through to the end.)

In 1977, Furstenberg established his multiple recurrence theorem:

Theorem 1 (Furstenberg multiple recurrence) Let {(X, {\mathcal B}, \mu, T)} be a measure-preserving system, thus {(X,{\mathcal B},\mu)} is a probability space and {T: X \rightarrow X} is a measure-preserving bijection such that {T} and {T^{-1}} are both measurable. Let {E} be a measurable subset of {X} of positive measure {\mu(E) > 0}. Then for any {k \geq 1}, there exists {n > 0} such that

\displaystyle  E \cap T^{-n} E \cap \ldots \cap T^{-(k-1)n} E \neq \emptyset.

Equivalently, there exists {n > 0} and {x \in X} such that

\displaystyle  x, T^n x, \ldots, T^{(k-1)n} x \in E.

As is well known, the Furstenberg multiple recurrence theorem is equivalent to Szemerédi’s theorem, thanks to the Furstenberg correspondence principle; see for instance these lecture notes of mine.

The multiple recurrence theorem is proven, roughly speaking, by an induction on the “complexity” of the system {(X,{\mathcal B},\mu,T)}. Indeed, for very simple systems, such as periodic systems (in which {T^n} is the identity for some {n>0}, which is for instance the case for the circle shift {X = {\bf R}/{\bf Z}}, {Tx := x+\alpha} with a rational shift {\alpha}), the theorem is trivial; at a slightly more advanced level, almost periodic (or compact) systems (in which {\{ T^n f: n \in {\bf Z} \}} is a precompact subset of {L^2(X)} for every {f \in L^2(X)}, which is for instance the case for irrational circle shifts) are also quite easy to handle. One then shows that the multiple recurrence property is preserved under various extension operations (specifically, compact extensions, weakly mixing extensions, and limits of chains of extensions), which then gives the multiple recurrence theorem as a consequence of the Furstenberg-Zimmer structure theorem for measure-preserving systems. See these lecture notes for further discussion.
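For the circle shift, the easy case mentioned above is quite concrete: multiple recurrence follows from finding a single {n} with {n\alpha} very close to an integer, since then all the multiples {in\alpha} for {i < k} remain close to integers. A quick numerical illustration (with the arbitrary choices {\alpha = \sqrt{2}-1}, {k=4} made here for the test):

```python
# Multiple recurrence for the irrational circle shift Tx = x + alpha, made concrete:
# if the distance from n*alpha to the nearest integer is below eps/k, then the points
# x, T^n x, ..., T^{(k-1)n} x all lie within eps of x, so any interval E of length
# 2*eps centred at x witnesses the recurrence.  alpha, k, eps are arbitrary choices.

import math

def dist_to_int(t):
    """Distance from t to the nearest integer."""
    f = t % 1.0
    return min(f, 1.0 - f)

alpha = math.sqrt(2) - 1
k, eps = 4, 0.05
n = next(m for m in range(1, 10**6) if dist_to_int(m * alpha) < eps / k)

# each orbit point i*n*alpha is within i * dist_to_int(n*alpha) < eps of an integer
assert all(dist_to_int(i * n * alpha) < eps for i in range(k))
```

(The search here succeeds at {n = 29}, a continued-fraction denominator of {\sqrt{2}-1}; the point of the general theorem is that such recurrence persists far beyond this rotation example.)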

From a high-level perspective, this is still one of the most conceptual proofs known of Szemerédi’s theorem. However, the individual components of the proof are still somewhat intricate. Perhaps the most difficult step is the demonstration that the multiple recurrence property is preserved under compact extensions; see for instance these lecture notes, which is devoted entirely to this step. This step requires quite a bit of measure-theoretic and/or functional analytic machinery, such as the theory of disintegrations, relatively almost periodic functions, or Hilbert modules.

However, I recently realised that there is a special case of the compact extension step – namely that of finite extensions – which avoids almost all of these technical issues while still capturing the essence of the argument (and in particular, the key idea of using van der Waerden’s theorem). As such, this may serve as a pedagogical device for motivating this step of the proof of the multiple recurrence theorem.

Let us first explain what a finite extension is. Given a measure-preserving system {X = (X,{\mathcal X},\mu,T)}, a finite set {Y}, and a measurable map {\rho: X \rightarrow \hbox{Sym}(Y)} from {X} to the permutation group of {Y}, one can form the finite extension

\displaystyle X \rtimes_\rho Y = (X \times Y, {\mathcal X} \times {\mathcal Y}, \mu \times \nu, S),

which as a probability space is the product of {(X,{\mathcal X},\mu)} with the finite probability space {Y = (Y, {\mathcal Y},\nu)} (with the discrete {\sigma}-algebra and uniform probability measure), and with shift map

\displaystyle  S(x, y) := (Tx, \rho(x) y). \ \ \ \ \ (1)

One easily verifies that this is indeed a measure-preserving system. We refer to {\rho} as the cocycle of the system.

An example of finite extensions comes from group theory. Suppose we have a short exact sequence

\displaystyle  0 \rightarrow K \rightarrow G \rightarrow H \rightarrow 0

of finite groups. Let {g} be a group element of {G}, and let {h} be its projection in {H}. Then the shift map {x \mapsto gx} on {G} (with the discrete {\sigma}-algebra and uniform probability measure) can be viewed as a finite extension of the shift map {y \mapsto hy} on {H} (again with the discrete {\sigma}-algebra and uniform probability measure), by arbitrarily selecting a section {\phi: H \rightarrow G} that inverts the projection map, identifying {G} with {H \times K} by identifying {k \phi(y)} with {(y,k)} for {y \in H, k \in K}, and using the cocycle

\displaystyle  \rho(y) := \phi(hy)^{-1} g \phi(y).

Thus, for instance, the unit shift {x \mapsto x+1} on {{\bf Z}/N{\bf Z}} can be thought of as a finite extension of the unit shift {x \mapsto x+1} on {{\bf Z}/M{\bf Z}} whenever {N} is a multiple of {M}.
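This last example can be written out explicitly. For {N = MK}, identifying {x \in {\bf Z}/N{\bf Z}} with the pair {(x \hbox{ mod } M, \lfloor x/M \rfloor)}, the unit shift becomes a skew product over {{\bf Z}/M{\bf Z}} whose cocycle is trivial except over the point {M-1}, where it advances the fibre coordinate. A short verification (with our particular choice of section and small test parameters):

```python
# The unit shift x -> x+1 on Z/NZ (N = M*K) as a finite extension of the unit shift
# on Z/MZ: writing x = q*M + y, the shift becomes (y, q) -> (y+1 mod M, rho(y) q),
# where the cocycle rho(y) is the identity except at y = M-1, where it sends
# q -> q+1 mod K.  M and K below are arbitrary small parameters for the check.

M, K = 4, 3
N = M * K

def S(y, q):
    """Skew-product model of the unit shift on Z/NZ."""
    if y == M - 1:
        return 0, (q + 1) % K   # "carrying": the cocycle advances the fibre
    return y + 1, q

def encode(x):
    """Identify x in Z/NZ with (x mod M, x div M) in Z/MZ x Z/KZ."""
    return x % M, x // M

# conjugacy check: S(encode(x)) = encode(x + 1 mod N) for every x
assert all(S(*encode(x)) == encode((x + 1) % N) for x in range(N))
```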

Another example comes from Riemannian geometry. If {M} is a Riemannian manifold that is a finite cover of another Riemannian manifold {N} (with the metric on {M} being the pullback of that on {N}), then (unit time) geodesic flow on the cosphere bundle of {M} is a finite extension of the corresponding flow on {N}.

Here, then, is the finite extension special case of the compact extension step in the proof of the multiple recurrence theorem:

Proposition 2 (Finite extensions) Let {X \rtimes_\rho Y} be a finite extension of a measure-preserving system {X}. If {X} obeys the conclusion of the Furstenberg multiple recurrence theorem, then so does {X \rtimes_\rho Y}.

Before we prove this proposition, let us first give the combinatorial analogue.

Lemma 3 Let {A} be a subset of the integers that contains arbitrarily long arithmetic progressions, and let {A = A_1 \cup \ldots \cup A_M} be a colouring of {A} by {M} colours (or equivalently, a partition of {A} into {M} colour classes {A_i}). Then at least one of the {A_i} contains arbitrarily long arithmetic progressions.

Proof: By the infinite pigeonhole principle, it suffices to show that for each {k \geq 1}, one of the colour classes {A_i} contains an arithmetic progression of length {k}.

Let {N} be a large integer (depending on {k} and {M}) to be chosen later. Then {A} contains an arithmetic progression of length {N}, which may be identified with {\{0,\ldots,N-1\}}. The colouring of {A} then induces a colouring on {\{0,\ldots,N-1\}} into {M} colour classes. Applying (the finitary form of) van der Waerden’s theorem, we conclude that if {N} is sufficiently large depending on {M} and {k}, then one of these colour classes contains an arithmetic progression of length {k}; undoing the identification, we conclude that one of the {A_i} contains an arithmetic progression of length {k}, as desired. \Box

Of course, by specialising to the case {A={\bf Z}}, we see that the above Lemma is in fact equivalent to van der Waerden’s theorem.

Now we prove Proposition 2.

Proof: Fix {k}. Let {E} be a positive measure subset of {X \rtimes_\rho Y = (X \times Y, {\mathcal X} \times {\mathcal Y}, \mu \times \nu, S)}. By Fubini’s theorem, we have

\displaystyle  \mu \times \nu(E) = \int_X f(x)\ d\mu(x)

where {f(x) := \nu(E_x)} and {E_x := \{ y \in Y: (x,y) \in E \}} is the fibre of {E} at {x}. Since {\mu \times \nu(E)} is positive, we conclude that the set

\displaystyle F := \{ x \in X: f(x) > 0 \} = \{ x \in X: E_x \neq \emptyset \}

is a positive measure subset of {X}. Note that for each {x \in F}, we can find an element {g(x) \in Y} such that {(x,g(x)) \in E}. While not strictly necessary for this argument, one can ensure if one wishes that the function {g} is measurable by totally ordering {Y}, and then letting {g(x)} be the minimal element of {Y} for which {(x,g(x)) \in E}.

Let {N} be a large integer (which will depend on {k} and the cardinality {M} of {Y}) to be chosen later. Because {X} obeys the multiple recurrence theorem, we can find a positive integer {n} and {x \in X} such that

\displaystyle  x, T^n x, T^{2n} x, \ldots, T^{(N-1) n} x \in F.

Now consider the sequence of {N} points

\displaystyle  S^{-mn}( T^{mn} x, g(T^{mn} x) )

for {m = 0,\ldots,N-1}. From (1), we see that

\displaystyle  S^{-mn}( T^{mn} x, g(T^{mn} x) ) = (x, c(m)) \ \ \ \ \ (2)

for some sequence {c(0),\ldots,c(N-1) \in Y}. This can be viewed as a colouring of {\{0,\ldots,N-1\}} by {M} colours, where {M} is the cardinality of {Y}. Applying van der Waerden’s theorem, we conclude (if {N} is sufficiently large depending on {k} and {|Y|}) that there is an arithmetic progression {a, a+r,\ldots,a+(k-1)r} in {\{0,\ldots,N-1\}} with {r>0} such that

\displaystyle  c(a) = c(a+r) = \ldots = c(a+(k-1)r) = c

for some {c \in Y}. If we then let {y = (x,c)}, we see from (2) that

\displaystyle  S^{an + irn} y = ( T^{(a+ir)n} x, g(T^{(a+ir)n} x) ) \in E

for all {i=0,\ldots,k-1}, and the claim follows. \Box
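The mechanism of the proof, pulling the orbit back to a fixed base point and reading off a colouring {c(m)} of the fibre, can be traced on a small explicit system. Below, the base, cocycle, and parameters are arbitrary small choices of ours, just to watch the definitions in action:

```python
# A small explicit finite extension, to trace the definitions in the proof above:
# X = Z/3Z with the unit shift T, fibre Y = {0,1}, and an arbitrary cocycle rho.
# We verify that S(x,y) = (Tx, rho(x)y) is a bijection (hence measure-preserving
# for the uniform measure) and that a point is periodic, hence recurrent.

N = 3
rho = {0: {0: 1, 1: 0},    # rho(0) swaps the fibre
       1: {0: 0, 1: 1},    # rho(1) is the identity
       2: {0: 1, 1: 0}}    # rho(2) swaps the fibre

def S(point):
    x, y = point
    return (x + 1) % N, rho[x][y]

points = [(x, y) for x in range(N) for y in range(2)]
assert sorted(S(p) for p in points) == sorted(points)   # S is a bijection

# follow the orbit of (0,0) until it returns to itself
p, q, n = (0, 0), S((0, 0)), 1
while q != p:
    q, n = S(q), n + 1
assert n <= 2 * N   # the orbit closes up within the 6-point space
```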

Remark 1 The precise connection between Lemma 3 and Proposition 2 arises from the following observation: with {E, F, g} as in the proof of Proposition 2, and {x \in X}, the set

\displaystyle  A := \{ n \in {\bf Z}: T^n x \in F \}

can be partitioned into the classes

\displaystyle  A_i := \{ n \in {\bf Z}: S^n (x,i) \in E' \}

where {E' := \{ (x,g(x)): x \in F \} \subset E} is the graph of {g}. The multiple recurrence property for {X} ensures that {A} contains arbitrarily long arithmetic progressions, and so therefore one of the {A_i} must also, which gives the multiple recurrence property for {X \rtimes_\rho Y}.

Remark 2 Compact extensions can be viewed as a generalisation of finite extensions, in which the fibres are no longer finite sets, but are themselves measure spaces obeying an additional property, which roughly speaking asserts that for many functions {f} on the extension, the shifts {T^n f} of {f} behave in an almost periodic fashion on most fibres, so that the orbits {T^n f} become totally bounded on each fibre. This total boundedness allows one to obtain an analogue of the above colouring map {c()} to which van der Waerden’s theorem can be applied.

In Notes 3, we saw that the number of additive patterns in a given set was (in principle, at least) controlled by the Gowers uniformity norms of functions associated to that set.

Such norms can be defined on any finite additive group (and also on some other types of domains, though we will not discuss this point here). In particular, they can be defined on the finite-dimensional vector spaces {V} over a finite field {{\bf F}}.

In this case, the Gowers norms {U^{d+1}(V)} are closely tied to the space {\hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})} of polynomials of degree at most {d}. Indeed, as noted in Exercise 20 of Notes 4, a function {f: V \rightarrow {\bf C}} of {L^\infty(V)} norm {1} has {U^{d+1}(V)} norm equal to {1} if and only if {f = e(\phi)} for some {\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}; thus polynomials solve the “{100\%} inverse problem” for the trivial inequality {\|f\|_{U^{d+1}(V)} \leq \|f\|_{L^\infty(V)}}. They are also a crucial component of the solution to the “{99\%} inverse problem” and “{1\%} inverse problem”. For the former, we will soon show:

Proposition 1 ({99\%} inverse theorem for {U^{d+1}(V)}) Let {f: V \rightarrow {\bf C}} be such that {\|f\|_{L^\infty(V)} \leq 1} and {\|f\|_{U^{d+1}(V)} \geq 1-\epsilon} for some {\epsilon > 0}. Then there exists {\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})} such that {\| f - e(\phi)\|_{L^1(V)} = O_{d, {\bf F}}( \epsilon^c )}, where {c = c_d > 0} is a constant depending only on {d}.

Thus, for the Gowers norm to be almost completely saturated, one must be very close to a polynomial. The converse assertion is easily established:

Exercise 1 (Converse to {99\%} inverse theorem for {U^{d+1}(V)}) If {\|f\|_{L^\infty(V)} \leq 1} and {\|f-e(\phi)\|_{L^1(V)} \leq \epsilon} for some {\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})}, then {\|f\|_{U^{d+1}(V)} \geq 1 - O_{d,{\bf F}}( \epsilon^c )}, where {c = c_d > 0} is a constant depending only on {d}.
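The {100\%} statement can be verified numerically in a small case: in {V = {\bf F}_5} (so one-dimensional), the quadratic phase {f = e(x^2/5)} should have {U^3} norm exactly {1}, since the third additive derivative of a quadratic vanishes. The computation below expands {\|f\|_{U^3}^8} as an average over three-dimensional parallelepipeds:

```python
# Check that the quadratic phase f(x) = e(x^2/p) on F_p (p = 5) has U^3 norm 1:
# ||f||_{U^3}^8 = E_{x,h1,h2,h3} of the product over the cube {0,1}^3 of f at
# x + w.h (conjugated at odd vertices); the phases telescope to zero for a
# degree-2 polynomial, so every term of the average equals 1.

import cmath
from itertools import product

p = 5
def e(t):
    return cmath.exp(2j * cmath.pi * t)

f = [e((x * x % p) / p) for x in range(p)]

total = 0.0
for x, h1, h2, h3 in product(range(p), repeat=4):
    val = 1
    for w in product((0, 1), repeat=3):
        t = f[(x + w[0] * h1 + w[1] * h2 + w[2] * h3) % p]
        val *= t.conjugate() if sum(w) % 2 else t
    total += val
u3_eighth_power = total / p ** 4
assert abs(u3_eighth_power - 1) < 1e-9
```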

In the {1\%} world, one no longer expects to be close to a polynomial. Instead, one expects to correlate with a polynomial. Indeed, one has

Lemma 2 (Converse to the {1\%} inverse theorem for {U^{d+1}(V)}) If {f: V \rightarrow {\bf C}} and {\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})} are such that {|\langle f, e(\phi) \rangle_{L^2(V)}| \geq \epsilon}, where {\langle f, g \rangle_{L^2(V)} := {\bf E}_{x \in V} f(x) \overline{g(x)}}, then {\|f\|_{U^{d+1}(V)} \geq \epsilon}.

Proof: From the definition of the {U^1} norm (equation (18) from Notes 3), the monotonicity of the Gowers norms (Exercise 19 of Notes 3), and the polynomial phase modulation invariance of the Gowers norms (Exercise 21 of Notes 3), one has

\displaystyle  |\langle f, e(\phi) \rangle| = \| f e(-\phi) \|_{U^1(V)}

\displaystyle  \leq \|f e(-\phi) \|_{U^{d+1}(V)}

\displaystyle  = \|f\|_{U^{d+1}(V)}

and the claim follows. \Box

In the high characteristic case {\hbox{char}({\bf F}) > d} at least, this can be reversed:

Theorem 3 ({1\%} inverse theorem for {U^{d+1}(V)}) Suppose that {\hbox{char}({\bf F}) > d \geq 0}. If {f: V \rightarrow {\bf C}} is such that {\|f\|_{L^\infty(V)} \leq 1} and {\|f\|_{U^{d+1}(V)} \geq \epsilon}, then there exists {\phi \in \hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})} such that {|\langle f, e(\phi) \rangle_{L^2(V)}| \gg_{\epsilon,d,{\bf F}} 1}.

This result is sometimes referred to as the inverse conjecture for the Gowers norm (in high, but bounded, characteristic). For small {d}, the claim is easy:

Exercise 2 Verify the cases {d=0,1} of this theorem. (Hint: to verify the {d=1} case, use the Fourier-analytic identities {\|f\|_{U^2(V)} = (\sum_{\xi \in \hat V} |\hat f(\xi)|^4)^{1/4}} and {\|f\|_{L^2(V)} = (\sum_{\xi \in \hat V} |\hat f(\xi)|^2)^{1/2}}, where {\hat V} is the space of all homomorphisms {\xi: x \mapsto \xi \cdot x} from {V} to {{\bf R}/{\bf Z}}, and {\hat f(\xi) := \mathop{\bf E}_{x \in V} f(x) e(-\xi \cdot x)} are the Fourier coefficients of {f}.)
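The identity in the hint can also be checked numerically in the one-dimensional case {V = {\bf F}_p}, where the characters are just {x \mapsto \xi x/p}; the small prime and random test function below are arbitrary choices:

```python
# Numerical check of ||f||_{U^2}^4 = sum_xi |f_hat(xi)|^4 on V = F_7, with the
# normalisation f_hat(xi) = E_x f(x) e(-xi*x/p).  The random f is test data.

import cmath, random

p = 7
random.seed(3)
f = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(p)]

def e(t):
    return cmath.exp(2j * cmath.pi * t)

# U^2 norm from the definition: E_{x,h,k} f(x) conj(f(x+h)) conj(f(x+k)) f(x+h+k)
u2_fourth = sum(f[x] * f[(x + h) % p].conjugate() * f[(x + k) % p].conjugate()
                * f[(x + h + k) % p]
                for x in range(p) for h in range(p) for k in range(p)) / p ** 3

fhat = [sum(f[x] * e(-xi * x / p) for x in range(p)) / p for xi in range(p)]
assert abs(u2_fourth - sum(abs(c) ** 4 for c in fhat)) < 1e-9
```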

This conjecture for larger values of {d} is more difficult to establish. The {d=2} case of the theorem was established by Ben Green and myself in the high characteristic case {\hbox{char}({\bf F}) > 2}; the low characteristic case {\hbox{char}({\bf F}) = d = 2} was independently and simultaneously established by Samorodnitsky. The cases {d>2} in the high characteristic case were established in two stages, firstly using a modification of the Furstenberg correspondence principle, due to Ziegler and myself, to convert the problem to an ergodic theory counterpart, and then using a modification of the methods of Host-Kra and Ziegler to solve that counterpart, as done in this paper of Bergelson, Ziegler, and myself.

The situation with the low characteristic case in general is still unclear. In the high characteristic case, we saw from Notes 4 that one could replace the space of non-classical polynomials {\hbox{Poly}_{\leq d}(V \rightarrow {\bf R}/{\bf Z})} in the above conjecture with the essentially equivalent space of classical polynomials {\hbox{Poly}_{\leq d}(V \rightarrow {\bf F})}. However, as we shall see below, this turns out not to be the case in certain low characteristic cases (a fact first observed by Lovett, Meshulam, and Samorodnitsky, and independently by Ben Green and myself), for instance if {\hbox{char}({\bf F}) = 2} and {d \geq 3}; this is ultimately due to the existence in those cases of non-classical polynomials which exhibit no significant correlation with classical polynomials of equal or lesser degree. This distinction between classical and non-classical polynomials appears to be a rather non-trivial obstruction to understanding the low characteristic setting; it may be necessary to obtain a more complete theory of non-classical polynomials in order to fully settle this issue.

The inverse conjecture has a number of consequences. For instance, it can be used to establish the analogue of Szemerédi’s theorem in this setting:

Theorem 4 (Szemerédi’s theorem for finite fields) Let {{\bf F} = {\bf F}_p} be a finite field, let {\delta > 0}, and let {A \subset {\bf F}^n} be such that {|A| \geq \delta |{\bf F}^n|}. If {n} is sufficiently large depending on {p,\delta}, then {A} contains an (affine) line {\{ x, x+r, \ldots, x+(p-1)r\}} for some {x,r \in {\bf F}^n} with { r\neq 0}.

Exercise 3 Use Theorem 4 to establish the following generalisation: with the notation as above, if {k \geq 1} and {n} is sufficiently large depending on {p,\delta}, then {A} contains an affine {k}-dimensional subspace.

We will prove this theorem in two different ways, one using a density increment method, and the other using an energy increment method. We discuss some other applications below the fold.


Ben Green and I have just uploaded to the arXiv a short (six-page) paper “Yet another proof of Szemeredi’s theorem“, submitted to the 70th birthday conference proceedings for Endre Szemerédi. In this paper we put in print a folklore observation, namely that the inverse conjecture for the Gowers norm, together with the density increment argument, easily implies Szemerédi’s famous theorem on arithmetic progressions. This is unsurprising, given that Gowers’ proof of Szemerédi’s theorem proceeds through a weaker version of the inverse conjecture and a density increment argument, and also given that it is possible to derive Szemerédi’s theorem from knowledge of the characteristic factor for multiple recurrence (the ergodic theory analogue of the inverse conjecture, first established by Host and Kra), as was done by Bergelson, Leibman, and Lesigne (and also implicitly in the earlier paper of Bergelson, Host, and Kra); but to our knowledge the exact derivation of Szemerédi’s theorem from the inverse conjecture was not in the literature. Ordinarily this type of folklore might be considered too trifling (and too well known among experts in the field) to publish; but we felt that the Szemerédi birthday conference provided a natural venue for this particular observation.

The key point is that one can show (by an elementary argument relying primarily on an induction on dimension argument and the Weyl recurrence theorem, i.e. the fact that given any real {\alpha} and any integer {s \geq 1}, the expression {\alpha n^s} gets arbitrarily close to an integer) that given a (polynomial) nilsequence {n \mapsto F(g(n)\Gamma)}, one can subdivide any long arithmetic progression (such as {[N] = \{1,\ldots,N\}}) into a number of medium-sized progressions, where the nilsequence is nearly constant on each progression. As a consequence of this and the inverse conjecture for the Gowers norm, if a set has no arithmetic progressions, then it must have an elevated density on a subprogression; iterating this observation via the usual density increment argument introduced long ago by Roth, one obtains the claim. (This is very close to the scheme of Gowers’ proof.)
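Here is a quick numerical sketch of this subdivision phenomenon in the simplest (degree one) case, where the nilsequence is just a linear phase {n \mapsto e(\alpha n)}; the particular choices of {\alpha}, {N}, and the parameters below are purely illustrative.

```python
import cmath
import math

# Toy illustration (mine, not from the paper): a degree-1 "nilsequence"
# n -> e(alpha*n) is nearly constant on each piece of a subdivision of [N]
# into medium-length arithmetic progressions.
alpha = math.sqrt(2)   # any fixed irrational frequency
N = 100_000

# Dirichlet's pigeonhole principle: some 1 <= q <= Q has ||q*alpha|| <= 1/Q.
Q = 1_000
q = min(range(1, Q + 1), key=lambda m: abs(m * alpha - round(m * alpha)))
eps = abs(q * alpha - round(q * alpha))   # ||q*alpha||, distance to nearest integer

# Partition [N]: split each residue class mod q into runs of length <= L.
L = 50
classes = {r: [] for r in range(q)}
for n in range(1, N + 1):
    classes[n % q].append(n)
progressions = [cls[i:i + L] for cls in classes.values()
                for i in range(0, len(cls), L)]
assert sum(len(p) for p in progressions) == N   # genuinely a partition of [N]

# Successive terms of a progression shift the phase by q*alpha, which is
# within eps of an integer, so e(alpha*n) drifts by at most 2*pi*(L-1)*eps.
def e(x):
    return cmath.exp(2j * math.pi * x)

max_osc = max(abs(e(alpha * n) - e(alpha * prog[0]))
              for prog in progressions for n in prog)
assert max_osc <= 2 * math.pi * (L - 1) * eps + 1e-9
print(f"q = {q}, ||q alpha|| = {eps:.2e}, max oscillation = {max_osc:.4f}")
```

Genuine nilsequences require the full induction on dimension alluded to above, but the mechanism is the same: a good rational approximation to the frequency makes the sequence nearly constant on medium-sized subprogressions.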

Technically, one might call this the shortest proof of Szemerédi’s theorem in the literature (it would be something like the sixteenth genuinely distinct proof, by our count), but that would be cheating quite a bit, primarily because it assumes the inverse conjecture for the Gowers norm, our current proof of which checks in at about 100 pages…

Ben Green and I have just uploaded to the arXiv our paper “An arithmetic regularity lemma, an associated counting lemma, and applications“, submitted (a little behind schedule) to the 70th birthday conference proceedings for Endre Szemerédi. In this paper we describe the general-degree version of the arithmetic regularity lemma, which can be viewed as the counterpart of the Szemerédi regularity lemma, in which the object being regularised is a function {f: [N] \rightarrow [0,1]} on a discrete interval {[N] = \{1,\ldots,N\}} rather than a graph, and the type of patterns one wishes to count are additive patterns (such as arithmetic progressions {n,n+d,\ldots,n+(k-1)d}) rather than subgraphs. Very roughly speaking, this regularity lemma asserts that all such functions can be decomposed as a degree {\leq s} nilsequence (or more precisely, a variant of a nilsequence that we call a virtual irrational nilsequence), plus a small error, plus a third error which is extremely tiny in the Gowers uniformity norm {U^{s+1}[N]}. In principle, at least, the latter two errors can be readily discarded in applications, so that the regularity lemma reduces many questions in additive combinatorics to questions concerning (virtual irrational) nilsequences. To work with these nilsequences, we also establish an arithmetic counting lemma that gives an integral formula for counting additive patterns weighted by such nilsequences.

The regularity lemma is a manifestation of the “dichotomy between structure and randomness”, as discussed for instance in my ICM article or FOCS article. In the degree {1} case {s=1}, this result is essentially due to Green. It is powered by the inverse conjecture for the Gowers norms, which we have recently established with Tamar Ziegler (paper to be forthcoming shortly; the {k=4} case of our argument is discussed here). The counting lemma is established through the quantitative equidistribution theory of nilmanifolds, which Ben and I set out in this paper.
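To get a feel for the degree {1} case, here is a crude toy decomposition on {{\bf Z}/N{\bf Z}} (my sketch, not the actual construction in the paper): the structured part is carried by the few large Fourier coefficients of {f}, and the remainder is uniform in the sense that all of its Fourier coefficients are small. In this toy model the small-{L^2} piece {f_{sml}} does not appear; it becomes necessary in the genuine lemma, when the structured part is upgraded to an honest virtual irrational nilsequence.

```python
import numpy as np

# Toy degree-1 "arithmetic regularity" on Z_N: split f into a structured
# part (few large Fourier modes) plus a Fourier-uniform remainder.
N = 1024
rng = np.random.default_rng(0)

n = np.arange(N)
f = 0.5 + 0.4 * np.cos(2 * np.pi * 3 * n / N) + 0.05 * rng.standard_normal(N)
f = np.clip(f, 0.0, 1.0)          # keep f taking values in [0,1]

fhat = np.fft.fft(f) / N          # normalized Fourier coefficients of f
threshold = 0.01
big = np.abs(fhat) >= threshold   # the few "structured" frequencies

# Structured part: invert the FFT keeping only the large modes.
f_str = np.fft.ifft(np.where(big, fhat, 0) * N).real
remainder = f - f_str

# The remainder is uniform: every one of its Fourier coefficients is small.
max_coeff = np.abs(np.fft.fft(remainder) / N).max()
assert max_coeff < threshold
print(f"{int(big.sum())} structured modes; largest remaining coefficient {max_coeff:.4f}")
```

Here the structured component plays the role of {f_{nil}} and the remainder plays the role of {f_{unf}}; for higher {s} the large Fourier modes must be replaced by nilsequences, which is where all the difficulty lies.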

The regularity and counting lemmas are designed to be used together, and in the paper we give three applications of this combination. Firstly, we give a new proof of Szemerédi’s theorem, which proceeds via an energy increment argument rather than a density increment one. Secondly, we establish a conjecture of Bergelson, Host, and Kra, namely that if {A \subset [N]} has density {\alpha}, and {\epsilon > 0}, then there exist {\gg_{\alpha,\epsilon} N} shifts {h} for which {A} contains at least {(\alpha^4 - \epsilon)N} arithmetic progressions of length {k=4} of spacing {h}. (The {k=3} case of this conjecture was established earlier by Green; the {k=5} case is false, as was shown by Ruzsa in an appendix to the Bergelson-Host-Kra paper.) Thirdly, we establish a variant of a recent result of Gowers-Wolf, showing that the true complexity of a system of linear forms over {[N]} indeed matches the conjectured value predicted in their first paper.

In all three applications, the scheme of proof can be described as follows:

  • Apply the arithmetic regularity lemma, and decompose a relevant function {f} into three pieces, {f_{nil}, f_{sml}, f_{unf}}.
  • The uniform part {f_{unf}} is so tiny in the Gowers uniformity norm that its contribution can be easily dealt with by an appropriate “generalised von Neumann theorem”.
  • The contribution of the (virtual, irrational) nilsequence {f_{nil}} can be controlled using the arithmetic counting lemma.
  • Finally, one needs to check that the contribution of the small error {f_{sml}} does not overwhelm the main term {f_{nil}}. This is the trickiest bit; one often needs to use the counting lemma again to show that one can find a set of arithmetic patterns for {f_{nil}} that is sufficiently “equidistributed” that it is not impacted by the small error.
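To illustrate the second step in the simplest setting, here is a toy degree {1} “generalised von Neumann theorem” over {{\bf Z}/N{\bf Z}} (my sketch, not the statement used in the paper): the normalized count of 3-term progressions weighted by a function {f} deviates from its main term by at most the largest nontrivial Fourier coefficient of {f} times its energy, so a Fourier-uniform component contributes negligibly.

```python
import numpy as np

N = 101  # odd, so k -> -2k permutes the frequencies mod N
rng = np.random.default_rng(1)
f = rng.random(N)  # an arbitrary [0,1]-valued function on Z_N

# Normalized 3-AP count (including trivial d=0): E_{n,d} f(n) f(n+d) f(n+2d).
count = np.mean([np.mean(f * np.roll(f, -d) * np.roll(f, -2 * d))
                 for d in range(N)])

fhat = np.fft.fft(f) / N
main = fhat[0].real ** 3           # contribution of the mean of f
unif = np.abs(fhat[1:]).max()      # largest nontrivial Fourier coefficient
energy = np.mean(f ** 2)           # = sum_k |fhat(k)|^2, by Parseval

# Generalised von Neumann (degree 1): the count deviates from the main
# term by at most unif * energy, by Fourier expansion and Cauchy-Schwarz.
assert abs(count - main) <= unif * energy + 1e-9
print(f"count {count:.4f}, main term {main:.4f}, error bound {unif * energy:.4f}")
```

For longer progressions the largest Fourier coefficient must be replaced by the Gowers norm {\|f\|_{U^{s+1}}}, which is exactly why {f_{unf}} is required to be tiny in that norm.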

To illustrate the last point, let us give the following example. Suppose we have a set {A \subset [N]} of some positive density (say {|A| = 10^{-1} N}) and we have managed to prove that {A} contains a reasonable number of arithmetic progressions of length {5} (say), e.g. it contains at least {10^{-10} N^2} such progressions. Now we perturb {A} by deleting a small number, say {10^{-2} N}, of elements from {A} to create a new set {A'}. Can we still conclude that the new set {A'} contains any arithmetic progressions of length {5}?

Unfortunately, the answer could be no; conceivably, all of the {10^{-10} N^2} arithmetic progressions in {A} could be wiped out by the {10^{-2} N} elements removed from {A}, since each such element of {A} could be associated with up to {|A|} (or even {5|A|}) arithmetic progressions in {A}.

But suppose we knew that the {10^{-10} N^2} arithmetic progressions in {A} were equidistributed, in the sense that each element in {A} belonged to the same number of such arithmetic progressions, namely {5 \times 10^{-9} N}. Then each element deleted from {A} only removes at most {5 \times 10^{-9} N} progressions, and so one can safely remove {10^{-2} N} elements from {A} and still retain some arithmetic progressions. The same argument works if the arithmetic progressions are only approximately equidistributed, in the sense that the number of progressions that a given element {a \in A} belongs to concentrates sharply around its mean (for instance, by having a small variance), provided that the equidistribution is sufficiently strong. Fortunately, the arithmetic regularity and counting lemmas are designed to give precisely such a strong equidistribution result.
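This deletion argument can be checked numerically on a toy example (the choice of the prime {p} and of the set below is purely illustrative, and is mine rather than the paper's): count the 5-term progressions in a subset of {{\bf Z}/p{\bf Z}}, measure how evenly the progressions are spread over the elements, and verify that deleting the busiest few elements cannot destroy all of the progressions.

```python
from collections import Counter

p = 101                   # a prime, so every nonzero d gives genuine 5-APs in Z_p
A = set(range(p // 2))    # a hypothetical test set of density about 1/2

# Enumerate all parametrized 5-term progressions n, n+d, ..., n+4d (d != 0)
# lying inside A, recording how many progressions each element lies in.
progressions = []
incidences = Counter()
for n in range(p):
    for d in range(1, p):
        P = [(n + j * d) % p for j in range(5)]
        if all(t in A for t in P):
            progressions.append(P)
            incidences.update(P)

total = len(progressions)
mean_inc = 5 * total / len(A)     # the "fair share" of progressions per element
worst = max(incidences.values())

# Delete the k busiest elements of A: this destroys at most k * worst
# progressions, so some must survive as long as k * worst < total.
k = 2
deleted = {x for x, _ in incidences.most_common(k)}
surviving = [P for P in progressions if not deleted.intersection(P)]
assert len(surviving) >= total - k * worst
assert surviving                  # progressions survive the deletion
print(f"{total} progressions; incidences: mean {mean_inc:.1f}, max {worst}; "
      f"{len(surviving)} survive after deleting {k} elements")
```

The closer the maximum incidence is to the mean, the more deletions the set can absorb, which is precisely the quantitative equidistribution that the regularity and counting lemmas supply.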

A succinct (but slightly inaccurate) summary of the regularity+counting lemma strategy would be that in order to solve a problem in additive combinatorics, it “suffices to check it for nilsequences”. But this should come with a caveat, due to the issue of the small error above; in addition to checking it for nilsequences, the answer in the nilsequence case must be sufficiently “dispersed” in a suitable sense, so that it can survive the addition of a small (but not completely negligible) perturbation.

One last “production note”. As with our previous paper with Emmanuel Breuillard, we used Subversion to write this paper, which turned out to be a significant efficiency boost as we could work on different parts of the paper simultaneously (this was particularly important this time round as the paper was somewhat lengthy and complicated, and there was a submission deadline). When doing so, we found it convenient to split the paper into a dozen or so pieces (one for each section of the paper, basically) in order to avoid conflicts, and to help coordinate the writing process. I’m also looking into git (a more advanced version control system), and am planning to use it for another of my joint projects; I hope to be able to comment on the relative strengths of these systems (and of plain old email) in the future.

In this lecture, we describe the simple but fundamental Furstenberg correspondence principle which connects the “soft analysis” subject of ergodic theory (in particular, recurrence theorems) with the “hard analysis” subject of combinatorial number theory (or more generally with results of “density Ramsey theory” type). Rather than try to set up the most general and abstract version of this principle, we shall instead study the canonical example of this principle in action, namely the equating of the Furstenberg multiple recurrence theorem with Szemerédi’s theorem on arithmetic progressions.