You are currently browsing the tag archive for the ‘Szemeredi’s theorem’ tag.

Ben Green and I have (finally!) uploaded to the arXiv our paper “New bounds for Szemerédi’s theorem, III: A polylogarithmic bound for “, submitted to Mathematika. This is the sequel to two previous papers (and an erratum to the former paper), concerning quantitative versions of Szemerédi’s theorem in the case of length four progressions. This sequel has been delayed for over a decade for a number of reasons, but we have finally managed to write the arguments up to our satisfaction and submit it (to a special issue of Mathematika honouring the work of Klaus Roth).

For any natural number , define to be the largest cardinality of a subset of which does not contain any non-trivial arithmetic progressions of length four (where “non-trivial” means that is non-zero). Trivially we have . In 1969, Szemerédi showed that . However, the decay rate that could be theoretically extracted from this argument (and from several subsequent proofs of this bound, including one by Roth) were quite poor. The first significant quantitative bound on this quantity was by Gowers, who showed that for some absolute constant . In the second paper in the above-mentioned series, we managed to improve this bound to . In this paper, we improve the bound further to , which seems to be the limit of the methods. (We remark that if we could take to be larger than one, this would imply the length four case of a well known conjecture of Erdös that any set of natural numbers whose sum of reciprocals diverges would contain arbitrarily long arithmetic progressions. Thanks to the work of Sanders and of Bloom, the corresponding case of the conjecture for length three conjectures is nearly settled, as it is known that for the analogous bound on one can take any less than one.)

Most of the previous work on bounding relied in some form or another on the *density increment argument* introduced by Roth back in 1953; roughly speaking, the idea is to show that if a dense subset of fails to contain arithmetic progressions of length four, one seeks to then locate a long subprogression of in which has increased density. This was the basic method for instance underlying our previous bound , as well as a finite field analogue of the bound ; however we encountered significant technical difficulties for several years in extending this argument to obtain the result of the current paper. Our method is instead based on “energy increment arguments”, and more specifically on establishing quantitative version of a Khintchine-type recurrence theorem, similar to the qualitative recurrence theorems established (in the ergodic theory context) by Bergelson-Host-Kra, and (in the current combinatorial context) by Ben Green and myself.

One way to phrase the latter recurrence theorem is as follows. Suppose that has density . Then one would expect a “randomly” selected arithmetic progression in (using the convention that random variables will be in boldface) to be contained in with probability about . This is not true in general, however it was shown by Ben and myself that for any , there was a set of shifts of cardinality , such that for any such one had

if was chosen uniformly at random from . This easily implies that , but does not give a particularly good bound on the decay rate, because the implied constant in the cardinality lower bound is quite poor (in fact of tower-exponential type, due to the use of regularity lemmas!), and so one has to take to be extremely large compared to to avoid the possibility that the set of shifts in the above theorem consists only of the trivial shift .

We do not know how to improve the lower bound on the set of shifts to the point where it can give bounds that are competitive with those in this paper. However, we can obtain better quantitative results if we permit ourselves to *couple* together the two parameters and of the length four progression. Namely, with , , as above, we are able to show that there exist random variables , not necessarily independent, such that

and such that we have the non-degeneracy bound

This then easily implies the main theorem.

The energy increment method is then deployed to locate a good pair of random variables that will obey the above bounds. One can get some intuition on how to proceed here by considering some model cases. Firstly one can consider a “globally quadratically structured” case in which the indicator function “behaves like” a globally quadratic function such as , for some irrational and some smooth periodic function of mean . If one then takes to be uniformly distributed in and respectively for some small , with no coupling between the two variables, then the left-hand side of (1) is approximately of the form

where the integral is with respect to the probability Haar measure, and the constraint ultimately arises from the algebraic constraint

However, an application of the Cauchy-Schwarz inequality and Fubini’s theorem shows that the integral in (2) is at least , which (morally at least) gives (1) in this case.

Due to the nature of the energy increment argument, it also becomes necessary to consider “locally quadratically structured” cases, in which is partitioned into some number of structured pieces (think of these as arithmetic progressions, or as “Bohr sets), and on each piece , behaves like a locally quadratic function such as , where now varies with , and the mean of will be approximately on the average after averaging in (weighted by the size of the pieces ). Now one should select and in the following coupled manner: first one chooses uniformly from , then one defines to be the label such that , and then selects uniformly from a set which is related to in much the same way that is related to . If one does this correctly, the analogue of (2) becomes

and one can again use Cauchy-Schwarz and Fubini’s theorem to conclude.

The general case proceeds, very roughly, by an iterative argument. At each stage of the iteration, one has some sort of quadratic model of which involves a decomposition of into structured pieces , and a quadratic approximation to on each piece. If this approximation is accurate enough (or more precisely, if a certain (averaged) local Gowers uniformity norm of the error is small enough) to model the count in (1) (for random variables determined by the above partition of into pieces ), and if the frequencies (such as ) involved in the quadratic approximation are “high rank” or “linearly independent over the rationals” in a suitably quantitative sense, then some version of the above arguments can be made to work. If there are some unwanted linear dependencies in the frequencies, we can do some linear algebra to eliminate one of the frequencies (using some geometry of numbers to keep the quantitative bounds under control) and continue the iteration. If instead the approximation is too inaccurate, then the error will be large in a certain averaged local Gowers uniformity norm . A significant fraction of the paper is then devoted to establishing a quantitative *inverse theorem* for that norm that concludes (with good bounds) that the error must then locally correlate with locally quadratic phases, which can be used to refine the quadratic approximation to in a manner that significantly increases its “energy” (basically an norm). Such energy increments cannot continue indefinitely, and when they terminate we obtain the desired claim.

There are existing inverse theorems for type norms in the literature, going back to the pioneering work of Gowers mentioned previously, and relying on arithmetic combinatorics tools such as Freiman’s theorem and the Balog-Szemerédi-Gowers lemma, which are good for analysing the “-structured homomorphisms” that arise in Gowers’ argument. However, when we applied these methods to the local Gowers norms we obtained inferior quantitative results that were not strong enough for our application. Instead, we use arguments from a different paper of Gowers in which he tackled Szemerédi’s theorem for arbitrary length progressions. This method produces “-structured homomorphisms” associated to any function with large Gowers uniformity norm; however the catch is that such homomorphisms are initially supported only on a sparse unstructured set, rather than a structured set such as a Bohr set. To proceed further, one first has to locate inside the sparse unstructured set a sparse *pseudorandom* subset of a Bohr set, and then use “error-correction” type methods (such as “majority-vote” based algorithms) to locally upgrade this -structured homomorphism on pseudorandom subsets of Bohr sets to a -structured homomorphism on the entirety of a Bohr set. It is then possible to use some “approximate cohomology” tools to “integrate” these homomorphisms (and discern a key “local symmetry” property of these homomorphisms) to locate the desired local quadratic structure (in much the same fashion that a -form on that varies linearly with the coordinates can be integrated to be the derivative of a quadratic function if we know that the -form is closed). These portions of the paper are unfortunately rather technical, but broadly follow the methods already used in previous literature.

Szemerédi’s theorem asserts that any subset of the integers of positive upper density contains arbitrarily large arithmetic progressions. Here is an equivalent quantitative form of this theorem:

Theorem 1 (Szemerédi’s theorem)Let be a positive integer, and let be a function with for some , where we use the averaging notation , , etc.. Then for we havefor some depending only on .

The equivalence is basically thanks to an averaging argument of Varnavides; see for instance Chapter 11 of my book with Van Vu or this previous blog post for a discussion. We have removed the cases as they are trivial and somewhat degenerate.

There are now many proofs of this theorem. Some time ago, I took an ergodic-theoretic proof of Furstenberg and converted it to a purely finitary proof of the theorem. The argument used some simplifying innovations that had been developed since the original work of Furstenberg (in particular, deployment of the Gowers uniformity norms, as well as a “dual” norm that I called the uniformly almost periodic norm, and an emphasis on van der Waerden’s theorem for handling the “compact extension” component of the argument). But the proof was still quite messy. However, as discussed in this previous blog post, messy finitary proofs can often be cleaned up using nonstandard analysis. Thus, there should be a nonstandard version of the Furstenberg ergodic theory argument that is relatively clean. I decided (after some encouragement from Ben Green and Isaac Goldbring) to write down most of the details of this argument in this blog post, though for sake of brevity I will skim rather quickly over arguments that were already discussed at length in other blog posts. In particular, I will presume familiarity with nonstandard analysis (in particular, the notion of a standard part of a bounded real number, and the Loeb measure construction), see for instance this previous blog post for a discussion.

Tamar Ziegler and I have just uploaded to the arXiv our joint paper “A multi-dimensional Szemerédi theorem for the primes via a correspondence principle“. This paper is related to an earlier result of Ben Green and mine in which we established that the primes contain arbitrarily long arithmetic progressions. Actually, in that paper we proved a more general result:

Theorem 1 (Szemerédi’s theorem in the primes)Let be a subset of the primes of positive relative density, thus . Then contains arbitrarily long arithmetic progressions.

This result was based in part on an earlier paper of Green that handled the case of progressions of length three. With the primes replaced by the integers, this is of course the famous theorem of Szemerédi.

Szemerédi’s theorem has now been generalised in many different directions. One of these is the multidimensional Szemerédi theorem of Furstenberg and Katznelson, who used ergodic-theoretic techniques to show that any dense subset of necessarily contained infinitely many constellations of any prescribed shape. Our main result is to relativise that theorem to the primes as well:

Theorem 2 (Multidimensional Szemerédi theorem in the primes)Let , and let be a subset of the Cartesian power of the primes of positive relative density, thusThen for any , contains infinitely many “constellations” of the form with and a positive integer.

In the case when is itself a Cartesian product of one-dimensional sets (in particular, if is all of ), this result already follows from Theorem 1, but there does not seem to be a similarly easy argument to deduce the general case of Theorem 2 from previous results. Simultaneously with this paper, an independent proof of Theorem 2 using a somewhat different method has been established by Cook, Maygar, and Titichetrakun.

The result is reminiscent of an earlier result of mine on finding constellations in the Gaussian primes (or dense subsets thereof). That paper followed closely the arguments of my original paper with Ben Green, namely it first enclosed (a W-tricked version of) the primes or Gaussian primes (in a sieve theoretic-sense) by a slightly larger set (or more precisely, a weight function ) of *almost primes* or *almost Gaussian primes*, which one could then verify (using methods closely related to the sieve-theoretic methods in the ongoing Polymath8 project) to obey certain pseudorandomness conditions, known as the *linear forms condition* and the *correlation condition*. Very roughly speaking, these conditions assert statements of the following form: if is a randomly selected integer, then the events of simultaneously being an almost prime (or almost Gaussian prime) are approximately independent for most choices of . Once these conditions are satisfied, one can then run a *transference argument* (initially based on ergodic-theory methods, but nowadays there are simpler transference results based on the Hahn-Banach theorem, due to Gowers and Reingold-Trevisan-Tulsiani-Vadhan) to obtain relative Szemerédi-type theorems from their absolute counterparts.

However, when one tries to adapt these arguments to sets such as , a new difficulty occurs: the natural analogue of the almost primes would be the Cartesian square of the almost primes – pairs whose entries are both almost primes. (Actually, for technical reasons, one does not work directly with a set of almost primes, but would instead work with a weight function such as that is concentrated on a set such as , but let me ignore this distinction for now.) However, this set does *not* enjoy as many pseudorandomness conditions as one would need for a direct application of the transference strategy to work. More specifically, given any fixed , and random , the four events

do *not* behave independently (as they would if were replaced for instance by the Gaussian almost primes), because any three of these events imply the fourth. This blocks the transference strategy for constellations which contain some right-angles to them (e.g. constellations of the form ) as such constellations soon turn into rectangles such as the one above after applying Cauchy-Schwarz a few times. (But a few years ago, Cook and Magyar showed that if one restricted attention to constellations which were in general position in the sense that any coordinate hyperplane contained at most one element in the constellation, then this obstruction does not occur and one can establish Theorem 2 in this case through the transference argument.) It’s worth noting that very recently, Conlon, Fox, and Zhao have succeeded in removing of the pseudorandomness conditions (namely the correlation condition) from the transference principle, leaving only the linear forms condition as the remaining pseudorandomness condition to be verified, but unfortunately this does not completely solve the above problem because the linear forms condition also fails for (or for weights concentrated on ) when applied to rectangular patterns.

There are now two ways known to get around this problem and establish Theorem 2 in full generality. The approach of Cook, Magyar, and Titichetrakun proceeds by starting with one of the known proofs of the multidimensional Szemerédi theorem – namely, the proof that proceeds through hypergraph regularity and hypergraph removal – and attach pseudorandom weights directly within the proof itself, rather than trying to add the weights to the *result* of that proof through a transference argument. (A key technical issue is that weights have to be added to all the levels of the hypergraph – not just the vertices and top-order edges – in order to circumvent the failure of naive pseudorandomness.) As one has to modify the entire proof of the multidimensional Szemerédi theorem, rather than use that theorem as a black box, the Cook-Magyar-Titichetrakun argument is lengthier than ours; on the other hand, it is more general and does not rely on some difficult theorems about primes that are used in our paper.

In our approach, we continue to use the multidimensional Szemerédi theorem (or more precisely, the equivalent theorem of Furstenberg and Katznelson concerning multiple recurrence for commuting shifts) as a black box. The difference is that instead of using a transference principle to connect the relative multidimensional Szemerédi theorem we need to the multiple recurrence theorem, we instead proceed by a version of the Furstenberg correspondence principle, similar to the one that connects the absolute multidimensional Szemerédi theorem to the multiple recurrence theorem. I had discovered this approach many years ago in an unpublished note, but had abandoned it because it required an *infinite* number of linear forms conditions (in contrast to the transference technique, which only needed a finite number of linear forms conditions and (until the recent work of Conlon-Fox-Zhao) a correlation condition). The reason for this infinite number of conditions is that the correspondence principle has to build a probability measure on an entire -algebra; for this, it is not enough to specify the measure of a single set such as , but one also has to specify the measure of “cylinder sets” such as where could be arbitrarily large. The larger gets, the more linear forms conditions one needs to keep the correspondence under control.

With the sieve weights we were using at the time, standard sieve theory methods could indeed provide a finite number of linear forms conditions, but not an infinite number, so my idea was abandoned. However, with my later work with Green and Ziegler on linear equations in primes (and related work on the Mobius-nilsequences conjecture and the inverse conjecture on the Gowers norm), Tamar and I realised that the primes themselves obey an infinite number of linear forms conditions, so one can basically use the primes (or a proxy for the primes, such as the von Mangoldt function ) as the enveloping sieve weight, rather than a classical sieve. Thus my old idea of using the Furstenberg correspondence principle to transfer Szemerédi-type theorems to the primes could actually be realised. In the one-dimensional case, this simply produces a much more complicated proof of Theorem 1 than the existing one; but it turns out that the argument works as well in higher dimensions and yields Theorem 2 relatively painlessly, except for the fact that it needs the results on linear equations in primes, the known proofs of which are extremely lengthy (and also require some of the transference machinery mentioned earlier). The problem of correlations in rectangles is avoided in the correspondence principle approach because one can compensate for such correlations by performing a suitable weighted limit to compute the measure of cylinder sets, with each requiring a different weighted correction. (This may be related to the Cook-Magyar-Titichetrakun strategy of weighting all of the facets of the hypergraph in order to recover pseudorandomness, although our contexts are rather different.)

Ben Green and I have just uploaded to the arXiv our paper “New bounds for Szemeredi’s theorem, Ia: Progressions of length 4 in finite field geometries revisited“, submitted to Proc. Lond. Math. Soc.. This is both an erratum to, and a replacement for, our previous paper “New bounds for Szemeredi’s theorem. I. Progressions of length 4 in finite field geometries“. The main objective in both papers is to bound the quantity for a vector space over a finite field of characteristic greater than , where is defined as the cardinality of the largest subset of that does not contain an arithmetic progression of length . In our earlier paper, we gave two arguments that bounded in the regime when the field was fixed and was large. The first “cheap” argument gave the bound

and the more complicated “expensive” argument gave the improvement

for some constant depending only on .

Unfortunately, while the cheap argument is correct, we discovered a subtle but serious gap in our expensive argument in the original paper. Roughly speaking, the strategy in that argument is to employ the *density increment method*: one begins with a large subset of that has no arithmetic progressions of length , and seeks to locate a subspace on which has a significantly increased density. Then, by using a “Koopman-von Neumann theorem”, ultimately based on an iteration of the inverse theorem of Ben and myself (and also independently by Samorodnitsky), one approximates by a “quadratically structured” function , which is (locally) a combination of a bounded number of quadratic phase functions, which one can prepare to be in a certain “locally equidistributed” or “locally high rank” form. (It is this reduction to the high rank case that distinguishes the “expensive” argument from the “cheap” one.) Because has no progressions of length , the count of progressions of length weighted by will also be small; by combining this with the theory of equidistribution of quadratic phase functions, one can then conclude that there will be a subspace on which has increased density.

The error in the paper was to conclude from this that the original function also had increased density on the same subspace; it turns out that the manner in which approximates is not strong enough to deduce this latter conclusion from the former. (One can strengthen the nature of approximation until one restores such a conclusion, but only at the price of deteriorating the quantitative bounds on one gets at the end of the day to be worse than the cheap argument.)

After trying unsuccessfully to repair this error, we eventually found an alternate argument, based on earlier papers of ourselves and of Bergelson-Host-Kra, that avoided the density increment method entirely and ended up giving a simpler proof of a stronger result than (1), and also gives the explicit value of for the exponent in (1). In fact, it gives the following stronger result:

Theorem 1Let be a subset of of density at least , and let . Then there is a subspace of of codimension such that the number of (possibly degenerate) progressions in is at least .

The bound (1) is an easy consequence of this theorem after choosing and removing the degenerate progressions from the conclusion of the theorem.

The main new idea is to work with a *local* Koopman-von Neumann theorem rather than a global one, trading a relatively weak global approximation to with a significantly stronger local approximation to on a subspace . This is somewhat analogous to how sometimes in graph theory it is more efficient (from the point of view of quantative estimates) to work with a local version of the Szemerédi regularity lemma which gives just a single regular pair of cells, rather than attempting to regularise almost all of the cells. This local approach is well adapted to the inverse theorem we use (which also has this local aspect), and also makes the reduction to the high rank case much cleaner. At the end of the day, one ends up with a fairly large subspace on which is quite dense (of density ) and which can be well approximated by a “pure quadratic” object, namely a function of a small number of quadratic phases obeying a high rank condition. One can then exploit a special positivity property of the count of length four progressions weighted by pure quadratic objects, essentially due to Bergelson-Host-Kra, which then gives the required lower bound.

A few days ago, Endre Szemerédi was awarded the 2012 Abel prize “for his fundamental contributions to discrete mathematics and theoretical computer science, and in recognition of the profound and lasting impact of these contributions on additive number theory and ergodic theory.” The full citation for the prize may be found here, and the written notes for a talk given by Tim Gowers on Endre’s work at the announcement may be found here (and video of the talk can be found here).

As I was on the Abel prize committee this year, I won’t comment further on the prize, but will instead focus on what is arguably Endre’s most well known result, namely Szemerédi’s theorem on arithmetic progressions:

Theorem 1 (Szemerédi’s theorem)Let be a set of integers of positive upper density, thus , where . Then contains an arithmetic progression of length for any .

Szemerédi’s original proof of this theorem is a remarkably intricate piece of combinatorial reasoning. Most proofs of theorems in mathematics – even long and difficult ones – generally come with a reasonably compact “high-level” overview, in which the proof is (conceptually, at least) broken down into simpler pieces. There may well be technical difficulties in formulating and then proving each of the component pieces, and then in fitting the pieces together, but usually the “big picture” is reasonably clear. To give just one example, the overall strategy of Perelman’s proof of the Poincaré conjecture can be briefly summarised as follows: to show that a simply connected three-dimensional manifold is homeomorphic to a sphere, place a Riemannian metric on it and perform Ricci flow, excising any singularities that arise by surgery, until the entire manifold becomes extinct. By reversing the flow and analysing the surgeries performed, obtain enough control on the topology of the original manifold to establish that it is a topological sphere.

In contrast, the pieces of Szemerédi’s proof are highly interlocking, particularly with regard to all the epsilon-type parameters involved; it takes quite a bit of notational setup and foundational lemmas before the key steps of the proof can even be stated, let alone proved. Szemerédi’s original paper contains a logical diagram of the proof (reproduced in Gowers’ recent talk) which already gives a fair indication of this interlocking structure. (Many years ago I tried to present the proof, but I was unable to find much of a simplification, and my exposition is probably not that much clearer than the original text.) Even the use of nonstandard analysis, which is often helpful in cleaning up armies of epsilons, turns out to be a bit tricky to apply here. (In typical applications of nonstandard analysis, one can get by with a single nonstandard universe, constructed as an ultrapower of the standard universe; but to correctly model all the epsilons occuring in Szemerédi’s argument, one needs to repeatedly perform the ultrapower construction to obtain a (finite) sequence of increasingly nonstandard (and increasingly saturated) universes, each one containing unbounded quantities that are far larger than any quantity that appears in the preceding universe, as discussed at the end of this previous blog post. This sequence of universes does end up concealing all the epsilons, but it is not so clear that this is a net gain in clarity for the proof; I may return to the nonstandard presentation of Szemeredi’s argument at some future juncture.)

Instead of trying to describe the entire argument here, I thought I would instead show some key components of it, with only the slightest hint as to how to assemble the components together to form the whole proof. In particular, I would like to show how two particular ingredients in the proof – namely van der Waerden’s theorem and the Szemerédi regularity lemma – become useful. For reasons that will hopefully become clearer later, it is convenient not only to work with ordinary progressions , but also progressions of progressions , progressions of progressions of progressions, and so forth. (In additive combinatorics, these objects are known as *generalised arithmetic progressions* of rank one, two, three, etc., and play a central role in the subject, although the way they are used in Szemerédi’s proof is somewhat different from the way that they are normally used in additive combinatorics.) Very roughly speaking, Szemerédi’s proof begins by building an enormous generalised arithmetic progression of high rank containing many elements of the set (arranged in a “near-maximal-density” configuration), and then steadily prunes this progression to improve the combinatorial properties of the configuration, until one ends up with a single rank one progression of length that consists entirely of elements of .

To illustrate some of the basic ideas, let us first consider a situation in which we have located a progression of progressions of length , with each progression , being quite long, and containing a near-maximal amount of elements of , thus

where is the “maximal density” of along arithmetic progressions. (There are a lot of subtleties in the argument about exactly how good the error terms are in various approximations, but we will ignore these issues for the sake of this discussion and just use the imprecise symbols such as instead.) By hypothesis, is positive. The objective is then to locate a progression in , with each in for . It may help to view the progression of progressions as a tall thin rectangle .

If we write for , then the problem is equivalent to finding a (possibly degenerate) arithmetic progression , with each in .

By hypothesis, we know already that each set has density about in :

Let us now make a “weakly mixing” assumption on the , which roughly speaking asserts that

for “most” subsets of of density of a certain form to be specified shortly. This is a plausible type of assumption if one believes to behave like a random set, and if the sets are constructed “independently” of the in some sense. Of course, we do not expect such an assumption to be valid all of the time, but we will postpone consideration of this point until later. Let us now see how this sort of weakly mixing hypothesis could help one count progressions of the desired form.

We will inductively consider the following (nonrigorously defined) sequence of claims for each :

- : For most choices of , there are arithmetic progressions in with the specified choice of , such that for all .

(Actually, to avoid boundary issues one should restrict to lie in the middle third of , rather than near the edges, but let us ignore this minor technical detail.) The quantity is natural here, given that there are arithmetic progressions in that pass through in the position, and that each one ought to have a probability of or so that the events simultaneously hold.) If one has the claim , then by selecting a typical in , we obtain a progression with for all , as required. (In fact, we obtain about such progressions by this method.)

We can heuristically justify the claims by induction on . For , the claims are clear just from direct counting of progressions (as long as we keep away from the edges of ). Now suppose that , and the claims have already been proven. For any and for most , we have from hypothesis that there are progressions in through with . Let be the set of all the values of attained by these progressions, then . Invoking the weak mixing hypothesis, we (heuristically, at least) conclude that for most choices of , we have

which then gives the desired claim .

The observant reader will note that we only needed the claim in the case for the above argument, but for technical reasons, the full proof requires one to work with more general values of (also the claim needs to be replaced by a more complicated version of itself, but let’s ignore this for sake of discussion).

We now return to the question of how to justify the weak mixing hypothesis (2). For a single block of , one can easily concoct a scenario in which this hypothesis fails, by choosing to overlap with too strongly, or to be too disjoint from . However, one can do better if one can select from a long progression of blocks. The starting point is the following simple double counting observation that gives the right upper bound:

Proposition 2 (Single upper bound)Let be a progression of progressions for some large . Suppose that for each , the set has density in (i.e. (1) holds). Let be a subset of of density . Then (if is large enough) one can find an such that

*Proof:* The key is the double counting identity

Because has maximal density and is large, we have

for each , and thus

The claim then follows from the pigeonhole principle.

Now suppose we want to obtain weak mixing not just for a single set , but for a small number of such sets, i.e. we wish to find an for which

for all , where is the density of in . The above proposition gives, for each , a choice of for which (3) holds, but it could be a different for each , and so it is not immediately obvious how to use Proposition 2 to find an for which (3) holds *simultaneously* for all . However, it turns out that the van der Waerden theorem is the perfect tool for this amplification:

Proposition 3 (Multiple upper bound)Let be a progression of progressions for some large . Suppose that for each , the set has density in (i.e. (1) holds). For each , let be a subset of of density . Then (if is large enough depending on ) one can find an such thatsimultaneously for all .

*Proof:* Suppose that the claim failed (for some suitably large ). Then, for each , there exists such that

This can be viewed as a colouring of the interval by colours. If we take large compared to , van der Waerden’s theorem allows us to then find a long subprogression of which is monochromatic, so that is constant on this progression. But then this will furnish a counterexample to Proposition 2.

One nice thing about this proposition is that the upper bounds can be automatically upgraded to an asymptotic:

Proposition 4 (Multiple mixing)Let be a progression of progressions for some large . Suppose that for each , the set has density in (i.e. (1) holds). For each , let be a subset of of density . Then (if is large enough depending on ) one can find an such thatsimultaneously for all .

*Proof:* By applying the previous proposition to the collection of sets and their complements (thus replacing with , one can find an for which

and

which gives the claim.

However, this improvement of Proposition 2 turns out to not be strong enough for applications. The reason is that the number of sets for which mixing is established is too small compared with the length of the progression one has to use in order to obtain that mixing. However, thanks to the magic of the Szemerédi regularity lemma, one can amplify the above proposition even further, to allow for a huge number of to be mixed (at the cost of excluding a small fraction of exceptions):

Proposition 5 (Really multiple mixing)Let be a progression of progressions for some large . Suppose that for each , the set has density in (i.e. (1) holds). For each in some (large) finite set , let be a subset of of density . Then (if is large enough, butnotdependent on the size of ) one can find an such thatsimultaneously for almost all .

*Proof:* We build a bipartite graph connecting the progression to the finite set by placing an edge between an element and an element whenever . The number can then be interpreted as the degree of in this graph, while the number is the number of neighbours of that land in .

We now apply the regularity lemma to this graph . Roughly speaking, what this lemma does is to partition and into almost equally sized cells and such that for most pairs of cells, the graph resembles a random bipartite graph of some density between these two cells. The key point is that the number of cells here is bounded uniformly in the size of and . As a consequence of this lemma, one can show that for most vertices in a typical cell , the number is approximately equal to

and the number is approximately equal to

The point here is that the different statistics are now controlled by a mere statistics (this is not unlike the use of principal component analysis in statistics, incidentally, but that is another story). Now, we invoke Proposition 4 to find an for which

simultaneously for all , and the claim follows.

This proposition now suggests a way forward to establish the type of mixing properties (2) needed for the preceding attempt at proving Szemerédi’s theorem to actually work. Whereas in that attempt, we were working with a single progression of progressions of progressions containing a near-maximal density of elements of , we will now have to work with a *family* of such progression of progressions, where ranges over some suitably large parameter set. Furthermore, in order to invoke Proposition 5, this family must be “well-arranged” in some arithmetic sense; in particular, for a given , it should be possible to find many reasonably large subfamilies of this family for which the terms of the progression of progressions in this subfamily are themselves in arithmetic progression. (Also, for technical reasons having to do with the fact that the sets in Proposition 5 are not allowed to depend on , one also needs the progressions for any given to be “similar” in the sense that they intersect in the same fashion (thus the sets as varies need to be translates of each other).) If one has this sort of family, then Proposition 5 allows us to “spend” some of the degrees of freedom of the parameter set in order to gain good mixing properties for at least one of the sets in the progression of progressions.

Of course, we still have to figure out how to get such large families of well-arranged progressions of progressions. Szemerédi’s solution was to begin by working with generalised progressions of a much larger rank than the rank progressions considered here; roughly speaking, to prove Szemerédi’s theorem for length progressions, one has to consider generalised progressions of rank as high as . It is possible by a reasonably straightforward (though somewhat delicate) “density increment argument” to locate a huge generalised progression of this rank which is “saturated” by in a certain rather technical sense (related to the concept of “near maximal density” used previously). Then, by another reasonably elementary argument, it is possible to locate inside a suitable large generalised progression of some rank , a family of large generalised progressions of rank which inherit many of the good properties of the original generalised progression, and which have the arithmetic structure needed for Proposition 5 to be applicable, at least for one value of . (But getting this sort of property for *all* values of simultaneously is tricky, and requires many careful iterations of the above scheme; there is also the problem that by obtaining good behaviour for one index , one may lose good behaviour at previous indices, leading to a sort of “Tower of Hanoi” situation which may help explain the exponential factor in the rank that is ultimately needed. It is an extremely delicate argument; all the parameters and definitions have to be set very precisely in order for the argument to work at all, and it is really quite remarkable that Endre was able to see it through to the end.)

In 1977, Furstenberg established his multiple recurrence theorem:

Theorem 1 (Furstenberg multiple recurrence)Let be a measure-preserving system, thus is a probability space and is a measure-preserving bijection such that and are both measurable. Let be a measurable subset of of positive measure . Then for any , there exists such thatEquivalently, there exists and such that

As is well known, the Furstenberg multiple recurrence theorem is equivalent to Szemerédi’s theorem, thanks to the Furstenberg correspondence principle; see for instance these lecture notes of mine.

The multiple recurrence theorem is proven, roughly speaking, by an induction on the “complexity” of the system . Indeed, for very simple systems, such as periodic systems (in which is the identity for some , which is for instance the case for the circle shift , with a rational shift ), the theorem is trivial; at a slightly more advanced level, *almost periodic* (or *compact*) systems (in which is a precompact subset of for every , which is for instance the case for irrational circle shifts), is also quite easy. One then shows that the multiple recurrence property is preserved under various *extension* operations (specifically, compact extensions, weakly mixing extensions, and limits of chains of extensions), which then gives the multiple recurrence theorem as a consequence of the *Furstenberg-Zimmer structure theorem* for measure-preserving systems. See these lecture notes for further discussion.

From a high-level perspective, this is still one of the most conceptual proofs known of Szemerédi’s theorem. However, the individual components of the proof are still somewhat intricate. Perhaps the most difficult step is the demonstration that the multiple recurrence property is preserved under *compact extensions*; see for instance these lecture notes, which is devoted entirely to this step. This step requires quite a bit of measure-theoretic and/or functional analytic machinery, such as the theory of disintegrations, relatively almost periodic functions, or Hilbert modules.

However, I recently realised that there is a special case of the compact extension step – namely that of *finite* extensions – which avoids almost all of these technical issues while still capturing the essence of the argument (and in particular, the key idea of using van der Waerden’s theorem). As such, this may serve as a pedagogical device for motivating this step of the proof of the multiple recurrence theorem.

Let us first explain what a finite extension is. Given a measure-preserving system , a finite set , and a measurable map from to the permutation group of , one can form the *finite extension*

which as a probability space is the product of with the finite probability space (with the discrete -algebra and uniform probability measure), and with shift map

One easily verifies that this is indeed a measure-preserving system. We refer to as the *cocycle* of the system.

An example of finite extensions comes from group theory. Suppose we have a short exact sequence

of finite groups. Let be a group element of , and let be its projection in . Then the shift map on (with the discrete -algebra and uniform probability measure) can be viewed as a finite extension of the shift map on (again with the discrete -algebra and uniform probability measure), by arbitrarily selecting a section that inverts the projection map, identifying with by identifying with for , and using the cocycle

Thus, for instance, the unit shift on can be thought of as a finite extension of the unit shift on whenever is a multiple of .

Another example comes from Riemannian geometry. If is a Riemannian manifold that is a finite cover of another Riemannian manifold (with the metric on being the pullback of that on ), then (unit time) geodesic flow on the cosphere bundle of is a finite extension of the corresponding flow on .

Here, then, is the finite extension special case of the compact extension step in the proof of the multiple recurrence theorem:

Proposition 2 (Finite extensions)Let be a finite extension of a measure-preserving system . If obeys the conclusion of the Furstenberg multiple recurrence theorem, then so does .

Before we prove this proposition, let us first give the combinatorial analogue.

Lemma 3Let be a subset of the integers that contains arbitrarily long arithmetic progressions, and let be a colouring of by colours (or equivalently, a partition of into colour classes ). Then at least one of the contains arbitrarily long arithmetic progressions.

*Proof:* By the infinite pigeonhole principle, it suffices to show that for each , one of the colour classes contains an arithmetic progression of length .

Let be a large integer (depending on and ) to be chosen later. Then contains an arithmetic progression of length , which may be identified with . The colouring of then induces a colouring on into colour classes. Applying (the finitary form of) van der Waerden’s theorem, we conclude that if is sufficiently large depending on and , then one of these colouring classes contains an arithmetic progression of length ; undoing the identification, we conclude that one of the contains an arithmetic progression of length , as desired.

Of course, by specialising to the case , we see that the above Lemma is in fact equivalent to van der Waerden’s theorem.

Now we prove Proposition 2.

*Proof:* Fix . Let be a positive measure subset of . By Fubini’s theorem, we have

where and is the fibre of at . Since is positive, we conclude that the set

is a positive measure subset of . Note for each , we can find an element such that . While not strictly necessary for this argument, one can ensure if one wishes that the function is measurable by totally ordering , and then letting the minimal element of for which .

Let be a large integer (which will depend on and the cardinality of ) to be chosen later. Because obeys the multiple recurrence theorem, we can find a positive integer and such that

Now consider the sequence of points

for . From (1), we see that

for some sequence . This can be viewed as a colouring of by colours, where is the cardinality of . Applying van der Waerden’s theorem, we conclude (if is sufficiently large depending on and ) that there is an arithmetic progression in with such that

for some . If we then let , we see from (2) that

for all , and the claim follows.

Remark 1The precise connection between Lemma 3 and Proposition 2 arises from the following observation: with as in the proof of Proposition 2, and , the setcan be partitioned into the classes

where is the graph of . The multiple recurrence property for ensures that contains arbitrarily long arithmetic progressions, and so therefore one of the must also, which gives the multiple recurrence property for .

Remark 2Compact extensions can be viewed as a generalisation of finite extensions, in which the fibres are no longer finite sets, but are themselves measure spaces obeying an additional property, which roughly speaking asserts that for many functions on the extension, the shifts of behave in an almost periodic fashion on most fibres, so that the orbits become totally bounded on each fibre. This total boundedness allows one to obtain an analogue of the above colouring map to which van der Waerden’s theorem can be applied.

In Notes 3, we saw that the number of additive patterns in a given set was (in principle, at least) controlled by *the Gowers uniformity norms* of functions associated to that set.

Such norms can be defined on any finite additive group (and also on some other types of domains, though we will not discuss this point here). In particular, they can be defined on the finite-dimensional vector spaces over a finite field .

In this case, the Gowers norms are closely tied to the space of polynomials of degree at most . Indeed, as noted in Exercise 20 of Notes 4, a function of norm has norm equal to if and only if for some ; thus polynomials solve the “ inverse problem” for the trivial inequality . They are also a crucial component of the solution to the “ inverse problem” and “ inverse problem”. For the former, we will soon show:

Proposition 1 ( inverse theorem for )Let be such that and for some . Then there exists such that , where is a constant depending only on .

Thus, for the Gowers norm to be almost completely saturated, one must be very close to a polynomial. The converse assertion is easily established:

Exercise 1 (Converse to inverse theorem for )If and for some , then , where is a constant depending only on .

In the world, one no longer expects to be close to a polynomial. Instead, one expects to *correlate* with a polynomial. Indeed, one has

Lemma 2 (Converse to the inverse theorem for )If and are such that , where , then .

*Proof:* From the definition of the norm (equation (18) from Notes 3), the monotonicity of the Gowers norms (Exercise 19 of Notes 3), and the polynomial phase modulation invariance of the Gowers norms (Exercise 21 of Notes 3), one has

and the claim follows.

In the high characteristic case at least, this can be reversed:

Theorem 3 ( inverse theorem for )Suppose that . If is such that and , then there exists such that .

This result is sometimes referred to as the *inverse conjecture for the Gowers norm* (in high, but bounded, characteristic). For small , the claim is easy:

Exercise 2Verify the cases of this theorem. (Hint:to verify the case, use the Fourier-analytic identities and , where is the space of all homomorphisms from to , and are the Fourier coefficients of .)

This conjecture for larger values of are more difficult to establish. The case of the theorem was established by Ben Green and myself in the high characteristic case ; the low characteristic case was independently and simultaneously established by Samorodnitsky. The cases in the high characteristic case was established in two stages, firstly using a modification of the Furstenberg correspondence principle, due to Ziegler and myself. to convert the problem to an ergodic theory counterpart, and then using a modification of the methods of Host-Kra and Ziegler to solve that counterpart, as done in this paper of Bergelson, Ziegler, and myself.

The situation with the low characteristic case in general is still unclear. In the high characteristic case, we saw from Notes 4 that one could replace the space of non-classical polynomials in the above conjecture with the essentially equivalent space of classical polynomials . However, as we shall see below, this turns out not to be the case in certain low characteristic cases (a fact first observed by Lovett, Meshulam, and Samorodnitsky, and independently by Ben Green and myself), for instance if and ; this is ultimately due to the existence in those cases of non-classical polynomials which exhibit no significant correlation with classical polynomials of equal or lesser degree. This distinction between classical and non-classical polynomials appears to be a rather non-trivial obstruction to understanding the low characteristic setting; it may be necessary to obtain a more complete theory of non-classical polynomials in order to fully settle this issue.

The inverse conjecture has a number of consequences. For instance, it can be used to establish the analogue of Szemerédi’s theorem in this setting:

Theorem 4 (Szemerédi’s theorem for finite fields)Let be a finite field, let , and let be such that . If is sufficiently large depending on , then contains an (affine) line for some with .

Exercise 3Use Theorem 4 to establish the following generalisation: with the notation as above, if and is sufficiently large depending on , then contains an affine -dimensional subspace.

We will prove this theorem in two different ways, one using a density increment method, and the other using an energy increment method. We discuss some other applications below the fold.

Ben Green, and I have just uploaded to the arXiv a short (six-page) paper “Yet another proof of Szemeredi’s theorem“, submitted to the 70th birthday conference proceedings for Endre Szemerédi. In this paper we put in print a folklore observation, namely that the inverse conjecture for the Gowers norm, together with the density increment argument, easily implies Szemerédi’s famous theorem on arithmetic progressions. This is unsurprising, given that Gowers’ proof of Szemerédi’s theorem proceeds through a weaker version of the inverse conjecture and a density increment argument, and also given that it is possible to derive Szemerédi’s theorem from knowledge of the characteristic factor for multiple recurrence (the ergodic theory analogue of the inverse conjecture, first established by Host and Kra), as was done by Bergelson, Leibman, and Lesigne (and also implicitly in the earlier paper of Bergelson, Host, and Kra); but to our knowledge the exact derivation of Szemerédi’s theorem from the inverse conjecture was not in the literature. Ordinarily this type of folklore might be considered too trifling (and too well known among experts in the field) to publish; but we felt that the venue of the Szemerédi birthday conference provided a natural venue for this particular observation.

The key point is that one can show (by an elementary argument relying primarily an induction on dimension argument and the Weyl recurrence theorem, i.e. that given any real and any integer , that the expression gets arbitrarily close to an integer) that given a (polynomial) nilsequence , one can subdivide any long arithmetic progression (such as ) into a number of medium-sized progressions, where the nilsequence is nearly constant on each progression. As a consequence of this and the inverse conjecture for the Gowers norm, if a set has no arithmetic progressions, then it must have an elevated density on a subprogression; iterating this observation as per the usual density-increment argument as introduced long ago by Roth, one obtains the claim. (This is very close to the scheme of Gowers’ proof.)

Technically, one might call this the shortest proof of Szemerédi’s theorem in the literature (and would be something like the sixteenth such genuinely distinct proof, by our count), but that would be cheating quite a bit, primarily due to the fact that it assumes the inverse conjecture for the Gowers norm, our current proof of which is checking in at about 100 pages…

Ben Green, and I have just uploaded to the arXiv our paper “An arithmetic regularity lemma, an associated counting lemma, and applications“, submitted (a little behind schedule) to the 70th birthday conference proceedings for Endre Szemerédi. In this paper we describe the general-degree version of the *arithmetic regularity lemma*, which can be viewed as the counterpart of the Szemerédi regularity lemma, in which the object being regularised is a function on a discrete interval rather than a graph, and the type of patterns one wishes to count are additive patterns (such as arithmetic progressions ) rather than subgraphs. Very roughly speaking, this regularity lemma asserts that all such functions can be decomposed as a degree nilsequence (or more precisely, a variant of a nilsequence that we call an *virtual irrational nilsequence*), plus a small error, plus a third error which is extremely tiny in the Gowers uniformity norm . In principle, at least, the latter two errors can be readily discarded in applications, so that the regularity lemma reduces many questions in additive combinatorics to questions concerning (virtual irrational) nilsequences. To work with these nilsequences, we also establish a *arithmetic counting lemma* that gives an integral formula for counting additive patterns weighted by such nilsequences.

The regularity lemma is a manifestation of the “dichotomy between structure and randomness”, as discussed for instance in my ICM article or FOCS article. In the degree case , this result is essentially due to Green. It is powered by the *inverse conjecture for the Gowers norms*, which we and Tamar Ziegler have recently established (paper to be forthcoming shortly; the case of our argument is discussed here). The counting lemma is established through the quantitative equidistribution theory of nilmanifolds, which Ben and I set out in this paper.

The regularity and counting lemmas are designed to be used together, and in the paper we give three applications of this combination. Firstly, we give a new proof of Szemerédi’s theorem, which proceeds via an energy increment argument rather than a density increment one. Secondly, we establish a conjecture of Bergelson, Host, and Kra, namely that if has density , and , then there exist shifts for which contains at least arithmetic progressions of length of spacing . (The case of this conjecture was established earlier by Green; the case is false, as was shown by Ruzsa in an appendix to the Bergelson-Host-Kra paper.) Thirdly, we establish a variant of a recent result of Gowers-Wolf, showing that the true complexity of a system of linear forms over indeed matches the conjectured value predicted in their first paper.

In all three applications, the scheme of proof can be described as follows:

- Apply the arithmetic regularity lemma, and decompose a relevant function into three pieces, .
- The uniform part is so tiny in the Gowers uniformity norm that its contribution can be easily dealt with by an appropriate “generalised von Neumann theorem”.
- The contribution of the (virtual, irrational) nilsequence can be controlled using the arithmetic counting lemma.
- Finally, one needs to check that the contribution of the small error does not overwhelm the main term . This is the trickiest bit; one often needs to use the counting lemma again to show that one can find a set of arithmetic patterns for that is so sufficiently “equidistributed” that it is not impacted by the small error.

To illustrate the last point, let us give the following example. Suppose we have a set of some positive density (say ) and we have managed to prove that contains a reasonable number of arithmetic progressions of length (say), e.g. it contains at least such progressions. Now we perturb by deleting a small number, say , elements from to create a new set . Can we still conclude that the new set contains any arithmetic progressions of length ?

Unfortunately, the answer could be no; conceivably, all of the arithmetic progressions in could be wiped out by the elements removed from , since each such element of could be associated with up to (or even ) arithmetic progressions in .

But suppose we knew that the arithmetic progressions in were *equidistributed*, in the sense that each element in belonged to the same number of such arithmetic progressions, namely . Then each element deleted from only removes at most progressions, and so one can safely remove elements from and still retain some arithmetic progressions. The same argument works if the arithmetic progressions are only *approximately* equidistributed, in the sense that the number of progressions that a given element belongs to concentrates sharply around its mean (for instance, by having a small variance), provided that the equidistribution is sufficiently strong. Fortunately, the arithmetic regularity and counting lemmas are designed to give precisely such a strong equidistribution result.

A succinct (but slightly inaccurate) summation of the regularity+counting lemma strategy would be that in order to solve a problem in additive combinatorics, it “suffices to check it for nilsequences”. But this should come with a caveat, due to the issue of the small error above; in addition to checking it for nilsequences, the answer in the nilsequence case must be sufficiently “dispersed” in a suitable sense, so that it can survive the addition of a small (but not completely negligible) perturbation.

One last “production note”. Like our previous paper with Emmanuel Breuillard, we used Subversion to write this paper, which turned out to be a significant efficiency boost as we could work on different parts of the paper simultaneously (this was particularly important this time round as the paper was somewhat lengthy and complicated, and there was a submission deadline). When doing so, we found it convenient to split the paper into a dozen or so pieces (one for each section of the paper, basically) in order to avoid conflicts, and to help coordinate the writing process. I’m also looking into git (a more advanced version control system), and am planning to use it for another of my joint projects; I hope to be able to comment on the relative strengths of these systems (and with plain old email) in the future.

In this lecture, we describe the simple but fundamental *Furstenberg correspondence principle* which connects the “soft analysis” subject of ergodic theory (in particular, recurrence theorems) with the “hard analysis” subject of combinatorial number theory (or more generally with results of “density Ramsey theory” type). Rather than try to set up the most general and abstract version of this principle, we shall instead study the canonical example of this principle in action, namely the equating of the *Furstenberg multiple recurrence theorem* with Szemerédi’s theorem on arithmetic progressions.

Read the rest of this entry »

## Recent Comments