You are currently browsing the tag archive for the ‘Szemeredi’s theorem’ tag.
Ben Green and I have just uploaded to the arXiv our paper “New bounds for Szemeredi’s theorem, Ia: Progressions of length 4 in finite field geometries revisited“, submitted to Proc. Lond. Math. Soc.. This is both an erratum to, and a replacement for, our previous paper “New bounds for Szemeredi’s theorem. I. Progressions of length 4 in finite field geometries“. The main objective in both papers is to bound the quantity for a vector space
over a finite field
of characteristic greater than
, where
is defined as the cardinality of the largest subset of
that does not contain an arithmetic progression of length
. In our earlier paper, we gave two arguments that bounded
in the regime when the field
was fixed and
was large. The first “cheap” argument gave the bound
and the more complicated “expensive” argument gave the improvement
depending only on
.
Unfortunately, while the cheap argument is correct, we discovered a subtle but serious gap in our expensive argument in the original paper. Roughly speaking, the strategy in that argument is to employ the density increment method: one begins with a large subset of
that has no arithmetic progressions of length
, and seeks to locate a subspace on which
has a significantly increased density. Then, by using a “Koopman-von Neumann theorem”, ultimately based on an iteration of the inverse
theorem of Ben and myself (and also independently by Samorodnitsky), one approximates
by a “quadratically structured” function
, which is (locally) a combination of a bounded number of quadratic phase functions, which one can prepare to be in a certain “locally equidistributed” or “locally high rank” form. (It is this reduction to the high rank case that distinguishes the “expensive” argument from the “cheap” one.) Because
has no progressions of length
, the count of progressions of length
weighted by
will also be small; by combining this with the theory of equidistribution of quadratic phase functions, one can then conclude that there will be a subspace on which
has increased density.
The error in the paper was to conclude from this that the original function also had increased density on the same subspace; it turns out that the manner in which
approximates
is not strong enough to deduce this latter conclusion from the former. (One can strengthen the nature of approximation until one restores such a conclusion, but only at the price of deteriorating the quantitative bounds on
one gets at the end of the day to be worse than the cheap argument.)
After trying unsuccessfully to repair this error, we eventually found an alternate argument, based on earlier papers of ourselves and of Bergelson-Host-Kra, that avoided the density increment method entirely and ended up giving a simpler proof of a stronger result than (1), and also gives the explicit value of for the exponent
in (1). In fact, it gives the following stronger result:
Theorem 1 Let
be a subset of
of density at least
, and let
. Then there is a subspace
of
of codimension
such that the number of (possibly degenerate) progressions
in
is at least
.
The bound (1) is an easy consequence of this theorem after choosing and removing the degenerate progressions from the conclusion of the theorem.
The main new idea is to work with a local Koopman-von Neumann theorem rather than a global one, trading a relatively weak global approximation to with a significantly stronger local approximation to
on a subspace
. This is somewhat analogous to how sometimes in graph theory it is more efficient (from the point of view of quantative estimates) to work with a local version of the Szemerédi regularity lemma which gives just a single regular pair of cells, rather than attempting to regularise almost all of the cells. This local approach is well adapted to the inverse
theorem we use (which also has this local aspect), and also makes the reduction to the high rank case much cleaner. At the end of the day, one ends up with a fairly large subspace
on which
is quite dense (of density
) and which can be well approximated by a “pure quadratic” object, namely a function of a small number of quadratic phases obeying a high rank condition. One can then exploit a special positivity property of the count of length four progressions weighted by pure quadratic objects, essentially due to Bergelson-Host-Kra, which then gives the required lower bound.
A few days ago, Endre Szemerédi was awarded the 2012 Abel prize “for his fundamental contributions to discrete mathematics and theoretical computer science, and in recognition of the profound and lasting impact of these contributions on additive number theory and ergodic theory.” The full citation for the prize may be found here, and the written notes for a talk given by Tim Gowers on Endre’s work at the announcement may be found here (and video of the talk can be found here).
As I was on the Abel prize committee this year, I won’t comment further on the prize, but will instead focus on what is arguably Endre’s most well known result, namely Szemerédi’s theorem on arithmetic progressions:
Theorem 1 (Szemerédi’s theorem) Let
be a set of integers of positive upper density, thus
, where
. Then
contains an arithmetic progression of length
for any
.
Szemerédi’s original proof of this theorem is a remarkably intricate piece of combinatorial reasoning. Most proofs of theorems in mathematics – even long and difficult ones – generally come with a reasonably compact “high-level” overview, in which the proof is (conceptually, at least) broken down into simpler pieces. There may well be technical difficulties in formulating and then proving each of the component pieces, and then in fitting the pieces together, but usually the “big picture” is reasonably clear. To give just one example, the overall strategy of Perelman’s proof of the Poincaré conjecture can be briefly summarised as follows: to show that a simply connected three-dimensional manifold is homeomorphic to a sphere, place a Riemannian metric on it and perform Ricci flow, excising any singularities that arise by surgery, until the entire manifold becomes extinct. By reversing the flow and analysing the surgeries performed, obtain enough control on the topology of the original manifold to establish that it is a topological sphere.
In contrast, the pieces of Szemerédi’s proof are highly interlocking, particularly with regard to all the epsilon-type parameters involved; it takes quite a bit of notational setup and foundational lemmas before the key steps of the proof can even be stated, let alone proved. Szemerédi’s original paper contains a logical diagram of the proof (reproduced in Gowers’ recent talk) which already gives a fair indication of this interlocking structure. (Many years ago I tried to present the proof, but I was unable to find much of a simplification, and my exposition is probably not that much clearer than the original text.) Even the use of nonstandard analysis, which is often helpful in cleaning up armies of epsilons, turns out to be a bit tricky to apply here. (In typical applications of nonstandard analysis, one can get by with a single nonstandard universe, constructed as an ultrapower of the standard universe; but to correctly model all the epsilons occuring in Szemerédi’s argument, one needs to repeatedly perform the ultrapower construction to obtain a (finite) sequence of increasingly nonstandard (and increasingly saturated) universes, each one containing unbounded quantities that are far larger than any quantity that appears in the preceding universe, as discussed at the end of this previous blog post. This sequence of universes does end up concealing all the epsilons, but it is not so clear that this is a net gain in clarity for the proof; I may return to the nonstandard presentation of Szemeredi’s argument at some future juncture.)
Instead of trying to describe the entire argument here, I thought I would instead show some key components of it, with only the slightest hint as to how to assemble the components together to form the whole proof. In particular, I would like to show how two particular ingredients in the proof – namely van der Waerden’s theorem and the Szemerédi regularity lemma – become useful. For reasons that will hopefully become clearer later, it is convenient not only to work with ordinary progressions , but also progressions of progressions
, progressions of progressions of progressions, and so forth. (In additive combinatorics, these objects are known as generalised arithmetic progressions of rank one, two, three, etc., and play a central role in the subject, although the way they are used in Szemerédi’s proof is somewhat different from the way that they are normally used in additive combinatorics.) Very roughly speaking, Szemerédi’s proof begins by building an enormous generalised arithmetic progression of high rank containing many elements of the set
(arranged in a “near-maximal-density” configuration), and then steadily prunes this progression to improve the combinatorial properties of the configuration, until one ends up with a single rank one progression of length
that consists entirely of elements of
.
To illustrate some of the basic ideas, let us first consider a situation in which we have located a progression of progressions of length
, with each progression
,
being quite long, and containing a near-maximal amount of elements of
, thus
where is the “maximal density” of
along arithmetic progressions. (There are a lot of subtleties in the argument about exactly how good the error terms are in various approximations, but we will ignore these issues for the sake of this discussion and just use the imprecise symbols such as
instead.) By hypothesis,
is positive. The objective is then to locate a progression
in
, with each
in
for
. It may help to view the progression of progressions
as a tall thin rectangle
.
If we write for
, then the problem is equivalent to finding a (possibly degenerate) arithmetic progression
, with each
in
.
By hypothesis, we know already that each set has density about
in
:
, which roughly speaking asserts that
of
of density
of a certain form to be specified shortly. This is a plausible type of assumption if one believes
to behave like a random set, and if the sets
are constructed “independently” of the
in some sense. Of course, we do not expect such an assumption to be valid all of the time, but we will postpone consideration of this point until later. Let us now see how this sort of weakly mixing hypothesis could help one count progressions
of the desired form.
We will inductively consider the following (nonrigorously defined) sequence of claims for each
:
-
: For most choices of
, there are
arithmetic progressions
in
with the specified choice of
, such that
for all
.
(Actually, to avoid boundary issues one should restrict to lie in the middle third of
, rather than near the edges, but let us ignore this minor technical detail.) The quantity
is natural here, given that there are
arithmetic progressions
in
that pass through
in the
position, and that each one ought to have a probability of
or so that the events
simultaneously hold.) If one has the claim
, then by selecting a typical
in
, we obtain a progression
with
for all
, as required. (In fact, we obtain about
such progressions by this method.)
We can heuristically justify the claims by induction on
. For
, the claims
are clear just from direct counting of progressions (as long as we keep
away from the edges of
). Now suppose that
, and the claims
have already been proven. For any
and for most
, we have from hypothesis that there are
progressions
in
through
with
. Let
be the set of all the values of
attained by these progressions, then
. Invoking the weak mixing hypothesis, we (heuristically, at least) conclude that for most choices of
, we have
which then gives the desired claim .
The observant reader will note that we only needed the claim in the case
for the above argument, but for technical reasons, the full proof requires one to work with more general values of
(also the claim
needs to be replaced by a more complicated version of itself, but let’s ignore this for sake of discussion).
We now return to the question of how to justify the weak mixing hypothesis (2). For a single block of
, one can easily concoct a scenario in which this hypothesis fails, by choosing
to overlap with
too strongly, or to be too disjoint from
. However, one can do better if one can select
from a long progression of blocks. The starting point is the following simple double counting observation that gives the right upper bound:
Proposition 2 (Single upper bound) Let
be a progression of progressions
for some large
. Suppose that for each
, the set
has density
in
(i.e. (1) holds). Let
be a subset of
of density
. Then (if
is large enough) one can find an
such that
Proof: The key is the double counting identity
Because has maximal density
and
is large, we have
for each , and thus
The claim then follows from the pigeonhole principle.
Now suppose we want to obtain weak mixing not just for a single set , but for a small number
of such sets, i.e. we wish to find an
for which
, where
is the density of
in
. The above proposition gives, for each
, a choice of
for which (3) holds, but it could be a different
for each
, and so it is not immediately obvious how to use Proposition 2 to find an
for which (3) holds simultaneously for all
. However, it turns out that the van der Waerden theorem is the perfect tool for this amplification:
Proposition 3 (Multiple upper bound) Let
be a progression of progressions
for some large
. Suppose that for each
, the set
has density
in
(i.e. (1) holds). For each
, let
be a subset of
of density
. Then (if
is large enough depending on
) one can find an
such that
simultaneously for all
.
Proof: Suppose that the claim failed (for some suitably large ). Then, for each
, there exists
such that
This can be viewed as a colouring of the interval by
colours. If we take
large compared to
, van der Waerden’s theorem allows us to then find a long subprogression of
which is monochromatic, so that
is constant on this progression. But then this will furnish a counterexample to Proposition 2.
One nice thing about this proposition is that the upper bounds can be automatically upgraded to an asymptotic:
Proposition 4 (Multiple mixing) Let
be a progression of progressions
for some large
. Suppose that for each
, the set
has density
in
(i.e. (1) holds). For each
, let
be a subset of
of density
. Then (if
is large enough depending on
) one can find an
such that
simultaneously for all
.
Proof: By applying the previous proposition to the collection of sets and their complements
(thus replacing
with
, one can find an
for which
and
which gives the claim.
However, this improvement of Proposition 2 turns out to not be strong enough for applications. The reason is that the number of sets
for which mixing is established is too small compared with the length
of the progression one has to use in order to obtain that mixing. However, thanks to the magic of the Szemerédi regularity lemma, one can amplify the above proposition even further, to allow for a huge number of
to be mixed (at the cost of excluding a small fraction of exceptions):
Proposition 5 (Really multiple mixing) Let
be a progression of progressions
for some large
. Suppose that for each
, the set
has density
in
(i.e. (1) holds). For each
in some (large) finite set
, let
be a subset of
of density
. Then (if
is large enough, but not dependent on the size of
) one can find an
such that
simultaneously for almost all
.
Proof: We build a bipartite graph connecting the progression
to the finite set
by placing an edge
between an element
and an element
whenever
. The number
can then be interpreted as the degree of
in this graph, while the number
is the number of neighbours of
that land in
.
We now apply the regularity lemma to this graph . Roughly speaking, what this lemma does is to partition
and
into almost equally sized cells
and
such that for most pairs
of cells, the graph
resembles a random bipartite graph of some density
between these two cells. The key point is that the number
of cells here is bounded uniformly in the size of
and
. As a consequence of this lemma, one can show that for most vertices
in a typical cell
, the number
is approximately equal to
and the number is approximately equal to
The point here is that the different statistics
are now controlled by a mere
statistics
(this is not unlike the use of principal component analysis in statistics, incidentally, but that is another story). Now, we invoke Proposition 4 to find an
for which
simultaneously for all , and the claim follows.
This proposition now suggests a way forward to establish the type of mixing properties (2) needed for the preceding attempt at proving Szemerédi’s theorem to actually work. Whereas in that attempt, we were working with a single progression of progressions of progressions containing a near-maximal density of elements of
, we will now have to work with a family
of such progression of progressions, where
ranges over some suitably large parameter set. Furthermore, in order to invoke Proposition 5, this family must be “well-arranged” in some arithmetic sense; in particular, for a given
, it should be possible to find many reasonably large subfamilies of this family for which the
terms
of the progression of progressions in this subfamily are themselves in arithmetic progression. (Also, for technical reasons having to do with the fact that the sets
in Proposition 5 are not allowed to depend on
, one also needs the progressions
for any given
to be “similar” in the sense that they intersect
in the same fashion (thus the sets
as
varies need to be translates of each other).) If one has this sort of family, then Proposition 5 allows us to “spend” some of the degrees of freedom of the parameter set
in order to gain good mixing properties for at least one of the sets
in the progression of progressions.
Of course, we still have to figure out how to get such large families of well-arranged progressions of progressions. Szemerédi’s solution was to begin by working with generalised progressions of a much larger rank than the rank
progressions considered here; roughly speaking, to prove Szemerédi’s theorem for length
progressions, one has to consider generalised progressions of rank as high as
. It is possible by a reasonably straightforward (though somewhat delicate) “density increment argument” to locate a huge generalised progression of this rank which is “saturated” by
in a certain rather technical sense (related to the concept of “near maximal density” used previously). Then, by another reasonably elementary argument, it is possible to locate inside a suitable large generalised progression of some rank
, a family of large generalised progressions of rank
which inherit many of the good properties of the original generalised progression, and which have the arithmetic structure needed for Proposition 5 to be applicable, at least for one value of
. (But getting this sort of property for all values of
simultaneously is tricky, and requires many careful iterations of the above scheme; there is also the problem that by obtaining good behaviour for one index
, one may lose good behaviour at previous indices, leading to a sort of “Tower of Hanoi” situation which may help explain the exponential factor in the rank
that is ultimately needed. It is an extremely delicate argument; all the parameters and definitions have to be set very precisely in order for the argument to work at all, and it is really quite remarkable that Endre was able to see it through to the end.)
In 1977, Furstenberg established his multiple recurrence theorem:
Theorem 1 (Furstenberg multiple recurrence) Let
be a measure-preserving system, thus
is a probability space and
is a measure-preserving bijection such that
and
are both measurable. Let
be a measurable subset of
of positive measure
. Then for any
, there exists
such that
Equivalently, there exists
and
such that
As is well known, the Furstenberg multiple recurrence theorem is equivalent to Szemerédi’s theorem, thanks to the Furstenberg correspondence principle; see for instance these lecture notes of mine.
The multiple recurrence theorem is proven, roughly speaking, by an induction on the “complexity” of the system . Indeed, for very simple systems, such as periodic systems (in which
is the identity for some
, which is for instance the case for the circle shift
,
with a rational shift
), the theorem is trivial; at a slightly more advanced level, almost periodic (or compact) systems (in which
is a precompact subset of
for every
, which is for instance the case for irrational circle shifts), is also quite easy. One then shows that the multiple recurrence property is preserved under various extension operations (specifically, compact extensions, weakly mixing extensions, and limits of chains of extensions), which then gives the multiple recurrence theorem as a consequence of the Furstenberg-Zimmer structure theorem for measure-preserving systems. See these lecture notes for further discussion.
From a high-level perspective, this is still one of the most conceptual proofs known of Szemerédi’s theorem. However, the individual components of the proof are still somewhat intricate. Perhaps the most difficult step is the demonstration that the multiple recurrence property is preserved under compact extensions; see for instance these lecture notes, which is devoted entirely to this step. This step requires quite a bit of measure-theoretic and/or functional analytic machinery, such as the theory of disintegrations, relatively almost periodic functions, or Hilbert modules.
However, I recently realised that there is a special case of the compact extension step – namely that of finite extensions – which avoids almost all of these technical issues while still capturing the essence of the argument (and in particular, the key idea of using van der Waerden’s theorem). As such, this may serve as a pedagogical device for motivating this step of the proof of the multiple recurrence theorem.
Let us first explain what a finite extension is. Given a measure-preserving system , a finite set
, and a measurable map
from
to the permutation group of
, one can form the finite extension
which as a probability space is the product of with the finite probability space
(with the discrete
-algebra and uniform probability measure), and with shift map
as the cocycle of the system.
An example of finite extensions comes from group theory. Suppose we have a short exact sequence
of finite groups. Let be a group element of
, and let
be its projection in
. Then the shift map
on
(with the discrete
-algebra and uniform probability measure) can be viewed as a finite extension of the shift map
on
(again with the discrete
-algebra and uniform probability measure), by arbitrarily selecting a section
that inverts the projection map, identifying
with
by identifying
with
for
, and using the cocycle
Thus, for instance, the unit shift on
can be thought of as a finite extension of the unit shift
on
whenever
is a multiple of
.
Another example comes from Riemannian geometry. If is a Riemannian manifold that is a finite cover of another Riemannian manifold
(with the metric on
being the pullback of that on
), then (unit time) geodesic flow on the cosphere bundle of
is a finite extension of the corresponding flow on
.
Here, then, is the finite extension special case of the compact extension step in the proof of the multiple recurrence theorem:
Proposition 2 (Finite extensions) Let
be a finite extension of a measure-preserving system
. If
obeys the conclusion of the Furstenberg multiple recurrence theorem, then so does
.
Before we prove this proposition, let us first give the combinatorial analogue.
Lemma 3 Let
be a subset of the integers that contains arbitrarily long arithmetic progressions, and let
be a colouring of
by
colours (or equivalently, a partition of
into
colour classes
). Then at least one of the
contains arbitrarily long arithmetic progressions.
Proof: By the infinite pigeonhole principle, it suffices to show that for each , one of the colour classes
contains an arithmetic progression of length
.
Let be a large integer (depending on
and
) to be chosen later. Then
contains an arithmetic progression of length
, which may be identified with
. The colouring of
then induces a colouring on
into
colour classes. Applying (the finitary form of) van der Waerden’s theorem, we conclude that if
is sufficiently large depending on
and
, then one of these colouring classes contains an arithmetic progression of length
; undoing the identification, we conclude that one of the
contains an arithmetic progression of length
, as desired.
Of course, by specialising to the case , we see that the above Lemma is in fact equivalent to van der Waerden’s theorem.
Now we prove Proposition 2.
Proof: Fix . Let
be a positive measure subset of
. By Fubini’s theorem, we have
where and
is the fibre of
at
. Since
is positive, we conclude that the set
is a positive measure subset of . Note for each
, we can find an element
such that
. While not strictly necessary for this argument, one can ensure if one wishes that the function
is measurable by totally ordering
, and then letting
the minimal element of
for which
.
Let be a large integer (which will depend on
and the cardinality
of
) to be chosen later. Because
obeys the multiple recurrence theorem, we can find a positive integer
and
such that
Now consider the sequence of points
for . From (1), we see that
. This can be viewed as a colouring of
by
colours, where
is the cardinality of
. Applying van der Waerden’s theorem, we conclude (if
is sufficiently large depending on
and
) that there is an arithmetic progression
in
with
such that
for some . If we then let
, we see from (2) that
for all , and the claim follows.
Remark 1 The precise connection between Lemma 3 and Proposition 2 arises from the following observation: with
as in the proof of Proposition 2, and
, the set
can be partitioned into the classes
where
is the graph of
. The multiple recurrence property for
ensures that
contains arbitrarily long arithmetic progressions, and so therefore one of the
must also, which gives the multiple recurrence property for
.
Remark 2 Compact extensions can be viewed as a generalisation of finite extensions, in which the fibres are no longer finite sets, but are themselves measure spaces obeying an additional property, which roughly speaking asserts that for many functions
on the extension, the shifts
of
behave in an almost periodic fashion on most fibres, so that the orbits
become totally bounded on each fibre. This total boundedness allows one to obtain an analogue of the above colouring map
to which van der Waerden’s theorem can be applied.
In Notes 3, we saw that the number of additive patterns in a given set was (in principle, at least) controlled by the Gowers uniformity norms of functions associated to that set.
Such norms can be defined on any finite additive group (and also on some other types of domains, though we will not discuss this point here). In particular, they can be defined on the finite-dimensional vector spaces over a finite field
.
In this case, the Gowers norms are closely tied to the space
of polynomials of degree at most
. Indeed, as noted in Exercise 20 of Notes 4, a function
of
norm
has
norm equal to
if and only if
for some
; thus polynomials solve the “
inverse problem” for the trivial inequality
. They are also a crucial component of the solution to the “
inverse problem” and “
inverse problem”. For the former, we will soon show:
Proposition 1 (
inverse theorem for
) Let
be such that
and
for some
. Then there exists
such that
, where
is a constant depending only on
.
Thus, for the Gowers norm to be almost completely saturated, one must be very close to a polynomial. The converse assertion is easily established:
Exercise 1 (Converse to
inverse theorem for
) If
and
for some
, then
, where
is a constant depending only on
.
In the world, one no longer expects to be close to a polynomial. Instead, one expects to correlate with a polynomial. Indeed, one has
Lemma 2 (Converse to the
inverse theorem for
) If
and
are such that
, where
, then
.
Proof: From the definition of the norm (equation (18) from Notes 3), the monotonicity of the Gowers norms (Exercise 19 of Notes 3), and the polynomial phase modulation invariance of the Gowers norms (Exercise 21 of Notes 3), one has
and the claim follows.
In the high characteristic case at least, this can be reversed:
Theorem 3 (
inverse theorem for
) Suppose that
. If
is such that
and
, then there exists
such that
.
This result is sometimes referred to as the inverse conjecture for the Gowers norm (in high, but bounded, characteristic). For small , the claim is easy:
Exercise 2 Verify the cases
of this theorem. (Hint: to verify the
case, use the Fourier-analytic identities
and
, where
is the space of all homomorphisms
from
to
, and
are the Fourier coefficients of
.)
This conjecture for larger values of are more difficult to establish. The
case of the theorem was established by Ben Green and myself in the high characteristic case
; the low characteristic case
was independently and simultaneously established by Samorodnitsky. The cases
in the high characteristic case was established in two stages, firstly using a modification of the Furstenberg correspondence principle, due to Ziegler and myself. to convert the problem to an ergodic theory counterpart, and then using a modification of the methods of Host-Kra and Ziegler to solve that counterpart, as done in this paper of Bergelson, Ziegler, and myself.
The situation with the low characteristic case in general is still unclear. In the high characteristic case, we saw from Notes 4 that one could replace the space of non-classical polynomials in the above conjecture with the essentially equivalent space of classical polynomials
. However, as we shall see below, this turns out not to be the case in certain low characteristic cases (a fact first observed by Lovett, Meshulam, and Samorodnitsky, and independently by Ben Green and myself), for instance if
and
; this is ultimately due to the existence in those cases of non-classical polynomials which exhibit no significant correlation with classical polynomials of equal or lesser degree. This distinction between classical and non-classical polynomials appears to be a rather non-trivial obstruction to understanding the low characteristic setting; it may be necessary to obtain a more complete theory of non-classical polynomials in order to fully settle this issue.
The inverse conjecture has a number of consequences. For instance, it can be used to establish the analogue of Szemerédi’s theorem in this setting:
Theorem 4 (Szemerédi’s theorem for finite fields) Let
be a finite field, let
, and let
be such that
. If
is sufficiently large depending on
, then
contains an (affine) line
for some
with
.
Exercise 3 Use Theorem 4 to establish the following generalisation: with the notation as above, if
and
is sufficiently large depending on
, then
contains an affine
-dimensional subspace.
We will prove this theorem in two different ways, one using a density increment method, and the other using an energy increment method. We discuss some other applications below the fold.
Ben Green, and I have just uploaded to the arXiv a short (six-page) paper “Yet another proof of Szemeredi’s theorem“, submitted to the 70th birthday conference proceedings for Endre Szemerédi. In this paper we put in print a folklore observation, namely that the inverse conjecture for the Gowers norm, together with the density increment argument, easily implies Szemerédi’s famous theorem on arithmetic progressions. This is unsurprising, given that Gowers’ proof of Szemerédi’s theorem proceeds through a weaker version of the inverse conjecture and a density increment argument, and also given that it is possible to derive Szemerédi’s theorem from knowledge of the characteristic factor for multiple recurrence (the ergodic theory analogue of the inverse conjecture, first established by Host and Kra), as was done by Bergelson, Leibman, and Lesigne (and also implicitly in the earlier paper of Bergelson, Host, and Kra); but to our knowledge the exact derivation of Szemerédi’s theorem from the inverse conjecture was not in the literature. Ordinarily this type of folklore might be considered too trifling (and too well known among experts in the field) to publish; but we felt that the venue of the Szemerédi birthday conference provided a natural venue for this particular observation.
The key point is that one can show (by an elementary argument relying primarily an induction on dimension argument and the Weyl recurrence theorem, i.e. that given any real and any integer
, that the expression
gets arbitrarily close to an integer) that given a (polynomial) nilsequence
, one can subdivide any long arithmetic progression (such as
) into a number of medium-sized progressions, where the nilsequence is nearly constant on each progression. As a consequence of this and the inverse conjecture for the Gowers norm, if a set has no arithmetic progressions, then it must have an elevated density on a subprogression; iterating this observation as per the usual density-increment argument as introduced long ago by Roth, one obtains the claim. (This is very close to the scheme of Gowers’ proof.)
Technically, one might call this the shortest proof of Szemerédi’s theorem in the literature (and would be something like the sixteenth such genuinely distinct proof, by our count), but that would be cheating quite a bit, primarily due to the fact that it assumes the inverse conjecture for the Gowers norm, our current proof of which is checking in at about 100 pages…
Ben Green, and I have just uploaded to the arXiv our paper “An arithmetic regularity lemma, an associated counting lemma, and applications“, submitted (a little behind schedule) to the 70th birthday conference proceedings for Endre Szemerédi. In this paper we describe the general-degree version of the arithmetic regularity lemma, which can be viewed as the counterpart of the Szemerédi regularity lemma, in which the object being regularised is a function on a discrete interval
rather than a graph, and the type of patterns one wishes to count are additive patterns (such as arithmetic progressions
) rather than subgraphs. Very roughly speaking, this regularity lemma asserts that all such functions can be decomposed as a degree
nilsequence (or more precisely, a variant of a nilsequence that we call an virtual irrational nilsequence), plus a small error, plus a third error which is extremely tiny in the Gowers uniformity norm
. In principle, at least, the latter two errors can be readily discarded in applications, so that the regularity lemma reduces many questions in additive combinatorics to questions concerning (virtual irrational) nilsequences. To work with these nilsequences, we also establish a arithmetic counting lemma that gives an integral formula for counting additive patterns weighted by such nilsequences.
The regularity lemma is a manifestation of the “dichotomy between structure and randomness”, as discussed for instance in my ICM article or FOCS article. In the degree case
, this result is essentially due to Green. It is powered by the inverse conjecture for the Gowers norms, which we and Tamar Ziegler have recently established (paper to be forthcoming shortly; the
case of our argument is discussed here). The counting lemma is established through the quantitative equidistribution theory of nilmanifolds, which Ben and I set out in this paper.
The regularity and counting lemmas are designed to be used together, and in the paper we give three applications of this combination. Firstly, we give a new proof of Szemerédi’s theorem, which proceeds via an energy increment argument rather than a density increment one. Secondly, we establish a conjecture of Bergelson, Host, and Kra, namely that if has density
, and
, then there exist
shifts
for which
contains at least
arithmetic progressions of length
of spacing
. (The
case of this conjecture was established earlier by Green; the
case is false, as was shown by Ruzsa in an appendix to the Bergelson-Host-Kra paper.) Thirdly, we establish a variant of a recent result of Gowers-Wolf, showing that the true complexity of a system of linear forms over
indeed matches the conjectured value predicted in their first paper.
In all three applications, the scheme of proof can be described as follows:
- Apply the arithmetic regularity lemma, and decompose a relevant function
into three pieces,
.
- The uniform part
is so tiny in the Gowers uniformity norm that its contribution can be easily dealt with by an appropriate “generalised von Neumann theorem”.
- The contribution of the (virtual, irrational) nilsequence
can be controlled using the arithmetic counting lemma.
- Finally, one needs to check that the contribution of the small error
does not overwhelm the main term
. This is the trickiest bit; one often needs to use the counting lemma again to show that one can find a set of arithmetic patterns for
that is so sufficiently “equidistributed” that it is not impacted by the small error.
To illustrate the last point, let us give the following example. Suppose we have a set of some positive density (say
) and we have managed to prove that
contains a reasonable number of arithmetic progressions of length
(say), e.g. it contains at least
such progressions. Now we perturb
by deleting a small number, say
, elements from
to create a new set
. Can we still conclude that the new set
contains any arithmetic progressions of length
?
Unfortunately, the answer could be no; conceivably, all of the arithmetic progressions in
could be wiped out by the
elements removed from
, since each such element of
could be associated with up to
(or even
) arithmetic progressions in
.
But suppose we knew that the arithmetic progressions in
were equidistributed, in the sense that each element in
belonged to the same number of such arithmetic progressions, namely
. Then each element deleted from
only removes at most
progressions, and so one can safely remove
elements from
and still retain some arithmetic progressions. The same argument works if the arithmetic progressions are only approximately equidistributed, in the sense that the number of progressions that a given element
belongs to concentrates sharply around its mean (for instance, by having a small variance), provided that the equidistribution is sufficiently strong. Fortunately, the arithmetic regularity and counting lemmas are designed to give precisely such a strong equidistribution result.
A succinct (but slightly inaccurate) summation of the regularity+counting lemma strategy would be that in order to solve a problem in additive combinatorics, it “suffices to check it for nilsequences”. But this should come with a caveat, due to the issue of the small error above; in addition to checking it for nilsequences, the answer in the nilsequence case must be sufficiently “dispersed” in a suitable sense, so that it can survive the addition of a small (but not completely negligible) perturbation.
One last “production note”. Like our previous paper with Emmanuel Breuillard, we used Subversion to write this paper, which turned out to be a significant efficiency boost as we could work on different parts of the paper simultaneously (this was particularly important this time round as the paper was somewhat lengthy and complicated, and there was a submission deadline). When doing so, we found it convenient to split the paper into a dozen or so pieces (one for each section of the paper, basically) in order to avoid conflicts, and to help coordinate the writing process. I’m also looking into git (a more advanced version control system), and am planning to use it for another of my joint projects; I hope to be able to comment on the relative strengths of these systems (and with plain old email) in the future.
In this lecture, we describe the simple but fundamental Furstenberg correspondence principle which connects the “soft analysis” subject of ergodic theory (in particular, recurrence theorems) with the “hard analysis” subject of combinatorial number theory (or more generally with results of “density Ramsey theory” type). Rather than try to set up the most general and abstract version of this principle, we shall instead study the canonical example of this principle in action, namely the equating of the Furstenberg multiple recurrence theorem with Szemerédi’s theorem on arithmetic progressions.
Read the rest of this entry »
A few months ago, I was invited to contribute an article to Scholarpedia – a Wikipedia-like experiment (using essentially the same software, in fact) in which the articles are far fewer in number, but have specialists as the primary authors (and curators) and are peer-reviewed in a manner similar to submissions to a research journal. Specifically, I was invited (with Ben Green) to author the article on Szemerédi’s theorem. The article is now submitted (awaiting review) reviewed and accepted, and can be viewed on the Scholarpedia page for that theorem. Like Wikipedia, the page is open to edits or any other comments by any user (once they register an account and login); but the edits are moderated by the curators and primary authors, who thus remain responsible for the content.
Scholarpedia seems to be an interesting experiment, trying to blend the collaborative and dynamic strengths of the wiki system with the traditional and static strengths of the peer-review system. At any rate, any feedback on my article with Ben, either at the Scholarpedia page or here, would be welcome.
[Update, July 9: the article has been reviewed, modified, and accepted in just three days - a blindingly fast speed as far as peer review goes!]
In this second lecture, I wish to talk about the dichotomy between structure and randomness as it manifests itself in four closely related areas of mathematics:
- Combinatorial number theory, which seeks to find patterns in unstructured dense sets (or colourings) of integers;
- Ergodic theory (or more specifically, multiple recurrence theory), which seeks to find patterns in positive-measure sets under the action of a discrete dynamical system on probability spaces (or more specifically, measure-preserving actions of the integers
);
- Graph theory, or more specifically the portion of this theory concerned with finding patterns in large unstructured dense graphs; and
- Ergodic graph theory, which is a very new and undeveloped subject, which roughly speaking seems to be concerned with the patterns within a measure-preserving action of the infinite permutation group
, which is one of several models we have available to study infinite “limits” of graphs.
The two “discrete” (or “finitary”, or “quantitative”) fields of combinatorial number theory and graph theory happen to be related to each other, basically by using the Cayley graph construction; I will give an example of this shortly. The two “continuous” (or “infinitary”, or “qualitative”) fields of ergodic theory and ergodic graph theory are at present only related on the level of analogy and informal intuition, but hopefully some more systematic connections between them will appear soon.
On the other hand, we have some very rigorous connections between combinatorial number theory and ergodic theory, and also (more recently) between graph theory and ergodic graph theory, basically by the procedure of viewing the infinitary continuous setting as a limit of the finitary discrete setting. These two connections go by the names of the Furstenberg correspondence principle and the graph correspondence principle respectively. These principles allow one to tap the power of the infinitary world (for instance, the ability to take limits and perform completions or closures of objects) in order to establish results in the finitary world, or at least to take the intuition gained in the infinitary world and transfer it to a finitary setting. Conversely, the finitary world provides an excellent model setting to refine one’s understanding of infinitary objects, for instance by establishing quantitative analogues of “soft” results obtained in an infinitary manner. I will remark here that this best-of-both-worlds approach, borrowing from both the finitary and infinitary traditions of mathematics, was absolutely necessary for Ben Green and I in order to establish our result on long arithmetic progressions in the primes. In particular, the infinitary setting is excellent for being able to rigorously define and study concepts (such as structure or randomness) which are much “fuzzier” and harder to pin down exactly in the finitary world.
Earlier this month, in the previous incarnation of this page, I posed a question which I thought was unsolved, and obtained the answer (in fact, it was solved 25 years ago) within a week. Now that this new version of the page has better feedback capability, I am now tempted to try again, since I have a large number of such questions which I would like to publicise. (Actually, I even have a secret web page full of these somewhere near my home page, though it will take a non-trivial amount of effort to find it!)
Perhaps my favourite open question is the problem on the maximal size of a cap set – a subset of (
being the finite field of three elements) which contains no lines, or equivalently no non-trivial arithmetic progressions of length three. As an upper bound, one can easily modify the proof of Roth’s theorem to show that cap sets must have size
(see e.g. this paper of Meshulam). This of course is better than the trivial bound of
once n is large. In the converse direction, the trivial example
shows that cap sets can be as large as
; the current world record is
, held by Edel. The gap between these two bounds is rather enormous; I would be very interested in either an improvement of the upper bound to
, or an improvement of the lower bound to
. (I believe both improvements are true, though a good friend of mine disagrees about the improvement to the lower bound.)

Recent Comments