A few days ago, Endre Szemerédi was awarded the 2012 Abel prize “for his fundamental contributions to discrete mathematics and theoretical computer science, and in recognition of the profound and lasting impact of these contributions on additive number theory and ergodic theory.” The full citation for the prize may be found here, and the written notes for a talk given by Tim Gowers on Endre’s work at the announcement may be found here (and video of the talk can be found here).
As I was on the Abel prize committee this year, I won’t comment further on the prize, but will instead focus on what is arguably Endre’s most well known result, namely Szemerédi’s theorem on arithmetic progressions:
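Theorem 1 (Szemerédi’s theorem) Let $A$ be a set of integers of positive upper density, thus $\limsup_{N \to \infty} \frac{|A \cap [-N,N]|}{|[-N,N]|} > 0$, where $[-N,N] := \{-N, -N+1, \ldots, N\}$. Then $A$ contains an arithmetic progression of length $k$ for any $k \geq 1$.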
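To make the conclusion of the theorem concrete, here is a minimal brute-force sketch in Python that searches a finite set for a progression of a given length. It is only an illustration of what is being asserted, not a part of Szemerédi's argument; the function name `find_progression` is hypothetical, and the search is restricted to finite sets and to $k \geq 2$.

```python
from typing import Iterable, Optional, Tuple


def find_progression(A: Iterable[int], k: int) -> Optional[Tuple[int, int]]:
    """Return (a, r) with r > 0 and a, a+r, ..., a+(k-1)r all in A, or None."""
    if k < 2:
        raise ValueError("this sketch only handles k >= 2")
    members = set(A)
    elements = sorted(members)
    if len(elements) < k:
        return None
    span = elements[-1] - elements[0]
    for a in elements:
        # The common difference cannot exceed span // (k - 1).
        for r in range(1, span // (k - 1) + 1):
            if all(a + i * r in members for i in range(k)):
                return a, r
    return None


# A dense, structured set versus a very sparse one.
print(find_progression(range(0, 30, 3), 5))
print(find_progression([1, 2, 4, 8, 16, 32], 3))
```

The second call illustrates why some density hypothesis is needed: the powers of two contain no three-term progression at all.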
Szemerédi’s original proof of this theorem is a remarkably intricate piece of combinatorial reasoning. Most proofs of theorems in mathematics – even long and difficult ones – generally come with a reasonably compact “high-level” overview, in which the proof is (conceptually, at least) broken down into simpler pieces. There may well be technical difficulties in formulating and then proving each of the component pieces, and then in fitting the pieces together, but usually the “big picture” is reasonably clear. To give just one example, the overall strategy of Perelman’s proof of the Poincaré conjecture can be briefly summarised as follows: to show that a simply connected three-dimensional manifold is homeomorphic to a sphere, place a Riemannian metric on it and perform Ricci flow, excising any singularities that arise by surgery, until the entire manifold becomes extinct. By reversing the flow and analysing the surgeries performed, obtain enough control on the topology of the original manifold to establish that it is a topological sphere.
In contrast, the pieces of Szemerédi’s proof are highly interlocking, particularly with regard to all the epsilon-type parameters involved; it takes quite a bit of notational setup and foundational lemmas before the key steps of the proof can even be stated, let alone proved. Szemerédi’s original paper contains a logical diagram of the proof (reproduced in Gowers’ recent talk) which already gives a fair indication of this interlocking structure. (Many years ago I tried to present the proof, but I was unable to find much of a simplification, and my exposition is probably not that much clearer than the original text.) Even the use of nonstandard analysis, which is often helpful in cleaning up armies of epsilons, turns out to be a bit tricky to apply here. (In typical applications of nonstandard analysis, one can get by with a single nonstandard universe, constructed as an ultrapower of the standard universe; but to correctly model all the epsilons occurring in Szemerédi’s argument, one needs to repeatedly perform the ultrapower construction to obtain a (finite) sequence of increasingly nonstandard (and increasingly saturated) universes, each one containing unbounded quantities that are far larger than any quantity that appears in the preceding universe, as discussed at the end of this previous blog post. This sequence of universes does end up concealing all the epsilons, but it is not so clear that this is a net gain in clarity for the proof; I may return to the nonstandard presentation of Szemerédi’s argument at some future juncture.)
Instead of trying to describe the entire argument here, I thought I would show some key components of it, with only the slightest hint as to how to assemble the components together to form the whole proof. In particular, I would like to show how two particular ingredients in the proof, namely van der Waerden’s theorem and the Szemerédi regularity lemma, become useful. For reasons that will hopefully become clearer later, it is convenient not only to work with ordinary progressions $P_1 = \{a, a+r_1, a+2r_1, \ldots, a+(k_1-1)r_1\}$, but also progressions of progressions $P_2 := \{P_1, P_1+r_2, P_1+2r_2, \ldots, P_1+(k_2-1)r_2\}$, progressions of progressions of progressions, and so forth. (In additive combinatorics, these objects are known as generalised arithmetic progressions of rank one, two, three, etc., and play a central role in the subject, although the way they are used in Szemerédi’s proof is somewhat different from the way that they are normally used in additive combinatorics.) Very roughly speaking, Szemerédi’s proof begins by building an enormous generalised arithmetic progression of high rank containing many elements of the set $A$ (arranged in a “near-maximal-density” configuration), and then steadily prunes this progression to improve the combinatorial properties of the configuration, until one ends up with a single rank one progression of length $k$ that consists entirely of elements of $A$.
To illustrate some of the basic ideas, let us first consider a situation in which we have located a progression $P, P+r, \ldots, P+(k-1)r$ of progressions of length $k$, with each progression $P+ir$, $i=0,\ldots,k-1$ being quite long, and containing a near-maximal number of elements of $A$, thus

$$|A \cap (P+ir)| \approx \delta |P|$$

where $\delta := \limsup_{|P| \to \infty} \frac{|A \cap P|}{|P|}$ is the “maximal density” of $A$ along arithmetic progressions. (There are a lot of subtleties in the argument about exactly how good the error terms are in various approximations, but we will ignore these issues for the sake of this discussion and just use imprecise symbols such as $\approx$ instead.) By hypothesis, $\delta$ is positive. The objective is then to locate a progression $a, a+r', \ldots, a+(k-1)r'$ in $A$, with each $a+ir'$ in $P+ir$ for $i=0,\ldots,k-1$. It may help to view the progression of progressions $P, P+r, \ldots, P+(k-1)r$ as a tall thin rectangle $P \times \{0,\ldots,k-1\}$.
If we write $A_i := \{a \in P: a+ir \in A\}$ for $i=0,\ldots,k-1$, then the problem is equivalent to finding a (possibly degenerate) arithmetic progression $a_0, a_1, \ldots, a_{k-1}$ in $P$, with each $a_i$ in $A_i$.
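The reduction just described can be sketched in a few lines of Python. This is only a toy illustration under simplifying assumptions: $P$ is taken to be a short interval, the set $A$ is a small hand-picked example, and the helper names `layers` and `find_across_layers` are hypothetical. The sets $A_i = \{a \in P: a+ir \in A\}$ are computed directly, and a progression $a_0, a_0+s, \ldots$ with $i$-th term in $A_i$ is translated back into a progression in $A$ of step $r+s$.

```python
from typing import List, Optional, Set, Tuple


def layers(A: Set[int], P: List[int], r: int, k: int) -> List[Set[int]]:
    """Return [A_0, ..., A_{k-1}] where A_i = {a in P : a + i*r in A}."""
    return [{a for a in P if a + i * r in A} for i in range(k)]


def find_across_layers(A_layers: List[Set[int]]) -> Optional[Tuple[int, int]]:
    """Find (a0, s) with a0 + i*s in A_i for every i; s = 0 (degenerate) is allowed.

    Assumes at least two layers.
    """
    k = len(A_layers)
    for a0 in sorted(A_layers[0]):
        for a1 in sorted(A_layers[1]):
            s = a1 - a0
            if all(a0 + i * s in A_layers[i] for i in range(k)):
                return a0, s
    return None


A = {3, 5, 17, 21, 29}          # toy set (an assumption for illustration)
P = list(range(10))             # the base progression P = {0, ..., 9}
r, k = 10, 3                    # blocks P, P + 10, P + 20
A_layers = layers(A, P, r, k)
found = find_across_layers(A_layers)
if found is not None:
    a0, s = found
    # Undo the shift by i*r to get a genuine progression inside A.
    print([a0 + i * (r + s) for i in range(k)])
```

Note that the step $s$ found inside $P$ may be zero or negative; the corresponding progression in $A$ has step $r+s$, which is why the progression inside $P$ is allowed to be degenerate.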
By hypothesis, we know already that each set $A_i$ has density about $\delta$ in $P$:

$$|A_i| \approx \delta |P|. \qquad (1)$$
Let us now make a “weakly mixing” assumption on the $A_i$, which roughly speaking asserts that

$$|A_i \cap E| \approx \delta \sigma |P| \qquad (2)$$

for “most” subsets $E$ of $P$ of density $\approx \sigma$ of a certain form to be specified shortly. This is a plausible type of assumption if one believes $A_i$ to behave like a random set, and if the sets $E$ are constructed “independently” of the $A_i$ in some sense. Of course, we do not expect such an assumption to be valid all of the time, but we will postpone consideration of this point until later. Let us now see how this sort of weakly mixing hypothesis could help one count progressions $a_0, \ldots, a_{k-1}$ of the desired form.
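We will inductively consider the following (nonrigorously defined) sequence of claims $C(i,j)$ for each $0 \leq i \leq j < k$:
- $C(i,j)$: For most choices of $a_j \in P$, there are $\approx \delta^i |P|$ arithmetic progressions $a_0, \ldots, a_{k-1}$ in $P$ with the specified choice of $a_j$, such that $a_l \in A_l$ for all $l = 0, \ldots, i-1$.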
(Actually, to avoid boundary issues one should restrict $a_j$ to lie in the middle third of $P$, rather than near the edges, but let us ignore this minor technical detail.) The quantity $\delta^i |P|$ is natural here, given that there are $\approx |P|$ arithmetic progressions $a_0, \ldots, a_{k-1}$ in $P$ that pass through $a_j$ in the $j^{th}$ position, and that each one ought to have a probability of $\delta^i$ or so that the events $a_0 \in A_0, \ldots, a_{i-1} \in A_{i-1}$ simultaneously hold. If one has the claim $C(k-1,k-1)$, then by selecting a typical $a_{k-1}$ in $A_{k-1}$, we obtain a progression $a_0, \ldots, a_{k-1}$ with $a_i \in A_i$ for all $i=0,\ldots,k-1$, as required. (In fact, we obtain about $\delta^k |P|^2$ such progressions by this method.)
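As a sanity check on this heuristic count, one can simulate the idealised situation in which the sets $A_i$ really are independent random subsets of $P$ of density $\delta$. The following Python sketch does this under loudly stated assumptions: $P$ is modelled as $\{0,\ldots,N-1\}$, the layers are generated with a fixed pseudo-random seed, and the names `count_progressions` and `random_layers` and all parameter values are hypothetical. It compares the observed number of progressions with $\delta^k$ times the total number of (possibly degenerate) progressions in $P$, which is comparable to $|P|^2$ up to a constant depending on $k$.

```python
import random
from typing import List, Set


def count_progressions(N: int, A_layers: List[Set[int]]) -> int:
    """Count pairs (a0, s), s any integer, with every a0 + i*s in {0..N-1} and in A_i."""
    k = len(A_layers)
    count = 0
    for a0 in range(N):
        for s in range(-(N - 1), N):
            last = a0 + (k - 1) * s
            # If the first and last terms lie in {0..N-1}, so do all the others.
            if 0 <= last < N and all(a0 + i * s in A_layers[i] for i in range(k)):
                count += 1
    return count


def random_layers(N: int, k: int, delta: float, seed: int = 0) -> List[Set[int]]:
    """Independent random subsets of {0..N-1}, each of density about delta."""
    rng = random.Random(seed)
    return [{a for a in range(N) if rng.random() < delta} for _ in range(k)]


N, k, delta = 120, 3, 0.5
total = count_progressions(N, [set(range(N))] * k)   # all progressions in P
observed = count_progressions(N, random_layers(N, k, delta))
print("all progressions:", total)
print("observed in random layers:", observed)
print("heuristic prediction:", delta ** k * total)
```

For genuinely random layers the two numbers should be of comparable size; the whole difficulty in the actual proof is that the sets $A_i$ are not random, which is where the weak mixing hypothesis enters.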
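We can heuristically justify the claims $C(i,j)$ by induction on $i$. For $i=0$, the claims $C(0,j)$ are clear just from direct counting of progressions (as long as we keep $a_j$ away from the edges of $P$). Now suppose that $i>0$, and the claims $C(i-1,j)$ have already been proven. For any $i \leq j < k$ and for most $a_j \in P$, we have from the induction hypothesis that there are $\approx \delta^{i-1} |P|$ progressions $a_0, \ldots, a_{k-1}$ in $P$ through $a_j$ with $a_0 \in A_0, \ldots, a_{i-2} \in A_{i-2}$. Let $E = E(a_j)$ be the set of all the values of $a_{i-1}$ attained by these progressions, then $|E| \approx \delta^{i-1} |P|$. Invoking the weak mixing hypothesis, we (heuristically, at least) conclude that for most choices of $a_j$, we have

$$|A_{i-1} \cap E| \approx \delta^i |P|$$

which then gives the desired claim $C(i,j)$.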
The observant reader will note that we only needed the claim $C(i,j)$ in the case $j=k-1$ for the above argument, but for technical reasons, the full proof requires one to work with more general values of $j$ (also the claim $C(i,j)$ needs to be replaced by a more complicated version of itself, but let’s ignore this for the sake of discussion).
We now return to the question of how to justify the weak mixing hypothesis (2). For a single block $A_i$ of $A$, one can easily concoct a scenario in which this hypothesis fails, by choosing $E$ to overlap with $A_i$ too strongly, or to be too disjoint from $A_i$. However, one can do better if one can select $A_i$ from a long progression of blocks. The starting point is the following simple double counting observation that gives the right upper bound:
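Proposition 2 (Single upper bound) Let $P, P+r, \ldots, P+(M-1)r$ be a progression of progressions $P+ir$ for some large $M$. Suppose that for each $i=0,\ldots,M-1$, the set $A_i := \{a \in P: a+ir \in A\}$ has density $\approx \delta$ in $P$ (i.e. (1) holds). Let $E$ be a subset of $P$ of density $\approx \sigma$. Then (if $M$ is large enough) one can find an $i = 0,\ldots,M-1$ such that

$$|A_i \cap E| \lessapprox \delta \sigma |P|.$$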
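Proof: The key is the double counting identity

$$\sum_{i=0}^{M-1} |A_i \cap E| = \sum_{a \in E} |A \cap \{a, a+r, \ldots, a+(M-1)r\}|.$$

Because $A$ has maximal density $\delta$ and $M$ is large, we have

$$|A \cap \{a, a+r, \ldots, a+(M-1)r\}| \lessapprox \delta M$$

for each $a$, and thus

$$\sum_{i=0}^{M-1} |A_i \cap E| \lessapprox \delta M |E|.$$

The claim then follows from the pigeonhole principle.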
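The double counting step is an exact identity, so it can be checked numerically. Here is a small Python sketch, assuming $P = \{0,\ldots,49\}$, $r = 50$, and pseudo-random sets $A$ and $E$ chosen purely for illustration (the function name `double_count` and all parameters are hypothetical): both sides count the pairs $(a,i)$ with $a \in E$ and $a+ir \in A$, and the pigeonhole principle then produces an index $i$ whose intersection $|A_i \cap E|$ is at most the average.

```python
import random
from typing import Set, Tuple


def double_count(A: Set[int], P: Set[int], E: Set[int], r: int, M: int) -> Tuple[int, int]:
    """Return both sides of the identity; E is assumed to be a subset of P."""
    # Left side: sum over blocks i of |A_i ∩ E|, with A_i = {a in P : a + i*r in A}.
    lhs = sum(len({a for a in P if a + i * r in A} & E) for i in range(M))
    # Right side: sum over a in E of |A ∩ {a, a+r, ..., a+(M-1)r}|.
    rhs = sum(sum(1 for i in range(M) if a + i * r in A) for a in E)
    return lhs, rhs


rng = random.Random(1)
P = set(range(50))
r, M = 50, 8
A = {n for n in range(50 * M) if rng.random() < 0.3}
E = {a for a in P if rng.random() < 0.4}

lhs, rhs = double_count(A, P, E, r, M)
# Pigeonhole: the smallest |A_i ∩ E| is at most the average lhs / M.
smallest = min(len({a for a in E if a + i * r in A}) for i in range(M))
print(lhs, rhs, smallest, lhs / M)
```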
Now suppose we want to obtain weak mixing not just for a single set $E$, but for a small number $E_1, \ldots, E_m$ of such sets, i.e. we wish to find an $i$ for which

$$|A_i \cap E_j| \lessapprox \delta \sigma_j |P| \qquad (3)$$

for all $j=1,\ldots,m$, where $\sigma_j$ is the density of $E_j$ in $P$. The above proposition gives, for each $j$, a choice of $i$ for which (3) holds, but it could be a different $i$ for each $j$, and so it is not immediately obvious how to use Proposition 2 to find an $i$ for which (3) holds simultaneously for all $j$. However, it turns out that the van der Waerden theorem is the perfect tool for this amplification:
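Proposition 3 (Multiple upper bound) Let $P, P+r, \ldots, P+(M-1)r$ be a progression of progressions $P+ir$ for some large $M$. Suppose that for each $i=0,\ldots,M-1$, the set $A_i := \{a \in P: a+ir \in A\}$ has density $\approx \delta$ in $P$ (i.e. (1) holds). For each $1 \leq j \leq m$, let $E_j$ be a subset of $P$ of density $\approx \sigma_j$. Then (if $M$ is large enough depending on $m$) one can find an $i = 0,\ldots,M-1$ such that

$$|A_i \cap E_j| \lessapprox \delta \sigma_j |P|$$

simultaneously for all $1 \leq j \leq m$.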
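Proof: Suppose that the claim failed (for some suitably large $M$). Then, for each $i = 0,\ldots,M-1$, there exists $j \in \{1,\ldots,m\}$ such that

$$|A_i \cap E_j| \gg \delta \sigma_j |P|.$$

This can be viewed as a colouring of the interval $\{0,\ldots,M-1\}$ by $m$ colours. If we take $M$ large compared to $m$, van der Waerden’s theorem allows us to then find a long subprogression of $\{0,\ldots,M-1\}$ which is monochromatic, so that $j$ is constant on this progression. But then this will furnish a counterexample to Proposition 2 (applied to the corresponding subprogression of blocks and the single set $E_j$).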
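To illustrate the colouring step, here is a brute-force Python sketch that looks for a monochromatic progression in a given colouring. The function name `monochromatic_progression` is hypothetical, positions are indexed from $1$ for convenience, and the search is feasible only for tiny parameters; it is meant to convey what van der Waerden's theorem guarantees rather than how it is proved. The example checks the classical fact that every $2$-colouring of $\{1,\ldots,9\}$ contains a monochromatic progression of length $3$, whereas some colouring of $\{1,\ldots,8\}$ does not.

```python
from itertools import product
from typing import Optional, Sequence, Tuple


def monochromatic_progression(colouring: Sequence[int], length: int) -> Optional[Tuple[int, int]]:
    """colouring[n - 1] is the colour of n in {1, ..., M}.

    Return (a, d) with d >= 1 such that a, a+d, ..., a+(length-1)d all share
    one colour, or None if no such progression exists. Assumes length >= 2.
    """
    M = len(colouring)
    for a in range(1, M + 1):
        for d in range(1, (M - a) // (length - 1) + 1):
            if all(colouring[a + t * d - 1] == colouring[a - 1] for t in range(length)):
                return a, d
    return None


# A 2-colouring of {1, ..., 8} avoiding monochromatic 3-term progressions.
print(monochromatic_progression((0, 0, 1, 1, 0, 0, 1, 1), 3))

# Exhaustive check over all 2-colourings of {1, ..., 9}.
print(all(monochromatic_progression(c, 3) is not None
          for c in product((0, 1), repeat=9)))
```

In the proof above, the colour of an index $i$ is a choice of $j$ for which the upper bound fails, and the monochromatic progression is a long progression of blocks on which a single set $E_j$ is always over-represented.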
One nice thing about this proposition is that the upper bounds can be automatically upgraded to an asymptotic:
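Proposition 4 (Multiple mixing) Let $P, P+r, \ldots, P+(M-1)r$ be a progression of progressions $P+ir$ for some large $M$. Suppose that for each $i=0,\ldots,M-1$, the set $A_i := \{a \in P: a+ir \in A\}$ has density $\approx \delta$ in $P$ (i.e. (1) holds). For each $1 \leq j \leq m$, let $E_j$ be a subset of $P$ of density $\approx \sigma_j$. Then (if $M$ is large enough depending on $m$) one can find an $i = 0,\ldots,M-1$ such that

$$|A_i \cap E_j| \approx \delta \sigma_j |P|$$

simultaneously for all $1 \leq j \leq m$.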
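Proof: By applying the previous proposition to the collection of sets $E_1, \ldots, E_m$ and their complements $P \backslash E_1, \ldots, P \backslash E_m$ (thus replacing $m$ with $2m$), one can find an $i$ for which

$$|A_i \cap E_j| \lessapprox \delta \sigma_j |P|$$

and

$$|A_i \cap (P \backslash E_j)| \lessapprox \delta (1-\sigma_j) |P|$$

which gives the claim.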
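For completeness, here is the short computation behind the last step, written with $A_i$ for the $i$-th block, $E_j$ for the $j$-th test set of density $\approx \sigma_j$ in $P$, and $\delta$ for the maximal density (so that $|A_i| \approx \delta |P|$ by (1)). The upper bound on the complement converts into a lower bound on $E_j$:

```latex
\begin{align*}
|A_i \cap E_j| &= |A_i| - |A_i \cap (P \backslash E_j)| \\
  &\gtrapprox \delta |P| - \delta (1 - \sigma_j) |P| \\
  &= \delta \sigma_j |P|.
\end{align*}
```

Combined with the matching upper bound $|A_i \cap E_j| \lessapprox \delta \sigma_j |P|$, this yields the asymptotic $|A_i \cap E_j| \approx \delta \sigma_j |P|$.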
However, this improvement of Proposition 2 turns out to not be strong enough for applications. The reason is that the number $m$ of sets $E_1, \ldots, E_m$ for which mixing is established is too small compared with the length $M$ of the progression one has to use in order to obtain that mixing. However, thanks to the magic of the Szemerédi regularity lemma, one can amplify the above proposition even further, to allow for a huge number of sets $E_j$ to be mixed (at the cost of excluding a small fraction of exceptions):
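Proposition 5 (Really multiple mixing) Let $P, P+r, \ldots, P+(M-1)r$ be a progression of progressions $P+ir$ for some large $M$. Suppose that for each $i=0,\ldots,M-1$, the set $A_i := \{a \in P: a+ir \in A\}$ has density $\approx \delta$ in $P$ (i.e. (1) holds). For each $v$ in some (large) finite set $V$, let $E_v$ be a subset of $P$ of density $\approx \sigma_v$. Then (if $M$ is large enough, but not dependent on the size of $V$) one can find an $i = 0,\ldots,M-1$ such that

$$|A_i \cap E_v| \approx \delta \sigma_v |P|$$

simultaneously for almost all $v \in V$.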
Proof: We build a bipartite graph $G$ connecting the progression $P$ to the finite set $V$ by placing an edge $(a,v)$ between an element $a \in P$ and an element $v \in V$ whenever $a \in E_v$. The number $|E_v| \approx \sigma_v |P|$ can then be interpreted as the degree of $v$ in this graph, while the number $|A_i \cap E_v|$ is the number of neighbours of $v$ that land in $A_i$.
We now apply the regularity lemma to this graph $G$. Roughly speaking, what this lemma does is to partition $P$ and $V$ into almost equally sized cells $P = P_1 \cup \ldots \cup P_m$ and $V = V_1 \cup \ldots \cup V_m$ such that for most pairs $(P_j, V_k)$ of cells, the graph $G$ resembles a random bipartite graph of some density $d_{jk}$ between these two cells. The key point is that the number $m$ of cells here is bounded uniformly in the size of $P$ and $V$. As a consequence of this lemma, one can show that for most vertices $v$ in a typical cell $V_k$, the number $|E_v|$ is approximately equal to

$$|E_v| \approx \sum_{j=1}^m d_{jk} |P_j|$$

and the number $|A_i \cap E_v|$ is approximately equal to

$$|A_i \cap E_v| \approx \sum_{j=1}^m d_{jk} |A_i \cap P_j|.$$

The point here is that the $|V|$ different statistics $|A_i \cap E_v|$ are now controlled by a mere $m$ statistics $|A_i \cap P_j|$ (this is not unlike the use of principal component analysis in statistics, incidentally, but that is another story). Now, we invoke Proposition 4 to find an $i$ for which

$$|A_i \cap P_j| \approx \delta |P_j|$$

simultaneously for all $j=1,\ldots,m$, and the claim follows.
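To spell out the final step, write $P_1, \ldots, P_m$ for the cells of $P$, $V_k$ for the cell containing a typical vertex $v$, and $d_{jk}$ for the edge density between $P_j$ and $V_k$ (this notation is an assumption used here for illustration). Substituting the mixing of the cells into the regularity approximation gives, heuristically,

```latex
\begin{align*}
|A_i \cap E_v| &\approx \sum_{j=1}^m d_{jk} |A_i \cap P_j| \\
  &\approx \delta \sum_{j=1}^m d_{jk} |P_j| \\
  &\approx \delta |E_v| \approx \delta \sigma_v |P|.
\end{align*}
```

The first and third approximations come from regularity and hold only for most $v$, which is the source of the "almost all" in the proposition; the second is Proposition 4 applied to the boundedly many cells $P_j$.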
This proposition now suggests a way forward to establish the type of mixing properties (2) needed for the preceding attempt at proving Szemerédi’s theorem to actually work. Whereas in that attempt, we were working with a single progression of progressions $P, P+r, \ldots, P+(k-1)r$ containing a near-maximal density of elements of $A$, we will now have to work with a family $(P_\lambda, P_\lambda+r_\lambda, \ldots, P_\lambda+(k-1)r_\lambda)_{\lambda \in \Lambda}$ of such progressions of progressions, where $\lambda$ ranges over some suitably large parameter set $\Lambda$. Furthermore, in order to invoke Proposition 5, this family must be “well-arranged” in some arithmetic sense; in particular, for a given $i$, it should be possible to find many reasonably large subfamilies of this family for which the $i^{th}$ terms $P_\lambda + i r_\lambda$ of the progressions of progressions in this subfamily are themselves in arithmetic progression. (Also, for technical reasons having to do with the fact that the sets $E_v$ in Proposition 5 are not allowed to depend on $i$, one also needs the progressions $P_\lambda + i' r_\lambda$ for any given $0 \leq i' < i$ to be “similar” in the sense that they intersect $A$ in the same fashion (thus the sets $A \cap (P_\lambda + i' r_\lambda)$ as $\lambda$ varies need to be translates of each other).) If one has this sort of family, then Proposition 5 allows us to “spend” some of the degrees of freedom of the parameter set $\Lambda$ in order to gain good mixing properties for at least one of the sets $P_\lambda + i r_\lambda$ in the progression of progressions.
Of course, we still have to figure out how to get such large families of well-arranged progressions of progressions. Szemerédi’s solution was to begin by working with generalised progressions of a much larger rank than the rank $2$ progressions considered here; roughly speaking, to prove Szemerédi’s theorem for length $k$ progressions, one has to consider generalised progressions of rank growing exponentially in $k$. It is possible by a reasonably straightforward (though somewhat delicate) “density increment argument” to locate a huge generalised progression of this rank which is “saturated” by $A$ in a certain rather technical sense (related to the concept of “near maximal density” used previously). Then, by another reasonably elementary argument, it is possible to locate inside a suitable large generalised progression of some rank $d$, a family of large generalised progressions of smaller rank $d'$ which inherit many of the good properties of the original generalised progression, and which have the arithmetic structure needed for Proposition 5 to be applicable, at least for one value of $i$. (But getting this sort of property for all values of $i$ simultaneously is tricky, and requires many careful iterations of the above scheme; there is also the problem that by obtaining good behaviour for one index $i$, one may lose good behaviour at previous indices, leading to a sort of “Tower of Hanoi” situation which may help explain the exponential dependence on $k$ of the rank that is ultimately needed. It is an extremely delicate argument; all the parameters and definitions have to be set very precisely in order for the argument to work at all, and it is really quite remarkable that Endre was able to see it through to the end.)
14 comments
23 March, 2012 at 9:25 am
Kevin O'Bryant
I’m always amused that the original proof is so involved that Szemeredi himself didn’t want to write it up. What ended up in print was (heroically) written by Ron Graham. Maybe one of the principals will post here the true story of how that came to happen.
26 March, 2012 at 2:50 am
Peter Komjath
The heroic writing up was done by Andras Hajnal. See here the detailed story by Szemeredi himself: http://www.matud.iif.hu/08jun/12.html (in Hungarian).
29 March, 2012 at 1:00 pm
Endre Szemeredi
“The true story”
The exact acknowledgment is the following:
” My indebtedness to my friends R.L.Graham and A.Hajnal is extremely great.
In Fact, they wrote the whole paper after listening to my rough oral exposition.”
In the volume An Irregular Mind, Andras Hajnal wrote a short essay, My Early Encounters with Szemeredi (pages 757-758), in which he writes the following.
“In 1973 Endre announced that he generalized his old theorem for arbitrary arithmetic progressions. Paul was very excited, first because this was a problem he and Turan had raised about thirty years ago, and also because he promised one thousand dollars for it.
He was in the US and could not come to Budapest. He quarrelled with “joe” because he did not let some Israeli mathematicians attend the conference held in Hungary for his sixtieth birthday. Endre was telling me the proof and I was writing it down as we went along. It took unusually long because we checked every detail. We were about halfway when Paul called from the US asking if there was a proof. I told him that I don’t know, since I did not see the end of it, but would buy it for $500. This convinced him that the matter is serious. Endre submitted the manuscript and it appeared quite fast in the Acta Arithmetica in 1975. It helped that Ron Graham kindly read it, but it appeared basically as it was submitted.”
Actually the Mathematical Institute of the Hungarian Academy of Sciences produced a preprint which I gave to my friend Ron.
A few words about r4(n). The r4(n) paper was written down by Peter Elliott and Eduard Wirsing while I was giving the lecture. Because of my practically non-existent English they had to suffer a lot. I am very grateful to them also.
Endre Szemeredi
30 March, 2012 at 7:22 pm
Anonymous
Wait, who is B. L. Graham? Is that someone different from Ron Graham?
[Corrected – T.]
And congratulations of course ;-).
23 March, 2012 at 12:31 pm
csgergo
Dear Prof Tao,
Aren’t absolute signs in theorem 1 missing?
[Corrected, thanks – T.]
25 March, 2012 at 6:24 am
gowers
Kevin, I feel I should add that, at least according to the acknowledgement in the paper, Ron Graham wrote it with András Hajnal. But I agree that it would be good to hear the full story.
25 March, 2012 at 9:41 am
none
Aha, the part about Ron Graham explains something I found amusing: that Tim’s write-up had a snippet of the original paper, and the snippet referred to the theorem as “Szemerédi’s theorem”. Usually when a paper presents a new result, the result is called something like “Theorem 3” and it only becomes known as “So-and-so’s theorem” after other papers have been citing it for a while ;-).
3 April, 2012 at 4:21 am
arif
ah… that’s was insane
20 March, 2022 at 11:37 am
domotorp
What happened to the math formulas in this old post?