Szemerédi’s theorem asserts that all subsets of the natural numbers of positive density contain arbitrarily long arithmetic progressions. *Roth’s theorem* is the special case when one considers arithmetic progressions of length three. Both theorems have many important proofs using tools from additive combinatorics, (higher order) Fourier analysis, (hyper) graph regularity theory, and ergodic theory. However, the original proof by Endre Szemerédi, while extremely intricate, was purely combinatorial (and in particular “elementary”) and almost entirely self-contained, except for an invocation of the van der Waerden theorem. It is also notable for introducing a prototype of what is now known as the Szemerédi regularity lemma.

Back in 2005, I rewrote Szemerédi’s original proof in order to understand it better, however my rewrite ended up being about the same length as the original argument and was probably only usable to myself. In 2012, after Szemerédi was awarded the Abel prize, I revisited this argument with the intention to try to write up a more readable version of the proof, but ended up just presenting some ingredients of the argument in a blog post, rather than try to rewrite the whole thing. In that post, I suspected that the cleanest way to write up the argument would be through the language of nonstandard analysis (perhaps in an iterated hyperextension that could handle various hierarchies of infinitesimals), but was unable to actually achieve any substantial simplifications by passing to the nonstandard world.

A few weeks ago, I participated in a week-long workshop at the American Institute of Mathematics on “Nonstandard methods in combinatorial number theory”, and spent some time in a working group with Shabnam Akhtari, Irfam Alam, Renling Jin, Steven Leth, Karl Mahlburg, Paul Potgieter, and Henry Towsner to try to obtain a manageable nonstandard version of Szemerédi’s original proof. We didn’t end up being able to do so – in fact there are now signs that perhaps nonstandard analysis is not the optimal framework in which to place this argument – but we did at least clarify the existing standard argument, to the point that I was able to go back to my original rewrite of the proof and present it in a more civilised form, which I am now uploading here as an unpublished preprint. There are now a number of simplifications to the proof. Firstly, one no longer needs the full strength of the regularity lemma; only the simpler “weak” regularity lemma of Frieze and Kannan is required. Secondly, the proof has been “factored” into a number of stand-alone propositions of independent interest, in particular involving just (families of) one-dimensional arithmetic progressions rather than the complicated-looking multidimensional arithmetic progressions that occur so frequently in the original argument of Szemerédi. Finally, the delicate manipulations of densities and epsilons via double counting arguments in Szemerédi’s original paper have been abstracted into a certain key property of families of arithmetic progressions that I call the “double counting property”.

The factoring mentioned above is particularly simple in the case of proving Roth’s theorem, which is now presented separately in the above writeup. Roth’s theorem seeks to locate a length three progression in which all three elements lie in a single set. This will be deduced from an easier variant of the theorem in which one locates (a family of) length three progressions in which just the first two elements of the progression lie in a good set (and some other properties of the family are also required). This is in turn derived from an even easier variant in which now just the first element of the progression is required to be in the good set.

More specifically, Roth’s theorem is now deduced from

Theorem 1.5. Let be a natural number, and let be a set of integers of upper density at least . Then, whenever is partitioned into finitely many colour classes, there exists a colour class and a family of 3-term arithmetic progressions with the following properties:

- For each , and lie in .
- For each , lie in .
- The for are in arithmetic progression.

The situation in this theorem is depicted by the following diagram, in which elements of are in blue and elements of are in grey:

Theorem 1.5 is deduced in turn from the following easier variant:

Theorem 1.6. Let be a natural number, and let be a set of integers of upper density at least . Then, whenever is partitioned into finitely many colour classes, there exists a colour class and a family of 3-term arithmetic progressions with the following properties:

- For each , lie in .
- For each , and lie in .
- The for are in arithmetic progression.

The situation here is described by the figure below.

Theorem 1.6 is easy to prove. To derive Theorem 1.5 from Theorem 1.6, or to derive Roth’s theorem from Theorem 1.5, one uses double counting arguments, van der Waerden’s theorem, and the weak regularity lemma, largely as described in this previous blog post; see the writeup for the full details. (I would be interested in seeing a shorter proof of Theorem 1.5 though that did not go through these arguments, and did not use the more powerful theorems of Roth or Szemerédi.)

## 17 comments

Comments feed for this article

12 September, 2017 at 1:22 pm

AnonymousConsidering Erdos’ problem on the characterization of all subsets of natural numbers containing arbitrarily long arithmetic progressions (conjectured to be those sets with divergent sum of the reciprocals of their elements), is there any probabilistic argument for this conjecture?

12 September, 2017 at 2:31 pm

Stefan Kohl@Anonymous: But this works at most one way. — It is easy to find sets of natural numbers which contain arbitrarily long arithmetic progressions such that the sum of reciprocals of the elements does not diverge. — Take e.g. .

12 September, 2017 at 3:29 pm

AnonymousYes! (the above Erdos conjecture for arithmetic progressions, is only a conjectured sufficient condition for such subset to contain arbitrarily long arithmetic progressions) – thanks for the correction!

17 September, 2017 at 6:35 am

Paul EpsteinThanks a lot for this. Is the purpose of this account to give a proof of Szemeredi’s theorem that is elementary, self-contained, and easy to follow?

Elementary proofs of van der Waerden are readily googlable, and your omission of this part of the proof is understandable.

However, what is less clear is whether omitting the proof of your version of Frieze-Kannan regularity can be justified pedagogically. I agree that an experienced mathematician can simply consult the Frieze-Kannan paper and follow their proof, but a person with that degree of mathematical maturity wouldn’t need an elementary proof in the first place.

Personally speaking, I did consult the Frieze-Kannan paper, and I couldn’t find anything that matched your version. In particular, the discussions always focused on a single partition, rather than two partitions, as in your presentation.

I would have appreciated your account more if you had either given a proof of your version of Frieze-Kannan regularity, or been more precise and explicit about how exactly you’re using the original Frieze-Kannan paper, instead of a mere citation.

Thank you very much for your blog which is teaching me a lot, even though I’m not a research-level mathematician.

Paul Epstein

17 September, 2017 at 9:11 am

Terence TaoFair enough; I’ve added a proof of the weak regularity lemma (using spectral methods) as an appendix.

19 September, 2017 at 10:59 pm

AnonymousCould surreal numbers (instead of nonstandard reals) be of any use in handling the epsilons? Thanks.

5 October, 2017 at 1:07 am

How to write a 21st century paper – Write down what you’ve done[…] but still lend themselves to a depiction by other directed acyclic graphs (See for example this rewrite of Szemeredis theorem by Terence Tao, page 8). But the general idea of dividing a proof into smaller (how small) steps and specifying […]

2 December, 2017 at 6:41 am

pauldepsteinThank you very much for adding a proof of the weak regularity lemma. I learned the SVD theorem as a response to this, so your blog is certainly educating me mathematically. I’m sorry to say that I’m a bit stuck on this proof of the weak regularity lemma.

Specifically, I’m stuck on page 36 on the equation that occurs after the phrase “whenever i >= 16/ (epsilon ^ 2).

I understand equation A 1. which involves a sum of products

of terms f_i(v) * g_i(w).

However, when you multiply one row the of adjacency matrix by g_i(w), and

take the sum, I’m very puzzled that so many of the f_i(v) terms seem to melt away.

Thanks for your help.

Paul Epstein

2 December, 2017 at 6:53 am

pauldepsteinSorry, I get it now. Just not very used to these orthonormal computations in my head. I understand now, so no need to respond.

3 December, 2017 at 9:06 am

pauldepsteinStill reading the proof of the weak regularity lemma in the appendix. I’m only stuck on one issue. Consider the i for which lambda_i > epsilon/4. I don’t understand why the sum of these lambda_i is < 4/epsilon. This claim is at the very end of the appendix. I'd be grateful for any help. Thank you.

6 December, 2017 at 12:25 pm

Terence TaoIf then . Since , the claim follows.

9 December, 2017 at 3:54 am

pauldepsteinI’ve learned so much from this paper, and am checking all the claims in full detail. I’ve now finished reading the proof of the regularity lemma in the appendix, however I’m not sure how to formalise the concept “if c is sufficiently slowly decaying” as on page 17. I’m fine up to there, though. Many thanks for any help you can offer, as to how concepts such as “sufficiently slowly” can be formalised along similar lines to epsilon-delta formulations.

9 December, 2017 at 4:17 pm

pauldepsteinJust before 3.8 (“and hence by Markov’s inequality”) the >= sign is the wrong way round. I’m afraid I’m uncomfortable with the argument around 3.7 and 3.8 even after making this correction. The epsilon in the denominator in 3.8

makes it unclear what type of c function is intended.

14 December, 2017 at 3:13 pm

Terence TaoThe inequality seems to be in the right direction to me (note the presence of “but” immediately after (3.8)).

Recall from (3.5) that one is assuming to be much larger than any given function of , or equivalently that is larger than any given function of that goes to zero. In particular, one can use (3.5) to make less than, say, , and then by choosing appropriately one can obtain the display after (3.8).

16 December, 2017 at 11:55 am

pauldepsteinThanks, Terry. 3.8 is indeed correct but the inequality before 3.8 is the wrong way round. You’re arguing that a positive quantity is small so we need positive quantity <= … The inequality before 3.8 is the wrong way round. I now can follow everything so far and have read the first 21 pages plus the appendix. There are occasional errors (like the inequality) and I'd be happy to provide an errata list if it would be of sufficient interest.

16 December, 2017 at 6:37 pm

Terence TaoAh, I see now, thanks. This typo will be corrected in the next revision of the ms. I’m also happy to take an errata list if you have one.

19 December, 2017 at 12:50 pm

pauldepsteinThere are few significant errors that I’ve uncovered so far. I’d like to provide a few gap-fillers so that remarks that puzzled me are proved in more detail. However, this depends on your intended audience. If the intended audience is maths graduates, then perhaps no more details are needed. However, Khinchin’s Three Pearls of Number Theory is readable by complete beginners — a lot would need to be proved in more detail for it to be accessible by the “Three Pearls” audience. If you tell me who the intended audience is, I could maybe make more meaningful suggestions. I think a readable account of the elementary proof of Szemeredi’s theorem is a great concept — no one understands the original and I’ve been trying to understand the original for years.