In this lecture, we describe the simple but fundamental Furstenberg correspondence principle which connects the “soft analysis” subject of ergodic theory (in particular, recurrence theorems) with the “hard analysis” subject of combinatorial number theory (or more generally with results of “density Ramsey theory” type). Rather than try to set up the most general and abstract version of this principle, we shall instead study the canonical example of this principle in action, namely the equating of the Furstenberg multiple recurrence theorem with Szemerédi’s theorem on arithmetic progressions.

In 1975, Szemerédi established the following theorem, which had been conjectured in 1936 by Erdős and Turán:

Theorem 1. (Szemerédi’s theorem) Let k \geq 1 be an integer, and let A be a set of integers of positive upper density, thus \limsup_{N \to \infty} \frac{1}{2N+1} |A \cap \{-N,\ldots,N\}| > 0. Then A contains a non-trivial arithmetic progression n, n+r, \ldots, n+(k-1) r of length k. (By “non-trivial” we mean that r \neq 0.) [More succinctly: every set of integers of positive upper density contains arbitrarily long arithmetic progressions.]

This theorem is trivial for k=1 and k=2. The first non-trivial case is k=3, which was proven by Roth in 1953 and will be discussed in a later lecture. The k=4 case was also established by Szemerédi in 1969.

In 1977, Furstenberg gave another proof of Szemerédi’s theorem, by establishing the following equivalent statement:

Theorem 2. (Furstenberg multiple recurrence theorem) Let k \geq 1 be an integer, let (X, {\mathcal X}, \mu, T) be a measure-preserving system, and let E be a set of positive measure. Then there exists r > 0 such that E \cap T^{-r} E \cap \ldots \cap T^{-(k-1) r} E is non-empty.

Remark 1. The negative signs here can be easily removed because T is invertible, but I have placed them here for consistency with some later results involving non-invertible transformations, in which the negative sign becomes important. \diamond

Exercise 1. Prove that Theorem 2 is equivalent to the apparently stronger theorem in which “is non-empty” is replaced by “has positive measure”, and “there exists r > 0” is replaced by “there exist infinitely many r > 0“. \diamond

Note that the k=1 case of Theorem 2 is trivial, while the k=2 case follows from the Poincaré recurrence theorem (Theorem 1 from Lecture 8). We will prove the higher k cases of this theorem in later lectures. In this one, we will explain why, for any fixed k, Theorem 1 and Theorem 2 are equivalent.

Let us first give the easy implication that Theorem 1 implies Theorem 2. This follows immediately from

Lemma 1. Let (X, {\mathcal X}, \mu, T) be a measure-preserving system, and let E be a set of positive measure. Then there exists a point x in X such that the recurrence set \{ n \in {\Bbb Z}: T^n x \in E \} has positive upper density.

Indeed, from Lemma 1 and Theorem 1, we obtain a point x for which the set \{ n \in {\Bbb Z}: T^n x \in E \} contains an arithmetic progression of length k and some step r, which implies that E \cap T^r E \cap \ldots \cap T^{(k-1) r} E is non-empty.

Proof of Lemma 1. Observe (from the shift-invariance of \mu) that

\displaystyle \int_X \frac{1}{2N+1} \sum_{n=-N}^N 1_{T^n E}\ d\mu = \mu(E). (1)

On the other hand, the integrand is at most 1. We conclude that for each N, the set A_N := \{ x: \frac{1}{2N+1} \sum_{n=-N}^N 1_{T^n E}(x) \geq \mu(E)/2 \} must have measure at least \mu(E)/2. This implies that the function \sum_N 1_{A_N} is not absolutely integrable even after excluding an arbitrary set of measure up to \mu(E)/4, which implies that \sum_N 1_{A_N} is not finite a.e., and the claim follows (cf. the proof of the Borel-Cantelli lemma). \diamond

Now we show how Theorem 2 implies Theorem 1. If we could pretend that “upper density” was a probability measure on the integers, then this implication would be immediate by applying Theorem 2 to the dynamical system ({\Bbb Z}, n \mapsto n+1). Of course, we know that the integers do not admit a shift-invariant probability measure (and upper density is not even additive, let alone a probability measure). So this does not work directly. Instead, we need to first lift from the integers to a more abstract universal space and use a standard “compactness and contradiction” argument in order to be able to build the desired probability measure properly.

More precisely, let A be as in Theorem 1. Consider the topological boolean Bernoulli dynamical system 2^{\Bbb Z} with the product topology and the shift T: B \mapsto B+1. The set A can be viewed as a point in this system, and the orbit closure X := \overline{\{ A+n: n \in {\Bbb Z} \}} of that point becomes a subsystem of that Bernoulli system, with the relative topology.

Suppose for contradiction that A contains no non-trivial progressions of length k, thus A \cap A+r \cap \ldots \cap A+(k-1)r = \emptyset for all r > 0. Then, if we define the cylinder set E := \{ B \in X: 0 \in B \} to be the collection of all points in X which (viewed as sets of integers) contain 0, we see (after unpacking all the definitions) that E \cap T^r E \cap \ldots T^{(k-1)r} E = \emptyset for all r > 0.

In order to apply Theorem 2 and obtain the desired contradiction, we need to find a shift-invariant Borel probability measure \mu on X which assigns a positive measure to E.

For each integer N, consider the measure \mu_N which assigns a mass of \frac{1}{2N+1} to the points T^{-n} A in X for -N \leq n \leq N, and no mass to the rest of X. Then we see that \mu_N(E) = \frac{1}{2N+1} |A \cap \{-N,\ldots,N\}|. Thus, since A has positive upper density, there exists some sequence N_j going to infinity such that \liminf_{j \to \infty} \mu_{N_j}(E) > 0. On the other hand, by vague sequential compactness (Lemma 1 of Lecture 7) we know that some subsequence of \mu_{N_j} converges in the vague topology to a probability measure \mu, which then assigns a positive measure to the (clopen) set E. As the \mu_{N_j} are asymptotically shift invariant, we see that \mu is invariant also (as in the proof of Corollary 1 of Lecture 7). As \mu now has all the required properties, we have completed the deduction of Theorem 1 from Theorem 2.

Exercise 2. Show that Theorem 2 in fact implies a seemingly stronger version of Theorem 1, in which the conclusion becomes the assertion that the set \{ n: n, n+r, \ldots, n+(k-1) r \in A \} has positive upper density for infinitely many r. \diamond

Exercise 3. Show that Theorem 1 in fact implies a seemingly stronger version of Theorem 2: If E_1, E_2, E_3, \ldots are sets in a probability space with uniformly positive measure (i.e. \inf_n \mu(E_n) > 0), then for any k there exists positive integers n, r such that \mu( E_n \cap E_{n+r} \cap \ldots \cap E_{n+(k-1)r} ) > 0. \diamond

— Varnavides type theorems —

A similar “compactness and contradiction” argument (combined with a preliminary averaging-over-dilations trick of Varnavides) allows us to use Theorem 2 to imply the following apparently stronger statement (observed by Bergelson, Host, McCutcheon, and Parreau):

Theorem 3. (Uniform Furstenberg multiple recurrence theorem) Let k \geq 1 be an integer and \delta > 0. Then for any measure-preserving system (X, {\mathcal X}, \mu, T) and any measurable set E with \mu(E) \geq \delta we have

\displaystyle \frac{1}{N} \sum_{r=0}^{N-1} \mu( E \cap T^r E \cap \ldots \cap T^{(k-1)r} E) \geq c(k,\delta) (2)

for all N \geq 1, where c(k,\delta) > 0 is a positive quantity which depends only on k and \delta (i.e. it is uniform over all choices of system and of the set E with measure at least \delta).

Exercise 4. Assuming Theorem 3, show that if N is sufficiently large depending on k and \delta, then any subset of \{1,\ldots,N\} with cardinality at least \delta N will contain at least c'(k,\delta) N^2 non-trivial arithmetic progressions of length k, for some c'(k,\delta) > 0. (This result for k=3 was first established by Varnavides via an averaging argument from Roth’s theorem.) Conclude in particular that Theorem 3 implies Theorem 1. \diamond

It is clear that Theorem 3 implies Theorem 2; let us now establish the converse. We first use an averaging argument of Varnavides to reduce Theorem 3 to a weaker statement, in which the conclusion (2) is not asserted to hold for all N, but instead one asserts that

\displaystyle \frac{1}{N_0} \sum_{r=1}^{N_0-1} \mu( E \cap T^r E \cap \ldots \cap T^{(k-1)r} E) \geq c(k,\delta) (2′)

is true for some N_0 = N_0(k,\delta) > 0 depending only on k and \delta (note that the r=0 term in (2′) has been dropped, otherwise the claim is trivial). To see why one can recover (2) from (2′), observe by replacing the shift T with a power T^a that we can amplify (2′) to

\displaystyle \frac{1}{N_0} \sum_{r=1}^{N_0-1} \mu( E \cap T^{ar} E \cap \ldots \cap T^{(k-1)ar} E) \geq c(k,\delta) (2”)

for all a. Averaging (2”) over 1 \leq a \leq N/N_0 we easily conclude (2).

It remains to prove that (2”) holds under the hypotheses of Theorem 3. Our next reduction is to observe that for it suffices to perform this task for the boolean Bernoulli system X_0 := 2^{\Bbb Z} with the cylinder set E_0 := \{ B \in X_0: 0 \in B\} as before. To see this, recall from Example 5 of Lecture 2 that there is a morphism \phi: X \to X_0 from any measure-preserving system (X, {\mathcal X}, \mu, T) with a distinguished set E to the system X_0 with the product \sigma-algebra {\mathcal X}_0, the usual shift T_0, and the set E_0, and with the push-forward measure \mu_0 := \phi_\# \mu. Specifically, \phi sends any point x in X to its recurrence set \phi(x) := \{ n \in {\Bbb Z}: T^n x \in E \}. Using this morphism it is not difficult to show that the claim (2) for (X, {\mathcal X},\mu,T) and E would follow from the same claim for (X_0, {\mathcal X}_0,\mu_0,T_0) and E_0.

We still need to prove (2”) for the boolean system. The point is that by lifting to this universal setting, the dynamical system (X, {\mathcal X}, T) and the set E have been canonically fixed; the only remaining parameter is the probability measure \mu. But now we can exploit vague sequential compactness again as follows.

Suppose for contradiction that Theorem 3 failed for the boolean system. Then by carefully negating all the quantifiers, we can find \delta > 0 such that for any N_0 there is a sequence of shift-invariant probability measures \mu_j on X with \mu_j(E) \geq \delta,

\displaystyle \frac{1}{N_0} \sum_{r=1}^{N_0-1} \mu_j( E \cap T^r E \cap \ldots \cap T^{(k-1)r} E) \to 0 (3)

as j \to \infty. Note that if (3) holds for one value of N_0, then it also holds for all smaller values of N_0. A standard diagonalisation argument then allows us to build a sequence \mu_j as above, but which obeys (3) for all N_0 \geq 1.

Now we are finally in a good position to apply vague sequential compactness. By passing to a subsequence if necessary, we may assume that \mu_j converges vaguely to a limit \mu, which is a shift-invariant probability measure. In particular we have \mu(E) \geq \delta > 0, while from (3) we see that

\displaystyle \frac{1}{N_0} \sum_{r=1}^{N_0-1} \mu( E \cap T^r E \cap \ldots \cap T^{(k-1)r} E) = 0 (4)

for all N_0 \geq 1; thus the sets E \cap T^r E \cap \ldots \cap T^{(k-1)r} E all have zero measure for r > 0. But this contradicts Theorem 2 (and Exercise 1). This completes the deduction of Theorem 3 from Theorem 2.

— Other recurrence theorems and their combinatorial counterparts —

The Furstenberg correspondence principle can be extended to relate several other recurrence theorems to their combinatorial analogues. We give some representative examples here (without proofs). Firstly, there is a multidimensional version of Szemerédi’s theorem (compare with Exercise 7 from Lecture 4):

Theorem 4 (Multidimensional Szemerédi theorem) Let d \geq 1, let v_1,\ldots,v_k \in {\Bbb Z}^d, and let A \subset {\Bbb Z}^d be a set of positive upper Banach density (which means that \limsup_{N \to \infty} |A \cap B_N|/|B_N| > 0, where B_N := \{-N,\ldots,N\}^d). Then A contains a pattern of the form n+rv_1,\ldots,n+rv_k for some n \in {\Bbb Z}^d and r > 0.

Note that Theorem 1 corresponds to the special case when d=1 and v_i = i-1.

This theorem was first proven by Furstenberg and Katznelson, who deduced it via the correspondence principle from the following generalisation of Theorem 2:

Theorem 5 (Recurrence for multiple commuting shifts) Let k \geq 1 be an integer, let (X, {\mathcal X}, \mu) be a probability space, let T_1,\ldots,T_k: X \to X be measure-preserving bimeasurable maps which commute with each other, and let E be a set of positive measure. Then there exists r > 0 such that T_1^r E \cap T_2^r E \cap \ldots \cap T_k^{ r} E is non-empty.

Exercise 5. Show that Theorem 4 and Theorem 5 are equivalent. \diamond

Exercise 6. State an analogue of Theorem 3 for multiple commuting shifts, and prove that it is equivalent to Theorem 5. \diamond

There is also a polynomial version of these theorems (cf. Theorem 1 from Lecture 5), which we will also state in general dimension:

Theorem 6 (Multidimensional polynomial Szemerédi theorem) Let d \geq 1, let P_1, \ldots, P_k: {\Bbb Z} \to {\Bbb Z}^d be polynomials with P_1(0)=\ldots=P_k(0)=0, and let A \subset {\Bbb Z}^d be a set of positive upper Banach density. Then A contains a pattern of the form n+P_1(r),\ldots,n+P_k(r) for some n \in {\Bbb Z}^d and r > 0.

This theorem was established by Bergelson and Leibman, who deduced it from

Theorem 7 (Polynomial recurrence for multiple commuting shifts) Let k, (X, {\mathcal X}, \mu), T_1,\ldots,T_k: X \to X, E be as in Theorem 5, and let P_1,\ldots,P_k be as in Theorem 6. Then there exists r > 0 such that T^{-P_1(r)} E \cap T^{-P_2(r)} E \cap \ldots \cap T^{-P_k(r)} E is non-empty, where we adopt the convention T^{(a_1,\ldots,a_k)} := T_1^{a_1} \ldots T_k^{a_k} (thus we are making the action of {\Bbb Z}^d on X explicit).

Exercise 7. Show that Theorem 6 and Theorem 7 are equivalent. \diamond

Exercise 8. State an analogue of Theorem 3 for polynomial recurrence for multiple commuting shifts, and prove that it is equivalent to Theorem 7. (Hint: first establish this in the case that each of the P_j are monomials, in which case there is enough dilation symmetry to use the Varnavides averaging trick. Interestingly, if one only restricts attention to one-dimensional systems k=1, it does not seem possible to deduce the uniform polynomial recurrence theorem from the non-uniform polynomial recurrence theorem, thus indicating that the averaging trick is less universal in its applicability than the correspondence principle.) \diamond

In the above theorems, the underlying action was given by either the integer group {\Bbb Z} or the lattice group {\Bbb Z}^d. It is not too difficult to generalise these results to the semigroups {\Bbb N} and {\Bbb N}^d (thus dropping the assumption that the shift maps are invertible), by using a trick similar to that used in Exercise 9 of Lecture 4, or by using the correspondence principle back and forth a few times. A bit more surprisingly, it is possible to extend these results to even weaker objects than semigroups. To describe this we need some more notation.

Define a partial semigroup (G, \cdot) to be a set G together with a partially defined multiplication operation \cdot: \Omega \to G for some subset \Omega \subset G \times G, which is associative in the sense that whenever (a \cdot b) \cdot c is defined, then a \cdot (b \cdot c) is defined and equal to (a \cdot b) \cdot c, and vice versa. A good example of a partial semigroup is the finite subsets \binom{S}{<\omega} := \{ A \subset S: |A| < \infty \} of a fixed set S, where the multiplication operation A \cdot B is disjoint union, or more precisely A \cdot B := A \cup B when A and B are disjoint, and A \cdot B is undefined otherwise.

Remark 2. One can extend a partial semigroup to be a genuine semigroup by adjoining a new element \hbox{err} to G, and redefining multiplication a \cdot b to equal \hbox{err} if it was previously undefined (or if one of a or b was already equal to \hbox{err}). However, we will avoid using this trick here, as it tends to complicate the notation a little. \diamond

One can take Cartesian products of partial semigroups in the obvious manner to obtain more partial semigroups. In particular, we have the partial semigroup \binom{{\Bbb N}}{<\omega}^d for any d \geq 1, defined as the collection of d-tuples (A_1,\ldots,A_d) of finite sets of natural numbers (not necessarily disjoint), with the partial semigroup law (A_1,\ldots,A_d) \cdot (B_1,\ldots,B_d) := (A_1 \cup B_1, \ldots,A_d \cup B_d) whenever A_i and B_i are disjoint for each 1 \leq i \leq d.

If (X, {\mathcal X},\mu) is a probability space and (G,\cdot) is a partial semigroup, we define a measure-preserving action of G on X to be an assignment of a measure-preserving transformation T^g: X \to X (not necessarily invertible) to each g \in G, such that T^{g \cdot h} = T^g T^h whenever g \cdot h is defined.

An action T of \binom{\Bbb N}{<\omega} on X is known as an IP system on X; it is generated by a countable number T_1, T_2, \ldots of commuting measure-preserving transformations, with T^A := \prod_{i \in A} T^i. (Admittedly, it is possible that the action of the empty set is not necessarily the identity, but this turns out to have a negligible impact on matters.) An action T of \binom{\Bbb N}{<\omega}^d is then a collection of d simultaneously commuting IP systems.

Furstenberg and Katznelson showed the following generalisation of Theorem 5:

Theorem 8 (IP multiple recurrence theorem) Let T be an action of \binom{\Bbb N}{<\omega}^d on a probability space (X, {\mathcal X},\mu). Then there exists a non-empty set A \in \binom{\Bbb N}{<\omega} such that E \cap (T^{A_1})^{-1}(E) \cap \ldots \cap (T^{A_d})^{-1}(E) is non-empty, where A_i := (\emptyset, \ldots, \emptyset, A, \emptyset, \ldots, \emptyset) is the group element which equals A in the i^{th} position and is the empty set otherwise.

It has a number of combinatorial consequences, such as the following strengthening of Szemerédi’s theorem:

Theorem 9. (IP Szemerédi theorem) Let A be a set of integers of positive upper density, let k \geq 1, and let B \subset {\Bbb N} be infinite. Then A contains an arithmetic progression n, n+r, \ldots, n+(k-1)r of length k in which r lies in FS(B), the set of finite sums of B (cf. Hindman’s theorem from Lecture 5).

(There is also a multidimensional version of this theorem, but it requires a fair amount of notation to state properly.)

Exercise 9. Deduce Theorem 9 from Theorem 8. \diamond

Exercise 10. Using Theorem 9, show that for any k, and any set of integers A of positive upper density, the set of steps r which occur in the arithmetic progressions in A of length k is syndetic. \diamond

Exercise 11. Using Theorem 8, show that if {\Bbb F} is a finite field, and {\Bbb F}^{<\omega} := \bigcup_{n=0}^\infty {\Bbb F}^n is the canonical vector space over {\Bbb F} spanned (in the algebraic sense) by a countably infinite number of basis vectors, show that any subset A of {\Bbb F}^{<\omega} of positive upper Banach density (which means that \limsup_{n \to \infty} |A \cap {\Bbb F}^n|/|{\Bbb F}^n| > 0 contains affine subspaces of arbitrarily high dimension. \diamond

The IP recurrence theorem is already very powerful, but even stronger theorems are known. For instance, Furstenberg and Katznelson established the following deep strengthening of the Hales-Jewett theorem (Theorem 8 from Lecture 5), as well as of Exercise 11 above:

Theorem 10 (Density Hales-Jewett theorem) Let A be a finite alphabet. If E is a subset of A^{<\omega} of positive upper Banach density, then E contains a combinatorial line.

This theorem was deduced (via an advanced form of the correspondence principle) by a somewhat complicated recurrence theorem which we will not state here; rather than the action of a group, semigroup, or partial semigroup, one instead works with an ensemble of sets (as in Exercise 3), and furthermore one regularises the system of the probability space and set ensemble (which can collectively be viewed as a random process) to be what Furstenberg and Katznelson call a strongly stationary process, which (very) roughly means that the statistics of this process look “the same” when restricted to any combinatorial subspace of a fixed dimension.

Remark 3. Similar correspondence principles can be established connecting property testing results for graphs and hypergraphs to the measure theory of exchangeable measures: see this paper of myself, and of myself and Austin, for details. There is also a correspondence principle connecting ergodic convergence theorems with a (rather complicated looking) finitary analogue; see the papers of Avigad-Gerhardy-Towsner and of myself on this subject. Finally, we have implicitly been using a similar correspondence principle between topological dynamics and colouring Ramsey theorems in our previous lectures (in particular Lecture 3, Lecture 4, and Lecture 5). \diamond

Remark 4. The Furstenberg correspondence principle also comes tantalisingly close to deducing my theorem with Ben that the primes contain arbitrarily long arithmetic progressions from Szemerédi’s theorem. More precisely, they show that any subset A of a genuinely random set of integers with logarithmic-type density B, with A having positive relative upper density with respect to B, contains arbitrarily long arithmetic progressions; see this unpublished note of myself. Unfortunately, the almost primes are not known to quite obey enough “correlation conditions” to behave sufficiently pseudorandomly that these arguments apply to the primes, though perhaps there is still a “softer” way to prove our theorem than the way we did it (there is for instance some recent work by Trevisan, Tulsiani, and Vadhan in this direction). \diamond