This month I am at MSRI, for the programs of Ergodic Theory and Additive Combinatorics, and Analysis on Singular Spaces, that are currently ongoing here.  This week I am giving three lectures on the correspondence principle, and on finitary versions of ergodic theory, for the introductory workshop in the former program.  The article here is broadly describing the content of these talks (which are slightly different in theme from that announced in the abstract, due to some recent developments).  [These lectures were also recorded on video and should be available from the MSRI web site within a few months.]

My lectures are devoted to the correspondence principle between finite dynamical systems and infinite dynamical systems, that allows one to convert certain statements about the former to logically equivalent statements about the latter.  (I will be vague here about what “dynamical system” means; very broadly, just about anything with a group action could qualify here.)  A little more specifically, the correspondence principle equates four types of results, which we informally describe as follows:

1. Local quantitative results for concrete finite systems.
2. Local qualitative results for concrete infinite systems.
3. Continuous quantitative results for abstract finite systems.
4. Continuous qualitative results for abstract infinite systems.

(The meaning of these terms should become clearer once we give some specific examples.)

There are many contexts in which this principle shows up (e.g. in Ramsey theory, recurrence theory, graph theory, group theory, etc.) but the basic ingredients are always the same.  Namely, the correspondence between Type 1 and Type 2 (or Type 3 and Type 4) arises from a weak sequential compactness property, which, roughly speaking asserts that given any sequence of (increasingly large) finite systems, there exists a subsequence of such systems which converges (in a suitably “weak” sense) to an infinite system.  [We will define these terms more precisely in concrete situations later.]  More informally, any “sufficiently large” finite system can be “approximated” in some weak sense by an infinite system.   (One can make this informal statement more rigorous using nonstandard analysis and/or ultrafilters, but we will not take such an approach here.)  Because of this, we obtain a correspondence principle: any qualitative statement about infinite systems (e.g. that a certain quantity is always strictly positive) is equivalent to a quantitative statement about sufficiently large finite systems (e.g. a certain quantity is always uniformly bounded from below).  This principle forms a crucial bridge between finitary (or quantitative) mathematics and infinitary (or qualitative) mathematics; in particular, by taking advantage of this principle, tools from one type of mathematics can be used to prove results in the other.  (See my previous post on soft analysis and hard analysis for further discussion.)

In addition to the use of compactness, the other key pillar of the correspondence principle is that of abstraction – the ability to generalise from a concrete system (on a very explicit space, e.g. the infinite cube $\{0,1\}^{\Bbb Z}$) to an more general abstract setting (e.g. an abstract dynamical system, measure-preserving system, group, etc.)  One of the reasons for doing this is that there are various maneuvres one can do in the abstract setting (e.g. passing from a system to a subsystem, a factor, or an extension, or by reasoning by analogy from other special systems that are different from the original concrete system) which can be quite difficult to execute or motivate if one stays within the confines of a single concrete setting.

We now turn to several specific examples of this principle in various contexts.  We begin with the more “combinatorial” or “non-ergodic theoretical” instances of this principle, in which there is no underlying probability measure involved; these situations are simpler than the ergodic-theoretic ones, but already illustrate many of the key features of this principle in action.

1.  The correspondence principle in Ramsey theory

We begin with the classical correspondence principle that connects Ramsey results about finite colourings, to Ramsey results about infinite colourings, or (equivalently) about the topological dynamics of covers of open sets.  We illustrate this by demonstrating the equivalence of three statements. The first two are as follows:

Theorem 1. (van der Waerden theorem, Type 2 formulation)  Suppose the integers are coloured by finitely many colours.  Then there exist arbitrarily long monochromatic arithmetic progressions.

Theorem 2. (van der Waerden theorem, Type 1 formulation) For every c and k there exists N such that whenever $\{1,\ldots,N\}$ is coloured by c colours, there exists a monochromatic arithmetic progression of length k.

It is easy to see that Theorem 2 implies Theorem 1.  Conversely, to deduce Theorem 2 from Theorem 1 we use the correspondence principle as follows.  Assume for contradiction that Theorem 1 was true, but Theorem 2 was false. Untangling the quantifiers, this means that there exist positive integers k, c, a sequence $N_n$ going to infinity, and colourings $\{1,\ldots,N_n\} = A_{n,1} \cup \ldots \cup A_{n,c}$ of $\{1,\ldots,N_n\}$ into c colours, none of which contain any monochromatic arithmetic progressions of length k.

By shifting the sets $\{1,\ldots,N_n\}$ and redefining $N_n$ a little, we can replace $\{1,\ldots,N_n\}$ by $\{-N_n,\ldots,N_n\}$.  This sequence of colourings on the increasingly large sets $\{-N_n,\ldots,N_n\}$.  One can now extract a subsequence of such colourings on finite sets of integers that converge pointwise or weakly to a colouring on the whole set of integers by the usual “Arzelà-Ascoli diagonalisation trick”.   Indeed, by passing to an initial subsequence (and using the infinite pigeonhole principle), one can ensure that all of these colourings eventually become a constant colour at 0; refining to another subsequence, we can ensure it is a constant colour at 1; then at -1, 2, -2, and so forth.  Taking a diagonal subsequence of these sequences, we obtain a final subsequence of finite colourings that converges pointwise to an infinite limit colouring.  By Theorem 1, this limit colouring contains an monochromatic arithmetic progression of length k.  Now note that the property of being monochromatic at this progression is a local one: one only needs to inspect the colour of finitely many of the integers in order to verify this property.  Because of this, this property of the infinite limit colouring will also be shared by the finite colourings that are sufficiently far along the converging sequence.  But we assumed at the very beginning that none of these finite colourings have a monochromatic arithmetic progression of length k, a contradiction, and the claim follows.

The above argument has all the basic ingredients of the correspondence principle in action: a proof by contradiction, use of weak compactness to extract an infinite limiting object, application of the infinitary result to that object, and checking that the conclusion of that result is sufficiently “finitary”, “local”, or “continuous” that it extends back to some of the finitary sequence, leading to the desired contradiction.  [It is essential that one manages to reduce to purely local properties before passing from the converging sequence to the limit, or vice versa, since non-local properties are usually not preserved by the limit. For instance, consider the colouring of $\{-N,\ldots,N\}$ which colours every integer between -N/2 and N/2 blue, and all the rest red.  Then this converges weakly to the all-blue colouring, and clearly the (non-local) property of containing at least one red element is not preserved by the limit.]

A key disadvantage of the use of the correspondence principle, though, is that it is quite difficult to extract specific quantitative bounds from any argument that uses this principle; for instance, while one can eventually “proof mine” the above argument (combined with some standard proof of Theorem 1) to eventually get a bound on N in terms of k and d, such a bound is extremely poor (of Ackermann function type).

Theorem 1 can be reformulated in a more abstract form:

Theorem 3. (van der Waerden theorem, Type 4 version)  Let X be a compact space, let $T: X \to X$ be a homeomorphism, and let $(V_\alpha)_{\alpha \in A}$ be an open cover of X.  Then for any k there exists a positive integer n and an open set $V_\alpha$ in the cover such that $V_\alpha \cap T^{-n} V_\alpha \cap \ldots T^{-(k-1)n} V_\alpha$ is non-empty.

The deduction of Theorem 3 from Theorem 1 is easy, after using the compactness to refine the open cover to a finite subcover, picking a point $x_0$ in X, and then colouring each integer n by the index $\alpha$ of the first open set $V_\alpha$ that contains $T^n x_0$. The converse deduction of Theorem 1 from Theorem 3 is the one which shows the “dynamical” aspect of this theorem: we can encode a colouring ${\Bbb Z} = A_1 \cup \ldots \cup A_c$ of the integers as a point $x_0 := (c_n)_{n \in {\Bbb Z}}$ in the infinite product space $\{1,\ldots,c\}^{\Bbb Z}$, where $c_n$ is the unique class such that $n \in A_{c_n}$ (indeed, one can think of this product space as the space of all c-colourings of the integers).  The infinite product space is compact with the product (or weak) topology used earlier, thus a sequence of colourings converge to a limit iff they converge locally (or pointwise).  This space also comes with the standard shift $T: (x_n)_{n \in {\Bbb Z}} \to (x_{n-1})_{n \in {\Bbb Z}}$ (corresponding to right shift on the space of colourings).  If we let X be the closure of the orbit $\{ T^n x_0: n \in {\Bbb Z} \}$, and let $V_1,\ldots,V_c$ be the open cover $V_i := \{ (x_n)_{n \in {\Bbb Z}}: x_0 = i \}$, it is straightforward to show that Theorem 3 implies Theorem 1.

2.  The correspondence principle for finitely generated groups

The combinatorial correspondence used above for colourings can also be applied to other situations, such as that of finitely generated groups.  Recall that if G is a group generated by a finite set S, we say that G has polynomial growth if there exists constants K, d such that the ball $B_r$ of radius r (i.e. the set of words in S of length at most r) has cardinality at most $Kr^d$.  Such groups were classified by a well-known theorem of Gromov:

Theorem 4. (Gromov’s theorem on polynomial growth, Type 4 version) Let G be a finitely generated group of polynomial growth.  Then G is virtually nilpotent (i.e. it has a finite index subgroup that is nilpotent).

As observed in Gromov’s original paper, this result is equivalent to a finitary version:

Theorem 5. (Gromov’s theorem on polynomial growth, Type 3)  For every integers s, K, d, there exists R such that any finitely generated group with s generators, such that $B_r$ has cardinality at most $Kr^d$ for all $1 \leq r \leq R$, is virtually nilpotent.

It is clear that Theorem 5 implies Theorem 4; for the converse implication, we use the correspondence principle.  We sketch the details as follows.  First, we make things more concrete (i.e. move from Type 3 and Type 4 to Type 1 and Type 2 respectively) by observing that every group G on s generators can be viewed as a quotient ${\Bbb F}_s/\Gamma$ of the (nonabelian) free group on s generators by some normal subgroup $\Gamma$.

Suppose for contradiction that Theorem 5 failed in this concrete setting; then there exists s, K, d, a sequence $R_n$ going to infinity, and a sequence $G_n = {\Bbb F}_s/\Gamma_n$ of groups such that each $G_n$ obeys the volume condition $|B_r| \leq K r^d$ for all $1 \leq r \leq R_n$.

The next step, as before, is to exploit weak sequential compactness and extract a subsequence of groups $G_n = {\Bbb F}_s/\Gamma_n$ that “converge” to some limit $G = {\Bbb F}_s/\Gamma$, in the “weak” or “pointwise” sense that $\Gamma_n$ converges pointwise (or locally) to $\Gamma$ (much as with the convergence of colourings in the previous setting).  The Arzelà-Ascoli argument as before shows that we can find a subsequence of $G_n = {\Bbb F}_s/\Gamma_n$ which do converge pointwise to some limit object $G = {\Bbb F}_s/\Gamma$; one can check that the property of being a normal subgroup is sufficiently “local” that it is preserved under limits, thus $\Gamma$ is a normal subgroup of ${\Bbb F}_s$ and so G is well-defined as a group.  (One way to view this convergence is that algebraic identity obeyed by the generators of G, is eventually obeyed by the groups sufficiently far along the convergent subsequence, and conversely.)

As volume growth is a local condition (involving only words of bounded length for any fixed r), we then easily conclude that G is of polynomial growth, and thus by Theorem 4 is virtually nilpotent.  Some nilpotent algebra reveals that every virtually nilpotent group is finitely presented, so in particular there are a finite list of relations among the generators which guarantee this virtual nilpotency property.  Such properties are local enough that they must then persist to groups $G_n$ sufficiently far along the subsequence, contradicting Theorem 5.

A slight modification of the above argument also reveals that the step and index of the nilpotent subgroup of G can be bounded by some constant depending only on K, d, s; this gives Theorem 5 meaningful content even when G is finite (in contrast to Theorem 4, which is trivial for finite groups).  On the other hand, no explicit bound for this constant (or for R) in terms of s, K, d is currently known, though presumably some such bound can eventually be extracted from the above argument and the existing proofs of Gromov’s theorem by proof mining techniques.

3.  The correspondence principle for dense sets of integers

Now we turn to the more “ergodic” variants of the correspondence principle, starting with the fundamental Furstenberg correspondence principle connecting combinatorial number theory with ergodic theory.  We will illustrate this with the classic example of Szemerédi’s theorem.

There are many finitary versions of Szemerédi’s theorem.  Here is one:

Theorem 6. (Szemerédi’s theorem, Type 1 version) Let $k \geq 2$ and $0 < \delta \leq 1$.  Then there exists a positive integer $N = N(\delta,k)$ such that every subset A of $\{1,\ldots,N\}$ with $|A| \geq \delta N$ contains at least one k-term arithmetic progression.

The standard “Type 2” formulation of this theorem is the assertion that any subset of the integers of positive upper density has arbitrarily long arithmetic progressions.  While this statement is indeed easily shown to be equivalent to Theorem 6, the Furstenberg correspondence principle instead connects this formulation to a rather different one, in which the deterministic infinite set is replaced by a random one. Recall that a random subset of integers ${\Bbb Z}$ is a random variable A taking values in the power set $2^{\Bbb Z}$ of the integers (or more formally, with a distribution that is a Borel probability measure on $2^{\Bbb Z}$ with the product topology), and so in particular the probabilities of any cylinder events such as

${\Bbb P}( 3, 5 \in A; 7, 11 \not \in A )$

that involve only finitely many of the elements of A, are well-defined as numbers between 0 and 1.  The Carathéodory extension theorem (combined with some topological properties of $2^{\Bbb Z}$) shows, conversely, that any assignment of numbers between 0 and 1 to each cylinder set, which obeys various compatibility conditions such as

${\Bbb P}( 3 \in A ) = {\Bbb P}( 3, 5 \in A ) + {\Bbb P}( 3 \in A; 5 \not \in A )$

can be shown to give rise to a well-defined random set A.

We say that a random set A of integers is stationary if for every integer h, the shifted set A+h has the same probability distribution as A.  In terms of cylinder events, this is equivalent to a collection of assertions such as

${\Bbb P}( 3, 5 \in A; 7, 11 \not \in A ) = {\Bbb P}( 3-h, 5-h \in A; 7-h, 11-h \not \in A )$

and so forth.  One can then equate Theorem 6 with

Theorem 7. (Szemerédi’s theorem, Type 2 version) Let A be a stationary random infinite set of integers such that ${\Bbb P}(0 \in A) > 0$ (which, by stationarity, implies that ${\Bbb P}(n \in A) > 0$ for all n), and let $k \geq 2$.  Then, with positive probability, A contains a k-term arithmetic progression for each k.

It is not difficult to show that Theorem 6 implies Theorem 7.  We briefly sketch the converse implication, which goes through the usual compactness-and-contradiction method.  Suppose for contradiction that Theorem 7 is true, but Theorem 6 fails.  Then we can find k and $\delta$, a sequence of $N_n$ going to infinity, and sets $A_n \subset \{1,\ldots,N_n\}$ with $|A_n| \geq \delta N_n$ with no k-term arithmetic progressions.

We now need to extract a stationary random infinite set A of integers as a limit of the deterministic finite sets $A_n$.  The way one does that is by randomly translating each of the $A_n$.  More precisely, let $B_n$ denote the random finite set $A_n - h$, where $h$ is chosen from $\{1,\ldots,N_n\}$ uniformly at random.  The probability distribution $\mu_n$ of $B_n$ is a discrete probability measure on $2^{\Bbb Z}$ which is “almost stationary” in the sense that $B_n+1$ (say) has a distribution very close to $B_n$; for instance probabilities such as ${\Bbb P}( 3, 5 \in B_n )$ and ${\Bbb P}(3,5 \in B_n+1)$ can easily be seen to differ only by $O(1/N_n)$.  Also, the fact that $|A_n| \geq \delta N_n$ equates to the assertion that ${\Bbb P}(0 \in B_n) \geq \delta$.

By using the same type of Arzelà-Ascoli argument as before, we can show that some subsequence of the random variables $B_n$ converge weakly to a limit $B$ in the sense that the cylinder probabilities of $B_n$ converge to those of B along this subsequence; thus for instance

$\lim_{n \to \infty} {\Bbb P}(3, 5 \in B_n; 17 \not \in B_n) = {\Bbb P}(3, 5 \in B; 17 \not \in B)$.

(To get from the cylinder probabilities back to a random variable, one uses the Carathéodory extension theorem.  Weak convergence (or more precisely, weak-* convergence) of measures is also known as vague convergence, or convergence in distribution.)

Since the $B_n$ are approximately stationary, one can show that $B$ is exactly stationary.  Since ${\Bbb P}( 0 \in B_n)$ is bounded uniformly away from zero, one can show that ${\Bbb P}(0 \in B)$ is positive.  Thus, we can apply Theorem 7 to show that B contains a k-term arithmetic progression with positive probability.  Since there are only countably many k-term arithmetic progressions, the countable pigeonhole principle then tells us that there exists some k-term arithmetic progression $a, n+r, \ldots, a+(k-1) r$ which lies in B with positive probability.  This is a “local” property on B.  By weak convergence, this means that this same is true for $B_n$ for n sufficiently far along this subsequence; in particular, the corresponding deterministic sets $A_n$ contain k-term arithmetic progressions, a contradiction.  Thus Theorem 7 does imply Theorem 6.

Much as Theorem 1 is equivalent to Theorem 4, Theorem 7 can be reformulated in a more abstract manner, known as the Furstenberg recurrence theorem:

Theorem 8. (Szemerédi’s theorem, Type 4 version) Let $(X, \mu, T)$ be a probability space with a measure-preserving bimeasurable map $T: X \to X$ (thus T is invertible, and $T, T^{-1}$ are measurable and measure-preserving), and let $A \subset X$ have positive measure $\mu(A) > 0$.  Then there exists $r > 0$ such that $\mu( A \cap T^{-r} A \cap \ldots \cap T^{(k-1)r} A) > 0$.

We leave the equivalence of Theorem 8 with Theorems 6,7 as an exercise.

4.  The correspondence principle for dense sets of integers II.

The deduction of Theorem 6 from Theorem 7, gives that the set A appearing in Theorem 6 has at least one k-term arithmetic progression, but if one inspects the argument more carefully, one sees that in fact one has a stronger statement that if N is large enough, there exists some $1 \leq r \leq C( k, \delta )$ such that A contains at least $c(k,\delta) N$ k-term arithmetic progressions $a, a+r, \ldots, a+(k-1)r$ of step r.  We leave this derivation as an exercise.

It is possible however to find even more progressions in the set A:

Theorem 8′. (Szemerédi’s theorem, Varnavides-type version) Let $k \geq 2$ and $0 < \delta \leq 1$.  Then there exists a positive integer $N_0 = N_0(\delta,k)$ and $c = c(k,\delta) > 0$ such that for every $N \geq N_0$ and every subset A of $\{1,\ldots,N\}$ with $|A| \geq \delta N$ contains at least $c N^2$  k-term arithmetic progressions.

This can be obtained from Theorem 7 by repeating the derivation of Theorem 6 with two additional twists.  Firstly, it is not difficult to modify the $N_n$ appearing in this derivation to be prime (for instance, by appealing to Bertrand’s postulate).  This allows us to identify $\{1,\ldots,N_n\}$ with the finite field ${\Bbb Z}/N_n {\Bbb Z}$, giving us the ability to dilate within this set as well as translate.  For technical reasons it is also convenient to restrict $A_n$ to lie in the bottom half $\{1,\ldots,\lfloor N_n/2\rfloor\}$ of this set, which is also easy to arrange.  We then argue as before, but with the randomly translated set $B_n := A_n - h$ replaced by the randomly translated and dilated set $B_n := (A_n-h) \cdot r$, where h and r are independently chosen at random from this finite field.  If one does this, one finds that probabilities such as ${\Bbb P}( 0, 1, \ldots, k-1 \in B_n )$ are essentially equal to the number of k-term arithmetic progressions in $A_n$, divided by $N_n^2$ (the restriction of $A_n$ to the bottom half of $\{1,\ldots,N_n\}$ is necessary to avoid certain “wraparound” issues).  If one then repeats the previous arguments one can establish Theorem 8′ from Theorem 7.

This type of argument was implicitly first introduced by Varnavides.  Basically, this argument exploits the affine invariance (i.e. $\hbox{Aff}({\Bbb Z})$ invariance) of the space of arithmetic progressions, as opposed to mere translation invariance (i.e. ${\Bbb Z}$ invariance).

One can rephrase Theorem 8 in a quantitative ergodic formulation, essentially due to Bergelson, Host, McCutcheon, and Parreau:

Theorem 9. (Szemerédi’s theorem, Type 3 version) Let $k \geq 2$ and $0 < \delta \leq 1$.  Then there exists $c(k,\delta) > 0$ such that for every measure-preserving system $(X,\mu,T)$ and any measurable set A with $\mu(A) > \delta$, we have $\liminf_{N \to \infty} {\Bbb E}_{n \in \{1,\ldots,N\}} \mu( A \cap T^{-n} A \cap \ldots \cap T^{-(k-1)n} A) \geq c(k,\delta)$.

5.  The correspondence principle for sparse sets of integers

It is possible to squeeze even more finitary results out of Theorem 7 than was done in the previous two sections.  In particular, one has the following relative version of Szemerédi’s theorem:

Theorem 10. (Relative Szemerédi theorem) Let $k \geq 2$ and $0 < \delta \leq 1$.  Let N be a sufficiently large integer, and let R be a “sufficiently pseudorandom” subset of $\{1,\ldots,N\}$.  Then every subset A of R with $|A| \geq \delta |R|$ contains one k-term arithmetic progression.

I did not define above what “sufficiently pseudorandom” meant as it is somewhat technical, but very roughly speaking it is a package of approximate independence conditions which include things like

${\Bbb E}_{n,h,k \in [N]} \nu(n) \nu(n+h) \nu(n+k) \nu(n+h+k)=1+o(1)$, (1)

where $\nu(n) := \frac{N}{|R|} 1_R(n)$ is the normalised indicator function of R, and all arithmetic operations are taking place in the cyclic group ${\Bbb Z}/N{\Bbb Z}$.

The point of Theorem 10 is that it allows one to detect arithmetic progressions inside quite sparse sets of integers (typically, R will have size about $N/\log N$, so A and R would be logarithmically sparse).  In particular, a relative Szemerédi theorem similar to this one (but somewhat stronger) was a key ingredient in my result with Ben Green that the primes contain arbitrarily long arithmetic progressions.  (Theorem 10 is unfortunately insufficient for this task, because the amount of pseudorandomness needed here depends on $\delta$; the relative Szemerédi’s theorem developed in my paper with Ben only needs a number of pseudorandomness conditions that depend on k but – crucially – not on $\delta$.)

The derivation of Theorem 10 from Theorem 7 is discussed in this unpublished note of mine.  We sketch the main ideas here.  Once again, we argue by contradiction.  If Theorem 10 failed, then one can find k and $\delta$, a sequence $N_n$ going to infinity, a sequence $R_n$ of “increasingly pseudorandom” subsets of $\{1,\ldots,N_n\}$, and sets $A_n \subset R_n$ with $|A_n| \geq \delta |R_n|$ such that none of the $A_n$ contain k-term arithmetic progressions.

As before, it is not difficult to ensure $N_n$ to be prime, and that $A_n$ lives in the bottom half $R_n \cap \{1,\ldots,\lfloor N_n/2\lfloor\}$ of $R_n$.  We then create the random translated and dilated set $B_n := (A_n - h) \cdot r$ as before; note that $B_n$ still has no k-term arithmetic progressions (except in the degenerate case r=0, but this case is extremely rare and becomes irrelevant in the limit).  We can also randomly translated and dilate $R_n$ by the same parameters to obtain a random set $S_n := (R_n - r) \cdot r$; thus $B_n$ is a relatively dense subset of the random (and potentially quite sparse) set $S_n$.

In the previous two sections, we looked at the (Borel) probability measure $\mu_n$ on the power set $2^{\Bbb Z}$ formed by the distribution of $B_n$, which can be viewed as a collection of cylinder statistics such as

$\mu_n( C(3, 5, \overline{17}) ) = {\Bbb P}( 3, 5 \in B_n; 17 \not \in B_n )$

where $C(3,5,\overline{17})$ is the cylinder set $\{ A \subset {\Bbb Z}: 3, 5 \in A; 17 \not \in A \}$.  In order for these statistics to actually arise from a Borel probability measure, these statistics have to be numbers lying between 0 and 1, and they have to obey compatibility conditions such as

$\mu_n( C(3, \overline{17}) ) = \mu_n( C(3, 5, \overline{17}) ) + \mu_n( C(3, \overline{5}, \overline{17}) ),$

and also

$\mu_n( 2^{\Bbb Z}) = \mu_n(C()) = 1$.

Conversely, any set of non-negative numbers obeying these properties will give a Borel probability measure, thanks to the Carathéodory extension theorem.

In the sparse case, this approach does not work, because $\mu_n$ degenerates in the limit if $R_n$ is sparse.  For instance, because $B_n$ is so sparse, the probability ${\Bbb P}( 3, 5 \in B_n; 17 \not \in B_n )$ can go to zero; indeed, we expect this quantity to look like $O( |R_n|/ N )^2$.  To fix this we do two things.  Firstly, we replace the absolute complement $\overline{B_n} = \{1,\ldots,N_n\} \backslash B_n$ that implicitly appears above by the relative complement $S_n$.  Secondly, we introduce the normalising factor $N_n/|R_n|$, so for instance the cylinder set $C( 3, 5, \overline{17} )$ will now be assigned the normalised weight

$\displaystyle \mu_n( C(3, 5, \overline{17}) ) = (\frac{N_n}{|R_n|})^3 {\Bbb P}( 3, 5 \in B_n; 17 \in S_n \backslash B_n )$

and similarly for other cylinder sets.  Perhaps more suggestively, we have

$\mu_n( C(3, 5, \overline{17}) ) ={\Bbb E}_{a, r \in {\Bbb Z}/N_n {\Bbb Z}} \Lambda_n(a+3r) \Lambda_n(a+5r) (\nu_n-\Lambda_n)(a+17r)$

where $\Lambda_n := \frac{N_n}{|R_n|} 1_{A_n}$ and $\nu_n := \frac{N_n}{|R_n|} 1_{R_n}$.

This gives us a non-negative number assigned to every cylinder set, but unfortunately these numbers do not obey the compatibility conditions required to make these numbers arise from a probability measure.  However, if one assumes enough pseudorandomness conditions such as (1), one can show that the compatibility conditions are satisfied approximately, thus for instance

$\mu_n( C(3, \overline{17}) ) = \mu_n( C(3, 5, \overline{17}) ) + \mu_n( C(3, \overline{5}, \overline{17}) ) + o(1)$

or equivalently

${\Bbb E}_{a,r \in {\Bbb Z}/N_n {\Bbb Z}} \Lambda(a+3r) (\nu-1)(a+5r) (\nu-\Lambda)(a+17r) = o(1).$

These conditions can be checked using a large number of applications of the Cauchy-Schwarz inequality, which we omit here.  Thus, $\mu_n$ is not a true probability measure, but is some sort of approximate “virtual probability measure”.  It turns out that these virtual probability measures enjoy the same crucial weak compactness property as actual probability measures, and one can repeat all the previous arguments to deduce Theorem 10 from Theorem 7.

6.  The correspondence principle for graphs

In the previous three sections we considered the correspondence principle in the integers ${\Bbb Z}$.  It is not difficult to replace the integers with other amenable groups, such as ${\Bbb Z}^d$ for some fixed d.  Now we discuss a somewhat different-looking instance of the correspondence principle, coming from graph theory, in which the underlying group is now the (small) permutation group $\hbox{Perm}_0({\Bbb Z})$ consisting of those permutations $\sigma: {\Bbb Z} \to {\Bbb Z}$ which permute only finitely many elements.  (It is convenient to work with this group, rather than the entire group of permutations, as it is countable.)  We illustrate the graph correspondence principle with the following old result of Ruzsa and Szemerédi:

Theorem 11. (Triangle removal lemma, Type 3 version) For every $\epsilon > 0$ there exists $\delta > 0$ and $N_0 > 0$ such that for any $G = (V,E)$ be a graph on N vertices for some $N \geq N_0$ which has at most $\delta N^3$ triangles, one can remove at most $\epsilon N^2$ edges from the graph to make the graph triangle free.

We will not discuss why this theorem is of interest, other than to mention in passing that it actually implies the k=3 case of Szemerédi’s theorem.  (See also my previous blog posts on related topics.)   It turns out that this result can be deduced from some infinitary versions of this lemma.  Here is one instance:

Theorem 12. (Triangle removal lemma, Type 2 version) Let G be a random graph on the integers ${\Bbb Z}$ which is exchangeable in the sense that any permutation of G has the same distribution as G.  (This is the analogue of stationarity in this setting.)  Suppose that G is almost surely triangle free, and let $\varepsilon > 0$.  Then there exists a continuous function F from the space $2^{\binom{\Bbb Z}{2}}$ of graphs on ${\Bbb Z}$ (with the product topology) to the space $2^{\binom{\Bbb N}{2}}$ of graphs on ${\Bbb N}$, which is equivariant with respect to permutations of ${\Bbb N}$, such that $G' := F(G)$ is surely (not just almost surely) a subgraph of G which is triangle free, and such that ${\Bbb P}( (1,2) \in G \backslash G' ) \leq \varepsilon$.

Theorem 13. (Triangle removal lemma, Type 3 version) Let $(X, \mu)$ be a probability space with a measure-preserving action of $\hbox{Perm}_0({\Bbb Z})$, and let $E_{12}, E_{23}, E_{31}$ be three measurable sets which are invariant under the stabiliser of $\{1,2\}$, $\{2,3\}$, and $\{3,1\}$ respectively.  Suppose that $E_{12} \cap E_{23} \cap E_{31}$ has measure zero.  Then one can find subsets $E'_{12}, E'_{23}, E'_{31}$ respectively of $E_{12}, E_{23}, E_{31}$, which remain invariant under the stabilisers of $\{1,2\}$, $\{2,3\}$, and $\{3,1\}$ respectively, such that $E'_{12} \cap E'_{23} \cap E'_{31}$ is empty (and not just measure zero).

The proofs that Theorem 12 and Theorem 13 imply Theorem 11 are somewhat technical, but in the same spirit as the previous applications of the correspondence principle; see this paper (or these blog posts) and this paper respectively for details.  In fact they prove slightly stronger statements than Theorem 11, in that they give a bit more information as to how the triangle-free graph G’ is obtained from the nearly triangle-free graph G.  The same methods also apply to hypergraphs without much further difficulty, as is done in the above papers, but we will not discuss the details here.

7.  The correspondence principle over finite fields

The correspondence principle can also be applied quite effectively to the finite field setting, which is a dyadic model for the integer setting.  In particular, Tamar Ziegler and I have very recently shown the equivalence of the following two results:

Theorem 14. (Inverse Gowers conjecture for finite fields, Type 1 version)  Let F be a finite field, let $k \geq 2$, let $\delta > 0$, and let $f: F^n \to {\Bbb C}$ be a function on some vector space $F^n$ bounded in magnitude by 1, and such that the Gowers norm

$\|f\|_{U^k(F^n)} := ({\Bbb E}_{h_1,\ldots,h_k,x \in F^n} \Delta_{h_1} \ldots \Delta_{h_k} f(x))^{1/2^k}$

is larger than $\delta$, where $\Delta_h f(x) := f(x+h) \overline{f(x)}$. Then there exists a function $\phi: F^n \to S^1$ which is a phase polynomial of degree at most k in the sense that $\Delta_{h_1} \ldots \Delta_{h_k} \phi(x) = 1$ for all $h_1,\ldots,h_k,x \in F^n$ (or equivalently, that $\|\phi\|_{U^k(F^n)} = 1$), such that we have the correlation $|{\Bbb E}_{x \in F^n} f(x) \overline{\phi(x)}| \geq c(F,k,\delta)$ for some $c(F,k,\delta) > 0$ independent of n.

Theorem 15. (Inverse Gowers conjecture for finite fields, Type 4 version)  Let $(X,\mu)$ be a probability space, let F be a finite field, and let $g \mapsto T_g$ be a measure-preserving action of the infinite group $F^\omega := \lim_{\leftarrow} F^n$.  Let $f \in L^\infty(X)$ be such that the Gowers-Host-Kra seminorm

$\|f\|_{U^k(X)} := \lim_{n \to \infty} ({\Bbb E}_{h_1,\ldots,h_k \in F^n} \int_X \Delta_{h_1} \ldots \Delta_{h_k} f \mu)^{1/2^k}$

is positive, where $\Delta_h f(x) := f( T^h x ) \overline{f(x)}$.  Then there exists $\phi: X \to S^1$ which is a phase polynomial in the sense that $\Delta_{h_1} \ldots \Delta_{h_k} \phi = 1$ a.e., and which correlates with f in the sense that $\int_X f \overline{\phi}\ d\mu \neq 0$.

In a forthcoming paper of Bergelson, Ziegler, and myself (which I hope to discuss here on this blog in the near future), Theorem 15 was established, which by the correspondence principle alluded to above, implies Theorem 14.  (See my previous blog post for some discussion of Theorem 14, which had been conjectured for a few years.  Interestingly, this result also holds in the low characteristic case, for which (as discussed in that post) the conjecture was believed to fail.  The issue here is a little subtle: counterexamples are known if the phase polynomial $\phi$ is constrained to be an $|F|^{th}$ root of unity, but with the above results we now know that these counterexamples are no longer present when the phase polynomial is allowed to range freely in $S^1$.)

I will try to discuss these things in more detail in a later post, but very roughly speaking the reason why the correspondence principle is more effective here than on the integers is because the vector space $F^n$ enjoys a massively transitive action of the general linear group $GL(F^n)$ that mixes things around in a manner much stronger than even the affine action $\hbox{Aff}({\Bbb Z})$ mentioned earlier (which is basically 2-transitive but not k-transitive for any higher k).

8.  The correspondence principle for convergence of ergodic averages

My final instance of the correspondence principle goes in the opposite direction from previous instances.  In the preceding seven instances, the interesting aspect of the principle was that one could use a qualitative result about infinite systems to deduce a quantitative result about finite systems.  Here, we will do the reverse: we show how a result about infinite systems can be deduced from one from a finite system.  We will illustrate this with a very simple result from ergodic theory, the mean ergodic theorem:

Theorem 16 (Mean ergodic theorem, Type 4 version) Let $(X,\mu,T)$ be a measure-preserving system, and let $f \in L^2(X)$.  Let $S_N$ be the averaging operators $S_N f := {\Bbb E}_{1 \leq n \leq N} T^n f$.  Then $S_N f$ is a convergent sequence in $L^2(X)$ as $N \to \infty$.

This is of course a well-known result with many short and elegant proofs; the proof method that we sketch here (essentially due to Avigad, Gerhardy, and Towsner) is lengthier and messier than the purely infinitary proofs, but can be extended to some situations in which it had been difficult to proceed in an infinitary manner (in particular on my result on convergence for multiple commuting transformations, as discussed in this post of mine).

The basic problem with finitising this theorem is that there is no uniform rate of convergence in the mean ergodic theorem: given any $\epsilon > 0$, we know that the averages $S_N f$ eventually lie within $\varepsilon$ of their limit for N large enough, but it is known that the N we need for this is not uniform in the choice of the system $(X,\mu,T)$ or the function f, and can indeed by arbitrarily large for given $\varepsilon$ even after fixing the size of f.  So a “naive” finitisation does not work, much as a naive finitisation of the infinite convergence principle (every bounded monotone sequence converges), as discussed in this post.

The resolution is in fact very similar to that discussed in the above-mentioned post.  Observe that if $x_1, x_2, \ldots$ is any sequence in a complete metric space (e.g. the real line, or $L^2(X)$), the statement that “$x_n$” converges”, or equivalently that “$x_n$ is Cauchy“, is equivalent to

For every $\varepsilon > 0$, there exists n such that for all sufficiently large m, $d(x_n,x_m) \leq \varepsilon$,

which is in turn equivalent to

For every $\varepsilon > 0$ there exists $F_0: {\Bbb N} \to {\Bbb N}$ such that for every $F: {\Bbb N} \to {\Bbb N}$ that grows faster than $F_0$ in the sense that $F(n) > F_0(n)$ for all n, one has $d(x_n,x_{F(n)}) \leq \varepsilon$ for some n.

The point here is that once the function F is selected, one only has to verify the closeness of a single pair of elements in the sequence, rather than infinitely many.  This makes it easier to finitise the convergence statement effectively.  Indeed, Theorem 16 is easily seen (by another compactness and contradiction argument) to be equivalent to

Theorem 17 (Mean ergodic theorem, Type 3 version) For every $\varepsilon > 0$ and every sufficiently rapid F (i.e. F grows faster than some $F_0$ depending on $\varepsilon$) there exists N such that for every measure-preserving system $(X,\mu,T)$ and every $f \in L^2(X)$ with $\|f\|_{L^2(X)} \leq 1$, we have $\|S_n f - S_{F(n)} f \|_{L^2(X)} \leq \varepsilon$ for some $1 \leq n \leq N$.

Note that this theorem is quantitative in the sense that N depends only on $\varepsilon$ and F, and not on the underlying system; indeed one can give an explicit value for N, arising from iterating F about $1/\varepsilon^2$ times.  This result is essentially proven as Corollary 2 of my 254A lecture notes on this topic.