Having studied compact extensions in the previous lecture, we now consider the opposite type of extension, namely that of a weakly mixing extension. Just as compact extensions are “relative” versions of compact systems, weakly mixing extensions are “relative” versions of weakly mixing systems, in which the underlying algebra of scalars {\Bbb C} is replaced by L^\infty(Y). As in the case of unconditionally weakly mixing systems, we will be able to use the van der Corput lemma to neglect “conditionally weakly mixing” functions, thus allowing us to lift the uniform multiple recurrence property (UMR) from a system to any weakly mixing extension of that system.

To finish the proof of the Furstenberg recurrence theorem requires two more steps. One is a relative version of the dichotomy between mixing and compactness: if a system is not weakly mixing relative to some factor, then that factor has a non-trivial compact extension. This will be accomplished using the theory of conditional Hilbert-Schmidt operators in this lecture. Finally, we need the (easy) result that the UMR property is preserved under limits of chains; this will be accomplished in the next lecture.

— Conditionally weakly mixing functions —

Recall that in a measure-preserving system X = (X, {\mathcal X}, \mu, T), a function f \in L^2(X) = L^2(X,{\mathcal X},\mu) is said to be weakly mixing if the squared inner products |\langle T^n f, f \rangle_X|^2 := |\int_X T^n f \overline{f}\ d\mu|^2 converge to zero in the Cesàro sense, thus

\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} |\int_X T^n f \overline{f}\ d\mu|^2 = 0. (1)

Now let Y = (Y, {\mathcal Y}, \nu, S) be a factor of X, so that L^\infty(Y) can be viewed as a subspace of L^\infty(X). Recall that we have the conditional inner product \langle f, g \rangle_{X|Y} := {\Bbb E}( f \overline{g}|Y) and the Hilbert module L^2(X|Y) of functions f for which \langle f, f \rangle_{X|Y} lies in L^\infty(Y). We shall say that a function f \in L^2(X|Y) is conditionally weakly mixing relative to Y if the squared L^2 norms \|\langle T^n f, f \rangle_{X|Y}\|_{L^2(Y)}^2 converge to zero in the Cesàro sense, thus

\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X |{\Bbb E}(T^n f \overline{f}|Y)|^2\ d\mu = 0. (2)

Example 1. If X = Y \times Z is a product system of the factor space Y = (Y, {\mathcal Y}, \nu, S) and another system Z = (Z, {\mathcal Z}, \rho, R), then a function f(y,z) = f(z) of the vertical variable z \in Z is weakly mixing relative to Y if and only if f(z) is weakly mixing in Z. \diamond
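To make Example 1 concrete, here is a short verification sketch. It assumes the convention T^n f := f \circ T^n, and that X = Y \times Z carries the product measure \nu \times \rho with T = S \times R; the example does not spell these out, so they should be read as assumptions.

```latex
% Sketch for Example 1, assuming T = S \times R, \mu = \nu \times \rho,
% and T^n f = f \circ T^n, with f(y,z) = f(z) depending only on z.
\begin{align*}
{\Bbb E}( T^n f\, \overline{f} \,|\, Y )(y)
  &= \int_Z f(R^n z)\, \overline{f(z)}\ d\rho(z)
   = \langle R^n f, f \rangle_Z, \\
\int_Y | {\Bbb E}( T^n f\, \overline{f} \,|\, Y ) |^2\ d\nu
  &= | \langle R^n f, f \rangle_Z |^2 .
\end{align*}
```

Since the conditional expectation is constant in y, the Cesàro average in (2) agrees term by term with the average in (1) computed in the system Z.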

Much of the theory of weakly mixing systems extends easily to the conditionally weakly mixing case. For instance:

Exercise 1. By adapting the proof of Corollary 2 from Lecture 12, show that if f \in L^2(X|Y) is conditionally weakly mixing and g \in L^2(X|Y), then \|\langle T^n f, g \rangle_{X|Y}\|_{L^2(Y)}^2 and \|\langle f, T^n g \rangle_{X|Y}\|_{L^2(Y)}^2 converge to zero in the Cesàro sense. (Hint: you will need to show that expressions such as \langle g, T^n f \rangle_{X|Y} T^n f converge in L^2(X) in the Cesàro sense. Apply the van der Corput lemma and use the fact that \langle g, T^n f \rangle_{X|Y} are uniformly bounded in L^\infty(Y) by conditional Cauchy-Schwarz.) \diamond

Exercise 2. Show that the space of conditionally weakly mixing functions in L^2(X|Y) is a module over L^\infty(Y) (i.e. it is closed under addition and multiplication by the “scalars” L^\infty(Y)), which is also shift-invariant and topologically closed in the topology of L^2(X|Y) (see Exercise 2 from Lecture 13). \diamond

Let us now see the first link between conditional weak mixing and conditional almost periodicity (cf. Exercise 18 from Lecture 12):

Lemma 1. If f \in L^2(X|Y) is conditionally weakly mixing and g \in L^2(X|Y) is conditionally almost periodic, then \langle f, g \rangle_{X|Y} = 0 a.e.

Proof. Since \langle f,g \rangle_{X|Y} = T^{-n} \langle T^n f, T^n g \rangle_{X|Y}, it will suffice to show that

C\!-\!\sup_{n \to \infty} \|\langle T^n f, T^n g \rangle_{X|Y}\|_{L^2(Y)} = 0. (3)

Let \varepsilon > 0 be arbitrary. As g is conditionally almost periodic, one can find a finitely generated module zonotope \{ c_1 f_1 + \ldots + c_d f_d: \| c_1\|_{L^\infty(Y)},\ldots,\|c_d\|_{L^\infty(Y)} \leq 1 \} with f_1,\ldots,f_d \in L^2(X|Y) such that all the shifts T^n g lie within \varepsilon (in L^2(X|Y)) of this zonotope. Thus (by conditional Cauchy-Schwarz) we have

\| \langle T^n f, T^n g \rangle_{X|Y} \|_{L^2(Y)} = \| \langle T^n f, c_{1,n} f_1 + \ldots + c_{d,n} f_d \rangle_{X|Y} \|_{L^2(Y)} + O(\varepsilon) (4)

for all n and some c_{1,n},\ldots,c_{d,n} \in L^\infty(Y) with norm at most 1. We can pull these constants out of the conditional inner product and bound the left-hand side of (4) by

\|\langle T^n f, f_1 \rangle_{X|Y}\|_{L^2(Y)} + \ldots + \| \langle T^n f, f_d \rangle_{X|Y}\|_{L^2(Y)} + O(\varepsilon). (5)

By Exercise 1, the Cesàro supremum of (5) is at most O(\varepsilon). Since \varepsilon is arbitrary, the claim (3) follows. \Box
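For readers who want the estimates behind (4) and (5) written out, the following is a sketch rather than a replacement for the proof. Here h_n := c_{1,n} f_1 + \ldots + c_{d,n} f_d denotes the zonotope element chosen for T^n g, M is an L^\infty(Y) bound for \langle f, f \rangle_{X|Y}^{1/2}, and a_{j,n} abbreviates \|\langle T^n f, f_j \rangle_{X|Y}\|_{L^2(Y)}. The names h_n, M and a_{j,n} are introduced only for this illustration, and distance in L^2(X|Y) is assumed to mean the L^\infty(Y) norm of the conditional norm.

```latex
\begin{align*}
% (4): conditional Cauchy-Schwarz controls the error from replacing T^n g by h_n
\| \langle T^n f, T^n g - h_n \rangle_{X|Y} \|_{L^2(Y)}
  &\leq \big\| \langle T^n f, T^n f \rangle_{X|Y}^{1/2}\,
        \langle T^n g - h_n, T^n g - h_n \rangle_{X|Y}^{1/2} \big\|_{L^2(Y)}
   \leq M \varepsilon, \\
% (5): coefficients in L^\infty(Y) come out of the second slot conjugated
\langle T^n f, c_{j,n} f_j \rangle_{X|Y}
  &= \overline{c_{j,n}}\, \langle T^n f, f_j \rangle_{X|Y},
  \qquad |c_{j,n}| \leq 1 \text{ a.e.}, \\
% Exercise 1 controls the squares; Cauchy-Schwarz in n passes to first powers
\frac{1}{N} \sum_{n=0}^{N-1} a_{j,n}
  &\leq \Big( \frac{1}{N} \sum_{n=0}^{N-1} a_{j,n}^2 \Big)^{1/2} \to 0 .
\end{align*}
```

The last line is the reason Exercise 1, which is stated for squared norms, also controls the unsquared sum in (5).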

Since all functions in L^\infty(Y) are conditionally almost periodic, we conclude that every conditionally weakly mixing function f is orthogonal to L^\infty(Y), or equivalently that {\Bbb E}(f|Y) = 0 a.e. Let us say that f has relative mean zero if the latter holds.
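The step from orthogonality to relative mean zero can be checked directly. The sketch below takes g = 1, which lies in L^\infty(Y) and is conditionally almost periodic because every shift satisfies T^n 1 = 1.

```latex
\[
0 = \langle f, 1 \rangle_{X|Y} = {\Bbb E}( f \cdot \overline{1} \,|\, Y ) = {\Bbb E}(f|Y)
\quad \text{a.e.,}
\]
\[
\langle f, g \rangle_{X|Y} = \overline{g}\, {\Bbb E}(f|Y)
\quad \text{for all } g \in L^\infty(Y).
\]
```

The second identity gives the converse direction of the stated equivalence: if {\Bbb E}(f|Y) = 0 a.e., then f is orthogonal to all of L^\infty(Y).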

Definition 1. A system X is a weakly mixing extension of a factor Y if every f \in L^2(X|Y) with relative mean zero is relatively weakly mixing.

Exercise 3. Show that a product X=Y \times Z of a system Y with a weakly mixing system Z is always a weakly mixing extension of Y. \diamond

Remark 1. If X is regular, then we can disintegrate the measure \mu as an average \mu = \int_Y \mu_y d\nu(y), see Theorem 4 from Lecture 9. It is then possible to construct a relative product system X \times_Y X, which is the product system X \times X but with the measure \mu \times_\nu \mu := \int_Y \mu_y \times \mu_y d\nu(y) instead of \mu \times \mu. It can then be shown (cf. Exercise 9 from Lecture 12) that X is a weakly mixing extension of Y if and only if X \times_Y X is ergodic; see for instance Furstenberg’s book for details. However, in these notes we shall focus instead on the more abstract operator-algebraic approach which avoids the use of disintegrations. \diamond

Now we show that the uniform multiple recurrence property (UMR) from Lecture 13 is preserved under weakly mixing extensions (cf. Theorem 1 from Lecture 13).

Theorem 1. Suppose that X = (X, {\mathcal X}, \mu, T) is a weakly mixing extension of Y = (Y, {\mathcal Y}, \nu, S). If Y obeys UMR, then so does X.

The proof of this theorem rests on the following analogue of Proposition 1 from Lecture 12:

Proposition 1. Let a_1,\ldots,a_k \in \Bbb Z be distinct integers for some k \geq 1. Let X = (X, {\mathcal X}, \mu, T) be a weakly mixing extension of Y = (Y, {\mathcal Y}, \nu, S), and let f_1,\ldots,f_k \in L^\infty(X) be such that at least one of f_1,\ldots,f_k has relative mean zero. Then

C\!-\!\lim_{n \to \infty} T^{a_1 n} f_1 \ldots T^{a_k n} f_k = 0 (6)

in L^2(X, {\mathcal X}, \mu).

Exercise 4. Prove Proposition 1. (Hint: modify (or “relativise”) the proof of Proposition 1 from Lecture 12.) \diamond

Corollary 1. Let a_1,\ldots,a_k \in {\Bbb Z} be distinct integers for some k \geq 1. Let X = (X, {\mathcal X}, \mu, T) be a weakly mixing extension of Y = (Y, {\mathcal Y}, \nu, S), and let f_1,\ldots,f_k \in L^\infty(X). Then

C\!-\!\lim_{n \to \infty} \left( \int_X T^{a_1 n} f_1 \ldots T^{a_k n} f_k\ d\mu - \int_X T^{a_1 n} {\Bbb E}(f_1|Y) \ldots T^{a_k n} {\Bbb E}(f_k|Y)\ d\mu \right) = 0. (7)

Exercise 5. Prove Corollary 1. (Hint: adapt the proof of Corollary 2 from Lecture 12.) \diamond

Proof of Theorem 1. Let f \in L^\infty(X) be non-negative with positive mean. Then {\Bbb E}(f|Y) \in L^\infty(Y) is also non-negative with positive mean. Since Y obeys UMR, we have

\liminf_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_Y {\Bbb E}(f|Y) T^n {\Bbb E}(f|Y) \ldots T^{(k-1) n} {\Bbb E}(f|Y)\ d\nu > 0. (8)

Applying Corollary 1 we see that the same statement holds with {\Bbb E}(f|Y) replaced by f, and the claim follows. \Box
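To spell out the final step, one can apply Corollary 1 with the specific choices a_i := i-1 and f_i := f for i = 1, \ldots, k; these integers are distinct, as the corollary requires. The following is only a sketch of that substitution.

```latex
% Corollary 1 with a_i = i-1 and f_i = f: the two k-fold averages differ by o(1).
\[
C\!-\!\lim_{n \to \infty} \Big( \int_X f\, T^n f \ldots T^{(k-1)n} f\ d\mu
 - \int_X {\Bbb E}(f|Y)\, T^n {\Bbb E}(f|Y) \ldots T^{(k-1)n} {\Bbb E}(f|Y)\ d\mu \Big) = 0,
\]
% Hence the lower limits agree, and the right-hand side is positive by UMR for Y.
\[
\liminf_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X f\, T^n f \ldots T^{(k-1)n} f\ d\mu
 = \liminf_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1}
   \int_X {\Bbb E}(f|Y)\, T^n {\Bbb E}(f|Y) \ldots T^{(k-1)n} {\Bbb E}(f|Y)\ d\mu > 0 .
\]
```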

Remark 2. As the above proof shows, Corollary 1 lets us replace functions in the weakly mixing extension X by their expectations in Y for the purposes of computing k-fold averages. In the notation of Furstenberg and Weiss, Corollary 1 asserts that Y is a characteristic factor of X for the average (7). The deeper structural theory of such characteristic factors (and in particular, of the minimal characteristic factor for any given average) is an active and difficult area of research, with surprising connections with Lie group actions (and in particular with flows on nilmanifolds), as well as with the theory of inverse problems in additive combinatorics (and in particular with inverse theorems for the Gowers norms); see for instance the ICM paper of Kra for a survey of recent developments. The concept of a characteristic factor (or more precisely, finitary analogues of this concept) is also fundamental in my work with Ben Green on primes in arithmetic progression. \diamond

— The dichotomy between structure and randomness —

The remainder of this lecture is devoted to proving the following “relative” generalisation of Theorem 1 from Lecture 12, which is a fundamental ingredient in the proof of the Furstenberg recurrence theorem:

Theorem 2. Suppose that X = (X, {\mathcal X}, \mu, T) is an extension of a system Y = (Y, {\mathcal Y}, \nu, S). Then exactly one of the following statements is true:

  1. (Structure) X has a factor Z which is a non-trivial compact extension of Y.
  2. (Randomness) X is a weakly mixing extension of Y.

As in Lecture 12, the key to proving this theorem is to show

Proposition 2. Suppose that X = (X, {\mathcal X}, \mu, T) is an extension of a system Y = (Y, {\mathcal Y}, \nu, S). Then a function f \in L^2(X|Y) is relatively weakly mixing if and only if \langle f, g \rangle_{X|Y} = 0 a.e. for all relatively almost periodic g.

The “only if” part of this proposition is Lemma 1; the harder part is the “if” part, which we will prove shortly. But for now, let us see why Proposition 2 implies Theorem 2.

From Lemma 1, we already know that no non-trivial function can be simultaneously conditionally weakly mixing and conditionally almost periodic, which shows that cases 1 and 2 of Theorem 2 cannot simultaneously hold. To finish the proof of Theorem 2, suppose that X is not a weakly mixing extension of Y, thus there exists a function f \in L^2(X|Y) of relative mean zero which is not conditionally weakly mixing. By Proposition 2, there must exist a relatively almost periodic g \in L^2(X|Y) such that \langle f, g\rangle_{X|Y} does not vanish a.e. Since f is orthogonal to all functions in L^\infty(Y), we conclude that g is not in L^\infty(Y), thus we have a non-trivial relatively almost periodic function. From Exercise 6 of Lecture 13, this shows that the maximal compact extension of Y is non-trivial, and the claim follows.

It thus suffices to prove the “if” part of Proposition 2; thus we need to show that every non-conditionally-weakly-mixing function correlates with some conditionally almost periodic function. But observe that if f \in L^2(X|Y) is not conditionally weakly mixing, then by definition we have

\limsup_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \| {\Bbb E}( T^n f \overline{f} | Y ) \|_{L^2(Y)}^2 > 0. (9)

We can rearrange this as

\limsup_{N \to \infty} \langle S_{f,N} f, f \rangle_X > 0. (10)

where S_{f,N}: L^2(X|Y) \to L^2(X|Y) is the operator

S_{f,N} g := \frac{1}{N} \sum_{n=0}^{N-1} {\Bbb E}( g \overline{T^n f}|Y) T^n f. (11)
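The passage from (9) to (10) is a short computation, sketched here under the assumptions that \langle u, v \rangle_X = \int_X u \overline{v}\ d\mu and that a Y-measurable factor may be paired with a conditional expectation inside the integral.

```latex
\begin{align*}
\langle S_{f,N} f, f \rangle_X
  &= \frac{1}{N} \sum_{n=0}^{N-1}
     \int_X {\Bbb E}( f\, \overline{T^n f} \,|\, Y )\, T^n f\, \overline{f}\ d\mu \\
  % the first factor is Y-measurable, so the rest may be replaced by its conditional expectation
  &= \frac{1}{N} \sum_{n=0}^{N-1}
     \int_X {\Bbb E}( f\, \overline{T^n f} \,|\, Y )\, {\Bbb E}( T^n f\, \overline{f} \,|\, Y )\ d\mu \\
  % the two conditional expectations are complex conjugates of each other
  &= \frac{1}{N} \sum_{n=0}^{N-1} \| {\Bbb E}( T^n f\, \overline{f} \,|\, Y ) \|_{L^2(Y)}^2 .
\end{align*}
```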

To prove Proposition 2, it thus suffices (by weak compactness) to show that

Proposition 3. (Dual functions are almost periodic) Suppose that X = (X, {\mathcal X}, \mu, T) is an extension of a system Y = (Y, {\mathcal Y}, \nu, S), and let f \in L^2(X|Y). Let S_f be any limit point of S_{f,N} in the weak operator topology. Then S_f f is relatively almost periodic.

Remark 3. By applying the mean ergodic theorem to the dynamical system X \times_Y X, one can show that the sequence S_{f,N} is in fact convergent in the weak or strong operator topologies (at least when X is regular). But to avoid some technicalities we shall present an argument that does not rely on the existence of a strong limit. \diamond

As one might expect from the experience with unconditional weak mixing, the proof of Proposition 3 relies on the theory of conditionally Hilbert-Schmidt operators on L^2(X|Y). We give here a definition of such operators which is suited for our needs.

Definition 2. Let X, Y be as above. A sub-orthonormal set in L^2(X|Y) is any at most countable sequence e_\alpha \in L^2(X|Y) such that \langle e_\alpha, e_\beta \rangle_{X|Y} = 0 a.e. for all \alpha \neq \beta and \langle e_\alpha, e_\alpha \rangle_{X|Y} \leq 1 a.e. for all \alpha. A linear operator A: L^2(X|Y) \to L^2(X|Y) is said to be a conditionally Hilbert-Schmidt operator if we have the module property

A(cf) = cAf for all c \in L^\infty(Y) (12)

and the bound

\sum_\alpha \sum_\beta |\langle A e_\alpha, f_\beta \rangle_{X|Y}|^2 \leq C^2 a.e. (13)

for all sub-orthonormal sets \{ e_\alpha\}, \{f_\beta\} and some constant C>0; the best such C is called the (uniform) conditional Hilbert-Schmidt norm \|\|A\|_{HS(X|Y)}\|_{L^\infty(Y)} of A.

Remark 4. As in Lecture 12, one can also set up the concept of a tensor product of two Hilbert modules, and use that to define conditionally Hilbert-Schmidt operators in a way which does not require sub-orthonormal sets. But we will not need to do so here. One can also define a pointwise conditional Hilbert-Schmidt norm \|A\|_{HS(X|Y)}(y) for each y \in Y, but we will not need this concept. \diamond

Example 2. Suppose Y is just a finite set (with the discrete \sigma-algebra). Then X splits into finitely many fibres \pi^{-1}(\{y\}) with the conditional measures \mu_y, and L^2(X|Y) can be identified with the direct sum (with the l^\infty norm) of the Hilbert spaces L^2(\mu_y). A conditional Hilbert-Schmidt operator is then equivalent to a family of Hilbert-Schmidt operators A_y: L^2(\mu_y) \to L^2(\mu_y) for each y, with the A_y uniformly bounded in Hilbert-Schmidt norm. \diamond

Example 3. In the skew shift example X = ({\Bbb R}/{\Bbb Z})^2 = \{ (y,z): y, z \in {\Bbb R}/{\Bbb Z} \}, Y = ({\Bbb R}/{\Bbb Z}), one can show that an operator A is conditionally Hilbert-Schmidt if and only if it takes the form A f(y,z) = \int_{{\Bbb R}/{\Bbb Z}} K_y(z,z') f(y,z')\ dz' a.e. for all f \in L^2(X|Y), with \|\|A\|_{HS(X|Y)}\|_{L^\infty(Y)} = \sup_y (\int_{{\Bbb R}/{\Bbb Z}} \int_{{\Bbb R}/{\Bbb Z}} |K_y(z,z')|^2 dz dz')^{1/2} finite. \diamond

Exercise 6. Let f_1, f_2 \in L^2(X|Y) with \|f_1\|_{L^2(X|Y)}, \|f_2\|_{L^2(X|Y)} \leq 1 a.e. Show that the rank one operator g \mapsto \langle g, f_1 \rangle_{X|Y} f_2 is conditionally Hilbert-Schmidt with norm at most 1. \diamond

Observe from (11) that the S_{f,N} are averages of rank one operators arising from the functions T^n f, and so by Exercise 6 and the triangle inequality we see that the S_{f,N} are uniformly conditionally Hilbert-Schmidt. Taking weak limits using (13) (and Fatou’s lemma) we conclude that S_f is also conditionally Hilbert-Schmidt.

Next, we observe from the telescoping identity that for every h, T^h S_{f,N} - S_{f,N} T^h converges to zero in the weak operator topology (and even in the operator norm topology) as N \to \infty; taking limits, we see that S_f commutes with T. To show that S_f f is conditionally almost periodic, it thus suffices to show the following analogue of Lemma 2 from Lecture 12:

Lemma 2. Let A: L^2(X|Y) \to L^2(X|Y) be a conditionally Hilbert-Schmidt operator. Then the image of the unit ball of L^2(X|Y) under A is conditionally precompact.

Proof. We shall prove this lemma by establishing a sort of conditional singular value decomposition for A. We can normalise A to have uniform conditional Hilbert-Schmidt norm 1. We fix \varepsilon > 0, and we will also need an integer k and a small quantity \delta > 0 depending on \varepsilon to be chosen later.

We first consider the quantities |\langle Ae_1, f_1 \rangle_{X|Y}|^2 where e_1, f_1 range over all sub-orthonormal sets of cardinality 1. On the one hand, these quantities are bounded pointwise by 1, thanks to (13). On the other hand, observe that if |\langle Ae_1, f_1 \rangle_{X|Y}|^2 and |\langle Ae'_1, f'_1 \rangle_{X|Y}|^2 are of the above form, then so is the join \max( |\langle Ae_1, f_1 \rangle_{X|Y}|^2, |\langle Ae'_1, f'_1 \rangle_{X|Y}|^2 ), as can be seen by taking \tilde e_1 := e_1 1_E + e'_1 1_{E^c} and \tilde f_1 := f_1 1_E + f'_1 1_{E^c}, where E is the set where |\langle Ae_1,f_1\rangle_{X|Y}|^2 exceeds |\langle Ae'_1,f'_1\rangle_{X|Y}|^2. By using a maximising sequence for the quantity \int_Y |\langle Ae, f \rangle_{X|Y}|^2\ d\nu and applying joins repeatedly, we can thus (on taking limits) find a pair e_1, f_1 which is near-optimal in the sense that |\langle Ae_1, f_1 \rangle_{X|Y}|^2 \geq (1-\delta) |\langle Ae'_1, f'_1\rangle_{X|Y}|^2 a.e. for all competitors e'_1, f'_1.
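The join claim can be verified from the module property (12). A sketch, using that E is a Y-measurable set, so that 1_E and 1_{E^c} lie in L^\infty(Y) and satisfy 1_E 1_{E^c} = 0:

```latex
\begin{align*}
% module property (12) applied to the scalars 1_E and 1_{E^c}
A \tilde e_1 &= 1_E\, A e_1 + 1_{E^c}\, A e'_1, \\
% cross terms vanish because 1_E 1_{E^c} = 0
\langle A \tilde e_1, \tilde f_1 \rangle_{X|Y}
  &= 1_E\, \langle A e_1, f_1 \rangle_{X|Y} + 1_{E^c}\, \langle A e'_1, f'_1 \rangle_{X|Y}, \\
% the glued function still has conditional norm at most 1
\langle \tilde e_1, \tilde e_1 \rangle_{X|Y}
  &= 1_E\, \langle e_1, e_1 \rangle_{X|Y} + 1_{E^c}\, \langle e'_1, e'_1 \rangle_{X|Y} \leq 1 .
\end{align*}
```

Taking the squared absolute value of the second line gives the maximum of the two quantities, and the third line (together with its analogue for \tilde f_1) shows that the glued pair is again admissible.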

Now fix e_1, f_1, and consider the quantity |\langle Ae_2, f_2 \rangle_{X|Y}|^2, where \{e_1,e_2\} and \{f_1,f_2\} are sub-orthonormal sets. By arguing as before we can find an e_2, f_2 which is near optimal in the sense that |\langle Ae_2, f_2 \rangle_{X|Y}|^2 \geq (1-\delta) |\langle Ae'_2, f'_2 \rangle_{X|Y}|^2 a.e. for all competitors e'_2, f'_2.

We continue in this fashion k times to obtain sub-orthonormal sets \{e_1,\ldots,e_k\} and \{f_1,\ldots,f_k\} with the property that |\langle Ae_i, f_i \rangle_{X|Y}|^2 \geq (1-\delta) |\langle Ae'_i, f'_i \rangle_{X|Y}|^2 whenever \{e_1,\ldots,e_{i-1},e'_i\}, \{f_1,\ldots,f_{i-1},f'_i\} are sub-orthonormal sets. On the other hand, from (13) we know that \sum_i |\langle Ae_i, f_i \rangle_{X|Y}|^2 \leq 1. From these two facts we soon conclude that |\langle Ae, f\rangle_{X|Y}|^2 \leq 1/k+O_k(\delta) a.e. whenever \{e_1,\ldots,e_k,e\} and \{f_1,\ldots,f_k,f\} are sub-orthonormal. If k, \delta are chosen appropriately we obtain |\langle Ae, f\rangle_{X|Y}| \leq \varepsilon a.e. Thus (by duality) A maps the unit ball of the orthogonal complement of the span of \{e_1,\ldots,e_k\} to the \varepsilon-neighbourhood of the span of \{f_1,\ldots,f_k\} (with notions such as orthogonality, span, and neighbourhood being defined conditionally of course, using the L^\infty(Y)-Hilbert module structure of L^2(X|Y)). From this it is not hard to establish the desired precompactness. \Box
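The step from near-optimality to the bound 1/k + O_k(\delta) can be sketched as follows. If \{e_1,\ldots,e_k,e\} and \{f_1,\ldots,f_k,f\} are sub-orthonormal, then the pair (e,f) is an admissible competitor at every stage i \leq k, so near-optimality and (13) (with C = 1 after normalisation) combine. The specific choice \delta = 1/2, k \geq 2/\varepsilon^2 below is one illustrative option, not necessarily the one the argument has in mind.

```latex
\begin{align*}
% sum the near-optimality inequality over i = 1, ..., k, then apply (13)
k (1-\delta)\, |\langle Ae, f \rangle_{X|Y}|^2
  &\leq \sum_{i=1}^k |\langle Ae_i, f_i \rangle_{X|Y}|^2 \leq 1, \\
% divide through and insert the illustrative parameter choice
|\langle Ae, f \rangle_{X|Y}|^2
  &\leq \frac{1}{k(1-\delta)} \leq \frac{2}{k} \leq \varepsilon^2
  \qquad \text{when } \delta = 1/2,\ k \geq 2/\varepsilon^2 .
\end{align*}
```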

[Update, Mar 1: Typo corrected.]

[Update, June 2 2009: Some minor changes in the proof of Lemma 1.  Thanks to Jeremy Avigad for corrections.]