Having studied compact extensions in the previous lecture, we now consider the opposite type of extension, namely that of a weakly mixing extension. Just as compact extensions are “relative” versions of compact systems, weakly mixing extensions are “relative” versions of weakly mixing systems, in which the underlying algebra of scalars ${\Bbb C}$ is replaced by $L^\infty(Y)$. As in the case of unconditionally weakly mixing systems, we will be able to use the van der Corput lemma to neglect “conditionally weakly mixing” functions, thus allowing us to lift the uniform multiple recurrence property (UMR) from a system to any weakly mixing extension of that system.

Two more steps are needed to finish the proof of the Furstenberg recurrence theorem. One is a relative version of the dichotomy between mixing and compactness: if a system is not weakly mixing relative to some factor, then the system has a factor which is a non-trivial compact extension of the given factor. This will be accomplished in this lecture using the theory of conditional Hilbert-Schmidt operators. The other is the (easy) result that the UMR property is preserved under limits of chains; this will be accomplished in the next lecture.

— Conditionally weakly mixing functions —

Recall that in a measure-preserving system $X = (X, {\mathcal X}, \mu, T)$, a function $f \in L^2(X) = L^2(X,{\mathcal X},\mu)$ is said to be weakly mixing if the squared inner products $|\langle T^n f, f \rangle_X|^2 := |\int_X T^n f \overline{f}\ d\mu|^2$ converge to zero in the Cesàro sense, thus

$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} |\int_X T^n f \overline{f}\ d\mu|^2 = 0$. (1)

Now let $Y = (Y, {\mathcal Y}, \nu, S)$ be a factor of X, so that $L^\infty(Y)$ can be viewed as a subspace of $L^\infty(X)$. Recall that we have the conditional inner product $\langle f, g \rangle_{X|Y} := {\Bbb E}( f \overline{g}|Y)$ and the Hilbert module $L^2(X|Y)$ of functions f for which $\langle f, f \rangle_{X|Y}$ lies in $L^\infty(Y)$. We shall say that a function $f \in L^2(X|Y)$ is conditionally weakly mixing relative to Y if the $L^2$ norms $\|\langle T^n f, f \rangle_{X|Y}\|_{L^2(Y)}^2$ converge to zero in the Cesàro sense, thus

$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X |{\Bbb E}(T^n f \overline{f}|Y)|^2\ d\mu = 0$. (2)

Example 1. If $X = Y \times Z$ is a product system of the factor space $Y = (Y, {\mathcal Y}, \nu, S)$ and another system $Z = (Z, {\mathcal Z}, \rho, R)$, then a function $f(y,z) = f(z)$ of the vertical variable $z \in Z$ is weakly mixing relative to Y if and only if f(z) is weakly mixing in Z. $\diamond$
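To make Example 1 concrete, here is a small numerical sketch (our own illustration, not part of the original argument) on a finite product system: Y is the rotation by 1 on ${\Bbb Z}/5$, Z is the rotation $z \mapsto z+3$ on ${\Bbb Z}/7$, both with uniform measure, and f depends only on z. Finite systems are never weakly mixing, so this only checks the identity behind Example 1, namely that each term of the average in (2) equals $|\langle R^n f, f \rangle_Z|^2$. The helper names, the convention $(T^n f)(x) = f(T^n x)$ (the squared quantities are unchanged under the other convention), and the choice of the quadratic phase $f(z) = e(z^2/7)$ are assumptions made for the sketch.

```python
import cmath
import math

p, q, a = 5, 7, 3  # Y = Z/5 (rotation by 1), Z = Z/7 (rotation by a)

def T(x):
    """Product map on X = Y x Z."""
    y, z = x
    return ((y + 1) % p, (z + a) % q)

def R(z):
    """Rotation on the vertical factor Z."""
    return (z + a) % q

def iterate(g, x, n):
    """Apply the map g to the point x, n times."""
    for _ in range(n):
        x = g(x)
    return x

# A function of the vertical variable only: the quadratic phase e(z^2/q).
h = [cmath.exp(2j * math.pi * z * z / q) for z in range(q)]
f = {(y, z): h[z] for y in range(p) for z in range(q)}

def conditional_term(n):
    """||E(T^n f conj(f) | Y)||_{L^2(Y)}^2: average over each fibre {y} x Z, then over y."""
    total = 0.0
    for y in range(p):
        e_y = sum(f[iterate(T, (y, z), n)] * f[(y, z)].conjugate() for z in range(q)) / q
        total += abs(e_y) ** 2
    return total / p

def vertical_term(n):
    """|<R^n h, h>_Z|^2, computed on Z alone."""
    return abs(sum(h[iterate(R, z, n)] * h[z].conjugate() for z in range(q)) / q) ** 2

for n in range(2 * q):
    assert math.isclose(conditional_term(n), vertical_term(n), abs_tol=1e-12)
```

Since $(z+3n)^2 - z^2 = 6nz + 9n^2$, the terms equal 1 when 7 divides n and vanish (up to rounding) otherwise, so the Cesàro averages tend to $1/7$ rather than 0, as expected for a finite system.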

Much of the theory of weakly mixing systems extends easily to the conditionally weakly mixing case. For instance:

Exercise 1. By adapting the proof of Corollary 2 from Lecture 12, show that if $f \in L^2(X|Y)$ is conditionally weakly mixing and $g \in L^2(X|Y)$, then $\|\langle T^n f, g \rangle_{X|Y}\|_{L^2(Y)}^2$ and $\|\langle f, T^n g \rangle_{X|Y}\|_{L^2(Y)}^2$ converge to zero in the Cesàro sense. (Hint: you will need to show that expressions such as $\langle g, T^n f \rangle_{X|Y} T^n f$ converge to zero in $L^2(X)$ in the Cesàro sense. Apply the van der Corput lemma and use the fact that the $\langle g, T^n f \rangle_{X|Y}$ are uniformly bounded in $L^\infty(Y)$ by conditional Cauchy-Schwarz.) $\diamond$

Exercise 2. Show that the space of conditionally weakly mixing functions in $L^2(X|Y)$ is a module over $L^\infty(Y)$ (i.e. it is closed under addition and multiplication by the “scalars” $L^\infty(Y)$), which is also shift-invariant and topologically closed in the topology of $L^2(X|Y)$ (see Exercise 2 from Lecture 13). $\diamond$

Let us now see the first link between conditional weak mixing and conditional almost periodicity (cf. Exercise 18 from Lecture 12):

Lemma 1. If $f \in L^2(X|Y)$ is conditionally weakly mixing and $g \in L^2(X|Y)$ is conditionally almost periodic, then $\langle f, g \rangle_{X|Y} = 0$ a.e.

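Proof. Since $\langle f,g \rangle_{X|Y} = T^{-n} \langle T^n f, T^n g \rangle_{X|Y}$ and T is measure-preserving, the norm $\|\langle T^n f, T^n g \rangle_{X|Y}\|_{L^2(Y)} = \|\langle f, g \rangle_{X|Y}\|_{L^2(Y)}$ does not depend on n, so it will suffice to show that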

$C\!-\!\sup_{n \to \infty} \|\langle T^n f, T^n g \rangle_{X|Y}\|_{L^2(Y)} = 0$. (3)

Let $\varepsilon > 0$ be arbitrary. As g is conditionally almost periodic, one can find a finitely generated module zonotope $\{ c_1 f_1 + \ldots + c_d f_d: \| c_1\|_{L^\infty(Y)},\ldots,\|c_d\|_{L^\infty(Y)} \leq 1 \}$ with $f_1,\ldots,f_d \in L^2(X|Y)$ such that all the shifts $T^n g$ lie within $\varepsilon$ (in $L^2(X|Y)$) of this zonotope. Thus (by conditional Cauchy-Schwarz) we have

$\| \langle T^n f, T^n g \rangle_{X|Y} \|_{L^2(Y)} = \| \langle T^n f, c_{1,n} f_1 + \ldots + c_{d,n} f_d \rangle_{X|Y} \|_{L^2(Y)} + O(\varepsilon)$ (4)

for all n and some $c_{1,n},\ldots,c_{d,n} \in L^\infty(Y)$ with norm at most 1. We can pull these constants out of the conditional inner product and bound the left-hand side of (4) by

$\|\langle T^n f, f_1 \rangle_{X|Y}\|_{L^2(Y)} + \ldots + \| \langle T^n f, f_d \rangle_{X|Y}\|_{L^2(Y)} + O(\varepsilon)$. (5)

By Exercise 1, the Cesàro supremum of (5) is at most $O(\varepsilon)$. Since $\varepsilon$ is arbitrary, the claim (3) follows. $\Box$

Since all functions in $L^\infty(Y)$ are conditionally almost periodic, we conclude that every conditionally weakly mixing function f is orthogonal to $L^\infty(Y)$, or equivalently that ${\Bbb E}(f|Y) = 0$ a.e. Let us say that f has relative mean zero if the latter holds.

Definition 1. A system X is a weakly mixing extension of a factor Y if every $f \in L^2(X|Y)$ with relative mean zero is relatively weakly mixing.

Exercise 3. Show that a product $X=Y \times Z$ of a system Y with a weakly mixing system Z is always a weakly mixing extension of Y. $\diamond$

Remark 1. If X is regular, then we can disintegrate the measure $\mu$ as an average $\mu = \int_Y \mu_y d\nu(y)$, see Theorem 4 from Lecture 9. It is then possible to construct a relative product system $X \times_Y X$, which is the product system $X \times X$ but with the measure $\mu \times_\nu \mu := \int_Y \mu_y \times \mu_y d\nu(y)$ instead of $\mu \times \mu$. It can then be shown (cf. Exercise 9 from Lecture 12) that X is a weakly mixing extension of Y if and only if $X \times_Y X$ is ergodic; see for instance Furstenberg’s book for details. However, in these notes we shall focus instead on the more abstract operator-algebraic approach which avoids the use of disintegrations. $\diamond$

Now we show that the uniform multiple recurrence property (UMR) from Lecture 13 is preserved under weakly mixing extensions (cf. Theorem 1 from Lecture 13).

Theorem 1. Suppose that $X = (X, {\mathcal X}, \mu, T)$ is a weakly mixing extension of $Y = (Y, {\mathcal Y}, \nu, S)$. If Y obeys UMR, then so does X.

The proof of this theorem rests on the following analogue of Proposition 1 from Lecture 12:

Proposition 1. Let $a_1,\ldots,a_k \in \Bbb Z$ be distinct integers for some $k \geq 1$. Suppose that $X = (X, {\mathcal X}, \mu, T)$ is a weakly mixing extension of $Y = (Y, {\mathcal Y}, \nu, S)$, and let $f_1,\ldots,f_k \in L^\infty(X)$ be such that at least one of $f_1,\ldots,f_k$ has relative mean zero. Then

$C\!-\!\lim_{n \to \infty} T^{a_1 n} f_1 \ldots T^{a_k n} f_k = 0$ (6)

in $L^2(X, {\mathcal X}, \mu)$.

Exercise 4. Prove Proposition 1. (Hint: modify (or “relativise”) the proof of Proposition 1 from Lecture 12.) $\diamond$

Corollary 1. Let $a_1,\ldots,a_k \in {\Bbb Z}$ be distinct integers for some $k \geq 1$. Suppose that $X = (X, {\mathcal X}, \mu, T)$ is a weakly mixing extension of $Y = (Y, {\mathcal Y}, \nu, S)$, and let $f_1,\ldots,f_k \in L^\infty(X)$. Then

$C\!-\!\lim_{n \to \infty} \left( \int_X T^{a_1 n} f_1 \ldots T^{a_k n} f_k\ d\mu - \int_X T^{a_1 n} {\Bbb E}(f_1|Y) \ldots T^{a_k n} {\Bbb E}(f_k|Y)\ d\mu \right) = 0$. (7)

Exercise 5. Prove Corollary 1. (Hint: adapt the proof of Corollary 2 from Lecture 12.) $\diamond$

Proof of Theorem 1. Let $k \geq 1$, and let $f \in L^\infty(X)$ be non-negative with positive mean. Then ${\Bbb E}(f|Y) \in L^\infty(Y)$ is also non-negative, with the same (positive) mean. Since Y obeys UMR, we have

$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_Y {\Bbb E}(f|Y) T^n {\Bbb E}(f|Y) \ldots T^{(k-1) n} {\Bbb E}(f|Y)\ d\nu > 0$. (8)

Applying Corollary 1 with $a_j := j-1$ and $f_1 = \ldots = f_k := f$, we see that (8) continues to hold with ${\Bbb E}(f|Y)$ replaced by f (and $\nu$ replaced by $\mu$), and the claim follows. $\Box$

Remark 2. As the above proof shows, Corollary 1 lets us replace functions in the weakly mixing extension X by their expectations in Y for the purposes of computing k-fold averages. In the terminology of Furstenberg and Weiss, Corollary 1 asserts that Y is a characteristic factor of X for the average (7). The deeper structural theory of such characteristic factors (and in particular of the minimal characteristic factor for any given average) is an active and difficult area of research, with surprising connections to Lie group actions (and in particular to flows on nilmanifolds), as well as to the theory of inverse problems in additive combinatorics (and in particular to inverse theorems for the Gowers norms); see for instance the ICM paper of Kra for a survey of recent developments. The concept of a characteristic factor (or more precisely, finitary analogues of this concept) is also fundamental in my work with Ben Green on primes in arithmetic progressions. $\diamond$

— The dichotomy between structure and randomness —

The remainder of this lecture is devoted to proving the following “relative” generalisation of Theorem 1 from Lecture 12, which is a fundamental ingredient in the proof of the Furstenberg recurrence theorem:

Theorem 2. Suppose that $X = (X, {\mathcal X}, \mu, T)$ is an extension of a system $Y = (Y, {\mathcal Y}, \nu, S)$. Then exactly one of the following statements is true:

1. (Structure) X has a factor Z which is a non-trivial compact extension of Y.
2. (Randomness) X is a weakly mixing extension of Y.

As in Lecture 12, the key to proving this theorem is the following proposition:

Proposition 2. Suppose that $X = (X, {\mathcal X}, \mu, T)$ is an extension of a system $Y = (Y, {\mathcal Y}, \nu, S)$. Then a function $f \in L^2(X|Y)$ is relatively weakly mixing if and only if $\langle f, g \rangle_{X|Y} = 0$ a.e. for all relatively almost periodic g.

The “only if” part of this proposition is Lemma 1; the harder part is the “if” part, which we will prove shortly. But for now, let us see why Proposition 2 implies Theorem 2.

From Lemma 1 (applied with $f = g$), we already know that no non-zero function can be simultaneously conditionally weakly mixing and conditionally almost periodic. Since a non-trivial compact extension Z of Y supplies a non-zero conditionally almost periodic function of relative mean zero, which would also have to be conditionally weakly mixing if X were a weakly mixing extension of Y, cases 1 and 2 of Theorem 2 cannot hold simultaneously. To finish the proof of Theorem 2, suppose that X is not a weakly mixing extension of Y; then there exists a function $f \in L^2(X|Y)$ of relative mean zero which is not conditionally weakly mixing. By Proposition 2, there must exist a relatively almost periodic $g \in L^2(X|Y)$ such that $\langle f, g\rangle_{X|Y}$ does not vanish a.e. Since f is orthogonal to all functions in $L^\infty(Y)$, g cannot lie in $L^\infty(Y)$; thus we have found at least one relatively almost periodic function outside of $L^\infty(Y)$. By Exercise 6 of Lecture 13, this shows that the maximal compact extension of Y is non-trivial, and the claim follows.

It thus suffices to prove the “if” part of Proposition 2; that is, we need to show that every function which is not conditionally weakly mixing correlates with some conditionally almost periodic function. Observe that if $f \in L^2(X|Y)$ is not conditionally weakly mixing, then by definition we have

$\limsup_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \| {\Bbb E}( T^n f \overline{f} | Y ) \|_{L^2(Y)}^2 > 0.$ (9)

We can rearrange this as

$\limsup_{N \to \infty} \langle S_{f,N} f, f \rangle_X > 0$, (10)

where $S_{f,N}: L^2(X|Y) \to L^2(X|Y)$ is the operator

$S_{f,N} g := \frac{1}{N} \sum_{n=0}^{N-1} {\Bbb E}( g \overline{T^n f}|Y) T^n f$. (11)
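The rearrangement of (9) into (10) uses only that ${\Bbb E}(f \overline{T^n f}|Y)$ is Y-measurable and is the complex conjugate of ${\Bbb E}(T^n f \overline{f}|Y)$. The following sketch checks the identity $\langle S_{f,N} f, f \rangle_X = \frac{1}{N} \sum_{n=0}^{N-1} \|{\Bbb E}(T^n f \overline{f}|Y)\|_{L^2(Y)}^2$ numerically on a hypothetical finite skew product $(y,z) \mapsto (y+1, z+y)$ on ${\Bbb Z}/4 \times {\Bbb Z}/5$ with uniform measure, where ${\Bbb E}(\cdot|Y)$ is simply averaging over fibres. The model, the random choice of f, and all helper names are illustrative assumptions.

```python
import random

p, q = 4, 5
X = [(y, z) for y in range(p) for z in range(q)]

def T(x):
    """Hypothetical skew product: rotate the base Y = Z/p, shift the fibre by y."""
    y, z = x
    return ((y + 1) % p, (z + y) % q)

def shift(u, n):
    """(T^n u)(x) = u(T^n x)."""
    def power(x):
        for _ in range(n):
            x = T(x)
        return x
    return {x: u[power(x)] for x in X}

def cond_exp(u):
    """E(u|Y): average over each fibre {y} x Z/q, returned as a function on X."""
    avg = {y: sum(u[(y, z)] for z in range(q)) / q for y in range(p)}
    return {x: avg[x[0]] for x in X}

def inner(u, v):
    """<u, v>_X for the uniform probability measure on X."""
    return sum(u[x] * v[x].conjugate() for x in X) / len(X)

def apply_S(f, N, g):
    """S_{f,N} g = (1/N) sum_{n<N} E(g conj(T^n f) | Y) T^n f, as in (11)."""
    out = {x: 0j for x in X}
    for n in range(N):
        fn = shift(f, n)
        c = cond_exp({x: g[x] * fn[x].conjugate() for x in X})
        for x in X:
            out[x] += c[x] * fn[x] / N
    return out

def cesaro_average(f, N):
    """(1/N) sum_{n<N} ||E(T^n f conj(f) | Y)||_{L^2(Y)}^2, the average in (9)."""
    total = 0.0
    for n in range(N):
        fn = shift(f, n)
        e = cond_exp({x: fn[x] * f[x].conjugate() for x in X})
        total += sum(abs(e[x]) ** 2 for x in X) / len(X)
    return total / N

rng = random.Random(0)
f = {x: complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for x in X}
for N in (1, 3, 10):
    assert abs(inner(apply_S(f, N, f), f) - cesaro_average(f, N)) < 1e-9
```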

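To prove Proposition 2, it thus suffices (by weak compactness) to establish the following proposition. Indeed, if $S_f$ is a limit point, in the weak operator topology, of a subsequence $S_{f,N_j}$ along which $\langle S_{f,N_j} f, f \rangle_X$ converges to the positive limit superior in (10), then $\langle S_f f, f \rangle_X > 0$, so that $g := S_f f$ is relatively almost periodic with $\langle f, g \rangle_{X|Y}$ not vanishing a.e.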

Proposition 3. (Dual functions are almost periodic) Suppose that $X = (X, {\mathcal X}, \mu, T)$ is an extension of a system $Y = (Y, {\mathcal Y}, \nu, S)$, and let $f \in L^2(X|Y)$. Let $S_f$ be any limit point of $S_{f,N}$ in the weak operator topology. Then $S_f f$ is relatively almost periodic.

Remark 3. By applying the mean ergodic theorem to the dynamical system $X \times_Y X$, one can show that the sequence $S_{f,N}$ is in fact convergent in the weak or strong operator topologies (at least when X is regular). But to avoid some technicalities we shall present an argument that does not rely on the existence of a strong limit. $\diamond$

As one might expect from our experience with unconditional weak mixing, the proof of Proposition 3 relies on the theory of conditionally Hilbert-Schmidt operators on $L^2(X|Y)$. We give here a definition of such operators which is well suited to our needs.

Definition 2. Let X, Y be as above. A sub-orthonormal set in $L^2(X|Y)$ is any at most countable sequence $e_\alpha \in L^2(X|Y)$ such that $\langle e_\alpha, e_\beta \rangle_{X|Y} = 0$ a.e. for all $\alpha \neq \beta$ and $\langle e_\alpha, e_\alpha \rangle_{X|Y} \leq 1$ a.e. for all $\alpha$. A linear operator $A: L^2(X|Y) \to L^2(X|Y)$ is said to be a conditionally Hilbert-Schmidt operator if we have the module property

$A(cf) = cAf$ for all $c \in L^\infty(Y)$ (12)

and the bound

$\sum_\alpha \sum_\beta |\langle A e_\alpha, f_\beta \rangle_{X|Y}|^2 \leq C^2$ a.e. (13)

for all sub-orthonormal sets $\{ e_\alpha\}$, $\{f_\beta\}$ and some constant $C>0$; the best such C is called the (uniform) conditional Hilbert-Schmidt norm $\|\|A\|_{HS(X|Y)}\|_{L^\infty(Y)}$ of A.

Remark 4. As in Lecture 12, one can also set up the concept of a tensor product of two Hilbert modules, and use that to define conditionally Hilbert-Schmidt operators in a way which does not require sub-orthonormal sets. But we will not need to do so here. One can also define a pointwise conditional Hilbert-Schmidt norm $\|A\|_{HS(X|Y)}(y)$ for each $y \in Y$, but we will not need this concept. $\diamond$

Example 2. Suppose that Y is just a finite set (with the discrete $\sigma$-algebra). Then X splits into finitely many fibres $\pi^{-1}(\{y\})$ (where $\pi: X \to Y$ is the factor map) with the conditional measures $\mu_y$, and $L^2(X|Y)$ can be identified with the direct sum (with the $l^\infty$ norm) of the Hilbert spaces $L^2(\mu_y)$. A conditional Hilbert-Schmidt operator is then equivalent to a family of Hilbert-Schmidt operators $A_y: L^2(\mu_y) \to L^2(\mu_y)$ for each y, with the $A_y$ uniformly bounded in Hilbert-Schmidt norm. $\diamond$
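In the finite setting of Example 2, the uniform conditional Hilbert-Schmidt norm is simply the largest of the fibrewise Hilbert-Schmidt norms. The sketch below assumes, for illustration only, that each fibre has m points carrying the uniform probability measure, and represents $A_y$ by a kernel $K_y$ acting as $g \mapsto \frac{1}{m}\sum_{z'} K_y(z,z') g(z')$, mirroring the kernel form in Example 3. It checks the kernel formula for the fibrewise norm against the definition via an orthonormal basis. All function names and the sample kernels are hypothetical.

```python
import math

def apply_kernel(K, g):
    """(A g)(z) = (1/m) sum_{z'} K[z][z'] g(z'), for the uniform probability measure on m points."""
    m = len(K)
    return [sum(K[z][w] * g[w] for w in range(m)) / m for z in range(m)]

def fibre_inner(u, v):
    """Inner product in L^2(mu_y) for the uniform probability measure mu_y."""
    return sum(a * b.conjugate() for a, b in zip(u, v)) / len(u)

def fibre_hs_norm(K):
    """(double integral of |K_y(z,z')|^2)^{1/2}, as in Example 3."""
    m = len(K)
    return math.sqrt(sum(abs(K[z][w]) ** 2 for z in range(m) for w in range(m)) / m ** 2)

def hs_norm_via_basis(K):
    """HS norm from the definition: sum of |<A e_a, e_b>|^2 over an orthonormal basis."""
    m = len(K)
    # e_j = sqrt(m) * indicator of the j-th point has norm 1 in L^2(mu_y).
    basis = [[math.sqrt(m) if i == j else 0.0 for i in range(m)] for j in range(m)]
    return math.sqrt(sum(abs(fibre_inner(apply_kernel(K, e), e2)) ** 2
                         for e in basis for e2 in basis))

def conditional_hs_norm(kernels):
    """Uniform conditional HS norm for finite Y: the largest fibrewise HS norm."""
    return max(fibre_hs_norm(K) for K in kernels.values())

# Hypothetical two-point Y with two-point fibres.
kernels = {0: [[1, 0], [0, 1]], 1: [[2, 1], [1, 0]]}
```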

Example 3. In the skew shift example $X = ({\Bbb R}/{\Bbb Z})^2 = \{ (y,z): y, z \in {\Bbb R}/{\Bbb Z} \}$, $Y = ({\Bbb R}/{\Bbb Z})$, one can show that an operator A is conditionally Hilbert-Schmidt if and only if it takes the form $A f(y,z) = \int_{{\Bbb R}/{\Bbb Z}} K_y(z,z') f(y,z')\ dz'$ a.e. for all $f \in L^2(X|Y)$, with $\|\|A\|_{HS(X|Y)}\|_{L^\infty(Y)} = \sup_y (\int_{{\Bbb R}/{\Bbb Z}} \int_{{\Bbb R}/{\Bbb Z}} |K_y(z,z')|^2 dz dz')^{1/2}$ finite. $\diamond$

Exercise 6. Let $f_1, f_2 \in L^2(X|Y)$ with $\|f_1\|_{L^2(X|Y)}, \|f_2\|_{L^2(X|Y)} \leq 1$ a.e. Show that the rank one operator $g \mapsto \langle g, f_1 \rangle_{X|Y} f_2$ is conditionally Hilbert-Schmidt with norm at most 1. $\diamond$

Observe from (11) that the $S_{f,N}$ are averages of rank one operators arising from the functions $T^n f$, and so by Exercise 6 and the triangle inequality we see that the $S_{f,N}$ are uniformly conditionally Hilbert-Schmidt. Taking weak limits using (13) (and Fatou’s lemma) we conclude that $S_f$ is also conditionally Hilbert-Schmidt.
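In the finite setting of Example 2, the following sketch checks numerically the two facts used in the paragraph above: a rank-one kernel $f_2(z)\overline{f_1(z')}$ has fibrewise Hilbert-Schmidt norm $\|f_1\|_{L^2(\mu_y)}\|f_2\|_{L^2(\mu_y)}$, and consequently the average (11) has conditional Hilbert-Schmidt norm at most $\|\|f\|_{L^2(X|Y)}\|_{L^\infty(Y)}^2$. The skew product $(y,z) \mapsto (y+1,z+y)$, the uniform fibre measures, the random f, and the helper names are illustrative assumptions.

```python
import math
import random

p, q, N = 3, 4, 6
X = [(y, z) for y in range(p) for z in range(q)]

def T(x):
    """Hypothetical skew product on Z/p x Z/q: (y, z) -> (y + 1, z + y)."""
    y, z = x
    return ((y + 1) % p, (z + y) % q)

def shift(u, n):
    """(T^n u)(x) = u(T^n x)."""
    def power(x):
        for _ in range(n):
            x = T(x)
        return x
    return {x: u[power(x)] for x in X}

def fibre_hs_norm(K):
    """Fibrewise HS norm of a kernel acting as (1/q) sum_{z'} K[z][z'] g(z')."""
    m = len(K)
    return math.sqrt(sum(abs(K[a][b]) ** 2 for a in range(m) for b in range(m)) / m ** 2)

def fibre_norm(u, y):
    """||u||_{L^2(X|Y)}(y) = E(|u|^2 | Y)(y)^{1/2}."""
    return math.sqrt(sum(abs(u[(y, z)]) ** 2 for z in range(q)) / q)

def rank_one_kernel(f1, f2, y):
    """Kernel of g -> <g, f1>_{X|Y} f2 on the fibre over y: K(z, z') = f2(y,z) conj(f1(y,z'))."""
    return [[f2[(y, z)] * f1[(y, w)].conjugate() for w in range(q)] for z in range(q)]

def s_kernel(f, N, y):
    """Kernel of S_{f,N} from (11) on the fibre over y: an average of rank-one kernels."""
    K = [[0j] * q for _ in range(q)]
    for n in range(N):
        fn = shift(f, n)
        R1 = rank_one_kernel(fn, fn, y)
        for a in range(q):
            for b in range(q):
                K[a][b] += R1[a][b] / N
    return K

rng = random.Random(1)
f = {x: complex(rng.gauss(0, 1), rng.gauss(0, 1)) for x in X}
bound = max(fibre_norm(f, y) for y in range(p)) ** 2  # sup_y ||f||_{L^2(X|Y)}(y)^2

for y in range(p):
    for n in range(N):
        fn = shift(f, n)
        # A rank-one kernel has HS norm exactly ||f1||(y) * ||f2||(y).
        assert math.isclose(fibre_hs_norm(rank_one_kernel(fn, fn, y)), fibre_norm(fn, y) ** 2)
    # Triangle inequality: the average of the rank-one kernels obeys the same bound.
    assert fibre_hs_norm(s_kernel(f, N, y)) <= bound + 1e-12
```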

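Next, we observe from a telescoping identity that for every $h \geq 1$, $T^h S_{f,N} - S_{f,N} T^h$ converges to zero in the weak operator topology (and even in the operator norm topology) as $N \to \infty$: indeed, since T commutes with ${\Bbb E}(\cdot|Y)$, one has $T^h S_{f,N} g - S_{f,N} T^h g = \frac{1}{N} ( \sum_{n=N}^{N+h-1} - \sum_{n=0}^{h-1} ) {\Bbb E}( T^h g \overline{T^n f} | Y ) T^n f$, which has $L^2(X)$ norm $O_f( h \|g\|_{L^2(X)} / N )$. Taking limits, we see that $S_f$ commutes with T. In particular $T^n S_f f = S_f T^n f$ for all n, so all the shifts of $S_f f$ lie in the image under $S_f$ of a fixed ball of $L^2(X|Y)$ (the shifts $T^n f$ have the same uniform conditional norm as f). To show that $S_f f$ is conditionally almost periodic, it thus suffices to show the following analogue of Lemma 2 from Lecture 12: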

Lemma 2. Let $A: L^2(X|Y) \to L^2(X|Y)$ be a conditionally Hilbert-Schmidt operator. Then the image of the unit ball of $L^2(X|Y)$ under $A$ is conditionally precompact.

Proof. We shall prove this lemma by establishing a sort of conditional singular value decomposition for A. We can normalise A to have uniform conditional Hilbert-Schmidt norm 1. We fix $\varepsilon > 0$, and we will also need an integer k and a small quantity $\delta > 0$ depending on $\varepsilon$ to be chosen later.

We first consider the quantities $|\langle Ae_1, f_1 \rangle_{X|Y}|^2$ where $e_1, f_1$ range over all sub-orthonormal sets of cardinality 1. On the one hand, these quantities are bounded pointwise by 1, thanks to (13). On the other hand, observe that if $|\langle Ae_1, f_1 \rangle_{X|Y}|^2$ and $|\langle Ae'_1, f'_1 \rangle_{X|Y}|^2$ are of the above form, then so is the join $\max( |\langle Ae_1, f_1 \rangle_{X|Y}|^2, |\langle Ae'_1, f'_1 \rangle_{X|Y}|^2 )$, as can be seen by taking $\tilde e_1 := e_1 1_E + e'_1 1_{E^c}$ and $\tilde f_1 := f_1 1_E + f'_1 1_{E^c}$, where E is the (Y-measurable) set on which $|\langle Ae_1,f_1\rangle_{X|Y}|^2$ exceeds $|\langle Ae'_1,f'_1\rangle_{X|Y}|^2$ (here we use the module property (12) to pull the indicators $1_E, 1_{E^c} \in L^\infty(Y)$ through A). By using a maximising sequence for the quantity $\int_Y |\langle Ae, f \rangle_{X|Y}|^2\ d\nu$ and applying joins repeatedly, we can thus (on taking limits) find a pair $e_1, f_1$ which is near-optimal in the sense that $|\langle Ae_1, f_1 \rangle_{X|Y}|^2 \geq (1-\delta) |\langle Ae'_1, f'_1\rangle_{X|Y}|^2$ a.e. for all competitors $e'_1, f'_1$.

Now fix $e_1, f_1$, and consider the quantity $|\langle Ae_2, f_2 \rangle_{X|Y}|^2$, where $\{e_1,e_2\}$ and $\{f_1,f_2\}$ are sub-orthonormal sets. Arguing as before, we can find a pair $e_2, f_2$ which is near-optimal in the sense that $|\langle Ae_2, f_2 \rangle_{X|Y}|^2 \geq (1-\delta) |\langle Ae'_2, f'_2 \rangle_{X|Y}|^2$ a.e. for all competitors $e'_2, f'_2$.

We continue in this fashion k times to obtain sub-orthonormal sets $\{e_1,\ldots,e_k\}$ and $\{f_1,\ldots,f_k\}$ with the property that $|\langle Ae_i, f_i \rangle_{X|Y}|^2 \geq (1-\delta) |\langle Ae'_i, f'_i \rangle_{X|Y}|^2$ whenever $\{e_1,\ldots,e_{i-1},e'_i\}, \{f_1,\ldots,f_{i-1},f'_i\}$ are sub-orthonormal sets. On the other hand, from (13) we know that $\sum_i |\langle Ae_i, f_i \rangle_{X|Y}|^2 \leq 1$. Now suppose that $\{e_1,\ldots,e_k,e\}$ and $\{f_1,\ldots,f_k,f\}$ are sub-orthonormal. Then for each $1 \leq i \leq k$ the pair e, f is an admissible competitor at the $i^{th}$ stage, so $|\langle Ae_i, f_i \rangle_{X|Y}|^2 \geq (1-\delta) |\langle Ae, f \rangle_{X|Y}|^2$; summing in i, we conclude that $|\langle Ae, f\rangle_{X|Y}|^2 \leq \frac{1}{(1-\delta)k}$ a.e. If $k, \delta$ are chosen appropriately (e.g. $\delta \leq 1/2$ and $k \geq 2/\varepsilon^2$), we obtain $|\langle Ae, f\rangle_{X|Y}| \leq \varepsilon$ a.e. Thus (by duality) A maps the unit ball of the orthogonal complement of the span of $\{e_1,\ldots,e_k\}$ to the $\varepsilon$-neighbourhood of the span of $\{f_1,\ldots,f_k\}$ (with notions such as orthogonality, span, and neighbourhood being defined conditionally of course, using the $L^\infty(Y)$-Hilbert module structure of $L^2(X|Y)$). From this it is not hard to establish the desired precompactness. $\Box$
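For finite Y (as in Example 2), the near-optimal greedy choice in the proof above can be replaced by an exact singular value decomposition on each fibre, with $e_i$ and $f_i$ the top right and left singular vectors of $A_y$. The sketch below is our own illustration, using NumPy and random fibre operators normalised to Hilbert-Schmidt norm 1 in an orthonormal basis; the function names and parameters are hypothetical. It checks that A restricted to the orthogonal complement of $e_1,\ldots,e_k$ has operator norm equal to the $(k+1)$-st singular value, which is at most $1/\sqrt{k+1}$ because the squared singular values sum to 1. This is a finite-dimensional analogue of the bound $|\langle Ae, f\rangle_{X|Y}|^2 \leq \frac{1}{(1-\delta)k}$.

```python
import numpy as np

def random_fibre_operator(rng, m):
    """A random m x m complex matrix, viewed as an operator on one fibre in an
    orthonormal basis, rescaled to Hilbert-Schmidt (Frobenius) norm 1."""
    A = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
    return A / np.linalg.norm(A)  # default matrix norm is Frobenius

def tail_norm(A, k):
    """Operator norm of A on the orthogonal complement of its top k right singular
    vectors (the analogue of e_1, ..., e_k), together with all singular values."""
    U, s, Vh = np.linalg.svd(A)
    V_k = Vh[:k].conj().T                          # columns e_1, ..., e_k
    P = np.eye(A.shape[1]) - V_k @ V_k.conj().T    # projection onto span(e_i)^perp
    return np.linalg.norm(A @ P, ord=2), s         # ord=2: largest singular value

rng = np.random.default_rng(0)
m, k = 6, 4
fibres = [random_fibre_operator(rng, m) for _ in range(3)]  # a finite Y with 3 points
for A in fibres:
    tail, s = tail_norm(A, k)
    assert np.isclose(tail, s[k])               # the (k+1)-st singular value
    assert tail <= 1 / np.sqrt(k + 1) + 1e-12   # since sum(s**2) == 1
```

Taking the maximum of `tail` over the fibres gives the uniform (conditional) version of the bound.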

[Update, Mar 1: Typo corrected.]

[Update, June 2 2009: Some minor changes in the proof of Lemma 1.  Thanks to Jeremy Avigad for corrections.]