Having studied compact extensions in the previous lecture, we now consider the opposite type of extension, namely that of a *weakly mixing extension*. Just as compact extensions are “relative” versions of compact systems, weakly mixing extensions are “relative” versions of weakly mixing systems, in which the underlying algebra of scalars $\mathbb{C}$ is replaced by $L^\infty(Y)$. As in the case of unconditionally weakly mixing systems, we will be able to use the van der Corput lemma to neglect “conditionally weakly mixing” functions, thus allowing us to lift the uniform multiple recurrence property (UMR) from a system to any weakly mixing extension of that system.

Two more steps are needed to finish the proof of the Furstenberg recurrence theorem. One is a relative version of the dichotomy between mixing and compactness: if a system is not weakly mixing relative to some factor, then that factor has a non-trivial compact extension. This will be accomplished in this lecture using the theory of conditional Hilbert-Schmidt operators. Finally, we need the (easy) result that the UMR property is preserved under limits of chains; this will be accomplished in the next lecture.

— Conditionally weakly mixing functions —

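Recall that in a measure-preserving system $X = (X, \mathcal{X}, \mu, T)$, a function $f \in L^2(X)$ is said to be *weakly mixing* if the squared inner products $|\langle T^n f, f \rangle_{L^2(X)}|^2$ converge to zero in the Cesàro sense, thus

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} |\langle T^n f, f \rangle_{L^2(X)}|^2 = 0. \qquad (1)$$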

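Now let $Y = (Y, \mathcal{Y}, \nu, S)$ be a factor of X, so that $L^\infty(Y)$ can be viewed as a subspace of $L^\infty(X)$. Recall that we have the conditional inner product $\langle f, g \rangle_{L^2(X|Y)} := \mathbb{E}(f \overline{g} | Y)$ and the Hilbert module $L^2(X|Y)$ of functions f for which $\|f\|_{L^2(X|Y)} := \langle f, f \rangle_{L^2(X|Y)}^{1/2}$ lies in $L^\infty(Y)$. We shall say that a function $f \in L^2(X|Y)$ is *conditionally weakly mixing* relative to Y if the squared norms $\|\langle T^n f, f \rangle_{L^2(X|Y)}\|_{L^2(X)}^2$ converge to zero in the Cesàro sense, thus

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X |\langle T^n f, f \rangle_{L^2(X|Y)}|^2\, d\mu = 0. \qquad (2)$$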
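As a concrete and purely illustrative check of these definitions, the following Python sketch computes the conditional inner product and the Cesàro average in (2) in a finite toy model. The model is an assumption of this sketch, not part of the lecture: $X = Y \times Z$ with uniform probability measures, $Y$ the first-coordinate factor, and the skew product $T(y,z) = (y+1 \bmod M, z+y \bmod P)$ acting on functions by $f \mapsto f \circ T^n$. The helper names are hypothetical. In a finite model $T$ has finite order, so no non-zero function is conditionally weakly mixing. Indeed, the shifts of the vertical character used below are $Y$-measurable unimodular multiples of itself, so it is relatively almost periodic.

```python
import cmath

M, P = 3, 5                                  # |Y| = M, |Z| = P (hypothetical sizes)
Y = range(M)
X = [(y, z) for y in range(M) for z in range(P)]


def T(point):
    """Skew product T(y, z) = (y + 1 mod M, z + y mod P); it maps fibres to fibres."""
    y, z = point
    return ((y + 1) % M, (z + y) % P)


def shift(f, n):
    """The action f -> f o T^n (n >= 0) on functions stored as dicts over X."""
    out = {}
    for x in X:
        pt = x
        for _ in range(n):
            pt = T(pt)
        out[x] = f[pt]
    return out


def cond_inner(f, g):
    """<f, g>_{L^2(X|Y)} = E(f conj(g) | Y): an average over each fibre {y} x Z."""
    return {y: sum(f[(y, z)] * g[(y, z)].conjugate() for z in range(P)) / P
            for y in Y}


def cesaro_average_2(f, N):
    """(1/N) sum_{n<N} int_X |<T^n f, f>_{L^2(X|Y)}|^2 dmu, the average in (2)."""
    total = 0.0
    for n in range(N):
        c = cond_inner(shift(f, n), f)
        # the integrand is Y-measurable, so integrate it over Y (uniform measure)
        total += sum(abs(c[y]) ** 2 for y in Y) / M
    return total / N


# f(y, z) = exp(2 pi i z / P) depends only on the vertical variable.
f = {(y, z): cmath.exp(2j * cmath.pi * z / P) for (y, z) in X}
one = {x: 1.0 for x in X}
print(cond_inner(f, one))          # E(f|Y): numerically zero, so f has relative mean zero
print(cesaro_average_2(f, 20))     # approximately 1.0, so (2) fails for this f
```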

**Example 1.** If $X = Y \times Z$ is a product system of the factor space $Y$ and another system $Z = (Z, \mathcal{Z}, \rho, R)$, then a function $f(y,z) = f(z)$ of the vertical variable $z$ is weakly mixing relative to Y if and only if $f(z)$ is weakly mixing in Z.

Much of the theory of weakly mixing systems extends easily to the conditionally weakly mixing case. For instance:

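**Exercise 1.** By adapting the proof of Corollary 2 from Lecture 12, show that if $f \in L^2(X|Y)$ is conditionally weakly mixing and $g \in L^2(X|Y)$, then $\|\langle T^n f, g \rangle_{L^2(X|Y)}\|_{L^2(X)}$ and $\|\langle T^n f, g \rangle_{L^2(X|Y)}\|_{L^2(X)}^2$ converge to zero in the Cesàro sense. (*Hint*: you will need to show that expressions such as $\overline{\langle T^n f, g \rangle_{L^2(X|Y)}}\, T^n f$ converge to zero in $L^2(X)$ in the Cesàro sense. Apply the van der Corput lemma, and use the fact that the $\langle T^n f, g \rangle_{L^2(X|Y)}$ are uniformly bounded in $L^\infty(Y)$ by conditional Cauchy-Schwarz.)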

**Exercise 2.** Show that the space of conditionally weakly mixing functions in $L^2(X|Y)$ is a module over $L^\infty(Y)$ (i.e. it is closed under addition and multiplication by the “scalars” $L^\infty(Y)$), which is also shift-invariant and topologically closed in the topology of $L^2(X|Y)$ (see Exercise 2 from Lecture 13).

Let us now see the first link between conditional weak mixing and conditional almost periodicity (cf. Exercise 18 from Lecture 12):

**Lemma 1.** If $f \in L^2(X|Y)$ is conditionally weakly mixing and $g \in L^2(X|Y)$ is conditionally almost periodic, then $\langle f, g \rangle_{L^2(X|Y)} = 0$ a.e.

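**Proof.** Since $\langle T^n f, T^n g \rangle_{L^2(X|Y)} = T^n \langle f, g \rangle_{L^2(X|Y)}$ and $T$ preserves $\mu$, we have $\|\langle f, g \rangle_{L^2(X|Y)}\|_{L^2(X)} = \|\langle T^n f, T^n g \rangle_{L^2(X|Y)}\|_{L^2(X)}$ for every $n$. It will therefore suffice to show that

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \|\langle T^n f, T^n g \rangle_{L^2(X|Y)}\|_{L^2(X)} = 0. \qquad (3)$$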

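Let $\varepsilon > 0$ be arbitrary. As $g$ is conditionally almost periodic, one can find a finitely generated module zonotope $\{c_1 g_1 + \cdots + c_k g_k : c_1, \ldots, c_k \in L^\infty(Y),\ \|c_j\|_{L^\infty(Y)} \le 1\}$ with $g_1, \ldots, g_k \in L^2(X|Y)$ such that all the shifts $T^n g$ lie within $\varepsilon$ (in $L^2(X|Y)$) of this zonotope. Thus, by conditional Cauchy-Schwarz and the fact that $T$ preserves $\mu$, we have

$$\|\langle T^n f, T^n g \rangle_{L^2(X|Y)}\|_{L^2(X)} \le \Big\|\Big\langle T^n f, \sum_{j=1}^k c_{n,j} g_j \Big\rangle_{L^2(X|Y)}\Big\|_{L^2(X)} + \varepsilon \|f\|_{L^2(X)} \qquad (4)$$

for all $n$ and some $c_{n,1}, \ldots, c_{n,k} \in L^\infty(Y)$ with norm at most 1. These coefficients are $Y$-measurable, so $\langle T^n f, c\, g_j \rangle_{L^2(X|Y)} = \overline{c}\, \langle T^n f, g_j \rangle_{L^2(X|Y)}$. We can therefore pull them out of the conditional inner product and bound the left-hand side of (4) by

$$\sum_{j=1}^k \|\langle T^n f, g_j \rangle_{L^2(X|Y)}\|_{L^2(X)} + \varepsilon \|f\|_{L^2(X)}. \qquad (5)$$

By Exercise 1, the Cesàro limit superior of (5) is at most $\varepsilon \|f\|_{L^2(X)}$. Since $\varepsilon > 0$ is arbitrary, the claim (3) follows.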

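Since all functions in $L^\infty(Y)$ are conditionally almost periodic, we conclude that every conditionally weakly mixing function f is orthogonal to $L^\infty(Y)$ (in the sense that $\langle f, g \rangle_{L^2(X|Y)} = 0$ a.e. for all $g \in L^\infty(Y)$), or equivalently that $\mathbb{E}(f|Y) = 0$ a.e. We say that f has *relative mean zero* if the latter holds.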

**Definition 1.** A system X is a *weakly mixing extension* of a factor Y if every $f \in L^2(X|Y)$ with relative mean zero is relatively weakly mixing.

**Exercise 3.** Show that a product of a system Y with a weakly mixing system Z is always a weakly mixing extension of Y.

**Remark 1.** If X is regular, then we can disintegrate the measure $\mu$ as an average $\mu = \int_Y \mu_y\, d\nu(y)$ (see Theorem 4 from Lecture 9). It is then possible to construct a relative product system $X \times_Y X$, which is the product system $X \times X$ but with the measure $\int_Y \mu_y \times \mu_y\, d\nu(y)$ instead of $\mu \times \mu$. It can then be shown (cf. Exercise 9 from Lecture 12) that X is a weakly mixing extension of Y if and only if $X \times_Y X$ is ergodic; see for instance Furstenberg’s book for details. However, in these notes we shall focus instead on the more abstract operator-algebraic approach, which avoids the use of disintegrations.

Now we show that the uniform multiple recurrence property (UMR) from Lecture 13 is preserved under weakly mixing extensions (cf. Theorem 1 from Lecture 13).

**Theorem 1.** Suppose that $X = (X, \mathcal{X}, \mu, T)$ is a weakly mixing extension of $Y = (Y, \mathcal{Y}, \nu, S)$. If Y obeys UMR, then so does X.

The proof of this theorem rests on the following analogue of Proposition 1 from Lecture 12:

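**Proposition 1.** Let $a_1, \ldots, a_k$ be distinct non-zero integers for some $k \ge 1$. Suppose that $X$ is a weakly mixing extension of $Y$, and let $f_1, \ldots, f_k \in L^\infty(X)$ be such that at least one of $f_1, \ldots, f_k$ has relative mean zero. Then

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} T^{a_1 n} f_1 \cdots T^{a_k n} f_k = 0 \qquad (6)$$

in $L^2(X)$.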

**Exercise 4.** Prove Proposition 1. (*Hint*: modify (or “relativise”) the proof of Proposition 1 from Lecture 12.)

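**Corollary 1.** Let $a_1, \ldots, a_k$ be distinct integers for some $k \ge 1$. Suppose that $X$ is a weakly mixing extension of $Y$, and let $f_1, \ldots, f_k \in L^\infty(X)$. Then

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \Big( \int_X T^{a_1 n} f_1 \cdots T^{a_k n} f_k\, d\mu - \int_X T^{a_1 n} \mathbb{E}(f_1|Y) \cdots T^{a_k n} \mathbb{E}(f_k|Y)\, d\mu \Big) = 0. \qquad (7)$$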

**Exercise 5.** Prove Corollary 1. (*Hint*: adapt the proof of Corollary 2 from Lecture 12.)

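**Proof of Theorem 1.** Let $f \in L^\infty(X)$ be non-negative with positive mean. Then $\mathbb{E}(f|Y)$ is also non-negative with positive mean. Since Y obeys UMR, we have

$$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X \mathbb{E}(f|Y)\, T^n \mathbb{E}(f|Y) \cdots T^{(k-1)n} \mathbb{E}(f|Y)\, d\mu > 0 \qquad (8)$$

for every $k \ge 1$. Applying Corollary 1 (with $a_j := j-1$ and $f_1 = \cdots = f_k := f$), we see that the same statement holds with $\mathbb{E}(f|Y)$ replaced by f, and the claim follows.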

**Remark 2.** As the above proof shows, Corollary 1 lets us replace functions in the weakly mixing extension X by their conditional expectations onto Y for the purposes of computing k-fold averages. In the terminology of Furstenberg and Weiss, Corollary 1 asserts that Y is a *characteristic factor* of X for the average (7). The deeper structural theory of such characteristic factors (and in particular of the minimal characteristic factor for any given average) is an active and difficult area of research. It has surprising connections with Lie group actions (and in particular with flows on nilmanifolds), as well as with the theory of inverse problems in additive combinatorics (and in particular with inverse theorems for the Gowers norms); see for instance the ICM paper of Kra for a survey of recent developments. The concept of a characteristic factor (or more precisely, finitary analogues of this concept) is also fundamental in my work with Ben Green on primes in arithmetic progressions.

— The dichotomy between structure and randomness —

The remainder of this lecture is devoted to proving the following “relative” generalisation of Theorem 1 from Lecture 12, which is a fundamental ingredient in the proof of the Furstenberg recurrence theorem:

**Theorem 2.** Suppose that $X = (X, \mathcal{X}, \mu, T)$ is an extension of a system $Y = (Y, \mathcal{Y}, \nu, S)$. Then exactly one of the following statements is true:

- (Structure) X has a factor Z which is a non-trivial compact extension of Y.
- (Randomness) X is a weakly mixing extension of Y.

As in Lecture 12, the key to proving this theorem is to show

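**Proposition 2.** Suppose that $X = (X, \mathcal{X}, \mu, T)$ is an extension of a system $Y = (Y, \mathcal{Y}, \nu, S)$. Then a function $f \in L^2(X|Y)$ is relatively weakly mixing if and only if $\langle f, g \rangle_{L^2(X|Y)} = 0$ a.e. for all relatively almost periodic $g \in L^2(X|Y)$.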

The “only if” part of this proposition is Lemma 1; the harder part is the “if” part, which we will prove shortly. But for now, let us see why Proposition 2 implies Theorem 2.

From Lemma 1, we already know that no non-trivial function can be simultaneously conditionally weakly mixing and conditionally almost periodic, which shows that cases 1 and 2 of Theorem 2 cannot simultaneously hold. To finish the proof of Theorem 2, suppose that X is not a weakly mixing extension of Y. Then there exists a function $f \in L^2(X|Y)$ of relative mean zero which is not conditionally weakly mixing. By Proposition 2, there must exist a relatively almost periodic $g \in L^2(X|Y)$ such that $\langle f, g \rangle_{L^2(X|Y)}$ does not vanish a.e. Since f is orthogonal to all functions in $L^\infty(Y)$, we conclude that g is *not* in $L^\infty(Y)$. Thus we have at least one relatively almost periodic function that does not come from the factor Y. By Exercise 6 of Lecture 13, this shows that the maximal compact extension of Y is non-trivial, and the claim follows.

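It thus suffices to prove the “if” part of Proposition 2; that is, we need to show that every function that is not conditionally weakly mixing correlates with some conditionally almost periodic function. Observe that if $f \in L^2(X|Y)$ is not conditionally weakly mixing, then by definition we have

$$\limsup_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X |\langle T^n f, f \rangle_{L^2(X|Y)}|^2\, d\mu > 0. \qquad (9)$$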

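We can rearrange this as

$$\limsup_{N \to \infty} \langle A_N f, f \rangle_{L^2(X)} > 0, \qquad (10)$$

where $A_N$ is the operator

$$A_N g := \frac{1}{N} \sum_{n=0}^{N-1} \langle g, T^n f \rangle_{L^2(X|Y)}\, T^n f. \qquad (11)$$

(Indeed, by the tower property of conditional expectation, $\langle A_N f, f \rangle_{L^2(X)} = \frac{1}{N} \sum_{n=0}^{N-1} \int_X \langle f, T^n f \rangle_{L^2(X|Y)} \langle T^n f, f \rangle_{L^2(X|Y)}\, d\mu$, which is the average in (9).)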
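The passage from (9) to (10) is an exact identity, so it can be sanity-checked numerically. The Python sketch below assumes a finite toy model that is not part of the lecture: $X = Y \times Z$ with uniform measure, $Y$ the first coordinate, and an arbitrarily chosen fibre-preserving bijection $T$ acting by $f \mapsto f \circ T^n$. The names `A_N`, `average_9` and the other helpers are hypothetical. The sketch evaluates both sides for a few values of $N$.

```python
M, P = 2, 4                                   # hypothetical toy sizes
Y = range(M)
X = [(y, z) for y in range(M) for z in range(P)]


def T(point):
    """An assumed fibre-preserving bijection of X = Y x Z (a finite skew product)."""
    y, z = point
    return ((y + 1) % M, (z + 2 * y + 1) % P)


def shift(f, n):
    """f -> f o T^n for n >= 0."""
    out = {}
    for x in X:
        pt = x
        for _ in range(n):
            pt = T(pt)
        out[x] = f[pt]
    return out


def cond_inner(f, g):
    """<f, g>_{L^2(X|Y)}: fibrewise average of f conj(g), a function on Y."""
    return {y: sum(f[(y, z)] * g[(y, z)].conjugate() for z in range(P)) / P
            for y in Y}


def inner_X(f, g):
    """The ordinary L^2(X) inner product for the uniform measure on X."""
    return sum(f[x] * g[x].conjugate() for x in X) / len(X)


def A_N(g, f, N):
    """A_N g = (1/N) sum_{n<N} <g, T^n f>_{L^2(X|Y)} T^n f, as in (11)."""
    out = {x: 0j for x in X}
    for n in range(N):
        fn = shift(f, n)
        c = cond_inner(g, fn)                 # depends only on y
        for (y, z) in X:
            out[(y, z)] += c[y] * fn[(y, z)] / N
    return out


def average_9(f, N):
    """(1/N) sum_{n<N} int_X |<T^n f, f>_{L^2(X|Y)}|^2 dmu, the average in (9)."""
    total = 0.0
    for n in range(N):
        c = cond_inner(shift(f, n), f)
        total += sum(abs(v) ** 2 for v in c.values()) / M
    return total / N


f = {(y, z): complex(z % 3, y) for (y, z) in X}   # an arbitrary test function
for N in (1, 5, 12):
    # the two numbers agree up to rounding (the second has zero imaginary part)
    print(N, average_9(f, N), inner_X(A_N(f, f, N), f))
```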

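To prove Proposition 2, it thus suffices to show the following. By weak compactness, we may pass to a subsequence of the $A_N$ along which (10) persists and which converges in the weak operator topology (of $L^2(X)$) to some $A$ with $\langle Af, f \rangle_{L^2(X)} \neq 0$; then $\langle f, Af \rangle_{L^2(X|Y)}$ does not vanish a.e.

**Proposition 3.** (Dual functions are almost periodic) Suppose that $X = (X, \mathcal{X}, \mu, T)$ is an extension of a system $Y = (Y, \mathcal{Y}, \nu, S)$, and let $f \in L^2(X|Y)$. Let $A$ be any limit point of the operators $A_N$ from (11) in the weak operator topology. Then $Af$ is relatively almost periodic.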

**Remark 3.** By applying the mean ergodic theorem to the dynamical system $X \times_Y X$, one can show that the sequence $A_N$ is in fact convergent in the weak or strong operator topologies (at least when X is regular). But to avoid some technicalities we shall present an argument that does not rely on the existence of a strong limit.

As one might expect from the experience with unconditional weak mixing, the proof of Proposition 3 relies on the theory of conditionally Hilbert-Schmidt operators on $L^2(X|Y)$. We give here a definition of such operators which is suited to our needs.

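**Definition 2.** Let X, Y be as above. A *sub-orthonormal set* in $L^2(X|Y)$ is any at most countable sequence $e_1, e_2, \ldots \in L^2(X|Y)$ such that $\langle e_i, e_j \rangle_{L^2(X|Y)} = 0$ a.e. for all $i \neq j$ and $\|e_i\|_{L^2(X|Y)} \le 1$ a.e. for all $i$. A linear operator $A: L^2(X|Y) \to L^2(X|Y)$ is said to be a *conditionally Hilbert-Schmidt operator* if we have the module property

$$A(cf) = c\, Af \quad \text{for all } c \in L^\infty(Y),\ f \in L^2(X|Y) \qquad (12)$$

and the bound

$$\sum_i \sum_j |\langle A e_i, e'_j \rangle_{L^2(X|Y)}|^2 \le C^2 \quad \text{a.e.} \qquad (13)$$

for all sub-orthonormal sets $(e_i)_i$, $(e'_j)_j$, and some constant $C > 0$; the best such C is called the *(uniform) conditional Hilbert-Schmidt norm* of A.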

**Remark 4.** As in Lecture 12, one can also set up the concept of a tensor product of two Hilbert modules, and use that to define conditionally Hilbert-Schmidt operators in a way which does not require sub-orthonormal sets. But we will not need to do so here. One can also define a pointwise conditional Hilbert-Schmidt norm for each $y \in Y$, but we will not need this concept.

**Example 2.** Suppose Y is just a finite set (with the discrete $\sigma$-algebra). Then X splits into finitely many fibres $X_y$, $y \in Y$, with the conditional measures $\mu_y$, and $L^2(X|Y)$ can be identified with the direct sum (with the $\ell^\infty$ norm) of the Hilbert spaces $L^2(X_y, \mu_y)$. A conditional Hilbert-Schmidt operator A is then equivalent to a family of Hilbert-Schmidt operators $A_y: L^2(X_y, \mu_y) \to L^2(X_y, \mu_y)$, one for each y, with the $A_y$ uniformly bounded in Hilbert-Schmidt norm.
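In the situation of Example 2 with finite fibres, everything reduces to matrices. The following Python sketch, written under that assumption and with hypothetical helper names, computes the uniform conditional Hilbert-Schmidt norm as the largest fibrewise Frobenius norm. It also evaluates a rank one operator of the kind appearing in Exercise 6 on one choice of data; this is a numerical illustration, not a proof.

```python
import math

P = 3                                          # each fibre X_y has P points (assumed)


def frobenius(mat):
    """Frobenius norm of a square matrix given as a list of rows."""
    return math.sqrt(sum(abs(a) ** 2 for row in mat for a in row))


def conditional_hs_norm(fibre_ops):
    """Uniform conditional HS norm for finite Y: the largest fibrewise HS norm.

    fibre_ops maps y to the matrix of A_y in the standard coordinates of C^P.
    Rescaling the standard basis by sqrt(P) gives an orthonormal basis for the
    normalised inner product (1/P) sum_z, and in that basis A_y has the same
    matrix entries, so its HS norm is the Frobenius norm of this matrix.
    """
    return max(frobenius(m) for m in fibre_ops.values())


def rank_one(f, g):
    """Fibre matrices of h -> <h, f>_{L^2(X|Y)} g, with <h, f>(y) = (1/P) sum_z h conj(f)."""
    return {y: [[g[y][i] * f[y][j].conjugate() / P for j in range(P)] for i in range(P)]
            for y in f}


def cond_norm(f):
    """||f||_{L^2(X|Y)} as a function of y."""
    return {y: math.sqrt(sum(abs(v) ** 2 for v in f[y]) / P) for y in f}


# Functions on X are stored fibre by fibre: y -> list of P values.
f = {0: [1, 1j, -1], 1: [0.5, 0, 1]}
g = {0: [math.sqrt(3), 0, 0], 1: [1, 1, 1]}
print(cond_norm(f), cond_norm(g))              # both at most 1 on every fibre
print(conditional_hs_norm(rank_one(f, g)))     # about 1, consistent with Exercise 6
```

On each fibre the Frobenius norm of the rank one matrix is exactly the product of the two conditional norms, which is why the bound in Exercise 6 is attained here when both conditional norms equal 1 on some fibre.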

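**Example 3.** In the skew shift example $X = (\mathbb{R}/\mathbb{Z})^2$, $Y = \mathbb{R}/\mathbb{Z}$ (writing points of X as $(y,z)$, with Y the first coordinate), one can show that an operator A is conditionally Hilbert-Schmidt if and only if it takes the form $Af(y,z) = \int_{\mathbb{R}/\mathbb{Z}} K(y,z,w) f(y,w)\, dw$ a.e. for all $f \in L^2(X|Y)$, with $\operatorname{ess\,sup}_y \int_{\mathbb{R}/\mathbb{Z}} \int_{\mathbb{R}/\mathbb{Z}} |K(y,z,w)|^2\, dz\, dw$ finite.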

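**Exercise 6.** Let $f, g \in L^2(X|Y)$ with $\|f\|_{L^2(X|Y)}, \|g\|_{L^2(X|Y)} \le 1$ a.e. Show that the rank one operator $h \mapsto \langle h, f \rangle_{L^2(X|Y)}\, g$ is conditionally Hilbert-Schmidt with norm at most 1.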

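Observe from (11) that the $A_N$ are averages of rank one operators arising from the functions $T^n f$. By Exercise 6 and the triangle inequality, the $A_N$ are therefore uniformly conditionally Hilbert-Schmidt, with norm at most $\big\|\|f\|_{L^2(X|Y)}\big\|_{L^\infty(Y)}^2$. Taking weak limits using (13) (and Fatou’s lemma), we conclude that $A$ is also conditionally Hilbert-Schmidt.

The pointwise bound (13) survives the weak limit because it is equivalent to the family of weak bounds $\big|\int_E \sum_{i,j} c_{i,j} \langle A e_i, e'_j \rangle_{L^2(X|Y)}\, d\nu\big| \le C \nu(E)$, taken over all measurable $E \subset Y$ and all finitely supported complex coefficients $c_{i,j}$ with $\sum_{i,j} |c_{i,j}|^2 \le 1$. Each left-hand side equals $\big|\sum_{i,j} c_{i,j} \langle A e_i, 1_E e'_j \rangle_{L^2(X)}\big|$, which passes to weak operator limits.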

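Next, we observe from the telescoping identity

$$T^h A_N T^{-h} - A_N = \frac{1}{N} \sum_{n=N}^{N+h-1} \langle \cdot, T^n f \rangle_{L^2(X|Y)}\, T^n f - \frac{1}{N} \sum_{n=0}^{h-1} \langle \cdot, T^n f \rangle_{L^2(X|Y)}\, T^n f \qquad (h \ge 0)$$

that for every h, $T^h A_N T^{-h} - A_N$ converges to zero in the weak operator topology (and even in the operator norm topology) as $N \to \infty$. Taking limits, we see that $A$ commutes with T. Since $T^n A f = A T^n f$ and the shifts $T^n f$ all lie in a fixed multiple of the unit ball of $L^2(X|Y)$, to show that $Af$ is conditionally almost periodic it suffices to show the following analogue of Lemma 2 from Lecture 12: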

**Lemma 2.** Let $A: L^2(X|Y) \to L^2(X|Y)$ be a conditionally Hilbert-Schmidt operator. Then the image of the unit ball $\{f \in L^2(X|Y) : \|f\|_{L^2(X|Y)} \le 1 \text{ a.e.}\}$ under $A$ is conditionally precompact.
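The telescoping identity used above to show that $A$ commutes with $T$ can be sanity-checked for $h = 1$ in a finite toy model. The Python sketch below makes assumptions that are not from the lecture: $X = Y \times Z$ with uniform measure and the fibre-preserving bijection $T(y,z) = (y+1 \bmod M, z+y \bmod P)$, with $T$ acting on functions by $f \mapsto f \circ T$. All helper names are hypothetical. It checks that $T A_N T^{-1} g - A_N g$ equals the two boundary terms $\frac{1}{N}\big(\langle g, T^N f \rangle_{L^2(X|Y)} T^N f - \langle g, f \rangle_{L^2(X|Y)} f\big)$.

```python
M, P, N = 3, 4, 6                      # hypothetical sizes of Y, of each fibre, and N
X = [(y, z) for y in range(M) for z in range(P)]


def T(point):
    """Assumed fibre-preserving bijection T(y, z) = (y + 1 mod M, z + y mod P)."""
    y, z = point
    return ((y + 1) % M, (z + y) % P)


def T_inv(point):
    """Inverse of T: undo the base rotation, then the vertical step."""
    y, z = point
    y0 = (y - 1) % M
    return (y0, (z - y0) % P)


def compose(f, S):
    """f -> f o S for a point map S; functions are dicts over X."""
    return {x: f[S(x)] for x in X}


def shift(f, n):
    """T^n f, realised here (by assumption) as f o T^n for n >= 0."""
    for _ in range(n):
        f = compose(f, T)
    return f


def cond_inner(f, g):
    """<f, g>_{L^2(X|Y)} as a function on Y (fibrewise average of f conj(g))."""
    return {y: sum(f[(y, z)] * g[(y, z)].conjugate() for z in range(P)) / P
            for y in range(M)}


def rank_one_term(g, h):
    """The function <g, h>_{L^2(X|Y)} h on X."""
    c = cond_inner(g, h)
    return {(y, z): c[y] * h[(y, z)] for (y, z) in X}


def A_N(g, f):
    """A_N g = (1/N) sum_{n<N} <g, T^n f>_{L^2(X|Y)} T^n f, as in (11)."""
    out = {x: 0j for x in X}
    for n in range(N):
        term = rank_one_term(g, shift(f, n))
        for x in X:
            out[x] += term[x] / N
    return out


f = {(y, z): complex(y - z, 1) for (y, z) in X}       # arbitrary test data
g = {(y, z): complex(z * z, -y) for (y, z) in X}

conj_AN = compose(A_N(compose(g, T_inv), f), T)         # T A_N T^{-1} g
plain_AN = A_N(g, f)
lhs = {x: conj_AN[x] - plain_AN[x] for x in X}
first, last = rank_one_term(g, f), rank_one_term(g, shift(f, N))
rhs = {x: (last[x] - first[x]) / N for x in X}          # the two boundary terms
print(max(abs(lhs[x] - rhs[x]) for x in X))             # numerically zero
```

Each boundary term has operator norm at most a constant multiple of $1/N$, which is the reason the commutator vanishes in the limit.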

**Proof.** We shall prove this lemma by establishing a sort of conditional singular value decomposition for A. We can normalise A to have uniform conditional Hilbert-Schmidt norm 1. We fix $\varepsilon > 0$, and we will also need an integer $k \ge 1$ and a small quantity $\delta > 0$ depending on $\varepsilon$, to be chosen later.

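We first consider the quantities $|\langle A e, e' \rangle_{L^2(X|Y)}|$, where $\{e\}$ and $\{e'\}$ range over all sub-orthonormal sets of cardinality 1. On the one hand, these quantities are bounded pointwise by 1, thanks to (13). On the other hand, suppose that $(e_1, e'_1)$ and $(e_2, e'_2)$ are pairs of the above form, and let E be the set where $|\langle A e_1, e'_1 \rangle_{L^2(X|Y)}|$ exceeds $|\langle A e_2, e'_2 \rangle_{L^2(X|Y)}|$. Then the join $(e_3, e'_3)$, with $e_3 := 1_E e_1 + 1_{Y \setminus E} e_2$ and $e'_3 := 1_E e'_1 + 1_{Y \setminus E} e'_2$, is also of the above form. By the module property (12), $|\langle A e_3, e'_3 \rangle_{L^2(X|Y)}|$ is the pointwise maximum of the two original quantities. By using a maximising sequence for the quantity $\int_Y |\langle A e, e' \rangle_{L^2(X|Y)}|\, d\nu$ and applying joins repeatedly, we can thus (on taking limits) find a pair $(e_1, e'_1)$ which is near-optimal in the sense that $|\langle A e, e' \rangle_{L^2(X|Y)}| \le |\langle A e_1, e'_1 \rangle_{L^2(X|Y)}| + \delta$ a.e. for all competitors $(e, e')$.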
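The join construction is where the module property (12) enters this step, and it is easy to see in a finite model. The Python sketch below assumes, for illustration only, that Y is finite and each fibre has P points with uniform conditional measure. An operator obeying (12) then acts by a separate matrix on each fibre. The names `apply_op` and `join` are hypothetical. The check confirms that gluing two pairs along the set E produces the pointwise maximum of the two quantities.

```python
P = 3                                             # points per fibre (assumed)
Y = [0, 1, 2]


def apply_op(A, h):
    """A module-linear operator acts fibre by fibre: (A h)(y, .) = A[y] h(y, .)."""
    return {y: [sum(A[y][i][j] * h[y][j] for j in range(P)) for i in range(P)] for y in Y}


def cond_inner(f, g):
    """<f, g>_{L^2(X|Y)}(y) = (1/P) sum_z f(y, z) conj(g(y, z))."""
    return {y: sum(a * b.conjugate() for a, b in zip(f[y], g[y])) / P for y in Y}


def join(pair1, pair2, A):
    """Glue two pairs (e, e') along E = {y : |<A e1, e1'>| > |<A e2, e2'>|}."""
    (e1, e1p), (e2, e2p) = pair1, pair2
    q1 = cond_inner(apply_op(A, e1), e1p)
    q2 = cond_inner(apply_op(A, e2), e2p)
    E = {y for y in Y if abs(q1[y]) > abs(q2[y])}
    e3 = {y: (e1 if y in E else e2)[y] for y in Y}
    e3p = {y: (e1p if y in E else e2p)[y] for y in Y}
    return e3, e3p


s = 3 ** 0.5                                      # makes the vectors below unit in L^2(X|Y)
A = {y: [[y + 1, 0, 0], [0, 1, 0], [0, 0, 2 - y]] for y in Y}
pair1 = ({y: [s, 0, 0] for y in Y}, {y: [s, 0, 0] for y in Y})
pair2 = ({y: [0, 0, s] for y in Y}, {y: [0, 0, s] for y in Y})

e3, e3p = join(pair1, pair2, A)
q3 = cond_inner(apply_op(A, e3), e3p)
print([round(abs(q3[y]), 12) for y in Y])         # [2.0, 2.0, 3.0], the pointwise max
```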

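Now fix $(e_1, e'_1)$, and consider the quantity $|\langle A e_2, e'_2 \rangle_{L^2(X|Y)}|$, where $(e_1, e_2)$ and $(e'_1, e'_2)$ are sub-orthonormal sets. By arguing as before we can find an $(e_2, e'_2)$ which is near-optimal in the sense that $|\langle A e, e' \rangle_{L^2(X|Y)}| \le |\langle A e_2, e'_2 \rangle_{L^2(X|Y)}| + \delta$ a.e. for all competitors $(e, e')$, that is, for all $e, e'$ for which $(e_1, e)$ and $(e'_1, e')$ are sub-orthonormal.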

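We continue in this fashion k times to obtain sub-orthonormal sets $(e_1, \ldots, e_k)$ and $(e'_1, \ldots, e'_k)$ with the property that $|\langle A e, e' \rangle_{L^2(X|Y)}| \le |\langle A e_j, e'_j \rangle_{L^2(X|Y)}| + \delta$ a.e. for each $1 \le j \le k$, whenever $(e_1, \ldots, e_{j-1}, e)$ and $(e'_1, \ldots, e'_{j-1}, e')$ are sub-orthonormal sets. On the other hand, from (13) we know that $\sum_{j=1}^k |\langle A e_j, e'_j \rangle_{L^2(X|Y)}|^2 \le 1$ a.e. From these two facts we soon conclude that $|\langle A e, e' \rangle_{L^2(X|Y)}| \le k^{-1/2} + \delta$ a.e. whenever $(e_1, \ldots, e_k, e)$ and $(e'_1, \ldots, e'_k, e')$ are sub-orthonormal. If $k, \delta$ are chosen appropriately we obtain $|\langle A e, e' \rangle_{L^2(X|Y)}| \le \varepsilon$ a.e. Thus (by duality) A maps the unit ball of the orthogonal complement of the span of $e_1, \ldots, e_k$ to the $\varepsilon$-neighbourhood of the span of $e'_1, \ldots, e'_k$ (with notions such as orthogonality, span, and neighbourhood being defined conditionally of course, using the $L^\infty(Y)$-Hilbert module structure of $L^2(X|Y)$). From this it is not hard to establish the desired precompactness.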
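For concreteness, the concluding parameter choice can be spelled out as follows. The values $k \ge 4/\varepsilon^2$ and $\delta = \varepsilon/2$ are just one admissible choice.

```latex
% At a.e. point of Y the k numbers a_j := |<A e_j, e'_j>_{L^2(X|Y)}| satisfy
% a_1^2 + ... + a_k^2 <= 1, so the smallest of them is at most k^{-1/2}.
% Near-optimality gives |<A e, e'>| <= a_j + delta for every j, hence for the smallest a_j.
\min_{1 \le j \le k} |\langle A e_j, e'_j \rangle_{L^2(X|Y)}| \le k^{-1/2} \quad \text{a.e.}
\;\Longrightarrow\;
|\langle A e, e' \rangle_{L^2(X|Y)}| \le k^{-1/2} + \delta \le \varepsilon \quad \text{a.e.},
\qquad \text{e.g. } k \ge 4/\varepsilon^2,\ \delta := \varepsilon/2 .
```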

[Update, Mar 1: Typo corrected.]

[Update, June 2 2009: Some minor changes in the proof of Lemma 1. Thanks to Jeremy Avigad for corrections.]

## 21 comments

3 March, 2008 at 7:43 am

Lior: I may be confused, but in the proof of Thm. 2 shouldn’t the almost periodic function g have non-zero correlation with f?

3 March, 2008 at 8:49 am

Terence Tao: Oops! You’re right of course; thanks for the correction!

6 March, 2008 at 5:18 pm

g: I take it you’ve noticed the TeX breakage? (Looks like backslashes are getting eaten somewhere.)

8 March, 2008 at 6:48 am

orr: Hello,

After prop. 2, you write “let us see why Proposition 2 implies Proposition 1”,

and you probably meant Theorem 2.

8 March, 2008 at 8:54 am

Terence Tao: Dear orr: thanks for the correction!

4 January, 2009 at 10:46 pm

liuxiaochuan: Dear Professor Tao:

Here are several typos:

1,Just before (2) the “L^2” should be ““.

2,the last sentence of exercise, a “the” repeated twice.

3, In Corollary 1, should “…be such that at least one of $f_1, \ldots, f_k$ has relative mean zero.” be deleted?

4,in (13), the sub of second sum is .

[Corrected, thanks – T.]

20 July, 2011 at 11:10 pm

Marco: Dear Prof Tao,

the Hilbert module L^2(X|Y) is not a Hilbert space, so what kind of weak compactness are you talking about, when you say “To prove Proposition 2, it thus suffices (by weak compactness) to show that …”?

Is it true that bounded subsets in a Hilbert module are somehow relatively weakly compact?

21 July, 2011 at 7:46 am

Terence Tao: One can use the embedding of L^2(X|Y) in L^2(X) and use the weak topology of the latter. (It is not difficult to see that the closed unit ball in L^2(X|Y) is weakly closed in L^2(X).)

3 January, 2013 at 5:47 am

cuttheknot: Dear Professor, at the end of the proof of Lemma 2, how do we infer that ‘A maps the unit ball of the orthogonal complement of the span of $e_1, \ldots, e_k$ to the $\varepsilon$-neighbourhood of the span of $e'_1, \ldots, e'_k$’?

Maybe this is a stupid question, but the issue is that in this Hilbert module we cannot take the usual orthogonal projections:

I mean, given $f$ and $v$ we may not be able to find a $c \in L^\infty(Y)$ such that $\langle f - cv, v \rangle_{L^2(X|Y)} = 0$, am I right?

Thank you for your patience.

3 January, 2013 at 9:41 am

Terence Tao: Well, if one normalises v so that $\|v\|_{L^2(X|Y)}$ only takes the values 0 and 1, then one can do this, by setting $c := \langle f, v \rangle_{L^2(X|Y)}$ as per usual. A sub-orthonormal system can always be normalised in this fashion, dividing each basis element $e_i$ by $\|e_i\|_{L^2(X|Y)}$ when the latter quantity is non-zero, and the theory continues as before. (Basically, a sub-orthonormal system, after normalisation, should be viewed as a family of orthonormal systems in which some basis elements could be zeroed out on certain fibres.)

3 January, 2013 at 11:16 am

cuttheknot: Thanks, indeed if I’m not wrong iff where v’ is the ‘normalized’ v, so everything is fine now.

6 April, 2015 at 6:39 am

Mizar: Dear Professor, in the paragraph after Exercise 6 you state that $A$ is conditionally Hilbert-Schmidt.

I think that $A$ is only required to guarantee that $A_{N_j} g$ converges to $Ag$ weakly in $L^2(X)$ for any $g$ (for some suitable subsequence $N_j$).

If this is the case, how do you deduce the pointwise bound (13) for $A$?

One would need to know something like pointwise, but I fail to see why this is true.

Sorry for this terrible necroposting!

(BTW: in Proposition 3, ‘topology’ is misspelled as ‘technology’, although ‘technology’ is suggestive!)

6 April, 2015 at 7:34 am

Terence Tao: The pointwise bound can be deduced from the assertion that $\big|\int_E \sum_{i,j} c_{i,j} \langle A e_i, e'_j \rangle_{L^2(X|Y)}\, d\nu\big| \le C \nu(E)$ for any finitely supported constants $c_{i,j}$ that square-sum to 1 (this is the weak formulation of the bound), and any measurable $E$ in Y. This in turn already holds for the $A_N$, and one can then take limits using weak convergence.

(This is an instance of a more general rule of thumb, namely that any “convex” condition is likely to be preserved under any reasonable notion of a “weak” limit, basically because of the weak formulation of that condition that is available through convex duality.)

6 June, 2016 at 5:54 am

Michael Schnurr: Correct me if I’m wrong, but I believe you made a typo in (2) where you wrote out the limit for the definition of a conditionally/relatively weakly mixing function $f$. Since $\langle T^n f, f \rangle_{L^2(X|Y)}$ will in particular be a function of $Y$, then shouldn’t the integral be over $Y$ and with respect to $\nu$?

I wanted to be certain, because I’m currently writing up a result about weak mixing extensions, using the definition you use here, and I want to be absolutely certain I’m using the correct definition.

6 June, 2016 at 6:50 am

Michael Schnurr: After thinking about it some more, I think it might not be a typo, but, well… I’m viewing my systems as lying in a set Z = X \times Y, and then my systems are extensions of systems on X, through the natural projection. So while in general it might make sense to be integrating over Z (in my notation), should I not be integrating over X?

6 June, 2016 at 1:48 pm

Terence Tao: In this post I am identifying $L^\infty(Y)$ with a subspace of $L^\infty(X)$, so either integral would be appropriate here.