In the previous lecture, we studied the recurrence properties of compact systems, which are systems in which all measurable functions exhibit almost periodicity – they almost return completely to themselves after repeated shifting. Now, we consider the opposite extreme of *mixing systems* – those in which all measurable functions (of mean zero) exhibit *mixing* – they become orthogonal to themselves after repeated shifting. (Actually, there are two different types of mixing, *strong mixing* and *weak mixing*, depending on whether the orthogonality occurs individually or on the average; it is the latter concept which is of more importance to the task of establishing the Furstenberg recurrence theorem.)

We shall see that for weakly mixing systems, averages such as $\frac{1}{N} \sum_{n=0}^{N-1} T^n f \ldots T^{(k-1)n} f$ can be computed very explicitly (in fact, this average converges to the constant $(\int_X f\ d\mu)^{k-1}$). More generally, we shall see that weakly mixing components of a system tend to average themselves out and thus become irrelevant when studying many types of ergodic averages. Our main tool here will be the humble Cauchy-Schwarz inequality, and in particular a certain consequence of it, known as the *van der Corput lemma*.

As one application of this theory, we will be able to establish Roth’s theorem (the k=3 case of Szemerédi’s theorem).

— Mixing functions —

Much as compact systems were characterised by their abundance of almost periodic functions, we will characterise mixing systems by their abundance of mixing functions (this is not standard terminology). To define and motivate this concept, it will be convenient to introduce a weak notion of convergence (this notation is also not standard):

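Definition 1. (Cesàro convergence) A sequence $c_1, c_2, \ldots$ in a normed vector space is said to *converge in the Cesàro sense* to a limit $c$ if the averages $\frac{1}{N} \sum_{n=1}^N c_n$ converge strongly to $c$, in which case we write $C\!-\!\lim_{n \to \infty} c_n = c$. We also write $C\!-\!\sup_{n \to \infty} c_n := \limsup_{N \to \infty} \| \frac{1}{N} \sum_{n=1}^N c_n \|$ (thus $C\!-\!\lim_{n \to \infty} c_n = 0$ if and only if $C\!-\!\sup_{n \to \infty} c_n = 0$).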

**Example 1.** The sequence $0, 1, 0, 1, \ldots$ has a Cesàro limit of 1/2.
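As a quick numerical illustration of Cesàro convergence, here is a minimal Python sketch that computes running averages. It assumes the example sequence is the alternating sequence 0, 1, 0, 1, ...; the helper name `cesaro_averages` is an illustrative choice and not notation from these notes.

```python
def cesaro_averages(seq):
    """Return the running averages (c_1 + ... + c_N) / N for N = 1, 2, ..."""
    averages = []
    total = 0.0
    for count, value in enumerate(seq, start=1):
        total += value
        averages.append(total / count)
    return averages

# Assumed example: the alternating sequence 0, 1, 0, 1, ...
alternating = [n % 2 for n in range(10_000)]
averages = cesaro_averages(alternating)

# The sequence itself does not converge, but its averages settle near 1/2.
print(averages[8], averages[98], averages[-1])
```

The individual terms keep oscillating, while the averages differ from 1/2 by at most 1/(2N) after N terms.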

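**Exercise 1.** Let $c_n$ be a bounded sequence of *non-negative* numbers. Show that the following three statements are equivalent:

1. $C\!-\!\lim_{n \to \infty} c_n = 0$.
2. $C\!-\!\lim_{n \to \infty} |c_n|^2 = 0$.
3. $c_n$ converges to zero in density. [We say $c_n$ *converges in density* to $c$ if for any $\varepsilon > 0$, the set $\{ n \in \mathbb{N}: |c_n - c| > \varepsilon \}$ has upper density zero.]

Which of the implications between 1, 2, 3 remain valid if $c_n$ is not bounded?

Let $(X, \mathcal{X}, \mu, T)$ be a measure-preserving system, and let $f \in L^2(X, \mathcal{X}, \mu)$ be a function. We consider the *correlation coefficients* $\langle T^n f, f \rangle := \int_X T^n f \overline{f}\ d\mu$ as n goes to infinity. Note that we have the symmetry $\langle T^{-n} f, f \rangle = \overline{\langle T^n f, f \rangle}$, so we only need to consider the case when n is positive. The mean ergodic theorem (Corollary 2 from Lecture 8) tells us the Cesàro behaviour of these coefficients. Indeed, we have

$C\!-\!\lim_{n \to \infty} \langle T^n f, f \rangle = \| \mathbb{E}(f | \mathcal{X}^T) \|_{L^2}^2$ (1)

where $\mathcal{X}^T$ is the $\sigma$-algebra of essentially shift-invariant sets. In particular, if the system is ergodic, and f has mean zero (i.e. $\int_X f\ d\mu = 0$), then we have

$C\!-\!\lim_{n \to \infty} \langle T^n f, f \rangle = 0$, (2)

thus the correlation coefficients go to zero in the Cesàro sense. However, this does not necessarily imply that these coefficients go to zero pointwise. For instance, consider a circle shift system $(\mathbb{R}/\mathbb{Z}, x \mapsto x + \alpha)$ with $\alpha$ irrational (and with uniform measure), thus this system is ergodic by Exercise 5 from Lecture 9. Then the function $f(x) := e^{2\pi i x}$ has mean zero, but one easily computes that $\langle T^n f, f \rangle = e^{-2\pi i n \alpha}$ (using the convention $T^n f := f \circ T^{-n}$). The coefficients converge in the Cesàro sense to zero, but have magnitude 1 and thus do not converge to zero pointwise.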
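The circle shift example can be checked numerically. The Python sketch below assumes the function $f(x) = e^{2\pi i x}$ and the convention $T^n f(x) = f(x - n\alpha)$, so that the correlation coefficient is $e^{-2\pi i n \alpha}$; the value $\alpha = \sqrt{2}$ and the function names are illustrative choices.

```python
import cmath
import math

ALPHA = math.sqrt(2)  # an irrational rotation number (illustrative choice)

def correlation(n, alpha=ALPHA):
    """<T^n f, f> for f(x) = exp(2 pi i x), assuming T^n f(x) = f(x - n * alpha)."""
    return cmath.exp(-2j * math.pi * n * alpha)

def cesaro_mean(N, alpha=ALPHA):
    """Average of the first N correlation coefficients."""
    return sum(correlation(n, alpha) for n in range(1, N + 1)) / N

for N in (10, 100, 1000, 10_000):
    # Every coefficient has modulus 1, yet the averages shrink like O(1/N).
    print(N, abs(correlation(N)), abs(cesaro_mean(N)))
```

Summing the geometric series shows the averages are bounded by $1/(N |\sin(\pi \alpha)|)$, which is the quantitative form of the Cesàro decay seen here.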

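Definition 2. (Mixing) Let $(X, \mathcal{X}, \mu, T)$ be a measure-preserving system. A function $f \in L^2(X, \mathcal{X}, \mu)$ is *strongly mixing* if $\lim_{n \to \infty} \langle T^n f, f \rangle = 0$, and *weakly mixing* if $C\!-\!\lim_{n \to \infty} |\langle T^n f, f \rangle| = 0$.

**Remark 1.** Clearly strong mixing implies weak mixing. From (1) we also see that if f is weakly mixing, then $\mathbb{E}(f | \mathcal{X}^T)$ must vanish a.e.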

**Exercise 2.** Show that if f is both almost periodic and weakly mixing, then it must be 0 almost everywhere. In particular, in a compact system, the only weakly mixing function is 0 (up to a.e. equivalence).

**Exercise 3.** In any Bernoulli system with the product -algebra and a product measure, and the standard shift, show that any function of mean zero is strongly mixing. (*Hint*: first do this for functions that depend on only finitely many of the variables.)

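**Exercise 4.** Consider a skew shift system $X = (\mathbb{R}/\mathbb{Z})^2$, $T(x,y) := (x + \alpha, y + x)$ with the usual Lebesgue measure and Borel $\sigma$-algebra, and with $\alpha$ irrational. Show that the function $f(x,y) := e^{2\pi i x}$ is neither strongly mixing nor weakly mixing, but that the function $g(x,y) := e^{2\pi i y}$ is both strongly mixing and weakly mixing.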
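Before attempting the exercise, one can sanity-check the expected behaviour numerically. The sketch below assumes the skew shift is $(x,y) \mapsto (x+\alpha, y+x)$ on the 2-torus and that the two test functions are $e^{2\pi i x}$ and $e^{2\pi i y}$; it composes with $T^n$ rather than $T^{-n}$, which does not affect the magnitudes. The integral is replaced by an average over a grid, which is exact up to rounding for these trigonometric monomials when $0 < n <$ `grid`. This is an experiment, not a proof.

```python
import cmath
import math

ALPHA = math.sqrt(2) - 1  # irrational rotation number (illustrative choice)

def skew_shift(x, y, alpha=ALPHA):
    """One step of the assumed skew shift (x, y) -> (x + alpha, y + x) mod 1."""
    return (x + alpha) % 1.0, (y + x) % 1.0

def e(t):
    return cmath.exp(2j * math.pi * t)

def f(x, y):
    return e(x)

def g(x, y):
    return e(y)

def correlation(func, n, grid=32):
    """Grid average of (func o T^n) * conj(func) over the torus."""
    total = 0j
    for i in range(grid):
        for j in range(grid):
            x0, y0 = i / grid, j / grid
            x, y = x0, y0
            for _ in range(n):
                x, y = skew_shift(x, y)
            total += func(x, y) * func(x0, y0).conjugate()
    return total / grid**2

for n in range(1, 6):
    print(n, abs(correlation(f, n)), abs(correlation(g, n)))
```

The first column of magnitudes stays at 1 (no mixing), while the second is zero up to rounding error for every non-zero shift.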

**Exercise 5.** Let be given the product Borel -algebra and the shift . For each , let be the probability distribution in X of the random sequence given by the rule

, (3)

where the are iid standard complex Gaussians (thus each w has probability distribution ). Show that each is shift invariant. If is a vague limit point of the sequence , and is the function defined as , show that f is weakly mixing but not strongly mixing (and more specifically, that stays bounded away from zero) with respect to the system .

**Remark 2. ** Exercise 5 illustrates an important point, namely that stationary processes yield a rich source of measure-preserving systems (indeed the two notions are almost equivalent in some sense, especially after one distinguishes a specific function f on the measure-preserving system). However, we will not adopt this more probabilistic perspective to ergodic theory here.

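**Remark 3.** We briefly discuss the finitary analogue of the weak mixing concept in the context of functions $f: \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}$ on a large cyclic group with the usual shift $x \mapsto x+1$. Then one can compute

$C\!-\!\lim_{n \to \infty} |\langle T^n f, f \rangle|^2 = \sum_{\xi \in \mathbb{Z}/N\mathbb{Z}} |\hat f(\xi)|^4$ (4)

where $\hat f(\xi) := \frac{1}{N} \sum_{x \in \mathbb{Z}/N\mathbb{Z}} f(x) e^{-2\pi i x \xi/N}$ are the Fourier coefficients of f (and the inner product is taken with respect to normalised counting measure). Comparing this against the Plancherel identity $\|f\|_{L^2}^2 = \sum_{\xi \in \mathbb{Z}/N\mathbb{Z}} |\hat f(\xi)|^2$ we thus see that a function f bounded in $L^2$ norm should be considered “weakly mixing” if it has no large Fourier coefficients. Contrast this with Remark 7 from Lecture 11.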
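The identity in this remark is easy to test by brute force. The following Python sketch assumes normalised counting measure on $\mathbb{Z}/N\mathbb{Z}$, the shift convention $T^n f(x) = f(x-n)$, and Fourier coefficients $\hat f(\xi) = \frac{1}{N} \sum_x f(x) e^{-2\pi i x \xi / N}$; since the correlations are periodic in n, the Cesàro limit is computed as an average over one full period. All function names are illustrative.

```python
import cmath
import random

def fourier_coefficients(f):
    """Normalised discrete Fourier coefficients of f on Z/NZ."""
    N = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / N) for x in range(N)) / N
            for xi in range(N)]

def correlation(f, n):
    """<T^n f, f> with respect to normalised counting measure, T^n f(x) = f(x - n)."""
    N = len(f)
    return sum(f[(x - n) % N] * f[x].conjugate() for x in range(N)) / N

def mean_square_correlation(f):
    """Average of |<T^n f, f>|^2 over one full period n = 0, ..., N-1."""
    N = len(f)
    return sum(abs(correlation(f, n)) ** 2 for n in range(N)) / N

def fourth_power_sum(f):
    """Sum over all frequencies of |f^(xi)|^4."""
    return sum(abs(c) ** 4 for c in fourier_coefficients(f))

rng = random.Random(0)
f = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(16)]
print(mean_square_correlation(f), fourth_power_sum(f))
```

Plugging in the constant function 1 gives the value 1 on both sides, which is the kind of simple check for powers of N suggested in the comments below.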

Now let us see some consequences of the weak mixing property. We need the following lemma, which gives a useful criterion as to whether a sequence of bounded vectors in a Hilbert space converges in the Cesàro sense to zero.

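Lemma 1 (van der Corput lemma). Let $v_1, v_2, \ldots$ be a bounded sequence of vectors in a Hilbert space H. If

$C\!-\!\lim_{h \to \infty} C\!-\!\sup_{n \to \infty} \langle v_n, v_{n+h} \rangle = 0$ (5)

then $C\!-\!\lim_{n \to \infty} v_n = 0$.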

Informally, this lemma asserts that if each vector in a bounded sequence tends to be orthogonal to nearby elements in that sequence, then the vectors will converge to zero in the Cesàro sense. This formulation of the lemma is essentially the version in this paper by Bergelson, except that we have made the minor change of replacing one of the Cesàro limits with a Cesàro supremum.

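**Proof.** We can normalise so that $\|v_n\| \leq 1$ for all n. In particular, we have $v_n = O(1)$, where O(1) denotes a vector of bounded magnitude. For any $h \geq 0$ and $N \geq 1$, we thus have the telescoping identity

$\frac{1}{N} \sum_{n=1}^{N} v_{n+h} = \frac{1}{N} \sum_{n=1}^{N} v_n + O( h/N )$; (6)

averaging this over all h from 0 to H-1 for some $H \geq 1$, we obtain

$\frac{1}{N} \sum_{n=1}^{N} \frac{1}{H} \sum_{h=0}^{H-1} v_{n+h} = \frac{1}{N} \sum_{n=1}^{N} v_n + O( H/N )$; (7)

by the triangle inequality we thus have

$\| \frac{1}{N} \sum_{n=1}^{N} v_n \| \leq \frac{1}{N} \sum_{n=1}^{N} \| \frac{1}{H} \sum_{h=0}^{H-1} v_{n+h} \| + O( H/N )$ (8)

where the O() terms are now scalars rather than vectors. We square this (using the crude inequality $(a+b)^2 \leq 2a^2 + 2b^2$) and apply Cauchy-Schwarz to obtain

$\| \frac{1}{N} \sum_{n=1}^{N} v_n \|^2 \leq O( \frac{1}{N} \sum_{n=1}^{N} \| \frac{1}{H} \sum_{h=0}^{H-1} v_{n+h} \|^2 ) + O( H^2/N^2 )$ (9)

which we rearrange as

$\| \frac{1}{N} \sum_{n=1}^{N} v_n \|^2 \leq O( | \frac{1}{H^2} \sum_{0 \leq h, h' < H} \frac{1}{N} \sum_{n=1}^{N} \langle v_{n+h}, v_{n+h'} \rangle | ) + O( H^2/N^2 )$. (10)

We take limits as $N \to \infty$ (keeping H fixed for now) to conclude

$( C\!-\!\sup_{n \to \infty} v_n )^2 \leq O( \frac{1}{H^2} \sum_{0 \leq h, h' < H} C\!-\!\sup_{n \to \infty} \langle v_{n+h}, v_{n+h'} \rangle )$. (11)

Another telescoping argument (and symmetry) gives us

$C\!-\!\sup_{n \to \infty} \langle v_{n+h}, v_{n+h'} \rangle = C\!-\!\sup_{n \to \infty} \langle v_{n+|h-h'|}, v_n \rangle$ (12)

and so, since each value of $|h-h'|$ arises from at most 2H pairs $(h,h')$,

$( C\!-\!\sup_{n \to \infty} v_n )^2 \leq O( \frac{1}{H} \sum_{h=0}^{H-1} C\!-\!\sup_{n \to \infty} \langle v_{n+h}, v_n \rangle )$. (13)

Taking limits as $H \to \infty$ and using (5) we obtain the claim.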

**Exercise 6.** Let $P: \mathbb{Z} \to \mathbb{R}/\mathbb{Z}$ be a polynomial with at least one irrational non-constant coefficient. Using Lemma 1 (in the scalar case $H = \mathbb{C}$) and an induction on degree, show that $C\!-\!\lim_{n \to \infty} e^{2\pi i P(n)} = 0$. Conclude that the sequence $(P(n))_{n \in \mathbb{N}}$ is uniformly distributed with respect to uniform measure (see Lecture 9 for a definition of uniform distribution).
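The conclusion of this exercise can be observed numerically before proving it. The Python sketch below assumes the statement concerns the averages of $e^{2\pi i P(n)}$ and takes the illustrative polynomial $P(n) = \sqrt{2}\, n^2$; it is a floating-point experiment (the phase is reduced mod 1 before exponentiating to limit rounding error), not a substitute for the van der Corput argument.

```python
import cmath
import math

def exponential_average(poly, N):
    """Compute (1/N) * sum_{n=1}^{N} exp(2 pi i P(n))."""
    total = 0j
    for n in range(1, N + 1):
        phase = poly(n) % 1.0  # only the fractional part of P(n) matters
        total += cmath.exp(2j * math.pi * phase)
    return total / N

def quadratic(n):
    """Illustrative polynomial with irrational leading coefficient."""
    return math.sqrt(2) * n * n

for N in (100, 1000, 10_000):
    print(N, abs(exponential_average(quadratic, N)))
```

The differenced sequence $P(n+h) - P(n)$ is linear in n with irrational slope for each non-zero h, which is the base case that the induction on degree reduces to.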

**Exercise 7.** Using Exercise 6, give another proof of Theorem 1 from Lecture 6.

We now apply the van der Corput lemma to weakly mixing functions.

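Corollary 1. Let $(X, \mathcal{X}, \mu, T)$ be a measure-preserving system, and let $f \in L^2(X, \mathcal{X}, \mu)$ be weakly mixing. Then for any $g \in L^2(X, \mathcal{X}, \mu)$ we have $C\!-\!\lim_{n \to \infty} |\langle T^n f, g \rangle| = 0$ and $C\!-\!\lim_{n \to \infty} |\langle f, T^n g \rangle| = 0$.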

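**Proof.** We just prove the first claim, as the second claim is similar. By Exercise 1, it suffices to show that

$\frac{1}{N} \sum_{n=1}^{N} |\langle T^n f, g \rangle|^2 \to 0$ (14)

as $N \to \infty$. The left-hand side can be rewritten as

$\langle \frac{1}{N} \sum_{n=1}^{N} \overline{\langle T^n f, g \rangle} T^n f, g \rangle$ (15)

so by Cauchy-Schwarz it suffices to show that

$C\!-\!\lim_{n \to \infty} \overline{\langle T^n f, g \rangle} T^n f = 0$. (16)

Applying the van der Corput lemma and discarding the bounded coefficients $\overline{\langle T^n f, g \rangle}$, it suffices to show that

$C\!-\!\lim_{h \to \infty} C\!-\!\sup_{n \to \infty} |\langle T^{n+h} f, T^n f \rangle| = 0$. (17)

But $\langle T^{n+h} f, T^n f \rangle = \langle T^h f, f \rangle$, and the claim now follows from the weakly mixing nature of f.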

— Weakly mixing systems —

Now we consider systems which are full of mixing functions.

Definition 3. (Mixing systems) A measure-preserving system $(X, \mathcal{X}, \mu, T)$ is *weakly mixing* (resp. *strongly mixing*) if every function $f \in L^2(X, \mathcal{X}, \mu)$ with mean zero is weakly mixing (resp. strongly mixing).

**Example 2.** From Exercise 2, we know that any system with a non-trivial Kronecker factor is not weakly mixing (and thus not strongly mixing). On the other hand, from Exercise 3, we know that any Bernoulli system is strongly mixing (and thus weakly mixing also). From Remark 1 we see that any strongly or weakly mixing system must be ergodic.

**Exercise 8. ** Show that the system in Exercise 5 is weakly mixing but not strongly mixing.

Here is another characterisation of weak mixing:

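**Exercise 9.** Let $(X, \mathcal{X}, \mu, T)$ be a measure-preserving system. Show that the following are equivalent:

1. $(X, \mathcal{X}, \mu, T)$ is weakly mixing.
2. For every $f, g \in L^2(X, \mathcal{X}, \mu)$, $\langle T^n f, g \rangle$ converges in density to $(\int_X f\ d\mu) (\overline{\int_X g\ d\mu})$. (See Exercise 1 for a definition of convergence in density.)
3. For any measurable $E, F \in \mathcal{X}$, $\mu(T^n E \cap F)$ converges in density to $\mu(E) \mu(F)$.
4. The product system $(X \times X, \mathcal{X} \times \mathcal{X}, \mu \times \mu, T \times T)$ is ergodic.

[Hints: To equate 1 and 2, use the decomposition of a function into its mean and mean-free components. To equate 2 and 4, use the fact that the space $L^2(X \times X, \mathcal{X} \times \mathcal{X}, \mu \times \mu)$ is spanned (in the topological vector space sense) by tensor products $f \otimes g$ with $f, g \in L^2(X, \mathcal{X}, \mu)$.]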

**Exercise 10. **Show that the equivalences between 1, 2, 3 in Exercise 9 remain if “weak mixing” and “converges in density” are replaced by “strong mixing” and “converges” respectively.

**Exercise 11.** Let be any minimal topological system with Borel -algebra , and let be a shift invariant Borel probability measure. Show that if is weakly mixing (resp. strongly mixing), then is topologically weakly mixing (resp. topologically mixing), as defined in Definition 3 and Exercise 12 of Lecture 7.

**Exercise 12.** If $(X, \mathcal{X}, \mu, T)$ is weakly mixing, show that $(X, \mathcal{X}, \mu, T^n)$ is weakly mixing for any non-zero n.

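**Exercise 13.** Let $(X, \mathcal{X}, \mu, T)$ be a measure preserving system. Show that the following are equivalent:

1. $(X, \mathcal{X}, \mu, T)$ is weakly mixing.
2. Whenever $(Y, \mathcal{Y}, \nu, S)$ is ergodic, the product system $(X \times Y, \mathcal{X} \times \mathcal{Y}, \mu \times \nu, T \times S)$ is ergodic.

(Hint: To obtain 1 from 2, use Exercise 9. To obtain 2 from 1, repeat the *methods* used to prove Exercise 9.)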

**Exercise 14. **Show that the product of two weakly mixing systems is again weakly mixing. (Hint: use Exercises 9 and 13.)

Now we come to an important type of observation for the purposes of establishing the Furstenberg recurrence theorem: in weakly mixing systems, functions of mean zero are negligible as far as multiple averages are concerned.

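Proposition 1. Let $a_1, \ldots, a_k \in \mathbb{Z}$ be distinct non-zero integers for some $k \geq 1$. Let $(X, \mathcal{X}, \mu, T)$ be weakly mixing, and let $f_1, \ldots, f_k \in L^\infty(X, \mathcal{X}, \mu)$ be such that at least one of $f_1, \ldots, f_k$ has mean zero. Then we have

$C\!-\!\lim_{n \to \infty} T^{a_1 n} f_1 \ldots T^{a_k n} f_k = 0$ (18)

in $L^2(X, \mathcal{X}, \mu)$.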

**Proof.** We induct on k. When k=1 the claim follows from the mean ergodic theorem and Exercise 12 (recall from Example 2 that all weakly mixing systems are ergodic).

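Now let $k \geq 2$ and suppose that the claim has already been proven for k-1. Without loss of generality we may assume that it is $f_k$ which has mean zero. Applying the van der Corput lemma (Lemma 1), it suffices to show that

$C\!-\!\sup_{n \to \infty} \langle T^{a_1 (n+h)} f_1 \ldots T^{a_k (n+h)} f_k, T^{a_1 n} f_1 \ldots T^{a_k n} f_k \rangle$ (19)

converges in density to zero as $h \to \infty$. But the left-hand side can be rearranged as

$C\!-\!\sup_{n \to \infty} \int_X f_{1,h}\, T^{(a_2 - a_1) n} f_{2,h} \ldots T^{(a_k - a_1) n} f_{k,h}\ d\mu$ (20)

where $f_{j,h} := T^{a_j h} f_j \overline{f_j}$. Applying Cauchy-Schwarz, it suffices to show that

$C\!-\!\sup_{n \to \infty} T^{(a_2 - a_1) n} f_{2,h} \ldots T^{(a_k - a_1) n} f_{k,h}$ (21)

(with the norm taken in $L^2(X, \mathcal{X}, \mu)$) converges in density to zero as $h \to \infty$.

Since $(X, \mathcal{X}, \mu, T)$ is weakly mixing, the mean-zero function $f_k$ is weakly mixing, and so the mean of $f_{k,h}$ goes to zero in density as $h \to \infty$. As all functions are assumed to be bounded, we can thus subtract the mean from $f_{k,h}$ in (21) without affecting the desired conclusion, leaving behind the mean-zero component $f_{k,h} - \int_X f_{k,h}\ d\mu$. But then the contribution of this expression to (21) vanishes by the induction hypothesis.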

**Remark 4. ** The key point here was that functions f of mean zero were weakly mixing and thus had the property that almost had mean zero, and were thus almost weakly mixing. One could iterate this further to investigate the behaviour of “higher derivatives” of f such as . Pursuing this analysis further leads to the Gowers-Host-Kra seminorms, which are closely related to the Gowers uniformity norms in additive combinatorics.

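Corollary 2. Let $a_1, \ldots, a_k \in \mathbb{Z}$ be distinct integers for some $k \geq 1$, let $(X, \mathcal{X}, \mu, T)$ be a weakly mixing system, and let $f_1, \ldots, f_k \in L^\infty(X, \mathcal{X}, \mu)$. Then $\int_X T^{a_1 n} f_1 \ldots T^{a_k n} f_k\ d\mu$ converges in the Cesàro sense to $(\int_X f_1\ d\mu) \ldots (\int_X f_k\ d\mu)$.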

Note in particular that this establishes the Furstenberg recurrence theorem (Theorem 1 from Lecture 11) in the case of weakly mixing systems.

**Proof.** We again induct on k. The k=1 case is trivial, so suppose $k \geq 2$ and the claim has already been proven for k-1. If any of the functions $f_j$ is constant then the claim follows from the induction hypothesis, so we may subtract off the mean from each function and suppose that all functions have mean zero. By shift-invariance we may also fix (say) $a_k$ to be zero. The claim now follows from Proposition 1 and Cauchy-Schwarz.

**Exercise 15.** Show that the Cesàro convergence in Corollary 2 can be strengthened to convergence in density. (Hint: first reduce to the mean zero case, then apply Exercise 14 to work with the product system instead.)

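**Exercise 16.** Let $(X, \mathcal{X}, \mu, T)$ be a weakly mixing system, and let $f \in L^\infty(X, \mathcal{X}, \mu)$ have mean zero. Show that $T^{n^2} f$ converges in the Cesàro sense in $L^2(X, \mathcal{X}, \mu)$ to zero. (Hint: use van der Corput and Proposition 1 or Corollary 2.)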

**Exercise 17.** Show that Corollary 2 continues to hold if the linear polynomials are replaced by arbitrary polynomials from the integers to the integers, so long as the difference between any two of these polynomials is non-constant. (Hint: you will need the “PET induction” machinery from Exercise 3 of Lecture 5. This result was first established by Bergelson.)

— Hilbert-Schmidt operators —

We have now established the Furstenberg recurrence theorem for two distinct types of systems: compact systems and weakly mixing systems. From Example 2 we know that these systems are indeed quite distinct from each other. Here is another indication of “distinctness”:

**Exercise 18. ** In any measure-preserving system , show that almost periodic functions and weakly mixing functions are always orthogonal to each other.

On the other hand, there are certainly systems which are neither weakly mixing nor compact (e.g. the skew shift). But we have the following important dichotomy (cf. Theorem 3 from Lecture 7):

Theorem 1. Suppose that $(X, \mathcal{X}, \mu, T)$ is a measure-preserving system. Then exactly one of the following statements is true:

1. (Structure) $(X, \mathcal{X}, \mu, T)$ has a non-trivial compact factor.
2. (Randomness) $(X, \mathcal{X}, \mu, T)$ is weakly mixing.

[Note: in ergodic theory, a *factor* of a measure-preserving system is simply a morphism from that system to some other measure-preserving system. Unlike the case with topological dynamics, we do not need to assume surjectivity of the morphism, since in the measure-theoretic setting, the image of a morphism always has full measure.]

In Example 2 we have already shown that 1 and 2 cannot be both true; the tricky part is to show that lack of weak mixing implies a non-trivial compact factor.

In order to prove this result, we recall some standard results about Hilbert-Schmidt operators on a separable Hilbert space. (As usual, the hypothesis of separability is not absolutely essential, but is convenient to assume throughout; for instance, it assures that orthonormal bases always exist and are at most countable.) We begin by recalling the notion of a tensor product of two Hilbert spaces:

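Proposition 2. Let $H, H'$ be two separable Hilbert spaces. Then there exists another separable Hilbert space $H \otimes H'$ and a bilinear tensor product map $\otimes: H \times H' \to H \otimes H'$ such that

$\langle v \otimes v', w \otimes w' \rangle_{H \otimes H'} = \langle v, w \rangle_H \langle v', w' \rangle_{H'}$ (22)

for all $v, w \in H$ and $v', w' \in H'$. Furthermore, the tensor products $(e_n \otimes e'_m)_{n \in A, m \in B}$ between any orthonormal bases $(e_n)_{n \in A}$, $(e'_m)_{m \in B}$ of H and H’ respectively, form an orthonormal basis of $H \otimes H'$.

It is easy to see that $H \otimes H'$ is unique up to isomorphism, and so we shall abuse notation slightly and refer to $H \otimes H'$ as *the* tensor product of H and H’.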

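**Example 3.** The tensor product of $L^2(X, \mathcal{X}, \mu)$ and $L^2(Y, \mathcal{Y}, \nu)$ is $L^2(X \times Y, \mathcal{X} \times \mathcal{Y}, \mu \times \nu)$, with the tensor product operation $f \otimes g(x,y) := f(x) g(y)$. The tensor product of $\mathbb{C}^n$ and $\mathbb{C}^m$ is $\mathbb{C}^{nm}$, which can be thought of as the Hilbert space of $n \times m$ (or $m \times n$) matrices, with the inner product $\langle A, B \rangle := \mathrm{tr}(A B^\dagger)$.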

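**Proof.** Take any orthonormal bases $(e_n)_{n \in A}$ and $(e'_m)_{m \in B}$ of H and H’ respectively, and let $H \otimes H'$ be the Hilbert space generated by declaring the formal quantities $e_n \otimes e'_m$ to be an orthonormal basis. If one then defines

$(\sum_{n \in A} c_n e_n) \otimes (\sum_{m \in B} d_m e'_m) := \sum_{n \in A} \sum_{m \in B} c_n d_m\, e_n \otimes e'_m$ (23)

for all square-summable sequences $c_n$ and $d_m$, one easily verifies that $\otimes$ is indeed a bilinear map that obeys (22). In particular, if $(f_n)_{n \in A}$ and $(f'_m)_{m \in B}$ are some other orthonormal bases of $H, H'$ respectively, then from (22) $(f_n \otimes f'_m)_{n \in A, m \in B}$ is an orthonormal set, and one can approximate any element $e_n \otimes e'_m$ in the original orthonormal basis to arbitrary accuracy by linear combinations from this orthonormal set, and so this set is in fact an orthonormal basis as required.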
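In finite dimensions the construction in this proof is just the flattened outer product, and the multiplicative identity $\langle v \otimes v', w \otimes w' \rangle = \langle v, w \rangle \langle v', w' \rangle$ (which is what (22) is assumed to express) can be checked directly. The Python sketch below uses plain lists of complex numbers; the helper names `inner` and `tensor` are illustrative choices.

```python
def inner(v, w):
    """Standard inner product on C^n, linear in v and antilinear in w."""
    return sum(a * complex(b).conjugate() for a, b in zip(v, w))

def tensor(v, vp):
    """Coordinates of v (x) v' in the basis e_n (x) e'_m, i.e. the flattened outer product."""
    return [a * b for a in v for b in vp]

v, w = [1 + 2j, 3], [2, 1j]
vp, wp = [0.5j, -1, 2], [1, 1j, 0]

lhs = inner(tensor(v, vp), tensor(w, wp))
rhs = inner(v, w) * inner(vp, wp)
print(lhs, rhs)  # the two values agree up to rounding
```

Applying `tensor` to pairs of standard basis vectors produces the standard basis of the flattened space, matching the "Furthermore" clause of the proposition in this model case.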

Given a Hilbert space H, define its *complex conjugate* $\overline{H}$ to be the same set as H, but with the conjugated scalar multiplication structure $z, v \mapsto \overline{z} v$ and the conjugated inner product $\langle v, w \rangle_{\overline{H}} := \overline{\langle v, w \rangle_H}$, but with all other structures unchanged. This is also a Hilbert space. (Of course, for real Hilbert spaces rather than complex, the notion of complex conjugation is trivial.)

**Example 4.** The conjugation map $f \mapsto \overline{f}$ is a Hilbert space isometry between the Hilbert space $L^2(X, \mathcal{X}, \mu)$ and its complex conjugate.

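Every element $K \in \overline{H} \otimes H'$ induces a bounded linear operator $T_K: H \to H'$, defined via duality by the formula

$\langle T_K v, v' \rangle_{H'} := \langle K, v \otimes v' \rangle_{\overline{H} \otimes H'}$ (24)

for all $v \in H$, $v' \in H'$. We refer to K as the *kernel* of $T_K$. Any operator $T: H \to H'$ that arises in this manner is called a *Hilbert-Schmidt operator* from H to H’. The Hilbert space structure on the space $\overline{H} \otimes H'$ of kernels induces an analogous Hilbert space structure on the Hilbert-Schmidt operators, leading to the Hilbert-Schmidt norm $\|T\|_{HS}$ and inner product $\langle S, T \rangle_{HS}$ for such operators. Here are some other characterisations of this concept: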

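**Exercise 19.** Let $H, H'$ be Hilbert spaces with orthonormal bases $(e_a)_{a \in A}$ and $(f_b)_{b \in B}$ respectively, and let $T: H \to H'$ be a bounded linear operator. Show that the following are equivalent:

1. T is a Hilbert-Schmidt operator.
2. $\sum_{a \in A} \|T e_a\|_{H'}^2 < \infty$.
3. $\sum_{a \in A} \sum_{b \in B} |\langle T e_a, f_b \rangle_{H'}|^2 < \infty$.

Also, show that if $S, T: H \to H'$ are Hilbert-Schmidt operators, then

$\langle S, T \rangle_{HS} = \sum_{a \in A} \langle S e_a, T e_a \rangle_{H'}$ (25)

and

$\|T\|_{HS}^2 = \sum_{a \in A} \|T e_a\|_{H'}^2 = \sum_{a \in A} \sum_{b \in B} |\langle T e_a, f_b \rangle_{H'}|^2$. (26)

As one consequence of the above exercise, we see that the Hilbert-Schmidt norm controls the operator norm, thus $\|T v\|_{H'} \leq \|T\|_{HS} \|v\|_H$ for all vectors v.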
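For matrices, the Hilbert-Schmidt norm is the square root of the sum of squared moduli of the entries, and the bound $\|Tv\| \leq \|T\|_{HS} \|v\|$ can be probed by random sampling. The sketch below is a finite-dimensional experiment under that assumption; the function names and the test matrix are illustrative.

```python
import math
import random

def hilbert_schmidt_norm(matrix):
    """Square root of the sum of |entry|^2 over all matrix entries."""
    return math.sqrt(sum(abs(entry) ** 2 for row in matrix for entry in row))

def apply(matrix, vector):
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

def norm(vector):
    return math.sqrt(sum(abs(v) ** 2 for v in vector))

def worst_ratio(matrix, trials=1000, seed=0):
    """Largest observed ||T v|| / ||v|| over random complex Gaussian vectors v."""
    rng = random.Random(seed)
    columns = len(matrix[0])
    worst = 0.0
    for _ in range(trials):
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(columns)]
        worst = max(worst, norm(apply(matrix, v)) / norm(v))
    return worst

matrix = [[1, 2], [3, 4j]]
print(worst_ratio(matrix), hilbert_schmidt_norm(matrix))
```

For the n by n identity matrix the Hilbert-Schmidt norm is $\sqrt{n}$ while every ratio equals 1, which hints at why the identity fails to be Hilbert-Schmidt in infinite dimensions.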

**Remark 5. **From this exercise and Fatou’s lemma, we see in particular that the limit (in either the norm, strong or weak operator topologies) of a sequence of Hilbert-Schmidt operators with uniformly bounded Hilbert-Schmidt norm, is still Hilbert-Schmidt. We also see that the composition of a Hilbert-Schmidt operator with a bounded operator is still Hilbert-Schmidt (thus the Hilbert-Schmidt operators can be viewed as a closed two-sided ideal in the space of bounded operators).

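**Example 5.** An operator $T: L^2(X, \mathcal{X}, \mu) \to L^2(Y, \mathcal{Y}, \nu)$ is Hilbert-Schmidt if and only if it takes the form $Tf(y) := \int_X K(x,y) f(x)\ d\mu(x)$ for some kernel $K \in L^2(X \times Y, \mathcal{X} \times \mathcal{Y}, \mu \times \nu)$, in which case the Hilbert-Schmidt norm is $\|K\|_{L^2(X \times Y, \mathcal{X} \times \mathcal{Y}, \mu \times \nu)}$. The Hilbert-Schmidt inner product is defined similarly.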

**Example 6. **The identity operator on an infinite-dimensional Hilbert space is never Hilbert-Schmidt, despite being bounded. On the other hand, every finite rank operator is Hilbert-Schmidt.

One of the key properties of Hilbert-Schmidt operators which will be relevant to us is the following.

Lemma 2. If $T: H \to H'$ is Hilbert-Schmidt, then it is compact (i.e. the image of any bounded set is precompact).

**Proof.** Let $\varepsilon > 0$ be arbitrary. By Exercise 19 and monotone convergence, we can find a finite orthonormal set $e_1, \ldots, e_n$ such that $\sum_{i=1}^n \|T e_i\|_{H'}^2 \geq \|T\|_{HS}^2 - \varepsilon^2$, and in particular that $\|T e\|_{H'} \leq \varepsilon$ for any unit vector $e$ orthogonal to $e_1, \ldots, e_n$. As a consequence, the image of the unit ball of H under T lies within $\varepsilon$ of the image of the unit ball of the finite-dimensional space $\mathrm{span}(e_1, \ldots, e_n)$. This image is therefore totally bounded and thus precompact.

The following exercise may help illuminate the distinction between bounded operators, Hilbert-Schmidt operators, and compact operators:

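**Exercise 20.** Let $(\lambda_n)_{n=1}^\infty$ be a sequence of complex numbers, and consider the diagonal operator $T: (x_n)_{n=1}^\infty \mapsto (\lambda_n x_n)_{n=1}^\infty$ on $\ell^2(\mathbb{N})$.

1. Show that T is a well-defined bounded linear operator on $\ell^2(\mathbb{N})$ if and only if the sequence $(\lambda_n)$ is bounded.
2. Show that T is Hilbert-Schmidt if and only if the sequence $(\lambda_n)$ is square-summable.
3. Show that T is compact if and only if the sequence $(\lambda_n)$ goes to zero as $n \to \infty$.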
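To see the gap between the second and third conditions concretely, one can compare partial sums of $|\lambda_n|^2$ for two diagonal sequences that both tend to zero. The sketch below assumes the diagonal operator multiplies the n-th coordinate by $\lambda_n$; the sequences $1/n$ and $1/\sqrt{n}$ and the helper names are illustrative choices.

```python
import math

def partial_energy(weights, terms):
    """Sum of |lambda_n|^2 for n = 1, ..., terms."""
    return sum(abs(weights(n)) ** 2 for n in range(1, terms + 1))

def reciprocal(n):
    """lambda_n = 1/n: square-summable."""
    return 1.0 / n

def reciprocal_root(n):
    """lambda_n = 1/sqrt(n): tends to zero but is not square-summable."""
    return 1.0 / math.sqrt(n)

for terms in (10**2, 10**3, 10**4):
    print(terms, partial_energy(reciprocal, terms), partial_energy(reciprocal_root, terms))
```

The first column of sums stabilises near $\pi^2/6$, while the second grows like $\log N$, so the sequence $1/\sqrt{n}$ is a natural candidate for a compact diagonal operator that is not Hilbert-Schmidt.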

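Now we apply the above theory to establish Theorem 1. Let $(X, \mathcal{X}, \mu, T)$ be a measure-preserving system, and let $f \in L^2(X, \mathcal{X}, \mu)$. The rank one operators $g \mapsto \langle g, T^n f \rangle T^n f$ can easily be verified to have a Hilbert-Schmidt norm of $\|f\|_{L^2}^2$, and so by the triangle inequality, their averages $S_{f,N}: g \mapsto \frac{1}{N} \sum_{n=0}^{N-1} \langle g, T^n f \rangle T^n f$ have a Hilbert-Schmidt norm of at most $\|f\|_{L^2}^2$. On the other hand, from the identity

$\langle S_{f,N} g, h \rangle = \frac{1}{N} \sum_{n=0}^{N-1} \langle g \otimes \overline{h}, (T \times T)^n (f \otimes \overline{f}) \rangle_{L^2(X \times X)}$ (27)

and the mean ergodic theorem (applied to the product space) we see that $S_{f,N}$ converges in the weak operator topology to some limit $S_f$, which is then also Hilbert-Schmidt by Remark 5, and thus compact by Lemma 2. (Actually, $S_{f,N}$ converges to $S_f$ in the Hilbert-Schmidt norm, and thus also in the operator norm and in the strong topology: this is another application of the mean ergodic theorem, which we leave as an exercise. Since each of the $S_{f,N}$ is clearly finite rank, this gives a direct proof of the compactness of $S_f$.) Also, it is easy to see that $S_f$ is self-adjoint and commutes with T. As a consequence, we conclude that for any $g \in L^2(X, \mathcal{X}, \mu)$, the image $S_f g$ is almost periodic (since $\{ T^n S_f g: n \in \mathbb{Z} \} = \{ S_f T^n g: n \in \mathbb{Z} \}$ is the image of a bounded set by the compact operator $S_f$ and therefore precompact).

On the other hand, observe that

$\langle S_f f, f \rangle = C\!-\!\lim_{n \to \infty} |\langle T^n f, f \rangle|^2$. (28)

Thus by Definition 2 (and Exercise 1), we see that $\langle S_f f, f \rangle \neq 0$ whenever f is not weakly mixing. In particular, f is not orthogonal to the almost periodic function $S_f f$. From this and Exercise 18, we have thus shown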

Proposition 3. (Dichotomy between structure and randomness) Let $(X, \mathcal{X}, \mu, T)$ be a measure-preserving system. A function $f \in L^2(X, \mathcal{X}, \mu)$ is weakly mixing if and only if it is orthogonal to all almost periodic functions (or equivalently, orthogonal to all eigenfunctions).

**Remark 6.** Interestingly, essentially the same result appears in the spectral and scattering theory of linear Schrödinger equations, which in that context is known as the “RAGE theorem” (after Ruelle, Amrein-Georgescu, and Enss).

**Remark 7.** The finitary analogue of the expression $S_f f$ is the *dual function* (of order 2) of f (the dual function of order 1 was briefly discussed in Lecture 8). If we are working on $\mathbb{Z}/N\mathbb{Z}$ with the usual shift, then $S_f$ can be viewed as a Fourier multiplier which multiplies the Fourier coefficient at $\xi$ by $|\hat f(\xi)|^2$; informally, $S_f$ filters out all the low amplitude frequencies of f, leaving only a handful of high-amplitude frequencies.
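The Fourier multiplier description in this remark can be verified by direct computation on a small cyclic group. The Python sketch below assumes the operator in question is $S_f g = \frac{1}{N} \sum_{n} \langle g, T^n f \rangle T^n f$ with normalised counting measure and shift convention $T^n f(x) = f(x - n)$; these conventions and all function names are assumptions made for illustration.

```python
import cmath
import random

def fourier(f):
    """Normalised discrete Fourier coefficients on Z/NZ."""
    N = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / N) for x in range(N)) / N
            for xi in range(N)]

def shift(f, n):
    """T^n f(x) = f(x - n) on Z/NZ."""
    N = len(f)
    return [f[(x - n) % N] for x in range(N)]

def inner(g, h):
    """Inner product with respect to normalised counting measure."""
    return sum(a * b.conjugate() for a, b in zip(g, h)) / len(g)

def dual_operator(f, g):
    """S_f g = (1/N) * sum_n <g, T^n f> T^n f, averaged over one full period."""
    N = len(f)
    result = [0j] * N
    for n in range(N):
        shifted = shift(f, n)
        coefficient = inner(g, shifted)
        result = [r + coefficient * s / N for r, s in zip(result, shifted)]
    return result

rng = random.Random(0)
f = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(8)]
g = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(8)]

lhs = fourier(dual_operator(f, g))
rhs = [abs(a) ** 2 * b for a, b in zip(fourier(f), fourier(g))]
print(max(abs(a - b) for a, b in zip(lhs, rhs)))  # rounding-level discrepancy
```

Taking g = f shows that the dual function has Fourier coefficients $|\hat f(\xi)|^2 \hat f(\xi)$, so small coefficients are suppressed much more strongly than large ones.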

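Recall from Proposition 2 and Exercise 5 of Lecture 11 that a function $f \in L^2(X, \mathcal{X}, \mu)$ is almost periodic if and only if it is $\mathcal{Z}_1$-measurable, or if it lies in the pure point component $H_{pp}$ of the shift operator T. We thus have

Corollary 3. (Koopman-von Neumann theorem) Let $(X, \mathcal{X}, \mu, T)$ be a measure-preserving system, and let $f \in L^2(X, \mathcal{X}, \mu)$. Let $\mathcal{Z}_1$ be the $\sigma$-algebra generated by the eigenfunctions of T.

1. f is almost periodic if and only if $f \in L^2(X, \mathcal{Z}_1, \mu)$ if and only if $f \in H_{pp}$.
2. f is weakly mixing if and only if $\mathbb{E}(f | \mathcal{Z}_1) = 0$ a.e. if and only if $f \in H_c$ (corresponding to the continuous spectrum of T).
3. In general, f has a unique decomposition $f = f_{U^\perp} + f_U$ into an almost periodic function $f_{U^\perp}$ and a weakly mixing function $f_U$. Indeed, $f_{U^\perp} = \mathbb{E}(f | \mathcal{Z}_1)$ and $f_U = f - \mathbb{E}(f | \mathcal{Z}_1)$.

Theorem 1 follows immediately from this Corollary. Indeed, if a system is not weakly mixing, then by the above Corollary we see that $\mathcal{Z}_1$ is non-trivial, and the identity map from $(X, \mathcal{X}, \mu, T)$ to $(X, \mathcal{Z}_1, \mu, T)$ yields a non-trivial compact factor.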

— Roth’s theorem —

As a quick application of the above machinery we give a proof of Roth’s theorem. We first need a variant of Corollary 2, which is proven by much the same means:

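**Exercise 21.** Let $(X, \mathcal{X}, \mu, T)$ be an ergodic measure-preserving system, let $a_1, a_2, a_3$ be distinct integers, and let $f_1, f_2, f_3 \in L^\infty(X, \mathcal{X}, \mu)$ with at least one of $f_1, f_2, f_3$ weakly mixing. Show that $C\!-\!\lim_{n \to \infty} \int_X T^{a_1 n} f_1\, T^{a_2 n} f_2\, T^{a_3 n} f_3\ d\mu = 0$.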

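Theorem 2 (Roth’s theorem). Let $(X, \mathcal{X}, \mu, T)$ be an ergodic measure-preserving system, and let $f \in L^\infty(X, \mathcal{X}, \mu)$ be non-negative with $\int_X f\ d\mu > 0$. Then

$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X f\, T^n f\, T^{2n} f\ d\mu > 0$. (29)

**Proof.** We decompose $f = f_{U^\perp} + f_U$ into an almost periodic part $f_{U^\perp}$ and a weakly mixing part $f_U$ as in Corollary 3. The contribution of $f_U$ is negligible by Exercise 21, so it suffices to show that

$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X f_{U^\perp}\, T^n f_{U^\perp}\, T^{2n} f_{U^\perp}\ d\mu > 0$. (30)

But as $f_{U^\perp}$ is almost periodic, the claim follows from Proposition 1 of Lecture 11.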

One can then immediately establish the k=3 case of Furstenberg’s theorem (Theorem 2 from Lecture 10) by combining the above result with the ergodic decomposition (Proposition 4 from Lecture 9). The k=3 case of Szemerédi’s theorem (i.e. Roth’s theorem) then follows from the Furstenberg correspondence principle (see Lecture 10).
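To get a feel for the quantity appearing in Roth's theorem, one can evaluate the triple average in the simplest ergodic model, the cyclic shift on $\mathbb{Z}/N\mathbb{Z}$ with normalised counting measure, where it reduces to counting three-term arithmetic progressions (including the degenerate ones with n = 0). The sketch below is a toy computation under those assumptions; the set `A` and the function name are illustrative, and nothing here replaces the correspondence principle.

```python
def triple_average(indicator):
    """(1/N) * sum_n of the integral of f * T^n f * T^{2n} f on Z/NZ,
    where f is the given 0/1 indicator and T^n f(x) = f(x - n)."""
    N = len(indicator)
    count = 0
    for n in range(N):
        for x in range(N):
            count += indicator[x] * indicator[(x - n) % N] * indicator[(x - 2 * n) % N]
    return count / (N * N)

N = 101
A = set(range(34))  # illustrative set of density about 1/3
indicator = [1 if x in A else 0 for x in range(N)]

density = len(A) / N
print(density, triple_average(indicator))
```

The n = 0 term alone contributes density/N, so positivity is trivial in this finite model; the content of the theorem is that the average stays bounded away from zero uniformly, which is what allows non-trivial progressions to be extracted.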

**Exercise 22.** Let $(X, \mathcal{X}, \mu, T)$ be a measure-preserving system, and let $f \in L^\infty(X, \mathcal{X}, \mu)$ be non-negative. Show that for every $\varepsilon > 0$, one has $\int_X f\, T^n f\ d\mu \geq (\int_X f\ d\mu)^2 - \varepsilon$ for infinitely many n. (Hint: first show this when f is almost periodic, and then use Corollary 1 and Corollary 3 to prove the general case.) This is a simplified version of the *Khintchine recurrence theorem*, which asserts that the set of such n is not only infinite, but is also syndetic. Analogues of the Khintchine recurrence theorem hold for double recurrence but not for triple recurrence; see this paper of Bergelson, Host, and Kra for details.

[*Update*, Mar 3: Added the observation that as converges to in the operator norm, is the limit of finite rank operators and thus clearly compact. Thanks to Guatam Zubin for this remark.]

## 24 comments

23 February, 2008 at 2:26 pm

David Speyer

I am not sure I see how Theorem 1 follows from corollary 3. The obvious conclusion from cor. 3 is that, if your dynamical system is not weakly mixing, then there is a nonconstant almost periodic function. We have to use this to build a compact factor. Looking at Theorem 3 in Lecture 7, I think the chain of reasoning is the following:

Since X is not weakly mixing, X \times X is also not weakly mixing so

there is a nonconstant almost periodic function f on X \times X. View f as a function g:X –> L^2(X), then g commutes with the translation T. Moreover, f is continuous (why?) and g(X) is compact (why?). It’s not even clear to me which of these two points uses the fact that f is almost periodic (although I am betting on the second.)

Any hints are appreciated!

23 February, 2008 at 2:34 pm

Terence Tao

Dear David,

In this setting, factor refers to the measure-theoretic notion of a factor rather than the topological one (despite the adjective “compact” – sorry for the confusion!). In this case, the non-trivial factor of can be taken to be simply , with the factor map being the identity. I’ll write a little bit more in the notes to clarify this.

A posteriori, one can identify the compact factor with a topological dynamical system (and in fact, with a Kronecker system), in the ergodic case at least, using Theorem 2 from Lecture 11, but it is not necessary to do so here.

24 February, 2008 at 7:42 am

David Speyer

Thanks, I get it now.

23 December, 2008 at 1:17 am

liuxiaochuan

Dear Professor Tao:

About Remark 3, I computed (4) and got that the right-hand side (if I am right) is

23 December, 2008 at 8:58 am

Terence Tao

Dear Liuxiaochuan,

I think the formula is correct as it stands (note that the inner product is with respect to normalised counting measure on rather than counting measure, thus ).

A simple way to double-check things like powers of N is to plug in a simple example, e.g. (in which case and all other Fourier coefficients vanish).

24 December, 2008 at 4:06 am

liuxiaochuan

Dear Professor Tao:

Merry Christmas! Thanks for the method!

After (20), I think ( and hope I am not wrong this time) it should be .

24 December, 2008 at 8:40 am

Terence Tao

Actually, I think the formula is again correct as stated; here we are using the unitarity of the shift map to cancel the dependence on n. (Again, one can work with a simple example, such as or , first.)

25 December, 2008 at 7:07 am

liuxiaochuan

Dear Professor Tao:

I still have doubts about (20), I think your example is too simple. What if and , then the integrand would be .

My previous correction is also not there. I think it should be .

25 December, 2008 at 9:08 am

Terence Tao

Ah, I see the issue now. Thanks for the correction! (Actually, it is slightly more convenient to take .)

26 December, 2008 at 7:35 am

liuxiaochuan

Dear Professor Tao:

Another small correction: after (27), “then also Hilbert-Schmidt by Remark 4”, where I think should be “Example 4”.

26 December, 2008 at 9:46 am

Terence Tao

Thanks! (Actually, it should be Remark 5.)

4 January, 2009 at 10:34 pm

liuxiaochuan

Dear Professor Tao:

About the proof of Proposition 1, in the case , I didn’t see a direct way to get it just from the mean ergodic theorem. I am trying to prove it using the exercises.

In 2 of exercise 9, is the inner product or just ? If it is the former, I think the conclusion should become . I got confused a little.

About exercise 12, I didn’t find a direct way to prove it, except using both exercise 9 and exercise 13. Could you give me a hint?

4 January, 2009 at 11:23 pm

Terence Tao

The k=1 case of Proposition 1 follows from the mean ergodic theorem combined with Exercise 12 (to ensure that is ergodic).

Yes, there should be a conjugate for the g’s in the inner product and in Exercise 9.

Exercise 12 can be proven using Exercise 9.

20 February, 2010 at 7:56 am

Martino

Dear Professor Tao:

I cannot see how to solve exercise 21. Could you give me a hint?

Also, do you know if a proof of Roth’s theorem, similar to the one given here, can be given using the characterization of weakly mixing and almost periodic functions in terms of essential idempotent ultrafilters? (a function f is almost periodic iff p-lim T^n f = f for some, or equivalently every, idempotent ultrafilter p every whose element has positive Banach density, and is weakly mixing iff for some, or equivalently every, idempotent ultrafilter p as above, p-lim T^n f = 0 weakly)

20 February, 2010 at 8:30 am

Terence Tao

Try the model case when a_1=0, a_2=1, a_3=2, and f_3 is weakly mixing. In this case one can pull f_1 out of the average, and it would suffice to show that the Cesàro averages of $T^n f_2\, T^{2n} f_3$ go to zero in $L^2$ norm. To establish this, apply the van der Corput lemma.

As for your second question, there is a paper of Bergelson and McCutcheon which follows this approach, see

http://muse.jhu.edu/journals/american_journal_of_mathematics/v129/129.5bergelson.pdf

although the Roth-type theorem they obtain is slightly different from the formulation given here.

2 June, 2010 at 5:53 am

liuxiaochuan

Dear Martino:

I got stuck at exercise 21 at first, too. I overlooked the condition that the system is ergodic.

2 June, 2010 at 6:29 am

liuxiaochuan

Dear Professor Tao:

There is a small typo, in exercise 15 and before exercise 21, you actually mean Corollary 2, right?

[Corrected, thanks – T.]

19 September, 2017 at 7:59 am

hc

Is there any reason that the Cesàro mean plays a special role here? I can see this arise historically (e.g. in von Neumann’s ergodic theorem), but do other kinds of summability (such as Abel summation, or Riesz means) play a role in ergodic theory as well? Thanks!

19 September, 2017 at 10:45 am

Terence Tao

Many other ergodic theorems rely in one form or another on the classic ergodic theorems (mean, pointwise, or maximal), which all use Cesaro type averaging. Once one can control Cesaro means one can also control many other averages (such as Abel or Riesz) by summation by parts, so there is not much benefit in moving to these smoothing means. But there is certainly a lot of literature on working with ergodic theorems involving more exotic and difficult notions of averaging (e.g. averaging over primes, values of a polynomial, etc.).