In the previous lecture, we studied the recurrence properties of compact systems, which are systems in which all measurable functions exhibit almost periodicity – they almost return completely to themselves after repeated shifting. Now, we consider the opposite extreme of mixing systems – those in which all measurable functions (of mean zero) exhibit mixing – they become orthogonal to themselves after repeated shifting. (Actually, there are two different types of mixing, strong mixing and weak mixing, depending on whether the orthogonality occurs individually or on the average; it is the latter concept which is of more importance to the task of establishing the Furstenberg recurrence theorem.)
We shall see that for weakly mixing systems, averages such as $\frac{1}{N} \sum_{n=0}^{N-1} T^n f \ldots T^{(k-1)n} f$ can be computed very explicitly (in fact, this average converges to the constant $(\int_X f\, d\mu)^{k-1}$). More generally, we shall see that weakly mixing components of a system tend to average themselves out and thus become irrelevant when studying many types of ergodic averages. Our main tool here will be the humble Cauchy-Schwarz inequality, and in particular a certain consequence of it, known as the van der Corput lemma.
— Mixing functions —
Much as compact systems were characterised by their abundance of almost periodic functions, we will characterise mixing systems by their abundance of mixing functions (this is not standard terminology). To define and motivate this concept, it will be convenient to introduce a weak notion of convergence (this notation is also not standard):
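Definition 1. (Cesàro convergence) A sequence $c_n$ in a normed vector space is said to converge in the Cesàro sense to a limit $c$ if the averages $\frac{1}{N} \sum_{n=0}^{N-1} c_n$ converge strongly to $c$, in which case we write $C\text{-}\lim_{n \to \infty} c_n = c$. We also write $C\text{-}\sup_{n \to \infty} c_n := \limsup_{N \to \infty} \| \frac{1}{N} \sum_{n=0}^{N-1} c_n \|$ (thus $C\text{-}\lim_{n \to \infty} c_n = 0$ if and only if $C\text{-}\sup_{n \to \infty} c_n = 0$).
Example 1. The sequence $0, 1, 0, 1, \ldots$ has a Cesàro limit of 1/2.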
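As a quick numerical sanity check of Cesàro convergence, here is a minimal Python sketch, assuming the alternating sequence 0, 1, 0, 1, … as the test case. The helper name `cesaro_averages` is ours and purely illustrative.

```python
from fractions import Fraction

def cesaro_averages(seq):
    """Return the running averages (1/N) * sum(seq[:N]) for N = 1..len(seq)."""
    averages = []
    total = Fraction(0)
    for count, term in enumerate(seq, start=1):
        total += term
        averages.append(total / count)
    return averages

# The alternating sequence 0, 1, 0, 1, ... does not converge,
# but its running averages settle down near 1/2.
alternating = [n % 2 for n in range(1000)]
averages = cesaro_averages(alternating)
print(float(averages[9]), float(averages[99]), float(averages[-1]))
```

Exact rational arithmetic is used only so that the averages can be compared without rounding noise.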
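Exercise 1. Let $c_n$ be a bounded sequence of non-negative numbers. Show that the following three statements are equivalent:
- $C\text{-}\lim_{n \to \infty} c_n = 0$.
- $C\text{-}\lim_{n \to \infty} |c_n|^2 = 0$.
- $c_n$ converges to zero in density. [We say $c_n$ converges in density to $c$ if for any $\varepsilon > 0$, the set $\{ n \in \mathbb{N}: |c_n - c| > \varepsilon \}$ has upper density zero.]
Which of the implications between 1, 2, 3 remain valid if $c_n$ is not bounded?
Let $(X, \mathcal{X}, \mu, T)$ be a measure-preserving system, and let $f \in L^2(X, \mathcal{X}, \mu)$ be a function. We consider the correlation coefficients $\langle T^n f, f \rangle := \int_X T^n f\, \overline{f}\, d\mu$ as n goes to infinity. Note that we have the symmetry $\langle T^{-n} f, f \rangle = \overline{\langle T^n f, f \rangle}$, so we only need to consider the case when n is positive. The mean ergodic theorem (Corollary 2 from Lecture 8) tells us the Cesàro behaviour of these coefficients. Indeed, we have
$$C\text{-}\lim_{n \to \infty} \langle T^n f, f \rangle = \langle \mathbb{E}(f | \mathcal{X}^T), f \rangle = \| \mathbb{E}(f | \mathcal{X}^T) \|_{L^2}^2 \qquad (1)$$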
where $\mathcal{X}^T$ is the $\sigma$-algebra of essentially shift-invariant sets. In particular, if the system is ergodic, and f has mean zero (i.e. $\int_X f\, d\mu = 0$), then we have
$$C\text{-}\lim_{n \to \infty} \langle T^n f, f \rangle = 0, \qquad (2)$$
thus the correlation coefficients go to zero in the Cesàro sense. However, this does not necessarily imply that these coefficients go to zero pointwise. For instance, consider a circle shift system $(\mathbb{R}/\mathbb{Z}, x \mapsto x + \alpha)$ with $\alpha$ irrational (and with uniform measure), thus this system is ergodic by Exercise 5 from Lecture 9. Then the function $f(x) := e^{2\pi i x}$ has mean zero, but one easily computes that $\langle T^n f, f \rangle = e^{-2\pi i n \alpha}$. The coefficients $\langle T^n f, f \rangle$ converge in the Cesàro sense to zero, but have magnitude 1 and thus do not converge to zero pointwise.
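The circle shift example can be checked numerically. The sketch below assumes the convention $T^n f(x) = f(x - n\alpha)$, so that the correlation of $f(x) = e^{2\pi i x}$ with itself is $e^{-2\pi i n \alpha}$. The choice $\alpha = \sqrt{2}$ and the function names are illustrative assumptions only.

```python
import cmath
import math

def correlation(n, alpha):
    """Correlation <T^n f, f> for f(x) = e^{2 pi i x} on the circle shift
    x -> x + alpha, using the convention T^n f(x) = f(x - n*alpha)."""
    return cmath.exp(-2j * math.pi * n * alpha)

def cesaro_mean(terms):
    """Plain average of a finite list of terms."""
    return sum(terms) / len(terms)

alpha = math.sqrt(2)  # an irrational rotation number (illustrative choice)
N = 10000
coeffs = [correlation(n, alpha) for n in range(N)]

mean_of_coeffs = abs(cesaro_mean(coeffs))                   # small: Cesaro limit is zero
mean_of_magnitudes = cesaro_mean([abs(c) for c in coeffs])  # stays at 1
print(mean_of_coeffs, mean_of_magnitudes)
```

The first quantity decays like 1/N (a geometric series), while the second does not decay at all, which is why this f fails to be weakly mixing even though its correlations have Cesàro limit zero.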
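Definition 2. (Mixing) Let $(X, \mathcal{X}, \mu, T)$ be a measure-preserving system. A function $f \in L^2(X, \mathcal{X}, \mu)$ is strongly mixing if $\lim_{n \to \infty} \langle T^n f, f \rangle = 0$, and weakly mixing if $C\text{-}\lim_{n \to \infty} |\langle T^n f, f \rangle| = 0$.
Remark 1. Clearly strong mixing implies weak mixing. From (1) we also see that if f is weakly mixing, then $\mathbb{E}(f | \mathcal{X}^T)$ must vanish a.e.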
Exercise 2. Show that if f is both almost periodic and weakly mixing, then it must be 0 almost everywhere. In particular, in a compact system, the only weakly mixing function is 0 (up to a.e. equivalence).
Exercise 3. In any Bernoulli system $\Omega^{\mathbb{Z}}$ with the product $\sigma$-algebra and a product measure, and the standard shift, show that any function $f \in L^2(\Omega^{\mathbb{Z}})$ of mean zero is strongly mixing. (Hint: first do this for functions that depend on only finitely many of the variables.)
Exercise 4. Consider a skew shift system $X = (\mathbb{R}/\mathbb{Z})^2$, $T(x,y) := (x + \alpha, y + x)$ with the usual Lebesgue measure and Borel $\sigma$-algebra, and with $\alpha$ irrational. Show that the function $f(x,y) := e^{2\pi i x}$ is neither strongly mixing nor weakly mixing, but that the function $g(x,y) := e^{2\pi i y}$ is both strongly mixing and weakly mixing.
Exercise 5. Let be given the product Borel -algebra and the shift . For each , let be the probability distribution in X of the random sequence given by the rule
where the are iid standard complex Gaussians (thus each w has probability distribution ). Show that each is shift invariant. If is a vague limit point of the sequence , and is the function defined as , show that f is weakly mixing but not strongly mixing (and more specifically, that stays bounded away from zero) with respect to the system .
Remark 2. Exercise 5 illustrates an important point, namely that stationary processes yield a rich source of measure-preserving systems (indeed the two notions are almost equivalent in some sense, especially after one distinguishes a specific function f on the measure-preserving system). However, we will not adopt this more probabilistic perspective to ergodic theory here.
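Remark 3. We briefly discuss the finitary analogue of the weak mixing concept in the context of functions $f: \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}$ on a large cyclic group $\mathbb{Z}/N\mathbb{Z}$ with the usual shift $x \mapsto x + 1$. Then one can compute
$$\frac{1}{N} \sum_{n=0}^{N-1} |\langle T^n f, f \rangle_{L^2}|^2 = \sum_{\xi \in \mathbb{Z}/N\mathbb{Z}} |\hat{f}(\xi)|^4$$
where $\hat{f}(\xi) := \frac{1}{N} \sum_{x \in \mathbb{Z}/N\mathbb{Z}} f(x) e^{-2\pi i x \xi / N}$ are the Fourier coefficients of f. Comparing this against the Plancherel identity $\|f\|_{L^2}^2 = \sum_{\xi \in \mathbb{Z}/N\mathbb{Z}} |\hat{f}(\xi)|^2$ we thus see that a function f bounded in $L^2$ norm should be considered “weakly mixing” if it has no large Fourier coefficients. Contrast this with Remark 7 from Lecture 11.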
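The finitary identity between mean-square correlations and fourth powers of Fourier coefficients can be tested on a small cyclic group. This is a minimal sketch assuming the normalisations $\langle f, g \rangle = \frac{1}{N} \sum_x f(x) \overline{g(x)}$, $T^n f(x) = f(x-n)$ and $\hat f(\xi) = \frac{1}{N} \sum_x f(x) e^{-2\pi i x \xi/N}$. The two test functions are illustrative choices, not taken from the text.

```python
import cmath

def fourier_coefficients(f):
    """Normalised coefficients: hat f(xi) = (1/N) sum_x f(x) e^{-2 pi i x xi / N}."""
    N = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / N) for x in range(N)) / N
            for xi in range(N)]

def correlation(f, n):
    """<T^n f, f> with T^n f(x) = f(x - n) and a normalised inner product."""
    N = len(f)
    return sum(f[(x - n) % N] * f[x].conjugate() for x in range(N)) / N

def mean_square_correlation(f):
    """(1/N) * sum over n of |<T^n f, f>|^2."""
    N = len(f)
    return sum(abs(correlation(f, n)) ** 2 for n in range(N)) / N

def fourth_power_sum(f):
    """Sum over xi of |hat f(xi)|^4."""
    return sum(abs(c) ** 4 for c in fourier_coefficients(f))

N = 16
# A single character: one Fourier coefficient of size 1 ("not mixing").
character = [cmath.exp(2j * cmath.pi * 3 * x / N) for x in range(N)]
# A quadratic phase: Fourier mass spread over several small coefficients.
quadratic = [cmath.exp(2j * cmath.pi * x * x / N) for x in range(N)]

for name, f in (("character", character), ("quadratic", quadratic)):
    print(name, mean_square_correlation(f), fourth_power_sum(f))
```

Both functions have $L^2$ norm 1, but the quadratic phase has a much smaller fourth-power sum, in line with the heuristic that "weakly mixing" means "no large Fourier coefficients".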
Now let us see some consequences of the weak mixing property. We need the following lemma, which gives a useful criterion as to whether a sequence of bounded vectors in a Hilbert space converges in the Cesàro sense to zero.
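Lemma 1 (van der Corput lemma). Let $v_0, v_1, v_2, \ldots$ be a bounded sequence of vectors in a Hilbert space H. If
$$C\text{-}\lim_{h \to \infty} C\text{-}\sup_{n \to \infty} \langle v_n, v_{n+h} \rangle = 0 \qquad (5)$$
then $C\text{-}\lim_{n \to \infty} v_n = 0$.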
Informally, this lemma asserts that if each vector in a bounded sequence tends to be orthogonal to nearby elements in that sequence, then the vectors will converge to zero in the Cesàro sense. This formulation of the lemma is essentially the version in this paper by Bergelson, except that we have made the minor change of replacing one of the Cesàro limits with a Cesàro supremum.
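Proof. We can normalise so that $\|v_n\| \leq 1$ for all n. In particular, we have $v_n = O(1)$, where O(1) denotes a vector of bounded magnitude. For any h and $N \geq 1$, we thus have the telescoping identity
$$\frac{1}{N} \sum_{n=0}^{N-1} v_n = \frac{1}{N} \sum_{n=0}^{N-1} v_{n+h} + O(|h|/N);$$
averaging this over all h from 0 to H-1 for some $H \geq 1$, we obtain
$$\frac{1}{N} \sum_{n=0}^{N-1} v_n = \frac{1}{N} \sum_{n=0}^{N-1} \frac{1}{H} \sum_{h=0}^{H-1} v_{n+h} + O(H/N);$$
by the triangle inequality we thus have
$$\Big\| \frac{1}{N} \sum_{n=0}^{N-1} v_n \Big\| \leq \frac{1}{N} \sum_{n=0}^{N-1} \Big\| \frac{1}{H} \sum_{h=0}^{H-1} v_{n+h} \Big\| + O(H/N)$$
where the O() terms are now scalars rather than vectors. We square this (using the crude inequality $(a+b)^2 \leq 2a^2 + 2b^2$) and apply Cauchy-Schwarz to obtain
$$\Big\| \frac{1}{N} \sum_{n=0}^{N-1} v_n \Big\|^2 \leq O\Big( \frac{1}{N} \sum_{n=0}^{N-1} \Big\| \frac{1}{H} \sum_{h=0}^{H-1} v_{n+h} \Big\|^2 \Big) + O(H^2/N^2)$$
which we rearrange as
$$\Big\| \frac{1}{N} \sum_{n=0}^{N-1} v_n \Big\|^2 \leq O\Big( \frac{1}{H^2} \sum_{0 \leq h, h' < H} \frac{1}{N} \sum_{n=0}^{N-1} \langle v_{n+h}, v_{n+h'} \rangle \Big) + O(H^2/N^2).$$
We take limits as $N \to \infty$ (keeping H fixed for now) to conclude
$$\limsup_{N \to \infty} \Big\| \frac{1}{N} \sum_{n=0}^{N-1} v_n \Big\|^2 \leq O\Big( \frac{1}{H^2} \sum_{0 \leq h, h' < H} C\text{-}\sup_{n \to \infty} \langle v_{n+h}, v_{n+h'} \rangle \Big).$$
Another telescoping argument (and symmetry) gives us
$$\limsup_{N \to \infty} \Big\| \frac{1}{N} \sum_{n=0}^{N-1} v_n \Big\|^2 \leq O\Big( \frac{1}{H} \sum_{h=0}^{H-1} C\text{-}\sup_{n \to \infty} \langle v_n, v_{n+h} \rangle \Big).$$
Taking limits as $H \to \infty$ and using (5) we obtain the claim.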
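Exercise 6. Let $P: \mathbb{Z} \to \mathbb{R}$ be a polynomial with at least one irrational non-constant coefficient. Using Lemma 1 (in the scalar case $H = \mathbb{C}$) and an induction on degree, show that $C\text{-}\lim_{n \to \infty} e^{2\pi i P(n)} = 0$. Conclude that the sequence $(P(n) \bmod 1)_{n \in \mathbb{N}}$ is uniformly distributed in $\mathbb{R}/\mathbb{Z}$ with respect to uniform measure (see Lecture 9 for a definition of uniform distribution).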
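For intuition only (it is not a proof), one can compare exponential averages of a polynomial phase with an irrational leading coefficient against one with rational coefficients. The sketch below assumes the illustrative phases $\sqrt{2}\, n^2$ and $n^2/4$. The function name `exponential_average` is hypothetical.

```python
import cmath
import math

def exponential_average(phase, N):
    """Return (1/N) * sum_{n<N} e^{2 pi i phase(n)}.

    The phase is reduced modulo 1 before exponentiating to limit rounding error."""
    total = 0j
    for n in range(N):
        fractional = math.fmod(phase(n), 1.0)
        total += cmath.exp(2j * math.pi * fractional)
    return total / N

N = 20000
irrational = exponential_average(lambda n: math.sqrt(2) * n * n, N)  # Cesaro limit should be 0
rational = exponential_average(lambda n: n * n / 4, N)               # averages to (1 + i)/2
print(abs(irrational), abs(rational))
```

The rational phase only takes the values 1 and i, so its average cannot decay, whereas the irrational phase exhibits the cancellation that the van der Corput lemma captures.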
Exercise 7. Using Exercise 6, give another proof of Theorem 1 from Lecture 6.
We now apply the van der Corput lemma to weakly mixing functions.
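Corollary 1. Let $(X, \mathcal{X}, \mu, T)$ be a measure-preserving system, and let $f \in L^2(X, \mathcal{X}, \mu)$ be weakly mixing. Then for any $g \in L^2(X, \mathcal{X}, \mu)$ we have $C\text{-}\lim_{n \to \infty} |\langle T^n f, g \rangle| = 0$ and $C\text{-}\lim_{n \to \infty} |\langle f, T^n g \rangle| = 0$.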
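Proof. We just prove the first claim, as the second claim is similar. By Exercise 1, it suffices to show that
$$\frac{1}{N} \sum_{n=0}^{N-1} |\langle T^n f, g \rangle|^2 \to 0$$
as $N \to \infty$. The left-hand side can be rewritten as
$$\Big\langle \frac{1}{N} \sum_{n=0}^{N-1} \langle g, T^n f \rangle T^n f, g \Big\rangle,$$
so by Cauchy-Schwarz it suffices to show that
$$C\text{-}\lim_{n \to \infty} \langle g, T^n f \rangle T^n f = 0.$$
Applying the van der Corput lemma and discarding the bounded coefficients $\langle g, T^n f \rangle$, it suffices to show that
$$C\text{-}\lim_{h \to \infty} \sup_n |\langle T^n f, T^{n+h} f \rangle| = 0.$$
But $\langle T^n f, T^{n+h} f \rangle = \langle f, T^h f \rangle$, and the claim now follows from the weakly mixing nature of f.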
— Weakly mixing systems —
Now we consider systems which are full of mixing functions.
Definition 3. (Mixing systems) A measure-preserving system $(X, \mathcal{X}, \mu, T)$ is weakly mixing (resp. strongly mixing) if every function $f \in L^2(X, \mathcal{X}, \mu)$ with mean zero is weakly mixing (resp. strongly mixing).
Example 2. From Exercise 2, we know that any system with a non-trivial Kronecker factor is not weakly mixing (and thus not strongly mixing). On the other hand, from Exercise 3, we know that any Bernoulli system is strongly mixing (and thus weakly mixing also). From Remark 1 we see that any strongly or weakly mixing system must be ergodic.
Exercise 8. Show that the system in Exercise 5 is weakly mixing but not strongly mixing.
Here is another characterisation of weak mixing:
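Exercise 9. Let $(X, \mathcal{X}, \mu, T)$ be a measure-preserving system. Show that the following are equivalent:
- $(X, \mathcal{X}, \mu, T)$ is weakly mixing.
- For every $f, g \in L^2(X, \mathcal{X}, \mu)$, $\langle T^n f, g \rangle$ converges in density to $(\int_X f\, d\mu)(\int_X \overline{g}\, d\mu)$. (See Exercise 1 for a definition of convergence in density.)
- For any measurable $E, F \subset X$, $\mu(T^n E \cap F)$ converges in density to $\mu(E) \mu(F)$.
- The product system $(X \times X, \mathcal{X} \times \mathcal{X}, \mu \times \mu, T \times T)$ is ergodic.
[Hints: To equate 1 and 2, use the decomposition $f = (f - \int_X f\, d\mu) + \int_X f\, d\mu$ of a function into its mean-free and mean components. To equate 2 and 4, use the fact that the space $L^2(X \times X, \mathcal{X} \times \mathcal{X}, \mu \times \mu)$ is spanned (in the topological vector space sense) by tensor products $f \otimes g$ with $f, g \in L^2(X, \mathcal{X}, \mu)$.]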
Exercise 10. Show that the equivalences between 1, 2, 3 in Exercise 9 remain if “weak mixing” and “converges in density” are replaced by “strong mixing” and “converges” respectively.
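Exercise 11. Let $(X, \mathcal{F}, T)$ be any minimal topological system with Borel $\sigma$-algebra $\mathcal{X}$, and let $\mu$ be a shift invariant Borel probability measure. Show that if $(X, \mathcal{X}, \mu, T)$ is weakly mixing (resp. strongly mixing), then $(X, \mathcal{F}, T)$ is topologically weakly mixing (resp. topologically mixing), as defined in Definition 3 and Exercise 12 of Lecture 7.
Exercise 12. If $(X, \mathcal{X}, \mu, T)$ is weakly mixing, show that $(X, \mathcal{X}, \mu, T^n)$ is weakly mixing for any non-zero n.
Exercise 13. Let $(X, \mathcal{X}, \mu, T)$ be a measure-preserving system. Show that the following are equivalent:
- $(X, \mathcal{X}, \mu, T)$ is weakly mixing.
- Whenever $(Y, \mathcal{Y}, \nu, S)$ is ergodic, the product system $(X \times Y, \mathcal{X} \times \mathcal{Y}, \mu \times \nu, T \times S)$ is ergodic.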
(Hint: To obtain 1 from 2, use Exercise 9. To obtain 2 from 1, repeat the methods used to prove Exercise 9.)
Exercise 14. Show that the product of two weakly mixing systems is again weakly mixing. (Hint: use Exercises 9 and 13.)
Now we come to an important type of observation for the purposes of establishing the Furstenberg recurrence theorem: in weakly mixing systems, functions of mean zero are negligible as far as multiple averages are concerned.
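Proposition 1. Let $a_1, \ldots, a_k \in \mathbb{Z}$ be distinct non-zero integers for some $k \geq 1$. Let $(X, \mathcal{X}, \mu, T)$ be weakly mixing, and let $f_1, \ldots, f_k \in L^\infty(X, \mathcal{X}, \mu)$ be such that at least one of $f_1, \ldots, f_k$ has mean zero. Then we have
$$C\text{-}\lim_{n \to \infty} T^{a_1 n} f_1 \ldots T^{a_k n} f_k = 0$$
in $L^2(X, \mathcal{X}, \mu)$.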
Proof. We induct on k. When k=1 the claim follows from the mean ergodic theorem and Exercise 12 (recall from Example 2 that all weakly mixing systems are ergodic).
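Now let $k \geq 2$ and suppose that the claim has already been proven for k-1. Without loss of generality we may assume that it is $f_k$ which has mean zero. Applying the van der Corput lemma (Lemma 1), it suffices to show that
$$C\text{-}\sup_{n \to \infty} \int_X T^{a_1 n} f_1 \ldots T^{a_k n} f_k \, \overline{T^{a_1 (n+h)} f_1 \ldots T^{a_k (n+h)} f_k}\, d\mu$$
converges in density to zero as $h \to \infty$. But the left-hand side can be rearranged as
$$C\text{-}\sup_{n \to \infty} \int_X f_{1,h}\, T^{(a_2 - a_1) n} f_{2,h} \ldots T^{(a_k - a_1) n} f_{k,h}\, d\mu$$
where $f_{j,h} := f_j\, T^{a_j h} \overline{f_j}$. Applying Cauchy-Schwarz, it suffices to show that
$$C\text{-}\sup_{n \to \infty} T^{(a_2 - a_1) n} f_{2,h} \ldots T^{(a_k - a_1) n} f_{k,h} \qquad (21)$$
converges in density to zero as $h \to \infty$.
Since $(X, \mathcal{X}, \mu, T)$ is weakly mixing, the mean-zero function $f_k$ is weakly mixing (also with respect to $T^{a_k}$, by Exercise 12), and so the mean $\int_X f_{k,h}\, d\mu = \langle f_k, T^{a_k h} f_k \rangle$ of $f_{k,h}$ goes to zero in density as $h \to \infty$. As all functions are assumed to be bounded, we can thus subtract the mean from $f_{k,h}$ in (21) without affecting the desired conclusion, leaving behind the mean-zero component $f_{k,h} - \int_X f_{k,h}\, d\mu$. But then the contribution of this expression to (21) vanishes by the induction hypothesis.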
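Remark 4. The key point here was that functions f of mean zero were weakly mixing and thus had the property that $f\, T^h \overline{f}$ almost had mean zero, and were thus almost weakly mixing. One could iterate this further to investigate the behaviour of “higher derivatives” of f such as $f\, T^h \overline{f}\, T^k \overline{f}\, T^{h+k} f$. Pursuing this analysis further leads to the Gowers-Host-Kra seminorms, which are closely related to the Gowers uniformity norms in additive combinatorics.
Corollary 2. Let $a_1, \ldots, a_k \in \mathbb{Z}$ be distinct integers for some $k \geq 1$, let $(X, \mathcal{X}, \mu, T)$ be a weakly mixing system, and let $f_1, \ldots, f_k \in L^\infty(X, \mathcal{X}, \mu)$. Then $\int_X T^{a_1 n} f_1 \ldots T^{a_k n} f_k\, d\mu$ converges in the Cesàro sense to $(\int_X f_1\, d\mu) \ldots (\int_X f_k\, d\mu)$.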
Note in particular that this establishes the Furstenberg recurrence theorem (Theorem 1 from Lecture 11) in the case of weakly mixing systems.
Proof. We again induct on k. The k=1 case is trivial, so suppose $k > 1$ and the claim has already been proven for k-1. If any of the functions $f_j$ is constant then the claim follows from the induction hypothesis, so we may subtract off the mean from each function and suppose that all functions have mean zero. By shift-invariance we may also fix (say) $a_1$ to be zero. The claim now follows from Proposition 1 and Cauchy-Schwarz.
Exercise 15. Show that the Cesàro convergence in Corollary 2 can be strengthened to convergence in density. (Hint: first reduce to the mean zero case, then apply Exercise 14 to work with the product system instead.)
Exercise 16. Let $(X, \mathcal{X}, \mu, T)$ be a weakly mixing system, and let $f \in L^\infty(X, \mathcal{X}, \mu)$ have mean zero. Show that $T^{n^2} f$ converges in the Cesàro sense in $L^2(X, \mathcal{X}, \mu)$ to zero. (Hint: use van der Corput and Proposition 1 or Corollary 2.)
Exercise 17. Show that Corollary 2 continues to hold if the linear polynomials $a_1 n, \ldots, a_k n$ are replaced by arbitrary polynomials $P_1(n), \ldots, P_k(n)$ from the integers to the integers, so long as the difference between any two of these polynomials is non-constant. (Hint: you will need the “PET induction” machinery from Exercise 3 of Lecture 5. This result was first established by Bergelson.)
— Hilbert-Schmidt operators —
We have now established the Furstenberg recurrence theorem for two distinct types of systems: compact systems and weakly mixing systems. From Example 2 we know that these systems are indeed quite distinct from each other. Here is another indication of “distinctness”:
Exercise 18. In any measure-preserving system $(X, \mathcal{X}, \mu, T)$, show that almost periodic functions and weakly mixing functions are always orthogonal to each other.
On the other hand, there are certainly systems which are neither weakly mixing nor compact (e.g. the skew shift). But we have the following important dichotomy (cf. Theorem 3 from Lecture 7):
Theorem 1. Suppose that $(X, \mathcal{X}, \mu, T)$ is a measure-preserving system. Then exactly one of the following statements is true:
- (Structure) $(X, \mathcal{X}, \mu, T)$ has a non-trivial compact factor.
- (Randomness) $(X, \mathcal{X}, \mu, T)$ is weakly mixing.
[Note: in ergodic theory, a factor of a measure-preserving system is simply a morphism from that system to some other measure-preserving system. Unlike the case with topological dynamics, we do not need to assume surjectivity of the morphism, since in the measure-theoretic setting, the image of a morphism always has full measure.]
In Example 2 we have already shown that 1 and 2 cannot be both true; the tricky part is to show that lack of weak mixing implies a non-trivial compact factor.
In order to prove this result, we recall some standard results about Hilbert-Schmidt operators on a separable Hilbert space. (As usual, the hypothesis of separability is not absolutely essential, but is convenient to assume throughout; for instance, it assures that orthonormal bases always exist and are at most countable.) We begin by recalling the notion of a tensor product of two Hilbert spaces:
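Proposition 2. Let $H, H'$ be two separable Hilbert spaces. Then there exists another separable Hilbert space $H \otimes H'$ and a bilinear tensor product map $\otimes: H \times H' \to H \otimes H'$ such that
$$\langle v \otimes v', w \otimes w' \rangle_{H \otimes H'} = \langle v, w \rangle_H \langle v', w' \rangle_{H'} \qquad (22)$$
for all $v, w \in H$ and $v', w' \in H'$. Furthermore, the tensor products $(e_n \otimes e'_{n'})$ between any orthonormal bases $(e_n)$, $(e'_{n'})$ of H and H’ respectively, form an orthonormal basis of $H \otimes H'$.
It is easy to see that $H \otimes H'$ is unique up to isomorphism, and so we shall abuse notation slightly and refer to $H \otimes H'$ as the tensor product of H and H’.
Example 3. The tensor product of $L^2(X, \mathcal{X}, \mu)$ and $L^2(Y, \mathcal{Y}, \nu)$ is $L^2(X \times Y, \mathcal{X} \times \mathcal{Y}, \mu \times \nu)$, with the tensor product operation $f \otimes g(x,y) := f(x) g(y)$. The tensor product of $\mathbb{C}^n$ and $\mathbb{C}^m$ is $\mathbb{C}^{nm}$, which can be thought of as the Hilbert space of $n \times m$ (or $m \times n$) matrices, with the inner product $\langle A, B \rangle = \mathrm{tr}(A B^*)$.
Proof. Take any orthonormal bases $(e_n)$ and $(e'_{n'})$ of H and H’ respectively, and let $H \otimes H'$ be the Hilbert space generated by declaring the formal quantities $e_n \otimes e'_{n'}$ to be an orthonormal basis. If one then defines
$$\Big( \sum_n a_n e_n \Big) \otimes \Big( \sum_{n'} b_{n'} e'_{n'} \Big) := \sum_n \sum_{n'} a_n b_{n'}\, e_n \otimes e'_{n'}$$
for all square-summable sequences $(a_n)$ and $(b_{n'})$, one easily verifies that $\otimes$ is indeed a bilinear map that obeys (22). In particular, if $(f_n)$ and $(f'_{n'})$ are some other orthonormal bases of $H, H'$ respectively, then from (22) $(f_n \otimes f'_{n'})$ is an orthonormal set, and one can approximate any element $e_n \otimes e'_{n'}$ in the original orthonormal basis to arbitrary accuracy by linear combinations from this orthonormal set, and so this set is in fact an orthonormal basis as required.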
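Given a Hilbert space H, define its complex conjugate $\overline{H}$ to be the same set as H, but with the conjugated scalar multiplication structure $(c, v) \mapsto \overline{c} v$ and the conjugated inner product $\langle v, w \rangle_{\overline{H}} := \overline{\langle v, w \rangle_H}$, but with all other structures unchanged. This is also a Hilbert space. (Of course, for real Hilbert spaces rather than complex, the notion of complex conjugation is trivial.)
Example 4. The conjugation map $f \mapsto \overline{f}$ is a Hilbert space isometry between the Hilbert space $L^2(X, \mathcal{X}, \mu)$ and its complex conjugate.
Every element $K \in \overline{H} \otimes H'$ induces a bounded linear operator $T_K: H \to H'$, defined via duality by the formula
$$\langle T_K v, v' \rangle_{H'} := \langle K, v \otimes v' \rangle_{\overline{H} \otimes H'}$$
for all $v \in H$ (viewed as an element of $\overline{H}$ on the right-hand side) and $v' \in H'$. We refer to K as the kernel of $T_K$. Any operator that arises in this manner is called a Hilbert-Schmidt operator from H to H’. The Hilbert space structure on the space $\overline{H} \otimes H'$ of kernels induces an analogous Hilbert space structure on the Hilbert-Schmidt operators, leading to the Hilbert-Schmidt norm $\|T\|_{HS}$ and inner product $\langle S, T \rangle_{HS}$ for such operators. Here are some other characterisations of this concept: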
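Exercise 19. Let $H, H'$ be Hilbert spaces with orthonormal bases $(e_n)$ and $(e'_{n'})$ respectively, and let $T: H \to H'$ be a bounded linear operator. Show that the following are equivalent:
- T is a Hilbert-Schmidt operator.
- $\sum_n \|T e_n\|_{H'}^2 < \infty$.
- $\sum_n \sum_{n'} |\langle T e_n, e'_{n'} \rangle_{H'}|^2 < \infty$.
Also, show that if $S, T: H \to H'$ are Hilbert-Schmidt operators, then
$$\langle S, T \rangle_{HS} = \sum_n \langle S e_n, T e_n \rangle_{H'} \quad \hbox{and} \quad \|T\|_{HS}^2 = \sum_n \|T e_n\|_{H'}^2 = \sum_n \sum_{n'} |\langle T e_n, e'_{n'} \rangle_{H'}|^2.$$
As one consequence of the above exercise, we see that the Hilbert-Schmidt norm controls the operator norm, thus $\|T v\|_{H'} \leq \|T\|_{HS} \|v\|_H$ for all vectors v.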
Remark 5. From this exercise and Fatou’s lemma, we see in particular that the limit (in either the norm, strong or weak operator topologies) of a sequence of Hilbert-Schmidt operators with uniformly bounded Hilbert-Schmidt norm, is still Hilbert-Schmidt. We also see that the composition of a Hilbert-Schmidt operator with a bounded operator is still Hilbert-Schmidt (thus the Hilbert-Schmidt operators can be viewed as a closed two-sided ideal in the space of bounded operators).
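Example 5. An operator $T: L^2(X, \mathcal{X}, \mu) \to L^2(Y, \mathcal{Y}, \nu)$ is Hilbert-Schmidt if and only if it takes the form $Tf(y) := \int_X K(x,y) f(x)\, d\mu(x)$ for some kernel $K \in L^2(X \times Y, \mathcal{X} \times \mathcal{Y}, \mu \times \nu)$, in which case the Hilbert-Schmidt norm is $\|K\|_{L^2(X \times Y, \mathcal{X} \times \mathcal{Y}, \mu \times \nu)}$. The Hilbert-Schmidt inner product is defined similarly.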
Example 6. The identity operator on an infinite-dimensional Hilbert space is never Hilbert-Schmidt, despite being bounded. On the other hand, every finite rank operator is Hilbert-Schmidt.
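In finite dimensions the Hilbert-Schmidt norm is the Frobenius norm of the matrix, which makes both of the above observations easy to test. The following is a small sketch under that assumption. The matrix `T`, the vector `v` and all helper names are arbitrary illustrative choices.

```python
import math

def hilbert_schmidt_norm(matrix):
    """Square root of the sum of squared magnitudes of all matrix entries."""
    return math.sqrt(sum(abs(entry) ** 2 for row in matrix for entry in row))

def apply(matrix, vector):
    """Matrix-vector product, with the matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def norm(vector):
    """Euclidean norm of a complex vector."""
    return math.sqrt(sum(abs(x) ** 2 for x in vector))

def identity(n):
    """The n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

T = [[1, 2j], [0.5, -1], [3, 0]]   # an arbitrary 3 x 2 matrix
v = [1 - 1j, 2]                    # an arbitrary vector

# The Hilbert-Schmidt norm controls the operator norm: |Tv| <= |T|_HS |v|.
print(norm(apply(T, v)), hilbert_schmidt_norm(T) * norm(v))

# The identity has operator norm 1 in every dimension, but HS norm sqrt(n).
print([hilbert_schmidt_norm(identity(n)) for n in (1, 4, 100)])
```

The growth of $\sqrt{n}$ is the finite-dimensional shadow of the fact that the identity on an infinite-dimensional space is bounded but not Hilbert-Schmidt.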
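One of the key properties of Hilbert-Schmidt operators which will be relevant to us is the following.
Lemma 2. Every Hilbert-Schmidt operator $T: H \to H'$ is compact.
Proof. Let $\varepsilon > 0$ be arbitrary. By Exercise 19 and monotone convergence, we can find a finite orthonormal set $e_1, \ldots, e_N$ such that $\sum_{n=1}^N \|T e_n\|_{H'}^2 \geq \|T\|_{HS}^2 - \varepsilon^2$, and in particular that $\|T v\|_{H'} \leq \varepsilon \|v\|_H$ for any $v$ orthogonal to $e_1, \ldots, e_N$. As a consequence, the image of the unit ball of H under T lies within $\varepsilon$ of the image of the unit ball of the finite-dimensional space $\mathrm{span}(e_1, \ldots, e_N)$. This image is therefore totally bounded and thus precompact.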
The following exercise may help illuminate the distinction between bounded operators, Hilbert-Schmidt operators, and compact operators:
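Exercise 20. Let $(\lambda_n)_{n \in \mathbb{N}}$ be a sequence of complex numbers, and consider the diagonal operator $T: (x_n)_{n \in \mathbb{N}} \mapsto (\lambda_n x_n)_{n \in \mathbb{N}}$ on $\ell^2(\mathbb{N})$.
- Show that T is a well-defined bounded linear operator on $\ell^2(\mathbb{N})$ if and only if the sequence $(\lambda_n)$ is bounded.
- Show that T is Hilbert-Schmidt if and only if the sequence $(\lambda_n)$ is square-summable.
- Show that T is compact if and only if the sequence $(\lambda_n)$ goes to zero as $n \to \infty$.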
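To get a feel for the gap between the second and third conditions, one can look at finite truncations of two diagonal operators. This is only a heuristic sketch, since finite truncations cannot prove anything about the infinite operator. The weights $1/n$ and $1/\sqrt{n}$ are illustrative choices, and both tend to zero.

```python
import math

def hs_norm_squared(diagonal):
    """Squared Hilbert-Schmidt norm of a finite diagonal matrix: sum of |lambda_n|^2."""
    return sum(abs(lam) ** 2 for lam in diagonal)

def truncation(weight, size):
    """First `size` diagonal entries weight(1), ..., weight(size)."""
    return [weight(n) for n in range(1, size + 1)]

for size in (10, 1000, 100000):
    # lambda_n = 1/n: the partial sums stay below pi^2 / 6.
    fast = hs_norm_squared(truncation(lambda n: 1 / n, size))
    # lambda_n = 1/sqrt(n): the partial sums are harmonic sums and keep growing.
    slow = hs_norm_squared(truncation(lambda n: 1 / math.sqrt(n), size))
    print(size, fast, slow)
```

The second family of truncations has unbounded Hilbert-Schmidt norm even though its entries tend to zero, which is consistent with compact operators forming a strictly larger class than Hilbert-Schmidt operators.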
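Now we apply the above theory to establish Theorem 1. Let $(X, \mathcal{X}, \mu, T)$ be a measure-preserving system, and let $f \in L^2(X, \mathcal{X}, \mu)$. The rank one operators $g \mapsto \langle g, T^n f \rangle T^n f$ can easily be verified to have a Hilbert-Schmidt norm of $\|f\|_{L^2}^2$, and so by the triangle inequality, their averages $S_{f,N}: g \mapsto \frac{1}{N} \sum_{n=0}^{N-1} \langle g, T^n f \rangle T^n f$ have a Hilbert-Schmidt norm of at most $\|f\|_{L^2}^2$. On the other hand, from the identity
$$\langle S_{f,N} g, h \rangle = \Big\langle \frac{1}{N} \sum_{n=0}^{N-1} (T \times T)^n (\overline{f} \otimes f), \overline{g} \otimes h \Big\rangle_{L^2(X \times X)}$$
and the mean ergodic theorem (applied to the product space) we see that $S_{f,N}$ converges in the weak operator topology to some limit $S_f$, which is then also Hilbert-Schmidt by Remark 5, and thus compact by Lemma 2. (Actually, $S_{f,N}$ converges to $S_f$ in the Hilbert-Schmidt norm, and thus also in the operator norm and in the strong topology: this is another application of the mean ergodic theorem, which we leave as an exercise. Since each of the $S_{f,N}$ is clearly finite rank, this gives a direct proof of the compactness of $S_f$.) Also, it is easy to see that $S_f$ is self-adjoint and commutes with T. As a consequence, we conclude that for any $g \in L^2(X, \mathcal{X}, \mu)$, the image $S_f g$ is almost periodic (since $\{ T^n S_f g: n \in \mathbb{Z} \} = S_f \{ T^n g: n \in \mathbb{Z} \}$ is the image of a bounded set by the compact operator $S_f$ and therefore precompact).
On the other hand, observe that
$$\langle S_f f, f \rangle = \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} |\langle T^n f, f \rangle|^2.$$
Thus by Definition 2 (and Exercise 1), we see that $\langle S_f f, f \rangle \neq 0$ whenever f is not weakly mixing. In particular, f is not orthogonal to the almost periodic function $S_f f$. From this and Exercise 18, we have thus shown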
Proposition 3. (Dichotomy between structure and randomness) Let $(X, \mathcal{X}, \mu, T)$ be a measure-preserving system. A function $f \in L^2(X, \mathcal{X}, \mu)$ is weakly mixing if and only if it is orthogonal to all almost periodic functions (or equivalently, orthogonal to all eigenfunctions).
Remark 6. Interestingly, essentially the same result appears in the spectral and scattering theory of linear Schrödinger equations, which in that context is known as the “RAGE theorem” (after Ruelle, Amrein-Georgescu, and Enss).
Remark 7. The finitary analogue of the expression $S_f f$ (where $S_f$ is the limit of the averaged operators $g \mapsto \frac{1}{N} \sum_{n=0}^{N-1} \langle g, T^n f \rangle T^n f$) is the dual function (of order 2) of f (the dual function of order 1 was briefly discussed in Lecture 8). If we are working on $\mathbb{Z}/N\mathbb{Z}$ with the usual shift, then $S_f$ can be viewed as a Fourier multiplier which multiplies the Fourier coefficient at $\xi$ by $|\hat{f}(\xi)|^2$; informally, $S_f$ filters out all the low amplitude frequencies of f, leaving only a handful of high-amplitude frequencies.
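The Fourier multiplier description can be verified directly on a small cyclic group. The sketch below assumes the finitary operator $S_f g := \frac{1}{N} \sum_{n} \langle g, T^n f \rangle T^n f$ with $T^n f(x) = f(x-n)$, a normalised inner product, and normalised Fourier coefficients. The test vectors `f` and `g` are arbitrary illustrative data.

```python
import cmath

def fourier(f):
    """Normalised coefficients: hat f(xi) = (1/N) sum_x f(x) e^{-2 pi i x xi / N}."""
    N = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / N) for x in range(N)) / N
            for xi in range(N)]

def inner(f, g):
    """Normalised inner product <f, g> = (1/N) sum_x f(x) conj(g(x))."""
    return sum(a * complex(b).conjugate() for a, b in zip(f, g)) / len(f)

def shift(f, n):
    """T^n f(x) = f(x - n) on Z/NZ."""
    N = len(f)
    return [f[(x - n) % N] for x in range(N)]

def dual_operator(f, g):
    """S_f g = (1/N) sum_n <g, T^n f> T^n f."""
    N = len(f)
    result = [0j] * N
    for n in range(N):
        shifted = shift(f, n)
        coefficient = inner(g, shifted)
        for x in range(N):
            result[x] += coefficient * shifted[x] / N
    return result

f = [1, 2, 0, -1, 0.5, 0, 0, 3]
g = [complex(x, 1 - x) for x in range(8)]

lhs = fourier(dual_operator(f, g))                                  # Fourier side of S_f g
rhs = [abs(fh) ** 2 * gh for fh, gh in zip(fourier(f), fourier(g))]  # multiplier |hat f|^2
max_error = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_error)
```

The agreement up to rounding error reflects the fact that averaging over all shifts kills every cross term between distinct frequencies.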
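Recall from Proposition 2 and Exercise 5 of Lecture 11 that a function $f \in L^2(X, \mathcal{X}, \mu)$ is almost periodic if and only if it is $\mathcal{Z}_1$-measurable, or if it lies in the pure point component $H_{pp}$ of the shift operator T. We thus have
Corollary 3. (Koopman-von Neumann theorem) Let $(X, \mathcal{X}, \mu, T)$ be a measure-preserving system, and let $f \in L^2(X, \mathcal{X}, \mu)$. Let $\mathcal{Z}_1$ be the $\sigma$-algebra generated by the eigenfunctions of T.
- f is almost periodic if and only if $f \in L^2(X, \mathcal{Z}_1, \mu)$ if and only if $f \in H_{pp}$.
- f is weakly mixing if and only if $\mathbb{E}(f | \mathcal{Z}_1) = 0$ a.e. if and only if $f \in H_c$ (corresponding to the continuous spectrum of T).
- In general, f has a unique decomposition $f = f_{U^\perp} + f_U$ into an almost periodic function $f_{U^\perp}$ and a weakly mixing function $f_U$. Indeed, $f_{U^\perp} = \mathbb{E}(f | \mathcal{Z}_1)$ and $f_U = f - \mathbb{E}(f | \mathcal{Z}_1)$.
Theorem 1 follows immediately from this Corollary. Indeed, if a system is not weakly mixing, then by the above Corollary we see that $\mathcal{Z}_1$ is non-trivial, and the identity map from $(X, \mathcal{X}, \mu, T)$ to $(X, \mathcal{Z}_1, \mu, T)$ yields a non-trivial compact factor.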
— Roth’s theorem —
As a quick application of the above machinery we give a proof of Roth’s theorem. We first need a variant of Corollary 2, which is proven by much the same means:
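Exercise 21. Let $(X, \mathcal{X}, \mu, T)$ be an ergodic measure-preserving system, let $a_1, a_2, a_3 \in \mathbb{Z}$ be distinct integers, and let $f_1, f_2, f_3 \in L^\infty(X, \mathcal{X}, \mu)$ with at least one of $f_1, f_2, f_3$ weakly mixing. Show that $C\text{-}\lim_{n \to \infty} \int_X T^{a_1 n} f_1\, T^{a_2 n} f_2\, T^{a_3 n} f_3\, d\mu = 0$.
Theorem 2 (Roth’s theorem). Let $(X, \mathcal{X}, \mu, T)$ be an ergodic measure-preserving system, and let $f \in L^\infty(X, \mathcal{X}, \mu)$ be non-negative with $\int_X f\, d\mu > 0$. Then
$$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X f\, T^n f\, T^{2n} f\, d\mu > 0.$$
Proof. We decompose $f = f_{U^\perp} + f_U$ into an almost periodic part $f_{U^\perp}$ and a weakly mixing part $f_U$, as in Corollary 3. The contribution of $f_U$ is negligible by Exercise 21, so it suffices to show that
$$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X f_{U^\perp}\, T^n f_{U^\perp}\, T^{2n} f_{U^\perp}\, d\mu > 0.$$
But as $f_{U^\perp}$ is almost periodic, the claim follows from Proposition 1 of Lecture 11.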
One can then immediately establish the k=3 case of Furstenberg’s theorem (Theorem 2 from Lecture 10) by combining the above result with the ergodic decomposition (Proposition 4 from Lecture 9). The k=3 case of Szemerédi’s theorem (i.e. Roth’s theorem) then follows from the Furstenberg correspondence principle (see Lecture 10).
Exercise 22. Let $(X, \mathcal{X}, \mu, T)$ be a measure-preserving system, and let $f \in L^\infty(X, \mathcal{X}, \mu)$ be non-negative. Show that for every $\varepsilon > 0$, one has $\int_X f\, T^n f\, d\mu \geq (\int_X f\, d\mu)^2 - \varepsilon$ for infinitely many n. (Hint: first show this when f is almost periodic, and then use Corollary 1 and Corollary 3 to prove the general case.) This is a simplified version of the Khintchine recurrence theorem, which asserts that the set of such n is not only infinite, but is also syndetic. Analogues of the Khintchine recurrence theorem hold for double recurrence but not for triple recurrence; see this paper of Bergelson, Host, and Kra for details.
[Update, Mar 3: Added the observation that as the averaged operators $S_{f,N}$ converge to their limit $S_f$ in the operator norm, $S_f$ is the limit of finite rank operators and thus clearly compact. Thanks to Guatam Zubin for this remark.]