We now begin our study of *measure-preserving systems* , i.e. a probability space together with a probability space isomorphism (thus is invertible, with T and both being measurable, and for all and all n). For various technical reasons it is convenient to restrict to the case when the -algebra is separable, i.e. countably generated. One reason for this is as follows:

**Exercise 1.** Let be a probability space with separable. Then the Banach spaces are separable (i.e. have a countable dense subset) for every ; in particular, the Hilbert space is separable. Show that the claim can fail for . (We allow the spaces to be either real or complex valued, unless otherwise specified.)

**Remark 1. **In practice, the requirement that be separable is not particularly onerous. For instance, if one is studying the recurrence properties of a function on a non-separable measure-preserving system , one can restrict to the separable sub--algebra generated by the level sets for integer n and rational q, thus passing to a separable measure-preserving system on which f is still measurable. Thus we see that in many cases of interest, we can immediately reduce to the separable case. (In particular, for many of the theorems in this course, the hypothesis of separability can be dropped, though we won’t bother to specify for which ones this is the case.)

We are interested in the recurrence properties of sets or functions . The simplest such recurrence theorem is

Theorem 1.(Poincaré recurrence theorem) Let be a measure-preserving system, and let be a set of positive measure. Then . In particular, has positive measure (and is thus non-empty) for infinitely many n.

(Compare with Theorem 1 of Lecture 3.)

**Proof.** For any integer , observe that , and thus by Cauchy-Schwarz

(1)

The left-hand side of (1) can be rearranged as

(2)

On the other hand, . From this one easily obtains the asymptotic

(3)

where o(1) denotes an expression which goes to zero as N goes to infinity. Combining (1), (2), (3) and taking limits as we obtain

as desired.

**Remark 2.** In classical physics, the evolution of a physical system in a compact phase space is given by a (continuous-time) measure-preserving system (this is Hamilton’s equations of motion combined with Liouville’s theorem). The Poincaré recurrence theorem then has the following unintuitive consequence: every collection E of states of positive measure, no matter how small, must eventually return to overlap itself given sufficient time. For instance, if one were to burn a piece of paper in a closed system, then there exist arbitrarily small perturbations of the initial conditions such that, if one waits long enough, the piece of paper will eventually reassemble (modulo arbitrarily small error)! This seems to contradict the second law of thermodynamics, but the reason for the discrepancy is because the time required for the recurrence theorem to take effect is inversely proportional to the measure of the set E, which in physical situations is exponentially small in the number of degrees of freedom (which is already typically quite large, e.g. of the order of the Avogadro constant). This gives more than enough opportunity for Maxwell’s demon to come into play to reverse the increase of entropy. (This can be viewed as a manifestation of the curse of dimensionality.) The more sophisticated recurrence theorems we will see later have much poorer quantitative bounds still, so much so that they basically have no direct significance for any physical dynamical system with many relevant degrees of freedom.

**Exercise 2.** Prove the following generalisation of the Poincaré recurrence theorem: if is a measure-preserving system and is non-negative, then .

**Exercise 3.** Give examples to show that the quantity in the conclusion of Theorem 1 cannot be replaced by any larger quantity in general, regardless of the actual value of . (Hint: use a Bernoulli system example.)

**Exercise 4.** Using the pigeonhole principle instead of the Cauchy-Schwarz inequality (and in particular, the statement that if , then the sets cannot all be disjoint), prove the weaker statement that for any set E of positive measure in a measure-preserving system, the set is non-empty for infinitely many n. (This exercise illustrates the general point that the Cauchy-Schwarz inequality can be viewed as a quantitative strengthening of the pigeonhole principle.)

For this lecture and the next we shall study several variants of the Poincaré recurrence theorem. We begin by looking at the mean ergodic theorem, which studies the limiting behaviour of the ergodic averages in various spaces, and in particular in .

— Hilbert space formulation —

We begin with the Hilbert space formulation of the mean ergodic theorem, due to von Neumann.

Theorem 2.(Von Neumann ergodic theorem) Let be a unitary operator on a separable Hilbert space H. Then for every we have, (5)

where is the orthogonal projection from H to the closed subspace consisting of the U-invariant vectors.

**Proof.** We give the slick (but not particularly illuminating) proof of Riesz. It is clear that (5) holds if v is already invariant (i.e. ). Next, let W denote the (possibly non-closed) space . If Uw-w lies in W and v lies in , then by unitarity

(6)

and thus W is orthogonal to . In particular . From the telescoping identity

(7)

we conclude that (5) also holds if ; by linearity we conclude that (5) holds for all v in . A standard limiting argument (using the fact that the linear transformations and are bounded on H, uniformly in n) then shows that (5) holds for v in the closure .

To conclude, it suffices to show that the closed space is all of H. Suppose for contradiction that this is not the case. Then there exists a non-zero vector w which is orthogonal to all of . In particular, w is orthogonal to Uw – w. Applying the easily verified identity (related to the parallelogram law) we conclude that Uw=w, thus w lies in . This implies that w is orthogonal to itself and is thus zero, a contradiction.

On a measure-preserving system , the shift map is a unitary transformation on the separable Hilbert space . We conclude

Corollary 1.(mean ergodic theorem) Let be a measure-preserving system, and let . Then we have converges in norm to , where is the orthogonal projection to the space consists of the shift-invariant functions in .

**Example 4.** (Finite case) Suppose that is a finite measure-preserving system, with discrete and the uniform probability measure. Then T is a permutation on X and thus decomposes as the direct sum of disjoint cycles (possibly including trivial cycles of length 1). Then the shift-invariant functions are precisely those functions which are constant on each of these cycles, and the map replaces a function with its average value on each of these cycles. It is then an instructive exercise to verify the mean ergodic theorem by hand in this case.

**Exercise 5.** With the notation and assumptions of Corollary 1, show that the limit exists, is real, and is greater than or equal to . (Hint: the constant function lies in .) Note that this is stronger than the conclusion of Exercise 2.

Let us now give some other proofs of the von Neumann ergodic theorem. We first give a proof (close to von Neumann’s original proof) using the spectral theorem for unitary operators. This theorem asserts (among other things) that a unitary operator can be expressed in the form , where is the unit circle and is a projection-valued Borel measure on the circle. More generally, we have

(8)

and so for any vector v in H and any positive integer N

. (9)

We separate off the portion of this integral. For , we have the geometric series formula

(10)

(compare with (7)), thus we can rewrite (9) as

. (11)

Now observe (using (10)) that is bounded in magnitude by 1 and converges to zero as for any fixed . Applying the dominated convergence theorem (which requires a little bit of justification in this vector-valued case), we see that the second term in (11) goes to zero as . So we see that (9) converges to . But is just the orthogonal projection to the eigenspace of U with eigenvalue 1, i.e. the space , thus recovering the von Neumann ergodic theorem. (It is instructive to use spectral theory to interpret Riesz’s proof of this theorem and see how it relates to the argument just given.)

**Remark 3. ** The above argument in fact shows that the rate of convergence in the von Neumann ergodic theorem is controlled by the spectral gap of U – i.e. how well-separated the trivial component of the spectrum is from the rest of the spectrum. This is one of the reasons why results on spectral gaps of various operators are highly prized.

We now give another proof of Theorem 2, based on the *energy decrement method*; this proof is significantly lengthier, but is particularly well suited for conversion to finitary quantitative settings. For any positive integer N, define the averaging operators ; by the triangle inequality we see that for all v. Now we observe

Lemma 1.(Lack of uniformity implies energy decrement) Suppose . Then .

**Proof.** This follows from the identity

(12)

and the fact that has operator norm at most 1.

We now iterate this to obtain

Proposition 1.(Koopman-von Neumann type theorem) Let v be a unit vector, let , and let be a sequence of integers with . Then there exists and a decomposition where and for all .

(The letters s, r stand for “structured” and “random” (or “residual”) respectively. For more on decompositions into structured and random components, see my FOCS lecture notes.)

Proof. We perform the following algorithm:

- Initialise j := J-1, s := 0, and r := v.
- If for all then STOP. If instead for some , observe from Lemma 1 that .
- Replace r with , replace s with , and replace j with j-1. Then return to Step 2.

Observe that this procedure must terminate in at most steps (since the energy starts at 1, drops by at least at each stage, and cannot go below zero). In particular, j stays positive. Observe also that r always has norm at most 1, and thus at any given stage of the algorithm. From this and the triangle inequality one easily verifies the required claims.

Corollary 2(partial von Neumann ergodic theorem). For any vector v, the averages form a Cauchy sequence in H.

**Proof.** Without loss of generality we can take v to be a unit vector. Suppose for contradiction that was not Cauchy. Then one could find and such that (say) for all j. By sparsifying the sequence if necessary we can assume that is large compared to , and . Now we apply Proposition 1 to find and a decomposition v = s + r such that and . If is large enough depending on , we thus have , and thus by the triangle inequality, , a contradiction.

**Remark 4.** This result looks weaker than Theorem 2, but the argument is much more robust; for instance, one can modify it to establish convergence of multiple averages such as in norms for commuting shifts , which does not seem possible using the other arguments given here; see this paper of mine for details. Further quantitative analysis of the mean ergodic theorem can be found in this paper of Avigad, Gerhardy, and Towsner.

Corollary 2 can be used to recover Theorem 2 in its full strength, by combining it with a weak form of Theorem 2:

Proposition 2(Weak von Neumann ergodic theorem) The conclusion (5) of Theorem 2 holds in the weak topology.

**Proof.** The averages lie in a bounded subset of the separable Hilbert space H, and are thus precompact in the weak topology by the sequential Banach-Alaoglu theorem. Thus, if (5) fails, then there exists a subsequence which converges in the weak topology to some limit w other than . By telescoping series we see that , and so on taking limits we see that , i.e. . On the other hand, if y is any vector in , then , and thus on taking inner products with v we obtain . Taking limits we obtain , i.e. v-w is orthogonal to . These facts imply that , giving the desired contradiction.

— Conditional expectation —

We now turn away from the abstract Hilbert approach to the ergodic theorem (which is excellent for proving the mean ergodic theorem, but not flexible enough to handle more general ergodic theorems) and turn to a more measure-theoretic dynamics approach, based on manipulating the four components of the underlying system separately, rather than working with the single object (with the unitary shift T). In particular it is useful to replace the -algebra by a sub--algebra , thus reducing the number of measurable functions. This creates an isometric embedding of Hilbert spaces

(13)

and so the former space is a closed subspace of the latter. In particular, we have an orthogonal projection , which can be viewed as the adjoint of the inclusion (13). In other words, for any , is the unique element of such that

(14)

for all . (A reminder: when dealing with spaces, we identify any two functions which agree -almost everywhere. Thus, technically speaking, elements of spaces are not actually functions, but rather equivalence classes of functions.)

**Example 5**. (Finite case) Let X be a finite set, thus can be viewed as a partition of X, and is a coarser partition of X. To avoid degeneracies, assume that every point in X has positive measure with respect to . Then an element f of is just a function which is constant on each atom of . Similarly for . The conditional expectation is then the function whose value on each atom A of is equal to the average value on that atom. (What needs to be changed here if some points have zero measure?)

We leave the following standard properties of conditional expectation as an exercise.

**Exercise 6.** Let be a probability space, and let be a sub--algebra. Let .

- The operator is a bounded self-adjoint projection on . It maps real functions to real functions, it preserves constant functions (and more generally preserves -measurable functions), and commutes with complex conjugation.
- If f is non-negative, then is non-negative (up to sets of measure zero, of course). More generally, we have a
*comparison principle*: if f, g are real-valued and pointwise a. e., then a.e. Similarly, we have the*triangle inequality*a.e.. - (Module property) If , then a.e..
- (Contraction) If for some , then . (Hint: do the p=1 and cases first.) This implies in particular that conditional expectation has a unique continuous extension to for (the case is exceptional, but note that is contained in since is finite).

For applications to ergodic theory, we will only be interested in taking conditional expectations with respect to a *shift-invariant* sub--algebra , thus and preserve . In that case T preserves , and thus T commutes with conditional expectation, or in other words that

(15)

a.e. for all and all n.

Now we connect conditional expectation to the mean ergodic theorem. Let be the set of essentially shift-invariant sets. One easily verifies that this is a shift-invariant sub--algebra of .

**Exercise 7.** Show that if E lies in , then there exists a set which is genuinely invariant (TF=F) and which differs from E only by a set of measure zero. Thus it does not matter whether we deal with shift-invariance or essential shift-invariance here. (More generally, it will not make any significant difference if we modify any of the sets in our -algebras by null sets.)

The relevance of this algebra to the mean ergodic theorem arises from the following identity:

**Exercise 8. **Show that .

As a corollary of this and Corollary 1, we have

Corollary 2.(Mean ergodic theorem, again) Let be a measure-preserving system. Then for any , the averages converge in norm to .

**Exercise 9. **Show that Corollary 2 continues to hold if is replaced throughout by for any . (Hint: for the case , use that is dense in . For the case , use that is dense in .) What happens when ?

Let us now give another proof of Corollary 2 (leading to a fourth proof of the mean ergodic theorem). The key here will be the decomposition , where is the “structured” part of f (at least as far as the mean ergodic theorem is concerned) and is the “random” part. (The subscripts stand for “anti-uniform” and “uniform” respectively; this notation is not standard.) As is shift-invariant, we clearly have

(16)

so it suffices to show that

(17)

as . But we can expand out the left-hand side (using the unitarity of T) as

(18)

where is the *dual function* of f_U, defined as

. (19)

Now, from the triangle inequality we know that the sequence of dual functions is uniformly bounded in norm, and so by Cauchy-Schwarz we know that the inner products are bounded. If they converge to zero, we are done; otherwise, by the Bolzano-Weierstrass theorem, we have for some subsequence and some non-zero c.

(One could also use ultrafilters instead of subsequences here if desired, it makes little difference to the argument.) By the Banach-Alaoglu theorem (or more precisely, the sequential version of this in the separable case), there is a further subsequence which converges weakly (or equivalently in this Hilbert space case, in the weak-* sense) to some limit . Since c is non-zero, must also be non-zero. On the other hand, from telescoping series one easily computes that decays like O(1/N) as , so on taking limits we have . In other words, lies in .

On the other hand, by construction of we have . From (15) and linearity we conclude that for all N, so on taking limits we have . But since is already in , we conclude , a contradiction.

**Remark 5. **This argument is lengthier than some of the other proofs of the mean ergodic theorem, but it turns out to be fairly robust; it demonstrates (using the compactness properties of certain “dual functions”) that a function with sufficiently strong “mixing” properties (in this case, we require that ) will cancel itself out when taking suitable ergodic averages, thus reducing the study of averages of f to the study of averages of . In the modern jargon, this means that is (the -algebra induced by) a characteristic factor of the ergodic average . We will see further examples of characteristic factors for other averages later in this course.

**Exercise 10. ** Let be a countably infinite discrete group. A *Følner sequence* is a sequence of increasing finite non-empty sets in with with the property that for any given finite set , we have as , where is the product set of and S, denotes the cardinality of , and denotes symmetric difference. (For instance, in the case , the sequence is a Følner sequence.) If acts (on the left) in a measure-preserving manner on a probability space , and , show that converges in to , where is the collection of all measurable sets which are -invariant modulo null sets, and is the function .

[*Update*, Jan 30: exercise corrected, another exercise added.]

[*Update*, Feb 1: Some corrections.]

[*Update*, Feb 4: Ergodic averages changed to sum over 0 to N-1 rather than over 1 to N.]

[*Update*, Feb 11: Discussion of the ergodic theorem in weak topologies (Proposition 2) added.]

[*Update*, Feb 21: Exercise 10 corrected.]

[*Update,* Mar 31: Exercise 5 corrected.]

[*Update*, Jun 14: Some corrections.]

## 63 comments

Comments feed for this article

3 November, 2018 at 11:31 pm

Maths studentUnfortunately, there seems to have been a flaw in my argument, but I believe also that I see how it may be corrected.

4 November, 2018 at 12:09 am

Maths studentApparently, the “slick” proof is the best way forward, because it seems as though no matter what, one needs to exploit the dychotomy between being in the invariant space and being orthogonal to it. Too bad that that one computation I made is not valid…

30 November, 2020 at 11:59 pm

AdamDear professor

I read your proof of theorem 1, and I have one question. Why inequality (3) is true?. I have tried to prove it using intuition, that the greatest element of a given set is greater than arithmetic mean of this set. I would be grateful if you gave me at least a hint.

Regards

4 December, 2020 at 12:43 pm

Terence TaoHint: For any one has such that for all except possibly for those pairs for which , for which one can instead use the trivial bound .