We now begin our study of *measure-preserving systems* , i.e. a probability space together with a probability space isomorphism (thus is invertible, with T and both being measurable, and for all and all n). For various technical reasons it is convenient to restrict to the case when the -algebra is separable, i.e. countably generated. One reason for this is as follows:

**Exercise 1.** Let be a probability space with separable. Then the Banach spaces are separable (i.e. have a countable dense subset) for every ; in particular, the Hilbert space is separable. Show that the claim can fail for . (We allow the spaces to be either real or complex valued, unless otherwise specified.)

**Remark 1. **In practice, the requirement that be separable is not particularly onerous. For instance, if one is studying the recurrence properties of a function on a non-separable measure-preserving system , one can restrict to the separable sub--algebra generated by the level sets for integer n and rational q, thus passing to a separable measure-preserving system on which f is still measurable. Thus we see that in many cases of interest, we can immediately reduce to the separable case. (In particular, for many of the theorems in this course, the hypothesis of separability can be dropped, though we won’t bother to specify for which ones this is the case.)

We are interested in the recurrence properties of sets or functions . The simplest such recurrence theorem is

Theorem 1.(Poincaré recurrence theorem) Let be a measure-preserving system, and let be a set of positive measure. Then . In particular, has positive measure (and is thus non-empty) for infinitely many n.

(Compare with Theorem 1 of Lecture 3.)

**Proof.** For any integer , observe that , and thus by Cauchy-Schwarz

(1)

The left-hand side of (1) can be rearranged as

(2)

On the other hand, . From this one easily obtains the asymptotic

(3)

where o(1) denotes an expression which goes to zero as N goes to infinity. Combining (1), (2), (3) and taking limits as we obtain

as desired.

**Remark 2.** In classical physics, the evolution of a physical system in a compact phase space is given by a (continuous-time) measure-preserving system (this is Hamilton’s equations of motion combined with Liouville’s theorem). The Poincaré recurrence theorem then has the following unintuitive consequence: every collection E of states of positive measure, no matter how small, must eventually return to overlap itself given sufficient time. For instance, if one were to burn a piece of paper in a closed system, then there exist arbitrarily small perturbations of the initial conditions such that, if one waits long enough, the piece of paper will eventually reassemble (modulo arbitrarily small error)! This seems to contradict the second law of thermodynamics, but the reason for the discrepancy is because the time required for the recurrence theorem to take effect is inversely proportional to the measure of the set E, which in physical situations is exponentially small in the number of degrees of freedom (which is already typically quite large, e.g. of the order of the Avogadro constant). This gives more than enough opportunity for Maxwell’s demon to come into play to reverse the increase of entropy. (This can be viewed as a manifestation of the curse of dimensionality.) The more sophisticated recurrence theorems we will see later have much poorer quantitative bounds still, so much so that they basically have no direct significance for any physical dynamical system with many relevant degrees of freedom.

**Exercise 2.** Prove the following generalisation of the Poincaré recurrence theorem: if is a measure-preserving system and is non-negative, then .

**Exercise 3.** Give examples to show that the quantity in the conclusion of Theorem 1 cannot be replaced by any smaller quantity in general, regardless of the actual value of . (Hint: use a Bernoulli system example.)

**Exercise 4.** Using the pigeonhole principle instead of the Cauchy-Schwarz inequality (and in particular, the statement that if , then the sets cannot all be disjoint), prove the weaker statement that for any set E of positive measure in a measure-preserving system, the set is non-empty for infinitely many n. (This exercise illustrates the general point that the Cauchy-Schwarz inequality can be viewed as a quantitative strengthening of the pigeonhole principle.)

For this lecture and the next we shall study several variants of the Poincaré recurrence theorem. We begin by looking at the mean ergodic theorem, which studies the limiting behaviour of the ergodic averages in various spaces, and in particular in .

– Hilbert space formulation –

We begin with the Hilbert space formulation of the mean ergodic theorem, due to von Neumann.

Theorem 2.(Von Neumann ergodic theorem) Let be a unitary operator on a separable Hilbert space H. Then for every we have, (5)

where is the orthogonal projection from H to the closed subspace consisting of the U-invariant vectors.

**Proof.** We give the slick (but not particularly illuminating) proof of Riesz. It is clear that (5) holds if v is already invariant (i.e. ). Next, let W denote the (possibly non-closed) space . If Uw-w lies in W and v lies in , then by unitarity

(6)

and thus W is orthogonal to . In particular . From the telescoping identity

(7)

we conclude that (5) also holds if ; by linearity we conclude that (5) holds for all v in . A standard limiting argument (using the fact that the linear transformations and are bounded on H, uniformly in n) then shows that (5) holds for v in the closure .

To conclude, it suffices to show that the closed space is all of H. Suppose for contradiction that this is not the case. Then there exists a non-zero vector w which is orthogonal to all of . In particular, w is orthogonal to Uw – w. Applying the easily verified identity (related to the parallelogram law) we conclude that Uw=w, thus w lies in . This implies that w is orthogonal to itself and is thus zero, a contradiction.

On a measure-preserving system , the shift map is a unitary transformation on the separable Hilbert space . We conclude

Corollary 1.(mean ergodic theorem) Let be a measure-preserving system, and let . Then we have converges in norm to , where is the orthogonal projection to the space consists of the shift-invariant functions in .

**Example 4.** (Finite case) Suppose that is a finite measure-preserving system, with discrete and the uniform probability measure. Then T is a permutation on X and thus decomposes as the direct sum of disjoint cycles (possibly including trivial cycles of length 1). Then the shift-invariant functions are precisely those functions which are constant on each of these cycles, and the map replaces a function with its average value on each of these cycles. It is then an instructive exercise to verify the mean ergodic theorem by hand in this case.

**Exercise 5.** With the notation and assumptions of Corollary 1, show that the limit exists, is real, and is greater than or equal to . (Hint: the constant function lies in .) Note that this is stronger than the conclusion of Exercise 2.

Let us now give some other proofs of the von Neumann ergodic theorem. We first give a proof (close to von Neumann’s original proof) using the spectral theorem for unitary operators. This theorem asserts (among other things) that a unitary operator can be expressed in the form , where is the unit circle and is a projection-valued Borel measure on the circle. More generally, we have

(8)

and so for any vector v in H and any positive integer N

. (9)

We separate off the portion of this integral. For , we have the geometric series formula

(10)

(compare with (7)), thus we can rewrite (9) as

. (11)

Now observe (using (10)) that is bounded in magnitude by 1 and converges to zero as for any fixed . Applying the dominated convergence theorem (which requires a little bit of justification in this vector-valued case), we see that the second term in (11) goes to zero as . So we see that (9) converges to . But is just the orthogonal projection to the eigenspace of U with eigenvalue 1, i.e. the space , thus recovering the von Neumann ergodic theorem. (It is instructive to use spectral theory to interpret Riesz’s proof of this theorem and see how it relates to the argument just given.)

**Remark 3. ** The above argument in fact shows that the rate of convergence in the von Neumann ergodic theorem is controlled by the spectral gap of U – i.e. how well-separated the trivial component of the spectrum is from the rest of the spectrum. This is one of the reasons why results on spectral gaps of various operators are highly prized.

We now give another proof of Theorem 2, based on the *energy decrement method*; this proof is significantly lengthier, but is particularly well suited for conversion to finitary quantitative settings. For any positive integer N, define the averaging operators ; by the triangle inequality we see that for all v. Now we observe

Lemma 1.(Lack of uniformity implies energy decrement) Suppose . Then .

**Proof.** This follows from the identity

(12)

and the fact that has operator norm at most 1.

We now iterate this to obtain

Proposition 1.(Koopman-von Neumann type theorem) Let v be a unit vector, let , and let be a sequence of integers with . Then there exists and a decomposition where and for all .

(The letters s, r stand for “structured” and “random” (or “residual”) respectively. For more on decompositions into structured and random components, see my FOCS lecture notes.)

Proof. We perform the following algorithm:

- Initialise j := J-1, s := 0, and r := v.
- If for all then STOP. If instead for some , observe from Lemma 1 that .
- Replace r with , replace s with , and replace j with j-1. Then return to Step 2.

Observe that this procedure must terminate in at most steps (since the energy starts at 1, drops by at least at each stage, and cannot go below zero). In particular, j stays positive. Observe also that r always has norm at most 1, and thus at any given stage of the algorithm. From this and the triangle inequality one easily verifies the required claims.

Corollary 2(partial von Neumann ergodic theorem). For any vector v, the averages form a Cauchy sequence in H.

**Proof.** Without loss of generality we can take v to be a unit vector. Suppose for contradiction that was not Cauchy. Then one could find and such that (say) for all j. By sparsifying the sequence if necessary we can assume that is large compared to , and . Now we apply Proposition 1 to find and a decomposition v = s + r such that and . If is large enough depending on , we thus have , and thus by the triangle inequality, , a contradiction.

**Remark 4.** This result looks weaker than Theorem 2, but the argument is much more robust; for instance, one can modify it to establish convergence of multiple averages such as in norms for commuting shifts , which does not seem possible using the other arguments given here; see this paper of mine for details. Further quantitative analysis of the mean ergodic theorem can be found in this paper of Avigad, Gerhardy, and Towsner.

Corollary 2 can be used to recover Theorem 2 in its full strength, by combining it with a weak form of Theorem 2:

Proposition 2(Weak von Neumann ergodic theorem) The conclusion (5) of Theorem 2 holds in the weak topology.

**Proof.** The averages lie in a bounded subset of the separable Hilbert space H, and are thus precompact in the weak topology by the sequential Banach-Alaoglu theorem. Thus, if (5) fails, then there exists a subsequence which converges in the weak topology to some limit w other than . By telescoping series we see that , and so on taking limits we see that , i.e. . On the other hand, if y is any vector in , then , and thus on taking inner products with v we obtain . Taking limits we obtain , i.e. v-w is orthogonal to . These facts imply that , giving the desired contradiction.

– Conditional expectation –

We now turn away from the abstract Hilbert approach to the ergodic theorem (which is excellent for proving the mean ergodic theorem, but not flexible enough to handle more general ergodic theorems) and turn to a more measure-theoretic dynamics approach, based on manipulating the four components of the underlying system separately, rather than working with the single object (with the unitary shift T). In particular it is useful to replace the -algebra by a sub--algebra , thus reducing the number of measurable functions. This creates an isometric embedding of Hilbert spaces

(13)

and so the former space is a closed subspace of the latter. In particular, we have an orthogonal projection , which can be viewed as the adjoint of the inclusion (13). In other words, for any , is the unique element of such that

(14)

for all . (A reminder: when dealing with spaces, we identify any two functions which agree -almost everywhere. Thus, technically speaking, elements of spaces are not actually functions, but rather equivalence classes of functions.)

**Example 5**. (Finite case) Let X be a finite set, thus can be viewed as a partition of X, and is a coarser partition of X. To avoid degeneracies, assume that every point in X has positive measure with respect to . Then an element f of is just a function which is constant on each atom of . Similarly for . The conditional expectation is then the function whose value on each atom A of is equal to the average value on that atom. (What needs to be changed here if some points have zero measure?)

We leave the following standard properties of conditional expectation as an exercise.

**Exercise 6.** Let be a probability space, and let be a sub--algebra. Let .

- The operator is a bounded self-adjoint projection on . It maps real functions to real functions, it preserves constant functions (and more generally preserves -valued functions), and commutes with complex conjugation.
- If f is non-negative, then is non-negative (up to sets of measure zero, of course). More generally, we have a
*comparison principle*: if f, g are real-valued and pointwise a. e., then a.e. Similarly, we have the*triangle inequality*a.e.. - (Module property) If , then a.e..
- (Contraction) If for some , then . (Hint: do the p=1 and cases first.) This implies in particular that conditional expectation has a unique continuous extension to for (the case is exceptional, but note that is contained in since is finite).

For applications to ergodic theory, we will only be interested in taking conditional expectations with respect to a *shift-invariant* sub--algebra , thus and preserve . In that case T preserves , and thus T commutes with conditional expectation, or in other words that

(15)

a.e. for all and all n.

Now we connect conditional expectation to the mean ergodic theorem. Let be the set of essentially shift-invariant sets. One easily verifies that this is a shift-invariant sub--algebra of .

**Exercise 7.** Show that if E lies in , then there exists a set which is genuinely invariant (TF=F) and which differs from E only by a set of measure zero. Thus it does not matter whether we deal with shift-invariance or essential shift-invariance here. (More generally, it will not make any significant difference if we modify any of the sets in our -algebras by null sets.)

The relevance of this algebra to the mean ergodic theorem arises from the following identity:

**Exercise 8. **Show that .

As a corollary of this and Corollary 1, we have

Corollary 2.(Mean ergodic theorem, again) Let be a measure-preserving system. Then for any , the averages converge in norm to .

**Exercise 9. **Show that Corollary 2 continues to hold if is replaced throughout by for any . (Hint: for the case , use that is dense in . For the case , use that is dense in .) What happens when ?

Let us now give another proof of Corollary 2 (leading to a fourth proof of the mean ergodic theorem). The key here will be the decomposition , where is the “structured” part of f (at least as far as the mean ergodic theorem is concerned) and is the “random” part. (The subscripts stand for “anti-uniform” and “uniform” respectively; this notation is not standard.) As is shift-invariant, we clearly have

(16)

so it suffices to show that

(17)

as . But we can expand out the left-hand side (using the unitarity of T) as

(18)

where is the *dual function* of f_U, defined as

. (19)

Now, from the triangle inequality we know that the sequence of dual functions is uniformly bounded in norm, and so by Cauchy-Schwarz we know that the inner products are bounded. If they converge to zero, we are done; otherwise, by the Bolzano-Weierstrass theorem, we have for some subsequence and some non-zero c.

(One could also use ultrafilters instead of subsequences here if desired, it makes little difference to the argument.) By the Banach-Alaoglu theorem (or more precisely, the sequential version of this in the separable case), there is a further subsequence which converges weakly (or equivalently in this Hilbert space case, in the weak-* sense) to some limit . Since c is non-zero, must also be non-zero. On the other hand, from telescoping series one easily computes that decays like O(1/N) as , so on taking limits we have . In other words, lies in .

On the other hand, by construction of we have . From (15) and linearity we conclude that for all N, so on taking limits we have . But since is already in , we conclude , a contradiction.

**Remark 5. **This argument is lengthier than some of the other proofs of the mean ergodic theorem, but it turns out to be fairly robust; it demonstrates (using the compactness properties of certain “dual functions”) that a function with sufficiently strong “mixing” properties (in this case, we require that ) will cancel itself out when taking suitable ergodic averages, thus reducing the study of averages of f to the study of averages of . In the modern jargon, this means that is (the -algebra induced by) a characteristic factor of the ergodic average . We will see further examples of characteristic factors for other averages later in this course.

**Exercise 10. ** Let be a countably infinite discrete group. A *Følner sequence* is a sequence of increasing finite non-empty sets in with with the property that for any given finite set , we have as , where is the product set of and S, denotes the cardinality of , and denotes symmetric difference. (For instance, in the case , the sequence is a Følner sequence.) If acts (on the left) in a measure-preserving manner on a probability space , and , show that converges in to , where is the collection of all measurable sets which are -invariant modulo null sets, and is the function .

[*Update*, Jan 30: exercise corrected, another exercise added.]

[*Update*, Feb 1: Some corrections.]

[*Update*, Feb 4: Ergodic averages changed to sum over 0 to N-1 rather than over 1 to N.]

[*Update*, Feb 11: Discussion of the ergodic theorem in weak topologies (Proposition 2) added.]

[*Update*, Feb 21: Exercise 10 corrected.]

[*Update,* Mar 31: Exercise 5 corrected.]

[*Update*, Jun 14: Some corrections.]

## 50 comments

Comments feed for this article

30 January, 2008 at 6:41 pm

LiorHi Terry: you’re missing a \sum in Exercise 4.

30 January, 2008 at 6:58 pm

Terence TaoThanks, Lior, for the correction!

30 January, 2008 at 9:20 pm

HaoDear Terry,

My friend asked me a question this afternoon: Do the digits of the decimal expansion of \pi contains all finite patterns?

I tried to use ergodic theory and only got that the decimal expansion of almost every numbers in [0,1] contains all patterns. (and every pattern has the expected limiting frequency).

I know that whether \pi is normal is still an open problem (I am not sure whether someone has proven \pi is 10-normal. But at least for 2-normal, it’s open.). But is there any method to prove my weaker proposition? Is there any theorems in ergodic theory describing whether the orbit of a specific point is dense?

Thanks for reading my comment.

31 January, 2008 at 3:07 am

Pedro Lauridsen RibeiroYou mean “Avogadro (constant)” instead of “Avagadro” in Remark 2, I suppose…

31 January, 2008 at 8:49 am

Terence TaoDear Hao: This problem is likely to be extremely difficult with current technology. Basically, because there do exist (uncountably many) numbers in [0,1] which do not contain all digit patterns, one is going to have to use some special property of which is not satisfied by those other numbers. In particular, this special property should allow one to obtain non-trivial control on the fractional parts of for large n. While there are some interesting formulae for these sorts of things (some of which are in fact used when computing the digits of ), they do not seem to have a rich enough algebraic structure that we can use existing technology to control their distribution, either in the qualitative sense you ask (which is equivalent to the orbit of being dense) or in the more traditional sense of 10-normality (which is equivalent to the orbit being equidistributed).

More generally, the task of rigorously demonstrating pseudorandom behaviour from deterministic processes without much algebraic structure (such as the digit sequence of ) is a very important class of problem to which we still do not have nearly enough tools to tackle. In the absence of algebraic structure, ergodic theory methods seem to require the injection of a little bit of randomness (e.g. replacing by a randomly chosen number) before they can really say anything, and in any case this is probably too “soft” of a tool for the problem at hand, which needs to fully exploit the structure of . (Incidentally, I would start with or the golden ratio instead of , and work base 2 or base 5 respectively, as there is a slightly better hope of detecting some exploitable structure, though even here it’s not clear how to proceed.)

Dear Pedro: thanks for the correction!

1 February, 2008 at 3:22 am

ben greenTerry, Hao

As Terry hints, there *are* some special properties of which have to do with digits, though only in base 16 so far as I know and not in base 10. Specifically, there is the Borwein-Bailey-Plouffe algorithm, which has its own Wikipedia entry:

http://en.wikipedia.org/wiki/Bailey-Borwein-Plouffe_formula

This allows one to compute the th hexadecimal digit of without computing all the previous digits. It’s actually remarkably explicit, but just complicated enough to make one think twice about sitting down to try and prove that each hex digit of occurs one sixteenth of the time. For example one seems to need to understand the distribution of modulo for and this already looks very tricky to me.

1 February, 2008 at 11:34 am

orrI enjoy the posts in this course very much, thanks!

It looks like a very difficult course. I was wondering:

does every post like this one fit in a two hour lecture?

you must have exeptionially good students, if this is the pace.

It would be very interesting to hear from the students in the course

how it is.

BTW, I saw two typos:

1. eq. 10, a minus sign in the denominator.

2. proof of proposition 1, step 2 of the alg. – reverse an inequality.

1 February, 2008 at 12:45 pm

Terence TaoDear Orr: thanks for the corrections! The lecture notes are being posted somewhat asynchronously to the actual lectures; for instance, we are currently about halfway through Lecture 6 in the actual class. (Each notes takes about 1-2 classes to cover, though my lectures often do not correspond exactly with what is posted here.)

4 February, 2008 at 7:27 pm

254A, Lecture 9: Ergodicity « What’s new[...] as we derived the mean ergodic theorem from the more abstract von Neumann ergodic theorem in the previous lecture, we shall derive the maximal ergodic theorem from the following abstract maximal [...]

10 February, 2008 at 4:53 pm

254A, Lecture 10: The Furstenberg correspondence principle « What’s new[...] 2 is trivial, while the k=2 case follows from the Poincaré recurrence theorem (Theorem 1 from Lecture 8). We will prove the higher k cases of this theorem in later lectures. In this one, we will explain [...]

11 February, 2008 at 9:28 pm

254A, Lecture 11: Compact systems « What’s new[...] we can normalise a.e.. By modifying each eigenfunction on a set of measure zero (cf. Exercise 7 of Lecture 8) we can assume that and everywhere rather than just almost everywhere. Then the map is a [...]

20 February, 2008 at 8:44 pm

mmailliw/williamConcerning Exercise 10, I am wondering whether the \gamma^-1 in the summation should be replaced by \gamma (replacing the group representation with an antirepresentation)?

(Otherwise, when averaging the images of f – f o \gamma^-1 under the action of the Folner set, the fact that \gamma^-1 is applied before the Folner set means that we’re taking the difference between F_n and its right coset, a ‘cancellation’ not allowed by our definition.)

21 February, 2008 at 10:29 am

Terence TaoHmm, you’re right. I’ve flipped the definition of a Folner set to be near-invariant under right translations instead of left translations to address this.

21 February, 2008 at 8:21 pm

254A, Lecture 12: Weakly mixing systems « What’s new[...] so we only need to consider the case when n is positive. The mean ergodic theorem (Corollary 2 from Lecture 8) tells us the Cesàro behaviour of these coefficients. Indeed, we [...]

28 February, 2008 at 2:41 pm

Oliver Diaz-EspinosaDear Terence,

I found this byproduct of the ergodic theory which can be used to prove the Shannon-MacMillan-Breiman theorem:

Let a probability space invariant.

a.s and in

then

almost surely and in

If is dominated, which occurs in many applications, it is relatively easy to prove the statement above. However, what happens is is not dominated?

A common reference to this result is the book af Ma\~ne on ergodic theory. However, his argument in the non dominated case is shaky. Do you know if the almost surely convergence still holds in this case?

best

ode

28 February, 2008 at 3:38 pm

Terence TaoDear Oliver,

It looks doubtful that almost sure convergence holds without a domination assumption. Consider for instance a finitary counterpart in which X is the cyclic system with uniform measure and shift , and let be times the Kronecker delta function at for (with for all other n), thus each function has an norm of . Then for each , the averages get as large as for some , despite the fact that the functions have small L^1 norm and vanish outside of a set of small measure. I would imagine that by taking a suitable combination of infinitely many copies of this example (with N going off to infinity and going slowly to zero) one could obtain a counterexample to your claim.

31 March, 2008 at 9:03 am

sugatasir,

in exc.5,there are some problems i suppose.first of all,f is not real,so the limit is ‘greater than or equal to’ does not make any sense.also,if we take f in the orthogonal complement of the subspace itself then the limit is zero.

31 March, 2008 at 8:19 pm

Terence TaoDear sugata,

Thanks for the comments. I corrected to . Note though that if f is in the orthogonal complement of then is zero too (since f is orthogonal to 1).

9 June, 2008 at 5:27 pm

liuxiaochuanDear Professor 陶：

I’ve got several corrections( if I am right):

1,At the end of the proof of Theorem 2, I think it should be the identity .

2,The proof of Proposition 2, the end of the second sentence, it should be .

刘

14 June, 2008 at 8:43 am

Terence TaoDear 刘: thanks for the corrections!

20 June, 2008 at 8:34 am

The strong law of large numbers « What’s new[...] as special cases of the mean and pointwise ergodic theorems respectively (see Exercise 9 from 254A Lecture 8 and Theorem 2 from 254A Lecture [...]

1 September, 2008 at 7:55 am

The correspondence principle and finitary ergodic theory « What’s new[...] N, arising from iterating F about times. This result is essentially proven as Corollary 2 of my 254A lecture notes on this topic. Possibly related posts: (automatically generated)254A, Lecture 10: The Furstenberg correspondence [...]

19 December, 2008 at 11:35 pm

陶哲轩遍历论习题解答：第八讲 « Liu Xiaochuan’s Weblog[...] (注：遍历论为陶哲轩教授于今年年初的一门课程，我尝试将所有习题做出来，这是第八讲的十个习题。第八讲是关于保测度变换的第一讲。这里是本讲的链接。) [...]

9 February, 2009 at 9:35 am

RajaDr. Tao,

I have been working on the formulation close to what is described in Exercise 9 for some time now (I am at Wichita State University, Wichita, Kansas). I wondered if I can obtain a copy of the proof for that Exercise?

Thank you.

— Raja

5 August, 2009 at 5:09 pm

Moser’s entropy compression argument « What’s new[...] theory, which then underlies a number of arguments in ergodic theory, as discussed for instance in this earlier blog post. Another basic example is the standard proof of the Szemerédi regularity lemma (where the [...]

23 September, 2009 at 5:52 pm

Máté WierdlTerry,

I believe the proof you give for the abstract mean ergodic theorem is due to F. Riesz. Von Neumann used the spectral theorem.

24 September, 2009 at 2:38 pm

Terence TaoThanks for the correction!

28 September, 2009 at 7:48 am

Máté WierdlWell, a reorganization of the proof allows to prove (equally simply) the abstract mean erg thm for contractions. It then appears that the main ingredient is the Pythagorian theorem to show that the orthocomplement of the invariant functions is spanned by .

http://otthon.csi.hu/mw/proof_of_mean_erg_thm.pdf

http://otthon.csi.hu/mw/proof_of_mean_erg_thm.tex

As you see, if we just want to show convergence, then all follows from the Pythagorean theorem.

7 October, 2009 at 10:55 am

ERT1: Poincaré’s Recurrence Theorem and Von Neumann’s Theorems « Disquisitiones Mathematicae[...] So, expressions of the type , from now on called ergodic averages, are important when dealing with recurrence. This will be our main interest in the next posts. For another perspective on Von Neumann’s Theorem and related results, the reader is referred to this Terry Tao’s lecture. [...]

29 December, 2009 at 10:08 am

Nonconventional ergodic averages and multiple recurrence for von Neumann dynamical systems « What’s new[...] For , the answer to the above four questions is also “yes”, thanks to the von Neumann ergodic theorem for unitary operators. For , we were able to establish a positive answer to the “recurrence [...]

1 January, 2010 at 8:47 pm

254A, Notes 0: A review of probability theory « What’s new[...] 6 One can interpret conditional expectation as a type of orthogonal projection; see for instance these previous lecture notes of mine. But we will not use this perspective in this course. Just as conditioning on an event and its [...]

4 March, 2010 at 1:26 pm

beni22sofMr. Tao,

I want to ask you if you have a pdf version of your notes. I am trying to study ergodic theory by myself, since we don’t study it in our school. I managed to obtain some books on ergodic theory, but my progress is very slow. I think your courses would be very helpful, but I cannot read them all on my computer.

Thank you.

with respect,

Beni Bogosel

(student Timisoara, Romania)

4 March, 2010 at 3:06 pm

Terence TaoThese notes are contained in my book

http://terrytao.wordpress.com/books/poincares-legacies-course-notes-expository-articles-and-lecture-series-from-a-mathematical-blog/

25 July, 2010 at 11:51 am

Yiannishi — a couple of small possible typos: do you need the last sentence in the proof of Theorem 1? it looks like (4) is the same as the stated claim. also, in Exercise 3, should the two appearances of $\mu(X)$ be replaced by $\mu(E)$? thanks — yiannis

[Corrected (very belatedly) – thanks – T.[20 March, 2011 at 2:04 am

ChoiHi Terry!

I am looking at different books about ergodic theory. Poincare recurrent theorem is stated differently in different books. For example, your statement is already different from the wiki’s one. Another example, Silva states theorem that almost every point in a set of positive measure will return to this set. (p86, p88, Invitation to Ergodic theory, http://www.ams.org/bookstore/pspdf/stml-42-prev.pdf)

Why do all authors pick their own Poincare recurrent theorem?

20 March, 2011 at 7:22 am

Terence TaoThe various formulations of the Poincare recurrence theorem are closely related to each other. For instance, the version of Wikipedia can be deduced from the one given here using the Borel-Cantelli lemma. The version you cite from Silva can be deduced from either of the other two versions by considering the set of points that do not return to the set.

More generally, in the more analytic portions of mathematics, the precise statement of a mathematical result is often not of primary importance; it is the

natureof the result (and of the methods used to prove it) which tends to be more important. See Tim Gowers’ “two cultures of mathematics” for further discussion of this point.24 August, 2011 at 11:21 am

254A, Lecture 17: A Ratner-type theorem for SL_2(R) orbits « What’s new[...] 2. Prove this claim. (Hint: obtain continuous analogues of the theory from Lecture 8 and Lecture [...]

10 June, 2012 at 3:36 am

RexHow is it possible for the statement of Exercise 1 to fail for if it holds for when , given that is a probability space?

Since , don't we have ?

10 June, 2012 at 3:40 am

RexOn second thought, I suppose the norms could be different.

22 September, 2012 at 11:20 am

RexJust to make sure we have the same conventions:

In the definition of measure-preserving system, when you say “ is invertible”, you mean that the map is a (measurable) bijection, so that is defined on the level of sets?

In this case, we are not considering examples such as the doubling map on the circle? I think some other texts only require that satisfy for all measurable .

22 September, 2012 at 3:00 pm

Terence Tao(a) yes; see http://terrytao.wordpress.com/2008/01/08/254a-lecture-1-overview/

(b) Not directly, though in practice one can often derive recurrence results for non-invertible ergodic systems from the invertible case by a lifting trick, see e.g. Exercise 9 of http://terrytao.wordpress.com/2008/01/15/254a-lecture-4-multiple-recurrence/ for an instance of this.

22 September, 2012 at 7:06 pm

RexTypo: where you write “Now observe (using (10)) that is bounded in magnitude by 1…”

I think you want:

[Corrected, thanks - T.]25 October, 2012 at 10:10 am

Walsh’s ergodic theorem, metastability, and external Cauchy convergence « What’s new[...] averages, namely the orthogonal projection of to the -invariant factors. (See for instance my lecture notes on this theorem.) While this theorem ostensibly involves measure theory, it can be abstracted to the more general [...]

30 October, 2012 at 12:26 am

Advanced Analyis, Notes 5: Hilbert spaces (Application: Von Neumann’s mean ergodic theorem) « Noncommutative Analysis[...] operators-on-Hilbert-space theory, by proving von Neumann’s mean ergodic theorem. See also this treatment by Terry Tao on his [...]

27 November, 2012 at 4:12 pm

JackIn this note, the “conditional expectation” is defined as the projection on the closed subspace of a Hilbert space. I learned another version on [this Wikipedia article] (http://en.wikipedia.org/wiki/Conditional_expectation#Formal_definition). And I also see one in 254A Note 0

(http://terrytao.wordpress.com/2010/01/01/254a-notes-0-a-review-of-probability-theory/#as-rem), which is for the discrete random variables. Are they essentially the same?

27 November, 2012 at 4:52 pm

Terence TaoYes; I recommend the exercise of verifying that the definition given in the Wikipedia article is indeed the orthogonal projection to the Hilbert space (this is also alluded to in Section 5 of the Wikipedia article, as well as in equation (14) of the present article), and that in the case when the factor is generated by a discrete random variable, the definition collapses (outside of an event of probability zero, at least) to the definition given in Notes 0 (or at the beginning of the Wikipedia article); this is alluded to in Remark 6 of these notes.

3 February, 2013 at 10:02 am

Alok BakshiDear Professor Tao, have you applied Jensen’s inequality with to obtain (1), because I couldn’t see how Cauchy Schwartz inequality be applied here.

[

To show , one can either use Jensen, or else apply Cauchy-Schwarz to the functions and . -T]21 May, 2013 at 4:30 pm

Multiple recurrence and convergence results associated to $F_{p}^{omega}$-actions | What's new[…] e.g. this previous blog post. Informally, one can interpret this limit formula as an equidistribution result: if is drawn at […]

10 January, 2014 at 2:02 pm

Cezary FerensIn 1966 I have proved the conjecture of Ryll-Nardzewski that each sigma-algebra of power c is separable under the continuum hypothesis.

26 August, 2014 at 6:52 am

ShukiThanks for these notes.

Where the separability assumption were used in the first proof of Theorem 2?

[The first proof is in fact valid for inseparable Hilbert spaces also. -T]