We continue our study of basic ergodic theorems, establishing the maximal and pointwise ergodic theorems of Birkhoff. Using these theorems, we can then give several equivalent notions of the fundamental concept of ergodicity, which (roughly speaking) plays the role in measure-preserving dynamics that minimality plays in topological dynamics. A general measure-preserving system is not necessarily ergodic, but we shall introduce the *ergodic decomposition*, which allows one to express any non-ergodic measure as an average of ergodic measures (generalising the decomposition of a permutation into disjoint cycles).

— The maximal ergodic theorem —

Just as we derived the mean ergodic theorem from the more abstract von Neumann ergodic theorem in the previous lecture, we shall derive the maximal ergodic theorem from the following abstract maximal inequality.

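**Theorem 1** (Dunford-Schwartz maximal inequality). Let $(X, \mathcal{X}, \mu)$ be a probability space, and let $P: L^1(X,\mathcal{X},\mu) \to L^1(X,\mathcal{X},\mu)$ be a linear operator with $P1=1$ and $P^* 1 = 1$ (i.e. $\int_X Pf\, d\mu = \int_X f\, d\mu$ for all $f \in L^1(X,\mathcal{X},\mu)$). Assume also that P maps non-negative functions to non-negative functions. Then the maximal function $\overline{f} := \sup_{N > 0} \frac{1}{N} \sum_{n=0}^{N-1} P^n f$ obeys the inequality

$\lambda \mu( \{ \overline{f} > \lambda \} ) \leq \int_{\overline{f} > \lambda} f\, d\mu$ (1)

for any $\lambda \in \mathbb{R}$.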

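**Proof.** We can rewrite (1) as

$\int_{\overline{f} > \lambda} (f - \lambda)\, d\mu \geq 0$. (2)

Since $P1=1$, we thus see (by replacing f with $f - \lambda$) that we can reduce to proving (2) in the case $\lambda = 0$.

For every $m \geq 1$, consider the modified maximal function $F_m := \sup_{0 \leq N \leq m} \sum_{n=0}^{N-1} P^n f$. Observe that $\overline{f}(x) > 0$ if and only if $F_m(x) > 0$ for all sufficiently large m. By the dominated convergence theorem, it thus suffices to show that

$\int_{F_m > 0} f\, d\mu \geq 0$ (3)

for all m. But observe from the definition of $F_m$ (and the positivity preserving nature of P) that we have the pointwise recursive inequality

$F_m(x) \leq F_{m+1}(x) \leq \max( 0, f(x) + P F_m(x) )$. (4)

Integrating this on the region $\{ F_m > 0 \}$ and using the non-negativity of $F_m$ (and hence of $P F_m$), we obtain

$\int_{F_m > 0} F_m\, d\mu \leq \int_{F_m > 0} f\, d\mu + \int_X P F_m\, d\mu$. (6)

Since $\int_{F_m > 0} F_m\, d\mu = \int_X F_m\, d\mu$ and $\int_X P F_m\, d\mu = \int_X F_m\, d\mu$, the claim follows.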
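As a sanity check, inequality (1) can be tested numerically in the simplest setting: a finite set with uniform measure and $Pf := f \circ T$ for a permutation T (so that $P1 = 1$, $P^*1 = 1$ and P preserves positivity). The following Python sketch is only an illustration under these assumptions; the function names are hypothetical, and the supremum over N is truncated at the size of the set, which loses nothing here because every orbit is periodic.

```python
import random

def maximal_function(f, T):
    """f_bar(x) = max over 1 <= N <= len(f) of (1/N) * sum_{n<N} f(T^n x)."""
    size = len(f)
    f_bar = []
    for x in range(size):
        best, total, y = float("-inf"), 0.0, x
        for N in range(1, size + 1):
            total += f[y]              # adds (P^{N-1} f)(x) = f(T^{N-1} x)
            y = T[y]
            best = max(best, total / N)
        f_bar.append(best)
    return f_bar

def maximal_inequality_sides(f, T, lam):
    """Return (lam * mu{f_bar > lam}, integral of f over {f_bar > lam})."""
    size = len(f)
    f_bar = maximal_function(f, T)
    level_set = [x for x in range(size) if f_bar[x] > lam]
    lhs = lam * len(level_set) / size          # uniform probability measure
    rhs = sum(f[x] for x in level_set) / size
    return lhs, rhs

rng = random.Random(0)
T = list(range(12))
rng.shuffle(T)                                 # a random permutation of {0, ..., 11}
f = [rng.uniform(-1.0, 1.0) for _ in T]
for lam in (-0.5, 0.0, 0.3):
    lhs, rhs = maximal_inequality_sides(f, T, lam)
    print(lam, lhs <= rhs + 1e-12)             # small tolerance for rounding
```

Note that negative values of $\lambda$ are allowed, in agreement with the statement of Theorem 1.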

Applying this in the case when P is a shift operator, and replacing f by |f|, we obtain

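**Corollary 1** (Maximal ergodic theorem). Let $(X, \mathcal{X}, \mu, T)$ be a measure-preserving system. Then for any $f \in L^1(X, \mathcal{X}, \mu)$ and $\lambda > 0$ one has

$\mu( \{ \sup_{N > 0} \frac{1}{N} \sum_{n=0}^{N-1} T^n |f| > \lambda \} ) \leq \frac{1}{\lambda} \|f\|_{L^1(X, \mathcal{X}, \mu)}$. (7)

Note that this inequality implies Markov’s inequality

$\mu( \{ |f| > \lambda \} ) \leq \frac{1}{\lambda} \|f\|_{L^1(X, \mathcal{X}, \mu)}$ (8)

as a special case. Applying the real interpolation method, one also easily deduces the maximal inequality

$\| \sup_{N > 0} \frac{1}{N} \sum_{n=0}^{N-1} T^n |f| \|_{L^p(X, \mathcal{X}, \mu)} \leq C_p \|f\|_{L^p(X, \mathcal{X}, \mu)}$ (9)

for all $1 < p \leq \infty$, where the constant $C_p$ depends on p (it blows up like $O(1/(p-1))$ in the limit $p \to 1$).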

**Exercise 1** (Rising sun inequality). If $f \in \ell^1(\mathbb{Z})$, and $f^*(m) := \sup_{N > 0} \frac{1}{N} \sum_{n=0}^{N-1} f(m+n)$, establish the *rising sun inequality*

$\lambda |\{ m \in \mathbb{Z}: f^*(m) > \lambda \}| \leq \sum_{m \in \mathbb{Z}: f^*(m) > \lambda} f(m)$ (10)

for any $\lambda > 0$. (*Hint*: one can either adapt the proof of Theorem 1, or else partition the set appearing in (10) into disjoint intervals. The latter proof also leads to a proof of Corollary 1 which avoids the Dunford-Schwartz trick of introducing the functions $F_m$. The terminology “rising sun” comes from seeing how these intervals interact with the graph of the partial sums of f, which resembles the shadows cast on a hilly terrain by a rising sun.)

**Exercise 2** (Transference principle). Show that Corollary 1 can be deduced directly from (10). (*Hint*: given $f \in L^1(X, \mathcal{X}, \mu)$, apply (10) to the functions $f_x(n) := T^n f(x)$ for each $x \in X$ (truncating the integers to a finite set if necessary), and then integrate in x using Fubini’s theorem.) This is an example of a *transference principle* between maximal inequalities on $\mathbb{Z}$ and maximal inequalities on measure-preserving systems.

**Exercise 3** (Stein-Stromberg maximal inequality). Derive a continuous version of the Dunford-Schwartz maximal inequality, in which the operators $P^n$ are replaced by a semigroup $P_t$ acting on both $L^1$ and $L^\infty$, in which the underlying measure space is only assumed to be $\sigma$-finite rather than a probability space, and the averages $\frac{1}{N} \sum_{n=0}^{N-1} P^n$ are replaced by $\frac{1}{T} \int_0^T P_t\, dt$. Apply this continuous version with $P_t := e^{t\Delta}$ equal to the heat operator on $\mathbb{R}^d$ for $d \geq 1$ to deduce the *Stein-Stromberg maximal inequality*

$m( \{ x \in \mathbb{R}^d: \sup_{R > 0} \frac{1}{m(B(x,R))} \int_{B(x,R)} |f|\, dm > \lambda \} ) \leq \frac{Cd}{\lambda} \|f\|_{L^1(\mathbb{R}^d, dm)}$ (11)

for all $\lambda > 0$ and $f \in L^1(\mathbb{R}^d, dm)$, where m is Lebesgue measure, $B(x,R)$ is the Euclidean ball of radius R centred at x, and the constant C is absolute (independent of d). This improves upon the Hardy-Littlewood maximal inequality, which gives the same estimate but with $Cd$ replaced by $C^d$. It is an open question whether the dependence on d can be removed entirely; the estimate (11) is still the best known in high dimension. For d=1, the best constant C is known to be $\frac{11+\sqrt{61}}{12} = 1.567\ldots$, a result of Melas.

**Remark 1.** The study of maximal inequalities in ergodic theory is, of course, a subject in itself; a classical reference is this monograph of Stein.

— The pointwise ergodic theorem —

Using the maximal ergodic theorem and a standard limiting argument we can now deduce

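**Theorem 2** (Pointwise ergodic theorem). Let $(X, \mathcal{X}, \mu, T)$ be a measure-preserving system, and let $f \in L^1(X, \mathcal{X}, \mu)$. Then for $\mu$-almost every $x \in X$, $\frac{1}{N} \sum_{n=0}^{N-1} T^n f(x)$ converges to $\mathbb{E}(f|\mathcal{X}^T)(x)$.

**Proof.** By subtracting $\mathbb{E}(f|\mathcal{X}^T)$ from f if necessary, it suffices to show that

$\limsup_{N \to \infty} | \frac{1}{N} \sum_{n=0}^{N-1} T^n f(x) | = 0$ (12)

a.e. whenever $\mathbb{E}(f|\mathcal{X}^T) = 0$. By telescoping series, (12) is already true when f takes the form $f = Tg - g$ for some $g \in L^\infty(X, \mathcal{X}, \mu)$. So by the arguments used to prove the von Neumann ergodic theorem from the previous lecture, we have already established the claim for a dense class of functions f in $L^2(X, \mathcal{X}, \mu)$ with $\mathbb{E}(f|\mathcal{X}^T) = 0$, and thus also for a dense class of functions in $L^1(X, \mathcal{X}, \mu)$ with $\mathbb{E}(f|\mathcal{X}^T) = 0$ (since the latter space is dense in the former, and the $L^2$ norm controls the $L^1$ norm by the Cauchy-Schwarz inequality).

Now we use a standard limiting argument. Let $f \in L^1(X, \mathcal{X}, \mu)$ with $\mathbb{E}(f|\mathcal{X}^T) = 0$. Then we can find a sequence $f_j$ in the above dense class which converges in $L^1$ to f. For almost every x, we thus have

$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} T^n f_j(x) = 0$ (13)

for all j, and so by the triangle inequality we have

$\limsup_{N \to \infty} | \frac{1}{N} \sum_{n=0}^{N-1} T^n f(x) | \leq \sup_{N > 0} \frac{1}{N} \sum_{n=0}^{N-1} T^n |f - f_j|(x)$. (14)

But by Corollary 1 we see that the right-hand side of (14) converges to zero in measure as $j \to \infty$. Since the left-hand side does not depend on j, it must vanish almost everywhere, as required.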
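To see Theorem 2 in action, one can compute the averages numerically for the circle shift $x \mapsto x + \alpha \hbox{ mod } 1$ with $\alpha$ irrational. This system is ergodic (Exercise 5 below), so the limit is simply the space average $\int f\, d\mu$. The Python sketch below is only an illustration: the choices $\alpha = \sqrt{2}$, $f(x) = \cos^2(2\pi x)$ (whose integral is 1/2) and the starting point 0.1 are arbitrary assumptions, and floating-point orbits only approximate the true ones.

```python
import math

def birkhoff_average(f, shift, x0, N):
    """(1/N) * sum_{n<N} f(x0 + n*shift mod 1), the ergodic average along an orbit."""
    total, x = 0.0, x0
    for _ in range(N):
        total += f(x)
        x = (x + shift) % 1.0
    return total / N

alpha = math.sqrt(2)                             # an irrational rotation number
f = lambda x: math.cos(2 * math.pi * x) ** 2     # space average is 1/2
for N in (10, 1000, 100000):
    print(N, birkhoff_average(f, alpha, 0.1, N))
```

The printed averages should approach 1/2 as N grows; for this particular f the error decays like O(1/N), which is much faster than what the theorem guarantees in general.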

**Remark 2.** More generally, one can derive a pointwise convergence result on a class of rough functions by first establishing convergence for a dense subclass of functions, and then establishing a maximal inequality which is strong enough to allow one to take limits and establish pointwise convergence for all functions in the larger class. Conversely, principles such as Stein’s maximal principle indicate that in many cases this is in some sense the *only* way to establish such pointwise convergence results for rough functions.

**Remark 3.** Using the dominated convergence theorem (starting first with bounded functions f in order to get the domination), one can deduce the mean ergodic theorem from the pointwise ergodic theorem. But the converse is significantly more difficult; pointwise convergence for various ergodic averages is often a much harder result to establish than the corresponding norm convergence result (in particular, many of the techniques discussed in this course appear to be of sharply limited utility for pointwise convergence problems), and many questions in this area remain open.

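**Exercise 4** (Lebesgue differentiation theorem). Let $f \in L^1(\mathbb{R}, dm)$ with Lebesgue measure dm. Show that for almost every $x \in \mathbb{R}$, we have $\lim_{r \to 0^+} \frac{1}{2r} \int_{-r}^{r} |f(x+y) - f(x)|\, dy = 0$, and in particular that $\lim_{r \to 0^+} \frac{1}{2r} \int_{-r}^{r} f(x+y)\, dy = f(x)$.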

— Ergodicity —

Combining the mean ergodic theorem with the pointwise ergodic theorem (and with Exercises 7, 8 from the previous lecture) we have

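**Theorem 3** (Characterisations of ergodicity). Let $(X, \mathcal{X}, \mu, T)$ be a measure-preserving system. Then the following are equivalent:

- Any set $E \in \mathcal{X}$ which is invariant (thus TE=E) has either full measure $\mu(E)=1$ or zero measure $\mu(E)=0$.
- Any set $E \in \mathcal{X}$ which is almost invariant (thus TE differs from E by a null set) has either full measure or zero measure.
- Any measurable function f with $Tf = f$ a.e. is constant a.e.
- For any $1 \leq p < \infty$ and $f \in L^p(X, \mathcal{X}, \mu)$, the averages $\frac{1}{N} \sum_{n=0}^{N-1} T^n f$ converge in $L^p$ norm to $\int_X f\, d\mu$.
- For any two $f, g \in L^\infty(X, \mathcal{X}, \mu)$, we have $\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \int_X (T^n f) g\, d\mu = (\int_X f\, d\mu) (\int_X g\, d\mu)$.
- For any two measurable sets E and F, we have $\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu( T^n E \cap F ) = \mu(E) \mu(F)$.
- For any $f \in L^1(X, \mathcal{X}, \mu)$, the averages $\frac{1}{N} \sum_{n=0}^{N-1} T^n f$ converge pointwise almost everywhere to $\int_X f\, d\mu$.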

A measure-preserving system with any (and hence all) of the above properties is said to be *ergodic*.

**Remark 4.** Strictly speaking, ergodicity is a property that applies to a measure-preserving system $(X, \mathcal{X}, \mu, T)$. However, we shall sometimes abuse notation and apply the adjective “ergodic” to a single component of a system, such as the measure $\mu$ or the shift T, when the other three components of the system are clear from context.

Here are some simple examples of ergodicity:

**Example 1.** If X is finite with uniform measure, then a shift map $T: X \to X$ is ergodic if and only if it is a cycle.

**Example 2.** If a shift T is ergodic, then so is $T^{-1}$. However, from Example 1 we see that it is not necessarily true that $T^n$ is ergodic for all n (this latter property is also known as *total ergodicity*).
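Examples 1 and 2 are easy to experiment with on a computer. In the Python sketch below (an illustration only; the helper names are hypothetical), a permutation of $\{0,\ldots,k-1\}$ is stored as a list T with T[x] the image of x, and ergodicity with respect to uniform measure is tested via the single-cycle criterion of Example 1.

```python
def cycle_lengths(T):
    """Lengths of the disjoint cycles of a permutation T of {0, ..., len(T)-1}."""
    seen, lengths = set(), []
    for start in range(len(T)):
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            x = T[x]
            length += 1
        lengths.append(length)
    return lengths

def is_ergodic(T):
    """Ergodic for uniform measure iff T consists of a single cycle (Example 1)."""
    return len(cycle_lengths(T)) == 1

def power(T, n):
    """The n-th iterate T^n of a permutation, for n >= 0."""
    result = list(range(len(T)))
    for _ in range(n):
        result = [T[x] for x in result]
    return result

four_cycle = [1, 2, 3, 0]
print(is_ergodic(four_cycle))             # the 4-cycle itself
print(is_ergodic(power(four_cycle, 2)))   # its square splits into two 2-cycles
```

Here the 4-cycle is ergodic but its square is not, so it fails to be totally ergodic; by contrast, every non-trivial power of a cycle of prime length is again a single cycle.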

**Exercise 5.** Show that the circle shift $(\mathbb{R}/\mathbb{Z}, x \mapsto x + \alpha)$ (with the usual Lebesgue measure) is ergodic if and only if $\alpha$ is irrational. (*Hint*: analyse the equation $Tf = f$ for (say) $f \in L^2(\mathbb{R}/\mathbb{Z})$ using Fourier analysis. *Added*, Feb 21: As pointed out to me in class, another way to proceed is to use the Lebesgue density theorem (or Lebesgue differentiation theorem) combined with Exercise 14 from Lecture 6.)

**Exercise 6.** Let $(\Omega, \mathcal{B}, \mu)$ be a standard Borel probability space. Show that the Bernoulli shift on the product system $(\Omega^{\mathbb{Z}}, \mathcal{B}^{\mathbb{Z}}, \mu^{\mathbb{Z}})$ is ergodic. (*Hint*: first establish property 6 of Theorem 3 when E and F each depend on only finitely many of the coordinates of $\Omega^{\mathbb{Z}}$.)

**Exercise 7.** Let $(X, \mathcal{X}, \mu, T)$ be an ergodic system. Show that if $\lambda$ is an eigenvalue of $T: L^2(X, \mathcal{X}, \mu) \to L^2(X, \mathcal{X}, \mu)$, then $|\lambda| = 1$, the eigenspace is one-dimensional, and every eigenfunction f has constant magnitude |f| a.e. Show that the eigenspaces are orthogonal to each other in $L^2(X, \mathcal{X}, \mu)$, and the set of all eigenvalues of T forms an at most countable subgroup of the unit circle $S^1$.

Now we give a less trivial example of an ergodic system.

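**Proposition 1** (Ergodicity of skew shift). Let $\alpha \in \mathbb{R}$ be irrational. Then the skew shift $((\mathbb{R}/\mathbb{Z})^2, (x,y) \mapsto (x+\alpha, y+x))$ is ergodic.

**Proof.** Write the skew shift system as $(X, \mathcal{X}, \mu, T)$. To simplify the notation we shall omit the phrase “almost everywhere” in what follows.

We use an argument of Parry. If the system is not ergodic, then we can find a non-constant $f \in L^2(X, \mathcal{X}, \mu)$ such that Tf = f. Next, we use Fourier analysis to write $f = \sum_m f_m$, where $f_m(x,y) := \int_{\mathbb{R}/\mathbb{Z}} f(x,y+\theta) e^{-2\pi i m \theta}\, d\theta$. Since f is T-invariant, and the vertical rotations $(x,y) \mapsto (x,y+\theta)$ commute with T, we see that the $f_m$ are also T-invariant. The function $f_0$ depends only on the x variable, and so is constant by Exercise 5. So it suffices to show that $f_m$ is zero for all non-zero m.

Fix a non-zero m. We can factorise $f_m(x,y) = F_m(x) e^{2\pi i m y}$. The T-invariance of $f_m$ now implies that $F_m(x+\alpha) = e^{-2\pi i m x} F_m(x)$. If we then define $F_{m,\theta}(x) := F_m(x+\theta) \overline{F_m(x)}$ for $\theta \in \mathbb{R}$, we see that $F_{m,\theta}(x+\alpha) = e^{-2\pi i m \theta} F_{m,\theta}(x)$, thus $F_{m,\theta}$ is an eigenfunction of the circle shift with eigenvalue $e^{-2\pi i m \theta}$. But this implies (by Exercise 7) that $F_{m,\theta}$ is orthogonal to $F_{m,0}$ for non-zero $\theta$ close to zero. Taking limits as $\theta \to 0$ we see that $F_{m,0}$ is orthogonal to itself and must vanish; this implies that $F_m$ and hence $f_m$ vanish as well, as desired.

**Exercise 8.** Show that for any irrational $\alpha$ and any $d \geq 1$, the iterated skew shift system $((\mathbb{R}/\mathbb{Z})^d, (x_1,\ldots,x_d) \mapsto (x_1+\alpha, x_2+x_1, \ldots, x_d+x_{d-1}))$ is ergodic.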

— Generic points —

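Now let us suppose that we have a topological measure-preserving system $(X, \mathcal{F}, \mathcal{X}, \mu, T)$, i.e. a measure-preserving system $(X, \mathcal{X}, \mu, T)$ which is also a topological dynamical system $(X, \mathcal{F}, T)$, with $\mathcal{X}$ the Borel $\sigma$-algebra of $\mathcal{F}$. Then we have the space C(X) of continuous (real or complex-valued) functions on X, which is dense inside $L^2(X, \mathcal{X}, \mu)$. From the Stone-Weierstrass theorem we also see that C(X) is separable.

A sequence $x_1, x_2, x_3, \ldots$ in X is said to be *uniformly distributed* with respect to $\mu$ if we have

$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} f(x_n) = \int_X f\, d\mu$ (15)

for all $f \in C(X)$. A point x in X is said to be *generic* if the forward orbit $x, Tx, T^2 x, \ldots$ is uniformly distributed.

**Exercise 9.** Let $(X, \mathcal{F})$ be a compact metrisable space with a Borel probability measure $\mu$, and let $x_1, x_2, x_3, \ldots$ be a sequence in X. Show that this sequence is uniformly distributed if and only if $\lim_{N \to \infty} \frac{1}{N} | \{ 1 \leq n \leq N: x_n \in U \} | = \mu(U)$ for all open sets U in X with $\mu(\partial U) = 0$. What happens if the hypothesis that the boundary of U has measure zero is removed?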

From Theorem 3 and the separability of C(X) we obtain

**Proposition 2.** A topological measure-preserving system is ergodic if and only if almost every point is generic.

A topological measure-preserving system is said to be *uniquely ergodic* if *every* point is generic. The following exercise explains the terminology:

**Exercise 10.** Show that a topological measure-preserving system is uniquely ergodic if and only if the only T-invariant Borel probability measure on X is $\mu$. (*Hint*: use Lemma 1 from Lecture 7.) Because of this fact, one can sensibly define what it means for a topological dynamical system $(X, \mathcal{F}, T)$ to be uniquely ergodic, namely that it has a unique T-invariant Borel probability measure.

It is not always the case that an ergodic system is uniquely ergodic. For instance, in the Bernoulli system $\{0,1\}^{\mathbb{Z}}$ (with uniform measure on $\{0,1\}$, say), the point $(\ldots,0,0,0,\ldots)$ is not generic. However, for more algebraic systems, it turns out that ergodicity and unique ergodicity are largely equivalent. We illustrate this with the circle and skew shifts:

**Exercise 11.** Show that the circle shift $(\mathbb{R}/\mathbb{Z}, x \mapsto x+\alpha)$ (with the usual Lebesgue measure) is uniquely ergodic if and only if $\alpha$ is irrational. (*Hint*: first show in the circle shift system that any translate of a generic point is generic.)

**Proposition 3** (Unique ergodicity of skew shift). Let $\alpha \in \mathbb{R}$ be irrational. Then the skew shift $((\mathbb{R}/\mathbb{Z})^2, (x,y) \mapsto (x+\alpha, y+x))$ is uniquely ergodic.

**Proof.** We use an argument of Furstenberg. We again write the skew shift as $(X, \mathcal{X}, \mu, T)$. Suppose this system were not uniquely ergodic; then by Exercise 10 there is another shift-invariant Borel probability measure $\mu' \neq \mu$. If we push $\mu$ and $\mu'$ down to the circle shift system $(\mathbb{R}/\mathbb{Z}, x \mapsto x+\alpha)$ by the projection map $(x,y) \mapsto x$, then by Exercises 10, 11 we must get the same measure. Thus $\mu$ and $\mu'$ must agree on any set of the form $A \times (\mathbb{R}/\mathbb{Z})$.

Let E denote the points in X which are generic with respect to $\mu$; note that this set is Borel measurable. By Proposition 2, this set has full measure in $\mu$. Also, since the vertical rotations $(x,y) \mapsto (x,y+\theta)$ commute with T and preserve $\mu$, we see that E must be invariant under such rotations; thus it is of the form $A \times (\mathbb{R}/\mathbb{Z})$ for some A. By the preceding discussion, we conclude that E also has full measure in $\mu'$. But then (by the pointwise or mean ergodic theorem for $(X, \mathcal{X}, \mu', T)$) we conclude that $\mathbb{E}_{\mu'}(f|\mathcal{X}^T) = \int_X f\, d\mu$ $\mu'$-almost everywhere for every continuous f, and thus on integrating with respect to $\mu'$ we obtain $\int_X f\, d\mu' = \int_X f\, d\mu$ for every continuous f. But then by the Riesz representation theorem we have $\mu = \mu'$, a contradiction.

**Corollary 2.** If $\alpha \in \mathbb{R}$ is irrational, then the sequence $(\alpha n^2 \hbox{ mod } 1)_{n \in \mathbb{N}}$ is uniformly distributed in $\mathbb{R}/\mathbb{Z}$ (with respect to uniform measure).
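Corollary 2 can be probed numerically by counting how often $\alpha n^2 \hbox{ mod } 1$ lands in a given interval, as in the criterion of Exercise 9. The Python sketch below is only an experiment, not a proof: the choice $\alpha = \sqrt{2}$, the interval $[0, 1/4)$ and the cutoffs N are arbitrary assumptions, and double-precision arithmetic limits how large n can reasonably be taken.

```python
import math

def fraction_in_interval(alpha, N, a, b):
    """Proportion of n in {1, ..., N} with (alpha * n^2 mod 1) in [a, b)."""
    count = 0
    for n in range(1, N + 1):
        if a <= (alpha * n * n) % 1.0 < b:
            count += 1
    return count / N

for N in (100, 10000, 100000):
    # irrational alpha: the proportion should approach the length 1/4
    print(N, fraction_in_interval(math.sqrt(2), N, 0.0, 0.25))

# rational alpha = 1/2: the sequence only takes the values 0 and 1/2,
# so it cannot be uniformly distributed
print(fraction_in_interval(0.5, 1000, 0.0, 0.25))
```

For the rational value $\alpha = 1/2$ the proportion is exactly 1/2 (the even n), rather than the length 1/4 of the interval.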

**Exercise 11a.** Show that the systems considered in Exercise 8 are uniquely ergodic. Conclude that the exponent 2 in Corollary 2 can be replaced by any positive integer d.

Note that the topological dynamics theory developed in Lecture 6 only establishes the weaker statement that the above sequence is dense in $\mathbb{R}/\mathbb{Z}$ rather than uniformly distributed. More generally, it seems that ergodic theory methods can prove topological dynamics results, but not vice versa. Here is another simple example of the same phenomenon:

**Exercise 12.** Show that a uniquely ergodic topological dynamical system whose invariant measure has full support (i.e. gives positive measure to every non-empty open set) is necessarily minimal. (The converse is not necessarily true, as already mentioned in Remark 6 of Lecture 7.)

— The ergodic decomposition —

Just as not every topological dynamical system is minimal, not every measure-preserving system is ergodic. Nevertheless, there is an important decomposition that allows one to represent non-ergodic measures as averages of ergodic measures. One can already see this in the finite case, when X is a finite set with the discrete $\sigma$-algebra, and $T: X \to X$ is a permutation on X, which can be decomposed as the disjoint union of cycles $C_1, \ldots, C_m$ on a partition $X = C_1 \cup \ldots \cup C_m$ of X. In this case, all shift-invariant probability measures take the form

$\mu = \sum_{j=1}^m \alpha_j \mu_j$ (16)

where $\mu_j$ is the uniform probability measure on the cycle $C_j$, and $\alpha_1, \ldots, \alpha_m$ are non-negative constants adding up to 1. Each of the $\mu_j$ is ergodic, but no non-trivial linear combination of these measures is ergodic. Thus we see in the finite case that every shift-invariant measure can be uniquely expressed as a convex combination of ergodic measures.
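The finite decomposition (16) is simple enough to compute directly. The Python sketch below is an illustration under the stated assumptions (a permutation stored as a list, and an input measure that is already T-invariant, i.e. constant on each cycle); the function names are hypothetical. It recovers the weights $\alpha_j = \mu(C_j)$ and the uniform measures $\mu_j$, using exact rational arithmetic.

```python
from fractions import Fraction

def cycles(T):
    """Disjoint cycles of a permutation T of {0, ..., len(T)-1}."""
    seen, result = set(), []
    for start in range(len(T)):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = T[x]
        result.append(cycle)
    return result

def ergodic_decomposition(mu, T):
    """Return [(alpha_j, mu_j)] with mu = sum_j alpha_j * mu_j, as in (16)."""
    parts = []
    for cycle in cycles(T):
        alpha = sum(mu[x] for x in cycle)            # alpha_j = mu(C_j)
        mu_j = [Fraction(0)] * len(T)
        for x in cycle:
            mu_j[x] = Fraction(1, len(cycle))        # uniform measure on C_j
        parts.append((alpha, mu_j))
    return parts

T = [1, 0, 3, 4, 2]                                  # cycles (0 1) and (2 3 4)
mu = [Fraction(1, 8)] * 2 + [Fraction(1, 4)] * 3     # a T-invariant probability measure
parts = ergodic_decomposition(mu, T)
recombined = [sum(a * m[x] for a, m in parts) for x in range(len(T))]
print([a for a, _ in parts], recombined == mu)
```

In this example the weights come out as 1/4 and 3/4, and recombining the pieces returns the original measure.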

It turns out that a similar decomposition is available in general, at least if the underlying measure space is a compact topological space (or more generally, a Radon space). This is because of the following general theorem from measure theory.

**Definition 1** (Probability kernel). Let $(X, \mathcal{X})$ and $(Y, \mathcal{Y})$ be measurable spaces. A *probability kernel* $y \mapsto \mu_y$ is an assignment of a probability measure $\mu_y$ on X to each $y \in Y$ in such a way that the map $y \mapsto \int_X f\, d\mu_y$ is measurable for every bounded measurable $f: X \to \mathbb{C}$.

**Example 3.** Every measurable map $\phi: Y \to X$ induces a probability kernel $y \mapsto \delta_{\phi(y)}$. Every probability measure on X can be viewed as a probability kernel from a point to X. If $y \mapsto \mu_y$ and $x \mapsto \nu_x$ are two probability kernels from Y to Z and from X to Y respectively, their composition $x \mapsto (\mu \circ \nu)_x$ is also a probability kernel from X to Z, where $(\mu \circ \nu)_x$ is the measure that assigns $\int_Y \mu_y(E)\, d\nu_x(y)$ to any measurable set E in Z. Thus one can view the class of measurable spaces and their probability kernels as a category, which includes the class of measurable spaces and their measurable maps as a subcategory.
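On finite sets, a probability kernel is just a row-stochastic matrix, and the composition in Example 3 becomes a matrix product. The Python sketch below illustrates this special case only; the list-of-rows encoding and the function names are assumptions made for the illustration.

```python
from fractions import Fraction

def compose(mu, nu):
    """Kernel x -> (mu o nu)_x, where nu[x][y] = nu_x({y}) and mu[y][z] = mu_y({z})."""
    num_z = len(mu[0])
    return [
        [sum(nu_x[y] * mu[y][z] for y in range(len(mu))) for z in range(num_z)]
        for nu_x in nu
    ]

def kernel_from_map(phi, size_target):
    """Kernel y -> delta_{phi(y)} induced by a map phi, given as a list of images."""
    return [[Fraction(int(x == image)) for x in range(size_target)] for image in phi]

half, third = Fraction(1, 2), Fraction(1, 3)
nu = [[half, half], [Fraction(0), Fraction(1)]]             # kernel from X (2 points) to Y (2 points)
mu = [[Fraction(1), Fraction(0), Fraction(0)], [third] * 3] # kernel from Y to Z (3 points)
print(compose(mu, nu))
```

Each row of the composed kernel is again a probability vector, and composing the kernels induced by two maps gives the kernel induced by the composed map, in line with the remark about subcategories.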

**Definition 2** (Regular space). A measurable space $(X, \mathcal{X})$ is said to be *regular* if there exists a compact metrisable topology $\mathcal{F}$ on X for which $\mathcal{X}$ is the Borel $\sigma$-algebra.

**Example 4.** Every topological measure-preserving system is regular.

**Remark 5.** Measurable spaces $(X, \mathcal{X})$ in which $\mathcal{X}$ is the Borel $\sigma$-algebra of a topological space generated by a separable complete metric space (i.e. a Polish space) are known as *standard Borel spaces*. It is a non-trivial theorem from descriptive set theory that up to measurable isomorphism, there are only three types of standard Borel spaces: finite discrete spaces, countable discrete spaces, and the unit interval [0,1] with the usual Borel $\sigma$-algebra. From this one can see that regular spaces are the same as standard Borel spaces, though we will not need this fact here.

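**Theorem 4** (Disintegration theorem). Let $(X, \mathcal{X}, \mu)$ and $(Y, \mathcal{Y}, \nu)$ be probability spaces, with $(X, \mathcal{X})$ regular. Let $\pi: X \to Y$ be a morphism (thus $\pi_\# \mu = \nu$). Then there exists a probability kernel $y \mapsto \mu_y$ such that

$\int_X f (g \circ \pi)\, d\mu = \int_Y (\int_X f\, d\mu_y) g(y)\, d\nu(y)$ (17)

for any bounded measurable $f: X \to \mathbb{C}$ and $g: Y \to \mathbb{C}$. Also, for any such g, we have

$g \circ \pi = g(y)$ $\mu_y$-a.e. (18)

for $\nu$-a.e. y.

Furthermore, this probability kernel is unique up to $\nu$-almost everywhere equivalence, in the sense that if $y \mapsto \mu'_y$ is another probability kernel with the same properties, then $\mu_y = \mu'_y$ for $\nu$-almost every $y$.

We refer to the probability kernel $y \mapsto \mu_y$ generated by the above theorem as the *disintegration* of $\mu$ relative to the factor map $\pi$.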

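**Proof.** We begin by proving uniqueness. Suppose we have two probability kernels $y \mapsto \mu_y$, $y \mapsto \mu'_y$ with the above properties. Then on subtraction we have

$\int_Y (\int_X f\, d(\mu_y - \mu'_y)) g(y)\, d\nu(y) = 0$ (19)

for all bounded measurable $f: X \to \mathbb{C}$, $g: Y \to \mathbb{C}$. Specialising to $f = 1_E$ for some measurable set $E \in \mathcal{X}$, we conclude that $\mu_y(E) = \mu'_y(E)$ for $\nu$-almost every y. Since $\mathcal{X}$ is regular, it is separable and we conclude that $\mu_y = \mu'_y$ for $\nu$-almost every y, as required.

Now we prove existence. The pullback map $\pi^\#: L^2(Y, \mathcal{Y}, \nu) \to L^2(X, \mathcal{X}, \mu)$ defined by $\pi^\# g := g \circ \pi$ has an adjoint $\pi_\#: L^2(X, \mathcal{X}, \mu) \to L^2(Y, \mathcal{Y}, \nu)$, thus

$\int_X f (\pi^\# g)\, d\mu = \int_Y (\pi_\# f) g\, d\nu$ (20)

for all $f \in L^2(X, \mathcal{X}, \mu)$ and $g \in L^2(Y, \mathcal{Y}, \nu)$. It is easy to see from duality that we have $\|\pi_\# f\|_{L^\infty(Y, \mathcal{Y}, \nu)} \leq \|f\|_{C(X)}$ for all $f \in C(X)$ (where we select a compact metrisable topology that generates the regular $\sigma$-algebra $\mathcal{X}$). Recall that $\pi_\# f$ is not quite a measurable function, but is instead an equivalence class of measurable functions modulo $\nu$-almost everywhere equivalence. Since C(X) is separable, we can find a measurable representative $\tilde \pi_\# f: Y \to \mathbb{C}$ of $\pi_\# f$ for every $f \in C(X)$ which varies linearly with f, and is such that $|\tilde \pi_\# f(y)| \leq \|f\|_{C(X)}$ for all y outside of a set E of $\nu$-measure zero and for all $f \in C(X)$. For all such y, we can then apply the Riesz representation theorem to obtain a Radon probability measure $\mu_y$ such that

$\tilde \pi_\# f(y) = \int_X f\, d\mu_y$ (21)

for all such y. We set $\mu_y$ equal to some arbitrarily fixed Radon probability measure for $y \in E$. We then observe that the required properties (including the measurability of $y \mapsto \int_X f\, d\mu_y$) are already obeyed for $f \in C(X)$. To generalise this to bounded measurable f, observe that the class $\mathcal{C}$ of f obeying the required properties is closed under dominated pointwise convergence, and so contains the indicator functions of open or compact sets (by Urysohn’s lemma). Applying dominated pointwise convergence again and inner and outer regularity, we see that the indicator function of any Borel set lies in $\mathcal{C}$. Thus all simple measurable functions lie in $\mathcal{C}$, and on taking uniform limits we obtain the claim.

Finally, we prove (18). From two applications of (17) we have

$\int_Y (\int_X f (g \circ \pi)\, d\mu_y) h(y)\, d\nu(y) = \int_Y (\int_X f\, d\mu_y) g(y) h(y)\, d\nu(y)$ (22)

for all bounded measurable $f: X \to \mathbb{C}$ and $g, h: Y \to \mathbb{C}$. The claim follows (using the separability of the space of all f).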
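In the finite setting, the disintegration of Theorem 4 is nothing more than conditioning on the fibres of $\pi$, and identity (17) can be checked by hand or by machine. The Python sketch below covers only this finite special case; the encoding of measures as lists of point masses, the map $\pi$ as a list, and the function names are all assumptions made for the illustration.

```python
from fractions import Fraction

def disintegrate(mu, pi, size_y):
    """Return (nu, kernel) with nu the pushforward of mu under pi and kernel[y] = mu_y.

    mu is a list of point masses on X = {0, ..., len(mu)-1}; pi[x] is the image of x in Y.
    """
    nu = [Fraction(0)] * size_y
    for x, mass in enumerate(mu):
        nu[pi[x]] += mass
    kernel = []
    for y in range(size_y):
        if nu[y] > 0:
            # conditional measure of mu on the fibre pi^{-1}(y)
            kernel.append([mu[x] / nu[y] if pi[x] == y else Fraction(0)
                           for x in range(len(mu))])
        else:
            # nu-null point: any fixed probability measure will do
            kernel.append([Fraction(1)] + [Fraction(0)] * (len(mu) - 1))
    return nu, kernel

def both_sides_of_17(mu, pi, nu, kernel, f, g):
    """Left- and right-hand sides of (17) for functions f on X and g on Y."""
    lhs = sum(f[x] * g[pi[x]] * mu[x] for x in range(len(mu)))
    rhs = sum(sum(f[x] * kernel[y][x] for x in range(len(mu))) * g[y] * nu[y]
              for y in range(len(nu)))
    return lhs, rhs

mu = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 4), Fraction(1, 4)]
pi = [0, 0, 1, 1]                       # two fibres; the point y = 2 is nu-null
nu, kernel = disintegrate(mu, pi, 3)
print(both_sides_of_17(mu, pi, nu, kernel, [1, 2, 3, 4], [5, 7, 11]))
```

The arbitrary choice made on the null point y = 2 mirrors the step in the proof where $\mu_y$ is set to a fixed measure on the exceptional set E, and reflects the fact that the disintegration is only unique up to $\nu$-almost everywhere equivalence.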

**Exercise 13.** Let the notation and assumptions be as in Theorem 4. Suppose that $(Y, \mathcal{Y})$ is also regular, and that the map $\pi: X \to Y$ is continuous with respect to some compact metrisable topologies that generate $\mathcal{X}$ and $\mathcal{Y}$ respectively. Then show that for $\nu$-almost every y, the probability measure $\mu_y$ is supported in $\pi^{-1}(\{y\})$.

**Proposition 4** (Ergodic decomposition). Let $(X, \mathcal{X}, \mu, T)$ be a regular measure-preserving system. Let $(Y, \mathcal{Y}, \nu, S)$ be the system defined by $Y := X$, $\mathcal{Y} := \mathcal{X}^T$, $\nu := \mu|_{\mathcal{Y}}$, and $S := T$, and let $\pi: X \to Y$ be the identity map. Let $y \mapsto \mu_y$ be the disintegration of $\mu$ with respect to the factor map $\pi$. Then for $\nu$-almost every y, the measure $\mu_y$ is T-invariant and ergodic.

**Proof.** Observe from the T-invariance of $\mu$ (and of $\mathcal{X}^T$) that the probability kernel $y \mapsto T_\# \mu_y$ would also be a disintegration of $\mu$. Thus we have $\mu_y = T_\# \mu_y$ for $\nu$-almost every y.

Now we show the ergodicity. As the space of bounded measurable $f: X \to \mathbb{C}$ is separable, it suffices by Theorem 3 and a limiting argument to show that for any fixed such f, the averages $\frac{1}{N} \sum_{n=0}^{N-1} T^n f$ converge pointwise $\mu_y$-a.e. to $\int_X f\, d\mu_y$ for $\nu$-a.e. y.

From the pointwise ergodic theorem, we already know that $\frac{1}{N} \sum_{n=0}^{N-1} T^n f$ converges to $\mathbb{E}(f|\mathcal{X}^T)$ outside of a set of $\mu$-measure zero. By (17), this set also has $\mu_y$-measure zero for $\nu$-almost every y. Thus it will suffice to show that $\mathbb{E}(f|\mathcal{X}^T)$ is $\mu_y$-a.e. equal to $\int_X f\, d\mu_y$ for $\nu$-a.e. y. Now observe that $\mathbb{E}(f|\mathcal{X}^T) = (\pi_\# f) \circ \pi$, so the claim follows from (18) and (21).

**Exercise 14.** Let $(X, \mathcal{X})$ be a separable measurable space, and let T be a bimeasurable bijection $T: X \to X$. Let $M(\mathcal{X})$ denote the Banach space of all finite measures on $\mathcal{X}$ with the total variation norm. Let $Pr(\mathcal{X})^T \subset M(\mathcal{X})$ denote the collection of probability measures on $\mathcal{X}$ which are T-invariant. Show that this is a closed convex subset of $M(\mathcal{X})$, and the extreme points of $Pr(\mathcal{X})^T$ are precisely the ergodic probability measures (which also form a closed subset of $M(\mathcal{X})$). We remark that one can use these facts to prove some variants of Proposition 4 using Choquet’s theorem, although I do not know of a way to prove this proposition in full generality just from Choquet.

**Exercise 15.** Show that a topological measure-preserving system $(X, \mathcal{F}, \mathcal{X}, \mu, T)$ is uniquely ergodic if and only if the only ergodic shift-invariant Borel probability measure on X is $\mu$.

## 77 comments


5 February, 2008 at 7:04 am

Lior

In the proof of Thm 1, we should “[o]bserve that …” and not as written. Ex.11 duplicates Ex. 5 (should ask to show the circle shift is uniquely ergodic). Def. 1 should note that is a measure on . Example 3 should have and not as written; “composition” there is a form of convolution.

5 February, 2008 at 11:53 am

Terence Tao

Thanks, Lior, for the corrections!

5 February, 2008 at 1:26 pm

Lior

Of course my remark on Example 3 is wrong; sorry about that. I too should re-read what I write. To Exercise 14 you can add showing that the set of ergodic measures is closed (eliminating the annoyance with the meaning of the “support” of the measure you get from Choquet’s theorem).

13 February, 2008 at 1:39 pm

254A, Lecture 11: Compact systems « What’s new[…] measure-preserving systems are isomorphic); the notion of regularity was defined in Definition 2 of Lecture 9. Hint: take a countable shift-invariant family of sets that generate (thus T acts on this space by […]

22 February, 2008 at 2:39 pm

254A, Lecture 12: Weakly mixing systems « What’s new[…] system with irrational (and with uniform measure), thus this system is ergodic by Exercise 5 from Lecture 9. Then the function has mean zero, but one easily computes that . The coefficients converge in the […]

27 February, 2008 at 7:07 pm

254A, Lecture 13: Compact extensions « What’s new[…] 8. Show that each of the iterated skew shifts (Exercise 8 from Lecture 9) are compact extensions of the preceding skew […]

6 March, 2008 at 10:07 am

254A, Lecture 14: Weakly mixing extensions « What’s new[…] 1. If X is regular, then we can disintegrate the measure as an average , see Theorem 4 from Lecture 9. It is then possible to construct a relative product system , which is the product system but with […]

9 March, 2008 at 8:55 pm

254A, Lecture 16: A Ratner-type theorem for nilmanifolds « What’s new[…] This result is originally due to Leon Green, using spectral theory methods. We will use an argument of Parry (and adapted by Leibman), relying on “vertical” Fourier analysis and topological arguments, which we have already used for the skew shift in Proposition 1 of Lecture 9. […]

15 March, 2008 at 3:07 pm

254A, Lecture 17: A Ratner-type theorem for SL_2(R) orbits « What’s new[…] to H, and thus (by Corollary 1) ergodic with respect to U. This implies (cf. Proposition 2 from Lecture 9) that -almost every point x in is generic (with respect to U) in the sense […]

31 March, 2008 at 9:15 am

sugata

sir,

in proving thm.1, you have written F(m+1)=max{0,f+PFm}. this means F(m+1) is positive always. but from the definition that is not true for negative functions.

31 March, 2008 at 8:32 pm

Terence Tao

Dear sugata,

$F_m$ is always non-negative, because the N=0 term in the definition of $F_m$ is automatically zero.

20 June, 2008 at 7:59 pm

The strong law of large numbers « What’s new[…] Remark 4. By viewing the sequence as a stationary process, and thus as a special case of a measure-preserving system one can view the weak and strong law of large numbers as special cases of the mean and pointwise ergodic theorems respectively (see Exercise 9 from 254A Lecture 8 and Theorem 2 from 254A Lecture 9). […]

23 December, 2008 at 5:07 am

liuxiaochuan

Dear Professor Tao:

In theorem 1, is the linear operation P asked to be positive?

in (10), should the right-hand side be or ?

23 December, 2008 at 8:55 am

Terence Tao

Thanks for the corrections!

24 February, 2009 at 11:26 am

mykola

Dear Professor Tao,

in the last but one paragraph of the proof of Theorem 4 you claim:

“we see that the sets whose indicator functions lie in $\mathcal{C}$ form a $\sigma$-algebra.”

It is not clear that the mentioned collection (of the sets whose indicator functions lie in $\mathcal{C}$) is closed under taking a union of (not disjoint) sets.

24 February, 2009 at 11:46 am

Terence Tao

Fair enough. I guess we can use outer and inner regularity (and dominated or monotone convergence) to get indicator functions of Borel sets to lie in $\mathcal{C}$, instead.

24 February, 2009 at 1:22 pm

mykola

Thank you very much!

1 April, 2009 at 10:16 am

pablolessa

In the proof of the pointwise ergodic theorem (Theorem 2). I don’t see how one can approximate a given $L^1$ variable by an $L^2$ one, while at the same time preserving the conditional expectation.

1 April, 2009 at 10:28 am

Terence Tao

Dear pablolessa,

One can approximate (in $L^1$) an $L^1$ function f with zero conditional expectation by a function g in $L^2$ with arbitrarily high accuracy (in $L^1$). Now, g may not have zero conditional expectation a priori, but one can simply replace it by its projection $g - \mathbb{E}(g|\mathcal{X}^T)$ onto the zero expectation functions. Since this projection is bounded on $L^1$, and also preserves f, we see that $g - \mathbb{E}(g|\mathcal{X}^T)$ also approximates f in $L^1$ to arbitrarily high accuracy.

1 April, 2009 at 1:05 pm

pablolessa

You’re right! I got mixed up trying to bound the norm of which actually might be unbounded.

Thank you.

24 June, 2009 at 6:57 pm

Anonymous

Dear Terry,

I was wondering if it is true that in the setting of Proposition 4, the invariant $\sigma$-algebra ${\cal X}^T$ is always countably generated? If for some reason this is true, it would give an easy way to check that measures $\mu_y$ are $T$-invariant and ergodic without appealing to the pointwise ergodic theorem, simply by looking at what happens on the sets that generate ${\cal X}^T$.

Thanks!

25 June, 2009 at 12:30 pm

Terence Tao

Dear anonymous,

Hmm, the issue seems to be delicate. It is not hard to show that $\mathcal{X}^T$ is countably generated modulo $\mu$-null sets; for instance one can take a countable dense subset of $L^2(\mathcal{X})$ and look at their projections to $L^2(\mathcal{X}^T)$. But this doesn’t necessarily mean that $\mathcal{X}^T$ is countably generated modulo $\mu_y$-null sets for almost every y, unless one is very careful with the measure-theoretic issues. (One may also have to be more precise about how $\mathcal{X}^T$ is defined: does one want genuinely T-invariant sets, or sets that are only invariant up to $\mu$-null sets?)

6 August, 2009 at 7:33 am

陶哲轩遍历论习题解答：第五讲 « Liu Xiaochuan’s Weblog[…] is similar to the proof of theorem 4, though some difference should be taken care of. I wrote another post in […]

6 August, 2009 at 7:35 am

liuxiaochuan

Dear Professor Tao:

about (10), I am wondering if the right hand should be

[Corrected, thanks – T.]

7 August, 2009 at 10:45 am

liuxiaochuan

Dear Professor Tao:

about (6), I only can get , I don’t see why

7 August, 2009 at 10:59 am

Terence Tao

$P F_m$ is non-negative, so $\int_{F_m > 0} P F_m\, d\mu$ is bounded above by $\int_X P F_m\, d\mu$.

10 August, 2009 at 1:31 am

liuxiaochuan

Dear Professor Tao:

In Exercise 4 , the two integrals in the conclusion are both dy, instead of dx.

22 October, 2009 at 7:26 am

Carlos Meninho

Dear Professor Terence Tao,

I have a question regarding Exercise 9; consider the following topological dynamical system on the circle:

Take the homeomorphism . This map fixes 0 and 1, hence induces a homeomorphism of the circle into itself. It is easy to see that this dynamical system has only one invariant probability measure: the atomic measure on the point 0~1.

Therefore, is uniquely ergodic by Exercise 10 and every point on the circle is generic. The positive orbit of a point in is contained in this open set. If Exercise 9 is true then , but , that is a contradiction.

Could you indicate what is the problem in this example?

I can prove a version of this exercise changing “for all open sets in ” by “for all open sets in with ” (the measure is zero on the boundary). This is true with the local compacity condition (weaker than compacity) by using Urysohn’s Lemma.

Best wishes,

Carlos Meninho.

22 October, 2009 at 8:32 am

Terence Tao

Hmm, you’re right; I’ve changed Exercise 9 accordingly.

1 January, 2010 at 8:47 pm

254A, Notes 0: A review of probability theory « What’s new[…] for every random variable , by using tools such as the Radon-Nikodym theorem; see Theorem 4 of these previous lecture notes of mine. In practice, we will not invoke these general results here (as it is not natural for us to place […]

1 April, 2010 at 12:59 am

liuxiaochuan

Dear Professor Tao:

I think the right hand side of (22) should be and the last sentence of exercise 13 is “…measure is …

While I am trying to do exercise 13, I construct a function g in order to get a contradiction with (18) as follows.

I fix a and define

It seems this can work. But in this way the condition that “ is continuous ” is not necessary. I don’t know if I am right.

On exercise 14, I didn’t found a way to prove “any non-ergodic measure is not extreme points” as one direction of the conclusion without using the Ergodic decomposition theorem. But since you said this can provide a alternative method for that theorem, could give a hint for this?

1 April, 2010 at 9:18 am

Terence Tao

Thanks for the corrections!

The continuity of $\pi$ is needed to make the preimage $\pi^{-1}(\{y\})$ closed (note that the support of a measure is closed by definition).

For exercise 14, use the fact that a non-ergodic measure has an invariant set of measure strictly between 0 and 1. By conditioning the measure to that set or its complement one can show that the measure is not an extreme measure.

2 April, 2010 at 7:04 am

anonymous

The tag you have used for this post is different from the tag for the other posts in the series.

2 April, 2010 at 7:06 am

anonymous

Ah! Never mind, I thought this was part of the same series as the Random Matrices one (which stopped at nr 8)…

23 April, 2010 at 4:14 pm

Michael S.

Dear Professor Tao,

concerning Exercise 12 there seems to be a small problem (hope I understood the definitions and details):

Suppose T is a homeomorphism of the circle K with exactly one fixed point, say 0, whose complement K\{0} is wandering. E.g. we may take the map

z=exp(2*pi*i*x) —> T(z)=exp(2*pi*i*x^2) where x is in [0,1].

Then T is uniquely ergodic as every invariant measure is supported in the non-wandering set, thus there is only one invariant measure, the Dirac mass in 0. Nevertheless T is not minimal as it has a fixed point.

Best

Michael

18 June, 2010 at 2:53 am

Solutions to Ergodic Theory：Lecture twelve « Xiaochuan Liu's Weblog[…] First we recall the exercise 11a of lecture 9, which asserts thatthen the sequence is uniformly distributed in , which means (by choosing ) that […]

12 August, 2010 at 6:50 am

Sam

I want an example of an ergodic measure preserving transformation T for which T^2 is not ergodic.

12 August, 2010 at 6:57 am

Anonymous

Dear Sam,

just take the non-trivial permutation (flip) on two points of measure 1/2 each. This is ergodic by definition (no invariant sets of measure strictly between 0 and 1), but its square will be the identity which is not ergodic (each point is an invariant subset of measure 1/2).

You can modify this idea to get more elaborate examples on more complicated measure spaces.

Best

Michael

12 August, 2010 at 7:05 am

Sam

By T^2, I mean the second iterate of T, that is, T(T(x)), and not T x T.

12 August, 2010 at 7:07 am

Anonymous

That’s what it is…

Say you have the two points 1 and 2 and the transformation is 1 goes to 2 and 2 goes to 1, then T^2 is the identity…
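The two-point example can be checked mechanically. This is a minimal sketch, assuming the uniform measure on a finite set (so every nonempty subset has positive measure) and a brute-force search over subsets; the function names are made up for this illustration.

```python
from itertools import combinations

def compose(S, T):
    """Return the map x -> S(T(x))."""
    return {x: S[T[x]] for x in T}

def is_ergodic(T):
    """For a bijection T of a finite set with the uniform measure:
    T is ergodic iff no proper nonempty subset is invariant."""
    points = list(T)
    for r in range(1, len(points)):            # sizes of proper nonempty subsets
        for subset in combinations(points, r):
            A = set(subset)
            if {T[x] for x in A} == A:         # T(A) = A, hence T^{-1}(A) = A
                return False
    return True

flip = {1: 2, 2: 1}                 # 1 goes to 2 and 2 goes to 1
flip_squared = compose(flip, flip)  # the second iterate T(T(x))
```

With these definitions, `is_ergodic(flip)` holds, while `flip_squared` is the identity and `is_ergodic(flip_squared)` fails because each single point is invariant.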

12 August, 2010 at 7:16 am

Sam

Well understood now. Great!

12 August, 2010 at 8:17 am

Sam

Oops! Can I use the Dirac measure as a measure preserving transformation?

12 August, 2010 at 8:37 am

Anonymous

The Dirac measure is not a transformation, but what you probably mean is, can you define a (measure preserving) transformation which preserves just the Dirac measure.

The answer is yes. Suppose you have any space (set etc.). Put a Dirac mass on one of its points (the remaining set/space then has measure zero, as you normally want a probability space). Then a transformation is measure preserving iff it has the point with the Dirac mass as a fixed point and in fact then it is already an ergodic transformation.

The other way around it works too: Suppose you have a given transformation on some space and you want to get an invariant measure. If the transformation has a fixed point, just put a Dirac mass in this point and you produced an invariant measure.

If the transformation has a finite orbit (i.e. a periodic point with finite period) just put the same mass on each of the points of this finite orbit – the mass should be 1/(length of the orbit) – and you again produced an invariant measure.

In fact most often you construct arbitrary invariant measures as limits (in the weak star topology) of measures which have support on certain finite orbits. Usually those “periodic” measures are dense in the space of all invariant measures and thus you can always approximate an invariant measure using a sequence of those special measures.
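The periodic-orbit construction described in this comment can be sketched as follows. The doubling map and the starting point 1/7 are hypothetical choices made only for illustration; exact rational arithmetic is used so that the orbit closes up exactly.

```python
from fractions import Fraction

def doubling(x):
    """Hypothetical example map: T(x) = 2x mod 1 on [0, 1)."""
    return (2 * x) % 1

def orbit(T, x0):
    """Points of the orbit of x0, each listed once.
    Assumes x0 is periodic; otherwise this loop does not terminate."""
    pts, x = [x0], T(x0)
    while x != x0:
        pts.append(x)
        x = T(x)
    return pts

def orbit_measure(T, x0):
    """Put mass 1/(length of the orbit) on each point of the orbit."""
    pts = orbit(T, x0)
    return {p: Fraction(1, len(pts)) for p in pts}

def pushforward(m, T):
    """Return T_* m for a finitely supported measure m."""
    out = {}
    for x, w in m.items():
        y = T(x)
        out[y] = out.get(y, Fraction(0)) + w
    return out

mu = orbit_measure(doubling, Fraction(1, 7))   # orbit: 1/7 -> 2/7 -> 4/7 -> 1/7
```

The measure `mu` has total mass 1 and satisfies `pushforward(mu, doubling) == mu`, which is the invariance claimed for equal masses on a finite orbit.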

12 August, 2010 at 4:25 pm

Sam

For the class I of invariant events, how does one show that a random variable X is I-measurable if and only if X = X∘T, where T is a map that preserves the probability P? I have tried it for an indicator variable, then a simple random variable, then a non-negative random variable, but I am having trouble getting a formal derivation for a general random variable. Moreover, it is not clear to me how to write the indicator-variable case down rigorously.
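One possible route to this equivalence avoids the indicator/simple/non-negative ladder altogether. The sketch below rests on assumptions not stated in the question: X is real-valued, and the class of invariant events is taken to be $\mathcal{I} = \{A : T^{-1}A = A\}$ (strict invariance). If invariance is only meant up to null sets, the same argument should give $X = X \circ T$ almost surely instead.

```latex
% Assumptions: \mathcal{I} = \{ A : T^{-1}A = A \}, X real-valued.
\begin{align*}
(\Leftarrow)\quad & X = X \circ T
  \implies T^{-1}\{X \in B\} = \{X \circ T \in B\} = \{X \in B\}
  \quad\text{for every Borel } B \subset \mathbb{R}, \\
 & \text{so } \{X \in B\} \in \mathcal{I}
  \text{ and } X \text{ is } \mathcal{I}\text{-measurable.} \\[4pt]
(\Rightarrow)\quad & \{X \le q\} \in \mathcal{I}
  \implies \{X \circ T \le q\} = T^{-1}\{X \le q\} = \{X \le q\}
  \quad\text{for every } q \in \mathbb{Q}, \\
 & \text{so } X(T\omega) = \inf\{q \in \mathbb{Q} : X(T\omega) \le q\}
   = \inf\{q \in \mathbb{Q} : X(\omega) \le q\} = X(\omega)
  \quad\text{for every } \omega.
\end{align*}
```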

16 October, 2010 at 8:29 pm

245A, Notes 5: Differentiation theorems « What’s new

[…] this inequality implies Lemma 12. (Hint: First do the case, by invoking the rising sun lemma.) See these lecture notes for some further discussion of inequalities of this type, and applications to ergodic theory (and […]

23 May, 2011 at 8:50 am

Stein’s spherical maximal theorem « What’s new

[…] abstract semigroup maximal inequality of Dunford and Schwartz (discussed for instance in these lecture notes of mine) shows that the heat kernel maximal function is of weak-type with a constant of , and this can be […]

2 October, 2012 at 7:26 am

Rex

In the statement of the Dunford-Schwartz maximal inequality, you write

“”

Do you want this to be instead?

Otherwise, we seem to be missing the term in the sum, and then it is not clear to me why if and only if for sufficiently large .

[Corrected, thanks – T.]

3 October, 2012 at 1:54 pm

Rex

In the Dunford-Schwartz maximal inequality, is there any need to assume that the underlying measure space is a probability space? I don’t see any place where this assumption is used.

In particular, we should be able to just apply it directly to Exercise 1, yes?

3 October, 2012 at 2:12 pm

Terence Tao

It’s used implicitly via the fact that constant functions (such as $1$) lie in $L^1$, which is only true in measure spaces of finite measure.

But yes, if one formulates things carefully, one should be able to extend the Dunford-Schwartz inequality to the infinite measure setting (but one would have to extend P from to some larger space, such as ).

4 October, 2012 at 5:01 pm

Rex

In the statement of the pointwise ergodic theorem, we assume that $f$ is in $L^1$, but not necessarily in $L^2$. How do we define its conditional expectation then?

4 October, 2012 at 6:24 pm

Terence Tao

See Exercise 6.4 of Lecture notes 8. The point is that conditional expectation is a contraction in L^1 norm, and L^2 is a dense subspace of L^1, so one can extend conditional expectation to L^1 by density.
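The contraction property that drives this density argument can be seen concretely on a finite probability space, where conditioning on a partition simply averages over each block. The following is a small sketch under that assumption; the space, partition, and function are invented for illustration and are not from the notes.

```python
from fractions import Fraction

# Hypothetical finite probability space: six equally likely points,
# with the sigma-algebra generated by a partition into three blocks.
prob = {x: Fraction(1, 6) for x in range(6)}
partition = [{0, 1}, {2, 3, 4}, {5}]

def cond_exp(f, prob, partition):
    """E(f | partition): on each block, the prob-weighted mean of f."""
    out = {}
    for block in partition:
        mass = sum(prob[x] for x in block)
        mean = sum(f[x] * prob[x] for x in block) / mass
        for x in block:
            out[x] = mean
    return out

def l1_norm(f, prob):
    """The L^1 norm of f with respect to prob."""
    return sum(abs(f[x]) * prob[x] for x in f)

f = {0: Fraction(3), 1: Fraction(-3), 2: Fraction(1),
     3: Fraction(-2), 4: Fraction(4), 5: Fraction(-5)}
Ef = cond_exp(f, prob, partition)
```

Averaging over blocks lets positive and negative values cancel, so `l1_norm(Ef, prob)` cannot exceed `l1_norm(f, prob)`; it is this bound that allows the operator to be extended from a dense subspace to all of L^1.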

5 October, 2012 at 3:19 pm

Rex

I’m a bit confused with the composition of probability kernels in Example 3.

The composition seems to have domain , but shouldn’t composing a map from to and then to should give us something with domain ?

Also, we seem to be taking values where , but is a measure on .

5 October, 2012 at 4:16 pm

Terence Tao

Sorry, and should be from Y to Z and from X to Y respectively.

5 October, 2012 at 5:03 pm

Rex

Ah, okay. That makes more sense. I think you want to write the integrals

to be taken over rather than as well.

[Corrected, thanks – T.]

14 April, 2013 at 7:39 am

Jeff

I don’t think I know how to go from the hint in exercise 6 to the statement for general E, F. Also, don’t you need the space to be a standard Borel space or something so that you can take infinite product spaces?

14 April, 2013 at 7:49 am

Jeff

To be specific, I know that the $\pi$-$\lambda$ theorem is useful here, but I cannot check that the left side of condition 6 is a measure, because the limit may not commute with infinite sums. (Or you can view this as checking that the set where 6 holds for, say, fixed F is a lambda system.)

14 April, 2013 at 9:58 am

Terence Tao

Yes, one needs a hypothesis such as standard Borel (actually inner regular should suffice) in the exercise; I’ve added it as such. To handle general measurable sets, show that such sets can be approximated in measure by sets that depend on only finitely many factors.

15 April, 2013 at 11:09 am

Stéphane Laurent

There is an error in point 6 of Theorem. The exponent should be -n, not n. For instance take T(x)=2x(mod 1) and E=[0,1/2]. Then 6 fails but T is ergodic.
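The distinction between the two exponents for this non-invertible example can be computed exactly. The sketch below takes F = E for simplicity (an assumption made here, not part of the comment) and compares the measure of the intersection using preimages with the one using forward images; interval endpoints are exact fractions.

```python
from fractions import Fraction

def preimage_of_E(n):
    """Disjoint intervals making up T^{-n}(E) for T(x) = 2x mod 1, E = [0, 1/2]."""
    return [(Fraction(k, 2**n), Fraction(2 * k + 1, 2**(n + 1)))
            for k in range(2**n)]

def overlap(intervals, a, b):
    """Lebesgue measure of (union of disjoint intervals) intersected with [a, b]."""
    total = Fraction(0)
    for lo, hi in intervals:
        total += max(Fraction(0), min(hi, b) - max(lo, a))
    return total

E = (Fraction(0), Fraction(1, 2))
mu_E = E[1] - E[0]

# Exponent -n: measure of T^{-n}(E) intersected with E, for n = 1..6.
backward = [overlap(preimage_of_E(n), *E) for n in range(1, 7)]

# Exponent +n: T maps [0, 1/2] onto [0, 1), so T^n(E) = [0, 1) for n >= 1.
forward = [overlap([(Fraction(0), Fraction(1))], *E) for n in range(1, 7)]
```

Every entry of `backward` equals `mu_E ** 2`, as the mixing-type condition requires, whereas every entry of `forward` equals `mu_E`, so the version with forward images fails for this map.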

15 April, 2013 at 11:46 am

Terence Tao

In these notes we are restricting attention to invertible systems (see https://terrytao.wordpress.com/2008/01/08/254a-lecture-1-overview/ ).

15 April, 2013 at 12:30 pm

Stéphane Laurent

Oops, sorry. But isn’t it more natural with the negative exponent?

15 April, 2013 at 2:57 pm

Terence Tao

I suppose so, though in the invertible case one has so there is basically no distinction to be made in that case.

21 May, 2013 at 4:30 pm

Multiple recurrence and convergence results associated to $F_{p}^{\omega}$-actions | What's new

[…] is the pushforward map associated to the map ; see e.g. this previous blog post. We can interpret this as an equidistribution result. If is a pair as before, then we no longer […]

12 September, 2013 at 2:58 pm

Measure preserving dynamics | Peter's ruminations

[…] https://terrytao.wordpress.com/2008/02/04/254a-lecture-9-ergodicity/#more-252 […]

20 June, 2014 at 9:10 pm

An abstract ergodic theorem, and the Mackey-Zimmer theorem | What's new

[…] topology, where is the -invariant subspace of , and is the orthogonal projection to . (See e.g. these previous lecture notes for a proof.) The same proof extends to more general amenable groups: if is a countable amenable […]

30 August, 2014 at 5:59 pm

G

Dear Prof Tao,

there is a typo in Exercise 13, the measure should be \mu_y and not \nu_y.

Best,

G

[Corrected, thanks – T.]

18 May, 2016 at 7:28 am

Ian Biringer

For Exercise 14, is there actually a version of Choquet’s theorem that applies in such generality? The original needs the set of measures to be compact, not closed. Noncompact versions like

require things like the convex set to be separable and contained in a Banach space with the Radon-Nikodym property. The exercise is a good one, but the last sentence seems misleading. (I’m not an expert, though, so if you do have such a reference I’d love to see and use it.)

18 May, 2016 at 10:07 am

Terence Tao

Fair enough; Choquet’s theorem can be applied as is only under some additional regularity hypotheses on the dynamical system (such as differentiability), see e.g. http://www.cimat.mx/~albarran/documentos/ergodicdesc.pdf . I’ll reword the final remark.

19 May, 2016 at 1:52 am

Ian Biringer

That’s right, although I’m not sure that differentiability matters much. In the notes you cite, there is a standing assumption that the underlying space is a compact manifold.

In general, whenever you have a group G acting continuously on a compact metric space, the weak* topology on the space of probability measures is compact and you can apply Choquet to the weak* closed subset of G-invariant measures.

After further investigation, it does seem that one can prove an ergodic decomposition theorem for (l.c.s.c.) G-actions on standard Borel spaces using Choquet, though. The basic idea is to use a result of Varadarajan that allows one to Borel-embed arbitrary standard Borel G-spaces into compact metric G-spaces on which the G-action is continuous.

Varadarajan: Theorem 3.2 of http://www.ams.org/journals/tran/1963-109-02/S0002-9947-1963-0159923-5/S0002-9947-1963-0159923-5.pdf

Here’s a paper where they use this approach:

16 August, 2016 at 11:31 pm

OCHONOGOR

Prove that the identity map on any measure space is measure preserving.

27 September, 2017 at 4:47 pm

Kevin M. Pilgrim

In Proposition 3, it seems like you need to know E is nonempty; this would follow from ergodicity of the skew-shift with respect to Lebesgue measure. In turn, this follows from looking at the induced unitary operator on L^2, but did you have another argument in mind?

[See Proposition 1 – T.]

27 September, 2018 at 4:16 pm

Abhishek Khetanhek

Dear Professor Tao,

In the second paragraph of the proof of Proposition 4 (ergodic decomposition theorem) there is a line which reads “it suffices by Theorem 3 and a limiting argument …” I am not able to follow the sufficiency. Can you please provide some details? Thank you.

2 October, 2018 at 4:21 pm

Terence Tao

One can find a countable set of bounded measurable functions that is dense in every ; indeed one can take to be rational linear combinations of indicator functions of the countable generating set of the sigma-algebra.

By hypothesis and countable union, we see that for -almost every , the pointwise ergodic theorem holds in for every . By a limiting argument, that shows that the mean ergodic theorem holds in for every , hence by Theorem 3, is ergodic.

7 October, 2018 at 11:58 am

Abhishek Khetanhek

Thank you for the detailed response.

6 March, 2019 at 8:30 am

Yuping Zhang

My conclusion: Dr. Terence Tao’s blue-eyed islanders puzzle only works out if N=2, N=1, and in the condition N=0 (see below).

Example 1) explains the two colors in the puzzle.

The case: 10 people and N=1, and N= is the only person whose eyes are not green. The remaining 9 have green eyes.

Someone announced: “At least one of the ten has green eyes.”

On the 9th day, can all of the 9 green-eyed people figure out their eye color logically? No.

Example 2) explains the two colors in the puzzle.

The case: 10 people and N=0, i.e. none of the ten has green eyes.

Someone announced: “At least one of the ten has no green eyes.”

On the 10th day, can all of the 10 people without green eyes figure out their eye color logically? Yes.

Why? Again!

The crucial key is that each of the 10 sees the same single color on the same number of people (9); therefore, it is not the leaving each day that tells them more and more.

But this is not so in Dr. Terence Tao’s blue-eyed islanders puzzle.

15 March, 2019 at 8:16 am

Maths student

Dear Prof. Tao,

in the proof of uniqueness in the disintegration theorem, do you really mean to say that is regular? Or do you mean that under a suitable metric, is separable?

15 March, 2019 at 8:22 am

Maths student

I see, what you mean is that regularity is in fact a property of the $\sigma$-algebra. Perhaps this should be pointed out.

[See Definition 2 – T.]

11 January, 2020 at 7:45 am

Radu Zaharopol

Dear Professor Tao,

Have you had a chance to see Daniel Worm’s Ph.D. thesis (Leiden 2010) and my 2014 Springer monograph?

In these works we discuss the KBBY (Kryloff-Bogoliouboff-Beboutoff-Yosida) ergodic decomposition. The KBBY decomposition is obtained in a constructive manner, while the decomposition based on the Choquet-Phelps approach is highly nonconstructive.