We continue our study of basic ergodic theorems, establishing the maximal and pointwise ergodic theorems of Birkhoff. Using these theorems, we can then give several equivalent notions of the fundamental concept of ergodicity, which (roughly speaking) plays the role in measure-preserving dynamics that minimality plays in topological dynamics. A general measure-preserving system is not necessarily ergodic, but we shall introduce the ergodic decomposition, which allows one to express any non-ergodic measure as an average of ergodic measures (generalising the decomposition of a permutation into disjoint cycles).
– The maximal ergodic theorem –
Just as we derived the mean ergodic theorem from the more abstract von Neumann ergodic theorem in the previous lecture, we shall derive the maximal ergodic theorem from the following abstract maximal inequality.
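Theorem 1. (Dunford-Schwartz maximal inequality) Let $(X, \mathcal{X}, \mu)$ be a probability space, and let $P: L^1(\mu) \to L^1(\mu)$ be a linear operator with $P1 = 1$ and $\|P\|_{L^1(\mu) \to L^1(\mu)} \leq 1$ (i.e. $\|Pf\|_{L^1(\mu)} \leq \|f\|_{L^1(\mu)}$ for all $f \in L^1(\mu)$). Assume also that P maps non-negative functions to non-negative functions. Then the maximal function $f^* := \sup_{N \geq 1} \frac{1}{N} \sum_{n=0}^{N-1} P^n f$ obeys the inequality
$$\mu(\{ x \in X: f^*(x) > \lambda \}) \leq \frac{1}{\lambda} \int_{\{f^* > \lambda\}} f\, d\mu \qquad (1)$$
for any real-valued $f \in L^1(\mu)$ and any $\lambda > 0$.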
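Proof. We can rewrite (1) as
$$\int_{\{f^* > \lambda\}} (f - \lambda)\, d\mu \geq 0. \qquad (2)$$
Since $(f - \lambda)^* = f^* - \lambda$ (because $P1 = 1$), we thus see (by replacing f with $f - \lambda$) that we can reduce to proving (2) in the case $\lambda = 0$.
For every $m \geq 0$, consider the modified maximal function $F_m := \max\big(0, \max_{1 \leq N \leq m} \sum_{n=0}^{N-1} P^n f\big)$ (so that $F_0 = 0$). Observe that $f^*(x) > 0$ if and only if $F_m(x) > 0$ for all sufficiently large m. By the dominated convergence theorem, it thus suffices to show that
$$\int_{\{F_m > 0\}} f\, d\mu \geq 0 \qquad (3)$$
for all $m \geq 1$. But observe from the definition of $F_m$ (and the positivity preserving nature of P) that we have the pointwise recursive inequality
$$F_{m+1} \leq \max(0, f + P F_m). \qquad (4)$$
Integrating this on the region $\{F_{m+1} > 0\}$ and using the non-negativity of $P F_m$, we obtain
$$\int_{\{F_{m+1} > 0\}} f\, d\mu \geq \int_X F_{m+1}\, d\mu - \int_X P F_m\, d\mu. \qquad (6)$$
Since $\int_X P F_m\, d\mu \leq \|F_m\|_{L^1(\mu)} = \int_X F_m\, d\mu$ and $F_m \leq F_{m+1}$, the claim follows.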
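As an informal numerical check of Theorem 1 (our own illustration, not part of the lecture), the following Python sketch works on a finite set with uniform probability measure. It assumes P is given by a doubly stochastic matrix. Non-negative entries make P positive, row sums 1 give $P1 = 1$, and column sums 1 give $\|P\|_{L^1 \to L^1} \leq 1$. The proof establishes (3) for every truncation level m, so the sketch compares both sides of (1) with $f^*$ replaced by the truncated maximal function $\max_{1 \leq N \leq M} \frac{1}{N} \sum_{n<N} P^n f$. All function names are hypothetical.

```python
import random


def random_doubly_stochastic(n, terms=5, seed=0):
    """Random convex combination of permutation matrices (hence doubly stochastic)."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(terms)]
    total = sum(weights)
    P = [[0.0] * n for _ in range(n)]
    for w in weights:
        perm = list(range(n))
        rng.shuffle(perm)
        for i in range(n):
            P[i][perm[i]] += w / total
    return P


def apply_operator(P, f):
    """(Pf)(i) = sum_j P[i][j] f(j)."""
    return [sum(p * fj for p, fj in zip(row, f)) for row in P]


def truncated_maximal(P, f, M):
    """f*_M(i) = max over 1 <= N <= M of (1/N) * sum_{n<N} (P^n f)(i)."""
    partial = [0.0] * len(f)
    best = [float("-inf")] * len(f)
    g = list(f)  # g holds P^n f, starting at n = 0
    for N in range(1, M + 1):
        partial = [s + gi for s, gi in zip(partial, g)]
        best = [max(b, s / N) for b, s in zip(best, partial)]
        g = apply_operator(P, g)
    return best


def maximal_inequality_sides(P, f, lam, M):
    """(lhs, rhs) of lam * mu(f*_M > lam) <= int_{f*_M > lam} f dmu, with mu uniform."""
    n = len(f)
    star = truncated_maximal(P, f, M)
    region = [i for i in range(n) if star[i] > lam]
    return lam * len(region) / n, sum(f[i] for i in region) / n


rng = random.Random(1)
P = random_doubly_stochastic(12)
f = [rng.uniform(-1.0, 2.0) for _ in range(12)]
print(maximal_inequality_sides(P, f, lam=0.5, M=20))  # lhs <= rhs, as Theorem 1 predicts
```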
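Applying this in the case when P is a shift operator T, and replacing f by |f|, we obtain
Corollary 1. (Maximal ergodic theorem) Let $(X, \mathcal{X}, \mu, T)$ be a measure-preserving system. Then for any $f \in L^1(\mu)$ and $\lambda > 0$ one has
$$\mu\Big(\Big\{ x \in X: \sup_{N \geq 1} \frac{1}{N} \sum_{n=0}^{N-1} |T^n f(x)| > \lambda \Big\}\Big) \leq \frac{1}{\lambda} \|f\|_{L^1(\mu)}. \qquad (7)$$
Note that this inequality implies Markov’s inequality
$$\mu(\{ x \in X: |f(x)| > \lambda \}) \leq \frac{1}{\lambda} \|f\|_{L^1(\mu)} \qquad (8)$$
as a special case (take N = 1). Applying the real interpolation method, one also easily deduces the maximal inequality
$$\Big\| \sup_{N \geq 1} \frac{1}{N} \sum_{n=0}^{N-1} |T^n f| \Big\|_{L^p(\mu)} \leq C_p \|f\|_{L^p(\mu)} \qquad (9)$$
for all $1 < p \leq \infty$, where the constant $C_p$ depends on p (it blows up like $O(1/(p-1))$ in the limit $p \to 1$).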
Exercise 1 (Rising sun inequality). If $f \in \ell^1(\mathbb{Z})$ is real-valued, and $f^*(n) := \sup_{N \geq 1} \frac{1}{N} \sum_{m=0}^{N-1} f(n+m)$, establish the rising sun inequality
$$\lambda\, |\{ n \in \mathbb{Z}: f^*(n) > \lambda \}| \leq \sum_{n: f^*(n) > \lambda} f(n) \qquad (10)$$
for any $\lambda > 0$. (Hint: one can either adapt the proof of Theorem 1, or else partition the set appearing in (10) into disjoint intervals. The latter proof also leads to a proof of Corollary 1 which avoids the Dunford-Schwartz trick of introducing the functions $F_m$. The terminology “rising sun” comes from seeing how these intervals interact with the graph of the partial sums of f, which resembles the shadows cast on a hilly terrain by a rising sun.)
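The inequality (10) can be probed numerically in the same spirit. The sketch below assumes f is supported on {0, ..., K-1} and stored as a Python list. Two observations of our own keep the computation finite. First, no $n < -\lceil \|f\|_{\ell^1} / \lambda \rceil$ can satisfy $f^*(n) > \lambda$. Second, for each n only averages with $N \leq K - n$ need to be examined, because beyond the support the partial sum is frozen, so the averages either decrease or stay non-positive. The helper name is hypothetical.

```python
import math


def rising_sun_sides(f, lam):
    """Return (lhs, rhs) of (10) for f supported on {0, ..., K-1} (zero elsewhere), lam > 0."""
    K = len(f)
    pad = math.ceil(sum(abs(v) for v in f) / lam)  # n < -pad cannot satisfy f*(n) > lam

    def val(n):
        return f[n] if 0 <= n < K else 0

    hot = []  # the set {n : f*(n) > lam}
    for n in range(-pad, K):
        partial, best = 0, float("-inf")
        for N in range(1, K - n + 1):  # larger N only repeat the final partial sum
            partial += val(n + N - 1)
            best = max(best, partial / N)
        if best > lam:
            hot.append(n)
    return lam * len(hot), sum(val(n) for n in hot)


print(rising_sun_sides([3], 1))  # (2, 3): f*(n) > 1 exactly for n = -1, 0
```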
Exercise 2. (Transference principle) Show that Corollary 1 can be deduced directly from (10). (Hint: given $f \in L^1(\mu)$, apply (10) to the functions $n \mapsto |T^n f(x)|$ for each $x \in X$ (truncating the integers to a finite set if necessary), and then integrate in x using Fubini’s theorem.) This is an example of a transference principle between maximal inequalities on $\mathbb{Z}$ and maximal inequalities on measure-preserving systems.
Exercise 3 (Stein-Stromberg maximal inequality). Derive a continuous version of the Dunford-Schwartz maximal inequality, in which the operators $P^n$ are replaced by a semigroup $(P^t)_{t \geq 0}$ acting on both $L^1$ and $L^\infty$, in which the underlying measure space is only assumed to be $\sigma$-finite rather than a probability space, and the averages $\frac{1}{N} \sum_{n=0}^{N-1} P^n f$ are replaced by $\frac{1}{T} \int_0^T P^t f\, dt$. Apply this continuous version with $P^t := e^{t\Delta}$ equal to the heat operator on $\mathbb{R}^d$ for $t > 0$ to deduce the Stein-Stromberg maximal inequality
$$m\Big(\Big\{ x \in \mathbb{R}^d: \sup_{R > 0} \frac{1}{m(B(x,R))} \int_{B(x,R)} |f|\, dm > \lambda \Big\}\Big) \leq \frac{Cd}{\lambda} \|f\|_{L^1(\mathbb{R}^d)} \qquad (11)$$
for all $f \in L^1(\mathbb{R}^d)$ and $\lambda > 0$, where m is Lebesgue measure, $B(x,R)$ is the Euclidean ball of radius R centred at x, and the constant C is absolute (independent of d). This improves upon the Hardy-Littlewood maximal inequality, which gives the same estimate but with $Cd$ replaced by $C^d$. It is an open question whether the dependence on d can be removed entirely; the estimate (11) is still the best known in high dimension. For d=1, the best constant C is known to be $\frac{11 + \sqrt{61}}{12}$, a result of Melas.
Remark 1. The study of maximal inequalities in ergodic theory is, of course, a subject in itself; a classical reference is this monograph of Stein.
– The pointwise ergodic theorem –
Using the maximal ergodic theorem and a standard limiting argument we can now deduce
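Theorem 2 (Pointwise ergodic theorem). Let $(X, \mathcal{X}, \mu, T)$ be a measure-preserving system, and let $f \in L^1(\mu)$. Then for $\mu$-almost every $x \in X$, $\frac{1}{N} \sum_{n=0}^{N-1} T^n f(x)$ converges to $\mathbb{E}(f|\mathcal{X}^T)(x)$ (where $\mathcal{X}^T$ is the $\sigma$-algebra of T-invariant sets).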
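Proof. By subtracting $\mathbb{E}(f|\mathcal{X}^T)$ from f if necessary, it suffices to show that
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} T^n f = 0 \qquad (12)$$
a.e. whenever $\mathbb{E}(f|\mathcal{X}^T) = 0$. By telescoping series, (12) is already true when f takes the form $f = Tg - g$ for some $g \in L^\infty(\mu)$. So by the arguments used to prove the von Neumann ergodic theorem from the previous lecture, we have already established the claim for a dense class of functions f in $L^2(\mu)$ with $\mathbb{E}(f|\mathcal{X}^T) = 0$, and thus also for a dense class of functions in $L^1(\mu)$ with $\mathbb{E}(f|\mathcal{X}^T) = 0$ (since the former space is dense in the latter, and the $L^2$ norm controls the $L^1$ norm by the Cauchy-Schwarz inequality).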
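The telescoping step can be spelled out as follows. This is a routine verification, not part of the original text. Take $g \in L^\infty(\mu)$, and note that T preserves the $L^\infty$ norm because it is measure-preserving. The averages of $Tg - g$ are then uniformly $O(1/N)$ almost everywhere:

```latex
\frac{1}{N} \sum_{n=0}^{N-1} T^n (Tg - g) = \frac{T^N g - g}{N},
\qquad
\Big\| \frac{T^N g - g}{N} \Big\|_{L^\infty(\mu)} \leq \frac{2 \|g\|_{L^\infty(\mu)}}{N} \to 0 .
```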
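Now we use a standard limiting argument. Let $f \in L^1(\mu)$ with $\mathbb{E}(f|\mathcal{X}^T) = 0$. Then we can find a sequence $f_j$ in the above dense class which converges in $L^1(\mu)$ to f. For almost every x, we thus have
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} T^n f_j(x) = 0 \qquad (13)$$
for all j, and so by the triangle inequality we have
$$\limsup_{N \to \infty} \Big| \frac{1}{N} \sum_{n=0}^{N-1} T^n f(x) \Big| \leq \sup_{N \geq 1} \frac{1}{N} \sum_{n=0}^{N-1} |T^n (f - f_j)(x)|. \qquad (14)$$
But by Corollary 1 we see that the right-hand side of (14) converges to zero in measure as $j \to \infty$. Since the left-hand side does not depend on j, it must vanish almost everywhere, as required.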
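To see Theorem 2 in a concrete case, here is a minimal Python sketch (our own illustration) for the circle shift $x \mapsto x + \alpha$ with $f = 1_{[0,1/3)}$. It averages f along forward orbits $x_0, x_0 + \alpha, x_0 + 2\alpha, \ldots$. For the irrational choice $\alpha = \sqrt{2} - 1$ the averages approach $\int f = 1/3$. For the rational $\alpha = 1/2$ the system is not ergodic, and the averages converge instead to $\mathbb{E}(f|\mathcal{X}^T)(x_0) = \frac{f(x_0) + f(x_0 + 1/2)}{2}$, which equals 1/2 at $x_0 = 0.1$. The floating-point orbit only approximates the true one, and the helper names are hypothetical.

```python
import math


def birkhoff_average(f, alpha, x0, N):
    """(1/N) * sum_{n<N} f(x0 + n*alpha mod 1) along the circle shift x -> x + alpha."""
    return sum(f((x0 + n * alpha) % 1.0) for n in range(N)) / N


def indicator(x):
    """Indicator function of [0, 1/3); its integral is 1/3."""
    return 1.0 if x < 1 / 3 else 0.0


alpha = math.sqrt(2) - 1
for x0 in (0.0, 0.1, 0.7):
    print(x0, birkhoff_average(indicator, alpha, x0, 100_000))  # each close to 1/3

print(birkhoff_average(indicator, 0.5, 0.1, 1000))  # 0.5: the rational shift is not ergodic
```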
Remark 2. More generally, one can derive a pointwise convergence result on a class of rough functions by first establishing convergence for a dense subclass of functions, and then establishing a maximal inequality which is strong enough to allow one to take limits and establish pointwise convergence for all functions in the larger class. Conversely, principles such as Stein’s maximal principle indicate that in many cases this is in some sense the only way to establish such pointwise convergence results for rough functions.
Remark 3. Using the dominated convergence theorem (starting first with bounded functions f in order to get the domination), one can deduce the mean ergodic theorem from the pointwise ergodic theorem. But the converse is significantly more difficult; pointwise convergence for various ergodic averages is often a much harder result to establish than the corresponding norm convergence result (in particular, many of the techniques discussed in this course appear to be of sharply limited utility for pointwise convergence problems), and many questions in this area remain open.
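Exercise 4 (Lebesgue differentiation theorem). Let $f \in L^1(\mathbb{R})$ with Lebesgue measure dm. Show that for almost every $x \in \mathbb{R}$, we have $\lim_{r \to 0} \frac{1}{r} \int_x^{x+r} |f(y) - f(x)|\, dy = 0$, and in particular that $\lim_{r \to 0} \frac{1}{r} \int_x^{x+r} f(y)\, dy = f(x)$.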
– Ergodicity –
Combining the mean ergodic theorem with the pointwise ergodic theorem (and with Exercises 7, 8 from the previous lecture) we have
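Theorem 3 (Characterisations of ergodicity) Let $(X, \mathcal{X}, \mu, T)$ be a measure-preserving system. Then the following are equivalent:
1. Any set $E \in \mathcal{X}$ which is invariant (thus TE=E) has either full measure $\mu(E) = 1$ or zero measure $\mu(E) = 0$.
2. Any set $E \in \mathcal{X}$ which is almost invariant (thus TE differs from E by a null set) has either full measure or zero measure.
3. Any measurable function f with $Tf = f$ a.e. is constant a.e.
4. For any $1 \leq p < \infty$ and $f \in L^p(\mu)$, the averages $\frac{1}{N} \sum_{n=0}^{N-1} T^n f$ converge in $L^p(\mu)$ norm to $\int_X f\, d\mu$.
5. For any two $f, g \in L^2(\mu)$, we have $\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \langle T^n f, g \rangle = \langle f, 1 \rangle \langle 1, g \rangle$.
6. For any two measurable sets E and F, we have $\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu(T^n E \cap F) = \mu(E) \mu(F)$.
7. For any $f \in L^1(\mu)$, the averages $\frac{1}{N} \sum_{n=0}^{N-1} T^n f$ converge pointwise almost everywhere to $\int_X f\, d\mu$.
A measure-preserving system with any (and hence all) of the above properties is said to be ergodic.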
Remark 4. Strictly speaking, ergodicity is a property that applies to a measure-preserving system $(X, \mathcal{X}, \mu, T)$. However, we shall sometimes abuse notation and apply the adjective “ergodic” to a single component of a system, such as the measure $\mu$ or the shift T, when the other three components of the system are clear from context.
Here are some simple examples of ergodicity:
Example 1. If X is finite with uniform measure, then a shift map $T: X \to X$ is ergodic if and only if it is a single cycle.
Example 2. If a shift T is ergodic, then so is $T^{-1}$. However, from Example 1 we see that it is not necessarily true that $T^n$ is ergodic for all $n \geq 1$ (this latter property is also known as total ergodicity).
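Examples 1 and 2 can be checked mechanically for permutations of a finite set with uniform measure. Every cycle is an invariant set, and with uniform measure each cycle has positive measure, so ergodicity amounts to there being exactly one cycle. The following minimal Python sketch uses hypothetical helper names and represents T as a list with `perm[i]` $= T(i)$. It confirms that a 4-cycle and its inverse are ergodic, while its square splits into two cycles.

```python
def cycles(perm):
    """Disjoint cycle decomposition of the permutation i -> perm[i] of range(len(perm))."""
    seen = [False] * len(perm)
    result = []
    for start in range(len(perm)):
        if not seen[start]:
            cycle, i = [], start
            while not seen[i]:
                seen[i] = True
                cycle.append(i)
                i = perm[i]
            result.append(cycle)
    return result


def is_ergodic(perm):
    """With uniform measure every cycle is an invariant set of positive measure,
    so T is ergodic iff there is exactly one cycle (Example 1)."""
    return len(cycles(perm)) == 1


def inverse(perm):
    inv = [0] * len(perm)
    for i, j in enumerate(perm):
        inv[j] = i
    return inv


def power(perm, n):
    """The n-th iterate T^n, again as a list."""
    result = list(range(len(perm)))
    for _ in range(n):
        result = [perm[i] for i in result]
    return result


four_cycle = [1, 2, 3, 0]
print(is_ergodic(four_cycle), is_ergodic(inverse(four_cycle)))  # True True
print(is_ergodic(power(four_cycle, 2)))  # False: T^2 splits into {0, 2} and {1, 3}
```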
Exercise 5. Show that the circle shift $x \mapsto x + \alpha$ on $\mathbb{R}/\mathbb{Z}$ (with the usual Lebesgue measure) is ergodic if and only if $\alpha$ is irrational. (Hint: analyse the equation $f(x + \alpha) = f(x)$ for (say) $f \in L^2(\mathbb{R}/\mathbb{Z})$ using Fourier analysis. Added, Feb 21: As pointed out to me in class, another way to proceed is to use the Lebesgue density theorem (or Lebesgue differentiation theorem) combined with Exercise 14 from Lecture 6.)
Exercise 6. Let $(\Omega, \mathcal{B}, \nu)$ be a standard Borel probability space. Show that the Bernoulli shift on the product system $(\Omega^{\mathbb{Z}}, \mathcal{B}^{\mathbb{Z}}, \nu^{\mathbb{Z}})$ is ergodic. (Hint: first establish property 6 of Theorem 3 when E and F each depend on only finitely many of the coordinates of $\Omega^{\mathbb{Z}}$.)
Exercise 7. Let $(X, \mathcal{X}, \mu, T)$ be an ergodic system. Show that if $\lambda$ is an eigenvalue of the shift operator T on $L^2(\mu)$, then $|\lambda| = 1$, the eigenspace $\{ f \in L^2(\mu): Tf = \lambda f \}$ is one-dimensional, and every eigenfunction f has constant magnitude |f| a.e. Show that the eigenspaces are orthogonal to each other in $L^2(\mu)$, and the set of all eigenvalues of T forms an at most countable subgroup of the unit circle $S^1 := \{ z \in \mathbb{C}: |z| = 1 \}$.
Now we give a less trivial example of an ergodic system.
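Proposition 1. (Ergodicity of skew shift) Let $\alpha \in \mathbb{R}$ be irrational. Then the skew shift $(x, y) \mapsto (x + \alpha, y + x)$ on $(\mathbb{R}/\mathbb{Z})^2$ (with uniform measure) is ergodic.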
Proof. Write the skew shift system as $(X, \mathcal{X}, \mu, T)$. To simplify the notation we shall omit the phrase “almost everywhere” in what follows.
We use an argument of Parry. If the system is not ergodic, then we can find a non-constant $f \in L^2(\mu)$ such that Tf = f. Next, we use Fourier analysis to write $f = \sum_{m \in \mathbb{Z}} f_m$, where $f_m(x, y) := e^{2\pi i m y} \int_{\mathbb{R}/\mathbb{Z}} f(x, y') e^{-2\pi i m y'}\, dy'$. Since f is T-invariant, and the vertical rotations $(x, y) \mapsto (x, y + h)$ commute with T, we see that the $f_m$ are also T-invariant. The function $f_0$ depends only on the x variable, and so is constant by Exercise 5. So it suffices to show that $f_m$ is zero for all non-zero m.
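Fix $m \neq 0$. We can factorise $f_m(x, y) = g_m(x) e^{2\pi i m y}$ for some $g_m \in L^2(\mathbb{R}/\mathbb{Z})$. The T-invariance of $f_m$ now implies that $g_m(x + \alpha) e^{2\pi i m x} = g_m(x)$; in particular $|g_m|$ is invariant under the circle shift and hence constant by Exercise 5, so $g_m$ is bounded. If we then define $g_{m,h}(x) := g_m(x + h) \overline{g_m(x)}$ for $h \in \mathbb{R}/\mathbb{Z}$, we see that $g_{m,h}(x + \alpha) = e^{-2\pi i m h} g_{m,h}(x)$, thus $g_{m,h}$ is an eigenfunction of the circle shift, and for $0 < |h| < 1/|m|$ its eigenvalue differs from that of the invariant function $g_{m,0} = |g_m|^2$. But this implies (by Exercise 7) that $g_{m,h}$ is orthogonal to $g_{m,0}$ for all such h close to zero. Taking limits as $h \to 0$ (using the boundedness of $g_m$ and the continuity of translation in $L^2$) we see that $g_{m,0}$ is orthogonal to itself and must vanish; this implies that $g_m$ and hence $f_m$ vanish as well, as desired.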
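The identity for $g_{m,h}$ used above follows from the relation $g_m(x + \alpha) = e^{-2\pi i m x} g_m(x)$, which is a rearrangement of the invariance equation. For convenience, here is the computation written out, together with the range of h for which the multiplier differs from the one for h = 0:

```latex
\begin{aligned}
g_{m,h}(x + \alpha) &= g_m(x + h + \alpha)\, \overline{g_m(x + \alpha)} \\
  &= e^{-2\pi i m (x + h)} g_m(x + h) \cdot e^{2\pi i m x}\, \overline{g_m(x)} \\
  &= e^{-2\pi i m h}\, g_{m,h}(x),
\end{aligned}
\qquad
e^{-2\pi i m h} \neq 1 = e^{-2\pi i m \cdot 0} \quad \text{whenever } 0 < |h| < 1/|m| .
```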
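Exercise 8. Show that for any irrational $\alpha$ and any $d \geq 1$, the iterated skew shift system $(x_1, \ldots, x_d) \mapsto (x_1 + \alpha, x_2 + x_1, \ldots, x_d + x_{d-1})$ on $(\mathbb{R}/\mathbb{Z})^d$ is ergodic.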
– Generic points –
Now let us suppose that we have a topological measure-preserving system $(X, \mathcal{F}, \mathcal{X}, \mu, T)$, i.e. a measure-preserving system $(X, \mathcal{X}, \mu, T)$ which is also a topological dynamical system $(X, \mathcal{F}, T)$, with $\mathcal{X}$ the Borel $\sigma$-algebra of the topology $\mathcal{F}$. Then we have the space C(X) of continuous (real or complex-valued) functions on X, which is dense inside $L^2(\mu)$. From the Stone-Weierstrass theorem we also see that C(X) is separable.
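A sequence $x_1, x_2, x_3, \ldots$ in X is said to be uniformly distributed with respect to $\mu$ if we have
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} f(x_n) = \int_X f\, d\mu \qquad (15)$$
for all $f \in C(X)$. A point x in X is said to be generic if the forward orbit $x, Tx, T^2 x, \ldots$ is uniformly distributed.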
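Exercise 9. Let $(X, \mathcal{F})$ be a compact metrisable space with a Borel probability measure $\mu$, and let $x_1, x_2, \ldots$ be a sequence in X. Show that this sequence is uniformly distributed if and only if $\lim_{N \to \infty} \frac{1}{N} |\{ 1 \leq n \leq N: x_n \in U \}| = \mu(U)$ for all open sets U in X with $\mu(\partial U) = 0$. What happens if the hypothesis that the boundary of U has measure zero is removed?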
From Theorem 3 and the separability of C(X) we obtain
Proposition 2. A topological measure-preserving system is ergodic if and only if almost every point is generic.
A topological measure-preserving system is said to be uniquely ergodic if every point is generic. The following exercise explains the terminology:
Exercise 10. Show that a topological measure-preserving system is uniquely ergodic if and only if the only T-invariant Borel probability measure on X is $\mu$. (Hint: use Lemma 1 from Lecture 7.) Because of this fact, one can sensibly define what it means for a topological dynamical system $(X, \mathcal{F}, T)$ to be uniquely ergodic, namely that it has a unique T-invariant Borel probability measure.
It is not always the case that an ergodic system is uniquely ergodic. For instance, in the Bernoulli system $\{0,1\}^{\mathbb{Z}}$ (with uniform measure on $\{0,1\}$, say), the point $(\ldots, 0, 0, 0, \ldots)$ is fixed by the shift and is therefore not generic. However, for more algebraic systems, it turns out that ergodicity and unique ergodicity are largely equivalent. We illustrate this with the circle and skew shifts:
Exercise 11. Show that the circle shift $x \mapsto x + \alpha$ (with the usual Lebesgue measure) is uniquely ergodic if and only if $\alpha$ is irrational. (Hint: first show in the circle shift system that any translate of a generic point is generic.)
Proposition 3. (Unique ergodicity of skew shift) Let $\alpha \in \mathbb{R}$ be irrational. Then the skew shift $(x, y) \mapsto (x + \alpha, y + x)$ on $(\mathbb{R}/\mathbb{Z})^2$ is uniquely ergodic.
Proof. We use an argument of Furstenberg. We again write the skew shift as $(X, \mathcal{X}, \mu, T)$. Suppose this system was not uniquely ergodic; then by Exercise 10 there is another shift-invariant Borel probability measure $\mu' \neq \mu$. If we push $\mu$ and $\mu'$ down to the circle shift system $(\mathbb{R}/\mathbb{Z}, x \mapsto x + \alpha)$ by the projection map $\pi: (x, y) \mapsto x$, then by Exercises 10, 11 we must get the same measure. Thus $\mu$ and $\mu'$ must agree on any set of the form $A \times \mathbb{R}/\mathbb{Z}$.
Let E denote the points in $(\mathbb{R}/\mathbb{Z})^2$ which are generic with respect to $\mu$; note that this set is Borel measurable. By Proposition 2, this set has full measure in $\mu$. Also, since the vertical rotations $(x, y) \mapsto (x, y + h)$ commute with T and preserve $\mu$, we see that E must be invariant under such rotations; thus it is of the form $A \times \mathbb{R}/\mathbb{Z}$ for some A. By the preceding discussion, we conclude that E also has full measure in $\mu'$. But then (by the pointwise or mean ergodic theorem for $\mu'$) we conclude that $\mathbb{E}_{\mu'}(f|\mathcal{X}^T) = \int_X f\, d\mu$ $\mu'$-almost everywhere for every continuous f (the conditional expectation being taken with respect to $\mu'$), and thus on integrating with respect to $\mu'$ we obtain $\int_X f\, d\mu' = \int_X f\, d\mu$ for every continuous f. But then by the Riesz representation theorem we have $\mu' = \mu$, a contradiction.
Corollary 2. If $\alpha$ is irrational, then the sequence $(\alpha n^2 \bmod 1)_{n \geq 1}$ is uniformly distributed in $\mathbb{R}/\mathbb{Z}$ (with respect to uniform measure).
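One way to see how Corollary 2 follows from Proposition 3 (our own unpacking, not quoted from the lecture) is to apply the proposition with $2\alpha$, which is also irrational, in place of $\alpha$. Iterating the skew shift $(x, y) \mapsto (x + 2\alpha, y + x)$ from the point $(\alpha, 0)$ gives $y_n = \sum_{k<n} (2k+1)\alpha = n^2 \alpha \bmod 1$. Every point is generic, so testing against continuous functions of y alone gives the corollary. The Python sketch below takes $\alpha = \sqrt{2}$ as an arbitrary choice and uses hypothetical helper names. It checks the orbit formula numerically and counts visits to [0, 1/2). Because of floating-point rounding it is a heuristic check only.

```python
import math


def skew_orbit_y(alpha, N):
    """y-coordinates of the first N points of the orbit of (alpha, 0)
    under the skew shift (x, y) -> (x + 2*alpha, y + x) on (R/Z)^2."""
    x, y = alpha % 1.0, 0.0
    ys = []
    for _ in range(N):
        ys.append(y)
        x, y = (x + 2 * alpha) % 1.0, (y + x) % 1.0  # right side uses the old x
    return ys


def circle_distance(a, b):
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


alpha = math.sqrt(2)
ys = skew_orbit_y(alpha, 10_000)
# y_n should agree with n^2 * alpha mod 1 up to rounding error
print(max(circle_distance(ys[n], (n * n * alpha) % 1.0) for n in range(len(ys))))
# proportion of n < N with n^2 * alpha mod 1 in [0, 1/2); should be near 1/2
print(sum(1 for y in ys if y < 0.5) / len(ys))
```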
Exercise 11a. Show that the systems considered in Exercise 8 are uniquely ergodic. Conclude that the exponent 2 in Corollary 2 can be replaced by any positive integer d.
Note that the topological dynamics theory developed in Lecture 6 only establishes the weaker statement that the above sequence is dense in $\mathbb{R}/\mathbb{Z}$ rather than uniformly distributed. More generally, it seems that ergodic theory methods can prove topological dynamics results, but not vice versa. Here is another simple example of the same phenomenon:
Exercise 12. Show that a uniquely ergodic topological dynamical system (with the support of the measure equal to the whole space) is necessarily minimal. (The converse is not necessarily true, as already mentioned in Remark 6 of Lecture 7.)
– The ergodic decomposition –
Just as not every topological dynamical system is minimal, not every measure-preserving system is ergodic. Nevertheless, there is an important decomposition that allows one to represent non-ergodic measures as averages of ergodic measures. One can already see this in the finite case, when X is a finite set with the discrete $\sigma$-algebra, and $T: X \to X$ is a permutation on X, which can be decomposed as the disjoint union of cycles on a partition $X = C_1 \cup \ldots \cup C_k$ of X. In this case, all shift-invariant probability measures take the form
$$\mu = \sum_{j=1}^k c_j \mu_j \qquad (16)$$
where $\mu_j$ is the uniform probability measure on the cycle $C_j$, and $c_1, \ldots, c_k$ are non-negative constants adding up to 1. Each of the $\mu_j$ is ergodic, but no non-trivial linear combination of these measures (i.e. one with at least two non-zero $c_j$) is ergodic. Thus we see in the finite case that every shift-invariant measure can be uniquely expressed as a convex combination of ergodic measures.
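The finite case of (16) is easy to compute. The weights are $c_j = \mu(C_j)$, and $\mu_j$ is the normalised counting measure on $C_j$. Below is a minimal Python sketch, assuming T is given as a list `perm` with `perm[i]` $= T(i)$ and $\mu$ as a list of exact fractions. The helper names are hypothetical. The sketch also uses the fact that for a permutation, invariance of $\mu$ just means $\mu$ is constant along each cycle.

```python
from fractions import Fraction


def cycles(perm):
    """Disjoint cycles of the permutation i -> perm[i]."""
    seen, result = [False] * len(perm), []
    for start in range(len(perm)):
        if not seen[start]:
            cycle, i = [], start
            while not seen[i]:
                seen[i] = True
                cycle.append(i)
                i = perm[i]
            result.append(cycle)
    return result


def ergodic_decomposition(perm, mu):
    """Return [(c_j, mu_j)] with mu = sum_j c_j mu_j as in (16), for a T-invariant mu."""
    n = len(perm)
    # For a permutation, mu is invariant iff mu({T(i)}) = mu({i}) for every point i.
    if any(mu[perm[i]] != mu[i] for i in range(n)):
        raise ValueError("mu is not T-invariant")
    parts = []
    for cycle in cycles(perm):
        c = sum((mu[i] for i in cycle), Fraction(0))  # c_j = mu(C_j)
        mu_j = [Fraction(1, len(cycle)) if i in cycle else Fraction(0) for i in range(n)]
        parts.append((c, mu_j))
    return parts


perm = [1, 0, 3, 4, 2]  # cycles C_1 = {0, 1}, C_2 = {2, 3, 4}
mu = [Fraction(1, 10)] * 2 + [Fraction(4, 15)] * 3
parts = ergodic_decomposition(perm, mu)
print([c for c, _ in parts])  # [Fraction(1, 5), Fraction(4, 5)]
rebuilt = [sum(c * m[i] for c, m in parts) for i in range(len(mu))]
print(rebuilt == mu)  # True
```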
It turns out that a similar decomposition is available in general, at least if the underlying measure space is a compact topological space (or more generally, a Radon space). This is because of the following general theorem from measure theory.
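Definition 1 (Probability kernel). Let $(X, \mathcal{X})$ and $(Y, \mathcal{Y})$ be measurable spaces. A probability kernel $y \mapsto \mu_y$ from Y to X is an assignment of a probability measure $\mu_y$ on X to each $y \in Y$ in such a way that the map $y \mapsto \int_X f\, d\mu_y$ is measurable for every bounded measurable $f: X \to \mathbb{R}$.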
Example 3. Every measurable map $\phi: Y \to X$ induces a probability kernel $y \mapsto \delta_{\phi(y)}$. Every probability measure on X can be viewed as a probability kernel from a point to X. If $y \mapsto \mu_y$ and $x \mapsto \nu_x$ are two probability kernels from Y to Z and from X to Y respectively, their composition $x \mapsto \mu \circ \nu_x$ is also a probability kernel, where $\mu \circ \nu_x$ is the measure that assigns $\int_Y \mu_y(E)\, d\nu_x(y)$ to any measurable set E in Z. Thus one can view the class of measurable spaces and their probability kernels as a category, which includes the class of measurable spaces and their measurable maps as a subcategory.
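On finite sets with all subsets measurable, a probability kernel from Y to Z is just a row-stochastic matrix with rows indexed by Y, and the composition in Example 3 becomes a matrix product. The minimal Python sketch below illustrates both points of the example under this finite assumption, using exact fractions and hypothetical helper names. It shows that composing the kernels induced by two maps gives the kernel induced by the composite map, and that compositions of kernels remain probability kernels.

```python
from fractions import Fraction


def compose(mu, nu):
    """Composition of finite probability kernels stored as row-stochastic matrices.

    mu[y][z] = mu_y({z}) is a kernel from Y to Z and nu[x][y] = nu_x({y}) a kernel from X to Y.
    The result r[x][z] = sum_y mu_y({z}) nu_x({y}) is the measure assigning
    int_Y mu_y(E) d nu_x(y) to each E, i.e. the matrix product nu @ mu."""
    return [[sum(nu_x[y] * mu[y][z] for y in range(len(mu))) for z in range(len(mu[0]))]
            for nu_x in nu]


def kernel_from_map(phi, target_size):
    """Kernel y -> delta_{phi(y)} induced by a map phi (given as a list)."""
    return [[Fraction(int(phi[y] == z)) for z in range(target_size)] for y in range(len(phi))]


def is_probability_kernel(k):
    return all(min(row) >= 0 and sum(row) == 1 for row in k)


phi = [0, 2, 1]  # a map X -> Y with |X| = |Y| = 3
psi = [1, 0, 1]  # a map Y -> Z with |Z| = 2
composed = compose(kernel_from_map(psi, 2), kernel_from_map(phi, 3))
print(composed == kernel_from_map([psi[phi[x]] for x in range(3)], 2))  # True

half = Fraction(1, 2)
nu = [[half, half, 0], [0, 0, 1], [Fraction(1, 3)] * 3]  # a kernel from X to Y
print(is_probability_kernel(compose(kernel_from_map(psi, 2), nu)))  # True
```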
Definition 2. (Regular space) A measurable space $(X, \mathcal{X})$ is said to be regular if there exists a compact metrisable topology $\mathcal{F}$ on X for which $\mathcal{X}$ is the Borel $\sigma$-algebra.
Example 4. Every topological measure-preserving system is regular.
Remark 5. Measurable spaces $(X, \mathcal{X})$ in which $\mathcal{X}$ is the Borel $\sigma$-algebra of a Polish space (a topological space whose topology is generated by a separable complete metric) are known as standard Borel spaces. It is a non-trivial theorem from descriptive set theory that up to measurable isomorphism, there are only three types of standard Borel spaces: finite discrete spaces, countable discrete spaces, and the unit interval [0,1] with the usual Borel $\sigma$-algebra. From this one can see that regular spaces are the same as standard Borel spaces, though we will not need this fact here.
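Theorem 4 (Disintegration theorem). Let $(X, \mathcal{X}, \mu)$ and $(Y, \mathcal{Y}, \nu)$ be probability spaces, with $(X, \mathcal{X})$ regular. Let $\pi: X \to Y$ be a morphism (thus $\pi$ is measurable and $\pi_* \mu = \nu$). Then there exists a probability kernel $y \mapsto \mu_y$ from Y to X such that
$$\int_X f(x)\, g(\pi(x))\, d\mu(x) = \int_Y \Big( \int_X f\, d\mu_y \Big) g(y)\, d\nu(y) \qquad (17)$$
for any bounded measurable $f: X \to \mathbb{R}$ and $g: Y \to \mathbb{R}$. Also, for any such g, we have
$$g(\pi(x)) = g(y) \quad \text{for } \mu_y\text{-a.e. } x \qquad (18)$$
for $\nu$-a.e. y.
Furthermore, this probability kernel is unique up to $\nu$-almost everywhere equivalence, in the sense that if $y \mapsto \mu'_y$ is another probability kernel with the same properties, then $\mu_y = \mu'_y$ for $\nu$-almost every $y \in Y$.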
We refer to the probability kernel $y \mapsto \mu_y$ generated by the above theorem as the disintegration of $\mu$ relative to the factor map $\pi$.
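Proof. We begin by proving uniqueness. Suppose we have two probability kernels $y \mapsto \mu_y$, $y \mapsto \mu'_y$ with the above properties. Then on subtraction we have
$$\int_Y \Big( \int_X f\, d\mu_y - \int_X f\, d\mu'_y \Big) g(y)\, d\nu(y) = 0 \qquad (19)$$
for all bounded measurable $f: X \to \mathbb{R}$, $g: Y \to \mathbb{R}$. Specialising to $f = 1_E$ for some measurable set $E \in \mathcal{X}$ (and letting g vary), we conclude that $\mu_y(E) = \mu'_y(E)$ for $\nu$-almost every y. Since $\mathcal{X}$ is regular, it is separable (i.e. countably generated), and we conclude that $\mu_y = \mu'_y$ for $\nu$-almost every y, as required.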
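Now we prove existence. The pullback map $\pi^*: L^2(Y, \mathcal{Y}, \nu) \to L^2(X, \mathcal{X}, \mu)$ defined by $\pi^* g := g \circ \pi$ has an adjoint $\pi_*: L^2(X, \mathcal{X}, \mu) \to L^2(Y, \mathcal{Y}, \nu)$, thus
$$\int_X f\, (g \circ \pi)\, d\mu = \int_Y (\pi_* f)\, g\, d\nu \qquad (20)$$
for all $f \in L^2(X, \mathcal{X}, \mu)$ and $g \in L^2(Y, \mathcal{Y}, \nu)$. It is easy to see from duality that we have $\pi_* 1 = 1$ and $\pi_* f \geq 0$ $\nu$-a.e. for all non-negative $f \in C(X)$ (where we select a compact metrisable topology that generates the regular $\sigma$-algebra $\mathcal{X}$). Recall that $\pi_* f$ is not quite a measurable function, but is instead an equivalence class of measurable functions modulo $\nu$-almost everywhere equivalence. Since C(X) is separable, we can find a measurable representative $\tilde \pi_* f$ of $\pi_* f$ for every $f \in C(X)$ which varies linearly with f, and is such that $\tilde \pi_* 1(y) = 1$ and $\tilde \pi_* f(y) \geq 0$ whenever $f \geq 0$, for all y outside of a set E of $\nu$-measure zero and for all $f \in C(X)$. For all such y, we can then apply the Riesz representation theorem to obtain a Radon probability measure $\mu_y$ such that
$$\tilde \pi_* f(y) = \int_X f\, d\mu_y \qquad (21)$$
for all $f \in C(X)$. We set $\mu_y$ equal to some arbitrarily fixed Radon probability measure for $y \in E$. We then observe that the required properties (including the measurability of $y \mapsto \int_X f\, d\mu_y$) are already obeyed for $f \in C(X)$. To generalise this to bounded measurable f, observe that the class $\mathcal{C}$ of f obeying the required properties is closed under dominated pointwise convergence, and so contains the indicator functions of open or compact sets (by Urysohn’s lemma). Applying dominated pointwise convergence again and inner and outer regularity, we see that the indicator function of any Borel set lies in $\mathcal{C}$. Thus all simple measurable functions lie in $\mathcal{C}$, and on taking uniform limits we obtain the claim.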
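Finally, we prove (18). From two applications of (17) we have
$$\int_Y \Big( \int_X f\, (g \circ \pi)\, d\mu_y \Big) h(y)\, d\nu(y) = \int_Y \Big( \int_X f\, d\mu_y \Big) g(y)\, h(y)\, d\nu(y) \qquad (22)$$
for all bounded measurable $f: X \to \mathbb{R}$ and $g, h: Y \to \mathbb{R}$. Since h is arbitrary, this gives $\int_X f\, (g \circ \pi)\, d\mu_y = g(y) \int_X f\, d\mu_y$ for $\nu$-a.e. y. The claim follows (using the separability of the space of all f, for instance by letting f range over the indicator functions of a countable algebra generating $\mathcal{X}$).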
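When X and Y are finite, with all subsets measurable (so that regularity is automatic), the disintegration is just conditioning on the fibres of $\pi$: $\mu_y(\{x\}) = \mu(\{x\}) / \nu(\{y\})$ for $x \in \pi^{-1}(\{y\})$. The Python sketch below is our own illustration and assumes exact fractions. It checks (17) on one example and checks that $\mu_y$ is carried by the fibre over y, which is the content of (18) in this setting. On $\nu$-null fibres any probability measure is allowed, reflecting the fact that the kernel is only unique $\nu$-almost everywhere; the sketch arbitrarily uses $\delta_0$ there. Helper names are hypothetical.

```python
from fractions import Fraction


def disintegrate(mu, pi, ny):
    """Disintegration of a probability vector mu on X = range(len(mu)) along pi: X -> range(ny).

    Returns (nu, kernel) with nu = pi_* mu and kernel[y][x] = mu_y({x}), which equals
    mu({x}) / nu({y}) on the fibre pi^{-1}(y) and 0 off it. Null fibres get delta_0."""
    nu = [Fraction(0)] * ny
    for x, p in enumerate(mu):
        nu[pi[x]] += p
    kernel = []
    for y in range(ny):
        if nu[y] > 0:
            kernel.append([mu[x] / nu[y] if pi[x] == y else Fraction(0) for x in range(len(mu))])
        else:
            kernel.append([Fraction(int(x == 0)) for x in range(len(mu))])
    return nu, kernel


def sides_of_17(mu, pi, nu, kernel, f, g):
    """Left- and right-hand sides of (17) for functions f on X and g on Y given as lists."""
    lhs = sum(f[x] * g[pi[x]] * mu[x] for x in range(len(mu)))
    rhs = sum(sum(f[x] * kernel[y][x] for x in range(len(mu))) * g[y] * nu[y]
              for y in range(len(nu)))
    return lhs, rhs


mu = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 4), Fraction(1, 4)]
pi = [0, 0, 1, 1]  # Y = {0, 1, 2}; the fibre over 2 is empty, so nu({2}) = 0
nu, kernel = disintegrate(mu, pi, 3)
print(sides_of_17(mu, pi, nu, kernel, f=[1, 2, 3, 4], g=[5, -1, 7]))
# (Fraction(29, 12), Fraction(29, 12))
```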
Exercise 13. Let the notation and assumptions be as in Theorem 4. Suppose that $(Y, \mathcal{Y})$ is also regular, and that the map $\pi$ is continuous with respect to some compact metrisable topologies that generate $\mathcal{X}$ and $\mathcal{Y}$ respectively. Show that for $\nu$-almost every y, the probability measure $\mu_y$ is supported in $\pi^{-1}(\{y\})$.
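Proposition 4 (Ergodic decomposition). Let $(X, \mathcal{X}, \mu, T)$ be a regular measure-preserving system. Let $(Y, \mathcal{Y}, \nu, S)$ be the system defined by $Y := X$, $\mathcal{Y} := \mathcal{X}^T$, $\nu := \mu|_{\mathcal{X}^T}$, and $S := T$, and let $\pi: X \to Y$ be the identity map. Let $y \mapsto \mu_y$ be the disintegration of $\mu$ with respect to the factor map $\pi$. Then for $\nu$-almost every y, the measure $\mu_y$ is T-invariant and ergodic.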
Proof. Observe from the T-invariance of $\mu$ (and of the $\sigma$-algebra $\mathcal{X}^T$) that the probability kernel $y \mapsto T_* \mu_y$ would also be a disintegration of $\mu$. Thus we have $T_* \mu_y = \mu_y$ for $\nu$-almost every y.
Now we show the ergodicity. As C(X) is separable and dense in $L^1(\mu_y)$ for every y, it suffices by Theorem 3 and a limiting argument to show that for any fixed $f \in C(X)$, the averages $\frac{1}{N} \sum_{n=0}^{N-1} T^n f$ converge pointwise $\mu_y$-a.e. to $\int_X f\, d\mu_y$ for $\nu$-a.e. y.
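From the pointwise ergodic theorem, we already know that $\frac{1}{N} \sum_{n=0}^{N-1} T^n f$ converges to $\mathbb{E}(f|\mathcal{X}^T)$ outside of a set of $\mu$-measure zero. By (17), this set also has $\mu_y$-measure zero for $\nu$-almost every y. Thus it will suffice to show that $\mathbb{E}(f|\mathcal{X}^T)$ is $\mu_y$-a.e. equal to $\int_X f\, d\mu_y$ for $\nu$-a.e. y. Now observe that $\mathbb{E}(f|\mathcal{X}^T) = \pi_* f$, so the claim follows from (18) and (21).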
Exercise 14. Let $(X, \mathcal{X})$ be a separable measurable space, and let $T: X \to X$ be a bimeasurable bijection. Let $M(X)$ denote the Banach space of all finite (signed) measures on $(X, \mathcal{X})$ with the total variation norm. Let $\Pr(X)^T$ denote the collection of probability measures on $(X, \mathcal{X})$ which are T-invariant. Show that this is a closed convex subset of $M(X)$, and the extreme points of $\Pr(X)^T$ are precisely the ergodic probability measures (which also form a closed subset of $\Pr(X)^T$). This allows one to prove a variant of Proposition 4 using Choquet’s theorem.
Exercise 15. Show that a topological measure-preserving system is uniquely ergodic if and only if the only ergodic shift-invariant Borel probability measure on X is $\mu$.
[Update, Feb 6: Some corrections; new exercises added.]
[Update, Feb 23: More exercises added.]

5 February, 2008 at 7:04 am
Lior
In the proof of Thm 1, we should “[o]bserve that
…” and not as written. Ex.11 duplicates Ex. 5 (should ask to show the circle shift is uniquely ergodic). Def. 1 should note that
is a measure on
. Example 3 should have
and not as written; “composition” there is a form of convolution.
5 February, 2008 at 11:53 am
Terence Tao
Thanks, Lior, for the corrections!
5 February, 2008 at 1:26 pm
Lior
Of course my remark on Example 3 is wrong — sorry about that. I too should re-read what I write.. To Exercize 14 you can add showing that the set of ergodic measures is closed (eliminating the annoyance with the meaning of the “support” of the measure you get from Choquet’s theorem).
13 February, 2008 at 1:39 pm
254A, Lecture 11: Compact systems « What’s new
[...] measure-preserving systems are isomorphic); the notion of regularity was defined in Definition 2 of Lecture 9. Hint: take a countable shift-invariant family of sets that generate (thus T acts on this space by [...]
22 February, 2008 at 2:39 pm
254A, Lecture 12: Weakly mixing systems « What’s new
[...] system with irrational (and with uniform measure), thus this system is ergodic by Exercise 5 from Lecture 9. Then the function has mean zero, but one easily computes that . The coefficients converge in the [...]
27 February, 2008 at 7:07 pm
254A, Lecture 13: Compact extensions « What’s new
[...] 8. Show that each of the iterated skew shifts (Exercise 8 from Lecture 9) are compact extensions of the preceding skew [...]
6 March, 2008 at 10:07 am
254A, Lecture 14: Weakly mixing extensions « What’s new
[...] 1. If X is regular, then we can disintegrate the measure as an average , see Theorem 4 from Lecture 9. It is then possible to construct a relative product system , which is the product system but with [...]
9 March, 2008 at 8:55 pm
254A, Lecture 16: A Ratner-type theorem for nilmanifolds « What’s new
[...] This result is originally due to Leon Green, using spectral theory methods. We will use an argument of Parry (and adapted by Leibman), relying on “vertical” Fourier analysis and topological arguments, which we have already used for the skew shift in Proposition 1 of Lecture 9. [...]
15 March, 2008 at 3:07 pm
254A, Lecture 17: A Ratner-type theorem for SL_2(R) orbits « What’s new
[...] to H, and thus (by Corollary 1) ergodic with respect to U. This implies (cf. Proposition 2 from Lecture 9) that -almost every point x in is generic (with respect to U) in the sense [...]
31 March, 2008 at 9:15 am
sugata
sir,
in proving thm.1, you have written F(m+1)=max{0,f+PFm}.this means f(m+1) is positive always.but from the definition that is not true for negative functions.
31 March, 2008 at 8:32 pm
Terence Tao
Dear sugata,
20 June, 2008 at 7:59 pm
The strong law of large numbers « What’s new
[...] Remark 4. By viewing the sequence as a stationary process, and thus as a special case of a measure-preserving system one can view the weak and strong law of large numbers as special cases of the mean and pointwise ergodic theorems respectively (see Exercise 9 from 254A Lecture 8 and Theorem 2 from 254A Lecture 9). [...]
23 December, 2008 at 5:07 am
liuxiaochuan
Dear Professor Tao:
In theorem 1, is the linear operation P asked to be positive?
in (10), should the right-hand side be
or
?
23 December, 2008 at 8:55 am
Terence Tao
Thanks for the corrections!
24 February, 2009 at 11:26 am
mykola
Dear Professor Tao,
in the last but one paragraph of the proof of Theorem 4 you claim:
“we see that the sets whose indicator functions lie in $\mathcal{C}$ form a $\sigma$-algebra.”
It is not clear that the mentioned collection (of the sets whose indicator functions lie in $\mathcal{C}$) is closed under taking a union of (not disjoint) sets.
24 February, 2009 at 11:46 am
Terence Tao
Fair enough. I guess we can use outer and inner regularity (and dominated or monotone convergence) to get indicator functions of Borel sets to lie in $\mathcal{C}$, instead.
24 February, 2009 at 1:22 pm
mykola
Thank you very much!
1 April, 2009 at 10:16 am
pablolessa
In the proof of the pointwise ergodic theorem (Theorem 2). I don’t see how one can approximate a given $L^1$ variable by an $L^2$ one, while at the same time preserving the conditional expectation.
1 April, 2009 at 10:28 am
Terence Tao
Dear pablolessa,
One can approximate (in $L^1$) an $L^1$ function f with zero conditional expectation by a function g in $L^2$ with arbitrarily high accuracy (in $L^1$). Now, g may not have zero conditional expectation a priori, but one can simply replace it by its projection $g - \mathbb{E}(g|\mathcal{X}^T)$ onto the zero expectation functions. Since this projection is bounded on $L^1$, and also preserves f, we see that $g - \mathbb{E}(g|\mathcal{X}^T)$ also approximates f in $L^1$ to arbitrarily high accuracy.
1 April, 2009 at 1:05 pm
pablolessa
You’re right! I got mixed up trying to bound the
norm of
which actually might be unbounded.
Thank you.
24 June, 2009 at 6:57 pm
Anonymous
Dear Terry,
I was wondering if it is true that in the setting of Proposition 4, the invariant $\sigma$-algebra ${\cal X}^T$ is always countably generated? If for some reason this is true, it would give an easy way to check that measures $\mu_y$ are $T$-invariant and ergodic without appealing to the pointwise ergodic theorem, simply by looking at what happens on the sets that generate ${\cal X}^T$.
Thanks!
25 June, 2009 at 12:30 pm
Terence Tao
Dear anonymous,
Hmm, the issue seems to be delicate. It is not hard to show that $\mathcal{X}^T$ is countably generated modulo $\mu$-null sets – for instance one can take a countable dense subset of $L^2(\mu)$ and look at their projections to $L^2(\mathcal{X}^T)$. But this doesn’t necessarily mean that $\mathcal{X}^T$ is countably generated modulo $\mu_y$-null sets for almost every y, unless one is very careful with the measure-theoretic issues. (One may also have to be more precise about how $\mathcal{X}^T$ is defined – does one want genuinely T-invariant sets, or sets that are only invariant up to $\mu$-null sets?)
6 August, 2009 at 7:33 am
陶哲轩遍历论习题解答:第五讲 « Liu Xiaochuan’s Weblog
[...] is similar to the proof of theorem 4, though some difference should be taken care of. I wrote another post in [...]
6 August, 2009 at 7:35 am
liuxiaochuan
Dear Professor Tao:
about (10), I am wondering if the right hand should be
[Corrected, thanks - T.]
7 August, 2009 at 10:45 am
liuxiaochuan
Dear Professor Tao:
, I don’t see why 
about (6), I only can get
7 August, 2009 at 10:59 am
Terence Tao
10 August, 2009 at 1:31 am
liuxiaochuan
Dear Professor Tao:
In Exercise 4 , the two integrals in the conclusion are both dy, instead of dx.
22 October, 2009 at 7:26 am
Carlos Meninho
Dear Professor Terence Tao,
I have a question regarding Exercise 9; consider the following topological dynamical system on the circle:
Take the homeomorphism
. This map fixes 0 and 1, hence induces a homeomorphism of the circle into itself. It is easy to see that this dynamical system has only one invariant probability measure: the atomic measure on the point 0~1.
Therefore, is uniquely ergodic by Exercise 10 and every point on the circle is generic. The positive orbit of a point
in
is contained in this open set. If Exercise 9 is true then
, but
, that is a contradiction.
Could you indicate what is the problem in this example?
I can prove a version of this exercise changing “for all open sets $U$ in $X$” to “for all open sets $U$ in $X$ with $\mu(\partial U) = 0$” (the measure is zero on the boundary). This is true with the local compacity condition (weaker than compacity) by using Urysohn’s Lemma.
Best wishes,
Carlos Meninho.
22 October, 2009 at 8:32 am
Terence Tao
Hmm, you’re right; I’ve changed Exercise 9 accordingly.
1 January, 2010 at 8:47 pm
254A, Notes 0: A review of probability theory « What’s new
[...] for every random variable , by using tools such as the Radon-Nikodym theorem; see Theorem 4 of these previous lecture notes of mine. In practice, we will not invoke these general results here (as it is not natural for us to place [...]
1 April, 2010 at 12:59 am
liuxiaochuan
Dear Professor Tao:
and the last sentance of exercise 13 is “…measure
is …
I think the right hand sidd of (22) should be
While I am trying to do exercise 13, I construct a function g in order to get a contradiction with (18) as follows.
I fix a
and define 
It seems this can work. But in this way the condition that “$\pi$ is continuous” is not necessary. I don’t know if I am right.
On exercise 14, I didn’t found a way to prove “any non-ergodic measure is not extreme points” as one direction of the conclusion without using the Ergodic decomposition theorem. But since you said this can provide a alternative method for that theorem, could give a hint for this?
1 April, 2010 at 9:18 am
Terence Tao
Thanks for the corrections!
The continuity of $\pi$ is needed to make the preimage $\pi^{-1}(\{y\})$ closed (note that the support of a measure is closed by definition).
For exercise 14, use the fact that a non-ergodic measure has an invariant set of measure strictly between 0 and 1. By conditioning the measure to that set or its complement one can show that the measure is not an extreme measure.
2 April, 2010 at 7:04 am
anonymous
The tag you have used for this post is different from the tag for the other posts in the series.
2 April, 2010 at 7:06 am
anonymous
Ah! Never mind, I thought this was part of the same series as the Random Matrices one (which stopped at nr 8)…
23 April, 2010 at 4:14 pm
Michael S.
Dear Professor Tao,
concerning Exercise 12 there seems to be a small problem (hope I understood the definitions and details):
Suppose T is a homeomorphism of the circle K with exactly one fixed point, say 0, whose complement K\{0} is wandering. E.g. we may take the map
z=exp(2*pi*i*x) —> T(z)=exp(2*pi*i*x^2) where x is in [0,1].
Then T is uniquely ergodic as every invariant measure is supported in the non-wandering set, thus there is only one invariant measure, the Dirac mass in 0. Nevertheless T is not minimal as is has a fixed point.
Best
Michael
18 June, 2010 at 2:53 am
Solutions to Ergodic Theory:Lecture twelve « Xiaochuan Liu's Weblog
[...] First we recall the exercise 11a of lecture 9, which asserts thatthen the sequence is uniformly distributed in , which means (by choosing ) that [...]
12 August, 2010 at 6:50 am
Sam
I want an example of an ergodic measure preserving transformation T for which T^2 is not ergodic.
12 August, 2010 at 6:57 am
Anonymous
Dear Sam,
just take the non-trivial permutation (flip) on two points of measure 1/2 each. This is ergodic by definition (no invariant sets of measure strictly between 0 and 1), but its square will be the identity which is not ergodic (each point is an invariant subset of measure 1/2).
You can modify this idea to get more elaborate examples on more complicated measure spaces.
Best
Michael
12 August, 2010 at 7:05 am
Sam
By T^2, I mean the second iterate of T. That is, T(T(x)). …. and not T x T
12 August, 2010 at 7:07 am
Anonymous
That’s what it is…
Say you have the two points 1 and 2 and the transformation is 1 goes to 2 and 2 goes to 1, then T^2 is the identity…
12 August, 2010 at 7:16 am
Sam
well-understood now. Great!!!!
12 August, 2010 at 8:17 am
Sam
oops!!! can I use the Dirac measure as a measure preserving transformation?
12 August, 2010 at 8:37 am
Anonymous
The Dirac measure is not a transformation, but what you probably mean is whether you can define a (measure-preserving) transformation that preserves the Dirac measure.
The answer is yes. Suppose you have any space (a set, etc.). Put a Dirac mass on one of its points, so that the rest of the space has measure zero and you have a probability space. Then a transformation is measure-preserving if and only if it fixes the point carrying the Dirac mass, and in that case it is automatically ergodic.
It works the other way around too: suppose you have a given transformation on some space and you want an invariant measure. If the transformation has a fixed point, just put a Dirac mass at this point, and you have produced an invariant measure.
If the transformation has a finite orbit (i.e. a periodic point with finite period), just put the same mass on each of the points of this finite orbit (the mass should be 1/(length of the orbit)), and you have again produced an invariant measure.
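Both constructions can be checked on a small example. This is a sketch only: the map T(x) = 3x mod 10 on {0, ..., 9} and the helper names are illustrative choices. Under this map, 0 and 5 are fixed points and 1 → 3 → 9 → 7 → 1 is an orbit of length 4. The pushforward of a Dirac mass at p is the Dirac mass at T(p), so it is invariant exactly when p is fixed. The uniform measure on a finite orbit is always invariant.

```python
from fractions import Fraction

def T(x):
    """Illustrative map on {0, ..., 9}."""
    return (3 * x) % 10

def orbit(T, x):
    """The orbit [x, T(x), ..., T^{p-1}(x)] of x, assuming x is periodic (otherwise this loops forever)."""
    pts = [x]
    y = T(x)
    while y != x:
        pts.append(y)
        y = T(y)
    return pts

def periodic_orbit_measure(T, x):
    """Mass 1/(length of the orbit) on each point of the finite orbit of x."""
    pts = orbit(T, x)
    return {y: Fraction(1, len(pts)) for y in pts}

def pushforward(nu, T):
    """(T_* nu)(y) = sum of nu(x) over x with T(x) = y."""
    out = {}
    for y, m in nu.items():
        out[T(y)] = out.get(T(y), 0) + m
    return out

dirac_at_fixed_point = periodic_orbit_measure(T, 0)  # an orbit of length 1: a Dirac mass
nu = periodic_orbit_measure(T, 1)                   # mass 1/4 on each of 1, 3, 9, 7
```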
In fact, one often constructs invariant measures as limits (in the weak-* topology) of measures supported on finite orbits. For many systems (for instance, the full shift), these “periodic” measures are dense in the space of all invariant measures, so you can approximate any invariant measure by a sequence of such special measures.
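To illustrate this kind of approximation, here is a sketch that uses the doubling map x ↦ 2x mod 1 as an example system (a choice made for illustration, not taken from the comment). The points k/(2^n − 1), k = 0, …, 2^n − 2, are exactly the solutions of 2^n x ≡ x (mod 1). They form a finite invariant union of periodic orbits, so the uniform measure on them is a convex combination of periodic-orbit measures. Integrating the test function x² (an arbitrary choice) against these measures approaches its Lebesgue integral 1/3, consistent with weak-* convergence to Lebesgue measure.

```python
from fractions import Fraction

def periodic_points(n):
    """The points k/(2^n - 1), k = 0..2^n - 2: all x in [0, 1) with 2^n x = x (mod 1)."""
    N = 2 ** n - 1
    return [Fraction(k, N) for k in range(N)]

def doubling(x):
    return (2 * x) % 1

def average(f, pts):
    """Integral of f against the uniform measure on the finite set pts."""
    return sum(f(x) for x in pts) / len(pts)

def f(x):
    """Test function; its Lebesgue integral over [0, 1] is 1/3."""
    return x * x

approximations = [average(f, periodic_points(n)) for n in range(2, 13)]
```

Exact arithmetic is used so that the invariance check on the periodic points is not disturbed by rounding.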
12 August, 2010 at 4:25 pm
Sam
For the class of invariant events I, how does one show that a random variable X is I-measurable if and only if X = X∘T, where T is a map that preserves the probability P? I have tried it for an indicator variable, then for simple r.v.s, then for nonnegative r.v.s, but I am having trouble getting a formal derivation for a general r.v. Moreover, even for the indicator variable, I am not sure how to write my idea down rigorously.
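In the simplest setting the equivalence can be checked by brute force. This sketch assumes a permutation T of a five-point set, where the invariant events are exactly the unions of orbits and no null sets intervene. In the general case one works up to null sets, using the approximation steps described in the question. The key observation in the finite case is that X∘T = X holds precisely when every level set {X = c} is T-invariant, and that is what I-measurability of a finitely-valued X means.

```python
from itertools import product

# A permutation T of {0,...,4} with orbits {0, 1, 2} and {3, 4}.
T = [1, 2, 0, 4, 3]
points = range(len(T))

def is_invariant_set(A):
    """A is an invariant event iff T^{-1}(A) = A."""
    return {x for x in points if T[x] in A} == set(A)

def is_I_measurable(X):
    """A finitely-valued X is I-measurable iff each level set {X = c} is invariant."""
    return all(is_invariant_set({x for x in points if X[x] == c}) for c in set(X))

def is_T_invariant(X):
    """X = X o T pointwise."""
    return all(X[T[x]] == X[x] for x in points)

# Compare the two conditions over all functions X: {0,...,4} -> {0, 1, 2}.
agree = all(is_I_measurable(X) == is_T_invariant(X)
            for X in product(range(3), repeat=len(T)))
```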
16 October, 2010 at 8:29 pm
245A, Notes 5: Differentiation theorems « What’s new
[...] this inequality implies Lemma 12. (Hint: First do the case, by invoking the rising sun lemma.) See these lecture notes for some further discussion of inequalities of this type, and applications to ergodic theory (and [...]
23 May, 2011 at 8:50 am
Stein’s spherical maximal theorem « What’s new
[...] abstract semigroup maximal inequality of Dunford and Schwartz (discussed for instance in these lecture notes of mine) shows that the heat kernel maximal function is of weak-type with a constant of , and this can be [...]
2 October, 2012 at 7:26 am
Rex
In the statement of the Dunford-Schwartz maximal inequality, you write
”
“
Do you want this to be
instead?
Otherwise, we seem to be missing the
term in the sum, and then it is not clear to me why
if and only if
for sufficiently large
.
[Corrected, thanks - T.]
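For readers who want to see the maximal inequality in action, here is a small exact check. It is a sketch under stated assumptions: the system is a random permutation of 12 points with the uniform measure, so Pf = f∘T is the shift operator, and f is a random non-negative integer function (both hypothetical choices). The check uses Hopf's form of the maximal ergodic lemma, λ·μ(E) ≤ ∫_E f dμ for E = {f_m* > λ}, where f_m* is the maximum of the first m averages (1/n)(f + Pf + … + P^{n−1}f). This bound holds for each finite m and implies the weak-type bound μ(E) ≤ ‖f‖₁/λ.

```python
import random
from fractions import Fraction

rng = random.Random(0)
N = 12
T = list(range(N))
rng.shuffle(T)                               # a permutation preserves the uniform measure
f = [rng.randint(0, 9) for _ in range(N)]    # a non-negative function on {0,...,N-1}

def truncated_maximal_function(x, m):
    """max over 1 <= n <= m of (1/n) * (f(x) + f(Tx) + ... + f(T^{n-1} x))."""
    averages, total, y = [], 0, x
    for n in range(1, m + 1):
        total += f[y]
        y = T[y]
        averages.append(Fraction(total, n))
    return max(averages)

def check(lam, m=30):
    """Check lam * mu(E) <= integral of f over E <= ||f||_1, where E = {f_m^* > lam}."""
    E = [x for x in range(N) if truncated_maximal_function(x, m) > lam]
    lhs = lam * Fraction(len(E), N)
    mid = Fraction(sum(f[x] for x in E), N)
    rhs = Fraction(sum(f), N)
    return lhs <= mid <= rhs
```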
3 October, 2012 at 1:54 pm
Rex
In the Dunford-Schwartz maximal inequality, is there any need to assume that
is a probability space? I don’t see any place where this assumption is used.
In particular, we should be able to just apply it directly to Exercise 1, yes?
3 October, 2012 at 2:12 pm
Terence Tao
It’s used implicitly via the fact that constant functions (such as $1$) lie in $L^1$, which is only true in measure spaces of finite measure.
But yes, if one formulates things carefully, one should be able to extend the Dunford-Schwartz inequality to the infinite measure setting (but one would have to extend P from $L^1$
to some larger space, such as
).
4 October, 2012 at 5:01 pm
Rex
In the statement of the pointwise ergodic theorem, we assume that $f$ is $L^1$, but not necessarily $L^2$. How do we define its conditional expectation then?
4 October, 2012 at 6:24 pm
Terence Tao
See Exercise 6.4 of Lecture notes 8. The point is that conditional expectation is a contraction in L^1 norm, and L^2 is a dense subspace of L^1, so one can extend conditional expectation to L^1 by density.
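On a finite space the contraction property is easy to see directly. This is a sketch assuming a permutation T of six points with the uniform measure, so conditional expectation onto the invariant sets is just averaging over each T-orbit (the names `cond_exp`, `l1_norm` and the sample f are illustrative). Averaging can only decrease the sum of absolute values, which is the L¹ contraction used to extend by density. In this finite case the Birkhoff averages over a multiple of all orbit lengths equal the conditional expectation exactly.

```python
from fractions import Fraction

T = [1, 2, 0, 4, 3, 5]  # orbits {0, 1, 2}, {3, 4}, {5}
N = len(T)

def orbit(x):
    pts, y = [x], T[x]
    while y != x:
        pts.append(y)
        y = T[y]
    return pts

def cond_exp(f):
    """E(f | invariant sets) for the uniform measure: the average of f over the T-orbit of x."""
    return [Fraction(sum(f[y] for y in orbit(x)), len(orbit(x))) for x in range(N)]

def l1_norm(g):
    """||g||_{L^1} for the uniform probability measure on {0,...,N-1}."""
    return Fraction(sum(abs(v) for v in g), N)

def birkhoff_average(f, x, n):
    """(1/n) * (f(x) + f(Tx) + ... + f(T^{n-1} x))."""
    total, y = 0, x
    for _ in range(n):
        total += f[y]
        y = T[y]
    return Fraction(total, n)

f = [3, -5, 2, 7, -1, 4]
Ef = cond_exp(f)
```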
5 October, 2012 at 3:19 pm
Rex
I’m a bit confused with the composition of probability kernels in Example 3.
The composition
seems to have domain
, but shouldn’t composing a map from
to
and then
to
should give us something with domain
?
Also, we seem to be taking values
where
, but
is a measure on
.
5 October, 2012 at 4:16 pm
Terence Tao
Sorry,
and
should be from Y to Z and from X to Y respectively.
5 October, 2012 at 5:03 pm
Rex
Ah, okay. That makes more sense. I think you want to write the integrals
to be taken over
rather than
as well.
[Corrected, thanks - T.]
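The domain question becomes concrete on finite spaces. There, a probability kernel from X to Y can be encoded as a row-stochastic |X|×|Y| matrix whose entry in row x, column y is the mass the measure attached to x gives to {y}. Composing a kernel from X to Y with one from Y to Z is then matrix multiplication, and the integral over Y becomes a sum over Y. The result is a kernel from X to Z. This is a sketch with hypothetical names `P` (from X to Y) and `Q` (from Y to Z), not necessarily the notation of Example 3.

```python
from fractions import Fraction

def compose_kernels(Q, P):
    """Compose a kernel P from X to Y with a kernel Q from Y to Z, giving a kernel from X to Z.

    (Q o P)[x][z] = sum over y of P[x][y] * Q[y][z], the finite analogue of integrating
    Q_y({z}) over Y against the measure P_x."""
    assert len(P[0]) == len(Q), "P must land in the space that Q starts from"
    return [[sum(P[x][y] * Q[y][z] for y in range(len(Q))) for z in range(len(Q[0]))]
            for x in range(len(P))]

def is_stochastic(K):
    """Every row is a probability measure."""
    return all(v >= 0 for row in K for v in row) and all(sum(row) == 1 for row in K)

h = Fraction(1, 2)
P = [[1, 0, 0], [h, h, 0]]    # from X (2 points) to Y (3 points)
Q = [[h, h], [0, 1], [1, 0]]  # from Y (3 points) to Z (2 points)
QP = compose_kernels(Q, P)    # from X (2 points) to Z (2 points)
```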
14 April, 2013 at 7:39 am
Jeff
I don’t think I know how to go from the hint in exercise 6 to the statement for general E, F. Also, don’t you need the space
to be a standard Borel space or something so that you can take infinite product spaces?
14 April, 2013 at 7:49 am
Jeff
To be specific, I know that the
theorem is useful here, but cannot check that the left side of condition 6 is a measure, because the limit may not commute with infinite sums. (Or you can view this as checking that the set where 6 holds for say fixed F is a lambda system.)
14 April, 2013 at 9:58 am
Terence Tao
Yes, one needs a hypothesis such as standard Borel (actually inner regular should suffice) in the exercise; I’ve added it as such. To handle general measurable sets, show that such sets can be approximated in measure by sets that depend on only finitely many factors.
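The reason sets depending on finitely many factors are convenient can be seen in the simplest product system. This is an illustration under assumptions not stated in the thread: i.i.d. coordinates in {0, 1} with P(1) = 1/3 and the left shift (Tx)_i = x_{i+1}. For cylinder sets E and F, the set T^{−n}E constrains the coordinates of E moved by n. Once n separates them from the coordinates of F, μ(T^{−n}E ∩ F) = μ(E)μ(F) holds exactly, not just in the limit.

```python
from fractions import Fraction

p = Fraction(1, 3)          # P(coordinate = 1); coordinates are i.i.d. (illustrative choice)
prob = {0: 1 - p, 1: p}

def measure(cyl):
    """Product measure of the cylinder {x : x_i = v for (i, v) in cyl.items()}."""
    result = Fraction(1)
    for v in cyl.values():
        result *= prob[v]
    return result

def shift_preimage(cyl, n):
    """T^{-n} of a cylinder for the left shift (Tx)_i = x_{i+1}: constraint at i moves to i + n."""
    return {i + n: v for i, v in cyl.items()}

def intersect(c1, c2):
    """Intersection of two cylinders, or None if their constraints conflict (empty set)."""
    merged = dict(c1)
    for i, v in c2.items():
        if merged.get(i, v) != v:
            return None
        merged[i] = v
    return merged

def mu_intersection(c1, c2):
    c = intersect(c1, c2)
    return Fraction(0) if c is None else measure(c)

E = {0: 1, 1: 0, 2: 1}   # depends on coordinates 0, 1, 2 only
F = {-1: 0, 0: 1}        # depends on coordinates -1, 0 only
```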
15 April, 2013 at 11:09 am
Stéphane Laurent
There is an error in point 6 of Theorem. The exponent should be -n, not n. For instance take T(x)=2x(mod 1) and E=[0,1/2]. Then 6 fails but T is ergodic.
15 April, 2013 at 11:46 am
Terence Tao
In these notes we are restricting attention to invertible systems (see http://terrytao.wordpress.com/2008/01/08/254a-lecture-1-overview/ ).
15 April, 2013 at 12:30 pm
Stéphane Laurent
Ooops sorry. But isn’t it more natural with the negative exponent ?
15 April, 2013 at 2:57 pm
Terence Tao
I suppose so, though in the invertible case one has
so there is basically no distinction to be made in that case.
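Stéphane's example shows why the exponent matters once invertibility is dropped. Here is a small exact sketch for T(x) = 2x mod 1 and E = [0, 1/2], with sets represented as finite unions of closed intervals (endpoints ignored, since they have measure zero). The names `preimage`, `image`, `length` are illustrative. Preimages preserve Lebesgue measure, while the forward image T(E) is already the whole circle.

```python
from fractions import Fraction

def preimage(intervals):
    """T^{-1} of a union of intervals [a, b] in [0, 1] under T(x) = 2x mod 1."""
    out = []
    for a, b in intervals:
        out.append((a / 2, b / 2))
        out.append(((a + 1) / 2, (b + 1) / 2))
    return out

def image(intervals):
    """T of a union of intervals [a, b] contained in [0, 1]."""
    out = []
    for a, b in intervals:
        a2, b2 = 2 * a, 2 * b
        if b2 - a2 >= 1:
            out.append((Fraction(0), Fraction(1)))  # wraps all the way around the circle
        elif b2 <= 1:
            out.append((a2, b2))
        elif a2 >= 1:
            out.append((a2 - 1, b2 - 1))
        else:                                       # straddles 1: split at the wrap point
            out.append((a2, Fraction(1)))
            out.append((Fraction(0), b2 - 1))
    return out

def length(intervals):
    """Lebesgue measure of a finite union of closed intervals (overlaps counted once)."""
    total, cur_a, cur_b = Fraction(0), None, None
    for a, b in sorted(intervals):
        if cur_b is None or a > cur_b:
            if cur_b is not None:
                total += cur_b - cur_a
            cur_a, cur_b = a, b
        else:
            cur_b = max(cur_b, b)
    if cur_b is not None:
        total += cur_b - cur_a
    return total

E = [(Fraction(0), Fraction(1, 2))]
```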
21 May, 2013 at 4:30 pm
Multiple recurrence and convergence results associated to $F_{p}^{omega}$-actions | What's new
[…] is the pushforward map associated to the map ; see e.g. this previous blog post. We can interpret this as an equidistribution result. If is a pair as before, then we no longer […]