In our final lecture on topological dynamics, we discuss a remarkable theorem of Furstenberg that classifies a major type of topological dynamical system – *distal* systems – in terms of highly structured (from an algebraic point of view) systems, namely towers of isometric extensions. This theorem is also a model for an important analogous result in ergodic theory, the *Furstenberg-Zimmer structure theorem*, which we will turn to in a few lectures. We will not be able to prove Furstenberg’s structure theorem for distal systems here in full, but we hope to illustrate some of the key points and ideas.

— Distal systems —

Furstenberg’s theorem concerns a significant generalisation of the equicontinuous (or isometric) systems, namely the *distal* systems.

**Definition 1.** (Distal systems) Let $(X, \mathcal{F}, T)$ be a topological dynamical system, and let d be an arbitrary metric on X (it is not important which one is picked here). We say that two points x, y in X are *proximal* if we have $\liminf_{n \to \infty} d(T^n x, T^n y) = 0$. We say that X is *distal* if no two distinct points in X are proximal, or equivalently if for every distinct x, y there exists $\varepsilon > 0$ such that $d(T^n x, T^n y) \geq \varepsilon$ for all n.

It is obvious that every isometric or equicontinuous system is distal, but the converse is not true, as the following example shows:

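**Example 1.** If $\alpha \in \mathbb{R}$, then the skew shift $T(y,z) := (y+\alpha, z+y)$ on $(\mathbb{R}/\mathbb{Z})^2$ turns out to be not equicontinuous; indeed, if we start with a pair of nearby points $(0,0)$, $(\frac{1}{2n}, 0)$ for some large n and apply $T^n$, one ends up with $(n\alpha, \frac{n(n-1)}{2}\alpha)$ and $(n\alpha + \frac{1}{2n}, \frac{n(n-1)}{2}\alpha + \frac{1}{2})$, thus demonstrating failure of equicontinuity. On the other hand, the system is still distal: given any pair of distinct points $(y,z)$, $(y',z')$, either $y \neq y'$ (in which case the horizontal separation between $T^n(y,z)$ and $T^n(y',z')$ is bounded from below) or $y = y'$ (in which case the vertical separation is bounded from below).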
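The failure of equicontinuity can be checked by direct computation. The following is a minimal sketch, assuming the skew shift is $T(y,z) = (y+\alpha, z+y)$ on $(\mathbb{R}/\mathbb{Z})^2$ and measuring distances in the taxicab metric. The function names are illustrative only, and a rational stand-in for $\alpha$ is used purely so that the arithmetic is exact (the separation computed here does not depend on $\alpha$).

```python
from fractions import Fraction

def skew_shift(point, alpha):
    """One step of the (assumed) skew shift (y, z) -> (y + alpha, z + y) mod 1."""
    y, z = point
    return ((y + alpha) % 1, (z + y) % 1)

def iterate(point, alpha, n):
    """Apply the skew shift n times."""
    for _ in range(n):
        point = skew_shift(point, alpha)
    return point

def circle_dist(a, b):
    """Distance between a and b on the circle R/Z."""
    d = (a - b) % 1
    return min(d, 1 - d)

def taxicab(p, q):
    """Taxicab metric on the torus (R/Z)^2."""
    return circle_dist(p[0], q[0]) + circle_dist(p[1], q[1])

def separation_after(n, alpha):
    """Distance between (0,0) and (1/(2n),0) before and after n steps."""
    p = (Fraction(0), Fraction(0))
    q = (Fraction(1, 2 * n), Fraction(0))
    return taxicab(p, q), taxicab(iterate(p, alpha, n), iterate(q, alpha, n))
```

For instance, `separation_after(50, Fraction(1, 7))` returns `(Fraction(1, 100), Fraction(51, 100))`: the two points start at distance 1/100 and end at distance 1/100 + 1/2, however large n is taken.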

**Exercise 1**. Show that any non-trivial Bernoulli system is not distal.

Distal systems interact nicely with the action of the compactified integers $\beta\mathbb{Z}$:

**Exercise 2**. Let $(X, \mathcal{F}, T)$ be a topological dynamical system.

- Show that two points x, y in X are proximal if and only if $T^p x = T^p y$ for some $p \in \beta\mathbb{Z}$.
- Show that X is distal if and only if all the maps $T^p$ for $p \in \beta\mathbb{Z}$ are injective.
- If X is distal, show that $T^p$ is the identity map whenever $p \in \beta\mathbb{Z}$ is idempotent. (Hint: use part 2.)
- If X is distal, show that the set $G := \{T^p: p \in \beta\mathbb{Z}\}$ of transformations on X forms a group, known as the *Ellis group* of X. (Hint: use part 3, together with Lemma 3 from Lecture 5.) Show that G is a compact subset of $X^X$ (with the product topology), and that G acts transitively on X if and only if X is minimal.

**Exercise 3.** Show that an inverse limit of a totally ordered set of distal factors is still distal. (This turns out to be slightly easier than Lemma 1 from the previous lecture.)

**Exercise 4**. Show that every topological dynamical system has a maximal distal factor. (Hint: repeat the proof of Corollary 1 from the previous lecture.)

**Exercise 5.** Show that any distal system can be partitioned into disjoint minimal distal systems. (One can of course adapt the proof of Proposition 2 from the previous lecture to do this; but there is a slicker way to do it by exploiting the Ellis group.)

Note that the skew shift system, while not isometric, does have a non-trivial isometric factor, namely the circle shift $(\mathbb{R}/\mathbb{Z}, y \mapsto y + \alpha)$ with the projection map $\pi: (y,z) \mapsto y$. It turns out that this phenomenon is general:

**Theorem 1.** (Baby Furstenberg structure theorem) Let $(X, \mathcal{F}, T)$ be minimal, distal and non-trivial (i.e. not a point). Then X has a non-trivial isometric factor $\pi: X \to Y$.

This result – a toy case of Furstenberg’s full structure theorem – is already rather difficult to establish. We will not give Furstenberg’s original proof here (though see Exercise 13 below), but will at least sketch how the factor is constructed. A key object in the construction is the symmetric function $F: X \times X \to \mathbb{R}^+$ defined by the formula

$F(x,y) := \inf_{n \in \mathbb{Z}} d(T^n x, T^n y)$. (3)

**Example 2.** We again consider the skew shift $T(y,z) := (y+\alpha, z+y)$ on $(\mathbb{R}/\mathbb{Z})^2$ with $\alpha$ irrational. For sake of concreteness let us choose the taxicab metric $d((y,z),(y',z')) := \|y-y'\| + \|z-z'\|$, where $\|x\|$ is the distance from x to the integers. Then one can check that $F((y,z),(y',z'))$ is equal to $\|y-y'\|$ when $y-y'$ is irrational, and equal to $\|y-y'\| + \|q(z-z')\|/q$ when $y-y'$ is rational, where q is the least positive integer such that $q(y-y')$ is an integer. Thus F is highly discontinuous, but it is at least upper semi-continuous in each of its two variables. (Actually, the upper semi-continuity of F holds for arbitrary topological dynamical systems, since F is the infimum of continuous functions.)
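The rational case of this formula can be verified by brute force. The sketch below assumes that F is the infimum of $d(T^n x, T^n y)$ over all integers n and that the skew shift is $T(y,z) = (y+\alpha, z+y)$. The difference of the two orbits at time n is $(y-y', (z-z') + n(y-y'))$, which is periodic in n with period q when $y-y'$ is rational, so it is enough to scan $0 \leq n < q$. All names are illustrative, and a rational stand-in for $\alpha$ is used only to keep the arithmetic exact (the differences do not involve $\alpha$).

```python
from fractions import Fraction

def circle_norm(x):
    """Distance from x to the nearest integer."""
    r = x % 1
    return min(r, 1 - r)

def skew_shift(point, alpha):
    """One step of the (assumed) skew shift (y, z) -> (y + alpha, z + y) mod 1."""
    y, z = point
    return ((y + alpha) % 1, (z + y) % 1)

def taxicab(p, q):
    """Taxicab metric on the torus (R/Z)^2."""
    return circle_norm(p[0] - q[0]) + circle_norm(p[1] - q[1])

def F_bruteforce(p, q, alpha, steps):
    """Minimum of d(T^n p, T^n q) over 0 <= n <= steps."""
    best = taxicab(p, q)
    for _ in range(steps):
        p, q = skew_shift(p, alpha), skew_shift(q, alpha)
        best = min(best, taxicab(p, q))
    return best

def F_closed_form(p, q):
    """The formula ||y-y'|| + ||period*(z-z')||/period for rational y-y'."""
    dy = Fraction(p[0] - q[0])
    dz = p[1] - q[1]
    period = dy.denominator  # least positive integer with period*dy an integer
    return circle_norm(dy) + circle_norm(period * dz) / period
```

For the pair $(1/3, 1/5)$ and $(0,0)$ both functions return 7/15 (that is, 1/3 + (2/5)/3), provided `steps` is at least 2.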

**Exercise 6.** Let G be the Ellis group of a minimal distal system X.

- For any $x, y \in X$, show that $F(x,y) = \inf_{g \in G} d(gx, gy)$. In particular, $F(gx, gy) = F(x,y)$ for all $g \in G$.
- For any $x, y \in X$, show that the set $\{(gx, gy): g \in G\}$ is a minimal subsystem of $X \times X$ (with the product shift $(x,y) \mapsto (Tx, Ty)$). Conclude in particular that if $a > F(x,y)$, then the set $\{n \in \mathbb{Z}: d(T^n x, T^n y) < a\}$ is syndetic.
- If $x, y \in X$ and $a > 0$ is such that $F(x,y) < a$, show that there exists $\varepsilon > 0$ such that $F(x,z) < a$ whenever $F(y,z) < \varepsilon$.
- Let $X_F$ be the space X whose topology is generated by the basic open sets $U_{x,a} := \{y \in X: F(x,y) < a\}$. (That this is a base follows from 3.) Equivalently, $X_F$ is equipped with the weakest topology on which F is upper semi-continuous in each variable. Show that $X_F$ is a weaker topological space than X (i.e. the identity map from X to $X_F$ is continuous); in particular, $X_F$ is compact. Also show that all the maps in G are homeomorphisms on $X_F$.

If $X_F$ were Hausdorff, then the system $(X_F, T)$ would be equicontinuous, by Exercise 2 from the previous lecture. Unfortunately, $X_F$ is not Hausdorff in general. However, it turns out that we can “quotient out” the non-Hausdorff nature of $X_F$. Define the equivalence relation $\sim$ on $X_F$ by declaring $x \sim y$ if we have $F(x,z) = F(y,z)$ for all z outside of a set of the first category in X. This is clearly an equivalence relation, and so we can create the quotient space $Y := X_F/\sim$; since X embeds into $X_F$ we thus have a factor map $\pi: X \to Y$. It is a deep fact (which we will not prove here) that this quotient space is non-trivial and Hausdorff, and that $\sim$ is preserved by the shift T and even by the Ellis group G (thus if $x \sim y$ and $g \in G$ then $gx \sim gy$). Because of this, G continues to act on Y homeomorphically, and so by Exercise 2 from the previous lecture, Y is a non-trivial isometric factor of X as desired.

**Exercise 7.** Show that in the case of the skew shift (Example 2), this construction recovers the factor that was discussed just before Theorem 1. (The trickiness of this exercise should already give you some idea of the difficulty level of Theorem 1.)

— The Furstenberg structure theorem for distal systems —

We have already noted that isometric systems are distal systems. More generally, we have

**Exercise 8.** Show that an isometric extension of a distal system is still distal. (Hint: Example 1 is a good model case.)

Thus, for instance, the iterated skew shifts that appear in (5) from the previous lecture are distal. Also, recall from Exercise 3 that the inverse limit of distal systems is again distal. It turns out that these are the *only* ways to generate distal systems, in the following sense:

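**Theorem 2.** (Furstenberg’s structure theorem for distal systems) Let $(X, \mathcal{F}, T)$ be a distal system. Then there exists an ordinal $\alpha$ and a factor $Y_\beta$ for every $\beta \leq \alpha$ with the following properties:

- $Y_0$ is a point.
- For every successor ordinal $\beta + 1 \leq \alpha$, $Y_{\beta+1}$ is an isometric extension of $Y_\beta$.
- For every limit ordinal $\beta \leq \alpha$, $Y_\beta$ is an inverse limit of the $Y_\gamma$ for $\gamma < \beta$.
- $Y_\alpha$ is equal to X.

The collection of factors $(Y_\beta)_{\beta \leq \alpha}$ is sometimes known as a “Furstenberg tower”.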

Theorem 2 follows by applying Zorn’s lemma with the following key proposition:

**Proposition 1.** (Key inductive step) Let $(X, \mathcal{F}, T)$ be a distal system, and let Y be a proper factor of X (i.e. the factor map $\pi: X \to Y$ is not an isomorphism). Then there exists another factor Z of X which is a proper isometric extension of Y.

Note that Theorem 1 is the special case of Proposition 1 when Y is a point. Indeed, Proposition 1 is proven in the same way as Theorem 1, but with several additional technicalities which I will not discuss here; see the original paper of Furstenberg for details.

**Exercise 9.** Deduce Theorem 2 from Proposition 1 and Zorn’s lemma.

**Remark 1.** It is known that in Theorem 2, one can take the ordinal $\alpha$ to be countable, and conversely that for every countable ordinal $\alpha$, there exists a system whose smallest Furstenberg tower has height $\alpha$.

**Remark 2.** Several generalisations and extensions of Furstenberg’s structure theorem are known, but they are somewhat technical to state and will not be detailed here; see this survey of Glasner for a discussion.

— Weak mixing and isometric factors —

We have seen that distal systems always contain non-trivial isometric factors. What about more general systems? It turns out that there is in fact a nice dichotomy between systems with non-trivial isometric factors, and those without.

**Definition 2.** (Topological transitivity) A topological dynamical system $(X, \mathcal{F}, T)$ is *topologically transitive* if, for every pair U, V of non-empty open sets, there exists an integer n such that $T^n U \cap V \neq \emptyset$.

**Exercise 10**. Show that a topological dynamical system is topologically transitive if and only if it is equal to the orbit closure of one of its points. (Compare this with a minimal system, which is the orbit closure of *any* of its points. Thus minimality is stronger than topological transitivity; for instance, the compactified integers with the usual shift is topologically transitive but not minimal.)

**Exercise 11.** Show that any factor of a topologically transitive system is again topologically transitive.

**Definition 3.** (Topological weak mixing) A topological dynamical system $(X, \mathcal{F}, T)$ is *topologically weakly mixing* if the product system $X \times X$ is topologically transitive.

**Exercise 12.** A system is said to be *topologically mixing* if for every pair U, V of non-empty open sets, one has $T^n U \cap V \neq \emptyset$ for all sufficiently large n. Show that topological mixing implies topological weak mixing. (The converse is false, but actually constructing a counterexample is somewhat tricky.)

**Example 3.** No circle shift $(\mathbb{R}/\mathbb{Z}, x \mapsto x + \alpha)$ is topologically weakly mixing (or topologically mixing), even though such shifts are minimal (and hence transitive) when $\alpha$ is irrational. On the other hand, any Bernoulli shift is easily seen to be topologically mixing (and hence topologically weakly mixing).
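One way to see the first claim concretely: under the product map $(x,y) \mapsto (x+\alpha, y+\alpha)$ the difference $y - x$ mod 1 is invariant, so the orbit of a small box around the diagonal can never meet a box far from the diagonal. Here is a minimal numerical sketch of that invariance. The function names and the choice $\alpha = \sqrt{2} - 1$ are illustrative assumptions, and since floating point arithmetic is used the invariance is only observed up to rounding error.

```python
import math

def circle_dist(a, b):
    """Distance between a and b on the circle R/Z."""
    d = (a - b) % 1.0
    return min(d, 1.0 - d)

def product_rotation(point, alpha):
    """One step of the product of the circle shift with itself."""
    x, y = point
    return ((x + alpha) % 1.0, (y + alpha) % 1.0)

def max_gap_drift(point, alpha, steps):
    """Largest change in the circle distance between the two coordinates along the orbit."""
    initial_gap = circle_dist(*point)
    worst = 0.0
    for _ in range(steps):
        point = product_rotation(point, alpha)
        worst = max(worst, abs(circle_dist(*point) - initial_gap))
    return worst

alpha = math.sqrt(2) - 1
drift = max_gap_drift((0.10, 0.13), alpha, steps=10_000)
print(drift)  # stays at rounding-error size
```

A point whose coordinates are 0.03 apart therefore never reaches the set of points whose coordinates are, say, about 0.5 apart, which is the failure of transitivity for the product system.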

We have the following dichotomy, first proven by Keynes and Robertson (using ideas from the above-mentioned paper of Furstenberg):

**Theorem 3.** (Dichotomy between structure and randomness) Let $(X, \mathcal{F}, T)$ be a minimal topological dynamical system. Then exactly one of the following statements is true:

- (Structure) X has a non-trivial isometric factor.
- (Randomness) X is topologically weakly mixing.

**Remark 3.** Combining this with Exercise 6 from the previous lecture, we obtain an equivalent formulation of this theorem: a minimal system is topologically weakly mixing if and only if it has no non-trivial eigenfunctions.

**Proof.** We first prove the easy direction: that if X has a non-trivial isometric factor, then it is not topologically weakly mixing. In view of Exercise 11, it suffices to prove this when X itself is isometric. Let $x, x'$ be two distinct points of X, let r denote the distance between x and x’ with respect to the metric that makes X isometric, and let B and B’ be the open balls of radius r/10 centred at x and x’ respectively. As X is isometric, we see for any integer n that $T^n B$ cannot intersect both B and B’, or equivalently that $(T \times T)^n (B \times B)$ cannot intersect $B \times B'$. Thus $X \times X$ is not topologically transitive, as desired.

Now we prove the difficult direction: if X is not topologically weakly mixing, then it has a non-trivial isometric factor. For this we use an argument of Blanchard, Host, and Maass, based on earlier work of McMahon. By Definition 3, there exist open non-empty sets U, V in $X \times X$ such that $(T \times T)^n U \cap V = \emptyset$ for all n. If we thus set $K := \overline{\bigcup_{n \in \mathbb{Z}} (T \times T)^n U}$, we see that K is a compact proper $T \times T$-invariant subset of $X \times X$ with non-empty interior. On the other hand, the projection of K to either factor of $X \times X$ is a non-empty compact invariant subset of X and thus must be all of X.

We need to somehow use K to build an isometric factor of X. For this, we shall move from the topological dynamics setting to the ergodic theory setting. By Corollary 1 in the appendix, X admits an invariant Borel probability measure $\mu$. The support of $\mu$ is a non-empty closed invariant subset of X, and is thus equal to all of X by minimality.

The space $L^2(X, \mu)$ is a metric space, with an isometric shift map $Tf := f \circ T^{-1}$. We define the map $\pi: X \to L^2(X, \mu)$ by the formula

$\pi(x): y \mapsto 1_K(x, y)$ (1)

for all $x \in X$, where $1_K$ is the indicator function of K. Because K has non-empty interior and non-empty exterior, and because $\mu$ has full support, it is not hard to show that $\pi$ is non-constant. By the $T \times T$-invariance of K, $\pi$ also preserves the shift T. So if we can show that $\pi$ is continuous, we see that $\pi(X)$ will be a non-trivial isometric factor of X and we will be done.

Let us first consider the scalar function $f(x) := \|\pi(x)\|_{L^2(X,\mu)}^2 = \mu(\{y \in X: (x,y) \in K\})$. From the dominated convergence theorem and the fact that K is closed, we see that f is upper semi-continuous, and continuous at at least one point, thanks to Lemma 3 from Lecture 4. On the other hand, since K is $T \times T$-invariant and $\mu$ is T-invariant, we see that f is T-invariant. Applying Exercise 15 from Lecture 4 we see that f is constant. On the other hand, as K is closed we have $\limsup_{n \to \infty} 1_K(x_n, y) \leq 1_K(x, y)$ for any $y \in X$ and any sequence $x_n \to x$, and so by dominated convergence again we see that $\pi(x_n)$ converges in $L^2(X,\mu)$ to zero outside of the support of $\pi(x)$. Combining this with the constancy of f we conclude that $\pi(x_n)$ converges to $\pi(x)$ in $L^2(X,\mu)$ on all of X, and thus $\pi$ is continuous as required.

**Remark 4.** Note how the measure-theoretic structure was used to obtain metric structure, by passing from the measure space $(X, \mu)$ to the metric space $L^2(X, \mu)$. This again shows that one can sometimes upgrade weak notions of structure (such as topological or measure-theoretic structure) to strong notions (such as geometric or algebraic structure).

**Exercise 13**. Use Theorem 3 to prove Theorem 1. (Hint: use Exercise 10.)

**Remark 5.** It would be very convenient if one had a relative version of Theorem 3, namely that if X is an extension of Y, then X is either relatively topologically weakly mixing with respect to Y (which means that the relative product $X \times_Y X$ is topologically transitive), or else X has a factor Z which is a non-trivial isometric extension of Y; among other things, this would have given a new proof of Theorem 2, and in fact established a somewhat stronger structural theorem. Unfortunately, this relative version fails; a counterexample (based on the Morse sequence) can be found in Exercise 1.19.3 of Glasner’s book. Nevertheless, the analogue of this claim does hold true in the measure-theoretic setting, as we shall see in a few lectures.

— Appendix: sequential compactness of Borel probability measures —

We now recall some standard facts from measure theory about Borel probability measures on a compact metrisable space X. Recall that a sequence $\mu_n$ of such measures converges in the vague topology to another $\mu$ if we have $\int_X f\ d\mu_n \to \int_X f\ d\mu$ for all $f \in C(X)$.

**Lemma 1.** (Vague sequential compactness) The space of Borel probability measures on X is sequentially compact in the vague topology.

**Proof.** From the Stone-Weierstrass theorem we know that C(X) is separable. The claim then follows from the Riesz representation theorem and the usual Arzelà-Ascoli diagonalisation argument.

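**Corollary 1.** (Krylov-Bogolubov theorem) Let $(X, \mathcal{F}, T)$ be a topological dynamical system. Then there exists a T-invariant probability measure on X.

**Proof.** Pick any point $x_0 \in X$ and consider the finite probability measures

$\mu_N := \frac{1}{N} \sum_{n=0}^{N-1} \delta_{T^n x_0}$ (1)

where $\delta_x$ is the Dirac mass at x. By Lemma 1, some subsequence $\mu_{N_j}$ converges in the vague topology to another Borel probability measure $\mu$. Since we have

$\int_X f \circ T\ d\mu_N - \int_X f\ d\mu_N = \frac{f(T^N x_0) - f(x_0)}{N}$ (2)

for every $f \in C(X)$, and the right-hand side tends to zero as $N \to \infty$, taking limits along the subsequence gives $\int_X f \circ T\ d\mu = \int_X f\ d\mu$ for all $f \in C(X)$, so that $\mu$ is T-invariant.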
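The telescoping behind this argument is easy to observe numerically. The sketch below takes the empirical measures to be averages of Dirac masses along the first N points of an orbit, and uses an irrational circle rotation with the observable $\cos(2\pi x)$ as a test case; these choices and all function names are illustrative assumptions, not part of the proof. The integral of $f \circ T - f$ against the N-th empirical measure equals $(f(T^N x_0) - f(x_0))/N$, so it is at most $2 \sup|f| / N$ in magnitude.

```python
import math

def empirical_average(f, T, x0, N):
    """Integral of f against (1/N) * sum_{n=0}^{N-1} delta_{T^n x0}."""
    total, x = 0.0, x0
    for _ in range(N):
        total += f(x)
        x = T(x)
    return total / N

def invariance_defect(f, T, x0, N):
    """Integral of (f o T - f) against the same empirical measure."""
    shifted = empirical_average(lambda x: f(T(x)), T, x0, N)
    return shifted - empirical_average(f, T, x0, N)

alpha = math.sqrt(2) - 1
rotate = lambda x: (x + alpha) % 1.0
observable = lambda x: math.cos(2 * math.pi * x)

for N in (10, 100, 1000, 10_000):
    # The defect shrinks at least as fast as 2/N, since |observable| <= 1.
    print(N, invariance_defect(observable, rotate, 0.0, N))
```

Any vague limit point of the empirical measures therefore integrates $f \circ T$ and $f$ to the same value, which is the invariance used in the proof.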

**Remark 6.** Note that Corollary 1, like many other results obtained via compactness methods, guarantees existence of an invariant measure but not uniqueness (this latter property is known as *unique ergodicity*). Even for minimal systems, it is possible for uniqueness to fail, although actually constructing an example is tricky (see for instance this paper of Furstenberg). However, as already observed in the proof of Theorem 3, any invariant measure on a minimal topological dynamical system must be *full* (i.e. its support must be the whole space).

## 17 comments

28 January, 2008 at 10:31 pm

Made Eka

I use dynamical system to solve my problem in neurology. But I dont understant this lecture. Unlucky me n glad to know you.

Sucsess,

Made Eka

11 November, 2008 at 9:50 am

PDEbeginner

Dear Prof. Tao,

Thank you so much for your so nice lecture!

I am a little confused on Lemma 1. If our is the transition probability of standard Brownian motion at time n, it seems these measures do not converge in vague topology. It seems this is a contradiction for Lemma 1.

Another question is about the Krylov-Bogoliubov existence theorem of invariant measures. We still consider the transition probability of standard Brownian motion, does not converge to any measure.

I am not familiar with the ergodic theorem, maybe my questions are silly.

Thanks a lot!

11 November, 2008 at 11:39 am

Terence Tao

Dear PDEbeginner,

I am not sure exactly how you are defining , but I would imagine that the underlying space X that these measures are supported on is non-compact.

11 November, 2008 at 2:13 pm

PDEbeginner

Dear Prof. Tao,

Yes, the underlying space X is non-compact. If the X is compact, I think I can understand it. If X is non-compact, I still have the problems (for instance, the Brownian motion on .

I guess the X in this note is compact, since you applied Stone-Weirerstrass theorem.

Thanks!

18 May, 2010 at 5:33 pm

liuxiaochuan

Dear Professor Tao:

The last exercise of this lecture is the same one with exercise 12 of lecture 9. Also, the conception of uniquely ergodic is introduced there.

18 May, 2010 at 9:54 pm

Terence Tao

Fair enough… I’ve removed the exercise.

5 June, 2011 at 8:25 pm

Robert Tu

Dear Professor Tao:

In Example 1,I think the second point should be (1/2n,0).And after applying $T^n$,we can get the desired point,and there is an n missed before $\alpha$,I think.

[Corrected, thanks – T.]

4 November, 2019 at 11:01 pm

sann4673

In exercise 6 (3), I think the last F should be d? If it was d, then the statement follows easily from usc of F. And I think we can construct a counterexample based on example 2 if it was F.