In our final lecture on topological dynamics, we discuss a remarkable theorem of Furstenberg that classifies a major type of topological dynamical system – distal systems – in terms of highly structured (from an algebraic point of view) systems, namely towers of isometric extensions. This theorem is also a model for an important analogous result in ergodic theory, the Furstenberg-Zimmer structure theorem, which we will turn to in a few lectures. We will not be able to prove Furstenberg’s structure theorem for distal systems here in full, but we hope to illustrate some of the key points and ideas.

– Distal systems –

Furstenberg’s theorem concerns a significant generalisation of the equicontinuous (or isometric) systems, namely the distal systems.

Definition 1. (Distal systems) Let $(X,{\mathcal F}, T)$ be a topological dynamical system, and let d be an arbitrary metric on X (it is not important which one one picks here). We say that two points x, y in X are proximal if we have $\liminf_{n \to \infty} d(T^n x, T^n y) = 0$. We say that X is distal if no two distinct points $x \neq y$ in X are proximal, or equivalently if for every distinct x, y there exists $\varepsilon > 0$ such that $d(T^n x, T^n y) \geq \varepsilon$ for all n.

It is obvious that every isometric or equicontinuous system is distal, but the converse is not true, as the following example shows:

Example 1. If $\alpha \in {\Bbb R}$, then the skew shift $( ({\Bbb R}/{\Bbb Z})^2, (x,y) \mapsto (x+\alpha,y+x))$ turns out not to be equicontinuous; indeed, if we start with a pair of nearby points $(0,0), (\frac{1}{2n},0)$ for some large n and apply $T^n$, one ends up with $(n\alpha, \frac{n(n-1)}{2}\alpha)$ and $(n\alpha + \frac{1}{2n}, \frac{n(n-1)}{2}\alpha + \frac{1}{2})$, thus demonstrating failure of equicontinuity. On the other hand, the system is still distal: given any pair of distinct points $(x,y), (x',y')$, either $x \neq x'$ (in which case the horizontal separation between $T^n (x,y)$ and $T^n(x',y')$ is bounded from below) or $x = x'$ (in which case the vertical separation is bounded from below). $\diamond$
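
The computation in Example 1 can be checked mechanically. The following is a minimal Python sketch (not part of the original lecture) using exact rational arithmetic. The value $\alpha = 1/7$, the choice $n = 50$, and the helper names `skew_shift`, `iterate` and `circle_dist` are illustrative assumptions; the gaps between the two orbits do not depend on $\alpha$.

```python
from fractions import Fraction

def skew_shift(point, alpha):
    """One step of T(x, y) = (x + alpha, y + x) on the torus (R/Z)^2."""
    x, y = point
    return ((x + alpha) % 1, (y + x) % 1)

def iterate(point, alpha, n):
    """Apply the skew shift n times (n >= 0)."""
    for _ in range(n):
        point = skew_shift(point, alpha)
    return point

def circle_dist(s, t):
    """Distance between s and t in R/Z."""
    d = (s - t) % 1
    return min(d, 1 - d)

alpha = Fraction(1, 7)  # illustrative assumption; the gaps below do not depend on it
n = 50
start_p = (Fraction(0), Fraction(0))
start_q = (Fraction(1, 2 * n), Fraction(0))

p = iterate(start_p, alpha, n)
q = iterate(start_q, alpha, n)

# Closed form for the orbit of (0, 0): T^n(0, 0) = (n*alpha, n(n-1)/2 * alpha).
closed_form_p = ((n * alpha) % 1, (Fraction(n * (n - 1), 2) * alpha) % 1)

horizontal_gap = circle_dist(p[0], q[0])  # unchanged by the dynamics: 1/(2n)
vertical_gap = circle_dist(p[1], q[1])    # grows from 0 to n * 1/(2n) = 1/2

print(horizontal_gap, vertical_gap)
```

The two starting points are at distance $1/100$, yet after 50 steps their vertical separation is $1/2$, which is the failure of equicontinuity. The horizontal gap never shrinks, which is the distality argument in the case $x \neq x'$.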

Exercise 1. Show that any non-trivial Bernoulli system $\Omega^{\Bbb Z}$ is not distal. $\diamond$

Distal systems interact nicely with the action $p \mapsto T^p$ of the compactified integers $\beta {\Bbb Z}$:

Exercise 2. Let $(X, {\mathcal F}, T)$ be a topological dynamical system.

1. Show that two points x, y in X are proximal if and only if $T^p x = T^p y$ for some $p \in \beta {\Bbb Z}$.
2. Show that X is distal if and only if all the maps $T^p$ for $p \in \beta {\Bbb Z}$ are injective.
3. If X is distal, show that $T^p = \hbox{id}$ whenever $p \in \beta{\Bbb Z}$ is idempotent. (Hint: use part 2.)
4. If X is distal, show that the set of transformations $G := \{ T^p: p \in \beta {\Bbb Z} \}$ on X forms a group, known as the Ellis group of X. (Hint: use part 3, together with Lemma 3 from Lecture 5.) Show that G is a compact subset of $X^X$ (with the product topology), and that G acts transitively on X if and only if X is minimal. $\diamond$

Exercise 3. Show that an inverse limit of a totally ordered set $(Y_{\alpha})_{\alpha \in A}$ of distal factors is still distal. (This turns out to be slightly easier than Lemma 1 from the previous lecture.) $\diamond$

Exercise 4. Show that every topological dynamical system has a maximal distal factor. (Hint: repeat the proof of Corollary 1 from the previous lecture.) $\diamond$

Exercise 5. Show that any distal system can be partitioned into disjoint minimal distal systems. (One can of course adapt the proof of Proposition 2 from the previous lecture to do this; but there is a slicker way to do it by exploiting the Ellis group.) $\diamond$

Note that the skew shift system, while not isometric, does have a non-trivial isometric factor, namely the circle shift $({\Bbb R}/{\Bbb Z}, x \mapsto x+\alpha)$ with the projection map $\pi: (x,y) \mapsto x$. It turns out that this phenomenon is general:

Theorem 1 (Baby Furstenberg structure theorem). Let $(X,{\mathcal F},T)$ be minimal, distal and non-trivial (i.e. not a point). Then X has a non-trivial isometric factor $\pi: X \to Y$.

This result – a toy case of Furstenberg’s full structure theorem – is already rather difficult to establish. We will not give Furstenberg’s original proof here (though see Exercise 13 below), but will at least sketch how the factor $\pi: X \to Y$ is constructed. A key object in the construction is the symmetric function $F: X \times X \to {\Bbb R}^+$ defined by the formula

$F(x,y) := \inf_{n \in {\Bbb Z}} d( T^n x, T^n y )$. (1)

Example 2. We again consider the skew shift $(({\Bbb R}/{\Bbb Z})^2, (x,y) \mapsto (x+\alpha,y+x))$ with $\alpha$ irrational. For the sake of concreteness let us choose the taxicab metric $d((x,y),(x',y')) := \|x-x'\|_{{\Bbb R}/{\Bbb Z}} + \|y-y'\|_{{\Bbb R}/{\Bbb Z}}$, where $\|x\|_{{\Bbb R}/{\Bbb Z}}$ is the distance from x to the integers. Then one can check that $F( (x,y),(x',y') )$ is equal to $\|x-x'\|_{{\Bbb R}/{\Bbb Z}}$ when $x - x'$ is irrational, and equal to $\|x-x'\|_{{\Bbb R}/{\Bbb Z}} + \frac{1}{q} \|q(y-y')\|_{{\Bbb R}/{\Bbb Z}}$ when $x-x'$ is rational, where q is the least positive integer such that $q(x-x')$ is an integer. Thus F is highly discontinuous, but it is at least upper semi-continuous in each of its two variables. (Actually, the upper semi-continuity of F holds for arbitrary topological dynamical systems, since F is the infimum of continuous functions.) $\diamond$
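
The rational case of the formula in Example 2 can be sanity-checked by brute force. For the skew shift, the difference between $T^n(x,y)$ and $T^n(x',y')$ is $(x-x', (y-y') + n(x-x'))$, independently of $\alpha$. So when $x-x'$ is rational with denominator q, the distance is periodic in n with period q, and the infimum is a minimum over one period. The sketch below is an illustration under these observations, not part of the original lecture; the function names `circle_norm`, `F_bruteforce` and `F_formula` and the sample values are hypothetical.

```python
from fractions import Fraction

def circle_norm(t):
    """Distance from t to the nearest integer."""
    t = t % 1
    return min(t, 1 - t)

def F_bruteforce(dx, dy):
    """Minimum over one period of the taxicab distance between two orbits.

    dx = x - x' and dy = y - y' are Fractions. After n steps the orbits
    differ by (dx, dy + n*dx), which is periodic in n with period
    q = dx.denominator.
    """
    q = dx.denominator
    return min(circle_norm(dx) + circle_norm(dy + n * dx) for n in range(q))

def F_formula(dx, dy):
    """Closed form from Example 2 in the case that dx is rational."""
    q = dx.denominator  # least positive integer with q*dx an integer
    return circle_norm(dx) + circle_norm(q * dy) / q

# Hypothetical sample differences (dx, dy), chosen only for illustration.
samples = [
    (Fraction(3, 10), Fraction(1, 7)),
    (Fraction(1, 2), Fraction(1, 3)),
    (Fraction(5, 12), Fraction(2, 9)),
    (Fraction(0), Fraction(1, 4)),
]
results = [(F_bruteforce(dx, dy), F_formula(dx, dy)) for dx, dy in samples]
for (dx, dy), (brute, formula) in zip(samples, results):
    print(dx, dy, brute, formula)
```

The second term $\frac{1}{q}\|q(y-y')\|_{{\Bbb R}/{\Bbb Z}}$ is at most $\frac{1}{2q}$, so it becomes negligible when the denominator q is large; this is consistent with the value $\|x-x'\|_{{\Bbb R}/{\Bbb Z}}$ in the irrational case and with the discontinuity of F.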

Exercise 6. Let G be the Ellis group of a minimal distal system X.

1. For any $x,y \in X$, show that $F(x,y) = \inf_{g \in G} d(gx, gy)$. In particular, $F(gx,gy) = F(x,y)$ for all $g \in G$.
2. For any $x,y \in X$, show that the set $\{ (gx, gy): g \in G \}$ is a minimal subsystem of $X \times X$ (with the product shift $(x,y) \mapsto (Tx, Ty)$). Conclude in particular that if $F(x,y) < a$, then the set $\{ n \in {\Bbb Z}: d(T^n x, T^n y) < a \}$ is syndetic.
3. If $x,y \in X$ and $a > 0$ is such that $F(x,y) < a$, show that there exists $\varepsilon > 0$ such that $F(x,z) < a$ whenever $F(y,z) < \varepsilon$.
4. Let $X_F = (X, {\mathcal F}_F)$ be the space X whose topology is generated by the basic open sets $U_{a,x} := \{ y \in X: F(x,y) < a \}$. (That this is a base follows from 3.) Equivalently, $X_F$ is equipped with the weakest topology on which F is upper semi-continuous in each variable. Show that $X_F$ is a weaker topological space than X (i.e. the identity map from X to $X_F$ is continuous); in particular, $X_F$ is compact. Also show that all the maps in G are homeomorphisms on $X_F$. $\diamond$

If $X_F$ were Hausdorff, then the system $(X_F, {\mathcal F}_F, T)$ would be equicontinuous, by Exercise 2 from the previous lecture. Unfortunately, $X_F$ is not Hausdorff in general. However, it turns out that we can “quotient out” the non-Hausdorff nature of $X_F$. Define the equivalence relation $\sim$ on $X_F$ by declaring $x \sim y$ if we have $F(x,z) = F(y,z)$ for all z outside of a set of the first category in X. This is clearly an equivalence relation, and so we can create the quotient space $Y := X_F/\sim$; since X embeds into $X_F$ we thus have a factor map $\pi: X \to Y$. It is a deep fact (which we will not prove here) that this quotient space is non-trivial and Hausdorff, and that $\sim$ is preserved by the shift T and even by the Ellis group G (thus if $x \sim y$ and $g \in G$ then $gx \sim gy$). Because of this, G continues to act on Y homeomorphically, and so by Exercise 2 from the previous lecture, $\pi: X \to Y$ is a non-trivial isometric factor of X as desired.

Exercise 7. Show that in the case of the skew shift (Example 2), this construction recovers the factor that was discussed just before Theorem 1. (The trickiness of this exercise should already give you some idea of the difficulty level of Theorem 1.) $\diamond$

– The Furstenberg structure theorem for distal systems –

We have already noted that isometric systems are distal systems. More generally, we have

Exercise 8. Show that an isometric extension of a distal system is still distal. (Hint: Example 1 is a good model case.) $\diamond$

Thus, for instance, the iterated skew shifts that appear in (5) from the previous lecture are distal. Also, recall from Exercise 3 that the inverse limit of distal systems is again distal. It turns out that these are the only ways to generate distal systems, in the following sense:

Theorem 2. (Furstenberg’s structure theorem for distal systems) Let $(X, {\mathcal F}, T)$ be a distal system. Then there exists an ordinal $\alpha$ and a factor $Y_\beta$ for every $\beta \leq \alpha$ with the following properties:

1. $Y_\emptyset$ is a point.
2. For every successor ordinal $\beta+1 \leq \alpha$, $Y_{\beta+1}$ is an isometric extension of $Y_\beta$.
3. For every limit ordinal $\beta \leq \alpha$, $Y_\beta$ is an inverse limit of the $Y_\gamma$ for $\gamma < \beta$.
4. $Y_\alpha$ is equal to X.

The collection of factors $(Y_\beta)_{\beta \leq \alpha}$ is sometimes known as a “Furstenberg tower”.

Theorem 2 follows by applying Zorn’s lemma with the following key proposition:

Proposition 1. (Key inductive step) Let $(X,{\mathcal F},T)$ be a distal system, and let Y be a proper factor of X (i.e. the factor map is not an isomorphism). Then there exists another factor Z of X which is a proper isometric extension of Y.

Note that Theorem 1 is the special case of Proposition 1 when Y is a point. Indeed, Proposition 1 is proven in the same way as Theorem 1, but with several additional technicalities which I will not discuss here; see the original paper of Furstenberg for details.

Exercise 9. Deduce Theorem 2 from Proposition 1 and Zorn’s lemma. $\diamond$

Remark 1. It is known that in Theorem 2, one can take the ordinal $\alpha$ to be countable, and conversely that for every countable ordinal $\alpha$, there exists a system whose smallest Furstenberg tower has height $\alpha$. $\diamond$

Remark 2. Several generalisations and extensions of Furstenberg’s structure theorem are known, but they are somewhat technical to state and will not be detailed here; see this survey of Glasner for a discussion. $\diamond$

– Weak mixing and isometric factors –

We have seen that distal systems always contain non-trivial isometric factors. What about more general systems? It turns out that there is in fact a nice dichotomy between systems with non-trivial isometric factors, and those without.

Definition 2. (Topological transitivity) A topological dynamical system $(X, {\mathcal F}, T)$ is topologically transitive if, for every pair U, V of non-empty open sets, there exists an integer n such that $T^n U \cap V \neq \emptyset$.

Exercise 10. Show that a topological dynamical system is topologically transitive if and only if it is equal to the orbit closure of one of its points. (Compare this with minimal systems, each of which is the orbit closure of any of its points. Thus minimality is stronger than topological transitivity; for instance, the compactified integers $\{-\infty\} \cup {\Bbb Z} \cup \{+\infty\}$ with the usual shift is topologically transitive but not minimal.) $\diamond$

Exercise 11. Show that any factor of a topologically transitive system is again topologically transitive. $\diamond$

Definition 3. (Topological weak mixing) A topological dynamical system $(X, {\mathcal F}, T)$ is topologically weakly mixing if the product system $X \times X$ is topologically transitive.

Exercise 12. A system is said to be topologically mixing if for every pair U, V of non-empty open sets, one has $T^n U \cap V \neq \emptyset$ for all sufficiently large n. Show that topological mixing implies topological weak mixing. (The converse is false, but actually constructing a counterexample is somewhat tricky.) $\diamond$

Example 3. No circle shift $({\Bbb R}/{\Bbb Z}, x \mapsto x+\alpha)$ is topologically weakly mixing (or topologically mixing), even though such shifts are minimal (and hence transitive) when $\alpha$ is irrational. On the other hand, any Bernoulli shift is easily seen to be topologically mixing (and hence topologically weakly mixing). $\diamond$
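
One way to see the first claim of Example 3 numerically: a circle shift preserves the distance $\|x-y\|_{{\Bbb R}/{\Bbb Z}}$, so $(x,y) \mapsto \|x-y\|_{{\Bbb R}/{\Bbb Z}}$ is a non-constant continuous function on the product system that is invariant under $T \times T$, and this rules out topological transitivity of the product. The following floating-point sketch is only an illustration; the rotation number $\sqrt{2}$, the pair of starting points, and the range of n are assumptions made for the example.

```python
import math

def circle_norm(t):
    """Distance from t to the nearest integer."""
    t = t % 1.0
    return min(t, 1.0 - t)

def rotate(x, alpha, n):
    """n-fold circle shift x -> x + n*alpha on R/Z (n may be negative)."""
    return (x + n * alpha) % 1.0

alpha = math.sqrt(2)  # illustrative irrational rotation number (assumption)
x, y = 0.1, 0.35      # illustrative pair of points at distance 0.25

initial_gap = circle_norm(x - y)

# Largest change in the gap along the orbit of (x, y) under T x T.
max_drift = max(
    abs(circle_norm(rotate(x, alpha, n) - rotate(y, alpha, n)) - initial_gap)
    for n in range(-1000, 1001)
)
print(initial_gap, max_drift)
```

Up to rounding error the gap does not move. Since every orbit of the product system stays on a level set of the gap function, an open set of pairs with gap less than 0.1 can never reach an open set of pairs with gap greater than 0.4.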

We have the following dichotomy, first proven by Keynes and Robertson (using ideas from the above-mentioned paper of Furstenberg):

Theorem 3. (Dichotomy between structure and randomness) Let $(X, {\mathcal F}, T)$ be a minimal topological dynamical system. Then exactly one of the following statements is true:

1. (Structure) X has a non-trivial isometric factor.
2. (Randomness) X is topologically weakly mixing.

Remark 3. Combining this with Exercise 6 from the previous lecture, we obtain an equivalent formulation of this theorem: a minimal system is topologically weakly mixing if and only if it has no non-trivial eigenfunctions. $\diamond$

Proof. We first prove the easy direction: that if X has a non-trivial isometric factor, then it is not topologically weakly mixing. In view of Exercise 11, it suffices to prove this when X itself is isometric (and non-trivial). Let $x, x'$ be two distinct points of X, let r denote the distance between $x$ and $x'$ with respect to the metric that makes X isometric, and let $B$ and $B'$ be the open balls of radius r/10 centred at $x$ and $x'$ respectively. As X is isometric, we see for any integer n that $T^n B$ cannot intersect both $B$ and $B'$, or equivalently that $(T \times T)^n (B \times B)$ cannot intersect $B \times B'$. Thus $X \times X$ is not topologically transitive, and so X is not topologically weakly mixing, as desired.

Now we prove the difficult direction: if X is not topologically weakly mixing, then it has a non-trivial isometric factor. For this we use an argument of Blanchard, Host, and Maass, based on earlier work of McMahon. By Definition 3, there exist open non-empty sets U, V in $X \times X$ such that $(T \times T)^n U \cap V = \emptyset$ for all n. If we thus set $K := \overline{\bigcup_n (T \times T)^n U}$, we see that K is a compact proper $T \times T$-invariant subset of $X \times X$ with non-empty interior. On the other hand, the projection of K to either factor of $X \times X$ is a non-empty compact invariant subset of X and thus must be all of X.

We need to somehow use K to build an isometric factor of X. For this, we shall move from the topological dynamics setting to the ergodic theory setting. By Corollary 1 in the appendix, X admits an invariant Borel probability measure $\mu$. The support of $\mu$ is a non-empty closed invariant subset of X, and is thus equal to all of X by minimality.

The space $L^1(X,\mu)$ is a metric space, with an isometric shift map $Tf := f \circ T^{-1}$. We define the map $\pi: X \to L^1(X,\mu)$ by the formula

$\pi(x): y \mapsto 1_K(x,y)$ (2)

for all $x \in X$, where $1_K$ is the indicator function of K. Because K has non-empty interior and non-empty exterior, and because $\mu$ has full support, it is not hard to show that $\pi$ is non-constant. By the $T \times T$-invariance of K, it also preserves the shift T. So if we can show that $\pi$ is continuous, we see that $\pi(X)$ will be a non-trivial isometric factor of X and we will be done.

Let us first consider the scalar function $f(x) := \int_X 1_K(x,y)\ d\mu(y)$. From the dominated convergence theorem and the fact that K is closed, we see that f is upper semi-continuous, and is continuous at some point, thanks to Lemma 3 from Lecture 4. On the other hand, since K is $T \times T$-invariant and $\mu$ is T-invariant, we see that f is T-invariant. Applying Exercise 15 from Lecture 4 we see that f is constant. On the other hand, as K is closed we have $\limsup_{x \to x_0} 1_K(x,y) \leq 1_K(x_0,y)$ for any $x_0 \in X$, and so by dominated convergence again we see that, as $x \to x_0$, $1_K(x,\cdot)$ converges in $L^1$ to zero outside of the support of $1_K(x_0,\cdot)$. Combining this with the constancy of f we conclude that $1_K(x,\cdot)$ converges to $1_K(x_0,\cdot)$ in $L^1$ on all of X, and thus $\pi$ is continuous as required. $\Box$

Remark 4. Note how the measure-theoretic structure was used to obtain metric structure, by passing from the measure space $(X, \mu)$ to the metric space $L^1(X, \mu)$. This again shows that one can sometimes upgrade weak notions of structure (such as topological or measure-theoretic structure) to strong notions (such as geometric or algebraic structure). $\diamond$

Exercise 13. Use Theorem 3 to prove Theorem 1. (Hint: use Exercise 10.) $\diamond$

Remark 5. It would be very convenient if one had a relative version of Theorem 3, namely that if X is an extension of Y, then X is either relatively topologically weakly mixing with respect to Y (which means that the relative product $X \times_Y X := \{ (x,x') \in X \times X: \pi(x)=\pi(x')\}$ is topologically transitive), or else X has a factor Z which is a non-trivial isometric extension of Y; among other things, this would have given a new proof of Theorem 2, and in fact established a somewhat stronger structural theorem. Unfortunately, this relative version fails; a counterexample (based on the Morse sequence) can be found in Exercise 1.19.3 of Glasner’s book. Nevertheless, the analogue of this claim does hold true in the measure-theoretic setting, as we shall see in a few lectures. $\diamond$

– Appendix: sequential compactness of Borel probability measures –

We now recall some standard facts from measure theory about Borel probability measures on a compact metrisable space X. Recall that a sequence of such measures $\mu_n$ converges in the vague topology to another $\mu$ if we have $\int_X f\ d\mu_n \to \int_X f\ d\mu$ for all $f \in C(X)$.

Lemma 1. (Vague sequential compactness) The space $\hbox{Pr}(X)$ of Borel probability measures on X is sequentially compact in the vague topology.

Proof. From the Stone-Weierstrass theorem we know that C(X) is separable. The claim then follows from the Riesz representation theorem and the usual Arzelà-Ascoli diagonalisation argument. $\Box$

Corollary 1. (Krylov-Bogolubov theorem) Let $(X, {\mathcal F}, T)$ be a topological dynamical system. Then there exists a T-invariant probability measure $\mu$ on X.

Proof. Pick any point $x_0 \in X$ and consider the finite probability measures

$\mu_N := \frac{1}{N} \sum_{n=1}^N \delta_{T^n x_0}$ (3)

where $\delta_x$ is the Dirac mass at x. By Lemma 1, some subsequence $\mu_{N_j}$ converges in the vague topology to another Borel probability measure $\mu$. Since we have

$\int T f\ d\mu_N = \int f\ d\mu_N + O_f( 1/N )$ (4)

for all bounded continuous f, we conclude on taking vague limits and using the Riesz representation theorem that $\mu$ is T-invariant as required. $\Box$

Remark 6. Note that Corollary 1, like many other results obtained via compactness methods, guarantees existence of an invariant measure but not uniqueness (this latter property is known as unique ergodicity). Even for minimal systems, it is possible for uniqueness to fail, although actually constructing an example is tricky (see for instance this paper of Furstenberg). However, as already observed in the proof of Theorem 3, any invariant measure on a minimal topological dynamical system must be full (i.e. its support must be the whole space). $\diamond$
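
The approximate invariance of the averaged measures $\mu_N$ used in the proof of Corollary 1 can be illustrated numerically. The sketch below is not from the lecture: it takes the circle shift with rotation number $\sqrt{2}$, the base point $x_0 = 0$ and the test function $f(x) = \cos(2\pi x)$ as illustrative assumptions, and uses the convention $Tf = f \circ T^{-1}$ from the proof of Theorem 3. With this convention the difference $\int Tf\ d\mu_N - \int f\ d\mu_N$ telescopes to $\frac{1}{N}( f(x_0) - f(T^N x_0) )$, which is at most $2 \sup |f| / N$ in absolute value.

```python
import math

ALPHA = math.sqrt(2)  # illustrative rotation number (assumption)
X0 = 0.0              # illustrative base point (assumption)

def T_power(x, n):
    """T^n x for the circle shift T x = x + ALPHA on R/Z."""
    return (x + n * ALPHA) % 1.0

def f(x):
    """A continuous test function on R/Z with sup |f| = 1."""
    return math.cos(2 * math.pi * x)

def integral_f(N):
    """Integral of f against mu_N = (1/N) * sum_{n=1}^{N} delta_{T^n x0}."""
    return sum(f(T_power(X0, n)) for n in range(1, N + 1)) / N

def integral_Tf(N):
    """Integral of Tf = f o T^{-1} against the same measure mu_N."""
    return sum(f(T_power(X0, n - 1)) for n in range(1, N + 1)) / N

# Invariance defect of mu_N, to be compared with the bound 2 * sup|f| / N.
defects = {N: abs(integral_Tf(N) - integral_f(N)) for N in (10, 100, 1000, 10000)}
for N, defect in defects.items():
    print(N, defect, 2 / N)
```

The defect decays at least as fast as $2/N$, which is the $O_f(1/N)$ error term; any vague limit point of the $\mu_N$ therefore integrates $Tf$ and $f$ identically.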