The von Neumann ergodic theorem (the Hilbert space version of the mean ergodic theorem) asserts that if {U: H \rightarrow H} is a unitary operator on a Hilbert space {H}, and {v \in H} is a vector in that Hilbert space, then one has

\displaystyle  \lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N U^n v = \pi_{H^U} v

in the strong topology, where {H^U := \{ w \in H: Uw = w \}} is the {U}-invariant subspace of {H}, and {\pi_{H^U}} is the orthogonal projection to {H^U}. (See e.g. these previous lecture notes for a proof.) The same proof extends to more general amenable groups: if {G} is a countable amenable group acting on a Hilbert space {H} by unitary transformations {g: H \rightarrow H}, and {v \in H} is a vector in that Hilbert space, then one has

\displaystyle  \lim_{N \rightarrow \infty} \frac{1}{|\Phi_N|} \sum_{g \in \Phi_N} gv = \pi_{H^G} v \ \ \ \ \ (1)

for any Folner sequence {\Phi_N} of {G}, where {H^G := \{ w \in H: gw = w \hbox{ for all }g \in G \}} is the {G}-invariant subspace. Thus one can interpret {\pi_{H^G} v} as a certain average of elements of the orbit {Gv := \{ gv: g \in G \}} of {v}.
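As a quick numerical sanity check of (1), here is a minimal sketch in Python (standard library only) for a toy case that is not taken from the discussion above. The group {G = {\bf Z}^2} acts on {{\bf C}^3} by two commuting diagonal unitaries, and we average over the Folner boxes {\Phi_N = \{1,\dots,N\}^2}. The eigenvalues, the vector {v}, and the helper names are arbitrary illustrative choices. In this diagonal model {H^G} is simply spanned by the coordinates on which both unitaries act trivially.

```python
import cmath
import math

# Toy model (an illustrative assumption): G = Z^2 acts on C^3 by two commuting
# diagonal unitaries U1, U2; the element (a, b) multiplies coordinate j by
# lam1[j]**a * lam2[j]**b.
alpha = math.sqrt(2) - 1                      # an irrational rotation number
lam1 = [1, 1, cmath.exp(2j * math.pi * alpha)]
lam2 = [1, -1, 1]


def box_average(v, N):
    """Average of g v over the Folner box Phi_N = {1,...,N} x {1,...,N}."""
    total = [0j] * len(v)
    for a in range(1, N + 1):
        for b in range(1, N + 1):
            for j, vj in enumerate(v):
                total[j] += lam1[j] ** a * lam2[j] ** b * vj
    return [t / (N * N) for t in total]


def project_invariant(v):
    """Orthogonal projection onto H^G: keep the coordinates fixed by U1 and U2."""
    return [vj if lam1[j] == 1 and lam2[j] == 1 else 0 for j, vj in enumerate(v)]


v = [1, 2, 3]
target = project_invariant(v)                 # [1, 0, 0]
for N in (10, 50, 200):
    avg = box_average(v, N)
    err = math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(avg, target)))
    print(N, err)
```

In this diagonal example each non-invariant coordinate reduces to one-dimensional geometric averages, so the error decays like {O(1/N)}. This rate is special to the example.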

I recently discovered that there is a simple variant of this ergodic theorem that holds even when the group {G} is not amenable (or not discrete), using a more abstract notion of averaging:

Theorem 1 (Abstract ergodic theorem) Let {G} be an arbitrary group acting unitarily on a Hilbert space {H}, and let {v} be a vector in {H}. Then {\pi_{H^G} v} is the element in the closed convex hull of {Gv := \{ gv: g \in G \}} of minimal norm, and is also the unique element of {H^G} in this closed convex hull.

Proof: Write {T_g w := gw} for the action of {g \in G}. As the closed convex hull of {Gv} is closed, convex, and non-empty in a Hilbert space, it is a classical fact (see e.g. Proposition 1 of this previous post) that it has a unique element {F} of minimal norm. Since each {T_g} is unitary and permutes {Gv}, it preserves this closed convex hull, so {T_g F} also lies in the hull and has the same norm as {F}. If {T_g F \neq F} for some {g}, then by the parallelogram law the midpoint of {T_g F} and {F} would lie in the closed convex hull and have strictly smaller norm, a contradiction; thus {F} is {G}-invariant. To finish the first claim, it suffices to show that {v-F} is orthogonal to every element {h} of {H^G}. But if this were not the case for some such {h}, we would have {\langle T_g v - F, h \rangle = \langle v-F,h\rangle \neq 0} for all {g \in G}, and thus on taking closed convex hulls {\langle F-F,h\rangle = \langle v-F,h\rangle \neq 0}, a contradiction.

Finally, since {T_g v - F} is orthogonal to {H^G}, the same is true for {F'-F} for any {F'} in the closed convex hull of {Gv}, and this gives the second claim. \Box
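The proof above is non-constructive, but in a finite-dimensional toy case one can compute the minimal-norm element of the convex hull of a finite orbit numerically and compare it with {\pi_{H^G} v}. The sketch below does this for {G = S_3} permuting the coordinates of {{\bf R}^3}. It uses the Frank-Wolfe (conditional gradient) iteration with exact line search; this solver is a swapped-in numerical technique, not part of the argument above, and the vector {v} is an arbitrary choice. Here {H^G} is the line spanned by {(1,1,1)}, so {\pi_{H^G} v} replaces each coordinate of {v} by their mean.

```python
from itertools import permutations


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def min_norm_in_hull(points, iters=5000):
    """Frank-Wolfe iteration with exact line search for the point of minimal
    norm in the convex hull of a finite list of points."""
    x = list(points[0])
    for _ in range(iters):
        # Vertex minimising the linearisation <x, p> of the objective |p|^2 / 2.
        s = min(points, key=lambda p: dot(x, p))
        d = [si - xi for si, xi in zip(s, x)]
        dd = dot(d, d)
        if dd == 0:
            break
        # Exact minimisation of |x + t d|^2 over t in [0, 1].
        t = max(0.0, min(1.0, -dot(x, d) / dd))
        if t == 0.0:
            break  # <x, s - x> >= 0 for the best vertex s, so x is already optimal
        x = [xi + t * di for xi, di in zip(x, d)]
    return x


# G = S_3 acting on R^3 by permuting coordinates; the orbit Gv is finite.
v = (3.0, 0.0, 1.0)
orbit = [tuple(v[i] for i in perm) for perm in permutations(range(3))]
F = min_norm_in_hull(orbit)

# H^G is spanned by (1, 1, 1), so pi_{H^G} v replaces each coordinate by the mean.
proj = [sum(v) / 3] * 3
print(F, proj)
```

Since the orbit is finite, its convex hull is already closed. The minimiser found here is also the uniform average of the orbit, as one expects for a finite group.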

This result is due to Alaoglu and Birkhoff. It implies the amenable ergodic theorem (1); indeed, given any {\epsilon>0}, Theorem 1 implies that there is a finite convex combination {v_\epsilon} of shifts {gv} of {v} which lies within {\epsilon} (in the {H} norm) of {\pi_{H^G} v}. Since each {g} is unitary and fixes {\pi_{H^G} v}, the triangle inequality shows that all the averages {\frac{1}{|\Phi_N|} \sum_{g \in \Phi_N} gv_\epsilon} also lie within {\epsilon} of {\pi_{H^G} v}; but by the Folner property this implies that the averages {\frac{1}{|\Phi_N|} \sum_{g \in \Phi_N} gv} are eventually within {2\epsilon} (say) of {\pi_{H^G} v}, giving the claim.
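The Folner property enters as follows. For a fixed {h \in G}, unitarity gives {\| \frac{1}{|\Phi_N|}\sum_{g \in \Phi_N} ghv - \frac{1}{|\Phi_N|}\sum_{g \in \Phi_N} gv \| \leq \frac{|\Phi_N h \Delta \Phi_N|}{|\Phi_N|} \|v\|}, and {v_\epsilon} involves only finitely many {h}. For the boxes {\Phi_N = \{1,\dots,N\}^2} in {{\bf Z}^2} the ratio on the right can be counted exactly. A shift by {(s,t)} with {|s|,|t| \leq N} gives {|(\Phi_N + (s,t)) \Delta \Phi_N| = 2(N^2 - (N-|s|)(N-|t|))}. The small sketch below checks this (the boxes and shifts are illustrative choices).

```python
def box(N):
    """The Folner box Phi_N = {1,...,N}^2 in Z^2."""
    return {(a, b) for a in range(1, N + 1) for b in range(1, N + 1)}


def folner_ratio(N, shift):
    """|(Phi_N + shift) symmetric-difference Phi_N| / |Phi_N|."""
    phi = box(N)
    s, t = shift
    moved = {(a + s, b + t) for (a, b) in phi}
    return len(moved ^ phi) / len(phi)


def predicted_ratio(N, shift):
    """Closed form 2(N^2 - (N-|s|)(N-|t|)) / N^2, valid for |s|, |t| <= N."""
    s, t = shift
    return 2 * (N * N - (N - abs(s)) * (N - abs(t))) / (N * N)


for N in (10, 100, 300):
    for shift in [(1, 0), (2, -3)]:
        assert folner_ratio(N, shift) == predicted_ratio(N, shift)
    print(N, folner_ratio(N, (1, 0)), folner_ratio(N, (2, -3)))
```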

It turns out to be possible to use Theorem 1 as a substitute for the mean ergodic theorem in a number of contexts, thus removing the need for an amenability hypothesis. Here is a basic application:

Corollary 2 (Relative orthogonality) Let {G} be a group acting unitarily on a Hilbert space {H}, and let {V} be a closed {G}-invariant subspace of {H}. Then {V} and {H^G} are relatively orthogonal over their common subspace {V^G}, that is to say the restrictions of {V} and {H^G} to the orthogonal complement of {V^G} are orthogonal to each other.

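Proof: Let {v \in V}. By Theorem 1 (applied both to the action on {H} and to the action on {V}), both {\pi_{H^G} v} and {\pi_{V^G} v} are the unique {G}-invariant element of the closed convex hull of {Gv}, which is contained in {V}. Hence {\pi_{H^G} v = \pi_{V^G} v} for all {v \in V}, and the claim follows. (Thanks to Gergely Harcos for this short argument.) \Box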

Now we give a more advanced application of Theorem 1, to establish some “Mackey theory” over arbitrary groups {G}. Define a {G}-system {(X, {\mathcal X}, \mu, (T_g)_{g \in G})} to be a probability space {X = (X, {\mathcal X}, \mu)} together with a measure-preserving action {(T_g)_{g \in G}} of {G} on {X}; this gives an action of {G} on {L^2(X) = L^2(X,{\mathcal X},\mu)}, which by abuse of notation we also call {T_g}:

\displaystyle  T_g f := f \circ T_{g^{-1}}.

(In this post we follow the usual convention of defining the {L^p} spaces by quotienting out by almost everywhere equivalence.) We say that a {G}-system is ergodic if {L^2(X)^G} consists only of the constants.
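The inverse in {T_g f := f \circ T_{g^{-1}}} is what makes {g \mapsto T_g} a left action on functions when {G} is non-abelian. Here is a small check on a finite toy system, chosen purely for illustration: {X = \{0,1,2\}} with the uniform measure, and {G = S_3} acting by permutations. The sketch verifies {T_g T_h = T_{gh}} and shows that dropping the inverse gives a right action instead. It also notes that the action is transitive, so only constant functions are invariant and this system is ergodic.

```python
from itertools import permutations

# Toy G-system: X = {0, 1, 2} with uniform measure; G = S_3 acts by
# permutations, a permutation p being stored as the tuple (p(0), p(1), p(2)).
X = range(3)
G = list(permutations(X))


def compose(p, q):
    """The permutation p o q (apply q first, then p)."""
    return tuple(p[q[x]] for x in X)


def inverse(p):
    inv = [0] * len(p)
    for x in X:
        inv[p[x]] = x
    return tuple(inv)


def koopman(g, f):
    """T_g f := f o g^{-1}, for f stored as the tuple of its values on X."""
    g_inv = inverse(g)
    return tuple(f[g_inv[x]] for x in X)


def naive(g, f):
    """f o g, i.e. the same formula without the inverse."""
    return tuple(f[g[x]] for x in X)


f = (5.0, -1.0, 2.0)
# With the inverse, T_g T_h = T_{gh} for all g, h: a left action.
assert all(koopman(g, koopman(h, f)) == koopman(compose(g, h), f) for g in G for h in G)
# Without it, the order of composition is reversed for some pair g, h.
assert any(naive(g, naive(h, f)) != naive(compose(g, h), f) for g in G for h in G)
# T_g only permutes the values of f, so it preserves the L^2 norm (uniform measure).
assert all(sorted(koopman(g, f)) == sorted(f) for g in G)
# The orbit of 0 is all of X, so invariant functions are constant: ergodicity.
assert {g[0] for g in G} == set(X)
```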

(A technical point: the theory becomes slightly cleaner if we interpret our measure spaces abstractly (or “pointlessly”), removing the underlying space {X} and quotienting {{\mathcal X}} by the {\sigma}-ideal of null sets, and considering maps such as {T_g} only on this quotient {\sigma}-algebra (or on the associated von Neumann algebra {L^\infty(X)} or Hilbert space {L^2(X)}). However, we will stick with the more traditional setting of classical probability spaces here to keep the notation familiar, but with the understanding that many of the statements below should be understood modulo null sets.)

A factor {Y = (Y, {\mathcal Y}, \nu, (S_g)_{g \in G})} of a {G}-system {X = (X,{\mathcal X},\mu, (T_g)_{g \in G})} is another {G}-system together with a factor map {\pi: X \rightarrow Y} which commutes with the {G}-action (thus {\pi \circ T_g = S_g \circ \pi} for all {g \in G}) and respects the measure in the sense that {\mu(\pi^{-1}(E)) = \nu(E)} for all {E \in {\mathcal Y}}. For instance, the {G}-invariant factor {Z^0_G(X) := (X, {\mathcal X}^G, \mu\downharpoonright_{{\mathcal X}^G}, (T_g)_{g \in G})}, formed by restricting {X} to the invariant {\sigma}-algebra {{\mathcal X}^G := \{ E \in {\mathcal X}: T_g E = E \hbox{ a.e. for all } g \in G \}}, is a factor of {X}. (This factor is the first factor in an important hierarchy, the next element of which is the Kronecker factor {Z^1_G(X)}, but we will not discuss higher elements of this hierarchy further here.) If {Y} is a factor of {X}, we refer to {X} as an extension of {Y}.
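For a concrete instance of these conventions, consider a toy example (not from the text above): the rotation {x \mapsto x+1} on {X = {\bf Z}/6{\bf Z}} and {y \mapsto y+1} on {Y = {\bf Z}/3{\bf Z}}, both with uniform measure, with the reduction map {\pi(x) := x \hbox{ mod } 3}. The sketch below checks the equivariance {\pi \circ T_g = S_g \circ \pi} (for {G = {\bf Z}} it suffices to check the generator). It also checks the measure condition {\mu(\pi^{-1}(E)) = \nu(E)} for every {E \subset Y}, using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations

n_X, n_Y = 6, 3


def T(x):
    """Generator of the G = Z action on X = Z/6."""
    return (x + 1) % n_X


def S(y):
    """Generator of the G = Z action on Y = Z/3."""
    return (y + 1) % n_Y


def pi(x):
    """Factor map X -> Y."""
    return x % n_Y


def mu(A):
    return Fraction(len(A), n_X)


def nu(E):
    return Fraction(len(E), n_Y)


# Equivariance: pi(T x) = S(pi x) for every x.
equivariant = all(pi(T(x)) == S(pi(x)) for x in range(n_X))

# Measure compatibility: mu(pi^{-1}(E)) = nu(E) for every subset E of Y.
subsets = [set(c) for r in range(n_Y + 1) for c in combinations(range(n_Y), r)]
measure_ok = all(mu({x for x in range(n_X) if pi(x) in E}) == nu(E) for E in subsets)
print(equivariant, measure_ok)
```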

From Corollary 2 we have

Corollary 3 (Relative independence) Let {X} be a {G}-system for a group {G}, and let {Y} be a factor of {X}. Then {Y} and {Z^0_G(X)} are relatively independent over their common factor {Z^0_G(Y)}, in the sense that the spaces {L^2(Y)} and {L^2(Z^0_G(X))} are relatively orthogonal over {L^2(Z^0_G(Y))} when all these spaces are embedded into {L^2(X)}.

This has a simple consequence regarding the product {X \times Y = (X \times Y, {\mathcal X} \times {\mathcal Y}, \mu \times \nu, (T_g \times S_g)_{g \in G})} of two {G}-systems {X = (X, {\mathcal X}, \mu, (T_g)_{g \in G})} and {Y = (Y, {\mathcal Y}, \nu, (S_g)_{g \in G})}, in the case when the {Y} action is trivial:

Lemma 4 If {X,Y} are two {G}-systems, with the action of {G} on {Y} trivial, then {Z^0_G(X \times Y)} is isomorphic to {Z^0_G(X) \times Y} in the obvious fashion.

This lemma is immediate for countable {G}, since for a {G}-invariant function {f}, one can ensure that {T_g f = f} holds simultaneously for all {g \in G} outside of a null set, but is a little trickier for uncountable {G}.

Proof: It is clear that {Z^0_G(X) \times Y} is a factor of {Z^0_G(X \times Y)}. To obtain the reverse inclusion, suppose that it fails, thus there is a non-zero {f \in L^2(Z^0_G(X \times Y))} which is orthogonal to {L^2(Z^0_G(X) \times Y)}. In particular, we have {fg} orthogonal to {L^2(Z^0_G(X))} for any {g \in L^\infty(Y)}. Since {G} acts trivially on {Y}, {fg} lies in {L^2(Z^0_G(X \times Y))}, so we conclude from Corollary 3 (viewing {X} as a factor of {X \times Y}) that {fg} is also orthogonal to {L^2(X)}. Since {g} is an arbitrary element of {L^\infty(Y)}, and finite linear combinations of functions of the form {(x,y) \mapsto h(x) g(y)} with {h \in L^\infty(X)} and {g \in L^\infty(Y)} are dense in {L^2(X \times Y)}, we conclude that {f} is orthogonal to {L^2(X \times Y)} and in particular is orthogonal to itself, a contradiction. (Thanks to Gergely Harcos for this argument.) \Box

Now we discuss the notion of a group extension.

Definition 5 (Group extension) Let {G} be an arbitrary group, let {Y = (Y, {\mathcal Y}, \nu, (S_g)_{g \in G})} be a {G}-system, and let {K} be a compact metrisable group. A {K}-extension of {Y} is an extension {X = (X, {\mathcal X}, \mu, (T_g)_{g \in G})} whose underlying space is {X = Y \times K} (with {{\mathcal X}} the product of {{\mathcal Y}} and the Borel {\sigma}-algebra on {K}), the factor map is {\pi: (y,k) \mapsto y}, and the shift maps {T_g} are given by

\displaystyle  T_g ( y, k ) = (S_g y, \rho_g(y) k )

where for each {g \in G}, {\rho_g: Y \rightarrow K} is a measurable map (known as the cocycle associated to the {K}-extension {X}).

An important special case of a {K}-extension arises when the measure {\mu} is the product of {\nu} with the Haar measure {dk} on {K}. In this case, {X} also has a {K}-action {k': (y,k) \mapsto (y,k(k')^{-1})} that commutes with the {G}-action, making {X} a {G \times K}-system. More generally, {\mu} could be the product of {\nu} with the Haar measure {dh} of some closed subgroup {H} of {K}, with {\rho_g} taking values in {H}; then {X} is a {G \times H}-system. In this latter case we will call {X} {H}-uniform.
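The formula {T_g(y,k) = (S_g y, \rho_g(y) k)} defines an action of {G} exactly when the maps {\rho_g} obey the cocycle identity {\rho_{gh}(y) = \rho_g(S_h y) \rho_h(y)}, which is implicit in Definition 5. The following sketch checks this on a hypothetical finite model. It also checks that the right {K}-action {(y,k) \mapsto (y, k(k')^{-1})} commutes with the {G}-action. The model takes {G = {\bf Z}} acting on {Y = {\bf Z}/5{\bf Z}} by {y \mapsto y+1}, with {K = S_3} and an arbitrary choice of {\rho_1}; the maps {\rho_m} for {m \geq 0} are then forced by the cocycle identity.

```python
from itertools import permutations

n = 5
K = list(permutations(range(3)))   # K = S_3, with (p q)(i) = p(q(i))
e = (0, 1, 2)


def mul(p, q):
    return tuple(p[q[i]] for i in range(3))


def inv(p):
    out = [0] * 3
    for i in range(3):
        out[p[i]] = i
    return tuple(out)


rho1 = {y: K[(2 * y + 1) % len(K)] for y in range(n)}   # arbitrary choice of rho_1


def rho(m, y):
    """rho_m(y) = rho_1(S^{m-1} y) ... rho_1(S y) rho_1(y), for m >= 0."""
    out = e
    for j in range(m):
        out = mul(rho1[(y + j) % n], out)
    return out


def T(m, point):
    """T_m(y, k) = (S^m y, rho_m(y) k), where S y = y + 1 mod n."""
    y, k = point
    return ((y + m) % n, mul(rho(m, y), k))


def R(kp, point):
    """Right action of kp in K (Haar case): (y, k) -> (y, k kp^{-1})."""
    y, k = point
    return (y, mul(k, inv(kp)))


points = [(y, k) for y in range(n) for k in K]
# T_m T_{m'} = T_{m+m'}, which is the cocycle identity for rho.
is_action = all(T(m, T(mp, p)) == T(m + mp, p) for m in range(4) for mp in range(4) for p in points)
# The right K-action commutes with the G-action.
commutes = all(T(1, R(kp, p)) == R(kp, T(1, p)) for kp in K for p in points)
print(is_action, commutes)
```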

If {X} is a {K}-extension of {Y} and {U: Y \rightarrow K} is a measurable map, we can define the gauge transform {X_U} of {X} to be the {K}-extension of {Y} whose measure {\mu_U} is the pushforward of {\mu} under the map {(y,k) \mapsto (y, U(y) k)}, and whose cocycles {\rho_{g,U}: Y \rightarrow K} are given by the formula

\displaystyle  \rho_{g,U}(y) := U(S_g y) \rho_g(y) U(y)^{-1}.

It is easy to see that {X_U} is a {K}-extension that is isomorphic to {X} as a {K}-extension of {Y}; we will refer to {X_U} and {X} as equivalent systems, and {\rho_{g,U}} as cohomologous to {\rho_g}. We then have the following fundamental result of Mackey and of Zimmer:
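To see why {X_U} is isomorphic to {X}, consider the map {\phi(y,k) := (y, U(y) k)}. It satisfies {\phi(S_g y, \rho_g(y) k) = (S_g y, \rho_{g,U}(y) U(y) k)}, so it intertwines the two shifts, and by definition it is a bijection pushing {\mu} to {\mu_U}. The sketch below checks the intertwining identity for the generator of {G = {\bf Z}} on a hypothetical finite model: {Y = {\bf Z}/5{\bf Z}} rotated by one step, {K = S_3}, and arbitrary {\rho} and {U}.

```python
from itertools import permutations

n = 5
K = list(permutations(range(3)))   # K = S_3, with (p q)(i) = p(q(i))


def mul(p, q):
    return tuple(p[q[i]] for i in range(3))


def inv(p):
    out = [0] * 3
    for i in range(3):
        out[p[i]] = i
    return tuple(out)


def S(y):
    """Generator of the G = Z action on Y = Z/5."""
    return (y + 1) % n


rho = {y: K[(2 * y + 1) % len(K)] for y in range(n)}   # arbitrary cocycle value rho_1(y)
U = {y: K[(y * y) % len(K)] for y in range(n)}          # arbitrary gauge function Y -> K

# Gauge-transformed cocycle rho_U(y) = U(S y) rho(y) U(y)^{-1}.
rho_U = {y: mul(mul(U[S(y)], rho[y]), inv(U[y])) for y in range(n)}


def T(point):
    """Shift of X: (y, k) -> (S y, rho(y) k)."""
    y, k = point
    return (S(y), mul(rho[y], k))


def T_U(point):
    """Shift of the gauge transform X_U: (y, k) -> (S y, rho_U(y) k)."""
    y, k = point
    return (S(y), mul(rho_U[y], k))


def phi(point):
    """The isomorphism (y, k) -> (y, U(y) k)."""
    y, k = point
    return (y, mul(U[y], k))


points = [(y, k) for y in range(n) for k in K]
intertwines = all(phi(T(p)) == T_U(phi(p)) for p in points)
bijective = len({phi(p) for p in points}) == len(points)
print(intertwines, bijective)
```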

Theorem 6 (Mackey-Zimmer theorem) Let {G} be an arbitrary group, let {Y} be an ergodic {G}-system, and let {K} be a compact metrisable group. Then every ergodic {K}-extension {X} of {Y} is equivalent to an {H}-uniform extension of {Y} for some closed subgroup {H} of {K}.

This theorem is usually stated for amenable groups {G}, but by using Theorem 1 (or more precisely, Corollary 3) the result is in fact also valid for arbitrary groups; we give the proof below the fold. (In the usual formulations of the theorem, {X} and {Y} are also required to be Lebesgue spaces, or at least standard Borel, but again with our abstract approach here, such hypotheses will be unnecessary.) Among other things, this theorem plays an important role in the Furstenberg-Zimmer structural theory of measure-preserving systems (as well as subsequent refinements of this theory by Host and Kra); see this previous blog post for some relevant discussion. One can obtain similar descriptions of non-ergodic extensions via the ergodic decomposition, but the result becomes more complicated to state, and we will not do so here.

— 1. Proof of theorem —

Let {X, Y} be the {G}-systems in Theorem 6. We can then form the product {G}-system {X \times K} as before (endowing {K} with the Haar measure {dk} and the trivial {G}-action), and also define the skew product {Y \times_\rho K}, which is the same probability space as {Y \times K} (with the product measure {\nu \times dk}) but with the shift

\displaystyle  (y,k) \mapsto (S_g y, \rho_g(y) k).

Our argument will hinge on the study of the factor map

\displaystyle  \pi: X \times K \rightarrow Y \times_\rho K

defined by

\displaystyle  \pi( (y,k), k' ) := (y, kk' ).

An application of Fubini’s theorem shows that this is indeed a factor map: the compatibility with the {G}-action follows from the associativity of multiplication in {K}, while the preservation of measure follows from the fact that the projection from {X} to {Y} was already a factor map, together with the left-invariance of the Haar measure {dk}. In fact this is a factor map of {G \times K}-systems, not just {G}-systems, where {K} acts on the right factors of {X \times K} and {Y \times_\rho K} by {k': (x,k) \mapsto (x,k (k')^{-1})} and {k': (y,k) \mapsto (y,k(k')^{-1})}.
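The measure-preservation part of this claim holds for any measure {\mu} on {X = Y \times K} with {Y}-marginal {\nu}: for fixed {(y,k)}, the product {kk'} is Haar-distributed when {k'} is. The following sketch checks this on a hypothetical finite model, together with the {G}- and {K}-equivariance of {\pi}. The model uses {Y = {\bf Z}/4{\bf Z}} rotated by one step with uniform {\nu}, {K = S_3} with normalised counting measure as Haar measure, an arbitrary cocycle, and an arbitrary non-product {\mu}; all computations use exact rational arithmetic.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations

n = 4
K = list(permutations(range(3)))   # K = S_3, with (p q)(i) = p(q(i))


def mul(p, q):
    return tuple(p[q[i]] for i in range(3))


def inv(p):
    out = [0] * 3
    for i in range(3):
        out[p[i]] = i
    return tuple(out)


rho = {y: K[(y + 2) % len(K)] for y in range(n)}   # arbitrary cocycle for the generator


def shift(point):
    """(y, k) -> (y + 1, rho(y) k): the shift on X and also on Y x_rho K."""
    y, k = point
    return ((y + 1) % n, mul(rho[y], k))


def pi(x, kp):
    """The factor map ((y, k), k') -> (y, k k')."""
    y, k = x
    return (y, mul(k, kp))


# An arbitrary non-product measure mu on X = Y x K whose Y-marginal is uniform.
w = {(y, k): 1 + (3 * y + K.index(k)) % 4 for y in range(n) for k in K}
mu = {(y, k): Fraction(w[y, k], n * sum(w[y, j] for j in K)) for (y, k) in w}

# Push mu x (normalised counting measure on K) forward under pi.
push = defaultdict(Fraction)
for x, mass in mu.items():
    for kp in K:
        push[pi(x, kp)] += mass / len(K)

X_points = list(mu)
g_equivariant = all(pi(shift(x), kp) == shift(pi(x, kp)) for x in X_points for kp in K)
k_equivariant = all(
    pi(x, mul(kp, inv(k2))) == (pi(x, kp)[0], mul(pi(x, kp)[1], inv(k2)))
    for x in X_points for kp in K for k2 in K
)
is_product = all(push[(y, k)] == Fraction(1, n * len(K)) for y in range(n) for k in K)
print(g_equivariant, k_equivariant, is_product)
```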

Since {Y \times_\rho K} is a factor of {X \times K} as a {G \times K}-system, {Z^0_G(Y \times_\rho K)} is a factor of {Z^0_G(X \times K)} as a {K}-system. But as {X} is ergodic, {Z^0_G(X)} is a point, and so by Lemma 4, {Z^0_G(X \times K)} is isomorphic to {K}. Thus {Z^0_G(Y \times_\rho K)} is a factor of {K} as a {K}-system. We now need a baby version of Theorem 6:

Lemma 7 Let {K} be a compact metrisable group. Then every factor of {K} (as a {K}-system acting on the right) is equivalent to {H\backslash K} for some closed subgroup {H} of {K} (endowing {H \backslash K} with the quotiented Haar measure, of course).

Proof: If {W} is a factor of {K}, then {L^2(W)} can be identified with a subspace of {L^2(K)} that is invariant with respect to the right {K}-action. By using the {K}-action to convolve with continuous approximations to the identity, we see that {A := C(K) \cap L^2(W)} is dense in {L^2(W)}, where {C(K)} is the space of continuous functions on {K}; note that {A} is a right-invariant algebra that contains the constants and is closed under complex conjugation. Let {H} be the symmetry group of {A}, that is to say the set of all elements {h \in K} such that {hf = f} for all {f \in A} (where {hf(k) := f(h^{-1} k)} denotes left translation). Then {H} is a closed subgroup of {K}, and {A} may be identified with a subalgebra of {C(H \backslash K)}. This subalgebra separates points in {H \backslash K}: if {f(k_1) = f(k_2)} for all {f \in A}, then by right-invariance {f(k_1 k) = f(k_2 k)} for all {k \in K} and {f \in A}, so {k_1 k_2^{-1} \in H} and hence {Hk_1 = Hk_2}. By the Stone-Weierstrass theorem, {A} is thus dense in {C(H \backslash K)} in the uniform topology, and hence in {L^2(H \backslash K)} in the {L^2} topology. From this it is not difficult to show that {W} is equivalent to {H \backslash K} as claimed. \Box
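For finite {K} the analytic steps in this proof are vacuous, but the combinatorial core can still be seen in a small example: computing the symmetry group {H} and checking that {A} separates the cosets {Hk}. In the following sketch, which is only an illustration, {K = S_3} and {A} is taken to be the algebra of all functions of {f(k) := k^{-1}(2)}. This {A} is right-invariant because {f(kr) = r^{-1}(f(k))}, and the brute-force computation recovers {H} as the stabiliser of {2}.

```python
from itertools import permutations

K = list(permutations(range(3)))   # K = S_3, with (p q)(i) = p(q(i))


def mul(p, q):
    return tuple(p[q[i]] for i in range(3))


def inv(p):
    out = [0] * 3
    for i in range(3):
        out[p[i]] = i
    return tuple(out)


def f(k):
    """Generator of the algebra A: f(k) = k^{-1}(2)."""
    return inv(k)[2]


# Right-invariance of A: f(k r) depends only on f(k), for every r in K.
right_invariant = all(
    len({f(mul(k, r)) for k in K if f(k) == c}) == 1 for r in K for c in range(3)
)

# Symmetry group of A: the h in K with f(h k) = f(k) for all k.
H = [h for h in K if all(f(mul(h, k)) == f(k) for k in K)]

# The right cosets H k; f is constant on each coset and separates distinct cosets.
cosets = {frozenset(mul(h, k) for h in H) for k in K}
constant_on_cosets = all(len({f(k) for k in c}) == 1 for c in cosets)
separates = len({f(min(c)) for c in cosets}) == len(cosets)
print(H, len(cosets))   # prints [(0, 1, 2), (1, 0, 2)] 3
```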

We conclude that {Z^0_G(Y \times_\rho K)} is isomorphic to {H \backslash K} as a {K}-system for some closed subgroup {H} of {K}. (This space is known as the Mackey range of the cocycles {\rho_g}.)

Now we need to build the gauge function {U} to conjugate the cocycle {\rho} to lie in {H}. In the usual treatments of Theorem 6, this is achieved by the descriptive set theory device of Borel sections, but in keeping with our “pointless” approach in this post, we will avoid exploiting the point set structure of {X} or {Y} (although we will rely very much on the point set structure of {K} and {H}). We begin with an approximate result:

Proposition 8 Let {V} be a symmetric neighbourhood of the identity in {K}. Then there exists a measurable function {U: Y \rightarrow K} such that for each {g \in G}, {\rho_{g,U}} takes values in {V^2 HV^2} (modulo null sets, of course).

Proof: One can view {HV} as a positive measure subset of {H \backslash K}, which can then be identified with a positive measure {G}-invariant subset {\Omega} of {Y \times_\rho K}, since {Z^0_G(Y \times_\rho K)} is isomorphic to {H\backslash K}. Note that for any {k \in K} outside of {VHV}, {HV} and {HVk^{-1}} are disjoint, and so {\Omega} and {k \Omega} are also disjoint (recall that {k} acts on the right on {Y \times_\rho K}).

The conditional expectation {{\bf E}(1_\Omega|Y)} (that is, the orthogonal projection of {1_\Omega} to {L^2(Y)}) has positive mean and is {G}-invariant, and is hence equal to a positive constant by the ergodicity of {Y}.

As {K} is compact, we can cover {K} by a finite number {k_1 V,\dots,k_n V} of left-translates of {V}. In {Y \times_\rho K}, we thus have the inequality

\displaystyle  1_\Omega \leq \sum_{i=1}^n 1_\Omega 1_{Y \times k_i V}.

We thus have the pointwise lower bound

\displaystyle  \sum_{i=1}^n {\bf E}( 1_\Omega 1_{Y \times k_i V} | Y ) > 0.

Thus, if for each {y \in Y} we let {i(y)} be the first {i=1,\dots,n} for which {{\bf E}( 1_\Omega 1_{Y \times k_i V} | Y ) > 0}, and let {U(y) := k_{i(y)}^{-1}}, then {U: Y \rightarrow K} is measurable and we have the pointwise lower bound

\displaystyle  {\bf E}( 1_\Omega 1_{Y \times U^{-1} V} | Y ) > 0 \ \ \ \ \ (2)

(indeed, from the pigeonhole principle we could assume a uniform lower bound away from zero if desired). We claim that

\displaystyle  {\bf E}( 1_\Omega (1-1_{Y \times U^{-1} V^2 HV}) | Y ) = 0. \ \ \ \ \ (3)

Indeed, since {\Omega} and {k \Omega} are disjoint for {k \not \in VHV}, we have

\displaystyle  [1_\Omega 1_{Y \times U^{-1} V}] \times k [1_\Omega (1-1_{Y \times U^{-1} V^2 HV})] = 0

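for all {k \in K}; integrating this in {k} with respect to Haar measure and taking conditional expectations in {Y}, we obtain {{\bf E}( 1_\Omega 1_{Y \times U^{-1} V} | Y ) \cdot {\bf E}( 1_\Omega (1-1_{Y \times U^{-1} V^2 HV}) | Y ) = 0}, and the claim (3) follows thanks to (2).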

Meanwhile, applying the {G} action to (2) and using the {G}-invariance of {\Omega}, we have

\displaystyle  {\bf E}( 1_\Omega 1_{Y \times (S_g \rho_g) (S_g U^{-1}) V} | Y ) > 0

pointwise for any {g \in G}. Comparing this with (3), we conclude that

\displaystyle  {\bf E}( 1_{Y \times U^{-1} V^2 HV} 1_{Y \times (S_g \rho_g) (S_g U^{-1}) V} | Y ) > 0

and in particular that

\displaystyle  U^{-1} V^2 H V \cap (S_g \rho_g) (S_g U^{-1}) V \neq \emptyset

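almost everywhere. Writing a common element of these two sets as {U^{-1} w = (S_g \rho_g) (S_g U^{-1}) v} with {w \in V^2 H V} and {v \in V}, we obtain {U (S_g \rho_g) (S_g U^{-1}) = w v^{-1}}, which lies in {V^2 H V^2} since {V} is symmetric; as {U (S_g \rho_g) (S_g U^{-1}) = S_g \rho_{g,U}}, this may be rearranged as

\displaystyle  S_g \rho_{g,U} \in V^2 H V^2,

and since {S_g} preserves {\nu}, the claim follows. \Box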
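The group-theoretic fact used at the start of this proof is in fact an equivalence. For a symmetric {V}, the sets {HV} and {HVk^{-1}} meet if and only if {k \in V^{-1} H V = VHV}, since {h_1 v_1 = h_2 v_2 k^{-1}} rearranges to {k = v_1^{-1} h_1^{-1} h_2 v_2}. The following brute-force sketch confirms this in a finite stand-in for {K}: the group {K = S_4}, with a two-element subgroup {H} and a two-element symmetric set {V} containing the identity. In a genuine compact group {V} would be an open neighbourhood, so this only illustrates the algebra.

```python
from itertools import permutations

K = list(permutations(range(4)))   # finite stand-in for K: the group S_4


def mul(p, q):
    return tuple(p[q[i]] for i in range(len(q)))


def inv(p):
    out = [0] * len(p)
    for i, val in enumerate(p):
        out[val] = i
    return tuple(out)


def prod(A, B):
    """Product set AB = {ab : a in A, b in B}."""
    return {mul(a, b) for a in A for b in B}


e = (0, 1, 2, 3)
H = {e, (1, 0, 2, 3)}   # a subgroup of order 2
V = {e, (0, 2, 1, 3)}   # symmetric (V = V^{-1}) and containing the identity
assert V == {inv(v) for v in V}

HV = prod(H, V)
VHV = prod(V, HV)
# HV and HV k^{-1} are disjoint exactly when k lies outside VHV.
equivalence = all((not (HV & prod(HV, {inv(k)}))) == (k not in VHV) for k in K)
print(len(VHV), equivalence)
```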

Now we use compactness to eliminate the neighbourhood {V}:

Proposition 9 There exists a measurable function {U: Y \rightarrow K} such that, for each {g \in G}, {\rho_{g,U}} takes values in {H}.

Proof: We place a metric on {K}. By the previous proposition (applied with {V} a sufficiently small neighbourhood of the identity, which is possible because {H} is compact and so {V^2 H V^2} lies in any given neighbourhood of {H} once {V} is small enough), for each natural number {n} we may find a measurable {U_n: Y \rightarrow K} such that, for each {g \in G}, {\rho_{g,U_n}} lies within {1/n} of {H} pointwise. It thus suffices to find a measurable {U: Y \rightarrow K} such that {U(y)} is a limit point of the {U_n(y)} for every {y}, so that {\rho_{g,U}} is always a limit point of the {\rho_{g,U_n}}. If it were not for the measurability requirement, this would be immediate from the Heine-Borel theorem; so the only issue is to keep {U} measurable. However, this can be done by an inspection of the proof of the Heine-Borel theorem. Namely, for each natural number {m}, we cover {K} by a finite number of balls {B_{1,m},\dots,B_{j_m,m}} of radius {2^{-m}}. For each {y}, one of the {B_{i,m}} must contain an infinite number of the {U_n(y)}; if we recursively select {i_m(y)} to be the first such {i} for which the ball {B_{i,m}} intersects the previous ball {B_{i_{m-1}(y),m-1}} (with this latter condition being ignored for {m=1}), we see that each {i_m(y)} is measurable in {y}. The centres of the balls {B_{i_m(y),m}} converge to a limit {U(y)} which is then measurable and is a limit point of the {U_n(y)} as required. \Box

In view of this proposition, we may assume without loss of generality that the {\rho_g} take values in {H} (replacing {X \times K} and {Y \times_\rho K} with equivalent systems as necessary). The {G}-systems {X \times K} and {Y \times_\rho K} now split into copies of the {G}-systems {X \times H} and {Y \times_\rho H}, with the copies indexed by {H \backslash K}. As {Z^0_G(Y \times_\rho K)} was isomorphic to {H \backslash K} as a factor of {Z^0_G(X \times K) = K}, it is then easy to see that {Z^0_G(Y \times_\rho H)} is trivial. By Corollary 3, this implies that {Y \times_\rho H} and {Z^0_G(X \times H)} are independent factors of {X \times H}. In particular, let {f \in L^\infty(Y)} and {g \in C(H)}, so that the function {F: (y,h) \mapsto f(y) g(h)} lies in {L^\infty(Y \times_\rho H)} and pulls back to the function {\tilde F: ((y,h),h') \mapsto f(y) g(hh')} in {L^\infty(X \times H)}; then the orthogonal projection of {\tilde F} to {L^2(Z^0_G(X \times H))} is equal to the orthogonal projection of {F} onto the trivial factor, that is to say the constant {\int_{Y \times H} F}. Since {X} is ergodic, we see from Lemma 4 that {Z^0_G(X \times H) = H}, and so we arrive at the identity

\displaystyle  \int_X f(y) g(hh')\ d\mu(y,h) = \int_{Y \times H} f(y) g(h)\ d\nu(y) dh

for almost all {h' \in H}; as {g} is continuous, this identity in fact holds for all {h' \in H}, and in particular when {h'} is the identity:

\displaystyle  \int_X f(y) g(h)\ d\mu(y,h) = \int_{Y \times H} f(y) g(h)\ d\nu(y) dh.

From the monotone convergence theorem, the same claim is then true if {g} is bounded and semicontinuous (in particular, if {g} is the indicator of an open or closed subset of {H}), and then (as finite Borel measures on the compact metrisable group {H}, including Haar measure, are inner and outer regular) the same claim is also true if {g} is bounded Borel measurable. From this we conclude that {\mu} is the product of {\nu} with Haar measure on {H}, and the claim follows.