The von Neumann ergodic theorem (the Hilbert space version of the mean ergodic theorem) asserts that if {U: H \rightarrow H} is a unitary operator on a Hilbert space {H}, and {v \in H} is a vector in that Hilbert space, then one has

\displaystyle \lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N U^n v = \pi_{H^U} v

in the strong topology, where {H^U := \{ w \in H: Uw = w \}} is the {U}-invariant subspace of {H}, and {\pi_{H^U}} is the orthogonal projection to {H^U}. (See e.g. these previous lecture notes for a proof.) The same proof extends to more general amenable groups: if {G} is a countable amenable group acting on a Hilbert space {H} by unitary transformations {T^g: H \rightarrow H} for {g \in G}, and {v \in H} is a vector in that Hilbert space, then one has

\displaystyle \lim_{N \rightarrow \infty} \mathop{\bf E}_{g \in \Phi_N} T^g v = \pi_{H^G} v \ \ \ \ \ (1)

for any F{\o}lner sequence {\Phi_N} of {G}, where {H^G := \{ w \in H: T^g w = w \hbox{ for all }g \in G \}} is the {G}-invariant subspace, and {\mathop{\bf E}_{a \in A} f(a) := \frac{1}{|A|} \sum_{a \in A} f(a)} is the average of {f} on {A}. Thus one can interpret {\pi_{H^G} v} as a certain average of elements of the orbit {Gv := \{ T^g v: g \in G \}} of {v}.

In a previous blog post, I noted a variant of this ergodic theorem (due to Alaoglu and Birkhoff) that holds even when the group {G} is not amenable (or not discrete), using a more abstract notion of averaging:

Theorem 1 (Abstract ergodic theorem) Let {G} be an arbitrary group acting unitarily on a Hilbert space {H}, and let {v} be a vector in {H}. Then {\pi_{H^G} v} is the element in the closed convex hull of {Gv := \{ T^g v: g \in G \}} of minimal norm, and is also the unique element of {H^G} in this closed convex hull.

I recently stumbled upon a different way to think about this theorem, in the additive case {G = (G,+)} when {G} is abelian, which has a closer resemblance to the classical mean ergodic theorem. Given an arbitrary additive group {G = (G,+)} (not necessarily discrete, or countable), let {{\mathcal F}} denote the collection of finite non-empty multisets in {G} – that is to say, unordered collections {\{a_1,\dots,a_n\}} of elements {a_1,\dots,a_n} of {G}, not necessarily distinct, for some positive integer {n}. Given two multisets {A = \{a_1,\dots,a_n\}}, {B = \{b_1,\dots,b_m\}} in {{\mathcal F}}, we can form the sum set {A + B := \{ a_i + b_j: 1 \leq i \leq n, 1 \leq j \leq m \}}. Note that the sum set {A+B} can contain multiplicity even when {A, B} do not; for instance, {\{ 1,2\} + \{1,2\} = \{2,3,3,4\}}. Given a multiset {A = \{a_1,\dots,a_n\}} in {{\mathcal F}}, and a function {f: G \rightarrow H} from {G} to a vector space {H}, we define the average {\mathop{\bf E}_{a \in A} f(a)} as

\displaystyle \mathop{\bf E}_{a \in A} f(a) = \frac{1}{n} \sum_{j=1}^n f(a_j).

Note that the multiplicities of the elements of the multiset {A} affect the average; for instance, we have {\mathop{\bf E}_{a \in \{1,2\}} a = \frac{3}{2}}, but {\mathop{\bf E}_{a \in \{1,2,2\}} a = \frac{5}{3}}.
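
As a quick sanity check of these conventions, here is a minimal Python sketch for the case {G = {\bf Z}}. The helper names `sumset` and `average` are purely illustrative, and multisets are modelled as lists so that repeated elements are retained.

```python
from fractions import Fraction
from itertools import product


def sumset(A, B):
    """Multiset sum A + B: one element a + b for every pair (a, b), multiplicity kept."""
    return sorted(a + b for a, b in product(A, B))


def average(A, f=lambda a: a):
    """Average of f over the multiset A, counting each element with its multiplicity."""
    return sum(Fraction(f(a)) for a in A) / len(A)


print(sumset([1, 2], [1, 2]))   # [2, 3, 3, 4]
print(average([1, 2]))          # 3/2
print(average([1, 2, 2]))       # 5/3
```

Exact rational arithmetic is used so that the two averages above are reproduced without rounding.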

We can make {{\mathcal F}} into a directed set as follows: given two multisets {A,B \in {\mathcal F}}, we write {A \geq B} if we have {A = B+C} for some {C \in {\mathcal F}}. Thus for instance we have {\{ 1, 2, 2, 3\} \geq \{1,2\}}, since {\{1,2,2,3\} = \{1,2\} + \{0,1\}}. It is easy to verify that this relation is transitive and reflexive, and is directed because any two elements {A,B} of {{\mathcal F}} have a common upper bound, namely {A+B}. (This is where we need {G} to be abelian.) The notion of convergence along a net now allows us to define the notion of convergence along {{\mathcal F}}: given a family {x_A} of points in a topological space {X} indexed by elements {A} of {{\mathcal F}}, and a point {x} in {X}, we say that {x_A} converges to {x} along {{\mathcal F}} if, for every open neighbourhood {U} of {x} in {X}, one has {x_A \in U} for sufficiently large {A}, that is to say there exists {B \in {\mathcal F}} such that {x_A \in U} for all {A \geq B}. If the topological space {X} is Hausdorff, then the limit {x} is unique (if it exists), and we then write

\displaystyle x = \lim_{A \rightarrow G} x_A.

When {x_A} takes values in the reals, one can also define the limit superior or limit inferior along such nets in the obvious fashion.

We can then give an alternate formulation of the abstract ergodic theorem in the abelian case:

Theorem 2 (Abelian abstract ergodic theorem) Let {G = (G,+)} be an arbitrary additive group acting unitarily on a Hilbert space {H}, and let {v} be a vector in {H}. Then we have

\displaystyle \pi_{H^G} v = \lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} T^a v

in the strong topology of {H}.

Proof: Suppose that {A \geq B}, so that {A=B+C} for some {C \in {\mathcal F}}. Then

\displaystyle \mathop{\bf E}_{a \in A} T^a v = \mathop{\bf E}_{c \in C} T^c ( \mathop{\bf E}_{b \in B} T^b v )

so by unitarity and the triangle inequality we have

\displaystyle \| \mathop{\bf E}_{a \in A} T^a v \|_H \leq \| \mathop{\bf E}_{b \in B} T^b v \|_H,

thus {\| \mathop{\bf E}_{a \in A} T^a v \|_H^2} is monotone non-increasing in {A}. Since this quantity is bounded between {0} and {\|v\|_H^2}, we conclude that the limit {\lim_{A \rightarrow G} \| \mathop{\bf E}_{a \in A} T^a v \|_H^2} exists. Thus, for any {\varepsilon > 0}, we have for sufficiently large {A} that

\displaystyle \| \mathop{\bf E}_{b \in B} T^b v \|_H^2 \geq \| \mathop{\bf E}_{a \in A} T^a v \|_H^2 - \varepsilon

for all {B \geq A}. In particular, for any {g \in G}, we have

\displaystyle \| \mathop{\bf E}_{b \in A + \{0,g\}} T^b v \|_H^2 \geq \| \mathop{\bf E}_{a \in A} T^a v \|_H^2 - \varepsilon.

We can write

\displaystyle \mathop{\bf E}_{b \in A + \{0,g\}} T^b v = \frac{1}{2} \mathop{\bf E}_{a \in A} T^a v + \frac{1}{2} T^g \mathop{\bf E}_{a \in A} T^a v

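and so from the parallelogram law {\|x-y\|_H^2 = 2\|x\|_H^2 + 2\|y\|_H^2 - 4\|\frac{1}{2}(x+y)\|_H^2}, applied with {x := \mathop{\bf E}_{a \in A} T^a v} and {y := T^g x} (which has the same norm as {x} by unitarity), we have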

\displaystyle \| \mathop{\bf E}_{a \in A} T^a v - T^g \mathop{\bf E}_{a \in A} T^a v \|_H^2 \leq 4 \varepsilon

for all {g \in G}, and hence by the triangle inequality (averaging {g} over a finite multiset {C})

\displaystyle \| \mathop{\bf E}_{a \in A} T^a v - \mathop{\bf E}_{b \in A+C} T^b v \|_H^2 \leq 4 \varepsilon

for any {C \in {\mathcal F}}. This shows that {\mathop{\bf E}_{a \in A} T^a v} is a Cauchy net in {H} (in the strong topology), and hence (by the completeness of {H}) tends to a limit. Shifting {A} by a group element {g}, we have

\displaystyle \lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} T^a v = \lim_{A \rightarrow G} \mathop{\bf E}_{a \in A + \{g\}} T^a v = T^g \lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} T^a v

and hence {\lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} T^a v} is invariant under shifts, and thus lies in {H^G}. On the other hand, for any {w \in H^G} and {A \in {\mathcal F}}, we have

\displaystyle \langle \mathop{\bf E}_{a \in A} T^a v, w \rangle_H = \mathop{\bf E}_{a \in A} \langle v, T^{-a} w \rangle_H = \langle v, w \rangle_H

and thus on taking strong limits

\displaystyle \langle \lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} T^a v, w \rangle_H = \langle v, w \rangle_H

and so {v - \lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} T^a v} is orthogonal to {H^G}. Combining these two facts we see that {\lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} T^a v} is equal to {\pi_{H^G} v} as claimed. \Box
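
The monotonicity and convergence in this proof can be observed numerically in a toy case. The sketch below makes several assumptions that are not in the text: {G = {\bf Z}} acts on {H = {\bf C}^2} through the unitary {U = \hbox{diag}(1, e^{i\theta})} with the arbitrary choice {\theta = 2}, so that {H^G} is spanned by {(1,0)}, and multisets are stored as dictionaries of multiplicities. It only samples one increasing chain {\{0\} \leq \{0,1\} \leq \{0,1\}+\{0,1\} \leq \dots} in {{\mathcal F}}, so it illustrates the theorem rather than verifying it.

```python
import cmath
import math
from collections import Counter

THETA = 2.0  # hypothetical rotation angle, chosen so that e^{i*THETA} != 1


def sumset(A, B):
    """Multiset sum A + B, with multisets stored as {element: multiplicity}."""
    out = Counter()
    for a, ca in A.items():
        for b, cb in B.items():
            out[a + b] += ca * cb
    return out


def orbit_average(A, v):
    """E_{a in A} U^a v, where U = diag(1, e^{i*THETA}) acts unitarily on C^2."""
    total = sum(A.values())
    phase = sum(c * cmath.exp(1j * THETA * a) for a, c in A.items()) / total
    return (v[0], phase * v[1])


def norm(w):
    return math.sqrt(sum(abs(z) ** 2 for z in w))


v = (1.0, 1.0)            # the invariant subspace is spanned by (1, 0), so pi v = (1, 0)
A = Counter({0: 1})
norms = []
for _ in range(40):       # A increases along the chain {0} <= {0,1} <= {0,1}+{0,1} <= ...
    norms.append(norm(orbit_average(A, v)))
    A = sumset(A, Counter({0: 1, 1: 1}))

limit = orbit_average(A, v)
print(norms[0], norms[-1])   # starts at sqrt(2) and decreases towards 1
print(limit)                 # close to (1, 0)
```

In this example the second coordinate of the average over the {k}-fold sum of {\{0,1\}} is {((1+e^{i\theta})/2)^k}, which has magnitude {|\cos(\theta/2)|^k}; this is why the norms decrease geometrically towards {\|\pi_{H^G} v\|_H = 1}.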

To relate this result to the classical ergodic theorem, we observe

Lemma 3 Let {G} be a countable additive group, with a F{\o}lner sequence {\Phi_n}, and let {f_g} be a bounded sequence in a normed vector space indexed by {G}. If {\lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} f_a} exists, then {\lim_{n \rightarrow \infty} \mathop{\bf E}_{a \in \Phi_n} f_a} exists, and the two limits are equal.

Proof: From the F{\o}lner property, we see that for any {A} and any {\varepsilon>0}, the averages {\mathop{\bf E}_{a \in \Phi_n} f_a} and {\mathop{\bf E}_{a \in A+\Phi_n} f_a} differ by at most {\varepsilon} in norm if {n} is sufficiently large depending on {A}, {\varepsilon} (and the {f_a}). On the other hand, by the existence of the limit {\lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} f_a}, the averages {\mathop{\bf E}_{a \in A} f_a} and {\mathop{\bf E}_{a \in A + \Phi_n} f_a} differ by at most {\varepsilon} in norm if {A} is sufficiently large depending on {\varepsilon} (regardless of how large {n} is). The claim follows. \Box
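
The first step of this proof can be illustrated numerically. The following sketch assumes {G = {\bf Z}}, the F{\o}lner sequence {\Phi_n = \{0,\dots,n-1\}}, and an arbitrary bounded sequence {f_a = e^{0.7 i a}}; all of these are choices made for the example only, as is the helper name `gap_and_bound`. It compares the averages over {\Phi_n} and over {A + \Phi_n} for a fixed {A}.

```python
import cmath


def average(A, f):
    """Average of f over the finite multiset A (given as a list)."""
    return sum(f(a) for a in A) / len(A)


def f(a):
    # Hypothetical bounded sequence on G = Z, with |f(a)| = 1 for all a.
    return cmath.exp(0.7j * a)


def gap_and_bound(A, n):
    """Compare averages over Phi_n = {0, ..., n-1} and over the multiset A + Phi_n."""
    phi_n = list(range(n))
    shifted = [a + p for a in A for p in phi_n]
    gap = abs(average(phi_n, f) - average(shifted, f))
    # Phi_n and a + Phi_n differ in at most 2|a| points, each contributing at most 1/n.
    bound = 2 * max(abs(a) for a in A) / n
    return gap, bound


A = [0, 3, 10]  # a fixed finite multiset in Z
for n in (10, 100, 1000, 10000):
    print(n, *gap_and_bound(A, n))
```

The printed gap should stay below the crude bound {2 \max_{a \in A} |a| / n}, which tends to zero as {n \rightarrow \infty} with {A} held fixed.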

It turns out that this approach can also be used as an alternate way to construct the Gowers-Host-Kra seminorms in ergodic theory. It has the feature that it does not explicitly require any amenability on the group {G} (or separability on the underlying measure space), though, as pointed out to me in comments, even uncountable abelian groups are amenable in the sense of possessing an invariant mean, even if they do not have a F{\o}lner sequence.

Given an arbitrary additive group {G}, define a {G}-system {({\mathrm X}, T)} to be a probability space {{\mathrm X} = (X, {\mathcal X}, \mu)} (not necessarily separable or standard Borel), together with a collection {T^g: X \rightarrow X} of invertible, measure-preserving maps, such that {T^0} is the identity and {T^g T^h = T^{g+h}} (modulo null sets) for all {g,h \in G}. This then gives isomorphisms {T^g: L^p({\mathrm X}) \rightarrow L^p({\mathrm X})} for {1 \leq p \leq \infty} by setting {T^g f(x) := f(T^{-g} x)}. From the above abstract ergodic theorem, we see that

\displaystyle {\mathbf E}( f | {\mathcal X}^G ) = \lim_{A \rightarrow G} \mathop{\bf E}_{a \in A} T^a f

in the strong topology of {L^2({\mathrm X})} for any {f \in L^2({\mathrm X})}, where {{\mathcal X}^G} is the collection of measurable sets {E} that are essentially {G}-invariant in the sense that {T^g E = E} modulo null sets for all {g \in G}, and {{\mathbf E}(f|{\mathcal X}^G)} is the conditional expectation of {f} with respect to {{\mathcal X}^G}.

In a similar spirit, we have

Theorem 4 (Convergence of Gowers-Host-Kra seminorms) Let {({\mathrm X},T)} be a {G}-system for some additive group {G}. Let {d} be a natural number, and for every {\omega \in\{0,1\}^d}, let {f_\omega \in L^{2^d}({\mathrm X})}, which for simplicity we take to be real-valued. Then the expression

\displaystyle \langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d({\mathrm X})} := \lim_{A_1,\dots,A_d \rightarrow G}

\displaystyle \mathop{\bf E}_{h_1 \in A_1-A_1,\dots,h_d \in A_d-A_d} \int_X \prod_{\omega \in \{0,1\}^d} T^{\omega_1 h_1 + \dots + \omega_d h_d} f_\omega\ d\mu

converges, where we write {\omega = (\omega_1,\dots,\omega_d)}, and we are using the product directed set on {{\mathcal F}^d} to define the convergence {A_1,\dots,A_d \rightarrow G}. In particular, for {f \in L^{2^d}({\mathrm X})}, the limit

\displaystyle \| f \|_{U^d({\mathrm X})}^{2^d} = \lim_{A_1,\dots,A_d \rightarrow G}

\displaystyle \mathop{\bf E}_{h_1 \in A_1-A_1,\dots,h_d \in A_d-A_d} \int_X \prod_{\omega \in \{0,1\}^d} T^{\omega_1 h_1 + \dots + \omega_d h_d} f\ d\mu

converges.

We prove this theorem below the fold. It implies a number of other known descriptions of the Gowers-Host-Kra seminorms {\|f\|_{U^d({\mathrm X})}}, for instance that

\displaystyle \| f \|_{U^d({\mathrm X})}^{2^d} = \lim_{A \rightarrow G} \mathop{\bf E}_{h \in A-A} \| f T^h f \|_{U^{d-1}({\mathrm X})}^{2^{d-1}}

for {d > 1}, while from the ergodic theorem we have

\displaystyle \| f \|_{U^1({\mathrm X})} = \| {\mathbf E}( f | {\mathcal X}^G ) \|_{L^2({\mathrm X})}.
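
These identities can be tested in the simplest possible setting. The sketch below assumes {G = X = {\bf Z}/N{\bf Z}} with uniform measure and the translation action, with the arbitrary choice {N = 12}. For a finite group, any multiset of the form {G + C} is uniformly distributed on {G}, so the limits along {{\mathcal F}} should reduce to uniform averages over {G}; the code takes this reduction for granted. The Fourier-analytic formula used as a cross-check is a standard identity for cyclic groups and is not taken from the discussion above.

```python
import cmath
import random

N = 12  # hypothetical modulus; X = G = Z/N with uniform probability measure


def shift(f, h):
    """(T^h f)(x) = f(x - h) on Z/N."""
    return [f[(x - h) % N] for x in range(N)]


def mean(f):
    return sum(f) / len(f)


def u1_sq(f):
    """||f||_{U^1}^2 = (E f)^2 for real f, since the invariant functions are the constants."""
    return mean(f) ** 2


def u2_fourth(f):
    """||f||_{U^2}^4 as a uniform average over h_1, h_2 in Z/N."""
    total = 0.0
    for h1 in range(N):
        for h2 in range(N):
            f1, f2, f12 = shift(f, h1), shift(f, h2), shift(f, h1 + h2)
            total += mean([f[x] * f1[x] * f2[x] * f12[x] for x in range(N)])
    return total / N ** 2


def u2_fourth_recursive(f):
    """E_h || f T^h f ||_{U^1}^2, the d = 2 case of the recursive formula."""
    return mean([u1_sq([a * b for a, b in zip(f, shift(f, h))]) for h in range(N)])


def u2_fourth_fourier(f):
    """Sum of |hat f(xi)|^4, a standard identity for Z/N used here as a cross-check."""
    coeffs = [mean([f[x] * cmath.exp(-2j * cmath.pi * xi * x / N) for x in range(N)])
              for xi in range(N)]
    return sum(abs(c) ** 4 for c in coeffs)


random.seed(0)
f = [random.uniform(-1, 1) for _ in range(N)]
print(u2_fourth(f), u2_fourth_recursive(f), u2_fourth_fourier(f))  # three matching values
```

The three printed numbers should agree up to rounding error, which gives a concrete check of the recursive description of {\|f\|_{U^2({\mathrm X})}} in terms of {\|f T^h f\|_{U^1({\mathrm X})}} in this toy model.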

This definition also manifestly demonstrates the cube symmetries of the Host-Kra measures {\mu^{[d]}} on {X^{\{0,1\}^d}}, defined via duality by requiring that

\displaystyle \langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d({\mathrm X})} = \int_{X^{\{0,1\}^d}} \bigotimes_{\omega \in \{0,1\}^d} f_\omega\ d\mu^{[d]}.

In a subsequent blog post I hope to present a more detailed study of the {U^2} norm and its relationship with eigenfunctions and the Kronecker factor, without assuming any amenability on {G} or any separability or topological structure on {{\mathrm X}}.

— 1. Proof of theorem —

If {\vec f := (f_\omega)_{\omega \in \{0,1\}^d}} is a tuple of functions {f_\omega \in L^{2^d}({\mathrm X})} and {0 \leq d' \leq d}, we say that {\vec f} is {d'}-symmetric if we have {f_\omega = f_{\omega'}} whenever {\omega = (\omega_1,\dots,\omega_d)} and {\omega' = (\omega'_1,\dots,\omega'_d)} agree in the last {d-d'} components (that is, {\omega_i = \omega'_i} for {i=d'+1,\dots,d}), so that {f_\omega} does not depend on the first {d'} components of {\omega}. In particular, every tuple is {0}-symmetric, while a {d}-symmetric tuple has all of the {f_\omega} equal. We will prove Theorem 4 for {d'}-symmetric tuples by downward induction on {d'}, with the {d'=0} case establishing the full theorem.

Thus, assume that {0 \leq d' \leq d} and that the claim has already been proven for larger values of {d'} (this hypothesis is vacuous for {d'=d}). Write

\displaystyle F( A_1,\dots, A_d ) := \mathop{\bf E}_{h_1 \in A_1-A_1,\dots,h_d \in A_d-A_d} \int_X \prod_{\omega \in \{0,1\}^d} T^{\omega_1 h_1 + \dots + \omega_d h_d} f_\omega\ d\mu

We will show that for any {\varepsilon > 0}, and for sufficiently large {A_1,\dots,A_d} (in the directed set {{\mathcal F}}), the quantity {F(A_1,\dots,A_d)} can increase by at most {\varepsilon} when one increases any of the {A_i}, {1 \leq i \leq d}, that is to say that

\displaystyle F(A_1,\dots,A_{i-1}, A'_i, A_{i+1},\dots, A_d) \leq F(A_1,\dots,A_d) + \varepsilon

whenever {1 \leq i \leq d} and {A'_i \geq A_i}. This implies that the limit superior of {F(A_1,\dots,A_d)} exceeds the limit inferior by at most {d\varepsilon}, and on sending {\varepsilon \rightarrow 0} we will obtain Theorem 4.

There are two cases, depending on whether {i \leq d'} or {d' < i \leq d}. We begin with the first case {i \leq d'}. By relabeling we may take {i=1}, so that {d' \geq 1}. As {\vec f} is {d'}-symmetric, we can write

\displaystyle F(A_1,\dots,A_d) = \mathop{\bf E}_{h_2 \in A_2-A_2,\dots,h_d \in A_d-A_d} \| \mathop{\bf E}_{a \in A_1} T^{a} f_{h_2,\dots,h_d}\|_{L^2({\mathrm X})}^2

where

\displaystyle f_{h_2,\dots,h_d} := \prod_{\omega_2,\dots,\omega_d \in \{0,1\}} T^{\omega_2 h_2 + \dots + \omega_d h_d} f_{(0,\omega_2,\dots,\omega_d)}.

By the triangle inequality argument used to prove Theorem 2 we thus see that

\displaystyle F(A_1 + B, A_2,\dots,A_d) \leq F(A_1,\dots,A_d),

and so {F} does not increase at all (and in particular does not increase by more than {\varepsilon}) when {A_1} is increased.

Now we turn to the case when {d' < i \leq d}. By relabeling we may take {i=d}, so that {d' < d}. We can write

\displaystyle F(A_1,\dots,A_d) = \mathop{\bf E}_{h_1 \in A_1-A_1,\dots,h_{d-1} \in A_{d-1}-A_{d-1}}

\displaystyle \left\langle \mathop{\bf E}_{a \in A_d} T^{a} f^0_{h_1,\dots,h_{d-1}}, \mathop{\bf E}_{a \in A_d} T^{a} f^1_{h_1,\dots,h_{d-1}} \right\rangle_{L^2({\mathrm X})}

where

\displaystyle f^{\omega_d}_{h_1,\dots,h_{d-1}} := \prod_{\omega_1,\dots,\omega_{d-1} \in \{0,1\}} T^{\omega_1 h_1 + \dots + \omega_{d-1} h_{d-1}} f_{(\omega_1,\omega_2,\dots,\omega_d)}.

On the other hand, the quantity

\displaystyle \mathop{\bf E}_{h_1 \in A_1-A_1,\dots,h_{d-1} \in A_{d-1}-A_{d-1}} \| \mathop{\bf E}_{a \in A_d} T^{a} f^0_{h_1,\dots,h_{d-1}} \|_{L^2({\mathrm X})}^2

is the same as {F(A_1,\dots,A_d)}, but with {f_{(\omega_1,\dots,\omega_{d-1},1)}} replaced by {f_{(\omega_1,\dots,\omega_{d-1},0)}}. After rearrangement, this is a {d'+1}-symmetric inner product, and so by induction hypothesis the limit

\displaystyle \lim_{A_1,\dots,A_d \rightarrow G} \mathop{\bf E}_{h_1 \in A_1-A_1,\dots,h_{d-1} \in A_{d-1}-A_{d-1}} \| \mathop{\bf E}_{a \in A_d} T^{a} f^0_{h_1,\dots,h_{d-1}} \|_{L^2({\mathrm X})}^2

exists. In particular, for {A_1,\dots,A_d} large enough, we have

\displaystyle \mathop{\bf E}_{h_1 \in A_1-A_1,\dots,h_{d-1} \in A_{d-1}-A_{d-1}} \| \mathop{\bf E}_{a \in A_d + \{0,g\}} T^{a} f^0_{h_1,\dots,h_{d-1}} \|_{L^2({\mathrm X})}^2

\displaystyle \geq \mathop{\bf E}_{h_1 \in A_1-A_1,\dots,h_{d-1} \in A_{d-1}-A_{d-1}} \| \mathop{\bf E}_{a \in A_d} T^{a} f^0_{h_1,\dots,h_{d-1}} \|_{L^2({\mathrm X})}^2 - \varepsilon

for all {g \in G}, which by the parallelogram law as in the proof of Theorem 2 shows that

\displaystyle \mathop{\bf E}_{h_1 \in A_1-A_1,\dots,h_{d-1} \in A_{d-1}-A_{d-1}}

\displaystyle \| \mathop{\bf E}_{a \in A_d + \{g\}} T^{a} f^0_{h_1,\dots,h_{d-1}} - \mathop{\bf E}_{a \in A_d} T^{a} f^0_{h_1,\dots,h_{d-1}} \|_{L^2({\mathrm X})}^2 \leq 4 \varepsilon

and hence by averaging

\displaystyle \mathop{\bf E}_{h_1 \in A_1-A_1,\dots,h_{d-1} \in A_{d-1}-A_{d-1}}

\displaystyle \| \mathop{\bf E}_{a \in A'_d} T^{a} f^0_{h_1,\dots,h_{d-1}} - \mathop{\bf E}_{a \in A_d} T^{a} f^0_{h_1,\dots,h_{d-1}} \|_{L^2({\mathrm X})}^2 \leq 4 \varepsilon

whenever {A'_d \geq A_d}. Similarly with {f^0_{h_1,\dots,h_{d-1}}} replaced by {f^1_{h_1,\dots,h_{d-1}}}. From Cauchy-Schwarz we then have

\displaystyle |F(A_1,\dots,A_{d-1},A'_d) - F(A_1,\dots,A_d)| \leq C \varepsilon^{1/2}

for some {C} independent of {\varepsilon} (depending on the {L^{2^d}} norms of the {f_\omega}), and the claim follows after redefining {\varepsilon}.