Two weeks ago I was at Oberwolfach for the Arbeitsgemeinschaft in Ergodic Theory and Combinatorial Number Theory, of which I was one of the organisers. At this workshop, I learned the details of a very nice recent convergence result of Miguel Walsh (who, incidentally, is an informal grandstudent of mine, as his advisor, Roman Sasyk, was my informal student), which considerably strengthens and generalises a number of previous convergence results in ergodic theory (including one of my own), with a remarkably simple proof. Walsh’s argument is phrased in a finitary language (somewhat similar, in fact, to the approach used in my paper mentioned previously), and (among other things) relies on the concept of metastability of sequences, a variant of the notion of convergence which is useful in situations in which one does not expect a uniform convergence rate; see this previous blog post for some discussion of metastability. When interpreted in a finitary setting, this concept requires a fair amount of “epsilon management” to manipulate; also, Walsh’s argument uses some other epsilon-intensive finitary arguments, such as a decomposition lemma of Gowers based on the Hahn-Banach theorem. As such, I was tempted to try to rewrite Walsh’s argument in the language of nonstandard analysis to see the extent to which these sorts of issues could be managed. As it turns out, the argument gets cleaned up rather nicely, with the notion of metastability being replaced with the simpler notion of external Cauchy convergence (which we will define below the fold).

Let’s first state Walsh’s theorem. This theorem is a norm convergence theorem in ergodic theory, and can be viewed as a substantial generalisation of one of the most fundamental theorems of this type, namely the mean ergodic theorem:

Theorem 1 (Mean ergodic theorem) Let {(X,\mu,T)} be a measure-preserving system (a probability space {(X,\mu)} with an invertible measure-preserving transformation {T}). Then for any {f \in L^2(X,\mu)}, the averages {\frac{1}{N} \sum_{n=1}^N T^n f} converge in {L^2(X,\mu)} norm as {N \rightarrow \infty}, where {T^n f(x) := f(T^{-n} x)}.

In this post, all functions in {L^2(X,\mu)} and similar spaces will be taken to be real instead of complex-valued for simplicity, though the extension to the complex setting is routine.

Actually, we have a precise description of the limit of these averages, namely the orthogonal projection of {f} onto the closed subspace of {T}-invariant functions in {L^2(X,\mu)}. (See for instance my lecture notes on this theorem.) While this theorem ostensibly involves measure theory, it can be abstracted to the more general setting of unitary operators on a Hilbert space:

Theorem 2 (von Neumann mean ergodic theorem) Let {H} be a Hilbert space, and let {U: H \rightarrow H} be a unitary operator on {H}. Then for any {f \in H}, the averages {\frac{1}{N} \sum_{n=1}^N U^n f} converge strongly in {H} as {N \rightarrow \infty}.

Again, see my lecture notes (or just about any text in ergodic theory) for a proof.
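
As a quick numerical sanity check (not a proof, and not part of the argument), here is a minimal Python sketch of Theorem 2 in the simplest possible setting. It assumes a finite-dimensional Hilbert space {{\bf C}^d} and a diagonal unitary {U} with eigenvalues {e^{i\theta_j}}; the particular angles, the vector {f}, and all function names are hypothetical choices made for illustration. In this model the invariant subspace consists of the coordinates with {\theta_j = 0}, and the averages should approach the projection of {f} onto those coordinates.

```python
import cmath

def ergodic_average(thetas, f, N):
    """Return (1/N) * sum_{n=1}^N U^n f for U = diag(exp(i * theta_j))."""
    out = []
    for theta, coeff in zip(thetas, f):
        # U^n acts on the j-th coordinate by multiplication by exp(i * n * theta_j).
        s = sum(cmath.exp(1j * theta * n) for n in range(1, N + 1))
        out.append(coeff * s / N)
    return out

def project_to_invariants(thetas, f):
    """Orthogonal projection onto {v : Uv = v}: keep the coordinates with theta = 0."""
    return [c if theta == 0 else 0.0 for theta, c in zip(thetas, f)]

def norm(v):
    """Euclidean norm on C^d."""
    return sum(abs(c) ** 2 for c in v) ** 0.5

# Hypothetical data: two invariant coordinates (theta = 0), two rotating ones.
thetas = [0.0, 1.0, 2.5, 0.0]
f = [1.0, 2.0, -1.0, 0.5]
limit = project_to_invariants(thetas, f)

def error(N):
    """Distance between the N-th average and the conjectured limit."""
    avg = ergodic_average(thetas, f, N)
    return norm([a - b for a, b in zip(avg, limit)])
```

In this toy model each rotating coordinate contributes a geometric sum, so the error decays like {O(1/N)}; in general there is no such uniform rate, which is one reason metastability becomes relevant.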

Now we turn to Walsh’s theorem.

Theorem 3 (Walsh’s convergence theorem) Let {(X,\mu)} be a probability space with a measure-preserving action of a nilpotent group {G}. Let {g_1,\ldots,g_k: {\bf Z} \rightarrow G} be polynomial sequences in {G} (i.e. each {g_i} takes the form {g_i(n) = a_{i,1}^{p_{i,1}(n)} \ldots a_{i,j}^{p_{i,j}(n)}} for some {a_{i,1},\ldots,a_{i,j} \in G} and polynomials {p_{i,1},\ldots,p_{i,j}: {\bf Z} \rightarrow {\bf Z}}). Then for any {f_1,\ldots,f_k \in L^\infty(X,\mu)}, the averages {\frac{1}{N} \sum_{n=1}^N (g_1(n) f_1) \ldots (g_k(n) f_k)} converge in {L^2(X,\mu)} norm as {N \rightarrow \infty}, where {g(n) f(x) := f(g(n)^{-1} x)}.
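
To make the notion of a polynomial sequence in a nilpotent group concrete, here is a small Python sketch in the integer Heisenberg group, which is nilpotent of step two. Everything here is an illustrative assumption rather than anything taken from Walsh's paper: the group is modelled by triples {(x,y,z)} standing for upper unitriangular {3 \times 3} integer matrices, and the generators `A`, `B` and the polynomials {n^2}, {n} are hypothetical choices.

```python
# Triples (x, y, z) stand for the matrix [[1, x, z], [0, 1, y], [0, 0, 1]].
IDENTITY = (0, 0, 0)

def mul(g, h):
    """Group law, obtained from matrix multiplication."""
    x, y, z = g
    u, v, w = h
    return (x + u, y + v, z + w + x * v)

def inv(g):
    """Inverse element: mul(g, inv(g)) == IDENTITY."""
    x, y, z = g
    return (-x, -y, -z + x * y)

def power(g, e):
    """g^e for any integer exponent e (negative exponents use the inverse)."""
    if e < 0:
        g, e = inv(g), -e
    result = IDENTITY
    for _ in range(e):
        result = mul(result, g)
    return result

def commutator(g, h):
    """[g, h] = g h g^{-1} h^{-1}."""
    return mul(mul(g, h), mul(inv(g), inv(h)))

A = (1, 0, 0)  # hypothetical generator a
B = (0, 1, 0)  # hypothetical generator b

def poly_seq(n):
    """The polynomial sequence g(n) = a^{n^2} b^{n}."""
    return mul(power(A, n * n), power(B, n))
```

Here `commutator(A, B)` is central, so every iterated commutator of length three is trivial, which is the step-two nilpotency. The sequence `poly_seq` is of the form allowed in Theorem 3 with {j = 2}.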

It turns out that this theorem can also be abstracted to some extent, although due to the multiplication in the summand {(g_1(n) f_1) \ldots (g_k(n) f_k)}, one cannot work purely with Hilbert spaces as in the von Neumann mean ergodic theorem, but must also work with something like the Banach algebra {L^\infty(X,\mu)}. There are a number of ways to formulate this abstraction (which will be of some minor convenience to us, as it will allow us to reduce the need to invoke the nonstandard measure theory of Loeb, discussed for instance in this blog post); we will use the notion of a (real) commutative probability space {({\mathcal A},\tau)}, which for us will be a commutative unital algebra {{\mathcal A}} over the reals together with a linear functional {\tau: {\mathcal A} \rightarrow {\bf R}} which maps {1} to {1} and obeys the non-negativity axiom {\tau(f^2) \ge 0} for all {f}. The key example to keep in mind here is the algebra {{\mathcal A} = L^\infty(X,\mu)} of essentially bounded real-valued measurable functions on a probability space {(X,\mu)}, equipped with the supremum norm and with the trace {\tau(f) := \int_X f\ d\mu}. We will also assume in our definition of commutative probability spaces that all elements {f} of {{\mathcal A}} are bounded in the sense that the spectral radius {\rho(f) := \lim_{k \rightarrow \infty} \tau(f^{2k})^{1/2k}} is finite. (In the concrete case of {L^\infty(X,\mu)}, the spectral radius is just the {L^\infty} norm.)

Given a commutative probability space, we can form an inner product {\langle \cdot, \cdot \rangle_{L^2(\tau)}} on it by the formula

\displaystyle  \langle f, g \rangle_{L^2(\tau)} := \tau(fg).

This is a positive semi-definite form, and gives a (possibly degenerate) inner product structure on {{\mathcal A}}. We could complete this structure into a Hilbert space {L^2(\tau)} (after quotienting out the elements of zero norm), but we will not do so here, instead just viewing {L^2(\tau)} as providing a semi-metric on {{\mathcal A}}. For future reference we record the inequalities

\displaystyle  \rho(fg) \leq \rho(f) \rho(g)

\displaystyle  \rho(f+g) \leq \rho(f) + \rho(g)

\displaystyle  \| fg\|_{L^2(\tau)} \leq \|f\|_{L^2(\tau)} \rho(g)

for any {f,g}, which we will use in the sequel without further comment; see e.g. these previous blog notes for proofs. (Actually, for the purposes of proving Theorem 3, one can specialise to the {L^\infty(X,\mu)} case (and ultraproducts thereof), in which case these inequalities are just the triangle and Hölder inequalities.)
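
For readers who like to see these definitions in a toy case, here is a minimal Python sketch of a commutative probability space. It assumes the simplest model, namely real functions on a finite set with {\tau} equal to the uniform average, so that the spectral radius is just the maximum absolute value; the vectors `f`, `g` and all function names are hypothetical. It is only a sanity check of the three inequalities above and of the limit formula for {\rho}, not a substitute for the proofs.

```python
def tau(f):
    """Trace: the average of f over the finite set (uniform probability measure)."""
    return sum(f) / len(f)

def mul(f, g):
    """Pointwise product in the algebra."""
    return [a * b for a, b in zip(f, g)]

def add(f, g):
    """Pointwise sum in the algebra."""
    return [a + b for a, b in zip(f, g)]

def l2_norm(f):
    """The L^2(tau) (semi-)norm, tau(f^2)^{1/2}."""
    return tau(mul(f, f)) ** 0.5

def rho(f):
    """Spectral radius in this model: the sup norm, since every point has positive mass."""
    return max(abs(a) for a in f)

def rho_approx(f, k):
    """The quantity tau(f^{2k})^{1/2k}, which should tend to rho(f) as k grows."""
    return tau([a ** (2 * k) for a in f]) ** (1.0 / (2 * k))

# Hypothetical elements of the algebra.
f = [1.0, -3.0, 0.5, 2.0]
g = [2.0, 0.5, -1.0, 1.5]
```

In this model `rho_approx(f, k)` increases to `rho(f)` from below, differing from it by at most a factor of {m^{-1/2k}}, where {m} is the number of points.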

The abstract version of Theorem 3 is then

Theorem 4 (Walsh’s theorem, abstract version) Let {({\mathcal A},\tau)} be a commutative probability space, and let {G} be a nilpotent group acting on {{\mathcal A}} by isomorphisms (preserving the algebra, conjugation, and trace structure, and thus also preserving the spectral radius and {L^2(\tau)} norm). Let {g_1,\ldots,g_k: {\bf Z} \rightarrow G} be polynomial sequences. Then for any {f_1,\ldots,f_k \in {\mathcal A}}, the averages {\frac{1}{N} \sum_{n=1}^N (g_1(n) f_1) \ldots (g_k(n) f_k)} form a Cauchy sequence in {L^2(\tau)} (semi-)norm as {N \rightarrow \infty}.
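
To see what these averages look like in a completely explicit (and admittedly degenerate) case, here is a Python sketch. It assumes the algebra of real functions on the cyclic group {{\bf Z}/m{\bf Z}} with the uniform trace, the abelian (hence nilpotent) group {G = {\bf Z}} acting by shifts, and the hypothetical polynomial sequences {g_1(n) = n} and {g_2(n) = n^2}; the test functions are likewise arbitrary. In this finite model the summands are periodic in {n}, so the Cauchy property is elementary, and the sketch is only meant to make the objects in Theorem 4 tangible.

```python
def shift(f, s):
    """Action of s in Z on functions on Z/m: (s f)(x) = f(x - s)."""
    m = len(f)
    return [f[(x - s) % m] for x in range(m)]

def multiple_average(f1, f2, N):
    """(1/N) * sum_{n=1}^N (g_1(n) f1)(g_2(n) f2) with g_1(n) = n, g_2(n) = n^2."""
    m = len(f1)
    total = [0.0] * m
    for n in range(1, N + 1):
        a = shift(f1, n)
        b = shift(f2, n * n)
        for x in range(m):
            total[x] += a[x] * b[x]
    return [t / N for t in total]

def l2_dist(u, v):
    """L^2(tau) distance, with tau the uniform average over Z/m."""
    return (sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)) ** 0.5

# Hypothetical bounded functions on Z/7.
f1 = [1.0, 0.0, 2.0, -1.0, 0.5, 3.0, -2.0]
f2 = [0.0, 1.0, -1.0, 2.0, 0.0, 1.0, -0.5]
```

Since both {n} and {n^2} are periodic modulo {m}, the averages over {N} and {M} terms differ in this model by at most {2m \rho(f_1) \rho(f_2) (1/N + 1/M)}. No such explicit rate is available in the general setting of Theorem 4.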

It is easy to see that this theorem generalises Theorem 3. Conversely, one can use the commutative Gelfand-Naimark theorem to deduce Theorem 4 from Theorem 3, although we will not need this implication. Note how we are abandoning all attempts to discern what the limit of the sequence actually is, instead contenting ourselves with demonstrating that it is merely a Cauchy sequence. With this phrasing, it is tempting to ask whether there is any analogue of Walsh’s theorem for noncommutative probability spaces, but unfortunately the answer to that question is negative for all but the simplest of averages, as was worked out in this paper of Austin, Eisner, and myself.

Our proof of Theorem 4 will proceed as follows. Firstly, in order to avoid the epsilon management alluded to earlier, we will take an ultraproduct to rephrase the theorem in the language of nonstandard analysis; for reasons that will be clearer later, we will also convert the convergence problem to a problem of obtaining metastability (external Cauchy convergence). Then, we observe that (the nonstandard counterpart of) the expression {\|\frac{1}{N} \sum_{n=1}^N (g_1(n) f_1) \ldots (g_k(n) f_k)\|_{L^2(\tau)}^2} can be viewed as the inner product of (say) {f_k} with a certain type of expression, which we call a dual function. By performing an orthogonal projection to the span of the dual functions, we can split {f_k} into the sum of an expression orthogonal to all dual functions (the “pseudorandom” component), and a function that can be well approximated by finite linear combinations of dual functions (the “structured” component). The contribution of the pseudorandom component is asymptotically negligible, so we can reduce to consideration of the structured component. But by a little bit of rearrangement, this can be viewed as an average of expressions similar to the initial average {\frac{1}{N} \sum_{n=1}^N (g_1(n) f_1) \ldots (g_k(n) f_k)}, except with the polynomials {g_1,\ldots,g_k} replaced by a “lower complexity” set of such polynomials, which can be greater in number, but which have slightly lower degrees in some sense. One can iterate this (using “PET induction”) until all the polynomials become trivial, at which point the claim follows.
