Two weeks ago I was at Oberwolfach, for the Arbeitsgemeinschaft in Ergodic Theory and Combinatorial Number Theory that I was one of the organisers for. At this workshop, I learned the details of a very nice recent convergence result of Miguel Walsh (who, incidentally, is an informal grandstudent of mine, as his advisor, Roman Sasyk, was my informal student), which considerably strengthens and generalises a number of previous convergence results in ergodic theory (including one of my own), with a remarkably simple proof. Walsh’s argument is phrased in a finitary language (somewhat similar, in fact, to the approach used in my paper mentioned previously), and (among other things) relies on the concept of metastability of sequences, a variant of the notion of convergence which is useful in situations in which one does not expect a uniform convergence rate; see this previous blog post for some discussion of metastability. When interpreted in a finitary setting, this concept requires a fair amount of “epsilon management” to manipulate; also, Walsh’s argument uses some other epsilon-intensive finitary arguments, such as a decomposition lemma of Gowers based on the Hahn-Banach theorem. As such, I was tempted to try to rewrite Walsh’s argument in the language of nonstandard analysis to see the extent to which these sorts of issues could be managed. As it turns out, the argument gets cleaned up rather nicely, with the notion of metastability being replaced with the simpler notion of external Cauchy convergence (which we will define below the fold).
Let’s first state Walsh’s theorem. This theorem is a norm convergence theorem in ergodic theory, and can be viewed as a substantial generalisation of one of the most fundamental theorems of this type, namely the mean ergodic theorem:
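Theorem 1 (Mean ergodic theorem) Let $(X, \mu, T)$ be a measure-preserving system (a probability space $(X, \mu)$ with an invertible measure-preserving transformation $T: X \to X$). Then for any $f \in L^2(X,\mu)$, the averages
$$\frac{1}{N}\sum_{n=1}^N T^n f$$
converge in $L^2(X,\mu)$ norm as $N \to \infty$, where $T^n f(x) := f(T^{-n} x)$.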
In this post, all functions in $L^2(X,\mu)$ and similar spaces will be taken to be real-valued instead of complex-valued for simplicity, though the extension to the complex setting is routine.
Actually, we have a precise description of the limit of these averages, namely the orthogonal projection of $f$ to the $T$-invariant factor. (See for instance my lecture notes on this theorem.) While this theorem ostensibly involves measure theory, it can be abstracted to the more general setting of unitary operators on a Hilbert space:
Theorem 2 (von Neumann mean ergodic theorem) Let $H$ be a Hilbert space, and let $U: H \to H$ be a unitary operator on $H$. Then for any $f \in H$, the averages
$$\frac{1}{N}\sum_{n=1}^N U^n f$$
converge strongly in $H$ as $N \to \infty$.
Again, see my lecture notes (or just about any text in ergodic theory) for a proof.
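In the simplest finite case one can watch Theorems 1 and 2 happen directly. The sketch below assumes a finite set with uniform measure and $T$ given by a permutation, so that $T^n f(x) = f(T^{-n}x)$ is unitary on $L^2$. The $T$-invariant functions are then exactly the functions that are constant on cycles, so the averages should converge to the cycle-wise mean of $f$. The function names are illustrative only, and exact rational arithmetic is used to sidestep rounding.

```python
from fractions import Fraction
from typing import List, Sequence


def cycle_averages(perm: Sequence[int], f: Sequence[int]) -> List[Fraction]:
    """Projection of f onto the perm-invariant functions: on each cycle of
    perm, replace f by its mean over that cycle."""
    n = len(perm)
    seen = [False] * n
    proj = [Fraction(0)] * n
    for start in range(n):
        if seen[start]:
            continue
        cycle, x = [], start
        while not seen[x]:
            seen[x] = True
            cycle.append(x)
            x = perm[x]
        mean = sum((Fraction(f[y]) for y in cycle), Fraction(0)) / len(cycle)
        for y in cycle:
            proj[y] = mean
    return proj


def ergodic_average(perm: Sequence[int], f: Sequence[int], N: int) -> List[Fraction]:
    """(1/N) * sum_{n=1}^N T^n f, where T x = perm[x] and (T g)(x) = g(T^{-1} x)."""
    n = len(perm)
    inv = [0] * n
    for x, y in enumerate(perm):
        inv[y] = x
    g = [Fraction(v) for v in f]
    total = [Fraction(0)] * n
    for _ in range(N):
        g = [g[inv[x]] for x in range(n)]  # one application of T
        total = [t + v for t, v in zip(total, g)]
    return [t / N for t in total]


perm = [1, 2, 0, 4, 3]  # cycles (0 1 2) and (3 4)
f = [3, 0, 0, 1, 5]
print(cycle_averages(perm, f))      # projection onto the invariant functions
print(ergodic_average(perm, f, 6))  # 6 is a common period, so this equals the projection
```

For a general $N$ the average differs from the projection only by the incomplete final period, so the error here is $O(1/N)$.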
Now we turn to Walsh’s theorem.
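Theorem 3 (Walsh’s convergence theorem) Let $(X,\mu)$ be a measure space with a measure-preserving action of a nilpotent group $G$. Let $g_1,\dots,g_k: {\bf Z} \to G$ be polynomial sequences in $G$ (i.e. each $g_i$ takes the form $g_i(n) = a_1^{p_1(n)} \cdots a_j^{p_j(n)}$ for some $a_1,\dots,a_j \in G$ and polynomials $p_1,\dots,p_j: {\bf Z} \to {\bf Z}$). Then for any $f_1,\dots,f_k \in L^\infty(X,\mu)$, the averages
$$\frac{1}{N}\sum_{n=1}^N (g_1(n) f_1) \cdots (g_k(n) f_k)$$
converge in $L^2(X,\mu)$ norm as $N \to \infty$, where $g f(x) := f(g^{-1} x)$.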
It turns out that this theorem can also be abstracted to some extent. Because of the multiplication in the summand $(g_1(n) f_1) \cdots (g_k(n) f_k)$, however, one cannot work purely with Hilbert spaces as in the von Neumann mean ergodic theorem, but must also work with something like the Banach algebra $L^\infty(X,\mu)$. There are a number of ways to formulate this abstraction (which will be of some minor convenience to us, as it will allow us to reduce the need to invoke the nonstandard measure theory of Loeb, discussed for instance in this blog post). We will use the notion of a (real) commutative probability space $({\mathcal A}, \tau)$, which for us will be a commutative unital algebra ${\mathcal A}$ over the reals together with a linear functional $\tau: {\mathcal A} \to {\bf R}$ which maps $1$ to $1$ and obeys the non-negativity axiom $\tau(f^2) \geq 0$ for all $f \in {\mathcal A}$. The key example to keep in mind here is the algebra ${\mathcal A} = L^\infty(X,\mu)$ of essentially bounded real-valued measurable functions with the supremum norm, and with the trace $\tau(f) := \int_X f\, d\mu$. We will also assume in our definition of commutative probability spaces that all elements $f$ of ${\mathcal A}$ are bounded in the sense that the spectral radius $\rho(f) := \lim_{k \to \infty} \tau(f^{2k})^{1/2k}$ is finite. (In the concrete case of $L^\infty(X,\mu)$, the spectral radius is just the $L^\infty$ norm.)
Given a commutative probability space, we can form an inner product $\langle \cdot, \cdot \rangle_{L^2(\tau)}$ on it by the formula
$$\langle f, g \rangle_{L^2(\tau)} := \tau(fg).$$
This is a positive semi-definite form, and gives a (possibly degenerate) inner product structure on ${\mathcal A}$. We could complete this structure into a Hilbert space $L^2(\tau)$ (after quotienting out the elements of zero norm), but we will not do so here, instead just viewing $L^2(\tau)$ as providing a semi-metric on ${\mathcal A}$. For future reference we record the inequalities
$$\rho(fg) \leq \rho(f) \rho(g), \qquad \rho(f+g) \leq \rho(f) + \rho(g), \qquad \|fg\|_{L^2(\tau)} \leq \rho(f) \|g\|_{L^2(\tau)}$$
for any $f, g \in {\mathcal A}$, which we will use in the sequel without further comment; see e.g. these previous blog notes for proofs. (Actually, for the purposes of proving Theorem 3, one can specialise to the $L^\infty(X,\mu)$ case (and ultraproducts thereof), in which case these inequalities are just the triangle and Hölder inequalities.)
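To get a feel for these definitions in the model case, here is a minimal sketch on a finite probability space with strictly positive weights $p_x$. This is a toy stand-in for $L^\infty(X,\mu)$, not the abstract setting. It computes the approximants $\tau(f^{2k})^{1/2k}$, which should increase to the spectral radius (here the maximum of $|f|$), and lets one spot-check the recorded inequalities numerically. All names are hypothetical.

```python
from typing import Sequence


def tau(p: Sequence[float], f: Sequence[float]) -> float:
    """Trace tau(f) = sum_x p_x f(x): the integral of f against the weights p."""
    return sum(px * fx for px, fx in zip(p, f))


def l2_norm(p: Sequence[float], f: Sequence[float]) -> float:
    """||f||_{L^2(tau)} = tau(f^2)^{1/2}."""
    return tau(p, [fx * fx for fx in f]) ** 0.5


def sup_norm(p: Sequence[float], f: Sequence[float]) -> float:
    """Essential supremum of |f|: the maximum over points of positive weight."""
    return max(abs(fx) for px, fx in zip(p, f) if px > 0)


def rho_approx(p: Sequence[float], f: Sequence[float], k: int) -> float:
    """tau(f^{2k})^{1/(2k)}, rescaled by sup|f| to avoid overflow.
    This increases to the spectral radius rho(f) = sup_norm(p, f) as k grows."""
    m = sup_norm(p, f)
    if m == 0:
        return 0.0
    s = sum(px * (fx / m) ** (2 * k) for px, fx in zip(p, f) if px > 0)
    return m * s ** (1.0 / (2 * k))
```

For instance, with weights $(0.1, 0.2, 0.3, 0.4)$ and $f = (1, -2, 0.5, 1.5)$, `rho_approx` with $k = 1$ is just $\|f\|_{L^2(\tau)}$. With $k = 500$ it is already within $0.01$ of $\rho(f) = 2$.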
The abstract version of Theorem 3 is then
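Theorem 4 (Walsh’s theorem, abstract version) Let $({\mathcal A}, \tau)$ be a commutative probability space, and let $G$ be a nilpotent group acting on ${\mathcal A}$ by isomorphisms (preserving the algebra, conjugation, and trace structure, and thus also preserving the spectral radius and $L^2(\tau)$ norm). Let $g_1,\dots,g_k: {\bf Z} \to G$ be polynomial sequences. Then for any $f_1,\dots,f_k \in {\mathcal A}$, the averages
$$\frac{1}{N}\sum_{n=1}^N (g_1(n) f_1) \cdots (g_k(n) f_k)$$
form a Cauchy sequence in $L^2(\tau)$ (semi-)norm as $N \to \infty$.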
It is easy to see that this theorem generalises Theorem 3. Conversely, one can use the commutative Gelfand-Naimark theorem to deduce Theorem 4 from Theorem 3, although we will not need this implication. Note how we are abandoning all attempts to discern what the limit of the sequence actually is, instead contenting ourselves with demonstrating that it is merely a Cauchy sequence. With this phrasing, it is tempting to ask whether there is any analogue of Walsh’s theorem for noncommutative probability spaces, but unfortunately the answer to that question is negative for all but the simplest of averages, as was worked out in this paper of Austin, Eisner, and myself.
Our proof of Theorem 4 will proceed as follows. Firstly, in order to avoid the epsilon management alluded to earlier, we will take an ultraproduct to rephrase the theorem in the language of nonstandard analysis. For reasons that will become clearer later, we will also convert the convergence problem to a problem of obtaining metastability (external Cauchy convergence). Then, we observe that (the nonstandard counterpart of) the expression $\|\frac{1}{N}\sum_{n=1}^N (g_1(n) f_1) \cdots (g_k(n) f_k)\|_{L^2(\tau)}^2$ can be viewed as the inner product of (say) $f_k$ with a certain type of expression, which we call a dual function. By performing an orthogonal projection to the span of the dual functions, we can split $f_k$ into the sum of an expression orthogonal to all dual functions (the “pseudorandom” component), and a function that can be well approximated by finite linear combinations of dual functions (the “structured” component). The contribution of the pseudorandom component is asymptotically negligible, so we can reduce to consideration of the structured component. But by a little bit of rearrangement, this can be viewed as an average of expressions similar to the initial average $\frac{1}{N}\sum_{n=1}^N (g_1(n) f_1) \cdots (g_k(n) f_k)$, except with the polynomials $g_1,\dots,g_k$ replaced by a “lower complexity” set of such polynomials, which can be greater in number, but which have slightly lower degrees in some sense. One can iterate this (using “PET induction”) until all the polynomials become trivial, at which point the claim follows.
— 1. Nonstandard analysis and metastability —
We will assume some familiarity with nonstandard analysis, as covered for instance in these previous blog posts.
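As is common practice in nonstandard analysis, we will need to select a non-principal ultrafilter $\alpha \in \beta{\bf N} \backslash {\bf N}$. Using this ultrafilter, we can now form the ultraproduct $X := \prod_{\mathbf{n} \to \alpha} X_{\mathbf{n}}$ of any sequence $X_{\mathbf{n}}$ of (standard) spaces, defined as the space of all ultralimits $\lim_{\mathbf{n} \to \alpha} x_{\mathbf{n}}$ of sequences $x_{\mathbf{n}} \in X_{\mathbf{n}}$ defined for $\mathbf{n}$ sufficiently close to $\alpha$, with two sequences considered to have the same ultralimit iff they agree sufficiently close to $\alpha$. Any operation or relation on the standard spaces $X_{\mathbf{n}}$ can then be defined on the nonstandard space $X$ in a natural fashion. For instance, given a sequence of standard functions $f_{\mathbf{n}}: X_{\mathbf{n}} \to Y_{\mathbf{n}}$, one can form their ultralimit $f := \lim_{\mathbf{n} \to \alpha} f_{\mathbf{n}}$ from the nonstandard space $X = \prod_{\mathbf{n} \to \alpha} X_{\mathbf{n}}$ to the nonstandard space $Y = \prod_{\mathbf{n} \to \alpha} Y_{\mathbf{n}}$ by the formula
$$f\big(\lim_{\mathbf{n} \to \alpha} x_{\mathbf{n}}\big) := \lim_{\mathbf{n} \to \alpha} f_{\mathbf{n}}(x_{\mathbf{n}}).$$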
As usual, we call a nonstandard real $x$ bounded if we have $|x| \leq C$ for some standard $C$, and infinitesimal if we have $|x| \leq \varepsilon$ for every standard $\varepsilon > 0$. In the latter case we also write $x = o(1)$. Every bounded nonstandard real $x$ is infinitesimally close to a unique standard real, called the standard part $\operatorname{st}(x)$ of $x$.
We will need the following fundamental properties about nonstandard analysis:
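- (Transfer / Los’s theorem) If for each $i = 1,\dots,k$, $x_{i,\mathbf{n}}$ is a sequence of mathematical objects, spaces, or functions, with ultralimit or ultraproduct $x_i$, then for any first-order predicate $P(x_1,\dots,x_k)$ involving $k$ mathematical objects of the appropriate type, the claim $P(x_1,\dots,x_k)$ is true if and only if $P(x_{1,\mathbf{n}},\dots,x_{k,\mathbf{n}})$ is true for all $\mathbf{n}$ sufficiently close to $\alpha$.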
- (Overspill) If an internal set $A$ (an ultraproduct of standard sets, also known as a nonstandard set) of nonstandard natural numbers contains all unbounded natural numbers, then there exists a standard natural number $n_0$ such that $A$ contains all nonstandard natural numbers larger than $n_0$.
- (Loeb measure, hyperfinite case) If $A$ is a non-empty nonstandard finite set (i.e. the ultraproduct $A = \prod_{\mathbf{n} \to \alpha} A_{\mathbf{n}}$ of standard finite sets, also known as a hyperfinite set), and the Loeb $\sigma$-algebra is defined as the $\sigma$-algebra generated by the internal subsets of $A$, then there exists a unique countably additive probability measure $\mu$ on the Loeb $\sigma$-algebra, called Loeb measure, such that for any internal subset $E$ of $A$, one has $\mu(E) = \operatorname{st}\frac{|E|}{|A|}$. See e.g. this previous blog post for the details of the construction.
To motivate the discussion that follows, let us recall some equivalent formulations of a Cauchy sequence in a pseudometric space (i.e. a generalisation of a metric space in which some distances are allowed to vanish).
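Proposition 5 Let $(x_n)_{n \in {\bf N}}$ be a sequence in a pseudometric space $(X, d)$ (not necessarily complete). Let $(x_n)_{n \in {}^*{\bf N}}$ be the nonstandard extension of $(x_n)_{n \in {\bf N}}$, taking values in the nonstandard pseudometric space ${}^*X$. Then the following are equivalent:
- (standard Cauchy sequence) For every standard $\varepsilon > 0$, there exists a standard $N$ such that $d(x_n, x_m) \leq \varepsilon$ for all standard $n, m \geq N$.
- (nonstandard Cauchy sequence) For every nonstandard $\varepsilon > 0$, there exists a nonstandard $N$ such that $d(x_n, x_m) \leq \varepsilon$ for all nonstandard $n, m \geq N$.
- (standard metastability) For every standard function $F: {\bf N} \to {\bf N}$ and standard $\varepsilon > 0$, there exists a standard $N$ such that $d(x_n, x_m) \leq \varepsilon$ for all standard $N \leq n, m \leq F(N)$.
- (nonstandard metastability) For every nonstandard function $F: {}^*{\bf N} \to {}^*{\bf N}$ and nonstandard $\varepsilon > 0$, there exists a nonstandard $N$ such that $d(x_n, x_m) \leq \varepsilon$ for all nonstandard $N \leq n, m \leq F(N)$.
- (asymptotic stability) One has $d(x_n, x_m) = o(1)$ for all unbounded $n, m$.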
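Proof: The equivalence of 1 and 2 follows from the transfer principle (or Los’s theorem), as does the equivalence of 3 and 4. The implication of 3 from 1 is also clear. Finally, suppose that 1 failed. Then there is a standard $\varepsilon > 0$ such that for every standard $N$ we can find a larger number $F(N) > N$ such that $d(x_N, x_{F(N)}) > \varepsilon$ (after replacing $\varepsilon$ by $\varepsilon/2$ and using the triangle inequality, if necessary). With this choice of $F$, we see that 3 fails also.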
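If 1 holds, then from transfer we see that for any unbounded $n, m$, one has $d(x_n, x_m) \leq \varepsilon$ for every standard $\varepsilon > 0$, giving 5. Conversely, if 1 fails, then letting $\varepsilon$ and $F$ be as before, we see from transfer that $d(x_N, x_{F(N)}) > \varepsilon$ for every nonstandard $N$. Taking $N$ unbounded (so that $F(N) > N$ is unbounded as well), this contradicts 5.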
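Condition 3 can be tested mechanically on a finite stretch of a standard sequence, which is one way to see why metastability is a finitary notion. The sketch below uses hypothetical names and handles real-valued sequences only. It searches for the least $N$ such that all terms $x_n$ with $N \leq n \leq F(N)$ lie within $\varepsilon$ of one another. Since it inspects only finitely many terms, it illustrates the definition rather than deciding convergence.

```python
from typing import Callable, Optional, Sequence


def metastable_N(x: Sequence[float], F: Callable[[int], int], eps: float,
                 max_N: int) -> Optional[int]:
    """Least N in [1, max_N] with |x_n - x_m| <= eps for all N <= n, m <= F(N),
    or None if there is none (or the data run out). x is indexed from 1,
    so x[0] is ignored."""
    for N in range(1, max_N + 1):
        hi = F(N)
        if hi >= len(x):
            break  # not enough terms to inspect this window
        window = x[N:hi + 1]
        if max(window) - min(window) <= eps:  # diameter of the window
            return N
    return None


x = [0.0] + [1.0 / n for n in range(1, 1001)]
print(metastable_N(x, lambda N: 2 * N, 0.011, 400))
```

For $x_n = 1/n$, $F(N) = 2N$ and $\varepsilon = 0.011$ the window has diameter $\frac{1}{2N}$, so the search returns $N = 46$. For $x_n = (-1)^n$, every window of length at least two has diameter $2$, so with $F(N) = 2N$ the search returns `None` for any $\varepsilon < 2$.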
Now we consider more general sequences, in which the above notions of convergence begin to diverge:
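Definition 6 Let $X = \prod_{\mathbf{n} \to \alpha} X_{\mathbf{n}}$ be a nonstandard pseudometric space (i.e. the ultraproduct of standard pseudometric spaces $(X_{\mathbf{n}}, d_{\mathbf{n}})$; in particular, the pseudometric $d$ takes values in ${}^*[0,+\infty)$ rather than $[0,+\infty)$). Let $(x_n)_{n \in {}^*{\bf N}}$ be a nonstandard sequence (or internal sequence) in $X$, that is to say a nonstandard map (or internal map) from ${}^*{\bf N}$ to $X$ (and thus an ultralimit of maps from ${\bf N}$ to $X_{\mathbf{n}}$).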
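- We say that the sequence $(x_n)$ is internally Cauchy if for every nonstandard $\varepsilon > 0$, there exists a nonstandard $N$ such that $d(x_n, x_m) \leq \varepsilon$ for all nonstandard $n, m \geq N$.
- We say that the sequence $(x_n)$ is externally Cauchy or metastable if for every standard $\varepsilon > 0$, there exists a standard $N$ such that $d(x_n, x_m) \leq \varepsilon$ for all standard $n, m \geq N$.
- We say that the sequence $(x_n)$ is asymptotically stable if $d(x_n, x_m) = o(1)$ whenever $n, m$ are unbounded.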
These three notions are now distinct, even for a simple nonstandard metric space such as the ultrapower of the unit interval with the usual metric, as the following examples demonstrate:
- If $H$ is an unbounded natural number, then the nonstandard sequence $x_n := 1_{n \hbox{ odd}} 1_{n \leq H}$ is internally Cauchy, but not externally Cauchy or asymptotically stable.
- If $H$ is an unbounded natural number, then the nonstandard sequence $x_n := 1_{n \geq H}$ is internally and externally Cauchy, but not asymptotically stable.
- If $H$ is an unbounded natural number, then the nonstandard sequence $x_n := 1_{n \hbox{ odd}} 1_{n \geq H}$ is externally Cauchy, but not internally Cauchy or asymptotically stable.
- If $H$ is an unbounded natural number, then the nonstandard sequence $x_n := \frac{1}{H} 1_{n \hbox{ odd}}$ is asymptotically stable and externally Cauchy, but not internally Cauchy.
- Any monotone bounded nonstandard sequence of nonstandard reals is automatically both externally Cauchy and internally Cauchy, but is not necessarily asymptotically stable, as the example $x_n := 1_{n \geq H}$ above shows.
- The property of being externally Cauchy depends only on an initial segment of the sequence: if $(x_n)$ is externally Cauchy, and one modifies $x_n$ arbitrarily for $n \geq H$ and some fixed unbounded $H$, then the modified sequence will still be externally Cauchy. The same claim is certainly not true for the notions of internally Cauchy or asymptotically stable, as can be seen by considering examples such as $x_n := 0$, $x_n := 1_{n \geq H}$ and $x_n := 1_{n \hbox{ odd}} 1_{n \geq H}$.
- The property of being externally Cauchy is closed under (external) uniform limits: if $(x_n)$ is a nonstandard sequence such that for every standard $\varepsilon > 0$ one can find an externally Cauchy sequence $(y_n)$ with $d(x_n, y_n) \leq \varepsilon$ for all $n$, then $(x_n)$ is itself externally Cauchy. The same claim holds as well for asymptotic stability, but not for the internal Cauchy property (unless one allows $\varepsilon$ to be nonstandard).
One can equate these three nonstandard notions of convergence with standard notions as follows:
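Proposition 7 Let $X = \prod_{\mathbf{n} \to \alpha} X_{\mathbf{n}}$ be a nonstandard pseudometric space (the ultraproduct of standard pseudometric spaces $(X_{\mathbf{n}}, d_{\mathbf{n}})$), and let the nonstandard sequence $(x_n)_{n \in {}^*{\bf N}}$ be the ultralimit of standard sequences $(x_{n,\mathbf{n}})_{n \in {\bf N}}$.
- The nonstandard sequence $(x_n)$ is internally Cauchy if and only if the standard sequences $(x_{n,\mathbf{n}})_{n \in {\bf N}}$ are Cauchy for all $\mathbf{n}$ sufficiently close to $\alpha$.
- The nonstandard sequence $(x_n)$ is externally Cauchy if and only if for every standard $\varepsilon > 0$ and standard $F: {\bf N} \to {\bf N}$, there exists a standard $N$ such that $d_{\mathbf{n}}(x_{n,\mathbf{n}}, x_{m,\mathbf{n}}) \leq \varepsilon$ for all $N \leq n, m \leq F(N)$ and all $\mathbf{n}$ sufficiently close to $\alpha$.
- The nonstandard sequence $(x_n)$ is asymptotically stable if and only if for every standard $\varepsilon > 0$, there exists a standard $N$ such that, for all $\mathbf{n}$ sufficiently close to $\alpha$, one has $d_{\mathbf{n}}(x_{n,\mathbf{n}}, x_{m,\mathbf{n}}) \leq \varepsilon$ for all standard $n, m \geq N$.
- The nonstandard sequence $(x_n)$ is externally Cauchy if and only if there exists an unbounded $H$ such that $(x_n)$ is asymptotically stable up to $H$, in the sense that $d(x_n, x_m) = o(1)$ for all unbounded $n, m \leq H$.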
Informally: internally Cauchy sequences are ultralimits of sequences that are Cauchy; externally Cauchy sequences are ultralimits of sequences that are uniformly metastable for an asymptotically infinite period of time; and asymptotically stable sequences are ultralimits of sequences that converge at a uniform rate for an asymptotically infinite period of time.
Proof: Claim 1 follows directly from the transfer principle. Claim 2 follows from the equivalence of parts 1 and 3 of Proposition 5, applied to the standard portion $(x_n)_{n \in {\bf N}}$ of the sequence (replacing the nonstandard metric by its standard part). Next, we verify Claim 3. If $(x_n)$ is asymptotically stable, then for every standard $\varepsilon > 0$, we have $d(x_n, x_m) \leq \varepsilon$ for all unbounded $n, m$. By the overspill principle, there is a standard $N$ such that $d(x_n, x_m) \leq \varepsilon$ for all $n, m \geq N$, which by transfer gives the “only if” portion of Claim 3. Reversing these steps gives the “if” direction.
To show Claim 4, observe that if $(x_n)$ is externally Cauchy, then for every standard $\varepsilon > 0$, one has $d(x_n, x_m) \leq \varepsilon$ for all sufficiently large standard $n, m$. Thus by overspill there is an unbounded $H_\varepsilon$ such that $d(x_n, x_m) \leq \varepsilon$ for all unbounded $n, m \leq H_\varepsilon$. By overspill (or countable saturation) one can find an unbounded $H$ such that $H \leq H_{1/k}$ for every standard $k \geq 1$, giving the “only if” direction. The “if” implication follows by reversing the steps.
From these equivalences one sees that asymptotic stability implies externally Cauchy, but as the above counterexamples show, there are no other implications between the three concepts.
Of the three notions of convergence for nonstandard sequences, we will focus almost exclusively on the notion of external Cauchy convergence, which at the finitary level corresponds to uniform metastability bounds (as opposed to qualitative convergence, or convergence at a uniform rate). In particular, we will deduce Walsh’s theorem from the following nonstandard version:
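Theorem 8 (Walsh’s theorem, nonstandard version) Let $({\mathcal A}, \tau)$ be a nonstandard commutative probability space (i.e. the ultraproduct of standard commutative probability spaces), and let $G$ be a nilpotent nonstandard group acting on ${\mathcal A}$ by isomorphisms. Let $g_1,\dots,g_k: {}^*{\bf Z} \to G$ be polynomial nonstandard functions (i.e. each $g_i$ takes the form $g_i(n) = a_1^{p_1(n)} \cdots a_j^{p_j(n)}$ for some standard $j$, some $a_1,\dots,a_j \in G$ and some standard polynomials $p_1,\dots,p_j: {\bf Z} \to {\bf Z}$). Then for any elements $f_1,\dots,f_k \in {\mathcal A}$ which are bounded (in the sense that $\rho(f_1),\dots,\rho(f_k)$ are bounded), the averages
$$\frac{1}{N}\sum_{n=1}^N (g_1(n) f_1) \cdots (g_k(n) f_k)$$
form an externally Cauchy sequence with respect to the (nonstandard) $L^2(\tau)$ pseudometric.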
From Proposition 5, Theorem 8 implies Theorem 4 and thus Theorem 3. But it is actually somewhat stronger, in that it gives a uniform metastability on the averages occurring in those latter two theorems. (This uniform metastability was already derived in Walsh’s original paper, and I did something similar in the special case of linear commuting averages.) This uniformity ultimately comes from the fact that in the above theorem, the polynomial sequences are allowed to have nonstandard coefficients, rather than just standard ones (and the space $({\mathcal A}, \tau)$ is a general nonstandard space, rather than an ultrapower).
Remark 1 Since the original appearance of this post, it was essentially observed in this preprint of Avigad and Iovino that the original result in Theorem 4 implies the special case of Theorem 8 in the case when the polynomials $g_1,\dots,g_k$ have standard coefficients, as one can use the standard part construction to project the nonstandard commutative probability space to a standard commutative probability space. As such, Theorem 4 automatically implies a metastable version of itself, in which the metastability bound is allowed to depend on the coefficients of the polynomials as well as the degree. The result in Theorem 8 is then apparently stronger because the metastability bound obtained in the finitary setting is also uniform in the choice of coefficients of the polynomials. However, by lifting the group $G$ to a higher rank nilpotent group, one can replace the action of any finite number of polynomials with unbounded coefficients with lifted polynomials with bounded coefficients (this trick dates back to the book of Furstenberg). So it turns out that Theorem 8 is ultimately equivalent to Theorem 4 (and also to Theorem 3, using the structural theory of commutative probability spaces).
From the definition of external Cauchy convergence, it is clear that if $(x_n)$, $(y_n)$ are two externally Cauchy convergent sequences of (nonstandard) reals, then their sum $(x_n + y_n)$ is also externally Cauchy convergent. More generally, any (standard) finite linear combination (with standard real coefficients) of externally Cauchy convergent sequences of nonstandard reals is also externally Cauchy convergent. A key property for us is that external Cauchy convergence is also preserved by hyperfinite averages involving a nonstandardly finite number of sequences:
Proposition 9 (Metastable dominated convergence theorem) Let $A$ be a non-empty nonstandard finite set (i.e. the ultraproduct of standard finite sets), and let $(x_{n,a})_{n \in {}^*{\bf N}, a \in A}$ be an internal family of internal sequences $(x_{n,a})_{n \in {}^*{\bf N}}$ of bounded elements of a nonstandard normed vector space. If the sequences $(x_{n,a})_{n \in {}^*{\bf N}}$ are externally Cauchy convergent for each $a \in A$, then the (nonstandardly) averaged sequence $\big(\frac{1}{|A|}\sum_{a \in A} x_{n,a}\big)_{n \in {}^*{\bf N}}$ is also externally Cauchy convergent.
This is an infinitary version of the finitary metastable dominated convergence theorem that first appeared in this paper of mine, which roughly speaking claims that the average of uniformly metastable bounded sequences is again metastable. The proof was infinitary (deducing it from the Lebesgue dominated convergence theorem), and we will take a similar approach here. The argument was eventually finitised (and strengthened) in this paper of Avigad, Dean, and Rute, but the finitary argument is surprisingly non-trivial.
Proof: As each $x_{n,a}$ is individually bounded (i.e. smaller in norm than any unbounded natural number), and depends internally on $n$ and $a$, we see from overspill that there is a uniform bound $\|x_{n,a}\| \leq M$ for some standard natural number $M$.
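For each standard $\varepsilon > 0$ and standard natural number $N$, let $E_{\varepsilon,N}$ denote the subset of $A$ given by the formula
$$E_{\varepsilon,N} := \{ a \in A: \|x_{n,a} - x_{m,a}\| \leq \varepsilon \hbox{ for all standard } n, m \geq N \}.$$
These sets are not internal subsets of $A$, but are instead countable intersections of internal sets (one for each pair of standard $n, m \geq N$). In particular, they are still Loeb measurable subsets of $A$ and thus have a well-defined Loeb measure $\mu(E_{\varepsilon,N})$.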
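By hypothesis, we see that for any fixed $\varepsilon$, the sets $E_{\varepsilon,N}$ increase to $A$ in the sense that $E_{\varepsilon,1} \subset E_{\varepsilon,2} \subset \cdots$ and $\bigcup_N E_{\varepsilon,N} = A$. By monotone convergence, we conclude that there exists a standard $N$ such that $\mu(E_{\varepsilon,N}) \geq 1 - \varepsilon$. We then have for standard $n, m \geq N$ that
$$\Big\| \frac{1}{|A|}\sum_{a \in A} x_{n,a} - \frac{1}{|A|}\sum_{a \in A} x_{m,a} \Big\| \leq \frac{1}{|A|}\sum_{a \in A} \|x_{n,a} - x_{m,a}\| \leq \varepsilon + 2M\varepsilon + o(1).$$
Indeed, the internal set $\{ a \in A: \|x_{n,a} - x_{m,a}\| > \varepsilon \}$ is disjoint from $E_{\varepsilon,N}$, so its normalised cardinality is at most $\varepsilon + o(1)$, while every summand is at most $2M$. As $\varepsilon$ can be arbitrarily small, this gives the external Cauchy convergence of $\frac{1}{|A|}\sum_{a \in A} x_{n,a}$ as desired.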
Remark 2 This proof is significantly shorter than the finitary proof of Avigad, Dean, and Rute, but the complexity has been concealed in the construction of Loeb measure and the monotone convergence theorem. This is typically how nonstandard analysis arguments work: they are unable to magically make the “hard” component of an argument disappear entirely, but they are often able to efficiently conceal such components in fundamental building blocks which are of independent interest, and which can be usefully applied as a black box to a wide spectrum of problems. (In contrast, a hard step in a finitary argument often needs to be reworked each time one wishes to apply it to a new problem.)
— 2. A simple case: the von Neumann ergodic theorem —
Before we prove Theorem 8, let us first warm up by establishing an easy case, namely the nonstandard version of the von Neumann ergodic theorem (Theorem 2):
Theorem 10 (Nonstandard von Neumann mean ergodic theorem) Let $V$ be a nonstandard inner product space (i.e. the ultraproduct of standard inner product spaces), and let $U: V \to V$ be a nonstandard unitary operator on $V$. Then for any bounded $f \in V$, the averages $\frac{1}{N}\sum_{n=1}^N U^n f$ are externally Cauchy in $V$.
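We first observe that if one takes the bounded elements of $V$ and forms the Hilbert space completion using the standard norm
$$\|f\|_{\overline{V}} := \operatorname{st}\|f\|_V,$$
one obtains a Hilbert space $\overline{V}$. Thanks to the bound
$$\Big\|\frac{1}{N}\sum_{n=1}^N U^n f\Big\|_V \leq \|f\|_V$$
for all bounded $f$ in $V$, we see that these ergodic averages can be defined in $\overline{V}$. So it suffices to show that the averages $\frac{1}{N}\sum_{n=1}^N U^n f$ are externally Cauchy in $\overline{V}$ for all $f \in \overline{V}$.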
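Let us first investigate a condition that would force $\frac{1}{N}\sum_{n=1}^N U^n f$ to be asymptotically stable in $\overline{V}$. We expand out
$$\Big\|\frac{1}{N}\sum_{n=1}^N U^n f\Big\|_{\overline{V}}^2 = \Big\langle \frac{1}{N}\sum_{n=1}^N U^n f, \frac{1}{N}\sum_{m=1}^N U^m f \Big\rangle_{\overline{V}}$$
(where all expressions have been extended to nonstandard values of $n$ or $N$ in the usual fashion, and operations are extended from the bounded elements of $V$ to $\overline{V}$ by continuity). With a little bit of rearrangement, this expression can be rewritten as $\langle f, {\mathcal D}_N f \rangle_{\overline{V}}$, where the dual function ${\mathcal D}_N g$ for any $g \in \overline{V}$ is defined by the formula
$${\mathcal D}_N g := \frac{1}{N}\sum_{n=1}^N U^{-n} \Big( \frac{1}{N}\sum_{m=1}^N U^m g \Big).$$
(The terminology of dual functions originates from this paper of Ben Green and myself.) Thus, let $W$ be the linear span in $\overline{V}$ of all functions of the form ${\mathcal D}_N g$ with $N$ unbounded and $g \in \overline{V}$. If $f$ is orthogonal to $W$, then $\frac{1}{N}\sum_{n=1}^N U^n f$ vanishes in $\overline{V}$ for any unbounded $N$. This makes $\frac{1}{N}\sum_{n=1}^N U^n f$ asymptotically stable, and thus externally Cauchy, in $\overline{V}$ norm.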
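The “little bit of rearrangement” is the adjoint identity ${\mathcal D}_N = A_N^* A_N$ with $A_N := \frac{1}{N}\sum_{n=1}^N U^n$, using $U^* = U^{-1}$. As a numerical sanity check, the following sketch uses a toy model chosen purely for illustration: $U$ rotates the first two coordinates of ${\bf R}^3$ by one radian and fixes the third. It confirms $\|A_N f\|^2 = \langle f, {\mathcal D}_N f\rangle$. It also shows that for a unit vector orthogonal to the fixed axis (and hence to the invariant subspace), $\|A_N f\|$ is at most $1/(N \sin(1/2))$. Names are illustrative.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def U_pow(n: int, v: Vec, theta: float = 1.0) -> Vec:
    """Apply U^n, where U rotates the first two coordinates by theta and fixes
    the third: an orthogonal (hence unitary) operator on R^3."""
    c, s = math.cos(n * theta), math.sin(n * theta)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)


def average(N: int, v: Vec, sign: int = 1) -> Vec:
    """(1/N) * sum_{n=1}^N U^{sign * n} v."""
    acc = [0.0, 0.0, 0.0]
    for n in range(1, N + 1):
        acc = [a + b for a, b in zip(acc, U_pow(sign * n, v))]
    return (acc[0] / N, acc[1] / N, acc[2] / N)


def dual(N: int, g: Vec) -> Vec:
    """D_N g = (1/N) sum_n U^{-n} ((1/N) sum_m U^m g), i.e. A_N^* A_N g."""
    return average(N, average(N, g), sign=-1)


def inner(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))


f = (0.3, -1.2, 2.0)
a = average(50, f)
print(inner(a, a), inner(f, dual(50, f)))  # agree up to rounding
```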
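In view of this fact, the existence of an orthogonal projection to the closure of $W$, and the linearity of $\frac{1}{N}\sum_{n=1}^N U^n f$ in $f$, it suffices to show that for any $f$ in the closure of $W$ in $\overline{V}$, the expression $\frac{1}{N}\sum_{n=1}^N U^n f$ is externally Cauchy in $\overline{V}$.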
Remark 3 This step used the existence of orthogonal projections from a Hilbert space to a closed subspace, and is closely related to an analogous use of such projections in the textbook proof of Theorem 2 (see e.g. the proof of Theorem 2 in these blog notes). In the finitary argument of Walsh, one uses instead a decomposition established by Gowers using the Hahn-Banach theorem as a substitute for orthogonal projections. See also the “Hilbert space finite convergence principle” from this blog post for a closely related link between orthogonal projections and quantitative decompositions.
Having eliminated the “pseudorandom case” when $f$ is orthogonal to $W$, we have now reduced to the “structured case” when $f$ lies in the closure of $W$. By linearity and an approximation argument, we may reduce to the case when $f$ is just a single dual function ${\mathcal D}_M g$ for some $g \in \overline{V}$ and unbounded $M$. By a further density and approximation argument we can assume that $g$ is a bounded element of $V$, rather than a general element of $\overline{V}$. (Incidentally, the nonstandard analysis formalism is painlessly skipping over a certain amount of epsilon management here which is much more visible in the finitary version of the argument.) Thus, our task is now to show that the expression $\frac{1}{N}\sum_{n=1}^N U^n {\mathcal D}_M g$ is externally Cauchy in $\overline{V}$.
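We expand
$$U^n {\mathcal D}_M g = \frac{1}{M}\sum_{m=1}^M U^{n-m} \Big( \frac{1}{M}\sum_{m'=1}^M U^{m'} g \Big).$$
Fixing $M$, we now restrict to the regime $N = o(M)$, thus $N \leq \varepsilon M$ for all standard $\varepsilon > 0$. Then $n = o(M)$ as well for $1 \leq n \leq N$. We can then shift the range $1 \leq m \leq M$ by $n$ by making the substitution $m = \tilde m + n$. If we then return $\tilde m$ to the range $1 \leq \tilde m \leq M$, this creates an error of $\overline{V}$ norm $O(n/M) = o(1)$, and so
$$U^n {\mathcal D}_M g = {\mathcal D}_M g$$
in $\overline{V}$ for all $1 \leq n \leq N$. In particular, $\frac{1}{N}\sum_{n=1}^N U^n {\mathcal D}_M g = {\mathcal D}_M g$ in $\overline{V}$ whenever $N = o(M)$. So this sequence is asymptotically stable up to any unbounded $H = o(M)$ (for instance $H := \lfloor \sqrt{M} \rfloor$), and thus externally Cauchy as required.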
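The shift step can be made quantitative. For $0 \leq n \leq M$ the two sums differ in exactly $2n$ terms, each of norm at most $\|g\|/M$, so $\|U^n {\mathcal D}_M g - {\mathcal D}_M g\| \leq 2n\|g\|/M$, which is $o(1)$ when $n = o(M)$. The small sketch below checks this bound. It assumes, purely for illustration, that the unitary operator is the cyclic shift on ${\bf R}^7$, and the names are hypothetical.

```python
import math
from typing import List, Sequence


def shift(v: Sequence[float], n: int) -> List[float]:
    """(U^n v)(x) = v(x - n) for the cyclic shift U on R^p, a unitary operator."""
    p = len(v)
    return [v[(x - n) % p] for x in range(p)]


def avg_shifts(v: Sequence[float], M: int, sign: int) -> List[float]:
    """(1/M) * sum_{m=1}^M U^{sign * m} v."""
    acc = [0.0] * len(v)
    for m in range(1, M + 1):
        acc = [a + b for a, b in zip(acc, shift(v, sign * m))]
    return [a / M for a in acc]


def dual(g: Sequence[float], M: int) -> List[float]:
    """D_M g = (1/M) sum_m U^{-m} A_M g, with A_M g = (1/M) sum_m U^m g."""
    return avg_shifts(avg_shifts(g, M, +1), M, -1)


def norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(a * a for a in v))


g = [1.0, -2.0, 0.5, 3.0, 0.0, -1.0, 2.5]
M = 40
D = dual(g, M)
for n in range(0, 11):
    err = norm([a - b for a, b in zip(shift(D, n), D)])
    print(n, err, 2 * n * norm(g) / M)  # observed error vs. the bound 2n||g||/M
```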
— 3. Descent —
Theorem 8 is proven by an induction on the “complexity” of the $g_1,\dots,g_k$. Fix $({\mathcal A}, \tau)$ and the action of the nilpotent nonstandard group $G$. Given any finite tuple $\vec g = (g_1,\dots,g_k)$ of internal functions from ${}^*{\bf Z}$ to $G$ (not necessarily polynomials), let us say that $\vec g$ is good if the conclusion of Theorem 8 holds, thus the averages
$$\frac{1}{N}\sum_{n=1}^N (g_1(n) f_1) \cdots (g_k(n) f_k)$$
form an external Cauchy sequence in $L^2(\tau)$ for all bounded $f_1,\dots,f_k \in {\mathcal A}$. Trivially, any permutation of a good tuple is good, and any tuple that consists entirely of copies of the constant function $1$ mapping ${}^*{\bf Z}$ to the group identity $1$ of $G$ is good. Furthermore, let $\vec g$ be a finite tuple, and let $\vec g'$ be obtained from $\vec g$ by removing duplicate functions (e.g. converting $(g_1, g_1, g_2)$ into $(g_1, g_2)$) and also removing all copies of $1$. Then $\vec g$ is good if and only if $\vec g'$ is good.
If the tuple $\vec g = (g_1,\dots,g_k)$ is non-empty (i.e. $k \geq 1$), then for any nonstandard integer $h$, we define the $h$-reduction $\vec g_h$ of $\vec g$ to be the tuple consisting of the functions $g_1,\dots,g_{k-1}$, together with the function $n \mapsto g_k(n) g_k(n+h)^{-1}$ and the functions $n \mapsto g_k(n) g_k(n+h)^{-1} g_i(n+h)$ for $1 \leq i \leq k-1$. (We will see why these particular functions arise in the argument shortly.) The key step in proving Theorem 8 is then the following result, reminiscent of the van der Corput lemma in ergodic theory (see e.g. this blog post).
Proposition 11 (Descent) Let $({\mathcal A}, \tau)$ be a nonstandard commutative probability space, and let $G$ be a nonstandard group acting on ${\mathcal A}$ by isomorphisms. Let $\vec g = (g_1,\dots,g_k)$ be a non-empty finite tuple of internal functions from ${}^*{\bf Z}$ to $G$. If $\vec g_h$ is good for every nonstandard integer $h$, then $\vec g$ is good.
Once one has this proposition, Theorem 8 will be an immediate consequence of the following combinatorial claim (and the remarks made at the beginning of this section).
Proposition 12 (PET induction) Let $G$ be a nilpotent nonstandard group. Then there exists a well-ordered set ${\mathcal W}$, together with a way to assign to each finite tuple $\vec g$ of polynomial nonstandard functions from ${}^*{\bf Z}$ to $G$ both a tuple $\vec g'$, which is a permutation of $\vec g$ after all duplicates and copies of $1$ have been removed, and a weight $w(\vec g)$ in ${\mathcal W}$, with the following property: if $\vec g'$ is non-empty, and $h$ is a nonstandard integer, then there is a permutation of $(\vec g')_h$ which has a strictly smaller weight than that of $\vec g$.
Indeed, once one has this proposition and Proposition 11, Theorem 8 follows by strong induction on the weight $w(\vec g)$.
We prove Proposition 11 in this section, and Proposition 12 in the next section.
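The proof of Proposition 11 closely mimics the proof of Theorem 10. Fix $({\mathcal A}, \tau)$, $G$, the tuple $\vec g = (g_1,\dots,g_k)$, and bounded elements $f_1,\dots,f_k \in {\mathcal A}$, and assume that $\vec g_h$ is good for all nonstandard integers $h$. We consider the averages
$$A_N(f_k) := \frac{1}{N}\sum_{n=1}^N (g_1(n) f_1) \cdots (g_{k-1}(n) f_{k-1}) (g_k(n) f_k) \qquad (1)$$
and our task is to show that the $A_N(f_k)$ form an external Cauchy sequence in $L^2(\tau)$.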
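We fix the bounded elements $f_1,\dots,f_{k-1}$, and largely work with manipulation of $f_k$. The map $f_k \mapsto A_N(f_k)$ is then a linear operator from the bounded elements of ${\mathcal A}$ to ${\mathcal A}$. We have the easily verified bound
$$\|A_N(f_k)\|_{L^2(\tau)} \leq \rho(f_1) \cdots \rho(f_{k-1}) \|f_k\|_{L^2(\tau)}.$$
Because of this, the linear operator $A_N$ can be uniquely continuously extended to a linear operator from $\overline{L^2}$ to $\overline{L^2}$, where $\overline{L^2}$ is defined as the Hilbert space completion of the bounded elements of ${\mathcal A}$ under the norm
$$\|f\|_{\overline{L^2}} := \operatorname{st}\|f\|_{L^2(\tau)}.$$
In particular, $\overline{L^2}$ quotients out all the elements of infinitesimal norm. In this Hilbertian formalism, the problem can now be viewed as one of establishing a weighted variant of Theorem 10.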
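Let us first investigate a condition that would force $A_N(f_k)$ to be asymptotically stable in $\overline{L^2}$. We expand out
$$\|A_N(f_k)\|_{\overline{L^2}}^2 = \langle A_N(f_k), A_N(f_k) \rangle_{\overline{L^2}}$$
(where all expressions have been extended to nonstandard values of $n$ or $N$ in the usual fashion, and operations are extended from the bounded elements of ${\mathcal A}$ to $\overline{L^2}$ by continuity). With a little bit of rearrangement (using the invariance of $\tau$ under the action of $G$), this expression can be rewritten as $\langle f_k, {\mathcal D}_N f_k \rangle_{\overline{L^2}}$, where the dual function ${\mathcal D}_N g$ for any $g \in \overline{L^2}$ is defined by the formula
$${\mathcal D}_N g := \frac{1}{N}\sum_{n=1}^N g_k(n)^{-1} \Big( (g_1(n) f_1) \cdots (g_{k-1}(n) f_{k-1}) A_N(g) \Big).$$
Thus, let $W$ be the linear span in $\overline{L^2}$ of all functions of the form ${\mathcal D}_N g$ with $N$ unbounded and $g \in \overline{L^2}$. If $f_k$ is orthogonal to $W$, then $A_N(f_k)$ vanishes in $\overline{L^2}$ for any unbounded $N$. This makes $A_N(f_k)$ asymptotically stable, and thus externally Cauchy, in $\overline{L^2}$ norm.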
In view of this fact, the existence of an orthogonal projection to the closure of $W$, and the linearity of $A_N(f_k)$ in $f_k$, it suffices to show that for any $f_k$ in the closure of $W$ in $\overline{L^2}$, the expression $A_N(f_k)$ is externally Cauchy in $\overline{L^2}$.
Having eliminated the “pseudorandom case” when $f_k$ is orthogonal to $W$, we have now reduced to the “structured case” when $f_k$ lies in the closure of $W$. By linearity and an approximation argument, we may reduce to the case when $f_k$ is just a single dual function ${\mathcal D}_M g$ for some $g \in \overline{L^2}$ and unbounded $M$. By a further density and approximation argument we can assume that $g$ is a bounded element of ${\mathcal A}$, rather than a general element of $\overline{L^2}$.
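Inspecting the definition (1) of $A_N$, we see that we need to understand the shifts $g_k(n) {\mathcal D}_M g$ for $1 \leq n \leq N$. It is here that we perform the “van der Corput” or “Weyl differencing” calculation that is pervasive in multiple recurrence theory. Namely, we expand
$$g_k(n) {\mathcal D}_M g = \frac{1}{M}\sum_{m=1}^M g_k(n) g_k(m)^{-1} \Big( (g_1(m) f_1) \cdots (g_{k-1}(m) f_{k-1}) A_M(g) \Big).$$
Fixing $M$, we now restrict to the regime $N = o(M)$, thus $N \leq \varepsilon M$ for all standard $\varepsilon > 0$. Then $n = o(M)$ as well. We can then shift the range $1 \leq m \leq M$ by $n$ by making the substitution $m = n + h$. If we then return $h$ to the range $1 \leq h \leq M$, this creates an error of $L^2(\tau)$ norm $O(n/M) = o(1)$, and so
$$g_k(n) {\mathcal D}_M g = \frac{1}{M}\sum_{h=1}^M g_k(n) g_k(n+h)^{-1} \Big( (g_1(n+h) f_1) \cdots (g_{k-1}(n+h) f_{k-1}) A_M(g) \Big)$$
(with both sides being interpreted in $\overline{L^2}$). (In the language of Walsh’s paper, this identity asserts that dual functions are reducible.) Substituting this into (1), and recalling the definition of the tuples $\vec g_h$, we thus obtain the “Weyl differencing identity”
$$A_N({\mathcal D}_M g) = \frac{1}{M}\sum_{h=1}^M \frac{1}{N}\sum_{n=1}^N \prod_{i=1}^{k-1} (g_i(n) f_i) \cdot \prod_{i=1}^{k-1} \big(g_k(n) g_k(n+h)^{-1} g_i(n+h) f_i\big) \cdot \big(g_k(n) g_k(n+h)^{-1} A_M(g)\big) \qquad (2)$$
in $\overline{L^2}$ whenever $N = o(M)$. The property of being externally Cauchy is unaffected by truncation to $N \leq H$ for any unbounded $H$, and in particular for an unbounded $H = o(M)$. Hence the left-hand side of (2) is externally Cauchy in $\overline{L^2}$ if and only if the right-hand side is. But by the induction hypothesis, for each $1 \leq h \leq M$ the sequence
$$N \mapsto \frac{1}{N}\sum_{n=1}^N \prod_{i=1}^{k-1} (g_i(n) f_i) \cdot \prod_{i=1}^{k-1} \big(g_k(n) g_k(n+h)^{-1} g_i(n+h) f_i\big) \cdot \big(g_k(n) g_k(n+h)^{-1} A_M(g)\big)$$
is externally Cauchy in $\overline{L^2}$. (This is the average attached to the good tuple $\vec g_h$ and the bounded elements $f_1,\dots,f_{k-1}, A_M(g), f_1,\dots,f_{k-1}$.) From Proposition 9 we see that the right-hand side of (2) is externally Cauchy in $\overline{L^2}$, and Proposition 11 follows.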
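The reducibility identity for dual functions can be checked in a toy standard model. The sketch below assumes $X = {\bf Z}/11{\bf Z}$ with uniform measure, $G = {\bf Z}$ acting by translation, and the hypothetical choices $k = 2$, $g_1(n) = n$, $g_2(n) = n^2$. In this abelian setting, $g_k(n) g_k(n+h)^{-1}$ is simply translation by $g_k(n) - g_k(n+h)$. For $0 \leq n \leq M$ the two sides differ by $2n$ summands, each of sup norm at most $C := \|f_1\|_\infty \|A_M(g)\|_\infty$ divided by $M$, so their $L^2$ distance should be at most $2nC/M$. All helper names are illustrative.

```python
import math
from typing import Callable, List, Sequence

Vec = List[float]
Seq = Callable[[int], int]


def act(a: int, f: Sequence[float]) -> Vec:
    """Translation action of a in Z on functions on Z/p: (a.f)(x) = f(x - a)."""
    p = len(f)
    return [f[(x - a) % p] for x in range(p)]


def mul(*fs: Sequence[float]) -> Vec:
    """Pointwise product of functions on Z/p."""
    out = [1.0] * len(fs[0])
    for f in fs:
        out = [o * v for o, v in zip(out, f)]
    return out


def mean_of(vecs: List[Vec]) -> Vec:
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def ergodic_avg(M: int, gs: Sequence[Seq], fs: Sequence[Vec], g: Vec) -> Vec:
    """A_M(g) = (1/M) sum_{m=1}^M (g_1(m)f_1)...(g_{k-1}(m)f_{k-1}) (g_k(m) g)."""
    *g_low, g_k = gs
    return mean_of([mul(*[act(gi(m), fi) for gi, fi in zip(g_low, fs)], act(g_k(m), g))
                    for m in range(1, M + 1)])


def dual_fn(M: int, gs: Sequence[Seq], fs: Sequence[Vec], g: Vec) -> Vec:
    """D_M g = (1/M) sum_m g_k(m)^{-1} ((g_1(m)f_1)...(g_{k-1}(m)f_{k-1}) A_M(g))."""
    *g_low, g_k = gs
    a = ergodic_avg(M, gs, fs, g)
    return mean_of([act(-g_k(m), mul(*[act(gi(m), fi) for gi, fi in zip(g_low, fs)], a))
                    for m in range(1, M + 1)])


def reduced_form(n: int, M: int, gs: Sequence[Seq], fs: Sequence[Vec], g: Vec) -> Vec:
    """(1/M) sum_{h=1}^M g_k(n) g_k(n+h)^{-1} ((g_1(n+h)f_1)...(g_{k-1}(n+h)f_{k-1}) A_M(g))."""
    *g_low, g_k = gs
    a = ergodic_avg(M, gs, fs, g)
    return mean_of([act(g_k(n) - g_k(n + h),
                        mul(*[act(gi(n + h), fi) for gi, fi in zip(g_low, fs)], a))
                    for h in range(1, M + 1)])


def l2(f: Sequence[float]) -> float:
    """L^2 norm for the uniform probability measure on Z/p."""
    return math.sqrt(sum(v * v for v in f) / len(f))


gs = [lambda n: n, lambda n: n * n]
fs = [[1.0, -1.0, 0.5, 2.0, 0.0, -0.5, 1.5, -2.0, 0.25, 1.0, -1.0]]
g = [0.5, 1.0, -1.0, 0.0, 2.0, -1.5, 0.5, 1.0, -0.5, 0.0, 1.0]
D = dual_fn(60, gs, fs, g)
for n in (0, 2, 5):
    print(n, l2([x - y for x, y in zip(act(gs[-1](n), D), reduced_form(n, 60, gs, fs, g))]))
```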
— 4. PET induction —
Now we prove Proposition 12, which will follow the general PET induction method first introduced by Bergelson. We prove the claim first for abelian groups (where there is an obvious notion of the “degree” of a polynomial sequence), and indicate at the end of the section how to modify the argument to handle nilpotent groups.
Henceforth the nonstandard abelian group $G$ is fixed. In the abelian case, we can take ${\mathcal W}$ to be the well-ordered set of tuples $w = (w_0, w_1, w_2, \dots)$ of standard non-negative integers with only finitely many of the $w_d$ non-zero, with the reverse lexicographical ordering. Thus $w < w'$ if there exists $d$ such that $w_d < w'_d$ and $w_{d'} = w'_{d'}$ for all $d' > d$.
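Concretely, comparing two weights means locating the largest index at which they differ. The minimal sketch below uses a hypothetical function name. No increase in the lower entries can compensate for a decrease in a higher entry, which is what allows a reduction step to create more polynomials while still lowering the weight.

```python
from itertools import zip_longest
from typing import Sequence


def revlex_less(w: Sequence[int], v: Sequence[int]) -> bool:
    """Reverse lexicographic order on finitely supported tuples (w_0, w_1, ...):
    w < v iff w_d < v_d at the largest index d where the tuples differ.
    Missing trailing entries are treated as zero."""
    for a, b in reversed(list(zip_longest(w, v, fillvalue=0))):
        if a != b:
            return a < b
    return False


print(revlex_less((100, 100), (0, 0, 1)))  # True: the top entry dominates
```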
It is easy to see that any polynomial nonstandard function $g: {}^*{\bf Z} \to G$ can be uniquely expressed in the discrete Taylor expansion form
$$g(n) = a_0 a_1^{\binom{n}{1}} \cdots a_d^{\binom{n}{d}}$$
for some finite number of group elements $a_0,\dots,a_d \in G$ with $a_d$ non-trivial (or with $d = -\infty$ if $g$ is trivial). We call $d$ the degree $\deg g$ of $g$. In the case that $G$ is the nonstandard integers ${}^*{\bf Z}$ with the additive group operation, this corresponds to the usual notion of the degree of a polynomial.
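For $G = {}^*{\bf Z}$ written additively, the expansion reads $g(n) = \sum_{j=0}^d a_j \binom{n}{j}$. The coefficients are the iterated differences $a_j = (\Delta^j g)(0)$, where $\Delta g(n) := g(n+1) - g(n)$. The sketch below recovers them for integer polynomials with standard coefficients. The names and the a priori degree bound `max_deg` are assumptions of the sketch.

```python
from math import comb
from typing import Callable, List


def taylor_coefficients(g: Callable[[int], int], max_deg: int = 10) -> List[int]:
    """Coefficients a_0, ..., a_d with g(n) = sum_j a_j * C(n, j), computed as
    a_j = (Delta^j g)(0) with Delta g(n) = g(n+1) - g(n). Assumes deg g <= max_deg."""
    vals = [g(n) for n in range(max_deg + 1)]
    coeffs = []
    while vals:
        coeffs.append(vals[0])
        vals = [b - a for a, b in zip(vals, vals[1:])]
    while coeffs and coeffs[-1] == 0:
        coeffs.pop()  # strip vanishing top coefficients
    return coeffs


def evaluate(coeffs: List[int], n: int) -> int:
    """Evaluate sum_j a_j * C(n, j) for n >= 0."""
    return sum(a * comb(n, j) for j, a in enumerate(coeffs))


def degree(g: Callable[[int], int], max_deg: int = 10) -> float:
    """deg g, with the convention that the trivial sequence has degree -infinity."""
    coeffs = taylor_coefficients(g, max_deg)
    return len(coeffs) - 1 if coeffs else float("-inf")


print(taylor_coefficients(lambda n: 2 * n * n))  # [0, 2, 4]: 2n^2 = 2*C(n,1) + 4*C(n,2)
```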
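We observe the ultratriangle inequality
$$\deg(g g') \leq \max(\deg g, \deg g'),$$
with the inequality being equality if $g, g'$ have different degree; we also have the symmetry property
$$\deg(g^{-1}) = \deg g.$$
Also, we observe the key fact that if $g$ is a non-trivial polynomial sequence, then for any $h \in {}^*{\bf Z}$, the derivative $\partial_h g$ defined by
$$\partial_h g(n) := g(n+h) g(n)^{-1}$$
is a polynomial sequence of strictly smaller degree.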
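We can now place an ultrametric on the polynomial sequences from ${}^*{\bf Z}$ to $G$, with the distance between two polynomials $g, g'$ defined as
$$d(g, g') := \deg(g^{-1} g'),$$
with the convention that $\deg 1 = -\infty$ (so that $d(g, g) = -\infty$). One easily verifies that the ultrametric axioms are obeyed.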
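Here is a quick numerical spot-check (not a proof) of the ultrametric inequality for $d$ and of the degree-lowering property of $\partial_h$, for a handful of integer polynomials. It takes $G = {\bf Z}$ written additively, so that $g^{-1}g'$ becomes $g' - g$ and $\partial_h g(n)$ becomes $g(n+h) - g(n)$. The helper names are hypothetical, and degrees are read off from iterated differences assuming degree at most 12.

```python
from typing import Callable

Poly = Callable[[int], int]


def deg(g: Poly, max_deg: int = 12) -> float:
    """Degree via iterated differences at 0 (assumes deg g <= max_deg);
    the zero sequence has degree -infinity."""
    vals = [g(n) for n in range(max_deg + 1)]
    d, j = float("-inf"), 0
    while vals:
        if vals[0] != 0:
            d = j  # last j with (Delta^j g)(0) != 0
        vals = [b - a for a, b in zip(vals, vals[1:])]
        j += 1
    return d


def derivative(g: Poly, h: int) -> Poly:
    """partial_h g(n) = g(n + h) - g(n), i.e. g(n+h) g(n)^{-1} written additively."""
    return lambda n: g(n + h) - g(n)


def dist(g: Poly, gp: Poly) -> float:
    """d(g, g') = deg(g' - g), the additive form of deg(g^{-1} g')."""
    return deg(lambda n: gp(n) - g(n))


polys = [lambda n: n * n, lambda n: n * n + n, lambda n: 2 * n * n, lambda n: 5]
print([[dist(a, b) for b in polys] for a in polys])
```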
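Example 1 If $G = {}^*{\bf Z}$, and we consider the four polynomials $g_1(n) := n^2$, $g_2(n) := n^2 + 1$, $g_3(n) := n^2 + n$, $g_4(n) := 2n^2$, then $g_4$ is separated from $g_1, g_2, g_3$ by a distance of $2$, $g_3$ is separated from $g_1, g_2$ by a distance of $1$, and $g_1, g_2$ are separated from each other by a distance of $0$.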
Now let $(g_1, \ldots, g_k)$ be a finite tuple of polynomial sequences in $G$. Selecting a reference polynomial $g$ (not necessarily in the tuple), we say that two polynomials $g_i, g_j$ are equivalent relative to $g$ if $d(g_i, g_j) < d(g_i, g)$. From the ultrametric property we see that this is an equivalence relation, and each equivalence class is a constant distance from $g$. We can then define the weight function $w((g_1, \ldots, g_k); g)$ of the tuple $(g_1, \ldots, g_k)$ relative to $g$ to equal $(w_0, w_1, w_2, \ldots)$, where $w_i$ is the number of equivalence classes that have distance exactly $2^i$ from $g$.
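The following Python sketch computes such a weight vector for $G = {\bf Z}$. It rests on three assumptions: two polynomials $p, q$ are equivalent relative to the reference $g$ exactly when $d(p,q) < d(p,g)$; distances have the form $2^{\deg(p-q)}$; and tuple entries equal to $g$ itself (distance $0$) are simply skipped. Polynomials are coefficient lists, and the names `deg_diff` and `weight_vector` are hypothetical.

```python
def deg_diff(p, q):
    """deg(p - q) for coefficient lists (p[k] is the coefficient of n**k), or None if p == q."""
    for k in range(max(len(p), len(q)) - 1, -1, -1):
        a = p[k] if k < len(p) else 0
        b = q[k] if k < len(q) else 0
        if a != b:
            return k
    return None


def weight_vector(polys, ref):
    """Return (w_0, w_1, ...): w_i counts equivalence classes at distance exactly 2**i from ref.

    p ~ q relative to ref iff d(p, q) < d(p, ref); since this is an equivalence
    relation, comparing p against one representative per class suffices.
    """
    classes = []  # pairs (log2 of the distance to ref, representative)
    for p in polys:
        e = deg_diff(p, ref)
        if e is None:
            continue  # p coincides with ref: distance 0, not counted
        for _, rep in classes:
            dd = deg_diff(p, rep)
            if dd is None or dd < e:  # d(p, rep) < d(p, ref)
                break
        else:
            classes.append((e, p))
    w = [0] * (max((e for e, _ in classes), default=-1) + 1)
    for e, _ in classes:
        w[e] += 1
    return tuple(w)


# n^2 and n^2 + n share a class at distance 4; n and 2n form separate classes at
# distance 2 (d(n, 2n) = 2 is not < 2); the constant 5 sits at distance 1.
print(weight_vector([[0, 0, 1], [0, 1, 1], [0, 1], [0, 2], [5]], []))  # (1, 2, 1)
```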
Example 2 Let
be as in Example 1. Relative to
,
are all equivalent and at distance
from
, so the weight function here is
. Relative instead to
, none of the
are equivalent, and they lie at distances
from
, so the weight function here is
. For the tuple
, the weight function relative to any one of these three polynomials is
.
Now let be a non-empty tuple of nonstandard polynomials. We form
by removing all duplicates and copies of
from
(starting from the left and moving right), and if
does not already have maximal degree amongst all the
, permute the tuple
in some arbitrary fashion to make this the case. We define the weight
of
to be the weight of the augmented tuple
relative to the final element
of the tuple:
Suppose , thus there are
equivalence classes intersecting
that are at a distance exactly
from
. We set
.
Now let be a nonstandard integer, and consider the
-reduction
where . We first consider the weight of the augmented tuple
relative to . Observe that for any
, one has
thus, relative to ,
and
are in the same equivalence class. As such, we see that the weight of the tuple (6) relative to
is equal to
, thus there are still
equivalence classes intersecting the tuple (6) that are distance exactly
from
. We remark for future reference that the abelian nature of
was not directly used in the above calculation.
Now let be the element of the tuple (6) which has the minimal distance
to
, and has maximal degree. The two requirements are compatible, as any element of the tuple that has degree less than that of
(which has the maximal degree by construction) necessarily has the maximal distance to
. The weight of (6) relative to
is then strictly smaller than the weight of (6) relative to
, because the weight function
at
is decreased by one, while the weight function at all values strictly greater than
is unchanged. (The weight function at values less than
can increase dramatically, but with the reverse lexicographical ordering this does not change the validity of the previous assertion.) Because of this, if we then permute
to place
at the end, then we see that
(note that removing duplicates and copies of
from
only serves to decrease the weight vector, not to increase it), and the claim follows.
Example 3 Suppose we start with the tuple
, whose weight vector is
. Performing an
-reduction, we obtain
with a weight vector now reduced to
. Note that
already has maximal degree and has minimal distance to
, so no additional permutation is needed at this stage. Performing another
-reduction, we obtain
but now we need to permute to move
(which has maximal degree and minimal distance to
) to the end, giving
with a weight vector now of
. Performing another
-reduction, we obtain
which after eliminating duplicates and moving
(which has maximal degree and minimal distance to
) to the end, gives
with a weight vector of
. Another
-reduction then gives
(note the elimination of all quadratic terms) which after eliminating duplicates becomes
with a weight vector of
. Performing yet another
-reduction gives
with a weight vector of
. Continuing this process, we will see that the linear terms will eventually all be eliminated, leaving only the constant terms, which can then be eliminated one at a time using further reduction until only the empty tuple remains. See also Walsh’s paper for several further examples of this reduction process, as well as some commentary on how the process can be speeded up somewhat if one observes that one can eliminate not only duplicate polynomials, but also polynomials which differ from an existing polynomial by a constant.
Finally, we address the case of a nilpotent group $G$, which will be a modification of the previous argument. The main issue is how to define degree properly. If $g: {}^*{\bf Z} \to G$ is a polynomial nonstandard sequence, then by many applications of (discrete analogues of) the Baker-Campbell-Hausdorff formula, we can (as before) place $g$ uniquely in the Taylor expansion form

$$g(n) = g_0 g_1^{\binom{n}{1}} \cdots g_d^{\binom{n}{d}} \qquad (7)$$

for some (standard) finite number of group elements $g_0, \ldots, g_d$ of $G$; see e.g. Exercise 11 of this previous blog post. In the abelian case, we used the largest $i$ for which $g_i$ was non-trivial as the degree of $g$. This turns out not to be a good choice in the nilpotent case, because the crucial ultratriangle property (3) does not hold for this concept of degree. For instance, if $G$ is a two-step nilpotent group, and $a, b$ are non-commuting elements of $G$, then the sequences $n \mapsto a^n$ and $n \mapsto b^n$ would ostensibly have degree $1$ with this definition, but the product

$$a^n b^n = (ab)^n [a,b]^{\binom{n}{2}},$$

where $[a,b] := a^{-1} b^{-1} a b$ is the commutator of $a$ and $b$, would then have degree $2$, thus contradicting (3). (The symmetry property (4) can also be shown to break down.)
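In a two-step nilpotent group one has $a^n b^n = (ab)^n [a,b]^{\binom{n}{2}}$ with $[a,b] = a^{-1}b^{-1}ab$, which is why the naive degree fails the ultratriangle inequality. This can be checked numerically in the discrete Heisenberg group of upper unitriangular $3 \times 3$ integer matrices, a standard two-step nilpotent example chosen here for illustration. The sketch encodes the matrix with entries $x, y$ just above the diagonal and $z$ in the corner as a triple $(x, y, z)$; the helper names are hypothetical.

```python
def mul(g, h):
    """Product in the Heisenberg group; (x, y, z) encodes [[1, x, z], [0, 1, y], [0, 0, 1]]."""
    x1, y1, z1 = g
    x2, y2, z2 = h
    return (x1 + x2, y1 + y2, z1 + z2 + x1 * y2)


def inv(g):
    """Inverse: (x, y, z)^-1 = (-x, -y, x*y - z)."""
    x, y, z = g
    return (-x, -y, x * y - z)


def power(g, k):
    """g^k for any integer k, by repeated multiplication."""
    result = (0, 0, 0)
    step = g if k >= 0 else inv(g)
    for _ in range(abs(k)):
        result = mul(result, step)
    return result


def commutator(g, h):
    """[g, h] = g^-1 h^-1 g h."""
    return mul(mul(inv(g), inv(h)), mul(g, h))


a, b = (1, 0, 0), (0, 1, 0)
c = commutator(a, b)
print(mul(a, b) != mul(b, a), c)  # True (0, 0, 1): a, b do not commute; c is central
for n in range(-8, 9):
    # a^n b^n = (ab)^n [a,b]^C(n,2): the coefficient of C(n,2) is c, not the identity
    assert mul(power(a, n), power(b, n)) == mul(power(mul(a, b), n), power(c, n * (n - 1) // 2))
```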
Fortunately, the theory of polynomial sequences in nilpotent groups has been understood since the work of Leibman. The trick is not to view the coefficients appearing above as roaming unrestrictedly in the whole $s$-step nilpotent group $G$, but to restrict some or all of these coefficients to subgroups in the lower central series

$$G = G_1 \geq G_2 \geq \cdots \geq G_s \geq G_{s+1} = \{1\},$$

defined by setting $G_1 := G$ and $G_{i+1} := [G, G_i]$ for all $i \geq 1$. Given natural numbers $d_1, \ldots, d_s$, we then say that a sequence $g$ has filtered degree at most $(d_1, \ldots, d_s)$ if, when using the Taylor expansion (7), we have $g_j \in G_{i+1}$ whenever $j > d_i$. Thus, for instance, if $s = 2$, $a \in G$ and $b \in G_2$, then the sequence $n \mapsto a^n b^{\binom{n}{2}}$ has filtered degree at most $(1, 2)$. A fundamental result of Leibman (proven for instance in this previous post) asserts that if the sequence $(d_1, \ldots, d_s)$ is superadditive in the sense that $d_i + d_j \leq d_{i+j}$ whenever $i + j \leq s$, then the collection of polynomial sequences of filtered degree at most $(d_1, \ldots, d_s)$ forms a group. A related fact is that if a sequence $g$ has filtered degree at most $(d_1, \ldots, d_s)$ for some superadditive $(d_1, \ldots, d_s)$, then any derivative $\partial_h g$ of $g$ has filtered degree at most $(d_1 - 1, \ldots, d_s - 1)$ (which is still superadditive).
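Here is a Python sketch of these notions in the discrete Heisenberg group ($s = 2$, with $G_2$ the centre), an illustrative choice of group. It assumes the filtered-degree condition reads "$g_j \in G_{i+1}$ whenever $j > d_i$" for the Taylor coefficients in (7), and it spot-checks over small sequences that lowering every $d_i$ by one preserves superadditivity. All helper names are hypothetical.

```python
from itertools import product


def is_superadditive(d):
    """d = (d_1, ..., d_s), stored 0-based; check d_i + d_j <= d_{i+j} whenever i + j <= s."""
    s = len(d)
    return all(d[i - 1] + d[j - 1] <= d[i + j - 1]
               for i in range(1, s + 1) for j in range(1, s + 1) if i + j <= s)


def shift_down(d):
    """Degree bound for a derivative: (d_1 - 1, ..., d_s - 1)."""
    return tuple(x - 1 for x in d)


def heisenberg_level(g):
    """Largest i with g in G_i, for the Heisenberg group encoded as (x, y, z):
    G_1 is the whole group, G_2 the centre {(0, 0, z)}, G_3 the identity."""
    x, y, z = g
    if (x, y, z) == (0, 0, 0):
        return 3
    return 2 if x == 0 and y == 0 else 1


def has_filtered_degree_at_most(coeffs, d, level):
    """coeffs = (g_0, g_1, ...) from the Taylor expansion; require g_j in G_{i+1} for j > d_i."""
    return all(level(g_j) >= i + 1
               for i, d_i in enumerate(d, start=1)
               for j, g_j in enumerate(coeffs) if j > d_i)


# Spot check: lowering every entry by one keeps a sequence superadditive.
for d in product(range(6), repeat=3):
    if is_superadditive(d):
        assert is_superadditive(shift_down(d))

# n -> a^n b^C(n,2) with a = (1, 0, 0) and b = (0, 0, 1) in the centre.
coeffs = [(0, 0, 0), (1, 0, 0), (0, 0, 1)]
print(has_filtered_degree_at_most(coeffs, (1, 2), heisenberg_level))  # True
print(has_filtered_degree_at_most(coeffs, (0, 2), heisenberg_level))  # False
```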
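If we let $\mathcal{D}$ be the set of all superadditive degree sequences, we can order such sequences lexicographically by declaring $(d_1, \ldots, d_s) < (d'_1, \ldots, d'_s)$ if there is an $i$ with $d_i < d'_i$ and $d_j = d'_j$ for all $j > i$. This makes $\mathcal{D}$ a well-ordered set, and then we can define the filtered degree $\deg(g)$ of a polynomial sequence $g$ to be the minimal element $\vec d = (d_1, \ldots, d_s)$ of $\mathcal{D}$ for which $g$ has filtered degree at most $\vec d$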
. Thus, for instance, in an
-step nilpotent group, a sequence
with
non-trivial would have filtered degree
. From Leibman’s results we then have the key properties (3), (4), and also that any derivative $\partial_h g$ of a non-trivial polynomial sequence $g$ has strictly smaller filtered degree.
Unfortunately, as filtered degrees are not numbers, we cannot define a real-valued ultrametric using the formula (5), but this is not a real difficulty; we simply declare an “ultrametric” taking values in the set of superadditive degree sequences (together with an additional symbol $-\infty$) instead of $[0, +\infty)$, by declaring $d(g, g') := \deg(g (g')^{-1})$ if $g, g'$ are distinct, and $d(g, g') := -\infty$ otherwise. If we view $-\infty$ as being smaller than any superadditive degree sequence, we see that the ultrametric axioms are still obeyed, and one can still run the argument more or less exactly as given above; we leave the details to the interested reader.
5 comments
26 October, 2012 at 4:03 am
Uwe Stroinski
What a nice surprise: a link to a picture of my (former) teacher Prof. Nagel on your blog.
My browser (Firefox) stretches some formulas and keeps making black bars instead of latex.
27 October, 2012 at 5:33 pm
Jeremy Avigad
Dear Terry,
Jason Rute and I recently noticed that in the case of the mean ergodic theorem, one has something even stronger than a uniform metastability result. Saying that a sequence is Cauchy is equivalent to saying that, for every $\varepsilon > 0$, there are at most finitely many “fluctuations” (or “jumps”) by more than $\varepsilon$. In the Hilbert space setting, a very elegant variational inequality due to Jones, Ostrovskii and Rosenblatt implies that a sequence of ergodic averages
has at most
many $\varepsilon$-fluctuations. The JOR inequality and this corollary provide a remarkably clean and uniform quantitative formulation of the MET.
Before learning of the JOR result, Jason and I had discovered a uniform bound on the number of $\varepsilon$-fluctuations that works in the more general setting of a uniformly convex Banach space. Our result is not sharp when specialized to the case of a Hilbert space, however, and we were unable to strengthen it to anything like the JOR inequality. This state of affairs is described in a paper, “Oscillation and the mean ergodic theorem,” that we recently posted to arXiv.
We are curious to know how far this quantitative uniformity extends. In particular, in your norm convergence result and Walsh’s more recent result, is there a uniform bound on the number of $\varepsilon$-fluctuations? Does anything like the JOR square function inequality carry over?
Incidentally, we took the term “$\varepsilon$-fluctuations” from a 1996 paper by Kachurovskii (which we cite). In an appendix he also considers nonstandard formulations of such uniformities. So maybe your nonstandard argument can be adapted to yield the stronger result?
Jeremy
27 October, 2012 at 8:15 pm
Terence Tao
Hmm. There are variational inequalities for some pointwise ergodic theorems (such as Bourgain’s pointwise ergodic theorem from 1988 for averages along polynomials or primes), but I don’t think they have been established yet for norm convergence for multiple averages. The techniques used to prove these estimates are rather different (based on harmonic analysis rather than on regularity/metastability arguments, or on characteristic factors). I don’t think the metastability arguments (or the nonstandard variant given in this post) can easily give variational estimates, because one would then need to control fluctuations for very large unbounded times, and the arguments here are optimised instead to only control all times up to a relatively small unbounded time, which is all that one needs for metastability. But it is certainly a good question…
28 October, 2012 at 12:16 pm
Terence Tao
Update: Christoph Thiele pointed me towards this recent paper of Do, Oberlin, and Palsson which establishes such a variational result for a dyadic version of a double average such as
. In practice, these dyadic harmonic analysis arguments can often be adapted to the non-dyadic case, though the details can get messier in the process. The arguments are quite different from those in the ergodic theory literature, though, relying on the same sort of time-frequency analysis used to control operators such as the bilinear Hilbert transform or Carleson’s maximal operator. It may be that one actually needs such “hard analysis” tools to get such a quantitative result, and that the softer tools used to prove, say, Walsh’s theorem, might not be suitable for variational estimates.
28 October, 2012 at 12:16 pm
Anonymous
A typo: Arbeitsgemeinshaft -> Arbeitsgemeinschaft
[Corrected, thanks – T.]