Asgar Jamneshan and I have just uploaded to the arXiv our paper “An uncountable Moore-Schmidt theorem”. This paper revisits a classical theorem of Moore and Schmidt in measurable cohomology of measure-preserving systems. To state the theorem, let $X = (X, \mathcal{X}, \mu)$ be a probability space, and $\mathrm{Aut}(X, \mathcal{X}, \mu)$ be the group of measure-preserving automorphisms of this space, that is to say the invertible bimeasurable maps $T: X \to X$ that preserve the measure $\mu$: $T_* \mu = \mu$. To avoid some ambiguity later in this post when we introduce abstract analogues of measure theory, we will refer to measurable maps as concrete measurable maps, and measurable spaces as concrete measurable spaces. (One could also call $X = (X, \mathcal{X}, \mu)$ a concrete probability space, but we will not need to do so here as we will not be working explicitly with abstract probability spaces.)
Let $\Gamma$ be a discrete group. A (concrete) measure-preserving action of $\Gamma$ on $X$ is a group homomorphism $\gamma \mapsto T^\gamma$ from $\Gamma$ to $\mathrm{Aut}(X, \mathcal{X}, \mu)$, thus $T^1$ is the identity map and $T^{\gamma_1} \circ T^{\gamma_2} = T^{\gamma_1 \gamma_2}$ for all $\gamma_1, \gamma_2 \in \Gamma$. A large portion of ergodic theory is concerned with the study of such measure-preserving actions, especially in the classical case when $\Gamma$ is the integers $\mathbf{Z}$ (with the additive group law).
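Let $K = (K,+)$ be a compact Hausdorff abelian group, which we can endow with the Borel $\sigma$-algebra $\mathcal{B}(K)$. A (concrete measurable) $K$-cocycle is a collection $\rho = (\rho_\gamma)_{\gamma \in \Gamma}$ of concrete measurable maps $\rho_\gamma: X \to K$ obeying the cocycle equation
$$\rho_{\gamma_1 \gamma_2}(x) = \rho_{\gamma_1} \circ T^{\gamma_2}(x) + \rho_{\gamma_2}(x) \qquad (1)$$
for $\mu$-almost every $x \in X$. (Here we are glossing over a measure-theoretic subtlety that we will return to later in this post – see if you can spot it before then!) Cocycles arise naturally in the theory of group extensions of dynamical systems; in particular (and ignoring the aforementioned subtlety), each cocycle induces a measure-preserving action $\gamma \mapsto S^\gamma$ on $X \times K$ (which we endow with the product of $\mu$ with Haar probability measure on $K$), defined by
$$S^\gamma(x, k) := (T^\gamma x, k + \rho_\gamma(x)).$$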
This connection with group extensions was the original motivation for our study of measurable cohomology, but is not the focus of the current paper.
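As a sanity check on the cocycle equation and the associated group extension, here is a minimal Python sketch of a toy model that is not taken from the paper. It assumes $\Gamma = \mathbf{Z}$ acting by translation on the finite space $X = \mathbf{Z}/6$ (with uniform measure), takes $K = \mathbf{Z}/4$, and builds a cocycle from an arbitrarily chosen function `f`. The names `T`, `rho`, `S`, `f` and the sizes `N`, `M` are illustrative assumptions. Since the group here is written additively, the cocycle equation reads $\rho_{a+b}(x) = \rho_a(T^b x) + \rho_b(x)$.

```python
N, M = 6, 4                      # toy sizes: X = Z/N, K = Z/M (assumptions)
f = [1, 3, 0, 2, 2, 1]           # arbitrary values of rho_1 on X (assumption)

def T(gamma, x):
    """Translation action of the integer gamma on X = Z/N."""
    return (x + gamma) % N

def rho(gamma, x):
    """Cocycle generated by f: an oriented sum of f along the orbit of x."""
    if gamma >= 0:
        return sum(f[T(i, x)] for i in range(gamma)) % M
    # For negative gamma, rho_gamma(x) = -rho_{-gamma}(T^gamma x).
    return -rho(-gamma, T(gamma, x)) % M

def S(gamma, point):
    """Skew product on X x K: (x, k) -> (T^gamma x, k + rho_gamma(x))."""
    x, k = point
    return (T(gamma, x), (k + rho(gamma, x)) % M)

RANGE = range(-7, 8)
POINTS = [(x, k) for x in range(N) for k in range(M)]

def cocycle_holds():
    """Check rho_{a+b}(x) = rho_a(T^b x) + rho_b(x) on a window of group elements."""
    return all(rho(a + b, x) == (rho(a, T(b, x)) + rho(b, x)) % M
               for a in RANGE for b in RANGE for x in range(N))

def skew_product_is_action():
    """Check S^a S^b = S^{a+b} and that each S^a is a bijection of X x K."""
    composes = all(S(a, S(b, p)) == S(a + b, p)
                   for a in RANGE for b in RANGE for p in POINTS)
    bijective = all(len({S(a, p) for p in POINTS}) == N * M for a in RANGE)
    return composes and bijective
```

Both `cocycle_holds()` and `skew_product_is_action()` should return `True`; in this finite setting a bijection of $X \times K$ automatically preserves the uniform product measure, and there are no null sets to worry about.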
A special case of a $K$-valued cocycle is a (concrete measurable) $K$-valued coboundary, in which $\rho_\gamma$ for each $\gamma \in \Gamma$ takes the special form
$$\rho_\gamma(x) = F \circ T^\gamma(x) - F(x)$$
for $\mu$-almost every $x \in X$, where $F: X \to K$ is some measurable function; note that (ignoring the aforementioned subtlety), every function of this form is automatically a concrete measurable $K$-valued cocycle. One of the first basic questions in measurable cohomology is to try to characterize which $K$-valued cocycles are in fact $K$-valued coboundaries. This is a difficult question in general. However, there is a general result of Moore and Schmidt that at least allows one to reduce to the model case when $K$ is the unit circle $\mathbf{T} = \mathbf{R}/\mathbf{Z}$, by taking advantage of the Pontryagin dual group $\hat K$ of characters $\hat k: K \to \mathbf{T}$, that is to say the collection of continuous homomorphisms $\hat k: k \mapsto \langle \hat k, k \rangle$ to the unit circle. More precisely, we have
Theorem 1 (Countable Moore-Schmidt theorem) Let $\Gamma$ be a discrete group acting in a concrete measure-preserving fashion on a probability space $X$. Let $K$ be a compact Hausdorff abelian group. Assume the following additional hypotheses:
- (i) $\Gamma$ is at most countable.
- (ii) $X$ is a standard Borel space.
- (iii) $K$ is metrisable.
Then a $K$-valued concrete measurable cocycle $\rho = (\rho_\gamma)_{\gamma \in \Gamma}$ is a concrete coboundary if and only if for each character $\hat k \in \hat K$, the $\mathbf{T}$-valued cocycles $\hat k \circ \rho = (\hat k \circ \rho_\gamma)_{\gamma \in \Gamma}$ are concrete coboundaries.
The hypotheses (i), (ii), (iii) are saying in some sense that the data $\Gamma, X, K$ are not too “large”; in all three cases they are saying in some sense that the data are only “countably complicated”. For instance, (iii) is equivalent to $K$ being second countable, and (ii) is equivalent to $X$ being modeled by a complete separable metric space. It is because of this restriction that we refer to this result as a “countable” Moore-Schmidt theorem. This theorem is a useful tool in several other applications, such as the Host-Kra structure theorem for ergodic systems; I hope to return to these subsequent applications in a future post.
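Let us very briefly sketch the main ideas of the proof of Theorem 1. Ignore for now issues of measurability, and pretend that something that holds almost everywhere in fact holds everywhere. The hard direction is to show that if each $\hat k \circ \rho$ is a coboundary, then so is $\rho$. By hypothesis, we then have an equation of the form
$$\hat k \circ \rho_\gamma(x) = \alpha_{\hat k} \circ T^\gamma(x) - \alpha_{\hat k}(x) \qquad (2)$$
for all $\hat k, \gamma, x$ and some functions $\alpha_{\hat k}: X \to \mathbf{T}$, and our task is then to produce a function $F: X \to K$ for which
$$\rho_\gamma(x) = F \circ T^\gamma(x) - F(x)$$
for all $\gamma, x$.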
Comparing the two equations, the task would be easy if we could find an $F: X \to K$ for which
$$\hat k \circ F(x) = \alpha_{\hat k}(x) \qquad (3)$$
for all $\hat k, x$. However there is an obstruction to this: the left-hand side of (3) is additive in $\hat k$, so the right-hand side would have to be also in order to obtain such a representation. In other words, for this strategy to work, one would have to first establish the identity
$$\alpha_{\hat k_1 + \hat k_2}(x) - \alpha_{\hat k_1}(x) - \alpha_{\hat k_2}(x) = 0 \qquad (4)$$
for all $\hat k_1, \hat k_2, x$. On the other hand, the good news is that if we somehow manage to obtain the equation (4), then we can obtain a function $F$ obeying (3), thanks to Pontryagin duality, which gives a one-to-one correspondence between $K$ and the homomorphisms of the (discrete) group $\hat K$ to $\mathbf{T}$.
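The role of Pontryagin duality here can be illustrated in the simplest possible case. The following Python sketch assumes the finite cyclic group $K = \mathbf{Z}/n$ (with $n \geq 2$), whose characters are $k \mapsto jk/n \bmod 1$ for $j \in \mathbf{Z}/n$; the helper names `character` and `recover_point` are hypothetical. Given a family of values in $\mathbf{R}/\mathbf{Z}$ that is additive in the character index, it recovers the unique point of $K$ producing those values, which is the pointwise content of passing from (4) to (3).

```python
from fractions import Fraction

def character(j, n):
    """The j-th character of K = Z/n, valued in T = R/Z (represented in [0, 1))."""
    return lambda k: Fraction(j * k, n) % 1

def recover_point(alpha, n):
    """Given alpha[j] in R/Z for j in Z/n, additive in j, return the k in Z/n
    with alpha[j] = j*k/n mod 1 for every j (assumes n >= 2)."""
    for i in range(n):
        for j in range(n):
            if (alpha[i] + alpha[j] - alpha[(i + j) % n]) % 1 != 0:
                raise ValueError("alpha is not additive in the character index")
    # Additivity forces n * alpha[1] to be an integer, which is the point k.
    return int(alpha[1] * n) % n

n = 12
for k in range(n):
    alpha = [character(j, n)(k) for j in range(n)]
    assert recover_point(alpha, n) == k
```

If the additivity check fails, no point of $K$ can represent the family, which mirrors the obstruction described above.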
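Now, it turns out that one cannot derive the equation (4) directly from the given information (2). However, the left-hand side of (2) is additive in $\hat k$, so the right-hand side must be also. Manipulating this fact, we eventually arrive at
$$(\alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2}) \circ T^\gamma(x) = (\alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2})(x).$$
In other words, we don’t get to show that the left-hand side of (4) vanishes, but we do at least get to show that it is $\Gamma$-invariant. Now let us assume for sake of argument that the action of $\Gamma$ is ergodic, which (ignoring issues about sets of measure zero) basically asserts that the only $\Gamma$-invariant functions are constant. So now we get a weaker version of (4), namely
$$\alpha_{\hat k_1 + \hat k_2}(x) - \alpha_{\hat k_1}(x) - \alpha_{\hat k_2}(x) = c_{\hat k_1, \hat k_2} \qquad (5)$$
for some constants $c_{\hat k_1, \hat k_2} \in \mathbf{T}$.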
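For readers who want the manipulation spelled out, here is one way to carry it out, under the assumption that (2) has the form $\hat k \circ \rho_\gamma = \alpha_{\hat k} \circ T^\gamma - \alpha_{\hat k}$; the abbreviation $\beta$ is introduced only for this sketch.

```latex
% Assumed form of (2): for every character \hat k and every \gamma,
%   \hat k \circ \rho_\gamma = \alpha_{\hat k} \circ T^\gamma - \alpha_{\hat k}.
\begin{align*}
(\hat k_1 + \hat k_2) \circ \rho_\gamma
  &= \hat k_1 \circ \rho_\gamma + \hat k_2 \circ \rho_\gamma \\
\alpha_{\hat k_1 + \hat k_2} \circ T^\gamma - \alpha_{\hat k_1 + \hat k_2}
  &= (\alpha_{\hat k_1} \circ T^\gamma - \alpha_{\hat k_1})
   + (\alpha_{\hat k_2} \circ T^\gamma - \alpha_{\hat k_2}) \\
\beta \circ T^\gamma &= \beta,
  \qquad \beta := \alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2}.
\end{align*}
```

The first line is the additivity of characters, the second substitutes (2) into each term, and the third collects the terms composed with $T^\gamma$ on one side.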
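Now we need to eliminate the constants. This can be done by the following group-theoretic projection. Let $L^0(X \to \mathbf{T})$ denote the space of concrete measurable maps $\alpha$ from $X$ to $\mathbf{T}$, up to almost everywhere equivalence; this is an abelian group where the various terms in (5) naturally live. Inside this group we have the subgroup $\mathbf{T}$ of constant functions (up to almost everywhere equivalence); this is where the right-hand side of (5) lives. Because $\mathbf{T}$ is a divisible group, there is an application of Zorn’s lemma (a good exercise for those who are not acquainted with these things) to show that there exists a retraction $w: L^0(X \to \mathbf{T}) \to \mathbf{T}$, that is to say a group homomorphism that is the identity on the subgroup $\mathbf{T}$. We can use this retraction, or more precisely the complement $\alpha \mapsto \alpha - w(\alpha)$, to eliminate the constant in (5). Indeed, if we set
$$\tilde \alpha_{\hat k}(x) := \alpha_{\hat k}(x) - w(\alpha_{\hat k})$$
then from (5) we see that
$$\tilde \alpha_{\hat k_1 + \hat k_2}(x) - \tilde \alpha_{\hat k_1}(x) - \tilde \alpha_{\hat k_2}(x) = 0$$
while from (2) one has
$$\hat k \circ \rho_\gamma(x) = \tilde \alpha_{\hat k} \circ T^\gamma(x) - \tilde \alpha_{\hat k}(x)$$
and now the previous strategy works with $\alpha_{\hat k}$ replaced by $\tilde \alpha_{\hat k}$. This concludes the sketch of proof of Theorem 1.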
In making the above argument rigorous, the hypotheses (i)-(iii) are used in several places. For instance, to reduce to the ergodic case one relies on the ergodic decomposition, which requires the hypothesis (ii). Also, most of the above equations only hold outside of a set of measure zero, and the hypothesis (i) and the hypothesis (iii) (which is equivalent to $\hat K$ being at most countable) are needed to avoid the problem that an uncountable union of sets of measure zero could have positive measure (or fail to be measurable at all).
My co-author Asgar Jamneshan and I are working on a long-term project to extend many results in ergodic theory (such as the aforementioned Host-Kra structure theorem) to “uncountable” settings in which hypotheses analogous to (i)-(iii) are omitted; thus we wish to consider actions of uncountable groups, on spaces that are not standard Borel, and cocycles taking values in groups that are not metrisable. Such uncountable contexts naturally arise when trying to apply ergodic theory techniques to combinatorial problems (such as the inverse conjecture for the Gowers norms), as one often relies on the ultraproduct construction (or something similar) to generate an ergodic theory translation of these problems, and these constructions usually give “uncountable” objects rather than “countable” ones. (For instance, the ultraproduct of finite groups is a hyperfinite group, which is usually uncountable.) This paper marks the first step in this project by extending the Moore-Schmidt theorem to the uncountable setting.
If one simply drops the hypotheses (i)-(iii) and tries to prove the Moore-Schmidt theorem, several serious difficulties arise. We have already mentioned the loss of the ergodic decomposition and the possibility that one has to control an uncountable union of null sets. But there is in fact a more basic problem when one deletes (iii): the addition operation $+: K \times K \to K$, while still continuous, can fail to be measurable as a map from $(K \times K, \mathcal{B}(K) \otimes \mathcal{B}(K))$ to $(K, \mathcal{B}(K))$! Thus for instance the sum of two measurable functions $F: X \to K$ need not remain measurable, which makes even the very definition of a measurable cocycle or measurable coboundary problematic (or at least unnatural). This phenomenon is known as the Nedoma pathology. A standard example arises when $K$ is the uncountable torus $\mathbf{T}^{\mathbf{R}}$, endowed with the product topology. Crucially, the Borel $\sigma$-algebra $\mathcal{B}(K)$ generated by this uncountable product is not the product $\mathcal{B}(\mathbf{T})^{\otimes \mathbf{R}}$ of the factor Borel $\sigma$-algebras (the discrepancy ultimately arises from the fact that topologies permit uncountable unions, but $\sigma$-algebras do not); relating to this, the product $\sigma$-algebra $\mathcal{B}(K) \otimes \mathcal{B}(K)$ is not the same as the Borel $\sigma$-algebra $\mathcal{B}(K \times K)$, but is instead a strict sub-algebra. If the group operations on $K$ were measurable, then the diagonal set
$$K^\Delta := \{ (k, k') \in K \times K: k = k' \} = \{ (k, k') \in K \times K: k - k' = 0 \}$$
would be measurable in $\mathcal{B}(K) \otimes \mathcal{B}(K)$. But it is an easy exercise in manipulation of $\sigma$-algebras to show that if $(X, \mathcal{X}), (Y, \mathcal{Y})$ are any two measurable spaces and $E \subset X \times Y$ is measurable in $\mathcal{X} \otimes \mathcal{Y}$, then the fibres $E_x := \{ y \in Y: (x,y) \in E \}$ of $E$ are contained in some countably generated subalgebra of $\mathcal{Y}$. Thus if $K^\Delta$ were $\mathcal{B}(K) \otimes \mathcal{B}(K)$-measurable, then all the points of $K$ would lie in a single countably generated $\sigma$-algebra. But the cardinality of such an algebra is at most $2^{\aleph_0}$ while the cardinality of $K$ is $2^{2^{\aleph_0}}$, and Cantor’s theorem then gives a contradiction.
To resolve this problem, we give $K$ a coarser $\sigma$-algebra than the Borel $\sigma$-algebra, namely the Baire $\sigma$-algebra $\mathcal{B}^\otimes(K)$, thus coarsening the measurable space structure on $K = (K, \mathcal{B}(K))$ to a new measurable space $K_\otimes := (K, \mathcal{B}^\otimes(K))$. In the case of compact Hausdorff abelian groups, $\mathcal{B}^\otimes(K)$ can be defined as the $\sigma$-algebra generated by the characters $\hat k: K \to \mathbf{T}$; for more general compact abelian groups, one can define $\mathcal{B}^\otimes(K)$ as the $\sigma$-algebra generated by all continuous maps into metric spaces. This $\sigma$-algebra is equal to $\mathcal{B}(K)$ when $K$ is metrisable but can be smaller for other $K$. With this measurable structure, $K_\otimes$ becomes a measurable group; it seems that once one leaves the metrisable world that $K_\otimes$ is a superior (or at least equally good) space to work with than $K$ for analysis, as it avoids the Nedoma pathology. (For instance, from Plancherel’s theorem, we see that if $m_K$ is the Haar probability measure on $K$, then $L^2(K, \mathcal{B}(K), m_K) = L^2(K, \mathcal{B}^\otimes(K), m_K)$ (thus, every $\mathcal{B}(K)$-measurable set is equivalent modulo $m_K$-null sets to a $\mathcal{B}^\otimes(K)$-measurable set), so there is no damage to Plancherel caused by passing to the Baire $\sigma$-algebra.)
Passing to the Baire $\sigma$-algebra $K_\otimes$ fixes the most severe problems with an uncountable Moore-Schmidt theorem, but one is still faced with an issue of having to potentially take an uncountable union of null sets. To avoid this sort of problem, we pass to the framework of abstract measure theory, in which we remove explicit mention of “points” and can easily delete all null sets at a very early stage of the formalism. In this setup, the category of concrete measurable spaces is replaced with the larger category of abstract measurable spaces, which we formally define as the opposite category of the category of $\sigma$-algebras (with Boolean algebra homomorphisms). Thus, we define an abstract measurable space to be an object of the form $\mathcal{X}^{\mathrm{op}}$, where $\mathcal{X}$ is an (abstract) $\sigma$-algebra and $\mathrm{op}$ is a formal placeholder symbol that signifies use of the opposite category, and an abstract measurable map $T: \mathcal{X}^{\mathrm{op}} \to \mathcal{Y}^{\mathrm{op}}$ is an object of the form $T = (T^*)^{\mathrm{op}}$, where $T^*: \mathcal{Y} \to \mathcal{X}$ is a Boolean algebra homomorphism and $\mathrm{op}$ is again used as a formal placeholder; we call $T^*$ the pullback map associated to $T$. [UPDATE: It turns out that this definition of a measurable map led to technical issues. In a forthcoming revision of the paper we also impose the requirement that the abstract measurable map be $\sigma$-complete (i.e., it respects countable joins).] The composition $S \circ T: \mathcal{X}^{\mathrm{op}} \to \mathcal{Z}^{\mathrm{op}}$ of two abstract measurable maps $T: \mathcal{X}^{\mathrm{op}} \to \mathcal{Y}^{\mathrm{op}}$, $S: \mathcal{Y}^{\mathrm{op}} \to \mathcal{Z}^{\mathrm{op}}$ is defined by the formula $S \circ T := (T^* \circ S^*)^{\mathrm{op}}$, or equivalently $(S \circ T)^* = T^* \circ S^*$.
Every concrete measurable space $X = (X, \mathcal{X})$ can be identified with an abstract counterpart $\mathcal{X}^{\mathrm{op}}$, and similarly every concrete measurable map $T: X \to Y$ can be identified with an abstract counterpart $(T^*)^{\mathrm{op}}$, where $T^*: \mathcal{Y} \to \mathcal{X}$ is the pullback map $T^* E := T^{-1}(E)$. Thus the category of concrete measurable spaces can be viewed as a subcategory of the category of abstract measurable spaces. The advantage of working in the abstract setting is that it gives us access to more spaces that could not be directly defined in the concrete setting. Most importantly for us, we have a new abstract space, the opposite measure algebra $X_\mu$ of $X$, defined as $(\mathcal{X}/\mathcal{N})^{\mathrm{op}}$ where $\mathcal{N}$ is the ideal of null sets in $\mathcal{X}$. Informally, $X_\mu$ is the space $X$ with all the null sets removed; there is a canonical abstract embedding map $\iota: X_\mu \to \mathcal{X}^{\mathrm{op}}$, which allows one to convert any concrete measurable map $f: X \to Y$ into an abstract one $[f]: X_\mu \to \mathcal{Y}^{\mathrm{op}}$. One can then define the notion of an abstract action, abstract cocycle, and abstract coboundary by replacing every occurrence of the category of concrete measurable spaces with their abstract counterparts, and replacing $X$ with the opposite measure algebra $X_\mu$; see the paper for details. Our main theorem is then
Theorem 2 (Uncountable Moore-Schmidt theorem) Let $\Gamma$ be a discrete group acting abstractly on a $\sigma$-finite measure space $X$. Let $K$ be a compact Hausdorff abelian group. Then a $K_\otimes$-valued abstract measurable cocycle $\rho = (\rho_\gamma)_{\gamma \in \Gamma}$ is an abstract coboundary if and only if for each character $\hat k \in \hat K$, the $\mathbf{T}$-valued cocycles $\hat k \circ \rho = (\hat k \circ \rho_\gamma)_{\gamma \in \Gamma}$ are abstract coboundaries.
With the abstract formalism, the proof of the uncountable Moore-Schmidt theorem is almost identical to the countable one (in fact we were able to make some simplifications, such as avoiding the use of the ergodic decomposition). A key tool is what we call a “conditional Pontryagin duality” theorem, which asserts that if one has an abstract measurable map $\alpha_{\hat k}: X_\mu \to \mathbf{T}$ for each $\hat k \in \hat K$ obeying the identity $\alpha_{\hat k_1 + \hat k_2} = \alpha_{\hat k_1} + \alpha_{\hat k_2}$ for all $\hat k_1, \hat k_2 \in \hat K$, then there is an abstract measurable map $F: X_\mu \to K_\otimes$ such that $\alpha_{\hat k} = \hat k \circ F$ for all $\hat k \in \hat K$. This is derived from the usual Pontryagin duality and some other tools, most notably the completeness of the $\sigma$-algebra of $X_\mu$, and the Sikorski extension theorem.
We feel that it is natural to stay within the abstract measure theory formalism whenever dealing with uncountable situations. However, it is still an interesting question as to when one can guarantee that the abstract objects constructed in this formalism are representable by concrete analogues. The basic questions in this regard are:
- (i) Suppose one has an abstract measurable map $f: X_\mu \to \mathcal{Y}^{\mathrm{op}}$ into a concrete measurable space $Y$. Does there exist a representation of $f$ by a concrete measurable map $\tilde f: X \to Y$? Is it unique up to almost everywhere equivalence?
- (ii) Suppose one has a concrete cocycle that is an abstract coboundary. When can it be represented by a concrete coboundary?
For (i) the answer is somewhat interesting (as I learned after posing this MathOverflow question):
- If $Y$ does not separate points, or is not compact metrisable or Polish, there can be counterexamples to uniqueness. If $Y$ is not compact or Polish, there can be counterexamples to existence.
- If $Y$ is a compact metric space or a Polish space, then one always has existence and uniqueness.
- If $Y$ is a compact Hausdorff abelian group, one always has existence.
- If $X$ is a complete measure space, then one always has existence (from a theorem of Maharam).
- If $X$ is the unit interval with the Borel $\sigma$-algebra and Lebesgue measure, then one has existence for all compact Hausdorff $Y$ assuming the continuum hypothesis (from a theorem of von Neumann) but existence can fail under other extensions of ZFC (from a theorem of Shelah, using the method of forcing).
- For more general $X$, existence for all compact Hausdorff $Y$ is equivalent to the existence of a lifting from the $\sigma$-algebra $\mathcal{X}/\mathcal{N}$ to $\mathcal{X}$ (or, in the language of abstract measurable spaces, the existence of an abstract retraction from $\mathcal{X}^{\mathrm{op}}$ to $X_\mu$).
- It is a long-standing open question (posed for instance by Fremlin) whether it is relatively consistent with ZFC that existence holds whenever $Y$ is compact Hausdorff.
Our understanding of (ii) is much less complete:
- If $K$ is metrisable, the answer is “always” (which among other things establishes the countable Moore-Schmidt theorem as a corollary of the uncountable one).
- If $\Gamma$ is at most countable and $X$ is a complete measure space, then the answer is again “always”.
In view of the answers to (i), I would not be surprised if the full answer to (ii) was also sensitive to axioms of set theory. However, such set theoretic issues seem to be almost completely avoided if one sticks with the abstract formalism throughout; they only arise when trying to pass back and forth between the abstract and concrete categories.
Let $(X, T, \mu)$ be a measure-preserving system – a probability space $(X, \mu)$ equipped with a measure-preserving translation $T$ (which for simplicity of discussion we shall assume to be invertible). We will informally think of two points $x, y$ in this space as being “close” if $y = T^n x$ for some integer $n$ that is not too large; this allows one to distinguish between “local” structure at a point $x$ (in which one only looks at nearby points $T^n x$ for moderately large $n$) and “global” structure (in which one looks at the entire space $X$). The local/global distinction is also known as the time-averaged/space-averaged distinction in ergodic theory.
A measure-preserving system is said to be ergodic if all the invariant sets are either zero measure or full measure. An equivalent form of this statement is that any measurable function $f: X \to \mathbf{R}$ which is locally essentially constant in the sense that $f(Tx) = f(x)$ for $\mu$-almost every $x$, is necessarily globally essentially constant in the sense that there is a constant $c$ such that $f(x) = c$ for $\mu$-almost every $x$. A basic consequence of ergodicity is the mean ergodic theorem: if $f \in L^2(X, \mu)$, then the averages $x \mapsto \frac{1}{N} \sum_{n=1}^N f(T^n x)$ converge in $L^2$ norm to the mean $\int_X f\, d\mu$. (The mean ergodic theorem also applies to other $L^p$ spaces with $1 < p < \infty$, though it is usually proven first in the Hilbert space $L^2$.) Informally: in ergodic systems, time averages are asymptotically equal to space averages. Specialising to the case of indicator functions, this implies in particular that $\frac{1}{N} \sum_{n=1}^N \mu(E \cap T^n E)$ converges to $\mu(E)^2$ for any measurable set $E$.
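The last consequence is easy to test numerically. The sketch below assumes the circle rotation $T x = x + \alpha \bmod 1$ on $\mathbf{R}/\mathbf{Z}$ with Lebesgue measure (ergodic when $\alpha$ is irrational) and the arc $E = [0, a)$ with $a \leq 1/2$, for which $\mu(E \cap T^n E)$ has a closed form; the function names and the choice of the golden-ratio rotation are illustrative assumptions.

```python
import math

def overlap(t, a):
    """Lebesgue measure of [0, a) ∩ ([0, a) + t) on the circle R/Z, for 0 <= a <= 1/2."""
    frac = t % 1.0
    d = min(frac, 1.0 - frac)            # circular distance from t to 0
    return max(0.0, a - d)

def correlation_average(alpha, a, n_terms):
    """(1/N) * sum_{n=1}^{N} mu(E ∩ T^n E) for E = [0, a) and T x = x + alpha mod 1."""
    return sum(overlap(n * alpha, a) for n in range(1, n_terms + 1)) / n_terms

golden = (math.sqrt(5) - 1) / 2          # an irrational rotation number (ergodic case)
ergodic_avg = correlation_average(golden, 0.5, 10_000)
identity_avg = correlation_average(0.0, 0.5, 1_000)   # T = identity is not ergodic
```

With these choices `ergodic_avg` should be close to $\mu(E)^2 = 1/4$, while for the identity map the average stays at $\mu(E) = 1/2$, illustrating how the conclusion depends on ergodicity.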
In this short note I would like to use the mean ergodic theorem to show that ergodic systems also have the property that “somewhat locally constant” functions are necessarily “somewhat globally constant”; this is not a deep observation, and probably already in the literature, but I found it a cute statement that I had not previously seen. More precisely:
Corollary 1 Let $(X, T, \mu)$ be an ergodic measure-preserving system, and let $f: X \to \mathbf{R}$ be measurable. Suppose that
$$\limsup_{N \to \infty} \frac{1}{N} \sum_{n=1}^N \mu( \{ x \in X: f(T^n x) = f(x) \} ) \geq \delta \qquad (1)$$
for some $0 \leq \delta \leq 1$. Then there exists a constant $c$ such that $f(x) = c$ for $x$ in a set of measure at least $\delta$.
Informally: if $f$ is locally constant on pairs $x, T^n x$ at least $\delta$ of the time, then $f$ is globally constant at least $\delta$ of the time. Of course the claim fails if the ergodicity hypothesis is dropped, as one can simply take $f$ to be an invariant function that is not essentially constant, such as the indicator function of an invariant set of intermediate measure. This corollary can be viewed as a manifestation of the general principle that ergodic systems have the same “global” (or “space-averaged”) behaviour as “local” (or “time-averaged”) behaviour, in contrast to non-ergodic systems in which local properties do not automatically transfer over to their global counterparts.
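Proof: By composing with (say) the arctangent function, we may assume without loss of generality that $f$ is bounded. Let $k > 0$, and partition $X$ as $\bigcup_{m \in \mathbf{Z}} E_{m,k}$, where $E_{m,k}$ is the level set
$$E_{m,k} := \left\{ x \in X: \frac{m}{k} \leq f(x) < \frac{m+1}{k} \right\}.$$
For each $k$, only finitely many of the $E_{m,k}$ are non-empty. By (1), and since $f(T^n x) = f(x)$ forces $x$ and $T^n x$ to lie in the same level set, one has
$$\limsup_{N \to \infty} \sum_m \frac{1}{N} \sum_{n=1}^N \mu( E_{m,k} \cap T^n E_{m,k} ) \geq \delta.$$
Using the ergodic theorem, we conclude that
$$\sum_m \mu( E_{m,k} )^2 \geq \delta.$$
On the other hand, $\sum_m \mu(E_{m,k}) = 1$, so that $\sum_m \mu(E_{m,k})^2 \leq \sup_m \mu(E_{m,k})$. Thus there exists $m_k$ such that $\mu(E_{m_k,k}) \geq \delta$, thus
$$\mu\left( \left\{ x \in X: \frac{m_k}{k} \leq f(x) < \frac{m_k+1}{k} \right\} \right) \geq \delta.$$
By the Bolzano-Weierstrass theorem, we may pass to a subsequence where $\frac{m_k}{k}$ converges to a limit $c$, then for any $\varepsilon > 0$ we have
$$\mu( \{ x \in X: c - \varepsilon \leq f(x) \leq c + \varepsilon \} ) \geq \delta$$
for infinitely many $k$ (indeed for all sufficiently large $k$ in the subsequence), and hence on sending $\varepsilon \to 0$
$$\mu( \{ x \in X: f(x) = c \} ) \geq \delta.$$
The claim follows.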
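The von Neumann ergodic theorem (the Hilbert space version of the mean ergodic theorem) asserts that if $U: H \to H$ is a unitary operator on a Hilbert space $H$, and $v \in H$ is a vector in that Hilbert space, then one has
$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N U^n v = \pi_{H^U} v$$
in the strong topology, where $H^U := \{ w \in H: Uw = w \}$ is the $U$-invariant subspace of $H$, and $\pi_{H^U}$ is the orthogonal projection to $H^U$. (See e.g. these previous lecture notes for a proof.) The same proof extends to more general amenable groups: if $G$ is a countable amenable group acting on a Hilbert space $H$ by unitary transformations $T^g: H \to H$ for $g \in G$, and $v \in H$ is a vector in that Hilbert space, then one has
$$\lim_{n \to \infty} \mathbf{E}_{g \in \Phi_n} T^g v = \pi_{H^G} v$$
for any Følner sequence $\Phi_n$ of $G$, where $H^G := \{ w \in H: T^g w = w \hbox{ for all } g \in G \}$ is the $G$-invariant subspace, and $\mathbf{E}_{a \in A} f(a) := \frac{1}{|A|} \sum_{a \in A} f(a)$ is the average of $f$ on $A$. Thus one can interpret $\pi_{H^G} v$ as a certain average of elements of the orbit $Gv := \{ T^g v: g \in G \}$ of $v$.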
In a previous blog post, I noted a variant of this ergodic theorem (due to Alaoglu and Birkhoff) that holds even when the group is not amenable (or not discrete), using a more abstract notion of averaging:
Theorem 1 (Abstract ergodic theorem) Let $G$ be an arbitrary group acting unitarily on a Hilbert space $H$, and let $v$ be a vector in $H$. Then $\pi_{H^G} v$ is the element in the closed convex hull of $Gv := \{ T^g v: g \in G \}$ of minimal norm, and is also the unique element of $H^G$ in this closed convex hull.
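I recently stumbled upon a different way to think about this theorem, in the additive case $G = (G,+)$ when $G$ is abelian, which has a closer resemblance to the classical mean ergodic theorem. Given an arbitrary additive group $G = (G,+)$ (not necessarily discrete, or countable), let $\mathcal{F}[G]$ denote the collection of finite non-empty multisets in $G$ – that is to say, unordered collections $\{a_1, \dots, a_n\}$ of elements $a_1, \dots, a_n$ of $G$, not necessarily distinct, for some positive integer $n$. Given two multisets $A = \{a_1, \dots, a_n\}$, $B = \{b_1, \dots, b_m\}$ in $\mathcal{F}[G]$, we can form the sum set $A + B := \{ a_i + b_j: 1 \leq i \leq n, 1 \leq j \leq m \}$. Note that the sum set $A+B$ can contain multiplicity even when $A, B$ do not; for instance, $\{1,2\} + \{1,2\} = \{2,3,3,4\}$. Given a multiset $A = \{a_1, \dots, a_n\}$ in $\mathcal{F}[G]$, and a function $f: G \to H$ from $G$ to a vector space $H$, we define the average $\mathbf{E}_{a \in A} f(a)$ as
$$\mathbf{E}_{a \in A} f(a) := \frac{1}{n} \sum_{j=1}^n f(a_j).$$
Note that the multiplicity function of the set $A$ affects the average; for instance, we have $\mathbf{E}_{a \in \{1,2\}} a = \frac{3}{2}$, but $\mathbf{E}_{a \in \{1,2,2\}} a = \frac{5}{3}$.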
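These multiset operations are straightforward to experiment with. Below is a minimal Python sketch, assuming the group is the integers and representing a finite multiset as a `collections.Counter` mapping each element to its multiplicity; the helper names `sumset` and `average` are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def sumset(A, B):
    """Sum set of two finite multisets (Counters: element -> multiplicity)."""
    C = Counter()
    for a, m in A.items():
        for b, n in B.items():
            C[a + b] += m * n        # each pair (a, b) contributes one copy of a + b
    return C

def average(A, f=lambda a: a):
    """Average of f over the multiset A, counting multiplicity."""
    total = sum(A.values())
    return sum(m * Fraction(f(a)) for a, m in A.items()) / total

A = Counter([1, 2])
print(sorted(sumset(A, A).elements()))        # [2, 3, 3, 4]
print(average(Counter([1, 2])))               # 3/2
print(average(Counter([1, 2, 2])))            # 5/3
```

Storing multiplicities rather than explicit lists keeps iterated sum sets manageable, since the number of distinct elements usually grows far more slowly than the cardinality counted with multiplicity.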
We can define a directed set structure on $\mathcal{F}[G]$ as follows: given two multisets $A, B \in \mathcal{F}[G]$, we write $A \geq B$ if we have $A = B + C$ for some $C \in \mathcal{F}[G]$. Thus for instance we have $\{1,2,2,3\} \geq \{1,2\}$. It is easy to verify that this relation is transitive and reflexive, and is directed because any two elements $A, B$ of $\mathcal{F}[G]$ have a common upper bound, namely $A+B$. (This is where we need $G$ to be abelian.) The notion of convergence along a net now allows us to define the notion of convergence along $\mathcal{F}[G]$; given a family $x_A$ of points in a topological space $X$ indexed by elements $A$ of $\mathcal{F}[G]$, and a point $x$ in $X$, we say that $x_A$ converges to $x$ along $\mathcal{F}[G]$ if, for every open neighbourhood $U$ of $x$ in $X$, one has $x_A \in U$ for sufficiently large $A$, that is to say there exists $B \in \mathcal{F}[G]$ such that $x_A \in U$ for all $A \geq B$. If the topological space $X$ is Hausdorff, then the limit $x$ is unique (if it exists), and we then write
$$x = \lim_{A \to G} x_A.$$
When $x_A$ takes values in the reals, one can also define the limit superior $\limsup_{A \to G} x_A$ or limit inferior $\liminf_{A \to G} x_A$ along such nets in the obvious fashion.
We can then give an alternate formulation of the abstract ergodic theorem in the abelian case:
Theorem 2 (Abelian abstract ergodic theorem) Let $G = (G,+)$ be an arbitrary additive group acting unitarily on a Hilbert space $H$, and let $v$ be a vector in $H$. Then we have
$$\pi_{H^G} v = \lim_{A \to G} \mathbf{E}_{a \in A} T^a v$$
in the strong topology of $H$.
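Proof: Suppose that $A \geq B$, so that $A = B + C$ for some $C \in \mathcal{F}[G]$, then
$$\mathbf{E}_{a \in A} T^a v = \mathbf{E}_{c \in C} T^c \left( \mathbf{E}_{b \in B} T^b v \right)$$
so by unitarity and the triangle inequality we have
$$\| \mathbf{E}_{a \in A} T^a v \|_H \leq \| \mathbf{E}_{b \in B} T^b v \|_H,$$
thus $\| \mathbf{E}_{a \in A} T^a v \|_H^2$ is monotone non-increasing in $A$. Since this quantity is bounded between $0$ and $\|v\|_H^2$, we conclude that the limit $\lim_{A \to G} \| \mathbf{E}_{a \in A} T^a v \|_H^2$ exists. Thus, for any $\varepsilon > 0$, we have for sufficiently large $A$ that
$$\| \mathbf{E}_{b \in B} T^b v \|_H^2 \geq \| \mathbf{E}_{a \in A} T^a v \|_H^2 - \varepsilon$$
for all $B \geq A$. In particular, for any $g \in G$, we have
$$\| \mathbf{E}_{b \in A + \{0,g\}} T^b v \|_H^2 \geq \| \mathbf{E}_{a \in A} T^a v \|_H^2 - \varepsilon.$$
We can write
$$\mathbf{E}_{b \in A + \{0,g\}} T^b v = \frac{1}{2} \mathbf{E}_{a \in A} T^a v + \frac{1}{2} T^g \mathbf{E}_{a \in A} T^a v$$
and so from the parallelogram law and unitarity we have
$$\| \mathbf{E}_{a \in A} T^a v - T^g \mathbf{E}_{a \in A} T^a v \|_H^2 \leq 4 \varepsilon$$
for all $g \in G$, and hence by the triangle inequality (averaging $g$ over a finite multiset $C$)
$$\| \mathbf{E}_{a \in A} T^a v - \mathbf{E}_{b \in A + C} T^b v \|_H \leq 2 \varepsilon^{1/2}$$
for any $C \in \mathcal{F}[G]$. This shows that $\mathbf{E}_{a \in A} T^a v$ is a Cauchy net in $A$ (in the strong topology), and hence (by the completeness of $H$) tends to a limit. Shifting $A$ by a group element $g$, we have
$$\lim_{A \to G} \mathbf{E}_{a \in A} T^a v = \lim_{A \to G} \mathbf{E}_{a \in A + \{g\}} T^a v = T^g \lim_{A \to G} \mathbf{E}_{a \in A} T^a v$$
and hence $\lim_{A \to G} \mathbf{E}_{a \in A} T^a v$ is invariant under shifts, and thus lies in $H^G$. On the other hand, for any $w \in H^G$ and $A \in \mathcal{F}[G]$, we have
$$\langle \mathbf{E}_{a \in A} T^a v, w \rangle_H = \mathbf{E}_{a \in A} \langle v, T^{-a} w \rangle_H = \langle v, w \rangle_H$$
and thus on taking strong limits
$$\langle \lim_{A \to G} \mathbf{E}_{a \in A} T^a v, w \rangle_H = \langle v, w \rangle_H$$
and so $v - \lim_{A \to G} \mathbf{E}_{a \in A} T^a v$ is orthogonal to $H^G$. Combining these two facts we see that $\lim_{A \to G} \mathbf{E}_{a \in A} T^a v$ is equal to $\pi_{H^G} v$ as claimed.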
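The monotonicity and convergence in this argument can be watched numerically in a small example. The sketch below assumes $G = \mathbf{Z}$ acting on $H = \mathbf{C}^2$ by the unitary maps $T^a = \mathrm{diag}(1, e^{2\pi i a \theta})$ with $\theta = 0.3$, so that the invariant subspace is the first coordinate axis; the names `THETA`, `sumset`, `orbit_average` and `norm` are illustrative. It follows only one increasing chain $A, A + \{0,1\}, A + \{0,1\} + \{0,1\}, \dots$ rather than the whole net, so it is an illustration and not a verification of the theorem.

```python
import cmath
from collections import Counter

THETA = 0.3   # assumed rotation number; non-integer, so only the first coordinate is invariant

def sumset(A, B):
    """Sum set of two finite multisets stored as Counters (element -> multiplicity)."""
    C = Counter()
    for a, m in A.items():
        for b, n in B.items():
            C[a + b] += m * n
    return C

def orbit_average(A, v):
    """E_{a in A} T^a v, where T^a = diag(1, exp(2*pi*i*a*THETA)) acts unitarily on C^2."""
    total = sum(A.values())
    phase = sum(m * cmath.exp(2j * cmath.pi * a * THETA) for a, m in A.items()) / total
    return (v[0], phase * v[1])

def norm(w):
    return (abs(w[0]) ** 2 + abs(w[1]) ** 2) ** 0.5

v = (1.0, 1.0)
A = Counter([0])
norms = []
for _ in range(40):
    A = sumset(A, Counter([0, 1]))      # the new A dominates the old one: A_new = A_old + {0, 1}
    norms.append(norm(orbit_average(A, v)))
limit = orbit_average(A, v)
```

Along this chain the norms in `norms` should be non-increasing, and `limit` should be close to $(1, 0)$, the orthogonal projection of $v$ onto the invariant subspace.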
To relate this result to the classical ergodic theorem, we observe
Lemma 3 Let $G$ be a countable additive group, with a Følner sequence $\Phi_n$, and let $f_g$ be a bounded sequence in a normed vector space indexed by $G$. If $\lim_{A \to G} \mathbf{E}_{a \in A} f_a$ exists, then $\lim_{n \to \infty} \mathbf{E}_{a \in \Phi_n} f_a$ exists, and the two limits are equal.
Proof: From the Følner property, we see that for any $A$ and any $\varepsilon > 0$, the averages $\mathbf{E}_{a \in \Phi_n} f_a$ and $\mathbf{E}_{a \in A + \Phi_n} f_a$ differ by at most $\varepsilon$ in norm if $n$ is sufficiently large depending on $A$, $\varepsilon$ (and the $f_a$). On the other hand, by the existence of the limit $\lim_{A \to G} \mathbf{E}_{a \in A} f_a$, the averages $\mathbf{E}_{a \in A} f_a$ and $\mathbf{E}_{a \in A + \Phi_n} f_a$ differ by at most $\varepsilon$ in norm if $A$ is sufficiently large depending on $\varepsilon$ (regardless of how large $n$ is). The claim follows.
It turns out that this approach can also be used as an alternate way to construct the Gowers–Host-Kra seminorms in ergodic theory, which has the feature that it does not explicitly require any amenability on the group $G$ (or separability on the underlying measure space), though, as pointed out to me in comments, even uncountable abelian groups are amenable in the sense of possessing an invariant mean, even if they do not have a Følner sequence.
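Given an arbitrary additive group $G$, define a $G$-system $(X, \mu, (T_g)_{g \in G})$ to be a probability space $X = (X, \mathcal{X}, \mu)$ (not necessarily separable or standard Borel), together with a collection $(T_g)_{g \in G}$ of invertible, measure-preserving maps $T_g: X \to X$, such that $T_0$ is the identity and $T_g T_h = T_{g+h}$ (modulo null sets) for all $g, h \in G$. This then gives isomorphisms $T^g: L^p(X) \to L^p(X)$ for $1 \leq p \leq \infty$ by setting $T^g f(x) := f(T_{-g} x)$. From the above abstract ergodic theorem, we see that
$$\mathbf{E}( f | \mathcal{X}^G ) = \lim_{A \to G} \mathbf{E}_{a \in A} T^a f$$
in the strong topology of $L^2(X)$ for any $f \in L^2(X)$, where $\mathcal{X}^G$ is the collection of measurable sets $E$ that are essentially $G$-invariant in the sense that $T_g E = E$ modulo null sets for all $g \in G$, and $\mathbf{E}(f | \mathcal{X}^G)$ is the conditional expectation of $f$ with respect to $\mathcal{X}^G$.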
In a similar spirit, we have
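Theorem 4 (Convergence of Gowers-Host-Kra seminorms) Let $(X, \mu, (T_g)_{g \in G})$ be a $G$-system for some additive group $G$. Let $d$ be a natural number, and for every $\omega \in \{0,1\}^d$, let $f_\omega \in L^{2^d}(X)$, which for simplicity we take to be real-valued. Then the expression
$$\langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d(X)} := \lim_{A_1, \dots, A_d \to G} \mathbf{E}_{h_1 \in A_1 - A_1, \dots, h_d \in A_d - A_d} \int_X \prod_{\omega \in \{0,1\}^d} T^{\omega_1 h_1 + \dots + \omega_d h_d} f_\omega \, d\mu$$
converges, where we write $\omega = (\omega_1, \dots, \omega_d)$, and we are using the product directed set on $\mathcal{F}[G]^d$ to define the convergence $A_1, \dots, A_d \to G$. In particular, for $f \in L^{2^d}(X)$, the limit
$$\| f \|_{U^d(X)}^{2^d} = \lim_{A_1, \dots, A_d \to G} \mathbf{E}_{h_1 \in A_1 - A_1, \dots, h_d \in A_d - A_d} \int_X \prod_{\omega \in \{0,1\}^d} T^{\omega_1 h_1 + \dots + \omega_d h_d} f \, d\mu$$
converges.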
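We prove this theorem below the fold. It implies a number of other known descriptions of the Gowers-Host-Kra seminorms $\|f\|_{U^d(X)}$, for instance that
$$\| f \|_{U^d(X)}^{2^d} = \lim_{A \to G} \mathbf{E}_{h \in A - A} \| f T^h f \|_{U^{d-1}(X)}^{2^{d-1}}$$
for $d > 1$, while from the ergodic theorem we have
$$\| f \|_{U^1(X)} = \| \mathbf{E}( f | \mathcal{X}^G ) \|_{L^2(X)}.$$
This definition also manifestly demonstrates the cube symmetries of the Host-Kra measures $\mu^{[d]}$ on $X^{\{0,1\}^d}$, defined via duality by requiring that
$$\langle (f_\omega)_{\omega \in \{0,1\}^d} \rangle_{U^d(X)} = \int_{X^{\{0,1\}^d}} \bigotimes_{\omega \in \{0,1\}^d} f_\omega \, d\mu^{[d]}.$$
In a subsequent blog post I hope to present a more detailed study of the $U^2$ norm and its relationship with eigenfunctions and the Kronecker factor, without assuming any amenability on $G$ or any separability or topological structure on $X$.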
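As laid out in the foundational work of Kolmogorov, a classical probability space (or probability space for short) is a triplet $(X, \mathcal{X}, \mu)$, where $X$ is a set, $\mathcal{X}$ is a $\sigma$-algebra of subsets of $X$, and $\mu: \mathcal{X} \to [0,1]$ is a countably additive probability measure on $\mathcal{X}$. Given such a space, one can form a number of interesting function spaces, including
- the (real) Hilbert space $L^2(X, \mathcal{X}, \mu)$ of square-integrable functions $f: X \to \mathbf{R}$, modulo $\mu$-almost everywhere equivalence, and with the positive definite inner product $\langle f, g \rangle_{L^2(X, \mathcal{X}, \mu)} := \int_X f g \, d\mu$; and
- the unital commutative Banach algebra $L^\infty(X, \mathcal{X}, \mu)$ of essentially bounded functions $f: X \to \mathbf{R}$, modulo $\mu$-almost everywhere equivalence, with $\|f\|_{L^\infty(X, \mathcal{X}, \mu)}$ defined as the essential supremum of $|f|$.
There is also a trace $\tau = \tau_\mu: L^\infty(X, \mathcal{X}, \mu) \to \mathbf{R}$ on $L^\infty(X, \mathcal{X}, \mu)$ defined by integration: $\tau(f) := \int_X f \, d\mu$.
One can form the category $\mathbf{Prb}$ of classical probability spaces, by defining a morphism $\phi: (X, \mathcal{X}, \mu) \to (Y, \mathcal{Y}, \nu)$ between probability spaces to be a function $\phi: X \to Y$ which is measurable (thus $\phi^{-1}(E) \in \mathcal{X}$ for all $E \in \mathcal{Y}$) and measure-preserving (thus $\mu(\phi^{-1}(E)) = \nu(E)$ for all $E \in \mathcal{Y}$).
Let us now abstract the algebraic features of these spaces as follows; for want of a better name, I will refer to this abstraction as an algebraic probability space, which is very similar to the non-commutative probability spaces studied in this previous post, except that these spaces are now commutative (and real).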
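Definition 1 An algebraic probability space is a pair $(\mathcal{A}, \tau)$ where
- $\mathcal{A}$ is a unital commutative real algebra;
- $\tau: \mathcal{A} \to \mathbf{R}$ is a homomorphism of real vector spaces (that is, a linear functional) such that $\tau(1) = 1$ and $\tau(f^2) \geq 0$ for all $f \in \mathcal{A}$;
- Every element $f$ of $\mathcal{A}$ is bounded in the sense that $\sup_{k \geq 1} \tau(f^{2k})^{1/2k} < \infty$. (Technically, this isn’t an algebraic property, but I need it for technical reasons.)
A morphism $\phi: (\mathcal{A}_1, \tau_1) \to (\mathcal{A}_2, \tau_2)$ is a homomorphism $\phi^*: \mathcal{A}_2 \to \mathcal{A}_1$ which is trace-preserving, in the sense that $\tau_1(\phi^*(f)) = \tau_2(f)$ for all $f \in \mathcal{A}_2$.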
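For want of a better name, I’ll denote the category of algebraic probability spaces as $\mathbf{AlgPrb}$. One can view this category as the opposite category to that of (a subcategory of) the category of tracial commutative real algebras. One could emphasise this opposite nature by denoting the algebraic probability space as $(\mathcal{A}, \tau)^{\mathrm{op}}$ rather than $(\mathcal{A}, \tau)$; another suggestive (but slightly inaccurate) notation, inspired by the language of schemes, would be $\mathrm{Spec}(\mathcal{A}, \tau)$ rather than $(\mathcal{A}, \tau)$. However, we will not adopt these conventions here, and refer to algebraic probability spaces just by the pair $(\mathcal{A}, \tau)$.
By the previous discussion, we have a covariant functor $F: \mathbf{Prb} \to \mathbf{AlgPrb}$ that takes a classical probability space $(X, \mathcal{X}, \mu)$ to its algebraic counterpart $(L^\infty(X, \mathcal{X}, \mu), \tau_\mu)$, with a morphism $\phi: (X, \mathcal{X}, \mu) \to (Y, \mathcal{Y}, \nu)$ of classical probability spaces mapping to a morphism $F(\phi): (L^\infty(X, \mathcal{X}, \mu), \tau_\mu) \to (L^\infty(Y, \mathcal{Y}, \nu), \tau_\nu)$ of the corresponding algebraic probability spaces by the formula
$$F(\phi)^* f := f \circ \phi$$
for $f \in L^\infty(Y, \mathcal{Y}, \nu)$. One easily verifies that this is a functor.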
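For finite probability spaces the verification can be done by direct computation. The following Python sketch assumes spaces of the form $\{0, \dots, n-1\}$ with rational point masses, represents functions as lists of values, and uses hypothetical helper names (`FiniteProbSpace`, `pushforward`, `pullback`); it checks that the pullback $f \mapsto f \circ \phi$ is trace-preserving and that pullbacks compose in the reverse order of the underlying maps.

```python
from fractions import Fraction

class FiniteProbSpace:
    """A probability space on {0, ..., n-1} with point masses `weights` summing to 1."""
    def __init__(self, weights):
        assert sum(weights) == 1 and all(w >= 0 for w in weights)
        self.weights = list(weights)

    def trace(self, f):
        """tau(f) = integral of f, for f given as a list of values."""
        return sum(w * x for w, x in zip(self.weights, f))

def pushforward(space, phi, size):
    """Image measure of `space` under phi (a list sending each point to a point of the target)."""
    weights = [Fraction(0)] * size
    for x, w in enumerate(space.weights):
        weights[phi[x]] += w
    return FiniteProbSpace(weights)

def pullback(phi, f):
    """The map f -> f o phi, taking functions on the target to functions on the source."""
    return [f[y] for y in phi]

X = FiniteProbSpace([Fraction(1, 6)] * 6)
phi = [0, 0, 1, 1, 1, 2]                 # a map X -> Y = {0, 1, 2}
Y = pushforward(X, phi, 3)               # makes phi measure-preserving by construction
psi = [0, 1, 1]                          # a map Y -> Z = {0, 1}
g = [Fraction(5), Fraction(-2), Fraction(7)]   # a function on Y
h = [Fraction(3), Fraction(-1)]                # a function on Z

# Trace preservation: tau_X(g o phi) == tau_Y(g).
assert X.trace(pullback(phi, g)) == Y.trace(g)
# Functoriality: pulling back along psi o phi equals pulling back along psi, then phi.
psi_after_phi = [psi[y] for y in phi]
assert pullback(psi_after_phi, h) == pullback(phi, pullback(psi, h))
```

Exact rational arithmetic via `Fraction` is used so that the trace identities hold as equalities rather than up to rounding.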
In this post I would like to describe a functor $G: \mathbf{AlgPrb} \to \mathbf{Prb}$ which partially inverts $F$ (up to natural isomorphism), that is to say a recipe for starting with an algebraic probability space $(\mathcal{A}, \tau)$ and producing a classical probability space $(X, \mathcal{X}, \mu)$. This recipe is not new – it is basically the (commutative) Gelfand-Naimark-Segal construction (discussed in this previous post) combined with the Loomis-Sikorski theorem (discussed in this previous post). However, I wanted to put the construction in a single location for sake of reference. I also wanted to make the point that $F$ and $G$ are not complete inverses; there is a bit of information in the algebraic probability space (e.g. topological information) which is lost when passing back to the classical probability space. In some future posts, I would like to develop some ergodic theory using the algebraic foundations of probability theory rather than the classical foundations; this turns out to be convenient in the ergodic theory arising from nonstandard analysis (such as that described in this previous post), in which the groups involved are uncountable and the underlying spaces are not standard Borel spaces.
Let us describe how to construct the functor $G$, with details postponed to below the fold.
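- Starting with an algebraic probability space $(\mathcal{A}, \tau)$, form an inner product on $\mathcal{A}$ by the formula $\langle f, g \rangle := \tau(fg)$, and also form the spectral radius $\rho(f) := \lim_{k \to \infty} \tau(f^{2^k})^{1/2^k}$.
- The inner product is clearly positive semi-definite. Quotienting out the null vectors and taking completions, we arrive at a real Hilbert space $L^2 = L^2(\mathcal{A}, \tau)$, to which the trace $\tau$ may be extended.
- Somewhat less obviously, the spectral radius is well-defined and gives a norm on $\mathcal{A}$. Taking $L^2$ limits of sequences in $\mathcal{A}$ of bounded spectral radius gives us a subspace $L^\infty = L^\infty(\mathcal{A}, \tau)$ of $L^2$ that has the structure of a real commutative Banach algebra.
- The idempotents $1_E$ of the Banach algebra $L^\infty$ may be indexed by elements $E$ of an abstract $\sigma$-algebra $\mathcal{B}$.
- The Boolean algebra homomorphisms $\delta_x: \mathcal{B} \to \{0,1\}$ (or equivalently, the real algebra homomorphisms $\iota_x: L^\infty \to \mathbf{R}$) may be indexed by elements $x$ of a space $X$.
- Let $\mathcal{X}$ denote the $\sigma$-algebra on $X$ generated by the basic sets $\overline{E} := \{ x \in X: \delta_x(E) = 1 \}$ for every $E \in \mathcal{B}$.
- Let $\mathcal{N}$ be the $\sigma$-ideal of $\mathcal{X}$ generated by the sets $\bigcap_n \overline{E_n}$, where $E_n \in \mathcal{B}$ is a sequence with $\bigcap_n E_n = \emptyset$.
- One verifies that $\mathcal{B}$ is isomorphic to $\mathcal{X}/\mathcal{N}$. Using this isomorphism, the trace $\tau$ on $L^\infty$ can be used to construct a countably additive measure $\mu$ on $\mathcal{X}$. The classical probability space is then $(X, \mathcal{X}, \mu)$, and the abstract spaces $L^2, L^\infty$ may now be identified with their concrete counterparts $L^2(X, \mathcal{X}, \mu)$, $L^\infty(X, \mathcal{X}, \mu)$.
- Every algebraic probability space morphism $\phi: (\mathcal{A}_1, \tau_1) \to (\mathcal{A}_2, \tau_2)$ generates a classical probability morphism $G(\phi): (X_1, \mathcal{X}_1, \mu_1) \to (X_2, \mathcal{X}_2, \mu_2)$ via the formula
$$\delta_{G(\phi)(x_1)}(E_2) = \delta_{x_1}( \phi^*(E_2) )$$
using a pullback operation $\phi^*: \mathcal{B}_2 \to \mathcal{B}_1$ on the abstract $\sigma$-algebras $\mathcal{B}_1, \mathcal{B}_2$ that can be defined by density.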
Remark 1 The classical probability space $X$ constructed by the functor $G$ has some additional structure; namely $X$ is a $\sigma$-Stone space (a Stone space with the property that the closure of any countable union of clopen sets is clopen), $\mathcal{X}$ is the Baire $\sigma$-algebra (generated by the clopen sets), and the null sets are the meager sets. However, we will not use this additional structure here.
The partial inversion relationship between the functors and is given by the following assertion:
- There is a natural transformation from to the identity functor .
More informally: if one starts with an algebraic probability space and converts it back into a classical probability space , then there is a trace-preserving algebra homomorphism of to , which respects morphisms of the algebraic probability space. While this relationship is far weaker than an equivalence of categories (which would require that and are both natural isomorphisms), it is still good enough to allow many ergodic theory problems formulated using classical probability spaces to be reformulated instead as an equivalent problem in algebraic probability spaces.
Remark 2 The opposite composition is a little odd: it takes an arbitrary probability space and returns a more complicated probability space , with being the space of homomorphisms . While there is “morally” an embedding of into using the evaluation map, this map does not exist in general because points in may well have zero measure. However, if one takes a “pointless” approach and focuses just on the measure algebras , , then these algebras become naturally isomorphic after quotienting out by null sets.
Remark 3 An algebraic probability space captures a bit more structure than a classical probability space, because may be identified with a proper subset of that describes the “regular” functions (or random variables) of the space. For instance, starting with the unit circle (with the usual Haar measure and the usual trace ), any unital subalgebra of that is dense in will generate the same classical probability space on applying the functor , namely one will get the space of homomorphisms from to (with the measure induced from ). Thus for instance could be the continuous functions , the Wiener algebra or the full space , but the classical space will be unable to distinguish these spaces from each other. In particular, the functor loses information (roughly speaking, this functor takes an algebraic probability space and completes it to a von Neumann algebra, but then forgets exactly what algebra was initially used to create this completion). In ergodic theory, this sort of “extra structure” is traditionally encoded in topological terms, by assuming that the underlying probability space has a nice topological structure (e.g. a standard Borel space); however, with the algebraic perspective one has the freedom to have non-topological notions of extra structure, by choosing to be something other than an algebra of continuous functions on a topological space. I hope to discuss one such example of extra structure (coming from the Gowers-Host-Kra theory of uniformity seminorms) in a later blog post (this generalises the example of the Wiener algebra given previously, which is encoding “Fourier structure”).
A small example of how one could use the functors is as follows. Suppose one has a classical probability space with a measure-preserving action of an uncountable group , which is only defined (and an action) up to almost everywhere equivalence; thus for instance for any set and any , and might not be exactly equal, but only equal up to a null set. For similar reasons, an element of the invariant factor might not be exactly invariant with respect to , but instead one only has and equal up to null sets for each . One might like to “clean up” the action of to make it defined everywhere, and a genuine action everywhere, but this is not immediately achievable if is uncountable, since the union of all the null sets where something bad occurs may cease to be a null set. However, by applying the functor , each shift defines a morphism on the associated algebraic probability space (i.e. the Koopman operator), and then applying , we obtain a shift on a new classical probability space which now gives a genuine measure-preserving action of , and which is equivalent to the original action from a measure algebra standpoint. The invariant factor now consists of those sets in which are genuinely -invariant, not just up to null sets. (Basically, the classical probability space contains a Boolean algebra with the property that every measurable set is equivalent up to null sets to precisely one set in , allowing for a canonical “retraction” onto that eliminates all null set issues.)
More indirectly, the functors suggest that one should be able to develop a “pointless” form of ergodic theory, in which the underlying probability spaces are given algebraically rather than classically. I hope to give some more specific examples of this in later posts.
There are a number of ways to construct the real numbers , for instance
- as the metric completion of (thus, is defined as the set of Cauchy sequences of rationals, modulo Cauchy equivalence);
- as the space of Dedekind cuts on the rationals ;
- as the space of quasimorphisms on the integers, quotiented by bounded functions. (I believe this construction first appears in this paper of Street, who credits the idea to Schanuel, though the germ of this construction arguably goes all the way back to Eudoxus.)
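As a small illustration of the third construction, here is a minimal Python sketch (not taken from Street's paper; the names `f` and `defect` are invented for this example). It assumes the usual definition of a quasimorphism as a map $f: \mathbb{Z} \to \mathbb{Z}$ for which $f(m+n) - f(m) - f(n)$ is bounded, and uses $f(n) = \lfloor n \sqrt{2} \rfloor$ for $n \geq 0$, computed in exact integer arithmetic, as a representative of $\sqrt{2}$.

```python
from math import isqrt

def f(n: int) -> int:
    """floor(n * sqrt(2)) for n >= 0, computed exactly as isqrt(2 n^2)."""
    return isqrt(2 * n * n)

def defect(m: int, n: int) -> int:
    """Failure of additivity f(m+n) - f(m) - f(n); bounded for a quasimorphism."""
    return f(m + n) - f(m) - f(n)

# The defect only takes the values 0 and 1, so f is "almost" additive.
defects = {defect(m, n) for m in range(200) for n in range(200)}

# The slope f(n)/n recovers the real number that f represents.
slope = f(10**6) / 10**6
print(sorted(defects), slope)
```

Two quasimorphisms that differ by a bounded function have the same limiting slope, which is why the quotient by bounded functions is the natural object here.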
There is also a fourth family of constructions that proceeds via nonstandard analysis, as a special case of what is known as the nonstandard hull construction. (Here I will assume some basic familiarity with nonstandard analysis and ultraproducts, as covered for instance in this previous blog post.) Given an unbounded nonstandard natural number $N \in {}^* \mathbb{N} \setminus \mathbb{N}$, one can define two external additive subgroups of the nonstandard integers ${}^* \mathbb{Z}$:
- The group $O(N) := \{ n \in {}^* \mathbb{Z}: |n| \leq CN \hbox{ for some standard real } C \}$ of all nonstandard integers of magnitude less than or comparable to $N$; and
- The group $o(N) := \{ n \in {}^* \mathbb{Z}: |n| \leq \varepsilon N \hbox{ for all standard real } \varepsilon > 0 \}$ of nonstandard integers of magnitude infinitesimally smaller than $N$.
The group $o(N)$ is a subgroup of $O(N)$, so we may form the quotient group $O(N)/o(N)$. This space is isomorphic to the reals $\mathbb{R}$, and can in fact be used to construct the reals:
Proposition 1 For any coset $n + o(N)$ of $O(N)/o(N)$, there is a unique real number $\mathrm{st} \frac{n}{N}$ with the property that $\frac{n}{N} = \mathrm{st} \frac{n}{N} + o(1)$. The map $n + o(N) \mapsto \mathrm{st} \frac{n}{N}$ is then an isomorphism between the additive groups $O(N)/o(N)$ and $\mathbb{R}$.
Proof: Uniqueness is clear. For existence, observe that the set $\{ x \in \mathbb{Q}: xN \leq n \}$ is a Dedekind cut, and its supremum can be verified to have the required properties for $\mathrm{st} \frac{n}{N}$.
In a similar vein, we can view the unit interval in the reals as the quotient
where is the nonstandard (i.e. internal) set ; of course, is not a group, so one should interpret as the image of under the quotient map (or , if one prefers). Or to put it another way, (1) asserts that is the image of with respect to the map .
In this post I would like to record a nice measure-theoretic version of the equivalence (1), which essentially appears already in standard texts on Loeb measure (see e.g. this text of Cutland). To describe the results, we must first quickly recall the construction of Loeb measure on . Given an internal subset of , we may define the elementary measure of by the formula
This is a finitely additive probability measure on the Boolean algebra of internal subsets of . We can then construct the Loeb outer measure of any subset in complete analogy with Lebesgue outer measure by the formula
where ranges over all sequences of internal subsets of that cover . We say that a subset of is Loeb measurable if, for any (standard) , one can find an internal subset of which differs from by a set of Loeb outer measure at most , and in that case we define the Loeb measure of to be . It is a routine matter to show (e.g. using the Carathéodory extension theorem) that the space of Loeb measurable sets is a -algebra, and that is a countably additive probability measure on this space that extends the elementary measure . Thus now has the structure of a probability space .
Now, the group acts (Loeb-almost everywhere) on the probability space by the addition map, thus for and (excluding a set of Loeb measure zero where exits ). This action is clearly seen to be measure-preserving. As such, we can form the invariant factor , defined by restricting attention to those Loeb measurable sets with the property that is equal -almost everywhere to for each .
The claim is then that this invariant factor is equivalent (up to almost everywhere equivalence) to the unit interval with Lebesgue measure (and the trivial action of ), by the same factor map used in (1). More precisely:
Theorem 2 Given a set , there exists a Lebesgue measurable set , unique up to -a.e. equivalence, such that is -a.e. equivalent to the set . Conversely, if is Lebesgue measurable, then is in , and .
More informally, we have the measure-theoretic version
of (1).
Proof: We first prove the converse. It is clear that is -invariant, so it suffices to show that is Loeb measurable with Loeb measure . This is easily verified when is an elementary set (a finite union of intervals). By countable subadditivity of outer measure, this implies that Loeb outer measure of is bounded by the Lebesgue outer measure of for any set ; since every Lebesgue measurable set differs from an elementary set by a set of arbitrarily small Lebesgue outer measure, the claim follows.
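The elementary-set step of this argument can be sanity-checked with a finitary stand-in. The sketch below is only an analogy: it replaces the unbounded nonstandard $N$ by a large standard integer `N`, assumes the normalisation in which the elementary measure of a set is its cardinality divided by `N` and the map in (1) is $n \mapsto n/N$ on $\{1,\dots,N\}$, and checks that the proportion of points landing in an interval $[a,b]$ is within $1/N$ of the length $b-a$. The helper name `pullback_density` is invented for this illustration.

```python
from fractions import Fraction

def pullback_density(a: Fraction, b: Fraction, N: int) -> Fraction:
    """Proportion of n in {1, ..., N} with a <= n/N <= b (exact arithmetic)."""
    count = sum(1 for n in range(1, N + 1) if a <= Fraction(n, N) <= b)
    return Fraction(count, N)

a, b = Fraction(1, 3), Fraction(3, 4)
for N in (10, 1000, 100000):
    density = pullback_density(a, b, N)
    # The counting measure of the pullback differs from b - a by at most 1/N.
    print(N, float(density), float(abs(density - (b - a))))
```

In the nonstandard setting the error $1/N$ is infinitesimal, so taking standard parts gives exact agreement between the Loeb measure of the pullback and the length of the interval.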
Now we establish the forward claim. Uniqueness is clear from the converse claim, so it suffices to show existence. Let . Let be an arbitrary standard real number, then we can find an internal set which differs from by a set of Loeb measure at most . As is -invariant, we conclude that for every , and differ by a set of Loeb measure (and hence elementary measure) at most . By the (contrapositive of the) underspill principle, there must exist a standard such that and differ by a set of elementary measure at most for all . If we then define the nonstandard function by the formula
then from the (nonstandard) triangle inequality we have
(say). On the other hand, has the Lipschitz continuity property
and so in particular we see that
for some Lipschitz continuous function . If we then let be the set where , one can check that differs from by a set of Loeb outer measure , and hence does so also. Sending to zero, we see (from the converse claim) that is a Cauchy sequence in and thus converges in for some Lebesgue measurable . The sets then converge in Loeb outer measure to , giving the claim.
Thanks to the Lebesgue differentiation theorem, the conditional expectation of a bounded Loeb-measurable function can be expressed (as a function on , defined -a.e.) as
By the abstract ergodic theorem from the previous post, one can also view this conditional expectation as the element in the closed convex hull of the shifts , of minimal norm. In particular, we obtain a form of the von Neumann ergodic theorem in this context: the averages for converge (as a net, rather than a sequence) in to .
If is (the standard part of) an internal function, that is to say the ultralimit of a sequence of finitary bounded functions, one can view the measurable function as a limit of the that is analogous to the “graphons” that emerge as limits of graphs (see e.g. the recent text of Lovasz on graph limits). Indeed, the measurable function is related to the discrete functions by the formula
for all , where is the nonprincipal ultrafilter used to define the nonstandard universe. In particular, from the Arzela-Ascoli diagonalisation argument there is a subsequence such that
thus is the asymptotic density function of the . For instance, if is the indicator function of a randomly chosen subset of , then the asymptotic density function would equal (almost everywhere, at least).
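A quick numerical illustration of this last remark, as a sketch rather than a proof: the code below takes a pseudorandom subset of $\{0,\dots,n-1\}$ (each element included independently with probability $1/2$, with a fixed seed for reproducibility) and computes the local averages of its indicator over ten consecutive blocks, which play the role of the asymptotic density function sampled at scale $1/10$. The block count, the seed, and the function name `local_densities` are choices made for this example.

```python
import random

def local_densities(indicator, blocks):
    """Average of a 0/1 sequence over `blocks` consecutive equal-length blocks."""
    size = len(indicator) // blocks
    return [sum(indicator[k * size:(k + 1) * size]) / size for k in range(blocks)]

rng = random.Random(0)          # fixed seed, chosen arbitrarily
n = 100_000
indicator = [rng.randint(0, 1) for _ in range(n)]

densities = local_densities(indicator, blocks=10)
print(densities)                # each entry should be close to 1/2
```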
I’m continuing to look into understanding the ergodic theory of actions, as I believe this may allow one to apply ergodic theory methods to the “single-scale” or “non-asymptotic” setting (in which one averages only over scales comparable to a large parameter , rather than the traditional asymptotic approach of letting the scale go to infinity). I’m planning some further posts in this direction, though this is still a work in progress.
Vitaly Bergelson, Tamar Ziegler, and I have just uploaded to the arXiv our joint paper “Multiple recurrence and convergence results associated to $\mathbb{F}_p^\omega$-actions”. This paper is primarily concerned with limit formulae in the theory of multiple recurrence in ergodic theory. Perhaps the most basic formula of this type is the mean ergodic theorem, which (among other things) asserts that if $(X, \mathcal{X}, \mu, T)$ is a measure-preserving $\mathbb{Z}$-system (which, in this post, means that $(X, \mathcal{X}, \mu)$ is a probability space and $T: X \to X$ is measure-preserving and invertible, thus giving an action $(T^n)_{n \in \mathbb{Z}}$ of the integers), and $f, g \in L^2(X, \mathcal{X}, \mu)$ are functions, and $X$ is ergodic (which means that $L^2(X, \mathcal{X}, \mu)$ contains no $T$-invariant functions other than the constants (up to almost everywhere equivalence, of course)), then the average
$\displaystyle \frac{1}{N} \sum_{n=1}^N \int_X f(x) g(T^n x)\ d\mu \ \ \ \ \ (1)$
converges as $N \to \infty$ to the expression
$\displaystyle \left( \int_X f(x)\ d\mu \right) \left( \int_X g(x)\ d\mu \right);$
see e.g. this previous blog post. Informally, one can interpret this limit formula as an equidistribution result: if $x$ is drawn at random from $X$ (using the probability measure $\mu$), and $n$ is drawn at random from $\{1, \dots, N\}$ for some large $N$, then the pair $(x, T^n x)$ becomes uniformly distributed in the product space $X \times X$ (using product measure $\mu \times \mu$) in the limit as $N \to \infty$.
If we allow $(X, \mathcal{X}, \mu, T)$ to be non-ergodic, then we still have a limit formula, but it is a bit more complicated. Let $\mathcal{X}^T$ be the $T$-invariant measurable sets in $\mathcal{X}$; the $\mathbb{Z}$-system $(X, \mathcal{X}^T, \mu, T)$ can then be viewed as a factor of the original system $(X, \mathcal{X}, \mu, T)$, which is equivalent (in the sense of measure-preserving systems) to a trivial system $(Y, \mathcal{Y}, \nu, \mathrm{id})$ (known as the invariant factor) in which the shift is trivial. There is then a projection map $\pi: X \to Y$ to the invariant factor which is a factor map, and the average (1) converges in the limit to the expression
$\displaystyle \int_Y (\pi_* f)(y) (\pi_* g)(y)\ d\nu(y), \ \ \ \ \ (2)$
where $\pi_*: L^2(X, \mathcal{X}, \mu) \to L^2(Y, \mathcal{Y}, \nu)$ is the pushforward map associated to the map $\pi: X \to Y$; see e.g. this previous blog post. We can interpret this as an equidistribution result. If $(x, T^n x)$ is a pair as before, then we no longer expect complete equidistribution in $X \times X$ in the non-ergodic case, because there are now non-trivial constraints relating $x$ with $T^n x$; indeed, for any $T$-invariant function $f: X \to \mathbb{C}$, we have the constraint $f(x) = f(T^n x)$; putting all these constraints together we see that $\pi(x) = \pi(T^n x)$ (for almost every $x$, at least). The limit (2) can be viewed as an assertion that these constraints are in some sense the “only” constraints between $x$ and $T^n x$, and that the pair $(x, T^n x)$ is uniformly distributed relative to these constraints.
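To make the contrast between the ergodic and non-ergodic limits concrete, here is a minimal numerical sketch on the circle $\mathbb{R}/\mathbb{Z}$ with a rotation $x \mapsto x + \alpha$. The choice of system, of the test functions, and the helper names `correlation` and `ergodic_average` are assumptions of this example rather than anything from the paper. For an irrational $\alpha$ the rotation is ergodic and the average (1) should approach the product of the means; for $\alpha = 1/2$ and a function invariant under $x \mapsto x + 1/2$, the projection to the invariant factor leaves the function unchanged, so the limit is instead the integral of its square.

```python
import math

def correlation(f, g, alpha, n, M=64):
    """Riemann-sum approximation of the integral of f(x) g(x + n*alpha) over R/Z."""
    return sum(f(k / M) * g(k / M + n * alpha) for k in range(M)) / M

def ergodic_average(f, g, alpha, N):
    """The average (1): mean over n = 1..N of the correlations above."""
    return sum(correlation(f, g, alpha, n) for n in range(1, N + 1)) / N

f = lambda x: 1 + math.cos(2 * math.pi * x)   # mean 1
h = lambda x: 1 + math.cos(4 * math.pi * x)   # mean 1, invariant under x -> x + 1/2

# Ergodic case: irrational rotation, limit is (int f)(int f) = 1.
ergodic = ergodic_average(f, f, math.sqrt(2) - 1, N=2000)

# Non-ergodic case: rotation by 1/2 with an invariant function, limit is int h^2 = 3/2.
non_ergodic = ergodic_average(h, h, 0.5, N=2000)

print(ergodic, non_ergodic)
```

The Riemann sum with 64 equally spaced points is exact (up to rounding) for the low-degree trigonometric polynomials used here, so the only genuine error is the $O(1/N)$ tail of the average.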
Limit formulae are known for multiple ergodic averages as well, although the statement becomes more complicated. For instance, consider the expression
for three functions ; this is analogous to the combinatorial task of counting length three progressions in various sets. For simplicity we assume the system to be ergodic. Naively one might expect this limit to then converge to
which would roughly speaking correspond to an assertion that the triplet is asymptotically equidistributed in . However, even in the ergodic case there can be additional constraints on this triplet that cannot be seen at the level of the individual pairs , . The key obstruction here is that of eigenfunctions of the shift , that is to say non-trivial functions that obey the eigenfunction equation almost everywhere for some constant (or -invariant) . Each such eigenfunction generates a constraint
tying together , , and . However, it turns out that these are in some sense the only constraints on that are relevant for the limit (3). More precisely, if one sets to be the sub-algebra of generated by the eigenfunctions of , then it turns out that the factor is isomorphic to a shift system known as the Kronecker factor, for some compact abelian group and some (irrational) shift ; the factor map pushes eigenfunctions forward to (affine) characters on . It is then known that the limit of (3) is
where is the closed subgroup
and is the Haar probability measure on ; see this previous blog post. The equation defining corresponds to the constraint (4) mentioned earlier. Among other things, this limit formula implies Roth’s theorem, which in the context of ergodic theory is the assertion that the limit (or at least the limit inferior) of (3) is positive when is non-negative and not identically vanishing.
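The eigenfunction constraint can be checked by hand in the simplest example. The sketch below assumes that constraint (4) takes the form $\phi(x) \phi(T^n x)^{-2} \phi(T^{2n} x) = 1$ (which follows from $\phi(T^n x) = \lambda^n \phi(x)$), and tests it for the circle rotation $T x = x + \alpha$ with the character $\phi(x) = e^{2\pi i x}$. The system and all function names are choices made for this illustration.

```python
import cmath
import math

alpha = math.sqrt(2) - 1                      # an irrational rotation number (example choice)

def T_power(x: float, n: int) -> float:
    """n-th iterate of the circle rotation x -> x + alpha mod 1."""
    return (x + n * alpha) % 1.0

def phi(x: float) -> complex:
    """Eigenfunction of the rotation: phi(Tx) = exp(2 pi i alpha) phi(x)."""
    return cmath.exp(2j * math.pi * x)

def triple_constraint(x: float, n: int) -> complex:
    """phi(x) * phi(T^n x)^(-2) * phi(T^{2n} x), which should equal 1."""
    return phi(x) * phi(T_power(x, n)) ** -2 * phi(T_power(x, 2 * n))

values = [triple_constraint(x, n) for x in (0.0, 0.3, 0.77) for n in (1, 5, 1000)]
print(max(abs(v - 1) for v in values))        # tiny, up to floating-point rounding
```

This is why the triplet $(x, T^n x, T^{2n} x)$ cannot equidistribute in $X \times X \times X$ even for an ergodic rotation: the three points are tied together by every character.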
If one considers a quadruple average
(analogous to counting length four progressions) then the situation becomes more complicated still, even in the ergodic case. In addition to the (linear) eigenfunctions that already showed up in the computation of the triple average (3), a new type of constraint also arises from quadratic eigenfunctions , which obey an eigenfunction equation in which is no longer constant, but is now a linear eigenfunction. For such functions, behaves quadratically in , and one can compute the existence of a constraint
between , , , and that is not detected at the triple average level. As it turns out, this is not the only type of constraint relevant for (5); there is a more general class of constraint involving two-step nilsystems which we will not detail here, but see e.g. this previous blog post for more discussion. Nevertheless there is still a similar limit formula to previous examples, involving a special factor which turns out to be an inverse limit of two-step nilsystems; this limit theorem can be extracted from the structural theory in this paper of Host and Kra combined with a limit formula for nilsystems obtained by Lesigne, but will not be reproduced here. The pattern continues to higher averages (and higher step nilsystems); this was first done explicitly by Ziegler, and can also in principle be extracted from the structural theory of Host-Kra combined with nilsystem equidistribution results of Leibman. These sorts of limit formulae can lead to various recurrence results refining Roth’s theorem in various ways; see this paper of Bergelson, Host, and Kra for some examples of this.
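For a concrete instance of a quadratic eigenfunction, one can take the skew shift $T(x,y) = (x + \alpha, y + x)$ on the two-torus with $\phi(x,y) = e^{2\pi i y}$, so that $\phi(T(x,y)) = e^{2\pi i x} \phi(x,y)$ and the multiplier $e^{2\pi i x}$ is a linear eigenfunction. The sketch below assumes the quadruple constraint has the form $\phi(x) \phi(T^n x)^{-3} \phi(T^{2n} x)^{3} \phi(T^{3n} x)^{-1} = 1$ and verifies it in exact arithmetic. A rational $\alpha$ is used only so that the computation is exact; the identity is algebraic and does not depend on $\alpha$. All names here are invented for the example.

```python
from fractions import Fraction

alpha = Fraction(3, 17)   # rational only so that the arithmetic is exact; an example choice

def skew_shift(point):
    """One step of the skew shift T(x, y) = (x + alpha, y + x) on the 2-torus."""
    x, y = point
    return ((x + alpha) % 1, (y + x) % 1)

def orbit_y(point, steps):
    """y-coordinate of T^steps(point)."""
    for _ in range(steps):
        point = skew_shift(point)
    return point[1]

def quadratic_phase_defect(point, n):
    """Phase (mod 1) of phi(x) phi(T^n x)^-3 phi(T^2n x)^3 phi(T^3n x)^-1 for phi = e(y)."""
    y0, y1, y2, y3 = (orbit_y(point, k * n) for k in range(4))
    return (y0 - 3 * y1 + 3 * y2 - y3) % 1

start = (Fraction(2, 7), Fraction(5, 11))
defects = [quadratic_phase_defect(start, n) for n in range(1, 20)]
print(defects)   # every entry is 0, i.e. the product of phases is exactly 1
```

The point is that the $y$-coordinate of $T^m(x,y)$ is a quadratic polynomial in $m$, and the alternating sum with weights $1, -3, 3, -1$ is a third difference, which annihilates quadratics.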
The above discussion was concerned with -systems, but one can adapt much of the theory to measure-preserving -systems for other discrete countable abelian groups , in which one now has a family of shifts indexed by rather than a single shift, obeying the compatibility relation . The role of the intervals in this more general setting is replaced by that of Folner sequences. For arbitrary countable abelian , the theory for double averages (1) and triple limits (3) is essentially identical to the -system case. But when one turns to quadruple and higher limits, the situation becomes more complicated (and, for arbitrary , still not fully understood). However one model case which is now well understood is the finite field case when is an infinite-dimensional vector space over a finite field (with the finite subspaces then being a good choice for the Folner sequence). Here, the analogue of the structural theory of Host and Kra was worked out by Vitaly, Tamar, and myself in these previous papers (treating the high characteristic and low characteristic cases respectively). In the finite field setting, it turns out that nilsystems no longer appear, and one only needs to deal with linear, quadratic, and higher order eigenfunctions (known collectively as phase polynomials). It is then natural to look for a limit formula that asserts, roughly speaking, that if is drawn at random from a -system and drawn randomly from a large subspace of , then the only constraints between are those that arise from phase polynomials. The main theorem of this paper is to establish this limit formula (which, again, is a little complicated to state explicitly and will not be done here). In particular, we establish for the first time that the limit actually exists (a result which, for -systems, was one of the main results of this paper of Host and Kra).
As a consequence, we can recover finite field analogues of most of the results of Bergelson-Host-Kra, though interestingly some of the counterexamples demonstrating sharpness of their results for -systems (based on Behrend set constructions) do not seem to be present in the finite field setting (cf. this previous blog post on the cap set problem). In particular, we are able to largely settle the question of when one has a Khintchine-type theorem that asserts that for any measurable set in an ergodic -system and any , one has
for a syndetic set of , where are distinct residue classes. It turns out that Khintchine-type theorems always hold for (and for ergodicity is not required), and for it holds whenever form a parallelogram, but not otherwise (though the counterexample here was such a painful computation that we ended up removing it from the paper, and may end up putting it online somewhere instead), and for larger we could show that the Khintchine property failed for generic choices of , though the problem of determining exactly the tuples for which the Khintchine property failed looked to be rather messy and we did not completely settle it.
One of the basic objects of study in combinatorics are finite strings or infinite strings of symbols from some given alphabet , which could be either finite or infinite (but which we shall usually take to be compact). For instance, a set of natural numbers can be identified with the infinite string of s and s formed by the indicator of , e.g. the even numbers can be identified with the string from the alphabet , the multiples of three can be identified with the string , and so forth. One can also consider doubly infinite strings , which among other things can be used to describe arbitrary subsets of integers.
On the other hand, the basic object of study in dynamics (and in related fields, such as ergodic theory) is that of a dynamical system $(X,T)$, that is to say a space $X$ together with a shift map $T: X \to X$ (which is often assumed to be invertible, although one can certainly study non-invertible dynamical systems as well). One often adds additional structure to this dynamical system, such as topological structure (giving rise to topological dynamics), measure-theoretic structure (giving rise to ergodic theory), complex structure (giving rise to complex dynamics), and so forth. A dynamical system gives rise to an action of the natural numbers $\mathbb{N}$ on the space $X$ by using the iterates $T^n: X \to X$ of $T$ for $n = 0, 1, 2, \ldots$; if $T$ is invertible, we can extend this action to an action of the integers $\mathbb{Z}$ on the same space. One can certainly also consider dynamical systems whose underlying group (or semi-group) is something other than $\mathbb{N}$ or $\mathbb{Z}$ (e.g. one can consider continuous dynamical systems in which the evolution group is $\mathbb{R}$), but we will restrict attention to the classical situation of $\mathbb{N}$ or $\mathbb{Z}$ actions here.
There is a fundamental correspondence principle connecting the study of strings (or subsets of natural numbers or integers) with the study of dynamical systems. In one direction, given a dynamical system , an observable taking values in some alphabet , and some initial datum , we can first form the forward orbit of , and then observe this orbit using to obtain an infinite string . If the shift in this system is invertible, one can extend this infinite string into a doubly infinite string . Thus we see that every quadruplet consisting of a dynamical system , an observable , and an initial datum creates an infinite string.
Example 1 If is the three-element set with the shift map , is the observable that takes the value at the residue class and zero at the other two classes, and one starts with the initial datum , then the observed string becomes the indicator of the multiples of three.
In the converse direction, every infinite string $(a_n)_{n=0}^\infty$ in some alphabet $A$ arises (in a decidedly non-unique fashion) from a quadruple $(X, T, c, x_0)$ in the above fashion. This can be easily seen by the following “universal” construction: take $X := A^{\mathbb{N}}$ to be the set of infinite strings $(b_n)_{n=0}^\infty$ in the alphabet $A$, let $T: X \to X$ be the shift map
$\displaystyle T (b_n)_{n=0}^\infty := (b_{n+1})_{n=0}^\infty,$
let $c: X \to A$ be the observable
$\displaystyle c( (b_n)_{n=0}^\infty ) := b_0,$
and let $x_0 \in X$ be the initial point
$\displaystyle x_0 := (a_n)_{n=0}^\infty.$
Then one easily sees that the observed string $(c(T^n x_0))_{n=0}^\infty$ is nothing more than the original string $(a_n)_{n=0}^\infty$. Note also that this construction can easily be adapted to doubly infinite strings by using $A^{\mathbb{Z}}$ instead of $A^{\mathbb{N}}$, at which point the shift map $T$ now becomes invertible. An important variant of this construction also attaches an invariant probability measure to $X$ that is associated to the limiting density of various sets associated to the string $(a_n)_{n=0}^\infty$, and leads to the Furstenberg correspondence principle, discussed for instance in these previous blog posts. Such principles allow one to rigorously pass back and forth between the combinatorics of strings and the dynamics of systems; for instance, Furstenberg famously used his correspondence principle to demonstrate the equivalence of Szemerédi’s theorem on arithmetic progressions with what is now known as the Furstenberg multiple recurrence theorem in ergodic theory.
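Both directions of the correspondence are easy to simulate. The following Python sketch is an illustration only: it assumes that Example 1 refers to the rotation $x \mapsto x + 1$ on $\mathbb{Z}/3\mathbb{Z}$ observed by the indicator of the class of $0$, and it models a point of the universal space (an infinite string) as a function from indices to symbols. The names `observed_string`, `shift` and `first_symbol` are invented here.

```python
from itertools import islice

def observed_string(T, c, x0):
    """Yield c(x0), c(T x0), c(T^2 x0), ... for a system (X, T), observable c, datum x0."""
    x = x0
    while True:
        yield c(x)
        x = T(x)

# Example 1: rotation on Z/3Z, observable = indicator of the class 0, datum 0.
example_1 = observed_string(lambda x: (x + 1) % 3, lambda x: 1 if x == 0 else 0, 0)

# Universal construction: a point of X is a whole string, modelled as a function n -> symbol.
def shift(string):
    return lambda n: string(n + 1)

def first_symbol(string):
    return string(0)

multiples_of_three = lambda n: 1 if n % 3 == 0 else 0
universal = observed_string(shift, first_symbol, multiples_of_three)

print(list(islice(example_1, 9)))
print(list(islice(universal, 9)))
```

Both generators produce the indicator of the multiples of three, but the first does so from a three-point system while the second carries the entire string around as its state.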
In the case when the alphabet is the binary alphabet , and (for technical reasons related to the infamous non-injectivity of the decimal representation system) the string does not end with an infinite string of s, then one can reformulate the above universal construction by taking to be the interval , to be the doubling map , to be the observable that takes the value on and on (that is, is the first binary digit of ), and is the real number (that is, in binary).
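Here is the interval model in action, as a minimal sketch under the assumption that the observable takes the value $1$ on $[1/2, 1)$ and $0$ on $[0, 1/2)$. The string $100100100\ldots$ corresponds to the real number $0.100100\ldots_2 = 4/7$, and exact rational arithmetic avoids the rounding problems that floating point would cause under repeated doubling. The function names are chosen for this example.

```python
from fractions import Fraction

def doubling_map(x: Fraction) -> Fraction:
    """T(x) = 2x mod 1 on the interval [0, 1)."""
    return (2 * x) % 1

def first_binary_digit(x: Fraction) -> int:
    """Observable: 1 on [1/2, 1) and 0 on [0, 1/2)."""
    return 1 if x >= Fraction(1, 2) else 0

# 0.100100100..._2 = (1/2) / (1 - 1/8) = 4/7 encodes the indicator of the multiples of three.
x = Fraction(4, 7)
digits = []
for _ in range(9):
    digits.append(first_binary_digit(x))
    x = doubling_map(x)

print(digits)   # [1, 0, 0, 1, 0, 0, 1, 0, 0]
```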
The above universal construction is very easy to describe, and is well suited for “generic” strings that have no further obvious structure to them, but it often leads to dynamical systems that are much larger and more complicated than is actually needed to produce the desired string , and also often obscures some of the key dynamical features associated to that sequence. For instance, to generate the indicator of the multiples of three that were mentioned previously, the above universal construction requires an uncountable space and a dynamics which does not obviously reflect the key features of the sequence such as its periodicity. (Using the unit interval model, the dynamics arise from the orbit of under the doubling map, which is a rather artificial way to describe the indicator function of the multiples of three.)
A related aesthetic objection to the universal construction is that of the four components of the quadruplet used to generate the sequence , three of the components are completely universal (in that they do not depend at all on the sequence ), leaving only the initial datum to carry all the distinctive features of the original sequence. While there is nothing wrong with this mathematically, from a conceptual point of view it would make sense to have all four components of the quadruplet adapted to the sequence, in order to take advantage of the accumulated intuition about various special dynamical systems (and special observables), not just special initial data.
One step in this direction can be made by restricting to the orbit of the initial datum (actually for technical reasons it is better to restrict to the topological closure of this orbit, in order to keep compact). For instance, starting with the sequence , the orbit now consists of just three points , , , bringing the system more in line with the example in Example 1. Technically, this is the “optimal” representation of the sequence by a quadruplet , because any other such representation is a factor of this representation (in the sense that there is a unique map with , , and ). However, from a conceptual point of view this representation is still somewhat unsatisfactory, given that the elements of the system are interpreted as infinite strings rather than elements of a more geometrically or algebraically rich object (e.g. points in a circle, torus, or other homogeneous space).
For general sequences , locating relevant geometric or algebraic structure in a dynamical system generating that sequence is an important but very difficult task (see e.g. this paper of Host and Kra, which is more or less devoted to precisely this task in the context of working out what component of a dynamical system controls the multiple recurrence behaviour of that system). However, for specific examples of sequences , one can use an informal procedure of educated guesswork in order to produce a more natural-looking quadruple that generates that sequence. This is not a particularly difficult or deep operation, but I found it very helpful in internalising the intuition behind the correspondence principle. Being non-rigorous, this procedure does not seem to be emphasised in most presentations of the correspondence principle, so I thought I would describe it here.
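Let $G = (G,+)$ be an abelian countable discrete group. A measure-preserving $G$-system $X = (X, \mathcal{X}, \mu, (T_g)_{g \in G})$ (or $G$-system for short) is a probability space $(X, \mathcal{X}, \mu)$, equipped with a measure-preserving action $T_g: X \to X$ of the group $G$, thus
$\displaystyle \mu( T_g(E) ) = \mu(E)$
for all $E \in \mathcal{X}$ and $g \in G$, and
$\displaystyle T_g T_h = T_{g+h}$
for all $g, h \in G$, with $T_0$ equal to the identity map. Classically, ergodic theory has focused on the cyclic case $G = \mathbb{Z}$ (in which the $T_g$ are iterates of a single map $T = T_1$, with elements of $G$ being interpreted as a time parameter), but one can certainly consider actions of other groups also (including continuous or non-abelian groups).
A $G$-system is said to be strongly $2$-mixing, or strongly mixing for short, if one has
$\displaystyle \lim_{g \to \infty} \mu( A \cap T_g B ) = \mu(A) \mu(B)$
for all $A, B \in \mathcal{X}$, where the convergence is with respect to the one-point compactification of $G$ (thus, for every $\varepsilon > 0$, there exists a compact (hence finite) subset $K$ of $G$ such that $|\mu(A \cap T_g B) - \mu(A) \mu(B)| \leq \varepsilon$ for all $g \notin K$).
Similarly, we say that a $G$-system is strongly $3$-mixing if one has
$\displaystyle \lim_{g, h, h-g \to \infty} \mu( A \cap T_g B \cap T_h C ) = \mu(A) \mu(B) \mu(C)$
for all $A, B, C \in \mathcal{X}$, thus for every $\varepsilon > 0$, there exists a finite subset $K$ of $G$ such that
$\displaystyle |\mu( A \cap T_g B \cap T_h C ) - \mu(A) \mu(B) \mu(C)| \leq \varepsilon$
whenever $g, h, h-g$ all lie outside $K$.
It is obvious that a strongly $3$-mixing system is necessarily strongly $2$-mixing. In the case of $\mathbb{Z}$-systems, it has been an open problem for some time, due to Rohlin, whether the converse is true:
Problem 1 (Rohlin’s problem) Is every strongly mixing $\mathbb{Z}$-system necessarily strongly $3$-mixing?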
This is a surprisingly difficult problem. In the positive direction, a routine application of the Cauchy-Schwarz inequality (via van der Corput’s inequality) shows that every strongly mixing system is weakly -mixing, which roughly speaking means that converges to for most . Indeed, every weakly mixing system is in fact weakly mixing of all orders; see for instance this blog post of Carlos Matheus, or these lecture notes of myself. So the problem is to exclude the possibility of correlation between , , and for a small but non-trivial number of pairs .
It is also known that the answer to Rohlin’s problem is affirmative for rank one transformations (a result of Kalikow) and for shifts with purely singular continuous spectrum (a result of Host; note that strongly mixing systems cannot have any non-trivial point spectrum). Indeed, any counterexample to the problem, if it exists, is likely to be highly pathological.
In the other direction, Rohlin’s problem is known to have a negative answer for -systems, by a well-known counterexample of Ledrappier which can be described as follows. One can view a -system as being essentially equivalent to a stationary process of random variables in some range space indexed by , with being with the obvious shift map
In Ledrappier’s example, the take values in the finite field of two elements, and are selected uniformly at random subject to the “Pascal’s triangle” linear constraints
A routine application of the Kolmogorov extension theorem allows one to build such a process. The point is that due to the properties of Pascal’s triangle modulo (known as Sierpinski’s triangle), one has
for all powers of two . This is enough to destroy strong -mixing, because it shows a strong correlation between , , and for arbitrarily large and randomly chosen . On the other hand, one can still show that and are asymptotically uncorrelated for large , giving strong -mixing. Unfortunately, there are significant obstructions to converting Ledrappier’s example from a -system to a -system, as pointed out by de la Rue.
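The self-similarity that breaks strong $3$-mixing is easy to observe on a finite patch of the process. The sketch below is a toy model with an indexing convention of my own choosing: each row is obtained from the previous one by the rule $x_{i,j+1} = x_{i,j} + x_{i+1,j}$ in the field of two elements, starting from a pseudorandom bottom row. It then checks that the same three-term relation persists at every power-of-two separation, which is the Frobenius identity $(1+S)^{2^k} = 1 + S^{2^k}$ in characteristic two for the shift operator $S$. The function names are invented for this example.

```python
import random

def pascal_rows(bottom_row, height):
    """Rows x[j][i] over F_2 with x[j+1][i] = x[j][i] + x[j][i+1] (mod 2)."""
    rows = [list(bottom_row)]
    for _ in range(height):
        prev = rows[-1]
        rows.append([(prev[i] + prev[i + 1]) % 2 for i in range(len(prev) - 1)])
    return rows

rng = random.Random(1)                       # arbitrary seed; the identity holds for any row
width, height = 200, 64
x = pascal_rows([rng.randint(0, 1) for _ in range(width)], height)

def long_range_constraint_holds(x, step):
    """Check x[j+step][i] = x[j][i] + x[j][i+step] (mod 2) wherever all three entries exist."""
    return all(
        x[j + step][i] == (x[j][i] + x[j][i + step]) % 2
        for j in range(len(x) - step)
        for i in range(len(x[j + step]))
    )

print({step: long_range_constraint_holds(x, step) for step in (1, 2, 3, 4, 8, 16, 32, 64)})
```

The relation holds for every power-of-two step, so three entries at arbitrarily large mutual separation remain linearly dependent; for a step such as 3 there is no reason for it to hold, and on a random row it generically fails.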
In this post, I would like to record a “finite field” variant of Ledrappier’s construction, in which $\mathbb{Z}^2$ is replaced by the function field ring $\mathbb{F}_3[t]$, which is a “dyadic” (or more precisely, “triadic”) model for the integers (cf. this earlier blog post of mine). In other words:
Theorem 2 There exists an $\mathbb{F}_3[t]$-system that is strongly $2$-mixing but not strongly $3$-mixing.
The idea is much the same as that of Ledrappier; one builds a stationary -process in which are chosen uniformly at random subject to the constraints
for all and all . Again, this system is manifestly not strongly -mixing, but can be shown to be strongly -mixing; I give details below the fold.
As I discussed in this previous post, in many cases the dyadic model serves as a good guide for the non-dyadic model. However, in this case there is a curious rigidity phenomenon that seems to prevent Ledrappier-type examples from being transferable to the one-dimensional non-dyadic setting; once one restores the Archimedean nature of the underlying group, the constraints (1) not only reinforce each other strongly, but also force so much linearity on the system that one loses the strong mixing property.
I have recently finished a draft version of my blog book “Poincaré’s legacies: pages from year two of a mathematical blog“, which covers all the mathematical posts from my blog in 2008, excluding those posts which primarily originated from other authors or speakers.
The draft is much longer – 694 pages – than the analogous draft from 2007 (which was 374 pages using the same style files). This is largely because of the two series of course lecture notes which dominate the book (and inspired its title), namely on ergodic theory and on the Poincaré conjecture. I am talking with the AMS staff about the possibility of splitting the book into two volumes, one focusing on ergodic theory, number theory, and combinatorics, and the other focusing on geometry, topology, and PDE (though there will certainly be miscellaneous sections that will basically be divided arbitrarily amongst the two volumes).
The draft probably also needs an index, which I will attend to at some point before publication.
As in the previous book, those comments and corrections from readers which were of a substantive and mathematical nature have been acknowledged in the text. In many cases, I was only able to refer to commenters by their internet handles; please email me if you wish to be attributed differently (or not to be attributed at all).
Any other suggestions, corrections, etc. are, of course, welcome.
I learned some technical tricks for HTML to LaTeX conversion which made the process significantly faster than last year’s, although still rather tedious and time consuming; I thought I might share them below as they may be of use to anyone else contemplating a similar conversion.