You are currently browsing the category archive for the ‘math.DS’ category.

Tamar Ziegler and I have just uploaded to the arXiv two related papers: “Concatenation theorems for anti-Gowers-uniform functions and Host-Kra characteoristic factors” and “polynomial patterns in primes“, with the former developing a “quantitative Bessel inequality” for local Gowers norms that is crucial in the latter.

We use the term “concatenation theorem” to denote results in which structural control of a function in two or more “directions” can be “concatenated” into structural control in a *joint* direction. A trivial example of such a concatenation theorem is the following: if a function is constant in the first variable (thus is constant for each ), and also constant in the second variable (thus is constant for each ), then it is constant in the joint variable . A slightly less trivial example: if a function is affine-linear in the first variable (thus, for each , there exist such that for all ) and affine-linear in the second variable (thus, for each , there exist such that for all ) then is a quadratic polynomial in ; in fact it must take the form

for some real numbers . (This can be seen for instance by using the affine linearity in to show that the coefficients are also affine linear.)

The same phenomenon extends to higher degree polynomials. Given a function from one additive group to another, we say that is of *degree less than * along a subgroup of if all the -fold iterated differences of along directions in vanish, that is to say

for all and , where is the difference operator

(We adopt the convention that the only of degree less than is the zero function.)

We then have the following simple proposition:

Proposition 1 (Concatenation of polynomiality)Let be of degree less than along one subgroup of , and of degree less than along another subgroup of , for some . Then is of degree less than along the subgroup of .

Note the previous example was basically the case when , , , , and .

*Proof:* The claim is trivial for or (in which is constant along or respectively), so suppose inductively and the claim has already been proven for smaller values of .

We take a derivative in a direction along to obtain

where is the shift of by . Then we take a further shift by a direction to obtain

leading to the *cocycle equation*

Since has degree less than along and degree less than along , has degree less than along and less than along , so is degree less than along by induction hypothesis. Similarly is also of degree less than along . Combining this with the cocycle equation we see that is of degree less than along for any , and hence is of degree less than along , as required.

While this proposition is simple, it already illustrates some basic principles regarding how one would go about proving a concatenation theorem:

- (i) One should perform induction on the degrees involved, and take advantage of the recursive nature of degree (in this case, the fact that a function is of less than degree along some subgroup of directions iff all of its first derivatives along are of degree less than ).
- (ii) Structure is preserved by operations such as addition, shifting, and taking derivatives. In particular, if a function is of degree less than along some subgroup , then any derivative of is also of degree less than along ,
*even if does not belong to*.

Here is another simple example of a concatenation theorem. Suppose an at most countable additive group acts by measure-preserving shifts on some probability space ; we call the pair (or more precisely ) a *-system*. We say that a function is a *generalised eigenfunction of degree less than * along some subgroup of and some if one has

almost everywhere for all , and some functions of degree less than along , with the convention that a function has degree less than if and only if it is equal to . Thus for instance, a function is an generalised eigenfunction of degree less than along if it is constant on almost every -ergodic component of , and is a generalised function of degree less than along if it is an eigenfunction of the shift action on almost every -ergodic component of . A basic example of a higher order eigenfunction is the function on the *skew shift* with action given by the generator for some irrational . One can check that for every integer , where is a generalised eigenfunction of degree less than along , so is of degree less than along .

We then have

Proposition 2 (Concatenation of higher order eigenfunctions)Let be a -system, and let be a generalised eigenfunction of degree less than along one subgroup of , and a generalised eigenfunction of degree less than along another subgroup of , for some . Then is a generalised eigenfunction of degree less than along the subgroup of .

The argument is almost identical to that of the previous proposition and is left as an exercise to the reader. The key point is the point (ii) identified earlier: the space of generalised eigenfunctions of degree less than along is preserved by multiplication and shifts, as well as the operation of “taking derivatives” even along directions that do not lie in . (To prove this latter claim, one should restrict to the region where is non-zero, and then divide by to locate .)

A typical example of this proposition in action is as follows: consider the -system given by the -torus with generating shifts

for some irrational , which can be checked to give a action

The function can then be checked to be a generalised eigenfunction of degree less than along , and also less than along , and less than along . One can view this example as the dynamical systems translation of the example (1) (see this previous post for some more discussion of this sort of correspondence).

The main results of our concatenation paper are analogues of these propositions concerning a more complicated notion of “polynomial-like” structure that are of importance in additive combinatorics and in ergodic theory. On the ergodic theory side, the notion of structure is captured by the *Host-Kra characteristic factors* of a -system along a subgroup . These factors can be defined in a number of ways. One is by duality, using the *Gowers-Host-Kra uniformity seminorms* (defined for instance here) . Namely, is the factor of defined up to equivalence by the requirement that

An equivalent definition is in terms of the *dual functions* of along , which can be defined recursively by setting and

where denotes the ergodic average along a Følner sequence in (in fact one can also define these concepts in non-amenable abelian settings as per this previous post). The factor can then be alternately defined as the factor generated by the dual functions for .

In the case when and is -ergodic, a deep theorem of Host and Kra shows that the factor is equivalent to the inverse limit of nilsystems of step less than . A similar statement holds with replaced by any finitely generated group by Griesmer, while the case of an infinite vector space over a finite field was treated in this paper of Bergelson, Ziegler, and myself. The situation is more subtle when is not -ergodic, or when is -ergodic but is a proper subgroup of acting non-ergodically, when one has to start considering measurable families of directional nilsystems; see for instance this paper of Austin for some of the subtleties involved (for instance, higher order group cohomology begins to become relevant!).

One of our main theorems is then

Proposition 3 (Concatenation of characteristic factors)Let be a -system, and let be measurable with respect to the factor and with respect to the factor for some and some subgroups of . Then is also measurable with respect to the factor .

We give two proofs of this proposition in the paper; an ergodic-theoretic proof using the Host-Kra theory of “cocycles of type (along a subgroup )”, which can be used to inductively describe the factors , and a combinatorial proof based on a combinatorial analogue of this proposition which is harder to state (but which roughly speaking asserts that a function which is nearly orthogonal to all bounded functions of small norm, and also to all bounded functions of small norm, is also nearly orthogonal to alll bounded functions of small norm). The combinatorial proof parallels the proof of Proposition 2. A key point is that dual functions obey a property analogous to being a generalised eigenfunction, namely that

where and is a “structured function of order ” along . (In the language of this previous paper of mine, this is an assertion that dual functions are uniformly almost periodic of order .) Again, the point (ii) above is crucial, and in particular it is key that any structure that has is inherited by the associated functions and . This sort of inheritance is quite easy to accomplish in the ergodic setting, as there is a ready-made language of factors to encapsulate the concept of structure, and the shift-invariance and -algebra properties of factors make it easy to show that just about any “natural” operation one performs on a function measurable with respect to a given factor, returns a function that is still measurable in that factor. In the finitary combinatorial setting, though, encoding the fact (ii) becomes a remarkably complicated notational nightmare, requiring a huge amount of “epsilon management” and “second-order epsilon management” (in which one manages not only scalar epsilons, but also function-valued epsilons that depend on other parameters). In order to avoid all this we were forced to utilise a nonstandard analysis framework for the combinatorial theorems, which made the arguments greatly resemble the ergodic arguments in many respects (though the two settings are still not equivalent, see this previous blog post for some comparisons between the two settings). Unfortunately the arguments are still rather complicated.

For combinatorial applications, dual formulations of the concatenation theorem are more useful. A direct dualisation of the theorem yields the following decomposition theorem: a bounded function which is small in norm can be split into a component that is small in norm, and a component that is small in norm. (One may wish to understand this type of result by first proving the following baby version: any function that has mean zero on every coset of , can be decomposed as the sum of a function that has mean zero on every coset, and a function that has mean zero on every coset. This is dual to the assertion that a function that is constant on every coset and constant on every coset, is constant on every coset.) Combining this with some standard “almost orthogonality” arguments (i.e. Cauchy-Schwarz) give the following Bessel-type inequality: if one has a lot of subgroups and a bounded function is small in norm for most , then it is also small in norm for most . (Here is a baby version one may wish to warm up on: if a function has small mean on for some large prime , then it has small mean on most of the cosets of most of the one-dimensional subgroups of .)

There is also a generalisation of the above Bessel inequality (as well as several of the other results mentioned above) in which the subgroups are replaced by more general *coset progressions* (of bounded rank), so that one has a Bessel inequailty controlling “local” Gowers uniformity norms such as by “global” Gowers uniformity norms such as . This turns out to be particularly useful when attempting to compute polynomial averages such as

for various functions . After repeated use of the van der Corput lemma, one can control such averages by expressions such as

(actually one ends up with more complicated expressions than this, but let’s use this example for sake of discussion). This can be viewed as an average of various Gowers uniformity norms of along arithmetic progressions of the form for various . Using the above Bessel inequality, this can be controlled in turn by an average of various Gowers uniformity norms along rank two generalised arithmetic progressions of the form for various . But for generic , this rank two progression is close in a certain technical sense to the “global” interval (this is ultimately due to the basic fact that two randomly chosen large integers are likely to be coprime, or at least have a small gcd). As a consequence, one can use the concatenation theorems from our first paper to control expressions such as (2) in terms of *global* Gowers uniformity norms. This is important in number theoretic applications, when one is interested in computing sums such as

or

where and are the Möbius and von Mangoldt functions respectively. This is because we are able to control global Gowers uniformity norms of such functions (thanks to results such as the proof of the inverse conjecture for the Gowers norms, the orthogonality of the Möbius function with nilsequences, and asymptotics for linear equations in primes), but much less control is currently available for local Gowers uniformity norms, even with the assistance of the generalised Riemann hypothesis (see this previous blog post for some further discussion).

By combining these tools and strategies with the “transference principle” approach from our previous paper (as improved using the recent “densification” technique of Conlon, Fox, and Zhao, discussed in this previous post), we are able in particular to establish the following result:

Theorem 4 (Polynomial patterns in the primes)Let be polynomials of degree at most , whose degree coefficients are all distinct, for some . Suppose that is admissible in the sense that for every prime , there are such that are all coprime to . Then there exist infinitely many pairs of natural numbers such that are prime.

Furthermore, we obtain an asymptotic for the number of such pairs in the range , (actually for minor technical reasons we reduce the range of to be very slightly less than ). In fact one could in principle obtain asymptotics for smaller values of , and relax the requirement that the degree coefficients be distinct with the requirement that no two of the differ by a constant, provided one had good enough local uniformity results for the Möbius or von Mangoldt functions. For instance, we can obtain an asymptotic for triplets of the form unconditionally for , and conditionally on GRH for all , using known results on primes in short intervals on average.

The case of this theorem was obtained in a previous paper of myself and Ben Green (using the aforementioned conjectures on the Gowers uniformity norm and the orthogonality of the Möbius function with nilsequences, both of which are now proven). For higher , an older result of Tamar and myself was able to tackle the case when (though our results there only give lower bounds on the number of pairs , and no asymptotics). Both of these results generalise my older theorem with Ben Green on the primes containing arbitrarily long arithmetic progressions. The theorem also extends to multidimensional polynomials, in which case there are some additional previous results; see the paper for more details. We also get a technical refinement of our previous result on narrow polynomial progressions in (dense subsets of) the primes by making the progressions just a little bit narrower in the case of the density of the set one is using is small.

. This latter Bessel type inequality is particularly useful in combinatorial and number-theoretic applications, as it allows one to convert “global” Gowers uniformity norm (basically, bounds on norms such as ) to “local” Gowers uniformity norm control.

Note: this post is of a particularly technical nature, in particular presuming familiarity with nilsequences, nilsystems, characteristic factors, etc., and is primarily intended for experts.

As mentioned in the previous post, Ben Green, Tamar Ziegler, and myself proved the following inverse theorem for the Gowers norms:

Theorem 1 (Inverse theorem for Gowers norms)Let and be integers, and let . Suppose that is a function supported on such thatThen there exists a filtered nilmanifold of degree and complexity , a polynomial sequence , and a Lipschitz function of Lipschitz constant such that

This result was conjectured earlier by Ben Green and myself; this conjecture was strongly motivated by an analogous inverse theorem in ergodic theory by Host and Kra, which we formulate here in a form designed to resemble Theorem 1 as closely as possible:

Theorem 2 (Inverse theorem for Gowers-Host-Kra seminorms)Let be an integer, and let be an ergodic, countably generated measure-preserving system. Suppose that one hasfor all non-zero (all spaces are real-valued in this post). Then is an inverse limit (in the category of measure-preserving systems, up to almost everywhere equivalence) of ergodic degree nilsystems, that is to say systems of the form for some degree filtered nilmanifold and a group element that acts ergodically on .

It is a natural question to ask if there is any logical relationship between the two theorems. In the finite field category, one can deduce the combinatorial inverse theorem from the ergodic inverse theorem by a variant of the Furstenberg correspondence principle, as worked out by Tamar Ziegler and myself, however in the current context of -actions, the connection is less clear.

One can split Theorem 2 into two components:

Theorem 3 (Weak inverse theorem for Gowers-Host-Kra seminorms)Let be an integer, and let be an ergodic, countably generated measure-preserving system. Suppose that one hasfor all non-zero , where . Then is a

factorof an inverse limit of ergodic degree nilsystems.

Theorem 4 (Pro-nilsystems closed under factors)Let be an integer. Then any factor of an inverse limit of ergodic degree nilsystems, is again an inverse limit of ergodic degree nilsystems.

Indeed, it is clear that Theorem 2 implies both Theorem 3 and Theorem 4, and conversely that the two latter theorems jointly imply the former. Theorem 4 is, in principle, purely a fact about nilsystems, and should have an independent proof, but this is not known; the only known proofs go through the full machinery needed to prove Theorem 2 (or the closely related theorem of Ziegler). (However, the fact that a factor of a nilsystem is again a nilsystem was established previously by Parry.)

The purpose of this post is to record a partial implication in reverse direction to the correspondence principle:

As mentioned at the start of the post, a fair amount of familiarity with the area is presumed here, and some routine steps will be presented with only a fairly brief explanation.

A few years ago, Ben Green, Tamar Ziegler, and myself proved the following (rather technical-looking) inverse theorem for the Gowers norms:

Theorem 1 (Discrete inverse theorem for Gowers norms)Let and be integers, and let . Suppose that is a function supported on such thatThen there exists a filtered nilmanifold of degree and complexity , a polynomial sequence , and a Lipschitz function of Lipschitz constant such that

For the definitions of “filtered nilmanifold”, “degree”, “complexity”, and “polynomial sequence”, see the paper of Ben, Tammy, and myself. (I should caution the reader that this blog post will presume a fair amount of familiarity with this subfield of additive combinatorics.) This result has a number of applications, for instance to establishing asymptotics for linear equations in the primes, but this will not be the focus of discussion here.

The purpose of this post is to record the observation that this “discrete” inverse theorem, together with an equidistribution theorem for nilsequences that Ben and I worked out in a separate paper, implies a continuous version:

Theorem 2 (Continuous inverse theorem for Gowers norms)Let be an integer, and let . Suppose that is a measurable function supported on such thatThen there exists a filtered nilmanifold of degree and complexity , a (smooth) polynomial sequence , and a Lipschitz function of Lipschitz constant such that

The interval can be easily replaced with any other fixed interval by a change of variables. A key point here is that the bounds are completely uniform in the choice of . Note though that the coefficients of can be arbitrarily large (and this is necessary, as can be seen just by considering functions of the form for some arbitrarily large frequency ).

It is likely that one could prove Theorem 2 by carefully going through the proof of Theorem 1 and replacing all instances of with (and making appropriate modifications to the argument to accommodate this). However, the proof of Theorem 1 is quite lengthy. Here, we shall proceed by the usual limiting process of viewing the continuous interval as a limit of the discrete interval as . However there will be some problems taking the limit due to a failure of compactness, and specifically with regards to the coefficients of the polynomial sequence produced by Theorem 1, after normalising these coefficients by . Fortunately, a factorisation theorem from a paper of Ben Green and myself resolves this problem by splitting into a “smooth” part which does enjoy good compactness properties, as well as “totally equidistributed” and “periodic” parts which can be eliminated using the measurability (and thus, approximate smoothness), of .

Szemerédi’s theorem asserts that any subset of the integers of positive upper density contains arbitrarily large arithmetic progressions. Here is an equivalent quantitative form of this theorem:

Theorem 1 (Szemerédi’s theorem)Let be a positive integer, and let be a function with for some , where we use the averaging notation , , etc.. Then for we havefor some depending only on .

The equivalence is basically thanks to an averaging argument of Varnavides; see for instance Chapter 11 of my book with Van Vu or this previous blog post for a discussion. We have removed the cases as they are trivial and somewhat degenerate.

There are now many proofs of this theorem. Some time ago, I took an ergodic-theoretic proof of Furstenberg and converted it to a purely finitary proof of the theorem. The argument used some simplifying innovations that had been developed since the original work of Furstenberg (in particular, deployment of the Gowers uniformity norms, as well as a “dual” norm that I called the uniformly almost periodic norm, and an emphasis on van der Waerden’s theorem for handling the “compact extension” component of the argument). But the proof was still quite messy. However, as discussed in this previous blog post, messy finitary proofs can often be cleaned up using nonstandard analysis. Thus, there should be a nonstandard version of the Furstenberg ergodic theory argument that is relatively clean. I decided (after some encouragement from Ben Green and Isaac Goldbring) to write down most of the details of this argument in this blog post, though for sake of brevity I will skim rather quickly over arguments that were already discussed at length in other blog posts. In particular, I will presume familiarity with nonstandard analysis (in particular, the notion of a standard part of a bounded real number, and the Loeb measure construction), see for instance this previous blog post for a discussion.

I’ve just uploaded to the arXiv my paper “Failure of the pointwise and maximal ergodic theorems for the free group“, submitted to Forum of Mathematics, Sigma. This paper concerns a variant of the pointwise ergodic theorem of Birkhoff, which asserts that if one has a measure-preserving shift map on a probability space , then for any , the averages converge pointwise almost everywhere. (In the important case when the shift map is ergodic, the pointwise limit is simply the mean of the original function .)

The pointwise ergodic theorem can be extended to measure-preserving actions of other amenable groups, if one uses a suitably “tempered” Folner sequence of averages; see this paper of Lindenstrauss for more details. (I also wrote up some notes on that paper here, back in 2006 before I had started this blog.) But the arguments used to handle the amenable case break down completely for non-amenable groups, and in particular for the free non-abelian group on two generators.

Nevo and Stein studied this problem and obtained a number of pointwise ergodic theorems for -actions on probability spaces . For instance, for the spherical averaging operators

(where denotes the length of the reduced word that forms ), they showed that converged pointwise almost everywhere provided that was in for some . (The need to restrict to spheres of even radius can be seen by considering the action of on the two-element set in which both generators of act by interchanging the elements, in which case is determined by the parity of .) This result was reproven with a different and simpler proof by Bufetov, who also managed to relax the condition to the weaker condition .

The question remained open as to whether the pointwise ergodic theorem for -actions held if one only assumed that was in . Nevo and Stein were able to establish this for the Cesáro averages , but not for itself. About six years ago, Assaf Naor and I tried our hand at this problem, and was able to show an associated maximal inequality on , but due to the non-amenability of , this inequality did not transfer to and did not have any direct impact on this question, despite a fair amount of effort on our part to attack it.

Inspired by some recent conversations with Lewis Bowen, I returned to this problem. This time around, I tried to construct a counterexample to the pointwise ergodic theorem – something Assaf and I had not seriously attempted to do (perhaps due to being a bit too enamoured of our maximal inequality). I knew of an existing counterexample of Ornstein regarding a failure of an ergodic theorem for iterates of a self-adjoint Markov operator – in fact, I had written some notes on this example back in 2007. Upon revisiting my notes, I soon discovered that the Ornstein construction was adaptable to the setting, thus settling the problem in the negative:

Theorem 1 (Failure of pointwise ergodic theorem)There exists a measure-preserving -action on a probability space and a non-negative function such that for almost every .

To describe the proof of this theorem, let me first briefly sketch the main ideas of Ornstein’s construction, which gave an example of a self-adjoint Markov operator on a probability space and a non-negative such that for almost every . By some standard manipulations, it suffices to show that for any given and , there exists a self-adjoint Markov operator on a probability space and a non-negative with , such that on a set of measure at least . Actually, it will be convenient to replace the Markov chain with an *ancient Markov chain* – that is to say, a sequence of non-negative functions for both positive and negative , such that for all . The purpose of requiring the Markov chain to be ancient (that is, to extend infinitely far back in time) is to allow for the Markov chain to be shifted arbitrarily in time, which is key to Ornstein’s construction. (Technically, Ornstein’s original argument only uses functions that go back to a large negative time, rather than being infinitely ancient, but I will gloss over this point for sake of discussion, as it turns out that the version of the argument can be run using infinitely ancient chains.)

For any , let denote the claim that for any , there exists an ancient Markov chain with such that on a set of measure at least . Clearly holds since we can just take for all . Our objective is to show that holds for arbitrarily small . The heart of Ornstein’s argument is then the implication

for any , which upon iteration quickly gives the desired claim.

Let’s see informally how (1) works. By hypothesis, and ignoring epsilons, we can find an ancient Markov chain on some probability space of total mass , such that attains the value of or greater almost everywhere. Assuming that the Markov process is irreducible, the will eventually converge as to the constant value of , in particular its final state will essentially stay above (up to small errors).

Now suppose we duplicate the Markov process by replacing with a double copy (giving the uniform probability measure), and using the disjoint sum of the Markov operators on and as the propagator, so that there is no interaction between the two components of this new system. Then the functions form an ancient Markov chain of mass at most that lives solely in the first half of this copy, and attains the value of or greater on almost all of the first half , but is zero on the second half. The final state of will be to stay above in the first half , but be zero on the second half.

Now we modify the above example by allowing an infinitesimal amount of interaction between the two halves , of the system (I mentally think of and as two identical boxes that a particle can bounce around in, and now we wish to connect the boxes by a tiny tube). The precise way in which this interaction is inserted is not terribly important so long as the new Markov process is irreducible. Once one does so, then the ancient Markov chain in the previous example gets replaced by a slightly different ancient Markov chain which is more or less identical with for negative times , or for bounded positive times , but for very large values of the final state is now constant across the entire state space , and will stay above on this space.

Finally, we consider an ancient Markov chain which is basically of the form

for some large parameter and for all (the approximation becomes increasingly inaccurate for much larger than , but never mind this for now). This is basically two copies of the original Markov process in separate, barely interacting state spaces , but with the second copy delayed by a large time delay , and also attenuated in amplitude by a factor of . The total mass of this process is now . Because of the component of , we see that basically attains the value of or greater on the first half . On the second half , we work with times close to . If is large enough, would have averaged out to about at such times, but the component can get as large as here. Summing (and continuing to ignore various epsilon losses), we see that can get as large as on almost all of the second half of . This concludes the rough sketch of how one establishes the implication (1).

It was observed by Bufetov that the spherical averages for a free group action can be lifted up to become powers of a Markov operator, basically by randomly assigning a “velocity vector” to one’s base point and then applying the Markov process that moves along that velocity vector (and then randomly changing the velocity vector at each time step to the “reduced word” condition that the velocity never flips from to ). Thus the spherical average problem has a Markov operator interpretation, which opens the door to adapting the Ornstein construction to the setting of systems. This turns out to be doable after a certain amount of technical artifice; the main thing is to work with -measure preserving systems that admit ancient Markov chains that are initially supported in a very small region in the “interior” of the state space, so that one can couple such systems to each other “at the boundary” in the fashion needed to establish the analogue of (1) without disrupting the ancient dynamics of such chains. The initial such system (used to establish the base case ) comes from basically considering the action of on a (suitably renormalised) “infinitely large ball” in the Cayley graph, after suitably gluing together the boundary of this ball to complete the action. The ancient Markov chain associated to this system starts at the centre of this infinitely large ball at infinite negative time , and only reaches the boundary of this ball at the time .

An extremely large portion of mathematics is concerned with locating solutions to equations such as

for in some suitable domain space (either finite-dimensional or infinite-dimensional), and various maps or . To solve the fixed point iteration equation (1), the simplest general method available is the fixed point iteration method: one starts with an initial *approximate solution* to (1), so that , and then recursively constructs the sequence by . If behaves enough like a “contraction”, and the domain is complete, then one can expect the to converge to a limit , which should then be a solution to (1). For instance, if is a map from a metric space to itself, which is a contraction in the sense that

for all and some , then with as above we have

for any , and so the distances between successive elements of the sequence decay at at least a geometric rate. This leads to the contraction mapping theorem, which has many important consequences, such as the inverse function theorem and the Picard existence theorem.

A slightly more complicated instance of this strategy arises when trying to *linearise* a complex map defined in a neighbourhood of a fixed point. For simplicity we normalise the fixed point to be the origin, thus and . When studying the complex dynamics , , of such a map, it can be useful to try to conjugate to another function , where is a holomorphic function defined and invertible near with , since the dynamics of will be conjguate to that of . Note that if and , then from the chain rule any conjugate of will also have and . Thus, the “simplest” function one can hope to conjugate to is the linear function . Let us say that is *linearisable* (around ) if it is conjugate to in some neighbourhood of . Equivalently, is linearisable if there is a solution to the Schröder equation

for some defined and invertible in a neighbourhood of with , and all sufficiently close to . (The Schröder equation is normalised somewhat differently in the literature, but this form is equivalent to the usual form, at least when is non-zero.) Note that if solves the above equation, then so does for any non-zero , so we may normalise in addition to , which also ensures local invertibility from the inverse function theorem. (Note from winding number considerations that cannot be invertible near zero if vanishes.)

We have the following basic result of Koenigs:

Theorem 1 (Koenig’s linearisation theorem)Let be a holomorphic function defined near with and . If (attracting case) or (repelling case), then is linearisable near zero.

*Proof:* Observe that if solve (2), then solve (2) also (in a sufficiently small neighbourhood of zero). Thus we may reduce to the attractive case .

Let be a sufficiently small radius, and let denote the space of holomorphic functions on the complex disk with and . We can view the Schröder equation (2) as a fixed point equation

where is the partially defined function on that maps a function to the function defined by

assuming that is well-defined on the range of (this is why is only partially defined).

We can solve this equation by the fixed point iteration method, if is small enough. Namely, we start with being the identity map, and set , etc. We equip with the uniform metric . Observe that if , and is small enough, then takes values in , and are well-defined and lie in . Also, since is smooth and has derivative at , we have

if , and is sufficiently small depending on . This is not yet enough to establish the required contraction (thanks to Mario Bonk for pointing this out); but observe that the function is holomorphic on and bounded by on the boundary of this ball (or slightly within this boundary), so by the maximum principle we see that

on all of , and in particular

on . Putting all this together, we see that

since , we thus obtain a contraction on the ball if is small enough (and sufficiently small depending on ). From this (and the completeness of , which follows from Morera’s theorem) we see that the iteration converges (exponentially fast) to a limit which is a fixed point of , and thus solves Schröder’s equation, as required.

Koenig’s linearisation theorem leaves open the *indifferent case* when . In the *rationally indifferent* case when for some natural number , there is an obvious obstruction to linearisability, namely that (in particular, linearisation is not possible in this case when is a non-trivial rational function). An obstruction is also present in some *irrationally indifferent* cases (where but for any natural number ), if is sufficiently close to various roots of unity; the first result of this form is due to Cremer, and the optimal result of this type for quadratic maps was established by Yoccoz. In the other direction, we have the following result of Siegel:

Theorem 2 (Siegel’s linearisation theorem)Let be a holomorphic function defined near with and . If and one has the Diophantine condition for all natural numbers and some constant , then is linearisable at .

The Diophantine condition can be relaxed to a more general condition involving the rational exponents of the phase of ; this was worked out by Brjuno, with the condition matching the one later obtained by Yoccoz. Amusingly, while the set of Diophantine numbers (and hence the set of linearisable ) has full measure on the unit circle, the set of non-linearisable is generic (the complement of countably many nowhere dense sets) due to the above-mentioned work of Cremer, leading to a striking disparity between the measure-theoretic and category notions of “largeness”.

Siegel’s theorem does not seem to be provable using a fixed point iteration method. However, it can be established by modifying another basic method to solve equations, namely Newton’s method. Let us first review how this method works to solve the equation for some smooth function defined on an interval . We suppose we have some initial approximant to this equation, with small but not necessarily zero. To make the analysis more quantitative, let us suppose that the interval lies in for some , and we have the estimates

for some and and all (the factors of are present to make “dimensionless”).

Lemma 3Under the above hypotheses, we can find with such thatIn particular, setting , , and , we have , and

for all .

The crucial point here is that the new error is roughly the square of the previous error . This leads to extremely fast (double-exponential) improvement in the error upon iteration, which is more than enough to absorb the exponential losses coming from the factor.

*Proof:* If for some absolute constants then we may simply take , so we may assume that for some small and large . Using the Newton approximation we are led to the choice

for . From the hypotheses on and the smallness hypothesis on we certainly have . From Taylor’s theorem with remainder we have

and the claim follows.

We can iterate this procedure; starting with as above, we obtain a sequence of nested intervals with , and with evolving by the recursive equations and estimates

If is sufficiently small depending on , we see that converges rapidly to zero (indeed, we can inductively obtain a bound of the form for some large absolute constant if is small enough), and converges to a limit which then solves the equation by the continuity of .

As I recently learned from Zhiqiang Li, a similar scheme works to prove Siegel’s theorem, as can be found for instance in this text of Carleson and Gamelin. The key is the following analogue of Lemma 3.

Lemma 4Let be a complex number with and for all natural numbers . Let , and let be a holomorphic function with , , andfor all and some . Let , and set . Then there exists an injective holomorphic function and a holomorphic function such that

and

for all and some .

*Proof:* By scaling we may normalise . If for some constants , then we can simply take to be the identity and , so we may assume that for some small and large .

To motivate the choice of , we write and , with and viewed as small. We would like to have , which expands as

As and are both small, we can heuristically approximate up to quadratic errors (compare with the Newton approximation ), and arrive at the equation

This equation can be solved by Taylor series; the function vanishes to second order at the origin and thus has a Taylor expansion

and then has a Taylor expansion

We take this as our definition of , define , and then define implicitly via (4).

Let us now justify that this choice works. By (3) and the generalised Cauchy integral formula, we have for all ; by the Diophantine assumption on , we thus have . In particular, converges on , and on the disk (say) we have the bounds

In particular, as is so small, we see that maps injectively to and to , and the inverse maps to . From (3) we see that maps to , and so if we set to be the function , then is a holomorphic function obeying (4). Expanding (4) in terms of and as before, and also writing , we have

for , which by (5) simplifies to

From (6), the fundamental theorem of calculus, and the smallness of we have

and thus

From (3) and the Cauchy integral formula we have on (say) , and so from (6) and the fundamental theorem of calculus we conclude that

on , and the claim follows.

If we set , , and to be sufficiently small, then (since vanishes to second order at the origin), the hypotheses of this lemma will be obeyed for some sufficiently small . Iterating the lemma (and halving repeatedly), we can then find sequences , injective holomorphic functions and holomorphic functions such that one has the recursive identities and estimates

for all and . By construction, decreases to a positive radius that is a constant multiple of , while (for small enough) converges double-exponentially to zero, so in particular converges uniformly to on . Also, is close enough to the identity, the compositions are uniformly convergent on with and . From this we have

on , and on taking limits using Morera’s theorem we obtain a holomorphic function defined near with , , and

obtaining the required linearisation.

Remark 5The idea of using a Newton-type method to obtain error terms that decay double-exponentially, and can therefore absorb exponential losses in the iteration, also occurs in KAM theory and in Nash-Moser iteration, presumably due to Siegel’s influence on Moser. (I discuss Nash-Moser iteration in this note that I wrote back in 2006.)

The von Neumann ergodic theorem (the Hilbert space version of the mean ergodic theorem) asserts that if is a unitary operator on a Hilbert space , and is a vector in that Hilbert space, then one has

in the strong topology, where is the -invariant subspace of , and is the orthogonal projection to . (See e.g. these previous lecture notes for a proof.) The same proof extends to more general amenable groups: if is a countable amenable group acting on a Hilbert space by unitary transformations for , and is a vector in that Hilbert space, then one has

for any Folner sequence of , where is the -invariant subspace, and is the average of on . Thus one can interpret as a certain average of elements of the orbit of .

In a previous blog post, I noted a variant of this ergodic theorem (due to Alaoglu and Birkhoff) that holds even when the group is not amenable (or not discrete), using a more abstract notion of averaging:

Theorem 1 (Abstract ergodic theorem)Let be an arbitrary group acting unitarily on a Hilbert space , and let be a vector in . Then is the element in the closed convex hull of of minimal norm, and is also the unique element of in this closed convex hull.

I recently stumbled upon a different way to think about this theorem, in the additive case when is abelian, which has a closer resemblance to the classical mean ergodic theorem. Given an arbitrary additive group (not necessarily discrete, or countable), let denote the collection of finite non-empty multisets in – that is to say, unordered collections of elements of , not necessarily distinct, for some positive integer . Given two multisets , in , we can form the sum set . Note that the sum set can contain multiplicity even when do not; for instance, . Given a multiset in , and a function from to a vector space , we define the average as

Note that the multiplicity function of the set affects the average; for instance, we have , but .

We can define a directed set on as follows: given two multisets , we write if we have for some . Thus for instance we have . It is easy to verify that this operation is transitive and reflexive, and is directed because any two elements of have a common upper bound, namely . (This is where we need to be abelian.) The notion of convergence along a net, now allows us to define the notion of convergence along ; given a family of points in a topological space indexed by elements of , and a point in , we say that *converges* to along if, for every open neighbourhood of in , one has for sufficiently large , that is to say there exists such that for all . If the topological space is Hausdorff, then the limit is unique (if it exists), and we then write

When takes values in the reals, one can also define the limit superior or limit inferior along such nets in the obvious fashion.

We can then give an alternate formulation of the abstract ergodic theorem in the abelian case:

Theorem 2 (Abelian abstract ergodic theorem)Let be an arbitrary additive group acting unitarily on a Hilbert space , and let be a vector in . Then we havein the strong topology of .

*Proof:* Suppose that , so that for some , then

so by unitarity and the triangle inequality we have

thus is monotone non-increasing in . Since this quantity is bounded between and , we conclude that the limit exists. Thus, for any , we have for sufficiently large that

for all . In particular, for any , we have

We can write

and so from the parallelogram law and unitarity we have

for all , and hence by the triangle inequality (averaging over a finite multiset )

for any . This shows that is a Cauchy sequence in (in the strong topology), and hence (by the completeness of ) tends to a limit. Shifting by a group element , we have

and hence is invariant under shifts, and thus lies in . On the other hand, for any and , we have

and thus on taking strong limits

and so is orthogonal to . Combining these two facts we see that is equal to as claimed.

To relate this result to the classical ergodic theorem, we observe

Lemma 3Let be a countable additive group, with a F{\o}lner sequence , and let be a bounded sequence in a normed vector space indexed by . If exists, then exists, and the two limits are equal.

*Proof:* From the F{\o}lner property, we see that for any and any , the averages and differ by at most in norm if is sufficiently large depending on , (and the ). On the other hand, by the existence of the limit , the averages and differ by at most in norm if is sufficiently large depending on (regardless of how large is). The claim follows.

It turns out that this approach can also be used as an alternate way to construct the Gowers–Host-Kra seminorms in ergodic theory, which has the feature that it does not explicitly require any amenability on the group (or separability on the underlying measure space), though, as pointed out to me in comments, even uncountable abelian groups are amenable in the sense of possessing an invariant mean, even if they do not have a F{\o}lner sequence.

Given an arbitrary additive group , define a *-system* to be a probability space (not necessarily separable or standard Borel), together with a collection of invertible, measure-preserving maps, such that is the identity and (modulo null sets) for all . This then gives isomorphisms for by setting . From the above abstract ergodic theorem, we see that

in the strong topology of for any , where is the collection of measurable sets that are essentially -invariant in the sense that modulo null sets for all , and is the conditional expectation of with respect to .

In a similar spirit, we have

Theorem 4 (Convergence of Gowers-Host-Kra seminorms)Let be a -system for some additive group . Let be a natural number, and for every , let , which for simplicity we take to be real-valued. Then the expressionconverges, where we write , and we are using the product direct set on to define the convergence . In particular, for , the limit

converges.

We prove this theorem below the fold. It implies a number of other known descriptions of the Gowers-Host-Kra seminorms , for instance that

for , while from the ergodic theorem we have

This definition also manifestly demonstrates the cube symmetries of the Host-Kra measures on , defined via duality by requiring that

In a subsequent blog post I hope to present a more detailed study of the norm and its relationship with eigenfunctions and the Kronecker factor, without assuming any amenability on or any separability or topological structure on .

The 2014 Fields medallists have just been announced as (in alphabetical order of surname) Artur Avila, Manjul Bhargava, Martin Hairer, and Maryam Mirzakhani (see also these nice video profiles for the winners, which is a new initiative of the IMU and the Simons foundation). This time four years ago, I wrote a blog post discussing one result from each of the 2010 medallists; I thought I would try to repeat the exercise here, although the work of the medallists this time around is a little bit further away from my own direct area of expertise than last time, and so my discussion will unfortunately be a bit superficial (and possibly not completely accurate) in places. As before, I am picking these results based on my own idiosyncratic tastes, and they should not be viewed as necessarily being the “best” work of these medallists. (See also the press releases for Avila, Bhargava, Hairer, and Mirzakhani.)

Artur Avila works in dynamical systems and in the study of Schrödinger operators. The work of Avila that I am most familiar with is his solution with Svetlana Jitormiskaya of the *ten martini problem* of Kac, the solution to which (according to Barry Simon) he offered ten martinis for, hence the name. The problem involves perhaps the simplest example of a Schrödinger operator with non-trivial spectral properties, namely the almost Mathieu operator defined for parameters and by a discrete one-dimensional Schrödinger operator with cosine potential:

This is a bounded self-adjoint operator and thus has a spectrum that is a compact subset of the real line; it arises in a number of physical contexts, most notably in the theory of the integer quantum Hall effect, though I will not discuss these applications here. Remarkably, the structure of this spectrum depends crucially on the Diophantine properties of the frequency . For instance, if is a rational number, then the operator is periodic with period , and then basic (discrete) Floquet theory tells us that the spectrum is simply the union of (possibly touching) intervals. But for irrational (in which case the spectrum is independent of the phase ), the situation is much more fractal in nature, for instance in the critical case the spectrum (as a function of ) gives rise to the Hofstadter butterfly. The “ten martini problem” asserts that for *every* irrational and every choice of coupling constant , the spectrum is homeomorphic to a Cantor set. Prior to the work of Avila and Jitormiskaya, there were a number of partial results on this problem, notably the result of Puig establishing Cantor spectrum for a full measure set of parameters , as well as results requiring a perturbative hypothesis, such as being very small or very large. The result was also already known for being either very close to rational (i.e. a Liouville number) or very far from rational (a Diophantine number), although the analyses for these two cases failed to meet in the middle, leaving some cases untreated. The argument uses a wide variety of existing techniques, both perturbative and non-perturbative, to attack this problem, as well as an amusing argument by contradiction: they assume (in certain regimes) that the spectrum *fails* to be a Cantor set, and use this hypothesis to obtain additional Lipschitz control on the spectrum (as a function of the frequency ), which they can then use (after much effort) to improve existing arguments and conclude that the spectrum was in fact Cantor after all!

Manjul Bhargava produces amazingly beautiful mathematics, though most of it is outside of my own area of expertise. One part of his work that touches on an area of my own interest (namely, random matrix theory) is his ongoing work with many co-authors on modeling (both conjecturally and rigorously) the statistics of various key number-theoretic features of elliptic curves (such as their rank, their Selmer group, or their Tate-Shafarevich groups). For instance, with Kane, Lenstra, Poonen, and Rains, Manjul has proposed a very general random matrix model that predicts all of these statistics (for instance, predicting that the -component of the Tate-Shafarevich group is distributed like the cokernel of a certain random -adic matrix, very much in the spirit of the Cohen-Lenstra heuristics discussed in this previous post). But what is even more impressive is that Manjul and his coauthors have been able to *verify* several non-trivial fragments of this model (e.g. showing that certain moments have the predicted asymptotics), giving for the first time non-trivial upper and lower bounds for various statistics, for instance obtaining lower bounds on how often an elliptic curve has rank or rank , leading most recently (in combination with existing work of Gross-Zagier and of Kolyvagin, among others) to his amazing result with Skinner and Zhang that at least of all elliptic curves over (ordered by height) obey the Birch and Swinnerton-Dyer conjecture. Previously it was not even known that a positive proportion of curves obeyed the conjecture. This is still a fair ways from resolving the conjecture fully (in particular, the situation with the presumably small number of curves of rank and higher is still very poorly understood, and the theory of Gross-Zagier and Kolyvagin that this work relies on, which was initially only available for , has only been extended to totally real number fields thus far, by the work of Zhang), but it certainly does provide hope that the conjecture could be within reach in a statistical sense at least.

Martin Hairer works in at the interface between probability and partial differential equations, and in particular in the theory of stochastic differential equations (SDEs). The result of his that is closest to my own interests is his remarkable demonstration with Jonathan Mattingly of unique invariant measure for the two-dimensional stochastically forced Navier-Stokes equation

on the two-torus , where is a Gaussian field that forces a fixed set of frequencies. It is expected that for any reasonable choice of initial data, the solution to this equation should asymptotically be distributed according to Kolmogorov’s power law, as discussed in this previous post. This is still far from established rigorously (although there are some results in this direction for dyadic models, see e.g. this paper of Cheskidov, Shvydkoy, and Friedlander). However, Hairer and Mattingly were able to show that there was a unique probability distribution to almost every initial data would converge to asymptotically; by the ergodic theorem, this is equivalent to demonstrating the existence and uniqueness of an invariant measure for the flow. Existence can be established using standard methods, but uniqueness is much more difficult. One of the standard routes to uniqueness is to establish a “strong Feller property” that enforces some continuity on the transition operators; among other things, this would mean that two ergodic probability measures with intersecting supports would in fact have a non-trivial common component, contradicting the ergodic theorem (which forces different ergodic measures to be mutually singular). Since all ergodic measures for Navier-Stokes can be seen to contain the origin in their support, this would give uniqueness. Unfortunately, the strong Feller property is unlikely to hold in the infinite-dimensional phase space for Navier-Stokes; but Hairer and Mattingly develop a clean abstract substitute for this property, which they call the *asymptotic strong Feller* property, which is again a regularity property on the transition operator; this in turn is then demonstrated by a careful application of Malliavin calculus.

Maryam Mirzakhani has mostly focused on the geometry and dynamics of Teichmuller-type moduli spaces, such as the moduli space of Riemann surfaces with a fixed genus and a fixed number of cusps (or with a fixed number of boundaries that are geodesics of a prescribed length). These spaces have an incredibly rich structure, ranging from geometric structure (such as the Kahler geometry given by the Weil-Petersson metric), to dynamical structure (through the action of the mapping class group on this and related spaces), to algebraic structure (viewing these spaces as algebraic varieties), and are thus connected to many other objects of interest in geometry and dynamics. For instance, by developing a new recursive formula for the Weil-Petersson volume of this space, Mirzakhani was able to asymptotically count the number of *simple* prime geodesics of length up to some threshold in a hyperbolic surface (or more precisely, she obtained asymptotics for the number of such geodesics in a given orbit of the mapping class group); the answer turns out to be polynomial in , in contrast to the much larger class of *non-simple* prime geodesics, whose asymptotics are exponential in (the “prime number theorem for geodesics”, developed in a classic series of works by Delsart, Huber, Selberg, and Margulis); she also used this formula to establish a new proof of a conjecture of Witten on intersection numbers that was first proven by Kontsevich. More recently, in two lengthy papers with Eskin and with Eskin-Mohammadi, Mirzakhani established rigidity theorems for the action of on such moduli spaces that are close analogues of Ratner’s celebrated rigidity theorems for unipotently generated groups (discussed in this previous blog post). Ratner’s theorems are already notoriously difficult to prove, and rely very much on the polynomial stability properties of unipotent flows; in this even more complicated setting, the unipotent flows are no longer tractable, and Mirzakhani instead uses a recent “exponential drift” method of Benoist and Quint with as a substitute. Ratner’s theorems are incredibly useful for all sorts of problems connected to homogeneous dynamics, and the analogous theorems established by Mirzakhani, Eskin, and Mohammadi have a similarly broad range of applications, for instance in counting periodic billiard trajectories in rational polygons.

As laid out in the foundational work of Kolmogorov, a *classical probability space* (or probability space for short) is a triplet , where is a set, is a -algebra of subsets of , and is a countably additive probability measure on . Given such a space, one can form a number of interesting function spaces, including

- the (real) Hilbert space of square-integrable functions , modulo -almost everywhere equivalence, and with the positive definite inner product ; and
- the unital commutative Banach algebra of essentially bounded functions , modulo -almost everywhere equivalence, with defined as the essential supremum of .

There is also a trace on defined by integration: .

One can form the category of classical probability spaces, by defining a morphism between probability spaces to be a function which is measurable (thus for all ) and measure-preserving (thus for all ).

Let us now abstract the algebraic features of these spaces as follows; for want of a better name, I will refer to this abstraction as an *algebraic probability space*, and is very similar to the non-commutative probability spaces studied in this previous post, except that these spaces are now commutative (and real).

Definition 1Analgebraic probability spaceis a pair where

- is a unital commutative real algebra;
- is a homomorphism such that and for all ;
- Every element of is
boundedin the sense that . (Technically, this isn’t an algebraic property, but I need it for technical reasons.)A morphism is a homomorphism which is trace-preserving, in the sense that for all .

For want of a better name, I’ll denote the category of algebraic probability spaces as . One can view this category as the opposite category to that of (a subcategory of) the category of tracial commutative real algebras. One could emphasise this opposite nature by denoting the algebraic probability space as rather than ; another suggestive (but slightly inaccurate) notation, inspired by the language of schemes, would be rather than . However, we will not adopt these conventions here, and refer to algebraic probability spaces just by the pair .

By the previous discussion, we have a covariant functor that takes a classical probability space to its algebraic counterpart , with a morphism of classical probability spaces mapping to a morphism of the corresponding algebraic probability spaces by the formula

for . One easily verifies that this is a functor.

In this post I would like to describe a functor which partially inverts (up to natural isomorphism), that is to say a recipe for starting with an algebraic probability space and producing a classical probability space . This recipe is not new – it is basically the (commutative) Gelfand-Naimark-Segal construction (discussed in this previous post) combined with the Loomis-Sikorski theorem (discussed in this previous post). However, I wanted to put the construction in a single location for sake of reference. I also wanted to make the point that and are not complete inverses; there is a bit of information in the algebraic probability space (e.g. topological information) which is lost when passing back to the classical probability space. In some future posts, I would like to develop some ergodic theory using the algebraic foundations of probability theory rather than the classical foundations; this turns out to be convenient in the ergodic theory arising from nonstandard analysis (such as that described in this previous post), in which the groups involved are uncountable and the underlying spaces are not standard Borel spaces.

Let us describe how to construct the functor , with details postponed to below the fold.

- Starting with an algebraic probability space , form an inner product on by the formula , and also form the spectral radius .
- The inner product is clearly positive semi-definite. Quotienting out the null vectors and taking completions, we arrive at a real Hilbert space , to which the trace may be extended.
- Somewhat less obviously, the spectral radius is well-defined and gives a norm on . Taking limits of sequences in of bounded spectral radius gives us a subspace of that has the structure of a real commutative Banach algebra.
- The idempotents of the Banach algebra may be indexed by elements of an abstract -algebra .
- The Boolean algebra homomorphisms (or equivalently, the real algebra homomorphisms ) may be indexed by elements of a space .
- Let denote the -algebra on generated by the basic sets for every .
- Let be the -ideal of generated by the sets , where is a sequence with .
- One verifies that is isomorphic to . Using this isomorphism, the trace on can be used to construct a countably additive measure on . The classical probability space is then , and the abstract spaces may now be identified with their concrete counterparts , .
- Every algebraic probability space morphism generates a classical probability morphism via the formula
using a pullback operation on the abstract -algebras that can be defined by density.

Remark 1The classical probability space constructed by the functor has some additional structure; namely is a -Stone space (a Stone space with the property that the closure of any countable union of clopen sets is clopen), is the Baire -algebra (generated by the clopen sets), and the null sets are the meager sets. However, we will not use this additional structure here.

The partial inversion relationship between the functors and is given by the following assertion:

- There is a natural transformation from to the identity functor .

More informally: if one starts with an algebraic probability space and converts it back into a classical probability space , then there is a trace-preserving algebra homomorphism of to , which respects morphisms of the algebraic probability space. While this relationship is far weaker than an equivalence of categories (which would require that and are both natural isomorphisms), it is still good enough to allow many ergodic theory problems formulated using classical probability spaces to be reformulated instead as an equivalent problem in algebraic probability spaces.

Remark 2The opposite composition is a little odd: it takes an arbitrary probability space and returns a more complicated probability space , with being the space of homomorphisms . while there is “morally” an embedding of into using the evaluation map, this map does not exist in general because points in may well have zero measure. However, if one takes a “pointless” approach and focuses just on the measure algebras , , then these algebras become naturally isomorphic after quotienting out by null sets.

Remark 3An algebraic probability space captures a bit more structure than a classical probability space, because may be identified with a proper subset of that describes the “regular” functions (or random variables) of the space. For instance, starting with the unit circle (with the usual Haar measure and the usual trace ), any unital subalgebra of that is dense in will generate the same classical probability space on applying the functor , namely one will get the space of homomorphisms from to (with the measure induced from ). Thus for instance could be the continuous functions , the Wiener algebra or the full space , but the classical space will be unable to distinguish these spaces from each other. In particular, the functor loses information (roughly speaking, this functor takes an algebraic probability space and completes it to a von Neumann algebra, but then forgets exactly what algebra was initially used to create this completion). In ergodic theory, this sort of “extra structure” is traditionally encoded in topological terms, by assuming that the underlying probability space has a nice topological structure (e.g. a standard Borel space); however, with the algebraic perspective one has the freedom to have non-topological notions of extra structure, by choosing to be something other than an algebra of continuous functions on a topological space. I hope to discuss one such example of extra structure (coming from the Gowers-Host-Kra theory of uniformity seminorms) in a later blog post (this generalises the example of the Wiener algebra given previously, which is encoding “Fourier structure”).

A small example of how one could use the functors is as follows. Suppose one has a classical probability space with a measure-preserving action of an uncountable group , which is only defined (and an action) up to almost everywhere equivalence; thus for instance for any set and any , and might not be exactly equal, but only equal up to a null set. For similar reasons, an element of the invariant factor might not be exactly invariant with respect to , but instead one only has and equal up to null sets for each . One might like to “clean up” the action of to make it defined everywhere, and a genuine action everywhere, but this is not immediately achievable if is uncountable, since the union of all the null sets where something bad occurs may cease to be a null set. However, by applying the functor , each shift defines a morphism on the associated algebraic probability space (i.e. the Koopman operator), and then applying , we obtain a shift on a new classical probability space which now gives a genuine measure-preserving action of , and which is equivalent to the original action from a measure algebra standpoint. The invariant factor now consists of those sets in which are genuinely -invariant, not just up to null sets. (Basically, the classical probability space contains a Boolean algebra with the property that every measurable set is equivalent up to null sets to precisely one set in , allowing for a canonical “retraction” onto that eliminates all null set issues.)

More indirectly, the functors suggest that one should be able to develop a “pointless” form of ergodic theory, in which the underlying probability spaces are given algebraically rather than classically. I hope to give some more specific examples of this in later posts.

There are a number of ways to construct the real numbers , for instance

- as the metric completion of (thus, is defined as the set of Cauchy sequences of rationals, modulo Cauchy equivalence);
- as the space of Dedekind cuts on the rationals ;
- as the space of quasimorphisms on the integers, quotiented by bounded functions. (I believe this construction first appears in this paper of Street, who credits the idea to Schanuel, though the germ of this construction arguably goes all the way back to Eudoxus.)

There is also a fourth family of constructions that proceeds via nonstandard analysis, as a special case of what is known as the *nonstandard hull construction*. (Here I will assume some basic familiarity with nonstandard analysis and ultraproducts, as covered for instance in this previous blog post.) Given an unbounded nonstandard natural number , one can define two external additive subgroups of the nonstandard integers :

- The group of all nonstandard integers of magnitude less than or comparable to ; and
- The group of nonstandard integers of magnitude infinitesimally smaller than .

The group is a subgroup of , so we may form the quotient group . This space is isomorphic to the reals , and can in fact be used to construct the reals:

Proposition 1For any coset of , there is a unique real number with the property that . The map is then an isomorphism between the additive groups and .

*Proof:* Uniqueness is clear. For existence, observe that the set is a Dedekind cut, and its supremum can be verified to have the required properties for .

In a similar vein, we can view the unit interval in the reals as the quotient

where is the nonstandard (i.e. internal) set ; of course, is not a group, so one should interpret as the image of under the quotient map (or , if one prefers). Or to put it another way, (1) asserts that is the image of with respect to the map .

In this post I would like to record a nice measure-theoretic version of the equivalence (1), which essentially appears already in standard texts on Loeb measure (see e.g. this text of Cutland). To describe the results, we must first quickly recall the construction of *Loeb measure* on . Given an internal subset of , we may define the elementary measure of by the formula

This is a finitely additive probability measure on the Boolean algebra of internal subsets of . We can then construct the *Loeb outer measure* of any subset in complete analogy with Lebesgue outer measure by the formula

where ranges over all sequences of internal subsets of that cover . We say that a subset of is *Loeb measurable* if, for any (standard) , one can find an internal subset of which differs from by a set of Loeb outer measure at most , and in that case we define the *Loeb measure* of to be . It is a routine matter to show (e.g. using the Carathéodory extension theorem) that the space of Loeb measurable sets is a -algebra, and that is a countably additive probability measure on this space that extends the elementary measure . Thus now has the structure of a probability space .

Now, the group acts (Loeb-almost everywhere) on the probability space by the addition map, thus for and (excluding a set of Loeb measure zero where exits ). This action is clearly seen to be measure-preserving. As such, we can form the *invariant factor* , defined by restricting attention to those Loeb measurable sets with the property that is equal -almost everywhere to for each .

The claim is then that this invariant factor is equivalent (up to almost everywhere equivalence) to the unit interval with Lebesgue measure (and the trivial action of ), by the same factor map used in (1). More precisely:

Theorem 2Given a set , there exists a Lebesgue measurable set , unique up to -a.e. equivalence, such that is -a.e. equivalent to the set . Conversely, if is Lebesgue measurable, then is in , and .

More informally, we have the measure-theoretic version

of (1).

*Proof:* We first prove the converse. It is clear that is -invariant, so it suffices to show that is Loeb measurable with Loeb measure . This is easily verified when is an elementary set (a finite union of intervals). By countable subadditivity of outer measure, this implies that Loeb outer measure of is bounded by the Lebesgue outer measure of for any set ; since every Lebesgue measurable set differs from an elementary set by a set of arbitrarily small Lebesgue outer measure, the claim follows.

Now we establish the forward claim. Uniqueness is clear from the converse claim, so it suffices to show existence. Let . Let be an arbitrary standard real number, then we can find an internal set which differs from by a set of Loeb measure at most . As is -invariant, we conclude that for every , and differ by a set of Loeb measure (and hence elementary measure) at most . By the (contrapositive of the) underspill principle, there must exist a standard such that and differ by a set of elementary measure at most for all . If we then define the nonstandard function by the formula

then from the (nonstandard) triangle inequality we have

(say). On the other hand, has the Lipschitz continuity property

and so in particular we see that

for some Lipschitz continuous function . If we then let be the set where , one can check that differs from by a set of Loeb outer measure , and hence does so also. Sending to zero, we see (from the converse claim) that is a Cauchy sequence in and thus converges in for some Lebesgue measurable . The sets then converge in Loeb outer measure to , giving the claim.

Thanks to the Lebesgue differentiation theorem, the conditional expectation of a bounded Loeb-measurable function can be expressed (as a function on , defined -a.e.) as

By the abstract ergodic theorem from the previous post, one can also view this conditional expectation as the element in the closed convex hull of the shifts , of minimal norm. In particular, we obtain a form of the von Neumann ergodic theorem in this context: the averages for converge (as a net, rather than a sequence) in to .

If is (the standard part of) an internal function, that is to say the ultralimit of a sequence of finitary bounded functions, one can view the measurable function as a limit of the that is analogous to the “graphons” that emerge as limits of graphs (see e.g. the recent text of Lovasz on graph limits). Indeed, the measurable function is related to the discrete functions by the formula

for all , where is the nonprincipal ultrafilter used to define the nonstandard universe. In particular, from the Arzela-Ascoli diagonalisation argument there is a subsequence such that

thus is the asymptotic density function of the . For instance, if is the indicator function of a randomly chosen subset of , then the asymptotic density function would equal (almost everywhere, at least).

I’m continuing to look into understanding the ergodic theory of actions, as I believe this may allow one to apply ergodic theory methods to the “single-scale” or “non-asymptotic” setting (in which one averages only over scales comparable to a large parameter , rather than the traditional asymptotic approach of letting the scale go to infinity). I’m planning some further posts in this direction, though this is still a work in progress.

## Recent Comments