
The 2014 Fields medallists have just been announced as (in alphabetical order of surname) Artur Avila, Manjul Bhargava, Martin Hairer, and Maryam Mirzakhani (see also these nice video profiles of the winners, a new initiative of the IMU and the Simons Foundation). This time four years ago, I wrote a blog post discussing one result from each of the 2010 medallists; I thought I would try to repeat the exercise here, although the work of the medallists this time around is a little further from my own direct area of expertise than last time, and so my discussion will unfortunately be a bit superficial (and possibly not completely accurate) in places. As before, I am picking these results based on my own idiosyncratic tastes, and they should not be viewed as necessarily being the “best” work of these medallists. (See also the press releases for Avila, Bhargava, Hairer, and Mirzakhani.)

Artur Avila works in dynamical systems and in the study of Schrödinger operators. The work of Avila that I am most familiar with is his solution with Svetlana Jitomirskaya of the ten martini problem of Kac, who (according to Barry Simon) offered ten martinis for its solution, hence the name. The problem involves perhaps the simplest example of a Schrödinger operator with non-trivial spectral properties, namely the almost Mathieu operator ${H^{\lambda,\alpha}_\theta: \ell^2({\bf Z}) \rightarrow \ell^2({\bf Z})}$ defined for parameters ${\alpha,\theta \in {\bf R}/{\bf Z}}$ and ${\lambda>0}$ by a discrete one-dimensional Schrödinger operator with cosine potential:

$\displaystyle (H^{\lambda,\alpha}_\theta u)_n := u_{n+1} + u_{n-1} + 2\lambda (\cos 2\pi(\theta+n\alpha)) u_n.$

This is a bounded self-adjoint operator and thus has a spectrum ${\sigma( H^{\lambda,\alpha}_\theta )}$ that is a compact subset of the real line; it arises in a number of physical contexts, most notably in the theory of the integer quantum Hall effect, though I will not discuss these applications here. Remarkably, the structure of this spectrum depends crucially on the Diophantine properties of the frequency ${\alpha}$. For instance, if ${\alpha = p/q}$ is a rational number, then the operator is periodic with period ${q}$, and then basic (discrete) Floquet theory tells us that the spectrum is simply the union of ${q}$ (possibly touching) intervals. But for irrational ${\alpha}$ (in which case the spectrum is independent of the phase ${\theta}$), the situation is much more fractal in nature; for instance, in the critical case ${\lambda=1}$ the spectrum (as a function of ${\alpha}$) gives rise to the Hofstadter butterfly. The “ten martini problem” asserts that for every irrational ${\alpha}$ and every choice of coupling constant ${\lambda > 0}$, the spectrum is homeomorphic to a Cantor set. Prior to the work of Avila and Jitomirskaya, there were a number of partial results on this problem, notably the result of Puig establishing Cantor spectrum for a full measure set of parameters ${(\lambda,\alpha)}$, as well as results requiring a perturbative hypothesis, such as ${\lambda}$ being very small or very large. The result was also already known for ${\alpha}$ being either very close to rational (i.e. a Liouville number) or very far from rational (a Diophantine number), although the analyses for these two cases failed to meet in the middle, leaving some cases untreated.
Their argument combines a wide variety of existing techniques, both perturbative and non-perturbative, with an amusing argument by contradiction: they assume (in certain regimes) that the spectrum fails to be a Cantor set, and use this hypothesis to obtain additional Lipschitz control on the spectrum (as a function of the frequency ${\alpha}$), which they can then use (after much effort) to improve existing arguments and conclude that the spectrum was in fact Cantor after all!
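To make the rational case concrete, here is a rough numerical sketch (my own illustration, not taken from the Avila-Jitomirskaya paper) of the discrete Floquet theory mentioned above. It assumes ${\alpha = p/q}$ is rational and uses the standard transfer-matrix formulation: writing the eigenvalue equation ${Hu = Eu}$ as ${(u_{n+1}, u_n) = A_n(E) (u_n, u_{n-1})}$ with ${A_n(E) = \begin{pmatrix} E - v_n & -1 \\ 1 & 0 \end{pmatrix}}$ and ${v_n = 2\lambda \cos 2\pi(\theta+n\alpha)}$, an energy ${E}$ lies in the spectrum precisely when the discriminant ${\Delta(E) := \hbox{tr}(A_{q-1}(E) \cdots A_0(E))}$ obeys ${|\Delta(E)| \leq 2}$. The function names and the grid-sampling approach are hypothetical choices for this sketch, and sampling can miss gaps narrower than the grid spacing; it says nothing about the irrational case.

```python
import math


def discriminant(E, lam, p, q, theta=0.0):
    """Trace of the one-period transfer matrix A_{q-1}(E) ... A_0(E) for the
    almost Mathieu operator with rational frequency alpha = p/q."""
    alpha = p / q
    # Running product M, stored entrywise as [[a, b], [c, d]]; start at identity.
    a, b, c, d = 1.0, 0.0, 0.0, 1.0
    for n in range(q):
        v = 2 * lam * math.cos(2 * math.pi * (theta + n * alpha))
        # u_{n+1} = (E - v_n) u_n - u_{n-1}, i.e. left-multiply M by
        # A_n(E) = [[E - v_n, -1], [1, 0]].
        a, b, c, d = (E - v) * a - c, (E - v) * b - d, a, b
    return a + d


def spectral_bands(lam, p, q, theta=0.0, samples=20001):
    """Approximate {E : |discriminant(E)| <= 2} by sampling E on a uniform grid
    over [-(2 + 2 lam), 2 + 2 lam], which contains the spectrum."""
    bound = 2 + 2 * lam
    bands, start, prev = [], None, None
    for i in range(samples):
        E = -bound + 2 * bound * i / (samples - 1)
        inside = abs(discriminant(E, lam, p, q, theta)) <= 2
        if inside and start is None:
            start = E                      # entering a band
        elif not inside and start is not None:
            bands.append((start, prev))    # leaving a band
            start = None
        prev = E
    if start is not None:
        bands.append((start, prev))
    return bands


for left, right in spectral_bands(1.0, 1, 3):
    print(f"[{left:.4f}, {right:.4f}]")
```

For ${\lambda=1}$, ${\alpha=1/3}$, ${\theta=0}$ one can check by hand that ${\Delta(E) = E^3 - 6E - 2}$, so the three bands are ${[-\sqrt{6},-2]}$, ${[1-\sqrt{3},0]}$ and ${[\sqrt{6},1+\sqrt{3}]}$, which the sampled output should reproduce to within the grid spacing.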

Manjul Bhargava produces amazingly beautiful mathematics, though most of it is outside of my own area of expertise. One part of his work that touches on an area of my own interest (namely, random matrix theory) is his ongoing work with many co-authors on modeling (both conjecturally and rigorously) the statistics of various key number-theoretic features of elliptic curves (such as their rank, their Selmer group, or their Tate-Shafarevich groups). For instance, with Kane, Lenstra, Poonen, and Rains, Manjul has proposed a very general random matrix model that predicts all of these statistics (in particular, predicting that the ${p}$-component of the Tate-Shafarevich group is distributed like the cokernel of a certain random ${p}$-adic matrix, very much in the spirit of the Cohen-Lenstra heuristics discussed in this previous post). But what is even more impressive is that Manjul and his coauthors have been able to verify several non-trivial fragments of this model (e.g. showing that certain moments have the predicted asymptotics), giving for the first time non-trivial upper and lower bounds for various statistics, such as lower bounds on how often an elliptic curve has rank ${0}$ or rank ${1}$. This has led most recently (in combination with existing work of Gross-Zagier and of Kolyvagin, among others) to his amazing result with Skinner and Zhang that at least ${66\%}$ of all elliptic curves over ${{\bf Q}}$ (ordered by height) obey the Birch and Swinnerton-Dyer conjecture. Previously it was not even known that a positive proportion of curves obeyed the conjecture.
This is still a fair way from resolving the conjecture fully (in particular, the situation with the presumably small number of curves of rank ${2}$ and higher is still very poorly understood, and the theory of Gross-Zagier and Kolyvagin that this work relies on, which was initially only available for ${{\bf Q}}$, has only been extended to totally real number fields thus far, by the work of Zhang), but it certainly does provide hope that the conjecture could be within reach in a statistical sense at least.

Martin Hairer works at the interface between probability and partial differential equations, and in particular in the theory of stochastic differential equations (SDEs). The result of his that is closest to my own interests is his remarkable demonstration with Jonathan Mattingly of the uniqueness of the invariant measure for the two-dimensional stochastically forced Navier-Stokes equation

$\displaystyle \partial_t u + (u \cdot \nabla) u = \nu \Delta u - \nabla p + \xi$

$\displaystyle \nabla \cdot u = 0$

on the two-torus ${({\bf R}/{\bf Z})^2}$, where ${\xi}$ is a Gaussian field that forces a fixed set of frequencies. It is expected that for any reasonable choice of initial data, the solution to this equation should asymptotically be distributed according to Kolmogorov’s power law, as discussed in this previous post. This is still far from established rigorously (although there are some results in this direction for dyadic models, see e.g. this paper of Cheskidov, Shvydkoy, and Friedlander). However, Hairer and Mattingly were able to show that there is a unique probability distribution to which the solution converges asymptotically for almost every choice of initial data; by the ergodic theorem, this is equivalent to demonstrating the existence and uniqueness of an invariant measure for the flow. Existence can be established using standard methods, but uniqueness is much more difficult. One of the standard routes to uniqueness is to establish a “strong Feller property” that enforces some continuity on the transition operators; among other things, this would mean that two ergodic probability measures with intersecting supports would in fact have a non-trivial common component, contradicting the ergodic theorem (which forces different ergodic measures to be mutually singular). Since all ergodic measures for Navier-Stokes can be seen to contain the origin in their support, this would give uniqueness. Unfortunately, the strong Feller property is unlikely to hold in the infinite-dimensional phase space for Navier-Stokes; but Hairer and Mattingly develop a clean abstract substitute for this property, which they call the asymptotic strong Feller property, and which is again a regularity property on the transition operator; this property is in turn established by a careful application of Malliavin calculus.

Maryam Mirzakhani has mostly focused on the geometry and dynamics of Teichmüller-type moduli spaces, such as the moduli space of Riemann surfaces with a fixed genus and a fixed number of cusps (or with a fixed number of boundaries that are geodesics of a prescribed length). These spaces have an incredibly rich structure, ranging from geometric structure (such as the Kähler geometry given by the Weil-Petersson metric), to dynamical structure (through the action of the mapping class group on this and related spaces), to algebraic structure (viewing these spaces as algebraic varieties), and are thus connected to many other objects of interest in geometry and dynamics. For instance, by developing a new recursive formula for the Weil-Petersson volume of this space, Mirzakhani was able to asymptotically count the number of simple prime geodesics of length up to some threshold ${L}$ in a hyperbolic surface (or more precisely, she obtained asymptotics for the number of such geodesics in a given orbit of the mapping class group); the answer turns out to be polynomial in ${L}$, in contrast to the much larger class of non-simple prime geodesics, whose asymptotics are exponential in ${L}$ (the “prime number theorem for geodesics”, developed in a classic series of works by Delsarte, Huber, Selberg, and Margulis); she also used this formula to give a new proof of a conjecture of Witten on intersection numbers that was first proven by Kontsevich. More recently, in two lengthy papers with Eskin and with Eskin-Mohammadi, Mirzakhani established rigidity theorems for the action of ${SL_2({\bf R})}$ on such moduli spaces that are close analogues of Ratner’s celebrated rigidity theorems for unipotently generated groups (discussed in this previous blog post).
Ratner’s theorems are already notoriously difficult to prove, and rely very much on the polynomial stability properties of unipotent flows; in this even more complicated setting, the unipotent flows are no longer tractable, and Mirzakhani instead uses a recent “exponential drift” method of Benoist and Quint as a substitute. Ratner’s theorems are incredibly useful for all sorts of problems connected to homogeneous dynamics, and the analogous theorems established by Mirzakhani, Eskin, and Mohammadi have a similarly broad range of applications, for instance in counting periodic billiard trajectories in rational polygons.

As laid out in the foundational work of Kolmogorov, a classical probability space (or probability space for short) is a triplet ${(X, {\mathcal X}, \mu)}$, where ${X}$ is a set, ${{\mathcal X}}$ is a ${\sigma}$-algebra of subsets of ${X}$, and ${\mu: {\mathcal X} \rightarrow [0,1]}$ is a countably additive probability measure on ${{\mathcal X}}$. Given such a space, one can form a number of interesting function spaces, including

• the (real) Hilbert space ${L^2(X, {\mathcal X}, \mu)}$ of square-integrable functions ${f: X \rightarrow {\bf R}}$, modulo ${\mu}$-almost everywhere equivalence, and with the positive definite inner product ${\langle f, g\rangle_{L^2(X, {\mathcal X}, \mu)} := \int_X f g\ d\mu}$; and
• the unital commutative Banach algebra ${L^\infty(X, {\mathcal X}, \mu)}$ of essentially bounded functions ${f: X \rightarrow {\bf R}}$, modulo ${\mu}$-almost everywhere equivalence, with ${\|f\|_{L^\infty(X, {\mathcal X}, \mu)}}$ defined as the essential supremum of ${|f|}$.

There is also a trace ${\tau = \tau_\mu: L^\infty(X, {\mathcal X}, \mu) \rightarrow {\bf R}}$ on ${L^\infty}$ defined by integration: ${\tau(f) := \int_X f\ d\mu}$.

One can form the category ${\mathbf{Prb}}$ of classical probability spaces, by defining a morphism ${\phi: (X, {\mathcal X}, \mu) \rightarrow (Y, {\mathcal Y}, \nu)}$ between probability spaces to be a function ${\phi: X \rightarrow Y}$ which is measurable (thus ${\phi^{-1}(E) \in {\mathcal X}}$ for all ${E \in {\mathcal Y}}$) and measure-preserving (thus ${\mu(\phi^{-1}(E)) = \nu(E)}$ for all ${E \in {\mathcal Y}}$).

Let us now abstract the algebraic features of these spaces as follows; for want of a better name, I will refer to this abstraction as an algebraic probability space. It is very similar to the non-commutative probability spaces studied in this previous post, except that these spaces are now commutative (and real).

Definition 1 An algebraic probability space is a pair ${({\mathcal A}, \tau)}$ where

• ${{\mathcal A}}$ is a unital commutative real algebra;
• ${\tau: {\mathcal A} \rightarrow {\bf R}}$ is a linear functional such that ${\tau(1)=1}$ and ${\tau( f^2 ) \geq 0}$ for all ${f \in {\mathcal A}}$;
• Every element ${f}$ of ${{\mathcal A}}$ is bounded in the sense that ${\sup_{k \geq 1} \tau( f^{2k} )^{1/2k} < \infty}$. (Strictly speaking, this isn’t an algebraic property, but I need it for technical reasons.)

A morphism ${\phi: ({\mathcal A}_1, \tau_1) \rightarrow ({\mathcal A}_2, \tau_2)}$ is a homomorphism ${\phi^*: {\mathcal A}_2 \rightarrow {\mathcal A}_1}$ which is trace-preserving, in the sense that ${\tau_1(\phi^*(f)) = \tau_2(f)}$ for all ${f \in {\mathcal A}_2}$.

For want of a better name, I’ll denote the category of algebraic probability spaces as ${\mathbf{AlgPrb}}$. One can view this category as the opposite category to that of (a subcategory of) the category of tracial commutative real algebras. One could emphasise this opposite nature by denoting the algebraic probability space as ${({\mathcal A}, \tau)^{op}}$ rather than ${({\mathcal A},\tau)}$; another suggestive (but slightly inaccurate) notation, inspired by the language of schemes, would be ${\hbox{Spec}({\mathcal A},\tau)}$ rather than ${({\mathcal A},\tau)}$. However, we will not adopt these conventions here, and refer to algebraic probability spaces just by the pair ${({\mathcal A},\tau)}$.

By the previous discussion, we have a covariant functor ${F: \textbf{Prb} \rightarrow \textbf{AlgPrb}}$ that takes a classical probability space ${(X, {\mathcal X}, \mu)}$ to its algebraic counterpart ${(L^\infty(X, {\mathcal X},\mu), \tau_\mu)}$, with a morphism ${\phi: (X, {\mathcal X}, \mu) \rightarrow (Y, {\mathcal Y}, \nu)}$ of classical probability spaces mapping to a morphism ${F(\phi): (L^\infty(X, {\mathcal X},\mu), \tau_\mu) \rightarrow (L^\infty(Y, {\mathcal Y},\nu), \tau_\nu)}$ of the corresponding algebraic probability spaces by the formula

$\displaystyle F(\phi)^* f := f \circ \phi$

for ${f \in L^\infty(Y, {\mathcal Y}, \nu)}$. One easily verifies that this is a functor.
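As a sanity check on the functor ${F}$, here is a toy verification on finite probability spaces (an assumption made so that everything can be computed exactly with rational arithmetic; the helper names are hypothetical). It checks that a given map ${\phi}$ is measure-preserving, that the pullback ${f \mapsto f \circ \phi}$ preserves the trace, and (in the accompanying checks) that it respects products, i.e. that ${F(\phi)^*}$ behaves as a trace-preserving algebra homomorphism.

```python
from fractions import Fraction


def is_measure_preserving(phi, mu, nu):
    """Check mu(phi^{-1}({y})) == nu({y}) for every point y; on finite spaces
    with all subsets measurable, additivity then gives it for every set E."""
    return all(
        sum((m for x, m in mu.items() if phi[x] == y), Fraction(0)) == nu[y]
        for y in nu
    )


def pullback(phi, f):
    """F(phi)^* f := f o phi, a function on X built from a function on Y."""
    return {x: f[phi[x]] for x in phi}


def trace(f, measure):
    """tau(f) := integral of f against the measure (a finite sum here)."""
    return sum((f[p] * w for p, w in measure.items()), Fraction(0))


# X = {0, 1, 2, 3} with uniform measure; Y = {'a', 'b'} with weights 3/4, 1/4.
mu = {x: Fraction(1, 4) for x in range(4)}
nu = {'a': Fraction(3, 4), 'b': Fraction(1, 4)}
phi = {0: 'a', 1: 'a', 2: 'a', 3: 'b'}
f = {'a': Fraction(2), 'b': Fraction(-6)}

assert is_measure_preserving(phi, mu, nu)
print(trace(pullback(phi, f), mu), trace(f, nu))  # the two traces agree
```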

In this post I would like to describe a functor ${G: \textbf{AlgPrb} \rightarrow \textbf{Prb}}$ which partially inverts ${F}$ (up to natural isomorphism), that is to say a recipe for starting with an algebraic probability space ${({\mathcal A}, \tau)}$ and producing a classical probability space ${(X, {\mathcal X}, \mu)}$. This recipe is not new – it is basically the (commutative) Gelfand-Naimark-Segal construction (discussed in this previous post) combined with the Loomis-Sikorski theorem (discussed in this previous post). However, I wanted to put the construction in a single location for sake of reference. I also wanted to make the point that ${F}$ and ${G}$ are not complete inverses; there is a bit of information in the algebraic probability space (e.g. topological information) which is lost when passing back to the classical probability space. In some future posts, I would like to develop some ergodic theory using the algebraic foundations of probability theory rather than the classical foundations; this turns out to be convenient in the ergodic theory arising from nonstandard analysis (such as that described in this previous post), in which the groups involved are uncountable and the underlying spaces are not standard Borel spaces.

Let us describe how to construct the functor ${G}$, with details postponed to below the fold.

1. Starting with an algebraic probability space ${({\mathcal A}, \tau)}$, form an inner product on ${{\mathcal A}}$ by the formula ${\langle f, g \rangle := \tau(fg)}$, and also form the spectral radius ${\rho(f) :=\lim_{k \rightarrow \infty} \tau(f^{2^k})^{1/2^k}}$.
2. The inner product is clearly positive semi-definite. Quotienting out the null vectors and taking completions, we arrive at a real Hilbert space ${L^2 = L^2({\mathcal A},\tau)}$, to which the trace ${\tau}$ may be extended.
3. Somewhat less obviously, the spectral radius is well-defined and gives a norm on ${{\mathcal A}}$. Taking ${L^2}$ limits of sequences in ${{\mathcal A}}$ of bounded spectral radius gives us a subspace ${L^\infty = L^\infty({\mathcal A},\tau)}$ of ${L^2}$ that has the structure of a real commutative Banach algebra.
4. The idempotents ${1_E}$ of the Banach algebra ${L^\infty}$ may be indexed by elements ${E}$ of an abstract ${\sigma}$-algebra ${{\mathcal B}}$.
5. The Boolean algebra homomorphisms ${\delta_x: {\mathcal B} \rightarrow \{0,1\}}$ (or equivalently, the real algebra homomorphisms ${\iota_x: L^\infty \rightarrow {\bf R}}$) may be indexed by elements ${x}$ of a space ${X}$.
6. Let ${{\mathcal X}}$ denote the ${\sigma}$-algebra on ${X}$ generated by the basic sets ${\overline{E} := \{ x \in X: \delta_x(E) = 1 \}}$ for every ${E \in {\mathcal B}}$.
7. Let ${{\mathcal N}}$ be the ${\sigma}$-ideal of ${{\mathcal X}}$ generated by the sets ${\bigcap_n \overline{E_n}}$, where ${E_n \in {\mathcal B}}$ is a sequence with ${\bigcap_n E_n = \emptyset}$.
8. One verifies that ${{\mathcal B}}$ is isomorphic to ${{\mathcal X}/{\mathcal N}}$. Using this isomorphism, the trace ${\tau}$ on ${L^\infty}$ can be used to construct a countably additive measure ${\mu}$ on ${{\mathcal X}}$. The classical probability space ${(X, {\mathcal X}, \mu)}$ is then ${G( {\mathcal A}, \tau )}$, and the abstract spaces ${L^2, L^\infty}$ may now be identified with their concrete counterparts ${L^2(X, {\mathcal X}, \mu)}$, ${L^\infty(X, {\mathcal X}, \mu)}$.
9. Every algebraic probability space morphism ${\phi: ({\mathcal A}_1,\tau_1) \rightarrow ({\mathcal A}_2,\tau_2)}$ generates a classical probability morphism ${G(\phi): (X_1, {\mathcal X}_1, \mu_1) \rightarrow (X_2, {\mathcal X}_2, \mu_2)}$ via the formula

$\displaystyle \delta_{G(\phi)(x_1)}( E_2 ) = \delta_{x_1}( \phi^*(E_2) )$

using a pullback operation ${\phi^*}$ on the abstract ${\sigma}$-algebras ${{\mathcal B}_1, {\mathcal B}_2}$ that can be defined by density.
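Steps 1 to 3 can be seen in miniature on a finite classical space, which is of course far simpler than the general case (the four-point space, weights and functions below are illustrative assumptions). The point is that a function supported on a null point is a null vector for ${\langle f, g \rangle = \tau(fg)}$, and that the quantities ${\tau(f^{2^k})^{1/2^k}}$ increase towards the essential supremum of ${|f|}$, ignoring whatever value ${f}$ takes on the null point.

```python
def trace(f, mu):
    """tau(f): integrate f (a list of values) against the weights mu."""
    return sum(fx * mx for fx, mx in zip(f, mu))


def inner(f, g, mu):
    """The positive semi-definite inner product <f, g> := tau(f g)."""
    return trace([a * b for a, b in zip(f, g)], mu)


def spectral_radius_sequence(f, mu, k_max=8):
    """tau(f^(2^k))^(1/2^k) for k = 1, ..., k_max; rho(f) is the limit.
    k_max is kept small because floats overflow for very large powers."""
    out = []
    for k in range(1, k_max + 1):
        power = 2 ** k
        out.append(trace([fx ** power for fx in f], mu) ** (1.0 / power))
    return out


# A four-point probability space whose last point is null (weight 0).
mu = [0.5, 0.25, 0.25, 0.0]
f = [1.0, -2.0, 0.5, 5.0]   # large value on the null point
g = [1.0, -2.0, 0.5, 0.0]   # agrees with f off the null point

diff = [a - b for a, b in zip(f, g)]
print(inner(diff, diff, mu))            # 0.0: f - g is a null vector
print(spectral_radius_sequence(f, mu))  # increases towards ess sup |f| = 2
```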

Remark 1 The classical probability space ${X}$ constructed by the functor ${G}$ has some additional structure; namely ${X}$ is a ${\sigma}$-Stone space (a Stone space with the property that the closure of any countable union of clopen sets is clopen), ${{\mathcal X}}$ is the Baire ${\sigma}$-algebra (generated by the clopen sets), and the null sets are the meager sets. However, we will not use this additional structure here.

The partial inversion relationship between the functors ${F: \textbf{Prb} \rightarrow \textbf{AlgPrb}}$ and ${G: \textbf{AlgPrb} \rightarrow \textbf{Prb}}$ is given by the following assertion:

1. There is a natural transformation from ${F \circ G: \textbf{AlgPrb} \rightarrow \textbf{AlgPrb}}$ to the identity functor ${I: \textbf{AlgPrb} \rightarrow \textbf{AlgPrb}}$.

More informally: if one starts with an algebraic probability space ${({\mathcal A},\tau)}$ and converts it back into a classical probability space ${(X, {\mathcal X}, \mu)}$, then there is a trace-preserving algebra homomorphism from ${{\mathcal A}}$ to ${L^\infty( X, {\mathcal X}, \mu )}$, which respects morphisms of the algebraic probability space. While this relationship is far weaker than an equivalence of categories (which would require both ${F \circ G}$ and ${G \circ F}$ to be naturally isomorphic to the respective identity functors), it is still good enough to allow many ergodic theory problems formulated using classical probability spaces to be reformulated instead as equivalent problems in algebraic probability spaces.

Remark 2 The opposite composition ${G \circ F: \textbf{Prb} \rightarrow \textbf{Prb}}$ is a little odd: it takes an arbitrary probability space ${(X, {\mathcal X}, \mu)}$ and returns a more complicated probability space ${(X', {\mathcal X}', \mu')}$, with ${X'}$ being the space of homomorphisms ${\iota_x: L^\infty(X, {\mathcal X}, \mu) \rightarrow {\bf R}}$. While there is “morally” an embedding of ${X}$ into ${X'}$ using the evaluation map, this map does not exist in general because points in ${X}$ may well have zero measure. However, if one takes a “pointless” approach and focuses just on the measure algebras ${({\mathcal X}, \mu)}$, ${({\mathcal X}', \mu')}$, then these algebras become naturally isomorphic after quotienting out by null sets.

Remark 3 An algebraic probability space captures a bit more structure than a classical probability space, because ${{\mathcal A}}$ may be identified with a proper subset of ${L^\infty}$ that describes the “regular” functions (or random variables) of the space. For instance, starting with the unit circle ${{\bf R}/{\bf Z}}$ (with the usual Haar measure and the usual trace ${\tau(f) = \int_{{\bf R}/{\bf Z}} f}$), any unital subalgebra ${{\mathcal A}}$ of ${L^\infty({\bf R}/{\bf Z})}$ that is dense in ${L^2({\bf R}/{\bf Z})}$ will generate the same classical probability space ${G( {\mathcal A}, \tau )}$ on applying the functor ${G}$, namely one will get the space ${({\bf R}/{\bf Z})'}$ of homomorphisms from ${L^\infty({\bf R}/{\bf Z})}$ to ${{\bf R}}$ (with the measure induced from ${\tau}$). Thus for instance ${{\mathcal A}}$ could be the continuous functions ${C( {\bf R}/{\bf Z} )}$, the Wiener algebra ${A({\bf R}/{\bf Z})}$ or the full space ${L^\infty({\bf R}/{\bf Z})}$, but the classical space ${G( {\mathcal A}, \tau )}$ will be unable to distinguish these spaces from each other. In particular, the functor ${F \circ G}$ loses information (roughly speaking, this functor takes an algebraic probability space and completes it to a von Neumann algebra, but then forgets exactly what algebra was initially used to create this completion). In ergodic theory, this sort of “extra structure” is traditionally encoded in topological terms, by assuming that the underlying probability space ${X}$ has a nice topological structure (e.g. a standard Borel space); however, with the algebraic perspective one has the freedom to have non-topological notions of extra structure, by choosing ${{\mathcal A}}$ to be something other than an algebra ${C(X)}$ of continuous functions on a topological space. 
I hope to discuss one such example of extra structure (coming from the Gowers-Host-Kra theory of uniformity seminorms) in a later blog post (this generalises the example of the Wiener algebra given previously, which encodes “Fourier structure”).

A small example of how one could use the functors ${F, G}$ is as follows. Suppose one has a classical probability space ${(X, {\mathcal X}, \mu)}$ with a measure-preserving action of an uncountable group ${\Gamma}$, which is only defined (and only obeys the group law) up to almost everywhere equivalence; thus for instance for any set ${E}$ and any ${g, h \in \Gamma}$, ${T^{gh} E}$ and ${T^g T^h E}$ might not be exactly equal, but only equal up to a null set. For similar reasons, an element ${E}$ of the invariant factor ${{\mathcal X}^\Gamma}$ might not be exactly invariant with respect to ${\Gamma}$; instead one only has ${T^g E}$ and ${E}$ equal up to null sets for each ${g \in \Gamma}$. One might like to “clean up” the action of ${\Gamma}$ to make it defined everywhere, and a genuine action everywhere, but this is not immediately achievable if ${\Gamma}$ is uncountable, since the union of all the null sets where something bad occurs may cease to be a null set. However, by applying the functor ${F}$, each shift ${T^g: X \rightarrow X}$ defines a morphism ${T^g: L^\infty(X, {\mathcal X}, \mu) \rightarrow L^\infty(X, {\mathcal X}, \mu)}$ on the associated algebraic probability space (i.e. the Koopman operator), and then applying ${G}$, we obtain a shift ${T^g: X' \rightarrow X'}$ on a new classical probability space ${(X', {\mathcal X}', \mu')}$ which now gives a genuine measure-preserving action of ${\Gamma}$, and which is equivalent to the original action from a measure algebra standpoint. The invariant factor ${{\mathcal X}^\Gamma}$ now consists of those sets in ${{\mathcal X}'}$ which are genuinely ${\Gamma}$-invariant, not just up to null sets.
(Basically, the classical probability space ${(X', {\mathcal X}', \mu')}$ contains a Boolean algebra ${\overline{\mathcal B}}$ with the property that every measurable set ${A \in {\mathcal X}'}$ is equivalent up to null sets to precisely one set in ${\overline{\mathcal B}}$, allowing for a canonical “retraction” onto ${\overline{\mathcal B}}$ that eliminates all null set issues.)

More indirectly, the functors ${F, G}$ suggest that one should be able to develop a “pointless” form of ergodic theory, in which the underlying probability spaces are given algebraically rather than classically. I hope to give some more specific examples of this in later posts.

There are a number of ways to construct the real numbers ${{\bf R}}$, for instance

• as the metric completion of ${{\bf Q}}$ (thus, ${{\bf R}}$ is defined as the set of Cauchy sequences of rationals, modulo Cauchy equivalence);
• as the space of Dedekind cuts on the rationals ${{\bf Q}}$;
• as the space of quasimorphisms ${\phi: {\bf Z} \rightarrow {\bf Z}}$ on the integers, quotiented by bounded functions. (I believe this construction first appears in this paper of Street, who credits the idea to Schanuel, though the germ of this construction arguably goes all the way back to Eudoxus.)
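The third construction is perhaps the least familiar, so here is a small sketch. Assuming the usual convention that a quasimorphism is a map ${\phi: {\bf Z} \rightarrow {\bf Z}}$ with ${\phi(m+n) - \phi(m) - \phi(n)}$ bounded, the function ${n \mapsto \lfloor n \sqrt{2} \rfloor}$ is one illustrative representative of ${\sqrt{2}}$; the real number is recovered as the slope ${\lim_{n \rightarrow \infty} \phi(n)/n}$, and in this construction addition corresponds to pointwise addition of quasimorphisms and multiplication to composition. The name phi_sqrt2 is hypothetical.

```python
import math


def phi_sqrt2(n):
    """A quasimorphism representing sqrt(2): n -> floor(n * sqrt(2)),
    computed exactly with integer square roots."""
    r = math.isqrt(2 * n * n)  # floor(|n| * sqrt(2))
    # For n < 0, |n| * sqrt(2) is irrational, so floor(n * sqrt(2)) = -r - 1.
    return r if n >= 0 else -r - 1


def defect(phi, ns):
    """max |phi(m + n) - phi(m) - phi(n)| over m, n in ns."""
    return max(abs(phi(m + n) - phi(m) - phi(n)) for m in ns for n in ns)


print(defect(phi_sqrt2, range(-100, 101)))  # bounded: at most 1 here
N = 10**6
print(phi_sqrt2(N) / N)                     # slope, close to sqrt(2)
print(phi_sqrt2(phi_sqrt2(N)) / N)          # composition: slope close to 2
```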

There is also a fourth family of constructions that proceeds via nonstandard analysis, as a special case of what is known as the nonstandard hull construction. (Here I will assume some basic familiarity with nonstandard analysis and ultraproducts, as covered for instance in this previous blog post.) Given an unbounded nonstandard natural number ${N \in {}^* {\bf N} \backslash {\bf N}}$, one can define two external additive subgroups of the nonstandard integers ${{}^* {\bf Z}}$:

• The group ${O(N) := \{ n \in {}^* {\bf Z}: |n| \leq CN \hbox{ for some } C \in {\bf N} \}}$ of all nonstandard integers of magnitude less than or comparable to ${N}$; and
• The group ${o(N) := \{ n \in {}^* {\bf Z}: |n| \leq C^{-1} N \hbox{ for all } C \in {\bf N} \}}$ of nonstandard integers of magnitude infinitesimally smaller than ${N}$.

The group ${o(N)}$ is a subgroup of ${O(N)}$, so we may form the quotient group ${O(N)/o(N)}$. This space is isomorphic to the reals ${{\bf R}}$, and can in fact be used to construct the reals:

Proposition 1 For any coset ${n + o(N)}$ of ${O(N)/o(N)}$, there is a unique real number ${\hbox{st} \frac{n}{N}}$ with the property that ${\frac{n}{N} = \hbox{st} \frac{n}{N} + o(1)}$. The map ${n + o(N) \mapsto \hbox{st} \frac{n}{N}}$ is then an isomorphism between the additive groups ${O(N)/o(N)}$ and ${{\bf R}}$.

Proof: Uniqueness is clear. For existence, observe that the set ${\{ x \in {\bf R}: Nx \leq n + o(N) \}}$ is a Dedekind cut, and its supremum can be verified to have the required properties for ${\hbox{st} \frac{n}{N}}$. $\Box$

In a similar vein, we can view the unit interval ${[0,1]}$ in the reals as the quotient

$\displaystyle [0,1] \equiv [N] / o(N) \ \ \ \ \ (1)$

where ${[N]}$ is the nonstandard (i.e. internal) set ${\{ n \in {}^* {\bf N}: n \leq N \}}$; of course, ${[N]}$ is not a group, so one should interpret ${[N]/o(N)}$ as the image of ${[N]}$ under the quotient map ${{}^* {\bf Z} \rightarrow {}^* {\bf Z} / o(N)}$ (or ${O(N) \rightarrow O(N)/o(N)}$, if one prefers). Or to put it another way, (1) asserts that ${[0,1]}$ is the image of ${[N]}$ with respect to the map ${\pi: n \mapsto \hbox{st} \frac{n}{N}}$.

In this post I would like to record a nice measure-theoretic version of the equivalence (1), which essentially appears already in standard texts on Loeb measure (see e.g. this text of Cutland). To describe the results, we must first quickly recall the construction of Loeb measure on ${[N]}$. Given an internal subset ${A}$ of ${[N]}$, we may define the elementary measure ${\mu_0(A)}$ of ${A}$ by the formula

$\displaystyle \mu_0(A) := \hbox{st} \frac{|A|}{N}.$

This is a finitely additive probability measure on the Boolean algebra of internal subsets of ${[N]}$. We can then construct the Loeb outer measure ${\mu^*(A)}$ of any subset ${A \subset [N]}$ in complete analogy with Lebesgue outer measure by the formula

$\displaystyle \mu^*(A) := \inf \sum_{n=1}^\infty \mu_0(A_n)$

where ${(A_n)_{n=1}^\infty}$ ranges over all sequences of internal subsets of ${[N]}$ that cover ${A}$. We say that a subset ${A}$ of ${[N]}$ is Loeb measurable if, for any (standard) ${\epsilon>0}$, one can find an internal subset ${B}$ of ${[N]}$ which differs from ${A}$ by a set of Loeb outer measure at most ${\epsilon}$, and in that case we define the Loeb measure ${\mu(A)}$ of ${A}$ to be ${\mu^*(A)}$. It is a routine matter to show (e.g. using the Carathéodory extension theorem) that the space ${{\mathcal L}}$ of Loeb measurable sets is a ${\sigma}$-algebra, and that ${\mu}$ is a countably additive probability measure on this space that extends the elementary measure ${\mu_0}$. Thus ${[N]}$ now has the structure of a probability space ${([N], {\mathcal L}, \mu)}$.

Now, the group ${o(N)}$ acts (Loeb-almost everywhere) on the probability space ${[N]}$ by the addition map, thus ${T^h n := n+h}$ for ${n \in [N]}$ and ${h \in o(N)}$ (excluding a set of Loeb measure zero where ${n+h}$ exits ${[N]}$). This action is clearly seen to be measure-preserving. As such, we can form the invariant factor ${Z^0_{o(N)}([N]) = ([N], {\mathcal L}^{o(N)}, \mu\downharpoonright_{{\mathcal L}^{o(N)}})}$, defined by restricting attention to those Loeb measurable sets ${A \subset [N]}$ with the property that ${T^h A}$ is equal ${\mu}$-almost everywhere to ${A}$ for each ${h \in o(N)}$.

The claim is then that this invariant factor is equivalent (up to almost everywhere equivalence) to the unit interval ${[0,1]}$ with Lebesgue measure ${m}$ (and the trivial action of ${o(N)}$), by the same factor map ${\pi: n \mapsto \hbox{st} \frac{n}{N}}$ used in (1). More precisely:

Theorem 2 Given a set ${A \in {\mathcal L}^{o(N)}}$, there exists a Lebesgue measurable set ${B \subset [0,1]}$, unique up to ${m}$-a.e. equivalence, such that ${A}$ is ${\mu}$-a.e. equivalent to the set ${\pi^{-1}(B) := \{ n \in [N]: \hbox{st} \frac{n}{N} \in B \}}$. Conversely, if ${B \subset [0,1]}$ is Lebesgue measurable, then ${\pi^{-1}(B)}$ is in ${{\mathcal L}^{o(N)}}$, and ${\mu( \pi^{-1}(B) ) = m( B )}$.

In other words, Theorem 2 provides a measure-theoretic analogue

$\displaystyle [0,1] \equiv Z^0_{o(N)}( [N] )$

of (1).
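The converse half of Theorem 2 rests on the fact that ${\mu(\pi^{-1}(B)) = m(B)}$ for elementary sets ${B}$. A finitary shadow of this, with a large standard ${N}$ in place of an unbounded nonstandard one and ${[N]}$ taken to be ${\{1,\dots,N\}}$ (an assumption; including ${0}$ changes nothing in the limit), is that the proportion of ${n \leq N}$ with ${n/N \in B}$ differs from ${m(B)}$ by at most the number of intervals in ${B}$ divided by ${N}$. The sketch below checks this with exact rational arithmetic; the function name is hypothetical.

```python
import math
from fractions import Fraction


def preimage_density(intervals, N):
    """#{n in 1..N : n/N in B} / N, where B is a finite union of disjoint
    closed intervals [a, b] inside [0, 1] with rational endpoints."""
    count = 0
    for a, b in intervals:
        lo = max(1, math.ceil(a * N))   # smallest n with n/N >= a
        hi = min(N, math.floor(b * N))  # largest n with n/N <= b
        count += max(0, hi - lo + 1)
    return Fraction(count, N)


B = [(Fraction(0), Fraction(1, 3)), (Fraction(1, 2), Fraction(3, 4))]
m_B = sum(b - a for a, b in B)          # Lebesgue measure 7/12
for N in (10, 1000, 10**6):
    err = abs(preimage_density(B, N) - m_B)
    print(N, float(err), err <= Fraction(len(B), N))
```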

Proof: We first prove the converse. It is clear that ${\pi^{-1}(B)}$ is ${o(N)}$-invariant, so it suffices to show that ${\pi^{-1}(B)}$ is Loeb measurable with Loeb measure ${m(B)}$. This is easily verified when ${B}$ is an elementary set (a finite union of intervals). By countable subadditivity of outer measure, this implies that the Loeb outer measure of ${\pi^{-1}(E)}$ is bounded by the Lebesgue outer measure of ${E}$ for any set ${E \subset [0,1]}$; since every Lebesgue measurable set differs from an elementary set by a set of arbitrarily small Lebesgue outer measure, the claim follows.

Now we establish the forward claim. Uniqueness is clear from the converse claim, so it suffices to show existence. Let ${A \in {\mathcal L}^{o(N)}}$, and let ${\epsilon>0}$ be an arbitrary standard real number; then we can find an internal set ${A_\epsilon \subset [N]}$ which differs from ${A}$ by a set of Loeb measure at most ${\epsilon}$. As ${A}$ is ${o(N)}$-invariant, we conclude that for every ${h \in o(N)}$, ${A_\epsilon}$ and ${T^h A_\epsilon}$ differ by a set of Loeb measure (and hence elementary measure) at most ${2\epsilon}$. By the (contrapositive of the) underspill principle, there must exist a standard ${\delta>0}$ such that ${A_\epsilon}$ and ${T^h A_\epsilon}$ differ by a set of elementary measure at most ${2\epsilon}$ for all ${|h| \leq \delta N}$. If we then define the nonstandard function ${f: [N] \rightarrow {}^* {\bf R}}$ by the formula

$\displaystyle f(n) := \hbox{st} \frac{1}{\delta N} \sum_{m \in [N]: m \leq n \leq m+\delta N} 1_{A_\epsilon}(m),$

then from the (nonstandard) triangle inequality we have

$\displaystyle \frac{1}{N} \sum_{n \in [N]} |f(n) - 1_{A_\epsilon}(n)| \leq 3\epsilon$

(say). On the other hand, ${f}$ has the Lipschitz continuity property

$\displaystyle |f(n)-f(m)| \leq \frac{2|n-m|}{\delta N}$

and so in particular we see that

$\displaystyle \hbox{st} f(n) = \tilde f( \hbox{st} \frac{n}{N} )$

for some Lipschitz continuous function ${\tilde f: [0,1] \rightarrow [0,1]}$. If we then let ${E_\epsilon}$ be the set where ${\tilde f \geq 1 - \sqrt{\epsilon}}$, one can check that ${A_\epsilon}$ differs from ${\pi^{-1}(E_\epsilon)}$ by a set of Loeb outer measure ${O(\sqrt{\epsilon})}$, and hence ${A}$ does so also. Sending ${\epsilon}$ to zero, we see (from the converse claim) that ${1_{E_\epsilon}}$ is a Cauchy sequence in ${L^1}$ and thus converges in ${L^1}$ to ${1_E}$ for some Lebesgue measurable set ${E}$. The sets ${A_\epsilon}$ then converge in Loeb outer measure to ${\pi^{-1}(E)}$, giving the claim. $\Box$
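The smoothing step in this proof has a direct finite analogue, which may help build intuition. The following Python sketch replaces the nonstandard ${N}$ by an ordinary integer and ${\delta N}$ by an integer window `width`; the test set `A` and the parameters `N`, `delta`, `eps` are hypothetical choices made only for illustration. It computes the smoothed function (whose increments are at most `1/width` per step, in line with the Lipschitz bound above) and the finite counterpart of the threshold set ${E_\epsilon}$.

```python
import math


def smoothed_average(indicator, delta):
    """Finite analogue of f(n) = (1/(delta N)) * #{m in A_eps : n - delta N < m <= n}.

    indicator[m-1] is 1_A(m) for m in [N] = {1, ..., N}; returns [f(1), ..., f(N)].
    The window has exactly `width` terms, one fewer than the displayed range
    m <= n <= m + delta N (a boundary term that is negligible for large width).
    """
    N = len(indicator)
    width = max(1, int(delta * N))  # stands in for delta * N
    prefix = [0]  # prefix[k] = |A intersect {1, ..., k}|
    for x in indicator:
        prefix.append(prefix[-1] + x)
    f = []
    for n in range(1, N + 1):
        lo = max(1, n - width + 1)
        f.append((prefix[n] - prefix[lo - 1]) / width)
    return f


def threshold_set(f, eps):
    """Finite analogue of E_eps = {f >= 1 - sqrt(eps)}, returned as a 0/1 list on [N]."""
    cut = 1 - math.sqrt(eps)
    return [1 if value >= cut else 0 for value in f]


N, delta, eps = 200, 0.05, 0.01
A = [1 if n <= N // 2 else 0 for n in range(1, N + 1)]  # hypothetical, nearly shift-invariant set
f = smoothed_average(A, delta)
E = threshold_set(f, eps)
print(sum(abs(fx - ax) for fx, ax in zip(f, A)) / N)  # L^1 distance between f and 1_A
print(sum(abs(ex - ax) for ex, ax in zip(E, A)) / N)  # density of the symmetric difference of E and A
```

Prefix sums make each value of ${f}$ an O(1) lookup; moving ${n}$ by one adds at most one element and drops at most one, which is where the Lipschitz property comes from.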

Thanks to the Lebesgue differentiation theorem, the conditional expectation ${{\bf E}( f | Z^0_{o(N)}([N]))}$ of a bounded Loeb-measurable function ${f: [N] \rightarrow {\bf R}}$ can be expressed (as a function on ${[0,1]}$, defined ${m}$-a.e.) as

$\displaystyle {\bf E}( f | Z^0_{o(N)}([N]))(x) := \lim_{\epsilon \rightarrow 0} \frac{1}{2\epsilon} \int_{[(x-\epsilon) N,(x+\epsilon) N]} f\ d\mu.$
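At a single finite scale, this limit is just a shrinking local average with respect to normalised counting measure. As a hedged sketch, the Python code below uses an ordinary large integer `N` in place of the nonstandard ${N}$ and the hypothetical test function ${f(n) = \cos(2\pi n/N)}$; since this ${f}$ depends (up to infinitesimals) only on ${\hbox{st} \frac{n}{N}}$, one expects the conditional expectation at ${x}$ to be ${\cos(2\pi x)}$.

```python
import math


def local_average(f_values, x, eps):
    """(1/(2 eps)) * integral of f d(mu) over {n in [N] : (x - eps) N <= n <= (x + eps) N}.

    f_values[n-1] = f(n); mu is normalised counting measure (mass 1/N per point).
    """
    N = len(f_values)
    lo = max(1, math.ceil((x - eps) * N))
    hi = min(N, math.floor((x + eps) * N))
    total = sum(f_values[n - 1] for n in range(lo, hi + 1))
    return total / (2 * eps * N)


N = 100_000
f = [math.cos(2 * math.pi * n / N) for n in range(1, N + 1)]  # hypothetical test function
for x in (0.25, 0.5):
    print(x, local_average(f, x, 0.01), math.cos(2 * math.pi * x))
```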

By the abstract ergodic theorem from the previous post, one can also view this conditional expectation as the element of minimal ${L^2}$ norm in the closed convex hull of the shifts ${T^h f}$, ${h = o(N)}$. In particular, we obtain a form of the von Neumann ergodic theorem in this context: the averages ${\frac{1}{H} \sum_{h=1}^H T^h f}$ for ${H=o(N)}$ converge (as a net, rather than a sequence) in ${L^2}$ to ${{\bf E}( f | Z^0_{o(N)}([N]))}$.

If ${f: [N] \rightarrow [-1,1]}$ is (the standard part of) an internal function, that is to say the ultralimit of a sequence ${f_n: [N_n] \rightarrow [-1,1]}$ of finitary bounded functions, one can view the measurable function ${F := {\bf E}( f | Z^0_{o(N)}([N]))}$ as a limit of the ${f_n}$ that is analogous to the “graphons” that emerge as limits of graphs (see e.g. the recent text of Lovasz on graph limits). Indeed, the measurable function ${F: [0,1] \rightarrow [-1,1]}$ is related to the discrete functions ${f_n: [N_n] \rightarrow [-1,1]}$ by the formula

$\displaystyle \int_a^b F(x)\ dx = \hbox{st} \lim_{n \rightarrow p} \frac{1}{N_n} \sum_{a N_n \leq m \leq b N_n} f_n(m)$

for all ${0 \leq a < b \leq 1}$, where ${p}$ is the nonprincipal ultrafilter used to define the nonstandard universe. In particular, from the Arzela-Ascoli diagonalisation argument there is a subsequence ${n_j}$ such that

$\displaystyle \int_a^b F(x)\ dx = \lim_{j \rightarrow \infty} \frac{1}{N_{n_j}} \sum_{a N_{n_j} \leq m \leq b N_{n_j}} f_{n_j}(m),$

thus ${F}$ is the asymptotic density function of the ${f_n}$. For instance, if ${f_n}$ is the indicator function of a random subset of ${[N_n]}$ (each element being included independently with probability ${1/2}$), then the asymptotic density function would equal ${1/2}$ (almost everywhere, at least).
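The random-subset example can be checked at one finite scale. This Python sketch (the seed and `N = 200_000` are arbitrary, hypothetical choices) compares the normalised interval sums ${\frac{1}{N}\sum_{aN \leq m \leq bN} f(m)}$ appearing in the formula above with ${\int_a^b \frac{1}{2}\ dx = \frac{b-a}{2}}$; it only looks at a single ${N_n}$ and does not compute an ultralimit.

```python
import random


def interval_density(f, a, b):
    """(1/N) * sum of f(m) over a*N <= m <= b*N, where f[m-1] = f(m) for m in [N]."""
    N = len(f)
    return sum(f[m - 1] for m in range(1, N + 1) if a * N <= m <= b * N) / N


rng = random.Random(0)  # fixed seed so that runs are reproducible
N = 200_000
f = [1 if rng.random() < 0.5 else 0 for _ in range(N)]  # indicator of a random subset of [N]
for a, b in [(0.0, 1.0), (0.2, 0.5), (0.7, 0.75)]:
    print(a, b, interval_density(f, a, b), (b - a) / 2)
```

The fluctuations of each interval sum are of order ${N^{-1/2}}$, so the agreement with ${(b-a)/2}$ improves as ${N}$ grows.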

I’m continuing to look into understanding the ergodic theory of ${o(N)}$ actions, as I believe this may allow one to apply ergodic theory methods to the “single-scale” or “non-asymptotic” setting (in which one averages only over scales comparable to a large parameter ${N}$, rather than the traditional asymptotic approach of letting the scale go to infinity). I’m planning some further posts in this direction, though this is still a work in progress.

The von Neumann ergodic theorem (the Hilbert space version of the mean ergodic theorem) asserts that if ${U: H \rightarrow H}$ is a unitary operator on a Hilbert space ${H}$, and ${v \in H}$ is a vector in that Hilbert space, then one has

$\displaystyle \lim_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N U^n v = \pi_{H^U} v$

in the strong topology, where ${H^U := \{ w \in H: Uw = w \}}$ is the ${U}$-invariant subspace of ${H}$, and ${\pi_{H^U}}$ is the orthogonal projection to ${H^U}$. (See e.g. these previous lecture notes for a proof.) The same proof extends to more general amenable groups: if ${G}$ is a countable amenable group acting on a Hilbert space ${H}$ by unitary transformations ${g: H \rightarrow H}$, and ${v \in H}$ is a vector in that Hilbert space, then one has

$\displaystyle \lim_{N \rightarrow \infty} \frac{1}{|\Phi_N|} \sum_{g \in \Phi_N} gv = \pi_{H^G} v \ \ \ \ \ (1)$

for any Folner sequence ${\Phi_N}$ of ${G}$, where ${H^G := \{ w \in H: gw = w \hbox{ for all }g \in G \}}$ is the ${G}$-invariant subspace. Thus one can interpret ${\pi_{H^G} v}$ as a certain average of elements of the orbit ${Gv := \{ gv: g \in G \}}$ of ${v}$.
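A finite-dimensional sketch of the von Neumann ergodic theorem: a permutation of coordinates is a unitary operator ${U}$ on ${{\bf R}^d}$, the invariant subspace ${H^U}$ consists of the vectors that are constant on each cycle of the permutation, and ${\pi_{H^U} v}$ replaces each entry of ${v}$ by its average over the cycle containing it. The permutation and vector used below are hypothetical examples.

```python
def apply_permutation(perm, v):
    """The unitary U given by (U v)[perm[i]] = v[i]."""
    w = [0.0] * len(v)
    for i, x in enumerate(v):
        w[perm[i]] = x
    return w


def ergodic_average(perm, v, N):
    """(1/N) * sum_{n=1}^{N} U^n v."""
    total = [0.0] * len(v)
    w = list(v)
    for _ in range(N):
        w = apply_permutation(perm, w)
        total = [t + x for t, x in zip(total, w)]
    return [t / N for t in total]


def project_to_invariant(perm, v):
    """Orthogonal projection onto H^U = {w : U w = w}: average v over each cycle of perm."""
    d = len(v)
    seen = [False] * d
    out = [0.0] * d
    for start in range(d):
        if seen[start]:
            continue
        cycle, i = [], start
        while not seen[i]:
            seen[i] = True
            cycle.append(i)
            i = perm[i]
        mean = sum(v[j] for j in cycle) / len(cycle)
        for j in cycle:
            out[j] = mean
    return out


perm = [1, 2, 0, 4, 3, 5]  # a 3-cycle, a 2-cycle and a fixed point
v = [1.0, 2.0, 6.0, -1.0, 3.0, 5.0]
print(ergodic_average(perm, v, 5000))
print(project_to_invariant(perm, v))
```

In this finite setting the convergence is at rate ${O(1/N)}$, since the averages over each full traversal of a cycle are exact.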

I recently discovered that there is a simple variant of this ergodic theorem that holds even when the group ${G}$ is not amenable (or not discrete), using a more abstract notion of averaging:

Theorem 1 (Abstract ergodic theorem) Let ${G}$ be an arbitrary group acting unitarily on a Hilbert space ${H}$, and let ${v}$ be a vector in ${H}$. Then ${\pi_{H^G} v}$ is the element in the closed convex hull of ${Gv := \{ gv: g \in G \}}$ of minimal norm, and is also the unique element of ${H^G}$ in this closed convex hull.

Proof: As the closed convex hull of ${Gv}$ is closed, convex, and non-empty in a Hilbert space, it is a classical fact (see e.g. Proposition 1 of this previous post) that it has a unique element ${F}$ of minimal norm. If ${T_g F \neq F}$ for some ${g}$, then, since ${T_g}$ is unitary and maps the closed convex hull of ${Gv}$ to itself, ${T_g F}$ would be a second element of this hull with the same norm as ${F}$; by the parallelogram law, the midpoint of ${T_g F}$ and ${F}$ would then lie in the closed convex hull and have strictly smaller norm, a contradiction. Thus ${F}$ is ${G}$-invariant. To finish the first claim, it suffices to show that ${v-F}$ is orthogonal to every element ${h}$ of ${H^G}$. But if this were not the case for some such ${h}$, then (as ${F}$ and ${h}$ are ${G}$-invariant and each ${T_g}$ is unitary) we would have ${\langle T_g v - F, h \rangle = \langle v-F,h\rangle \neq 0}$ for all ${g \in G}$, and thus on taking closed convex hulls ${\langle F-F,h\rangle = \langle v-F,h\rangle \neq 0}$, a contradiction.

Finally, since ${T_g v - F}$ is orthogonal to ${H^G}$, the same is true for ${F'-F}$ for any ${F'}$ in the closed convex hull of ${Gv}$, and this gives the second claim. $\Box$
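As a finite-dimensional sanity check of Theorem 1 (an illustration only, not part of the argument), take the cyclic group ${{\bf Z}/d}$ acting on ${{\bf R}^d}$ by cyclic shifts. Then ${H^G}$ is the line of constant vectors, and ${\pi_{H^G} v}$ is the constant vector whose entries equal the mean of ${v}$. The Python sketch below approximates the minimal-norm element of the convex hull of the orbit using the Frank-Wolfe method with exact line search (a standard optimisation algorithm, swapped in purely for illustration) and compares it with that projection; the vector `v` is a hypothetical example.

```python
def shift(v, s):
    """Action of s in Z/d on R^d by cyclic shift (a unitary action)."""
    d = len(v)
    return [v[(i - s) % d] for i in range(d)]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def min_norm_in_orbit_hull(v, iterations=20000):
    """Frank-Wolfe with exact line search for min ||x||^2 over the convex hull of the orbit of v."""
    orbit = [shift(v, s) for s in range(len(v))]
    x = list(orbit[0])
    for _ in range(iterations):
        # Linear minimisation step: the gradient of ||x||^2 is 2x, so minimise <x, p> over the orbit.
        s_pt = min(orbit, key=lambda p: dot(x, p))
        direction = [a - b for a, b in zip(x, s_pt)]  # x - s
        denom = dot(direction, direction)
        if denom == 0.0:
            break
        # Exact line search: minimise ||x - gamma (x - s)||^2 over gamma in [0, 1].
        gamma = max(0.0, min(1.0, dot(x, direction) / denom))
        if gamma == 0.0:
            break  # the Frank-Wolfe gap is non-positive, so x is already optimal
        x = [a - gamma * b for a, b in zip(x, direction)]
    return x


v = [3.0, 1.0, 0.0, 0.0]
F = min_norm_in_orbit_hull(v)
print(F)                                 # close to the constant vector (1, 1, 1, 1)
print(sum(a - b for a, b in zip(v, F)))  # v - F is orthogonal to the constants
```

Every iterate is a convex combination of orbit points, so it stays in the hull; the minimiser found is the orbit average, which is both invariant and of minimal norm, as Theorem 1 predicts.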

This result is due to Alaoglu and Birkhoff. It implies the amenable ergodic theorem (1); indeed, given any ${\epsilon>0}$, Theorem 1 implies that there is a finite convex combination ${v_\epsilon}$ of shifts ${gv}$ of ${v}$ which lies within ${\epsilon}$ (in the ${H}$ norm) of ${\pi_{H^G} v}$. Since ${\pi_{H^G} v}$ is ${G}$-invariant, the triangle inequality shows that all the averages ${\frac{1}{|\Phi_N|} \sum_{g \in \Phi_N} gv_\epsilon}$ also lie within ${\epsilon}$ of ${\pi_{H^G} v}$, and by the Folner property this implies that the averages ${\frac{1}{|\Phi_N|} \sum_{g \in \Phi_N} gv}$ are eventually within ${2\epsilon}$ (say) of ${\pi_{H^G} v}$, giving the claim.

It turns out to be possible to use Theorem 1 as a substitute for the mean ergodic theorem in a number of contexts, thus removing the need for an amenability hypothesis. Here is a basic application:

Corollary 2 (Relative orthogonality) Let ${G}$ be a group acting unitarily on a Hilbert space ${H}$, and let ${V}$ be a ${G}$-invariant subspace of ${H}$. Then ${V}$ and ${H^G}$ are relatively orthogonal over their common subspace ${V^G}$, that is to say the restrictions of ${V}$ and ${H^G}$ to the orthogonal complement of ${V^G}$ are orthogonal to each other.

Proof: By Theorem 1, we have ${\pi_{H^G} v = \pi_{V^G} v}$ for all ${v \in V}$, and the claim follows. (Thanks to Gergely Harcos for this short argument.) $\Box$

Now we give a more advanced application of Theorem 1, to establish some “Mackey theory” over arbitrary groups ${G}$. Define a ${G}$-system ${(X, {\mathcal X}, \mu, (T_g)_{g \in G})}$ to be a probability space ${X = (X, {\mathcal X}, \mu)}$ together with a measure-preserving action ${(T_g)_{g \in G}}$ of ${G}$ on ${X}$; this gives an action of ${G}$ on ${L^2(X) = L^2(X,{\mathcal X},\mu)}$, which by abuse of notation we also call ${T_g}$:

$\displaystyle T_g f := f \circ T_{g^{-1}}.$

(In this post we follow the usual convention of defining the ${L^p}$ spaces by quotienting out by almost everywhere equivalence.) We say that a ${G}$-system is ergodic if ${L^2(X)^G}$ consists only of the constants.

(A technical point: the theory becomes slightly cleaner if we interpret our measure spaces abstractly (or “pointlessly”), removing the underlying space ${X}$ and quotienting ${{\mathcal X}}$ by the ${\sigma}$-ideal of null sets, and considering maps such as ${T_g}$ only on this quotient ${\sigma}$-algebra (or on the associated von Neumann algebra ${L^\infty(X)}$ or Hilbert space ${L^2(X)}$). However, we will stick with the more traditional setting of classical probability spaces here to keep the notation familiar, but with the understanding that many of the statements below should be understood modulo null sets.)

A factor ${Y = (Y, {\mathcal Y}, \nu, (S_g)_{g \in G})}$ of a ${G}$-system ${X = (X,{\mathcal X},\mu, (T_g)_{g \in G})}$ is another ${G}$-system together with a factor map ${\pi: X \rightarrow Y}$ which commutes with the ${G}$-action (thus ${\pi \circ T_g = S_g \circ \pi}$ for all ${g \in G}$) and respects the measure in the sense that ${\mu(\pi^{-1}(E)) = \nu(E)}$ for all ${E \in {\mathcal Y}}$. For instance, the ${G}$-invariant factor ${Z^0_G(X) := (X, {\mathcal X}^G, \mu\downharpoonright_{{\mathcal X}^G}, (T_g)_{g \in G})}$, formed by restricting ${X}$ to the invariant algebra ${{\mathcal X}^G := \{ E \in {\mathcal X}: T_g E = E \hbox{ a.e. for all } g \in G \}}$, is a factor of ${X}$. (This factor is the first factor in an important hierarchy, the next element of which is the Kronecker factor ${Z^1_G(X)}$, but we will not discuss higher elements of this hierarchy further here.) If ${Y}$ is a factor of ${X}$, we refer to ${X}$ as an extension of ${Y}$.

From Corollary 2 we have

Corollary 3 (Relative independence) Let ${X}$ be a ${G}$-system for a group ${G}$, and let ${Y}$ be a factor of ${X}$. Then ${Y}$ and ${Z^0_G(X)}$ are relatively independent over their common factor ${Z^0_G(Y)}$, in the sense that the spaces ${L^2(Y)}$ and ${L^2(Z^0_G(X))}$ are relatively orthogonal over ${L^2(Z^0_G(Y))}$ when all these spaces are embedded into ${L^2(X)}$.

This has a simple consequence regarding the product ${X \times Y = (X \times Y, {\mathcal X} \times {\mathcal Y}, \mu \times \nu, (T_g \oplus S_g)_{g \in G})}$ of two ${G}$-systems ${X = (X, {\mathcal X}, \mu, (T_g)_{g \in G})}$ and ${Y = (Y, {\mathcal Y}, \nu, (S_g)_{g \in G})}$, in the case when the ${Y}$ action is trivial:

Lemma 4 If ${X,Y}$ are two ${G}$-systems, with the action of ${G}$ on ${Y}$ trivial, then ${Z^0_G(X \times Y)}$ is isomorphic to ${Z^0_G(X) \times Y}$ in the obvious fashion.

This lemma is immediate for countable ${G}$, since for a ${G}$-invariant function ${f}$, one can ensure that ${T_g f = f}$ holds simultaneously for all ${g \in G}$ outside of a null set, but is a little trickier for uncountable ${G}$.

Proof: It is clear that ${Z^0_G(X) \times Y}$ is a factor of ${Z^0_G(X \times Y)}$. To obtain the reverse inclusion, suppose that it fails, thus there is a non-zero ${f \in L^2(Z^0_G(X \times Y))}$ which is orthogonal to ${L^2(Z^0_G(X) \times Y)}$. In particular, we have ${fg}$ orthogonal to ${L^2(Z^0_G(X))}$ for any ${g \in L^\infty(Y)}$. Since ${fg}$ lies in ${L^2(Z^0_G(X \times Y))}$, we conclude from Corollary 3 (viewing ${X}$ as a factor of ${X \times Y}$) that ${fg}$ is also orthogonal to ${L^2(X)}$. Since ${g}$ is an arbitrary element of ${L^\infty(Y)}$, we conclude that ${f}$ is orthogonal to ${L^2(X \times Y)}$ and in particular is orthogonal to itself, a contradiction. (Thanks to Gergely Harcos for this argument.) $\Box$

Now we discuss the notion of a group extension.

Definition 5 (Group extension) Let ${G}$ be an arbitrary group, let ${Y = (Y, {\mathcal Y}, \nu, (S_g)_{g \in G})}$ be a ${G}$-system, and let ${K}$ be a compact metrisable group. A ${K}$-extension of ${Y}$ is an extension ${X = (X, {\mathcal X}, \mu, (T_g)_{g \in G})}$ whose underlying space is ${X = Y \times K}$ (with ${{\mathcal X}}$ the product of ${{\mathcal Y}}$ and the Borel ${\sigma}$-algebra on ${K}$), the factor map is ${\pi: (y,k) \mapsto y}$, and the shift maps ${T_g}$ are given by

$\displaystyle T_g ( y, k ) = (S_g y, \rho_g(y) k )$

where for each ${g \in G}$, ${\rho_g: Y \rightarrow K}$ is a measurable map (known as the cocycle associated to the ${K}$-extension ${X}$).

An important special case of a ${K}$-extension arises when the measure ${\mu}$ is the product of ${\nu}$ with the Haar measure ${dk}$ on ${K}$. In this case, ${X}$ also has a ${K}$-action ${k': (y,k) \mapsto (y,k(k')^{-1})}$ that commutes with the ${G}$-action, making ${X}$ a ${G \times K}$-system. More generally, ${\mu}$ could be the product of ${\nu}$ with the Haar measure ${dh}$ of some closed subgroup ${H}$ of ${K}$, with ${\rho_g}$ taking values in ${H}$; then ${X}$ is a ${G \times H}$-system. In this latter case we will call ${X}$ ${H}$-uniform.

If ${X}$ is a ${K}$-extension of ${Y}$ and ${U: Y \rightarrow K}$ is a measurable map, we can define the gauge transform ${X_U}$ of ${X}$ to be the ${K}$-extension of ${Y}$ whose measure ${\mu_U}$ is the pushforward of ${\mu}$ under the map ${(y,k) \mapsto (y, U(y) k)}$, and whose cocycles ${\rho_{g,U}: Y \rightarrow K}$ are given by the formula

$\displaystyle \rho_{g,U}(y) := U(S_g y) \rho_g(y) U(y)^{-1}.$
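This formula for the transformed cocycle is exactly what makes the gauge map ${(y,k) \mapsto (y, U(y) k)}$ intertwine the two actions. The brute-force Python check below is a sketch under hypothetical choices: ${G = {\bf Z}}$ acting on ${Y = {\bf Z}/5}$ by ${S y = y+1}$, ${K}$ the symmetric group on three letters, and arbitrarily chosen maps `rho` and `U`. It verifies the identity for the generator of ${G}$; for general ${g}$ it follows by composing.

```python
from itertools import permutations

M = 5  # hypothetical base system: Y = Z/5 with S(y) = y + 1; G = Z acts by powers of S
K = list(permutations(range(3)))  # K = S_3, each element stored as the tuple (k(0), k(1), k(2))


def compose(a, b):
    """Group law of S_3: (a b)(i) = a(b(i))."""
    return tuple(a[b[i]] for i in range(len(b)))


def inverse(a):
    inv = [0] * len(a)
    for i, ai in enumerate(a):
        inv[ai] = i
    return tuple(inv)


def S(y):
    return (y + 1) % M


# Arbitrary (hypothetical) choices of the cocycle rho = rho_1 and of the gauge U : Y -> K.
rho = {y: K[(2 * y + 1) % len(K)] for y in range(M)}
U = {y: K[(y * y) % len(K)] for y in range(M)}


def T(point):
    """T(y, k) = (S y, rho(y) k)."""
    y, k = point
    return (S(y), compose(rho[y], k))


def rho_U(y):
    """Gauge-transformed cocycle rho_U(y) = U(S y) rho(y) U(y)^{-1}."""
    return compose(compose(U[S(y)], rho[y]), inverse(U[y]))


def T_U(point):
    y, k = point
    return (S(y), compose(rho_U(y), k))


def Phi(point):
    """The gauge map (y, k) -> (y, U(y) k); it pushes mu forward to mu_U."""
    y, k = point
    return (y, compose(U[y], k))


def intertwines():
    """Check Phi(T(y, k)) == T_U(Phi(y, k)) at every point of Y x K."""
    return all(Phi(T((y, k))) == T_U(Phi((y, k))) for y in range(M) for k in K)


print(intertwines())  # prints True
```

Since ${k \mapsto U(y) k}$ is a bijection of ${K}$ for each ${y}$, the map `Phi` is invertible, which is why it gives an isomorphism of ${K}$-extensions rather than just a factor map.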

It is easy to see that ${X_U}$ is a ${K}$-extension that is isomorphic to ${X}$ as a ${K}$-extension of ${Y}$; we will refer to ${X_U}$ and ${X}$ as equivalent systems, and ${\rho_{g,U}}$ as cohomologous to ${\rho_g}$. We then have the following fundamental result of Mackey and of Zimmer:

Theorem 6 (Mackey-Zimmer theorem) Let ${G}$ be an arbitrary group, let ${Y}$ be an ergodic ${G}$-system, and let ${K}$ be a compact metrisable group. Then every ergodic ${K}$-extension ${X}$ of ${Y}$ is equivalent to an ${H}$-uniform extension of ${Y}$ for some closed subgroup ${H}$ of ${K}$.

This theorem is usually stated for amenable groups ${G}$, but by using Theorem 1 (or more precisely, Corollary 3) the result is in fact also valid for arbitrary groups; we give the proof below the fold. (In the usual formulations of the theorem, ${X}$ and ${Y}$ are also required to be Lebesgue spaces, or at least standard Borel, but again with our abstract approach here, such hypotheses will be unnecessary.) Among other things, this theorem plays an important role in the Furstenberg-Zimmer structural theory of measure-preserving systems (as well as subsequent refinements of this theory by Host and Kra); see this previous blog post for some relevant discussion. One can obtain similar descriptions of non-ergodic extensions via the ergodic decomposition, but the result becomes more complicated to state, and we will not do so here.

(This is an extended blog post version of my talk “Ultraproducts as a Bridge Between Discrete and Continuous Analysis” that I gave at the Simons Institute for the Theory of Computing at the workshop “Neo-Classical methods in discrete analysis”. Some of the material here is drawn from previous blog posts, notably “Ultraproducts as a bridge between hard analysis and soft analysis” and “Ultralimit analysis and quantitative algebraic geometry”. The text here has substantially more details than the talk; one may wish to skip all of the proofs given here to obtain a closer approximation to the original talk.)

Discrete analysis, of course, is primarily interested in the study of discrete (or “finitary”) mathematical objects: integers, rational numbers (which can be viewed as ratios of integers), finite sets, finite graphs, finite or discrete metric spaces, and so forth. However, many powerful tools in mathematics (e.g. ergodic theory, measure theory, topological group theory, algebraic geometry, spectral theory, etc.) work best when applied to continuous (or “infinitary”) mathematical objects: real or complex numbers, manifolds, algebraic varieties, continuous topological or metric spaces, etc. In order to apply results and ideas from continuous mathematics to discrete settings, there are basically two approaches. One is to directly discretise the arguments used in continuous mathematics, which often requires one to keep careful track of all the bounds on various quantities of interest, particularly with regard to various error terms arising from discretisation which would otherwise have been negligible in the continuous setting. The other is to construct continuous objects as limits of sequences of discrete objects of interest, so that results from continuous mathematics may be applied (often as a “black box”) to the continuous limit, which then can be used to deduce consequences for the original discrete objects which are quantitative (though often ineffectively so). The latter approach is the focus of this current talk.

The following table gives some examples of a discrete theory and its continuous counterpart, together with a limiting procedure that might be used to pass from the former to the latter:

| (Discrete) | (Continuous) | (Limit method) |
|---|---|---|
| Ramsey theory | Topological dynamics | Compactness |
| Density Ramsey theory | Ergodic theory | Furstenberg correspondence principle |
| Graph/hypergraph regularity | Measure theory | Graph limits |
| Polynomial regularity | Linear algebra | Ultralimits |
| Structural decompositions | Hilbert space geometry | Ultralimits |
| Fourier analysis | Spectral theory | Direct and inverse limits |
| Quantitative algebraic geometry | Algebraic geometry | Schemes |
| Discrete metric spaces | Continuous metric spaces | Gromov-Hausdorff limits |
| Approximate group theory | Topological group theory | Model theory |

As the above table illustrates, there are a variety of different ways to form a limiting continuous object. Roughly speaking, one can divide limits into three categories:

• Topological and metric limits. These notions of limits are commonly used by analysts. Here, one starts with a sequence (or perhaps a net) of objects ${x_n}$ in a common space ${X}$, which one then endows with the structure of a topological space or a metric space, by defining a notion of distance between two points of the space, or a notion of open neighbourhoods or open sets in the space. Provided that the sequence or net is convergent, this produces a limit object ${\lim_{n \rightarrow \infty} x_n}$, which remains in the same space, and is “close” to many of the original objects ${x_n}$ with respect to the given metric or topology.
• Categorical limits. These notions of limits are commonly used by algebraists. Here, one starts with a sequence (or more generally, a diagram) of objects ${x_n}$ in a category ${X}$, which are connected to each other by various morphisms. If the ambient category is well-behaved, one can then form the direct limit ${\varinjlim x_n}$ or the inverse limit ${\varprojlim x_n}$ of these objects, which is another object in the same category ${X}$, and is connected to the original objects ${x_n}$ by various morphisms.
• Logical limits. These notions of limits are commonly used by model theorists. Here, one starts with a sequence of objects ${x_{\bf n}}$ or of spaces ${X_{\bf n}}$, each of which is (a component of) a model for a given (first-order) mathematical language (e.g. if one is working in the language of groups, ${X_{\bf n}}$ might be groups and ${x_{\bf n}}$ might be elements of these groups). By using devices such as the ultraproduct construction, or the compactness theorem in logic, one can then create a new object ${\lim_{{\bf n} \rightarrow \alpha} x_{\bf n}}$ or a new space ${\prod_{{\bf n} \rightarrow \alpha} X_{\bf n}}$, which is still a model of the same language (e.g. if the spaces ${X_{\bf n}}$ were all groups, then the limiting space ${\prod_{{\bf n} \rightarrow \alpha} X_{\bf n}}$ will also be a group), and is “close” to the original objects or spaces in the sense that any assertion (in the given language) that is true for the limiting object or space will also be true for many of the original objects or spaces, and conversely. (For instance, if ${\prod_{{\bf n} \rightarrow \alpha} X_{\bf n}}$ is an abelian group, then the ${X_{\bf n}}$ will also be abelian groups for many ${{\bf n}}$.)

The purpose of this talk is to highlight the third type of limit, and specifically the ultraproduct construction, as being a “universal” limiting procedure that can be used to replace most of the limits previously mentioned. Unlike the topological or metric limits, one does not need the original objects ${x_{\bf n}}$ to all lie in a common space ${X}$ in order to form an ultralimit ${\lim_{{\bf n} \rightarrow \alpha} x_{\bf n}}$; they are permitted to lie in different spaces ${X_{\bf n}}$; this is more natural in many discrete contexts, e.g. when considering graphs on ${{\bf n}}$ vertices in the limit when ${{\bf n}}$ goes to infinity. Also, no convergence properties on the ${x_{\bf n}}$ are required in order for the ultralimit to exist. Similarly, ultraproduct limits differ from categorical limits in that no morphisms between the various spaces ${X_{\bf n}}$ involved are required in order to construct the ultraproduct.

With so few requirements on the objects ${x_{\bf n}}$ or spaces ${X_{\bf n}}$, the ultraproduct construction is necessarily a very “soft” one. Nevertheless, the construction has two properties which make it particularly useful for the purpose of extracting good continuous limit objects out of a sequence of discrete objects. First of all, there is Łoś’s theorem, which roughly speaking asserts that any first-order sentence which is asymptotically obeyed by the ${x_{\bf n}}$, will be exactly obeyed by the limit object ${\lim_{{\bf n} \rightarrow \alpha} x_{\bf n}}$; in particular, one can often take a discrete sequence of “partial counterexamples” to some assertion, and produce a continuous “complete counterexample” to that same assertion via an ultraproduct construction; taking the contrapositives, one can often then establish a rigorous equivalence between a quantitative discrete statement and its qualitative continuous counterpart. Secondly, there is the countable saturation property that ultraproducts automatically enjoy, which is a property closely analogous to that of compactness in topological spaces, and can often be used to ensure that the continuous objects produced by ultraproduct methods are “complete” or “compact” in various senses, which is particularly useful in being able to upgrade qualitative (or “pointwise”) bounds to quantitative (or “uniform”) bounds, more or less “for free”, thus reducing significantly the burden of “epsilon management” (although the price one pays for this is that one needs to pay attention to which mathematical objects of study are “standard” and which are “nonstandard”).
To achieve this compactness or completeness, one sometimes has to restrict to the “bounded” portion of the ultraproduct, and it is often also convenient to quotient out the “infinitesimal” portion in order to complement these compactness properties with a matching “Hausdorff” property, thus creating familiar examples of continuous spaces, such as locally compact Hausdorff spaces.

Ultraproducts are not the only logical limit in the model theorist’s toolbox, but they are one of the simplest to set up and use, and already suffice for many of the applications of logical limits outside of model theory. In this post, I will set out the basic theory of these ultraproducts, and illustrate how they can be used to pass between discrete and continuous theories in each of the examples listed in the above table.

Apart from the initial “one-time cost” of setting up the ultraproduct machinery, the main loss one incurs when using ultraproduct methods is that it becomes very difficult to extract explicit quantitative bounds from results that are proven by transferring qualitative continuous results to the discrete setting via ultraproducts. However, in many cases (particularly those involving regularity-type lemmas) the bounds are already of tower-exponential type or worse, and there is arguably not much to be lost by abandoning the explicit quantitative bounds altogether.

Tamar Ziegler and I have just uploaded to the arXiv our joint paper “A multi-dimensional Szemerédi theorem for the primes via a correspondence principle”. This paper is related to an earlier result of Ben Green and mine in which we established that the primes contain arbitrarily long arithmetic progressions. Actually, in that paper we proved a more general result:

Theorem 1 (Szemerédi’s theorem in the primes) Let ${A}$ be a subset of the primes ${{\mathcal P}}$ of positive relative density, thus ${\limsup_{N \rightarrow \infty} \frac{|A \cap [N]|}{|{\mathcal P} \cap [N]|} > 0}$. Then ${A}$ contains arbitrarily long arithmetic progressions.

This result was based in part on an earlier paper of Green that handled the case of progressions of length three. With the primes replaced by the integers, this is of course the famous theorem of Szemerédi.

Szemerédi’s theorem has now been generalised in many different directions. One of these is the multidimensional Szemerédi theorem of Furstenberg and Katznelson, who used ergodic-theoretic techniques to show that any dense subset of ${{\bf Z}^d}$ necessarily contained infinitely many constellations of any prescribed shape. Our main result is to relativise that theorem to the primes as well:

Theorem 2 (Multidimensional Szemerédi theorem in the primes) Let ${d \geq 1}$, and let ${A}$ be a subset of the ${d^{th}}$ Cartesian power ${{\mathcal P}^d}$ of the primes of positive relative density, thus

$\displaystyle \limsup_{N \rightarrow \infty} \frac{|A \cap [N]^d|}{|{\mathcal P}^d \cap [N]^d|} > 0.$

Then for any ${v_1,\ldots,v_k \in {\bf Z}^d}$, ${A}$ contains infinitely many “constellations” of the form ${a+r v_1, \ldots, a + rv_k}$ with ${a \in {\bf Z}^d}$ and ${r}$ a positive integer.
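The notion of a constellation can be made concrete by brute-force search in a finite box. In this Python sketch the box ${[N]^2}$ with ${N=30}$, the set ${A = {\mathcal P}^2 \cap [N]^2}$, and the right-angled shape ${v_1 = (0,0), v_2 = (1,0), v_3 = (0,1)}$ are all illustrative choices; a finite search of this kind only illustrates the definition and says nothing about the infinitude asserted by the theorem.

```python
from itertools import product


def primes_up_to(n):
    """Sieve of Eratosthenes: the set of primes p <= n (assumes n >= 1)."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for q in range(p * p, n + 1, p):
                is_prime[q] = False
    return {i for i, flag in enumerate(is_prime) if flag}


def constellations(A, vectors, N):
    """All pairs (a, r) with a in [N]^d and 1 <= r <= N such that a + r v lies in A for every v."""
    d = len(vectors[0])
    found = []
    for a in product(range(1, N + 1), repeat=d):
        for r in range(1, N + 1):
            points = (tuple(ai + r * vi for ai, vi in zip(a, v)) for v in vectors)
            if all(p in A for p in points):
                found.append((a, r))
    return found


N = 30
P = primes_up_to(N)
A = {(p, q) for p in P for q in P}  # A = P^2 intersected with [N]^2
corner = [(0, 0), (1, 0), (0, 1)]   # the shape (n, m), (n + r, m), (n, m + r)
corners = constellations(A, corner, N)
print(len(corners), corners[:5])
```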

In the case when ${A}$ is itself a Cartesian product of one-dimensional sets (in particular, if ${A}$ is all of ${{\mathcal P}^d}$), this result already follows from Theorem 1, but there does not seem to be a similarly easy argument to deduce the general case of Theorem 2 from previous results. Simultaneously with this paper, an independent proof of Theorem 2 using a somewhat different method has been established by Cook, Magyar, and Titichetrakun.

The result is reminiscent of an earlier result of mine on finding constellations in the Gaussian primes (or dense subsets thereof). That paper followed closely the arguments of my original paper with Ben Green, namely it first enclosed (a W-tricked version of) the primes or Gaussian primes (in a sieve-theoretic sense) by a slightly larger set (or more precisely, a weight function ${\nu}$) of almost primes or almost Gaussian primes, which one could then verify (using methods closely related to the sieve-theoretic methods in the ongoing Polymath8 project) to obey certain pseudorandomness conditions, known as the linear forms condition and the correlation condition. Very roughly speaking, these conditions assert statements of the following form: if ${n}$ is a randomly selected integer, then the events of ${n+h_1,\ldots,n+h_k}$ simultaneously being an almost prime (or almost Gaussian prime) are approximately independent for most choices of ${h_1,\ldots,h_k}$. Once these conditions are satisfied, one can then run a transference argument (initially based on ergodic-theory methods, but nowadays there are simpler transference results based on the Hahn-Banach theorem, due to Gowers and Reingold-Trevisan-Tulsiani-Vadhan) to obtain relative Szemerédi-type theorems from their absolute counterparts.

However, when one tries to adapt these arguments to sets such as ${{\mathcal P}^2}$, a new difficulty occurs: the natural analogue of the almost primes would be the Cartesian square ${{\mathcal A}^2}$ of the almost primes – pairs ${(n,m)}$ whose entries are both almost primes. (Actually, for technical reasons, one does not work directly with a set of almost primes, but would instead work with a weight function such as ${\nu(n) \nu(m)}$ that is concentrated on a set such as ${{\mathcal A}^2}$, but let me ignore this distinction for now.) However, this set ${{\mathcal A}^2}$ does not enjoy as many pseudorandomness conditions as one would need for a direct application of the transference strategy to work. More specifically, given any fixed ${h, k}$, and random ${(n,m)}$, the four events

$\displaystyle (n,m) \in {\mathcal A}^2$

$\displaystyle (n+h,m) \in {\mathcal A}^2$

$\displaystyle (n,m+k) \in {\mathcal A}^2$

$\displaystyle (n+h,m+k) \in {\mathcal A}^2$

do not behave independently (as they would if ${{\mathcal A}^2}$ were replaced for instance by the Gaussian almost primes), because any three of these events imply the fourth. This blocks the transference strategy for constellations which contain right angles (e.g. constellations of the form ${(n,m), (n+r,m), (n,m+r)}$), as such constellations soon turn into rectangles such as the one above after applying Cauchy-Schwarz a few times. (But a few years ago, Cook and Magyar showed that if one restricted attention to constellations which were in general position in the sense that any coordinate hyperplane contained at most one element in the constellation, then this obstruction does not occur and one can establish Theorem 2 in this case through the transference argument.) It’s worth noting that very recently, Conlon, Fox, and Zhao have succeeded in removing one of the pseudorandomness conditions (namely the correlation condition) from the transference principle, leaving only the linear forms condition as the remaining pseudorandomness condition to be verified, but unfortunately this does not completely solve the above problem because the linear forms condition also fails for ${{\mathcal A}^2}$ (or for weights concentrated on ${{\mathcal A}^2}$) when applied to rectangular patterns.
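The obstruction for ${{\mathcal A}^2}$ is purely combinatorial: it holds for the Cartesian square of any set. The Python sketch below checks it exhaustively over a small range, using as a crude, hypothetical proxy for the almost primes the integers with no proper divisor in ${[2,z)}$ for ${z=5}$ (this is not the sieve weight ${\nu}$ used in the actual argument).

```python
from itertools import product


def almost_primes(limit, z):
    """Hypothetical proxy: integers 2 <= n <= limit with no divisor d satisfying 2 <= d < min(z, n)."""
    return {n for n in range(2, limit + 1) if all(n % d for d in range(2, min(z, n)))}


def rectangle_events(S, n, m, h, k):
    """The four events (n,m), (n+h,m), (n,m+k), (n+h,m+k) in S x S, as booleans."""
    def in_square(x, y):
        return x in S and y in S
    return (in_square(n, m), in_square(n + h, m), in_square(n, m + k), in_square(n + h, m + k))


S = almost_primes(200, 5)
bad = 0  # number of rectangles in which exactly three of the four events hold
for n, m, h, k in product(range(1, 30), range(1, 30), range(1, 15), range(1, 15)):
    if sum(rectangle_events(S, n, m, h, k)) == 3:
        bad += 1
print(bad)  # prints 0: any three of the events force the fourth
```

The count is zero for any choice of the set, because three of the four events already place ${n}$, ${n+h}$, ${m}$ and ${m+k}$ in the set.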

There are now two ways known to get around this problem and establish Theorem 2 in full generality. The approach of Cook, Magyar, and Titichetrakun proceeds by starting with one of the known proofs of the multidimensional Szemerédi theorem – namely, the proof that proceeds through hypergraph regularity and hypergraph removal – and attaching pseudorandom weights directly within the proof itself, rather than trying to add the weights to the result of that proof through a transference argument. (A key technical issue is that weights have to be added to all the levels of the hypergraph – not just the vertices and top-order edges – in order to circumvent the failure of naive pseudorandomness.) As one has to modify the entire proof of the multidimensional Szemerédi theorem, rather than use that theorem as a black box, the Cook-Magyar-Titichetrakun argument is lengthier than ours; on the other hand, it is more general and does not rely on some difficult theorems about primes that are used in our paper.

In our approach, we continue to use the multidimensional Szemerédi theorem (or more precisely, the equivalent theorem of Furstenberg and Katznelson concerning multiple recurrence for commuting shifts) as a black box. The difference is that instead of using a transference principle to connect the relative multidimensional Szemerédi theorem we need to the multiple recurrence theorem, we instead proceed by a version of the Furstenberg correspondence principle, similar to the one that connects the absolute multidimensional Szemerédi theorem to the multiple recurrence theorem. I had discovered this approach many years ago in an unpublished note, but had abandoned it because it required an infinite number of linear forms conditions (in contrast to the transference technique, which only needed a finite number of linear forms conditions and (until the recent work of Conlon-Fox-Zhao) a correlation condition). The reason for this infinite number of conditions is that the correspondence principle has to build a probability measure on an entire ${\sigma}$-algebra; for this, it is not enough to specify the measure ${\mu(A)}$ of a single set such as ${A}$, but one also has to specify the measure ${\mu( T^{n_1} A \cap \ldots \cap T^{n_m} A)}$ of “cylinder sets” such as ${T^{n_1} A \cap \ldots \cap T^{n_m} A}$ where ${m}$ could be arbitrarily large. The larger ${m}$ gets, the more linear forms conditions one needs to keep the correspondence under control.

With the sieve weights ${\nu}$ we were using at the time, standard sieve theory methods could indeed provide a finite number of linear forms conditions, but not an infinite number, so my idea was abandoned. However, with my later work with Green and Ziegler on linear equations in primes (and related work on the Mobius-nilsequences conjecture and the inverse conjecture on the Gowers norm), Tamar and I realised that the primes themselves obey an infinite number of linear forms conditions, so one can basically use the primes (or a proxy for the primes, such as the von Mangoldt function ${\Lambda}$) as the enveloping sieve weight, rather than a classical sieve. Thus my old idea of using the Furstenberg correspondence principle to transfer Szemerédi-type theorems to the primes could actually be realised. In the one-dimensional case, this simply produces a much more complicated proof of Theorem 1 than the existing one; but it turns out that the argument works as well in higher dimensions and yields Theorem 2 relatively painlessly, except for the fact that it needs the results on linear equations in primes, the known proofs of which are extremely lengthy (and also require some of the transference machinery mentioned earlier). The problem of correlations in rectangles is avoided in the correspondence principle approach because one can compensate for such correlations by performing a suitable weighted limit to compute the measure ${\mu( T^{n_1} A \cap \ldots \cap T^{n_m} A)}$ of cylinder sets, with each ${m}$ requiring a different weighted correction. (This may be related to the Cook-Magyar-Titichetrakun strategy of weighting all of the facets of the hypergraph in order to recover pseudorandomness, although our contexts are rather different.)
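To make the normalisation concrete: an enveloping weight is expected to have average about ${1}$, and for the von Mangoldt function this is just the prime number theorem ${\sum_{n \leq N} \Lambda(n) = (1+o(1)) N}$. The sketch below (hypothetical helper names) only checks this first-moment normalisation numerically; the infinitely many linear forms conditions used in the argument are far deeper and are not tested here.

```python
import math


def von_mangoldt_table(N):
    """lam[n] = Lambda(n) for 0 <= n <= N: log p if n = p^k with k >= 1, else 0."""
    lam = [0.0] * (N + 1)
    is_prime = bytearray([1]) * (N + 1)
    for p in range(2, N + 1):
        if not is_prime[p]:
            continue
        for m in range(p * p, N + 1, p):   # sieve of Eratosthenes
            is_prime[m] = 0
        q = p
        while q <= N:                      # every prime power p^k <= N
            lam[q] = math.log(p)
            q *= p
    return lam


N = 10 ** 5
lam = von_mangoldt_table(N)
print(sum(lam[1:]) / N)   # psi(N)/N, close to 1 by the prime number theorem
```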

Vitaly Bergelson, Tamar Ziegler, and I have just uploaded to the arXiv our joint paper “Multiple recurrence and convergence results associated to ${{\bf F}_{p}^{\omega}}$-actions“. This paper is primarily concerned with limit formulae in the theory of multiple recurrence in ergodic theory. Perhaps the most basic formula of this type is the mean ergodic theorem, which (among other things) asserts that if ${(X,{\mathcal X}, \mu,T)}$ is a measure-preserving ${{\bf Z}}$-system (which, in this post, means that ${(X,{\mathcal X}, \mu)}$ is a probability space and ${T: X \rightarrow X}$ is measure-preserving and invertible, thus giving an action ${(T^n)_{n \in {\bf Z}}}$ of the integers), and ${f,g \in L^2(X,{\mathcal X}, \mu)}$ are functions, and ${X}$ is ergodic (which means that ${L^2(X,{\mathcal X}, \mu)}$ contains no ${T}$-invariant functions other than the constants (up to almost everywhere equivalence, of course)), then the average

$\displaystyle \frac{1}{N} \sum_{n=1}^N \int_X f(x) g(T^n x)\ d\mu \ \ \ \ \ (1)$

converges as ${N \rightarrow \infty}$ to the expression

$\displaystyle (\int_X f(x)\ d\mu) (\int_X g(x)\ d\mu);$

see e.g. this previous blog post. Informally, one can interpret this limit formula as an equidistribution result: if ${x}$ is drawn at random from ${X}$ (using the probability measure ${\mu}$), and ${n}$ is drawn at random from ${\{1,\ldots,N\}}$ for some large ${N}$, then the pair ${(x, T^n x)}$ becomes uniformly distributed in the product space ${X \times X}$ (using product measure ${\mu \times \mu}$) in the limit as ${N \rightarrow \infty}$.
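For a concrete instance, take the irrational circle rotation ${X = {\bf R}/{\bf Z}}$, ${Tx := x+\alpha}$, which is ergodic. With ${f = g = 1 + \cos(2\pi x)}$ one computes ${\int_X f(x) g(T^n x)\ d\mu = 1 + \frac{1}{2}\cos(2\pi n\alpha)}$, so the average (1) tends to ${1 = (\int_X f\ d\mu)(\int_X g\ d\mu)}$. A numerical sketch, in which the choice ${\alpha = \sqrt{2}-1}$ and the grid size are illustrative assumptions:

```python
import math

ALPHA = math.sqrt(2) - 1   # an irrational rotation number (illustrative choice)


def f(x):
    return 1 + math.cos(2 * math.pi * x)   # integral over [0, 1) is 1


g = f


def correlation(n, grid=64):
    """Grid sum for the integral of f(x) g(T^n x), T x = x + ALPHA mod 1.

    The integrand is a trigonometric polynomial of low degree, so a uniform
    grid of 64 points computes the integral exactly (up to rounding).
    """
    return sum(f(k / grid) * g((k / grid + n * ALPHA) % 1.0) for k in range(grid)) / grid


def ergodic_average(N):
    return sum(correlation(n) for n in range(1, N + 1)) / N


print(ergodic_average(1000))   # close to 1, the product of the two means
```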

If we allow ${(X,\mu)}$ to be non-ergodic, then we still have a limit formula, but it is a bit more complicated. Let ${{\mathcal X}^T}$ be the ${T}$-invariant measurable sets in ${{\mathcal X}}$; the ${{\bf Z}}$-system ${(X, {\mathcal X}^T, \mu, T)}$ can then be viewed as a factor of the original system ${(X, {\mathcal X}, \mu, T)}$, which is equivalent (in the sense of measure-preserving systems) to a trivial system ${(Z_0, {\mathcal Z}_0, \mu_{Z_0}, 1)}$ (known as the invariant factor) in which the shift is trivial. There is then a projection map ${\pi_0: X \rightarrow Z_0}$ to the invariant factor which is a factor map, and the average (1) converges in the limit to the expression

$\displaystyle \int_{Z_0} (\pi_0)_* f(z) (\pi_0)_* g(z)\ d\mu_{Z_0}(z), \ \ \ \ \ (2)$

where ${(\pi_0)_*: L^2(X,{\mathcal X},\mu) \rightarrow L^2(Z_0,{\mathcal Z}_0,\mu_{Z_0})}$ is the pushforward map associated to the map ${\pi_0: X \rightarrow Z_0}$; see e.g. this previous blog post. We can again interpret this as an equidistribution result. If ${(x,T^n x)}$ is a pair as before, then we no longer expect complete equidistribution in ${X \times X}$ in the non-ergodic case, because there are now non-trivial constraints relating ${x}$ with ${T^n x}$; indeed, for any ${T}$-invariant function ${f: X \rightarrow {\bf C}}$, we have the constraint ${f(x) = f(T^n x)}$; putting all these constraints together we see that ${\pi_0(x) = \pi_0(T^n x)}$ (for almost every ${x}$, at least). The limit (2) can be viewed as an assertion that this constraint ${\pi_0(x) = \pi_0(T^n x)}$ is in some sense the “only” constraint between ${x}$ and ${T^n x}$, and that the pair ${(x,T^n x)}$ is uniformly distributed relative to this constraint.
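A simple non-ergodic example: two disjoint copies of the circle, ${X = \{0,1\} \times {\bf R}/{\bf Z}}$, with ${\mu}$ the product of the uniform measure on ${\{0,1\}}$ and Lebesgue measure, and ${T(i,x) := (i, x+\alpha)}$ for irrational ${\alpha}$. Here the invariant factor can be identified with ${Z_0 = \{0,1\}}$, and ${(\pi_0)_* f(i) = \int_{{\bf R}/{\bf Z}} f(i,x)\ dx}$. For ${f = g = i + \cos(2\pi x)}$ the average (1) tends to ${1/2}$, in agreement with (2), rather than to the product of the means, which is ${1/4}$. A numerical sketch (the value of ${\alpha}$ and the grid are illustrative assumptions):

```python
import math

ALPHA = math.sqrt(2) - 1   # illustrative irrational rotation number
GRID = 64                  # grid sums are exact here up to rounding


def f(i, x):
    return i + math.cos(2 * math.pi * x)


def correlation(n):
    """Integral of f(p) f(T^n p) over X, with mu = (uniform on {0, 1}) x Lebesgue."""
    total = 0.0
    for i in (0, 1):
        for k in range(GRID):
            x = k / GRID
            total += f(i, x) * f(i, (x + n * ALPHA) % 1.0)
    return total / (2 * GRID)


def ergodic_average(N):
    return sum(correlation(n) for n in range(1, N + 1)) / N


def pushforward(i):
    """(pi_0)_* f(i): the average of f over the fibre {i} x R/Z."""
    return sum(f(i, k / GRID) for k in range(GRID)) / GRID


limit_formula = sum(pushforward(i) ** 2 for i in (0, 1)) / 2      # formula (2)
product_of_means = (sum(pushforward(i) for i in (0, 1)) / 2) ** 2
print(ergodic_average(1000), limit_formula, product_of_means)    # about 0.5, 0.5, 0.25
```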

Limit formulae are known for multiple ergodic averages as well, although the statement becomes more complicated. For instance, consider the expression

$\displaystyle \frac{1}{N} \sum_{n=1}^N \int_X f(x) g(T^n x) h(T^{2n} x)\ d\mu \ \ \ \ \ (3)$

for three functions ${f,g,h \in L^\infty(X, {\mathcal X}, \mu)}$; this is analogous to the combinatorial task of counting length three progressions in various sets. For simplicity we assume the system ${(X,{\mathcal X},\mu,T)}$ to be ergodic. Naively, one might expect this average to converge to

$\displaystyle (\int_X f\ d\mu) (\int_X g\ d\mu) (\int_X h\ d\mu)$

which would roughly speaking correspond to an assertion that the triplet ${(x,T^n x, T^{2n} x)}$ is asymptotically equidistributed in ${X \times X \times X}$. However, even in the ergodic case there can be additional constraints on this triplet that cannot be seen at the level of the individual pairs ${(x,T^n x)}$, ${(x, T^{2n} x)}$. The key obstruction here is that of eigenfunctions of the shift ${T: X \rightarrow X}$, that is to say non-trivial functions ${f: X \rightarrow S^1}$ that obey the eigenfunction equation ${Tf = \lambda f}$ almost everywhere for some constant (or ${T}$-invariant) ${\lambda}$. Each such eigenfunction generates a constraint

$\displaystyle f(x) \overline{f(T^n x)}^2 f(T^{2n} x) = 1 \ \ \ \ \ (4)$

tying together ${x}$, ${T^n x}$, and ${T^{2n} x}$. However, it turns out that these are in some sense the only constraints on ${x,T^n x, T^{2n} x}$ that are relevant for the limit (3). More precisely, if one sets ${{\mathcal X}_1}$ to be the sub-algebra of ${{\mathcal X}}$ generated by the eigenfunctions of ${T}$, then it turns out that the factor ${(X, {\mathcal X}_1, \mu, T)}$ is isomorphic to a shift system ${(Z_1, {\mathcal Z}_1, \mu_{Z_1}, x \mapsto x+\alpha)}$ known as the Kronecker factor, for some compact abelian group ${Z_1 = (Z_1,+)}$ and some (irrational) shift ${\alpha \in Z_1}$; the factor map ${\pi_1: X \rightarrow Z_1}$ pushes eigenfunctions forward to (affine) characters on ${Z_1}$. It is then known that the limit of (3) is

$\displaystyle \int_\Sigma (\pi_1)_* f(x_1) (\pi_1)_* g(x_2) (\pi_1)_* h(x_3)\ d\mu_\Sigma(x_1,x_2,x_3)$

where ${\Sigma \subset Z_1^3}$ is the closed subgroup

$\displaystyle \Sigma = \{ (x_1,x_2,x_3) \in Z_1^3: x_1-2x_2+x_3=0 \}$

and ${\mu_\Sigma}$ is the Haar probability measure on ${\Sigma}$; see this previous blog post. The equation ${x_1-2x_2+x_3=0}$ defining ${\Sigma}$ corresponds to the constraint (4) mentioned earlier. Among other things, this limit formula implies Roth’s theorem, which in the context of ergodic theory is the assertion that the limit (or at least the limit inferior) of (3) is positive when ${f=g=h}$ is non-negative and not identically vanishing.
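One can see the constraint (4) and the role of ${\Sigma}$ concretely for an irrational circle rotation ${Tx := x+\alpha}$ on ${{\bf R}/{\bf Z}}$, which is its own Kronecker factor (so that ${\pi_1}$ is the identity). With ${e(x) := e^{2\pi i x}}$ and ${f = h = e(x)}$, ${g = e(-2x)}$, every term of (3) equals ${1}$, so the limit is ${1}$, matching the integral over ${\Sigma}$ (parametrised as ${(x, x+t, x+2t)}$), even though ${\int f = \int g = \int h = 0}$. A sketch, with ${\alpha}$ and the grid as illustrative choices:

```python
import cmath
import math

ALPHA = math.sqrt(2) - 1   # illustrative irrational rotation number


def e(t):
    return cmath.exp(2j * math.pi * t)


def f(x):
    return e(x)        # linear eigenfunction: f(x + ALPHA) = e(ALPHA) f(x)


def g(x):
    return e(-2 * x)


def h(x):
    return e(x)


def constraint(x, n):
    """Left-hand side of (4) for the eigenfunction e(x) along x, T^n x, T^{2n} x."""
    u, v, w = e(x), e(x + n * ALPHA), e(x + 2 * n * ALPHA)
    return u * v.conjugate() ** 2 * w


def triple_average(N, grid=32):
    """The average (3) for f, g, h above, with the integral replaced by a grid sum."""
    total = 0
    for n in range(1, N + 1):
        for k in range(grid):
            x = k / grid
            total += f(x) * g(x + n * ALPHA) * h(x + 2 * n * ALPHA)
    return total / (N * grid)


def mean(F, grid=32):
    return sum(F(k / grid) for k in range(grid)) / grid


print(abs(triple_average(200)))     # close to 1
print(abs(mean(f)), abs(mean(g)))   # both close to 0
```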

If one considers a quadruple average

$\displaystyle \frac{1}{N} \sum_{n=1}^N \int_X f(x) g(T^n x) h(T^{2n} x) k(T^{3n} x)\ d\mu \ \ \ \ \ (5)$

(analogous to counting length four progressions) then the situation becomes more complicated still, even in the ergodic case. In addition to the (linear) eigenfunctions that already showed up in the computation of the triple average (3), a new type of constraint also arises from quadratic eigenfunctions ${f: X \rightarrow S^1}$, which obey an eigenfunction equation ${Tf = \lambda f}$ in which ${\lambda}$ is no longer constant, but is now a linear eigenfunction. For such functions, ${f(T^n x)}$ behaves quadratically in ${n}$, and one can verify the existence of a constraint

$\displaystyle f(x) \overline{f(T^n x)}^3 f(T^{2n} x)^3 \overline{f(T^{3n} x)} = 1 \ \ \ \ \ (6)$

between ${x}$, ${T^n x}$, ${T^{2n} x}$, and ${T^{3n} x}$ that is not detected at the triple average level. As it turns out, this is not the only type of constraint relevant for (5); there is a more general class of constraint involving two-step nilsystems which we will not detail here, but see e.g. this previous blog post for more discussion. Nevertheless there is still a similar limit formula to previous examples, involving a special factor ${(Z_2, {\mathcal Z}_2, \mu_{Z_2}, S)}$ which turns out to be an inverse limit of two-step nilsystems; this limit theorem can be extracted from the structural theory in this paper of Host and Kra combined with a limit formula for nilsystems obtained by Lesigne, but will not be reproduced here. The pattern continues to higher averages (and higher step nilsystems); this was first done explicitly by Ziegler, and can also in principle be extracted from the structural theory of Host-Kra combined with nilsystem equidistribution results of Leibman. These sorts of limit formulae can lead to various recurrence results refining Roth’s theorem in various ways; see this paper of Bergelson, Host, and Kra for some examples of this.
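A standard example of a quadratic eigenfunction (our choice of illustration, not taken from the works cited) lives on the skew shift ${T(x,y) := (x+\alpha, y+x)}$ on the 2-torus, for which ${T^n(x,y) = (x+n\alpha, y+nx+\binom{n}{2}\alpha)}$. The function ${f(x,y) := e(y)}$ obeys ${Tf = \lambda f}$ with ${\lambda(x,y) = e(x)}$, itself a linear eigenfunction, and (6) holds because third differences of a quadratic vanish. Consequently the average (5) with ${f = e(y)}$, ${g = e(-3y)}$, ${h = e(3y)}$, ${k = e(-y)}$ equals ${1}$, although each of these functions has mean zero. A numerical sketch under these assumptions:

```python
import cmath
import math

ALPHA = math.sqrt(2) - 1   # illustrative irrational rotation number


def e(t):
    return cmath.exp(2j * math.pi * t)


def T_power(n, x, y):
    """n-th iterate of the skew shift T(x, y) = (x + ALPHA, y + x) on the 2-torus."""
    return ((x + n * ALPHA) % 1.0, (y + n * x + n * (n - 1) // 2 * ALPHA) % 1.0)


def f(x, y):
    return e(y)   # quadratic eigenfunction: f(T(x, y)) = e(x) f(x, y)


def constraint_6(x, y, n):
    """Left-hand side of (6) for f = e(y) along x, T^n x, T^{2n} x, T^{3n} x."""
    v = [f(*T_power(j * n, x, y)) for j in range(4)]
    return v[0] * v[1].conjugate() ** 3 * v[2] ** 3 * v[3].conjugate()


def quadruple_average(N, grid=16):
    """The average (5) for e(y), e(-3y), e(3y), e(-y), with a grid sum for the integral."""
    coeffs = (1, -3, 3, -1)
    total = 0
    for n in range(1, N + 1):
        for a in range(grid):
            for b in range(grid):
                term = 1
                for j, c in enumerate(coeffs):
                    _, y = T_power(j * n, a / grid, b / grid)
                    term *= e(c * y)
                total += term
    return total / (N * grid * grid)


print(abs(constraint_6(0.2, 0.7, 9)), abs(quadruple_average(40)))   # both close to 1
```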

The above discussion was concerned with ${{\bf Z}}$-systems, but one can adapt much of the theory to measure-preserving ${G}$-systems for other discrete countable abelian groups ${G}$, in which one now has a family ${(T_g)_{g \in G}}$ of shifts indexed by ${G}$ rather than a single shift, obeying the compatibility relation ${T_{g+h}=T_g T_h}$. The role of the intervals ${\{1,\ldots,N\}}$ in this more general setting is replaced by that of Folner sequences. For arbitrary countable abelian ${G}$, the theory for double averages (1) and triple averages (3) is essentially identical to the ${{\bf Z}}$-system case. But when one turns to quadruple and higher limits, the situation becomes more complicated (and, for arbitrary ${G}$, still not fully understood). However one model case which is now well understood is the finite field case when ${G = {\bf F}_p^\omega = \bigcup_{n=1}^\infty {\bf F}_p^n}$ is an infinite-dimensional vector space over a finite field ${{\bf F}_p}$ (with the finite subspaces ${{\bf F}_p^n}$ then being a good choice for the Folner sequence). Here, the analogue of the structural theory of Host and Kra was worked out by Vitaly, Tamar, and myself in these previous papers (treating the high characteristic and low characteristic cases respectively). In the finite field setting, it turns out that nilsystems no longer appear, and one only needs to deal with linear, quadratic, and higher order eigenfunctions (known collectively as phase polynomials). It is then natural to look for a limit formula that asserts, roughly speaking, that if ${x}$ is drawn at random from a ${{\bf F}_p^\omega}$-system and ${n}$ drawn randomly from a large subspace of ${{\bf F}_p^\omega}$, then the only constraints between ${x, T^n x, \ldots, T^{(p-1)n} x}$ are those that arise from phase polynomials. The main result of this paper establishes this limit formula (which, again, is a little complicated to state explicitly and will not be done here). 
In particular, we establish for the first time that the limit actually exists (a result which, for ${{\bf Z}}$-systems, was one of the main results of this paper of Host and Kra).

As a consequence, we can recover finite field analogues of most of the results of Bergelson-Host-Kra, though interestingly some of the counterexamples demonstrating sharpness of their results for ${{\bf Z}}$-systems (based on Behrend set constructions) do not seem to be present in the finite field setting (cf. this previous blog post on the cap set problem). In particular, we are able to largely settle the question of when one has a Khintchine-type theorem that asserts that for any measurable set ${A}$ in an ergodic ${{\bf F}_p^\omega}$-system and any ${\epsilon>0}$, one has

$\displaystyle \mu( T_{c_1 n} A \cap \ldots \cap T_{c_k n} A ) > \mu(A)^k - \epsilon$

for a syndetic set of ${n}$, where ${c_1,\ldots,c_k \in {\bf F}_p}$ are distinct residue classes. It turns out that Khintchine-type theorems always hold for ${k=1,2,3}$ (and for ${k=1,2}$ ergodicity is not required), and for ${k=4}$ it holds whenever ${c_1,c_2,c_3,c_4}$ form a parallelogram, but not otherwise (though the counterexample here was such a painful computation that we ended up removing it from the paper, and may end up putting it online somewhere instead), and for larger ${k}$ we could show that the Khintchine property failed for generic choices of ${c_1,\ldots,c_k}$, though the problem of determining exactly the tuples for which the Khintchine property failed looked to be rather messy and we did not completely settle it.

Among the basic objects of study in combinatorics are finite strings ${(a_n)_{n=0}^N}$ or infinite strings ${(a_n)_{n=0}^\infty}$ of symbols ${a_n}$ from some given alphabet ${{\mathcal A}}$, which could be either finite or infinite (but which we shall usually take to be compact). For instance, a set ${A}$ of natural numbers can be identified with the infinite string ${(1_A(n))_{n=0}^\infty}$ of ${0}$s and ${1}$s formed by the indicator of ${A}$, e.g. the even numbers can be identified with the string ${1010101\ldots}$ from the alphabet ${\{0,1\}}$, the multiples of three can be identified with the string ${100100100\ldots}$, and so forth. One can also consider doubly infinite strings ${(a_n)_{n \in {\bf Z}}}$, which among other things can be used to describe arbitrary subsets of integers.

On the other hand, the basic object of study in dynamics (and in related fields, such as ergodic theory) is that of a dynamical system ${(X,T)}$, that is to say a space ${X}$ together with a shift map ${T: X \rightarrow X}$ (which is often assumed to be invertible, although one can certainly study non-invertible dynamical systems as well). One often adds additional structure to this dynamical system, such as topological structure (giving rise to topological dynamics), measure-theoretic structure (giving rise to ergodic theory), complex structure (giving rise to complex dynamics), and so forth. A dynamical system gives rise to an action of the natural numbers ${{\bf N}}$ on the space ${X}$ by using the iterates ${T^n: X \rightarrow X}$ of ${T}$ for ${n=0,1,2,\ldots}$; if ${T}$ is invertible, we can extend this action to an action of the integers ${{\bf Z}}$ on the same space. One can certainly also consider dynamical systems whose underlying group (or semi-group) is something other than ${{\bf N}}$ or ${{\bf Z}}$ (e.g. one can consider continuous dynamical systems in which the evolution group is ${{\bf R}}$), but we will restrict attention to the classical situation of ${{\bf N}}$ or ${{\bf Z}}$ actions here.

There is a fundamental correspondence principle connecting the study of strings (or subsets of natural numbers or integers) with the study of dynamical systems. In one direction, given a dynamical system ${(X,T)}$, an observable ${c: X \rightarrow {\mathcal A}}$ taking values in some alphabet ${{\mathcal A}}$, and some initial datum ${x_0 \in X}$, we can first form the forward orbit ${(T^n x_0)_{n=0}^\infty}$ of ${x_0}$, and then observe this orbit using ${c}$ to obtain an infinite string ${(c(T^n x_0))_{n=0}^\infty}$. If the shift ${T}$ in this system is invertible, one can extend this infinite string into a doubly infinite string ${(c(T^n x_0))_{n \in {\bf Z}}}$. Thus we see that every quadruplet ${(X,T,c,x_0)}$ consisting of a dynamical system ${(X,T)}$, an observable ${c}$, and an initial datum ${x_0}$ creates an infinite string.

Example 1 If ${X}$ is the three-element set ${X = {\bf Z}/3{\bf Z}}$ with the shift map ${Tx := x+1}$, ${c: {\bf Z}/3{\bf Z} \rightarrow \{0,1\}}$ is the observable that takes the value ${1}$ at the residue class ${0 \hbox{ mod } 3}$ and zero at the other two classes, and one starts with the initial datum ${x_0 = 0 \hbox{ mod } 3}$, then the observed string ${(c(T^n x_0))_{n=0}^\infty}$ becomes the indicator ${100100100\ldots}$ of the multiples of three.
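The passage from a quadruplet ${(X,T,c,x_0)}$ to a string is just iteration followed by observation. A minimal sketch (the function names are our own, hypothetical choices) that reproduces Example 1:

```python
def observe(T, c, x0, N):
    """First N symbols of the string (c(T^n x0))_{n>=0} generated by (X, T, c, x0)."""
    out, x = [], x0
    for _ in range(N):
        out.append(c(x))
        x = T(x)
    return out


def shift_mod3(x):
    """T x = x + 1 on X = Z/3Z."""
    return (x + 1) % 3


def indicator_of_zero(x):
    """The observable c: 1 on the class 0 mod 3, 0 on the other two classes."""
    return 1 if x == 0 else 0


print("".join(map(str, observe(shift_mod3, indicator_of_zero, 0, 9))))   # 100100100
```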

In the converse direction, every infinite string ${(a_n)_{n=0}^\infty}$ in some alphabet ${{\mathcal A}}$ arises (in a decidedly non-unique fashion) from a quadruple ${(X,T,c,x_0)}$ in the above fashion. This can be easily seen by the following “universal” construction: take ${X}$ to be the set ${X:= {\mathcal A}^{\bf N}}$ of infinite strings ${(b_n)_{n=0}^\infty}$ in the alphabet ${{\mathcal A}}$, let ${T: X \rightarrow X}$ be the shift map

$\displaystyle T(b_n)_{n=0}^\infty := (b_{n+1})_{n=0}^\infty,$

let ${c: X \rightarrow {\mathcal A}}$ be the observable

$\displaystyle c((b_n)_{n=0}^\infty) := b_0,$

and let ${x_0 \in X}$ be the initial point

$\displaystyle x_0 := (a_n)_{n=0}^\infty.$

Then one easily sees that the observed string ${(c(T^n x_0))_{n=0}^\infty}$ is nothing more than the original string ${(a_n)_{n=0}^\infty}$. Note also that this construction can easily be adapted to doubly infinite strings by using ${{\mathcal A}^{\bf Z}}$ instead of ${{\mathcal A}^{\bf N}}$, at which point the shift map ${T}$ now becomes invertible. An important variant of this construction also attaches an invariant probability measure to ${X}$ that is associated to the limiting density of various sets associated to the string ${(a_n)_{n=0}^\infty}$, and leads to the Furstenberg correspondence principle, discussed for instance in these previous blog posts. Such principles allow one to rigorously pass back and forth between the combinatorics of strings and the dynamics of systems; for instance, Furstenberg famously used his correspondence principle to demonstrate the equivalence of Szemerédi’s theorem on arithmetic progressions with what is now known as the Furstenberg multiple recurrence theorem in ergodic theory.
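The universal construction can be mimicked in code by representing an infinite string lazily as a function ${n \mapsto b_n}$; then ${T}$ is re-indexing and ${c}$ reads off the zeroth symbol. This is a sketch under that representation choice (all names are hypothetical), checking on a finite prefix that the observed string reproduces the original one:

```python
import math


def shift(b):
    """T: (b_n)_{n>=0} -> (b_{n+1})_{n>=0}, with strings represented as functions n -> b_n."""
    return lambda n: b(n + 1)


def first_symbol(b):
    """The observable c((b_n)_{n>=0}) = b_0."""
    return b(0)


def observe(T, c, x0, N):
    out, x = [], x0
    for _ in range(N):
        out.append(c(x))
        x = T(x)
    return out


def a(n):
    """Example string: the indicator of the perfect squares."""
    return 1 if math.isqrt(n) ** 2 == n else 0


x0 = a   # the initial point is the string itself
print(observe(shift, first_symbol, x0, 12))   # [1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
```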

In the case when the alphabet ${{\mathcal A}}$ is the binary alphabet ${\{0,1\}}$, and (for technical reasons related to the infamous non-injectivity ${0.999\ldots = 1.00\ldots}$ of the decimal representation system) the string ${(a_n)_{n=0}^\infty}$ does not end with an infinite string of ${1}$s, then one can reformulate the above universal construction by taking ${X}$ to be the interval ${[0,1)}$, ${T}$ to be the doubling map ${Tx := 2x \hbox{ mod } 1}$, ${c: X \rightarrow \{0,1\}}$ to be the observable that takes the value ${1}$ on ${[1/2,1)}$ and ${0}$ on ${[0,1/2)}$ (that is, ${c(x)}$ is the first binary digit of ${x}$), and ${x_0}$ is the real number ${x_0 := \sum_{n=0}^\infty a_n 2^{-n-1}}$ (that is, ${x_0 = 0.a_0a_1\ldots}$ in binary).
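With exact rational arithmetic one can check this model on the multiples-of-three string ${100100100\ldots}$: the formula for ${x_0}$ is then a geometric series summing to ${4/7}$, whose orbit under the doubling map has period three. A sketch (helper names are hypothetical; rationals are used because a floating-point orbit of the doubling map collapses to ${0}$ after a few dozen steps):

```python
from fractions import Fraction


def doubling(x):
    """T x = 2x mod 1 on [0, 1), in exact rational arithmetic."""
    return (2 * x) % 1


def first_binary_digit(x):
    """c(x) = 1 on [1/2, 1) and 0 on [0, 1/2)."""
    return 1 if x >= Fraction(1, 2) else 0


def observe(T, c, x0, N):
    out, x = [], x0
    for _ in range(N):
        out.append(c(x))
        x = T(x)
    return out


# x0 = sum_n a_n 2^{-n-1} for a = 100100100...: a geometric series with ratio 1/8
x0 = Fraction(1, 2) / (1 - Fraction(1, 8))
print(x0)                                            # 4/7
print(observe(doubling, first_binary_digit, x0, 9))  # [1, 0, 0, 1, 0, 0, 1, 0, 0]
```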

The above universal construction is very easy to describe, and is well suited for “generic” strings ${(a_n)_{n=0}^\infty}$ that have no further obvious structure to them, but it often leads to dynamical systems that are much larger and more complicated than is actually needed to produce the desired string ${(a_n)_{n=0}^\infty}$, and also often obscures some of the key dynamical features associated to that sequence. For instance, to generate the indicator ${100100100\ldots}$ of the multiples of three that were mentioned previously, the above universal construction requires an uncountable space ${X}$ and a dynamics which does not obviously reflect the key features of the sequence such as its periodicity. (Using the unit interval model, the dynamics arise from the orbit of ${4/7}$ under the doubling map, which is a rather artificial way to describe the indicator function of the multiples of three.)

A related aesthetic objection to the universal construction is that of the four components ${X,T,c,x_0}$ of the quadruplet ${(X,T,c,x_0)}$ used to generate the sequence ${(a_n)_{n=0}^\infty}$, three of the components ${X,T,c}$ are completely universal (in that they do not depend at all on the sequence ${(a_n)_{n=0}^\infty}$), leaving only the initial datum ${x_0}$ to carry all the distinctive features of the original sequence. While there is nothing wrong with this mathematically, from a conceptual point of view it would make sense for all four components of the quadruplet to be adapted to the sequence, in order to take advantage of the accumulated intuition about various special dynamical systems (and special observables), not just special initial data.

One step in this direction can be made by restricting ${X}$ to the orbit ${\{ T^n x_0: n \in {\bf N} \}}$ of the initial datum ${x_0}$ (actually for technical reasons it is better to restrict to the topological closure ${\overline{\{ T^n x_0: n \in {\bf N} \}}}$ of this orbit, in order to keep ${X}$ compact). For instance, starting with the sequence ${100100100\ldots}$, the orbit now consists of just three points ${100100100\ldots}$, ${010010010\ldots}$, ${001001001\ldots}$, bringing the system more in line with Example 1. Technically, this is the “optimal” representation of the sequence by a quadruplet ${(X,T,c,x_0)}$, because it is a factor of any other such representation ${(X',T',c',x'_0)}$ (at least when ${X'}$ is the orbit closure of ${x'_0}$ and ${c'}$ is continuous), in the sense that there is a unique map ${\pi: X' \rightarrow X}$ with ${T \circ \pi = \pi \circ T'}$, ${c \circ \pi = c'}$, and ${x_0 = \pi(x'_0)}$, namely ${\pi(x') := (c'(T'^n x'))_{n=0}^\infty}$. However, from a conceptual point of view this representation is still somewhat unsatisfactory, given that the elements of the system ${X}$ are interpreted as infinite strings rather than elements of a more geometrically or algebraically rich object (e.g. points in a circle, torus, or other homogeneous space).

For general sequences ${(a_n)_{n=0}^\infty}$, locating relevant geometric or algebraic structure in a dynamical system generating that sequence is an important but very difficult task (see e.g. this paper of Host and Kra, which is more or less devoted to precisely this task in the context of working out what component of a dynamical system controls the multiple recurrence behaviour of that system). However, for specific examples of sequences ${(a_n)_{n=0}^\infty}$, one can use an informal procedure of educated guesswork in order to produce a more natural-looking quadruple ${(X,T,c,x_0)}$ that generates that sequence. This is not a particularly difficult or deep operation, but I found it very helpful in internalising the intuition behind the correspondence principle. Being non-rigorous, this procedure does not seem to be emphasised in most presentations of the correspondence principle, so I thought I would describe it here.

I’ve just uploaded to the arXiv my joint paper with Vitaly Bergelson, “Multiple recurrence in quasirandom groups“, which is submitted to Geom. Func. Anal. This paper builds upon a paper of Gowers in which he introduced the concept of a quasirandom group, and established some mixing (or recurrence) properties of such groups. A ${D}$-quasirandom group is a finite group with no non-trivial unitary representations of dimension at most ${D}$. We will informally refer to a “quasirandom group” as a ${D}$-quasirandom group with the quasirandomness parameter ${D}$ large (more formally, one can work with a sequence of ${D_n}$-quasirandom groups with ${D_n}$ going to infinity). A typical example of a quasirandom group is ${SL_2(F_p)}$ where ${p}$ is a large prime. Quasirandom groups are discussed in depth in this blog post. One of the key properties of quasirandom groups established in Gowers’ paper is the following “weak mixing” property: if ${A, B}$ are subsets of ${G}$, then for “almost all” ${g \in G}$, one has

$\displaystyle \mu( A \cap gB ) \approx \mu(A) \mu(B) \ \ \ \ \ (1)$

where ${\mu(A) := |A|/|G|}$ denotes the density of ${A}$ in ${G}$. Here, we use ${x \approx y}$ to informally represent an estimate of the form ${x=y+o(1)}$ (where ${o(1)}$ is a quantity that goes to zero when the quasirandomness parameter ${D}$ goes to infinity), and “almost all ${g \in G}$” denotes “for all ${g}$ in a subset of ${G}$ of density ${1-o(1)}$“. As a corollary, if ${A,B,C}$ have positive density in ${G}$ (by which we mean that ${\mu(A)}$ is bounded away from zero, uniformly in the quasirandomness parameter ${D}$, and similarly for ${B,C}$), then (if the quasirandomness parameter ${D}$ is sufficiently large) we can find elements ${g, x \in G}$ such that ${g \in A}$, ${x \in B}$, ${gx \in C}$. In fact we can find approximately ${\mu(A)\mu(B)\mu(C) |G|^2}$ such pairs ${(g,x)}$. To put it another way: if we choose ${g,x}$ uniformly and independently at random from ${G}$, then the events ${g \in A}$, ${x \in B}$, ${gx \in C}$ are approximately independent (thus the random variable ${(g,x,gx) \in G^3}$ resembles a uniformly distributed random variable on ${G^3}$ in some weak sense). One can also express this mixing property in integral form as

$\displaystyle \int_G \int_G f_1(g) f_2(x) f_3(gx)\ d\mu(g) d\mu(x) \approx (\int_G f_1\ d\mu) (\int_G f_2\ d\mu) (\int_G f_3\ d\mu)$

for any bounded functions ${f_1,f_2,f_3: G \rightarrow {\bf R}}$. (Of course, with ${G}$ being finite, one could replace the integrals here by finite averages if desired.) Or in probabilistic language, we have

$\displaystyle \mathop{\bf E} f_1(g) f_2(x) f_3(gx) \approx \mathop{\bf E} f_1(x_1) f_2(x_2) f_3(x_3)$

where ${g, x, x_1, x_2, x_3}$ are drawn uniformly and independently at random from ${G}$.

As observed in Gowers’ paper, one can iterate this observation to find “parallelopipeds” of any given dimension in dense subsets of ${G}$. For instance, applying (1) with ${A,B,C}$ replaced by ${A \cap hB}$, ${C \cap hD}$, and ${E \cap hF}$ one can assert (after some relabeling) that for ${g,h,x}$ chosen uniformly and independently at random from ${G}$, the events ${g \in A}$, ${h \in B}$, ${gh \in C}$, ${x \in D}$, ${gx \in E}$, ${hx \in F}$, ${ghx \in H}$ are approximately independent whenever ${A,B,C,D,E,F,H}$ are dense subsets of ${G}$; thus the tuple ${(g,h,gh,x,gx,hx,ghx)}$ resembles a uniformly distributed random variable in ${G^7}$ in some weak sense.

However, there are other tuples for which the above iteration argument does not seem to apply. One of the simplest tuples in this vein is the tuple ${(g, x, xg, gx)}$ in ${G^4}$, when ${g, x}$ are drawn uniformly at random from a quasirandom group ${G}$. Here, one does not expect the tuple to behave as if it were uniformly distributed in ${G^4}$, because there is an obvious constraint connecting the last two components ${gx, xg}$ of this tuple: they must lie in the same conjugacy class! In particular, if ${A}$ is a subset of ${G}$ that is the union of conjugacy classes, then the events ${gx \in A}$, ${xg \in A}$ are perfectly correlated, so that ${\mu( gx \in A, xg \in A)}$ is equal to ${\mu(A)}$ rather than ${\mu(A)^2}$. Our main result, though, is that in a quasirandom group, this is (approximately) the only constraint on the tuple. More precisely, we have
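The conjugacy constraint is easy to see numerically in ${SL_2(F_p)}$ for a small prime. Taking ${A}$ to be the matrices of trace zero (a union of conjugacy classes, since trace is conjugation-invariant; in any case ${\hbox{tr}(gx) = \hbox{tr}(xg)}$), the joint probability of ${gx \in A}$ and ${xg \in A}$ equals ${\mu(A)}$ rather than ${\mu(A)^2}$. A brute-force sketch (the prime ${p=5}$ and all names are illustrative choices; such a small group is of course far from strongly quasirandom):

```python
from fractions import Fraction
from itertools import product

P = 5   # a small prime; genuine quasirandomness needs P large


def sl2(p):
    """All 2x2 matrices (a, b, c, d) over F_p with determinant ad - bc = 1."""
    return [(a, b, c, d) for a, b, c, d in product(range(p), repeat=4)
            if (a * d - b * c) % p == 1]


def mul(m, n):
    a, b, c, d = m
    e, f, g, h = n
    return ((a * e + b * g) % P, (a * f + b * h) % P,
            (c * e + d * g) % P, (c * f + d * h) % P)


def in_A(m):
    return (m[0] + m[3]) % P == 0   # trace zero: a union of conjugacy classes


G = sl2(P)
mu_A = Fraction(sum(map(in_A, G)), len(G))
both = sum(1 for g in G for x in G if in_A(mul(g, x)) and in_A(mul(x, g)))
joint = Fraction(both, len(G) ** 2)
print(len(G), mu_A, joint, mu_A ** 2)   # 120 1/4 1/4 1/16
```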

Theorem 1 Let ${G}$ be a ${D}$-quasirandom group, and let ${g, x}$ be drawn uniformly and independently at random from ${G}$. Then for any ${f_1,f_2,f_3,f_4: G \rightarrow [-1,1]}$, we have

$\displaystyle \mathop{\bf E} f_1(g) f_2(x) f_3(gx) f_4(xg) = \mathop{\bf E} f_1(x_1) f_2(x_2) f_3(x_3) f_4(x_4) + o(1)$

where ${o(1)}$ goes to zero as ${D \rightarrow \infty}$, ${x_1,x_2,x_3}$ are drawn uniformly and independently at random from ${G}$, and ${x_4}$ is drawn uniformly at random from the conjugates of ${x_3}$ for each fixed choice of ${x_1,x_2,x_3}$.

This is the probabilistic formulation of the above theorem; one can also phrase the theorem in other formulations (such as an integral formulation), and this is detailed in the paper. This theorem leads to a number of recurrence results; for instance, as a corollary of this result, we have

$\displaystyle \mu(A) \mu(B)^2 - o(1) \leq \mu( A \cap gB \cap Bg ) \leq \mu(A) \mu(B) + o(1)$

for almost all ${g \in G}$, and any dense subsets ${A, B}$ of ${G}$; the lower and upper bounds are sharp, with the lower bound being attained when ${B}$ is randomly distributed, and the upper bound when ${B}$ is conjugation-invariant.
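The upper bound case can be checked exactly in a small example: if ${B}$ is conjugation-invariant then ${gBg^{-1} = B}$, so ${gB = Bg}$ and ${\mu(A \cap gB \cap Bg) = \mu(A \cap gB)}$, whose average over ${g}$ is exactly ${\mu(A)\mu(B)}$. A brute-force sketch in ${SL_2(F_5)}$, with ${B}$ the trace-zero matrices and ${A}$ the lower triangular ones (illustrative choices, hypothetical names); it checks the average over ${g}$, not the "almost all ${g}$" statement itself:

```python
from fractions import Fraction
from itertools import product

P = 5


def mul(m, n):
    a, b, c, d = m
    e, f, g, h = n
    return ((a * e + b * g) % P, (a * f + b * h) % P,
            (c * e + d * g) % P, (c * f + d * h) % P)


G = [m for m in product(range(P), repeat=4) if (m[0] * m[3] - m[1] * m[2]) % P == 1]
A = {m for m in G if m[1] == 0}                # lower triangular matrices
B = {m for m in G if (m[0] + m[3]) % P == 0}   # trace zero: conjugation-invariant

total = 0
for g in G:
    gB = {mul(g, b) for b in B}
    Bg = {mul(b, g) for b in B}
    assert gB == Bg                            # because g B g^{-1} = B
    total += len(A & gB & Bg)

average = Fraction(total, len(G) ** 2)         # mean over g of mu(A cap gB cap Bg)
print(average, Fraction(len(A), len(G)) * Fraction(len(B), len(G)))   # 1/24 1/24
```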

To me, the more interesting thing here is not the result itself, but how it is proven. Vitaly and I were not able to find a purely finitary way to establish this mixing theorem. Instead, we had to first use the machinery of ultraproducts (as discussed in this previous post) to convert the finitary statement about a quasirandom group to an infinitary statement about a type of infinite group which we call an ultra quasirandom group (basically, an ultraproduct of increasingly quasirandom finite groups). This is analogous to how the Furstenberg correspondence principle is used to convert a finitary combinatorial problem into an infinitary ergodic theory problem.

Ultra quasirandom groups come equipped with a finite, countably additive measure known as Loeb measure ${\mu_G}$, which is very analogous to the Haar measure of a compact group, except that in the case of ultra quasirandom groups one does not quite have a topological structure that would give compactness. Instead, one has a slightly weaker structure known as a ${\sigma}$-topology, which is like a topology except that open sets are only closed under countable unions rather than arbitrary ones. There are some interesting measure-theoretic and topological issues regarding the distinction between topologies and ${\sigma}$-topologies (and between Haar measure and Loeb measure), but for this post it is perhaps best to gloss over these issues and pretend that ultra quasirandom groups ${G}$ come with a Haar measure. One can then recast Theorem 1 as a mixing theorem for the left and right actions of the ultra quasirandom group ${G}$ on itself, which roughly speaking is the assertion that

$\displaystyle \int_G f_1(x) L_g f_2(x) L_g R_g f_3(x)\ d\mu_G(x) \approx 0 \ \ \ \ \ (2)$

for “almost all” ${g \in G}$, if ${f_1, f_2, f_3}$ are bounded measurable functions on ${G}$, with ${f_3}$ having zero mean on all conjugacy classes of ${G}$, where ${L_g, R_g}$ are the left and right translation operators

$\displaystyle L_g f(x) := f(g^{-1} x); \quad R_g f(x) := f(xg).$
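The operators ${L_g}$ and ${R_g}$ are both actions of ${G}$ (${L_{gh} = L_g L_h}$ and ${R_{gh} = R_g R_h}$), and they commute with each other, which is what makes ${(L_g, R_g)}$ a pair of commuting shifts. A sketch verifying these identities on the small non-abelian group ${S_3}$, with functions stored as dictionaries (illustrative choices, hypothetical names):

```python
import random
from itertools import permutations

G = list(permutations(range(3)))   # the symmetric group S_3


def mul(p, q):
    """Composition (p q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))


def inv(p):
    out = [0] * len(p)
    for i, image in enumerate(p):
        out[image] = i
    return tuple(out)


def L(g, f):
    """(L_g f)(x) = f(g^{-1} x)."""
    return {x: f[mul(inv(g), x)] for x in G}


def R(g, f):
    """(R_g f)(x) = f(x g)."""
    return {x: f[mul(x, g)] for x in G}


def check_identities(f):
    for g in G:
        for h in G:
            assert L(mul(g, h), f) == L(g, L(h, f))   # left action
            assert R(mul(g, h), f) == R(g, R(h, f))   # right action
            assert L(g, R(h, f)) == R(h, L(g, f))     # the two actions commute
    return True


rng = random.Random(0)
f = {x: rng.random() for x in G}
print(check_identities(f))   # True
```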

To establish this mixing theorem, we use the machinery of idempotent ultrafilters, which is a particularly useful tool for understanding the ergodic theory of actions of countable groups ${G}$ that need not be amenable; in the non-amenable setting the classical ergodic averages do not make much sense, but ultrafilter-based averages are still available. To oversimplify substantially, the idempotent ultrafilter arguments let one establish mixing estimates of the form (2) for “many” elements ${g}$ of an infinite-dimensional parallelopiped known as an IP system (provided that the actions ${L_g,R_g}$ of this IP system obey some technical mixing hypotheses, but let’s ignore that for sake of this discussion). The claim then follows by using the quasirandomness hypothesis to show that if the estimate (2) failed for a large set of ${g \in G}$, then this large set would then contain an IP system, contradicting the previous claim.

Idempotent ultrafilters are an extremely infinitary type of mathematical object (one has to use Zorn’s lemma no fewer than three times just to construct one of these objects!). So it is quite remarkable that they can be used to establish a finitary theorem such as Theorem 1, though as is often the case with such infinitary arguments, one gets absolutely no quantitative control whatsoever on the error terms ${o(1)}$ appearing in that theorem. (It is also mildly amusing to note that our arguments involve the use of ultrafilters in two completely different ways: firstly in order to set up the ultraproduct that converts the finitary mixing problem to an infinitary one, and secondly to solve the infinitary mixing problem. Despite some superficial similarities, there appear to be no substantial commonalities between these two usages of ultrafilters.) There is already a fair amount of literature on using idempotent ultrafilter methods in infinitary ergodic theory, and perhaps by further development of ultraproduct correspondence principles, one can use such methods to obtain further finitary consequences (although the state of the art for idempotent ultrafilter ergodic theory has not advanced much beyond the analysis of two commuting shifts ${L_g, R_g}$ currently, which is the main reason why our arguments only handle the pattern ${(g,x,xg,gx)}$ and not more sophisticated patterns).

We also have some miscellaneous other results in the paper. It turns out that by using the triangle removal lemma from graph theory, one can obtain a recurrence result that asserts that whenever ${A}$ is a dense subset of a finite group ${G}$ (not necessarily quasirandom), then there are ${\gg |G|^2}$ pairs ${(x,g)}$ such that ${x, gx, xg}$ all lie in ${A}$. Using a hypergraph generalisation of the triangle removal lemma known as the hypergraph removal lemma, one can obtain more complicated versions of this statement; for instance, if ${A}$ is a dense subset of ${G^2}$, then one can find ${\gg |G|^2}$ triples ${(x,y,g)}$ such that ${(x,y), (gx, y), (gx, gy), (gxg^{-1}, gyg^{-1})}$ all lie in ${A}$. But the method is tailored to the specific types of patterns given here, and we do not have a general method for obtaining recurrence or mixing properties for arbitrary patterns of words in some finite alphabet such as ${g,x,y}$.
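For a very small group one can simply count the pairs in the first recurrence statement by brute force. The sketch below does this in the symmetric group ${S_4}$ for a randomly chosen subset ${A}$ of density ${1/2}$; both the group and the subset are arbitrary assumptions for illustration, and the function name is hypothetical. Note that the pairs with ${g}$ equal to the identity always contribute ${|A|}$ trivially; the content of the recurrence result is a lower bound of the form ${c|G|^2}$ with ${c>0}$ depending only on the density of ${A}$, which a single small example of course cannot demonstrate.

```python
import itertools
import random


def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))


def count_recurrence_pairs(group, A):
    """Count pairs (x, g) with x in A, g in G, and both gx and xg in A (brute force)."""
    A = set(A)
    return sum(
        1
        for x in A
        for g in group
        if compose(g, x) in A and compose(x, g) in A
    )


G = list(itertools.permutations(range(4)))  # the symmetric group S_4, |G| = 24
random.seed(0)
A = random.sample(G, len(G) // 2)           # a hypothetical subset of density 1/2
count = count_recurrence_pairs(G, A)
print(count, "pairs out of", len(G) ** 2)
```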

We also give some properties of a model example of an ultra quasirandom group, namely the ultraproduct ${SL_2(F)}$ of ${SL_2(F_{p_n})}$ where ${p_n}$ is a sequence of primes going off to infinity. Thanks to the substantial recent progress (by Helfgott, Bourgain, Gamburd, Breuillard, and others) on understanding the expansion properties of the finite groups ${SL_2(F_{p_n})}$, we have a fair amount of knowledge on the ultraproduct ${SL_2(F)}$ as well; for instance any two elements of ${SL_2(F)}$ will almost surely generate a spectral gap. We don’t have any direct application of this particular ultra quasirandom group, but it might be interesting to study it further.

Two weeks ago I was at Oberwolfach, for the Arbeitsgemeinschaft in Ergodic Theory and Combinatorial Number Theory that I was one of the organisers for. At this workshop, I learned the details of a very nice recent convergence result of Miguel Walsh (who, incidentally, is an informal grandstudent of mine, as his advisor, Roman Sasyk, was my informal student), which considerably strengthens and generalises a number of previous convergence results in ergodic theory (including one of my own), with a remarkably simple proof. Walsh’s argument is phrased in a finitary language (somewhat similar, in fact, to the approach used in my paper mentioned previously), and (among other things) relies on the concept of metastability of sequences, a variant of the notion of convergence which is useful in situations in which one does not expect a uniform convergence rate; see this previous blog post for some discussion of metastability. When interpreted in a finitary setting, this concept requires a fair amount of “epsilon management” to manipulate; also, Walsh’s argument uses some other epsilon-intensive finitary arguments, such as a decomposition lemma of Gowers based on the Hahn-Banach theorem. As such, I was tempted to try to rewrite Walsh’s argument in the language of nonstandard analysis to see the extent to which these sorts of issues could be managed. As it turns out, the argument gets cleaned up rather nicely, with the notion of metastability being replaced with the simpler notion of external Cauchy convergence (which we will define below the fold).
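For readers who have not seen metastability before: one standard way to phrase it is that a real sequence ${x_n}$ is Cauchy if and only if for every ${\varepsilon > 0}$ and every function ${F: {\bf N} \rightarrow {\bf N}}$ (with ${F(N) \geq N}$) there is an ${N}$ such that ${|x_n - x_m| \leq \varepsilon}$ for all ${N \leq n, m \leq F(N)}$; the finitary content lies in bounding such an ${N}$ in terms of ${\varepsilon}$ and ${F}$ alone. The sketch below merely checks this condition on a finite list of terms; the window function ${F}$, the tolerance, and the helper name are illustrative choices, and none of this is part of Walsh’s argument.

```python
def metastable_start(x, eps, F, max_start):
    """Finite check of metastability for a finite list x.

    Return the first N in [0, max_start] such that |x[n] - x[m]| <= eps for all
    n, m in the window [N, F(N)], or None if no such N is found (the search also
    stops once a window runs past the end of the data). F is assumed to satisfy
    F(N) >= N.
    """
    for N in range(max_start + 1):
        end = F(N)
        if end >= len(x):
            break  # not enough data to inspect this window
        window = x[N:end + 1]
        # all pairwise gaps are <= eps iff the oscillation max - min is <= eps
        if max(window) - min(window) <= eps:
            return N
    return None


x = [1 / (n + 1) for n in range(10_000)]       # a convergent sequence
print(metastable_start(x, 0.01, lambda n: 2 * n + 10, 5_000))   # prints 53

y = [(-1) ** n for n in range(100)]             # a non-convergent sequence
print(metastable_start(y, 0.5, lambda n: n + 1, 50))            # prints None
```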

Let’s first state Walsh’s theorem. This theorem is a norm convergence theorem in ergodic theory, and can be viewed as a substantial generalisation of one of the most fundamental theorems of this type, namely the mean ergodic theorem:

Theorem 1 (Mean ergodic theorem) Let ${(X,\mu,T)}$ be a measure-preserving system (a probability space ${(X,\mu)}$ with an invertible measure-preserving transformation ${T}$). Then for any ${f \in L^2(X,\mu)}$, the averages ${\frac{1}{N} \sum_{n=1}^N T^n f}$ converge in ${L^2(X,\mu)}$ norm as ${N \rightarrow \infty}$, where ${T^n f(x) := f(T^{-n} x)}$.

In this post, all functions in ${L^2(X,\mu)}$ and similar spaces will be taken to be real-valued rather than complex-valued for simplicity, though the extension to the complex setting is routine.

Actually, we have a precise description of the limit of these averages, namely the orthogonal projection of ${f}$ onto the ${T}$-invariant factor (i.e. onto the closed subspace of ${T}$-invariant functions). (See for instance my lecture notes on this theorem.) While this theorem ostensibly involves measure theory, it can be abstracted to the more general setting of unitary operators on a Hilbert space:
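As a toy illustration of Theorem 1 and of this description of the limit, one can take ${X}$ to be a finite set with the uniform probability measure and ${T}$ a permutation of ${X}$ (an assumed example, chosen because everything can be computed exactly). The ${T}$-invariant functions are then exactly the functions that are constant on each cycle of ${T}$, so the orthogonal projection replaces ${f}$ by its average over each cycle. The sketch below computes ${\frac{1}{N} \sum_{n=1}^N T^n f}$ with ${T^n f(x) = f(T^{-n} x)}$ and compares it with that projection; the helper names are hypothetical.

```python
def inverse(perm):
    """Inverse permutation: inv[perm[i]] = i."""
    inv = [0] * len(perm)
    for i, j in enumerate(perm):
        inv[j] = i
    return inv


def ergodic_average(perm, f, N):
    """(1/N) * sum_{n=1}^N T^n f, where T^n f(x) = f(T^{-n} x) and T = perm."""
    inv = inverse(perm)
    total = [0.0] * len(perm)
    point = list(range(len(perm)))          # point[x] holds T^{-n} x
    for _ in range(N):
        point = [inv[p] for p in point]
        for x, p in enumerate(point):
            total[x] += f[p]
    return [t / N for t in total]


def invariant_projection(perm, f):
    """Orthogonal projection onto T-invariant functions (uniform measure):
    replace f by its mean over each cycle of the permutation."""
    result = [0.0] * len(perm)
    seen = [False] * len(perm)
    for start in range(len(perm)):
        if seen[start]:
            continue
        cycle, x = [], start
        while not seen[x]:
            seen[x] = True
            cycle.append(x)
            x = perm[x]
        mean = sum(f[y] for y in cycle) / len(cycle)
        for y in cycle:
            result[y] = mean
    return result


def l2_distance(u, v):
    """L^2 distance with respect to the uniform probability measure."""
    return (sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)) ** 0.5


perm = [1, 2, 0, 4, 3, 5]      # cycles (0 1 2), (3 4), (5)
f = [1.0, 0.0, 0.0, 2.0, -2.0, 7.0]
target = invariant_projection(perm, f)
for N in (10, 100, 1000):
    print(N, l2_distance(ergodic_average(perm, f, N), target))
```

In this finite setting the error is ${O(1/N)}$ and vanishes whenever ${N}$ is a multiple of every cycle length; in general measure-preserving systems no uniform rate is available, which is one reason metastability enters the picture.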

Theorem 2 (von Neumann mean ergodic theorem) Let ${H}$ be a Hilbert space, and let ${U: H \rightarrow H}$ be a unitary operator on ${H}$. Then for any ${f \in H}$, the averages ${\frac{1}{N} \sum_{n=1}^N U^n f}$ converge strongly in ${H}$ as ${N \rightarrow \infty}$.
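A minimal finite-dimensional sketch of Theorem 2, under assumptions chosen purely for illustration: the Hilbert space is ${{\bf R}^3}$ (real, in keeping with the conventions of this post), and ${U}$ rotates the first two coordinates by an irrational multiple of ${2\pi}$ while fixing the third. The averages then converge to the orthogonal projection of ${f}$ onto the ${U}$-invariant vectors, which here is the fixed axis. The function names are hypothetical.

```python
import math


def rotation_with_fixed_axis(theta):
    """Orthogonal (hence unitary) map on R^3: rotation by angle 2*pi*theta in the
    first two coordinates, identity on the third."""
    c, s = math.cos(2 * math.pi * theta), math.sin(2 * math.pi * theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply(M, v):
    """Matrix-vector product."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def unitary_average(U, f, N):
    """(1/N) * sum_{n=1}^N U^n f, computed by repeated application of U."""
    total = [0.0] * len(f)
    w = list(f)
    for _ in range(N):
        w = apply(U, w)
        total = [t + x for t, x in zip(total, w)]
    return [t / N for t in total]


U = rotation_with_fixed_axis(math.sqrt(2) - 1)   # irrational rotation number
f = [1.0, 0.0, 2.0]
for N in (10, 100, 1000):
    avg = unitary_average(U, f, N)
    # the limit is the projection of f onto the fixed vectors, namely (0, 0, 2)
    print(N, math.dist(avg, [0.0, 0.0, 2.0]))
```

In the rotating plane the average is ${\frac{1}{N}\sum_{n=1}^N e^{2\pi i n \theta}}$, whose size is at most ${1/(N |\sin \pi\theta|)}$, so the convergence here is again at rate ${O(1/N)}$.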

Again, see my lecture notes (or just about any text in ergodic theory) for a proof.

Now we turn to Walsh’s theorem.

Theorem 3 (Walsh’s convergence theorem) Let ${(X,\mu)}$ be a probability space with a measure-preserving action of a nilpotent group ${G}$. Let ${g_1,\ldots,g_k: {\bf Z} \rightarrow G}$ be polynomial sequences in ${G}$ (i.e. each ${g_i}$ takes the form ${g_i(n) = a_{i,1}^{p_{i,1}(n)} \ldots a_{i,j}^{p_{i,j}(n)}}$ for some ${a_{i,1},\ldots,a_{i,j} \in G}$ and polynomials ${p_{i,1},\ldots,p_{i,j}: {\bf Z} \rightarrow {\bf Z}}$). Then for any ${f_1,\ldots,f_k \in L^\infty(X,\mu)}$, the averages ${\frac{1}{N} \sum_{n=1}^N (g_1(n) f_1) \ldots (g_k(n) f_k)}$ converge in ${L^2(X,\mu)}$ norm as ${N \rightarrow \infty}$, where ${g(n) f(x) := f(g(n)^{-1} x)}$.
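A polynomial sequence in a nilpotent group can look more complicated in coordinates than its definition suggests. As a small sketch, we take ${G}$ to be the integer Heisenberg group (our choice of example; the theorem allows any nilpotent group), parametrised by triples ${(a,b,c)}$ standing for upper unitriangular ${3 \times 3}$ matrices. Then ${g(n) = x^n y^n}$ with ${x = (1,0,0)}$ and ${y = (0,1,0)}$ is a polynomial sequence whose central coordinate is ${n^2}$. The helper names below are hypothetical.

```python
def heis_mul(g, h):
    """Product in the Heisenberg group, with (a, b, c) standing for the
    upper unitriangular matrix [[1, a, c], [0, 1, b], [0, 0, 1]]."""
    a, b, c = g
    a2, b2, c2 = h
    return (a + a2, b + b2, c + c2 + a * b2)


def heis_pow(g, n):
    """g^n for any integer n, via the closed form (na, nb, nc + n(n-1)/2 * ab)."""
    a, b, c = g
    return (n * a, n * b, n * c + (n * (n - 1) // 2) * a * b)


def polynomial_sequence(gens, polys):
    """Return n -> gens[0]^{polys[0](n)} * ... * gens[-1]^{polys[-1](n)}."""
    def g(n):
        result = (0, 0, 0)
        for a, p in zip(gens, polys):
            result = heis_mul(result, heis_pow(a, p(n)))
        return result
    return g


x, y = (1, 0, 0), (0, 1, 0)
g = polynomial_sequence([x, y], [lambda n: n, lambda n: n])
print([g(n) for n in range(4)])   # [(0, 0, 0), (1, 1, 1), (2, 2, 4), (3, 3, 9)]
```

Using the closed form for powers (rather than repeated multiplication) keeps negative exponents, which polynomials such as ${p(n) = n}$ produce for ${n < 0}$, on the same footing as positive ones.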

It turns out that this theorem can also be abstracted to some extent, although due to the multiplication in the summand ${(g_1(n) f_1) \ldots (g_k(n) f_k)}$, one cannot work purely with Hilbert spaces as in the von Neumann mean ergodic theorem, but must also work with something like the Banach algebra ${L^\infty(X,\mu)}$. There are a number of ways to formulate this abstraction (which will be of some minor convenience to us, as it will allow us to reduce the need to invoke the nonstandard measure theory of Loeb, discussed for instance in this blog post); we will use the notion of a (real) commutative probability space ${({\mathcal A},\tau)}$, which for us will be a commutative unital algebra ${{\mathcal A}}$ over the reals together with a linear functional ${\tau: {\mathcal A} \rightarrow {\bf R}}$ which maps ${1}$ to ${1}$ and obeys the non-negativity axiom ${\tau(f^2) \ge 0}$ for all ${f}$. The key example to keep in mind here is the algebra ${{\mathcal A} = L^\infty(X,\mu)}$ of essentially bounded real-valued measurable functions on a probability space ${(X,\mu)}$, equipped with the supremum norm and with the trace ${\tau(f) := \int_X f\ d\mu}$. We will also assume in our definition of commutative probability spaces that all elements ${f}$ of ${{\mathcal A}}$ are bounded in the sense that the spectral radius ${\rho(f) := \lim_{k \rightarrow \infty} \tau(f^{2k})^{1/2k}}$ is finite. (In the concrete case of ${L^\infty(X,\mu)}$, the spectral radius is just the ${L^\infty}$ norm.)
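The final parenthetical remark can be seen numerically in the simplest case of a finite probability space (an assumed toy model, with hypothetical weights and helper names). There ${\tau(f^{2k})^{1/2k}}$ is just the ${L^{2k}}$ norm of ${f}$, which increases with ${k}$ and tends to the maximum of ${|f|}$ over points of positive mass; points of zero mass are invisible to it.

```python
def trace(f, weights):
    """tau(f) = integral of f against a finite probability measure."""
    return sum(w * v for w, v in zip(weights, f))


def spectral_radius_estimate(f, weights, k):
    """tau(f^{2k})^{1/(2k)}; for large k this approaches max |f| over the support.
    (No rescaling is done, so very large k or large values of f may overflow.)"""
    return trace([float(v) ** (2 * k) for v in f], weights) ** (1 / (2 * k))


f = [1.0, -3.0, 2.0, 0.5]
w = [0.4, 0.1, 0.3, 0.2]          # a hypothetical probability measure on 4 points
for k in (1, 10, 100):
    print(k, spectral_radius_estimate(f, w, k))   # increases towards max|f| = 3
```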

Given a commutative probability space, we can form an inner product ${\langle \cdot, \cdot \rangle_{L^2(\tau)}}$ on it by the formula

$\displaystyle \langle f, g \rangle_{L^2(\tau)} := \tau(fg).$

This is a positive semi-definite form, and gives a (possibly degenerate) inner product structure on ${{\mathcal A}}$. We could complete this structure into a Hilbert space ${L^2(\tau)}$ (after quotienting out the elements of zero norm), but we will not do so here, instead just viewing ${L^2(\tau)}$ as providing a semi-metric on ${{\mathcal A}}$. For future reference we record the inequalities

$\displaystyle \rho(fg) \leq \rho(f) \rho(g)$

$\displaystyle \rho(f+g) \leq \rho(f) + \rho(g)$

$\displaystyle \| fg\|_{L^2(\tau)} \leq \|f\|_{L^2(\tau)} \rho(g)$

for any ${f,g}$, which we will use in the sequel without further comment; see e.g. these previous blog notes for proofs. (Actually, for the purposes of proving Theorem 3, one can specialise to the ${L^\infty(X,\mu)}$ case (and ultraproducts thereof), in which case these inequalities are just the triangle and Hölder inequalities.)
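In the finite model, where (as noted above) ${\rho}$ is the ${L^\infty}$ norm over points of positive mass, these inequalities can be spot-checked numerically. The sketch below does this for random functions on a four-point space; the weights deliberately include a point of zero mass, so that ${L^2(\tau)}$ is only a semi-norm and a nonzero function can have zero norm. The weights, the random test functions, and the helper names are assumptions for illustration, and this is of course a sanity check rather than a proof.

```python
import random


def rho(f, w):
    """Spectral radius in the finite model: max |f| over points of positive mass."""
    return max(abs(v) for v, m in zip(f, w) if m > 0)


def l2_norm(f, w):
    """||f||_{L^2(tau)} = tau(f^2)^{1/2}; only a semi-norm if some mass is zero."""
    return sum(m * v * v for v, m in zip(f, w)) ** 0.5


def check_inequalities(f, g, w, tol=1e-12):
    """Test the three displayed inequalities for one pair (f, g)."""
    fg = [a * b for a, b in zip(f, g)]
    f_plus_g = [a + b for a, b in zip(f, g)]
    return (
        rho(fg, w) <= rho(f, w) * rho(g, w) + tol
        and rho(f_plus_g, w) <= rho(f, w) + rho(g, w) + tol
        and l2_norm(fg, w) <= l2_norm(f, w) * rho(g, w) + tol
    )


w = [0.25, 0.25, 0.5, 0.0]            # hypothetical weights; the last point is null
random.seed(1)
trials = [
    ([random.uniform(-5, 5) for _ in w], [random.uniform(-5, 5) for _ in w])
    for _ in range(1000)
]
print(all(check_inequalities(f, g, w) for f, g in trials))   # True
print(l2_norm([0.0, 0.0, 0.0, 1.0], w))   # 0.0: a nonzero f of zero norm
```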

The abstract version of Theorem 3 is then

Theorem 4 (Walsh’s theorem, abstract version) Let ${({\mathcal A},\tau)}$ be a commutative probability space, and let ${G}$ be a nilpotent group acting on ${{\mathcal A}}$ by isomorphisms (preserving the algebra, conjugation, and trace structure, and thus also preserving the spectral radius and ${L^2(\tau)}$ norm). Let ${g_1,\ldots,g_k: {\bf Z} \rightarrow G}$ be polynomial sequences. Then for any ${f_1,\ldots,f_k \in {\mathcal A}}$, the averages ${\frac{1}{N} \sum_{n=1}^N (g_1(n) f_1) \ldots (g_k(n) f_k)}$ form a Cauchy sequence in ${L^2(\tau)}$ (semi-)norm as ${N \rightarrow \infty}$.

It is easy to see that this theorem generalises Theorem 3. Conversely, one can use the commutative Gelfand-Naimark theorem to deduce Theorem 4 from Theorem 3, although we will not need this implication. Note how we are abandoning all attempts to discern what the limit of the sequence actually is, instead contenting ourselves with demonstrating that it is merely a Cauchy sequence. With this phrasing, it is tempting to ask whether there is any analogue of Walsh’s theorem for noncommutative probability spaces, but unfortunately the answer to that question is negative for all but the simplest of averages, as was worked out in this paper of Austin, Eisner, and myself.

Our proof of Theorem 4 will proceed as follows. Firstly, in order to avoid the epsilon management alluded to earlier, we will take an ultraproduct to rephrase the theorem in the language of nonstandard analysis; for reasons that will be clearer later, we will also convert the convergence problem to a problem of obtaining metastability (external Cauchy convergence). Then, we observe that (the nonstandard counterpart of) the expression ${\|\frac{1}{N} \sum_{n=1}^N (g_1(n) f_1) \ldots (g_k(n) f_k)\|_{L^2(\tau)}^2}$ can be viewed as the inner product of (say) ${f_k}$ with a certain type of expression, which we call a dual function. By performing an orthogonal projection to the span of the dual functions, we can split ${f_k}$ into the sum of an expression orthogonal to all dual functions (the “pseudorandom” component), and a function that can be well approximated by finite linear combinations of dual functions (the “structured” component). The contribution of the pseudorandom component is asymptotically negligible, so we can reduce to consideration of the structured component. But by a little bit of rearrangement, this can be viewed as an average of expressions similar to the initial average ${\frac{1}{N} \sum_{n=1}^N (g_1(n) f_1) \ldots (g_k(n) f_k)}$, except with the polynomials ${g_1,\ldots,g_k}$ replaced by a “lower complexity” set of such polynomials, which can be greater in number, but which have slightly lower degrees in some sense. One can iterate this (using “PET induction”) until all the polynomials become trivial, at which point the claim follows.
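In a finite-dimensional toy model, the orthogonal projection step in this outline is just Gram-Schmidt followed by projection. In the actual argument the dual functions live in an ultraproduct and the structured component is only approximated by finite linear combinations of them, so the sketch below, in which three random vectors on an eight-point uniform probability space stand in for dual functions, only illustrates the splitting of ${f_k}$ into a component in their span plus a component orthogonal to all of them. All names (including `f_str` and `f_psd`) are hypothetical.

```python
import random


def inner(f, g, w):
    """<f, g>_{L^2(tau)} = tau(fg) on a finite probability space with weights w."""
    return sum(m * a * b for a, b, m in zip(f, g, w))


def structured_pseudorandom_split(f, duals, w, tol=1e-12):
    """Split f = f_str + f_psd, where f_str is the orthogonal projection of f onto
    span(duals) and f_psd is orthogonal to every element of duals."""
    basis = []                                 # orthonormal basis via Gram-Schmidt
    for d in duals:
        v = list(d)
        for e in basis:
            c = inner(v, e, w)
            v = [a - c * b for a, b in zip(v, e)]
        norm = inner(v, v, w) ** 0.5
        if norm > tol:                         # skip (numerically) dependent vectors
            basis.append([a / norm for a in v])
    f_str = [0.0] * len(f)
    for e in basis:
        c = inner(f, e, w)
        f_str = [a + c * b for a, b in zip(f_str, e)]
    f_psd = [a - b for a, b in zip(f, f_str)]
    return f_str, f_psd


random.seed(2)
n = 8
w = [1 / n] * n
f = [random.gauss(0, 1) for _ in range(n)]
duals = [[random.gauss(0, 1) for _ in range(n)] for _ in range(3)]   # stand-ins
f_str, f_psd = structured_pseudorandom_split(f, duals, w)
print(max(abs(inner(f_psd, d, w)) for d in duals))   # zero up to rounding
```

The orthogonality of `f_psd` to every stand-in dual function mirrors the “pseudorandom” component being invisible to the average, while `f_str` plays the role of the structured component to which the rearrangement and PET induction are then applied.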