
Asgar Jamneshan and I have just uploaded to the arXiv our paper “An uncountable Mackey-Zimmer theorem“. This paper is part of our longer term project to develop “uncountable” versions of various theorems in ergodic theory; see this previous paper of Asgar and myself for the first paper in this series (and another paper will appear shortly).

In this case the theorem in question is the Mackey-Zimmer theorem, previously discussed in this blog post. This theorem gives an important classification of group and homogeneous extensions of measure-preserving systems. Let us first work in the (classical) setting of concrete measure-preserving systems. Let {Y = (Y, \mu_Y, T_Y)} be a measure-preserving system for some group {\Gamma}, thus {(Y,\mu_Y)} is a (concrete) probability space and {T_Y : \gamma \rightarrow T_Y^\gamma} is a group homomorphism from {\Gamma} to the automorphism group {\mathrm{Aut}(Y,\mu_Y)} of the probability space. (Here we are abusing notation by using {Y} to refer both to the measure-preserving system and to the underlying set. In the notation of the paper we would instead distinguish these two objects as {Y_{\mathbf{ConcPrb}_\Gamma}} and {Y_{\mathbf{Set}}} respectively, reflecting two of the (many) categories one might wish to view {Y} as a member of, but for sake of this informal overview we will not maintain such precise distinctions.) If {K} is a compact group, we define a (concrete) cocycle to be a collection of measurable functions {\rho_\gamma : Y \rightarrow K} for {\gamma \in \Gamma} that obey the cocycle equation

\displaystyle  \rho_{\gamma \gamma'}(y) = \rho_\gamma(T_Y^{\gamma'} y) \rho_{\gamma'}(y) \ \ \ \ \ (1)

for each {\gamma,\gamma' \in \Gamma} and all {y \in Y}. (One could weaken this requirement by only demanding the cocycle equation to hold for almost all {y}, rather than all {y}; we will effectively do so later in the post, when we move to opposite probability algebra systems.) Any such cocycle generates a group skew-product {X = Y \rtimes_\rho K} of {Y}, which is another measure-preserving system {(X, \mu_X, T_X)} where
  • {X = Y \times K} is the Cartesian product of {Y} and {K};
  • {\mu_X = \mu_Y \times \mathrm{Haar}_K} is the product measure of {\mu_Y} and Haar probability measure on {K}; and
  • The action {T_X: \gamma \rightarrow T_X^\gamma} is given by the formula

    \displaystyle  T_X^\gamma(y,k) := (T_Y^\gamma y, \rho_\gamma(y) k). \ \ \ \ \ (2)

The cocycle equation (1) guarantees that {T_X} is a homomorphism, and the (left) invariance of Haar measure and Fubini’s theorem guarantee that the {T_X^\gamma} remain measure-preserving. There is also the more general notion of a homogeneous skew-product {X = Y \rtimes_\rho K/L}, in which the group {K} is replaced by the homogeneous space {K/L} for some closed subgroup {L} of {K}, noting that {K/L} still comes with a left-action of {K} and a Haar measure. Group skew-products are very “explicit” ways to extend a system {Y}, as everything is described by the cocycle {\rho}, which is a relatively tractable object to manipulate. (This is not to say that the cohomology of measure-preserving systems is trivial, but at least there are many tools one can use to study cocycles, such as the Moore-Schmidt theorem discussed in this previous post.)
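In more detail, for any {\gamma,\gamma' \in \Gamma} and {(y,k) \in X} one can verify the homomorphism property directly from (1) and (2):

```latex
\begin{align*}
T_X^{\gamma}\big(T_X^{\gamma'}(y,k)\big)
  &= T_X^{\gamma}\big(T_Y^{\gamma'} y,\ \rho_{\gamma'}(y)\, k\big) \\
  &= \big(T_Y^{\gamma} T_Y^{\gamma'} y,\ \rho_{\gamma}(T_Y^{\gamma'} y)\, \rho_{\gamma'}(y)\, k\big) \\
  &= \big(T_Y^{\gamma \gamma'} y,\ \rho_{\gamma \gamma'}(y)\, k\big)
   = T_X^{\gamma \gamma'}(y,k),
\end{align*}
```

where the second line applies (2) twice, and the third uses the homomorphism property of {T_Y} together with the cocycle equation (1).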

This group skew-product {X} comes with a factor map {\pi: X \rightarrow Y} and a coordinate map {\theta: X \rightarrow K}, which by (2) are related to the action via the identities

\displaystyle  \pi \circ T_X^\gamma = T_Y^\gamma \circ \pi \ \ \ \ \ (3)


\displaystyle  \theta \circ T_X^\gamma = (\rho_\gamma \circ \pi) \theta \ \ \ \ \ (4)

where in (4) we are implicitly working in the group of (concretely) measurable functions from {Y} to {K}. Furthermore, the combined map {(\pi,\theta): X \rightarrow Y \times K} is measure-preserving (using the product measure on {Y \times K}), indeed the way we have constructed things this map is just the identity map.

We can now generalize the notion of group skew-product by just working with the maps {\pi, \theta}, and weakening the requirement that {(\pi,\theta)} be measure-preserving. Namely, define a group extension of {Y} by {K} to be a measure-preserving system {(X,\mu_X, T_X)} equipped with a measure-preserving map {\pi: X \rightarrow Y} obeying (3) and a measurable map {\theta: X \rightarrow K} obeying (4) for some cocycle {\rho}, such that the {\sigma}-algebra of {X} is generated by {\pi,\theta}. There is also a more general notion of a homogeneous extension in which {\theta} takes values in {K/L} rather than {K}. Then every group skew-product {Y \rtimes_\rho K} is a group extension of {Y} by {K}, but not conversely. Here are some key counterexamples:

  • (i) If {H} is a closed subgroup of {K}, and {\rho} is a cocycle taking values in {H}, then {Y \rtimes_\rho H} can be viewed as a group extension of {Y} by {K}, taking {\theta: Y \rtimes_\rho H \rightarrow K} to be the vertical coordinate {\theta(y,h) = h} (viewing {h} now as an element of {K}). This will not be a skew-product by {K} because {(\pi,\theta)} pushes forward to the wrong measure on {Y \times K}: it pushes forward to {\mu_Y \times \mathrm{Haar}_H} rather than {\mu_Y \times \mathrm{Haar}_K}.
  • (ii) If one takes the same example as (i), but twists the vertical coordinate {\theta} to another vertical coordinate {\tilde \theta(y,h) := \Phi(y) \theta(y,h)} for some measurable “gauge function” {\Phi: Y \rightarrow K}, then {Y \rtimes_\rho H} is still a group extension by {K}, but now with the cocycle {\rho} replaced by the cohomologous cocycle

    \displaystyle  \tilde \rho_\gamma(y) := \Phi(T_Y^\gamma y) \rho_\gamma(y) \Phi(y)^{-1}.

    Again, this will not be a skew product by {K}, because {(\pi,\tilde \theta)} pushes forward to a twisted version of {\mu_Y \times \mathrm{Haar}_H} that is supported (at least in the case where {Y} is compact and the cocycle {\rho} is continuous) on the {H}-bundle {\bigcup_{y \in Y} \{y\} \times \Phi(y) H}.
  • (iii) With the situation as in (i), take {X} to be the union {X = Y \rtimes_\rho H \uplus Y \rtimes_\rho Hk \subset Y \times K} for some {k \in K} outside of {H}, where we continue to use the action (2) and the standard vertical coordinate {\theta: (y,k) \mapsto k} but now use the measure {\mu_Y \times (\frac{1}{2} \mathrm{Haar}_H + \frac{1}{2} \mathrm{Haar}_{Hk})}.
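To verify the claim in (ii) that the twisted coordinate {\tilde \theta} obeys (4) with the cohomologous cocycle {\tilde \rho}, one computes, using (2) and the definition {\tilde \theta(y,h) = \Phi(y) h}:

```latex
\begin{align*}
\tilde \theta\big(T_X^\gamma(y,h)\big)
  &= \Phi(T_Y^\gamma y)\, \rho_\gamma(y)\, h \\
  &= \big( \Phi(T_Y^\gamma y)\, \rho_\gamma(y)\, \Phi(y)^{-1} \big)\, \Phi(y)\, h
   = \tilde \rho_\gamma(y)\, \tilde \theta(y,h),
\end{align*}
```

which is exactly (4) for the pair {(\tilde \theta, \tilde \rho)}.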

As it turns out, group extensions and homogeneous extensions arise naturally in the Furstenberg-Zimmer structural theory of measure-preserving systems; roughly speaking, every compact extension of {Y} is an inverse limit of group extensions. It is then of interest to classify such extensions.

Examples such as (iii) are annoying, but they can be excluded by imposing the additional condition that the system {(X,\mu_X,T_X)} is ergodic – all invariant (or essentially invariant) sets are of measure zero or measure one. (An essentially invariant set is a measurable subset {E} of {X} such that {T^\gamma E} is equal modulo null sets to {E} for all {\gamma \in \Gamma}.) For instance, the system in (iii) is non-ergodic because the set {Y \times H} (or {Y \times Hk}) is invariant but has measure {1/2}. We then have the following fundamental result of Mackey and Zimmer:
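To see this non-ergodicity concretely, here is a toy finite sketch (not from the paper; all parameters are illustrative): take {\Gamma = {\bf Z}} acting by rotation on {Y = {\bf Z}/5}, with a cocycle taking values in the subgroup {H = \{0,2,4\}} of {K = {\bf Z}/6} (written additively), and check that each coset bundle {Y \times (H+c)} is invariant under the skew-product map (2), so that any measure glued together from two such bundles, as in (iii), fails to be ergodic.

```python
# Toy finite model of examples (i)/(iii): a Z-action skew product whose
# cocycle takes values in a proper subgroup H of K, so each coset bundle
# Y x (H+c) is invariant and the glued system cannot be ergodic.
# All choices here are illustrative, not from the paper.

Y_SIZE, K_SIZE = 5, 6
H = {0, 2, 4}  # proper subgroup of K = Z/6

def rho(y):
    """One-step cocycle rho_1: Y -> H (any H-valued function works)."""
    return 2 * (y % 3) % K_SIZE  # takes values in {0, 2, 4} = H

def T_X(y, k):
    """Skew-product map T_X(y,k) = (T_Y y, rho_1(y) + k), as in (2)."""
    return ((y + 1) % Y_SIZE, (rho(y) + k) % K_SIZE)

# (a) T_X is a bijection of Y x K, hence preserves the uniform measure.
points = [(y, k) for y in range(Y_SIZE) for k in range(K_SIZE)]
assert sorted(T_X(y, k) for y, k in points) == sorted(points)

# (b) Y x H is invariant: the vertical coordinate never leaves H, so the
# indicator of Y x H is a nonconstant invariant function.
for y in range(Y_SIZE):
    for h in H:
        assert T_X(y, h)[1] in H

# (c) More generally each coset bundle Y x (H + c) is invariant,
# mirroring how example (iii) glues two such bundles together.
for c in range(K_SIZE):
    coset = {(h + c) % K_SIZE for h in H}
    for y in range(Y_SIZE):
        for k in coset:
            assert T_X(y, k)[1] in coset
print("invariance checks passed")
```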

Theorem 1 (Countable Mackey-Zimmer theorem) Let {\Gamma} be a group, {Y} be a concrete measure-preserving system, and {K} be a compact Hausdorff group. Assume that {\Gamma} is at most countable, {Y} is a standard Borel space, and {K} is metrizable. Then every (concrete) ergodic group extension of {Y} is abstractly isomorphic to a group skew-product (by some closed subgroup {H} of {K}), and every (concrete) ergodic homogeneous extension of {Y} is similarly abstractly isomorphic to a homogeneous skew-product.

We will not define precisely what “abstractly isomorphic” means here, but it roughly speaking means “isomorphic after quotienting out the null sets”. A proof of this theorem can be found for instance in .

The main result of this paper is to remove the “countability” hypotheses from the above theorem, at the cost of working with opposite probability algebra systems rather than concrete systems. (We will discuss opposite probability algebras in a subsequent blog post relating to another paper in this series.)

Theorem 2 (Uncountable Mackey-Zimmer theorem) Let {\Gamma} be a group, {Y} be an opposite probability algebra measure-preserving system, and {K} be a compact Hausdorff group. Then every (abstract) ergodic group extension of {Y} is abstractly isomorphic to a group skew-product (by some closed subgroup {H} of {K}), and every (abstract) ergodic homogeneous extension of {Y} is similarly abstractly isomorphic to a homogeneous skew-product.

We plan to use this result in future work to obtain uncountable versions of the Furstenberg-Zimmer and Host-Kra structure theorems.

As one might expect, one locates a proof of Theorem 2 by finding a proof of Theorem 1 that does not rely too strongly on “countable” tools, such as disintegration or measurable selection, so that all of those tools can be replaced by “uncountable” counterparts. The proof we use is based on the one given in this previous post, and begins by comparing the system {X} with the group extension {Y \rtimes_\rho K}. As the examples (i), (ii) show, these two systems need not be isomorphic even in the ergodic case, due to the different probability measures employed. However one can relate the two after performing an additional averaging in {K}. More precisely, there is a canonical factor map {\Pi: X \rtimes_1 K \rightarrow Y \rtimes_\rho K} given by the formula

\displaystyle  \Pi(x, k) := (\pi(x), \theta(x) k).

This is a factor map not only of {\Gamma}-systems, but actually of {\Gamma \times K^{op}}-systems, where the opposite group {K^{op}} to {K} acts (on the left) by right-multiplication of the second coordinate (this reversal of order is why we need to use the opposite group here). The key point is that the ergodicity properties of the system {Y \rtimes_\rho K} are closely tied to the group {H} that is “secretly” controlling the group extension. Indeed, in example (i), the invariant functions on {Y \rtimes_\rho K} take the form {(y,k) \mapsto f(Hk)} for some measurable {f: H \backslash K \rightarrow {\bf C}}, while in example (ii), the invariant functions on {Y \rtimes_{\tilde \rho} K} take the form {(y,k) \mapsto f(H \Phi(y)^{-1} k)}. In either case, the invariant factor is isomorphic to {H \backslash K}, and can be viewed as a factor of the invariant factor of {X \rtimes_1 K}, which is isomorphic to {K}. Pursuing this reasoning (using an abstract ergodic theorem of Alaoglu and Birkhoff, as discussed in the previous post) one obtains the Mackey range {H}, and also obtains the quotient {\tilde \Phi: Y \rightarrow K/H} of {\Phi: Y \rightarrow K} to {K/H} in this process. The main remaining task is to lift the quotient {\tilde \Phi} back up to a map {\Phi: Y \rightarrow K} that stays measurable, in order to “untwist” a system that looks like (ii) to make it into one that looks like (i). In countable settings this is where a “measurable selection theorem” would ordinarily be invoked, but in the uncountable setting such theorems are not available for concrete maps. However it turns out that they still remain available for abstract maps: any abstractly measurable map {\tilde \Phi} from {Y} to {K/H} has an abstractly measurable lift from {Y} to {K}.
To prove this we first use a canonical model for opposite probability algebras (which we will discuss in a companion post to this one, to appear shortly) to work with continuous maps (on a Stone space) rather than abstractly measurable maps. The measurable map {\tilde \Phi} then induces a probability measure on {Y \times K/H}, formed by pushing forward {\mu_Y} by the graphing map {y \mapsto (y,\tilde \Phi(y))}. This measure in turn has several lifts up to a probability measure on {Y \times K}; for instance, one can construct such a measure {\overline{\mu}} via the Riesz representation theorem by demanding

\displaystyle  \int_{Y \times K} f(y,k)\ d\overline{\mu}(y,k) := \int_Y (\int_{\tilde \Phi(y) H} f(y,k)\ d\mathrm{Haar}_{\tilde \Phi(y) H})\ d\mu_Y(y)

for all continuous functions {f}. This measure does not come from a graph of any single lift {\Phi: Y \rightarrow K}, but is in some sense an “average” of the entire ensemble of these lifts. But it turns out one can invoke the Krein-Milman theorem to pass to an extremal lifting measure which does come from an (abstract) lift {\Phi}, and this can be used as a substitute for a measurable selection theorem. A variant of this Krein-Milman argument can also be used to express any homogeneous extension as a quotient of a group extension, giving the second part of the Mackey-Zimmer theorem.
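As a sanity check that {\overline{\mu}} really does lift the graph measure: for test functions {f(y,k) = g(y, kH)} depending only on the coset {kH}, one has {kH = \tilde \Phi(y)} for every {k} in the coset {\tilde \Phi(y) H}, so the defining formula collapses to

```latex
\int_{Y \times K} g(y, kH)\ d\overline{\mu}(y,k)
  = \int_Y \Big( \int_{\tilde \Phi(y) H} g\big(y, \tilde \Phi(y)\big)\ d\mathrm{Haar}_{\tilde \Phi(y) H} \Big)\ d\mu_Y(y)
  = \int_Y g\big(y, \tilde \Phi(y)\big)\ d\mu_Y(y),
```

which is exactly integration against the pushforward of {\mu_Y} under the graphing map {y \mapsto (y, \tilde \Phi(y))}.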

Ben Krause, Mariusz Mirek, and I have uploaded to the arXiv our paper Pointwise ergodic theorems for non-conventional bilinear polynomial averages. This paper is a contribution to the decades-long program of extending the classical ergodic theorems to “non-conventional” ergodic averages. Here, the focus is on pointwise convergence theorems, and in particular looking for extensions of the pointwise ergodic theorem of Birkhoff:

Theorem 1 (Birkhoff ergodic theorem) Let {(X,\mu,T)} be a measure-preserving system (by which we mean {(X,\mu)} is a {\sigma}-finite measure space, and {T: X \rightarrow X} is invertible and measure-preserving), and let {f \in L^p(X)} for any {1 \leq p < \infty}. Then the averages {\frac{1}{N} \sum_{n=1}^N f(T^n x)} converge pointwise for {\mu}-almost every {x \in X}.

Pointwise ergodic theorems have an inherently harmonic analysis content to them, as they are closely tied to maximal inequalities. For instance, the Birkhoff ergodic theorem is closely tied to the Hardy-Littlewood maximal inequality.

The above theorem was generalized by Bourgain (conceding the endpoint {p=1}, where pointwise almost everywhere convergence is now known to fail) to polynomial averages:

Theorem 2 (Pointwise ergodic theorem for polynomial averages) Let {(X,\mu,T)} be a measure-preserving system, and let {f \in L^p(X)} for any {1 < p < \infty}. Let {P \in {\bf Z}[{\mathrm n}]} be a polynomial with integer coefficients. Then the averages {\frac{1}{N} \sum_{n=1}^N f(T^{P(n)} x)} converge pointwise for {\mu}-almost every {x \in X}.

For bilinear averages, we have a separate 1990 result of Bourgain (for {L^\infty} functions), extended to other {L^p} spaces by Lacey, and with an alternate proof given by Demeter:

Theorem 3 (Pointwise ergodic theorem for two linear polynomials) Let {(X,\mu,T)} be a measure-preserving system with finite measure, and let {f \in L^{p_1}(X)}, {g \in L^{p_2}(X)} for some {1 < p_1,p_2 \leq \infty} with {\frac{1}{p_1}+\frac{1}{p_2} < \frac{3}{2}}. Then for any integers {a,b}, the averages {\frac{1}{N} \sum_{n=1}^N f(T^{an} x) g(T^{bn} x)} converge pointwise almost everywhere.

It has been an open question for some time (see e.g., Problem 11 of this survey of Frantzikinakis) to extend this result to other bilinear ergodic averages. In our paper we are able to achieve this in the partially linear case:

Theorem 4 (Pointwise ergodic theorem for one linear and one nonlinear polynomial) Let {(X,\mu,T)} be a measure-preserving system, and let {f \in L^{p_1}(X)}, {g \in L^{p_2}(X)} for some {1 < p_1,p_2 < \infty} with {\frac{1}{p_1}+\frac{1}{p_2} \leq 1}. Then for any polynomial {P \in {\bf Z}[{\mathrm n}]} of degree {d \geq 2}, the averages {\frac{1}{N} \sum_{n=1}^N f(T^{n} x) g(T^{P(n)} x)} converge pointwise almost everywhere.

We actually prove a bit more than this, namely a maximal function estimate and a variational estimate, together with some additional estimates that “break duality” by applying in certain ranges with {\frac{1}{p_1}+\frac{1}{p_2}>1}, but we will not discuss these extensions here. A good model case to keep in mind is when {p_1=p_2=2} and {P(n) = n^2} (which is the case we started with). We note that norm convergence for these averages was established much earlier by Furstenberg and Weiss (in the {d=2} case at least), and in fact norm convergence for arbitrary polynomial averages is now known thanks to the work of Host-Kra, Leibman, and Walsh.

Our proof of Theorem 4 is much closer in spirit to Theorem 2 than to Theorem 3. The property shared in common by the averages in Theorems 2, 4 is that they have “true complexity zero”, in the sense that they can only be large if the functions {f,g} involved are “major arc” or “profinite”, in that they behave periodically over very long intervals (or like a linear combination of such periodic functions). In contrast, the average in Theorem 3 has “true complexity one”, in the sense that it can also be large if {f,g} are “almost periodic” (a linear combination of eigenfunctions, or plane waves), and as such all proofs of the latter theorem have relied (either explicitly or implicitly) on some form of time-frequency analysis. In principle, the true complexity zero property reduces one to the study of the behaviour of averages on major arcs. However, until recently the available estimates to quantify this true complexity zero property were not strong enough to achieve a good reduction of this form, and even once one was in the major arc setting the bilinear averages in Theorem 4 were still quite complicated, exhibiting a mixture of both continuous and arithmetic aspects, both of which are genuinely bilinear in nature.

After applying standard reductions such as the Calderón transference principle, the key task is to establish a suitably “scale-invariant” maximal (or variational) inequality on the integer shift system (in which {X = {\bf Z}} with counting measure, and {T(n) = n-1}). A model problem is to establish the maximal inequality

\displaystyle  \| \sup_N |A_N(f,g)| \|_{\ell^1({\bf Z})} \lesssim \|f\|_{\ell^2({\bf Z})}\|g\|_{\ell^2({\bf Z})} \ \ \ \ \ (1)

where {N} ranges over powers of two and {A_N} is the bilinear operator

\displaystyle  A_N(f,g)(x) := \frac{1}{N} \sum_{n=1}^N f(x-n) g(x-n^2).

The single scale estimate

\displaystyle  \| A_N(f,g) \|_{\ell^1({\bf Z})} \lesssim \|f\|_{\ell^2({\bf Z})}\|g\|_{\ell^2({\bf Z})}

or equivalently (by duality)

\displaystyle  \frac{1}{N} \sum_{n=1}^N \sum_{x \in {\bf Z}} h(x) f(x-n) g(x-n^2) \lesssim \|f\|_{\ell^2({\bf Z})}\|g\|_{\ell^2({\bf Z})} \|h\|_{\ell^\infty({\bf Z})} \ \ \ \ \ (2)

is immediate from Hölder’s inequality; the difficulty is how to take the supremum over scales {N}.
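As a quick numerical sanity check of the single-scale bound (2) (which holds with constant {1}: bound {h} by its sup norm, then apply Cauchy-Schwarz in {x} for each fixed {n} and average), one can test it on random finitely supported functions; this sketch is purely illustrative.

```python
# Numerical check of the single-scale estimate (2): for each fixed n,
# |sum_x h(x) f(x-n) g(x-n^2)| <= ||h||_inf ||f||_2 ||g||_2 by
# Cauchy-Schwarz in x, and averaging over n keeps the constant 1.
import random

random.seed(0)
N = 8
SUPPORT = range(-N * N, N * N + 1)  # finitely supported test functions

def random_fn():
    vals = {x: random.uniform(-1, 1) for x in SUPPORT}
    return lambda x: vals.get(x, 0.0)  # zero off the support

def l2(f):
    return sum(f(x) ** 2 for x in SUPPORT) ** 0.5

def linf(f):
    return max(abs(f(x)) for x in SUPPORT)

f, g, h = random_fn(), random_fn(), random_fn()

# Left-hand side of (2); since h vanishes off SUPPORT, summing x over
# SUPPORT is the same as summing over all of Z.
lhs = (1 / N) * sum(
    h(x) * f(x - n) * g(x - n * n)
    for n in range(1, N + 1)
    for x in SUPPORT
)
rhs = l2(f) * l2(g) * linf(h)
assert abs(lhs) <= rhs
print("single-scale bound (2) holds on this sample")
```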

The first step is to understand when the single-scale estimate (2) can come close to equality. A key example to keep in mind is when {f(x) = e(ax/q) F(x)}, {g(x) = e(bx/q) G(x)}, {h(x) = e(cx/q) H(x)} where {q=O(1)} is a small modulus, {a,b,c} are such that {a+b+c=0 \hbox{ mod } q}, {G} is a smooth cutoff to an interval {I} of length {O(N^2)}, and {F=H} is also supported on {I} and behaves like a constant on intervals of length {O(N)}. Then one can check that (barring some unusual cancellation) (2) is basically sharp for this example. A remarkable result of Peluse and Prendiville (generalised to arbitrary nonlinear polynomials {P} by Peluse) asserts, roughly speaking, that this example is basically the only way in which (2) can be saturated, at least when {f,g,h} are supported on a common interval {I} of length {O(N^2)} and are normalised in {\ell^\infty} rather than {\ell^2}. (Strictly speaking, the above paper of Peluse and Prendiville only says something like this regarding the {f,h} factors; the corresponding statement for {g} was established in a subsequent paper of Peluse and Prendiville.) The argument requires tools from additive combinatorics such as the Gowers uniformity norms, and hinges in particular on the “degree lowering argument” of Peluse and Prendiville, which I discussed in this previous blog post. Crucially for our application, the estimates are very quantitative, with all bounds being polynomial in the ratio between the left and right hand sides of (2) (or more precisely, the {\ell^\infty}-normalized version of (2)).

For our applications we had to extend the {\ell^\infty} inverse theory of Peluse and Prendiville to an {\ell^2} theory. This turned out to require a certain amount of “sleight of hand”. Firstly, one can dualise the theorem of Peluse and Prendiville to show that the “dual function”

\displaystyle  A^*_N(h,g)(x) = \frac{1}{N} \sum_{n=1}^N h(x+n) g(x+n-n^2)

can be well approximated in {\ell^1} by a function that has Fourier support on “major arcs” if {g,h} enjoy {\ell^\infty} control. To get the required extension to {\ell^2} in the {f} aspect one has to improve the control on the error from {\ell^1} to {\ell^2}; this can be done by some interpolation theory combined with the useful Fourier multiplier theory of Ionescu and Wainger on major arcs. Then, by further interpolation using recent {\ell^p({\bf Z})} improving estimates of Han, Kovac, Lacey, Madrid, and Yang for linear averages such as {x \mapsto \frac{1}{N} \sum_{n=1}^N g(x+n-n^2)}, one can relax the {\ell^\infty} hypothesis on {g} to an {\ell^2} hypothesis, and then by undoing the duality one obtains a good inverse theorem for (2) for the function {f}; a modification of the arguments also gives something similar for {g}.
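The duality used here is the elementary identity (obtained by substituting {y = x-n} in each inner sum):

```latex
\sum_{x \in {\bf Z}} A_N(f,g)(x)\, h(x)
  = \frac{1}{N} \sum_{n=1}^N \sum_{x \in {\bf Z}} f(x-n)\, g(x-n^2)\, h(x)
  = \sum_{y \in {\bf Z}} f(y)\, A^*_N(h,g)(y),
```

so information about {A^*_N(h,g)} pairs against the {f} input of {A_N}, which is what allows control of the dual function to be converted into an inverse theorem for (2) in the {f} aspect.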

Using these inverse theorems (and the Ionescu-Wainger multiplier theory) one still has to understand the “major arc” portion of (1); a model case arises when {f,g} are supported near rational numbers {a/q} with {q \sim 2^l} for some moderately large {l}. The inverse theory gives good control (with an exponential decay in {l}) on individual scales {N}, and one can leverage this with a Rademacher-Menshov type argument (see e.g., this blog post) and some closer analysis of the bilinear Fourier symbol of {A_N} to eventually handle all “small” scales, with {N} ranging up to say {2^{2^u}} where {u = C 2^{\rho l}} for some small constant {\rho} and large constant {C}. For the “large” scales, it becomes feasible to place all the major arcs simultaneously under a single common denominator {Q}, and then a quantitative version of the Shannon sampling theorem allows one to transfer the problem from the integers {{\bf Z}} to the locally compact abelian group {{\bf R} \times {\bf Z}/Q{\bf Z}}. Actually it was conceptually clearer for us to work instead with the adelic integers {{\mathbf A}_{\bf Z} = {\bf R} \times \hat {\bf Z}}, which is the inverse limit of the groups {{\bf R} \times {\bf Z}/Q{\bf Z}}. Once one transfers to the adelic integers, the bilinear operators involved split up as tensor products of the “continuous” bilinear operator

\displaystyle  A_{N,{\bf R}}(f,g)(x) := \frac{1}{N} \int_0^N f(x-t) g(x-t^2)\ dt

on {{\bf R}}, and the “arithmetic” bilinear operator

\displaystyle  A_{\hat {\bf Z}}(f,g)(x) := \int_{\hat {\bf Z}} f(x-y) g(x-y^2)\ d\mu_{\hat {\bf Z}}(y)

on the profinite integers {\hat {\bf Z}}, equipped with probability Haar measure {\mu_{\hat {\bf Z}}}. After a number of standard manipulations (interpolation, Fubini’s theorem, Hölder’s inequality, variational inequalities, etc.) the task of estimating this tensor product boils down to establishing an {L^q} improving estimate

\displaystyle  \| A_{\hat {\bf Z}}(f,g) \|_{L^q(\hat {\bf Z})} \lesssim \|f\|_{L^2(\hat {\bf Z})} \|g\|_{L^2(\hat {\bf Z})}

for some {q>2}. Splitting the profinite integers {\hat {\bf Z}} into the product of the {p}-adic integers {{\bf Z}_p}, it suffices to establish this claim for each {{\bf Z}_p} separately (so long as we keep the implied constant equal to {1} for sufficiently large {p}). This turns out to be possible using an arithmetic version of the Peluse-Prendiville inverse theorem as well as an arithmetic {L^q} improving estimate for linear averaging operators which ultimately arises from some estimates on the distribution of polynomials on the {p}-adic field {{\bf Q}_p}, which are a variant of some estimates of Kowalski and Wright.

Just a short post to note that this year’s Abel prize has been awarded jointly to Hillel Furstenberg and Grigory Margulis “for pioneering the use of methods from probability and dynamics in group theory, number theory and combinatorics”. I was not involved in the decision making process of the Abel committee this year, but I certainly feel that the contributions of both mathematicians are worthy of the prize. Certainly both mathematicians have influenced my own work (for instance, Furstenberg’s proof of Szemeredi’s theorem ended up being a key influence in my result with Ben Green that the primes contain arbitrarily long arithmetic progressions); see for instance these blog posts mentioning Furstenberg, and these blog posts mentioning Margulis.

Asgar Jamneshan and I have just uploaded to the arXiv our paper “An uncountable Moore-Schmidt theorem“. This paper revisits a classical theorem of Moore and Schmidt in measurable cohomology of measure-preserving systems. To state the theorem, let {X = (X,{\mathcal X},\mu)} be a probability space, and {\mathrm{Aut}(X, {\mathcal X}, \mu)} be the group of measure-preserving automorphisms of this space, that is to say the invertible bimeasurable maps {T: X \rightarrow X} that preserve the measure {\mu}: {T_* \mu = \mu}. To avoid some ambiguity later in this post when we introduce abstract analogues of measure theory, we will refer to measurable maps as concrete measurable maps, and measurable spaces as concrete measurable spaces. (One could also call {X = (X,{\mathcal X}, \mu)} a concrete probability space, but we will not need to do so here as we will not be working explicitly with abstract probability spaces.)

Let {\Gamma = (\Gamma,\cdot)} be a discrete group. A (concrete) measure-preserving action of {\Gamma} on {X} is a group homomorphism {\gamma \mapsto T^\gamma} from {\Gamma} to {\mathrm{Aut}(X, {\mathcal X}, \mu)}, thus {T^1} is the identity map and {T^{\gamma_1} \circ T^{\gamma_2} = T^{\gamma_1 \gamma_2}} for all {\gamma_1,\gamma_2 \in \Gamma}. A large portion of ergodic theory is concerned with the study of such measure-preserving actions, especially in the classical case when {\Gamma} is the integers (with the additive group law).

Let {K = (K,+)} be a compact Hausdorff abelian group, which we can endow with the Borel {\sigma}-algebra {{\mathcal B}(K)}. A (concrete measurable) {K}-valued cocycle is a collection {\rho = (\rho_\gamma)_{\gamma \in \Gamma}} of concrete measurable maps {\rho_\gamma: X \rightarrow K} obeying the cocycle equation

\displaystyle \rho_{\gamma_1 \gamma_2}(x) = \rho_{\gamma_1} \circ T^{\gamma_2}(x) + \rho_{\gamma_2}(x) \ \ \ \ \ (1)

for {\mu}-almost every {x \in X}. (Here we are glossing over a measure-theoretic subtlety that we will return to later in this post – see if you can spot it before then!) Cocycles arise naturally in the theory of group extensions of dynamical systems; in particular (and ignoring the aforementioned subtlety), each cocycle induces a measure-preserving action {\gamma \mapsto S^\gamma} on {X \times K} (which we endow with the product of {\mu} with Haar probability measure on {K}), defined by

\displaystyle S^\gamma( x, k ) := (T^\gamma x, k + \rho_\gamma(x) ).

This connection with group extensions was the original motivation for our study of measurable cohomology, but is not the focus of the current paper.
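Indeed (ignoring the subtlety), the cocycle equation (1) is precisely what makes {\gamma \mapsto S^\gamma} an action: for almost every {x} one has

```latex
\begin{align*}
S^{\gamma_1}\big(S^{\gamma_2}(x,k)\big)
  &= \big(T^{\gamma_1} T^{\gamma_2} x,\ k + \rho_{\gamma_2}(x) + \rho_{\gamma_1}(T^{\gamma_2} x)\big) \\
  &= \big(T^{\gamma_1 \gamma_2} x,\ k + \rho_{\gamma_1 \gamma_2}(x)\big)
   = S^{\gamma_1 \gamma_2}(x,k),
\end{align*}
```

using the homomorphism property of {\gamma \mapsto T^\gamma} and (1) in the second line.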

A special case of a {K}-valued cocycle is a (concrete measurable) {K}-valued coboundary, in which {\rho_\gamma} for each {\gamma \in \Gamma} takes the special form

\displaystyle \rho_\gamma(x) = F \circ T^\gamma(x) - F(x)

for {\mu}-almost every {x \in X}, where {F: X \rightarrow K} is some measurable function; note that (ignoring the aforementioned subtlety), every function of this form is automatically a concrete measurable {K}-valued cocycle. One of the first basic questions in measurable cohomology is to try to characterize which {K}-valued cocycles are in fact {K}-valued coboundaries. This is a difficult question in general. However, there is a general result of Moore and Schmidt that at least allows one to reduce to the model case when {K} is the unit circle {\mathbf{T} = {\bf R}/{\bf Z}}, by taking advantage of the Pontryagin dual group {\hat K} of characters {\hat k: K \rightarrow \mathbf{T}}, that is to say the collection of continuous homomorphisms {\hat k: k \mapsto \langle \hat k, k \rangle} to the unit circle. More precisely, we have

Theorem 1 (Countable Moore-Schmidt theorem) Let {\Gamma} be a discrete group acting in a concrete measure-preserving fashion on a probability space {X}. Let {K} be a compact Hausdorff abelian group. Assume the following additional hypotheses:

  • (i) {\Gamma} is at most countable.
  • (ii) {X} is a standard Borel space.
  • (iii) {K} is metrizable.

Then a {K}-valued concrete measurable cocycle {\rho = (\rho_\gamma)_{\gamma \in \Gamma}} is a concrete coboundary if and only if for each character {\hat k \in \hat K}, the {\mathbf{T}}-valued cocycles {\langle \hat k, \rho \rangle = ( \langle \hat k, \rho_\gamma \rangle )_{\gamma \in \Gamma}} are concrete coboundaries.

The hypotheses (i), (ii), (iii) are saying in some sense that the data {\Gamma, X, K} are not too “large”; in all three cases they assert that the data are only “countably complicated”. For instance, (iii) is equivalent to {K} being second countable, and (ii) is equivalent to {X} being modeled by a complete separable metric space. It is because of these restrictions that we refer to this result as a “countable” Moore-Schmidt theorem. This theorem is a useful tool in several other applications, such as the Host-Kra structure theorem for ergodic systems; I hope to return to these subsequent applications in a future post.

Let us very briefly sketch the main ideas of the proof of Theorem 1. Ignore for now issues of measurability, and pretend that something that holds almost everywhere in fact holds everywhere. The hard direction is to show that if each {\langle \hat k, \rho \rangle} is a coboundary, then so is {\rho}. By hypothesis, we then have an equation of the form

\displaystyle \langle \hat k, \rho_\gamma(x) \rangle = \alpha_{\hat k} \circ T^\gamma(x) - \alpha_{\hat k}(x) \ \ \ \ \ (2)

for all {\hat k, \gamma, x} and some functions {\alpha_{\hat k}: X \rightarrow {\mathbf T}}, and our task is then to produce a function {F: X \rightarrow K} for which

\displaystyle \rho_\gamma(x) = F \circ T^\gamma(x) - F(x)

for all {\gamma,x}.

Comparing the two equations, the task would be easy if we could find an {F: X \rightarrow K} for which

\displaystyle \langle \hat k, F(x) \rangle = \alpha_{\hat k}(x) \ \ \ \ \ (3)

for all {\hat k, x}. However there is an obstruction to this: the left-hand side of (3) is additive in {\hat k}, so the right-hand side would have to be also in order to obtain such a representation. In other words, for this strategy to work, one would have to first establish the identity

\displaystyle \alpha_{\hat k_1 + \hat k_2}(x) - \alpha_{\hat k_1}(x) - \alpha_{\hat k_2}(x) = 0 \ \ \ \ \ (4)

for all {\hat k_1, \hat k_2, x}. On the other hand, the good news is that if we somehow manage to obtain the equation (4), then we can obtain a function {F} obeying (3), thanks to Pontryagin duality, which gives a one-to-one correspondence between {K} and the homomorphisms of the (discrete) group {\hat K} to {\mathbf{T}}.
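Indeed, granting such an {F}, combining (2) with (3) would give, for every character {\hat k},

```latex
\langle \hat k, \rho_\gamma(x) \rangle
  = \alpha_{\hat k} \circ T^\gamma(x) - \alpha_{\hat k}(x)
  = \langle \hat k,\ F \circ T^\gamma(x) - F(x) \rangle,
```

and since the characters of a compact Hausdorff abelian group separate points, this forces {\rho_\gamma = F \circ T^\gamma - F}, i.e. {\rho} is a coboundary.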

Now, it turns out that one cannot derive the equation (4) directly from the given information (2). However, the left-hand side of (2) is additive in {\hat k}, so the right-hand side must be also. Manipulating this fact, we eventually arrive at

\displaystyle (\alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2}) \circ T^\gamma(x) = (\alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2})(x).

In other words, we don’t get to show that the left-hand side of (4) vanishes, but we do at least get to show that it is {\Gamma}-invariant. Now let us assume for sake of argument that the action of {\Gamma} is ergodic, which (ignoring issues about sets of measure zero) basically asserts that the only {\Gamma}-invariant functions are constant. So now we get a weaker version of (4), namely

\displaystyle \alpha_{\hat k_1 + \hat k_2}(x) - \alpha_{\hat k_1}(x) - \alpha_{\hat k_2}(x) = c_{\hat k_1, \hat k_2} \ \ \ \ \ (5)

for some constants {c_{\hat k_1, \hat k_2} \in \mathbf{T}}.

Now we need to eliminate the constants. This can be done by the following group-theoretic projection. Let {L^0({\bf X} \rightarrow {\bf T})} denote the space of concrete measurable maps {\alpha} from {{\bf X}} to {{\bf T}}, up to almost everywhere equivalence; this is an abelian group where the various terms in (5) naturally live. Inside this group we have the subgroup {{\bf T}} of constant functions (up to almost everywhere equivalence); this is where the right-hand side of (5) lives. Because {{\bf T}} is a divisible group, a standard application of Zorn’s lemma (a good exercise for those who are not acquainted with these things) shows that there exists a retraction {w: L^0({\bf X} \rightarrow {\bf T}) \rightarrow {\bf T}}, that is to say a group homomorphism that is the identity on the subgroup {{\bf T}}. We can use this retraction, or more precisely the complement {\alpha \mapsto \alpha - w(\alpha)}, to eliminate the constant in (5). Indeed, if we set

\displaystyle \tilde \alpha_{\hat k}(x) := \alpha_{\hat k}(x) - w(\alpha_{\hat k})

then from (5) we see that

\displaystyle \tilde \alpha_{\hat k_1 + \hat k_2}(x) - \tilde \alpha_{\hat k_1}(x) - \tilde \alpha_{\hat k_2}(x) = 0

while from (2) one has

\displaystyle \langle \hat k, \rho_\gamma(x) \rangle = \tilde \alpha_{\hat k} \circ T^\gamma(x) - \tilde \alpha_{\hat k}(x)

and now the previous strategy works with {\alpha_{\hat k}} replaced by {\tilde \alpha_{\hat k}}. This concludes the sketch of proof of Theorem 1.
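The equivalence asserted by this theorem can be brute-forced in a tiny toy model, which may help orient readers. The following sketch (entirely mine, not from the paper) takes {\Gamma = {\bf Z}} acting by shift on a three-point space and {K = ({\bf Z}/2)^2}, and checks exhaustively that a cocycle is a coboundary if and only if each of its character projections is:

```python
from itertools import product

# Toy model: X = Z/3 with the shift x -> x+1, K = Z/2 x Z/2. A Z-cocycle is
# determined by rho_1 : X -> K; it is a coboundary iff rho_1(x) = F(x+1) - F(x).
X = range(3)
K = [(i, j) for i in range(2) for j in range(2)]

def sub(a, b):
    # subtraction in K = Z/2 x Z/2
    return ((a[0] - b[0]) % 2, (a[1] - b[1]) % 2)

def is_coboundary(rho):
    # brute force over all F : X -> K
    return any(all(rho[x] == sub(F[(x + 1) % 3], F[x]) for x in X)
               for F in product(K, repeat=3))

def char_coboundary(rho, k_hat):
    # pairing <k_hat, k> lands in (1/2)Z/Z, represented here by Z/2
    pair = lambda k: (k_hat[0] * k[0] + k_hat[1] * k[1]) % 2
    return any(all(pair(rho[x]) == (F[(x + 1) % 3] - F[x]) % 2 for x in X)
               for F in product(range(2), repeat=3))

# Moore-Schmidt equivalence, verified over all 64 cocycles in this toy model
for rho in product(K, repeat=3):
    assert is_coboundary(rho) == all(char_coboundary(rho, k_hat) for k_hat in K)
```

Of course, in this finite setting both sides of the equivalence reduce (by telescoping around the cycle) to the condition {\sum_x \rho_1(x) = 0}, so the check is elementary; the content of the theorem lies in the measure-theoretic setting.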

In making the above argument rigorous, the hypotheses (i)-(iii) are used in several places. For instance, to reduce to the ergodic case one relies on the ergodic decomposition, which requires the hypothesis (ii). Also, most of the above equations only hold outside of a set of measure zero, and one needs the hypothesis (i) and the hypothesis (iii) (which is equivalent to {\hat K} being at most countable) to avoid the problem that an uncountable union of sets of measure zero could have positive measure (or fail to be measurable at all).

My co-author Asgar Jamneshan and I are working on a long-term project to extend many results in ergodic theory (such as the aforementioned Host-Kra structure theorem) to “uncountable” settings in which hypotheses analogous to (i)-(iii) are omitted; thus we wish to consider actions of uncountable groups, on spaces that are not standard Borel, and cocycles taking values in groups that are not metrisable. Such uncountable contexts naturally arise when trying to apply ergodic theory techniques to combinatorial problems (such as the inverse conjecture for the Gowers norms), as one often relies on the ultraproduct construction (or something similar) to generate an ergodic theory translation of these problems, and these constructions usually give “uncountable” objects rather than “countable” ones. (For instance, the ultraproduct of finite groups is a hyperfinite group, which is usually uncountable.) This paper marks the first step in this project by extending the Moore-Schmidt theorem to the uncountable setting.

If one simply drops the hypotheses (i)-(iii) and tries to prove the Moore-Schmidt theorem, several serious difficulties arise. We have already mentioned the loss of the ergodic decomposition and the possibility that one has to control an uncountable union of null sets. But there is in fact a more basic problem when one deletes (iii): the addition operation {+: K \times K \rightarrow K}, while still continuous, can fail to be measurable as a map from {(K \times K, {\mathcal B}(K) \otimes {\mathcal B}(K))} to {(K, {\mathcal B}(K))}! Thus for instance the sum of two measurable functions {F: X \rightarrow K} need not remain measurable, which makes even the very definition of a measurable cocycle or measurable coboundary problematic (or at least unnatural). This phenomenon is known as the Nedoma pathology. A standard example arises when {K} is the uncountable torus {{\mathbf T}^{{\bf R}}}, endowed with the product topology. Crucially, the Borel {\sigma}-algebra {{\mathcal B}(K)} generated by this uncountable product is not the product {{\mathcal B}(\mathbf{T})^{\otimes {\bf R}}} of the factor Borel {\sigma}-algebras (the discrepancy ultimately arises from the fact that topologies permit uncountable unions, but {\sigma}-algebras do not); relating to this, the product {\sigma}-algebra {{\mathcal B}(K) \otimes {\mathcal B}(K)} is not the same as the Borel {\sigma}-algebra {{\mathcal B}(K \times K)}, but is instead a strict sub-algebra. If the group operations on {K} were measurable, then the diagonal set

\displaystyle K^\Delta := \{ (k,k') \in K \times K: k = k' \} = \{ (k,k') \in K \times K: k - k' = 0 \}

would be measurable in {{\mathcal B}(K) \otimes {\mathcal B}(K)}. But it is an easy exercise in manipulation of {\sigma}-algebras to show that if {(X, {\mathcal X}), (Y, {\mathcal Y})} are any two measurable spaces and {E \subset X \times Y} is measurable in {{\mathcal X} \otimes {\mathcal Y}}, then the fibres {E_x := \{ y \in Y: (x,y) \in E \}} of {E} are contained in some countably generated subalgebra of {{\mathcal Y}}. Thus if {K^\Delta} were {{\mathcal B}(K) \otimes {\mathcal B}(K)}-measurable, then all the points of {K} would lie in a single countably generated {\sigma}-algebra. But the cardinality of such an algebra is at most {2^{\aleph_0}} while the cardinality of {K} is {2^{2^{\aleph_0}}}, and Cantor’s theorem then gives a contradiction.

To resolve this problem, we give {K} a coarser {\sigma}-algebra than the Borel {\sigma}-algebra, namely the Baire {\sigma}-algebra {{\mathcal B}^\otimes(K)}, thus coarsening the measurable space structure on {K = (K,{\mathcal B}(K))} to a new measurable space {K_\otimes := (K, {\mathcal B}^\otimes(K))}. In the case of compact Hausdorff abelian groups, {{\mathcal B}^{\otimes}(K)} can be defined as the {\sigma}-algebra generated by the characters {\hat k: K \rightarrow {\mathbf T}}; for more general compact abelian groups, one can define {{\mathcal B}^{\otimes}(K)} as the {\sigma}-algebra generated by all continuous maps into metric spaces. This {\sigma}-algebra is equal to {{\mathcal B}(K)} when {K} is metrisable but can be smaller for other {K}. With this measurable structure, {K_\otimes} becomes a measurable group; it seems that, once one leaves the metrisable world, {K_\otimes} is a better (or at least equally good) space than {K} to work with for analysis, as it avoids the Nedoma pathology. (For instance, from Plancherel’s theorem, we see that if {m_K} is the Haar probability measure on {K}, then {L^2(K,m_K) = L^2(K_\otimes,m_K)} (thus, every {K}-measurable set is equivalent modulo {m_K}-null sets to a {K_\otimes}-measurable set), so there is no damage to Plancherel caused by passing to the Baire {\sigma}-algebra.)

Passing to the Baire {\sigma}-algebra {K_\otimes} fixes the most severe problems with an uncountable Moore-Schmidt theorem, but one is still faced with an issue of having to potentially take an uncountable union of null sets. To avoid this sort of problem, we pass to the framework of abstract measure theory, in which we remove explicit mention of “points” and can easily delete all null sets at a very early stage of the formalism. In this setup, the category of concrete measurable spaces is replaced with the larger category of abstract measurable spaces, which we formally define as the opposite category of the category of {\sigma}-algebras (with Boolean algebra homomorphisms). Thus, we define an abstract measurable space to be an object of the form {{\mathcal X}^{\mathrm{op}}}, where {{\mathcal X}} is an (abstract) {\sigma}-algebra and {\mathrm{op}} is a formal placeholder symbol that signifies use of the opposite category, and an abstract measurable map {T: {\mathcal X}^{\mathrm{op}} \rightarrow {\mathcal Y}^{\mathrm{op}}} is an object of the form {(T^*)^{\mathrm{op}}}, where {T^*: {\mathcal Y} \rightarrow {\mathcal X}} is a Boolean algebra homomorphism and {\mathrm{op}} is again used as a formal placeholder; we call {T^*} the pullback map associated to {T}.  [UPDATE: It turns out that this definition of a measurable map led to technical issues.  In a forthcoming revision of the paper we also impose the requirement that the abstract measurable map be \sigma-complete (i.e., it respects countable joins).] The composition {S \circ T: {\mathcal X}^{\mathrm{op}} \rightarrow {\mathcal Z}^{\mathrm{op}}} of two abstract measurable maps {T: {\mathcal X}^{\mathrm{op}} \rightarrow {\mathcal Y}^{\mathrm{op}}}, {S: {\mathcal Y}^{\mathrm{op}} \rightarrow {\mathcal Z}^{\mathrm{op}}} is defined by the formula {S \circ T := (T^* \circ S^*)^{\mathrm{op}}}, or equivalently {(S \circ T)^* = T^* \circ S^*}.
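The contravariant bookkeeping in this definition can be illustrated with a toy model on finite sets (purely illustrative; the class and function names are mine): an abstract map is stored via its pullback, and composition reverses the order of the pullbacks, exactly as in the formula {(S \circ T)^* = T^* \circ S^*}.

```python
class AbstractMap:
    # wraps the pullback T*: subsets of the codomain -> subsets of the domain
    def __init__(self, pullback):
        self.pullback = pullback

    def compose(self, other):
        # (self o other)* = other* o self*: composition in the opposite category
        return AbstractMap(lambda E: other.pullback(self.pullback(E)))

def from_concrete(f, domain):
    # identify a concrete map with its abstract counterpart via E -> f^{-1}(E)
    return AbstractMap(lambda E: {x for x in domain if f(x) in E})

T = from_concrete(lambda x: x % 2, {0, 1, 2, 3})  # T: {0,...,3} -> {0,1}
S = from_concrete(lambda y: y + 1, {0, 1})        # S: {0,1} -> {1,2}
ST = S.compose(T)                                 # S o T: x -> (x % 2) + 1

# the pullback of S o T agrees with the concrete preimage of the composite map
assert ST.pullback({1}) == {0, 2}
assert ST.pullback({2}) == {1, 3}
```

In the actual paper the "subsets" are of course replaced by elements of an abstract {\sigma}-algebra, with no underlying point set at all.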

Every concrete measurable space {X = (X,{\mathcal X})} can be identified with an abstract counterpart {{\mathcal X}^{op}}, and similarly every concrete measurable map {T: X \rightarrow Y} can be identified with an abstract counterpart {(T^*)^{op}}, where {T^*: {\mathcal Y} \rightarrow {\mathcal X}} is the pullback map {T^* E := T^{-1}(E)}. Thus the category of concrete measurable spaces can be viewed as a subcategory of the category of abstract measurable spaces. The advantage of working in the abstract setting is that it gives us access to more spaces that could not be directly defined in the concrete setting. Most importantly for us, we have a new abstract space, the opposite measure algebra {X_\mu} of {X}, defined as {({\bf X}/{\bf N})^{\mathrm{op}}} where {{\bf N}} is the ideal of null sets in {{\bf X}}. Informally, {X_\mu} is the space {X} with all the null sets removed; there is a canonical abstract embedding map {\iota: X_\mu \rightarrow X}, which allows one to convert any concrete measurable map {f: X \rightarrow Y} into an abstract one {[f]: X_\mu \rightarrow Y}. One can then define the notion of an abstract action, abstract cocycle, and abstract coboundary by replacing every occurrence of the category of concrete measurable spaces with their abstract counterparts, and replacing {X} with the opposite measure algebra {X_\mu}; see the paper for details. Our main theorem is then

Theorem 2 (Uncountable Moore-Schmidt theorem) Let {\Gamma} be a discrete group acting abstractly on a {\sigma}-finite measure space {X}. Let {K} be a compact Hausdorff abelian group. Then a {K_\otimes}-valued abstract measurable cocycle {\rho = (\rho_\gamma)_{\gamma \in \Gamma}} is an abstract coboundary if and only if for each character {\hat k \in \hat K}, the {\mathbf{T}}-valued cocycles {\langle \hat k, \rho \rangle = ( \langle \hat k, \rho_\gamma \rangle )_{\gamma \in \Gamma}} are abstract coboundaries.

With the abstract formalism, the proof of the uncountable Moore-Schmidt theorem is almost identical to the countable one (in fact we were able to make some simplifications, such as avoiding the use of the ergodic decomposition). A key tool is what we call a “conditional Pontryagin duality” theorem, which asserts that if one has an abstract measurable map {\alpha_{\hat k}: X_\mu \rightarrow {\bf T}} for each {\hat k \in \hat K} obeying the identity { \alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2} = 0} for all {\hat k_1,\hat k_2 \in \hat K}, then there is an abstract measurable map {F: X_\mu \rightarrow K_\otimes} such that {\alpha_{\hat k} = \langle \hat k, F \rangle} for all {\hat k \in \hat K}. This is derived from the usual Pontryagin duality and some other tools, most notably the completeness of the {\sigma}-algebra of {X_\mu}, and the Sikorski extension theorem.

We feel that it is natural to stay within the abstract measure theory formalism whenever dealing with uncountable situations. However, it is still an interesting question as to when one can guarantee that the abstract objects constructed in this formalism are representable by concrete analogues. The basic questions in this regard are:

  • (i) Suppose one has an abstract measurable map {f: X_\mu \rightarrow Y} into a concrete measurable space. Does there exist a representation of {f} by a concrete measurable map {\tilde f: X \rightarrow Y}? Is it unique up to almost everywhere equivalence?
  • (ii) Suppose one has a concrete cocycle that is an abstract coboundary. When can it be represented by a concrete coboundary?

For (i) the answer is somewhat interesting (as I learned after posing this MathOverflow question):

  • If {Y} does not separate points, or is not compact metrisable or Polish, there can be counterexamples to uniqueness. If {Y} is not compact or Polish, there can be counterexamples to existence.
  • If {Y} is a compact metric space or a Polish space, then one always has existence and uniqueness.
  • If {Y} is a compact Hausdorff abelian group, one always has existence.
  • If {X} is a complete measure space, then one always has existence (from a theorem of Maharam).
  • If {X} is the unit interval with the Borel {\sigma}-algebra and Lebesgue measure, then one has existence for all compact Hausdorff {Y} assuming the continuum hypothesis (from a theorem of von Neumann) but existence can fail under other extensions of ZFC (from a theorem of Shelah, using the method of forcing).
  • For more general {X}, existence for all compact Hausdorff {Y} is equivalent to the existence of a lifting from the {\sigma}-algebra {\mathcal{X}/\mathcal{N}} to {\mathcal{X}} (or, in the language of abstract measurable spaces, the existence of an abstract retraction from {X} to {X_\mu}).
  • It is a long-standing open question (posed for instance by Fremlin) whether it is relatively consistent with ZFC that existence holds whenever {Y} is compact Hausdorff.

Our understanding of (ii) is much less complete:

  • If {K} is metrisable, the answer is “always” (which among other things establishes the countable Moore-Schmidt theorem as a corollary of the uncountable one).
  • If {\Gamma} is at most countable and {X} is a complete measure space, then the answer is again “always”.

In view of the answers to (i), I would not be surprised if the full answer to (ii) was also sensitive to axioms of set theory. However, such set theoretic issues seem to be almost completely avoided if one sticks with the abstract formalism throughout; they only arise when trying to pass back and forth between the abstract and concrete categories.

I’ve just uploaded to the arXiv my paper “Almost all Collatz orbits attain almost bounded values“, submitted to the proceedings of the Forum of Mathematics, Pi. In this paper I returned to the topic of the notorious Collatz conjecture (also known as the {3x+1} conjecture), which I previously discussed in this blog post. This conjecture can be phrased as follows. Let {{\bf N}+1 = \{1,2,\dots\}} denote the positive integers (with {{\bf N} =\{0,1,2,\dots\}} the natural numbers), and let {\mathrm{Col}: {\bf N}+1 \rightarrow {\bf N}+1} be the map defined by setting {\mathrm{Col}(N)} equal to {3N+1} when {N} is odd and {N/2} when {N} is even. Let {\mathrm{Col}_{\min}(N) := \inf_{n \in {\bf N}} \mathrm{Col}^n(N)} be the minimal element of the Collatz orbit {N, \mathrm{Col}(N), \mathrm{Col}^2(N),\dots}. Then we have

Conjecture 1 (Collatz conjecture) One has {\mathrm{Col}_{\min}(N)=1} for all {N \in {\bf N}+1}.
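The map and the quantity {\mathrm{Col}_{\min}} are easy to experiment with. Here is a minimal sketch (the function names are mine); it iterates until the orbit reaches {1}, which is empirically valid far beyond any range one can feasibly test:

```python
def col(n):
    # the Collatz map: 3N+1 for odd N, N/2 for even N
    return 3 * n + 1 if n % 2 else n // 2

def col_min(n):
    # minimal element of the Collatz orbit of n, assuming the orbit reaches 1
    m = n
    while n != 1:
        n = col(n)
        m = min(m, n)
    return m

assert col(3) == 10 and col(10) == 5
# Conjecture 1 holds comfortably in this range
assert all(col_min(n) == 1 for n in range(1, 10000))
```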

Establishing the conjecture for all {N} remains out of reach of current techniques (for instance, as discussed in the previous blog post, it is basically at least as difficult as Baker’s theorem, all known proofs of which are quite difficult). However, the situation is more promising if one is willing to settle for results which only hold for “most” {N} in some sense. For instance, it is a result of Krasikov and Lagarias that

\displaystyle  \# \{ N \leq x: \mathrm{Col}_{\min}(N) = 1 \} \gg x^{0.84}

for all sufficiently large {x}. In another direction, it was shown by Terras that for almost all {N} (in the sense of natural density), one has {\mathrm{Col}_{\min}(N) < N}. This was then improved by Allouche to {\mathrm{Col}_{\min}(N) < N^\theta} for almost all {N} and any fixed {\theta > 0.869}, and extended later by Korec to cover all {\theta > \frac{\log 3}{\log 4} \approx 0.7924}. In this paper we obtain the following further improvement (at the cost of weakening natural density to logarithmic density):

Theorem 2 Let {f: {\bf N}+1 \rightarrow {\bf R}} be any function with {\lim_{N \rightarrow \infty} f(N) = +\infty}. Then we have {\mathrm{Col}_{\min}(N) < f(N)} for almost all {N} (in the sense of logarithmic density).

Thus for instance one has {\mathrm{Col}_{\min}(N) < \log\log\log\log N} for almost all {N} (in the sense of logarithmic density).

The difficulty here is that one usually only expects to establish “local-in-time” results that control the evolution {\mathrm{Col}^n(N)} for times {n} that only get as large as a small multiple {c \log N} of {\log N}; the aforementioned results of Terras, Allouche, and Korec, for instance, are of this type. However, to get {\mathrm{Col}^n(N)} all the way down to {f(N)} one needs something more like an “(almost) global-in-time” result, where the evolution remains under control for so long that the orbit has nearly reached the bounded state {N=O(1)}.

However, as observed by Bourgain in the context of nonlinear Schrödinger equations, one can iterate “almost sure local wellposedness” type results (which give local control for almost all initial data from a given distribution) into “almost sure (almost) global wellposedness” type results if one is fortunate enough to draw one’s data from an invariant measure for the dynamics. To illustrate the idea, let us take Korec’s aforementioned result that if {\theta > \frac{\log 3}{\log 4}} one picks at random an integer {N} from a large interval {[1,x]}, then in most cases, the orbit of {N} will eventually move into the interval {[1,x^{\theta}]}. Similarly, if one picks an integer {M} at random from {[1,x^\theta]}, then in most cases, the orbit of {M} will eventually move into {[1,x^{\theta^2}]}. It is then tempting to concatenate the two statements and conclude that for most {N} in {[1,x]}, the orbit will eventually move into {[1,x^{\theta^2}]}. Unfortunately, this argument does not quite work, because by the time the orbit from a randomly drawn {N \in [1,x]} reaches {[1,x^\theta]}, the distribution of the final value is unlikely to be close to being uniformly distributed on {[1,x^\theta]}, and in particular could potentially concentrate almost entirely in the exceptional set of {M \in [1,x^\theta]} that do not make it into {[1,x^{\theta^2}]}. The point here is that the uniform measure on {[1,x]} is not transported by Collatz dynamics to anything resembling the uniform measure on {[1,x^\theta]}.

So, one now needs to locate a measure which has better invariance properties under the Collatz dynamics. It turns out to be technically convenient to work with a standard acceleration of the Collatz map known as the Syracuse map {\mathrm{Syr}: 2{\bf N}+1 \rightarrow 2{\bf N}+1}, defined on the odd numbers {2{\bf N}+1 = \{1,3,5,\dots\}} by setting {\mathrm{Syr}(N) = (3N+1)/2^a}, where {2^a} is the largest power of {2} that divides {3N+1}. (The advantage of using the Syracuse map over the Collatz map is that it performs precisely one multiplication by {3} at each iteration step, which makes the map better behaved when performing “{3}-adic” analysis.)
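The Syracuse map is just as easy to implement as the Collatz map; a short sketch (the function name is mine), together with one structural observation that one can already verify at this point — since {3N+1 \equiv 1 \pmod 3} and dividing by powers of {2} preserves coprimality to {3}, the image of the map avoids the multiples of {3}:

```python
def syr(n):
    # the Syracuse map on odd n: strip all factors of 2 from 3n+1
    assert n % 2 == 1
    m = 3 * n + 1
    while m % 2 == 0:
        m //= 2
    return m

assert syr(1) == 1 and syr(3) == 5 and syr(5) == 1
# the image of the Syracuse map avoids the multiples of 3
assert all(syr(n) % 3 != 0 for n in range(1, 10001, 2))
```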

When viewed {3}-adically, we soon see that iterations of the Syracuse map become somewhat irregular. Most obviously, {\mathrm{Syr}(N)} is never divisible by {3}. A little less obviously, {\mathrm{Syr}(N)} is twice as likely to equal {2} mod {3} as it is to equal {1} mod {3}. This is because for a randomly chosen odd {\mathbf{N}}, the number of times {\mathbf{a}} that {2} divides {3\mathbf{N}+1} can be seen to have a geometric distribution of mean {2} – it equals any given value {a \in{\bf N}+1} with probability {2^{-a}}. Such a geometric random variable is twice as likely to be odd as to be even, which is what gives the above irregularity. There are similar irregularities modulo higher powers of {3}. For instance, one can compute that for large random odd {\mathbf{N}}, {\mathrm{Syr}^2(\mathbf{N}) \hbox{ mod } 9} will take the residue classes {0,1,2,3,4,5,6,7,8 \hbox{ mod } 9} with probabilities

\displaystyle  0, \frac{8}{63}, \frac{16}{63}, 0, \frac{11}{63}, \frac{4}{63}, 0, \frac{2}{63}, \frac{22}{63}

respectively. More generally, for any {n}, {\mathrm{Syr}^n(\mathbf{N}) \hbox{ mod } 3^n} will be distributed according to the law of a random variable {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} on {{\bf Z}/3^n{\bf Z}} that we call a Syracuse random variable, and which can be described explicitly as

\displaystyle  \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = 2^{-\mathbf{a}_1} + 3^1 2^{-\mathbf{a}_1-\mathbf{a}_2} + \dots + 3^{n-1} 2^{-\mathbf{a}_1-\dots-\mathbf{a}_n} \hbox{ mod } 3^n, \ \ \ \ \ (1)

where {\mathbf{a}_1,\dots,\mathbf{a}_n} are iid copies of a geometric random variable of mean {2}.
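The mod {9} probabilities displayed above can be verified exactly. Since {2^{-1} \equiv 5 \pmod 9}, the quantity {2^{-a} \bmod 9 = 5^a \bmod 9} depends only on {a \bmod 6}, and a geometric variable with {\mathbb{P}(\mathbf{a}=a) = 2^{-a}} lands in the residue class {r \bmod 6} with probability exactly {2^{6-r}/63}; so the law of {\mathbf{Syrac}({\bf Z}/9{\bf Z})} reduces to a finite exact computation. A sketch (all names are mine):

```python
from fractions import Fraction

# P(a = r mod 6) for a geometric variable with P(a) = 2^{-a}, a >= 1:
# sum_{k>=0} 2^{-(r+6k)} = 2^{6-r}/63
p_mod6 = {r: Fraction(2 ** (6 - r), 63) for r in range(1, 7)}

# 2^{-a} mod 9 equals 5^a mod 9 (5 is the inverse of 2 mod 9), and 5 has
# order 6 mod 9, so the law of Syrac(Z/9Z) is a finite exact sum
dist = [Fraction(0)] * 9
for r1, q1 in p_mod6.items():
    for r2, q2 in p_mod6.items():
        y = (pow(5, r1, 9) + 3 * pow(5, r1 + r2, 9)) % 9
        dist[y] += q1 * q2

# this reproduces the table of probabilities in the text
expected = [Fraction(c, 63) for c in [0, 8, 16, 0, 11, 4, 0, 2, 22]]
assert dist == expected
```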

In view of this, any proposed “invariant” (or approximately invariant) measure (or family of measures) for the Syracuse dynamics should take this {3}-adic irregularity of distribution into account. It turns out that one can use the Syracuse random variables {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} to construct such a measure, but only if these random variables stabilise in the limit {n \rightarrow \infty} in a certain total variation sense. More precisely, in the paper we establish the estimate

\displaystyle  \sum_{Y \in {\bf Z}/3^n{\bf Z}} | \mathbb{P}( \mathbf{Syrac}({\bf Z}/3^n{\bf Z})=Y) - 3^{m-n} \mathbb{P}( \mathbf{Syrac}({\bf Z}/3^m{\bf Z})=Y \hbox{ mod } 3^m)| \ \ \ \ \ (2)

\displaystyle  \ll_A m^{-A}

for any {1 \leq m \leq n} and any {A > 0}. This type of stabilisation is plausible from entropy heuristics – the tuple {(\mathbf{a}_1,\dots,\mathbf{a}_n)} of geometric random variables that generates {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} has Shannon entropy {n \log 4}, which is significantly larger than the total entropy {n \log 3} of the uniform distribution on {{\bf Z}/3^n{\bf Z}}, so we expect a lot of “mixing” and “collision” to occur when converting the tuple {(\mathbf{a}_1,\dots,\mathbf{a}_n)} to {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}; these heuristics can be supported by numerics (which I was able to work out up to about {n=10} before running into memory and CPU issues), but it turns out to be surprisingly delicate to make this precise.

A first hint of how to proceed comes from the elementary number theory observation (easily proven by induction) that the rational numbers

\displaystyle  2^{-a_1} + 3^1 2^{-a_1-a_2} + \dots + 3^{n-1} 2^{-a_1-\dots-a_n}

are all distinct as {(a_1,\dots,a_n)} vary over tuples in {({\bf N}+1)^n}. Unfortunately, the process of reducing mod {3^n} creates a lot of collisions (as must happen from the pigeonhole principle); however, by a simple “Lefschetz principle” type argument one can at least show that the reductions

\displaystyle  2^{-a_1} + 3^1 2^{-a_1-a_2} + \dots + 3^{m-1} 2^{-a_1-\dots-a_m} \hbox{ mod } 3^n \ \ \ \ \ (3)

are mostly distinct for “typical” {a_1,\dots,a_m} (as drawn using the geometric distribution) as long as {m} is a bit smaller than {\frac{\log 3}{\log 4} n} (basically because the rational number appearing in (3) then typically takes a form like {M/2^{2m}} with {M} an integer between {0} and {3^n}). This analysis of the component (3) of (1) is already enough to get quite a bit of spreading on { \mathbf{Syrac}({\bf Z}/3^n{\bf Z})} (roughly speaking, when the argument is optimised, it shows that this random variable cannot concentrate in any subset of {{\bf Z}/3^n{\bf Z}} of density less than {n^{-C}} for some large absolute constant {C>0}). To get from this to a stabilisation property (2) we have to exploit the mixing effects of the remaining portion of (1) that does not come from (3). After some standard Fourier-analytic manipulations, matters then boil down to obtaining non-trivial decay of the characteristic function of {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}, and more precisely to showing that

\displaystyle  \mathbb{E} e^{-2\pi i \xi \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) / 3^n} \ll_A n^{-A} \ \ \ \ \ (4)

for any {A > 0} and any {\xi \in {\bf Z}/3^n{\bf Z}} that is not divisible by {3}.
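Two small experiments support the claims of the last two paragraphs: the rationals above are indeed collision-free over small tuples, and the characteristic function already exhibits substantial cancellation at {n=2}, using the exact mod {9} law displayed earlier in the post. A sketch (all names are mine; this illustrates, but of course does not prove, (4)):

```python
from fractions import Fraction
from itertools import product
import cmath

# distinctness of 2^{-a_1} + 3*2^{-a_1-a_2} + ... over small tuples (a_1,...,a_n)
def rational_value(tup):
    s, e = Fraction(0), 0
    for i, a in enumerate(tup):
        e += a
        s += Fraction(3 ** i, 2 ** e)
    return s

for n in range(1, 4):
    vals = [rational_value(t) for t in product(range(1, 6), repeat=n)]
    assert len(vals) == len(set(vals))  # no collisions before reducing mod 3^n

# cancellation in the characteristic function at n = 2, xi = 1, using the
# exact law of Syrac(Z/9Z) (probabilities in 63rds, from the table above)
probs = {1: 8, 2: 16, 4: 11, 5: 4, 7: 2, 8: 22}
char = sum(p * cmath.exp(-2j * cmath.pi * y / 9) for y, p in probs.items()) / 63
assert abs(char) < 0.3  # substantial cancellation already at n = 2
```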

If the random variable (1) were the sum of independent terms, one could express its characteristic function as something like a Riesz product, which would be straightforward to estimate well. Unfortunately, the terms in (1) are loosely coupled together, and so the characteristic function does not immediately factor into a Riesz product. However, if one groups adjacent terms in (1) together, one can rewrite it (assuming {n} is even for sake of discussion) as

\displaystyle  (2^{\mathbf{a}_2} + 3) 2^{-\mathbf{b}_1} + (2^{\mathbf{a}_4}+3) 3^2 2^{-\mathbf{b}_1-\mathbf{b}_2} + \dots

\displaystyle  + (2^{\mathbf{a}_n}+3) 3^{n-2} 2^{-\mathbf{b}_1-\dots-\mathbf{b}_{n/2}} \hbox{ mod } 3^n

where {\mathbf{b}_j := \mathbf{a}_{2j-1} + \mathbf{a}_{2j}}. The point here is that after conditioning on the {\mathbf{b}_1,\dots,\mathbf{b}_{n/2}} to be fixed, the random variables {\mathbf{a}_2, \mathbf{a}_4,\dots,\mathbf{a}_n} remain independent (though the distribution of each {\mathbf{a}_{2j}} depends on the value that we conditioned {\mathbf{b}_j} to take), and so the above expression is a conditional sum of independent random variables. This lets one express the characteristic function of (1) as an averaged Riesz product. One can use this to establish the bound (4) as long as one can show that the expression

\displaystyle  \frac{\xi 3^{2j-2} (2^{-\mathbf{b}_1-\dots-\mathbf{b}_j+1} \mod 3^n)}{3^n}

is not close to an integer for a moderately large number ({\gg A \log n}, to be precise) of indices {j = 1,\dots,n/2}. (Actually, for technical reasons we have to also restrict to those {j} for which {\mathbf{b}_j=3}, but let us ignore this detail here.) To put it another way, if we let {B} denote the set of pairs {(j,l)} for which

\displaystyle  \frac{\xi 3^{2j-2} (2^{-l+1} \mod 3^n)}{3^n} \in [-\varepsilon,\varepsilon] + {\bf Z},

we have to show that (with overwhelming probability) the random walk

\displaystyle (1,\mathbf{b}_1), (2, \mathbf{b}_1 + \mathbf{b}_2), \dots, (n/2, \mathbf{b}_1+\dots+\mathbf{b}_{n/2})

(which we view as a two-dimensional renewal process) contains at least a few points lying outside of {B}.

A little bit of elementary number theory and combinatorics allows one to describe the set {B} as the union of “triangles” with a certain non-zero separation between them. If the triangles were all fairly small, then one expects the renewal process to visit at least one point outside of {B} after passing through any given such triangle, and it then becomes relatively easy to show that the renewal process usually has the required number of points outside of {B}. The most difficult case is when the renewal process passes through a particularly large triangle in {B}. However, it turns out that large triangles enjoy particularly good separation properties, and in particular after passing through a large triangle one is likely to encounter nothing but small triangles for a while. After making these heuristics more precise, one is finally able to get enough points on the renewal process outside of {B} that one can finish the proof of (4), and thus Theorem 2.

Joni Teräväinen and I have just uploaded to the arXiv our paper “The structure of correlations of multiplicative functions at almost all scales, with applications to the Chowla and Elliott conjectures“. This is a sequel to our previous paper that studied logarithmic correlations of the form

\displaystyle  f(a) := \lim^*_{x \rightarrow \infty} \frac{1}{\log \omega(x)} \sum_{x/\omega(x) \leq n \leq x} \frac{g_1(n+ah_1) \dots g_k(n+ah_k)}{n},

where {g_1,\dots,g_k} were bounded multiplicative functions, {h_1,\dots,h_k} were fixed shifts, {1 \leq \omega(x) \leq x} was a quantity going off to infinity as {x \rightarrow \infty}, and {\lim^*} was a generalised limit functional. Our main technical result asserted that these correlations were necessarily the uniform limit of periodic functions {f_i}. Furthermore, if {g_1 \dots g_k} (weakly) pretended to be a Dirichlet character {\chi}, then the {f_i} could be chosen to be {\chi}-isotypic in the sense that {f_i(ab) = f_i(a) \chi(b)} whenever {a,b} are integers with {b} coprime to the periods of {\chi} and {f_i}; otherwise, if {g_1 \dots g_k} did not weakly pretend to be any Dirichlet character {\chi}, then {f} vanished completely. This was then used to verify several cases of the logarithmically averaged Elliott and Chowla conjectures.

The purpose of this paper was to investigate the extent to which the methods could be extended to non-logarithmically averaged settings. For our main technical result, we now considered the unweighted averages

\displaystyle  f_d(a) := \lim^*_{x \rightarrow \infty} \frac{1}{x/d} \sum_{n \leq x/d} g_1(n+ah_1) \dots g_k(n+ah_k),

where {d>1} is an additional parameter. Our main result was now as follows. If {g_1 \dots g_k} did not weakly pretend to be a twisted Dirichlet character {n \mapsto \chi(n) n^{it}}, then {f_d(a)} converged to zero on (doubly logarithmic) average as {d \rightarrow \infty}. If instead {g_1 \dots g_k} did pretend to be such a twisted Dirichlet character, then {f_d(a) d^{it}} converged on (doubly logarithmic) average to a limit {f(a)} of {\chi}-isotypic functions {f_i}. Thus, roughly speaking, one has the approximation

\displaystyle  \lim^*_{x \rightarrow \infty} \frac{1}{x/d} \sum_{n \leq x/d} g_1(n+ah_1) \dots g_k(n+ah_k) \approx f(a) d^{-it}

for most {d}.

Informally, this says that at almost all scales {x} (where “almost all” means “outside of a set of logarithmic density zero”), the non-logarithmic averages behave much like their logarithmic counterparts except for a possible additional twisting by an Archimedean character {d \mapsto d^{it}} (which interacts with the Archimedean parameter {d} in much the same way that the Dirichlet character {\chi} interacts with the non-Archimedean parameter {a}). One consequence of this is that most of the recent results on the logarithmically averaged Chowla and Elliott conjectures can now be extended to their non-logarithmically averaged counterparts, so long as one excludes a set of exceptional scales {x} of logarithmic density zero. For instance, the Chowla conjecture

\displaystyle  \lim_{x \rightarrow\infty} \frac{1}{x} \sum_{n \leq x} \lambda(n+h_1) \dots \lambda(n+h_k) = 0

is now established for {k} either odd or equal to {2}, so long as one excludes an exceptional set of scales.
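As a purely numerical illustration of the {k=2} case (this proves nothing, and the helper names are mine), one can sieve the Liouville function and observe that the normalised pair correlation is already small at a modest scale such as {x = 10^5}:

```python
def liouville_sieve(limit):
    # lambda(n) = (-1)^Omega(n), computed via smallest prime factors
    spf = list(range(limit + 1))
    for i in range(2, int(limit ** 0.5) + 1):
        if spf[i] == i:  # i is prime
            for j in range(i * i, limit + 1, i):
                if spf[j] == j:
                    spf[j] = i
    lam = [1] * (limit + 1)
    for n in range(2, limit + 1):
        lam[n] = -lam[n // spf[n]]
    return lam

x = 100_000
lam = liouville_sieve(x + 2)
avg = sum(lam[n] * lam[n + 2] for n in range(1, x + 1)) / x
assert abs(avg) < 0.1  # consistent with the k = 2 case of Chowla
```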

In the logarithmically averaged setup, the main idea was to combine two very different pieces of information on {f(a)}. The first, coming from recent results in ergodic theory, was to show that {f(a)} was well approximated in some sense by a nilsequence. The second was to use the “entropy decrement argument” to obtain an approximate isotypy property of the form

\displaystyle  f(a) g_1 \dots g_k(p)\approx f(ap)

for “most” primes {p} and integers {a}. Combining the two facts, one eventually finds that only the almost periodic components of the nilsequence are relevant.

In the current situation, each {a \mapsto f_d(a)} is approximated by a nilsequence, but the nilsequence can vary with {d} (although there is some useful “Lipschitz continuity” of this nilsequence with respect to the {d} parameter). Meanwhile, the entropy decrement argument gives an approximation basically of the form

\displaystyle  f_{dp}(a) g_1 \dots g_k(p)\approx f_d(ap)

for “most” {d,p,a}. The arguments then proceed largely as in the logarithmically averaged case. A key lemma to handle the dependence on the new parameter {d} is the following cohomological statement: if one has a map {\alpha: (0,+\infty) \rightarrow S^1} that is a quasimorphism in the sense that {\alpha(xy) = \alpha(x) \alpha(y) + O(\varepsilon)} for all {x,y \in (0,+\infty)} and some small {\varepsilon}, then there exists a real number {t} such that {\alpha(x) = x^{it} + O(\varepsilon)} for all {x \in (0,+\infty)}. This is achieved by applying a standard “cocycle averaging argument” to the cocycle {(x,y) \mapsto \alpha(xy) \alpha(x)^{-1} \alpha(y)^{-1}}.

It would of course be desirable to not have the set of exceptional scales. We only know of one (implausible) scenario in which we can do this, namely when one has far fewer (in particular, subexponentially many) sign patterns for (say) the Liouville function than predicted by the Chowla conjecture. In this scenario (roughly analogous to the “Siegel zero” scenario in multiplicative number theory), the entropy of the Liouville sign patterns is so small that the entropy decrement argument becomes powerful enough to control all scales rather than almost all scales. On the other hand, this scenario seems to be self-defeating, in that it allows one to establish a large number of cases of the Chowla conjecture, and the full Chowla conjecture is inconsistent with having unusually few sign patterns. Still, it hints that future work in this direction may need to split into “low entropy” and “high entropy” cases, in analogy to how many arguments in multiplicative number theory have to split into the “Siegel zero” and “no Siegel zero” cases.

This coming fall quarter, I am teaching a class on topics in the mathematical theory of incompressible fluid equations, focusing particularly on the incompressible Euler and Navier-Stokes equations. These two equations are by no means the only equations used to model fluids, but I will restrict attention to them in this course to keep the scope manageable. I have not fully decided on the choice of topics to cover in this course, but I would probably begin with some core topics such as local well-posedness theory and blowup criteria, conservation laws, and construction of weak solutions, then move on to some topics such as boundary layers and the Prandtl equations, the Euler-Poincare-Arnold interpretation of the Euler equations as an infinite dimensional geodesic flow, and some discussion of the Onsager conjecture. I will probably also continue on to more advanced and recent topics in the winter quarter.

In this initial set of notes, we begin by reviewing the physical derivation of the Euler and Navier-Stokes equations from the first principles of Newtonian mechanics, and specifically from Newton’s famous three laws of motion. Strictly speaking, this derivation is not needed for the mathematical analysis of these equations, which can be viewed if one wishes as an arbitrarily chosen system of partial differential equations without any physical motivation; however, I feel that the derivation sheds some insight and intuition on these equations, and is also worth knowing on purely intellectual grounds regardless of its mathematical consequences. I also find it instructive to actually see the journey from Newton’s law

\displaystyle F = ma

to the seemingly rather different-looking law

\displaystyle \partial_t u + (u \cdot \nabla) u = -\nabla p + \nu \Delta u

\displaystyle \nabla \cdot u = 0

for incompressible Navier-Stokes (or, if one drops the viscosity term {\nu \Delta u}, the Euler equations).

Our discussion in this set of notes is physical rather than mathematical, and so we will not be working at mathematical levels of rigour and precision. In particular we will be fairly casual about interchanging summations, limits, and integrals, we will manipulate approximate identities {X \approx Y} as if they were exact identities (e.g., by differentiating both sides of the approximate identity), and we will not attempt to verify any regularity or convergence hypotheses in the expressions being manipulated. (The same holds for the exercises in this text, which also do not need to be justified at mathematical levels of rigour.) Of course, once we resume the mathematical portion of this course in subsequent notes, such issues will be an important focus of careful attention. This is a basic division of labour in mathematical modeling: non-rigorous heuristic reasoning is used to derive a mathematical model from physical (or other “real-life”) principles, but once a precise model is obtained, the analysis of that model should be completely rigorous if at all possible (even if this requires applying the model to regimes which do not correspond to the original physical motivation of that model). See the discussion by John Ball quoted at the end of these slides of Gero Friesecke for an expansion of these points.

Note: our treatment here will differ slightly from that presented in many fluid mechanics texts, in that it will emphasise first-principles derivations from many-particle systems, rather than relying on bulk laws of physics, such as the laws of thermodynamics, which we will not cover here. (However, the derivations from bulk laws tend to be more robust, in that they are not as reliant on assumptions about the particular interactions between particles. In particular, the physical hypotheses we assume in this post are probably quite a bit stronger than the minimal assumptions needed to justify the Euler or Navier-Stokes equations, which can hold even in situations in which one or more of the hypotheses assumed here break down.)

Read the rest of this entry »

Let {(X,T,\mu)} be a measure-preserving system – a probability space {(X,\mu)} equipped with a measure-preserving translation {T} (which for simplicity of discussion we shall assume to be invertible). We will informally think of two points {x,y} in this space as being “close” if {y = T^n x} for some {n} that is not too large; this allows one to distinguish between “local” structure at a point {x} (in which one only looks at nearby points {T^n x} for moderately large {n}) and “global” structure (in which one looks at the entire space {X}). The local/global distinction is also known as the time-averaged/space-averaged distinction in ergodic theory.

A measure-preserving system is said to be ergodic if all the invariant sets are either zero measure or full measure. An equivalent form of this statement is that any measurable function {f: X \rightarrow {\bf R}} which is locally essentially constant in the sense that {f(Tx) = f(x)} for {\mu}-almost every {x}, is necessarily globally essentially constant in the sense that there is a constant {c} such that {f(x) = c} for {\mu}-almost every {x}. A basic consequence of ergodicity is the mean ergodic theorem: if {f \in L^2(X,\mu)}, then the averages {x \mapsto \frac{1}{N} \sum_{n=1}^N f(T^n x)} converge in {L^2} norm to the mean {\int_X f\ d\mu}. (The mean ergodic theorem also applies to other {L^p} spaces with {1 < p < \infty}, though it is usually proven first in the Hilbert space {L^2}.) Informally: in ergodic systems, time averages are asymptotically equal to space averages. Specialising to the case of indicator functions, this implies in particular that {\frac{1}{N} \sum_{n=1}^N \mu( E \cap T^n E)} converges to {\mu(E)^2} for any measurable set {E}.
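As a quick numerical illustration of the mean ergodic theorem, one can experiment with the irrational circle rotation {Tx = x + \alpha \hbox{ mod } 1}, which is ergodic (indeed uniquely ergodic). The following minimal sketch uses an arbitrary choice of rotation number, starting point, and test function; it exhibits the time average converging to the space average.

```python
import math

# Irrational circle rotation T x = x + alpha (mod 1), an ergodic system.
alpha = math.sqrt(2) - 1                # arbitrary irrational rotation number
x0 = 0.123                              # arbitrary starting point

def f(x):                               # arbitrary test function; its space
    return math.cos(2 * math.pi * x)    # average over [0,1] is 0

# Time average (1/N) sum_{n=1}^N f(T^n x0)
N = 100000
time_avg = sum(f((x0 + n * alpha) % 1.0) for n in range(1, N + 1)) / N
space_avg = 0.0
print(abs(time_avg - space_avg))        # small, as the ergodic theorem predicts
```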

In this short note I would like to use the mean ergodic theorem to show that ergodic systems also have the property that “somewhat locally constant” functions are necessarily “somewhat globally constant”; this is not a deep observation, and is probably already in the literature, but I found it a cute statement that I had not previously seen. More precisely:

Corollary 1 Let {(X,T,\mu)} be an ergodic measure-preserving system, and let {f: X \rightarrow {\bf R}} be measurable. Suppose that

\displaystyle \limsup_{N \rightarrow \infty} \frac{1}{N} \sum_{n=1}^N \mu( \{ x \in X: f(T^n x) = f(x) \} ) \geq \delta \ \ \ \ \ (1)


for some {0 \leq \delta \leq 1}. Then there exists a constant {c} such that {f(x)=c} for {x} in a set of measure at least {\delta}.

Informally: if {f} is locally constant on pairs {x, T^n x} at least {\delta} of the time, then {f} is globally constant at least {\delta} of the time. Of course the claim fails if the ergodicity hypothesis is dropped, as one can simply take {f} to be an invariant function that is not essentially constant, such as the indicator function of an invariant set of intermediate measure. This corollary can be viewed as a manifestation of the general principle that ergodic systems have the same “global” (or “space-averaged”) behaviour as “local” (or “time-averaged”) behaviour, in contrast to non-ergodic systems in which local properties do not automatically transfer over to their global counterparts.

Proof: By composing {f} with (say) the arctangent function, we may assume without loss of generality that {f} is bounded. Let {k>0}, and partition {X} as {\bigcup_{m \in {\bf Z}} E_{m,k}}, where {E_{m,k}} is the level set

\displaystyle E_{m,k} := \{ x \in X: m 2^{-k} \leq f(x) < (m+1) 2^{-k} \}.

For each {k}, only finitely many of the {E_{m,k}} are non-empty. By (1), one has

\displaystyle \limsup_{N \rightarrow \infty} \sum_m \frac{1}{N} \sum_{n=1}^N \mu( E_{m,k} \cap T^n E_{m,k} ) \geq \delta.

Using the ergodic theorem, we conclude that

\displaystyle \sum_m \mu( E_{m,k} )^2 \geq \delta.

On the other hand, {\sum_m \mu(E_{m,k}) = 1}. Thus there exists {m_k} such that {\mu(E_{m_k,k}) \geq \delta}, that is to say

\displaystyle \mu( \{ x \in X: m_k 2^{-k} \leq f(x) < (m_k+1) 2^{-k} \} ) \geq \delta.

By the Bolzano-Weierstrass theorem (noting that the {m_k 2^{-k}} are bounded, since {f} is bounded), we may pass to a subsequence where {m_k 2^{-k}} converges to a limit {c}. We then have

\displaystyle \mu( \{ x \in X: c-2^{-k} \leq f(x) \leq c+2^{-k} \}) \geq \delta

for infinitely many {k}, and hence

\displaystyle \mu( \{ x \in X: f(x) = c \}) \geq \delta.

The claim follows. \Box

Let {\lambda: {\bf N} \rightarrow \{-1,1\}} be the Liouville function, thus {\lambda(n)} is defined to equal {+1} when {n} is the product of an even number of primes, and {-1} when {n} is the product of an odd number of primes. The Chowla conjecture asserts that {\lambda} has the statistics of a random sign pattern, in the sense that

\displaystyle  \lim_{N \rightarrow \infty} \mathbb{E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) = 0 \ \ \ \ \ (1)

for all {k \geq 1} and all distinct natural numbers {h_1,\dots,h_k}, where we use the averaging notation

\displaystyle  \mathbb{E}_{n \leq N} f(n) := \frac{1}{N} \sum_{n \leq N} f(n).

For {k=1}, this conjecture is equivalent to the prime number theorem (as discussed in this previous blog post), but the conjecture remains open for any {k \geq 2}.

In recent years, it has been realised that one can make more progress on this conjecture if one works instead with the logarithmically averaged version

\displaystyle  \lim_{N \rightarrow \infty} \mathbb{E}_{n \leq N}^{\log} \lambda(n+h_1) \dots \lambda(n+h_k) = 0 \ \ \ \ \ (2)

of the conjecture, where we use the logarithmic averaging notation

\displaystyle  \mathbb{E}_{n \leq N}^{\log} f(n) := \frac{\sum_{n \leq N} \frac{f(n)}{n}}{\sum_{n \leq N} \frac{1}{n}}.

Using the summation by parts (or telescoping series) identity

\displaystyle  \sum_{n \leq N} \frac{f(n)}{n} = \sum_{M < N} \frac{1}{M(M+1)} (\sum_{n \leq M} f(n)) + \frac{1}{N} \sum_{n \leq N} f(n) \ \ \ \ \ (3)

it is not difficult to show that the Chowla conjecture (1) for a given {k,h_1,\dots,h_k} implies the logarithmically averaged conjecture (2). However, the converse implication is not at all clear. For instance, for {k=1}, we have already mentioned that the Chowla conjecture

\displaystyle  \lim_{N \rightarrow \infty} \mathbb{E}_{n \leq N} \lambda(n) = 0

is equivalent to the prime number theorem; but the logarithmically averaged analogue

\displaystyle  \lim_{N \rightarrow \infty} \mathbb{E}^{\log}_{n \leq N} \lambda(n) = 0

is significantly easier to show (a proof with the Liouville function {\lambda} replaced by the closely related Möbius function {\mu} is given in this previous blog post). And indeed, significantly more is now known for the logarithmically averaged Chowla conjecture; in this paper of mine I had proven (2) for {k=2}, and in this recent paper with Joni Teravainen, we proved the conjecture for all odd {k} (with a different proof also given here).
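Incidentally, the summation by parts identity (3) is easy to confirm numerically. The sketch below does so for {f = \lambda}, using a simple factor-counting sieve for the Liouville function and an arbitrary cutoff {N} (the helper name is my own, for illustration).

```python
def liouville_up_to(N):
    """lambda(n) = (-1)^Omega(n), computed by a factor-counting sieve."""
    lam = [1] * (N + 1)
    rem = list(range(N + 1))            # unfactored part of each n
    for p in range(2, N + 1):
        if rem[p] == p:                 # p is prime: nothing smaller divided it out
            for m in range(p, N + 1, p):
                while rem[m] % p == 0:
                    rem[m] //= p
                    lam[m] = -lam[m]    # one sign flip per prime factor
    return lam

N = 2000                                # arbitrary cutoff
f = liouville_up_to(N)                  # f[n] = lambda(n)

# Left- and right-hand sides of the summation by parts identity (3)
lhs = sum(f[n] / n for n in range(1, N + 1))
rhs = (sum(sum(f[n] for n in range(1, M + 1)) / (M * (M + 1))
           for M in range(1, N))
       + sum(f[n] for n in range(1, N + 1)) / N)
print(lhs, rhs)                         # the two sides agree
```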

In view of this emerging consensus that the logarithmically averaged Chowla conjecture was easier than the ordinary Chowla conjecture, it was thus somewhat of a surprise for me to read a recent paper of Gomilko, Kwietniak, and Lemanczyk, who (among other things) established the following statement:

Theorem 1 Assume that the logarithmically averaged Chowla conjecture (2) is true for all {k}. Then there exists a sequence {N_i} going to infinity such that the Chowla conjecture (1) is true for all {k} along that sequence, that is to say

\displaystyle  \lim_{i \rightarrow \infty} \mathbb{E}_{n \leq N_i} \lambda(n+h_1) \dots \lambda(n+h_k) = 0

for all {k} and all distinct {h_1,\dots,h_k}.

This implication does not use any special properties of the Liouville function (other than that it is bounded), and in fact proceeds by ergodic theoretic methods, focusing in particular on the ergodic decomposition of invariant measures of a shift into ergodic measures. Ergodic methods have proven remarkably fruitful in understanding these sorts of number theoretic and combinatorial problems, as could already be seen by the ergodic theoretic proof of Szemerédi’s theorem by Furstenberg, and more recently by the work of Frantzikinakis and Host on Sarnak’s conjecture. (My first paper with Teravainen also uses ergodic theory tools.) Indeed, many other results in the subject were first discovered using ergodic theory methods.

On the other hand, many results in this subject that were first proven ergodic theoretically have since been reproven by more combinatorial means; my second paper with Teravainen is an instance of this. As it turns out, one can also prove Theorem 1 by a standard combinatorial (or probabilistic) technique known as the second moment method. In fact, one can prove slightly more:

Theorem 2 Let {k} be a natural number. Assume that the logarithmically averaged Chowla conjecture (2) is true for {2k}. Then there exists a set {{\mathcal N}} of natural numbers of logarithmic density {1} (that is, {\lim_{N \rightarrow \infty} \mathbb{E}_{n \leq N}^{\log} 1_{n \in {\mathcal N}} = 1}) such that

\displaystyle  \lim_{N \rightarrow \infty: N \in {\mathcal N}} \mathbb{E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) = 0

for any distinct {h_1,\dots,h_k}.

It is not difficult to deduce Theorem 1 from Theorem 2 using a diagonalisation argument. Unfortunately, the known cases of the logarithmically averaged Chowla conjecture ({k=2} and odd {k}) are currently insufficient to use Theorem 2 for any purpose other than to reprove what is already known to be true from the prime number theorem. (Indeed, the even cases of Chowla, in either logarithmically averaged or non-logarithmically averaged forms, seem to be far more powerful than the odd cases; see Remark 1.7 of this paper of myself and Teravainen for a related observation in this direction.)

We now sketch the proof of Theorem 2. For any distinct {h_1,\dots,h_k}, we take a large number {H} and consider the limiting second moment

\displaystyle  \limsup_{N \rightarrow \infty} \mathop{\bf E}_{n \leq N}^{\log} |\mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k)|^2.

We can expand this as

\displaystyle  \limsup_{N \rightarrow \infty} \mathop{\bf E}_{m,m' \leq H} \mathop{\bf E}_{n \leq N}^{\log} \lambda(n+m+h_1) \dots \lambda(n+m+h_k)

\displaystyle \lambda(n+m'+h_1) \dots \lambda(n+m'+h_k).

If all the {m+h_1,\dots,m+h_k,m'+h_1,\dots,m'+h_k} are distinct, the hypothesis (2) tells us that the inner average goes to zero as {N \rightarrow \infty}. The remaining averages are {O(1)}, and for each {m} there are only {O( k^2 )} values of {m'} that produce such an average. We conclude that

\displaystyle  \limsup_{N \rightarrow \infty} \mathop{\bf E}_{n \leq N}^{\log} |\mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k)|^2 \ll k^2 / H.

By Markov’s inequality (and (3)), we conclude that for any fixed {h_1,\dots,h_k, H}, there exists a set {{\mathcal N}_{h_1,\dots,h_k,H}} of upper logarithmic density at least {1-k/H^{1/2}}, thus

\displaystyle  \limsup_{N \rightarrow \infty} \mathbb{E}_{n \leq N}^{\log} 1_{n \in {\mathcal N}_{h_1,\dots,h_k,H}} \geq 1 - k/H^{1/2}

such that

\displaystyle  \mathop{\bf E}_{n \leq N} |\mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k)|^2 \ll k / H^{1/2}.

By deleting at most finitely many elements, we may assume that {{\mathcal N}_{h_1,\dots,h_k,H}} consists only of elements of size at least {H^2} (say).

For any {H_0}, if we let {{\mathcal N}_{h_1,\dots,h_k, \geq H_0}} be the union of {{\mathcal N}_{h_1,\dots,h_k, H}} for {H \geq H_0}, then {{\mathcal N}_{h_1,\dots,h_k, \geq H_0}} has logarithmic density {1}. By a diagonalisation argument (using the fact that the set of tuples {(h_1,\dots,h_k)} is countable), we can then find a set {{\mathcal N}} of natural numbers of logarithmic density {1}, such that for every {h_1,\dots,h_k,H_0}, every sufficiently large element of {{\mathcal N}} lies in {{\mathcal N}_{h_1,\dots,h_k,\geq H_0}}. Thus for every sufficiently large {N} in {{\mathcal N}}, one has

\displaystyle  \mathop{\bf E}_{n \leq N} |\mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k)|^2 \ll k / H^{1/2}

for some {H \geq H_0} with {N \geq H^2}. By Cauchy-Schwarz, this implies that

\displaystyle  \mathop{\bf E}_{n \leq N} \mathop{\bf E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k) \ll k^{1/2} / H^{1/4};

interchanging the sums and using {N \geq H^2} and {H \geq H_0}, this implies that

\displaystyle  \mathop{\bf E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) \ll k^{1/2} / H^{1/4} \leq k^{1/2} / H_0^{1/4}.

We conclude on taking {H_0} to infinity that

\displaystyle  \lim_{N \rightarrow \infty; N \in {\mathcal N}} \mathop{\bf E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) = 0

as required.
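For the simplest case {k=1}, one can get an informal numerical feel for the second moment appearing in this argument. The sketch below (with arbitrary choices of {N} and {H}, and a simple sieve for {\lambda}) computes the logarithmically weighted second moment of window averages of the Liouville function and finds it small, heuristically of size about {1/H}.

```python
def liouville_up_to(N):
    """lambda(n) = (-1)^Omega(n), computed by a factor-counting sieve."""
    lam = [1] * (N + 1)
    rem = list(range(N + 1))            # unfactored part of each n
    for p in range(2, N + 1):
        if rem[p] == p:                 # p is prime
            for m in range(p, N + 1, p):
                while rem[m] % p == 0:
                    rem[m] //= p
                    lam[m] = -lam[m]    # one sign flip per prime factor
    return lam

# Logarithmically weighted second moment of window averages, k = 1 case.
N, H = 50000, 25                        # arbitrary cutoff and window length
lam = liouville_up_to(N + H)

def window_avg(n):                      # E_{m <= H} lambda(n + m)
    return sum(lam[n + m] for m in range(1, H + 1)) / H

weights = [1.0 / n for n in range(1, N + 1)]
second_moment = (sum(w * window_avg(n) ** 2
                     for n, w in zip(range(1, N + 1), weights))
                 / sum(weights))
print(second_moment)                    # heuristically of size about 1/H
```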

Let {P(z) = z^n + a_{n-1} z^{n-1} + \dots + a_0} be a monic polynomial of degree {n} with complex coefficients. Then by the fundamental theorem of algebra, we can factor {P} as

\displaystyle  P(z) = (z-z_1) \dots (z-z_n) \ \ \ \ \ (1)

for some complex zeroes {z_1,\dots,z_n} (possibly with repetition).

Now suppose we evolve {P} with respect to time by heat flow, creating a function {P(t,z)} of two variables with given initial data {P(0,z) = P(z)} for which

\displaystyle  \partial_t P(t,z) = \partial_{zz} P(t,z). \ \ \ \ \ (2)

On the space of polynomials of degree at most {n}, the operator {\partial_{zz}} is nilpotent, and one can solve this equation explicitly both forwards and backwards in time by the Taylor series

\displaystyle  P(t,z) = \sum_{j=0}^\infty \frac{t^j}{j!} \partial_z^{2j} P(0,z).

For instance, if one starts with a quadratic {P(0,z) = z^2 + bz + c}, then the polynomial evolves by the formula

\displaystyle  P(t,z) = z^2 + bz + (c+2t).
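This finite Taylor series is straightforward to implement on coefficient lists; the following sketch (the helper names are my own, for illustration) evolves a polynomial by heat flow and reproduces the quadratic example above.

```python
from math import factorial

def second_derivative(coeffs):
    """coeffs[i] is the coefficient of z^i; return the coefficients of P''."""
    return [i * (i - 1) * coeffs[i] for i in range(2, len(coeffs))]

def heat_flow(coeffs, t):
    """Solve dP/dt = P'' via the (finite) Taylor series sum_j t^j/j! P^(2j)."""
    result = [0.0] * len(coeffs)
    deriv, j = list(coeffs), 0
    while deriv:                        # terminates: each P'' drops the degree by 2
        for i, cf in enumerate(deriv):
            result[i] += (t ** j / factorial(j)) * cf
        deriv = second_derivative(deriv)
        j += 1
    return result

# Quadratic example: z^2 + b z + c evolves to z^2 + b z + (c + 2t)
b, c, t = 3.0, 1.0, 0.25
print(heat_flow([c, b, 1.0], t))        # [c + 2t, b, 1] = [1.5, 3.0, 1.0]
```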

As the polynomial {P(t)} evolves in time, the zeroes {z_1(t),\dots,z_n(t)} evolve also. Assuming for sake of discussion that the zeroes are simple, the inverse function theorem tells us that the zeroes will (locally, at least) evolve smoothly in time. What are the dynamics of this evolution?

For instance, in the quadratic case, the quadratic formula tells us that the zeroes are

\displaystyle  z_1(t) = \frac{-b + \sqrt{b^2 - 4(c+2t)}}{2}


\displaystyle  z_2(t) = \frac{-b - \sqrt{b^2 - 4(c+2t)}}{2}

after arbitrarily choosing a branch of the square root. If {b,c} are real and the discriminant {b^2 - 4c} is initially positive, we see that we start with two real zeroes centred around {-b/2}, which then approach each other until time {t = \frac{b^2-4c}{8}}, at which point the roots collide and then move off from each other in an imaginary direction.

In the general case, we can obtain the equations of motion by implicitly differentiating the defining equation

\displaystyle  P( t, z_i(t) ) = 0

in time using (2) to obtain

\displaystyle  \partial_{zz} P( t, z_i(t) ) + \partial_t z_i(t) \partial_z P(t,z_i(t)) = 0.

To simplify notation we drop the explicit dependence on time, thus

\displaystyle  \partial_{zz} P(z_i) + (\partial_t z_i) \partial_z P(z_i)= 0.

From (1) and the product rule, we see that

\displaystyle  \partial_z P( z_i ) = \prod_{j:j \neq i} (z_i - z_j)


\displaystyle  \partial_{zz} P( z_i ) = 2 \sum_{k:k \neq i} \prod_{j:j \neq i,k} (z_i - z_j)

(where all indices are understood to range over {1,\dots,n}) leading to the equations of motion

\displaystyle  \partial_t z_i = \sum_{k:k \neq i} \frac{2}{z_k - z_i}, \ \ \ \ \ (3)

at least when one avoids those times in which there is a repeated zero. In the case when the zeroes {z_i} are real, each term {\frac{2}{z_k-z_i}} represents a (first-order) attraction in the dynamics between {z_i} and {z_k}, but the dynamics are more complicated for complex zeroes (e.g. purely imaginary zeroes will experience repulsion rather than attraction, as one already sees in the quadratic example). Curiously, this system resembles that of Dyson Brownian motion (except with the Brownian motion part removed, and time reversed). I learned of the connection between the ODE (3) and the heat equation from this paper of Csordas, Smith, and Varga, but perhaps it has been mentioned in earlier literature as well.
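One can test the equations of motion (3) numerically against the heat flow itself: integrate (3) for the zeroes of a cubic and compare with the zeroes of the explicitly evolved polynomial. A minimal sketch (assuming numpy is available; the cubic and step sizes are arbitrary choices):

```python
import numpy as np

def velocity(z):
    """Right-hand side of (3): dz_i/dt = sum_{k != i} 2/(z_k - z_i)."""
    n = len(z)
    return np.array([sum(2.0 / (z[k] - z[i]) for k in range(n) if k != i)
                     for i in range(n)])

# Cubic with zeroes -1, 0, 2, i.e. P(z) = z^3 - z^2 - 2z.
roots = np.array([-1.0, 0.0, 2.0])
t, steps = 0.01, 1000
dt = t / steps
for _ in range(steps):                  # RK4 integration of (3)
    k1 = velocity(roots)
    k2 = velocity(roots + 0.5 * dt * k1)
    k3 = velocity(roots + 0.5 * dt * k2)
    k4 = velocity(roots + dt * k3)
    roots = roots + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# Heat flow acts exactly on the coefficients: for z^3 + a z^2 + b z + c one has
# P'' = 6z + 2a, hence P(t,z) = z^3 + a z^2 + (b + 6t) z + (c + 2at).
a, b, c = -1.0, -2.0, 0.0
exact = np.sort(np.roots([1.0, a, b + 6 * t, c + 2 * a * t]).real)
print(np.max(np.abs(np.sort(roots) - exact)))  # tiny discretization error
```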

One interesting consequence of these equations is that if the zeroes are real at some time, then they will stay real as long as the zeroes do not collide. Let us now restrict attention to the case of real simple zeroes, in which case we will rename the zeroes as {x_i} instead of {z_i}, and order them as {x_1 < \dots < x_n}. The evolution

\displaystyle  \partial_t x_i = \sum_{k:k \neq i} \frac{2}{x_k - x_i}

can now be thought of as reverse gradient flow for the “entropy”

\displaystyle  H := -\sum_{i,j: i \neq j} \log |x_i - x_j|,

(which is also essentially the logarithm of the discriminant of the polynomial) since we have

\displaystyle  \partial_t x_i = \frac{\partial H}{\partial x_i}.
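As a sanity check on this gradient structure, one can compare a numerical gradient of {H} with the right-hand side of the evolution equation; the sketch below uses arbitrary zeroes and a central difference approximation.

```python
import math

def entropy(x):
    """H = -sum_{i != j} log |x_i - x_j|, summed over ordered pairs."""
    n = len(x)
    return -sum(math.log(abs(x[i] - x[j]))
                for i in range(n) for j in range(n) if i != j)

def velocity(x, i):
    """Right-hand side of the evolution: sum_{k != i} 2/(x_k - x_i)."""
    return sum(2.0 / (x[k] - x[i]) for k in range(len(x)) if k != i)

x = [-1.0, 0.5, 2.0]                    # arbitrary simple real zeroes
eps = 1e-6
for i in range(len(x)):
    xp, xm = list(x), list(x)
    xp[i] += eps
    xm[i] -= eps
    grad_i = (entropy(xp) - entropy(xm)) / (2 * eps)  # numerical dH/dx_i
    assert abs(grad_i - velocity(x, i)) < 1e-5
print("numerical gradient of H matches the equations of motion")
```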

In particular, we have the monotonicity formula

\displaystyle  \partial_t H = 4E

where {E} is the “energy”

\displaystyle  E := \frac{1}{4} \sum_i (\frac{\partial H}{\partial x_i})^2

\displaystyle  = \sum_i (\sum_{k:k \neq i} \frac{1}{x_k-x_i})^2

\displaystyle  = \sum_{i,k: i \neq k} \frac{1}{(x_k-x_i)^2} + 2 \sum_{i,j,k: i,j,k \hbox{ distinct}} \frac{1}{(x_k-x_i)(x_j-x_i)}

\displaystyle  = \sum_{i,k: i \neq k} \frac{1}{(x_k-x_i)^2}

where in the last line we use the antisymmetrisation identity

\displaystyle  \frac{1}{(x_k-x_i)(x_j-x_i)} + \frac{1}{(x_i-x_j)(x_k-x_j)} + \frac{1}{(x_j-x_k)(x_i-x_k)} = 0.

Among other things, this shows that as one goes backwards in time, the entropy decreases, and so no collisions can occur to the past, only in the future, which is of course consistent with the attractive nature of the dynamics. As {H} is a convex function of the positions {x_1,\dots,x_n}, one expects {H} to also evolve in a convex manner in time, that is to say the energy {E} should be increasing. This is indeed the case:

Exercise 1 Show that

\displaystyle  \partial_t E = 2 \sum_{i,j: i \neq j} (\frac{2}{(x_i-x_j)^2} - \sum_{k: i,j,k \hbox{ distinct}} \frac{1}{(x_k-x_i)(x_k-x_j)})^2.

Symmetric polynomials of the zeroes are polynomial functions of the coefficients and should thus evolve in a polynomial fashion. One can compute this explicitly in simple cases. For instance, the center of mass is an invariant:

\displaystyle  \partial_t \frac{1}{n} \sum_i x_i = 0.

The variance decreases linearly:

Exercise 2 Establish the virial identity

\displaystyle  \partial_t \sum_{i,j} (x_i-x_j)^2 = - 4n^2(n-1).

As the variance (which is proportional to {\sum_{i,j} (x_i-x_j)^2}) cannot become negative, this identity shows that “finite time blowup” must occur – that the zeroes must collide at or before the time {\frac{1}{4n^2(n-1)} \sum_{i,j} (x_i-x_j)^2}.
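The virial identity of Exercise 2 can be confirmed numerically for a cubic, using the fact that the heat flow acts exactly on the coefficients; a minimal sketch (assuming numpy is available; the cubic is an arbitrary choice):

```python
import numpy as np

def pair_sum(x):
    """sum over ordered pairs i, j of (x_i - x_j)^2."""
    return sum((xi - xj) ** 2 for xi in x for xj in x)

# For z^3 + a z^2 + b z + c one has P'' = 6z + 2a, so under heat flow
# P(t,z) = z^3 + a z^2 + (b + 6t) z + (c + 2at) exactly.
a, b, c = -1.0, -2.0, 0.0               # zeroes -1, 0, 2 at time zero
n = 3

def S(t):
    """Variance-type quantity sum_{i,j} (x_i - x_j)^2 at time t."""
    return pair_sum(np.roots([1.0, a, b + 6 * t, c + 2 * a * t]).real)

# The identity predicts S decreases linearly with slope -4 n^2 (n-1) = -72.
slope = (S(0.01) - S(0.0)) / 0.01
print(slope, -4 * n ** 2 * (n - 1))     # both should equal -72
```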

Exercise 3 Show that the Stieltjes transform

\displaystyle  s(t,z) = \sum_i \frac{1}{x_i - z}

solves the viscous Burgers equation

\displaystyle  \partial_t s = \partial_{zz} s - 2 s \partial_z s,

either by using the original heat equation (2) and the identity {s = - \partial_z P / P}, or else by using the equations of motion (3). This relation between the Burgers equation and the heat equation is known as the Cole-Hopf transformation.

The paper of Csordas, Smith, and Varga mentioned previously gives some other bounds on the lifespan of the dynamics; roughly speaking, they show that if there is one pair of zeroes that are much closer to each other than to the other zeroes then they must collide in a short amount of time (unless there is a collision occurring even earlier at some other location). Their argument extends also to situations where there are an infinite number of zeroes, which they apply to get new results on Newman’s conjecture in analytic number theory. I would be curious to know of further places in the literature where this dynamics has been studied.