
In the modern theory of higher order Fourier analysis, a key role is played by the Gowers uniformity norms $\| \cdot \|_{U^k}$ for $k=1,2,\dots$. For finitely supported functions $f: {\bf Z} \to {\bf C}$, one can define the (non-normalised) Gowers norm $\|f\|_{\tilde U^k({\bf Z})}$ by the formula

$$\|f\|_{\tilde U^k({\bf Z})}^{2^k} := \sum_{x, h_1,\dots,h_k \in {\bf Z}} \prod_{\omega \in \{0,1\}^k} {\mathcal C}^{|\omega|} f(x + \omega_1 h_1 + \dots + \omega_k h_k)$$

where ${\mathcal C}: z \mapsto \overline{z}$ denotes complex conjugation and $|\omega| := \omega_1 + \dots + \omega_k$, and then on any discrete interval $[N] = \{1,\dots,N\}$ and any function $f: [N] \to {\bf C}$ we can then define the (normalised) Gowers norm

$$\|f\|_{U^k([N])} := \|f 1_{[N]}\|_{\tilde U^k({\bf Z})} / \|1_{[N]}\|_{\tilde U^k({\bf Z})}$$

where $f 1_{[N]}$ is the extension of $f$ by zero to all of ${\bf Z}$. Thus for instance $\|f\|_{U^1([N])} = |{\bf E}_{n \in [N]} f(n)|$ (which technically makes $U^1([N])$ a seminorm rather than a norm), and one can calculate

$$\|f\|_{U^2([N])} \asymp (N \int_0^1 |{\bf E}_{n \in [N]} f(n) e(-\alpha n)|^4\ d\alpha)^{1/4} \ \ \ \ \ (1)$$

where $e(\theta) := e^{2\pi i \theta}$, and we use the averaging notation ${\bf E}_{n \in A} f(n) := \frac{1}{|A|} \sum_{n \in A} f(n)$.

The significance of the Gowers norms is that they control other multilinear forms that show up in additive combinatorics. Given any polynomials $P_1,\dots,P_m: {\bf Z}^d \to {\bf Z}$ and functions $f_1,\dots,f_m: [N] \to {\bf C}$, we define the multilinear form

$$\Lambda^{P_1,\dots,P_m}(f_1,\dots,f_m) := \sum_{x \in {\bf Z}^d} \prod_{j=1}^m f_j 1_{[N]}(P_j(x)) \Big/ \sum_{x \in {\bf Z}^d} \prod_{j=1}^m 1_{[N]}(P_j(x))$$

(assuming that the denominator is finite and non-zero). Thus for instance

$$\Lambda^{n, n+r, n+2r}(f,g,h) = {\bf E}_{n,r: n, n+r, n+2r \in [N]} f(n) g(n+r) h(n+2r)$$

where we view $n, r$ as formal (indeterminate) variables, and $f, g, h$ are understood to be extended by zero to all of ${\bf Z}$. These forms are used to count patterns in various sets; for instance, the quantity $\Lambda^{n, n+r, n+2r}(1_A, 1_A, 1_A)$ is closely related to the number of length three arithmetic progressions contained in a set $A \subset [N]$. Let us informally say that a form $\Lambda^{P_1,\dots,P_m}$ is *controlled* by the $U^k([N])$ norm if the form $\Lambda^{P_1,\dots,P_m}(f_1,\dots,f_m)$ is small whenever $f_1,\dots,f_m$ are $1$-bounded functions with at least one of the $f_j$ small in $U^k([N])$ norm. This definition was made more precise by Gowers and Wolf, who then defined the *true complexity* of a form $\Lambda^{P_1,\dots,P_m}$ to be the least $s$ such that $\Lambda^{P_1,\dots,P_m}$ is controlled by the $U^{s+1}([N])$ norm. For instance,

- $\Lambda^n$ and $\Lambda^{n,n+r}$ have true complexity $0$;
- $\Lambda^{n,n+r,n+2r}$ has true complexity $1$;
- $\Lambda^{n,n+r,n+2r,n+3r}$ has true complexity $2$;
- The form $\Lambda^{n,n+2}$ (which among other things could be used to count twin primes) has infinite true complexity (which is quite unfortunate for applications).
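As a concrete illustration of how the form $\Lambda^{n,n+r,n+2r}$ counts patterns, here is a brute-force computation in the cyclic model ${\bf Z}/N{\bf Z}$ (which I use here instead of $[N]$ to sidestep the boundary normalisations; the function and variable names are my own, not from any of the papers discussed):

```python
import numpy as np

def three_ap_form(f, g, h):
    """Lambda^{n, n+r, n+2r}(f, g, h) = E_{n, r in Z/NZ} f(n) g(n+r) h(n+2r)."""
    N = len(f)
    total = sum(f[n] * g[(n + r) % N] * h[(n + 2 * r) % N]
                for n in range(N) for r in range(N))
    return total / N**2

N = 11
A = np.zeros(N)
A[[0, 1, 2, 4, 8]] = 1.0            # indicator function of a set A in Z/11Z
# N^2 * Lambda(1_A, 1_A, 1_A) counts pairs (n, r) with n, n+r, n+2r all in A
# (this count includes the r = 0 "trivial" progressions).
num_aps = N**2 * three_ap_form(A, A, A)
print(num_aps)
```

In particular, for the full set the count is all of $N^2$, while for the empty set it vanishes, which is a quick sanity check on the normalisation.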

Gowers and Wolf formulated a conjecture on what this true complexity should be, at least for linear polynomials $P_1,\dots,P_m$; Ben Green and I thought we had resolved this conjecture back in 2010, though it turned out there was a subtle gap in our arguments and we were only able to resolve the conjecture in a partial range of cases. However, the full conjecture was recently resolved by Daniel Altman.

The $U^1([N])$ (semi-)norm is so weak that it barely controls any averages at all. For instance the average

$$\Lambda^{2n}(f) = {\bf E}_{n \in [N]: 2n \in [N]} f(2n)$$

is not controlled by the $U^1([N])$ semi-norm: it is perfectly possible for a $1$-bounded function $f$ to even have vanishing $U^1([N])$ norm but have large value of $\Lambda^{2n}(f)$ (consider for instance the parity function $f(n) := (-1)^n$). Because of this, I propose inserting an additional norm in the Gowers uniformity norm hierarchy between the $U^1$ and $U^2$ norms, which I will call the $U^{0+}$ (or "profinite $U^1$") norm:

$$\|f\|_{U^{0+}([N])} := \sup_P \frac{|\sum_{n \in P} f(n)|}{N} = \sup_P \frac{|P|}{N} |{\bf E}_{n \in P} f(n)|$$

where $P$ ranges over all arithmetic progressions in $[N]$. This can easily be seen to be a norm on functions $f: [N] \to {\bf C}$ that controls the $U^1([N])$ norm. It is also basically controlled by the $U^2([N])$ norm for $1$-bounded functions $f$; indeed, if $P$ is an arithmetic progression in $[N]$ of some spacing $q \geq 1$, then we can write $P$ as the intersection of an interval $I$ with a residue class modulo $q$, and from Fourier expansion of that residue class one can bound $\frac{1}{N} |\sum_{n \in P} f(n)|$ by the largest correlation of $f$ with a linear phase $e(\alpha n)$ on the interval $I$. Smoothing the sharp cutoff $1_I$ at a small spatial scale $\delta N$ by a standard bump function (extending $f$ by zero outside of $[N]$) costs an error of $O(\delta)$ by the triangle inequality; after some Fourier expansion of the smoothed cutoff, and an application of the Gowers–Cauchy–Schwarz inequality to the resulting linear phase correlations, one bounds everything by the $U^2([N])$ norm, and on optimising in $\delta$ we conclude a bound of the form $\|f\|_{U^{0+}([N])} \lesssim \|f\|_{U^2([N])}^c$ for some absolute constant $c > 0$. Forms which are controlled by the $U^{0+}$ norm (but not $U^1$) would then have their true complexity adjusted to $0+$ with this insertion.

The $U^{0+}$ norm recently appeared implicitly in work of Peluse and Prendiville, who showed that the form $\Lambda^{n, n+r, n+r^2}$ had true complexity $0+$ in this notation (with polynomially strong bounds). [Actually, strictly speaking this control was only shown for the third function $f_3$; for the first two functions one needs to localize the $U^{0+}$ norm to shorter intervals. But I will ignore this technical point to keep the exposition simple.] The weaker claim that $\Lambda^{n, n+r, n+r^2}$ has true complexity at most $1$ is substantially easier to prove (one can apply the circle method together with Gauss sum estimates).
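The $U^{0+}$ norm is computable by brute force on small examples. In the sketch below I take the norm of $f: \{0,\dots,N-1\} \to {\bf C}$ to be the supremum over arithmetic progressions $P$ of $|\sum_{n \in P} f(n)|/N$ (this normalisation is my own reading of the definition); the parity function then illustrates a function with vanishing $U^1$-type average but large profinite norm:

```python
def profinite_norm(f):
    """U^{0+}-type norm: sup over arithmetic progressions P in {0,...,N-1}
    of |sum_{n in P} f(n)| / N.  Brute force over all starts and spacings."""
    N = len(f)
    best = 0.0
    for a in range(N):                 # starting point of the progression
        for q in range(1, N + 1):      # spacing of the progression
            s, n = 0.0, a
            while n < N:               # running sums cover all lengths
                s += f[n]
                n += q
                best = max(best, abs(s) / N)
    return best

N = 100
parity = [(-1) ** n for n in range(N)]
# Mean of the parity function is zero, but the progression of even numbers
# gives a profinite norm of 1/2.
print(sum(parity) / N, profinite_norm(parity))
```

The inner running sum considers every truncation of each progression, so all arithmetic progressions in the window are covered.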

The well known inverse theorem for the $U^2$ norm tells us that if a $1$-bounded function $f$ has $U^2([N])$ norm at least $\eta$ for some $0 < \eta < 1$, then there is a Fourier phase $n \mapsto e(\alpha n)$ such that

$$|{\bf E}_{n \in [N]} f(n) e(-\alpha n)| \gg \eta^2;$$

this follows easily from (1) and Plancherel's theorem. Conversely, from the Gowers–Cauchy–Schwarz inequality one has

$$|{\bf E}_{n \in [N]} f(n) e(-\alpha n)| \lesssim \|f\|_{U^2([N])}.$$

For $U^1([N])$ one has a trivial inverse theorem; by definition, the $U^1([N])$ norm of $f$ is at least $\eta$ if and only if

$$|{\bf E}_{n \in [N]} f(n)| \geq \eta.$$

Thus the frequency $\alpha$ appearing in the $U^2$ inverse theorem can be taken to be zero when working instead with the $U^1$ norm. For the $U^{0+}$ norm one has the intermediate situation in which the frequency $\alpha$ is not taken to be zero, but is instead major arc. Indeed, suppose that $f$ is $1$-bounded with $\|f\|_{U^{0+}([N])} \geq \eta$, thus

$$\frac{|P|}{N} |{\bf E}_{n \in P} f(n)| \geq \eta$$

for some arithmetic progression $P$ in $[N]$. This forces the spacing $q$ of this progression to be $O(1/\eta)$. We write the above inequality as

$$\frac{1}{N} |\sum_{n \in I} f(n) 1_{n = a\ (q)}| \geq \eta$$

for some residue class $a\ (q)$ and some interval $I$. By Fourier expansion of the residue class and the triangle inequality we then have

$$\frac{1}{N} |\sum_{n \in I} f(n) e(-jn/q)| \geq \eta$$

for some integer $j$. Convolving $1_I$ by $\psi_\delta(x) := \frac{1}{\delta N} \psi( \frac{x}{\delta N} )$ for $\delta$ a small multiple of $\eta$ and $\psi$ a Schwartz function of unit mass with Fourier transform supported on $[-1,1]$, we may replace the sharp cutoff $1_I$ by the smoothed cutoff $1_I * \psi_\delta$ while retaining a lower bound of $\gg \eta$. The Fourier transform of $1_I * \psi_\delta$ is bounded by $O(N)$ and supported on the region $\{ \theta = O(\frac{1}{\delta N})\}$, thus by Fourier expansion and the triangle inequality we have

$$\frac{1}{N} |\sum_{n \in [N]} f(n) e(-\alpha n)| \gg \eta^{O(1)} \ \ \ \ \ (2)$$

for some $\alpha = \frac{j}{q} + O(\frac{1}{\delta N})$, so in particular $\alpha$ is of the major arc form $\alpha = \frac{a}{q} + O( \frac{\eta^{-O(1)}}{N} )$ with $q = O(\eta^{-O(1)})$. Conversely, for $\alpha$ of this major arc form, some routine summation by parts gives the bound

$$\frac{1}{N} |\sum_{n \in [N]} f(n) e(-\alpha n)| \lesssim \eta^{-O(1)} \|f\|_{U^{0+}([N])},$$

so if (2) holds for a $1$-bounded $f$ then one must have $\|f\|_{U^{0+}([N])} \gg \eta^{O(1)}$. Here is a diagram showing some of the control relationships between various Gowers norms, multilinear forms, and duals of classes ${\mathcal F}$ of functions (where each class ${\mathcal F}$ of functions induces a dual norm $\|f\|_{{\mathcal F}^*} := \sup_{F \in {\mathcal F}} |{\bf E}_{n \in [N]} f(n) \overline{F(n)}|$):

Here I have included the three classes of functions that one can choose from for the $U^3$ inverse theorem, namely degree two nilsequences, bracket quadratic phases, and local quadratic phases, as well as the more narrow class of globally quadratic phases.
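In the cyclic model ${\bf Z}/N{\bf Z}$ the $U^2$ inverse theorem becomes transparent, since there $\|f\|_{U^2}^4 = \sum_\xi |\hat f(\xi)|^4$ exactly. The following numerical sketch (my own illustration, with my own normalisations) checks the two-sided relationship between the $U^2$ norm and the largest Fourier coefficient for a $1$-bounded function:

```python
import numpy as np

N = 32
n = np.arange(N)
# A 1-bounded test function: the average of two linear phases.
f = 0.5 * np.exp(2j * np.pi * 3 * n / N) + 0.5 * np.exp(2j * np.pi * 7 * n / N)

fhat = np.fft.fft(f) / N                 # fhat(xi) = E_x f(x) e(-x xi / N)
u2_fourth = np.sum(np.abs(fhat) ** 4)    # = ||f||_{U^2(Z/NZ)}^4
largest = np.abs(fhat).max()             # largest Fourier coefficient
energy = np.sum(np.abs(fhat) ** 2)       # = E|f|^2, at most 1 for 1-bounded f

# "Large phase correlation implies large U^2":   largest^4 <= u2_fourth
# "Large U^2 implies large phase correlation":   u2_fourth <= largest^2 * energy
print(u2_fourth, largest, energy)
```

Here $f$ has two Fourier coefficients of size $1/2$, so $\|f\|_{U^2}^4 = 2 \cdot (1/2)^4 = 1/8$, sitting between the two bounds above.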

The Gowers norms have counterparts for measure-preserving systems $(X, T, \mu)$, known as *Host-Kra seminorms*. The $U^1(X)$ seminorm can be defined for $f \in L^\infty(X)$ as

$$\|f\|_{U^1(X)} := \lim_{N \to \infty} \| {\bf E}_{n \in [N]} T^n f \|_{L^2(X)}$$

and is orthogonal to the *invariant factor* $Z^0(X)$ (generated by the (almost everywhere) invariant measurable subsets of $X$), in the sense that a function $f$ has vanishing $U^1(X)$ seminorm if and only if it is orthogonal to all $Z^0(X)$-measurable (bounded) functions. Similarly, the $U^2(X)$ seminorm is orthogonal to the *Kronecker factor* $Z^1(X)$, generated by the eigenfunctions of $X$ (that is to say, those $f$ obeying an identity $Tf = \lambda f$ for some $T$-invariant $\lambda$); for ergodic systems, it is the largest factor isomorphic to a rotation on a compact abelian group. In analogy to the Gowers $U^{0+}([N])$ norm, one can then define the Host-Kra $U^{0+}(X)$ seminorm by

$$\|f\|_{U^{0+}(X)} := \sup_{q \geq 1} \lim_{N \to \infty} \| {\bf E}_{n \in [N]} T^{qn} f \|_{L^2(X)};$$

it is orthogonal to the *profinite factor* $Z^{0+}(X)$, generated by the periodic sets of $X$ (or equivalently, by those eigenfunctions whose eigenvalue is a root of unity); for ergodic systems, it is the largest factor isomorphic to a rotation on a profinite abelian group.
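For a rotation on a finite cyclic group (a toy system in which the whole dynamics is already "profinite"), the eigenfunctions are exactly the characters, and every eigenvalue is a root of unity. A quick numerical check of this (my own toy illustration, not from the post):

```python
import numpy as np

q, a = 12, 5                      # rotation T: x -> x + a on Z/12Z
x = np.arange(q)

def T(f):
    """Koopman operator: (Tf)(x) = f(x + a mod q)."""
    return np.roll(f, -a)         # np.roll(f, -a)[x] == f[(x + a) % q]

for k in range(q):
    chi = np.exp(2j * np.pi * k * x / q)      # character chi_k(x) = e(kx/q)
    eig = np.exp(2j * np.pi * k * a / q)      # eigenvalue e(ka/q), a root of unity
    assert np.allclose(T(chi), eig * chi)     # chi_k is an eigenfunction of T
print("all characters are eigenfunctions with root-of-unity eigenvalues")
```

The periodic sets of this toy system (unions of residue classes) are measurable with respect to the factor generated by these characters, mirroring the description of the profinite factor above.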

Asgar Jamneshan and I have just uploaded to the arXiv our paper “An uncountable Mackey-Zimmer theorem“. This paper is part of our longer term project to develop “uncountable” versions of various theorems in ergodic theory; see this previous paper of Asgar and myself for the first paper in this series (and another paper will appear shortly).

In this case the theorem in question is the Mackey-Zimmer theorem, previously discussed in this blog post. This theorem gives an important classification of group and homogeneous extensions of measure-preserving systems. Let us first work in the (classical) setting of concrete measure-preserving systems. Let $X = (X, \mu, T)$ be a measure-preserving system for some group $\Gamma$, thus $(X, \mu)$ is a (concrete) probability space and $T: \gamma \mapsto T^\gamma$ is a group homomorphism from $\Gamma$ to the automorphism group $\mathrm{Aut}(X, \mu)$ of the probability space. (Here we are abusing notation by using $X$ to refer both to the measure-preserving system and to the underlying set. In the notation of the paper we would instead distinguish these two objects notationally, reflecting two of the (many) categories one might wish to view $X$ as a member of, but for sake of this informal overview we will not maintain such precise distinctions.) If $K$ is a compact group, we define a *(concrete) cocycle* to be a collection $\rho = (\rho_\gamma)_{\gamma \in \Gamma}$ of measurable functions $\rho_\gamma: X \to K$ for $\gamma \in \Gamma$ that obey the *cocycle equation*

$$\rho_{\gamma_1 \gamma_2}(x) = \rho_{\gamma_1}(T^{\gamma_2} x) \rho_{\gamma_2}(x)$$

for $\mu$-almost every $x \in X$ and all $\gamma_1, \gamma_2 \in \Gamma$.

Each such cocycle $\rho$ generates a *group skew-product* $X \rtimes_\rho K$ of $X$, which is another measure-preserving system, where

- the underlying space is the Cartesian product $X \times K$ of $X$ and $K$;
- the measure is the product measure of $\mu$ and the Haar probability measure $\mathrm{Haar}_K$ on $K$; and
- the action is given by the formula

$$T^\gamma(x, k) := (T^\gamma x, \rho_\gamma(x) k). \ \ \ \ \ (2)$$

There is also a more general notion of a *homogeneous skew-product* $X \rtimes_\rho K/M$ in which the group $K$ is replaced by the homogeneous space $K/M$ for some closed subgroup $M$ of $K$, noting that $K/M$ still comes with a left-action of $K$ and a Haar probability measure. Group skew-products are very "explicit" ways to extend a system $X$, as everything is described by the cocycle $\rho$, which is a relatively tractable object to manipulate. (This is not to say that the cohomology of measure-preserving systems is trivial, but at least there are many tools one can use to study them, such as the Moore-Schmidt theorem discussed in this previous post.)
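The cocycle equation and the measure-preservation of the skew-product action can be checked by hand in a toy discrete model (here written additively, with abelian $K$, whereas the post allows nonabelian compact $K$; the names and the particular gauge function are my own):

```python
# Toy model: Gamma = Z acting on X = Z/mZ by x -> x + 1, with K = Z/kZ.
m, k = 8, 6

def T(x, n=1):
    """Base action T^n: x -> x + n on Z/mZ."""
    return (x + n) % m

F = [(3 * x * x + 1) % k for x in range(m)]   # an arbitrary function F: X -> K

def rho(n, x):
    """Coboundary cocycle rho_n(x) = F(T^n x) - F(x) (mod k)."""
    return (F[T(x, n)] - F[x]) % k

# Cocycle equation: rho_{n1+n2}(x) = rho_{n1}(T^{n2} x) + rho_{n2}(x)
for x in range(m):
    for n1 in range(5):
        for n2 in range(5):
            assert rho(n1 + n2, x) == (rho(n1, T(x, n2)) + rho(n2, x)) % k

# The skew-product map (x, c) -> (T x, c + rho_1(x)) permutes X x K,
# hence preserves the uniform (product Haar) measure.
orbit = {((x + 1) % m, (c + rho(1, x)) % k) for x in range(m) for c in range(k)}
assert len(orbit) == m * k
print("cocycle equation and measure preservation verified")
```

Any coboundary satisfies the cocycle equation automatically, which is why the verification loop succeeds for every choice of $F$ here.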

This group skew-product comes with a factor map $\pi: X \rtimes_\rho K \to X$ and a coordinate map $\theta: X \rtimes_\rho K \to K$, which by (2) are related to the action via the identities

$$\pi \circ T^\gamma = T^\gamma \circ \pi \ \ \ \ \ (3)$$

and

$$\theta \circ T^\gamma = (\rho_\gamma \circ \pi) \theta \ \ \ \ \ (4)$$

where in (4) we are implicitly working in the group of (concretely) measurable functions from $X \rtimes_\rho K$ to $K$. Furthermore, the combined map $(\pi, \theta): X \rtimes_\rho K \to X \times K$ is measure-preserving (using the product measure of $\mu$ and $\mathrm{Haar}_K$ on $X \times K$); indeed, the way we have constructed things, this map is just the identity map.
We can now generalize the notion of group skew-product by just working with the maps $\pi, \theta$, and weakening the requirement that $(\pi, \theta)$ be measure-preserving. Namely, define a *group extension* of $X$ by $K$ to be a measure-preserving system $Y$ equipped with a measure-preserving map $\pi: Y \to X$ obeying (3) and a measurable map $\theta: Y \to K$ obeying (4) for some cocycle $\rho$, such that the $\sigma$-algebra of $Y$ is generated by $\pi, \theta$. There is also a more general notion of a *homogeneous extension* in which $\theta$ takes values in $K/M$ rather than $K$. Then every group skew-product $X \rtimes_\rho K$ is a group extension of $X$ by $K$, but not conversely. Here are some key counterexamples:

- (i) If $H$ is a closed subgroup of $K$, and $\rho$ is a cocycle taking values in $H$, then $X \rtimes_\rho H$ can be viewed as a group extension of $X$ by $K$, taking $\theta$ to be the vertical coordinate (viewing it now as an element of $K$ rather than $H$). This will not be a skew-product by $K$, because $(\pi, \theta)$ pushes forward to the wrong measure on $X \times K$: it pushes forward to $\mu \times \mathrm{Haar}_H$ rather than $\mu \times \mathrm{Haar}_K$.
- (ii) If one takes the same example as (i), but twists the vertical coordinate $\theta$ to another vertical coordinate $\tilde \theta := (F \circ \pi) \theta$ for some measurable "gauge function" $F: X \to K$, then $X \rtimes_\rho H$ is still a group extension by $K$, but now with the cocycle $\rho$ replaced by the cohomologous cocycle

$$\tilde \rho_\gamma(x) := F(T^\gamma x) \rho_\gamma(x) F(x)^{-1}.$$

Again, this will not be a skew product by $K$, because $(\pi, \tilde\theta)$ pushes forward to a twisted version of $\mu \times \mathrm{Haar}_H$ that is supported (at least in the case where $K$ is compact and the cocycle is continuous) on the $H$-bundle $\{ (x, F(x) h): x \in X, h \in H \}$.
- (iii) With the situation as in (i), take $Y$ to be the union of $X \rtimes_\rho H$ and $X \rtimes_\rho H k_0$ for some $k_0$ outside of $H$, where we continue to use the action (2) and the standard vertical coordinate, but now use the measure $\mu \times \frac{1}{2}(\mathrm{Haar}_H + \mathrm{Haar}_{H k_0})$.

As it turns out, group extensions and homogeneous extensions arise naturally in the Furstenberg-Zimmer structural theory of measure-preserving systems; roughly speaking, every compact extension of $X$ is an inverse limit of group extensions of $X$. It is then of interest to classify such extensions.

Examples such as (iii) are annoying, but they can be excluded by imposing the additional condition that the system $Y$ is ergodic – all invariant (or essentially invariant) sets are of measure zero or measure one. (An essentially invariant set is a measurable subset $E$ of $Y$ such that $T^\gamma E$ is equal modulo null sets to $E$ for all $\gamma \in \Gamma$.) For instance, the system in (iii) is non-ergodic because the set $X \rtimes_\rho H$ (or $X \rtimes_\rho H k_0$) is invariant but has measure $1/2$. We then have the following fundamental result of Mackey and Zimmer:

Theorem 1 (Countable Mackey Zimmer theorem) Let $\Gamma$ be a group, $X$ be a concrete measure-preserving system, and $K$ be a compact Hausdorff group. Assume that $\Gamma$ is at most countable, $X$ is a standard Borel space, and $K$ is metrizable. Then every (concrete) ergodic group extension of $X$ by $K$ is abstractly isomorphic to a group skew-product (by some closed subgroup $H$ of $K$), and every (concrete) ergodic homogeneous extension of $X$ is similarly abstractly isomorphic to a homogeneous skew-product.

We will not define precisely what "abstractly isomorphic" means here, but roughly speaking it means "isomorphic after quotienting out the null sets". A proof of this theorem can be found, for instance, in the blog post mentioned previously.
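The ergodicity hypothesis in Theorem 1 can be illustrated in the simplest possible finite model: a rotation $x \mapsto x + a$ on ${\bf Z}/n{\bf Z}$ is ergodic precisely when it has a single orbit, i.e. when $\gcd(a, n) = 1$, since the invariant sets are exactly the unions of orbits. A small check of this (toy illustration, my own names):

```python
def invariant_sets_count(n, a):
    """Number of invariant subsets of Z/nZ under T: x -> x + a.
    Invariant sets are unions of orbits, so there are 2^(number of orbits)."""
    seen, orbits = set(), 0
    for x in range(n):
        if x not in seen:
            orbits += 1
            while x not in seen:
                seen.add(x)
                x = (x + a) % n
    return 2 ** orbits

# Ergodic: only the empty set and the whole space are invariant.
assert invariant_sets_count(12, 5) == 2        # gcd(5, 12) = 1: one orbit
# Non-ergodic: gcd(4, 12) = 4 gives four orbits, hence many invariant sets.
assert invariant_sets_count(12, 4) == 2 ** 4
print("ergodicity check passed")
```

This is the finite shadow of the "invariant sets have measure zero or one" condition used to exclude examples like (iii).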

The main result of this paper is to remove the “countability” hypotheses from the above theorem, at the cost of working with opposite probability algebra systems rather than concrete systems. (We will discuss opposite probability algebras in a subsequent blog post relating to another paper in this series.)

Theorem 2 (Uncountable Mackey Zimmer theorem) Let $\Gamma$ be a group, $X$ be an opposite probability algebra measure-preserving system, and $K$ be a compact Hausdorff group. Then every (abstract) ergodic group extension of $X$ by $K$ is abstractly isomorphic to a group skew-product (by some closed subgroup $H$ of $K$), and every (abstract) ergodic homogeneous extension of $X$ is similarly abstractly isomorphic to a homogeneous skew-product.

We plan to use this result in future work to obtain uncountable versions of the Furstenberg-Zimmer and Host-Kra structure theorems.

As one might expect, one locates a proof of Theorem 2 by finding a proof of Theorem 1 that does not rely too strongly on "countable" tools, such as disintegration or measurable selection, so that all of those tools can be replaced by "uncountable" counterparts. The proof we use is based on the one given in this previous post, and begins by comparing the system $Y$ with the group extension $X \rtimes_\rho K$. As the examples (i), (ii) show, these two systems need not be isomorphic even in the ergodic case, due to the different probability measures employed. However one can relate the two after performing an additional averaging in $K$. More precisely, there is a canonical factor map $\Phi: Y \times K \to X \rtimes_\rho K$ given by the formula

$$\Phi(y, k) := (\pi(y), \theta(y) k).$$

This is a factor map not only of $\Gamma$-systems, but actually of $\Gamma \times K^{op}$-systems, where the opposite group $K^{op}$ to $K$ acts (on the left) by right-multiplication of the second coordinate (this reversal of order is why we need to use the opposite group here). The key point is that the ergodicity properties of the system $Y \times K$ are closely tied to the group $H$ that is "secretly" controlling the group extension. Indeed, in example (i), the invariant functions on $Y \times K$ take the form $(y, k) \mapsto G(\theta(y) k)$ for some measurable $G$ on $K$ that is invariant under left-multiplication by $H$, while in example (ii), the invariant functions take the analogous form with $\theta$ replaced by the twisted coordinate $\tilde \theta$. In either case, the invariant factor is isomorphic to $H \backslash K$, and can be viewed as a factor of the invariant factor of $X \times K$, which is isomorphic to $K$. Pursuing this reasoning (using an abstract ergodic theorem of Alaoglu and Birkhoff, as discussed in the previous post) one obtains the *Mackey range* $H$, and also obtains a quotient map from $Y$ to $H \backslash K$ in this process. The main remaining task is to lift this quotient map back up to a map into $K$ that stays measurable, in order to "untwist" a system that looks like (ii) to make it into one that looks like (i). In countable settings this is where a "measurable selection theorem" would ordinarily be invoked, but in the uncountable setting such theorems are not available for concrete maps. However it turns out that they still remain available for abstract maps: any abstractly measurable map from $Y$ to $H \backslash K$ has an abstractly measurable lift from $Y$ to $K$. To prove this we first use a canonical model for opposite probability algebras (which we will discuss in a companion post to this one, to appear shortly) to work with continuous maps $\tau: Y \to H \backslash K$ (on a Stone space) rather than abstractly measurable maps. The continuous map $\tau$ then induces a probability measure on $Y \times H \backslash K$, formed by pushing forward the measure on $Y$ by the graphing map $y \mapsto (y, \tau(y))$. This measure in turn has several lifts up to a probability measure on $Y \times K$; for instance, one can construct such a measure via the Riesz representation theorem, by demanding that each continuous function on $Y \times K$ be integrated by first averaging over the fiber in $K$ above $\tau(y)$ (using the Haar measure of $H$ to perform this averaging) and then integrating in $y$.
This measure does not come from a graph of any single lift $\tilde \tau: Y \to K$, but is in some sense an "average" of the entire ensemble of these lifts. But it turns out one can invoke the Krein-Milman theorem to pass to an extremal lifting measure which

*does* come from an (abstract) lift $\tilde \tau: Y \to K$, and this can be used as a substitute for a measurable selection theorem. A variant of this Krein-Milman argument can also be used to express any homogeneous extension as a quotient of a group extension, giving the second part of the Mackey-Zimmer theorem.

Ben Krause, Mariusz Mirek, and I have uploaded to the arXiv our paper Pointwise ergodic theorems for non-conventional bilinear polynomial averages. This paper is a contribution to the decades-long program of extending the classical ergodic theorems to “non-conventional” ergodic averages. Here, the focus is on pointwise convergence theorems, and in particular looking for extensions of the pointwise ergodic theorem of Birkhoff:

Theorem 1 (Birkhoff ergodic theorem) Let $(X, \mu, T)$ be a measure-preserving system (by which we mean $(X, \mu)$ is a $\sigma$-finite measure space, and $T: X \to X$ is invertible and measure-preserving), and let $f \in L^p(X)$ for any $1 \leq p < \infty$. Then the averages $\frac{1}{N} \sum_{n=1}^N f(T^n x)$ converge pointwise for $\mu$-almost every $x \in X$.

Pointwise ergodic theorems have an inherently harmonic analysis content to them, as they are closely tied to maximal inequalities. For instance, the Birkhoff ergodic theorem is closely tied to the Hardy-Littlewood maximal inequality.
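A toy numerical illustration of Birkhoff's theorem (my own sketch, not from the paper): for the irrational rotation $x \mapsto x + \alpha$ mod $1$ and a mean-zero observable, the ergodic averages converge to the space average $0$, and in fact do so rapidly for a trigonometric observable:

```python
import math

alpha = math.sqrt(2) - 1                       # irrational rotation number
x0 = 0.123                                     # an arbitrary starting point
f = lambda x: math.cos(2 * math.pi * x)        # mean-zero observable

def birkhoff_avg(N):
    """Ergodic average (1/N) sum_{n=1}^N f(T^n x0) for T: x -> x + alpha mod 1."""
    s, x = 0.0, x0
    for _ in range(N):
        x = (x + alpha) % 1.0
        s += f(x)
    return s / N

# Averages tend to the space average, which is 0 for this f.
print(birkhoff_avg(10), birkhoff_avg(10000))
```

For this particular observable the averages are controlled by a geometric series bound of size $O(1/N)$, much faster than the general almost-everywhere statement provides.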

The above theorem was generalized by Bourgain (conceding the endpoint $p = 1$, where pointwise almost everywhere convergence is now known to fail) to polynomial averages:

Theorem 2 (Pointwise ergodic theorem for polynomial averages) Let $(X, \mu, T)$ be a measure-preserving system, and let $f \in L^p(X)$ for any $1 < p < \infty$. Let $P: {\bf Z} \to {\bf Z}$ be a polynomial with integer coefficients. Then the averages $\frac{1}{N} \sum_{n=1}^N f(T^{P(n)} x)$ converge pointwise for $\mu$-almost every $x \in X$.

For bilinear averages, we have a separate 1990 result of Bourgain (for $L^\infty$ functions), extended to other $L^p$ spaces by Lacey, and with an alternate proof given by Demeter:

Theorem 3 (Pointwise ergodic theorem for two linear polynomials) Let $(X, \mu, T)$ be a measure-preserving system with finite measure, and let $f \in L^{p_1}(X)$, $g \in L^{p_2}(X)$ for some $p_1, p_2 > 1$ with $\frac{1}{p_1} + \frac{1}{p_2} \leq 1$. Then for any integers $a, b$, the averages $\frac{1}{N} \sum_{n=1}^N f(T^{an} x) g(T^{bn} x)$ converge pointwise almost everywhere.

It has been an open question for some time (see e.g., Problem 11 of this survey of Frantzikinakis) to extend this result to other bilinear ergodic averages. In our paper we are able to achieve this in the partially linear case:

Theorem 4 (Pointwise ergodic theorem for one linear and one nonlinear polynomial) Let $(X, \mu, T)$ be a measure-preserving system, and let $f \in L^{p_1}(X)$, $g \in L^{p_2}(X)$ for some $p_1, p_2 > 1$ with $\frac{1}{p_1} + \frac{1}{p_2} \leq 1$. Then for any polynomial $P: {\bf Z} \to {\bf Z}$ of degree $d \geq 2$, the averages $\frac{1}{N} \sum_{n=1}^N f(T^n x) g(T^{P(n)} x)$ converge pointwise almost everywhere.

We actually prove a bit more than this, namely a maximal function estimate and a variational estimate, together with some additional estimates that "break duality" by applying in certain ranges with $\frac{1}{p_1} + \frac{1}{p_2} > 1$, but we will not discuss these extensions here. A good model case to keep in mind is when $p_1 = p_2 = 2$ and $P(n) = n^2$ (which is the case we started with). We note that norm convergence for these averages was established much earlier by Furstenberg and Weiss (in the $L^2$ case at least), and in fact norm convergence for arbitrary polynomial averages is now known thanks to the work of Host-Kra, Leibman, and Walsh.

Our proof of Theorem 4 is much closer in spirit to Theorem 2 than to Theorem 3. The property of the averages shared in common by Theorems 2 and 4 is that they have "true complexity zero", in the sense that they can only be large if the functions involved are "major arc" or "profinite", in that they behave periodically over very long intervals (or like a linear combination of such periodic functions). In contrast, the average in Theorem 3 has "true complexity one", in the sense that it can also be large if $f, g$ are "almost periodic" (a linear combination of eigenfunctions, or plane waves), and as such all proofs of the latter theorem have relied (either explicitly or implicitly) on some form of time-frequency analysis. In principle, the true complexity zero property reduces one to the study of the behaviour of the averages on major arcs. However, until recently the available estimates to quantify this true complexity zero property were not strong enough to achieve a good reduction of this form, and even once one is in the major arc setting the bilinear averages in Theorem 4 are still quite complicated, exhibiting a mixture of both continuous and arithmetic aspects, both of which are genuinely bilinear in nature.

After applying standard reductions such as the Calderón transference principle, the key task is to establish a suitably "scale-invariant" maximal (or variational) inequality on the integer shift system (in which $X = {\bf Z}$ with counting measure, and $T$ is the standard shift $x \mapsto x+1$). A model problem is to establish the maximal inequality

$$\| \sup_N |A_N(f,g)| \|_{\ell^1({\bf Z})} \lesssim \|f\|_{\ell^2({\bf Z})} \|g\|_{\ell^2({\bf Z})} \ \ \ \ \ (1)$$

where $N$ ranges over powers of two and $A_N$ is the bilinear operator

$$A_N(f,g)(x) := \frac{1}{N} \sum_{n=1}^N f(x+n) g(x+n^2).$$

The single scale estimate

$$\| A_N(f,g) \|_{\ell^1({\bf Z})} \lesssim \|f\|_{\ell^2({\bf Z})} \|g\|_{\ell^2({\bf Z})} \ \ \ \ \ (2)$$

or equivalently (by duality) the estimate

$$|\sum_{x \in {\bf Z}} A_N(f,g)(x) h(x)| \lesssim \|f\|_{\ell^2({\bf Z})} \|g\|_{\ell^2({\bf Z})} \|h\|_{\ell^\infty({\bf Z})}$$

is immediate from Hölder's inequality; the difficulty is how to take the supremum over scales $N$.

The first step is to understand when the single-scale estimate (2) can come close to equality. A key example to keep in mind is when $f(x) = e(\alpha x) F(x)$ and $g(x) = e(\beta x) G(x)$, where $\alpha, \beta$ are rationals with a common small denominator $q$ chosen so that the two phases cooperate with the pattern $(x+n, x+n^2)$, $F$ is a smooth cutoff to an interval of length $\sim N$, and $G$ is also supported on an interval at that scale and behaves like a constant on intervals of length $\sim N$. Then one can check that (barring some unusual cancellation) (2) is basically sharp for this example. A remarkable result of Peluse and Prendiville (generalised to arbitrary nonlinear polynomials $P$ by Peluse) asserts, roughly speaking, that this example is basically the only way in which (2) can be saturated, at least when $f, g$ are supported on a common interval and are normalised in $\ell^\infty$ rather than $\ell^2$. (Strictly speaking, the above paper of Peluse and Prendiville only says something like this regarding one of the two factors; the corresponding statement for the other was established in a subsequent paper of Peluse and Prendiville.) The argument requires tools from additive combinatorics, such as the Gowers uniformity norms, and hinges in particular on the "degree lowering argument" of Peluse and Prendiville, which I discussed in this previous blog post. Crucially for our application, the estimates are very quantitative, with all bounds being polynomial in the ratio between the left and right hand sides of (2) (or more precisely, the $\ell^\infty$-normalized version of (2)).
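The Gauss sum estimates lurking behind such major-arc examples can be seen in a complete quadratic exponential sum: averaging a linear phase along the squares over a full period mod an odd prime $q$ has magnitude exactly $q^{-1/2}$. A quick numerical check of this classical fact (my own illustration, not code from the paper):

```python
import cmath, math

def quad_average(q, a, xi):
    """E_{n in Z/qZ} e( xi * a * n^2 / q ): a normalised complete quadratic
    (Gauss-type) exponential sum."""
    return sum(cmath.exp(2j * math.pi * xi * a * n * n / q)
               for n in range(q)) / q

q = 101   # an odd prime
val = abs(quad_average(q, 1, 1))
# Classical Gauss sum evaluation: |sum_n e(n^2/q)| = sqrt(q), so the
# normalised average has magnitude exactly q^{-1/2}.
print(val, q ** -0.5)
```

This square-root cancellation is what makes a quadratic average "small" unless the frequency sits on a major arc with small denominator.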

For our applications we had to extend the inverse theory of Peluse and Prendiville to an $\ell^p$ theory. This turned out to require a certain amount of "sleight of hand". Firstly, one can dualise the theorem of Peluse and Prendiville to show that the "dual function"

$$D_N(f,h)(y) := \frac{1}{N} \sum_{n=1}^N f(y + n - n^2) h(y - n^2)$$

can be well approximated in $\ell^2$ by a function that has Fourier support on "major arcs" if $f, h$ enjoy $\ell^\infty$ control. To get the required extension in the $\ell^p$ aspect one has to improve the control on the error from $\ell^2$ to smaller exponents; this can be done by some interpolation theory combined with the useful Fourier multiplier theory of Ionescu and Wainger on major arcs. Then, by further interpolation using recent $\ell^p$-improving estimates of Han, Kovač, Lacey, Madrid, and Yang for linear averages such as $\frac{1}{N} \sum_{n=1}^N g(x + n^2)$, one can relax the $\ell^\infty$ hypothesis to an $\ell^p$ hypothesis, and then by undoing the duality one obtains a good inverse theorem for (2) for the first function; a modification of the arguments also gives something similar for the second function.

Using these inverse theorems (and the Ionescu-Wainger multiplier theory) one still has to understand the "major arc" portion of (1); a model case arises when $f, g$ have Fourier transform supported near rational numbers with denominators $\sim 2^l$ for some moderately large $l$. The inverse theory gives good control (with an exponential decay in $l$) on individual scales $N$, and one can leverage this with a Rademacher-Menshov type argument (see e.g., this blog post) and some closer analysis of the bilinear Fourier symbol of $A_N$ to eventually handle all "small" scales, with $N$ ranging up to a threshold growing rapidly in $l$. For the "large" scales, it becomes feasible to place all the major arcs simultaneously under a single common denominator $Q$, and then a quantitative version of the Shannon sampling theorem allows one to transfer the problem from the integers to the locally compact abelian group ${\bf R} \times {\bf Z}/Q{\bf Z}$. Actually it was conceptually clearer for us to work instead with the adelic integers ${\bf A}_{\bf Z} := {\bf R} \times \hat{\bf Z}$, where $\hat{\bf Z}$ is the profinite completion of the integers, the inverse limit of the cyclic groups ${\bf Z}/Q{\bf Z}$. Once one transfers to the adelic integers, the bilinear operators involved split up as tensor products of the "continuous" bilinear operator

$$A^{\bf R}_t(F,G)(x) := \frac{1}{t} \int_0^t F(x+s) G(x+s^2)\ ds$$

on ${\bf R}$, and the "arithmetic" bilinear operator

$$A^{\hat{\bf Z}}(F,G)(x) := \int_{\hat{\bf Z}} F(x+y) G(x+y^2)\ d\mu(y)$$

on the profinite integers $\hat{\bf Z}$, equipped with probability Haar measure $\mu$. After a number of standard manipulations (interpolation, Fubini's theorem, Hölder's inequality, variational inequalities, etc.) the task of estimating this tensor product boils down to establishing an $L^p$-improving estimate for the arithmetic bilinear operator. Splitting the profinite integers $\hat{\bf Z}$ into the product of the $p$-adic integers ${\bf Z}_p$, it suffices to establish this claim for each prime $p$ separately (so long as we keep the implied constant equal to $1$ for $p$ sufficiently large). This turns out to be possible using an arithmetic version of the Peluse-Prendiville inverse theorem as well as an arithmetic $L^p$-improving estimate for linear averaging operators, which ultimately arises from some estimates on the distribution of polynomials on the $p$-adic field ${\bf Q}_p$, which are a variant of some estimates of Kowalski and Wright.

Just a short post to note that this year's Abel prize has been awarded jointly to Hillel Furstenberg and Grigory Margulis "for pioneering the use of methods from probability and dynamics in group theory, number theory and combinatorics". I was not involved in the decision making process of the Abel committee this year, but I certainly feel that the contributions of both mathematicians are worthy of the prize. Certainly both mathematicians have influenced my own work (for instance, Furstenberg's proof of Szemerédi's theorem ended up being a key influence in my result with Ben Green that the primes contain arbitrarily long arithmetic progressions); see for instance these blog posts mentioning Furstenberg, and these blog posts mentioning Margulis.

Asgar Jamneshan and I have just uploaded to the arXiv our paper "An uncountable Moore-Schmidt theorem". This paper revisits a classical theorem of Moore and Schmidt in measurable cohomology of measure-preserving systems. To state the theorem, let $X = (X, {\mathcal X}, \mu)$ be a probability space, and $\mathrm{Aut}(X, {\mathcal X}, \mu)$ be the group of measure-preserving automorphisms of this space, that is to say the invertible bimeasurable maps $T: X \to X$ that preserve the measure $\mu$: $T_* \mu = \mu$. To avoid some ambiguity later in this post when we introduce abstract analogues of measure theory, we will refer to measurable maps as *concrete measurable maps*, and measurable spaces as *concrete measurable spaces*. (One could also call $X = (X, {\mathcal X}, \mu)$ a concrete probability space, but we will not need to do so here as we will not be working explicitly with abstract probability spaces.)

Let $\Gamma$ be a discrete group. A *(concrete) measure-preserving action* of $\Gamma$ on $X$ is a group homomorphism $\gamma \mapsto T^\gamma$ from $\Gamma$ to $\mathrm{Aut}(X, {\mathcal X}, \mu)$, thus $T^1$ is the identity map and $T^{\gamma_1} \circ T^{\gamma_2} = T^{\gamma_1 \gamma_2}$ for all $\gamma_1, \gamma_2 \in \Gamma$. A large portion of ergodic theory is concerned with the study of such measure-preserving actions, especially in the classical case when $\Gamma$ is the integers ${\bf Z}$ (with the additive group law).

Let $K = (K, +)$ be a compact Hausdorff abelian group, which we can endow with the Borel $\sigma$-algebra ${\mathcal B}(K)$. A *(concrete measurable) $K$-cocycle* is a collection $\rho = (\rho_\gamma)_{\gamma \in \Gamma}$ of concrete measurable maps $\rho_\gamma: X \to K$ obeying the *cocycle equation*

$$\rho_{\gamma_1 \gamma_2}(x) = \rho_{\gamma_1}(T^{\gamma_2} x) + \rho_{\gamma_2}(x)$$

for $\mu$-almost every $x \in X$, and all $\gamma_1, \gamma_2 \in \Gamma$. (Here we are glossing over a measure-theoretic subtlety that we will return to later in this post – see if you can spot it before then!) Cocycles arise naturally in the theory of group extensions of dynamical systems; in particular (and ignoring the aforementioned subtlety), each cocycle induces a measure-preserving action $T_\rho$ on $X \times K$ (which we endow with the product of $\mu$ with Haar probability measure on $K$), defined by

$$T_\rho^\gamma(x, k) := (T^\gamma x, k + \rho_\gamma(x)).$$

This connection with group extensions was the original motivation for our study of measurable cohomology, but is not the focus of the current paper.

A special case of a $K$-valued cocycle is a *(concrete measurable) $K$-valued coboundary*, in which $\rho_\gamma$ for each $\gamma \in \Gamma$ takes the special form

$$\rho_\gamma(x) = F(T^\gamma x) - F(x)$$

for $\mu$-almost every $x \in X$, where $F: X \to K$ is some measurable function; note that (ignoring the aforementioned subtlety), every family of functions of this form is automatically a concrete measurable $K$-valued cocycle. One of the first basic questions in measurable cohomology is to try to characterize which $K$-valued cocycles are in fact $K$-valued coboundaries. This is a difficult question in general. However, there is a general result of Moore and Schmidt that at least allows one to reduce to the model case when $K$ is the unit circle ${\bf T} = {\bf R}/{\bf Z}$, by taking advantage of the Pontryagin dual group $\hat K$ of characters of $K$, that is to say the collection of continuous homomorphisms $\hat k: K \to {\bf T}$ to the unit circle. More precisely, we have

Theorem 1 (Countable Moore-Schmidt theorem) Let $\Gamma$ be a discrete group acting in a concrete measure-preserving fashion on a probability space $X$. Let $K$ be a compact Hausdorff abelian group. Assume the following additional hypotheses:

- (i) $\Gamma$ is at most countable.
- (ii) $X$ is a standard Borel space.
- (iii) $K$ is metrisable.

Then a $K$-valued concrete measurable cocycle $\rho = (\rho_\gamma)_{\gamma \in \Gamma}$ is a concrete coboundary if and only if for each character $\hat k \in \hat K$, the ${\bf T}$-valued cocycles $\hat k \circ \rho = (\hat k \circ \rho_\gamma)_{\gamma \in \Gamma}$ are concrete coboundaries.

The hypotheses (i), (ii), (iii) are saying in some sense that the data $\Gamma, X, K$ are not too "large"; in all three cases they are saying in some sense that the data are only "countably complicated". For instance, (iii) is equivalent to $K$ being second countable, and (ii) is equivalent to $X$ being modeled by a complete separable metric space. It is because of this restriction that we refer to this result as a "countable" Moore-Schmidt theorem. This theorem is a useful tool in several other applications, such as the Host-Kra structure theorem for ergodic systems; I hope to return to these subsequent applications in a future post.
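The "only if" direction of Theorem 1 is routine: composing a coboundary with any character gives a ${\bf T}$-valued coboundary, with transfer function $\hat k \circ F$. This can be checked in a finite toy model (my own sketch: $X$ a finite cycle with the shift action, $K$ a finite abelian group, and ${\bf T}$ represented by `Fraction`s mod $1$ to keep arithmetic exact):

```python
from fractions import Fraction

m = 9                                   # X = Z/9Z, shift action x -> x + n
F = [((2 * x) % 4, (x * x) % 6) for x in range(m)]  # arbitrary F: X -> K = Z/4 x Z/6

def rho(n, x):
    """Coboundary cocycle rho_n(x) = F(T^n x) - F(x), valued in K."""
    a1, b1 = F[(x + n) % m]
    a0, b0 = F[x]
    return ((a1 - a0) % 4, (b1 - b0) % 6)

def char(j, l, ab):
    """Character (j, l) of K, valued in T = R/Z (exact Fractions mod 1)."""
    a, b = ab
    return (Fraction(j * a, 4) + Fraction(l * b, 6)) % 1

# Each k-hat o rho is a T-valued coboundary with transfer function k-hat o F:
for j in range(4):
    for l in range(6):
        for x in range(m):
            for n in range(1, 4):
                lhs = char(j, l, rho(n, x))
                rhs = (char(j, l, F[(x + n) % m]) - char(j, l, F[x])) % 1
                assert lhs == rhs
print("easy direction verified on the toy model")
```

The content of the theorem is of course the converse, sketched below, where the countability hypotheses do real work.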

Let us very briefly sketch the main ideas of the proof of Theorem 1. Ignore for now issues of measurability, and pretend that something that holds almost everywhere in fact holds everywhere. The hard direction is to show that if each $\hat k \circ \rho$ is a coboundary, then so is $\rho$. By hypothesis, we then have an equation of the form

$$\hat k \circ \rho_\gamma(x) = \alpha_{\hat k}(T^\gamma x) - \alpha_{\hat k}(x) \ \ \ \ \ (2)$$

for all $\hat k \in \hat K$, $\gamma \in \Gamma$, $x \in X$ and some functions $\alpha_{\hat k}: X \to {\bf T}$, and our task is then to produce a function $F: X \to K$ for which

$$\rho_\gamma(x) = F(T^\gamma x) - F(x)$$

for all $\gamma \in \Gamma$, $x \in X$.

Comparing the two equations, the task would be easy if we could find an $F: X \to K$ for which

$$\hat k \circ F(x) = \alpha_{\hat k}(x) \ \ \ \ \ (3)$$

for all $\hat k \in \hat K$, $x \in X$. However there is an obstruction to this: the left-hand side of (3) is additive in $\hat k$, so the right-hand side would have to be also in order to obtain such a representation. In other words, for this strategy to work, one would have to first establish the identity

$$\alpha_{\hat k_1 + \hat k_2}(x) = \alpha_{\hat k_1}(x) + \alpha_{\hat k_2}(x) \ \ \ \ \ (4)$$

for all $\hat k_1, \hat k_2 \in \hat K$, $x \in X$. On the other hand, the good news is that if we somehow manage to obtain the equation (4), then we can obtain a function $F$ obeying (3), thanks to Pontryagin duality, which gives a one-to-one correspondence between $K$ and the homomorphisms of the (discrete) group $\hat K$ to ${\bf T}$.

Now, it turns out that one cannot derive the equation (4) directly from the given information (2). However, the left-hand side of (2) is additive in $\hat k$, so the right-hand side must be also. Manipulating this fact, we eventually arrive at

$$(\alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2})(T^\gamma x) = (\alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2})(x).$$

In other words, we don't get to show that the left-hand side of (4) vanishes, but we do at least get to show that it is $\Gamma$-invariant. Now let us assume for sake of argument that the action of $\Gamma$ is ergodic, which (ignoring issues about sets of measure zero) basically asserts that the only $\Gamma$-invariant functions are constant. So now we get a weaker version of (4), namely

$$\alpha_{\hat k_1 + \hat k_2}(x) = \alpha_{\hat k_1}(x) + \alpha_{\hat k_2}(x) + c_{\hat k_1, \hat k_2} \ \ \ \ \ (5)$$

for some constants $c_{\hat k_1, \hat k_2} \in {\bf T}$.

Now we need to eliminate the constants. This can be done by the following group-theoretic projection. Let $M$ denote the space of concrete measurable maps $\alpha$ from $X$ to ${\bf T}$, up to almost everywhere equivalence; this is an abelian group where the various terms in (5) naturally live. Inside this group we have the subgroup ${\bf T}$ of constant functions (up to almost everywhere equivalence); this is where the right-hand side of (5) lives. Because ${\bf T}$ is a divisible group, there is an application of Zorn's lemma (a good exercise for those who are not acquainted with these things) to show that there exists a retraction $\pi: M \to {\bf T}$, that is to say a group homomorphism that is the identity on the subgroup ${\bf T}$. We can use this retraction, or more precisely the complement $\alpha \mapsto \alpha - \pi(\alpha)$, to eliminate the constant in (5). Indeed, if we set

$$\tilde \alpha_{\hat k} := \alpha_{\hat k} - \pi(\alpha_{\hat k})$$

then from (5) we see that

$$\tilde \alpha_{\hat k_1 + \hat k_2} = \tilde \alpha_{\hat k_1} + \tilde \alpha_{\hat k_2}$$

while from (2) one has

$$\hat k \circ \rho_\gamma(x) = \tilde \alpha_{\hat k}(T^\gamma x) - \tilde \alpha_{\hat k}(x)$$

and now the previous strategy works with $\alpha_{\hat k}$ replaced by $\tilde \alpha_{\hat k}$. This concludes the sketch of proof of Theorem 1.
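The constant-elimination trick can be made completely concrete in a finite toy model, where evaluation at a fixed point serves as the retraction (it is a group homomorphism fixing constants; in the genuine measurable setting point evaluation is not well defined, which is exactly why Zorn's lemma is needed instead). Sketch with my own toy data:

```python
from fractions import Fraction

m, kmod = 5, 3                        # X = Z/5Z with shift x -> x + 1; K = Z/3Z
F = [(x * x) % kmod for x in range(m)]            # a "true" transfer function
d = {0: Fraction(0), 1: Fraction(1, 7), 2: Fraction(2, 7)}  # artificial constants

def alpha(k, x):
    """Transfer functions for the characters, polluted by the constants d[k]."""
    return (Fraction(k * F[x], kmod) + d[k]) % 1

# The coboundary equation (2) holds regardless: the constants d[k] cancel.
for k in range(kmod):
    for x in range(m):
        rho_kx = Fraction(k * ((F[(x + 1) % m] - F[x]) % kmod), kmod) % 1
        assert rho_kx == (alpha(k, (x + 1) % m) - alpha(k, x)) % 1

# Additivity (4) fails by constants as in (5), but subtracting the
# "retraction" pi(alpha) := alpha(0) repairs it:
tilde = lambda k, x: (alpha(k, x) - alpha(k, 0)) % 1
for k1 in range(kmod):
    for k2 in range(kmod):
        for x in range(m):
            assert (tilde(k1, x) + tilde(k2, x)) % 1 == tilde((k1 + k2) % kmod, x)
print("constants eliminated; additivity restored")
```

The corrected functions still satisfy the coboundary equation, since the subtracted terms are constants in $x$.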

In making the above argument rigorous, the hypotheses (i)-(iii) are used in several places. For instance, to reduce to the ergodic case one relies on the ergodic decomposition, which requires the hypothesis (ii). Also, most of the above equations only hold outside of a set of measure zero, and one needs the hypothesis (i) and the hypothesis (iii) (which is equivalent to being at most countable) to avoid the problem that an uncountable union of sets of measure zero could have positive measure (or fail to be measurable at all).

My co-author Asgar Jamneshan and I are working on a long-term project to extend many results in ergodic theory (such as the aforementioned Host-Kra structure theorem) to “uncountable” settings in which hypotheses analogous to (i)-(iii) are omitted; thus we wish to consider actions of uncountable groups, on spaces that are not standard Borel, and cocycles taking values in groups that are not metrisable. Such uncountable contexts naturally arise when trying to apply ergodic theory techniques to combinatorial problems (such as the inverse conjecture for the Gowers norms), as one often relies on the ultraproduct construction (or something similar) to generate an ergodic theory translation of these problems, and these constructions usually give “uncountable” objects rather than “countable” ones. (For instance, the ultraproduct of finite groups is a hyperfinite group, which is usually uncountable.) This paper marks the first step in this project by extending the Moore-Schmidt theorem to the uncountable setting.

If one simply drops the hypotheses (i)-(iii) and tries to prove the Moore-Schmidt theorem, several serious difficulties arise. We have already mentioned the loss of the ergodic decomposition and the possibility that one has to control an uncountable union of null sets. But there is in fact a more basic problem when one deletes (iii): the addition operation , while still continuous, can fail to be measurable as a map from to ! Thus for instance the sum of two measurable functions need not remain measurable, which makes even the very definition of a measurable cocycle or measurable coboundary problematic (or at least unnatural). This phenomenon is known as the *Nedoma pathology*. A standard example arises when is the uncountable torus , endowed with the product topology. Crucially, the Borel -algebra generated by this uncountable product is *not* the product of the factor Borel -algebras (the discrepancy ultimately arises from the fact that topologies permit uncountable unions, but -algebras do not); relating to this, the product -algebra is *not* the same as the Borel -algebra , but is instead a strict sub-algebra. If the group operations on were measurable, then the diagonal set

would be measurable in . But it is an easy exercise in manipulation of -algebras to show that if are any two measurable spaces and is measurable in , then the fibres of are contained in some countably generated subalgebra of . Thus if were -measurable, then all the points of would lie in a single countably generated -algebra. But the cardinality of such an algebra is at most while the cardinality of is , and Cantor’s theorem then gives a contradiction.
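The cardinality count at the end of this argument can be spelled out in standard cardinal arithmetic (my reconstruction of the stripped formulas, writing the uncountable torus as a product over an index set A of continuum cardinality):

```latex
% A countably generated \sigma-algebra has at most continuum many elements, so the
% points it can separate number at most
2^{\aleph_0},
% while the uncountable torus G = (\mathbb{R}/\mathbb{Z})^{A} with |A| = 2^{\aleph_0}
% has cardinality
|G| = \bigl(2^{\aleph_0}\bigr)^{2^{\aleph_0}} = 2^{2^{\aleph_0}}.
% Cantor's theorem gives 2^{\aleph_0} < 2^{2^{\aleph_0}}, the desired contradiction.
```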

To resolve this problem, we give a coarser -algebra than the Borel -algebra, namely the *Baire -algebra* , thus coarsening the measurable space structure on to a new measurable space . In the case of compact Hausdorff abelian groups, can be defined as the -algebra generated by the characters ; for more general compact abelian groups, one can define as the -algebra generated by all continuous maps into metric spaces. This -algebra is equal to when is metrisable but can be smaller for other . With this measurable structure, becomes a measurable group; it seems that, once one leaves the metrisable world, is a superior (or at least equally good) space to work with than for analysis, as it avoids the Nedoma pathology. (For instance, from Plancherel’s theorem, we see that if is the Haar probability measure on , then (thus, every -measurable set is equivalent modulo -null sets to a -measurable set), so there is no damage to Plancherel caused by passing to the Baire -algebra.)

Passing to the Baire -algebra fixes the most severe problems with an uncountable Moore-Schmidt theorem, but one is still faced with an issue of having to potentially take an uncountable union of null sets. To avoid this sort of problem, we pass to the framework of *abstract measure theory*, in which we remove explicit mention of “points” and can easily delete all null sets at a very early stage of the formalism. In this setup, the category of concrete measurable spaces is replaced with the larger category of *abstract measurable spaces*, which we formally define as the opposite category of the category of -algebras (with Boolean algebra homomorphisms). Thus, we define an *abstract measurable space* to be an object of the form , where is an (abstract) -algebra and is a formal placeholder symbol that signifies use of the opposite category, and an *abstract measurable map* is an object of the form , where is a Boolean algebra homomorphism and is again used as a formal placeholder; we call the *pullback map* associated to . [UPDATE: It turns out that this definition of a measurable map led to technical issues. In a forthcoming revision of the paper we also impose the requirement that the abstract measurable map be -complete (i.e., it respects countable joins).] The composition of two abstract measurable maps , is defined by the formula , or equivalently .

Every concrete measurable space can be identified with an abstract counterpart , and similarly every concrete measurable map can be identified with an abstract counterpart , where is the pullback map . Thus the category of concrete measurable spaces can be viewed as a subcategory of the category of abstract measurable spaces. The advantage of working in the abstract setting is that it gives us access to more spaces that could not be directly defined in the concrete setting. Most importantly for us, we have a new abstract space, the *opposite measure algebra* of , defined as where is the ideal of null sets in . Informally, is the space with all the null sets removed; there is a canonical abstract embedding map , which allows one to convert any concrete measurable map into an abstract one . One can then define the notion of an abstract action, abstract cocycle, and abstract coboundary by replacing every occurrence of the category of concrete measurable spaces with their abstract counterparts, and replacing with the opposite measure algebra ; see the paper for details. Our main theorem is then

Theorem 2 (Uncountable Moore-Schmidt theorem) Let be a discrete group acting abstractly on a -finite measure space . Let be a compact Hausdorff abelian group. Then a -valued abstract measurable cocycle is an abstract coboundary if and only if for each character , the -valued cocycles are abstract coboundaries.

With the abstract formalism, the proof of the uncountable Moore-Schmidt theorem is almost identical to the countable one (in fact we were able to make some simplifications, such as avoiding the use of the ergodic decomposition). A key tool is what we call a “conditional Pontryagin duality” theorem, which asserts that if one has an abstract measurable map for each obeying the identity for all , then there is an abstract measurable map such that for all . This is derived from the usual Pontryagin duality and some other tools, most notably the completeness of the -algebra of , and the Sikorski extension theorem.

We feel that it is natural to stay within the abstract measure theory formalism whenever dealing with uncountable situations. However, it is still an interesting question as to when one can guarantee that the abstract objects constructed in this formalism are representable by concrete analogues. The basic questions in this regard are:

- (i) Suppose one has an abstract measurable map into a concrete measurable space. Does there exist a representation of by a concrete measurable map ? Is it unique up to almost everywhere equivalence?
- (ii) Suppose one has a concrete cocycle that is an abstract coboundary. When can it be represented by a concrete coboundary?

For (i) the answer is somewhat interesting (as I learned after posing this MathOverflow question):

- If does not separate points, or is not compact metrisable or Polish, there can be counterexamples to uniqueness. If is not compact or Polish, there can be counterexamples to existence.
- If is a compact metric space or a Polish space, then one always has existence and uniqueness.
- If is a compact Hausdorff abelian group, one always has existence.
- If is a complete measure space, then one always has existence (from a theorem of Maharam).
- If is the unit interval with the Borel -algebra and Lebesgue measure, then one has existence for all compact Hausdorff assuming the continuum hypothesis (from a theorem of von Neumann) but existence can fail under other extensions of ZFC (from a theorem of Shelah, using the method of forcing).
- For more general , existence for all compact Hausdorff is equivalent to the existence of a lifting from the -algebra to (or, in the language of abstract measurable spaces, the existence of an abstract retraction from to ).
- It is a long-standing open question (posed for instance by Fremlin) whether it is relatively consistent with ZFC that existence holds whenever is compact Hausdorff.

Our understanding of (ii) is much less complete:

- If is metrisable, the answer is “always” (which among other things establishes the countable Moore-Schmidt theorem as a corollary of the uncountable one).
- If is at most countable and is a complete measure space, then the answer is again “always”.

In view of the answers to (i), I would not be surprised if the full answer to (ii) was also sensitive to axioms of set theory. However, such set theoretic issues seem to be almost completely avoided if one sticks with the abstract formalism throughout; they only arise when trying to pass back and forth between the abstract and concrete categories.

I’ve just uploaded to the arXiv my paper “Almost all Collatz orbits attain almost bounded values“, submitted to the proceedings of the Forum of Mathematics, Pi. In this paper I returned to the topic of the notorious Collatz conjecture (also known as the conjecture), which I previously discussed in this blog post. This conjecture can be phrased as follows. Let denote the positive integers (with the natural numbers), and let be the map defined by setting equal to when is odd and when is even. Let be the minimal element of the Collatz orbit . Then we have

Conjecture 1 (Collatz conjecture) One has for all .
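The Collatz map and the orbit minimum described above are easy to implement; the following minimal sketch (my own illustration, using the convention stated in the post: multiply odd numbers by three and add one, halve even numbers) makes the conjecture concrete:

```python
# Collatz map: Col(n) = 3n + 1 for odd n, n / 2 for even n.
def collatz(n: int) -> int:
    return 3 * n + 1 if n % 2 == 1 else n // 2

def col_min(n: int, max_steps: int = 10**6) -> int:
    """Minimal element of the Collatz orbit n, Col(n), Col^2(n), ...
    Truncated at max_steps, since termination is exactly what is conjectured."""
    best = n
    for _ in range(max_steps):
        if n == 1:
            break
        n = collatz(n)
        best = min(best, n)
    return best
```

The Collatz conjecture then asserts that `col_min(n) == 1` for every positive integer `n`; this is easily verified numerically for small `n` but open in general.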

Establishing the conjecture for all remains out of reach of current techniques (for instance, as discussed in the previous blog post, it is basically at least as difficult as Baker’s theorem, all known proofs of which are quite difficult). However, the situation is more promising if one is willing to settle for results which only hold for “most” in some sense. For instance, it is a result of Krasikov and Lagarias that

for all sufficiently large . In another direction, it was shown by Terras that for almost all (in the sense of natural density), one has . This was then improved by Allouche to for almost all and any fixed , and extended later by Korec to cover all . In this paper we obtain the following further improvement (at the cost of weakening natural density to logarithmic density):

Theorem 2 Let be any function with . Then we have for almost all (in the sense of logarithmic density).

Thus for instance one has for almost all (in the sense of logarithmic density).

The difficulty here is that one usually only expects to establish “local-in-time” results that control the evolution for times that only get as large as a small multiple of ; the aforementioned results of Terras, Allouche, and Korec, for instance, are of this type. However, to get all the way down to one needs something more like an “(almost) global-in-time” result, where the evolution remains under control for so long that the orbit has nearly reached the bounded state .

However, as observed by Bourgain in the context of nonlinear Schrödinger equations, one can iterate “almost sure local wellposedness” type results (which give local control for almost all initial data from a given distribution) into “almost sure (almost) global wellposedness” type results if one is fortunate enough to draw one’s data from an *invariant measure* for the dynamics. To illustrate the idea, let us take Korec’s aforementioned result that if one picks at random an integer from a large interval , then in most cases, the orbit of will eventually move into the interval . Similarly, if one picks an integer at random from , then in most cases, the orbit of will eventually move into . It is then tempting to concatenate the two statements and conclude that for most in , the orbit will eventually move into . Unfortunately, this argument does not quite work, because by the time the orbit from a randomly drawn reaches , the distribution of the final value is unlikely to be close to being uniformly distributed on , and in particular could potentially concentrate almost entirely in the exceptional set of that do not make it into . The point here is that the uniform measure on is not transported by Collatz dynamics to anything resembling the uniform measure on .

So, one now needs to locate a measure which has better invariance properties under the Collatz dynamics. It turns out to be technically convenient to work with a standard acceleration of the Collatz map known as the *Syracuse map* , defined on the odd numbers by setting , where is the largest power of that divides . (The advantage of using the Syracuse map over the Collatz map is that it performs precisely one multiplication of at each iteration step, which makes the map better behaved when performing “-adic” analysis.)
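A minimal implementation of the Syracuse map just described (my own sketch): for odd n, strip every factor of two from 3n + 1 in a single step, so each application performs exactly one multiplication by three and returns an odd number.

```python
# Syracuse map: Syr(n) = (3n + 1) / 2^a, where 2^a is the largest
# power of 2 dividing 3n + 1.  Defined on (and returning) odd integers.
def syracuse(n: int) -> int:
    assert n % 2 == 1, "the Syracuse map is defined on odd integers"
    m = 3 * n + 1
    while m % 2 == 0:   # divide out all factors of 2 at once
        m //= 2
    return m
```

For instance, 3 maps to 5 (since 3·3+1 = 10 = 2·5), and 5 maps directly to 1 (since 16 is a pure power of 2).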

When viewed -adically, we soon see that iterations of the Syracuse map become somewhat irregular. Most obviously, is never divisible by . A little less obviously, is twice as likely to equal mod as it is to equal mod . This is because for a randomly chosen odd , the number of times that divides can be seen to have a geometric distribution of mean – it equals any given value with probability . Such a geometric random variable is twice as likely to be odd as to be even, which is what gives the above irregularity. There are similar irregularities modulo higher powers of . For instance, one can compute that for large random odd , will take the residue classes with probabilities

respectively. More generally, for any , will be distributed according to the law of a random variable on that we call a *Syracuse random variable*, and can be described explicitly as

where are iid copies of a geometric random variable of mean .
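The mod-3 irregularity described above — the Syracuse map never lands in the residue class divisible by 3, and hits one of the two remaining classes twice as often as the other — can be checked empirically. The snippet below is my own illustration (not code from the paper), with the Syracuse map repeated inline so the snippet is self-contained:

```python
# Tabulate Syr(n) mod 3 over a complete range of odd n.
def syracuse(n: int) -> int:
    m = 3 * n + 1
    while m % 2 == 0:
        m //= 2
    return m

counts = {0: 0, 1: 0, 2: 0}
for n in range(1, 2**18, 2):          # all odd n below 2^18
    counts[syracuse(n) % 3] += 1

# Since 3n + 1 is 1 mod 3, the residue of Syr(n) mod 3 is determined by the
# parity of the 2-adic valuation of 3n + 1; the geometric distribution of that
# valuation makes one nonzero class (2 mod 3, as it turns out) twice as likely
# as the other, and the class 0 mod 3 is never attained.
print(counts)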

In view of this, any proposed “invariant” (or approximately invariant) measure (or family of measures) for the Syracuse dynamics should take this -adic irregularity of distribution into account. It turns out that one can use the Syracuse random variables to construct such a measure, but only if these random variables stabilise in the limit in a certain total variation sense. More precisely, in the paper we establish the estimate

for any and any . This type of stabilisation is plausible from entropy heuristics – the tuple of geometric random variables that generates has Shannon entropy , which is significantly larger than the total entropy of the uniform distribution on , so we expect a lot of “mixing” and “collision” to occur when converting the tuple to ; these heuristics can be supported by numerics (which I was able to work out up to about before running into memory and CPU issues), but it turns out to be surprisingly delicate to make this precise.

A first hint of how to proceed comes from the elementary number theory observation (easily proven by induction) that the rational numbers

are all distinct as vary over tuples in . Unfortunately, the process of reducing mod creates a lot of collisions (as must happen from the pigeonhole principle); however, by a simple “Lefschetz principle” type argument one can at least show that the reductions

are mostly distinct for “typical” (as drawn using the geometric distribution) as long as is a bit smaller than (basically because the rational number appearing in (3) then typically takes a form like with an integer between and ). This analysis of the component (3) of (1) is already enough to get quite a bit of spreading on (roughly speaking, when the argument is optimised, it shows that this random variable cannot concentrate in any subset of of density less than for some large absolute constant ). To get from this to a stabilisation property (2) we have to exploit the mixing effects of the remaining portion of (1) that does not come from (3). After some standard Fourier-analytic manipulations, matters then boil down to obtaining non-trivial decay of the characteristic function of , and more precisely in showing that

for any and any that is not divisible by .

If the random variable (1) were the sum of independent terms, one could express this characteristic function as something like a Riesz product, which would be straightforward to estimate well. Unfortunately, the terms in (1) are loosely coupled together, and so the characteristic function does not immediately factor into a Riesz product. However, if one groups adjacent terms in (1) together, one can rewrite it (assuming is even for sake of discussion) as

where . The point here is that after conditioning on the to be fixed, the random variables remain independent (though the distribution of each depends on the value that we conditioned to), and so the above expression is a *conditional* sum of independent random variables. This lets one express the characteristic function of (1) as an *averaged* Riesz product. One can use this to establish the bound (4) as long as one can show that the expression

is not close to an integer for a moderately large number (, to be precise) of indices . (Actually, for technical reasons we have to also restrict to those for which , but let us ignore this detail here.) To put it another way, if we let denote the set of pairs for which

we have to show that (with overwhelming probability) the random walk

(which we view as a two-dimensional renewal process) contains at least a few points lying outside of .

A little bit of elementary number theory and combinatorics allows one to describe the set as the union of “triangles” with a certain non-zero separation between them. If the triangles were all fairly small, then one expects the renewal process to visit at least one point outside of after passing through any given such triangle, and it then becomes relatively easy to show that the renewal process usually has the required number of points outside of . The most difficult case is when the renewal process passes through a particularly large triangle in . However, it turns out that large triangles enjoy particularly good separation properties, and in particular after passing through a large triangle one is likely to encounter nothing but small triangles for a while. After making these heuristics more precise, one is finally able to get enough points on the renewal process outside of that one can finish the proof of (4), and thus Theorem 2.

Joni Teräväinen and I have just uploaded to the arXiv our paper “The structure of correlations of multiplicative functions at almost all scales, with applications to the Chowla and Elliott conjectures“. This is a sequel to our previous paper that studied logarithmic correlations of the form

where were bounded multiplicative functions, were fixed shifts, was a quantity going off to infinity, and was a generalised limit functional. Our main technical result asserted that these correlations were necessarily the uniform limit of periodic functions . Furthermore, if (weakly) pretended to be a Dirichlet character , then the could be chosen to be –isotypic in the sense that whenever are integers with coprime to the periods of and ; otherwise, if did not weakly pretend to be any Dirichlet character , then vanished completely. This was then used to verify several cases of the logarithmically averaged Elliott and Chowla conjectures.

The purpose of this paper was to investigate the extent to which the methods could be extended to non-logarithmically averaged settings. For our main technical result, we now considered the unweighted averages

where is an additional parameter. Our main result was now as follows. If did not weakly pretend to be a twisted Dirichlet character , then converged to zero on (doubly logarithmic) average as . If instead did pretend to be such a twisted Dirichlet character, then converged on (doubly logarithmic) average to a limit of -isotypic functions . Thus, roughly speaking, one has the approximation

for most .

Informally, this says that at almost all scales (where “almost all” means “outside of a set of logarithmic density zero”), the non-logarithmic averages behave much like their logarithmic counterparts except for a possible additional twisting by an Archimedean character (which interacts with the Archimedean parameter in much the same way that the Dirichlet character interacts with the non-Archimedean parameter ). One consequence of this is that most of the recent results on the logarithmically averaged Chowla and Elliott conjectures can now be extended to their non-logarithmically averaged counterparts, so long as one excludes a set of exceptional scales of logarithmic density zero. For instance, the Chowla conjecture

is now established for either odd or equal to , so long as one excludes an exceptional set of scales.

In the logarithmically averaged setup, the main idea was to combine two very different pieces of information on . The first, coming from recent results in ergodic theory, was to show that was well approximated in some sense by a nilsequence. The second was to use the “entropy decrement argument” to obtain an approximate isotopy property of the form

for “most” primes and integers . Combining the two facts, one eventually finds that only the almost periodic components of the nilsequence are relevant.

In the current situation, each is approximated by a nilsequence, but the nilsequence can vary with (although there is some useful “Lipschitz continuity” of this nilsequence with respect to the parameter). Meanwhile, the entropy decrement argument gives an approximation basically of the form

for “most” . The arguments then proceed largely as in the logarithmically averaged case. A key lemma to handle the dependence on the new parameter is the following cohomological statement: if one has a map that is a quasimorphism in the sense that for all and some small , then there exists a real number such that for all small . This is achieved by applying a standard “cocycle averaging argument” to the cocycle .

It would of course be desirable to not have the set of exceptional scales. We only know of one (implausible) scenario in which we can do this, namely when one has far fewer (in particular, subexponentially many) sign patterns for (say) the Liouville function than predicted by the Chowla conjecture. In this scenario (roughly analogous to the “Siegel zero” scenario in multiplicative number theory), the entropy of the Liouville sign patterns is so small that the entropy decrement argument becomes powerful enough to control all scales rather than almost all scales. On the other hand, this scenario seems to be self-defeating, in that it allows one to establish a large number of cases of the Chowla conjecture, and the full Chowla conjecture is inconsistent with having unusually few sign patterns. Still it hints that future work in this direction may need to split into “low entropy” and “high entropy” cases, in analogy to how many arguments in multiplicative number theory have to split into the “Siegel zero” and “no Siegel zero” cases.

This coming fall quarter, I am teaching a class on topics in the mathematical theory of incompressible fluid equations, focusing particularly on the incompressible Euler and Navier-Stokes equations. These two equations are by no means the only equations used to model fluids, but I will restrict attention to these two in this course to keep the scope manageable. I have not fully decided on the choice of topics to cover in this course, but I would probably begin with some core topics such as local well-posedness theory and blowup criteria, conservation laws, and construction of weak solutions, then move on to some topics such as boundary layers and the Prandtl equations, the Euler-Poincare-Arnold interpretation of the Euler equations as an infinite dimensional geodesic flow, and some discussion of the Onsager conjecture. I will probably also continue with more advanced and recent topics in the winter quarter.

In this initial set of notes, we begin by reviewing the physical derivation of the Euler and Navier-Stokes equations from the first principles of Newtonian mechanics, and specifically from Newton’s famous three laws of motion. Strictly speaking, this derivation is not needed for the mathematical analysis of these equations, which can be viewed if one wishes as an arbitrarily chosen system of partial differential equations without any physical motivation; however, I feel that the derivation sheds some insight and intuition on these equations, and is also worth knowing on purely intellectual grounds regardless of its mathematical consequences. I also find it instructive to actually see the journey from Newton’s law

to the seemingly rather different-looking law

for incompressible Navier-Stokes (or, if one drops the viscosity term , the Euler equations).
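For the record, the two laws being contrasted are, in standard notation (restored here since the displayed formulas did not survive extraction):

```latex
% Newton's second law for a single particle:
F = m a,
% versus the incompressible Navier--Stokes equations for the velocity field u(t,x)
% and pressure p(t,x), with kinematic viscosity \nu (setting \nu = 0 gives Euler):
\partial_t u + (u \cdot \nabla) u = -\nabla p + \nu \Delta u,
\qquad \nabla \cdot u = 0.
```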

Our discussion in this set of notes is physical rather than mathematical, and so we will not be working at mathematical levels of rigour and precision. In particular we will be fairly casual about interchanging summations, limits, and integrals, we will manipulate approximate identities as if they were exact identities (e.g., by differentiating both sides of the approximate identity), and we will not attempt to verify any regularity or convergence hypotheses in the expressions being manipulated. (The same holds for the exercises in this text, which also do not need to be justified at mathematical levels of rigour.) Of course, once we resume the mathematical portion of this course in subsequent notes, such issues will be an important focus of careful attention. This is a basic division of labour in mathematical modeling: non-rigorous heuristic reasoning is used to derive a mathematical model from physical (or other “real-life”) principles, but once a precise model is obtained, the analysis of that model should be completely rigorous if at all possible (even if this requires applying the model to regimes which do not correspond to the original physical motivation of that model). See the discussion by John Ball quoted at the end of these slides of Gero Friesecke for an expansion of these points.

Note: our treatment here will differ slightly from that presented in many fluid mechanics texts, in that it will emphasise first-principles derivations from many-particle systems, rather than relying on bulk laws of physics, such as the laws of thermodynamics, which we will not cover here. (However, the derivations from bulk laws tend to be more robust, in that they are not as reliant on assumptions about the particular interactions between particles. In particular, the physical hypotheses we assume in this post are probably quite a bit stronger than the minimal assumptions needed to justify the Euler or Navier-Stokes equations, which can hold even in situations in which one or more of the hypotheses assumed here break down.)

Let be a measure-preserving system – a probability space equipped with a measure-preserving translation (which for simplicity of discussion we shall assume to be invertible). We will informally think of two points in this space as being “close” if for some that is not too large; this allows one to distinguish between “local” structure at a point (in which one only looks at nearby points for moderately large ) and “global” structure (in which one looks at the entire space ). The local/global distinction is also known as the time-averaged/space-averaged distinction in ergodic theory.

A measure-preserving system is said to be ergodic if all the invariant sets are either zero measure or full measure. An equivalent form of this statement is that any measurable function which is *locally essentially constant* in the sense that for -almost every , is necessarily *globally essentially constant* in the sense that there is a constant such that for -almost every . A basic consequence of ergodicity is the mean ergodic theorem: if , then the averages converge in norm to the mean . (The mean ergodic theorem also applies to other spaces with , though it is usually proven first in the Hilbert space .) Informally: in ergodic systems, time averages are asymptotically equal to space averages. Specialising to the case of indicator functions, this implies in particular that converges to for any measurable set .
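The mean ergodic theorem can be illustrated numerically with the simplest ergodic system, an irrational rotation of the circle with Lebesgue measure. The sketch below (my own illustration) checks that the time averages of an indicator function approach its space average:

```python
import math

# Irrational rotation T : x -> x + alpha (mod 1), an ergodic measure-preserving
# system on the circle.  For f = indicator of [0, 1/2), the time averages
# (1/N) * sum_{n < N} f(T^n x) should approach the space average 1/2.
alpha = math.sqrt(2) - 1
x, hits, N = 0.123, 0, 200_000
for _ in range(N):
    hits += (x < 0.5)
    x = (x + alpha) % 1.0

time_average = hits / N   # close to the space average 0.5
```

By Weyl equidistribution the convergence here holds for every starting point, not just almost every one, but the almost-everywhere statement is what the mean ergodic theorem provides in general.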

In this short note I would like to use the mean ergodic theorem to show that ergodic systems also have the property that “somewhat locally constant” functions are necessarily “somewhat globally constant”; this is not a deep observation, and probably already in the literature, but I found it a cute statement that I had not previously seen. More precisely:

Corollary 1 Let be an ergodic measure-preserving system, and let be measurable. Suppose that

for some . Then there exists a constant such that for in a set of measure at least .

Informally: if is locally constant on pairs at least of the time, then is globally constant at least of the time. Of course the claim fails if the ergodicity hypothesis is dropped, as one can simply take to be an invariant function that is not essentially constant, such as the indicator function of an invariant set of intermediate measure. This corollary can be viewed as a manifestation of the general principle that ergodic systems have the same “global” (or “space-averaged”) behaviour as “local” (or “time-averaged”) behaviour, in contrast to non-ergodic systems in which local properties do not automatically transfer over to their global counterparts.

*Proof:* By composing with (say) the arctangent function, we may assume without loss of generality that is bounded. Let , and partition as , where is the level set

For each , only finitely many of the are non-empty. By (1), one has

Using the ergodic theorem, we conclude that

On the other hand, . Thus there exists such that , thus

By the Bolzano-Weierstrass theorem, we may pass to a subsequence along which converges to a limit ; we then have

for infinitely many , and hence

The claim follows.

Let be the Liouville function, thus is defined to equal when is the product of an even number of primes, and when is the product of an odd number of primes. The Chowla conjecture asserts that has the statistics of a random sign pattern, in the sense that

for all and all distinct natural numbers , where we use the averaging notation

For , this conjecture is equivalent to the prime number theorem (as discussed in this previous blog post), but the conjecture remains open for any .
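As a quick numerical illustration (my own, not from the post), one can sieve the Liouville function and observe that the two-point correlation average is already quite small at modest heights, consistent with the simplest open case of the Chowla conjecture:

```python
def liouville_up_to(N):
    """Sieve lambda(n) for 1 <= n <= N via smallest prime factors."""
    lam = [1] * (N + 1)
    spf = list(range(N + 1))          # smallest prime factor of each index
    for p in range(2, N + 1):
        if spf[p] == p:               # p is prime
            for m in range(p * p, N + 1, p):
                if spf[m] == m:
                    spf[m] = p
    for n in range(2, N + 1):
        # lambda is completely multiplicative with lambda(p) = -1, so
        # stripping one prime factor flips the sign.
        lam[n] = -lam[n // spf[n]]
    return lam

N = 100_000
lam = liouville_up_to(N + 1)
corr = sum(lam[n] * lam[n + 1] for n in range(1, N + 1)) / N
# corr is empirically much smaller than 1 in absolute value, as the k = 2
# case of the Chowla conjecture predicts.
```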

In recent years, it has been realised that one can make more progress on this conjecture if one works instead with the logarithmically averaged version

of the conjecture, where we use the logarithmic averaging notation

Using the summation by parts (or telescoping series) identity

$$\sum_{n \leq X} \frac{f(n)}{n} = \frac{1}{X} \sum_{n \leq X} f(n) + \int_1^X \Big( \sum_{n \leq t} f(n) \Big) \frac{dt}{t^2},$$

it is not difficult to show that the Chowla conjecture (1) for a given $k$ implies the logarithmically averaged conjecture (2). However, the converse implication is not at all clear. For instance, for $k=1$, we have already mentioned that the Chowla conjecture

$$\lim_{X \to \infty} \mathbb{E}_{n \leq X} \lambda(n) = 0$$
is equivalent to the prime number theorem; but the logarithmically averaged analogue

$$\lim_{X \to \infty} \mathbb{E}^{\log}_{n \leq X} \lambda(n) = 0$$
is significantly easier to show (a proof with the Liouville function replaced by the closely related Möbius function is given in this previous blog post). And indeed, significantly more is now known for the logarithmically averaged Chowla conjecture; in this paper of mine I had proven (2) for $k = 2$, and in this recent paper with Joni Teravainen, we proved the conjecture for all odd $k$ (with a different proof also given here).
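The summation by parts step can be checked numerically. In its discrete form (used below), the identity says that the logarithmic average of $f$ is a weighted average of the ordinary averages $\frac{1}{n}\sum_{m \leq n} f(m)$, which is why (1) for a given $k$ implies (2). The check is my own illustration, run on a seeded random sign pattern rather than $\lambda$:

```python
import random

# Exact discrete summation-by-parts identity, for S(n) = f(1)+...+f(n):
#   sum_{n<=X} f(n)/n = S(X)/X + sum_{n<X} S(n) * (1/n - 1/(n+1)).
# Since 1/n - 1/(n+1) = 1/(n(n+1)), dividing by log X expresses the
# logarithmic average as a weighted average of the ordinary averages S(n)/n.

random.seed(0)
X = 10**4
f = [0] + [random.choice([-1, 1]) for _ in range(X)]

S = [0] * (X + 1)
for n in range(1, X + 1):
    S[n] = S[n - 1] + f[n]

lhs = sum(f[n] / n for n in range(1, X + 1))
rhs = S[X] / X + sum(S[n] * (1 / n - 1 / (n + 1)) for n in range(1, X))
print(abs(lhs - rhs))  # zero up to floating-point rounding
```

In particular, if every ordinary average $S(n)/n$ tends to zero, so does the logarithmic average; the converse fails because the logarithmic weights could conspire with oscillation in $S(n)/n$.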

In view of this emerging consensus that the logarithmically averaged Chowla conjecture was easier than the ordinary Chowla conjecture, it was somewhat of a surprise for me to read a recent paper of Gomilko, Kwietniak, and Lemanczyk, who (among other things) established the following statement:

Theorem 1. Assume that the logarithmically averaged Chowla conjecture (2) is true for all $k$. Then there exists a sequence $X_j$ going to infinity such that the Chowla conjecture (1) is true for all $k$ along that sequence, that is to say

$$\lim_{j \to \infty} \mathbb{E}_{n \leq X_j} \lambda(n+h_1) \cdots \lambda(n+h_k) = 0$$

for all $k$ and all distinct $h_1,\dots,h_k$.

This implication does not use any special properties of the Liouville function (other than that it is bounded), and in fact proceeds by ergodic theoretic methods, focusing in particular on the ergodic decomposition of invariant measures of a shift into ergodic measures. Ergodic methods have proven remarkably fruitful in understanding these sorts of number theoretic and combinatorial problems, as can already be seen from the ergodic theoretic proof of Szemerédi's theorem by Furstenberg, and more recently in the work of Frantzikinakis and Host on Sarnak's conjecture. (My first paper with Teravainen also uses ergodic theory tools.) Indeed, many other results in the subject were first discovered using ergodic theory methods.

On the other hand, many results in this subject that were first proven ergodic theoretically have since been reproven by more combinatorial means; my second paper with Teravainen is an instance of this. As it turns out, one can also prove Theorem 1 by a standard combinatorial (or probabilistic) technique known as the second moment method. In fact, one can prove slightly more:

Theorem 2. Let $k$ be a natural number. Assume that the logarithmically averaged Chowla conjecture (2) is true for $2k$. Then there exists a set $\mathcal{X}$ of natural numbers of logarithmic density one (that is, $\lim_{X \to \infty} \mathbb{E}^{\log}_{n \leq X} 1_{\mathcal{X}}(n) = 1$) such that

$$\lim_{x \to \infty; x \in \mathcal{X}} \mathbb{E}_{n \leq x} \lambda(n+h_1) \cdots \lambda(n+h_k) = 0$$

for any distinct $h_1,\dots,h_k$.

It is not difficult to deduce Theorem 1 from Theorem 2 using a diagonalisation argument. Unfortunately, the known cases of the logarithmically averaged Chowla conjecture ($k = 2$ and odd $k$) are currently insufficient to use Theorem 2 for any purpose other than to reprove what is already known to be true from the prime number theorem. (Indeed, the even cases of Chowla, in either logarithmically averaged or non-logarithmically averaged forms, seem to be far more powerful than the odd cases; see Remark 1.7 of this paper of myself and Teravainen for a related observation in this direction.)
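Theorem 2 produces a set of full logarithmic density rather than full natural density, and the two notions genuinely differ. A classical example, sketched below (my own illustration, not from the post): the set of integers whose leading decimal digit is $1$ has no natural density, but it does have logarithmic density $\log_{10} 2 \approx 0.301$.

```python
import math

# Logarithmic vs natural density: integers with leading decimal digit 1.
# Their counting proportion oscillates forever (no natural density), but
# the logarithmic density exists and equals log10(2) ~ 0.301.

def leading_digit_is_one(n):
    while n >= 10:
        n //= 10
    return n == 1

X = 10**6
log_mass = sum(1 / n for n in range(1, X + 1) if leading_digit_is_one(n))
log_density = log_mass / math.log(X)   # partial value; converges slowly

nat_count = sum(1 for n in range(1, X + 1) if leading_digit_is_one(n))
nat_density = nat_count / X            # at X = 10^6 this sits near 0.111

print(log_density, nat_density)
```

At other cutoffs (say $X = 2 \cdot 10^5$) the counting proportion swings up past $0.5$, while the logarithmic partial densities stay near $0.3$; this robustness is what makes logarithmic density the natural notion in Theorem 2.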

We now sketch the proof of Theorem 2. For any distinct $h_1,\dots,h_k$, we take a large number $H$ and consider the limiting second moment

$$\limsup_{X \to \infty} \mathbb{E}^{\log}_{n \leq X} \Big| \frac{1}{H} \sum_{h=1}^H \lambda(n+h+h_1) \cdots \lambda(n+h+h_k) \Big|^2.$$

We can expand the second moment as

$$\frac{1}{H^2} \sum_{h=1}^H \sum_{h'=1}^H \mathbb{E}^{\log}_{n \leq X} \lambda(n+h+h_1) \cdots \lambda(n+h+h_k) \lambda(n+h'+h_1) \cdots \lambda(n+h'+h_k).$$

If all the shifts $h+h_1,\dots,h+h_k,h'+h_1,\dots,h'+h_k$ are distinct, the hypothesis (2) (applied with $2k$ shifts) tells us that the inner averages go to zero as $X \to \infty$. The remaining averages are $O(1)$, and there are $O(H)$ of these averages. We conclude that

$$\limsup_{X \to \infty} \mathbb{E}^{\log}_{n \leq X} \Big| \frac{1}{H} \sum_{h=1}^H \lambda(n+h+h_1) \cdots \lambda(n+h+h_k) \Big|^2 = O\Big( \frac{1}{H} \Big). \ \ \ \ \ (3)$$
By Markov’s inequality (and (3)), we conclude that for any fixed , there exists a set of upper logarithmic density at least , thus

such that

By deleting at most finitely many elements, we may assume that consists only of elements of size at least (say).
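As a sanity check on the second moment method used here, one can run the analogous computation with genuinely random signs in place of $\lambda$ (an illustration of mine, not from the post): the mean square of a length-$H$ window average is about $1/H$, and Markov's inequality then bounds the fraction of $n$ at which the window average is large.

```python
import random

# Second moment method with random +-1 signs standing in for lambda.
random.seed(12345)
N, H, eps = 200000, 100, 0.3
signs = [random.choice([-1, 1]) for _ in range(N + H)]

# Prefix sums give every length-H window average in O(1) time.
prefix = [0]
for s in signs:
    prefix.append(prefix[-1] + s)
window_avgs = [(prefix[n + H] - prefix[n]) / H for n in range(N)]

# For iid signs the mean square of a window average is exactly 1/H.
second_moment = sum(a * a for a in window_avgs) / N

# Markov: the fraction of n with |window average| > eps is at most
# second_moment / eps^2 (this holds exactly for the empirical data).
bad_fraction = sum(1 for a in window_avgs if abs(a) > eps) / N

print(second_moment)  # close to 1/H = 0.01
print(bad_fraction)
```

In the actual argument the averages are logarithmic rather than uniform and the signs are Liouville values, but the mechanism is the same: a small second moment forces the window averages to be small outside a sparse bad set.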

For any , if we let be the union of for , then has logarithmic density . By a diagonalisation argument (using the fact that the set of tuples is countable), we can then find a set of natural numbers of logarithmic density , such that for every , every sufficiently large element of lies in . Thus for every sufficiently large in , one has

for some with . By Cauchy-Schwarz, this implies that

interchanging the sums and using and , this implies that

We conclude on taking to infinity that

as required.
