Just a brief post to record some notable papers in my fields of interest that appeared on the arXiv recently.

  • “A sharp square function estimate for the cone in {\bf R}^3”, by Larry Guth, Hong Wang, and Ruixiang Zhang.  This paper establishes an optimal (up to epsilon losses) square function estimate for the three-dimensional light cone that was essentially conjectured by Mockenhaupt, Seeger, and Sogge, which has a number of other consequences including Sogge’s local smoothing conjecture for the wave equation in two spatial dimensions, which in turn implies the (already known) Bochner-Riesz, restriction, and Kakeya conjectures in two dimensions.   Interestingly, modern techniques such as polynomial partitioning and decoupling estimates are not used in this argument; instead, the authors mostly rely on an induction on scales argument and Kakeya type estimates.  Many previous authors (including myself) were able to get weaker estimates of this type by an induction on scales method, but there were always significant inefficiencies in doing so; in particular knowing the sharp square function estimate at smaller scales did not imply the sharp square function estimate at the given larger scale.  The authors here get around this issue by finding an even stronger estimate that implies the square function estimate, but behaves significantly better with respect to induction on scales.
  • “On the Chowla and twin primes conjectures over {\mathbb F}_q[T]”, by Will Sawin and Mark Shusterman.  This paper resolves a number of well known open conjectures in analytic number theory, such as the Chowla conjecture and the twin prime conjecture (in the strong form conjectured by Hardy and Littlewood), in the case of function fields {\mathbb F}_q[T], where the field order q=p^j is fixed (in contrast to a number of existing results in the “large q” limit) but is required to be a large power of the characteristic p (i.e., the exponent j is large).  The techniques here are orthogonal to those used in recent progress towards the Chowla conjecture over the integers (e.g., in this previous paper of mine); the starting point is an algebraic observation that in certain function fields, the Möbius function behaves like a quadratic Dirichlet character along certain arithmetic progressions.  In principle, this reduces problems such as Chowla’s conjecture to problems about estimating sums of Dirichlet characters, for which more is known; but the task is still far from trivial.
  • “Bounds for sets with no polynomial progressions”, by Sarah Peluse.  This paper can be viewed as part of a larger project to obtain quantitative density Ramsey theorems of Szemeredi type.  For instance, Gowers famously established a relatively good quantitative bound for Szemeredi’s theorem that all dense subsets of integers contain arbitrarily long arithmetic progressions a, a+r, \dots, a+(k-1)r.  The corresponding question for polynomial progressions a+P_1(r), \dots, a+P_k(r) is considered more difficult for a number of reasons.  One of them is that dilation invariance is lost; a dilation of an arithmetic progression is again an arithmetic progression, but a dilation of a polynomial progression will in general not be a polynomial progression with the same polynomials P_1,\dots,P_k.  Another issue is that the ranges of the two parameters a,r are now at different scales.  Peluse gets around these difficulties in the case when all the polynomials P_1,\dots,P_k have distinct degrees, which is in some sense the opposite case to that considered by Gowers (in particular, thanks to a degree lowering argument that is available in this case, she avoids the need for quantitative inverse theorems for higher order Gowers norms; such theorems were recently obtained in the integer setting by Manners, but with bounds that are probably not strong enough for Peluse’s purposes).  To resolve the first difficulty one has to make all the estimates rather uniform in the coefficients of the polynomials P_j, so that one can still run a density increment argument efficiently.  To resolve the second difficulty one needs to find a quantitative concatenation theorem for Gowers uniformity norms.  Many of these ideas were developed in previous papers of Peluse and Peluse-Prendiville in simpler settings.
  • “On blow up for the energy super critical defocusing non linear Schrödinger equations”, by Frank Merle, Pierre Raphael, Igor Rodnianski, and Jeremie Szeftel.  This paper (when combined with two companion papers) resolves a long-standing problem as to whether finite time blowup occurs for the defocusing supercritical nonlinear Schrödinger equation (at least in certain dimensions and nonlinearities).  I had a previous paper establishing a result like this if one “cheated” by replacing the nonlinear Schrödinger equation by a system of such equations, but remarkably they are able to tackle the original equation itself without any such cheating.  Given the very analogous situation with Navier-Stokes, where again one can create finite time blowup by “cheating” and modifying the equation, it does raise hope that finite time blowup for the incompressible Navier-Stokes and Euler equations can be established…  In fact the connection may not just be at the level of analogy; a surprising key ingredient in the proofs here is the observation that a certain blowup ansatz for the nonlinear Schrödinger equation is governed by solutions to the (compressible) Euler equation, and finite time blowup examples for the latter can be used to construct finite time blowup examples for the former.

Let {u: {\bf R}^3 \rightarrow {\bf R}^3} be a divergence-free vector field, thus {\nabla \cdot u = 0}, which we interpret as a velocity field. In this post we will proceed formally, largely ignoring the analytic issues of whether the fields in question have sufficient regularity and decay to justify the calculations. The vorticity field {\omega: {\bf R}^3 \rightarrow {\bf R}^3} is then defined as the curl of the velocity:

\displaystyle  \omega = \nabla \times u.

(From a differential geometry viewpoint, it would be more accurate (especially in other dimensions than three) to define the vorticity as the exterior derivative {\omega = d(g \cdot u)} of the musical isomorphism {g \cdot u} of the Euclidean metric {g} applied to the velocity field {u}; see these previous lecture notes. However, we will not need this geometric formalism in this post.)

Assuming suitable regularity and decay hypotheses of the velocity field {u}, it is possible to recover the velocity from the vorticity as follows. From the general vector identity {\nabla \times \nabla \times X = \nabla(\nabla \cdot X) - \Delta X} applied to the velocity field {u}, we see that

\displaystyle  \nabla \times \omega = -\Delta u

and thus (by the commutativity of all the differential operators involved)

\displaystyle  u = - \nabla \times \Delta^{-1} \omega.

Using the Newton potential formula

\displaystyle  -\Delta^{-1} \omega(x) := \frac{1}{4\pi} \int_{{\bf R}^3} \frac{\omega(y)}{|x-y|}\ dy

and formally differentiating under the integral sign, we obtain the Biot-Savart law

\displaystyle  u(x) = \frac{1}{4\pi} \int_{{\bf R}^3} \frac{\omega(y) \times (x-y)}{|x-y|^3}\ dy. \ \ \ \ \ (1)
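
As a quick sanity check on the above derivation, the curl-curl identity {\nabla \times \nabla \times X = \nabla(\nabla \cdot X) - \Delta X} (and hence the sign conventions used here) can be confirmed symbolically. The following is a minimal sympy sketch of mine, with all the differential operators written out componentwise and applied to a generic smooth field:

```python
from sympy import symbols, Function, diff, simplify

x1, x2, x3 = symbols('x1 x2 x3', real=True)
F = [Function('F%d' % i)(x1, x2, x3) for i in (1, 2, 3)]  # generic smooth field

def curl(V):
    return [diff(V[2], x2) - diff(V[1], x3),
            diff(V[0], x3) - diff(V[2], x1),
            diff(V[1], x1) - diff(V[0], x2)]

def div(V):
    return diff(V[0], x1) + diff(V[1], x2) + diff(V[2], x3)

def grad(s):
    return [diff(s, x1), diff(s, x2), diff(s, x3)]

def lap(s):
    return diff(s, x1, 2) + diff(s, x2, 2) + diff(s, x3, 2)

lhs = curl(curl(F))
rhs = [grad(div(F))[i] - lap(F[i]) for i in range(3)]
print([simplify(lhs[i] - rhs[i]) for i in range(3)])  # [0, 0, 0]
```

Specialising to the divergence-free field {u} recovers the relation {\nabla \times \omega = -\Delta u} that led to the Biot-Savart law (1).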

This law is of fundamental importance in the study of incompressible fluid equations, such as the Euler equations

\displaystyle  \partial_t u + (u \cdot \nabla) u = -\nabla p; \quad \nabla \cdot u = 0

since on applying the curl operator one obtains the vorticity equation

\displaystyle  \partial_t \omega + (u \cdot \nabla) \omega = (\omega \cdot \nabla) u \ \ \ \ \ (2)

and then by substituting (1) one gets an autonomous equation for the vorticity field {\omega}. Unfortunately, this equation is non-local, due to the integration present in (1).

In a recent work, it was observed by Elgindi that in a certain regime, the Biot-Savart law can be approximated by a more “low rank” law, which makes the non-local effects significantly simpler in nature. This simplification was carried out in spherical coordinates, and hinged on a study of the invertibility properties of a certain second order linear differential operator in the latitude variable {\theta}; however in this post I would like to observe that the approximation can also be seen directly in Cartesian coordinates from the classical Biot-Savart law (1). As a consequence one can also carry out the first steps of Elgindi’s analysis for constructing somewhat regular solutions to the Euler equations that exhibit self-similar blowup in finite time, though I have not attempted to execute the entirety of the analysis in this setting.

Elgindi’s approximation applies under the following hypotheses:

  • (i) (Axial symmetry without swirl) The velocity field {u} is assumed to take the form

    \displaystyle  u(x_1,x_2,x_3) = ( u_r(r,x_3) \frac{x_1}{r}, u_r(r,x_3) \frac{x_2}{r}, u_3(r,x_3) ) \ \ \ \ \ (3)

    for some functions {u_r, u_3: [0,+\infty) \times {\bf R} \rightarrow {\bf R}} of the cylindrical radial variable {r := \sqrt{x_1^2+x_2^2}} and the vertical coordinate {x_3}. As a consequence, the vorticity field {\omega} takes the form

    \displaystyle  \omega(x_1,x_2,x_3) = (\omega_{r3}(r,x_3) \frac{x_2}{r}, \omega_{r3}(r,x_3) \frac{-x_1}{r}, 0) \ \ \ \ \ (4)

    where {\omega_{r3}: [0,+\infty) \times {\bf R} \rightarrow {\bf R}} is the field

    \displaystyle  \omega_{r3} = \partial_r u_3 - \partial_3 u_r.

  • (ii) (Odd symmetry) We assume that {u_3(r,-x_3) = -u_3(r,x_3)} and {u_r(r,-x_3)=u_r(r,x_3)}, so that {\omega_{r3}(r,-x_3)=-\omega_{r3}(r,x_3)}.

A model example of a divergence-free vector field obeying these properties (but without good decay at infinity) is the linear vector field

\displaystyle  X(x) = (x_1, x_2, -2x_3) \ \ \ \ \ (5)

which is of the form (3) with {u_r(r,x_3) = r} and {u_3(r,x_3) = -2x_3}. The associated vorticity {\omega} vanishes.

We can now give an illustration of Elgindi’s approximation:

Proposition 1 (Elgindi’s approximation) Under the above hypotheses (and assuming suitable regularity and decay), we have the pointwise bounds

\displaystyle  u(x) = \frac{1}{2} {\mathcal L}_{12}(\omega)(|x|) X(x) + O( |x| \|\omega\|_{L^\infty({\bf R}^3)} )

for any {x \in {\bf R}^3}, where {X} is the vector field (5), and {{\mathcal L}_{12}(\omega): {\bf R}^+ \rightarrow {\bf R}} is the scalar function

\displaystyle  {\mathcal L}_{12}(\omega)(\rho) := \frac{3}{4\pi} \int_{|y| \geq \rho} \frac{r y_3}{|y|^5} \omega_{r3}(r,y_3)\ dy.

Thus under the hypotheses (i), (ii), and assuming that {\omega} is slowly varying, we expect {u} to behave like the linear vector field {X} modulated by a radial scalar function. In applications one needs to control the error in various function spaces instead of pointwise, and with {\omega} similarly controlled in other function space norms than the {L^\infty} norm, but this proposition already gives a flavour of the approximation. If one uses spherical coordinates

\displaystyle  \omega_{r3}( \rho \cos \theta, \rho \sin \theta ) = \Omega( \rho, \theta )

then we have (using the spherical change of variables formula {dy = \rho^2 \cos \theta d\rho d\theta d\phi} and the odd nature of {\Omega})

\displaystyle  {\mathcal L}_{12}(\omega) = L_{12}(\Omega),

where

\displaystyle L_{12}(\Omega)(\rho) = 3 \int_\rho^\infty \int_0^{\pi/2} \frac{\Omega(r, \theta) \sin(\theta) \cos^2(\theta)}{r}\ d\theta dr

is the operator introduced in Elgindi’s paper.

Proof: By a limiting argument we may assume that {x} is non-zero, and we may normalise {\|\omega\|_{L^\infty({\bf R}^3)}=1}. From the triangle inequality we have

\displaystyle  \left| \int_{|y| \leq 10|x|} \frac{\omega(y) \times (x-y)}{|x-y|^3}\ dy \right| \leq \int_{|y| \leq 10|x|} \frac{1}{|x-y|^2}\ dy

\displaystyle  \leq \int_{|z| \leq 11 |x|} \frac{1}{|z|^2}\ dz

\displaystyle  = O( |x| )

and hence by (1)

\displaystyle  u(x) = \frac{1}{4\pi} \int_{|y| > 10|x|} \frac{\omega(y) \times (x-y)}{|x-y|^3}\ dy + O(|x|).

In the regime {|y| > 2|x|} we may perform the Taylor expansion

\displaystyle  \frac{x-y}{|x-y|^3} = \frac{x-y}{|y|^3} (1 - \frac{2 x \cdot y}{|y|^2} + \frac{|x|^2}{|y|^2})^{-3/2}

\displaystyle  = \frac{x-y}{|y|^3} (1 + \frac{3 x \cdot y}{|y|^2} + O( \frac{|x|^2}{|y|^2} ) )

\displaystyle  = -\frac{y}{|y|^3} + \frac{x}{|y|^3} - \frac{3 (x \cdot y) y}{|y|^5} + O( \frac{|x|^2}{|y|^4} ).

Since

\displaystyle  \int_{|y| > 10|x|} \frac{|x|^2}{|y|^4}\ dy = O(|x|)

we see from the triangle inequality that the error term contributes {O(|x|)} to {u(x)}. We thus have

\displaystyle  u(x) = \frac{1}{4\pi} \left( -A_0(x) + A_1(x) - 3A'_1(x) \right) + O(|x|)

where {A_0} is the constant term

\displaystyle  A_0 := \int_{|y| > 10|x|} \frac{\omega(y) \times y}{|y|^3}\ dy,

and {A_1, A'_1} are the linear term

\displaystyle  A_1 := \int_{|y| > 10|x|} \frac{\omega(y) \times x}{|y|^3}\ dy,

\displaystyle  A'_1 := \int_{|y| > 10|x|} (x \cdot y) \frac{\omega(y) \times y}{|y|^5}\ dy.

By the hypotheses (i), (ii), we have the symmetries

\displaystyle  \omega(y_1,y_2,-y_3) = - \omega(y_1,y_2,y_3) \ \ \ \ \ (6)

and

\displaystyle  \omega(-y_1,-y_2,y_3) = - \omega(y_1,y_2,y_3) \ \ \ \ \ (7)

and hence also

\displaystyle  \omega(-y_1,-y_2,-y_3) = \omega(y_1,y_2,y_3). \ \ \ \ \ (8)

The even symmetry (8) ensures that the integrand in {A_0} is odd, so {A_0} vanishes. The symmetry (6) or (7) similarly ensures that {\int_{|y| > 10|x|} \frac{\omega(y)}{|y|^3}\ dy = 0}, so {A_1} vanishes. Since {\int_{|x| < |y| \leq 10|x|} \frac{|x \cdot y| |y|}{|y|^5}\ dy = O( |x| )}, we conclude that

\displaystyle  u(x) = -\frac{3}{4\pi}\int_{|y| \geq |x|} (x \cdot y) \frac{\omega(y) \times y}{|y|^5}\ dy + O(|x|).

Using (4), the right-hand side is

\displaystyle  -\frac{3}{4\pi}\int_{|y| \geq |x|} (x_1 y_1 + x_2 y_2 + x_3 y_3) \frac{\omega_{r3}(r,y_3) (-y_1 y_3, -y_2 y_3, y_1^2+y_2^2)}{r|y|^5}\ dy

\displaystyle + O(|x|)

where {r := \sqrt{y_1^2+y_2^2}}. Because of the odd nature of {\omega_{r3}}, only those terms with one factor of {y_3} give a non-vanishing contribution to the integral. Using the rotation symmetry {(y_1,y_2,y_3) \mapsto (-y_2,y_1,y_3)} we also see that any term with a factor of {y_1 y_2} also vanishes. We can thus simplify the above expression as

\displaystyle  -\frac{3}{4\pi}\int_{|y| \geq |x|} \frac{\omega_{r3}(r,y_3) (-x_1 y_1^2 y_3, -x_2 y_2^2 y_3, x_3 (y_1^2+y_2^2) y_3)}{r|y|^5}\ dy + O(|x|).

Using the rotation symmetry {(y_1,y_2,y_3) \mapsto (-y_2,y_1,y_3)} again, we see that the term {y_1^2} in the first component can be replaced by {y_2^2} or by {\frac{1}{2} (y_1^2+y_2^2) = \frac{r^2}{2}}, and similarly for the {y_2^2} term in the second component. Thus the above expression is

\displaystyle  \frac{3}{8\pi} \int_{|y| \geq |x|} \frac{\omega_{r3}(r,y_3) (x_1 , x_2, -2x_3) r y_3}{|y|^5}\ dy + O(|x|)

giving the claim. \Box

Example 2 Consider the divergence-free vector field {u := \nabla \times \psi}, where the vector potential {\psi} takes the form

\displaystyle  \psi(x_1,x_2,x_3) := (x_2 x_3, -x_1 x_3, 0) \eta(|x|)

for some bump function {\eta: {\bf R} \rightarrow {\bf R}} supported in {(0,+\infty)}. We can then calculate

\displaystyle  u(x_1,x_2,x_3) = X(x) \eta(|x|) + (x_1 x_3, x_2 x_3, -x_1^2-x_2^2) \frac{\eta'(|x|) x_3}{|x|}

and

\displaystyle  \omega(x_1,x_2,x_3) = (-6x_2 x_3, 6x_1 x_3, 0) \frac{\eta'(|x|)}{|x|} + (-x_2 x_3, x_1 x_3, 0) \eta''(|x|).

In particular the hypotheses (i), (ii) are satisfied with

\displaystyle  \omega_{r3}(r,x_3) = - 6 \eta'(|x|) \frac{x_3 r}{|x|} - \eta''(|x|) x_3 r.

One can then calculate

\displaystyle  {\mathcal L}_{12}(\omega)(\rho) = -\frac{3}{4\pi} \int_{|y| \geq \rho} (6\frac{\eta'(|y|)}{|y|^6} + \frac{\eta''(|y|)}{|y|^5}) r^2 y_3^2\ dy

\displaystyle  = -\frac{2}{5} \int_\rho^\infty 6\eta'(s) + s\eta''(s)\ ds

\displaystyle  = 2\eta(\rho) + \frac{2}{5} \rho \eta'(\rho).
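
The differential computations in this example are easy to confirm symbolically. Here is a short sympy sketch of mine which checks the stated formulas for {u} and {\omega} (together with the divergence-free property of {u}) for the concrete test profile {\eta(\rho) = e^{-\rho^2}}; this profile is of course not a bump function supported in {(0,+\infty)}, but the identities being checked are formal and do not use the support hypothesis:

```python
from sympy import symbols, exp, sqrt, diff, simplify

x1, x2, x3, s = symbols('x1 x2 x3 s', real=True)
rho = sqrt(x1**2 + x2**2 + x3**2)

eta = exp(-s**2)                      # concrete test profile eta(s)
e0 = eta.subs(s, rho)                 # eta(|x|)
e1 = diff(eta, s).subs(s, rho)        # eta'(|x|)
e2 = diff(eta, s, 2).subs(s, rho)     # eta''(|x|)

def curl(V):
    return [diff(V[2], x2) - diff(V[1], x3),
            diff(V[0], x3) - diff(V[2], x1),
            diff(V[1], x1) - diff(V[0], x2)]

psi = [x2*x3*e0, -x1*x3*e0, 0]        # the vector potential
u = curl(psi)
omega = curl(u)

u_claim = [x1*e0 + x1*x3*e1*x3/rho,
           x2*e0 + x2*x3*e1*x3/rho,
           -2*x3*e0 - (x1**2 + x2**2)*e1*x3/rho]
om_claim = [-6*x2*x3*e1/rho - x2*x3*e2,
            6*x1*x3*e1/rho + x1*x3*e2,
            0]
print([simplify(u[i] - u_claim[i]) for i in range(3)])       # [0, 0, 0]
print([simplify(omega[i] - om_claim[i]) for i in range(3)])  # [0, 0, 0]
print(simplify(diff(u[0], x1) + diff(u[1], x2) + diff(u[2], x3)))  # 0
```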

If we take the specific choice

\displaystyle  \eta(\rho) = \varphi( \rho^\alpha )

where {\varphi} is a fixed bump function supported on some interval {[c,C] \subset (0,+\infty)} and {\alpha>0} is a small parameter (so that {\eta} is spread out over the range {\rho \in [c^{1/\alpha},C^{1/\alpha}]}), then we see that

\displaystyle  \| \omega \|_{L^\infty} = O( \alpha )

(with implied constants allowed to depend on {\varphi}),

\displaystyle  {\mathcal L}_{12}(\omega)(\rho) = 2\eta(\rho) + O(\alpha),

and

\displaystyle  u = X(x) \eta(|x|) + O( \alpha |x| ),

which is completely consistent with Proposition 1.

One can use this approximation to extract a plausible ansatz for a self-similar blowup to the Euler equations. We let {\alpha>0} be a small parameter and let {\omega_{r3}} be a time-dependent vorticity field obeying (i), (ii) of the form

\displaystyle  \omega_{r3}(t,r,x_3) \approx \alpha \Omega( t, R ) \mathrm{sgn}(x_3)

where {R := |x|^\alpha = (r^2+x_3^2)^{\alpha/2}} and {\Omega: {\bf R} \times [0,+\infty) \rightarrow {\bf R}} is a smooth field to be chosen later. Admittedly the signum function {\mathrm{sgn}} is not smooth at {x_3=0}, but let us ignore this issue for now (to rigorously make an ansatz one will have to smooth out this function a little bit; Elgindi uses the choice {(|\sin \theta| \cos^2 \theta)^{\alpha/3} \mathrm{sgn}(x_3)}, where {\theta := \mathrm{arctan}(x_3/r)}). With this ansatz one may compute

\displaystyle  {\mathcal L}_{12}(\omega(t))(\rho) \approx \frac{3\alpha}{2\pi} \int_{|y| \geq \rho; y_3 \geq 0} \Omega(t,R) \frac{r y_3}{|y|^5}\ dy

\displaystyle  = \alpha \int_\rho^\infty \Omega(t, s^\alpha) \frac{ds}{s}

\displaystyle  = \int_{\rho^\alpha}^\infty \Omega(t,s) \frac{ds}{s}.

By Proposition 1, we thus expect to have the approximation

\displaystyle  u(t,x) \approx \frac{1}{2} \int_{|x|^\alpha}^\infty \Omega(t,s) \frac{ds}{s} X(x).

We insert this into the vorticity equation (2). The transport term {(u \cdot \nabla) \omega} is expected to be negligible because {R}, and hence {\omega_{r3}}, is slowly varying (the discontinuity of {\mathrm{sgn}(x_3)} will not be encountered because the vector field {X} is parallel to this singularity). The modulating function {\frac{1}{2} \int_{|x|^\alpha}^\infty \Omega(t,s) \frac{ds}{s}} is similarly slowly varying, so derivatives falling on this function should be lower order. Neglecting such terms, we arrive at the approximation

\displaystyle  (\omega \cdot \nabla) u \approx \frac{1}{2} \int_{|x|^\alpha}^\infty \Omega(t,s) \frac{ds}{s} \omega

and so in the limit {\alpha \rightarrow 0} we expect to obtain a simple model equation for the evolution of the vorticity envelope {\Omega}:

\displaystyle  \partial_t \Omega(t,R) = \frac{1}{2} \int_R^\infty \Omega(t,S) \frac{dS}{S} \Omega(t,R).

If we write {L(t,R) := \int_R^\infty \Omega(t,S)\frac{dS}{S}} for the logarithmic primitive of {\Omega}, then we have {\Omega = - R \partial_R L} and hence

\displaystyle  \partial_t (R \partial_R L) = \frac{1}{2} L (R \partial_R L)

which (noting that {\frac{1}{2} L (R \partial_R L) = R \partial_R (\frac{1}{4} L^2)} and that all terms decay as {R \rightarrow \infty}) integrates to the Riccati equation

\displaystyle  \partial_t L = \frac{1}{4} L^2

which can be explicitly solved as

\displaystyle  L(t,R) = \frac{2}{f(R) - t/2}

where {f(R)} is any function of {R} that one pleases. (In Elgindi’s work a time dilation is used to remove the unsightly factor of {1/2} appearing here in the denominator.) If for instance we set {f(R) = 1+R}, we obtain the self-similar solution

\displaystyle  L(t,R) = \frac{2}{1+R-t/2}

and then on applying {-R \partial_R}

\displaystyle  \Omega(t,R) = \frac{2R}{(1+R-t/2)^2}.
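
The algebra just performed is easy to confirm symbolically; the following brief sympy sketch checks the Riccati equation, the recovery of {\Omega} from {L}, the model equation, and the logarithmic primitive identity (with {c} standing in for the positive quantity {1-t/2}):

```python
from sympy import symbols, diff, integrate, simplify, oo

t, R, S, c = symbols('t R S c', positive=True)

L = 2/(1 + R - t/2)                          # candidate solution with f(R) = 1 + R
print(simplify(diff(L, t) - L**2/4))         # 0: L solves the Riccati equation
Omega = simplify(-R*diff(L, R))              # Omega = -R dL/dR
print(Omega)                                 # 2*R/(1 + R - t/2)**2, up to rearrangement
print(simplify(diff(Omega, t) - L*Omega/2))  # 0: the model vorticity equation
# L is the logarithmic primitive of Omega; here c plays the role of 1 - t/2 > 0
print(simplify(integrate(2/(c + S)**2, (S, R, oo)) - 2/(c + R)))  # 0
```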

Thus, we expect to be able to construct a self-similar blowup to the Euler equations with a vorticity field approximately behaving like

\displaystyle  \omega(t,x) \approx \alpha \frac{2R}{(1+R-t/2)^2} \mathrm{sgn}(x_3) (\frac{x_2}{r}, -\frac{x_1}{r}, 0)

and velocity field behaving like

\displaystyle  u(t,x) \approx \frac{1}{1+R-t/2} X(x).

In particular, {u} would be expected to be of regularity {C^{1,\alpha}} (and smooth away from the origin), to blow up in (say) {L^\infty} norm at time {t = 2}, and to obey the self-similarity

\displaystyle  u(t,x) = (1-t/2)^{\frac{1}{\alpha}-1} u( 0, \frac{x}{(1-t/2)^{1/\alpha}} )

and

\displaystyle  \omega(t,x) = (1-t/2)^{-1} \omega( 0, \frac{x}{(1-t/2)^{1/\alpha}} ).

A self-similar solution of this approximate shape is in fact constructed rigorously in Elgindi’s paper (using spherical coordinates instead of the Cartesian approach adopted here), using a nonlinear stability analysis of the above ansatz. It seems plausible that one could also carry out this stability analysis using this Cartesian coordinate approach, although I have not tried to do this in detail.

Let us call an arithmetic function {f: {\bf N} \rightarrow {\bf C}} {1}-bounded if we have {|f(n)| \leq 1} for all {n \in {\bf N}}. In this section we focus on the asymptotic behaviour of {1}-bounded multiplicative functions. Some key examples of such functions include:

  • The Möbius function {\mu};
  • The Liouville function {\lambda};
  • “Archimedean” characters {n \mapsto n^{it}} (which I call Archimedean because they are pullbacks of a Fourier character {x \mapsto x^{it}} on the multiplicative group {{\bf R}^+}, which has the Archimedean property);
  • Dirichlet characters (or “non-Archimedean” characters) {\chi} (which are essentially pullbacks of Fourier characters on a finite multiplicative group {({\bf Z}/q{\bf Z})^\times} with the discrete (non-Archimedean) metric);
  • Hybrid characters {n \mapsto \chi(n) n^{it}}.

The space of {1}-bounded multiplicative functions is also closed under multiplication and complex conjugation.

Given a multiplicative function {f}, we are often interested in the asymptotics of long averages such as

\displaystyle  \frac{1}{X} \sum_{n \leq X} f(n)

for large values of {X}, as well as short sums

\displaystyle  \frac{1}{H} \sum_{x \leq n \leq x+H} f(n)

where {H} and {x} are both large, but {H} is significantly smaller than {x}. (Throughout these notes we will try to normalise most of the sums and integrals appearing here as averages that are trivially bounded by {O(1)}; note that other normalisations are preferred in some of the literature cited here.) For instance, as we established in Theorem 58 of Notes 1, the prime number theorem is equivalent to the assertion that

\displaystyle  \frac{1}{X} \sum_{n \leq X} \mu(n) = o(1) \ \ \ \ \ (1)

as {X \rightarrow \infty}. The Liouville function behaves almost identically to the Möbius function, in that estimates for one function almost always imply analogous estimates for the other:

Exercise 1 Without using the prime number theorem, show that (1) is also equivalent to

\displaystyle  \frac{1}{X} \sum_{n \leq X} \lambda(n) = o(1) \ \ \ \ \ (2)

as {X \rightarrow \infty}. (Hint: use the identities {\lambda(n) = \sum_{d^2|n} \mu(n/d^2)} and {\mu(n) = \sum_{d^2|n} \mu(d) \lambda(n/d^2)}.)
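
These identities are also easy to test numerically. Here is a quick Python check of mine, computing {\mu} and {\lambda} from factorisations supplied by sympy and verifying both identities for all {n \leq 2000}:

```python
import math
from sympy import factorint

def mobius(n):
    f = factorint(n)
    if any(e > 1 for e in f.values()):
        return 0
    return (-1)**len(f)

def liouville(n):
    return (-1)**sum(factorint(n).values())

N = 2000
for n in range(1, N + 1):
    s1 = sum(mobius(n // (d*d)) for d in range(1, math.isqrt(n) + 1) if n % (d*d) == 0)
    s2 = sum(mobius(d)*liouville(n // (d*d)) for d in range(1, math.isqrt(n) + 1) if n % (d*d) == 0)
    assert s1 == liouville(n) and s2 == mobius(n)
print("both identities verified for n <=", N)
```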

Henceforth we shall focus our discussion more on the Liouville function, and turn our attention to averages on shorter intervals. From (2) one has

\displaystyle  \frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n) = o(1) \ \ \ \ \ (3)

as {x \rightarrow \infty} if {H = H(x)} is such that {H \geq \varepsilon x} for some fixed {\varepsilon>0}. However it is significantly more difficult to understand what happens when {H} grows much slower than this. By using the techniques based on zero density estimates discussed in Notes 6, it was shown by Motohashi and by Ramachandra that one can also establish (3) in the regime {H \geq x^{7/12+\varepsilon}} for any fixed {\varepsilon>0}. Assuming the Riemann Hypothesis, Maier and Montgomery lowered the threshold to {H \geq x^{1/2} \log^C x} for an absolute constant {C} (the bound {H \geq x^{1/2+\varepsilon}} is more classical, following from Exercise 33 of Notes 2). On the other hand, the randomness heuristics from Supplement 4 suggest that {H} should be able to be taken as small as {x^\varepsilon}, and perhaps even {\log^{1+\varepsilon} x} if one is particularly optimistic about the accuracy of these probabilistic models. In the other direction, the Chowla conjecture (mentioned for instance in Supplement 4) predicts that {H} cannot be taken arbitrarily slowly growing in {x}, due to the conjectured existence of arbitrarily long strings of consecutive numbers where the Liouville function does not change sign (and in fact one can already show from the known partial results towards the Chowla conjecture that (3) fails for some sequence {x \rightarrow \infty} and some sufficiently slowly growing {H = H(x)}, by modifying the arguments in these papers of mine).

The situation is better when one asks to understand the mean value on almost all short intervals, rather than all intervals. There are several equivalent ways to formulate this question:

Exercise 2 Let {H = H(X)} be a function of {X} such that {H \rightarrow \infty} and {H = o(X)} as {X \rightarrow \infty}. Let {f: {\bf N} \rightarrow {\bf C}} be a {1}-bounded function. Show that the following assertions are equivalent:

  • (i) One has

    \displaystyle  \frac{1}{H} \sum_{x \leq n \leq x+H} f(n) = o(1)

    as {X \rightarrow \infty}, uniformly for all {x \in [X,2X]} outside of a set of measure {o(X)}.

  • (ii) One has

    \displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|\ dx = o(1)

    as {X \rightarrow \infty}.

  • (iii) One has

    \displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|^2\ dx = o(1) \ \ \ \ \ (4)

    as {X \rightarrow \infty}.

As it turns out the second moment formulation in (iii) will be the most convenient for us to work with in this set of notes, as it is well suited to Fourier-analytic techniques (and in particular the Plancherel theorem).
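
To get a concrete feel for the second moment (4), here is a small numerical experiment (a sketch of mine; the parameters {X = 10^5}, {H = 100} are arbitrary choices), which computes the Liouville function by a smallest-prime-factor sieve and then evaluates the discrete analogue of (4); the output is indeed small, of size comparable to the “random model” prediction {1/H}:

```python
import numpy as np

X, H = 10**5, 100
N = 2*X + H + 2

# Liouville function via a smallest-prime-factor sieve
spf = np.zeros(N, dtype=np.int64)
for p in range(2, N):
    if spf[p] == 0:                  # p is prime
        block = spf[p::p]
        block[block == 0] = p
lam = np.zeros(N, dtype=np.int8)
lam[1] = 1
for n in range(2, N):
    lam[n] = -lam[n // spf[n]]       # lambda is completely multiplicative, lambda(p) = -1

# second moment of H^{-1} sum_{x <= n <= x+H} lambda(n) over integer x in [X, 2X]
c = np.concatenate(([0], np.cumsum(lam, dtype=np.int64)))
s = c[X + H + 1 : 2*X + H + 2] - c[X : 2*X + 1]    # window sums over [x, x+H]
print(np.mean((s / H)**2))           # small, of size comparable to 1/H
```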

Using zero density methods, for instance, it was shown by Ramachandra that

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n)|^2\ dx \ll_{A,\varepsilon} \log^{-A} X

whenever {X^{1/6+\varepsilon} \leq H \leq X} for any fixed {\varepsilon>0} and {A>0}. With this quality of bound (saving arbitrary powers of {\log X} over the trivial bound of {O(1)}), this remains the smallest value of {H} one can reach unconditionally. However, in a striking recent breakthrough, it was shown by Matomaki and Radziwill that as long as one is willing to settle for weaker bounds (saving a small power of {\log X} or {\log H}, or just a qualitative decay of {o(1)}), one can obtain non-trivial estimates on far shorter intervals. For instance, they show

Theorem 3 (Matomaki-Radziwill theorem for Liouville) For any {2 \leq H \leq X}, one has

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n)|^2\ dx \ll \log^{-c} H

for some absolute constant {c>0}.

In fact they prove a slightly more precise result: see Theorem 1 of that paper. In particular, they obtain the asymptotic (4) for any function {H = H(X)} that goes to infinity as {X \rightarrow \infty}, no matter how slowly! This ability to let {H} grow slowly with {X} is important for several applications; for instance, in order to combine this type of result with the entropy decrement methods from Notes 9, it is essential that {H} be allowed to grow more slowly than {\log X}. See also this survey of Soundararajan for further discussion.

Exercise 4 In this exercise you may use Theorem 3 freely.

  • (i) Establish the lower bound

    \displaystyle  \frac{1}{X} \sum_{n \leq X} \lambda(n)\lambda(n+1) > -1+c

    for some absolute constant {c>0} and all sufficiently large {X}. (Hint: if this bound failed, then {\lambda(n)=\lambda(n+1)} would hold for almost all {n}; use this to create many intervals {[x,x+H]} for which {\frac{1}{H} \sum_{x \leq n \leq x+H} \lambda(n)} is extremely large.)

  • (ii) Show that Theorem 3 also holds with {\lambda(n)} replaced by {\chi_2 \lambda(n)}, where {\chi_2} is the principal character of period {2}. (Use the fact that {\lambda(2n)=-\lambda(n)} for all {n}.) Use this to establish the corresponding upper bound

    \displaystyle  \frac{1}{X} \sum_{n \leq X} \lambda(n)\lambda(n+1) < 1-c

    complementary to the lower bound in (i).

(There is a curious asymmetry to the difficulty level of these bounds; the upper bound in (ii) was established much earlier by Harman, Pintz, and Wolke, but the lower bound in (i) was only established in the Matomaki-Radziwill paper.)

The techniques discussed previously were highly complex-analytic in nature, relying in particular on the fact that functions such as {\mu} or {\lambda} have Dirichlet series {{\mathcal D} \mu(s) = \frac{1}{\zeta(s)}}, {{\mathcal D} \lambda(s) = \frac{\zeta(2s)}{\zeta(s)}} that extend meromorphically into the critical strip. In contrast, the Matomaki-Radziwill theorem does not rely on such meromorphic continuations, and in fact holds for more general classes of {1}-bounded multiplicative functions {f}, for which one typically does not expect any meromorphic continuation into the strip. Instead, one can view the Matomaki-Radziwill theory as following the philosophy of a slightly different approach to multiplicative number theory, namely the pretentious multiplicative number theory of Granville and Soundararajan (as presented for instance in their draft monograph). A basic notion here is the pretentious distance between two {1}-bounded multiplicative functions {f,g} (at a given scale {X}), which informally measures the extent to which {f} “pretends” to be like {g} (or vice versa). The precise definition is

Definition 5 (Pretentious distance) Given two {1}-bounded multiplicative functions {f,g}, and a threshold {X>0}, the pretentious distance {\mathbb{D}(f,g;X)} between {f} and {g} up to scale {X} is given by the formula

\displaystyle  \mathbb{D}(f,g;X) := \left( \sum_{p \leq X} \frac{1 - \mathrm{Re}(f(p) \overline{g(p)})}{p} \right)^{1/2}.
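
For concreteness, this distance is cheap to compute numerically; the following sketch evaluates {\mathbb{D}(\lambda,1;X)} (which grows like {(2 \log\log X)^{1/2}}, since {\lambda(p) = -1} for all primes {p}) and {\mathbb{D}(n \mapsto n^{it},1;X)} at the arbitrary choice {t=1}:

```python
import math
from sympy import primerange

def pretentious_distance(f, g, X):
    # D(f,g;X)^2 = sum_{p <= X} (1 - Re f(p) conj(g(p))) / p
    s = sum((1 - (f(p)*g(p).conjugate()).real)/p for p in primerange(2, X + 1))
    return math.sqrt(s)

one = lambda p: 1 + 0j
liou = lambda p: -1 + 0j          # lambda(p) = -1 at every prime
arch = lambda p: complex(math.cos(math.log(p)), math.sin(math.log(p)))  # p^{it}, t = 1

for X in (10**3, 10**4, 10**5):
    print(X, pretentious_distance(liou, one, X), pretentious_distance(arch, one, X))
```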

Note that one can also define an infinite version {\mathbb{D}(f,g;\infty)} of this distance by removing the constraint {p \leq X}, though in such cases the pretentious distance may then be infinite. The pretentious distance is not quite a metric (because {\mathbb{D}(f,f;X)} can be non-zero, and furthermore {\mathbb{D}(f,g;X)} can vanish without {f,g} being equal), but it is still quite close to behaving like a metric, in particular it obeys the triangle inequality; see Exercise 16 below. The philosophy of pretentious multiplicative number theory is that two {1}-bounded multiplicative functions {f,g} will exhibit similar behaviour at scale {X} if their pretentious distance {\mathbb{D}(f,g;X)} is bounded, but will become uncorrelated from each other if this distance becomes large. A simple example of this philosophy is given by the following “weak Halasz theorem”, proven in Section 2:

Proposition 6 (Logarithmically averaged version of Halasz) Let {X} be sufficiently large. Then for any {1}-bounded multiplicative functions {f,g}, one has

\displaystyle  \frac{1}{\log X} \sum_{n \leq X} \frac{f(n) \overline{g(n)}}{n} \ll \exp( - c \mathbb{D}(f, g;X)^2 )

for an absolute constant {c>0}.

In particular, if {f} does not pretend to be {1}, then the logarithmic average {\frac{1}{\log X} \sum_{n \leq X} \frac{f(n)}{n}} will be small. This condition is basically necessary, since of course {\frac{1}{\log X} \sum_{n \leq X} \frac{1}{n} = 1 + o(1)}.

If one works with non-logarithmic averages {\frac{1}{X} \sum_{n \leq X} f(n)}, then not pretending to be {1} is insufficient to establish decay, as was already observed in Exercise 11 of Notes 1: if {f} is an Archimedean character {f(n) = n^{it}} for some non-zero real {t}, then {\frac{1}{\log X} \sum_{n \leq X} \frac{f(n)}{n}} goes to zero as {X \rightarrow \infty} (which is consistent with Proposition 6), but {\frac{1}{X} \sum_{n \leq X} f(n)} does not go to zero. However, this is in some sense the “only” obstruction to these averages decaying to zero, as quantified by the following basic result:

Theorem 7 (Halasz’s theorem) Let {X} be sufficiently large. Then for any {1}-bounded multiplicative function {f}, one has

\displaystyle  \frac{1}{X} \sum_{n \leq X} f(n) \ll \exp( - c \min_{|t| \leq T} \mathbb{D}(f, n \mapsto n^{it};X)^2 ) + \frac{1}{T}

for an absolute constant {c>0} and any {T > 0}.

Informally, we refer to a {1}-bounded multiplicative function as “pretentious” if it pretends to be a character such as {n^{it}}, and “non-pretentious” otherwise. The precise distinction is rather malleable, as the precise class of characters that one views as “obstructions” varies from situation to situation. For instance, in Proposition 6 it is just the trivial character {1} which needs to be considered, but in Theorem 7 it is the characters {n \mapsto n^{it}} with {|t| \leq T}. In other contexts one may also need to add Dirichlet characters {\chi(n)} or hybrid characters such as {\chi(n) n^{it}} to the list of characters that one might pretend to be. The division into pretentious and non-pretentious functions in multiplicative number theory is faintly analogous to the division into major and minor arcs in the circle method applied to additive number theory problems; see Notes 8. The Möbius and Liouville functions are model examples of non-pretentious functions; see Exercise 24.

In the contrapositive, Halasz’s theorem can be formulated as the assertion that if one has a large mean

\displaystyle  |\frac{1}{X} \sum_{n \leq X} f(n)| \geq \eta

for some {\eta > 0}, then one has the pretentious property

\displaystyle  \mathbb{D}( f, n \mapsto n^{it}; X ) \ll \sqrt{\log(1/\eta)}

for some {t \ll \eta^{-1}}. This has the flavour of an “inverse theorem”, of the type often found in arithmetic combinatorics.

Among other things, Halasz’s theorem gives yet another proof of the prime number theorem (1); see Section 2.

We now give a version of the Matomaki-Radziwill theorem for general (non-pretentious) multiplicative functions that is formulated in a similar contrapositive (or “inverse theorem”) fashion, though to simplify the presentation we only state a qualitative version that does not give explicit bounds.

Theorem 8 ((Qualitative) Matomaki-Radziwill theorem) Let {\eta>0}, and let {1 \leq H \leq X}, with {H} sufficiently large depending on {\eta}. Suppose that {f} is a {1}-bounded multiplicative function such that

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|^2\ dx \geq \eta^2.

Then one has

\displaystyle  \mathbb{D}(f, n \mapsto n^{it};X) \ll_\eta 1

for some {t \ll_\eta \frac{X}{H}}.

The condition {t \ll_\eta \frac{X}{H}} is basically optimal, as the following example shows:

Exercise 9 Let {\varepsilon>0} be a sufficiently small constant, and let {1 \leq H \leq X} be such that {\frac{1}{\varepsilon} \leq H \leq \varepsilon X}. Let {f} be the Archimedean character {f(n) = n^{it}} for some {|t| \leq \varepsilon \frac{X}{H}}. Show that

\displaystyle  \frac{1}{X} \int_X^{2X} |\frac{1}{H} \sum_{x \leq n \leq x+H} f(n)|^2\ dx \asymp 1.

Combining Theorem 8 with standard non-pretentiousness facts about the Liouville function (see Exercise 24), we recover Theorem 3 (but with a decay rate of only {o(1)} rather than {\log^{-c} H}). We refer the reader to the original paper of Matomaki-Radziwill (as well as this followup paper with myself) for the quantitative version of Theorem 8 that is strong enough to recover the full version of Theorem 3, and which can also handle real-valued pretentious functions.

With our current state of knowledge, the only arguments that can establish the full strength of the Halasz and Matomaki-Radziwill theorems are Fourier analytic in nature, relating sums involving an arithmetic function {f} with its Dirichlet series

\displaystyle  {\mathcal D} f(s) := \sum_{n=1}^\infty \frac{f(n)}{n^s}

which one can view as a discrete Fourier transform of {f} (or more precisely of the measure {\sum_{n=1}^\infty \frac{f(n)}{n} \delta_{\log n}}, if one evaluates the Dirichlet series on the right edge {\{ 1+it: t \in {\bf R} \}} of the critical strip). In this aspect, the techniques resemble the complex-analytic methods from Notes 2, but with the key difference that no analytic or meromorphic continuation into the strip is assumed. The key identity that allows us to pass to Dirichlet series is the following variant of Proposition 7 of Notes 2:

Proposition 10 (Parseval type identity) Let {f,g: {\bf N} \rightarrow {\bf C}} be finitely supported arithmetic functions, and let {\psi: {\bf R} \rightarrow {\bf R}} be a Schwartz function. Then

\displaystyle  \sum_{n=1}^\infty \sum_{m=1}^\infty \frac{f(n)}{n} \frac{\overline{g(m)}}{m} \psi(\log n - \log m) = \frac{1}{2\pi} \int_{\bf R} {\mathcal D} f(1+it) \overline{{\mathcal D} g(1+it)} \hat \psi(t)\ dt

where {\hat \psi(t) := \int_{\bf R} \psi(u) e^{itu}\ du} is the Fourier transform of {\psi}. (Note that the finite support of {f,g} and the Schwartz nature of {\psi,\hat \psi} ensure that both sides of the identity are absolutely convergent.)

The restriction that {f,g} be finitely supported will be slightly annoying in places, since most multiplicative functions will fail to be finitely supported, but this technicality can usually be overcome by suitably truncating the multiplicative function, and taking limits if necessary.

Proof: By expanding out the Dirichlet series, it suffices to show that

\displaystyle  \psi(\log n - \log m) = \frac{1}{2\pi} \int_{\bf R} \frac{1}{n^{it}} \frac{1}{m^{-it}} \hat \psi(t)\ dt

for any natural numbers {n,m}. But this follows from the Fourier inversion formula {\psi(u) = \frac{1}{2\pi} \int_{\bf R} e^{-itu} \hat \psi(t)\ dt} applied at {u = \log n - \log m}. \Box
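
Proposition 10 is also easy to test numerically. The following sketch takes randomly chosen {f,g} supported on {n \leq 20} (an arbitrary choice of mine) and the Gaussian {\psi(u) = e^{-u^2}}, whose Fourier transform in the above convention is {\hat \psi(t) = \sqrt{\pi} e^{-t^2/4}}, and compares the two sides of the identity, approximating the integral by a Riemann sum:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 20                                   # support of f and g
f = rng.standard_normal(K) + 1j*rng.standard_normal(K)
g = rng.standard_normal(K) + 1j*rng.standard_normal(K)
n = np.arange(1, K + 1)
logn = np.log(n)

psi = lambda u: np.exp(-u**2)                        # Schwartz test function
psi_hat = lambda t: np.sqrt(np.pi)*np.exp(-t**2/4)   # int psi(u) e^{itu} du

# left-hand side: the double sum
lhs = np.sum((f/n)[:, None]*(np.conj(g)/n)[None, :]*psi(logn[:, None] - logn[None, :]))

# right-hand side: (1/2 pi) int Df(1+it) conj(Dg(1+it)) psi_hat(t) dt
t = np.linspace(-30, 30, 60001)
E = np.exp(-1j*np.outer(logn, t))        # matrix of n^{-it}
Df = (f/n) @ E                           # D f(1+it) = sum_n (f(n)/n) e^{-it log n}
Dg = (g/n) @ E
rhs = np.sum(Df*np.conj(Dg)*psi_hat(t))*(t[1] - t[0])/(2*np.pi)
print(lhs, rhs)                          # agree to many decimal places
```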

For applications to Halasz type theorems, one sets {g(n)} equal to the Kronecker delta {\delta_{n=1}}, producing weighted integrals of {{\mathcal D} f(1+it)} of “{L^1}” type. For applications to Matomaki-Radziwill theorems, one instead sets {f=g}, and more precisely uses the following corollary of the above proposition, to obtain weighted integrals of {|{\mathcal D} f(1+it)|^2} of “{L^2}” type:

Exercise 11 (Plancherel type identity) If {f: {\bf N} \rightarrow {\bf C}} is finitely supported, and {\varphi: {\bf R} \rightarrow {\bf R}} is a Schwartz function, establish the identity

\displaystyle  \int_0^\infty |\sum_{n=1}^\infty \frac{f(n)}{n} \varphi(\log n - \log y)|^2 \frac{dy}{y} = \frac{1}{2\pi} \int_{\bf R} |{\mathcal D} f(1+it)|^2 |\hat \varphi(t)|^2\ dt.

In contrast, information about the non-pretentious nature of a multiplicative function {f} will give “pointwise” or “{L^\infty}” type control on the Dirichlet series {{\mathcal D} f(1+it)}, as is suggested from the Euler product factorisation of {{\mathcal D} f}.

It will be convenient to formalise the notion of {L^1}, {L^2}, and {L^\infty} control of the Dirichlet series {{\mathcal D} f}, which as previously mentioned can be viewed as a sort of “Fourier transform” of {f}:

Definition 12 (Fourier norms) Let {f: {\bf N} \rightarrow {\bf C}} be finitely supported, and let {\Omega \subset {\bf R}} be a bounded measurable set. We define the Fourier {L^\infty} norm

\displaystyle  \| f\|_{FL^\infty(\Omega)} := \sup_{t \in \Omega} |{\mathcal D} f(1+it)|,

the Fourier {L^2} norm

\displaystyle  \| f\|_{FL^2(\Omega)} := \left(\int_\Omega |{\mathcal D} f(1+it)|^2\ dt\right)^{1/2},

and the Fourier {L^1} norm

\displaystyle  \| f\|_{FL^1(\Omega)} := \int_\Omega |{\mathcal D} f(1+it)|\ dt.

One could more generally define {FL^p} norms for other exponents {p}, but we will only need the exponents {p=1,2,\infty} in this current set of notes. It is clear that all the above norms are in fact (semi-)norms on the space of finitely supported arithmetic functions.
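
For illustration, these norms are straightforward to approximate numerically for a given finitely supported {f}; here is a small sketch (the random {\pm 1} values on {n \leq 100} and the region {\Omega = [-5,5]} are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100
f = rng.choice([-1.0, 1.0], size=N)      # a sample 1-bounded f supported on [1, N]
n = np.arange(1, N + 1)

t = np.linspace(-5, 5, 20001)            # a discretisation of Omega = [-5, 5]
D = (f/n) @ np.exp(-1j*np.outer(np.log(n), t))   # D f(1+it) on the grid
dt = t[1] - t[0]
print("FL^inf:", np.abs(D).max())
print("FL^2  :", np.sqrt(np.sum(np.abs(D)**2)*dt))
print("FL^1  :", np.sum(np.abs(D))*dt)
```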

As mentioned above, Halasz’s theorem gives good control on the Fourier {L^\infty} norm for restrictions of non-pretentious functions to intervals:

Exercise 13 (Fourier {L^\infty} control via Halasz) Let {f: {\bf N} \rightarrow {\bf C}} be a {1}-bounded multiplicative function, let {I} be an interval in {[C^{-1} X, CX]} for some {X \geq C \geq 1}, let {R \geq 1}, and let {\Omega \subset {\bf R}} be a bounded measurable set. Show that

\displaystyle  \| f 1_I \|_{FL^\infty(\Omega)} \ll_C \exp( - c \min_{t: \mathrm{dist}(t,\Omega) \leq R} \mathbb{D}(f, n \mapsto n^{it};X)^2 ) + \frac{1}{R}.

(Hint: you will need to use summation by parts (or an equivalent device) to deal with a {\frac{1}{n}} weight.)

Meanwhile, the Plancherel identity in Exercise 11 gives good control on the Fourier {L^2} norm for functions on long intervals (compare with Exercise 2 from Notes 6):

Exercise 14 ({L^2} mean value theorem) Let {T \geq 1}, and let {f: {\bf N} \rightarrow {\bf C}} be finitely supported. Show that

\displaystyle  \| f \|_{FL^2([-T,T])}^2 \ll \sum_n \frac{1}{n} (\frac{T}{n} \sum_{m: |n-m| \leq n/T} |f(m)|)^2.

Conclude in particular that if {f} is supported in {[C^{-1} N, C N]} for some {C \geq 1} and {N \gg T}, then

\displaystyle  \| f \|_{FL^2([-T,T])}^2 \ll C^{O(1)} \frac{1}{N} \sum_n |f(n)|^2.

In the simplest case of the logarithmically averaged Halasz theorem (Proposition 6), Fourier {L^\infty} estimates are already sufficient to obtain decent control on the (weighted) Fourier {L^1} type expressions that show up. However, these estimates are not enough by themselves to establish the full Halasz theorem or the Matomaki-Radziwill theorem. To get from Fourier {L^\infty} control to Fourier {L^1} or {L^2} control more efficiently, the key trick is use Hölder’s inequality, which when combined with the basic Dirichlet series identity

\displaystyle  {\mathcal D}(f*g) = ({\mathcal D} f) ({\mathcal D} g)

gives the inequalities

\displaystyle  \| f*g \|_{FL^1(\Omega)} \leq \|f\|_{FL^2(\Omega)} \|g\|_{FL^2(\Omega)} \ \ \ \ \ (5)

and

\displaystyle  \| f*g \|_{FL^2(\Omega)} \leq \|f\|_{FL^2(\Omega)} \|g\|_{FL^\infty(\Omega)} \ \ \ \ \ (6)

The strategy is then to factor (or approximately factor) the original function {f} as a Dirichlet convolution (or average of convolutions) of various components, each of which enjoys reasonably good Fourier {L^2} or {L^\infty} estimates on various regions {\Omega}, and then combine them using the Hölder inequalities (5), (6) and the triangle inequality. For instance, to prove Halasz’s theorem, we will split {f} into the Dirichlet convolution of three factors, one of which will be estimated in {FL^\infty} using the non-pretentiousness hypothesis, and the other two being estimated in {FL^2} using Exercise 14. For the Matomaki-Radziwill theorem, one uses a significantly more complicated decomposition of {f} into a variety of Dirichlet convolutions of factors, and also splits up the Fourier domain {[-T,T]} into several subregions depending on whether the Dirichlet series associated to some of these components are large or small. In each region and for each component of these decompositions, all but one of the factors will be estimated in {FL^\infty}, and the other in {FL^2}; but the precise way in which this is done will vary from component to component. For instance, in some regions a key factor will be small in {FL^\infty} by construction of the region; in other places, the {FL^\infty} control will come from Exercise 13. Similarly, in some regions, satisfactory {FL^2} control is provided by Exercise 14, but in other regions one must instead use “large value” theorems (in the spirit of Proposition 9 from Notes 6), or amplify the power of the standard {L^2} mean value theorems by combining the Dirichlet series with other Dirichlet series that are known to be large in this region.

There are several ways to achieve the desired factorisation. In the case of Halasz’s theorem, we can simply work with a crude version of the Euler product factorisation, dividing the primes into three categories (“small”, “medium”, and “large” primes) and expressing {f} as a triple Dirichlet convolution accordingly. For the Matomaki-Radziwill theorem, one instead exploits the Turan-Kubilius phenomenon (Section 5 of Notes 1, or Lemma 2 of Notes 9) that for various moderately wide ranges {[P,Q]} of primes, the number of prime divisors of a large number {n} in the range {[P,Q]} is almost always close to {\log\log Q - \log\log P}. Thus, if we introduce the arithmetic functions

\displaystyle  w_{[P,Q]}(n) = \frac{1}{\log\log Q - \log\log P} \sum_{P \leq p \leq Q} 1_{n=p} \ \ \ \ \ (7)

then we have

\displaystyle  1 \approx 1 * w_{[P,Q]}

and more generally we have a twisted approximation

\displaystyle  f \approx f * fw_{[P,Q]}

for multiplicative functions {f}. (Actually, for technical reasons it will be convenient to work with a smoothed out version of these functions; see Section 3.) Informally, these formulas suggest that the “{FL^2} energy” of a multiplicative function {f} is concentrated in those regions where {f w_{[P,Q]}} is extremely large in a {FL^\infty} sense. Iterations of this formula (or variants of this formula, such as an identity due to Ramaré) will then give the desired (approximate) factorisation of {{\mathcal D} f}.
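
As a quick numerical illustration of the approximation {1 \approx 1 * w_{[P,Q]}}: for a random {n} much larger than {Q}, the normalised count of prime factors of {n} in {[P,Q]} averages out to about {1}. (A sketch of mine; the parameters {P,Q} and the sampling range are arbitrary.)

```python
import math, random
from sympy import factorint

P, Q = 10**2, 10**4
norm = math.log(math.log(Q)) - math.log(math.log(P))   # loglog Q - loglog P

random.seed(0)
vals = []
for _ in range(1000):
    n = random.randint(10**8, 10**9)
    k = sum(1 for p in factorint(n) if P <= p <= Q)    # number of prime factors of n in [P,Q]
    vals.append(k/norm)
print(sum(vals)/len(vals))       # roughly 1, in line with Turan-Kubilius
```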


Just a short post to announce that nominations are now open for the Maryam Mirzakhani New Frontiers Prize, which is a newly announced annual $50,000 award from the Breakthrough Prize Foundation presented to early-career women mathematicians who have completed their PhDs within the past two years, and recognizes outstanding research achievement.  (I will be serving on the prize committee.)  Nominations for this (and other breakthrough prizes) can be made at this page.

Peter Denton, Stephen Parke, Xining Zhang, and I have just uploaded to the arXiv a completely rewritten version of our previous paper, now titled “Eigenvectors from Eigenvalues: a survey of a basic identity in linear algebra“. This paper is now a survey of the various literature surrounding the following basic identity in linear algebra, which we propose to call the eigenvector-eigenvalue identity:

Theorem 1 (Eigenvector-eigenvalue identity) Let {A} be an {n \times n} Hermitian matrix, with eigenvalues {\lambda_1(A),\dots,\lambda_n(A)}. Let {v_i} be a unit eigenvector corresponding to the eigenvalue {\lambda_i(A)}, and let {v_{i,j}} be the {j^{th}} component of {v_i}. Then

\displaystyle |v_{i,j}|^2 \prod_{k=1; k \neq i}^n (\lambda_i(A) - \lambda_k(A)) = \prod_{k=1}^{n-1} (\lambda_i(A) - \lambda_k(M_j))

where {M_j} is the {(n-1) \times (n-1)} Hermitian matrix formed by deleting the {j^{th}} row and column from {A}.
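
The identity is easy to test numerically; here is a short numpy sketch on a random {6 \times 6} Hermitian matrix (the indices {i,j} are arbitrary; note numpy’s 0-based indexing):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
B = rng.standard_normal((n, n)) + 1j*rng.standard_normal((n, n))
A = (B + B.conj().T)/2                    # random Hermitian matrix
lam, V = np.linalg.eigh(A)                # eigenvalues and orthonormal eigenvectors

i, j = 2, 4                               # test the identity at these indices
lhs = abs(V[j, i])**2 * np.prod([lam[i] - lam[k] for k in range(n) if k != i])
M = np.delete(np.delete(A, j, axis=0), j, axis=1)   # minor: delete j-th row and column
mu = np.linalg.eigvalsh(M)
rhs = np.prod([lam[i] - m for m in mu])
print(lhs, rhs)                           # equal up to rounding error
```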

When we posted the first version of this paper, we were unaware of previous appearances of this identity in the literature; a related identity had been used by Erdos-Schlein-Yau and by myself and Van Vu for applications to random matrix theory, but to our knowledge this specific identity appeared to be new. Even two months after our preprint first appeared on the arXiv in August, we had only learned of one other place in the literature where the identity showed up (by Forrester and Zhang, who also cite an earlier paper of Baryshnikov).

The situation changed rather dramatically with the publication of a popular science article in Quanta on this identity in November, which gave this result significantly more exposure. Within a few weeks we became informed (through private communication, online discussion, and exploration of the citation tree around the references we were alerted to) of over three dozen places where the identity, or some other closely related identity, had previously appeared in the literature, in such areas as numerical linear algebra, various aspects of graph theory (graph reconstruction, chemical graph theory, and walks on graphs), inverse eigenvalue problems, random matrix theory, and neutrino physics. As a consequence, we have decided to completely rewrite our article in order to collate this crowdsourced information, and survey the history of this identity, all the known proofs (we collect seven distinct ways to prove the identity (or generalisations thereof)), and all the applications of it that we are currently aware of. The citation graph of the literature that this ad hoc crowdsourcing effort produced is only very weakly connected, which we found surprising.

The earliest explicit appearance of the eigenvector-eigenvalue identity we are now aware of is in a 1966 paper of Thompson, although this paper is only cited (directly or indirectly) by a fraction of the known literature, and also there is a precursor identity of Löwner from 1934 that can be shown to imply the identity as a limiting case. At the end of the paper we speculate on some possible reasons why this identity only achieved a modest amount of recognition and dissemination prior to the November 2019 Quanta article.

Asgar Jamneshan and I have just uploaded to the arXiv our paper “An uncountable Moore-Schmidt theorem“. This paper revisits a classical theorem of Moore and Schmidt in measurable cohomology of measure-preserving systems. To state the theorem, let {X = (X,{\mathcal X},\mu)} be a probability space, and {\mathrm{Aut}(X, {\mathcal X}, \mu)} be the group of measure-preserving automorphisms of this space, that is to say the invertible bimeasurable maps {T: X \rightarrow X} that preserve the measure {\mu}: {T_* \mu = \mu}. To avoid some ambiguity later in this post when we introduce abstract analogues of measure theory, we will refer to measurable maps as concrete measurable maps, and measurable spaces as concrete measurable spaces. (One could also call {X = (X,{\mathcal X}, \mu)} a concrete probability space, but we will not need to do so here as we will not be working explicitly with abstract probability spaces.)

Let {\Gamma = (\Gamma,\cdot)} be a discrete group. A (concrete) measure-preserving action of {\Gamma} on {X} is a group homomorphism {\gamma \mapsto T^\gamma} from {\Gamma} to {\mathrm{Aut}(X, {\mathcal X}, \mu)}, thus {T^1} is the identity map and {T^{\gamma_1} \circ T^{\gamma_2} = T^{\gamma_1 \gamma_2}} for all {\gamma_1,\gamma_2 \in \Gamma}. A large portion of ergodic theory is concerned with the study of such measure-preserving actions, especially in the classical case when {\Gamma} is the integers (with the additive group law).

Let {K = (K,+)} be a compact Hausdorff abelian group, which we can endow with the Borel {\sigma}-algebra {{\mathcal B}(K)}. A (concrete measurable) {K}-valued cocycle is a collection {\rho = (\rho_\gamma)_{\gamma \in \Gamma}} of concrete measurable maps {\rho_\gamma: X \rightarrow K} obeying the cocycle equation

\displaystyle \rho_{\gamma_1 \gamma_2}(x) = \rho_{\gamma_1} \circ T^{\gamma_2}(x) + \rho_{\gamma_2}(x) \ \ \ \ \ (1)

for {\mu}-almost every {x \in X}. (Here we are glossing over a measure-theoretic subtlety that we will return to later in this post – see if you can spot it before then!) Cocycles arise naturally in the theory of group extensions of dynamical systems; in particular (and ignoring the aforementioned subtlety), each cocycle induces a measure-preserving action {\gamma \mapsto S^\gamma} on {X \times K} (which we endow with the product of {\mu} with Haar probability measure on {K}), defined by

\displaystyle S^\gamma( x, k ) := (T^\gamma x, k + \rho_\gamma(x) ).
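
For a toy example of these definitions, one can take {\Gamma = {\bf Z}} (written additively) acting on {X = {\bf R}/{\bf Z}} by an irrational rotation, and {K = \mathbf{T}}. The following Python sketch of mine checks the cocycle equation (1) on random samples, for a cocycle of the coboundary form {\rho_\gamma = F \circ T^\gamma - F} discussed just below, with an arbitrarily chosen {F}:

```python
import math, random

alpha = math.sqrt(2) - 1          # irrational rotation number

def T(x, gamma):                  # T^gamma x = x + gamma*alpha mod 1
    return (x + gamma*alpha) % 1.0

def F(x):                         # an arbitrary measurable function X -> T
    return (3*x*x) % 1.0

def rho(gamma, x):                # the coboundary rho_gamma = F o T^gamma - F
    return (F(T(x, gamma)) - F(x)) % 1.0

# check rho_{g1+g2}(x) = rho_{g1}(T^{g2} x) + rho_{g2}(x) in T = R/Z
random.seed(0)
for _ in range(1000):
    g1, g2 = random.randint(-50, 50), random.randint(-50, 50)
    x = random.random()
    d = rho(g1 + g2, x) - (rho(g1, T(x, g2)) + rho(g2, x)) % 1.0
    assert min(abs(d), abs(abs(d) - 1.0)) < 1e-9   # equal mod 1, up to rounding
print("cocycle equation verified on random samples")
```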

This connection with group extensions was the original motivation for our study of measurable cohomology, but is not the focus of the current paper.

A special case of a {K}-valued cocycle is a (concrete measurable) {K}-valued coboundary, in which {\rho_\gamma} for each {\gamma \in \Gamma} takes the special form

\displaystyle \rho_\gamma(x) = F \circ T^\gamma(x) - F(x)

for {\mu}-almost every {x \in X}, where {F: X \rightarrow K} is some measurable function; note that (ignoring the aforementioned subtlety), every function of this form is automatically a concrete measurable {K}-valued cocycle. One of the first basic questions in measurable cohomology is to try to characterize which {K}-valued cocycles are in fact {K}-valued coboundaries. This is a difficult question in general. However, there is a general result of Moore and Schmidt that at least allows one to reduce to the model case when {K} is the unit circle {\mathbf{T} = {\bf R}/{\bf Z}}, by taking advantage of the Pontryagin dual group {\hat K} of characters {\hat k: K \rightarrow \mathbf{T}}, that is to say the collection of continuous homomorphisms {\hat k: k \mapsto \langle \hat k, k \rangle} to the unit circle. More precisely, we have

Theorem 1 (Countable Moore-Schmidt theorem) Let {\Gamma} be a discrete group acting in a concrete measure-preserving fashion on a probability space {X}. Let {K} be a compact Hausdorff abelian group. Assume the following additional hypotheses:

  • (i) {\Gamma} is at most countable.
  • (ii) {X} is a standard Borel space.
  • (iii) {K} is metrisable.

Then a {K}-valued concrete measurable cocycle {\rho = (\rho_\gamma)_{\gamma \in \Gamma}} is a concrete coboundary if and only if for each character {\hat k \in \hat K}, the {\mathbf{T}}-valued cocycles {\langle \hat k, \rho \rangle = ( \langle \hat k, \rho_\gamma \rangle )_{\gamma \in \Gamma}} are concrete coboundaries.

The hypotheses (i), (ii), (iii) are saying in some sense that the data {\Gamma, X, K} are not too “large”; in all three cases they are saying in some sense that the data are only “countably complicated”. For instance, (iii) is equivalent to {K} being second countable, and (ii) is equivalent to {X} being modeled by a complete separable metric space. It is because of this restriction that we refer to this result as a “countable” Moore-Schmidt theorem. This theorem is a useful tool in several other applications, such as the Host-Kra structure theorem for ergodic systems; I hope to return to these subsequent applications in a future post.

Let us very briefly sketch the main ideas of the proof of Theorem 1. Ignore for now issues of measurability, and pretend that something that holds almost everywhere in fact holds everywhere. The hard direction is to show that if each {\langle \hat k, \rho \rangle} is a coboundary, then so is {\rho}. By hypothesis, we then have an equation of the form

\displaystyle \langle \hat k, \rho_\gamma(x) \rangle = \alpha_{\hat k} \circ T^\gamma(x) - \alpha_{\hat k}(x) \ \ \ \ \ (2)

for all {\hat k, \gamma, x} and some functions {\alpha_{\hat k}: X \rightarrow {\mathbf T}}, and our task is then to produce a function {F: X \rightarrow K} for which

\displaystyle \rho_\gamma(x) = F \circ T^\gamma(x) - F(x)

for all {\gamma,x}.

Comparing the two equations, the task would be easy if we could find an {F: X \rightarrow K} for which

\displaystyle \langle \hat k, F(x) \rangle = \alpha_{\hat k}(x) \ \ \ \ \ (3)

for all {\hat k, x}. However there is an obstruction to this: the left-hand side of (3) is additive in {\hat k}, so the right-hand side would have to be also in order to obtain such a representation. In other words, for this strategy to work, one would have to first establish the identity

\displaystyle \alpha_{\hat k_1 + \hat k_2}(x) - \alpha_{\hat k_1}(x) - \alpha_{\hat k_2}(x) = 0 \ \ \ \ \ (4)

for all {\hat k_1, \hat k_2, x}. On the other hand, the good news is that if we somehow manage to obtain the equation, then we can obtain a function {F} obeying (3), thanks to Pontryagin duality, which gives a one-to-one correspondence between {K} and the homomorphisms of the (discrete) group {\hat K} to {\mathbf{T}}.

Now, it turns out that one cannot derive the equation (4) directly from the given information (2). However, the left-hand side of (2) is additive in {\hat k}, so the right-hand side must be also. Manipulating this fact, we eventually arrive at

\displaystyle (\alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2}) \circ T^\gamma(x) = (\alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2})(x).

In other words, we don’t get to show that the left-hand side of (4) vanishes, but we do at least get to show that it is {\Gamma}-invariant. Now let us assume for sake of argument that the action of {\Gamma} is ergodic, which (ignoring issues about sets of measure zero) basically asserts that the only {\Gamma}-invariant functions are constant. So now we get a weaker version of (4), namely

\displaystyle \alpha_{\hat k_1 + \hat k_2}(x) - \alpha_{\hat k_1}(x) - \alpha_{\hat k_2}(x) = c_{\hat k_1, \hat k_2} \ \ \ \ \ (5)

for some constants {c_{\hat k_1, \hat k_2} \in \mathbf{T}}.

Now we need to eliminate the constants. This can be done by the following group-theoretic projection. Let {L^0({\bf X} \rightarrow {\bf T})} denote the space of concrete measurable maps {\alpha} from {{\bf X}} to {{\bf T}}, up to almost everywhere equivalence; this is an abelian group where the various terms in (5) naturally live. Inside this group we have the subgroup {{\bf T}} of constant functions (up to almost everywhere equivalence); this is where the right-hand side of (5) lives. Because {{\bf T}} is a divisible group, there is an application of Zorn’s lemma (a good exercise for those who are not acquainted with these things) to show that there exists a retraction {w: L^0({\bf X} \rightarrow {\bf T}) \rightarrow {\bf T}}, that is to say a group homomorphism that is the identity on the subgroup {{\bf T}}. We can use this retraction, or more precisely the complement {\alpha \mapsto \alpha - w(\alpha)}, to eliminate the constant in (5). Indeed, if we set

\displaystyle \tilde \alpha_{\hat k}(x) := \alpha_{\hat k}(x) - w(\alpha_{\hat k})

then from (5) we see that

\displaystyle \tilde \alpha_{\hat k_1 + \hat k_2}(x) - \tilde \alpha_{\hat k_1}(x) - \tilde \alpha_{\hat k_2}(x) = 0

while from (2) one has

\displaystyle \langle \hat k, \rho_\gamma(x) \rangle = \tilde \alpha_{\hat k} \circ T^\gamma(x) - \tilde \alpha_{\hat k}(x)

and now the previous strategy works with {\alpha_{\hat k}} replaced by {\tilde \alpha_{\hat k}}. This concludes the sketch of proof of Theorem 1.

In making the above argument rigorous, the hypotheses (i)-(iii) are used in several places. For instance, to reduce to the ergodic case one relies on the ergodic decomposition, which requires the hypothesis (ii). Also, most of the above equations only hold outside of a set of measure zero, and one uses the hypothesis (i) and the hypothesis (iii) (the latter being equivalent to {\hat K} being at most countable) to avoid the problem that an uncountable union of sets of measure zero could have positive measure (or fail to be measurable at all).

My co-author Asgar Jamneshan and I are working on a long-term project to extend many results in ergodic theory (such as the aforementioned Host-Kra structure theorem) to “uncountable” settings in which hypotheses analogous to (i)-(iii) are omitted; thus we wish to consider actions of uncountable groups, on spaces that are not standard Borel, and cocycles taking values in groups that are not metrisable. Such uncountable contexts naturally arise when trying to apply ergodic theory techniques to combinatorial problems (such as the inverse conjecture for the Gowers norms), as one often relies on the ultraproduct construction (or something similar) to generate an ergodic theory translation of these problems, and these constructions usually give “uncountable” objects rather than “countable” ones. (For instance, the ultraproduct of finite groups is a hyperfinite group, which is usually uncountable.) This paper marks the first step in this project by extending the Moore-Schmidt theorem to the uncountable setting.

If one simply drops the hypotheses (i)-(iii) and tries to prove the Moore-Schmidt theorem, several serious difficulties arise. We have already mentioned the loss of the ergodic decomposition and the possibility that one has to control an uncountable union of null sets. But there is in fact a more basic problem when one deletes (iii): the addition operation {+: K \times K \rightarrow K}, while still continuous, can fail to be measurable as a map from {(K \times K, {\mathcal B}(K) \otimes {\mathcal B}(K))} to {(K, {\mathcal B}(K))}! Thus for instance the sum of two measurable functions {F, G: X \rightarrow K} need not be measurable, which makes even the very definition of a measurable cocycle or measurable coboundary problematic (or at least unnatural). This phenomenon is known as the Nedoma pathology. A standard example arises when {K} is the uncountable torus {{\mathbf T}^{{\bf R}}}, endowed with the product topology. Crucially, the Borel {\sigma}-algebra {{\mathcal B}(K)} generated by this uncountable product is not the product {{\mathcal B}(\mathbf{T})^{\otimes {\bf R}}} of the factor Borel {\sigma}-algebras (the discrepancy ultimately arises from the fact that topologies permit uncountable unions, but {\sigma}-algebras do not); relating to this, the product {\sigma}-algebra {{\mathcal B}(K) \otimes {\mathcal B}(K)} is not the same as the Borel {\sigma}-algebra {{\mathcal B}(K \times K)}, but is instead a strict sub-algebra. If the group operations on {K} were measurable, then the diagonal set

\displaystyle K^\Delta := \{ (k,k') \in K \times K: k = k' \} = \{ (k,k') \in K \times K: k - k' = 0 \}

would be measurable in {{\mathcal B}(K) \otimes {\mathcal B}(K)}. But it is an easy exercise in manipulation of {\sigma}-algebras to show that if {(X, {\mathcal X}), (Y, {\mathcal Y})} are any two measurable spaces and {E \subset X \times Y} is measurable in {{\mathcal X} \otimes {\mathcal Y}}, then the fibres {E_x := \{ y \in Y: (x,y) \in E \}} of {E} are contained in some countably generated subalgebra of {{\mathcal Y}}. Thus if {K^\Delta} were {{\mathcal B}(K) \otimes {\mathcal B}(K)}-measurable, then all the points of {K} would lie in a single countably generated {\sigma}-algebra. But the cardinality of such an algebra is at most {2^{\aleph_0}} while the cardinality of {K} is {2^{2^{\aleph_0}}}, and Cantor’s theorem then gives a contradiction.

To resolve this problem, we give {K} a coarser {\sigma}-algebra than the Borel {\sigma}-algebra, which we call the reduced {\sigma}-algebra {{\mathcal B}^\otimes(K)}, thus coarsening the measurable space structure on {K = (K,{\mathcal B}(K))} to a new measurable space {K_\otimes := (K, {\mathcal B}^\otimes(K))}. In the case of compact Hausdorff abelian groups, {{\mathcal B}^{\otimes}(K)} can be defined as the {\sigma}-algebra generated by the characters {\hat k: K \rightarrow {\mathbf T}}; for more general compact abelian groups, one can define {{\mathcal B}^{\otimes}(K)} as the {\sigma}-algebra generated by all continuous maps into metric spaces. This {\sigma}-algebra is equal to {{\mathcal B}(K)} when {K} is metrisable but can be smaller for other {K}. With this measurable structure, {K_\otimes} becomes a measurable group; it seems that once one leaves the metrisable world, {K_\otimes} is a superior (or at least equally good) space to work with for analysis than {K}, as it avoids the Nedoma pathology. (For instance, from Plancherel’s theorem, we see that if {m_K} is the Haar probability measure on {K}, then {L^2(K,m_K) = L^2(K_\otimes,m_K)} (thus, every {K}-measurable set is equivalent modulo {m_K}-null sets to a {K_\otimes}-measurable set), so there is no damage to Plancherel caused by passing to the reduced {\sigma}-algebra.)

Passing to the reduced {\sigma}-algebra {K_\otimes} fixes the most severe problems with an uncountable Moore-Schmidt theorem, but one is still faced with an issue of having to potentially take an uncountable union of null sets. To avoid this sort of problem, we pass to the framework of abstract measure theory, in which we remove explicit mention of “points” and can easily delete all null sets at a very early stage of the formalism. In this setup, the category of concrete measurable spaces is replaced with the larger category of abstract measurable spaces, which we formally define as the opposite category of the category of {\sigma}-algebras (with Boolean algebra homomorphisms). Thus, we define an abstract measurable space to be an object of the form {{\mathcal X}^{\mathrm{op}}}, where {{\mathcal X}} is an (abstract) {\sigma}-algebra and {\mathrm{op}} is a formal placeholder symbol that signifies use of the opposite category, and an abstract measurable map {T: {\mathcal X}^{\mathrm{op}} \rightarrow {\mathcal Y}^{\mathrm{op}}} is an object of the form {(T^*)^{\mathrm{op}}}, where {T^*: {\mathcal Y} \rightarrow {\mathcal X}} is a Boolean algebra homomorphism and {\mathrm{op}} is again used as a formal placeholder; we call {T^*} the pullback map associated to {T}.  [UPDATE: It turns out that this definition of a measurable map led to technical issues.  In a forthcoming revision of the paper we also impose the requirement that the abstract measurable map be \sigma-complete (i.e., it respects countable joins).] The composition {S \circ T: {\mathcal X}^{\mathrm{op}} \rightarrow {\mathcal Z}^{\mathrm{op}}} of two abstract measurable maps {T: {\mathcal X}^{\mathrm{op}} \rightarrow {\mathcal Y}^{\mathrm{op}}}, {S: {\mathcal Y}^{\mathrm{op}} \rightarrow {\mathcal Z}^{\mathrm{op}}} is defined by the formula {S \circ T := (T^* \circ S^*)^{\mathrm{op}}}, or equivalently {(S \circ T)^* = T^* \circ S^*}.
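
To illustrate the bookkeeping (and only that), here is a toy rendition of this opposite-category formalism in Python, with finite sets standing in for genuine measurable spaces so that a {\sigma}-algebra is just a power set; all names are hypothetical, and no attempt is made to model {\sigma}-completeness:

```python
# A toy model (all names hypothetical) of the opposite-category formalism:
# an abstract measurable map is a formal wrapper around its pullback, and
# composition reverses order, mirroring the rule (S o T)^* = T^* o S^*.
class AbstractMap:
    def __init__(self, pullback):
        self.pullback = pullback           # a map on "measurable sets"

    def compose(self, other):              # returns self o other
        return AbstractMap(lambda E: other.pullback(self.pullback(E)))

# A concrete map f: X -> Y embeds via its pullback E |-> f^{-1}(E);
# here the spaces are finite sets, so every subset is measurable.
def from_concrete(f, domain):
    return AbstractMap(lambda E: {x for x in domain if f(x) in E})

X = range(4)
T = from_concrete(lambda x: (x + 1) % 4, X)    # T: X -> Y
S = from_concrete(lambda y: (2 * y) % 4, X)    # S: Y -> Z
ST = S.compose(T)                              # the abstract map S o T
print(ST.pullback({0}))                        # {1, 3} = (S o T)^{-1}({0})
```

The point of the sketch is simply that composition of abstract maps is, by definition, reverse composition of pullbacks, and that a concrete map enters the formalism through its preimage map.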

Every concrete measurable space {X = (X,{\mathcal X})} can be identified with an abstract counterpart {{\mathcal X}^{\mathrm{op}}}, and similarly every concrete measurable map {T: X \rightarrow Y} can be identified with an abstract counterpart {(T^*)^{\mathrm{op}}}, where {T^*: {\mathcal Y} \rightarrow {\mathcal X}} is the pullback map {T^* E := T^{-1}(E)}. Thus the category of concrete measurable spaces can be viewed as a subcategory of the category of abstract measurable spaces. The advantage of working in the abstract setting is that it gives us access to more spaces that could not be directly defined in the concrete setting. Most importantly for us, we have a new abstract space, the opposite measure algebra {X_\mu} of {X}, defined as {({\bf X}/{\bf N})^{\mathrm{op}}} where {{\bf N}} is the ideal of null sets in {{\bf X}}. Informally, {X_\mu} is the space {X} with all the null sets removed; there is a canonical abstract embedding map {\iota: X_\mu \rightarrow X}, which allows one to convert any concrete measurable map {f: X \rightarrow Y} into an abstract one {[f]: X_\mu \rightarrow Y}. One can then define the notion of an abstract action, abstract cocycle, and abstract coboundary by replacing every occurrence of the category of concrete measurable spaces with their abstract counterparts, and replacing {X} with the opposite measure algebra {X_\mu}; see the paper for details. Our main theorem is then

Theorem 2 (Uncountable Moore-Schmidt theorem) Let {\Gamma} be a discrete group acting abstractly on a {\sigma}-finite measure space {X}. Let {K} be a compact Hausdorff abelian group. Then a {K_\otimes}-valued abstract measurable cocycle {\rho = (\rho_\gamma)_{\gamma \in \Gamma}} is an abstract coboundary if and only if for each character {\hat k \in \hat K}, the {\mathbf{T}}-valued cocycle {\langle \hat k, \rho \rangle = ( \langle \hat k, \rho_\gamma \rangle )_{\gamma \in \Gamma}} is an abstract coboundary.

With the abstract formalism, the proof of the uncountable Moore-Schmidt theorem is almost identical to the countable one (in fact we were able to make some simplifications, such as avoiding the use of the ergodic decomposition). A key tool is what we call a “conditional Pontryagin duality” theorem, which asserts that if one has an abstract measurable map {\alpha_{\hat k}: X_\mu \rightarrow {\bf T}} for each {\hat k \in \hat K} obeying the identity { \alpha_{\hat k_1 + \hat k_2} - \alpha_{\hat k_1} - \alpha_{\hat k_2} = 0} for all {\hat k_1,\hat k_2 \in \hat K}, then there is an abstract measurable map {F: X_\mu \rightarrow K_\otimes} such that {\alpha_{\hat k} = \langle \hat k, F \rangle} for all {\hat k \in \hat K}. This is derived from the usual Pontryagin duality and some other tools, most notably the completeness of the {\sigma}-algebra of {X_\mu}, and the Sikorski extension theorem.

We feel that it is natural to stay within the abstract measure theory formalism whenever dealing with uncountable situations. However, it is still an interesting question as to when one can guarantee that the abstract objects constructed in this formalism are representable by concrete analogues. The basic questions in this regard are:

  • (i) Suppose one has an abstract measurable map {f: X_\mu \rightarrow Y} into a concrete measurable space. Does there exist a representation of {f} by a concrete measurable map {\tilde f: X \rightarrow Y}? Is it unique up to almost everywhere equivalence?
  • (ii) Suppose one has a concrete cocycle that is an abstract coboundary. When can it be represented by a concrete coboundary?

For (i) the answer is somewhat interesting (as I learned after posing this MathOverflow question):

  • If {Y} does not separate points, or is not compact metrisable or Polish, there can be counterexamples to uniqueness. If {Y} is not compact or Polish, there can be counterexamples to existence.
  • If {Y} is a compact metric space or a Polish space, then one always has existence and uniqueness.
  • If {Y} is a compact Hausdorff abelian group, one always has existence.
  • If {X} is a complete measure space, then one always has existence (from a theorem of Maharam).
  • If {X} is the unit interval with the Borel {\sigma}-algebra and Lebesgue measure, then one has existence for all compact Hausdorff {Y} assuming the continuum hypothesis (from a theorem of von Neumann) but existence can fail under other extensions of ZFC (from a theorem of Shelah, using the method of forcing).
  • For more general {X}, existence for all compact Hausdorff {Y} is equivalent to the existence of a lifting from the {\sigma}-algebra {\mathcal{X}/\mathcal{N}} to {\mathcal{X}} (or, in the language of abstract measurable spaces, the existence of an abstract retraction from {X} to {X_\mu}).
  • It is a long-standing open question (posed for instance by Fremlin) whether it is relatively consistent with ZFC that existence holds whenever {Y} is compact Hausdorff.

Our understanding of (ii) is much less complete:

  • If {K} is metrisable, the answer is “always” (which among other things establishes the countable Moore-Schmidt theorem as a corollary of the uncountable one).
  • If {\Gamma} is at most countable and {X} is a complete measure space, then the answer is again “always”.

In view of the answers to (i), I would not be surprised if the full answer to (ii) were also sensitive to axioms of set theory. However, such set theoretic issues seem to be almost completely avoided if one sticks with the abstract formalism throughout; they only arise when trying to pass back and forth between the abstract and concrete categories.

In these notes we presume familiarity with the basic concepts of probability theory, such as random variables (which could take values in the reals, vectors, or other measurable spaces), probability, and expectation. Much of this theory is in turn based on measure theory, which we will also presume familiarity with. See for instance this previous set of lecture notes for a brief review.

The basic objects of study in analytic number theory are deterministic; there is nothing inherently random about the set of prime numbers, for instance. Despite this, one can still interpret many of the averages encountered in analytic number theory in probabilistic terms, by introducing random variables into the subject. Consider for instance the form

\displaystyle  \sum_{n \leq x} \mu(n) = o(x) \ \ \ \ \ (1)

of the prime number theorem (where we take the limit {x \rightarrow \infty}). One can interpret this estimate probabilistically as

\displaystyle  {\mathbb E} \mu(\mathbf{n}) = o(1) \ \ \ \ \ (2)

where {\mathbf{n} = \mathbf{n}_{\leq x}} is a random variable drawn uniformly from the natural numbers up to {x}, and {{\mathbb E}} denotes the expectation. (In this set of notes we will use boldface symbols to denote random variables, and non-boldface symbols for deterministic objects.) By itself, such an interpretation is little more than a change of notation. However, the power of this interpretation becomes more apparent when one then imports concepts from probability theory (together with all their attendant intuitions and tools), such as independence, conditioning, stationarity, total variation distance, and entropy. For instance, suppose we want to use the prime number theorem (1) to make a prediction for the sum

\displaystyle  \sum_{n \leq x} \mu(n) \mu(n+1).

After dividing by {x}, this is essentially

\displaystyle  {\mathbb E} \mu(\mathbf{n}) \mu(\mathbf{n}+1).

With probabilistic intuition, one may expect the random variables {\mu(\mathbf{n}), \mu(\mathbf{n}+1)} to be approximately independent (there is no obvious relationship between the number of prime factors of {\mathbf{n}}, and of {\mathbf{n}+1}), and so the above average would be expected to be approximately equal to

\displaystyle  ({\mathbb E} \mu(\mathbf{n})) ({\mathbb E} \mu(\mathbf{n}+1))

which by (2) is equal to {o(1)}. Thus we are led to the prediction

\displaystyle  \sum_{n \leq x} \mu(n) \mu(n+1) = o(x). \ \ \ \ \ (3)

The asymptotic (3) is widely believed (it is a special case of the Chowla conjecture, which we will discuss in later notes); while there has been recent progress towards establishing it rigorously, it remains open for now.
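
As a quick numerical sanity check (which of course proves nothing), one can tabulate the Mobius function up to some modest threshold with a standard linear sieve and compare the normalised sums in (1) and (3), as well as the logarithmically averaged variant that will appear in (4) below; the scale {X = 10^6} is an arbitrary choice:

```python
import math

# Tabulate the Mobius function up to N with a linear sieve: every composite
# is struck out exactly once, by its smallest prime factor.
X = 10**6
N = X + 1
mu = [0] * (N + 1)
mu[1] = 1
composite = [False] * (N + 1)
primes = []
for n in range(2, N + 1):
    if not composite[n]:
        primes.append(n)
        mu[n] = -1
    for p in primes:
        if n * p > N:
            break
        composite[n * p] = True
        if n % p == 0:
            mu[n * p] = 0          # p^2 divides n*p
            break
        mu[n * p] = -mu[n]         # mu is multiplicative

print(sum(mu[1:X + 1]) / X)                                  # (1): should be o(1)
print(sum(mu[n] * mu[n + 1] for n in range(1, X + 1)) / X)   # (3): should be o(1)
print(sum(mu[n] * mu[n + 1] / n for n in range(1, X + 1))
      / math.log(X))                                         # the log average: o(1)
```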

How would one try to make these probabilistic intuitions more rigorous? The first thing one needs to do is find a more quantitative measurement of what it means for two random variables to be “approximately” independent. There are several candidates for such measurements, but we will focus in these notes on two particularly convenient measures of approximate independence: the “{L^2}” measure of independence known as covariance, and the “{L \log L}” measure of independence known as mutual information (actually we will usually need the more general notion of conditional mutual information that measures conditional independence). The use of {L^2} type methods in analytic number theory is well established, though it is usually not described in probabilistic terms, being referred to instead by such names as the “second moment method”, the “large sieve” or the “method of bilinear sums”. The use of {L \log L} methods (or “entropy methods”) is much more recent, and has been able to control certain types of averages in analytic number theory that were out of reach of previous methods such as {L^2} methods. For instance, in later notes we will use entropy methods to establish the logarithmically averaged version

\displaystyle  \sum_{n \leq x} \frac{\mu(n) \mu(n+1)}{n} = o(\log x) \ \ \ \ \ (4)

of (3), which is implied by (3) but strictly weaker (much as the prime number theorem (1) implies the bound {\sum_{n \leq x} \frac{\mu(n)}{n} = o(\log x)}, but the latter bound is much easier to establish than the former).

As with many other situations in analytic number theory, we can exploit the fact that certain assertions (such as approximate independence) can become significantly easier to prove if one only seeks to establish them on average, rather than uniformly. For instance, given two random variables {\mathbf{X}} and {\mathbf{Y}} of number-theoretic origin (such as the random variables {\mu(\mathbf{n})} and {\mu(\mathbf{n}+1)} mentioned previously), it can often be extremely difficult to determine the extent to which {\mathbf{X},\mathbf{Y}} behave “independently” (or “conditionally independently”). However, thanks to second moment tools or entropy based tools, it is often possible to assert results of the following flavour: if {\mathbf{Y}_1,\dots,\mathbf{Y}_k} are a large collection of “independent” random variables, and {\mathbf{X}} is a further random variable that is “not too large” in some sense, then {\mathbf{X}} must necessarily be nearly independent (or conditionally independent) of many of the {\mathbf{Y}_i}, even if one cannot pinpoint precisely which of the {\mathbf{Y}_i} the variable {\mathbf{X}} is independent of. In the case of the second moment method, this allows us to compute correlations such as {{\mathbb E} {\mathbf X} \mathbf{Y}_i} for “most” {i}. The entropy method gives bounds that are significantly weaker quantitatively than the second moment method (and in particular, in its current incarnation at least it is only able to establish non-trivial assertions involving interactions with residue classes at small primes), but can control significantly more general quantities {{\mathbb E} F( {\mathbf X}, \mathbf{Y}_i )} for “most” {i} thanks to tools such as the Pinsker inequality.


I’ve just uploaded to the arXiv my paper “Almost all Collatz orbits attain almost bounded values“, submitted to the proceedings of the Forum of Mathematics, Pi. In this paper I returned to the topic of the notorious Collatz conjecture (also known as the {3x+1} conjecture), which I previously discussed in this blog post. This conjecture can be phrased as follows. Let {{\bf N}+1 = \{1,2,\dots\}} denote the positive integers (with {{\bf N} =\{0,1,2,\dots\}} the natural numbers), and let {\mathrm{Col}: {\bf N}+1 \rightarrow {\bf N}+1} be the map defined by setting {\mathrm{Col}(N)} equal to {3N+1} when {N} is odd and {N/2} when {N} is even. Let {\mathrm{Col}_{\min}(N) := \inf_{n \in {\bf N}} \mathrm{Col}^n(N)} be the minimal element of the Collatz orbit {N, \mathrm{Col}(N), \mathrm{Col}^2(N),\dots}. Then we have

Conjecture 1 (Collatz conjecture) One has {\mathrm{Col}_{\min}(N)=1} for all {N \in {\bf N}+1}.
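
For concreteness, here is a direct implementation of the Collatz map and of {\mathrm{Col}_{\min}}, together with a brute-force verification of the conjecture in a small range (a finite check of this sort is of course no evidence one way or the other):

```python
def collatz(n):
    # Col(N) = 3N+1 for odd N, N/2 for even N.
    return 3 * n + 1 if n % 2 else n // 2

def collatz_min(n):
    # Col_min(N): the minimal element of the Collatz orbit of N.
    # (This loop terminates whenever the orbit eventually reaches 1,
    # which is the case for every N in the range tested below.)
    least = n
    while least > 1:
        n = collatz(n)
        least = min(least, n)
    return least

assert all(collatz_min(n) == 1 for n in range(1, 10**4))
print("Col_min(N) = 1 for all N < 10^4")
```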

Establishing the conjecture for all {N} remains out of reach of current techniques (for instance, as discussed in the previous blog post, it is basically at least as difficult as Baker’s theorem, all known proofs of which are quite difficult). However, the situation is more promising if one is willing to settle for results which only hold for “most” {N} in some sense. For instance, it is a result of Krasikov and Lagarias that

\displaystyle  \{ N \leq x: \mathrm{Col}_{\min}(N) = 1 \} \gg x^{0.84}

for all sufficiently large {x}. In another direction, it was shown by Terras that for almost all {N} (in the sense of natural density), one has {\mathrm{Col}_{\min}(N) < N}. This was then improved by Allouche to {\mathrm{Col}_{\min}(N) < N^\theta} for almost all {N} and any fixed {\theta > 0.869}, and extended later by Korec to cover all {\theta > \frac{\log 3}{\log 4} \approx 0.7924}. In this paper we obtain the following further improvement (at the cost of weakening natural density to logarithmic density):

Theorem 2 Let {f: {\bf N}+1 \rightarrow {\bf R}} be any function with {\lim_{N \rightarrow \infty} f(N) = +\infty}. Then we have {\mathrm{Col}_{\min}(N) < f(N)} for almost all {N} (in the sense of logarithmic density).

Thus for instance one has {\mathrm{Col}_{\min}(N) < \log\log\log\log N} for almost all {N} (in the sense of logarithmic density).

The difficulty here is that one usually only expects to establish “local-in-time” results that control the evolution {\mathrm{Col}^n(N)} for times {n} that only get as large as a small multiple {c \log N} of {\log N}; the aforementioned results of Terras, Allouche, and Korec, for instance, are of this type. However, to get {\mathrm{Col}^n(N)} all the way down to {f(N)} one needs something more like an “(almost) global-in-time” result, where the evolution remains under control for so long that the orbit has nearly reached the bounded state {N=O(1)}.

However, as observed by Bourgain in the context of nonlinear Schrödinger equations, one can iterate “almost sure local wellposedness” type results (which give local control for almost all initial data from a given distribution) into “almost sure (almost) global wellposedness” type results if one is fortunate enough to draw one’s data from an invariant measure for the dynamics. To illustrate the idea, let us take Korec’s aforementioned result that if {\theta > \frac{\log 3}{\log 4}} one picks at random an integer {N} from a large interval {[1,x]}, then in most cases, the orbit of {N} will eventually move into the interval {[1,x^{\theta}]}. Similarly, if one picks an integer {M} at random from {[1,x^\theta]}, then in most cases, the orbit of {M} will eventually move into {[1,x^{\theta^2}]}. It is then tempting to concatenate the two statements and conclude that for most {N} in {[1,x]}, the orbit will eventually move into {[1,x^{\theta^2}]}. Unfortunately, this argument does not quite work, because by the time the orbit from a randomly drawn {N \in [1,x]} reaches {[1,x^\theta]}, the distribution of the final value is unlikely to be close to being uniformly distributed on {[1,x^\theta]}, and in particular could potentially concentrate almost entirely in the exceptional set of {M \in [1,x^\theta]} that do not make it into {[1,x^{\theta^2}]}. The point here is that the uniform measure on {[1,x]} is not transported by Collatz dynamics to anything resembling the uniform measure on {[1,x^\theta]}.

So, one now needs to locate a measure which has better invariance properties under the Collatz dynamics. It turns out to be technically convenient to work with a standard acceleration of the Collatz map known as the Syracuse map {\mathrm{Syr}: 2{\bf N}+1 \rightarrow 2{\bf N}+1}, defined on the odd numbers {2{\bf N}+1 = \{1,3,5,\dots\}} by setting {\mathrm{Syr}(N) = (3N+1)/2^a}, where {2^a} is the largest power of {2} that divides {3N+1}. (The advantage of using the Syracuse map over the Collatz map is that it performs precisely one multiplication of {3} at each iteration step, which makes the map better behaved when performing “{3}-adic” analysis.)

When viewed {3}-adically, we soon see that iterations of the Syracuse map become somewhat irregular. Most obviously, {\mathrm{Syr}(N)} is never divisible by {3}. A little less obviously, {\mathrm{Syr}(N)} is twice as likely to equal {2} mod {3} as it is to equal {1} mod {3}. This is because for a randomly chosen odd {\mathbf{N}}, the number of times {\mathbf{a}} that {2} divides {3\mathbf{N}+1} can be seen to have a geometric distribution of mean {2} – it equals any given value {a \in{\bf N}+1} with probability {2^{-a}}. Such a geometric random variable is twice as likely to be odd as to be even, which is what gives the above irregularity. There are similar irregularities modulo higher powers of {3}. For instance, one can compute that for large random odd {\mathbf{N}}, {\mathrm{Syr}^2(\mathbf{N}) \hbox{ mod } 9} will take the residue classes {0,1,2,3,4,5,6,7,8 \hbox{ mod } 9} with probabilities

\displaystyle  0, \frac{8}{63}, \frac{16}{63}, 0, \frac{11}{63}, \frac{4}{63}, 0, \frac{2}{63}, \frac{22}{63}

respectively. More generally, for any {n}, {\mathrm{Syr}^n(N) \hbox{ mod } 3^n} will be distributed according to the law of a random variable {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} on {{\bf Z}/3^n{\bf Z}} that we call a Syracuse random variable, and can be described explicitly as

\displaystyle  \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) = 2^{-\mathbf{a}_1} + 3^1 2^{-\mathbf{a}_1-\mathbf{a}_2} + \dots + 3^{n-1} 2^{-\mathbf{a}_1-\dots-\mathbf{a}_n} \hbox{ mod } 3^n, \ \ \ \ \ (1)

where {\mathbf{a}_1,\dots,\mathbf{a}_n} are iid copies of a geometric random variable of mean {2}.
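
These predictions are easy to test empirically. The following sketch (with arbitrarily chosen sampling parameters) applies the Syracuse map twice to random large odd numbers and compares the empirical distribution of {\mathrm{Syr}^2(\mathbf{N}) \hbox{ mod } 9} against the table above:

```python
import random
from collections import Counter

def syracuse(n):
    # Syr(N) = (3N+1)/2^a, with 2^a the largest power of 2 dividing 3N+1.
    m = 3 * n + 1
    while m % 2 == 0:
        m //= 2
    return m

trials = 10**5                             # sampling parameters chosen arbitrarily
counts = Counter(syracuse(syracuse(2 * random.randrange(10**18) + 1)) % 9
                 for _ in range(trials))
expected = [0, 8, 16, 0, 11, 4, 0, 2, 22]  # the probabilities quoted above, times 63
for r in range(9):
    print(r, round(counts[r] / trials, 3), round(expected[r] / 63, 3))
```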

In view of this, any proposed “invariant” (or approximately invariant) measure (or family of measures) for the Syracuse dynamics should take this {3}-adic irregularity of distribution into account. It turns out that one can use the Syracuse random variables {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} to construct such a measure, but only if these random variables stabilise in the limit {n \rightarrow \infty} in a certain total variation sense. More precisely, in the paper we establish the estimate

\displaystyle  \sum_{Y \in {\bf Z}/3^n{\bf Z}} | \mathbb{P}( \mathbf{Syrac}({\bf Z}/3^n{\bf Z})=Y) - 3^{m-n} \mathbb{P}( \mathbf{Syrac}({\bf Z}/3^m{\bf Z})=Y \hbox{ mod } 3^m)| \ \ \ \ \ (2)

\displaystyle  \ll_A m^{-A}

for any {1 \leq m \leq n} and any {A > 0}. This type of stabilisation is plausible from entropy heuristics – the tuple {(\mathbf{a}_1,\dots,\mathbf{a}_n)} of geometric random variables that generates {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} has Shannon entropy {n \log 4}, which is significantly larger than the total entropy {n \log 3} of the uniform distribution on {{\bf Z}/3^n{\bf Z}}, so we expect a lot of “mixing” and “collision” to occur when converting the tuple {(\mathbf{a}_1,\dots,\mathbf{a}_n)} to {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}; these heuristics can be supported by numerics (which I was able to work out up to about {n=10} before running into memory and CPU issues), but it turns out to be surprisingly delicate to make this precise.
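
For the reader who wants to reproduce a toy version of these numerics, the following sketch computes the law of {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})} exactly from (1) for small {n} (truncating each geometric variable at {a \leq 50}, which discards only a negligible tail of mass), and then evaluates the left-hand side of (2) directly; the function names are hypothetical, and at such tiny values of {n} this only illustrates, rather than supports, the asymptotic claim:

```python
def syracuse_law(n, amax=50):
    # Law of Syrac(Z/3^n Z) via (1), truncating each geometric variable at
    # amax (the neglected tail has total mass at most n * 2^{-amax}).
    mod = 3 ** n
    inv2 = pow(2, -1, mod)                 # 2^{-1} mod 3^n (Python 3.8+)
    states = {(0, 1): 1.0}                 # (partial sum, 2^{-(a_1+...+a_i)}) -> prob
    for i in range(n):
        new = {}
        for (val, mult), prob in states.items():
            m = mult
            for a in range(1, amax + 1):
                m = m * inv2 % mod                       # multiply in another 2^{-1}
                key = ((val + pow(3, i, mod) * m) % mod, m)
                new[key] = new.get(key, 0.0) + prob * 0.5 ** a
        states = new
    dist = [0.0] * mod
    for (val, _), prob in states.items():
        dist[val] += prob
    return dist

def stabilisation_gap(n, m):
    # The left-hand side of (2).
    dn, dm = syracuse_law(n), syracuse_law(m)
    return sum(abs(dn[Y] - 3 ** (m - n) * dm[Y % 3 ** m]) for Y in range(3 ** n))

print([round(p * 63, 3) for p in syracuse_law(2)])   # recovers 0, 8, 16, 0, 11, 4, 0, 2, 22
for m in (1, 2, 3):
    print(m, stabilisation_gap(4, m))
```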

A first hint of how to proceed comes from the elementary number theory observation (easily proven by induction) that the rational numbers

\displaystyle  2^{-a_1} + 3^1 2^{-a_1-a_2} + \dots + 3^{n-1} 2^{-a_1-\dots-a_n}

are all distinct as {(a_1,\dots,a_n)} vary over tuples in {({\bf N}+1)^n}. Unfortunately, the process of reducing mod {3^n} creates a lot of collisions (as must happen from the pigeonhole principle); however, by a simple “Lefschetz principle” type argument one can at least show that the reductions

\displaystyle  2^{-a_1} + 3^1 2^{-a_1-a_2} + \dots + 3^{m-1} 2^{-a_1-\dots-a_m} \hbox{ mod } 3^n \ \ \ \ \ (3)

are mostly distinct for “typical” {a_1,\dots,a_m} (as drawn using the geometric distribution) as long as {m} is a bit smaller than {\frac{\log 3}{\log 4} n} (basically because the rational number appearing in (3) then typically takes a form like {M/2^{2m}} with {M} an integer between {0} and {3^n}). This analysis of the component (3) of (1) is already enough to get quite a bit of spreading on { \mathbf{Syrac}({\bf Z}/3^n{\bf Z})} (roughly speaking, when the argument is optimised, it shows that this random variable cannot concentrate in any subset of {{\bf Z}/3^n{\bf Z}} of density less than {n^{-C}} for some large absolute constant {C>0}). To get from this to a stabilisation property (2) we have to exploit the mixing effects of the remaining portion of (1) that does not come from (3). After some standard Fourier-analytic manipulations, matters then boil down to obtaining non-trivial decay of the characteristic function of {\mathbf{Syrac}({\bf Z}/3^n{\bf Z})}, and more precisely in showing that

\displaystyle  \mathbb{E} e^{-2\pi i \xi \mathbf{Syrac}({\bf Z}/3^n{\bf Z}) / 3^n} \ll_A n^{-A} \ \ \ \ \ (4)

for any {A > 0} and any {\xi \in {\bf Z}/3^n{\bf Z}} that is not divisible by {3}.
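
Both parts of the elementary observation above are easy to see numerically in a tiny range: the following sketch (parameters chosen arbitrarily) uses exact rational arithmetic to confirm that the rationals are pairwise distinct, while their reductions mod {3^n} collide heavily, as the pigeonhole principle demands:

```python
from fractions import Fraction
from itertools import product

n, amax = 3, 5                             # a tiny range, chosen arbitrarily
mod = 3 ** n
inv2 = pow(2, -1, mod)
rationals, residues = set(), set()
for a in product(range(1, amax + 1), repeat=n):
    s, r, e = Fraction(0), 0, 0
    for i in range(n):
        e += a[i]
        s += Fraction(3 ** i, 2 ** e)      # the term 3^i * 2^{-(a_1+...+a_{i+1})}
        r = (r + pow(3, i, mod) * pow(inv2, e, mod)) % mod
    rationals.add(s)
    residues.add(r)
# 125 tuples give 125 distinct rationals, but at most 27 residues mod 27.
print(amax ** n, len(rationals), len(residues))
```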

If the random variable (1) were the sum of independent terms, one could express this characteristic function as something like a Riesz product, which would be straightforward to estimate well. Unfortunately, the terms in (1) are loosely coupled together, and so the characteristic function does not immediately factor into a Riesz product. However, if one groups adjacent terms in (1) together, one can rewrite it (assuming {n} is even for sake of discussion) as

\displaystyle  (2^{\mathbf{a}_2} + 3) 2^{-\mathbf{b}_1} + (2^{\mathbf{a}_4}+3) 3^2 2^{-\mathbf{b}_1-\mathbf{b}_2} + \dots

\displaystyle  + (2^{\mathbf{a}_n}+3) 3^{n-2} 2^{-\mathbf{b}_1-\dots-\mathbf{b}_{n/2}} \hbox{ mod } 3^n

where {\mathbf{b}_j := \mathbf{a}_{2j-1} + \mathbf{a}_{2j}}. The point here is that after conditioning on the {\mathbf{b}_1,\dots,\mathbf{b}_{n/2}} to be fixed, the random variables {\mathbf{a}_2, \mathbf{a}_4,\dots,\mathbf{a}_n} remain independent (though the distribution of each {\mathbf{a}_{2j}} depends on the value that we conditioned {\mathbf{b}_j} to), and so the above expression is a conditional sum of independent random variables. This lets one express the characteristic function of (1) as an averaged Riesz product. One can use this to establish the bound (4) as long as one can show that the expression

\displaystyle  \frac{\xi 3^{2j-2} (2^{-\mathbf{b}_1-\dots-\mathbf{b}_j+1} \mod 3^n)}{3^n}

is not close to an integer for a moderately large number ({\gg A \log n}, to be precise) of indices {j = 1,\dots,n/2}. (Actually, for technical reasons we have to also restrict to those {j} for which {\mathbf{b}_j=3}, but let us ignore this detail here.) To put it another way, if we let {B} denote the set of pairs {(j,l)} for which

\displaystyle  \frac{\xi 3^{2j-2} (2^{-l+1} \mod 3^n)}{3^n} \in [-\varepsilon,\varepsilon] + {\bf Z},

we have to show that (with overwhelming probability) the random walk

\displaystyle (1,\mathbf{b}_1), (2, \mathbf{b}_1 + \mathbf{b}_2), \dots, (n/2, \mathbf{b}_1+\dots+\mathbf{b}_{n/2})

(which we view as a two-dimensional renewal process) contains at least a few points lying outside of {B}.

A little bit of elementary number theory and combinatorics allows one to describe the set {B} as the union of “triangles” with a certain non-zero separation between them. If the triangles were all fairly small, then one expects the renewal process to visit at least one point outside of {B} after passing through any given such triangle, and it then becomes relatively easy to show that the renewal process usually has the required number of points outside of {B}. The most difficult case is when the renewal process passes through a particularly large triangle in {B}. However, it turns out that large triangles enjoy particularly good separation properties, and in particular after passing through a large triangle one is likely to encounter nothing but small triangles for a while. After making these heuristics more precise, one is finally able to get enough points on the renewal process outside of {B} that one can finish the proof of (4), and thus Theorem 2.

In the fall quarter (starting Sep 27) I will be teaching a graduate course on analytic prime number theory.  This will be similar to a graduate course I taught in 2015, and in particular will reuse several of the lecture notes from that course, though it will also incorporate some new material (and omit some material covered in the previous course, to compensate).  I anticipate covering the following topics:

  1. Elementary multiplicative number theory
  2. Complex-analytic multiplicative number theory
  3. The entropy decrement argument
  4. Bounds for exponential sums
  5. Zero density theorems
  6. Halasz’s theorem and the Matomaki-Radziwill theorem
  7. The circle method
  8. (If time permits) Chowla’s conjecture and the Erdos discrepancy problem

Lecture notes for topics 3, 6, and 8 will be forthcoming.


William Banks, Kevin Ford, and I have just uploaded to the arXiv our paper “Large prime gaps and probabilistic models“. In this paper we introduce a random model to help understand the connection between two well known conjectures regarding the primes {{\mathcal P} := \{2,3,5,\dots\}}, the Cramér conjecture and the Hardy-Littlewood conjecture:

Conjecture 1 (Cramér conjecture) If {x} is a large number, then the largest prime gap {G_{\mathcal P}(x) := \sup_{p_n, p_{n+1} \leq x} p_{n+1}-p_n} in {[1,x]} is of size {\asymp \log^2 x}. (Granville refines this conjecture to {\gtrsim \xi \log^2 x}, where {\xi := 2e^{-\gamma} = 1.1229\dots}. Here we use the asymptotic notation {X \gtrsim Y} for {X \geq (1-o(1)) Y}, {X \sim Y} for {X \gtrsim Y \gtrsim X}, {X \gg Y} for {X \geq C^{-1} Y}, and {X \asymp Y} for {X \gg Y \gg X}.)

Conjecture 2 (Hardy-Littlewood conjecture) If {\mathcal{H} := \{h_1,\dots,h_k\}} are fixed distinct integers, then the number of numbers {n \in [1,x]} with {n+h_1,\dots,n+h_k} all prime is {({\mathfrak S}(\mathcal{H}) +o(1)) \int_2^x \frac{dt}{\log^k t}} as {x \rightarrow \infty}, where the singular series {{\mathfrak S}(\mathcal{H})} is defined by the formula

\displaystyle {\mathfrak S}(\mathcal{H}) := \prod_p \left( 1 - \frac{|{\mathcal H} \hbox{ mod } p|}{p}\right) (1-\frac{1}{p})^{-k}.

(One can view these conjectures as modern versions of two of the classical Landau problems, namely Legendre’s conjecture and the twin prime conjecture respectively.)
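
To get a quantitative feel for the singular series, here is a small numerical sketch (all parameters arbitrary) for the twin tuple {{\mathcal H} = \{0,2\}}: it evaluates a truncation of the Euler product defining {{\mathfrak S}(\mathcal{H})} (which should come out near {1.3203\dots}, twice the twin prime constant) and compares the resulting Hardy-Littlewood prediction with an actual count of twin primes:

```python
import math

X = 10**6
is_prime = bytearray([1]) * (X + 3)        # simple Eratosthenes sieve up to X+2
is_prime[0:2] = b"\x00\x00"
for i in range(2, int((X + 2) ** 0.5) + 1):
    if is_prime[i]:
        is_prime[i * i::i] = bytearray(len(range(i * i, X + 3, i)))

# Truncated singular series for H = {0,2}: |H mod 2| = 1, |H mod p| = 2 for p > 2.
S = 1.0
for p in range(2, X + 1):
    if is_prime[p]:
        nu = 1 if p == 2 else 2
        S *= (1 - nu / p) / (1 - 1 / p) ** 2
print(S)                                   # ~ 1.3203..., twice the twin prime constant

# Hardy-Littlewood prediction versus the actual twin prime count up to X.
twins = sum(1 for n in range(2, X + 1) if is_prime[n] and is_prime[n + 2])
M = 10**5                                  # midpoint rule for int_2^X dt / log^2 t
h = (X - 2) / M
integral = sum(h / math.log(2 + (j + 0.5) * h) ** 2 for j in range(M))
print(twins, S * integral)                 # the two counts should be comparable
```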

A well known connection between the Hardy-Littlewood conjecture and prime gaps was made by Gallagher. Among other things, Gallagher showed that if the Hardy-Littlewood conjecture was true, then the prime gaps {p_{n+1}-p_n} with {n \leq x} were asymptotically distributed according to an exponential distribution of mean {\log x}, in the sense that

\displaystyle | \{ n: p_n \leq x, p_{n+1}-p_n \geq \lambda \log x \}| = (e^{-\lambda}+o(1)) \frac{x}{\log x} \ \ \ \ \ (1)


as {x \rightarrow \infty} for any fixed {\lambda \geq 0}. Roughly speaking, the way this is established is by using the Hardy-Littlewood conjecture to control the mean values of {\binom{|{\mathcal P} \cap (p_n, p_n + \lambda \log x)|}{k}} for fixed {k,\lambda}, where {p_n} ranges over the primes in {[1,x]}. The relevance of these quantities arises from the Bonferroni inequalities (or “Brun pure sieve“), which can be formulated as the assertion that

\displaystyle 1_{N=0} \leq \sum_{k=0}^K (-1)^k \binom{N}{k}

when {K} is even and

\displaystyle 1_{N=0} \geq \sum_{k=0}^K (-1)^k \binom{N}{k}

when {K} is odd, for any natural number {N}; setting {N := |{\mathcal P} \cap (p_n, p_n + \lambda \log x)|} and taking means, one then gets upper and lower bounds for the probability that the interval {(p_n, p_n + \lambda \log x)} is free of primes. The most difficult step is to control the mean values of the singular series {{\mathfrak S}(\mathcal{H})} as {{\mathcal H}} ranges over {k}-tuples in a fixed interval such as {[0, \lambda \log x]}.
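
The Bonferroni inequalities themselves can be checked by brute force in a small range (they also follow quickly from the identity {\sum_{k=0}^K (-1)^k \binom{N}{k} = (-1)^K \binom{N-1}{K}} for {N \geq 1}); a quick sketch:

```python
from math import comb

# Brute-force check of the Bonferroni inequalities for small N and K:
# the alternating partial sums alternately over- and under-estimate 1_{N=0}.
for N in range(12):
    indicator = 1 if N == 0 else 0
    for K in range(12):
        partial = sum((-1) ** k * comb(N, k) for k in range(K + 1))
        if K % 2 == 0:
            assert indicator <= partial
        else:
            assert indicator >= partial
print("Bonferroni inequalities hold for all N, K < 12")
```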

Heuristically, if one extrapolates the asymptotic (1) to the regime {\lambda \asymp \log x}, one is then led to Cramér’s conjecture, since the right-hand side of (1) falls below {1} when {\lambda} is significantly larger than {\log x}. However, this is not a rigorous derivation of Cramér’s conjecture from the Hardy-Littlewood conjecture, since Gallagher’s computations only establish (1) for fixed choices of {\lambda}, which is only enough to establish the far weaker bound {G_{\mathcal P}(x) / \log x \rightarrow \infty}, which was already known (see this previous paper for a discussion of the best known unconditional lower bounds on {G_{\mathcal P}(x)}). An inspection of the argument shows that if one wished to extend (1) to parameter choices {\lambda} that were allowed to grow with {x}, then one would need as input a stronger version of the Hardy-Littlewood conjecture in which the length {k} of the tuple {{\mathcal H} = (h_1,\dots,h_k)}, as well as the magnitudes of the shifts {h_1,\dots,h_k}, were also allowed to grow with {x}. Our initial objective in this project was then to quantify exactly what strengthening of the Hardy-Littlewood conjecture would be needed to rigorously imply Cramér’s conjecture. The precise results are technical, but roughly we show results of the following form:

Theorem 3 (Large gaps from Hardy-Littlewood, rough statement)

  • If the Hardy-Littlewood conjecture is uniformly true for {k}-tuples of length {k \ll \frac{\log x}{\log\log x}}, and with shifts {h_1,\dots,h_k} of size {O( \log^2 x )}, with a power savings in the error term, then {G_{\mathcal P}(x) \gg \frac{\log^2 x}{\log\log x}}.
  • If the Hardy-Littlewood conjecture is “true on average” for {k}-tuples of length {k \ll \frac{y}{\log x}} and shifts {h_1,\dots,h_k} of size {y} for all {\log x \leq y \leq \log^2 x \log\log x}, with a power savings in the error term, then {G_{\mathcal P}(x) \gg \log^2 x}.

In particular, we can recover Cramér’s conjecture given a sufficiently powerful version of the Hardy-Littlewood conjecture “on the average”.

Our proof of this theorem proceeds more or less along the same lines as Gallagher’s calculation, but now with {k} allowed to grow slowly with {x}. Again, the main difficulty is to accurately estimate average values of the singular series {{\mathfrak S}({\mathcal H})}. Here we found it useful to switch to a probabilistic interpretation of this series. For technical reasons it is convenient to work with a truncated, unnormalised version

\displaystyle V_{\mathcal H}(z) := \prod_{p \leq z} \left( 1 - \frac{|{\mathcal H} \hbox{ mod } p|}{p} \right)

of the singular series, for a suitable cutoff {z}; it turns out that when studying prime tuples of size {t}, the most convenient cutoff {z(t)} is the “Pólya magic cutoff“, defined as the largest prime for which

\displaystyle \prod_{p \leq z(t)}(1-\frac{1}{p}) \geq \frac{1}{\log t} \ \ \ \ \ (2)


(this is well defined for {t \geq e^2}); by Mertens’ theorem, we have {z(t) \sim t^{1/e^\gamma}}. One can interpret {V_{\mathcal H}(z)} probabilistically as

\displaystyle V_{\mathcal H}(z) = \mathbf{P}( {\mathcal H} \subset \mathcal{S}_z )

where {\mathcal{S}_z \subset {\bf Z}} is the randomly sifted set of integers formed by removing one residue class {a_p \hbox{ mod } p} uniformly at random for each prime {p \leq z}. The Hardy-Littlewood conjecture can be viewed as an assertion that the primes {{\mathcal P}} behave in some approximate statistical sense like the random sifted set {\mathcal{S}_z}, and one can prove the above theorem by using the Bonferroni inequalities both for the primes {{\mathcal P}} and for the random sifted set, and comparing the two (using an even {K} for the sifted set and an odd {K} for the primes in order to be able to combine the two together to get a useful bound).
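
The following rough sketch (function names hypothetical, parameters arbitrary) computes the Pólya magic cutoff directly from (2), compares it against the Mertens prediction {z(t) \sim t^{1/e^\gamma}}, and then estimates {\mathbf{P}( {\mathcal H} \subset \mathcal{S}_z )} for the twin tuple by simulating the random sifted set, to be compared with the product formula for {V_{\mathcal H}(z)}:

```python
import math, random

def primes_up_to(n):
    s = bytearray([1]) * (n + 1)
    s[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if s[i]:
            s[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
    return [p for p in range(2, n + 1) if s[p]]

def magic_cutoff(t):
    # The largest prime z with prod_{p <= z} (1 - 1/p) >= 1/log t, as in (2).
    prod, z = 1.0, None
    for p in primes_up_to(10**5):
        prod *= 1 - 1 / p
        if prod < 1 / math.log(t):
            return z
        z = p
    raise ValueError("prime table too small")

t = 10**4
z = magic_cutoff(t)
gamma = 0.5772156649015329
print(z, t ** (1 / math.exp(gamma)))       # cutoff versus the Mertens prediction

# P(H subset S_z) versus V_H(z) for the twin tuple H = {0, 2}.
H, ps = (0, 2), primes_up_to(z)
V = math.prod(1 - len({h % p for h in H}) / p for p in ps)
forbidden = [{h % p for h in H} for p in ps]

def twin_survives():
    # Delete one uniformly random residue class mod p for each p <= z;
    # return whether every element of H survives the sifting.
    return all(random.randrange(p) not in f for p, f in zip(ps, forbidden))

trials = 10**5
print(sum(twin_survives() for _ in range(trials)) / trials, V)
```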

The proof of Theorem 3 ended up not using any properties of the set of primes {{\mathcal P}} other than that this set obeyed some form of the Hardy-Littlewood conjectures; the theorem remains true (with suitable notational changes) if this set were replaced by any other set. In order to convince ourselves that our theorem was not vacuous due to our version of the Hardy-Littlewood conjecture being too strong to be true, we then started exploring the question of coming up with random models of {{\mathcal P}} which obeyed various versions of the Hardy-Littlewood and Cramér conjectures.

This line of inquiry was started by Cramér, who introduced what we now call the Cramér random model {{\mathcal C}} of the primes, in which each natural number {n \geq 3} is selected for membership in {{\mathcal C}} with an independent probability of {1/\log n}. This model matches the primes well in some respects; for instance, it almost surely obeys the “Riemann hypothesis”

\displaystyle | {\mathcal C} \cap [1,x] | = \int_2^x \frac{dt}{\log t} + O( x^{1/2+o(1)})

and Cramér also showed that the largest gap {G_{\mathcal C}(x)} was almost surely {\sim \log^2 x}. On the other hand, it does not obey the Hardy-Littlewood conjecture; more precisely, it obeys a simplified variant of that conjecture in which the singular series {{\mathfrak S}({\mathcal H})} is absent.
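
The Cramér model is trivial to simulate, and the assertions just made can be observed, at least roughly, in a single random draw at modest scales (the asymptotic {\sim \log^2 x} for the largest gap is attained only very slowly, so one should expect only ballpark agreement here):

```python
import math, random

x = 10**6
# One draw of the Cramer model: each n >= 3 joins with probability 1/log n.
C = [n for n in range(3, x + 1) if random.random() < 1 / math.log(n)]

M = 10**5                                  # midpoint rule for int_2^x dt / log t
h = (x - 2) / M
li = sum(h / math.log(2 + (j + 0.5) * h) for j in range(M))
print(len(C), li)                          # count versus the "Riemann hypothesis" prediction

gaps = [b - a for a, b in zip(C, C[1:])]
print(max(gaps), math.log(x) ** 2)         # largest gap versus log^2 x (rough at this scale)
lam = 1.0                                  # exponential law: P(gap >= lam * log x) ~ e^{-lam}
print(sum(g >= lam * math.log(x) for g in gaps) / len(gaps), math.exp(-lam))
```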

Granville proposed a refinement {{\mathcal G}} to Cramér’s random model {{\mathcal C}} in which one first sieves out (in each dyadic interval {[x,2x]}) all residue classes {0 \hbox{ mod } p} for {p \leq A} for a certain threshold {A = \log^{1-o(1)} x = o(\log x)}, and then places each surviving natural number {n} in {{\mathcal G}} with an independent probability {\frac{1}{\log n} \prod_{p \leq A} (1-\frac{1}{p})^{-1}}. One can verify that this model obeys the Hardy-Littlewood conjectures, and Granville showed that the largest gap {G_{\mathcal G}(x)} in this model was almost surely {\gtrsim \xi \log^2 x}, leading to his conjecture that this bound also was true for the primes. (Interestingly, this conjecture is not yet borne out by numerics; calculations of prime gaps up to {10^{18}}, for instance, have shown that {G_{\mathcal P}(x)} never exceeds {0.9206 \log^2 x} in this range. This is not necessarily a conflict, however; Granville’s analysis relies on inspecting gaps in an extremely sparse region of natural numbers that are more devoid of primes than average, and this region is not well explored by existing numerics. See this previous blog post for more discussion of Granville’s argument.)

However, Granville’s model does not produce a power savings in the error term of the Hardy-Littlewood conjectures, mostly due to the need to truncate the singular series at the logarithmic cutoff {A}. After some experimentation, we were able to produce a tractable random model {{\mathcal R}} for the primes which obeyed the Hardy-Littlewood conjectures with power savings, and which reproduced Granville’s gap prediction of {\gtrsim \xi \log^2 x} (we also get an upper bound of {\lesssim \xi \log^2 x \frac{\log\log x}{2 \log\log\log x}} for both models, though we expect the lower bound to be closer to the truth); to us, this strengthens the case for Granville’s version of Cramér’s conjecture. The model can be described as follows. We select one residue class {a_p \hbox{ mod } p} uniformly at random for each prime {p}, and as before we let {S_z} be the sifted set of integers formed by deleting the residue classes {a_p \hbox{ mod } p} with {p \leq z}. We then set

\displaystyle {\mathcal R} := \{ n \geq e^2: n \in S_{z(n)}\}

with {z(t)} Pólya’s magic cutoff (this is the cutoff that gives {{\mathcal R}} a density consistent with the prime number theorem or the Riemann hypothesis). As stated above, we are able to show that almost surely one has

\displaystyle \xi \log^2 x \lesssim G_{\mathcal R}(x) \lesssim \xi \log^2 x \frac{\log\log x}{2 \log\log\log x} \ \ \ \ \ (3)


and that the Hardy-Littlewood conjectures hold with power savings for {k} up to {\log^c x} for any fixed {c < 1} and for shifts {h_1,\dots,h_k} of size {O(\log^c x)}. This is unfortunately a tiny bit weaker than what Theorem 3 requires (which more or less corresponds to the endpoint {c=1}), although there is a variant of Theorem 3 that can use this input to produce a lower bound on gaps in the model {{\mathcal R}} (but it is weaker than the one in (3)). In fact we prove a more precise almost sure asymptotic formula for {G_{\mathcal R}(x)} that involves the optimal bounds for the linear sieve (or interval sieve), in which one deletes one residue class modulo {p} from an interval {[0,y]} for all primes {p} up to a given threshold. The lower bound in (3) relates to the case of deleting the {0 \hbox{ mod } p} residue classes from {[0,y]}; the upper bound comes from the delicate analysis of the linear sieve by Iwaniec. Improving on either of the two bounds looks to be quite a difficult problem.

The probabilistic analysis of {{\mathcal R}} is somewhat more complicated than that of {{\mathcal C}} or {{\mathcal G}}, as there is now non-trivial coupling between the events {n \in {\mathcal R}} as {n} varies, although moment methods such as the second moment method are still viable and allow one to verify the Hardy-Littlewood conjectures by a lengthy but fairly straightforward calculation. To analyse large gaps, one has to understand the statistical behaviour of a random linear sieve in which one starts with an interval {[0,y]} and randomly deletes a residue class {a_p \hbox{ mod } p} for each prime {p} up to a given threshold. For very small {p} this is handled by the deterministic theory of the linear sieve as discussed above. For medium sized {p}, it turns out that there is good concentration of measure thanks to tools such as Bennett’s inequality or Azuma’s inequality, as one can view the sieving process as a martingale or (approximately) as a sum of independent random variables. For larger primes {p}, in which only a small number of survivors are expected to be sieved out by each residue class, a direct combinatorial calculation of all possible outcomes (involving the random graph that connects interval elements {n \in [0,y]} to primes {p} if {n} falls in the random residue class {a_p \hbox{ mod } p}) turns out to give the best results.
