You are currently browsing the category archive for the ‘Mathematics’ category.

Kevin Ford, Sergei Konyagin, James Maynard, Carl Pomerance, and I have uploaded to the arXiv our paper “Long gaps in sieved sets“, submitted to J. Europ. Math. Soc..

This paper originated from the MSRI program in analytic number theory last year, and was centred around variants of the question of finding large gaps between primes. As discussed for instance in this previous post, it is now known that within the set of primes ${{\mathcal P} = \{2,3,5,\dots\}}$, one can find infinitely many adjacent elements ${a,b}$ whose gap ${b-a}$ obeys a lower bound of the form

$\displaystyle b-a \gg \log a \frac{\log_2 a \log_4 a}{\log_3 a}$

where ${\log_k}$ denotes the ${k}$-fold iterated logarithm. This compares with the trivial bound of ${b-a \gg \log a}$ that one can obtain from the prime number theorem and the pigeonhole principle. Several years ago, Pomerance posed the question of whether analogous improvements to the trivial bound can be obtained for such sets as

$\displaystyle {\mathcal P}_2 = \{ n \in {\bf N}: n^2+1 \hbox{ prime} \}.$

Here there is the obvious initial issue that this set is not even known to be infinite (this is the fourth Landau problem), but let us assume for the sake of discussion that this set is indeed infinite, so that we have an infinite number of gaps to speak of. Standard sieve theory techniques give upper bounds for the density of ${{\mathcal P}_2}$ that is comparable (up to an absolute constant) to the prime number theorem bounds for ${{\mathcal P}}$, so again we can obtain a trivial bound of ${b-a \gg \log a}$ for the gaps of ${{\mathcal P}_2}$. In this paper we improve this to

$\displaystyle b-a \gg \log a \log^c_2 a$

for an absolute constant ${c>0}$; this is not as strong as the corresponding bound for ${{\mathcal P}}$, but still improves over the trivial bound. In fact we can handle more general “sifted sets” than just ${{\mathcal P}_2}$. Recall from the sieve of Eratosthenes that the elements of ${{\mathcal P}}$ in, say, the interval ${[x/2, x]}$ can be obtained by removing from ${[x/2, x]}$ one residue class modulo ${p}$ for each prime up to ${\sqrt{x}}$, namely the class ${0}$ mod ${p}$. In a similar vein, the elements of ${{\mathcal P}_2}$ in ${[x/2,x]}$ can be obtained by removing for each prime ${p}$ up to ${x}$ zero, one, or two residue classes modulo ${p}$, depending on whether ${-1}$ is a quadratic residue modulo ${p}$. On the average, one residue class will be removed (this is a very basic case of the Chebotarev density theorem), so this sieving system is “one-dimensional on the average”. Roughly speaking, our arguments apply to any other set of numbers arising from a sieving system that is one-dimensional on average. (One can consider other dimensions also, but unfortunately our methods seem to give results that are worse than a trivial bound when the dimension is less than or greater than one.)

The standard “Erdős-Rankin” method for constructing long gaps between primes proceeds by trying to line up some residue classes modulo small primes ${p}$ so that they collectively occupy a long interval. A key tool in doing so are the smooth number estimates of de Bruijn and others, which among other things assert that if one removes from an interval such as ${[1,x]}$ all the residue classes ${0}$ mod ${p}$ for ${p}$ between ${x^{1/u}}$ and ${x}$ for some fixed ${u>1}$, then the set of survivors has exceptionally small density (roughly of the order of ${u^{-u}}$, with the precise density given by the Dickman function), in marked contrast to the situation in which one randomly removes one residue class for each such prime ${p}$, in which the density is more like ${1/u}$. One generally exploits this phenomenon to sieve out almost all the elements of a long interval using some of the primes available, and then using the remaining primes to cover up the remaining elements that have not already been sifted out. In the more recent work on this problem, advanced combinatorial tools such as hypergraph covering lemmas are used for the latter task.

In the case of ${{\mathcal P}_2}$, there does not appear to be any analogue of smooth numbers, in the sense that there is no obvious way to arrange the residue classes so that they have significantly fewer survivors than a random arrangement. Instead we adopt the following semi-random strategy to cover an interval ${[1,y]}$ by residue classes. Firstly, we randomly remove residue classes for primes ${p}$ up to some intermediate threshold ${z}$ (smaller than ${y}$ by a logarithmic factor), leaving behind a preliminary sifted set ${S_{[2,z]}}$. Then, for each prime ${p}$ between ${z}$ and another intermediate threshold ${x/2}$, we remove a residue class mod ${p}$ that maximises (or nearly maximises) its intersection with ${S_{[2,z]}}$. This ends up reducing the number of survivors to be significantly below what one would achieve if one selects residue classes randomly, particularly if one also uses the hypergraph covering lemma from our previous paper. Finally, we cover each the remaining survivors by a residue class from a remaining available prime.

This is the third “research” thread of the Polymath15 project to upper bound the de Bruijn-Newman constant ${\Lambda}$, continuing this previous thread. Discussion of the project of a non-research nature can continue for now in the existing proposal thread. Progress will be summarised at this Polymath wiki page.

We are making progress on the following test problem: can one show that ${H_t(x+iy) \neq 0}$ whenever ${t = 0.4}$, ${x \geq 0}$, and ${y \geq 0.4}$? This would imply that

$\displaystyle \Lambda \leq 0.4 + \frac{1}{2} (0.4)^2 = 0.48$

which would be the first quantitative improvement over the de Bruijn bound of ${\Lambda \leq 1/2}$ (or the Ki-Kim-Lee refinement of ${\Lambda < 1/2}$). Of course we can try to lower the two parameters of ${0.4}$ later on in the project, but this seems as good a place to start as any. One could also potentially try to use finer analysis of dynamics of zeroes to improve the bound ${\Lambda \leq 0.48}$ further, but this seems to be a less urgent task.

Probably the hardest case is ${y=0.4}$, as there is a good chance that one can then recover the ${y>0.4}$ case by a suitable use of the argument principle. Here we appear to have a workable Riemann-Siegel type formula that gives a tractable approximation for ${H_t}$. To describe this formula, first note that in the ${t=0}$ case we have

$\displaystyle H_0(z) = \frac{1}{8} \xi( \frac{1+iz}{2} )$

and the Riemann-Siegel formula gives

$\displaystyle \xi(s) = \frac{s(s-1)}{2} \pi^{-s/2} \Gamma(s/2) \sum_{n=1}^N \frac{1}{n^s}$

$\displaystyle + \frac{s(s-1)}{2} \pi^{-(1-s)/2} \Gamma((1-s)/2) \sum_{m=1}^M \frac{1}{m^{1-s}}$

$\displaystyle + \frac{s(s-1)}{2} \pi^{-s/2} \Gamma(s/2) \frac{e^{-i\pi s} \Gamma(1-s)}{2\pi i} \int_{C_M} \frac{w^{s-1} e^{-Nw}}{e^w-1}\ dw$

for any natural numbers ${N,M}$, where ${C_M}$ is a contour from ${+\infty}$ to ${+\infty}$ that winds once anticlockwise around the zeroes ${e^{2\pi im}, |m| \leq M}$ of ${e^w-1}$ but does not wind around any other zeroes. A good choice of ${N,M}$ to use here is

$\displaystyle N=M=\lfloor \sqrt{\mathrm{Im}(s)/2\pi}\rfloor = \lfloor \sqrt{\mathrm{Re}(z)/4\pi} \rfloor. \ \ \ \ \ (1)$

In this case, a classical steepest descent computation (see wiki) yields the approximation

$\displaystyle \int_{C_M} \frac{w^{s-1} e^{-Nw}}{e^w-1}\ dw \approx - (2\pi i M)^{s-1} \Psi( \frac{s}{2\pi i M} - N )$

where

$\displaystyle \Psi(\alpha) := 2\pi \frac{\cos \pi(\frac{1}{2}\alpha^2 - \alpha - \pi/8)}{\cos(\pi \alpha)} \exp( \frac{i\pi}{2} \alpha^2 - \frac{5\pi i}{8} ).$

Thus we have

$\displaystyle H_0(z) \approx A^{(0)} + B^{(0)} - C^{(0)}$

where

$\displaystyle A^{(0)} := \frac{1}{8} \frac{s(s-1)}{2} \pi^{-s/2} \Gamma(s/2) \sum_{n=1}^N \frac{1}{n^s}$

$\displaystyle B^{(0)} := \frac{1}{8} \frac{s(s-1)}{2} \pi^{-(1-s)/2} \Gamma((1-s)/2) \sum_{m=1}^M \frac{1}{m^{1-s}}$

$\displaystyle C^{(0)} := \frac{s(s-1)}{2} \pi^{-s/2} \Gamma(s/2) \frac{e^{-i\pi s} \Gamma(1-s)}{2\pi i} (2\pi i M)^{s-1} \Psi( \frac{s}{2\pi i M} - N )$

with ${s := \frac{1+iz}{2}}$ and ${N,M}$ given by (1).

Heuristically, we have derived (see wiki) the more general approximation

$\displaystyle H_t(z) \approx A + B - C$

for ${t>0}$ (and in particular for ${t=0.4}$), where

$\displaystyle A := \frac{1}{8} \frac{s(s-1)}{2} \pi^{-s/2} \Gamma(s/2) \sum_{n=1}^N \frac{\exp(\frac{t}{16} \log^2 \frac{s+4}{2\pi n^2} )}{n^s}$

$\displaystyle B := \frac{1}{8} \frac{s(s-1)}{2} \pi^{-(1-s)/2} \Gamma((1-s)/2) \sum_{m=1}^M \frac{\exp(\frac{t}{16} \log^2 \frac{5-s}{2\pi m^2} )}{m^{1-s}}$

$\displaystyle C := \exp(-\frac{t \pi^2}{64}) C^{(0)}.$

In practice it seems that the ${C}$ term is negligible once the real part ${x}$ of ${z}$ is moderately large, so one also has the approximation

$\displaystyle H_t(z) \approx A + B.$

For large ${x}$, and for fixed ${t,y>0}$, e.g. ${t=y=0.4}$, the sums ${A,B}$ converge fairly quickly (in fact the situation seems to be significantly better here than the much more intensively studied ${t=0}$ case), and we expect the first term

$\displaystyle B_0 := \frac{1}{8} \frac{s(s-1)}{2} \pi^{-(1-s)/2} \Gamma((1-s)/2) \exp( \frac{t}{16} \log^2 \frac{5-s}{2\pi} )$

of the ${B}$ series to dominate. Indeed, analytically we know that ${\frac{A+B-C}{B_0} \rightarrow 1}$ (or ${\frac{A+B}{B_0} \rightarrow 1}$) as ${x \rightarrow \infty}$ (holding ${y}$ fixed), and it should also be provable that ${\frac{H_t}{B_0} \rightarrow 1}$ as well. Numerically with ${t=y=0.4}$, it seems in fact that ${\frac{A+B-C}{B_0}}$ (or ${\frac{A+B}{B_0}}$) stay within a distance of about ${1/2}$ of ${1}$ once ${x}$ is moderately large (e.g. ${x \geq 2 \times 10^5}$). This raises the hope that one can solve the toy problem of showing ${H_t(x+iy) \neq 0}$ for ${t=y=0.4}$ by numerically controlling ${H_t(x+iy) / B_0}$ for small ${x}$ (e.g. ${x \leq 2 \times 10^5}$), numerically controlling ${(A+B)/B_0}$ and analytically bounding the error ${(H_t - A - B)/B_0}$ for medium ${x}$ (e.g. ${2 \times 10^5 \leq x \leq 10^7}$), and analytically bounding both ${(A+B)/B_0}$ and ${(H_t-A-B)/B_0}$ for large ${x}$ (e.g. ${x \geq 10^7}$). (These numbers ${2 \times 10^5}$ and ${10^7}$ are arbitrarily chosen here and may end up being optimised to something else as the computations become clearer.)

Thus, we now have four largely independent tasks (for suitable ranges of “small”, “medium”, and “large” ${x}$):

1. Numerically computing ${H_t(x+iy) / B_0}$ for small ${x}$ (with enough accuracy to verify that there are no zeroes)
2. Numerically computing ${(A+B)/B_0}$ for medium ${x}$ (with enough accuracy to keep it away from zero)
3. Analytically bounding ${(A+B)/B_0}$ for large ${x}$ (with enough accuracy to keep it away from zero); and
4. Analytically bounding ${(H_t - A - B)/B_0}$ for medium and large ${x}$ (with a bound that is better than the bound away from zero in the previous two tasks).

Note that tasks 2 and 3 do not directly require any further understanding of the function ${H_t}$.

Below we will give a progress report on the numeric and analytic sides of these tasks.

— 1. Numerics report (contributed by Sujit Nair) —

There is some progress on the code side but not at the pace I was hoping. Here are a few things which happened (rather, mistakes which were taken care of).

1. We got rid of code which wasn’t being used. For example, @dhjpolymath computed ${H_t}$ based on an old version but only realized it after the fact.
2. We implemented tests to catch human/numerical bugs before a computation starts. Again, we lost some numerical cycles but moving forward these can be avoided.
3. David got set up on GitHub and he is able to compare his output (in C) with the Python code. That is helping a lot.

Two areas which were worked on were

1. Computing ${H_t}$ and zeroes for ${t}$ around ${0.4}$
2. Computing quantities like ${(A+B-C)/B_0}$, ${(A+B)/B_0}$, ${C/B_0}$, etc. with the goal of understanding the zero free regions.

Some observations for ${t=0.4}$, ${y=0.4}$, ${x \in ( 10^4, 10^7)}$ include:

• ${(A+B) / B_0}$ does seem to avoid the negative real axis
• ${|(A+B) / B0| > 0.4}$ (based on the oscillations and trends in the plots)
• ${|C/B_0|}$ seems to be settling around ${10^{-4}}$ range.

See the figure below. The top plot is on the complex plane and the bottom plot is the absolute value. The code to play with this is here.

— 2. Analysis report —

The Riemann-Siegel formula and some manipulations (see wiki) give ${H_0 = A^{(0)} + B^{(0)} - \tilde C^{(0)}}$, where

$\displaystyle A^{(0)} = \frac{2}{8} \sum_{n=1}^N \int_C \exp( \frac{s+4}{2} u - e^u - \frac{s}{2} \log(\pi n^2) )\ du$

$\displaystyle - \frac{3}{8} \sum_{n=1}^N \int_C \exp( \frac{s+2}{2} u - e^u - \frac{s}{2} \log(\pi n^2) )\ du$

$\displaystyle B^{(0)} = \frac{2}{8} \sum_{m=1}^M \int_{\overline{C}} \exp( \frac{5-s}{2} u - e^u - \frac{1-s}{2} \log(\pi m^2) )\ du$

$\displaystyle - \frac{3}{8} \sum_{m=1}^M \int_C \exp( \frac{3-s}{2} u - e^u - \frac{1-s}{2} \log(\pi m^2) )\ du$

$\displaystyle \tilde C^{(0)} := -\frac{2}{8} \sum_{n=0}^\infty \frac{e^{-i\pi s/2} e^{i\pi s n}}{2^s \pi^{1/2}} \int_{\overline{C}} \int_{C_M} \frac{w^{s-1} e^{-Nw}}{e^w-1} \exp( \frac{5-s}{2} u - e^u)\ du dw$

$\displaystyle +\frac{3}{8} \sum_{n=0}^\infty \frac{e^{-i\pi s/2} e^{i\pi s n}}{2^s \pi^{1/2}} \int_{\overline{C}} \int_{C_M} \frac{w^{s-1} e^{-Nw}}{e^w-1} \exp( \frac{3-s}{2} u - e^u)\ du dw$

where ${C}$ is a contour that goes from ${+i\infty}$ to ${+\infty}$ staying a bounded distance away from the upper imaginary and right real axes, and ${\overline{C}}$ is the complex conjugate of ${C}$. (In each of these sums, it is the first term that should dominate, with the second one being about ${O(1/x)}$ as large.) One can then evolve by the heat flow to obtain ${H_t = \tilde A + \tilde B - \tilde C}$, where

$\displaystyle \tilde A := \frac{2}{8} \sum_{n=1}^N \int_C \exp( \frac{s+4}{2} u - e^u - \frac{s}{2} \log(\pi n^2) + \frac{t}{16} (u - \log(\pi n^2))^2)\ du$

$\displaystyle - \frac{3}{8} \sum_{n=1}^N \int_C \exp( \frac{s+2}{2} u - e^u - \frac{s}{2} \log(\pi n^2) + \frac{t}{16} (u - \log(\pi n^2))^2)\ du$

$\displaystyle \tilde B := \frac{2}{8} \sum_{m=1}^M \int_{\overline{C}} \exp( \frac{5-s}{2} u - e^u - \frac{1-s}{2} \log(\pi m^2) + \frac{t}{16} (u - \log(\pi m^2))^2)\ du$

$\displaystyle - \frac{3}{8} \sum_{m=1}^M \int_C \exp( \frac{3-s}{2} u - e^u - \frac{1-s}{2} \log(\pi m^2) + \frac{t}{16} (u - \log(\pi m^2))^2)\ du$

$\displaystyle \tilde C := -\frac{2}{8} \sum_{n=0}^\infty \frac{e^{-i\pi s/2} e^{i\pi s n}}{2^s \pi^{1/2}} \int_{\overline{C}} \int_{C_M}$

$\displaystyle \frac{w^{s-1} e^{-Nw}}{e^w-1} \exp( \frac{5-s}{2} u - e^u + \frac{t}{4} (i \pi(n-1/2) + \log \frac{w}{2\sqrt{\pi}} - \frac{u}{2})^2) \ du dw$

$\displaystyle +\frac{3}{8} \sum_{n=0}^\infty \frac{e^{-i\pi s/2} e^{i\pi s n}}{2^s \pi^{1/2}} \int_{\overline{C}} \int_{C_M}$

$\displaystyle \frac{w^{s-1} e^{-Nw}}{e^w-1} \exp( \frac{3-s}{2} u - e^u + \frac{t}{4} (i \pi(n-1/2) + \log \frac{w}{2\sqrt{\pi}} - \frac{u}{2})^2)\ du dw.$

Steepest descent heuristics then predict that ${\tilde A \approx A}$, ${\tilde B \approx B}$, and ${\tilde C \approx C}$. For the purposes of this project, we will need effective error estimates here, with explicit error terms.

A start has been made towards this goal at this wiki page. Firstly there is a “effective Laplace method” lemma that gives effective bounds on integrals of the form ${\int_I e^{\phi(x)} \psi(x)\ dx}$ if the real part of ${\phi(x)}$ is either monotone with large derivative, or has a critical point and is decreasing on both sides of that critical point. In principle, all one has to do is manipulate expressions such as ${\tilde A - A}$, ${\tilde B - B}$, ${\tilde C - C}$ by change of variables, contour shifting and integration by parts until it is of the form to which the above lemma can be profitably applied. As one may imagine though the computations are messy, particularly for the ${\tilde C}$ term. As a warm up, I have begun by trying to estimate integrals of the form

$\displaystyle \int_C \exp( s (1+u-e^u) + \frac{t}{16} (u+b)^2 )\ du$

for smallish complex numbers ${b}$, as these sorts of integrals appear in the form of ${\tilde A, \tilde B, \tilde C}$. As of this time of writing, there are effective bounds for the ${b=0}$ case, and I am currently working on extending them to the ${b \neq 0}$ case, which should give enough control to approximate ${\tilde A - A}$ and ${\tilde B-B}$. The most complicated task will be that of upper bounding ${\tilde C}$, but it also looks eventually doable.

This is the second “research” thread of the Polymath15 project to upper bound the de Bruijn-Newman constant ${\Lambda}$, continuing this previous thread. Discussion of the project of a non-research nature can continue for now in the existing proposal thread. Progress will be summarised at this Polymath wiki page.

We now have the following proposition (see this page for a proof sketch) that looks like it can give a numerically feasible approach to bound ${\Lambda}$:

Proposition 1 Suppose that one has parameters ${t_0, T, \varepsilon > 0}$ obeying the following properties:

• All the zeroes of ${H_0(x+iy)=0}$ with ${0 \leq x \leq T}$ are real.
• There are no zeroes ${H_t(x+iy)=0}$ with ${0 \leq t \leq t_0}$ in the region ${\{ x+iy: x \geq T; 1-2t \geq y^2 \geq \varepsilon^2 + (T-x)^2 \}}$.
• There are no zeroes ${H_{t_0}(x+iy)=0}$ with ${x > T}$ and ${y \geq \varepsilon}$.

Then one has ${\Lambda \leq t_0 + \frac{1}{2} \varepsilon^2}$.

The first hypothesis is already known for ${T}$ up to about ${10^{12}}$ (we should find out exactly what we can reach here). Preliminary calculations suggest that we can obtain the third item provided that ${t_0, \varepsilon \gg \frac{1}{\log T}}$. The second hypothesis requires good numerical calculation for ${H_t}$, to which we now turn.

The initial definition of ${H_t}$ is given by the formula

$\displaystyle H_t(z) := \int_0^\infty e^{tu^2} \Phi(u) \cos(zu)\ du$

where

$\displaystyle \Phi(u) := \sum_{n=1}^\infty (2\pi^2 n^4 e^{9u} - 3\pi n^2 e^{5u} ) \exp(-\pi n^2 e^{4u}).$

This formula has proven numerically computable to acceptable error up until about the first hundred zeroes of ${H_t}$, but degrades after that, and so other exact or approximate formulae for ${H_t}$ are needed. One possible exact formula that could be useful is

$\displaystyle H_t(z) = \frac{1}{2} (K_{t,\theta}(z) + \overline{K_{t,\theta}(\overline{z})})$

where

$\displaystyle K_{t,\theta}(z) := \sum_{n=1}^\infty (2\pi^2 n^4 I_{t,\theta}(z-9i, \pi n^2) - 3\pi n^2I_{t,\theta}(z-5i, \pi n^2))$

and

$\displaystyle I_{t,\theta}(b,\beta) := \int_{i\theta}^{i\theta+i\infty} \exp(tu^2 - \beta e^{4u} + ibu)\ du$

and ${-\pi/8 < \theta < \pi/8}$ can be chosen arbitrarily. We are still trying to see if this can be implemented numerically to give better accuracy than the previous formula.

It seems particularly promising to develop a generalisation of the Riemann-Siegel approximate functional equation for ${H_0}$. Preliminary computations suggest in particular that we have the approximation

$\displaystyle H_t(x+iy) \approx \frac{1}{4} (F_t(\frac{1+ix-y}{2}) + \overline{F_t(\frac{1+ix+y}{2})})$

where

$\displaystyle F_t(s) := \pi^{-s/2} \Gamma(\frac{s+4}{2}) \sum_{n \leq \sqrt{\mathrm{Im}(s)/2\pi}} \frac{\exp( \frac{t}{16} \log^2 \frac{s+4}{2\pi n^2})}{n^s}.$

Some very preliminary numerics suggest that this formula is reasonably accurate even for moderate values of ${x}$, though further numerical verification is needed. As a proof of concept, one could take this approximation as exact for the purposes of seeing what ranges of ${T}$ one can feasibly compute with (and for extremely large values of ${T}$, we will presumably have to introduce some version of the Odlyzko-Schönhage algorithm. Of course, to obtain a rigorous result, we will eventually need a rigorous version of this formula with explicit error bounds. It may also be necessary to add more terms to the approximation to reduce the size of the error.

Sujit Nair has kindly summarised for me the current state of affairs with the numerics as follows:

• We need a real milestone and work backward to set up intermediate goals. This will definitely help bring in focus!
• So far, we have some utilities to compute zeroes of ${H_t}$ with a nonlinear solver which uses roots of ${H_0}$ as an initial condition. The solver is a wrapper around MINPACK’s implementation of Powell’s method. There is some room for optimization. For example, we aren’t providing the solver with the analytical Jacobian which speeds up the computation and increases accuracy.
• We have some results in the output folder which contains the first 1000 roots of ${H_t}$ for some small values of ${t \in \{0.01, 0.1, 0.22\}}$, etc. They need some more organization and visualization.

We have a decent initial start but we have some ways to go. Moving forward, here is my proposition for some areas of focus. We should expand and prioritize after some open discussion.

1. Short term Optimize the existing framework and target to have the first million zeros of ${H_t}$ (for a reasonable range of ${t}$) and the corresponding plots. With better engineering practice and discipline, I am confident we can get to a few tens of millions range. Some things which will help include parallelization, iterative approaches (using zeroes of ${H_t}$ to compute zeroes of ${H_{t + \delta t}}$), etc.
2. Medium term We need to explore better ways to represent the zeros and compute them. An analogy is the computation of Riemann zeroes up to height ${T}$. It is computed by computing the sign changes of ${Z(t)}$ (page 119 of Edwards) and by exploiting the ${\sqrt T}$ speed up with the Riemann-Siegel formulation (over Euler-Maclaurin). For larger values of ${j}$, I am not sure the root solver based approach is going to work to understand the gaps between zeroes.
3. Long term We also need a better understanding of the errors involved in the computation — truncation, hardware/software, etc.

This is the first official “research” thread of the Polymath15 project to upper bound the de Bruijn-Newman constant ${\Lambda}$. Discussion of the project of a non-research nature can continue for now in the existing proposal thread. Progress will be summarised at this Polymath wiki page.

The proposal naturally splits into at least three separate (but loosely related) topics:

• Numerical computation of the entire functions ${H_t(z)}$, with the ultimate aim of establishing zero-free regions of the form ${\{ x+iy: 0 \leq x \leq T, y \geq \varepsilon \}}$ for various ${T, \varepsilon > 0}$.
• Improved understanding of the dynamics of the zeroes ${z_j(t)}$ of ${H_t}$.
• Establishing the zero-free nature of ${H_t(x+iy)}$ when ${y \geq \varepsilon > 0}$ and ${x}$ is sufficiently large depending on ${t}$ and ${\varepsilon}$.

Below the fold, I will present each of these topics in turn, to initiate further discussion in each of them. (I thought about splitting this post into three to have three separate discussions, but given the current volume of comments, I think we should be able to manage for now having all the comments in a single post. If this changes then of course we can split up some of the discussion later.)

To begin with, let me present some formulae for computing ${H_t}$ (inspired by similar computations in the Ki-Kim-Lee paper) which may be useful. The initial definition of ${H_t}$ is

$\displaystyle H_t(z) := \int_0^\infty e^{tu^2} \Phi(u) \cos(zu)\ du$

where

$\displaystyle \Phi(u) := \sum_{n=1}^\infty (2\pi^2 n^4 e^{9u} - 3 \pi n^2 e^{5u}) \exp(- \pi n^2 e^{4u} )$

is a variant of the Jacobi theta function. We observe that ${\Phi}$ in fact extends analytically to the strip

$\displaystyle \{ u \in {\bf C}: -\frac{\pi}{8} < \mathrm{Im} u < \frac{\pi}{8} \}, \ \ \ \ \ (1)$

as ${e^{4u}}$ has positive real part on this strip. One can use the Poisson summation formula to verify that ${\Phi}$ is even, ${\Phi(-u) = \Phi(u)}$ (see this previous post for details). This lets us obtain a number of other formulae for ${H_t}$. Most obviously, one can unfold the integral to obtain

$\displaystyle H_t(z) = \frac{1}{2} \int_{-\infty}^\infty e^{tu^2} \Phi(u) e^{izu}\ du.$

In my previous paper with Brad, we used this representation, combined with Fubini’s theorem to swap the sum and integral, to obtain a useful series representation for ${H_t}$ in the ${t<0}$ case. Unfortunately this is not possible in the ${t>0}$ case because expressions such as ${e^{tu^2} e^{9u} \exp( -\pi n^2 e^{4u} ) e^{izu}}$ diverge as ${u}$ approaches ${-\infty}$. Nevertheless we can still perform the following contour integration manipulation. Let ${0 \leq \theta < \frac{\pi}{8}}$ be fixed. The function ${\Phi}$ decays super-exponentially fast (much faster than ${e^{tu^2}}$, in particular) as ${\mathrm{Re} u \rightarrow +\infty}$ with ${-\infty \leq \mathrm{Im} u \leq \theta}$; as ${\Phi}$ is even, we also have this decay as ${\mathrm{Re} u \rightarrow -\infty}$ with ${-\infty \leq \mathrm{Im} u \leq \theta}$ (this is despite each of the summands in ${\Phi}$ having much slower decay in this direction – there is considerable cancellation!). Hence by the Cauchy integral formula we have

$\displaystyle H_t(z) = \frac{1}{2} \int_{i\theta-\infty}^{i\theta+\infty} e^{tu^2} \Phi(u) e^{izu}\ du.$

Splitting the horizontal line from ${i\theta-\infty}$ to ${i\theta+\infty}$ at ${i\theta}$ and using the even nature of ${\Phi(u)}$, we thus have

$\displaystyle H_t(z) = \frac{1}{2} ( \int_{i\theta}^{i\theta+\infty} e^{tu^2} \Phi(u) e^{izu}\ du + \int_{-i\theta}^{-i\theta+\infty} e^{tu^2} \Phi(u) e^{-izu}\ du.$

Using the functional equation ${\Phi(\overline{u}) = \overline{\Phi(u)}}$, we thus have the representation

$\displaystyle H_t(z) = \frac{1}{2} ( K_{t,\theta}(z) + \overline{K_{t,\theta}(\overline{z})} ) \ \ \ \ \ (2)$

where

$\displaystyle K_{t,\theta}(z) := \int_{i\theta}^{i \theta+\infty} e^{tu^2} \Phi(u) e^{izu}\ du$

$\displaystyle = \sum_{n=1}^\infty 2 \pi^2 n^4 I_{t, \theta}( z - 9i, \pi n^2 ) - 3 \pi n^2 I_{t,\theta}( z - 5i, \pi n^2 )$

where ${I_{t,\theta}(b,\beta)}$ is the oscillatory integral

$\displaystyle I_{t,\theta}(b,\beta) := \int_{i\theta}^{i\theta+\infty} \exp( tu^2 - \beta e^{4u} + i b u )\ du. \ \ \ \ \ (3)$

The formula (2) is valid for any ${0 \leq \theta < \frac{\pi}{8}}$. Naively one would think that it would be simplest to take ${\theta=0}$; however, when ${z=x+iy}$ and ${x}$ is large (with ${y}$ bounded), it seems asymptotically better to take ${\theta}$ closer to ${\pi/8}$, in particular something like ${\theta = \frac{\pi}{8} - \frac{1}{4x}}$ seems to be a reasonably good choice. This is because the integrand in (3) becomes significantly less oscillatory and also much lower in amplitude; the ${\exp(ibu)}$ term in (3) now generates a factor roughly comparable to ${\exp( - \pi x/8 )}$ (which, as we will see below, is the main term in the decay asymptotics for ${H_t(x+iy)}$), while the ${\exp( - \beta e^{4u} )}$ term still exhibits a reasonable amount of decay as ${u \rightarrow \infty}$. We will use the representation (2) in the asymptotic analysis of ${H_t}$ below, but it may also be a useful representation to use for numerical purposes.

Building on the interest expressed in the comments to this previous post, I am now formally proposing to initiate a “Polymath project” on the topic of obtaining new upper bounds on the de Bruijn-Newman constant ${\Lambda}$. The purpose of this post is to describe the proposal and discuss the scope and parameters of the project.

De Bruijn introduced a family ${H_t: {\bf C} \rightarrow {\bf C}}$ of entire functions for each real number ${t}$, defined by the formula

$\displaystyle H_t(z) := \int_0^\infty e^{tu^2} \Phi(u) \cos(zu)\ du$

where ${\Phi}$ is the super-exponentially decaying function

$\displaystyle \Phi(u) := \sum_{n=1}^\infty (2\pi^2 n^4 e^{9u} - 3 \pi n^2 e^{5u}) \exp(-\pi n^2 e^{4u}).$

As discussed in this previous post, the Riemann hypothesis is equivalent to the assertion that all the zeroes of ${H_0}$ are real.

De Bruijn and Newman showed that there existed a real constant ${\Lambda}$ – the de Bruijn-Newman constant – such that ${H_t}$ has all zeroes real whenever ${t \geq \Lambda}$, and at least one non-real zero when ${t < \Lambda}$. In particular, the Riemann hypothesis is equivalent to the upper bound ${\Lambda \leq 0}$. In the opposite direction, several lower bounds on ${\Lambda}$ have been obtained over the years, most recently in my paper with Brad Rodgers where we showed that ${\Lambda \geq 0}$, a conjecture of Newman.

As for upper bounds, de Bruijn showed back in 1950 that ${\Lambda \leq 1/2}$. The only progress since then has been the work of Ki, Kim and Lee in 2009, who improved this slightly to ${\Lambda < 1/2}$. The primary proposed aim of this Polymath project is to obtain further explicit improvements to the upper bound of ${\Lambda}$. Of course, if we could lower the upper bound all the way to zero, this would solve the Riemann hypothesis, but I do not view this as a realistic outcome of this project; rather, the upper bounds that one could plausibly obtain by known methods and numerics would be comparable in achievement to the various numerical verifications of the Riemann hypothesis that exist in the literature (e.g., that the first ${N}$ non-trivial zeroes of the zeta function lie on the critical line, for various large explicit values of ${N}$).

In addition to the primary goal, one could envisage some related secondary goals of the project, such as a better understanding (both analytic and numerical) of the functions ${H_t}$ (or of similar functions), and of the dynamics of the zeroes of these functions. Perhaps further potential goals could emerge in the discussion to this post.

I think there is a plausible plan of attack on this project that proceeds as follows. Firstly, there are results going back to the original work of de Bruijn that demonstrate that the zeroes of ${H_t}$ become attracted to the real line as ${t}$ increases; in particular, if one defines ${\sigma_{max}(t)}$ to be the supremum of the imaginary parts of all the zeroes of ${H_t}$, then it is known that this quantity obeys the differential inequality

$\displaystyle \frac{d}{dt} \sigma_{max}(t) \leq - \frac{1}{\sigma_{max}(t)} \ \ \ \ \ (1)$

whenever ${\sigma_{max}(t)}$ is positive; furthermore, once ${\sigma_{max}(t) = 0}$ for some ${t}$, then ${\sigma_{max}(t') = 0}$ for all ${t' > t}$. I hope to explain this in a future post (it is basically due to the attraction that a zero off the real axis has to its complex conjugate). As a corollary of this inequality, we have the upper bound

$\displaystyle \Lambda \leq t + \frac{1}{2} \sigma_{max}(t)^2 \ \ \ \ \ (2)$

for any real number ${t}$. For instance, because all the non-trivial zeroes of the Riemann zeta function lie in the critical strip ${\{ s: 0 \leq \mathrm{Re} s \leq 1 \}}$, one has ${\sigma_{max}(0) \leq 1}$, which when inserted into (2) gives ${\Lambda \leq 1/2}$. The inequality (1) also gives ${\sigma_{max}(t) \leq \sqrt{1-2t}}$ for all ${0 \leq t \leq 1/2}$. If we could find some explicit ${t}$ between ${0}$ and ${1/2}$ where we can improve this upper bound on ${\sigma_{max}(t)}$ by an explicit constant, this would lead to a new upper bound on ${\Lambda}$.

Secondly, the work of Ki, Kim and Lee (based on an analysis of the various terms appearing in the expression for ${H_t}$) shows that for any positive ${t}$, all but finitely many of the zeroes of ${H_t}$ are real (in contrast with the ${t=0}$ situation, where it is still an open question as to whether the proportion of non-trivial zeroes of the zeta function on the critical line is asymptotically equal to ${1}$). As a key step in this analysis, Ki, Kim, and Lee show that for any ${t>0}$ and ${\varepsilon>0}$, there exists a ${T>0}$ such that all the zeroes of ${H_t}$ with real part at least ${T}$, have imaginary part at most ${\varepsilon}$. Ki, Kim and Lee do not explicitly compute how ${T}$ depends on ${t}$ and ${\varepsilon}$, but it looks like this bound could be made effective.

If so, this suggests a possible strategy to get a new upper bound on ${\Lambda}$:

• Select a good choice of parameters ${t, \varepsilon > 0}$.
• By refining the Ki-Kim-Lee analysis, find an explicit ${T}$ such that all zeroes of ${H_t}$ with real part at least ${T}$ have imaginary part at most ${\varepsilon}$.
• By a numerical computation (e.g. using the argument principle), also verify that zeroes of ${H_t}$ with real part between ${0}$ and ${T}$ have imaginary part at most ${\varepsilon}$.
• Combining these facts, we obtain that ${\sigma_{max}(t) \leq \varepsilon}$; hopefully, one can insert this into (2) and get a new upper bound for ${\Lambda}$.

Of course, there may also be alternate strategies to upper bound ${\Lambda}$, and I would imagine this would also be a legitimate topic of discussion for this project.

One appealing thing about the above strategy for the purposes of a polymath project is that it naturally splits the project into several interacting but reasonably independent parts: an analytic part in which one tries to refine the Ki-Kim-Lee analysis (based on explicitly upper and lower bounding various terms in a certain series expansion for ${H_t}$ – I may detail this later in a subsequent post); a numerical part in which one controls the zeroes of ${H_t}$ in a certain finite range; and perhaps also a dynamical part where one sees if there is any way to improve the inequality (2). For instance, the numerical “team” might, over time, be able to produce zero-free regions for ${H_t}$ with an increasingly large value of ${T}$, while in parallel the analytic “team” might produce increasingly smaller values of ${T}$ beyond which they can control zeroes, and eventually the two bounds would meet up and we obtain a new bound on ${\Lambda}$. This factoring of the problem into smaller parts was also a feature of the successful Polymath8 project on bounded gaps between primes.

The project also resembles Polymath8 in another aspect: that there is an obvious way to numerically measure progress, by seeing how the upper bound for ${\Lambda}$ decreases over time (and presumably there will also be another metric of progress regarding how well we can control ${T}$ in terms of ${t}$ and ${\varepsilon}$). However, in Polymath8 the final measure of progress (the upper bound ${H}$ on gaps between primes) was a natural number, and thus could not decrease indefinitely. Here, the bound will be a real number, and there is a possibility that one may end up having an infinite descent in which progress slows down over time, with refinements to increasingly less significant digits of the bound as the project progresses. Because of this, I think it makes sense to follow recent Polymath projects and place an expiration date for the project, for instance one year after the launch date, in which we will agree to end the project and (if the project was successful enough) write up the results, unless there is consensus at that time to extend the project. (In retrospect, we should probably have imposed similar sunset dates on older Polymath projects, some of which have now been inactive for years, but that is perhaps a discussion for another time.)

Some Polymath projects have been known for a breakneck pace, making it hard for some participants to keep up. It’s hard to control these things, but I am envisaging a relatively leisurely project here, perhaps taking the full year mentioned above. It may well be that as the project matures we will largely be waiting for the results of lengthy numerical calculations to come in, for instance. Of course, as with previous projects, we would maintain some wiki pages (and possibly some other resources, such as a code repository) to keep track of progress and also to summarise what we have learned so far. For instance, as was done with some previous Polymath projects, we could begin with some “online reading seminars” where we go through some relevant piece of literature (most obviously the Ki-Kim-Lee paper, but there may be other resources that become relevant, e.g. one could imagine the literature on numerical verification of RH to be of value).

One could also imagine some incidental outcomes of this project, such as a more efficient way to numerically establish zero free regions for various analytic functions of interest; in particular, the project may well end up focusing on some other aspect of mathematics than the specific questions posed here.

Anyway, I would be interested to hear in the comments below from others who might be interested in participating, or at least observing, this project, particularly if they have suggestions regarding the scope and direction of the project, and on organisational structure (e.g. if one should start with reading seminars, or some initial numerical exploration of the functions ${H_t}$, etc..) One could also begin some preliminary discussion of the actual mathematics of the project itself, though (in line with the leisurely pace I was hoping for), I expect that the main burst of mathematical activity would happen later, once the project is formally launched (with wiki page resources, blog posts dedicated to specific aspects of the project, etc.).

In this post we assume the Riemann hypothesis and the simplicity of zeroes, thus the zeroes of ${\zeta}$ in the critical strip take the form ${\frac{1}{2} \pm i \gamma_j}$ for some real number ordinates ${0 < \gamma_1 < \gamma_2 < \dots}$. From the Riemann-von Mangoldt formula, one has the asymptotic

$\displaystyle \gamma_n = (1+o(1)) \frac{2\pi}{\log n} n$

as ${n \rightarrow \infty}$; in particular, the spacing ${\gamma_{n+1} - \gamma_n}$ should behave like ${\frac{2\pi}{\log n}}$ on the average. However, it can happen that some gaps are unusually small compared to other nearby gaps. For the sake of concreteness, let us define a Lehmer pair to be a pair of adjacent ordinates ${\gamma_n, \gamma_{n+1}}$ such that

$\displaystyle \frac{1}{(\gamma_{n+1} - \gamma_n)^2} \geq 1.3 \sum_{m \neq n,n+1} \frac{1}{(\gamma_m - \gamma_n)^2} + \frac{1}{(\gamma_m - \gamma_{n+1})^2}. \ \ \ \ \ (1)$

The specific value of constant ${1.3}$ is not particularly important here; anything larger than ${\frac{5}{4}}$ would suffice. An example of such a pair would be the classical pair

$\displaystyle \gamma_{6709} = 7005.062866\dots$

$\displaystyle \gamma_{6710} = 7005.100564\dots$

discovered by Lehmer. It follows easily from the main results of Csordas, Smith, and Varga that if an infinite number of Lehmer pairs (in the above sense) existed, then the de Bruijn-Newman constant ${\Lambda}$ is non-negative. This implication is now redundant in view of the unconditional results of this recent paper of Rodgers and myself; however, the question of whether an infinite number of Lehmer pairs exist remain open.

In this post, I sketch an argument that Brad and I came up with (as initially suggested by Odlyzko) the GUE hypothesis implies the existence of infinitely many Lehmer pairs. We argue probabilistically: pick a sufficiently large number ${T}$, pick ${n}$ at random from ${T \log T}$ to ${2 T \log T}$ (so that the average gap size is close to ${\frac{2\pi}{\log T}}$), and prove that the Lehmer pair condition (1) occurs with positive probability.

Introduce the renormalised ordinates ${x_n := \frac{\log T}{2\pi} \gamma_n}$ for ${T \log T \leq n \leq 2 T \log T}$, and let ${\varepsilon > 0}$ be a small absolute constant (independent of ${T}$). It will then suffice to show that

$\displaystyle \frac{1}{(x_{n+1} - x_n)^2} \geq$

$\displaystyle 1.3 \sum_{m \in [T \log T, 2T \log T]: m \neq n,n+1} \frac{1}{(x_m - x_n)^2} + \frac{1}{(x_m - x_{n+1})^2}$

$\displaystyle + \frac{1}{6\varepsilon^2}$

(say) with probability ${\gg \varepsilon^4 - o(1)}$, since the contribution of those ${m}$ outside of ${[T \log T, 2T \log T]}$ can be absorbed by the ${\frac{1}{\varepsilon^2}}$ factor with probability ${o(1)}$.

As one consequence of the GUE hypothesis, we have ${x_{n+1} - x_n \leq \varepsilon^2}$ with probability ${O(\varepsilon^6)}$. Thus, if ${E := \{ m \in [T \log T, 2T \log T]: x_{m+1} - x_m \leq \varepsilon^2 \}}$, then ${E}$ has density ${O( \varepsilon^6 )}$. Applying the Hardy-Littlewood maximal inequality, we see that with probability ${O(\varepsilon^6)}$, we have

$\displaystyle \sup_{h \geq 1} | \# E \cap [n+h, n-h] | \leq \frac{1}{10}$

which implies in particular that

$\displaystyle |x_m - x_n|, |x_{m} - x_{n+1}| \gg \varepsilon^2 |m-n|$

for all ${m \in [T \log T, 2 T \log T] \backslash \{ n, n+1\}}$. This implies in particular that

$\displaystyle \sum_{m \in [T \log T, 2T \log T]: |m-n| \geq \varepsilon^{-3}} \frac{1}{(x_m - x_n)^2} + \frac{1}{(x_m - x_{n+1})^2} \ll \varepsilon^{-1}$

and so it will suffice to show that

$\displaystyle \frac{1}{(x_{n+1} - x_n)^2}$

$\displaystyle \geq 1.3 \sum_{m \in [T \log T, 2T \log T]: m \neq n,n+1; |m-n| < \varepsilon^{-3}} \frac{1}{(x_m - x_n)^2} + \frac{1}{(x_m - x_{n+1})^2} + \frac{1}{5\varepsilon^2}$

(say) with probability ${\gg \varepsilon^4 - o(1)}$.

By the GUE hypothesis (and the fact that ${\varepsilon}$ is independent of ${T}$), it suffices to show that a Dyson sine process ${(x_n)_{n \in {\bf Z}}}$, normalised so that ${x_0}$ is the first positive point in the process, obeys the inequality

$\displaystyle \frac{1}{(x_{1} - x_0)^2} \geq 1.3 \sum_{|m| < \varepsilon^{-3}: m \neq 0,1} \frac{1}{(x_m - x_0)^2} + \frac{1}{(x_m - x_1)^2} \ \ \ \ \ (2)$

with probability ${\gg \varepsilon^4}$. However, if we let ${A > 0}$ be a moderately large constant (and assume ${\varepsilon}$ small depending on ${A}$), one can show using ${k}$-point correlation functions for the Dyson sine process (and the fact that the Dyson kernel ${K(x,y) = \sin(\pi(x-y))/\pi(x-y)}$ equals ${1}$ to second order at the origin) that

$\displaystyle {\bf E} N_{[-\varepsilon,0]} N_{[0,\varepsilon]} \gg \varepsilon^4$

$\displaystyle {\bf E} N_{[-\varepsilon,0]} \binom{N_{[0,\varepsilon]}}{2} \ll \varepsilon^7$

$\displaystyle {\bf E} \binom{N_{[-\varepsilon,0]}}{2} N_{[0,\varepsilon]} \ll \varepsilon^7$

$\displaystyle {\bf E} N_{[-\varepsilon,0]} N_{[0,\varepsilon]} N_{[\varepsilon,A^{-1}]} \ll A^{-3} \varepsilon^4$

$\displaystyle {\bf E} N_{[-\varepsilon,0]} N_{[0,\varepsilon]} N_{[-A^{-1}, -\varepsilon]} \ll A^{-3} \varepsilon^4$

$\displaystyle {\bf E} N_{[-\varepsilon,0]} N_{[0,\varepsilon]} N_{[-k, k]}^2 \ll k^2 \varepsilon^4 \ \ \ \ \ (3)$

for any natural number ${k}$, where ${N_{I}}$ denotes the number of elements of the process in ${I}$. For instance, the expression ${{\bf E} N_{[-\varepsilon,0]} \binom{N_{[0,\varepsilon]}}{2} }$ can be written in terms of the three-point correlation function ${\rho_3(x_1,x_2,x_3) = \mathrm{det}(K(x_i,x_j))_{1 \leq i,j \leq 3}}$ as

$\displaystyle \int_{-\varepsilon \leq x_1 \leq 0 \leq x_2 \leq x_3 \leq \varepsilon} \rho_3( x_1, x_2, x_3 )\ dx_1 dx_2 dx_3$

which can easily be estimated to be ${O(\varepsilon^7)}$ (since ${\rho_3 = O(\varepsilon^4)}$ in this region), and similarly for the other estimates claimed above.

Since for natural numbers ${a,b}$, the quantity ${ab - 2 a \binom{b}{2} - 2 b \binom{a}{2} = ab (5-2a-2b)}$ is only positive when ${a=b=1}$, we see from the first three estimates that the event ${E}$ that ${N_{[-\varepsilon,0]} = N_{[0,\varepsilon]} = 1}$ occurs with probability ${\gg \varepsilon^4}$. In particular, by Markov’s inequality we have the conditional probabilities

$\displaystyle {\bf P} ( N_{[\varepsilon,A^{-1}]} \geq 1 | E ) \ll A^{-3}$

$\displaystyle {\bf P} ( N_{[-A^{-1}, -\varepsilon]} \geq 1 | E ) \ll A^{-3}$

$\displaystyle {\bf P} ( N_{[-k, k]} \geq A k^{5/3} | E ) \ll A^{-4} k^{-4/3}$

and thus, if ${A}$ is large enough, and ${\varepsilon}$ small enough, it will be true with probability ${\gg \varepsilon^4}$ that

$\displaystyle N_{[-\varepsilon,0]}, N_{[0,\varepsilon]} = 1$

and

$\displaystyle N_{[A^{-1}, \varepsilon]} = N_{[\varepsilon, A^{-1}]} = 0$

and simultaneously that

$\displaystyle N_{[-k,k]} \leq A k^{5/3}$

for all natural numbers ${k}$. This implies in particular that

$\displaystyle x_1 - x_0 \leq 2\varepsilon$

and

$\displaystyle |x_m - x_0|, |x_m - x_1| \gg_A |m|^{3/5}$

for all ${m \neq 0,1}$, which gives (2) for ${\varepsilon}$ small enough.

Remark 1 The above argument needed the GUE hypothesis for correlations up to fourth order (in order to establish (3)). It might be possible to reduce the number of correlations needed, but I do not see how to obtain the claim just using pair correlations only.

Brad Rodgers and I have uploaded to the arXiv our paper “The De Bruijn-Newman constant is non-negative“. This paper affirms a conjecture of Newman regarding to the extent to which the Riemann hypothesis, if true, is only “barely so”. To describe the conjecture, let us begin with the Riemann xi function

$\displaystyle \xi(s) := \frac{s(s-1)}{2} \pi^{-s/2} \Gamma(\frac{s}{2}) \zeta(s)$

where ${\Gamma(s) := \int_0^\infty e^{-t} t^{s-1}\ dt}$ is the Gamma function and ${\zeta(s) := \sum_{n=1}^\infty \frac{1}{n^s}}$ is the Riemann zeta function. Initially, this function is only defined for ${\mathrm{Re} s > 1}$, but, as was already known to Riemann, we can manipulate it into a form that extends to the entire complex plane as follows. Firstly, in view of the standard identity ${s \Gamma(s) = \Gamma(s+1)}$, we can write

$\displaystyle \frac{s(s-1)}{2} \Gamma(\frac{s}{2}) = 2 \Gamma(\frac{s+4}{2}) - 3 \Gamma( \frac{s+2}{2} )$

and hence

$\displaystyle \xi(s) = \sum_{n=1}^\infty 2 \pi^{-s/2} n^{-s} \int_0^\infty e^{-t} t^{\frac{s+4}{2}-1}\ dt - 3 \pi^{-s/2} n^{-s} \int_0^\infty e^{-t} t^{\frac{s+2}{2}-1}\ dt.$

By a rescaling, one may write

$\displaystyle \int_0^\infty e^{-t} t^{\frac{s+4}{2}-1}\ dt = (\pi n^2)^{\frac{s+4}{2}} \int_0^\infty e^{-\pi n^2 t} t^{\frac{s+4}{2}-1}\ dt$

and similarly

$\displaystyle \int_0^\infty e^{-t} t^{\frac{s+2}{2}-1}\ dt = (\pi n^2)^{\frac{s+2}{2}} \int_0^\infty e^{-\pi n^2 t} t^{\frac{s+2}{2}-1}\ dt$

and thus (after applying Fubini’s theorem)

$\displaystyle \xi(s) = \int_0^\infty \sum_{n=1}^\infty 2 \pi^2 n^4 e^{-\pi n^2 t} t^{\frac{s+4}{2}-1} - 3 \pi n^2 e^{-\pi n^2 t} t^{\frac{s+2}{2}-1}\ dt.$

We’ll make the change of variables ${t = e^{4u}}$ to obtain

$\displaystyle \xi(s) = 4 \int_{\bf R} \sum_{n=1}^\infty (2 \pi^2 n^4 e^{8u} - 3 \pi n^2 e^{4u}) \exp( 2su - \pi n^2 e^{4u} )\ du.$

If we introduce the mild renormalisation

$\displaystyle H_0(z) := \frac{1}{8} \xi( \frac{1}{2} + \frac{iz}{2} )$

of ${\xi}$, we then conclude (at least for ${\mathrm{Im} z > 1}$) that

$\displaystyle H_0(z) = \frac{1}{2} \int_{\bf R} \Phi(u)\exp(izu)\ du \ \ \ \ \ (1)$

where ${\Phi: {\bf R} \rightarrow {\bf C}}$ is the function

$\displaystyle \Phi(u) := \sum_{n=1}^\infty (2 \pi^2 n^4 e^{9u} - 3 \pi n^2 e^{5u}) \exp( - \pi n^2 e^{4u} ), \ \ \ \ \ (2)$

which one can verify to be rapidly decreasing both as ${u \rightarrow +\infty}$ and as ${u \rightarrow -\infty}$, with the decrease as ${u \rightarrow +\infty}$ faster than any exponential. In particular ${H_0}$ extends holomorphically to the upper half plane.

If we normalize the Fourier transform ${{\mathcal F} f(\xi)}$ of a (Schwartz) function ${f(x)}$ as ${{\mathcal F} f(\xi) := \int_{\bf R} f(x) e^{-2\pi i \xi x}\ dx}$, it is well known that the Gaussian ${x \mapsto e^{-\pi x^2}}$ is its own Fourier transform. The creation operator ${2\pi x - \frac{d}{dx}}$ interacts with the Fourier transform by the identity

$\displaystyle {\mathcal F} (( 2\pi x - \frac{d}{dx} ) f) (\xi) = -i (2 \pi \xi - \frac{d}{d\xi} ) {\mathcal F} f(\xi).$

Since ${(-i)^4 = 1}$, this implies that the function

$\displaystyle x \mapsto (2\pi x - \frac{d}{dx})^4 e^{-\pi x^2} = 128 \pi^2 (2 \pi^2 x^4 - 3 \pi x^2) e^{-\pi x^2} + 48 \pi^2 e^{-\pi x^2}$

is its own Fourier transform. (One can view the polynomial ${128 \pi^2 (2\pi^2 x^4 - 3 \pi x^2) + 48 \pi^2}$ as a renormalised version of the fourth Hermite polynomial.) Taking a suitable linear combination of this with ${x \mapsto e^{-\pi x^2}}$, we conclude that

$\displaystyle x \mapsto (2 \pi^2 x^4 - 3 \pi x^2) e^{-\pi x^2}$

is also its own Fourier transform. Rescaling ${x}$ by ${e^{2u}}$ and then multiplying by ${e^u}$, we conclude that the Fourier transform of

$\displaystyle x \mapsto (2 \pi^2 x^4 e^{9u} - 3 \pi x^2 e^{5u}) \exp( - \pi x^2 e^{4u} )$

is

$\displaystyle x \mapsto (2 \pi^2 x^4 e^{-9u} - 3 \pi x^2 e^{-5u}) \exp( - \pi x^2 e^{-4u} ),$

and hence by the Poisson summation formula (using symmetry and vanishing at ${n=0}$ to unfold the ${n}$ summation in (2) to the integers rather than the natural numbers) we obtain the functional equation

$\displaystyle \Phi(-u) = \Phi(u),$

which implies that ${\Phi}$ and ${H_0}$ are even functions (in particular, ${H_0}$ now extends to an entire function). From this symmetry we can also rewrite (1) as

$\displaystyle H_0(z) = \int_0^\infty \Phi(u) \cos(zu)\ du,$

which now gives a convergent expression for the entire function ${H_0(z)}$ for all complex ${z}$. As ${\Phi}$ is even and real-valued on ${{\bf R}}$, ${H_0(z)}$ is even and also obeys the functional equation ${H_0(\overline{z}) = \overline{H_0(z)}}$, which is equivalent to the usual functional equation for the Riemann zeta function. The Riemann hypothesis is equivalent to the claim that all the zeroes of ${H_0}$ are real.

De Bruijn introduced the family ${H_t: {\bf C} \rightarrow {\bf C}}$ of deformations of ${H_0: {\bf C} \rightarrow {\bf C}}$, defined for all ${t \in {\bf R}}$ and ${z \in {\bf C}}$ by the formula

$\displaystyle H_t(z) := \int_0^\infty e^{tu^2} \Phi(u) \cos(zu)\ du.$

From a PDE perspective, one can view ${H_t}$ as the evolution of ${H_0}$ under the backwards heat equation ${\partial_t H_t(z) = - \partial_{zz} H_t(z)}$. As with ${H_0}$, the ${H_t}$ are all even entire functions that obey the functional equation ${H_t(\overline{z}) = \overline{H_t(z)}}$, and one can ask an analogue of the Riemann hypothesis for each such ${H_t}$, namely whether all the zeroes of ${H_t}$ are real. De Bruijn showed that these hypotheses were monotone in ${t}$: if ${H_t}$ had all real zeroes for some ${t}$, then ${H_{t'}}$ would also have all zeroes real for any ${t' \geq t}$. Newman later sharpened this claim by showing the existence of a finite number ${\Lambda \leq 1/2}$, now known as the de Bruijn-Newman constant, with the property that ${H_t}$ had all zeroes real if and only if ${t \geq \Lambda}$. Thus, the Riemann hypothesis is equivalent to the inequality ${\Lambda \leq 0}$. Newman then conjectured the complementary bound ${\Lambda \geq 0}$; in his words, this conjecture asserted that if the Riemann hypothesis is true, then it is only “barely so”, in that the reality of all the zeroes is destroyed by applying heat flow for even an arbitrarily small amount of time. Over time, a significant amount of evidence was established in favour of this conjecture; most recently, in 2011, Saouter, Gourdon, and Demichel showed that ${\Lambda \geq -1.15 \times 10^{-11}}$.

In this paper we finish off the proof of Newman’s conjecture, that is we show that ${\Lambda \geq 0}$. The proof is by contradiction, assuming that ${\Lambda < 0}$ (which among other things, implies the truth of the Riemann hypothesis), and using the properties of backwards heat evolution to reach a contradiction.

Very roughly, the argument proceeds as follows. As observed by Csordas, Smith, and Varga (and also discussed in this previous blog post, the backwards heat evolution of the ${H_t}$ introduces a nice ODE dynamics on the zeroes ${x_j(t)}$ of ${H_t}$, namely that they solve the ODE

$\displaystyle \frac{d}{dt} x_j(t) = -2 \sum_{j \neq k} \frac{1}{x_k(t) - x_j(t)} \ \ \ \ \ (3)$

for all ${j}$ (one has to interpret the sum in a principal value sense as it is not absolutely convergent, but let us ignore this technicality for the current discussion). Intuitively, this ODE is asserting that the zeroes ${x_j(t)}$ repel each other, somewhat like positively charged particles (but note that the dynamics is first-order, as opposed to the second-order laws of Newtonian mechanics). Formally, a steady state (or equilibrium) of this dynamics is reached when the ${x_k(t)}$ are arranged in an arithmetic progression. (Note for instance that for any positive ${u}$, the functions ${z \mapsto e^{tu^2} \cos(uz)}$ obey the same backwards heat equation as ${H_t}$, and their zeroes are on a fixed arithmetic progression ${\{ \frac{2\pi (k+\tfrac{1}{2})}{u}: k \in {\bf Z} \}}$.) The strategy is to then show that the dynamics from time ${-\Lambda}$ to time ${0}$ creates a convergence to local equilibrium, in which the zeroes ${x_k(t)}$ locally resemble an arithmetic progression at time ${t=0}$. This will be in contradiction with known results on pair correlation of zeroes (or on related statistics, such as the fluctuations on gaps between zeroes), such as the results of Montgomery (actually for technical reasons it is slightly more convenient for us to use related results of Conrey, Ghosh, Goldston, Gonek, and Heath-Brown). Another way of thinking about this is that even very slight deviations from local equilibrium (such as a small number of gaps that are slightly smaller than the average spacing) will almost immediately lead to zeroes colliding with each other and leaving the real line as one evolves backwards in time (i.e., under the forward heat flow). This is a refinement of the strategy used in previous lower bounds on ${\Lambda}$, in which “Lehmer pairs” (pairs of zeroes of the zeta function that were unusually close to each other) were used to limit the extent to which the evolution continued backwards in time while keeping all zeroes real.

How does one obtain this convergence to local equilibrium? We proceed by broad analogy with the “local relaxation flow” method of Erdos, Schlein, and Yau in random matrix theory, in which one combines some initial control on zeroes (which, in the case of the Erdos-Schlein-Yau method, is referred to with terms such as “local semicircular law”) with convexity properties of a relevant Hamiltonian that can be used to force the zeroes towards equilibrium.

We first discuss the initial control on zeroes. For ${H_0}$, we have the classical Riemann-von Mangoldt formula, which asserts that the number of zeroes in the interval ${[0,T]}$ is ${\frac{T}{4\pi} \log \frac{T}{4\pi} - \frac{T}{4\pi} + O(\log T)}$ as ${T \rightarrow \infty}$. (We have a factor of ${4\pi}$ here instead of the more familiar ${2\pi}$ due to the way ${H_0}$ is normalised.) This implies for instance that for a fixed ${\alpha}$, the number of zeroes in the interval ${[T, T+\alpha]}$ is ${\frac{\alpha}{4\pi} \log T + O(\log T)}$. Actually, because we get to assume the Riemann hypothesis, we can sharpen this to ${\frac{\alpha}{4\pi} \log T + o(\log T)}$, a result of Littlewood (see this previous blog post for a proof). Ideally, we would like to obtain similar control for the other ${H_t}$, ${\Lambda \leq t < 0}$, as well. Unfortunately we were only able to obtain the weaker claims that the number of zeroes of ${H_t}$ in ${[0,T]}$ is ${\frac{T}{4\pi} \log \frac{T}{4\pi} - \frac{T}{4\pi} + O(\log^2 T)}$, and that the number of zeroes in ${[T, T+\alpha \log T]}$ is ${\frac{\alpha}{4 \pi} \log^2 T + o(\log^2 T)}$, that is to say we only get good control on the distribution of zeroes at scales ${\gg \log T}$ rather than at scales ${\gg 1}$. Ultimately this is because we were only able to get control (and in particular, lower bounds) on ${|H_t(x-iy)|}$ with high precision when ${y \gg \log x}$ (whereas ${|H_0(x-iy)|}$ has good estimates as soon as ${y}$ is larger than (say) ${2}$). This control is obtained by the expressing ${H_t(x-iy)}$ in terms of some contour integrals and using the method of steepest descent (actually it is slightly simpler to rely instead on the Stirling approximation for the Gamma function, which can be proven in turn by steepest descent methods). Fortunately, it turns out that this weaker control is still (barely) enough for the rest of our argument to go through.

Once one has the initial control on zeroes, we now need to force convergence to local equilibrium by exploiting convexity of a Hamiltonian. Here, the relevant Hamiltonian is

$\displaystyle H(t) := \sum_{j,k: j \neq k} \log \frac{1}{|x_j(t) - x_k(t)|},$

ignoring for now the rather important technical issue that this sum is not actually absolutely convergent. (Because of this, we will need to truncate and renormalise the Hamiltonian in a number of ways which we will not detail here.) The ODE (3) is formally the gradient flow for this Hamiltonian. Furthermore, this Hamiltonian is a convex function of the ${x_j}$ (because ${t \mapsto \log \frac{1}{t}}$ is a convex function on ${(0,+\infty)}$). We therefore expect the Hamiltonian to be a decreasing function of time, and that the derivative should be an increasing function of time. As time passes, the derivative of the Hamiltonian would then be expected to converge to zero, which should imply convergence to local equilibrium.

Formally, the derivative of the above Hamiltonian is

$\displaystyle \partial_t H(t) = -4 E(t), \ \ \ \ \ (4)$

where ${E(t)}$ is the “energy”

$\displaystyle E(t) := \sum_{j,k: j \neq k} \frac{1}{|x_j(t) - x_k(t)|^2}.$

Again, there is the important technical issue that this quantity is infinite; but it turns out that if we renormalise the Hamiltonian appropriately, then the energy will also become suitably renormalised, and in particular will vanish when the ${x_j}$ are arranged in an arithmetic progression, and be positive otherwise. One can also formally calculate the derivative of ${E(t)}$ to be a somewhat complicated but manifestly non-negative quantity (a sum of squares); see this previous blog post for analogous computations in the case of heat flow on polynomials. After flowing from time ${\Lambda}$ to time ${0}$, and using some crude initial bounds on ${H(t)}$ and ${E(t)}$ in this region (coming from the Riemann-von Mangoldt type formulae mentioned above and some further manipulations), we can eventually show that the (renormalisation of the) energy ${E(0)}$ at time zero is small, which forces the ${x_j}$ to locally resemble an arithmetic progression, which gives the required convergence to local equilibrium.

There are a number of technicalities involved in making the above sketch of argument rigorous (for instance, justifying interchanges of derivatives and infinite sums turns out to be a little bit delicate). I will highlight here one particular technical point. One of the ways in which we make expressions such as the energy ${E(t)}$ finite is to truncate the indices ${j,k}$ to an interval ${I}$ to create a truncated energy ${E_I(t)}$. In typical situations, we would then expect ${E_I(t)}$ to be decreasing, which will greatly help in bounding ${E_I(0)}$ (in particular it would allow one to control ${E_I(0)}$ by time-averaged quantities such as ${\int_{\Lambda/2}^0 E_I(t)\ dt}$, which can in turn be controlled using variants of (4)). However, there are boundary effects at both ends of ${I}$ that could in principle add a large amount of energy into ${E_I}$, which is bad news as it could conceivably make ${E_I(0)}$ undesirably large even if integrated energies such as ${\int_{\Lambda/2}^0 E_I(t)\ dt}$ remain adequately controlled. As it turns out, such boundary effects are negligible as long as there is a large gap between adjacent zeroes at boundary of ${I}$ – it is only narrow gaps that can rapidly transmit energy across the boundary of ${I}$. Now, narrow gaps can certainly exist (indeed, the GUE hypothesis predicts these happen a positive fraction of the time); but the pigeonhole principle (together with the Riemann-von Mangoldt formula) can allow us to pick the endpoints of the interval ${I}$ so that no narrow gaps appear at the boundary of ${I}$ for any given time ${t}$. However, there was a technical problem: this argument did not allow one to find a single interval ${I}$ that avoided gaps for all times ${\Lambda/2 \leq t \leq 0}$ simultaneously – the pigeonhole principle could produce a different interval ${I}$ for each time ${t}$! Since the number of times was uncountable, this was a serious issue. (In physical terms, the problem was that there might be very fast “longitudinal waves” in the dynamics that, at each time, cause some gaps between zeroes to be highly compressed, but the specific gap that was narrow changed very rapidly with time. Such waves could, in principle, import a huge amount of energy into ${E_I}$ by time ${0}$.) To resolve this, we borrowed a PDE trick of Bourgain’s, in which the pigeonhole principle was coupled with local conservation laws. More specifically, we use the phenomenon that very narrow gaps ${g_i = x_{i+1}-x_i}$ take a nontrivial amount of time to expand back to a reasonable size (this can be seen by comparing the evolution of this gap with solutions of the scalar ODE ${\partial_t g = \frac{4}{g^2}}$, which represents the fastest at which a gap such as ${g_i}$ can expand). Thus, if a gap ${g_i}$ is reasonably large at some time ${t_0}$, it will also stay reasonably large at slightly earlier times ${t \in [t_0-\delta, t_0]}$ for some moderately small ${\delta>0}$. This lets one locate an interval ${I}$ that has manageable boundary effects during the times in ${[t_0-\delta, t_0]}$, so in particular ${E_I}$ is basically non-increasing in this time interval. Unfortunately, this interval is a little bit too short to cover all of ${[\Lambda/2,0]}$; however it turns out that one can iterate the above construction and find a nested sequence of intervals ${I_k}$, with each ${E_{I_k}}$ non-increasing in a different time interval ${[t_k - \delta, t_k]}$, and with all of the time intervals covering ${[\Lambda/2,0]}$. This turns out to be enough (together with the obvious fact that ${E_I}$ is monotone in ${I}$) to still control ${E_I(0)}$ for some reasonably sized interval ${I}$, as required for the rest of the arguments.

ADDED LATER: the following analogy (involving functions with just two zeroes, rather than an infinite number of zeroes) may help clarify the relation between this result and the Riemann hypothesis (and in particular why this result does not make the Riemann hypothesis any easier to prove, in fact it confirms the delicate nature of that hypothesis). Suppose one had a quadratic polynomial ${P}$ of the form ${P(z) = z^2 + \Lambda}$, where ${\Lambda}$ was an unknown real constant. Suppose that one was for some reason interested in the analogue of the “Riemann hypothesis” for ${P}$, namely that all the zeroes of ${P}$ are real. A priori, there are three scenarios:

• (Riemann hypothesis false) ${\Lambda > 0}$, and ${P}$ has zeroes ${\pm i |\Lambda|^{1/2}}$ off the real axis.
• (Riemann hypothesis true, but barely so) ${\Lambda = 0}$, and both zeroes of ${P}$ are on the real axis; however, any slight perturbation of ${\Lambda}$ in the positive direction would move zeroes off the real axis.
• (Riemann hypothesis true, with room to spare) ${\Lambda < 0}$, and both zeroes of ${P}$ are on the real axis. Furthermore, any slight perturbation of ${P}$ will also have both zeroes on the real axis.

The analogue of our result in this case is that ${\Lambda \geq 0}$, thus ruling out the third of the three scenarios here. In this simple example in which only two zeroes are involved, one can think of the inequality ${\Lambda \geq 0}$ as asserting that if the zeroes of ${P}$ are real, then they must be repeated. In our result (in which there are an infinity of zeroes, that become increasingly dense near infinity), and in view of the convergence to local equilibrium properties of (3), the analogous assertion is that if the zeroes of ${H_0}$ are real, then they do not behave locally as if they were in arithmetic progression.

The Polymath14 online collaboration has uploaded to the arXiv its paper “Homogeneous length functions on groups“, submitted to Algebra & Number Theory. The paper completely classifies homogeneous length functions ${\| \|: G \rightarrow {\bf R}^+}$ on an arbitrary group ${G = (G,\cdot,e,()^{-1})}$, that is to say non-negative functions that obey the symmetry condition ${\|x^{-1}\| = \|x\|}$, the non-degeneracy condition ${\|x\|=0 \iff x=e}$, the triangle inequality ${\|xy\| \leq \|x\| + \|y\|}$, and the homogeneity condition ${\|x^2\| = 2\|x\|}$. It turns out that these norms can only arise from pulling back the norm of a Banach space by an isometric embedding of the group. Among other things, this shows that ${G}$ can only support a homogeneous length function if and only if it is abelian and torsion free, thus giving a metric description of this property.

The proof is based on repeated use of the homogeneous length function axioms, combined with elementary identities of commutators, to obtain increasingly good bounds on quantities such as ${\|[x,y]\|}$, until one can show that such norms have to vanish. See the previous post for a full proof. The result is robust in that it allows for some loss in the triangle inequality and homogeneity condition, allowing for some new results on “quasinorms” on groups that relate to quasihomomorphisms.

As there are now a large number of comments on the previous post on this project, this post will also serve as the new thread for any final discussion of this project as it winds down.

Kaisa Matomaki, Maksym Radziwill, and I have uploaded to the arXiv our paper “Correlations of the von Mangoldt and higher divisor functions II. Divisor correlations in short ranges“. This is a sequel of sorts to our previous paper on divisor correlations, though the proof techniques in this paper are rather different. As with the previous paper, our interest is in correlations such as

$\displaystyle \sum_{n \leq X} d_k(n) d_l(n+h) \ \ \ \ \ (1)$

for medium-sized ${h}$ and large ${X}$, where ${k \geq l \geq 1}$ are natural numbers and ${d_k(n) = \sum_{n = m_1 \dots m_k} 1}$ is the ${k^{th}}$ divisor function (actually our methods can also treat a generalisation in which ${k}$ is non-integer, but for simplicity let us stick with the integer case for this discussion). Our methods also allow for one of the divisor function factors to be replaced with a von Mangoldt function, but (in contrast to the previous paper) we cannot treat the case when both factors are von Mangoldt.

As discussed in this previous post, one heuristically expects an asymptotic of the form

$\displaystyle \sum_{n \leq X} d_k(n) d_l(n+h) = P_{k,l,h}( \log X ) X + O( X^{1/2+\varepsilon})$

for any fixed ${\varepsilon>0}$, where ${P_{k,l,h}}$ is a certain explicit (but rather complicated) polynomial of degree ${k+l-1}$. Such asymptotics are known when ${l \leq 2}$, but remain open for ${k \geq l \geq 3}$. In the previous paper, we were able to obtain a weaker bound of the form

$\displaystyle \sum_{n \leq X} d_k(n) d_l(n+h) = P_{k,l,h}( \log X ) X + O_A( X \log^{-A} X)$

for ${1-O_A(\log^{-A} X)}$ of the shifts ${-H \leq h \leq H}$, whenever the shift range ${H}$ lies between ${X^{8/33+\varepsilon}}$ and ${X^{1-\varepsilon}}$. But the methods become increasingly hard to use as ${H}$ gets smaller. In this paper, we use a rather different method to obtain the even weaker bound

$\displaystyle \sum_{n \leq X} d_k(n) d_l(n+h) = (1+o(1)) P_{k,l,h}( \log X ) X$

for ${1-o(1)}$ of the shifts ${-H \leq h \leq H}$, where ${H}$ can now be as short as ${H = \log^{10^4 k \log k} X}$. The constant ${10^4}$ can be improved, but there are serious obstacles to using our method to go below ${\log^{k \log k} X}$ (as the exceptionally large values of ${d_k}$ then begin to dominate). This can be viewed as an analogue to our previous paper on correlations of bounded multiplicative functions on average, in which the functions ${d_k,d_l}$ are now unbounded, and indeed our proof strategy is based in large part on that paper (but with many significant new technical complications).

We now discuss some of the ingredients of the proof. Unsurprisingly, the first step is the circle method, expressing (1) in terms of exponential sums such as

$\displaystyle S(\alpha) := \sum_{n \leq X} d_k(n) e(\alpha).$

Actually, it is convenient to first prune ${d_k}$ slightly by zeroing out this function on “atypical” numbers ${n}$ that have an unusually small or large number of factors in a certain sense, but let us ignore this technicality for this discussion. The contribution of ${S(\alpha)}$ for “major arc” ${\alpha}$ can be treated by standard techniques (and is the source of the main term ${P_{k,l,h}(\log X) X}$; the main difficulty comes from treating the contribution of “minor arc” ${\alpha}$.

In our previous paper on bounded multiplicative functions, we used Plancherel’s theorem to estimate the global ${L^2}$ norm ${\int_{{\bf R}/{\bf Z}} |S(\alpha)|^2\ d\alpha}$, and then also used the Katai-Bourgain-Sarnak-Ziegler orthogonality criterion to control local ${L^2}$ norms ${\int_I |S(\alpha)|^2\ d\alpha}$, where ${I}$ was a minor arc interval of length about ${1/H}$, and these two estimates together were sufficient to get a good bound on correlations by an application of Hölder’s inequality. For ${d_k}$, it is more convenient to use Dirichlet series methods (and Ramaré-type factorisations of such Dirichlet series) to control local ${L^2}$ norms on minor arcs, in the spirit of the proof of the Matomaki-Radziwill theorem; a key point is to develop “log-free” mean value theorems for Dirichlet series associated to functions such as ${d_k}$, so as not to wipe out the (rather small) savings one will get over the trivial bound from this method. On the other hand, the global ${L^2}$ bound will definitely be unusable, because the ${\ell^2}$ sum ${\sum_{n \leq X} d_k(n)^2}$ has too many unwanted factors of ${\log X}$. Fortunately, we can substitute this global ${L^2}$ bound with a “large values” bound that controls expressions such as

$\displaystyle \sum_{i=1}^J \int_{I_i} |S(\alpha)|^2\ d\alpha$

for a moderate number of disjoint intervals ${I_1,\dots,I_J}$, with a bound that is slightly better (for ${J}$ a medium-sized power of ${\log X}$) than what one would have obtained by bounding each integral ${\int_{I_i} |S(\alpha)|^2\ d\alpha}$ separately. (One needs to save more than ${J^{1/2}}$ for the argument to work; we end up saving a factor of about ${J^{3/4}}$.) This large values estimate is probably the most novel contribution of the paper. After taking the Fourier transform, matters basically reduce to getting a good estimate for

$\displaystyle \sum_{i=1}^J (\int_X^{2X} |\sum_{x \leq n \leq x+H} d_k(n) e(\alpha_i n)|^2\ dx)^{1/2},$

where ${\alpha_i}$ is the midpoint of ${I_i}$; thus we need some upper bound on the large local Fourier coefficients of ${d_k}$. These coefficients are difficult to calculate directly, but, in the spirit of a paper of Ben Green and myself, we can try to replace ${d_k}$ by a more tractable and “pseudorandom” majorant ${\tilde d_k}$ for which the local Fourier coefficients are computable (on average). After a standard duality argument, one ends up having to control expressions such as

$\displaystyle |\sum_{x \leq n \leq x+H} \tilde d_k(n) e((\alpha_i -\alpha_{i'}) n)|$

after various averaging in the ${x, i,i'}$ parameters. These local Fourier coefficients of ${\tilde d_k}$ turn out to be small on average unless ${\alpha_i -\alpha_{i'}}$ is “major arc”. One then is left with a mostly combinatorial problem of trying to bound how often this major arc scenario occurs. This is very close to a computation in the previously mentioned paper of Ben and myself; there is a technical wrinkle in that the ${\alpha_i}$ are not as well separated as they were in my paper with Ben, but it turns out that one can modify the arguments in that paper to still obtain a satisfactory estimate in this case (after first grouping nearby frequencies ${\alpha_i}$ together, and modifying the duality argument accordingly).

In the tradition of “Polymath projects“, the problem posed in the previous two blog posts has now been solved, thanks to the cumulative effect of many small contributions by many participants (including, but not limited to, Sean Eberhard, Tobias Fritz, Siddharta Gadgil, Tobias Hartnick, Chris Jerdonek, Apoorva Khare, Antonio Machiavelo, Pace Nielsen, Andy Putman, Will Sawin, Alexander Shamov, Lior Silberman, and David Speyer). In this post I’ll write down a streamlined resolution, eliding a number of important but ultimately removable partial steps and insights made by the above contributors en route to the solution.

Theorem 1 Let ${G = (G,\cdot)}$ be a group. Suppose one has a “seminorm” function ${\| \|: G \rightarrow [0,+\infty)}$ which obeys the triangle inequality

$\displaystyle \|xy \| \leq \|x\| + \|y\|$

for all ${x,y \in G}$, with equality whenever ${x=y}$. Then the seminorm factors through the abelianisation map ${G \mapsto G/[G,G]}$.

Proof: By the triangle inequality, it suffices to show that ${\| [x,y]\| = 0}$ for all ${x,y \in G}$, where ${[x,y] := xyx^{-1}y^{-1}}$ is the commutator.

We first establish some basic facts. Firstly, by hypothesis we have ${\|x^2\| = 2 \|x\|}$, and hence ${\|x^n \| = n \|x\|}$ whenever ${n}$ is a power of two. On the other hand, by the triangle inequality we have ${\|x^n \| \leq n\|x\|}$ for all positive ${n}$, and hence by the triangle inequality again we also have the matching lower bound, thus

$\displaystyle \|x^n \| = n \|x\|$

for all ${n > 0}$. The claim is also true for ${n=0}$ (apply the preceding bound with ${x=1}$ and ${n=2}$). By replacing ${\|x\|}$ with ${\max(\|x\|, \|x^{-1}\|)}$ if necessary we may now also assume without loss of generality that ${\|x^{-1} \| = \|x\|}$, thus

$\displaystyle \|x^n \| = |n| \|x\| \ \ \ \ \ (1)$

for all integers ${n}$.

Next, for any ${x,y \in G}$, and any natural number ${n}$, we have

$\displaystyle \|yxy^{-1} \| = \frac{1}{n} \| (yxy^{-1})^n \|$

$\displaystyle = \frac{1}{n} \| y x^n y^{-1} \|$

$\displaystyle \leq \frac{1}{n} ( \|y\| + n \|x\| + \|y\|^{-1} )$

so on taking limits as ${n \rightarrow \infty}$ we have ${\|yxy^{-1} \| \leq \|x\|}$. Replacing ${x,y}$ by ${yxy^{-1},y^{-1}}$ gives the matching lower bound, thus we have the conjugation invariance

$\displaystyle \|yxy^{-1} \| = \|x\|. \ \ \ \ \ (2)$

Next, we observe that if ${x,y,z,w}$ are such that ${x}$ is conjugate to both ${wy}$ and ${zw^{-1}}$, then one has the inequality

$\displaystyle \|x\| \leq \frac{1}{2} ( \|y \| + \| z \| ). \ \ \ \ \ (3)$

Indeed, if we write ${x = swys^{-1} = t zw^{-1} t^{-1}}$ for some ${s,t \in G}$, then for any natural number ${n}$ one has

$\displaystyle \|x\| = \frac{1}{2n} \| x^n x^n \|$

$\displaystyle = \frac{1}{2n} \| swy \dots wy s^{-1}t zw^{-1} \dots zw^{-1} t^{-1} \|$

where the ${wy}$ and ${zw^{-1}}$ terms each appear ${n}$ times. From (2) we see that conjugation by ${w}$ does not affect the norm. Using this and the triangle inequality several times, we conclude that

$\displaystyle \|x\| \leq \frac{1}{2n} ( \|s\| + n \|y\| + \| s^{-1} t\| + n \|z\| + \|t^{-1} \| ),$

and the claim (3) follows by sending ${n \rightarrow \infty}$.

The following special case of (3) will be of particular interest. Let ${x,y \in G}$, and for any integers ${m,k}$, define the quantity

$\displaystyle f(m,k) := \| x^m [x,y]^k \|.$

Observe that ${x^m [x,y]^k}$ is conjugate to both ${x (x^{m-1} [x,y]^k)}$ and to ${(y^{-1} x^m [x,y]^{k-1} xy) x^{-1}}$, hence by (3) one has

$\displaystyle \| x^m [x,y]^k \| \leq \frac{1}{2} ( \| x^{m-1} [x,y]^k \| + \| y^{-1} x^{m} [x,y]^{k-1} xy \|)$

which by (2) leads to the recursive inequality

$\displaystyle f(m,k) \leq \frac{1}{2} (f(m-1,k) + f(m+1,k-1)).$

We can write this in probabilistic notation as

$\displaystyle f(m,k) \leq {\bf E} f( (m,k) + X )$

where ${X}$ is a random vector that takes the values ${(-1,0)}$ and ${(1,-1)}$ with probability ${1/2}$ each. Iterating this, we conclude in particular that for any large natural number ${n}$, one has

$\displaystyle f(0,n) \leq {\bf E} f( Z )$

where ${Z := (0,n) + X_1 + \dots + X_{2n}}$ and ${X_1,\dots,X_{2n}}$ are iid copies of ${X}$. We can write ${Z = (1,-1/2) (Y_1 + \dots + Y_{2n})}$ where $Y_1,\dots,Y_{2n} = \pm 1$ are iid signs.  By the triangle inequality, we thus have

$\displaystyle f( Z ) \leq |Y_1+\dots+Y_{2n}| (\|x\| + \frac{1}{2} \| [x,y] \|),$

noting that $Y_1+\dots+Y_{2n}$ is an even integer.  On the other hand, $Y_1+\dots+Y_{2n}$ has mean zero and variance $2n$, hence by Cauchy-Schwarz

$\displaystyle f(0,n) \leq \sqrt{2n}( \|x\| + \frac{1}{2} \| [x,y] \|).$

But by (1), the left-hand side is equal to ${n \| [x,y]\|}$. Dividing by ${n}$ and then sending ${n \rightarrow \infty}$, we obtain the claim. $\Box$

The above theorem reduces such seminorms to abelian groups. It is easy to see from (1) that any torsion element of such groups has zero seminorm, so we can in fact restrict to torsion-free groups, which we now write using additive notation ${G = (G,+)}$, thus for instance ${\| nx \| = |n| \|x\|}$ for ${n \in {\bf Z}}$. We think of ${G}$ as a ${{\bf Z}}$-module. One can then extend the seminorm to the associated ${{\bf Q}}$-vector space ${G \otimes_{\bf Z} {\bf Q}}$ by the formula ${\|\frac{a}{b} x\| := \frac{a}{b} \|x\|}$, and then to the associated ${{\bf R}}$-vector space ${G \otimes_{\bf Z} {\bf R}}$ by continuity, at which point it becomes a genuine seminorm (provided we have ensured the symmetry condition ${\|x\| = \|x^{-1}\|}$). Conversely, any seminorm on ${G \otimes_{\bf Z} {\bf R}}$ induces a seminorm on ${G}$. (These arguments also appear in this paper of Khare and Rajaratnam.)