In this post we assume the Riemann hypothesis and the simplicity of zeroes, thus the zeroes of ${\zeta}$ in the critical strip take the form ${\frac{1}{2} \pm i \gamma_j}$ for some real number ordinates ${0 < \gamma_1 < \gamma_2 < \dots}$. From the Riemann-von Mangoldt formula, one has the asymptotic

$\displaystyle \gamma_n = (1+o(1)) \frac{2\pi}{\log n} n$

as ${n \rightarrow \infty}$; in particular, the spacing ${\gamma_{n+1} - \gamma_n}$ should behave like ${\frac{2\pi}{\log n}}$ on the average. However, it can happen that some gaps are unusually small compared to other nearby gaps. For the sake of concreteness, let us define a Lehmer pair to be a pair of adjacent ordinates ${\gamma_n, \gamma_{n+1}}$ such that

$\displaystyle \frac{1}{(\gamma_{n+1} - \gamma_n)^2} \geq 1.3 \sum_{m \neq n,n+1} \frac{1}{(\gamma_m - \gamma_n)^2} + \frac{1}{(\gamma_m - \gamma_{n+1})^2}. \ \ \ \ \ (1)$

The specific value of the constant ${1.3}$ is not particularly important here; anything larger than ${\frac{5}{4}}$ would suffice. An example of such a pair would be the classical pair

$\displaystyle \gamma_{6709} = 7005.062866\dots$

$\displaystyle \gamma_{6710} = 7005.100564\dots$

discovered by Lehmer. It follows easily from the main results of Csordas, Smith, and Varga that if an infinite number of Lehmer pairs (in the above sense) existed, then the de Bruijn-Newman constant ${\Lambda}$ is non-negative. This implication is now redundant in view of the unconditional results of this recent paper of Rodgers and myself; however, the question of whether an infinite number of Lehmer pairs exist remains open.
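To make condition (1) concrete, here is a minimal Python sketch that tests it on a finite list of ordinates. The function name `is_lehmer_pair_truncated` and the toy data are illustrative assumptions, and the list index is 0-based (unlike the 1-based ${\gamma_n}$ above). Since only finitely many ordinates are supplied, the right-hand side of (1) is underestimated, so a `True` result is only a necessary condition for a genuine Lehmer pair, while `False` rules one out.

```python
def is_lehmer_pair_truncated(ordinates, n, c=1.3):
    """Test condition (1) for the adjacent pair (ordinates[n], ordinates[n+1]),
    summing only over the finitely many ordinates supplied (0-based index n)."""
    a, b = ordinates[n], ordinates[n + 1]
    lhs = 1.0 / (b - a) ** 2
    rhs = 0.0
    for m, g in enumerate(ordinates):
        if m in (n, n + 1):
            continue  # the sum in (1) excludes the pair itself
        rhs += 1.0 / (g - a) ** 2 + 1.0 / (g - b) ** 2
    return lhs >= c * rhs


# Gap of Lehmer's classical pair, using the truncated decimals quoted above.
lehmer_gap = 7005.100564 - 7005.062866

# Toy data (not actual zeta ordinates): one close pair vs. an even spacing.
close_pair = is_lehmer_pair_truncated([0.0, 1.0, 1.01, 2.0], 1)
even_spacing = is_lehmer_pair_truncated([0.0, 1.0, 2.0, 3.0], 1)
```

Running the test on actual zeta ordinates would require a table of zeroes around ${\gamma_{6709}}$, which is not reproduced here.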

In this post, I sketch an argument that Brad and I came up with (as initially suggested by Odlyzko) that the GUE hypothesis implies the existence of infinitely many Lehmer pairs. We argue probabilistically: pick a sufficiently large number ${T}$, pick ${n}$ at random from ${T \log T}$ to ${2 T \log T}$ (so that the average gap size is close to ${\frac{2\pi}{\log T}}$), and prove that the Lehmer pair condition (1) occurs with positive probability.

Introduce the renormalised ordinates ${x_n := \frac{\log T}{2\pi} \gamma_n}$ for ${T \log T \leq n \leq 2 T \log T}$, and let ${\varepsilon > 0}$ be a small absolute constant (independent of ${T}$). It will then suffice to show that

$\displaystyle \frac{1}{(x_{n+1} - x_n)^2} \geq$

$\displaystyle 1.3 \sum_{m \in [T \log T, 2T \log T]: m \neq n,n+1} \frac{1}{(x_m - x_n)^2} + \frac{1}{(x_m - x_{n+1})^2}$

$\displaystyle + \frac{1}{6\varepsilon^2}$

(say) with probability ${\gg \varepsilon^4 - o(1)}$, since the contribution of those ${m}$ outside of ${[T \log T, 2T \log T]}$ can be absorbed by the ${\frac{1}{6\varepsilon^2}}$ term with probability ${1-o(1)}$.

As one consequence of the GUE hypothesis, we have ${x_{n+1} - x_n \leq \varepsilon^2}$ with probability ${O(\varepsilon^6)}$. Thus, if ${E := \{ m \in [T \log T, 2T \log T]: x_{m+1} - x_m \leq \varepsilon^2 \}}$, then ${E}$ has density ${O( \varepsilon^6 )}$. Applying the Hardy-Littlewood maximal inequality, we see that with probability ${1-O(\varepsilon^6)}$, we have

$\displaystyle \sup_{h \geq 1} \frac{1}{2h+1} \# ( E \cap [n-h, n+h] ) \leq \frac{1}{10}$

which implies in particular that

$\displaystyle |x_m - x_n|, |x_{m} - x_{n+1}| \gg \varepsilon^2 |m-n|$

for all ${m \in [T \log T, 2 T \log T] \backslash \{ n, n+1\}}$. This implies in particular that

$\displaystyle \sum_{m \in [T \log T, 2T \log T]: |m-n| \geq \varepsilon^{-3}} \frac{1}{(x_m - x_n)^2} + \frac{1}{(x_m - x_{n+1})^2} \ll \varepsilon^{-1}$

and so it will suffice to show that

$\displaystyle \frac{1}{(x_{n+1} - x_n)^2}$

$\displaystyle \geq 1.3 \sum_{m \in [T \log T, 2T \log T]: m \neq n,n+1; |m-n| < \varepsilon^{-3}} \frac{1}{(x_m - x_n)^2} + \frac{1}{(x_m - x_{n+1})^2} + \frac{1}{5\varepsilon^2}$

(say) with probability ${\gg \varepsilon^4 - o(1)}$.
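To spell out the tail bound used in this reduction: on the event that ${|x_m - x_n|, |x_m - x_{n+1}| \gg \varepsilon^2 |m-n|}$, each summand is ${O( \varepsilon^{-4} |m-n|^{-2} )}$, so that

$\displaystyle \sum_{|m-n| \geq \varepsilon^{-3}} \frac{1}{(x_m - x_n)^2} + \frac{1}{(x_m - x_{n+1})^2} \ll \varepsilon^{-4} \sum_{k \geq \varepsilon^{-3}} \frac{1}{k^2} \ll \varepsilon^{-4} \cdot \varepsilon^{3} = \varepsilon^{-1},$

and the resulting ${O(\varepsilon^{-1})}$ error is smaller than ${\frac{1}{5\varepsilon^2} - \frac{1}{6\varepsilon^2} = \frac{1}{30\varepsilon^2}}$ once ${\varepsilon}$ is small enough, which is why the constant ${\frac{1}{6}}$ can be traded for ${\frac{1}{5}}$.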

By the GUE hypothesis (and the fact that ${\varepsilon}$ is independent of ${T}$), it suffices to show that a Dyson sine process ${(x_n)_{n \in {\bf Z}}}$, normalised so that ${x_0}$ is the first positive point in the process, obeys the inequality

$\displaystyle \frac{1}{(x_{1} - x_0)^2} \geq 1.3 \sum_{|m| < \varepsilon^{-3}: m \neq 0,1} \frac{1}{(x_m - x_0)^2} + \frac{1}{(x_m - x_1)^2} \ \ \ \ \ (2)$

with probability ${\gg \varepsilon^4}$. However, if we let ${A > 0}$ be a moderately large constant (and assume ${\varepsilon}$ small depending on ${A}$), one can show using ${k}$-point correlation functions for the Dyson sine process (and the fact that the Dyson kernel ${K(x,y) = \sin(\pi(x-y))/\pi(x-y)}$ equals ${1}$ to second order at the origin) that

$\displaystyle {\bf E} N_{[-\varepsilon,0]} N_{[0,\varepsilon]} \gg \varepsilon^4$

$\displaystyle {\bf E} N_{[-\varepsilon,0]} \binom{N_{[0,\varepsilon]}}{2} \ll \varepsilon^7$

$\displaystyle {\bf E} \binom{N_{[-\varepsilon,0]}}{2} N_{[0,\varepsilon]} \ll \varepsilon^7$

$\displaystyle {\bf E} N_{[-\varepsilon,0]} N_{[0,\varepsilon]} N_{[\varepsilon,A^{-1}]} \ll A^{-3} \varepsilon^4$

$\displaystyle {\bf E} N_{[-\varepsilon,0]} N_{[0,\varepsilon]} N_{[-A^{-1}, -\varepsilon]} \ll A^{-3} \varepsilon^4$

$\displaystyle {\bf E} N_{[-\varepsilon,0]} N_{[0,\varepsilon]} N_{[-k, k]}^2 \ll k^2 \varepsilon^4 \ \ \ \ \ (3)$

for any natural number ${k}$, where ${N_{I}}$ denotes the number of elements of the process in ${I}$. For instance, the expression ${{\bf E} N_{[-\varepsilon,0]} \binom{N_{[0,\varepsilon]}}{2} }$ can be written in terms of the three-point correlation function ${\rho_3(x_1,x_2,x_3) = \mathrm{det}(K(x_i,x_j))_{1 \leq i,j \leq 3}}$ as

$\displaystyle \int_{-\varepsilon \leq x_1 \leq 0 \leq x_2 \leq x_3 \leq \varepsilon} \rho_3( x_1, x_2, x_3 )\ dx_1 dx_2 dx_3$

which can easily be estimated to be ${O(\varepsilon^7)}$ (since ${\rho_3 = O(\varepsilon^4)}$ in this region), and similarly for the other estimates claimed above.
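As a numerical sanity check on these correlation function estimates, the following Python sketch evaluates the two- and three-point determinants of the sine kernel at closely spaced points. The function names are illustrative, and the sample points are arbitrary choices rather than anything from the argument above; the sketch only illustrates the size of ${\rho_2}$ and ${\rho_3}$ at scale ${\varepsilon}$, and proves nothing.

```python
import math


def sine_kernel(x, y):
    """Dyson sine kernel K(x, y) = sin(pi (x - y)) / (pi (x - y)), with K(x, x) = 1."""
    d = math.pi * (x - y)
    return 1.0 if d == 0.0 else math.sin(d) / d


def rho2(x1, x2):
    """Two-point correlation: determinant of the 2x2 kernel matrix."""
    return 1.0 - sine_kernel(x1, x2) ** 2


def rho3(x1, x2, x3):
    """Three-point correlation: determinant of the symmetric 3x3 kernel
    matrix with unit diagonal, expanded by hand."""
    a = sine_kernel(x1, x2)
    b = sine_kernel(x1, x3)
    c = sine_kernel(x2, x3)
    return 1.0 + 2.0 * a * b * c - a * a - b * b - c * c


eps = 0.1
two_point = rho2(0.0, eps)              # close to (pi^2 / 3) * eps^2
three_point = rho3(-eps, eps / 2, eps)  # well below eps^4 for these sample points
```

The quadratic vanishing of ${\rho_2}$ on the diagonal is what produces the ${\varepsilon^4}$ lower bound for ${{\bf E} N_{[-\varepsilon,0]} N_{[0,\varepsilon]}}$ after integrating over a region of area ${\varepsilon^2}$.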

Since for natural numbers ${a,b}$, the quantity ${ab - 2 a \binom{b}{2} - 2 b \binom{a}{2} = ab (3-a-b)}$ is only positive when ${a=b=1}$, we see from the first three estimates that the event ${E}$ that ${N_{[-\varepsilon,0]} = N_{[0,\varepsilon]} = 1}$ occurs with probability ${\gg \varepsilon^4}$. In particular, by Markov’s inequality we have the conditional probabilities

$\displaystyle {\bf P} ( N_{[\varepsilon,A^{-1}]} \geq 1 | E ) \ll A^{-3}$

$\displaystyle {\bf P} ( N_{[-A^{-1}, -\varepsilon]} \geq 1 | E ) \ll A^{-3}$

$\displaystyle {\bf P} ( N_{[-k, k]} \geq A k^{5/3} | E ) \ll A^{-2} k^{-4/3}$

and thus, if ${A}$ is large enough, and ${\varepsilon}$ small enough, it will be true with probability ${\gg \varepsilon^4}$ that

$\displaystyle N_{[-\varepsilon,0]}, N_{[0,\varepsilon]} = 1$

and

$\displaystyle N_{[-A^{-1}, -\varepsilon]} = N_{[\varepsilon, A^{-1}]} = 0$

and simultaneously that

$\displaystyle N_{[-k,k]} \leq A k^{5/3}$

for all natural numbers ${k}$. This implies in particular that

$\displaystyle x_1 - x_0 \leq 2\varepsilon$

and

$\displaystyle |x_m - x_0|, |x_m - x_1| \gg_A |m|^{3/5}$

for all ${m \neq 0,1}$, which gives (2) for ${\varepsilon}$ small enough.
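The counting step in this argument rests on the elementary fact that ${ab - 2 a \binom{b}{2} - 2 b \binom{a}{2}}$ never exceeds the indicator of ${a=b=1}$ for non-negative integers ${a,b}$, so that its expectation is a lower bound for the probability of the event ${N_{[-\varepsilon,0]} = N_{[0,\varepsilon]} = 1}$. Here is a brute-force check of that fact over a finite range; the function names and the range are illustrative choices.

```python
from math import comb


def counting_quantity(a, b):
    """The combination a*b - 2*a*C(b,2) - 2*b*C(a,2) used in the counting step."""
    return a * b - 2 * a * comb(b, 2) - 2 * b * comb(a, 2)


def indicator_one_one(a, b):
    """Indicator of the event a = b = 1."""
    return 1 if (a, b) == (1, 1) else 0


LIMIT = 40  # arbitrary finite range for the brute-force check

# The quantity is dominated by the indicator for every pair in range ...
dominated = all(
    counting_quantity(a, b) <= indicator_one_one(a, b)
    for a in range(LIMIT)
    for b in range(LIMIT)
)

# ... and is strictly positive only at (a, b) = (1, 1).
positive_pairs = [
    (a, b)
    for a in range(LIMIT)
    for b in range(LIMIT)
    if counting_quantity(a, b) > 0
]
```

Taking expectations with ${a = N_{[-\varepsilon,0]}}$ and ${b = N_{[0,\varepsilon]}}$ then gives a probability of at least ${\gg \varepsilon^4 - O(\varepsilon^7)}$ for the event in question.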

Remark 1 The above argument needed the GUE hypothesis for correlations up to fourth order (in order to establish (3)). It might be possible to reduce the number of correlations needed, but I do not see how to obtain the claim using pair correlations alone.

Brad Rodgers and I have uploaded to the arXiv our paper “The De Bruijn-Newman constant is non-negative”. This paper affirms a conjecture of Newman regarding the extent to which the Riemann hypothesis, if true, is only “barely so”. To describe the conjecture, let us begin with the Riemann xi function

$\displaystyle \xi(s) := \frac{s(s-1)}{2} \pi^{-s/2} \Gamma(\frac{s}{2}) \zeta(s)$

where ${\Gamma(s) := \int_0^\infty e^{-t} t^{s-1}\ dt}$ is the Gamma function and ${\zeta(s) := \sum_{n=1}^\infty \frac{1}{n^s}}$ is the Riemann zeta function. Initially, this function is only defined for ${\mathrm{Re} s > 1}$, but, as was already known to Riemann, we can manipulate it into a form that extends to the entire complex plane as follows. Firstly, in view of the standard identity ${s \Gamma(s) = \Gamma(s+1)}$, we can write

$\displaystyle \frac{s(s-1)}{2} \Gamma(\frac{s}{2}) = 2 \Gamma(\frac{s+4}{2}) - 3 \Gamma( \frac{s+2}{2} )$

and hence

$\displaystyle \xi(s) = \sum_{n=1}^\infty 2 \pi^{-s/2} n^{-s} \int_0^\infty e^{-t} t^{\frac{s+4}{2}-1}\ dt - 3 \pi^{-s/2} n^{-s} \int_0^\infty e^{-t} t^{\frac{s+2}{2}-1}\ dt.$

By a rescaling, one may write

$\displaystyle \int_0^\infty e^{-t} t^{\frac{s+4}{2}-1}\ dt = (\pi n^2)^{\frac{s+4}{2}} \int_0^\infty e^{-\pi n^2 t} t^{\frac{s+4}{2}-1}\ dt$

and similarly

$\displaystyle \int_0^\infty e^{-t} t^{\frac{s+2}{2}-1}\ dt = (\pi n^2)^{\frac{s+2}{2}} \int_0^\infty e^{-\pi n^2 t} t^{\frac{s+2}{2}-1}\ dt$

and thus (after applying Fubini’s theorem)

$\displaystyle \xi(s) = \int_0^\infty \sum_{n=1}^\infty 2 \pi^2 n^4 e^{-\pi n^2 t} t^{\frac{s+4}{2}-1} - 3 \pi n^2 e^{-\pi n^2 t} t^{\frac{s+2}{2}-1}\ dt.$

We’ll make the change of variables ${t = e^{4u}}$ to obtain

$\displaystyle \xi(s) = 4 \int_{\bf R} \sum_{n=1}^\infty (2 \pi^2 n^4 e^{8u} - 3 \pi n^2 e^{4u}) \exp( 2su - \pi n^2 e^{4u} )\ du.$

If we introduce the mild renormalisation

$\displaystyle H_0(z) := \frac{1}{8} \xi( \frac{1}{2} + \frac{iz}{2} )$

of ${\xi}$, we then conclude (at least for ${\mathrm{Im} z < -1}$, which corresponds to ${\mathrm{Re} s > 1}$) that

$\displaystyle H_0(z) = \frac{1}{2} \int_{\bf R} \Phi(u)\exp(izu)\ du \ \ \ \ \ (1)$

where ${\Phi: {\bf R} \rightarrow {\bf C}}$ is the function

$\displaystyle \Phi(u) := \sum_{n=1}^\infty (2 \pi^2 n^4 e^{9u} - 3 \pi n^2 e^{5u}) \exp( - \pi n^2 e^{4u} ), \ \ \ \ \ (2)$

which one can verify to be rapidly decreasing both as ${u \rightarrow +\infty}$ and as ${u \rightarrow -\infty}$, with the decrease as ${u \rightarrow +\infty}$ faster than any exponential. In particular ${H_0}$ extends holomorphically to the lower half plane.

If we normalize the Fourier transform ${{\mathcal F} f(\xi)}$ of a (Schwartz) function ${f(x)}$ as ${{\mathcal F} f(\xi) := \int_{\bf R} f(x) e^{-2\pi i \xi x}\ dx}$, it is well known that the Gaussian ${x \mapsto e^{-\pi x^2}}$ is its own Fourier transform. The creation operator ${2\pi x - \frac{d}{dx}}$ interacts with the Fourier transform by the identity

$\displaystyle {\mathcal F} (( 2\pi x - \frac{d}{dx} ) f) (\xi) = -i (2 \pi \xi - \frac{d}{d\xi} ) {\mathcal F} f(\xi).$

Since ${(-i)^4 = 1}$, this implies that the function

$\displaystyle x \mapsto (2\pi x - \frac{d}{dx})^4 e^{-\pi x^2} = 128 \pi^2 (2 \pi^2 x^4 - 3 \pi x^2) e^{-\pi x^2} + 48 \pi^2 e^{-\pi x^2}$

is its own Fourier transform. (One can view the polynomial ${128 \pi^2 (2\pi^2 x^4 - 3 \pi x^2) + 48 \pi^2}$ as a renormalised version of the fourth Hermite polynomial.) Taking a suitable linear combination of this with ${x \mapsto e^{-\pi x^2}}$, we conclude that

$\displaystyle x \mapsto (2 \pi^2 x^4 - 3 \pi x^2) e^{-\pi x^2}$

is also its own Fourier transform. Rescaling ${x}$ by ${e^{2u}}$ and then multiplying by ${e^u}$, we conclude that the Fourier transform of

$\displaystyle x \mapsto (2 \pi^2 x^4 e^{9u} - 3 \pi x^2 e^{5u}) \exp( - \pi x^2 e^{4u} )$

is

$\displaystyle x \mapsto (2 \pi^2 x^4 e^{-9u} - 3 \pi x^2 e^{-5u}) \exp( - \pi x^2 e^{-4u} ),$

and hence by the Poisson summation formula (using symmetry and vanishing at ${n=0}$ to unfold the ${n}$ summation in (2) to the integers rather than the natural numbers) we obtain the functional equation

$\displaystyle \Phi(-u) = \Phi(u),$

which implies that ${\Phi}$ and ${H_0}$ are even functions (in particular, ${H_0}$ now extends to an entire function). From this symmetry we can also rewrite (1) as

$\displaystyle H_0(z) = \int_0^\infty \Phi(u) \cos(zu)\ du,$

which now gives a convergent expression for the entire function ${H_0(z)}$ for all complex ${z}$. As ${\Phi}$ is even and real-valued on ${{\bf R}}$, ${H_0(z)}$ is even and also obeys the functional equation ${H_0(\overline{z}) = \overline{H_0(z)}}$, which is equivalent to the usual functional equation for the Riemann zeta function. The Riemann hypothesis is equivalent to the claim that all the zeroes of ${H_0}$ are real.
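The functional equation ${\Phi(-u) = \Phi(u)}$ can be checked numerically by truncating the series (2). The sketch below does this in plain Python; the function name `phi` and the truncation at 200 terms are illustrative assumptions, adequate only for moderate ${|u|}$ (for very negative ${u}$ many more terms, and more care with cancellation, would be needed).

```python
import math


def phi(u, terms=200):
    """Truncated version of the series (2) defining Phi(u)."""
    total = 0.0
    for n in range(1, terms + 1):
        poly = (2 * math.pi ** 2 * n ** 4 * math.exp(9 * u)
                - 3 * math.pi * n ** 2 * math.exp(5 * u))
        total += poly * math.exp(-math.pi * n ** 2 * math.exp(4 * u))
    return total


# Largest discrepancy between Phi(u) and Phi(-u) at a few sample points.
max_asymmetry = max(abs(phi(u) - phi(-u)) for u in (0.1, 0.3, 0.5))

# Value at the origin, dominated by the n = 1 term of the series.
phi_at_zero = phi(0.0)
```

For ${u > 0}$ the series converges extremely quickly, whereas for ${u < 0}$ the individual terms are of moderate size and the small value of ${\Phi(u)}$ arises from cancellation, which is the numerical shadow of the Poisson summation argument above.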

De Bruijn introduced the family ${H_t: {\bf C} \rightarrow {\bf C}}$ of deformations of ${H_0: {\bf C} \rightarrow {\bf C}}$, defined for all ${t \in {\bf R}}$ and ${z \in {\bf C}}$ by the formula

$\displaystyle H_t(z) := \int_0^\infty e^{tu^2} \Phi(u) \cos(zu)\ du.$

From a PDE perspective, one can view ${H_t}$ as the evolution of ${H_0}$ under the backwards heat equation ${\partial_t H_t(z) = - \partial_{zz} H_t(z)}$. As with ${H_0}$, the ${H_t}$ are all even entire functions that obey the functional equation ${H_t(\overline{z}) = \overline{H_t(z)}}$, and one can ask an analogue of the Riemann hypothesis for each such ${H_t}$, namely whether all the zeroes of ${H_t}$ are real. De Bruijn showed that these hypotheses were monotone in ${t}$: if ${H_t}$ had all real zeroes for some ${t}$, then ${H_{t'}}$ would also have all zeroes real for any ${t' \geq t}$. Newman later sharpened this claim by showing the existence of a finite number ${\Lambda \leq 1/2}$, now known as the de Bruijn-Newman constant, with the property that ${H_t}$ had all zeroes real if and only if ${t \geq \Lambda}$. Thus, the Riemann hypothesis is equivalent to the inequality ${\Lambda \leq 0}$. Newman then conjectured the complementary bound ${\Lambda \geq 0}$; in his words, this conjecture asserted that if the Riemann hypothesis is true, then it is only “barely so”, in that the reality of all the zeroes is destroyed by applying heat flow for even an arbitrarily small amount of time. Over time, a significant amount of evidence was established in favour of this conjecture; most recently, in 2011, Saouter, Gourdon, and Demichel showed that ${\Lambda \geq -1.15 \times 10^{-11}}$.

In this paper we finish off the proof of Newman’s conjecture, that is we show that ${\Lambda \geq 0}$. The proof is by contradiction, assuming that ${\Lambda < 0}$ (which among other things, implies the truth of the Riemann hypothesis), and using the properties of backwards heat evolution to reach a contradiction.

Very roughly, the argument proceeds as follows. As observed by Csordas, Smith, and Varga (and also discussed in this previous blog post), the backwards heat evolution of the ${H_t}$ introduces a nice ODE dynamics on the zeroes ${x_j(t)}$ of ${H_t}$, namely that they solve the ODE

$\displaystyle \frac{d}{dt} x_j(t) = -2 \sum_{j \neq k} \frac{1}{x_k(t) - x_j(t)} \ \ \ \ \ (3)$

for all ${j}$ (one has to interpret the sum in a principal value sense as it is not absolutely convergent, but let us ignore this technicality for the current discussion). Intuitively, this ODE is asserting that the zeroes ${x_j(t)}$ repel each other, somewhat like positively charged particles (but note that the dynamics is first-order, as opposed to the second-order laws of Newtonian mechanics). Formally, a steady state (or equilibrium) of this dynamics is reached when the ${x_k(t)}$ are arranged in an arithmetic progression. (Note for instance that for any positive ${u}$, the functions ${z \mapsto e^{tu^2} \cos(uz)}$ obey the same backwards heat equation as ${H_t}$, and their zeroes are on a fixed arithmetic progression ${\{ \frac{\pi (k+\tfrac{1}{2})}{u}: k \in {\bf Z} \}}$.) The strategy is to then show that the dynamics from time ${\Lambda}$ to time ${0}$ creates a convergence to local equilibrium, in which the zeroes ${x_k(t)}$ locally resemble an arithmetic progression at time ${t=0}$. This will be in contradiction with known results on pair correlation of zeroes (or on related statistics, such as the fluctuations on gaps between zeroes), such as the results of Montgomery (actually for technical reasons it is slightly more convenient for us to use related results of Conrey, Ghosh, Goldston, Gonek, and Heath-Brown). Another way of thinking about this is that even very slight deviations from local equilibrium (such as a small number of gaps that are slightly smaller than the average spacing) will almost immediately lead to zeroes colliding with each other and leaving the real line as one evolves backwards in time (i.e., under the forward heat flow). This is a refinement of the strategy used in previous lower bounds on ${\Lambda}$, in which “Lehmer pairs” (pairs of zeroes of the zeta function that were unusually close to each other) were used to limit the extent to which the evolution continued backwards in time while keeping all zeroes real.

How does one obtain this convergence to local equilibrium? We proceed by broad analogy with the “local relaxation flow” method of Erdos, Schlein, and Yau in random matrix theory, in which one combines some initial control on zeroes (which, in the case of the Erdos-Schlein-Yau method, is referred to with terms such as “local semicircular law”) with convexity properties of a relevant Hamiltonian that can be used to force the zeroes towards equilibrium.

We first discuss the initial control on zeroes. For ${H_0}$, we have the classical Riemann-von Mangoldt formula, which asserts that the number of zeroes in the interval ${[0,T]}$ is ${\frac{T}{4\pi} \log \frac{T}{4\pi} - \frac{T}{4\pi} + O(\log T)}$ as ${T \rightarrow \infty}$. (We have a factor of ${4\pi}$ here instead of the more familiar ${2\pi}$ due to the way ${H_0}$ is normalised.) This implies for instance that for a fixed ${\alpha}$, the number of zeroes in the interval ${[T, T+\alpha]}$ is ${\frac{\alpha}{4\pi} \log T + O(\log T)}$. Actually, because we get to assume the Riemann hypothesis, we can sharpen this to ${\frac{\alpha}{4\pi} \log T + o(\log T)}$, a result of Littlewood (see this previous blog post for a proof). Ideally, we would like to obtain similar control for the other ${H_t}$, ${\Lambda \leq t < 0}$, as well. Unfortunately we were only able to obtain the weaker claims that the number of zeroes of ${H_t}$ in ${[0,T]}$ is ${\frac{T}{4\pi} \log \frac{T}{4\pi} - \frac{T}{4\pi} + O(\log^2 T)}$, and that the number of zeroes in ${[T, T+\alpha \log T]}$ is ${\frac{\alpha}{4 \pi} \log^2 T + o(\log^2 T)}$, that is to say we only get good control on the distribution of zeroes at scales ${\gg \log T}$ rather than at scales ${\gg 1}$. Ultimately this is because we were only able to get control (and in particular, lower bounds) on ${|H_t(x-iy)|}$ with high precision when ${y \gg \log x}$ (whereas ${|H_0(x-iy)|}$ has good estimates as soon as ${y}$ is larger than (say) ${2}$). This control is obtained by expressing ${H_t(x-iy)}$ in terms of some contour integrals and using the method of steepest descent (actually it is slightly simpler to rely instead on the Stirling approximation for the Gamma function, which can be proven in turn by steepest descent methods). Fortunately, it turns out that this weaker control is still (barely) enough for the rest of our argument to go through.

Once one has the initial control on zeroes, we now need to force convergence to local equilibrium by exploiting convexity of a Hamiltonian. Here, the relevant Hamiltonian is

$\displaystyle H(t) := \sum_{j,k: j \neq k} \log \frac{1}{|x_j(t) - x_k(t)|},$

ignoring for now the rather important technical issue that this sum is not actually absolutely convergent. (Because of this, we will need to truncate and renormalise the Hamiltonian in a number of ways which we will not detail here.) The ODE (3) is formally the gradient flow for this Hamiltonian. Furthermore, this Hamiltonian is a convex function of the ${x_j}$ (because ${t \mapsto \log \frac{1}{t}}$ is a convex function on ${(0,+\infty)}$). We therefore expect the Hamiltonian to be a decreasing function of time, and that the derivative should be an increasing function of time. As time passes, the derivative of the Hamiltonian would then be expected to converge to zero, which should imply convergence to local equilibrium.

Formally, the derivative of the above Hamiltonian is

$\displaystyle \partial_t H(t) = -4 E(t), \ \ \ \ \ (4)$

where ${E(t)}$ is the “energy”

$\displaystyle E(t) := \sum_{j,k: j \neq k} \frac{1}{|x_j(t) - x_k(t)|^2}.$

Again, there is the important technical issue that this quantity is infinite; but it turns out that if we renormalise the Hamiltonian appropriately, then the energy will also become suitably renormalised, and in particular will vanish when the ${x_j}$ are arranged in an arithmetic progression, and be positive otherwise. One can also formally calculate the derivative of ${E(t)}$ to be a somewhat complicated but manifestly non-positive quantity (the negative of a sum of squares); see this previous blog post for analogous computations in the case of heat flow on polynomials. After flowing from time ${\Lambda}$ to time ${0}$, and using some crude initial bounds on ${H(t)}$ and ${E(t)}$ in this region (coming from the Riemann-von Mangoldt type formulae mentioned above and some further manipulations), we can eventually show that the (renormalisation of the) energy ${E(0)}$ at time zero is small, which forces the ${x_j}$ to locally resemble an arithmetic progression, which gives the required convergence to local equilibrium.
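For a finite configuration of points (an assumption made purely for illustration; the actual sums are infinite and require the renormalisations alluded to above), the identity (4) can be checked directly. The sketch below computes the velocities from the ODE (3), uses the gradient flow structure ${\frac{d}{dt} x_j = - \partial_{x_j} H}$ to write ${\partial_t H = - \sum_j (\frac{d}{dt} x_j)^2}$, and compares this with ${-4E}$. The function names and sample points are hypothetical.

```python
def velocities(xs):
    """Right-hand side of the ODE (3) for a finite list of distinct reals."""
    return [
        -2.0 * sum(1.0 / (xk - xj) for k, xk in enumerate(xs) if k != j)
        for j, xj in enumerate(xs)
    ]


def energy(xs):
    """E = sum over ordered pairs j != k of 1 / |x_j - x_k|^2."""
    return sum(
        1.0 / (xj - xk) ** 2
        for j, xj in enumerate(xs)
        for k, xk in enumerate(xs)
        if k != j
    )


def hamiltonian_derivative(xs):
    """Time derivative of H along the flow: since dx_j/dt = -dH/dx_j,
    the chain rule gives dH/dt = -sum_j (dx_j/dt)^2."""
    return -sum(v * v for v in velocities(xs))


sample = [0.0, 1.0, 2.5, 4.0, 7.0]
discrepancy = abs(hamiltonian_derivative(sample) + 4.0 * energy(sample))
```

The agreement reflects the cancellation of the cross terms ${\frac{1}{(x_j-x_k)(x_j-x_l)}}$ when summed cyclically over distinct ${j,k,l}$, which is what turns the square of the velocity into the energy.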

There are a number of technicalities involved in making the above sketch of argument rigorous (for instance, justifying interchanges of derivatives and infinite sums turns out to be a little bit delicate). I will highlight here one particular technical point. One of the ways in which we make expressions such as the energy ${E(t)}$ finite is to truncate the indices ${j,k}$ to an interval ${I}$ to create a truncated energy ${E_I(t)}$. In typical situations, we would then expect ${E_I(t)}$ to be decreasing, which will greatly help in bounding ${E_I(0)}$ (in particular it would allow one to control ${E_I(0)}$ by time-averaged quantities such as ${\int_{\Lambda/2}^0 E_I(t)\ dt}$, which can in turn be controlled using variants of (4)). However, there are boundary effects at both ends of ${I}$ that could in principle add a large amount of energy into ${E_I}$, which is bad news as it could conceivably make ${E_I(0)}$ undesirably large even if integrated energies such as ${\int_{\Lambda/2}^0 E_I(t)\ dt}$ remain adequately controlled. As it turns out, such boundary effects are negligible as long as there is a large gap between adjacent zeroes at boundary of ${I}$ – it is only narrow gaps that can rapidly transmit energy across the boundary of ${I}$. Now, narrow gaps can certainly exist (indeed, the GUE hypothesis predicts these happen a positive fraction of the time); but the pigeonhole principle (together with the Riemann-von Mangoldt formula) can allow us to pick the endpoints of the interval ${I}$ so that no narrow gaps appear at the boundary of ${I}$ for any given time ${t}$. However, there was a technical problem: this argument did not allow one to find a single interval ${I}$ that avoided gaps for all times ${\Lambda/2 \leq t \leq 0}$ simultaneously – the pigeonhole principle could produce a different interval ${I}$ for each time ${t}$! Since the number of times was uncountable, this was a serious issue. 
(In physical terms, the problem was that there might be very fast “longitudinal waves” in the dynamics that, at each time, cause some gaps between zeroes to be highly compressed, but the specific gap that was narrow changed very rapidly with time. Such waves could, in principle, import a huge amount of energy into ${E_I}$ by time ${0}$.) To resolve this, we borrowed a PDE trick of Bourgain’s, in which the pigeonhole principle was coupled with local conservation laws. More specifically, we use the phenomenon that very narrow gaps ${g_i = x_{i+1}-x_i}$ take a nontrivial amount of time to expand back to a reasonable size (this can be seen by comparing the evolution of this gap with solutions of the scalar ODE ${\partial_t g = \frac{4}{g}}$, which represents the fastest at which a gap such as ${g_i}$ can expand). Thus, if a gap ${g_i}$ is reasonably large at some time ${t_0}$, it will also stay reasonably large at slightly earlier times ${t \in [t_0-\delta, t_0]}$ for some moderately small ${\delta>0}$. This lets one locate an interval ${I}$ that has manageable boundary effects during the times in ${[t_0-\delta, t_0]}$, so in particular ${E_I}$ is basically non-increasing in this time interval. Unfortunately, this interval is a little bit too short to cover all of ${[\Lambda/2,0]}$; however it turns out that one can iterate the above construction and find a nested sequence of intervals ${I_k}$, with each ${E_{I_k}}$ non-increasing in a different time interval ${[t_k - \delta, t_k]}$, and with all of the time intervals covering ${[\Lambda/2,0]}$. This turns out to be enough (together with the obvious fact that ${E_I}$ is monotone in ${I}$) to still control ${E_I(0)}$ for some reasonably sized interval ${I}$, as required for the rest of the arguments.
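The gap comparison can be illustrated on a finite toy system. Subtracting the ODE (3) for two adjacent zeroes shows that the mutual repulsion contributes ${\frac{4}{g}}$ to the growth of the gap ${g}$, while the remaining zeroes only slow the expansion, so for two isolated points the gap is exactly ${\sqrt{g_0^2 + 8t}}$ and otherwise it is smaller. The Python sketch below integrates (3) with a classical Runge-Kutta step; the finite truncation, the step count, and all names are assumptions made for illustration only.

```python
import math


def velocities(xs):
    """Right-hand side of the ODE (3) for a finite list of distinct reals."""
    return [
        -2.0 * sum(1.0 / (xk - xj) for k, xk in enumerate(xs) if k != j)
        for j, xj in enumerate(xs)
    ]


def rk4_step(xs, dt):
    """One classical fourth-order Runge-Kutta step for the system (3)."""
    k1 = velocities(xs)
    k2 = velocities([x + 0.5 * dt * v for x, v in zip(xs, k1)])
    k3 = velocities([x + 0.5 * dt * v for x, v in zip(xs, k2)])
    k4 = velocities([x + dt * v for x, v in zip(xs, k3)])
    return [
        x + dt * (a + 2 * b + 2 * c + d) / 6.0
        for x, a, b, c, d in zip(xs, k1, k2, k3, k4)
    ]


def evolve(xs, t_total, steps=2000):
    """Evolve the configuration forward by time t_total."""
    dt = t_total / steps
    for _ in range(steps):
        xs = rk4_step(xs, dt)
    return xs


def fastest_gap(g0, t):
    """Solution of the scalar comparison ODE dg/dt = 4/g with g(0) = g0."""
    return math.sqrt(g0 * g0 + 8.0 * t)


pair = evolve([0.0, 0.5], 1.0)
pair_gap = pair[1] - pair[0]              # matches fastest_gap(0.5, 1.0)

crowd = evolve([-3.0, 0.0, 0.5, 4.0], 1.0)
crowd_gap = crowd[2] - crowd[1]           # expands more slowly
```

Read backwards in time, the same comparison says that a gap of size ${g}$ at time ${t_0}$ cannot have been smaller than ${\sqrt{g^2 - 8\delta}}$ at time ${t_0 - \delta}$, which is the quantitative form of the persistence of large gaps.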

ADDED LATER: the following analogy (involving functions with just two zeroes, rather than an infinite number of zeroes) may help clarify the relation between this result and the Riemann hypothesis (and in particular why this result does not make the Riemann hypothesis any easier to prove, in fact it confirms the delicate nature of that hypothesis). Suppose one had a quadratic polynomial ${P}$ of the form ${P(z) = z^2 + \Lambda}$, where ${\Lambda}$ was an unknown real constant. Suppose that one was for some reason interested in the analogue of the “Riemann hypothesis” for ${P}$, namely that all the zeroes of ${P}$ are real. A priori, there are three scenarios:

• (Riemann hypothesis false) ${\Lambda > 0}$, and ${P}$ has zeroes ${\pm i |\Lambda|^{1/2}}$ off the real axis.
• (Riemann hypothesis true, but barely so) ${\Lambda = 0}$, and both zeroes of ${P}$ are on the real axis; however, any slight perturbation of ${\Lambda}$ in the positive direction would move zeroes off the real axis.
• (Riemann hypothesis true, with room to spare) ${\Lambda < 0}$, and both zeroes of ${P}$ are on the real axis. Furthermore, any slight perturbation of ${P}$ will also have both zeroes on the real axis.

The analogue of our result in this case is that ${\Lambda \geq 0}$, thus ruling out the third of the three scenarios here. In this simple example in which only two zeroes are involved, one can think of the inequality ${\Lambda \geq 0}$ as asserting that if the zeroes of ${P}$ are real, then they must be repeated. In our result (in which there are an infinity of zeroes, that become increasingly dense near infinity), and in view of the convergence to local equilibrium properties of (3), the analogous assertion is that if the zeroes of ${H_0}$ are real, then they do not behave locally as if they were in arithmetic progression.

The Polymath14 online collaboration has uploaded to the arXiv its paper “Homogeneous length functions on groups”, submitted to Algebra & Number Theory. The paper completely classifies homogeneous length functions ${\| \|: G \rightarrow {\bf R}^+}$ on an arbitrary group ${G = (G,\cdot,e,()^{-1})}$, that is to say non-negative functions that obey the symmetry condition ${\|x^{-1}\| = \|x\|}$, the non-degeneracy condition ${\|x\|=0 \iff x=e}$, the triangle inequality ${\|xy\| \leq \|x\| + \|y\|}$, and the homogeneity condition ${\|x^2\| = 2\|x\|}$. It turns out that these norms can only arise from pulling back the norm of a Banach space by an isometric embedding of the group. Among other things, this shows that ${G}$ supports a homogeneous length function if and only if it is abelian and torsion free, thus giving a metric description of this property.

The proof is based on repeated use of the homogeneous length function axioms, combined with elementary identities of commutators, to obtain increasingly good bounds on quantities such as ${\|[x,y]\|}$, until one can show that such norms have to vanish. See the previous post for a full proof. The result is robust in that it allows for some loss in the triangle inequality and homogeneity condition, allowing for some new results on “quasinorms” on groups that relate to quasihomomorphisms.

As there are now a large number of comments on the previous post on this project, this post will also serve as the new thread for any final discussion of this project as it winds down.

Kaisa Matomaki, Maksym Radziwill, and I have uploaded to the arXiv our paper “Correlations of the von Mangoldt and higher divisor functions II. Divisor correlations in short ranges”. This is a sequel of sorts to our previous paper on divisor correlations, though the proof techniques in this paper are rather different. As with the previous paper, our interest is in correlations such as

$\displaystyle \sum_{n \leq X} d_k(n) d_l(n+h) \ \ \ \ \ (1)$

for medium-sized ${h}$ and large ${X}$, where ${k \geq l \geq 1}$ are natural numbers and ${d_k(n) = \sum_{n = m_1 \dots m_k} 1}$ is the ${k^{th}}$ divisor function (actually our methods can also treat a generalisation in which ${k}$ is non-integer, but for simplicity let us stick with the integer case for this discussion). Our methods also allow for one of the divisor function factors to be replaced with a von Mangoldt function, but (in contrast to the previous paper) we cannot treat the case when both factors are von Mangoldt.
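For small ${X}$ the correlation (1) can simply be computed by brute force, which is handy for experimenting with the shape of the main term. The following Python sketch tabulates ${d_k}$ by repeated Dirichlet convolution with the constant function ${1}$ and then evaluates the sum; the function names are illustrative, and the shift ${h}$ is assumed non-negative for simplicity.

```python
def divisor_function_table(k, limit):
    """Return a list t with t[n] = d_k(n) for 1 <= n <= limit (t[0] is unused).
    Uses d_1 = 1 and d_{j+1}(n) = sum of d_j(d) over the divisors d of n."""
    table = [0] + [1] * limit
    for _ in range(k - 1):
        new = [0] * (limit + 1)
        for d in range(1, limit + 1):
            for multiple in range(d, limit + 1, d):
                new[multiple] += table[d]
        table = new
    return table


def divisor_correlation(k, l, h, X):
    """Brute-force value of sum_{n <= X} d_k(n) d_l(n + h), assuming h >= 0."""
    dk = divisor_function_table(k, X)
    dl = divisor_function_table(l, X + h)
    return sum(dk[n] * dl[n + h] for n in range(1, X + 1))


d2 = divisor_function_table(2, 12)
d3 = divisor_function_table(3, 12)
small_correlation = divisor_correlation(2, 2, 1, 3)  # d(1)d(2) + d(2)d(3) + d(3)d(4)
```

The tabulation costs about ${k X \log X}$ operations, so this is only suitable for numerical experiments and says nothing about the asymptotic regime treated in the paper.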

As discussed in this previous post, one heuristically expects an asymptotic of the form

$\displaystyle \sum_{n \leq X} d_k(n) d_l(n+h) = P_{k,l,h}( \log X ) X + O( X^{1/2+\varepsilon})$

for any fixed ${\varepsilon>0}$, where ${P_{k,l,h}}$ is a certain explicit (but rather complicated) polynomial of degree ${k+l-2}$. Such asymptotics are known when ${l \leq 2}$, but remain open for ${k \geq l \geq 3}$. In the previous paper, we were able to obtain a weaker bound of the form

$\displaystyle \sum_{n \leq X} d_k(n) d_l(n+h) = P_{k,l,h}( \log X ) X + O_A( X \log^{-A} X)$

for ${1-O_A(\log^{-A} X)}$ of the shifts ${-H \leq h \leq H}$, whenever the shift range ${H}$ lies between ${X^{8/33+\varepsilon}}$ and ${X^{1-\varepsilon}}$. But the methods become increasingly hard to use as ${H}$ gets smaller. In this paper, we use a rather different method to obtain the even weaker bound

$\displaystyle \sum_{n \leq X} d_k(n) d_l(n+h) = (1+o(1)) P_{k,l,h}( \log X ) X$

for ${1-o(1)}$ of the shifts ${-H \leq h \leq H}$, where ${H}$ can now be as short as ${H = \log^{10^4 k \log k} X}$. The constant ${10^4}$ can be improved, but there are serious obstacles to using our method to go below ${\log^{k \log k} X}$ (as the exceptionally large values of ${d_k}$ then begin to dominate). This can be viewed as an analogue to our previous paper on correlations of bounded multiplicative functions on average, in which the functions ${d_k,d_l}$ are now unbounded, and indeed our proof strategy is based in large part on that paper (but with many significant new technical complications).

We now discuss some of the ingredients of the proof. Unsurprisingly, the first step is the circle method, expressing (1) in terms of exponential sums such as

$\displaystyle S(\alpha) := \sum_{n \leq X} d_k(n) e(\alpha n).$

Actually, it is convenient to first prune ${d_k}$ slightly by zeroing out this function on “atypical” numbers ${n}$ that have an unusually small or large number of factors in a certain sense, but let us ignore this technicality for this discussion. The contribution of ${S(\alpha)}$ for “major arc” ${\alpha}$ can be treated by standard techniques (and is the source of the main term ${P_{k,l,h}(\log X) X}$); the main difficulty comes from treating the contribution of “minor arc” ${\alpha}$.
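To get a feel for the exponential sum ${S(\alpha)}$, here is a small Python sketch for the case ${k=2}$, with the usual convention ${e(\theta) := e^{2\pi i \theta}}$ (this convention, the function names, and the choice of a discrete frequency grid are assumptions made for illustration). It also checks a discrete form of the Plancherel identity: averaging ${|S(a/N)|^2}$ over ${a = 0, \dots, N-1}$ with ${N > X}$ recovers ${\sum_{n \leq X} d_2(n)^2}$.

```python
import cmath


def divisor_count(n):
    """d_2(n): the number of positive divisors of n (naive trial division)."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)


def exponential_sum(alpha, X, weight=divisor_count):
    """S(alpha) = sum_{n <= X} weight(n) e(alpha n), with e(t) = exp(2 pi i t)."""
    return sum(
        weight(n) * cmath.exp(2j * cmath.pi * alpha * n)
        for n in range(1, X + 1)
    )


def mean_square_on_grid(X, N, weight=divisor_count):
    """Average of |S(a/N)|^2 over a = 0, ..., N-1. For N > X the frequencies
    a/N separate the integers 1..X, so this equals sum_{n <= X} weight(n)^2."""
    return sum(abs(exponential_sum(a / N, X, weight)) ** 2 for a in range(N)) / N


X, N = 20, 32
value_at_zero = exponential_sum(0.0, X)   # equals sum_{n <= X} d_2(n)
grid_mean = mean_square_on_grid(X, N)
l2_sum = sum(divisor_count(n) ** 2 for n in range(1, X + 1))
```

The quantity `l2_sum` is the ${\ell^2}$ sum whose size governs what a global ${L^2}$ bound on ${S}$ can deliver.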

In our previous paper on bounded multiplicative functions, we used Plancherel’s theorem to estimate the global ${L^2}$ norm ${\int_{{\bf R}/{\bf Z}} |S(\alpha)|^2\ d\alpha}$, and then also used the Katai-Bourgain-Sarnak-Ziegler orthogonality criterion to control local ${L^2}$ norms ${\int_I |S(\alpha)|^2\ d\alpha}$, where ${I}$ was a minor arc interval of length about ${1/H}$, and these two estimates together were sufficient to get a good bound on correlations by an application of Hölder’s inequality. For ${d_k}$, it is more convenient to use Dirichlet series methods (and Ramaré-type factorisations of such Dirichlet series) to control local ${L^2}$ norms on minor arcs, in the spirit of the proof of the Matomaki-Radziwill theorem; a key point is to develop “log-free” mean value theorems for Dirichlet series associated to functions such as ${d_k}$, so as not to wipe out the (rather small) savings one will get over the trivial bound from this method. On the other hand, the global ${L^2}$ bound will definitely be unusable, because the ${\ell^2}$ sum ${\sum_{n \leq X} d_k(n)^2}$ has too many unwanted factors of ${\log X}$. Fortunately, we can substitute this global ${L^2}$ bound with a “large values” bound that controls expressions such as

$\displaystyle \sum_{i=1}^J \int_{I_i} |S(\alpha)|^2\ d\alpha$

for a moderate number of disjoint intervals ${I_1,\dots,I_J}$, with a bound that is slightly better (for ${J}$ a medium-sized power of ${\log X}$) than what one would have obtained by bounding each integral ${\int_{I_i} |S(\alpha)|^2\ d\alpha}$ separately. (One needs to save more than ${J^{1/2}}$ for the argument to work; we end up saving a factor of about ${J^{3/4}}$.) This large values estimate is probably the most novel contribution of the paper. After taking the Fourier transform, matters basically reduce to getting a good estimate for

$\displaystyle \sum_{i=1}^J (\int_X^{2X} |\sum_{x \leq n \leq x+H} d_k(n) e(\alpha_i n)|^2\ dx)^{1/2},$

where ${\alpha_i}$ is the midpoint of ${I_i}$; thus we need some upper bound on the large local Fourier coefficients of ${d_k}$. These coefficients are difficult to calculate directly, but, in the spirit of a paper of Ben Green and myself, we can try to replace ${d_k}$ by a more tractable and “pseudorandom” majorant ${\tilde d_k}$ for which the local Fourier coefficients are computable (on average). After a standard duality argument, one ends up having to control expressions such as

$\displaystyle |\sum_{x \leq n \leq x+H} \tilde d_k(n) e((\alpha_i -\alpha_{i'}) n)|$

after various averaging in the ${x, i,i'}$ parameters. These local Fourier coefficients of ${\tilde d_k}$ turn out to be small on average unless ${\alpha_i -\alpha_{i'}}$ is “major arc”. One then is left with a mostly combinatorial problem of trying to bound how often this major arc scenario occurs. This is very close to a computation in the previously mentioned paper of Ben and myself; there is a technical wrinkle in that the ${\alpha_i}$ are not as well separated as they were in my paper with Ben, but it turns out that one can modify the arguments in that paper to still obtain a satisfactory estimate in this case (after first grouping nearby frequencies ${\alpha_i}$ together, and modifying the duality argument accordingly).

In the tradition of “Polymath projects”, the problem posed in the previous two blog posts has now been solved, thanks to the cumulative effect of many small contributions by many participants (including, but not limited to, Sean Eberhard, Tobias Fritz, Siddhartha Gadgil, Tobias Hartnick, Chris Jerdonek, Apoorva Khare, Antonio Machiavelo, Pace Nielsen, Andy Putman, Will Sawin, Alexander Shamov, Lior Silberman, and David Speyer). In this post I’ll write down a streamlined resolution, eliding a number of important but ultimately removable partial steps and insights made by the above contributors en route to the solution.

Theorem 1 Let ${G = (G,\cdot)}$ be a group. Suppose one has a “seminorm” function ${\| \|: G \rightarrow [0,+\infty)}$ which obeys the triangle inequality

$\displaystyle \|xy \| \leq \|x\| + \|y\|$

for all ${x,y \in G}$, with equality whenever ${x=y}$. Then the seminorm factors through the abelianisation map ${G \mapsto G/[G,G]}$.

Proof: By the triangle inequality, it suffices to show that ${\| [x,y]\| = 0}$ for all ${x,y \in G}$, where ${[x,y] := xyx^{-1}y^{-1}}$ is the commutator.

We first establish some basic facts. Firstly, by hypothesis we have ${\|x^2\| = 2 \|x\|}$, and hence ${\|x^n \| = n \|x\|}$ whenever ${n}$ is a power of two. On the other hand, by the triangle inequality we have ${\|x^n \| \leq n\|x\|}$ for all positive ${n}$, and hence by the triangle inequality again we also have the matching lower bound, thus

$\displaystyle \|x^n \| = n \|x\|$

for all ${n > 0}$. The claim is also true for ${n=0}$ (apply the preceding bound with ${x=1}$ and ${n=2}$). By replacing ${\|x\|}$ with ${\max(\|x\|, \|x^{-1}\|)}$ if necessary we may now also assume without loss of generality that ${\|x^{-1} \| = \|x\|}$, thus

$\displaystyle \|x^n \| = |n| \|x\| \ \ \ \ \ (1)$

for all integers ${n}$.

Next, for any ${x,y \in G}$, and any natural number ${n}$, we have

$\displaystyle \|yxy^{-1} \| = \frac{1}{n} \| (yxy^{-1})^n \|$

$\displaystyle = \frac{1}{n} \| y x^n y^{-1} \|$

$\displaystyle \leq \frac{1}{n} ( \|y\| + n \|x\| + \|y^{-1}\| )$

so on taking limits as ${n \rightarrow \infty}$ we have ${\|yxy^{-1} \| \leq \|x\|}$. Replacing ${x,y}$ by ${yxy^{-1},y^{-1}}$ gives the matching lower bound, thus we have the conjugation invariance

$\displaystyle \|yxy^{-1} \| = \|x\|. \ \ \ \ \ (2)$

Next, we observe that if ${x,y,z,w}$ are such that ${x}$ is conjugate to both ${wy}$ and ${zw^{-1}}$, then one has the inequality

$\displaystyle \|x\| \leq \frac{1}{2} ( \|y \| + \| z \| ). \ \ \ \ \ (3)$

Indeed, if we write ${x = swys^{-1} = t zw^{-1} t^{-1}}$ for some ${s,t \in G}$, then for any natural number ${n}$ one has

$\displaystyle \|x\| = \frac{1}{2n} \| x^n x^n \|$

$\displaystyle = \frac{1}{2n} \| swy \dots wy s^{-1}t zw^{-1} \dots zw^{-1} t^{-1} \|$

where the ${wy}$ and ${zw^{-1}}$ terms each appear ${n}$ times. From (2) we see that conjugation by ${w}$ does not affect the norm. Using this and the triangle inequality several times, we conclude that

$\displaystyle \|x\| \leq \frac{1}{2n} ( \|s\| + n \|y\| + \| s^{-1} t\| + n \|z\| + \|t^{-1} \| ),$

and the claim (3) follows by sending ${n \rightarrow \infty}$.

The following special case of (3) will be of particular interest. Let ${x,y \in G}$, and for any integers ${m,k}$, define the quantity

$\displaystyle f(m,k) := \| x^m [x,y]^k \|.$

Observe that ${x^m [x,y]^k}$ is conjugate to both ${x (x^{m-1} [x,y]^k)}$ and to ${(y^{-1} x^m [x,y]^{k-1} xy) x^{-1}}$, hence by (3) one has

$\displaystyle \| x^m [x,y]^k \| \leq \frac{1}{2} ( \| x^{m-1} [x,y]^k \| + \| y^{-1} x^{m} [x,y]^{k-1} xy \|)$

which by (2) leads to the recursive inequality

$\displaystyle f(m,k) \leq \frac{1}{2} (f(m-1,k) + f(m+1,k-1)).$

We can write this in probabilistic notation as

$\displaystyle f(m,k) \leq {\bf E} f( (m,k) + X )$

where ${X}$ is a random vector that takes the values ${(-1,0)}$ and ${(1,-1)}$ with probability ${1/2}$ each. Iterating this, we conclude in particular that for any large natural number ${n}$, one has

$\displaystyle f(0,n) \leq {\bf E} f( Z )$

where ${Z := (0,n) + X_1 + \dots + X_{2n}}$ and ${X_1,\dots,X_{2n}}$ are iid copies of ${X}$. We can write ${Z = (1,-1/2) (Y_1 + \dots + Y_{2n})}$ where ${Y_1,\dots,Y_{2n} = \pm 1}$ are iid signs. By the triangle inequality, we thus have

$\displaystyle f( Z ) \leq |Y_1+\dots+Y_{2n}| (\|x\| + \frac{1}{2} \| [x,y] \|),$

noting that ${Y_1+\dots+Y_{2n}}$ is an even integer. On the other hand, ${Y_1+\dots+Y_{2n}}$ has mean zero and variance ${2n}$, hence by Cauchy-Schwarz

$\displaystyle f(0,n) \leq \sqrt{2n}( \|x\| + \frac{1}{2} \| [x,y] \|).$

But by (1), the left-hand side is equal to ${n \| [x,y]\|}$. Dividing by ${n}$ and then sending ${n \rightarrow \infty}$, we obtain the claim. $\Box$
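As a sanity check on the final step, the following sketch computes ${{\bf E} |Y_1+\dots+Y_{2n}|}$ exactly from the binomial distribution and compares it with the Cauchy-Schwarz bound ${\sqrt{2n}}$; it is the decay of ${\sqrt{2n}/n}$ that forces ${\|[x,y]\|}$ to vanish in the limit. The function name and the sample values of ${n}$ are illustrative choices.

```python
import math
from fractions import Fraction

def expected_abs_sum(n):
    """Exact value of E|Y_1 + ... + Y_{2n}| for iid uniform signs Y_i = +-1."""
    # If j of the 2n signs equal +1, the sum is j - (2n - j) = 2j - 2n.
    total = sum(math.comb(2 * n, j) * abs(2 * j - 2 * n) for j in range(2 * n + 1))
    return Fraction(total, 4 ** n)

for n in (1, 10, 100, 1000):
    mean_abs = float(expected_abs_sum(n))
    print(f"n = {n:5d}  E|sum| = {mean_abs:9.4f}  sqrt(2n) = {math.sqrt(2 * n):9.4f}  "
          f"sqrt(2n)/n = {math.sqrt(2 * n) / n:.4f}")
```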

The above theorem reduces such seminorms to abelian groups. It is easy to see from (1) that any torsion element of such groups has zero seminorm, so we can in fact restrict to torsion-free groups, which we now write using additive notation ${G = (G,+)}$, thus for instance ${\| nx \| = |n| \|x\|}$ for ${n \in {\bf Z}}$. We think of ${G}$ as a ${{\bf Z}}$-module. One can then extend the seminorm to the associated ${{\bf Q}}$-vector space ${G \otimes_{\bf Z} {\bf Q}}$ by the formula ${\|\frac{a}{b} x\| := |\frac{a}{b}| \|x\|}$, and then to the associated ${{\bf R}}$-vector space ${G \otimes_{\bf Z} {\bf R}}$ by continuity, at which point it becomes a genuine seminorm (provided we have ensured the symmetry condition ${\|-x\| = \|x\|}$). Conversely, any seminorm on ${G \otimes_{\bf Z} {\bf R}}$ induces a seminorm on ${G}$. (These arguments also appear in this paper of Khare and Rajaratnam.)

This post is a continuation of the previous post, which has attracted a large number of comments. I’m recording here some calculations that arose from those comments (particularly those of Pace Nielsen, Lior Silberman, Tobias Fritz, and Apoorva Khare). Please feel free to either continue these calculations or to discuss other approaches to the problem, such as those mentioned in the remaining comments to the previous post.

Let ${F_2}$ be the free group on two generators ${a,b}$, and let ${\| \|: F_2 \rightarrow {\bf R}^+}$ be a quantity obeying the triangle inequality

$\displaystyle \| xy\| \leq \|x \| + \|y\|$

and the linear growth property

$\displaystyle \| x^n \| = |n| \| x\|$

for all ${x,y \in F_2}$ and integers ${n \in {\bf Z}}$; this implies the conjugation invariance

$\displaystyle \| y^{-1} x y \| = \|x\|$

or equivalently

$\displaystyle \| xy \| = \| yx\|.$

We consider inequalities of the form

$\displaystyle \| xyx^{-1}y^{-1} \| \leq \alpha \|x\| + \beta \| y\| \ \ \ \ \ (1)$

or

$\displaystyle \| xyx^{-2}y^{-1} \| \leq \gamma \|x\| + \delta \| y\| \ \ \ \ \ (2)$

for various real numbers ${\alpha,\beta,\gamma,\delta}$. For instance, since

$\displaystyle \| xyx^{-1}y^{-1} \| \leq \| xyx^{-1}\| + \|y^{-1} \| = \|y\| + \|y\|$

we have (1) for ${(\alpha,\beta) = (0,2)}$. We also have the following further relations:

Proposition 1

• (i) If (1) holds for ${(\alpha,\beta)}$, then it holds for ${(\beta,\alpha)}$.
• (ii) If (1) holds for ${(\alpha,\beta)}$, then (2) holds for ${(\alpha+1, \frac{\beta}{2})}$.
• (iii) If (2) holds for ${(\gamma,\delta)}$, then (1) holds for ${(\frac{2\gamma}{3}, \frac{2\delta}{3})}$.
• (iv) If (1) holds for ${(\alpha,\beta)}$ and (2) holds for ${(\gamma,\delta)}$, then (1) holds for ${(\frac{2\alpha+1+\gamma}{4}, \frac{\delta+\beta}{4})}$.

Proof: For (i) we simply observe that

$\displaystyle \| xyx^{-1} y^{-1} \| = \| (xyx^{-1} y^{-1})^{-1} \| = \| y^{-1} x^{-1} y x \| = \| y x y^{-1} x^{-1} \|.$

For (ii), we calculate

$\displaystyle \| xyx^{-2}y^{-1} \| = \frac{1}{2}\| (xyx^{-2}y^{-1})^2 \|$

$\displaystyle = \frac{1}{2} \| (xyx^{-2}y^{-1} x) (yx^{-2} y^{-1}) \|$

$\displaystyle \leq \frac{1}{2} (\| xyx^{-2}y^{-1} x\| + \|yx^{-2} y^{-1}\|)$

$\displaystyle \leq \frac{1}{2} ( \| x^2 y x^{-2} y^{-1} \| + 2 \|x\| )$

$\displaystyle \leq \frac{1}{2} ( 2 \alpha \|x\| + \beta \|y\| + 2 \|x\|)$

giving the claim.

For (iii), we calculate

$\displaystyle \| xyx^{-1}y^{-1}\| = \frac{1}{3} \| (xyx^{-1}y^{-1})^3 \|$

$\displaystyle = \frac{1}{3} \| (xyx) (x^{-2} y^{-1} xy) (xyx)^{-1} (x^2 y x^{-1} y^{-1}) \|$

$\displaystyle \leq \frac{1}{3} ( \| x^{-2} y^{-1} xy\| + \| x^2 y x^{-1} y^{-1}\| )$

$\displaystyle = \frac{1}{3} ( \| xy x^{-2} y^{-1} \| + \|x^{-1} y^{-1} x^2 y \| )$

$\displaystyle \leq \frac{1}{3} ( \gamma \|x\| + \delta \|y\| + \gamma \|x\| + \delta \|y\|)$

giving the claim.

For (iv), we calculate

$\displaystyle \| xyx^{-1}y^{-1}\| = \frac{1}{4} \| (xyx^{-1}y^{-1})^4 \|$

$\displaystyle = \frac{1}{4} \| (xy) (x^{-1} y^{-1} x) (y x^{-1} y^{-1}) (xyx^{-1}) (xy)^{-1} (x^2yx^{-1}y^{-1}) \|$

$\displaystyle \leq \frac{1}{4} ( \| (x^{-1} y^{-1} x) (y x^{-1} y^{-1}) (xyx^{-1}) \| + \|x^2yx^{-1}y^{-1}\| )$

$\displaystyle \leq \frac{1}{4} ( \|(y x^{-1} y^{-1}) (xy^{-1}x^{-1})(x^{-1} y x) \| + \gamma \|x\| + \delta \|y\|)$

$\displaystyle \leq \frac{1}{4} ( \|x\| + \|(xy^{-1}x^{-1})(x^{-1} y x) \| + \gamma \|x\| + \delta \|y\|)$

$\displaystyle = \frac{1}{4} ( \|x\| + \|x^{-2} y x^2 y^{-1} \|+ \gamma \|x\| + \delta \|y\|)$

$\displaystyle \leq \frac{1}{4} ( \|x\| + 2\alpha \|x\| + \beta \|y\| + \gamma \|x\| + \delta \|y\|)$

giving the claim. $\Box$
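The three word identities used in parts (ii), (iii), (iv) are easy to get wrong by hand, so here is a small sketch that verifies them by free reduction. The encoding is an assumption made for illustration: words are strings in the letters `x`, `y`, with the uppercase letters `X`, `Y` standing for ${x^{-1}, y^{-1}}$, and the helper names are hypothetical.

```python
def reduce_word(word):
    """Freely reduce a word; a lowercase letter is a generator, uppercase its inverse."""
    stack = []
    for letter in word:
        if stack and stack[-1] == letter.swapcase():
            stack.pop()          # cancel an adjacent pair such as "xX"
        else:
            stack.append(letter)
    return "".join(stack)

def inverse(word):
    """Inverse of a word: reverse the order and invert each letter."""
    return word[::-1].swapcase()

def product(*words):
    """Concatenate the given words and freely reduce the result."""
    return reduce_word("".join(words))

commutator = "xyXY"              # x y x^{-1} y^{-1}
twisted = "xyXXY"                # x y x^{-2} y^{-1}

identity_ii = product(twisted, twisted) == product("xyXXYx", "yXXY")
identity_iii = product(*[commutator] * 3) == product(
    "xyx", "XXYxy", inverse("xyx"), "xxyXY")
identity_iv = product(*[commutator] * 4) == product(
    "xy", "XYx", "yXY", "xyX", inverse("xy"), "xxyXY")
print(identity_ii, identity_iii, identity_iv)
```

This only checks the algebraic identities in ${F_2}$; the inequalities themselves still rely on the triangle inequality and conjugation invariance as in the proof.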

Here is a typical application of the above estimates. If (1) holds for ${(\alpha,\beta)}$, then by part (i) it holds for ${(\beta,\alpha)}$, then by (ii) (2) holds for ${(\beta+1,\frac{\alpha}{2})}$, then by (iv) (1) holds for ${(\frac{3\beta+2}{4}, \frac{3\alpha}{8})}$. The map ${(\alpha,\beta) \mapsto (\frac{3\beta+2}{4}, \frac{3\alpha}{8})}$ has fixed point ${(\alpha,\beta) = (\frac{16}{23}, \frac{6}{23})}$, thus

$\displaystyle \| xyx^{-1}y^{-1} \| \leq \frac{16}{23} \|x\| + \frac{6}{23} \|y\|.$

For instance, if ${\|a\|, \|b\| \leq 1}$, then ${\|aba^{-1}b^{-1} \| \leq 22/23 = 0.95652\dots}$.
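The fixed point can be double-checked with exact rational arithmetic. The sketch below iterates the map starting from the trivial bound ${\|xyx^{-1}y^{-1}\| \leq 2\|y\|}$; the function name and the number of iterations are arbitrary illustrative choices.

```python
from fractions import Fraction

def improve(alpha, beta):
    """One round of (i), (ii), (iv): (alpha, beta) -> ((3 beta + 2)/4, 3 alpha/8)."""
    return (3 * beta + 2) / 4, 3 * alpha / 8

pair = (Fraction(0), Fraction(2))   # trivial bound: ||[x,y]|| <= 0 ||x|| + 2 ||y||
for _ in range(60):
    pair = improve(*pair)

fixed = (Fraction(16, 23), Fraction(6, 23))
print("after 60 rounds:", float(pair[0]), float(pair[1]))
print("claimed fixed point:", float(fixed[0]), float(fixed[1]))
print("is a fixed point:", improve(*fixed) == fixed)
```

Since two rounds of the map contract distances to the fixed point by a factor of ${9/32}$, the iterates converge geometrically, and every iterate is itself an admissible pair ${(\alpha,\beta)}$ for (1).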

Here is a curious question posed to me by Apoorva Khare that I do not know the answer to. Let ${F_2}$ be the free group on two generators ${a,b}$. Does there exist a metric ${d}$ on this group which is

• bi-invariant, thus ${d(xg,yg)=d(gx,gy) = d(x,y)}$ for all ${x,y,g \in F_2}$; and
• linear growth in the sense that ${d(x^n,1) = n d(x,1)}$ for all ${x \in F_2}$ and all natural numbers ${n}$?

By defining the “norm” of an element ${x \in F_2}$ to be ${\| x\| := d(x,1)}$, an equivalent formulation of the problem asks if there exists a non-negative norm function ${\| \|: F_2 \rightarrow {\bf R}^+}$ that obeys the conjugation invariance

$\displaystyle \| gxg^{-1} \| = \|x \| \ \ \ \ \ (1)$

for all ${x,g \in F_2}$, the triangle inequality

$\displaystyle \| xy \| \leq \| x\| + \| y\| \ \ \ \ \ (2)$

for all ${x,y \in F_2}$, and the linear growth

$\displaystyle \| x^n \| = |n| \|x\| \ \ \ \ \ (3)$

for all ${x \in F_2}$ and ${n \in {\bf Z}}$, and such that ${\|x\| > 0}$ for all non-identity ${x \in F_2}$. Indeed, if such a norm exists then one can just take ${d(x,y) := \| x y^{-1} \|}$ to give the desired metric.

One can normalise the norm of the generators to be at most ${1}$, thus

$\displaystyle \| a \|, \| b \| \leq 1.$

This can then be used to upper bound the norm of other words in ${F_2}$. For instance, from (1), (3) one has

$\displaystyle \| aba^{-1} \|, \| b^{-1} a b \|, \| a^{-1} b^{-1} a \|, \| bab^{-1}\| \leq 1.$

A bit less trivially, from (3), (2), (1) one can bound commutators as

$\displaystyle \| aba^{-1} b^{-1} \| = \frac{1}{3} \| (aba^{-1} b^{-1})^3 \|$

$\displaystyle = \frac{1}{3} \| (aba^{-1}) (b^{-1} ab) (a^{-1} b^{-1} a) (b a^{-1} b^{-1}) \|$

$\displaystyle \leq \frac{4}{3}.$

In a similar spirit one has

$\displaystyle \| aba^{-2} b^{-1} \| = \frac{1}{2} \| (aba^{-2} b^{-1})^2 \|$

$\displaystyle = \frac{1}{2} \| (aba^{-1}) (a^{-1} b^{-1} a) (ba^{-1} b^{-1}) (ba^{-1} b^{-1}) \|$

$\displaystyle \leq 2.$
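Readers who wish to experiment with further bounds of this type may find it convenient to check such factorisations mechanically. Here is a minimal sketch, assuming words are encoded as strings in `a`, `b` with uppercase `A`, `B` denoting ${a^{-1}, b^{-1}}$ (an encoding chosen purely for illustration), which confirms that the cube of ${aba^{-1}b^{-1}}$ and the square of ${aba^{-2}b^{-1}}$ are each a product of four conjugates of generators or their inverses.

```python
def reduce_word(word):
    """Freely reduce a word in a, b (lowercase) and their inverses A, B (uppercase)."""
    stack = []
    for letter in word:
        if stack and stack[-1] == letter.swapcase():
            stack.pop()
        else:
            stack.append(letter)
    return "".join(stack)

def is_conjugate_of_generator(word):
    """True for three-letter words of the shape g h g^{-1} with g, h single letters."""
    return len(word) == 3 and word[0] == word[2].swapcase()

cube_factors = ["abA", "Bab", "ABa", "bAB"]      # factors for (a b a^{-1} b^{-1})^3
square_factors = ["abA", "ABa", "bAB", "bAB"]    # factors for (a b a^{-2} b^{-1})^2

cube_ok = reduce_word("abAB" * 3) == reduce_word("".join(cube_factors))
square_ok = reduce_word("abAAB" * 2) == reduce_word("".join(square_factors))
shapes_ok = all(is_conjugate_of_generator(w) for w in cube_factors + square_factors)
print(cube_ok, square_ok, shapes_ok)
```

Each factor has norm at most ${1}$ by conjugation invariance, which is where the bounds ${\frac{4}{3}}$ and ${2}$ come from.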

What is not clear to me is if one can keep arguing like this to continually improve the upper bounds on the norm ${\| g\|}$ of a given non-trivial group element ${g}$ to the point where this norm must in fact vanish, which would demonstrate that no metric with the above properties on ${F_2}$ would exist (and in fact would impose strong constraints on similar metrics existing on other groups as well). It is also tempting to use some ideas from geometric group theory (e.g. asymptotic cones) to try to understand these metrics further, though I wasn’t able to get very far with this approach. Anyway, this feels like a problem that might be somewhat receptive to a more crowdsourced attack, so I am posing it here in case any readers wish to try to make progress on it.

The Boussinesq equations for inviscid, incompressible two-dimensional fluid flow in the presence of gravity are given by

$\displaystyle (\partial_t + u_x \partial_x+ u_y \partial_y) u_x = -\partial_x p \ \ \ \ \ (1)$

$\displaystyle (\partial_t + u_x \partial_x+ u_y \partial_y) u_y = \rho - \partial_y p \ \ \ \ \ (2)$

$\displaystyle (\partial_t + u_x \partial_x+ u_y \partial_y) \rho = 0 \ \ \ \ \ (3)$

$\displaystyle \partial_x u_x + \partial_y u_y = 0 \ \ \ \ \ (4)$

where ${u: {\bf R} \times {\bf R}^2 \rightarrow {\bf R}^2}$ is the velocity field, ${p: {\bf R} \times {\bf R}^2 \rightarrow {\bf R}}$ is the pressure field, and ${\rho: {\bf R} \times {\bf R}^2 \rightarrow {\bf R}}$ is the density field (or, in some physical interpretations, the temperature field). In this post we shall restrict ourselves to formal manipulations, assuming implicitly that all fields are regular enough (or sufficiently decaying at spatial infinity) that the manipulations are justified. Using the material derivative ${D_t := \partial_t + u_x \partial_x + u_y \partial_y}$, one can abbreviate these equations as

$\displaystyle D_t u_x = -\partial_x p$

$\displaystyle D_t u_y = \rho - \partial_y p$

$\displaystyle D_t \rho = 0$

$\displaystyle \partial_x u_x + \partial_y u_y = 0.$

One can eliminate the role of the pressure ${p}$ by working with the vorticity ${\omega := \partial_x u_y - \partial_y u_x}$. A standard calculation then leads us to the equivalent “vorticity-stream” formulation

$\displaystyle D_t \omega = \partial_x \rho$

$\displaystyle D_t \rho = 0$

$\displaystyle \omega = \partial_x u_y - \partial_y u_x$

$\displaystyle \partial_x u_x + \partial_y u_y = 0$

of the Boussinesq equations. The latter two equations can be used to recover the velocity field ${u}$ from the vorticity ${\omega}$ by the Biot-Savart law

$\displaystyle u_x := -\partial_y \Delta^{-1} \omega; \quad u_y := \partial_x \Delta^{-1} \omega.$
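For completeness, here is a sketch of the standard calculation behind the vorticity equation (at the same formal level as the rest of this post). Applying ${\partial_x}$ to (2) and ${\partial_y}$ to (1) and subtracting, the pressure terms cancel and one obtains

$\displaystyle \partial_x (D_t u_y) - \partial_y (D_t u_x) = \partial_x \rho.$

On the other hand, expanding the material derivatives with the product rule gives

$\displaystyle \partial_x (D_t u_y) - \partial_y (D_t u_x) = D_t \omega + (\partial_x u_x + \partial_y u_y) \omega,$

and the second term on the right-hand side vanishes by the incompressibility condition (4), leaving ${D_t \omega = \partial_x \rho}$.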

It has long been observed (see e.g. Section 5.4.1 of Bertozzi-Majda) that the Boussinesq equations are very similar, though not quite identical, to the three-dimensional inviscid incompressible Euler equations under the hypothesis of axial symmetry (with swirl). The Euler equations are

$\displaystyle \partial_t u + (u \cdot \nabla) u = - \nabla p$

$\displaystyle \nabla \cdot u = 0$

where now the velocity field ${u: {\bf R} \times {\bf R}^3 \rightarrow {\bf R}^3}$ and pressure field ${p: {\bf R} \times {\bf R}^3 \rightarrow {\bf R}}$ are over the three-dimensional domain ${{\bf R}^3}$. If one expresses ${{\bf R}^3}$ in cylindrical coordinates ${(z,r,\theta)}$ then one can write the velocity vector field ${u}$ in these coordinates as

$\displaystyle u = u^z \frac{d}{dz} + u^r \frac{d}{dr} + u^\theta \frac{d}{d\theta}.$

If we make the axial symmetry assumption that these components, as well as ${p}$, do not depend on the ${\theta}$ variable, thus

$\displaystyle \partial_\theta u^z, \partial_\theta u^r, \partial_\theta u^\theta, \partial_\theta p = 0,$

then after some calculation (which we give below the fold) one can eventually reduce the Euler equations to the system

$\displaystyle \tilde D_t \omega = \frac{1}{r^4} \partial_z \rho \ \ \ \ \ (5)$

$\displaystyle \tilde D_t \rho = 0 \ \ \ \ \ (6)$

$\displaystyle \omega = \frac{1}{r} (\partial_z u^r - \partial_r u^z) \ \ \ \ \ (7)$

$\displaystyle \partial_z(ru^z) + \partial_r(ru^r) = 0 \ \ \ \ \ (8)$

where ${\tilde D_t := \partial_t + u^z \partial_z + u^r \partial_r}$ is the modified material derivative, and ${\rho}$ is the field ${\rho := (r u^\theta)^2}$. This is almost identical with the Boussinesq equations except for some additional powers of ${r}$; thus, the intuition is that the Boussinesq equations are a simplified model for axially symmetric Euler flows when one stays away from the axis ${r=0}$ and also does not wander off to ${r=\infty}$.

However, this heuristic is not rigorous; the above calculations do not actually give an embedding of the Boussinesq equations into Euler. (The equations do match on the cylinder ${r=1}$, but this is a measure zero subset of the domain, and so is not enough to give an embedding on any non-trivial region of space.) Recently, while playing around with trying to embed other equations into the Euler equations, I discovered that it is possible to make such an embedding into a four-dimensional Euler equation, albeit on a slightly curved manifold rather than in Euclidean space. More precisely, we use the Ebin-Marsden generalisation

$\displaystyle \partial_t u + \nabla_u u = - \mathrm{grad}_g p$

$\displaystyle \mathrm{div}_g u = 0$

of the Euler equations to an arbitrary Riemannian manifold ${(M,g)}$ (ignoring any issues of boundary conditions for this discussion), where ${u: {\bf R} \rightarrow \Gamma(TM)}$ is a time-dependent vector field, ${p: {\bf R} \rightarrow C^\infty(M)}$ is a time-dependent scalar field, and ${\nabla_u}$ is the covariant derivative along ${u}$ using the Levi-Civita connection ${\nabla}$. In Penrose abstract index notation (using the Levi-Civita connection ${\nabla}$, and raising and lowering indices using the metric ${g = g_{ij}}$), the equations of motion become

$\displaystyle \partial_t u^i + u^j \nabla_j u^i = - \nabla^i p \ \ \ \ \ (9)$

$\displaystyle \nabla_i u^i = 0;$

in coordinates, this becomes

$\displaystyle \partial_t u^i + u^j (\partial_j u^i + \Gamma^i_{jk} u^k) = - g^{ij} \partial_j p$

$\displaystyle \partial_i u^i + \Gamma^i_{ik} u^k = 0 \ \ \ \ \ (10)$

where the Christoffel symbols ${\Gamma^i_{jk}}$ are given by the formula

$\displaystyle \Gamma^i_{jk} := \frac{1}{2} g^{il} (\partial_j g_{lk} + \partial_k g_{lj} - \partial_l g_{jk}),$

where ${g^{il}}$ is the inverse to the metric tensor ${g_{il}}$. If the coordinates are chosen so that the volume form ${dg}$ is the Euclidean volume form ${dx}$, thus ${\mathrm{det}(g)=1}$, then on differentiating we have ${g^{ij} \partial_k g_{ij} = 0}$, and hence ${\Gamma^i_{ik} = 0}$, and so the divergence-free equation (10) simplifies in this case to ${\partial_i u^i = 0}$. The Ebin-Marsden Euler equations are the natural generalisation of the Euler equations to arbitrary manifolds; for instance, they (formally) conserve the kinetic energy

$\displaystyle \frac{1}{2} \int_M |u|_g^2\ dg = \frac{1}{2} \int_M g_{ij} u^i u^j\ dg$

and can be viewed as the formal geodesic flow equation on the infinite-dimensional manifold of volume-preserving diffeomorphisms on ${M}$ (see this previous post for a discussion of this in the flat space case).

The specific four-dimensional manifold in question is the space ${{\bf R} \times {\bf R}^+ \times {\bf R}/{\bf Z} \times {\bf R}/{\bf Z}}$ with metric

$\displaystyle dx^2 + dy^2 + y^{-1} dz^2 + y dw^2$

and solutions to the Boussinesq equation on ${{\bf R} \times {\bf R}^+}$ can be transformed into solutions to the Euler equations on this manifold. This is part of a more general family of embeddings into the Euler equations in which passive scalar fields (such as the field ${\rho}$ appearing in the Boussinesq equations) can be incorporated into the dynamics via fluctuations in the Riemannian metric ${g}$. I am writing the details below the fold (partly for my own benefit).
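As a quick consistency check on this metric (a sketch of one step of the calculation only, using the coordinate names ${(x,y,z,w)}$ for the four factors), note that ${g_{xx} = g_{yy} = 1}$, ${g_{zz} = y^{-1}}$, ${g_{ww} = y}$, with all other components vanishing. Thus ${\mathrm{det}(g) = 1}$, and from the formula for the Christoffel symbols the only non-vanishing ones are

$\displaystyle \Gamma^y_{zz} = \frac{1}{2y^2}, \quad \Gamma^y_{ww} = -\frac{1}{2}, \quad \Gamma^z_{yz} = \Gamma^z_{zy} = -\frac{1}{2y}, \quad \Gamma^w_{yw} = \Gamma^w_{wy} = \frac{1}{2y}.$

In particular ${\Gamma^i_{iy} = -\frac{1}{2y} + \frac{1}{2y} = 0}$, consistent with the simplification of the divergence-free condition (10) to ${\partial_i u^i = 0}$.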

I have just uploaded to the arXiv the paper “An inverse theorem for an inequality of Kneser“, submitted to a special issue of the Proceedings of the Steklov Institute of Mathematics in honour of Sergei Konyagin. It concerns an inequality of Kneser discussed previously in this blog, namely that

$\displaystyle \mu(A+B) \geq \min(\mu(A)+\mu(B), 1) \ \ \ \ \ (1)$

whenever ${A,B}$ are compact non-empty subsets of a compact connected additive group ${G}$ with probability Haar measure ${\mu}$.  (A later result of Kemperman extended this inequality to the nonabelian case.) This inequality is non-trivial in the regime

$\displaystyle \mu(A), \mu(B), 1- \mu(A)-\mu(B) > 0. \ \ \ \ \ (2)$

The connectedness of ${G}$ is essential, otherwise one could form counterexamples involving proper subgroups of ${G}$ of positive measure. In the blog post, I indicated how this inequality (together with a more “robust” strengthening of it) could be deduced from submodularity inequalities such as

$\displaystyle \mu( (A_1 \cup A_2) + B) + \mu( (A_1 \cap A_2) + B)$

$\displaystyle \leq \mu(A_1+B) + \mu(A_2+B) \ \ \ \ \ (3)$

which in turn easily follows from the identity ${(A_1 \cup A_2) + B = (A_1+B) \cup (A_2+B)}$ and the inclusion ${(A_1 \cap A_2) + B \subset (A_1 +B) \cap (A_2+B)}$, combined with the inclusion-exclusion formula.
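The submodularity inequality (3) uses nothing about connectedness, so it can be tested in any finite abelian group with counting measure in place of ${\mu}$. The sketch below does this for random subsets of the cyclic group ${{\bf Z}/30{\bf Z}}$; the group, the set sizes, and the function names are assumptions made for illustration. (Kneser's inequality (1) itself can of course fail in such a disconnected group.)

```python
import random

def sumset(A, B, N):
    """The sumset A + B inside the cyclic group Z/NZ."""
    return {(a + b) % N for a in A for b in B}

def submodularity_gap(A1, A2, B, N):
    """Right-hand side minus left-hand side of (3), using counting measure."""
    lhs = len(sumset(A1 | A2, B, N)) + len(sumset(A1 & A2, B, N))
    rhs = len(sumset(A1, B, N)) + len(sumset(A2, B, N))
    return rhs - lhs

random.seed(0)
N = 30
worst = min(
    submodularity_gap(
        set(random.sample(range(N), 8)),
        set(random.sample(range(N), 8)),
        set(random.sample(range(N), 5)),
        N,
    )
    for _ in range(2000)
)
print("smallest value of RHS - LHS found:", worst)
```

The gap is never negative, in agreement with (3).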

In the non-trivial regime (2), equality can be attained in (1), for instance by taking ${G}$ to be the unit circle ${G = {\bf R}/{\bf Z}}$ and ${A,B}$ to be arcs in that circle (obeying (2)). A bit more generally, if ${G}$ is an arbitrary connected compact abelian group and ${\xi: G \rightarrow {\bf R}/{\bf Z}}$ is a non-trivial character (i.e., a continuous homomorphism), then ${\xi}$ must be surjective (as ${{\bf R}/{\bf Z}}$ has no proper non-trivial connected subgroups), and one can take ${A = \xi^{-1}(I)}$ and ${B = \xi^{-1}(J)}$ for some arcs ${I,J}$ in that circle (again choosing the measures of these arcs to obey (2)). The main result of this paper is an inverse theorem that asserts that this is the only way in which equality can occur in (1) (assuming (2)); furthermore, if (1) is close to being satisfied with equality and (2) holds, then ${A,B}$ must be close (in measure) to an example of the above form ${A = \xi^{-1}(I), B = \xi^{-1}(J)}$. Actually, for technical reasons (and for the applications we have in mind), it is important to establish an inverse theorem not just for (1), but for the more robust version mentioned earlier (in which the sumset ${A+B}$ is replaced by the partial sumset ${A +_\varepsilon B}$ consisting of “popular” sums).

Roughly speaking, the idea is as follows. Let us informally call ${(A,B)}$ a critical pair if (2) holds and the inequality (1) (or more precisely, a robust version of this inequality) is almost obeyed with equality. The notion of a critical pair obeys some useful closure properties. Firstly, it is symmetric in ${A,B}$, and invariant with respect to translation of either ${A}$ or ${B}$. Furthermore, from the submodularity inequality (3), one can show that if ${(A_1,B)}$ and ${(A_2,B)}$ are critical pairs (with ${\mu(A_1 \cap A_2)}$ and ${1 - \mu(A_1 \cup A_2) - \mu(B)}$ positive), then ${(A_1 \cap A_2,B)}$ and ${(A_1 \cup A_2, B)}$ are also critical pairs. (Note that this is consistent with the claim that critical pairs only occur when ${A,B}$ come from arcs of a circle.) Similarly, from associativity ${(A+B)+C = A+(B+C)}$, one can show that if ${(A,B)}$ and ${(A+B,C)}$ are critical pairs, then so are ${(B,C)}$ and ${(A,B+C)}$.

One can combine these closure properties to obtain further ones. For instance, suppose ${A,B}$ is such that ${\mu(A+B) < 2\mu(A)}$, so that all the translates ${A+b}$, ${b \in B}$ intersect each other in a set of positive measure. Suppose also that ${(A,C)}$ is a critical pair and ${1 - \mu(A+B) - \mu(C) > 0}$. Then (cheating a little bit), one can show that ${(A+B,C)}$ is also a critical pair, basically because ${A+B}$ is the union of the ${A+b}$, ${b \in B}$, the ${(A+b,C)}$ are all critical pairs, and the ${A+b}$ all intersect each other. This argument doesn’t quite work as stated because one has to apply the closure property under union an uncountable number of times, but it turns out that if one works with the robust version of sumsets and uses a random sampling argument to approximate ${A+B}$ by the union of finitely many of the ${A+b}$, then the argument can be made to work.

Using all of these closure properties, it turns out that one can start with an arbitrary critical pair ${(A,B)}$ and end up with a small set ${C}$ such that ${(A,C)}$ and ${(kC,C)}$ are also critical pairs for all ${1 \leq k \leq 10^4}$ (say), where ${kC}$ is the ${k}$-fold sumset of ${C}$. (Intuitively, if ${A,B}$ are thought of as secretly coming from the pullback of arcs ${I,J}$ by some character ${\xi}$, then ${C}$ should be the pullback of a much shorter arc by the same character.) In particular, ${C}$ exhibits linear growth, in that ${\mu(kC) = k\mu(C)}$ for all ${1 \leq k \leq 10^4}$. One can now use standard technology from inverse sumset theory to show first that ${C}$ has a very large Fourier coefficient (and thus is biased with respect to some character ${\xi}$), and secondly that ${C}$ is in fact almost of the form ${C = \xi^{-1}(K)}$ for some arc ${K}$, from which it is not difficult to conclude similar statements for ${A}$ and ${B}$ and thus finish the proof of the inverse theorem.

In order to make the above argument rigorous, one has to be more precise about what the modifier “almost” means in the definition of a critical pair. I chose to do this in the language of “cheap” nonstandard analysis (aka asymptotic analysis), as discussed in this previous blog post; one could also have used the full-strength version of nonstandard analysis, but this does not seem to convey any substantial advantages. (One can also work in a more traditional “non-asymptotic” framework, but this requires one to keep much more careful account of various small error terms and leads to a messier argument.)

[Update, Nov 15: Corrected the attribution of the inequality (1) to Kneser instead of Kemperman.  Thanks to John Griesmer for pointing out the error.]

The basic objects of study in multiplicative number theory are the arithmetic functions: functions ${f: {\bf N} \rightarrow {\bf C}}$ from the natural numbers to the complex numbers. Some fundamental examples of such functions include

• The constant function ${1: n \mapsto 1}$;
• The Kronecker delta function ${\delta: n \mapsto 1_{n=1}}$;
• The natural logarithm function ${L: n \mapsto \log n}$;
• The divisor function ${d_2: n \mapsto \sum_{d|n} 1}$;
• The von Mangoldt function ${\Lambda}$, with ${\Lambda(n)}$ defined to equal ${\log p}$ when ${n}$ is a power ${p^j}$ of a prime ${p}$ for some ${j \geq 1}$, and defined to equal zero otherwise; and
• The Möbius function ${\mu}$, with ${\mu(n)}$ defined to equal ${(-1)^k}$ when ${n}$ is the product of ${k}$ distinct primes, and defined to equal zero otherwise.

Given an arithmetic function ${f}$, we are often interested in statistics such as the summatory function

$\displaystyle \sum_{n \leq x} f(n), \ \ \ \ \ (1)$

the logarithmically (or harmonically) weighted summatory function

$\displaystyle \sum_{n \leq x} \frac{f(n)}{n}, \ \ \ \ \ (2)$

or the Dirichlet series

$\displaystyle {\mathcal D}[f](s) := \sum_n \frac{f(n)}{n^s}.$

In the latter case, one typically has to first restrict ${s}$ to those complex numbers whose real part is large enough in order to ensure the series on the right converges; but in many important cases, one can then extend the Dirichlet series to almost all of the complex plane by analytic continuation. One is also interested in correlations involving additive shifts, such as ${\sum_{n \leq x} f(n) f(n+h)}$, but these are significantly more difficult to study and cannot be easily estimated by the methods of classical multiplicative number theory.

A key operation on arithmetic functions is that of Dirichlet convolution, which when given two arithmetic functions ${f,g: {\bf N} \rightarrow {\bf C}}$, forms a new arithmetic function ${f*g: {\bf N} \rightarrow {\bf C}}$, defined by the formula

$\displaystyle f*g(n) := \sum_{d|n} f(d) g(\frac{n}{d}).$

Thus for instance ${1*1 = d_2}$, ${1 * \Lambda = L}$, ${1 * \mu = \delta}$, and ${\delta * f = f}$ for any arithmetic function ${f}$. Dirichlet convolution and Dirichlet series are related by the fundamental formula

$\displaystyle {\mathcal D}[f * g](s) = {\mathcal D}[f](s) {\mathcal D}[g](s), \ \ \ \ \ (3)$

at least when the real part of ${s}$ is large enough that all sums involved become absolutely convergent (but in practice one can use analytic continuation to extend this identity to most of the complex plane). There is also the identity

$\displaystyle {\mathcal D}[Lf](s) = - \frac{d}{ds} {\mathcal D}[f](s), \ \ \ \ \ (4)$

at least when the real part of ${s}$ is large enough to justify interchange of differentiation and summation. As a consequence, many Dirichlet series can be expressed in terms of the Riemann zeta function ${\zeta = {\mathcal D}[1]}$, thus for instance

$\displaystyle {\mathcal D}[d_2](s) = \zeta^2(s)$

$\displaystyle {\mathcal D}[L](s) = - \zeta'(s)$

$\displaystyle {\mathcal D}[\delta](s) = 1$

$\displaystyle {\mathcal D}[\mu](s) = \frac{1}{\zeta(s)}$

$\displaystyle {\mathcal D}[\Lambda](s) = -\frac{\zeta'(s)}{\zeta(s)}.$
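The convolution identities ${1*1 = d_2}$, ${1 * \mu = \delta}$, ${1 * \Lambda = L}$ that underlie these Dirichlet series formulae can be checked directly for small ${n}$. The following is a naive sketch (trial division throughout, with function names chosen for illustration), not an efficient implementation.

```python
import math

def factorize(n):
    """Prime factorisation of n by trial division, as a dict {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def one(n):
    return 1

def delta(n):
    return 1 if n == 1 else 0

def d2(n):
    return sum(1 for d in range(1, n + 1) if n % d == 0)

def mobius(n):
    factors = factorize(n)
    if any(e > 1 for e in factors.values()):
        return 0                      # n is not squarefree
    return (-1) ** len(factors)

def von_mangoldt(n):
    factors = factorize(n)
    if len(factors) == 1:             # n is a power of a single prime p
        return math.log(next(iter(factors)))
    return 0.0

def dirichlet_convolution(f, g):
    """Return the arithmetic function f * g."""
    def h(n):
        return sum(f(d) * g(n // d) for d in range(1, n + 1) if n % d == 0)
    return h

N = 200
print(all(dirichlet_convolution(one, one)(n) == d2(n) for n in range(1, N + 1)))
print(all(dirichlet_convolution(one, mobius)(n) == delta(n) for n in range(1, N + 1)))
print(all(abs(dirichlet_convolution(one, von_mangoldt)(n) - math.log(n)) < 1e-9
          for n in range(1, N + 1)))
```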

Much of the difficulty of multiplicative number theory can be traced back to the discrete nature of the natural numbers ${{\bf N}}$, which form a rather complicated abelian semigroup with respect to multiplication (in particular the set of generators is the set of prime numbers). One can obtain a simpler analogue of the subject by working instead with the half-infinite interval ${{\bf N}_\infty := [1,+\infty)}$, which is a much simpler abelian semigroup under multiplication (being a one-dimensional Lie semigroup). (I will think of this as a sort of “completion” of ${{\bf N}}$ at the infinite place ${\infty}$, hence the terminology.) Accordingly, let us define a continuous arithmetic function to be a locally integrable function ${f: {\bf N}_\infty \rightarrow {\bf C}}$. The analogue of the summatory function (1) is then an integral

$\displaystyle \int_1^x f(t)\ dt,$

and similarly the analogue of (2) is

$\displaystyle \int_1^x \frac{f(t)}{t}\ dt.$

The analogue of the Dirichlet series is the Mellin-type transform

$\displaystyle {\mathcal D}_\infty[f](s) := \int_1^\infty \frac{f(t)}{t^s}\ dt,$

which will be well-defined at least if the real part of ${s}$ is large enough and if the continuous arithmetic function ${f: {\bf N}_\infty \rightarrow {\bf C}}$ does not grow too quickly, and hopefully will also be defined elsewhere in the complex plane by analytic continuation.

For instance, the continuous analogue of the discrete constant function ${1: {\bf N} \rightarrow {\bf C}}$ would be the constant function ${1_\infty: {\bf N}_\infty \rightarrow {\bf C}}$ that maps any ${t \in [1,+\infty)}$ to ${1}$; we write ${1_\infty}$ rather than ${1}$ in order to keep the two functions distinct. The two functions ${1_\infty}$ and ${1}$ have approximately similar statistics; for instance one has

$\displaystyle \sum_{n \leq x} 1 = \lfloor x \rfloor \approx x-1 = \int_1^x 1\ dt$

and

$\displaystyle \sum_{n \leq x} \frac{1}{n} = H_{\lfloor x \rfloor} \approx \log x = \int_1^x \frac{1}{t}\ dt$

where ${H_n}$ is the ${n^{th}}$ harmonic number, and we are deliberately vague as to what the symbol ${\approx}$ means. Continuing this analogy, we would expect

$\displaystyle {\mathcal D}[1](s) = \zeta(s) \approx \frac{1}{s-1} = {\mathcal D}_\infty[1_\infty](s)$

which reflects the fact that ${\zeta}$ has a simple pole at ${s=1}$ with residue ${1}$, and no other poles. Note that the identity ${{\mathcal D}_\infty[1_\infty](s) = \frac{1}{s-1}}$ is initially only valid in the region ${\mathrm{Re} s > 1}$, but clearly the right-hand side can be continued analytically to the entire complex plane except for the pole at ${1}$, and so one can define ${{\mathcal D}_\infty[1_\infty]}$ in this region also.
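One can get a feel for how good the approximation ${\zeta(s) \approx \frac{1}{s-1}}$ is by tabulating both sides for real ${s}$ slightly larger than ${1}$. The sketch below is an illustration only: the function name `zeta` and the cutoff `n_terms` are my own choices, and the zeta function is approximated by a partial sum plus the first few Euler-Maclaurin tail corrections, which is adequate for real ${s > 1}$ but is not a general-purpose implementation.

```python
def zeta(s, n_terms=1000):
    """Approximate zeta(s) for real s > 1.

    Uses sum_{n <= N} n^{-s} plus the Euler-Maclaurin tail
    N^{1-s}/(s-1) - N^{-s}/2 + s N^{-s-1}/12.
    """
    N = n_terms
    partial = sum(n ** -s for n in range(1, N + 1))
    tail = N ** (1 - s) / (s - 1) - 0.5 * N ** -s + s * N ** (-s - 1) / 12
    return partial + tail

for s in (2.0, 1.5, 1.1, 1.01):
    pole = 1 / (s - 1)
    print(f"s = {s}: zeta(s) = {zeta(s):.6f}, 1/(s-1) = {pole:.6f}, gap = {zeta(s) - pole:.6f}")
```

The gap between the two sides stays bounded as ${s \rightarrow 1}$, even though both sides blow up; in fact it tends to the Euler-Mascheroni constant ${0.5772\dots}$.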

In a similar vein, the logarithm function ${L: {\bf N} \rightarrow {\bf C}}$ is approximately similar to the logarithm function ${L_\infty: {\bf N}_\infty \rightarrow {\bf C}}$, giving for instance the crude form

$\displaystyle \sum_{n \leq x} L(n) = \log \lfloor x \rfloor! \approx x \log x - x + 1 = \int_1^x L_\infty(t)\ dt$

of Stirling’s formula, or the Dirichlet series approximation

$\displaystyle {\mathcal D}[L](s) = -\zeta'(s) \approx \frac{1}{(s-1)^2} = {\mathcal D}_\infty[L_\infty](s).$

The continuous analogue of Dirichlet convolution is multiplicative convolution using the multiplicative Haar measure ${\frac{dt}{t}}$: given two continuous arithmetic functions ${f_\infty, g_\infty: {\bf N}_\infty \rightarrow {\bf C}}$, one can define their convolution ${f_\infty *_\infty g_\infty: {\bf N}_\infty \rightarrow {\bf C}}$ by the formula

$\displaystyle f_\infty *_\infty g_\infty(t) := \int_1^t f_\infty(s) g_\infty(\frac{t}{s}) \frac{ds}{s}.$

Thus for instance ${1_\infty *_\infty 1_\infty = L_\infty}$. A short computation using Fubini’s theorem shows the analogue

$\displaystyle {\mathcal D}_\infty[f_\infty *_\infty g_\infty](s) = {\mathcal D}_\infty[f_\infty](s) {\mathcal D}_\infty[g_\infty](s)$

of (3) whenever the real part of ${s}$ is large enough that Fubini’s theorem can be justified; similarly, differentiation under the integral sign shows that

$\displaystyle {\mathcal D}_\infty[L_\infty f_\infty](s) = -\frac{d}{ds} {\mathcal D}_\infty[f_\infty](s) \ \ \ \ \ (5)$

again assuming that the real part of ${s}$ is large enough that differentiation under the integral sign (or some other tool like this, such as the Cauchy integral formula for derivatives) can be justified.

Direct calculation shows that for any complex number ${\rho}$, one has

$\displaystyle \frac{1}{s-\rho} = {\mathcal D}_\infty[ t \mapsto t^{\rho-1} ](s)$

(at least for the real part of ${s}$ large enough), and hence by several applications of (5)

$\displaystyle \frac{1}{(s-\rho)^k} = {\mathcal D}_\infty[ t \mapsto \frac{1}{(k-1)!} t^{\rho-1} \log^{k-1} t ](s)$

for any natural number ${k}$. This suggests the following heuristic: if a Dirichlet series ${{\mathcal D}[f](s)}$ behaves like a linear combination of poles ${\frac{1}{(s-\rho)^k}}$, in that

$\displaystyle {\mathcal D}[f](s) \approx \sum_\rho \frac{c_\rho}{(s-\rho)^{k_\rho}}$

for some set ${\rho}$ of poles and some coefficients ${c_\rho}$ and natural numbers ${k_\rho}$ (where we again are vague as to what ${\approx}$ means, and how to interpret the sum ${\sum_\rho}$ if the set of poles is infinite), then one should expect the arithmetic function ${f}$ to behave like the continuous arithmetic function

$\displaystyle t \mapsto \sum_\rho \frac{c_\rho}{(k_\rho-1)!} t^{\rho-1} \log^{k_\rho-1} t.$
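The pole identity underlying this heuristic can be checked numerically. The sketch below is illustrative only: the names `mellin_type_transform` and `pole_profile`, the truncation point `upper` and the step count are my own assumptions. It substitutes ${t = e^u}$, truncates the integral, and applies Simpson's rule, which is reasonable only when the real part of ${s}$ exceeds that of ${\rho}$ by a comfortable margin.

```python
import cmath
from math import exp, log, factorial

def mellin_type_transform(f, s, upper=40.0, steps=40000):
    """Approximate the integral of f(t) t^{-s} dt over [1, infinity).

    After substituting t = e^u the integrand becomes f(e^u) e^{(1-s)u};
    this is integrated over 0 <= u <= upper by composite Simpson's rule
    (steps must be even, and the tail beyond upper is assumed negligible).
    """
    h = upper / steps

    def g(u):
        return f(exp(u)) * cmath.exp((1 - s) * u)

    total = g(0.0) + g(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3

def pole_profile(rho, k):
    """The function t -> t^{rho-1} log^{k-1}(t) / (k-1)!."""
    return lambda t: t ** (rho - 1) * log(t) ** (k - 1) / factorial(k - 1)

s, rho = 3.0, complex(0.5, 2.0)
for k in (1, 2, 3):
    numeric = mellin_type_transform(pole_profile(rho, k), s)
    exact = 1 / (s - rho) ** k
    print(k, abs(numeric - exact))
```

The printed discrepancies should be tiny, reflecting only the quadrature error.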

In particular, if we only have simple poles,

$\displaystyle {\mathcal D}[f](s) \approx \sum_\rho \frac{c_\rho}{s-\rho}$

then we expect ${f}$ to behave like the continuous arithmetic function

$\displaystyle t \mapsto \sum_\rho c_\rho t^{\rho-1}.$

Integrating this from ${1}$ to ${x}$ heuristically suggests an approximation

$\displaystyle \sum_{n \leq x} f(n) \approx \sum_\rho c_\rho \frac{x^\rho-1}{\rho}$

for the summatory function, and similarly

$\displaystyle \sum_{n \leq x} \frac{f(n)}{n} \approx \sum_\rho c_\rho \frac{x^{\rho-1}-1}{\rho-1},$

with the convention that ${\frac{x^\rho-1}{\rho}}$ is ${\log x}$ when ${\rho=0}$, and similarly ${\frac{x^{\rho-1}-1}{\rho-1}}$ is ${\log x}$ when ${\rho=1}$. One can make these sorts of approximations more rigorous by means of Perron’s formula (or one of its variants) combined with the residue theorem, provided that one has good enough control on the relevant Dirichlet series, but we will not pursue these rigorous calculations here. (But see for instance this previous blog post for some examples.)

For instance, using the more refined approximation

$\displaystyle \zeta(s) \approx \frac{1}{s-1} + \gamma$

to the zeta function near ${s=1}$ (with ${\gamma}$ the Euler-Mascheroni constant), we have

$\displaystyle {\mathcal D}[d_2](s) = \zeta^2(s) \approx \frac{1}{(s-1)^2} + \frac{2 \gamma}{s-1},$

and so we would expect that

$\displaystyle d_2 \approx L_\infty + 2 \gamma$

and thus for instance

$\displaystyle \sum_{n \leq x} d_2(n) \approx x \log x - x + 2 \gamma x$

which matches what one actually gets from the Dirichlet hyperbola method (see e.g. equation (44) of this previous post).
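This prediction for the divisor sum is simple to test on a computer. The following sketch (with function names of my own choosing) computes the divisor sum exactly via the identity ${\sum_{n \leq x} d_2(n) = \sum_{k \leq x} \lfloor x/k \rfloor}$ and compares it with the heuristic main term.

```python
from math import log

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def divisor_sum(x):
    """Exact value of sum_{n <= x} d_2(n) for a positive integer x.

    Each k <= x divides exactly floor(x/k) of the integers 1..x.
    """
    return sum(x // k for k in range(1, x + 1))

def heuristic_divisor_sum(x):
    """The heuristic prediction x log x - x + 2 gamma x."""
    return x * log(x) - x + 2 * EULER_GAMMA * x

for x in (10 ** 3, 10 ** 4, 10 ** 5):
    exact = divisor_sum(x)
    predicted = heuristic_divisor_sum(x)
    print(x, exact, round(predicted, 1), round(exact - predicted, 1))
```

The discrepancy should be of much lower order than ${x}$, consistent with the ${O(\sqrt{x})}$ error term that the hyperbola method provides.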

Or, noting that ${\zeta(s)}$ has a simple pole at ${s=1}$ and assuming simple zeroes elsewhere, the log derivative ${-\zeta'(s)/\zeta(s)}$ will have simple poles of residue ${+1}$ at ${s=1}$ and ${-1}$ at all the zeroes, leading to the heuristic

$\displaystyle {\mathcal D}[\Lambda](s) = -\frac{\zeta'(s)}{\zeta(s)} \approx \frac{1}{s-1} - \sum_\rho \frac{1}{s-\rho}$

suggesting that ${\Lambda}$ should behave like the continuous arithmetic function

$\displaystyle t \mapsto 1 - \sum_\rho t^{\rho-1}$

leading for instance to the summatory approximation

$\displaystyle \sum_{n \leq x} \Lambda(n) \approx x - \sum_\rho \frac{x^\rho-1}{\rho}$

which is a heuristic form of the Riemann-von Mangoldt explicit formula (see Exercise 45 of these notes for a rigorous version of this formula).
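As a rough numerical illustration of this heuristic, one can truncate the sum over zeroes to the first few conjugate pairs ${\rho = \frac{1}{2} \pm i\gamma}$ and compare against the true summatory function. This is only a sketch: the function names are mine, the ordinates are the standard tabulated values rounded to six decimal places, and with so few zeroes (and with lower order terms ignored) one should expect agreement only up to a modest error.

```python
from math import log

# Ordinates of the first five nontrivial zeroes above the real axis (rounded).
ZERO_ORDINATES = [14.134725, 21.022040, 25.010858, 30.424876, 32.935062]

def chebyshev_psi(x):
    """psi(x) = sum_{n <= x} Lambda(n), adding log p for each prime power p^k <= x."""
    n = int(x)
    is_prime = [True] * (n + 1)
    total = 0.0
    for p in range(2, n + 1):
        if is_prime[p]:
            for m in range(p * p, n + 1, p):
                is_prime[m] = False
            pk = p
            while pk <= n:
                total += log(p)
                pk *= p
    return total

def heuristic_psi(x, ordinates=ZERO_ORDINATES):
    """x minus the sum of (x^rho - 1)/rho over the zeroes 1/2 +- i*gamma listed."""
    total = float(x)
    for gamma in ordinates:
        rho = complex(0.5, gamma)
        # rho and its conjugate give complex-conjugate terms: add twice the real part.
        total -= 2 * ((x ** rho - 1) / rho).real
    return total

for x in (100.0, 1000.0, 10000.0):
    print(x, round(chebyshev_psi(x), 3), round(heuristic_psi(x), 3))
```

Including more ordinates in `ZERO_ORDINATES` lets the truncated sum follow the jumps of ${\sum_{n \leq x} \Lambda(n)}$ at prime powers more closely, though the convergence is slow.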

Exercise 1 Go through some of the other explicit formulae listed at this Wikipedia page and give heuristic justifications for them (up to some lower order terms) by similar calculations to those given above.

Given the “adelic” perspective on number theory, I wonder if there are also ${p}$-adic analogues of arithmetic functions to which a similar set of heuristics can be applied, perhaps to study sums such as ${\sum_{n \leq x: n = a \hbox{ mod } p^j} f(n)}$. A key problem here is that there does not seem to be any good interpretation of the expression ${\frac{1}{t^s}}$ when ${s}$ is complex and ${t}$ is a ${p}$-adic number, so it is not clear that one can analyse a Dirichlet series ${p}$-adically. For similar reasons, we don’t have a canonical way to define ${\chi(t)}$ for a Dirichlet character ${\chi}$ (unless its conductor happens to be a power of ${p}$), so there doesn’t seem to be much to say in the ${q}$-aspect either.