
Previous set of notes: Notes 3. Next set of notes: 246C Notes 1.

One of the great classical triumphs of complex analysis was in providing the first complete proof (by Hadamard and de la Vallée Poussin in 1896) of arguably the most important theorem in analytic number theory, the prime number theorem:

Theorem 1 (Prime number theorem) Let ${\pi(x)}$ denote the number of primes less than a given real number ${x}$. Then

$\displaystyle \lim_{x \rightarrow \infty} \frac{\pi(x)}{x/\ln x} = 1$

(or in asymptotic notation, ${\pi(x) = (1+o(1)) \frac{x}{\ln x}}$ as ${x \rightarrow \infty}$).

(Actually, it turns out to be slightly more natural to replace the approximation ${\frac{x}{\ln x}}$ in the prime number theorem by the logarithmic integral ${\int_2^x \frac{dt}{\ln t}}$, which happens to be a more precise approximation, but we will not stress this point here.)

The complex-analytic proof of this theorem hinges on the study of a key meromorphic function related to the prime numbers, the Riemann zeta function ${\zeta}$. Initially, it is only defined on the half-plane ${\{ s \in {\bf C}: \mathrm{Re} s > 1 \}}$:

Definition 2 (Riemann zeta function, preliminary definition) Let ${s \in {\bf C}}$ be such that ${\mathrm{Re} s > 1}$. Then we define

$\displaystyle \zeta(s) := \sum_{n=1}^\infty \frac{1}{n^s}. \ \ \ \ \ (1)$

Note that the series is locally uniformly convergent in the half-plane ${\{ s \in {\bf C}: \mathrm{Re} s > 1 \}}$, so in particular ${\zeta}$ is holomorphic on this region. In previous notes we have already evaluated some special values of this function:

$\displaystyle \zeta(2) = \frac{\pi^2}{6}; \quad \zeta(4) = \frac{\pi^4}{90}; \quad \zeta(6) = \frac{\pi^6}{945}. \ \ \ \ \ (2)$

However, it turns out that the zeroes (and pole) of this function are of far greater importance to analytic number theory, particularly with regard to the study of the prime numbers.

The Riemann zeta function has several remarkable properties, some of which we summarise here:

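Theorem 3 (Basic properties of the Riemann zeta function)
• (i) (Euler product formula) For any ${s \in {\bf C}}$ with ${\mathrm{Re} s > 1}$, we have

$\displaystyle \zeta(s) = \prod_p (1 - \frac{1}{p^s})^{-1} \ \ \ \ \ (3)$

where the product is absolutely convergent (and locally uniform in ${s}$) and is over the prime numbers ${p = 2, 3, 5, \dots}$.
• (ii) (Trivial zero-free region) ${\zeta}$ has no zeroes in the region ${\{ s \in {\bf C}: \mathrm{Re} s > 1 \}}$.
• (iii) (Meromorphic continuation) ${\zeta}$ has a unique meromorphic continuation to the complex plane (which by abuse of notation we also call ${\zeta}$), with a simple pole at ${s=1}$ and no other poles. Furthermore, the Riemann xi function

$\displaystyle \xi(s) := \frac{1}{2} s(s-1) \pi^{-s/2} \Gamma(s/2) \zeta(s) \ \ \ \ \ (4)$

is an entire function (after removing all singularities).
• (iv) (Functional equation) After applying the meromorphic continuation from (iii), we have

$\displaystyle \zeta(s) = 2^s \pi^{s-1} \sin(\frac{\pi s}{2}) \Gamma(1-s) \zeta(1-s) \ \ \ \ \ (5)$

for all ${s \in {\bf C}}$ (excluding poles). Equivalently, we have

$\displaystyle \xi(s) = \xi(1-s) \ \ \ \ \ (6)$

for all ${s \in {\bf C}}$.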

Proof: We just prove (i) and (ii) for now, leaving (iii) and (iv) for later sections.

The claim (i) is an encoding of the fundamental theorem of arithmetic, which asserts that every natural number ${n}$ is uniquely representable as a product ${n = \prod_p p^{a_p}}$ over primes, where the ${a_p}$ are natural numbers, all but finitely many of which are zero. Writing this representation as ${\frac{1}{n^s} = \prod_p \frac{1}{p^{a_p s}}}$, we see that

$\displaystyle \sum_{n \in S_{x,m}} \frac{1}{n^s} = \prod_{p \leq x} \sum_{a=0}^m \frac{1}{p^{as}}$

whenever ${x \geq 1}$, ${m \geq 0}$, and ${S_{x,m}}$ consists of all the natural numbers of the form ${n = \prod_{p \leq x} p^{a_p}}$ for some ${a_p \leq m}$. Sending ${m}$ and ${x}$ to infinity, we conclude from monotone convergence and the geometric series formula that

$\displaystyle \sum_{n=1}^\infty \frac{1}{n^s} = \prod_{p} \sum_{a=0}^\infty \frac{1}{p^{as}} =\prod_p (1 - \frac{1}{p^s})^{-1}$

whenever ${s>1}$ is real, and then from dominated convergence we see that the same formula holds for complex ${s}$ with ${\mathrm{Re} s > 1}$ as well. Local uniform convergence then follows from the product form of the Weierstrass ${M}$-test (Exercise 19 of Notes 1).

The claim (ii) is immediate from (i) since the Euler product ${\prod_p (1-\frac{1}{p^s})^{-1}}$ is absolutely convergent and all terms are non-zero. $\Box$
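As a numerical sanity check of the Euler product formula, one can compare a truncated Dirichlet series and a truncated Euler product at ${s=2}$ against the value ${\zeta(2) = \pi^2/6}$ from (2). The following is only a minimal sketch: the helper names and the cutoffs ${10^5}$ and ${10^4}$ are arbitrary choices made for illustration.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # Strike out the multiples of p, starting from p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(2, limit + 1) if sieve[n]]

def zeta_partial_sum(s, n_max):
    """Truncation of the Dirichlet series (1) to n <= n_max."""
    return sum(n ** (-s) for n in range(1, n_max + 1))

def euler_partial_product(s, p_max):
    """Truncation of the Euler product to primes p <= p_max."""
    prod = 1.0
    for p in primes_up_to(p_max):
        prod /= 1.0 - p ** (-s)
    return prod

s = 2.0
dirichlet = zeta_partial_sum(s, 100_000)   # illustrative cutoff
euler = euler_partial_product(s, 10_000)   # illustrative cutoff
exact = math.pi ** 2 / 6

print(f"partial sum     : {dirichlet:.6f}")
print(f"partial product : {euler:.6f}")
print(f"pi^2 / 6        : {exact:.6f}")
```

Both truncations approach ${\pi^2/6}$ from below, since every omitted term of the series is positive and every omitted factor of the product exceeds ${1}$.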

We remark that by sending ${s}$ to ${1}$ in Theorem 3(i) we conclude that

$\displaystyle \sum_{n=1}^\infty \frac{1}{n} = \prod_p (1-\frac{1}{p})^{-1}$

and from the divergence of the harmonic series we then conclude Euler’s theorem ${\sum_p \frac{1}{p} = \infty}$. This can be viewed as a weak version of the prime number theorem, and already illustrates the potential applicability of the Riemann zeta function to control the distribution of the prime numbers.

The meromorphic continuation (iii) of the zeta function is initially surprising, but can be interpreted either as a manifestation of the extremely regular spacing of the natural numbers ${n}$ occurring in the sum (1), or as a consequence of various integral representations of ${\zeta}$ (or slight modifications thereof). We will focus in this set of notes on a particular representation of ${\zeta}$ as essentially the Mellin transform of the theta function ${\theta}$ that briefly appeared in previous notes, and the functional equation (iv) can then be viewed as a consequence of the modularity of that theta function. This in turn was established using the Poisson summation formula, so one can view the functional equation as ultimately being a manifestation of Poisson summation. (For a direct proof of the functional equation via Poisson summation, see these notes.)

Henceforth we work with the meromorphic continuation of ${\zeta}$. The functional equation (iv), when combined with special values of ${\zeta}$ such as (2), gives some additional values of ${\zeta}$ outside of its initial domain ${\{s: \mathrm{Re} s > 1\}}$, most famously

$\displaystyle \zeta(-1) = -\frac{1}{12}.$

If one formally compares this formula with (1), one arrives at the infamous identity

$\displaystyle 1 + 2 + 3 + \dots = -\frac{1}{12}$

although this identity has to be interpreted in a suitable non-classical sense in order for it to be rigorous (see this previous blog post for further discussion).
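The value ${\zeta(-1) = -\frac{1}{12}}$ can be checked by direct arithmetic. The sketch below assumes the functional equation in its standard asymmetric form ${\zeta(s) = 2^s \pi^{s-1} \sin(\frac{\pi s}{2}) \Gamma(1-s) \zeta(1-s)}$ and feeds in the special values ${\zeta(2)}$ and ${\zeta(4)}$ from (2); the function name is a hypothetical helper.

```python
import math

def zeta_via_functional_equation(s, zeta_at_reflection):
    """Evaluate zeta(s) from a known value of zeta(1 - s), using
    zeta(s) = 2^s pi^(s-1) sin(pi s / 2) Gamma(1 - s) zeta(1 - s)."""
    return (
        2.0 ** s
        * math.pi ** (s - 1)
        * math.sin(math.pi * s / 2)
        * math.gamma(1 - s)
        * zeta_at_reflection
    )

zeta_2 = math.pi ** 2 / 6    # from (2)
zeta_4 = math.pi ** 4 / 90   # from (2)

zeta_minus_1 = zeta_via_functional_equation(-1.0, zeta_2)
zeta_minus_3 = zeta_via_functional_equation(-3.0, zeta_4)

print(zeta_minus_1, -1 / 12)
print(zeta_minus_3, 1 / 120)
```

By hand, the first computation reads ${2^{-1} \cdot \pi^{-2} \cdot (-1) \cdot \Gamma(2) \cdot \frac{\pi^2}{6} = -\frac{1}{12}}$.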

From Theorem 3 and the non-vanishing nature of ${\Gamma}$, we see that ${\zeta}$ has simple zeroes (known as trivial zeroes) at the negative even integers ${-2, -4, \dots}$, and all other zeroes (the non-trivial zeroes) inside the critical strip ${\{ s \in {\bf C}: 0 \leq \mathrm{Re} s \leq 1 \}}$. (The non-trivial zeroes are conjectured to all be simple, but this is hopelessly far from being proven at present.) As we shall see shortly, these latter zeroes turn out to be closely related to the distribution of the primes. The functional equation tells us that if ${\rho}$ is a non-trivial zero then so is ${1-\rho}$; also, we have the identity

$\displaystyle \zeta(s) = \overline{\zeta(\overline{s})} \ \ \ \ \ (7)$

for all ${s>1}$ by (1), hence for all ${s}$ (except the pole at ${s=1}$) by meromorphic continuation. Thus if ${\rho}$ is a non-trivial zero then so is ${\overline{\rho}}$. We conclude that the set of non-trivial zeroes is symmetric under reflection across both the real axis and the critical line ${\{ s \in {\bf C}: \mathrm{Re} s = \frac{1}{2} \}}$. We have the following infamous conjecture:

Conjecture 4 (Riemann hypothesis) All the non-trivial zeroes of ${\zeta}$ lie on the critical line ${\{ s \in {\bf C}: \mathrm{Re} s = \frac{1}{2} \}}$.

This conjecture would have many implications in analytic number theory, particularly with regard to the distribution of the primes. Of course, it is far from proven at present, but the partial results we have towards this conjecture are still sufficient to establish results such as the prime number theorem.

Return now to the original region where ${\mathrm{Re} s > 1}$. To take more advantage of the Euler product formula (3), we take complex logarithms to conclude that

$\displaystyle -\log \zeta(s) = \sum_p \log(1 - \frac{1}{p^s})$

for suitable branches of the complex logarithm, and then on taking derivatives (using for instance the generalised Cauchy integral formula and Fubini’s theorem to justify the interchange of summation and derivative) we see that

$\displaystyle -\frac{\zeta'(s)}{\zeta(s)} = \sum_p \frac{\ln p/p^s}{1 - \frac{1}{p^s}}.$

From the geometric series formula we have

$\displaystyle \frac{\ln p/p^s}{1 - \frac{1}{p^s}} = \sum_{j=1}^\infty \frac{\ln p}{p^{js}}$

and so (by another application of Fubini’s theorem) we have the identity

$\displaystyle -\frac{\zeta'(s)}{\zeta(s)} = \sum_{n=1}^\infty \frac{\Lambda(n)}{n^s}, \ \ \ \ \ (8)$

for ${\mathrm{Re} s > 1}$, where the von Mangoldt function ${\Lambda(n)}$ is defined to equal ${\Lambda(n) = \ln p}$ whenever ${n = p^j}$ is a power ${p^j}$ of a prime ${p}$ for some ${j=1,2,\dots}$, and ${\Lambda(n)=0}$ otherwise. The contribution of the higher prime powers ${p^2, p^3, \dots}$ is negligible in practice, and as a first approximation one can think of the von Mangoldt function as the indicator function of the primes, weighted by the logarithm function.

The series ${\sum_{n=1}^\infty \frac{1}{n^s}}$ and ${\sum_{n=1}^\infty \frac{\Lambda(n)}{n^s}}$ that show up in the above formulae are examples of Dirichlet series, which are a convenient device to transform various sequences of arithmetic interest into holomorphic or meromorphic functions. Here are some more examples:

Exercise 5 (Standard Dirichlet series) Let ${s}$ be a complex number with ${\mathrm{Re} s > 1}$.
• (i) Show that ${-\zeta'(s) = \sum_{n=1}^\infty \frac{\ln n}{n^s}}$.
• (ii) Show that ${\zeta^2(s) = \sum_{n=1}^\infty \frac{\tau(n)}{n^s}}$, where ${\tau(n) := \sum_{d|n} 1}$ is the divisor function of ${n}$ (the number of divisors of ${n}$).
• (iii) Show that ${\frac{1}{\zeta(s)} = \sum_{n=1}^\infty \frac{\mu(n)}{n^s}}$, where ${\mu(n)}$ is the Möbius function, defined to equal ${(-1)^k}$ when ${n}$ is the product of ${k}$ distinct primes for some ${k \geq 0}$, and ${0}$ otherwise.
• (iv) Show that ${\frac{\zeta(2s)}{\zeta(s)} = \sum_{n=1}^\infty \frac{\lambda(n)}{n^s}}$, where ${\lambda(n)}$ is the Liouville function, defined to equal ${(-1)^k}$ when ${n}$ is the product of ${k}$ (not necessarily distinct) primes for some ${k \geq 0}$.
• (v) Show that ${\log \zeta(s) = \sum_{n=1}^\infty \frac{\Lambda(n)/\ln n}{n^s}}$, where ${\log \zeta}$ is the holomorphic branch of the logarithm that is real for ${s>1}$, and with the convention that ${\Lambda(n)/\ln n}$ vanishes for ${n=1}$.
• (vi) Use the fundamental theorem of arithmetic to show that the von Mangoldt function is the unique function ${\Lambda: {\bf N} \rightarrow {\bf R}}$ such that

$\displaystyle \ln n = \sum_{d|n} \Lambda(d)$

for every positive integer ${n}$. Use this and (i) to provide an alternate proof of the identity (8). Thus we see that (8) is really just another encoding of the fundamental theorem of arithmetic.
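The identity in part (vi) and the Dirichlet series identity (8) are both easy to test numerically. The following sketch tabulates ${\Lambda}$ with a sieve, checks ${\ln n = \sum_{d|n} \Lambda(d)}$ for small ${n}$, and compares the two sides of (8) at ${s=2}$, using part (i) for ${-\zeta'}$. The cutoff ${N = 10^5}$ and the function name are illustrative assumptions, so the two sides of (8) agree only up to truncation error.

```python
import math

def von_mangoldt_table(n_max):
    """Return a list lam with lam[n] = Lambda(n) for 1 <= n <= n_max (lam[0] is unused)."""
    lam = [0.0] * (n_max + 1)
    is_composite = bytearray(n_max + 1)
    for p in range(2, n_max + 1):
        if not is_composite[p]:
            for m in range(p * p, n_max + 1, p):
                is_composite[m] = 1
            # Lambda(p^j) = ln p for every power p^j <= n_max.
            q = p
            while q <= n_max:
                lam[q] = math.log(p)
                q *= p
    return lam

N = 100_000  # illustrative cutoff
lam = von_mangoldt_table(N)

# Exercise 5(vi): ln n = sum over divisors d of n of Lambda(d).
max_defect = max(
    abs(math.log(n) - sum(lam[d] for d in range(1, n + 1) if n % d == 0))
    for n in range(1, 301)
)

# Identity (8) at s = 2, with -zeta'(2) computed from Exercise 5(i).
lhs = sum(math.log(n) / n ** 2 for n in range(2, N + 1)) / sum(
    1 / n ** 2 for n in range(1, N + 1)
)
rhs = sum(lam[n] / n ** 2 for n in range(2, N + 1))

print(f"max defect in ln n = sum Lambda(d): {max_defect:.2e}")
print(f"-zeta'(2)/zeta(2) (truncated)      : {lhs:.5f}")
print(f"sum Lambda(n)/n^2 (truncated)      : {rhs:.5f}")
```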

Given the appearance of the von Mangoldt function ${\Lambda}$, it is natural to reformulate the prime number theorem in terms of this function:

Theorem 6 (Prime number theorem, von Mangoldt form) One has

$\displaystyle \lim_{x \rightarrow \infty} \frac{1}{x} \sum_{n \leq x} \Lambda(n) = 1$

(or in asymptotic notation, ${\sum_{n\leq x} \Lambda(n) = x + o(x)}$ as ${x \rightarrow \infty}$).

Let us see how Theorem 6 implies Theorem 1. Firstly, for any ${x \geq 2}$, we can write

$\displaystyle \sum_{n \leq x} \Lambda(n) = \sum_{p \leq x} \ln p + \sum_{j=2}^\infty \sum_{p \leq x^{1/j}} \ln p.$

The sum ${\sum_{p \leq x^{1/j}} \ln p}$ is non-zero for only ${O(\ln x)}$ values of ${j}$, and is of size ${O( x^{1/2} \ln x )}$, thus

$\displaystyle \sum_{n \leq x} \Lambda(n) = \sum_{p \leq x} \ln p + O( x^{1/2} \ln^2 x ).$

Since ${x^{1/2} \ln^2 x = o(x)}$, we conclude from Theorem 6 that

$\displaystyle \sum_{p \leq x} \ln p = x + o(x)$

as ${x \rightarrow \infty}$. Next, observe from the fundamental theorem of calculus that

$\displaystyle \frac{1}{\ln p} - \frac{1}{\ln x} = \int_p^x \frac{1}{\ln^2 y} \frac{dy}{y}.$

Multiplying by ${\ln p}$ and summing over all primes ${p \leq x}$, we conclude that

$\displaystyle \pi(x) - \frac{\sum_{p \leq x} \ln p}{\ln x} = \int_2^x \sum_{p \leq y} \ln p \frac{1}{\ln^2 y} \frac{dy}{y}.$

From Theorem 6 we certainly have ${\sum_{p \leq y} \ln p = O(y)}$, thus

$\displaystyle \pi(x) - \frac{x + o(x)}{\ln x} = O( \int_2^x \frac{dy}{\ln^2 y} ).$

By splitting the integral into the ranges ${2 \leq y \leq \sqrt{x}}$ and ${\sqrt{x} < y \leq x}$ we see that the right-hand side is ${o(x/\ln x)}$, and Theorem 1 follows.
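The quantities appearing in this argument can be tabulated directly. The sketch below computes ${\sum_{n \leq x} \Lambda(n)}$, ${\sum_{p \leq x} \ln p}$ and ${\pi(x)}$ with a simple sieve (the function name and the sample values of ${x}$ are illustrative choices), and prints the normalised ratios; one sees that the first two are already close to ${1}$, while ${\pi(x) / (x/\ln x)}$ converges much more slowly, consistent with the remark above about the logarithmic integral.

```python
import math

def prime_sums(x):
    """Return (psi, theta, count) at the integer x: the sum of Lambda(n) over n <= x,
    the sum of ln p over primes p <= x, and the number of primes p <= x."""
    is_composite = bytearray(x + 1)
    psi = theta = 0.0
    count = 0
    for p in range(2, x + 1):
        if is_composite[p]:
            continue
        for m in range(p * p, x + 1, p):
            is_composite[m] = 1
        log_p = math.log(p)
        theta += log_p
        count += 1
        # Each prime power p^j <= x contributes ln p to psi.
        q = p
        while q <= x:
            psi += log_p
            q *= p
    return psi, theta, count

for x in (10 ** 3, 10 ** 4, 10 ** 5):
    psi, theta, count = prime_sums(x)
    print(x, round(psi / x, 4), round(theta / x, 4), round(count / (x / math.log(x)), 4))
```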

Exercise 7 Show that Theorem 1 conversely implies Theorem 6.

The alternate form (8) of the Euler product identity connects the primes (represented here via proxy by the von Mangoldt function) with the logarithmic derivative of the zeta function, and can be used as a starting point for describing further relationships between ${\zeta}$ and the primes. Most famously, we shall see later in these notes that it leads to the remarkably precise Riemann-von Mangoldt explicit formula:

Theorem 8 (Riemann-von Mangoldt explicit formula) For any non-integer ${x > 1}$, we have

$\displaystyle \sum_{n \leq x} \Lambda(n) = x - \lim_{T \rightarrow \infty} \sum_{\rho: |\hbox{Im}(\rho)| \leq T} \frac{x^\rho}{\rho} - \ln(2\pi) - \frac{1}{2} \ln( 1 - x^{-2} )$

where ${\rho}$ ranges over the non-trivial zeroes of ${\zeta}$ with imaginary part in ${[-T,T]}$. Furthermore, the convergence of the limit is locally uniform in ${x}$.

Actually, it turns out that this formula is in some sense too precise; in applications it is often more convenient to work with smoothed variants of this formula in which the sum on the left-hand side is smoothed out, but the contribution of zeroes with large imaginary part is damped; see Exercise 22. Nevertheless, this formula clearly illustrates how the non-trivial zeroes ${\rho}$ of the zeta function influence the primes. Indeed, if one formally differentiates the above formula in ${x}$, one is led to the (quite nonrigorous) approximation

$\displaystyle \Lambda(n) \approx 1 - \sum_\rho n^{\rho-1} \ \ \ \ \ (9)$

or (writing ${\rho = \sigma+i\gamma}$)

$\displaystyle \Lambda(n) \approx 1 - \sum_{\sigma+i\gamma} \frac{n^{i\gamma}}{n^{1-\sigma}}.$

Thus we see that each zero ${\rho = \sigma + i\gamma}$ induces an oscillation in the von Mangoldt function, with ${\gamma}$ controlling the frequency of the oscillation and ${\sigma}$ the rate at which the oscillation dies out as ${n \rightarrow \infty}$. This relationship is sometimes known informally as “the music of the primes”.

Comparing Theorem 8 with Theorem 6, it is natural to suspect that the key step in the proof of the latter is to establish the following slight but important extension of Theorem 3(ii), which can be viewed as a very small step towards the Riemann hypothesis:

Theorem 9 (Slight enlargement of zero-free region) There are no zeroes of ${\zeta}$ on the line ${\{ 1+it: t \in {\bf R} \}}$.

It is not quite immediate to see how Theorem 6 follows from Theorem 8 and Theorem 9, but we will demonstrate it below the fold.

Although Theorem 9 only seems like a slight improvement of Theorem 3(ii), proving it is surprisingly non-trivial. The basic idea is the following: if there were a zero at ${1+it}$, then there would also be a different zero at ${1-it}$ (note ${t}$ cannot vanish due to the pole at ${s=1}$), and then the approximation (9) becomes

$\displaystyle \Lambda(n) \approx 1 - n^{it} - n^{-it} + \dots = 1 - 2 \cos(t \ln n) + \dots.$

But the expression ${1 - 2 \cos(t \ln n)}$ can be negative for large regions of the variable ${n}$, whereas ${\Lambda(n)}$ is always non-negative. This conflict eventually leads to a contradiction, but it is not immediately obvious how to make this argument rigorous. We will present here the classical approach to doing so using a trigonometric identity of Mertens.
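The sign conflict, and the kind of trigonometric positivity that resolves it, can be illustrated numerically. The sketch below assumes that the identity of Mertens alluded to here is the standard one, ${3 + 4\cos\theta + \cos 2\theta = 2(1+\cos\theta)^2 \geq 0}$ (the text above does not state it explicitly); the ordinate ${t = 14}$ and the range of ${n}$ are arbitrary illustrative choices. Applied with ${\theta = t \ln n}$ and summed against the non-negative weights ${\frac{\Lambda(n)}{n^\sigma \ln n}}$ from Exercise 5(v), this positivity is what classically yields ${\zeta(\sigma)^3 |\zeta(\sigma+it)|^4 |\zeta(\sigma+2it)| \geq 1}$ for ${\sigma > 1}$.

```python
import math

def naive_kernel(theta):
    """1 - 2 cos(theta): the heuristic main term above, which changes sign."""
    return 1.0 - 2.0 * math.cos(theta)

def mertens_kernel(theta):
    """3 + 4 cos(theta) + cos(2 theta), which equals 2 (1 + cos(theta))^2 >= 0."""
    return 3.0 + 4.0 * math.cos(theta) + math.cos(2.0 * theta)

t = 14.0             # hypothetical ordinate, for illustration only
ns = range(1, 2001)  # illustrative range of n

# Proportion of n in the range for which 1 - 2 cos(t ln n) is negative.
negative_fraction = sum(naive_kernel(t * math.log(n)) < 0 for n in ns) / len(ns)

# The Mertens kernel never goes negative on the same sample.
min_mertens = min(mertens_kernel(t * math.log(n)) for n in ns)

# Numerical check of the identity on a grid of angles.
max_identity_defect = max(
    abs(mertens_kernel(th) - 2.0 * (1.0 + math.cos(th)) ** 2)
    for th in (k * 0.01 for k in range(-700, 701))
)

print(f"fraction of n with 1 - 2cos(t ln n) < 0 : {negative_fraction:.3f}")
print(f"minimum of 3 + 4cos + cos(2.)           : {min_mertens:.3e}")
print(f"defect in the identity                  : {max_identity_defect:.3e}")
```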

In fact, Theorem 9 is basically equivalent to the prime number theorem:

Exercise 10 For the purposes of this exercise, assume Theorem 6, but do not assume Theorem 9. For any non-zero real ${t}$, show that

$\displaystyle -\frac{\zeta'(\sigma+it)}{\zeta(\sigma+it)} = o( \frac{1}{\sigma-1})$

as ${\sigma \rightarrow 1^+}$, where ${o( \frac{1}{\sigma-1})}$ denotes a quantity that goes to zero as ${\sigma \rightarrow 1^+}$ after being multiplied by ${\sigma-1}$. Use this to derive Theorem 9.

This equivalence can help explain why the prime number theorem is remarkably non-trivial to prove, and why the Riemann zeta function has to be either explicitly or implicitly involved in the proof.

This post is only intended as the briefest of introductions to complex-analytic methods in analytic number theory; also, we have not chosen the shortest route to the prime number theorem, electing instead to travel in directions that particularly showcase the complex-analytic results introduced in this course. For some further discussion see this previous set of lecture notes, particularly Notes 2 and Supplement 3 (with much of the material in this post drawn from the latter).

In a recent post I discussed how the Riemann zeta function ${\zeta}$ can be locally approximated by a polynomial, in the sense that for randomly chosen ${t \in [T,2T]}$ one has an approximation

$\displaystyle \zeta(\frac{1}{2} + it - \frac{2\pi i z}{\log T}) \approx P_t( e^{2\pi i z/N} ) \ \ \ \ \ (1)$

where ${N}$ grows slowly with ${T}$, and ${P_t}$ is a polynomial of degree ${N}$. Assuming the Riemann hypothesis (as we will throughout this post), the zeroes of ${P_t}$ should all lie on the unit circle, and one should then be able to write ${P_t}$ as a scalar multiple of the characteristic polynomial of (the inverse of) a unitary matrix ${U = U_t \in U(N)}$, which we normalise as

$\displaystyle P_t(Z) = \exp(A_t) \mathrm{det}(1 - ZU). \ \ \ \ \ (2)$

Here ${A_t}$ is some quantity depending on ${t}$. We view ${U}$ as a random element of ${U(N)}$; in the limit ${T \rightarrow \infty}$, the GUE hypothesis is equivalent to ${U}$ becoming equidistributed with respect to Haar measure on ${U(N)}$ (also known as the Circular Unitary Ensemble, CUE; it is to the unit circle what the Gaussian Unitary Ensemble (GUE) is on the real line). One can also view ${U}$ as analogous to the “geometric Frobenius” operator in the function field setting, though unfortunately it is difficult at present to make this analogy any more precise (due, among other things, to the lack of a sufficiently satisfactory theory of the “field of one element”).

Taking logarithmic derivatives of (2), we have

$\displaystyle -\frac{P'_t(Z)}{P_t(Z)} = \mathrm{tr}( U (1-ZU)^{-1} ) = \sum_{j=1}^\infty Z^{j-1} \mathrm{tr} U^j \ \ \ \ \ (3)$

and hence on taking logarithmic derivatives of (1) in the ${z}$ variable we (heuristically) have

$\displaystyle -\frac{2\pi i}{\log T} \frac{\zeta'}{\zeta}( \frac{1}{2} + it - \frac{2\pi i z}{\log T}) \approx \frac{2\pi i}{N} \sum_{j=1}^\infty e^{2\pi i jz/N} \mathrm{tr} U^j.$
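The algebraic identity (3) can be sanity-checked for a unitary matrix given by its eigenvalues, since all three expressions depend only on the spectrum (and the scalar ${\exp(A_t)}$ drops out of the logarithmic derivative). In the sketch below the eigenphases and the point ${Z}$ are arbitrary illustrative values with ${|Z| < 1}$, and the derivative is approximated by a finite difference.

```python
import cmath

def char_poly(Z, eigs):
    """det(1 - Z U) for a matrix U with eigenvalues eigs."""
    out = 1.0 + 0.0j
    for lam in eigs:
        out *= 1.0 - Z * lam
    return out

def log_derivative(Z, eigs, h=1e-6):
    """-P'(Z)/P(Z), with P' approximated by a central finite difference."""
    dP = (char_poly(Z + h, eigs) - char_poly(Z - h, eigs)) / (2 * h)
    return -dP / char_poly(Z, eigs)

def resolvent_trace(Z, eigs):
    """tr(U (1 - Z U)^{-1}) = sum over eigenvalues of lam / (1 - Z lam)."""
    return sum(lam / (1.0 - Z * lam) for lam in eigs)

def power_series(Z, eigs, terms=200):
    """Truncation of sum_{j >= 1} Z^(j-1) tr(U^j)."""
    return sum(Z ** (j - 1) * sum(lam ** j for lam in eigs) for j in range(1, terms + 1))

angles = [0.3, 1.1, 2.0, 4.5]                  # arbitrary eigenphases (illustration)
eigs = [cmath.exp(1j * a) for a in angles]
Z = 0.4 + 0.2j                                 # any point with |Z| < 1

print(log_derivative(Z, eigs))
print(resolvent_trace(Z, eigs))
print(power_series(Z, eigs))
```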

Morally speaking, we have

$\displaystyle - \frac{\zeta'}{\zeta}( \frac{1}{2} + it - \frac{2\pi i z}{\log T}) = \sum_{n=1}^\infty \frac{\Lambda(n)}{n^{1/2+it}} e^{2\pi i z (\log n/\log T)}$

so on comparing coefficients we expect to interpret the moments ${\mathrm{tr} U^j}$ of ${U}$ as a finite Dirichlet series:

$\displaystyle \mathrm{tr} U^j \approx \frac{N}{\log T} \sum_{T^{(j-1)/N} < n \leq T^{j/N}} \frac{\Lambda(n)}{n^{1/2+it}}. \ \ \ \ \ (4)$

To understand the distribution of ${U}$ in the unitary group ${U(N)}$, it suffices to understand the distribution of the moments

$\displaystyle {\bf E}_t \prod_{j=1}^k (\mathrm{tr} U^j)^{a_j} (\overline{\mathrm{tr} U^j})^{b_j} \ \ \ \ \ (5)$

where ${{\bf E}_t}$ denotes averaging over ${t \in [T,2T]}$, and ${k, a_1,\dots,a_k, b_1,\dots,b_k \geq 0}$. The GUE hypothesis asserts that in the limit ${T \rightarrow \infty}$, these moments converge to their CUE counterparts

$\displaystyle {\bf E}_{\mathrm{CUE}} \prod_{j=1}^k (\mathrm{tr} U^j)^{a_j} (\overline{\mathrm{tr} U^j})^{b_j} \ \ \ \ \ (6)$

where ${U}$ is now drawn uniformly in ${U(N)}$ with respect to the CUE ensemble, and ${{\bf E}_{\mathrm{CUE}}}$ denotes expectation with respect to that measure.

The moment (6) vanishes unless one has the homogeneity condition

$\displaystyle \sum_{j=1}^k j a_j = \sum_{j=1}^k j b_j. \ \ \ \ \ (7)$

This follows from the fact that for any phase ${\theta \in {\bf R}}$, ${e(\theta) U}$ has the same distribution as ${U}$, where we use the number theory notation ${e(\theta) := e^{2\pi i\theta}}$.

In the case when the degree ${\sum_{j=1}^k j a_j}$ is low, we can use representation theory to establish the following simple formula for the moment (6), as evaluated by Diaconis and Shahshahani:

Proposition 1 (Low moments in CUE model) If

$\displaystyle \sum_{j=1}^k j a_j \leq N, \ \ \ \ \ (8)$

then the moment (6) vanishes unless ${a_j=b_j}$ for all ${j}$, in which case it is equal to

$\displaystyle \prod_{j=1}^k j^{a_j} a_j!. \ \ \ \ \ (9)$

Another way of viewing this proposition is that for ${U}$ distributed according to CUE, the random variables ${\mathrm{tr} U^j}$ are distributed like independent complex random variables of mean zero and variance ${j}$, as long as one only considers moments obeying (8). This identity definitely breaks down for larger values of ${a_j}$, so one only obtains central limit theorems in certain limiting regimes, notably when one only considers a fixed number of ${j}$’s and lets ${N}$ go to infinity. (The paper of Diaconis and Shahshahani writes ${\sum_{j=1}^k a_j + b_j}$ in place of ${\sum_{j=1}^k j a_j}$, but I believe this to be a typo.)

Proof: Let ${D}$ be the left-hand side of (8). We may assume that (7) holds since we are done otherwise, hence

$\displaystyle D = \sum_{j=1}^k j a_j = \sum_{j=1}^k j b_j.$

Our starting point is Schur-Weyl duality. Namely, writing ${n := N}$ for brevity, we consider the ${n^D}$-dimensional complex vector space

$\displaystyle ({\bf C}^n)^{\otimes D} = {\bf C}^n \otimes \dots \otimes {\bf C}^n.$

This space has an action of the product group ${S_D \times GL_n({\bf C})}$: the symmetric group ${S_D}$ acts by permutation on the ${D}$ tensor factors, while the general linear group ${GL_n({\bf C})}$ acts diagonally on the ${{\bf C}^n}$ factors, and the two actions commute with each other. Schur-Weyl duality gives a decomposition

$\displaystyle ({\bf C}^n)^{\otimes D} \equiv \bigoplus_\lambda V^\lambda_{S_D} \otimes V^\lambda_{GL_n({\bf C})} \ \ \ \ \ (10)$

where ${\lambda}$ ranges over Young tableaux of size ${D}$ with at most ${n}$ rows, ${V^\lambda_{S_D}}$ is the ${S_D}$-irreducible unitary representation corresponding to ${\lambda}$ (which can be constructed for instance using Specht modules), and ${V^\lambda_{GL_n({\bf C})}}$ is the ${GL_n({\bf C})}$-irreducible polynomial representation with highest weight ${\lambda}$.

Let ${\pi \in S_D}$ be a permutation consisting of ${a_j}$ cycles of length ${j}$ (this is uniquely determined up to conjugation), and let ${g \in GL_n({\bf C})}$. The pair ${(\pi,g)}$ then acts on ${({\bf C}^n)^{\otimes D}}$, with the action on basis elements ${e_{i_1} \otimes \dots \otimes e_{i_D}}$ given by

$\displaystyle g e_{\pi(i_1)} \otimes \dots \otimes g e_{\pi(i_D)}.$

The trace of this action can then be computed as

$\displaystyle \sum_{i_1,\dots,i_D \in \{1,\dots,n\}} g_{\pi(i_1),i_1} \dots g_{\pi(i_D),i_D}$

where ${g_{i,j}}$ is the ${ij}$ matrix coefficient of ${g}$. Breaking up into cycles and summing, this is just

$\displaystyle \prod_{j=1}^k \mathrm{tr}(g^j)^{a_j}.$

But we can also compute this trace using the Schur-Weyl decomposition (10), yielding the identity

$\displaystyle \prod_{j=1}^k \mathrm{tr}(g^j)^{a_j} = \sum_\lambda \chi_\lambda(\pi) s_\lambda(g) \ \ \ \ \ (11)$

where ${\chi_\lambda: S_D \rightarrow {\bf C}}$ is the character on ${S_D}$ associated to ${V^\lambda_{S_D}}$, and ${s_\lambda: GL_n({\bf C}) \rightarrow {\bf C}}$ is the character on ${GL_n({\bf C})}$ associated to ${V^\lambda_{GL_n({\bf C})}}$. As is well known, ${s_\lambda(g)}$ is just the Schur polynomial of weight ${\lambda}$ applied to the (algebraic, generalised) eigenvalues of ${g}$. We can specialise to unitary matrices to conclude that

$\displaystyle \prod_{j=1}^k \mathrm{tr}(U^j)^{a_j} = \sum_\lambda \chi_\lambda(\pi) s_\lambda(U)$

and similarly

$\displaystyle \prod_{j=1}^k \mathrm{tr}(U^j)^{b_j} = \sum_\lambda \chi_\lambda(\pi') s_\lambda(U)$

where ${\pi' \in S_D}$ consists of ${b_j}$ cycles of length ${j}$ for each ${j=1,\dots,k}$. On the other hand, the characters ${s_\lambda}$ are an orthonormal system on ${L^2(U(N))}$ with the CUE measure. Thus we can write the expectation (6) as

$\displaystyle \sum_\lambda \chi_\lambda(\pi) \overline{\chi_\lambda(\pi')}. \ \ \ \ \ (12)$

Now recall that ${\lambda}$ ranges over all the Young tableaux of size ${D}$ with at most ${N}$ rows. But by (8) we have ${D \leq N}$, and so the condition of having ${N}$ rows is redundant. Hence ${\lambda}$ now ranges over all Young tableaux of size ${D}$, which as is well known enumerates all the irreducible representations of ${S_D}$. One can then use the standard orthogonality properties of characters to show that the sum (12) vanishes if ${\pi}$, ${\pi'}$ are not conjugate, and is equal to ${D!}$ divided by the size of the conjugacy class of ${\pi}$ (or equivalently, by the size of the centraliser of ${\pi}$) otherwise. But the latter expression is easily computed to be ${\prod_{j=1}^k j^{a_j} a_j!}$, giving the claim. $\Box$
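The final step, that the centraliser of a permutation with ${a_j}$ cycles of length ${j}$ has size ${\prod_{j=1}^k j^{a_j} a_j!}$, can be confirmed by brute force in small symmetric groups. The following sketch (with hypothetical helper names) enumerates ${S_D}$ for ${D \leq 5}$, counts commuting permutations directly, and also checks the orbit-stabiliser relation with the size of the conjugacy class.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def cycle_type(perm):
    """Cycle type of a permutation (perm[i] = image of i), as a Counter {length: count}."""
    seen, counts = set(), Counter()
    for start in range(len(perm)):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = perm[i]
            length += 1
        counts[length] += 1
    return counts

def compose(p, q):
    """Composition (p o q)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(p)))

def centraliser_formula(counts):
    """prod_j j^{a_j} a_j! for a cycle type with a_j cycles of length j."""
    out = 1
    for j, a in counts.items():
        out *= j ** a * factorial(a)
    return out

def check(D):
    """Compare the formula with a brute-force count for every element of S_D."""
    group = list(permutations(range(D)))
    class_sizes = Counter(tuple(sorted(cycle_type(p).items())) for p in group)
    for p in group:
        counts = cycle_type(p)
        brute = sum(1 for q in group if compose(p, q) == compose(q, p))
        assert brute == centraliser_formula(counts)
        # Orbit-stabiliser: |centraliser| * |conjugacy class| = D!
        assert brute * class_sizes[tuple(sorted(counts.items()))] == factorial(D)
    return True

results = {D: check(D) for D in range(1, 6)}
print(results)
```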

Example 2 We illustrate the identity (11) when ${D=3}$, ${n \geq 3}$. The Schur polynomials are given as

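$\displaystyle s_{3}(g) = \sum_i \lambda_i^3 + \sum_{i < j} (\lambda_i^2 \lambda_j + \lambda_i \lambda_j^2) + \sum_{i < j < k} \lambda_i \lambda_j \lambda_k$

$\displaystyle s_{2,1}(g) = \sum_{i < j} \lambda_i^2 \lambda_j + \sum_{i < j,k} \lambda_i \lambda_j \lambda_k$

$\displaystyle s_{1,1,1}(g) = \sum_{i < j < k} \lambda_i \lambda_j \lambda_k$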

where ${\lambda_1,\dots,\lambda_n}$ are the (generalised) eigenvalues of ${g}$, and the formula (11) in this case becomes

$\displaystyle \mathrm{tr}(g^3) = s_{3}(g) - s_{2,1}(g) + s_{1,1,1}(g)$

$\displaystyle \mathrm{tr}(g^2) \mathrm{tr}(g) = s_{3}(g) - s_{1,1,1}(g)$

$\displaystyle \mathrm{tr}(g)^3 = s_{3}(g) + 2 s_{2,1}(g) + s_{1,1,1}(g).$

The functions ${s_{1,1,1}, s_{2,1}, s_3}$ are orthonormal on ${U(n)}$, so the three functions ${\mathrm{tr}(g^3), \mathrm{tr}(g^2) \mathrm{tr}(g), \mathrm{tr}(g)^3}$ are mutually orthogonal, and their ${L^2}$ norms are ${\sqrt{3}}$, ${\sqrt{2}}$, and ${\sqrt{6}}$ respectively, reflecting the size in ${S_3}$ of the centralisers of the permutations ${(123)}$, ${(12)}$, and ${\mathrm{id}}$ respectively. If ${n}$ is instead set to say ${2}$, then the ${s_{1,1,1}}$ terms now disappear (the Young tableau here has too many rows), and the three quantities here now have some non-trivial covariance.
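The three identities of this example can be tested on explicit eigenvalues. The sketch below builds ${s_3, s_{2,1}, s_{1,1,1}}$ from monomial symmetric sums (a standard expansion, used here as an assumption rather than taken from the text: ${s_3 = m_3 + m_{2,1} + m_{1,1,1}}$, ${s_{2,1} = m_{2,1} + 2 m_{1,1,1}}$, ${s_{1,1,1} = m_{1,1,1}}$) and compares with the power sums; the eigenphases are arbitrary.

```python
import cmath
from itertools import combinations

def schur_degree_three(lams):
    """Return (s_3, s_21, s_111) evaluated at the eigenvalues lams."""
    idx = range(len(lams))
    m3 = sum(l ** 3 for l in lams)
    m21 = sum(lams[i] ** 2 * lams[j] for i in idx for j in idx if i != j)
    m111 = sum(lams[i] * lams[j] * lams[k] for i, j, k in combinations(idx, 3))
    return m3 + m21 + m111, m21 + 2 * m111, m111

def power_sum(lams, k):
    """tr(g^k) in terms of the eigenvalues."""
    return sum(l ** k for l in lams)

def identity_defects(lams):
    """Absolute errors in the three instances of (11) with D = 3."""
    s3, s21, s111 = schur_degree_three(lams)
    p1, p2, p3 = (power_sum(lams, k) for k in (1, 2, 3))
    return (
        abs(p3 - (s3 - s21 + s111)),
        abs(p2 * p1 - (s3 - s111)),
        abs(p1 ** 3 - (s3 + 2 * s21 + s111)),
    )

eigs4 = [cmath.exp(1j * a) for a in (0.2, 1.3, 2.9, 5.0)]  # n = 4, arbitrary phases
eigs2 = eigs4[:2]                                          # n = 2: s_111 vanishes

defects4 = identity_defects(eigs4)
defects2 = identity_defects(eigs2)
print(defects4)
print(defects2, schur_degree_three(eigs2)[2])
```

The identities persist for ${n=2}$ as polynomial identities; what is lost there is only the orthonormality of the three Schur functions, since ${s_{1,1,1}}$ is identically zero.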

Example 3 Consider the moment ${{\bf E}_{\mathrm{CUE}} |\mathrm{tr} U^j|^2}$. For ${j \leq N}$, the above proposition shows us that this moment is equal to ${j}$. What happens for ${j>N}$? The formula (12) computes this moment as

$\displaystyle \sum_\lambda |\chi_\lambda(\pi)|^2$

where ${\pi}$ is a cycle of length ${j}$ in ${S_j}$, and ${\lambda}$ ranges over all Young tableaux with size ${j}$ and at most ${N}$ rows. The Murnaghan-Nakayama rule tells us that ${\chi_\lambda(\pi)}$ vanishes unless ${\lambda}$ is a hook (all but one of the non-zero rows consisting of just a single box; this also can be interpreted as an exterior power representation on the space ${{\bf C}^j_{\sum=0}}$ of vectors in ${{\bf C}^j}$ whose coordinates sum to zero), in which case it is equal to ${\pm 1}$ (depending on the parity of the number of non-zero rows). As such we see that for ${j > N}$ this moment is equal to ${N}$, the number of hooks of size ${j}$ with at most ${N}$ rows. Thus in general we have

$\displaystyle {\bf E}_{\mathrm{CUE}} |\mathrm{tr} U^j|^2 = \min(j,N). \ \ \ \ \ (13)$
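Formula (13), and the breakdown of Proposition 1 outside the range (8), can be checked for small ${N}$. The sketch below is an independent check that does not follow the representation-theoretic argument above: it assumes the Weyl integration formula, under which the CUE eigenphases have joint density proportional to ${\prod_{a<b} |e^{i\theta_a} - e^{i\theta_b}|^2}$, and integrates on a uniform grid (exact for trigonometric polynomials of frequency below the grid size ${M}$). The function names and the grid size are illustrative.

```python
import cmath
import math
from itertools import product

def cue_expectation(f, N, M=12):
    """E_CUE f(eigenvalues), via the Weyl integration formula on an M^N grid of phases.

    The grid average is exact when all frequencies of the integrand are below M."""
    nodes = [cmath.exp(2j * math.pi * k / M) for k in range(M)]
    total = 0.0
    for eigs in product(nodes, repeat=N):
        vandermonde_sq = 1.0
        for a in range(N):
            for b in range(a + 1, N):
                vandermonde_sq *= abs(eigs[a] - eigs[b]) ** 2
        total += f(eigs) * vandermonde_sq
    # The Vandermonde weight has total mass N! against normalised Lebesgue measure.
    return total / (M ** N * math.factorial(N))

def trace_power(eigs, j):
    """tr(U^j) in terms of the eigenvalues."""
    return sum(z ** j for z in eigs)

# Second moments E |tr U^j|^2, to be compared with min(j, N).
second_moments = {
    (N, j): cue_expectation(lambda e, j=j: abs(trace_power(e, j)) ** 2, N)
    for N in (1, 2, 3)
    for j in range(1, 6)
}

# Fourth moment E |tr U|^4: Proposition 1 predicts 2 once D = 2 <= N.
fourth_moments = {
    N: cue_expectation(lambda e: abs(trace_power(e, 1)) ** 4, N) for N in (1, 2, 3)
}

for key in sorted(second_moments):
    print(key, round(second_moments[key], 10))
print({N: round(v, 10) for N, v in fourth_moments.items()})
```

For ${N=1}$ the fourth moment is ${1}$ rather than ${2}$, illustrating that the Gaussian-type formula (9) fails once (8) is violated.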

Now we discuss what is known for the analogous moments (5). Here we shall be rather non-rigorous, in particular ignoring an annoying “Archimedean” issue that the product of the ranges ${T^{(j-1)/N} < n \leq T^{j/N}}$ and ${T^{(k-1)/N} < n \leq T^{k/N}}$ is not quite the range ${T^{(j+k-1)/N} < n \leq T^{(j+k)/N}}$ but instead leaks into the adjacent range ${T^{(j+k-2)/N} < n \leq T^{(j+k-1)/N}}$. This issue can be addressed by working in a “weak” sense in which parameters such as ${j,k}$ are averaged over fairly long scales, or by passing to a function field analogue of these questions, but we shall simply ignore the issue completely and work at a heuristic level only. For similar reasons we will ignore some technical issues arising from the sharp cutoff of ${t}$ to the range ${[T,2T]}$ (it would be slightly better technically to use a smooth cutoff).

One can morally expand out (5) using (4) as

$\displaystyle (\frac{N}{\log T})^{J+K} \sum_{n_1,\dots,n_J,m_1,\dots,m_K} \frac{\Lambda(n_1) \dots \Lambda(n_J) \Lambda(m_1) \dots \Lambda(m_K)}{n_1^{1/2} \dots n_J^{1/2} m_1^{1/2} \dots m_K^{1/2}} \times \ \ \ \ \ (14)$

$\displaystyle \times {\bf E}_t (m_1 \dots m_K / n_1 \dots n_J)^{it}$

where ${J := \sum_{j=1}^k a_j}$, ${K := \sum_{j=1}^k b_j}$, and the integers ${n_i,m_i}$ are in the ranges

$\displaystyle T^{(j-1)/N} < n_{a_1 + \dots + a_{j-1} + i} \leq T^{j/N}$

for ${j=1,\dots,k}$ and ${1 \leq i \leq a_j}$, and

$\displaystyle T^{(j-1)/N} < m_{b_1 + \dots + b_{j-1} + i} \leq T^{j/N}$

for ${j=1,\dots,k}$ and ${1 \leq i \leq b_j}$. Morally, the expectation here is negligible unless

$\displaystyle m_1 \dots m_K = (1 + O(1/T)) n_1 \dots n_J \ \ \ \ \ (15)$

in which case the expectation oscillates with magnitude one. In particular, if (7) fails (with some room to spare) then the moment (5) should be negligible, which is consistent with the analogous behaviour for the moments (6). Now suppose that (8) holds (with some room to spare). Then ${n_1 \dots n_J}$ is significantly less than ${T}$, so the ${O(1/T)}$ multiplicative error in (15) becomes an additive error of ${o(1)}$. Because of the fundamental integrality gap (the integers are always separated from each other by a distance of at least ${1}$), this forces the integers ${m_1 \dots m_K}$, ${n_1 \dots n_J}$ to in fact be equal:

$\displaystyle m_1 \dots m_K = n_1 \dots n_J. \ \ \ \ \ (16)$

The von Mangoldt factors ${\Lambda(n_1) \dots \Lambda(n_J) \Lambda(m_1) \dots \Lambda(m_K)}$ effectively restrict ${n_1,\dots,n_J,m_1,\dots,m_K}$ to be prime (the effect of prime powers is negligible). By the fundamental theorem of arithmetic, the constraint (16) then forces ${J=K}$, and ${n_1,\dots,n_J}$ to be a permutation of ${m_1,\dots,m_K}$, which then forces ${a_j = b_j}$ for all ${j=1,\dots,k}$. For a given ${n_1,\dots,n_J}$, the number of possible ${m_1,\dots,m_K}$ is then ${\prod_{j=1}^k a_j!}$, and the expectation in (14) is equal to ${1}$. Thus the moment (5) is morally

$\displaystyle (\frac{N}{\log T})^{J+K} \sum_{n_1,\dots,n_J} \frac{\Lambda^2(n_1) \dots \Lambda^2(n_J) }{n_1 \dots n_J} \prod_{j=1}^k a_j!$

and using Mertens’ theorem this soon simplifies asymptotically to the same quantity in Proposition 1. Thus we see that (morally at least) the moments (5) associated to the zeta function asymptotically match the moments (6) coming from the CUE model in the low degree case (8), thus lending support to the GUE hypothesis. (These observations are basically due to Rudnick and Sarnak, with the degree ${1}$ case of pair correlations due to Montgomery, and the degree ${2}$ case due to Hejhal.)

With some rare exceptions (such as those estimates coming from “Kloostermania”), the moment estimates of Rudnick and Sarnak basically represent the state of the art for what is known for the moments (5). For instance, Montgomery’s pair correlation conjecture, in our language, is basically the analogue of (13) for ${{\mathbf E}_t}$, thus

$\displaystyle {\bf E}_{t} |\mathrm{tr} U^j|^2 \approx \min(j,N) \ \ \ \ \ (17)$

for all ${j \geq 0}$. Montgomery showed this for (essentially) the range ${j \leq N}$ (as remarked above, this is a special case of the Rudnick-Sarnak result), but no further cases of this conjecture are known.

These estimates can be used to give some non-trivial information on the largest and smallest spacings between zeroes of the zeta function, which in our notation corresponds to spacing between eigenvalues of ${U}$. One such method used today for this is due to Montgomery and Odlyzko and was greatly simplified by Conrey, Ghosh, and Gonek. The basic idea, translated to our random matrix notation, is as follows. Suppose ${Q_t(Z)}$ is some random polynomial depending on ${t}$ of degree at most ${N}$. Let ${\lambda_1,\dots,\lambda_N}$ denote the eigenvalues of ${U}$, and let ${c > 0}$ be a parameter. Observe from the pigeonhole principle that if the quantity

$\displaystyle \sum_{j=1}^N \int_0^{c/N} |Q_t( e(\theta) \lambda_j )|^2\ d\theta \ \ \ \ \ (18)$

exceeds the quantity

$\displaystyle \int_{0}^{1} |Q_t(e(\theta))|^2\ d\theta, \ \ \ \ \ (19)$

then the arcs ${\{ e(\theta) \lambda_j: 0 \leq \theta \leq c/N \}}$ cannot all be disjoint, and hence there exists a pair of eigenvalues making an angle of less than ${c/N}$ (${c}$ times the mean angle separation). Similarly, if the quantity (18) falls below that of (19), then these arcs cannot cover the unit circle, and hence there exists a pair of eigenvalues making an angle of greater than ${c}$ times the mean angle separation. By judiciously choosing the coefficients of ${Q_t}$ as functions of the moments ${\mathrm{tr}(U^j)}$, one can ensure that both quantities (18), (19) can be computed by the Rudnick-Sarnak estimates (or estimates of equivalent strength); indeed, from the residue theorem one can write (18) as

$\displaystyle \frac{1}{2\pi i} \int_0^{c/N} (\int_{|z| = 1+\varepsilon} - \int_{|z|=1-\varepsilon}) Q_t( e(\theta) z ) \overline{Q_t}( \frac{1}{e(\theta) z} ) \frac{P'_t(z)}{P_t(z)}\ dz\ d\theta$

for sufficiently small ${\varepsilon>0}$, and this can be computed (in principle, at least) using (3) if the coefficients of ${Q_t}$ are in an appropriate form. Using this sort of technology (translated back to the Riemann zeta function setting), one can show that gaps between consecutive zeroes of zeta are less than ${\mu}$ times the mean spacing and greater than ${\lambda}$ times the mean spacing infinitely often for certain ${0 < \mu < 1 < \lambda}$; the current records are ${\mu = 0.50412}$ (due to Goldston and Turnage-Butterbaugh) and ${\lambda = 3.18}$ (due to Bui and Milinovich, who input some additional estimates beyond the Rudnick-Sarnak set, namely the twisted fourth moment estimates of Bettin, Bui, Li, and Radziwill, and using a technique based on Hall’s method rather than the Montgomery-Odlyzko method).

It would be of great interest if one could push the upper bound ${\mu}$ for the smallest gap below ${1/2}$. The reason for this is that this would then exclude the Alternative Hypothesis that the spacings between zeroes are asymptotically always (or almost always) a non-zero half-integer multiple of the mean spacing, or in our language that the gaps between the phases ${\theta}$ of the eigenvalues ${e^{2\pi i\theta}}$ of ${U}$ are asymptotically always non-zero integer multiples of ${1/2N}$. The significance of this hypothesis is that it is implied by the existence of a Siegel zero (of conductor a small power of ${T}$); see this paper of Conrey and Iwaniec. (In our language, what is going on is that if there is a Siegel zero in which ${L(1,\chi)}$ is very close to zero, then ${1*\chi}$ behaves like the Kronecker delta, and hence (by the Riemann-Siegel formula) the combined ${L}$-function ${\zeta(s) L(s,\chi)}$ will have a polynomial approximation which in our language looks like a scalar multiple of ${1 + e(\theta) Z^{2N+M}}$, where ${q \approx T^{M/N}}$ and ${\theta}$ is a phase. The zeroes of this approximation lie on a coset of the ${(2N+M)^{th}}$ roots of unity; the polynomial ${P}$ is a factor of this approximation and hence its zeroes will also lie in this coset, implying in particular that all eigenvalue spacings are multiples of ${1/(2N+M)}$. Taking ${M = o(N)}$ then gives the claim.)

Unfortunately, the known methods do not seem to break this barrier without some significant new input; already the original paper of Montgomery and Odlyzko observed this limitation for their particular technique (and in fact fall very slightly short, as observed in unpublished work of Goldston and of Milinovich). In this post I would like to record another way to see this, by providing an “alternative” probability distribution to the CUE distribution (which one might dub the Alternative Circular Unitary Ensemble (ACUE)) which is indistinguishable in low moments in the sense that the expectation ${{\bf E}_{ACUE}}$ for this model also obeys Proposition 1, but for which the phase spacings are always a multiple of ${1/2N}$. This shows that if one is to rule out the Alternative Hypothesis (and thus in particular rule out Siegel zeroes), one needs to input some additional moment information beyond Proposition 1. It would be interesting to see if any of the other known moment estimates that go beyond this proposition are consistent with this alternative distribution. (UPDATE: it looks like they are, see Remark 7 below.)

To describe this alternative distribution, let us first recall the Weyl description of the CUE measure on the unitary group ${U(N)}$ in terms of the distribution of the phases ${\theta_1,\dots,\theta_N \in {\bf R}/{\bf Z}}$ of the eigenvalues, randomly permuted in any order. This distribution is given by the probability measure

$\displaystyle \frac{1}{N!} |V(\theta)|^2\ d\theta_1 \dots d\theta_N; \ \ \ \ \ (20)$

where

$\displaystyle V(\theta) := \prod_{1 \leq i < j \leq N} (e(\theta_i) - e(\theta_j))$

is the Vandermonde determinant; see for instance this previous blog post for the derivation of a very similar formula for the GUE distribution, which can be adapted to CUE without much difficulty. To see that this is a probability measure, first observe the Vandermonde determinant identity

$\displaystyle V(\theta) = \sum_{\pi \in S_N} \mathrm{sgn}(\pi) e(\theta \cdot \pi(\rho))$

where ${\theta := (\theta_1,\dots,\theta_N)}$, ${\cdot}$ denotes the dot product, and ${\rho := (1,2,\dots,N)}$ is the “long word”, which implies that (20) is a trigonometric series with constant term ${1}$; it is also clearly non-negative, so it is a probability measure. One can thus generate a random CUE matrix by first drawing ${(\theta_1,\dots,\theta_N) \in ({\bf R}/{\bf Z})^N}$ using the probability measure (20), and then generating ${U}$ to be a random unitary matrix with eigenvalues ${e(\theta_1),\dots,e(\theta_N)}$.

For the alternative distribution, we first draw ${(\theta_1,\dots,\theta_N)}$ on the discrete torus ${(\frac{1}{2N}{\bf Z}/{\bf Z})^N}$ (thus each ${e(\theta_j)}$ is a ${2N^{th}}$ root of unity) with probability density function

$\displaystyle \frac{1}{(2N)^N} \frac{1}{N!} |V(\theta)|^2 \ \ \ \ \ (21)$

shift by a phase ${\alpha \in {\bf R}/{\bf Z}}$ drawn uniformly at random, and then select ${U}$ to be a random unitary matrix with eigenvalues ${e(\theta_1+\alpha), \dots, e(\theta_N+\alpha)}$. Let us first verify that (21) is a probability density function. Clearly it is non-negative. It is the linear combination of exponentials of the form ${e(\theta \cdot (\pi(\rho)-\pi'(\rho)))}$ for ${\pi,\pi' \in S_N}$. The diagonal contribution ${\pi=\pi'}$ gives the constant function ${\frac{1}{(2N)^N}}$, which has total mass one. All of the other exponentials have a frequency ${\pi(\rho)-\pi'(\rho)}$ that is not a multiple of ${2N}$, and hence will have mean zero on ${(\frac{1}{2N}{\bf Z}/{\bf Z})^N}$. The claim follows.

From construction it is clear that the matrix ${U}$ drawn from this alternative distribution will have all eigenvalue phase spacings be a non-zero multiple of ${1/2N}$. Now we verify that the alternative distribution also obeys Proposition 1. The alternative distribution remains invariant under rotation by phases, so the claim is again clear when (8) fails. Inspecting the proof of that proposition, we see that it suffices to show that the Schur polynomials ${s_\lambda}$ with ${\lambda}$ of size at most ${N}$ and of equal size remain orthonormal with respect to the alternative measure. That is to say,

$\displaystyle \int_{U(N)} s_\lambda(U) \overline{s_{\lambda'}(U)}\ d\mu_{\mathrm{CUE}}(U) = \int_{U(N)} s_\lambda(U) \overline{s_{\lambda'}(U)}\ d\mu_{\mathrm{ACUE}}(U)$

when ${\lambda,\lambda'}$ have size equal to each other and at most ${N}$. In this case the phase ${\alpha}$ in the definition of ${U}$ is irrelevant. In terms of eigenvalue measures, we are then reduced to showing that

$\displaystyle \int_{({\bf R}/{\bf Z})^N} s_\lambda(\theta) \overline{s_{\lambda'}(\theta)} |V(\theta)|^2\ d\theta = \frac{1}{(2N)^N} \sum_{\theta \in (\frac{1}{2N}{\bf Z}/{\bf Z})^N} s_\lambda(\theta) \overline{s_{\lambda'}(\theta)} |V(\theta)|^2.$

By Fourier decomposition, it then suffices to show that the trigonometric polynomial ${s_\lambda(\theta) \overline{s_{\lambda'}(\theta)} |V(\theta)|^2}$ does not contain any components of the form ${e( \theta \cdot 2N k)}$ for some non-zero lattice vector ${k \in {\bf Z}^N}$. But we have already observed that ${|V(\theta)|^2}$ is a linear combination of plane waves of the form ${e(\theta \cdot (\pi(\rho)-\pi'(\rho)))}$ for ${\pi,\pi' \in S_N}$. Also, as is well known, ${s_\lambda(\theta)}$ is a linear combination of plane waves ${e( \theta \cdot \kappa )}$ where ${\kappa}$ is majorised by ${\lambda}$, and similarly ${s_{\lambda'}(\theta)}$ is a linear combination of plane waves ${e( \theta \cdot \kappa' )}$ where ${\kappa'}$ is majorised by ${\lambda'}$. So the product ${s_\lambda(\theta) \overline{s_{\lambda'}(\theta)} |V(\theta)|^2}$ is a linear combination of plane waves of the form ${e(\theta \cdot (\kappa - \kappa' + \pi(\rho) - \pi'(\rho)))}$. But every coefficient of the vector ${\kappa - \kappa' + \pi(\rho) - \pi'(\rho)}$ lies between ${1-2N}$ and ${2N-1}$, and so cannot be of the form ${2Nk}$ for any non-zero lattice vector ${k}$, giving the claim.

Example 4 If ${N=2}$, then the distribution (21) assigns a probability of ${\frac{1}{4^2 2!} 2}$ to any pair ${(\theta_1,\theta_2) \in (\frac{1}{4} {\bf Z}/{\bf Z})^2}$ that is a permuted rotation of ${(0,\frac{1}{4})}$, and a probability of ${\frac{1}{4^2 2!} 4}$ to any pair that is a permuted rotation of ${(0,\frac{1}{2})}$. Thus, a matrix ${U}$ drawn from the alternative distribution will be conjugate to a phase rotation of ${\mathrm{diag}(1, i)}$ with probability ${1/2}$, and to a phase rotation of ${\mathrm{diag}(1,-1)}$ with probability ${1/2}$.

A similar computation when ${N=3}$ gives ${U}$ conjugate to a phase rotation of ${\mathrm{diag}(1, e(1/6), e(1/3))}$ with probability ${1/12}$, to a phase rotation of ${\mathrm{diag}( 1, e(1/6), -1)}$ or its adjoint with probability of ${1/3}$ each, and a phase rotation of ${\mathrm{diag}(1, e(1/3), e(2/3))}$ with probability ${1/4}$.
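The probabilities quoted in Example 4 and in the ${N=3}$ computation can be reproduced by brute-force enumeration of the discrete density (21). The following Python sketch is an illustration only and is not part of the original argument; the helper names `e`, `vandermonde_sq` and `acue_gap_pattern_probs` are hypothetical, and each configuration is summarised (as an assumption of this sketch) by its cyclic tuple of gaps measured in units of ${1/2N}$, taken up to rotation.

```python
import cmath
import itertools
import math
from collections import defaultdict


def e(x):
    """e(x) = exp(2 pi i x), the convention used in the text."""
    return cmath.exp(2j * math.pi * x)


def vandermonde_sq(phases):
    """|V(theta)|^2 = prod_{i<j} |e(theta_i) - e(theta_j)|^2."""
    out = 1.0
    for a, b in itertools.combinations(phases, 2):
        out *= abs(e(a) - e(b)) ** 2
    return out


def acue_gap_pattern_probs(N):
    """Total mass that (21) gives to each cyclic gap pattern (units of 1/2N)."""
    M = 2 * N
    probs = defaultdict(float)
    for idx in itertools.product(range(M), repeat=N):
        weight = vandermonde_sq([k / M for k in idx])
        weight /= M ** N * math.factorial(N)
        if weight < 1e-12:
            continue  # tuples with a repeated phase carry zero mass
        pts = sorted(idx)
        gaps = tuple((pts[(i + 1) % N] - pts[i]) % M for i in range(N))
        # Canonical representative of the gap tuple up to cyclic rotation.
        key = min(gaps[i:] + gaps[:i] for i in range(N))
        probs[key] += weight
    return dict(probs)


if __name__ == "__main__":
    for N in (2, 3):
        print(N, acue_gap_pattern_probs(N))
```

For ${N=2}$ the pattern `(1, 3)` corresponds to a phase rotation of ${\mathrm{diag}(1,i)}$ and `(2, 2)` to a phase rotation of ${\mathrm{diag}(1,-1)}$; for ${N=3}$ the patterns `(1, 1, 4)`, `(1, 2, 3)`, `(1, 3, 2)` and `(2, 2, 2)` correspond to the four cases listed above. The enumeration has ${(2N)^N}$ terms, so this is only practical for very small ${N}$.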

Remark 5 For large ${N}$ it does not seem that this specific alternative distribution is the only distribution consistent with Proposition 1 and which has all phase spacings a non-zero multiple of ${1/2N}$; in particular, it may not be the only distribution consistent with a Siegel zero. Still, it is a very explicit distribution that might serve as a test case for the limitations of various arguments for controlling quantities such as the largest or smallest spacing between zeroes of zeta. The ACUE is in some sense the distribution that maximally resembles CUE (in the sense that it has the greatest number of Fourier coefficients agreeing) while still also being consistent with the Alternative Hypothesis, and so should be the most difficult enemy to eliminate if one wishes to disprove that hypothesis.

In some cases, even just a tiny improvement in known results would be able to exclude the alternative hypothesis. For instance, if the alternative hypothesis held, then ${|\mathrm{tr}(U^j)|}$ is periodic in ${j}$ with period ${2N}$, so from Proposition 1 for the alternative distribution one has

$\displaystyle {\bf E}_{\mathrm{ACUE}} |\mathrm{tr} U^j|^2 = \min_{k \in {\bf Z}} |j-2Nk|$

which differs from (13) for any ${|j| > N}$. (This fact was implicitly observed recently by Baluyot, in the original context of the zeta function.) Thus a verification of the pair correlation conjecture (17) for even a single ${j}$ with ${|j| > N}$ would rule out the alternative hypothesis. Unfortunately, such a verification appears to be of comparable difficulty with (an averaged version of) the Hardy-Littlewood conjecture, with power saving error term. (This is consistent with the fact that Siegel zeroes can cause distortions in the Hardy-Littlewood conjecture, as (implicitly) discussed in this previous blog post.)
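For very small ${N}$ one can check the ACUE second moments of ${\mathrm{tr}(U^j)}$ directly from the density (21). The sketch below is only a numerical illustration under stated assumptions: the function name `acue_trace_second_moment` is hypothetical, the random phase ${\alpha}$ is omitted because it does not affect ${|\mathrm{tr}(U^j)|}$, and the comparison is restricted to ${0 < j < 2N}$.

```python
import cmath
import itertools
import math


def e(x):
    """e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * math.pi * x)


def acue_trace_second_moment(N, j):
    """E_ACUE |tr U^j|^2, by summing over the discrete torus used in (21)."""
    M = 2 * N
    total = 0.0
    for idx in itertools.product(range(M), repeat=N):
        phases = [k / M for k in idx]
        v2 = 1.0
        for a, b in itertools.combinations(phases, 2):
            v2 *= abs(e(a) - e(b)) ** 2
        weight = v2 / (M ** N * math.factorial(N))
        trace = sum(e(j * th) for th in phases)  # tr(U^j) up to the phase e(j*alpha)
        total += weight * abs(trace) ** 2
    return total


if __name__ == "__main__":
    N = 3
    for j in range(1, 2 * N):
        acue = acue_trace_second_moment(N, j)
        print(j, round(acue, 9), "CUE prediction min(j, N) =", min(j, N))
```

In this range the ACUE values agree with ${\min(j,N)}$ up to ${j=N}$ and then fold back down, which is the discrepancy with (13) described above.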

Remark 6 One can view the CUE as normalised Lebesgue measure on ${U(N)}$ (viewed as a smooth submanifold of ${{\bf C}^{N^2}}$). One can similarly view ACUE as normalised Lebesgue measure on the (disconnected) smooth submanifold of ${U(N)}$ consisting of those unitary matrices whose phase spacings are non-zero integer multiples of ${1/2N}$; informally, ACUE is CUE restricted to this lower dimensional submanifold. As is well known, the phases of CUE eigenvalues form a determinantal point process with kernel ${K(\theta,\theta') = \frac{1}{N} \sum_{j=0}^{N-1} e(j(\theta - \theta'))}$ (or one can equivalently take ${K(\theta,\theta') = \frac{\sin(\pi N (\theta-\theta'))}{N\sin(\pi(\theta-\theta'))}}$); in a similar spirit, the phases of ACUE eigenvalues, once they are rotated to be ${2N^{th}}$ roots of unity, become a discrete determinantal point process on those roots of unity with exactly the same kernel (except for a normalising factor of ${\frac{1}{2}}$). In particular, the ${k}$-point correlation functions of ACUE (after this rotation) are precisely the restriction of the ${k}$-point correlation functions of CUE after normalisation, that is to say they are proportional to ${\mathrm{det}( K( \theta_i,\theta_j) )_{1 \leq i,j \leq k}}$.

Remark 7 One family of estimates that go beyond the Rudnick-Sarnak family of estimates are twisted moment estimates for the zeta function, such as ones that give asymptotics for

$\displaystyle \int_T^{2T} |\zeta(\frac{1}{2}+it)|^{2k} |Q(\frac{1}{2}+it)|^2\ dt$

for some small even exponent ${2k}$ (almost always ${2}$ or ${4}$) and some short Dirichlet polynomial ${Q}$; see for instance this paper of Bettin, Bui, Li, and Radziwill for some examples of such estimates. The analogous unitary matrix average would be something like

$\displaystyle {\bf E}_t |P_t(1)|^{2k} |Q_t(1)|^2$

where ${Q_t}$ is now some random medium degree polynomial that depends on the unitary matrix ${U}$ associated to ${P_t}$ (and in applications will typically also contain some negative power of ${\exp(A_t)}$ to cancel the corresponding powers of ${\exp(A_t)}$ in ${|P_t(1)|^{2k}}$). Unfortunately such averages generally are unable to distinguish the CUE from the ACUE. For instance, if all the coefficients of ${Q}$ involve products of traces ${\mathrm{tr}(U^k)}$ of total order less than ${N-k}$, then in terms of the eigenvalue phases ${\theta}$, ${|Q(1)|^2}$ is a linear combination of plane waves ${e(\theta \cdot \xi)}$ where the frequencies ${\xi}$ have coefficients of magnitude less than ${N-k}$. On the other hand, as each coefficient of ${P_t}$ is an elementary symmetric function of the eigenvalues, ${P_t(1)}$ is a linear combination of plane waves ${e(\theta \cdot \xi)}$ where the frequencies ${\xi}$ have coefficients of magnitude at most ${1}$. Thus ${|P_t(1)|^{2k} |Q_t(1)|^2}$ is a linear combination of plane waves where the frequencies ${\xi}$ have coefficients of magnitude less than ${N}$, and thus is orthogonal to the difference between the CUE and ACUE measures on the phase torus ${({\bf R}/{\bf Z})^N}$ by the previous arguments. In other words, ${|P_t(1)|^{2k} |Q_t(1)|^2}$ has the same expectation with respect to ACUE as it does with respect to CUE. Thus one can only start distinguishing CUE from ACUE if the mollifier ${Q_t}$ has degree close to or exceeding ${N}$, which corresponds to Dirichlet polynomials ${Q}$ of length close to or exceeding ${T}$, which is far beyond current technology for such moment estimates.

Remark 8 The GUE hypothesis for the zeta function asserts that the average

$\displaystyle \lim_{T \rightarrow \infty} \frac{1}{T} \int_T^{2T} \sum_{\gamma_1,\dots,\gamma_k \hbox{ distinct}} \eta( \frac{\log T}{2\pi}(\gamma_1-t),\dots, \frac{\log T}{2\pi}(\gamma_k-t))\ dt \ \ \ \ \ (22)$

is equal to

$\displaystyle \int_{{\bf R}^k} \eta(x) \det(K(x_i-x_j))_{1 \leq i,j \leq k}\ dx_1 \dots dx_k \ \ \ \ \ (23)$

for any ${k \geq 1}$ and any test function ${\eta: {\bf R}^k \rightarrow {\bf C}}$, where ${K(x) := \frac{\sin \pi x}{\pi x}}$ is the Dyson sine kernel and ${\gamma_i}$ are the ordinates of zeroes of the zeta function. This corresponds to the CUE distribution for ${U}$. The ACUE distribution then corresponds to an “alternative gaussian unitary ensemble (AGUE)” hypothesis, in which the average (22) is instead predicted to equal a Riemann sum version of the integral (23):

$\displaystyle \int_0^1 2^{-k} \sum_{x_1,\dots,x_k \in \frac{1}{2} {\bf Z} + \theta} \eta(x) \det(K(x_i-x_j))_{1 \leq i,j \leq k}\ d\theta.$

This is a stronger version of the alternative hypothesis that the spacing between adjacent zeroes is almost always approximately a half-integer multiple of the mean spacing. I do not know of any known moment estimates for Dirichlet series that are able to eliminate this AGUE hypothesis (even assuming GRH). (UPDATE: These facts have also been independently observed in forthcoming work of Lagarias and Rodgers.)

A useful rule of thumb in complex analysis is that holomorphic functions ${f(z)}$ behave like large degree polynomials ${P(z)}$. This can be evidenced for instance at a “local” level by the Taylor series expansion for a complex analytic function in the disk, or at a “global” level by factorisation theorems such as the Weierstrass factorisation theorem (or the closely related Hadamard factorisation theorem). One can truncate these theorems in a variety of ways (e.g., Taylor’s theorem with remainder) to be able to approximate a holomorphic function by a polynomial on various domains.

In some cases it can be convenient instead to work with polynomials ${P(Z)}$ of another variable ${Z}$ such as ${Z = e^{2\pi i z}}$ (or more generally ${Z=e^{2\pi i z/N}}$ for a scaling parameter ${N}$). In the case of the Riemann zeta function, defined by meromorphic continuation of the formula

$\displaystyle \zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s} \ \ \ \ \ (1)$

one ends up having the following heuristic approximation in the neighbourhood of a point ${\frac{1}{2}+it}$ on the critical line:

Heuristic 1 (Polynomial approximation) Let ${T \ggg 1}$ be a height, let ${t}$ be a “typical” element of ${[T,2T]}$, and let ${1 \lll N \ll \log T}$ be an integer. Let ${\phi_t = \phi_{t,T}: {\bf C} \rightarrow {\bf C}}$ be the linear change of variables

$\displaystyle \phi_t(z) := \frac{1}{2} + it - \frac{2\pi i z}{\log T}.$

Then one has an approximation

$\displaystyle \zeta( \phi_t(z) ) \approx P_t( e^{2\pi i z/N} ) \ \ \ \ \ (2)$

for ${z = o(N)}$ and some polynomial ${P_t = P_{t,T}}$ of degree ${N}$.

The requirement ${z=o(N)}$ is necessary since the right-hand side is periodic with period ${N}$ in the ${z}$ variable (or period ${\frac{2\pi i N}{\log T}}$ in the ${s = \phi_t(z)}$ variable), whereas the zeta function is not expected to have any such periodicity, even approximately.

Let us give two non-rigorous justifications of this heuristic. Firstly, it is standard that inside the critical strip (with ${\mathrm{Im}(s) = O(T)}$) we have an approximate form

$\displaystyle \zeta(s) \approx \sum_{n \leq T} \frac{1}{n^s}$

of (1). If we group the integers ${n}$ from ${1}$ to ${T}$ into ${N}$ bins depending on what powers of ${T^{1/N}}$ they lie between, we thus have

$\displaystyle \zeta(s) \approx \sum_{j=0}^N \sum_{T^{j/N} \leq n < T^{(j+1)/N}} \frac{1}{n^s}$

For ${s = \phi_t(z)}$ with ${z = o(N)}$ and ${T^{j/N} \leq n < T^{(j+1)/N}}$ we heuristically have

$\displaystyle \frac{1}{n^s} \approx \frac{1}{n^{\frac{1}{2}+it}} e^{2\pi i j z / N}$

and so

$\displaystyle \zeta(s) \approx \sum_{j=0}^N a_j(t) (e^{2\pi i z/N})^j$

where ${a_j(t)}$ are the partial Dirichlet series

$\displaystyle a_j(t) \approx \sum_{T^{j/N} \leq n < T^{(j+1)/N}} \frac{1}{n^{\frac{1}{2}+it}}. \ \ \ \ \ (3)$

This gives the desired polynomial approximation.
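The binning step in this first justification can be tried out numerically. The following Python sketch is an illustration only, under assumptions that are not in the original: the function name `compare_binned` is hypothetical, ${z}$ is taken real, the truncated sum ${\sum_{n \leq T} n^{-s}}$ is used as a stand-in for ${\zeta(s)}$, and the only guarantee asserted is the crude triangle-inequality bound ${\frac{2\pi |z|}{N} \sum_{n \leq T} n^{-1/2}}$ on the discrepancy coming from replacing ${\frac{\log n}{\log T}}$ by ${j/N}$.

```python
import cmath
import math


def compare_binned(t, T, N, z):
    """Compare sum_{n<=T} n^{-phi_t(z)} with the binned polynomial sum_j a_j Z^j.

    Returns (direct, binned, bound), where `bound` is the triangle-inequality
    bound on |direct - binned| that is valid for real z.
    """
    log_T = math.log(T)
    coeffs = [0j] * (N + 1)  # a_j(t) as in (3), accumulated bin by bin
    direct = 0j
    weight_sum = 0.0
    for n in range(1, T + 1):
        base = n ** (-(0.5 + 1j * t))  # n^{-1/2 - it}
        log_ratio = math.log(n) / log_T
        # n^{-phi_t(z)} = n^{-1/2-it} * exp(2 pi i z log n / log T)
        direct += base * cmath.exp(2j * math.pi * z * log_ratio)
        j = min(N, int(N * log_ratio))  # bin with T^{j/N} <= n < T^{(j+1)/N}
        coeffs[j] += base
        weight_sum += n ** -0.5
    Z = cmath.exp(2j * math.pi * z / N)
    binned = sum(a * Z ** j for j, a in enumerate(coeffs))
    bound = 2 * math.pi * abs(z) / N * weight_sum
    return direct, binned, bound


if __name__ == "__main__":
    for N in (4, 8, 16):
        direct, binned, bound = compare_binned(t=3000.0, T=2000, N=N, z=0.05)
        print(N, abs(direct - binned), "bound:", bound)
```

The bound is very pessimistic since it ignores all cancellation in the Dirichlet sum; the printed discrepancies are the more informative quantity.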

A second non-rigorous justification is as follows. From factorisation theorems such as the Hadamard factorisation theorem we expect to have

$\displaystyle \zeta(s) \propto \prod_\rho (s-\rho) \times \dots$

where ${\rho}$ runs over the non-trivial zeroes of ${\zeta}$, and there are some additional factors arising from the trivial zeroes and poles of ${\zeta}$ which we will ignore here; we will also completely ignore the issue of how to renormalise the product to make it converge properly. In the region ${s = \frac{1}{2} + it + o( N / \log T) = \phi_t( \{ z: z = o(N) \})}$, the dominant contribution to this product (besides multiplicative constants) should arise from zeroes ${\rho}$ that are also in this region. The Riemann-von Mangoldt formula suggests that for “typical” ${t}$ one should have about ${N}$ such zeroes. If one lets ${\rho_1,\dots,\rho_N}$ be any enumeration of ${N}$ zeroes closest to ${\frac{1}{2}+it}$, and then repeats this set of zeroes periodically by period ${\frac{2\pi i N}{\log T}}$, one then expects to have an approximation of the form

$\displaystyle \zeta(s) \propto \prod_{j=1}^N \prod_{k \in {\bf Z}} (s-(\rho_j+\frac{2\pi i kN}{\log T}) )$

again ignoring all issues of convergence. If one writes ${s = \phi_t(z)}$ and ${\rho_j = \phi_t(\lambda_j)}$, then Euler’s famous product formula for sine basically gives

$\displaystyle \prod_{k \in {\bf Z}} (s-(\rho_j+\frac{2\pi i kN}{\log T}) ) \propto \prod_{k \in {\bf Z}} (z - (\lambda_j+ k N) )$

$\displaystyle \propto (e^{2\pi i z/N} - e^{2\pi i \lambda_j/N})$

(here we are glossing over some technical issues regarding renormalisation of the infinite products, which can be dealt with by studying the asymptotics as ${\mathrm{Im}(z) \rightarrow \infty}$) and hence we expect

$\displaystyle \zeta(s) \propto \prod_{j=1}^N (e^{2\pi i z/N} - e^{2\pi i \lambda_j/N}).$

This again gives the desired polynomial approximation.

Below the fold we give a rigorous version of the second argument suitable for “microscale” analysis. More precisely, we will show

Theorem 2 Let ${N = N(T)}$ be an integer going sufficiently slowly to infinity. Let ${W_0 \ll N}$ go to infinity sufficiently slowly depending on ${N}$. Let ${t}$ be drawn uniformly at random from ${[T,2T]}$. Then with probability ${1-o(1)}$ (in the limit ${T \rightarrow \infty}$), and possibly after adjusting ${N}$ by ${1}$, there exists a polynomial ${P_t(Z)}$ of degree ${N}$ and obeying the functional equation (9) below, such that

$\displaystyle \zeta( \phi_t(z) ) = (1+o(1)) P_t( e^{2\pi i z/N} ) \ \ \ \ \ (4)$

whenever ${|z| \leq W_0}$.

It should be possible to refine the arguments to extend this theorem to the mesoscale setting by letting ${N}$ be anything growing like ${o(\log T)}$, and ${W_0}$ anything growing like ${o(N)}$; also we should be able to delete the need to adjust ${N}$ by ${1}$. We have not attempted these optimisations here.

Many conjectures and arguments involving the Riemann zeta function can be heuristically translated into arguments involving the polynomials ${P_t(Z)}$, which one can view as random degree ${N}$ polynomials if ${t}$ is interpreted as a random variable drawn uniformly at random from ${[T,2T]}$. These can be viewed as providing a “toy model” for the theory of the Riemann zeta function, in which the complex analysis is simplified to the study of the zeroes and coefficients of this random polynomial (for instance, the role of the gamma function is now played by a monomial in ${Z}$). This model also makes the zeta function theory more closely resemble the function field analogues of this theory (in which the analogue of the zeta function is also a polynomial (or a rational function) in some variable ${Z}$, as per the Weil conjectures). The parameter ${N}$ is at our disposal to choose, and reflects the scale ${\approx N/\log T}$ at which one wishes to study the zeta function. For “macroscopic” questions, at which one wishes to understand the zeta function at unit scales, it is natural to take ${N \approx \log T}$ (or very slightly larger), while for “microscopic” questions one would take ${N}$ close to ${1}$ and only growing very slowly with ${T}$. For the intermediate “mesoscopic” scales one would take ${N}$ somewhere between ${1}$ and ${\log T}$. Unfortunately, the statistical properties of ${P_t}$ are only understood well at a conjectural level at present; even if one assumes the Riemann hypothesis, our understanding of ${P_t}$ is largely restricted to the computation of low moments (e.g., the second or fourth moments) of various linear statistics of ${P_t}$ and related functions (e.g., ${1/P_t}$, ${P'_t/P_t}$, or ${\log P_t}$).

Let’s now heuristically explore the polynomial analogues of this theory in a bit more detail. The Riemann hypothesis basically corresponds to the assertion that all the ${N}$ zeroes of the polynomial ${P_t(Z)}$ lie on the unit circle ${|Z|=1}$ (which, after the change of variables ${Z = e^{2\pi i z/N}}$, corresponds to ${z}$ being real); in a similar vein, the GUE hypothesis corresponds to ${P_t(Z)}$ having the asymptotic law of a random scalar ${a_N(t)}$ times the characteristic polynomial of a random unitary ${N \times N}$ matrix. Next, we consider what happens to the functional equation

$\displaystyle \zeta(s) = \chi(s) \zeta(1-s) \ \ \ \ \ (5)$

where

$\displaystyle \chi(s) := 2^s \pi^{s-1} \sin(\frac{\pi s}{2}) \Gamma(1-s).$

A routine calculation involving Stirling’s formula reveals that

$\displaystyle \chi(\frac{1}{2}+it) = (1+o(1)) e^{-2\pi i L(t)} \ \ \ \ \ (6)$

with ${L(t) := \frac{t}{2\pi} \log \frac{t}{2\pi} - \frac{t}{2\pi} + \frac{7}{8}}$; one also has the closely related approximation

$\displaystyle \frac{\chi'}{\chi}(s) = -\log T + O(1) \ \ \ \ \ (7)$

and hence

$\displaystyle \chi(\phi_t(z)) = (1+o(1)) e^{-2\pi i L(t)} e^{2\pi i z} \ \ \ \ \ (8)$

when ${z = o(\log T)}$. Since ${\zeta(1-s) = \overline{\zeta(\overline{1-s})}}$, applying (5) with ${s = \phi_t(z)}$ and using the approximation (2) suggests a functional equation for ${P_t}$:

$\displaystyle P_t(e^{2\pi i z/N}) = e^{-2\pi i L(t)} e^{2\pi i z} \overline{P_t(e^{2\pi i \overline{z}/N})}$

or in terms of ${Z := e^{2\pi i z/N}}$,

$\displaystyle P_t(Z) = e^{-2\pi i L(t)} Z^N \overline{P_t}(1/Z) \ \ \ \ \ (9)$

where ${\overline{P_t}(Z) := \overline{P_t(\overline{Z})}}$ is the polynomial ${P_t}$ with all the coefficients replaced by their complex conjugate. Thus if we write

$\displaystyle P_t(Z) = \sum_{j=0}^N a_j(t) Z^j$

then the functional equation can be written as

$\displaystyle a_j(t) = e^{-2\pi i L(t)} \overline{a_{N-j}(t)}.$

We remark that if we use the heuristic (3) (interpreting the cutoffs in the ${n}$ summation in a suitably vague fashion) then this equation can be viewed as an instance of the Poisson summation formula.
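The Stirling approximation (6) for ${\chi(\frac{1}{2}+it)}$ can be sanity-checked numerically. The sketch below is an illustration under stated assumptions rather than anything from the original text: since the Python standard library has no complex gamma function, the hypothetical helper `log_gamma` evaluates some branch of ${\log \Gamma}$ via the recurrence ${\Gamma(z+1) = z\Gamma(z)}$ followed by a truncated Stirling series, and `chi`, `L` and `chi_approx` are hypothetical names for the two sides of (6).

```python
import cmath
import math


def log_gamma(z):
    """A branch of log Gamma(z): shift z to Re z >= 10, then use Stirling's series."""
    shift = 0j
    while z.real < 10.0:
        shift += cmath.log(z)  # Gamma(z) = Gamma(z + 1) / z
        z += 1
    stirling = ((z - 0.5) * cmath.log(z) - z + 0.5 * math.log(2 * math.pi)
                + 1 / (12 * z) - 1 / (360 * z ** 3) + 1 / (1260 * z ** 5))
    return stirling - shift


def chi(s):
    """chi(s) = 2^s pi^(s-1) sin(pi s / 2) Gamma(1 - s)."""
    return (2 ** s * math.pi ** (s - 1) * cmath.sin(math.pi * s / 2)
            * cmath.exp(log_gamma(1 - s)))


def L(t):
    """The phase L(t) appearing in (6)."""
    u = t / (2 * math.pi)
    return u * math.log(u) - u + 7.0 / 8.0


def chi_approx(t):
    """Right-hand side of (6) without the (1 + o(1)) factor."""
    return cmath.exp(-2j * math.pi * L(t))


if __name__ == "__main__":
    for t in (10.0, 100.0, 300.0):
        exact = chi(0.5 + 1j * t)
        print(t, abs(exact), abs(exact - chi_approx(t)))
```

One expects ${|\chi(\frac{1}{2}+it)|}$ to equal ${1}$ up to rounding, and the gap between the two sides of (6) to shrink as ${t}$ grows. The exponent sizes limit this naive evaluation to moderate ${t}$ (a few hundred) in double precision.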

One consequence of the functional equation is that the zeroes of ${P_t}$ are symmetric with respect to inversion ${Z \mapsto 1/\overline{Z}}$ across the unit circle. This is of course consistent with the Riemann hypothesis, but does not obviously imply it. The phase ${L(t)}$ is of little consequence in this functional equation; one could easily conceal it by working with the phase rotation ${e^{\pi i L(t)} P_t}$ of ${P_t}$ instead.

Another consequence of the functional equation is that ${e^{\pi i L(t)} e^{-i N \theta/2} P_t(e^{i\theta})}$ is real for any ${\theta \in {\bf R}}$; the same is then true for the derivative ${e^{\pi i L(t)} e^{-i N \theta/2} (i e^{i\theta} P'_t(e^{i\theta}) - i \frac{N}{2} P_t(e^{i\theta}))}$. Among other things, this implies that ${P'_t(e^{i\theta})}$ cannot vanish unless ${P_t(e^{i\theta})}$ does also; thus the zeroes of ${P'_t}$ will not lie on the unit circle except where ${P_t}$ has repeated zeroes. The analogous statement is true for ${\zeta}$; the zeroes of ${\zeta'}$ will not lie on the critical line except where ${\zeta}$ has repeated zeroes.

Relating to this fact, it is a classical result of Speiser that the Riemann hypothesis is true if and only if all the zeroes of the derivative ${\zeta'}$ of the zeta function in the critical strip lie on or to the right of the critical line. The analogous result for polynomials is

Proposition 3 We have

$\displaystyle \# \{ |Z| = 1: P_t(Z) = 0 \} = N - 2 \# \{ |Z| > 1: P'_t(Z) = 0 \}$

(where all zeroes are counted with multiplicity.) In particular, the zeroes of ${P_t(Z)}$ all lie on the unit circle if and only if the zeroes of ${P'_t(Z)}$ lie in the closed unit disk.

Proof: From the functional equation we have

$\displaystyle \# \{ |Z| = 1: P_t(Z) = 0 \} = N - 2 \# \{ |Z| > 1: P_t(Z) = 0 \}.$

Thus it will suffice to show that ${P_t}$ and ${P'_t}$ have the same number of zeroes outside the closed unit disk.

Set ${f(z) := z \frac{P'_t(z)}{P_t(z)}}$, then ${f}$ is a rational function that does not have a zero or pole at infinity. For ${e^{i\theta}}$ not a zero of ${P_t}$, we have already seen that ${e^{\pi i L(t)} e^{-i N \theta/2} P_t(e^{i\theta})}$ and ${e^{\pi i L(t)} e^{-i N \theta/2} (i e^{i\theta} P'_t(e^{i\theta}) - i \frac{N}{2} P_t(e^{i\theta}))}$ are real, so on dividing we see that ${i f(e^{i\theta}) - \frac{iN}{2}}$ is always real, that is to say

$\displaystyle \mathrm{Re} f(e^{i\theta}) = \frac{N}{2}.$

(This can also be seen by writing ${f(e^{i\theta}) = \sum_\lambda \frac{1}{1-e^{-i\theta} \lambda}}$, where ${\lambda}$ runs over the zeroes of ${P_t}$, and using the fact that these zeroes are symmetric with respect to reflection across the unit circle.) When ${e^{i\theta}}$ is a zero of ${P_t}$, ${f(z)}$ has a simple pole at ${e^{i\theta}}$ with residue a positive multiple of ${e^{i\theta}}$, and so ${f(z)}$ stays on the right half-plane if one traverses a semicircular arc around ${e^{i\theta}}$ outside the unit disk. From this and continuity we see that ${f}$ stays on the right-half plane in a circle slightly larger than the unit circle, and hence by the argument principle it has the same number of zeroes and poles outside of this circle, giving the claim. $\Box$
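Two facts used in this proof, namely that ${e^{\pi i L(t)} e^{-i N \theta/2} P_t(e^{i\theta})}$ is real and that ${\mathrm{Re} f(e^{i\theta}) = \frac{N}{2}}$, only depend on the functional equation (9), so they can be tested on any polynomial whose coefficients obey ${a_j = e^{-2\pi i L} \overline{a_{N-j}}}$. The following Python sketch does this for a randomly generated polynomial; it is an illustration only, the names `make_self_inversive`, `poly`, `dpoly` and `check_unit_circle` are hypothetical, and the random coefficients are a stand-in that has nothing to do with the actual zeta function.

```python
import cmath
import math
import random


def make_self_inversive(N, L, rng):
    """Random coefficients a_0..a_N with a_j = exp(-2 pi i L) * conj(a_{N-j})."""
    c = cmath.exp(-2j * math.pi * L)
    b = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N + 1)]
    return [b[j] + c * b[N - j].conjugate() for j in range(N + 1)]


def poly(coeffs, Z):
    """Evaluate sum_j coeffs[j] Z^j by Horner's rule."""
    acc = 0j
    for a in reversed(coeffs):
        acc = acc * Z + a
    return acc


def dpoly(coeffs, Z):
    """Evaluate the derivative sum_j j coeffs[j] Z^(j-1)."""
    return poly([j * a for j, a in enumerate(coeffs)][1:], Z)


def check_unit_circle(N=7, L=0.3, seed=0, samples=50):
    """Return (max |Im g|, max |Re f - N/2|) over sample points on |Z| = 1."""
    a = make_self_inversive(N, L, random.Random(seed))
    worst_imag, worst_re = 0.0, 0.0
    for m in range(samples):
        theta = 2 * math.pi * m / samples
        Z = cmath.exp(1j * theta)
        P = poly(a, Z)
        g = cmath.exp(1j * math.pi * L) * cmath.exp(-1j * N * theta / 2) * P
        worst_imag = max(worst_imag, abs(g.imag))
        if abs(P) > 1e-3:  # skip sample points too close to a zero of P
            f = Z * dpoly(a, Z) / P
            worst_re = max(worst_re, abs(f.real - N / 2))
    return worst_imag, worst_re


if __name__ == "__main__":
    print(check_unit_circle())
```

Both returned quantities should be at the level of floating-point rounding error, whatever seed is used.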

From the functional equation and the chain rule, ${Z}$ is a zero of ${P'_t}$ if and only if ${1/\overline{Z}}$ is a zero of ${N P_t - P'_t}$. We can thus write the above proposition in the equivalent form

$\displaystyle \# \{ |Z| = 1: P_t(Z) = 0 \} = N - 2 \# \{ |Z| < 1: NP_t(Z) - P'_t(Z) = 0 \}.$

One can use this identity to get a lower bound on the number of zeroes of ${P_t}$ by the method of mollifiers. Namely, for any other polynomial ${M_t}$, we clearly have

$\displaystyle \# \{ |Z| = 1: P_t(Z) = 0 \}$

$\displaystyle \geq N - 2 \# \{ |Z| < 1: M_t(Z)(NP_t(Z) - P'_t(Z)) = 0 \}.$

By Jensen’s formula, we have for any ${r>1}$ that

$\displaystyle \log |M_t(0)| |NP_t(0)-P'_t(0)|$

$\displaystyle \leq -(\log r) \# \{ |Z| < 1: M_t(Z)(NP_t(Z) - P'_t(Z)) = 0 \}$

$\displaystyle + \frac{1}{2\pi} \int_0^{2\pi} \log |M_t(re^{i\theta})(NP_t(re^{i\theta}) - P'_t(re^{i\theta}))|\ d\theta.$

We therefore have

$\displaystyle \# \{ |Z| = 1: P_t(Z) = 0 \} \geq N + \frac{2}{\log r} \log |M_t(0)| |NP_t(0)-P'_t(0)|$

$\displaystyle - \frac{1}{\log r} \frac{1}{2\pi} \int_0^{2\pi} \log |M_t(re^{i\theta})(NP_t(re^{i\theta}) - P'_t(re^{i\theta}))|^2\ d\theta.$

As the logarithm function is concave, we can apply Jensen’s inequality to conclude

$\displaystyle {\bf E} \# \{ |Z| = 1: P_t(Z) = 0 \} \geq N$

$\displaystyle + {\bf E} \frac{2}{\log r} \log |M_t(0)| |NP_t(0)-P'_t(0)|$

$\displaystyle - \frac{1}{\log r} \log \left( \frac{1}{2\pi} \int_0^{2\pi} {\bf E} |M_t(re^{i\theta})(NP_t(re^{i\theta}) - P'_t(re^{i\theta}))|^2\ d\theta\right).$

where the expectation is over the ${t}$ parameter. It turns out that by choosing the mollifier ${M_t}$ carefully in order to make ${M_t P_t}$ behave like the function ${1}$ (while keeping the degree of ${M_t}$ small enough that one can compute the second moment here), and then optimising in ${r}$, one can use this inequality to get a positive fraction of zeroes of ${P_t}$ on the unit circle on average. This is the polynomial analogue of a classical argument of Levinson, who used this to show that at least one third of the zeroes of the Riemann zeta function are on the critical line; all later improvements on this fraction have been based on some version of Levinson’s method, mainly focusing on more advanced choices for the mollifier ${M_t}$ and of the differential operator ${N - \partial_z}$ that implicitly appears in the above approach. (The most recent lower bound I know of is ${0.4191637}$, due to Pratt and Robles. In principle (as observed by Farmer) this bound can get arbitrarily close to ${1}$ if one is allowed to use arbitrarily long mollifiers, but establishing this seems of comparable difficulty to unsolved problems such as the pair correlation conjecture; see this paper of Radziwill for more discussion.) A variant of these techniques can also establish “zero density estimates” of the following form: for any ${W \geq 1}$, the number of zeroes of ${P_t}$ that lie further than ${\frac{W}{N}}$ from the unit circle is of order ${O( e^{-cW} N )}$ on average for some absolute constant ${c>0}$. Thus, roughly speaking, most zeroes of ${P_t}$ lie within ${O(1/N)}$ of the unit circle. (Analogues of these results for the Riemann zeta function were worked out by Selberg, by Jutila, and by Conrey, with increasingly strong values of ${c}$.)

The zeroes of ${P'_t}$ tend to live somewhat closer to the origin than the zeroes of ${P_t}$. Suppose for instance that we write

$\displaystyle P_t(Z) = \sum_{j=0}^N a_j(t) Z^j = a_N(t) \prod_{j=1}^N (Z - \lambda_j)$

where ${\lambda_1,\dots,\lambda_N}$ are the zeroes of ${P_t(Z)}$, then by evaluating at zero we see that

$\displaystyle \lambda_1 \dots \lambda_N = (-1)^N a_0(t) / a_N(t)$

and the right-hand side is of unit magnitude by the functional equation. However, if we differentiate

$\displaystyle P'_t(Z) = \sum_{j=1}^N a_j(t) j Z^{j-1} = N a_N(t) \prod_{j=1}^{N-1} (Z - \lambda'_j)$

where ${\lambda'_1,\dots,\lambda'_{N-1}}$ are the zeroes of ${P'_t}$, then by evaluating at zero we now see that

$\displaystyle \lambda'_1 \dots \lambda'_{N-1} = (-1)^{N-1} a_1(t) / (N a_N(t)).$

The right-hand side would now be typically expected to be of size ${O(1/N) \approx \exp(- \log N)}$, and so on average we expect the ${\lambda'_j}$ to have magnitude like ${\exp( - \frac{\log N}{N} )}$, that is to say pushed inwards from the unit circle by a distance roughly ${\frac{\log N}{N}}$. The analogous result for the Riemann zeta function is that the zeroes of ${\zeta'(s)}$ at height ${\sim T}$ lie at a distance roughly ${\frac{\log\log T}{\log T}}$ to the right of the critical line on the average; see this paper of Levinson and Montgomery for a precise statement.
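
As a quick check of the arithmetic in this heuristic, here is a minimal sketch that compares the geometric mean ${|a_1(t)/(N a_N(t))|^{1/(N-1)}}$ of the ${|\lambda'_j|}$ with ${\exp(-\frac{\log N}{N})}$. The function name and the choice ${|a_1(t)/a_N(t)| = 1}$ are illustrative assumptions, not something taken from the argument above.

```python
import math

def geometric_mean_modulus(N, ratio=1.0):
    """Geometric mean of |lambda'_1|, ..., |lambda'_{N-1}|, computed from
    |lambda'_1 ... lambda'_{N-1}| = |a_1| / (N |a_N|).

    `ratio` stands for |a_1(t) / a_N(t)|; setting it to 1 is an
    illustrative assumption.
    """
    return (ratio / N) ** (1.0 / (N - 1))

for N in (10, 100, 1000, 10000):
    gm = geometric_mean_modulus(N)
    heuristic = math.exp(-math.log(N) / N)
    # Inward displacement from the unit circle, in units of (log N)/N.
    scaled = (1.0 - gm) * N / math.log(N)
    print(N, round(gm, 6), round(heuristic, 6), round(scaled, 4))
```

The last column should stay close to ${1}$, which is the statement that the typical zero of ${P'_t}$ sits at distance roughly ${\frac{\log N}{N}}$ inside the unit circle.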

In this post we assume the Riemann hypothesis and the simplicity of zeroes, thus the zeroes of ${\zeta}$ in the critical strip take the form ${\frac{1}{2} \pm i \gamma_j}$ for some real number ordinates ${0 < \gamma_1 < \gamma_2 < \dots}$. From the Riemann-von Mangoldt formula, one has the asymptotic

$\displaystyle \gamma_n = (1+o(1)) \frac{2\pi}{\log n} n$

as ${n \rightarrow \infty}$; in particular, the spacing ${\gamma_{n+1} - \gamma_n}$ should behave like ${\frac{2\pi}{\log n}}$ on the average. However, it can happen that some gaps are unusually small compared to other nearby gaps. For the sake of concreteness, let us define a Lehmer pair to be a pair of adjacent ordinates ${\gamma_n, \gamma_{n+1}}$ such that

$\displaystyle \frac{1}{(\gamma_{n+1} - \gamma_n)^2} \geq 1.3 \sum_{m \neq n,n+1} \frac{1}{(\gamma_m - \gamma_n)^2} + \frac{1}{(\gamma_m - \gamma_{n+1})^2}. \ \ \ \ \ (1)$

The specific value of the constant ${1.3}$ is not particularly important here; anything larger than ${\frac{5}{4}}$ would suffice. An example of such a pair would be the classical pair

$\displaystyle \gamma_{6709} = 7005.062866\dots$

$\displaystyle \gamma_{6710} = 7005.100564\dots$

discovered by Lehmer. It follows easily from the main results of Csordas, Smith, and Varga that if an infinite number of Lehmer pairs (in the above sense) existed, then the de Bruijn-Newman constant ${\Lambda}$ is non-negative. This implication is now redundant in view of the unconditional results of this recent paper of Rodgers and myself; however, the question of whether an infinite number of Lehmer pairs exists remains open.
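
To get a feel for how unusual the classical pair is, here is a small arithmetic sketch using only the two truncated ordinates quoted above and the leading-order average spacing ${\frac{2\pi}{\log n}}$. It does not verify the full condition (1), since the right-hand side of (1) needs the other ordinates as well; the variable names are ours.

```python
import math

# Ordinates of the classical Lehmer pair, as quoted above (truncated decimals).
gamma_6709 = 7005.062866
gamma_6710 = 7005.100564
n = 6709

gap = gamma_6710 - gamma_6709
# Leading-order average spacing 2*pi/log(n) from the asymptotic for gamma_n.
mean_gap = 2 * math.pi / math.log(n)
normalised_gap = gap / mean_gap

# Left-hand side of the Lehmer pair condition (1): 1/(gap)^2.
lhs = 1.0 / gap**2

print(f"gap            = {gap:.6f}")
print(f"mean gap       = {mean_gap:.4f}")
print(f"normalised gap = {normalised_gap:.4f}")
print(f"1/gap^2        = {lhs:.1f}")
```

The gap comes out at roughly one twentieth of the average spacing, so the left-hand side of (1) is about ${700}$.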

In this post, I sketch an argument that Brad and I came up with (as initially suggested by Odlyzko) that the GUE hypothesis implies the existence of infinitely many Lehmer pairs. We argue probabilistically: pick a sufficiently large number ${T}$, pick ${n}$ at random from ${T \log T}$ to ${2 T \log T}$ (so that the average gap size is close to ${\frac{2\pi}{\log T}}$), and prove that the Lehmer pair condition (1) occurs with positive probability.

Introduce the renormalised ordinates ${x_n := \frac{\log T}{2\pi} \gamma_n}$ for ${T \log T \leq n \leq 2 T \log T}$, and let ${\varepsilon > 0}$ be a small absolute constant (independent of ${T}$). It will then suffice to show that

$\displaystyle \frac{1}{(x_{n+1} - x_n)^2} \geq$

$\displaystyle 1.3 \sum_{m \in [T \log T, 2T \log T]: m \neq n,n+1} \frac{1}{(x_m - x_n)^2} + \frac{1}{(x_m - x_{n+1})^2}$

$\displaystyle + \frac{1}{6\varepsilon^2}$

(say) with probability ${\gg \varepsilon^4 - o(1)}$, since the contribution of those ${m}$ outside of ${[T \log T, 2T \log T]}$ can be absorbed by the ${\frac{1}{6\varepsilon^2}}$ term with probability ${1-o(1)}$.

As one consequence of the GUE hypothesis, we have ${x_{n+1} - x_n \leq \varepsilon^2}$ with probability ${O(\varepsilon^6)}$. Thus, if ${E := \{ m \in [T \log T, 2T \log T]: x_{m+1} - x_m \leq \varepsilon^2 \}}$, then ${E}$ has density ${O( \varepsilon^6 )}$. Applying the Hardy-Littlewood maximal inequality, we see that with probability ${1-O(\varepsilon^6)}$, we have

$\displaystyle \sup_{h \geq 1} \frac{1}{h} \# ( E \cap [n-h, n+h] ) \leq \frac{1}{10}$

which implies in particular that

$\displaystyle |x_m - x_n|, |x_{m} - x_{n+1}| \gg \varepsilon^2 |m-n|$

for all ${m \in [T \log T, 2 T \log T] \backslash \{ n, n+1\}}$. This implies in particular that

$\displaystyle \sum_{m \in [T \log T, 2T \log T]: |m-n| \geq \varepsilon^{-3}} \frac{1}{(x_m - x_n)^2} + \frac{1}{(x_m - x_{n+1})^2} \ll \varepsilon^{-1}$

and so it will suffice to show that

$\displaystyle \frac{1}{(x_{n+1} - x_n)^2}$

$\displaystyle \geq 1.3 \sum_{m \in [T \log T, 2T \log T]: m \neq n,n+1; |m-n| < \varepsilon^{-3}} \frac{1}{(x_m - x_n)^2} + \frac{1}{(x_m - x_{n+1})^2} + \frac{1}{5\varepsilon^2}$

(say) with probability ${\gg \varepsilon^4 - o(1)}$.

By the GUE hypothesis (and the fact that ${\varepsilon}$ is independent of ${T}$), it suffices to show that a Dyson sine process ${(x_n)_{n \in {\bf Z}}}$, normalised so that ${x_0}$ is the first positive point in the process, obeys the inequality

$\displaystyle \frac{1}{(x_{1} - x_0)^2} \geq 1.3 \sum_{|m| < \varepsilon^{-3}: m \neq 0,1} \frac{1}{(x_m - x_0)^2} + \frac{1}{(x_m - x_1)^2} + \frac{1}{5\varepsilon^2} \ \ \ \ \ (2)$

with probability ${\gg \varepsilon^4}$. However, if we let ${A > 0}$ be a moderately large constant (and assume ${\varepsilon}$ small depending on ${A}$), one can show using ${k}$-point correlation functions for the Dyson sine process (and the fact that the Dyson kernel ${K(x,y) = \sin(\pi(x-y))/\pi(x-y)}$ equals ${1}$ to second order at the origin) that

$\displaystyle {\bf E} N_{[-\varepsilon,0]} N_{[0,\varepsilon]} \gg \varepsilon^4$

$\displaystyle {\bf E} N_{[-\varepsilon,0]} \binom{N_{[0,\varepsilon]}}{2} \ll \varepsilon^7$

$\displaystyle {\bf E} \binom{N_{[-\varepsilon,0]}}{2} N_{[0,\varepsilon]} \ll \varepsilon^7$

$\displaystyle {\bf E} N_{[-\varepsilon,0]} N_{[0,\varepsilon]} N_{[\varepsilon,A^{-1}]} \ll A^{-3} \varepsilon^4$

$\displaystyle {\bf E} N_{[-\varepsilon,0]} N_{[0,\varepsilon]} N_{[-A^{-1}, -\varepsilon]} \ll A^{-3} \varepsilon^4$

$\displaystyle {\bf E} N_{[-\varepsilon,0]} N_{[0,\varepsilon]} N_{[-k, k]}^2 \ll k^2 \varepsilon^4 \ \ \ \ \ (3)$

for any natural number ${k}$, where ${N_{I}}$ denotes the number of elements of the process in ${I}$. For instance, the expression ${{\bf E} N_{[-\varepsilon,0]} \binom{N_{[0,\varepsilon]}}{2} }$ can be written in terms of the three-point correlation function ${\rho_3(x_1,x_2,x_3) = \mathrm{det}(K(x_i,x_j))_{1 \leq i,j \leq 3}}$ as

$\displaystyle \int_{-\varepsilon \leq x_1 \leq 0 \leq x_2 \leq x_3 \leq \varepsilon} \rho_3( x_1, x_2, x_3 )\ dx_1 dx_2 dx_3$

which can easily be estimated to be ${O(\varepsilon^7)}$ (since ${\rho_3 = O(\varepsilon^4)}$ in this region), and similarly for the other estimates claimed above.
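
The first of these estimates can be checked numerically. The sketch below assumes the standard two-point analogue ${\rho_2(x_1,x_2) = \mathrm{det}(K(x_i,x_j))_{1 \leq i,j \leq 2} = 1 - K(x_1,x_2)^2}$ of the formula for ${\rho_3}$, and approximates ${{\bf E} N_{[-\varepsilon,0]} N_{[0,\varepsilon]} = \int_{-\varepsilon}^0 \int_0^\varepsilon \rho_2}$ by a midpoint rule; the function names and the grid size are our own choices.

```python
import math

def dyson_kernel(x, y):
    """Sine kernel K(x, y) = sin(pi (x - y)) / (pi (x - y)), with K(x, x) = 1."""
    d = math.pi * (x - y)
    return 1.0 if d == 0.0 else math.sin(d) / d

def rho2(x, y):
    """Two-point correlation function det [[1, K], [K, 1]] = 1 - K(x, y)^2."""
    return 1.0 - dyson_kernel(x, y) ** 2

def expected_pair_count(eps, steps=200):
    """Midpoint-rule approximation to the integral of rho2 over
    [-eps, 0] x [0, eps], i.e. to E N_{[-eps,0]} N_{[0,eps]}."""
    h = eps / steps
    total = 0.0
    for i in range(steps):
        x = -eps + (i + 0.5) * h
        for j in range(steps):
            y = (j + 0.5) * h
            total += rho2(x, y)
    return total * h * h

for eps in (0.2, 0.1, 0.05):
    value = expected_pair_count(eps)
    print(eps, value, value / eps**4)
```

Since ${\rho_2(x,y) \approx \frac{\pi^2}{3} (x-y)^2}$ near the diagonal, the last column should approach ${\frac{7\pi^2}{18} \approx 3.84}$ as ${\varepsilon \rightarrow 0}$, consistent with the lower bound ${\gg \varepsilon^4}$.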

Since for natural numbers ${a,b}$, the quantity ${ab - 2 a \binom{b}{2} - 2 b \binom{a}{2} = ab (3-a-b)}$ is only positive when ${a=b=1}$, we see from the first three estimates that the event ${E}$ that ${N_{[-\varepsilon,0]} = N_{[0,\varepsilon]} = 1}$ occurs with probability ${\gg \varepsilon^4}$. In particular, by Markov’s inequality we have the conditional probabilities

$\displaystyle {\bf P} ( N_{[\varepsilon,A^{-1}]} \geq 1 | E ) \ll A^{-3}$

$\displaystyle {\bf P} ( N_{[-A^{-1}, -\varepsilon]} \geq 1 | E ) \ll A^{-3}$

$\displaystyle {\bf P} ( N_{[-k, k]} \geq A k^{5/3} | E ) \ll A^{-2} k^{-4/3}$

and thus, if ${A}$ is large enough, and ${\varepsilon}$ small enough, it will be true with probability ${\gg \varepsilon^4}$ that

$\displaystyle N_{[-\varepsilon,0]}, N_{[0,\varepsilon]} = 1$

and

$\displaystyle N_{[-A^{-1}, -\varepsilon]} = N_{[\varepsilon, A^{-1}]} = 0$

and simultaneously that

$\displaystyle N_{[-k,k]} \leq A k^{5/3}$

for all natural numbers ${k}$. This implies in particular that

$\displaystyle x_1 - x_0 \leq 2\varepsilon$

and

$\displaystyle |x_m - x_0|, |x_m - x_1| \gg_A |m|^{3/5}$

for all ${m \neq 0,1}$, which gives (2) for ${\varepsilon}$ small enough.
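
The elementary counting fact used earlier in this argument, that ${ab - 2 a \binom{b}{2} - 2 b \binom{a}{2}}$ is positive only when ${a=b=1}$, is easy to test by brute force. The following sketch checks it on a finite range only (so it is an illustration, not a proof); the function name `weight` is ours.

```python
from math import comb

def weight(a, b):
    """The combination ab - 2a*C(b,2) - 2b*C(a,2) used in the argument."""
    return a * b - 2 * a * comb(b, 2) - 2 * b * comb(a, 2)

# Collect every pair in a finite range at which the weight is positive.
positive = [(a, b) for a in range(0, 30) for b in range(0, 30) if weight(a, b) > 0]
print(positive)  # [(1, 1)]
print(weight(1, 1))  # 1
```

Thus the weight is bounded above by the indicator of ${a=b=1}$ on this range, and taking expectations with ${a = N_{[-\varepsilon,0]}}$, ${b = N_{[0,\varepsilon]}}$ is what converts the first three moment estimates into a lower bound on the probability of the event ${N_{[-\varepsilon,0]} = N_{[0,\varepsilon]} = 1}$.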

Remark 1 The above argument needed the GUE hypothesis for correlations up to fourth order (in order to establish (3)). It might be possible to reduce the number of correlations needed, but I do not see how to obtain the claim using pair correlations only.

Brad Rodgers and I have uploaded to the arXiv our paper “The De Bruijn-Newman constant is non-negative”. This paper affirms a conjecture of Newman regarding the extent to which the Riemann hypothesis, if true, is only “barely so”. To describe the conjecture, let us begin with the Riemann xi function

$\displaystyle \xi(s) := \frac{s(s-1)}{2} \pi^{-s/2} \Gamma(\frac{s}{2}) \zeta(s)$

where ${\Gamma(s) := \int_0^\infty e^{-t} t^{s-1}\ dt}$ is the Gamma function and ${\zeta(s) := \sum_{n=1}^\infty \frac{1}{n^s}}$ is the Riemann zeta function. Initially, this function is only defined for ${\mathrm{Re} s > 1}$, but, as was already known to Riemann, we can manipulate it into a form that extends to the entire complex plane as follows. Firstly, in view of the standard identity ${s \Gamma(s) = \Gamma(s+1)}$, we can write

$\displaystyle \frac{s(s-1)}{2} \Gamma(\frac{s}{2}) = 2 \Gamma(\frac{s+4}{2}) - 3 \Gamma( \frac{s+2}{2} )$

and hence

$\displaystyle \xi(s) = \sum_{n=1}^\infty 2 \pi^{-s/2} n^{-s} \int_0^\infty e^{-t} t^{\frac{s+4}{2}-1}\ dt - 3 \pi^{-s/2} n^{-s} \int_0^\infty e^{-t} t^{\frac{s+2}{2}-1}\ dt.$

By a rescaling, one may write

$\displaystyle \int_0^\infty e^{-t} t^{\frac{s+4}{2}-1}\ dt = (\pi n^2)^{\frac{s+4}{2}} \int_0^\infty e^{-\pi n^2 t} t^{\frac{s+4}{2}-1}\ dt$

and similarly

$\displaystyle \int_0^\infty e^{-t} t^{\frac{s+2}{2}-1}\ dt = (\pi n^2)^{\frac{s+2}{2}} \int_0^\infty e^{-\pi n^2 t} t^{\frac{s+2}{2}-1}\ dt$

and thus (after applying Fubini’s theorem)

$\displaystyle \xi(s) = \int_0^\infty \sum_{n=1}^\infty 2 \pi^2 n^4 e^{-\pi n^2 t} t^{\frac{s+4}{2}-1} - 3 \pi n^2 e^{-\pi n^2 t} t^{\frac{s+2}{2}-1}\ dt.$

We’ll make the change of variables ${t = e^{4u}}$ to obtain

$\displaystyle \xi(s) = 4 \int_{\bf R} \sum_{n=1}^\infty (2 \pi^2 n^4 e^{8u} - 3 \pi n^2 e^{4u}) \exp( 2su - \pi n^2 e^{4u} )\ du.$

If we introduce the mild renormalisation

$\displaystyle H_0(z) := \frac{1}{8} \xi( \frac{1}{2} + \frac{iz}{2} )$

of ${\xi}$, we then conclude (at least for ${\mathrm{Im} z < -1}$, which corresponds to ${\mathrm{Re} s > 1}$) that

$\displaystyle H_0(z) = \frac{1}{2} \int_{\bf R} \Phi(u)\exp(izu)\ du \ \ \ \ \ (1)$

where ${\Phi: {\bf R} \rightarrow {\bf C}}$ is the function

$\displaystyle \Phi(u) := \sum_{n=1}^\infty (2 \pi^2 n^4 e^{9u} - 3 \pi n^2 e^{5u}) \exp( - \pi n^2 e^{4u} ), \ \ \ \ \ (2)$

which one can verify to be rapidly decreasing both as ${u \rightarrow +\infty}$ and as ${u \rightarrow -\infty}$, with the decrease as ${u \rightarrow +\infty}$ faster than any exponential. In particular ${H_0}$ extends holomorphically to the lower half plane.

If we normalize the Fourier transform ${{\mathcal F} f(\xi)}$ of a (Schwartz) function ${f(x)}$ as ${{\mathcal F} f(\xi) := \int_{\bf R} f(x) e^{-2\pi i \xi x}\ dx}$, it is well known that the Gaussian ${x \mapsto e^{-\pi x^2}}$ is its own Fourier transform. The creation operator ${2\pi x - \frac{d}{dx}}$ interacts with the Fourier transform by the identity

$\displaystyle {\mathcal F} (( 2\pi x - \frac{d}{dx} ) f) (\xi) = -i (2 \pi \xi - \frac{d}{d\xi} ) {\mathcal F} f(\xi).$

Since ${(-i)^4 = 1}$, this implies that the function

$\displaystyle x \mapsto (2\pi x - \frac{d}{dx})^4 e^{-\pi x^2} = 128 \pi^2 (2 \pi^2 x^4 - 3 \pi x^2) e^{-\pi x^2} + 48 \pi^2 e^{-\pi x^2}$

is its own Fourier transform. (One can view the polynomial ${128 \pi^2 (2\pi^2 x^4 - 3 \pi x^2) + 48 \pi^2}$ as a renormalised version of the fourth Hermite polynomial.) Taking a suitable linear combination of this with ${x \mapsto e^{-\pi x^2}}$, we conclude that

$\displaystyle x \mapsto (2 \pi^2 x^4 - 3 \pi x^2) e^{-\pi x^2}$

is also its own Fourier transform. Rescaling ${x}$ by ${e^{2u}}$ and then multiplying by ${e^u}$, we conclude that the Fourier transform of

$\displaystyle x \mapsto (2 \pi^2 x^4 e^{9u} - 3 \pi x^2 e^{5u}) \exp( - \pi x^2 e^{4u} )$

is

$\displaystyle x \mapsto (2 \pi^2 x^4 e^{-9u} - 3 \pi x^2 e^{-5u}) \exp( - \pi x^2 e^{-4u} ),$

and hence by the Poisson summation formula (using symmetry and vanishing at ${n=0}$ to unfold the ${n}$ summation in (2) to the integers rather than the natural numbers) we obtain the functional equation

$\displaystyle \Phi(-u) = \Phi(u),$

which implies that ${\Phi}$ and ${H_0}$ are even functions (in particular, ${H_0}$ now extends to an entire function). From this symmetry we can also rewrite (1) as

$\displaystyle H_0(z) = \int_0^\infty \Phi(u) \cos(zu)\ du,$

which now gives a convergent expression for the entire function ${H_0(z)}$ for all complex ${z}$. As ${\Phi}$ is even and real-valued on ${{\bf R}}$, ${H_0(z)}$ is even and also obeys the functional equation ${H_0(\overline{z}) = \overline{H_0(z)}}$, which is equivalent to the usual functional equation for the Riemann zeta function. The Riemann hypothesis is equivalent to the claim that all the zeroes of ${H_0}$ are real.
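
Both the evenness of ${\Phi}$ and the cosine formula for ${H_0}$ can be tested numerically. The sketch below truncates the series (2) and the integral (the cutoffs `terms`, `upper` and `steps` are ad hoc choices of ours), and compares with the one value that is easy to compute by hand: ${z = -3i}$ corresponds to ${s = \frac{1}{2} + \frac{iz}{2} = 2}$, where ${\xi(2) = \pi^{-1} \Gamma(1) \zeta(2) = \frac{\pi}{6}}$, so that ${H_0(-3i) = \frac{\pi}{48}}$.

```python
import cmath
import math

def Phi(u, terms=50):
    """Truncation of the series (2) for Phi(u); `terms` is an ad hoc cutoff."""
    total = 0.0
    for n in range(1, terms + 1):
        total += ((2 * math.pi**2 * n**4 * math.exp(9 * u)
                   - 3 * math.pi * n**2 * math.exp(5 * u))
                  * math.exp(-math.pi * n**2 * math.exp(4 * u)))
    return total

def H0(z, upper=1.5, steps=3000):
    """Trapezoid-rule approximation to int_0^infty Phi(u) cos(zu) du.

    The integral is cut off at `upper`; Phi(u) underflows to zero in
    floating point well before that point.
    """
    h = upper / steps
    total = 0.5 * (Phi(0.0) * cmath.cos(0.0) + Phi(upper) * cmath.cos(z * upper))
    for k in range(1, steps):
        u = k * h
        total += Phi(u) * cmath.cos(z * u)
    return total * h

# Evenness of Phi, which is not visible term by term in the series.
print(Phi(0.3), Phi(-0.3))

# H_0(-3i) should agree with xi(2)/8 = pi/48.
print(H0(-3j).real, math.pi / 48)
```

The agreement of ${\Phi(0.3)}$ with ${\Phi(-0.3)}$ relies on substantial cancellation between the first few terms of the series, which is one way to see that the functional equation is not a term-by-term phenomenon.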

De Bruijn introduced the family ${H_t: {\bf C} \rightarrow {\bf C}}$ of deformations of ${H_0: {\bf C} \rightarrow {\bf C}}$, defined for all ${t \in {\bf R}}$ and ${z \in {\bf C}}$ by the formula

$\displaystyle H_t(z) := \int_0^\infty e^{tu^2} \Phi(u) \cos(zu)\ du.$

From a PDE perspective, one can view ${H_t}$ as the evolution of ${H_0}$ under the backwards heat equation ${\partial_t H_t(z) = - \partial_{zz} H_t(z)}$. As with ${H_0}$, the ${H_t}$ are all even entire functions that obey the functional equation ${H_t(\overline{z}) = \overline{H_t(z)}}$, and one can ask an analogue of the Riemann hypothesis for each such ${H_t}$, namely whether all the zeroes of ${H_t}$ are real. De Bruijn showed that these hypotheses were monotone in ${t}$: if ${H_t}$ had all real zeroes for some ${t}$, then ${H_{t'}}$ would also have all zeroes real for any ${t' \geq t}$. Newman later sharpened this claim by showing the existence of a finite number ${\Lambda \leq 1/2}$, now known as the de Bruijn-Newman constant, with the property that ${H_t}$ had all zeroes real if and only if ${t \geq \Lambda}$. Thus, the Riemann hypothesis is equivalent to the inequality ${\Lambda \leq 0}$. Newman then conjectured the complementary bound ${\Lambda \geq 0}$; in his words, this conjecture asserted that if the Riemann hypothesis is true, then it is only “barely so”, in that the reality of all the zeroes is destroyed by applying heat flow for even an arbitrarily small amount of time. Over time, a significant amount of evidence was established in favour of this conjecture; most recently, in 2011, Saouter, Gourdon, and Demichel showed that ${\Lambda \geq -1.15 \times 10^{-11}}$.

In this paper we finish off the proof of Newman’s conjecture, that is we show that ${\Lambda \geq 0}$. The proof is by contradiction, assuming that ${\Lambda < 0}$ (which among other things, implies the truth of the Riemann hypothesis), and using the properties of backwards heat evolution to reach a contradiction.

Very roughly, the argument proceeds as follows. As observed by Csordas, Smith, and Varga (and also discussed in this previous blog post), the backwards heat evolution of the ${H_t}$ introduces a nice ODE dynamics on the zeroes ${x_j(t)}$ of ${H_t}$, namely that they solve the ODE

$\displaystyle \frac{d}{dt} x_j(t) = -2 \sum_{k: k \neq j} \frac{1}{x_k(t) - x_j(t)} \ \ \ \ \ (3)$

for all ${j}$ (one has to interpret the sum in a principal value sense as it is not absolutely convergent, but let us ignore this technicality for the current discussion). Intuitively, this ODE is asserting that the zeroes ${x_j(t)}$ repel each other, somewhat like positively charged particles (but note that the dynamics is first-order, as opposed to the second-order laws of Newtonian mechanics). Formally, a steady state (or equilibrium) of this dynamics is reached when the ${x_k(t)}$ are arranged in an arithmetic progression. (Note for instance that for any positive ${u}$, the functions ${z \mapsto e^{tu^2} \cos(uz)}$ obey the same backwards heat equation as ${H_t}$, and their zeroes are on a fixed arithmetic progression ${\{ \frac{\pi (k+\tfrac{1}{2})}{u}: k \in {\bf Z} \}}$.) The strategy is to then show that the dynamics from time ${\Lambda}$ to time ${0}$ creates a convergence to local equilibrium, in which the zeroes ${x_k(t)}$ locally resemble an arithmetic progression at time ${t=0}$. This will be in contradiction with known results on pair correlation of zeroes (or on related statistics, such as the fluctuations on gaps between zeroes), such as the results of Montgomery (actually for technical reasons it is slightly more convenient for us to use related results of Conrey, Ghosh, Goldston, Gonek, and Heath-Brown). Another way of thinking about this is that even very slight deviations from local equilibrium (such as a small number of gaps that are slightly smaller than the average spacing) will almost immediately lead to zeroes colliding with each other and leaving the real line as one evolves backwards in time (i.e., under the forward heat flow). This is a refinement of the strategy used in previous lower bounds on ${\Lambda}$, in which “Lehmer pairs” (pairs of zeroes of the zeta function that were unusually close to each other) were used to limit the extent to which the evolution continued backwards in time while keeping all zeroes real.
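
The repulsion in (3) can be seen concretely in a finite toy model. The sketch below integrates (3) for a finite set of points with a classical Runge-Kutta scheme; the finite truncation, the step sizes and the function names are our own illustrative choices and are not part of the argument. For two points the gap ${g}$ obeys ${\partial_t g = \frac{4}{g}}$ exactly, so that ${g(t)^2 = g(0)^2 + 8t}$, which gives a convenient check.

```python
def velocities(x):
    """Right-hand side of (3) for a finite configuration x_1 < ... < x_n:
    dx_j/dt = -2 * sum_{k != j} 1 / (x_k - x_j)."""
    return [-2.0 * sum(1.0 / (xk - xj) for k, xk in enumerate(x) if k != j)
            for j, xj in enumerate(x)]

def rk4_step(x, dt):
    """One classical Runge-Kutta step for the system (3)."""
    k1 = velocities(x)
    k2 = velocities([xi + 0.5 * dt * ki for xi, ki in zip(x, k1)])
    k3 = velocities([xi + 0.5 * dt * ki for xi, ki in zip(x, k2)])
    k4 = velocities([xi + dt * ki for xi, ki in zip(x, k3)])
    return [xi + dt * (a + 2 * b + 2 * c + d) / 6.0
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

def evolve(x, t_final, steps=10000):
    """Integrate (3) from time 0 to t_final with a fixed step size."""
    dt = t_final / steps
    for _ in range(steps):
        x = rk4_step(x, dt)
    return x

# Two zeroes: the squared gap should grow from 0.25 to 0.25 + 8 * 1.
a, b = evolve([-0.25, 0.25], t_final=1.0)
print((b - a) ** 2, 0.25 + 8.0)

# An unevenly spaced configuration: the narrow gap of width 0.1 widens.
x0 = [0.0, 1.0, 1.1, 2.0, 3.0]
x1 = evolve(x0, t_final=0.01)
print([round(q - p, 4) for p, q in zip(x1, x1[1:])])
```

In the second example the narrow gap opens up quickly while its neighbours shrink slightly, which is the finite-dimensional shadow of the relaxation towards an arithmetic progression.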

How does one obtain this convergence to local equilibrium? We proceed by broad analogy with the “local relaxation flow” method of Erdos, Schlein, and Yau in random matrix theory, in which one combines some initial control on zeroes (which, in the case of the Erdos-Schlein-Yau method, is referred to with terms such as “local semicircular law”) with convexity properties of a relevant Hamiltonian that can be used to force the zeroes towards equilibrium.

We first discuss the initial control on zeroes. For ${H_0}$, we have the classical Riemann-von Mangoldt formula, which asserts that the number of zeroes in the interval ${[0,T]}$ is ${\frac{T}{4\pi} \log \frac{T}{4\pi} - \frac{T}{4\pi} + O(\log T)}$ as ${T \rightarrow \infty}$. (We have a factor of ${4\pi}$ here instead of the more familiar ${2\pi}$ due to the way ${H_0}$ is normalised.) This implies for instance that for a fixed ${\alpha}$, the number of zeroes in the interval ${[T, T+\alpha]}$ is ${\frac{\alpha}{4\pi} \log T + O(\log T)}$. Actually, because we get to assume the Riemann hypothesis, we can sharpen this to ${\frac{\alpha}{4\pi} \log T + o(\log T)}$, a result of Littlewood (see this previous blog post for a proof). Ideally, we would like to obtain similar control for the other ${H_t}$, ${\Lambda \leq t < 0}$, as well. Unfortunately we were only able to obtain the weaker claims that the number of zeroes of ${H_t}$ in ${[0,T]}$ is ${\frac{T}{4\pi} \log \frac{T}{4\pi} - \frac{T}{4\pi} + O(\log^2 T)}$, and that the number of zeroes in ${[T, T+\alpha \log T]}$ is ${\frac{\alpha}{4 \pi} \log^2 T + o(\log^2 T)}$, that is to say we only get good control on the distribution of zeroes at scales ${\gg \log T}$ rather than at scales ${\gg 1}$. Ultimately this is because we were only able to get control (and in particular, lower bounds) on ${|H_t(x-iy)|}$ with high precision when ${y \gg \log x}$ (whereas ${|H_0(x-iy)|}$ has good estimates as soon as ${y}$ is larger than (say) ${2}$). This control is obtained by expressing ${H_t(x-iy)}$ in terms of some contour integrals and using the method of steepest descent (actually it is slightly simpler to rely instead on the Stirling approximation for the Gamma function, which can be proven in turn by steepest descent methods). Fortunately, it turns out that this weaker control is still (barely) enough for the rest of our argument to go through.
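
As a sanity check on the normalisation in the Riemann-von Mangoldt formula for ${H_0}$, recall that the zeroes of ${H_0(z) = \frac{1}{8} \xi(\frac{1}{2} + \frac{iz}{2})}$ on the positive real axis are at ${z = 2\gamma_n}$. The sketch below evaluates the main term at ${T = 2\gamma_{6710}}$, using the classical ordinate ${\gamma_{6710} = 7005.100564\dots}$ of Lehmer's pair as the only input; the function name is ours.

```python
import math

def main_term(T):
    """Main term T/(4 pi) log(T/(4 pi)) - T/(4 pi) for the number of zeroes
    of H_0 in [0, T]."""
    x = T / (4 * math.pi)
    return x * math.log(x) - x

# By the indexing of the ordinates, H_0 has exactly 6710 zeroes in [0, T]
# when T = 2 * gamma_6710.
T = 2 * 7005.100564
print(main_term(T), 6710, math.log(T))
```

The main term comes out within a few units of the true count ${6710}$, comfortably inside the ${O(\log T)}$ error allowed by the formula.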

Once one has the initial control on zeroes, we now need to force convergence to local equilibrium by exploiting convexity of a Hamiltonian. Here, the relevant Hamiltonian is

$\displaystyle H(t) := \sum_{j,k: j \neq k} \log \frac{1}{|x_j(t) - x_k(t)|},$

ignoring for now the rather important technical issue that this sum is not actually absolutely convergent. (Because of this, we will need to truncate and renormalise the Hamiltonian in a number of ways which we will not detail here.) The ODE (3) is formally the gradient flow for this Hamiltonian. Furthermore, this Hamiltonian is a convex function of the ${x_j}$ (because ${t \mapsto \log \frac{1}{t}}$ is a convex function on ${(0,+\infty)}$). We therefore expect the Hamiltonian to be a decreasing function of time, and that the derivative should be an increasing function of time. As time passes, the derivative of the Hamiltonian would then be expected to converge to zero, which should imply convergence to local equilibrium.

Formally, the derivative of the above Hamiltonian is

$\displaystyle \partial_t H(t) = -4 E(t), \ \ \ \ \ (4)$

where ${E(t)}$ is the “energy”

$\displaystyle E(t) := \sum_{j,k: j \neq k} \frac{1}{|x_j(t) - x_k(t)|^2}.$

Again, there is the important technical issue that this quantity is infinite; but it turns out that if we renormalise the Hamiltonian appropriately, then the energy will also become suitably renormalised, and in particular will vanish when the ${x_j}$ are arranged in an arithmetic progression, and be positive otherwise. One can also formally calculate the derivative of ${E(t)}$ to be a somewhat complicated but manifestly non-positive quantity (the negative of a sum of squares); see this previous blog post for analogous computations in the case of heat flow on polynomials. After flowing from time ${\Lambda}$ to time ${0}$, and using some crude initial bounds on ${H(t)}$ and ${E(t)}$ in this region (coming from the Riemann-von Mangoldt type formulae mentioned above and some further manipulations), we can eventually show that the (renormalisation of the) energy ${E(0)}$ at time zero is small, which forces the ${x_j}$ to locally resemble an arithmetic progression, which gives the required convergence to local equilibrium.
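
For a finite configuration of points there are no convergence issues, and the identity (4) can be tested directly. The following sketch differentiates the finite-sum Hamiltonian numerically along the velocity field of (3) and compares with ${-4 E}$; the finite truncation, the sample configuration and the function names are illustrative assumptions of ours.

```python
import math

def velocities(x):
    """Right-hand side of (3): dx_j/dt = -2 * sum_{k != j} 1/(x_k - x_j)."""
    return [-2.0 * sum(1.0 / (xk - xj) for k, xk in enumerate(x) if k != j)
            for j, xj in enumerate(x)]

def hamiltonian(x):
    """H = sum over ordered pairs j != k of log(1/|x_j - x_k|)."""
    return sum(-math.log(abs(xj - xk))
               for j, xj in enumerate(x) for k, xk in enumerate(x) if j != k)

def energy(x):
    """E = sum over ordered pairs j != k of 1/|x_j - x_k|^2."""
    return sum(1.0 / (xj - xk) ** 2
               for j, xj in enumerate(x) for k, xk in enumerate(x) if j != k)

def dH_dt(x, dt=1e-6):
    """Central-difference derivative of H along the flow (3), obtained by
    moving each x_j by +/- dt times its velocity."""
    v = velocities(x)
    forward = [xi + dt * vi for xi, vi in zip(x, v)]
    backward = [xi - dt * vi for xi, vi in zip(x, v)]
    return (hamiltonian(forward) - hamiltonian(backward)) / (2 * dt)

x = [0.0, 1.0, 2.5, 4.0, 7.0]  # arbitrary illustrative configuration
print(dH_dt(x), -4.0 * energy(x))
```

The agreement reflects the classical identity ${\frac{1}{(a-b)(a-c)} + \frac{1}{(b-a)(b-c)} + \frac{1}{(c-a)(c-b)} = 0}$ for distinct ${a,b,c}$, which makes the cross terms in the square of the gradient cancel.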

There are a number of technicalities involved in making the above sketch of argument rigorous (for instance, justifying interchanges of derivatives and infinite sums turns out to be a little bit delicate). I will highlight here one particular technical point. One of the ways in which we make expressions such as the energy ${E(t)}$ finite is to truncate the indices ${j,k}$ to an interval ${I}$ to create a truncated energy ${E_I(t)}$. In typical situations, we would then expect ${E_I(t)}$ to be decreasing, which will greatly help in bounding ${E_I(0)}$ (in particular it would allow one to control ${E_I(0)}$ by time-averaged quantities such as ${\int_{\Lambda/2}^0 E_I(t)\ dt}$, which can in turn be controlled using variants of (4)). However, there are boundary effects at both ends of ${I}$ that could in principle add a large amount of energy into ${E_I}$, which is bad news as it could conceivably make ${E_I(0)}$ undesirably large even if integrated energies such as ${\int_{\Lambda/2}^0 E_I(t)\ dt}$ remain adequately controlled. As it turns out, such boundary effects are negligible as long as there is a large gap between adjacent zeroes at boundary of ${I}$ – it is only narrow gaps that can rapidly transmit energy across the boundary of ${I}$. Now, narrow gaps can certainly exist (indeed, the GUE hypothesis predicts these happen a positive fraction of the time); but the pigeonhole principle (together with the Riemann-von Mangoldt formula) can allow us to pick the endpoints of the interval ${I}$ so that no narrow gaps appear at the boundary of ${I}$ for any given time ${t}$. However, there was a technical problem: this argument did not allow one to find a single interval ${I}$ that avoided gaps for all times ${\Lambda/2 \leq t \leq 0}$ simultaneously – the pigeonhole principle could produce a different interval ${I}$ for each time ${t}$! Since the number of times was uncountable, this was a serious issue. 
(In physical terms, the problem was that there might be very fast “longitudinal waves” in the dynamics that, at each time, cause some gaps between zeroes to be highly compressed, but the specific gap that was narrow changed very rapidly with time. Such waves could, in principle, import a huge amount of energy into ${E_I}$ by time ${0}$.) To resolve this, we borrowed a PDE trick of Bourgain’s, in which the pigeonhole principle was coupled with local conservation laws. More specifically, we use the phenomenon that very narrow gaps ${g_i = x_{i+1}-x_i}$ take a nontrivial amount of time to expand back to a reasonable size (this can be seen by comparing the evolution of this gap with solutions of the scalar ODE ${\partial_t g = \frac{4}{g}}$, which represents the fastest rate at which a gap such as ${g_i}$ can expand). Thus, if a gap ${g_i}$ is reasonably large at some time ${t_0}$, it will also stay reasonably large at slightly earlier times ${t \in [t_0-\delta, t_0]}$ for some moderately small ${\delta>0}$. This lets one locate an interval ${I}$ that has manageable boundary effects during the times in ${[t_0-\delta, t_0]}$, so in particular ${E_I}$ is basically non-increasing in this time interval. Unfortunately, this interval is a little bit too short to cover all of ${[\Lambda/2,0]}$; however it turns out that one can iterate the above construction and find a nested sequence of intervals ${I_k}$, with each ${E_{I_k}}$ non-increasing in a different time interval ${[t_k - \delta, t_k]}$, and with all of the time intervals covering ${[\Lambda/2,0]}$. This turns out to be enough (together with the obvious fact that ${E_I}$ is monotone in ${I}$) to still control ${E_I(0)}$ for some reasonably sized interval ${I}$, as required for the rest of the arguments.

ADDED LATER: the following analogy (involving functions with just two zeroes, rather than an infinite number of zeroes) may help clarify the relation between this result and the Riemann hypothesis (and in particular why this result does not make the Riemann hypothesis any easier to prove, in fact it confirms the delicate nature of that hypothesis). Suppose one had a quadratic polynomial ${P}$ of the form ${P(z) = z^2 + \Lambda}$, where ${\Lambda}$ was an unknown real constant. Suppose that one was for some reason interested in the analogue of the “Riemann hypothesis” for ${P}$, namely that all the zeroes of ${P}$ are real. A priori, there are three scenarios:

• (Riemann hypothesis false) ${\Lambda > 0}$, and ${P}$ has zeroes ${\pm i |\Lambda|^{1/2}}$ off the real axis.
• (Riemann hypothesis true, but barely so) ${\Lambda = 0}$, and both zeroes of ${P}$ are on the real axis; however, any slight perturbation of ${\Lambda}$ in the positive direction would move zeroes off the real axis.
• (Riemann hypothesis true, with room to spare) ${\Lambda < 0}$, and both zeroes of ${P}$ are on the real axis. Furthermore, any slight perturbation of ${P}$ will also have both zeroes on the real axis.

The analogue of our result in this case is that ${\Lambda \geq 0}$, thus ruling out the third of the three scenarios here. In this simple example in which only two zeroes are involved, one can think of the inequality ${\Lambda \geq 0}$ as asserting that if the zeroes of ${P}$ are real, then they must be repeated. In our result (in which there are an infinity of zeroes, that become increasingly dense near infinity), and in view of the convergence to local equilibrium properties of (3), the analogous assertion is that if the zeroes of ${H_0}$ are real, then they do not behave locally as if they were in arithmetic progression.

A major topic of interest of analytic number theory is the asymptotic behaviour of the Riemann zeta function ${\zeta}$ in the critical strip ${\{ \sigma+it: 0 < \sigma < 1; t \in {\bf R} \}}$ in the limit ${t \rightarrow +\infty}$. For the purposes of this set of notes, it is a little simpler technically to work with the log-magnitude ${\log |\zeta|: {\bf C} \rightarrow [-\infty,+\infty]}$ of the zeta function. (In principle, one can reconstruct a branch of ${\log \zeta}$, and hence ${\zeta}$ itself, from ${\log |\zeta|}$ using the Cauchy-Riemann equations, or tools such as the Borel-Carathéodory theorem, see Exercise 40 of Supplement 2.)

One has the classical estimate

$\displaystyle \zeta(\sigma+it) = O( t^{O(1)} )$

when ${\sigma = O(1)}$ and ${t \geq 10}$ (say), so that

$\displaystyle \log |\zeta(\sigma+it)| \leq O( \log t ). \ \ \ \ \ (1)$

(See e.g. Exercise 37 from Supplement 3.) In view of this, let us define the normalised log-magnitudes ${F_T: {\bf C} \rightarrow [-\infty,+\infty]}$ for any ${T \geq 10}$ by the formula

$\displaystyle F_T( \sigma + it ) := \frac{1}{\log T} \log |\zeta( \sigma + i(T + t) )|;$

informally, this is a normalised window into ${\log |\zeta|}$ near ${iT}$. One can rephrase several assertions about the zeta function in terms of the asymptotic behaviour of ${F_T}$. For instance:

• (i) The bound (1) implies that ${F_T}$ is asymptotically locally bounded from above in the limit ${T \rightarrow \infty}$, thus for any compact set ${K \subset {\bf C}}$ we have ${F_T(\sigma+it) \leq O_K(1)}$ for ${\sigma+it \in K}$ and ${T}$ sufficiently large. In fact the implied constant in ${K}$ only depends on the projection of ${K}$ to the real axis.
• (ii) For ${\sigma > 1}$, we have the bounds

$\displaystyle |\zeta(\sigma+it)|, \frac{1}{|\zeta(\sigma+it)|} \leq \zeta(\sigma)$

which implies that ${F_T}$ converges locally uniformly as ${T \rightarrow +\infty}$ to zero in the region ${\{ \sigma+it: \sigma > 1, t \in {\bf R} \}}$.

• (iii) The functional equation, together with the symmetry ${\zeta(\sigma-it) = \overline{\zeta(\sigma+it)}}$, implies that

$\displaystyle |\zeta(\sigma+it)| = 2^\sigma \pi^{\sigma-1} |\sin \frac{\pi(\sigma+it)}{2}| |\Gamma(1-\sigma-it)| |\zeta(1-\sigma+it)|$

which by Exercise 17 of Supplement 3 shows that

$\displaystyle F_T( \sigma+it ) = \frac{1}{2}-\sigma + F_T(1-\sigma+it) + o(1)$

as ${T \rightarrow \infty}$, locally uniformly in ${\sigma+it}$. In particular, when combined with the previous item, we see that ${F_T(\sigma+it)}$ converges locally uniformly as ${T \rightarrow +\infty}$ to ${\frac{1}{2}-\sigma}$ in the region ${\{ \sigma+it: \sigma < 0, t \in {\bf R}\}}$.

• (iv) From Jensen’s formula (Theorem 16 of Supplement 2) we see that ${\log|\zeta|}$ is a subharmonic function, and thus ${F_T}$ is subharmonic as well. In particular we have the mean value inequality

$\displaystyle F_T( z_0 ) \leq \frac{1}{\pi r^2} \int_{z: |z-z_0| \leq r} F_T(z)$

for any disk ${\{ z: |z-z_0| \leq r \}}$, where the integral is with respect to area measure. From this and (ii) we conclude that

$\displaystyle \int_{z: |z-z_0| \leq r} F_T(z) \geq O_{z_0,r}(1)$

for any disk with ${\hbox{Re}(z_0)>1}$ and sufficiently large ${T}$; combining this with (i) we conclude that ${F_T}$ is asymptotically locally bounded in ${L^1}$ in the limit ${T \rightarrow \infty}$, thus for any compact set ${K \subset {\bf C}}$ we have ${\int_K |F_T| \ll_K 1}$ for sufficiently large ${T}$.

From (iv) and the usual Arzela-Ascoli diagonalisation argument, we see that the ${F_T}$ are asymptotically compact in the topology of distributions: given any sequence ${T_n}$ tending to ${+\infty}$, one can extract a subsequence such that the ${F_{T_n}}$ converge in the sense of distributions. Let us then define a normalised limit profile of ${\log|\zeta|}$ to be a distributional limit ${F}$ of a sequence of ${F_{T_n}}$; they are analogous to limiting profiles in PDE, and also to the more recent introduction of “graphons” in the theory of graph limits. Then by taking limits in (i)-(iv) we can say a lot about such normalised limit profiles ${F}$ (up to almost everywhere equivalence, which is an issue we will address shortly):

• (i) ${F}$ is bounded from above in the critical strip ${\{ \sigma+it: 0 \leq \sigma \leq 1 \}}$.
• (ii) ${F}$ vanishes on ${\{ \sigma+it: \sigma \geq 1\}}$.
• (iii) We have the functional equation ${F(\sigma+it) = \frac{1}{2}-\sigma + F(1-\sigma+it)}$ for all ${\sigma+it}$. In particular ${F(\sigma+it) = \frac{1}{2}-\sigma}$ for ${\sigma<0}$.
• (iv) ${F}$ is subharmonic.

Unfortunately, (i)-(iv) fail to characterise ${F}$ completely. For instance, one could have ${F(\sigma+it) = f(\sigma)}$ for any convex function ${f(\sigma)}$ of ${\sigma}$ that equals ${0}$ for ${\sigma \geq 1}$, ${\frac{1}{2}-\sigma}$ for ${\sigma \leq 0}$, and obeys the functional equation ${f(\sigma) = \frac{1}{2}-\sigma+f(1-\sigma)}$, and this would be consistent with (i)-(iv). One can also perturb such examples in a region where ${f}$ is strictly convex to create further examples of functions obeying (i)-(iv). Note from subharmonicity that the function ${\sigma \mapsto \sup_t F(\sigma+it)}$ is always going to be convex in ${\sigma}$; this can be seen as a limiting case of the Hadamard three-lines theorem (Exercise 41 of Supplement 2).
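
To make the non-uniqueness concrete, here is a minimal numerical sketch. It compares the profile ${\max(0,\frac{1}{2}-\sigma)}$ with a second candidate that is linear outside ${[0,1]}$ and equal to ${\frac{1}{2}(1-\sigma)^2}$ inside. This second candidate, and all function names below, are illustrative assumptions rather than anything taken from the notes. The symmetry is checked in the form ${f(\sigma) - f(1-\sigma) = \frac{1}{2}-\sigma}$, which is the orientation compatible with ${f = \frac{1}{2}-\sigma}$ on the left and ${f = 0}$ on the right, and convexity is tested only through discrete second differences on a grid.

```python
def profile_lindelof(sigma):
    """The profile max(0, 1/2 - sigma)."""
    return max(0.0, 0.5 - sigma)


def profile_quadratic(sigma):
    """Hypothetical second candidate: linear outside [0, 1], quadratic inside."""
    if sigma <= 0.0:
        return 0.5 - sigma
    if sigma >= 1.0:
        return 0.0
    return 0.5 * (1.0 - sigma) ** 2


def symmetry_defect(f, sigma):
    """f(sigma) - f(1 - sigma) - (1/2 - sigma); zero when the symmetry holds."""
    return f(sigma) - f(1.0 - sigma) - (0.5 - sigma)


def min_second_difference(f, a=-2.0, b=3.0, steps=500):
    """Smallest discrete second difference on a grid; non-negative for convex f."""
    h = (b - a) / steps
    return min(
        f(a + (i - 1) * h) + f(a + (i + 1) * h) - 2.0 * f(a + i * h)
        for i in range(1, steps)
    )
```

Both candidates pass both checks up to rounding, yet they disagree on the critical line: the first vanishes at ${\sigma=\frac{1}{2}}$, while the second takes the value ${\frac{1}{8}}$ there.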

We pause to address one minor technicality. We have defined ${F}$ as a distributional limit, and as such it is a priori only defined up to almost everywhere equivalence. However, due to subharmonicity, there is a unique upper semi-continuous representative of ${F}$ (taking values in ${[-\infty,+\infty)}$), defined by the formula

$\displaystyle F(z_0) = \lim_{r \rightarrow 0^+} \frac{1}{\pi r^2} \int_{B(z_0,r)} F(z)\ dz$

for any ${z_0 \in {\bf C}}$ (note from subharmonicity that the expression in the limit is monotone nonincreasing as ${r \rightarrow 0}$, and is also continuous in ${z_0}$). We will now view this upper semi-continuous representative of ${F}$ as the canonical representative of ${F}$, so that ${F}$ is now defined everywhere, rather than up to almost everywhere equivalence.
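
As a sanity check on this formula, consider the model subharmonic function ${G(z) := \log|z|}$ (an illustrative example, not one of the limit profiles above). Passing to polar coordinates, the averages over disks centred at the origin are

$\displaystyle \frac{1}{\pi r^2} \int_{B(0,r)} \log|z|\ dz = \frac{2}{r^2} \int_0^r \rho \log \rho\ d\rho = \log r - \frac{1}{2},$

which is nonincreasing as ${r \rightarrow 0^+}$ and tends to ${-\infty = G(0)}$. For ${z_0 \neq 0}$ and ${r < |z_0|}$, the function ${G}$ is harmonic on ${B(z_0,r)}$, so the average equals ${\log|z_0|}$ exactly. The limiting formula thus recovers ${G}$ everywhere, including the value ${-\infty}$ at the singularity.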

By a classical theorem of Riesz, a function ${F}$ is subharmonic if and only if the distribution ${\Delta F}$ is a non-negative measure, where ${\Delta := \frac{\partial^2}{\partial \sigma^2} + \frac{\partial^2}{\partial t^2}}$ is the Laplacian in the ${\sigma,t}$ coordinates. Jensen’s formula (or Green’s theorem), when interpreted distributionally, tells us that

$\displaystyle \Delta \log |\zeta| = 2\pi \sum_\rho \delta_\rho$

away from the real axis, where ${\rho}$ ranges over the non-trivial zeroes of ${\zeta}$ (counted with multiplicity). Thus, if ${F}$ is a normalised limit profile for ${\log |\zeta|}$ that is the distributional limit of ${F_{T_n}}$, then we have

$\displaystyle \Delta F = \nu$

where ${\nu}$ is a non-negative measure which is the limit in the vague topology of the measures

$\displaystyle \nu_{T_n} := \frac{2\pi}{\log T_n} \sum_\rho \delta_{\rho - iT_n}.$

Thus ${\nu}$ is a normalised limit profile of the zeroes of the Riemann zeta function.

Using this machinery, we can recover many classical theorems about the Riemann zeta function by “soft” arguments that do not require extensive calculation. Here are some examples:

Theorem 1 The Riemann hypothesis implies the Lindelöf hypothesis.

Proof: It suffices to show that any limiting profile ${F}$ (arising as the limit of some ${F_{T_n}}$) vanishes on the critical line ${\{1/2+it: t \in {\bf R}\}}$. But if the Riemann hypothesis holds, then the measures ${\nu_{T_n}}$ are supported on the critical line ${\{1/2+it: t \in {\bf R}\}}$, so the normalised limit profile ${\nu}$ is also supported on this line. This implies that ${F}$ is harmonic outside of the critical line. By (ii) and unique continuation for harmonic functions, this implies that ${F}$ vanishes on the half-space ${\{ \sigma+it: \sigma \geq \frac{1}{2} \}}$ (and equals ${\frac{1}{2}-\sigma}$ on the complementary half-space, by (iii)), giving the claim. $\Box$

In fact, we have the following sharper statement:

Theorem 2 (Backlund) The Lindelöf hypothesis is equivalent to the assertion that for any fixed ${\sigma_0 > \frac{1}{2}}$, the number of zeroes in the region ${\{ \sigma+it: \sigma > \sigma_0, T \leq t \leq T+1 \}}$ is ${o(\log T)}$ as ${T \rightarrow \infty}$.

Proof: If the latter claim holds, then for any ${T_n \rightarrow \infty}$, the measures ${\nu_{T_n}}$ assign a mass of ${o(1)}$ to any region of the form ${\{ \sigma+it: \sigma > \sigma_0; t_0 \leq t \leq t_0+1 \}}$ as ${n \rightarrow \infty}$ for any fixed ${\sigma_0>\frac{1}{2}}$ and ${t_0 \in {\bf R}}$. Thus the normalised limiting profile measure ${\nu}$ is supported on the critical line, and we can repeat the previous argument.

Conversely, suppose the claim fails. Then we can find a sequence ${T_n \rightarrow \infty}$ and ${\sigma_0>\frac{1}{2}}$ such that ${\nu_{T_n}}$ assigns a mass of ${\gg 1}$ to the region ${\{ \sigma+it: \sigma > \sigma_0; 0\leq t \leq 1 \}}$. Extracting a normalised limiting profile, we conclude that the normalised limiting profile measure ${\nu}$ is non-trivial somewhere to the right of the critical line, so the associated subharmonic function ${F}$ is not harmonic everywhere to the right of the critical line. From the maximum principle and (ii) this implies that ${F}$ has to be positive somewhere on the critical line, but this contradicts the Lindelöf hypothesis. (One has to take a bit of care in the last step since ${F_{T_n}}$ only converges to ${F}$ in the sense of distributions, but it turns out that the subharmonicity of all the functions involved gives enough regularity to justify the argument; we omit the details here.) $\Box$

Theorem 3 (Littlewood) Assume the Lindelöf hypothesis. Then for any fixed ${\alpha>0}$, the number of zeroes in the region ${\{ \sigma+it: T \leq t \leq T+\alpha \}}$ is ${(\frac{\alpha}{2\pi}+o(1)) \log T}$ as ${T \rightarrow +\infty}$.

Proof: By the previous arguments, the only possible normalised limiting profile for ${\log |\zeta|}$ is ${\max( 0, \frac{1}{2}-\sigma )}$. Taking distributional Laplacians, we see that the only possible normalised limiting profile for the zeroes is Lebesgue measure on the critical line. Thus, ${\nu_T( \{\sigma+it: 0 \leq t \leq \alpha \} )}$ can only converge to ${\alpha}$ as ${T \rightarrow +\infty}$, and the claim follows. $\Box$

Even without the Lindelöf hypothesis, we have the following result:

Theorem 4 (Titchmarsh) For any fixed ${\alpha>0}$, there are ${\gg_\alpha \log T}$ zeroes in the region ${\{ \sigma+it: T \leq t \leq T+\alpha \}}$ for sufficiently large ${T}$.

Among other things, this theorem recovers a classical result of Littlewood that the gaps between the imaginary parts of the zeroes go to zero, even without assuming unproven conjectures such as the Riemann or Lindelöf hypotheses.

Proof: Suppose for contradiction that this were not the case. Then we can find ${\alpha > 0}$ and a sequence ${T_n \rightarrow \infty}$ such that ${\{ \sigma+it: T_n \leq t \leq T_n+\alpha \}}$ contains ${o(\log T_n)}$ zeroes. Passing to a subsequence to extract a limit profile, we conclude that the normalised limit profile measure ${\nu}$ assigns no mass to the horizontal strip ${\{ \sigma+it: 0 \leq t \leq\alpha \}}$. Thus the associated subharmonic function ${F}$ is actually harmonic on this strip. But by (ii) and unique continuation this forces ${F}$ to vanish on this strip, contradicting the functional equation (iii). $\Box$

Exercise 5 Use limiting profiles to obtain the matching upper bound of ${O_\alpha(\log T)}$ for the number of zeroes in ${\{ \sigma+it: T \leq t \leq T+\alpha \}}$ for sufficiently large ${T}$.

Remark 6 One can remove the need to take limiting profiles in the above arguments if one can come up with quantitative (or “hard”) substitutes for qualitative (or “soft”) results such as the unique continuation property for harmonic functions. This would also allow one to replace the qualitative decay rates ${o(1)}$ with more quantitative decay rates such as ${1/\log \log T}$ or ${1/\log\log\log T}$. Indeed, the classical proofs of the above theorems come with quantitative bounds that are typically of this form (see e.g. the text of Titchmarsh for details).

Exercise 7 Let ${S(T)}$ denote the quantity ${S(T) := \frac{1}{\pi} \hbox{arg} \zeta(\frac{1}{2}+iT)}$, where the branch of the argument is taken by using a line segment connecting ${\frac{1}{2}+iT}$ to (say) ${2+iT}$, and then to ${2}$. If we have a sequence ${T_n \rightarrow \infty}$ producing normalised limit profiles ${F, \nu}$ for ${\log|\zeta|}$ and the zeroes respectively, show that ${t \mapsto \frac{1}{\log T_n} S(T_n + t)}$ converges in the sense of distributions to the function ${t \mapsto \frac{1}{\pi} \int_{1/2}^1 \frac{\partial F}{\partial t}(\sigma+it)\ d\sigma}$, or equivalently

$\displaystyle t \mapsto \frac{1}{2\pi} \frac{\partial}{\partial t} \int_0^1 F(\sigma+it)\ d\sigma.$

Conclude in particular that if the Lindelöf hypothesis holds, then ${S(T) = o(\log T)}$ as ${T \rightarrow \infty}$.

A little bit more about the normalised limit profiles ${F}$ is known unconditionally, beyond (i)-(iv). For instance, from Exercise 3 of Notes 5 we have ${\zeta(1/2 + it ) = O( t^{1/6+o(1)} )}$ as ${t \rightarrow +\infty}$, which implies that any normalised limit profile ${F}$ for ${\log|\zeta|}$ is bounded by ${1/6}$ on the critical line, beating the bound of ${1/4}$ coming from convexity and (ii), (iii), and then convexity can be used to further bound ${F}$ away from the critical line also. Some further small improvements of this type are known (coming from various methods for estimating exponential sums), though they fall well short of determining ${F}$ completely at our current level of understanding. Of course, given that we believe the Riemann hypothesis (and hence the Lindelöf hypothesis) to be true, the only actual limit profile that should exist is ${\max(0,\frac{1}{2}-\sigma)}$ (in fact this assertion is equivalent to the Lindelöf hypothesis, by the arguments above).

Better control on limiting profiles is available if we do not insist on controlling ${\zeta}$ for all values of the height parameter ${T}$, but only for most such values, thanks to the existence of several mean value theorems for the zeta function, as discussed in Notes 6; we discuss this below the fold.

We return to the study of the Riemann zeta function ${\zeta(s)}$, focusing now on the task of upper bounding the size of this function within the critical strip; as seen in Exercise 43 of Notes 2, such upper bounds can lead to zero-free regions for ${\zeta}$, which in turn lead to improved estimates for the error term in the prime number theorem.

In equation (21) of Notes 2 we obtained the somewhat crude estimate

$\displaystyle \zeta(s) = \sum_{n \leq x} \frac{1}{n^s} - \frac{x^{1-s}}{1-s} + O( \frac{|s|}{\sigma} \frac{1}{x^\sigma} ) \ \ \ \ \ (1)$

for any ${x > 0}$ and ${s = \sigma+it}$ with ${\sigma>0}$ and ${s \neq 1}$. Setting ${x=1}$, we obtained the crude estimate

$\displaystyle \zeta(s) = \frac{1}{s-1} + O( \frac{|s|}{\sigma} )$

in this region. In particular, if ${0 < \varepsilon \leq \sigma \ll 1}$ and ${|t| \gg 1}$ then we had ${\zeta(s) = O_\varepsilon( |t| )}$. Using the functional equation and the Hadamard three lines lemma, we can improve this to ${\zeta(s) \ll_\varepsilon |t|^{\frac{1-\sigma}{2}+\varepsilon}}$; see Supplement 3.

Now we seek better upper bounds on ${\zeta}$. We will reduce the problem to that of bounding certain exponential sums, in the spirit of Exercise 34 of Supplement 3:

Proposition 1 Let ${s = \sigma+it}$ with ${0 < \varepsilon \leq \sigma \ll 1}$ and ${|t| \gg 1}$. Then

$\displaystyle \zeta(s) \ll_\varepsilon \log(2+|t|) \sup_{1 \leq M \leq N \ll |t|}$

$\displaystyle N^{1-\sigma} |\frac{1}{N} \sum_{N \leq n < N+M} e( -\frac{t}{2\pi} \log n)|$

where ${e(x) := e^{2\pi i x}}$.

Proof: We fix a smooth function ${\eta: {\bf R} \rightarrow {\bf C}}$ with ${\eta(t)=1}$ for ${t \leq -1}$ and ${\eta(t)=0}$ for ${t \geq 1}$, and allow implied constants to depend on ${\eta}$. Let ${s=\sigma+it}$ with ${\varepsilon \leq \sigma \ll 1}$. From Exercise 34 of Supplement 3, we have

$\displaystyle \zeta(s) = \sum_n \frac{1}{n^s} \eta( \log n - \log C|t| ) + O_\varepsilon( 1 )$

for some sufficiently large absolute constant ${C}$. By dyadic decomposition, we thus have

$\displaystyle \zeta(s) \ll_{\varepsilon} 1 + \log(2+|t|) \sup_{1 \leq N \ll |t|} |\sum_{N \leq n < 2N} \frac{1}{n^s} \eta( \log n - \log C|t| )|.$

We can absorb the first term in the second using the ${N=1}$ case of the supremum. Writing ${\frac{1}{n^s} \eta( \log n - \log C|t| ) = N^{-\sigma} e( - \frac{t}{2\pi} \log n ) F_N(n)}$, where

$\displaystyle F_N(n) := (N/n)^\sigma \eta(\log n - \log C|t| ),$

it thus suffices to show that

$\displaystyle \sum_{N \leq n < 2N} e(-\frac{t}{2\pi} \log n) F_N(n) \ll \sup_{1 \leq M \leq N} |\sum_{N \leq n < N+M} e(-\frac{t}{2\pi} \log n)|$

for each ${N}$. But from the fundamental theorem of calculus, the left-hand side can be written as

$\displaystyle F_N(2N) \sum_{N \leq n < 2N} e(-\frac{t}{2\pi} \log n)$

$\displaystyle - \int_0^{N} (\sum_{N \leq n < N+M} e(-\frac{t}{2\pi} \log n)) F'_N(N+M)\ dM$

and the claim then follows from the triangle inequality and a routine calculation. $\Box$

We are thus interested in getting good bounds on the sum ${\sum_{N \leq n < N+M} e( -\frac{t}{2\pi} \log n )}$. More generally, we consider normalised exponential sums of the form

$\displaystyle \frac{1}{N} \sum_{n \in I} e( f(n) ) \ \ \ \ \ (2)$

where ${I \subset {\bf R}}$ is an interval of length at most ${N}$ for some ${N \geq 1}$, and ${f: {\bf R} \rightarrow {\bf R}}$ is a smooth function. We will assume smoothness estimates of the form

$\displaystyle |f^{(j)}(x)| = \exp( O(j^2) ) \frac{T}{N^j} \ \ \ \ \ (3)$

for some ${T>0}$, all ${x \in I}$, and all ${j \geq 1}$, where ${f^{(j)}}$ is the ${j}$-fold derivative of ${f}$; in the case ${f(x) := -\frac{t}{2\pi} \log x}$, ${I \subset [N,2N]}$ of interest for the Riemann zeta function, we easily verify that these estimates hold with ${T := |t|}$. (One can consider exponential sums under more general hypotheses than (3), but the hypotheses here are adequate for our needs.) We do not bound the zeroth derivative ${f^{(0)}=f}$ of ${f}$ directly, but it would not be natural to do so in any event, since the magnitude of the sum (2) is unaffected if one adds an arbitrary constant to ${f(n)}$.
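
For orientation, the following minimal sketch evaluates the normalised sum (2) numerically in the zeta case ${f(x) = -\frac{t}{2\pi} \log x}$, where each term ${e(f(n))}$ is just ${n^{-it}}$, and also compares ${|f^{(j)}(x)|}$ with ${T/N^j}$ for ${T = |t|}$. The function names are hypothetical, and the constant ${3}$ used in the accompanying checks to represent ${\exp(O(j^2))}$ is an arbitrary illustrative choice.

```python
import cmath
import math


def normalised_sum(t, N, M=None):
    """(1/N) * sum over N <= n < N+M of e(f(n)), with f(x) = -(t / 2 pi) log x.

    Since e(x) = exp(2 pi i x), each term equals exp(-i t log n) = n^(-it).
    """
    if M is None:
        M = N
    total = sum(cmath.exp(-1j * t * math.log(n)) for n in range(N, N + M))
    return total / N


def derivative_ratio(t, N, x, j):
    """|f^(j)(x)| divided by |t| / N^j, for f(x) = -(t / 2 pi) log x and j >= 1."""
    # |f^(j)(x)| = (|t| / 2 pi) * (j-1)! / x^j
    size = abs(t) / (2.0 * math.pi) * math.factorial(j - 1) / x ** j
    return size / (abs(t) / N ** j)
```

For ${x \in [N,2N]}$ the ratio equals ${\frac{(j-1)!}{2\pi} (N/x)^j}$, which lies between ${\frac{(j-1)!}{2\pi 2^j}}$ and ${\frac{(j-1)!}{2\pi}}$, so it is indeed of size ${\exp(O(j^2))}$ in both directions.
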
The trivial bound for (2) is

$\displaystyle \frac{1}{N} \sum_{n \in I} e(f(n)) \ll 1 \ \ \ \ \ (4)$

and we will seek to obtain significant improvements to this bound. Pseudorandomness heuristics predict a bound of ${O_\varepsilon(N^{-1/2+\varepsilon})}$ for (2) for any ${\varepsilon>0}$ if ${T = O(N^{O(1)})}$; this assertion (a special case of the exponent pair hypothesis) would have many consequences (for instance, inserting it into Proposition 1 soon yields the Lindelöf hypothesis), but is unfortunately quite far from resolution with known methods. However, we can obtain weaker gains of the form ${O(N^{-c_K})}$ when ${T \ll N^K}$ and ${c_K > 0}$ depends on ${K}$. We present two such results here, which perform well for small and large values of ${K}$ respectively:

Theorem 2 Let ${2 \leq N \ll T}$, let ${I}$ be an interval of length at most ${N}$, and let ${f: I \rightarrow {\bf R}}$ be a smooth function obeying (3) for all ${j \geq 1}$ and ${x \in I}$.

• (i) (van der Corput estimate) For any natural number ${k \geq 2}$, one has

$\displaystyle \frac{1}{N} \sum_{n \in I} e( f(n) ) \ll (\frac{T}{N^k})^{\frac{1}{2^k-2}} \log^{1/2} (2+T). \ \ \ \ \ (5)$

• (ii) (Vinogradov estimate) If ${k}$ is a natural number and ${T \leq N^{k}}$, then

$\displaystyle \frac{1}{N} \sum_{n \in I} e( f(n) ) \ll N^{-c/k^2} \ \ \ \ \ (6)$

for some absolute constant ${c>0}$.

The factor of ${\log^{1/2} (2+T)}$ can be removed by a more careful argument, but we will not need to do so here as we are willing to lose powers of ${\log T}$. The estimate (6) is superior to (5) when ${T \sim N^K}$ for ${K}$ large, since (after optimising in ${k}$) (5) gives a gain of the form ${N^{-c/2^{cK}}}$ over the trivial bound, while (6) gives ${N^{-c/K^2}}$. We have not attempted to obtain completely optimal estimates here, settling for a relatively simple presentation that still gives good bounds on ${\zeta}$, and there are a wide variety of additional exponential sum estimates beyond the ones given here; see Chapter 8 of Iwaniec-Kowalski, or Chapters 3-4 of Montgomery, for further discussion.

We now briefly discuss the strategies of proof of Theorem 2. Both parts of the theorem proceed by treating ${f}$ like a polynomial of degree roughly ${k}$; in the case of (ii), this is done explicitly via Taylor expansion, whereas for (i) it is only at the level of analogy. Both parts of the theorem then try to “linearise” the phase to make it a linear function of the summands (actually in part (ii), it is necessary to introduce an additional variable and make the phase a bilinear function of the summands). The van der Corput estimate achieves this linearisation by squaring the exponential sum about ${k}$ times, which is why the gain is only exponentially small in ${k}$. The Vinogradov estimate achieves linearisation by raising the exponential sum to a significantly smaller power – on the order of ${k^2}$ – by using Hölder’s inequality in combination with the fact that the discrete curve ${\{ (n,n^2,\dots,n^k): n \in \{1,\dots,M\}\}}$ becomes roughly equidistributed in the box ${\{ (a_1,\dots,a_k): a_j = O( M^j ) \}}$ after taking the sumset of about ${k^2}$ copies of this curve. This latter fact has a precise formulation, known as the Vinogradov mean value theorem, and its proof is the most difficult part of the argument, relying on using a “${p}$-adic” version of this equidistribution to reduce the claim at a given scale ${M}$ to a smaller scale ${M/p}$ with ${p \sim M^{1/k}}$, and then proceeding by induction.

One can combine Theorem 2 with Proposition 1 to obtain various bounds on the Riemann zeta function:

Exercise 3 (Subconvexity bound)

• (i) Show that ${\zeta(\frac{1}{2}+it) \ll (1+|t|)^{1/6} \log^{O(1)}(2+|t|)}$ for all ${t \in {\bf R}}$. (Hint: use the ${k=3}$ case of the van der Corput estimate.)
• (ii) For any ${0 < \sigma < 1}$, show that ${\zeta(\sigma+it) \ll (1+|t|)^{\max( \frac{1-\sigma}{3}, \frac{1}{2} - \frac{2\sigma}{3}) + o(1)}}$ as ${|t| \rightarrow \infty}$ (the decay rate in the ${o(1)}$ is allowed to depend on ${\sigma}$).
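
The exponent bookkeeping behind the hint in (i) can be sketched as follows, ignoring all logarithmic factors. Inserting the van der Corput estimate (5) into Proposition 1 leads to terms of size ${N^{1-\sigma} (T/N^k)^{\frac{1}{2^k-2}}}$, and the helper below (a hypothetical name, not from the notes) returns the resulting exponents of ${T}$ and ${N}$ as exact fractions. This is only the arithmetic, not a solution of the exercise.

```python
from fractions import Fraction


def van_der_corput_exponents(sigma, k):
    """Exponents (a, b) such that N^(1-sigma) * (T / N^k)^(1/(2^k - 2)) = T^a * N^b."""
    gain = Fraction(1, 2 ** k - 2)
    a = gain
    b = 1 - Fraction(sigma) - k * gain
    return a, b
```

With ${\sigma = \frac{1}{2}}$ and ${k=3}$ the exponents are ${(\frac{1}{6}, 0)}$, so the dependence on ${N}$ drops out and one is left with ${T^{1/6}}$. For small ${N}$ one can instead use the trivial bound (4), which contributes ${N^{1/2}}$; this is at most ${T^{1/6}}$ when ${N \leq T^{1/3}}$.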

Exercise 4 Let ${t}$ be such that ${|t| \geq 100}$, and let ${\sigma \geq 1/2}$.

• (i) (Littlewood bound) Use the van der Corput estimate to show that ${\zeta(\sigma+it) \ll \log^{O(1)} |t|}$ whenever ${\sigma \geq 1 - O( \frac{(\log\log |t|)^2}{\log |t|} )}$.
• (ii) (Vinogradov-Korobov bound) Use the Vinogradov estimate to show that ${\zeta(\sigma+it) \ll \log^{O(1)} |t|}$ whenever ${\sigma \geq 1 - O( \frac{(\log\log |t|)^{2/3}}{\log^{2/3} |t|} )}$.

As noted in Exercise 43 of Notes 2, the Vinogradov-Korobov bound leads to the zero-free region ${\{ \sigma+it: \sigma > 1 - c \frac{1}{(\log |t|)^{2/3} (\log\log |t|)^{1/3}}; |t| \geq 100 \}}$, which in turn leads to the prime number theorem with error term

$\displaystyle \sum_{n \leq x} \Lambda(n) = x + O\left( x \exp\left( - c \frac{\log^{3/5} x}{(\log\log x)^{1/5}} \right) \right)$

for ${x > 100}$. If one uses the weaker Littlewood bound instead, one obtains the narrower zero-free region

$\displaystyle \{ \sigma+it: \sigma > 1 - c \frac{\log\log|t|}{\log |t|}; |t| \geq 100 \}$

(which is only slightly wider than the classical zero-free region) and an error term

$\displaystyle \sum_{n \leq x} \Lambda(n) = x + O( x \exp( - c \sqrt{\log x \log\log x} ) )$

in the prime number theorem.
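
To get a feel for how the two error terms compare, here is a small sketch that evaluates the logarithm of the two savings ${\sqrt{\log x \log\log x}}$ and ${\frac{\log^{3/5} x}{(\log\log x)^{1/5}}}$ as functions of ${L := \log\log x}$, which avoids overflow. It assumes, purely for illustration, that the unspecified constants ${c}$ in the two bounds are equal; in reality they differ, so only the asymptotic comparison is meaningful. The function names are hypothetical.

```python
import math


def log_saving_littlewood(L):
    """log of sqrt(log x * loglog x), where L = loglog x."""
    return 0.5 * L + 0.5 * math.log(L)


def log_saving_vinogradov_korobov(L):
    """log of (log x)^(3/5) / (loglog x)^(1/5), where L = loglog x."""
    return 0.6 * L - 0.2 * math.log(L)
```

The Vinogradov-Korobov saving is the larger of the two exactly when ${L > 7 \log L}$, which under the equal-constants assumption only happens once ${\log\log x}$ exceeds a threshold between ${21}$ and ${22}$. The improvement in the error term is thus a statement about very large ${x}$.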

Exercise 5 (Vinogradov-Korobov in arithmetic progressions) Let ${\chi}$ be a non-principal character of modulus ${q}$.

• (i) (Vinogradov-Korobov bound) Use the Vinogradov estimate to show that ${L(\sigma+it,\chi) \ll \log^{O(1)}(q|t|)}$ whenever ${|t| \geq 100}$ and

$\displaystyle \sigma \geq 1 - O( \min( \frac{\log\log(q|t|)}{\log q}, \frac{(\log\log(q|t|))^{2/3}}{\log^{2/3} |t|} ) ).$

(Hint: use the Vinogradov estimate and a change of variables to control ${\sum_{n \in I: n = a\ (q)} \exp( -it \log n)}$ for various intervals ${I}$ of length at most ${N}$ and residue classes ${a\ (q)}$, in the regime ${N \geq q^2}$ (say). For ${N < q^2}$, do not try to capture any cancellation and just use the triangle inequality instead.)

• (ii) Obtain a zero-free region

$\displaystyle \{ \sigma+it: \sigma > 1 - c \min( \frac{1}{(\log |t|)^{2/3} (\log\log |t|)^{1/3}}, \frac{1}{\log q} );$

$\displaystyle |t| \geq 100 \}$

for ${L(s,\chi)}$, for some (effective) absolute constant ${c>0}$.

• (iii) Obtain the prime number theorem in arithmetic progressions with error term

$\displaystyle \sum_{n \leq x: n = a\ (q)} \Lambda(n) = \frac{x}{\phi(q)} + O\left( x \exp\left( - c_A \frac{\log^{3/5} x}{(\log\log x)^{1/5}} \right) \right)$

whenever ${x > 100}$, ${q \leq \log^A x}$, ${a\ (q)}$ is primitive, and ${c_A>0}$ depends (ineffectively) on ${A}$.

In Notes 2, the Riemann zeta function ${\zeta}$ (and more generally, the Dirichlet ${L}$-functions ${L(\cdot,\chi)}$) were extended meromorphically into the region ${\{ s: \hbox{Re}(s) > 0 \}}$ in and to the right of the critical strip. This is a sufficient amount of meromorphic continuation for many applications in analytic number theory, such as establishing the prime number theorem and its variants. The zeroes of the zeta function in the critical strip ${\{ s: 0 < \hbox{Re}(s) < 1 \}}$ are known as the non-trivial zeroes of ${\zeta}$, and thanks to the truncated explicit formulae developed in Notes 2, they control the asymptotic distribution of the primes (up to small errors).

The ${\zeta}$ function obeys the trivial functional equation

$\displaystyle \zeta(\overline{s}) = \overline{\zeta(s)} \ \ \ \ \ (1)$

for all ${s}$ in its domain of definition. Indeed, as ${\zeta(s)}$ is real-valued when ${s}$ is real, the function ${\zeta(s) - \overline{\zeta(\overline{s})}}$ vanishes on the real line and is also meromorphic, and hence vanishes everywhere. Similarly one has the functional equation

$\displaystyle \overline{L(s, \chi)} = L(\overline{s}, \overline{\chi}). \ \ \ \ \ (2)$

From these equations we see that the zeroes of the zeta function are symmetric across the real axis, and the zeroes of ${L(\cdot,\chi)}$ are the reflection of the zeroes of ${L(\cdot,\overline{\chi})}$ across this axis.

It is a remarkable fact that these functions obey an additional, and more non-trivial, functional equation, this time establishing a symmetry across the critical line ${\{ s: \hbox{Re}(s) = \frac{1}{2} \}}$ rather than the real axis. One consequence of this symmetry is that the zeta function and ${L}$-functions may be extended meromorphically to the entire complex plane. For the zeta function, the functional equation was discovered by Riemann, and reads as follows:

Theorem 1 (Functional equation for the Riemann zeta function) The Riemann zeta function ${\zeta}$ extends meromorphically to the entire complex plane, with a simple pole at ${s=1}$ and no other poles. Furthermore, one has the functional equation

$\displaystyle \zeta(s) = \alpha(s) \zeta(1-s) \ \ \ \ \ (3)$

or equivalently

$\displaystyle \zeta(1-s) = \alpha(1-s) \zeta(s) \ \ \ \ \ (4)$

for all complex ${s}$ other than ${s=0,1}$, where ${\alpha}$ is the function

$\displaystyle \alpha(s) := 2^s \pi^{s-1} \sin( \frac{\pi s}{2}) \Gamma(1-s). \ \ \ \ \ (5)$

Here ${\cos(z) := \frac{e^{iz} + e^{-iz}}{2}}$, ${\sin(z) := \frac{e^{iz}-e^{-iz}}{2i}}$ are the complex-analytic extensions of the classical trigonometric functions ${\cos(x), \sin(x)}$, and ${\Gamma}$ is the Gamma function, whose definition and properties we review below the fold.
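
As a quick numerical illustration of (3) and (5), restricted to real ${s}$ so that only the standard library is needed, one can feed the classical values ${\zeta(2) = \frac{\pi^2}{6}}$ and ${\zeta(4) = \frac{\pi^4}{90}}$ into the functional equation to evaluate ${\zeta}$ at ${-1}$ and ${-3}$. The sketch below does this; the function name `alpha` simply mirrors the notation above, and the variable names are illustrative.

```python
import math


def alpha(s):
    """alpha(s) = 2^s * pi^(s-1) * sin(pi s / 2) * Gamma(1 - s), for real s only."""
    return (2.0 ** s * math.pi ** (s - 1.0)
            * math.sin(math.pi * s / 2.0) * math.gamma(1.0 - s))


ZETA_2 = math.pi ** 2 / 6    # classical value of zeta(2)
ZETA_4 = math.pi ** 4 / 90   # classical value of zeta(4)

# zeta(s) = alpha(s) * zeta(1 - s), applied at s = -1 and s = -3
zeta_minus_1 = alpha(-1.0) * ZETA_2
zeta_minus_3 = alpha(-3.0) * ZETA_4
```

This gives ${\zeta(-1) = -\frac{1}{12}}$ and ${\zeta(-3) = \frac{1}{120}}$ up to rounding, while ${\alpha(-2)}$ vanishes up to rounding because of the sine factor, in line with the trivial zero of ${\zeta}$ at ${-2}$.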

The functional equation can be placed in a more symmetric form as follows:

Corollary 2 (Functional equation for the Riemann xi function) The Riemann xi function

$\displaystyle \xi(s) := \frac{1}{2} s(s-1) \pi^{-s/2} \Gamma(\frac{s}{2}) \zeta(s) \ \ \ \ \ (6)$

is analytic on the entire complex plane ${{\bf C}}$ (after removing all removable singularities), and obeys the functional equations

$\displaystyle \xi(\overline{s}) = \overline{\xi(s)}$

and

$\displaystyle \xi(s) = \xi(1-s). \ \ \ \ \ (7)$

In particular, the zeroes of ${\xi}$ consist precisely of the non-trivial zeroes of ${\zeta}$, and are symmetric about both the real axis and the critical line. Also, ${\xi}$ is real-valued on the critical line and on the real axis.
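
The symmetry (7) can be spot-checked at real points where the value of ${\zeta}$ is classical. The sketch below evaluates (6) given a supplied value of ${\zeta(s)}$; it takes ${\zeta(2) = \frac{\pi^2}{6}}$, ${\zeta(4) = \frac{\pi^4}{90}}$, ${\zeta(-1) = -\frac{1}{12}}$ and ${\zeta(-3) = \frac{1}{120}}$ as known inputs rather than computing them, and the function name is hypothetical.

```python
import math


def xi_from_zeta(s, zeta_s):
    """xi(s) = (1/2) s (s-1) pi^(-s/2) Gamma(s/2) zeta(s), for real s, given zeta(s)."""
    return 0.5 * s * (s - 1.0) * math.pi ** (-s / 2.0) * math.gamma(s / 2.0) * zeta_s


xi_2 = xi_from_zeta(2.0, math.pi ** 2 / 6)      # uses zeta(2) = pi^2 / 6
xi_minus_1 = xi_from_zeta(-1.0, -1.0 / 12.0)    # uses zeta(-1) = -1/12
xi_4 = xi_from_zeta(4.0, math.pi ** 4 / 90)     # uses zeta(4) = pi^4 / 90
xi_minus_3 = xi_from_zeta(-3.0, 1.0 / 120.0)    # uses zeta(-3) = 1/120
```

One finds ${\xi(2) = \xi(-1) = \frac{\pi}{6}}$ and ${\xi(4) = \xi(-3) = \frac{\pi^2}{15}}$ up to rounding, as (7) predicts.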

Corollary 2 is an easy consequence of Theorem 1 together with the duplication theorem for the Gamma function, and the fact that ${\zeta}$ has no zeroes to the right of the critical strip, and is left as an exercise to the reader (Exercise 19). The functional equation in Theorem 1 has many proofs, but most of them are related in one way or another to the Poisson summation formula

$\displaystyle \sum_n f(n) = \sum_m \hat f(2\pi m) \ \ \ \ \ (8)$

(Theorem 34 from Supplement 2, at least in the case when ${f}$ is twice continuously differentiable and compactly supported), which can be viewed as a Fourier-analytic link between the coarse-scale distribution of the integers and the fine-scale distribution of the integers. Indeed, there is a quick heuristic proof of the functional equation that comes from formally applying the Poisson summation formula to the function ${1_{x>0} \frac{1}{x^s}}$, and noting that the functions ${x \mapsto \frac{1}{x^s}}$ and ${\xi \mapsto \frac{1}{\xi^{1-s}}}$ are formally Fourier transforms of each other, up to some Gamma function factors, as well as some trigonometric factors arising from the distinction between the real line and the half-line. Such a heuristic proof can indeed be made rigorous, and we do so below the fold, while also providing Riemann’s two classical proofs of the functional equation.
From the functional equation (and the poles of the Gamma function), one can see that ${\zeta}$ has trivial zeroes at the negative even integers ${-2,-4,-6,\dots}$, in addition to the non-trivial zeroes in the critical strip. More generally, the following table summarises the zeroes and poles of the various special functions appearing in the functional equation, after they have been meromorphically extended to the entire complex plane, and with zeroes classified as “non-trivial” or “trivial” depending on whether they lie in the critical strip or not. (Exponential functions such as ${2^{s-1}}$ or ${\pi^{-s}}$ have no zeroes or poles, and will be ignored in this table; the zeroes and poles of rational functions such as ${s(s-1)}$ are self-evident and will also not be displayed here.)

| Function | Non-trivial zeroes | Trivial zeroes | Poles |
| --- | --- | --- | --- |
| ${\zeta(s)}$ | Yes | ${-2,-4,-6,\dots}$ | ${1}$ |
| ${\zeta(1-s)}$ | Yes | ${3,5,\dots}$ | ${0}$ |
| ${\sin(\pi s/2)}$ | No | Even integers | No |
| ${\cos(\pi s/2)}$ | No | Odd integers | No |
| ${\sin(\pi s)}$ | No | Integers | No |
| ${\Gamma(s)}$ | No | No | ${0,-1,-2,\dots}$ |
| ${\Gamma(s/2)}$ | No | No | ${0,-2,-4,\dots}$ |
| ${\Gamma(1-s)}$ | No | No | ${1,2,3,\dots}$ |
| ${\Gamma((1-s)/2)}$ | No | No | ${1,3,5,\dots}$ |
| ${\xi(s)}$ | Yes | No | No |

Among other things, this table indicates that the Gamma and trigonometric factors in the functional equation are tied to the trivial zeroes and poles of zeta, but have no direct bearing on the distribution of the non-trivial zeroes, which is the most important feature of the zeta function for the purposes of analytic number theory, beyond the fact that they are symmetric about the real axis and critical line. In particular, the Riemann hypothesis is not going to be resolved just from further analysis of the Gamma function!

The zeta function computes the “global” sum ${\sum_n \frac{1}{n^s}}$, with ${n}$ ranging all the way from ${1}$ to infinity. However, by some Fourier-analytic (or complex-analytic) manipulation, it is possible to use the zeta function to also control more “localised” sums, such as ${\sum_n \frac{1}{n^s} \psi(\log n - \log N)}$ for some ${N \gg 1}$ and some smooth compactly supported function ${\psi: {\bf R} \rightarrow {\bf C}}$. It turns out that the functional equation (3) for the zeta function localises to this context, giving an approximate functional equation which roughly speaking takes the form

$\displaystyle \sum_n \frac{1}{n^s} \psi( \log n - \log N ) \approx \alpha(s) \sum_m \frac{1}{m^{1-s}} \psi( \log M - \log m )$

whenever ${s=\sigma+it}$ and ${NM = \frac{|t|}{2\pi}}$; see Theorem 39 below for a precise formulation of this equation. Unsurprisingly, this form of the functional equation is also very closely related to the Poisson summation formula (8), indeed it is essentially a special case of that formula (or more precisely, of the van der Corput ${B}$-process). This useful identity relates long smoothed sums of ${\frac{1}{n^s}}$ to short smoothed sums of ${\frac{1}{m^{1-s}}}$ (or vice versa), and can thus be used to shorten exponential sums involving terms such as ${\frac{1}{n^s}}$, which is useful when obtaining some of the more advanced estimates on the Riemann zeta function.
We will give two other basic uses of the functional equation. The first is to get a good count (as opposed to merely an upper bound) on the density of zeroes in the critical strip, establishing the Riemann-von Mangoldt formula that the number ${N(T)}$ of zeroes of imaginary part between ${0}$ and ${T}$ is ${\frac{T}{2\pi} \log \frac{T}{2\pi} - \frac{T}{2\pi} + O(\log T)}$ for large ${T}$. The other is to obtain untruncated versions of the explicit formula from Notes 2, giving a remarkable exact formula for sums involving the von Mangoldt function in terms of zeroes of the Riemann zeta function. These results are not strictly necessary for most of the material in the rest of the course, but certainly help to clarify the nature of the Riemann zeta function and its relation to the primes.
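
For a sense of scale, the main term of the Riemann-von Mangoldt formula is easy to evaluate. The sketch below (with a hypothetical function name) computes ${\frac{T}{2\pi} \log \frac{T}{2\pi} - \frac{T}{2\pi}}$ and omits the ${O(\log T)}$ error entirely.

```python
import math


def riemann_von_mangoldt_main_term(T):
    """Main term (T / 2 pi) log(T / 2 pi) - T / 2 pi of the zero-counting formula."""
    x = T / (2.0 * math.pi)
    return x * math.log(x) - x
```

At ${T = 100}$ the main term is roughly ${28}$, to be compared with the ${29}$ non-trivial zeroes with imaginary part between ${0}$ and ${100}$. Differentiating the main term in ${T}$ gives a density of about ${\frac{1}{2\pi} \log \frac{T}{2\pi}}$ zeroes per unit height.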

In view of the material in previous notes, it should not be surprising that there are analogues of all of the above theory for Dirichlet ${L}$-functions ${L(\cdot,\chi)}$. We will restrict attention to primitive characters ${\chi}$, since the ${L}$-function for imprimitive characters merely differs from the ${L}$-function of the associated primitive factor by a finite Euler product; indeed, if ${\chi = \chi' \chi_0}$ for some principal ${\chi_0}$ whose modulus ${q_0}$ is coprime to that of ${\chi'}$, then

$\displaystyle L(s,\chi) = L(s,\chi') \prod_{p|q_0} (1 - \frac{\chi'(p)}{p^s}) \ \ \ \ \ (9)$

(cf. equation (45) of Notes 2).

The main new feature is that the Poisson summation formula needs to be “twisted” by a Dirichlet character ${\chi}$, and this boils down to the problem of understanding the finite (additive) Fourier transform of a Dirichlet character. This is achieved by the classical theory of Gauss sums, which we review below the fold. There is one new wrinkle; the value of ${\chi(-1) \in \{-1,+1\}}$ plays a role in the functional equation. More precisely, we have

Theorem 3 (Functional equation for ${L}$-functions) Let ${\chi}$ be a primitive character of modulus ${q}$ with ${q>1}$. Then ${L(s,\chi)}$ extends to an entire function on the complex plane, with

$\displaystyle L(s,\chi) = \varepsilon(\chi) 2^s \pi^{s-1} q^{1/2-s} \sin(\frac{\pi}{2}(s+\kappa)) \Gamma(1-s) L(1-s,\overline{\chi})$

or equivalently

$\displaystyle L(1-s,\overline{\chi}) = \varepsilon(\overline{\chi}) 2^{1-s} \pi^{-s} q^{s-1/2} \sin(\frac{\pi}{2}(1-s+\kappa)) \Gamma(s) L(s,\chi)$

for all ${s}$, where ${\kappa}$ is equal to ${0}$ in the even case ${\chi(-1)=+1}$ and ${1}$ in the odd case ${\chi(-1)=-1}$, and

$\displaystyle \varepsilon(\chi) := \frac{\tau(\chi)}{i^\kappa \sqrt{q}} \ \ \ \ \ (10)$

where ${\tau(\chi)}$ is the Gauss sum

$\displaystyle \tau(\chi) := \sum_{n \in {\bf Z}/q{\bf Z}} \chi(n) e(n/q). \ \ \ \ \ (11)$

and ${e(x) := e^{2\pi ix}}$, with the convention that the ${q}$-periodic function ${n \mapsto e(n/q)}$ is also (by abuse of notation) applied to ${n}$ in the cyclic group ${{\bf Z}/q{\bf Z}}$.
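
To illustrate (10) and (11), here is a minimal sketch that computes the Gauss sum and the quantity ${\varepsilon(\chi)}$ for the quadratic (Legendre symbol) character modulo an odd prime ${q}$, which is primitive of modulus ${q}$. The choice of this particular family of characters, and all function names, are assumptions made for the example; the character is evaluated via Euler's criterion.

```python
import cmath
import math


def legendre_character(q):
    """Quadratic character modulo an odd prime q, via Euler's criterion."""
    def chi(n):
        n %= q
        if n == 0:
            return 0
        return 1 if pow(n, (q - 1) // 2, q) == 1 else -1
    return chi


def gauss_sum(chi, q):
    """tau(chi) = sum over n mod q of chi(n) e(n / q)."""
    return sum(chi(n) * cmath.exp(2j * math.pi * n / q) for n in range(q))


def root_number(chi, q):
    """epsilon(chi) = tau(chi) / (i^kappa sqrt(q)), with kappa read off from chi(-1)."""
    kappa = 0 if chi(-1) == 1 else 1
    return gauss_sum(chi, q) / (1j ** kappa * math.sqrt(q))
```

For ${q = 3, 5, 7, 11, 13}$ this returns ${|\tau(\chi)| = \sqrt{q}}$ and ${\varepsilon(\chi) = 1}$ up to rounding, consistent with Gauss's classical evaluation of the quadratic Gauss sum.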

From this functional equation and (2) we see that, as with the Riemann zeta function, the non-trivial zeroes of ${L(s,\chi)}$ (defined as the zeroes within the critical strip ${\{ s: 0 < \hbox{Re}(s) < 1 \}}$) are symmetric around the critical line (and, if ${\chi}$ is real, are also symmetric around the real axis). In addition, ${L(s,\chi)}$ acquires trivial zeroes at the negative even integers and at zero if ${\chi(-1)=1}$, and at the negative odd integers if ${\chi(-1)=-1}$. For imprimitive ${\chi}$, we see from (9) that ${L(s,\chi)}$ also acquires some additional trivial zeroes on the left edge of the critical strip.

There is also a symmetric version of this equation, analogous to Corollary 2:

Corollary 4 Let ${\chi,q,\varepsilon(\chi)}$ be as above, and set

$\displaystyle \xi(s,\chi) := (q/\pi)^{(s+\kappa)/2} \Gamma((s+\kappa)/2) L(s,\chi),$

then ${\xi(\cdot,\chi)}$ is entire with ${\xi(1-s,\overline{\chi}) = \varepsilon(\overline{\chi}) \xi(s,\chi)}$.

For further detail on the functional equation and its implications, I recommend the classic text of Titchmarsh or the text of Davenport.

In Notes 1, we approached multiplicative number theory (the study of multiplicative functions ${f: {\bf N} \rightarrow {\bf C}}$ and their relatives) via elementary methods, in which attention was primarily focused on obtaining asymptotic control on summatory functions ${\sum_{n \leq x} f(n)}$ and logarithmic sums ${\sum_{n \leq x} \frac{f(n)}{n}}$. Now we turn to the complex approach to multiplicative number theory, in which the focus is instead on obtaining various types of control on the Dirichlet series ${{\mathcal D} f}$, defined (at least for ${s}$ of sufficiently large real part) by the formula

$\displaystyle {\mathcal D} f(s) := \sum_n \frac{f(n)}{n^s}.$

These series also made an appearance in the elementary approach to the subject, but only for real ${s}$ that were larger than ${1}$. But now we will exploit the freedom to extend the variable ${s}$ to the complex domain; this gives enough freedom (in principle, at least) to recover control of elementary sums such as ${\sum_{n\leq x} f(n)}$ or ${\sum_{n\leq x} \frac{f(n)}{n}}$ from control on the Dirichlet series. Crucially, for many key functions ${f}$ of number-theoretic interest, the Dirichlet series ${{\mathcal D} f}$ can be analytically (or at least meromorphically) continued to the left of the line ${\{ s: \hbox{Re}(s) = 1 \}}$. The zeroes and poles of the resulting meromorphic continuations of ${{\mathcal D} f}$ (and of related functions) then turn out to control the asymptotic behaviour of the elementary sums of ${f}$; the more one knows about the former, the more one knows about the latter. In particular, knowledge of where the zeroes of the Riemann zeta function ${\zeta}$ are located can give very precise information about the distribution of the primes, by means of a fundamental relationship known as the explicit formula. There are many ways of phrasing this explicit formula (both in exact and in approximate forms), but they are all trying to formalise an approximation to the von Mangoldt function ${\Lambda}$ (and hence to the primes) of the form

$\displaystyle \Lambda(n) \approx 1 - \sum_\rho n^{\rho-1} \ \ \ \ \ (1)$

where the sum is over zeroes ${\rho}$ (counting multiplicity) of the Riemann zeta function ${\zeta = {\mathcal D} 1}$ (with the sum often restricted so that ${\rho}$ has large real part and bounded imaginary part), and the approximation is in a suitable weak sense, so that

$\displaystyle \sum_n \Lambda(n) g(n) \approx \int_0^\infty g(y)\ dy - \sum_\rho \int_0^\infty g(y) y^{\rho-1}\ dy \ \ \ \ \ (2)$

for suitable “test functions” ${g}$ (which in practice are restricted to be fairly smooth and slowly varying, with the precise amount of restriction dependent on the amount of truncation in the sum over zeroes one wishes to take). Among other things, such approximations can be used to rigorously establish the prime number theorem

$\displaystyle \sum_{n \leq x} \Lambda(n) = x + o(x) \ \ \ \ \ (3)$

as ${x \rightarrow \infty}$, with the size of the error term ${o(x)}$ closely tied to the location of the zeroes ${\rho}$ of the Riemann zeta function.
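
To get a feel for (3), the following Python sketch tabulates the von Mangoldt function with a simple sieve and prints the ratio ${\frac{1}{x} \sum_{n \leq x} \Lambda(n)}$ for a few values of ${x}$. The function names and the cutoff ${10^5}$ are arbitrary illustrative choices. A finite computation of this kind only illustrates the asymptotic and says nothing rigorous about the error term.

```python
import math

def von_mangoldt_table(limit):
    """Return a list lam with lam[n] = Lambda(n) for 0 <= n <= limit."""
    lam = [0.0] * (limit + 1)
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if not is_prime[p]:
            continue
        for m in range(2 * p, limit + 1, p):
            is_prime[m] = False
        # Lambda(p^k) = log p for every prime power p^k <= limit.
        pk = p
        while pk <= limit:
            lam[pk] = math.log(p)
            pk *= p
    return lam

def psi(x, lam):
    """Summatory function sum_{n <= x} Lambda(n), for an integer x within the table."""
    return sum(lam[1:x + 1])

lam = von_mangoldt_table(10**5)
for x in (10**3, 10**4, 10**5):
    print(x, psi(x, lam) / x)  # ratios should be close to 1
```
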

The explicit formula (1) (or any of its more rigorous forms) is closely tied to the counterpart approximation

$\displaystyle -\frac{\zeta'}{\zeta}(s) \approx \frac{1}{s-1} - \sum_\rho \frac{1}{s-\rho} \ \ \ \ \ (4)$

for the Dirichlet series ${{\mathcal D} \Lambda = -\frac{\zeta'}{\zeta}}$ of the von Mangoldt function; note that (4) is formally the special case of (2) when ${g(n) = n^{-s}}$. Such approximations come from the general theory of local factorisations of meromorphic functions, as discussed in Supplement 2; the passage from (4) to (2) is accomplished by such tools as the residue theorem and the Fourier inversion formula, which were also covered in Supplement 2. The relative ease of uncovering the Fourier-like duality between primes and zeroes (sometimes referred to poetically as the “music of the primes”) is one of the major advantages of the complex-analytic approach to multiplicative number theory; this important duality tends to be rather obscured in the other approaches to the subject, although it can still in principle be discernible with sufficient effort.

More generally, one has an explicit formula

$\displaystyle \Lambda(n) \chi(n) \approx - \sum_\rho n^{\rho-1} \ \ \ \ \ (5)$

for any (non-principal) Dirichlet character ${\chi}$, where ${\rho}$ now ranges over the zeroes of the associated Dirichlet ${L}$-function ${L(s,\chi) := {\mathcal D} \chi(s)}$; we view this formula as a “twist” of (1) by the Dirichlet character ${\chi}$. The explicit formula (5), proven similarly (in any of its rigorous forms) to (1), is important in establishing the prime number theorem in arithmetic progressions, which asserts that

$\displaystyle \sum_{n \leq x: n = a\ (q)} \Lambda(n) = \frac{x}{\phi(q)} + o(x) \ \ \ \ \ (6)$

as ${x \rightarrow \infty}$, whenever ${a\ (q)}$ is a fixed primitive residue class. Again, the size of the error term ${o(x)}$ here is closely tied to the location of the zeroes of the Dirichlet ${L}$-function, with particular importance given to whether there is a zero very close to ${s=1}$ (such a zero is known as an exceptional zero or Siegel zero).
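
The equidistribution asserted by (6) can be observed numerically. The sketch below (with illustrative function names, and the arbitrary choices ${q=4}$ and ${x=10^5}$) sums ${\Lambda(n)}$ over each primitive residue class and normalises by ${x/\phi(q)}$. It is only a finite experiment and carries no information about Siegel zeroes or the true size of the error term.

```python
import math

def psi_in_progression(x, q, a):
    """Sum of Lambda(n) over n <= x with n = a (mod q), via a simple sieve."""
    is_prime = bytearray([1]) * (x + 1)
    total = 0.0
    for p in range(2, x + 1):
        if not is_prime[p]:
            continue
        for m in range(p * p, x + 1, p):
            is_prime[m] = 0
        # Each prime power p^k <= x contributes log p to its residue class.
        pk = p
        while pk <= x:
            if pk % q == a % q:
                total += math.log(p)
            pk *= p
    return total

def euler_phi(q):
    """Euler totient: number of residues mod q coprime to q."""
    return sum(1 for r in range(1, q + 1) if math.gcd(r, q) == 1)

x, q = 10**5, 4
for a in (1, 3):
    # Normalised sums; both should be close to 1.
    print(a, psi_in_progression(x, q, a) * euler_phi(q) / x)
```
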

While any information on the behaviour of zeta functions or ${L}$-functions is in principle welcome for the purposes of analytic number theory, some regions of the complex plane are more important than others in this regard, due to the differing weights assigned to each zero in the explicit formula. Roughly speaking, in descending order of importance, the most crucial regions on which knowledge of these functions is useful are

1. The region on or near the point ${s=1}$.
2. The region on or near the right edge ${\{ 1+it: t \in {\bf R} \}}$ of the critical strip ${\{ s: 0 \leq \hbox{Re}(s) \leq 1 \}}$.
3. The right half ${\{ s: \frac{1}{2} < \hbox{Re}(s) < 1 \}}$ of the critical strip.
4. The region on or near the critical line ${\{ \frac{1}{2} + it: t \in {\bf R} \}}$ that bisects the critical strip.
5. Everywhere else.

For instance:

1. We will shortly show that the Riemann zeta function ${\zeta}$ has a simple pole at ${s=1}$ with residue ${1}$, which is already sufficient to recover many of the classical theorems of Mertens discussed in the previous set of notes, as well as results on mean values of multiplicative functions such as the divisor function ${\tau}$. For Dirichlet ${L}$-functions, the behaviour is instead controlled by the quantity ${L(1,\chi)}$ discussed in Notes 1, which is in turn closely tied to the existence and location of a Siegel zero.
2. The zeta function is also known to have no zeroes on the right edge ${\{1+it: t \in {\bf R}\}}$ of the critical strip, which is sufficient to prove (and is in fact equivalent to) the prime number theorem. Any enlargement of the zero-free region for ${\zeta}$ into the critical strip leads to improved error terms in that theorem, with larger zero-free regions leading to stronger error estimates. Similarly for ${L}$-functions and the prime number theorem in arithmetic progressions.
3. The (as yet unproven) Riemann hypothesis prohibits ${\zeta}$ from having any zeroes within the right half ${\{ s: \frac{1}{2} < \hbox{Re}(s) < 1 \}}$ of the critical strip, and gives very good control on the number of primes in intervals, even when the intervals are relatively short compared to the size of the entries. Even without assuming the Riemann hypothesis, zero density estimates in this region are available that give some partial control of this form. Similarly for ${L}$-functions, primes in short arithmetic progressions, and the generalised Riemann hypothesis.
4. Assuming the Riemann hypothesis, further distributional information about the zeroes on the critical line (such as Montgomery’s pair correlation conjecture, or the more general GUE hypothesis) can give finer information about the error terms in the prime number theorem in short intervals, as well as other arithmetic information. Again, one has analogues for ${L}$-functions and primes in short arithmetic progressions.
5. The functional equation of the zeta function describes the behaviour of ${\zeta}$ to the left of the critical line, in terms of the behaviour to the right of the critical line. This is useful for building a “global” picture of the structure of the zeta function, and for improving a number of estimates about that function, but (in the absence of unproven conjectures such as the Riemann hypothesis or the pair correlation conjecture) it turns out that many of the basic analytic number theory results using the zeta function can be established without relying on this equation. Similarly for ${L}$-functions.

Remark 1 If one takes an “adelic” viewpoint, one can unite the Riemann zeta function ${\zeta(\sigma+it) = \sum_n n^{-\sigma-it}}$ and all of the ${L}$-functions ${L(\sigma+it,\chi) = \sum_n \chi(n) n^{-\sigma-it}}$ for various Dirichlet characters ${\chi}$ into a single object, viewing ${n \mapsto \chi(n) n^{-it}}$ as a general multiplicative character on the adeles; thus the imaginary coordinate ${t}$ and the Dirichlet character ${\chi}$ are really the Archimedean and non-Archimedean components respectively of a single adelic frequency parameter. This viewpoint was famously developed in Tate’s thesis, which among other things helps to clarify the nature of the functional equation, as discussed in this previous post. We will not pursue the adelic viewpoint further in these notes, but it does supply a “high-level” explanation for why so much of the theory of the Riemann zeta function extends to the Dirichlet ${L}$-functions. (The non-Archimedean character ${\chi(n)}$ and the Archimedean character ${n^{it}}$ behave similarly from an algebraic point of view, but not so much from an analytic point of view; as such, the adelic viewpoint is well suited for algebraic tasks (such as establishing the functional equation), but not for analytic tasks (such as establishing a zero-free region).)

Roughly speaking, the elementary multiplicative number theory from Notes 1 corresponds to the information one can extract from the complex-analytic method in region 1 of the above hierarchy, while the more advanced elementary number theory used to prove the prime number theorem (and which we will not cover in full detail in these notes) corresponds to what one can extract from regions 1 and 2.

As a consequence of this hierarchy of importance, information about the ${\zeta}$ function away from the critical strip, such as Euler’s identity

$\displaystyle \zeta(2) = \frac{\pi^2}{6}$

or equivalently

$\displaystyle 1 + \frac{1}{2^2} + \frac{1}{3^2} + \dots = \frac{\pi^2}{6}$

or the infamous identity

$\displaystyle \zeta(-1) = -\frac{1}{12},$

which is often presented (slightly misleadingly, if one’s conventions for divergent summation are not made explicit) as

$\displaystyle 1 + 2 + 3 + \dots = -\frac{1}{12},$

are of relatively little direct importance in analytic prime number theory, although they are still of interest for some other, non-number-theoretic, applications. (The quantity ${\zeta(2)}$ does play a minor role as a normalising factor in some asymptotics, see e.g. Exercise 28 from Notes 1, but its precise value is usually not of major importance.) In contrast, the value ${L(1,\chi)}$ of an ${L}$-function at ${s=1}$ turns out to be extremely important in analytic number theory, with many results in this subject relying ultimately on a non-trivial lower-bound on this quantity coming from Siegel’s theorem, discussed below the fold.

For a more in-depth treatment of the topics in this set of notes, see Davenport’s “Multiplicative number theory”.

Mertens’ theorems are a set of classical estimates concerning the asymptotic distribution of the prime numbers:

Theorem 1 (Mertens’ theorems) In the asymptotic limit ${x \rightarrow \infty}$, we have

$\displaystyle \sum_{p\leq x} \frac{\log p}{p} = \log x + O(1), \ \ \ \ \ (1)$

$\displaystyle \sum_{p\leq x} \frac{1}{p} = \log \log x + O(1), \ \ \ \ \ (2)$

and

$\displaystyle \sum_{p\leq x} \log(1-\frac{1}{p}) = -\log \log x - \gamma + o(1) \ \ \ \ \ (3)$

where ${\gamma}$ is the Euler-Mascheroni constant, defined by requiring that

$\displaystyle 1 + \frac{1}{2} + \ldots + \frac{1}{n} = \log n + \gamma + o(1) \ \ \ \ \ (4)$

in the limit ${n \rightarrow \infty}$.

The third theorem (3) is usually stated in exponentiated form

$\displaystyle \prod_{p \leq x} (1-\frac{1}{p}) = \frac{e^{-\gamma}+o(1)}{\log x},$

but in the logarithmic form (3) we see that it is strictly stronger than (2), in view of the asymptotic ${\log(1-\frac{1}{p}) = -\frac{1}{p} + O(\frac{1}{p^2})}$.
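
The three estimates (1), (2), (3) are easy to test numerically. The Python sketch below (function names and the cutoff are illustrative choices) sieves the primes up to ${x}$ and returns the three discrepancies: the first two should stay bounded as ${x}$ grows, and the third, which has ${\gamma}$ built in, should tend to zero.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant, to double precision

def primes_up_to(x):
    """Sieve of Eratosthenes returning all primes p <= x."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return [n for n in range(2, x + 1) if sieve[n]]

def mertens_errors(x):
    """Discrepancies in (1), (2), (3): the first two are O(1), the third is o(1)."""
    ps = primes_up_to(x)
    e1 = sum(math.log(p) / p for p in ps) - math.log(x)
    e2 = sum(1 / p for p in ps) - math.log(math.log(x))
    e3 = sum(math.log1p(-1 / p) for p in ps) + math.log(math.log(x)) + EULER_GAMMA
    return e1, e2, e3

for x in (10**3, 10**4, 10**5, 10**6):
    print(x, mertens_errors(x))
```

In such experiments the second discrepancy settles near a constant (often called the Meissel-Mertens constant), which is consistent with the remark that (2) holds with an ${O(1)}$ error that is in fact convergent.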

Remarkably, these theorems can be proven without the assistance of the prime number theorem

$\displaystyle \sum_{p \leq x} 1 = \frac{x}{\log x} + o( \frac{x}{\log x} ),$

which was proven about two decades after Mertens’ work. (But one can certainly use versions of the prime number theorem with good error term, together with summation by parts, to obtain good estimates on the various errors in Mertens’ theorems.) Roughly speaking, the reason for this is that Mertens’ theorems only require control on the Riemann zeta function ${\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s}}$ in the neighbourhood of the pole at ${s=1}$, whereas (as discussed in this previous post) the prime number theorem requires control on the zeta function on (a neighbourhood of) the line ${\{ 1+it: t \in {\bf R} \}}$. Specifically, Mertens’ theorem is ultimately deduced from the Euler product formula

$\displaystyle \zeta(s) = \prod_p (1-\frac{1}{p^s})^{-1}, \ \ \ \ \ (5)$

valid in the region ${\hbox{Re}(s) > 1}$ (which is ultimately a Fourier-Dirichlet transform of the fundamental theorem of arithmetic), and the following crude asymptotics:

Proposition 2 (Simple pole) For ${s}$ sufficiently close to ${1}$ with ${\hbox{Re}(s) > 1}$, we have

$\displaystyle \zeta(s) = \frac{1}{s-1} + O(1) \ \ \ \ \ (6)$

and

$\displaystyle \zeta'(s) = \frac{-1}{(s-1)^2} + O(1).$

Proof: For ${s}$ as in the proposition, we have ${\frac{1}{n^s} = \frac{1}{t^s} + O(\frac{1}{n^2})}$ for any natural number ${n}$ and ${n \leq t \leq n+1}$, and hence

$\displaystyle \frac{1}{n^s} = \int_n^{n+1} \frac{1}{t^s}\ dt + O( \frac{1}{n^2} ).$

Summing in ${n}$ and using the identity ${\int_1^\infty \frac{1}{t^s}\ dt = \frac{1}{s-1}}$, we obtain the first claim. Similarly, we have

$\displaystyle \frac{-\log n}{n^s} = \int_n^{n+1} \frac{-\log t}{t^s}\ dt + O( \frac{\log n}{n^2} ),$

and by summing in ${n}$ and using the identity ${\int_1^\infty \frac{-\log t}{t^s}\ dt = \frac{-1}{(s-1)^2}}$ (the derivative of the previous identity) we obtain the second claim. $\Box$
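
The first claim (6) can be checked numerically for real ${s > 1}$. The sketch below approximates ${\zeta(s)}$ by a partial sum plus the first few Euler-Maclaurin tail terms (a standard numerical device, not the argument used in the proof above; the cutoff `N` is an arbitrary choice) and prints ${\zeta(s) - \frac{1}{s-1}}$, which should remain bounded as ${s \rightarrow 1^+}$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant, for comparison only

def zeta(s, N=1000):
    """Approximate zeta(s) for Re(s) > 1: partial sum plus Euler-Maclaurin tail terms."""
    head = sum(n ** (-s) for n in range(1, N))
    # Tail: integral of t^{-s} from N to infinity, half the endpoint term,
    # and the first Bernoulli correction s N^{-s-1} / 12.
    tail = N ** (1 - s) / (s - 1) + N ** (-s) / 2 + s * N ** (-s - 1) / 12
    return head + tail

for s in (2.0, 1.1, 1.01, 1.001):
    print(s, zeta(s) - 1 / (s - 1))
```

The printed differences not only stay bounded but approach a limit as ${s}$ approaches ${1}$, and that limit can be compared against `EULER_GAMMA`.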

The first two of Mertens’ theorems (1), (2) are relatively easy to prove, and imply the third theorem (3) except with ${\gamma}$ replaced by an unspecified absolute constant. To get the specific constant ${\gamma}$ requires a little bit of additional effort. From (4), one might expect that the appearance of ${\gamma}$ arises from the refinement

$\displaystyle \zeta(s) = \frac{1}{s-1} + \gamma + O(|s-1|) \ \ \ \ \ (7)$

that one can obtain to (6). However, it turns out that the connection is not so much with the zeta function, but with the Gamma function, and specifically with the identity ${\Gamma'(1) = - \gamma}$ (which is of course related to (7) through the functional equation for zeta, but can be proven without any reference to zeta functions). More specifically, we have the following asymptotic for the exponential integral:

Proposition 3 (Exponential integral asymptotics) For sufficiently small ${\epsilon > 0}$, one has

$\displaystyle \int_\epsilon^\infty \frac{e^{-t}}{t}\ dt = \log \frac{1}{\epsilon} - \gamma + O(\epsilon).$

A routine integration by parts shows that this asymptotic is equivalent to the identity

$\displaystyle \int_0^\infty e^{-t} \log t\ dt = -\gamma$

which is the identity ${\Gamma'(1)=-\gamma}$ mentioned previously.

Proof: We start by using the identity ${\frac{1}{i} = \int_0^1 x^{i-1}\ dx}$ to express the harmonic number ${H_n := 1+\frac{1}{2}+\ldots+\frac{1}{n}}$ as

$\displaystyle H_n = \int_0^1 1 + x + \ldots + x^{n-1}\ dx$

or on summing the geometric series

$\displaystyle H_n = \int_0^1 \frac{1-x^n}{1-x}\ dx.$

Since ${\int_0^{1-1/n} \frac{1}{1-x}\ dx = \log n}$, we thus have

$\displaystyle H_n - \log n = \int_0^1 \frac{1_{[1-1/n,1]}(x) - x^n}{1-x}\ dx;$

making the change of variables ${x = 1-\frac{t}{n}}$, this becomes

$\displaystyle H_n - \log n = \int_0^n \frac{1_{[0,1]}(t) - (1-\frac{t}{n})^n}{t}\ dt.$

As ${n \rightarrow \infty}$, ${\frac{1_{[0,1]}(t) - (1-\frac{t}{n})^n}{t}}$ converges pointwise to ${\frac{1_{[0,1]}(t) - e^{-t}}{t}}$ and is pointwise dominated by ${O( e^{-t} )}$. Taking limits as ${n \rightarrow \infty}$ using dominated convergence, we conclude that

$\displaystyle \gamma = \int_0^\infty \frac{1_{[0,1]}(t) - e^{-t}}{t}\ dt,$

or equivalently

$\displaystyle \int_0^\infty \frac{e^{-t} - 1_{[0,\epsilon]}(t)}{t}\ dt = \log \frac{1}{\epsilon} - \gamma.$

The claim then follows by bounding the ${\int_0^\epsilon}$ portion of the integral on the left-hand side. $\Box$
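
Both ingredients of this proof can be checked numerically. The sketch below (illustrative names; the truncation point `upper` and the number of Simpson steps are arbitrary choices) evaluates ${\int_\epsilon^\infty \frac{e^{-t}}{t}\ dt}$ after the substitution ${t = e^u}$, which turns the integrand into the smooth function ${\exp(-e^u)}$, and compares the result with ${\log \frac{1}{\epsilon} - \gamma}$. It also evaluates ${H_n - \log n}$ directly.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def exp_integral(eps, upper=50.0, steps=20000):
    """Approximate int_eps^infty e^{-t}/t dt by Simpson's rule in u = log t.

    The integral is truncated at t = upper, where e^{-t} is negligible;
    steps must be even.
    """
    a, b = math.log(eps), math.log(upper)
    h = (b - a) / steps
    f = lambda u: math.exp(-math.exp(u))  # e^{-t} dt/t becomes exp(-e^u) du
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def harmonic_gap(n):
    """H_n - log n, which converges to the Euler-Mascheroni constant."""
    return sum(1 / k for k in range(1, n + 1)) - math.log(n)

for eps in (0.1, 0.01, 0.001):
    # The discrepancy should be O(eps), as in Proposition 3.
    print(eps, exp_integral(eps) - (math.log(1 / eps) - EULER_GAMMA))
print(harmonic_gap(10**5) - EULER_GAMMA)
```
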

Below the fold I would like to record how Proposition 2 and Proposition 3 imply Theorem 1; the computations are utterly standard, and can be found in most analytic number theory texts, but I wanted to write them down for my own benefit (I always keep forgetting, in particular, how the third of Mertens’ theorems is proven).