Previous set of notes: Notes 3. Next set of notes: 246C Notes 1.
One of the great classical triumphs of complex analysis was in providing the first complete proof (by Hadamard and de la Vallée Poussin in 1896) of arguably the most important theorem in analytic number theory, the prime number theorem:
Theorem 1 (Prime number theorem) Let $\pi(x)$ denote the number of primes less than a given real number $x$. Then
$$\lim_{x \to \infty} \frac{\pi(x)}{x/\log x} = 1$$
(or in asymptotic notation, $\pi(x) \sim \frac{x}{\log x}$ as $x \to \infty$).
(Actually, it turns out to be slightly more natural to replace the approximation $\frac{x}{\log x}$ in the prime number theorem by the logarithmic integral $\int_2^x \frac{dt}{\log t}$, which happens to be a more precise approximation, but we will not stress this point here.)
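The complex-analytic proof of this theorem hinges on the study of a key meromorphic function related to the prime numbers, the Riemann zeta function $\zeta$. Initially, it is only defined on the half-plane $\{s \in \mathbb{C}: \mathrm{Re}(s) > 1\}$:
Definition 2 (Riemann zeta function, preliminary definition) Let $s \in \mathbb{C}$ be such that $\mathrm{Re}(s) > 1$. Then we define
$$\zeta(s) := \sum_{n=1}^\infty \frac{1}{n^s}. \quad (1)$$
Note that the series is locally uniformly convergent in the half-plane $\{s \in \mathbb{C}: \mathrm{Re}(s) > 1\}$, so in particular $\zeta$ is holomorphic on this region. In previous notes we have already evaluated some special values of this function:
$$\zeta(2) = \frac{\pi^2}{6}; \quad \zeta(4) = \frac{\pi^4}{90}; \quad \zeta(6) = \frac{\pi^6}{945}. \quad (2)$$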
The Riemann zeta function has several remarkable properties, some of which we summarise here:
Theorem 3 (Basic properties of the Riemann zeta function)
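- (i) (Euler product formula) For any $s \in \mathbb{C}$ with $\mathrm{Re}(s) > 1$, we have
$$\zeta(s) = \prod_p \left(1 - \frac{1}{p^s}\right)^{-1} \quad (3)$$
where the product is absolutely convergent (and locally uniform in $s$) and is over the prime numbers $p = 2, 3, 5, \dots$.
- (ii) (Trivial zero-free region) $\zeta$ has no zeroes in the region $\{s: \mathrm{Re}(s) > 1\}$.
- (iii) (Meromorphic continuation) $\zeta$ has a unique meromorphic continuation to the complex plane (which by abuse of notation we also call $\zeta$), with a simple pole at $s = 1$ and no other poles. Furthermore, the Riemann xi function
$$\xi(s) := \frac{1}{2} s(s-1) \pi^{-s/2} \Gamma\left(\frac{s}{2}\right) \zeta(s) \quad (4)$$
is an entire function of order $1$ (after removing all singularities). The function $(s-1)\zeta(s)$ is an entire function of order one after removing the singularity at $s = 1$.
- (iv) (Functional equation) After applying the meromorphic continuation from (iii), we have
$$\xi(s) = \xi(1-s) \quad (5)$$
for all $s \in \mathbb{C}$ (excluding poles). Equivalently, we have
$$\zeta(1-s) = 2 (2\pi)^{-s} \cos\left(\frac{\pi s}{2}\right) \Gamma(s) \zeta(s) \quad (6)$$
for all $s \in \mathbb{C}$. (The equivalence between (5) and (6) is a routine consequence of the Euler reflection formula and the Legendre duplication formula, see Exercises 26 and 31 of Notes 1.)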
Proof: We just prove (i) and (ii) for now, leaving (iii) and (iv) for later sections.
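The claim (i) is an encoding of the fundamental theorem of arithmetic, which asserts that every natural number $n$ is uniquely representable as a product $n = \prod_p p^{a_p}$ over primes, where the $a_p$ are natural numbers, all but finitely many of which are zero. Writing this representation as $\frac{1}{n^s} = \prod_p \frac{1}{p^{a_p s}}$, we see that
$$\sum_{n=1}^\infty \frac{1}{n^s} = \prod_p \sum_{a=0}^\infty \frac{1}{p^{as}} = \prod_p \left(1 - \frac{1}{p^s}\right)^{-1},$$
where the rearrangement is justified by absolute convergence.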
The claim (ii) is immediate from (i) since the Euler product is absolutely convergent and all terms are non-zero.
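As a numerical sanity check of the Euler product (3), the following sketch compares a truncation of the series (1) and a truncation of the product (3) at $s = 2$ with the value $\zeta(2) = \pi^2/6$ from (2). The helper names and truncation points are illustrative choices and are not part of the notes.

```python
import math


def primes_up_to(limit):
    """Return the primes p <= limit (limit >= 1), via the sieve of Eratosthenes."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # Cross out p^2, p^2 + p, ... (smaller multiples were crossed out earlier)
            is_prime[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [p for p in range(limit + 1) if is_prime[p]]


def partial_zeta(s, terms):
    """Truncation of the Dirichlet series (1) to 1 <= n <= terms."""
    return sum(n ** -s for n in range(1, terms + 1))


def partial_euler_product(s, limit):
    """Truncation of the Euler product (3) to primes p <= limit."""
    product = 1.0
    for p in primes_up_to(limit):
        product /= 1.0 - p ** -s
    return product


zeta_2 = math.pi ** 2 / 6
print(partial_zeta(2, 10**5) - zeta_2)           # about -1e-5: the tail of (1) is missing
print(partial_euler_product(2, 10**5) - zeta_2)  # also a small negative error
```

Both truncations approach $\zeta(2)$ from below, because every omitted term of (1) and every omitted factor of (3) is positive, or greater than one, respectively.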
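We remark that by sending $s$ to $1$ in Theorem 3(i) we conclude that
$$\prod_p \left(1 - \frac{1}{p}\right)^{-1} = \sum_{n=1}^\infty \frac{1}{n} = +\infty,$$
and in particular that there are infinitely many primes.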
The meromorphic continuation (iii) of the zeta function is initially surprising, but can be interpreted either as a manifestation of the extremely regular spacing of the natural numbers occurring in the sum (1), or as a consequence of various integral representations of $\zeta$ (or slight modifications thereof). We will focus in this set of notes on a particular representation of $\zeta$ as essentially the Mellin transform of the theta function $\theta$ that briefly appeared in previous notes, and the functional equation (iv) can then be viewed as a consequence of the modularity of that theta function. This in turn was established using the Poisson summation formula, so one can view the functional equation as ultimately being a manifestation of Poisson summation. (For a direct proof of the functional equation via Poisson summation, see these notes.)
Henceforth we work with the meromorphic continuation of $\zeta$. The functional equation (iv), when combined with special values of $\zeta$ such as (2), gives some additional values of $\zeta$ outside of its initial domain $\{s: \mathrm{Re}(s) > 1\}$, most famously
$$\zeta(-1) = -\frac{1}{12}.$$
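To see concretely how (6) transports the special values (2) to negative integers, here is a minimal sketch. It evaluates the right-hand side of (6) at $s = 2, 4, 6$ using the standard library's `math.gamma`. The function name `zeta_one_minus_s` is a hypothetical helper, not something from the notes.

```python
import math


def zeta_one_minus_s(s, zeta_s):
    """Return zeta(1 - s) computed from zeta(s) via the functional equation (6)."""
    return 2 * (2 * math.pi) ** (-s) * math.cos(math.pi * s / 2) * math.gamma(s) * zeta_s


# The special values (2) of zeta at s = 2, 4, 6
special_values = {2: math.pi ** 2 / 6, 4: math.pi ** 4 / 90, 6: math.pi ** 6 / 945}
for s, zeta_s in special_values.items():
    print(f"zeta({1 - s}) = {zeta_one_minus_s(s, zeta_s):.12f}")
```

This recovers $\zeta(-1) = -\frac{1}{12}$, together with $\zeta(-3) = \frac{1}{120}$ and $\zeta(-5) = -\frac{1}{252}$. At odd $s$ the factor $\cos(\pi s/2)$ vanishes, which is the source of the trivial zeroes at the negative even integers.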
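From Theorem 3 and the non-vanishing nature of $\Gamma$, we see that $\zeta$ has simple zeroes (known as trivial zeroes) at the negative even integers $-2, -4, -6, \dots$, and all other zeroes (the non-trivial zeroes) inside the critical strip $\{s: 0 \leq \mathrm{Re}(s) \leq 1\}$. (The non-trivial zeroes are conjectured to all be simple, but this is hopelessly far from being proven at present.) As we shall see shortly, these latter zeroes turn out to be closely related to the distribution of the primes. The functional equation tells us that if $\rho$ is a non-trivial zero then so is $1 - \rho$; also, we have the identity
$$\zeta(s) = \overline{\zeta(\overline{s})}$$
(this holds for $\mathrm{Re}(s) > 1$ by (1), and hence for all $s \neq 1$ by uniqueness of meromorphic continuation), so that $\overline{\rho}$ is also a non-trivial zero.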
Conjecture 4 (Riemann hypothesis) All the non-trivial zeroes of $\zeta$ lie on the critical line $\{s: \mathrm{Re}(s) = \frac{1}{2}\}$.
This conjecture would have many implications in analytic number theory, particularly with regard to the distribution of the primes. Of course, it is far from proven at present, but the partial results we have towards this conjecture are still sufficient to establish results such as the prime number theorem.
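Return now to the original region where $\mathrm{Re}(s) > 1$. To take more advantage of the Euler product formula (3), we take complex logarithms to conclude that
$$-\log \zeta(s) = \sum_p \log\left(1 - \frac{1}{p^s}\right)$$
for a suitable branch of the complex logarithm, and then differentiate in $s$ to obtain
$$-\frac{\zeta'(s)}{\zeta(s)} = \sum_p \frac{\log p}{p^s - 1} = \sum_{n=1}^\infty \frac{\Lambda(n)}{n^s} \quad (8)$$
where the von Mangoldt function $\Lambda(n)$ is defined to equal $\log p$ when $n = p^j$ is a power of a prime $p$ for some $j \geq 1$, and $0$ otherwise.
The series $\sum_{n=1}^\infty \frac{1}{n^s}$ and $\sum_{n=1}^\infty \frac{\Lambda(n)}{n^s}$ that show up in the above formulae are examples of Dirichlet series, which are a convenient device to transform various sequences of arithmetic interest into holomorphic or meromorphic functions. Here are some more examples: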
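Exercise 5 (Standard Dirichlet series) Let $s$ be a complex number with $\mathrm{Re}(s) > 1$.
- (i) Show that $\sum_{n=1}^\infty \frac{\log n}{n^s} = -\zeta'(s)$.
- (ii) Show that $\sum_{n=1}^\infty \frac{\tau(n)}{n^s} = \zeta(s)^2$, where $\tau(n) := \sum_{d|n} 1$ is the divisor function of $n$ (the number of divisors of $n$).
- (iii) Show that $\sum_{n=1}^\infty \frac{\mu(n)}{n^s} = \frac{1}{\zeta(s)}$, where $\mu$ is the Möbius function, defined to equal $(-1)^k$ when $n$ is the product of $k$ distinct primes for some $k \geq 0$, and $0$ otherwise.
- (iv) Show that $\sum_{n=1}^\infty \frac{\lambda(n)}{n^s} = \frac{\zeta(2s)}{\zeta(s)}$, where $\lambda$ is the Liouville function, defined to equal $(-1)^k$ when $n$ is the product of $k$ (not necessarily distinct) primes for some $k \geq 0$.
- (v) Show that $\sum_{n=1}^\infty \frac{\Lambda(n)/\log n}{n^s} = \log \zeta(s)$, where $\log \zeta$ is the holomorphic branch of the logarithm that is real for $s > 1$, and with the convention that $\Lambda(n)/\log n$ vanishes for $n = 1$.
- (vi) Use the fundamental theorem of arithmetic to show that the von Mangoldt function is the unique function $\Lambda: \mathbb{N} \to \mathbb{R}$ such that
$$\log n = \sum_{d|n} \Lambda(d)$$
for every positive integer $n$. Use this and (i) to provide an alternate proof of the identity (8). Thus we see that (8) is really just another encoding of the fundamental theorem of arithmetic.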
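Comparing coefficients of $n^{-s}$ turns several parts of Exercise 5 into divisor-sum identities. Part (vi) is $\log n = \sum_{d|n} \Lambda(d)$. Part (iii) says $\sum_{d|n} \mu(d)$ is $1$ for $n = 1$ and $0$ otherwise. Part (iv) says $\sum_{d|n} \lambda(d)$ is $1$ if $n$ is a perfect square and $0$ otherwise. The sketch below checks these numerically for small $n$. It is an illustration, not a proof, and all helper names are hypothetical.

```python
import math


def factorize(n):
    """Prime factorisation of n >= 1 as a dict {p: exponent}, by trial division."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def von_mangoldt(n):
    """Lambda(n) = log p if n = p^j for a prime p and j >= 1, and 0 otherwise."""
    f = factorize(n)
    return math.log(next(iter(f))) if len(f) == 1 else 0.0


def mobius(n):
    """mu(n) = (-1)^k if n is a product of k distinct primes, and 0 otherwise."""
    exponents = factorize(n).values()
    return 0 if any(e > 1 for e in exponents) else (-1) ** len(exponents)


def liouville(n):
    """lambda(n) = (-1)^k, where k counts prime factors with multiplicity."""
    return (-1) ** sum(factorize(n).values())


def check_convolution_identities(limit):
    """Check the divisor-sum forms of Exercise 5 (iii), (iv), (vi) for 1 <= n <= limit."""
    for n in range(1, limit + 1):
        divisors = [d for d in range(1, n + 1) if n % d == 0]
        # (vi): log n = sum_{d | n} Lambda(d)
        assert abs(math.log(n) - sum(map(von_mangoldt, divisors))) < 1e-9
        # (iii): sum_{d | n} mu(d) = [n = 1], i.e. zeta(s) * (1/zeta(s)) = 1
        assert sum(map(mobius, divisors)) == (1 if n == 1 else 0)
        # (iv): sum_{d | n} lambda(d) = [n is a square], i.e. zeta(s) * zeta(2s)/zeta(s) = zeta(2s)
        assert sum(map(liouville, divisors)) == (1 if math.isqrt(n) ** 2 == n else 0)
    return True


print(check_convolution_identities(500))
```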
Given the appearance of the von Mangoldt function , it is natural to reformulate the prime number theorem in terms of this function:
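Theorem 6 (Prime number theorem, von Mangoldt form) One has
$$\lim_{x \to \infty} \frac{1}{x} \sum_{n \leq x} \Lambda(n) = 1$$
(or in asymptotic notation, $\sum_{n \leq x} \Lambda(n) = x + o(x)$ as $x \to \infty$).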
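Let us see how Theorem 6 implies Theorem 1. Firstly, for any $x \geq 2$, we can write
$$\sum_{n \leq x} \Lambda(n) = \sum_{p \leq x} \log p + \sum_{j \geq 2} \sum_{p \leq x^{1/j}} \log p.$$
The inner sums on the right vanish unless $j \leq \log_2 x$, and each is at most $x^{1/2} \log x$, so the second term is $O(x^{1/2} \log^2 x) = o(x)$. Thus Theorem 6 gives $\sum_{p \leq x} \log p = x + o(x)$. Since $\log p \leq \log x$ for $p \leq x$, this gives $\pi(x) \geq \frac{x + o(x)}{\log x}$. Conversely, for any $0 < \varepsilon < 1$, the primes with $x^{1-\varepsilon} < p \leq x$ obey $\log p > (1-\varepsilon)\log x$, so that $\pi(x) \leq x^{1-\varepsilon} + \frac{x + o(x)}{(1-\varepsilon) \log x}$. Sending $x \to \infty$ and then $\varepsilon \to 0$ gives Theorem 1.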
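A quick numerical illustration of Theorems 1 and 6 can be made with a single sieve. The sketch computes $\psi(x) := \sum_{n \leq x} \Lambda(n)$ and $\pi(x)$, then prints $\psi(x)/x$ and $\pi(x)/(x/\log x)$ for a few values of $x$. The name `chebyshev_psi_and_prime_count` and the sample points are illustrative assumptions.

```python
import math


def chebyshev_psi_and_prime_count(x):
    """Return (psi(x), pi(x)), where psi(x) = sum_{n <= x} Lambda(n), for an integer x >= 2."""
    is_prime = bytearray([1]) * (x + 1)
    psi, count = 0.0, 0
    for p in range(2, x + 1):
        if not is_prime[p]:
            continue
        is_prime[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
        count += 1
        power = p
        while power <= x:  # each prime power p^j <= x contributes Lambda(p^j) = log p
            psi += math.log(p)
            power *= p
    return psi, count


for x in (10**3, 10**4, 10**5, 10**6):
    psi, count = chebyshev_psi_and_prime_count(x)
    print(x, psi / x, count / (x / math.log(x)))
```

The first ratio is already within about $10^{-3}$ of $1$ at $x = 10^5$. The second ratio stays noticeably above $1$ (around $1.08$ at $x = 10^6$), consistent with the earlier remark that the logarithmic integral is the more precise approximation to $\pi(x)$.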
Exercise 7 Show that Theorem 1 conversely implies Theorem 6.
The alternate form (8) of the Euler product identity connects the primes (represented here via proxy by the von Mangoldt function) with the logarithmic derivative of the zeta function, and can be used as a starting point for describing further relationships between and the primes. Most famously, we shall see later in these notes that it leads to the remarkably precise Riemann-von Mangoldt explicit formula:
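Theorem 8 (Riemann-von Mangoldt explicit formula) For any non-integer $x > 1$, we have
$$\sum_{n \leq x} \Lambda(n) = x - \lim_{T \to \infty} \sum_{\rho: |\mathrm{Im}(\rho)| \leq T} \frac{x^\rho}{\rho} - \log(2\pi) - \frac{1}{2} \log(1 - x^{-2})$$
where $\rho$ ranges over the non-trivial zeroes of $\zeta$ with imaginary part in $[-T, T]$. Furthermore, the convergence of the limit is locally uniform in $x$.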
Actually, it turns out that this formula is in some sense too precise; in applications it is often more convenient to work with smoothed variants of this formula in which the sum on the left-hand side is smoothed out and the contribution of zeroes with large imaginary part is damped; see Exercise 22. Nevertheless, this formula clearly illustrates how the non-trivial zeroes of the zeta function influence the primes. Indeed, if one formally differentiates the above formula in $x$, one is led to the (quite nonrigorous) approximation
$$\Lambda(n) \approx 1 - \sum_\rho n^{\rho - 1}. \quad (9)$$
Comparing Theorem 8 with Theorem 6, it is natural to suspect that the key step in the proof of the latter is to establish the following slight but important extension of Theorem 3(ii), which can be viewed as a very small step towards the Riemann hypothesis:
Theorem 9 (Slight enlargement of zero-free region) There are no zeroes of $\zeta$ on the line $\{s: \mathrm{Re}(s) = 1\}$.
It is not quite immediate to see how Theorem 6 follows from Theorem 8 and Theorem 9, but we will demonstrate it below the fold.
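Although Theorem 9 only seems like a slight improvement of Theorem 3(ii), proving it is surprisingly non-trivial. The basic idea is the following: if there were a zero at $1 + it$, then there would also be a different zero at $1 - it$ (note $t$ cannot vanish due to the pole at $s = 1$), and then the approximation (9) becomes
$$\Lambda(n) \approx 1 - n^{it} - n^{-it} + \dots = 1 - 2\cos(t \log n) + \dots,$$
which suggests (ignoring the other zeroes) that $\Lambda(n)$ should be negative for a significant fraction of $n$, namely those with $\cos(t \log n) > \frac{1}{2}$, contradicting the non-negativity of $\Lambda$.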
In fact, Theorem 9 is basically equivalent to the prime number theorem:
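Exercise 10 For the purposes of this exercise, assume Theorem 6, but do not assume Theorem 9. For any non-zero real $t$, show that
$$\sum_{n \leq x} \Lambda(n) n^{-it} = \frac{x^{1-it}}{1-it} + o(x)$$
as $x \to \infty$, where $o(x)$ denotes a quantity that goes to zero as $x \to \infty$ after being multiplied by $\frac{1}{x}$. Use this to derive Theorem 9.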
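The asymptotic in Exercise 10 can be observed numerically. The following sketch computes $\frac{1}{x}\sum_{n \leq x} \Lambda(n) n^{-it}$ and its distance from the predicted main term $\frac{x^{-it}}{1-it}$. This only illustrates the statement and is not a substitute for the exercise. The names `twisted_psi` and `normalised_discrepancy`, and the choices $x = 10^5$, $t = 1, 3$, are illustrative assumptions.

```python
import cmath
import math


def twisted_psi(x, t):
    """Return sum_{n <= x} Lambda(n) n^{-it} for an integer x >= 2 and real t."""
    is_prime = bytearray([1]) * (x + 1)
    total = 0j
    for p in range(2, x + 1):
        if not is_prime[p]:
            continue
        is_prime[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
        power = p
        while power <= x:  # Lambda(p^j) = log p, twisted by (p^j)^{-it}
            total += math.log(p) * cmath.exp(-1j * t * math.log(power))
            power *= p
    return total


def normalised_discrepancy(x, t):
    """|(1/x) sum_{n<=x} Lambda(n) n^{-it} - x^{-it}/(1-it)|, which Exercise 10 predicts is o(1)."""
    main_term = cmath.exp(-1j * t * math.log(x)) / (1 - 1j * t)
    return abs(twisted_psi(x, t) / x - main_term)


for t in (1.0, 3.0):
    print(t, normalised_discrepancy(10**5, t))
```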
This equivalence can help explain why the prime number theorem is remarkably non-trivial to prove, and why the Riemann zeta function has to be either explicitly or implicitly involved in the proof.
This post is only intended as the briefest of introductions to complex-analytic methods in analytic number theory; also, we have not chosen the shortest route to the prime number theorem, electing instead to travel in directions that particularly showcase the complex-analytic results introduced in this course. For some further discussion see this previous set of lecture notes, particularly Notes 2 and Supplement 3 (with much of the material in this post drawn from the latter).
In a recent post I discussed how the Riemann zeta function $\zeta$ can be locally approximated by a polynomial, in the sense that for randomly chosen $t \in [T, 2T]$ one has an approximation
$$\zeta\left(\frac{1}{2} + it - \frac{2\pi i z}{\log T}\right) \approx P_t\left(e^{2\pi i z/N}\right) \quad (1)$$
where $N$ grows slowly with $T$, and $P_t$ is a polynomial of degree $N$. Assuming the Riemann hypothesis (as we will throughout this post), the zeroes of $P_t$ should all lie on the unit circle, and one should then be able to write $P_t$ as a scalar multiple of the characteristic polynomial of (the inverse of) a unitary matrix $U = U_t \in U(N)$, which we normalise as
$$P_t(Z) = \exp(A_t) \det(1 - ZU). \quad (2)$$
Here $A_t$ is some quantity depending on $t$. We view $U$ as a random element of $U(N)$; in the limit $T \to \infty$, the GUE hypothesis is equivalent to $U$ becoming equidistributed with respect to Haar measure on $U(N)$ (also known as the Circular Unitary Ensemble, CUE; it is to the unit circle what the Gaussian Unitary Ensemble (GUE) is on the real line). One can also view $U$ as analogous to the “geometric Frobenius” operator in the function field setting, though unfortunately it is difficult at present to make this analogy any more precise (due, among other things, to the lack of a sufficiently satisfactory theory of the “field of one element”).
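Taking logarithmic derivatives of (2), we have
$$-Z \frac{P_t'(Z)}{P_t(Z)} = \sum_{j=1}^\infty Z^j\, \mathrm{tr}\, U^j \quad (3)$$
for $|Z| < 1$, and hence on taking logarithmic derivatives of (1) in the variable $z$ we (heuristically) have
$$\frac{1}{\log T} \frac{\zeta'}{\zeta}\left(\frac{1}{2} + it - \frac{2\pi i z}{\log T}\right) \approx \frac{1}{N} \sum_{j=1}^\infty e^{2\pi i j z/N}\, \mathrm{tr}\, U^j.$$
Morally speaking, we have
$$\frac{\zeta'}{\zeta}\left(\frac{1}{2} + it - \frac{2\pi i z}{\log T}\right) = -\sum_{n=1}^\infty \frac{\Lambda(n)}{n^{1/2+it}} e^{2\pi i z (\log n)/\log T}$$
so on comparing coefficients we expect to interpret the moments $\mathrm{tr}\, U^j$ of $U$ as a finite Dirichlet series:
$$\mathrm{tr}\, U^j \approx -\frac{N}{\log T} \sum_{T^{(j-1)/N} < n \leq T^{j/N}} \frac{\Lambda(n)}{n^{1/2+it}}. \quad (4)$$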
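To understand the distribution of $U$ in the unitary group $U(N)$, it suffices to understand the distribution of the moments
$$\mathbb{E}_t \prod_{j=1}^k (\mathrm{tr}\, U^j)^{a_j} \overline{(\mathrm{tr}\, U^j)}^{b_j} \quad (5)$$
where $\mathbb{E}_t$ denotes averaging over $t \in [T, 2T]$, and $k, a_1, \dots, a_k, b_1, \dots, b_k \geq 0$. The GUE hypothesis asserts that in the limit $T \to \infty$, these moments converge to their CUE counterparts
$$\mathbb{E}_{\mathrm{CUE}} \prod_{j=1}^k (\mathrm{tr}\, U^j)^{a_j} \overline{(\mathrm{tr}\, U^j)}^{b_j} \quad (6)$$
where $U$ is now drawn uniformly in $U(N)$ with respect to the CUE ensemble, and $\mathbb{E}_{\mathrm{CUE}}$ denotes expectation with respect to that measure.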
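The moment (6) vanishes unless one has the homogeneity condition
$$\sum_{j=1}^k j a_j = \sum_{j=1}^k j b_j. \quad (7)$$
This follows from the fact that for any phase $\theta \in \mathbb{R}/\mathbb{Z}$, $e(\theta) U$ has the same distribution as $U$, where we use the number theory notation $e(\theta) := e^{2\pi i \theta}$; indeed, replacing $U$ by $e(\theta)U$ multiplies the quantity inside the expectation (6) by $e\left(\theta\left(\sum_j j a_j - \sum_j j b_j\right)\right)$.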
In the case when the degree is low, we can use representation theory to establish the following simple formula for the moment (6), as evaluated by Diaconis and Shahshahani:
Proposition 1 (Low moments in CUE model) If
$$\sum_{j=1}^k j a_j \leq N \quad (8)$$
then the moment (6) vanishes unless $a_j = b_j$ for all $j$, in which case it is equal to
$$\prod_{j=1}^k j^{a_j} a_j!.$$
Another way of viewing this proposition is that for $U$ distributed according to CUE, the random variables $\mathrm{tr}\, U^j$ are distributed like independent complex random variables of mean zero and variance $j$, as long as one only considers moments obeying (8). This identity definitely breaks down for larger values of $a_j$, so one only obtains central limit theorems in certain limiting regimes, notably when one only considers a fixed number of $j$’s and lets $N$
go to infinity. (The paper of Diaconis and Shahshahani writes
in place of
, but I believe this to be a typo.)
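Proof: Let $n$ be the left-hand side of (8). We may assume that (7) holds since we are done otherwise, hence
$$n = \sum_{j=1}^k j a_j = \sum_{j=1}^k j b_j.$$
Our starting point is Schur-Weyl duality. Namely, we consider the $N^n$-dimensional complex vector space
$$(\mathbb{C}^N)^{\otimes n} = \mathbb{C}^N \otimes \dots \otimes \mathbb{C}^N.$$
This space has an action of the product group $S_n \times GL_N(\mathbb{C})$: the symmetric group $S_n$ acts by permutation on the $n$ tensor factors, while the general linear group $GL_N(\mathbb{C})$ acts diagonally on the $\mathbb{C}^N$ factors, and the two actions commute with each other. Schur-Weyl duality gives a decomposition
$$(\mathbb{C}^N)^{\otimes n} \cong \bigoplus_\lambda V^\lambda \otimes W^\lambda \quad (10)$$
where $\lambda$ ranges over Young tableaux of size $n$ with at most $N$ rows, $V^\lambda$ is the $S_n$-irreducible unitary representation corresponding to $\lambda$ (which can be constructed for instance using Specht modules), and $W^\lambda$ is the $GL_N(\mathbb{C})$-irreducible polynomial representation corresponding with highest weight $\lambda$.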
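Let $\pi \in S_n$ be a permutation consisting of $a_j$ cycles of length $j$ for each $j = 1, \dots, k$ (this is uniquely determined up to conjugation), and let $g \in GL_N(\mathbb{C})$. The pair $(\pi, g)$ then acts on $(\mathbb{C}^N)^{\otimes n}$, with the action on basis elements $e_{i_1} \otimes \dots \otimes e_{i_n}$ given by
$$g e_{i_{\pi(1)}} \otimes \dots \otimes g e_{i_{\pi(n)}}.$$
The trace of this action can then be computed as
$$\sum_{i_1, \dots, i_n \in \{1, \dots, N\}} g_{i_1, i_{\pi(1)}} \cdots g_{i_n, i_{\pi(n)}}$$
where $g_{i,j}$ is the $ij$ matrix coefficient of $g$. Breaking up into cycles and summing, this is just
$$\prod_{j=1}^k \mathrm{tr}(g^j)^{a_j}.$$
But we can also compute this trace using the Schur-Weyl decomposition (10), yielding the identity
$$\prod_{j=1}^k \mathrm{tr}(g^j)^{a_j} = \sum_\lambda \chi_\lambda(\pi) s_\lambda(g) \quad (11)$$
where $\chi_\lambda$ is the character on $S_n$ associated to $V^\lambda$, and $s_\lambda$ is the character on $GL_N(\mathbb{C})$ associated to $W^\lambda$. As is well known, $s_\lambda(g)$ is just the Schur polynomial of weight $\lambda$ applied to the (algebraic, generalised) eigenvalues of $g$. We can specialise to unitary matrices to conclude that
$$\prod_{j=1}^k \mathrm{tr}(U^j)^{a_j} = \sum_\lambda \chi_\lambda(\pi) s_\lambda(U)$$
and similarly
$$\prod_{j=1}^k \mathrm{tr}(U^j)^{b_j} = \sum_\lambda \chi_\lambda(\pi') s_\lambda(U)$$
where $\pi' \in S_n$ consists of $b_j$ cycles of length $j$ for each $j = 1, \dots, k$. On the other hand, the characters $s_\lambda(U)$ are an orthonormal system on $L^2(U(N))$ with the CUE measure. Thus we can write the expectation (6) as
$$\sum_\lambda \chi_\lambda(\pi) \overline{\chi_\lambda(\pi')}. \quad (12)$$
Now recall that $\lambda$ ranges over all the Young tableaux of size $n$ with at most $N$ rows. But by (8) we have $n \leq N$, and so the condition of having at most $N$ rows is redundant. Hence $\lambda$ now ranges over all Young tableaux of size $n$, which as is well known enumerates all the irreducible representations of $S_n$. One can then use the standard orthogonality properties of characters to show that the sum (12) vanishes if $\pi$, $\pi'$ are not conjugate, and is equal to $n!$ divided by the size of the conjugacy class of $\pi$ (or equivalently, to the size of the centraliser of $\pi$) otherwise. But the latter expression is easily computed to be $\prod_{j=1}^k j^{a_j} a_j!$, giving the claim.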
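The last step of the proof uses a standard count: a permutation with $a_j$ cycles of length $j$ has a centraliser in $S_n$ of order $\prod_j j^{a_j} a_j!$. The sketch below confirms this by brute force for all permutations of up to five points. The helper names are hypothetical, and brute force is only feasible for very small $n$.

```python
import itertools
import math
from collections import Counter


def cycle_type(perm):
    """Return a Counter {cycle length j: number a_j of j-cycles} for a permutation of range(n)."""
    n = len(perm)
    seen = [False] * n
    lengths = Counter()
    for start in range(n):
        if seen[start]:
            continue
        length, i = 0, start
        while not seen[i]:
            seen[i] = True
            i = perm[i]
            length += 1
        lengths[length] += 1
    return lengths


def centraliser_size(perm):
    """Count sigma in S_n commuting with perm, i.e. sigma(perm(i)) == perm(sigma(i)) for all i."""
    n = len(perm)
    return sum(
        1
        for sigma in itertools.permutations(range(n))
        if all(sigma[perm[i]] == perm[sigma[i]] for i in range(n))
    )


def centraliser_formula(perm):
    """prod_j j^{a_j} a_j!, the value of (12) when pi' is conjugate to pi."""
    return math.prod(j ** a * math.factorial(a) for j, a in cycle_type(perm).items())


print(centraliser_size((1, 2, 0, 4, 3)), centraliser_formula((1, 2, 0, 4, 3)))  # a 3-cycle times a 2-cycle
```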
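Example 2 We illustrate the identity (11) when $n = 3$, $N \geq 3$. The Schur polynomials are given as
$$s_{(3)}(g) = \sum_i x_i^3 + \sum_{i<j} (x_i^2 x_j + x_i x_j^2) + \sum_{i<j<k} x_i x_j x_k$$
$$s_{(2,1)}(g) = \sum_{i<j} (x_i^2 x_j + x_i x_j^2) + 2 \sum_{i<j<k} x_i x_j x_k$$
$$s_{(1,1,1)}(g) = \sum_{i<j<k} x_i x_j x_k$$
where $x_1, \dots, x_N$ are the (generalised) eigenvalues of $g$, and the formula (11) in this case becomes
$$\mathrm{tr}(g^3) = s_{(3)}(g) - s_{(2,1)}(g) + s_{(1,1,1)}(g)$$
$$\mathrm{tr}(g^2)\,\mathrm{tr}(g) = s_{(3)}(g) - s_{(1,1,1)}(g)$$
$$\mathrm{tr}(g)^3 = s_{(3)}(g) + 2 s_{(2,1)}(g) + s_{(1,1,1)}(g).$$
The functions $s_{(3)}, s_{(2,1)}, s_{(1,1,1)}$ are orthonormal on $U(N)$, so the three functions $\mathrm{tr}(U^3)$, $\mathrm{tr}(U^2)\mathrm{tr}(U)$, $\mathrm{tr}(U)^3$ are orthogonal, and their $L^2$ norms are $\sqrt{3}$, $\sqrt{2}$, and $\sqrt{6}$ respectively, reflecting (after squaring) the sizes in $S_3$ of the centralisers of the permutations $(1\,2\,3)$, $(1\,2)$, and $\mathrm{id}$ respectively. If $N$ is instead set to say $2$, then the $s_{(1,1,1)}$ terms now disappear (the Young tableau here has too many rows), and the three quantities here now have some non-trivial covariance.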
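The three instances of (11) in Example 2 are polynomial identities in the eigenvalues. They can therefore be spot-checked at random complex points, as in the sketch below. The function names are hypothetical. The character values in the comments are the columns of the character table of $S_3$ in the order $\chi_{(3)}, \chi_{(2,1)}, \chi_{(1,1,1)}$.

```python
import itertools
import random


def power_sum(xs, k):
    """p_k = sum_i x_i^k; for eigenvalues x_i of g this is tr(g^k)."""
    return sum(x ** k for x in xs)


def schur_3(xs):
    """s_(3) = complete homogeneous polynomial h_3 = sum_{i <= j <= k} x_i x_j x_k."""
    return sum(xs[i] * xs[j] * xs[k]
               for i, j, k in itertools.combinations_with_replacement(range(len(xs)), 3))


def schur_111(xs):
    """s_(1,1,1) = elementary symmetric polynomial e_3 = sum_{i<j<k} x_i x_j x_k."""
    return sum(xs[i] * xs[j] * xs[k] for i, j, k in itertools.combinations(range(len(xs)), 3))


def schur_21(xs):
    """s_(2,1) = sum_{i<j} (x_i^2 x_j + x_i x_j^2) + 2 sum_{i<j<k} x_i x_j x_k."""
    pairs = sum(xs[i] ** 2 * xs[j] + xs[i] * xs[j] ** 2
                for i, j in itertools.combinations(range(len(xs)), 2))
    return pairs + 2 * schur_111(xs)


def s3_identity_errors(xs):
    """Absolute errors in the three instances of (11) listed in Example 2."""
    p1, p2, p3 = (power_sum(xs, k) for k in (1, 2, 3))
    s3, s21, s111 = schur_3(xs), schur_21(xs), schur_111(xs)
    return [
        abs(p3 - (s3 - s21 + s111)),           # pi = 3-cycle: characters (1, -1, 1)
        abs(p2 * p1 - (s3 - s111)),            # pi = transposition: characters (1, 0, -1)
        abs(p1 ** 3 - (s3 + 2 * s21 + s111)),  # pi = identity: characters (1, 2, 1)
    ]


rng = random.Random(0)
eigenvalues = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)]
print(s3_identity_errors(eigenvalues))  # all three errors should be at rounding level
```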
Example 3 Consider the moment $\mathbb{E}_{\mathrm{CUE}} |\mathrm{tr}\, U^j|^2$. For $j \leq N$, the above proposition shows us that this moment is equal to $j$. What happens for $j > N$? The formula (12) computes this moment as
$$\sum_\lambda |\chi_\lambda(\pi)|^2$$
where $\pi$ is a cycle of length $j$ in $S_j$, and $\lambda$ ranges over all Young tableaux with size $j$ and at most $N$ rows. The Murnaghan-Nakayama rule tells us that $\chi_\lambda(\pi)$ vanishes unless $\lambda$ is a hook (all but one of the non-zero rows consisting of just a single box; this also can be interpreted as an exterior power representation on the space $\mathbb{C}^j_0$ of vectors in $\mathbb{C}^j$ whose coordinates sum to zero), in which case it is equal to $\pm 1$ (depending on the parity of the number of non-zero rows). As such we see that this moment is equal to $N$. Thus in general we have
$$\mathbb{E}_{\mathrm{CUE}} |\mathrm{tr}\, U^j|^2 = \min(j, N). \quad (13)$$
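Formula (13) can be checked exactly, with no random sampling, by using the Weyl density $\frac{1}{N!}|V(\theta)|^2$ that reappears below as (20). The trigonometric polynomial $|\mathrm{tr}\, U^j|^2 |V(\theta)|^2$ has frequencies whose coordinates have magnitude at most $j + N - 1$. Its average over the grid $(\frac{1}{M}\mathbb{Z}/\mathbb{Z})^N$ with $M = j + N$ therefore equals its integral over the torus. The sketch below uses this observation. The function name is a hypothetical helper, and the cost grows like $M^N$, so only small $N$ are practical.

```python
import cmath
import itertools
import math


def cue_trace_moment(N, j):
    """Exact E_CUE |tr U^j|^2, averaging |tr U^j|^2 |V|^2 / N! over the grid ((1/M)Z/Z)^N."""
    M = j + N  # exceeds every frequency |k| <= j + N - 1 occurring in |tr U^j|^2 |V|^2
    roots = [cmath.rect(1.0, 2 * math.pi * k / M) for k in range(M)]
    total = 0.0
    for idx in itertools.product(range(M), repeat=N):
        v_sq = 1.0
        for a, b in itertools.combinations(idx, 2):
            v_sq *= abs(roots[a] - roots[b]) ** 2
        trace = sum(roots[(j * k) % M] for k in idx)  # e(j theta_k) is again an M-th root of unity
        total += abs(trace) ** 2 * v_sq
    return total / (M ** N * math.factorial(N))


print([round(cue_trace_moment(3, j), 6) for j in range(1, 7)])  # expected min(j, 3) from (13)
```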
Now we discuss what is known for the analogous moments (5). Here we shall be rather non-rigorous, in particular ignoring an annoying “Archimedean” issue that the product of the ranges $T^{(j-1)/N} < n \leq T^{j/N}$ and $T^{(j'-1)/N} < n' \leq T^{j'/N}$ is not quite the range $T^{(j+j'-1)/N} < nn' \leq T^{(j+j')/N}$ but instead leaks into the adjacent range $T^{(j+j'-2)/N} < nn' \leq T^{(j+j'-1)/N}$. This issue can be addressed by working in a “weak” sense in which parameters such as $j, j'$ are averaged over fairly long scales, or by passing to a function field analogue of these questions, but we shall simply ignore the issue completely and work at a heuristic level only. For similar reasons we will ignore some technical issues arising from the sharp cutoff of $t$ to the range $[T, 2T]$ (it would be slightly better technically to use a smooth cutoff).
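One can morally expand out (5) using (4) as
$$\left(-\frac{N}{\log T}\right)^{A+B} \sum_{n_1, \dots, n_A, m_1, \dots, m_B} \frac{\Lambda(n_1) \cdots \Lambda(n_A) \Lambda(m_1) \cdots \Lambda(m_B)}{(n_1 \cdots n_A m_1 \cdots m_B)^{1/2}}\, \mathbb{E}_t \left(\frac{m_1 \cdots m_B}{n_1 \cdots n_A}\right)^{it} \quad (14)$$
where $A := \sum_{j=1}^k a_j$, $B := \sum_{j=1}^k b_j$, and the integers $n_i, m_i$ are in the ranges
$$T^{(j-1)/N} < n_{a_1 + \dots + a_{j-1} + i} \leq T^{j/N}$$
for $1 \leq j \leq k$ and $1 \leq i \leq a_j$, and
$$T^{(j-1)/N} < m_{b_1 + \dots + b_{j-1} + i} \leq T^{j/N}$$
for $1 \leq j \leq k$ and $1 \leq i \leq b_j$. Morally, the expectation here is negligible unless
$$\frac{m_1 \cdots m_B}{n_1 \cdots n_A} = 1 + O\left(\frac{1}{T}\right) \quad (15)$$
in which case the expectation oscillates with magnitude one. In particular, if (7) fails (with some room to spare) then the moment (5) should be negligible, which is consistent with the analogous behaviour for the moments (6). Now suppose that (8) holds (with some room to spare). Then $n_1 \cdots n_A$ is significantly less than $T$, so the $O(\frac{1}{T})$ multiplicative error in (15) becomes an additive error of $o(1)$. On the other hand, because of the fundamental integrality gap (the integers are always separated from each other by a distance of at least $1$), this forces the integers $m_1 \cdots m_B$, $n_1 \cdots n_A$ to in fact be equal:
$$m_1 \cdots m_B = n_1 \cdots n_A. \quad (16)$$
The von Mangoldt factors effectively restrict $n_1, \dots, n_A, m_1, \dots, m_B$ to be prime (the effect of prime powers is negligible). By the fundamental theorem of arithmetic, the constraint (16) then forces $A = B$, and $m_1, \dots, m_B$ to be a permutation of $n_1, \dots, n_A$, which then forces $a_j = b_j$ for all $j = 1, \dots, k$. For a given $n_1, \dots, n_A$, the number of possible $m_1, \dots, m_B$ is then $a_1! \cdots a_k!$, and the expectation in (14) is equal to $1$. Thus the moment (5) is morally
$$\left(\frac{N}{\log T}\right)^{2A} a_1! \cdots a_k! \prod_{j=1}^k \left(\sum_{T^{(j-1)/N} < n \leq T^{j/N}} \frac{\Lambda^2(n)}{n}\right)^{a_j}$$
and using Mertens’ theorem this soon simplifies asymptotically to the same quantity in Proposition 1. Thus we see that (morally at least) the moments (5) associated to the zeta function asymptotically match the moments (6) coming from the CUE model in the low degree case (8), thus lending support to the GUE hypothesis. (These observations are basically due to Rudnick and Sarnak, with the degree $2$ case of pair correlations due to Montgomery, and the degree $3$ case due to Hejhal.)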
With some rare exceptions (such as those estimates coming from “Kloostermania”), the moment estimates of Rudnick and Sarnak basically represent the state of the art for what is known for the moments (5). For instance, Montgomery’s pair correlation conjecture, in our language, is basically the analogue of (13) for the moments (5), thus
$$\mathbb{E}_t |\mathrm{tr}\, U^j|^2 \approx \min(j, N) \quad (17)$$
for all $j \geq 1$. Montgomery showed this for (essentially) the range $j \leq N$ (as remarked above, this is a special case of the Rudnick-Sarnak result), but no further cases of this conjecture are known.
These estimates can be used to give some non-trivial information on the largest and smallest spacings between zeroes of the zeta function, which in our notation corresponds to spacing between eigenvalues of $U$. One such method used today for this is due to Montgomery and Odlyzko and was greatly simplified by Conrey, Ghosh, and Gonek. The basic idea, translated to our random matrix notation, is as follows. Suppose $Q_t(Z)$ is some random polynomial depending on $t$ of degree at most $N$. Let $\lambda_1, \dots, \lambda_N$ denote the eigenvalues of $U$, and let $c > 0$ be a parameter. Observe from the pigeonhole principle that if the quantity
$$\sum_{n=1}^N \int_0^{c/N} |Q_t(e(\theta) \lambda_n)|^2\ d\theta \quad (18)$$
exceeds the quantity
$$\int_0^1 |Q_t(e(\theta))|^2\ d\theta, \quad (19)$$
then the arcs $\{e(\theta) \lambda_n: 0 \leq \theta \leq c/N\}$ cannot all be disjoint, and hence there exists a pair of eigenvalues making an angle of less than $2\pi c/N$ ($c$ times the mean angle separation). Similarly, if the quantity (18) falls below that of (19), then these arcs cannot cover the unit circle, and hence there exists a pair of eigenvalues making an angle of greater than $c$ times the mean angle separation. By judiciously choosing the coefficients of $Q_t$ as functions of the moments $\mathrm{tr}\, U^j$, one can ensure that both quantities (18), (19) can be computed by the Rudnick-Sarnak estimates (or estimates of equivalent strength); indeed, from the residue theorem one can write (18) as
$$\int_0^{c/N} \frac{1}{2\pi i} \left( \oint_{|w| = 1+\varepsilon} - \oint_{|w| = 1-\varepsilon} \right) Q_t\left(\frac{e(\theta)}{w}\right) \overline{Q_t}\left(e(-\theta) w\right) \frac{P_t'(w)}{P_t(w)}\ dw\, d\theta,$$
where $\overline{Q_t}(w) := \overline{Q_t(\overline{w})}$, for sufficiently small $\varepsilon > 0$, and this can be computed (in principle, at least) using (3) if the coefficients of $Q_t$ are in an appropriate form. Using this sort of technology (translated back to the Riemann zeta function setting), one can show that gaps between consecutive zeroes of zeta are less than $\mu$ times the mean spacing and greater than $\nu$ times the mean spacing infinitely often for certain $0 < \mu < 1 < \nu$
; the current records are
(due to Goldston and Turnage-Butterbaugh) and
(due to Bui and Milinovich, who input some additional estimates beyond the Rudnick-Sarnak set, namely the twisted fourth moment estimates of Bettin, Bui, Li, and Radziwill, and using a technique based on Hall’s method rather than the Montgomery-Odlyzko method).
It would be of great interest if one could push the upper bound for the smallest gap below $\frac{1}{2}$. The reason for this is that this would then exclude the Alternative Hypothesis that the spacings between zeroes are asymptotically always (or almost always) a non-zero half-integer multiple of the mean spacing, or in our language that the gaps between the phases $\theta$ of the eigenvalues $e(\theta)$ of $U$ are asymptotically always non-zero integer multiples of $\frac{1}{2N}$. The significance of this hypothesis is that it is implied by the existence of a Siegel zero (of conductor a small power of $T$
); see this paper of Conrey and Iwaniec. (In our language, what is going on is that if there is a Siegel zero in which
is very close to zero, then
behaves like the Kronecker delta, and hence (by the Riemann-Siegel formula) the combined
-function
will have a polynomial approximation which in our language looks like a scalar multiple of
, where
and
is a phase. The zeroes of this approximation lie on a coset of the
roots of unity; the polynomial
is a factor of this approximation and hence will also lie in this coset, implying in particular that all eigenvalue spacings are multiples of
. Taking
then gives the claim.)
Unfortunately, the known methods do not seem to break this barrier without some significant new input; already the original paper of Montgomery and Odlyzko observed this limitation for their particular technique (and in fact the technique falls very slightly short, as observed in unpublished work of Goldston and of Milinovich). In this post I would like to record another way to see this, by providing an “alternative” probability distribution to the CUE distribution (which one might dub the Alternative Circular Unitary Ensemble (ACUE)) which is indistinguishable in low moments in the sense that the expectation for this model also obeys Proposition 1, but for which the phase spacings are always a multiple of $\frac{1}{2N}$. This shows that if one is to rule out the Alternative Hypothesis (and thus in particular rule out Siegel zeroes), one needs to input some additional moment information beyond Proposition 1. It would be interesting to see if any of the other known moment estimates that go beyond this proposition are consistent with this alternative distribution. (UPDATE: it looks like they are, see Remark 7 below.)
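To describe this alternative distribution, let us first recall the Weyl description of the CUE measure on the unitary group $U(N)$ in terms of the distribution of the phases $\theta_1, \dots, \theta_N \in \mathbb{R}/\mathbb{Z}$ of the eigenvalues, randomly permuted in any order. This distribution is given by the probability measure
$$\frac{1}{N!} |V(\theta)|^2\ d\theta_1 \cdots d\theta_N \quad (20)$$
where
$$V(\theta) := \prod_{1 \leq i < j \leq N} \left(e(\theta_i) - e(\theta_j)\right)$$
is the Vandermonde determinant; see for instance this previous blog post for the derivation of a very similar formula for the GUE distribution, which can be adapted to CUE without much difficulty. To see that this is a probability measure, first observe the Vandermonde determinant identity
$$V(\theta) = \sum_{\pi \in S_N} \mathrm{sgn}(w_0 \pi)\, e(\theta \cdot \pi(\rho))$$
where $\rho := (0, 1, \dots, N-1)$, $\pi(\rho) := (\rho_{\pi(1)}, \dots, \rho_{\pi(N)})$, $\cdot$ denotes the dot product, and $w_0$ is the “long word” (the permutation $i \mapsto N+1-i$), which implies that (20) is a trigonometric series with constant term $1$; it is also clearly non-negative, so it is a probability measure. One can thus generate a random CUE matrix by first drawing $(\theta_1, \dots, \theta_N) \in (\mathbb{R}/\mathbb{Z})^N$ using the probability measure (20), and then generating $U$ to be a random unitary matrix with eigenvalues $e(\theta_1), \dots, e(\theta_N)$.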
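The Vandermonde identity above, including the sign $\mathrm{sgn}(w_0) = (-1)^{N(N-1)/2}$ contributed by the long word, can be spot-checked numerically at random phases. In the sketch below, `e`, `sign`, `vandermonde` and `vandermonde_expansion` are hypothetical helper names. The convention $\pi(\rho) = (\rho_{\pi(1)}, \dots, \rho_{\pi(N)})$ matches the text.

```python
import cmath
import itertools
import math
import random


def e(x):
    """The number-theoretic exponential e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * math.pi * x)


def sign(perm):
    """Sign of a permutation of range(n), computed from its number of inversions."""
    inversions = sum(1 for a, b in itertools.combinations(perm, 2) if a > b)
    return -1 if inversions % 2 else 1


def vandermonde(theta):
    """V(theta) = prod_{i<j} (e(theta_i) - e(theta_j))."""
    value = 1 + 0j
    for a, b in itertools.combinations(theta, 2):
        value *= e(a) - e(b)
    return value


def vandermonde_expansion(theta):
    """sum over pi in S_N of sgn(w_0 pi) e(theta . pi(rho)), with rho = (0, 1, ..., N-1)."""
    n = len(theta)
    long_word_sign = (-1) ** (n * (n - 1) // 2)  # sgn(w_0), where w_0 reverses the order
    total = 0j
    for perm in itertools.permutations(range(n)):  # perm[i] plays the role of rho_{pi(i)}
        total += sign(perm) * e(sum(t * r for t, r in zip(theta, perm)))
    return long_word_sign * total


theta = [random.Random(1).random() for _ in range(4)]
print(abs(vandermonde(theta) - vandermonde_expansion(theta)))  # should be at rounding level
```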
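For the alternative distribution, we first draw $\theta = (\theta_1, \dots, \theta_N)$ on the discrete torus $(\frac{1}{2N}\mathbb{Z}/\mathbb{Z})^N$ (thus each $e(\theta_j)$ is a $2N^{\mathrm{th}}$ root of unity) with probability density function
$$\frac{1}{(2N)^N} \frac{1}{N!} |V(\theta)|^2 \quad (21)$$
(with respect to counting measure), shift by a phase $\alpha \in \mathbb{R}/\mathbb{Z}$ drawn uniformly at random, and then select $U$ to be a random unitary matrix with eigenvalues $e(\theta_1 + \alpha), \dots, e(\theta_N + \alpha)$. Let us first verify that (21) is a probability density function. Clearly it is non-negative. It is the linear combination of exponentials of the form $e(\theta \cdot (\pi(\rho) - \pi'(\rho)))$ for $\pi, \pi' \in S_N$. The diagonal contribution $\pi = \pi'$ gives the constant function $\frac{1}{(2N)^N}$, which has total mass one. All of the other exponentials have a frequency $\pi(\rho) - \pi'(\rho)$ that is not a multiple of $2N$ (its coordinates lie between $1 - N$ and $N - 1$ and do not all vanish), and hence will have mean zero on $(\frac{1}{2N}\mathbb{Z}/\mathbb{Z})^N$. The claim follows.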
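The same Fourier argument shows more generally that averaging $\frac{1}{N!}|V(\theta)|^2$ over the grid $(\frac{1}{M}\mathbb{Z}/\mathbb{Z})^N$ gives total mass $1$ whenever $M \geq N$. The case $M = 2N$ is the claim just proved for (21). The sketch below checks this for small $N$, together with a degenerate case $M < N$, where every grid point has a repeated phase and the mass collapses to $0$. The function name is a hypothetical helper.

```python
import cmath
import itertools
import math


def weyl_grid_mass(N, M):
    """(1 / (M^N N!)) * sum over theta in ((1/M)Z/Z)^N of |V(theta)|^2."""
    roots = [cmath.rect(1.0, 2 * math.pi * k / M) for k in range(M)]
    total = 0.0
    for idx in itertools.product(range(M), repeat=N):
        v_sq = 1.0
        for a, b in itertools.combinations(idx, 2):
            v_sq *= abs(roots[a] - roots[b]) ** 2  # vanishes if two phases coincide
        total += v_sq
    return total / (M ** N * math.factorial(N))


for N in range(1, 5):
    print(N, weyl_grid_mass(N, 2 * N))  # total mass of (21); should be 1 up to rounding
```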
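From construction it is clear that the matrix $U$ drawn from this alternative distribution will have all eigenvalue phase spacings be a non-zero multiple of $\frac{1}{2N}$. Now we verify that the alternative distribution also obeys Proposition 1. The alternative distribution remains invariant under rotation by phases, so the claim is again clear when (7) fails. Inspecting the proof of that proposition, we see that it suffices to show that the Schur polynomials $s_\lambda, s_{\lambda'}$ with $\lambda, \lambda'$ of equal size at most $N$ remain orthonormal with respect to the alternative measure. That is to say,
$$\mathbb{E}_{\mathrm{ACUE}}\, s_\lambda(U) \overline{s_{\lambda'}(U)} = \mathbb{E}_{\mathrm{CUE}}\, s_\lambda(U) \overline{s_{\lambda'}(U)}$$
when $\lambda, \lambda'$ have size equal to each other and at most $N$. In this case the phase $\alpha$ in the definition of $U$ is irrelevant. In terms of eigenvalue measures, we are then reduced to showing that
$$\frac{1}{(2N)^N} \sum_{\theta \in (\frac{1}{2N}\mathbb{Z}/\mathbb{Z})^N} s_\lambda(e(\theta)) \overline{s_{\lambda'}(e(\theta))} |V(\theta)|^2 = \int_{(\mathbb{R}/\mathbb{Z})^N} s_\lambda(e(\theta)) \overline{s_{\lambda'}(e(\theta))} |V(\theta)|^2\ d\theta,$$
where $e(\theta) := (e(\theta_1), \dots, e(\theta_N))$. By Fourier decomposition, it then suffices to show that the trigonometric polynomial $s_\lambda(e(\theta)) \overline{s_{\lambda'}(e(\theta))} |V(\theta)|^2$ does not contain any components of the form $e(\theta \cdot 2Nk)$ for some non-zero lattice vector $k \in \mathbb{Z}^N$. But we have already observed that $|V(\theta)|^2$ is a linear combination of plane waves of the form $e(\theta \cdot (\pi(\rho) - \pi'(\rho)))$ for $\pi, \pi' \in S_N$. Also, as is well known, $s_\lambda(e(\theta))$ is a linear combination of plane waves $e(\theta \cdot \kappa)$ where $\kappa$ is majorised by $\lambda$, and similarly $\overline{s_{\lambda'}(e(\theta))}$ is a linear combination of plane waves $e(-\theta \cdot \kappa')$ where $\kappa'$ is majorised by $\lambda'$. So the product $s_\lambda(e(\theta)) \overline{s_{\lambda'}(e(\theta))} |V(\theta)|^2$ is a linear combination of plane waves of the form $e(\theta \cdot (\kappa - \kappa' + \pi(\rho) - \pi'(\rho)))$. But every coefficient of the vector $\kappa - \kappa' + \pi(\rho) - \pi'(\rho)$ lies between $1 - 2N$ and $2N - 1$, and so cannot be of the form $2Nk$ for any non-zero lattice vector $k$, giving the claim.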
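Example 4 If $N = 2$, then the distribution (21) assigns a probability of $\frac{1}{16}$ to any pair $(\theta_1, \theta_2)$ that is a permuted rotation of $(0, \frac{1}{4})$, and a probability of $\frac{1}{8}$ to any pair that is a permuted rotation of $(0, \frac{1}{2})$. Thus, a matrix $U$ drawn from the alternative distribution will be conjugate to a phase rotation of $\mathrm{diag}(1, i)$ with probability $\frac{1}{2}$, and to a phase rotation of $\mathrm{diag}(1, -1)$ with probability $\frac{1}{2}$.
A similar computation when $N = 3$ gives $U$ conjugate to a phase rotation of $\mathrm{diag}(1, e(\frac{1}{6}), e(\frac{1}{3}))$ with probability $\frac{1}{12}$, to a phase rotation of $\mathrm{diag}(1, e(\frac{1}{6}), e(\frac{1}{2}))$ or its adjoint with probability $\frac{1}{3}$ each, and to a phase rotation of $\mathrm{diag}(1, e(\frac{1}{3}), e(\frac{2}{3}))$ with probability $\frac{1}{4}$.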
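The probabilities in Example 4 can be recomputed by enumerating the unordered sets of distinct $2N^{\mathrm{th}}$ roots of unity and summing the density (21) over orderings. Each set then receives mass $|V|^2/(2N)^N$. The sets are grouped by their cyclic pattern of gaps, measured in units of $\frac{1}{2N}$ and normalised to the lexicographically smallest rotation. The function name and the gap-pattern encoding are illustrative assumptions. For $N = 3$, the pattern $(1, 2, 3)$ corresponds to $\mathrm{diag}(1, e(\frac{1}{6}), e(\frac{1}{2}))$ and $(1, 3, 2)$ to its adjoint.

```python
import cmath
import itertools
import math
from collections import defaultdict


def acue_spacing_type_probabilities(N):
    """Probability under (21) of each cyclic gap pattern (in units of 1/(2N)) of the phases."""
    M = 2 * N
    roots = [cmath.rect(1.0, 2 * math.pi * k / M) for k in range(M)]
    probabilities = defaultdict(float)
    for points in itertools.combinations(range(M), N):  # unordered sets of distinct phases
        v_sq = 1.0
        for a, b in itertools.combinations(points, 2):
            v_sq *= abs(roots[a] - roots[b]) ** 2
        gaps = [points[i + 1] - points[i] for i in range(N - 1)] + [points[0] + M - points[-1]]
        pattern = min(tuple(gaps[i:] + gaps[:i]) for i in range(N))  # canonical rotation
        # N! orderings, each of mass |V|^2 / ((2N)^N N!), give total mass |V|^2 / (2N)^N
        probabilities[pattern] += v_sq / M ** N
    return dict(probabilities)


print(acue_spacing_type_probabilities(2))  # patterns (1, 3) and (2, 2), each with probability 1/2
print(acue_spacing_type_probabilities(3))
```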
Remark 5 For large $N$ it does not seem that this specific alternative distribution is the only distribution consistent with Proposition 1 and which has all phase spacings a non-zero multiple of $\frac{1}{2N}$; in particular, it may not be the only distribution consistent with a Siegel zero. Still, it is a very explicit distribution that might serve as a test case for the limitations of various arguments for controlling quantities such as the largest or smallest spacing between zeroes of zeta. The ACUE is in some sense the distribution that maximally resembles CUE (in the sense that it has the greatest number of Fourier coefficients agreeing) while still also being consistent with the Alternative Hypothesis, and so should be the most difficult enemy to eliminate if one wishes to disprove that hypothesis.
In some cases, even just a tiny improvement in known results would be able to exclude the alternative hypothesis. For instance, if the alternative hypothesis held, then $|\mathrm{tr}\, U^j|$ is periodic in $j$ with period $2N$, so from Proposition 1 for the alternative distribution one has
$$\mathbb{E}_{\mathrm{ACUE}} |\mathrm{tr}\, U^j|^2 = \mathbb{E}_{\mathrm{ACUE}} |\mathrm{tr}\, U^{2N-j}|^2 = 2N - j \quad \text{for } N \leq j < 2N,$$
which differs from (13) for any $N < j < 2N$. (This fact was implicitly observed recently by Baluyot, in the original context of the zeta function.) Thus a verification of the pair correlation conjecture (17) for even a single $j$ with $N < j < 2N$ would rule out the alternative hypothesis. Unfortunately, such a verification appears to be of comparable difficulty to (an averaged version of) the Hardy-Littlewood conjecture, with power saving error term. (This is consistent with the fact that Siegel zeroes can cause distortions in the Hardy-Littlewood conjecture, as (implicitly) discussed in this previous blog post.)
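The contrast between (13) and the ACUE prediction can be computed exactly for small $N$. For CUE, the grid average with $M = j + N$ is exact, as in the quadrature argument above. For ACUE, the measure is the discrete measure (21) itself, and the random rotation $\alpha$ does not affect $|\mathrm{tr}\, U^j|$. The helper names are hypothetical, and the computation is only practical for small $N$.

```python
import cmath
import itertools
import math


def grid_trace_moment(N, j, M):
    """(1 / (M^N N!)) * sum over theta in ((1/M)Z/Z)^N of |tr U^j|^2 |V(theta)|^2."""
    roots = [cmath.rect(1.0, 2 * math.pi * k / M) for k in range(M)]
    total = 0.0
    for idx in itertools.product(range(M), repeat=N):
        v_sq = 1.0
        for a, b in itertools.combinations(idx, 2):
            v_sq *= abs(roots[a] - roots[b]) ** 2
        trace = sum(roots[(j * k) % M] for k in idx)  # e(j theta_k) is again an M-th root
        total += abs(trace) ** 2 * v_sq
    return total / (M ** N * math.factorial(N))


def cue_moment(N, j):
    return grid_trace_moment(N, j, j + N)  # grid fine enough to integrate (20) exactly


def acue_moment(N, j):
    return grid_trace_moment(N, j, 2 * N)  # the discrete measure (21); alpha is irrelevant here


N = 3
print([round(cue_moment(N, j), 6) for j in range(1, 2 * N + 1)])   # min(j, N), as in (13)
print([round(acue_moment(N, j), 6) for j in range(1, 2 * N + 1)])  # 2N - j on N <= j < 2N
```

For $N = 3$ the two rows agree up to $j = N$ and then separate. At $j = 2N$ the ACUE value jumps to $N^2$, since $U^{2N}$ is then a scalar multiple of the identity.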
Remark 6 One can view the CUE as normalised Lebesgue measure on $U(N)$ (viewed as a smooth submanifold of $\mathbb{C}^{N^2}$). One can similarly view ACUE as normalised Lebesgue measure on the (disconnected) smooth submanifold of $U(N)$ consisting of those unitary matrices whose phase spacings are non-zero integer multiples of $\frac{1}{2N}$; informally, ACUE is CUE restricted to this lower dimensional submanifold. As is well known, the phases of CUE eigenvalues form a determinantal point process with kernel
$$K(\theta, \theta') := \sum_{j=0}^{N-1} e(j(\theta - \theta'))$$
(or one can equivalently take $K(\theta, \theta') := \frac{\sin(\pi N(\theta - \theta'))}{\sin(\pi(\theta - \theta'))}$); in a similar spirit, the phases of ACUE eigenvalues, once they are rotated to be $2N^{\mathrm{th}}$ roots of unity, become a discrete determinantal point process on those roots of unity with exactly the same kernel (except for a normalising factor of $\frac{1}{2N}$). In particular, the $k$-point correlation functions of ACUE (after this rotation) are precisely the restriction of the $k$-point correlation functions of CUE after normalisation, that is to say they are proportional to $\det(K(\theta_i, \theta_j))_{1 \leq i, j \leq k}$.
Remark 7 One family of estimates that go beyond the Rudnick-Sarnak family of estimates are twisted moment estimates for the zeta function, such as ones that give asymptotics for
for some small even exponent
(almost always
or
) and some short Dirichlet polynomial
; see for instance this paper of Bettin, Bui, Li, and Radziwill for some examples of such estimates. The analogous unitary matrix average would be something like
where
is now some random medium degree polynomial that depends on the unitary matrix
associated to
(and in applications will typically also contain some negative power of
to cancel the corresponding powers of
in
). Unfortunately such averages generally are unable to distinguish the CUE from the ACUE. For instance, if all the coefficients of
involve products of traces
of total order less than
, then in terms of the eigenvalue phases
,
is a linear combination of plane waves
where the frequencies
have coefficients of magnitude less than
. On the other hand, as each coefficient of
is an elementary symmetric function of the eigenvalues,
is a linear combination of plane waves
where the frequencies
have coefficients of magnitude at most
. Thus
is a linear combination of plane waves where the frequencies
have coefficients of magnitude less than
, and thus is orthogonal to the difference between the CUE and ACUE measures on the phase torus
by the previous arguments. In other words,
has the same expectation with respect to ACUE as it does with respect to CUE. Thus one can only start distinguishing CUE from ACUE if the mollifier
has degree close to or exceeding
, which corresponds to Dirichlet polynomials
of length close to or exceeding
, which is far beyond current technology for such moment estimates.
Remark 8 The GUE hypothesis for the zeta function asserts that the average
for any
and any test function
, where
is the Dyson sine kernel and
are the ordinates of zeroes of the zeta function. This corresponds to the CUE distribution for
. The ACUE distribution then corresponds to an “alternative gaussian unitary ensemble (AGUE)” hypothesis, in which the average (22) is instead predicted to equal a Riemann sum version of the integral (23):
This is a stronger version of the alternative hypothesis that the spacing between adjacent zeroes is almost always approximately a half-integer multiple of the mean spacing. I am not aware of any moment estimates for Dirichlet series that are able to eliminate this AGUE hypothesis (even assuming GRH). (UPDATE: These facts have also been independently observed in forthcoming work of Lagarias and Rodgers.)
A useful rule of thumb in complex analysis is that holomorphic functions $f$ behave like large degree polynomials $P$. This can be evidenced for instance at a “local” level by the Taylor series expansion for a complex analytic function in the disk, or at a “global” level by factorisation theorems such as the Weierstrass factorisation theorem (or the closely related Hadamard factorisation theorem). One can truncate these theorems in a variety of ways (e.g., Taylor’s theorem with remainder) to be able to approximate a holomorphic function by a polynomial on various domains.
In some cases it can be convenient instead to work with polynomials $P(Z)$ of another variable $Z$ such as $Z = e^{2\pi i z}$ (or more generally $Z = e^{2\pi i z/N}$ for a scaling parameter $N$). In the case of the Riemann zeta function, defined by meromorphic continuation of the formula
$$\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s},$$
one ends up having the following heuristic approximation in the neighbourhood of a point $\frac{1}{2} + it$ on the critical line:
Heuristic 1 (Polynomial approximation) Let
be a height, let
be a “typical” element of
, and let
be an integer. Let
be the linear change of variables
for
and some polynomial
of degree
.
The requirement is necessary since the right-hand side is periodic with period
in the
variable (or period
in the
variable), whereas the zeta function is not expected to have any such periodicity, even approximately.
Let us give two non-rigorous justifications of this heuristic. Firstly, it is standard that inside the critical strip (with ) we have an approximate form
of (11). If we group the integers from
to
into
bins depending on what powers of
they lie between, we thus have
For with
and
we heuristically have
and so
where are the partial Dirichlet series
This gives the desired polynomial approximation.
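The binning step can be made concrete on a truncated sum. The following Python sketch is our own toy version, not the argument of the post: it assumes the bins are $T^{j/N} \leq n < T^{(j+1)/N}$, uses a hypothetical base point `s0` and a small hypothetical shift `w` of the variable $s$, and sets $Z = T^{-w/N}$. Since $n^{-w} \approx Z^j$ for $n$ in bin $j$, the sum over $n \leq T$ becomes a polynomial of degree $N$ in $Z$ whose coefficients are partial Dirichlet sums, with an error that is only small while $|w| \log T / N$ is small.

```python
import cmath
import math

def binned_dirichlet(s0, w, T, N):
    """Compare sum_{n <= T} n^{-(s0 + w)} with the degree-N polynomial sum_j a_j Z^j.

    Hypothetical conventions (not taken from the post): integer n goes into bin
    j = floor(N log n / log T), i.e. T^{j/N} <= n < T^{(j+1)/N}; the coefficient
    a_j is the partial Dirichlet sum of n^{-s0} over bin j; and Z = T^{-w/N}.
    """
    log_t = math.log(T)
    width = log_t / N                  # width of each bin on the log scale
    big_z = cmath.exp(-w * width)      # Z = T^{-w/N}
    coeffs = [0j] * (N + 1)            # a_0, ..., a_N
    exact = 0j
    majorant = 0.0
    for n in range(1, T + 1):
        log_n = math.log(n)
        j = min(int(N * log_n / log_t), N)
        term = cmath.exp(-s0 * log_n)  # n^{-s0}
        coeffs[j] += term
        exact += term * cmath.exp(-w * log_n)
        majorant += abs(term) * abs(big_z) ** j
    approx = sum(a * big_z ** j for j, a in enumerate(coeffs))
    # Within bin j, |n^{-w} - Z^j| <= |Z|^j (e^{|w| width} - 1).
    bound = (math.exp(abs(w) * width) - 1) * majorant
    return exact, approx, bound

exact, approx, bound = binned_dirichlet(0.5 + 10000j, 0.02 + 0.05j, 10**4, 8)
print(abs(exact - approx), bound)
```

With `w = 0` the two sums agree up to rounding; the crude bound indicates why the polynomial form can only be trusted while $|w|$ is small compared with $N/\log T$.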
A second non-rigorous justification is as follows. From factorisation theorems such as the Hadamard factorisation theorem we expect to have
where runs over the non-trivial zeroes of
, and there are some additional factors arising from the trivial zeroes and poles of
which we will ignore here; we will also completely ignore the issue of how to renormalise the product to make it converge properly. In the region
, the dominant contribution to this product (besides multiplicative constants) should arise from zeroes
that are also in this region. The Riemann-von Mangoldt formula suggests that for “typical”
one should have about
such zeroes. If one lets
be any enumeration of
zeroes closest to
, and then repeats this set of zeroes periodically by period
, one then expects to have an approximation of the form
again ignoring all issues of convergence. If one writes and
, then Euler’s famous product formula for sine basically gives
(here we are glossing over some technical issues regarding renormalisation of the infinite products, which can be dealt with by studying the asymptotics as ) and hence we expect
This again gives the desired polynomial approximation.
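The final step can be checked as a finite algebraic identity. In this sketch (our own, with hypothetical zeroes `zeroes` and a hypothetical `period` playing the role of the period $N$, and with $Z = e^{2\pi i z/N}$ as an assumed choice of variable), we take for granted that the renormalised product over the periodised zeroes $z_j + kN$ is $\sin(\pi(z - z_j)/N)$, and verify that the product of these sines equals $(2i)^{-n} e^{-\pi i (nz + \sum_j z_j)/N}$ times the polynomial $\prod_j (Z - e^{2\pi i z_j/N})$. The exponential prefactor is a constant multiple of the monomial $Z^{-n/2}$, so what remains is a polynomial of degree $n$ in $Z$.

```python
import cmath
import math

def sine_product(z, zeroes, period):
    """prod_j sin(pi (z - z_j) / period): the renormalised Euler product over
    the periodically repeated zeroes z_j + k * period, k in Z."""
    result = 1 + 0j
    for zj in zeroes:
        result *= cmath.sin(math.pi * (z - zj) / period)
    return result

def polynomial_form(z, zeroes, period):
    """The same product written as an exponential prefactor times the polynomial
    P(Z) = prod_j (Z - exp(2 pi i z_j / period)) in Z = exp(2 pi i z / period)."""
    n = len(zeroes)
    big_z = cmath.exp(2j * math.pi * z / period)
    poly = 1 + 0j
    for zj in zeroes:
        poly *= big_z - cmath.exp(2j * math.pi * zj / period)
    prefactor = (2j) ** (-n) * cmath.exp(-1j * math.pi * (n * z + sum(zeroes)) / period)
    return prefactor * poly

zeroes = [0.2 + 0.1j, 1.3 - 0.05j, 2.7, 3.1 + 0.2j]  # hypothetical zeroes, period 4
for z in [0.5 + 0.3j, 2.2 - 0.4j, -1.0]:
    print(sine_product(z, zeroes, 4), polynomial_form(z, zeroes, 4))
```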
Below the fold we give a rigorous version of the second argument suitable for “microscale” analysis. More precisely, we will show
Theorem 2 Let
be an integer going sufficiently slowly to infinity. Let
go to zero sufficiently slowly depending on
. Let
be drawn uniformly at random from
. Then with probability
(in the limit
), and possibly after adjusting
by
, there exists a polynomial
of degree
and obeying the functional equation (9) below, such that
whenever
.
It should be possible to refine the arguments to extend this theorem to the mesoscale setting by letting be anything growing like
, and
anything growing like
; also we should be able to delete the need to adjust
by
. We have not attempted these optimisations here.
Many conjectures and arguments involving the Riemann zeta function can be heuristically translated into arguments involving the polynomials , which one can view as random degree
polynomials if
is interpreted as a random variable drawn uniformly at random from
. These can be viewed as providing a “toy model” for the theory of the Riemann zeta function, in which the complex analysis is simplified to the study of the zeroes and coefficients of this random polynomial (for instance, the role of the gamma function is now played by a monomial in
). This model also makes the zeta function theory more closely resemble the function field analogues of this theory (in which the analogue of the zeta function is also a polynomial (or a rational function) in some variable
, as per the Weil conjectures). The parameter
is at our disposal to choose, and reflects the scale
at which one wishes to study the zeta function. For “macroscopic” questions, at which one wishes to understand the zeta function at unit scales, it is natural to take
(or very slightly larger), while for “microscopic” questions one would take
close to
and only growing very slowly with
. For the intermediate “mesoscopic” scales one would take
somewhere between
and
. Unfortunately, the statistical properties of
are only understood well at a conjectural level at present; even if one assumes the Riemann hypothesis, our understanding of
is largely restricted to the computation of low moments (e.g., the second or fourth moments) of various linear statistics of
and related functions (e.g.,
,
, or
).
Let’s now heuristically explore the polynomial analogues of this theory in a bit more detail. The Riemann hypothesis basically corresponds to the assertion that all the zeroes of the polynomial
lie on the unit circle
(which, after the change of variables
, corresponds to
being real); in a similar vein, the GUE hypothesis corresponds to
having the asymptotic law of a random scalar
times the characteristic polynomial of a random unitary
matrix. Next, we consider what happens to the functional equation
where
A routine calculation involving Stirling’s formula reveals that
with ; one also has the closely related approximation
when . Since
, applying (5) with
and using the approximation (2) suggests a functional equation for
:
where is the polynomial
with all the coefficients replaced by their complex conjugate. Thus if we write
then the functional equation can be written as
We remark that if we use the heuristic (3) (interpreting the cutoffs in the summation in a suitably vague fashion) then this equation can be viewed as an instance of the Poisson summation formula.
Another consequence of the functional equation is that the zeroes of are symmetric with respect to inversion
across the unit circle. This is of course consistent with the Riemann hypothesis, but does not obviously imply it. The phase
is of little consequence in this functional equation; one could easily conceal it by working with the phase rotation
of
instead.
One consequence of the functional equation is that is real for any
; the same is then true for the derivative
. Among other things, this implies that
cannot vanish unless
does also; thus the zeroes of
will not lie on the unit circle except where
has repeated zeroes. The analogous statement is true for
; the zeroes of
will not lie on the critical line except where
has repeated zeroes.
Related to this fact, it is a classical result of Speiser that the Riemann hypothesis is true if and only if all the zeroes of the derivative of the zeta function in the critical strip lie on or to the right of the critical line. The analogous result for polynomials is
Proposition 3 We have
(where all zeroes are counted with multiplicity.) In particular, the zeroes of
all lie on the unit circle if and only if the zeroes of
lie in the closed unit disk.
Proof: From the functional equation we have
Thus it will suffice to show that and
have the same number of zeroes outside the closed unit disk.
Set , then
is a rational function that does not have a zero or pole at infinity. For
not a zero of
, we have already seen that
and
are real, so on dividing we see that
is always real, that is to say
(This can also be seen by writing , where
runs over the zeroes of
, and using the fact that these zeroes are symmetric with respect to reflection across the unit circle.) When
is a zero of
,
has a simple pole at
with residue a positive multiple of
, and so
stays in the right half-plane if one traverses a semicircular arc around
outside the unit disk. From this and continuity we see that
stays in the right half-plane on a circle slightly larger than the unit circle, and hence by the argument principle it has the same number of zeroes and poles outside of this circle, giving the claim.
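Proposition 3 can be tested on small examples. The Python sketch below is our own illustration: it assumes that obeying the functional equation amounts to the zero set being closed under the inversion $a \mapsto 1/\overline{a}$ (a self-inversive polynomial), counts the zeroes of $P'$ inside the unit disk with the argument principle (the same tool used in the proof), and also checks the related identity $\mathrm{Re}\, \frac{zP'(z)}{P(z)} = \frac{N}{2}$ on the unit circle, which follows by pairing each zero $a$ with $1/\overline{a}$. All function names are hypothetical.

```python
import cmath
import math

def poly_from_roots(roots):
    """Coefficients, highest degree first, of the monic polynomial prod (z - r)."""
    coeffs = [1 + 0j]
    for r in roots:
        new = coeffs + [0j]                 # z * Q(z)
        for i in range(1, len(new)):
            new[i] -= r * coeffs[i - 1]     # ... - r * Q(z)
        coeffs = new
    return coeffs

def derivative(coeffs):
    n = len(coeffs) - 1
    return [c * (n - k) for k, c in enumerate(coeffs[:-1])]

def evaluate(coeffs, z):
    acc = 0j
    for c in coeffs:                        # Horner's rule
        acc = acc * z + c
    return acc

def zeroes_in_unit_disk(coeffs, samples=40000):
    """Winding number of the polynomial around 0 along |z| = 1 (argument principle).
    Assumes the polynomial has no zeroes on the unit circle itself."""
    total = 0.0
    prev = evaluate(coeffs, 1 + 0j)
    for k in range(1, samples + 1):
        cur = evaluate(coeffs, cmath.exp(2j * math.pi * k / samples))
        total += cmath.phase(cur / prev)    # small increment of the argument
        prev = cur
    return round(total / (2 * math.pi))

def self_inversive_roots(circle_angles, off_circle):
    """Zeroes on the unit circle plus pairs a, 1/conj(a) (hypothetical helper)."""
    roots = [cmath.exp(1j * angle) for angle in circle_angles]
    for radius, angle in off_circle:
        roots.append(radius * cmath.exp(1j * angle))
        roots.append(cmath.exp(1j * angle) / radius)
    return roots

def outside_counts(roots):
    """(# zeroes of P, # zeroes of P') strictly outside the closed unit disk."""
    dp = derivative(poly_from_roots(roots))
    outside_p = sum(1 for r in roots if abs(r) > 1 + 1e-12)
    outside_dp = (len(dp) - 1) - zeroes_in_unit_disk(dp)
    return outside_p, outside_dp

def log_derivative_real_part(roots, theta):
    """Re(z P'(z) / P(z)) at z = e^{i theta}; equals deg(P)/2 for these P."""
    p = poly_from_roots(roots)
    z = cmath.exp(1j * theta)
    return (z * evaluate(derivative(p), z) / evaluate(p, z)).real

print(outside_counts(self_inversive_roots([0.3, 2.0, 5.0], [(1.5, 4.2)])))
```

The winding-number count needs $P'$ to be non-vanishing on the unit circle; this holds here because $|P'| \geq \frac{N}{2}|P|$ there and the zeroes of $P$ are simple.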
From the functional equation and the chain rule, is a zero of
if and only if
is a zero of
. We can thus write the above proposition in the equivalent form
One can use this identity to get a lower bound on the number of zeroes of by the method of mollifiers. Namely, for any other polynomial
, we clearly have
By Jensen’s formula, we have for any that
We therefore have
As the logarithm function is concave, we can apply Jensen’s inequality to conclude
where the expectation is over the parameter. It turns out that by choosing the mollifier
carefully in order to make
behave like the function
(while keeping the degree
small enough that one can compute the second moment here), and then optimising in
, one can use this inequality to get a positive fraction of zeroes of
on the unit circle on average. This is the polynomial analogue of a classical argument of Levinson, who used this to show that at least one third of the zeroes of the Riemann zeta function are on the critical line; all later improvements on this fraction have been based on some version of Levinson’s method, mainly focusing on more advanced choices for the mollifier
and of the differential operator
that implicitly appears in the above approach. (The most recent lower bound I know of is
, due to Pratt and Robles. In principle (as observed by Farmer) this bound can get arbitrarily close to $1$
if one is allowed to use arbitrarily long mollifiers, but establishing this seems of comparable difficulty to unsolved problems such as the pair correlation conjecture; see this paper of Radziwill for more discussion.) A variant of these techniques can also establish “zero density estimates” of the following form: for any
, the number of zeroes of
that lie further than
from the unit circle is of order
on average for some absolute constant
. Thus, roughly speaking, most zeroes of
lie within
of the unit circle. (Analogues of these results for the Riemann zeta function were worked out by Selberg, by Jutila, and by Conrey, with increasingly strong values of
.)
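Jensen's formula, which drives the mollifier argument above, is easy to check numerically. The sketch below (our own, for a hypothetical monic polynomial with zeroes `roots` and a radius `r`) compares the average of $\log|P|$ over the circle $|z| = r$, computed with an equally spaced Riemann sum, against $\log|P(0)| + \sum_{|a_j| < r} \log \frac{r}{|a_j|}$.

```python
import cmath
import math

def log_abs_monic(roots, z):
    """log |prod_j (z - a_j)| for the monic polynomial with the given zeroes."""
    return sum(math.log(abs(z - a)) for a in roots)

def jensen_sides(roots, r, samples=4096):
    """Both sides of Jensen's formula
        (1/2pi) int_0^{2pi} log|P(r e^{i theta})| d theta
          = log|P(0)| + sum_{|a_j| < r} log(r / |a_j|)
    for P(z) = prod_j (z - a_j), assuming P(0) != 0 and no zeroes on |z| = r."""
    mean = sum(log_abs_monic(roots, r * cmath.exp(2j * math.pi * k / samples))
               for k in range(samples)) / samples
    rhs = log_abs_monic(roots, 0) + sum(math.log(r / abs(a)) for a in roots if abs(a) < r)
    return mean, rhs

print(jensen_sides([0.3, 0.5 + 0.5j, 2j, -1.7], 1.2))
```

The equally spaced sum converges very quickly here because $\log|P|$ is harmonic near the circle; accuracy would degrade if a zero sat close to $|z| = r$.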
The zeroes of tend to live somewhat closer to the origin than the zeroes of
. Suppose for instance that we write
where are the zeroes of
, then by evaluating at zero we see that
and the right-hand side is of unit magnitude by the functional equation. However, if we differentiate
where are the zeroes of
, then by evaluating at zero we now see that
The right-hand side would now be typically expected to be of size , and so on average we expect the
to have magnitude like
, that is to say pushed inwards from the unit circle by a distance roughly
. The analogous result for the Riemann zeta function is that the zeroes of
at height
lie at a distance roughly
to the right of the critical line on the average; see this paper of Levinson and Montgomery for a precise statement.
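In this post we assume the Riemann hypothesis and the simplicity of zeroes, thus the zeroes of $\zeta$ in the critical strip take the form $\frac{1}{2} \pm i\gamma_j$ for some real number ordinates $0 < \gamma_1 < \gamma_2 < \dots$. From the Riemann-von Mangoldt formula, one has the asymptotic
$$ \gamma_n = (1+o(1)) \frac{2\pi n}{\log n}$$
as $n \to \infty$; in particular, the spacing $\gamma_{n+1} - \gamma_n$ should behave like $\frac{2\pi}{\log \gamma_n}$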
on the average. However, it can happen that some gaps are unusually small compared to other nearby gaps. For the sake of concreteness, let us define a Lehmer pair to be a pair of adjacent ordinates
such that
The specific value of constant is not particularly important here; anything larger than
would suffice. An example of such a pair would be the classical pair
discovered by Lehmer. It follows easily from the main results of Csordas, Smith, and Varga that if an infinite number of Lehmer pairs (in the above sense) existed, then the de Bruijn-Newman constant is non-negative. This implication is now redundant in view of the unconditional results of this recent paper of Rodgers and myself; however, the question of whether infinitely many Lehmer pairs exist remains open.
In this post, I sketch an argument that Brad and I came up with (as initially suggested by Odlyzko) showing that the GUE hypothesis implies the existence of infinitely many Lehmer pairs. We argue probabilistically: pick a sufficiently large number , pick
at random from
to
(so that the average gap size is close to
), and prove that the Lehmer pair condition (1) occurs with positive probability.
Introduce the renormalised ordinates for
, and let
be a small absolute constant (independent of
). It will then suffice to show that
(say) with probability , since the contribution of those
outside of
can be absorbed by the
factor with probability
.
As one consequence of the GUE hypothesis, we have with probability
. Thus, if
, then
has density
. Applying the Hardy-Littlewood maximal inequality, we see that with probability
, we have
which implies in particular that
for all . This implies in particular that
and so it will suffice to show that
(say) with probability .
By the GUE hypothesis (and the fact that is independent of
), it suffices to show that a Dyson sine process
, normalised so that
is the first positive point in the process, obeys the inequality
with probability . However, if we let
be a moderately large constant (and assume
small depending on
), one can show using
-point correlation functions for the Dyson sine process (and the fact that the Dyson kernel
equals
to second order at the origin) that
for any natural number , where
denotes the number of elements of the process in
. For instance, the expression
can be written in terms of the three-point correlation function
as
which can easily be estimated to be (since
in this region), and similarly for the other estimates claimed above.
Since for natural numbers , the quantity
is only positive when
, we see from the first three estimates that the event
that
occurs with probability
. In particular, by Markov’s inequality we have the conditional probabilities
and thus, if is large enough, and
small enough, it will be true with probability
that
and
and simultaneously that
for all natural numbers . This implies in particular that
and
for all , which gives (2) for
small enough.
Remark 1 The above argument needed the GUE hypothesis for correlations up to fourth order (in order to establish (3)). It might be possible to reduce the number of correlations needed, but I do not see how to obtain the claim using pair correlations alone.
Brad Rodgers and I have uploaded to the arXiv our paper “The De Bruijn-Newman constant is non-negative”. This paper affirms a conjecture of Newman regarding the extent to which the Riemann hypothesis, if true, is only “barely so”. To describe the conjecture, let us begin with the Riemann xi function
where is the Gamma function and
is the Riemann zeta function. Initially, this function is only defined for
, but, as was already known to Riemann, we can manipulate it into a form that extends to the entire complex plane as follows. Firstly, in view of the standard identity
, we can write
and hence
By a rescaling, one may write
and similarly
and thus (after applying Fubini’s theorem)
We’ll make the change of variables to obtain
If we introduce the mild renormalisation
of , we then conclude (at least for
) that
which one can verify to be rapidly decreasing both as and as
, with the decrease as
faster than any exponential. In particular
extends holomorphically to the upper half plane.
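If we normalize the Fourier transform of a (Schwartz) function $f$ as $\hat f(\xi) := \int_{\mathbb R} f(x) e^{-2\pi i x \xi}\ dx$, it is well known that the Gaussian $e^{-\pi x^2}$ is its own Fourier transform. The creation operator $2\pi x - \frac{d}{dx}$ interacts with the Fourier transform by the identity
$$ \widehat{\left(2\pi x - \frac{d}{dx}\right) f}(\xi) = -i \left(2\pi \xi - \frac{d}{d\xi}\right) \hat f(\xi).$$
Since $(-i)^4 = 1$, this implies that the function
$$ \left(2\pi x - \frac{d}{dx}\right)^4 e^{-\pi x^2} = 16 (16\pi^4 x^4 - 24\pi^3 x^2 + 3\pi^2) e^{-\pi x^2}$$
is its own Fourier transform. (One can view the polynomial as a renormalised version of the fourth Hermite polynomial.) Taking a suitable linear combination of this with $e^{-\pi x^2}$, we conclude that $(2\pi^2 x^4 - 3\pi x^2) e^{-\pi x^2}$ is also its own Fourier transform. Rescaling by $e^{2u}$ and then multiplying by $e^u$, we conclude that the Fourier transform of
$$ x \mapsto (2\pi^2 x^4 e^{9u} - 3\pi x^2 e^{5u}) \exp(-\pi x^2 e^{4u})$$
is
$$ x \mapsto (2\pi^2 x^4 e^{-9u} - 3\pi x^2 e^{-5u}) \exp(-\pi x^2 e^{-4u}),$$
and hence by the Poisson summation formula (using symmetry and vanishing at $0$ to unfold the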
summation in (2) to the integers rather than the natural numbers) we obtain the functional equation
which implies that and
are even functions (in particular,
now extends to an entire function). From this symmetry we can also rewrite (1) as
which now gives a convergent expression for the entire function for all complex
. As
is even and real-valued on
,
is even and also obeys the functional equation
, which is equivalent to the usual functional equation for the Riemann zeta function. The Riemann hypothesis is equivalent to the claim that all the zeroes of
are real.
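The self-duality claims in this paragraph can be sanity-checked numerically. This sketch (our own; function names are hypothetical) evaluates $\hat f(\xi) = \int_{\mathbb R} f(x) e^{-2\pi i x \xi}\, dx$ for even $f$ with the trapezoid rule and compares it with $f(\xi)$ for the Gaussian, for $(16\pi^4 x^4 - 24\pi^3 x^2 + 3\pi^2) e^{-\pi x^2} = \frac{1}{16}(2\pi x - \frac{d}{dx})^4 e^{-\pi x^2}$ (expanded by hand), and for $(2\pi^2 x^4 - 3\pi x^2) e^{-\pi x^2}$. The linear combination involved is $16\pi^4 x^4 - 24\pi^3 x^2 + 3\pi^2 = 8\pi^2 (2\pi^2 x^4 - 3\pi x^2) + 3\pi^2$.

```python
import math

def self_dual_candidate(x):
    """(2 pi^2 x^4 - 3 pi x^2) exp(-pi x^2)."""
    return (2 * math.pi**2 * x**4 - 3 * math.pi * x**2) * math.exp(-math.pi * x**2)

def hermite_type(x):
    """(16 pi^4 x^4 - 24 pi^3 x^2 + 3 pi^2) exp(-pi x^2) = (2 pi x - d/dx)^4 e^{-pi x^2} / 16."""
    return (16 * math.pi**4 * x**4 - 24 * math.pi**3 * x**2 + 3 * math.pi**2) * math.exp(-math.pi * x**2)

def fourier_transform_even(f, xi, half_width=8.0, step=0.005):
    """Approximate hat f(xi) = int f(x) e^{-2 pi i x xi} dx for an even real f.

    For even f the sine part cancels, so we integrate f(x) cos(2 pi x xi) with
    the trapezoid rule, which is very accurate for smooth rapidly decaying f."""
    n = int(round(half_width / step))
    total = 0.0
    for k in range(-n, n + 1):
        x = k * step
        weight = 0.5 if abs(k) == n else 1.0
        total += weight * f(x) * math.cos(2 * math.pi * x * xi)
    return total * step

for xi in [0.0, 0.4, 0.9, 1.5]:
    print(xi, fourier_transform_even(self_dual_candidate, xi), self_dual_candidate(xi))
```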
De Bruijn introduced the family of deformations of
, defined for all
and
by the formula
From a PDE perspective, one can view as the evolution of
under the backwards heat equation
. As with
, the
are all even entire functions that obey the functional equation
, and one can ask an analogue of the Riemann hypothesis for each such
, namely whether all the zeroes of
are real. De Bruijn showed that these hypotheses were monotone in
: if
had all real zeroes for some
, then
would also have all zeroes real for any
. Newman later sharpened this claim by showing the existence of a finite number
, now known as the de Bruijn-Newman constant, with the property that
had all zeroes real if and only if
. Thus, the Riemann hypothesis is equivalent to the inequality
. Newman then conjectured the complementary bound
; in his words, this conjecture asserted that if the Riemann hypothesis is true, then it is only “barely so”, in that the reality of all the zeroes is destroyed by applying heat flow for even an arbitrarily small amount of time. Over time, a significant amount of evidence was established in favour of this conjecture; most recently, in 2011, Saouter, Gourdon, and Demichel showed that
.
In this paper we finish off the proof of Newman’s conjecture, that is we show that . The proof is by contradiction, assuming that
(which among other things, implies the truth of the Riemann hypothesis), and using the properties of backwards heat evolution to reach a contradiction.
Very roughly, the argument proceeds as follows. As observed by Csordas, Smith, and Varga (and also discussed in this previous blog post), the backwards heat evolution of the $H_t$ introduces a nice ODE dynamics on the zeroes $x_j(t)$ of $H_t$, namely that they solve the ODE
$$ \partial_t x_j = 2 \sum_{k \neq j} \frac{1}{x_j - x_k} \qquad (3)$$
for all $j$ (one has to interpret the sum in a principal value sense as it is not absolutely convergent, but let us ignore this technicality for the current discussion). Intuitively, this ODE is asserting that the zeroes
repel each other, somewhat like positively charged particles (but note that the dynamics is first-order, as opposed to the second-order laws of Newtonian mechanics). Formally, a steady state (or equilibrium) of this dynamics is reached when the
are arranged in an arithmetic progression. (Note for instance that for any positive
, the functions
obey the same backwards heat equation as
, and their zeroes are on a fixed arithmetic progression
.) The strategy is to then show that the dynamics from time
to time
creates a convergence to local equilibrium, in which the zeroes
locally resemble an arithmetic progression at time
. This will be in contradiction with known results on pair correlation of zeroes (or on related statistics, such as the fluctuations on gaps between zeroes), such as the results of Montgomery (actually for technical reasons it is slightly more convenient for us to use related results of Conrey, Ghosh, Goldston, Gonek, and Heath-Brown). Another way of thinking about this is that even very slight deviations from local equilibrium (such as a small number of gaps that are slightly smaller than the average spacing) will almost immediately lead to zeroes colliding with each other and leaving the real line as one evolves backwards in time (i.e., under the forward heat flow). This is a refinement of the strategy used in previous lower bounds on
, in which “Lehmer pairs” (pairs of zeroes of the zeta function that were unusually close to each other) were used to limit the extent to which the evolution continued backwards in time while keeping all zeroes real.
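The zero dynamics can be checked on a polynomial toy model. Our assumption here: the polynomial analogue of the backwards heat flow is $\partial_t P_t = -P_t''$, whose solution is the finite sum $P_t = \sum_k \frac{(-t)^k}{k!} P^{(2k)}$. Differentiating $P_t(x_k(t)) = 0$ in $t$ gives $\partial_t x_k = P_t''(x_k)/P_t'(x_k) = 2\sum_{j \neq k} \frac{1}{x_k - x_j}$. The sketch (hypothetical function names) integrates this ODE with the classical Runge-Kutta method, compares the resulting zeroes with the exact $P_t$, and checks that the smallest gap has grown, reflecting the repulsion.

```python
import math

def poly_from_roots(roots):
    """Coefficients, lowest degree first, of the monic polynomial prod (z - r)."""
    coeffs = [1.0]
    for r in roots:
        shifted = [0.0] + coeffs            # z * Q(z)
        for i, c in enumerate(coeffs):
            shifted[i] -= r * c             # ... - r * Q(z)
        coeffs = shifted
    return coeffs

def second_derivative(coeffs):
    """Coefficients (lowest degree first) of P'', padded with zeros to the same length."""
    d2 = [(i + 2) * (i + 1) * coeffs[i + 2] for i in range(len(coeffs) - 2)]
    return d2 + [0.0] * (len(coeffs) - len(d2))

def backward_heat_flow(coeffs, t):
    """Exact P_t = sum_k (-t)^k P^{(2k)} / k!, which solves d/dt P_t = -P_t''."""
    result = [0.0] * len(coeffs)
    term, k = list(coeffs), 0
    while any(term):                        # terminates: P is a polynomial
        for i, c in enumerate(term):
            result[i] += (-t) ** k / math.factorial(k) * c
        term, k = second_derivative(term), k + 1
    return result

def velocities(xs):
    """Right-hand side of d/dt x_k = 2 sum_{j != k} 1/(x_k - x_j)."""
    return [2 * sum(1 / (xk - xj) for j, xj in enumerate(xs) if j != k)
            for k, xk in enumerate(xs)]

def evolve_zeroes(xs, t_end, steps=2000):
    """Integrate the zero dynamics with the classical fourth-order Runge-Kutta method."""
    dt = t_end / steps
    for _ in range(steps):
        k1 = velocities(xs)
        k2 = velocities([x + dt / 2 * v for x, v in zip(xs, k1)])
        k3 = velocities([x + dt / 2 * v for x, v in zip(xs, k2)])
        k4 = velocities([x + dt * v for x, v in zip(xs, k3)])
        xs = [x + dt / 6 * (a + 2 * b + 2 * c + d)
              for x, a, b, c, d in zip(xs, k1, k2, k3, k4)]
    return xs

initial = [-3.0, -1.0, 0.5, 2.0]
final = evolve_zeroes(initial, 0.5)
exact = backward_heat_flow(poly_from_roots(initial), 0.5)
print(final)
print(max(abs(a - b) for a, b in zip(poly_from_roots(final), exact)))
```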
How does one obtain this convergence to local equilibrium? We proceed by broad analogy with the “local relaxation flow” method of Erdos, Schlein, and Yau in random matrix theory, in which one combines some initial control on zeroes (which, in the case of the Erdos-Schlein-Yau method, is referred to with terms such as “local semicircular law”) with convexity properties of a relevant Hamiltonian that can be used to force the zeroes towards equilibrium.
We first discuss the initial control on zeroes. For , we have the classical Riemann-von Mangoldt formula, which asserts that the number of zeroes in the interval
is
as
. (We have a factor of
here instead of the more familiar
due to the way
is normalised.) This implies for instance that for a fixed
, the number of zeroes in the interval
is
. Actually, because we get to assume the Riemann hypothesis, we can sharpen this to
, a result of Littlewood (see this previous blog post for a proof). Ideally, we would like to obtain similar control for the other
,
, as well. Unfortunately we were only able to obtain the weaker claims that the number of zeroes of
in
is
, and that the number of zeroes in
is
, that is to say we only get good control on the distribution of zeroes at scales
rather than at scales
. Ultimately this is because we were only able to get control (and in particular, lower bounds) on
with high precision when
(whereas
has good estimates as soon as
is larger than (say)
This control is obtained by expressing
in terms of some contour integrals and using the method of steepest descent (actually it is slightly simpler to rely instead on the Stirling approximation for the Gamma function, which can be proven in turn by steepest descent methods). Fortunately, it turns out that this weaker control is still (barely) enough for the rest of our argument to go through.
Once one has the initial control on zeroes, we now need to force convergence to local equilibrium by exploiting convexity of a Hamiltonian. Here, the relevant Hamiltonian is
ignoring for now the rather important technical issue that this sum is not actually absolutely convergent. (Because of this, we will need to truncate and renormalise the Hamiltonian in a number of ways which we will not detail here.) The ODE (3) is formally the gradient flow for this Hamiltonian. Furthermore, this Hamiltonian is a convex function of the (because
is a convex function on
). We therefore expect the Hamiltonian to be a decreasing function of time, and that the derivative should be an increasing function of time. As time passes, the derivative of the Hamiltonian would then be expected to converge to zero, which should imply convergence to local equilibrium.
Formally, the derivative of the above Hamiltonian is
Again, there is the important technical issue that this quantity is infinite; but it turns out that if we renormalise the Hamiltonian appropriately, then the energy will also become suitably renormalised, and in particular will vanish when the are arranged in an arithmetic progression, and be positive otherwise. One can also formally calculate the derivative of
to be a somewhat complicated but manifestly non-negative quantity (a sum of squares); see this previous blog post for analogous computations in the case of heat flow on polynomials. After flowing from time
to time
, and using some crude initial bounds on
and
in this region (coming from the Riemann-von Mangoldt type formulae mentioned above and some further manipulations), we can eventually show that the (renormalisation of the) energy
at time zero is small, which forces the
to locally resemble an arithmetic progression, which gives the required convergence to local equilibrium.
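The monotonicity heuristics can be watched in a finite toy model. As an assumption (a finite analogue, without the truncation and renormalisation mentioned above), take $H = \sum_{j \neq k} \log \frac{1}{|x_j - x_k|}$ over ordered pairs of $N$ real points; then $\partial_t x_k = 2\sum_{j \neq k} \frac{1}{x_k - x_j} = -\partial H/\partial x_k$, so $\frac{d}{dt} H = -E$ with $E := \sum_k (\partial_t x_k)^2$, and convexity of $H$ gives $\frac{d}{dt} E = -2\, \nabla H \cdot \nabla^2 H\, \nabla H \leq 0$. The sketch (hypothetical names) starts from a configuration with two narrow gaps and records $H$ and $E$ along the flow.

```python
import math

def velocities(xs):
    """d/dt x_k = 2 sum_{j != k} 1/(x_k - x_j), i.e. minus the gradient of H."""
    return [2 * sum(1 / (xk - xj) for j, xj in enumerate(xs) if j != k)
            for k, xk in enumerate(xs)]

def hamiltonian(xs):
    """H = sum over ordered pairs j != k of log(1 / |x_j - x_k|)."""
    return sum(math.log(1 / abs(xj - xk))
               for j, xj in enumerate(xs) for k, xk in enumerate(xs) if j != k)

def energy(xs):
    """E = |grad H|^2 = sum_k (dx_k/dt)^2, so that dH/dt = -E along the flow."""
    return sum(v * v for v in velocities(xs))

def rk4_step(xs, dt):
    k1 = velocities(xs)
    k2 = velocities([x + dt / 2 * v for x, v in zip(xs, k1)])
    k3 = velocities([x + dt / 2 * v for x, v in zip(xs, k2)])
    k4 = velocities([x + dt * v for x, v in zip(xs, k3)])
    return [x + dt / 6 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(xs, k1, k2, k3, k4)]

def trajectory(xs, dt=1e-4, steps=2000, every=100):
    """Snapshots (H, E) every `every` steps of the flow."""
    snapshots = [(hamiltonian(xs), energy(xs))]
    for step in range(1, steps + 1):
        xs = rk4_step(xs, dt)
        if step % every == 0:
            snapshots.append((hamiltonian(xs), energy(xs)))
    return snapshots

snaps = trajectory([-2.0, -1.8, 0.0, 0.3, 1.5])
for h, e in snaps[::5]:
    print(round(h, 4), round(e, 4))
```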
There are a number of technicalities involved in making the above sketch of argument rigorous (for instance, justifying interchanges of derivatives and infinite sums turns out to be a little bit delicate). I will highlight here one particular technical point. One of the ways in which we make expressions such as the energy finite is to truncate the indices
to an interval
to create a truncated energy
. In typical situations, we would then expect
to be decreasing, which will greatly help in bounding
(in particular it would allow one to control
by time-averaged quantities such as
, which can in turn be controlled using variants of (4)). However, there are boundary effects at both ends of
that could in principle add a large amount of energy into
, which is bad news as it could conceivably make
undesirably large even if integrated energies such as
remain adequately controlled. As it turns out, such boundary effects are negligible as long as there is a large gap between adjacent zeroes at boundary of
– it is only narrow gaps that can rapidly transmit energy across the boundary of
. Now, narrow gaps can certainly exist (indeed, the GUE hypothesis predicts these happen a positive fraction of the time); but the pigeonhole principle (together with the Riemann-von Mangoldt formula) can allow us to pick the endpoints of the interval
so that no narrow gaps appear at the boundary of
for any given time
. However, there was a technical problem: this argument did not allow one to find a single interval
that avoided gaps for all times
simultaneously – the pigeonhole principle could produce a different interval
for each time
! Since the number of times was uncountable, this was a serious issue. (In physical terms, the problem was that there might be very fast “longitudinal waves” in the dynamics that, at each time, cause some gaps between zeroes to be highly compressed, but the specific gap that was narrow changed very rapidly with time. Such waves could, in principle, import a huge amount of energy into
by time
.) To resolve this, we borrowed a PDE trick of Bourgain’s, in which the pigeonhole principle was coupled with local conservation laws. More specifically, we use the phenomenon that very narrow gaps
take a nontrivial amount of time to expand back to a reasonable size (this can be seen by comparing the evolution of this gap with solutions of the scalar ODE
, which represents the fastest rate at which a gap such as
can expand). Thus, if a gap
is reasonably large at some time
, it will also stay reasonably large at slightly earlier times
for some moderately small
. This lets one locate an interval
that has manageable boundary effects during the times in
, so in particular
is basically non-increasing in this time interval. Unfortunately, this interval is a little bit too short to cover all of
; however it turns out that one can iterate the above construction and find a nested sequence of intervals
, with each
non-increasing in a different time interval
, and with all of the time intervals covering
. This turns out to be enough (together with the obvious fact that
is monotone in
) to still control
for some reasonably sized interval
, as required for the rest of the arguments.
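The comparison with a scalar ODE can be spelled out in a short computation. This is a sketch under our own normalisation: we assume the zero dynamics take the form $\partial_t x_j = 2\sum_{k \neq j} \frac{1}{x_j - x_k}$ with the zeroes ordered $x_j < x_{j+1}$, and write $g_j := x_{j+1} - x_j$. Each term with $k \notin \{j, j+1\}$ has $(x_{j+1} - x_k)(x_j - x_k) > 0$ (and is $O(1/(x_j - x_k)^2)$, so this particular sum converges absolutely), which shows that a gap grows no faster than solutions of $g' = 4/g$:

```latex
\begin{align*}
\partial_t g_j
  &= \frac{4}{g_j} + 2 \sum_{k \neq j, j+1} \left( \frac{1}{x_{j+1} - x_k} - \frac{1}{x_j - x_k} \right)
   = \frac{4}{g_j} - 2 g_j \sum_{k \neq j, j+1} \frac{1}{(x_{j+1} - x_k)(x_j - x_k)}
   \leq \frac{4}{g_j}, \\
\partial_t \big( g_j^2 \big) &\leq 8
  \quad\Longrightarrow\quad
  g_j(t_0 - \delta)^2 \geq g_j(t_0)^2 - 8 \delta \qquad (\delta \geq 0).
\end{align*}
```

In words: under these assumptions a gap of size $g$ at time $t_0$ had size at least $\sqrt{g^2 - 8\delta}$ at time $t_0 - \delta$, which is the sense in which a gap that is reasonably large at one time stays reasonably large at slightly earlier times.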
ADDED LATER: the following analogy (involving functions with just two zeroes, rather than an infinite number of zeroes) may help clarify the relation between this result and the Riemann hypothesis (and in particular why this result does not make the Riemann hypothesis any easier to prove, in fact it confirms the delicate nature of that hypothesis). Suppose one had a quadratic polynomial of the form $f(z) = z^2 + \Lambda$, where $\Lambda$ was an unknown real constant. Suppose that one was for some reason interested in the analogue of the “Riemann hypothesis” for $f$, namely that all the zeroes of $f$ are real. A priori, there are three scenarios:
- (Riemann hypothesis false) $\Lambda > 0$, and $f$ has zeroes $\pm i\sqrt{\Lambda}$ off the real axis.
- (Riemann hypothesis true, but barely so) $\Lambda = 0$, and both zeroes of $f$ are on the real axis; however, any slight perturbation of $\Lambda$ in the positive direction would move zeroes off the real axis.
- (Riemann hypothesis true, with room to spare) $\Lambda < 0$, and both zeroes of $f$ are on the real axis. Furthermore, any slight perturbation of $f$ will also have both zeroes on the real axis.
The analogue of our result in this case is that $\Lambda \geq 0$, thus ruling out the third of the three scenarios here. In this simple example in which only two zeroes are involved, one can think of the inequality $\Lambda \geq 0$ as asserting that if the zeroes of $f$ are real, then they must be repeated. In our result (in which there are an infinity of zeroes, that become increasingly dense near infinity), and in view of the convergence to local equilibrium properties of (3), the analogous assertion is that if the zeroes of $H_0$ are real, then they do not behave locally as if they were in arithmetic progression.
A major topic of interest of analytic number theory is the asymptotic behaviour of the Riemann zeta function in the critical strip
in the limit
. For the purposes of this set of notes, it is a little simpler technically to work with the log-magnitude
of the zeta function. (In principle, one can reconstruct a branch of
, and hence
itself, from
using the Cauchy-Riemann equations, or tools such as the Borel-Carathéodory theorem, see Exercise 40 of Supplement 2.)
One has the classical estimate
(See e.g. Exercise 37 from Supplement 3.) In view of this, let us define the normalised log-magnitudes for any
by the formula
informally, this is a normalised window into near
. One can rephrase several assertions about the zeta function in terms of the asymptotic behaviour of
. For instance:
- (i) The bound (1) implies that
is asymptotically locally bounded from above in the limit
, thus for any compact set
we have
for
and
sufficiently large. In fact the implied constant in
only depends on the projection of
to the real axis.
- (ii) For
, we have the bounds
which implies that
converges locally uniformly as
to zero in the region
.
- (iii) The functional equation, together with the symmetry
, implies that
which by Exercise 17 of Supplement 3 shows that
as
, locally uniformly in
. In particular, when combined with the previous item, we see that
converges locally uniformly as
to
in the region
.
- (iv) From Jensen’s formula (Theorem 16 of Supplement 2) we see that
is a subharmonic function, and thus
is subharmonic as well. In particular we have the mean value inequality
for any disk
, where the integral is with respect to area measure. From this and (ii) we conclude that
for any disk with
and sufficiently large
; combining this with (i) we conclude that
is asymptotically locally bounded in
in the limit
, thus for any compact set
we have
for sufficiently large
.
From (iv) and the usual Arzela-Ascoli diagonalisation argument, we see that the are asymptotically compact in the topology of distributions: given any sequence
tending to
, one can extract a subsequence such that the
converge in the sense of distributions. Let us then define a normalised limit profile of
to be a distributional limit
of a sequence of
; they are analogous to limiting profiles in PDE, and also to the more recent introduction of “graphons” in the theory of graph limits. Then by taking limits in (i)-(iv) we can say a lot about such normalised limit profiles
(up to almost everywhere equivalence, which is an issue we will address shortly):
- (i)
is bounded from above in the critical strip
.
- (ii)
vanishes on
.
- (iii) We have the functional equation
for all
. In particular
for
.
- (iv)
is subharmonic.
Unfortunately, (i)-(iv) fail to characterise completely. For instance, one could have
for any convex function
of
that equals
for
,
for
, and obeys the functional equation
, and this would be consistent with (i)-(iv). One can also perturb such examples in a region where
is strictly convex to create further examples of functions obeying (i)-(iv). Note from subharmonicity that the function
is always going to be convex in
; this can be seen as a limiting case of the Hadamard three-lines theorem (Exercise 41 of Supplement 2).
We pause to address one minor technicality. We have defined as a distributional limit, and as such it is a priori only defined up to almost everywhere equivalence. However, due to subharmonicity, there is a unique upper semi-continuous representative of
(taking values in
), defined by the formula
for any (note from subharmonicity that the expression in the limit is monotone nonincreasing as
, and is also continuous in
). We will now view this upper semi-continuous representative of
as the canonical representative of
, so that
is now defined everywhere, rather than up to almost everywhere equivalence.
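By a classical theorem of Riesz, a function $u$ is subharmonic if and only if the distribution $\Delta u$ is a non-negative measure, where $\Delta = \frac{\partial^2}{\partial \sigma^2} + \frac{\partial^2}{\partial t^2}$ is the Laplacian in the $\sigma, t$ coordinates. Jensen’s formula (or Green’s theorem), when interpreted distributionally, tells us that
$$\Delta \log|\zeta(\sigma+it)| = 2\pi \sum_\rho \delta_\rho$$
away from the real axis, where $\rho$ ranges over the non-trivial zeroes of $\zeta$ (counted with multiplicity). Thus, if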
is a normalised limit profile for
that is the distributional limit of
, then we have
where is a non-negative measure which is the limit in the vague topology of the measures
Thus is a normalised limit profile of the zeroes of the Riemann zeta function.
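The distributional identity $\Delta \log|z-\rho| = 2\pi \delta_\rho$ that underlies the use of Jensen's formula can be checked numerically in its integrated form. The average of $\log|z-\rho|$ over a circle $|z-c| = r$ equals $\log \max(r, |c-\rho|)$, so the circle only "sees" the zero when $\rho$ lies inside it. The following is a minimal sketch with a hypothetical helper name. It uses the trapezoidal rule, which converges very quickly for smooth periodic integrands.

```python
import cmath
import math

def circle_average_log(c, r, rho, M=4096):
    """Average of log|z - rho| over the circle |z - c| = r, via the M-point trapezoidal rule."""
    values = (math.log(abs(c + r * cmath.exp(2j * math.pi * k / M) - rho)) for k in range(M))
    return math.fsum(values) / M

# Jensen: the average equals log(max(r, |c - rho|)).
for rho in (0.5, 0.3 + 0.4j, 2.0, -1.5j):
    print(rho, circle_average_log(0.0, 1.0, rho), math.log(max(1.0, abs(rho))))
```

Summing this identity over zeroes is what turns averages of $\log|\zeta|$ into counts of zeroes in discs.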
Using this machinery, we can recover many classical theorems about the Riemann zeta function by “soft” arguments that do not require extensive calculation. Here are some examples:
Theorem 1 The Riemann hypothesis implies the Lindelöf hypothesis.
Proof: It suffices to show that any limiting profile (arising as the limit of some
) vanishes on the critical line
. But if the Riemann hypothesis holds, then the measures
are supported on the critical line
, so the normalised limit profile
is also supported on this line. This implies that
is harmonic outside of the critical line. By (ii) and unique continuation for harmonic functions, this implies that
vanishes on the half-space $\{\sigma+it: \sigma > \frac{1}{2}\}$ (and equals $\frac{1}{2} - \sigma$ on the complementary half-space, by (iii)), giving the claim.
In fact, we have the following sharper statement:
Theorem 2 (Backlund) The Lindelöf hypothesis is equivalent to the assertion that for any fixed
, the number of zeroes in the region
is
as
.
Proof: If the latter claim holds, then for any , the measures
assign a mass of
to any region of the form
as
for any fixed
and
. Thus the normalised limiting profile measure
is supported on the critical line, and we can repeat the previous argument.
Conversely, suppose the claim fails, then we can find a sequence and
such that
assigns a mass of
to the region
. Extracting a normalised limiting profile, we conclude that the normalised limiting profile measure
is non-trivial somewhere to the right of the critical line, so the associated subharmonic function
is not harmonic everywhere to the right of the critical line. From the maximum principle and (ii) this implies that
has to be positive somewhere on the critical line, but this contradicts the Lindelöf hypothesis. (One has to take a bit of care in the last step since
only converges to
in the sense of distributions, but it turns out that the subharmonicity of all the functions involved gives enough regularity to justify the argument; we omit the details here.)
Theorem 3 (Littlewood) Assume the Lindelöf hypothesis. Then for any fixed
, the number of zeroes in the region
is
as
.
Proof: By the previous arguments, the only possible normalised limiting profile for is
. Taking distributional Laplacians, we see that the only possible normalised limiting profile for the zeroes is Lebesgue measure on the critical line. Thus,
can only converge to
as
, and the claim follows.
Even without the Lindelöf hypothesis, we have the following result:
Theorem 4 (Titchmarsh) For any fixed
, there are
zeroes in the region
for sufficiently large
.
Among other things, this theorem recovers a classical result of Littlewood that the gaps between the imaginary parts of the zeroes go to zero, even without assuming unproven conjectures such as the Riemann or Lindelöf hypotheses.
Proof: Suppose for contradiction that this were not the case, then we can find and a sequence
such that
contains
zeroes. Passing to a subsequence to extract a limit profile, we conclude that the normalised limit profile measure
assigns no mass to the horizontal strip
. Thus the associated subharmonic function
is actually harmonic on this strip. But by (ii) and unique continuation this forces
to vanish on this strip, contradicting the functional equation (iii).
Exercise 5 Use limiting profiles to obtain the matching upper bound of
for the number of zeroes in
for sufficiently large
.
Remark 6 One can remove the need to take limiting profiles in the above arguments if one can come up with quantitative (or “hard”) substitutes for qualitative (or “soft”) results such as the unique continuation property for harmonic functions. This would also allow one to replace the qualitative decay rates
with more quantitative decay rates such as
or
. Indeed, the classical proofs of the above theorems come with quantitative bounds that are typically of this form (see e.g. the text of Titchmarsh for details).
Exercise 7 Let
denote the quantity
, where the branch of the argument is taken by using a line segment connecting
to (say)
, and then to
. If we have a sequence
producing normalised limit profiles
for
and the zeroes respectively, show that
converges in the sense of distributions to the function
, or equivalently
Conclude in particular that if the Lindelöf hypothesis holds, then
as
.
A little more about the normalised limit profiles is known unconditionally, beyond (i)-(iv). For instance, from Exercise 3 of Notes 5 we have $|\zeta(\frac{1}{2}+it)| \leq |t|^{1/6+o(1)}$ as $|t| \to \infty$. This implies that any normalised limit profile for $\zeta$ is bounded by $1/6$ on the critical line, beating the bound of $1/4$ that comes from convexity and (ii), (iii). Convexity can then be used to further bound the profile away from the critical line as well. Some further small improvements of this type are known (coming from various methods for estimating exponential sums), though they fall well short of determining the limit profiles completely at our current level of understanding. Of course, since we believe the Riemann hypothesis (and hence the Lindelöf hypothesis) to be true, the only actual limit profile that should exist is $\max(0, \frac{1}{2} - \sigma)$ (in fact this assertion is equivalent to the Lindelöf hypothesis, by the arguments above).
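The convexity bound of $1/4$ mentioned above can be spelled out as a short derivation. As a sketch, write $F$ for a normalised limit profile (our notation). We assume that (ii) gives $F = 0$ for $\sigma > 1$ and that (iii) gives $F(\sigma+it) = \frac{1}{2} - \sigma$ for $\sigma < 0$, and we use the convexity of $\sigma \mapsto \sup_t F(\sigma+it)$ noted earlier. For small $\delta > 0$, interpolating between $\sigma = -\delta$ and $\sigma = 1+\delta$, whose midpoint is $\sigma = \frac{1}{2}$, gives

```latex
\sup_t F(\tfrac{1}{2}+it)
  \le \tfrac{1}{2} \sup_t F(-\delta+it) + \tfrac{1}{2} \sup_t F(1+\delta+it)
  = \tfrac{1}{2}\left(\tfrac{1}{2}+\delta\right) + \tfrac{1}{2} \cdot 0
  = \tfrac{1}{4} + \tfrac{\delta}{2}.
```

Letting $\delta \to 0$ recovers the exponent $1/4$. The subconvexity input from Exercise 3 of Notes 5 replaces it by $1/6$.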
Better control on limiting profiles is available if we do not insist on controlling for all values of the height parameter
, but only for most such values, thanks to the existence of several mean value theorems for the zeta function, as discussed in Notes 6; we discuss this below the fold.
We return to the study of the Riemann zeta function $\zeta(s)$, focusing now on the task of upper bounding the size of this function within the critical strip. As seen in Exercise 43 of Notes 2, such upper bounds can lead to zero-free regions for $\zeta$, which in turn lead to improved estimates for the error term in the prime number theorem.
In equation (21) of Notes 2 we obtained the somewhat crude estimates
for any and
with
and
. Setting
, we obtained the crude estimate
in this region. In particular, if and
then we had
. Using the functional equation and the Hadamard three lines lemma, we can improve this to
; see Supplement 3.
Now we seek better upper bounds on $\zeta$. We will reduce the problem to that of bounding certain exponential sums, in the spirit of Exercise 34 of Supplement 3:
Proposition 1 Let
with
and
. Then
where
.
Proof: We fix a smooth function with
for
and
for
, and allow implied constants to depend on
. Let
with
. From Exercise 34 of Supplement 3, we have
for some sufficiently large absolute constant . By dyadic decomposition, we thus have
We can absorb the first term in the second using the case of the supremum. Writing
, where
it thus suffices to show that
for each . But from the fundamental theorem of calculus, the left-hand side can be written as
and the claim then follows from the triangle inequality and a routine calculation.
We are thus interested in getting good bounds on the sum . More generally, we consider normalised exponential sums of the form
where is an interval of length at most
for some
, and
is a smooth function. We will assume smoothness estimates of the form
for some , all
, and all
, where
is the
-fold derivative of
; in the case
,
of interest for the Riemann zeta function, we easily verify that these estimates hold with
. (One can consider exponential sums under more general hypotheses than (3), but the hypotheses here are adequate for our needs.) We do not bound the zeroth derivative
of
directly, but it would not be natural to do so in any event, since the magnitude of the sum (2) is unaffected if one adds an arbitrary constant to
.
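To get a feel for how much cancellation such sums exhibit in practice, here is a small numerical sketch for the phase relevant to $\zeta$. As an illustrative assumption (not taken from the formulas above, which are missing here), we take $f(x) = -\frac{t}{2\pi}\log x$, so that $e(f(n)) = n^{-it}$ with $e(x) := e^{2\pi i x}$, summed over a dyadic block $N < n \leq 2N$. The script proves nothing. It simply prints $|S|$ next to the trivial bound $N$.

```python
import cmath
import math

def dyadic_sum(t, N):
    """S(t, N) = sum_{N < n <= 2N} n^(-it), i.e. sum of e(f(n)) with f(x) = -(t / 2 pi) log x."""
    return sum(cmath.exp(-1j * t * math.log(n)) for n in range(N + 1, 2 * N + 1))

t = 1.0e6
for N in (10, 100, 1_000, 10_000):
    S = dyadic_sum(t, N)
    # Each term has modulus 1, so the trivial bound is |S| <= N.
    print(f"N={N:>6}  |S|={abs(S):10.3f}  |S|/N={abs(S) / N:.4f}")
```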
The trivial bound for (2) is
and we will seek to obtain significant improvements to this bound. Pseudorandomness heuristics predict a bound of for (2) for any
if
; this assertion (a special case of the exponent pair hypothesis) would have many consequences (for instance, inserting it into Proposition 1 soon yields the Lindelöf hypothesis), but is unfortunately quite far from resolution with known methods. However, we can obtain weaker gains of the form
when
and
depends on
. We present two such results here, which perform well for small and large values of
respectively:
Theorem 2 Let
, let
be an interval of length at most
, and let
be a smooth function obeying (3) for all
and
.
The factor of can be removed by a more careful argument, but we will not need to do so here as we are willing to lose powers of
. The estimate (6) is superior to (5) when
for
large, since (after optimising in
) (5) gives a gain of the form
over the trivial bound, while (6) gives
. We have not attempted to obtain completely optimal estimates here, settling for a relatively simple presentation that still gives good bounds on
, and there are a wide variety of additional exponential sum estimates beyond the ones given here; see Chapter 8 of Iwaniec-Kowalski, or Chapters 3-4 of Montgomery, for further discussion.
We now briefly discuss the strategies of proof of Theorem 2. Both parts of the theorem proceed by treating like a polynomial of degree roughly
; in the case of (ii), this is done explicitly via Taylor expansion, whereas for (i) it is only at the level of analogy. Both parts of the theorem then try to “linearise” the phase to make it a linear function of the summands (actually in part (ii), it is necessary to introduce an additional variable and make the phase a bilinear function of the summands). The van der Corput estimate achieves this linearisation by squaring the exponential sum about
times, which is why the gain is only exponentially small in
. The Vinogradov estimate achieves linearisation by raising the exponential sum to a significantly smaller power – on the order of
– by using Hölder’s inequality in combination with the fact that the discrete curve
becomes roughly equidistributed in the box
after taking the sumset of about
copies of this curve. This latter fact has a precise formulation, known as the Vinogradov mean value theorem, and its proof is the most difficult part of the argument, relying on using a “
-adic” version of this equidistribution to reduce the claim at a given scale
to a smaller scale
with
, and then proceeding by induction.
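For very small parameters, the quantity controlled by the Vinogradov mean value theorem can be computed by brute force. We assume the standard moment-curve formulation: $J_{s,k}(N)$ counts solutions of $x_1^j + \dots + x_s^j = y_1^j + \dots + y_s^j$ for $j = 1, \dots, k$, with all variables in $\{1, \dots, N\}$. The helper name below is hypothetical, and tiny cases say nothing about the asymptotic regime. Two scales are printed alongside the count. The first is $N^s$, from the diagonal solutions $x = y$. The second is $N^{2s - k(k+1)/2}$, which is a lower bound up to constants by Cauchy-Schwarz.

```python
from collections import Counter
from itertools import product

def vinogradov_count(s, k, N):
    """Brute-force J_{s,k}(N): the number of solutions of
        x_1^j + ... + x_s^j = y_1^j + ... + y_s^j   (j = 1, ..., k)
    with all 2s variables in {1, ..., N}. If r(v) counts the s-tuples whose
    power-sum vector is v, then J_{s,k}(N) = sum_v r(v)^2."""
    counts = Counter(
        tuple(sum(x ** j for x in xs) for j in range(1, k + 1))
        for xs in product(range(1, N + 1), repeat=s)
    )
    return sum(c * c for c in counts.values())

for s, k, N in ((2, 2, 10), (3, 2, 8), (3, 3, 6), (4, 2, 6)):
    J = vinogradov_count(s, k, N)
    print(f"s={s} k={k} N={N}:  J={J}  N^s={N ** s}  N^(2s-k(k+1)/2)={N ** (2 * s - k * (k + 1) // 2)}")
```

For $s = k = 2$ the system forces $\{x_1, x_2\} = \{y_1, y_2\}$, so $J_{2,2}(N) = 2N^2 - N$. For example, `vinogradov_count(2, 2, 10)` gives 190.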
One can combine Theorem 2 with Proposition 1 to obtain various bounds on the Riemann zeta function:
Exercise 3 (Subconvexity bound)
- (i) Show that
for all
. (Hint: use the
case of the Van der Corput estimate.)
- (ii) For any
, show that
as
(the decay rate in the
is allowed to depend on
).
Exercise 4 Let
be such that
, and let
.
- (i) (Littlewood bound) Use the van der Corput estimate to show that
whenever
.
- (ii) (Vinogradov-Korobov bound) Use the Vinogradov estimate to show that
whenever
.
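As noted in Exercise 43 of Notes 2, the Vinogradov-Korobov bound leads to the zero-free region $\{\sigma+it: \sigma > 1 - \frac{c}{\log^{2/3}|t| (\log\log|t|)^{1/3}};\ |t| \geq 3\}$ for some absolute constant $c > 0$. This in turn leads to the prime number theorem with error term $\sum_{n \leq x} \Lambda(n) = x + O\left(x \exp\left(-c \frac{\log^{3/5} x}{(\log\log x)^{1/5}}\right)\right)$ for large $x$. If one uses the weaker Littlewood bound instead, one obtains the narrower zero-free region $\{\sigma+it: \sigma > 1 - \frac{c \log\log|t|}{\log|t|};\ |t| \geq 3\}$ (which is only slightly wider than the classical zero-free region) and an error term $O(x \exp(-c\sqrt{\log x \log\log x}))$ in the prime number theorem.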
Exercise 5 (Vinogradov-Korobov in arithmetic progressions) Let
be a non-principal character of modulus
.
- (i) (Vinogradov-Korobov bound) Use the Vinogradov estimate to show that
whenever
and
(Hint: use the Vinogradov estimate and a change of variables to control
for various intervals
of length at most
and residue classes
, in the regime
(say). For
, do not try to capture any cancellation and just use the triangle inequality instead.)
- (ii) Obtain a zero-free region
for
, for some (effective) absolute constant
.
- (iii) Obtain the prime number theorem in arithmetic progressions with error term
whenever
,
,
is primitive, and
depends (ineffectively) on
.
In Notes 2, the Riemann zeta function $\zeta$ (and more generally, the Dirichlet $L$-functions $L(\cdot,\chi)$) was extended meromorphically into the region $\{s: \mathrm{Re}(s) > 0\}$ in and to the right of the critical strip. This is a sufficient amount of meromorphic continuation for many applications in analytic number theory, such as establishing the prime number theorem and its variants. The zeroes of the zeta function in the critical strip $\{s: 0 < \mathrm{Re}(s) < 1\}$ are known as the non-trivial zeroes of $\zeta$. Thanks to the truncated explicit formulae developed in Notes 2, they control the asymptotic distribution of the primes (up to small errors).
The function $\zeta$ obeys the trivial functional equation
$$\zeta(\bar{s}) = \overline{\zeta(s)} \qquad (1)$$
for all $s$ in its domain of definition. Indeed, $\zeta(s)$ is real-valued when $s$ is real, so the function $\zeta(s) - \overline{\zeta(\bar{s})}$ vanishes on the real line. It is also meromorphic, and hence vanishes everywhere. Similarly one has the functional equation
$$\overline{L(\bar{s}, \chi)} = L(s, \overline{\chi}). \qquad (2)$$
From these equations we see that the zeroes of the zeta function are symmetric across the real axis, and that the zeroes of $L(\cdot, \overline{\chi})$ are the reflections of the zeroes of $L(\cdot, \chi)$ across this axis.
It is a remarkable fact that these functions obey an additional, and more non-trivial, functional equation, this time establishing a symmetry across the critical line rather than the real axis. One consequence of this symmetry is that the zeta function and $L$-functions may be extended meromorphically to the entire complex plane. For the zeta function, the functional equation was discovered by Riemann, and reads as follows:
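Theorem 1 (Functional equation for the Riemann zeta function) The Riemann zeta function $\zeta$ extends meromorphically to the entire complex plane, with a simple pole at $s = 1$ and no other poles. Furthermore, one has the functional equation
$$\zeta(s) = \alpha(s) \zeta(1-s) \qquad (3)$$
for all complex $s$ other than $s = 0, 1$, where $\alpha$ is the function
$$\alpha(s) := 2^s \pi^{s-1} \sin\left(\frac{\pi s}{2}\right) \Gamma(1-s).$$
Here $\cos(z) := \frac{e^{iz} + e^{-iz}}{2}$ and $\sin(z) := \frac{e^{iz} - e^{-iz}}{2i}$ are the complex-analytic extensions of the classical trigonometric functions $\cos(x)$ and $\sin(x)$, and $\Gamma$ is the Gamma function, whose definition and properties we review below the fold.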
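The functional equation $\zeta(s) = 2^s \pi^{s-1} \sin(\frac{\pi s}{2}) \Gamma(1-s) \zeta(1-s)$ can be spot-checked numerically at real points. This is a minimal sketch under stated assumptions. The helper names are hypothetical, and $\zeta$ is evaluated by Euler-Maclaurin summation with two Bernoulli correction terms. That summation is a standard way to continue $\sum n^{-s}$ past $\mathrm{Re}(s) = 1$, but it is our choice, not the method of these notes.

```python
import math

def zeta_em(s, N=1000):
    """zeta(s) for real s != 1 via Euler-Maclaurin summation (two Bernoulli correction terms):
    zeta(s) ~ sum_{n<N} n^-s + N^(1-s)/(s-1) + N^-s/2 + s N^(-s-1)/12 - s(s+1)(s+2) N^(-s-3)/720."""
    head = math.fsum(n ** -s for n in range(1, N))
    tail = (N ** (1 - s) / (s - 1) + 0.5 * N ** -s
            + s * N ** (-s - 1) / 12
            - s * (s + 1) * (s + 2) * N ** (-s - 3) / 720)
    return head + tail

def alpha(s):
    """alpha(s) = 2^s pi^(s-1) sin(pi s/2) Gamma(1-s), for real s not a positive integer."""
    return 2 ** s * math.pi ** (s - 1) * math.sin(math.pi * s / 2) * math.gamma(1 - s)

for s in (0.3, 0.7, 2.5, -1.5):
    print(f"s={s:5}: zeta(s)={zeta_em(s):.10f}  alpha(s)*zeta(1-s)={alpha(s) * zeta_em(1 - s):.10f}")
# Classical special case: zeta(-1) = alpha(-1) * zeta(2) = -1/12.
print(alpha(-1) * zeta_em(2), -1 / 12)
```

For negative $s$ the head and tail of the Euler-Maclaurin sum nearly cancel, so expect agreement only to roughly seven or eight digits there.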
The functional equation can be placed in a more symmetric form as follows:
Corollary 2 (Functional equation for the Riemann xi function) The Riemann xi function
$$\xi(s) := \frac{1}{2} s(s-1) \pi^{-s/2} \Gamma\left(\frac{s}{2}\right) \zeta(s)$$
is analytic on the entire complex plane $\mathbb{C}$ (after removing all removable singularities), and obeys the functional equations
$$\xi(\bar{s}) = \overline{\xi(s)}$$
and
$$\xi(s) = \xi(1-s).$$
In particular, the zeroes of $\xi$ consist precisely of the non-trivial zeroes of $\zeta$, and are symmetric about both the real axis and the critical line. Also, $\xi$ is real-valued on the critical line and on the real axis.
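The symmetric form $\xi(s) = \xi(1-s)$ is easy to test at real points, reusing the same Euler-Maclaurin evaluation of $\zeta$. This is again a sketch with hypothetical helper names, using only `math.gamma` from the standard library. Restricting to real $s$ avoids needing a complex Gamma function.

```python
import math

def zeta_em(s, N=1000):
    """zeta(s) for real s != 1 via Euler-Maclaurin summation (two Bernoulli correction terms)."""
    head = math.fsum(n ** -s for n in range(1, N))
    tail = (N ** (1 - s) / (s - 1) + 0.5 * N ** -s
            + s * N ** (-s - 1) / 12
            - s * (s + 1) * (s + 2) * N ** (-s - 3) / 720)
    return head + tail

def xi(s):
    """Riemann xi function (1/2) s (s-1) pi^(-s/2) Gamma(s/2) zeta(s), for real s
    with s != 1 and s/2 not a pole of Gamma."""
    return 0.5 * s * (s - 1) * math.pi ** (-s / 2) * math.gamma(s / 2) * zeta_em(s)

for s in (0.1, 0.3, 2.5):
    print(f"s={s}: xi(s)={xi(s):.12f}  xi(1-s)={xi(1 - s):.12f}")
print("xi(1/2) =", xi(0.5))
```

Unlike $\zeta$ itself, $\xi$ takes the same value at $s$ and $1-s$ with no Gamma or trigonometric factor in between. This is the sense in which Corollary 2 is the "symmetric" form of Theorem 1.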
Corollary 2 is an easy consequence of Theorem 1 together with the duplication theorem for the Gamma function and the fact that $\zeta$ has no zeroes to the right of the critical strip; it is left as an exercise to the reader (Exercise 19). The functional equation in Theorem 1 has many proofs, but most of them are related in one way or another to the Poisson summation formula
$$\sum_{n \in \mathbb{Z}} f(n) = \sum_{m \in \mathbb{Z}} \hat{f}(m) \qquad (8)$$
(Theorem 34 from Supplement 2, at least in the case when $f$ is twice continuously differentiable and compactly supported). This formula can be viewed as a Fourier-analytic link between the coarse-scale distribution of the integers and the fine-scale distribution of the integers. Indeed, there is a quick heuristic proof of the functional equation that comes from formally applying the Poisson summation formula to the function
$x \mapsto 1_{x>0} x^{-s}$, and noting that the functions $x \mapsto 1_{x>0} x^{-s}$ and $y \mapsto |y|^{s-1}$ are formally Fourier transforms of each other, up to some Gamma function factors, as well as some trigonometric factors arising from the distinction between the real line and the half-line. Such a heuristic proof can indeed be made rigorous, and we do so below the fold, while also providing Riemann’s two classical proofs of the functional equation.
From the functional equation (and the poles of the Gamma function), one can see that $\zeta$ has trivial zeroes at the negative even integers $-2, -4, -6, \dots$, in addition to the non-trivial zeroes in the critical strip. More generally, the following table summarises the zeroes and poles of the various special functions appearing in the functional equation, after they have been meromorphically extended to the entire complex plane. Zeroes are classified as “non-trivial” or “trivial” depending on whether or not they lie in the critical strip. (Exponential functions such as $2^s$ or $\pi^{s-1}$ have no zeroes or poles, and will be ignored in this table; the zeroes and poles of rational functions such as $s(s-1)$ are self-evident and will also not be displayed here.)
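| Function | Non-trivial zeroes | Trivial zeroes | Poles |
|---|---|---|---|
| $\zeta(s)$ | Yes | $-2, -4, -6, \dots$ | $1$ |
| $\zeta(1-s)$ | Yes | $3, 5, 7, \dots$ | $0$ |
| $\sin(\pi s/2)$ | No | Even integers | No |
| $\cos(\pi s/2)$ | No | Odd integers | No |
| $\sin(\pi s)$ | No | Integers | No |
| $\Gamma(s)$ | No | No | $0, -1, -2, \dots$ |
| $\Gamma(s/2)$ | No | No | $0, -2, -4, \dots$ |
| $\Gamma(1-s)$ | No | No | $1, 2, 3, \dots$ |
| $\Gamma((1-s)/2)$ | No | No | $1, 3, 5, \dots$ |
| $\xi(s)$ | Yes | No | No |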
Among other things, this table indicates that the Gamma and trigonometric factors in the functional equation are tied to the trivial zeroes and poles of zeta, but have no direct bearing on the distribution of the non-trivial zeroes, which is the most important feature of the zeta function for the purposes of analytic number theory, beyond the fact that they are symmetric about the real axis and critical line. In particular, the Riemann hypothesis is not going to be resolved just from further analysis of the Gamma function!
The zeta function computes the “global” sum $\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s}$, with $n$ ranging all the way from $1$ to infinity. However, by some Fourier-analytic (or complex-analytic) manipulation, it is possible to use the zeta function to also control more “localised” sums, such as $\sum_n \frac{1}{n^s} \psi\left(\frac{n}{N}\right)$ for some $N \geq 1$ and some smooth compactly supported function $\psi: (0,+\infty) \to \mathbb{C}$. It turns out that the functional equation (3) for the zeta function localises to this context, giving an approximate functional equation which roughly speaking takes the form
whenever and
; see Theorem 39 below for a precise formulation of this equation. Unsurprisingly, this form of the functional equation is also very closely related to the Poisson summation formula (8), indeed it is essentially a special case of that formula (or more precisely, of the van der Corput
-process). This useful identity relates long smoothed sums of
to short smoothed sums of
(or vice versa), and can thus be used to shorten exponential sums involving terms such as
, which is useful when obtaining some of the more advanced estimates on the Riemann zeta function.
We will give two other basic uses of the functional equation. The first is to get a good count (as opposed to merely an upper bound) on the density of zeroes in the critical strip, establishing the Riemann-von Mangoldt formula: the number of zeroes with imaginary part between $0$ and $T$ is $\frac{T}{2\pi} \log \frac{T}{2\pi} - \frac{T}{2\pi} + O(\log T)$ for large $T$. The other is to obtain untruncated versions of the explicit formula from Notes 2, giving a remarkable exact formula for sums involving the von Mangoldt function in terms of zeroes of the Riemann zeta function. These results are not strictly necessary for most of the material in the rest of the course, but they certainly help to clarify the nature of the Riemann zeta function and its relation to the primes.
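The main term of the Riemann-von Mangoldt formula is easy to tabulate. The text only needs an $O(\log T)$ error. The sketch below also includes the classical constant $7/8$ from the sharper form $N(T) = \frac{T}{2\pi}\log\frac{T}{2\pi} - \frac{T}{2\pi} + \frac{7}{8} + S(T) + O(1/T)$, with $S(T) = \frac{1}{\pi}\arg\zeta(\frac{1}{2}+iT)$ ignored. The function name is hypothetical. For comparison, there are exactly 29 zeroes with imaginary part in $(0, 100)$.

```python
import math

def rvm_main_term(T):
    """Main term (T/2pi) log(T/2pi) - T/2pi + 7/8 of the zero-counting function N(T);
    the fluctuating term S(T) and the O(1/T) error are ignored."""
    x = T / (2 * math.pi)
    return x * math.log(x) - x + 7 / 8

for T in (100, 1_000, 10_000, 100_000):
    print(T, round(rvm_main_term(T), 3))
```

At $T = 100$ this gives about $29.002$, matching the 29 zeroes below height 100.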
In view of the material in previous notes, it should not be surprising that there are analogues of all of the above theory for Dirichlet $L$-functions $L(s,\chi)$. We will restrict attention to primitive characters $\chi$, since the $L$-function of an imprimitive character merely differs from the $L$-function of the associated primitive character by a finite Euler product. Indeed, if $\chi = \chi' \chi_0$ for some principal character $\chi_0$ whose modulus $q_0$ is coprime to that of $\chi'$, then
$$L(s,\chi) = L(s,\chi') \prod_{p | q_0} \left(1 - \frac{\chi'(p)}{p^s}\right) \qquad (9)$$
(cf. equation (45) of Notes 2).
The main new feature is that the Poisson summation formula needs to be “twisted” by a Dirichlet character $\chi$. This boils down to the problem of understanding the finite (additive) Fourier transform of a Dirichlet character, which is achieved by the classical theory of Gauss sums, reviewed below the fold. There is one new wrinkle: the value of $\chi(-1) \in \{-1, +1\}$ plays a role in the functional equation. More precisely, we have
Theorem 3 (Functional equation for $L$-functions) Let $\chi$ be a primitive character of modulus $q$ with $q > 1$. Then $L(s,\chi)$
extends to an entire function on the complex plane, with
or equivalently
for all
, where
is equal to
in the even case
and
in the odd case
, and
where
is the Gauss sum
and
, with the convention that the
-periodic function
is also (by abuse of notation) applied to
in the cyclic group
.
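Gauss sums are concrete enough to compute directly. The sketch below assumes the standard normalisation $\tau(\chi) = \sum_{n \in \mathbb{Z}/q\mathbb{Z}} \chi(n) e(n/q)$ with $e(x) := e^{2\pi i x}$. It uses the Legendre symbol modulo an odd prime $p$ as a test character, which is a primitive real character. It checks the general fact $|\tau(\chi)| = \sqrt{q}$ for primitive $\chi$, together with Gauss's evaluation $\tau = \sqrt{p}$ for $p \equiv 1 \pmod 4$ (even case) and $\tau = i\sqrt{p}$ for $p \equiv 3 \pmod 4$ (odd case). The helper names are hypothetical.

```python
import cmath
import math

def legendre(n, p):
    """Legendre symbol (n/p) for an odd prime p, via Euler's criterion n^((p-1)/2) mod p."""
    r = pow(n, (p - 1) // 2, p)
    return 0 if r == 0 else (1 if r == 1 else -1)

def gauss_sum(chi, q):
    """tau(chi) = sum over n mod q of chi(n) e(n/q), where e(x) = exp(2 pi i x)."""
    return sum(chi(n) * cmath.exp(2j * math.pi * n / q) for n in range(q))

for p in (5, 7, 11, 13):
    tau = gauss_sum(lambda n: legendre(n, p), p)
    kappa = 0 if legendre(p - 1, p) == 1 else 1  # 0 if chi(-1) = +1 (even), 1 if odd
    print(f"p={p:2}  kappa={kappa}  tau={tau:.6f}  |tau|={abs(tau):.6f}  sqrt(p)={math.sqrt(p):.6f}")
```

In both cases $\tau(\chi) = i^\kappa \sqrt{p}$. This illustrates how the parity of $\chi$ enters the functional equation.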
From this functional equation and (2) we see that, as with the Riemann zeta function, the non-trivial zeroes of $L(s,\chi)$ (defined as the zeroes within the critical strip $\{0 < \mathrm{Re}(s) < 1\}$) are symmetric around the critical line. If $\chi$ is real, they are also symmetric around the real axis. In addition, $L(s,\chi)$ acquires trivial zeroes at the negative even integers and at zero if $\chi(-1) = +1$, and at the negative odd integers if $\chi(-1) = -1$. For imprimitive $\chi$, we see from (9) that $L(s,\chi)$ also acquires some additional trivial zeroes on the left edge of the critical strip.
There is also a symmetric version of this equation, analogous to Corollary 2:
Corollary 4 Let
be as above, and set
then
is entire with
.
For further detail on the functional equation and its implications, I recommend the classic text of Titchmarsh or the text of Davenport.
In Notes 1, we approached multiplicative number theory (the study of multiplicative functions $f$ and their relatives) via elementary methods, in which attention was primarily focused on obtaining asymptotic control on summatory functions $\sum_{n \leq x} f(n)$ and logarithmic sums $\sum_{n \leq x} \frac{f(n)}{n}$. Now we turn to the complex approach to multiplicative number theory, in which the focus is instead on obtaining various types of control on the Dirichlet series $\mathcal{D} f$, defined (at least for $s$ of sufficiently large real part) by the formula
$$\mathcal{D} f(s) := \sum_{n=1}^\infty \frac{f(n)}{n^s}.$$
These series also made an appearance in the elementary approach to the subject, but only for real $s$ that were larger than $1$. Now we will exploit the freedom to extend the variable $s$ to the complex domain. This gives enough freedom (in principle, at least) to recover control of elementary sums such as $\sum_{n \leq x} f(n)$ or $\sum_{n \leq x} \frac{f(n)}{n}$ from control on the Dirichlet series. Crucially, for many key functions $f$ of number-theoretic interest, the Dirichlet series $\mathcal{D} f$ can be analytically (or at least meromorphically) continued to the left of the line $\{\mathrm{Re}(s) = 1\}$. The zeroes and poles of the resulting meromorphic continuations of $\mathcal{D} f$ (and of related functions) then turn out to control the asymptotic behaviour of the elementary sums of $f$; the more one knows about the former, the more one knows about the latter. In particular, knowledge of where the zeroes of the Riemann zeta function $\zeta(s)$ are located can give very precise information about the distribution of the primes, by means of a fundamental relationship known as the explicit formula. There are many ways of phrasing this explicit formula (both in exact and in approximate forms), but they are all trying to formalise an approximation to the von Mangoldt function $\Lambda$ (and hence to the primes) of the form
$$\Lambda(n) \approx 1 - \sum_\rho n^{\rho-1} \qquad (1)$$
where the sum is over zeroes $\rho$ (counting multiplicity) of the Riemann zeta function $\zeta$ (with the sum often restricted so that $\rho$ has large real part and bounded imaginary part). The approximation is in a suitable weak sense, so that
$$\sum_n \Lambda(n) g(n) \approx \int_0^\infty g(y)\, dy - \sum_\rho \int_0^\infty g(y) y^{\rho-1}\, dy \qquad (2)$$
for suitable “test functions” $g$. In practice these are restricted to be fairly smooth and slowly varying, with the precise amount of restriction depending on how much truncation in the sum over zeroes one wishes to take. Among other things, such approximations can be used to rigorously establish the prime number theorem
$$\sum_{n \leq x} \Lambda(n) = x + o(x) \qquad (3)$$
as $x \to \infty$, with the size of the error term $o(x)$ closely tied to the location of the zeroes $\rho$ of the Riemann zeta function.
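The prime number theorem in the form $\sum_{n \leq x} \Lambda(n) = x + o(x)$ can be observed numerically with a sieve. This is a minimal sketch with a hypothetical function name. It sums $\log p$ over all prime powers $p^k \leq x$, which is exactly $\sum_{n \leq x} \Lambda(n)$.

```python
import math

def chebyshev_psi(x):
    """psi(x) = sum_{n <= x} Lambda(n), computed as the sum of log p over prime powers p^k <= x."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    total = 0.0
    for p in range(2, x + 1):
        if sieve[p]:
            pk = p
            while pk <= x:  # each prime power p^k <= x contributes Lambda(p^k) = log p
                total += math.log(p)
                pk *= p
    return total

for x in (10**3, 10**4, 10**5):
    psi = chebyshev_psi(x)
    print(x, round(psi, 2), psi / x)
```

As a sanity check, $e^{\psi(x)}$ is the least common multiple of $1, \dots, \lfloor x \rfloor$. For example, $\psi(10) = \log 2520$.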
The explicit formula (1) (or any of its more rigorous forms) is closely tied to the counterpart approximation
$$-\frac{\zeta'(s)}{\zeta(s)} \approx \frac{1}{s-1} - \sum_\rho \frac{1}{s-\rho} \qquad (4)$$
for the Dirichlet series $\mathcal{D}\Lambda = -\frac{\zeta'}{\zeta}$ of the von Mangoldt function; note that (4) is formally the special case of (2) when $g(n) = n^{-s}$. Such approximations come from the general theory of local factorisations of meromorphic functions, as discussed in Supplement 2. The passage from (4) to (2) is accomplished by such tools as the residue theorem and the Fourier inversion formula, which were also covered in Supplement 2. The relative ease of uncovering the Fourier-like duality between primes and zeroes (sometimes referred to poetically as the “music of the primes”) is one of the major advantages of the complex-analytic approach to multiplicative number theory. This important duality tends to be rather obscured in the other approaches to the subject, although in principle it can still be discerned with sufficient effort.
More generally, one has an explicit formula
$$\Lambda(n) \chi(n) \approx -\sum_\rho n^{\rho-1} \qquad (5)$$
for any (non-principal) Dirichlet character $\chi$, where $\rho$ now ranges over the zeroes of the associated Dirichlet $L$-function $L(s,\chi)$; we view this formula as a “twist” of (1) by the Dirichlet character $\chi$. The explicit formula (5), proven similarly to (1) (in any of its rigorous forms), is important in establishing the prime number theorem in arithmetic progressions. That theorem asserts that
$$\sum_{n \leq x: n = a\ (q)} \Lambda(n) = \frac{x}{\phi(q)} + o(x)$$
as $x \to \infty$, whenever $a\ (q)$ is a fixed primitive residue class. Again, the size of the error term $o(x)$ here is closely tied to the location of the zeroes of the Dirichlet $L$-function. Particular importance attaches to whether there is a zero very close to $s = 1$ (such a zero is known as an exceptional zero or Siegel zero).
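The same sieve gives a quick look at the prime number theorem in arithmetic progressions. The sketch below uses the modulus $q = 4$ as an illustrative choice. Since $\phi(4) = 2$, each of the primitive classes $1$ and $3 \pmod 4$ should receive about half of $\psi(x)$. The helper name is hypothetical.

```python
import math

def psi_progression(x, q, a):
    """psi(x; q, a) = sum of Lambda(n) over n <= x with n = a (mod q)."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    total = 0.0
    for p in range(2, x + 1):
        if sieve[p]:
            pk = p
            while pk <= x:  # prime power p^k contributes log p if it lies in the class a mod q
                if pk % q == a % q:
                    total += math.log(p)
                pk *= p
    return total

x, q = 10**5, 4
for a in (1, 3):
    value = psi_progression(x, q, a)
    print(a, round(value, 1), value / (x / 2))  # phi(4) = 2
```

The small discrepancy between the two classes is the familiar Chebyshev bias. It is far smaller than the main term at this height.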
While any information on the behaviour of zeta functions or -functions is in principle welcome for the purposes of analytic number theory, some regions of the complex plane are more important than others in this regard, due to the differing weights assigned to each zero in the explicit formula. Roughly speaking, in descending order of importance, the most crucial regions on which knowledge of these functions is useful are
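- The region on or near the point $s = 1$.
- The region on or near the right edge $\{\mathrm{Re}(s) = 1\}$ of the critical strip $\{0 < \mathrm{Re}(s) < 1\}$.
- The right half $\{\frac{1}{2} < \mathrm{Re}(s) < 1\}$ of the critical strip.
- The region on or near the critical line $\{\mathrm{Re}(s) = \frac{1}{2}\}$ that bisects the critical strip.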
- Everywhere else.
For instance:
- We will shortly show that the Riemann zeta function $\zeta(s)$ has a simple pole at $s = 1$ with residue $1$. This is already sufficient to recover many of the classical theorems of Mertens discussed in the previous set of notes, as well as results on mean values of multiplicative functions such as the divisor function. For Dirichlet $L$-functions, the behaviour is instead controlled by the quantity $L(1,\chi)$ discussed in Notes 1, which is in turn closely tied to the existence and location of a Siegel zero.
- The zeta function is also known to have no zeroes on the right edge $\{\mathrm{Re}(s) = 1\}$ of the critical strip, which is sufficient to prove (and is in fact equivalent to) the prime number theorem. Any enlargement of the zero-free region for $\zeta$ into the critical strip leads to improved error terms in that theorem, with larger zero-free regions leading to stronger error estimates. Similarly for $L$-functions and the prime number theorem in arithmetic progressions.
- The (as yet unproven) Riemann hypothesis prohibits $\zeta$ from having any zeroes within the right half $\{\frac{1}{2} < \mathrm{Re}(s) < 1\}$ of the critical strip. It gives very good control on the number of primes in intervals, even when the intervals are relatively short compared to the size of the entries. Even without assuming the Riemann hypothesis, zero density estimates in this region are available that give some partial control of this form. Similarly for $L$-functions, primes in short arithmetic progressions, and the generalised Riemann hypothesis.
- Assuming the Riemann hypothesis, further distributional information about the zeroes on the critical line (such as Montgomery’s pair correlation conjecture, or the more general GUE hypothesis) can give finer information about the error terms in the prime number theorem in short intervals, as well as other arithmetic information. Again, one has analogues for $L$-functions and primes in short arithmetic progressions.
- The functional equation of the zeta function describes the behaviour of $\zeta$ to the left of the critical line in terms of the behaviour to the right of the critical line. This is useful for building a “global” picture of the structure of the zeta function, and for improving a number of estimates about that function. However, in the absence of unproven conjectures such as the Riemann hypothesis or the pair correlation conjecture, it turns out that many of the basic analytic number theory results using the zeta function can be established without relying on this equation. Similarly for $L$-functions.
Remark 1 If one takes an “adelic” viewpoint, one can unite the Riemann zeta function $\zeta(s)$ and all of the $L$-functions $L(s,\chi)$ for various Dirichlet characters $\chi$ into a single object, viewing $n \mapsto \chi(n) n^{-it}$ as a general multiplicative character on the adeles. Thus the imaginary coordinate $t$ and the Dirichlet character $\chi$ are really the Archimedean and non-Archimedean components respectively of a single adelic frequency parameter. This viewpoint was famously developed in Tate’s thesis, which among other things helps to clarify the nature of the functional equation, as discussed in this previous post. We will not pursue the adelic viewpoint further in these notes, but it does supply a “high-level” explanation for why so much of the theory of the Riemann zeta function extends to the Dirichlet $L$-functions. (The non-Archimedean character $\chi(n)$ and the Archimedean character $n^{-it}$ behave similarly from an algebraic point of view, but not so much from an analytic point of view. As such, the adelic viewpoint is well suited for algebraic tasks (such as establishing the functional equation), but not for analytic tasks (such as establishing a zero-free region).)
Roughly speaking, the elementary multiplicative number theory from Notes 1 corresponds to the information one can extract from the complex-analytic method in region 1 of the above hierarchy, while the more advanced elementary number theory used to prove the prime number theorem (and which we will not cover in full detail in these notes) corresponds to what one can extract from regions 1 and 2.
As a consequence of this hierarchy of importance, information about the $\zeta$ function away from the critical strip is of relatively little direct importance in analytic prime number theory. Examples include Euler’s identity $\zeta(2) = \frac{\pi^2}{6}$, or equivalently $1 + \frac{1}{2^2} + \frac{1}{3^2} + \dots = \frac{\pi^2}{6}$, and the infamous identity $\zeta(-1) = -\frac{1}{12}$, which is often presented (slightly misleadingly, if one’s conventions for divergent summation are not made explicit) as $1 + 2 + 3 + \dots = -\frac{1}{12}$. Such identities are still of interest for some other, non-number-theoretic, applications. (The quantity $\frac{\pi^2}{6}$ does play a minor role as a normalising factor in some asymptotics, see e.g. Exercise 28 from Notes 1, but its precise value is usually not of major importance.) In contrast, the value $L(1,\chi)$ of an $L$-function at $s = 1$ turns out to be extremely important in analytic number theory, with many results in this subject relying ultimately on a non-trivial lower bound on this quantity coming from Siegel’s theorem, discussed below the fold.
For a more in-depth treatment of the topics in this set of notes, see Davenport’s “Multiplicative number theory“.
Mertens’ theorems are a set of classical estimates concerning the asymptotic distribution of the prime numbers:
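Theorem 1 (Mertens’ theorems) In the asymptotic limit $x \to \infty$, we have
$$\sum_{p \leq x} \frac{\log p}{p} = \log x + O(1) \qquad (1)$$
$$\sum_{p \leq x} \frac{1}{p} = \log\log x + O(1) \qquad (2)$$
$$\sum_{p \leq x} \log\left(1 - \frac{1}{p}\right) = -\log\log x - \gamma + o(1) \qquad (3)$$
where $\gamma$ is the Euler-Mascheroni constant, defined by requiring that
$$1 + \frac{1}{2} + \dots + \frac{1}{n} = \log n + \gamma + o(1)$$
in the limit $n \to \infty$.
The third theorem (3) is usually stated in exponentiated form
$$\prod_{p \leq x} \left(1 - \frac{1}{p}\right) = \frac{e^{-\gamma} + o(1)}{\log x},$$
but in the logarithmic form (3) we see that it is strictly stronger than (2), in view of the asymptotic $\log\left(1 - \frac{1}{p}\right) = -\frac{1}{p} + O\left(\frac{1}{p^2}\right)$.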
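All three of Mertens' statements can be sanity-checked numerically. The sketch below computes $\sum_{p \leq x} \frac{\log p}{p} - \log x$, which should stay bounded. It also computes $\sum_{p \leq x} \frac{1}{p} - \log\log x$, which is known to tend to the Meissel-Mertens constant $M \approx 0.2615$, and $\sum_{p \leq x} \log(1 - \frac{1}{p}) + \log\log x$, which should tend to $-\gamma$. The numerical values of $\gamma$ and $M$ are hard-coded assumptions of the sketch, and the helper names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329      # Euler-Mascheroni constant (hard-coded)
MEISSEL_MERTENS = 0.2614972128476428  # limit of sum 1/p - log log x (hard-coded)

def primes_up_to(x):
    """Sieve of Eratosthenes: list of primes p <= x (x >= 2)."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return [n for n in range(2, x + 1) if sieve[n]]

def mertens_errors(x):
    """Normalised remainders in Mertens' three theorems at height x."""
    ps = primes_up_to(x)
    loglog = math.log(math.log(x))
    first = math.fsum(math.log(p) / p for p in ps) - math.log(x)  # stays bounded
    second = math.fsum(1 / p for p in ps) - loglog                # tends to MEISSEL_MERTENS
    third = math.fsum(math.log1p(-1 / p) for p in ps) + loglog    # tends to -EULER_GAMMA
    return first, second, third

for x in (10**3, 10**4, 10**5):
    first, second, third = mertens_errors(x)
    ratio = math.exp(third + EULER_GAMMA)  # = prod(1 - 1/p) * log(x) * e^gamma
    print(f"x={x:>6}: {first:+.4f}  {second:.4f}  {third:+.4f}  ratio={ratio:.5f}")
```

The last column is the exponentiated form rescaled so that it should approach $1$.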
Remarkably, these theorems can be proven without the assistance of the prime number theorem
$$\pi(x) = (1 + o(1)) \frac{x}{\log x},$$
which was proven about two decades after Mertens’ work. (But one can certainly use versions of the prime number theorem with good error term, together with summation by parts, to obtain good estimates on the various errors in Mertens’ theorems.) Roughly speaking, the reason is that Mertens’ theorems only require control on the Riemann zeta function in the neighbourhood of the pole at $s = 1$. By contrast (as discussed in this previous post), the prime number theorem requires control on the zeta function on (a neighbourhood of) the line $\{\mathrm{Re}(s) = 1\}$. Specifically, Mertens’ theorems are ultimately deduced from the Euler product formula
$$\zeta(s) = \prod_p \left(1 - \frac{1}{p^s}\right)^{-1},$$
valid in the region $\mathrm{Re}(s) > 1$ (which is ultimately a Fourier-Dirichlet transform of the fundamental theorem of arithmetic), and the following crude asymptotics:
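Proposition 2 (Simple pole) For $s$ sufficiently close to $1$ with $s > 1$, we have
$$\zeta(s) = \frac{1}{s-1} + O(1)$$
and
$$\zeta'(s) = -\frac{1}{(s-1)^2} + O(1).$$
Proof: For $s$ as in the proposition, we have
$$\frac{1}{n^s} = \frac{1}{t^s} + O\left(\frac{1}{n^2}\right)$$
for any natural number $n$ and $n \leq t \leq n+1$, and hence
$$\frac{1}{n^s} = \int_n^{n+1} \frac{dt}{t^s} + O\left(\frac{1}{n^2}\right).$$
Summing in $n$ and using the identity $\int_1^\infty \frac{dt}{t^s} = \frac{1}{s-1}$, we obtain the first claim. Similarly, we have
$$\frac{\log n}{n^s} = \int_n^{n+1} \frac{\log t}{t^s}\, dt + O\left(\frac{\log n}{n^2}\right),$$
and by summing in $n$ and using the identity $\int_1^\infty \frac{\log t}{t^s}\, dt = \frac{1}{(s-1)^2}$ (the derivative of the previous identity), we obtain the second claim.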
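Proposition 2 can be watched numerically as $s \to 1^+$. Both $\zeta(s) - \frac{1}{s-1}$ and $\zeta'(s) + \frac{1}{(s-1)^2}$ should stay bounded, and the first in fact tends to $\gamma$, in line with the refinement discussed in the next paragraph. This is a sketch under stated assumptions. $\zeta$ is evaluated by Euler-Maclaurin summation and $\zeta'$ by a central difference, both our own choices. The helper names are hypothetical, and $\gamma$ is hard-coded.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (hard-coded)

def zeta_em(s, N=1000):
    """zeta(s) for real s != 1 via Euler-Maclaurin summation (two Bernoulli correction terms)."""
    head = math.fsum(n ** -s for n in range(1, N))
    tail = (N ** (1 - s) / (s - 1) + 0.5 * N ** -s
            + s * N ** (-s - 1) / 12
            - s * (s + 1) * (s + 2) * N ** (-s - 3) / 720)
    return head + tail

def zeta_prime(s, rel_step=1e-4):
    """Central-difference estimate of zeta'(s), with step proportional to s - 1."""
    h = rel_step * (s - 1)
    return (zeta_em(s + h) - zeta_em(s - h)) / (2 * h)

for eps in (1e-1, 1e-2, 1e-3):
    s = 1 + eps
    pole_part = zeta_em(s) - 1 / (s - 1)           # bounded; in fact tends to gamma
    deriv_part = zeta_prime(s) + 1 / (s - 1) ** 2  # bounded
    print(f"s - 1 = {eps:g}:  {pole_part:.6f}  {deriv_part:.4f}")
```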
The first two of Mertens’ theorems (1), (2) are relatively easy to prove. They imply the third theorem (3), except with $\gamma$ replaced by an unspecified absolute constant. Getting the specific constant $\gamma$ requires a little additional effort. From (4), one might expect that the appearance of $\gamma$ arises from the refinement
$$\zeta(s) = \frac{1}{s-1} + \gamma + O(s-1)$$
that one can obtain to (6). However, it turns out that the connection is not so much with the zeta function as with the Gamma function, and specifically with the identity $\Gamma'(1) = -\gamma$. This identity is of course related to (7) through the functional equation for zeta, but can be proven without any reference to zeta functions. More specifically, we have the following asymptotic for the exponential integral:
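Proposition 3 (Exponential integral asymptotics) For sufficiently small $\varepsilon > 0$, one has
$$\int_\varepsilon^\infty \frac{e^{-t}}{t}\, dt = \log \frac{1}{\varepsilon} - \gamma + O(\varepsilon).$$
A routine integration by parts shows that this asymptotic is equivalent to the identity
$$\int_0^\infty e^{-t} \log t\, dt = -\gamma,$$
which is the identity mentioned previously.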
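Proof: We start by using the identity $\frac{1}{n} = \int_0^1 x^{n-1}\, dx$ to express the harmonic series $H_N := 1 + \frac{1}{2} + \dots + \frac{1}{N}$ as
$$H_N = \int_0^1 (1 + x + \dots + x^{N-1})\, dx,$$
or, on summing the geometric series,
$$H_N = \int_0^1 \frac{1 - x^N}{1-x}\, dx.$$
Since $\int_0^{1-1/N} \frac{dx}{1-x} = \log N$, we thus have
$$H_N - \log N = \int_0^1 \frac{1_{[1-1/N,1]}(x) - x^N}{1-x}\, dx;$$
making the change of variables $x = 1 - \frac{t}{N}$, this becomes
$$H_N - \log N = \int_0^N \frac{1_{[0,1]}(t) - (1 - \frac{t}{N})^N}{t}\, dt.$$
As $N \to \infty$, the integrand $\frac{1_{[0,1]}(t) - (1 - t/N)^N}{t}$ (extended by zero for $t > N$) converges pointwise to $\frac{1_{[0,1]}(t) - e^{-t}}{t}$ and is pointwise dominated by $O(e^{-t})$. Taking limits as $N \to \infty$ using dominated convergence, we conclude that
$$\gamma = \int_0^\infty \frac{1_{[0,1]}(t) - e^{-t}}{t}\, dt,$$
or equivalently
$$\int_0^\infty \frac{e^{-t} - 1_{[0,\varepsilon]}(t)}{t}\, dt = \log \frac{1}{\varepsilon} - \gamma.$$
The claim then follows by bounding the $\int_0^\varepsilon$ portion of the integral on the left-hand side, which is $O(\varepsilon)$.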
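The key identity $H_N - \log N = \int_0^N \frac{1_{[0,1]}(t) - (1-t/N)^N}{t}\, dt$ in this proof holds exactly for each $N$, so it can be checked by quadrature before the limit $N \to \infty$ is taken. The sketch below uses composite Simpson quadrature (our choice), split at $t = 1$ where the indicator jumps. The helper names are hypothetical, and $\gamma$ is hard-coded only for comparison.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (hard-coded, for comparison only)

def simpson(f, a, b, n):
    """Composite Simpson rule for f on [a, b] with n (rounded up to even) subintervals."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def harmonic_minus_log(N):
    """H_N - log N computed directly from the harmonic sum."""
    return math.fsum(1.0 / n for n in range(1, N + 1)) - math.log(N)

def integral_form(N):
    """int_0^N (1_{[0,1]}(t) - (1 - t/N)^N) / t dt, split at t = 1 where the indicator jumps."""
    def inner(t):
        # on [0, 1]: (1 - (1 - t/N)^N) / t, whose limit as t -> 0 is 1
        return 1.0 if t == 0 else (1.0 - (1.0 - t / N) ** N) / t

    def outer(t):
        # on [1, N]: the indicator vanishes, leaving -(1 - t/N)^N / t
        return -((1.0 - t / N) ** N) / t

    return simpson(inner, 0.0, 1.0, 10_000) + simpson(outer, 1.0, float(N), max(10_000, 200 * N))

for N in (10, 100, 1000):
    print(N, harmonic_minus_log(N), integral_form(N))
print("gamma ~", EULER_GAMMA)
```

Both columns agree for each $N$ and drift towards $\gamma$ at the rate $\approx \frac{1}{2N}$. This is the dominated-convergence step in the proof seen numerically.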
Below the fold I would like to record how Proposition 2 and Proposition 3 imply Theorem 1; the computations are utterly standard, and can be found in most analytic number theory texts, but I wanted to write them down for my own benefit (I always keep forgetting, in particular, how the third of Mertens’ theorems is proven).