
In a recent post I discussed how the Riemann zeta function can be locally approximated by a polynomial, in the sense that for randomly chosen one has an approximation

where grows slowly with , and is a polynomial of degree . Assuming the Riemann hypothesis (as we will throughout this post), the zeroes of should all lie on the unit circle, and one should then be able to write as a scalar multiple of the characteristic polynomial of (the inverse of) a unitary matrix , which we normalise as

Here is some quantity depending on . We view as a random element of ; in the limit , the GUE hypothesis is equivalent to becoming equidistributed with respect to Haar measure on (also known as the Circular Unitary Ensemble, CUE; it is to the unit circle what the Gaussian Unitary Ensemble (GUE) is to the real line). One can also view as analogous to the “geometric Frobenius” operator in the function field setting, though unfortunately it is difficult at present to make this analogy any more precise (due, among other things, to the lack of a sufficiently satisfactory theory of the “field of one element”).

Taking logarithmic derivatives of (2), we have

and hence on taking logarithmic derivatives of (1) in the variable we (heuristically) have

Morally speaking, we have

so on comparing coefficients we expect to interpret the moments of as a finite Dirichlet series:

To understand the distribution of in the unitary group , it suffices to understand the distribution of the moments

where denotes averaging over , and . The GUE hypothesis asserts that in the limit , these moments converge to their CUE counterparts

where is now drawn uniformly in with respect to the CUE ensemble, and denotes expectation with respect to that measure.

The moment (6) vanishes unless one has the homogeneity condition

This follows from the fact that for any phase , has the same distribution as , where we use the number theory notation .

In the case when the degree is low, we can use representation theory to establish the following simple formula for the moment (6), as evaluated by Diaconis and Shahshahani:

Proposition 1 (Low moments in CUE model) If then the moment (6) vanishes unless for all , in which case it is equal to

Another way of viewing this proposition is that for distributed according to CUE, the random variables are distributed like independent complex random variables of mean zero and variance , as long as one only considers moments obeying (8). This identity definitely breaks down for larger values of , so one only obtains central limit theorems in certain limiting regimes, notably when one only considers a fixed number of ‘s and lets go to infinity. (The paper of Diaconis and Shahshahani writes in place of , but I believe this to be a typo.)

*Proof:* Let be the left-hand side of (8). We may assume that (7) holds since we are done otherwise, hence

Our starting point is Schur-Weyl duality. Namely, we consider the -dimensional complex vector space

This space has an action of the product group : the symmetric group acts by permutation on the tensor factors, while the general linear group acts diagonally on the factors, and the two actions commute with each other. Schur-Weyl duality gives a decomposition

where ranges over Young tableaux of size with at most rows, is the -irreducible unitary representation corresponding to (which can be constructed for instance using Specht modules), and is the -irreducible polynomial representation with highest weight .

Let be a permutation consisting of cycles of length (this is uniquely determined up to conjugation), and let . The pair then acts on , with the action on basis elements given by

The trace of this action can then be computed as

where is the matrix coefficient of . Breaking up into cycles and summing, this is just

But we can also compute this trace using the Schur-Weyl decomposition (10), yielding the identity

where is the character on associated to , and is the character on associated to . As is well known, is just the Schur polynomial of weight applied to the (algebraic, generalised) eigenvalues of . We can specialise to unitary matrices to conclude that

and similarly

where consists of cycles of length for each . On the other hand, the characters are an orthonormal system on with the CUE measure. Thus we can write the expectation (6) as

Now recall that ranges over all the Young tableaux of size with at most rows. But by (8) we have , and so the condition of having rows is redundant. Hence now ranges over *all* Young tableaux of size , which as is well known enumerates all the irreducible representations of . One can then use the standard orthogonality properties of characters to show that the sum (12) vanishes if , are not conjugate, and is equal to divided by the size of the conjugacy class of (or equivalently, by the size of the centraliser of ) otherwise. But the latter expression is easily computed to be , giving the claim.

Example 2 We illustrate the identity (11) when , . The Schur polynomials are given as where are the (generalised) eigenvalues of , and the formula (11) in this case becomes

The functions are orthonormal on , so the three functions are also, and their norms are , , and respectively, reflecting the size in of the centralisers of the permutations , , and respectively. If is instead set to say , then the terms now disappear (the Young tableau here has too many rows), and the three quantities here now have some non-trivial covariance.

Example 3 Consider the moment . For , the above proposition shows us that this moment is equal to . What happens for ? The formula (12) computes this moment as where is a cycle of length in , and ranges over all Young tableaux with size and at most rows. The Murnaghan-Nakayama rule tells us that vanishes unless is a hook (all but one of the non-zero rows consisting of just a single box; this also can be interpreted as an exterior power representation on the space of vectors in whose coordinates sum to zero), in which case it is equal to (depending on the parity of the number of non-zero rows). As such we see that this moment is equal to . Thus in general we have

Now we discuss what is known for the analogous moments (5). Here we shall be rather non-rigorous, in particular ignoring an annoying “Archimedean” issue that the product of the ranges and is not quite the range but instead leaks into the adjacent range . This issue can be addressed by working in a “weak” sense in which parameters such as are averaged over fairly long scales, or by passing to a function field analogue of these questions, but we shall simply ignore the issue completely and work at a heuristic level only. For similar reasons we will ignore some technical issues arising from the sharp cutoff of to the range (it would be slightly better technically to use a smooth cutoff).

One can morally expand out (5) using (4) as

where , , and the integers are in the ranges

for and , and

for and . Morally, the expectation here is negligible unless

in which case the expectation oscillates with magnitude one. In particular, if (7) fails (with some room to spare) then the moment (5) should be negligible, which is consistent with the analogous behaviour for the moments (6). Now suppose that (8) holds (with some room to spare). Then is significantly less than , so the multiplicative error in (15) becomes an additive error of . On the other hand, because of the fundamental *integrality gap* – that the integers are always separated from each other by a distance of at least – this forces the integers , to in fact be equal:

The von Mangoldt factors effectively restrict to be prime (the effect of prime powers is negligible). By the fundamental theorem of arithmetic, the constraint (16) then forces , and to be a permutation of , which then forces for all . For a given , the number of possible is then , and the expectation in (14) is equal to . Thus this expectation is morally

and using Mertens’ theorem this soon simplifies asymptotically to the same quantity in Proposition 1. Thus we see that (morally at least) the moments (5) associated to the zeta function asymptotically match the moments (6) coming from the CUE model in the low degree case (8), thus lending support to the GUE hypothesis. (These observations are basically due to Rudnick and Sarnak, with the degree case of pair correlations due to Montgomery, and the degree case due to Hejhal.)

With some rare exceptions (such as those estimates coming from “Kloostermania”), the moment estimates of Rudnick and Sarnak basically represent the state of the art for what is known for the moments (5). For instance, Montgomery’s pair correlation conjecture, in our language, is basically the analogue of (13) for , thus

for all . Montgomery showed this for (essentially) the range (as remarked above, this is a special case of the Rudnick-Sarnak result), but no further cases of this conjecture are known.

These estimates can be used to give some non-trivial information on the largest and smallest spacings between zeroes of the zeta function, which in our notation corresponds to spacing between eigenvalues of . One such method used today for this is due to Montgomery and Odlyzko and was greatly simplified by Conrey, Ghosh, and Gonek. The basic idea, translated to our random matrix notation, is as follows. Suppose is some random polynomial depending on of degree at most . Let denote the eigenvalues of , and let be a parameter. Observe from the pigeonhole principle that if the quantity

then the arcs cannot all be disjoint, and hence there exists a pair of eigenvalues making an angle of less than ( times the mean angle separation). Similarly, if the quantity (18) falls below that of (19), then these arcs cannot cover the unit circle, and hence there exists a pair of eigenvalues making an angle of greater than times the mean angle separation. By judiciously choosing the coefficients of as functions of the moments , one can ensure that both quantities (18), (19) can be computed by the Rudnick-Sarnak estimates (or estimates of equivalent strength); indeed, from the residue theorem one can write (18) as

for sufficiently small , and this can be computed (in principle, at least) using (3) if the coefficients of are in an appropriate form. Using this sort of technology (translated back to the Riemann zeta function setting), one can show that gaps between consecutive zeroes of zeta are less than times the mean spacing and greater than times the mean spacing infinitely often for certain ; the current records are (due to Goldston and Turnage-Butterbaugh) and (due to Bui and Milinovich, who input some additional estimates beyond the Rudnick-Sarnak set, namely the twisted fourth moment estimates of Bettin, Bui, Li, and Radziwill, and using a technique based on Hall’s method rather than the Montgomery-Odlyzko method).

It would be of great interest if one could push the upper bound for the smallest gap below . The reason for this is that this would then exclude the Alternative Hypothesis that the spacings between zeroes are asymptotically always (or almost always) a non-zero half-integer multiple of the mean spacing, or in our language that the gaps between the phases of the eigenvalues of are asymptotically always non-zero integer multiples of . The significance of this hypothesis is that it is implied by the existence of a Siegel zero (of conductor a small power of ); see this paper of Conrey and Iwaniec. (In our language, what is going on is that if there is a Siegel zero in which is very close to zero, then behaves like the Kronecker delta, and hence (by the Riemann-Siegel formula) the combined -function will have a polynomial approximation which in our language looks like a scalar multiple of , where and is a phase. The zeroes of this approximation lie on a coset of the roots of unity; the polynomial is a factor of this approximation and hence its zeroes will also lie in this coset, implying in particular that all eigenvalue spacings are multiples of . Taking then gives the claim.)

Unfortunately, the known methods do not seem to break this barrier without some significant new input; already the original paper of Montgomery and Odlyzko observed this limitation for their particular technique (and in fact fall very slightly short, as observed in unpublished work of Goldston and of Milinovich). In this post I would like to record another way to see this, by providing an “alternative” probability distribution to the CUE distribution (which one might dub the *Alternative Circular Unitary Ensemble* (ACUE)) which is indistinguishable in low moments in the sense that the expectation for this model also obeys Proposition 1, but for which the phase spacings are always a multiple of . This shows that if one is to rule out the Alternative Hypothesis (and thus in particular rule out Siegel zeroes), one needs to input some additional moment information beyond Proposition 1. It would be interesting to see if any of the other known moment estimates that go beyond this proposition are consistent with this alternative distribution. (UPDATE: it looks like they are, see Remark 7 below.)

To describe this alternative distribution, let us first recall the Weyl description of the CUE measure on the unitary group in terms of the distribution of the phases of the eigenvalues, randomly permuted in any order. This distribution is given by the probability measure

is the Vandermonde determinant; see for instance this previous blog post for the derivation of a very similar formula for the GUE distribution, which can be adapted to CUE without much difficulty. To see that this is a probability measure, first observe the Vandermonde determinant identity

where , denotes the dot product, and is the “long word”, which implies that (20) is a trigonometric series with constant term ; it is also clearly non-negative, so it is a probability measure. One can thus generate a random CUE matrix by first drawing using the probability measure (20), and then generating to be a random unitary matrix with eigenvalues .

For the alternative distribution, we first draw on the discrete torus (thus each is a root of unity) with probability density function

shift by a phase drawn uniformly at random, and then select to be a random unitary matrix with eigenvalues . Let us first verify that (21) is a probability density function. Clearly it is non-negative. It is the linear combination of exponentials of the form for . The diagonal contribution gives the constant function , which has total mass one. All of the other exponentials have a frequency that is not a multiple of , and hence will have mean zero on . The claim follows.

From construction it is clear that the matrix drawn from this alternative distribution will have all eigenvalue phase spacings be a non-zero multiple of . Now we verify that the alternative distribution also obeys Proposition 1. The alternative distribution remains invariant under rotation by phases, so the claim is again clear when (8) fails. Inspecting the proof of that proposition, we see that it suffices to show that the Schur polynomials with of size at most and of equal size remain orthonormal with respect to the alternative measure. That is to say,

when have size equal to each other and at most . In this case the phase in the definition of is irrelevant. In terms of eigenvalue measures, we are then reduced to showing that

By Fourier decomposition, it then suffices to show that the trigonometric polynomial does not contain any components of the form for some non-zero lattice vector . But we have already observed that is a linear combination of plane waves of the form for . Also, as is well known, is a linear combination of plane waves where is majorised by , and similarly is a linear combination of plane waves where is majorised by . So the product is a linear combination of plane waves of the form . But every coefficient of the vector lies between and , and so cannot be of the form for any non-zero lattice vector , giving the claim.

Example 4 If , then the distribution (21) assigns a probability of to any pair that is a permuted rotation of , and a probability of to any pair that is a permuted rotation of . Thus, a matrix drawn from the alternative distribution will be conjugate to a phase rotation of with probability , and to with probability . A similar computation when gives conjugate to a phase rotation of with probability , to a phase rotation of or its adjoint with probability of each, and a phase rotation of with probability .

Remark 5 For large it does not seem that this specific alternative distribution is the only distribution consistent with Proposition 1 and which has all phase spacings a non-zero multiple of ; in particular, it may not be the only distribution consistent with a Siegel zero. Still, it is a very explicit distribution that might serve as a test case for the limitations of various arguments for controlling quantities such as the largest or smallest spacing between zeroes of zeta. The ACUE is in some sense the distribution that maximally resembles CUE (in the sense that it has the greatest number of Fourier coefficients agreeing) while still also being consistent with the Alternative Hypothesis, and so should be the most difficult enemy to eliminate if one wishes to disprove that hypothesis.

In some cases, even just a tiny improvement in known results would be able to exclude the alternative hypothesis. For instance, if the alternative hypothesis held, then is periodic in with period , so from Proposition 1 for the alternative distribution one has

which differs from (13) for any . (This fact was implicitly observed recently by Baluyot, in the original context of the zeta function.) Thus a verification of the pair correlation conjecture (17) for even a single with would rule out the alternative hypothesis. Unfortunately, such a verification appears to be of comparable difficulty with (an averaged version of) the Hardy-Littlewood conjecture, with power saving error term. (This is consistent with the fact that Siegel zeroes can cause distortions in the Hardy-Littlewood conjecture, as (implicitly) discussed in this previous blog post.)
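The periodicity of the second moments under the alternative distribution can be checked by brute force for small matrix size. The following is only a sketch under stated assumptions: the displayed formulas are not reproduced in this text, so the code assumes that the density (21) is the squared modulus of the Vandermonde determinant divided by (2N)^N N!, on N-tuples of 2N-th roots of unity. The function names `acue_weights`, `total_mass` and `second_moment` are hypothetical. The uniform phase shift is omitted because it does not change the absolute value of a trace of a power.

```python
import cmath
import itertools
import math


def acue_weights(n):
    """Yield (eigenvalues, probability) over n-tuples of 2n-th roots of unity.

    Assumption (not taken from the text): the probability of a tuple is
    |Vandermonde|^2 / ((2n)^n * n!).
    """
    m = 2 * n
    roots = [cmath.exp(2j * math.pi * k / m) for k in range(m)]
    norm = m ** n * math.factorial(n)
    for idx in itertools.product(range(m), repeat=n):
        vand_sq = 1.0
        for a in range(n):
            for b in range(a + 1, n):
                vand_sq *= abs(roots[idx[a]] - roots[idx[b]]) ** 2
        yield [roots[i] for i in idx], vand_sq / norm


def total_mass(n):
    """Sum of all weights; equals 1 if the assumed normalisation is right."""
    return sum(w for _, w in acue_weights(n))


def second_moment(n, j):
    """Expectation of |tr U^j|^2 under the assumed discrete distribution."""
    return sum(w * abs(sum(z ** j for z in zs)) ** 2
               for zs, w in acue_weights(n))


if __name__ == "__main__":
    n = 3
    print("total mass:", round(total_mass(n), 9))
    for j in range(1, 2 * n + 1):
        print(j, round(second_moment(n, j), 9))
```

For N = 3 the printed moments rise linearly up to j = N and then fall linearly until j = 2N, where every eigenvalue power equals one and the moment jumps to N^2. The CUE moments, by contrast, stay flat beyond j = N.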

Remark 6 One can view the CUE as normalised Lebesgue measure on (viewed as a smooth submanifold of ). One can similarly view ACUE as normalised Lebesgue measure on the (disconnected) smooth submanifold of consisting of those unitary matrices whose phase spacings are non-zero integer multiples of ; informally, ACUE is CUE restricted to this lower dimensional submanifold. As is well known, the phases of CUE eigenvalues form a determinantal point process with kernel (or one can equivalently take ); in a similar spirit, the phases of ACUE eigenvalues, once they are rotated to be roots of unity, become a discrete determinantal point process on those roots of unity with exactly the same kernel (except for a normalising factor of ). In particular, the -point correlation functions of ACUE (after this rotation) are precisely the restriction of the -point correlation functions of CUE after normalisation, that is to say they are proportional to .

Remark 7 One family of estimates that go beyond the Rudnick-Sarnak family of estimates are twisted moment estimates for the zeta function, such as ones that give asymptotics for for some small even exponent (almost always or ) and some short Dirichlet polynomial ; see for instance this paper of Bettin, Bui, Li, and Radziwill for some examples of such estimates. The analogous unitary matrix average would be something like

where is now some random medium degree polynomial that depends on the unitary matrix associated to (and in applications will typically also contain some negative power of to cancel the corresponding powers of in ). Unfortunately such averages generally are unable to distinguish the CUE from the ACUE. For instance, if all the coefficients of involve products of traces of total order less than , then in terms of the eigenvalue phases , is a linear combination of plane waves where the frequencies have coefficients of magnitude less than . On the other hand, as each coefficient of is an elementary symmetric function of the eigenvalues, is a linear combination of plane waves where the frequencies have coefficients of magnitude at most . Thus is a linear combination of plane waves where the frequencies have coefficients of magnitude less than , and thus is orthogonal to the difference between the CUE and ACUE measures on the phase torus by the previous arguments. In other words, has the same expectation with respect to ACUE as it does with respect to CUE. Thus one can only start distinguishing CUE from ACUE if the mollifier has degree close to or exceeding , which corresponds to Dirichlet polynomials of length close to or exceeding , which is far beyond current technology for such moment estimates.

Remark 8 The GUE hypothesis for the zeta function asserts that the average for any and any test function , where is the Dyson sine kernel and are the ordinates of zeroes of the zeta function. This corresponds to the CUE distribution for . The ACUE distribution then corresponds to an “alternative gaussian unitary ensemble (AGUE)” hypothesis, in which the average (22) is instead predicted to equal a Riemann sum version of the integral (23):

This is a stronger version of the alternative hypothesis that the spacing between adjacent zeroes is almost always approximately a half-integer multiple of the mean spacing. I do not know of any known moment estimates for Dirichlet series that are able to eliminate this AGUE hypothesis (even assuming GRH). (UPDATE: These facts have also been independently observed in forthcoming work of Lagarias and Rodgers.)

The twin prime conjecture is one of the oldest unsolved problems in analytic number theory. There are several reasons why this conjecture remains out of reach of current techniques, but the most important obstacle is the parity problem which prevents purely sieve-theoretic methods (or many other popular methods in analytic number theory, such as the circle method) from detecting pairs of prime twins in a way that can distinguish them from other twins of almost primes. The parity problem is discussed in these previous blog posts; this obstruction is ultimately powered by the *Möbius pseudorandomness principle* that asserts that the Möbius function is asymptotically orthogonal to all “structured” functions (and in particular, to the weight functions constructed from sieve theory methods).

However, there is an intriguing “alternate universe” in which the Möbius function *is* strongly correlated with some structured functions, and specifically with some Dirichlet characters, leading to the existence of the infamous “Siegel zero”. In this scenario, the parity problem obstruction disappears, and it becomes possible, *in principle*, to attack problems such as the twin prime conjecture. In particular, we have the following result of Heath-Brown:

Theorem 1 At least one of the following two statements is true:

- (Twin prime conjecture) There are infinitely many primes such that is also prime.
- (No Siegel zeroes) There exists a constant such that for every real Dirichlet character of conductor , the associated Dirichlet -function has no zeroes in the interval .

Informally, this result asserts that if one had an infinite sequence of Siegel zeroes, one could use this to generate infinitely many twin primes. See this survey of Friedlander and Iwaniec for more on this “illusory” or “ghostly” parallel universe in analytic number theory that should not actually exist, but is surprisingly self-consistent and has to date proven impossible to banish from the realm of possibility.

The strategy of Heath-Brown’s proof is fairly straightforward to describe. The usual starting point is to try to lower bound

for some large value of , where is the von Mangoldt function. Actually, in this post we will work with the slight variant

where

is the second von Mangoldt function, and denotes Dirichlet convolution, and is an (unsquared) Selberg sieve that damps out small prime factors. This sum also detects twin primes, but will lead to slightly simpler computations. For technical reasons we will also smooth out the interval and remove very small primes from , but we will skip over these steps for the purpose of this informal discussion. (In Heath-Brown’s original paper, the Selberg sieve is essentially replaced by the more combinatorial restriction for some large , where is the primorial of , but I found the computations to be slightly easier if one works with a Selberg sieve, particularly if the sieve is not squared to make it nonnegative.)
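As a small numerical illustration of the second von Mangoldt function, the sketch below assumes the standard definition as the Dirichlet convolution of the Möbius function with the square of the logarithm (the displayed formula is not reproduced in this text). It compares that definition with the classical identity expressing it as Λ(n) log n plus the Dirichlet convolution of Λ with itself. The helper names `lambda2` and `lambda2_alt` are hypothetical.

```python
import math


def mobius(n):
    """Moebius function by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # n has a square factor
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result


def von_mangoldt(n):
    """Lambda(n): log p if n is a power of a prime p, else 0."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)  # n itself is prime


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def lambda2(n):
    """Assumed definition: sum over d | n of mu(d) * log(n/d)^2."""
    return sum(mobius(d) * math.log(n // d) ** 2 for d in divisors(n))


def lambda2_alt(n):
    """Lambda(n) * log(n) plus the convolution of Lambda with itself."""
    conv = sum(von_mangoldt(d) * von_mangoldt(n // d) for d in divisors(n))
    return von_mangoldt(n) * math.log(n) + conv


if __name__ == "__main__":
    for n in (2, 4, 6, 12, 30):
        print(n, round(lambda2(n), 6), round(lambda2_alt(n), 6))
```

The function vanishes on integers with three or more distinct prime factors (for instance at 30), which is why sums weighted by it can still detect twin primes once small prime factors are damped out.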

If there is a Siegel zero with close to and a Dirichlet character of conductor , then multiplicative number theory methods can be used to show that the Möbius function “pretends” to be like the character in the sense that for “most” primes near (e.g. in the range for some small and large ). Traditionally, one uses complex-analytic methods to demonstrate this, but one can also use elementary multiplicative number theory methods to establish these results (qualitatively at least), as will be shown below the fold.

The fact that pretends to be like can be used to construct a tractable approximation (after inserting the sieve weight ) in the range (where for some large ) for the second von Mangoldt function , namely the function

Roughly speaking, we think of the periodic function and the slowly varying function as being of about the same “complexity” as the constant function , so that is roughly of the same “complexity” as the divisor function

which is considerably simpler to obtain asymptotics for than the von Mangoldt function as the Möbius function is no longer present. (For instance, note from the Dirichlet hyperbola method that one can estimate to accuracy with little difficulty, whereas to obtain a comparable level of accuracy for or is essentially the Riemann hypothesis.)
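The Dirichlet hyperbola method mentioned here is easy to sketch in code. The example below is an illustration rather than part of the argument: it computes the summatory divisor function exactly in about the square root of x steps, and compares it with the classical main term x log x + (2γ − 1)x, whose error is of size O(√x). The function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def divisor_sum_hyperbola(x):
    """Sum of tau(n) over 1 <= n <= x via the Dirichlet hyperbola method.

    Counts lattice points under the hyperbola d * e <= x by symmetry:
    twice the points with d <= sqrt(x), minus the doubly counted square.
    """
    r = math.isqrt(x)
    return 2 * sum(x // d for d in range(1, r + 1)) - r * r


def divisor_sum_naive(x):
    """Same sum computed directly as the sum over d of floor(x / d)."""
    return sum(x // d for d in range(1, x + 1))


def main_term(x):
    """Classical main term x log x + (2 * gamma - 1) x."""
    return x * math.log(x) + (2 * EULER_GAMMA - 1) * x


if __name__ == "__main__":
    for x in (10, 10**3, 10**6):
        exact = divisor_sum_hyperbola(x)
        print(x, exact, round(exact - main_term(x), 3))
```

No comparably simple computation gives square-root accuracy for sums of the Möbius or von Mangoldt functions, which is the point of the comparison in the text.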

One expects to be a good approximant to if is of size and has no prime factors less than for some large constant . The Selberg sieve will be mostly supported on numbers with no prime factor less than . As such, one can hope to approximate (1) by the expression

as it turns out, the error between this expression and (1) is easily controlled by sieve-theoretic techniques. Let us ignore the Selberg sieve for now and focus on the slightly simpler sum

As discussed above, this sum should be thought of as a slightly more complicated version of the sum

Accordingly, let us look (somewhat informally) at the task of estimating the model sum (3). One can think of this problem as basically that of counting solutions to the equation with in various ranges; this is clearly related to understanding the equidistribution of the hyperbola in . Taking Fourier transforms, the latter problem is closely related to estimation of the Kloosterman sums

where denotes the inverse of in . One can then use the Weil bound

where is the greatest common divisor of (with the convention that this is equal to if vanish), and the decays to zero as . The Weil bound yields good enough control on error terms to estimate (3), and as it turns out the same method also works to estimate (2) (provided that with large enough).
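For readers who want to experiment, here is a minimal sketch that computes Kloosterman sums directly from the definition and compares them with the Weil bound in the simplest case of a prime modulus p not dividing the coefficients, where the bound reads |S| ≤ 2√p. The function name `kloosterman` and the choice of test moduli are illustrative assumptions, not part of the original argument.

```python
import cmath
import math


def kloosterman(a, b, q):
    """S(a, b; q): sum over units x mod q of e((a*x + b*x^{-1}) / q)."""
    total = 0j
    for x in range(1, q + 1):
        if math.gcd(x, q) == 1:
            x_inv = pow(x, -1, q)  # modular inverse (Python 3.8+)
            phase = (a * x + b * x_inv) % q
            total += cmath.exp(2j * math.pi * phase / q)
    return total


if __name__ == "__main__":
    for p in (7, 31, 101, 499):
        worst = max(abs(kloosterman(a, 1, p)) for a in range(1, p))
        print(p, round(worst, 4), round(2 * math.sqrt(p), 4))
```

The sums are real, since replacing x by −x conjugates each term, and the trivial bound would be p − 1, so the printed maxima show square-root cancellation of the kind the Weil bound asserts.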

Actually one does not need the full strength of the Weil bound here; any power savings over the trivial bound of will do. In particular, it will suffice to use the weaker, but easier to prove, bounds of Kloosterman:

Lemma 2 (Kloosterman bound) One has

whenever and are coprime to , where the is with respect to the limit (and is uniform in ).

*Proof:* Observe from change of variables that the Kloosterman sum is unchanged if one replaces with for . For fixed , the number of such pairs is at least , thanks to the divisor bound. Thus it will suffice to establish the fourth moment bound

The left-hand side can be rearranged as

which by Fourier summation is equal to

Observe from the quadratic formula and the divisor bound that each pair has at most solutions to the system of equations . Hence the number of quadruples of the desired form is , and the claim follows.

We will also need another easy case of the Weil bound to handle some other portions of (2):

Lemma 3 (Easy Weil bound) Let be a primitive real Dirichlet character of conductor , and let . Then

*Proof:* As is the conductor of a primitive real Dirichlet character, is equal to times a squarefree odd number for some . By the Chinese remainder theorem, it thus suffices to establish the claim when is an odd prime. We may assume that is not divisible by this prime , as the claim is trivial otherwise. If vanishes then does not vanish, and the claim follows from the mean zero nature of ; similarly if vanishes. Hence we may assume that do not vanish, and then we can normalise them to equal . By completing the square it now suffices to show that

whenever . As is on the quadratic residues and on the non-residues, it now suffices to show that

But by making the change of variables , the left-hand side becomes , and the claim follows.
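The last two steps of this proof can be checked numerically for small primes. The sketch below assumes that the reduced claim concerns the sum of the quadratic character over the values n² − b with b non-zero modulo an odd prime p (the displayed formulas are not reproduced in this text, so the sign convention is an assumption; it is immaterial since b ranges over all non-zero residues). It verifies that this sum equals −1, and that the hyperbola xy = b has exactly p − 1 points, which is the count the change of variables produces. The function names are hypothetical.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def quadratic_char_sum(b, p):
    """Sum over n mod p of the Legendre symbol of n^2 - b."""
    return sum(legendre(n * n - b, p) for n in range(p))


def hyperbola_points(b, p):
    """Number of pairs (x, y) mod p with x * y = b."""
    return sum(1 for x in range(p) for y in range(p) if (x * y - b) % p == 0)


def difference_of_squares_points(b, p):
    """Number of pairs (m, n) mod p with n^2 - m^2 = b."""
    return sum(1 for m in range(p) for n in range(p)
               if (n * n - m * m - b) % p == 0)


if __name__ == "__main__":
    p = 13
    for b in range(1, p):
        print(b, quadratic_char_sum(b, p), hyperbola_points(b, p))
```

The two point counts agree because (m, n) ↦ (n + m, n − m) is a bijection when p is odd, and the character sum is the difference-of-squares count minus p.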

While the basic strategy of Heath-Brown’s argument is relatively straightforward, implementing it requires a large amount of computation to control both main terms and error terms. I experimented for a while with rearranging the argument to try to reduce the amount of computation; I did not fully succeed in arriving at a satisfactorily minimal amount of superfluous calculation, but I was able to at least reduce this amount a bit, mostly by replacing a combinatorial sieve with a Selberg-type sieve (which was not needed to be positive, so I dispensed with the squaring aspect of the Selberg sieve to simplify the calculations a little further; also for minor reasons it was convenient to retain a tiny portion of the combinatorial sieve to eliminate extremely small primes). Also some modest reductions in complexity can be obtained by using the second von Mangoldt function in place of . These exercises were primarily for my own benefit, but I am placing them here in case they are of interest to some other readers.

In Notes 1, we approached multiplicative number theory (the study of multiplicative functions and their relatives) via elementary methods, in which attention was primarily focused on obtaining asymptotic control on summatory functions and logarithmic sums . Now we turn to the complex approach to multiplicative number theory, in which the focus is instead on obtaining various types of control on the Dirichlet series , defined (at least for of sufficiently large real part) by the formula

These series also made an appearance in the elementary approach to the subject, but only for real $s$ that were larger than $1$. But now we will exploit the freedom to extend the variable $s$ to the complex domain; this gives enough freedom (in principle, at least) to recover control of elementary sums such as $\sum_{n \leq x} f(n)$ or $\sum_{n \leq x} \frac{f(n)}{n}$ from control on the Dirichlet series. Crucially, for many key functions $f$ of number-theoretic interest, the Dirichlet series ${\mathcal D} f$ can be analytically (or at least meromorphically) continued to the left of the line $\{ s: \mathrm{Re}(s) = 1 \}$. The zeroes and poles of the resulting meromorphic continuations of ${\mathcal D} f$ (and of related functions) then turn out to control the asymptotic behaviour of the elementary sums of $f$; the more one knows about the former, the more one knows about the latter. In particular, knowledge of where the zeroes of the Riemann zeta function $\zeta$ are located can give very precise information about the distribution of the primes, by means of a fundamental relationship known as the explicit formula. There are many ways of phrasing this explicit formula (both in exact and in approximate forms), but they are all trying to formalise an approximation to the von Mangoldt function $\Lambda$ (and hence to the primes) of the form

$$ \Lambda(n) \approx 1 - \sum_\rho n^{\rho-1} \ \ \ \ \ (1)$$

where the sum is over zeroes $\rho$ (counting multiplicity) of the Riemann zeta function $\zeta = {\mathcal D} 1$ (with the sum often restricted so that $\rho$ has large real part and bounded imaginary part), and the approximation is in a suitable weak sense, so that

$$ \sum_n \Lambda(n) g(n) \approx \int_0^\infty g(y)\ dy - \sum_\rho \int_0^\infty g(y) y^{\rho-1}\ dy \ \ \ \ \ (2)$$

for suitable “test functions” $g$ (which in practice are restricted to be fairly smooth and slowly varying, with the precise amount of restriction dependent on the amount of truncation in the sum over zeroes one wishes to take). Among other things, such approximations can be used to rigorously establish the prime number theorem

$$ \sum_{n \leq x} \Lambda(n) = x + o(x) \ \ \ \ \ (3)$$

as $x \rightarrow \infty$, with the size of the error term $o(x)$ closely tied to the location of the zeroes $\rho$ of the Riemann zeta function.
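
As a quick numerical illustration of the prime number theorem, in the form that $\sum_{n \leq x} \Lambda(n)$ is asymptotic to $x$, here is a minimal Python sketch that tabulates the von Mangoldt function with a simple sieve and prints the ratio of the summatory function to $x$. The function names `von_mangoldt_table` and `chebyshev_psi` are illustrative choices made for this sketch, not notation from these notes.

```python
from math import log

def von_mangoldt_table(limit):
    """Return a list lam with lam[n] = Lambda(n) for 0 <= n <= limit."""
    lam = [0.0] * (limit + 1)
    is_composite = [False] * (limit + 1)
    for p in range(2, limit + 1):
        if is_composite[p]:
            continue
        # p is prime: cross out its multiples starting from p*p ...
        for m in range(p * p, limit + 1, p):
            is_composite[m] = True
        # ... and record log p at every power p, p^2, p^3, ... up to limit.
        q = p
        while q <= limit:
            lam[q] = log(p)
            q *= p
    return lam

def chebyshev_psi(x):
    """Compute sum_{n <= x} Lambda(n) for a positive integer x."""
    return sum(von_mangoldt_table(x))

# The ratio should drift towards 1 as x grows.
for x in (10**3, 10**4, 10**5):
    print(x, chebyshev_psi(x) / x)
```

The printed ratios only give a feel for the asymptotic; they say nothing rigorous about the size of the error term, which is where the location of the zeroes enters.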

The explicit formula (1) (or any of its more rigorous forms) is closely tied to the counterpart approximation

$$ -\frac{\zeta'}{\zeta}(s) \approx \frac{1}{s-1} - \sum_\rho \frac{1}{s-\rho} \ \ \ \ \ (4)$$

for the Dirichlet series ${\mathcal D} \Lambda = -\frac{\zeta'}{\zeta}$ of the von Mangoldt function; note that (4) is formally the special case of (2) when $g(n) = n^{-s}$. Such approximations come from the general theory of local factorisations of meromorphic functions, as discussed in Supplement 2; the passage from (4) to (2) is accomplished by such tools as the residue theorem and the Fourier inversion formula, which were also covered in Supplement 2. The relative ease of uncovering the Fourier-like duality between primes and zeroes (sometimes referred to poetically as the “music of the primes”) is one of the major advantages of the complex-analytic approach to multiplicative number theory; this important duality tends to be rather obscured in the other approaches to the subject, although it can still in principle be discernible with sufficient effort.
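
To see at a purely formal level how the Dirichlet series approximation arises as a special case of the weak form of the explicit formula, one can argue as follows. This is only a heuristic sketch: it assumes that the weak form reads $\sum_n \Lambda(n) g(n) \approx \int g(y)\ dy - \sum_\rho \int g(y) y^{\rho-1}\ dy$, it uses the choice $g(y) := y^{-s} 1_{y \geq 1}$ with $\mathrm{Re}(s) > 1$ (which is not smooth, and so is not really an admissible test function), and it ignores all issues of convergence of the sum over zeroes $\rho$.

$$ \begin{aligned} -\frac{\zeta'}{\zeta}(s) = \sum_n \frac{\Lambda(n)}{n^s} &\approx \int_1^\infty y^{-s}\ dy - \sum_\rho \int_1^\infty y^{\rho-1-s}\ dy \\ &= \frac{1}{s-1} - \sum_\rho \frac{1}{s-\rho}. \end{aligned} $$

Here we used the identity $\int_1^\infty y^{-w}\ dy = \frac{1}{w-1}$ for $\mathrm{Re}(w) > 1$, applied with $w = s$ and with $w = s+1-\rho$; the latter is legitimate because every zero has $\mathrm{Re}(\rho) \leq 1 < \mathrm{Re}(s)$. The signs are consistent with the local behaviour of $-\frac{\zeta'}{\zeta}$: a simple pole of $\zeta$ at $s=1$ contributes $+\frac{1}{s-1}$, while a zero at $\rho$ contributes $-\frac{1}{s-\rho}$.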

More generally, one has an explicit formula

$$ \Lambda(n) \chi(n) \approx - \sum_\rho n^{\rho-1} \ \ \ \ \ (5)$$

for any (non-principal) Dirichlet character $\chi$, where $\rho$ now ranges over the zeroes of the associated Dirichlet $L$-function $L(s,\chi) := {\mathcal D} \chi(s)$; we view this formula as a “twist” of (1) by the Dirichlet character $\chi$. The explicit formula (5), proven similarly (in any of its rigorous forms) to (1), is important in establishing the prime number theorem in arithmetic progressions, which asserts that

$$ \sum_{n \leq x: n = a\ (q)} \Lambda(n) = \frac{x}{\phi(q)} + o(x) \ \ \ \ \ (6)$$

as $x \rightarrow \infty$, whenever $a\ (q)$ is a fixed primitive residue class. Again, the size of the error term here is closely tied to the location of the zeroes of the Dirichlet $L$-function, with particular importance given to whether there is a zero very close to $s=1$ (such a zero is known as an *exceptional zero* or *Siegel zero*).
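
The prime number theorem in arithmetic progressions can be illustrated numerically in the same spirit: for a fixed modulus $q$ and each primitive residue class $a\ (q)$, the sum of $\Lambda(n)$ over $n \leq x$ in that class should be roughly $x/\phi(q)$. The following Python sketch checks this for a small modulus; the helper names (`von_mangoldt`, `psi_in_progression`, `euler_phi`) and the choices $q=7$, $x=20000$ are assumptions made purely for illustration.

```python
from math import gcd, log

def von_mangoldt(n):
    """Lambda(n) = log p if n is a power of a prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            # p is the smallest prime factor; strip out all copies of it.
            while n % p == 0:
                n //= p
            return log(p) if n == 1 else 0.0
        p += 1
    return log(n)  # no factor up to sqrt(n), so n itself is prime

def psi_in_progression(x, a, q):
    """Sum of Lambda(n) over 1 <= n <= x with n congruent to a mod q."""
    return sum(von_mangoldt(n) for n in range(1, x + 1) if n % q == a % q)

def euler_phi(q):
    """Euler totient: number of residues mod q coprime to q."""
    return sum(1 for a in range(1, q + 1) if gcd(a, q) == 1)

q, x = 7, 20000
for a in range(1, q):
    if gcd(a, q) == 1:
        # Normalised so that the expected value is close to 1.
        print(a, psi_in_progression(x, a, q) * euler_phi(q) / x)
```

At this range the normalised sums are all close to $1$, but the small discrepancies between residue classes are exactly the sort of fluctuation that the zeroes of the relevant $L$-functions govern.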

While any information on the behaviour of zeta functions or $L$-functions is in principle welcome for the purposes of analytic number theory, some regions of the complex plane are more important than others in this regard, due to the differing weights assigned to each zero in the explicit formula. Roughly speaking, in descending order of importance, the most crucial regions on which knowledge of these functions is useful are

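- The region on or near the point $s=1$.
- The region on or near the right edge $\{ 1+it: t \in {\bf R} \}$ of the *critical strip* $\{ s: 0 \leq \mathrm{Re}(s) \leq 1 \}$.
- The right half $\{ s: \frac{1}{2} < \mathrm{Re}(s) < 1 \}$ of the critical strip.
- The region on or near the *critical line* $\{ \frac{1}{2}+it: t \in {\bf R} \}$ that bisects the critical strip.
- Everywhere else.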

For instance:

- We will shortly show that the Riemann zeta function $\zeta$ has a simple pole at $s=1$ with residue $1$, which is already sufficient to recover many of the classical theorems of Mertens discussed in the previous set of notes, as well as results on mean values of multiplicative functions such as the divisor function $\tau$. For Dirichlet $L$-functions, the behaviour is instead controlled by the quantity $L(1,\chi)$ discussed in Notes 1, which is in turn closely tied to the existence and location of a Siegel zero.
- The zeta function is also known to have no zeroes on the right edge $\{ 1+it: t \in {\bf R} \}$ of the critical strip, which is sufficient to prove (and is in fact equivalent to) the prime number theorem. Any enlargement of the zero-free region for $\zeta$ into the critical strip leads to improved error terms in that theorem, with larger zero-free regions leading to stronger error estimates. Similarly for $L$-functions and the prime number theorem in arithmetic progressions.
- The (as yet unproven) Riemann hypothesis prohibits $\zeta$ from having any zeroes within the right half of the critical strip, and gives very good control on the number of primes in intervals, even when the intervals are relatively short compared to the size of the entries. Even without assuming the Riemann hypothesis, *zero density estimates* in this region are available that give some partial control of this form. Similarly for $L$-functions, primes in short arithmetic progressions, and the generalised Riemann hypothesis.
- Assuming the Riemann hypothesis, further distributional information about the zeroes on the critical line (such as Montgomery’s pair correlation conjecture, or the more general *GUE hypothesis*) can give finer information about the error terms in the prime number theorem in short intervals, as well as other arithmetic information. Again, one has analogues for $L$-functions and primes in short arithmetic progressions.
- The functional equation of the zeta function describes the behaviour of $\zeta$ to the left of the critical line, in terms of the behaviour to the right of the critical line. This is useful for building a “global” picture of the structure of the zeta function, and for improving a number of estimates about that function, but (in the absence of unproven conjectures such as the Riemann hypothesis or the pair correlation conjecture) it turns out that many of the basic analytic number theory results using the zeta function can be established without relying on this equation. Similarly for $L$-functions.
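
The first item in the list above, the simple pole of the zeta function at $s=1$ with residue $1$, is easy to observe numerically: the product $(s-1)\zeta(s)$ should approach $1$ as $s \rightarrow 1^+$. Here is a minimal Python sketch, assuming one approximates $\zeta(s)$ for real $s>1$ by a partial sum together with the first two Euler-Maclaurin tail terms; the function name `zeta_approx` and the cutoff `N=1000` are illustrative choices.

```python
def zeta_approx(s, N=1000):
    """Approximate zeta(s) for real s > 1.

    Uses sum_{n <= N} n^{-s} plus the tail correction
    N^{1-s}/(s-1) - N^{-s}/2 from the Euler-Maclaurin formula;
    the neglected terms are roughly of size s * N^{-s-1}.
    """
    partial = sum(n ** (-s) for n in range(1, N + 1))
    tail = N ** (1 - s) / (s - 1) - 0.5 * N ** (-s)
    return partial + tail

# (s - 1) * zeta(s) should tend to the residue 1 as s decreases to 1.
for eps in (1e-1, 1e-2, 1e-3):
    print(eps, eps * zeta_approx(1 + eps))
```

The deviation of the printed values from $1$ shrinks roughly linearly in $s-1$, as one would expect from a simple pole with a bounded regular part.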

**Remark 1** If one takes an “adelic” viewpoint, one can unite the Riemann zeta function $\zeta(\sigma+it) = \sum_n n^{-\sigma-it}$ and all of the $L$-functions $L(\sigma+it,\chi) = \sum_n \chi(n) n^{-\sigma-it}$ for various Dirichlet characters $\chi$ into a single object, viewing $n \mapsto \chi(n) n^{-it}$ as a general multiplicative character on the adeles; thus the imaginary coordinate $t$ and the Dirichlet character $\chi$ are really the Archimedean and non-Archimedean components respectively of a single adelic frequency parameter. This viewpoint was famously developed in Tate’s thesis, which among other things helps to clarify the nature of the functional equation, as discussed in this previous post. We will not pursue the adelic viewpoint further in these notes, but it does supply a “high-level” explanation for why so much of the theory of the Riemann zeta function extends to the Dirichlet $L$-functions. (The non-Archimedean character $\chi(n)$ and the Archimedean character $n^{it}$ behave similarly from an algebraic point of view, but not so much from an analytic point of view; as such, the adelic viewpoint is well suited for algebraic tasks (such as establishing the functional equation), but not for analytic tasks (such as establishing a zero-free region).)

Roughly speaking, the elementary multiplicative number theory from Notes 1 corresponds to the information one can extract from the complex-analytic method in region 1 of the above hierarchy, while the more advanced elementary number theory used to prove the prime number theorem (and which we will not cover in full detail in these notes) corresponds to what one can extract from regions 1 and 2.

As a consequence of this hierarchy of importance, information about the $\zeta$ function away from the critical strip, such as Euler’s identity

$$ \zeta(2) = \frac{\pi^2}{6}$$

or equivalently

$$ 1 + \frac{1}{2^2} + \frac{1}{3^2} + \dots = \frac{\pi^2}{6}$$

or the infamous identity

$$ \zeta(-1) = -\frac{1}{12},$$

which is often presented (slightly misleadingly, if one’s conventions for divergent summation are not made explicit) as

$$ 1 + 2 + 3 + \dots = -\frac{1}{12},$$

is of relatively little direct importance in analytic prime number theory, although it is still of interest for some other, non-number-theoretic, applications. (The quantity $\zeta(2)$ does play a minor role as a normalising factor in some asymptotics, see e.g. Exercise 28 from Notes 1, but its precise value is usually not of major importance.) In contrast, the value $L(1,\chi)$ of an $L$-function at $s=1$ turns out to be extremely important in analytic number theory, with many results in this subject relying ultimately on a non-trivial lower bound on this quantity coming from Siegel’s theorem, discussed below the fold.
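
To get a concrete feel for the value of an $L$-function at $s=1$, one can compute it numerically for the two smallest non-principal characters, where classical closed forms are available: the Leibniz series gives $\pi/4$ for the non-principal character mod $4$, and the corresponding value for the non-principal character mod $3$ is $\pi/(3\sqrt{3})$. The Python sketch below is only an illustration; representing a character by its list of values on $0,\dots,q-1$, and the names `CHI_4`, `CHI_3`, `L_at_one`, are conventions assumed here, not taken from these notes.

```python
from math import pi, sqrt

# A character mod q is represented (for this sketch only) by its values
# on the residues 0, 1, ..., q-1.
CHI_4 = [0, 1, 0, -1]  # the non-principal character mod 4
CHI_3 = [0, 1, -1]     # the non-principal character mod 3

def L_at_one(chi, terms=10**6):
    """Partial sum of sum_n chi(n)/n, which converges (slowly) to L(1, chi)."""
    q = len(chi)
    return sum(chi[n % q] / n for n in range(1, terms + 1))

print(L_at_one(CHI_4), pi / 4)
print(L_at_one(CHI_3), pi / (3 * sqrt(3)))
```

Both values are comfortably bounded away from zero; the difficulty addressed by Siegel’s theorem is to obtain a usable lower bound that holds as the modulus $q$ varies, which no finite computation of this sort can supply.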

For a more in-depth treatment of the topics in this set of notes, see Davenport’s “Multiplicative number theory”.
