The Riemann zeta function $\zeta(s)$, defined for $\hbox{Re}(s) > 1$ by

$\displaystyle \zeta(s) := \sum_{n=1}^\infty \frac{1}{n^s} \ \ \ \ \ (1)$

and then continued meromorphically to other values of $s$ by analytic continuation, is a fundamentally important function in analytic number theory, as it is connected to the primes via the Euler product formula

$\displaystyle \zeta(s) = \prod_p \left(1 - \frac{1}{p^s}\right)^{-1} \ \ \ \ \ (2)$
(for $\hbox{Re}(s) > 1$, at least), where $p$ ranges over primes. (The equivalence between (1) and (2) is essentially the generating function version of the fundamental theorem of arithmetic.) The function $\zeta$ has a pole at $s=1$ and a number of zeroes $\rho$. A formal application of the factor theorem gives

$\displaystyle \zeta(s) = \frac{1}{s-1} \prod_\rho (s - \rho) \times \ldots \ \ \ \ \ (3)$
where $\rho$ ranges over zeroes of $\zeta$, and we will be vague about what the $\ldots$ factor is, how to make sense of the infinite product, and exactly which zeroes of $\zeta$ are involved in the product. Equating (2) and (3) and taking logarithms gives the formal identity

$\displaystyle \log \zeta(s) = - \sum_p \log\left(1 - \frac{1}{p^s}\right) = - \log(s-1) + \sum_\rho \log(s-\rho) + \ldots; \ \ \ \ \ (4)$

using the Taylor expansion

$\displaystyle \log\left(1 - \frac{1}{p^s}\right) = - \frac{1}{p^s} - \frac{1}{2 p^{2s}} - \frac{1}{3 p^{3s}} - \ldots \ \ \ \ \ (5)$

and differentiating the above identity in $s$ yields the formal identity

$\displaystyle - \frac{\zeta'(s)}{\zeta(s)} = \sum_n \frac{\Lambda(n)}{n^s} = \frac{1}{s-1} - \sum_\rho \frac{1}{s-\rho} + \ldots \ \ \ \ \ (6)$
where $\Lambda(n)$ is the von Mangoldt function, defined to be $\log p$ when $n$ is a power of a prime $p$, and zero otherwise. Thus we see that the behaviour of the primes (as encoded by the von Mangoldt function) is intimately tied to the distribution of the zeroes $\rho$. For instance, if we knew that the zeroes were far away from the axis $\hbox{Re}(s)=1$, then we would heuristically have

$\displaystyle \sum_n \frac{\Lambda(n)}{n^{1+it}} \approx \frac{1}{it}$

for real $t$. On the other hand, the integral test suggests that

$\displaystyle \sum_n \frac{1}{n^{1+it}} \approx \frac{1}{it}$

and thus we see that $\Lambda(n)$ and $1$ have essentially the same (multiplicative) Fourier transform:

$\displaystyle \sum_n \frac{\Lambda(n)}{n^{1+it}} \approx \sum_n \frac{1}{n^{1+it}}.$

Inverting the Fourier transform (or performing a contour integral closely related to the inverse Fourier transform), one is led to the prime number theorem

$\displaystyle \sum_{n \leq x} \Lambda(n) \approx \sum_{n \leq x} 1 \approx x.$
In fact, the standard proof of the prime number theorem basically proceeds by making all of the above formal arguments precise and rigorous.
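As a quick numerical sanity check of the heuristic $\sum_{n \leq x} \Lambda(n) \approx x$ (this snippet is purely illustrative, and of course plays no role in the rigorous argument), one can compute the Chebyshev-type sum by brute force:

```python
import math

def primes_upto(n):
    """Sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p in range(2, n + 1) if sieve[p]]

def chebyshev_psi(x):
    """psi(x) = sum of Lambda(n) over n <= x, i.e. log p summed over
    all prime powers p^k <= x."""
    total = 0.0
    for p in primes_upto(int(x)):
        pk = p
        while pk <= x:
            total += math.log(p)
            pk *= p
    return total

x = 100000
print(chebyshev_psi(x) / x)  # heuristically close to 1
```

Already at $x = 10^5$ the ratio is within about half a percent of $1$, although the error term is the whole subtlety of the subject.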
Unfortunately, we don’t know as much about the zeroes of the zeta function (and hence, about the function $\zeta$ itself) as we would like. The Riemann hypothesis (RH) asserts that all the zeroes (except for the “trivial” zeroes at the negative even numbers) lie on the critical line $\hbox{Re}(s) = \frac{1}{2}$; this hypothesis would make the error terms in the above proof of the prime number theorem significantly more accurate. Furthermore, the stronger GUE hypothesis asserts in addition to RH that the local distribution of these zeroes on the critical line should behave like the local distribution of the eigenvalues of a random matrix drawn from the gaussian unitary ensemble (GUE). I will not give a precise formulation of this hypothesis here, except to say that the adjective “local” in the context of distribution of zeroes means something like “at scale $O(1/\log T)$ when the imaginary part is comparable to $T$“.
Nevertheless, we do know some reasonably non-trivial facts about the zeroes $\rho$ and the zeta function $\zeta$, either unconditionally, or assuming RH (or GUE). Firstly, there are no zeroes for $\hbox{Re}(s) > 1$ (as one can already see from the convergence of the Euler product (2) in this case) or for $\hbox{Re}(s) = 1$ (this is trickier, relying on (6) and the elementary observation that

$\displaystyle \hbox{Re} \sum_n \frac{\Lambda(n)}{n^\sigma} \left( 3 + 4 n^{-it} + n^{-2it} \right) = \sum_n \frac{\Lambda(n)}{n^\sigma} \left( 3 + 4 \cos(t \log n) + \cos(2t \log n) \right)$

is non-negative for $\sigma > 1$ and $t \in {\bf R}$); from the functional equation

$\displaystyle \zeta(s) = 2^s \pi^{s-1} \sin\left(\frac{\pi s}{2}\right) \Gamma(1-s) \zeta(1-s)$

(which can be viewed as a consequence of the Poisson summation formula, see e.g. my blog post on this topic) we know that there are no zeroes for $\hbox{Re}(s) < 0$ either (except for the trivial zeroes at negative even integers, corresponding to the poles of the Gamma function). Thus all the non-trivial zeroes lie in the critical strip $0 \leq \hbox{Re}(s) \leq 1$.
We also know that there are infinitely many non-trivial zeroes, and can approximately count how many zeroes there are in any large bounded region of the critical strip. For instance, for large $T$, the number of zeroes in this strip with $\hbox{Im}(\rho) \in [T, T+1]$ is $O(\log T)$. This can be seen by applying (6) to $s = 2 + iT$ (say); the trivial zeroes at the negative even integers end up giving a contribution of $O(\log T)$ to this sum (this is a heavily disguised variant of Stirling’s formula, as one can view the trivial zeroes as essentially being poles of the Gamma function), while the $\frac{1}{s-1}$ and $\ldots$ terms end up being negligible (of size $O(1)$), while each non-trivial zero $\rho$ contributes a term $\frac{1}{s-\rho}$ which has a non-negative real part, and furthermore has size comparable to $1$ if $\hbox{Im}(\rho) \in [T, T+1]$. (Here I am glossing over a technical renormalisation needed to make the infinite series in (6) converge properly.) Meanwhile, the left-hand side of (6) is absolutely convergent for $s = 2+iT$ and of size $O(1)$, and the claim follows. A more refined version of this argument shows that the number of non-trivial zeroes with $0 \leq \hbox{Im}(\rho) \leq T$ is $\frac{T}{2\pi} \log \frac{T}{2\pi} - \frac{T}{2\pi} + O(\log T)$, but we will not need this more precise formula here. (A fair fraction – at least 40%, in fact – of these zeroes are known to lie on the critical line; see this earlier blog post of mine for more discussion.)
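For a concrete feel for these counts, one can evaluate the main terms of this zero-counting formula numerically; the snippet below (an illustration of mine, including the constant $\frac{7}{8}$ that appears in the more precise Riemann–von Mangoldt formula) does so at height $T = 100$, where the actual number of zeroes is known to be $29$:

```python
import math

def zero_count_main_terms(T):
    """Main terms of the Riemann-von Mangoldt formula for the number of
    non-trivial zeros with 0 < Im(rho) <= T:
        N(T) = (T/2pi) log(T/2pi) - T/2pi + 7/8 + O(log T)."""
    x = T / (2 * math.pi)
    return x * math.log(x) - x + 7.0 / 8

print(zero_count_main_terms(100.0))  # close to 29, the true count at T = 100
```

Note also that the derivative of the main term, $\frac{1}{2\pi}\log \frac{T}{2\pi}$, recovers the local density of $O(\log T)$ zeroes per unit interval at height $T$.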
Another thing that we happen to know is how the magnitude $|\zeta(\frac{1}{2}+it)|$ of the zeta function is distributed as $t \rightarrow \infty$; it turns out to be log-normally distributed with log-variance about $\frac{1}{2} \log \log t$. More precisely, we have the following result of Selberg:
Theorem 1 Let $T$ be a large number, and let $t$ be chosen uniformly at random from between $T$ and $2T$ (say). Then the distribution of $\frac{1}{\sqrt{\frac{1}{2} \log \log T}} \log |\zeta(\frac{1}{2}+it)|$ converges (in distribution) to the normal distribution $N(0,1)$ as $T \rightarrow \infty$.
To put it more informally, $\log |\zeta(\frac{1}{2}+it)|$ behaves like $\sqrt{\frac{1}{2} \log \log t} \times N(0,1)$ plus lower order terms for “typical” large values of $t$. (Zeroes $\rho$ of $\zeta$ are, of course, certainly not typical, but one can show that one can usually stay away from these zeroes.) In fact, Selberg showed a slightly more precise result, namely that for any fixed $k \geq 1$, the $k^{th}$ moment of $\frac{1}{\sqrt{\frac{1}{2} \log \log T}} \log |\zeta(\frac{1}{2}+it)|$ converges to the $k^{th}$ moment of $N(0,1)$.
Remarkably, Selberg’s result does not need RH or GUE, though it is certainly consistent with such hypotheses. (For instance, the determinant of a GUE matrix asymptotically obeys a remarkably similar log-normal law to that given by Selberg’s theorem.) Indeed, the net effect of these hypotheses would only be to modify some error terms in $\log |\zeta(\frac{1}{2}+it)|$ of magnitude $O(1)$, which are thus asymptotically negligible compared to the main term, which has magnitude about $\sqrt{\frac{1}{2} \log \log T}$. So Selberg’s result, while very pretty, manages to finesse the question of what the zeroes of $\zeta$ are actually doing – he makes the primes do most of the work, rather than the zeroes.
Selberg never actually published the above result, but it is reproduced in a number of places (e.g. in this book by Joyner, or this book by Laurincikas). As with many other results in analytic number theory, the actual details of the proof can get somewhat technical; but I would like to record here (partly for my own benefit) an informal sketch of some of the main ideas in the argument.
— 1. Informal overview of argument —
The first step is to get a usable (approximate) formula for $\log |\zeta(\frac{1}{2}+it)|$. On the one hand, from the second part of (4) one has

$\displaystyle \log |\zeta(\tfrac{1}{2}+it)| = \sum_\rho \log |\tfrac{1}{2} + it - \rho| + \ldots \ \ \ \ \ (7)$
This formula turns out not to be directly useful because it requires one to know much more about the distribution of the zeroes $\rho$ than we currently possess. On the other hand, from the first part of (4) and (5) one also has the formula

$\displaystyle \log |\zeta(\tfrac{1}{2}+it)| = \hbox{Re} \sum_p \frac{1}{p^{1/2+it}} + \ldots \ \ \ \ \ (8)$
This formula also turns out not to be directly useful, because it requires one to know much more about the distribution of the primes than we currently possess.
However, it turns out that we can “split the difference” between (7), (8), and get a formula for $\log |\zeta(\frac{1}{2}+it)|$ which involves some zeroes and some primes, in a manner that one can control both. Roughly speaking, the formula looks like this:

$\displaystyle \log |\zeta(\tfrac{1}{2}+it)| = \hbox{Re} \sum_{p \leq T^\varepsilon} \frac{1}{p^{1/2+it}} + \sum_{\rho: |\rho - (\frac{1}{2}+it)| \leq \frac{1}{\varepsilon \log T}} \log \left|(\rho - (\tfrac{1}{2}+it))\, \varepsilon \log T\right| + \ldots \ \ \ \ \ (9)$

for $T \leq t \leq 2T$ and $0 < \varepsilon < 1$, where $\varepsilon$ is a small parameter that we can choose (e.g. $\varepsilon = 0.01$); thus we have localised the prime sum to the primes $p$ of size $O(T^\varepsilon)$, and the zero sum to those zeroes $\rho$ at a distance $O(\frac{1}{\varepsilon \log T})$ from $\frac{1}{2}+it$. (This is an oversimplification; there is a “tail” coming from those zeroes that are more distant from $\frac{1}{2}+it$ than this, and also one has to smooth out the sum in $p$ a little bit, and allow the implied constants in the $O()$ notation to depend on $\varepsilon$, but let us ignore these technical issues here, as well as the issue of what exactly is hiding in the $\ldots$ error.)
It turns out that all of these expressions can be controlled. The error term coming from the zeroes (as well as the $\ldots$ error term) turns out to be of size $O_\varepsilon(1)$ for most values of $t$, and so is a lower order term. (As mentioned before, it is this error term that would be better controlled if one had RH or GUE, but this is not necessary to establish Selberg’s result.) The main term is the one coming from the primes.
We can heuristically argue as follows. The expression $\hbox{Re} \frac{1}{p^{1/2+it}} = \frac{\cos(t \log p)}{\sqrt{p}}$, for $t$ ranging between $T$ and $2T$, is a random variable of mean approximately zero and variance approximately $\frac{1}{2p}$ (if $p \leq T^\varepsilon$ and $\varepsilon$ is small). Making the heuristic assumption that the $\frac{\cos(t \log p)}{\sqrt{p}}$ behave as if they were independent, the central limit theorem then suggests that the sum $\hbox{Re} \sum_{p \leq T^\varepsilon} \frac{1}{p^{1/2+it}}$ should behave like a normal distribution of mean zero and variance $\sum_{p \leq T^\varepsilon} \frac{1}{2p}$. But the claim now follows from the classical estimate

$\displaystyle \sum_{p \leq x} \frac{1}{p} = \log \log x + O(1)$

(which follows from the prime number theorem, but can also be deduced from the formula (8) for $\log |\zeta|$, using the fact that $\zeta$ has a simple pole at $s=1$).
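Both halves of this heuristic are easy to test numerically. The sketch below (illustrative only; the cutoffs $X$, $T$ and the number of samples are arbitrary choices of mine) checks the classical estimate for $\sum_{p \leq x} \frac{1}{p}$, and then estimates the mean and variance of the prime sum for randomly chosen $t$:

```python
import math
import random

def primes_upto(n):
    """Sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p in range(2, n + 1) if sieve[p]]

# Mertens-type estimate: sum_{p <= x} 1/p - log log x stays bounded
# (in fact it converges to the Mertens constant 0.2615...).
x = 10 ** 6
gap = sum(1.0 / p for p in primes_upto(x)) - math.log(math.log(x))
print(gap)

# Monte Carlo check of the variance heuristic: for t uniform in [T, 2T],
# S(t) = sum_{p <= X} cos(t log p)/sqrt(p) should have mean near 0 and
# variance near sum_{p <= X} 1/(2p).
random.seed(0)
X, T, trials = 10 ** 4, 1e7, 600
data = [(math.log(p), 1 / math.sqrt(p)) for p in primes_upto(X)]
samples = []
for _ in range(trials):
    t = random.uniform(T, 2 * T)
    samples.append(sum(w * math.cos(t * lp) for lp, w in data))
mean = sum(samples) / trials
variance = sum((v - mean) ** 2 for v in samples) / trials
predicted = sum(0.5 / p for p in primes_upto(X))
print(mean, variance, predicted)
```

The empirical variance comes out close to the predicted $\sum_{p \leq X} \frac{1}{2p} \approx \frac{1}{2} \log \log X + O(1)$, which is the mechanism behind the log-variance in Theorem 1.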
To summarise, there are three main tasks to establish Selberg’s theorem:
- Establish a formula along the lines of (9);
- Show that the error terms arising from zeroes are $O_\varepsilon(1)$ on the average;
- Justify the central limit calculation for the prime sum $\hbox{Re} \sum_{p \leq T^\varepsilon} \frac{1}{p^{1/2+it}}$.
I’ll now briefly talk (informally) about each of the three steps in turn.
— 2. The explicit formula —
To get a formula such as (9), the basic strategy is to take a suitable average of the formula (7) and the formula (8). Traditionally, this is done by contour integration; however, I prefer (perhaps idiosyncratically) to take a more Fourier-analytic perspective, using convolutions rather than contour integrals. (The two approaches are largely equivalent, though.) The basic point is that the imaginary parts $\hbox{Im}(\rho)$ of the zeroes inhabit the same space as the imaginary part $t$ of the $s$ variable, which in turn is the Fourier analytic dual of the variable $\log n$ that the logarithms of the primes live in; this can be seen by writing (7), (8) in a more Fourier-like manner as

$\displaystyle \sum_\rho \log |\tfrac{1}{2} + it - \rho| + \ldots = \log |\zeta(\tfrac{1}{2}+it)| = \hbox{Re} \sum_p \frac{1}{\sqrt{p}} e^{-it \log p} + \ldots.$
(These sorts of Fourier-analytic connections are often summarised by the slogan “the zeroes of the zeta function are the music of the primes”.) The uncertainty principle then predicts that localising the primes to the region $\log p = O(\varepsilon \log T)$ should result in blurring out the zeroes at scale $O(\frac{1}{\varepsilon \log T})$, which is where (9) is going to come from. More precisely, let us consider the smoothed average

$\displaystyle \int_{\bf R} \log |\zeta(\tfrac{1}{2} + i(t+u))|\ \varepsilon \log T\ \eta( \varepsilon u \log T )\ du \ \ \ \ \ (10)$

where $\eta$ is some bump function with total mass $1$; informally, (10) is $\log |\zeta(\frac{1}{2}+it)|$ averaged out in the vertical direction at scale $\frac{1}{\varepsilon \log T}$ (we allow implied constants to depend on $\eta$).
We can express (10) in two different ways, one using (7), and one using (8). Let’s look at (7) first. If one modifies $t$ by $O(\frac{1}{\varepsilon \log T})$, then the quantity $\log |\zeta(\frac{1}{2}+it)|$ doesn’t fluctuate very much, unless $\frac{1}{2}+it$ is within $O(\frac{1}{\varepsilon \log T})$ of a zero $\rho$, in which case it can move by about $\log \frac{1}{|\rho - (\frac{1}{2}+it)| \varepsilon \log T}$. As a consequence, we see that

$\displaystyle (10) \approx \log |\zeta(\tfrac{1}{2}+it)|$

when no zero lies within $O(\frac{1}{\varepsilon \log T})$ of $\frac{1}{2}+it$, and more generally

$\displaystyle (10) \approx \log |\zeta(\tfrac{1}{2}+it)| + \sum_{\rho: |\rho - (\frac{1}{2}+it)| \leq \frac{1}{\varepsilon \log T}} \log \frac{1}{|\rho - (\frac{1}{2}+it)| \varepsilon \log T}.$

Now let us compute (10) using (8) instead; inserting (8) into (10), each prime $p$ acquires the averaging weight $\int_{\bf R} p^{-iu}\ \varepsilon \log T\ \eta(\varepsilon u \log T)\ du$. Introducing the Fourier transform $\hat \eta(\xi) := \int_{\bf R} \eta(v) e^{-iv\xi}\ dv$ of $\eta$, one can write this as

$\displaystyle (10) = \hbox{Re} \sum_p \frac{1}{p^{1/2+it}} \hat \eta\left( \frac{\log p}{\varepsilon \log T} \right) + \ldots.$

Now we took $\eta$ to be a bump function, so its Fourier transform $\hat \eta$ should also be like a bump function (or perhaps a Schwartz function). As a first approximation, one can thus think of $\hat \eta$ as a smoothed truncation to the region $\{ \xi = O(1) \}$, thus the weight $\hat \eta( \frac{\log p}{\varepsilon \log T} )$ is morally restricting $p$ to the region $p \leq T^{O(\varepsilon)}$. Thus we (morally) can express (10) as

$\displaystyle (10) \approx \hbox{Re} \sum_{p \leq T^{O(\varepsilon)}} \frac{1}{p^{1/2+it}} + \ldots;$

comparing this with the expression for (10) coming from (7) then yields (9).
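The smoothing mechanism here can be illustrated numerically. The sketch below uses a Gaussian weight in place of the bump function $\eta$ (a simplification of mine: a Gaussian is not compactly supported, but its Fourier transform $e^{-\xi^2/2}$ is explicit). Averaging $\cos((t+u)\xi)$ at scale $\delta$ in $u$ multiplies its amplitude by $\hat\eta(\delta\xi)$, so frequencies much larger than $1/\delta$ are essentially wiped out, exactly as the uncertainty principle predicts:

```python
import math

def smoothed_cos(xi, t, delta, n=4000):
    """Average cos((t+u)*xi) in u against a Gaussian of width delta
    (normalised to total mass 1), via a midpoint Riemann sum."""
    L = 6 * delta  # truncate the Gaussian at 6 standard deviations
    du = 2 * L / n
    total = 0.0
    for k in range(n):
        u = -L + (k + 0.5) * du
        weight = math.exp(-u * u / (2 * delta * delta)) / (delta * math.sqrt(2 * math.pi))
        total += math.cos((t + u) * xi) * weight * du
    return total

# Smoothing at scale delta multiplies the amplitude of frequency xi by
# exp(-(delta*xi)^2/2), the Fourier transform of the Gaussian weight.
t, delta = 1.3, 0.5
for xi in (1.0, 5.0, 20.0):
    print(xi, smoothed_cos(xi, t, delta),
          math.cos(t * xi) * math.exp(-(delta * xi) ** 2 / 2))
```

The two printed columns agree closely; in particular the high frequency $\xi = 20$ is attenuated to essentially zero.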
— 3. Controlling the zeroes —
Next, we want to show that the quantity

$\displaystyle \sum_{\rho: |\rho - (\frac{1}{2}+it)| \leq \frac{1}{\varepsilon \log T}} \log \frac{1}{|\rho - (\frac{1}{2}+it)| \varepsilon \log T} \ \ \ \ \ (11)$

is $O_\varepsilon(1)$ on the average, when $T$ is large and $t$ is chosen uniformly at random from $T$ to $2T$.
For this, we can use the first moment method. For each zero $\rho$, let $I_\rho$ be the random variable which equals $\log \frac{1}{|\rho - (\frac{1}{2}+it)| \varepsilon \log T}$ when $|\rho - (\frac{1}{2}+it)| \leq \frac{1}{\varepsilon \log T}$ and zero otherwise, thus we are trying to control the expectation of $\sum_\rho I_\rho$. The only zeroes which are relevant are those whose imaginary part is of size $O(T)$, and we know that there are $O(T \log T)$ of these (indeed, we have an even more precise formula, as remarked earlier). On the other hand, a randomly chosen $t$ has a probability of $O( \frac{1}{\varepsilon T \log T} )$ of falling within $\frac{1}{\varepsilon \log T}$ of $\hbox{Im}(\rho)$, and so we expect each $I_\rho$ to have an expected value of $O( \frac{1}{\varepsilon T \log T} )$. (The logarithmic factor in the definition of $I_\rho$ turns out not to be an issue, basically because $\log \frac{1}{x}$ is locally integrable.) By linearity of expectation, we conclude that $\sum_\rho I_\rho$ has expectation $O(\frac{1}{\varepsilon}) = O_\varepsilon(1)$, and the claim follows.
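The bookkeeping here can be seen in a toy Monte Carlo computation, in which the ordinates of the zeroes between $T$ and $2T$ are modelled (purely for illustration; only the density matters for the first moment) by about $\frac{\log T}{2\pi} \cdot T$ uniformly scattered points, and we count how many fall within $\frac{1}{\varepsilon \log T}$ of a random $t$. The expected count works out to $\frac{1}{\pi \varepsilon} = O(\frac{1}{\varepsilon})$, independently of $T$. (I am dropping the logarithmic weight in $I_\rho$ here, which as noted above is harmless.)

```python
import math
import random

random.seed(1)
T = 1000.0
eps = 0.1
# Model: scatter about (log T / 2pi) * T points uniformly in [T, 2T],
# mimicking the density of zero ordinates at height around T.
density = math.log(T) / (2 * math.pi)
zeros = [random.uniform(T, 2 * T) for _ in range(int(density * T))]
window = 1 / (eps * math.log(T))

trials = 2000
hits = 0
for _ in range(trials):
    t = random.uniform(T, 2 * T)
    hits += sum(1 for g in zeros if abs(g - t) <= window)

expected = 2 * window * density  # = 1/(pi*eps), independent of T
print(hits / trials, expected)
```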
Remark 1 One can actually do a bit better than this, showing that higher order moments of (11) are also $O_\varepsilon(1)$, by using a variant of (9) together with the moment bounds in the next section; but we will not need that refinement here.
— 4. The central limit theorem —
Finally, we have to show that the prime sum $\hbox{Re} \sum_{p \leq T^\varepsilon} \frac{1}{p^{1/2+it}} = \sum_{p \leq T^\varepsilon} \frac{\cos(t \log p)}{\sqrt{p}}$ behaves like a normal distribution, as predicted by the central limit theorem heuristic. The key is to show that the $\cos(t \log p)$ behave “as if” they were jointly independent. In particular, as the $\cos(t \log p)$ all have mean approximately zero, one would like to show that products such as

$\displaystyle \cos(t \log p_1) \ldots \cos(t \log p_k) \ \ \ \ \ (12)$

have a negligible expectation as long as at least one of the primes in $p_1,\ldots,p_k$ occurs at most once. Once one has this (as well as a similar formula for the case when all primes appear at least twice), one can then do a standard computation of the $k^{th}$ moment of the prime sum and verify that this moment matches the answer predicted by the central limit theorem, which by standard arguments (involving the Weierstrass approximation theorem) is enough to establish the distributional law. Note that to get close to the normal distribution by a fixed amount of accuracy, it suffices to control a bounded number of moments, which ultimately means that we can treat $k$ as being bounded, $k = O(1)$.
If we expand out the product (12), we get

$\displaystyle \frac{1}{2^k} \prod_{j=1}^k \left( e^{it \log p_j} + e^{-it \log p_j} \right).$

Using the product formula for cosines (or Euler’s formula), the product of cosines here can be expressed as a linear combination of cosines $\cos(t \xi)$, where the frequency $\xi$ takes the form

$\displaystyle \xi = \pm \log p_1 \pm \log p_2 \pm \ldots \pm \log p_k.$
Thus, $\xi$ is the logarithm of a rational number, whose numerator and denominator are the product of some of the $p_1,\ldots,p_k$. Since all the $p_j$ are at most $T^\varepsilon$, we see that the numerator and denominator here are at most $T^{k\varepsilon}$.
Now for the punchline. If there is a prime in $p_1,\ldots,p_k$ that appears only once, then the numerator and denominator cannot fully cancel, by the fundamental theorem of arithmetic. Thus $\xi$ cannot be $0$. Furthermore, since the numerator and denominator are distinct integers of size at most $T^{k\varepsilon}$, we see that $\xi$ must stay away from $0$ by a distance of about $T^{-k\varepsilon}$ or more, and so $\cos(t\xi)$ has a wavelength of at most $O(T^{k\varepsilon})$. On the other hand, $t$ ranges between $T$ and $2T$. If $k$ is fixed and $\varepsilon$ is small enough (much smaller than $1/k$), this wavelength is much shorter than the length of the range of $t$, and we thus see that the average value of $\cos(t \xi)$ between $T$ and $2T$ is close to zero, and so (12) does indeed have negligible expectation as claimed. (A similar argument lets one compute the expectation of (12) when all primes appear at least twice.)
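One can see this quasi-orthogonality numerically. The sketch below (with an arbitrary choice $T = 10^6$ on my part) averages a few products of the form (12) over random $t \in [T, 2T]$: a product in which some prime appears an odd number of times averages out to nearly zero, while each repeated pair contributes a factor of $\frac{1}{2}$, since $\cos^2$ has mean $\frac{1}{2}$:

```python
import math
import random

random.seed(2)
T = 1e6

def avg_product(primes, trials=4000):
    """Monte Carlo average of prod_j cos(t log p_j), t uniform in [T, 2T]."""
    total = 0.0
    for _ in range(trials):
        t = random.uniform(T, 2 * T)
        value = 1.0
        for p in primes:
            value *= math.cos(t * math.log(p))
        total += value
    return total / trials

print(avg_product([2, 3]))        # each prime appears once: mean near 0
print(avg_product([2, 2]))        # cos^2 has mean 1/2
print(avg_product([2, 2, 3, 3]))  # two repeated pairs: mean near 1/4
```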
Remark 2 A famous theorem of Erdős and Kac gives a normal distribution for the number of prime factors $\omega(n)$ of a large number $n$, with mean $\log \log n$ and variance $\log \log n$. One can view Selberg’s theorem as a sort of Fourier-analytic variant of the Erdős–Kac theorem.
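For comparison, here is a quick empirical look at the Erdős–Kac statistics (again just an illustration of mine; the convergence in the Erdős–Kac theorem is famously slow, so the finite-size corrections to the variance are still sizeable at this height):

```python
import math
import random

def omega(n):
    """Number of distinct prime factors of n, by trial division."""
    count, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            count += 1
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        count += 1
    return count

random.seed(3)
N = 10 ** 7
samples = [omega(random.randint(2, N)) for _ in range(2000)]
mean = sum(samples) / len(samples)
variance = sum((v - mean) ** 2 for v in samples) / len(samples)
# The mean is close to log log N plus a bounded constant; the variance
# approaches log log N only very slowly as N grows.
print(mean, variance, math.log(math.log(N)))
```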
Remark 3 The Fourier-like correspondence between zeroes of the zeta function and primes can be used to convert statements about zeroes, such as the Riemann hypothesis and the GUE hypothesis, into equivalent statements about primes. For instance, the Riemann hypothesis is equivalent to having the square root error term

$\displaystyle \sum_{n \in I} \Lambda(n) = |I| + O( x^{1/2+o(1)} )$

in the prime number theorem holding asymptotically as $x \rightarrow \infty$ for all $x$ and all intervals $I$ in $[x,2x]$ which are large in the sense that $|I|$ is comparable to $x$. Meanwhile, the pair correlation conjecture (the simplest component of the GUE hypothesis) is equivalent (on RH) to the square root error term holding (with the expected variance) for all $x$ and almost all intervals $I$ in $[x,2x]$ which are short in the sense that $|I|$ is comparable to $x^\theta$ for some small (fixed) $\theta > 0$. (This is a rough statement; a more precise formulation can be found in this paper of Goldston and Montgomery.) It seems to me that the reformulation of the full GUE hypothesis in terms of primes should be similar, but would assert that the error term in the prime number theorem (as well as variants of this theorem for almost primes) in short intervals enjoys the expected normal distribution; I don’t know of a precise formulation of this assertion, but calculations in this direction lie in this paper of Bogomolny and Keating.