A fundamental problem in analytic number theory is to understand the distribution of the prime numbers . For technical reasons, it is convenient not to study the primes directly, but a proxy for the primes known as the von Mangoldt function , defined by setting to equal when is a prime (or a power of that prime) and zero otherwise. The basic reason why the von Mangoldt function is useful is that it encodes the fundamental theorem of arithmetic (which in turn can be viewed as the defining property of the primes) very neatly via the identity
for every natural number .
The most important result in this subject is the prime number theorem, which asserts that the number of prime numbers less than a large number is equal to :
Here, of course, denotes a quantity that goes to zero as .
It is not hard to see (e.g. by summation by parts) that this is equivalent to the asymptotic
for the von Mangoldt function (the key point being that the squares, cubes, etc. of primes give a negligible contribution, so is essentially the same quantity as ). Understanding the nature of the term is a very important problem, with the conjectured optimal decay rate of being equivalent to the Riemann hypothesis, but this will not be our concern here.
The prime number theorem has several important generalisations (for instance, there are analogues for other number fields such as the Chebotarev density theorem). One of the more elementary such generalisations is the prime number theorem in arithmetic progressions, which asserts that for fixed and with coprime to (thus ), the number of primes less than equal to mod is equal to , where is the Euler totient function:
(Of course, if is not coprime to , the number of primes less than equal to mod is . The subscript in the and notation denotes that the implied constants in that notation is allowed to depend on .) This is a more quantitative version of Dirichlet’s theorem, which asserts the weaker statement that the number of primes equal to mod is infinite. This theorem is important in many applications in analytic number theory, for instance in Vinogradov’s theorem that every sufficiently large odd number is the sum of three odd primes. (Imagine for instance if almost all of the primes were clustered in the residue class mod , rather than mod . Then almost all sums of three odd primes would be divisible by , leaving dangerously few sums left to cover the remaining two residue classes. Similarly for other moduli than . This does not fully rule out the possibility that Vinogradov’s theorem could still be true, but it does indicate why the prime number theorem in arithmetic progressions is a relevant tool in the proof of that theorem.)
As before, one can rewrite the prime number theorem in arithmetic progressions in terms of the von Mangoldt function as the equivalent form
Philosophically, one of the main reasons why it is so hard to control the distribution of the primes is that we do not currently have too many tools with which one can rule out “conspiracies” between the primes, in which the primes (or the von Mangoldt function) decide to correlate with some structured object (and in particular, with a totally multiplicative function) which then visibly distorts the distribution of the primes. For instance, one could imagine a scenario in which the probability that a randomly chosen large integer is prime is not asymptotic to (as is given by the prime number theorem), but instead to fluctuate depending on the phase of the complex number for some fixed real number , thus for instance the probability might be significantly less than when is close to an integer, and significantly more than when is close to a half-integer. This would contradict the prime number theorem, and so this scenario would have to be somehow eradicated in the course of proving that theorem. In the language of Dirichlet series, this conspiracy is more commonly known as a zero of the Riemann zeta function at .
In the above scenario, the primality of a large integer was somehow sensitive to asymptotic or “Archimedean” information about , namely the approximate value of its logarithm. In modern terminology, this information reflects the local behaviour of at the infinite place . There are also potential consipracies in which the primality of is sensitive to the local behaviour of at finite places, and in particular to the residue class of mod for some fixed modulus . For instance, given a Dirichlet character of modulus , i.e. a completely multiplicative function on the integers which is periodic of period (and vanishes on those integers not coprime to ), one could imagine a scenario in which the probability that a randomly chosen large integer is prime is large when is close to , and small when is close to , which would contradict the prime number theorem in arithmetic progressions. (Note the similarity between this scenario at and the previous scenario at ; in particular, observe that the functions and are both totally multiplicative.) In the language of Dirichlet series, this conspiracy is more commonly known as a zero of the -function of at .
An especially difficult scenario to eliminate is that of real characters, such as the Kronecker symbol , in which numbers which are quadratic nonresidues mod are very likely to be prime, and quadratic residues mod are unlikely to be prime. Indeed, there is a scenario of this form – the Siegel zero scenario – which we are still not able to eradicate (without assuming powerful conjectures such as GRH), though fortunately Siegel zeroes are not quite strong enough to destroy the prime number theorem in arithmetic progressions.
It is difficult to prove that no conspiracy between the primes exist. However, it is not entirely impossible, because we have been able to exploit two important phenomena. The first is that there is often a “all or nothing dichotomy” (somewhat resembling the zero-one laws in probability) regarding conspiracies: in the asymptotic limit, the primes can either conspire totally (or more precisely, anti-conspire totally) with a multiplicative function, or fail to conspire at all, but there is no middle ground. (In the language of Dirichlet series, this is reflected in the fact that zeroes of a meromorphic function can have order , or order (i.e. are not zeroes after all), but cannot have an intermediate order between and .) As a corollary of this fact, the prime numbers cannot conspire with two distinct multiplicative functions at once (by having a partial correlation with one and another partial correlation with another); thus one can use the existence of one conspiracy to exclude all the others. In other words, there is at most one conspiracy that can significantly distort the distribution of the primes. Unfortunately, this argument is ineffective, because it doesn’t give any control at all on what that conspiracy is, or even if it exists in the first place!
But now one can use the second important phenomenon, which is that because of symmetries, one type of conspiracy can lead to another. For instance, because the von Mangoldt function is real-valued rather than complex-valued, we have conjugation symmetry; if the primes correlate with, say, , then they must also correlate with . (In the language of Dirichlet series, this reflects the fact that the zeta function and -functions enjoy symmetries with respect to reflection across the real axis (i.e. complex conjugation).) Combining this observation with the all-or-nothing dichotomy, we conclude that the primes cannot correlate with for any non-zero , which in fact leads directly to the prime number theorem (2), as we shall discuss below. Similarly, if the primes correlated with a Dirichlet character , then they would also correlate with the conjugate , which also is inconsistent with the all-or-nothing dichotomy, except in the exceptional case when is real – which essentially means that is a quadratic character. In this one case (which is the only scenario which comes close to threatening the truth of the prime number theorem in arithmetic progressions), the above tricks fail and one has to instead exploit the algebraic number theory properties of these characters instead, which has so far led to weaker results than in the non-real case.
As mentioned previously in passing, these phenomena are usually presented using the language of Dirichlet series and complex analysis. This is a very slick and powerful way to do things, but I would like here to present the elementary approach to the same topics, which is slightly weaker but which I find to also be very instructive. (However, I will not be too dogmatic about keeping things elementary, if this comes at the expense of obscuring the key ideas; in particular, I will rely on multiplicative Fourier analysis (both at and at finite places) as a substitute for complex analysis in order to expedite various parts of the argument. Also, the emphasis here will be more on heuristics and intuition than on rigour.)
The material here is closely related to the theory of pretentious characters developed by Granville and Soundararajan, as well as an earlier paper of Granville on elementary proofs of the prime number theorem in arithmetic progressions.
— 1. A heuristic elementary proof of the prime number theorem —
To motivate some of the later discussion, let us first give a highly non-rigorous heuristic elementary “proof” of the prime number theorem (2). Since we clearly have
where we will be deliberately vague as to what the “” symbol means. (One can think of this symbol as denoting some sort of proximity in the weak topology or vague topology, after suitable normalisation.)
(one can also use Stirling’s formula to obtain a similar asymptotic). As for the left-hand side, we write and then make the substitution to obtain
The right-hand side is the number of lattice points underneath the hyperbola , and can be counted using the Dirichlet hyperbola method:
The third sum is equal to . The second sum is equal to the first. The first sum can be computed as
Comparing this with (7) we do see that and are roughly equal “to top order” on average, thus giving some form of (5) and hence (4); if one could somehow invert the divisor sum operation, one could hope to get (3) and thus the prime number theorem.
(Looking at the next highest order terms in (7), (9), we see that we expect to in fact be slightly larger than on the average, and so should be slightly less than on the average. There is indeed a slight effect of this form; for instance, it is possible (using the prime number theorem) to prove
which should be compared with (8).)
One can partially translate the above discussion into the language of Dirichlet series, by transforming various arithmetical functions to their associated Dirichlet series
ignoring for now the issue of convergence of this series. By definition, the constant function transforms to the Riemann zeta function . Taking derivatives in , we see (formally, at least) that if has Dirichlet series , then has Dirichlet series ; thus, for instance, has Dirichlet series .
Most importantly, though, if have Dirichlet series respectively, then their Dirichlet convolution has Dirichlet series ; this is closely related to the well-known ability of the Fourier transform to convert convolutions to pointwise multiplication. Thus, for instance, has Dirichlet series . Also, from (1) and the preceding discussion, we see that has Dirichlet series (formally, at least). This already suggests that the von Mangoldt function will be sensitive to the zeroes of the zeta function.
An integral test computation closely related to (8) gives the asymptotic
for close to one (and , to avoid issues of convergence). This implies that the Dirichlet series for has asymptotic
thus giving support to (3); similarly, the Dirichlet series for and have asymptotic
Remark 1 One can connect the properties of Dirichlet series more rigorously to asymptotics of partial sums by means of various transforms in Fourier analysis and complex analysis, in particular contour integration or the Hilbert transform, but this becomes somewhat technical and we will not do so here. I will remark, though, that asymptotics of for close to are not enough, by themselves, to get really precise asymptotics for the sharply truncated partial sums , for reasons related to the uncertainty principle; in order to control such sums one also needs to understand the behaviour of far away from , and in particular for for large real . On the other hand, the asymptotics for for near are just about all one needs to control smoothly truncated partial sums such as for suitable cutoff functions . Also, while Dirichlet series are very powerful tools, particularly with regards to understanding Dirichlet convolution identities, and controlling everything in terms of the zeroes and poles of such series, they do have the drawback that they do not easily encode such fundamental “physical space” facts as the pointwise inequalities and , which are also an important aspect of the theory.
— 2. Almost primes —
One can hope to make the above heuristics precise by applying the Möbius inversion formula
where is the Möbius function, defined as when is the product of distinct primes for some , and zero otherwise. In terms of Dirichlet series, we thus see that has the Dirichlet series of , and so can invert the divisor sum operation (which corresponds to multiplication by ):
From (1) we then conclude
As noted in this previous post, we have
so we only obtain a bound of for (12) instead of . (A refinement of this argument, though, shows that the prime number theorem would follow if one had the asymptotic , which is in fact equivalent to the prime number theorem; see the previous post.)
We remark that if one computed or by the above methods, one would eventually be led to a variant of (13), namely
which is an estimate which will be useful later.
So we see that when trying to sum the von Mangoldt function by elementary means, the error term overwhelms the main term . But there is a slight tweaking of the von Mangoldt function, the second von Mangoldt function , that increases the size of the main term to while keeping the error term at , thus leading to a useful estimate; the price one pays for this is that this function is now a proxy for the almost primes rather than the primes. This function is defined by a variant of (10), namely
It is not hard to see that vanishes once has at least three distinct prime factors (basically because the quadratic function vanishes after being differentiated three or more times). Indeed, one can easily verify the identity
(which corresponds to the Dirichlet series identity ); the first term is mostly concentrated on primes, while the second term is mostly concentrated on semiprimes (products of two distinct primes).
Now let us sum . In analogy with the previous discussion, we will do so by comparing the function with something involving the divisor function. In view of (5), it is reasonable to try the approximation
Now we make these heuristics more precise. From the integral test we have
while from (9) and summation by parts one has
for some other constants .
One can view (20) as the “almost prime number theorem” – the analogue of the prime number theorem for almost primes.
The fact that the almost primes have a relatively easy asymptotic, while the genuine primes do not, is a reflection of the parity problem in sieve theory; see this post for further discussion. The symmetry formula is however enough to get “within a factor of two” of the prime number theorem: if we discard the semiprimes from (16), we see that , and thus
which by a summation by parts argument leads to
which is within a factor of of (2) in some sense.
One can “twist” all of the above arguments by a Dirichlet character . For instance, (15) twists to
On the other hand, if is a non-principal character of modulus , then it has mean zero on any interval with length , and it is then not hard to establish the asymptotic
This soon leads to the twisted version of (20):
thus almost primes are asymptotically unbiased with respect to non-principal characters.
From the multiplicative Fourier analysis of Dirichlet characters modulo (and the observation that is quite small on residue classes not coprime to ) one then has an “almost prime number theorem in arithmetic progressions”:
As before, this lets us come within a factor of two of the actual prime number theorem in arithmetic progressions:
One can also twist things by the completely multiplicative function , but with the caveat that the approximation to can locally correlate with . Thus for instance one has
for any fixed and ; in particular, if is non-principal, one has
— 3. The all-or-nothing dichotomy —
To summarise so far, the almost primes (as represented by ) are quite uniformly distributed. These almost primes can be split up into the primes (as represented by ) and the semiprimes (as represented by ), thanks to (16).
One can rewrite (16) as a recursive formula for :
This recursion, combined with the uniform distribution properties on , lead to various all-or-nothing dichotomies for . Suppose, for instance, that behaves like a constant on the average for some non-principal character :
Then (from (5)) we expect to behave like , thus
On the other hand, from (21), is asymptotically uncorrelated with :
Putting all this together, one obtains
which suggests that must be either close to , or close to .
Basically, the point is that there are only two equilibria for the recursion (23). One equilibrium occurs when is asymptotically uncorrelated with ; the other is when it is completely anti-correlated with , so that is supported primarily on those for which is close to . Note in the latter case for most primes , and thus for most semiprimes , thus leading to an equidistribution of for almost primes (weighted by ). Any intermediate distribution of would be inconsistent with the distribution of . (In terms of Dirichlet series, this assertion corresponds to the fact that the -function of either has a zero of order , or a zero of order (i.e. not a zero at all) at .)
A similar phenomenon occurs when twisting by ; basically, the average value of must asymptotically either be close to , or close to ; no other asymptotic ends up being compatible with the distribution of . (Again, this corresponds to the fact that the Riemann zeta function has a zero of order or at .) More generally, the average value of must asymptotically approach either or .
Remark 2 One can make the above heuristics precise either by using Dirichlet series (and analytic continuation, and the theory of zeroes of meromorphic functions), or by smoothing out arithmetic functions such as by a suitable multiplicative convolution with a mollifier (as is basically done in elementary proofs of the prime number theorem); see also the Granville-Soundararajan theory of pretentious characters for a closely related theory. We will not pursue these details here, however.
— 4. Dueling conspiracies —
In the previous section we have seen (heuristically, at least), that the von Mangoldt function (or more precisely, ) will either have no correlation, or a maximal amount of anti-correlation, with a completely multiplicative function such as , , or . On the other hand, it is not possible for this function to maximally anti-correlate (or to conspire) with two such functions; thus the presence of one conspiracy excludes the presence of all others.
Suppose for instance that we had two distinct non-principal characters for which one had maximal anti-correlation:
One could then combine the two statements to obtain
Meanwhile, doesn’t correlate with either or . It will be convenient to exploit this to normalise , obtaining
On the other hand, since , one has
and hence by the triangle inequality
in the sense that averages of the left-hand side should be at least as large as averages of the right-hand side. From this, (18), and Cauchy-Schwarz, one thus expects
Remark 3 The above argument belongs to a family of -based arguments which go by various names (almost orthogonality, , large sieve, etc.). The argument can more generally be used to establish square-summability estimates on averages such as as varies, but we will not make this precise here.
As one consequence of the above arguments, one can show that cannot maximally anti-correlate with any non-real character , since (by the reality of ) it would then also maximally anti-correlate with the complex conjugate , which is distinct from . A similar argument shows that cannot maximally anti-correlate with for any non-zero , a fact which can soon lead to the prime number theorem, either by Dirichlet series methods, by Fourier-analytic means, or by elementary means. (Sketch of Fourier-analytic proof: methods provide -type bounds on the averages of in , while the above arguments show that these averages are also small in . Applying (22) a few times to take advantage of the smoothing effects of convolution, one eventually concludes that these averages can be made arbitrarily small in asymptotically, at which point the prime number theorem follows from Fourier inversion.)
Remark 4 There is a slightly different argument of an nature rather than an nature (i.e. using tools such as the triangle inequality, union bound, etc.) that can also achieve similar results. For instance, suppose that maximally anti-correlates with and . Then for most primes , which implies that for most primes , which is incompatible with the all-or-nothing dichotomy unless is principal. This is an alternate way to exclude correlation with non-real characters. Similarly, if , then , which is also incompatible with the zero-one law; this is essentially the method underlying the standard proof of the prime number theorem (which relates with ).
— 5. Quadratic characters —
The one difficult scenario to eliminate is that of maximal anti-correlation with a real non-principal (i.e. quadratic) character , thus
This scenario implies that the quantity
vanishes. Indeed, if one starts with the identity
and sums in , one sees that
The left-hand side is by the mean zero and periodicity properties of . To estimate the right-hand side, we use the hyperbola method and rewrite it as
for some parameter (sufficiently slowly growing in ) to be optimised later. Writing and , we can express this as
sending (and at a slower rate) we conclude as required.
It is remarkably difficult to show that does not, in fact, vanish. One way to do this is to use the class number formula, that relates this quantity to the class number of the quadratic number field associated to the conductor of , together with some related number-theoretic quantities. A more elementary (but significantly weaker) method proceeds by using the easily verified fact that the convolution is non-negative, and is at least on the squares; this should be interpreted as a fact from algebraic number theory, and basically corresponds to the fact that the number of representations of an integer as the norm of an integer in (or more generally, as the norm of an ideal in that ring) is non-negative, and is at least on the squares. In particular we have
Putting all this together we have
which leads to a contradiction as if vanishes.
Note in fact that the above argument shows that is positive. If one carefully computes the dependence of the above argument on the modulus , one obtains a lower bound of the form , which is quite poor. Using a non-trivial improvement on the error term in counting lattice points under the hyperbola (or better still, by smoothing the sum ), one can improve this a bit, to . In contrast, the class number method gives a bound .
We can improve this even further for all but at most one real primitive character :
Theorem 1 (Siegel’s theorem) For every , one has for all but at most one real primitive character , where the implied constant is effective, and is the modulus of .
Throwing in this (hypothetical) one exceptional character, we conclude that for all real primitive characters , but now the implied constant is ineffective, which is the usual way in which Siegel’s theorem is formulated (but the above nearly effective refinement can be obtained by the same methods). It is a major open problem in the subject to eliminate this exceptional character and recover an effective estimate for some .
Proof: Let (which we can assume to be small), and let be a small number depending (effectively) on to be chosen later. Our task is to show that for all but at most one primitive real character . Note we may assume is large (effectively) depending on , as the claim follows from the previous bounds on otherwise.
Suppose then for contradiction that and for two distinct primitive real characters of (large) modulus respectively.
for any and any real . (One can get slightly better bounds by exploiting that is also at least on square numbers, as before, but this is really only useful for , and we are now going to take much closer to .)
On the other hand, one has the asymptotics
for any real close (but not equal) to , and similarly
for all real sufficiently close to . Indeed, one can expand the left-hand side of (26) as
and the claim then follows from the previous asymptotics. (One can improve the error term by smoothing the summation, but we will not need to do so here.)
Now set for a large absolute constant . If , then the error term in is then at most (say) if is large enough. We conclude from (25) that
for . Since and is assumed small (depending on ), this implies that is positive in the range
(this can be seen by checking the cases and separately). On the other hand, has a simple pole at with positive residue, and is thus negative for extremely close to . By the intermediate value theorem, we conclude that has a zero for some . Conversely, it is not difficult (using summation by parts) to show that for , and so by the mean value theorem we see that the zero of must also obey . Thus has a zero for some with
Now, we consider the function
One can also show that is non-negative and equals at , thus
(The algebraic number theory interpretation of this positivity is that is the number of representations of as the norm of an ideal in the biquadratic field generated by and .)
Also, by (a more complicated version of) the derivation of (26), one has
(say). Arguing as before, we conclude that
for . Using the bound (which can be established by summation by parts), we conclude that is positive in the range
taking geometric means and rearranging we obtain
But this contradicts the hypotheses , if is small enough.
Remark 5 Siegel’s theorem leads to a version of the prime number theorem in arithmetic progressions known as the Siegel-Walfisz theorem. As with Siegel’s theorem, the bounds are ineffective unless one is allowed to exclude a single exceptional modulus (and its multiples), in which case one has a modified prime number theorem which favours the quadratic nonresidues mod ; see this paper of Granville.
Remark 6 One can improve the effective bounds in Siegel’s theorem if one is allowed to exclude a larger set of bad moduli. For instance, the arguments in Section 4 allow one to establish a bound of the form after excluding at most one in each hyper-dyadic range for each ; one can of course replace by other exponents here, but at the cost of worsening the term. (This is essentially an observation of Landau.)