Mertens’ theorems are a set of classical estimates concerning the asymptotic distribution of the prime numbers:

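Theorem 1 (Mertens’ theorems) In the asymptotic limit $x \to \infty$, we have

$$\sum_{p \leq x} \frac{\log p}{p} = \log x + O(1), \tag{1}$$

$$\sum_{p \leq x} \frac{1}{p} = \log\log x + O(1), \tag{2}$$

and

$$\sum_{p \leq x} \log\left(1 - \frac{1}{p}\right) = -\log\log x - \gamma + o(1), \tag{3}$$

where $\gamma$ is the Euler-Mascheroni constant, defined by requiring that

$$1 + \frac{1}{2} + \dots + \frac{1}{n} = \log n + \gamma + o(1) \tag{4}$$

in the limit $n \to \infty$.

The third theorem (3) is usually stated in exponentiated form

$$\prod_{p \leq x} \left(1 - \frac{1}{p}\right) = \frac{e^{-\gamma} + o(1)}{\log x},$$

but in the logarithmic form (3) we see that it is strictly stronger than (2), in view of the asymptotic $\log(1 - \frac{1}{p}) = -\frac{1}{p} + O(\frac{1}{p^2})$.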
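As a quick numerical sanity check (not part of the argument), the following Python sketch computes the three prime sums of Theorem 1 with a simple sieve and returns their differences from the predicted main terms. The helper names `primes_up_to` and `mertens_errors`, and the hard-coded decimal value of $\gamma$, are illustrative choices rather than anything from the text. The first two differences should stay bounded and the third should be small, although a finite experiment of this kind is only suggestive.

```python
import math

EULER_GAMMA = 0.5772156649015329  # decimal value of the Euler-Mascheroni constant


def primes_up_to(n):
    """Sieve of Eratosthenes: return the list of all primes <= n (n >= 2)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            # Cross out the multiples i*i, i*i + i, ... of the prime i.
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def mertens_errors(x):
    """Return the three differences (left-hand side minus main term) at x.

    e1 = sum_{p<=x} log(p)/p     - log x
    e2 = sum_{p<=x} 1/p          - log log x
    e3 = sum_{p<=x} log(1 - 1/p) + log log x + gamma
    """
    ps = primes_up_to(x)
    lx = math.log(x)
    e1 = math.fsum(math.log(p) / p for p in ps) - lx
    e2 = math.fsum(1.0 / p for p in ps) - math.log(lx)
    e3 = math.fsum(math.log1p(-1.0 / p) for p in ps) + math.log(lx) + EULER_GAMMA
    return e1, e2, e3


if __name__ == "__main__":
    for x in (10**3, 10**4, 10**5, 10**6):
        print(x, mertens_errors(x))
```

Comparing the rows for increasing `x` gives a feel for how the $O(1)$ terms in the first two theorems settle down to constants while the $o(1)$ term in the third decays.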

Remarkably, these theorems can be proven without the assistance of the prime number theorem

$$\sum_{p \leq x} 1 = \frac{x}{\log x} + o\left(\frac{x}{\log x}\right),$$

which was proven about two decades after Mertens’ work. (But one can certainly use versions of the prime number theorem with good error term, together with summation by parts, to obtain good estimates on the various errors in Mertens’ theorems.) Roughly speaking, the reason for this is that Mertens’ theorems only require control on the Riemann zeta function $\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s}$ in the neighbourhood of the pole at $s=1$, whereas (as discussed in this previous post) the prime number theorem requires control on the zeta function on (a neighbourhood of) the line $\{1+it: t \in \mathbb{R}\}$. Specifically, Mertens’ theorems are ultimately deduced from the Euler product formula

$$\zeta(s) = \prod_p \left(1 - \frac{1}{p^s}\right)^{-1}, \tag{5}$$

valid in the region $\mathrm{Re}(s) > 1$ (which is ultimately a Fourier-Dirichlet transform of the fundamental theorem of arithmetic), and the following crude asymptotics:

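Proposition 2 (Simple pole) For $s$ sufficiently close to $1$ with $\mathrm{Re}(s) > 1$, we have

$$\zeta(s) = \frac{1}{s-1} + O(1) \tag{6}$$

and

$$\zeta'(s) = \frac{-1}{(s-1)^2} + O(1).$$

*Proof:* For $s$ as in the proposition, we have $\frac{1}{n^s} = \frac{1}{t^s} + O(\frac{1}{n^2})$ for any natural number $n$ and $n \leq t \leq n+1$, and hence

$$\frac{1}{n^s} = \int_n^{n+1} \frac{1}{t^s}\, dt + O\left(\frac{1}{n^2}\right).$$

Summing in $n$ and using the identity $\int_1^\infty \frac{1}{t^s}\, dt = \frac{1}{s-1}$, we obtain the first claim. Similarly, we have

$$\frac{-\log n}{n^s} = \int_n^{n+1} \frac{-\log t}{t^s}\, dt + O\left(\frac{\log n}{n^2}\right),$$

and by summing in $n$ and using the identity $\int_1^\infty \frac{-\log t}{t^s}\, dt = \frac{-1}{(s-1)^2}$ (the derivative of the previous identity) we obtain the second claim.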
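The simple pole behaviour can be observed numerically for real $s > 1$. The sketch below is an illustration under stated assumptions: it approximates $\zeta(s)$ and $\zeta'(s)$ by a finite sum plus the first few Euler-Maclaurin correction terms for the tail (a standard technique, not the method of the proof above), and the function names `zeta_real` and `zeta_prime_real` and the cutoff `N` are hypothetical. It then compares against $\frac{1}{s-1}$ and $\frac{-1}{(s-1)^2}$.

```python
import math


def zeta_real(s, N=1000):
    """Approximate zeta(s) for real s > 1.

    Uses sum_{n<N} n^{-s} plus Euler-Maclaurin terms for the tail sum_{n>=N}:
    integral N^{1-s}/(s-1), half the first term, and the first derivative term.
    """
    head = math.fsum(n ** -s for n in range(1, N))
    tail = N ** (1 - s) / (s - 1) + 0.5 * N ** -s + s * N ** (-s - 1) / 12
    return head + tail


def zeta_prime_real(s, N=1000):
    """Approximate zeta'(s) for real s > 1 by differentiating zeta_real in s."""
    log_n = math.log(N)
    head = math.fsum(-math.log(n) * n ** -s for n in range(2, N))
    tail = (
        -log_n * N ** (1 - s) / (s - 1)
        - N ** (1 - s) / (s - 1) ** 2
        - 0.5 * log_n * N ** -s
        - (s * log_n - 1) * N ** (-s - 1) / 12
    )
    return head + tail


if __name__ == "__main__":
    for s in (1.5, 1.1, 1.01, 1.001):
        # Both printed quantities should remain bounded as s decreases to 1.
        print(s, zeta_real(s) - 1 / (s - 1), zeta_prime_real(s) + 1 / (s - 1) ** 2)
```

The first printed column stays bounded as $s \to 1^+$, consistent with the $O(1)$ error in the proposition.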

The first two of Mertens’ theorems (1), (2) are relatively easy to prove, and imply the third theorem (3) except with $\gamma$ replaced by an unspecified absolute constant. To get the specific constant $\gamma$ requires a little bit of additional effort. From (4), one might expect that the appearance of $\gamma$ arises from the refinement

$$\zeta(s) = \frac{1}{s-1} + \gamma + O(|s-1|) \tag{7}$$

that one can obtain to (6). However, it turns out that the connection is not so much with the zeta function, but with the Gamma function, and specifically with the identity $\Gamma'(1) = -\gamma$ (which is of course related to (7) through the functional equation for zeta, but can be proven without any reference to zeta functions). More specifically, we have the following asymptotic for the exponential integral:

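Proposition 3 (Exponential integral asymptotics) For sufficiently small $\epsilon > 0$, one has

$$\int_\epsilon^\infty \frac{e^{-t}}{t}\, dt = \log \frac{1}{\epsilon} - \gamma + O(\epsilon).$$

A routine integration by parts shows that this asymptotic is equivalent to the identity

$$\int_0^\infty e^{-t} \log t\, dt = -\gamma,$$

which is the identity $\Gamma'(1) = -\gamma$ mentioned previously.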
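For convenience, here is one way the integration by parts can be carried out (a routine sketch, tracking errors only up to $O(\epsilon)$). Integrating $e^{-t} \cdot \frac{1}{t}$ by parts on $[\epsilon, \infty)$ gives

$$\begin{aligned}
\int_\epsilon^\infty \frac{e^{-t}}{t}\, dt
&= \Big[ e^{-t} \log t \Big]_\epsilon^\infty + \int_\epsilon^\infty e^{-t} \log t\, dt \\
&= e^{-\epsilon} \log \frac{1}{\epsilon} + \int_0^\infty e^{-t} \log t\, dt - \int_0^\epsilon e^{-t} \log t\, dt.
\end{aligned}$$

Since $e^{-\epsilon} = 1 - \epsilon + O(\epsilon^2)$ and $\int_0^\epsilon e^{-t} \log t\, dt = \epsilon \log \epsilon - \epsilon + O(\epsilon^2 \log \frac{1}{\epsilon})$, the two terms of size $\epsilon \log \frac{1}{\epsilon}$ cancel, leaving

$$\int_\epsilon^\infty \frac{e^{-t}}{t}\, dt = \log \frac{1}{\epsilon} + \int_0^\infty e^{-t} \log t\, dt + O(\epsilon).$$

Thus the asymptotic $\log \frac{1}{\epsilon} - \gamma + O(\epsilon)$ holds precisely when $\int_0^\infty e^{-t} \log t\, dt = -\gamma$.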

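*Proof:* We start by using the identity $\frac{1}{i} = \int_0^1 x^{i-1}\, dx$ to express the harmonic series $H_n := 1 + \frac{1}{2} + \dots + \frac{1}{n}$ as

$$H_n = \int_0^1 (1 + x + \dots + x^{n-1})\, dx$$

or on summing the geometric series

$$H_n = \int_0^1 \frac{1 - x^n}{1 - x}\, dx.$$

Since $\int_0^{1-1/n} \frac{dx}{1-x} = \log n$, we thus have

$$H_n - \log n = \int_0^1 \frac{1_{[1-1/n,1]}(x) - x^n}{1-x}\, dx;$$

making the change of variables $x = 1 - \frac{t}{n}$, this becomes

$$H_n - \log n = \int_0^n \frac{1_{[0,1]}(t) - (1 - \frac{t}{n})^n}{t}\, dt.$$

As $n \to \infty$, $\frac{1_{[0,1]}(t) - (1 - \frac{t}{n})^n}{t}$ converges pointwise to $\frac{1_{[0,1]}(t) - e^{-t}}{t}$ and is pointwise dominated by $O(e^{-t})$. Taking limits as $n \to \infty$ using dominated convergence, we conclude that

$$\gamma = \int_0^\infty \frac{1_{[0,1]}(t) - e^{-t}}{t}\, dt,$$

or equivalently

$$\int_0^\infty \frac{e^{-t} - 1_{[0,\epsilon]}(t)}{t}\, dt = \log \frac{1}{\epsilon} - \gamma.$$

The claim then follows by bounding the $\int_0^\epsilon$ portion of the integral on the left-hand side.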
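The asymptotic in Proposition 3 is easy to test numerically. The following sketch is illustrative only: it evaluates $\int_\epsilon^\infty \frac{e^{-t}}{t}\, dt$ with a composite Simpson rule after the substitution $t = e^v$ (an assumption made here for numerical convenience, since it turns the integrand into the smooth function $\exp(-e^v)$), truncates the integral at a hypothetical cutoff $t = 40$, and estimates $\gamma$ directly from definition (4). The names `simpson`, `exp_integral` and `gamma_from_harmonic` are not from the text.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def exp_integral(eps, cutoff=40.0):
    """Approximate the integral of e^{-t}/t over [eps, cutoff].

    With t = e^v we have dt/t = dv, so the integrand becomes exp(-e^v).
    The discarded tail beyond the cutoff is smaller than e^{-cutoff}.
    """
    return simpson(lambda v: math.exp(-math.exp(v)), math.log(eps), math.log(cutoff))


def gamma_from_harmonic(n=10**6):
    """Estimate gamma as H_n - log n, following definition (4)."""
    h_n = math.fsum(1.0 / k for k in range(1, n + 1))
    return h_n - math.log(n)


if __name__ == "__main__":
    gamma = gamma_from_harmonic()
    for eps in (0.1, 0.01, 0.001):
        err = exp_integral(eps) - (math.log(1 / eps) - gamma)
        # The ratio err / eps should remain bounded as eps shrinks.
        print(eps, err, err / eps)
```

The printed ratio `err / eps` staying bounded as `eps` shrinks is what the $O(\epsilon)$ error term predicts.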

Below the fold I would like to record how Proposition 2 and Proposition 3 imply Theorem 1; the computations are utterly standard, and can be found in most analytic number theory texts, but I wanted to write them down for my own benefit (I always keep forgetting, in particular, how the third of Mertens’ theorems is proven).

** — 1. Proof of Mertens’ theorems — **

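Let $s$ be as in Proposition 2. Taking logarithms of $\zeta(s)$ using (5), we obtain the asymptotic

$$\sum_p \log\left(1 - \frac{1}{p^s}\right) = \log(s-1) + O(|s-1|) \tag{8}$$

using the standard branch of the logarithm; if we instead compute the logarithmic derivative $-\frac{\zeta'(s)}{\zeta(s)}$, we obtain the closely related asymptotic

$$\sum_p \frac{\log p}{p^s - 1} = \frac{1}{s-1} + O(1),$$

which by the approximation $\frac{1}{p^s - 1} = \frac{1}{p^s} + O(\frac{1}{p^2})$ also gives

$$\sum_p \frac{\log p}{p^s} = \frac{1}{s-1} + O(1). \tag{9}$$

These are already very close to Mertens’ theorems, except that the sharp cutoff $1_{p \leq x}$ has been replaced by a smoother cutoff $\frac{1}{p^{s-1}}$. To pass from the smooth cutoff to the rough, we use Fourier analysis: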

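Proposition 4 Let $F: \mathbb{R} \to \mathbb{C}$ be a compactly supported, Riemann integrable function independent of $x$. Then as $x \to \infty$, we have

$$\sum_p \frac{\log p}{p} F\left(\frac{\log p}{\log x}\right) = \log x \int_0^\infty F(t)\, dt + o(\log x). \tag{10}$$

*Proof:* By approximating $F$ above and below by smooth functions, we may assume without loss of generality that $F$ is smooth. Applying the Fourier inversion formula to the function $e^u F(u)$, we thus have a Fourier representation of the form

$$F(u) = \int_{\mathbb{R}} e^{-u} e^{-itu} f(t)\, dt \tag{11}$$

for some rapidly decreasing function $f$. The left-hand side of (10) may then be rewritten as

$$\int_{\mathbb{R}} \sum_p \frac{\log p}{p^{1 + \frac{1+it}{\log x}}} f(t)\, dt.$$

The summation here may be crudely bounded in magnitude by

$$\left| \sum_p \frac{\log p}{p^{1 + \frac{1+it}{\log x}}} \right| \leq \sum_p \frac{\log p}{p^{1 + \frac{1}{\log x}}} = O(\log x)$$

thanks to (9). From this and the rapid decrease of $f$, we see that the contribution of the integrand for which (say) $|t| \geq \sqrt{\log x}$ is $o(\log x)$. In the remaining region, we may apply (9) to estimate the left-hand side of (10) as

$$\int_{|t| \leq \sqrt{\log x}} \left( \frac{\log x}{1+it} + O(1) \right) f(t)\, dt + o(\log x).$$

The net contribution of the $O(1)$ error is again $o(\log x)$, so from the rapid decrease of $f$ again we may rewrite this expression as

$$\log x \int_{\mathbb{R}} \frac{f(t)}{1+it}\, dt + o(\log x),$$

and the claim then follows by integrating (11) on $[0,+\infty)$.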
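Proposition 4 says that the prime sum weighted by $\frac{\log p}{p}$, with $p$ rescaled to $\frac{\log p}{\log x}$, behaves like $\log x$ times the integral of the test function over $[0,+\infty)$. The sketch below illustrates this for two test functions supported in $[0,1]$; the names `smoothed_prime_sum`, `ramp` and `box` are hypothetical, and the restriction to test functions vanishing outside $[0,1]$ is an assumption made so that only primes up to $x$ are needed. Since the error term is only $o(\log x)$, the convergence of the ratio can be slow.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: return the list of all primes <= n (n >= 2)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def smoothed_prime_sum(F, x):
    """Sum of (log p / p) * F(log p / log x) over primes p <= x.

    Assumes F vanishes outside [0, 1], so primes beyond x contribute nothing.
    """
    lx = math.log(x)
    return math.fsum(math.log(p) / p * F(math.log(p) / lx) for p in primes_up_to(x))


def ramp(t):
    """Test function F(t) = t on [0, 1]; its integral is 1/2."""
    return t if 0.0 <= t <= 1.0 else 0.0


def box(t):
    """Test function F = indicator of [0, 1]; its integral is 1."""
    return 1.0 if 0.0 <= t <= 1.0 else 0.0


if __name__ == "__main__":
    for x in (10**3, 10**4, 10**5, 10**6):
        lx = math.log(x)
        # The two ratios should drift towards 1/2 and 1 respectively.
        print(x, smoothed_prime_sum(ramp, x) / lx, smoothed_prime_sum(box, x) / lx)
```

The `box` case is the weak form of the first Mertens’ theorem: the ratio tends to $1$, but only at rate $O(\frac{1}{\log x})$.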

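Setting $F = 1_{[0,1]}$ in the above proposition gives a weak version of the first Mertens’ theorem (1), in which the $O(1)$ error term has worsened to $o(\log x)$. To recover the full $O(1)$ term by this Fourier-analytic method requires a more precise accounting of error terms in the above estimates. Firstly, if one uses (7) instead of (6), one eventually can improve (9) to

$$\sum_p \frac{\log p}{p^s} = \frac{1}{s-1} + C + O(|s-1|) \tag{12}$$

for some absolute constant $C$, and for $s$ sufficiently close to $1$ with $\mathrm{Re}(s) > 1$. Next, we recall the analogue of (11) for the function $1_{[0,1]}$, namely

$$1_{[0,1]}(u) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-u} e^{-itu} \frac{e^{1+it} - 1}{1+it}\, dt,$$

which can be easily verified through contour integration. This formula is not ideal for our purposes (it would require controlling $\zeta$ up and down the line $\{1+it: t \in \mathbb{R}\}$), so we truncate it, introducing the function

$$F_x(u) := \frac{1}{2\pi} \int_{\mathbb{R}} e^{-u} e^{-itu} \frac{e^{1+it} - 1}{1+it} \eta\left(\frac{t}{\log x}\right) dt,$$

where $\eta$ is a smooth cutoff to a sufficiently small interval $[-c,c]$ around the origin, that equals one on $[-c/2,c/2]$, for some small fixed $c > 0$. Repeating the proof of Proposition 4, and using (12) instead of (9), we conclude (if $c$ is small enough) that

$$\sum_p \frac{\log p}{p} F_x\left(\frac{\log p}{\log x}\right) = \frac{1}{2\pi} \int_{\mathbb{R}} \left( \frac{\log x}{1+it} + C + O\left(\frac{1+|t|}{\log x}\right) \right) \frac{e^{1+it} - 1}{1+it} \eta\left(\frac{t}{\log x}\right) dt.$$

Routine calculation shows that the right-hand side is $\log x + O(1)$. On the other hand, Littlewood-Paley theory can be used to show that

$$F_x(u) = 1_{[0,1]}(u) + O\left( (1 + \log x\, |u|)^{-10} \right) + O\left( (1 + \log x\, |u-1|)^{-10} \right)$$

for $u \geq 0$ (say). From this we have that

$$\sum_p \frac{\log p}{p} F_x\left(\frac{\log p}{\log x}\right) = \sum_{p \leq x} \frac{\log p}{p} + O(1)$$

and (1) follows.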

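Remark 1 A more elementary proof of (1) can be obtained by starting with the identity $\log n = \sum_{d | n} \Lambda(d)$, where $\Lambda$ is the von Mangoldt function, and summing to conclude that

$$\sum_{n \leq x} \log n = \sum_{d \leq x} \Lambda(d) \left\lfloor \frac{x}{d} \right\rfloor.$$

On the one hand, Stirling’s formula gives $\sum_{n \leq x} \log n = x \log x + O(x)$. On the other hand, we have $\lfloor \frac{x}{d} \rfloor = \frac{x}{d} + O(1)$ and $\sum_{d \leq x} \Lambda(d) = O(x)$, hence

$$\sum_{d \leq x} \frac{\Lambda(d)}{d} = \log x + O(1),$$

from which (1) easily follows.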
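The elementary argument of Remark 1 rests on an exact identity, which makes it convenient to check by machine. The sketch below tabulates the von Mangoldt function $\Lambda$ (equal to $\log p$ at powers of a prime $p$ and zero elsewhere), verifies $\sum_{n \leq x} \log n = \sum_{d \leq x} \Lambda(d) \lfloor x/d \rfloor$ up to rounding, and reports $\sum_{d \leq x} \Lambda(d)/d - \log x$. The function names are illustrative and not from the text.

```python
import math


def von_mangoldt_table(n):
    """Return a list lam with lam[d] = Lambda(d) for 0 <= d <= n."""
    lam = [0.0] * (n + 1)
    is_composite = bytearray(n + 1)
    for p in range(2, n + 1):
        if not is_composite[p]:
            # p is prime: cross out its multiples, then fill in its powers.
            for m in range(p * p, n + 1, p):
                is_composite[m] = 1
            q = p
            while q <= n:
                lam[q] = math.log(p)
                q *= p
    return lam


def log_factorial_identity(x):
    """Return both sides of sum_{n<=x} log n = sum_{d<=x} Lambda(d) floor(x/d)."""
    lam = von_mangoldt_table(x)
    lhs = math.fsum(math.log(n) for n in range(1, x + 1))
    rhs = math.fsum(lam[d] * (x // d) for d in range(1, x + 1))
    return lhs, rhs


def mangoldt_sum_error(x):
    """Return sum_{d<=x} Lambda(d)/d - log x, which should remain bounded."""
    lam = von_mangoldt_table(x)
    return math.fsum(lam[d] / d for d in range(1, x + 1)) - math.log(x)


if __name__ == "__main__":
    print(log_factorial_identity(1000))
    for x in (10**2, 10**3, 10**4, 10**5):
        print(x, mangoldt_sum_error(x))
```

Passing from $\Lambda(d)/d$ to $\frac{\log p}{p}$ only discards the prime powers $p^j$ with $j \geq 2$, whose total contribution $\sum_p \sum_{j \geq 2} \frac{\log p}{p^j}$ converges.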

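We now skip to the third Mertens’ theorem (3), since as observed previously the second Mertens’ theorem (2) follows as a corollary (and also follows from the first theorem through summation by parts). Let $\epsilon > 0$ be a fixed quantity (independent of $x$). From (8) with $s = 1 + \frac{\epsilon}{\log x}$ we have

$$\sum_p \log\left(1 - \frac{1}{p^{1 + \frac{\epsilon}{\log x}}}\right) = \log \epsilon - \log\log x + o(1) \tag{13}$$

where the asymptotic notation refers to the limit $x \to \infty$. Let us first consider the contribution to this sum from those $p$ with $p > x$:

$$\sum_{p > x} \log\left(1 - \frac{1}{p^{1 + \frac{\epsilon}{\log x}}}\right).$$

By Taylor expansion we have $\log(1 - \frac{1}{p^{1+\epsilon/\log x}}) = -\frac{1}{p^{1+\epsilon/\log x}} + O(\frac{1}{p^2})$, and hence the above sum is equal to

$$-\sum_{p > x} \frac{1}{p^{1 + \frac{\epsilon}{\log x}}} + o(1)$$

or equivalently

$$-\frac{1}{\log x} \sum_p \frac{\log p}{p} F\left(\frac{\log p}{\log x}\right) + o(1)$$

where $F(t) := 1_{t > 1} \frac{e^{-\epsilon t}}{t}$. From Proposition 4 we have

$$\frac{1}{\log x} \sum_{p \leq x^N} \frac{\log p}{p} F\left(\frac{\log p}{\log x}\right) = \int_1^N \frac{e^{-\epsilon t}}{t}\, dt + o(1)$$

for any fixed $N > 1$. One can bound the tail error from $p > x^N$ using the first Mertens’ theorem (1) and dyadic decomposition, and on taking limits as $N \to \infty$ we conclude that

$$\frac{1}{\log x} \sum_p \frac{\log p}{p} F\left(\frac{\log p}{\log x}\right) = \int_1^\infty \frac{e^{-\epsilon t}}{t}\, dt + o(1);$$

scaling $t$ by $\epsilon$ and then using Proposition 3, we conclude that

$$\sum_{p > x} \log\left(1 - \frac{1}{p^{1 + \frac{\epsilon}{\log x}}}\right) = -\log \frac{1}{\epsilon} + \gamma + O(\epsilon) + o(1);$$

subtracting this from (13) we conclude that

$$\sum_{p \leq x} \log\left(1 - \frac{1}{p^{1 + \frac{\epsilon}{\log x}}}\right) = -\log\log x - \gamma + O(\epsilon) + o(1). \tag{14}$$

From the mean value theorem we have

$$\log\left(1 - \frac{1}{p^{1 + \frac{\epsilon}{\log x}}}\right) = \log\left(1 - \frac{1}{p}\right) + O\left(\frac{\epsilon \log p}{p \log x}\right)$$

so from the first Mertens’ theorem (1), we can write the left-hand side of (14) as

$$\sum_{p \leq x} \log\left(1 - \frac{1}{p}\right) + O(\epsilon).$$

Putting all this together, we see that

$$\sum_{p \leq x} \log\left(1 - \frac{1}{p}\right) = -\log\log x - \gamma + O(\epsilon) + o(1).$$

Since $\epsilon$ may be chosen arbitrarily small, we obtain the third Mertens’ theorem (3) as required.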
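The mean value theorem step, which replaces the exponent $1 + \frac{\epsilon}{\log x}$ by $1$ for the primes $p \leq x$ at a cost of $O(\epsilon)$, can be illustrated numerically. The sketch below is an illustration only, with the hypothetical helper `mean_value_gap` computing the difference between the two sums over $p \leq x$; by the mean value theorem each summand differs by at most $\frac{\epsilon \log p}{(p-1) \log x}$, so the total gap should be positive and of size comparable to $\epsilon$.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes: return the list of all primes <= n (n >= 2)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(2, n + 1) if sieve[i]]


def mean_value_gap(x, eps):
    """Return sum_{p<=x} [log(1 - p^{-s}) - log(1 - 1/p)] with s = 1 + eps/log x."""
    s = 1.0 + eps / math.log(x)
    return math.fsum(
        math.log1p(-(p ** -s)) - math.log1p(-1.0 / p) for p in primes_up_to(x)
    )


if __name__ == "__main__":
    x = 10**5
    for eps in (0.5, 0.1, 0.01):
        gap = mean_value_gap(x, eps)
        # The ratio gap / eps should stay bounded, uniformly in x.
        print(eps, gap, gap / eps)
```

Rerunning with larger `x` and the same `eps` shows that the gap does not grow with $x$, which is what allows $\epsilon$ to be sent to zero at the end of the argument.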

## 7 comments


18 December, 2013 at 3:28 pm

The beauty of the Riemann sphere | cartesian product

[…] Mertens’ theorems (terrytao.wordpress.com) […]

26 December, 2013 at 10:41 am

MrCactu5 (@MonsieurCactus)

So let me get this straight, Prof Tao. In order to do analysis on the primes we need a good scaling for our graph paper:

Rescale the primes as $p \mapsto \frac{\log p}{\log x}$

Weight our function by $\frac{\log p}{p}$

And it is just like the Riemann integral. The closest I can find is logarithmic graph paper. http://www.printablepaper.net/category/log

7 January, 2014 at 6:48 am

Mizar

Dear Prof. Tao, thank you for this great post!

Using Proposition 4 we obtain a slightly weaker statement than that in the first Theorem, namely $\sum_{p \leq x} \frac{\log p}{p} = \log x + o(\log x)$, right?

Moreover I want to point out two (negligible) typos: after you talk about dyadic decomposition, you write twice, instead of .

[Corrected, thanks - T.]

28 January, 2014 at 9:19 pm

Polymath8b, VII: Using the generalised Elliott-Halberstam hypothesis to enlarge the sieve support yet further | What's new

[…] from Mertens’ theorem one easily verifies […]

6 June, 2014 at 12:23 pm

jaspgus

Hello,

I have been finding some difficulty when I try to fill in some of the details in the proof of proposition 4, specifically when applying it to the smooth approximations of . Namely, the implied constants I’ve been getting depend on the magnitude of the derivatives of the approximations, which becomes a problem since I think these get very large as the approximations get better.

Do you have any tips I could try? The only kind of rapid decay on the fourier transform I’ve really tried come from repeated integration by parts, perhaps there is a stronger type of decay?

6 June, 2014 at 4:10 pm

Terence Tao

Oops, you’re right, the argument provided only gives the weaker error term of $o(\log x)$. I’ve now put in the details of the more complicated argument that gives the $O(1)$ error term this way (as well as a more elementary proof of the $O(1)$ term that avoids all Fourier analysis and zeta function estimates).

5 July, 2014 at 8:58 am

sdwkn__lshaMay will be useful