Mertens’ theorems are a set of classical estimates concerning the asymptotic distribution of the prime numbers:

Theorem 1 (Mertens’ theorems) In the asymptotic limit {x \rightarrow \infty}, we have

\displaystyle  \sum_{p\leq x} \frac{\log p}{p} = \log x + O(1), \ \ \ \ \ (1)

\displaystyle  \sum_{p\leq x} \frac{1}{p} = \log \log x + O(1), \ \ \ \ \ (2)


\displaystyle  \sum_{p\leq x} \log(1-\frac{1}{p}) = -\log \log x - \gamma + o(1) \ \ \ \ \ (3)

where {\gamma} is the Euler-Mascheroni constant, defined by requiring that

\displaystyle  1 + \frac{1}{2} + \ldots + \frac{1}{n} = \log n + \gamma + o(1) \ \ \ \ \ (4)

in the limit {n \rightarrow \infty}.

The third theorem (3) is usually stated in exponentiated form

\displaystyle  \prod_{p \leq x} (1-\frac{1}{p}) = \frac{e^{-\gamma}+o(1)}{\log x},

but in the logarithmic form (3) we see that it is strictly stronger than (2), in view of the asymptotic {\log(1-\frac{1}{p}) = -\frac{1}{p} + O(\frac{1}{p^2})}.

Remarkably, these theorems can be proven without the assistance of the prime number theorem

\displaystyle  \sum_{p \leq x} 1 = \frac{x}{\log x} + o( \frac{x}{\log x} ),

which was proven about two decades after Mertens’ work. (But one can certainly use versions of the prime number theorem with good error term, together with summation by parts, to obtain good estimates on the various errors in Mertens’ theorems.) Roughly speaking, the reason for this is that Mertens’ theorems only require control on the Riemann zeta function {\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s}} in the neighbourhood of the pole at {s=1}, whereas (as discussed in this previous post) the prime number theorem requires control on the zeta function on (a neighbourhood of) the line {\{ 1+it: t \in {\bf R} \}}. Specifically, Mertens’ theorem is ultimately deduced from the Euler product formula

\displaystyle  \zeta(s) = \prod_p (1-\frac{1}{p^s})^{-1}, \ \ \ \ \ (5)

valid in the region {\hbox{Re}(s) > 1} (which is ultimately a Fourier-Dirichlet transform of the fundamental theorem of arithmetic), and following crude asymptotics:

Proposition 2 (Simple pole) For {s} sufficiently close to {1} with {\hbox{Re}(s) > 1}, we have

\displaystyle  \zeta(s) = \frac{1}{s-1} + O(1) \ \ \ \ \ (6)


\displaystyle  \zeta'(s) = \frac{-1}{(s-1)^2} + O(1).

Proof: For {s} as in the proposition, we have {\frac{1}{n^s} = \frac{1}{t^s} + O(\frac{1}{n^2})} for any natural number {n} and {n \leq t \leq n+1}, and hence

\displaystyle  \frac{1}{n^s} = \int_n^{n+1} \frac{1}{t^s}\ dt + O( \frac{1}{n^2} ).

Summing in {n} and using the identity {\int_1^\infty \frac{1}{t^s}\ dt = \frac{1}{s-1}}, we obtain the first claim. Similarly, we have

\displaystyle  \frac{-\log n}{n^s} = \int_n^{n+1} \frac{-\log t}{t^s}\ dt + O( \frac{\log n}{n^2} ),

and by summing in {n} and using the identity {\int_1^\infty \frac{-\log t}{t^s}\ dt = \frac{-1}{(s-1)^2}} (the derivative of the previous identity) we obtain the claim. \Box

The first two of Mertens’ theorems (1), (2) are relatively easy to prove, and imply the third theorem (3) except with {\gamma} replaced by an unspecified absolute constant. To get the specific constant {\gamma} requires a little bit of additional effort. From (4), one might expect that the appearance of {\gamma} arises from the refinement

\displaystyle  \zeta(s) = \frac{1}{s-1} + \gamma + O(|s-1|) \ \ \ \ \ (7)

that one can obtain to (6). However, it turns out that the connection is not so much with the zeta function, but with the Gamma function, and specifically with the identity {\Gamma'(1) = - \gamma} (which is of course related to (7) through the functional equation for zeta, but can be proven without any reference to zeta functions). More specifically, we have the following asymptotic for the exponential integral:

Proposition 3 (Exponential integral asymptotics) For sufficiently small {\epsilon}, one has

\displaystyle  \int_\epsilon^\infty \frac{e^{-t}}{t}\ dt = \log \frac{1}{\epsilon} - \gamma + O(\epsilon).

A routine integration by parts shows that this asymptotic is equivalent to the identity

\displaystyle  \int_0^\infty e^{-t} \log t\ dt = -\gamma

which is the identity {\Gamma'(1)=-\gamma} mentioned previously.

Proof: We start by using the identity {\frac{1}{i} = \int_0^1 x^{i-1}\ dx} to express the harmonic series {H_n := 1+\frac{1}{2}+\ldots+\frac{1}{n}} as

\displaystyle  H_n = \int_0^1 1 + x + \ldots + x^{n-1}\ dx

or on summing the geometric series

\displaystyle  H_n = \int_0^1 \frac{1-x^n}{1-x}\ dx.

Since {\int_0^{1-1/n} \frac{1}{1-x} = \log n}, we thus have

\displaystyle  H_n - \log n = \int_0^1 \frac{1_{[1-1/n,1]}(x) - x^n}{1-x}\ dx;

making the change of variables {x = 1-\frac{t}{n}}, this becomes

\displaystyle  H_n - \log n = \int_0^n \frac{1_{[0,1]}(t) - (1-\frac{t}{n})^n}{t}\ dt.

As {n \rightarrow \infty}, {\frac{1_{[0,1]}(t) - (1-\frac{t}{n})^n}{t}} converges pointwise to {\frac{1_{[0,1]}(t) - e^{-t}}{t}} and is pointwise dominated by {O( e^{-t} )}. Taking limits as {n \rightarrow \infty} using dominated convergence, we conclude that

\displaystyle  \gamma = \int_0^\infty \frac{1_{[0,1]}(t) - e^{-t}}{t}\ dt.

or equivalently

\displaystyle  \int_0^\infty \frac{e^{-t} - 1_{[0,\epsilon]}(t)}{t}\ dt = \log \frac{1}{\epsilon} - \gamma.

The claim then follows by bounding the {\int_0^\epsilon} portion of the integral on the left-hand side. \Box

Below the fold I would like to record how Proposition 2 and Proposition 3 imply Theorem 1; the computations are utterly standard, and can be found in most analytic number theory texts, but I wanted to write them down for my own benefit (I always keep forgetting, in particular, how the third of Mertens’ theorems is proven).

— 1. Proof of Mertens’ theorems —

Let {s} be as in Proposition 2. Taking logarithms of {\zeta(s)} using (5), we obtain the asymptotic

\displaystyle  \sum_p \log( 1 - \frac{1}{p^s} ) = \log(s-1) + O( |s-1| ) \ \ \ \ \ (8)

using the standard branch of the logarithm; if we instead compute the logarithmic derivative {\frac{\zeta'(s)}{\zeta(s)}}, we obtain the closely related asymptotic

\displaystyle  \sum_p \frac{\log p}{p^s - 1} = \frac{1}{s-1} + O( 1 )

which by the approximation {\frac{\log p}{p^s - 1} = \frac{\log p}{p^s} + O(\frac{\log p}{p^2} )} also gives

\displaystyle  \sum_p \frac{\log p}{p^s} = \frac{1}{s-1} + O( 1 ). \ \ \ \ \ (9)

These are already very close to Mertens’ theorems, except that the sharp cutoff {1_{[0,x]}(p)} has been replaced by a smoother cutoff {\frac{1}{p^{s-1}}}. To pass from the smooth cutoff to the rough, we use Fourier analysis:

Proposition 4 Let {F: {\bf R} \rightarrow {\bf C}} be a compactly supported, Riemann integrable function independent of {x}. Then as {x \rightarrow \infty}, we have

\displaystyle  \sum_p \frac{\log p}{p} F( \frac{\log p}{\log x} ) = \log x \int_0^\infty F(t)\ dt + o(\log x). \ \ \ \ \ (10)

Proof: By approximating {F} above and below by smooth functions, we may assume without loss of generality that {F} is smooth. Applying the Fourier inversion formula to the function {t \mapsto e^t F(t)}, we thus have a Fourier representation of the form

\displaystyle  F(t) = \int_{\bf R} e^{-(1+iu) t} f(u)\ du \ \ \ \ \ (11)

for some rapidly decreasing function {f}. The left-hand side of (10) may then be rewritten as

\displaystyle  \int_{\bf R} f(u) (\sum_p \frac{\log p}{p^{1 + \frac{1+iu}{\log x}}})\ du.

The summation here may be crudely bounded in magnitude by

\displaystyle  |\sum_p \frac{\log p}{p^{1 + \frac{1+iu}{\log x}}}| \leq \sum_p \frac{\log p}{p^{1+1/\log x}} \ll \log x

thanks to (9). From this and the rapid decrease of {f}, we see that the contribution of the integrand for which {|u| \geq \sqrt{\log x}} (say) is {O(1)}. In the remaining region, we may apply (9) to estimate the left-hand side of (10) as

\displaystyle  \int_{|u| \leq \sqrt{\log x}} f(u) (\frac{\log x}{1+iu} + O(1) )\ du + O(1).

The net contribution of the {O(1)} error is again {O(1)}, so from the rapid decrease of {f} again we may rewrite this expression as

\displaystyle  \log x \int_{\bf R} \frac{f(u)}{1+iu}\ du + O(1)

and the claim then follows by integrating (11) on {[0,+\infty)}. \Box

Setting {F:= 1_{[0,1]}} in the above proposition gives a weak version of the first Mertens’ theorem (1), in which the {O(1)} error term has worsened to {o(\log x)}. To recover the full {O(1)} term by this Fourier-analytic method requires a more precise accounting of error terms in the above estimates. Firstly, if one uses (7) instead of (6), one eventually can improve (9) to

\displaystyle  \sum_p \frac{\log p}{p^s} = \frac{1}{s-1} + C + O( |s-1| ) \ \ \ \ \ (12)

for some absolute constant {C}, and for {s} sufficiently close to {1} with {\hbox{Re}(s)>1}. Next, we recall the analogue of (11) for the function {1_{(-\infty,1]}}, namely

\displaystyle  1_{(-\infty,1]}(t) = \frac{1}{2\pi} p.v. \int_{\bf R} \frac{e^{(1+iu)(1-t)}}{1+iu}\ du

which can be easily verified through contour integration. This formula is not ideal for our purposes (it would require controlling {\zeta} up and down the line {\hbox{Re} s = 1}), so we truncate it, introducing the function

\displaystyle  F(t) := \frac{1}{2\pi} \int_{\bf R} \frac{e^{(1+iu)(1-t)}}{1+iu} \psi( \frac{u}{\log x} )\ du

where {\psi} is a smooth cutoff to a sufficiently small interval {[-\epsilon,\epsilon]} around the origin, that equals one on {[-\epsilon/2,\epsilon/2]}, for some small fixed {\epsilon>0}. Repeating the proof of Proposition 4, and using (12) instead of (9), we conclude (if {\epsilon} is small enough) that

\displaystyle  \sum_p \frac{\log p}{p} F( \frac{\log p}{\log x} ) = \frac{1}{2\pi} \int_{\bf R} ( \frac{\log x}{1+iu} + C + O( \frac{1+|u|}{\log x} )) \frac{\psi(\frac{u}{\log x})}{1+iu}\ du.

Routine calculation shows that the right-hand side is {\log x + O(1)}. On the other hand, Littlewood-Paley theory can be used to show that

\displaystyle  F(t) = 1_{[-\infty,1)}(t) + O( (1 + |1-t| \log x )^{-10} )

(say). From this we have that

\displaystyle  \sum_p \frac{\log p}{p} F( \frac{\log p}{\log x} ) = \sum_{p \leq x} \frac{\log p}{p}

and (1) follows.

Remark 1 A more elementary proof of (1) can be obtained by starting with the identity {\log n = \sum_{d|n} \Lambda(d)} and summing to conclude that

\displaystyle  \sum_{n \leq x} \log n = \sum_{d \leq x} \Lambda(d) \lfloor \frac{x}{d} \rfloor.

On the one hand, Stirling’s formula gives {\sum_{n \leq x} \log n = x \log x + O(x)}. On the other hand, we have {\lfloor \frac{x}{d} \rfloor = \frac{x}{d}+O(1)} and {\sum_{d \leq x} \Lambda(d) = O(x)}, hence

\displaystyle  \sum_{d \leq x} \frac{\Lambda(d)}{d} = \log x + O(1),

from which (1) easily follows.

We now skip to the third Mertens’ theorem (3), since as observed previously the second Mertens’ theorem (2) follows as a corollary (and also follows from the first theorem through summation by parts). Let {0 < \epsilon < 1} be a fixed quantity (independent of {x}). From (8) with {s := 1+\frac{\epsilon}{\log x}} we have

\displaystyle  \sum_p \log( 1 - \frac{1}{p^{1+\epsilon/\log x}} ) = - \log \frac{1}{\epsilon} - \log \log x + o(1) \ \ \ \ \ (13)

where the asymptotic notation refers to the limit {x \rightarrow \infty}. Let us first consider the contribution to this sum from those {p} with {p>x}:

\displaystyle  \sum_{p>x} \log( 1 - \frac{1}{p^{1+\epsilon/\log x}} ).

By Taylor expansion we have {\log( 1 - \frac{1}{p^{1+\epsilon/\log x}} ) = -\frac{1}{p^{1+\epsilon/\log x}} + O( \frac{1}{p^2} )}, and hence the above sum is equal to

\displaystyle  - \sum_{p>x} \frac{1}{p^{1+\epsilon/\log x}} + o(1)

or equivalently

\displaystyle  - \frac{1}{\log x} \sum_{p>x} \frac{\log p}{p} F( \frac{\log p}{\log x} ) + o(1)

where {F(t) := \frac{e^{-\epsilon t}}{t}}. From Proposition 4 we have

\displaystyle  \frac{1}{\log x} \sum_{x \leq p \leq x^N} \frac{\log p}{p} F( \frac{\log p}{\log x} ) = \int_1^N \frac{e^{-\epsilon t}}{t}\ dt + o(1)

for any fixed {N>1}. One can bound the tail error from {p > x^N} from the first Mertens’ theorem (1) and dyadic decomposition, and on taking limits as {N \rightarrow \infty} we conclude that

\displaystyle  \sum_{p>x} \log( 1 - \frac{1}{p^{1+\epsilon/\log x}} ) = - \int_1^\infty \frac{e^{-\epsilon t}}{t} \ dt + o(1);

scaling {t} by {\epsilon} and then using Proposition 3, we conclude that

\displaystyle  \sum_{p>x} \log( 1 - \frac{1}{p^{1+\epsilon/\log x}} ) = - \log \frac{1}{\epsilon} + \gamma + O(\epsilon) + o(1);

subtracting this from (13) we conclude that

\displaystyle  \sum_{p \leq x} \log( 1 - \frac{1}{p^{1+\epsilon/\log x}} ) = - \log\log x - \gamma + O(\epsilon) + o(1). \ \ \ \ \ (14)

From the mean value theorem we have

\displaystyle  \log( 1 - \frac{1}{p^{1+\epsilon/\log x}} ) = \log(1-\frac{1}{p}) + O( \frac{\epsilon \log p}{p \log x} )

so from the first Mertens’ theorem (1), we can write the left-hand side of (14) as

\displaystyle  \sum_{p \leq x} \log( 1 - \frac{1}{p} ) + O( \epsilon ).

Putting all this together, we see that

\displaystyle  \sum_{p \leq x} \log( 1 - \frac{1}{p} ) = -\log\log x - \gamma + O(\epsilon) + o(1).

Since {\epsilon} may be chosen arbitrarily small, we obtain the third Mertens’ theorem (3) as required.