In analytic number theory, an arithmetic function is simply a function from the natural numbers to the real or complex numbers. (One occasionally also considers arithmetic functions taking values in more general rings than or , as in this previous blog post, but we will restrict attention here to the classical situation of real ofr complex arithmetic functions.) Experience has shown that a particularly tractable and relevant class of arithmetic functions for analytic number theory are the multiplicative functions, which are arithmetic functions with the additional property that

whenever are coprime. (One also considers arithmetic functions, such as the logarithm function or the von Mangoldt function, that are not genuinely multiplicative, but interact closely with multiplicative functions, and can be viewed as “derived” versions of multiplicative functions; see this previous post.) A typical example of a multiplicative function is the divisor function

that counts the number of divisors of a natural number . (The divisor function is also denoted in the literature.) The study of asymptotic behaviour of multiplicative functions (and their relatives) is known as multiplicative number theory, and is a basic cornerstone of modern analytic number theory.

There are various approaches to multiplicative number theory, each of which focuses on different asymptotic statistics of arithmetic functions . In *elementary multiplicative number theory*, which is the focus of this set of notes, particular emphasis is given on the following two statistics of a given arithmetic function :

- The
*summatory functions*of an arithmetic function , as well as the associated natural density

(if it exists).

- The
*logarithmic sums*of an arithmetic function , as well as the associated

*logarithmic density*(if it exists).

Here, we are normalising the arithmetic function being studied to be of roughly unit size up to logarithms, obeying bounds such as , , or at worst

A classical case of interest is when is an indicator function of some set of natural numbers, in which case we also refer to the natural or logarithmic density of as the natural or logarithmic density of respectively. However, in analytic number theory it is usually more convenient to replace such indicator functions with other related functions that have better multiplicative properties. For instance, the indicator function of the primes is often replaced with the von Mangoldt function .

Typically, the logarithmic sums are relatively easy to control, but the summatory functions require more effort in order to obtain satisfactory estimates; see Exercise 7 below.

If an arithmetic function is multiplicative (or closely related to a multiplicative function), then there is an important further statistic on an arithmetic function beyond the summatory function and the logarithmic sum, namely the Dirichlet series

for various real or complex numbers . Under the hypothesis (3), this series is absolutely convergent for real numbers , or more generally for complex numbers with . As we will see below the fold, when is multiplicative then the Dirichlet series enjoys an important Euler product factorisation which has many consequences for analytic number theory.

In the elementary approach to multiplicative number theory presented in this set of notes, we consider Dirichlet series only for real numbers (and focusing particularly on the asymptotic behaviour as ); in later notes we will focus instead on the important *complex-analytic* approach to multiplicative number theory, in which the Dirichlet series (4) play a central role, and are defined not only for complex numbers with large real part, but are often extended analytically or meromorphically to the rest of the complex plane as well.

Remark 1The elementary and complex-analytic approaches to multiplicative number theory are the two classical approaches to the subject. One could also consider a more “Fourier-analytic” approach, in which one studies convolution-type statistics such as

as for various cutoff functions , such as smooth, compactly supported functions. See for instance this previous blog post for an instance of such an approach. Another related approach is the “pretentious” approach to multiplicative number theory currently being developed by Granville-Soundararajan and their collaborators. We will occasionally make reference to these more modern approaches in these notes, but will primarily focus on the classical approaches.

To reverse the process and derive control on summatory functions or logarithmic sums starting from control of Dirichlet series is trickier, and usually requires one to allow to be complex-valued rather than real-valued if one wants to obtain really accurate estimates; we will return to this point in subsequent notes. However, there is a cheap way to get *upper bounds* on such sums, known as *Rankin’s trick*, which we will discuss later in these notes.

The basic strategy of elementary multiplicative theory is to first gather useful estimates on the statistics of “smooth” or “non-oscillatory” functions, such as the constant function , the harmonic function , or the logarithm function ; one also considers the statistics of periodic functions such as Dirichlet characters. These functions can be understood without any multiplicative number theory, using basic tools from real analysis such as the (quantitative version of the) integral test or summation by parts. Once one understands the statistics of these basic functions, one can then move on to statistics of more arithmetically interesting functions, such as the divisor function (2) or the von Mangoldt function that we will discuss below. A key tool to relate these functions to each other is that of Dirichlet convolution, which is an operation that interacts well with summatory functions, logarithmic sums, and particularly well with Dirichlet series.

This is only an introduction to elementary multiplicative number theory techniques. More in-depth treatments may be found in this text of Montgomery-Vaughan, or this text of Bateman-Diamond.

** — 1. Summing monotone functions — **

The most fundamental estimate regarding the equidistribution of the natural numbers is the trivial bound

for any , which reflects the evenly spaced nature of the natural numbers. One also has the variant

also valid for any . (But note that if the summation is not over the natural numbers, but is also allowed to contain , then the sum (or ) is no longer when is small, and one should instead revert to (6).)

We have the following generalisation of (6) to summation of monotone functions:

Lemma 2 (Quantitative integral test)Let be real numbers, and let be a monotone function. Then

Note that monotone functions are automatically Riemann integrable and Lebesgue integrable, so there is no difficulty in defining the integral appearing above.

*Proof:* By replacing with if necessary, we may assume that is non-decreasing. By rounding up and rounding down , we may assume that are integers. We have

and similarly

and the claim follows.

for any (a weak form of Stirling’s formula, discussed in this previous blog post), and more generally one has

for all and some polynomial with leading term . (The remaining terms in may be computed explicitly, but for our purposes it will not be essential to know what they are.)

Remark 3If are not required to be integers, then one cannot improve substantially on the size of the error term , as one can see by considering what happens if or transitions from being infinitesimally smaller than an integer to infinitesimally larger than that integer. But if are integer and one assumes more differentiability on , one can get more precise control on the error term in Lemma 2 using the Euler-Maclaurin formula; see e.g. Exercise 11 below. However, we will usually not need these more refined estimates here. In any event, one can get even better control on the error term if one works with smoother sums such as (5) with smooth, thanks to tools such as the Poisson summation formula. See this previous blog post for some related discussion.In the converse direction, if is highly oscillatory then there is usually no simple relationship between the sum and the integral ; consider for instance the example , in which the sum grows linearly in but the integral stays bounded.

Exercise 4For non-negative natural numbers , show that

Lemma 2 combines well with the following basic lemma.

Lemma 5 (Cauchy sequences converge)Let , and be functions such that as . Then the following are equivalent:

- (i) One has
for all .

- (ii) There exists a constant such that
for all . In particular, .

The quantity in (ii) is unique; it is also real-valued if are real-valued.

If in addition as , then when (ii) holds, converges conditionally to .

Exercise 6Prove Lemma 5.

We now give some basic applications of these lemmas. If is a real number not equal to , then from Lemma 2 we have

for , and hence by Lemma 5, there is a real number with

for all . In the case , we conclude that the sum is absolutely convergent, and

however for this identity is technically not true, since the sum on the right-hand side is now divergent using the usual conventions for infinite sums. (The identity can however be recovered by using more general interpretations of infinite sums; see this previous blog post. The function is of course the famous Riemann zeta function.

For the case, we again see from Lemma 2 that

for , and thus there exists a real number such that

for . The constant is known as Euler’s constant (or the *Euler-Mascheroni constant*), and has a value of .

Exercise 7Let be an arithmetic function. If the natural density of exists and is equal to some complex number , show that the logarithmic density also exists and is equal to . (Hint:first establish the identity .) An important counterexample to the converse claim is given in Exercise 11 below.

Exercise 8Let be an arithmetic function obeying the crude bound (3), and let be a complex number. If the logarithmic density of exists and is equal to , show that as , or in other words that as . (Hint:.)

Exercise 9

- (i) For any integer , show that
for all , where is a polynomial with leading term .

- (ii) More generally, for any integers , show that
for all , where is a polynomial with leading term .

Exercise 10Show rigorously that for any non-negative integer and real , the Riemann zeta function is -times differentiable at and one has

(There are several ways to justify the term-by-term differentiation in the first equation; one way is to instead establish a term-by-term integration formula and then apply the fundamental theorem of calculus. Another is to use Taylor series with remainder to control the error in Newton quotients. A third approach is to use complex analysis.)

Exercise 11Let be real numbers, and let be a continuously differentiable function. Show that(

Hint:compare with . One may first wish to consider the case when are integers, and deal with the roundoff errors when this is not the case later.) Conclude that if is a non-zero number, that the function has logarithmic density zero, but does not have a natural density. (Hint:for the latter claim, argue by contradiction and consider sums of the form .)

Remark 12The above exercise demonstrates that logarithmic density is a cruder statistic than natural density, as it completely ignores oscillations by , whereas natural density is very sensitive to such oscillations. As such, controlling natural density of some arithmetic functions (such as the von Mangoldt function ) often boils down to determining to what extent such functions “conspire”, “correlate”, or “pretend to be” for various real numbers . This fact is a little tricky to discern in the elementary approach to multiplicative number theory, but is more readily apparent in the complex analytic approach, which we will discuss in later notes.

** — 2. The Euler product and Rankin’s trick — **

The fundamental theorem of arithmetic tells us that every natural number is uniquely expressible as a prime factorisation

A very useful way to encode this fact analytically (in a “generating function” form) is as follows. Recall from the introduction that an arithmetic function is multiplicative if one has and whenever are coprime. If is a multiplicative function, then we have

at least when is non-negative. If is complex-valued and the product

is finite, then an application of dominated convergence shows that (18) holds in this case also, with both sides of the equation being absolutely convergent. Multiplying by the multiplicative function , we conclude in particular the Euler product identity

whenever the right-hand side is absolutely convergent in the sense that

is finite, or equivalently that

is finite. Observe that if obeys the crude bound (3), then this absolute convergence is obtained whenever . Thus for instance one has the Euler product identity for the Riemann zeta function ,

whenever . From (11) we therefore have

Taking logarithms, we thus have

when (note that is clearly at least one). The logarithm here on the left-hand side is not so desirable, but we may remove it by using the Taylor expansion

and noting that is absolutely convergent, to conclude that

Taking , and using monotone convergence, we recover Euler’s theorem , but (23) gives more precise information. For instance, we have the following bound, already anticipated by Euler:

Theorem 13 (Cheap second Mertens’ theorem)We havefor .

We will improve this bound later in these notes.

*Proof:* We use a device known as *Rankin’s trick* to compare a Dirichlet series with a natural or logarithmic mean. Namely, observe that

for all and all natural numbers (in fact the implied constant can be given to be ), and hence

The claim now follows from (23).

Remark 14The same Rankin trick (24), when used to bound the harmonic series , gives an upper bound of , which is inferior to (13) by a factor of about . In general, one expects Rankin’s trick to lose a constant factor when estimating non-negative logarithmic sums.

Rankin’s trick can also be used to cheaply bound other means of multiplicative functions, as follows.

Theorem 15 (Cheap upper bound on multiplicative functions)Let be a real number, and let be a multiplicative function such that

for all primes and . Then we have

for .

*Proof:* For brevity we allow all implied constants here to depend on . By replacing by , we may assume that is non-negative. From Rankin’s trick (24) we have

Using the Euler product (19), we conclude that

We can crudely bound

and hence by (25)

By Taylor expansion we have

and hence by (21)

But , and the claim follows.

Comparing this bound with (14), we are led to the guess that for as in the above theorem, should behave roughly like on the average. In the next section we will see some arguments that can make this more precise.

We can get more mileage out of (22) by differentiating in . Formally differentiating in , we arrive at the identity

Exercise 16Derive (26) rigorously. (Hint:For fixed and small , expand using Taylor series with remainder.)

Using the geometric series expansion

and introducing the von Mangoldt function , defined by setting whenever is a prime and is a natural number, and for all other , we thus obtain the important identity

for the Dirichlet series of , and for all . (In fact this identity will hold for a wider range of , but we will address this in later notes.) For comparison, a Taylor expansion of (22) gives the closely related identity

Indeed, (27) is essentially the derivative of (28) in .

for (note the claim is trivial if is large).

Exercise 17 (Cheap first Mertens’ theorem)Show thatfor all . (

Hint:use Rankin’s trick.)

One can try to use Rankin’s trick to bound summatory functions by using the very crude bound

but this is wasteful (often losing a factor of about ). For instance, for as in Theorem 15, one could bound the summatory function by using

and then using that Theorem, but this only gives the crude upper bound

for , which turns out to be off from the truth by a factor of (as can be seen for instance in the simple example , ). The bound (30) also only gives the trivial upper bound of for .

The problem is that (30) is very inefficient when is much smaller than . To do better, there is a standard rearrangement trick that moves much of the “mass” of a summatory function to smaller values of , effectively increasing the cutoff and so replacing (30) with a less inefficient comparison. To describe this rearrangement trick, we return to the fundamental theorem of arithmetic (17) and take logarithms to obtain

whenever is a natural number with prime factorisation (17). Using the von Mangoldt function defined previously, we can thus encode the fundamental theorem of arithmetic neatly in the important identity

For , we rearrange this identity as

which allows us to rewrite an arbitrary summatory function for some as

For the latter sum, we write and interchange summations to obtain (after replacing with ) the rearrangement identity

The expression in parentheses can be viewed as a weighted version of , with the weight tending to be larger for small than for large . Because of this reweighting, if one applies Rankin’s trick to bound the sum on the right-hand side, one can often obtain an upper bound on that recovers the loss of that would occur if one applied Rankin’s trick directly.

To use (32) effectively, we need the following basic upper bound on the von Mangoldt function, which unfortunately does not seem to be provable by the techniques outlined above, but can be established fairly quickly by alternate means:

Proposition 18 (Chebyshev upper bound)We havefor all . In particular, specialising to primes we have

*Proof:* We use the following slick (but perhaps somewhat unmotivated) argument. By telescoping series it suffices to establish the bound

for all natural numbers . We first consider the contribution of those that are powers of a prime for some . One has , so there are at most such primes, and each prime contributes at most to the above sum; since , we may discard this contribution and reduce to showing that

We will achieve this through an inspection of the binomial coefficient

Observe that if is a prime with , then divides but not , and so divides . Taking logarithms, we conclude that

From the binomial theorem we have , and the claim follows.

One can also prove this proposition by several other means, for instance through Möbius inversion; see Exercise 60 below. Unfortunately, the rearrangement identity (32) does not help directly with establishing this proposition, as is supported on numbers with too few prime factors for the rearrangement to have any non-trivial effect.

Remark 19Chebyshev also established the matching lower bound for ; we will establish this bound (by a slightly different method) in Exercise 35. Both of Chebyshev’s bounds will later be superseded by the prime number theorem , discussed in later notes.

Exercise 20Show that the number of primes up to is for any .

We now give an illustration of (32):

Proposition 21 (Cheap upper bound for summatory functions)With as in Theorem 15, we havefor .

*Proof:* By replacing by , we may assume to be non-negative. By (32), it suffices to establish the bounds

The first estimate follows directly from Theorem 15 after using the inequality , so we turn to the latter. We first consider the contribution when are coprime. In that case, we may factor as , so the contribution of this case to (33) is bounded by

To control the sum , we observe that each prime contributes at most to this sum, while the primes contribute through all the powers of less than . By Proposition 18, we thus have

and so the contribution of the coprime case is acceptable by Theorem 15.

It remains to deal with the case when are not coprime; thus and for some prime , some , and some . We can then bound this contribution to (33) by

We have , so by Theorem 15 we may bound this expression by

Performing the sums, this is

since the summation is convergent, the claim follows.

Remark 22We have seen that the von Mangoldt function obeys at least three useful identities: (27), (28) and (31). (Not surprisingly, these identities are closely related to each other: (27) is basically the derivative of (28) and the Dirichlet series transform of (31).) It is because of these identities (and further related identities which we will see in later posts) that the von Mangoldt function is an extremely convenient proxy for the prime numbers in analytic number theory. Indeed, many question about primes in analytic number theory are most naturally tackled by converting them to a statement about the von Mangoldt function, by adding or subtracting the contribution of prime powers for , which are typically of significantly lower order), in order to access identities such as (27), (28) or (31).

** — 3. The number of divisors — **

We now consider the asymptotics of the divisor function

that counts the number of divisors of a given natural number . Informally: how many divisors does one expect a typical large number to have? Remarkably, the answer turns out to depend on exactly on what one means by “typical”.

The first basic observation to make is that is a multiplicative function: whenever are coprime, since every divisor of can be uniquely expressed as the product of a divisor of and a divisor of . The same argument has an important generalisation: if are multiplicative functions, then the Dirichlet convolution , defined by

is also multiplicative. The multiplicativity of the divisor function then follows from the identity

since the constant function is clearly multiplicative.

We begin with a crude bound:

Lemma 23 (Divisor bound)We have as . Equivalently, we have for any fixed .

*Proof:* Let be fixed. Observe that for any prime power , we have . In particular, we see that whenever is sufficiently large depending on , and is a natural number. For the remaining values of , we have . By the multiplicativity of , we thus have as required.

Exercise 24If are arithmetic functions obeying (3), show that the Dirichlet convolution also obeys (3). Then establish the fundamental relationship

relating the Dirichlet series of for all . (Compare with analogous identities for the Fourier transform or Laplace transform of ordinary convolutions.) Use this and (31) to obtain an alternate proof of (27). We will make heavier use of (34) in later notes.

Exercise 25Obtain the sharper bound for all . (Hint:first establish that for any by performing the proof of Lemma 23 more carefully, then optimise in .) It was in fact established by Wigert that one in fact has as , and that the constant cannot be replaced by any smaller constant.

Now we consider the mean value of . From the case of Theorem 15 we have

and from Proposition 21 we have

which suggest that the mean value of for is comparable to . We can verify this by using the general identity

A similar argument gives the variant

We can use this to control the summatory function and logarithmic sum of the divisor function. For instance, by applying (35) with followed by (6), we have

and hence by (13)

(We will improve the control on the error term here later in this section.) Similarly, from (36) followed by (13) we have

and thus by (14) and a brief calculation

for some quadratic polynomial with leading term . Comparing these bounds with (9) and (8), we see this is indeed compatible with the heuristic that behaves like on the average.

Remark 26One can justify the heuristic by the following non-rigorous probabilistic argument: for a typical large number , each number would be expected to divide with probability about ; since , the “expected value” of should thus be about . We will continue this heuristic argument later in this set of notes.

Even though behaves like on the average, it can fluctuate to be much larger or much smaller than this value. For instance, equals when is prime, and when is odd, is twice as large as , even though and are roughly the same size. A further hint of the irregularity of distribution can be seen by considering the * moments* and of for natural numbers . The function is multiplicative with for every prime . From Theorem 15 and Proposition 21 we have the upper bounds

for all , suggesting that behaves like on average. This may seem at first glance to be incompatible with the heuristic that behaves like on the average, but what is happening here is that can occasionally be much larger than , which does not significantly affect the mean of , but does affect higher moments with . (Continuing Remark 26, the issue here is that the events “ divides ” can be highly correlated with each other as varies, due to common factors between different choices of .) In fact, the *typical* value (in the sense of median, rather than mean) of is not or , but is in fact . See Section 5 below.

We can be more precise on the mean behaviour of , by establishing the following variant of Theorem 15.

Theorem 27 (Mean values of divisor-type multiplicative functions)Let be a natural number, and let be a multiplicative function obeying the estimates

for all primes and all . Let denote the

singular series

Note from Taylor expansion that

so is finite (though it may vanish, if one of its factors vanishes), with .

- (i) If , then one has
for all .

- (ii) If , we have
for .

- (iii) If , one has
for .

Note that the case of this proposition gives (a slightly weaker form of) the above estimates for , since in that case . Comparing with (9), (14), (16) we see that behaves approximately like on the average for . The hypotheses on this theorem may be relaxed somewhat; see Exercise 29 below.

*Proof:* For brevity we omit the dependence of implicit constants on . We begin with (i). If , then from (19) and crude bounds we have , which suffices, so we will assume that is sufficiently close to . From (19) we have

By Taylor expansion, we may write

where

Since is convergent, we conclude that

and the claim follows from (21).

To prove (ii) we induct on . First we establish the base case of (ii). From (19) we have

so it suffices to show that

But from (19) we see that

(for instance), and the claim follows.

Now suppose that and the claim (ii) has already been proven for . We let be the multiplicative function with

for primes and , then one easily verifies that . Note that satisfies the same hypotheses as , but with replaced by . A brief calculation shows that

and thus . From (36) one has

and thus by induction hypothesis

From Lemma 2 we have

and

and the claim (ii) follows.

Finally, we prove (iii). Let ; if , we assume inductively that (iii) is established for . Let be as before. From (35) one has

and thus by (6)

From (ii) we have

The error term can be treated by the induction hypothesis when , giving the claim. When , we instead use the Rankin trick and bound

and use (19) to bound as before, giving the claim.

Thus for instance we have

for any , where is the quantity

Among other things, this shows that can be larger than any fixed power of for arbitrarily large ; compare with Lemma 23.

A more accurate description of the distribution of is that for is asymptotically distributed in the limit like a gaussian random variable with mean and variance comparable to ; see Section 5 below.

Exercise 28Let be the arithmetic function defined by setting when is squarefree (that is, is not divisible by any perfect square other than ) and zero otherwise; the reason for the notation will be given later. Show thatand

for . Thus, the square-free numbers have natural and logarithmic density . It can be shown by a variety of methods that , although we will not use this fact here. The error term may also be improved significantly (as we shall see when we turn to sieve-theoretic methods).

Exercise 29The purpose of this exercise is to give an alternate derivation of parts (ii) and (iii) of Theorem 27 that does not rely so strongly on being an integer; it also allows to only be close to on average, rather than in the pointwise sense. The arguments here are from Appendix A.2 Friedlander-Iwaniec, which are in turn based on this paper of Wirsing.

- (i) Let be as in Theorem 27, and define . By using (32) with replaced by , establish the integral equation
for all , and deduce that

for all and some independent of . Then use part (i) of Theorem 27 to deduce that , thus establishing part (ii) of Theorem 27.

- (ii) Assume the following form of the prime number theorem: for all . By using (32) for , establish part (iii) of Theorem 27.
- (iii) Extend Theorem 27 to the case when is real rather than integer (replacing factorials by Gamma functions as appropriate), and (40) is replaced by for all . (For part (ii) of Theorem 27, one can weaken (40) further to .)

We now present a basic method to improve upon the estimate (37), namely the Dirichlet hyperbola method. The point here is that the error term in (6) is much stronger for large than for small , so one would like to rearrange the proof of (37) so that (6) is only used in the former case and not the latter. To do this, we decompose

where and , so that may be decomposed as

Actually, it is traditional to rearrange this a bit as the identity

This decomposition is convenient for improving (37) for two reasons. Firstly, is supported on and thus makes no contribution to . At the other extreme, is supported in and so the restriction may be removed here, simplifying the sum substantially:

For the final sum , we use (35) and (6) as before, to conclude that

By (13), we thus obtain an improved version of (37):

Remark 30The Dirichlet hyperbola method may be visualised geometrically as follows, in a fashion that explains the terminology “hyperbola method”. The sum can be interpreted as the number of lattice points (that is to say, elements of ) lying underneath the hyperbola . The proof of (37) basically proceeded to count these lattice points by summing up the contribution of each column separately; this was an efficient process for the columns close to the -axis, but led to relatively large error terms for columns far away from the -axis. Symmetrically, one could proceed by summing by rows, which is efficient for rows close to the -axis, but not far from that axis. The hyperbola method splits the difference between these two counting procedures, counting rows within of the -axis and columns within of the -axis, and then removing one copy of the square to correct for it being double-counted. The estimation of this lattice point problem can be made more precise still by more sophisticated decompositions and approximations of the hyperbola, but we will not discuss this problem (known as the Dirichlet divisor problem) here.From an algebraic perspective, it is the identity (42), decomposing into Dirichlet convolutions of expressions with good spatial support properties, that is the key to the successful application of the hyperbola method. In later posts we will encounter more sophisticated identities that decompose various arithmetic functions (such as the von Mangoldt function ) into similar convolutions of spatially localised expressions. Unfortunately these identities are not as easy to visualise geometrically as the hyperbola method identity, as the corresponding geometric picture often takes place in three or higher dimensions.

Exercise 31Define the third divisor function by , or equivalently . Show thatfor all and some polynomial with leading term . (Note that Theorem 27(iii) only gives ; one needs a three-dimensional version of the hyperbola method to get the better error term.

Hint:Decompose at the threshold . If one is having difficulty figuring out exactly what identity to use, try working backwards, by writing down all the relevant convolutions (e.g. ) that look tractable, and then do some elementary linear algebra to express in terms of all the expressions that you know how to estimate well.) The term can be improved further (for instance, the logarithmic factor can be removed), but this requires more sophisticated methods than the ones given above.

Exercise 32State and prove an extension of the previous exercise to the divisor function for , defined as the Dirichlet convolution of copies of .

** — 4. Mertens’ theorems — **

In the previous section, we used identities such as (35) and (36) to obtain control on statistics of Dirichlet convolutions in terms of statistics of the individual factors . This method is particularly well suited for functions such as the divisor function , which can be expressed conveniently in such Dirichlet convolution form.

When dealing with arithmetic functions related to the primes, it turns out that one often has to run this procedure in reverse, for instance trying to control statistics of given information on and . This is basically a deconvolution problem, which from a Fourier-analytic point of view corresponds to dividing one Fourier transform by another. This can become problematic when the latter Fourier transform vanishes; in our arithmetic context, this corresponds to zeroes of the Dirichlet series . As such, we will see the location of the zeroes of Dirichlet series such as the Riemann zeta function play an extremely important role in later posts. However, the elementary approach cannot easily access this information directly. As such, it is somewhat limited in the type of information it can recover about the primes (at least in the more basic formulations of the theory); however, one can still obtain some non-trivial control on the primes by purely elementary methods, as we shall shortly see.

Let us first see where this deconvolution problem is coming from. As discussed in Remark 22, it is convenient to study the primes through the von Mangoldt function. The identity (31) concerning this function can be rewritten as

where is the logarithm function. Thanks to the methods in Section 1, we understand the statistics of and relatively well. The “deconvolution problem” is then to somehow use this information to control the statistics of . We demonstrate this with the following improvement of Exercise 17, controlling logarithmic sums of the von Mangoldt function:

Theorem 33 (First Mertens theorem)We have the estimate

*Proof:* Applying (35) to (45) we have

If we apply (8) and (6), we conclude that

and so (46) follows from Proposition 18.

It is easy to convert control of to control on primes:

Corollary 34 (Alternate form of first Mertens’ theorem)We havefor all .

*Proof:* From the definition of the von Mangoldt function, one has

We crudely bound for . Since , we obtain

and the claim follows from (46).

Exercise 35 (Chebyshev’s theorem)Show that there exists an absolute constant such that there are primes in for all sufficiently large . Conclude in particular that .

From this we can obtain a sharper form of Theorem 13, controlling logarithmic sums of the primes:

Theorem 36 (Second Mertens’ theorem)One hasas for some absolute constant . Similarly, one has

The constant has no particular significance in applications, but the constant can be usefully computed: see (51) below.

*Proof:* We just prove the first claim, as the second is similar (and can be deduced from the first by a modification of the argument used to prove Corollary 34). One could proceed here using summation by parts, but we will argue using an essentially equivalent method, based on the fundamental theorem of calculus. By Lemma 5, it suffices to show that

as both go to infinity. From the fundamental theorem of calculus we have

for all , and thus

From Corollary 34 one has

for all ; since

and

the claim follows.

This leads to a useful general formula for computing various slowly varying sums over primes:

- (i) For any fixed , show that
as .

- (ii) If is a fixed compactly supported, Riemann integrable function, show that
as .

- (iii) If is a fixed natural number and is a fixed compactly supported, Riemann integrable function, show that
as .

- (iv) Obtain analogues of (i)-(iii) when the sum over primes are replaced by the sum over integers , but with factors of replaced by (with the convention that this expression vanishes at ).

Remark 38An alternate way to phrase the above exercise is that the Radon measureson converge in the vague topology to the absolutely continuous measure in the limit , where denotes the Dirac probability measure at . Similarly for the Radon measures

To put this another way, the Radon measures or behave like Lebesgue measure on dyadic intervals such as for fixed and large . This is weaker than the prime number theorem that we will prove in later notes, which basically asserts the same statement but on the much smaller intervals . (Statements such as the Riemann hypothesis make the same assertion on even finer intervals, such as .) See this previous blog post for some examples of this Radon measure perspective, which we will not emphasise in this set of notes.

Exercise 39 (Smooth numbers)For any , let denote the set of natural numbers less than which are -smooth, in the sense that they have no prime factors larger than .

- (i) Show that for any fixed , one has
where the Dickman function is defined by the alternating series

(note that for any given that only finitely many of the summands are non-zero; one can also view the term as the term of the summation after carefully working out what zero-dimensional spaces and empty products evaluate to).

Hint:Use the inclusion-exclusion principle to remove the multiples of from for each prime . This is a simple example of a sieve, which we will study in more detail in later notes.- (ii) (Delay-differential equation) Show that is continuous on , continuously differentiable except at , equals for and obeys the equation
for . Give a heuristic justification for this equation by considering how varies with respect to small perturbations of .

- (iii) (Wirsing integral equation) Show that
for all , or equivalently that

Give a heuristic justification for this equation by starting with a -smooth number and considering a factor , where is a prime factor of chosen at random with probability (if occurs times in the prime factorisation of ).

- (iv) (Laplace transform) Show that
for all .

Now we incorporate the information on primes coming from Euler products that we established in Section 2. Recall from (11), (28) that

for . We can compare this against (49) by a variant of the Rankin trick. Namely, if we apply (50) with for some large , one obtains

and thus on subtracting (49)

For any fixed , we see from Exercise 37 that

On the other hand, by using Exercise 37 and dyadic decomposition of we see that the expressions

and

can be made arbitrarily small by making sufficiently small, sufficiently large, and sufficiently large. Putting all this together, we conclude that

We can compute this limit explicitly:

Lemma 40 (Exponential integral asymptotics)For sufficiently small , one has

*Proof:* We start by using the identity to express the harmonic series as

or on summing the geometric series

Since , we thus have

making the change of variables , this becomes

As , converges pointwise to and is pointwise dominated by . Taking limits as using dominated convergence and (13), we conclude that

or equivalently

The claim then follows by bounding the portion of the integral on the left-hand side.

Exercise 41Show thatConclude that if is the Gamma function, defined for by the formula

then one has .

By arguing as in the proof of Corollary 34, we then have

or equivalently by Taylor expansion

Taking exponentials, we conclude

Theorem 42 (Third Mertens’ theorem)We haveas .

Because of this theorem, the factors and frequently arise in analytic prime number theory. We give two examples of this below.

Exercise 43Let be the Dickman function from Exercise 39. Show that .

Exercise 44For any natural number , define the Euler totient function of to be the number of natural numbers less than that are coprime to , or equivalently is the order of the multiplicative group of .

- (i) Show that
for all .

- (ii) Show the more refined lower bound
as . Show that cannot be replaced here by any larger quantity. (This result is due to Landau.)

** — 5. The number of prime factors (optional) — **

Given a natural number , we use to denote the number of prime factors of (not counting multiplicity), and to denote the number of prime factors of (counting multiplicity). Thus we can write as *divisor sums*

We can ask what the mean values of and are. We start with the estimation of

We can rearrange this sum (cf. (35)) as

which by (6) is

so by Theorem 36 (and crudely bounding by , as we will not need additional accuracy here) gives

for (say). Thus we see that for , one expects to have about prime factors on the average.

Now we look at the second moment

If we expand this out directly, we get

which we can rearrange as

where is the least common multiple of , that is to say if and if . Here we run into a serious problem, which is that can be significantly less than , in which case the estimate

is horrendously inefficient (the error term is larger than the main term, which is always a bad sign). One could fix this by using the trivial estimate when . But there is another cheap trick one can use here, due to Turán, coming from the fact that a given natural number can have at most prime factors that are larger than, say, . Thus we can approximate the divisor sum (52) by a *truncated* divisor sum:

One can then run the previous argument with the truncated divisor sum and avoid the problem of dipping below . Indeed, on squaring we see that

and hence (by (53))

for . Rearranging and using (6) as before, we obtain

The contribution of the error may be crudely bounded by , which can easily be absorbed into the error term. The diagonal case also contributes thanks to Theorem 36. We conclude that

and thus by a final application of Theorem 36

We can combine this with (53) to give a variance bound:

To interpret this probabilistically, we see that if we pick uniformly at random, then the random quantity will have mean and standard deviation . In particular, by Chebyshev’s inequality, we expect to have prime factors most of the time.

Remark 45The strategy of truncating a divisor sum to obtain better control on error terms (perhaps at the expense of some inefficiency in the main terms) is one of the core techniques insieve theory, which we will discuss in later notes.

The same estimates are true for :

Exercise 46

- (i) Show that
for , and conclude that

- (ii) Show that
for all natural numbers .

From the above exercise and Chebyshev’s inequality, we now know the typical number of prime factors of a large number, a fact known as the Hardy-Ramanujan theorem:

Theorem 47 (Hardy-Ramanujan theorem)Let be an asymptotic parameter going to infinity, and let be any quantity depending on that goes to infinity as . Let be a natural number selected uniformly at random from . Then with probability , has distinct prime factors, and repeated prime factors (counting multiplicity, thus counts as repeated prime factors). In particular, has prime factors counting multiplicity.

This already has a cute consequence with regards to the multiplication table:

Proposition 48 (Multiplication table bound)At most of the natural numbers up to can be written in the form with natural numbers.

In other words, for large , the multiplication table only uses a small fraction of the numbers up to . This simple-sounding fact is surprisingly hard to prove if one does not use the simple argument provided below.

*Proof:* Pick natural numbers uniformly at random. By the Hardy-Ramanujan theorem, with probability , and will each have prime factors counting multiplicity. Hence with probability , will have prime factors counting multiplicity. But by a further application of the Hardy-Ramanujan theorem, the set of natural numbers up to with this property has cardinality . Thus all but of the products with are contained in a set of cardinality , and the claim follows.

Remark 49In fact, the cardinality of the multiplication table is known to be comparable to with ; see this paper of Ford.

Exercise 50 (Typical number of divisors)Let be an asymptotic parameter going to infinity. Show that one has for all but of the natural numbers less than . (Hint:first establish the bounds .)

In fact, one can describe the distribution of or more precisely, a fact known as the Erdös-Kac theorem:

Exercise 51 (Erdös-Kac theorem)(This exercise is intended for readers who are familiar with probability theory, and more specifically with the moment method proof of the central limit theorem, as discussed in this previous set of notes.) Let be an asymptotic parameter going to infinity, and let denote the truncated divisor sumDefine the quantity

thus by Mertens’ theorem

- (i) Show that for all .
- (ii) For any fixed natural number , show that
where the quantity is defined to equal zero when is odd, or

when is even.

- (iii) If is drawn uniformly at random from the natural numbers up to , show that the random variable
converges in distribution to the standard normal distribution in the limit . (You will need something like the

moment continuity theoremfrom Theorem 4 of these notes.)- (iv) Obtain the same claim as (iii), but with replaced by .

Informally, we thus have the heuristic formula

for , where is distributed approximately according to a standard normal distribution. As in Exercise 50, this leads to a related heuristic formula

for the number of divisors. This helps reconcile (though does not fully explain) the discrepancy between the typical (or median) value of , which is , and the mean (or higher moments) of , which is of the order of or , as it suggests that is in fact significantly larger than its median value of with a relatively high probability. (Unfortunately, though, the heuristic (54) is not very accurate at the very tail end of the distribution when is extremely large, and one cannot recover the correct exponent of the logarithm in (39), for instance, through a naive application of this heuristic.)

Remark 52Another rough heuristic to keep in mind is that a “typical” number less than would be expected to have divisors in any dyadic block between and , andprime divisorsin any hyper-dyadic block between and . For instance, for any fixed , one should have divisors in the interval , but only prime divisors. Typically, the only prime divisors that occur with multiplicity will be quite small (of size ). Note that such heuristics are compatible with the fact that has mean and that and both have mean . One can make these heuristics more precise by introducing the Poisson-Dirichlet process, as is done in this previous blog post, but we will not do so here. The study of the distribution of factors of “typical” large natural numbers is a topic sometimes referred to asanatomy of integers. Interestingly, there is a strong analogy between this problem and the problem of studying the distribution of cycles of “typical” large permutations; see for instance this article of Granville for further discussion.

Exercise 53Let be a natural number. Show that for sufficiently large , the number of natural numbers up to that are the products of exactly primes is .

** — 6. Mobius inversion and the Selberg symmetry formula — **

In Section 4, we used the identity , together with elementary estimates on and , to deduce various estimates on the von Mangoldt function . Another way to extract information about from this identity is to “deconvolve” or “invert” the operation of convolution to . This can be achieved by the basic tool of Möbius inversion, which we now discuss. We first observe that the Kronecker delta function , defined by , is an identity for Dirichlet convolution, thus

for any arithmetic function . Since Dirichlet convolution is associative and commutative, this implies that if we can find an arithmetic function with the property that

then any formula of the form may be inverted to the equivalent form , a fact known as the *Möbius inversion formula*. It is then a routine matter to locate such a function :

Exercise 54Define the Möbius function by setting when is the product of distinct primes for some , and otherwise. Show that is the unique arithmetic function that obeys (55).

Observe that is a multiplicative function that obeys the trivial bound

for all . Furthermore, is precisely when is square-free, and zero otherwise, so the notation here is consistent with that in Exercise 28.

One can express the Möbius inversion formula as the assertion that

for any compactly supported arithmetic function . This already reveals that must exhibit some cancellation beyond the trivial bound (56):

Lemma 55 (Weak cancellation for )For any non-negative integer , we have

*Proof:* We may of course take .

First suppose that . We apply (57) with to conclude that

Since , we conclude from (56) that

and the case of (58) follows.

Now suppose inductively that , and that the claim has already been proven for smaller values of . We apply (57) with to conclude that

By (15) we have

for some polynomial with leading term . Inserting this asymptotic and using the induction hypothesis to handle all the lower order terms of , and (56) to handle the error term, we conclude that

for some polynomial of degree at most . By (10) the error term is , and the claim follows.

Exercise 56Sharpen the case of the above lemma to , for any .

From Möbius inversion we can write in terms of :

Since , we have the general Leibniz-type identity

Since , we can obtain the alternative representation

by multiplying (55) by then applying (61), (60).

Using these identities and Lemma 55, we can recover many of the estimates in Section 4:

Exercise 57 (Alternate proof of Chebyshev and Mertens bounds)Use (60) and Lemma 55 to reprove the estimatesand

If one has a little bit more cancellation in the Möbius function, one can do better:

Theorem 58 (Equivalent forms of the prime number theorem)The following three statements are logically equivalent:

- (i) We have as .
- (ii) We have as .
- (iii) We have as .

In later notes we will prove that the claims (i), (ii), (iii) are indeed true; this is the famous prime number theorem. This result also illustrates a general principle, that one route to distribution estimates of the primes is via distribution estimates on the Möbius function, which can sometimes be a simpler object to study (for instance, the Möbius function is bounded in magnitude by , whereas the von Mangoldt function can grow logarithmically).

*Proof:* We use some arguments of Diamond. We first show that (i) implies (ii). From (62) and Möbius inversion we have

By (35) we thus have

For any fixed , we may use (i) to write as when is sufficiently large, and otherwise. From this, (56), and the case of (58) we thus have

when is large enough. By (10), (56) we have

summing and then dividing by , we obtain (ii) since is arbitrary.

Now we show that (ii) implies (iii). We start with the identity (59), which we write as

Let be a small fixed quantity. For , decreases through a fixed set of values, and from (ii) we conclude that

Meanwhile, since , we see from (56) that

Combining all three inequalities and dividing by , we conclude that

replacing by , then sending to zero, we obtain (iii).

From Exercise 7 we see that (iii) implies (ii). Thus, to conclude the theorem, it suffices to show that (ii) and (iii) jointly imply (i). From (60), (35) we have

Meanwhile, since , we have from (35) that

From (6) we have . It will thus suffice to show that

Let be a small fixed quantity. Arguing as in the implication of (iii) from (ii), we see from (ii) that

Next, we see from (8), (44) that

From Lemma 2 we have

and so from (iii) and (56) we conclude that

Summing this with (64) and then sending to zero, we obtain the claim.

Exercise 59 (Further reformulations of the prime number theorem)Show that the statements (i)-(iii) in the above theorem are also equivalent to the following statements:

- (iv) The number of primes less than or equal to is as .
- (v) The prime number is equal to as .

Unfortunately it is not so easy to actually obtain the required cancellation of the Möbius function , and to obtain the desired asymptotics for . However, one can do better if one works with the *higher-order von Mangoldt functions* , defined by setting

for all . Thus is the usual von Mangoldt function, and from (61), (63) we easily obtain the recursive identity

for . Among other things, this implies by induction that the are non-negative, and are supported on those natural numbers that have at most distinct prime factors. We have the following asymptotic for the summatory functions of :

Proposition 60 (Summatory function of )For any , we havefor all , and for some polynomial of leading term .

For , the error term here is comparable to the main term, and we obtain no improvement over the Chebyshev bound (Proposition 18). However, the estimates here become more useful for . For an explicit formula for the polynomials , together with sharper bounds on the error term, see this paper of Balazard.

*Proof:* From (65), (35) we have

By (9) we have

for some polynomial with leading term . The claim then follows from (58), using (56), (10) to control the error term.

The case of this proposition is known as the *Selberg symmetry formula*:

Among other things, this gives an upper bound that comes within a factor of two of the prime number theorem:

Corollary 61 (Cheap Brun-Titchmarsh theorem)For any , one has

Using the methods of sieve theory, we will obtain a stronger inequality, known as the Brun-Titchmarsh inequality, in later notes. This loss of a factor of two reflects a basic problem in analytic prime number theory known as the parity problem: estimates which involve only primes (or more generally, numbers whose number of prime factors has a fixed parity) often lose a factor of two in their upper bounds, and are trivial with regards to lower bounds, unless some non-trivial input about prime numbers is somehow injected into the argument. We will discuss the parity problem in more detail in later notes.

*Proof:* From the Selberg symmetry formula we have

(since ). From (66) we have the pointwise bound , thus

By Proposition 18 and dyadic decomposition we have

and the claim follows.

With some additional argument of a “Fourier-analytic” flavour (or using arguments closely related to Fourier analysis, such as Tauberian theorems or Gelfand’s theory of Banach algebras), one can use the Selberg symmetry formula to derive the prime number theorem; see for instance these previous blog posts for examples of this. However, in this course we will focus instead on the more traditional complex-analytic proof of the prime number theorem, which highlights an important connection between the distribution of the primes and the zeroes of the Riemann zeta function.

Exercise 62 (Cheap Brun-Titchmarsh, again)Show that for any , the number of primes between and is at most .

Exercise 63 (Golomb identity)Let be coprime natural numbers. Show that

Exercise 64 (Diamond-Steinig identity)Let . Show that can be expressed as a linear combination of convolutions of the form , where appears times and are non-negative integers with and . Identities of this form are due to Diamond and Steinig.

** — 7. Dirichlet characters — **

Now we consider the following vaguely worded question: how many primes are there in a given congruence class ? For instance, how many primes are there whose last digit is (i.e. lie in )?

If the congruence class is not primitive, that is to say that and share a common factor, then clearly the answer is either zero or one, with the latter occurring if the greatest common divisor of and is a prime which is congruent to modulo . So the interesting case is when is primitive, that is to say that it lies in the multiplicative group of primitive congruence classes.

In this case, we have the fundamental theorem of Dirichlet:

Theorem 65 (Dirichlet’s theorem, Euclid form)Every primitive congruence class contains infinitely many primes.

For a small number of primitive congruence classes, such as or , it is possible to prove Dirichlet’s theorem by mimicking one of the elementary proofs of Euclid’s theorem, but we do not know of a general way to do so; see this paper of Keith Conrad for some further discussion. For instance, there is no proof known that there are infinitely many primes that end in that does not basically go through most of the machinery of Dirichlet’s proof (in particular introducing the notion of a Dirichlet character). Indeed, it looks like the problem of finding a new proof of Dirichlet’s theorem is an excellent test case for any proposed alternative approach to studying the primes that does not go through the standard approach of analytic number theory (cf. Remark 2 from the announcement for this course).

In fact, Dirichlet’s arguments prove the following stronger statement, generalising Euler’s theorem (Theorem 2 from this set of notes):

Theorem 66 (Dirichlet’s theorem, Euler form)Let be a primitive residue class. Then the sum is divergent.

There is a more quantitative form, analogous to Mertens’ theorem:

Theorem 67 (Dirichlet’s theorem, Mertens form)Let be a primitive residue class. Then one hasfor any , where the Euler totient function is defined as the order of the multiplicative group .

Exercise 68Let be a primitive residue class. Use Theorem 67 to show thatas for some quantity , thus giving Theorem 66. (

Hint:adapt the proof of Theorem 36.)

If one tries to adapt one of the above proofs of Mertens’ theorem (or Euler’s theorem) to this setting, one soon runs into the problem that the function is not multiplicative: . To resolve this issue, Dirichlet used some Fourier analysis to express in terms of completely multiplicative functions, known as Dirichlet characters.

We first quickly recall the Fourier analysis of finite abelian groups:

Theorem 69 (Fourier transform for finite abelian groups)Let be a finite abelian group (which can be written additively or multiplicatively). Define acharacteron to be a homomorphism to the unit circle of the complex numbers, and let be the set of characters. Then . Furthermore, given any function , one has a Fourier decompositionfor all , where the Fourier coefficients are given by the formula

*Proof:* Let be the -dimensional complex Hilbert space of functions with inner product . Clearly any character is a unit vector in this space. Furthermore, for any two characters , we may shift the variable by any shift and conclude that

for any ; in particular, we see that if , then . Thus is an orthonormal system in . To complete the proof of the theorem, it thus suffices to show that this orthonormal system is complete, that is to say that the characters span .

Each shift generates a unitary shift operator , defined by setting (if the group is written multiplicatively). These operators all commute with each other, so by the spectral theorem they may all be simultaneously diagonalised by an orthonormal basis of joint eigenvectors. It is easy to see that these eigenvectors are characters (up to scaling), and so the characters span as required.

See this previous post for a more detailed discussion of the Fourier transform on both finite and infinite abelian groups. We remark that an alternate way to establish that the characters of span is to use the classification of finite abelian groups to express as the product of cyclic groups, at which point one can write down the characters explicitly.

Define a Dirichlet character of modulus to be a function of the form

where is a character of . Thus, for instance, we have the *principal character*

of modulus . Another important example of a Dirichlet character is the quadratic character to a prime modulus , defined by setting to be when is a non-zero quadratic residue modulo , if is a quadratic residue modulo , and zero if is divisible by . (There are quadratic characters to composite moduli as well, but one needs to define them using the Kronecker symbol.) One can also easily verify that the product of two Dirichlet characters is again a Dirichlet character (even if the characters were initially of different modulus).

Dirichlet characters of modulus are completely multiplicative (thus for all , not necessarily coprime) and periodic of period (thus for all ). From Theorem 3 we see that there are exactly Dirichlet characters of modulus , and from (68) one has the Fourier inversion formula

From Mertens’ theorem we have

since the contribution of those for which is not coprime to is easily seen to be ( must be a power of a prime dividing in these cases). Thus, Theorem 67 follows from

Theorem 70 (Dirichlet’s theorem, character form)Let be a non-principal Dirichlet character of modulus . Thenfor any .

To prove this theorem, we use the “deconvolution” strategy. Observe that for any completely multiplicative function , one has

for any arithmetic functions . In particular, from (45) one has

Theorem 70 is seeking control on the logarithmic sums of , so it is natural to first control the logarithmic sums of and . To do this we use a somewhat crude lemma (cf. Lemma 2):

Lemma 71Let be a non-principal character of modulus , and let be an arithmetic function that is monotone on an interval . Then

*Proof:* Without loss of generality we may assume that is monotone non-increasing. By rounding up and down to the nearest multiple of , we may assume that are multiples of , then the left-hand side may be split into the sum of expressions of the form . As is non-principal, it is orthogonal to the principal character, and in particular . Thus we may write as , which by the trivial bound and monotonicity may be bounded by . The claim then follows from telescoping series.

From this lemma we obtain the crude upper bounds

for any . (Strictly speaking, the function is not monotone decreasing for , but clearly we may just delete this portion of the sum from (70) without significantly affecting the estimate.) By Lemma 5 we thus have

for all and some complex number .

Exercise 72 (Continuity of -function at )Let be a non-principal character. For any , define the Dirichlet -function by the formulaShow that for any . In particular, the Dirichlet -function extends continuously to . (In later notes we will extend this function to a much larger domain.)

We will shortly prove the following fundamental fact:

Theorem 73 (Non-vanishing)One has for any non-principal character .

Let us assume this theorem for now and conclude the proof of Theorem 70. Starting with the identity (69) and using (36), we see that

Inserting (70), (72) and using the trivial bound to control error terms, we conclude that

and Theorem 70 follows by dividing by and using Proposition 18.

Remark 74It is important to observe that this argument is potentiallyineffective: the implied constant in Theorem 70 will depend on what upper bound one can obtain for the quantity . Theorem 73 ensures that this quantity is finite, but does not directly supply a bound for it, and so we cannot explicitly (oreffectively) describe what the implied constant is as a computable function of , at least if one only uses Theorem 73 as a “black box”. It is thus of interest to strengthen Theorem 73 by obtaining effective lower bounds on for various characters . This can be done in some cases (particularly if is not real-valued), but to get a good effective bound for all characters is a surprisingly difficult problem, essentially the Siegel zero problem; we will return to this issue in later notes.

Exercise 75Show thatfor any and any non-principal character . (You will not need to know the non-vanishing of to establish this.) Conclude that

for any and any primitive residue class , where is the polynomial in Proposition 60. Deduce in particular the cheap Brun-Titchmarsh bound

for any and primitive residue class .

It remains to prove the non-vanishing of . Here we encounter a curious *repulsion* phenomenon (a special case of the Deuring-Heilbronn repulsion phenonemon): the vanishing of for one character prevents (or “repels”) the vanishing of for another character . More precisely, we have

Proposition 76Let . Then there is at most one non-principal Dirichlet character of modulus for which .

*Proof:* Let denote all the Dirichlet characters of modulus , including the principal character . The idea is to exploit a certain positivity when all the characters are combined together, which will be incompatible with two or more of the vanishing.

There are a number of ways to see the positivity, but we will start with the Euler product identity

from (22). We can “twist” this identity by replacing by for any Dirichlet character , which by the complete multiplicativity of gives

for any , where we allow for logarithms to be ambiguous up to multiples of . By Taylor expansion, we thus have

for (cf. (28)). Summing this for , we have

From (68) we see that . In particular, the left-hand side is non-negative. Exponentiating, we conclude the lower bound

Now we let . For non-principal characters , we see from Exercise 72 that stays bounded as , and decays like if vanishes. For the principal character , we will just use the crude upper bound . By (11), we conclude that if two or more are vanishing, then the product will go to zero as , contradicting (73), and the claim follows.

Call a Dirichlet character *real* if it only takes real values, and *complex* if it is not real. For instance, the character of modulus that takes the values on , on , on , on , and on is a complex character. The above theorem, together with conjugation symmetry, quickly disposes of the complex characters , as such characters can be “repelled” by their complex conjugates:

Corollary 77Let be a complex character. Then .

*Proof:* If is a complex character of some modulus , then its complex conjugate is a different complex character with the same modulus , and . If vanishes, we therefore have at least two non-principal characters of modulus whose -function vanishes at , contradicting Theorem 73.

This only leaves the case of real non-principal characters to deal with. These characters are also known as *quadratic characters*, as is the principal character; they are also connected to quadratic number fields, as we will discuss in a subsequent post. In this case, we cannot exploit the repulsion phenomenon, as we now only have one character for which vanishes. On the other hand, for quadratic characters we have a much simpler positivity property, namely that

for all natural numbers . Actually, it is convenient to use a variant of this positivity property, namely that

which can be proven first by working in the case that is a power of a prime and using (74), and then using multiplicativity to handle the general case. Crucially, we can do a little better than this: we can improve (75) to

whenever is a perfect square. Again, this can be verified by first working in the case when is an even prime power.

It is now natural to consider sums such as to exploit this positivity. It turns out that the best choice of to use here is , that is to say to control the sum

On the one hand, from positivity on the squares (76), we can bound this sum by

for (say), thanks to (13). On the other hand, we can expand (77) using the Dirichlet hyperbola method (cf. (43)) as

From (12) one has

while from Lemma 71 we have

and so (using the trivial bound to control error terms) the previous expression can be rewritten as

The final error is . From Lemma 71 we have . Inserting (72), we conclude that

If vanishes, then this leads to a contradiction if is large enough. This concludes the proof of Theorem 73, and hence Dirichlet’s theorem.

Remark 78The inequality (78) in fact shows that is positive for every real character . In fact, with the assistance of some algebraic number theory, one can show the class number formula which asserts (roughly speaking) that is proportional to the class number of a certain quadratic number field. This will be discussed in a subsequent post.

Exercise 79By using an effective version of the above arguments, establish the lower bound

for all non-principal characters of modulus (both real and complex).

Remark 80The bound (79) is very poor and can be improved. For instance, the class number formula alluded to in the previous remark gives the effective bound for real non-principal characters. In later notes we will also establishSiegel’s theorem, which gives anineffectivebound of the form for such characters, and any .

Exercise 81Let be a non-principal character. Show that the sum is conditionally convergent. Then show that the product is conditionally convergent to .

[The exercise below will be moved to a more appropriate location in the published version of these notes, but is placed at the end for now to avoid renumbering issues.]

Exercise 82Show thatfor any . Conclude that the proportion of pairs of natural numbers in the square which are coprime converges to as .

## 82 comments

Comments feed for this article

23 November, 2014 at 10:26 pm

AnonymousGood

24 November, 2014 at 1:57 am

MaDiha HouRithank you

24 November, 2014 at 5:24 am

AnonymousBeyond excellence!

24 November, 2014 at 7:15 am

tomcircleReblogged this on Math Online Tom Circle.

24 November, 2014 at 9:20 am

Eytan PaldiCan theorem 7 be generalized for any ?

24 November, 2014 at 10:11 am

Terence TaoYes (after replacing factorials with Gamma functions, of course), for instance by using the arguments of Wirsing (which are basically a refined version of what is needed to solve Exercise 12, which also works for non-integer k). I’ve updated the notes with this remark.

24 November, 2014 at 12:45 pm

Eytan PaldiThanks! (in fact, this seems plausible because part (i) is proved for any , and the main terms of parts (ii), (iii) can be calculated by Perron’s formula from the main term in (i). Although the estimation of the remainder terms in (ii), (iii) seems to be the real difficulty!)

24 November, 2014 at 9:21 pm

Scott MorrisonEquation (8) is identical to the equation before it.

[Duplicate equation removed – T.]25 November, 2014 at 1:09 am

AnonymousIs there a sign error in the “In particular” part of Lemma 2(ii)?

[No, but I have swapped around x and y to avoid the confusion. -T.]25 November, 2014 at 10:54 am

anthonyquasI see now. Sorry.

25 November, 2014 at 10:48 am

schlaflyDo you always exclude zero from N, the natural numbers? Maybe that is more convenient for 19th century multiplicative number theory, but including the additive identity seems preferable for just about every other area of math.

25 November, 2014 at 12:08 pm

Terence TaoThe convention does vary with the field, see my earlier comment at https://terrytao.wordpress.com/books/analysis-i/#comment-439818 . Personally, when such minor notational conflicts occur, I believe it better to adapt the notation to the local application at hand, rather than artificially enforce a globally consistent notation, the justification for which may be completely irrelevant for the application.

In number theory, for instance, both the fundamental theorem of arithmetic and the definition of the zeta function (and other Dirichlet series) are much more natural to state if zero is excluded from . These concepts (and their generalisations to other number fields, etc.) are both fundamentally important concepts to the subject (in the 19th, 20th, or 21st centuries), so it makes sense to reserve the symbol in this context for the multiplicative natural numbers . In contrast, the additive natural numbers plays a much more minor role in analytic number theory; in most cases, the multiplicative natural numbers or the integers can be used instead as a substitute. (Perhaps the one exception is in combinatorial number theory, in which cardinalities of (possibly empty) sets play a more prominent role, although even in those cases the empty set case is often degenerate and needs to be removed anyway.)

25 November, 2014 at 4:00 pm

anthonyquasIn the discussion after (26), there is the temporary requirement that $s>1$ is sufficiently close to 1. This appeared in the formal derivation of (25), but as far as I could tell the rigorous derivation in Exercise 8 is valid for all $s>1$. Or did I miss something here?

[You’re right, the restriction is unnecessary (since is clearly non-zero for all ). -T.]25 November, 2014 at 4:08 pm

anthonyquasIn the second equation of (15), it looks as though the main term should be ?

[Corrected, thanks – T.]26 November, 2014 at 9:44 am

David SpeyerHoly cow, that’s lecture 1? That looks like a whole course!

Slightly more substantively, I now know a new argument for nonvanishing of , thanks. It’s interesting that you manage to use a computation at to control what happens at ; is there some intuitive way to think about that?

26 November, 2014 at 10:07 am

Terence TaoI don’t plan to cover all of the sections of the notes explicitly in class (for instance, I will skip Section 5). I expect this set of notes to take maybe five lectures to get through.

The way I think of the argument is that this exponent is the natural exponent associated to the squares , just as is the natural exponent associated to the natural numbers . The argument here is exploiting information at the squares, namely that on the square numbers, so is the natural place to try to manipulate that information at the Dirichlet series level. But by the same token, the fact that is so far from makes this argument quite poor from a quantitative point of view.

23 February, 2015 at 8:31 am

FanMy feeling on this 1/2 v.s. 1 problem is a little different: it is because we are not powerful enough (at this point) to directly show the Dirichlet series of diverges at 1. If we were allowed to use quadratic residues, we would be able to include something more than the squares, e.g., integers of the form , which must have an even number of factors of of which is not a quadratic residue. Now the set of integers has enough density to allow us to take s=1 and still get divergence… But wait, am I cheating by implicitly using something related to the class number?

26 November, 2014 at 10:44 am

Eytan PaldiFrom the proof of theorem 7, it seems that condition (38) can be relaxed to

For any fixed .

28 November, 2014 at 6:11 am

Maga PéterBeautiful notes! Thanks.

In the last display of Theorem 10 (second Mertens’), {\log \log} should be. The middle display of Corollary 18 (cheap Brun-Titchmarsh) is an inequality. The hyperlinks (42) point to (36). In Exercise 21, ‘itme’ is written.

[Corrected, thanks, though I am unsure what the first issue was – T.]28 November, 2014 at 11:33 am

Maga PéterSorry, I mean the proof of Theorem 10. I guess the integral in the last display is evaluated incorrectly, it should be in place of . And also in place of . Sorry if speak nonsense.

[Corrected, thanks – T.]28 November, 2014 at 7:13 am

Anonymous(24) is undefined for .

[Corrected, thanks – T.]28 November, 2014 at 9:40 am

AnonymousIn (25) can be replaced (as in (39)) by .

[Corrected, thanks – T.]29 November, 2014 at 12:31 am

Anthony QuasHere’s an entry for the most trivial typo competition: There is a repeated ‘the’ half way through the proof of Theorem 7 (iii).

[Corrected, thanks – T.]29 November, 2014 at 12:12 pm

anthonyquasFor the base case of Theorem 7 (ii) when you’re estimating the tail of the Dirichlet series, I think you need to take a power greater than 1/2 to ensure that . If for all and for all other powers and you use the square root, then you get . Is this right?

[Corrected, thanks – T.]29 November, 2014 at 12:46 pm

anthonyquasI think the same issue arises in the case of Theorem 7(iii).

[Corrected, thanks – T.]29 November, 2014 at 12:26 pm

anthonyquasFor the inductive step of Theorem 7(ii), I think there’s a small mistake in the relationship between and . I think it should read

[Corrected, thanks – T.]29 November, 2014 at 4:19 pm

anthonyquasI think there is a mistake in Exercise 16 (i). If , then (that is there is a extra factor of that does not appear in the integral equation).

29 November, 2014 at 4:44 pm

anthonyquasI think the middle term of (41) should be . Also, this term shows up 6 lines below (but it is the that is used in the calculation).

[Corrected, thanks – T.]30 November, 2014 at 8:33 am

petegCan anyone help me understand how Lemma 1 implies equation 8? Where does the log(2+x) in equation 8 come from?

[Apply Lemma 1 with and . The is a convenient form of the error term that can absorb both the and the errors that arise from Lemma 1. -T.]30 November, 2014 at 3:24 pm

anthonyquasIn Exercise 21(i), presumably should be greater than 1 for the statement to make sense.

[Actually, I believe the statement is still true (albeit somewhat trivial) in the regime. -T.]1 December, 2014 at 12:13 pm

anthonyquasSo I think (48) has a sign error. Also, I think it follows more from (21) and (28) rather than (29).

[Corrected, thanks – T.]1 December, 2014 at 12:49 pm

anthonyquasI think there’s a superfluous exponent in the principal term of the right side of the first equation in Corollary 9.

[Corrected, thanks – T.]1 December, 2014 at 1:44 pm

AnonymousIn Lemma 2 (ii), if taking y=1 in (i), should c= – F(1)+O(g(1))?

[Corrected, thanks – T.]1 December, 2014 at 1:46 pm

cnpdeReblogged this on Qingtian's Garden 晴天的后花园.

1 December, 2014 at 5:29 pm

anthonyquasA typo and two comments/questions.

Typo: In the last line of Lemma 15, the polynomial should be .

Q1: I hope it’s OK that I’m pointing out a succession of essentially trivial corrections. I’ve always wanted to know something about this field, and am trying to read through the notes in detail and do the exercises. Let me know if you’ve had enough of me point out these minor issues;

Q2: Having now spent a chunk of time reading these notes, I have a minor formatting comment. The fact that there are many different parallel indexing sequences (Theorems/Lemmas in one sequence; Exercises in another; Remarks in another) makes eyeball navigation using scrollbars a bit tricky (for me at least).

3 December, 2014 at 8:47 am

Terence TaoThanks for all the corrections! Given that my class will be going through these notes soon, I am sure they will appreciate all the proofreading.

As an experiment, I have put all the theorems/lemmas, remarks, and exercises under a single numbering system (which makes some of the previous references to the numbering in the blog comments to now be incorrect, unfortunately). It would be good to get some feedback as to whether this is an improvement that should be continued in subsequent notes.

1 December, 2014 at 5:57 pm

anthonyquasIn (59), I wonder if you need to point out that means the pointwise product of and . I spent some time trying to understand why this was true as a composition(!)

[Clarified, thanks – T.]2 December, 2014 at 1:15 pm

anthonyquasIn the proof of Theorem 16, at the beginning of the proof of (ii) and (iii) jointly imply (i), I think the equality should read .

[Corrected, thanks – T.]2 December, 2014 at 4:49 pm

anthonyquasSo I’m having a hard time understanding the error term in the displayed equation prior to (76). The error term in the sum of the is of size . This is divided by (giving ) and summed over . Why is this ? Apologies if I’m missing something obvious.

3 December, 2014 at 8:18 am

Anthony QuasI think you can write the term as . This seems that it would be enough to control the error, right?

3 December, 2014 at 8:45 am

Terence TaoUnfortunately it’s more complicated than that because need not be an integer. But actually the Dirichlet hyperbola method works here (the usual proof in fact uses this method; I had experiented with not using this method to try to simplify the argument and made an error in the computations); I’ve reverted back to that proof here.

An alternative method is to use a smoother sum, such as for some bump function supported on some compact interval in . It is particularly convenient to take of the form for some other bump function supported near , since in that case the sum factors cleanly as . Actually this is ultimately the superior way to proceed here (it ends up giving a lower bound on that is polynomial in q rather than exponential in q), but is further from the spirit of the elementary approach of this blog post, so I did not detail it here (it is more of a Fourier-analytic approach, as one needs things like the Poisson summation formula to get the best estimates).

3 December, 2014 at 8:30 am

apatchShould the error term in (10) above be of order ?

[Corrected, thanks – T.]4 December, 2014 at 2:30 am

AnonymousThe alternate notation for the divisor function may be used in cases when denotes Ramanujan tau function (which is also multiplicative.)

4 December, 2014 at 9:06 am

Enrique TreviñoIn the proof of proposition 18 it seems one should take into account the case . If , then is nonzero for any (in fact it would be ).

So it seems there is an extra error term of size: . Maybe I'm missing something here.

4 December, 2014 at 9:59 am

jcsI think you’re right. In fact the first sum in the second member of the first equation of the proof is O(x) as one can see by integrating by part, and so the main contribution, in O(xLog(x)), comes precisely from those {p\gep\sqrt{x}} which have to be taken into account as you point out

4 December, 2014 at 11:40 am

Terence TaoGah, you’re right, my proposed proof of Proposition 18 does not work at all. In retrospect this is obvious, as is supported on numbers with very few factors, so rearranging based on the factors of will not do very much. I’ve rearranged the material to give instead the textbook proof of Chebyshev’s bound at this point (which I personally find a bit too slick, but the only other proofs I know require material that is covered later in the notes, such as Mobius inversion).

11 December, 2014 at 3:59 pm

bradThere’s an approach on p. 33 of Iwaniec and Kowalski that while still ‘slick’ in its application of monotonicity, doesn’t involve inspecting binomial coefficients. It invokes your identity (31) instead of using Mobius inversion proper.

11 December, 2014 at 4:18 pm

Terence TaoActually, the approach in I&K uses binomial coefficients in disguise (note, for instance, that the quantity in I&K is simply the logarithm of the binomial coefficient if is an even natural number)

11 December, 2014 at 4:23 pm

bradYah.. I just came back because I had that realization. The phrasing is more natural there anyway. :)

11 December, 2014 at 6:02 pm

bradThese are very nice notes by the way. Along with everyone else I’m enjoying reading them!

4 December, 2014 at 11:22 am

apatchIn Ex 4 did you mean that the lead term is $l!t^k$?

[Corrected, thanks – T.]7 December, 2014 at 7:09 pm

Santiago LópezCuáles son los últimos adelantos en cuanto a los Primos de Sophie Germain?

Encontré una serie que involucra a los primos de Sophie Germain con la constante sqrt(2)-1 (razón plateada)

Gracias

13 December, 2014 at 7:39 am

apatchEx 8: I wasn’t able to see how to use the hint but using Abel’s Identity I got the equality valid for all s>1. But my little-o term is of the right order but based on x tending to infinity, not s tending to .

What is the right way of looking at this?

Thanks

13 December, 2014 at 8:24 am

Terence TaoThere is no dependence on in either side of the desired asymptotic , so I recommend going through your argument again and clarifying the role of the variable.

The purpose of the hint is to be able to rewrite as an integral involving the logarithmic sums , after interchanging a summation and integral.

A warmup exercise may be to prove the classical fact that any sequence of complex numbers that converges to a limit, also converges in the Cesaro sense.

7 January, 2015 at 4:06 pm

AnonymousIs the reason why we use Monotone Convergence to show (18) because we want to justify the interchange of limits in ? If so, how does Monotone Convergence apply? Doesn’t monotone convergence just deal with interchanging limits and integrals/sums?

[By using logarithms, monotone convergence can also be used to interchange limits and products, if the factors in the products are at least one. -T.]9 January, 2015 at 4:02 pm

qiuHow did you obtain the crude bound in the proof of Theorem 15?

[ converges. -T.]11 January, 2015 at 8:30 am

kazekkurzI do not know why O(s-1) i (22) appears? In (21) there was only O(1) and the step was taking log on both sides…

11 January, 2015 at 10:17 am

Terence Taois factored as , and from the Taylor expansion of the logarithm one has .

11 January, 2015 at 10:17 am

kazekkurzThank You!

19 January, 2015 at 3:41 am

AnonymousBy considering in exercise 11, it seems that the RHS holds also with replacing , so it seems that the RHS can be improved by replacing with – which is also more symmetric.

Moreover, if are integers, it seems that this result may be generalized for bounded variation functions simply by replacing the remainder term in the RHS by the (total) variation of (by considering as a difference between two functions with monotone real and imaginary parts – for which the result seems to be easy.)

24 January, 2015 at 9:50 pm

AnonymousI think there’s an extra in front of the big-Oh term in the centered equation before (78).

[Corrected, thanks – T.]7 February, 2015 at 12:07 pm

254A, Notes 5: Bounding exponential sums and the zeta function | What's new[…] the arguments, we will use an analogy between the sums and the integrals (cf. Exercise 11 from Notes 1). This analogy can be made rigorous by the Poisson summation formula (after applying some smoothing […]

13 February, 2015 at 10:16 pm

254A, Notes 6: Large values of Dirichlet polynomials, zero density estimates, and primes in short intervals | What's new[…] from mean value theorems on multiplicative functions (see Proposition 21 of Notes 1), if one is allowed to delete some unit intervals from the range of , with the bound improving as […]

15 February, 2015 at 5:03 pm

254A, Notes 3: The large sieve and the Bombieri-Vinogradov theorem | What's new[…] we have seen in Notes 1, the von Mangoldt function does indeed admit several factorisations into Dirichlet convolution […]

18 February, 2015 at 1:51 pm

254A, Notes 4: Some sieve theory | What's new[…] from Mertens’ third theorem (Theorem 42 from Notes 1) one […]

22 February, 2015 at 9:02 am

254A, Notes 7: Linnik’s theorem on primes in arithmetic progressions | What's new[…] Theorem 69 of Notes 1), we thus […]

24 February, 2015 at 11:29 am

254A, Supplement 6: A cheap version of the theorems of Halasz and Matomaki-Radziwill | What's new[…] adequately estimated by Mertens’ theorem, which can be easily proven by elementary means (see Notes 1); but a satisfactory estimate on the summatory function requires the prime number theorem, which […]

28 February, 2015 at 6:32 am

jtI believe that the polynomial in part (ii) of Exercise 9 should say Q_{k,l}(t) instead of Q_k(t) after the statement of the formula.

[Corrected, thanks – T.]17 March, 2015 at 8:34 pm

apatchShould the big-oh term in Ex11 include ?

[One can add this factor if desired, but it is not necessary since it is controlled by the other two error terms. – T.]30 March, 2015 at 12:50 pm

254A, Notes 8: The Hardy-Littlewood circle method and Vinogradov’s theorem | What's new[…] (Hint: for the latter estimate, use Theorem 27 of Notes 1.) […]

4 April, 2015 at 6:56 am

AnonymousIn regards to exercise 11, is there a good explanation as to why $e^(i n t)$ has zero mean value ($t \neq 2k\pi$), but when one replaces the $n$ with a $\log(n)$, this is no longer the case?

13 April, 2015 at 6:11 am

KaraskasExercise 31: It should be instead of .

[Corrected, thanks – T.]14 April, 2015 at 3:04 am

AnonymousThe summand should be .

[Corrected, thanks – T.]26 August, 2015 at 4:25 pm

Heath-Brown’s theorem on prime twins and Siegel zeroes | What's new[…] , one can show through summation by parts (see Lemma 71 of this previous post) […]

9 February, 2016 at 11:32 pm

Jarek KubenSmall typographic remark: you’re using (Fraktur G) instead of (Fraktur S) for singular series. Same in Notes 2, however you got it right in Supplement 4 and Notes 4. :)

[Corrected, thanks – T.]14 March, 2016 at 4:22 pm

Biases between consecutive primes | What's new[…] One can also establish (2) (with smoothing) using elementary number theory methods (as in this previous post); we sketch the argument as follows. We can factor as a Dirichlet […]

2 July, 2016 at 6:51 am

254A, Supplement 3: The Gamma function and the functional equation (optional) | What's new[…] use a trick previously employed to prove Lemma 40 of Notes 1. By (12) and the dominated convergence theorem, we […]

31 August, 2016 at 12:28 pm

Heuristic computation of correlations of higher order divisor functions | What's new[…] is a certain explicit polynomial of degree with leading coefficient ; see e.g. Exercise 31 of this previous post for a discussion of the case (which is already typical). Similarly if . For more general , there […]

22 September, 2016 at 4:13 am

AnonymousThe 6th line in the proof of Theorem 36: o(1) should be O(1).

[Actually, I think the equation is correct as it stands – T.]23 September, 2016 at 1:53 am

AnonymousDid you implicitly assume that \frac{\log y}{\log x}-1=o(1)?

23 September, 2016 at 2:03 am

AnonymousPlease ignore my previous comment. I forgot to add the first summand in the final computation. Thanks!

8 October, 2016 at 10:17 am

AnonymousIn Remark 49: should be ?

[Corrected, thanks – T.]