A fundamental and recurring problem in analytic number theory is to demonstrate the presence of *cancellation* in an oscillating sum, a typical example of which might be a correlation

between two arithmetic functions and , which to avoid technicalities we will assume to be finitely supported (or that the variable is localised to a finite range, such as ). A key example to keep in mind for the purposes of this set of notes is the twisted von Mangoldt summatory function

that measures the correlation between the primes and a Dirichlet character . One can get a “trivial” bound on such sums from the triangle inequality

for instance, from the triangle inequality and the prime number theorem we have

as . But the triangle inequality is insensitive to the phase oscillations of the summands, and often we expect (e.g. from the probabilistic heuristics from Supplement 4) to be able to improve upon the trivial triangle inequality bound by a substantial amount; in the best case scenario, one typically expects a “square root cancellation” that gains a factor that is roughly the square root of the number of summands. (For instance, for Dirichlet characters of conductor , it is expected from probabilistic heuristics that the left-hand side of (3) should in fact be for any .)

It has proven surprisingly difficult, however, to establish significant cancellation in many of the sums of interest in analytic number theory, particularly if the sums do not have a strong amount of algebraic structure (e.g. multiplicative structure) which allow for the deployment of specialised techniques (such as multiplicative number theory techniques). In fact, we are forced to rely (to an embarrassingly large extent) on (many variations of) a single basic tool to capture at least some cancellation, namely the Cauchy-Schwarz inequality. In fact, in many cases the classical case

considered by Cauchy, where at least one of is finitely supported, suffices for applications. Roughly speaking, the Cauchy-Schwarz inequality replaces the task of estimating a *cross-correlation* between two different functions , to that of measuring *self-correlations* between and itself, or and itself, which are usually easier to compute (albeit at the cost of capturing less cancellation). Note that the Cauchy-Schwarz inequality requires almost no hypotheses on the functions or , making it a very widely applicable tool.

There is however some skill required to decide exactly how to deploy the Cauchy-Schwarz inequality (and in particular, how to select and ); if applied blindly, one loses all cancellation and can even end up with a worse estimate than the trivial bound. For instance, if one tries to bound (2) directly by applying Cauchy-Schwarz with the functions and , one obtains the bound

The right-hand side may be bounded by , but this is worse than the trivial bound (3) by a logarithmic factor. This can be “blamed” on the fact that and are concentrated on rather different sets ( is concentrated on primes, while is more or less uniformly distributed amongst the natural numbers); but even if one corrects for this (e.g. by weighting Cauchy-Schwarz with some suitable “sieve weight” that is more concentrated on primes), one still does not do any better than (3). Indeed, the Cauchy-Schwarz inequality suffers from the same key weakness as the triangle inequality: it is insensitive to the phase oscillation of the factors .

While the Cauchy-Schwarz inequality can be poor at estimating a single correlation such as (1), its power improves when considering an average (or sum, or square sum) of *multiple* correlations. In this set of notes, we will focus on one such situation of this type, namely that of trying to estimate a square sum

that measures the correlations of a single function with multiple other functions . One should think of the situation in which is a “complicated” function, such as the von Mangoldt function , but the are relatively “simple” functions, such as Dirichlet characters. In the case when the are orthonormal functions, we of course have the classical Bessel inequality:

Lemma 1 (Bessel inequality)Let be finitely supported functions obeying the orthonormality relationshipfor all . Then for any function , we have

For sake of comparison, if one were to apply the Cauchy-Schwarz inequality (4) separately to each summand in (5), one would obtain the bound of , which is significantly inferior to the Bessel bound when is large. Geometrically, what is going on is this: the Cauchy-Schwarz inequality (4) is only close to sharp when and are close to parallel in the Hilbert space . But if are orthonormal, then it is not possible for any other vector to be simultaneously close to parallel to too many of these orthonormal vectors, and so the inner products of with most of the should be small. (See this previous blog post for more discussion of this principle.) One can view the Bessel inequality as formalising a repulsion principle: if correlates too much with some of the , then it does not have enough “energy” to have large correlation with the rest of the .

In analytic number theory applications, it is useful to generalise the Bessel inequality to the situation in which the are not necessarily orthonormal. This can be accomplished via the Cauchy-Schwarz inequality:

Proposition 2 (Generalised Bessel inequality)Let be finitely supported functions, and let be a non-negative function. Let be such that vanishes whenever vanishes, we havefor some sequence of complex numbers with , with the convention that vanishes whenever both vanish.

Note by relabeling that we may replace the domain here by any other at most countable set, such as the integers . (Indeed, one can give an analogue of this lemma on arbitrary measure spaces, but we will not do so here.) This result first appears in this paper of Boas.

*Proof:* We use the *method of duality* to replace the role of the function by a dual sequence . By the converse to Cauchy-Schwarz, we may write the left-hand side of (6) as

for some complex numbers with . Indeed, if all of the vanish, we can set the arbitrarily, otherwise we set to be the unit vector formed by dividing by its length. We can then rearrange this expression as

Applying Cauchy-Schwarz (dividing the first factor by and multiplying the second by , after first removing those for which vanish), this is bounded by

and the claim follows by expanding out the second factor.

Observe that Lemma 1 is a special case of Proposition 2 when and the are orthonormal. In general, one can expect Proposition 2 to be useful when the are *almost orthogonal* relative to , in that the correlations tend to be small when are distinct. In that case, one can hope for the diagonal term in the right-hand side of (6) to dominate, in which case one can obtain estimates of comparable strength to the classical Bessel inequality. The flexibility to choose different weights in the above proposition has some technical advantages; for instance, if is concentrated in a sparse set (such as the primes), it is sometimes useful to tailor to a comparable set (e.g. the almost primes) in order not to lose too much in the first factor . Also, it can be useful to choose a fairly “smooth” weight , in order to make the weighted correlations small.

Remark 3In harmonic analysis, the use of tools such as Proposition 2 is known as themethod of almost orthogonality, or themethod. The explanation for the latter name is as follows. For sake of exposition, suppose that is never zero (or we remove all from the domain for which vanishes). Given a family of finitely supported functions , consider the linear operator defined by the formulaThis is a bounded linear operator, and the left-hand side of (6) is nothing other than the norm of . Without any further information on the function other than its norm , the best estimate one can obtain on (6) here is clearly

where denotes the operator norm of .

The adjoint is easily computed to be

The composition of and its adjoint is then given by

From the spectral theorem (or singular value decomposition), one sees that the operator norms of and are related by the identity

and as is a self-adjoint, positive semi-definite operator, the operator norm is also the supremum of the quantity

where ranges over unit vectors in . Putting these facts together, we obtain Proposition 2; furthermore, we see from this analysis that the bound here is essentially optimal if the only information one is allowed to use about is its norm.

For further discussion of almost orthogonality methods from a harmonic analysis perspective, see Chapter VII of this text of Stein.

Exercise 4Under the same hypotheses as Proposition 2, show thatas well as the variant inequality

Proposition 2 has many applications in analytic number theory; for instance, we will use it in later notes to control the large value of Dirichlet series such as the Riemann zeta function. One of the key benefits is that it largely eliminates the need to consider further correlations of the function (other than its self-correlation relative to , which is usually fairly easy to compute or estimate as is usually chosen to be relatively simple); this is particularly useful if is a function which is significantly more complicated to analyse than the functions . Of course, the tradeoff for this is that one now has to deal with the coefficients , which if anything are even less understood than , since literally the only thing we know about these coefficients is their square sum . However, as long as there is enough almost orthogonality between the , one can estimate the by fairly crude estimates (e.g. triangle inequality or Cauchy-Schwarz) and still get reasonably good estimates.

In this set of notes, we will use Proposition 2 to prove some versions of the *large sieve inequality*, which controls a square-sum of correlations

of an arbitrary finitely supported function with various additive characters (where ), or alternatively a square-sum of correlations

of with various primitive Dirichlet characters ; it turns out that one can prove a (slightly sub-optimal) version of this inequality quite quickly from Proposition 2 if one first prepares the sum by inserting a smooth cutoff with well-behaved Fourier transform. The large sieve inequality has many applications (as the name suggests, it has particular utility within *sieve theory*). For the purposes of this set of notes, though, the main application we will need it for is the Bombieri-Vinogradov theorem, which in a very rough sense gives a prime number theorem in arithmetic progressions, which, “on average”, is of strength comparable to the results provided by the Generalised Riemann Hypothesis (GRH), but has the great advantage of being unconditional (it does not require any unproven hypotheses such as GRH); it can be viewed as a significant extension of the Siegel-Walfisz theorem from Notes 2. As we shall see in later notes, the Bombieri-Vinogradov theorem is a very useful ingredient in sieve-theoretic problems involving the primes.

There is however one additional important trick, beyond the large sieve, which we will need in order to establish the Bombieri-Vinogradov theorem. As it turns out, after some basic manipulations (and the deployment of some multiplicative number theory, and specifically the Siegel-Walfisz theorem), the task of proving the Bombieri-Vinogradov theorem is reduced to that of getting a good estimate on sums that are roughly of the form

for some primitive Dirichlet characters . This looks like the type of sum that can be controlled by the large sieve (or by Proposition 2), except that this is an ordinary sum rather than a square sum (i.e., an norm rather than an norm). One could of course try to control such a sum in terms of the associated square-sum through the Cauchy-Schwarz inequality, but this turns out to be very wasteful (it loses a factor of about ). Instead, one should try to exploit the special structure of the von Mangoldt function , in particular the fact that it can be expressible as a Dirichlet convolution of two further arithmetic sequences (or as a finite linear combination of such Dirichlet convolutions). The reason for introducing this convolution structure is through the basic identity

for any finitely supported sequences , as can be easily seen by multiplying everything out and using the completely multiplicative nature of . (This is the multiplicative analogue of the well-known relationship between ordinary convolution and Fourier coefficients.) This factorisation, together with yet another application of the Cauchy-Schwarz inequality, lets one control (7) by square-sums of the sort that can be handled by the large sieve inequality.

As we have seen in Notes 1, the von Mangoldt function does indeed admit several factorisations into Dirichlet convolution type, such as the factorisation . One can try directly inserting this factorisation into the above strategy; it almost works, however there turns out to be a problem when considering the contribution of the portion of or that is supported at very small natural numbers, as the large sieve loses any gain over the trivial bound in such settings. Because of this, there is a need for a more sophisticated decomposition of into Dirichlet convolutions which are non-degenerate in the sense that are supported away from small values. (As a non-example, the trivial factorisation would be a totally inappropriate factorisation for this purpose.) Fortunately, it turns out that through some elementary combinatorial manipulations, some satisfactory decompositions of this type are available, such as the Vaughan identity and the Heath-Brown identity. By using one of these identities we will be able to complete the proof of the Bombieri-Vinogradov theorem. (These identities are also useful for other applications in which one wishes to control correlations between the von Mangoldt function and some other sequence; we will see some examples of this in later notes.)

For further reading on these topics, including a significantly larger number of examples of the large sieve inequality, see Chapters 7 and 17 of Iwaniec and Kowalski.

Remark 5We caution that the presentation given in this set of notes is highly ahistorical; we are using modern streamlined proofs of results that were first obtained by more complicated arguments.

** — 1. The large sieve inequality — **

We begin with a (slightly weakened) form of the large sieve inequality for additive characters, also known as the *analytic large sieve inequality*, first extracted explicitly by Davenport and Halberstam from previous work on the large sieve, and then refined further by many authors (see Remark 7 below).

Proposition 6 (Analytic large sieve inequality)Let be a function supported on an interval for some and , and let be -separated (thus for all and some , where denotes the distance from to the nearest integer. Then

One can view this proposition as a variant of the Plancherel identity

associated to the Fourier transform on a cyclic group . This identity also shows that apart from the implied constant, the bound (9) is essentially best possible.

*Proof:* By increasing (if ) or decreasing (if ), we can reduce to the case without making the hypotheses any stronger. Thus the are now -separated, and our task is now to show that

We now wish to apply Proposition 2 (with relabeled by ), but we need to choose a suitable weight function . It turns out that it is advantageous for technical reasons to select a weight that has good Fourier-analytic properties.

Fix a smooth non-negative function supported on and not identically zero; we allow implied constants to depend on . (Actually, the smoothness of is not absolutely necessary for this argument; one could take below if desired.) Consider the weight defined by

Observe that for all . Thus, by Proposition 2 with this choice of , we reduce to showing that

whenever are complex numbers with .

We first consider the diagonal contribution . We can write in terms of the Fourier transform of as

and hence by Exercise 28 of Supplement 2, we have the bounds

Thus the diagonal contribution of (10) is acceptable.

Now we consider an off-diagonal term with . By the Poisson summation formula (Theorem 34 from Supplement 2), we can rewrite this as

Now from the Fourier inversion formula, the function has Fourier transform supported on , and so has Fourier transform supported in . Since is at least from the nearest integer, we conclude that for all integers , and hence all the off-diagonal terms in fact vanish! The claim follows.

Remark 7If are integers, one can in fact obtain the sharper bounda result of Montgomery and Vaughan (with replaced by , although it was observed subsequently subsequently by Paul Cohen that the additional term could be deleted by an amplification trick similar to those discussed in this previous post). See this survey of Montgomery for these results and on the (somewhat complicated) evolution of the large sieve, starting with the pioneering work of Linnik. However, in our applications the cruder form of the analytic large sieve inequality given by Proposition 6 will suffice.

Exercise 8Let be a function supported on an interval . Show that for any , one hasfor all outside of the union of intervals (or arcs) of length . In particular, we have

for

all, and the significantly superior estimatefor

most(outside of at most (say) intervals of length ).

Exercise 9 (Continuous large sieve inequality)Let be an interval for some and , and let be -separated.

- (i) For any complex numbers , show that
(

Hint:replace the restriction of to with the weight used to prove Proposition 6.)- (ii) For any continuous , show that

Now we establish a variant of the analytic large sieve inequality, involving Dirichlet characters, due to Bombieri and Davenport.

Proposition 10 (Large sieve inequality for characters)Let be a function supported on an interval for some and , and let . Thenwhere denotes the sum over all primitive Dirichlet characters of modulus .

*Proof:* By increasing or decreasing , we may assume that , so and our task is now to show that

Let be the weight from Proposition 6. By Proposition 2 (and some relabeling), using the functions , it suffices to show that

whenever are complex numbers with

We first establish that the diagonal contribution to (13) is acceptable, that is to say that

The function is the principal character of modulus ; it is periodic with period and has mean value . In particular, since , we see that for any interval of length . From (11) and a partition into intervals of length , we see that

and the claim then follows from (14).

Now consider an off-diagonal term with . As are primitive, this implies that is a non-principal character and thus has mean zero. Let be the modulus of this character; by Fourier expansion we may write as a linear combination of the additive characters for . (We can obtain explicit coefficients from this expansion by invoking Lemma 48 of Supplement 3, but we will not need those coefficients here.) Thus is a linear combination of the quantities for . But the modulus is the least common multiple of the moduli of , so in particular , while as observed in the proof of Proposition 6, we have whenever . So the off-diagonal terms all vanish, and the claim follows.

One can also derive Proposition 10 from Proposition 6:

Exercise 11Let be a finitely supported sequence.

- (i) For any natural number , establish the identity
where is the set of congruence classes coprime to , and is the sum over all characters (not necessarily primitive) of modulus .

- (ii) For any natural number , establish the inequality
and use this and Proposition 6 to derive Proposition 10. (

Hint:use Lemma 48 from Supplement 3.)

Remark 12By combining the arguments in the above exercise with the results in Remark 7, one can sharpen Proposition 10 tothat is to say one can delete the implied constant. See this paper of Montgomery and Vaughan for some further refinements of this inequality.

** — 2. The Barban-Davenport-Halberstam theorem — **

We now apply the large sieve inequality for characters to obtain an analogous inequality for arithmetic progressions, due independently to Barban, and to Davenport and Halberstam; we state a slightly weakened form of that theorem here. For any finitely supported arithmetic function and any primitive residue class , we introduce the *discrepancy*

This quantity measures the extent to which is well distributed among the primitive residue classes modulo . From multiplicative Fourier inversion (see Theorem 69 from Notes 1) we have the identity

where the sum is over non-principal characters of modulus .

Theorem 13 (Barban-Davenport-Halberstam)Let , and let be a function supported on with the property thatand obeying the

Siegel-Walfisz propertyfor any fixed , any primitive residue class , and any . Then one has

for any , provided that for some sufficiently large depending only on .

Informally, (18) is asserting that

for “most” primitive residue classes with much smaller than ; in most applications, the trivial bounds on are of the type , so this represents a savings of an arbitrary power of a logarithm on the average. Note that a direct application of (17) only gives (18) for of size ; it is the large sieve which allows for the significant enlargement of .

*Proof:* Let be as above, with for some large to be chosen later. From (15) and the Plancherel identity one has

We cannot apply the large sieve inequality yet, because the characters here are not necessarily primitive. But we may express any non-principal character as for some primitive character of conductor , where are natural numbers with . In particular, and . Thus we may (somewhat crudely) upper bound the left-hand side of (19) by

From Theorem 27 of Notes 1 we have , so we may bound the above by

By dyadic decomposition (and adjusting slightly), it thus suffices to show that

From Proposition 10 and (16), we may bound the left-hand side of (20) by . If and , then we obtain (20) if is sufficiently large depending on . The only remaining case to consider is when . But from the Siegel-Walfisz hypothesis (17) we easily see that

for any and any primitive character of conductor . Since the total number of primitive characters appearing in (20) is , the claim follows by taking large enough.

One can specialise this to the von Mangoldt function:

Exercise 14Use the Barban-Davenport-Halberstam theorem and the Siegel-Walfisz theorem (Exercise 64 from Notes 2) to conclude thatfor any , provided that for some sufficiently large depending only on . Obtain a similar claim with replaced by the Möbius function.

Remark 15Recall that the implied constants in the Siegel-Walfisz theorem depended on in an ineffective fashion. As such, the implied constants in (21) also depend ineffectively on . However, if one replaces Siegel’s theorem by an effective substitute such as Tatuzawa’s theorem (see Theorem 62 of Notes 2) or the Landau-Page theorem (Theorem 53 of Notes 2), one can obtain an effective version of the Siegel-Walfisz theorem for all moduli that are not multiples of a single exceptional modulus . One can then obtain an effective version of (21) if one restricts to moduli that are not multiples of . Similarly for the Bombieri-Vinogradov theorem in the next section. Such variants of the Barban-Davenport-Halberstam theorem or Bombieri-Vinogradov theorem can be used as a substitute in some applications to remove any ineffective dependence of constants, at the cost of making the argument slightly more convoluted.

** — 3. The Bombieri-Vinogradov theorem — **

The Barban-Davenport-Halberstam theorem controls the discrepancy after averaging in both the modulus and the residue class . For many problems in sieve theory, it turns out to be more important to control the discrepancy with an averaging only in the modulus , with the residue class being allowed to vary in in the “worst-case” fashion. Specifically, one often wishes to control expressions of the form

for some finitely supported and . This expression is difficult to control for arbitrary , but it turns out that one can obtain a good bound if is expressible as a Dirichlet convolution for some suitably “non-degenerate” sequences . More precisely, we have the following general form of the Bombieri-Vinogradov theorem, first articulated by Motohashi:

Theorem 16 (General Bombieri-Vinogradov theorem)Let , let be such that , and let be arithmetic functions supported on , respectively, withSuppose that obeys the

Siegel-Walfisz propertyfor all , all primitive residue classes , and all . Then one has

for any , provided that and for some sufficiently large depending on .

*Proof:* We adapt the arguments of the previous section. From (15) and the triangle inequality, we have

and so we can upper bound the left-hand side of (26) by

As in the previous section, we may reduce to primitive characters and bound this expression by

By dyadic decomposition (and adjusting slightly), it thus suffices to show that

for all and , and any , assuming and with sufficiently large depending on .

We cannot yet easily apply the large sieve inequality, because the character sums here are not squared. But we now crucially exploit the Dirichlet convolution structure using the identity (8), to factor as the product of and . From the Cauchy-Schwarz inequality, we may thus bound (27) by the geometric mean of

Now we have the all-important square needed in the large sieve inequality. From (23), (24) and Proposition 2, we may bound (28) by

and (29) by

and so (27) is bounded by

Since , we can write this as

Since and , we obtain (27) if and is sufficiently large depending on . The only remaining case to handle is when . In this case, we can use the Siegel-Walfisz hypothesis (25) as in the previous section to bound (29) by for any . Meanwhile, from (30), (28) is bounded by . By taking sufficiently large, we conclude (27) in this case also.

In analogy with Exercise 14, we would like to apply this general result to specific arithmetic functions, such as the von Mangoldt function or the Möbius function , and in particular to prove the following famous result of Bombieri and of A. I. Vinogradov (not to be confused with the more well known number theorist I. M. Vinogradov):

Theorem 17 (Bombieri-Vinogradov theorem)Let . Then one hasfor any , provided that for some sufficiently large depending on .

Informally speaking, the Bombieri-Vinogradov theorem asserts that for “almost all” moduli that are significantly less than , one has

for *all* primitive residue classes to this modulus. This should be compared with the Generalised Riemann Hypothesis (GRH), which gives the bound

for *all* ; see Exercise 48 of Notes 2. Thus one can view the Bombieri-Vinogradov theorem as an assertion that the GRH holds (with a slightly weaker error term) “on average”, at least insofar as the impact of GRH on the prime number theorem in arithmetic progressions is concerned.

The initial arguments of Bombieri and Vinogradov were somewhat complicated, in particular involving the explicit formula for -functions (Exercise 45 of Notes 2); the modern proof of the Bombieri-Vinogradov theorem avoids this and proceeds instead through Theorem 16 (or a close cousin thereof). Note that this theorem generalises the Siegel-Walfisz theorem (Exercise 64 of Notes 2), which is equivalent to the special case of Theorem 17 when .

The obvious thing to try when proving Theorem 17 using Theorem 16 is to use one of the basic factorisations of such functions into Dirichlet convolutions, e.g. , and then to decompose that convolution into pieces of the form required in Theorem 16; we will refer to such convolutions as *Type II convolutions*, loosely following the terminology of Vaughan. However, one runs into a problem coming from the components of the factors supported at small numbers (of size ), as the parameters associated to those parameters cannot obey the conditions , . Indeed, observe that any Type II convolution will necessarily vanish at primes of size comparable to , and so one cannot possibly represent functions such as or purely in terms of such Type II convolutions.

However, it turns out that we can still decompose functions such as into *two* types of convolutions: not just the Type II convolutions considered above, but also a further class of *Type I convolutions* , in which one of the factors, say , is very slowly varying (or “smooth”) and supported on a very long interval, e.g. for some large ; with these sums is permitted to be concentrated arbitrarily close to , and in particular the Type I convolution can be non-zero on primes comparable to . It turns out that bounding the discrepancy of Type I convolutions is relatively easy, and leads to a proof of Theorem 17.

We turn to the details. There are a number of decompositions of or that one could use to accomplish the desired task. One popular choice of decomposition is the Vaughan identity, which may be compared with the decompositions appearing in the Dirichlet hyperbola method (see Section 3 of Notes 1):

Lemma 18 (Vaughan identity)For any , one has

In this decomposition, and are typically two small powers of (e.g. ), although the exact choice of is often not of critical importance. The terms and are “Type I” convolutions, the terms should be considered as “Type II” convolutions. The term is a lower order error that is usually disposed of quite quickly. The Vaughan identity is already strong enough for many applications, but in some more advanced applications (particularly those in which one exploits the structure of triple or higher convolutions) it becomes convenient to use more sophisticated identities such as the *Heath-Brown identity*, which we will discuss in later notes.

*Proof:* Since and , we have

Convolving both sides by and using the identities and , we obtain the claim.

Armed with this identity and Proposition 2 we may now finish off the proof of the Bombieri-Vinogradov theorem. We may assume that is large (depending on ) as the claim is trivial otherwise. We apply the Vaughan identity (32) with (actually for our argument below, any choice of with and would have sufficed). By the triangle inequality, it now suffices to establish (31) with replaced by , , , and .

The term is easily disposed of: from the triangle inequality (and crudely bounding by ) we see that

and the claim follows since and .

Next, we deal with the Type II convolution . The presence of the cutoff is slightly annoying (it prevents one from directly applying Proposition 2), but we will deal with this by using the following finer-than-dyadic decomposition trick, originally due to Fouvry and to Fouvry-Iwaniec. We may replace by , since the portion of on has no contribution. We may similarly replace by . Next, we set , and decompose into components , each of which is supported in an interval and bounded in magnitude by for some . We similarly decompose into components supported in an interval and bounded in magnitude by for some . Thus can be decomposed into terms of the form for various that are components of and respectively.

If then vanishes, so we may assume that . By construction we also have , so in particular if depends only on (recall we are assuming to be large). If , then . The bounds (23), (24) are clear (bounding in magnitude by ), and from the Siegel-Walfisz theorem we see that the components obey the hypothesis (25). Thus by applying Proposition 2 (with replaced by ) we see that the total contribution of all the terms with is acceptable.

It remains to control the total contribution of the terms with . Note that for each there are only choices of that are in this range, so there are only such terms to deal with. We then crudely bound

Since and , one has either or . If , we observe that for each fixed in the above sum, there are choices of that contribute, so the double sum is (using the mean value bounds for ). Thus we see that the total contribution of this case is at most

which is acceptable.

Now we consider the case when . For fixed in the above sum, the sum in can be bounded by , thanks to Corollary 26 below (taking, say, and ). The total contribution of this sum is then

which is also acceptable.

Now let us consider a Type I term . From the triangle inequality have

We exploit the monotonicity of via the following simple fact:

Exercise 19Let be a monotone function. Show thatfor all primitive residue classes . (

Hint:use Lemma 2 from Notes 1 and a change of variables.)

Applying this exercise, we see that the contribution of this term to (31) is , which is acceptable since and .

A similar argument for the Type I term term (with and replacing and ) gives a contribution to (31) of

which is also acceptable since and . This concludes the proof of Theorem 17.

Exercise 20Strengthen the Bombieri-Vinogradov theorem by showing thatfor all and , if for some sufficiently large depending on . (

Hint:at present ranges over an uncountable number of values, but if one can round to the nearest multiple of (say) then there are only values of in the supremum that need to be considered. Then use the original Bombieri-Vinogradov theorem as a black box.)

Exercise 21For any , establish the Vaughan-type identityand use this to show that the Bombieri-Vinogradov theorem continues to hold when is replaced by .

Exercise 22Let us say that is alevel of distributionfor the von Mangoldt function if one haswhenever and ; thus, for instance, the Bombieri-Vinogradov theorem implies that every is a level of distribution for . Use the Cramér random model (see Section 1 of Supplement 4) to predict that every is a level of distribution for ; this claim is known as the Elliott-Halberstam conjecture, and would have a number of consequences in sieve theory. Unfortunately, no level of distribution above (or even at ) is currently known, however there are weaker versions of the Elliott-Halberstam conjecture known with levels above which do have some interesting number-theoretic consequences; we will return to this point in later notes. For now, we will just remark that appears to be the limit of what one can do by using the large sieve and Dirichlet character methods in this set of notes, and all the advances beyond have had to rely on other tools (such as exponential sum estimates), although the Cauchy-Schwarz inequality remains an indispensable tool in all of these results.

Exercise 23Strengthen the Bombieri-Vinogradov theorem by showing thatfor all , , and , if for some sufficiently large depending on . (

Hint:use the Cauchy-Schwarz inequality and the trivial bound , together with elementary estimates on such sums as .)

Exercise 24Show that the Bombieri-Vinogradov theorem continues to hold if the von Mangoldt function is replaced by the indicator function of the primes.

** — 4. Appendix: the divisor function in arithmetic progressions — **

We begin with a lemma of Landreau that controls the divisor function by a short divisor sum.

for all natural numbers .

*Proof:* We write , where is the product of all the prime factors of that are greater than or equal to , and is the product of all the prime factors of that are less than , counting multiplicity of course. By a greedy algorithm, we can repeatedly pull out factors of of size between and until the remaining portion of falls below , yielding a factorisation of the form where lie between and , is at most , and (if ) is at least . The lower bounds on and imply that . By the trivial inequality we have

and thus by the pigeonhole principle one has

for some . Since is a factor of that is at most , this gives the claim as long as . If , then as each of the are at least , we see that and cannot exceed . If we now use the inequality

and repeat the pigeonholing argument, we again obtain the claim.

Corollary 26Let , let , let , and let be a primitive residue class with . Then

One can lower the exponent of the logarithm here to (consistent with the heuristic that the average value of for is ), but this requires additional arguments; see for instance this previous blog post. In contrast, the divisor bound only gives an upper bound of here.

*Proof:* We allow implied constants to depend on . Set . Using Lemma 25 we have

for all , and so the left-hand side of (33) is bounded by

The conditions , constrain to either the empty set, or an arithmetic progression of modulus , so the inner sum is . On the other hand, from Theorem 27 (or Exercise 29) of Notes 1 we have

and the claim follows.

Exercise 27Under the same assumptions as in Corollary 26, establish the boundfor any .

## 24 comments

Comments feed for this article

10 January, 2015 at 3:39 pm

AnonymousThese methodologies are too complicated to reach the problems for which they are devised. It detracts attention from the fundamental underlying structure of the integers and primes.

15 January, 2015 at 3:12 am

Sayan ChattopadhyaySir can u please tell me that has the Riemann Hypothesis been proven by someone.

16 January, 2015 at 2:42 am

AnonymousIt seems that in order to apply proposition 2, the almost orthogonality condition on the functions is quite restrictive (e.g. to be exponentials or characters – for which the non-diagonal contribution can be efficiently estimated.)

16 January, 2015 at 9:40 am

Terence TaoYes, the in practice have to be somewhat structured, but this still gives quite a wide range of applicability since remains completely arbitrary. For instance, if one takes then one will be able to bound the number of places where Dirichlet series (with arbitrary coefficients) are large, which, among other things, can be used to establish zero density estimates for the zeta function; similarly if one takes then one can obtain zero density estimates for Dirichlet L-functions. (This will be discussed in later notes.) There are also more sophisticated structured functions than exponential phases or Dirichlet characters that one can compute correlations of using more advanced methods; for instance, “trace weights” that arise from traces of Frobenius on a suitable sheaf can be controlled using Deligne’s work on the Riemann hypothesis for varieties and sheaves, which allows one to take to be an exponential sum such as a Kloosterman sum. (Automorphic forms and homogeneous dynamics are two other sources of non-trivial cancellation that can be exploited in some number-theoretic situations.)

One can also interpret Linnik’s dispersion method as a type of application of Proposition 2 in which the functions are not so completely structured that their correlations can be read off right away, but their correlations are at least simpler to estimate than the original correlations , and one may need to perform some further application of Cauchy-Schwarz or similar tools to get enough control on the former correlations to arrive at control of the latter. (The proof of the equidistribution estimates Zhang used to establish bounded gaps between primes is a good example of this sort of technique.)

16 January, 2015 at 2:49 pm

Kyle PrattI am slightly confused about the treatment of the Type II convolution in the proof of Bombieri-Vinogradov. From Vaughan’s identity we have , but you wrote . Is this just a typo, or is there an easy way to deal with the ?

[Oops, this was indeed a typo; I’ve folded the 1 into the Mobius factor now. -T.]16 January, 2015 at 4:44 pm

Mike_MillerDear Professor Tao. Forgive me but this doesn’t make sense.

17 January, 2015 at 3:05 am

AnonymousIs it possible to increase the exponent in theorem 17 by taking the supremun in (31) over “typical ” or “almost every ” in some sense?

[Yes; see Theorem 13. -T.]20 January, 2015 at 3:19 pm

Kyle PrattI think there is an error in the equation after “We then crudely bound…” The second seems to rely on the assumption that . This is not true since (as is pointed out in the paragraph before the problematic equation) , so really is bounded in magnitude by . I haven’t checked carefully, but this probably just adds some extra logs to the final result, and these will be harmless.

[Corrected, thanks – T.]21 January, 2015 at 10:50 am

MrCactu5 (@MonsieurCactus)I am struggling to follow your FAQ on the primes, but I notice you don’t say much about van Mangoldt until the very very end.

It seems fair to say Bombieri-Vinogradov is a step towards the Generalized Riemann hypothesis, giving very detailed information about the distribution of primes on arithmetic sequences.

13 February, 2015 at 10:16 pm

254A, Notes 6: Large values of Dirichlet polynomials, zero density estimates, and primes in short intervals | What's new[…] for Dirichlet series. Our approach to these theorems will follow the same sort of methods used in Notes 3, in particular relying on the generalised Bessel inequality from those […]

18 February, 2015 at 1:51 pm

254A, Notes 4: Some sieve theory | What's new[…] von Mangoldt function ), then estimates such as the Bombieri-Vinogradov theorem, Theorem 17 from Notes 3, turn out to be particularly useful in this regard; in other contexts, the required […]

22 February, 2015 at 9:02 am

254A, Notes 7: Linnik’s theorem on primes in arithmetic progressions | What's new[…] use the generalised Bessel inequality (Proposition 2 from Notes 3) with to conclude […]

30 March, 2015 at 12:50 pm

254A, Notes 8: The Hardy-Littlewood circle method and Vinogradov’s theorem | What's new[…] that in the proof of the Bombieri-Vinogradov theorem (see Notes 3), sums such as or were handled by using combinatorial identities such as Vaughan’s identity […]

10 April, 2015 at 10:02 am

254A, Supplement 5: The linear sieve and Chen’s theorem (optional) | What's new[…] is the multiplicative function with and for . From the Bombieri-Vinogradov theorem (Theorem 17 of Notes 3) we […]

17 November, 2015 at 8:42 am

Klaus Roth | What's new[…] the clean and general almost orthogonality principle that we have today (discussed for instance in these lecture notes of mine). The paper of Roth that had the most impact on my own personal work was his three-page paper […]

4 January, 2019 at 10:24 am

Spacing of Quadratic Fractions | George Shakan[…] is the Von Mongoldt function. They use Vaughan’s identity on to decompose (3) into Type I and Type II sums for the […]

20 June, 2019 at 10:28 am

Maths studentDear Prof. Tao,

in the proof of the large sieve inequality, you talk about “increasing N or decreasing Q”. That’s a little informal, and this kind of language continues to itch underneath my fingernails.

I should be most pleased if this could be changed.

By the way: Will these notes be transformed into a book? It would be much appreciated!

20 June, 2019 at 11:36 am

AnonymousThat is barely informal. If *that* bothers you so much, you shouldn’t be reading these notes.

12 November, 2019 at 6:47 pm

254A, Notes 9 – second moment and entropy methods | What's new[…] To get a feel for this inequality, suppose for sake of discussion that and all have unit variance and , but that the are pairwise uncorrelated. Then the right-hand side is equal to , and the left-hand side is the sum of squares of the correlations between and each of the . Any individual correlation is then still permitted to be as large as , but it is not possible for multiple correlations to be this large simultaneously. This is geometrically intuitive if one views the random variables as vectors in a Hilbert space (and correlation as a rough proxy for the angle between such vectors). This lemma also shares many commonalities with the large sieve inequality, discussed in this set of notes. […]

12 November, 2019 at 6:47 pm

254A, Notes 9 – second moment and entropy methods | What's new[…] To get a feel for this inequality, suppose for sake of discussion that and all have unit variance and , but that the are pairwise uncorrelated. Then the right-hand side is equal to , and the left-hand side is the sum of squares of the correlations between and each of the . Any individual correlation is then still permitted to be as large as , but it is not possible for multiple correlations to be this large simultaneously. This is geometrically intuitive if one views the random variables as vectors in a Hilbert space (and correlation as a rough proxy for the angle between such vectors). This lemma also shares many commonalities with the large sieve inequality, discussed in this set of notes. […]

4 November, 2020 at 10:34 am

NoamI think that the bound on the LHS in the first inequality of exercise 4 is not optimal, if I’m not wrong using cauchy’s inequality term wise yields a better result (leaving only diagonal terms terms j=j’).

[This is incorrect. Note there is no summation in the first factor on the RHS. -T]20 March, 2022 at 1:46 pm

Quân NguyễnDear professor Tao,

I did try an attempt of proving Analytic large sieve inequality by myself and found out that it seems that the choice for or a constant would satisfy all the conditions in the Generalized Bessel’s inequality though the attempt is only valid for by using .

20 March, 2022 at 3:10 pm

Quân NguyễnI also do have one more inquiry in the proof of BDH theorem, so the same as in displayed proof, we will start at:

and indeed, one needs to show that:

for all . Due to the following decomposition, the right handside is actually less than:

where . Then since the function is supported on , and so by Large sieve inequality for characters, one obtains:

which holds for all where depends on . My apology if it's an obvious error but the attempt does not show the use of the Siegel-Walfisz property so there is something wrong obviously and also for the equations which are not shown perfectly.

20 March, 2022 at 3:45 pm

Terence TaoThere are simple counterexamples to the Bombieri-Davenport-Halberstam inequality when the Siegel-Walfisz property is omitted, for instance if one takes to be the indicator function of the congruence class then the Siegel-Walfisz property is highly false at , and the conclusion (18) also fails as can be seen by inspecting the contribution. I would recommend as an instructive exercise to trace through your proposed argument with this example (trying to calculate the left and right hand sides of every expression you are using to the best of your ability) to locate the problem.