between two arithmetic functions and , which to avoid technicalities we will assume to be finitely supported (or that the variable is localised to a finite range, such as ). A key example to keep in mind for the purposes of this set of notes is the twisted von Mangoldt summatory function
that measures the correlation between the primes and a Dirichlet character . One can get a “trivial” bound on such sums from the triangle inequality
as . But the triangle inequality is insensitive to the phase oscillations of the summands, and often we expect (e.g. from the probabilistic heuristics from Supplement 4) to be able to improve upon the trivial triangle inequality bound by a substantial amount; in the best case scenario, one typically expects a “square root cancellation” that gains a factor that is roughly the square root of the number of summands. (For instance, for Dirichlet characters of conductor , it is expected from probabilistic heuristics that the left-hand side of (3) should in fact be for any .)
It has proven surprisingly difficult, however, to establish significant cancellation in many of the sums of interest in analytic number theory, particularly if the sums do not have a strong amount of algebraic structure (e.g. multiplicative structure) which allow for the deployment of specialised techniques (such as multiplicative number theory techniques). In fact, we are forced to rely (to an embarrassingly large extent) on (many variations of) a single basic tool to capture at least some cancellation, namely the Cauchy-Schwarz inequality. In fact, in many cases the classical case
considered by Cauchy, where at least one of is finitely supported, suffices for applications. Roughly speaking, the Cauchy-Schwarz inequality replaces the task of estimating a cross-correlation between two different functions , to that of measuring self-correlations between and itself, or and itself, which are usually easier to compute (albeit at the cost of capturing less cancellation). Note that the Cauchy-Schwarz inequality requires almost no hypotheses on the functions or , making it a very widely applicable tool.
There is however some skill required to decide exactly how to deploy the Cauchy-Schwarz inequality (and in particular, how to select and ); if applied blindly, one loses all cancellation and can even end up with a worse estimate than the trivial bound. For instance, if one tries to bound (2) directly by applying Cauchy-Schwarz with the functions and , one obtains the bound
The right-hand side may be bounded by , but this is worse than the trivial bound (3) by a logarithmic factor. This can be “blamed” on the fact that and are concentrated on rather different sets ( is concentrated on primes, while is more or less uniformly distributed amongst the natural numbers); but even if one corrects for this (e.g. by weighting Cauchy-Schwarz with some suitable “sieve weight” that is more concentrated on primes), one still does not do any better than (3). Indeed, the Cauchy-Schwarz inequality suffers from the same key weakness as the triangle inequality: it is insensitive to the phase oscillation of the factors .
While the Cauchy-Schwarz inequality can be poor at estimating a single correlation such as (1), its power improves when considering an average (or sum, or square sum) of multiple correlations. In this set of notes, we will focus on one such situation of this type, namely that of trying to estimate a square sum
that measures the correlations of a single function with multiple other functions . One should think of the situation in which is a “complicated” function, such as the von Mangoldt function , but the are relatively “simple” functions, such as Dirichlet characters. In the case when the are orthonormal functions, we of course have the classical Bessel inequality:
for all . Then for any function , we have
For sake of comparison, if one were to apply the Cauchy-Schwarz inequality (4) separately to each summand in (5), one would obtain the bound of , which is significantly inferior to the Bessel bound when is large. Geometrically, what is going on is this: the Cauchy-Schwarz inequality (4) is only close to sharp when and are close to parallel in the Hilbert space . But if are orthonormal, then it is not possible for any other vector to be simultaneously close to parallel to too many of these orthonormal vectors, and so the inner products of with most of the should be small. (See this previous blog post for more discussion of this principle.) One can view the Bessel inequality as formalising a repulsion principle: if correlates too much with some of the , then it does not have enough “energy” to have large correlation with the rest of the .
In analytic number theory applications, it is useful to generalise the Bessel inequality to the situation in which the are not necessarily orthonormal. This can be accomplished via the Cauchy-Schwarz inequality:
for some sequence of complex numbers with , with the convention that vanishes whenever both vanish.
Note by relabeling that we may replace the domain here by any other at most countable set, such as the integers . (Indeed, one can give an analogue of this lemma on arbitrary measure spaces, but we will not do so here.) This result first appears in this paper of Boas.
Proof: We use the method of duality to replace the role of the function by a dual sequence . By the converse to Cauchy-Schwarz, we may write the left-hand side of (6) as
for some complex numbers with . Indeed, if all of the vanish, we can set the arbitrarily, otherwise we set to be the unit vector formed by dividing by its length. We can then rearrange this expression as
Applying Cauchy-Schwarz (dividing the first factor by and multiplying the second by , after first removing those for which vanish), this is bounded by
and the claim follows by expanding out the second factor.
Observe that Lemma 1 is a special case of Proposition 2 when and the are orthonormal. In general, one can expect Proposition 2 to be useful when the are almost orthogonal relative to , in that the correlations tend to be small when are distinct. In that case, one can hope for the diagonal term in the right-hand side of (6) to dominate, in which case one can obtain estimates of comparable strength to the classical Bessel inequality. The flexibility to choose different weights in the above proposition has some technical advantages; for instance, if is concentrated in a sparse set (such as the primes), it is sometimes useful to tailor to a comparable set (e.g. the almost primes) in order not to lose too much in the first factor . Also, it can be useful to choose a fairly “smooth” weight , in order to make the weighted correlations small.
Remark 3 In harmonic analysis, the use of tools such as Proposition 2 is known as the method of almost orthogonality, or the method. The explanation for the latter name is as follows. For sake of exposition, suppose that is never zero (or we remove all from the domain for which vanishes). Given a family of finitely supported functions , consider the linear operator defined by the formula
This is a bounded linear operator, and the left-hand side of (6) is nothing other than the norm of . Without any further information on the function other than its norm , the best estimate one can obtain on (6) here is clearly
where denotes the operator norm of .
The adjoint is easily computed to be
The composition of and its adjoint is then given by
From the spectral theorem (or singular value decomposition), one sees that the operator norms of and are related by the identity
and as is a self-adjoint, positive semi-definite operator, the operator norm is also the supremum of the quantity
where ranges over unit vectors in . Putting these facts together, we obtain Proposition 2; furthermore, we see from this analysis that the bound here is essentially optimal if the only information one is allowed to use about is its norm.
For further discussion of almost orthogonality methods from a harmonic analysis perspective, see Chapter VII of this text of Stein.
Exercise 4 Under the same hypotheses as Proposition 2, show that
as well as the variant inequality
Proposition 2 has many applications in analytic number theory; for instance, we will use it in later notes to control the large value of Dirichlet series such as the Riemann zeta function. One of the key benefits is that it largely eliminates the need to consider further correlations of the function (other than its self-correlation relative to , which is usually fairly easy to compute or estimate as is usually chosen to be relatively simple); this is particularly useful if is a function which is significantly more complicated to analyse than the functions . Of course, the tradeoff for this is that one now has to deal with the coefficients , which if anything are even less understood than , since literally the only thing we know about these coefficients is their square sum . However, as long as there is enough almost orthogonality between the , one can estimate the by fairly crude estimates (e.g. triangle inequality or Cauchy-Schwarz) and still get reasonably good estimates.
In this set of notes, we will use Proposition 2 to prove some versions of the large sieve inequality, which controls a square-sum of correlations
of an arbitrary finitely supported function with various additive characters (where ), or alternatively a square-sum of correlations
of with various primitive Dirichlet characters ; it turns out that one can prove a (slightly sub-optimal) version of this inequality quite quickly from Proposition 2 if one first prepares the sum by inserting a smooth cutoff with well-behaved Fourier transform. The large sieve inequality has many applications (as the name suggests, it has particular utility within sieve theory). For the purposes of this set of notes, though, the main application we will need it for is the Bombieri-Vinogradov theorem, which in a very rough sense gives a prime number theorem in arithmetic progressions, which, “on average”, is of strength comparable to the results provided by the Generalised Riemann Hypothesis (GRH), but has the great advantage of being unconditional (it does not require any unproven hypotheses such as GRH); it can be viewed as a significant extension of the Siegel-Walfisz theorem from Notes 2. As we shall see in later notes, the Bombieri-Vinogradov theorem is a very useful ingredient in sieve-theoretic problems involving the primes.
There is however one additional important trick, beyond the large sieve, which we will need in order to establish the Bombieri-Vinogradov theorem. As it turns out, after some basic manipulations (and the deployment of some multiplicative number theory, and specifically the Siegel-Walfisz theorem), the task of proving the Bombieri-Vinogradov theorem is reduced to that of getting a good estimate on sums that are roughly of the form
for some primitive Dirichlet characters . This looks like the type of sum that can be controlled by the large sieve (or by Proposition 2), except that this is an ordinary sum rather than a square sum (i.e., an norm rather than an norm). One could of course try to control such a sum in terms of the associated square-sum through the Cauchy-Schwarz inequality, but this turns out to be very wasteful (it loses a factor of about ). Instead, one should try to exploit the special structure of the von Mangoldt function , in particular the fact that it can be expressible as a Dirichlet convolution of two further arithmetic sequences (or as a finite linear combination of such Dirichlet convolutions). The reason for introducing this convolution structure is through the basic identity
for any finitely supported sequences , as can be easily seen by multiplying everything out and using the completely multiplicative nature of . (This is the multiplicative analogue of the well-known relationship between ordinary convolution and Fourier coefficients.) This factorisation, together with yet another application of the Cauchy-Schwarz inequality, lets one control (7) by square-sums of the sort that can be handled by the large sieve inequality.
As we have seen in Notes 1, the von Mangoldt function does indeed admit several factorisations into Dirichlet convolution type, such as the factorisation . One can try directly inserting this factorisation into the above strategy; it almost works, however there turns out to be a problem when considering the contribution of the portion of or that is supported at very small natural numbers, as the large sieve loses any gain over the trivial bound in such settings. Because of this, there is a need for a more sophisticated decomposition of into Dirichlet convolutions which are non-degenerate in the sense that are supported away from small values. (As a non-example, the trivial factorisation would be a totally inappropriate factorisation for this purpose.) Fortunately, it turns out that through some elementary combinatorial manipulations, some satisfactory decompositions of this type are available, such as the Vaughan identity and the Heath-Brown identity. By using one of these identities we will be able to complete the proof of the Bombieri-Vinogradov theorem. (These identities are also useful for other applications in which one wishes to control correlations between the von Mangoldt function and some other sequence; we will see some examples of this in later notes.)
For further reading on these topics, including a significantly larger number of examples of the large sieve inequality, see Chapters 7 and 17 of Iwaniec and Kowalski.
Remark 5 We caution that the presentation given in this set of notes is highly ahistorical; we are using modern streamlined proofs of results that were first obtained by more complicated arguments.
— 1. The large sieve inequality —
We begin with a (slightly weakened) form of the large sieve inequality for additive characters, also known as the analytic large sieve inequality, first extracted explicitly by Davenport and Halberstam from previous work on the large sieve, and then refined further by many authors (see Remark 7 below).
Proposition 6 (Analytic large sieve inequality) Let be a function supported on an interval for some and , and let be -separated (thus for all and some , where denotes the distance from to the nearest integer. Then
One can view this proposition as a variant of the Plancherel identity
associated to the Fourier transform on a cyclic group . This identity also shows that apart from the implied constant, the bound (9) is essentially best possible.
Proof: By increasing (if ) or decreasing (if ), we can reduce to the case without making the hypotheses any stronger. Thus the are now -separated, and our task is now to show that
We now wish to apply Proposition 2 (with relabeled by ), but we need to choose a suitable weight function . It turns out that it is advantageous for technical reasons to select a weight that has good Fourier-analytic properties.
Fix a smooth non-negative function supported on and not identically zero; we allow implied constants to depend on . (Actually, the smoothness of is not absolutely necessary for this argument; one could take below if desired.) Consider the weight defined by
Observe that for all . Thus, by Proposition 2 with this choice of , we reduce to showing that
whenever are complex numbers with .
We first consider the diagonal contribution . We can write in terms of the Fourier transform of as
and hence by Exercise 28 of Supplement 2, we have the bounds
(say) and thus
Thus the diagonal contribution of (10) is acceptable.
Now we consider an off-diagonal term with . By the Poisson summation formula (Theorem 34 from Supplement 2), we can rewrite this as
Now from the Fourier inversion formula, the function has Fourier transform supported on , and so has Fourier transform supported in . Since is at least from the nearest integer, we conclude that for all integers , and hence all the off-diagonal terms in fact vanish! The claim follows.
a result of Montgomery and Vaughan (with replaced by , although it was observed subsequently subsequently by Paul Cohen that the additional term could be deleted by an amplification trick similar to those discussed in this previous post). See this survey of Montgomery for these results and on the (somewhat complicated) evolution of the large sieve, starting with the pioneering work of Linnik. However, in our applications the cruder form of the analytic large sieve inequality given by Proposition 6 will suffice.
Exercise 8 Let be a function supported on an interval . Show that for any , one has
for all outside of the union of intervals (or arcs) of length . In particular, we have
for all , and the significantly superior estimate
for most (outside of at most (say) intervals of length ).
Exercise 9 (Continuous large sieve inequality) Let be an interval for some and , and let be -separated.
- (i) For any complex numbers , show that
(Hint: replace the restriction of to with the weight used to prove Proposition 6.)
- (ii) For any continuous , show that
Now we establish a variant of the analytic large sieve inequality, involving Dirichlet characters, due to Bombieri and Davenport.
where denotes the sum over all primitive Dirichlet characters of modulus .
Proof: By increasing or decreasing , we may assume that , so and our task is now to show that
We first establish that the diagonal contribution to (13) is acceptable, that is to say that
The function is the principal character of modulus ; it is periodic with period and has mean value . In particular, since , we see that for any interval of length . From (11) and a partition into intervals of length , we see that
and the claim then follows from (14).
Now consider an off-diagonal term with . As are primitive, this implies that is a non-principal character and thus has mean zero. Let be the modulus of this character; by Fourier expansion we may write as a linear combination of the additive characters for . (We can obtain explicit coefficients from this expansion by invoking Lemma 48 of Supplement 3, but we will not need those coefficients here.) Thus is a linear combination of the quantities for . But the modulus is the least common multiple of the moduli of , so in particular , while as observed in the proof of Proposition 6, we have whenever . So the off-diagonal terms all vanish, and the claim follows.
Exercise 11 Let be a finitely supported sequence.
- (i) For any natural number , establish the identity
where is the set of congruence classes coprime to , and is the sum over all characters (not necessarily primitive) of modulus .
- (ii) For any natural number , establish the inequality
that is to say one can delete the implied constant. See this paper of Montgomery and Vaughan for some further refinements of this inequality.
— 2. The Barban-Davenport-Halberstam theorem —
We now apply the large sieve inequality for characters to obtain an analogous inequality for arithmetic progressions, due independently to Barban, and to Davenport and Halberstam; we state a slightly weakened form of that theorem here. For any finitely supported arithmetic function and any primitive residue class , we introduce the discrepancy
This quantity measures the extent to which is well distributed among the primitive residue classes modulo . From multiplicative Fourier inversion (see Theorem 69 from Notes 1) we have the identity
where the sum is over non-principal characters of modulus .
for any , provided that for some sufficiently large depending only on .
Informally, (18) is asserting that
for “most” primitive residue classes with much smaller than ; in most applications, the trivial bounds on are of the type , so this represents a savings of an arbitrary power of a logarithm on the average. Note that a direct application of (17) only gives (18) for of size ; it is the large sieve which allows for the significant enlargement of .
Proof: Let be as above, with for some large to be chosen later. From (15) and the Plancherel identity one has
We cannot apply the large sieve inequality yet, because the characters here are not necessarily primitive. But we may express any non-principal character as for some primitive character of conductor , where are natural numbers with . In particular, and . Thus we may (somewhat crudely) upper bound the left-hand side of (19) by
From Theorem 27 of Notes 1 we have , so we may bound the above by
for any and .
From Proposition 10 and (16), we may bound the left-hand side of (20) by . If and , then we obtain (20) if is sufficiently large depending on . The only remaining case to consider is when . But from the Siegel-Walfisz hypothesis (17) we easily see that
for any and any primitive character of conductor . Since the total number of primitive characters appearing in (20) is , the claim follows by taking large enough.
One can specialise this to the von Mangoldt function:
Exercise 14 Use the Barban-Davenport-Halberstam theorem and the Siegel-Walfisz theorem (Exercise 64 from Notes 2) to conclude that
for any , provided that for some sufficiently large depending only on . Obtain a similar claim with replaced by the Möbius function.
Remark 15 Recall that the implied constants in the Siegel-Walfisz theorem depended on in an ineffective fashion. As such, the implied constants in (21) also depend ineffectively on . However, if one replaces Siegel’s theorem by an effective substitute such as Tatuzawa’s theorem (see Theorem 62 of Notes 2) or the Landau-Page theorem (Theorem 53 of Notes 2), one can obtain an effective version of the Siegel-Walfisz theorem for all moduli that are not multiples of a single exceptional modulus . One can then obtain an effective version of (21) if one restricts to moduli that are not multiples of . Similarly for the Bombieri-Vinogradov theorem in the next section. Such variants of the Barban-Davenport-Halberstam theorem or Bombieri-Vinogradov theorem can be used as a substitute in some applications to remove any ineffective dependence of constants, at the cost of making the argument slightly more convoluted.
— 3. The Bombieri-Vinogradov theorem —
The Barban-Davenport-Halberstam theorem controls the discrepancy after averaging in both the modulus and the residue class . For many problems in sieve theory, it turns out to be more important to control the discrepancy with an averaging only in the modulus , with the residue class being allowed to vary in in the “worst-case” fashion. Specifically, one often wishes to control expressions of the form
for some finitely supported and . This expression is difficult to control for arbitrary , but it turns out that one can obtain a good bound if is expressible as a Dirichlet convolution for some suitably “non-degenerate” sequences . More precisely, we have the following general form of the Bombieri-Vinogradov theorem, first articulated by Motohashi:
for any , provided that and for some sufficiently large depending on .
Proof: We adapt the arguments of the previous section. From (15) and the triangle inequality, we have
and so we can upper bound the left-hand side of (26) by
As in the previous section, we may reduce to primitive characters and bound this expression by
for all and , and any , assuming and with sufficiently large depending on .
We cannot yet easily apply the large sieve inequality, because the character sums here are not squared. But we now crucially exploit the Dirichlet convolution structure using the identity (8), to factor as the product of and . From the Cauchy-Schwarz inequality, we may thus bound (27) by the geometric mean of
and (29) by
and so (27) is bounded by
Since , we can write this as
Since and , we obtain (27) if and is sufficiently large depending on . The only remaining case to handle is when . In this case, we can use the Siegel-Walfisz hypothesis (25) as in the previous section to bound (29) by for any . Meanwhile, from (30), (28) is bounded by . By taking sufficiently large, we conclude (27) in this case also.
In analogy with Exercise 14, we would like to apply this general result to specific arithmetic functions, such as the von Mangoldt function or the Möbius function , and in particular to prove the following famous result of Bombieri and of A. I. Vinogradov (not to be confused with the more well known number theorist I. M. Vinogradov):
for any , provided that for some sufficiently large depending on .
Informally speaking, the Bombieri-Vinogradov theorem asserts that for “almost all” moduli that are significantly less than , one has
for all primitive residue classes to this modulus. This should be compared with the Generalised Riemann Hypothesis (GRH), which gives the bound
for all ; see Exercise 48 of Notes 2. Thus one can view the Bombieri-Vinogradov theorem as an assertion that the GRH holds (with a slightly weaker error term) “on average”, at least insofar as the impact of GRH on the prime number theorem in arithmetic progressions is concerned.
The initial arguments of Bombieri and Vinogradov were somewhat complicated, in particular involving the explicit formula for -functions (Exercise 45 of Notes 2); the modern proof of the Bombieri-Vinogradov theorem avoids this and proceeds instead through Theorem 16 (or a close cousin thereof). Note that this theorem generalises the Siegel-Walfisz theorem (Exercise 64 of Notes 2), which is equivalent to the special case of Theorem 17 when .
The obvious thing to try when proving Theorem 17 using Theorem 16 is to use one of the basic factorisations of such functions into Dirichlet convolutions, e.g. , and then to decompose that convolution into pieces of the form required in Theorem 16; we will refer to such convolutions as Type II convolutions, loosely following the terminology of Vaughan. However, one runs into a problem coming from the components of the factors supported at small numbers (of size ), as the parameters associated to those parameters cannot obey the conditions , . Indeed, observe that any Type II convolution will necessarily vanish at primes of size comparable to , and so one cannot possibly represent functions such as or purely in terms of such Type II convolutions.
However, it turns out that we can still decompose functions such as into two types of convolutions: not just the Type II convolutions considered above, but also a further class of Type I convolutions , in which one of the factors, say , is very slowly varying (or “smooth”) and supported on a very long interval, e.g. for some large ; with these sums is permitted to be concentrated arbitrarily close to , and in particular the Type I convolution can be non-zero on primes comparable to . It turns out that bounding the discrepancy of Type I convolutions is relatively easy, and leads to a proof of Theorem 17.
We turn to the details. There are a number of decompositions of or that one could use to accomplish the desired task. One popular choice of decomposition is the Vaughan identity, which may be compared with the decompositions appearing in the Dirichlet hyperbola method (see Section 3 of Notes 1):
where , , , and .
In this decomposition, and are typically two small powers of (e.g. ), although the exact choice of is often not of critical importance. The terms and are “Type I” convolutions, the terms should be considered as “Type II” convolutions. The term is a lower order error that is usually disposed of quite quickly. The Vaughan identity is already strong enough for many applications, but in some more advanced applications (particularly those in which one exploits the structure of triple or higher convolutions) it becomes convenient to use more sophisticated identities such as the Heath-Brown identity, which we will discuss in later notes.
Proof: Since and , we have
Convolving both sides by and using the identities and , we obtain the claim.
Armed with this identity and Proposition 2 we may now finish off the proof of the Bombieri-Vinogradov theorem. We may assume that is large (depending on ) as the claim is trivial otherwise. We apply the Vaughan identity (32) with (actually for our argument below, any choice of with and would have sufficed). By the triangle inequality, it now suffices to establish (31) with replaced by , , , and .
The term is easily disposed of: from the triangle inequality (and crudely bounding by ) we see that
and the claim follows since and .
Next, we deal with the Type II convolution . The presence of the cutoff is slightly annoying (it prevents one from directly applying Proposition 2), but we will deal with this by using the following finer-than-dyadic decomposition trick, originally due to Fouvry and to Fouvry-Iwaniec. We may replace by , since the portion of on has no contribution. We may similarly replace by . Next, we set , and decompose into components , each of which is supported in an interval and bounded in magnitude by for some . We similarly decompose into components supported in an interval and bounded in magnitude by for some . Thus can be decomposed into terms of the form for various that are components of and respectively.
If then vanishes, so we may assume that . By construction we also have , so in particular if depends only on (recall we are assuming to be large). If , then . The bounds (23), (24) are clear (bounding in magnitude by ), and from the Siegel-Walfisz theorem we see that the components obey the hypothesis (25). Thus by applying Proposition 2 (with replaced by ) we see that the total contribution of all the terms with is acceptable.
It remains to control the total contribution of the terms with . Note that for each there are only choices of that are in this range, so there are only such terms to deal with. We then crudely bound
Since and , one has either or . If , we observe that for each fixed in the above sum, there are choices of that contribute, so the double sum is (using the mean value bounds for ). Thus we see that the total contribution of this case is at most
which is acceptable.
Now we consider the case when . For fixed in the above sum, the sum in can be bounded by , thanks to Corollary 26 below (taking, say, and ). The total contribution of this sum is then
which is also acceptable.
Now let us consider a Type I term . From the triangle inequality have
We exploit the monotonicity of via the following simple fact:
Exercise 19 Let be a monotone function. Show that
for all primitive residue classes . (Hint: use Lemma 2 from Notes 1 and a change of variables.)
Applying this exercise, we see that the contribution of this term to (31) is , which is acceptable since and .
A similar argument for the Type I term term (with and replacing and ) gives a contribution to (31) of
which is also acceptable since and . This concludes the proof of Theorem 17.
Exercise 20 Strengthen the Bombieri-Vinogradov theorem by showing that
for all and , if for some sufficiently large depending on . (Hint: at present ranges over an uncountable number of values, but if one can round to the nearest multiple of (say) then there are only values of in the supremum that need to be considered. Then use the original Bombieri-Vinogradov theorem as a black box.)
Exercise 21 For any , establish the Vaughan-type identity
and use this to show that the Bombieri-Vinogradov theorem continues to hold when is replaced by .
Exercise 22 Let us say that is a level of distribution for the von Mangoldt function if one has
whenever and ; thus, for instance, the Bombieri-Vinogradov theorem implies that every is a level of distribution for . Use the Cramér random model (see Section 1 of Supplement 4) to predict that every is a level of distribution for ; this claim is known as the Elliott-Halberstam conjecture, and would have a number of consequences in sieve theory. Unfortunately, no level of distribution above (or even at ) is currently known, however there are weaker versions of the Elliott-Halberstam conjecture known with levels above which do have some interesting number-theoretic consequences; we will return to this point in later notes. For now, we will just remark that appears to be the limit of what one can do by using the large sieve and Dirichlet character methods in this set of notes, and all the advances beyond have had to rely on other tools (such as exponential sum estimates), although the Cauchy-Schwarz inequality remains an indispensable tool in all of these results.
Exercise 23 Strengthen the Bombieri-Vinogradov theorem by showing that
for all , , and , if for some sufficiently large depending on . (Hint: use the Cauchy-Schwarz inequality and the trivial bound , together with elementary estimates on such sums as .)
Exercise 24 Show that the Bombieri-Vinogradov theorem continues to hold if the von Mangoldt function is replaced by the indicator function of the primes.
— 4. Appendix: the divisor function in arithmetic progressions —
We begin with a lemma of Landreau that controls the divisor function by a short divisor sum.
for all natural numbers .
Proof: We write , where is the product of all the prime factors of that are greater than or equal to , and is the product of all the prime factors of that are less than , counting multiplicity of course. By a greedy algorithm, we can repeatedly pull out factors of of size between and until the remaining portion of falls below , yielding a factorisation of the form where lie between and , is at most , and (if ) is at least . The lower bounds on and imply that . By the trivial inequality we have
and thus by the pigeonhole principle one has
for some . Since is a factor of that is at most , this gives the claim as long as . If , then as each of the are at least , we see that and cannot exceed . If we now use the inequality
and repeat the pigeonholing argument, we again obtain the claim.
One can lower the exponent of the logarithm here to (consistent with the heuristic that the average value of for is ), but this requires additional arguments; see for instance this previous blog post. In contrast, the divisor bound only gives an upper bound of here.
Proof: We allow implied constants to depend on . Set . Using Lemma 25 we have
for all , and so the left-hand side of (33) is bounded by
The conditions , constrain to either the empty set, or an arithmetic progression of modulus , so the inner sum is . On the other hand, from Theorem 27 (or Exercise 29) of Notes 1 we have
and the claim follows.
Exercise 27 Under the same assumptions as in Corollary 26, establish the bound
for any .