
A fundamental and recurring problem in analytic number theory is to demonstrate the presence of *cancellation* in an oscillating sum, a typical example of which might be a correlation

$$\sum_n f(n) \overline{g(n)} \qquad (1)$$

between two arithmetic functions $f$ and $g$, which to avoid technicalities we will assume to be finitely supported (or that the variable $n$ is localised to a finite range, such as $\{ n : n \leq x \}$). A key example to keep in mind for the purposes of this set of notes is the twisted von Mangoldt summatory function

$$\sum_{n \leq x} \Lambda(n) \overline{\chi(n)} \qquad (2)$$

that measures the correlation between the primes and a Dirichlet character $\chi$. One can get a "trivial" bound on such sums from the triangle inequality

$$\left|\sum_n f(n) \overline{g(n)}\right| \leq \sum_n |f(n)| |g(n)|;$$

for instance, from the triangle inequality and the prime number theorem we have

$$\left|\sum_{n \leq x} \Lambda(n) \overline{\chi(n)}\right| \leq \sum_{n \leq x} \Lambda(n) = (1+o(1)) x \qquad (3)$$

as $x \to \infty$. But the triangle inequality is insensitive to the phase oscillations of the summands, and often we expect (e.g. from the probabilistic heuristics of Supplement 4) to be able to improve upon the trivial triangle inequality bound by a substantial amount; in the best-case scenario, one typically expects a "square root cancellation" that gains a factor that is roughly the square root of the number of summands. (For instance, for Dirichlet characters $\chi$ of conductor $O(x^{O(1)})$, it is expected from probabilistic heuristics that the left-hand side of (3) should in fact be $O_\varepsilon(x^{1/2+\varepsilon})$ for any fixed $\varepsilon > 0$.)
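As a quick numerical illustration of the square root cancellation heuristic (a sketch only; the non-principal character mod 4 and the cutoff $x = 10^5$ are choices made here for concreteness, not taken from the notes), one can compare the twisted sum, the trivial bound, and $\sqrt{x}$:

```python
import math

def von_mangoldt_table(N):
    """Return a table of Lambda(0..N) via a simple prime-power sieve."""
    lam = [0.0] * (N + 1)
    sieve = [True] * (N + 1)
    for p in range(2, N + 1):
        if sieve[p]:
            for m in range(2 * p, N + 1, p):
                sieve[m] = False
            pk = p
            while pk <= N:
                lam[pk] = math.log(p)  # Lambda(p^k) = log p
                pk *= p
    return lam

def chi4(n):
    """The non-principal Dirichlet character mod 4 (illustrative choice)."""
    return {0: 0, 1: 1, 2: 0, 3: -1}[n % 4]

x = 10**5
lam = von_mangoldt_table(x)
twisted = sum(lam[n] * chi4(n) for n in range(1, x + 1))
trivial = sum(lam)  # sum_{n <= x} Lambda(n) = (1 + o(1)) x by the PNT
print(abs(twisted), trivial, math.sqrt(x))
```

At this range the twisted sum is far below the trivial bound, consistent with the square root prediction.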

It has proven surprisingly difficult, however, to establish significant cancellation in many of the sums of interest in analytic number theory, particularly if the sums do not have a strong amount of algebraic structure (e.g. multiplicative structure) that allows for the deployment of specialised techniques (such as those of multiplicative number theory). In fact, we are forced to rely, to an embarrassingly large extent, on many variations of a single basic tool to capture at least some cancellation, namely the Cauchy-Schwarz inequality. Indeed, in many cases the classical case

$$\left|\sum_n f(n) \overline{g(n)}\right| \leq \left(\sum_n |f(n)|^2\right)^{1/2} \left(\sum_n |g(n)|^2\right)^{1/2}, \qquad (4)$$

considered by Cauchy, where at least one of $f, g$ is finitely supported, suffices for applications. Roughly speaking, the Cauchy-Schwarz inequality replaces the task of estimating a *cross-correlation* between two different functions $f$ and $g$ with that of measuring *self-correlations* of $f$ with itself, or of $g$ with itself, which are usually easier to compute (albeit at the cost of capturing less cancellation). Note that the Cauchy-Schwarz inequality requires almost no hypotheses on the functions $f$ or $g$, making it a very widely applicable tool.

There is however some skill required to decide exactly how to deploy the Cauchy-Schwarz inequality (and in particular, how to select the functions $f$ and $g$); if applied blindly, one loses all cancellation and can even end up with a worse estimate than the trivial bound. For instance, if one tries to bound (2) directly by applying Cauchy-Schwarz with the functions $\Lambda$ and $n \mapsto \chi(n) 1_{n \leq x}$, one obtains the bound

$$\left|\sum_{n \leq x} \Lambda(n) \overline{\chi(n)}\right| \leq \left(\sum_{n \leq x} \Lambda(n)^2\right)^{1/2} \left(\sum_{n \leq x} |\chi(n)|^2\right)^{1/2}.$$

The right-hand side may be bounded by $\ll x \log^{1/2} x$, but this is worse than the trivial bound (3) by a factor of $\log^{1/2} x$. This can be "blamed" on the fact that $\Lambda$ and $\chi$ are concentrated on rather different sets ($\Lambda$ is concentrated on primes, while $\chi$ is more or less uniformly distributed amongst the natural numbers); but even if one corrects for this (e.g. by weighting Cauchy-Schwarz with some suitable "sieve weight" that is more concentrated on primes), one still does not do any better than (3). Indeed, the Cauchy-Schwarz inequality suffers from the same key weakness as the triangle inequality: it is insensitive to the phase oscillation of the factors $\chi(n)$.
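The loss from a blind application can be seen numerically; in this sketch (again with the illustrative character mod 4 and the cutoff $x = 10^5$, both assumptions for concreteness) the Cauchy-Schwarz right-hand side visibly exceeds the trivial bound:

```python
import math

x = 10**5
lam = [0.0] * (x + 1)
sieve = [True] * (x + 1)
for p in range(2, x + 1):
    if sieve[p]:
        for m in range(2 * p, x + 1, p):
            sieve[m] = False
        pk = p
        while pk <= x:
            lam[pk] = math.log(p)  # Lambda(p^k) = log p
            pk *= p

def chi4(n):
    """Non-principal character mod 4 (illustrative choice)."""
    return {0: 0, 1: 1, 2: 0, 3: -1}[n % 4]

trivial = sum(lam)  # the trivial bound sum_{n<=x} Lambda(n) ~ x
cs = (math.sqrt(sum(v * v for v in lam))
      * math.sqrt(sum(chi4(n) ** 2 for n in range(1, x + 1))))
print(trivial, cs, cs / trivial)  # the blind Cauchy-Schwarz bound is worse
```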

While the Cauchy-Schwarz inequality can be poor at estimating a single correlation such as (1), its power improves when considering an average (or sum, or square sum) of *multiple* correlations. In this set of notes, we will focus on one such situation, namely that of trying to estimate a square sum

$$\left(\sum_{i=1}^n \left|\sum_m f(m) \overline{g_i(m)}\right|^2\right)^{1/2} \qquad (5)$$

that measures the correlations of a single function $f$ with multiple other functions $g_1, \dots, g_n$. One should think of the situation in which $f$ is a "complicated" function, such as the von Mangoldt function $\Lambda$, but the $g_i$ are relatively "simple" functions, such as Dirichlet characters. In the case when the $g_i$ are orthonormal functions, we of course have the classical Bessel inequality:

Lemma 1 (Bessel inequality) Let $g_1, \dots, g_n: \mathbb{N} \to \mathbb{C}$ be finitely supported functions obeying the orthonormality relationship

$$\sum_m g_i(m) \overline{g_j(m)} = 1_{i=j}$$

for all $1 \leq i, j \leq n$. Then for any function $f: \mathbb{N} \to \mathbb{C}$, we have

$$\left(\sum_{i=1}^n \left|\sum_m f(m) \overline{g_i(m)}\right|^2\right)^{1/2} \leq \left(\sum_m |f(m)|^2\right)^{1/2}.$$

For sake of comparison, if one were to apply the Cauchy-Schwarz inequality (4) separately to each summand in (5), one would obtain a bound of $n^{1/2} \left(\sum_m |f(m)|^2\right)^{1/2}$, which is significantly inferior to the Bessel bound when $n$ is large. Geometrically, what is going on is this: the Cauchy-Schwarz inequality (4) is only close to sharp when $f$ and $g$ are close to parallel in the Hilbert space $\ell^2(\mathbb{N})$. But if the $g_1, \dots, g_n$ are orthonormal, then it is not possible for any other vector $f$ to be simultaneously close to parallel to too many of these orthonormal vectors, and so the inner products of $f$ with most of the $g_i$ should be small. (See this previous blog post for more discussion of this principle.) One can view the Bessel inequality as formalising a repulsion principle: if $f$ correlates too much with some of the $g_i$, then it does not have enough "energy" to have large correlation with the rest of the $g_i$.
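A small numerical sketch makes the gap between the Bessel bound and summand-by-summand Cauchy-Schwarz concrete; the orthonormal family chosen here (normalised rows of a DFT matrix) and the dimensions are illustrative assumptions:

```python
import cmath, math, random

M, n = 64, 16
random.seed(0)
f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(M)]

# orthonormal family: normalised additive characters (rows of a DFT matrix)
def g(i, m):
    return cmath.exp(2j * math.pi * i * m / M) / math.sqrt(M)

# left-hand side of Bessel: sum of squared correlations |<f, g_i>|^2
corr_sq = sum(abs(sum(f[m] * g(i, m).conjugate() for m in range(M))) ** 2
              for i in range(n))
energy = sum(abs(v) ** 2 for v in f)  # the Bessel bound ||f||^2
naive = n * energy                    # summand-by-summand Cauchy-Schwarz
print(corr_sq, energy, naive)
```

The squared-correlation sum stays below the energy of $f$, while the naive bound is $n$ times larger.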

In analytic number theory applications, it is useful to generalise the Bessel inequality to the situation in which the $g_i$ are not necessarily orthonormal. This can be accomplished via the Cauchy-Schwarz inequality:

Proposition 2 (Generalised Bessel inequality) Let $g_1, \dots, g_n: \mathbb{N} \to \mathbb{C}$ be finitely supported functions, and let $\nu: \mathbb{N} \to \mathbb{R}^+$ be a non-negative function. Let $f: \mathbb{N} \to \mathbb{C}$ be such that $f$ vanishes whenever $\nu$ vanishes. Then we have

$$\left(\sum_{i=1}^n \left|\sum_m f(m) \overline{g_i(m)}\right|^2\right)^{1/2} \leq \left(\sum_m |f(m)|^2 / \nu(m)\right)^{1/2} \left|\sum_{i=1}^n \sum_{j=1}^n c_i \overline{c_j} \sum_m \nu(m) g_i(m) \overline{g_j(m)}\right|^{1/2} \qquad (6)$$

for some sequence $c_1, \dots, c_n$ of complex numbers with $\sum_{i=1}^n |c_i|^2 = 1$, with the convention that $|f(m)|^2 / \nu(m)$ vanishes whenever $f(m), \nu(m)$ both vanish.

Note by relabeling that we may replace the domain $\mathbb{N}$ here by any other at most countable set, such as the integers $\mathbb{Z}$. (Indeed, one can give an analogue of this lemma on arbitrary measure spaces, but we will not do so here.) This result first appears in this paper of Boas.

*Proof:* We use the *method of duality* to replace the role of the function $f$ by a dual sequence $c_1, \dots, c_n$. By the converse to Cauchy-Schwarz, we may write the left-hand side of (6) as

$$\sum_{i=1}^n \overline{c_i} \sum_m f(m) \overline{g_i(m)}$$

for some complex numbers $c_1, \dots, c_n$ with $\sum_{i=1}^n |c_i|^2 = 1$. Indeed, if all of the correlations $\sum_m f(m) \overline{g_i(m)}$ vanish, we can set the $c_i$ arbitrarily; otherwise we set $(c_1, \dots, c_n)$ to be the unit vector formed by dividing the vector of correlations $\left(\sum_m f(m) \overline{g_i(m)}\right)_{i=1}^n$ by its length. We can then rearrange this expression as

$$\sum_m f(m) \overline{\sum_{i=1}^n c_i g_i(m)}.$$

Applying Cauchy-Schwarz (dividing the first factor by $\nu(m)^{1/2}$ and multiplying the second by $\nu(m)^{1/2}$, after first removing those $m$ for which $\nu(m)$ vanishes), this is bounded by

$$\left(\sum_m |f(m)|^2 / \nu(m)\right)^{1/2} \left(\sum_m \nu(m) \left|\sum_{i=1}^n c_i g_i(m)\right|^2\right)^{1/2},$$

and the claim follows by expanding out the second factor. $\Box$
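The two steps of this duality argument can be traced numerically. The sketch below uses real-valued sequences and a randomly chosen positive weight $\nu$ (illustrative assumptions); it computes the dual unit vector $c$ and checks that the weighted Cauchy-Schwarz bound dominates the left-hand side:

```python
import math, random

random.seed(1)
M, n = 50, 8
f  = [random.gauss(0, 1) for _ in range(M)]
nu = [random.uniform(0.5, 2.0) for _ in range(M)]  # positive weight
g  = [[random.gauss(0, 1) for _ in range(M)] for _ in range(n)]

# correlations <f, g_i> and the left-hand side of the proposition
corr = [sum(f[m] * g[i][m] for m in range(M)) for i in range(n)]
lhs = math.sqrt(sum(ci * ci for ci in corr))

# dual coefficients: the unit vector proportional to the correlations
c = [ci / lhs for ci in corr]
duality = sum(c[i] * corr[i] for i in range(n))  # recovers lhs

# the bound produced by the weighted Cauchy-Schwarz step
rhs = (math.sqrt(sum(f[m] ** 2 / nu[m] for m in range(M)))
       * math.sqrt(sum(nu[m] * sum(c[i] * g[i][m] for i in range(n)) ** 2
                       for m in range(M))))
print(lhs, duality, rhs)
```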

Observe that Lemma 1 is a special case of Proposition 2 when $\nu \equiv 1$ and the $g_i$ are orthonormal. In general, one can expect Proposition 2 to be useful when the $g_i$ are *almost orthogonal* relative to $\nu$, in that the weighted correlations $\sum_m \nu(m) g_i(m) \overline{g_j(m)}$ tend to be small when $i, j$ are distinct. In that case, one can hope for the diagonal term $i = j$ in the right-hand side of (6) to dominate, in which case one can obtain estimates of comparable strength to the classical Bessel inequality. The flexibility to choose different weights $\nu$ in the above proposition has some technical advantages; for instance, if $f$ is concentrated in a sparse set (such as the primes), it is sometimes useful to tailor $\nu$ to a comparable set (e.g. the almost primes) in order not to lose too much in the first factor $\left(\sum_m |f(m)|^2 / \nu(m)\right)^{1/2}$. Also, it can be useful to choose a fairly "smooth" weight $\nu$, in order to make the weighted correlations $\sum_m \nu(m) g_i(m) \overline{g_j(m)}$ small.

Remark 3 In harmonic analysis, the use of tools such as Proposition 2 is known as the *method of almost orthogonality*, or the *$TT^*$ method*. The explanation for the latter name is as follows. For sake of exposition, suppose that $\nu$ is never zero (or we remove from the domain all $m$ for which $\nu(m)$ vanishes). Given the family of finitely supported functions $g_1, \dots, g_n$, consider the linear operator $T: \ell^2(\mathbb{N}) \to \ell^2(\{1,\dots,n\})$ defined by the formula

$$T h := \left(\sum_m \nu(m)^{1/2} h(m) \overline{g_i(m)}\right)_{i=1}^n.$$

This is a bounded linear operator, and the left-hand side of (6) is nothing other than the $\ell^2(\{1,\dots,n\})$ norm of $Th$, where $h := f / \nu^{1/2}$. Without any further information on the function $h$ other than its $\ell^2(\mathbb{N})$ norm $\left(\sum_m |f(m)|^2/\nu(m)\right)^{1/2}$, the best estimate one can obtain on (6) here is clearly

$$\left(\sum_m |f(m)|^2/\nu(m)\right)^{1/2} \times \|T\|_{op},$$

where $\|T\|_{op}$ denotes the operator norm of $T$.

The adjoint $T^*: \ell^2(\{1,\dots,n\}) \to \ell^2(\mathbb{N})$ is easily computed to be

$$T^* (c_i)_{i=1}^n := \left(\nu(m)^{1/2} \sum_{i=1}^n c_i g_i(m)\right)_{m \in \mathbb{N}}.$$

The composition $TT^*$ of $T$ and its adjoint is then given by

$$TT^* (c_i)_{i=1}^n := \left(\sum_{j=1}^n c_j \sum_m \nu(m) g_j(m) \overline{g_i(m)}\right)_{i=1}^n.$$

From the spectral theorem (or singular value decomposition), one sees that the operator norms of $T$ and $TT^*$ are related by the identity

$$\|T\|_{op} = \|TT^*\|_{op}^{1/2},$$

and as $TT^*$ is a self-adjoint, positive semi-definite operator, the operator norm $\|TT^*\|_{op}$ is also the supremum of the quantity

$$\left|\langle TT^* (c_i)_{i=1}^n, (c_i)_{i=1}^n \rangle\right|,$$

where $(c_i)_{i=1}^n$ ranges over unit vectors in $\ell^2(\{1,\dots,n\})$. Putting these facts together, we obtain Proposition 2; furthermore, we see from this analysis that the bound there is essentially optimal if the only information one is allowed to use about $f$ is its weighted $\ell^2$ norm.
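The identity $\|T\|_{op} = \|TT^*\|_{op}^{1/2}$ can be checked in a few lines. In this sketch (real sequences; power iteration for the top eigenvalue; the normalisation $(Th)_i = \sum_m \nu(m)^{1/2} h(m) g_i(m)$ is one natural choice, an assumption rather than a quotation from the notes), the Gram matrix of weighted correlations plays the role of $TT^*$:

```python
import math, random

random.seed(2)
M, n = 40, 5
nu = [random.uniform(0.5, 2.0) for _ in range(M)]
g  = [[random.gauss(0, 1) for _ in range(M)] for _ in range(n)]

# Gram matrix of weighted correlations: the matrix of T T^*
A = [[sum(nu[m] * g[i][m] * g[j][m] for m in range(M)) for j in range(n)]
     for i in range(n)]

# largest eigenvalue of the symmetric PSD matrix A by power iteration
v = [1.0] * n
for _ in range(500):
    w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    norm = math.sqrt(sum(x * x for x in w))
    v = [x / norm for x in w]
lam_max = sum(v[i] * sum(A[i][j] * v[j] for j in range(n)) for i in range(n))

# sanity check: ||T h||^2 <= ||T||^2 ||h||^2 = lam_max * ||h||^2
# for (T h)_i = sum_m nu(m)^{1/2} h(m) g_i(m) and a random test vector h
h = [random.gauss(0, 1) for _ in range(M)]
Th_sq = sum(sum(math.sqrt(nu[m]) * h[m] * g[i][m] for m in range(M)) ** 2
            for i in range(n))
h_sq = sum(x * x for x in h)
print(Th_sq, lam_max * h_sq)
```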

For further discussion of almost orthogonality methods from a harmonic analysis perspective, see Chapter VII of this text of Stein.

Exercise 4 Under the same hypotheses as Proposition 2, show thatas well as the variant inequality

Proposition 2 has many applications in analytic number theory; for instance, we will use it in later notes to control the large values of Dirichlet series such as the Riemann zeta function. One of the key benefits is that it largely eliminates the need to consider further correlations of the function $f$ (other than its self-correlation $\sum_m |f(m)|^2 / \nu(m)$ relative to $\nu$, which is usually fairly easy to compute or estimate, as $\nu$ is usually chosen to be relatively simple); this is particularly useful if $f$ is a function which is significantly more complicated to analyse than the functions $g_i$. Of course, the tradeoff for this is that one now has to deal with the coefficients $c_i$, which if anything are even less understood than $f$, since literally the only thing we know about these coefficients is their square sum $\sum_{i=1}^n |c_i|^2 = 1$. However, as long as there is enough almost orthogonality between the $g_i$, one can estimate the contribution of the $c_i$ by fairly crude estimates (e.g. the triangle inequality or Cauchy-Schwarz) and still get reasonably good estimates.

In this set of notes, we will use Proposition 2 to prove some versions of the *large sieve inequality*, which controls a square-sum of correlations of an arbitrary finitely supported function $f$ with various additive characters $n \mapsto e(\alpha n)$ (where $e(\theta) := e^{2\pi i \theta}$), or alternatively a square-sum of correlations of $f$ with various primitive Dirichlet characters $\chi$; it turns out that one can prove a (slightly sub-optimal) version of this inequality quite quickly from Proposition 2 if one first prepares the sum by inserting a smooth cutoff with well-behaved Fourier transform. The large sieve inequality has many applications (as the name suggests, it has particular utility within *sieve theory*). For the purposes of this set of notes, though, the main application we will need is the Bombieri-Vinogradov theorem, which in a very rough sense gives a prime number theorem in arithmetic progressions that, "on average", is of strength comparable to the results provided by the Generalised Riemann Hypothesis (GRH), but has the great advantage of being unconditional (it does not require any unproven hypotheses such as GRH); it can be viewed as a significant extension of the Siegel-Walfisz theorem from Notes 2. As we shall see in later notes, the Bombieri-Vinogradov theorem is a very useful ingredient in sieve-theoretic problems involving the primes.

There is however one additional important trick, beyond the large sieve, which we will need in order to establish the Bombieri-Vinogradov theorem. As it turns out, after some basic manipulations (and the deployment of some multiplicative number theory, specifically the Siegel-Walfisz theorem), the task of proving the Bombieri-Vinogradov theorem is reduced to that of getting a good estimate on sums that are roughly of the form

$$\sum_i \left|\sum_n \Lambda(n) \overline{\chi_i(n)}\right| \qquad (7)$$

for some primitive Dirichlet characters $\chi_i$. This looks like the type of sum that can be controlled by the large sieve (or by Proposition 2), except that this is an ordinary sum rather than a square sum (i.e., an $\ell^1$ norm rather than an $\ell^2$ norm). One could of course try to control such a sum in terms of the associated square-sum through the Cauchy-Schwarz inequality, but this turns out to be very wasteful (it loses a factor of about the square root of the number of characters involved). Instead, one should try to exploit the special structure of the von Mangoldt function $\Lambda$, in particular the fact that it can be expressed as a Dirichlet convolution $\alpha * \beta$ of two further arithmetic sequences (or as a finite linear combination of such Dirichlet convolutions). The reason for introducing this convolution structure is the basic identity

$$\sum_n \alpha * \beta(n)\, \overline{\chi(n)} = \left(\sum_n \alpha(n)\, \overline{\chi(n)}\right) \left(\sum_n \beta(n)\, \overline{\chi(n)}\right)$$

for any finitely supported sequences $\alpha, \beta: \mathbb{N} \to \mathbb{C}$, as can be easily seen by multiplying everything out and using the completely multiplicative nature of $\chi$. (This is the multiplicative analogue of the well-known relationship between ordinary convolution and Fourier coefficients.) This factorisation, together with yet another application of the Cauchy-Schwarz inequality, lets one control (7) by square-sums of the sort that can be handled by the large sieve inequality.
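The convolution identity just described (the multiplicative analogue of "the Fourier transform turns convolution into multiplication") is easy to test numerically; in the sketch below the completely multiplicative character mod 4 and the two finitely supported test sequences are illustrative choices:

```python
import math

def chi4(n):
    """A completely multiplicative function: the character mod 4."""
    return {0: 0, 1: 1, 2: 0, 3: -1}[n % 4]

# finitely supported test sequences (illustrative values on 1..N)
N = 30
alpha = {n: math.sin(n) for n in range(1, N + 1)}
beta  = {n: math.cos(n) for n in range(1, N + 1)}

# Dirichlet convolution: (alpha * beta)(n) = sum_{d | n} alpha(d) beta(n/d)
conv = {}
for a, va in alpha.items():
    for b, vb in beta.items():
        conv[a * b] = conv.get(a * b, 0.0) + va * vb

lhs = sum(v * chi4(n) for n, v in conv.items())
rhs = (sum(v * chi4(n) for n, v in alpha.items())
       * sum(v * chi4(n) for n, v in beta.items()))
print(lhs, rhs)  # the two sides agree up to rounding
```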

As we have seen in Notes 1, the von Mangoldt function $\Lambda$ does indeed admit several factorisations into Dirichlet convolution type, such as the factorisation $\Lambda = \mu * L$, where $\mu$ is the Möbius function and $L(n) := \log n$. One can try directly inserting this factorisation into the above strategy; it almost works, but there turns out to be a problem when considering the contribution of the portion of the factors that is supported at very small natural numbers, as the large sieve loses any gain over the trivial bound in such settings. Because of this, there is a need for a more sophisticated decomposition of $\Lambda$ into Dirichlet convolutions $\alpha * \beta$ which are non-degenerate in the sense that $\alpha, \beta$ are supported away from small values. (As a non-example, the trivial factorisation of $\Lambda$ against the Kronecker delta at $1$ would be a totally inappropriate factorisation for this purpose.) Fortunately, it turns out that through some elementary combinatorial manipulations, some satisfactory decompositions of this type are available, such as the Vaughan identity and the Heath-Brown identity. By using one of these identities we will be able to complete the proof of the Bombieri-Vinogradov theorem. (These identities are also useful for other applications in which one wishes to control correlations between the von Mangoldt function $\Lambda$ and some other sequence; we will see some examples of this in later notes.)
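The factorisation $\Lambda = \mu * L$, with $L(n) := \log n$, can be verified directly for small $n$; a quick sketch:

```python
import math

N = 300

# Moebius function mu(1..N) by sieving
mu = [0, 1] + [1] * (N - 1)
sieve = [True] * (N + 1)
for p in range(2, N + 1):
    if sieve[p]:
        for m in range(p, N + 1, p):
            if m > p:
                sieve[m] = False
            mu[m] *= -1
        for m in range(p * p, N + 1, p * p):
            mu[m] = 0  # not squarefree

def von_mangoldt(n):
    """Lambda(n) = log p if n is a power of the prime p, else 0."""
    for p in range(2, n + 1):
        if sieve[p] and n % p == 0:
            m = n
            while m % p == 0:
                m //= p
            return math.log(p) if m == 1 else 0.0
    return 0.0

# check Lambda(n) = sum_{d | n} mu(d) log(n/d) for all n up to N
max_err = 0.0
for n in range(1, N + 1):
    rhs = sum(mu[d] * math.log(n // d) for d in range(1, n + 1) if n % d == 0)
    max_err = max(max_err, abs(von_mangoldt(n) - rhs))
print("max error:", max_err)
```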

For further reading on these topics, including a significantly larger number of examples of the large sieve inequality, see Chapters 7 and 17 of Iwaniec and Kowalski.

Remark 5 We caution that the presentation given in this set of notes is highly ahistorical; we are using modern streamlined proofs of results that were first obtained by more complicated arguments.

This is one of the continuations of the online reading seminar of Zhang's paper for the polymath8 project. (There are two other continuations: this previous post, which deals with the combinatorial aspects of the second part of Zhang's paper, and a post to come that covers the Type III sums.) The main purpose of this post is to present (and hopefully, to improve upon) the treatment of two of the three key estimates in Zhang's paper, namely the Type I and Type II estimates.

The main estimate was already stated as Theorem 16 in the previous post, but we quickly recall the relevant definitions here. As in other posts, we always take $x$ to be a parameter going off to infinity, with the usual asymptotic notation associated to this parameter.

Definition 1 (Coefficient sequences) A *coefficient sequence* is a finitely supported sequence $\alpha: \mathbb{N} \to \mathbb{R}$ that obeys the bounds

$$|\alpha(n)| \ll \tau^{O(1)}(n) \log^{O(1)}(x) \qquad (1)$$

for all $n$, where $\tau$ is the divisor function.

- (i) If $\alpha$ is a coefficient sequence and $a\ (q)$ is a primitive residue class, the (signed) *discrepancy* $\Delta(\alpha; a\ (q))$ of $\alpha$ in the sequence is defined to be the quantity

$$\Delta(\alpha; a\ (q)) := \sum_{n \equiv a\ (q)} \alpha(n) - \frac{1}{\phi(q)} \sum_{(n,q)=1} \alpha(n).$$

- (ii) A coefficient sequence $\alpha$ is said to be *at scale $N$* for some $N \geq 1$ if it is supported on an interval of the form $[(1 - O(\log^{-A_0} x)) N, (1 + O(\log^{-A_0} x)) N]$.
- (iii) A coefficient sequence $\alpha$ at scale $N$ is said to *obey the Siegel-Walfisz theorem* if one has

$$|\Delta(\alpha 1_{(\cdot,q)=1}; a\ (r))| \ll \tau(qr)^{O(1)} N \log^{-A} x$$

for any $q, r \geq 1$, any fixed $A$, and any primitive residue class $a\ (r)$.
- (iv) A coefficient sequence $\alpha$ at scale $N$ is said to be *smooth* if it takes the form $\alpha(n) = \psi(n/N)$ for some smooth function $\psi: \mathbb{R} \to \mathbb{C}$ supported on a fixed compact interval, obeying the derivative bounds

$$\psi^{(j)}(x) = O(\log^{O(1)} x)$$

for all fixed $j \geq 0$ (note that the implied constant in the $O()$ notation may depend on $j$).
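Assuming the standard definition of the discrepancy, $\Delta(\alpha; a\ (q)) := \sum_{n \equiv a\ (q)} \alpha(n) - \frac{1}{\phi(q)} \sum_{(n,q)=1} \alpha(n)$, the following sketch (bump function, scale, modulus, and residue class all illustrative) shows how small the discrepancy of a smooth coefficient sequence is:

```python
import math

def phi(q):
    """Euler totient, by brute force (q is small here)."""
    return sum(1 for a in range(1, q + 1) if math.gcd(a, q) == 1)

N, q, a = 10**4, 7, 3  # illustrative scale, modulus, and residue class

def alpha(n):
    """A smooth coefficient sequence at scale N: a bump on [N, 2N]."""
    t = n / N
    if 1 < t < 2:
        return math.exp(-1 / ((t - 1) * (2 - t)))
    return 0.0

on_class = sum(alpha(n) for n in range(1, 2 * N + 1) if n % q == a)
coprime  = sum(alpha(n) for n in range(1, 2 * N + 1) if math.gcd(n, q) == 1)
disc = on_class - coprime / phi(q)  # the (signed) discrepancy
total = sum(alpha(n) for n in range(1, 2 * N + 1))
print(disc, total)  # disc is negligible compared to total
```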

In Lemma 8 of this previous post we established a collection of "crude estimates" which assert, roughly speaking, that for the purposes of averaged estimates one may ignore the $\tau^{O(1)}(n) \log^{O(1)}(x)$ factor in (1) and pretend that $\alpha$ was in fact bounded in magnitude by $1$. We shall rely frequently on these "crude estimates" without further citation to that precise lemma.

For any $I \subset \mathbb{R}$, let $S_I$ denote the square-free numbers whose prime factors lie in $I$.

Definition 2 (Singleton congruence class system) Let $I \subset \mathbb{R}$. A *singleton congruence class system* on $I$ is a collection of primitive residue classes $a_q\ (q)$ for each $q \in S_I$, obeying the Chinese remainder theorem property

$$a_{qr}\ (qr) = (a_q\ (q)) \cap (a_r\ (r))$$

whenever $q, r \in S_I$ are coprime. We say that such a system has *controlled multiplicity* if the

The main result of this post is then the following:

Theorem 3 (Type I/II estimate) Let $\varpi, \delta, \sigma > 0$ be fixed quantities such that

and let $\alpha, \beta$ be coefficient sequences at scales $M, N$ respectively with

with $\beta$ obeying a Siegel-Walfisz theorem. Then for any $I$ and any singleton congruence class system on $I$ with controlled multiplicity we have

The proof of this theorem relies on five basic tools:

- (i) the Bombieri-Vinogradov theorem;
- (ii) completion of sums;
- (iii) the Weil conjectures;
- (iv) factorisation of smooth moduli; and
- (v) the Cauchy-Schwarz and triangle inequalities (Weyl differencing and the dispersion method).

For the purposes of numerics, it is the interplay between (ii), (iii), and (v) that drives the final conditions (7), (8). The Weil conjectures are the primary source of power savings (gains of $x^{-c}$ for some fixed $c > 0$) in the argument, but they need to overcome power losses coming from completion of sums, and each use of Cauchy-Schwarz tends to halve any power savings present in one's estimates. Naively, one could thus expect to get better estimates by relying more on the Weil conjectures, and less on completion of sums and on Cauchy-Schwarz.
