A fundamental and recurring problem in analytic number theory is to demonstrate the presence of cancellation in an oscillating sum, a typical example of which might be a correlation

\displaystyle  \sum_{n} f(n) \overline{g(n)} \ \ \ \ \ (1)

between two arithmetic functions {f: {\bf N} \rightarrow {\bf C}} and {g: {\bf N} \rightarrow {\bf C}}, which to avoid technicalities we will assume to be finitely supported (or that the {n} variable is localised to a finite range, such as {\{ n: n \leq x \}}). A key example to keep in mind for the purposes of this set of notes is the twisted von Mangoldt summatory function

\displaystyle  \sum_{n \leq x} \Lambda(n) \overline{\chi(n)} \ \ \ \ \ (2)

that measures the correlation between the primes and a Dirichlet character {\chi}. One can get a “trivial” bound on such sums from the triangle inequality

\displaystyle  |\sum_{n} f(n) \overline{g(n)}| \leq \sum_{n} |f(n)| |g(n)|;

for instance, from the triangle inequality and the prime number theorem we have

\displaystyle  |\sum_{n \leq x} \Lambda(n) \overline{\chi(n)}| \leq x + o(x) \ \ \ \ \ (3)

as {x \rightarrow \infty}. But the triangle inequality is insensitive to the phase oscillations of the summands, and often we expect (e.g. from the probabilistic heuristics from Supplement 4) to be able to improve upon the trivial triangle inequality bound by a substantial amount; in the best case scenario, one typically expects a “square root cancellation” that gains a factor that is roughly the square root of the number of summands. (For instance, for Dirichlet characters {\chi} of conductor {O(x^{O(1)})}, it is expected from probabilistic heuristics that the left-hand side of (3) should in fact be {O_\varepsilon(x^{1/2+\varepsilon})} for any {\varepsilon>0}.)
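One can illustrate the square root cancellation heuristic numerically with a toy model (this is purely illustrative, with arbitrarily chosen parameters): take {f(n)} to be random signs and {g(n)=1}, and compare the triangle inequality bound with the actual size of the correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10**5
# Toy model: f(n) = random signs, g(n) = 1.  The triangle inequality bounds
# |sum_n f(n) conj(g(n))| by N, but the actual sum is typically only of size
# about sqrt(N), exhibiting square root cancellation.
signs = rng.choice([-1.0, 1.0], size=N)
actual = abs(signs.sum())
trivial = float(N)
print(f"trivial bound {trivial:.0f}, actual {actual:.0f}, sqrt(N) = {np.sqrt(N):.0f}")
```

With overwhelming probability the actual sum is within a few multiples of {\sqrt{N}}, far below the trivial bound.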

It has proven surprisingly difficult, however, to establish significant cancellation in many of the sums of interest in analytic number theory, particularly if the sums do not have a strong amount of algebraic structure (e.g. multiplicative structure) which allows for the deployment of specialised techniques (such as multiplicative number theory techniques). In fact, we are forced to rely (to an embarrassingly large extent) on (many variations of) a single basic tool to capture at least some cancellation, namely the Cauchy-Schwarz inequality. Indeed, in many cases the classical case

\displaystyle  |\sum_n f(n) \overline{g(n)}| \leq (\sum_n |f(n)|^2)^{1/2} (\sum_n |g(n)|^2)^{1/2}, \ \ \ \ \ (4)

considered by Cauchy, where at least one of {f, g: {\bf N} \rightarrow {\bf C}} is finitely supported, suffices for applications. Roughly speaking, the Cauchy-Schwarz inequality replaces the task of estimating a cross-correlation between two different functions {f,g} with that of measuring self-correlations of {f} with itself, or {g} with itself, which are usually easier to compute (albeit at the cost of capturing less cancellation). Note that the Cauchy-Schwarz inequality requires almost no hypotheses on the functions {f} or {g}, making it a very widely applicable tool.

There is however some skill required to decide exactly how to deploy the Cauchy-Schwarz inequality (and in particular, how to select {f} and {g}); if applied blindly, one loses all cancellation and can even end up with a worse estimate than the trivial bound. For instance, if one tries to bound (2) directly by applying Cauchy-Schwarz with the functions {\Lambda} and {\chi}, one obtains the bound

\displaystyle  |\sum_{n \leq x} \Lambda(n) \overline{\chi(n)}| \leq (\sum_{n \leq x} \Lambda(n)^2)^{1/2} (\sum_{n \leq x} |\chi(n)|^2)^{1/2}.

The right-hand side may be bounded by {\ll x \log^{1/2} x}, which is worse than the trivial bound (3) by a factor of about {\log^{1/2} x}. This can be “blamed” on the fact that {\Lambda} and {\chi} are concentrated on rather different sets ({\Lambda} is concentrated on primes, while {\chi} is more or less uniformly distributed amongst the natural numbers); but even if one corrects for this (e.g. by weighting Cauchy-Schwarz with some suitable “sieve weight” that is more concentrated on primes), one still does not do any better than (3). Indeed, the Cauchy-Schwarz inequality suffers from the same key weakness as the triangle inequality: it is insensitive to the phase oscillation of the factors {f, g}.

While the Cauchy-Schwarz inequality can be poor at estimating a single correlation such as (1), its power improves when considering an average (or sum, or square sum) of multiple correlations. In this set of notes, we will focus on one such situation of this type, namely that of trying to estimate a square sum

\displaystyle  (\sum_{j=1}^J |\sum_{n} f(n) \overline{g_j(n)}|^2)^{1/2} \ \ \ \ \ (5)

that measures the correlations of a single function {f: {\bf N} \rightarrow {\bf C}} with multiple other functions {g_j: {\bf N} \rightarrow {\bf C}}. One should think of the situation in which {f} is a “complicated” function, such as the von Mangoldt function {\Lambda}, but the {g_j} are relatively “simple” functions, such as Dirichlet characters. In the case when the {g_j} are orthonormal functions, we of course have the classical Bessel inequality:

Lemma 1 (Bessel inequality) Let {g_1,\dots,g_J: {\bf N} \rightarrow {\bf C}} be finitely supported functions obeying the orthonormality relationship

\displaystyle  \sum_n g_j(n) \overline{g_{j'}(n)} = 1_{j=j'}

for all {1 \leq j,j' \leq J}. Then for any function {f: {\bf N} \rightarrow {\bf C}}, we have

\displaystyle  (\sum_{j=1}^J |\sum_{n} f(n) \overline{g_j(n)}|^2)^{1/2} \leq (\sum_n |f(n)|^2)^{1/2}.

For sake of comparison, if one were to apply the Cauchy-Schwarz inequality (4) separately to each summand in (5), one would obtain the bound of {J^{1/2} (\sum_n |f(n)|^2)^{1/2}}, which is significantly inferior to the Bessel bound when {J} is large. Geometrically, what is going on is this: the Cauchy-Schwarz inequality (4) is only close to sharp when {f} and {g} are close to parallel in the Hilbert space {\ell^2({\bf N})}. But if {g_1,\dots,g_J} are orthonormal, then it is not possible for any other vector {f} to be simultaneously close to parallel to too many of these orthonormal vectors, and so the inner products of {f} with most of the {g_j} should be small. (See this previous blog post for more discussion of this principle.) One can view the Bessel inequality as formalising a repulsion principle: if {f} correlates too much with some of the {g_j}, then it does not have enough “energy” to have large correlation with the rest of the {g_j}.
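As a concrete sanity check of Lemma 1, the snippet below takes the {g_j} to be normalised additive characters on a cyclic group (which are orthonormal) and a random {f}; the parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
N, J = 64, 16
n = np.arange(N)
# g_j(n) = e(jn/N)/sqrt(N) for j = 0..J-1: an orthonormal family in ell^2(Z/NZ)
G = np.exp(2j * np.pi * np.outer(np.arange(J), n) / N) / np.sqrt(N)
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)
# correlations sum_n f(n) conj(g_j(n))
corr = G.conj() @ f
lhs = np.sqrt(np.sum(np.abs(corr) ** 2))   # left-hand side of Bessel
rhs = np.sqrt(np.sum(np.abs(f) ** 2))      # the energy of f
print(lhs, rhs)
```

When {J = N} the characters form a complete orthonormal basis and Bessel's inequality becomes the Plancherel identity.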

In analytic number theory applications, it is useful to generalise the Bessel inequality to the situation in which the {g_j} are not necessarily orthonormal. This can be accomplished via the Cauchy-Schwarz inequality:

Proposition 2 (Generalised Bessel inequality) Let {g_1,\dots,g_J: {\bf N} \rightarrow {\bf C}} be finitely supported functions, let {\nu: {\bf N} \rightarrow {\bf R}^+} be a non-negative function, and let {f: {\bf N} \rightarrow {\bf C}} be a function that vanishes whenever {\nu} vanishes. Then

\displaystyle  (\sum_{j=1}^J |\sum_{n} f(n) \overline{g_j(n)}|^2)^{1/2} \leq (\sum_n |f(n)|^2 / \nu(n))^{1/2} \ \ \ \ \ (6)

\displaystyle  \times ( \sum_{j=1}^J \sum_{j'=1}^J c_j \overline{c_{j'}} \sum_n \nu(n) g_j(n) \overline{g_{j'}(n)} )^{1/2}

for some sequence {c_1,\dots,c_J} of complex numbers with {\sum_{j=1}^J |c_j|^2 = 1}, with the convention that {|f(n)|^2/\nu(n)} vanishes whenever {f(n), \nu(n)} both vanish.

Note by relabeling that we may replace the domain {{\bf N}} here by any other at most countable set, such as the integers {{\bf Z}}. (Indeed, one can give an analogue of this lemma on arbitrary measure spaces, but we will not do so here.) This result first appears in this paper of Boas.

Proof: We use the method of duality to replace the role of the function {f} by a dual sequence {c_1,\dots,c_J}. By the converse to Cauchy-Schwarz, we may write the left-hand side of (6) as

\displaystyle  \sum_{j=1}^J \overline{c_j} \sum_{n} f(n) \overline{g_j(n)}

for some complex numbers {c_1,\dots,c_J} with {\sum_{j=1}^J |c_j|^2 = 1}. Indeed, if all of the {\sum_{n} f(n) \overline{g_j(n)}} vanish, we can set the {c_j} arbitrarily, otherwise we set {(c_1,\dots,c_J)} to be the unit vector formed by dividing {(\sum_{n} f(n) \overline{g_j(n)})_{j=1}^J} by its length. We can then rearrange this expression as

\displaystyle  \sum_n f(n) \overline{\sum_{j=1}^J c_j g_j(n)}.

Applying Cauchy-Schwarz (dividing the first factor by {\nu(n)^{1/2}} and multiplying the second by {\nu(n)^{1/2}}, after first removing those {n} for which {\nu(n)} vanishes), this is bounded by

\displaystyle  (\sum_n |f(n)|^2 / \nu(n))^{1/2} (\sum_n \nu(n) |\sum_{j=1}^J c_j g_j(n)|^2)^{1/2},

and the claim follows by expanding out the second factor. \Box
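The proof is easy to test by machine: the sketch below (random data, with {\nu} bounded away from zero so that no vanishing issues arise) computes the dual coefficients {c_j} exactly as in the duality step, and checks (6).

```python
import numpy as np

rng = np.random.default_rng(2)
N, J = 50, 8
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)
G = rng.standard_normal((J, N)) + 1j * rng.standard_normal((J, N))  # rows are g_j
nu = rng.uniform(0.5, 2.0, size=N)    # strictly positive weight

corr = G.conj() @ f                   # correlations sum_n f(n) conj(g_j(n))
lhs = np.sqrt(np.sum(np.abs(corr) ** 2))
c = corr / np.linalg.norm(corr)       # the dual unit vector from the proof
gram = (G * nu) @ G.conj().T          # gram[j,j'] = sum_n nu(n) g_j(n) conj(g_j'(n))
quad = np.real(c @ gram @ np.conj(c)) # sum_{j,j'} c_j conj(c_j') gram[j,j']
rhs = np.sqrt(np.sum(np.abs(f) ** 2 / nu)) * np.sqrt(quad)
print(lhs, rhs)
```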

Observe that Lemma 1 is a special case of Proposition 2 when {\nu=1} and the {g_j} are orthonormal. In general, one can expect Proposition 2 to be useful when the {g_j} are almost orthogonal relative to {\nu}, in that the correlations {\sum_n \nu(n) g_j(n) \overline{g_{j'}(n)}} tend to be small when {j,j'} are distinct. In that case, one can hope for the diagonal term {j=j'} in the right-hand side of (6) to dominate, in which case one can obtain estimates of comparable strength to the classical Bessel inequality. The flexibility to choose different weights {\nu} in the above proposition has some technical advantages; for instance, if {f} is concentrated in a sparse set (such as the primes), it is sometimes useful to tailor {\nu} to a comparable set (e.g. the almost primes) in order not to lose too much in the first factor {\sum_n |f(n)|^2 / \nu(n)}. Also, it can be useful to choose a fairly “smooth” weight {\nu}, in order to make the weighted correlations {\sum_n \nu(n) g_j(n) \overline{g_{j'}(n)}} small.

Remark 3 In harmonic analysis, the use of tools such as Proposition 2 is known as the method of almost orthogonality, or the {TT^*} method. The explanation for the latter name is as follows. For sake of exposition, suppose that {\nu} is never zero (or we remove all {n} from the domain for which {\nu(n)} vanishes). Given a family of finitely supported functions {g_1,\dots,g_J: {\bf N} \rightarrow {\bf C}}, consider the linear operator {T: \ell^2(\nu^{-1}) \rightarrow \ell^2(\{1,\dots,J\})} defined by the formula

\displaystyle  T f := ( \sum_{n} f(n) \overline{g_j(n)} )_{j=1}^J.

This is a bounded linear operator, and the left-hand side of (6) is nothing other than the {\ell^2(\{1,\dots,J\})} norm of {Tf}. Without any further information on the function {f} other than its {\ell^2(\nu^{-1})} norm {(\sum_n |f(n)|^2 / \nu(n))^{1/2}}, the best estimate one can obtain on (6) here is clearly

\displaystyle  (\sum_n |f(n)|^2 / \nu(n))^{1/2} \times \|T\|_{op},

where {\|T\|_{op}} denotes the operator norm of {T}.

The adjoint {T^*: \ell^2(\{1,\dots,J\}) \rightarrow \ell^2(\nu^{-1})} is easily computed to be

\displaystyle  T^* (c_j)_{j=1}^J := (\sum_{j=1}^J c_j \nu(n) g_j(n) )_{n \in {\bf N}}.

The composition {TT^*: \ell^2(\{1,\dots,J\}) \rightarrow \ell^2(\{1,\dots,J\})} of {T} and its adjoint is then given by

\displaystyle  TT^* (c_j)_{j=1}^J := (\sum_{j'=1}^J c_{j'} \sum_n \nu(n) g_{j'}(n) \overline{g_j(n)} )_{j=1}^J.

From the spectral theorem (or singular value decomposition), one sees that the operator norms of {T} and {TT^*} are related by the identity

\displaystyle  \|T\|_{op} = \|TT^*\|_{op}^{1/2},

and as {TT^*} is a self-adjoint, positive semi-definite operator, the operator norm {\|TT^*\|_{op}} is also the supremum of the quantity

\displaystyle  \langle TT^* (c_j)_{j=1}^J, (c_j)_{j=1}^J \rangle_{\ell^2(\{1,\dots,J\})} = \sum_{j=1}^J \sum_{j'=1}^J c_j \overline{c_{j'}} \sum_n \nu(n) g_j(n) \overline{g_{j'}(n)}

where {(c_j)_{j=1}^J} ranges over unit vectors in {\ell^2(\{1,\dots,J\})}. Putting these facts together, we obtain Proposition 2; furthermore, we see from this analysis that the bound here is essentially optimal if the only information one is allowed to use about {f} is its {\ell^2(\nu^{-1})} norm.

For further discussion of almost orthogonality methods from a harmonic analysis perspective, see Chapter VII of this text of Stein.
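The identity {\|T\|_{op} = \|TT^*\|_{op}^{1/2}} is easy to verify numerically; the sketch below takes {\nu = 1} for simplicity, so that both norms are ordinary matrix (spectral) norms.

```python
import numpy as np

rng = np.random.default_rng(3)
N, J = 40, 6
G = rng.standard_normal((J, N)) + 1j * rng.standard_normal((J, N))  # rows are g_j
T = G.conj()                     # (Tf)_j = sum_n f(n) conj(g_j(n)), with nu = 1
TTstar = T @ T.conj().T          # J x J, self-adjoint positive semi-definite
op_T = np.linalg.norm(T, 2)      # operator norm = largest singular value
op_TTstar = np.linalg.norm(TTstar, 2)
print(op_T, np.sqrt(op_TTstar))
```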

Exercise 4 Under the same hypotheses as Proposition 2, show that

\displaystyle  \sum_{j=1}^J |\sum_{n} f(n) \overline{g_j(n)}| \leq (\sum_n |f(n)|^2 / \nu(n))^{1/2}

\displaystyle  \times ( \sum_{j=1}^J \sum_{j'=1}^J |\sum_n \nu(n) g_j(n) \overline{g_{j'}(n)}| )^{1/2}

as well as the variant inequality

\displaystyle  |\sum_{j=1}^J \sum_{n} f(n) \overline{g_j(n)}| \leq (\sum_n |f(n)|^2 / \nu(n))^{1/2}

\displaystyle  \times | \sum_{j=1}^J \sum_{j'=1}^J \sum_n \nu(n) g_j(n) \overline{g_{j'}(n)}|^{1/2}.

Proposition 2 has many applications in analytic number theory; for instance, we will use it in later notes to control the large values of Dirichlet series such as the Riemann zeta function. One of the key benefits is that it largely eliminates the need to consider further correlations of the function {f} (other than its self-correlation {\sum_n |f(n)|^2 / \nu(n)} relative to {\nu^{-1}}, which is usually fairly easy to compute or estimate as {\nu} is usually chosen to be relatively simple); this is particularly useful if {f} is a function which is significantly more complicated to analyse than the functions {g_j}. Of course, the tradeoff for this is that one now has to deal with the coefficients {c_j}, which if anything are even less understood than {f}, since literally the only thing we know about these coefficients is their square sum {\sum_{j=1}^J |c_j|^2 = 1}. However, as long as there is enough almost orthogonality between the {g_j}, one can estimate the {c_j} by fairly crude estimates (e.g. the triangle inequality or Cauchy-Schwarz) and still obtain reasonably good estimates.

In this set of notes, we will use Proposition 2 to prove some versions of the large sieve inequality, which controls a square-sum of correlations

\displaystyle  \sum_n f(n) e( -\xi_j n )

of an arbitrary finitely supported function {f: {\bf Z} \rightarrow {\bf C}} with various additive characters {n \mapsto e( \xi_j n)} (where {e(x) := e^{2\pi i x}}), or alternatively a square-sum of correlations

\displaystyle  \sum_n f(n) \overline{\chi_j(n)}

of {f} with various primitive Dirichlet characters {\chi_j}; it turns out that one can prove a (slightly sub-optimal) version of this inequality quite quickly from Proposition 2 if one first prepares the sum by inserting a smooth cutoff with well-behaved Fourier transform. The large sieve inequality has many applications (as the name suggests, it has particular utility within sieve theory). For the purposes of this set of notes, though, the main application we will need it for is the Bombieri-Vinogradov theorem, which in a very rough sense gives a prime number theorem in arithmetic progressions that, “on average”, is of strength comparable to the results provided by the Generalised Riemann Hypothesis (GRH), but has the great advantage of being unconditional (it does not require any unproven hypotheses such as GRH). It can be viewed as a significant extension of the Siegel-Walfisz theorem from Notes 2. As we shall see in later notes, the Bombieri-Vinogradov theorem is a very useful ingredient in sieve-theoretic problems involving the primes.

There is however one additional important trick, beyond the large sieve, which we will need in order to establish the Bombieri-Vinogradov theorem. As it turns out, after some basic manipulations (and the deployment of some multiplicative number theory, and specifically the Siegel-Walfisz theorem), the task of proving the Bombieri-Vinogradov theorem is reduced to that of getting a good estimate on sums that are roughly of the form

\displaystyle  \sum_{j=1}^J |\sum_n \Lambda(n) \overline{\chi_j}(n)| \ \ \ \ \ (7)

for some primitive Dirichlet characters {\chi_j}. This looks like the type of sum that can be controlled by the large sieve (or by Proposition 2), except that this is an ordinary sum rather than a square sum (i.e., an {\ell^1} norm rather than an {\ell^2} norm). One could of course try to control such a sum in terms of the associated square-sum through the Cauchy-Schwarz inequality, but this turns out to be very wasteful (it loses a factor of about {J^{1/2}}). Instead, one should try to exploit the special structure of the von Mangoldt function {\Lambda}, in particular the fact that it can be expressed as a Dirichlet convolution {\alpha * \beta} of two further arithmetic sequences {\alpha,\beta} (or as a finite linear combination of such Dirichlet convolutions). The reason for introducing this convolution structure is the basic identity

\displaystyle  (\sum_n \alpha*\beta(n) \overline{\chi_j}(n)) = (\sum_n \alpha(n) \overline{\chi_j}(n)) (\sum_n \beta(n) \overline{\chi_j}(n)) \ \ \ \ \ (8)

for any finitely supported sequences {\alpha,\beta: {\bf N} \rightarrow {\bf C}}, as can be easily seen by multiplying everything out and using the completely multiplicative nature of {\chi_j}. (This is the multiplicative analogue of the well-known relationship {\widehat{f*g}(\xi) = \hat f(\xi) \hat g(\xi)} between ordinary convolution and Fourier coefficients.) This factorisation, together with yet another application of the Cauchy-Schwarz inequality, lets one control (7) by square-sums of the sort that can be handled by the large sieve inequality.
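The identity (8) is easily verified by machine; the following sketch uses the Dirichlet character mod {5} with {\chi(2)=i} (any completely multiplicative function would serve) and random finitely supported sequences with arbitrarily chosen support.

```python
import numpy as np

# The Dirichlet character mod 5 with chi(2) = i, extended by periodicity;
# it is completely multiplicative and vanishes on multiples of 5.
table = {0: 0, 1: 1, 2: 1j, 3: -1j, 4: -1}
def chi(n):
    return table[n % 5]

rng = np.random.default_rng(4)
A = 30
alpha = {n: rng.standard_normal() for n in range(1, A + 1)}
beta = {n: rng.standard_normal() for n in range(1, A + 1)}

# Dirichlet convolution alpha*beta(n) = sum_{de = n} alpha(d) beta(e)
conv = {}
for d, ad in alpha.items():
    for e, be in beta.items():
        conv[d * e] = conv.get(d * e, 0) + ad * be

lhs = sum(cn * np.conj(chi(n)) for n, cn in conv.items())
rhs = sum(ad * np.conj(chi(d)) for d, ad in alpha.items()) * \
      sum(be * np.conj(chi(e)) for e, be in beta.items())
print(lhs, rhs)
```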

As we have seen in Notes 1, the von Mangoldt function {\Lambda} does indeed admit several factorisations of Dirichlet convolution type, such as the factorisation {\Lambda = \mu * L}. One can try directly inserting this factorisation into the above strategy; it almost works, but there turns out to be a problem when considering the contribution of the portion of {\mu} or {L} that is supported at very small natural numbers, as the large sieve loses any gain over the trivial bound in such settings. Because of this, there is a need for more sophisticated decompositions of {\Lambda} into Dirichlet convolutions {\alpha * \beta} which are non-degenerate in the sense that {\alpha,\beta} are supported away from small values. (As a non-example, the trivial factorisation {\Lambda = \Lambda * \delta} would be totally inappropriate for this purpose.) Fortunately, it turns out that through some elementary combinatorial manipulations, some satisfactory decompositions of this type are available, such as the Vaughan identity and the Heath-Brown identity. By using one of these identities we will be able to complete the proof of the Bombieri-Vinogradov theorem. (These identities are also useful for other applications in which one wishes to control correlations between the von Mangoldt function {\Lambda} and some other sequence; we will see some examples of this in later notes.)
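For the record, Vaughan's identity in one standard form (for parameters {U, V \geq 1}) reads {\Lambda = \Lambda_{\leq V} + \mu_{\leq U} * L - \mu_{\leq U} * \Lambda_{\leq V} * 1 + \mu_{>U} * \Lambda_{>V} * 1}, where subscripts denote restriction of support to the indicated range; the brute-force Python check below (with small, arbitrary cutoffs, and no attempt at efficiency) verifies this identity term by term.

```python
import math

def mobius_sieve(N):
    """Compute mu(1..N) with a linear sieve."""
    mu = [0] * (N + 1)
    mu[1] = 1
    primes, is_comp = [], [False] * (N + 1)
    for i in range(2, N + 1):
        if not is_comp[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > N:
                break
            is_comp[i * p] = True
            if i % p == 0:
                mu[i * p] = 0
                break
            mu[i * p] = -mu[i]
    return mu

def mangoldt_sieve(N):
    """Compute Lambda(1..N): log p on prime powers p^k, else 0."""
    lam = [0.0] * (N + 1)
    for p in range(2, N + 1):
        if all(p % q for q in range(2, int(p ** 0.5) + 1)):  # p prime
            pk = p
            while pk <= N:
                lam[pk] = math.log(p)
                pk *= p
    return lam

def dconv(a, b, N):
    """Dirichlet convolution of two arrays indexed 1..N."""
    c = [0.0] * (N + 1)
    for d in range(1, N + 1):
        if a[d]:
            for m in range(d, N + 1, d):
                c[m] += a[d] * b[m // d]
    return c

N, U, V = 300, 8, 8   # arbitrary small cutoffs
mu, lam = mobius_sieve(N), mangoldt_sieve(N)
L = [0.0] + [math.log(n) for n in range(1, N + 1)]
one = [0.0] + [1.0] * N
mu_small = [v if i <= U else 0 for i, v in enumerate(mu)]
mu_big = [v if i > U else 0 for i, v in enumerate(mu)]
lam_small = [v if i <= V else 0.0 for i, v in enumerate(lam)]
lam_big = [v if i > V else 0.0 for i, v in enumerate(lam)]

vaughan = [a + b - c + d for a, b, c, d in zip(
    lam_small,
    dconv(mu_small, L, N),
    dconv(dconv(mu_small, lam_small, N), one, N),
    dconv(dconv(mu_big, lam_big, N), one, N))]
err = max(abs(vaughan[n] - lam[n]) for n in range(1, N + 1))
print("max deviation from Lambda:", err)
```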

For further reading on these topics, including a significantly larger number of examples of the large sieve inequality, see Chapters 7 and 17 of Iwaniec and Kowalski.

Remark 5 We caution that the presentation given in this set of notes is highly ahistorical; we are using modern streamlined proofs of results that were first obtained by more complicated arguments.

— 1. The large sieve inequality —

We begin with a (slightly weakened) form of the large sieve inequality for additive characters, also known as the analytic large sieve inequality, first extracted explicitly by Davenport and Halberstam from previous work on the large sieve, and then refined further by many authors (see Remark 7 below).

Proposition 6 (Analytic large sieve inequality) Let {f: {\bf Z} \rightarrow {\bf C}} be a function supported on an interval {[M,M+N]} for some {M \in {\bf R}} and {N > 0}, and let {\xi_1,\dots,\xi_J \in {\bf R}/{\bf Z}} be {\delta}-separated for some {\delta > 0} (thus {\|\xi_i - \xi_j\|_{{\bf R}/{\bf Z}} \ge \delta} for all {1 \leq i < j \leq J}, where {\|\xi\|_{{\bf R}/{\bf Z}}} denotes the distance from {\xi} to the nearest integer). Then

\displaystyle  \sum_{j=1}^J |\sum_n f(n) e( - \xi_j n )|^2 \ll (N + \frac{1}{\delta}) \sum_n |f(n)|^2. \ \ \ \ \ (9)

One can view this proposition as a variant of the Plancherel identity

\displaystyle  \sum_{j=1}^N |\sum_{n=1}^N f(n) e( - jn / N )|^2 = N \sum_{n=1}^N |f(n)|^2

associated to the Fourier transform on a cyclic group {{\bf Z}/N{\bf Z}}. This identity also shows that apart from the implied constant, the bound (9) is essentially best possible.
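One can also test (9) numerically, for instance taking the frequencies to be Farey fractions of height {Q} (which are {1/Q^2}-separated on {{\bf R}/{\bf Z}}). The sketch below, with arbitrary random data, checks the inequality with the implied constant set to {1}, which is legitimate for a function supported on {N} consecutive integers by the sharp Montgomery-Vaughan form discussed in Remark 7 below.

```python
import numpy as np
from math import gcd

rng = np.random.default_rng(5)
M, N, Q = 1000, 200, 10
n = np.arange(M, M + N)     # N consecutive integers
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Farey fractions a/q with q <= Q are delta-separated for delta = 1/Q^2
xis = sorted({a / q for q in range(1, Q + 1) for a in range(q) if gcd(a, q) == 1})
delta = 1.0 / Q ** 2
lhs = sum(abs(np.sum(f * np.exp(-2j * np.pi * xi * n))) ** 2 for xi in xis)
rhs = (N + 1 / delta) * np.sum(np.abs(f) ** 2)
print(lhs, rhs)
```

For generic {f} the left-hand side is far below the right-hand side; near-extremal examples require {f} to correlate strongly with one of the characters.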

Proof: By increasing {N} (if {N < 1/\delta}) or decreasing {\delta} (if {1/\delta < N}), we can reduce to the case {N = \frac{1}{\delta}} without making the hypotheses any stronger. Thus the {\xi_j} are now {1/N}-separated, and our task is now to show that

\displaystyle  \sum_{j=1}^J |\sum_n f(n) e( - \xi_j n )|^2 \ll N \sum_n |f(n)|^2.

We now wish to apply Proposition 2 (with {{\bf N}} relabeled by {{\bf Z}}), but we need to choose a suitable weight function {\nu: {\bf Z} \rightarrow {\bf R}^+}. It turns out that it is advantageous for technical reasons to select a weight that has good Fourier-analytic properties.

Fix a smooth non-negative function {\psi: {\bf R} \rightarrow {\bf R}} supported on {[-1/4,1/4]} and not identically zero; we allow implied constants to depend on {\psi}. (Actually, the smoothness of {\psi} is not absolutely necessary for this argument; one could take {\psi = 1_{[-1/4,1/4]}} below if desired.) Consider the weight {\nu: {\bf R} \rightarrow {\bf R}^+} defined by

\displaystyle  \nu(n) := |\int_{\bf R} \psi( x ) e( x \frac{n-M}{N} )\ dx|^2.

Observe that {\nu \gg 1} for all {n \in [M,M+N]}. Thus, by Proposition 2 with this choice of {\nu}, we reduce to showing that

\displaystyle  \sum_{j=1}^J \sum_{j'=1}^J c_j \overline{c_{j'}} \sum_n \nu(n) e( (\xi_j - \xi_{j'}) n ) \ll N \ \ \ \ \ (10)

whenever {c_1,\dots,c_J} are complex numbers with {\sum_{j=1}^J |c_j|^2 = 1}.

We first consider the diagonal contribution {j=j'}. We can write {\nu} in terms of the Fourier transform {\hat \psi(\xi) := \int_{\bf R} \psi(x) e^{i x\xi}\ dx} of {\psi} as

\displaystyle  \nu(t) = |\hat \psi( 2\pi (t-M) / N )|^2,

and hence by Exercise 28 of Supplement 2, we have the bounds

\displaystyle  \nu(n) \ll \frac{1}{(1 + |n-M|/N)^2} \ \ \ \ \ (11)

(say) and thus

\displaystyle  \sum_n \nu(n) \ll N.

Thus the diagonal contribution {j=j'} of (10) is acceptable.

Now we consider an off-diagonal term {\sum_n \nu(n) e( (\xi_j - \xi_{j'}) n )} with {j \neq j'}. By the Poisson summation formula (Theorem 34 from Supplement 2), we can rewrite this as

\displaystyle  \sum_m \hat \nu( 2\pi (m + \xi_j - \xi_{j'}) ).

Now from the Fourier inversion formula, the function {\hat \psi} has Fourier transform supported on {[-1/4,1/4]}, and so {\nu} has Fourier transform supported in {[-\frac{\pi}{N}, \frac{\pi}{N}]}. Since {\xi_j - \xi_{j'}} is at least {1/N} from the nearest integer, we conclude that {\hat \nu( 2\pi (m + \xi_j - \xi_{j'}) )=0} for all integers {m}, and hence all the off-diagonal terms in fact vanish! The claim follows. \Box

Remark 7 If {M,N} are integers, one can in fact obtain the sharper bound

\displaystyle  \sum_{j=1}^J |\sum_n f(n) e( - \xi_j n )|^2 \leq (N + \frac{1}{\delta}) \sum_n |f(n)|^2,

a result of Montgomery and Vaughan (with {N} replaced by {N+1}, although it was observed subsequently by Paul Cohen that the additional {+1} term could be deleted by an amplification trick similar to those discussed in this previous post). See this survey of Montgomery for these results and for the (somewhat complicated) evolution of the large sieve, starting with the pioneering work of Linnik. However, in our applications the cruder form of the analytic large sieve inequality given by Proposition 6 will suffice.

Exercise 8 Let {f: {\bf N} \rightarrow {\bf C}} be a function supported on an interval {[M,M+N]}. Show that for any {A \geq 1}, one has

\displaystyle  |\sum_n f(n) e( - \xi n)| \leq A (\sum_n |f(n)|^2)^{1/2}

for all {\xi \in {\bf R}/{\bf Z}} outside of the union of {O( \frac{N}{A^2} )} intervals (or arcs) of length {1/N}. In particular, we have

\displaystyle  |\sum_n f(n) e( - \xi n)| \ll \sqrt{N} (\sum_n |f(n)|^2)^{1/2}

for all {\xi \in {\bf R}/{\bf Z}}, and the significantly superior estimate

\displaystyle  |\sum_n f(n) e( - \xi n)| \ll (\sum_n |f(n)|^2)^{1/2}

for most {\xi \in {\bf R}/{\bf Z}} (outside of at most (say) {N/2} intervals of length {1/N}).

Exercise 9 (Continuous large sieve inequality) Let {[M,M+N]} be an interval for some {M \in {\bf R}} and {N > 0}, and let {\xi_1,\dots,\xi_J \in {\bf R}} be {\delta}-separated.

  • (i) For any complex numbers {c_1,\dots,c_J}, show that

    \displaystyle  \int_M^{M+N} |\sum_{j=1}^J c_j e( \xi_j t )|^2\ dt \ll (N + \frac{1}{\delta}) \sum_{j=1}^J |c_j|^2.

    (Hint: replace the restriction of {t} to {[M,M+N]} with the weight {\nu(t)} used to prove Proposition 6.)

  • (ii) For any continuous {f: [M,M+N] \rightarrow {\bf C}}, show that

    \displaystyle  \sum_{j=1}^J |\int_M^{M+N} f(t) e( - \xi_j t )\ dt|^2 \ll (N + \frac{1}{\delta}) \int_M^{M+N} |f(t)|^2\ dt.

Now we establish a variant of the analytic large sieve inequality, involving Dirichlet characters, due to Bombieri and Davenport.

Proposition 10 (Large sieve inequality for characters) Let {f: {\bf Z} \rightarrow {\bf C}} be a function supported on an interval {[M,M+N]} for some {M \in {\bf R}} and {N > 0}, and let {Q \geq 1}. Then

\displaystyle  \sum_{q \leq Q} \frac{q}{\phi(q)} \sum^*_{\chi\ (q)} |\sum_n f(n) \overline{\chi(n)}|^2 \ll (N + Q^2) \sum_n |f(n)|^2, \ \ \ \ \ (12)

where {\sum^*_{\chi\ (q)}} denotes the sum over all primitive Dirichlet characters of modulus {q}.
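Before turning to the proof, here is a numerical sanity check of (12), restricted for simplicity to odd prime moduli (for which every non-principal character is primitive and can be constructed from a primitive root); the constant in the bound is taken to be {1}, as is permitted by the sharp form in Remark 12 below.

```python
import numpy as np

rng = np.random.default_rng(6)
M, N, Q = 50, 100, 13
n = np.arange(M, M + N)          # support on N consecutive integers
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)

def primitive_root(q):
    # smallest generator of (Z/qZ)^* for prime q (brute force)
    for g in range(2, q):
        if len({pow(g, a, q) for a in range(q - 1)}) == q - 1:
            return g

lhs = 0.0
for q in [3, 5, 7, 11, 13]:      # odd primes <= Q; here phi(q) = q - 1
    g = primitive_root(q)
    dlog = {pow(g, a, q): a for a in range(q - 1)}   # discrete logarithm table
    for k in range(1, q - 1):    # the non-principal (hence primitive) characters
        chi = np.array([np.exp(2j * np.pi * k * dlog[m % q] / (q - 1))
                        if m % q else 0 for m in n])
        lhs += q / (q - 1) * abs(np.sum(f * np.conj(chi))) ** 2
energy = np.sum(np.abs(f) ** 2)
rhs = (N + Q ** 2) * energy
print(lhs, rhs)
```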

Proof: By increasing {N} (if {N < Q^2}) or increasing {Q} (if {N > Q^2}), we may assume that {N=Q^2}, so {N \geq 1} and our task is now to show that

\displaystyle  \sum_{q \leq \sqrt{N}} \frac{q}{\phi(q)} \sum^*_{\chi\ (q)} |\sum_n f(n) \overline{\chi(n)}|^2 \ll N \sum_n |f(n)|^2.

Let {\nu} be the weight from Proposition 6. By Proposition 2 (and some relabeling), using the functions {(\frac{q}{\phi(q)})^{1/2} \chi}, it suffices to show that

\displaystyle  \sum_{q,q' \leq \sqrt{N}} \left(\frac{q}{\phi(q)}\right)^{1/2} \left(\frac{q'}{\phi(q')}\right)^{1/2} \sum^*_{\chi\ (q)} \sum^*_{\chi'\ (q')} c_\chi \overline{c_{\chi'}} \sum_n \nu(n) \chi(n) \overline{\chi'(n)} \ \ \ \ \ (13)

\displaystyle  \ll N

whenever {c_\chi} are complex numbers with

\displaystyle  \sum_{q \leq \sqrt{N}} \sum^*_{\chi\ (q)} |c_\chi|^2 = 1. \ \ \ \ \ (14)

We first establish that the diagonal contribution {(q,\chi)=(q',\chi')} to (13) is acceptable, that is to say that

\displaystyle  \sum_{q \leq \sqrt{N}} \frac{q}{\phi(q)} \sum^*_{\chi\ (q)} |c_\chi|^2 \sum_n \nu(n) |\chi(n)|^2 \ll N.

The function {|\chi(n)|^2} is the principal character of modulus {q}; it is periodic with period {q} and has mean value {\phi(q)/q}. In particular, since {q \leq \sqrt{N} \leq N}, we see that {\sum_{M' \leq n \leq M'+N} |\chi(n)|^2 \ll \frac{\phi(q)}{q} N} for any interval {[M',M'+N]} of length {N}. From (11) and a partition into intervals of length {N}, we see that

\displaystyle  \sum_n \nu(n) |\chi(n)|^2 \ll \frac{\phi(q)}{q} N

and the claim then follows from (14).

Now consider an off-diagonal term {\sum_n \nu(n) \chi(n) \overline{\chi'(n)}} with {\chi \neq \chi'}. As {\chi,\chi'} are primitive, this implies that {\chi \overline{\chi'}} is a non-principal character and thus has mean zero. Let {r} be the modulus of this character; by Fourier expansion we may write {\chi \overline{\chi'}} as a linear combination of the additive characters {n \mapsto e( k n / r )} for {1 \leq k < r}. (We can obtain explicit coefficients for this expansion by invoking Lemma 48 of Supplement 3, but we will not need those coefficients here.) Thus {\sum_n \nu(n) \chi(n) \overline{\chi'(n)}} is a linear combination of the quantities {\sum_n \nu(n) e( kn/r)} for {1 \leq k < r}. But the modulus {r} is the least common multiple of the moduli of {\chi,\chi'}, so in particular {r \leq N}, while as observed in the proof of Proposition 6, we have {\sum_n \nu(n) e( \theta n ) = 0} whenever {\|\theta\|_{{\bf R}/{\bf Z}} \geq 1/(2N)}. Since {\|k/r\|_{{\bf R}/{\bf Z}} \geq 1/r \geq 1/N} for {1 \leq k < r}, the off-diagonal terms all vanish, and the claim follows. \Box

One can also derive Proposition 10 from Proposition 6:

Exercise 11 Let {f: {\bf Z} \rightarrow {\bf C}} be a finitely supported sequence.

  • (i) For any natural number {q}, establish the identity

    \displaystyle  \sum_{a \in ({\bf Z}/q{\bf Z})^\times} |\sum_n f(n) e(an/q)|^2

    \displaystyle = \frac{1}{\phi(q)} \sum_{\chi\ (q)} |\sum_n f(n) \sum_{a \in {\bf Z}/q{\bf Z}} \chi(a) e(an/q)|^2

    where {({\bf Z}/q{\bf Z})^\times} is the set of congruence classes {a\ (q)} coprime to {q}, and {\sum_{\chi\ (q)}} is the sum over all characters (not necessarily primitive) of modulus {q}.

  • (ii) For any natural number {q}, establish the inequality

    \displaystyle  \frac{q}{\phi(q)} \sum^*_{\chi\ (q)} |\sum_n f(n) \overline{\chi(n)}|^2 \leq \sum_{a \in ({\bf Z}/q{\bf Z})^\times} |\sum_n f(n) e(an/q)|^2

    and use this and Proposition 6 to derive Proposition 10. (Hint: use Lemma 48 from Supplement 3.)

Remark 12 By combining the arguments in the above exercise with the results in Remark 7, one can sharpen Proposition 10 to

\displaystyle  \sum_{q \leq Q} \frac{q}{\phi(q)} \sum^*_{\chi\ (q)} |\sum_n f(n) \overline{\chi(n)}|^2 \leq (N + Q^2) \sum_n |f(n)|^2,

that is to say one can delete the implied constant. See this paper of Montgomery and Vaughan for some further refinements of this inequality.

— 2. The Barban-Davenport-Halberstam theorem —

We now apply the large sieve inequality for characters to obtain an analogous inequality for arithmetic progressions, due independently to Barban, and to Davenport and Halberstam; we state a slightly weakened form of that theorem here. For any finitely supported arithmetic function {f: {\bf N} \rightarrow {\bf C}} and any primitive residue class {a\ (q)}, we introduce the discrepancy

\displaystyle  \Delta(f; a\ (q)) := \sum_{n: n = a\ (q)} f(n) - \frac{1}{\phi(q)} \sum_{n: (n,q)=1} f(n).

This quantity measures the extent to which {f} is well distributed among the primitive residue classes modulo {q}. From multiplicative Fourier inversion (see Theorem 69 from Notes 1) we have the identity

\displaystyle  \Delta(f; a\ (q)) = \frac{1}{\phi(q)} \sum_{\chi\ (q): \chi \neq \chi_0} \chi(a) \sum_n f(n) \overline{\chi(n)} \ \ \ \ \ (15)

where the sum is over non-principal characters {\chi} of modulus {q}.
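The identity (15) can be confirmed numerically; the toy check below uses the modulus {q=5}, with the character group generated by the character sending {2 \mapsto i}, and a random {f}.

```python
import numpy as np

rng = np.random.default_rng(7)
q, phi_q = 5, 4
ns = np.arange(1, 101)
f = rng.standard_normal(len(ns))      # random f supported on [1, 100]

dlog = {1: 0, 2: 1, 4: 2, 3: 3}       # discrete logarithms base 2 mod 5
def chi(k, m):
    # the character chi_k with chi_1(2) = i; chi_0 is principal
    r = m % q
    return 0 if r == 0 else np.exp(2j * np.pi * k * dlog[r] / phi_q)

for a in [1, 2, 3, 4]:
    # discrepancy Delta(f; a (q)) computed directly from the definition...
    direct = f[ns % q == a].sum() - f[np.gcd(ns, q) == 1].sum() / phi_q
    # ...and via the character expansion (15), over non-principal characters
    via_chars = sum(chi(k, a) * np.sum(f * np.conj([chi(k, m) for m in ns]))
                    for k in [1, 2, 3]) / phi_q
    assert abs(direct - via_chars) < 1e-9
print("identity (15) verified for q = 5")
```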

Theorem 13 (Barban-Davenport-Halberstam) Let {x > 2}, and let {f: {\bf N} \rightarrow {\bf C}} be a function supported on {[1,x]} with the property that

\displaystyle  \sum_n |f(n)|^2 \ll x \log^{O(1)} x \ \ \ \ \ (16)

and obeying the Siegel-Walfisz property

\displaystyle  \Delta(f 1_{(\cdot,s)=1}; a\ (r)) \ll_A x \log^{-A} x \ \ \ \ \ (17)

for any fixed {A>0}, any primitive residue class {a\ (r)}, and any {1 \leq s \leq x}. Then one has

\displaystyle  \sum_{q \leq Q} \sum_{a \in ({\bf Z}/q{\bf Z})^\times} |\Delta(f; a\ (q))|^2 \ll_A x^2 \log^{-A} x \ \ \ \ \ (18)

for any {A > 0}, provided that {Q \leq x \log^{-B} x} for some sufficiently large {B = B(A)} depending only on {A}.

Informally, (18) is asserting that

\displaystyle  \Delta(f; a\ (q)) \ll_A \frac{x}{\phi(q)} \log^{-A} x

for “most” primitive residue classes {a\ (q)} with {q} much smaller than {x}; in most applications, the trivial bounds on {\Delta(f; a\ (q))} are of the type {O( \frac{x}{\phi(q)} \log^{O(1)} x )}, so this represents a savings of an arbitrary power of a logarithm on the average. Note that a direct application of (17) only gives (18) for {Q} of size {\log^{O(1)} x}; it is the large sieve which allows for the significant enlargement of {Q}.

Proof: Let {x, f, Q, A} be as above, with {Q \leq x \log^{-B} x} for some large {B} to be chosen later. From (15) and the Plancherel identity one has

\displaystyle  \sum_{a \in ({\bf Z}/q{\bf Z})^\times} |\Delta(f; a\ (q))|^2 = \frac{1}{\phi(q)} \sum_{\chi\ (q): \chi \neq \chi_0} |\sum_n f(n) \overline{\chi(n)}|^2

so our task is to show that

\displaystyle  \sum_{q \leq Q} \frac{1}{\phi(q)} \sum_{\chi\ (q): \chi \neq \chi_0} |\sum_n f(n) \overline{\chi(n)}|^2 \ll_A x^2 \log^{-A} x. \ \ \ \ \ (19)

We cannot apply the large sieve inequality yet, because the characters {\chi} here are not necessarily primitive. But we may express any non-principal character {\chi} of modulus {q} as {\chi(n) = \tilde \chi(n) 1_{(n,s)=1}} for some primitive character {\tilde \chi} of conductor {r>1}, where {r,s,t} are natural numbers with {rst = q} (for instance, one can take {s := q/r} and {t := 1}). In particular, {r,s,t \leq Q} and {\frac{1}{\phi(q)} \leq \frac{1}{\phi(s)} \frac{1}{\phi(r)} \frac{1}{\phi(t)}}. Thus we may (somewhat crudely) upper bound the left-hand side of (19) by

\displaystyle  \sum_{t \leq Q} \frac{1}{\phi(t)} \sum_{s \leq Q} \frac{1}{\phi(s)} \sum_{1 < r \leq Q} \frac{1}{\phi(r)} \sum_{\tilde \chi\ (r)}^* |\sum_n f(n) 1_{(n,s)=1} \overline{\tilde \chi(n)}|^2.

From Theorem 27 of Notes 1 we have {\sum_{s \leq Q} \frac{1}{\phi(s)} \ll \log x}, and similarly {\sum_{t \leq Q} \frac{1}{\phi(t)} \ll \log x}, so we may bound the above by

\displaystyle  (\log x)^2 \sup_{s \leq Q} \sum_{1 < r \leq Q} \frac{1}{\phi(r)} \sum_{\tilde \chi\ (r)}^* |\sum_n f(n) 1_{(n,s)=1} \overline{\tilde \chi(n)}|^2.

By dyadic decomposition (and adjusting {A} slightly), it thus suffices to show that

\displaystyle  \sum_{R < r \leq 2R} \frac{1}{\phi(r)} \sum_{\tilde \chi\ (r)}^* |\sum_n f(n) 1_{(n,s)=1} \overline{\tilde \chi(n)}|^2 \ll_A x^2 \log^{-A} x \ \ \ \ \ (20)

for any {1 \leq R \leq Q} and {s \leq Q}.

From Proposition 10 and (16), we may bound the left-hand side of (20) by {\frac{1}{R} (x + R^2) x \log^{O(1)} x}. If {R \geq \log^B x} and {R \leq Q \leq x \log^{-B} x}, then we obtain (20) if {B} is sufficiently large depending on {A}. The only remaining case to consider is when {R < \log^B x}. But from the Siegel-Walfisz hypothesis (17) we easily see that

\displaystyle  \sum_n f(n) 1_{(n,s)=1} \overline{\tilde \chi(n)} \ll_{A'} R x \log^{-A'} x

for any {A' > 0} and any primitive character {\tilde \chi} of conductor {R < r \leq 2R}. Since the total number of primitive characters appearing in (20) is {O(R^2) = O( \log^{2B} x)}, the claim follows by taking {A'} large enough. \Box

One can specialise this to the von Mangoldt function:

Exercise 14 Use the Barban-Davenport-Halberstam theorem and the Siegel-Walfisz theorem (Exercise 64 from Notes 2) to conclude that

\displaystyle  \sum_{q \leq Q} \sum_{a \in ({\bf Z}/q{\bf Z})^\times} |\Delta(\Lambda 1_{[1,x]}; a\ (q))|^2 \ll_A x^2 \log^{-A} x \ \ \ \ \ (21)

for any {A > 0}, provided that {Q \leq x \log^{-B} x} for some sufficiently large {B = B(A)} depending only on {A}. Obtain a similar claim with {\Lambda} replaced by the Möbius function.
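As a crude numerical illustration of (21) (with hypothetical and unrealistically small parameters {x = 10^4} and {Q = 20}; nothing at this scale probes the logarithmic savings, only the order of magnitude), one can sieve {\Lambda} and compute the left-hand side of (21) directly:

```python
import math

x, Q = 10_000, 20

# sieve the von Mangoldt function: Lambda(p^k) = log p, Lambda(n) = 0 otherwise
lam = [0.0] * (x + 1)
is_prime = [True] * (x + 1)
for p in range(2, x + 1):
    if is_prime[p]:
        for m in range(2 * p, x + 1, p):
            is_prime[m] = False
        pk = p
        while pk <= x:
            lam[pk] = math.log(p)
            pk *= p

lhs = 0.0
for q in range(2, Q + 1):
    units = [a for a in range(q) if math.gcd(a, q) == 1]
    class_sums = [0.0] * q
    for n in range(1, x + 1):
        class_sums[n % q] += lam[n]
    # 1/phi(q) times the sum of Lambda(n) over n coprime to q
    avg = sum(class_sums[a] for a in units) / len(units)
    lhs += sum((class_sums[a] - avg) ** 2 for a in units)

print(f"LHS of (21) = {lhs:.1f}; trivial scale x^2 = {x * x}")
```

One should find a value far below {x^2}.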

Remark 15 Recall that the implied constants in the Siegel-Walfisz theorem depended on {A} in an ineffective fashion. As such, the implied constants in (21) also depend ineffectively on {A}. However, if one replaces Siegel’s theorem by an effective substitute such as Tatuzawa’s theorem (see Theorem 62 of Notes 2) or the Landau-Page theorem (Theorem 53 of Notes 2), one can obtain an effective version of the Siegel-Walfisz theorem for all moduli {q} that are not multiples of a single exceptional modulus {q_*}. One can then obtain an effective version of (21) if one restricts to moduli {q} that are not multiples of {q_*}. Similarly for the Bombieri-Vinogradov theorem in the next section. Such variants of the Barban-Davenport-Halberstam theorem or Bombieri-Vinogradov theorem can be used as a substitute in some applications to remove any ineffective dependence of constants, at the cost of making the argument slightly more convoluted.

— 3. The Bombieri-Vinogradov theorem —

The Barban-Davenport-Halberstam theorem controls the discrepancy {\Delta(f; a\ (q))} after averaging in both the modulus {q} and the residue class {a}. For many problems in sieve theory, it turns out to be more important to control the discrepancy with an averaging only in the modulus {q}, with the residue class {a} being allowed to vary in {q} in the “worst-case” fashion. Specifically, one often wishes to control expressions of the form

\displaystyle  \sum_{q \leq Q} \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\Delta(f; a\ (q))| \ \ \ \ \ (22)

for some finitely supported {f: {\bf N} \rightarrow {\bf C}} and {Q>1}. This expression is difficult to control for arbitrary {f}, but it turns out that one can obtain a good bound if {f} is expressible as a Dirichlet convolution {f = \alpha*\beta} for some suitably “non-degenerate” sequences {\alpha,\beta}. More precisely, we have the following general form of the Bombieri-Vinogradov theorem, first articulated by Motohashi:

Theorem 16 (General Bombieri-Vinogradov theorem) Let {x > 2}, let {M,N \geq 1} be such that {MN \ll x}, and let {\alpha,\beta: {\bf N} \rightarrow {\bf C}} be arithmetic functions supported on {[1,M]}, {[1,N]} respectively, with

\displaystyle  \sum_m |\alpha(m)|^2 \ll M \log^{O(1)} x \ \ \ \ \ (23)


and

\displaystyle  \sum_n |\beta(n)|^2 \ll N \log^{O(1)} x. \ \ \ \ \ (24)

Suppose that {\beta} obeys the Siegel-Walfisz property

\displaystyle  \Delta(\beta 1_{(\cdot,s)=1}; a\ (r)) \ll_A N \log^{-A} x \ \ \ \ \ (25)

for all {A > 0}, all primitive residue classes {a\ (r)}, and all {1 \leq s \leq x}. Then one has

\displaystyle  \sum_{q \leq Q} \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\Delta(\alpha * \beta; a\ (q))| \ll_{A} x \log^{-A} x \ \ \ \ \ (26)

for any {A > 0}, provided that {Q \leq x^{1/2} \log^{-B} x} and {M,N \geq \log^B x} for some sufficiently large {B = B(A)} depending on {A}.

Proof: We adapt the arguments of the previous section. From (15) and the triangle inequality, we have

\displaystyle  \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\Delta(\alpha*\beta; a\ (q))| \leq \frac{1}{\phi(q)} \sum_{\chi\ (q): \chi \neq \chi_0} |\sum_n \alpha*\beta(n) \overline{\chi(n)}|

and so we can upper bound the left-hand side of (26) by

\displaystyle  \sum_{q \leq Q} \frac{1}{\phi(q)} \sum_{\chi\ (q): \chi \neq \chi_0} |\sum_n \alpha*\beta(n) \overline{\chi(n)}|.

As in the previous section, we may reduce {\chi} to primitive characters and bound this expression by

\displaystyle  \ll (\log x)^2 \sup_{s \leq Q} \sum_{1 < r \leq Q} \frac{1}{\phi(r)} \sum_{\tilde \chi\ (r)}^* |\sum_n \alpha*\beta(n) 1_{(n,s)=1} \overline{\tilde \chi(n)}|.

By dyadic decomposition (and adjusting {A} slightly), it thus suffices to show that

\displaystyle  \sum_{R < r \leq 2R} \frac{1}{\phi(r)} \sum_{\tilde \chi\ (r)}^* |\sum_n \alpha*\beta(n) 1_{(n,s)=1} \overline{\tilde \chi(n)}| \ll_A x \log^{-A} x \ \ \ \ \ (27)

for all {1 \leq R \leq Q} and {s \leq Q}, and any {A>1}, assuming {Q \leq x^{1/2} \log^{-B} x} and {M,N \geq \log^B x} with {B} sufficiently large depending on {A}.

We cannot yet easily apply the large sieve inequality, because the character sums here are not squared. But we now crucially exploit the Dirichlet convolution structure using the identity (8), to factor {\sum_n \alpha*\beta(n) 1_{(n,s)=1} \overline{\tilde \chi(n)}} as the product of {\sum_m \alpha(m) 1_{(m,s)=1} \overline{\tilde \chi(m)}} and {\sum_n \beta(n) 1_{(n,s)=1} \overline{\tilde \chi(n)}}. From the Cauchy-Schwarz inequality, we may thus bound (27) by the geometric mean of

\displaystyle  \sum_{R < r \leq 2R} \frac{1}{\phi(r)} \sum_{\tilde \chi\ (r)}^* |\sum_m \alpha(m) 1_{(m,s)=1} \overline{\tilde \chi(m)}|^2 \ \ \ \ \ (28)


and

\displaystyle  \sum_{R < r \leq 2R} \frac{1}{\phi(r)} \sum_{\tilde \chi\ (r)}^* |\sum_n \beta(n) 1_{(n,s)=1} \overline{\tilde \chi(n)}|^2. \ \ \ \ \ (29)

Now we have the all-important square needed in the large sieve inequality. From (23), (24) and Proposition 10, we may bound (28) by

\displaystyle  \ll \frac{1}{R} (M + R^2) M \log^{O(1)} x \ \ \ \ \ (30)

and (29) by

\displaystyle  \ll \frac{1}{R} (N + R^2) N \log^{O(1)} x

and so (27) is bounded by

\displaystyle  \ll ( \frac{MN}{R} + M N^{1/2} + M^{1/2} N + R (MN)^{1/2} ) \log^{O(1)} x.

Since {MN \leq x}, we can write this as

\displaystyle  \ll ( \frac{1}{R} + \frac{1}{N^{1/2}} + \frac{1}{M^{1/2}} + \frac{R}{x^{1/2}} ) x \log^{O(1)} x.

Since {N,M \geq \log^B x} and {R \leq Q \leq x^{1/2} \log^{-B} x}, we obtain (27) if {R \geq \log^B x} and {B} is sufficiently large depending on {A}. The only remaining case to handle is when {R \leq \log^B x}. In this case, we can use the Siegel-Walfisz hypothesis (25) as in the previous section to bound (29) by {O_{A'}( N^2 \log^{-A'} x)} for any {A'>0}. Meanwhile, from (30), (28) is bounded by {O( M^2 \log^{O(B+1)} x )}. By taking {A'} sufficiently large, we conclude (27) in this case also. \Box

In analogy with Exercise 14, we would like to apply this general result to specific arithmetic functions, such as the von Mangoldt function {\Lambda} or the Möbius function {\mu}, and in particular to prove the following famous result of Bombieri and of A. I. Vinogradov (not to be confused with the better known number theorist I. M. Vinogradov):

Theorem 17 (Bombieri-Vinogradov theorem) Let {x \geq 2}. Then one has

\displaystyle  \sum_{q \leq Q} \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\Delta(\Lambda 1_{[1,x]}; a\ (q))| \ll_{A} x \log^{-A} x \ \ \ \ \ (31)

for any {A > 0}, provided that {Q \leq x^{1/2} \log^{-B} x} for some sufficiently large {B = B(A)} depending on {A}.

Informally speaking, the Bombieri-Vinogradov theorem asserts that for “almost all” moduli {q} that are significantly less than {x^{1/2}}, one has

\displaystyle  |\Delta(\Lambda 1_{[1,x]}; a\ (q))| \ll_A \frac{x}{\phi(q)} \log^{-A} x

for all primitive residue classes {a\ (q)} to this modulus. This should be compared with the Generalised Riemann Hypothesis (GRH), which gives the bound

\displaystyle  |\Delta(\Lambda 1_{[1,x]}; a\ (q))| \ll x^{1/2} \log^2 x

for all {q \leq x^{1/2}}; see Exercise 48 of Notes 2. Thus one can view the Bombieri-Vinogradov theorem as an assertion that the GRH holds (with a slightly weaker error term) “on average”, at least insofar as the impact of GRH on the prime number theorem in arithmetic progressions is concerned.

The initial arguments of Bombieri and Vinogradov were somewhat complicated, in particular involving the explicit formula for {L}-functions (Exercise 45 of Notes 2); the modern proof of the Bombieri-Vinogradov theorem avoids this and proceeds instead through Theorem 16 (or a close cousin thereof). Note that this theorem generalises the Siegel-Walfisz theorem (Exercise 64 of Notes 2), which is equivalent to the special case of Theorem 17 when {Q = \log^{O(1)} x}.

The obvious thing to try when proving Theorem 17 using Theorem 16 is to use one of the basic factorisations of such functions into Dirichlet convolutions, e.g. {\Lambda = \mu * L}, and then to decompose that convolution into pieces {\alpha*\beta} of the form required in Theorem 16; we will refer to such convolutions as Type II convolutions, loosely following the terminology of Vaughan. However, one runs into a problem coming from the components of the factors {\mu, L} supported at small numbers (of size {n = O(\log^{O(1)} x)}), as the parameters {M,N} associated to those components cannot obey the conditions {MN \ll x}, {M, N \geq \log^B x}. Indeed, observe that any Type II convolution {\alpha * \beta} will necessarily vanish at primes of size comparable to {x}, and so one cannot possibly represent functions such as {\Lambda} or {\mu} purely in terms of such Type II convolutions.

However, it turns out that we can still decompose functions such as {\Lambda,\mu} into two types of convolutions: not just the Type II convolutions considered above, but also a further class of Type I convolutions {\alpha * \beta}, in which one of the factors, say {\beta}, is very slowly varying (or “smooth”) and supported on a very long interval, e.g. {\beta = 1_{[N,2N]}} for some large {N}; in such convolutions {\alpha} is permitted to be concentrated arbitrarily close to {n=1}, and in particular the Type I convolution can be non-zero on primes comparable to {x}. It turns out that bounding the discrepancy of Type I convolutions is relatively easy, and leads to a proof of Theorem 17.

We turn to the details. There are a number of decompositions of {\Lambda} or {\mu} that one could use to accomplish the desired task. One popular choice of decomposition is the Vaughan identity, which may be compared with the decompositions appearing in the Dirichlet hyperbola method (see Section 3 of Notes 1):

Lemma 18 (Vaughan identity) For any {U,V > 1}, one has

\displaystyle  \Lambda = \Lambda_{\leq V} + \mu_{\leq U} * L - \mu_{\leq U} * \Lambda_{\leq V} * 1 + \mu_{>U} * \Lambda_{>V} * 1 \ \ \ \ \ (32)

where {\Lambda_{\leq V}(n) :=\Lambda(n) 1_{n \leq V}}, {\Lambda_{> V}(n) :=\Lambda(n) 1_{n > V}}, {\mu_{\leq U}(n) :=\mu(n) 1_{n \leq U}}, and {\mu_{> U}(n) :=\mu(n) 1_{n > U}}.

In this decomposition, {U} and {V} are typically two small powers of {x} (e.g. {U=V=x^{1/5}}), although the exact choice of {U,V} is often not of critical importance. The terms {\mu_{\leq U} * L} and {\mu_{\leq U} * \Lambda_{\leq V} * 1} are “Type I” convolutions, while the term {\mu_{>U} * \Lambda_{>V} * 1} should be considered a “Type II” convolution. The term {\Lambda_{\leq V}} is a lower order error that is usually disposed of quite quickly. The Vaughan identity is already strong enough for many applications, but in some more advanced applications (particularly those in which one exploits the structure of triple or higher convolutions) it becomes convenient to use more sophisticated identities such as the Heath-Brown identity, which we will discuss in later notes.

Proof: Since {\mu = \mu_{\leq U} + \mu_{> U}} and {\Lambda = \Lambda_{\leq V} + \Lambda_{>V}}, we have

\displaystyle  \mu * \Lambda = \mu * \Lambda_{\leq V} + \mu_{\leq U} * \Lambda - \mu_{\leq U} * \Lambda_{\leq V} + \mu_{>U}* \Lambda_{>V}.

Convolving both sides by {1} and using the identities {\mu*1 =\delta} and {\Lambda*1=L}, we obtain the claim. \Box
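Since (32) is an exact identity, it can also be checked mechanically. Here is a self-contained Python sketch (the cutoffs {U=V=5} and the range {n \leq 300} are hypothetical choices) that evaluates both sides of the Vaughan identity pointwise via Dirichlet convolutions:

```python
import math

N, U, V = 300, 5, 5

def mobius(n):
    # Moebius function by trial division
    res = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # square factor
            res = -res
        p += 1
    return -res if n > 1 else res

def von_mangoldt(n):
    # log p if n is a power of the prime p, else 0
    for p in range(2, n + 1):
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return 0.0

def conv(f, g):
    # Dirichlet convolution, truncated to [1, N]
    h = [0.0] * (N + 1)
    for d in range(1, N + 1):
        if f[d]:
            for m in range(d, N + 1, d):
                h[m] += f[d] * g[m // d]
    return h

mu = [0] + [mobius(n) for n in range(1, N + 1)]
lam = [0.0] + [von_mangoldt(n) for n in range(1, N + 1)]
L = [0.0] + [math.log(n) for n in range(1, N + 1)]
one = [0.0] + [1.0] * N
mu_le = [v if n <= U else 0 for n, v in enumerate(mu)]
mu_gt = [v if n > U else 0 for n, v in enumerate(mu)]
lam_le = [v if n <= V else 0.0 for n, v in enumerate(lam)]
lam_gt = [v if n > V else 0.0 for n, v in enumerate(lam)]

t2 = conv(mu_le, L)
t3 = conv(conv(mu_le, lam_le), one)
t4 = conv(conv(mu_gt, lam_gt), one)
max_err = max(abs(lam[n] - (lam_le[n] + t2[n] - t3[n] + t4[n]))
              for n in range(1, N + 1))
print("max pointwise error in (32):", max_err)
```

(The truncation to {[1,N]} is harmless here, since the value of a Dirichlet convolution at {n} only involves the factors at divisors of {n}.)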

Armed with this identity and Theorem 16 we may now finish off the proof of the Bombieri-Vinogradov theorem. We may assume that {x} is large (depending on {A}) as the claim is trivial otherwise. We apply the Vaughan identity (32) with {U=V=x^{1/5}} (actually for our argument below, any choice of {U,V} with {U,V \geq \log^B x} and {UV \leq x^{1/2} \log^{-B} x} would have sufficed). By the triangle inequality, it now suffices to establish (31) with {\Lambda} replaced by {\Lambda_{\leq V}}, {\mu_{\leq U} * L}, {\mu_{\leq U} * \Lambda_{\leq V}*1}, and {\mu_{>U}*\Lambda_{>V}*1}.

The term {\Lambda_{\leq V}} is easily disposed of: from the triangle inequality (and crudely bounding {\Lambda_{\leq V}} by {\log x 1_{\leq V}}) we see that

\displaystyle  \Delta( \Lambda_{\leq V} 1_{[1,x]}; a\ (q)) \ll \frac{V}{\phi(q)} \log x

and the claim follows since {\sum_{q \leq Q} \frac{1}{\phi(q)} \ll \log x} and {V = x^{1/5}}.

Next, we deal with the Type II convolution {\mu_{>U} *\Lambda_{>V}*1}. The presence of the {1_{[1,x]}} cutoff is slightly annoying (it prevents one from directly applying Proposition 2), but we will deal with this by using the following finer-than-dyadic decomposition trick, originally due to Fouvry and to Fouvry-Iwaniec. We may replace {\mu_{>U} * 1} by {(\mu_{>U}*1) 1_{(U,x]}}, since the portion of {\mu_{>U}*1} on {(x,\infty)} has no contribution. We may similarly replace {\Lambda_{>V}} by {\Lambda 1_{(V,x]}}. Next, we set {\lambda := 1 + \log^{-A-100} x}, and decompose {(\mu_{>U}*1) 1_{(U,x]}} into {O(\log^{A+101} x)} components {\alpha}, each of which is supported in an interval {[M, \lambda M]} and bounded in magnitude by {\tau} for some {M \geq U}. We similarly decompose {\Lambda 1_{(V,x]}} into {O(\log^{A+101} x)} components {\beta} supported in an interval {[N, \lambda N]} and bounded in magnitude by {\log x} for some {N \geq V}. Thus {(\mu_{>U} * \Lambda_{>V} * 1) 1_{[1,x]}} can be decomposed into {O( \log^{2A+202} x )} terms of the form {(\alpha*\beta)1_{[1,x]}} for various {\alpha,\beta} that are components of {\mu_{>U}*1} and {\Lambda_{>V}} respectively.

If {MN > x} then {(\alpha*\beta)1_{[1,x]}} vanishes, so we may assume that {MN \leq x}. By construction we also have {M,N \geq x^{1/5}}, so in particular {M,N \geq \log^B x} if {B} depends only on {A} (recall we are assuming {x} to be large). If {MN < \lambda^{-2} x}, then {(\alpha*\beta)1_{[1,x]} = \alpha*\beta}. The bounds (23), (24) are clear (bounding {\mu_{>U}*1} in magnitude by {\tau}), and from the Siegel-Walfisz theorem we see that the {\beta} components obey the hypothesis (25). Thus by applying Theorem 16 (with {A} replaced by {3A+202}) we see that the total contribution of all the {\alpha*\beta} terms with {MN < \lambda^{-2} x} is acceptable.

It remains to control the total contribution of the {\alpha*\beta} terms with {\lambda^{-2} x \leq MN \leq x}. Note that for each {\alpha} there are only {O(1)} choices of {\beta} that are in this range, so there are only {O( \log^{A+101} x )} such terms {\alpha*\beta} to deal with. We then crudely bound

\displaystyle  \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\Delta( (\alpha*\beta) 1_{[1,x]}; a\ (q) )| \ll \sup_{a \in ({\bf Z}/q{\bf Z})^\times} \sum_{n = a\ (q)} |\alpha| * |\beta|(n)

\displaystyle  \ll (\log x) \sup_{a \in ({\bf Z}/q{\bf Z})^\times} \sum_{M \leq m \leq \lambda M, N \leq n \leq \lambda N: mn = a\ (q)} \tau(m).

Since {MN \geq \lambda^{-2} x} and {q \leq x^{1/2}}, one has either {M \gg q} or {N \gg q}. If {N \gg q}, we observe that for each fixed {m} in the above sum, there are {O( \frac{N}{q} \log^{-A-100} x )} choices of {n} that contribute, so the double sum is {O( \frac{MN}{q} \log^{-2A-199} x ) = O( \frac{x}{q} \log^{-2A-199} x )} (using the mean value bounds for {\tau}). Thus we see that the total contribution of this case is at most

\displaystyle  \ll \log^{A+101} x \times \log x \times \log^{-2A-199} x \times \sum_{q \leq Q} \frac{x}{q}

which is acceptable.

Now we consider the case when {M \gg q}. For fixed {n} in the above sum, the sum in {m} can be bounded by {\log^{65} x \times \frac{M}{q} \log^{-A-100} x}, thanks to Corollary 26 below (taking, say, {c_1 =1/2} and {c_2 = 5/6}). The total contribution of this sum is then

\displaystyle  \ll \log^{A+101} x \times \log x \times \log^{65} x \times \log^{-A-100} x \times \log^{-A-100} x \times \sum_{q \leq Q} \frac{x}{q}

which is also acceptable.

Now let us consider a Type I term {\mu_{\leq U}*L}. From the triangle inequality we have

\displaystyle  |\Delta((\mu_{\leq U} * L) 1_{[1,x]}; a\ (q))| \leq \sum_{d: (d,q)=1} |\mu_{\leq U}(d)| |\Delta( L 1_{[1,x/d]}; a/d\ (q) )|.

We exploit the monotonicity of {L} via the following simple fact:

Exercise 19 Let {f: [y,x] \rightarrow {\bf R}} be a monotone function. Show that

\displaystyle  |\Delta( f 1_{[y,x]}; a\ (q) )| \ll |f(y)| + |f(x)|

for all primitive residue classes {a\ (q)}. (Hint: use Lemma 2 from Notes 1 and a change of variables.)
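As a numerical illustration of this exercise (with hypothetical parameters, and checking the bound only up to a generous constant factor), one can take the monotone function {f = \log} on {[10, 10^4]} and the primitive class {3\ (7)}:

```python
import math

y, x, q, a, phi_q = 10, 10_000, 7, 3, 6

# discrepancy of f = log on [y, x] at the class a (q), directly from the definition
in_class = sum(math.log(n) for n in range(y, x + 1) if n % q == a)
coprime = sum(math.log(n) for n in range(y, x + 1) if n % q != 0)
delta = in_class - coprime / phi_q
bound = math.log(y) + math.log(x)  # |f(y)| + |f(x)|
print(f"|Delta| = {abs(delta):.2f} versus |f(y)| + |f(x)| = {bound:.2f}")
```

Note the contrast with the size of the individual sums, which here are of order {\frac{x}{q} \log x}.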

Applying this exercise, we see that the contribution of this term to (31) is {O( \sum_{q \leq Q} \sum_d |\mu_{\leq U}(d)| \log x ) = O( Q U \log x )}, which is acceptable since {Q \leq x^{1/2}} and {U = x^{1/5}}.

A similar argument for the Type I term {\mu_{\leq U} * \Lambda_{\leq V} * 1} term (with {\mu_{\leq U} * \Lambda_{\leq V}} and {1} replacing {\mu_{\leq U}} and {L}) gives a contribution to (31) of

\displaystyle  \ll \sum_{q \leq Q} \sum_d |\mu_{\leq U} * \Lambda_{\leq V}|(d)

\displaystyle  \ll Q \left(\sum_d |\mu_{\leq U}|(d)\right) \left(\sum_m \Lambda_{\leq V}(m)\right)

\displaystyle  \ll Q U V

which is also acceptable since {Q \leq x^{1/2}} and {U=V=x^{1/5}}. This concludes the proof of Theorem 17.

Exercise 20 Strengthen the Bombieri-Vinogradov theorem by showing that

\displaystyle  \sum_{q \leq Q} \sup_{y \leq x} \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\Delta(\Lambda 1_{[1,y]}; a\ (q))| \ll_{A} x \log^{-A} x

for all {x \geq 2} and {A>0}, if {Q \leq x^{1/2} \log^{-B} x} for some sufficiently large {B} depending on {A}. (Hint: at present {y} ranges over an uncountable number of values, but if one can round {y} to the nearest multiple of (say) {x \log^{-A-10} x} then there are only {O(\log^{A+10} x)} values of {y} in the supremum that need to be considered. Then use the original Bombieri-Vinogradov theorem as a black box.)

Exercise 21 For any {U, V > 1}, establish the Vaughan-type identity

\displaystyle  \mu = \mu_{\leq V} + \mu_{\leq U} - \mu_{\leq U} * \mu_{\leq V} * 1 + \mu_{>U} * \mu_{>V} * 1

and use this to show that the Bombieri-Vinogradov theorem continues to hold when {\Lambda} is replaced by {\mu}.
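Before attempting the exercise, one can at least verify the displayed identity numerically in exact integer arithmetic; the following sketch uses the hypothetical choices {U=4}, {V=6} and checks all {n \leq 400}:

```python
N, U, V = 400, 4, 6

def mobius(n):
    # Moebius function by trial division
    res = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # square factor
            res = -res
        p += 1
    return -res if n > 1 else res

def conv(f, g):
    # Dirichlet convolution, truncated to [1, N]; all entries are integers
    h = [0] * (N + 1)
    for d in range(1, N + 1):
        if f[d]:
            for m in range(d, N + 1, d):
                h[m] += f[d] * g[m // d]
    return h

mu = [0] + [mobius(n) for n in range(1, N + 1)]
one = [0] + [1] * N
mu_leU = [v if n <= U else 0 for n, v in enumerate(mu)]
mu_gtU = [v if n > U else 0 for n, v in enumerate(mu)]
mu_leV = [v if n <= V else 0 for n, v in enumerate(mu)]
mu_gtV = [v if n > V else 0 for n, v in enumerate(mu)]

t3 = conv(conv(mu_leU, mu_leV), one)
t4 = conv(conv(mu_gtU, mu_gtV), one)
violations = sum(1 for n in range(1, N + 1)
                 if mu[n] != mu_leV[n] + mu_leU[n] - t3[n] + t4[n])
print("violations of the identity for n <= 400:", violations)
```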

Exercise 22 Let us say that {0 < \theta < 1} is a level of distribution for the von Mangoldt function {\Lambda} if one has

\displaystyle  \sum_{q \leq Q} \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\Delta(\Lambda 1_{[1,x]}; a\ (q))| \ll_{A,\theta} x \log^{-A} x

whenever {A>0} and {Q \leq x^\theta}; thus, for instance, the Bombieri-Vinogradov theorem implies that every {0 < \theta < 1/2} is a level of distribution for {\Lambda}. Use the Cramér random model (see Section 1 of Supplement 4) to predict that every {0 < \theta < 1} is a level of distribution for {\Lambda}; this claim is known as the Elliott-Halberstam conjecture, and would have a number of consequences in sieve theory. Unfortunately, no level of distribution above {1/2} (or even at {1/2}) is currently known; however, there are weaker versions of the Elliott-Halberstam conjecture known with levels above {1/2} which do have some interesting number-theoretic consequences, and we will return to this point in later notes. For now, we will just remark that {1/2} appears to be the limit of what one can do by using the large sieve and Dirichlet character methods in this set of notes, and all the advances beyond {1/2} have had to rely on other tools (such as exponential sum estimates), although the Cauchy-Schwarz inequality remains an indispensable tool in all of these results.

Exercise 23 Strengthen the Bombieri-Vinogradov theorem by showing that

\displaystyle  \sum_{q \leq Q} \tau(q)^C \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\Delta(\Lambda 1_{[1,x]}; a\ (q))| \ll_{A,C} x \log^{-A} x

for all {x \geq 2}, {C \geq 1}, and {A>0}, if {Q \leq x^{1/2} \log^{-B} x} for some sufficiently large {B} depending on {A,C}. (Hint: use the Cauchy-Schwarz inequality and the trivial bound {|\Delta(\Lambda 1_{[1,x]}; a\ (q))| \ll \frac{x}{q} \log x}, together with elementary estimates on such sums as {\sum_{q \leq Q} \frac{\tau(q)^{2C}}{q}}.)

Exercise 24 Show that the Bombieri-Vinogradov theorem continues to hold if the von Mangoldt function {\Lambda} is replaced by the indicator function {1_{\mathcal P}} of the set of primes {{\mathcal P}}.

— 4. Appendix: the divisor function in arithmetic progressions —

We begin with a lemma of Landreau that controls the divisor function by a short divisor sum.

Lemma 25 For any {\theta > 0} one has

\displaystyle  \tau(n) \leq 2^{2/\theta} \sum_{d|n: d \leq n^\theta} \tau(d)^{2/\theta}

for all natural numbers {n}.

Proof: We write {n = qm}, where {q} is the product of all the prime factors of {n} that are greater than or equal to {n^{\theta/2}}, and {m} is the product of all the prime factors of {n} that are less than {n^{\theta/2}}, counting multiplicity of course; in particular {q} has at most {2/\theta} prime factors, so that {\tau(q) \leq 2^{2/\theta}}. By a greedy algorithm, we can repeatedly pull out factors of {m} of size between {n^{\theta/2}} and {n^\theta} until the remaining portion of {m} falls below {n^\theta}, yielding a factorisation of the form {m = n_1 \dots n_r} where {n_1,\dots,n_{r-1}} lie between {n^{\theta/2}} and {n^\theta}, and {n_r} is at most {n^\theta}. The lower bounds on {n_1,\dots,n_{r-1}} imply that {r \leq \frac{2}{\theta}+1}. By the trivial inequality {\tau(ab) \leq \tau(a) \tau(b)} we have

\displaystyle  \tau(n) \leq \tau(q) \tau(m) \leq 2^{2/\theta} \tau(n_1) \ldots \tau(n_r)

and thus by the pigeonhole principle one has

\displaystyle  \tau(n) \leq 2^{2/\theta} \tau(n_j)^r

for some {j}. Since {n_j} is a factor of {n} that is at most {n^\theta}, this gives the claim as long as {r \leq 2/\theta}. If {r > 2/\theta}, then since each of {n_1,\dots,n_{r-1}} is at least {n^{\theta/2}}, we see that {r-1 \leq 2/\theta}, and {n_{r-1} n_r} cannot exceed {n^\theta} (for otherwise {m > n^{(r-2)\theta/2} n^\theta = n^{r\theta/2} \geq n}, contradicting {m \leq n}). If we now use the inequality

\displaystyle  \tau(n) \leq 2^{2/\theta} \tau(n_1) \dots \tau(n_{r-2}) \tau(n_{r-1} n_r)

and repeat the pigeonholing argument, we again obtain the claim. \Box
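Lemma 25 is easily tested by brute force; the following Python sketch (a sanity check, not a proof) verifies the case {\theta = 1/2}, so that {2/\theta = 4} and the constant is {2^4 = 16}, for all {n \leq 3000}:

```python
import math

LIMIT = 3000

# divisor-counting table tau[n] via a sieve
tau = [0] * (LIMIT + 1)
for d in range(1, LIMIT + 1):
    for m in range(d, LIMIT + 1, d):
        tau[m] += 1

def landreau_bound(n):
    # 2^(2/theta) * sum of tau(d)^(2/theta) over divisors d <= n^theta, theta = 1/2
    return 16 * sum(tau[d] ** 4 for d in range(1, math.isqrt(n) + 1) if n % d == 0)

violations = sum(1 for n in range(1, LIMIT + 1) if tau[n] > landreau_bound(n))
print("violations of Lemma 25 (theta = 1/2) for n <= 3000:", violations)
```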

Corollary 26 Let {0 < c_1 < c_2 < 1}, let {x \geq 1}, let {x^{c_2} \leq y \leq x}, and let {a\ (q)} be a primitive residue class with {q \leq x^{c_1}}. Then

\displaystyle  \sum_{x \leq n\leq x+y: n=a\ (q)} \tau(n) \ll_{c_1,c_2} \frac{y}{q} \log^{2^{\frac{2}{c_2-c_1}}+1} x. \ \ \ \ \ (33)

One can lower the exponent of the logarithm here to {\log x} (consistent with the heuristic that the average value of {\tau(n)} for {n=O(x)} is {O(\log x)}), but this requires additional arguments; see for instance this previous blog post. In contrast, the divisor bound {\tau(n)=O(n^{o(1)})} only gives an upper bound of {x^{o(1)} \frac{y}{q}} here.

Proof: We allow implied constants to depend on {c_1,c_2}. Set {\theta := c_2-c_1}. Using Lemma 25 we have

\displaystyle  \tau(n) \ll \sum_{d|n: d \leq (x+y)^\theta} \tau(d)^{2/\theta}

for all {n \leq x+y}, and so the left-hand side of (33) is bounded by

\displaystyle  \ll \sum_{d \leq (x+y)^\theta} \tau(d)^{2/\theta} \sum_{x \leq n\leq x+y: n=a\ (q); d|n} 1.

The conditions {n=a\ (q)}, {d|n} constrain {n} to either the empty set, or an arithmetic progression of modulus {qd \leq x^{c_1} (x+y)^\theta \ll y}, so the inner sum is {O(\frac{y}{qd})}. On the other hand, from Theorem 27 (or Exercise 29) of Notes 1 we have

\displaystyle  \sum_{d \leq (x+y)^\theta} \frac{\tau(d)^{2/\theta}}{d} \ll \log^{2^{2/\theta} + 1} x,

and the claim follows. \Box
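A quick numerical experiment is consistent with Corollary 26; in the sketch below the parameters {x = 10^5}, {y = 10^4}, {q = 7}, {a = 3} are hypothetical choices, and the comparison is against the heuristic scale {(y/q) \log x} rather than the precise logarithmic exponent of the corollary:

```python
import math

x, y, q, a = 100_000, 10_000, 7, 3

# divisor-counting table up to x + y via a sieve
LIMIT = x + y
tau = [0] * (LIMIT + 1)
for d in range(1, LIMIT + 1):
    for m in range(d, LIMIT + 1, d):
        tau[m] += 1

# divisor sum over the progression n = a (q) in the short interval [x, x + y]
s = sum(tau[n] for n in range(x, x + y + 1) if n % q == a)
scale = (y / q) * math.log(x)
print(f"divisor sum over the progression = {s}; (y/q) log x = {scale:.0f}")
```

This is consistent with the heuristic mentioned above that the average value of {\tau(n)} for {n = O(x)} is {O(\log x)}.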

Exercise 27 Under the same assumptions as in Corollary 26, establish the bound

\displaystyle  \sum_{x \leq n\leq x+y: n=a\ (q)} \tau^k(n) \ll_{c_1,c_2,k} \frac{y}{q} \log^{O_{c_1,c_2,k}(1)} x

for any {k \geq 1}.