This is one of the continuations of the online reading seminar of Zhang’s paper for the polymath8 project. (There are two other continuations; this previous post, which deals with the combinatorial aspects of the second part of Zhang’s paper, and a post to come that covers the Type III sums.) The main purpose of this post is to present (and hopefully, to improve upon) the treatment of two of the three key estimates in Zhang’s paper, namely the Type I and Type II estimates.
The main estimate was already stated as Theorem 16 in the previous post, but we quickly recall the relevant definitions here. As in other posts, we always take to be a parameter going off to infinity, with the usual asymptotic notation associated to this parameter.
for all , where is the divisor function.
- (i) If is a coefficient sequence and is a primitive residue class, the (signed) discrepancy of in the sequence is defined to be the quantity
- (ii) A coefficient sequence is said to be at scale for some if it is supported on an interval of the form .
- (iii) A coefficient sequence at scale is said to obey the Siegel-Walfisz theorem if one has
for any , any fixed , and any primitive residue class .
- (iv) A coefficient sequence at scale is said to be smooth if it takes the form for some smooth function supported on obeying the derivative bounds
for all fixed (note that the implied constant in the notation may depend on ).
In Lemma 8 of this previous post we established a collection of “crude estimates” which assert, roughly speaking, that for the purposes of averaged estimates one may ignore the factor in (1) and pretend that was in fact . We shall rely frequently on these “crude estimates” without further citation to that precise lemma.
For any , let denote the square-free numbers whose prime factors lie in .
whenever are coprime. We say that such a system has controlled multiplicity if the
for any fixed and any congruence class with .
The main result of this post is then the following:
with obeying a Siegel-Walfisz theorem. Then for any and any singleton congruence class system with controlled multiplicity we have
The proof of this theorem relies on five basic tools:
- (i) the Bombieri-Vinogradov theorem;
- (ii) completion of sums;
- (iii) the Weil conjectures;
- (iv) factorisation of smooth moduli ; and
- (v) the Cauchy-Schwarz and triangle inequalities (Weyl differencing and the dispersion method).
For the purposes of numerics, it is the interplay between (ii), (iii), and (v) that drives the final conditions (7), (8). The Weil conjectures are the primary source of power savings ( for some fixed ) in the argument, but they need to overcome power losses coming from completion of sums, and also each use of Cauchy-Schwarz tends to halve any power savings present in one’s estimates. Naively, one could thus expect to get better estimates by relying more on the Weil conjectures, and less on completion of sums and on Cauchy-Schwarz.
— 1. The Bombieri-Vinogradov theorem —
One of the basic distribution results in this area of analytic number theory is the Bombieri-Vinogradov theorem. As first observed by Motohashi, this theorem in fact controls the distribution of a general class of Dirichlet convolutions in congruence classes. We will use the following formulation of this theorem, essentially Theorem 0 of Bombieri-Friedlander-Iwaniec;
for some fixed . Let be coefficient sequences at scale respectively. Suppose also that obeys a Siegel-Walfisz theorem.
- (i) (First Bombieri-Vinogradov inequality) We have
for some sufficiently slowly decaying and any fixed .
- (ii) (Second Bombieri-Vinogradov inequality) we have
for some sufficiently slowly decaying and any fixed .
For sake of completeness we now recall the proof of this theorem, following the presentation in Bombieri-Friedlander-Iwaniec. (This is standard material, and experts may immediately skip to the next section.) We first need the large sieve inequality for Dirichlet characters:
where the summation is over primitive Dirichlet characters of conductor .
Proof: By enlarging we may assume that .
We use the method. By duality, the desired estimate is equivalent to
which is in turn equivalent to
We bound by a Schwartz function whose Fourier transform is supported on ; this is possible for a fixed when . The left-hand side then expands as
The inner sum is when , and zero otherwise thanks to Fourier analysis (specifically, the Poisson summation formula combined with the support of the Fourier transform of and the fact that is periodic of period with mean zero). The claim follows.
for any fixed .
Let . By multiplicative Fourier analysis we may write the left-hand side of (11) as
where the inner sum ranges over non-principal Dirichlet characters of modulus , not necessarily primitive. Any such character can be written as where with , and is a primitive Dirichlet character of conductor . The above sum can then be rewritten as
where the summation ranges over primitive characters modulo . Since , we may bound this by
which on performing the summation is bounded by
and then by dyadic decomposition this is bounded by
Let us first consider the contribution of the small moduli, specifically when for some fixed . From the Siegel-Walfisz theorem (3) we have
for any fixed , and then by crude estimates (see Lemma 8 of this previous post) we see that this contribution is acceptable for (11). Thus we may restrict attention to those moduli with for any fixed . On the other hand, from Lemma 5 we have
for any . From crude estimates we have
the claim follows.
By multiplicative Fourier analysis we have
so by splitting into primitive characters as before we may bound the left-hand side of (12) by
Arguing as before, the case is acceptable, so we assume . From crude estimates we have
From Lemma 5 and Cauchy-Schwarz we can thus bound
as is comparable to , this simplifies to
which is acceptable from the hypotheses on if is chosen large enough depending on . This concludes the proof of Theorem 4.
We remark that an inspection of the proof reveals that the or factors in the threshold for may be replaced by provided that is sufficiently large depending on . However, this refinement of the Bombieri-Vinogradov inequality does not lead to any improvement in the final numerology for our purposes.
— 2. Completion of sums —
At several stages in the argument we will need to consider sums of the form
for some smooth coefficient sequence , sone congruence classes depending on a further parameter , and further coefficients . The completion of sums technique replaces the term here with an exponential phase, leaving one with consideration of exponential sums such as
for various , where for (or ) is the standard character on . More precisely, we have
for any fixed .
whenever is periodic of degree .
Proof: We rearrange the left-hand side as
Using the Fourier expansion
and rearranging, the left-hand side then becomes
The term of this is the first term on the right-hand side of (13). The terms coming from integers with can be bounded by the second term in (13), bounding crudely by by crude estimates and also using conjugation symmetry when is negative. So it will suffice to show that
(say) when . Writing as in the definition of a smooth coefficient sequence, and applying Poisson summation, the left-hand side is
where is the Fourier transform of . But from the smoothness (4) of and integration by parts one has the bounds
for any fixed , and from the hypothesis we obtain the claim by taking large enough depending on .
We remark that in the absence of cancellation in the exponential sum , the first error term in (13) could be as large as
which is about times as large as the main term in (13). In practice we will apply this lemma with for some fixed , in which case completion of sums will cost a factor of or so in the bounds. However, it is still often desirable to pay this cost in order to exploit cancellation bounds for exponential sum, in particular those coming from the Weil conjectures as described below.
In our applications, the modulus will split into a product of two factors or three factors . The following simple lemma lets us then split exponential phases of the form :
Similarly, if for coprime and is the intersection of , , and , then
Here and in the sequel we are using the convention that means for coprime to , where is the reciprocal of modulo .
Proof: We just prove the first identity, as the second is similar. Let be an integer such that , and similarly let be an integer such that . Then is equal to mod and mod , and thus equal to mod . Thus
and the claim follows by factoring the exponential.
— 3. The Weil conjectures —
For the purposes of this post, the Weil conjectures (as proven in full generality by Deligne) can be viewed as a black box device to obtain “square root cancellation” for various exponential sums of the form
where is a finite abelian group (i.e. a finite product of cyclic groups) with some additive character and some rational function , basically by obtaining the analogue of the Riemann Hypothesis for a certain zeta function associated to an algebraic variety related to the function . (This is by no means the full strength of the Weil conjectures; amongst other things, one can also twist such sums by multiplicative characters, and also work with more complicated schemes than classical algebraic varieties, though the exponential sum estimates are more difficult to state succinctly as a consequence.) A basic instance of this is Weil’s classical bound on the Kloosterman sum
defined whenever are integers with positive.
where is the greatest common divisor of .
Proof: (Sketch only; see e.g. Iwaniec-Kowalski for a full proof.) For simplicity we only address the case of square-free , which is the case needed in our application. By the Chinese remainder theorem we may reduce to the case when is a prime , then we may also reduce to the case when are coprime to . It then suffices to show that
whenever is an additive character and is the hyperbola
An important trick is then to generalise from to finite dimensional extensions of , and then consider the Kloosterman sums
associated to these extensions, where is the field trace. For non-principal , it is possible to show the explicit formula
for some complex numbers (depending of course on ); this is part of the general theory of -functions associated to algebraic varieties, but can also be established by elementary means (e.g. by establishing a linear recurrence for the ). For the principal character , of course, one has
Next, one observes the basic identity
as can be seen by computing the kernel and range of the linear map on (this identity is also related to Hilbert’s Theorem 90). From this we have a combinatorial interpretation of the quantity
namely as the number of points on the curve
One can show (e.g. using Stepanov’s method, cf. this previous post) that the number of this points on this curve is equal to , leading to the identity
Studying the asymptotics as , one is led to the conclusion that (this trick to “magically” delete the error is a canonical example of the tensor power trick), and the bound (16) then follows from the case of (17).
In practice, we shall estimate crudely by the divisor bound (where denotes a quantity that goes to zero as ), and the factor will also be small in applications, so that we do indeed see the square root savings over the trivial bound . For the Type I/II sums, the classical Weil bound is sufficient; but for the Type III sums that we will cover in a subsequent post, the full force of Deligne’s results are needed.
An important remark is that when , we can apply the change of variables and convert the Kloosterman sum into a Ramanujan sum
using the Chinese remainder theorem to reduce to the case when is a power of a prime, and then using the divisor bound.
For the Type I and Type II sums we need a more complicated variant of this bound (Lemma 11 of Zhang’s paper):
and let be integers. Then the double Kloosterman sum
obeys the bound
for some with
for . In particular, from Theorem 8, we have
while in the case we have from (18) that
Here denotes a quantity that goes to zero as .
Proof: As are coprime, we may refactor as
for some , and , with
for , which in particular implies that as claimed. Using the Chinese remainder theorem, we may now factor as the product of the sums
Bounding the first sum trivially by and shifting the third sum by we obtain the claim (note that ).
The treatment of the terms in the above analysis are crude, but in applications is often trivial anyway, so it is not essential to obtain the sharpest estimates here.
We can combine this with the method of completion of sums:
Proof: Write . By (14) (and overspill) we may bound the left-hand side by
From the divisor bound one has
for any , so from this and Cauchy-Schwarz the net contribution of the second term is
which is acceptable.
— 4. Factoring a smooth number —
We will need to take advantage of the smooth nature of the variable to factor it into two smaller pieces. We need an elementary lemma:
Proof: Suppose that (19) failed, thus
In particular, we see that has at least prime factors, which implies in particular that
On the other hand, we have the standard bound
and the claim now follows from Markov’s inequality.
Let be fixed. Then for all but exceptions, all have a factorisation
where are coprime with
Furthermore has no prime factors less than , thus
The fact that has no tiny (i.e. less than ) prime factors will imply that any two such will typically be coprime to each other with high probability (at least for any fixed ), which is a key technical fact which we will need to exploit later. (The -trick achieves a qualitatively similar effect, but would only give such a claim with probability or maybe for some small if one really optimised it, which is insufficient for the applications at hand.)
Proof: By Lemma 11 we may restrict attention to those for which
Now is the product of distinct primes of size at most , with . Applying the greedy algorithm, one can then find a factor of with
If one then multiplies by all primes of size less than that divide to create , then sets , the claim follows.
— 5. The dispersion method —
We begin the proof of Theorem 3. The reader may wish to track the exponents involved in the model regime
We can restrict to the range
for some sufficiently slowly decaying , since otherwise we may use the Bombieri-Vinogradov theorem (Theorem 4). Thus we need to show that
be an exponent to be optimised later (in many cases, such as (20), it can be set very close to zero). By Corollary 12, outside of a small number of exceptions, we may factor where with , is coprime to , and
Let us first dispose of the set of exceptional values of for which the above factorisation is unavailable. From Corollary 12 we have
On the other hand, we have the crude estimate
which when combined with crude estimates leads to the crude upper bound
Applying Cauchy-Schwarz we conclude that
and so gives a negligible contribution to (21) (increasing as necessary).
For the non-exceptional , we arbitrarily select a factorisation of the above form, and apply a dyadic decomposition. We conclude that it suffices to show that
Fix . We replace by , and abbreviate and by and respectively, thus our task is to show that
We now split the discrepancy
as the sum of the subdiscrepancies
We begin with (26), which is easier. For each fixed , it will suffice to show that
as the claim then follows by dividing by and summing using standard estimates (see Lemma 8 of this previous post). But this claim follows from the Bombieri-Vinogradov theorem (Theorem 4), after restricting to the integers coprime to (which does not affect the property of being a coefficient sequence supported at a certain scale, nor does it affect the Siegel-Walfisz theorem).
Now we establish (25). Here we will not take advantage of the summation, and use crude estimates to reduce to showing that
whenever are good singleton congruence class systems. Let us see how (28) implies (27). Observe that if is any integer, then and are also good singleton congruence class systems, where consists of the primes with not divisible by (thus lies in when is coprime to ). By (28) we thus have
If we average over the interval for some sufficiently small fixed , we observe that
and similarly (using the crude divisor bound)
(The condition is redundant, but we include it for emphasis.) We conclude from the triangle inequality and the bounds on (if is small enough) that
On the other hand, from crude estimates one has
and the claim (27) then follows from Cauchy-Schwarz (noting from the Chinese remainder theorem that the two constraints are equivalent to the single constraint ).
It remains to prove (28). We will use the dispersion method (or Cauchy-Schwarz), playing the two congruence conditions and against each other. We first get rid of the absolute values in (28) by introducing an additional bounded coefficient. More precisely, to prove (28) it suffices to show that
for any bounded real coefficients . We expand out the Dirichlet convolution
then relabel as to rearrange the left-hand side as
We can write for some smooth non-negative coefficient sequence at scale . From crude bounds one has
where is subject to the same constraints as (thus and for ), and is some quantity that is independent of the choice of congruence classes , , since by replacing the with or vice versa as necessary one can express (29) as a linear combination of terms of the form in (29) in such a way that all the terms cancel out ().
At this stage we need to deal with a technical problem that may share a common factor; fortunately, this event turns out to be negligible (but only thanks to the controlled multiplicity hypothesis (6)). More precisely, we split (30) into the coprime case
We first show (32). The basic point here is that because we have previously restricted to have no prime factor smaller than , we can gain a factor of in (32), which is strong enough to overcome logarithmic losses but not losses coming from the divisor bound. To avoid using the divisor bound we will need the controlled multiplicity hypothesis (6) (and this is the only place in the argument where this hypothesis is actually used).
We turn to the details. The quantities must all lie in . We may split and where are coprime. By the Chinese remainder theorem, the constraints and then imply that
so by the triangle inequality and interchange of summation we can bound the left-hand side by
(with understood to lie in and be at most ) which we can write as
are the multiplicity functions associated to the congruence class systems and .
By the elementary bound and symmetry we may thus bound the left-hand side of (32) by
plus a symmetric term which is treated similarly and will be ignored.
We rearrange the constraints
as a combination of the constraints
For a given choice of obeying the former set of constraints, the obeying the second set of constraints lie in a single congruence class mod by the Chinese remainder theorem. From the controlled multiplicity hypothesis (6) one thus has
and so the left-hand side of (32) has been bounded by
We first dispose of the error term. From the divisor bound we have . For a given choice of , the sum then can be bounded by , leading to a total contribution of
for some fixed . We file away this necessary condition for now and move on, though we note that these conditions are weaker than (22) except in the “Type II” case when are close to .
Having disposed of the error term, the remaining contribution to the left-hand side of (32) that we need to control is
Summing in using crude bounds, this is bounded by
and then by summing in this is in turn bounded by
Now we finally use the fact that has no small factors less than to bound the summation here by
and since grows faster than any power of we see that this error term is also acceptable for (32). This concludes the proof of (32) (contingent of course on the lower bounds (22), (34) on that we will deal with later).
It remains to verify (31). Observe that must be coprime to and coprime to , with , to have a non-zero contribution to the sum. We then rearrange the left-hand side as
note that these inverses in the various rings , , are well-defined thanks to the coprimality hypotheses.
for some independent of the and .
We remark that at this stage we are only needing to gain a factor of over the trivial bound. However, we will now perform the expensive step of completion of sums (Lemma 6), which replaces the factor by an exponential phase at the cost of requiring now a significantly larger gain over the trivial bound. Applying Lemma 6 and Lemma 7, we can write the left-hand side of (36) as the sum of the main term
an error term
for a sufficiently small fixed , and we have dropped all hypotheses on other than magnitude, and we abbreviate as . As noted after Lemma 6, we are no longer asking for just a savings over the trivial bound; we must instead gain a factor of to overcome the summation in .
and hence from (24)
We now split into two cases, one which works when are not too close to , and one which works when are close to .
hold for some fixed , then (39) holds for a sufficiently small fixed .
The condition (42) represents a border between this case and the Type II case.
holds, then (39) holds for a sufficiently small fixed .
Both of these theorems are established by using Cauchy-Schwarz to eliminate the absolute values and factors in (39) until one is left with an expression only involving the phase which can then be estimated using the Weil conjectures with a power saving to counteract the loss of , however the use of Cauchy-Schwarz is slightly different in the two cases. In practice, the condition (43) is too strong to be satisfied by value of given in Theorem 3, so we will only use Theorem 14 in the case that (42) fails, since in that case we may make small enough for (43) to hold.
which means that the condition (34) is now weaker than (22) and may be omitted. By (7) we can simultaneously obey (22), (34), (41) by setting sufficiently close to zero, and the claim now follows from Theorem 13.
which is satisfied for small enough thanks to (8).
— 6. The Type I sum —
We now prove Theorem 13. It suffices to show that
for any bounded real coefficients (which are vaguely related to the previous coefficients , but this is not important for us here). We rearrange the left-hand side as
From the divisor bound we have
and we may write for some smooth coefficient sequence at scale , so by Cauchy-Schwarz it suffices to show that
(note now we have to gain more than over the trivial bound, rather than just ). We rearrange this as
so by the triangle inequality it suffices to show that
for any .
To prove (44), we isolate the diagonal case and the non-diagonal case . For the diagonal case, we make the crude bound
The contribution of the diagonal case can now be bounded by
Writing we have , and one can estimate this by
which by the divisor bound is of the form
For this to be acceptable we need a bound of the form
which, by (37) is equivalent to
for all non-diagonal . In the model case (20) with , the two terms on the right-hand side are approximately and , which give the desired power saving compared to the trivial bound of since (and in (20), is small, so just about any power saving suffices). As the model case indicates, the first term in (45) is the dominant one in practice.
so to conclude (44) we need to show that
and the left-hand side of (45) expands as
The first two phases
can be combined as
where and is the residue class
Similarly, the two phases
can combine as
where and is the residue class
although the precise value of will not be important for us. The left-hand side of (45) is thus
and hence Corollary 10 and the coprimality of is bounded by
which is controlled by the first term on the right-hand side of (45). So it remains to show that
We crudely bound by and by . (This is inefficient, but this term is not dominant in the analysis in any case.) By (46) we may bound by , and the claim follows.
— 7. The Type II sum —
We now prove Theorem 14. As before, it suffices to show that
for any bounded real coefficients . We rearrange the left-hand side slightly differently from the previous section:
From crude bounds we have
so by Cauchy-Schwarz it suffices to show that
This is basically the same situation as in the previous section except that we have decoupled the variables from each other.
As before, we isolate a diagonal case , and now consider the contribution of this case. Arguing as in the previous section, the contribution of this case to (47) can be bounded by
which by the divisor bound is of the form
which will be acceptable if
By (37) this is equivalent to
of (45), valid for all non-diagonal . The bound here is weaker than in (45), but in the model case (20) the right-hand side terms are approximately and respectively, which still represents a power saving over the trivial bound of approximately as long as . While this does not cover all the range that the Type I analysis does, it crucially is able to cover the case when is very close to zero, which the Type I analysis does not cover due to the condition (42). The Type I/II border is not the critical border for optimising the exponents, so it is not a priority for us to improve bounds in the Type II analysis such as (48).
so to conclude (47) we need to show that
and the left-hand side of (48) expands as
If we set
then as before we can rewrite this sum as
(with the inverses with respect to the moduli in the first, second, and third summands respectively), and the value of is unimportant for us. We have an analogue of (46), namely
which is controlled by the first term on the right-hand side of (48). So it remains to show that
We crudely bound by and by ; also
and from (49) one has . The claim follows.