This is one of the continuations of the online reading seminar of Zhang’s paper for the polymath8 project. (There are two other continuations: this previous post, which deals with the combinatorial aspects of the second part of Zhang’s paper, and a post to come that covers the Type III sums.) The main purpose of this post is to present (and hopefully to improve upon) the treatment of two of the three key estimates in Zhang’s paper, namely the Type I and Type II estimates.

The main estimate was already stated as Theorem 16 in the previous post, but we quickly recall the relevant definitions here. As in other posts, we always take {x} to be a parameter going off to infinity, with the usual asymptotic notation {O(), o(), \ll} associated to this parameter.

Definition 1 (Coefficient sequences) A coefficient sequence is a finitely supported sequence {\alpha: {\bf N} \rightarrow {\bf R}} that obeys the bounds

\displaystyle  |\alpha(n)| \ll \tau^{O(1)}(n) \log^{O(1)}(x) \ \ \ \ \ (1)

for all {n}, where {\tau} is the divisor function.

  • (i) If {\alpha} is a coefficient sequence and {a\ (q) = a \hbox{ mod } q} is a primitive residue class, the (signed) discrepancy {\Delta(\alpha; a\ (q))} of {\alpha} in this residue class (illustrated numerically just after this definition) is defined to be the quantity

    \displaystyle  \Delta(\alpha; a \ (q)) := \sum_{n: n = a\ (q)} \alpha(n) - \frac{1}{\phi(q)} \sum_{n: (n,q)=1} \alpha(n). \ \ \ \ \ (2)

  • (ii) A coefficient sequence {\alpha} is said to be at scale {N} for some {N \geq 1} if it is supported on an interval of the form {[(1-O(\log^{-A_0} x)) N, (1+O(\log^{-A_0} x)) N]}.
  • (iii) A coefficient sequence {\alpha} at scale {N} is said to obey the Siegel-Walfisz theorem if one has

    \displaystyle  | \Delta(\alpha 1_{(\cdot,q)=1}; a\ (r)) | \ll \tau(qr)^{O(1)} N \log^{-A} x \ \ \ \ \ (3)

    for any {q,r \geq 1}, any fixed {A}, and any primitive residue class {a\ (r)}.

  • (iv) A coefficient sequence {\alpha} at scale {N} is said to be smooth if it takes the form {\alpha(n) = \psi(n/N)} for some smooth function {\psi: {\bf R} \rightarrow {\bf C}} supported on {[1-O(\log^{-A_0} x), 1+O(\log^{-A_0} x)]} obeying the derivative bounds

    \displaystyle  \psi^{(j)}(t) = O( \log^{j A_0} x ) \ \ \ \ \ (4)

    for all fixed {j \geq 0} (note that the implied constant in the {O()} notation may depend on {j}).
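
To make the notion of discrepancy concrete, here is a small numerical illustration (in Python); the coefficient sequence and modulus below are entirely artificial and are chosen only to show how (2) is computed, not to model the sequences used later in the post.

    from math import gcd

    def discrepancy(alpha, a, q):
        """Signed discrepancy Delta(alpha; a (q)) of a finitely supported
        sequence alpha (a dict n -> alpha(n)) in the primitive residue class
        a (q), as in (2)."""
        assert gcd(a, q) == 1
        in_class = sum(v for n, v in alpha.items() if n % q == a % q)
        coprime = sum(v for n, v in alpha.items() if gcd(n, q) == 1)
        phi_q = sum(1 for b in range(1, q + 1) if gcd(b, q) == 1)
        return in_class - coprime / phi_q

    def is_prime(n):
        return n > 1 and all(n % p for p in range(2, int(n ** 0.5) + 1))

    # toy coefficient sequence: the indicator of the primes in [100, 200)
    alpha = {n: 1.0 for n in range(100, 200) if is_prime(n)}
    for a in (1, 2):
        print(a, discrepancy(alpha, a, 3))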

In Lemma 8 of this previous post we established a collection of “crude estimates” which assert, roughly speaking, that for the purposes of averaged estimates one may ignore the {\tau^{O(1)}(n)} factor in (1) and pretend that {\alpha(n)} was in fact {O( \log^{O(1)} x)}. We shall rely frequently on these “crude estimates” without further citation to that precise lemma.

For any {I \subset {\bf R}}, let {{\mathcal S}_I} denote the square-free numbers whose prime factors lie in {I}.

Definition 2 (Singleton congruence class system) Let {I \subset {\bf R}}. A singleton congruence class system on {I} is a collection {{\mathcal C} = (\{a_q\})_{q \in {\mathcal S}_I}} of primitive residue classes {a_q \in ({\bf Z}/q{\bf Z})^\times} for each {q \in {\mathcal S}_I}, obeying the Chinese remainder theorem property

\displaystyle  a_{qr}\ (qr) = (a_q\ (q)) \cap (a_r\ (r)) \ \ \ \ \ (5)

whenever {q,r \in {\mathcal S}_I} are coprime. We say that such a system {{\mathcal C}} has controlled multiplicity if the quantity

\displaystyle  \tau_{\mathcal C}(n) := |\{ q \in {\mathcal S}_I: n = a_q\ (q) \}|

obeys the estimate

\displaystyle  \sum_{C^{-1} x \leq n \leq Cx: n = a\ (r)} \tau_{\mathcal C}(n)^2 \ll \frac{x}{r} \tau(r)^{O(1)} \log^{O(1)} x + x^{o(1)} \ \ \ \ \ (6)

for any fixed {C>1} and any congruence class {a\ (r)} with {r \in {\mathcal S}_I}.
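
As a quick sanity check of Definition 2, the following sketch builds the simplest possible singleton congruence class system, namely {a_q := b\ (q)} for a fixed integer {b} (for which the Chinese remainder theorem property (5) is automatic), computes the multiplicity function {\tau_{\mathcal C}}, and evaluates the left-hand side of (6) on a small range. The primes, the value of {b}, and the range are purely illustrative; this is only meant to make the definitions concrete, not to verify controlled multiplicity in the regime relevant to Theorem 3.

    from math import gcd
    from itertools import combinations

    def squarefree_products(primes):
        """All squarefree q whose prime factors lie in the given list (the set S_I)."""
        out = []
        for k in range(len(primes) + 1):
            for combo in combinations(primes, k):
                q = 1
                for p in combo:
                    q *= p
                out.append(q)
        return out

    I_primes = [7, 11, 13, 17]        # primes of I (illustrative)
    b = 5                             # a_q := b mod q for every q in S_I
    S_I = [q for q in squarefree_products(I_primes) if gcd(b, q) == 1]

    def tau_C(n):
        """tau_C(n) = #{q in S_I : n = b (q)}."""
        return sum(1 for q in S_I if n % q == b % q)

    # left-hand side of (6) with C = 2, r = 1 (so the class a (r) is trivial):
    x = 10 ** 4
    lhs = sum(tau_C(n) ** 2 for n in range(x // 2, 2 * x + 1))
    print(lhs, x)                     # to be compared with x log^{O(1)} x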

The main result of this post is then the following:

Theorem 3 (Type I/II estimate) Let {\varpi, \delta, \sigma > 0} be fixed quantities such that

\displaystyle  11 \varpi + 3\delta + 2 \sigma < \frac{1}{4} \ \ \ \ \ (7)

and

\displaystyle  29\varpi + 5 \delta < \frac{1}{4} \ \ \ \ \ (8)

and let {\alpha,\beta} be coefficient sequences at scales {M,N} respectively with

\displaystyle  x \ll MN \ll x \ \ \ \ \ (9)

and

\displaystyle  x^{\frac{1}{2}-\sigma} \ll N \ll M \ll x^{\frac{1}{2}+\sigma} \ \ \ \ \ (10)

with {\beta} obeying a Siegel-Walfisz theorem. Then for any {I \subset [1,x^\delta]} and any singleton congruence class system {(\{a_q\})_{q \in {\mathcal S}_I}} with controlled multiplicity we have

\displaystyle  \sum_{q \in {\mathcal S}_I: q< x^{1/2+2\varpi}} |\Delta(\alpha \ast \beta; a_q)| \ll x \log^{-A} x

for any fixed {A > 0}.

The proof of this theorem relies on five basic tools:

  • (i) the Bombieri-Vinogradov theorem;
  • (ii) completion of sums;
  • (iii) the Weil conjectures;
  • (iv) factorisation of smooth moduli {q \in {\mathcal S}_I}; and
  • (v) the Cauchy-Schwarz and triangle inequalities (Weyl differencing and the dispersion method).

For the purposes of numerics, it is the interplay between (ii), (iii), and (v) that drives the final conditions (7), (8). The Weil conjectures are the primary source of power savings ({x^{-c}} for some fixed {c>0}) in the argument, but they need to overcome power losses coming from completion of sums, and also each use of Cauchy-Schwarz tends to halve any power savings present in one’s estimates. Naively, one could thus expect to get better estimates by relying more on the Weil conjectures, and less on completion of sums and on Cauchy-Schwarz.

— 1. The Bombieri-Vinogradov theorem —

One of the basic distribution results in this area of analytic number theory is the Bombieri-Vinogradov theorem. As first observed by Motohashi, this theorem in fact controls the distribution of a general class of Dirichlet convolutions in congruence classes. We will use the following formulation of this theorem, essentially Theorem 0 of Bombieri-Friedlander-Iwaniec:

Theorem 4 (Bombieri-Vinogradov theorem) Let {M,N} be such that

\displaystyle  x \ll MN \ll x

and

\displaystyle  x^\epsilon \ll M,N \ll x^{1-\epsilon}

for some fixed {0 < \epsilon < 1}. Let {\alpha,\beta} be coefficient sequences at scale {M,N} respectively. Suppose also that {\beta} obeys a Siegel-Walfisz theorem.

  • (i) (First Bombieri-Vinogradov inequality) We have

    \displaystyle  \sum_{q \leq x^{-o(1)} N} \sum_{a \in ({\bf Z}/q{\bf Z})^\times} |\Delta(\beta; a)|^2 \ll N^2 \log^{-A} x

    for some sufficiently slowly decaying {o(1)} and any fixed {A>0}.

  • (ii) (Second Bombieri-Vinogradov inequality) we have

    \displaystyle  \sum_{q \leq x^{1/2-o(1)}} \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\Delta(\alpha*\beta; a)| \ll x \log^{-A} x

    for some sufficiently slowly decaying {o(1)} and any fixed {A>0}.

For sake of completeness we now recall the proof of this theorem, following the presentation in Bombieri-Friedlander-Iwaniec. (This is standard material, and experts may immediately skip to the next section.) We first need the large sieve inequality for Dirichlet characters:

Lemma 5 (Large sieve inequality) For any sequence {\alpha: {\bf N} \rightarrow {\bf C}} supported on {\{1,\ldots,N\}} and any {Q > 1} one has

\displaystyle  \sum_{q \leq Q} \sum_{\chi\ (q)}^* \frac{q}{\phi(q)} |\sum_n \alpha(n) \chi(n)|^2 \ll (Q^2+N) \sum_n |\alpha(n)|^2

where the {\chi} summation is over primitive Dirichlet characters of conductor {q}.

Proof: By enlarging {N} we may assume that {N \geq Q^2}.

We use the {TT^*} method. By duality, the desired estimate is equivalent to

\displaystyle  |\sum_{q \leq Q} \sum_{\chi\ (q)}^* \frac{q}{\phi(q)} c_\chi \sum_{n \leq N} \alpha(n) \chi(n)|

\displaystyle \ll N^{1/2} (\sum_{n \leq N} |\alpha(n)|^2)^{1/2} (\sum_{q \leq Q} \sum_{\chi\ (q)}^* \frac{q}{\phi(q)} |c_\chi|^2)^{1/2}

which is in turn equivalent to

\displaystyle  \sum_n 1_{n \leq N} |\sum_{q \leq Q} \sum_{\chi\ (q)}^* \frac{q}{\phi(q)} c_\chi \chi(n)|^2 \ll N \sum_{q \leq Q} \sum_{\chi\ (q)}^* \frac{q}{\phi(q)} |c_\chi|^2.

We bound {1_{n \leq N}} by a Schwartz function {\psi(n/N)} whose Fourier transform is supported on {[-1/(2Q^2),1/(2Q^2)]}; this is possible for a fixed {\psi} when {N \geq Q^2}. The left-hand side then expands as

\displaystyle  \sum_{q,q' \leq Q} \sum_{\chi\ (q)}^* \sum_{\chi'\ (q')}^* \frac{q}{\phi(q)} c_\chi \frac{q'}{\phi(q')} \overline{c_{\chi'}} (\sum_n \psi(n/N) \chi\overline{\chi'}(n)).

The inner sum is {O(N \frac{\phi(q)}{q})} when {\chi=\chi'}, and zero otherwise thanks to Fourier analysis (specifically, the Poisson summation formula combined with the support of the Fourier transform of {\psi} and the fact that {\chi\overline{\chi'}} is periodic of period {qq'} with mean zero). The claim follows. \Box

Now we prove part (i) of Theorem 4. By an overspill argument (cf. Lemma 7 of this previous post) it suffices to show that

\displaystyle  \sum_{q \leq x^{-\epsilon} N} \sum_{a \in ({\bf Z}/q{\bf Z})^\times} |\Delta(\beta; a)|^2 \ll N^2 \log^{-A} x \ \ \ \ \ (11)

for any fixed {A,\epsilon > 0}.

Let {Q := x^{-\epsilon} N}. By multiplicative Fourier analysis we may write the left-hand side of (11) as

\displaystyle  \sum_{q \leq Q} \frac{1}{\phi(q)} \sum_{\chi \neq \chi_0\ (q)} |\sum_n \beta(n) \chi(n)|^2

where the inner sum ranges over non-principal Dirichlet characters {\chi} of modulus {q}, not necessarily primitive. Any such character can be written as {\chi(n) = \psi(n) 1_{(n,e)=1}} where {q = de} with {d > 1}, and {\psi} is a primitive Dirichlet character of conductor {d}. The above sum can then be rewritten as

\displaystyle \sum_{e \leq Q} \sum_{1 < d < Q/e} \frac{1}{\phi(de)} \sum_{\psi\ (d)}^* |\sum_n \beta(n) \psi(n) 1_{(n,e)=1}|^2

where the {\psi} summation ranges over primitive characters modulo {d}. Since {\frac{1}{\phi(de)} \leq \frac{1}{\phi(d)} \frac{1}{\phi(e)}}, we may bound this by

\displaystyle \sum_{e \leq Q} \frac{1}{\phi(e)} \sum_{1 < d < Q/e} \frac{1}{\phi(d)} \sum_{\psi\ (d)}^* |\sum_n \beta(n) \psi(n) 1_{(n,e)=1}|^2,

which on performing the {e} summation is bounded by

\displaystyle  \ll (\log x)^{O(1)} \sup_{e \leq Q} \sum_{1 < d < Q/e} \frac{1}{\phi(d)} \sum_{\psi\ (d)}^* |\sum_n \beta(n) \psi(n) 1_{(n,e)=1}|^2,

and then by dyadic decomposition this is bounded by

\displaystyle  \ll (\log x)^{O(1)} \sup_{e, D \leq Q} \sum_{D < d < 2D} \frac{1}{\phi(d)} \sum_{\psi\ (d)}^* |\sum_n \beta(n) \psi(n) 1_{(n,e)=1}|^2.

Let us first consider the contribution of the small moduli, specifically when {D \leq \log^{C} x} for some fixed {C>0}. From the Siegel-Walfisz theorem (3) we have

\displaystyle  \sum_n \beta(n) \psi(n) 1_{(n,e)=1} \ll \tau(de)^{O(1)} N \log^{-A' + O(C)} x

for any fixed {A'>0}, and then by crude estimates (see Lemma 8 of this previous post) we see that this contribution is acceptable for (11). Thus we may restrict attention to those moduli with {D > \log^{C} x} for any fixed {C}. On the other hand, from Lemma 5 we have

\displaystyle  \sum_{D \leq d \leq 2D} \frac{1}{\phi(d)} \sum_{\psi\ (d)}^* |\sum_n \beta(n) \psi(n) 1_{(n,e)=1}|^2 \ll \frac{N+D^2}{D} \sum_n |\beta(n) 1_{(n,e)=1}|^2

for any {\log^C x \ll D \ll Q}. From crude estimates we have

\displaystyle  \sum_n |\beta(n) 1_{(n,e)=1}|^2 \ll N \log^{O(1)} x.

Since

\displaystyle  \frac{N+D^2}{D} = N/D + D \ll N \log^{-C} x + N x^{-\epsilon}

the claim (11) follows, after choosing {C} sufficiently large depending on {A}.

Now we prove (ii). This time we set {Q := x^{1/2-\epsilon}} for some fixed {\epsilon>0}. By overspill as before, it suffices to show that

\displaystyle  \sum_{q \leq Q} \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\Delta(\alpha \ast \beta; a)| \ll x \log^{-A} x. \ \ \ \ \ (12)

By multiplicative Fourier analysis we have

\displaystyle  \Delta(\alpha \ast \beta; a) = \frac{1}{\phi(q)} \sum_{\chi \neq \chi_0\ (q)} \overline{\chi(a)} \sum_n \alpha \ast \beta(n) \chi(n)

\displaystyle  = \frac{1}{\phi(q)} \sum_{\chi \neq \chi_0\ (q)} \overline{\chi(a)} (\sum_m \alpha \chi(m)) (\sum_n \beta \chi(n))

so by splitting into primitive characters as before we may bound the left-hand side of (12) by

\displaystyle  \ll (\log x)^{O(1)} \sup_{e,D \leq Q} \sum_{D < d < 2D} \frac{1}{\phi(d)} \sum_{\psi\ (d)}^*

\displaystyle  |\sum_m \alpha \psi(m) 1_{(m,e)=1}| |\sum_n \beta \psi(n) 1_{(n,e)=1}|

Arguing as before, the case {D \leq \log^C x} is acceptable, so we assume {\log^C x \leq D \leq x^{1/2-\epsilon}}. From crude estimates we have

\displaystyle  \sum_m |\alpha(m) 1_{(m,e)=1}|^2 \ll M \log^{O(1)} x.

and

\displaystyle  \sum_n |\beta(n) 1_{(n,e)=1}|^2 \ll N \log^{O(1)} x.

From Lemma 5 and Cauchy-Schwarz we can thus bound

\displaystyle  \sum_{D < d < 2D} \frac{1}{\phi(d)} \sum_{\psi\ (d)}^* |\sum_m \alpha \psi(m) 1_{(m,e)=1}| |\sum_n \beta \psi(n) 1_{(n,e)=1}|

by

\displaystyle  \ll \frac{\log^{O(1)} x}{D} N^{1/2} M^{1/2} (N+D^2)^{1/2} (M+D^2)^{1/2};

as {NM} is comparable to {x}, this simplifies to

\displaystyle  \ll x \log^{O(1)} x ( \frac{1}{D} + N^{-1/2} + M^{-1/2} + x^{-1/2} D ),

which is acceptable from the hypotheses on {D, Q, N, M} if {C} is chosen large enough depending on {A}. This concludes the proof of Theorem 4.

We remark that an inspection of the proof reveals that the {x^{-o(1)}} or {x^{-\epsilon}} factors in the threshold {Q} for {q} may be replaced by {\log^{-A'} x} provided that {A'} is sufficiently large depending on {A}. However, this refinement of the Bombieri-Vinogradov inequality does not lead to any improvement in the final numerology for our purposes.
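
The starting point of both parts of the above argument is the expansion of the discrepancy into Dirichlet character sums (the displays preceding (12)). For a prime modulus {q} all the characters can be built from a primitive root, and the identity is easy to confirm numerically; the following sketch (a toy verification only, with an arbitrary test sequence) does this.

    import cmath
    from math import gcd

    def primitive_root(q):
        """Smallest primitive root of the odd prime q (brute force)."""
        for g in range(2, q):
            if len({pow(g, k, q) for k in range(1, q)}) == q - 1:
                return g

    def characters(q):
        """All Dirichlet characters mod the prime q (index 0 is the principal one)."""
        g = primitive_root(q)
        ind = {pow(g, k, q): k for k in range(q - 1)}    # discrete logarithm table
        def chi(t):
            return lambda n: 0 if n % q == 0 else cmath.exp(
                2 * cmath.pi * 1j * t * ind[n % q] / (q - 1))
        return [chi(t) for t in range(q - 1)]

    q = 13
    beta = {n: (n % 7) - 3.0 for n in range(50, 150)}    # arbitrary test sequence
    chars = characters(q)

    def delta(a):
        """Delta(beta; a (q)) as in (2), for the prime modulus q."""
        return (sum(v for n, v in beta.items() if n % q == a)
                - sum(v for n, v in beta.items() if gcd(n, q) == 1) / (q - 1))

    for a in (1, 2, 5):
        fourier = sum(chi(a).conjugate() * sum(v * chi(n) for n, v in beta.items())
                      for chi in chars[1:]) / (q - 1)    # non-principal characters only
        print(a, delta(a), fourier.real)                 # the two numbers agree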

— 2. Completion of sums —

At several stages in the argument we will need to consider sums of the form

\displaystyle  \sum_m \psi_M(m) \sum_{i \in I: m = a_i\ (d)} c_i

for some smooth coefficient sequence {\psi_M}, some congruence classes {a_i\ (d)} depending on a further parameter {i}, and further coefficients {c_i}. The completion of sums technique replaces the {\psi_M(m)} term here with an exponential phase, leaving one with consideration of exponential sums such as

\displaystyle  \sum_{i \in I} c_i e_d(-a_i h)

for various {h}, where {e_d(n) := e^{2\pi i n/d}} for {n \in {\bf Z}/d{\bf Z}} (or {n \in {\bf Z}}) is the standard character on {{\bf Z}/d{\bf Z}}. More precisely, we have

Lemma 6 (Completion of sums) Let {\psi_M} be a smooth coefficient sequence at some scale {1 \ll M \ll x}. Let {I} be a finite set of indices, and for each {i \in I} let {c_i} be a complex number and {a_i\ (d)} be a congruence class for some {d \ll x}. Then we have

\displaystyle  \sum_m \psi_M(m) \sum_{i \in I: m = a_i\ (d)} c_i = \frac{1}{d} (\sum_m \psi_M(m)) (\sum_i c_i) \ \ \ \ \ (13)

\displaystyle  + O( (\log^{O(1)} x) \frac{M}{d} \sum_{1 \leq h \leq x^\epsilon d M^{-1}} |\sum_i c_i e_d( a_i h )| )

\displaystyle  + O( x^{-A} \sum_i |c_i| )

for any fixed {\epsilon,A>0}.

Specialising to the case {I = {\bf Z}/d{\bf Z}} and {a_i = i}, we conclude in particular that

\displaystyle  \sum_m \psi_M(m) c(m) = \frac{1}{d} (\sum_m \psi_M(m)) (\sum_{n \in {\bf Z}/d{\bf Z}} c(n)) \ \ \ \ \ (14)

\displaystyle  + O( (\log^{O(1)} x) \frac{M}{d} \sum_{1 \leq h \leq x^\epsilon d M^{-1}} |\sum_{n \in {\bf Z}/d{\bf Z}} c(n) e_d( n h )| )

\displaystyle  + O( x^{-A} \sum_{n \in {\bf Z}/d{\bf Z}} |c(n)| )

whenever {c: {\bf Z} \rightarrow {\bf C}} is periodic with period {d}.

Proof: We rearrange the left-hand side as

\displaystyle  \sum_{i \in I} c_i \sum_{m} \psi_M(m) 1_{m=a_i\ (d)}.

Using the Fourier expansion

\displaystyle  1_{m = a_i\ (d)} = \frac{1}{d} \sum_{h \in {\bf Z}/d{\bf Z}} e_d( a_i h) e_d(-m h)

and rearranging, the left-hand side then becomes

\displaystyle  \frac{1}{d} \sum_{h \in {\bf Z}/d{\bf Z}} [\sum_m \psi_M(m) e_d(-mh)] \times [\sum_{i \in I} c_i e_d(a_i h)].

The {h=0} term of this is the first term on the right-hand side of (13). The terms coming from integers {h} with {1 \leq |h| \leq x^\epsilon d M^{-1}} can be bounded by the second term in (13), bounding {\sum_m \psi_M(m) e_d(mh)} crudely by {O(M \log^{O(1)} x)} by crude estimates and also using conjugation symmetry when {h} is negative. So it will suffice to show that

\displaystyle  \sum_m \psi_M(m) e_d(-mh) = O( x^{-A-2} )

(say) when {x^\epsilon d M^{-1} \leq |h| \leq d/2}. Writing {\psi_M(m) = \psi(m/M)} as in the definition of a smooth coefficient sequence, and applying Poisson summation, the left-hand side is

\displaystyle  M \sum_n \hat \psi( M n + \frac{Mh}{d} )

where {\hat \psi(t) := \int_{\bf R} e^{-2\pi i st} \psi(s)\ ds} is the Fourier transform of {\psi}. But from the smoothness (4) of {\psi} and integration by parts one has the bounds

\displaystyle  \hat \psi(t) \ll |t|^{-A'} \log^{O(1)} x

for any fixed {A'}, and from the hypothesis {x^\epsilon d M^{-1} \leq |h| \leq d/2} we obtain the claim by taking {A'} large enough depending on {\epsilon,A}. \Box

We remark that in the absence of cancellation in the exponential sum {\sum_i c_i e_d(-a_i h)}, the first error term in (13) could be as large as

\displaystyle  O( x^{\epsilon+o(1)} \sum_i |c_i| )

which is about {x^{\epsilon+o(1)} \frac{d}{M}} times as large as the main term in (13). In practice we will apply this lemma with {d > x^c M} for some fixed {c>0}, in which case completion of sums will cost a factor of {x^c} or so in the bounds. However, it is still often desirable to pay this cost in order to exploit cancellation bounds for exponential sums, in particular those coming from the Weil conjectures as described below.
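
The finite Fourier expansion underlying Lemma 6 is an exact identity before the {h}-sum is truncated, and it can be checked directly. The following sketch does so with an ad hoc smooth cutoff standing in for {\psi_M} (much wider than the support required by Definition 1, but enough to illustrate the algebra); it verifies that the complete {h}-sum reproduces the left-hand side exactly, with the {h=0} term giving the main term in (13).

    import cmath, math

    def e(d, n):
        """The standard additive character e_d(n) = exp(2*pi*i*n/d)."""
        return cmath.exp(2 * cmath.pi * 1j * n / d)

    def psi(t):
        """A smooth bump supported on (0.5, 1.5), standing in for psi_M(m) = psi(m/M)."""
        return math.exp(-1.0 / ((t - 0.5) * (1.5 - t))) if 0.5 < t < 1.5 else 0.0

    M, d = 200, 17
    classes = [(3, 2.0), (5, -1.0), (11, 1.5)]       # pairs (a_i, c_i), classes a_i (d)

    lhs = sum(psi(m / M) * sum(c for a, c in classes if m % d == a % d)
              for m in range(1, 2 * M))

    # completed form: (1/d) sum_h [sum_m psi(m/M) e_d(-mh)] [sum_i c_i e_d(a_i h)]
    completed = sum(
        sum(psi(m / M) * e(d, -m * h) for m in range(1, 2 * M))
        * sum(c * e(d, a * h) for a, c in classes)
        for h in range(d)) / d

    # the h = 0 term, i.e. the main term in (13)
    main = (sum(psi(m / M) for m in range(1, 2 * M)) * sum(c for _, c in classes)) / d

    print(lhs, completed.real, main)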

In our applications, the modulus {d} will split into a product of two factors {q_1 q_2} or three factors {q_1 q_2 q_3}. The following simple lemma lets us then split exponential phases of the form {e_d(-ah)}:

Lemma 7 Suppose that {d=q_1 q_2} for some coprime {q_1, q_2}, and let {a\ (d)} be the intersection of the congruence classes {b_1\ (q_1)} and {b_2\ (q_2)}. Then for any integer {h},

\displaystyle  e_d( a h ) = e_{q_1}( \frac{b_1 h}{q_2} ) e_{q_2}( \frac{b_2 h}{q_1} ).

Similarly, if {d=q_1q_2q_3} for coprime {q_1,q_2,q_3} and {a\ (d)} is the intersection of {b_1\ (q_1)}, {b_2\ (q_2)}, and {b_3\ (q_3)}, then

\displaystyle  e_d( a h ) = e_{q_1}( \frac{b_1 h}{q_2 q_3} ) e_{q_2}( \frac{b_2 h}{q_1 q_3} ) e_{q_3}( \frac{b_3 h}{q_1 q_2} ).

Here and in the sequel we are using the convention that {e_q( \frac{a}{b} )} means {e_q(a \overline{b} )} for {b} coprime to {q}, where {\overline{b}} is the reciprocal of {b} modulo {q}.

Proof: We just prove the first identity, as the second is similar. Let {\bar{q_1}} be an integer such that {q_1 \bar{q_1} = 1 \ (q_2)}, and similarly let {\bar{q_2}} be an integer such that {q_2 \bar{q_2} = 1\ (q_1)}. Then {b_1 q_2 \bar{q_2} + b_2 q_1 \bar{q_1}} is equal to {b_1} mod {q_1} and {b_2} mod {q_2}, and thus equal to {a} mod {q_1 q_2}. Thus

\displaystyle  e_d( ah) = e_d( b_1 hq_2 \bar{q_2} + b_2 h q_1 \bar{q_1} )

and the claim follows by factoring the exponential. \Box
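
Since Lemma 7 is just the Chinese remainder theorem written multiplicatively, it is easy to test numerically; a quick sketch (with arbitrary small moduli and classes) follows.

    import cmath

    def e(q, n):
        return cmath.exp(2 * cmath.pi * 1j * n / q)

    def e_frac(q, num, den):
        """e_q(num/den) = e_q(num * den^{-1} mod q), for (den, q) = 1."""
        return e(q, num * pow(den, -1, q))

    q1, q2 = 7, 11                    # coprime moduli
    b1, b2 = 3, 5                     # classes b1 (q1) and b2 (q2)
    d = q1 * q2
    # a (d) is the intersection of b1 (q1) and b2 (q2), via the Chinese remainder theorem
    a = (b1 * q2 * pow(q2, -1, q1) + b2 * q1 * pow(q1, -1, q2)) % d

    for h in (1, 4, 10):
        left = e(d, a * h)
        right = e_frac(q1, b1 * h, q2) * e_frac(q2, b2 * h, q1)
        print(h, abs(left - right))   # should be ~0 up to rounding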

— 3. The Weil conjectures —

For the purposes of this post, the Weil conjectures (as proven in full generality by Deligne) can be viewed as a black box device to obtain “square root cancellation” for various exponential sums of the form

\displaystyle  \sum_{n \in G} \xi( P(n) )

where {G} and {H} are finite abelian groups (i.e. finite products of cyclic groups), {\xi: H \rightarrow S^1} is an additive character, and {P: G \rightarrow H} is a rational function, basically by obtaining the analogue of the Riemann Hypothesis for a certain zeta function associated to an algebraic variety related to the function {P}. (This is by no means the full strength of the Weil conjectures; amongst other things, one can also twist such sums by multiplicative characters, and also work with more complicated schemes than classical algebraic varieties, though the exponential sum estimates are more difficult to state succinctly as a consequence.) A basic instance of this is Weil’s classical bound on the Kloosterman sum

\displaystyle  S(a,b;m) := \sum_{n \in ({\bf Z}/m{\bf Z})^\times} e_m( an + \frac{b}{n} ), \ \ \ \ \ (15)

defined whenever {a,b,m} are integers with {m} positive.

Theorem 8 (Weil bound) For any {a,b,m} one has

\displaystyle  |S(a,b;m)| \leq \tau(m) (a,b,m)^{1/2} m^{1/2}

where {(a,b,m)} is the greatest common divisor of {a,b,m}.

Proof: (Sketch only; see e.g. Iwaniec-Kowalski for a full proof.) For simplicity we only address the case of square-free {m}, which is the case needed in our application. By the Chinese remainder theorem we may reduce to the case when {m} is a prime {q}; we may then also reduce to the case when {a,b} are coprime to {q}. It then suffices to show that

\displaystyle  |\sum_{(x,y) \in C({\bf F}_q)} \psi(ax+by)| \leq 2 q^{1/2} \ \ \ \ \ (16)

whenever {\psi: {\bf F}_q \rightarrow S^1} is an additive character and {C({\bf F}_q)} is the hyperbola

\displaystyle  C({\bf F}_q) := \{ (x,y) \in {\bf F}_q \times {\bf F}_q: xy = 1 \}.

An important trick is then to generalise from {{\bf F}_q} to finite dimensional extensions {{\bf F}_{q^n}} of {{\bf F}_q}, and then consider the Kloosterman sums

\displaystyle  S_n(\psi) := \sum_{(x,y) \in C({\bf F}_{q^n})} \psi( \hbox{Tr}(ax+by) )

associated to these extensions, where {\hbox{Tr}: {\bf F}_{q^n} \rightarrow {\bf F}_q} is the field trace. For non-principal {\psi}, it is possible to show the explicit formula

\displaystyle  S_n(\psi) = - \alpha_\psi^n - \beta_\psi^n \ \ \ \ \ (17)

for some complex numbers {\alpha_\psi,\beta_\psi} (depending of course on {a,b}); this is part of the general theory of {L}-functions associated to algebraic varieties, but can also be established by elementary means (e.g. by establishing a linear recurrence for the {S_n}). For the principal character {\psi = \psi_0}, of course, one has

\displaystyle  S_n(\psi_0) = q^n - 1.

Next, one observes the basic identity

\displaystyle  \sum_\psi \psi(\hbox{Tr}(x)) = | \{ z \in {\bf F}_{q^n}: z^q-z = x \}|

as can be seen by computing the kernel and range of the linear map {z \mapsto z^q-z} on {{\bf F}_{q^n}} (this identity is also related to Hilbert’s Theorem 90). From this we have a combinatorial interpretation of the quantity

\displaystyle  \sum_\psi S_n(\psi),

namely as the number of points on the curve

\displaystyle  \{ (x,y,z) \in {\bf F}_{q^n}^3: z^q-z = ax+by; xy=1 \}.

One can show (e.g. using Stepanov’s method, cf. this previous post) that the number of points on this curve is equal to {q^n + O(q^{n/2+ O(1)})}, leading to the identity

\displaystyle  q^n - 1 - \sum_{\psi \neq \psi_0} (\alpha_\psi^n +\beta_\psi^n) = q^n + O(q^{n/2+ O(1)}).

Studying the asymptotics as {n \rightarrow \infty}, one is led to the conclusion that {|\alpha_\psi|, |\beta_\psi| \leq \sqrt{q}} (this trick to “magically” delete the {O(q^{O(1)})} error is a canonical example of the tensor power trick), and the bound (16) then follows from the {n=1} case of (17). \Box

In practice, we shall estimate {\tau(m)} crudely by the divisor bound {\tau(m) = m^{o(1)}} (where {o(1)} denotes a quantity that goes to zero as {m \rightarrow \infty}), and the factor {(a,b,m)} will also be small in applications, so that we do indeed see the square root savings over the trivial bound {|S(a,b;m)| \leq m}. For the Type I/II sums, the classical Weil bound is sufficient; but for the Type III sums that we will cover in a subsequent post, the full force of Deligne’s results is needed.

An important remark is that when {a=0}, we can apply the change of variables {n \mapsto 1/n} and convert the Kloosterman sum {S(0,b;m)} into a Ramanujan sum

\displaystyle  S(0,b;m) = \sum_{n \in ({\bf Z}/m{\bf Z})^\times} e_m(bn)

which enjoys even better cancellation than square root cancellation; in particular it is not difficult to establish the bound

\displaystyle  |S(0,b;m)| \ll m^{o(1)} (b,m) \ \ \ \ \ (18)

using the Chinese remainder theorem to reduce to the case when {m} is a power of a prime, and then using the divisor bound.
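
Both the Weil bound of Theorem 8 and the Ramanujan sum bound (18) are easy to test numerically by brute force; the short sketch below does so for a handful of small squarefree moduli (this is of course only a sanity check of the inequalities, not a proof, and the parameters are arbitrary).

    import cmath
    from math import gcd

    def e(m, n):
        return cmath.exp(2 * cmath.pi * 1j * n / m)

    def tau(m):
        return sum(1 for d in range(1, m + 1) if m % d == 0)

    def kloosterman(a, b, m):
        """S(a,b;m) = sum over n coprime to m of e_m(a n + b/n), as in (15)."""
        return sum(e(m, a * n + b * pow(n, -1, m))
                   for n in range(1, m + 1) if gcd(n, m) == 1)

    for (a, b, m) in [(1, 1, 101), (2, 7, 561), (3, 5, 462)]:
        S = kloosterman(a, b, m)
        weil = tau(m) * gcd(a, gcd(b, m)) ** 0.5 * m ** 0.5
        print(m, abs(S), weil)        # |S| should not exceed the Weil bound

    # Ramanujan sums S(0,b;m): far more cancellation, |S(0,b;m)| is of size about (b,m)
    for (b, m) in [(1, 210), (6, 210), (30, 210)]:
        R = kloosterman(0, b, m)
        print(m, b, abs(R), gcd(b, m))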

For the Type I and Type II sums we need a more complicated variant of this bound (Lemma 11 of Zhang’s paper):

Lemma 9 Let {d_1,d_2} be square-free natural numbers, let

\displaystyle d := [d_1,d_2]; \quad d_0 := (d_1,d_2); \quad t_1 := d_1/d_0, \quad t_2 := d_2/d_0,

and let {c_1, c_2, l, m} be integers. Then the double Kloosterman sum

\displaystyle  K(d_1,c_1; d_2,c_2; l,m) := \sum_{n \in {\bf Z}/d{\bf Z}: (n,d_1) = (n+l,d_2)=1}

\displaystyle  e_{d_1}( \frac{c_1}{n} ) e_{d_2}( \frac{c_2}{n+l} ) e_d( mn )

obeys the bound

\displaystyle  |K(d_1,c_1; d_2,c_2; l,m)| \leq d_0 |S(m_1,b_1; t_1)| |S(m_2,b_2; t_2)|

for some {m_1,m_2,b_1,b_2} with

\displaystyle  (m_i,b_i,t_i) | (m,c_i,d_i)

for {i=1,2}. In particular, from Theorem 8, we have

\displaystyle  |K(d_1,c_1; d_2,c_2; l,m)| \ll (d_1 d_2)^{o(1)}

\displaystyle  (m,c_1,d_1)^{1/2} (m,c_2,d_2)^{1/2} d_1^{1/2} d_2^{1/2}

while in the {m=0} case we have from (18) that

\displaystyle  |K(d_1,c_1; d_2,c_2; l,0)| \ll (d_1 d_2)^{o(1)} d_0 (c_1,d_1) (c_2,d_2).

Here {o(1)} denotes a quantity that goes to zero as {d_1,d_2 \rightarrow \infty}.

Proof: As {d_0,t_1,t_2} are coprime, we may refactor {e_{d_1}( \frac{c_1}{n} ) e_{d_2}( \frac{c_2}{n+l} ) e_d( mn )} as

\displaystyle  e_{d_0}( \frac{a_1}{n} + \frac{a_2}{n+l} + m_0 n ) e_{t_1}( \frac{b_1}{n} + m_1 n ) e_{t_2}( \frac{b_2}{n+l} + m_2 n )

for some {a_1,a_2,m_0 \in {\bf Z}/d_0{\bf Z}}, {b_1,m_1 \in {\bf Z}/t_1{\bf Z}} and {b_2,m_2 \in {\bf Z}/t_2{\bf Z}}, with

\displaystyle  m = m_0 t_1 t_2 + m_1 d_0 t_2 + m_2 d_0 t_1\ (d_0 t_1 t_2)

and

\displaystyle  c_i = a_i t_i + b_i d_0\ (d_i)

for {i=1,2}, which in particular implies that {(m_i,b_i,t_i) | (m,c_i,d_i)} as claimed. Using the Chinese remainder theorem, we may now factor {K(d_1,c_1; d_2,c_2; l,m)} as the product of the sums

\displaystyle  \sum_{n \in {\bf Z}/d_0{\bf Z}: (n,d_0)=(n+l,d_0)=1} e_{d_0}( \frac{a_1}{n} + \frac{a_2}{n+l} + m_0 n ),

\displaystyle  \sum_{n \in {\bf Z}/t_1{\bf Z}: (n,t_1) = 1} e_{t_1}( \frac{b_1}{n} + m_1 n )

and

\displaystyle  \sum_{n \in {\bf Z}/t_2{\bf Z}: (n+l,t_2) = 1} e_{t_2}( \frac{b_2}{n+l} + m_2 n ).

Bounding the first sum trivially by {d_0} and shifting the third sum by {l} we obtain the claim (note that {d_0 t_1^{1/2} t_2^{1/2} = d_1^{1/2} d_2^{1/2}}). \Box

The treatment of the {d_0} terms in the above analysis is crude, but in applications {d_0} is often trivial anyway, so it is not essential to obtain the sharpest estimates here.
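
As with Theorem 8, the bound of Lemma 9 can be checked numerically for small moduli. One explicit form of the bound that follows from the proof and Theorem 8 is {|K| \leq \tau(d_1) \tau(d_2) (m,c_1,d_1)^{1/2} (m,c_2,d_2)^{1/2} d_1^{1/2} d_2^{1/2}}; the sketch below computes the double Kloosterman sum by brute force and compares it with this quantity (the parameters are arbitrary small squarefree numbers).

    import cmath
    from math import gcd

    def e(q, n):
        return cmath.exp(2 * cmath.pi * 1j * n / q)

    def tau(m):
        return sum(1 for d in range(1, m + 1) if m % d == 0)

    def double_kloosterman(d1, c1, d2, c2, l, m):
        """K(d1,c1; d2,c2; l,m): sum over n mod [d1,d2] with (n,d1)=(n+l,d2)=1
        of e_{d1}(c1/n) e_{d2}(c2/(n+l)) e_{[d1,d2]}(m n)."""
        d = d1 * d2 // gcd(d1, d2)
        total = 0
        for n in range(d):
            if gcd(n, d1) == 1 and gcd(n + l, d2) == 1:
                total += (e(d1, c1 * pow(n, -1, d1))
                          * e(d2, c2 * pow(n + l, -1, d2))
                          * e(d, m * n))
        return total

    for (d1, c1, d2, c2, l, m) in [(15, 2, 21, 4, 1, 3), (35, 1, 33, 2, 2, 0)]:
        K = double_kloosterman(d1, c1, d2, c2, l, m)
        g1, g2 = gcd(m, gcd(c1, d1)), gcd(m, gcd(c2, d2))
        bound = tau(d1) * tau(d2) * (g1 * g2) ** 0.5 * (d1 * d2) ** 0.5
        print(d1, d2, m, abs(K), bound)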

We can combine this with the method of completion of sums:

Corollary 10 Let {d_1,d_2} be square-free natural numbers with {d_1,d_2 \ll x}, let {c_1,c_2,l} be integers, and let {\psi_N} be a smooth coefficient sequence at a scale {1 \ll N \ll x}. Then

\displaystyle  | \sum_{(n,d_1)=(n+l,d_2)=1} e_{d_1}( \frac{c_1}{n} ) e_{d_2}( \frac{c_2}{n+l} ) \psi_N(n) |

\displaystyle  \ll x^{o(1)} ( (d_1 d_2)^{1/2} + \frac{N (d_1,d_2)^2}{d_1 d_2} (c_1,d_1) (c_2,d_2) ).

Proof: Write {d := [d_1,d_2]}. By (14) (and overspill) we may bound the left-hand side by

\displaystyle  \ll \frac{1}{d} (\sum_m \psi_N(m)) (\sum_{n \in {\bf Z}/d{\bf Z}} e_{d_1}( \frac{c_1}{n} ) e_{d_2}( \frac{c_2}{n+l} ) )

\displaystyle  + x^{o(1)} \frac{N}{d} \sum_{1 \leq m \leq x^{o(1)} d N^{-1}} |\sum_{n \in {\bf Z}/d{\bf Z}} e_{d_1}( \frac{c_1}{n} ) e_{d_2}( \frac{c_2}{n+l} ) e_d(mn)|

\displaystyle  + x^{-A} d,

where we suppress the conditions {(n,d_1)=(n+l,d_2)} from the {n} summation for brevity. The first term is {O( x^{o(1)} \frac{N}{d} (d_1,d_2) (c_1,d_1) (c_2,d_2))} by Lemma 9, which is acceptable. Another application of Lemma 9 gives

\displaystyle  |\sum_{n \in {\bf Z}/d{\bf Z}} e_{d_1}( \frac{c_1}{n} ) e_{d_2}( \frac{c_2}{n+l} ) e_d(mn)|

\displaystyle  \ll x^{o(1)} (m,c_1,d_1)^{1/2} (m,c_2,d_2)^{1/2} d_1^{1/2} d_2^{1/2}.

From the divisor bound one has

\displaystyle  \sum_{1 \leq m \leq M} (m,q) \ll q^{o(1)} M

for any {M,q \geq 1}, so from this and Cauchy-Schwarz the net contribution of the second term is

\displaystyle  \ll x^{o(1)} \frac{N}{d} (x^{o(1)} d N^{-1}) d_1^{1/2} d_2^{1/2}

which is acceptable. \Box

— 4. Factoring a smooth number —

We will need to take advantage of the smooth nature of the variable {q} to factor it into two smaller pieces. We need an elementary lemma:

Lemma 11 Let {D} be a quantity of size {1 \ll D \ll x}, and set

\displaystyle  D_0 := \exp( \log^{1/3} x )

(say). Let {A > 0} be fixed. Then all but {O(D \log^{-A} x)} of the integers {d \in [D,2D]} have the property that

\displaystyle  \prod_{p|d: p \leq D_0} p \leq \exp( \log^{2/3} x ); \ \ \ \ \ (19)

in particular,

\displaystyle  \prod_{p|d: p \leq D_0} p = x^{o(1)}.

Proof: Suppose that (19) failed, thus

\displaystyle  \prod_{p|d: p \leq D_0} p > \exp( \log^{2/3} x ) = D_0^{\log^{1/3} x}.

In particular, we see that {d} has at least {\log^{1/3} x} prime factors, which implies in particular that

\displaystyle  \tau(d) \geq 2^{\log^{1/3} x}.

On the other hand, we have the standard bound

\displaystyle  \sum_{D \leq d \leq 2D} \tau(d) \ll D \log x

and the claim now follows from Markov’s inequality. \Box

Corollary 12 (Good factorisation) Let {I = [1,x^\delta]}, and let {R, D} be quantities such that

\displaystyle  1 \ll R \ll D \ll x.

Let {A>0} be fixed. Then all but {O(D\log^{-A} x)} of the {d \in {\mathcal S}_I \cap [D,2D]} have a factorisation

\displaystyle  d = qr

where {q,r \in {\mathcal S}_I} are coprime with

\displaystyle  x^{-o(1)} R \ll r \ll x^{\delta+o(1)} R.

Furthermore {q} has no prime factors less than {D_0 := \exp(\log^{1/3} x)}, thus

\displaystyle  q \in {\mathcal S}_{I'}

where {I' := [D_0,x^\delta]}.

The fact that {q} has no tiny (i.e. less than {D_0}) prime factors will imply that any two such {q} will typically be coprime to each other with high probability (at least {1-O(\log^{-A} x)} for any fixed {A}), which is a key technical fact which we will need to exploit later. (The {W}-trick achieves a qualitatively similar effect, but would only give such a claim with probability {1-o(1)} or maybe {1-O(\log^{-c} x)} for some small {c>0} if one really optimised it, which is insufficient for the applications at hand.)

Proof: By Lemma 11 we may restrict attention to those {d \in {\mathcal S}_I \cap [D,2D]} for which

\displaystyle  \prod_{p|d: p \leq D_0} p = x^{o(1)}.

Now {d} is the product of distinct primes of size at most {x^\delta}, with {d \gg R}. Applying the greedy algorithm, one can then find a factor {r'} of {d} with

\displaystyle  R \ll r' \ll x^{\delta} R.

If one then multiplies {r'} by all primes of size less than {D_0} that divide {d/r'} to create {r}, and then sets {q := d/r}, the claim follows. \Box
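
The greedy construction in this proof is a genuinely algorithmic step: one multiplies prime factors of {d} together until the target size {R} is first exceeded (overshooting by at most one prime factor, i.e. by a factor of at most {x^\delta}), and then absorbs the primes below {D_0} into the factor {r}. A minimal sketch of one way to implement this (on an artificial example, with no attempt to reproduce the actual ranges occurring in the argument) follows.

    def greedy_factor(prime_factors, R, D0):
        """Split a squarefree d (given by its list of prime factors) as d = q * r,
        with r just above R and with q having no prime factor below D0
        (cf. Corollary 12); this is one possible implementation of the greedy step."""
        r = 1
        remaining = sorted(prime_factors, reverse=True)   # largest primes first
        while r <= R and remaining:                       # overshoot R by at most
            r *= remaining.pop(0)                         # one prime factor
        q = 1
        for p in remaining:
            if p < D0:
                r *= p                                    # absorb the tiny primes into r
            else:
                q *= p
        return q, r

    # illustrative example: d is the product of the primes between 5 and 60
    primes = [7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59]
    q, r = greedy_factor(primes, R=10 ** 6, D0=20)
    d = 1
    for p in primes:
        d *= p
    print(q, r, q * r == d, r >= 10 ** 6)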

— 5. The dispersion method —

We begin the proof of Theorem 3. The reader may wish to track the exponents involved in the model regime

\displaystyle  \varpi, \delta \approx 0; \quad 0 < \sigma < 1/8; \quad M \approx x^{1/2+\sigma}; \quad N \approx x^{1/2-\sigma}. \ \ \ \ \ (20)

We can restrict {q} to the range

\displaystyle  q \geq x^{1/2-o(1)}

for some sufficiently slowly decaying {o(1)}, since otherwise we may use the Bombieri-Vinogradov theorem (Theorem 4). Thus we need to show that

\displaystyle  \sum_{q \in {\mathcal S}_I: x^{1/2-o(1)} \leq q< x^{1/2+2\varpi}} |\Delta(\alpha \ast \beta; a_q)| \ll NM \log^{-A} x. \ \ \ \ \ (21)

Let

\displaystyle  \mu > 0 \ \ \ \ \ (22)

be an exponent to be optimised later (in many cases, such as (20), it can be set very close to zero). By Corollary 12, outside of a small number of exceptions, we may factor {q=q'r} where {q' \in {\mathcal S}_{I'}} with {I' := I \cap [D_0,x^\delta]}, {r \in {\mathcal S}_I} is coprime to {q'}, and

\displaystyle  x^{-\mu-\delta-o(1)} N \ll r \ll x^{-\mu+o(1)} N

and

\displaystyle  x^{1/2-o(1)} \ll q'r \ll x^{1/2+2\varpi+o(1)}.

Let us first dispose of the set {{\mathcal E}} of exceptional values of {q} for which the above factorisation is unavailable. From Corollary 12 we have

\displaystyle  \sum_{q \in {\mathcal S}_I \cap {\mathcal E}: x^{1/2-o(1)} \leq q< x^{1/2+2\varpi}} \frac{x}{q} \ll x \log^{-A+O(1)} x.

On the other hand, we have the crude estimate

\displaystyle  |\Delta(\alpha \ast \beta; a_q)| \ll \frac{x}{q} \tau(q)^{O(1)} \log^{O(1)} x

which when combined with crude estimates leads to the crude upper bound

\displaystyle  \sum_{q: x^{1/2-o(1)} \leq q< x^{1/2+2\varpi}} \frac{q}{x} |\Delta(\alpha \ast \beta; a_q)|^2 \ll x \log^{O(1)} x.

Applying Cauchy-Schwarz we conclude that

\displaystyle  \sum_{q \in {\mathcal S}_I \cap {\mathcal E}: x^{1/2-o(1)} \leq q< x^{1/2+2\varpi}} |\Delta(\alpha \ast \beta; a_q)| \ll x \log^{-A/2+O(1)} x,

and so {{\mathcal E}} gives a negligible contribution to (21) (increasing {A} as necessary).

For the non-exceptional {q \not \in {\mathcal E}}, we arbitrarily select a factorisation {q = q'r} of the above form, and apply a dyadic decomposition. We conclude that it suffices to show that

\displaystyle  \sum_{q' \in {\mathcal S}_{I'}: Q \ll q' \ll Q} \sum_{r \in {\mathcal S}_I: R \ll r \ll R; (q',r)=1} |\Delta(\alpha \ast \beta; a_{q'r})| \ll NM \log^{-A} x.

for any fixed {A > 0}, where {Q, R \geq 1} obey the size conditions

\displaystyle  x^{-\mu-\delta-o(1)} N \ll R \ll x^{-\mu+o(1)} N \ \ \ \ \ (23)

and

\displaystyle  x^{1/2-o(1)} \ll QR \ll x^{1/2 + 2\varpi}. \ \ \ \ \ (24)

Fix {Q,R}. We replace {q'} by {q}, and abbreviate {\sum_{q \in {\mathcal S}_{I'}: Q \ll q \ll Q}} and {\sum_{r \in {\mathcal S}_I: R \ll r \ll R}} by {\sum_q} and {\sum_r} respectively, thus our task is to show that

\displaystyle  \sum_q \sum_{r: (q,r)=1} |\Delta(\alpha \ast \beta; a_{qr})| \ll NM \log^{-A} x.

We now split the discrepancy

\displaystyle  \Delta(\alpha \ast \beta; a_{qr}) = \sum_{n = a_{qr}\ (qr)} \alpha \ast \beta(n) - \frac{1}{\phi(qr)} \sum_{n: (n,qr)=1} \alpha \ast \beta(n)

as the sum of the subdiscrepancies

\displaystyle  \sum_{n: n = a_{qr}\ (qr)} \alpha \ast \beta(n) - \frac{1}{\phi(q)} \sum_{n: (n,q)=1; n = a_r\ (r)} \alpha \ast \beta(n)

and

\displaystyle  \frac{1}{\phi(q)} \sum_{n: (n,q)=1; n = a_r\ (r)} \alpha \ast \beta(n) - \frac{1}{\phi(qr)} \sum_{n: (n,qr)=1} \alpha \ast \beta(n).

Of the two, the first discrepancy is significantly more difficult to handle. By the triangle inequality, it will suffice to show that

\displaystyle  \sum_{q} \sum_{r; (q,r)=1} |\sum_{n: n = a_{qr}\ (qr)} \alpha \ast \beta(n) - \frac{1}{\phi(q)} \sum_{n: (n,q)=1; n = a_r\ (r)} \alpha \ast \beta(n)| \ \ \ \ \ (25)

\displaystyle  \ll NM \log^{-A} x

and

\displaystyle  \sum_{q} \sum_{r; (q,r)=1} |\frac{1}{\phi(q)} \sum_{n: (n,q)=1; n = a_r\ (r)} \alpha \ast \beta(n) - \frac{1}{\phi(qr)} \sum_{n: (n,qr)=1} \alpha \ast \beta(n)| \ll \ \ \ \ \ (26)

\displaystyle  NM \log^{-A} x.

We begin with (26), which is easier. For each fixed {q}, it will suffice to show that

\displaystyle  \sum_{r; (q,r)=1} |\sum_{n: (n,q)=1; n = a_r\ (r)} \alpha \ast \beta(n) - \frac{1}{\phi(r)} \sum_{n: (n,qr)=1} \alpha \ast \beta(n)| \ll NM \log^{-A} x,

as the claim then follows by dividing by {\phi(q)} and summing using standard estimates (see Lemma 8 of this previous post). But this claim follows from the Bombieri-Vinogradov theorem (Theorem 4), after restricting {\alpha,\beta} to the integers coprime to {q} (which does not affect the property of being a coefficient sequence supported at a certain scale, nor does it affect the Siegel-Walfisz theorem).

Now we establish (25). Here we will not take advantage of the {r} summation, and use crude estimates to reduce to showing that

\displaystyle  \sum_{q; (q,r)=1} |\sum_{n: n = a_q\ (q); n = a_r\ (r)} \alpha \ast \beta(n) - \frac{1}{\phi(q)} \sum_{n: (n,q)=1; n = a_r\ (r)} \alpha \ast \beta(n)| \ \ \ \ \ (27)

\displaystyle \ll NM R^{-1} \tau(r)^{O(1)} \log^{-A} x

for each individual {r \in {\mathcal S}_I} with {R \ll r \ll R}, which we now fix. Actually, we will prove the more general statement

\displaystyle  \sum_{q; (q,r)=1} |\sum_{n: n = b_q\ (q); n = a_r\ (r)} \alpha \ast \beta(n) - \sum_{n: n = b'_q\ (q); n = a_r\ (r)} \alpha \ast \beta(n)| \ \ \ \ \ (28)

\displaystyle  \ll NM R^{-1} \tau(r)^{O(1)} \log^{-A} x

whenever {(b_q)_{q \in {\mathcal S}_{I'}}, (b'_q)_{q \in {\mathcal S}_{I'}}} are singleton congruence class systems with controlled multiplicity. Let us see how (28) implies (27). Observe that if {t = o(x)} is any integer, then {(\{a_q\})_{q \in {\mathcal S}_{I'_t}}} and {(\{a_q+t\})_{q \in {\mathcal S}_{I'_t}}} are also singleton congruence class systems with controlled multiplicity, where {I'_t} consists of the primes {p \in I'} with {a_p+t} not divisible by {p} (thus {q \in {\mathcal S}_{I'}} lies in {{\mathcal S}_{I'_t}} when {a_q+t} is coprime to {q}). By (28) we thus have

\displaystyle  \sum_{q; (q,r)=1} 1_{(a_q+t,q)=1} |\sum_{n: n = a_q\ (q); n = a_r\ (r)} \alpha \ast \beta(n)

\displaystyle - \sum_{n: n = a_q + t\ (q); n = a_r\ (r)} \alpha \ast \beta(n)|

\displaystyle \ll NM R^{-1} \tau(r)^{O(1)} \log^{-A} x.

If we average {t} over the interval {T := {\bf Z} \cap [-x^{1-\epsilon},x^{1-\epsilon}]} for some sufficiently small fixed {\epsilon>0}, we observe that

\displaystyle  \frac{1}{|T|} \sum_{t \in T} 1_{(a_q+t,q)=1} = \frac{\phi(q)}{q} + O( Q |T|^{-1} )

and similarly (using the crude divisor bound)

\displaystyle  \frac{1}{|T|} \sum_{t \in T} 1_{(a_q+t,q)=1} \sum_{n: (n,q)=1; n = a_q + t\ (q); n = a_r\ (r)} \alpha \ast \beta(n)

\displaystyle = \frac{1}{q} \sum_{n: (n,q)=1; n = a_r\ (r)} \alpha \ast \beta(n) + O( NM |T|^{-1} x^{o(1)} ).

(The condition {(n,q)=1} is redundant, but we include it for emphasis.) We conclude from the triangle inequality and the bounds on {Q,R} (if {\epsilon} is small enough) that

\displaystyle  \sum_{q; (q,r)=1} \frac{\phi(q)}{q} |\sum_{n: n = a_q\ (q); n = a_r\ (r)} \alpha \ast \beta(n) - \frac{1}{\phi(q)} \sum_{n: (n,q)=1; n = a_r\ (r)} \alpha \ast \beta(n)|

\displaystyle  \ll NM R^{-1} \tau(r)^{O(1)} \log^{-A} x.

On the other hand, from crude estimates one has

\displaystyle  \sum_{q; (q,r)=1} \frac{q}{\phi(q)} |\sum_{n: n = a_q\ (q); n = a_r\ (r)} \alpha \ast \beta(n) - \frac{1}{\phi(q)} \sum_{n: (n,q)=1; n = a_r\ (r)} \alpha \ast \beta(n)|

\displaystyle  \ll NM R^{-1} \log^{O(1)} x

and the claim (27) then follows from Cauchy-Schwarz (noting from the Chinese remainder theorem that the two constraints {n = a_q\ (q), n = a_r\ (r)} are equivalent to the single constraint {n = a_{qr}\ (qr)}).

It remains to prove (28). We will use the dispersion method (or Cauchy-Schwarz), playing the two congruence conditions {n = b_q\ (q)} and {n = b'_q\ (q)} against each other. We first get rid of the absolute values in (28) by introducing an additional bounded coefficient. More precisely, to prove (28) it suffices to show that

\displaystyle  |\sum_{q; (q,r)=1} c_q (\sum_{n: n = b_q\ (q); n = a_r\ (r)} \alpha \ast \beta(n) - \sum_{n: n = b'_q\ (q); n = a_r\ (r)} \alpha \ast \beta(n))|

\displaystyle \ll NM R^{-1} \tau(r)^{O(1)} \log^{-A} x

for any bounded real coefficients {c_q = O(1)}. We expand out the Dirichlet convolution

\displaystyle  \alpha \ast \beta(n) = \sum_{m,n': mn' = n} \alpha(m) \beta(n')

then relabel {n'} as {n} to rearrange the left-hand side as

\displaystyle  |\sum_{m} \alpha(m) \sum_{q,n: mn = a_r\ (r); (q,r)=1} c_{q} \beta(n) (1_{mn = b_{q}\ (q)} - 1_{mn = b'_{q}\ (q)})|.

We can write {\alpha = \alpha \psi_M} for some smooth non-negative coefficient sequence {\psi_M} at scale {M}. From crude bounds one has

\displaystyle  \sum_{m} \alpha^2(m) \psi_M(m)\ll M \log^{O(1)} x

so by the Cauchy-Schwarz inequality it suffices to show that

\displaystyle  \sum_{m} \psi_M(m) |\sum_{q,n: mn = a_r\ (r); (q,r)=1} c_{q} \beta(n) (1_{mn = b_{q}\ (q)} - 1_{mn = b'_{q}\ (q)})|^2 \ \ \ \ \ (29)

\displaystyle  \ll N^2 M R^{-2} \tau(r)^{O(1)} \log^{-A} x

for any fixed {A>0}. (As a sanity check, note that we are still only asking for a {\log^{-A} x} savings over the trivial bound.) Expanding out the square, it suffices to show that

\displaystyle  \sum_{m} \psi_M(m) \sum_{q_1,q_2,n_1,n_2: mn_1=mn_2 = a_r\ (r); (q_1,r)=(q_2,r)=1} \ \ \ \ \ (30)

\displaystyle  c_{q_1} c_{q_2} \beta(n_1) \beta(n_2) 1_{mn_1 = b_{q_1}\ (q_1)} 1_{mn_2 = b'_{q_2}\ (q_2)}

\displaystyle  = X + O( N^2 M R^{-2} \tau(r)^{O(1)} \log^{-A} x )

where {q_1,q_2} are subject to the same constraints as {q} (thus {q_i \in {\mathcal S}_{I'}} and {Q \ll q_i \ll Q} for {i=1,2}), and {X} is some quantity that is independent of the choice of congruence classes {(b_q)_{q \in {\mathcal S}_{I'}}}, {(b'_q)_{q \in {\mathcal S}_{I'}}}, since by replacing the {b_q} with {b'_q} or vice versa as necessary one can express (29) as a linear combination {S_1-2S_2+S_3} of terms {S_1,S_2,S_3} of the form in (30) in such a way that all the {X} terms cancel out ({X - 2X + X = 0}).

At this stage we need to deal with a technical problem that {q_1,q_2} may share a common factor; fortunately, this event turns out to be negligible (but only thanks to the controlled multiplicity hypothesis (6)). More precisely, we split (30) into the coprime case

\displaystyle  \sum_{m} \psi_M(m) \sum_{q_1,q_2,n_1,n_2: mn_1=mn_2 = a_r\ (r); (q_1,r)=(q_2,r)=(q_1,q_2)=1} \ \ \ \ \ (31)

\displaystyle  c_{q_1} c_{q_2} \beta(n_1) \beta(n_2) 1_{mn_1 = b_{q_1}\ (q_1)} 1_{mn_2 = b'_{q_2}\ (q_2)}

\displaystyle = X + O( N^2 M R^{-2} \log^{-A} x )

and the non-coprime case

\displaystyle  |\sum_{q_0>1} \sum_{m} \psi_M(m) \sum_{q_1,q_2,n_1,n_2: mn_1=mn_2 = a_r\ (r); (q_1,r)=(q_2,r)=1; (q_1,q_2)=q_0} \ \ \ \ \ (32)

\displaystyle  c_{q_1} c_{q_2} \beta(n_1) \beta(n_2) 1_{mn_1 = b_{q_1}\ (q_1)} 1_{mn_2 = b'_{q_2}\ (q_2)}|

\displaystyle \ll N^2 M R^{-2} \tau(r)^{O(1)} \log^{-A} x

We first show (32). The basic point here is that because we have previously restricted {q_1,q_2} to have no prime factor smaller than {D_0}, we can gain a factor of {D_0^{-1}} in (32), which is strong enough to overcome logarithmic losses {\log^{O(1)} x} but not losses {x^{o(1)}} coming from the divisor bound. To avoid using the divisor bound we will need the controlled multiplicity hypothesis (6) (and this is the only place in the argument where this hypothesis is actually used).

We turn to the details. The quantities {q_0,q_1,q_2} must all lie in {{\mathcal S}_{I'}}. We may split {q_1 = q_0 q'_1} and {q_2 = q_0 q'_2} where {q'_1,q'_2} are coprime. By the Chinese remainder theorem, the constraints {mn_1 = b_{q_1}\ (q_1)} and {mn_2 = b'_{q_2}\ (q_2)} then imply that

\displaystyle  mn_1 = b_{q_0}\ (q_0); \quad mn_2 = b'_{q_0}\ (q_0); \quad mn_1 = b_{q'_1}\ (q'_1); \quad mn_2 = b'_{q'_2}\ (q'_2)

so by the triangle inequality and interchange of summation we can bound the left-hand side by

\displaystyle  \ll \sum_{q_0>1} \sum_{m} \psi_M(m) \sum_{n_1,n_2: mn_1=mn_2 =a_r\ (r); mn_1 = b_{q_0}\ (q_0); mn_2 = b'_{q_0}\ (q_0)}

\displaystyle  |\beta(n_1)| |\beta(n_2)| \sum_{q'_1,q'_2} 1_{mn_1 = b_{q'_1}\ (q'_1); mn_2 = b'_{q'_2}\ (q'_2)}

(with {q_0} understood to lie in {{\mathcal S}_{I'}} and be at most {Q}) which we can write as

\displaystyle  \ll \sum_{q_0>1} \sum_{m} \psi_M(m) \sum_{n_1,n_2: mn_1=mn_2 =a_r\ (r); mn_1 = b_{q_0}\ (q_0); mn_2 = b'_{q_0}\ (q_0)}

\displaystyle |\beta(n_1)| |\beta(n_2)| \tau_b(mn_1) \tau_{b'}(mn_2)

where

\displaystyle  \tau_b(n) := |\{ q \in {\mathcal S}_{I'}: n = b_q\ (q) \}|

and

\displaystyle  \tau_{b'}(n) := |\{ q \in {\mathcal S}_{I'}: n = b'_q\ (q) \}|

are the multiplicity functions associated to the congruence class systems {(\{b_q\})_{q \in {\mathcal S}_{I'}}} and {(\{b'_q\})_{q \in {\mathcal S}_{I'}}}.

By the elementary bound {\tau_b(mn_1) \tau_{b'}(mn_2) \leq \tau_b(mn_1)^2 + \tau_{b'}(mn_2)^2} and symmetry we may thus bound the left-hand side of (32) by

\displaystyle  \ll \sum_{q_0>1} \sum_{m} \psi_M(m) \sum_{n_1,n_2: mn_1=mn_2 =a_r\ (r); mn_1 = b_{q_0}\ (q_0); mn_2 = b'_{q_0}\ (q_0)}

\displaystyle  |\beta(n_1)| |\beta(n_2)| \tau_b(mn_1)^2

plus a symmetric term which is treated similarly and will be ignored.

We rearrange the constraints

\displaystyle mn_1=mn_2=a_r\ (r); \quad mn_1 = b_{q_0}\ (q_0); \quad mn_2 = b'_{q_0}\ (q_0)

as a combination of the constraints

\displaystyle  n_1 = n_2\ (r); \quad b'_{q_0} n_1 = b_{q_0} n_2\ (q_0)

and

\displaystyle  mn_1 = a_r\ (r); \quad mn_1 = b_{q_0}\ (q_0).

For a given choice of {n_1,n_2} obeying the former set of constraints, the {mn_1} obeying the second set of constraints lie in a single congruence class mod {q_0rn_1} by the Chinese remainder theorem. From the controlled multiplicity hypothesis (6) one thus has

\displaystyle  \sum_{m: mn_1 = a_r\ (r); mn_1 = b_{q_0}\ (q_0)} \psi_M(m) \tau_b(mn_1)^2 \ll \frac{M}{q_0 R} \tau(q_0 r n_1)^{O(1)} \log^{O(1)} x + x^{o(1)}

and so the left-hand side of (32) has been bounded by

\displaystyle  \ll \sum_{q_0>1} \sum_{n_1,n_2: n_1=n_2\ (r); b'_{q_0} n_1 = b_{q_0} n_2\ (q_0)} |\beta(n_1)| |\beta(n_2)|

\displaystyle  (\frac{M}{q_0 R} \tau(q_0 r n_1)^{O(1)} \log^{O(1)} x + x^{o(1)}).

We first dispose of the {x^{o(1)}} error term. From the divisor bound we have {|\beta(n_1)| |\beta(n_2)| = O(x^{o(1)})}. For a given choice of {n_1 \sim N}, the {n_2} sum then can be bounded by {x^{o(1)} (\frac{N}{q_0 R} + 1)}, leading to a total contribution of

\displaystyle  \ll x^{o(1)} N (\frac{N}{R} + Q).

We would like to bound this by {N^2 M R^{-2} \log^{-A} x}. This is possible if we have

\displaystyle  R \ll x^{-c+o(1)} M \ \ \ \ \ (33)

and

\displaystyle  QR \times R \ll x^{-c+o(1)} MN

for some fixed {c>0}. But the former bound is immediate from (23), (10), (22), while from (9), (24), (23) we see that the latter bound will follow if we have

\displaystyle  x^\mu \gg x^{-1/2+2\varpi+c} N. \ \ \ \ \ (34)

for some fixed {c>0}. We file away this necessary condition for now and move on, though we note that these conditions are weaker than (22) except in the “Type II” case when {M,N} are close to {\sqrt{x}}.

Having disposed of the {x^{o(1)}} error term, the remaining contribution to the left-hand side of (32) that we need to control is

\displaystyle  \sum_{q_0>1} \sum_{n_1,n_2: n_1=n_2\ (r); b'_{q_0} n_1 = b_{q_0} n_2\ (q_0)} |\beta(n_1)| |\beta(n_2)| \frac{M}{q_0 R} \tau(q_0 r n_1)^{O(1)} \log^{O(1)} x.

Summing in {n_2} using crude bounds, this is bounded by

\displaystyle  \ll \sum_{q_0>1} \sum_{n_1} |\beta(n_1)| \frac{M}{q_0 R} \tau(q_0 r n_1)^{O(1)} \log^{O(1)} x ( \frac{N}{q_0 R} + x^{o(1)}),

and then by summing in {n_1} this is in turn bounded by

\displaystyle  \ll \sum_{q_0>1} \frac{NM}{q_0 R} \tau(q_0 r)^{O(1)} \log^{O(1)} x ( \frac{N}{q_0 R} + x^{o(1)}).

The {x^{o(1)}} term sums to {O( x^{o(1)} NM / R )}, which is {O( N^2 M R^{-2} \log^{-A} x)} thanks to (23), (22). The main term is then

\displaystyle  \ll ( \tau(r)^{O(1)} \log^{O(1)} x) \frac{N^2 M}{R^2} \sum_{q_0>1} \frac{\tau(q_0)^{O(1)}}{q_0^2}.

Now we finally use the fact that {q_0} has no prime factors less than {D_0} to bound the summation here by

\displaystyle  \prod_{p \geq D_0} (1 + O(\frac{1}{p^2})) - 1 = O( \frac{1}{D_0} )

and since {D_0} grows faster than any power of {\log x} we see that this error term is also acceptable for (32). This concludes the proof of (32) (contingent of course on the lower bounds (22), (34) on {\mu} that we will deal with later).

It remains to verify (31). Observe that {n_1} must be coprime to {q_1r} and {n_2} coprime to {q_2r}, with {n_1 = n_2\ (r)}, to have a non-zero contribution to the sum. We then rearrange the left-hand side as

\displaystyle  \sum_{q_1,q_2: (q_1,r)=(q_2,r)=(q_1,q_2)=1} \sum_{m} \psi_M(m) \sum_{n_1,n_2: n_1=n_2\ (r); (n_1,q_1r)=(n_2,q_2)=1}

\displaystyle c_{q_1} c_{q_2} \beta(n_1) \beta(n_2) 1_{m = a_r/n_1\ (r); m = b_{q_1}/n_1\ (q_1); m = b'_{q_2}/n_2 (q_2)};

note that these inverses in the various rings {{\bf Z}/r{\bf Z}}, {{\bf Z}/q_1{\bf Z}}, {{\bf Z}/q_2{\bf Z}} are well-defined thanks to the coprimality hypotheses.

We may write {n_2 = n_1+kr} for some {k = O(N/R)}. By the triangle inequality, and relabeling {n_1} as {n}, it thus suffices to show that for any particular

\displaystyle  k = O(N/R), \ \ \ \ \ (35)

one has

\displaystyle  \sum_{q_1,q_2: (q_1,r)=(q_2,r)=(q_1,q_2)=1} \sum_{m} \psi_M(m) \sum_{n; (n,q_1r)=(n+kr,q_2)=1} \ \ \ \ \ (36)

\displaystyle  c_{q_1} c_{q_2} \beta(n) \beta(n+kr) 1_{m = a_r/n\ (r); m = b_{q_1}/n\ (q_1); m = b'_{q_2}/(n+kr) (q_2)}

\displaystyle  = X_k + O( N M R^{-1} \log^{-A} x )

for some {X_k} independent of the {b_q} and {b'_q}.

We remark that at this stage we are still only asking for a {\log^{-A} x} savings over the trivial bound. However, we will now perform the expensive step of completion of sums (Lemma 6), which replaces the {\psi_M} factor by an exponential phase at the cost of now requiring a significantly larger gain over the trivial bound. Applying Lemma 6 and Lemma 7, we can write the left-hand side of (36) as the sum of the main term

\displaystyle  X_k := \sum_{q_1,q_2: (q_1,r)=(q_2,r)=(q_1,q_2)=1} \sum_{n: (n,q_1r)=(n+kr,q_2)=1}

\displaystyle \frac{c_{q_1} c_{q_2} \beta(n) \beta(n+kr)}{r q_1 q_2} (\sum_m \psi_M(m));

an error term

\displaystyle  O( (\log^{O(1)} x) \frac{M}{Q^2R} \sum_{1 \leq h \leq H} \sum_{q_1,q_2} |\sum_{n} \beta(n) \beta(n+kr) \Phi(h,q_1,q_2; n) |)

where

\displaystyle  H := x^\epsilon Q^2 R/M, \ \ \ \ \ (37)

{\epsilon > 0} is an arbitrary small fixed quantity, and {\Phi = \Phi_{k,r}} is the phase

\displaystyle  \Phi(h,q_1,q_2;n) := 1_{(q_1,r)=(q_2,r)=(q_1,q_2)=(n,r)=(n,q_1)=(n+kr,q_2)=1} \ \ \ \ \ (38)

\displaystyle  e_r( \frac{a_r h}{nq_1 q_2} ) e_{q_1}( \frac{b_{q_1}h}{n r q_2} ) e_{q_2}( \frac{b'_{q_2} h}{(n+kr) r q_1} )

(here we use the bounded nature of the {c_{q_1}, c_{q_2}}); and another error term that can easily be shown to be {O(x^{-A+O(1)})} for any fixed {A}. The term {X_k} is independent of the {b_q} and {b'_q}, so it will suffice to show that

\displaystyle  \sum_h \sum_{Q \ll q_1,q_2 \ll Q} |\sum_{n} \beta(n) \beta(n+kr) \Phi(h,q_1,q_2; n)| \ll x^{-\epsilon+o(1)} Q^2 N \ \ \ \ \ (39)

for a sufficiently small fixed {\epsilon>0}; here we have dropped all hypotheses on {q_1,q_2} other than their magnitude, and we abbreviate {\sum_{1 \leq h \leq H}} as {\sum_h}. As noted after Lemma 6, we are no longer asking for just a {\log^{-A} x} savings over the trivial bound; we must instead gain a factor of {x^{\epsilon} H} to overcome the summation in {h}.

Although this is not strictly necessary for our analysis, let us confirm that {H} is actually non-trivial in the sense that

\displaystyle  H \gg 1. \ \ \ \ \ (40)

Indeed, from (23) and (9) one has

\displaystyle  RM \ll x^{1-\mu+o(1)}

and hence from (24)

\displaystyle  RM \ll x^{-\mu+o(1)} Q^2 R^2

and (40) then follows from (37). Note though in the model case (20) with {\mu \approx 0} that {H} is close to {1} (for any {\sigma} between {0} and {1/8}).

We now split into two cases, one which works when {M, N} are not too close to {x^{1/2}}, and one which works when {M,N} are close to {x^{1/2}}.

Theorem 13 (Type I case) If the inequalities

\displaystyle  11\varpi + 3\mu + 3\delta + 2 \sigma < \frac{1}{4} \ \ \ \ \ (41)

and

\displaystyle  M \gg x^{1/2+2\varpi+c} \ \ \ \ \ (42)

hold for some fixed {c>0}, then (39) holds for a sufficiently small fixed {\epsilon>0}.

The condition (42) represents a border between this case and the Type II case.

Theorem 14 (Type II case) If the inequality

\displaystyle  24\varpi + 10 \mu + 10 \delta + 7 \sigma < \frac{1}{2} \ \ \ \ \ (43)

holds, then (39) holds for a sufficiently small fixed {\epsilon>0}.

Both of these theorems are established by using Cauchy-Schwarz to eliminate the absolute values and {\beta} factors in (39) until one is left with an expression only involving the phase {\Phi(h,q_1,q_2;n)}, which can then be estimated using the Weil conjectures with a power saving to counteract the loss of {H}; however, the use of Cauchy-Schwarz is slightly different in the two cases. In practice, the condition (43) is too strong to be satisfied by the values of {\sigma} permitted by Theorem 3, so we will only use Theorem 14 in the case that (42) fails, since in that case we may make {\sigma} small enough for (43) to hold.

Assuming these theorems, let us now conclude the proof of Theorem 3. First suppose we are in the “Type I” regime when (42) holds for some fixed {c>0}. Then by (9) we have

\displaystyle  N \ll x^{1/2-2\varpi-c}

which means that the condition (34) is now weaker than (22) and may be omitted. By (7) we can simultaneously obey (22), (34), (41) by setting {\mu} sufficiently close to zero, and the claim now follows from Theorem 13.

Now suppose instead that we are in the “Type II” regime where (42) fails for some small {c>0}, so that by (9) we have

\displaystyle  x^{1/2-2\varpi-c} \ll N \ll M \ll x^{1/2+2\varpi+c}.

From this we see that we may replace {\sigma} by {2\varpi+c} in (10) and in all of the above analysis. If we set {\mu := 2\varpi + c} then the conditions (22), (34) are obeyed. Theorem 14 will then give us what we want provided that

\displaystyle  24\varpi + 10 (2\varpi+c) + 10 \delta + 7 (2\varpi+c) < \frac{1}{2}

which is satisfied for {c} small enough thanks to (8).
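
The bookkeeping in the last two paragraphs is elementary but easy to get wrong, so it may be worth recording a quick numerical check: given {\varpi,\delta,\sigma} obeying (7) and (8), the Type I condition (41) should hold once {\mu} is close enough to zero, and the Type II condition (43) should hold with {\mu} and {\sigma} both replaced by {2\varpi+c} for {c} small. The parameter choices in the sketch below are illustrative only.

    def check(varpi, delta, sigma, c=1e-4, mu_type_I=1e-4):
        cond7 = 11 * varpi + 3 * delta + 2 * sigma < 0.25                   # (7)
        cond8 = 29 * varpi + 5 * delta < 0.25                               # (8)
        cond41 = 11 * varpi + 3 * mu_type_I + 3 * delta + 2 * sigma < 0.25  # (41), Type I
        mu = sigma2 = 2 * varpi + c                                         # Type II substitution
        cond43 = 24 * varpi + 10 * mu + 10 * delta + 7 * sigma2 < 0.5       # (43), Type II
        return cond7, cond8, cond41, cond43

    for params in [(1 / 1168, 1 / 1168, 1 / 12), (0.003, 0.003, 0.05)]:
        print(params, check(*params))      # all four conditions should come out True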

In the next two sections we establish Theorem 13 and Theorem 14.

— 6. The Type I sum —

We now prove Theorem 13. It suffices to show that

\displaystyle  |\sum_h \sum_{Q \ll q_1,q_2 \ll Q} c_{h,q_1,q_2} \sum_{n} \beta(n) \beta(n+kr) \Phi(h,q_1,q_2; n)| \ll x^{-\epsilon+o(1)} Q^2 N

for any bounded real coefficients {c_{h,q_1,q_2} = O(1)} (which are vaguely related to the previous coefficients {c_q}, but this is not important for us here). We rearrange the left-hand side as

\displaystyle  |\sum_{Q\ll q_1 \ll Q} \sum_n \beta(n) \beta(n+kr) \sum_h \sum_{Q \ll q_2 \ll Q} c_{h,q_1,q_2} \Phi(h,q_1,q_2; n)|.

From the divisor bound we have

\displaystyle  \sum_{Q \ll q_1 \ll Q} \sum_n |\beta(n) \beta(n+kr)|^2 \ll x^{o(1)} Q N

and we may write {\beta = \beta \psi_N} for some smooth coefficient sequence at scale {N}, so by Cauchy-Schwarz it suffices to show that

\displaystyle  \sum_{Q \ll q_1 \ll Q} \sum_{n} \psi_N(n) |\sum_h \sum_{Q \ll q_2 \ll Q} c_{h,q_1,q_2} \Phi(h,q_1,q_2; n)|^2 \ll x^{-2\epsilon+o(1)} Q^3 N

(note now we have to gain more than {H^2} over the trivial bound, rather than just {H}). We rearrange this as

\displaystyle  |\sum_{h,h'} \sum_{Q \ll q_1, q_2,q'_2 \ll Q} c_{h,q_1,q_2} \overline{c_{h',q_1,q'_2}} \sum_{n} \psi_N(n)\Phi(h,q_1,q_2;n) \overline{\Phi(h',q_1,q'_2;n)}|

so by the triangle inequality it suffices to show that

\displaystyle  \sum_{h,h'} \sum_{Q\ll q_1, q_2,q'_2 \ll Q} |\sum_{n} \psi_N(n) \Phi(h,q_1,q_2;n) \overline{\Phi(h',q_1,q'_2;n)}| \ll x^{-2\epsilon+o(1)} Q^3 N

for some fixed {\epsilon>0}. We discard the {q_1} summation and reduce to showing that

\displaystyle  \sum_{h,h'} \sum_{Q\ll q_2,q'_2 \ll Q} |\sum_{n} \psi_N(n) \Phi(h,q_1,q_2;n) \overline{\Phi(h',q_1,q'_2;n)}| \ll x^{-2\epsilon+o(1)} Q^2 N \ \ \ \ \ (44)

for any {Q \ll q_1 \ll Q}.

To prove (44), we isolate the diagonal case {h'q_2 = hq'_2} and the non-diagonal case {h'q_2 \neq h q'_2}. For the diagonal case, we make the crude bound

\displaystyle  |\sum_{n} \psi_N(n) \Phi(h,q_1,q_2;n) \overline{\Phi(h',q_1,q'_2;n)}| \ll x^{o(1)} N

The contribution of the diagonal case can now be bounded by

\displaystyle  \ll N | \{ (h,h',q_2,q'_2): h,h' = O(H); q_2,q'_2 = O(Q); hq'_2=h'q_2 \}|.

Writing {m := hq'_2 = h'q_2} we have {m=O(HQ)}, and one can estimate this by

\displaystyle  \ll N \sum_{m = O(HQ)} \tau(m)^2

which by the divisor bound is of the form

\displaystyle  \ll x^{o(1)} N H Q.

For this to be acceptable we need a bound of the form

\displaystyle  H \ll x^{-2\epsilon+o(1)} Q

which, by (37) is equivalent to

\displaystyle  QR \ll x^{-3\epsilon+o(1)} M,

but this follows from (24), (42) for {\epsilon} small enough.

Now we treat the non-diagonal case {h'q_2 \neq hq'_2}. The key estimate here is

\displaystyle  |\sum_{n} \psi_N(n) \Phi(h,q_1,q_2;n) \overline{\Phi(h',q_1,q'_2;n)}| \ \ \ \ \ (45)

\displaystyle  \ll x^{o(1)} (Q^{3/2} R^{1/2} + H Q R^{-1} N)

for all non-diagonal {h,q_2,h',q'_2}. In the model case (20) with {\mu \approx 0}, the two terms on the right-hand side are approximately {x^{1/4+\sigma+o(1)}} and {x^{\sigma+o(1)}}, which give the desired power saving compared to the trivial bound of {x^{1/2-\sigma+o(1)}} since {\sigma < 1/8} (and in (20), {H} is small, so just about any power saving suffices). As the model case indicates, the first term in (45) is the dominant one in practice.

Assume for the moment that the estimate (45) holds; then the non-diagonal contribution to (44) is

\displaystyle  \ll x^{o(1)} ( H^2 Q^{7/2} R^{1/2} + H^3 Q^3 R^{-1} N )

so to conclude (44) we need to show that

\displaystyle  H^2 Q^{7/2} R^{1/2} \ll x^{-2\epsilon+o(1)} Q^2 N

and

\displaystyle  H^3 Q^3 R^{-1} N \ll x^{-2\epsilon+o(1)} Q^2 N.

Using (37), (9) we can rewrite these criteria as

\displaystyle  (QR)^{11/2} \ll x^{2-4\epsilon+o(1)} N^2 (R/N)^3

and

\displaystyle  (QR)^7 \ll x^{3-5\epsilon+o(1)} N^2 (R/N)^5

respectively. Applying (24), (23), it suffices to verify that

\displaystyle  x^{\frac{11}{4} + 11 \varpi} \ll x^{2-4\epsilon-3\mu-3\delta+o(1)} N^2

and

\displaystyle  x^{\frac{7}{2} + 14 \varpi} \ll x^{3-5\epsilon-5\mu-5\delta+o(1)} N^2

but these follow from (10) and (41) (the latter inequality holds with considerable room to spare).
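
The passage from the two conditions on {H,Q,R,N} above to the two displayed criteria in {QR}, {N} and {R/N} is pure exponent bookkeeping, using (37) to eliminate {H} and (9) to eliminate {M}. Writing every quantity as a power of {x}, one can double-check (up to the {o(1)} and {\epsilon} conventions) that each pair really is the same linear inequality in the exponents; the following sketch does this with exact rational arithmetic.

    from fractions import Fraction as F

    # Write Q = x^q, R = x^r, N = x^n, M = x^m, H = x^h, with m = 1 - n (from (9))
    # and h = eps + 2q + r - m (from (37)); work with linear forms in (q, r, n, eps, 1).
    VARS = ('q', 'r', 'n', 'eps', 'const')

    def lin(**c):
        return {v: F(c.get(v, 0)) for v in VARS}

    def add(*fs):
        return {v: sum((f[v] for f in fs), F(0)) for v in VARS}

    def scale(c, f):
        return {v: F(c) * f[v] for v in VARS}

    def diff(lhs, rhs):
        return add(lhs, scale(-1, rhs))

    q, r, n, eps, one = (lin(**{v: 1}) for v in VARS)
    m = diff(one, n)                               # m = 1 - n
    h = add(eps, scale(2, q), r, scale(-1, m))     # h = eps + 2q + r - m

    # first criterion H^2 Q^{7/2} R^{1/2} <= x^{-2 eps} Q^2 N, and its rewritten form
    c1 = diff(add(scale(2, h), scale(F(7, 2), q), scale(F(1, 2), r)),
              add(scale(-2, eps), scale(2, q), n))
    c1b = diff(scale(F(11, 2), add(q, r)),
               add(scale(2, one), scale(-4, eps), scale(2, n), scale(3, diff(r, n))))
    # second criterion H^3 Q^3 R^{-1} N <= x^{-2 eps} Q^2 N, and its rewritten form
    c2 = diff(add(scale(3, h), scale(3, q), scale(-1, r), n),
              add(scale(-2, eps), scale(2, q), n))
    c2b = diff(scale(7, add(q, r)),
               add(scale(3, one), scale(-5, eps), scale(2, n), scale(5, diff(r, n))))
    print(c1 == c1b, c2 == c2b)                    # both should print True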

It remains to show (45) in the non-diagonal case {h'q_2 \neq hq'_2}. From (38) we may of course assume that

\displaystyle  (q_1,r) = (q_2,r) = (q_1,q_2) = (q'_2,r) = (q_1,q'_2) = 1

and the left-hand side of (45) expands as

\displaystyle  | \sum_{n} \psi_N(n) 1_{(n,q_1r) = (n+kr,q_2q'_2)=1} e_r( \frac{a_r (hq'_2-h'q_2)}{nq_1q_2q'_2} ) e_{q_1}( \frac{b_{q_1} (h q'_2 - h' q_2)}{nrq_2q'_2})

\displaystyle  e_{q_2}( \frac{b'_{q_2} h}{(n+kr) r q_1} ) e_{q'_2}( -\frac{b'_{q'_2} h'}{(n+kr) r q_1} )|.

The first two phases

\displaystyle  e_r( \frac{a_r (hq'_2-h'q_2)}{nq_1q_2q'_2} ) e_{q_1}( \frac{b_{q_1} (h q'_2 - h' q_2)}{nrq_2q'_2})

can be combined as

\displaystyle  e_{d_1}( \frac{c_1}{n} )

where {d_1 := q_1 r} and {c_1 \in {\bf Z}/d_1{\bf Z}} is the residue class

\displaystyle  c_1 := (hq'_2 - h'q_2) (a_r \bar{q_1} \bar{q_2} \bar{q'_2} q_1 + b_{q_1} \bar{r} \bar{q_2} \bar{q'_2} r)\ (d_1)

and the inverses {\bar{x}} are with respect to modulus {r} in the first summand and {q_1} in the second summand. For future reference we note that

\displaystyle  (c_1, r) = (hq'_2-h'q_2, r). \ \ \ \ \ (46)
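To spell out the combination of phases (a standard Chinese remainder theorem manipulation): since {(q_1,r)=1} and {(n,q_1 r)=1}, one has

\displaystyle  e_r( \frac{u}{n} ) = e_{q_1 r}( \frac{u q_1}{n} ), \quad e_{q_1}( \frac{v}{n} ) = e_{q_1 r}( \frac{v r}{n} )

for any integers {u,v}, as follows from the identity {e_{q_1 r}(y) = e_{q_1}(y \bar{r}) e_r(y \bar{q_1})} (with {\bar{r}} the inverse of {r} modulo {q_1} and {\bar{q_1}} the inverse of {q_1} modulo {r}). Applying this with {u := a_r (hq'_2-h'q_2) \bar{q_1} \bar{q_2} \bar{q'_2}} and {v := b_{q_1} (hq'_2-h'q_2) \bar{r} \bar{q_2} \bar{q'_2}} and multiplying the two phases together gives {e_{d_1}(c_1/n)} with {c_1} as above. The claim (46) then follows since the second summand in the definition of {c_1} is divisible by {r}, while the factor {a_r \bar{q_1} \bar{q_2} \bar{q'_2} q_1} is coprime to {r}.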

Similarly, the two phases

\displaystyle  e_{q_2}( \frac{b'_{q_2} h}{(n+kr) r q_1} ) e_{q'_2}( -\frac{b'_{q'_2} h'}{(n+kr) r q_1} )

can combine as

\displaystyle  e_{d_2}( \frac{c_2}{n+kr} )

where {d_2 := [q_2,q'_2]} and {c_2 \in {\bf Z}/d_2{\bf Z}} is the residue class

\displaystyle  c_2 := b'_{q_2} h \bar{r} \bar{q_1} \frac{d_2}{q_2} - b'_{q'_2} h' \bar{r} \bar{q_1} \frac{d_2}{q'_2}\ (d_2),

although the precise value of {c_2} will not be important for us. The left-hand side of (45) is thus

\displaystyle  |\sum_{n} \psi_N(n) 1_{(n,d_1) = (n+kr,d_2)=1} e_{d_1}( \frac{c_1}{n} ) e_{d_2}( \frac{c_2}{n+kr} )|

and hence, by Corollary 10 and the coprimality of {d_1} and {d_2} (which follows from (38)), this is bounded by

\displaystyle  \ll x^{o(1)} ( (d_1 d_2)^{1/2} + \frac{N}{d_1 d_2} (c_1,d_1) (c_2,d_2) ).

Note that

\displaystyle  d_1 d_2 \leq q_1 r q_2 q'_2 \ll Q^3 R

so the term {x^{o(1)} (d_1 d_2)^{1/2}} is controlled by the first term on the right-hand side of (45). It remains then to show that

\displaystyle  \frac{1}{d_1 d_2} (c_1,d_1) (c_2,d_2) \ll H Q R^{-1}.

We crudely bound {(c_2,d_2)} by {d_2} and {(c_1,d_1)} by {(c_1,r) q_1}. (This is inefficient, but this term is not dominant in the analysis in any case.) By (46) and the non-diagonal hypothesis {h'q_2 \neq hq'_2}, which makes {hq'_2 - h'q_2} a non-zero integer of size {O(HQ)}, we may bound {(c_1,r)} by {HQ}, and the claim follows.
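For the reader's convenience, the resulting chain of inequalities (recalling that {r} is comparable to {R}) is

\displaystyle  \frac{1}{d_1 d_2} (c_1,d_1) (c_2,d_2) \leq \frac{(c_1,r) q_1 \cdot d_2}{q_1 r \cdot d_2} = \frac{(c_1,r)}{r} \ll \frac{HQ}{R},

which is the required bound.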

— 7. The Type II sum —

We now prove Theorem 14. As before, it suffices to show that

\displaystyle  |\sum_h \sum_{Q \ll q_1,q_2 \ll Q} c_{h,q_1,q_2} \sum_{n} \beta(n) \beta(n+kr) \Phi(h,q_1,q_2; n)| \ll x^{-\epsilon+o(1)} Q^2 N

for any bounded real coefficients {c_{h,q_1,q_2} = O(1)}. We rearrange the left-hand side slightly differently from the previous section:

\displaystyle  |\sum_n \beta(n) \beta(n+kr) \sum_h \sum_{Q \ll q_1, q_2 \ll Q} c_{h,q_1,q_2} \Phi(h,q_1,q_2; n)|.

From crude bounds we have

\displaystyle  \sum_n |\beta(n) \beta(n+kr)|^2 \ll x^{o(1)} N,

so by Cauchy-Schwarz it suffices to show that

\displaystyle  \sum_{n} \psi_N(n) |\sum_h \sum_{Q \ll q_1,q_2 \ll Q} c_{h,q_1,q_2} \Phi(h,q_1,q_2; n)|^2 \ll x^{-2\epsilon+o(1)} Q^4 N.

Expanding and using the triangle inequality as in the previous section, we reduce to showing that

\displaystyle  \sum_{h,h'} \sum_{Q\ll q_1, q'_1, q_2,q'_2 \ll Q} |\sum_{n} \psi_N(n) \Phi(h,q_1,q_2;n) \overline{\Phi(h',q'_1,q'_2;n)}| \ll x^{-2\epsilon+o(1)} Q^4 N. \ \ \ \ \ (47)

This is basically the same situation as in the previous section except that we have decoupled the {q_1,q'_1} variables from each other.

As before, we first treat the diagonal case {h'q_1 q_2 = h q'_1 q'_2}. Arguing as in the previous section (now writing {m := hq'_1q'_2 = h'q_1q_2 = O(HQ^2)}), the contribution of this case to (47) can be bounded by

\displaystyle  \ll N \sum_{m = O(HQ^2)} \tau(m)^{O(1)}

which by the divisor bound is of the form

\displaystyle  \ll x^{o(1)} N H Q^2

which will be acceptable if

\displaystyle  H \ll x^{-2\epsilon+o(1)} Q^2.

By (37) this is equivalent to

\displaystyle  R \ll x^{-3\epsilon+o(1)} M,

but this is automatic from (23) and (22).

For the non-diagonal case, we use the following variant

\displaystyle  |\sum_{n} \psi_N(n) \Phi(h,q_1,q_2;n) \overline{\Phi(h',q'_1,q'_2;n)}| \ll \ \ \ \ \ (48)

\displaystyle  x^{o(1)} (Q^{2} R^{1/2} + H Q^6 R^{-1} N)

of (45), valid for all non-diagonal {h,q_1,q_2,h',q'_1,q'_2}. The bound here is weaker than in (45), but in the model case (20) the terms on the right-hand side are approximately {x^{\frac{1}{4} + \frac{3\sigma}{2}+o(1)}} and {x^{6\sigma+o(1)}} respectively, which still represents a power saving over the trivial bound of approximately {x^{1/2-\sigma}} as long as {\sigma < 1/14}. While this does not cover the full range {\sigma<1/8} that the Type I analysis does, it crucially covers the case when {\sigma} is very close to zero, which the Type I analysis cannot reach due to the condition (42). The Type I/II border is not the critical one for optimising the exponents, so improving Type II bounds such as (48) is not a priority for us.

We assume (48) for now and finish the proof of Theorem 14 by numerical computations similar to those in the previous section. The non-diagonal contribution to (47) is now estimated by

\displaystyle  \ll x^{o(1)} ( H^2 Q^6 R^{1/2} + H^3 Q^{10} R^{-1} N )

so to conclude (47) we need to show that

\displaystyle  H^2 Q^6 R^{1/2} \ll x^{-2\epsilon+o(1)} Q^4 N

and

\displaystyle  H^3 Q^{10} R^{-1} N \ll x^{-2\epsilon+o(1)} Q^4 N.

Using (37), (9) we can rewrite these criteria as

\displaystyle  (QR)^6 \ll x^{2-4\epsilon+o(1)} N^{5/2} (R/N)^{7/2}

and

\displaystyle  (QR)^{12} \ll x^{3-5\epsilon+o(1)} N^7 (R/N)^{10}

respectively. Applying (24), (23), it suffices to verify that

\displaystyle  x^{3 + 12 \varpi} \ll x^{2-4\epsilon-\frac{7}{2}\mu-\frac{7}{2}\delta+o(1)} N^{5/2}

and

\displaystyle  x^{6 + 24 \varpi} \ll x^{3-5\epsilon-10\mu-10\delta+o(1)} N^7

but these follow from (10) and (43).

It remains to prove (48). This is very similar to the treatment of (45). From (38) we may assume

\displaystyle  (q_1,r) = (q'_1,r) = (q_2,r) = (q'_2,r) = (q_1,q_2) = (q'_1,q'_2) = 1

and the left-hand side of (48) expands as

\displaystyle  | \sum_{n} \psi_N(n) 1_{(n,q_1q'_1r) = (n+kr,q_2q'_2)=1} e_r( \frac{a_r (hq'_1q'_2-h'q_1q_2)}{nq_1q'_1q_2q'_2} )

\displaystyle  e_{q_1}( \frac{b_{q_1} h}{nrq_2}) e_{q'_1}( -\frac{b_{q'_1} h'}{nrq'_2})

\displaystyle  e_{q_2}( \frac{b'_{q_2} h}{(n+kr) r q_1} ) e_{q'_2}( -\frac{b'_{q'_2} h'}{(n+kr) r q'_1} )|.

If we set

\displaystyle  d_1 := [q_1,q'_1]r; \quad d_2 := [q_2,q'_2]

then as before we can rewrite this sum as

\displaystyle  | \sum_{n} \psi_N(n) 1_{(n,d_1) = (n+kr,d_2)=1} e_{d_1}( \frac{c_1}{n} ) e_{d_2}( \frac{c_2}{n+kr} ) |

where

\displaystyle  c_1 = a_r \bar{q_1} \bar{q'_1} \bar{q_2} \bar{q'_2} (hq'_1q'_2-h'q_1q_2) \frac{d_1}{r}

\displaystyle  + b_{q_1} \bar{r} \bar{q_2} h \frac{d_1}{q_1} - b_{q'_1} \bar{r} \bar{q'_2} h' \frac{d_1}{q'_1}\ (d_1)

(with the inverses with respect to the moduli {r, q_1, q'_1} in the first, second, and third summands respectively), and the value of {c_2} is unimportant for us. We have an analogue of (46), namely

\displaystyle  (c_1, r) = (hq'_1q'_2-h'q_1q_2, r). \ \ \ \ \ (49)

We apply Corollary 10 as before, although things are not quite as favorable because {d_1,d_2} need not be coprime in this case. This bounds the left-hand side of (48) by

\displaystyle  \ll x^{o(1)} ( (d_1 d_2)^{1/2} + \frac{N (d_1,d_2)^2}{d_1 d_2} (c_1,d_1) (c_2,d_2) ).

We have

\displaystyle  d_1 d_2 \leq q_1 q'_1 r q_2 q'_2 \ll Q^4 R

so the term {x^{o(1)} (d_1 d_2)^{1/2}} is controlled by the first term on the right-hand side of (48). It remains then to show that

\displaystyle  \frac{(d_1,d_2)^2}{d_1 d_2} (c_1,d_1) (c_2,d_2) \ll H Q^6 R^{-1}.

We crudely bound {(c_2,d_2)} by {d_2} and {(c_1,d_1)} by {(c_1,r) \frac{d_1}{r}}; also

\displaystyle  (d_1,d_2) \leq (q_1q'_1, q_2 q'_2) \ll Q^2

and from (49) and the non-diagonal hypothesis {h'q_1q_2 \neq hq'_1q'_2} (which makes {hq'_1q'_2 - h'q_1q_2} a non-zero integer of size {O(HQ^2)}) one has {(c_1,r) \ll H Q^2}. The claim follows.
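For completeness, the resulting chain of inequalities (again recalling that {r} is comparable to {R}) is

\displaystyle  \frac{(d_1,d_2)^2}{d_1 d_2} (c_1,d_1) (c_2,d_2) \leq \frac{(d_1,d_2)^2}{d_1 d_2} \cdot (c_1,r) \frac{d_1}{r} \cdot d_2 = \frac{(d_1,d_2)^2 (c_1,r)}{r} \ll \frac{Q^4 \cdot HQ^2}{R} = H Q^6 R^{-1},

as required.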