This is the final continuation of the online reading seminar of Zhang’s paper for the polymath8 project. (There are two other continuations: this previous post, which deals with the combinatorial aspects of the second part of Zhang’s paper, and this previous post, which covers the Type I and Type II sums.) The main purpose of this post is to present (and hopefully, to improve upon) the treatment of the final and most innovative of the key estimates in Zhang’s paper, namely the Type III estimate.

The main estimate was already stated as Theorem 17 in the previous post, but we quickly recall the relevant definitions here. As in other posts, we always take {x} to be a parameter going off to infinity, with the usual asymptotic notation {O(), o(), \ll} associated to this parameter.

Definition 1 (Coefficient sequences) A coefficient sequence is a finitely supported sequence {\alpha: {\bf N} \rightarrow {\bf R}} that obeys the bounds

\displaystyle  |\alpha(n)| \ll \tau^{O(1)}(n) \log^{O(1)}(x) \ \ \ \ \ (1)

for all {n}, where {\tau} is the divisor function.

  • (i) If {\alpha} is a coefficient sequence and {a\ (q) = a \hbox{ mod } q} is a primitive residue class, the (signed) discrepancy {\Delta(\alpha; a\ (q))} of {\alpha} in the residue class {a\ (q)} is defined to be the quantity

    \displaystyle  \Delta(\alpha; a \ (q)) := \sum_{n: n = a\ (q)} \alpha(n) - \frac{1}{\phi(q)} \sum_{n: (n,q)=1} \alpha(n). \ \ \ \ \ (2)

  • (ii) A coefficient sequence {\alpha} is said to be at scale {N} for some {N \geq 1} if it is supported on an interval of the form {[(1-O(\log^{-A_0} x)) N, (1+O(\log^{-A_0} x)) N]}.
  • (iii) A coefficient sequence {\alpha} at scale {N} is said to be smooth if it takes the form {\alpha(n) = \psi(n/N)} for some smooth function {\psi: {\bf R} \rightarrow {\bf C}} supported on {[1-O(\log^{-A_0} x), 1+O(\log^{-A_0} x)]} obeying the derivative bounds

    \displaystyle  \psi^{(j)}(t) = O( \log^{j A_0} x ) \ \ \ \ \ (3)

    for all fixed {j \geq 0} (note that the implied constant in the {O()} notation may depend on {j}).

For any {I \subset {\bf R}}, let {{\mathcal S}_I} denote the square-free numbers whose prime factors lie in {I}. The main result of this post is then the following result of Zhang:
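
As a quick illustration of these definitions (this plays no role in the argument, and the sequence and modulus below are arbitrary toy choices), one can compute the discrepancy (2) numerically with a few lines of Python:

from math import gcd

def discrepancy(alpha, a, q):
    # signed discrepancy (2) of a finitely supported sequence alpha
    # (a dict n -> alpha(n)) in the primitive residue class a mod q
    assert gcd(a, q) == 1
    phi_q = sum(1 for b in range(1, q + 1) if gcd(b, q) == 1)
    in_class = sum(v for n, v in alpha.items() if n % q == a % q)
    coprime_sum = sum(v for n, v in alpha.items() if gcd(n, q) == 1)
    return in_class - coprime_sum / phi_q

# toy coefficient sequence supported near 1000, tested at the class 3 mod 7
alpha = {n: 1.0 for n in range(900, 1100)}
print(discrepancy(alpha, 3, 7))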

Theorem 2 (Type III estimate) Let {\varpi, \delta > 0} be fixed quantities, and let {M, N_1, N_2, N_3 \gg 1} be quantities such that

\displaystyle  x \ll M N_1 N_2 N_3 \ll x

and

\displaystyle  N_1 \gg N_2, N_3

and

\displaystyle  N_1^4 N_2^4 N_3^5 \gg x^{4+16\varpi+\delta+c}

for some fixed {c>0}. Let {\alpha, \psi_1, \psi_2, \psi_3} be coefficient sequences at scale {M,N_1,N_2,N_3} respectively with {\psi_1,\psi_2,\psi_3} smooth. Then for any {I \subset [1,x^\delta]} we have

\displaystyle  \sum_{q \in {\mathcal S}_I: q< x^{1/2+2\varpi}} \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\Delta(\alpha \ast \psi_1 \ast \psi_2 \ast \psi_3; a\ (q))| \ll x \log^{-A} x

for any fixed {A > 0}. In fact we have the stronger “pointwise” estimate

\displaystyle  |\Delta(\alpha \ast \psi_1 \ast \psi_2 \ast \psi_3; a\ (q))| \ll x^{-\epsilon} \frac{x}{q} \ \ \ \ \ (4)

for all {q \in {\mathcal S}_I} with {q < x^{1/2+2\varpi}} and all {a \in ({\bf Z}/q{\bf Z})^\times}, and some fixed {\epsilon>0}.

(This is very slightly stronger than previously claimed, in that the condition {N_2 \gg N_3} has been dropped.)

It turns out that Zhang does not exploit any averaging of the {\alpha} factor, and matters reduce to the following:

Theorem 3 (Type III estimate without {\alpha}) Let {\delta > 0} be fixed, and let {1 \ll N_1, N_2, N_3, d \ll x^{O(1)}} be quantities such that

\displaystyle  N_1 \gg N_2, N_3

and

\displaystyle d \in {\mathcal S}_{[1,x^\delta]}

and

\displaystyle  N_1^4 N_2^4 N_3^5 \gg d^8 x^{\delta+c}

for some fixed {c>0}. Let {\psi_1,\psi_2,\psi_3} be smooth coefficient sequences at scales {N_1,N_2,N_3} respectively. Then we have

\displaystyle  |\Delta(\psi_1 \ast \psi_2 \ast \psi_3; a\ (d))| \ll x^{-\epsilon} \frac{N_1 N_2 N_3}{d}

for all {a \in ({\bf Z}/d{\bf Z})^\times} and some fixed {\epsilon>0}.

Let us quickly see how Theorem 3 implies Theorem 2. To show (4), it suffices to establish the bound

\displaystyle  \sum_{n = a\ (q)} \alpha \ast \psi_1 \ast \psi_2 \ast \psi_3(n) = X + O( x^{-\epsilon} \frac{x}{q} )

for all {a \in ({\bf Z}/q{\bf Z})^\times}, where {X} denotes a quantity that is independent of {a} (but can depend on other quantities such as {\alpha,\psi_1,\psi_2,\psi_3,q}). The left-hand side can be rewritten as

\displaystyle  \sum_{b \in ({\bf Z}/q{\bf Z})^\times} \sum_{m = b\ (q)} \alpha(m) \sum_{n = a/b\ (q)} \psi_1 \ast \psi_2 \ast \psi_3(n).

From Theorem 3 we have

\displaystyle  \sum_{n = a/b\ (q)} \psi_1 \ast \psi_2 \ast \psi_3(n) = Y + O( x^{-\epsilon} \frac{N_1 N_2 N_3}{q} )

where the quantity {Y} does not depend on {a} or {b}. Inserting this asymptotic and using crude bounds on {\alpha} (see Lemma 8 of this previous post) we conclude (4) as required (after modifying {\epsilon} slightly).
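
The rearrangement of the left-hand side performed above is an exact identity (every {n = a\ (q)} with {(a,q)=1} forces both factors of the convolution to be coprime to {q}), and can be sanity-checked numerically. The following Python sketch, with arbitrary toy sequences in place of {\alpha} and {\psi_1 \ast \psi_2 \ast \psi_3}, verifies just this rearrangement (not, of course, the asymptotic supplied by Theorem 3); it uses the modular inverse pow(b, -1, q), which needs Python 3.8 or later.

from math import gcd

def convolve(f, g):
    # Dirichlet convolution of two finitely supported sequences (dicts)
    h = {}
    for m, fm in f.items():
        for n, gn in g.items():
            h[m * n] = h.get(m * n, 0.0) + fm * gn
    return h

def lhs(f, g, a, q):
    # sum of (f*g)(n) over n = a (q)
    return sum(v for n, v in convolve(f, g).items() if n % q == a % q)

def rhs(f, g, a, q):
    # sum over units b mod q of (sum of f over m = b (q)) * (sum of g over n = a/b (q))
    total = 0.0
    for b in range(1, q):
        if gcd(b, q) != 1:
            continue
        binv = pow(b, -1, q)
        total += (sum(v for m, v in f.items() if m % q == b)
                  * sum(v for n, v in g.items() if n % q == (a * binv) % q))
    return total

q, a = 15, 2                                            # arbitrary modulus and unit class
f = {m: ((3 * m) % 11) / 11.0 for m in range(1, 80)}    # arbitrary toy data
g = {n: ((5 * n) % 13) / 13.0 for n in range(1, 80)}
print(abs(lhs(f, g, a, q) - rhs(f, g, a, q)) < 1e-9)    # prints True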

It remains to establish Theorem 3. This is done by a set of tools similar to that used to control the Type I and Type II sums:

  • (i) completion of sums;
  • (ii) the Weil conjectures and bounds on Ramanujan sums;
  • (iii) factorisation of smooth moduli {q \in {\mathcal S}_I};
  • (iv) the Cauchy-Schwarz and triangle inequalities (Weyl differencing).

The specifics are slightly different though. For the Type I and Type II sums, it was the classical Weil bound on Kloosterman sums that was the key source of power saving; Ramanujan sums only played a minor role, controlling a secondary error term. For the Type III sums, one needs a significantly deeper consequence of the Weil conjectures, namely the estimate of Bombieri and Birch on a three-dimensional variant of a Kloosterman sum. Furthermore, the Ramanujan sums – which are a rare example of sums that actually exhibit better than square root cancellation, thus going beyond even what the Weil conjectures can offer – make a crucial appearance, when combined with the factorisation of the smooth modulus {q} (this new argument is arguably the most original and interesting contribution of Zhang).

— 1. A three-dimensional exponential sum —

The power savings in Zhang’s Type III argument come from good estimates on the three-dimensional exponential sum

\displaystyle  T(k; m,m'; q) := \sum_{l \in {\bf Z}/q{\bf Z}: (l,q)=(l+k,q)=1} \sum_{t \in ({\bf Z}/q{\bf Z})^\times} \sum_{t' \in ({\bf Z}/q{\bf Z})^\times} \ \ \ \ \ (5)

\displaystyle  e_q( \frac{t}{l} - \frac{t'}{l+k} + \frac{m}{t} - \frac{m'}{t'} )

defined for a positive integer {q} and {k,m,m' \in {\bf Z}/q{\bf Z}} (or {k,m,m' \in {\bf Z}}). The key estimate is

Theorem 4 (Bombieri-Birch bound) Let {q} be square-free. Then for any {k,m,m' \in {\bf Z}/q{\bf Z}} we have

\displaystyle  |T(k; m,m';q)| \ll \frac{(m-m',k,q)}{(k,q)^{1/2}} q^{3/2+o(1)}

where {(m-m',k,q)} is the greatest common divisor of {m-m', k, q} (and we adopt the convention that {(0,q)=q}). (Here, the {o(1)} denotes a quantity that goes to zero as {q \rightarrow \infty}, rather than as {x \rightarrow \infty}.)

Note that the square root cancellation heuristic predicts {q^{3/2}} as the size for {T(k;m,m';q)}, thus we can achieve better than square root cancellation if {k} has a common factor with {q} that is not shared with {m-m'}. This improvement over the square root heuristic, which is ultimately due to the presence of a Ramanujan sum inside this three-dimensional exponential sum in certain degenerate cases, is crucial to Zhang’s argument.
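
For very small moduli the sum (5) can simply be evaluated by brute force, which gives a concrete feel for Theorem 4; the following rough Python sketch (the prime {13} and the parameter choices are arbitrary, and the comparison is of course only valid up to the implied constant in the bound) prints {|T|} next to the right-hand side of the bound.

from cmath import exp, pi
from math import gcd, sqrt

def e(a, q):
    return exp(2j * pi * a / q)

def T(k, m, mp, q):
    # brute-force evaluation of the three-dimensional sum (5)
    total = 0
    for l in range(q):
        if gcd(l, q) != 1 or gcd(l + k, q) != 1:
            continue
        linv, lkinv = pow(l, -1, q), pow(l + k, -1, q)
        for t in range(1, q):
            if gcd(t, q) != 1:
                continue
            for tp in range(1, q):
                if gcd(tp, q) != 1:
                    continue
                total += e(t * linv - tp * lkinv
                           + m * pow(t, -1, q) - mp * pow(tp, -1, q), q)
    return total

p = 13   # small prime, so that the p^3 loop stays cheap
for k, m, mp in [(0, 2, 2), (0, 2, 5), (3, 2, 5), (5, 7, 0)]:
    kq = gcd(k % p, p)                       # note gcd(0, p) = p, matching the convention
    mkq = gcd(gcd((m - mp) % p, k % p), p)   # the quantity (m - m', k, p)
    print((k, m, mp), round(abs(T(k, m, mp, p)), 2), round(mkq / sqrt(kq) * p ** 1.5, 2))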

Proof: Suppose that {q} factors as {q=q_1q_2}, where {q_1,q_2} are coprime. Then we have

\displaystyle  e_q(a) = e_{q_1}( \frac{a}{q_2} ) e_{q_2} (\frac{a}{q_1})

(see Lemma 7 of this previous post). From this and the Chinese remainder theorem we see that {T(k;m,m';q)} factorises as

\displaystyle  \prod_{i=1}^2 \sum_{l \in {\bf Z}/q_i{\bf Z}: (l,q_i)=(l+k,q_i)=1} \sum_{t,t' \in ({\bf Z}/q_i{\bf Z})^\times} e_{q_i}( \frac{t}{q_jl} - \frac{t'}{q_j(l+k)} + \frac{m}{q_jt} - \frac{m'}{q_jt'} )

where {j := 3-i}. Dilating {t,t'} by {q_j}, we conclude the multiplicative law

\displaystyle  T(k;m,m';q_1q_2) = T(k;\frac{m}{q_2^2},\frac{m'}{q_2^2};q_1) T(k;\frac{m}{q_1^2},\frac{m'}{q_1^2};q_2).

Iterating this law, we see that to prove Theorem 4 it suffices to do so in the case when {q} is prime, or more precisely that

\displaystyle  |T(k; m,m';p)| \ll \frac{(m-m',k,p)}{(k,p)^{1/2}} p^{3/2}.

We first consider the case when {k = 0\ (p)}, so our objective is now to show that

\displaystyle  |T(0;m,m';p)| \ll (m-m',p) p. \ \ \ \ \ (6)

In this case we can write {T(0;m,m';p)} as

\displaystyle  \sum_{l,t,t' \in ({\bf Z}/p{\bf Z})^\times} e_p( \frac{t}{l} - \frac{t'}{l} + \frac{m}{t} - \frac{m'}{t'} ).

Making the change of variables {s := \frac{tt'}{l}\ (p)}, {u := \frac{1}{t}\ (p)}, {u' := \frac{1}{t'}\ (p)} this becomes

\displaystyle  \sum_{s,u,u' \in ({\bf Z}/p{\bf Z})^\times} e_p( su' - su + mu - m' u' ).

Performing the {u,u'} sums this becomes

\displaystyle  \sum_{s \in ({\bf Z}/p{\bf Z})^\times} C_p(m-s) C_p(s-m')

where {C_q(a)} is the Ramanujan sum

\displaystyle  C_q(a) := \sum_{b \in ({\bf Z}/q{\bf Z})^\times} e_q(ab).

Basic Fourier analysis tells us that {C_p(a)} equals {-1} when {a \neq 0\ (p)} and {p-1} when {a = 0\ (p)}. The bound (6) then follows by direct computation (if {m \neq m'\ (p)} the sum is {O(p)}, while if {m = m'\ (p)} it is {O(p^2)}).
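
As a sanity check (again purely illustrative, with an arbitrary small prime), this evaluation of the Ramanujan sum at a prime modulus is easy to confirm numerically:

from cmath import exp, pi
from math import gcd

def ramanujan(a, q):
    # C_q(a): sum of e_q(a b) over the units b mod q
    return sum(exp(2j * pi * a * b / q) for b in range(1, q + 1) if gcd(b, q) == 1)

p = 11
print([round(ramanujan(a, p).real) for a in range(p)])
# prints [10, -1, -1, ..., -1]: the value p - 1 at a = 0 and -1 at every nonzero residue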

Next, suppose that {k \neq 0\ (p)} and {m' = 0\ (p)}. Making the change of variables {s := -\frac{t'}{l+k}}, {T(k;m,0;p)} becomes

\displaystyle  \sum_{l \in {\bf Z}/p{\bf Z}: (l,p)=(l+k,p)=1} \sum_{t \in ({\bf Z}/p{\bf Z})^\times} \sum_{s \in ({\bf Z}/p{\bf Z})^\times} e_p( \frac{t}{l} + s + \frac{m}{t} ).

Performing the {s} summation, this becomes

\displaystyle  - \sum_{l \in {\bf Z}/p{\bf Z}: (l,p)=(l+k,p)=1} \sum_{t \in ({\bf Z}/p{\bf Z})^\times} e_p( \frac{t}{l} + \frac{m}{t} ).

For each {l}, the {t} summation is a Kloosterman sum and is thus {O(p^{1/2})} by the classical Weil bound (Theorem 8 from previous notes). This gives a net estimate of {O(p^{3/2})} as desired. Similarly if {m = 0\ (p)}.

The only remaining case is when {k,m,m' \neq 0\ (p)}. Here one cannot proceed purely through Ramanujan and Weil bounds, and we need to invoke the deep result of Bombieri and Birch, proven in Theorem 1 of the appendix to this paper of Friedlander and Iwaniec. This bound can be proven by applying Deligne’s proof of the Weil conjectures to a certain {L}-function attached to the surface {\{ (x_1,x_2,x_3,x_4): \frac{1}{x_1x_2} + \frac{1}{x_3x_4} = 1 \}}; an elementary but somewhat lengthy second proof is also given in the above appendix. \Box
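
The multiplicative law obtained at the start of the proof can also be confirmed numerically for small coprime moduli. Here is a brute-force Python sketch (the moduli {5,7} and the parameters are arbitrary toy choices, and the negative modular exponents require Python 3.8 or later):

from cmath import exp, pi
from math import gcd

def T(k, m, mp, q):
    # brute-force evaluation of the sum (5), as in the earlier sketch
    total = 0
    for l in range(q):
        if gcd(l, q) != 1 or gcd(l + k, q) != 1:
            continue
        for t in range(1, q):
            if gcd(t, q) != 1:
                continue
            for tp in range(1, q):
                if gcd(tp, q) != 1:
                    continue
                total += exp(2j * pi * (t * pow(l, -1, q) - tp * pow(l + k, -1, q)
                                        + m * pow(t, -1, q) - mp * pow(tp, -1, q)) / q)
    return total

q1, q2 = 5, 7                 # small coprime square-free moduli
k, m, mp = 3, 2, 6            # arbitrary parameters
lhs = T(k, m, mp, q1 * q2)
rhs = (T(k, m * pow(q2, -2, q1) % q1, mp * pow(q2, -2, q1) % q1, q1)
       * T(k, m * pow(q1, -2, q2) % q2, mp * pow(q1, -2, q2) % q2, q2))
print(abs(lhs - rhs) < 1e-6)  # prints True: the two sides agree up to rounding error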

To deal with factors such as {(k,q)}, the following simple lemma will be useful.

Lemma 5 For any {q} and any {K \geq 1} we have

\displaystyle  \sum_{1 \leq k \leq K} (k,q) \ll q^{o(1)} K.

In particular,

\displaystyle  \sum_{t \in {\bf Z}/q{\bf Z}} (t,q) \ll q^{1+o(1)}.

As in the previous theorem, {o(1)} here denotes a quantity that goes to zero as {q \rightarrow \infty}, rather than as {x \rightarrow \infty}.

Note that it is important that the {k=0} term is excluded from the first sum, otherwise one acquires an additional {q} term. In particular,

\displaystyle  \sum_{|k| \leq K} (k,q) \ll q + q^{o(1)} K.

Proof: Estimating

\displaystyle  (k,q) \leq \sum_{d|q; d|k} d

we can bound

\displaystyle  \sum_{1 \leq k \leq K}(k,q) \leq \sum_{d|q} \sum_{1 \leq k \leq K: d|k} d

\displaystyle \leq \sum_{d|q} \frac{K}{d} d

\displaystyle  = K \tau(q)

\displaystyle  \ll q^{o(1)} K.

\Box
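
The proof in fact gives the clean inequality {\sum_{1 \leq k \leq K} (k,q) \leq \tau(q) K}, which can be tested directly; a small Python check with arbitrary parameters:

from math import gcd

def tau(q):
    # the divisor function
    return sum(1 for d in range(1, q + 1) if q % d == 0)

q, K = 2 * 3 * 5 * 7, 10**4   # arbitrary square-free modulus and range
gcd_sum = sum(gcd(k, q) for k in range(1, K + 1))
print(gcd_sum, tau(q) * K, gcd_sum <= tau(q) * K)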

— 2. Cauchy-Schwarz —

We now prove Theorem 3. The reader may wish to track the exponents involved in the model regime

\displaystyle  \delta \approx 0; \quad N_1=N_2=N_3 = N; \quad N \ll d \ll N^{13/8} \ \ \ \ \ (7)

where {N} is any fixed power of {x} (e.g. {N = x^{5/16}}, in which case {d} can be slightly larger than {x^{1/2}}).
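
To spell out the arithmetic behind the parenthetical: in the regime (7) the hypothesis of Theorem 3 reads {N^{13} \gg d^8 x^{\delta+c}}, which (ignoring the small {x^{\delta+c}} factor) is {d \ll N^{13/8}}; with {N = x^{5/16}} this gives

\displaystyle  d \ll N^{13/8} = x^{\frac{13}{8} \cdot \frac{5}{16}} = x^{65/128},

which indeed exceeds {x^{1/2} = x^{64/128}} slightly.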

Let {\delta,N_1,N_2,N_3,d,\psi_1,\psi_2,\psi_3,a} be as in Theorem 3, and let {\epsilon>0} be a sufficiently small fixed quantity. It will suffice to show that

\displaystyle  \sum_{n = a\ (d)} \psi_1 \ast \psi_2 \ast \psi_3(n) = X + O( x^{-\epsilon} \frac{N_1 N_2 N_3}{d} )

where {X} does not depend on {a}. We rewrite the left-hand side as

\displaystyle  \sum_{n_1} \psi_1(n_1) \sum_{n: (n,d)=1; n_1 = \frac{a}{n}\ (d)} \psi_2 \ast \psi_3(n)

and then apply completion of sums (Lemma 6 from this previous post) to rewrite this expression as the sum of the main term

\displaystyle  \frac{1}{d} (\sum_{n_1} \psi_1(n_1)) (\sum_{n: (n,d)=1} \psi_2 \ast \psi_3(n))

plus the error terms

\displaystyle  O( (\log^{O(1)} x) \frac{N_1}{d} \sum_{1 \leq h \le H} |\sum_{n: (n,d)=1} \psi_2 \ast \psi_3(n) e_d( \frac{ah}{n} )| )

and

\displaystyle  O( x^{-A} \sum_n |\psi_2 \ast \psi_3(n)| ).

where {A > 0} is any fixed quantity and

\displaystyle  H := x^\epsilon \frac{d}{N_1}.

The first term does not depend on {a}, and the third term is clearly acceptable, so it suffices to show that

\displaystyle  \sum_{1 \leq h \le H} |\sum_{n: (n,d)=1} \psi_2 \ast \psi_3(n) e_d( \frac{ah}{n} ) | \ll x^{-\epsilon} N_2 N_3. \ \ \ \ \ (8)
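
The completion of sums step rests on (a truncation of) the exact expansion of the indicator of a residue class into additive characters,

\displaystyle  \sum_{n = c\ (d)} f(n) = \frac{1}{d} \sum_{h \in {\bf Z}/d{\bf Z}} e_d(-hc) \sum_n f(n) e_d(hn),

with the smoothness of the coefficient sequence then used to truncate the {h}-sum. The exact identity itself is easy to check numerically; a Python sketch with arbitrary toy data:

from cmath import exp, pi

def e(a, q):
    return exp(2j * pi * a / q)

d, c = 12, 5                                              # arbitrary modulus and residue class
f = {n: ((7 * n) % 19) / 19.0 for n in range(1, 300)}     # arbitrary finitely supported data

lhs = sum(v for n, v in f.items() if n % d == c)
rhs = sum(e(-h * c, d) * sum(v * e(h * n, d) for n, v in f.items())
          for h in range(d)) / d
print(abs(lhs - rhs) < 1e-9)                              # prints True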

It will be convenient to reduce to the case when {h} and {d} are coprime. More precisely, it will suffice to prove the following claim:

Proposition 6 Let {\delta>0} be fixed, and let

\displaystyle  H, N_2, N_3, d, B \gg 1 \ \ \ \ \ (9)

be such that

\displaystyle  d \in {\mathcal S}_{[1,x^\delta]}

and

\displaystyle  H \ll x^{\epsilon} \frac{d}{N_2} \ \ \ \ \ (10)

and

\displaystyle  N_2^4 N_3^5 \gg B^{-6} d^4 H^4 x^{\delta+c} \ \ \ \ \ (11)

for some fixed {c>0}, and let {\psi_2,\psi_3} be smooth coefficient sequences at scale {N_2,N_3} respectively. Then

\displaystyle  \sum_{1 \leq h \le H: (h,d)=1} |\sum_{n: (n,d)=1} \psi_2 \ast \psi_3(n) e_d( \frac{ah}{n} ) | \ll x^{-\epsilon} B N_2 N_3

for some fixed {\epsilon>0}.

Let us now see why the above proposition implies (8). To prove (8), we may of course assume {H \geq 1} as the claim is trivial otherwise. We can split

\displaystyle  \sum_{1 \leq h \leq H} F(h) = \sum_{d = d_1 d_2} \sum_{1 \leq h' \leq H/d_2: (h',d_1)=1} F( d_2 h' )

for any function {F(h)} of {h}, so that (8) can be written as

\displaystyle  \sum_{d = d_1 d_2} \sum_{1 \leq h' \leq H/d_2: (h',d_1)=1} |\sum_{n: (n,d_1 d_2)=1} \psi_2 \ast \psi_3(n) e_{d_1}( \frac{ah'}{n} )|

which we expand as

\displaystyle  \sum_{d = d_1 d_2} \sum_{1 \leq h' \leq H/d_2: (h',d_1)=1} |\sum_{n_2: (n_2,d_1 d_2)=1} \sum_{n_3: (n_3,d_1d_2)=1} \psi_2(n_2) \psi_3(n_3) e_{d_1}( \frac{ah'}{n_2 n_3} )|.

In order to apply Proposition 6 we need to modify the {(n_2,d_1d_2)=1}, {(n_3,d_1d_2)=1} constraints. By Möbius inversion one has

\displaystyle  \sum_{n_2: (n_2,d_1d_2)=1} F(n_2) = \sum_{b_2|d_2} \mu(b_2) \sum_{n_2: (n_2,d_1)=1} F(b_2 n_2)

for any function {F}, and similarly for {n_3}, so by the triangle inequality we may bound the previous expression by

\displaystyle  \sum_{d = d_1 d_2} \sum_{b_2|d_2} \sum_{b_3|d_2} F( d_1, d_2, b_2, b_3 ) \ \ \ \ \ (12)

where

\displaystyle  F(d_1,d_2,b_2,b_3) := \sum_{1 \leq h' \leq H/d_2: (h',d_1)=1}

\displaystyle |\sum_{n_2: (n_2,d_1)=1} \sum_{n_3: (n_3,d_1)=1} \psi_2(b_2n_2) \psi_3(b_3n_3)

\displaystyle  e_{d_1}( \frac{ah'}{b_2b_3 n_2 n_3} )|

We may discard those values of {d_2} for which {H' := H/d_2} is less than one, as the summation is vacuous in that case. We then apply Proposition 6 with {d,N_2,N_3,H} replaced by {d_1,N_2/b_2,N_3/b_3,H'} respectively and {B} set equal to {b_2 b_3}, and {\psi_2,\psi_3} replaced by {\psi_2(b_2\cdot)} and {\psi_3(b_3\cdot)}. One can check that all the hypotheses of Proposition 6 are obeyed, so we may bound (12) by

\displaystyle  \ll x^{-\epsilon} N_2 N_3 \sum_{d = d_1 d_2} \sum_{b_2|d_2} \sum_{b_3|d_2} 1

which by the divisor bound is {\ll x^{-\epsilon+o(1)} N_2 N_3}, which is acceptable (after shrinking {\epsilon} slightly).
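
The splitting identity used at the start of this reduction (valid because {d} is square-free, so that each {h} determines {d_2} uniquely as the product of the primes of {d} dividing {h}) can also be verified numerically; here is a Python sketch with arbitrary toy data.

from math import gcd

def split_sum(F, H, d):
    # right-hand side of the splitting identity: sum over factorisations d = d1*d2
    # of the sum of F(d2*h') over 1 <= h' <= H/d2 with (h', d1) = 1
    total = 0.0
    for d2 in (t for t in range(1, d + 1) if d % t == 0):
        d1 = d // d2
        total += sum(F(d2 * hp) for hp in range(1, H // d2 + 1) if gcd(hp, d1) == 1)
    return total

def F(h):
    return ((37 * h) % 101) / 101.0     # arbitrary test function

H, d = 500, 2 * 3 * 7                   # arbitrary range and square-free modulus
lhs = sum(F(h) for h in range(1, H + 1))
print(abs(lhs - split_sum(F, H, d)) < 1e-9)   # prints True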

It remains to prove Proposition 6. Continuing (7), the reader may wish to keep in mind the model case

\displaystyle  \delta \approx 0; N_2 = N_3 = N; \quad N \ll d \ll N^{13/8}; \quad H \approx d/N; \quad B \approx 1.

Note from (9), (10) one has

\displaystyle  d \gg x^{-\epsilon} N_2. \ \ \ \ \ (13)

Expanding out the {\psi_2 \ast \psi_3} convolution, our task is to show that

\displaystyle  \sum_{1 \leq h \le H: (h,d)=1} |\sum_{n_2: (n_2,d)=1} \sum_{n_3: (n_3,d)=1} \psi_2(n_2) \psi_3(n_3) e_d( \frac{ah}{n_2n_3} )| \ll x^{-\epsilon} B N_2 N_3. \ \ \ \ \ (14)

As before, our aim is to obtain a power saving of a bit more than {H} over the trivial bound of {H N_2 N_3}.

The next step is Weyl differencing. We will need a step size {r \geq 1} which we will optimise later. We set

\displaystyle  K := \lfloor x^{-\epsilon} N_2 r^{-1} H^{-1}\rfloor; \ \ \ \ \ (15)

we will make the hypothesis that

\displaystyle  K \geq 1 \ \ \ \ \ (16)

and defer the verification of this condition until later.

By shifting {n_2} by {khr} for {1 \leq k \leq K} and then averaging, we may write the left-hand side of (14) as

\displaystyle  \sum_{1 \leq h \le H: (h,d)=1} |\frac{1}{K} \sum_{1 \leq k \leq K} \sum_{n_2: (n_2+hkr,d)=1} \sum_{n_3: (n_3,d)=1}

\displaystyle  \psi_2(n_2+hkr) \psi_3(n_3) e_d( \frac{ah}{(n_2+hkr)n_3} )|.

By the triangle inequality, it thus suffices to show that

\displaystyle  \sum_{1 \leq h \leq H: (h,d)=1} \sum_{n_2} |\sum_{1 \leq k \leq K: (n_2+hkr,d)=1} \psi_2(n_2+hkr) \ \ \ \ \ (17)

\displaystyle  \sum_{n_3: (n_3,d)=1} \psi_3(n_3) e_d( \frac{ah}{(n_2+hkr)n_3} )| \ll x^{-\epsilon} B K N_2 N_3.

Next, we combine the {h} and {n_2} summations into a single summation over {{\bf Z}/d{\bf Z}}. We first use a Taylor expansion and (15) to write

\displaystyle  \psi_2(n_2+hkr) = \sum_{j=0}^J \frac{1}{j!} (h/H)^j N_2^{j} \psi_2^{(j)}(n_2) (Hkr/N_2)^j + O( x^{-J\epsilon+o(1)})

for any fixed {J}. If {J} is large enough, then the error term will be acceptable, so it suffices to establish (17) with {\psi_2(n_2+hkr)} replaced by {(h/H)^j N_2^j \psi_2^{(j)}(n_2) (Hkr/N_2)^j} for any fixed {j \geq 0}. We can rewrite

\displaystyle  e_d( \frac{ah}{(n_2+hkr)n_3} ) = e_d( \frac{a}{(l+kr) n_3} )

where {l \in {\bf Z}/d{\bf Z}} is such that {(l+kr,d)=1} and

\displaystyle  l = \frac{n_2}{h}\ (d).

Thus we can estimate the left-hand side of (17) by

\displaystyle  \sum_{l \in {\bf Z}/d{\bf Z}} \nu(l) |\sum_{1 \leq k \leq K: (l+kr,d)=1} (Hkr/N_2)^j \ \ \ \ \ (18)

\displaystyle \sum_{n_3: (n_3,d)=1} \psi_3(n_3) e_d( \frac{a}{(l+kr) n_3})|

where

\displaystyle  \nu(l) := \sum_{1 \leq h \leq H: (h,d)=1} \sum_{n_2} 1_{l = \frac{n_2}{h}\ (d)} N_2^j |\psi_2^{(j)}(n_2)|.

Here we have bounded {(h/H)^j} by {O(1)}.

We will eliminate the {\nu} expression via Cauchy-Schwarz. Observe from the smoothness of {\psi_2} that

\displaystyle  \nu(l) \ll x^{o(1)} |\{ (h,n_2): 1 \leq h \leq H; 1 \ll n_2 \ll N_2; (h,d)=1; l = \frac{n_2}{h}\ (d) \}|

and thus

\displaystyle  \sum_l \nu(l)^2 \ll x^{o(1)} |\{ (h,h',n_2,n'_2): 1 \leq h,h' \leq H; 1\ll n_2,n'_2 \ll N_2;

\displaystyle  (h,d)=(h',d) = 1; \frac{n_2}{h} = \frac{n'_2}{h'}\ (d) \}|.

Note that {\frac{n_2}{h} = \frac{n'_2}{h'}\ (d)} implies {n_2 h' = n'_2 h\ (d)}. But from (10) we have {1 \leq n_2 h', n'_2 h \leq d}, so in fact we have {n_2 h' = n'_2 h}. Thus

\displaystyle  \sum_l \nu(l)^2 \ll x^{o(1)} |\{ (h,h',n_2,n'_2): 1 \leq h' \leq H; 1\ll n_2 \ll N_2; n_2 h' = n'_2 h \}|.

From the divisor bound, we see that for each fixed {n_2, h'} there are {O(x^{o(1)})} choices for {n'_2,h}, thus

\displaystyle  \sum_l \nu(l)^2 \ll x^{o(1)} N_2 H.

From this, (18), and Cauchy-Schwarz, we see that to prove (17) it will suffice to show that

\displaystyle  \sum_{l \in {\bf Z}/d{\bf Z}} |\sum_{1 \leq k \leq K: (l+kr,d)=1} (Hkr/N_2)^j \ \ \ \ \ (19)

\displaystyle  \sum_{n_3: (n_3,d)=1} \psi_3(n_3) e_d( \frac{a}{(l+kr) n_3})|^2

\displaystyle  \ll x^{-2\epsilon} B^{2} K^2 N_2 N_3^2 H^{-1}.

Comparing with the trivial bound of {O( d N_3^2 K^2 )}, our task is now to gain a factor of more than {\frac{Hd}{B^2 N_2}} over the trivial bound.

We square out (19) as

\displaystyle  \sum_{1 \leq k,k' \leq K}\sum_{l \in {\bf Z}/d{\bf Z}: (l+kr,d)=(l+k'r,d)=1} (Hkr/N_2)^j (Hk'r/N_2)^j

\displaystyle  \sum_{n_3,n'_3: (n_3,d)=(n'_3,d)=1} \psi_3(n_3) \overline{\psi_3}(n'_3) e_d( \frac{a}{(l+kr)n_3} - \frac{a}{(l+k'r)n'_3} ).

If we shift {l} by {kr}, then relabel {k'-k} by {k}, and use the fact that {Hkr/N_2, Hk'r/N_2 = O(1)}, we can reduce this to

\displaystyle  \sum_{|k| \leq K}

\displaystyle  |\sum_{l \in {\bf Z}/d{\bf Z}: (l,d)=(l+kr,d)=1} \sum_{n_3,n'_3: (n_3,d)=(n'_3,d)=1}

\displaystyle  \psi_3(n_3) \overline{\psi_3}(n'_3) e_d( \frac{a}{ln_3} - \frac{a}{(l+kr)n'_3} )|

\displaystyle  \ll x^{-2\epsilon} B^{2} K N_2 N_3^2 H^{-1}.

Next we perform another completion of sums, this time in the {n_3,n'_3} variables, to bound

\displaystyle  |\sum_{l \in {\bf Z}/d{\bf Z}: (l,d)=(l+kr,d)=1} \sum_{n_3,n'_3: (n_3,d)=(n'_3,d)=1}

\displaystyle  \psi_3(n_3) \overline{\psi_3}(n'_3) e_d( \frac{a}{ln_3} - \frac{a}{(l+kr)n'_3} )|

by

\displaystyle  \ll x^{o(1)} \sum_{|m|, |m'| \leq M'} (\frac{N_3}{d})^2 | U(k; m,m'; d)|+ x^{-A}

for any fixed {A>0}, where

\displaystyle  M' := x^{\epsilon} \frac{d}{N_3} \ \ \ \ \ (20)

(the prime is there to distinguish this quantity from {M} in the introduction) and

\displaystyle  U(k;m,m';d) := \sum_{l \in {\bf Z}/d{\bf Z}: (l,d)=(l+kr,d)=1} \sum_{n_3,n'_3 \in ({\bf Z}/d{\bf Z})^\times}

\displaystyle  e_d( \frac{a}{ln_3} - \frac{a}{(l+kr)n'_3} + mn_3 - m' n'_3).

Making the change of variables {t := \frac{a}{n_3}\ (d)} and {t' := \frac{a}{n'_3}\ (d)} and comparing with (5), we see that

\displaystyle  U(k;m,m';d) = T( kr; am, am'; d).

Applying Theorem 4 (and recalling that {a \in ({\bf Z}/d{\bf Z})^\times}) we reduce to showing that

\displaystyle  \sum_{|k| \leq K} \sum_{|m|, |m'| \leq M'} \frac{(kr,m-m',d)}{(kr,d)^{1/2}} (\frac{N_3}{d})^2 d^{3/2} \ll x^{-3\epsilon} B^{2} K N_2 N_3^2 H^{-1}.

We now choose {r} to be a factor of {d}, thus

\displaystyle  d = qr

for some {q} coprime to {r} (coprimality is automatic as {d} is square-free). We now estimate the sum on the left-hand side:

Lemma 7 We have

\displaystyle  \sum_{|k| \leq K} \sum_{|m|, |m'| \leq M'} \frac{(kr,m-m',d)}{(kr,d)^{1/2}}

\displaystyle  \ll x^{o(1)} ( M' r^{1/2} K + M' d^{1/2} + (M')^2 K r^{-1/2} ).

Proof: We first consider the contribution of the diagonal case {m=m'}. This term may be estimated by

\displaystyle  \ll M' \sum_{|k| \leq K} (kr,d)^{1/2} = M' r^{1/2} \sum_{|k| \leq K} (k,q)^{1/2}.

The {k=0} term gives {M'd^{1/2}}, while the contribution of the non-zero {k} is acceptable by Lemma 5.

For the non-diagonal case {m \neq m'}, we see from Lemma 5 that

\displaystyle  \sum_{|m|,|m'| \leq M': m \neq m'} (kr,m-m',d) \ll x^{o(1)} (M')^2;

since {(kr,d) \geq r}, we obtain a bound of {O( x^{o(1)} (M')^2 K r^{-1/2} )} from this case as required. \Box

From this lemma, we see that we are done if we can find {r} obeying

\displaystyle  (M' r^{1/2} K + M' d^{1/2} + (M')^2 K r^{-1/2} ) (\frac{N_3}{d})^2 d^{3/2} \ll x^{-4\epsilon} B^{2} K N_2 N_3^2 H^{-1}. \ \ \ \ \ (21)

as well as the previously recorded condition (16). We can split the condition (21) into three subconditions:

\displaystyle  M' r^{1/2} d^{-1/2} \ll x^{-4\epsilon} B^{2} N_2 H^{-1}

\displaystyle  M' K^{-1} \ll x^{-4\epsilon} B^{2} N_2 H^{-1}

\displaystyle  (M')^2 r^{-1/2} d^{-1/2} \ll x^{-4\epsilon} B^{2} N_2 H^{-1}.

Substituting the definitions (15), (20) of {K, M'}, we can rewrite all of these conditions as lower and upper bounds on {r}. Indeed, (16) follows from (say)

\displaystyle  r \ll x^{-2\epsilon} N_2 H^{-1} \ \ \ \ \ (22)

while the other three conditions rearrange to

\displaystyle  r \ll x^{-10\epsilon} B^{4} N_2^2 N_3^2 H^{-2} d^{-1} \ \ \ \ \ (23)

\displaystyle  r \ll x^{-6\epsilon} B^{2} N_2^2 N_3 H^{-2} d^{-1} \ \ \ \ \ (24)

and

\displaystyle  r \gg x^{12\epsilon} B^{-4} N_2^{-2} N_3^{-4} H^2 d^{3}.

We can combine (23), (24) into a single condition

\displaystyle  r \ll x^{-10\epsilon} B^{2} N_2^2 N_3 H^{-2} d^{-1}.

Also, from (9), (13) we see that this new condition also implies (22). Thus we are done as soon as we find a factor {r} of {d} such that

\displaystyle  R_1 \ll r \ll R_2

where

\displaystyle  R_1 := x^{12\epsilon} B^{-4} N_2^{-2} N_3^{-4} H^2 d^{3}

and

\displaystyle  R_2 := x^{-10\epsilon} B^{2} N_2^2 N_3 H^{-2} d^{-1}.

From (11) one has

\displaystyle  R_2/R_1 \gg x^\delta

if {\epsilon} is sufficiently small. Also, from (11) and (9) one sees that

\displaystyle  R_1 \ll d

and {R_2 \gg 1}. As {d} is {x^\delta}-smooth, we can thus find {r} with the desired properties by the greedy algorithm. (In view of Corollary 12 from this previous post, one could also have ensured that {q} has no tiny factors, although this does not seem to be of much actual use in the Type III analysis.)
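
To make this final step concrete, here is a small Python sketch of the greedy selection of {r}. It assumes {d} is given by its list of prime factors, each at most {x^\delta}, that {R_1 \leq d} and {R_2 \geq 1}, and that {R_2/R_1} is at least as large as every prime factor of {d}; this is exactly the situation arranged above, and the toy numbers are of course just for illustration.

def greedy_factor(primes, R1, R2):
    # primes: the prime factors of a square-free smooth modulus d
    # returns a divisor r of d with R1 <= r <= R2, assuming R1 <= d, R2 >= 1,
    # and R2/R1 at least as large as every prime in the list
    r = 1
    for p in sorted(primes):
        if r >= R1:
            break
        r *= p   # before this step r < R1, so afterwards r < R1 * p <= R2
    assert R1 <= r <= R2
    return r

# toy example: d = 2*3*5*7*11*13 is 13-smooth; find a divisor in [50, 800]
print(greedy_factor([2, 3, 5, 7, 11, 13], 50, 800))   # prints 210

In the argument above one has {R_2/R_1 \gg x^\delta} while every prime factor of {d} is at most {x^\delta}, which (for {x} large) is precisely the situation in which this greedy choice succeeds.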