The purpose of this post is to isolate a combinatorial optimisation problem regarding subset sums; any improvement upon the current known bounds for this problem would lead to numerical improvements for the quantities pursued in the Polymath8 project. (UPDATE: Unfortunately no purely combinatorial improvement is possible, see comments.) We will also record the number-theoretic details of how this combinatorial problem is used in Zhang’s argument establishing bounded prime gaps.

First, some (rough) motivational background, omitting all the number-theoretic details and focusing on the combinatorics. (But readers who just want to see the combinatorial problem can skip the motivation and jump ahead to Lemma 5.) As part of the Polymath8 project we are trying to establish a certain estimate called {MPZ[\varpi,\delta]} for as wide a range of {\varpi,\delta > 0} as possible. Currently the best result we have is:

Theorem 1 (Zhang’s theorem, numerically optimised) {MPZ[\varpi,\delta]} holds whenever {207\varpi + 43\delta< \frac{1}{4}}.

Enlarging this region would lead to a better value of certain parameters {k_0}, {H} which in turn control the best bound on asymptotic gaps between consecutive primes. See this previous post for more discussion of this. At present, the best value {k_0=23,283} of {k_0} is applied by taking {(\varpi,\delta)} sufficiently close to {(1/899,71/154628)}, so improving Theorem 1 in the neighbourhood of this value is particularly desirable.

I’ll state exactly what {MPZ[\varpi,\delta]} is below the fold. For now, suffice to say that it involves a certain number-theoretic function, the von Mangoldt function {\Lambda}. To prove the theorem, the first step is to use a certain identity (the Heath-Brown identity) to decompose {\Lambda} into a lot of pieces, which take the form

\displaystyle  \alpha_{1} \ast \ldots \ast \alpha_{n} \ \ \ \ \ (1)

for some bounded {n} (in Zhang’s paper {n} never exceeds {20}) and various weights {\alpha_{1},\ldots,\alpha_n} supported at various scales {N_1,\ldots,N_n \geq 1} that multiply up to approximately {x}:

\displaystyle  N_1 \ldots N_n \sim x.

We can write {N_i = x^{t_i}}, thus ignoring negligible errors, {t_1,\ldots,t_n} are non-negative real numbers that add up to {1}:

\displaystyle  t_1 + \ldots + t_n = 1.

A key technical feature of the Heath-Brown identity is that the weight {\alpha_i} associated to sufficiently large values of {t_i} (e.g. {t_i \geq 1/10}) are “smooth” in a certain sense; this will be detailed below the fold.

The operation {\ast} is Dirichlet convolution, which is commutative and associative. We can thus regroup the convolution (1) in a number of ways. For instance, given any partition {\{1,\ldots,n\} = S \cup T} into disjoint sets {S,T}, we can rewrite (1) as

\displaystyle  \alpha_S \ast \alpha_T

where {\alpha_S} is the convolution of those {\alpha_i} with {i \in S}, and similarly for {\alpha_T}.

Zhang’s argument splits into two major pieces, in which certain classes of (1) are established. Cheating a little bit, the following three results are established:

Theorem 2 (Type 0 estimate, informal version) The term (1) gives an acceptable contribution to {MPZ[\varpi,\delta]} whenever

\displaystyle  t_i > \frac{1}{2} + 2 \varpi

for some {i}.

Theorem 3 (Type I/II estimate, informal version) The term (1) gives an acceptable contribution to {MPZ[\varpi,\delta]} whenever one can find a partition {\{1,\ldots,n\} = S \cup T} such that

\displaystyle  \frac{1}{2} - \sigma < \sum_{i \in S} t_i \leq \sum_{i \in T} t_i < \frac{1}{2} + \sigma

where {\sigma} is a quantity such that

\displaystyle  11 \varpi + 3\delta + 2 \sigma < \frac{1}{4}.

Theorem 4 (Type III estimate, informal version) The term (1) gives an acceptable contribution to {MPZ[\varpi,\delta]} whenever one can find {t_i,t_j,t_k} with distinct {i,j,k \in \{1,\ldots,n\}} with

\displaystyle  t_i \leq t_j \leq t_k \leq \frac{1}{2}

and

\displaystyle  4t_k + 4t_j + 5t_i > 4 + 16 \varpi + \delta.

The above assertions are oversimplifications; there are some additional minor smallness hypotheses on {\varpi,\delta} that are needed but at the current (small) values of {\varpi,\delta} under consideration they are not relevant and so will be omitted.

The deduction of Theorem 1 from Theorems 2, 3, 4 is then accomplished from the following, purely combinatorial, lemma:

Lemma 5 (Subset sum lemma) Let {\varpi,\delta > 0} be such that

\displaystyle  207\varpi + 43\delta < \frac{1}{4}. \ \ \ \ \ (2)

Let {t_1,\ldots,t_n} be non-negative reals such that

\displaystyle  t_1 + \ldots + t_n = 1.

Then at least one of the following statements hold:

  • (Type 0) There is {1 \leq i \leq n} such that {t_i > \frac{1}{2} + 2 \varpi}.
  • (Type I/II) There is a partition {\{1,\ldots,n\} = S \cup T} such that

    \displaystyle  \frac{1}{2} - \sigma < \sum_{i \in S} t_i \leq \sum_{i \in T} t_i < \frac{1}{2} + \sigma

    where {\sigma} is a quantity such that

    \displaystyle  11 \varpi + 3\delta + 2 \sigma < \frac{1}{4}.

  • (Type III) One can find distinct {t_i,t_j,t_k} with

    \displaystyle  t_i \leq t_j \leq t_k \leq \frac{1}{2}

    and

    \displaystyle  4t_k + 4t_j + 5t_i > 4 + 16 \varpi + \delta.

The purely combinatorial question is whether the hypothesis (2) can be relaxed here to a weaker condition. This would allow us to improve the ranges for Theorem 1 (and hence for the values of {k_0} and {H} alluded to earlier) without needing further improvement on Theorems 2, 3, 4 (although such improvement is also going to be a focus of Polymath8 investigations in the future).

Let us review how this lemma is currently proven. The key sublemma is the following:

Lemma 6 Let {1/10 < \sigma < 1/2}, and let {t_1,\ldots,t_n} be non-negative numbers summing to {1}. Then one of the following three statements hold:

  • (Type 0) There is a {t_i} with {t_i \geq 1/2 + \sigma}.
  • (Type I/II) There is a partition {\{1,\ldots,n\} = S \cup T} such that

    \displaystyle  \frac{1}{2} - \sigma < \sum_{i \in S} t_i \leq \sum_{i \in T} t_i < \frac{1}{2} + \sigma.

  • (Type III) There exist distinct {i,j,k} with {2\sigma \leq t_i \leq t_j \leq t_k \leq 1/2-\sigma} and {t_i+t_j,t_i+t_k,t_j+t_k \geq 1/2 + \sigma}.

Proof: Suppose Type I/II never occurs, then every partial sum {\sum_{i \in S} t_i} is either “small” in the sense that it is less than or equal to {1/2-\sigma}, or “large” in the sense that it is greater than or equal to {1/2+\sigma}, since otherwise we would be in the Type I/II case either with {S} as is and {T} the complement of {S}, or vice versa.

Call a summand {t_i} “powerless” if it cannot be used to turn a small partial sum into a large partial sum, thus there are no {S \subset \{1,\ldots,n\} \backslash \{i\}} such that {\sum_{j \in S} t_j} is small and {t_i + \sum_{j \in S} t_j} is large. We then split {\{1,\ldots,n\} = A \cup B} where {A} are the powerless elements and {B} are the powerful elements.

By induction we see that if {S \subset B} and {\sum_{i \in S} t_i} is small, then {\sum_{i \in S} t_i + \sum_{i \in A} t_i} is also small. Thus every sum of powerful summand is either less than {1/2-\sigma-\sum_{i \in A} t_i} or larger than {1/2+\sigma}. Since a powerful element must be able to convert a small sum to a large sum (in fact it must be able to convert a small sum of powerful summands to a large sum, by stripping out the powerless summands), we conclude that every powerful element has size greater than {2\sigma + \sum_{i \in A} t_i}. We may assume we are not in Type 0, then every powerful summand is at least {2\sigma + \sum_{i \in A} t_i} and at most {1/2 - \sigma - \sum_{i \in A} t_i}. In particular, there have to be at least three powerful summands, otherwise {\sum_{i=1}^n t_i} cannot be as large as {1}. As {\sigma > 1/10}, we have {4\sigma > 1/2-\sigma}, and we conclude that the sum of any two powerful summands is large (which, incidentally, shows that there are exactly three powerful summands). Taking {t_i,t_j,t_k} to be three powerful summands in increasing order we land in Type III. \Box

Now we see how Lemma 6 implies Lemma 5. Let {\varpi,\delta} be as in Lemma 5. We take {\sigma} almost as large as we can for the Type I/II case, thus we set

\displaystyle  \sigma := \frac{1}{8} - \frac{11}{2} \varpi - \frac{3}{2} \delta - \epsilon \ \ \ \ \ (3)

for some sufficiently small {\epsilon>0}. We observe from (2) that we certainly have

\displaystyle  \sigma > 2 \varpi

and

\displaystyle  \sigma > \frac{1}{10}

with plenty of room to spare. We then apply Lemma 6. The Type 0 case of that lemma then implies the Type 0 case of Lemma 5, while the Type I/II case of Lemma 6 also implies the Type I/II case of Lemma 5. Finally, suppose that we are in the Type III case of Lemma 6. Since

\displaystyle  4t_i + 4t_j + 5 t_k = \frac{5}{2} (t_i+t_k) + \frac{5}{2}(t_j+t_k) + \frac{3}{2} (t_i+t_j)

we thus have

\displaystyle  4t_i + 4t_j + 5 t_k \geq \frac{13}{2} (\frac{1}{2}+\sigma)

and so we will be done if

\displaystyle  \frac{13}{2} (\frac{1}{2}+\sigma) > 4 + 16 \varpi + \delta.

Inserting (3) and taking {\epsilon} small enough, it suffices to verify that

\displaystyle  \frac{13}{2} (\frac{1}{2}+\frac{1}{8} - \frac{11}{2} \varpi - \frac{3}{2}\delta) > 4 + 16 \varpi + \delta

but after some computation this is equivalent to (2).

It seems that there is some slack in this computation; some of the conclusions of the Type III case of Lemma 5, in particular, ended up being “wasted”, and it is possible that one did not fully exploit all the partial sums that could be used to create a Type I/II situation. So there may be a way to make improvements through purely combinatorial arguments. (UPDATE: As it turns out, this is sadly not the case: consderation of the case when {n=4}, {t_1 = 1/4 - 3\sigma/2}, and {t_2=t_3=t_4 = 1/4+\sigma/2} shows that one cannot obtain any further improvement without actually improving the Type I/II and Type III analysis.)

A technical remark: for the application to Theorem 1, it is possible to enforce a bound on the number {n} of summands in Lemma 5. More precisely, we may assume that {n} is an even number of size at most {n \leq 2K} for any natural number {K} we please, at the cost of adding the additioal constraint {t_i > \frac{1}{K}} to the Type III conclusion. Since {t_i} is already at least {2\sigma}, which is at least {\frac{1}{5}}, one can safely take {K=5}, so {n} can be taken to be an even number of size at most {10}, which in principle makes the problem of optimising Lemma 5 a fixed linear programming problem. (Zhang takes {K=10}, but this appears to be overkill. On the other hand, {K} does not appear to be a parameter that overly influences the final numerical bounds.)

Below the fold I give the number-theoretic details of the combinatorial aspects of Zhang’s argument that correspond to the combinatorial problem described above.

— 1. Coefficient sequences —

We now give some number-theoretic background material that will serve two purposes. The most immediate purpose is to enable one to understand the precise statement of Theorems 1, 2, 3, 4, as well as the deduction of the first theorem from the other three. A secondary purpose is to establish some reference material which will be used in subsequent posts on the Type I/II and Type III analysis in Zhang’s arguments.

As in previous posts, we let {x} be an asymptotic parameter tending to infinity and define the usual asymptotic notation {O(), o(), \ll} relative to this parameter. It is also convenient to have a large fixed quantity {A_0>0} to be chosen later, in order to achieve a fine localisation of scales.

It will be convenient to set up some notation regarding certain types of sequences, abstracting some axioms appearing in the work of Bombieri-Friedlander-Iwaniec and Zhang:

Definition 7 A coefficient sequence is a finitely supported sequence {\alpha: {\bf N} \rightarrow {\bf R}} that obeys the bounds

\displaystyle  |\alpha(n)| \ll \tau^{O(1)}(n) \log^{O(1)}(x) \ \ \ \ \ (4)

for all {n}, where {\tau} is the divisor function. (In particular, any sequence that is pointwise dominated by a coefficient sequence is again a coefficient sequence.)

  • (i) If {\alpha} is a coefficient sequence and {a\ (q) = a \hbox{ mod } q} is a primitive residue class, the (signed) discrepancy {\Delta(\alpha; a\ (q))} of {\alpha} in the sequence is defined to be the quantity

    \displaystyle  \Delta(\alpha; a \ (q)) := \sum_{n: n = a\ (q)} \alpha(n) - \frac{1}{\phi(q)} \sum_{n: (n,q)=1} \alpha(n). \ \ \ \ \ (5)

    Note that this expression is linear in {\alpha}, so in particular we have the triangle inequality

    \displaystyle  |\Delta(\alpha+\beta; a\ (q))| \leq |\Delta(\alpha; a\ (q))| + |\Delta(\beta; a\ (q))|. \ \ \ \ \ (6)

  • (ii) A coefficient sequence {\alpha} is said to be at scale {N} for some {N \geq 1} if it is supported on an interval of the form {[(1-O(\log^{-A_0} x)) N, (1+O(\log^{-A_0} x)) N]}.
  • (iii) A coefficient sequence {\alpha} at scale {N} is said to obey the Siegel-Walfisz theorem if one has

    \displaystyle  | \Delta(\alpha 1_{(\cdot,q)=1}; a\ (r)) | \ll \tau(qr)^{O(1)} N \log^{-A} x \ \ \ \ \ (7)

    for any {q,r \geq 1}, any fixed {A}, and any primitive residue class {a\ (r)}.

  • (iv) A coefficient sequence {\alpha} at scale {N} is said to obey the Elliott-Halberstam conjecture up to scale {M} for some {1 \ll M \ll N} if one has

    \displaystyle  \sum_{q < M} \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\Delta(\alpha; a\ (q))| \ll N \log^{-A} x \ \ \ \ \ (8)

    for any fixed {A>0}. Thus for instance the Siegel-Walfisz theorem implies the Elliott-Halberstam conjecture up to scale {\log^C x} for any fixed {C}.

  • (v) A coefficient sequence {\alpha} at scale {N} is said to be smooth if it takes the form {\alpha(n) = \psi(n/N)} for some smooth function {\psi: {\bf R} \rightarrow {\bf C}} supported on {[1-O(\log^{-A_0} x), 1+O(\log^{-A_0} x)]} obeying the derivative bounds

    \displaystyle  \psi^{(j)}(t) = O( \log^{j A_0} x ) \ \ \ \ \ (9)

    for all fixed {j \geq 0} (note that the implied constant in the {O()} notation may depend on {j}).

To control error terms, we will frequently use the following crude bounds:

Lemma 8 (Crude estimates) Let {\alpha} be a coefficient sequence, let {C>0} be fixed, and let {1 \ll y \ll x^C}. Then we have

\displaystyle  \sum_{d \leq y} \frac{|\alpha(d)|^C}{d} \ll \log^{O(1)} x \ \ \ \ \ (10)

and

\displaystyle  \sum_{d \leq y} |\alpha(d)|^C \ll y \log^{O(1)} x; \ \ \ \ \ (11)

more generally one has

\displaystyle  \sum_{d \leq y: d = a\ (q)} |\alpha(d)|^C \ll \frac{y}{q} \tau^{O(1)}(q) \log^{O(1)} x + y^{o(1)}; \ \ \ \ \ (12)

for any congruence class {a\ (q)} (not necessarily primitive). In particular, if {\alpha} is at scale {N} for some {1 \ll N \ll x^C}, one has

\displaystyle  \sum_{d} |\alpha(d)|^C \ll N \log^{O(1)} x \ \ \ \ \ (13)

and

\displaystyle  \sum_{d: d = a\ (q)} |\alpha(d)|^C \ll \frac{N}{q} \tau^{O(1)}(q) \log^{O(1)} x + N^{o(1)} \ \ \ \ \ (14)

and hence from (5) we have the crude discrepancy bound

\displaystyle  |\Delta(\alpha; a \ (q))| \ll\frac{N}{q} \tau^{O(1)}(q) \log^{O(1)} x + N^{o(1)}. \ \ \ \ \ (15)

(In particular, the Siegel-Walfisz estimate (7) is trivial unless {r = O(\log^{O(1+A)} x)}.) Finally, we have the crude bound

\displaystyle  \alpha(d) = O( x^{o(1)} ) \ \ \ \ \ (16)

for all {1 \leq d \leq x^C}.

Proof: To prove (10) it suffices from (4) to show that

\displaystyle  \sum_{d \leq x^C} \frac{\tau(n)^{O(1)}}{n} \ll \log^{O(1)} x,

but this follows from e.g. Lemma 8 of this previous post. From (10) we also deduce (11). To show (12), it suffices to show that

\displaystyle  \sum_{d \leq y: d = a\ (q)} \tau^{O(1)}(d) \ll \frac{y}{q} \tau^{O(1)}(q) \log^{O(1)} y + y^{o(1)}

(i.e. a Brun-Titchmarsh type inequality for powers of the divisor function); this estimate follows from this paper of Shiu or this paper of Barban-Vehov, and can also be proven using the methods in this previous blog post. (The factor of {\tau^{O(1)}(q)} is needed to account for the possibility that {a\ (q)} is not primitive, while the {y^{o(1)}} term accounts for the possibility that {q} is as large as {y^{1-o(1)}}.) Finally, (16) follows from the standard divisor bound; see this previous post. \Box

As a general rule, we may freely use (16) when we are expecting a net power savings {x^{-\epsilon}} to come from another part of the analysis, and we may freely use the other estimates in Lemma 8 when we have a net super-logarithmic savings {\log^{-A} x} from elsewhere in the analysis. When we have neither a net power savings or a net super-logarithmic savings from other sources then we usually cannot afford to use any of the bounds in Lemma 8.

The concept of a coefficient sequence is stable under the operation of Dirichlet convolution operation

\displaystyle  f \ast g(n) := \sum_{d|n} f(d) g(\frac{n}{d}):

Lemma 9 Let {\alpha,\beta} be coefficient sequences. Then:

  • (i) {\alpha \ast \beta} is also a coefficient sequence.
  • (ii) If {\alpha,\beta} are coefficient sequences at scales {M,N} respectively, then {\alpha \ast \beta} is a coefficient sequence at scale {MN}.
  • (iii) If {\alpha,\beta} are coefficient sequences at scales {M,N} respectively, with {x^c \ll N \ll x^C} for some fixed {C,c>0}, and {\beta} obeys a Siegel-Walfisz theorem, then {\alpha \ast \beta} also obeys a Siegel-Walfisz theorem (at scale {NM}).
  • (iv) If {\alpha,\beta} are coefficient sequences at scales {M,N} respectively, with {x^c \ll N \ll x^C} for some fixed {C,c>0}, and {\beta} obeys the Elliott-Halberstam conjecture up to some scale {L}, then {\alpha \ast \beta} (viewed as a coefficient sequence at scale {NM}) also obeys the Elliott-Halberstam conjecture up to scale {L}.
  • (v) If {L} is the logarithm function, then {L \alpha} is a coefficient sequence. Furthermore, if {\alpha} is smooth at some scale {N}, then {L\alpha} is smooth at that scale also.

Proof: To verify (i), observe from (4) that for any {n} one has

\displaystyle  \alpha \ast \beta(n) \ll \sum_{d|n} (\log^{O(1)}) x \tau(d)^{O(1)} \tau(\frac{n}{d})^{O(1)}

\displaystyle  \ll (\log^{O(1)} x) \tau(n) \tau(n)^{O(1)} \tau(n)^{O(1)}

so that {\alpha \ast \beta} is again a coefficient sequence. The claim (ii) then follows by considering the support of {\alpha \ast \beta}.

Now we verify (iii). We need to show that

\displaystyle  |\Delta((\alpha \ast \beta)1_{(\cdot,q)=1}, a\ (r))| \ll \tau(qr)^{O(1)} N \log^{-A} x

for any {q} and any residue class {a\ (r)}. By restricting {\alpha,\beta} to integers coprime to {qr} (and noting that the restricted version of {\beta} still obeys the Siegel-Walfisz property if one divides out by a suitable power of {\tau(qr)}) we may assume that {\alpha,\beta} are supported on integers coprime to {qr}, at which point we may drop the {1_{(\cdot,qr)=1}} constraint. As noted in Lemma 8 we may also assume that {r = O( \log^{A+O(1)} x)}.

We now have

\displaystyle  \sum_{n = a\ (r)} \alpha \ast \beta(n) = \sum_{b,c\in ({\bf Z}/r{\bf Z})^\times: a=bc} (\sum_{d = b\ (r)} \alpha(d)) (\sum_{m = c\ (r)} \beta(m))

and

\displaystyle  \sum_{n} \alpha \ast \beta(n) = (\sum_d \alpha(d)) (\sum_m \beta(m))

\displaystyle  = \sum_{b,c \in ({\bf Z}/r{\bf Z})^\times: a=bc} (\sum_{d = b\ (r)} \alpha(d)) (\sum_m \beta(m))

and by the triangle inequality so we may upper bound {|\Delta(\alpha \ast \beta, a\ (r))|} by

\displaystyle  \sum_{b,c \in ({\bf Z}/r{\bf Z})^\times: a=bc} |\sum_{d = b\ (r)} \alpha(d)| | \Delta(\beta; c\ (r))|.

Applying (7) for {\beta} and (14) for {\alpha}, we may bound this by

\displaystyle  \ll \tau(r)^{O(1)} \log^{-A' + O(1)} x \sum_{b,c \in ({\bf Z}/r{\bf Z})^\times: a=bc} ( \frac{N}{r} + N^{o(1)} ) M

for any fixed {A'}, which is acceptable using the bound on {r}.

Now we verify (iv), which is similar to (iii). Arguing as before, we have the inequality

\displaystyle  \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\Delta(\alpha \ast \beta; a\ (q))| \ll (\sum_{d} |\alpha(d)|) \sup_{c \in ({\bf Z}/q{\bf Z})^\times} |\Delta(\beta; c\ (q))|.

The claim then follows from (13) and the Elliott-Halberstam hypothesis for {\beta}.

Finally, (v) is an easy consequence of the product rule and is left to the reader. \Box

We also have some basic sequences that obey the Siegel-Walfisz property:

Lemma 10 Let {\alpha} be a smooth coefficient sequence at some scale {N} with {x^c \ll N \ll x^C} for some fixed {C, c>0}. Then

  • (i) {\alpha} obeys the Siegel-Walfisz theorem.
  • (ii) In fact, {\alpha} obeys the Elliott-Halberstam conjecture up to scale {x^{-\epsilon} N} for any fixed {\epsilon>0}.
  • (iii) {\mu\alpha} obeys the Siegel-Walfisz theorem, where {\mu} is the Möbius function.

Note that some lower bound on {N} is necessary here, since one cannot hope for a sequence to be equidistributed to the extent predicted by the Siegel-Walfisz theorem (7) if the sequence is only supported on a logarithmic range!

Proof: We first prove (i). Our task is to show that

\displaystyle  |\Delta(\alpha 1_{(\cdot,q)=1}, a\ (r))| \ll \tau(qr)^{O(1)} N \log^{-A} x

for any {q} and any residue class {a\ (r)}. By applying the Möbius inversion formula

\displaystyle  1_{(n,q)=1} = \sum_{d|q} \mu(d) 1_{n = 0\ (d)}

and the triangle inequality (6) we thus see that it suffices to show that

\displaystyle  |\Delta(\alpha 1_{d|\cdot}, a\ (r))| \ll \tau(dr)^{O(1)} N \log^{-A} x

for any {d}. We may assume that {r} is coprime to {d} as the left-hand side vanishes otherwise. Our task is now to show that

\displaystyle  \sum_{n: n = 0\ (d); n = a\ (r)} \alpha(n) = \frac{1}{\phi(r)} \sum_{n: n = 0\ (d); (n,r)=1} \alpha(n) + \tau(dr)^{O(1)} N \log^{-A} x.

For this it will suffice to establish the claim

\displaystyle  \sum_{n} \alpha(n) 1_{n = a\ (q)} = \frac{1}{q} (\sum_n \alpha(n)) + O( \tau(q)^{O(1)} N \log^{-A} x ) \ \ \ \ \ (17)

for any residue class {a\ (q)} (not necessarily primitive). Note that we may assume that {q = O( \log^{A+O(1)}(x) )} as the claim follows from (14) otherwise. We use the Fourier expansion

\displaystyle  1_{n=a\ (q)} = \frac{1}{q} \sum_{h \in {\bf Z}/q{\bf Z}} e_q(-ah) e_q(hn)

where {e_q(n) := e^{2\pi i n/q}} for {n \in {\bf Z}/q{\bf Z}} or (by abuse of notation) for {n \in {\bf Z}}. We can thus write the left-hand side of (17) as

\displaystyle  \frac{1}{q} \sum_{h \in {\bf Z}/q{\bf Z}} e_q(-ah) \sum_n \alpha(n) e_q(hn).

The {h=0} term here is the first term on the right-hand side of (17). Thus it will suffice to show that

\displaystyle  |\sum_n \alpha(n) e_q(hn)| \ll N \log^{-A} x

for all non-zero {h \in {\bf Z}/q{\bf Z}}. But if we write {\alpha(n) = \psi(n/N)}, then by the Poisson summation formula the left-hand side is equal to

\displaystyle  N \sum_m \hat \psi( N m - N h / q )

where {\hat \psi(t) := \int_{\bf R} \psi(s) e^{-2\pi i st}\ ds} is the Fourier transform of {\psi}. From the smoothness bounds (9) and integration by parts we have

\displaystyle  |\hat \psi(t)| \ll |t|^{-j} \log^{(j-1) A_0} x

for any fixed {j>0}, so for {j} large enough (actually {j=2} is enough) we obtain the claim thanks to the lower bound {N \gg x^c} and the upper bound {q = O( \log^{A+O(1)} x)}. We remark that the same argument (now with {j} as large as needed) in fact shows the much stronger bound

\displaystyle  \sum_{n} \alpha(n) 1_{n = a\ (q)} = \frac{1}{q} (\sum_n \alpha(n)) + O( \tau(q)^{O(1)} N x^{-A} )

for any fixed {A,\epsilon>0} and any {q < x^{-\epsilon} N}, which also gives (ii).

To prove (iii), we can argue similarly to before and reduce to showing that

\displaystyle  | \sum_{n} \mu(n) \alpha(n) 1_{n = a\ (q)}| \ll \tau(q)^{O(1)} N \log^{-A} x

for any {a\ (q)}, but this follows from the Siegel-Walfisz theorem for the Möbius function (which can be deduced from the more usual Siegel-Walfisz theorem for the von Mangoldt function, or else proven by essentially the same method) and summation by parts (if one wishes, one can also reduce to the case of primitive residue classes using the multiplicativity of {\mu}). \Box

We remark that the Siegel-Walfisz theorem for the Möbius function is ineffective, although in practice one can obtain effective substitutes for this theorem that can make applications (such as the one for bounded prime gaps) effective, see e.g. the last section of this article of Pintz for further discussion. One amusing remark in this regard is that if there do happen to be infinitely many Siegel zeroes, then an old result of Heath-Brown shows that there are infinitely mnay twin primes already, so the ineffective case of Siegel-Walfisz’s theorem is in some sense the best case scenario for us!

Lemmas 9 and 10 combine to give plenty of coefficient sequences obeying the Siegel-Walfisz property. This will become useful when the time comes to deduce Theorem 1 from (the precise versions of) Theorems 2, 3, 4.

— 2. Congruence class systems —

The Elliott-Halberstam conjecture on a sequence at scale {N} requires taking a supremum over all primitive residue classes. At present, we do not know how to achieve such a strong claim for non-smooth arithmetic sequences such as {\Lambda} or {\mu} once the modulus goes beyond the level {N^{1/2}}, even if one restricts to smooth moduli. However, Zhang’s work (as well as some of the precursor work of Bombieri, Fouvry, Friedlander, and Iwaniec) allow one to get a restricted version of the Elliott-Halberstam conjecture if one restricts the congruences one is permitted to work with.

Much as with the coefficient sequence axioms, it is convenient to abstract the axioms that a given system of congruence classes will be obeying. For any set {I \subset {\bf R}}, let {{\mathcal S}_I} denote the set of squarefree natural numbers whose prime divisors lie in {I}.

Definition 11 Let {I \subset {\bf R}}. A congruence class system on {I} is a collection {{\mathcal C} = (C(q))_{q \in {\mathcal S}_I}} of sets {C(q)} residue classes for each {q \in {\mathcal S}_I} obeying the following axioms:

  • (i) (Primitivity) For each {q \in {\mathcal S}_I}, {C(q)} is a subset of {({\bf Z}/q{\bf Z})^\times}.
  • (ii) (Chinese remainder theorem) For any coprime {q,r \in {\mathcal S}_I}, we have {C(q) \times C(r) \equiv C(qr)}, using the canonical identification between {({\bf Z}/q{\bf Z})^\times \times ({\bf Z}/r{\bf Z})^\times} and {({\bf Z}/qr{\bf Z})^\times}.
  • (iii) (Uniform bound) There is a fixed {k_0} such that {|C(p)| \leq k_0} for all primes {p \in I}.

If {|C(p)|=1} for all {p \in I}, thus {C(q) = (a_q)_{q \in {\mathcal S}_I}}, we say that {(C(q))_{q \in {\mathcal S}_I}} is a singleton congruence class system.

For any integer {n}, let {\tau_{\mathcal C}(n)} denote the multiplicity function

\displaystyle  \tau_{\mathcal C}(n) := |\{ q \in {\mathcal S}_I: n\ (q) \in C(q) \}|

or equivalently

\displaystyle  \tau_{\mathcal C}(n) := 2^{|\{ p \in I: n\ (p) \in C(p) \}|}

A congruence class system is said to have controlled multiplicity if for any congruence class {a\ (r)} with {r \in {\mathcal S}_I}, we have

\displaystyle  \sum_{C^{-1} x \leq n \leq Cx: n = a\ (r)} \tau_{\mathcal C}(n)^2 \ll \frac{x}{r} \tau(r)^{O(1)} \log^{O(1)} x + x^{o(1)}. \ \ \ \ \ (18)

for any fixed {C>1}.

Note from Axiom (ii) that a congruence class system can be specified through its values {C(p)} at primes {p \in I}. A simple example of a singleton congruence class system with controlled multiplicity is a fixed congruence class {C(q) = \{ a\ (q) \}} for some fixed non-zero {a}, with {I} avoiding all the prime divisors of {a}. This is a special case of the following more general fact:

Lemma 12 Let {{\mathcal H}} be a fixed admissible {k_0}-tuple for some fixed {k_0 \geq 2}, and let {h_i \in {\mathcal H}}. For any modulus {q}, set

\displaystyle  C_i(q) := \{ a \in ({\bf Z}/q{\bf Z})^\times: P(a) =0 \}

where {P(a) := \prod_{h \in {\mathcal H}} (a+h-h_i)}. Then for any {I \subset {\bf R}}, {(C_i(q))_{q \in {\mathcal S}_I}} is a congruence class system on {I} with controlled multiplicity.

Similarly, if {b\ (W)} is such that {b+h-h_i} is coprime to {W} for all {h \in {\mathcal H}} and {I \subset{\bf R}}, then the system {(C_{i,b,W}(q))_{q \in {\mathcal S}_I}} defined by setting {C_{i,b,W}(p) := C_i(p)} when {p} is coprime to {W}, and {C_{i,b,W}(p) := \{ b\ (p)\}} when {p} divides {W}, with {C_{i,b,W}(q)} then defined for arbitrary {q \in {\mathcal S}_I} by the Chinese remainder theorem, is also a congruence system on {I} with controlled multiplicity.

Proof: We just prove the first claim, as the second is similar. Axioms (i)-(iii) are obvious; it remains only to verify the controlled multiplicity. Let {a\ (r)} be a congruence class with {r \in {\mathcal S}_I}.

\displaystyle  \sum_{x \ll n \ll x: m = a\ (r)} 2^{2|\{ p \in I: p \not | n; p | P(n) \}|}.

We can split

\displaystyle  |\{ p \in I: p \not | n; p | P(n) \}| \leq \sum_{h \in {\mathcal H}} |\{ p \in I: p | n + h - h_i\}|

and hence

\displaystyle  2^{2|\{ p \in I: p \not | n; p | P(n) \}|} \leq \sum_{h \in {\mathcal H}} 2^{2k_0 |\{ p \in I: p | n + h - h_i\}|}

\displaystyle  = \sum_{h \in {\mathcal H}} \tau(n+h-h_i)^{2k_0}

so it suffices to show that

\displaystyle  \sum_{x \ll n \ll x: n = a\ (r)} \tau(n+h-h_i)^{2k_0} \ll \frac{x}{r} \tau(r)^{O(1)} \log^{O(1)} x + x^{o(1)} \ \ \ \ \ (19)

for each {h \in {\mathcal H}}. Writing {b := a+h-h_i}, this becomes

\displaystyle  \sum_{x \ll n \ll x: n = b\ (r)} \tau(n)^{2k_0} \ll \frac{x}{r} \tau(r)^{O(1)} \log^{O(1)} x + x^{o(1)}.

The claim then follows Lemma 8. \Box

The more precise statement of Theorem 1 is now as follows:

Theorem 13 (Zhang’s theorem, numerically optimised) Let {\varpi, \delta > 0} be fixed parameters such that

\displaystyle  207 \varpi + 43 \delta < \frac{1}{4}. \ \ \ \ \ (20)

Let {I \subset [1,x^\delta]}, and let {(C(q))_{q \in {\mathcal S}_I}} be a congruence class system with controlled multiplicity. Then

\displaystyle  \sum_{q \in {\mathcal S}_I: q< x^{1/2+2\varpi}} \sum_{a \in C(q)} |\Delta(\Lambda 1_{[x,2x]}; a\ (q))| \ll x \log^{-A} x \ \ \ \ \ (21)

for any fixed {A>0}, where {\Lambda} is the von Mangoldt function.

For the application to prime gaps we only need to apply Theorem 13 to the congruence class system associated to a fixed {k}-tuple {{\mathcal H}} through Lemma 12, but one could imagine that this theorem could have future application with some other congruence class systems.

One advantage of the abstract formulation of a congruence class systems is that we get a cheap reduction to the singleton case:

Proposition 14 (Reduction to singletons) In order to prove Theorem 13, it suffices to do so for good singleton class congruence systems.

Proof: Let {(C(q))_{q \in {\mathcal S}_I}} be a good congruence class system. By removing all primes {p} with {|C(p)|=0} from {I} we may assume without loss of generality that {1 \leq |C(p)| \leq k_0} for all {p \in I} and some fixed {k_0}.

We use the probabilistic method. Construct a random singleton congruence class system {(\tilde C(q))_{q \in {\mathcal S}_I}} by selecting {a_p} uniformly at random from {C(p)} independently for all {p \in I}, and then set {\tilde C(p) = \{ a_p\}} and extend by the Chinese remainder theorem. Writing {\tilde C(q) = \{ a_q \}}, we observe that the property that {(C(q))_{q \in {\mathcal S}_I}} enjoys of being a congruence class system of controlled multiplicity is inherited by {(\tilde C(q))_{q \in {\mathcal S}_I}}. From hypothesis we then have that

\displaystyle  \sum_{q \in {\mathcal S}_I: q< x^{1/2+2\varpi}} |\Delta(\Lambda 1_{[x,2x]}; a_q\ (q))| \ll x \log^{-A} x

for any fixed {A}; taking expectations we conclude that

\displaystyle  \sum_{q \in {\mathcal S}_I: q< x^{1/2+2\varpi}} \frac{1}{|C(q)|} \sum_{a \in C(q)} |\Delta(\Lambda 1_{[x,2x]}; a\ (q))| \ll x \log^{-A} x.

On the other hand, from Lemma 8 we have

\displaystyle  \sum_{q \in {\mathcal S}_I: q< x^{1/2+2\varpi}} |C(q)| \sum_{a \in C(q)} |\Delta(\Lambda 1_{[x,2x]}; a\ (q))| \ll x \log^{O(1)} x,

and the claim then follows from Cauchy-Schwarz. \Box

This reduction is not essential to Zhang’s argument (indeed, it is not used in Zhang’s paper), but it does allow for a slightly simpler notation (since the summation over {a} is eliminated).

We can now also state the precise versions of Theorems 2, 3, 4 that we need:

Theorem 15 (Type 0 estimate, precise version) Let {0 < \varpi < 1/2} be fixed, and let {\alpha,\psi} be coefficient sequences at scales {M,N} respectively with {\psi} smooth,

\displaystyle  x \ll MN \ll x,

and

\displaystyle  N > x^{1/2+2\varpi+\epsilon}

for some fixed {\epsilon>0}. Then {\alpha \ast \psi} obeys the Elliott-Halberstam conjecture up to scale {x^{1/2+2\varpi}}. In particular, for any {I \subset {\bf R}} and any singleton congruence class system {(\{a_q\})_{q \in {\mathcal S}_I}} we have

\displaystyle  \sum_{q \in {\mathcal S}_I: q< x^{1/2+2\varpi}} |\Delta(\alpha \ast \psi; a_q\ (q))| \ll x \log^{-A} x.

Proof: This is immediate from Lemma 10(ii) and Lemma 9(iv). \Box

Theorem 16 (Type I/II estimate, precise version) Let {\varpi, \delta, \sigma > 0} be fixed quantities such that

\displaystyle  11 \varpi + 3\delta + 2 \sigma < \frac{1}{4}

and

\displaystyle  37\varpi + 5 \delta < \frac{1}{4} \ \ \ \ \ (22)

and let {\alpha,\beta} be coefficient sequences at scales {M,N} respectively with

\displaystyle  x \ll MN \ll x

and

\displaystyle  x^{\frac{1}{2}-\sigma} \ll N \ll M \ll x^{\frac{1}{2}+\sigma}

with {\beta} obeying a Siegel-Walfisz theorem. Then for any {I \subset [1,x^\delta]} and any singleton congruence class system {(\{a_q\})_{q \in {\mathcal S}_I}} with controlled multiplicity we have

\displaystyle  \sum_{q \in {\mathcal S}_I: q< x^{1/2+2\varpi}} |\Delta(\alpha \ast \beta; a_q\ (q))| \ll x \log^{-A} x.

The condition (22) is dominated by (20) and can thus be ignored, at least at our current numerical ranges of parameters. We remark that this theorem is the only theorem that actually uses the controlled multiplicity hypothesis.

Theorem 17 (Type III estimate, precise version) Let {\varpi, \delta > 0} be fixed quantities. Let {N_1,N_2,N_3, M > 1} be scales obeying the relations

\displaystyle  N_3 \ll N_2 \ll N_1 \ll x^{1/2}

\displaystyle  N_1^4 N_2^4 N_3^5 > x^{4+16\varpi + \delta+\epsilon}

and

\displaystyle  x \ll MN_1 N_2 N_3 \ll x

for some fixed {\epsilon>0}. Let {\alpha,\psi_1,\psi_2,\psi_3} be coefficient sequences at scales {M,N_1,N_2,N_3} respectively, with {\psi_1,\psi_2,\psi_3} smooth. Then for any {I \subset [1,x^\delta]}, and any singleton congruence class system {(\{a_q\})_{q \in {\mathcal S}_I}} we have

\displaystyle  \sum_{q \in {\mathcal S}_I: q< x^{1/2+2\varpi}} |\Delta(\alpha \ast \psi_1 \ast \psi_2 \ast \psi_3; a_q\ (q))| \ll x \log^{-A} x

for any fixed {A>0}.

As we saw, Theorem 15 is easy to establish. On the other hand, Theorem 16 and Theorem 17 are far deeper and will be the subject of future blog posts. We will not discuss them further here, but now turn to the question of how to deduce Theorem 13 from Theorems 15, 16, 17 using Lemma 5.

— 3. Decomposing the von Mangoldt function —

The basic strategy is to decompose the von Mangoldt function {\Lambda} as a combination of Dirichlet convolutions of other functions such as {\mu}, the constant function {1}, and the logarithmic function {L: n \mapsto \log n}. The simplest identity of this form is

\displaystyle  \Lambda = \mu \ast L \ \ \ \ \ (23)

but this is unsuitable for our purposes because when we localise {\Lambda} to scale {x}, the {\mu} factor could also be localised at scales as large as {x}, and our understanding of the equidistribution properties of the Möbius function is basically no better than that of the von Mangoldt function, so we have not gained anything with this decomposition.

To get around this we need to find decompositions that don’t let rough functions such as the Möbius function get up to scales anywhere close to {x}. One promising identity in this regard is Linnik’s identity, which takes the form

\displaystyle  \Lambda = L - L \ast 1^* + L \ast 1^* \ast 1^* - L \ast 1^* \ast 1^* \ast 1^* + \ldots

where {1^*(n) := 1_{n \neq 1}} is the restriction of the constant function {1} to numbers larger than {1}; this is just the coefficient version of the formal geometric series identity

\displaystyle  -\frac{\zeta'(s)}{\zeta(s)} = -\zeta'(s) + \zeta'(s) (\zeta(s)-1) - \zeta'(s) (\zeta(s)-1)^2 + \ldots.

There are no Möbius functions in sight on the right-hand side, which is promising. However, Linnik’s identity has an unbounded number of terms, which render it unsuitable for our argument (we have various factors of {\tau^{O(1)}(n)} and {\log^{O(1)} x} whose exponent would become unbounded if we had to deal with Dirichlet convolutions of unbounded length, which would swamp any gain of {\log^{-A} x} that we are trying to detect). So we will rely on a truncated variant of Linnik’s identity, known as the Heath-Brown identity.

We will need a fixed positive natural number {K} (Zhang takes {K=10}; the precise choice here does not seem to be terribly important). From the identities {\Lambda = \mu \ast L} and {\delta = \mu \ast 1}, where {\delta(n) = 1_{n=1}} is the Dirichlet convolution identity, we see that we can write {\Lambda} as a {2K}-fold convolution

\displaystyle  \Lambda = \mu^{*K} \ast 1^{*K-1} \ast L \ \ \ \ \ (24)

where {\mu^{*K}} denotes the convolution of {K} copies of {\mu}.

As with (23), the identity (24) is not directly useful for our strategy because one of the {\mu} factors can still get as large as the scale {x} that we are studying {\Lambda} at. However, as Heath-Brown observed, we can manipulate (24) into a useful form by truncating the Möbius function {\mu}. More precisely, we split the Möbius function {\mu} as

\displaystyle  \mu = \mu_\leq + \mu_>

where {\mu_\leq} is the Möbius function restricted to the interval {[1,(2x)^{1/K}]}, and {\mu_>} is the Möbius function restricted to the interval {((2x)^{1/K},\infty)}. The reason for this splitting is that the {K}-fold Möbius convolution {\mu_>^{*K}} vanishes on {[1,2x]}, and in particular we have

\displaystyle  0 = \mu_>^{*K} \ast 1^{*K-1} \ast L

on {[x,2x]}. Expanding out {\mu_> = \mu - \mu_\leq} and using the binomial formula, we conclude that

\displaystyle  0 = \sum_{j=0}^K (-1)^{j} \binom{K}{j} \mu^{*K-j} \ast \mu_{\leq}^{j} \ast 1^{*K-1} \ast L \ \ \ \ \ (25)

on {[x,2x]}.

By (24), the {j=0} term of (25) is just {\Lambda}. For all the other terms, we can cancel {\mu^{*K-j}} against {1^{*K-j}} using the Möbius inversion formula {\delta = \mu \ast 1} to conclude the Heath-Brown identity

\displaystyle  \Lambda =\sum_{j=1}^K (-1)^{j-1} \binom{K}{j} \mu_{\leq}^j \ast 1^{*j-1} \ast L

on {[x,2x]}. Now we see that each of the Möbius factors {\mu_{\leq}} cannot reach scales much larger than {x^{1/K}}, although the factors {\mu_{\leq}^j} may collectively still get close to {x} when {j} is close to {K}. From the triangle inequality and Proposition 14, we thus see that to establish Theorem 13, it suffices to establish the bounds

\displaystyle  \sum_{q \in {\mathcal S}_I: q< x^{1/2+2\varpi}} |\Delta((\mu_{\leq}^j \ast 1^{*j-1} \ast L) 1_{[x,2x]}; a_q\ (q))| \ll x \log^{-A} x \ \ \ \ \ (26)

whenever {\varpi,\delta > 0} are fixed and obey (20), {I \subset [1,x^\delta]}, {1 \leq j \leq K}, {A} is fixed, and {(a_q)_{q \in {\mathcal S}_I}} is a singleton congruence class system with controlled multiplicity.

This is now looking closer to the type of estimates that can be handled by Theorems 15, 16, 17, but there are still some technical issues to resolve, namely the presence of the cutoff {1_{[x,2x]}} and also the fact that the functions {\mu_{\leq}}, {1}, {L} are not localised to any given scale, but are instead spread out at across many scales. This however can be dealt with by a routine dyadic decomposition (which, in harmonic analysis, is sometimes referred to as Littlewood-Paley decomposition, at least when applied in the frequency domain), though here instead of using the usual dyadic range {\{ 2^m: m \geq 0 \}} of scales, one uses instead a sub-dyadic range {{\mathcal D} := \{ (1 + \log^{-A_0} x)^m: m \geq 0\}} to eliminate edge effects. (This trick dates back at least to the work of Fouvry; thanks to Emmanuel Kowalski for this reference.)

More precisely, let {A_0} be a large fixed number to be chosen later, and let {\psi: {\bf R} \rightarrow {\bf R}} be a smooth function supported on {[-1-\log^{-A_0} x, 1+\log^{-A_0} x]} that equals {1} on {[-1,1]} and obeys the derivative estimates

\displaystyle  |\psi^{(j)}(t)| \ll \log^{jA_0} x

for any fixed {j \geq 0} (note that the implied constant here can depend on {j}). We then have a smooth partition of unity

\displaystyle  1 = \sum_{N \in {\mathcal D}} \psi_N(n)

indexed by the multiplicative semigroup {{\mathcal D}} for any natural number {n}, where

\displaystyle  \psi_N(n) := \psi( n/N ) - \psi( (1+\log^{-A_0} x) n/N ).

We can thus decompose

\displaystyle  \mu_{\leq} = \sum_{M \in {\mathcal D}} \mu_{\leq} \psi_M

and similarly

\displaystyle  1 = \sum_{N \in {\mathcal D}} \psi_N

and

\displaystyle  L = \sum_{N \in {\mathcal D}} L\psi_N.

For {1 \leq j \leq K}, the expression

\displaystyle  (\mu_{\leq}^j \ast 1^{*j-1} \ast L) 1_{[x,2x]}

can thus be split as

\displaystyle  \sum_{N_1,\ldots,N_2j \in {\mathcal D}} (\mu_{\leq} \psi_{N_1} \ast \ldots \ast \mu_{\leq} \psi_{N_j} \ast \psi_{N_{j+1}} \ast \ldots \ast \psi_{N_{2j-1}} \ast L \psi_{N_{2j}}) 1_{[x,2x]}

which we can rewrite as

\displaystyle  \sum_{N_1,\ldots,N_{2j} \in {\mathcal D}} \log(N_{2j}) \mu_{\leq} \psi_{N_1} \ast \ldots \ast \mu_{\leq} \psi_{N_j} \ast \psi_{N_1} \ast \ldots \ast \psi_{N_{2j-1}} \ast \psi'_{N_{2j}} 1_{[x,2x]}

where {\psi'_N := \frac{L}{\log N} \psi_N} is a variant of {\psi_N}.

Observe that the summand vanishes unless

\displaystyle  N_1,\ldots,N_j \ll x^{1/K}

and

\displaystyle  (1 - O(\log^{-A_0} x)) x \leq N_1 \ldots N_{2j} \leq (1 + O(\log^{-A_0} x)) 2x

and the {1_{[x,2x]}} factor can be eliminated except in the boundary cases when

\displaystyle  N_1 \ldots N_{2j} = (1 + O(\log^{-A_0} x)) x

or

\displaystyle  N_1 \ldots N_{2j} = (1 + O(\log^{-A_0} x)) 2x.

Let us deal with the contribution of the boundary cases to (26). If we let {\alpha} be the sum of all the boundary summands, then we have

\displaystyle  \alpha(n) \ll \tau(n)^{O(1)} (\log^{O(1)} x) (1_{n = x + O(x \log^{-A_0} x)} + 1_{2n = x + O(x \log^{-A_0} x)}).

From Lemma 8 we have that

\displaystyle  \sum_n |\alpha(n)|^2 \ll x \log^{O(1)} x

and

\displaystyle  \sum_{n = a\ (q)} |\alpha(n)|^2 \ll \frac{x}{q} \tau(q)^{O(1)} \log^{O(1)} x

for any fixed {C > 0} and any {q \leq x^{1/2+2\varpi}}. On the other hand, {\alpha} is supported on two intervals of length {O( x \log^{-A_0} x )}. Thus by Cauchy-Schwarz one has

\displaystyle  \sum_n |\alpha(n)| \ll x \log^{-A_0/2+O(1)} x

and

\displaystyle  \sum_{n = a\ (q)} |\alpha(n)| \ll \frac{x}{q} \tau(q)^{O(1)} \log^{-A_0/2+O(1)} x

for any fixed {\epsilon>0}. We thus have

\displaystyle  |\Delta(\alpha; a\ (q))| \ll \frac{x}{q} \tau(q)^{O(1)} \log^{-A_0/2+O(1)} x

and hence by Lemma 8 again

\displaystyle  \sum_{q \in {\mathcal S}_I: q< x^{1/2+2\varpi}} |\Delta(\alpha; a_q\ (q))| \ll x \log^{-A_0/2+O(1)} x

which is acceptable for (26) if we take {A_0} large enough. Thus it suffices to deal with the contribution of the interior summands, in which the cutoff {1_{[x,2x]}} may be dropped. There are {O(\log^{2j(A_0+1)} x)} such summands, and the {\log(N_j)} factor is bounded by {\log x}, so it will suffice to show that

\displaystyle  \sum_{q \in {\mathcal S}_I: q< x^{1/2+2\varpi}} |\Delta(\alpha_1 \ast \ldots \ast \alpha_{2j}; a_q\ (q))| \ll x \log^{-A} x \ \ \ \ \ (27)

whenever {N_1,\ldots,N_{2j} \in {\mathcal D}} and {\alpha_1,\ldots,\alpha_{2j}} are such that each {\alpha_i} takes the form {\mu_{\leq} \psi_{N_i}}, {\psi_{N_i}}, or {\psi'_{N_i}}. In particular, from Lemmas 9, (10) we observe the following facts for any {1 \leq i \leq 2j} and {S \subset \{1,\ldots,2j\}}:

  • (i) Each {\alpha_i} is a coefficient sequence at scale {N_i}. More generally the convolution {\alpha_S} of the {\alpha_i} for {i \in S} is a coefficient sequence at scale {\prod_{i \in S} N_i}.
  • (ii) If {N_i \gg x^{1/K+\epsilon}} for some fixed {\epsilon>0}, then {\alpha_i} is smooth.
  • (iii) If {N_i \gg x^\epsilon} for some fixed {\epsilon>0}, then {\alpha_i} obeys a Siegel-Walfisz theorem. More generally, {\alpha_S} obeys a Siegel-Walfisz theorem if {\prod_{i \in S} N_i \gg x^\epsilon} for some fixed {\epsilon>0}.
  • (iv) {x \ll N_1 \ldots N_{2j} \ll x}.

Now we can prove (27). We can write {x^{t_i} \ll N_i \ll x^{t_i}} for {i=1,\ldots,2j}, where the {t_i} are non-negative reals that sum to {1}. We apply Lemma 5 and conclude that the {t_i} obey one of the three conclusions (Type 0), (Type I/II), (Type III) of that lemma. Furthermore, in the Type III case, an inspection of Lemma 6 reveals that we have an additional lower bound {t_i,t_j,t_k \geq 2\delta} available, which in particular implies that {t_i,t_j,t_k \geq \frac{1}{K} + \epsilon} for some fixed {\epsilon>0} if {K} is large enough ({K=10} will certainly do).

In the Type 0 case, we can write {\alpha_1 \ast \ldots \ast \alpha_{2j}} in a form in which Theorem 15 applies. Similarly, in the Type I/II case we can write {\alpha_1 \ast \ldots \ast \alpha_{2j}} in a form in which Theorem 16 applies, and in the Type III case we can write {\alpha_1 \ast \ldots \ast \alpha_{2j}} in a form in which Theorem 17 applies. Thus in all cases we can establish (27), and Theorem 13 follows.