The twin prime conjecture is one of the oldest unsolved problems in analytic number theory. There are several reasons why this conjecture remains out of reach of current techniques, but the most important obstacle is the parity problem which prevents purely sieve-theoretic methods (or many other popular methods in analytic number theory, such as the circle method) from detecting pairs of prime twins in a way that can distinguish them from other twins of almost primes. The parity problem is discussed in these previous blog posts; this obstruction is ultimately powered by the Möbius pseudorandomness principle that asserts that the Möbius function {\mu} is asymptotically orthogonal to all “structured” functions (and in particular, to the weight functions constructed from sieve theory methods).

However, there is an intriguing “alternate universe” in which the Möbius function is strongly correlated with some structured functions, and specifically with some Dirichlet characters, leading to the existence of the infamous “Siegel zero“. In this scenario, the parity problem obstruction disappears, and it becomes possible, in principle, to attack problems such as the twin prime conjecture. In particular, we have the following result of Heath-Brown:

Theorem 1 At least one of the following two statements is true:
  • (Twin prime conjecture) There are infinitely many primes {p} such that {p+2} is also prime.
  • (No Siegel zeroes) There exists a constant {c>0} such that for every real Dirichlet character {\chi} of conductor {q > 1}, the associated Dirichlet {L}-function {s \mapsto L(s,\chi)} has no zeroes in the interval {[1-\frac{c}{\log q}, 1]}.

Informally, this result asserts that if one had an infinite sequence of Siegel zeroes, one could use this to generate infinitely many twin primes. See this survey of Friedlander and Iwaniec for more on this “illusory” or “ghostly” parallel universe in analytic number theory, which should not actually exist, but is surprisingly self-consistent and has, to date, proven impossible to banish from the realm of possibility.

The strategy of Heath-Brown’s proof is fairly straightforward to describe. The usual starting point is to try to lower bound

\displaystyle  \sum_{x \leq n \leq 2x} \Lambda(n) \Lambda(n+2) \ \ \ \ \ (1)

for some large value of {x}, where {\Lambda} is the von Mangoldt function. Actually, in this post we will work with the slight variant

\displaystyle  \sum_{x \leq n \leq 2x} \Lambda_2(n(n+2)) \nu(n(n+2))

where

\displaystyle  \Lambda_2(n) = (\mu * L^2)(n) = \sum_{d|n} \mu(d) \log^2 \frac{n}{d}

is the second von Mangoldt function, {*} denotes Dirichlet convolution, and {\nu} is an (unsquared) Selberg sieve weight that damps out those {n} for which {n(n+2)} has small prime factors. This sum also detects twin primes, but will lead to slightly simpler computations. For technical reasons we will also smooth out the interval {x \leq n \leq 2x} and require {n(n+2)} to have no very small prime factors, but we will skip over these steps for the purpose of this informal discussion. (In Heath-Brown’s original paper, the Selberg sieve {\nu} is essentially replaced by the more combinatorial restriction {1_{(n(n+2),q^{1/C}\#)=1}} for some large {C}, where {q^{1/C}\#} is the primorial of {q^{1/C}}, but I found the computations to be slightly easier if one works with a Selberg sieve, particularly if one does not square the sieve to make it non-negative.)
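As a quick sanity check on these definitions (an illustrative sketch of mine, not part of the argument; the helper functions are ad hoc), one can verify numerically that {\Lambda_2} vanishes on integers with three or more distinct prime factors, so that for odd {n > 1} the quantity {\Lambda_2(n(n+2))} is non-zero only when {n} and {n+2} are both prime powers, and equals {2 \log n \log (n+2)} when {n} and {n+2} are both prime:

```python
# Sketch: Lambda_2 = mu * L^2 vanishes on integers with >= 3 distinct prime factors,
# and Lambda_2(n(n+2)) = 2 log n log(n+2) for a twin prime pair n, n+2.
from math import log, isclose

def factorize(n: int) -> dict:
    """Prime factorization of n by trial division, as {prime: exponent}."""
    f, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            f[p] = f.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

def mu(n: int) -> int:
    f = factorize(n)
    return 0 if any(e > 1 for e in f.values()) else (-1) ** len(f)

def Lambda2(n: int) -> float:
    """Second von Mangoldt function (mu * L^2)(n) = sum_{d|n} mu(d) log^2(n/d)."""
    return sum(mu(d) * log(n // d) ** 2 for d in range(1, n + 1) if n % d == 0)

assert all(abs(Lambda2(n)) < 1e-9 for n in range(2, 300) if len(factorize(n)) >= 3)
n = 101  # 101 and 103 are twin primes
assert isclose(Lambda2(n * (n + 2)), 2 * log(n) * log(n + 2), rel_tol=1e-9)
print("Lambda_2 behaves as expected")
```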

If there is a Siegel zero {L(\beta,\chi)=0} with {\beta} close to {1} and {\chi} a Dirichlet character of conductor {q}, then multiplicative number theory methods can be used to show that the Möbius function {\mu} “pretends” to be like the character {\chi} in the sense that {\mu(p) \approx \chi(p)} for “most” primes {p} near {q} (e.g. in the range {q^\varepsilon \leq p \leq q^C} for some small {\varepsilon>0} and large {C>0}). Traditionally, one uses complex-analytic methods to demonstrate this, but one can also use elementary multiplicative number theory methods to establish these results (qualitatively at least), as will be shown below the fold.

The fact that {\mu} pretends to be like {\chi} can be used to construct a tractable approximation (after inserting the sieve weight {\nu}) in the range {[x,2x]} (where {x = q^C} for some large {C}) for the second von Mangoldt function {\Lambda_2}, namely the function

\displaystyle  \tilde \Lambda_2(n) := (\chi * L^2)(n) = \sum_{d|n} \chi(d) \log^2 \frac{n}{d}.

Roughly speaking, we think of the periodic function {\chi} and the slowly varying function {\log^2} as being of about the same “complexity” as the constant function {1}, so that {\tilde \Lambda_2} is roughly of the same “complexity” as the divisor function

\displaystyle  \tau(n) := (1*1)(n) = \sum_{d|n} 1,

which is considerably simpler to obtain asymptotics for than the von Mangoldt function as the Möbius function is no longer present. (For instance, note from the Dirichlet hyperbola method that one can estimate {\sum_{x \leq n \leq 2x} \tau(n)} to accuracy {O(\sqrt{x})} with little difficulty, whereas to obtain a comparable level of accuracy for {\sum_{x \leq n \leq 2x} \Lambda(n)} or {\sum_{x \leq n \leq 2x} \Lambda_2(n)} is essentially the Riemann hypothesis.)
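Here is a small numerical illustration of this point (a sketch of mine, with arbitrary sample values of {X}): the Dirichlet hyperbola method computes {\sum_{n \leq X} \tau(n)} exactly in {O(\sqrt{X})} steps, and the result matches the smooth approximation {X \log X + (2\gamma-1) X} to well within {O(\sqrt{X})}.

```python
# Sketch of the Dirichlet hyperbola method for D(X) = sum_{n <= X} tau(n):
# D(X) = 2 * sum_{d <= sqrt(X)} floor(X/d) - floor(sqrt(X))^2, computed in O(sqrt(X)) steps.
from math import log, isqrt

EULER_GAMMA = 0.5772156649015329

def divisor_sum(X: int) -> int:
    s = isqrt(X)
    return 2 * sum(X // d for d in range(1, s + 1)) - s * s

for X in (10**4, 10**5, 10**6):
    error = divisor_sum(X) - (X * log(X) + (2 * EULER_GAMMA - 1) * X)
    print(X, round(error, 1), round(X ** 0.5, 1))  # discrepancy stays well below sqrt(X)
```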

One expects {\tilde \Lambda_2(n)} to be a good approximant to {\Lambda_2(n)} if {n} is of size {O(x)} and has no prime factors less than {q^{1/C}} for some large constant {C}. The Selberg sieve {\nu} will be mostly supported on numbers with no prime factor less than {q^{1/C}}. As such, one can hope to approximate (1) by the expression

\displaystyle  \sum_{x \leq n \leq 2x} \tilde \Lambda_2(n(n+2)) \nu(n(n+2)); \ \ \ \ \ (2)

as it turns out, the error between this expression and (1) is easily controlled by sieve-theoretic techniques. Let us ignore the Selberg sieve for now and focus on the slightly simpler sum

\displaystyle  \sum_{x \leq n \leq 2x} \tilde \Lambda_2(n(n+2)).

As discussed above, this sum should be thought of as a slightly more complicated version of the sum

\displaystyle \sum_{x \leq n \leq 2x} \tau(n(n+2)). \ \ \ \ \ (3)

Accordingly, let us look (somewhat informally) at the task of estimating the model sum (3). One can think of this problem as basically that of counting solutions to the equation {ab+2=cd} with {a,b,c,d} in various ranges; this is clearly related to understanding the equidistribution of the hyperbola {\{ (a,b) \in ({\bf Z}/d{\bf Z})^2: ab + 2 = 0 \hbox{ mod } d \}} in {({\bf Z}/d{\bf Z})^2}. Taking Fourier transforms, the latter problem is closely related to estimation of the Kloosterman sums

\displaystyle  \sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{a_1 m + a_2 \overline{m}}{r} )

where {\overline{m}} denotes the inverse of {m} in {({\bf Z}/r{\bf Z})^\times}. One can then use the Weil bound

\displaystyle  \sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{am+b\overline{m}}{r} ) \ll r^{1/2 + o(1)} (a,b,r)^{1/2} \ \ \ \ \ (4)

where {(a,b,r)} is the greatest common divisor of {a,b,r} (with the convention that this is equal to {r} if {a,b} vanish), and the {o(1)} decays to zero as {r \rightarrow \infty}. The Weil bound yields good enough control on error terms to estimate (3), and as it turns out the same method also works to estimate (2) (provided that {x=q^C} with {C} large enough).
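For concreteness, here is a quick numerical check of the square-root cancellation in complete Kloosterman sums (a sketch of mine, with an arbitrary prime modulus); for prime {p} and {a,b} coprime to {p}, the Weil bound specialises to {|\sum_{m \in ({\bf Z}/p{\bf Z})^\times} e( \frac{am+b\overline{m}}{p} )| \leq 2\sqrt{p}}.

```python
# Sketch: compute Kloosterman sums S(a, b; p) for a prime p and compare with 2*sqrt(p).
from cmath import exp, pi

p = 1009  # an arbitrary prime modulus

def kloosterman(a: int, b: int) -> complex:
    return sum(exp(2j * pi * ((a * m + b * pow(m, -1, p)) % p) / p)
               for m in range(1, p))

worst = max(abs(kloosterman(a, 1)) for a in range(1, 50))
print(worst, 2 * p ** 0.5)  # the worst case over these a stays below 2*sqrt(p)
```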

Actually one does not need the full strength of the Weil bound here; any power savings over the trivial bound of {r} will do. In particular, it will suffice to use the weaker, but easier to prove, bounds of Kloosterman:

Lemma 2 (Kloosterman bound) One has

\displaystyle  \sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{am+b\overline{m}}{r} ) \ll r^{3/4 + o(1)} (a,b,r)^{1/4} \ \ \ \ \ (5)

whenever {r \geq 1} and {a,b} are integers, where the {o(1)} is with respect to the limit {r \rightarrow \infty} (and is uniform in {a,b}).

Proof: Observe from change of variables that the Kloosterman sum {\sum_{m \in ({\bf Z}/r{\bf Z})^\times} e( \frac{am+b\overline{m}}{r} )} is unchanged if one replaces {(a,b)} with {(\lambda a, \lambda^{-1} b)} for {\lambda \in ({\bf Z}/r{\bf Z})^\times}. For fixed {a,b}, the number of such pairs {(\lambda a, \lambda^{-1} b)} is at least {r^{1-o(1)} / (a,b,r)}, thanks to the divisor bound. Thus it will suffice to establish the fourth moment bound

\displaystyle  \sum_{a,b \in {\bf Z}/r{\bf Z}} |\sum_{m \in ({\bf Z}/r{\bf Z})^\times} e\left( \frac{am+b\overline{m}}{r} \right)|^4 \ll r^{4+o(1)}.

The left-hand side can be rearranged as

\displaystyle  \sum_{m_1,m_2,m_3,m_4 \in ({\bf Z}/r{\bf Z})^\times} \sum_{a,b \in {\bf Z}/r{\bf Z}}

\displaystyle  e\left( \frac{a(m_1+m_2-m_3-m_4) + b(\overline{m_1}+\overline{m_2}-\overline{m_3}-\overline{m_4})}{r} \right)

which by Fourier summation is equal to

\displaystyle  r^2 \# \{ (m_1,m_2,m_3,m_4) \in (({\bf Z}/r{\bf Z})^\times)^4:

\displaystyle  m_1+m_2-m_3-m_4 = \frac{1}{m_1} + \frac{1}{m_2} - \frac{1}{m_3} - \frac{1}{m_4} = 0 \hbox{ mod } r \}.

Observe from the quadratic formula and the divisor bound that each pair {(x,y)\in ({\bf Z}/r{\bf Z})^2} has at most {O(r^{o(1)})} solutions {(m_1,m_2)} to the system of equations {m_1+m_2=x; \frac{1}{m_1} + \frac{1}{m_2} = y}. Hence the number of quadruples {(m_1,m_2,m_3,m_4)} of the desired form is {r^{2+o(1)}}, and the claim follows. \Box
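One can also confirm the fourth moment identity appearing in the above proof numerically for a small prime modulus (again just an illustrative sketch of mine):

```python
# Sketch: for the prime r = 23, sum_{a,b mod r} |S(a,b)|^4 equals r^2 times the number
# of quadruples (m_1,m_2,m_3,m_4) of units with m_1+m_2 = m_3+m_4 and
# 1/m_1 + 1/m_2 = 1/m_3 + 1/m_4 mod r.
from cmath import exp, pi
from collections import Counter

r = 23
units = list(range(1, r))

def S(a: int, b: int) -> complex:
    return sum(exp(2j * pi * ((a * m + b * pow(m, -1, r)) % r) / r) for m in units)

fourth_moment = sum(abs(S(a, b)) ** 4 for a in range(r) for b in range(r))

# group ordered pairs (m1, m2) by (m1+m2, 1/m1+1/m2); the number of quadruples is then
# the sum of the squares of the group sizes
groups = Counter(((m1 + m2) % r, (pow(m1, -1, r) + pow(m2, -1, r)) % r)
                 for m1 in units for m2 in units)
quadruples = sum(c * c for c in groups.values())

print(round(fourth_moment), r * r * quadruples)  # the two quantities agree
```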

We will also need another easy case of the Weil bound to handle some other portions of (2):

Lemma 3 (Easy Weil bound) Let {\chi} be a primitive real Dirichlet character of conductor {q}, and let {a,b,c,d \in{\bf Z}/q{\bf Z}}. Then

\displaystyle  \sum_{n \in {\bf Z}/q{\bf Z}} \chi(an+b) \chi(cn+d) \ll q^{o(1)} (ad-bc, q).

Proof: As {q} is the conductor of a primitive real Dirichlet character, {q} is equal to {2^j} times a squarefree odd number for some {j \leq 3}. By the Chinese remainder theorem, it thus suffices to establish the claim when {q} is an odd prime {p}. We may assume that {ad-bc} is not divisible by this prime {p}, as the claim is trivial otherwise. If {a} vanishes then {c} does not vanish, and the claim follows from the mean zero nature of {\chi}; similarly if {c} vanishes. Hence we may assume that {a,c} do not vanish, and then we can normalise them to equal {1}. By completing the square it now suffices to show that

\displaystyle  \sum_{n \in {\bf Z}/p{\bf Z}} \chi( n^2 - b ) \ll 1

whenever {b \neq 0 \hbox{ mod } p}. As {\chi} is {+1} on the quadratic residues and {-1} on the non-residues, it now suffices to show that

\displaystyle  \# \{ (m,n) \in ({\bf Z}/p{\bf Z})^2: n^2 - b = m^2 \} = p + O(1).

But by making the change of variables {(x,y) = (n+m,n-m)}, the left-hand side becomes {\# \{ (x,y) \in ({\bf Z}/p{\bf Z})^2: xy=b\}}, and the claim follows. \Box
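Here is a small numerical illustration of Lemma 3 (a sketch of mine, taking {\chi} to be the Legendre symbol modulo an arbitrary prime {p}): when {ad-bc} is invertible the sum exhibits essentially complete cancellation (a classical computation gives the exact value {-\chi(ac)} when {a,c} are non-zero), whereas in the degenerate case {ad-bc=0} there is no cancellation at all.

```python
# Sketch: correlation sums of the Legendre symbol chi mod p along two linear forms.
p = 101  # an arbitrary odd prime

def chi(n: int) -> int:
    """Legendre symbol (n|p) via Euler's criterion."""
    n %= p
    if n == 0:
        return 0
    return 1 if pow(n, (p - 1) // 2, p) == 1 else -1

def correlation(a: int, b: int, c: int, d: int) -> int:
    return sum(chi(a * n + b) * chi(c * n + d) for n in range(p))

print(correlation(1, 3, 2, 7))  # ad - bc = 1: complete cancellation (equals -chi(2))
print(correlation(2, 6, 1, 3))  # ad - bc = 0: no cancellation (equals chi(2) * (p - 1))
```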

While the basic strategy of Heath-Brown’s argument is relatively straightforward, implementing it requires a large amount of computation to control both main terms and error terms. I experimented for a while with rearranging the argument to try to reduce the amount of computation; I did not fully succeed in removing all of the superfluous calculation, but I was able to at least reduce this amount a bit, mostly by replacing a combinatorial sieve with a Selberg-type sieve (which did not need to be non-negative, so I dispensed with the squaring aspect of the Selberg sieve to simplify the calculations a little further; also for minor reasons it was convenient to retain a tiny portion of the combinatorial sieve to eliminate extremely small primes). Also some modest reductions in complexity can be obtained by using the second von Mangoldt function {\Lambda_2(n(n+2))} in place of {\Lambda(n) \Lambda(n+2)}. These exercises were primarily for my own benefit, but I am placing them here in case they are of interest to some other readers.

— 1. Consequences of a Siegel zero —

It is convenient to phrase Heath-Brown’s theorem in the following equivalent form:

Theorem 4 Suppose one has a sequence {\chi_{\bf n}} of real Dirichlet characters of conductor {q_{\bf n}} going to infinity, and a sequence of real zeroes {L(\beta_{\bf n},\chi_{\bf n}) = 0} with {\beta_{\bf n} = 1 - o( \frac{1}{\log q_{\bf n}})} as {{\bf n} \rightarrow \infty}. Then there are infinitely many prime twins.

Henceforth, we omit the dependence on {{\bf n}} from all of our quantities (unless they are explicitly declared to be “fixed”), and the asymptotic notation {o(1)}, {O(1)}, {\ll}, etc. will always be understood to be with respect to the {{\bf n}} parameter, e.g. {X \ll Y} means that {X \leq CY} for some fixed {C}. (In the language of this previous blog post, we are thus implicitly using “cheap nonstandard analysis”, although we will not explicitly use nonstandard analysis notation (other than the asymptotic notation mentioned above) further in this post.) With this convention, we now have a single (but not fixed) Dirichlet character {\chi} of some conductor {q} with a Siegel zero

\displaystyle  \beta = 1 - o(\frac{1}{\log q}). \ \ \ \ \ (6)

It will also be convenient to use the crude bound

\displaystyle  1 - \beta \gg q^{-O(1)} \ \ \ \ \ (7)

which can be proven by elementary means (see e.g. Exercise 57 of this post), although one can use Siegel’s theorem to obtain the better bound {1 - \beta \gg q^{-o(1)}}. Standard arguments (see also Lemma 59 of this blog post) then give

\displaystyle  L(1,\chi) \gg q^{-O(1)} \ \ \ \ \ (8)

We now use this Siegel zero to show that {\mu} pretends to be like {\chi} for primes that are comparable (in log-scale) to {q}:

Lemma 5 For any fixed {0 < \varepsilon < C}, we have

\displaystyle  \sum_{q^\varepsilon \leq p \leq q^C: \chi(p) \neq -1} \frac{1}{p} = o(1).

For more precise estimates on the {o(1)} error, see the paper of Heath-Brown (particularly Lemma 3).

Proof: It suffices to show, for sufficiently large fixed {C>0}, that

\displaystyle  \sum_{q^{C/k} < p \leq q^{2C/k}: \chi(p) \neq -1} \frac{1}{p} = o(1)

for each fixed natural number {k}.

We begin by considering the sum

\displaystyle  \sum_{n \leq x} \frac{1*\chi(n)}{n^\beta} \ \ \ \ \ (9)

for some large {x} (which we will eventually take to be a power of {q}); we will exploit the fact that this sum is very stable for {x} comparable to {q} in log-scale. By the Dirichlet hyperbola method, we can write this as

\displaystyle  \sum_{d \leq \sqrt{x}} \frac{1}{d^\beta} \sum_{m \leq x/d} \frac{\chi(m)}{m^\beta} + \sum_{m < \sqrt{x}} \frac{\chi(m)}{m^\beta} \sum_{\sqrt{x} < d \leq x/m} \frac{1}{d^\beta}

Since {L(\beta,\chi) = 0}, one can show through summation by parts (see Lemma 71 of this previous post) that

\displaystyle  \sum_{m \leq y} \frac{\chi(m)}{m^\beta} \ll \frac{q}{y^\beta}

for any {y \geq 1}, while from the integral test (see Lemma 2 of this previous post) we have

\displaystyle  \sum_{\sqrt{x} < d \leq x/m} \frac{1}{d^\beta} = \frac{(x/m)^{1-\beta}-\sqrt{x}^{1-\beta}}{1-\beta} + O( \frac{1}{\sqrt{x}^\beta}).

We can thus estimate (9) as

\displaystyle  \sum_{d \leq \sqrt{x}} O( \frac{q}{x^\beta} ) + \frac{x^{1-\beta}}{1-\beta} \sum_{m < \sqrt{x}} \frac{\chi(m)}{m} - \frac{x^{(1-\beta)/2}}{1-\beta} \sum_{m < \sqrt{x}} \frac{\chi(m)}{m^\beta}

\displaystyle + \sum_{m < \sqrt{x}} O( \frac{1}{m^\beta \sqrt{x}^\beta} ).

From summation by parts we again have

\displaystyle  \sum_{m < \sqrt{x}} \frac{\chi(m)}{m} = L(1,\chi) + O( \frac{q}{\sqrt{x}})

and we have the crude bound

\displaystyle  \sum_{m < \sqrt{x}} \frac{1}{m^\beta} \ll \frac{x^{1-\beta}}{1-\beta}

so by using (7) and {x^{1-\beta} = x^{o(1)}} we arrive at

\displaystyle  \sum_{n \leq x} \frac{1*\chi(n)}{n^\beta} = \frac{x^{1-\beta}}{1-\beta} L(1,\chi) + O( q^{O(1)} x^{-1/2+o(1)} )

for any {x > 1}, where the {O(1)} exponent does not depend on {C}. In particular, if {q^C \leq x \leq q^{3C}} and {C} is large enough, then by (6), (7), (8) we have

\displaystyle \sum_{n \leq x} \frac{1*\chi(n)}{n^\beta} = \frac{1+o(1)}{1-\beta} L(1,\chi).

Setting {x=q^C} and {x=q^{3C}} and subtracting, we conclude that

\displaystyle  \sum_{q^C < n \leq q^{3C}} \frac{1*\chi(n)}{n^\beta} = o( \sum_{n \leq q^{C}} \frac{1*\chi(n)}{n^\beta} ). \ \ \ \ \ (10)

On the other hand, observe that {1*\chi} is always non-negative, and that {1*\chi(p_1 \dots p_k n) \geq 1*\chi(n)} whenever {n \leq q^C} and {q^{C/k} < p_1,\dots,p_k \leq q^{2C/k}}, with {p_1,\dots,p_k} primes with {\chi(p_1),\dots,\chi(p_k) \neq -1}. Since any number {N} with {q^C < N \leq q^{3C}} has at most {O(1)} representations of the form {N = p_1 \dots p_k n} with {n \leq q^C} and {q^{C/k} < p_1,\dots,p_k \leq q^{2C/k}}, and no {N} outside of the range {q^C < N \leq q^{3C}} has such a representation, we thus see that

\displaystyle  \sum_{q^C < N \leq q^{3C}} \frac{1*\chi(N)}{N^\beta} \gg (\sum_{n \leq q^C} \frac{1*\chi(n)}{n^\beta}) (\sum_{q^{C/k} < p \leq q^{2C/k}: \chi(p) \neq -1} \frac{1}{p^\beta})^k.

Comparing this with (10), we conclude that

\displaystyle  \sum_{q^{C/k} < p \leq q^{2C/k}: \chi(p) \neq -1} \frac{1}{p^\beta} = o(1);

since {1/p \leq 1/p^\beta}, the claim follows. \Box
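The proof above used two simple structural facts about {1*\chi} for a real character {\chi}: it is non-negative, and it does not decrease when the argument is multiplied by a prime {p} with {\chi(p) \neq -1}. Here is a quick numerical confirmation of both facts (a sketch of mine, using the Legendre symbol mod {7} as a stand-in for {\chi}):

```python
# Sketch: (1*chi)(n) >= 0, and (1*chi)(p*n) >= (1*chi)(n) whenever chi(p) != -1.
q = 7  # stand-in conductor; chi is the Legendre symbol mod 7

def chi(n: int) -> int:
    n %= q
    if n == 0:
        return 0
    return 1 if pow(n, (q - 1) // 2, q) == 1 else -1

def one_star_chi(n: int) -> int:
    return sum(chi(d) for d in range(1, n + 1) if n % d == 0)

assert all(one_star_chi(n) >= 0 for n in range(1, 1000))
good_primes = [p for p in (2, 7, 11, 29, 37, 43) if chi(p) != -1]
assert all(one_star_chi(p * n) >= one_star_chi(n)
           for p in good_primes for n in range(1, 200))
assert one_star_chi(3) < one_star_chi(1)  # chi(3) = -1, so monotonicity can fail here
print("1*chi is non-negative, and monotone under primes p with chi(p) != -1")
```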

— 2. Main argument —

We let {w} be a large absolute constant ({w=100} will do) and set {W := \prod_{p \leq w} p} to be the primorial of {w}. Set {x := q^C} for some large fixed {C} (large compared to {w} or {W}). Let {\psi: {\bf R} \rightarrow {\bf R}} be a smooth non-negative function supported on {[-1/2,1/2]} and equal to {1} at {0}. Set

\displaystyle  f(n) := \psi( \frac{\log n}{\log q} )

and

\displaystyle  \psi_x(n) := \psi( \log^C x (\frac{n}{x}-1) ).

Thus {f(n)} is a smooth cutoff to the region {n \leq \sqrt{q}}, and {\psi_x(n)} is a smooth cutoff to the region {n = (1+O(\log^{-C} x)) x}. It will suffice to establish the lower bound

\displaystyle  \sum_{n:(n(n+2),W)=1} \Lambda_2(n(n+2)) (\mu f * 1)(n(n+2)) \psi_x(n)

\displaystyle  \gg (1-o(1)) x \log^{-C} x,

because those {n} for which {n} and {n+2} are not both prime contribute at most {O(x^{1/2+o(1)})} to the left-hand side. The weight {(\mu f*1)(n(n+2))} is an unsquared Selberg sieve designed to damp out those {n} for which {n} or {n+2} has a somewhat small prime factor; we did not square this weight as is customary with the Selberg sieve, in order to simplify the calculations slightly (the fact that the weight can be negative sometimes will not be a serious concern for us).

We split {1*\chi} as

\displaystyle  1*\chi(n) = 1_{n=1} + g(n). \ \ \ \ \ (11)

Thus {g} is non-negative, and supported on those products {p_1 \dots p_k} of primes with {k \geq 1} and {\chi(p_1),\dots,\chi(p_k) \neq -1}, times a square. Convolving (11) by {\Lambda_2} and using the identity {\Lambda_2*1=L^2}, we have

\displaystyle  \tilde \Lambda_2 = \Lambda_2 + \Lambda_2 * g

where {\tilde \Lambda_2 := \chi * L^2}. (The quantities {\Lambda_2, g, \tilde \Lambda_2} are all non-negative, but we will not take advantage of these facts here.) It thus suffices to establish the two bounds

\displaystyle  \sum_{n:(n(n+2),W)=1} \tilde \Lambda_2(n (n+2)) (\mu f * 1)(n(n+2)) \psi_x(n) \ \ \ \ \ (12)

\displaystyle  \gg (1-o(1)) x \log^{-C} x

and

\displaystyle  \sum_{n:(n(n+2),W)=1} (\Lambda_2*g)(n (n+2)) (\mu f * 1)(n(n+2)) \psi_x(n) \ \ \ \ \ (13)

\displaystyle  = o(x \log^{-C} x);

the intuition here is that Lemma 5 is showing that {g} is “sparse” and so the contribution of {\Lambda_2 * g} should be relatively small.
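As a sanity check on the decomposition (11) and the identity {\tilde \Lambda_2 = \Lambda_2 + \Lambda_2 * g}, here is a small numerical verification (a sketch of mine, again using the Legendre symbol mod {7} as a stand-in real character):

```python
# Sketch: verify chi * L^2 = Lambda_2 + Lambda_2 * g with g = 1*chi - delta.
from math import log, isclose

q = 7

def chi(n: int) -> int:
    n %= q
    return 0 if n == 0 else (1 if pow(n, (q - 1) // 2, q) == 1 else -1)

def divisors(n: int) -> list:
    return [d for d in range(1, n + 1) if n % d == 0]

def mu(n: int) -> int:
    f, m, p = {}, n, 2
    while p * p <= m:
        while m % p == 0:
            f[p] = f.get(p, 0) + 1
            m //= p
        p += 1
    if m > 1:
        f[m] = f.get(m, 0) + 1
    return 0 if any(e > 1 for e in f.values()) else (-1) ** len(f)

def conv(F, G, n):  # Dirichlet convolution (F * G)(n)
    return sum(F(d) * G(n // d) for d in divisors(n))

Lambda2 = lambda n: conv(mu, lambda m: log(m) ** 2, n)          # mu * L^2
tilde_Lambda2 = lambda n: conv(chi, lambda m: log(m) ** 2, n)   # chi * L^2
g = lambda n: conv(lambda d: 1, chi, n) - (1 if n == 1 else 0)  # 1*chi - delta

assert all(isclose(tilde_Lambda2(n), Lambda2(n) + conv(Lambda2, g, n), abs_tol=1e-9)
           for n in range(1, 200))
print("tilde Lambda_2 = Lambda_2 + Lambda_2 * g verified up to 200")
```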

We begin with (13). Let {\varepsilon > 0} be a small fixed quantity to be chosen later. Observe that if {(\Lambda_2*g)(n(n+2))} is non-zero, then {n(n+2)} must have a factor on which {g} is non-zero, which implies that {n(n+2)} is either divisible by a prime {p} with {\chi(p) \neq -1}, or by the square of a prime. If the former case occurs, then either {n} or {n+2} is divisible by {p}; since {n,n+2 \leq 2x}, this implies that either {n(n+2)} is divisible by a prime {p} with {x^\varepsilon \leq p \leq 2x^{1-\varepsilon}}, or that {n(n+2)} is divisible by a prime less than {x^\varepsilon}. To summarise, at least one of the following three statements must hold:

  • {n(n+2)} is divisible by a prime {w < p < x^\varepsilon}.
  • {n(n+2)} is divisible by the square {p^2} of a prime {p \geq x^\varepsilon}.
  • {n(n+2)} is divisible by a prime {x^\varepsilon \leq p \leq 2x^{1-\varepsilon}} with {\chi(p) \neq -1}.

It thus suffices to establish the estimates

\displaystyle  \sum_{w < p < x^\varepsilon} \sum_{n: p|n(n+2),(n(n+2),W)=1} (\Lambda_2*g)(n (n+2)) |(\mu f * 1)(n(n+2))| \ \ \ \ \ (14)

\displaystyle \psi_x(n) \ll_C \varepsilon x \log^{-C} x,

\displaystyle  \sum_{p \geq x^\varepsilon} \sum_{n: p^2|n(n+2)} (\Lambda_2*g)(n (n+2)) |(\mu f * 1)(n(n+2))| \ \ \ \ \ (15)

\displaystyle  \psi_x(n) = o(x \log^{-C} x),

and

\displaystyle  \sum_{x^\varepsilon \leq p \leq 2x^{1-\varepsilon}: \chi(p) \neq -1} \sum_{n: p|n(n+2)} (\Lambda_2*g)(n (n+2)) |(\mu f * 1)(n(n+2))| \ \ \ \ \ (16)

\displaystyle  \psi_x(n) = o(x \log^{-C} x),

as the claim then follows by summing and sending {\varepsilon} slowly to zero.

We begin with (15). Observe that if {p^2} divides {n(n+2)} then either {p^2} divides {n} or {p^2} divides {(n+2)}. In particular the number of {n \leq 2x} with {p^2 | n(n+2)} is {O( \frac{x}{p^2} )}. The summand {(\Lambda_2*g)(n (n+2)) |(\mu f * 1)(n(n+2))| \psi_x(n)} is {O(x^{o(1)})} by the divisor bound, so the left-hand side of (15) is bounded by

\displaystyle  \ll \sum_{p \geq x^\varepsilon} \frac{x}{p^2} x^{o(1)} \ll x^{1-\varepsilon+o(1)}

and the claim follows.

Next we turn to (14). We can very crudely bound

\displaystyle  \Lambda_2*g(n(n+2)) \ll \tau(n(n+2))^{O(1)} \log^2 x, \ \ \ \ \ (17)

so it suffices to show that

\displaystyle  \sum_{w < p < x^\varepsilon} \sum_{n: p|n(n+2); (n(n+2),W)=1} \tau(n (n+2))^{O(1)} |(\mu f * 1)(n(n+2))| \psi_x(n)

\displaystyle  \ll_C \varepsilon x \log^{-C-2} x.

By Mertens’ theorem, it suffices to show that

\displaystyle  \sum_{n: p|n(n+2); (n(n+2),W)=1} \tau(n (n+2))^{O(1)} |(\mu f * 1)(n(n+2))| \psi_x(n) \ \ \ \ \ (18)

\displaystyle  \ll_C \frac{\log p}{p \log q} x \log^{-C-2} x

for all {w < p < x^\varepsilon}.

We use a modification of the argument used to prove Proposition 4.2 of this Polymath8b paper. By Fourier inversion, we may write

\displaystyle  e^u \psi(u) = \int_{\bf R} \Psi(t) e^{-itu}\ dt

for some rapidly decreasing function {\Psi}, so that

\displaystyle  f(d) = \int_{\bf R} \frac{1}{d^{(1+it)/\log q}} \Psi(t)\ dt,

and hence

\displaystyle  \mu f * 1(n) = \int_{\bf R} \sum_{d|n} \frac{\mu(d)}{d^{(1+it)/\log q}} \Psi(t)\ dt

\displaystyle  = \int_{\bf R} \prod_{p'|n} (1 - \frac{1}{(p')^{(1+it)/\log q}})\ \Psi(t)\ dt

and hence by the triangle inequality

\displaystyle  (\mu f*1)(n(n+2)) \ll_A \int_{\bf R} \prod_{p'|n(n+2)} O( \min( 1, (1+|t|) \frac{\log p'}{\log q}) )\ \frac{dt}{(1+|t|)^A}

for any fixed {A>0}. Since {\prod_{p'|n(n+2)} O(1) \ll \tau(n(n+2))^{O(1)}}, we can thus (after substituting {\sigma := 1+|t|}) bound the left-hand side of (18) by

\displaystyle  \ll_A \int_1^\infty \sum_{n: p|n(n+2); (n(n+2),W)=1} \tau(n(n+2))^{O(1)} \prod_{p'|n(n+2)} \min( 1, \sigma \frac{\log p'}{\log q}) \psi_x(n)\ \frac{d\sigma}{\sigma^A}

and so it will suffice to show the bound

\displaystyle  \sum_{n: p|n(n+2); (n(n+2),W)=1} \tau(n(n+2))^{O(1)} \prod_{p'|n(n+2)} \min( 1, \sigma \frac{\log p'}{\log q}) \psi_x(n) \ \ \ \ \ (19)

\displaystyle  \ll_C \sigma^{O_C(1)} \frac{\log p}{p \log q} x \log^{-C-2} x

for any {\sigma \geq 1} and {p \leq 2x^{1-\varepsilon}}.
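The key algebraic step in the reduction above was the elementary identity {\sum_{d|n} \mu(d) h(d) = \prod_{p'|n} (1 - h(p'))} for completely multiplicative {h}, applied with {h(d) = d^{-(1+it)/\log q}}; here is a quick numerical check of that identity (a sketch of mine, with an arbitrary sample exponent):

```python
# Sketch: sum_{d|n} mu(d) d^{-s} = prod_{p|n} (1 - p^{-s}) for a sample complex s.
from cmath import isclose

s = 0.3 + 0.7j  # an arbitrary sample exponent

def factorize(n: int) -> dict:
    f, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            f[p] = f.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

def mu(n: int) -> int:
    f = factorize(n)
    return 0 if any(e > 1 for e in f.values()) else (-1) ** len(f)

for n in range(2, 500):
    lhs = sum(mu(d) * d ** (-s) for d in range(1, n + 1) if n % d == 0)
    rhs = 1
    for p in factorize(n):
        rhs *= 1 - p ** (-s)
    assert isclose(lhs, rhs, abs_tol=1e-9)
print("Mobius/Euler product identity verified up to 500")
```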

We factor {n(n+2) = p_1 \dots p_r} where {w < p_1 \leq \dots \leq p_r} are primes, and then write {n(n+2) = dm} where {d = p_1 \dots p_i} and {i} is the largest index for which {p_1 \dots p_i < x^{1/10}}. Clearly {0 \leq i < r} and {d < x^{1/10}} with {(d,W)=1}, and the least prime factor {p(m) = p_{i+1}} of {m} is such that

\displaystyle  p_{i+1} \geq (p_1 \dots p_{i+1})^{1/(i+1)} \geq x^{\frac{1}{10(i+1)}};

we have {n(n+2) \ll x^2} on the support of {\psi_x(n)}, and so

\displaystyle  r \ll 1+i

and thus {\tau(n(n+2)) \ll \exp( O(i) )}. Clearly we have

\displaystyle  \prod_{p'|n(n+2)} \min( 1, \sigma \frac{\log p'}{\log q}) \leq \prod_{p'|[p,d]} \min( 1, \sigma \frac{\log p'}{\log q}).

We write {i = \Omega(d)}, where {\Omega(d)} denotes the number of prime factors of {d} counting multiplicity. We can thus bound the left-hand side of (19) by

\displaystyle  \ll \sum_{d < x^{1/10}; (d,W)=1} \exp( O( \Omega(d) ) ) \prod_{p'|[p,d]} \min( 1, \sigma \frac{\log p'}{\log q})

\displaystyle  \sum_{n: [p,d]|n(n+2); p(\frac{n(n+2)}{d}) \geq x^{\frac{1}{10(\Omega(d)+1)}}} \psi_x(n).

We may replace the {\psi_x(n)} weight with a restriction of {n} to the interval {[x - O(x \log^{-C} x), x + O(x \log^{-C} x)]}. The constraint {p(\frac{n(n+2)}{d}) \geq x^{\frac{1}{10(\Omega(d)+1)}}} removes two residue classes modulo every odd prime less than {x^{\frac{1}{10(\Omega(d)+1)}}}, while the constraint {[p,d]|n(n+2)} restricts {n} to {O( \exp( O(\Omega(d))))} residue classes modulo {[p,d]}. Standard sieve theory then gives

\displaystyle  \sum_{n: [p,d]|n(n+2); p(\frac{n(n+2)}{d}) \geq x^{\frac{1}{10(\Omega(d)+1)}}} \psi_x(n) \ll \exp( O(\Omega(d))) \frac{x}{[p,d]} \log^{-C-2} x

and so we are reduced to showing that

\displaystyle  \sum_{d < x^{1/10}; (d,W)=1} \frac{\exp( O( \Omega(d) ) )}{[p,d]} \prod_{p'|[p,d]} \min( 1, \sigma \frac{\log p'}{\log q}) \ll \sigma^{O(1)} \frac{\log p}{p \log q}.

Factoring {d = p_1 \dots p_i}, we can bound the left-hand side by

\displaystyle  \ll \frac{\min(1, \sigma \frac{\log p}{\log q})}{p} \prod_{w < p' < x^{1/10}: p' \neq p} (1 + \sum_{j=1}^\infty \frac{\exp(O(j))}{(p')^j} \min( 1, \sigma \frac{\log p'}{\log q}) )

which (for {w} large enough) is bounded by

\displaystyle  \ll \sigma \frac{\log p}{p \log q} \prod_{w < p' < x^{1/10}} (1 + O( \frac{1}{p'} \min( 1, \sigma \frac{\log p'}{\log q}) ) )

\displaystyle  \ll \sigma \frac{\log p}{p \log q} \exp( O( \sum_{p' < x^{1/10}} \frac{1}{p'} \min( 1, \sigma \frac{\log p'}{\log q}) ) )

which by Mertens’ theorem is bounded by

\displaystyle  \ll \sigma \frac{\log p}{p \log q} \exp( O_C( \log \sigma ) )

and the claim follows.

For future reference we observe that the above arguments also establish the bound

\displaystyle  \sum_{n: (n(n+2),W)=1} \tau(n (n+2))^{O(1)} |(\mu f * 1)(n(n+2))| \psi_x(n) \ \ \ \ \ (20)

\displaystyle  \ll_C \frac{1}{\log q} x \log^{-C-2} x

and (if one replaces {x^{1/10}} with {x^{\varepsilon/10}})

\displaystyle  \sum_{n: p|n(n+2); (n(n+2),W)=1} \tau(n (n+2))^{O(1)} |(\mu f * 1)(n(n+2))| \psi_x(n) \ \ \ \ \ (21)

\displaystyle  \ll_{C,\varepsilon} \frac{\log p}{p \log q} x \log^{-C-2} x

for all {p \leq 2x^{1-\varepsilon}}.

Finally, we turn to (16). Using (17) again, it suffices to show that

\displaystyle  \sum_{x^\varepsilon \leq p \leq 2x^{1-\varepsilon}: \chi(p) \neq -1} \sum_{n: p|n(n+2)} \tau(n(n+2))^{O(1)} |(\mu f * 1)(n(n+2))| \psi_x(n)

\displaystyle  = o(x \log^{-C-2} x).

The claim then follows from (21) and Lemma 5.

It remains to prove (12), which we write as

\displaystyle  \sum_{n: (n(n+2),W)=1} \left(\sum_{d|n(n+2)} \chi(d) \log^2 \frac{n(n+2)}{d}\right) (\mu f * 1)(n(n+2)) \psi_x(n)

\displaystyle \gg (1-o(1)) x \log^{-C} x.

On the support of {\psi_x(n)}, we can write

\displaystyle  \log^2 \frac{n(n+2)}{d} = \log^2 \frac{x^2}{d} + O( \log^{-C+O(1)} x).

The contribution of the error term can be bounded by

\displaystyle  O( \log^{-C+O(1)} x \sum_n \tau(n(n+2))^{O(1)} |(\mu f * 1)(n(n+2))| \psi_x(n) );

applying (20), this is bounded by {O( x \log^{-2C+O(1)} x)} which is acceptable for {C} large enough. Thus it suffices to show that

\displaystyle  \sum_{n: (n(n+2),W)=1} (\sum_{d|n(n+2)} \chi(d) \log^2 \frac{x^2}{d}) (\mu f * 1)(n(n+2)) \psi_x(n)

\displaystyle  \gg (1-o(1)) x \log^{-C} x

which we write as

\displaystyle  \sum_{n:(n(n+2),W)=1} \left(\sum_{d|n(n+2)} \chi(d) G( \frac{\log d}{\log x} )\right) (\mu f * 1)(n(n+2)) \psi_x(n)

\displaystyle  \gg (1-o(1)) x \log^{-C-2} x

where {G(u) := (2-u)^2}. We split {G = G_< + G_\sim + G_>} where {G_<}, {G_\sim}, {G_>} are smooth truncations of {G} to the intervals {(-\infty,0.99)}, {(0.98, 1.02)}, and {(1.01, +\infty)} respectively. It will suffice to establish the bounds

\displaystyle  \sum_{n:(n(n+2),W)=1} \left(\sum_{d|n(n+2)} \chi(d) G_<( \frac{\log d}{\log x} )\right) (\mu f * 1)(n(n+2)) \psi_x(n) \ \ \ \ \ (22)

\displaystyle  \gg (1-o(1)) x \log^{-C-2} x

\displaystyle  \sum_{n:(n(n+2),W)=1} \left(\sum_{d|n(n+2)} \chi(d) G_\sim( \frac{\log d}{\log x} )\right) (\mu f * 1)(n(n+2)) \psi_x(n) \ \ \ \ \ (23)

\displaystyle  = o(x \log^{-C-2} x)

\displaystyle  \sum_{n: (n(n+2),W)=1} \left(\sum_{d|n(n+2)} \chi(d) G_>( \frac{\log d}{\log x} )\right) (\mu f * 1)(n(n+2)) \psi_x(n) \ \ \ \ \ (24)

\displaystyle  = o(x \log^{-C-2} x)

We begin with (24), which is a relatively easy consequence of the cancellation properties of {\chi}. We may rewrite the left-hand side as

\displaystyle  \sum_e \mu(e) f(e) \sum_m \sum_{n: (n(n+2),W)=1; [e,m] | n(n+2)} \chi\left(\frac{n(n+2)}{m}\right)

\displaystyle G_>\left( \frac{\log \frac{n(n+2)}{m}}{\log x} \right) \psi_x(n).

The summand vanishes unless {m \ll x^{0.99}}, {e \leq q}, and {[e,m]/m} is coprime to {q}, so that {[e,m] \ll q x^{0.99}}. For fixed {e,m}, the constraints {(n(n+2),W)=1}, {[e,m]|n(n+2)} restricts {n} to {O_w(x^{o(1)})} residue classes of the form {a \hbox{ mod } W[e,m]}, with {[e,m]|a(a+2)}, in particular {m_1 | a} and {m_2 | a+2} for some {m_1,m_2} with {m = m_1 m_2}. Let us fix {e,m,a,m_1,m_2} and consider the sum

\displaystyle  \sum_{n: n = a \hbox{ mod } W[e,m]} \chi(\frac{n(n+2)}{m}) G_>( \frac{\log n(n+2)/m}{\log x} ) \psi_x(n).

Writing {n = k W[e,m] + a}, this becomes

\displaystyle  \sum_k \chi( \frac{W[e,m]}{m_1} k + \frac{a}{m_1} ) \chi( \frac{W[e,m]}{m_2} k + \frac{a+2}{m_2} )

\displaystyle G_>( \frac{\log (kW[e,m]+a)(kW[e,m]+a+2)/m}{\log x} ) \psi_x(kW[e,m]+a).

From Lemma 3, we have

\displaystyle  \sum_{k \in {\bf Z}/q{\bf Z}} \chi( \frac{W[e,m]}{m_1} k + \frac{a}{m_1} ) \chi( \frac{W[e,m]}{m_2} k + \frac{a+2}{m_2} ) \ll x^{o(1)} (2 \frac{W[e,m]}{m},q)

\displaystyle  \ll x^{o(1)}

since {[e,m]/m} is coprime to {q}. From summation by parts we thus have

\displaystyle  \sum_k \chi( \frac{W[e,m]}{m_1} k + \frac{a}{m_1} ) \chi( \frac{W[e,m]}{m_2} k + \frac{a+2}{m_2} )

\displaystyle G_>( \frac{\log (kW[e,m]+a)(kW[e,m]+a+2)/m}{\log x} ) \psi_x(kW[e,m]+a)

\displaystyle  \ll \sup_{I: |I| \ll x/[e,m]} |\sum_{k \in I} \chi( \frac{W[e,m]}{m_1} k + \frac{a}{m_1} ) \chi( \frac{W[e,m]}{m_2} k + \frac{a+2}{m_2} ) |

\displaystyle \ll x^{o(1)} \frac{x}{q[e,m]} + q

\displaystyle \ll q^{-1} x^{o(1)} \frac{x}{[e,m]}

(noting that {q \ll \frac{x}{q[e,m]}} if {C} is large enough) and so we can bound the left-hand side of (24) in magnitude by

\displaystyle  q^{-1} x^{1+o(1)} \sum_{e \leq q} \sum_{m \ll x^{0.99}} \frac{1}{[e,m]} \ll q^{-1} x^{1+o(1)} \sum_{d \ll q x^{0.99}} \frac{x^{o(1)}}{d}

\displaystyle  \ll q^{-1} x^{1+o(1)}

and (24) follows.

Now we prove (23), which is where we need nontrivial bounds on Kloosterman sums. Expanding out {\mu f * 1} and using the triangle inequality, it suffices (for {C} large enough) to show that

\displaystyle \sum_{n: (n(n+2),W)=1; r|n(n+2)} (\sum_{d|n(n+2)} \chi(d) G_\sim( \frac{\log d}{\log x} )) \psi_x(n) \ll q^{O(1)} x^{0.99+o(1)}

for all {r < q^{1/2}}. By Fourier expansion of the {r|n(n+2)} and {(n(n+2),W)=1} constraints (retaining only the restriction that {n} is odd), it suffices to show that

\displaystyle \sum_{n \hbox{ odd}} (\sum_{d|n(n+2)} \chi(d) G_\sim( \frac{\log d}{\log x} )) \psi_x(n) e( kn/Wr )\ll q^{O(1)} x^{0.99+o(1)}

for every {k \in {\bf Z}/Wr{\bf Z}}.

Fix {r,k}. If {d|n(n+2)} for an odd {n}, then we can uniquely factor {d = d_1 d_2} such that {d_1|n}, {d_2|n+2}, and {(d_1,d_2)=1}. It thus suffices to show that

\displaystyle  \sum_{d_1,d_2: (d_1,d_2)=1} \chi(d_1) \chi(d_2) G_\sim( \frac{\log d_1 + \log d_2}{\log x} ) \ \ \ \ \ (25)

\displaystyle  \sum_{n \hbox{ odd}: d_1|n; d_2|n+2}\psi_x(n) e( kn/Wr)

\displaystyle \ll q^{O(1)} x^{0.99+o(1)}.

Actually, we may delete the condition {(d_1,d_2)=1} since this is implied by the constraints {d_1|n, d_2|n+2} and {n} odd.

We first dispose of the case when {d_1} is large in the sense that {d_1 \geq x^{0.51}}. Making the change of variables {d_3 = n/d_1}, we may rewrite the left-hand side as

\displaystyle  \sum_{d_3,d_2} \chi(d_2) \sum_{n \hbox{ odd}: d_3|n; d_2|n+2} \chi( \frac{n}{d_3} ) G_\sim( \frac{\log n - \log d_3 + \log d_2}{\log x} )

\displaystyle  \psi_x(n) e(kn/Wr).

We can assume {d_2} is coprime to {q} and {d_2,d_3} odd with {d_3} coprime to {d_2} and {d_2,d_3 \ll x^{0.49}}, as the contribution of all other cases vanishes. The constraints that {n} is odd and {d_3|n, d_2|n+2} then restrict {n} to a single residue class modulo {2 d_2 d_3}, with {n/d_3} restricted to a single residue class modulo {2 d_2}. We split this into {Wr} residue classes modulo {2Wd_2 r} to make the {e(kn/Wr)} phase constant on each residue class. The modulus {2Wd_2 r} is not divisible by {q}, since {d_2} is coprime to {q} and {r \leq \sqrt{q}}. As such, {\chi(\frac{n}{d_3})} has mean zero on every consecutive {q} elements in each residue class modulo {2Wd_2 r} under consideration, and from summation by parts we then have

\displaystyle  \sum_{n \hbox{odd}: d_3|n; d_2|n+2} \chi( \frac{n}{d_3} ) G_\sim( \frac{\log n - \log d_3 + \log d_2}{\log x} ) \psi_x(n) e(kn/Wr)

\displaystyle  \ll q Wr

and hence the contribution of the {d_1 \geq x^{0.51}} case to (25) is

\displaystyle  \ll \sum_{d_3,d_2 \ll x^{0.49}} q Wr \ll q^{O(1)} x^{0.98}

which is acceptable.

It remains to control the contribution of the {d_1 < x^{0.51}} case to (25). By the triangle inequality, it suffices to show that

\displaystyle  \sum_{d_2} \chi(d_2) G_\sim( \frac{\log d_1 + \log d_2}{\log x} ) \sum_{n \hbox{odd}: d_1|n; d_2|n+2}\psi_x(n) e(kn/Wr)

\displaystyle \ll q^{O(1)} x^{0.99+o(1)} / d_1

for all {d_1 < x^{0.51}} coprime to {q}. We can of course restrict {d_1,d_2} to be coprime to each other and to {W}. Writing {n+2 = d_2 (2m+1)}, the constraint {d_1|n} is equivalent to

\displaystyle  m = \overline{d_2} - \overline{2} \hbox{ mod } d_1

and so we can rewrite the left-hand side as

\displaystyle  \sum_{d_2: (d_1,d_2)=1} \chi(d_2) G_\sim( \frac{\log d_1 + \log d_2}{\log x} )

\displaystyle \sum_{m = \overline{d_2} - \overline{2} \hbox{ mod } d_1} \psi_x(d_2 (2m+1)-2) e(k(d_2(2m+1))/Wr) e(-2k/Wr).

By Fourier expansion, we can write {\chi(d_2)} as a linear combination of {e( l d_2 / q)} with bounded coefficients and {(l,q)=1}, so it suffices to show that

\displaystyle  \sum_{d_2: (d_1,d_2)=1} e(ld_2/q) G_\sim( \frac{\log d_1 + \log d_2}{\log x} )

\displaystyle \sum_{m = \overline{d_2} - \overline{2} \hbox{ mod } d_1} \psi_x(d_2 (2m+1)-2) e( 2kd_2 m / Wr )

\displaystyle \ll q^{O(1)} x^{0.99+o(1)} / d_1.

Next, by Fourier expansion of the constraint {m = \overline{d_2} - \overline{2} \hbox{ mod } d_1}, we write the left-hand side as

\displaystyle  \frac{1}{d_1} \sum_{h \in {\bf Z}/d_1 {\bf Z}} \sum_{d_2: (d_1,d_2)=1} e(ld_2/q) G_\sim( \frac{\log d_1 + \log d_2}{\log x} ) e( h(\overline{d_2} - \overline{2})/d_1)

\displaystyle  \sum_m e(-hm/d_1) \psi_x(d_2 (2m+1)-2) e( 2kd_2 m / Wr ).

From Poisson summation and the smoothness of {\psi}, we see that the inner sum is {O(x^{-100})} unless

\displaystyle  \| \frac{h}{d_1} - \frac{c}{Wr} \| \ll \frac{x^{0.01}d_2}{x} \ \ \ \ \ (26)

for some integer {c}, where {\|\theta\|} denotes the distance from {\theta} to the nearest integer. The contribution of the {h} which do not satisfy this relation is easily seen to be acceptable. From the support of {G_\sim} we see in particular that there are only {O( r x^{0.03} )} remaining choices for {h}. Thus it suffices by the triangle inequality to show that

\displaystyle  \sum_{d_2: (d_1,d_2)=1} e(ld_2/q) G_\sim( \frac{\log d_1 + \log d_2}{\log x} ) e( h\overline{d_2}/d_1)

\displaystyle  \sum_m e(-hm/d_1) \psi_x(d_2 (2m+1)-2) e( 2kd_2 m / Wr )

\displaystyle \ll q^{O(1)} x^{0.95}

for each {h \in {\bf Z}/d_1{\bf Z}} satisfying (26).

We rearrange the left-hand side as

\displaystyle  \sum_m e(-hm/d_1) \sum_{d_2: (d_1,d_2)=1} e(ld_2/q) G_\sim( \frac{\log d_1 + \log d_2}{\log x} ) e( h\overline{d_2}/d_1)

\displaystyle  e( 2kd_2 m / Wr ) \psi_x(d_2 (2m+1)-2).

Suppose first that {h/d_1} is of the form {c/Wr} for some integer {c}. Then the phase {d_2 \mapsto e(ld_2/q) e( h\overline{d_2}/d_1) e( 2kd_2 m / Wr )} is periodic with period {Wqr} and has mean zero here (since {Wr<q}). From this, we can estimate the inner sum by {O( Wqr )}; since {m} is restricted to be of size {O( x / d_2 ) = O( x^{0.02} d_1 ) = O( x^{0.53} )}, this contribution is certainly acceptable. Thus we may assume that {h/d_1} is not of the form {c/Wr}. A similar argument works when {d_1 \leq x^{0.4}} (say), so we may assume that {d_1 \geq x^{0.4}}, so that {d_2 \ll x^{0.6}}.

By (26), this forces the denominator of {h/d_1} in lowest form to be {\gg \frac{x}{x^{0.01} d_2 Wr} \gg q^{-O(1)} x^{0.39}}. By Lemma 2, we thus have

\displaystyle  \sum_{d_2 \in ({\bf Z}/d_1{\bf Z})^\times} e( a d_2/d_1) e( h\overline{d_2}/d_1) \ll x^{o(1)} d_1 ( q^{-O(1)} x^{0.39} )^{-1/4}

\displaystyle  \ll q^{O(1)} x^{-0.09} d_1

for any {a}, so from Poisson summation we have

\displaystyle  \sum_{d_2: (d_1,d_2)=1} e(ld_2/q) G_\sim( \frac{\log d_1 + \log d_2}{\log x} ) e( h\overline{d_2}/d_1) e( 2kd_2 m / Wr )

\displaystyle  \psi_x(d_2 (2m+1)-2) \ll q^{O(1)} x^{-0.07} \frac{x}{d_1};

since {m} is constrained to be {O( x^{0.02} d_1 )}, the claim follows.

Finally, we prove (22), which is a routine sieve-theoretic calculation. We rewrite the left-hand side as

\displaystyle  \sum_{d,e} \chi(d) G_<(\frac{\log d}{\log x}) \mu(e) f(e) \sum_n 1_{n: (n(n+2),W)=1; [d,e] | n(n+2)} \psi_x(n).

The summand vanishes unless {d,e} are coprime to {W} with {d \ll x^{0.99}} and {e \leq \sqrt{q}}. From Poisson summation one then has

\displaystyle  \sum_n 1_{n: (n(n+2),W)=1; [d,e] | n(n+2)} \psi_x(n) = \frac{1}{2} (\prod_{2 < p \leq w} (1-\frac{2}{p})) (\int_{\bf R} \psi) \frac{x \log^{-C} x 2^{\omega([d,e])}}{[d,e]}

\displaystyle + O( x^{-100} ).

The error term is certainly negligible, so it suffices to show that

\displaystyle  (\prod_{2 < p \leq w} (1-\frac{2}{p})) \sum_{d,e: (de,W)=1} \chi(d) G_<(\frac{\log d}{\log x}) \mu(e) \psi(\frac{\log e}{\log q}) \frac{2^{\omega([d,e])}}{[d,e]}

\displaystyle  \gg (1-o(1)) \log^{-2} x.
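Before proceeding, note that the main term in the Poisson summation step above is just a local density count: among {n} in a full period, the proportion with {(n(n+2),W)=1} and {[d,e]|n(n+2)} equals {\frac{1}{2} (\prod_{2 < p \leq w} (1-\frac{2}{p})) \frac{2^{\omega([d,e])}}{[d,e]}} when {[d,e]} is odd and coprime to {W}. Here is a toy numerical check of this density (a sketch of mine, with far smaller parameters than those used in the actual argument):

```python
# Sketch: count n with (n(n+2), W) = 1 and D | n(n+2) over full periods of the modulus
# and compare with the predicted density (1/2) * prod_{2<p<=w} (1 - 2/p) * 2^omega(D) / D.
from math import gcd, prod

small_primes = [2, 3, 5, 7]  # stand-in for the primes up to w
W = prod(small_primes)       # 210
D = 11 * 13                  # squarefree and coprime to W, with omega(D) = 2
N = 33 * W * D               # an exact multiple of the joint period W * D

count = sum(1 for n in range(1, N + 1)
            if gcd(n * (n + 2), W) == 1 and n * (n + 2) % D == 0)
density = 0.5 * prod(1 - 2 / p for p in small_primes[1:]) * 2 ** 2 / D
print(count, density * N)    # exactly equal over a whole number of periods
```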

We can control the left-hand side by Fourier analysis. Writing

\displaystyle  e^u G_<(u) = \int_{\bf R} g(t) e^{-itu}\ dt

and

\displaystyle  e^u \psi(u) = \int_{\bf R} \Psi(t) e^{-itu}\ dt

for some rapidly decreasing functions {g,\Psi}, the left-hand side may be expressed as

\displaystyle  (\prod_{2 < p \leq w} (1-\frac{2}{p})) \int_{\bf R} \int_{\bf R} \sum_{d,e: (de,W)=1} \frac{\chi(d) \mu(e)2^{\omega([d,e])}}{[d,e] d^{\frac{1+it_1}{\log x}} e^{\frac{1+it_2}{\log q}}}\ g(t_1) \Psi(t_2)\ dt_1 dt_2

which factors as

\displaystyle  \int_{\bf R} \int_{\bf R} \prod_p E_p( \frac{1+it_1}{\log x}, \frac{1+it_2}{\log q} ) \ g(t_1) \Psi(t_2)\ dt_1 dt_2 \ \ \ \ \ (27)

where

\displaystyle  E_2(s_1,s_2) := 1

\displaystyle  E_p(s_1,s_2) := 1 - \frac{2}{p}

for {2 < p \leq w}, and

\displaystyle  E_p(s_1,s_2) := 1 - 2\frac{1}{p^{1+s_2}} + 2\sum_{j=1}^\infty \frac{\chi(p)^j}{p^{j(1+s_1)}} (1 - \frac{1}{p^{s_2}})

for {p>w}. From Mertens’ theorem we have the crude bound

\displaystyle  \prod_p E_p( \frac{1+it_1}{\log x}, \frac{1+it_2}{\log q} ) \ll \log^{O(1)} x

which by the rapid decrease of {g,\Psi} allows one to restrict to the range {|t_1|, |t_2| \leq \sqrt{\log q}} with an error of {o(\log^{-1} x)}. In particular, we now have {s_1,s_2 = o(1)}.

Recalling that

\displaystyle  \zeta(s) = \prod_p (1 - \frac{1}{p^s})^{-1}

for {\hbox{Re}(s)>1}, we can factor

\displaystyle  \prod_p E_p( s_1, s_2 ) = \frac{\zeta^2(1 + s_1+s_2)}{\zeta^2(1+s_1) \zeta^2(1+s_2)} \prod_p E'_p(s_1,s_2) E''_p(s_1,s_2)

where

\displaystyle  E''_p(s_1,s_2) := 1 + 1_{p>3} \times 2 (\frac{1+\chi(p)}{p^{1+s_1}} - \frac{1+\chi(p)}{p^{1+s_1+s_2}})

(the restriction {p>3} being to prevent {E''_p(s_1,s_2)} vanishing for {p=2,3} and {s_1,s_2} small) and one has

\displaystyle  E'_p(s_1,s_2) = 1 + O( \frac{1}{p^{2+o(1)}} )

for {s_1,s_2 = o(1)}, and

\displaystyle  E'_2(0,0) = 2

and

\displaystyle  E'_p(0,0) = 1

for odd {p}. In particular, from the Cauchy integral formula we see that

\displaystyle  \prod_p E'_p(s_1,s_2) = 2 + o(1)

for {s_1,s_2 = o(1)}. Since we also have {\zeta(1+s) = \frac{1+o(1)}{s}} in this region, we thus can write (27) as

\displaystyle  \int_{|t_1|, |t_2| \leq \sqrt{\log q}} (1+o(1)) \prod_p E''_p(s_1,s_2) \frac{(1+it_2)^2 (1+it_1)^2}{(1+it_2 + \frac{1+it_1}{C})^2 \log^2 x}

\displaystyle  g(t_1) \Psi(t_2) dt_1 dt_2 + o(\log^{-2} x)

and our task is now to show that

\displaystyle  \int_{|t_1|, |t_2| \leq \sqrt{\log q}} (1+o(1)) \prod_p E''_p(s_1,s_2) \frac{(1+it_2)^2 (1+it_1)^2}{(1+it_2 + \frac{1+it_1}{C})^2}

\displaystyle  g(t_1) \Psi(t_2) dt_1 dt_2 \gg 1 - o(1).

We have

\displaystyle  \log E''_p(s_1,s_2) = O(1/p)

when {s_1,s_2 = O(\frac{1}{\log p})} (even when {s_1,s_2} have negative real part); since {\log E''_p(0,0)=0}, we conclude from the Cauchy integral formula that

\displaystyle  \log E''_p(s_1,s_2) \ll (|s_1|+|s_2|) \frac{\log p}{p}

when {\log p \ll \frac{1}{|s_1|+|s_2|}}. For the remaining primes {p}, we have

\displaystyle  \log E''_p(s_1,s_2) \ll \frac{1+\chi(p)}{p^{1+1/\log x}}

when {s_1 = \frac{1+it_1}{\log x}} and {s_2 := \frac{1+it_2}{\log q}}. Summing in {p} using Lemma 5 to handle those {p} between {q} and {x}, and Mertens’ theorem and the trivial bound {1+\chi(p)=O(1)} for all other {p}, we conclude that

\displaystyle  \sum_p \log E''_p(s_1,s_2) \ll \log( 2+|s_1|+|s_2| )

and thus

\displaystyle  \prod_p E''_p(s_1,s_2) \ll (2 + |s_1| + |s_2| )^{O(1)}.

From this and the rapid decrease of {g,\Psi}, we may restrict the range of {t_1,t_2} even further to {|t_1|, |t_2| \leq \omega(q)} for any {\omega(q)} that goes to infinity arbitrarily slowly with {q}. For sufficiently slow {\omega}, the above estimates on {\log E''_p(s_1,s_2)} and Lemma 5 (now used to handle those {p} between {q^\varepsilon} and {x^{1/\varepsilon}} for some {\varepsilon} going sufficiently slowly to zero) give

\displaystyle  \sum_p \log E''_p(s_1,s_2) = o(1)

and so we are reduced to establishing that

\displaystyle  \int_{|t_1|, |t_2| \leq \omega(q)} (1+o(1)) \frac{(1+it_2)^2 (1+it_1)^2}{(1+it_2 + \frac{1+it_1}{C})^2}\ g(t_1) \Psi(t_2) dt_1 dt_2 \gg 1 - o(1).

We may once again use the rapid decrease of {g,\Psi} to remove the {o(1)} prefactor as well as the restrictions {|t_1|, |t_2| \leq \omega(q)}, and reduce to showing that

\displaystyle  \int_{\bf R} \int_{\bf R} \frac{(1+it_2)^2 (1+it_1)^2}{(1+it_2 + \frac{1+it_1}{C})^2}\ g(t_1) \Psi(t_2) dt_1 dt_2 \gg 1 - o(1).

For {C} large enough, it will suffice to show that

\displaystyle  \int_{\bf R} \int_{\bf R} (1+it_1)^2\ g(t_1) \Psi(t_2) dt_1 dt_2 \gg 1

with the implied constant independent of {C}. But since {G_<(u) = \int_{\bf R} g(t) e^{-(1+it)u}\ dt} and {\psi(u) = \int_{\bf R} \Psi(t) e^{-(1+it)u}\ dt}, the left-hand side factors as {(\int_{\bf R} (1+it_1)^2 g(t_1)\ dt_1) (\int_{\bf R} \Psi(t_2)\ dt_2)} and thus evaluates to {G''_<(0) \psi(0) = 2 \psi(0) = 2}, and the claim follows.