We continue the discussion of sieve theory from Notes 4, but now specialise to the case of the linear sieve in which the sieve dimension {\kappa} is equal to {1}, which is one of the best understood sieving situations, and one of the rare cases in which the precise limits of the sieve method are known. A bit more specifically, let {z, D \geq 1} be quantities with {z = D^{1/s}} for some fixed {s>1}, and let {g} be a multiplicative function with

\displaystyle  g(p) = \frac{1}{p} + O(\frac{1}{p^2}) \ \ \ \ \ (1)

and

\displaystyle  0 \leq g(p) \leq 1-c \ \ \ \ \ (2)

for all primes {p} and some fixed {c>0} (we allow all constants below to depend on {c}). Let {P(z) := \prod_{p<z} p}, and for each prime {p < z}, let {E_p} be a set of integers, with {E_d := \bigcap_{p|d} E_p} for {d|P(z)}. We consider finitely supported sequences {(a_n)_{n \in {\bf Z}}} of non-negative reals for which we have bounds of the form

\displaystyle  \sum_{n \in E_d} a_n = g(d) X + r_d \ \ \ \ \ (3)

for all square-free {d \leq D} and some {X>0}, and some remainder terms {r_d}. One is then interested in upper and lower bounds on the quantity

\displaystyle  \sum_{n\not \in\bigcup_{p <z} E_p} a_n.

The fundamental lemma of sieve theory (Corollary 19 of Notes 4) gives us the bound

\displaystyle  \sum_{n\not \in\bigcup_{p <z} E_p} a_n = (1 + O(e^{-s})) X V(z) + O( \sum_{d \leq D: \mu^2(d)=1} |r_d| ) \ \ \ \ \ (4)

where {V(z)} is the quantity

\displaystyle  V(z) := \prod_{p<z} (1-g(p)). \ \ \ \ \ (5)

This bound is strong when {s} is large, but is not as useful for smaller values of {s}. We now give a sharp bound in this regime. We introduce the functions {F, f: (0,+\infty) \rightarrow {\bf R}^+} by

\displaystyle  F(s) := 2e^\gamma ( \frac{1_{s>1}}{s} \ \ \ \ \ (6)

\displaystyle  + \sum_{j \geq 3, \hbox{ odd}} \frac{1}{j!} \int_{[1,+\infty)^{j-1}} 1_{t_1+\dots+t_{j-1}\leq s-1} \frac{dt_1 \dots dt_{j-1}}{t_1 \dots t_j} )

and

\displaystyle  f(s) := 2e^\gamma \sum_{j \geq 2, \hbox{ even}} \frac{1}{j!} \int_{[1,+\infty)^{j-1}} 1_{t_1+\dots+t_{j-1}\leq s-1} \frac{dt_1 \dots dt_{j-1}}{t_1 \dots t_j} \ \ \ \ \ (7)

where we adopt the convention {t_j := s - t_1 - \dots - t_{j-1}}. Note that for each {s} one has only finitely many non-zero summands in (6), (7). These functions are closely related to the Buchstab function {\omega} from Exercise 28 of Supplement 4; indeed from comparing the definitions one has

\displaystyle  F(s) + f(s) = 2 e^\gamma \omega(s)

for all {s>0}.

Exercise 1 (Alternate definition of {F, f}) Show that {F(s)} is continuously differentiable except at {s=1}, and {f(s)} is continuously differentiable except at {s=2} where it is continuous, obeying the delay-differential equations

\displaystyle  \frac{d}{ds}( s F(s) ) = f(s-1) \ \ \ \ \ (8)

for {s > 1} and

\displaystyle  \frac{d}{ds}( s f(s) ) = F(s-1) \ \ \ \ \ (9)

for {s>2}, with the initial conditions

\displaystyle  F(s) = \frac{2e^\gamma}{s} 1_{s>1}

for {s \leq 3} and

\displaystyle  f(s) = 0

for {s \leq 2}. Show that these properties of {F, f} determine {F, f} completely.

For future reference, we record the following explicit values of {F, f}:

\displaystyle  F(s) = \frac{2e^\gamma}{s} \ \ \ \ \ (10)

for {1 < s \leq 3}, and

\displaystyle  f(s) = \frac{2e^\gamma}{s} \log(s-1) \ \ \ \ \ (11)

for {2 \leq s \leq 4}.
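Though not needed in the sequel, the formula (11) can be checked numerically against the series definition (7): for {2 \leq s \leq 4} only the {j=2} summand is non-zero, and it reduces to a one-dimensional integral. The following Python sketch (with purely illustrative numerical parameters) performs this check, together with a finite-difference check of (9):

```python
import math

gamma = 0.5772156649015329  # Euler-Mascheroni constant

def f_series(s, steps=100000):
    # For 2 <= s <= 4 only the j = 2 summand of (7) is non-zero, and it reduces
    # (with t_2 = s - t_1) to 2 e^gamma * (1/2!) * \int_1^{s-1} dt/(t (s-t)).
    a, b = 1.0, s - 1.0
    h = (b - a) / steps
    return math.exp(gamma) * h * sum(
        1.0 / ((a + (i + 0.5) * h) * (s - a - (i + 0.5) * h)) for i in range(steps)
    )

def f_closed(s):
    return 2 * math.exp(gamma) / s * math.log(s - 1)  # formula (11)

max_err = max(abs(f_series(s) - f_closed(s)) for s in (2.5, 3.0, 3.5, 4.0))

# Consistency with (9): d/ds (s f(s)) should equal F(s-1) = 2 e^gamma/(s-1).
s, h = 3.0, 1e-6
deriv = ((s + h) * f_closed(s + h) - (s - h) * f_closed(s - h)) / (2 * h)
dde_err = abs(deriv - 2 * math.exp(gamma) / (s - 1))

print(max_err, dde_err)  # both very small
```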

We will show

Theorem 2 (Linear sieve) Let the notation and hypotheses be as above, with {s > 1}. Then, for any {\varepsilon > 0}, one has the upper bound

\displaystyle  \sum_{n\not \in\bigcup_{p <z} E_p} a_n \leq (F(s) + O(\varepsilon)) X V(z) + O( \sum_{d \leq D: \mu^2(d)=1} |r_d| ) \ \ \ \ \ (12)

and the lower bound

\displaystyle  \sum_{n\not \in\bigcup_{p <z} E_p} a_n \geq (f(s) - O(\varepsilon)) X V(z) + O( \sum_{d \leq D: \mu^2(d)=1} |r_d| ) \ \ \ \ \ (13)

if {D} is sufficiently large depending on {\varepsilon, s, c}. Furthermore, this claim is sharp in the sense that the quantity {F(s)} cannot be replaced by any smaller quantity, and similarly {f(s)} cannot be replaced by any larger quantity.

Comparing the linear sieve with the fundamental lemma (and also testing using the sequence {a_n = 1_{1 \leq n \leq N}} for some extremely large {N}), we conclude that we necessarily have the asymptotics

\displaystyle  1 - O(e^{-s}) \leq f(s) \leq 1 \leq F(s) \leq 1 + O( e^{-s} )

for all {s \geq 1}; this can also be proven directly from the definitions of {F, f}, or from Exercise 1, but it is somewhat challenging to do so; see e.g. Chapter 11 of Friedlander-Iwaniec for details.
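For instance, evaluating the closed forms (10), (11) at a few sample points (an illustrative computation, not part of the argument) exhibits the chain {f(s) \leq 1 \leq F(s)}:

```python
import math

gamma = 0.5772156649015329  # Euler-Mascheroni constant
F3 = 2 * math.exp(gamma) / 3                      # F(3) via (10)
f25 = 2 * math.exp(gamma) / 2.5 * math.log(1.5)   # f(2.5) via (11)
f4 = 2 * math.exp(gamma) / 4 * math.log(3)        # f(4) via (11)
print(f25, f4, F3)  # approximately 0.578, 0.978, 1.187
```

Note how {f} increases towards {1} and {F} decreases towards {1} as {s} grows, in accordance with the exponentially decaying error terms above.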

Exercise 3 Establish the integral identities

\displaystyle  F(s) = 1 + \frac{1}{s} \int_s^\infty (1 - f(t-1))\ dt

and

\displaystyle  f(s) = 1 + \frac{1}{s} \int_s^\infty (1 - F(t-1))\ dt

for {s \geq 2}. Argue heuristically that these identities are consistent with the bounds in Theorem 2 and the Buchstab identity (Equation (16) from Notes 4).

Exercise 4 Use the Selberg sieve (Theorem 30 from Notes 4) to obtain a slightly weaker version of (12) in the range {1 < s < 3} in which the error term {|r_d|} is worsened to {\tau_3(d) |r_d|}, but the main term is unchanged.

We will prove Theorem 2 below the fold. The optimality of {F, f} is closely related to the parity problem obstruction discussed in Section 5 of Notes 4; a naive application of the parity arguments there only gives the weak bounds {F(s) \geq \frac{2 e^\gamma}{s}} and {f(s)=0} for {s \leq 2}, but this can be sharpened by a more careful counting of various sums involving the Liouville function {\lambda}.

As an application of the linear sieve (specialised to the ranges in (10), (11)), we will establish a famous theorem of Chen, giving (in some sense) the closest approach to the twin prime conjecture that one can hope to achieve by sieve-theoretic methods:

Theorem 5 (Chen’s theorem) There are infinitely many primes {p} such that {p+2} is the product of at most two primes.

The same argument gives the version of Chen’s theorem for the even Goldbach conjecture, namely that for all sufficiently large even {N}, there exists a prime {p} between {2} and {N} such that {N-p} is the product of at most two primes.

The discussion in these notes loosely follows that of Friedlander-Iwaniec (who study sieving problems in more general dimension than {\kappa=1}).

— 1. Optimality —

We first establish that the quantities {F(s), f(s)} appearing in Theorem 2 cannot be improved. We use the parity argument of Selberg, based on weight sequences {a_n} related to the Liouville function.

We argue for the optimality of {F(s)}; the argument for {f(s)} is similar and is left as an exercise. Suppose that there is {s \geq 1} for which the claim in Theorem 2 is not optimal, thus there exists {\delta>0} such that

\displaystyle  \sum_{n\not \in\bigcup_{p <z} E_p} a_n \leq (F(s) - \delta) X V(z) + O( \sum_{d \leq D: \mu^2(d)=1} |r_d| ) \ \ \ \ \ (14)

for {z, D, g, E_p, a_n, X, V(z), r_d} as in that theorem, with {z} sufficiently large.

We will contradict this claim by specialising to a special case. Let {z} be a large parameter going to infinity, and set {D := z^s}. We set {g(d) := 1/d}; then by Mertens’ theorem we have {V(z) = \frac{e^{-\gamma}+o(1)}{\log z}}. We set {E_p} to be the residue class {0\ (p)}, thus (3) becomes

\displaystyle  \sum_{n: d|n} a_n = g(d) X + r_d \ \ \ \ \ (15)

and (14) becomes

\displaystyle  \sum_{n: (n,P(z)) = 1} a_n \leq \frac{F(s) - \delta + o(1)}{e^\gamma} \frac{X}{\log z} + O( \sum_{d \leq D: \mu^2(d)=1} |r_d| ) \ \ \ \ \ (16)

where {P(z) := \prod_{p<z} p}.
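As a quick numerical sanity check on this normalisation of Mertens' theorem (not part of the argument), one can compute the product directly; the cutoff {z = 10^5} below is an arbitrary illustrative choice:

```python
import math

gamma = 0.5772156649015329  # Euler-Mascheroni constant
z = 10**5  # illustrative cutoff

# Sieve of Eratosthenes, accumulating V(z) = prod_{p < z} (1 - 1/p) along the way.
is_comp = bytearray(z)
V = 1.0
for p in range(2, z):
    if not is_comp[p]:
        V *= 1.0 - 1.0 / p
        for m in range(p * p, z, p):
            is_comp[m] = 1

ratio = V * math.exp(gamma) * math.log(z)
print(ratio)  # close to 1, as Mertens' theorem predicts
```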

Now let {\varepsilon > 0} be a small fixed quantity to be chosen later, set {X := D^{1+\varepsilon}}, and let {a_n} be the sequence

\displaystyle  a_n := (1 - \lambda(n)) 1_{1 \leq n \leq X}.

This is clearly finitely supported and non-negative. For any {d}, we have

\displaystyle  \sum_{n: d|n} a_n = \sum_{n \leq X/d} 1 - \lambda(d) \sum_{n \leq X/d} \lambda(n)

from the multiplicativity of {\lambda}. If {d \leq D}, then {X/d \geq D^\varepsilon}, and then by the prime number theorem for the Liouville function (Exercise 41 from Notes 2, combined with Exercise 18 from Supplement 4) we have

\displaystyle  \sum_{n \leq X/d} \lambda(n) \ll_\varepsilon \frac{X}{d} \log^{-10} D

(say), and hence the remainder term {r_d} in (15) is of size

\displaystyle  |r_d| \ll_\varepsilon \frac{X}{d} \log^{-10} D. \ \ \ \ \ (17)

As such, the error term {O( \sum_{d \leq D: \mu^2(d)=1} |r_d| )} in (16) may be absorbed into the {o(1)} term, and so

\displaystyle  \sum_{n \leq X: (n,P(z))=1} (1-\lambda(n)) \leq \frac{F(s) - \delta + o(1)}{e^\gamma} \frac{X}{\log z}. \ \ \ \ \ (18)

Now we count the left-hand side. Observe that {1-\lambda(n)} is supported on those numbers {n = p_1 \dots p_r} that are the product of an odd number of primes {p_1 \geq \dots \geq p_r} (possibly with repetition), in which case {1-\lambda(n)=2}. To be coprime to {P(z)}, all these primes must be at least {z}; since we are restricting {n \leq X = z^{(1+\varepsilon)s}}, we thus must have {r \leq (1+\varepsilon) s}. The left-hand side of (18) may thus be written as

\displaystyle  2 \sum_{r \leq (1+\varepsilon) s, \hbox{ odd}} \sum_{p_1 \geq \dots \geq p_r \geq z: p_1 \dots p_r \leq X} 1. \ \ \ \ \ (19)

This expression may be computed using the prime number theorem:

Exercise 6 Show that the expression (19) is equal to {\frac{F( s(1+\varepsilon))+o(1)}{e^\gamma} \frac{X}{\log z}}.

Since {F} is continuous for {s>1}, we obtain a contradiction if {\varepsilon} is sufficiently small.

Exercise 7 Verify the optimality of {f(s)} in Theorem 2. (Hint: replace {1-\lambda(n)} by {1+\lambda(n)} in the above arguments.)
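The slow {o(1)} convergence in Exercise 6 can be observed numerically. The following Python experiment takes the illustrative (and fairly small) parameters {X = 10^6} and {z = 100}, so that {s = 3} and the prediction is {\frac{F(3)}{e^\gamma} \frac{X}{\log z} = \frac{2}{3} \frac{X}{\log z}}:

```python
import math

X = 10**6  # illustrative parameters: z = X^{1/3}, so s = 3
z = 100

# Smallest-prime-factor sieve up to X.
spf = list(range(X + 1))
for p in range(2, int(X**0.5) + 1):
    if spf[p] == p:
        for m in range(p * p, X + 1, p):
            if spf[m] == m:
                spf[m] = p

S = 0
for n in range(2, X + 1):
    if spf[n] < z:
        continue  # n shares a common factor with P(z)
    m, omega = n, 0
    while m > 1:  # count prime factors of n with multiplicity
        m //= spf[m]
        omega += 1
    if omega % 2 == 1:
        S += 2  # 1 - lambda(n) = 2 exactly when Omega(n) is odd

prediction = (2.0 / 3.0) * X / math.log(z)  # (F(3)/e^gamma) X / log z
print(S / prediction)  # around 1.08: the o(1) errors decay only logarithmically
```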

— 2. The linear sieve —

We now prove the forward direction of Theorem 2. Again, we focus on the upper bound (12), as the lower bound case is similar.

Fix {s>1}. Morally speaking, the most natural sieve to use here is the (upper bound) beta sieve from Notes 4, with the optimal value of {\beta}, which for the linear sieve turns out to be {\beta=2}. Recall that this sieve is defined as the sum

\displaystyle  \sum_{d \in {\mathcal D}_+} \mu(d) 1_{E_d}

where {{\mathcal D}_+} is the set of divisors {d = p_1 \dots p_m} of {P(z)} with {z > p_1 > \dots > p_m}, such that

\displaystyle  p_1 \dots p_{r-1} p_r^3 \leq D

for all odd {1 \leq r \leq m}. From Proposition 14 of Notes 4 this is indeed an upper bound sieve; indeed we have

\displaystyle  \sum_{d \in {\mathcal D}_+} \mu(d) 1_{E_d}(n) = 1_{n \not \in \bigcup_{p < z} E_p} \ \ \ \ \ (20)

\displaystyle  + \sum_{r \hbox{ odd}} \sum_{d \in {\mathcal E}_r} 1_{n \in E_d} 1_{n \not \in \bigcup_{p < p_*(d)} E_p}

where {{\mathcal E}_r} is the set of divisors {d = p_1 \dots p_r} of {P(z)} with {z > p_1 > \dots > p_r = p_*(d)}, such that

\displaystyle  p_1 \dots p_{r-1} p_r^3 > D \ \ \ \ \ (21)

and

\displaystyle  p_1 \dots p_{r'-1} p_{r'}^3 \leq D \ \ \ \ \ (22)

for all odd {1 \leq r' < r}. Now for the key heuristic point: if {n \approx D} lies in the support of {1-\lambda}, then the sum in (20) mostly vanishes. Indeed, if {n \leq D} is such that {n \in E_d} and {n \not \in \bigcup_{p < p_*(d)} E_p} for some {d = p_1 \dots p_r \in {\mathcal E}_r} and odd {r}, then one has {n = p_1 \dots p_r q} for some {q} that is not divisible by any prime less than {p_r}. On the other hand, from (21), (22) one has

\displaystyle  q \approx \frac{D}{p_1 \dots p_r} < p_r^2

and

\displaystyle  q \approx \frac{D}{p_1 \dots p_r} \geq 1

which (morally) implies from the sieve of Eratosthenes that {q} is prime, thus {\lambda(n) = (-1)^{r+1} = +1} and so {n} is not in the support of {1-\lambda}. As such, we expect the upper bound sieve { \sum_{d \in {\mathcal D}_+} \mu(d) 1_{E_d}} to be extremely efficient on the support of {1-\lambda}, which when combined with the analysis of the previous section suggests that this sieve should produce the desired upper bound (12).

One can indeed use this sieve to establish the required bound (12); see Chapter 11 of Friedlander-Iwaniec for details. However, for various technical reasons it will be convenient to modify this sieve slightly, by increasing the {\beta} parameter to be slightly greater than {2}, and also by using the fundamental lemma to perform a preliminary sifting on the small primes.

We turn to the details. To prove (12) we may of course assume that {\varepsilon>0} is suitably small. It will be convenient to worsen the error {O(\varepsilon)} in (12) a little to {O( \varepsilon \log \frac{1}{\varepsilon} )}, since one can of course remove the logarithm by reducing {\varepsilon} appropriately.

Set {D_0 := z^{\varepsilon^2}} and {z_0 := z^{\varepsilon^3}}. By the fundamental lemma of sieve theory (Lemma 17 of Notes 4), one can find combinatorial upper and lower bound sieve coefficients {(\lambda^{\pm,0}_d)_{d \leq D_0}} at sifting level {z_0} supported on divisors of {P(z_0)}, such that

\displaystyle  \sum_{d \leq D_0: d|P(z_0)} \lambda^{\pm,0}_d g(d) = V( z_0 ) ( 1 + O( e^{-1/\varepsilon} ) ). \ \ \ \ \ (23)

Thus we have {\lambda^{\pm,0}_d \in \{-1,0,1\}} and

\displaystyle  \sum_{d \leq D_0: d|P(z_0)} \lambda^{-,0}_d 1_{E_d}(n) \leq 1_{n \not \in \bigcup_{p<z_0} E_p} \leq \sum_{d \leq D_0: d|P(z_0)} \lambda^{+,0}_d 1_{E_d}(n) \ \ \ \ \ (24)

for all {n}.

We will use the upper bound sieve {\sum_{d \leq D_0:d|P(z_0)} \lambda^{+,0}_d 1_{E_d}(n)} as a preliminary sieve to remove the {E_p} for {p<z_0}; the lower bound sieve {\lambda^{-,0}_d} plays only a minor supporting role, mainly to control the difference between the upper bound sieve and the indicator {1_{n \not \in \bigcup_{p<z_0} E_p}}.

Next, we set {P(z_0,z) := P(z)/P(z_0) = \prod_{z_0 \leq p < z} p}, and let {\lambda^+_d} be the upper bound beta sieve with parameter {\beta = 2+\varepsilon} on the primes dividing {P(z_0,z)} up to level of distribution {D / D_0^2}. In other words, {{\mathcal D}_+} consists of those divisors {d = p_1 \dots p_r} of {P(z_0,z)} with {p_1 > \dots > p_r} such that

\displaystyle  p_1 \dots p_{m-1} p_m^{3+\varepsilon} \leq D / D_0^2

for all odd {1 \leq m \leq r}; in particular, {d \leq D/D_0^2} for all {d \in {\mathcal D}_+}. By Proposition 14 of Notes 4, this is indeed an upper bound sieve for the primes dividing {P(z_0,z)}:

\displaystyle  \sum_{d \in {\mathcal D}_+} \mu(d) 1_{E_d}(n) \geq 1_{n \not \in \bigcup_{z_0 \leq p < z} E_p}.

Multiplying this with the second inequality in (24) (this is the method of composition of sieves), we obtain an upper bound sieve for the primes up to {z}:

\displaystyle  \sum_{d_0 \leq D_0: d_0|P(z_0)} \sum_{d \in {\mathcal D}_+} \lambda^{+,0}_{d_0} \mu(d) 1_{E_{d_0d}}(n) \geq 1_{n \not \in \bigcup_{p < z} E_p}.

Multiplying this by {a_n} and summing in {n}, we conclude that

\displaystyle  \sum_{n \not \in \bigcup_{p<z} E_p} a_n \leq \sum_{d_0 \leq D_0: d_0|P(z_0)} \sum_{d \in {\mathcal D}_+} \lambda^{+,0}_{d_0} \mu(d) \sum_{n \in E_{d_0 d}} a_n.

Note that each product {d_0 d} appears at most once in the above sum, and all such products are squarefree and at most {D}. Applying (3), we thus have

\displaystyle  \sum_{n \not \in \bigcup_{p<z} E_p} a_n \leq \sum_{d_0 \leq D_0: d_0|P(z_0)} \sum_{d \in {\mathcal D}_+} \lambda^{+,0}_{d_0} \mu(d) g(d_0) g(d) X

\displaystyle  + \sum_{d \leq D: \mu^2(d)=1} |r_d|.

Thus to prove (12), it suffices to show that

\displaystyle  \sum_{d_0 \leq D_0: d_0|P(z_0)} \sum_{d \in {\mathcal D}_+} \lambda^{+,0}_{d_0} \mu(d) g(d_0) g(d) = (F(s) + O(\varepsilon\log \frac{1}{\varepsilon})) V(z)

for {z} sufficiently large depending on {\varepsilon}. Factoring out the {d_0} summation using (23), it thus suffices to show that

\displaystyle  \sum_{d \in {\mathcal D}_+} \mu(d) g(d) = (F(s) + O(\varepsilon\log \frac{1}{\varepsilon})) V(z) / V(z_0).

Now we eliminate the role of {g}. From (1), (5) and Mertens’ theorem we have

\displaystyle  V(z) / V(z_0) = (1 + O(\varepsilon)) \frac{\log z_0}{\log z}

for {z} large enough. Also, if {d \in {\mathcal D}_+}, then {d \leq D/D_0^2} and all prime factors of {d} are at least {z_0 = D^{\varepsilon^3/s}}. Thus {d} has at most {O_{s,\varepsilon}(1)} prime factors, each of which is at least {z_0}. From (1) we then have

\displaystyle  g(d) = \frac{1}{d} + O_{s,\varepsilon}( \frac{1}{z_0} \frac{1}{d} ).

The contribution of the error term {O_{s,\varepsilon}( \frac{1}{z_0} \frac{1}{d} )} is then easily seen to be {O( \varepsilon \frac{\log z_0}{\log z} )} for {z} large enough, and so we reduce to showing that

\displaystyle  \sum_{d \in {\mathcal D}_+} \frac{\mu(d)}{d} = (F(s) + O(\varepsilon\log \frac{1}{\varepsilon})) \frac{\log z_0}{\log z}. \ \ \ \ \ (25)

One can proceed here by evaluating the left-hand side directly. However, we will proceed instead by using the weight function {1-\lambda} from the previous section. More precisely, we will evaluate the expression

\displaystyle  \sum_{n \leq D} (1-\lambda(n)) \sum_{d_0 \leq D_0: d_0|P(z_0)} \sum_{d \in {\mathcal D}_+} \lambda^{+,0}_{d_0} \mu(d) 1_{d_0d|n} \ \ \ \ \ (26)

in two different ways, where {\lambda^{+,0}_{d_0}} is as before (but with the role of {g(d)} now replaced by the function {1/d}). Firstly, since {d_0 d \leq D/D_0}, we see from the argument used to establish (17) that

\displaystyle  \sum_{n \leq D: d_0 d|n} (1-\lambda(n)) = \frac{D}{d_0 d} + O( \frac{D}{d_0 d} \log^{-10} D )

(say). Since each {d_0 d} appears at most once, we can thus write (26) as

\displaystyle  \sum_{d_0 \leq D_0: d_0|P(z_0)} \sum_{d \in {\mathcal D}_+} \lambda^{+,0}_{d_0} \mu(d) \frac{D}{d_0 d} + O( D \log^{-9} D )

which, upon factoring out the {d_0} sum using (23) and Mertens’ theorem, becomes

\displaystyle  (1 + O(\varepsilon)) \frac{D}{e^\gamma \log z_0} \sum_{d \in {\mathcal D}_+} \frac{\mu(d)}{d} + O( D \log^{-9} D ).

Thus to verify (25), it will suffice to show that (26) is of the form

\displaystyle  (F(s) + O(\varepsilon \log \frac{1}{\varepsilon})) \frac{D}{e^\gamma \log z}

for {z} sufficiently large.

To do this, we abbreviate (26) as

\displaystyle  \sum_{n \leq D} (1-\lambda(n)) \nu^{+,0}(n) \sum_{d \in {\mathcal D}_+} \mu(d) 1_{d|n} \ \ \ \ \ (27)

where {\nu^{\pm,0}} are the sieves

\displaystyle  \nu^{\pm,0}(n) := \sum_{d_0 \leq D_0: d_0|P(z_0)} \lambda^{\pm,0}_{d_0} 1_{d_0|n}.

By Proposition 14 of Notes 4, we can expand {\sum_{d \in {\mathcal D}_+} \mu(d) 1_{d|n}} as

\displaystyle  1_{(n,P(z_0,z))=1} + \sum_{r \hbox{ odd}} \sum_{d \in {\mathcal E}_r} 1_{d|n} 1_{(n,P(z_0,p_*(d)))=1} \ \ \ \ \ (28)

where for any {r \geq 1}, {{\mathcal E}_r} is the collection of all divisors {d = p_1 \dots p_r} of {P(z_0,z)} with {p_1 > \dots > p_r = p_*(d)} such that

\displaystyle  p_1 \dots p_{r-1} p_r^{3+\varepsilon} > D/D_0^2 \ \ \ \ \ (29)

but

\displaystyle  p_1 \dots p_{r'-1} p_{r'}^{3+\varepsilon} \leq D/D_0^2 \ \ \ \ \ (30)

for all {1 \leq r' < r} with the same parity as {r}. For technical reasons, we will also impose the additional inequality

\displaystyle  p_1 \dots p_{r'-1} p_{r'}^{2+\varepsilon} \leq D/D_0^2 \ \ \ \ \ (31)

for all {1 \leq r' < r} with the opposite parity to {r}; this follows from (30) when {r' > 1}, and is an additional constraint only when {r'=1} and {r} is even; but in the above identity {r} is odd, so this additional constraint is harmless. For similar reasons we impose the inequality

\displaystyle  p_1 \dots p_r^{1+\varepsilon} \leq D/D_0^2 \ \ \ \ \ (32)

which follows from (30) or (31) except when {r=1}, but then this inequality is automatic from our hypothesis {s>1}, which implies {z^{1+\varepsilon} \leq D/D_0^2} if {\varepsilon} is chosen small enough.

Inserting the identity (28), we can write (26) as

\displaystyle  A + \sum_{r \hbox{ odd}} B_r

where

\displaystyle  A :=\sum_{n \leq D} (1-\lambda(n)) \nu^{+,0}(n) 1_{(n,P(z_0,z))=1}

and

\displaystyle  B_r := \sum_{d \in {\mathcal E}_r} \sum_{n \leq D} (1-\lambda(n)) \nu^{+,0}(n) 1_{d|n} 1_{(n,P(z_0,p_*(d)))=1}.

We first estimate {A}. By (24), we can write {A} as the sum of

\displaystyle  \sum_{n \leq D} (1-\lambda(n)) 1_{(n,P(z))=1} \ \ \ \ \ (33)

plus an error of size at most

\displaystyle  O( |\sum_{n \leq D} (\sum_{d_0 \leq D_0: d_0|P(z_0)} \lambda^{+,0}_{d_0} 1_{d_0|n} - \sum_{d_0 \leq D_0: d_0|P(z_0)} \lambda^{-,0}_{d_0} 1_{d_0|n} )| )

(where we bound {(1-\lambda(n)) 1_{(n,P(z_0,z))=1}} by {O(1)}). The error may be rearranged as

\displaystyle  O( |\sum_{d_0 \leq D_0: d_0|P(z_0)} (\lambda^{+,0}_{d_0} - \lambda^{-,0}_{d_0}) (\frac{D}{d_0}+O(1))| ),

which by (23) is of size {O( e^{-1/\varepsilon} \frac{D}{\log z_0} ) = O( \varepsilon \frac{D}{\log z} )} for {\varepsilon} small enough. As for the main term (33), we see from Exercise 6 (and the arguments preceding that exercise) that this term is equal to {\frac{F(s)+O(\varepsilon)}{e^\gamma} \frac{D}{\log z}} for {z} sufficiently large. Thus, to obtain the desired approximation for (26), it will suffice to show that

\displaystyle  \sum_{r \hbox{ odd}} B_r \ll \varepsilon \log \frac{1}{\varepsilon} \frac{D}{\log z}.

Next, we establish an exponential decay estimate on the {B_r}:

Lemma 8 For {z} sufficiently large depending on {\varepsilon}, we have

\displaystyle  B_r \ll \varepsilon^{-O(1)} \alpha^r e^{-s} \frac{D}{\log z}

for all {r \geq 1} and some absolute constant {0 < \alpha < 1}.

Proof: (Sketch) Note that if {d = p_1 \dots p_r} is in {{\mathcal E}_r}, then {d \leq D/D_0^2} and all prime factors of {d} are at least {z_0 = D^{\varepsilon^3/s}}; thus we may assume without loss of generality that {r \leq s/\varepsilon^3}.

We bound

\displaystyle  B_r \ll \sum_{d \in {\mathcal E}_r} \sum_{n \leq D/d} \nu^{+,0}(n) 1_{(n,P(z_0,p_*(d)))=1}.

Note that if {d = p_1 \dots p_r} lies in {{\mathcal E}_r}, then

\displaystyle  D_0^2 p_r^\varepsilon \leq \frac{D}{d} < D_0^2 p_r^{2+\varepsilon}

thanks to (29), (32). From this and the fundamental lemma of sieve theory we see (Exercise!) that

\displaystyle  \sum_{n \leq D/d} \nu^{+,0}(n) 1_{(n,P(z_0,p_r))=1} \ll \varepsilon^{-O(1)} \frac{D}{d \log p_*(d)}

and so it will suffice to show that

\displaystyle  \sum_{d \in {\mathcal E}_r} \frac{1}{d} \frac{\log z}{\log p_*(d)} \ll \alpha^r e^{-s}. \ \ \ \ \ (34)

By the prime number theorem, the left-hand side is bounded (Exercise!) by {f_r(s-2\varepsilon^2) + o(1)} as {z \rightarrow \infty}, where

\displaystyle  f_r(s) := \int_{J_r(s)} \frac{dt_1 \dots dt_r}{t_1 \dots t_{r-1} t_r^2}

and {J_r(s) \subset (0,+\infty)^r} is the set of points {(t_1,\dots,t_r)} with {1 \geq t_1 \geq \dots \geq t_r > 0},

\displaystyle  t_1 + \dots + t_{r-1} + (3+\varepsilon) t_r \geq s,

and such that

\displaystyle  t_1 + \dots + t_{r'-1} + (3+\varepsilon)t_{r'} \leq s

for all {1 \leq r' < r} with the same parity as {r}, and

\displaystyle  t_1 + \dots + t_{r'-1} + (2+\varepsilon)t_{r'} \leq s

for all {1 \leq r' \leq r}. It thus suffices to prove the bound

\displaystyle  f_r(s) \leq C \alpha^r e^{-s} \ \ \ \ \ (35)

for all {s>1} and some absolute constant {C>0}.

We use an argument from the book of Harman. Observe that {f_r(s)} vanishes for {s > r+2}, which makes the claim (35) routine for {r=1} (Exercise!) for {C} sufficiently large. We will now inductively prove (35) for all odd {r}. From the change of variables {u = \frac{s}{t_1}}, we obtain the identity

\displaystyle  f_r(s) = \frac{1}{s} \int_{\max(s, b_r)}^\infty f_{r-1}(t-1)\ dt \ \ \ \ \ (36)

where {b_r := 3+\varepsilon} when {r} is odd and {b_r := 2+\varepsilon} when {r} is even (Exercise!). In particular, if {r \geq 3} is odd and (35) was already proven for {r-2}, then

\displaystyle  f_r(s) = \frac{1}{s} \int_{\max(s,3+\varepsilon)}^\infty \frac{1}{t-1} \int_{t-1}^\infty f_{r-2}(u-1)\ du dt

\displaystyle  \leq \frac{1}{s} \int_{\max(s,3+\varepsilon)}^\infty \frac{1}{t-1} \int_{t-1}^\infty C \alpha^{r-2} e^{1-u}\ du dt

\displaystyle  \leq C \alpha^{r} e^{-s} \times \alpha^{-2} \times \frac{e^{s+2}}{s} \int_{\max(s,3+\varepsilon)}^\infty \frac{e^{-t}}{t-1}\ dt.

One can check (Exercise!) that the quantity {\frac{e^{s+2}}{s} \int_{\max(s,3+\varepsilon)}^\infty \frac{e^{-t}}{t-1}\ dt} is maximised at {s=3+\varepsilon}, where its value is less than {1} (numerically, it is about {0.89}) if {\varepsilon} is small enough. As such, we obtain (35) if {\alpha} is sufficiently close to {1}.

Finally, (35) for even {r} follows from the odd {r} case (with a slightly larger choice of {C}) by one final application of (36). \Box

Exercise 9 Fill in the steps marked (Exercise!) in the above proof.
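The numerical claim in the proof of Lemma 8, namely that the quantity {\frac{e^{s+2}}{s} \int_{\max(s,3+\varepsilon)}^\infty \frac{e^{-t}}{t-1}\ dt} stays below {1}, can also be confirmed by direct computation. The sketch below sets {\varepsilon = 0} and uses a simple midpoint rule; note that for {s < 3} the prefactor {e^{s+2}/s} is increasing while the integral is unchanged, so it suffices to check {s \geq 3}:

```python
import math

def h(s, upper=60.0, steps=200000):
    # Midpoint-rule evaluation of (e^{s+2}/s) \int_{max(s,3)}^{upper} e^{-t}/(t-1) dt;
    # the tail beyond t = 60 is below e^{-60} and is negligible at this precision.
    a = max(s, 3.0)
    dt = (upper - a) / steps
    total = sum(
        math.exp(-(a + (i + 0.5) * dt)) / (a + (i + 0.5) * dt - 1.0)
        for i in range(steps)
    )
    return math.exp(s + 2) / s * total * dt

vals = [h(s) for s in (3.0, 3.5, 4.0, 5.0, 7.0)]
print(max(vals))  # strictly less than 1, as claimed, with the maximum at s = 3
```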

In view of this lemma, the total contribution of {B_r} with {r > C \log \frac{1}{\varepsilon}} for some sufficiently large {C} is acceptable. Thus it suffices to show that

\displaystyle  B_r \ll \varepsilon \frac{D}{\log z}

whenever {r = O( \log\frac{1}{\varepsilon} )} is odd.

By (24), we can write {B_r} as

\displaystyle  \sum_{d \in {\mathcal E}_r} \sum_{n \leq D} (1-\lambda(n)) 1_{d|n} 1_{(n,P(p_*(d)))=1}

plus an error of size

\displaystyle  O( \sum_{d \in {\mathcal E}_r} \sum_{n \leq D} (\sum_{d_0 \leq D_0: d_0|P(z_0)} \lambda^{+,0}_{d_0} 1_{dd_0|n} - \sum_{d_0 \leq D_0: d_0|P(z_0)} \lambda^{-,0}_{d_0} 1_{dd_0|n} ) ).

Arguing as in the treatment of the {A} term, we see from (23) that the error term is bounded by

\displaystyle  \ll |\sum_{d \in {\mathcal E}_r} \sum_{d_0 \leq D_0: d_0|P(z_0)} (\lambda^{+,0}_{d_0} - \lambda^{-,0}_{d_0}) \frac{D}{dd_0}| + \sum_{d \in {\mathcal E}_r} \sum_{d_0 \leq D_0: d_0|P(z_0)} 1

\displaystyle  \ll \sum_{d \in {\mathcal E}_r} e^{-1/\varepsilon} \frac{D}{d \log z_0} + \frac{D}{D_0^2} D_0

\displaystyle  \ll e^{-1/\varepsilon} \frac{D}{\log z_0} (\sum_{z_0 \leq p \leq z} \frac{1}{p})^r + \frac{D}{D_0}

\displaystyle  \ll e^{-1/\varepsilon} \frac{D}{\log z_0} O( \log \frac{\log z}{\log z_0} )^r + \frac{D}{D_0}

\displaystyle  \ll \varepsilon \frac{D}{\log z}

as desired for {z} large enough, since {r = O(\log \frac{1}{\varepsilon})} and {\frac{\log z}{\log z_0} = O( \varepsilon^{-3} )}. Thus it suffices to show that

\displaystyle  \sum_{d \in {\mathcal E}_r} \sum_{n \leq D} (1-\lambda(n)) 1_{d|n} 1_{(n,P(p_*(d)))=1} \ll \varepsilon \frac{D}{\log z}. \ \ \ \ \ (37)

If {d = p_1 \dots p_r} and {n} appear in the above sum, then we have {n = p_1 \dots p_r q} where {q \leq \frac{D}{d}} has no prime factor less than {p_r}, has an even number of prime factors, and obeys the bounds

\displaystyle  q \leq D_0^2 p_r^{2+\varepsilon}

thanks to (29). Note that (29) also gives {p_r \geq (D/D_0^2)^{1/(r+2)}}, and thus (since {r = O( \log \frac{1}{\varepsilon} )} and {D_0 = z^{\varepsilon^2}}) we see that {q < p_r^3} if {\varepsilon} is small enough and {z} is large enough. This forces {q} to either equal {1}, or be the product of two primes between {p_r} and {D_0^2 p_r^{1+\varepsilon}}. The contribution of the {q=1} case is bounded by {|{\mathcal E}_r| \leq D/D_0^2}, which is acceptable. As for the contribution of those {q} that are the product of two primes, the prime number theorem shows that there are at most

\displaystyle  \sum_{p_r \leq p \leq D_0^2 p_r^{1+\varepsilon}} O( \frac{D}{d p \log p_r} ) = O( \varepsilon \frac{D}{d \log p_r} )

values of {q} that can contribute to the sum, and so this contribution to {B_r} is at most

\displaystyle  O( \varepsilon \frac{D}{\log z} \sum_{d \in {\mathcal E}_r} \frac{1}{d} \frac{\log z}{\log p_*(d)} ),

but by (34) the sum here is {O(1)} for {z} large enough, and the claim follows. This completes the proof of (12).

Exercise 10 Establish the lower bound (13) in Theorem 2. (Note that one can assume without loss of generality that {s>2}, which will now be needed to ensure (31) when {r'=1}.)

— 3. Chen’s theorem —

We now prove Chen’s theorem for twin primes, loosely following the treatment in Chapter 25 of Friedlander-Iwaniec. We will in fact show the slightly stronger statement that

\displaystyle  \sum_{x/2 \leq n \leq x-2} \Lambda(n) 1_{{\mathcal P}_2}(n+2) 1_{(n+2,P(z))=1} \gg \frac{x}{\log x}

for sufficiently large {x}, where {{\mathcal P}_2} is the set of all numbers that are products of at most two primes, and {z := x^{1/8}}. Indeed, after removing the (negligible) contribution of those {n} that are powers of primes, this estimate would imply that there are infinitely many primes {p} such that {p+2} is the product of at most two primes, each of which is at least {p^{1/8}}.

Chen’s argument begins with the following simple lower bound sieve for {1_{{\mathcal P}_2}}:

Lemma 11 If {x^{2/3} < n \leq x}, then

\displaystyle  1_{{\mathcal P}_2}(n) \geq 1 - \frac{1}{2} \sum_{p \leq x^{1/3}} 1_{p|n} - \frac{1}{2} \sum_{p_1 \leq x^{1/3} < p_2 \leq p_3} 1_{n=p_1p_2p_3} - \frac{1}{2} \sum_{p \leq x^{1/3}} 1_{p^2|n}.

Proof: If {n} has no prime factors less than or equal to {x^{1/3}}, then {n \in {\mathcal P}_2} and the claim follows. If {n} has two or more distinct prime factors less than or equal to {x^{1/3}}, then {1 - \frac{1}{2} \sum_{p \leq x^{1/3}} 1_{p|n} \leq 0} and the claim follows. Finally, if {n} has exactly one prime factor less than or equal to {x^{1/3}}, then (as {n > x^{2/3}}) either {n} is the product of at most two primes (so that {1_{{\mathcal P}_2}(n) = 1}), or it is of the form {n=p_1p_2p_3} for some {p_1 \leq x^{1/3} < p_2 \leq p_3}, or it is divisible by {p^2} for some {p \leq x^{1/3}}, and the claim again follows. \Box
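Lemma 11 can also be verified by brute force for a small cutoff; the following Python check (with the illustrative choice {x = 1000}, so that {x^{1/3} = 10}) confirms the inequality for every {n} in the stated range:

```python
# Brute-force check of Lemma 11 for a small cutoff (x = 1000, so x^{1/3} = 10);
# this small x is an illustrative choice, not a value used in the argument.
x = 1000
y = 10  # x^{1/3}

def prime_factors(n):
    out = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            out.append(d)
            n //= d
        d += 1
    if n > 1:
        out.append(n)
    return out  # prime factors with multiplicity, in nondecreasing order

violations = 0
for n in range(101, x + 1):  # the range x^{2/3} < n <= x
    fs = prime_factors(n)
    lhs = 1 if len(fs) <= 2 else 0  # 1_{P_2}(n)
    small = set(p for p in fs if p <= y)
    rhs = 1 - 0.5 * len(small)
    if len(fs) == 3 and fs[0] <= y < fs[1]:  # n = p_1 p_2 p_3 with p_1 <= y < p_2 <= p_3
        rhs -= 0.5
    rhs -= 0.5 * sum(1 for p in small if n % (p * p) == 0)
    if lhs < rhs:
        violations += 1

print(violations)  # 0: the sieve lower bound holds throughout the range
```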

In view of this sieve (trivially managing the contribution of {n \leq x^{2/3}} or of the case where {p^2|n}, and using the restriction of {n+2} to be coprime to {P(z)}), it suffices to show that

\displaystyle  A_1 - \frac{1}{2} \sum_{z \leq p \leq x^{1/3}} A_{2,p} - \frac{1}{2} A_3 \gg \frac{x}{\log x} \ \ \ \ \ (38)

for sufficiently large {x}, where

\displaystyle  A_1 := \sum_{x/2 \leq n \leq x-2} \Lambda(n) 1_{(n+2,P(z))=1},

\displaystyle  A_{2,p} := \sum_{x/2 \leq n \leq x-2: p|n+2} \Lambda(n) 1_{(n+2,P(z))=1},

and

\displaystyle  A_3 := \sum_{x/2 \leq n \leq x-2} \Lambda(n) \sum_{z \leq p_1 \leq x^{1/3} < p_2 \leq p_3} 1_{n+2=p_1p_2p_3}.

We thus seek sufficiently good lower bounds on {A_1} and sufficiently good upper bounds on {A_{2,p}} and {A_3}. As it turns out, the linear sieve, combined with the Bombieri-Vinogradov theorem, will give bounds on {A_1, A_{2,p}, A_3} with numerical constants that are sufficient for this purpose.

We begin with {A_1}. We use the lower bound linear sieve, with {E_p} equal to the residue class {-2\ (p)} for all {p < z}, so that {E_d} is the residue class {-2\ (d)}. We approximate

\displaystyle  \sum_{x/2 \leq n \leq x-2: n \in E_d} \Lambda(n) = g(d) \frac{x}{2} + r_d

where {g} is the multiplicative function with {g(2) :=0} and {g(p) := \frac{1}{p-1}} for {p>2}. From the Bombieri-Vinogradov theorem (Theorem 17 of Notes 3) we have

\displaystyle  \sum_{d \leq D} |r_d| \ll x \log^{-10} x \ \ \ \ \ (39)

(say) if {D := x^{1/2 - \varepsilon} = z^{4-8\varepsilon}} for some small fixed {\varepsilon > 0}. Applying the lower bound linear sieve (13), we conclude that

\displaystyle  A_1 \geq (f(4-8\varepsilon) - O(\varepsilon)) \frac{x}{2} V(z) + O( x \log^{-10} x )

where

\displaystyle  V(z) = \prod_{p < z} (1-g(p)).

We can compute an asymptotic for {V}:

Exercise 12 Show that

\displaystyle  V(z) = (2 \Pi_2 + o(1)) \frac{1}{e^\gamma \log z}

as {z \rightarrow \infty}, where {\Pi_2 = \prod_{p>2} (1-\frac{1}{(p-1)^2})} is the twin prime constant.
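The twin prime constant can be approximated by truncating the product at a finite cutoff; a short Python computation (the cutoff {10^5} below is illustrative, and the neglected tail is tiny):

```python
N = 10**5  # illustrative cutoff; the tail of the product is O(1/(N log N))
is_comp = bytearray(N + 1)
Pi2 = 1.0
for p in range(2, N + 1):
    if not is_comp[p]:
        for m in range(p * p, N + 1, p):
            is_comp[m] = 1
        if p > 2:
            Pi2 *= 1.0 - 1.0 / (p - 1) ** 2

print(Pi2)  # approximately 0.66016
```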

From (11) we have {f(4) = \frac{e^\gamma}{2} \log 3}. Sending {\varepsilon} slowly to zero, we conclude that

\displaystyle  A_1 \geq (\log 3-o(1)) \Pi_2 \frac{x}{2\log z}. \ \ \ \ \ (40)

Now we turn to {A_{2,p}}. Here we use the upper bound linear sieve. Let {E_d} be as before. For any {d} dividing {P(z)} and {z \leq p < x^{1/3}}, we have

\displaystyle  \sum_{x/2 \leq n \leq x-2: n \in E_d} \Lambda(n) 1_{p|n+2} = g(d) g(p) \frac{x}{2} + r_{pd}

where {g} and {r_d} are as previously. We apply the upper bound linear sieve (12) with level of distribution {D/p}, to conclude that

\displaystyle  A_{2,p} \leq (F( \frac{\log D/p}{\log z} ) + O(\varepsilon)) g(p) \frac{x}{2} V(z)

\displaystyle  + O( \sum_{d \leq D/p} |r_{pd}| ).

We sum over {p}. Since {pd} is at most {D}, and each number less than or equal to {D} has at most {O( \log x )} prime factors, we have

\displaystyle  \sum_{z \leq p < x^{1/3}} A_{2,p} \leq \sum_{z \leq p< x^{1/3}} (F( \frac{\log D/p}{\log z} )

\displaystyle + O(\varepsilon)) g(p) \frac{x}{2} V(z) + O( \log x \sum_{d \leq D} |r_d| ).

The error term is {O( x \log^{-9} x )} thanks to (39). Since {g(p) = \frac{1+o(1)}{p}}, we thus have

\displaystyle  \sum_{z \leq p < x^{1/3}} A_{2,p} \leq (\sum_{z \leq p< x^{1/3}} \frac{F( \frac{\log D/p}{\log z} )}{p} + O(\varepsilon)) \frac{\Pi_2 x}{e^\gamma \log z}

for sufficiently large {x}, thanks to Exercise 12. We can compute the sum using Exercise 37 of Notes 1, to obtain

\displaystyle  \sum_{z \leq p< x^{1/3}} \frac{F( \frac{\log D/p}{\log z} )}{p} = \int_1^{8/3} F( 4-8\varepsilon-t ) \frac{dt}{t} + o(1),

which by (10) and sending {\varepsilon \rightarrow 0} slowly gives

\displaystyle  \sum_{z \leq p < x^{1/3}} A_{2,p} \leq (\int_1^{8/3} \frac{1}{4-t} \frac{dt}{t} + o(1)) \frac{2 \Pi_2 x}{\log z}.

A routine computation shows that

\displaystyle  \int_1^{8/3} \frac{1}{4-t} \frac{dt}{t} = \frac{\log 6}{4}

and so

\displaystyle  \sum_{z \leq p < x^{1/3}} A_{2,p} \leq (\log 6+o(1)) \Pi_2 \frac{x}{2\log z}. \ \ \ \ \ (41)
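The routine computation uses the partial fraction {\frac{1}{t(4-t)} = \frac{1}{4}(\frac{1}{t} + \frac{1}{4-t})}, giving the antiderivative {\frac{1}{4} \log \frac{t}{4-t}}; as a sanity check, one can also verify the identity numerically:

```python
import math

# Midpoint-rule check of the identity
#   int_1^{8/3} dt / (t(4-t)) = (log 6)/4,
# which follows from 1/(t(4-t)) = (1/4)(1/t + 1/(4-t)).
n = 10**5
a, b = 1.0, 8 / 3
h = (b - a) / n
integral = sum(h / (t * (4 - t)) for t in (a + (i + 0.5) * h for i in range(n)))

print(abs(integral - math.log(6) / 4) < 1e-6)  # True
```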

Finally, we consider {A_3}, which is estimated by “switching” the sieve to sift out small divisors of {n}, rather than small divisors of {n+2}. Removing those {n} with {n \leq \sqrt{x}}, as well as those {n} that are powers of primes, and then shifting {n} by {2}, we have

\displaystyle  A_3 \leq \log x \sum_{n} 1_{(n-2,P(\sqrt{x}))=1} a_n + O_\varepsilon( x^{1/2+\varepsilon} )

where {a_n} is the finitely supported non-negative sequence

\displaystyle  a_n := 1_{x/2+2 \leq n \leq x} \sum_{z \leq p_1 \leq x^{1/3} < p_2 \leq p_3} 1_{n=p_1p_2p_3}. \ \ \ \ \ (42)

Here we are sifting out the residue classes {E'_p := 2\ (p)}, so that {E'_d := 2\ (d)}.

The sequence {a_n} has good distribution up to level {D}:

Proposition 13 One has

\displaystyle  \sum_{n \in E'_d} a_n = g(d) \sum_n a_n + r_d

where {g(d)} is as before, and

\displaystyle  \sum_{d \leq D: \mu^2(d)=1} |r_d| \ll x \log^{-10} x

(say), with {D} as before.

Proof: Observe that the quantity {p_3} in (42) is bounded above by {x^{2/3}} if the summand is to be non-zero. We now use a finer-than-dyadic decomposition trick similar to that used in the proof of the Bombieri-Vinogradov theorem in Notes 3 to approximate {a_n} as a combination of Dirichlet convolutions. Namely, we set {\lambda := 1 + \log^{-20} x}, and partition {[x^{1/3},x^{2/3}]} (plus possibly a little portion to the right of {x^{2/3}}) into {A=O( \log^{21} x )} consecutive intervals {I_1,\dots,I_A}, each of the form {[N, \lambda N]} for some {x^{1/3} \leq N \leq x^{2/3}}. We similarly split {[z,x^{1/3}]} (plus possibly a tiny portion of {[z/2,z]}) into {B=O( \log^{21} x)} intervals {J_1,\dots, J_B}, each of the form {[M, \lambda M]} for some {z/2 \leq M \leq x^{1/3}}. We can thus split {a_n} as

\displaystyle  a_n = \sum_{1 \leq b \leq B} \sum_{1 \leq a \leq a' \leq A} \sum_{z \leq p_1 \leq x^{1/3} < p_2 \leq p_3: p_1 \in J_b, p_2 \in I_a, p_3 \in I_{a'}} 1_{n=p_1p_2p_3}1_{x/2+2 \leq n \leq x} .
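To get a sense of the number of intervals required, one can compute the count {A = \log(x^{1/3}) / \log \lambda} for an illustrative (hypothetical) value of {x}:

```python
import math

# Order-of-magnitude check: how many intervals [N, lam*N], with
# lam = 1 + log^(-20) x, cover [x^(1/3), x^(2/3)]?  The multiplicative
# length of that range is x^(1/3), so A = log(x^(1/3)) / log(lam).
x = 1e8                          # hypothetical illustrative size
L = math.log(x)
delta = L ** -20                 # lam - 1; far below double-precision eps,
A = (L / 3) / math.log1p(delta)  # so use log1p rather than log(1 + delta)

print(A / L ** 21)               # ~ 1/3, confirming A = O(log^21 x)
```

Note the use of `log1p`: for realistic {x} the quantity {\lambda - 1 = \log^{-20} x} is smaller than machine epsilon, so `math.log(1 + delta)` would evaluate to exactly zero.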

Observe that for each {a,a'} there are only {O(1)} choices of {b} for which the summand can be non-zero. As such, the contribution of the diagonal case {a=a'} is easily seen to be absorbed into the {r_d} error, as are those cases where the product set {\{ uvw: u \in J_b, v \in I_a, w \in I_{a'} \}} is not contained completely in {[x/2+2,x]}. If we let {\Omega} denote the set of triplets {(b,a,a')} exhibiting one of these two degeneracies, we can thus approximate {a_n} by {\sum_{(b,a,a') \not \in \Omega} a_n^{(b,a,a')}}, where {a_n^{(b,a,a')}} is the Dirichlet convolution

\displaystyle  a_n^{(b,a,a')} := 1_{{\mathcal P} \cap J_b} * 1_{{\mathcal P} \cap I_a} * 1_{{\mathcal P} \cap I_{a'}}.

From the general Bombieri-Vinogradov theorem (Theorem 16 of Notes 3) and the Siegel-Walfisz theorem (Exercise 64 of Notes 2) we see that

\displaystyle  \sum_{n \in E'_d} a_n^{(b,a,a')} = g(d) X^{(b,a,a')} + r_d^{(b,a,a')}

where

\displaystyle  \sum_{d \leq D: \mu^2(d)=1} |r_d^{(b,a,a')}| \ll x \log^{-10} x

(say) and

\displaystyle  X^{(b,a,a')} = \sum_n a_n^{(b,a,a')}.

This gives the claim with {\sum_n a_n} replaced by the quantity {\sum_{(b,a,a') \not \in \Omega} X^{(b,a,a')}}; but by undoing the previous decomposition we see that this quantity is equal to {\sum_n a_n} up to an error of {O( x \log^{-11} x)} (say), and the claim follows.

Applying the upper bound sieve (12) (with sifting level {D^{1/(1+\varepsilon)}}), we thus have

\displaystyle  \sum_{n} 1_{(n-2,P(\sqrt{x}))=1} a_n \leq (F(1+\varepsilon)+O(\varepsilon)) V( D^{1/(1+\varepsilon)} ) \sum_n a_n + O( x \log^{-10} x )

and hence by (10) and Exercise 12

\displaystyle  \sum_{n} 1_{(n-2,P(\sqrt{x}))=1} a_n \leq (4+O(\varepsilon)) \frac{2\Pi_2}{\log x} \sum_n a_n + O( x \log^{-10} x )

for {x} sufficiently large.
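To track the constant here: for {1 < s \leq 3} only the first term in (6) survives, so that {F(1+\varepsilon) = \frac{2e^\gamma}{1+\varepsilon}}, while Exercise 12 gives {V( D^{1/(1+\varepsilon)} ) = (2\Pi_2+o(1)) \frac{1+\varepsilon}{e^\gamma \log D}}; since {\log D = (\frac{1}{2}-\varepsilon) \log x}, we conclude

\displaystyle  F(1+\varepsilon) V( D^{1/(1+\varepsilon)} ) = \frac{(4+o(1)) \Pi_2}{(\frac{1}{2}-\varepsilon) \log x} = (4+O(\varepsilon)) \frac{2\Pi_2}{\log x}.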

Note that

\displaystyle  \sum_n a_n = \sum_{z \leq p_1 \leq x^{1/3} < p_2 < \frac{x}{p_1 p_2}} \sum_{\max( p_2, \frac{x/2+2}{p_1 p_2} ) < p_3 < \frac{x}{p_1 p_2}} 1.

From the prime number theorem and Exercise 37 of Notes 1, we thus have

\displaystyle  \sum_n a_n \leq (1+o(1)) \frac{x}{2\log x} \int_{1/8 \leq t_1 \leq 1/3 < t_2 < 1-t_1-t_2} \frac{dt_1 dt_2}{t_1 t_2 (1-t_1-t_2)}.

(In fact one also has a matching lower bound, but we will not need it here.) We thus conclude that

\displaystyle  A_3 \leq (C+o(1)) \Pi_2 \frac{x}{2\log z}

where

\displaystyle  C := \int_{1/8 \leq t_1 \leq 1/3 < t_2 < 1-t_1-t_2} \frac{dt_1 dt_2}{t_1 t_2 (1-t_1-t_2)}

\displaystyle  = 0.363 \dots

The left-hand side of (38) is then at least

\displaystyle  (2 \log 3 - \log 6 - C) \Pi_2 \frac{x}{4 \log z}.

One can calculate that {2 \log 3 - \log 6 - C = 0.04\ldots > 0}, and the claim follows.
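The numerical evaluation of {C} and the final positivity check can be reproduced with a short midpoint-rule computation; a sketch, with the grid size chosen arbitrarily:

```python
import math

# Midpoint-rule evaluation of
#   C = int int_{1/8 <= t1 <= 1/3 < t2 < 1-t1-t2} dt1 dt2 / (t1 t2 (1-t1-t2)),
# where the constraint t2 < 1-t1-t2 means t2 < (1-t1)/2.
def chen_constant(n=800):
    a, b = 1 / 8, 1 / 3
    h1 = (b - a) / n
    total = 0.0
    for i in range(n):
        t1 = a + (i + 0.5) * h1
        lo, hi = 1 / 3, (1 - t1) / 2   # range of t2 for this slice
        if hi <= lo:
            continue
        h2 = (hi - lo) / n
        for j in range(n):
            t2 = lo + (j + 0.5) * h2
            total += h1 * h2 / (t1 * t2 * (1 - t1 - t2))
    return total

C = chen_constant()
print(round(C, 3))                              # ~ 0.363
print(2 * math.log(3) - math.log(6) - C > 0)    # True: margin is about 0.04
```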

Exercise 14 Establish Chen’s theorem for the even Goldbach conjecture.

Remark 15 If one is willing to use stronger distributional claims on the primes than is provided by the Bombieri-Vinogradov theorem, then one can use a simpler sieve than Chen's sieve to establish Chen's theorem; but the required distributional theorem will then either be conjectural or more difficult to establish than the Bombieri-Vinogradov theorem. See Chapter 25 of Friedlander-Iwaniec for further discussion.