We continue the discussion of sieve theory from Notes 4, but now specialise to the case of the linear sieve in which the sieve dimension {\kappa} is equal to {1}, which is one of the best understood sieving situations, and one of the rare cases in which the precise limits of the sieve method are known. A bit more specifically, let {z, D \geq 1} be quantities with {z = D^{1/s}} for some fixed {s>1}, and let {g} be a multiplicative function with

\displaystyle  g(p) = \frac{1}{p} + O(\frac{1}{p^2}) \ \ \ \ \ (1)

and

\displaystyle  0 \leq g(p) \leq 1-c \ \ \ \ \ (2)

for all primes {p} and some fixed {c>0} (we allow all constants below to depend on {c}). Let {P(z) := \prod_{p<z} p}, and for each prime {p < z}, let {E_p} be a set of integers, with {E_d := \bigcap_{p|d} E_p} for {d|P(z)}. We consider finitely supported sequences {(a_n)_{n \in {\bf Z}}} of non-negative reals for which we have bounds of the form

\displaystyle  \sum_{n \in E_d} a_n = g(d) X + r_d \ \ \ \ \ (3)

for all square-free {d \leq D} and some {X>0}, and some remainder terms {r_d}. One is then interested in upper and lower bounds on the quantity

\displaystyle  \sum_{n\not \in\bigcup_{p <z} E_p} a_n.

The fundamental lemma of sieve theory (Corollary 19 of Notes 4) gives us the bound

\displaystyle  \sum_{n\not \in\bigcup_{p <z} E_p} a_n = (1 + O(e^{-s})) X V(z) + O( \sum_{d \leq D: \mu^2(d)=1} |r_d| ) \ \ \ \ \ (4)

where {V(z)} is the quantity

\displaystyle  V(z) := \prod_{p<z} (1-g(p)). \ \ \ \ \ (5)

This bound is strong when {s} is large, but is not as useful for smaller values of {s}. We now give a sharp bound in this regime. We introduce the functions {F, f: (0,+\infty) \rightarrow {\bf R}^+} by

\displaystyle  F(s) := 2e^\gamma ( \frac{1_{s>1}}{s} \ \ \ \ \ (6)

\displaystyle  + \sum_{j \geq 3, \hbox{ odd}} \frac{1}{j!} \int_{[1,+\infty)^{j-1}} 1_{t_1+\dots+t_{j-1}\leq s-1} \frac{dt_1 \dots dt_{j-1}}{t_1 \dots t_j} )

and

\displaystyle  f(s) := 2e^\gamma \sum_{j \geq 2, \hbox{ even}} \frac{1}{j!} \int_{[1,+\infty)^{j-1}} 1_{t_1+\dots+t_{j-1}\leq s-1} \frac{dt_1 \dots dt_{j-1}}{t_1 \dots t_j} \ \ \ \ \ (7)

where we adopt the convention {t_j := s - t_1 - \dots - t_{j-1}}. Note that for each {s} one has only finitely many non-zero summands in (6), (7). These functions are closely related to the Buchstab function {\omega} from Exercise 28 of Supplement 4; indeed from comparing the definitions one has

\displaystyle  F(s) + f(s) = 2 e^\gamma \omega(s)

for all {s>0}.

Exercise 1 (Alternate definition of {F, f}) Show that {F(s)} is continuously differentiable except at {s=1}, and {f(s)} is continuously differentiable except at {s=2} where it is continuous, obeying the delay-differential equations

\displaystyle  \frac{d}{ds}( s F(s) ) = f(s-1) \ \ \ \ \ (8)

for {s > 1} and

\displaystyle  \frac{d}{ds}( s f(s) ) = F(s-1) \ \ \ \ \ (9)

for {s>2}, with the initial conditions

\displaystyle  F(s) = \frac{2e^\gamma}{s} 1_{s>1}

for {s \leq 3} and

\displaystyle  f(s) = 0

for {s \leq 2}. Show that these properties of {F, f} determine {F, f} completely.

For future reference, we record the following explicit values of {F, f}:

\displaystyle  F(s) = \frac{2e^\gamma}{s} \ \ \ \ \ (10)

for {1 < s \leq 3}, and

\displaystyle  f(s) = \frac{2e^\gamma}{s} \log(s-1) \ \ \ \ \ (11)

for {2 \leq s \leq 4}.
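Though not needed in the sequel, the formula (11) can be checked numerically against the series definition (7): for {2 \leq s \leq 4} only the {j=2} summand is non-zero, and it reduces to a one-dimensional integral. The following Python sketch (with purely illustrative numerical parameters) performs this check, together with a finite-difference check of (9):

```python
import math

gamma = 0.5772156649015329  # Euler-Mascheroni constant

def f_series(s, steps=100000):
    # For 2 <= s <= 4 only the j = 2 summand of (7) is non-zero, and it reduces
    # (with t_2 = s - t_1) to 2 e^gamma * (1/2!) * \int_1^{s-1} dt/(t (s-t)).
    a, b = 1.0, s - 1.0
    h = (b - a) / steps
    return math.exp(gamma) * h * sum(
        1.0 / ((a + (i + 0.5) * h) * (s - a - (i + 0.5) * h)) for i in range(steps)
    )

def f_closed(s):
    return 2 * math.exp(gamma) / s * math.log(s - 1)  # formula (11)

max_err = max(abs(f_series(s) - f_closed(s)) for s in (2.5, 3.0, 3.5, 4.0))

# Consistency with (9): d/ds (s f(s)) should equal F(s-1) = 2 e^gamma/(s-1).
s, h = 3.0, 1e-6
deriv = ((s + h) * f_closed(s + h) - (s - h) * f_closed(s - h)) / (2 * h)
dde_err = abs(deriv - 2 * math.exp(gamma) / (s - 1))

print(max_err, dde_err)  # both very small
```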

We will show

Theorem 2 (Linear sieve) Let the notation and hypotheses be as above, with {s > 1}. Then, for any {\varepsilon > 0}, one has the upper bound

\displaystyle  \sum_{n\not \in\bigcup_{p <z} E_p} a_n \leq (F(s) + O(\varepsilon)) X V(z) + O( \sum_{d \leq D: \mu^2(d)=1} |r_d| ) \ \ \ \ \ (12)

and the lower bound

\displaystyle  \sum_{n\not \in\bigcup_{p <z} E_p} a_n \geq (f(s) - O(\varepsilon)) X V(z) + O( \sum_{d \leq D: \mu^2(d)=1} |r_d| ) \ \ \ \ \ (13)

if {D} is sufficiently large depending on {\varepsilon, s, c}. Furthermore, this claim is sharp in the sense that the quantity {F(s)} cannot be replaced by any smaller quantity, and similarly {f(s)} cannot be replaced by any larger quantity.

Comparing the linear sieve with the fundamental lemma (and also testing using the sequence {a_n = 1_{1 \leq n \leq N}} for some extremely large {N}), we conclude that we necessarily have the asymptotics

\displaystyle  1 - O(e^{-s}) \leq f(s) \leq 1 \leq F(s) \leq 1 + O( e^{-s} )

for all {s \geq 1}; this can also be proven directly from the definitions of {F, f}, or from Exercise 1, but it is somewhat challenging to do so; see e.g. Chapter 11 of Friedlander-Iwaniec for details.
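For instance, evaluating the closed forms (10), (11) at a few sample points (an illustrative computation, not part of the argument) exhibits the chain {f(s) \leq 1 \leq F(s)}:

```python
import math

gamma = 0.5772156649015329  # Euler-Mascheroni constant
F3 = 2 * math.exp(gamma) / 3                      # F(3) via (10)
f25 = 2 * math.exp(gamma) / 2.5 * math.log(1.5)   # f(2.5) via (11)
f4 = 2 * math.exp(gamma) / 4 * math.log(3)        # f(4) via (11)
print(f25, f4, F3)  # approximately 0.578, 0.978, 1.187
```

Note how {f} increases towards {1} and {F} decreases towards {1} as {s} grows, in accordance with the exponentially decaying error terms above.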

Exercise 3 Establish the integral identities

\displaystyle  F(s) = 1 + \frac{1}{s} \int_s^\infty (1 - f(t-1))\ dt

and

\displaystyle  f(s) = 1 + \frac{1}{s} \int_s^\infty (1 - F(t-1))\ dt

for {s \geq 2}. Argue heuristically that these identities are consistent with the bounds in Theorem 2 and the Buchstab identity (Equation (16) from Notes 4).

Exercise 4 Use the Selberg sieve (Theorem 30 from Notes 4) to obtain a slightly weaker version of (12) in the range {1 < s < 3} in which the error term {|r_d|} is worsened to {\tau_3(d) |r_d|}, but the main term is unchanged.

We will prove Theorem 2 below the fold. The optimality of {F, f} is closely related to the parity problem obstruction discussed in Section 5 of Notes 4; a naive application of the parity arguments there only gives the weak bounds {F(s) \geq \frac{2 e^\gamma}{s}} and {f(s)=0} for {s \leq 2}, but this can be sharpened by a more careful counting of various sums involving the Liouville function {\lambda}.

As an application of the linear sieve (specialised to the ranges in (10), (11)), we will establish a famous theorem of Chen, giving (in some sense) the closest approach to the twin prime conjecture that one can hope to achieve by sieve-theoretic methods:

Theorem 5 (Chen’s theorem) There are infinitely many primes {p} such that {p+2} is the product of at most two primes.

The same argument gives the version of Chen’s theorem for the even Goldbach conjecture, namely that for all sufficiently large even {N}, there exists a prime {p} between {2} and {N} such that {N-p} is the product of at most two primes.

The discussion in these notes loosely follows that of Friedlander-Iwaniec (who study sieving problems in more general dimension than {\kappa=1}).

— 1. Optimality —

We first establish that the quantities {F(s), f(s)} appearing in Theorem 2 cannot be improved. We use the parity argument of Selberg, based on weight sequences {a_n} related to the Liouville function.

We argue for the optimality of {F(s)}; the argument for {f(s)} is similar and is left as an exercise. Suppose that there is {s \geq 1} for which the claim in Theorem 2 is not optimal, thus there exists {\delta>0} such that

\displaystyle  \sum_{n\not \in\bigcup_{p <z} E_p} a_n \leq (F(s) - \delta) X V(z) + O( \sum_{d \leq D: \mu^2(d)=1} |r_d| ) \ \ \ \ \ (14)

for {z, D, g, E_p, a_n, X, V(z), r_d} as in that theorem, with {z} sufficiently large.

We will contradict this claim by specialising to a special case. Let {z} be a large parameter going to infinity, and set {D := z^s}. We set {g(d) := 1/d}; then by Mertens’ theorem we have {V(z) = \frac{e^{-\gamma}+o(1)}{\log z}}. We set {E_p} to be the residue class {0\ (p)}, thus (3) becomes

\displaystyle  \sum_{n: d|n} a_n = g(d) X + r_d \ \ \ \ \ (15)

and (14) becomes

\displaystyle  \sum_{n: (n,P(z)) = 1} a_n \leq \frac{F(s) - \delta + o(1)}{e^\gamma} \frac{X}{\log z} + O( \sum_{d \leq D: \mu^2(d)=1} |r_d| ) \ \ \ \ \ (16)

where {P(z) := \prod_{p<z} p}.
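As a quick numerical sanity check on this normalisation of Mertens' theorem (not part of the argument), one can compute the product directly; the cutoff {z = 10^5} below is an arbitrary illustrative choice:

```python
import math

gamma = 0.5772156649015329  # Euler-Mascheroni constant
z = 10**5  # illustrative cutoff

# Sieve of Eratosthenes, accumulating V(z) = prod_{p < z} (1 - 1/p) along the way.
is_comp = bytearray(z)
V = 1.0
for p in range(2, z):
    if not is_comp[p]:
        V *= 1.0 - 1.0 / p
        for m in range(p * p, z, p):
            is_comp[m] = 1

ratio = V * math.exp(gamma) * math.log(z)
print(ratio)  # close to 1, as Mertens' theorem predicts
```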

Now let {\varepsilon > 0} be a small fixed quantity to be chosen later, set {X := D^{1+\varepsilon}}, and let {a_n} be the sequence

\displaystyle  a_n := (1 - \lambda(n)) 1_{1 \leq n \leq X}.

This is clearly finitely supported and non-negative. For any {d}, we have

\displaystyle  \sum_{n: d|n} a_n = \sum_{n \leq X/d} 1 - \lambda(d) \sum_{n \leq X/d} \lambda(n)

from the multiplicativity of {\lambda}. If {d \leq D}, then {X/d \geq D^\varepsilon}, and then by the prime number theorem for the Liouville function (Exercise 41 from Notes 2, combined with Exercise 18 from Supplement 4) we have

\displaystyle  \sum_{n \leq X/d} \lambda(n) \ll_\varepsilon \frac{X}{d} \log^{-10} D

(say), and hence the remainder term {r_d} in (15) is of size

\displaystyle  |r_d| \ll_\varepsilon \frac{X}{d} \log^{-10} D. \ \ \ \ \ (17)

As such, the error term {O( \sum_{d \leq D: \mu^2(d)=1} |r_d| )} in (16) may be absorbed into the {o(1)} term, and so

\displaystyle  \sum_{n \leq X: (n,P(z))=1} (1-\lambda(n)) \leq \frac{F(s) - \delta + o(1)}{e^\gamma} \frac{X}{\log z}. \ \ \ \ \ (18)

Now we count the left-hand side. Observe that {1-\lambda(n)} is supported on those numbers {n = p_1 \dots p_r} that are the product of an odd number of primes {p_1 \geq \dots \geq p_r} (possibly with repetition), in which case {1-\lambda(n)=2}. To be coprime to {P(z)}, all these primes must be at least {z}; since we are restricting {n \leq X = z^{(1+\varepsilon)s}}, we thus must have {r \leq (1+\varepsilon) s}. The left-hand side of (18) may thus be written as

\displaystyle  2 \sum_{r \leq (1+\varepsilon) s, \hbox{ odd}} \sum_{p_1 \geq \dots \geq p_r \geq z: p_1 \dots p_r \leq X} 1. \ \ \ \ \ (19)

This expression may be computed using the prime number theorem:

Exercise 6 Show that the expression (19) is equal to {\frac{F( s(1+\varepsilon))+o(1)}{e^\gamma} \frac{X}{\log z}}.

Since {F} is continuous for {s>1}, we obtain a contradiction if {\varepsilon} is sufficiently small.

Exercise 7 Verify the optimality of {f(s)} in Theorem 2. (Hint: replace {1-\lambda(n)} by {1+\lambda(n)} in the above arguments.)
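The slow {o(1)} convergence in Exercise 6 can be observed numerically. The following Python experiment takes the illustrative (and fairly small) parameters {X = 10^6} and {z = 100}, so that {s = 3} and the prediction is {\frac{F(3)}{e^\gamma} \frac{X}{\log z} = \frac{2}{3} \frac{X}{\log z}}:

```python
import math

X = 10**6  # illustrative parameters: z = X^{1/3}, so s = 3
z = 100

# Smallest-prime-factor sieve up to X.
spf = list(range(X + 1))
for p in range(2, int(X**0.5) + 1):
    if spf[p] == p:
        for m in range(p * p, X + 1, p):
            if spf[m] == m:
                spf[m] = p

S = 0
for n in range(2, X + 1):
    if spf[n] < z:
        continue  # n shares a common factor with P(z)
    m, omega = n, 0
    while m > 1:  # count prime factors of n with multiplicity
        m //= spf[m]
        omega += 1
    if omega % 2 == 1:
        S += 2  # 1 - lambda(n) = 2 exactly when Omega(n) is odd

prediction = (2.0 / 3.0) * X / math.log(z)  # (F(3)/e^gamma) X / log z
print(S / prediction)  # around 1.08: the o(1) errors decay only logarithmically
```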

— 2. The linear sieve —

We now prove the forward direction of Theorem 2. Again, we focus on the upper bound (12), as the lower bound case is similar.

Fix {s>1}. Morally speaking, the most natural sieve to use here is the (upper bound) beta sieve from Notes 4, with the optimal value of {\beta}, which for the linear sieve turns out to be {\beta=2}. Recall that this sieve is defined as the sum

\displaystyle  \sum_{d \in {\mathcal D}_+} \mu(d) 1_{E_d}

where {{\mathcal D}_+} is the set of divisors {d = p_1 \dots p_m} of {P(z)} with {z > p_1 > \dots > p_m}, such that

\displaystyle  p_1 \dots p_{r-1} p_r^3 \leq D

for all odd {1 \leq r \leq m}. From Proposition 14 of Notes 4 this is indeed an upper bound sieve; indeed we have

\displaystyle  \sum_{d \in {\mathcal D}_+} \mu(d) 1_{E_d}(n) = 1_{n \not \in \bigcup_{p < z} E_p} \ \ \ \ \ (20)

\displaystyle  + \sum_{r \hbox{ odd}} \sum_{d \in {\mathcal E}_r} 1_{n \in E_d} 1_{n \not \in \bigcup_{p < p_*(d)} E_p}

where {{\mathcal E}_r} is the set of divisors {d = p_1 \dots p_r} of {P(z)} with {z > p_1 > \dots > p_r = p_*(d)}, such that

\displaystyle  p_1 \dots p_{r-1} p_r^3 > D \ \ \ \ \ (21)

and

\displaystyle  p_1 \dots p_{r'-1} p_{r'}^3 \leq D \ \ \ \ \ (22)

for all odd {1 \leq r' < r}. Now for the key heuristic point: if {n \approx D} lies in the support of {1-\lambda}, then the sum in (20) mostly vanishes. Indeed, if {n \leq D} is such that {n \in E_d} and {n \not \in \bigcup_{p < p_*(d)} E_p} for some {d = p_1 \dots p_r \in {\mathcal E}_r} and odd {r}, then one has {n = p_1 \dots p_r q} for some {q} that is not divisible by any prime less than {p_r}. On the other hand, from (21), (22) one has

\displaystyle  q \approx \frac{D}{p_1 \dots p_r} < p_r^2

and

\displaystyle  q \approx \frac{D}{p_1 \dots p_r} \geq 1

which (morally) implies from the sieve of Eratosthenes that {q} is prime, thus {\lambda(n) = (-1)^{r+1} = +1} and so {n} is not in the support of {1-\lambda}. As such, we expect the upper bound sieve { \sum_{d \in {\mathcal D}_+} \mu(d) 1_{E_d}} to be extremely efficient on the support of {1-\lambda}, which when combined with the analysis of the previous section suggests that this sieve should produce the desired upper bound (12).

One can indeed use this sieve to establish the required bound (12); see Chapter 11 of Friedlander-Iwaniec for details. However, for various technical reasons it will be convenient to modify this sieve slightly, by increasing the {\beta} parameter to be slightly greater than {2}, and also by using the fundamental lemma to perform a preliminary sifting on the small primes.

We turn to the details. To prove (12) we may of course assume that {\varepsilon>0} is suitably small. It will be convenient to worsen the error {O(\varepsilon)} in (12) a little to {O( \varepsilon \log \frac{1}{\varepsilon} )}, since one can of course remove the logarithm by reducing {\varepsilon} appropriately.

Set {D_0 := z^{\varepsilon^2}} and {z_0 := z^{\varepsilon^3}}. By the fundamental lemma of sieve theory (Lemma 17 of Notes 4), one can find combinatorial upper and lower bound sieve coefficients {(\lambda^{\pm,0}_d)_{d \leq D_0}} at sifting level {z_0} supported on divisors of {P(z_0)}, such that

\displaystyle  \sum_{d \leq D_0: d|P(z_0)} \lambda^{\pm,0}_d g(d) = V( z_0 ) ( 1 + O( e^{-1/\varepsilon} ) ). \ \ \ \ \ (23)

Thus we have {\lambda^{\pm,0}_d \in \{-1,0,1\}} and

\displaystyle  \sum_{d \leq D_0: d|P(z_0)} \lambda^{-,0}_d 1_{E_d}(n) \leq 1_{n \not \in \bigcup_{p<z_0} E_p} \leq \sum_{d \leq D_0: d|P(z_0)} \lambda^{+,0}_d 1_{E_d}(n) \ \ \ \ \ (24)

for all {n}.

We will use the upper bound sieve {\sum_{d \leq D_0:d|P(z_0)} \lambda^{+,0}_d 1_{E_d}(n)} as a preliminary sieve to remove the {E_p} for {p<z_0}; the lower bound sieve {\lambda^{-,0}_d} plays only a minor supporting role, mainly to control the difference between the upper bound sieve and the indicator {1_{n \not \in \bigcup_{p<z_0} E_p}}.

Next, we set {P(z_0,z) := P(z)/P(z_0) = \prod_{z_0 \leq p < z} p}, and let {\lambda^+_d} be the upper bound beta sieve with parameter {\beta = 2+\varepsilon} on the primes dividing {P(z_0,z)} up to level of distribution {D / D_0^2}. In other words, {{\mathcal D}_+} consists of those divisors {d = p_1 \dots p_r} of {P(z_0,z)} with {p_1 > \dots > p_r} such that

\displaystyle  p_1 \dots p_{m-1} p_m^{3+\varepsilon} \leq D / D_0^2

for all odd {1 \leq m \leq r}; in particular, {d \leq D/D_0^2} for all {d \in {\mathcal D}_+}. By Proposition 14 of Notes 4, this is indeed an upper bound sieve for the primes dividing {P(z_0,z)}:

\displaystyle  \sum_{d \in {\mathcal D}_+} \mu(d) 1_{E_d}(n) \geq 1_{n \not \in \bigcup_{z_0 \leq p < z} E_p}.

Multiplying this with the second inequality in (24) (this is the method of composition of sieves), we obtain an upper bound sieve for the primes up to {z}:

\displaystyle  \sum_{d_0 \leq D_0: d_0|P(z_0)} \sum_{d \in {\mathcal D}_+} \lambda^{+,0}_{d_0} \mu(d) 1_{E_{d_0d}}(n) \geq 1_{n \not \in \bigcup_{p < z} E_p}.

Multiplying this by {a_n} and summing in {n}, we conclude that

\displaystyle  \sum_{n \not \in \bigcup_{p<z} E_p} a_n \leq \sum_{d_0 \leq D_0: d_0|P(z_0)} \sum_{d \in {\mathcal D}_+} \lambda^{+,0}_{d_0} \mu(d) \sum_{n \in E_{d_0 d}} a_n.

Note that each product {d_0 d} appears at most once in the above sum, and all such products are squarefree and at most {D}. Applying (3), we thus have

\displaystyle  \sum_{n \not \in \bigcup_{p<z} E_p} a_n \leq \sum_{d_0 \leq D_0: d_0|P(z_0)} \sum_{d \in {\mathcal D}_+} \lambda^{+,0}_{d_0} \mu(d) g(d_0) g(d) X

\displaystyle  + \sum_{d \leq D: \mu^2(d)=1} |r_d|.

Thus to prove (12), it suffices to show that

\displaystyle  \sum_{d_0 \leq D_0: d_0|P(z_0)} \sum_{d \in {\mathcal D}_+} \lambda^{+,0}_{d_0} \mu(d) g(d_0) g(d) = (F(s) + O(\varepsilon\log \frac{1}{\varepsilon})) V(z)

for {z} sufficiently large depending on {\varepsilon}. Factoring out the {d_0} summation using (23), it thus suffices to show that

\displaystyle  \sum_{d \in {\mathcal D}_+} \mu(d) g(d) = (F(s) + O(\varepsilon\log \frac{1}{\varepsilon})) V(z) / V(z_0).

Now we eliminate the role of {g}. From (1), (5) and Mertens’ theorem we have

\displaystyle  V(z) / V(z_0) = (1 + O(\varepsilon)) \frac{\log z_0}{\log z}

for {z} large enough. Also, if {d \in {\mathcal D}_+}, then {d \leq D/D_0^2} and all prime factors of {d} are at least {z_0 = D^{\varepsilon^3/s}}. Thus {d} has at most {O_{s,\varepsilon}(1)} prime factors, each of which is at least {z_0}. From (1) we then have

\displaystyle  g(d) = \frac{1}{d} + O_{s,\varepsilon}( \frac{1}{z_0} \frac{1}{d} ).

The contribution of the error term {O_{s,\varepsilon}( \frac{1}{z_0} \frac{1}{d} )} is then easily seen to be {O( \varepsilon \frac{\log z_0}{\log z} )} for {z} large enough, and so we reduce to showing that

\displaystyle  \sum_{d \in {\mathcal D}_+} \frac{\mu(d)}{d} = (F(s) + O(\varepsilon\log \frac{1}{\varepsilon})) \frac{\log z_0}{\log z}. \ \ \ \ \ (25)

One can proceed here by evaluating the left-hand side directly. However, we will proceed instead by using the weight function {1-\lambda} from the previous section. More precisely, we will evaluate the expression

\displaystyle  \sum_{n \leq D} (1-\lambda(n)) \sum_{d_0 \leq D_0: d_0|P(z_0)} \sum_{d \in {\mathcal D}_+} \lambda^{+,0}_{d_0} \mu(d) 1_{d_0d|n} \ \ \ \ \ (26)

in two different ways, where {\lambda^{+,0}_{d_0}} is as before (but with the role of {g(d)} now replaced by the function {1/d}). Firstly, since {d_0 d \leq D/D_0}, we see from the argument used to establish (17) that

\displaystyle  \sum_{n \leq D: d_0 d|n} (1-\lambda(n)) = \frac{D}{d_0 d} + O( \frac{D}{d_0 d} \log^{-10} D )

(say). Since each {d_0 d} appears at most once, we can thus write (26) as

\displaystyle  \sum_{d_0 \leq D_0: d_0|P(z_0)} \sum_{d \in {\mathcal D}_+} \lambda^{+,0}_{d_0} \mu(d) \frac{D}{d_0 d} + O( D \log^{-9} D )

which, upon factoring out the {d_0} sum using (23) and Mertens’ theorem, becomes

\displaystyle  (1 + O(\varepsilon)) \frac{D}{e^\gamma \log z_0} \sum_{d \in {\mathcal D}_+} \frac{\mu(d)}{d} + O( D \log^{-9} D ).

Thus to verify (25), it will suffice to show that (26) is of the form

\displaystyle  (F(s) + O(\varepsilon \log \frac{1}{\varepsilon})) \frac{D}{e^\gamma \log z}

for {z} sufficiently large.

To do this, we abbreviate (26) as

\displaystyle  \sum_{n \leq D} (1-\lambda(n)) \nu^{+,0}(n) \sum_{d \in {\mathcal D}_+} \mu(d) 1_{d|n} \ \ \ \ \ (27)

where {\nu^{\pm,0}} are the sieves

\displaystyle  \nu^{\pm,0}(n) := \sum_{d_0 \leq D_0: d_0|P(z_0)} \lambda^{\pm,0}_{d_0} 1_{d_0|n}.

By Proposition 14 of Notes 4, we can expand {\sum_{d \in {\mathcal D}_+} \mu(d) 1_{d|n}} as

\displaystyle  1_{(n,P(z_0,z))=1} + \sum_{r \hbox{ odd}} \sum_{d \in {\mathcal E}_r} 1_{d|n} 1_{(n,P(z_0,p_*(d)))=1} \ \ \ \ \ (28)

where for any {r \geq 1}, {{\mathcal E}_r} is the collection of all divisors {d = p_1 \dots p_r} of {P(z_0,z)} with {p_1 > \dots > p_r = p_*(d)} such that

\displaystyle  p_1 \dots p_{r-1} p_r^{3+\varepsilon} > D/D_0^2 \ \ \ \ \ (29)

but

\displaystyle  p_1 \dots p_{r'-1} p_{r'}^{3+\varepsilon} \leq D/D_0^2 \ \ \ \ \ (30)

for all {1 \leq r' < r} with the same parity as {r}. For technical reasons, we will also impose the additional inequality

\displaystyle  p_1 \dots p_{r'-1} p_{r'}^{2+\varepsilon} \leq D/D_0^2 \ \ \ \ \ (31)

for all {1 \leq r' < r} with the opposite parity to {r}; this follows from (30) when {r' > 1}, and is an additional constraint only when {r'=1} and {r} is even; but in the above identity {r} is odd, so this additional constraint is harmless. For similar reasons we impose the inequality

\displaystyle  p_1 \dots p_r^{1+\varepsilon} \leq D/D_0^2 \ \ \ \ \ (32)

which follows from (30) or (31) except when {r=1}, but then this inequality is automatic from our hypothesis {s>1}, which implies {z^{1+\varepsilon} \leq D/D_0^2} if {\varepsilon} is chosen small enough.

Inserting the identity (28), we can write (26) as

\displaystyle  A + \sum_{r \hbox{ odd}} B_r

where

\displaystyle  A :=\sum_{n \leq D} (1-\lambda(n)) \nu^{+,0}(n) 1_{(n,P(z_0,z))=1}

and

\displaystyle  B_r := \sum_{d \in {\mathcal E}_r} \sum_{n \leq D} (1-\lambda(n)) \nu^{+,0}(n) 1_{d|n} 1_{(n,P(z_0,p_*(d)))=1}.

We first estimate {A}. By (24), we can write {A} as the sum of

\displaystyle  \sum_{n \leq D} (1-\lambda(n)) 1_{(n,P(z))=1} \ \ \ \ \ (33)

plus an error of size at most

\displaystyle  O( |\sum_{n \leq D} (\sum_{d_0 \leq D_0: d_0|P(z_0)} \lambda^{+,0}_{d_0} 1_{d_0|n} - \sum_{d_0 \leq D_0: d_0|P(z_0)} \lambda^{-,0}_{d_0} 1_{d_0|n} )| )

(where we bound {(1-\lambda(n)) 1_{(n,P(z_0,z))=1}} by {O(1)}). The error may be rearranged as

\displaystyle  O( |\sum_{d_0 \leq D_0: d_0|P(z_0)} (\lambda^{+,0}_{d_0} - \lambda^{-,0}_{d_0}) (\frac{D}{d_0}+O(1))| ),

which by (23) is of size {O( e^{-1/\varepsilon} \frac{D}{\log z_0} ) = O( \varepsilon \frac{D}{\log z} )} for {\varepsilon} small enough. As for the main term (33), we see from Exercise 6 (and the arguments preceding that exercise) that this term is equal to {\frac{F(s)+O(\varepsilon)}{e^\gamma} \frac{D}{\log z}} for {z} sufficiently large. Thus, to obtain the desired approximation for (26), it will suffice to show that

\displaystyle  \sum_{r \hbox{ odd}} B_r \ll \varepsilon \log \frac{1}{\varepsilon} \frac{D}{\log z}.

Next, we establish an exponential decay estimate on the {B_r}:

Lemma 8 For {z} sufficiently large depending on {\varepsilon}, we have

\displaystyle  B_r \ll \varepsilon^{-O(1)} \alpha^r e^{-s} \frac{D}{\log z}

for all {r \geq 1} and some absolute constant {0 < \alpha < 1}.

Proof: (Sketch) Note that if {d = p_1 \dots p_r} is in {{\mathcal E}_r}, then {d \leq D/D_0^2} and all prime factors of {d} are at least {z_0 = D^{\varepsilon^3/s}}; thus we may assume without loss of generality that {r \leq s/\varepsilon^3}.

We bound

\displaystyle  B_r \ll \sum_{d \in {\mathcal E}_r} \sum_{n \leq D/d} \nu^{+,0}(n) 1_{(n,P(z_0,p_*(d)))=1}.

Note that if {d = p_1 \dots p_r} lies in {{\mathcal E}_r}, then

\displaystyle  D_0^2 p_r^\varepsilon \leq \frac{D}{d} < D_0^2 p_r^{2+\varepsilon}

thanks to (29), (32). From this and the fundamental lemma of sieve theory we see (Exercise!) that

\displaystyle  \sum_{n \leq D/d} \nu^{+,0}(n) 1_{(n,P(z_0,p_r))=1} \ll \varepsilon^{-O(1)} \frac{D}{d \log p_*(d)}

and so it will suffice to show that

\displaystyle  \sum_{d \in {\mathcal E}_r} \frac{1}{d} \frac{\log z}{\log p_*(d)} \ll \alpha^r e^{-s}. \ \ \ \ \ (34)

By the prime number theorem, the left-hand side is bounded (Exercise!) by {f_r(s-2\varepsilon^2) + o(1)} as {z \rightarrow \infty}, where

\displaystyle  f_r(s) := \int_{J_r(s)} \frac{dt_1 \dots dt_r}{t_1 \dots t_{r-1} t_r^2}

and {J_r(s) \subset (0,+\infty)^r} is the set of points {(t_1,\dots,t_r)} with {1 \geq t_1 \geq \dots \geq t_r > 0},

\displaystyle  t_1 + \dots + t_{r-1} + (3+\varepsilon) t_r \geq s,

and such that

\displaystyle  t_1 + \dots + t_{r'-1} + (3+\varepsilon)t_{r'} \leq s

for all {1 \leq r' < r} with the same parity as {r}, and

\displaystyle  t_1 + \dots + t_{r'-1} + (2+\varepsilon)t_{r'} \leq s

for all {1 \leq r' \leq r}. It thus suffices to prove the bound

\displaystyle  f_r(s) \leq C \alpha^r e^{-s} \ \ \ \ \ (35)

for all {s>1} and some absolute constant {C>0}.

We use an argument from the book of Harman. Observe that {f_r(s)} vanishes for {s > r+2}, which makes the claim (35) routine for {r=1} (Exercise!) for {C} sufficiently large. We will now inductively prove (35) for all odd {r}. From the change of variables {u = \frac{s}{t_1}}, we obtain the identity

\displaystyle  f_r(s) = \frac{1}{s} \int_{\max(s, b_r)}^\infty f_{r-1}(t-1)\ dt \ \ \ \ \ (36)

where {b_r := 3+\varepsilon} when {r} is odd and {b_r := 2+\varepsilon} when {r} is even (Exercise!). In particular, if {r \geq 3} is odd and (35) was already proven for {r-2}, then

\displaystyle  f_r(s) = \frac{1}{s} \int_{\max(s,3+\varepsilon)}^\infty \frac{1}{t-1} \int_{t-1}^\infty f_{r-2}(u-1)\ du dt

\displaystyle  \leq \frac{1}{s} \int_{\max(s,3+\varepsilon)}^\infty \frac{1}{t-1} \int_{t-1}^\infty C \alpha^{r-2} e^{1-u}\ du dt

\displaystyle  \leq C \alpha^{r} e^{-s} \times \alpha^{-2} \times \frac{e^{s+2}}{s} \int_{\max(s,3+\varepsilon)}^\infty \frac{e^{-t}}{t-1}\ dt.

One can check (Exercise!) that the quantity {\frac{e^{s+2}}{s} \int_{\max(s,3+\varepsilon)}^\infty \frac{e^{-t}}{t-1}\ dt} is maximised at {s=3+\varepsilon}, where its value is less than {1} (numerically, it is about {0.89}) if {\varepsilon} is small enough. As such, we obtain (35) if {\alpha} is sufficiently close to {1}.

Finally, (35) for even {r} follows from the odd {r} case (with a slightly larger choice of {C}) by one final application of (36). \Box

Exercise 9 Fill in the steps marked (Exercise!) in the above proof.
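The numerical claim in the proof of Lemma 8, namely that the quantity {\frac{e^{s+2}}{s} \int_{\max(s,3+\varepsilon)}^\infty \frac{e^{-t}}{t-1}\ dt} stays below {1}, can also be confirmed by direct computation. The sketch below sets {\varepsilon = 0} and uses a simple midpoint rule; note that for {s < 3} the prefactor {e^{s+2}/s} is increasing while the integral is unchanged, so it suffices to check {s \geq 3}:

```python
import math

def h(s, upper=60.0, steps=200000):
    # Midpoint-rule evaluation of (e^{s+2}/s) \int_{max(s,3)}^{upper} e^{-t}/(t-1) dt;
    # the tail beyond t = 60 is below e^{-60} and is negligible at this precision.
    a = max(s, 3.0)
    dt = (upper - a) / steps
    total = sum(
        math.exp(-(a + (i + 0.5) * dt)) / (a + (i + 0.5) * dt - 1.0)
        for i in range(steps)
    )
    return math.exp(s + 2) / s * total * dt

vals = [h(s) for s in (3.0, 3.5, 4.0, 5.0, 7.0)]
print(max(vals))  # strictly less than 1, as claimed, with the maximum at s = 3
```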

In view of this lemma, the total contribution of {B_r} with {r > C \log \frac{1}{\varepsilon}} for some sufficiently large {C} is acceptable. Thus it suffices to show that

\displaystyle  B_r \ll \varepsilon \frac{D}{\log z}

whenever {r = O( \log\frac{1}{\varepsilon} )} is odd.

By (24), we can write {B_r} as

\displaystyle  \sum_{d \in {\mathcal E}_r} \sum_{n \leq D} (1-\lambda(n)) 1_{d|n} 1_{(n,P(p_*(d)))=1}

plus an error of size

\displaystyle  O( \sum_{d \in {\mathcal E}_r} \sum_{n \leq D} (\sum_{d_0 \leq D_0: d_0|P(z_0)} \lambda^{+,0}_{d_0} 1_{dd_0|n} - \sum_{d_0 \leq D_0: d_0|P(z_0)} \lambda^{-,0}_{d_0} 1_{dd_0|n} ) ).

Arguing as in the treatment of the {A} term, we see from (23) that the error term is bounded by

\displaystyle  \ll |\sum_{d \in {\mathcal E}_r} \sum_{d_0 \leq D_0: d_0|P(z_0)} (\lambda^{+,0}_{d_0} - \lambda^{-,0}_{d_0}) \frac{D}{dd_0}| + \sum_{d \in {\mathcal E}_r} \sum_{d_0 \leq D_0: d_0|P(z_0)} 1

\displaystyle  \ll \sum_{d \in {\mathcal E}_r} e^{-1/\varepsilon} \frac{D}{d \log z_0} + \frac{D}{D_0^2} D_0

\displaystyle  \ll e^{-1/\varepsilon} \frac{D}{\log z_0} (\sum_{z_0 \leq p \leq z} \frac{1}{p})^r + \frac{D}{D_0}

\displaystyle  \ll e^{-1/\varepsilon} \frac{D}{\log z_0} O( \log \frac{\log z}{\log z_0} )^r + \frac{D}{D_0}

\displaystyle  \ll \varepsilon \frac{D}{\log z}

as desired for {z} large enough, since {r = O(\log \frac{1}{\varepsilon})} and {\frac{\log z}{\log z_0} = O( \varepsilon^{-3} )}. Thus it suffices to show that

\displaystyle  \sum_{d \in {\mathcal E}_r} \sum_{n \leq D} (1-\lambda(n)) 1_{d|n} 1_{(n,P(p_*(d)))=1} \ll \varepsilon \frac{D}{\log z}. \ \ \ \ \ (37)

If {d = p_1 \dots p_r} and {n} appear in the above sum, then we have {n = p_1 \dots p_r q} where {q \leq \frac{D}{d}} has no prime factor less than {p_r}, has an even number of prime factors, and obeys the bounds

\displaystyle  q \leq D_0^2 p_r^{2+\varepsilon}

thanks to (29). Note that (29) also gives {p_r \geq (D/D_0^2)^{1/(r+2)}}, and thus (since {r = O( \log \frac{1}{\varepsilon} )} and {D_0 = z^{\varepsilon^2}}) we see that {q < p_r^3} if {\varepsilon} is small enough and {z} is large enough. This forces {q} to either equal {1}, or be the product of two primes between {p_r} and {D_0^2 p_r^{1+\varepsilon}}. The contribution of the {q=1} case is bounded by {|{\mathcal E}_r| \leq D/D_0^2}, which is acceptable. As for the contribution of those {q} that are the product of two primes, the prime number theorem shows that there are at most

\displaystyle  \sum_{p_r \leq p \leq D_0^2 p_r^{1+\varepsilon}} O( \frac{D}{d p \log p_r} ) = O( \varepsilon \frac{D}{d \log p_r} )

values of {q} that can contribute to the sum, and so this contribution to {B_r} is at most

\displaystyle  O( \varepsilon \frac{D}{\log z} \sum_{d \in {\mathcal E}_r} \frac{1}{d} \frac{\log z}{\log p_*(d)} ),

but by (34) the sum here is {O(1)} for {z} large enough, and the claim follows. This completes the proof of (12).

Exercise 10 Establish the lower bound (13) in Theorem 2. (Note that one can assume without loss of generality that {s>2}, which will now be needed to ensure (31) when {r'=1}.)

— 3. Chen’s theorem —

We now prove Chen’s theorem for twin primes, loosely following the treatment in Chapter 25 of Friedlander-Iwaniec. We will in fact show the slightly stronger statement that

\displaystyle  \sum_{x/2 \leq n \leq x-2} \Lambda(n) 1_{{\mathcal P}_2}(n+2) 1_{(n+2,P(z))=1} \gg \frac{x}{\log x}

for sufficiently large {x}, where {{\mathcal P}_2} is the set of all numbers that are products of at most two primes, and {z := x^{1/8}}. Indeed, after removing the (negligible) contribution of those {n} that are powers of primes, this estimate would imply that there are infinitely many primes {p} such that {p+2} is the product of at most two primes, each of which is at least {p^{1/8}}.

Chen’s argument begins with the following simple lower bound sieve for {1_{{\mathcal P}_2}}:

Lemma 11 If {x^{2/3} < n \leq x}, then

\displaystyle  1_{{\mathcal P}_2}(n) \geq 1 - \frac{1}{2} \sum_{p \leq x^{1/3}} 1_{p|n} - \frac{1}{2} \sum_{p_1 \leq x^{1/3} < p_2 \leq p_3} 1_{n=p_1p_2p_3} - \frac{1}{2} \sum_{p \leq x^{1/3}} 1_{p^2|n}.

Proof: If {n} has no prime factors less than or equal to {x^{1/3}}, then {n \in {\mathcal P}_2} and the claim follows. If {n} has two or more distinct prime factors less than or equal to {x^{1/3}}, then {1 - \frac{1}{2} \sum_{p \leq x^{1/3}} 1_{p|n} \leq 0} and the claim follows. Finally, if {n} has exactly one prime factor less than or equal to {x^{1/3}}, then (as {n > x^{2/3}}) either {n} is the product of at most two primes (so that {1_{{\mathcal P}_2}(n) = 1}), or it is of the form {n=p_1p_2p_3} for some {p_1 \leq x^{1/3} < p_2 \leq p_3}, or it is divisible by {p^2} for some {p \leq x^{1/3}}, and the claim again follows. \Box
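Lemma 11 can also be verified by brute force for a small cutoff; the following Python check (with the illustrative choice {x = 1000}, so that {x^{1/3} = 10}) confirms the inequality for every {n} in the stated range:

```python
# Brute-force check of Lemma 11 for a small cutoff (x = 1000, so x^{1/3} = 10);
# this small x is an illustrative choice, not a value used in the argument.
x = 1000
y = 10  # x^{1/3}

def prime_factors(n):
    out = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            out.append(d)
            n //= d
        d += 1
    if n > 1:
        out.append(n)
    return out  # prime factors with multiplicity, in nondecreasing order

violations = 0
for n in range(101, x + 1):  # the range x^{2/3} < n <= x
    fs = prime_factors(n)
    lhs = 1 if len(fs) <= 2 else 0  # 1_{P_2}(n)
    small = set(p for p in fs if p <= y)
    rhs = 1 - 0.5 * len(small)
    if len(fs) == 3 and fs[0] <= y < fs[1]:  # n = p_1 p_2 p_3 with p_1 <= y < p_2 <= p_3
        rhs -= 0.5
    rhs -= 0.5 * sum(1 for p in small if n % (p * p) == 0)
    if lhs < rhs:
        violations += 1

print(violations)  # 0: the sieve lower bound holds throughout the range
```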

In view of this sieve (trivially managing the contribution of {n \leq x^{2/3}} or of the case where {p^2|n}, and using the restriction of {n+2} to be coprime to {P(z)}), it suffices to show that

\displaystyle  A_1 - \frac{1}{2} \sum_{z \leq p \leq x^{1/3}} A_{2,p} - \frac{1}{2} A_3 \gg \frac{x}{\log x} \ \ \ \ \ (38)

for sufficiently large {x}, where

\displaystyle  A_1 := \sum_{x/2 \leq n \leq x-2} \Lambda(n) 1_{(n+2,P(z))=1},

\displaystyle  A_{2,p} := \sum_{x/2 \leq n \leq x-2: p|n+2} \Lambda(n) 1_{(n+2,P(z))=1},

and

\displaystyle  A_3 := \sum_{x/2 \leq n \leq x-2} \Lambda(n) \sum_{z \leq p_1 \leq x^{1/3} < p_2 \leq p_3} 1_{n+2=p_1p_2p_3}.

We thus seek sufficiently good lower bounds on {A_1} and sufficiently good upper bounds on {A_{2,p}} and {A_3}. As it turns out, the linear sieve, combined with the Bombieri-Vinogradov theorem, will give bounds on {A_1, A_{2,p}, A_3} with numerical constants that are sufficient for this purpose.

We begin with {A_1}. We use the lower bound linear sieve, with {E_p} equal to the residue class {-2\ (p)} for all {p < z}, so that {E_d} is the residue class {-2\ (d)}. We approximate

\displaystyle  \sum_{x/2 \leq n \leq x-2: n \in E_d} \Lambda(n) = g(d) \frac{x}{2} + r_d

where {g} is the multiplicative function with {g(2) :=0} and {g(p) := \frac{1}{p-1}} for {p>2}. From the Bombieri-Vinogradov theorem (Theorem 17 of Notes 3) we have

\displaystyle  \sum_{d \leq D} |r_d| \ll x \log^{-10} x \ \ \ \ \ (39)

(say) if {D := x^{1/2 - \varepsilon} = z^{4-8\varepsilon}} for some small fixed {\varepsilon > 0}. Applying the lower bound linear sieve (13), we conclude that

\displaystyle  A_1 \geq (f(4-8\varepsilon) - O(\varepsilon)) \frac{x}{2} V(z) + O( x \log^{-10} x )

where

\displaystyle  V(z) = \prod_{p < z} (1-g(p)).

We can compute an asymptotic for {V}:

Exercise 12 Show that

\displaystyle  V(z) = (2 \Pi_2 + o(1)) \frac{1}{e^\gamma \log z}

as {z \rightarrow \infty}, where {\Pi_2 = \prod_{p>2} (1-\frac{1}{(p-1)^2})} is the twin prime constant.
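The twin prime constant can be approximated by truncating the product at a finite cutoff; a short Python computation (the cutoff {10^5} below is illustrative, and the neglected tail is tiny):

```python
N = 10**5  # illustrative cutoff; the tail of the product is O(1/(N log N))
is_comp = bytearray(N + 1)
Pi2 = 1.0
for p in range(2, N + 1):
    if not is_comp[p]:
        for m in range(p * p, N + 1, p):
            is_comp[m] = 1
        if p > 2:
            Pi2 *= 1.0 - 1.0 / (p - 1) ** 2

print(Pi2)  # approximately 0.66016
```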

From (11) we have {f(4) = \frac{e^\gamma}{2} \log 3}. Sending {\varepsilon} slowly to zero, we conclude that

\displaystyle  A_1 \geq (\log 3-o(1)) \Pi_2 \frac{x}{2\log z}. \ \ \ \ \ (40)

Now we turn to {A_{2,p}}. Here we use the upper bound linear sieve. Let {E_d} be as before. For any {d} dividing {P(z)} and {z \leq p < x^{1/3}}, we have

\displaystyle  \sum_{x/2 \leq n \leq x-2: n \in E_d} \Lambda(n) 1_{p|n+2} = g(d) g(p) \frac{x}{2} + r_{pd}

where {g} and {r_d} are as previously. We apply the upper bound linear sieve (12) with level of distribution {D/p}, to conclude that

\displaystyle  A_{2,p} \leq (F( \frac{\log D/p}{\log z} ) + O(\varepsilon)) g(p) \frac{x}{2} V(z)

\displaystyle  + O( \sum_{d \leq D/p} |r_{pd}| ).

We sum over {p}. Since {pd} is at most {D}, and each number less than or equal to {D} has at most {O( \log x )} prime factors, we have

\displaystyle  \sum_{z \leq p < x^{1/3}} A_{2,p} \leq \sum_{z \leq p< x^{1/3}} (F( \frac{\log D/p}{\log z} )

\displaystyle + O(\varepsilon)) g(p) \frac{x}{2} V(z) + O( \log x \sum_{d \leq D} |r_d| ).

The error term is {O( x \log^{-9} x )} thanks to (39). Since {g(p) = \frac{1+o(1)}{p}}, we thus have

\displaystyle  \sum_{z \leq p < x^{1/3}} A_{2,p} \leq (\sum_{z \leq p< x^{1/3}} \frac{F( \frac{\log D/p}{\log z} )}{p} + O(\varepsilon)) \frac{\Pi_2 x}{e^\gamma \log z}

for sufficiently large {x}, thanks to Exercise 12. We can compute the sum using Exercise 37 of Notes 1, to obtain

\displaystyle  \sum_{z \leq p< x^{1/3}} \frac{F( \frac{\log D/p}{\log z} )}{p} = \int_1^{8/3} F( 4-8\varepsilon-t ) \frac{dt}{t} + o(1),

which by (10) and sending {\varepsilon \rightarrow 0} slowly gives

\displaystyle  \sum_{z \leq p < x^{1/3}} A_{2,p} \leq (\int_1^{8/3} \frac{1}{4-t} \frac{dt}{t} + o(1)) \frac{2 \Pi_2 x}{\log z}.

A routine computation shows that

\displaystyle  \int_1^{8/3} \frac{1}{4-t} \frac{dt}{t} = \frac{\log 6}{4}

and so

\displaystyle  \sum_{z \leq p < x^{1/3}} A_{2,p} \leq (\log 6+o(1)) \Pi_2 \frac{x}{2\log z}. \ \ \ \ \ (41)
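The routine computation uses the partial fraction {\frac{1}{t(4-t)} = \frac{1}{4}(\frac{1}{t} + \frac{1}{4-t})}, giving the antiderivative {\frac{1}{4} \log \frac{t}{4-t}}; as a sanity check, one can also verify the identity numerically:

```python
import math

# Midpoint-rule check of the identity
#   int_1^{8/3} dt / (t(4-t)) = (log 6)/4,
# which follows from 1/(t(4-t)) = (1/4)(1/t + 1/(4-t)).
n = 10**5
a, b = 1.0, 8 / 3
h = (b - a) / n
integral = sum(h / (t * (4 - t)) for t in (a + (i + 0.5) * h for i in range(n)))

print(abs(integral - math.log(6) / 4) < 1e-6)  # True
```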

Finally, we consider {A_3}, which is estimated by “switching” the sieve to sift out small divisors of {n}, rather than small divisors of {n+2}. Removing those {n} with {n \leq \sqrt{x}}, as well as those {n} that are powers of primes, and then shifting {n} by {2}, we have

\displaystyle  A_3 \leq \log x \sum_{n} 1_{(n-2,P(\sqrt{x}))=1} a_n + O_\varepsilon( x^{1/2+\varepsilon} )

where {a_n} is the finitely supported non-negative sequence

\displaystyle  a_n := 1_{x/2+2 \leq n \leq x} \sum_{z \leq p_1 \leq x^{1/3} < p_2 \leq p_3} 1_{n=p_1p_2p_3}. \ \ \ \ \ (42)

Here we are sifting out the residue classes {E'_p := 2\ (p)}, so that {E'_d := 2\ (d)}.

The sequence {a_n} has good distribution up to level {D}:

Proposition 13 One has

\displaystyle  \sum_{n \in E'_d} a_n = g(d) \sum_n a_n + r_d

where {g(d)} is as before, and

\displaystyle  \sum_{d \leq D: \mu^2(d)=1} |r_d| \ll x \log^{-10} x

(say), with {D} as before.

Proof: Observe that the quantity {p_3} in (42) is bounded above by {x^{2/3}} if the summand is to be non-zero. We now use a finer-than-dyadic decomposition trick similar to that used in the proof of the Bombieri-Vinogradov theorem in Notes 3 to approximate {a_n} as a combination of Dirichlet convolutions. Namely, we set {\lambda := 1 + \log^{-20} x}, and partition {[x^{1/3},x^{2/3}]} (plus possibly a little portion to the right of {x^{2/3}}) into {A=O( \log^{21} x )} consecutive intervals {I_1,\dots,I_A}, each of the form {[N, \lambda N]} for some {x^{1/3} \leq N \leq x^{2/3}}. We similarly split {[z,x^{1/3}]} (plus possibly a tiny portion of {[z/2,z]}) into {B=O( \log^{21} x)} intervals {J_1,\dots, J_B}, each of the form {[M, \lambda M]} for some {z/2 \leq M \leq x^{1/3}}. We can thus split {a_n} as

\displaystyle  a_n = \sum_{1 \leq b \leq B} \sum_{1 \leq a \leq a' \leq A} \sum_{z \leq p_1 \leq x^{1/3} < p_2 \leq p_3: p_1 \in J_b, p_2 \in I_a, p_3 \in I_{a'}} 1_{n=p_1p_2p_3}1_{x/2+2 \leq n \leq x} .
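To get a sense of the number of intervals required, one can compute the count {A = \log(x^{1/3}) / \log \lambda} for an illustrative (hypothetical) value of {x}:

```python
import math

# Order-of-magnitude check: how many intervals [N, lam*N], with
# lam = 1 + log^(-20) x, cover [x^(1/3), x^(2/3)]?  The multiplicative
# length of that range is x^(1/3), so A = log(x^(1/3)) / log(lam).
x = 1e8                          # hypothetical illustrative size
L = math.log(x)
delta = L ** -20                 # lam - 1; far below double-precision eps,
A = (L / 3) / math.log1p(delta)  # so use log1p rather than log(1 + delta)

print(A / L ** 21)               # ~ 1/3, confirming A = O(log^21 x)
```

Note the use of `log1p`: for realistic {x} the quantity {\lambda - 1 = \log^{-20} x} is smaller than machine epsilon, so `math.log(1 + delta)` would evaluate to exactly zero.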

Observe that for each {a,a'} there are only {O(1)} choices of {b} for which the summand can be non-zero. As such, the contribution of the diagonal case {a=a'} is easily seen to be absorbed into the {r_d} error, as are those cases where the product set {\{ uvw: u \in J_b, v \in I_a, w \in I_{a'} \}} is not contained completely in {[x/2+2,x]}. If we let {\Omega} denote the set of triplets {(b,a,a')} exhibiting one of these two degeneracies, we can thus approximate {a_n} by {\sum_{(b,a,a') \not \in \Omega} a_n^{(b,a,a')}}, where {a_n^{(b,a,a')}} is the Dirichlet convolution

\displaystyle  a_n^{(b,a,a')} := 1_{{\mathcal P} \cap J_b} * 1_{{\mathcal P} \cap I_a} * 1_{{\mathcal P} \cap I_{a'}}.

From the general Bombieri-Vinogradov theorem (Theorem 16 of Notes 3) and the Siegel-Walfisz theorem (Exercise 64 of Notes 2) we see that

\displaystyle  \sum_{n \in E'_d} a_n^{(b,a,a')} = g(d) X^{(b,a,a')} + r_d^{(b,a,a')}

where

\displaystyle  \sum_{d \leq D: \mu^2(d)=1} |r_d^{(b,a,a')}| \ll x \log^{-10} x

(say) and

\displaystyle  X^{(b,a,a')} = \sum_n a_n^{(b,a,a')}.

This gives the claim with {\sum_n a_n} replaced by the quantity {\sum_{(b,a,a') \not \in \Omega} X^{(b,a,a')}}; but by undoing the previous decomposition we see that this quantity is equal to {\sum_n a_n} up to an error of {O( x \log^{-11} x)} (say), and the claim follows.

Applying the upper bound sieve (12) (with sifting level {D^{1/(1+\varepsilon)}}), we thus have

\displaystyle  \sum_{n} 1_{(n-2,P(\sqrt{x}))=1} a_n \leq (F(1+\varepsilon)+O(\varepsilon)) V( D^{1/(1+\varepsilon)} ) \sum_n a_n + O( x \log^{-10} x )

and hence by (10) and Exercise 12

\displaystyle  \sum_{n} 1_{(n-2,P(\sqrt{x}))=1} a_n \leq (4+O(\varepsilon)) \frac{2\Pi_2}{\log x} \sum_n a_n + O( x \log^{-10} x )

for {x} sufficiently large.
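To track the constant here: for {1 < s \leq 3} only the first term in (6) survives, so that {F(1+\varepsilon) = \frac{2e^\gamma}{1+\varepsilon}}, while Exercise 12 gives {V( D^{1/(1+\varepsilon)} ) = (2\Pi_2+o(1)) \frac{1+\varepsilon}{e^\gamma \log D}}; since {\log D = (\frac{1}{2}-\varepsilon) \log x}, we conclude

\displaystyle  F(1+\varepsilon) V( D^{1/(1+\varepsilon)} ) = \frac{(4+o(1)) \Pi_2}{(\frac{1}{2}-\varepsilon) \log x} = (4+O(\varepsilon)) \frac{2\Pi_2}{\log x}.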

Note that

\displaystyle  \sum_n a_n = \sum_{z \leq p_1 \leq x^{1/3} < p_2 < \frac{x}{p_1 p_2}} \sum_{\max( p_2, \frac{x/2+2}{p_1 p_2} ) < p_3 < \frac{x}{p_1 p_2}} 1.

From the prime number theorem and Exercise 37 of Notes 1, we thus have

\displaystyle  \sum_n a_n \leq (1+o(1)) \frac{x}{2\log x} \int_{1/8 \leq t_1 \leq 1/3 < t_2 < 1-t_1-t_2} \frac{dt_1 dt_2}{t_1 t_2 (1-t_1-t_2)}.

(In fact one also has a matching lower bound, but we will not need it here.) We thus conclude that

\displaystyle  A_3 \leq (C+o(1)) \Pi_2 \frac{x}{2\log z}

where

\displaystyle  C := \int_{1/8 \leq t_1 \leq 1/3 < t_2 < 1-t_1-t_2} \frac{dt_1 dt_2}{t_1 t_2 (1-t_1-t_2)}

\displaystyle  = 0.363 \dots

The left-hand side of (38) is then at least

\displaystyle  (2 \log 3 - \log 6 - C) \Pi_2 \frac{x}{4 \log z}.

One can calculate that {2 \log 3 - \log 6 - C = 0.04\ldots > 0}, and the claim follows.
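The numerical evaluation of {C} and the final positivity check can be reproduced with a short midpoint-rule computation; a sketch, with the grid size chosen arbitrarily:

```python
import math

# Midpoint-rule evaluation of
#   C = int int_{1/8 <= t1 <= 1/3 < t2 < 1-t1-t2} dt1 dt2 / (t1 t2 (1-t1-t2)),
# where the constraint t2 < 1-t1-t2 means t2 < (1-t1)/2.
def chen_constant(n=800):
    a, b = 1 / 8, 1 / 3
    h1 = (b - a) / n
    total = 0.0
    for i in range(n):
        t1 = a + (i + 0.5) * h1
        lo, hi = 1 / 3, (1 - t1) / 2   # range of t2 for this slice
        if hi <= lo:
            continue
        h2 = (hi - lo) / n
        for j in range(n):
            t2 = lo + (j + 0.5) * h2
            total += h1 * h2 / (t1 * t2 * (1 - t1 - t2))
    return total

C = chen_constant()
print(round(C, 3))                              # ~ 0.363
print(2 * math.log(3) - math.log(6) - C > 0)    # True: margin is about 0.04
```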

Exercise 14 Establish Chen’s theorem for the even Goldbach conjecture.

Remark 15 If one is willing to use stronger distributional claims on the primes than is provided by the Bombieri-Vinogradov theorem, then one can use a simpler sieve than Chen's sieve to establish Chen's theorem; but the required distributional theorem will then either be conjectural or more difficult to establish than the Bombieri-Vinogradov theorem. See Chapter 25 of Friedlander-Iwaniec for further discussion.