Many problems in non-multiplicative prime number theory can be recast as sieving problems. Consider for instance the problem of counting the number {N(x)} of pairs of twin primes {p,p+2} contained in {[x/2,x]} for some large {x}; note that the claim that {N(x) > 0} for arbitrarily large {x} is equivalent to the twin prime conjecture. One can obtain this count by any of the following variants of the sieve of Eratosthenes:

  1. Let {A} be the set of natural numbers in {[x/2,x-2]}. For each prime {p \leq \sqrt{x}}, let {E_p} be the union of the residue classes {0\ (p)} and {-2\ (p)}. Then {N(x)} is the cardinality of the sifted set {A \backslash \bigcup_{p \leq \sqrt{x}} E_p}.
  2. Let {A} be the set of primes in {[x/2,x-2]}. For each prime {p \leq \sqrt{x}}, let {E_p} be the residue class {-2\ (p)}. Then {N(x)} is the cardinality of the sifted set {A \backslash \bigcup_{p \leq \sqrt{x}} E_p}.
  3. Let {A} be the set of primes in {[x/2+2,x]}. For each prime {p \leq \sqrt{x}}, let {E_p} be the residue class {2\ (p)}. Then {N(x)} is the cardinality of the sifted set {A \backslash \bigcup_{p \leq \sqrt{x}} E_p}.
  4. Let {A} be the set {\{ n(n+2): x/2 \leq n \leq x-2 \}}. For each prime {p \leq \sqrt{x}}, let {E_p} be the residue class {0\ (p)}. Then {N(x)} is the cardinality of the sifted set {A \backslash \bigcup_{p \leq \sqrt{x}} E_p}.
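
To make the equivalence of these formulations concrete, here is a small brute-force sketch in Python (an illustration, not part of the original notes; the test value {x=200} is arbitrary, and is kept away from perfect squares to avoid minor boundary issues at {\sqrt{x}}):

    # Brute-force check that the four sifted-set formulations above give the
    # same count of twin prime pairs p, p+2 contained in [x/2, x].
    def is_prime(n):
        if n < 2:
            return False
        d = 2
        while d * d <= n:
            if n % d == 0:
                return False
            d += 1
        return True

    def twin_counts(x):
        small_primes = [p for p in range(2, int(x ** 0.5) + 1) if is_prime(p)]
        lo = (x + 1) // 2                     # smallest integer >= x/2
        # 1. integers in [x/2, x-2], sifting the classes 0 and -2 mod p
        c1 = sum(1 for n in range(lo, x - 1)
                 if all(n % p and (n + 2) % p for p in small_primes))
        # 2. primes in [x/2, x-2], sifting the class -2 mod p
        c2 = sum(1 for n in range(lo, x - 1) if is_prime(n)
                 and all((n + 2) % p for p in small_primes))
        # 3. primes in [x/2+2, x], sifting the class +2 mod p
        c3 = sum(1 for n in range(lo + 2, x + 1) if is_prime(n)
                 and all((n - 2) % p for p in small_primes))
        # 4. the numbers n(n+2), sifting the class 0 mod p
        c4 = sum(1 for n in range(lo, x - 1)
                 if all((n * (n + 2)) % p for p in small_primes))
        # direct count of twin prime pairs p, p+2 contained in [x/2, x]
        c0 = sum(1 for n in range(lo, x - 1) if is_prime(n) and is_prime(n + 2))
        return c0, c1, c2, c3, c4

    print(twin_counts(200))                   # all five counts should agree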

Exercise 1 Develop similar sifting formulations of the other three Landau problems.

In view of these sieving interpretations of number-theoretic problems, it becomes natural to try to estimate the size of sifted sets {A \backslash \bigcup_{p | P} E_p} for various finite sets {A} of integers, and subsets {E_p} of integers indexed by primes {p} dividing some squarefree natural number {P} (which, in the above examples, would be the product of all primes up to {\sqrt{x}}). As we see in the above examples, the sets {E_p} in applications are typically the union of one or more residue classes modulo {p}, but we will work at a more abstract level of generality here by treating {E_p} as more or less arbitrary sets of integers, without caring too much about the arithmetic structure of such sets.

It turns out to be conceptually more natural to replace sets by functions, and to consider the more general task of estimating sifted sums

\displaystyle  \sum_{n \in {\bf Z}} a_n 1_{n \not \in \bigcup_{p | P} E_p} \ \ \ \ \ (1)

for some finitely supported sequence {(a_n)_{n \in {\bf Z}}} of non-negative numbers; the previous combinatorial sifting problem then corresponds to the indicator function case {a_n=1_{n \in A}}. (One could also use other index sets here than the integers {{\bf Z}} if desired; for much of sieve theory the index set and its subsets {E_p} are treated as abstract sets, so the exact arithmetic structure of these sets is not of primary importance.)

Continuing with twin primes as a running example, we thus have the following sample sieving problem:

Problem 2 (Sieving problem for twin primes) Let {x, z \geq 1}, and let {\pi_2(x,z)} denote the number of natural numbers {n \leq x} which avoid the residue classes {0, -2\ (p)} for all primes {p < z}. In other words, we have

\displaystyle  \pi_2(x,z) := \sum_{n \in {\bf Z}} a_n 1_{n \not \in \bigcup_{p | P(z)} E_p}

where {a_n := 1_{n \in [1,x]}}, {P(z) := \prod_{p < z} p} is the product of all the primes strictly less than {z} (we omit {z} itself for minor technical reasons), and {E_p} is the union of the residue classes {0, -2\ (p)}. Obtain upper and lower bounds on {\pi_2(x,z)} which are as strong as possible in the asymptotic regime where {x} goes to infinity and the sifting level {z} grows with {x} (ideally we would like {z} to grow as fast as {\sqrt{x}}).

From the preceding discussion we know that the number of twin prime pairs {p,p+2} in {(x/2,x]} is equal to {\pi_2(x-2,\sqrt{x}) - \pi_2(x/2,\sqrt{x})}, if {x} is not a perfect square; one also easily sees that the number of twin prime pairs in {[1,x]} is at least {\pi_2(x-2,\sqrt{x})}, again if {x} is not a perfect square. Thus we see that a sufficiently good answer to Problem 2 would resolve the twin prime conjecture, particularly if we can get the sifting level {z} to be as large as {\sqrt{x}}.

We return now to the general problem of estimating (1). We may expand

\displaystyle  1_{n \not \in \bigcup_{p | P} E_p} = \prod_{p | P} (1 - 1_{E_p}(n)) \ \ \ \ \ (2)

\displaystyle  = \sum_{k=0}^\infty (-1)^k \sum_{p_1 \dots p_k|P: p_1 < \dots < p_k} 1_{E_{p_1}} \dots 1_{E_{p_k}}(n)

\displaystyle  = \sum_{d|P} \mu(d) 1_{E_d}(n)

where {E_d := \bigcap_{p|d} E_p} (with the convention that {E_1={\bf Z}}). We thus arrive at the Legendre sieve identity

\displaystyle  \sum_{n \in {\bf Z}} a_n 1_{n \not \in \bigcup_{p | P} E_p} = \sum_{d|P} \mu(d) \sum_{n \in E_d} a_n. \ \ \ \ \ (3)

Specialising to the case of an indicator function {a_n=1_{n \in A}}, we recover the inclusion-exclusion formula

\displaystyle  |A \backslash \bigcup_{p|P} E_p| = \sum_{d|P} \mu(d) |A \cap E_d|.
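
As a concrete illustration (not part of the original text), the following short Python computation verifies the identity (3) numerically in the twin prime setting, with {a_n = 1_{n \in [1,x]}} and {E_p} the union of the residue classes {0, -2\ (p)} for the primes {p < z}; the parameters {x=1000} and {z=12} are arbitrary.

    from itertools import combinations

    def legendre_check(x, z):
        primes = [p for p in range(2, z) if all(p % q for q in range(2, p))]
        in_Ep = lambda n, p: n % p == 0 or (n + 2) % p == 0
        # left-hand side of (3): the sifted sum, computed directly
        lhs = sum(1 for n in range(1, x + 1)
                  if not any(in_Ep(n, p) for p in primes))
        # right-hand side of (3): sum over d | P of mu(d) |[1,x] cap E_d|,
        # with the squarefree divisors d encoded as subsets of the primes
        rhs = 0
        for r in range(len(primes) + 1):
            for subset in combinations(primes, r):
                count = sum(1 for n in range(1, x + 1)
                            if all(in_Ep(n, p) for p in subset))
                rhs += (-1) ** r * count
        return lhs, rhs

    print(legendre_check(1000, 12))   # the two values should coincide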

Such exact sieving formulae are already satisfactory for controlling sifted sets or sifted sums when the amount of sieving is relatively small compared to the size of {A}. For instance, let us return to the running example in Problem 2 for some {x,z \geq 1}. Observe that each {E_p} in this example consists of {\omega(p)} residue classes modulo {p}, where {\omega(p)} is defined to equal {1} when {p=2} and {2} when {p} is odd. By the Chinese remainder theorem, this implies that for each {d|P(z)}, {E_d} consists of {\prod_{p|d} \omega(p)} residue classes modulo {d}. Using the basic bound

\displaystyle  \sum_{n \leq x: n = a\ (q)} 1 = \frac{x}{q} + O(1) \ \ \ \ \ (4)

for any {x > 0} and any residue class {a\ (q)}, we conclude that

\displaystyle  \sum_{n \in E_d} a_n = g(d) x + O( \prod_{p|d} \omega(p) ) \ \ \ \ \ (5)

for any {d|P(z)}, where {g} is the multiplicative function

\displaystyle  g(d) := \prod_{p|d: p|P(z)} \frac{\omega(p)}{p}.

Since {\omega(p) \leq 2} and there are at most {\pi(z)} primes dividing {P(z)}, we may crudely bound {\prod_{p|d} \omega(p) \leq 2^{\pi(z)}}, thus

\displaystyle  \sum_{n \in E_d} a_n = g(d) x + O( 2^{\pi(z)} ). \ \ \ \ \ (6)

Also, the number of divisors of {P(z)} is at most {2^{\pi(z)}}. From the Legendre sieve (3), we thus conclude that

\displaystyle  \pi_2(x,z) = (\sum_{d|P(z)} \mu(d) g(d) x) + O( 4^{\pi(z)} ).

We can factorise the main term to obtain

\displaystyle  \pi_2(x,z) = x \prod_{p < z} (1-\frac{\omega(p)}{p}) + O( 4^{\pi(z)} ).

This is compatible with the heuristic

\displaystyle  \pi_2(x,z) \approx x \prod_{p < z} (1-\frac{\omega(p)}{p}) \ \ \ \ \ (7)

coming from the equidistribution of residues principle (Section 3 of Supplement 4), bearing in mind (from the modified Cramér model, see Section 1 of Supplement 4) that we expect this heuristic to become inaccurate when {z} becomes very large. We can simplify the right-hand side of (7) by recalling the twin prime constant

\displaystyle  \Pi_2 := \prod_{p>2} (1 - \frac{1}{(p-1)^2}) = 0.6601618\dots

(see equation (7) from Supplement 4); note that

\displaystyle  \prod_p (1-\frac{1}{p})^{-2} (1-\frac{\omega(p)}{p}) = 2 \Pi_2

so from Mertens’ third theorem (Theorem 42 from Notes 1) one has

\displaystyle  \prod_{p < z} (1-\frac{\omega(p)}{p}) = (2\Pi_2+o(1)) \frac{1}{(e^\gamma \log z)^2} \ \ \ \ \ (8)

as {z \rightarrow \infty}. Bounding {4^{\pi(z)}} crudely by {\exp(o(z))}, we conclude in particular that

\displaystyle  \pi_2(x,z) = (2\Pi_2 +o(1)) \frac{x}{(e^\gamma \log z)^2}

when {x,z \rightarrow \infty} with {z = O(\log x)}. This is somewhat encouraging for the purposes of getting a sufficiently good answer to Problem 2 to resolve the twin prime conjecture, but note that {z} is currently far too small: one needs to get {z} as large as {\sqrt{x}} before one is counting twin primes, and currently {z} can only get as large as {\log x}.
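
For what it is worth, the heuristic (7) is easy to test numerically in this small-{z} regime; the quick sketch below (with arbitrarily chosen parameters, and not taken from the notes) compares {\pi_2(x,z)} against the main term {x \prod_{p < z} (1-\frac{\omega(p)}{p})}:

    # Compare pi_2(x, z) with the main term x * prod_{p<z} (1 - omega(p)/p)
    # for a few small sifting levels z; the two should be roughly comparable.
    def primes_below(z):
        return [p for p in range(2, z) if all(p % q for q in range(2, p))]

    def pi2(x, z):
        ps = primes_below(z)
        return sum(1 for n in range(1, x + 1)
                   if all(n % p and (n + 2) % p for p in ps))

    def main_term(x, z):
        prod = 1.0
        for p in primes_below(z):
            omega = 1 if p == 2 else 2
            prod *= 1 - omega / p
        return x * prod

    x = 10 ** 5
    for z in (5, 10, 20, 30):
        print(z, pi2(x, z), round(main_term(x, z)))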

The problem is that the number of terms in the Legendre sieve (3) basically grows exponentially in {z}, and so the error terms in (4) accumulate to an unacceptable extent once {z} is significantly larger than {\log x}. An alternative way to phrase this problem is that the estimate (4) is only expected to be truly useful in the regime {q=o(x)}; on the other hand, the moduli {d} appearing in (3) can be as large as {P}, which grows exponentially in {z} by the prime number theorem.

To resolve this problem, it is thus natural to try to truncate the Legendre sieve, in such a way that one only uses information about the sums {\sum_{n \in E_d} a_n} for a relatively small number of divisors {d} of {P}, such as those {d} which are below a certain threshold {D}. This leads to the following general sieving problem:

Problem 3 (General sieving problem) Let {P} be a squarefree natural number, and let {{\mathcal D}} be a set of divisors of {P}. For each prime {p} dividing {P}, let {E_p} be a set of integers, and define {E_d := \bigcap_{p|d} E_p} for all {d|P} (with the convention that {E_1={\bf Z}}). Suppose that {(a_n)_{n \in {\bf Z}}} is an (unknown) finitely supported sequence of non-negative reals, whose sums

\displaystyle  X_d := \sum_{n \in E_d} a_n \ \ \ \ \ (9)

are known for all {d \in {\mathcal D}}. What are the best upper and lower bounds one can conclude on the quantity (1)?

Here is a simple example of this type of problem (corresponding to the case {P = 6}, {{\mathcal D} = \{1, 2, 3\}}, {X_1 = 100}, {X_2 = 60}, and {X_3 = 10}):

Exercise 4 Let {(a_n)_{n \in {\bf Z}}} be a finitely supported sequence of non-negative reals such that {\sum_{n \in {\bf Z}} a_n = 100}, {\sum_{n \in {\bf Z}: 2|n} a_n = 60}, and {\sum_{n \in {\bf Z}: 3|n} a_n = 10}. Show that

\displaystyle  30 \leq \sum_{n \in {\bf Z}: (n,6)=1} a_n \leq 40

and give counterexamples to show that these bounds cannot be improved in general, even when {a_n} is an indicator function sequence.
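
As a quick illustration of the linear programming perspective on this exercise (the use of scipy here is my own choice, not something from the notes), one can recover the bounds {30} and {40} by optimising over the four atoms of the sigma-algebra generated by {E_2} and {E_3}:

    from scipy.optimize import linprog

    # Variables: mass m1 on n coprime to 6, m2 on n divisible by 2 but not 3,
    # m3 on n divisible by 3 but not 2, and m6 on n divisible by 6.
    # Constraints of Exercise 4: m1+m2+m3+m6 = 100, m2+m6 = 60, m3+m6 = 10.
    A_eq = [[1, 1, 1, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
    b_eq = [100, 60, 10]
    bounds = [(0, None)] * 4

    lo = linprog(c=[1, 0, 0, 0], A_eq=A_eq, b_eq=b_eq, bounds=bounds)   # minimise m1
    hi = linprog(c=[-1, 0, 0, 0], A_eq=A_eq, b_eq=b_eq, bounds=bounds)  # maximise m1
    print(lo.fun, -hi.fun)   # approximately 30.0 and 40.0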

Problem 3 is an example of a linear programming problem. By using linear programming duality (as encapsulated by results such as the Hahn-Banach theorem, the separating hyperplane theorem, or the Farkas lemma), we can rephrase the above problem in terms of upper and lower bound sieves:

Theorem 5 (Dual sieve problem) Let {P, {\mathcal D}, E_p, E_d, X_d} be as in Problem 3. We assume that Problem 3 is feasible, in the sense that there exists at least one finitely supported sequence {(a_n)_{n \in {\bf Z}}} of non-negative reals obeying the constraints in that problem. Define a (normalised) upper bound sieve to be a function {\nu^+: {\bf Z} \rightarrow {\bf R}} of the form

\displaystyle  \nu^+ = \sum_{d \in {\mathcal D}} \lambda^+_d 1_{E_d}

for some coefficients {\lambda^+_d \in {\bf R}}, and obeying the pointwise lower bound

\displaystyle  \nu^+(n) \geq 1_{n \not \in\bigcup_{p|P} E_p}(n) \ \ \ \ \ (10)

for all {n \in {\bf Z}} (in particular {\nu^+} is non-negative). Similarly, define a (normalised) lower bound sieve to be a function {\nu^-: {\bf Z} \rightarrow {\bf R}} of the form

\displaystyle  \nu^-(n) = \sum_{d \in {\mathcal D}} \lambda^-_d 1_{E_d}

for some coefficients {\lambda^-_d \in {\bf R}}, and obeying the pointwise upper bound

\displaystyle  \nu^-(n) \leq 1_{n \not \in\bigcup_{p|P} E_p}(n)

for all {n \in {\bf Z}}. Thus for instance {1} and {0} are (trivially) upper bound sieves and lower bound sieves respectively.

  • (i) The supremal value of the quantity (1), subject to the constraints in Problem 3, is equal to the infimal value of the quantity {\sum_{d \in {\mathcal D}} \lambda^+_d X_d}, as {\nu^+ = \sum_{d \in {\mathcal D}} \lambda^+_d 1_{E_d}} ranges over all upper bound sieves.
  • (ii) The infimal value of the quantity (1), subject to the constraints in Problem 3, is equal to the supremal value of the quantity {\sum_{d \in {\mathcal D}} \lambda^-_d X_d}, as {\nu^- = \sum_{d \in {\mathcal D}} \lambda^-_d 1_{E_d}} ranges over all lower bound sieves.

Proof: We prove part (i) only, and leave part (ii) as an exercise. Let {A} be the supremal value of the quantity (1) given the constraints in Problem 3, and let {B} be the infimal value of {\sum_{d \in {\mathcal D}} \lambda^+_d X_d}. We need to show that {A=B}.

We first establish the easy inequality {A \leq B}. If the sequence {a_n} obeys the constraints in Problem 3, and {\nu^+ = \sum_{d \in {\mathcal D}} \lambda^+_d 1_{E_d}} is an upper bound sieve, then

\displaystyle  \sum_n \nu^+(n) a_n = \sum_{d \in {\mathcal D}} \lambda^+_d X_d

and hence (by the non-negativity of {\nu^+} and {a_n})

\displaystyle  \sum_{n \not \in \bigcup_{p|P} E_p} a_n \leq \sum_{d \in {\mathcal D}} \lambda^+_d X_d;

taking suprema over the sequences {(a_n)_{n \in {\bf Z}}} and infima over {\nu^+}, we conclude that {A \leq B}.

Now suppose for contradiction that {A<B}, thus {A < C < B} for some real number {C}. We will argue using the hyperplane separation theorem; one can also proceed using one of the other duality results mentioned above. (See this previous blog post for some discussion of the connections between these various forms of linear duality.) Consider the affine functional

\displaystyle  \rho_0: (a_n)_{n \in{\bf Z}} \mapsto C - \sum_{n \not \in \bigcup_{p|P} E_p} a_n

on the vector space of finitely supported sequences {(a_n)_{n \in {\bf Z}}} of reals. On the one hand, since {C > A}, this functional is positive for every sequence {(a_n)_{n \in{\bf Z}}} obeying the constraints in Problem 3. Next, let {K} be the space of affine functionals {\rho} of the form

\displaystyle  \rho: (a_n)_{n \in {\bf Z}} \mapsto -\sum_{d \in {\mathcal D}} \lambda^+_d ( \sum_{n \in E_d} a_n - X_d ) + \sum_n a_n \nu(n) + X

for some real numbers {\lambda^+_d \in {\bf R}}, some non-negative function {\nu: {\bf Z} \rightarrow {\bf R}^+} which is a finite linear combination of the {1_{E_d}} for {d|P}, and some non-negative {X}. This is a closed convex cone in a finite-dimensional vector space {V}; note also that {\rho_0} lies in {V}. Suppose first that {\rho_0 \in K}, thus we have a representation of the form

\displaystyle C - \sum_{n \not \in \bigcup_{p|P} E_p} a_n = -\sum_{d \in {\mathcal D}} \lambda^+_d ( \sum_{n \in E_d} a_n - X_d ) + \sum_n a_n \nu(n) + X

for any finitely supported sequence {(a_n)_{n \in {\bf Z}}}. Comparing coefficients, we conclude that

\displaystyle  \sum_{d \in {\mathcal D}} \lambda^+_d 1_{E_d}(n) \geq 1_{n \not \in \bigcup_{p|P} E_p}

for any {n} (i.e., {\sum_{d \in {\mathcal D}} \lambda^+_d 1_{E_d}} is an upper bound sieve), and also

\displaystyle  C \geq \sum_{d \in {\mathcal D}} \lambda^+_d X_d,

and thus {C \geq B}, a contradiction. Thus {\rho_0} lies outside of {K}. But then by the hyperplane separation theorem, we can find an affine functional {\iota: V \rightarrow {\bf R}} on {V} that is non-negative on {K} and negative on {\rho_0}. By duality, such an affine functional takes the form {\iota: \rho \mapsto \rho((b_n)_{n \in {\bf Z}}) + c} for some finitely supported sequence {(b_n)_{n \in {\bf Z}}} and {c \in {\bf R}} (indeed, {(b_n)_{n \in {\bf Z}}} can be supported on a finite set consisting of a single representative for each atom of the finite {\sigma}-algebra generated by the {E_p}). Since {\iota} is non-negative on the cone {K}, we see (on testing against multiples of the functionals {(a_n)_{n \in {\bf Z}} \mapsto \sum_{n \in E_d} a_n - X_d} or {(a_n)_{n \in {\bf Z}} \mapsto a_n}) that the {b_n} and {c} are non-negative, and that {\sum_{n \in E_d} b_n - X_d = 0} for all {d \in {\mathcal D}}; thus {(b_n)_{n \in {\bf Z}}} is feasible for Problem 3. Since {\iota} is negative on {\rho_0}, we see that

\displaystyle  \sum_{n \not \in \bigcup_{p|P} E_p} b_n \geq C

and thus {A \geq C}, giving the desired contradiction. \Box

Exercise 6 Prove part (ii) of the above theorem.

Exercise 7 Show that the infima and suprema in the above theorem are actually attained (so one can replace “infimal” and “supremal” by “minimal” and “maximal” if desired).

Exercise 8 What are the optimal upper and lower bound sieves for Exercise 4?

In the case when {{\mathcal D}} consists of all the divisors of {P}, we see that the Legendre sieve {\sum_{d|P} \mu(d) 1_{E_d}} is both the optimal upper bound sieve and the optimal lower bound sieve, regardless of what the quantities {X_d} are. However, in most cases of interest, {{\mathcal D}} will only be some strict subset of the divisors of {P}, and there will be a gap between the optimal upper and lower bounds.

Observe that a sequence {(\lambda^+_d)_{d \in {\mathcal D}}} of real numbers will form an upper bound sieve {\sum_d \lambda^+_d 1_{E_d}} if one has the inequalities

\displaystyle  \lambda^+_1 \geq 1

and

\displaystyle  \sum_{d|n} \lambda^+_d \geq 0

for all {n|P}; we will refer to such sequences as upper bound sieve coefficients. (Conversely, if the sets {E_p} are in “general position” in the sense that every set of the form {\bigcap_{p|n} E_p \backslash \bigcup_{p|P; p\not | n} E_p} for {n|P} is non-empty, we see that every upper bound sieve arises from a sequence of upper bound sieve coefficients.) Similarly, a sequence {(\lambda^-_d)_{d \in {\mathcal D}}} of real numbers will form a lower bound sieve {\sum_d \lambda^-_d 1_{E_d}} if one has the inequalities

\displaystyle  \lambda^-_1 \leq 1

and

\displaystyle  \sum_{d|n} \lambda^-_d \leq 0

for all {n|P} with {n>1}; we will refer to such sequences as lower bound sieve coefficients.

Exercise 9 (Brun pure sieve) Let {P} be a squarefree number, and {k} a non-negative integer. Show that the sequence {(\lambda_d)_{d | P}} defined by

\displaystyle  \lambda_d := 1_{\omega(d) \leq k} \mu(d),

where (in contrast with the rest of this set of notes) {\omega(d)} denotes the number of prime factors of {d}, is a sequence of upper bound sieve coefficients for even {k}, and a sequence of lower bound sieve coefficients for odd {k}. Deduce the Bonferroni inequalities

\displaystyle  \sum_{n \in {\bf Z}} a_n 1_{n \not \in \bigcup_{p | P} E_p} \leq \sum_{d|P: \omega(d) \leq k} \mu(d) X_d \ \ \ \ \ (11)

when {k} is even, and

\displaystyle  \sum_{n \in {\bf Z}} a_n 1_{n \not \in \bigcup_{p | P} E_p} \geq \sum_{d|P: \omega(d) \leq k} \mu(d) X_d \ \ \ \ \ (12)

when {k} is odd, whenever one is in the situation of Problem 3 (and {{\mathcal D}} contains all {d|P} with {\omega(d) \leq k}). The resulting upper and lower bound sieves are sometimes known as Brun pure sieves. The Legendre sieve can be viewed as the limiting case when {k \geq \omega(P)}.
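
The coefficient conditions in this exercise come down to the elementary fact that the partial alternating sums {\sum_{j=0}^k (-1)^j \binom{m}{j} = (-1)^k \binom{m-1}{k}} (valid for {0 \leq k < m}; the sum vanishes for {k \geq m}) have the sign of {(-1)^k}. Here is a throwaway numerical confirmation (mine, not from the notes) for small parameters:

    from math import comb

    # For squarefree n | P with omega(n) = m >= 1, the Brun coefficients give
    # sum_{d|n} lambda_d = sum_{j <= min(k,m)} (-1)^j C(m, j), which should be
    # >= 0 for even k and <= 0 for odd k (and lambda_1 = 1 in all cases).
    for m in range(1, 16):
        for k in range(0, 16):
            s = sum((-1) ** j * comb(m, j) for j in range(0, min(k, m) + 1))
            assert (s >= 0) if k % 2 == 0 else (s <= 0), (m, k, s)
    print("Brun pure sieve coefficient inequalities hold for all m, k < 16")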

In many applications the sums {X_d} in (9) take the form

\displaystyle  \sum_{n \in E_d} a_n = g(d) X + r_d \ \ \ \ \ (13)

for some quantity {X} independent of {d}, some multiplicative function {g} with {0 \leq g(p) \leq 1}, and some remainder term {r_d} whose effect is expected to be negligible on average if {d} is restricted to be small, e.g. less than a threshold {D}; note for instance that (5) is of this form if {D \leq x^{1-\varepsilon}} for some fixed {\varepsilon>0} (note from the divisor bound, Lemma 23 of Notes 1, that {\prod_{p|d} \omega(p) \ll x^{o(1)}} if {d \ll x^{O(1)}}). We are thus led to the following idealisation of the sieving problem, in which the remainder terms {r_d} are ignored:

Problem 10 (Idealised sieving) Let {z, D \geq 1} (we refer to {z} as the sifting level and {D} as the level of distribution), let {g} be a multiplicative function with {0 \leq g(p) \leq 1}, and let {{\mathcal D} := \{ d|P(z): d \leq D \}}. How small can one make the quantity

\displaystyle  \sum_{d \in {\mathcal D}} \lambda^+_d g(d) \ \ \ \ \ (14)

for a sequence {(\lambda^+_d)_{d \in {\mathcal D}}} of upper bound sieve coefficients, and how large can one make the quantity

\displaystyle  \sum_{d \in {\mathcal D}} \lambda^-_d g(d) \ \ \ \ \ (15)

for a sequence {(\lambda^-_d)_{d \in {\mathcal D}}} of lower bound sieve coefficients?

Thus, for instance, the trivial upper bound sieve {\lambda^+_d := 1_{d=1}} and the trivial lower bound sieve {\lambda^-_d := 0} show that (14) can equal {1} and (15) can equal {0}. Of course, one hopes to do better than these trivial bounds in many situations; usually one can improve the upper bound quite substantially, but improving the lower bound is significantly more difficult, particularly when {z} is large compared with {D}.

If the remainder terms {r_d} in (13) are indeed negligible on average for {d \leq D}, then one expects the upper and lower bounds in Problem 3 to essentially be the optimal bounds in (14) and (15) respectively, multiplied by the normalisation factor {X}. Thus Problem 10 serves as a good model problem for Problem 3, in which all the arithmetic content of the original sieving problem has been abstracted into two parameters {z,D} and a multiplicative function {g}. In many applications, {g(p)} will be approximately {\kappa/p} on the average for some fixed {\kappa>0}, known as the sieve dimension; for instance, in the twin prime sieving problem discussed above, the sieve dimension is {2}. The larger one makes the level of distribution {D} compared to {z}, the more choices one has for the upper and lower bound sieves; it is thus of interest to obtain equidistribution estimates such as (13) for {d} as large as possible. When the sequence {a_n} is of arithmetic origin (for instance, if it is the von Mangoldt function {\Lambda}), then estimates such as the Bombieri-Vinogradov theorem, Theorem 17 from Notes 3, turn out to be particularly useful in this regard; in other contexts, the required equidistribution estimates might come from other sources, such as homogeneous dynamics, or the theory of expander graphs (the latter arises in the recent theory of the affine sieve, discussed in this previous blog post). However, the sieve-theoretic tools developed in this post are not particularly sensitive to how a certain level of distribution is attained, and are generally content to use sieve axioms such as (13) as “black boxes”.

In some applications one needs to modify Problem 10 in various technical ways (e.g. in altering the product {P(z)}, the set {{\mathcal D}}, or the definition of an upper or lower sieve coefficient sequence), but to simplify the exposition we will focus on the above problem without such alterations.

As the exercise below (or the heuristic (7)) suggests, the “natural” size of (14) and (15) is given by the quantity {V(z) := \prod_{p < z} (1 - g(p))} (so that the natural size for Problem 3 is {V(z) X}):

Exercise 11 Let {z,D,g} be as in Problem 10, and set {V(z) := \prod_{p < z} (1 - g(p))}.

  • (i) Show that the quantity (14) is always at least {V(z)} when {(\lambda^+_d)_{d \in {\mathcal D}}} is a sequence of upper bound sieve coefficients. Similarly, show that the quantity (15) is always at most {V(z)} when {(\lambda^-_d)_{d \in {\mathcal D}}} is a sequence of lower bound sieve coefficients. (Hint: compute the expected value of {\sum_{d|n} \lambda^\pm_d} when {n} is a random factor of {P(z)} chosen according to a certain probability distribution depending on {g}.)
  • (ii) Show that (14) and (15) can both attain the value of {V(z)} when {D \geq P(z)}. (Hint: translate the Legendre sieve to this setting.)
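
For a concrete feel for part (i), one can compare {V(z)} with the values of (14), (15) attained by the Brun pure sieve coefficients from Exercise 9; in the quick sketch below (mine; the choices {g(2)=1/2}, {g(p)=2/p} for odd {p}, and {z=30} are arbitrary) these values oscillate around {V(z)}, lying above it for even {k} and below it for odd {k}:

    from itertools import combinations

    # Compare V(z) = prod_{p<z} (1 - g(p)) with the Brun pure sieve values
    # sum_{d | P(z): omega(d) <= k} mu(d) g(d), which by Exercises 9 and 11(i)
    # lie above V(z) for even k and below V(z) for odd k.
    primes = [p for p in range(2, 30) if all(p % q for q in range(2, p))]
    g = {p: (0.5 if p == 2 else 2.0 / p) for p in primes}

    V = 1.0
    for p in primes:
        V *= 1 - g[p]

    def subset_sum(r):
        # sum of g(d) over squarefree d | P(z) with exactly r prime factors
        total = 0.0
        for c in combinations(primes, r):
            prod = 1.0
            for p in c:
                prod *= g[p]
            total += prod
        return total

    for k in range(len(primes) + 1):
        brun = sum((-1) ** r * subset_sum(r) for r in range(k + 1))
        assert (brun >= V - 1e-9) if k % 2 == 0 else (brun <= V + 1e-9)
        print(k, round(brun, 6), round(V, 6))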

The problem of finding good sequences of upper and lower bound sieve coefficients in order to solve problems such as Problem 10 is one of the core objectives of sieve theory, and has been intensively studied. This is more of an optimisation problem rather than a genuinely number theoretic problem; however, the optimisation problem is sufficiently complicated that it has not been solved exactly or even asymptotically, except in a few special cases. (It can be reduced to an optimisation problem involving multilinear integrals of certain unknown functions of several variables, but this problem is rather difficult to analyse further; see these lecture notes of Selberg for further discussion.) But while we do not yet have a definitive solution to this problem in general, we do have a number of good general-purpose upper and lower bound sieve coefficients that give fairly good values for (14), (15), often coming within a constant factor of the idealised value {V(z)}, and which work well for sifting levels {z} as large as a small power of the level of distribution {D}. Unfortunately, we also know of an important limitation to the sieve, known as the parity problem, that prevents one from taking {z} as large as {D^{1/2}} while still obtaining non-trivial lower bounds; as a consequence, sieve theory is not able, on its own, to sift out primes for such purposes as establishing the twin prime conjecture. However, it is still possible to use these sieves, in conjunction with additional tools, to produce various types of primes or prime patterns in some cases; examples of this include the theorem of Ben Green and myself in which an upper bound sieve is used to demonstrate the existence of primes in arbitrarily long arithmetic progressions, or the more recent theorem of Zhang in which an upper bound sieve (among other things) was used to demonstrate the existence of infinitely many pairs of primes whose difference was bounded. In such arguments, the upper bound sieve was used not so much to count the primes or prime patterns directly, but to serve instead as a sort of “container” to efficiently envelop such prime patterns; when used in such a manner, the upper bound sieves are sometimes known as enveloping sieves. If the original sequence was supported on primes, then the enveloping sieve can be viewed as a “smoothed out indicator function” that is concentrated on almost primes, which in this context refers to numbers with no small prime factors.

In a somewhat different direction, it can be possible in some cases to break the parity barrier by assuming additional equidistribution axioms on the sequence {a_n} than just (13), in particular controlling certain bilinear sums involving {a_{nm}} rather than just linear sums of the {a_n}. This approach was in particular pursued by Friedlander and Iwaniec, leading to their theorem that there are infinitely many primes of the form {n^2+m^4}.

The study of sieves is an immense topic; see for instance the recent 527-page text by Friedlander and Iwaniec. We will limit attention to two sieves which give good general-purpose results, if not necessarily the most optimal ones:

  • (i) The beta sieve (or Rosser-Iwaniec sieve), which is a modification of the classical combinatorial sieve of Brun. (A collection of sieve coefficients {\lambda_d^{\pm}} is called combinatorial if its coefficients lie in {\{-1,0,+1\}}.) The beta sieve is a family of upper and lower bound combinatorial sieves, and is particularly useful for efficiently sieving out all primes up to a parameter {z = x^{1/u}} from a set of integers of size {x}, in the regime where {u} is moderately large, leading to what is sometimes known as the fundamental lemma of sieve theory.
  • (ii) The Selberg upper bound sieve, which is a general-purpose sieve that can serve both as an upper bound sieve for classical sieving problems, as well as an enveloping sieve for sets such as the primes. (One can also convert the Selberg upper bound sieve into a lower bound sieve in a number of ways, but we will only touch upon this briefly.) A key advantage of the Selberg sieve is that, due to the “quadratic” nature of the sieve, the difficult optimisation problem in Problem 10 is replaced with a much more tractable quadratic optimisation problem, which can often be solved exactly.

Remark 12 It is possible to compose two sieves together, for instance by using the observation that the product of two upper bound sieves is again an upper bound sieve, or that the product of an upper bound sieve and a lower bound sieve is a lower bound sieve. Such a composition of sieves is useful in some applications, for instance if one wants to apply the fundamental lemma as a “preliminary sieve” to sieve out small primes, but then use a more precise sieve like the Selberg sieve to sieve out medium primes. We will see an example of this in later notes, when we discuss the linear beta-sieve.

We will also briefly present the (arithmetic) large sieve, which gives a rather different approach to Problem 3 in the case that each {E_p} consists of some number (typically a large number) of residue classes modulo {p}, and is powered by the (analytic) large sieve inequality of the preceding section. As an application of these methods, we will utilise the Selberg upper bound sieve as an enveloping sieve to establish Zhang’s theorem on bounded gaps between primes. Finally, we give an informal discussion of the parity barrier which gives some heuristic limitations on what sieve theory is able to accomplish with regards to counting prime patterns such as twin primes.

These notes are only an introduction to the vast topic of sieve theory; more detailed discussion can be found in the Friedlander-Iwaniec text, in these lecture notes of Selberg, and in many further texts.

— 1. Combinatorial sieving and the fundamental lemma of sieve theory —

We begin with a discussion of combinatorial upper and lower bound sieves, in which the sieve coefficients {\lambda^\pm_d} take values in {\{-1,0,+1\}}. These sieves, first introduced by Brun, can be viewed as truncations of the Legendre sieve (which corresponds to the choice of coefficients {\lambda^\pm_d = \mu(d)} for all {d|P(z)}). The Legendre sieve, and more generally the Brun pure sieves in Exercise 9, give basic examples of combinatorial sieves.

We loosely follow the discussion of Friedlander and Iwaniec. The combinatorial sieves may be motivated by observing the Buchstab identity

\displaystyle  1_{n \not \in \bigcup_{p < z} E_p} = 1 - \sum_{p_1 < z} 1_{E_{p_1}}(n) 1_{n \not \in \bigcup_{p < p_1} E_p} \ \ \ \ \ (16)

for any {n \in {\bf Z}} and any collection of sets {E_p \subset {\bf Z}} for {p < z}. This identity reflects the basic fact that if {n} does lie in {\bigcup_{p < z} E_p}, then there is a unique prime {p_1 < z} such that {n} lies in {E_{p_1}}, but does not lie in {E_p} for any {p < p_1}.

There is an analogous identity for the function {g}:

Exercise 13 For any arithmetic function {g}, show that

\displaystyle  \prod_{p<z} (1-g(p)) = 1 - \sum_{p_1 < z} g(p_1) \prod_{p<p_1} (1-g(p)).

How is this identity related to (16)?

The Buchstab identity (16) allows one to convert upper bound sieves into lower bound sieves and vice versa. Indeed, if {\sum_{d|P(p_1)} \lambda^{-,p_1}_d 1_{E_d}} is a family of lower bound sieves for each {p_1 < z}, then (16) tells us that

\displaystyle  1 - \sum_{p_1 < z} \sum_{d|P(p_1)} \lambda^{-,p_1}_d 1_{E_{p_1 d}}

is an upper bound sieve; similarly, if {\sum_{d|P(p_1)} \lambda^{+,p_1}_d 1_{E_d}} is a family of upper bound sieves for each {p_1 < z}, then

\displaystyle  1 - \sum_{p_1 < z} \sum_{d|P(p_1)} \lambda^{+,p_1}_d 1_{E_{p_1 d}}

is a lower bound sieve. In terms of sieve coefficients, we see that if {(\lambda^{-,p_1}_d)_{d | P(p_1)}} are lower bound sieve coefficients for each {p_1 < z}, then

\displaystyle  \left( 1_{d=1} - 1_{d > 1} \lambda^{-,p^*(d)}_{d/p^*(d)} \right)_{d|P(z)}

is a sequence of upper bound sieve coefficients (where {p^*(d)} denotes the largest prime factor of {d}). Similarly, if {(\lambda^{+,p_1}_d)_{d | P(p_1)}} are upper bound sieve coefficients for each {p_1 < z}, then

\displaystyle  \left( 1_{d=1} - 1_{d > 1} \lambda^{+,p^*(d)}_{d/p^*(d)} \right)_{d|P(z)}

is a sequence of lower bound sieve coefficients.

One can iterate this procedure to produce an alternating sequence of upper and lower bound sieves. If one iterates out the Buchstab formula completely, one ends up back at the Legendre sieve. However, one can truncate this iteration by using the trivial lower bound sieve of {0} for some of the primes {p_1}. For instance, suppose one seeks an upper bound for {1_{n \not \in \bigcup_{p < z} E_p}}. Applying (16), we retain only some of the summands, namely those {p_1} which obey some predicate {A_1(p_1)} to be chosen later. For the remaining summands, we use the trivial lower bound sieve of {0}, giving

\displaystyle  1_{n \not \in \bigcup_{p < z} E_p} \leq 1 - \sum_{p_1 < z: A_1(p_1)} 1_{E_{p_1}}(n) 1_{n \not \in \bigcup_{p < p_1} E_p}.

For the surviving summands, we apply (16) again. With the sign change, the trivial lower bound sieve is not applicable, so we do not discard any further summands and arrive at

\displaystyle  1_{n \not \in \bigcup_{p < z} E_p} \leq 1 - \sum_{p_1 < z: A_1(p_1)} 1_{E_{p_1}}(n)

\displaystyle  + \sum_{p_2 < p_1 < z: A_1(p_1)} 1_{E_{p_1 p_2}}(n) 1_{n \not \in \bigcup_{p < p_2} E_p}.

For the summands in the second sum on the right, we apply (16) once again. We once again have a favourable sign and can use the trivial lower bound sieve of {0} to discard some of the resulting summands, keeping only those triples {p_1,p_2,p_3} that obey some predicate {A_3(p_1,p_2,p_3)}, giving

\displaystyle  1_{n \not \in \bigcup_{p < z} E_p} \leq 1 - \sum_{p_1 < z: A_1(p_1)} 1_{E_{p_1}}(n)

\displaystyle  + \sum_{p_2 < p_1 < z: A_1(p_1)} 1_{E_{p_1 p_2}}(n)

\displaystyle  - \sum_{p_3 < p_2 < p_1 < z: A_1(p_1) \wedge A_3(p_1,p_2,p_3)} 1_{E_{p_1 p_2 p_3}}(n) 1_{n \not \in \bigcup_{p < p_3} E_p}.

Since one cannot have an infinite descent of primes, this process will eventually terminate to produce an upper bound sieve. A similar argument (but with the signs reversed) gives a lower bound sieve. We formalise this discussion as follows:

Proposition 14 (General combinatorial sieve) Let {z > 0}. For each natural number {r}, let {A_r(p_1,\dots,p_r)} be a predicate pertaining to a sequence {z > p_1 > \dots > p_r} of {r} decreasing primes, thus {A_r(p_1,\dots,p_r)} is either true or false for a given choice of such primes. Let {{\mathcal D}_+} (resp. {{\mathcal D}_-}) denote the set of {d|P(z)} which, when factored as {d = p_1 \dots p_m} for {z > p_1 > \dots > p_m}, are such that {A_r(p_1,\dots,p_r)} holds for all odd (resp. even) {1 \leq r \leq m}. Thus

\displaystyle  {\mathcal D}_+ := \{1\} \cup \{ p_1 < z: A_1(p_1) \} \cup \{ p_1 p_2: p_2 < p_1 < z; A_1(p_1)\}

\displaystyle  \cup \{p_1 p_2 p_3: p_3 < p_2 < p_1 < z; A_1(p_1) \wedge A_3(p_1,p_2,p_3) \}

\displaystyle  \cup \{p_1 p_2 p_3 p_4: p_4 < p_3 < p_2 < p_1 < z; A_1(p_1) \wedge A_3(p_1,p_2,p_3) \}

\displaystyle  \cup \dots

and

\displaystyle  {\mathcal D}_- := \{1\} \cup \{ p_1 < z \} \cup \{ p_1 p_2: p_2 < p_1 < z; A_2(p_1,p_2)\}

\displaystyle  \cup \{p_1 p_2 p_3: p_3 < p_2 < p_1 < z; A_2(p_1,p_2) \}

\displaystyle  \cup \{p_1 p_2 p_3 p_4: p_4 < p_3 < p_2 < p_1 < z; A_2(p_1,p_2) \wedge A_4(p_1,p_2,p_3,p_4) \}

\displaystyle  \cup \dots.

  • (i) For any sets {E_d \subset {\bf Z}}, {\sum_{d \in {\mathcal D}_+} \mu(d) 1_{E_d}} is an upper bound sieve, and {\sum_{d \in {\mathcal D}_-} \mu(d) 1_{E_d}} is a lower bound sieve. In fact, we have the more precise identities

    \displaystyle  \sum_{d \in {\mathcal D}_+} \mu(d) 1_{E_d}(n) = 1_{n \not \in \bigcup_{p<z} E_p} + \sum_{r \hbox{ odd}} \sum_{d \in {\mathcal E}_r} 1_{n \in E_d} 1_{n \not \in \bigcup_{p<p_*(d)} E_p}

    and

    \displaystyle  \sum_{d \in {\mathcal D}_-} \mu(d) 1_{E_d}(n) = 1_{n \not \in \bigcup_{p<z} E_p} - \sum_{r \hbox{ even}} \sum_{d \in {\mathcal E}_r} 1_{n \in E_d} 1_{n \not \in \bigcup_{p<p_*(d)} E_p}

    where {p_*(d)} denotes the least prime factor of {d} and, for each natural number {r}, {{\mathcal E}_r} is the collection of products {p_1 \dots p_r} with {z > p_1 > \dots > p_r} such that {A_{r'}(p_1,\dots,p_{r'})} holds for all {r' < r} with the same parity as {r}, but fails for {r'=r}. Thus for instance

    \displaystyle  {\mathcal E}_3 = \{ p_1 p_2 p_3: z > p_1 > p_2 > p_3; A_1(p_1) \wedge \neg A_3(p_1,p_2,p_3) \}

    and

    \displaystyle  {\mathcal E}_4 = \{ p_1 p_2 p_3 p_4: z > p_1 > p_2 > p_3 > p_4;

    \displaystyle  A_2(p_1,p_2) \wedge \neg A_4(p_1,p_2,p_3,p_4) \}.

  • (ii) The sequences {(\mu(d) 1_{{\mathcal D}_+}(d))_{d|P(z)}} and {(\mu(d) 1_{{\mathcal D}_-}(d))_{d|P(z)}} are upper and lower sieve coefficient sequences respectively.
  • (iii) For any multiplicative function {g}, one has the identities

    \displaystyle  \sum_{d \in {\mathcal D}_+} \mu(d) g(d) = V(z) + \sum_{r \hbox{ odd}} \sum_{d \in {\mathcal E}_r} g(d) V( p_*(d) ) \ \ \ \ \ (17)

    and

    \displaystyle  \sum_{d \in {\mathcal D}_-} \mu(d) g(d) = V(z) - \sum_{r \hbox{ even}} \sum_{d \in {\mathcal E}_r} g(d) V( p_*(d) ) \ \ \ \ \ (18)

    where {V(w) := \prod_{p<w} (1-g(p))}.

Note that the Legendre sieve is the special case of the combinatorial sieve when all the predicates {A_r(p_1,\dots,p_r)} are true for all choices of inputs {p_1,\dots,p_r} (i.e. one does not ever utilise the trivial lower bound sieve); more generally, the Brun pure sieve with parameter {k} corresponds to the case when {A_r(p_1,\dots,p_r)} is always true for {r \leq k} and always false for {r>k}. The predicates {A_r} for odd {r} are only used for the upper bound sieve and not the lower bound sieve; conversely, the predicates {A_r} for even {r} are only used for the lower bound sieve and not the upper bound sieve.
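
Before turning to the proof, it may be reassuring to test part (i) numerically in the “general position” model discussed earlier, in which an integer {n} is identified with the set {S} of primes {p < z} with {n \in E_p}, so that {1_{E_d}(n) = 1} exactly when every prime factor of {d} lies in {S}. The sketch below (mine; the predicates {A_r} are chosen at random and {z = 12}) checks the upper and lower bound sieve properties on every atom:

    from itertools import combinations
    import random

    random.seed(0)
    primes = [2, 3, 5, 7, 11]                # the primes p < z for z = 12
    A = {}

    def predicate(ps):                       # A_r(p_1,...,p_r), cached at random
        return A.setdefault(ps, random.random() < 0.5)

    def in_Dpm(ps, parity):                  # parity 1 for D_+, 0 for D_-
        return all(predicate(ps[:r]) for r in range(1, len(ps) + 1)
                   if r % 2 == parity)

    # every divisor of P(z), stored as a decreasing tuple of primes
    divisors = [tuple(sorted(c, reverse=True))
                for r in range(len(primes) + 1)
                for c in combinations(primes, r)]

    for S in (set(c) for r in range(len(primes) + 1)
              for c in combinations(primes, r)):
        sifted = 1 if not S else 0           # 1_{n not in any E_p}
        upper = sum((-1) ** len(d) for d in divisors
                    if set(d) <= S and in_Dpm(d, 1))
        lower = sum((-1) ** len(d) for d in divisors
                    if set(d) <= S and in_Dpm(d, 0))
        assert lower <= sifted <= upper, (S, lower, sifted, upper)
    print("upper and lower bound sieve properties verified on all atoms")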

Exercise 15 Prove Proposition 14.

Exercise 16 Interpret the sieves in Exercise 8 as combinatorial sieves. What are the predicates {A_1, A_2} in this case?

We now use the above proposition to attack Problem 10 for a given choice of parameters {z} and {D}; we will focus on the regime when

\displaystyle  z = D^{1/s} \ \ \ \ \ (19)

for some {s \geq 1}, thus we are sifting out primes up to a small power of the sieve level {D}. To do this, one needs to select the predicates {A_r(p_1,\dots,p_r)} in such a fashion that the sets {{\mathcal D}_\pm} only consist of divisors {d|P(z)} that are less than or equal to {D}. There are a number of ways to achieve this, but experience has shown that a good general-purpose choice in this regard is to define {A_r(p_1,\dots,p_r)} to be the predicate

\displaystyle  p_1 \dots p_{r-1} p_r^{\beta+1} \leq D \ \ \ \ \ (20)

for some parameter

\displaystyle  1 \leq \beta \leq s \ \ \ \ \ (21)

to be chosen later. The combinatorial sieve with this choice of predicate is known as the beta sieve or Rosser-Iwaniec sieve at sieve level {D}. Observe that if {p_1 \dots p_r} lies in {{\mathcal D}_\pm} for some {r \geq 2} and {z > p_1 > \dots > p_r}, then at least one of {p_1 \dots p_{r-1} p_r^{\beta+1} \leq D} or {p_1 \dots p_{r-2} p_{r-1}^{\beta+1} \leq D} will hold, and hence {p_1 \dots p_r \leq D} since {\beta \geq 1}. If {r \leq 1}, we have the same conclusion thanks to (19) since {s \geq 1}. Thus under the hypotheses (21), (19), the beta sieves are indeed upper and lower bound sieves at level {D}.

Now we estimate the quantities (14), (15) for the beta sieve coefficients {\lambda^\pm_d := \mu(d) 1_{{\mathcal D}_\pm}(d)} for some multiplicative function {g} with {0 \leq g(p) \leq 1} for all {p}. In order to actually compute asymptotics, we assume the Mertens-like axiom

\displaystyle  V(w) \ll_\kappa \left(\frac{\log z}{\log w}\right)^\kappa V(z) \ \ \ \ \ (22)

whenever {2 \leq w \leq z}, where {V(w) := \prod_{p < w} (1 - g(p))} and {\kappa>0} is fixed; we refer to {\kappa} as the sieve dimension. From Mertens’ theorems, we see that these axioms will in particular be obeyed if

\displaystyle  g(p) \leq \frac{\kappa}{p} + O_\kappa(\frac{1}{p^2})

and

\displaystyle  g(p) \leq 1-c

for all primes {p} and some fixed {c>0} depending only on {\kappa}.

Let {g} be as above. From (17), (18), (22) we see that the quantities (14), (15) are both of the form

\displaystyle  V(z) \left( 1 + O_\kappa \left( \sum_r \sum_{d \in {\mathcal E}_r} g(d) \left( \frac{\log z}{\log p_*(d)} \right)^\kappa \right)\right). \ \ \ \ \ (23)

We now estimate the error term; our estimates will be somewhat crude in order to simplify the calculations, but this will already suffice for many applications. First observe that if {z > p_1 > \dots > p_r} then

\displaystyle  p_1 \dots p_{r-1} p_r^{\beta+1} \leq z^{r+\beta} = D^{(r+\beta)/s}.

We conclude that the condition {A_r(p_1,\dots,p_r)} is automatically satisfied when {r+\beta \leq s}, and so the {r} summand in (23) can be restricted to the range {r > s - \beta}.

Next, note that if {d = p_1 \dots p_r \in {\mathcal E}_r} for some {z > p_1 > \dots > p_r}, then from (20) we have

\displaystyle  p_1 \dots p_{r'-1} p_{r'}^{\beta+1} \leq D

for all {1 \leq r' < r} with the same parity as {r}, which implies (somewhat crudely) that

\displaystyle  p_1 \dots p_{r'-1} p_{r'}^{\beta} \leq D

for all {1 \leq r' < r}, regardless of parity (note that the {r'=1} case follows since {p_1^{\beta} \leq z^{\beta} = D^{\beta / s} \leq D}). We rearrange this inequality as

\displaystyle  \frac{D}{p_1 \dots p_{r'}} \geq \left( \frac{D}{p_1 \dots p_{r'-1}}\right)^{\frac{\beta-1}{\beta}}

which iterates to give

\displaystyle  \frac{D}{p_1 \dots p_{r-1}} \geq D^{(\frac{\beta-1}{\beta})^{r-1}}.

On the other hand, from the definition of {{\mathcal E}_r} we have

\displaystyle  p_1 \dots p_{r-1} p_r^{\beta+1} > D

which leads to a lower bound on {p_r = p_*(d)}:

\displaystyle  p_r > D^{(\frac{\beta-1}{\beta})^{r-1} / (\beta+1)}

\displaystyle  > D^{(\frac{\beta-1}{\beta})^{r} / \beta}

\displaystyle  > z^{(\frac{\beta-1}{\beta})^{r}}

so in particular

\displaystyle  \frac{\log z}{\log p_*(d)} < \left(\frac{\beta}{\beta-1}\right)^r.

We can thus estimate (23) somewhat crudely as

\displaystyle  V(z) \left( 1 + O_\kappa \left( \sum_{r \geq s-\beta} \sum_{z > p_1 > \dots > p_r > z^{(\frac{\beta-1}{\beta})^{r}}} g(p_1) \dots g(p_r) \left(\frac{\beta}{\beta-1}\right)^{r\kappa} \right)\right).

We can estimate

\displaystyle  \sum_{z > p_1 > \dots > p_r > z^{(\frac{\beta-1}{\beta})^{r}}} g(p_1) \dots g(p_r) \leq \frac{1}{r!} (\sum_{z > p \geq z^{(\frac{\beta-1}{\beta})^{r}}} g(p))^r

\displaystyle  \leq \frac{1}{r!} (\log \prod_{z > p \geq z^{(\frac{\beta-1}{\beta})^{r}}} (1-g(p))^{-1})^r

\displaystyle  = \frac{1}{r!} ( \log( V(z^{(\frac{\beta-1}{\beta})^{r}}) / V(z) ) )^r

\displaystyle  \ll \frac{1}{r!} ( \kappa r \log \frac{\beta}{\beta-1} + O(1) )^r

\displaystyle  \ll ( \kappa e \log \frac{\beta}{\beta-1} + O(1/r) )^r

thanks to (22) and the crude bound {\frac{r^r}{r!} \leq e^r} coming from the Taylor expansion of {e^r}. If we choose {\beta} large enough so that

\displaystyle  \kappa e (\log \frac{\beta}{\beta-1}) (\frac{\beta}{\beta-1})^\kappa \leq \frac{1}{e} \ \ \ \ \ (24)

(note the left-hand side decays like {\kappa/\beta} as {\beta \rightarrow \infty}), we thus can estimate (23) by

\displaystyle  V(z) \left( 1 + O_\kappa \left( \sum_{r \geq s-\beta} (\frac{1}{e} + O(1/r))^r \right)\right)

which sums to

\displaystyle  V(z) ( 1 + O_{\kappa,\beta}( e^{-s} ) ).

We conclude

Lemma 17 (Fundamental lemma of sieve theory) Let {\kappa > 0}. If {z = D^{1/s}} for some {D,s \geq 1}, then there exist combinatorial upper and lower sieve coefficients {(\lambda^\pm_d)_{d \in {\mathcal D}}} supported in {{\mathcal D} := \{ d | P(z): d \leq D \}} such that

\displaystyle  \sum_{d \in {\mathcal D}} \lambda^\pm_d g(d) = V(z) ( 1 + O_{\kappa}( e^{-s} ) ) \ \ \ \ \ (25)

for any multiplicative function {g} with {0 \leq g(p) < 1} for all primes {p}, obeying the bounds (22).

Informally, the fundamental lemma shows that one can get exponentially close to the Legendre sieve limit of {V(z)} as long as the sieve level {D} is a large power of the sifting range {z}.

Proof: Without loss of generality we may assume that {s} is sufficiently large depending on {\kappa}, since for small {s} we may simply reduce {z} (for the purposes of the upper bound sieve) or use the trivial lower bound sieve {0}. But then we may find {\beta = \beta(\kappa)} obeying (24) and less than {s}, and the claim follows. \Box

Exercise 18 Improve the {O_\kappa( e^{-s} )} error in (25) to {O_{\kappa,\varepsilon}( e^{-(1-\varepsilon) s \log s} )} for any {\varepsilon>0}. (It is possible to show that this error term is essentially best possible, at least in the linear dimension case {\kappa = 1}, by using asymptotics for rough numbers (Exercise 28 of Supplement 4), but we will not do so here.)

The fundamental lemma is often used as a preliminary sifting device to remove small primes, in preparation for a more sophisticated sieve to deal with medium primes. But one can of course use it directly. Indeed, thanks to Theorem 5, we now have an adequate answer to Problem 3 in the regime where the sieve dimension {\kappa} is fixed and the sifting level is small compared to the level of distribution:

Corollary 19 (Fundamental lemma, applied directly) Let {\kappa > 0}, let {z = D^{1/s}} for some {s, D \geq 1}, and suppose that {g} is a multiplicative function with {0 \leq g(p) < 1} obeying the bounds (22) for all {2 \leq w \leq z}, where {V(w) := \prod_{p<w} (1-g(p))}. For each {p \leq z}, let {E_p} be a set of integers, and let {(a_n)_{n \in {\bf Z}}} be a finitely supported sequence of non-negative reals such that

\displaystyle  \sum_{n \in E_d} a_n = X g(d) + r_d

for all square-free {d \leq D}, where {X > 0} and {r_d \in {\bf R}}, and {E_d := \bigcap_{p|d} E_p}. Then

\displaystyle  \sum_{n \not \in \bigcup_{p \leq z} E_p} a_n = (1 + O_\kappa(e^{-s})) X V(z) + O( \sum_{d\leq D: \mu^2(d)=1} |r_d| ).

We now specialise to the twin prime problem in Problem 2. Here, {X = x}, and {g(p)=\omega(p)/p} where {\omega(p)=2} for odd {p}, with {\omega(2)=1}. In particular we may take {\kappa = 2}. We set {z := x^{1/u}} for some fixed {u > 1}, and {D = x^{1-o(1)}} for some quantity {o(1)} going to zero sufficiently slowly, so that {s = (1-o(1)) u}. From (8) we have

\displaystyle  V(z) = (1+o(1)) \frac{u^2}{e^{2\gamma}} \frac{2\Pi_2}{\log^2 x}

and from (5) and the divisor bound we have

\displaystyle  |r_d| \ll \tau(d) \ll x^{o(1)}

for any {d \leq D}, as {x \rightarrow \infty}. From the fundamental lemma, we conclude that

\displaystyle  \pi_2(x, x^{1/u}) = \frac{u^2}{e^{2\gamma}} (1 + O( e^{-u} ) + o(1)) \frac{2\Pi_2 x}{\log^2 x} + O( x^{o(1)} D ).

If we let the {o(1)} term in {D = x^{1-o(1)}} decay sufficiently slowly, then the final error term in the above expression can be absorbed into the second error term, and so

\displaystyle  \pi_2(x, x^{1/u}) = \frac{u^2}{e^{2\gamma}} (1 + O( e^{-u} ) + o(1)) \frac{2\Pi_2 x}{\log^2 x}. \ \ \ \ \ (26)

Setting {u=2}, we conclude in particular that

\displaystyle  \pi_2(x, x^{1/2}) \ll \frac{x}{\log^2 x}

which implies from the sieve of Eratosthenes that there are {O( \frac{x}{\log^2 x} )} pairs of twin primes in {[x/2,x]}. This implies a famous theorem of Brun:

Exercise 20 (Brun’s theorem) Show that the sum {\sum_{p: p+2 \in {\mathcal P}} \frac{1}{p}} is convergent.

Alternatively, if one applies (26) with {u} a sufficiently large natural number, we see that

\displaystyle  \pi_2(x,x^{1/u}) \gg \frac{x}{\log^2 x}

for all sufficiently large {x}. This implies that there are {\gg \frac{x}{\log^2 x}} numbers {n} in {[1,x]} with the property that {n, n+2} are both not divisible by any prime less than or equal to {x^{1/u}}. In particular (if {n \leq x-2}), {n} and {n+2} both have at most {u} prime factors. We conclude a version of the twin prime conjecture for almost primes:

Corollary 21 There is an absolute constant {u} such that there are infinitely many natural numbers {n} such that {n, n+2} are each the product of at most {u} primes.

Exercise 22 By going through the beta sieve estimates carefully, show that one can take {u=20} in the above corollary. (One can do substantially better if one uses more sophisticated sieve estimates, and of course Chen’s theorem tells us that {u} can be taken as low as {2}; we will review the proof of Chen’s theorem in subsequent notes.)

In the above corollary, both elements of the pair {n,n+2} are permitted to be almost prime rather than prime. It is possible to use the fundamental lemma to improve upon this by forcing {n} (for instance) to be prime. To do this, we run the twin prime sieve differently; rather than start with all integers and sieve out two residue classes modulo each prime {p}, we instead start with the primes and sieve out one residue class modulo {p}. More precisely, we consider the quantity

\displaystyle  \sum_{n \not \in \bigcup_{p \leq z} E_p} f(n)

where {E_p} is the residue class {-2\ (p)} and

\displaystyle  f(n) := \Lambda(n) 1_{[x/2,x-2]}(n)

(say). Observe that the sum {\sum_{n \in E_d} f(n)} is only as large as {O( \log^{O(1)} x)} at worst for even {d}. For odd {d}, we expect from the prime number theorem in arithmetic progressions that

\displaystyle  \sum_{n \in E_d} f(n) = \frac{1}{\phi(d)} \frac{x}{2} + r_d

for some small {r_d}. Thus it is natural to apply Corollary 19 with {X := \frac{x}{2}} and {g(d) := 1_{(d,2)=1} \frac{1}{\phi(d)}}, so that {\kappa = 1}. From the Bombieri-Vinogradov theorem (see Theorem 17 of Notes 3), we see that

\displaystyle  \sum_{d \leq x^{1/2-\varepsilon}} |r_d| \ll_{A,\varepsilon} x \log^{-A} x

for any fixed {\varepsilon, A> 0}. We thus take {D := x^{1/2-\varepsilon}} for some small fixed {\varepsilon>0} and {z := x^{1/u}}, and apply Corollary 19 to conclude that

\displaystyle  \sum_{n \not \in \bigcup_{p \leq z} E_p} f(n) = (1 + O( e^{-(1/2-\varepsilon) u})) \frac{x}{2} \prod_{2 < p < z}( 1 - \frac{1}{\phi(p)})

\displaystyle  + O_{A,\varepsilon}( x \log^{-A} x ).

For {u} a sufficiently large fixed constant, we conclude from Mertens’ theorem that

\displaystyle  \sum_{n \not \in \bigcup_{p \leq z} E_p} f(n) \gg \frac{x}{\log x}

for {x} large enough. The contribution for those {n} for which {n} is a power of a prime, rather than a prime, is easily seen to be negligible, and we conclude that there are {\gg \frac{x}{\log^2 x}} primes {p} less than {x-2} such that {p+2} has no factors less than {x^{1/u}}, and thus is the product of at most {u} primes. We thus have the following improvement of Corollary 21:

Proposition 23 There is an absolute constant {u} such that there are infinitely many primes {p} such that {p+2} is the product of at most {u} primes.

More generally, the fundamental lemma (combined with equidistribution results such as the Bombieri-Vinogradov theorem) lets one prove analogues of various unsolved conjectures about primes, in which one either (a) is content with upper bounds of the right order of magnitude, or (b) is willing to replace primes with a suitable notion of almost prime. We give some examples in the exercises below.

Exercise 24 (Approximations to the prime {k}-tuples conjecture) Let {(h_1,\dots,h_k)} be an admissible {k}-tuple of integers for some {k \geq 1}, thus the {h_i} are all distinct, and for any {p}, the number {\omega(p)} of residue classes occupied by {h_1,\dots,h_k\ (p)} is never equal to {p}. Let {{\mathfrak S}} denote the singular series

\displaystyle  {\mathfrak S} := \prod_p (1-\frac{1}{p})^{-k} (1-\frac{\omega(p)}{p}). \ \ \ \ \ (27)

  • (i) Show that {\sum_{n \leq x: n+h_1,\dots,n+h_k \in {\mathcal P}} 1 \ll_k {\mathfrak S} \frac{x}{\log^k x}} for all sufficiently large {x}. (This should be compared with the Hardy-Littlewood prime tuples conjecture {\sum_{n \leq x: n+h_1,\dots,n+h_k \in {\mathcal P}} 1 = (1+o(1)) {\mathfrak S} \frac{x}{\log^k x}} as {x \rightarrow \infty}, see Exercise 13 of Supplement 4.) Thus the fundamental lemma tells us that the prime tuples conjecture is only off from the truth by a multiplicative factor of {O_k(1)}, at least as far as upper bounds are concerned.
  • (ii) Show that if {u} is sufficiently large depending on {k}, then there are infinitely many {n} such that {n+h_1,\dots,n+h_k} are each the product of at most {u} prime factors.
  • (iii) Strengthen (ii) to include the requirement that {n+h_1} (say) is prime.

Exercise 25 (Approximations to the even Goldbach conjecture) If {N} is an even natural number, define the singular series

\displaystyle  {\mathfrak S}(N) := 2 \Pi_2 \prod_{p>2: p|N} \frac{p-1}{p-2}.

  • (i) Show that the number of ways to write {N} as the sum of two primes {N=p_1+p_2} is {O( {\mathfrak S}(N) \frac{N}{\log^2 N} )} if {N} is sufficiently large. (This should be compared with the prediction of {(1+o(1)) {\mathfrak S}(N) \frac{N}{\log^2 N}} as {N \rightarrow \infty}, see Exercise 12 of Supplement 4.)
  • (ii) Show that if {u} is a sufficiently large natural number, then every sufficiently large even number {N} is expressible as the sum of two natural numbers {n_1,n_2}, each of which are the product of at most {u} prime factors.
  • (iii) Strengthen (ii) to include the requirement that {n_1} is prime.

Exercise 26 (Approximations to Legendre’s conjecture)

  • (i) Show that for any {x,y \geq 2}, the number of primes in the interval {[x,x+y]} is at most {O( \frac{y}{\log y} )}.
  • (ii) Show that if {u} is a sufficiently large natural number, then for all sufficiently large {x}, there is at least one natural number {n} between {x^2} and {(x+1)^2} that is the product of at most {u} primes.

Exercise 27 (Approximations to Landau’s fourth problem) Define the singular series

\displaystyle  {\mathfrak S} := \prod_p \frac{1-\omega(p)/p}{1-1/p}

where {\omega(p)} is the number of residue classes {a\ (p)} such that {a^2 + 1 =0\ (p)}.

  • (i) Show that the number of primes of the form {n^2+1} with {n \leq x} is {O( {\mathfrak S} \frac{x}{\log x} )} for sufficiently large {x}. (This should be compared with the prediction {(1+o(1)) {\mathfrak S} \frac{x}{\log x}}, see Exercise 15 of Supplement 4.)
  • (ii) Show that if {u} is a sufficiently large natural number, then there are an infinite number of numbers of the form {n^2+1} that are the product of at most {u} primes.

— 2. The Selberg upper bound sieve —

We now turn to the Selberg upper bound sieve, which is a simple but remarkably useful general-purpose upper bound sieve. We again follow the discussion of Friedlander and Iwaniec.

The idea of the sieve comes from the following trivial observation: if {P} is a squarefree number, {E_p} is a set of integers for each {p|P}, and {(\rho_d)_{d|P}} are arbitrary real numbers with {\rho_1 = 1}, then the function

\displaystyle  \left( \sum_{d|P} \rho_d 1_{E_d} \right)^2

is an upper bound sieve, since it is clearly non-negative and equals {1} outside of {\bigcup_{p|P} E_p}. Equivalently, the sequence

\displaystyle  \lambda^+_d := \sum_{d_1,d_2: [d_1,d_2] = d} \rho_{d_1} \rho_{d_2} \ \ \ \ \ (28)

for {d|P} is a sequence of upper bound sieve coefficients, where {[d_1,d_2]} is the least common multiple of {d_1,d_2}. If we set {D=R^2} and assume that the {\rho_d} are only non-zero for {d \leq R}, then this sequence of sieve coefficients will only be non-zero on {{\mathcal D} := \{ d|P: d \leq D \}}. We will refer to sieves (and sieve coefficients) constructed by this method as Selberg upper bound sieves (or Selberg upper bound sieve coefficients); they are also referred to as {\Lambda^2} sieves in the literature (where the coefficients {\rho_d} are traditionally denoted {\lambda_d}).
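
As a small illustration of why (28) gives upper bound sieve coefficients (the random choice of {\rho_d} below is mine, purely for testing): for any {n|P} one has {\sum_{d|n} \lambda^+_d = (\sum_{d|n} \rho_d)^2 \geq 0}, while {\lambda^+_1 = \rho_1^2 = 1}, which is exactly the upper bound sieve coefficient condition introduced earlier.

    from math import gcd
    import random

    # Check that the coefficients (28), built from arbitrary rho with rho_1 = 1,
    # satisfy lambda^+_1 = 1 and sum_{d|n} lambda^+_d = (sum_{d|n} rho_d)^2 >= 0
    # for every n | P.
    random.seed(1)
    P_primes = [2, 3, 5, 7]
    divisors = [1]
    for p in P_primes:
        divisors += [d * p for d in divisors]      # the divisors of P = 210

    rho = {d: random.uniform(-1, 1) for d in divisors}
    rho[1] = 1.0

    lam = {d: 0.0 for d in divisors}
    for d1 in divisors:
        for d2 in divisors:
            lam[d1 * d2 // gcd(d1, d2)] += rho[d1] * rho[d2]   # lcm, as in (28)

    assert abs(lam[1] - 1.0) < 1e-12
    for n in divisors:
        total = sum(lam[d] for d in divisors if n % d == 0)
        square = sum(rho[d] for d in divisors if n % d == 0) ** 2
        assert abs(total - square) < 1e-9 and total >= -1e-9, n
    print("the coefficients (28) pass the upper bound sieve coefficient test")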

A key advantage of the Selberg sieve is that the coefficients {\rho_d} for {1 < d \leq R} are completely unconstrained within the real numbers, and the sieve depends in a quadratic fashion on these coefficients, so the problem of optimising the Selberg sieve for a given application usually reduces to an unconstrained quadratic optimisation problem, which can often be solved exactly in many cases. Of course, the optimal Selberg sieve need not be the globally optimal sieve amongst all upper bound sieves; but in practice the Selberg sieve tends to give quite good results even if it might not be the globally optimal choice.

Selberg sieves can be used for many purposes, but we begin with the classical sieve problem posed in Problem 10. To avoid some degeneracies we will assume that

\displaystyle  0 < g(p) < 1 \ \ \ \ \ (29)

for all {p|P}. Restricting to upper bound sieve coefficients of the Selberg form, the quantity (14) becomes

\displaystyle  \sum_{d_1,d_2|P} \rho_{d_1} \rho_{d_2} g( [d_1,d_2] ). \ \ \ \ \ (30)

This is a quadratic form in the coefficients {\rho_d}; our task is to minimise this amongst all choices of coefficients with {\rho_1=1}, with {\rho_d} vanishing for {d>R}.

Observe that any {d_1,d_2|P} can be expressed as {d_1 = a_1 b}, {d_2 = a_2 b} with {a_1 a_2 b | P}, in which case {[d_1,d_2] = a_1 a_2 b}. By multiplicativity, we may thus write (30) as

\displaystyle  \sum_{a_1 a_2 b|P} \rho_{a_1 b} \rho_{a_2 b} g(b) g(a_1) g(a_2)

which we can rearrange as

\displaystyle  \sum_{b|P} g(b) \sum_{a_1,a_2 | P/b: (a_1,a_2)=1} \rho_{a_1b} g(a_1) \rho_{a_2 b} g(a_2).

The inner sum almost factorises as a square, but we have the coprimality constraint {(a_1,a_2)=1} to deal with. The standard trick for dealing with this constraint is Möbius inversion:

\displaystyle  1_{(a_1,a_2)=1} = \sum_{d |(a_1,a_2)} \mu(d)

\displaystyle  = \sum_{d|a_1,a_2} \mu(d);

inserting this and writing {a_1 = d c_1}, {a_2 = d c_2}, we can now write (30) as

\displaystyle  \sum_{b|P} g(b) \sum_{d|P/b} \mu(d) g(d)^2 \sum_{c_1,c_2 | P/db} \rho_{dbc_1} g(c_1) \rho_{dbc_2} g(c_2)

which now does factor as a sum of squares:

\displaystyle  \sum_{b|P} g(b) \sum_{d|P/b} \mu(d) g(d)^2 \left( \sum_{c | P/db} \rho_{dbc} g(c)\right)^2.

Writing {e = bcd}, this becomes

\displaystyle  \sum_{bd|P} \frac{\mu(d)}{g(b)} \left( \sum_{bd | e | P} \rho_e g(e)\right)^2.

Writing {bd = m}, this becomes

\displaystyle  \sum_{m|P} h(m) y_m^2 \ \ \ \ \ (31)

where

\displaystyle  y_m := \frac{\mu(m)}{h(m)} \sum_{m | e | P} \rho_e g(e)

and {h} is defined on factors of {P} by the Dirichlet convolution

\displaystyle  \frac{1}{h} = \mu * \frac{1}{g}.

Thus {h} is the multiplicative function with

\displaystyle  h(p) := \frac{g(p)}{1-g(p)} \ \ \ \ \ (32)

for all {p|P}; we extend {h} by zero outside of the factors of {P}. In particular {h} is non-negative.

The coefficients {(y_m)_{m|P}} are a transform of the original coefficients {(\rho_e)_{e|P}}. For instance, if {P} was replaced by a single prime {p}, then the coefficients {(y_1,y_p)} would be related to the coefficients {(\rho_1,\rho_p)} by the transform

\displaystyle  y_1 = \rho_1 + g(p) \rho_p; \quad y_p = -\frac{g(p)}{h(p)} \rho_p

which is inverted as

\displaystyle  \rho_1 = y_1 + h(p) y_p; \quad \rho_p = - \frac{1}{g(p)} h(p) y_p.

More generally, we have the following inversion formula:

Exercise 28 (Möbius-type inversion formula) Verify that

\displaystyle  \rho_e = \frac{\mu(e)}{g(e)} \sum_{e|m|P} h(m) y_m \ \ \ \ \ (33)

for any {e|P}.

From this formula, we see in particular that {y_m} is supported on {m \leq R} if and only if {\rho_e} is supported on {e \leq R}. The constraint {\rho_1=1} can now be written in terms of the {y_m} coefficients as

\displaystyle  \sum_{m|P} h(m) y_m = 1. \ \ \ \ \ (34)

Our task is now to minimise the quadratic form (31) amongst all coefficients {y_m} supported on {m \leq R} obeying (34). The Cauchy-Schwarz inequality gives the lower bound

\displaystyle  \sum_{m|P} h(m) y_m^2 \geq J^{-1}

where {J := \sum_{m|P: m \leq R} h(m)}, with equality precisely when {y_m = 1/J} for {m \leq R}. Inserting this back into (33), we arrive at the optimised Selberg sieve coefficients

\displaystyle  \rho_e = \frac{1}{J} \frac{\mu(e)}{g(e)} \sum_{e|m|P: m \leq R} h(m) \ \ \ \ \ (35)

and the quantity (14) is equal to {\frac{1}{J}}.

We have a pleasant size bound on the coefficients {\rho_e}:

Lemma 29 We have {|\rho_e| \leq 1} for all {e|P}.

Proof: Writing {m = ek}, we have

\displaystyle  |\rho_e| = \frac{1}{Jg(e)} \sum_{k| P/e: k \leq R/e} h(e) h(k)

but

\displaystyle  J \geq \sum_{k|P/e: k \leq R/e} \sum_{d|e} h(d) h(k)

\displaystyle  = \sum_{k|P/e: k \leq R/e} h(k) \frac{h(e)}{g(e)}

since one can compute that {\frac{h}{g} = h * 1}. The claim follows. \Box
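As a quick numerical sanity check of the formulas just derived, the following Python sketch computes the optimised coefficients (35) for a toy choice of {P}, {R} and density {g(p) = 1/p} (these parameters are purely illustrative), and confirms that {\rho_1 = 1}, that {|\rho_e| \leq 1} as in Lemma 29, and that the quadratic form (30) does come out to {1/J}.

```python
from math import gcd, prod

# Toy verification of the optimised Selberg sieve: P = 2*3*5*7, R = 10, g(p) = 1/p.
primes = [2, 3, 5, 7]
P, R = prod(primes), 10

def squarefree_divisors(ps):
    divs = [1]
    for p in ps:
        divs += [d * p for d in divs]
    return sorted(divs)

mu = lambda d: (-1) ** sum(1 for p in primes if d % p == 0)                  # Moebius
g  = lambda d: prod(1 / p for p in primes if d % p == 0)                     # g(p) = 1/p
h  = lambda d: prod((1 / p) / (1 - 1 / p) for p in primes if d % p == 0)     # (32)

divs = squarefree_divisors(primes)
J = sum(h(m) for m in divs if m <= R)

# the optimised coefficients (35); rho_e vanishes for e > R
rho = {e: (mu(e) / (J * g(e))) * sum(h(m) for m in divs if m % e == 0 and m <= R)
       for e in divs if e <= R}

assert abs(rho[1] - 1) < 1e-9
assert all(abs(v) <= 1 + 1e-9 for v in rho.values())          # Lemma 29

lcm = lambda a, b: a * b // gcd(a, b)
Q = sum(rho[d1] * rho[d2] * g(lcm(d1, d2)) for d1 in rho for d2 in rho)      # (30)
print("quadratic form =", Q, "  1/J =", 1 / J)                # these agree
assert abs(Q - 1 / J) < 1e-9
```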

The upper bound sieve coefficients {\lambda^+_d} in (28) are then bounded by

\displaystyle  |\lambda^+_d| \leq \sum_{[d_1,d_2]=d} 1 = \tau_3(d)

where {\tau_3(d) = \sum_{d_1 d_2 d_3 = d} 1} is the third divisor function. Applying Theorem 5, we thus have a general upper bound for Problem 3:

Theorem 30 (Selberg sieve upper bound) Let {P} be a squarefree natural number. For each prime {p} dividing {P}, let {E_p} be a set of integers. Let {g} be a multiplicative function with {0 < g(p) < 1} for all {p}, and define the multiplicative function {h} by (32), extending {h} by zero outside of the factors of {P}. Let {D = R^2} for some {R \geq 1}. Suppose that {(a_n)_{n \in {\bf Z}}} is a finitely supported sequence of non-negative reals, such that

\displaystyle  \sum_{n \in E_d} a_n = g(d) X + r_d \ \ \ \ \ (36)

for all {d \leq D} and some reals {X, r_d}. Then

\displaystyle  \sum_{n \not \in\bigcup_{p|P} E_p} a_n \leq \frac{X}{\sum_{d \leq R} h(d)} + \sum_{d \leq D} \tau_3(d) |r_d|.

Remark 31 To compare this bound against the “expected” value of {X \prod_{p|P} (1-g(p))}, observe from Euler products that

\displaystyle \prod_{p|P} (1-g(p)) = \frac{1}{\sum_d h(d)}.

The quantity {\sum_{d \leq R} h(d)} can often be computed in practice using standard methods for controlling sums of multiplicative functions, such as Theorem 27 of Notes 1. We illustrate this with the following sample application of the Selberg sieve:

Theorem 32 (Sieving an interval) Let {k, C} be fixed natural numbers, and let {x \geq 2}. For each prime number {p \leq \sqrt{x}}, let {E_p} be the union of {\omega(p)} residue classes modulo {p}, where {\omega(p) = k} for all {p \geq C}, and {\omega(p) < p} for all {p}. Then for any {\varepsilon > 0}, one has

\displaystyle  |\{ n: n \leq x \} \backslash \bigcup_{p \leq \sqrt{x}} E_p| \leq (2^k k! + \varepsilon) {\mathfrak S} \frac{x}{\log^k x} \ \ \ \ \ (37)

whenever {x} is sufficiently large depending on {k,C,\varepsilon}, where {{\mathfrak S}} is the singular series

\displaystyle  {\mathfrak S} := \prod_p (1-\frac{1}{p})^{-k} (1 - \frac{\omega(p)}{p}). \ \ \ \ \ (38)

Proof: We apply Theorem 30 with {a_n := 1_{1 \leq n \leq x}}, {X := x}, {P := \prod_{p \leq \sqrt{x}} p}, and {g(p) := \frac{\omega(p)}{p}}. We set {D := x^{1-\varepsilon}} and {R := D^{1/2}}. For any {d|P}, {E_d} is the union of {\prod_{p|d} \omega(p)} residue classes modulo {d}, and thus by (36) we have

\displaystyle  |r_d| \leq \prod_{p|d} \omega(p)

and in particular

\displaystyle  |r_d| \ll_{k,C} \tau_k(d).

From Theorem 30, we thus can upper bound the left-hand side of (37) by

\displaystyle  \frac{x}{\sum_{d \leq R} h(d)} + O_{k,C}( \sum_{d \leq D} \tau_3(d) \tau_k(d) ).

By the divisor bound and the choice of {D} we have

\displaystyle  \sum_{d \leq D} \tau_3(d) \tau_k(d) \ll_{k,C,\varepsilon} \frac{x}{\log^{k+1} x}

so it will suffice to show (after adjusting {\varepsilon}) that

\displaystyle  \sum_{d \leq R} h(d) = \frac{1}{(2 + O(\varepsilon))^k k! {\mathfrak S}} \log^k x + O_{k,C,\varepsilon}( \log^{k-1} x ).

But from Theorem 27(ii) of Notes 1 (with {g(n)} replaced by {n h(n)}), the left-hand side is

\displaystyle  \frac{1}{k!} {\mathfrak H} \log^k R + O_{k,C,\varepsilon}( \log^{k-1} R )

where

\displaystyle  {\mathfrak H} = \prod_p (1 - \frac{1}{p})^k (1 + h(p)).

Since {h(p) = \frac{g(p)}{1-g(p)} = \frac{\omega(p)}{p-\omega(p)}}, we see that {{\mathfrak H} = {\mathfrak S}^{-1}}, and the claim follows (after adjusting {\varepsilon} appropriately). \Box
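For a concrete feel for the numbers in the {k=2} (twin prime) case, here is a rough numerical comparison in Python between the actual sifted count and the main term {2^k k! \, {\mathfrak S} x / \log^k x} appearing in (37); the value of {x} used is a modest illustrative choice, so lower order terms are still quite visible and the comparison is only meant to indicate the order of magnitude.

```python
from math import isqrt, log, prod

# Sieve [1, x] by the classes 0, -2 (mod p) for all p <= sqrt(x) (so omega(2) = 1,
# omega(p) = 2 for odd p), and compare the survivor count with 2^k k! * S * x / log^k x
# from (37) with k = 2.  The survivors are essentially the twin primes n, n+2 with
# n > sqrt(x), so one expects the bound to overshoot by very roughly the factor 2^k k!.

x = 10**6
z = isqrt(x)

def primes_up_to(n):
    flags = [True] * (n + 1)
    flags[0], flags[1] = False, False
    for p in range(2, isqrt(n) + 1):
        if flags[p]:
            flags[p*p::p] = [False] * len(flags[p*p::p])
    return [p for p, f in enumerate(flags) if f]

survive = [False, True] + [True] * (x - 1)       # survive[n] for 0 <= n <= x
for p in primes_up_to(z):
    for r in {0, (-2) % p}:                      # the classes 0 and -2 (mod p)
        start = r if r != 0 else p
        survive[start::p] = [False] * len(survive[start::p])
sifted = sum(survive)

k = 2
S = prod((1 - 1/p)**(-k) * (1 - (1 if p == 2 else 2)/p)
         for p in primes_up_to(10**5))           # truncation of the product (38)
main = 2**k * 2 * S * x / log(x)**k              # 2^k * k! with k = 2, i.e. the factor 8

print("sifted count            :", sifted)
print("main term 8*S*x/log^2 x :", round(main))
```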

The upper bound in (37) is off by a factor of about {2^k k!} from what one expects to be the truth. The following exercises give some quick applications of Theorem 32.

Exercise 33 (Brun-Titchmarsh inequality) If {\varepsilon > 0}, and {y} is sufficiently large depending on {\varepsilon}, establish the Brun-Titchmarsh inequality

\displaystyle  \pi(x+y) - \pi(x) \leq \frac{(2+\varepsilon)y}{\log y}

for all {x > 1}, where {\pi(x)} is the number of primes less than {x}. More generally, if {\varepsilon > 0}, {a\ (q)} is a primitive residue class, and {y} is sufficiently large depending on {\varepsilon,q}, establish the Brun-Titchmarsh inequality

\displaystyle  \pi(x+y; a\ (q)) - \pi(x; a\ (q)) \leq \frac{(2+\varepsilon)y}{\phi(q) \log(y/q)}

for all {x>1}, where {\pi(x; a\ (q))} is the number of primes less than {x} that lie in {a\ (q)}. Comparing this with the prime number theorem in arithmetic progressions, we see that the Brun-Titchmarsh inequality is off from the truth by a factor of two when {x = O(y)}; however, the Brun-Titchmarsh inequality is applicable even when {x} is extremely large compared with {y}, whereas the prime number theorem gives no non-trivial bound in this regime. The {\varepsilon} can in fact be deleted in these inequalities, a result of Montgomery and Vaughan (using a refinement of the large sieve, discussed below).
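The inequality is easy to explore numerically. The following sketch counts the primes in an interval {(x,x+y]} by a segmented sieve and compares the count with {2y/\log y}; the values of {x} and {y} below are arbitrary sample choices, with {x} much larger than {y}, which is exactly the regime in which the prime number theorem gives nothing.

```python
from math import isqrt, log

# Count the primes in (x, x+y] with a segmented sieve and compare with 2y/log y.

def primes_up_to(n):
    flags = [True] * (n + 1)
    flags[0], flags[1] = False, False
    for p in range(2, isqrt(n) + 1):
        if flags[p]:
            flags[p*p::p] = [False] * len(flags[p*p::p])
    return [p for p, f in enumerate(flags) if f]

def primes_in_interval(x, y):
    """Primes n with x < n <= x+y."""
    flags = [True] * y                       # flags[i] corresponds to n = x + 1 + i
    for p in primes_up_to(isqrt(x + y)):
        start = (-(x + 1)) % p               # first index i with p | x + 1 + i
        for i in range(start, y, p):
            if x + 1 + i != p:               # do not strike out p itself
                flags[i] = False
    return sum(flags)

x, y = 10**9, 10**5
print("pi(x+y) - pi(x) =", primes_in_interval(x, y))
print("2y / log y      =", round(2 * y / log(y)))
```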

Exercise 34 (Upper bound for {k}-tuple conjecture) Let {(h_1,\dots,h_k)} be an admissible {k}-tuple (as in Exercise 24), and let {{\mathfrak S}} be the singular series (27). Show that

\displaystyle  |\{ n \leq x: n+h_1,\dots,n+h_k \in {\mathcal P}\}| \leq (2^k k! + o(1)) {\mathfrak S} \frac{x}{\log^k x} \ \ \ \ \ (39)

as {x \rightarrow \infty}. (This should be compared with the prime tuples conjecture, which predicts

\displaystyle  |\{ n \leq x: n+h_1,\dots,n+h_k \in {\mathcal P}\}| = (1 + o(1)) {\mathfrak S} \frac{x}{\log^k x}

as {x \rightarrow \infty}.)

As with the fundamental lemma, one can use the Selberg sieve together with the Bombieri-Vinogradov theorem to sift out a set of primes, rather than a set of intervals:

Exercise 35 (Sieving primes) Let {k, C} be fixed natural numbers, and let {x \geq 2}. For each prime number {p \leq \sqrt{x}}, let {E_p} be the union of {\omega(p)} primitive residue classes modulo {p}, where {\omega(p) = k-1} for all {p \geq C}, and {\omega(p) < p-1} for all {p}. Then for any {\varepsilon > 0}, one has

\displaystyle  |\{ p \in {\mathcal P}: p \leq x \} \backslash \bigcup_{p \leq \sqrt{x}} E_p| \leq (4^{k-1} (k-1)! + \varepsilon \ \ \ \ \ (40)

\displaystyle  + O_{k,C,\varepsilon}( \frac{1}{\log x} )) {\mathfrak S} \frac{x}{\log^k x}

where {{\mathfrak S}} is the singular series

\displaystyle  {\mathfrak S} := \prod_p (1-\frac{1}{p})^{1-k} (1 - \frac{\omega(p)}{p-1}).

(You will need Exercise 23 of Notes 3 to control the {\tau_3} terms arising from the remainder term.) If one assumes the Elliott-Halberstam conjecture (see Exercise 22 of Notes 3), show that the factor of {4^{k-1}} here can be lowered to {2^{k-1}}.

Exercise 36 Show that the quantity {2^k k!} in (39) may be replaced by {4^{k-1} (k-1)!}, which is a superior bound for {k \leq 3}. (It remains an open question to improve upon the bound {\min( 2^k k!, 4^{k-1} (k-1)! )} without assuming any further unproven conjectures.) Assuming the Elliott-Halberstam conjecture, show that one can replace {2^k k!} instead by {2^{k-1} (k-1)!}.

We have approached the analysis of the Selberg sieve as a “discrete” optimisation problem, in which one is trying to optimise the parameters {(\rho_d)_{d|P; d \leq R}} or {(y_m)_{m|P; m \leq R}} that are indexed by the discrete set {\{ d|P: d \leq R \}}. It is however also instructive to consider a “continuous” version of the optimisation problem, in which the coefficients {\rho_d} or {y_m} are described by a continuous function of {d} or {m}; this is a slightly more restrictive class of sieves, but it turns out to give numerical bounds that come very close to the discrete optimum, and it allows methods from the calculus of variations to come into play when using the Selberg sieve in more sophisticated ways. (See these lecture notes of Selberg for further discussion of how the continuous problem serves as a good approximation of the discrete one.) We give a basic illustration of this by presenting an alternate proof of (a very slightly weaker form of) Theorem 32 in which we do not use the choice (35) for the sieve weights {\rho_e}, but instead choose weights of the form

\displaystyle  \rho_d := \mu(d) f( \frac{\log d}{\log R} ) \ \ \ \ \ (41)

for some smooth compactly supported function {f: {\bf R} \rightarrow {\bf R}} that vanishes on {[1,+\infty)}, and which equals {1} at the origin; compare this smoothly truncated version of the Möbius function with the more abrupt truncations of the Möbius function that arise in combinatorial sieving. Such cases of the Selberg sieve were studied by several authors, most notably Goldston, Pintz, and Yildirim (although they made {f} a polynomial on {[0,1]}, rather than a smooth function). As in the previous proof of Theorem 32, we take {g(p) := \frac{\omega(p)}{p}}, {D := x^{1-\varepsilon}}, {R := D^{1/2}}, {P := \prod_{p \leq \sqrt{x}} p}, {X=x}, and {a_n := 1_{1 \leq n \leq x}}. The quantity (30) (or (14)) is then equal to

\displaystyle  \sum_{d_1,d_2|P} f( \frac{\log d_1}{\log R} ) f( \frac{\log d_2}{\log R} ) \mu(d_1) \mu(d_2) g( [d_1,d_2] ).

We can remove the constraints {d_1, d_2|P}, since the summands vanish outside of this range. To estimate this sum we use an argument related to that used to prove Proposition 9 from Notes 2. We first use Fourier inversion, applied to the function {u \mapsto e^u f(u)}, to obtain a representation of the form

\displaystyle  e^u f(u) = \int_{\bf R} F(t) e^{-itu}\ dt \ \ \ \ \ (42)

for some smooth, rapidly decreasing function {F: {\bf R} \rightarrow {\bf C}}, thus

\displaystyle  |F(t)| \ll_{f,A} (1+|t|)^{-A} \ \ \ \ \ (43)

for any {A>0}. We then have

\displaystyle  f( \frac{\log d_1}{\log R} ) = \int_{\bf R} \frac{F(t_1)}{d_1^{\frac{1+it_1}{\log R}}} dt_1

and

\displaystyle  f( \frac{\log d_2}{\log R} ) = \int_{\bf R} \frac{F(t_2)}{d_2^{\frac{1+it_2}{\log R}}} dt_2

and so by Fubini’s theorem (using the divisor bound {g(d) = O( d^{-1+o(1)} )}) we may write (30) as

\displaystyle  \int_{\bf R} \int_{\bf R} F(t_1) F(t_2) \sum_{d_1,d_2} \frac{\mu(d_1) \mu(d_2) g([d_1,d_2])}{d_1^{\frac{1+it_1}{\log R}} d_2^{\frac{1+it_2}{\log R}}}\ dt_1 dt_2,

and then by Euler products and the definition of {g} we may factorise this as

\displaystyle  \int_{\bf R} \int_{\bf R} F(t_1) F(t_2) G( t_1, t_2)\ dt_1 dt_2 \ \ \ \ \ (44)

where

\displaystyle  G(t_1,t_2) := \prod_{p} ( 1 - \frac{\omega(p)}{p^{1+\frac{1+it_1}{\log R}}} - \frac{\omega(p)}{p^{1+\frac{1+it_2}{\log R}}} + \frac{\omega(p)}{p^{1+\frac{1+it_1+1+it_2}{\log R}}} ).

We have the crude upper bound

\displaystyle  |G(t_1,t_2)| \ll_{k,C} \prod_p \exp( O_k( \frac{1}{p^{1+1/\log R}} ) )

which by the asymptotic {\sum_p \frac{1}{p^s} = -\log(s-1) + O(1)} for {s > 1} (see equation (22) of Notes 1) implies that

\displaystyle  |G(t_1,t_2)| \ll_{k,C} \log^{O_k(1)} R.

From this and the rapid decrease (43) of {F}, we see that the contribution of the cases {|t_1| \geq \sqrt{\log R}} or {|t_2| \geq \sqrt{\log R}} (say) to (44) is {O_{k,C,f,A}( \log^{-A} R )} for any {A>0}. Thus we can write (44) as

\displaystyle  \int_{|t_1|, |t_2| \leq \sqrt{\log R}} F(t_1) F(t_2) G( t_1, t_2)\ dt_1 dt_2 + O_{k,C,f,A}(\log^{-A} R).

Since {\zeta(s) = \prod_p (1 - \frac{1}{p^s})^{-1}}, we can factor

\displaystyle  G(t_1,t_2) = \frac{\zeta(1 + \frac{1+it_1+1+it_2}{\log R})^k}{\zeta(1+\frac{1+it_1}{\log R})^k \zeta(1+\frac{1+it_2}{\log R})^k} \prod_p E_p(t_1,t_2)

where {E_p(t_1,t_2)} is the local Euler factor

\displaystyle  \frac{( 1 - \frac{\omega(p)}{p^{1+\frac{1+it_1}{\log R}}} - \frac{\omega(p)}{p^{1+\frac{1+it_2}{\log R}}} + \frac{\omega(p)}{p^{1+\frac{1+it_1+1+it_2}{\log R}}} ) (1 - \frac{1}{p^{1+\frac{1+it_1+1+it_2}{\log R}}})^k}{(1-\frac{1}{p^{1+\frac{1+it_1}{\log R}}})^k (1-\frac{1}{p^{1+\frac{1+it_2}{\log R}}})^k}.

The point of this factorisation is that, thanks to a routine but somewhat tedious Taylor expansion, one has the asymptotics

\displaystyle  E_p(t_1,t_2) = 1 + O_{k,C}(\frac{1}{p^{3/2}})

for any complex {t_1,t_2} with imaginary part less than {\frac{1}{2} \log R} in magnitude, and hence (by the Cauchy integral formula)

\displaystyle  \partial_{t_1} E_p(t_1,t_2), \partial_{t_2} E_p(t_1,t_2) = O_{k,C}( \frac{1}{\log R} \frac{1}{p^{3/2}})

for any complex {t_1,t_2} with imaginary part less than {\frac{1}{4} \log R} in magnitude. By the fundamental theorem of calculus, we thus have

\displaystyle  E_p(t_1,t_2) = E_p(i,i) + O_{k,C}( \frac{1}{\sqrt{\log R}} \frac{1}{p^{3/2}} )

when {|t_1|, |t_2| \leq \sqrt{\log R}} are real. Also, observe from (38) that

\displaystyle  \prod_p E_p(i,i) = {\mathfrak S}

and hence

\displaystyle  \prod_p E_p(t_1,t_2) = {\mathfrak S} + O_{k,C}( \frac{1}{\sqrt{\log R}} ).

From this and the standard asymptotic {\zeta(s) = \frac{1}{s-1} + O(1)} for {s} close to {1}, we have

\displaystyle  G(t_1,t_2) = {\mathfrak S} \log^{-k} R \frac{(1+it_1)^k (1+it_2)^k}{(1+it_1+1+it_2)^k} + O_{k,C}( (1+|t_1|+|t_2|)^{O(1)} \log^{-k-1/2} R )

and thus we can write (44) as

\displaystyle  {\mathfrak S} \log^{-k} R \int_{|t_1|, |t_2| \leq \sqrt{\log R}} F(t_1) F(t_2) \frac{(1+it_1)^k (1+it_2)^k}{(1+it_1+1+it_2)^k} \ dt_1 dt_2

\displaystyle  + O_{k,C,f}( \log^{-k-1/2} R ).

From the rapid decrease of {F}, we may now remove the constraints on {t_1,t_2} and write this as

\displaystyle  {\mathfrak S} \log^{-k} R \int_{\bf R} \int_{\bf R} F(t_1) F(t_2) \frac{(1+it_1)^k (1+it_2)^k}{(1+it_1+1+it_2)^k} \ dt_1 dt_2

\displaystyle + O_{k,C,f}( \log^{-k-1/2} R ).

To handle the weight {\frac{(1+it_1)^k (1+it_2)^k}{(1+it_1+1+it_2)^k}}, we observe from dividing (42) by {e^u} and then differentiating {k} times that

\displaystyle  f^{(k)}(u) = (-1)^k \int_{\bf R} (1+it)^k F(t) e^{-(1+it)u}\ dt

and hence by Fubini’s theorem

\displaystyle  f^{(k)}(u)^2 = \int_{\bf R} \int_{\bf R} (1+it_1)^k (1+it_2)^k F(t_1) F(t_2) e^{-(1+it_1+1+it_2)u}\ dt_1 dt_2.

Using the Gamma function identity {\int_0^\infty e^{-su} u^{k-1}\ du = \frac{(k-1)!}{s^k}} for any {s} with {\hbox{Re}(s)>0}, we conclude from another application of Fubini’s theorem that

\displaystyle  \int_0^\infty f^{(k)}(u)^2 u^{k-1}\ du

\displaystyle = (k-1)! \int_{\bf R} \int_{\bf R} \frac{(1+it_1)^k (1+it_2)^k}{(1+it_1+1+it_2)^k} F(t_1) F(t_2)\ dt_1 dt_2.

Note from the support of {f} that we can restrict the {u} integral to {[0,1]}. Putting this all together, we see that with the choice of weights (41), the expression (14) takes the form

\displaystyle  \frac{1}{(k-1)!} {\mathfrak S} \log^{-k} R \int_0^1 f^{(k)}(u)^2 u^{k-1}\ du + O_{k,C,f}( \log^{-k-1/2} R ).

Recall that we are requiring {f(0)} to equal {1}, which after {k} applications of the fundamental theorem of calculus (or integration by parts) is equivalent to the constraint

\displaystyle  \int_0^1 f^{(k)}(u) \frac{u^{k-1}}{(k-1)!}\ du = 1.

Since {\int_0^1 u^{k-1}\ du = \frac{1}{k}}, we conclude from Cauchy-Schwarz that

\displaystyle  \int_0^1 f^{(k)}(u)^2 u^{k-1}\ du \geq k ((k-1)!)^2,

with equality occurring when {f^{(k)}(u) = k!}. Equality is not quite attainable with {f} smooth, but by a mollification one can select an {f} smoothly supported on (say) {[-1,1]} with {f(0)=1} and

\displaystyle  \int_0^1 f^{(k)}(u)^2 u^{k-1}\ du \leq (1+\varepsilon) k ((k-1)!)^2

for any {\varepsilon > 0}. The expression (14) now takes the form

\displaystyle  (1+\varepsilon) k! {\mathfrak S} \log^{-k} R + O_{k,C,f}( \log^{-k-1/2} R ).

Meanwhile, from (41) we have {\rho_d = O_f(1)} and thus {\lambda^+_d \ll_f \tau_3(d)} much as before. We can then apply Theorem 5 as before to recover Theorem 32.
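To get a feel for the variational quantities involved, one can test the inequality {\int_0^1 f^{(k)}(u)^2 u^{k-1}\ du \geq k ((k-1)!)^2} on the family {f(u) = (1-u)^j} (extended by zero beyond {u=1}) for {j \geq k}, each of which satisfies {f(0)=1}; these are only piecewise smooth rather than the smooth mollified weights used above, so this is merely a numerical illustration of the lower bound, not a construction of admissible sieve weights. The integrals can be evaluated in closed form via the Beta function, as in the short sketch below, with {j=k} attaining the bound.

```python
from math import factorial

# For f(u) = (1-u)^j on [0,1] one has f^{(k)}(u) = (-1)^k j!/(j-k)! (1-u)^{j-k}, so
#   int_0^1 f^{(k)}(u)^2 u^{k-1} du = (j!/(j-k)!)^2 * (2(j-k))! (k-1)! / (2j-k)!
# by the Beta function identity.  The value j = k attains the bound k((k-1)!)^2.

def I(j, k):
    c = factorial(j) / factorial(j - k)
    return c * c * factorial(2 * (j - k)) * factorial(k - 1) / factorial(2 * j - k)

for k in (2, 3, 4):
    bound = k * factorial(k - 1) ** 2
    values = ", ".join(f"j={j}: {I(j, k):g}" for j in range(k, k + 3))
    print(f"k={k}: lower bound {bound};  {values}")
```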

Remark 37 The Selberg sieve is most naturally used as an upper bound sieve. However, it is possible to modify the Selberg sieve to produce a lower bound sieve in a number of ways. One way is to simply insert the Selberg sieve into the Buchstab identity (16); see for instance this paper of Ankeny and Onishi for an investigation of this approach. Another method is to multiply the Selberg upper bound sieve by a lower bound sieve {\Lambda_-}; somewhat surprisingly, even a very crude lower bound sieve such as the {k=1} Bonferroni sieve {\Lambda_- = 1 - \sum_{p|P} 1_{E_p}} can give reasonably good results for this purpose. See Friedlander and Iwaniec for further analysis of this lower bound sieve (sometimes known as the {\Lambda^2 \Lambda_-} sieve).

— 3. A multidimensional Selberg sieve, and bounded gaps between primes —

We now discuss a multidimensional variant of the Selberg sieve that establishes a recent result of Maynard (and independently by myself), giving further partial progress towards the Hardy-Littlewood prime tuples conjecture. For sake of exposition we shall omit some details; see this previous blog post for a more careful treatment; see also this survey of Granville.

More precisely, we sketch the proof of

Theorem 38 (Maynard’s theorem) Let {m} be a natural number, and let {k} be sufficiently large depending on {m}. Then for any admissible {k}-tuple {(h_1,\dots,h_k)}, there exist infinitely many natural numbers {n} such that at least {m+1} of the numbers {n+h_1,\dots,n+h_k} are prime.

Exercise 39 Show that Maynard’s theorem implies that the quantity {H_{m} := \lim \inf_{n \rightarrow \infty} p_{n+m}-p_n}, where {p_n} is the {n^{th}} prime, is finite for each natural number {m}. (For the most recent bounds on {H_m}, see this Polymath paper.) In particular, the {m=1} case of Theorem 38 yields “bounded gaps between primes”, in that there is a finite {H} for which {p_{n+1}-p_n \leq H} for infinitely many {n}.

Even the {m=1} case of this theorem was only proven in 2013 by Zhang, building upon earlier work of Goldston, Pintz, and Yildirim. Zhang’s original value of {H} in the “bounded gaps” application was {H = 7 \times 10^7}; this has since been lowered several times, with the current record being {H=246}, due to the Polymath project. The original arguments of Zhang and Goldston-Pintz-Yildirim used a “one-dimensional” Selberg sieve, with coefficients given by (41), and required quite strong equidistribution hypotheses on the primes (stronger than what the Bombieri-Vinogradov theorem provides). However, the argument of Maynard relies on a more general and flexible “multi-dimensional” Selberg sieve, which has better numerical performance, and as such does not need any equidistribution result beyond the Bombieri-Vinogradov theorem. This sieve has since been used for a number of further applications in analytic number theory; see this web page for a list of papers in this direction.

Let {m, k, h_1,\dots,h_k} be as in Theorem 38. Suppose that, for sufficiently large {x}, we can find a non-negative function {\nu: {\bf N} \rightarrow {\bf R}^+} supported on {[x/2,x]} with the property that

\displaystyle  \sum_{i=1}^k \sum_n \nu(n) 1_{n+h_i \in {\mathcal P}} > m \sum_n \nu(n). \ \ \ \ \ (45)

Then by the pigeonhole principle, there exists at least one {n \in [x/2,x]} such that {\sum_{i=1}^k 1_{n+h_i \in {\mathcal P}} > m}, that is to say at least {m+1} of the numbers {n+h_1,\dots,n+h_k} are prime. Letting {x} go to infinity (keeping {m,k,h_1,\dots,h_k} fixed), we will obtain Theorem 38.

It remains to establish the existence of a function {\nu} with the required properties. Prior to the work of Goldston, Pintz, and Yildirim, it was only known (using tools such as the fundamental lemma of sieve theory, or off-the-shelf Selberg sieves) how to construct sieve weights {\nu} obeying (45) with {m} replaced by a small constant less than {1}. The earlier arguments of Zhang and Goldston-Pintz-Yildirim chose a weight {\nu} which was essentially of the Selberg sieve form

\displaystyle  \nu(n) := 1_{[x/2,x]}(n) \left( \sum_{d | (n+h_1) \dots (n+h_k)} \mu(d) f( \frac{\log d}{\log R} ) \right)^2

(compare with (41)), where {R = x^{\theta/2}} for some fixed {0 < \theta < 1}, and {f} was a smooth fixed compactly supported function. Omitting many calculations, their conclusion was that one could prove the {m=1} case of Theorem 38 once one had some sort of distribution result on the primes at level {\theta} (as defined in Exercise 22 of Notes 3) with {1/2 < \theta < 1}. Unfortunately, this just falls short of what the Bombieri-Vinogradov theorem provides, which is a level of distribution {\theta} for any {0 < \theta < 1/2}. One of the key contributions of Zhang was to obtain a partial distribution result at a value of {\theta} slightly above {1/2}, which when combined with (a modification of) the previous work of Goldston, Pintz, and Yildirim, was able to yield the {m=1} case of Theorem 38.

The approach of Maynard and myself was a little different, based on multidimensional Selberg sieves such as

\displaystyle  \nu(n) := 1_{[x/2,x]}(n) \times

\displaystyle  \left( \sum_{d_i | n+h_i \forall i=1,\dots,k} \mu(d_1) \dots \mu(d_k) f( \frac{\log d_1}{\log R}, \dots, \frac{\log d_k}{\log R} ) \right)^2

for some multidimensional smooth function {f: [0,+\infty)^k \rightarrow {\bf R}} supported on the simplex {\{ (t_1,\dots,t_k): t_1+\dots +t_k \leq 1 \}}; the one-dimensional sieve considered by Zhang and Goldston-Pintz-Yildirim is then essentially the case when {f(t_1,\dots,t_k)} is a function just of {t_1+\dots+t_k} on this simplex. Actually, to avoid some (very minor) issues concerning common factors between the {n+h_i}, and also to essentially eliminate the role of the singular series, it is convenient to work with a slight modification

\displaystyle  \nu(n) := 1_{n=b\ (W)} 1_{[x/2,x]}(n) \times \ \ \ \ \ (46)

\displaystyle  \left( \sum_{d_i | n+h_i \forall i=1,\dots,k} \mu(d_1) \dots \mu(d_k) f( \frac{\log d_1}{\log R}, \dots, \frac{\log d_k}{\log R} ) \right)^2

of the above sieve, where {W = \prod_{p \leq w} p} for some {w} slowly growing to infinity with {x} (e.g. {w = \log\log\log x} will do), and {b\ (W)} is a residue class such that {b+h_i\ (W)} is primitive for all {i=1,\dots,k} (such a {b} exists thanks to the admissibility of {(h_1,\dots,h_k)} and the Chinese remainder theorem). Note that for {w} large enough, the {n+h_i} have no common factors, and so the {d_1,\dots,d_k} are automatically coprime to each other and to {W}. (In Maynard’s paper, the coefficients {f( \frac{\log d_1}{\log R}, \dots, \frac{\log d_k}{\log R} )} were replaced by a sequence {\rho_{d_1,\dots,d_k}} which was not directly arising from a smooth function {f}, but instead the analogue of the {y_{m_1,\dots,m_k}} coefficients from the preceding section were chosen to be of the form {F( \frac{\log d_1}{\log R}, \dots, \frac{\log d_k}{\log R} )} for some suitable function {F}. However, the differences between the two approaches are fairly minor.)

It is possible to estimate the right-hand side of (45) using a modification of the arguments of the previous section:

Exercise 40 Let {k, h_1,\dots,h_k, b, W, \theta, R, f, \nu} be as in the above discussion. Assume that {0 < \theta < 1}. If {w} goes to infinity at a sufficiently slow rate as {x \rightarrow \infty}, establish the asymptotic

\displaystyle  \sum_n \nu(n) = \left( \frac{W}{\phi(W)} \right)^k \frac{x}{2W\log^k R} (I(f) + o(1))

as {x \rightarrow \infty}, where

\displaystyle  I(f) := \int_{[0,+\infty)^k} f_{1,\dots,k}(t_1,\dots,t_k)^2\ dt_1 \dots dt_k

and {f_{1,\dots,k}} denotes the mixed partial derivative {\frac{\partial^k f}{\partial t_1 \dots \partial t_k}} in each of the {k} variables {t_1,\dots,t_k}.

Now we consider a term on the left-hand side, say {\sum_n \nu(n) 1_{n+h_k \hbox{ prime}}}. When {n+h_k} is prime, the quantity {d_k} in (46) is necessarily {1}, so (46) simplifies to a {k-1}-dimensional version of itself:

\displaystyle  \nu(n) = 1_{n=b\ (W)} 1_{[x/2,x]}(n) \times

\displaystyle \left( \sum_{d_i | n+h_i \forall i=1,\dots,k-1} \mu(d_1) \dots \mu(d_{k-1}) f( \frac{\log d_1}{\log R}, \dots, \frac{\log d_{k-1}}{\log R}, 0 ) \right)^2.

If {\theta < \frac{1}{2}}, one can control the effect of the weight {1_{n+h_k \hbox{ prime}}} using the Bombieri-Vinogradov theorem, and after some moderately complicated calculations one obtains

Exercise 41 Let {k, h_1,\dots,h_k, b, W, \theta, R, f, \nu} be as in the above discussion. Assume that {0 < \theta < 1/2}. If {w} goes to infinity at a sufficiently slow rate as {x \rightarrow \infty}, establish the asymptotic

\displaystyle  \sum_n \nu(n) 1_{n+h_k \in {\mathcal P}} = \left( \frac{W}{\phi(W)} \right)^k \frac{x}{2W\log^{k-1} R \log x} (J_k(f) + o(1))

as {x \rightarrow \infty}, where

\displaystyle  J_k(f) := \int_{[0,+\infty)^{k-1}} f_{1,\dots,k-1}(t_1,\dots,t_{k-1},0)^2\ dt_1 \dots dt_{k-1}

and {f_{1,\dots,k-1}} denotes the mixed partial derivative {\frac{\partial^{k-1} f}{\partial t_1 \dots \partial t_{k-1}}} in the {k-1} variables {t_1,\dots,t_{k-1}}. Similarly for permutations of the {1,\dots,k} indices.

In view of these two calculations, one can establish Maynard’s theorem as soon as one can find {k} and {f} for which

\displaystyle  \sum_{i=1}^k J_i(f) > \frac{m}{\theta/2} I(f)

where {J_i} is defined similarly to {J_k} but with the role of the {i} and {k} indices swapped. This is now a problem purely in calculus of variations rather than number theory. It is thus natural to take {\theta} close to {1/2} (though, for the qualitative analysis performed here, actually any non-zero choice of {\theta} independent of {k} will suffice). Our task is now to find {k,f} with

\displaystyle  \sum_{i=1}^k J_i(f) > 4 m I(f).

The optimal choice of {f} for a given {k} is not known for any {k>2}, but one can generate a “good enough” choice of {f} by the following set of calculations. Firstly, we restrict to the case when {f(t_1,\dots,t_k)} is symmetric in the {t_1,\dots,t_k}, so that all the {J_i} are equal and our objective is now to satisfy the inequality

\displaystyle  kJ_k(f) > 4 m I(f).

Next, we make the change of variables {F := f_{1,\dots,k}}, thus {F} is smooth on {[0,+\infty)^k} and supported on the simplex {\{t_1+\dots+t_k \leq 1\}} but is otherwise arbitrary on this simplex. From the fundamental theorem of calculus, the desired inequality now becomes

\displaystyle  k \int_{[0,+\infty)^{k-1}} (\int_0^\infty F(t_1,\dots,t_k)\ dt_k)^2 dt_1 \dots dt_{k-1}

\displaystyle > 4m \int_{[0,+\infty)^k} F(t_1,\dots,t_k)^2\ dt_1 \dots dt_k.

We select a function {F} of the truncated product form

\displaystyle  F(t_1,\dots,t_k) = k^{1/2} g(k t_1) \dots k^{1/2} g(k t_k) 1_{t_1 + \dots +t_k < 1}

where {g: [0,+\infty) \rightarrow {\bf R}^+} is a smooth non-negative function normalised so that

\displaystyle  \int_0^\infty g(t)^2\ dt = 1 \ \ \ \ \ (47)

so that

\displaystyle  \int_{[0,+\infty)^k} F(t_1,\dots,t_k)^2\ dt_1 \dots dt_k \leq 1

and our objective (after a rescaling) is now to get

\displaystyle  \int_{[0,+\infty)^{k-1}} \left(\int_0^\infty g(t_k) 1_{t_1+\dots+t_k \leq k}\ dt_k\right)^2 g(t_1)^2 dt_1 \dots g(t_{k-1})^2 dt_{k-1}

\displaystyle  > 4m.

The emergence of the function {g} as a parameter in the multidimensional sieve is a feature that is not present in previous sieves (at least not without sharply curtailing the support of {g}), and is the main reason why this sieve performs better than previous sieves for this problem.

We interpret the above optimisation problem probabilistically. Let {X_1,X_2,\dots} be independent random variables taking values in {[0,+\infty)}, each with probability density function {g(t)^2\ dt}. Our task is now to obtain the inequality

\displaystyle  \mathop{\bf E} \left(\int_0^{\max(k-X_1-\dots-X_{k-1},0)} g(t)\ dt\right)^2 > 4m.

Suppose that we can locate a fixed function {g} (independent of {k}) obeying (47) with

\displaystyle  \mathop{\bf E} X_i = \int_0^\infty t g(t)^2\ dt < 1 \ \ \ \ \ (48)

and

\displaystyle  \int_0^\infty g(t)\ dt = \infty \ \ \ \ \ (49)

Then from the law of large numbers, {k-X_1-\dots-X_{k-1}} goes to infinity almost surely as {k \rightarrow \infty}, and so from (49) and Fatou’s lemma one has

\displaystyle  \lim_{k \rightarrow \infty} \mathop{\bf E} \left(\int_0^{\max(k-X_1-\dots-X_{k-1},0)} g(t)\ dt\right)^2 = \infty

and the claim follows if {k} is sufficiently large depending on {m}.

We are thus reduced to the easy one-dimensional problem of producing a smooth function {g: [0,+\infty) \rightarrow {\bf R}} obeying the constraints (47), (48), (49). Indeed, one can verify that the choice

\displaystyle  g(t) := \frac{1}{(1+t) \log(2+t)}

(barely) obeys (49) with {\int_0^\infty g(t)^2\ dt} and {\int_0^\infty t g(t)^2\ dt} both finite, and the claim follows by a routine rescaling.
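For what it is worth, the relevant integrals for this choice of {g} are easy to inspect numerically. The sketch below (a crude trapezoid rule after the substitution {t = e^s - 1}, which tames all three integrands; the cutoffs are arbitrary) shows {\int g^2} and {\int t g^2} settling down to finite values, while {\int_0^T g} keeps creeping upwards like {\log\log T}.

```python
import numpy as np

# Numerical look at g(t) = 1/((1+t) log(2+t)).  Substituting t = e^s - 1 (so that
# dt = e^s ds and 2 + t = 1 + e^s) turns the integrals in (47)-(49) into well-behaved
# integrals in s, evaluated here with a simple trapezoid rule.

def trap(values, s):
    return float(np.sum((values[1:] + values[:-1]) * np.diff(s)) / 2)

def partial_integral(integrand, S, n=400_000):
    s = np.linspace(0.0, S, n)
    return trap(integrand(s), s)

L = lambda s: np.logaddexp(0.0, s)            # log(1 + e^s) = log(2 + t), computed stably

g_sq   = lambda s: np.exp(-s) / L(s)**2       # g(t)^2 dt, written in the s variable
t_g_sq = lambda s: -np.expm1(-s) / L(s)**2    # t g(t)^2 dt
g_only = lambda s: 1.0 / L(s)                 # g(t) dt

for S in (50, 500, 5000):                     # corresponds to T = e^S - 1
    print(f"S = {S:5d}:  int g^2 ~ {partial_integral(g_sq, S):.4f},  "
          f"int t g^2 ~ {partial_integral(t_g_sq, S):.4f},  "
          f"int g ~ {partial_integral(g_only, S):.2f}")
```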

Exercise 42 By working through a more quantitative version of the above argument, establish Theorem 38 with {m \gg \log k} as {k \rightarrow \infty}. (Hint: One can replace the use of the law of large numbers with Chebyshev’s inequality.) It is not currently known how to obtain this theorem with any choice of {m} that grows faster than logarithmically in {k}; the current record is {m = (\frac{1}{4} + \frac{7}{600}) \log k + O(1)}, due to the Polymath project. It is known, though, that bounds on the order of {\log k} are the limit of the Selberg sieve method, and one either needs new sieves, or new techniques beyond sieve theory, to increase {m} beyond this rate; see the Polymath paper for further discussion.

One can use the multidimensional sieve to study the classical sieving problem of counting prime {k}-tuples, but unfortunately the numerology it gives is precisely the same as the numerology given by previous sieves:

Exercise 43 Let {(h_1,\dots,h_k)} be an admissible tuple, and let {w} go to infinity sufficiently slowly as {x \rightarrow \infty}.

  • (i) Use Exercise 40, followed by an optimisation of the function {f} (or {F}) to establish the bound

    \displaystyle  |\{x/2 \leq n \leq x: n = b\ (W); n+h_1,\dots,n+h_k \in {\mathcal P}\}|

    \displaystyle \leq (2^k k! + o(1)) \left( \frac{W}{\phi(W)}\right)^k \frac{x}{2W\log^k x}

    as {x \rightarrow \infty}, uniformly for all {b\ (W)} with {b+h_1, \dots, b+h_k\ (W)} primitive. Use this to give an alternate proof of (39).

  • (ii) By using Exercise 41 instead of Exercise 40, repeat (i) but with {2^k k!} replaced by {4^{k-1} (k-1)!}.

— 4. The large sieve —

In the general discussion of sieving problems such as Problem 3 in previous sections, there was no structure imposed on the sets {E_p} beyond the fact that one could control the sums (9). However, as we have seen in practice, {E_p} is typically the union of some number {\omega(p)} of residue classes modulo {p}. It is possible to exploit this structure using Fourier analysis, leading to an approach to sieving known as the large sieve, and relying crucially on the analytic large sieve inequality from Notes 3. The large sieve tends to give results similar to that of the Selberg sieve; indeed, the two sieves are in some sense “dual” to each other and are powered by the same sort of {L^2}-based techniques (such as the Cauchy-Schwarz inequality). However, the large sieve methods can offer sharper control on error terms than the Selberg sieve if carried out carefully. As the name suggests, the large sieve is particularly useful in the regime when the number of residue classes modulo {p} being sieved out is large, for instance as large as a constant multiple of {p}. (When {\omega(p)} is very close to {p}, there is another sieve that gives even sharper results, namely the larger sieve of Gallagher, which we do not discuss further here.)

To apply the analytic large sieve to sieving problems, we need to exploit an “uncertainty principle” of Montgomery (discussed further in this previous blog post), that shows that functions which avoid certain residue classes in the physical domain are necessarily spread out in the frequency domain. More precisely, we have

Theorem 44 (Montgomery uncertainty principle) Let {f: {\bf Z} \rightarrow {\bf C}} be a finitely supported function, and let {\hat f: {\bf R}/{\bf Z} \rightarrow {\bf C}} be the Fourier transform {\hat f(\theta) := \sum_n f(n) e(n\theta)}, where {e(x) := e^{2\pi i x}}. Suppose that for each prime {p}, there are {\omega(p)} residue classes modulo {p} on which {f} vanishes identically for some {\omega(p) < p}. Then for any {\theta \in {\bf R}/{\bf Z}} and any natural number {q}, one has

\displaystyle  \sum_{a \in ({\bf Z}/q{\bf Z})^\times} |\hat f(\theta + \frac{a}{q})|^2 \geq h(q) |\hat f(\theta)|^2, \ \ \ \ \ (50)

where {h} is the multiplicative function

\displaystyle  h(q) := \mu^2(q) \prod_{p|q} \frac{\omega(p)}{p-\omega(p)}. \ \ \ \ \ (51)

Proof: We may assume that {q} is squarefree, since otherwise we have {h(q)=0} and the claim (50) is trivial. Observe from the Chinese remainder theorem that if the inequality (50) holds for two coprime values {q=q_1, q=q_2} of {q}, then it also holds for {q=q_1q_2} (as every fraction {\frac{a}{q_1 q_2}} in {{\bf R}/{\bf Z}} with {(a,q_1q_2)=1} can be uniquely written as {\frac{a_1}{q_1} + \frac{a_2}{q_2}} with {(a_1,q_1)=(a_2,q_2)=1}). Thus it suffices to verify the claim when {q} is prime. By multiplying {f(n)} by {e(-n\theta)} we can also reduce to the case {\theta=0}, thus our task is now to show that

\displaystyle  \sum_{a \in ({\bf Z}/p{\bf Z}) \backslash \{0\}} |\hat f(\frac{a}{p})|^2 \geq \frac{\omega(p)}{p-\omega(p)} |\hat f(0)|^2

or equivalently

\displaystyle  \sum_{a \in {\bf Z}/p{\bf Z}} |\hat f(\frac{a}{p})|^2 \geq \frac{p}{p-\omega(p)} |\hat f(0)|^2.

By the Plancherel identity in {{\bf Z}/p{\bf Z}}, the left-hand side can be written as

\displaystyle  p \sum_{b \in {\bf Z}/p{\bf Z}} |\sum_{n = b\ (p)} f(n)|^2

and the right-hand side is

\displaystyle  \frac{p}{p-\omega(p)} |\sum_{b \in {\bf Z}/p{\bf Z}} \sum_{n = b\ (p)} f(n)|^2.

Since the sum {\sum_{n = b\ (p)} f(n)} is only non-vanishing for at most {p-\omega(p)} choices of {b}, the claim follows from the Cauchy-Schwarz inequality. \Box
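Since the inequality (50) is completely explicit, it can be spot-checked directly; the following sketch does so for a small example (the interval length, the sieved residue classes, and the sample values of {q} and {\theta} are all arbitrary choices made purely for illustration).

```python
import cmath
from math import gcd

# Numerical spot-check of the Montgomery uncertainty principle (50): f is the indicator
# of those n in [1, N] avoiding 1 class mod 2, 2 classes mod 5 and 3 classes mod 7.

N = 3000
omega = {2: [0], 5: [0, 3], 7: [1, 2, 5]}
f = [0] * (N + 1)
for n in range(1, N + 1):
    if all(n % p not in classes for p, classes in omega.items()):
        f[n] = 1

def fhat(theta):
    return sum(f[n] * cmath.exp(2j * cmath.pi * n * theta) for n in range(1, N + 1))

def h(q):   # the multiplicative function (51); q is assumed squarefree here
    val = 1.0
    for p, classes in omega.items():
        if q % p == 0:
            val *= len(classes) / (p - len(classes))
            q //= p
    return val if q == 1 else 0.0   # h vanishes if some prime factor of q is not sieved

for q in (2, 5, 7, 10, 14, 35, 70):
    for theta in (0.0, 0.123, 0.456):
        lhs = sum(abs(fhat(theta + a / q)) ** 2 for a in range(1, q + 1) if gcd(a, q) == 1)
        rhs = h(q) * abs(fhat(theta)) ** 2
        assert lhs >= rhs - 1e-6 * (1 + rhs)
print("inequality (50) verified for the sample values of q and theta")
```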

We then obtain the following variant of Theorem 32 or Theorem 30:

Theorem 45 (Arithmetic large sieve) Let {Q \geq 1}. For each prime number {p \leq Q}, let {E_p} be the union of {\omega(p)} residue classes modulo {p}, where {\omega(p) < p} for all {p}. Then for any {M \in {\bf Z}} and natural number {N \geq 1}, we have

\displaystyle  |\{ n \in {\bf Z}: M < n \leq M+N\} \backslash \bigcup_{p \leq Q} E_p| \ll \frac{N+Q^2}{J} \ \ \ \ \ (52)

where

\displaystyle  J := \sum_{q \leq Q} h(q)

and {h} is given by (51).

Proof: Let {f} be the indicator function of the set in (52). From Theorem 44 with {\theta=0}, and {q} summed up to {Q}, one has

\displaystyle  \sum_{q \leq Q} \sum_{a \in ({\bf Z}/q{\bf Z})^\times} |\hat f(\frac{a}{q})|^2 \geq J |\hat f(0)|^2.

Now observe from the basic identity

\displaystyle  \frac{a_1}{q_1} - \frac{a_2}{q_2} = \frac{a_1 q_2 - a_2 q_1}{q_1 q_2}

that the Farey sequence {\{ \frac{a}{q}: q \leq Q; a \in ({\bf Z}/q{\bf Z})^\times\}} is {\frac{1}{Q^2}}-separated in {{\bf R}/{\bf Z}}. We may thus invoke the analytic large sieve inequality (see Proposition 6 of Notes 3) to conclude that

\displaystyle  \sum_{q \leq Q} \sum_{a \in ({\bf Z}/q{\bf Z})^\times} |\hat f(\frac{a}{q})|^2 \ll (N + Q^2) \sum_n |f(n)|^2.

But from the construction of {f}, the left-hand side of (52) is equal to both {\sum_n |f(n)|^2} and {\hat f(0)}, and the claim follows. \Box

Note that if one uses the sharp form of the large sieve inequality (Remark 7 from Notes 3) then one can sharpen (52) to

\displaystyle  |\{ n \in {\bf Z}: M < n \leq M+N\} \backslash \bigcup_{p \leq Q} E_p| \leq \frac{N+Q^2}{J}.

Setting {Q = N^{1/2-\varepsilon}}, one can then recover sharper versions of Theorem 32 and its consequences. For instance, by working along these lines (and using a very sharp version of the analytic large sieve inequality that exploits the variable spacing of the set of fractions {\frac{a}{q}}), Montgomery and Vaughan were able to completely delete the {\varepsilon} term in the Brun-Titchmarsh inequality (Exercise 33).
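As a concrete illustration of how (52) is used in practice, the following sketch applies the arithmetic large sieve (in the sharp form just mentioned, with no implied constant) to a twin prime type problem on an interval away from the origin: it computes {J} directly from (51) and compares the resulting bound with the actual number of survivors. The parameters {M, N} are arbitrary sample values, and no attempt is made to optimise {Q}.

```python
from math import isqrt

# Sieve (M, M+N] by the classes 0, -2 (mod p) for all p <= Q (omega(2) = 1,
# omega(p) = 2 otherwise), compute J = sum_{q <= Q} h(q) from (51), and compare
# the large sieve bound (N + Q^2)/J with the actual number of survivors.

M, N = 10**7, 10**6
Q = isqrt(N)

def primes_up_to(n):
    flags = [True] * (n + 1)
    flags[0], flags[1] = False, False
    for p in range(2, isqrt(n) + 1):
        if flags[p]:
            flags[p*p::p] = [False] * len(flags[p*p::p])
    return [p for p, f in enumerate(flags) if f]

plist = primes_up_to(Q)
omega = {p: (1 if p == 2 else 2) for p in plist}

# J = sum over squarefree q <= Q of prod_{p|q} omega(p)/(p - omega(p))
h = {1: 1.0}
for p in plist:
    for q, v in list(h.items()):
        if q * p <= Q:
            h[q * p] = v * omega[p] / (p - omega[p])
J = sum(h.values())

survive = [True] * N                  # survive[i] corresponds to n = M + 1 + i
for p in plist:
    for r in {0, (-2) % p}:
        start = (r - (M + 1)) % p
        survive[start::p] = [False] * len(survive[start::p])
count = sum(survive)

print("survivors in (M, M+N]      :", count)
print("large sieve bound (N+Q^2)/J:", round((N + Q**2) / J))
```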

Remark 46 It is a general phenomenon that large sieve methods tend to give the same main term as the Selberg sieve; see Section 9.5 of Friedlander-Iwaniec for further discussion.

— 5. The parity problem —

We have seen that sieve theoretic techniques are reasonably good at obtaining upper bounds to various sieving problems, and can also provide lower bounds if the sifting limit is low. However, these methods have proven to be frustratingly ineffective at providing non-trivial lower bounds for the original problems that motivated the development of sieve theory in the first place, namely counting patterns of primes such as twin primes. While we do not fully understand the exact limitations of what sieve theory can and cannot do, we do have a convincing heuristic barrier, known as the parity barrier or parity problem and first raised explicitly by Selberg, which explains why sieves have so much difficulty in lower bounding (or efficiently upper bounding) counts involving primes; in particular it explains why sieve-theoretic methods have thus far been unable to fully resolve any of the four Landau problems. The barrier is not completely rigorous, as it relies on the Möbius pseudorandomness principle (or Liouville pseudorandomness principle) discussed in Section 2 of Supplement 4. As such, our discussion in this section will be somewhat informal in nature; for instance, we will use vague symbols such as {\approx} without defining their meaning precisely.

Consider the general sieving problem in Problem 3, in which one tries to control the quantity {\sum_{n \not \in \bigcup_{p|P} E_p} a_n} via knowledge of the sums {\sum_{n \in E_d} a_n}, together with the hypothesis that the {a_n} are non-negative. In most situations, one expects this problem to be underdetermined; in particular, there will probably exist two different finitely supported sequences {(a_n)_{n \in {\bf Z}}}, {(b_n)_{n \in {\bf Z}}} of non-negative reals which are essentially indistinguishable with regards to the inputs of this sieve problem, in the sense that

\displaystyle  \sum_{n \in E_d} a_n \approx \sum_{n \in E_d} b_n \approx X_d

for all {d \in {\mathcal D}}, but which have very different outputs for this sieve problem, in that

\displaystyle  \sum_{n \not \in \bigcup_{p|P} E_p} a_n \not \approx \sum_{n \not \in \bigcup_{p|P} E_p} b_n .

Then any upper bound for Problem 3 with this choice of {E_d, X_d, {\mathcal D}, P} must exceed (at least approximately) the greater of {\sum_{n \not \in \bigcup_{p|P} E_p} a_n} and {\sum_{n \not \in \bigcup_{p|P} E_p} b_n}; similarly, any lower bound for this problem must be (approximately) less than the lesser of {\sum_{n \not \in \bigcup_{p|P} E_p} a_n} and {\sum_{n \not \in \bigcup_{p|P} E_p} b_n}.

As a special case of this observation, we have

Proposition 47 (Abstract parity barrier) (Informal) Let {E_d, X_d, {\mathcal D}, P} be as in Problem 3. Let {(a_n)_{n \in {\bf Z}}} be a finitely supported sequence of non-negative reals such that

\displaystyle  \sum_{n \in E_d} a_n \approx X_d

for all {d \in {\mathcal D}}. Suppose that we have an additional “parity function” {\omega: {\bf Z} \rightarrow \{-1,0,+1\}} with the property that

\displaystyle  \omega(n) = -1

for all {n \not \in \bigcup_{p|P} E_p} in the support of {a_n}, but such that

\displaystyle  \sum_{n \in E_d} a_n \omega(n) \approx 0

for all {d \in {\mathcal D}}. Then any upper bound for Problem 3 cannot be significantly smaller than {2 \sum_{n \not \in \bigcup_{p|P} E_p} a_n}, and any lower bound for Problem 3 cannot be significantly larger than zero. In particular, one does not expect to be able to show using such sieve methods that the support of {a_n} contains any appreciable number of elements outside of {\bigcup_{p|P} E_p}.

Proof: (Non-rigorous!) For the statement about the upper bound, apply the previous discussion to the sequence {b_n := a_n (1 - \omega(n))}. For the statement about the lower bound, apply the previous discussion to the sequence {b_n := a_n (1+\omega(n))}. \Box

Informally, the proposition asserts that if the sifted set {{\bf Z} \backslash \bigcup_{p|P} E_p} is sensitive to the parity function {\omega}, but the original pre-sifted sequence {a_n} and the sets {E_d} are not, then traditional sieve methods cannot give any non-trivial lower bounds, and any upper bounds must be off from the truth by a factor of at least two.

In practice, one can heuristically obey the hypotheses of this proposition if one takes {\omega(n)} to be some variant of the Liouville function {\lambda(n)}, which counts the parity of the number of prime factors of {n}, in which case the parity problem asserts (roughly speaking) that the sifting problem must allow both numbers with an even number of prime factors and numbers with an odd number of prime factors to survive if one is to have any hope of good lower bounds. For instance:

Corollary 48 (Parity barrier for twin primes) (Very informal) One cannot use traditional sieve methods to resolve the twin prime conjecture.

Proof: (Very non-rigorous! Also assumes Liouville pseudorandomness) As mentioned in the introduction, there are a number of different ways to try to approach the twin prime conjecture by sieve methods. Consider for instance the approach in which one starts with the primes {A} in {[x/2+2,x]} and removes the residue class {E_p = 2\ (p)} for each {p \leq \sqrt{x}}. The quantity {\lambda(n-2)} will then be {-1} on the sifted set {A \backslash \bigcup_{p|P} E_p} (since {n-2} will be prime on that set), while the Liouville pseudorandomness heuristic predicts that

\displaystyle  \sum_{n \in E_d} 1_A(n) \lambda(n-2) = \sum_{x/2+2 \leq n \leq x: n = 2\ (d)} 1_{\mathcal P}(n) \lambda(n-2)

is negligible for all {d \leq x} (say). The claim then follows from Proposition 47. \Box

Exercise 49 (Informal) Develop similar parity obstructions for the other sieve-theoretic approaches to the twin prime conjecture. In the case when one is sieving out the integers by two residue classes {0, -2\ (p)} for each prime {p}, argue that any sieve-theoretic bound must in fact be off from the truth by a factor of at least four. (Hint: experiment with weights such as {(1 - \lambda(n)) (1-\lambda(n+2))}.)

Exercise 50 (Informal) Argue why any sieve-theoretic upper bound on prime {k}-tuples that is based on equidistribution estimates for the natural numbers and primes must be off from the truth by a factor of at least {2^{k-1}}. (This should be compared with the known upper bounds of {2^k k!} and {4^{k-1} (k-1)!} discussed in previous sections. It is not clear at present which of these bounds is closest to the true limit of sieve theory.)

The above argument is far from watertight, and there have been successful attempts to circumvent the parity barrier. For instance:

  • (i) The parity barrier does not prohibit the possibility of producing a very small number of prime (or near-prime) patterns, even if one cannot hope to get lower bounds that are of the “right” order of magnitude. For instance, it was shown unconditionally by Goldston, Graham, Pintz, and Yildirim that there exist infinitely many pairs of consecutive numbers {n,n+1}, each of which are the product of exactly four primes. Cramér type random models (see Supplement 4) predict the number of such pairs up to {x} to be approximately {\frac{x}{\log^2 x}} (ignoring some factors of {\log\log x}), and the parity barrier prevents one from getting lower bounds of this magnitude; but the construction of Goldston, Graham, Pintz, and Yildirim only produces a quite sparse set of pairs (about {\frac{x}{\log^3 x}} or so) and so evades the parity barrier. Using a very different class of arguments, it is also possible to produce analogues of patterns like twin primes in the function field setting (in which the role of primes are replaced by irreducible polynomials over a finite field) by using very sparse sequences of polynomials for which irreducibility can be tested easily; see for instance this paper of Hall for the case of twin irreducible polynomials.
  • (ii) The parity barrier may disappear if one is somehow in a situation in which the Möbius or Liouville pseudorandomness principles break down. An extreme example of this is when one assumes the existence of one (or many) Siegel zeroes, which imply extremely high correlation between the Möbius or Liouville functions and a Dirichlet character, in strong violation of the pseudorandomness principle. A striking example of this case occurs in this paper of Heath-Brown, which demonstrates the truth of the twin prime conjecture if one assumes the existence of an infinite sequence of Siegel zeroes.
  • (iii) Traditional sieve theoretic methods only assume that one can control “linear” or “Type I” sums of the form {\sum_{n \in E_d} a_n}. However, in some cases it is also possible to control “bilinear” or “Type II” sums such as {\sum_n \sum_m \alpha_n \beta_m a_{nm}} for various coefficients {\alpha_n, \beta_m} of controlled size and support but uncontrolled phase. Note that multiplicative parity functions such as {\lambda} will be distinguishable with such sums, since {\sum_n \sum_m \alpha_n \beta_m \lambda(nm)} will not exhibit any cancellation if {\alpha_n, \beta_m} both oscillate with the sign of {\lambda(n), \lambda(m)} respectively; as such, the parity barrier is no longer in effect. In such situations it can be possible to go beyond the limits of traditional sieve theory, for instance by using decompositions similar to the Vaughan identity employed in Notes 3. Of course, one still has to actually establish the required bilinear sum asymptotics, and this typically requires some deep and non-trivial input outside of sieve theory. A model example here is the work of Friedlander and Iwaniec establishing the infinitude of primes of the form {a^2+b^4}. Here, the relevant bilinear input is obtained using the algebraic structure of the Gaussian integers and some non-trivial results on lattice counting problems.
  • (iv) Sieve theory is focused on “one-dimensional” sums, such as {\sum_{n \not \in \bigcup_{p|P} E_p} a_n}, or perhaps {\sum_{n \leq x} \Lambda(n) \Lambda(n+2)}. The presence of the parity problem relies on the ability to modify weights such as {a_n} or {\Lambda(n)} in such sums in such a fashion that the value of the sum dramatically changes. If one instead is trying to control a multi-dimensional sum, such as {\sum_{n,r \leq x} \Lambda(n) \Lambda(n+r) \Lambda(n+2r) \Lambda(n+3r)} (which roughly speaking counts arithmetic progressions of primes of length four), then there is no easy way to modify any of the factors of such a sum in the fashion required for the parity barrier to be in effect. For such multi-dimensional sums one can then hope to use additional estimates to produce non-trivial lower bounds. For instance, with Ben Green, we used Selberg sieves combined with the additional tool of Szemerédi’s theorem to obtain lower bounds for multi-dimensional sums similar to the one above, proving in particular that the primes contained arbitrarily long arithmetic progressions.

It is one of the major problems in analytic prime number theory to continue to find more ways to defeat the parity barrier, so that we can produce prime patterns in many more situations than what is currently possible. However, this has proven to be a surprisingly challenging task, and one which typically requires the injection of new ideas and methods external to sieve theory.

Remark 51 As mentioned previously, it is not known in general if the parity problem represents the only obstruction to sieve methods. However, under the additional hypothesis of the Elliott-Halberstam conjecture, and when focusing on “binary” problems such as the twin prime conjecture, there is a more detailed analysis of the parity problem due to Bombieri which indicates that, in some sense, the parity problem is indeed the only information not captured by sieve-theoretic methods, in that the asymptotic distribution of primes or almost primes in pairs {n,n+2} is completely determined by a single parameter {0 \leq \delta \leq 2} that represents the ratio between the actual number of twin primes and the expected number of twin primes; the parity problem prevents us from obtaining any further information about {\delta} other than that it lies between {0} and {2}, but once {\delta} is known, essentially all other asymptotics regarding the prime factorisation of {n} and {n+2} will also be known. See Chapter 16 of Friedlander-Iwaniec for further discussion.

Remark 52 For a slightly different way to formalise the parity problem, see this previous blog post. Among other things, it is shown in that post that the parity problem prevents one from proving Theorem 38 with any choice of {m,k} with {m > \lceil \frac{k}{2} \rceil} using purely sieve-theoretic methods, in particular ruling out a solution to the twin prime conjecture by sieve methods.