Many problems in non-multiplicative prime number theory can be recast as sieving problems. Consider for instance the problem of counting the number of pairs of twin primes contained in for some large ; note that the claim that for arbitrarily large is equivalent to the twin prime conjecture. One can obtain this count by any of the following variants of the sieve of Eratosthenes:
- Let be the set of natural numbers in . For each prime , let be the union of the residue classes and . Then is the cardinality of the sifted set .
- Let be the set of primes in . For each prime , let be the residue class . Then is the cardinality of the sifted set .
- Let be the set of primes in . For each prime , let be the residue class . Then is the cardinality of the sifted set .
- Let be the set . For each prime , let be the residue class Then is the cardinality of the sifted set .
Exercise 1 Develop similar sifting formulations of the other three Landau problems.
In view of these sieving interpretations of number-theoretic problems, it becomes natural to try to estimate the size of sifted sets for various finite sets of integers, and subsets of integers indexed by primes dividing some squarefree natural number (which, in the above examples, would be the product of all primes up to ). As we see in the above examples, the sets in applications are typically the union of one or more residue classes modulo , but we will work at a more abstract level of generality here by treating as more or less arbitrary sets of integers, without caring too much about the arithmetic structure of such sets.
It turns out to be conceptually more natural to replace sets by functions, and to consider the more general the task of estimating sifted sums
for some finitely supported sequence of non-negative numbers; the previous combinatorial sifting problem then corresponds to the indicator function case . (One could also use other index sets here than the integers if desired; for much of sieve theory the index set and its subsets are treated as abstract sets, so the exact arithmetic structure of these sets is not of primary importance.)
Continuing with twin primes as a running example, we thus have the following sample sieving problem:
where , is the product of all the primes strictly less than (we omit itself for minor technical reasons), and is the union of the residue classes . Obtain upper and lower bounds on which are as strong as possible in the asymptotic regime where goes to infinity and the sifting level grows with (ideally we would like to grow as fast as ).
From the preceding discussion we know that the number of twin prime pairs in is equal to , if is not a perfect square; one also easily sees that the number of twin prime pairs in is at least , again if is not a perfect square. Thus we see that a sufficiently good answer to Problem 2 would resolve the twin prime conjecture, particularly if we can get the sifting level to be as large as .
We return now to the general problem of estimating (1). We may expand
where (with the convention that ). We thus arrive at the Legendre sieve identity
Specialising to the case of an indicator function , we recover the inclusion-exclusion formula
Such exact sieving formulae are already satisfactory for controlling sifted sets or sifted sums when the amount of sieving is relatively small compared to the size of . For instance, let us return to the running example in Problem 2 for some . Observe that each in this example consists of residue classes modulo , where is defined to equal when and when is odd. By the Chinese remainder theorem, this implies that for each , consists of residue classes modulo . Using the basic bound
Also, the number of divisors of is at most . From the Legendre sieve (3), we thus conclude that
We can factorise the main term to obtain
coming from the equidistribution of residues principle (Section 3 of Supplement 4), bearing in mind (from the modified Cramér model, see Section 1 of Supplement 4) that we expect this heuristic to become inaccurate when becomes very large. We can simplify the right-hand side of (7) by recalling the twin prime constant
(see equation (7) from Supplement 4); note that
so from Mertens’ third theorem (Theorem 42 from Notes 1) one has
when with . This is somewhat encouraging for the purposes of getting a sufficiently good answer to Problem 2 to resolve the twin prime conjecture, but note that is currently far too small: one needs to get as large as before one is counting twin primes, and currently can only get as large as .
The problem is that the number of terms in the Legendre sieve (3) basically grows exponentially in , and so the error terms in (4) accumulate to an unacceptable extent once is significantly larger than . An alternative way to phrase this problem is that the estimate (4) is only expected to be truly useful in the regime ; on the other hand, the moduli appearing in (3) can be as large as , which grows exponentially in by the prime number theorem.
To resolve this problem, it is thus natural to try to truncate the Legendre sieve, in such a way that one only uses information about the sums for a relatively small number of divisors of , such as those which are below a certain threshold . This leads to the following general sieving problem:
Problem 3 (General sieving problem) Let be a squarefree natural number, and let be a set of divisors of . For each prime dividing , let be a set of integers, and define for all (with the convention that ). Suppose that is an (unknown) finitely supported sequence of non-negative reals, whose sums
are known for all . What are the best upper and lower bounds one can conclude on the quantity (1)?
Here is a simple example of this type of problem (corresponding to the case , , , , and ):
and give counterexamples to show that these bounds cannot be improved in general, even when is an indicator function sequence.
Problem 3 is an example of a linear programming problem. By using linear programming duality (as encapsulated by results such as the Hahn-Banach theorem, the separating hyperplane theorem, or the Farkas lemma), we can rephrase the above problem in terms of upper and lower bound sieves:
Theorem 5 (Dual sieve problem) Let be as in Problem 3. We assume that Problem 3 is feasible, in the sense that there exists at least one finitely supported sequence of non-negative reals obeying the constraints in that problem. Define an (normalised) upper bound sieve to be a function of the form
for some coefficients , and obeying the pointwise upper bound
for all . Thus for instance and are (trivially) upper bound sieves and lower bound sieves respectively.
- (i) The supremal value of the quantity (1), subject to the constraints in Problem 3, is equal to the infimal value of the quantity , as ranges over all upper bound sieves.
- (ii) The infimal value of the quantity (1), subject to the constraints in Problem 3, is equal to the supremal value of the quantity , as ranges over all lower bound sieves.
Proof: We prove part (i) only, and leave part (ii) as an exercise. Let be the supremal value of the quantity (1) given the constraints in Problem 3, and let be the infimal value of . We need to show that .
We first establish the easy inequality . If the sequence obeys the constraints in Problem 3, and is an upper bound sieve, then
and hence (by the non-negativity of and )
taking suprema in and infima in we conclude that .
Now suppose for contradiction that , thus for some real number . We will argue using the hyperplane separation theorem; one can also proceed using one of the other duality results mentioned above. (See this previous blog post for some discussion of the connections between these various forms of linear duality.) Consider the affine functional
on the vector space of finitely supported sequences of reals. On the one hand, since , this functional is positive for every sequence obeying the constraints in Problem 3. Next, let be the space of affine functionals of the form
for some real numbers , some non-negative function which is a finite linear combination of the for , and some non-negative . This is a closed convex cone in a finite-dimensional vector space ; note also that lies in . Suppose first that , thus we have a representation of the form
for any finitely supported sequence . Comparing coefficients, we conclude that
for any (i.e., is an upper bound sieve), and also
and thus , a contradiction. Thus lies outside of . But then by the hyperplane separation theorem, we can find an affine functional on that is non-negative on and negative on . By duality, such an affine functional takes the form for some finitely supported sequence and (indeed, can be supported on a finite set consisting of a single representative for each atom of the finite -algebra generated by the ). Since is non-negative on the cone , we see (on testing against multiples of the functionals or ) that the and are non-negative, and that for all ; thus is feasible for Problem 3. Since is negative on , we see that
and thus , giving the desired contradiction.
Exercise 6 Prove part (ii) of the above theorem.
Exercise 7 Show that the infima and suprema in the above theorem are actually attained (so one can replace “infimal” and “supremal” by “minimal” and “maximal” if desired).
Exercise 8 What are the optimal upper and lower bound sieves for Exercise 4?
In the case when consists of all the divisors of , we see that the Legendre sieve is both the optimal upper bound sieve and the optimal lower bound sieve, regardless of what the quantities are. However, in most cases of interest, will only be some strict subset of the divisors of , and there will be a gap between the optimal upper and lower bounds.
Observe that a sequence of real numbers will form an upper bound sieve if one has the inequalities
for all ; we will refer to such sequences as upper bound sieve coefficients. (Conversely, if the sets are in “general position” in the sense that every set of the form for is non-empty, we see that every upper bound sieve arises from a sequence of upper bound sieve coefficients.) Similarly, a sequence of real numbers will form a lower bound sieve if one has the inequalities
for all with ; we will refer to such sequences as lower bound sieve coefficients.
where is the number of prime factors of , is a sequence of upper bound sieve coefficients for even , and a sequence of lower bound sieve coefficients for odd . Deduce the Bonferroni inequalities
when is odd, whenever one is in the situation of Problem 3 (and contains all with ). The resulting upper and lower bound sieves are sometimes known as Brun pure sieves. The Legendre sieve can be viewed as the limiting case when .
In many applications the sums in (9) take the form
for some quantity independent of , some multiplicative function with , and some remainder term whose effect is expected to be negligible on average if is restricted to be small, e.g. less than a threshold ; note for instance that (5) is of this form if for some fixed (note from the divisor bound, Lemma 23 of Notes 1, that if ). We are thus led to the following idealisation of the sieving problem, in which the remainder terms are ignored:
Thus, for instance, the trivial upper bound sieve and the trivial lower bound sieve show that (14) can equal and (15) can equal . Of course, one hopes to do better than these trivial bounds in many situations; usually one can improve the upper bound quite substantially, but improving the lower bound is significantly more difficult, particularly when is large compared with .
If the remainder terms in (13) are indeed negligible on average for , then one expects the upper and lower bounds in Problem 3 to essentially be the optimal bounds in (14) and (15) respectively, multiplied by the normalisation factor . Thus Problem 10 serves as a good model problem for Problem 3, in which all the arithmetic content of the original sieving problem has been abstracted into two parameters and a multiplicative function . In many applications, will be approximately on the average for some fixed , known as the sieve dimension; for instance, in the twin prime sieving problem discussed above, the sieve dimension is . The larger one makes the level of distribution compared to , the more choices one has for the upper and lower bound sieves; it is thus of interest to obtain equidistribution estimates such as (13) for as large as possible. When the sequence is of arithmetic origin (for instance, if it is the von Mangoldt function ), then estimates such as the Bombieri-Vinogradov theorem, Theorem 17 from Notes 3, turn out to be particularly useful in this regard; in other contexts, the required equidistribution estimates might come from other sources, such as homogeneous dynamics, or the theory of expander graphs (the latter arises in the recent theory of the affine sieve, discussed in this previous blog post). However, the sieve-theoretic tools developed in this post are not particularly sensitive to how a certain level of distribution is attained, and are generally content to use sieve axioms such as (13) as “black boxes”.
In some applications one needs to modify Problem 10 in various technical ways (e.g. in altering the product , the set , or the definition of an upper or lower sieve coefficient sequence), but to simplify the exposition we will focus on the above problem without such alterations.
Exercise 11 Let be as in Problem 10, and set .
- (i) Show that the quantity (14) is always at least when is a sequence of upper bound sieve coefficients. Similarly, show that the quantity (15) is always at most when is a sequence of lower bound sieve coefficients. (Hint: compute the expected value of when is a random factor of chosen according to a certain probability distribution depending on .)
- (ii) Show that (14) and (15) can both attain the value of when . (Hint: translate the Legendre sieve to this setting.)
The problem of finding good sequences of upper and lower bound sieve coefficients in order to solve problems such as Problem 10 is one of the core objectives of sieve theory, and has been intensively studied. This is more of an optimisation problem rather than a genuinely number theoretic problem; however, the optimisation problem is sufficiently complicated that it has not been solved exactly or even asymptotically, except in a few special cases. (It can be reduced to a optimisation problem involving multilinear integrals of certain unknown functions of several variables, but this problem is rather difficult to analyse further; see these lecture notes of Selberg for further discussion.) But while we do not yet have a definitive solution to this problem in general, we do have a number of good general-purpose upper and lower bound sieve coefficients that give fairly good values for (14), (15), often coming within a constant factor of the idealised value , and which work well for sifting levels as large as a small power of the level of distribution . Unfortunately, we also know of an important limitation to the sieve, known as the parity problem, that prevents one from taking as large as while still obtaining non-trivial lower bounds; as a consequence, sieve theory is not able, on its own, to sift out primes for such purposes as establishing the twin prime conjecture. However, it is still possible to use these sieves, in conjunction with additional tools, to produce various types of primes or prime patterns in some cases; examples of this include the theorem of Ben Green and myself in which an upper bound sieve is used to demonstrate the existence of primes in arbitrarily long arithmetic progressions, or the more recent theorem of Zhang in which (among other things) used an upper bound sieve was used to demonstrate the existence of infinitely many pairs of primes whose difference was bounded. In such arguments, the upper bound sieve was used not so much to count the primes or prime patterns directly, but to serve instead as a sort of “container” to efficiently envelop such prime patterns; when used in such a manner, the upper bound sieves are sometimes known as enveloping sieves. If the original sequence was supported on primes, then the enveloping sieve can be viewed as a “smoothed out indicator function” that is concentrated on almost primes, which in this context refers to numbers with no small prime factors.
In a somewhat different direction, it can be possible in some cases to break the parity barrier by assuming additional equidistribution axioms on the sequence than just (13), in particular controlling certain bilinear sums involving rather than just linear sums of the . This approach was in particular pursued by Friedlander and Iwaniec, leading to their theorem that there are infinitely many primes of the form .
The study of sieves is an immense topic; see for instance the recent 527-page text by Friedlander and Iwaniec. We will limit attention to two sieves which give good general-purpose results, if not necessarily the most optimal ones:
- (i) The beta sieve (or Rosser-Iwaniec sieve), which is a modification of the classical combinatorial sieve of Brun. (A collection of sieve coefficients is called combinatorial if its coefficients lie in .) The beta sieve is a family of upper and lower bound combinatorial sieves, and are particularly useful for efficiently sieving out all primes up to a parameter from a set of integers of size , in the regime where is moderately large, leading to what is sometimes known as the fundamental lemma of sieve theory.
- (ii) The Selberg upper bound sieve, which is a general-purpose sieve that can serve both as an upper bound sieve for classical sieving problems, as well as an enveloping sieve for sets such as the primes. (One can also convert the Selberg upper bound sieve into a lower bound sieve in a number of ways, but we will only touch upon this briefly.) A key advantage of the Selberg sieve is that, due to the “quadratic” nature of the sieve, the difficult optimisation problem in Problem 10 is replaced with a much more tractable quadratic optimisation problem, which can often be solved for exactly.
Remark 12 It is possible to compose two sieves together, for instance by using the observation that the product of two upper bound sieves is again an upper bound sieve, or that the product of an upper bound sieve and a lower bound sieve is a lower bound sieve. Such a composition of sieves is useful in some applications, for instance if one wants to apply the fundamental lemma as a “preliminary sieve” to sieve out small primes, but then use a more precise sieve like the Selberg sieve to sieve out medium primes. We will see an example of this in later notes, when we discuss the linear beta-sieve.
We will also briefly present the (arithmetic) large sieve, which gives a rather different approach to Problem 3 in the case that each consists of some number (typically a large number) of residue classes modulo , and is powered by the (analytic) large sieve inequality of the preceding section. As an application of these methods, we will utilise the Selberg upper bound sieve as an enveloping sieve to establish Zhang’s theorem on bounded gaps between primes. Finally, we give an informal discussion of the parity barrier which gives some heuristic limitations on what sieve theory is able to accomplish with regards to counting prime patters such as twin primes.
These notes are only an introduction to the vast topic of sieve theory; more detailed discussion can be found in the Friedlander-Iwaniec text, in these lecture notes of Selberg, and in many further texts.
— 1. Combinatorial sieving and the fundamental lemma of sieve theory —
We begin with a discusion of combinatorial upper and lower bound sieves, in which the sieve coefficients take values in . These sieves, first introduced by Brun, can be viewed as truncations of the Legendre sieve (which corresponds to the choice of coefficients for all ). The Legendre sieve, and more generally the Brun pure sieves in Exercise 9, give basic examples of combinatorial sieves.
We loosely follow the discussion of Friedlander and Iwaniec. The combinatorial sieves may be motivated by observing the Buchstab identity
There is an analogous identity for the function :
Exercise 13 For any arithmetic function , show that
How is this identity related to (16)?
is an upper bound sieve; similarly, if is a family of upper bound sieves for each , then
is a lower bound sieve. In terms of sieve coefficients, we see that if are lower bound sieve coefficients for each , then
is a sequence of upper bound sieve coefficients (where denotes the largest prime factor of ). Similarly, if are upper bound sieve coefficients for each , then
is a sequence of lower bound sieve coefficients.
One can iterate this procedure to produce an alternating sequence of upper and lower bound sieves. If one iterates out the Buchstab formula completely, one ends up back at the Legendre sieve. However, one can truncate this iteration by using the trivial lower bound sieve of for some of the primes . For instance, suppose one seeks an upper bound for . Applying (16), we only save some of the summands, say those which obey some predicate to be chosen later. For the remaining summands, we use the trivial lower bound sieve of , giving
For the surviving summands, we apply (16) again. With the sign change, the trivial lower bound sieve is not applicable, so we do not discard any further summands and arrive at
For the summands in the second sum on the right, we apply (16) once again. We once again have a favourable sign and can use the trivial lower bound sieve of to discard some of the resulting summands, keeping only those triples that obey some predicate , giving
Since one cannot have an infinite descent of primes, this process will eventually terminate to produce an upper bound sieve. A similar argument (but with the signs reversed) gives a lower bound sieve. We formalise this discussion as follows:
Proposition 14 (General combinatorial sieve) Let . For each natural number , let be a predicate pertaining to a sequence of decreasing primes, thus is either true or false for a given choice of such primes. Let (resp. ) denote the set of which, when factored as for , are such that holds for all odd (resp. even) . Thus
- (i) For any sets , is an upper bound sieve, and is a lower bound sieve. In fact, we have the more precise identities
where denotes the least prime factor of and, for each natural number , is the collection of products with such that holds for all with the same parity as , but fails for . Thus for instance
- (ii) The sequences and are upper and lower sieve coefficient sequences respectively.
- (iii) For any multiplicative function , one has the identities
Note that the Legendre sieve is the special case of the combinatorial sieve when all the predicates are true for all choices of inputs (i.e. one does not ever utilise the trivial lower bound sieve); more generally, the Brun pure sieve with parameter corresponds to the case when is always true for and always false for . The predicates for odd are only used for the upper bound sieve and not the lower bound sieve; conversely, the predicates for even are only used for the lower bound sieve and not the upper bound sieve.
Exercise 15 Prove Proposition 14.
Exercise 16 Interpret the sieves in Exercise 8 as combinatorial sieves. What are the predicates in this case?
We now use the above proposition to attack Problem 10 for a given choice of parameters and ; we will focus on the regime when
for some , thus we are sifting out primes up to a small power of the sieve level . To do this, one needs to select the predicates in such a fashion that the sets only consist of divisors that are less than or equal to . There are a number of ways to achieve this, but experience has shown that a good general-purpose choice in this regard is to define to be the predicate
to be chosen later. The combinatorial sieve with this choice of predicate is known as the beta sieve or Rosser-Iwaniec sieve at sieve level . Observe that if lies in for some and , then at least one of or will hold, and hence since . If , we have the same conclusion thanks to (19) since . Thus under the hypotheses (21), (19), the beta sieves are indeed upper and lower bound sieves at level .
for all primes and some fixed depending only on .
We conclude that the condition is automatically satisfied when , and so the summand in (23) can be restricted to the range .
Next, note that if for some , then from (20) we have
for all with the same parity as , which implies (somewhat crudely) that
for all , regardless of parity (note that the case follows since ). We rearrange this inequality as
which iterates to give
On the other hand, from the definition of we have
which leads to a lower bound on :
so in particular
We can thus estimate (23) somewhat crudely as
We can estimate
thanks to (22) and the crude bound coming from the Taylor expansion of . If we choose large enough so that
(note the right-hand side decays like as ), we thus can estimate (23) by
which sums to
for any multiplicative function with for all primes , obeying the bounds (22).
Informally, the fundamental lemma shows that one can get exponentially close to the Legendre sieve limit of as long as the sieve level is a large power of the sifting range .
Proof: Without loss of generality we may assume that is sufficiently large depending on , since for small we may simply reduce (for the purposes of the upper bound sieve) or use the trivial lower bound sieve . But then we may find obeying (24) and less than , and the claim follows.
Exercise 18 Improve the error in (25) to for any . (It is possible to show that this error term is essentially best possible, at least in the linear dimension case , by using asymptotics for rough numbers (Exercise 28 of Supplement 4), but we will not do so here.)
The fundamental lemma is often used as a preliminary sifting device to remove small primes, in preparation for a more sophisticated sieve to deal with medium primes. But one can of course use it directly. Indeed, thanks to Theorem 5, we now have an adequate answer to Problem 3 in the regime where the sieve dimension is fixed and the sifting level is small compared to the level of distribution:
Corollary 19 (Fundamental lemma, applied directly) Let for some , and suppose that is a multiplicative function obeying the bounds (22) for all , where . For each , let be a set of integers, and let is a finitely supported sequence of non-negative reals such that
for all square-free , where and , and . Then
We now specialise to the twin prime problem in Problem 2. Here, , and where for odd , with . In particular we may take . We set for some fixed , and for some quantity going sufficiently slowly to infinity, so . From (8) we have
and from (5) and the divisor bound we have
for any , as . From the fundamental lemma, we conclude that
which implies from the sieve of Eratosthenes that there are pairs of twin primes in . This implies a famous theorem of Brun:
Exercise 20 (Brun’s theorem) Show that the sum is convergent.
Alternatively, if one applies (26) with a sufficiently large natural number, we see that
for all sufficiently large . This implies that there are numbers in with the property that are both not divisible by any prime less than or equal to . In particular (if ), and both have at most prime factors. We conclude a version of the twin prime conjecture for almost primes:
Exercise 22 By going through the beta sieve estimates carefully, show that one can take in the above corollary. (One can do substantially better if one uses more sophisticated sieve estimates, and of course Chen’s theorem tells us that can be taken as low as ; we will review the proof of Chen’s theorem in subsequent notes.)
In the above corollary, both elements of the pair are permitted to be almost prime rather than prime. It is possible to use the fundamental lemma to improve upon this by forcing (for instance) to be prime. To do this, we run the twin prime sieve differently; rather than start with all integers and sieve out two residue classes modulo each prime , we instead start with the primes and sieve out one residue class modulo . More precisely, we consider the quantity
where is the residue class and
(say). Observe that the sum is only as large as at worst for even . For odd , we expect from the prime number theorem in arithmetic progressions that
for any fixed . We thus take for some small fixed and , and apply Corollary 19 to conclude that
For a sufficiently large fixed constant, we conclude from Mertens’ theorem that
for large enough. The contribution for those for which is a power of a prime, rather than a prime, is easily seen to be negligible, and we conclude that there are primes less than such that has no factors less than , and thus is the product of at most primes. We thus have the following improvement of Corollary 21:
Proposition 23 There is an absolute constant such that there are infinitely many primes such that is the product of at most primes.
More generally, the fundamental lemma (combined with equidistribution results such as the Bombieri-Vinogradov theorem) lets one prove analogues of various unsolved conjectures about primes, in which one either (a) is content with upper bounds of the right order of magnitude, or (b) is willing to replace primes with a suitable notion of almost prime. We give some examples in the exercises below.
Exercise 24 (Approximations to the prime -tuples conjecture) Let be an admissible -tuple of integers for some , thus the are all distinct, and for any , the number of residue classes occupied by is never equal to . Let denote the singular series
- (i) Show that for all sufficiently large . (This should be compared with the Hardy-Littlewood prime tuples conjecture as , see Exercise 13 of Supplement 4.) Thus the fundamental lemma tells us that the prime tuples conjecture is only off from the truth by a multiplicative factor of , at least as far as upper bounds are concerned.
- (ii) Show that if is sufficiently large depending on , then there are infinitely many such that are each the product of at most prime factors.
- (iii) Strengthen (ii) to include the requirement that (say) is prime.
Exercise 25 (Approximations to the even Goldbach conjecture) If is an even natural number, define the singular series
- (i) Show that the number of ways to write as the sum of two primes is if is sufficiently large. (This should be compared with the prediction of as , see Exercise 12 of Supplement 4.)
- (ii) Show that if is a sufficiently large natural number, then every sufficiently large even number is expressible as the sum of two natural numbers , each of which are the product of at most prime factors.
- (iii) Strengthen (ii) to include the requirement that is prime.
Exercise 26 (Approximations to Legendre’s conjecture)
- (i) Show that for any , the number of primes in the interval is at most .
- (ii) Show that if is a sufficiently large natural number, then for all sufficiently large , there is at least one natural number between and that is the product of at most primes.
Exercise 27 (Approximations to Landau’s fourth problem) Define the singular series
where is the number of residue classes such that .
- (i) Show that the number of primes of the form with is for sufficiently large . (This should be compared with the prediction , see Exercise 15 of Supplement 4.)
- (ii) Show that if is a sufficiently large natural number, then there are an infinite number of numbers of the form that are the product of at most primes.
— 2. The Selberg upper bound sieve —
We now turn to the Selberg upper bound sieve, which is a simple but remarkably useful general-purpose upper bound sieve. We again follow the discussion of Friedlander and Iwaniec.
The idea of the sieve comes from the following trivial observation: if is a squarefree number, is a set of integers for each , and are arbitrary real numbers with , then the function
for is a sequence of upper bound sieve coefficients, where is the least common multiple of . If we set and assume that the are only non-zero for , then this sequence of sieve coefficients will only be non-zero on . We will refer to sieves (and sieve coefficients) constructed by this method as Selberg upper bound sieves (or Selberg upper bound sieve coefficients); they are also referred to as sieves in the literature.
A key advantage of the Selberg sieve is that the coefficients for are completely unconstrained within the real numbers, and the sieve depends in a quadratic fashion on these coefficients, so the problem of optimising the Selberg sieve for a given application usually reduces to an unconstrained quadratic optimisation problem, which can often be solved exactly in many cases. Of course, the optimal Selberg sieve need not be the globally optimal sieve amongst all upper bound sieves; but in practice the Selberg sieve tends to give quite good results even if it might not be the globally optimal choice.
Selberg sieves can be used for many purposes, but we begin with the classical sieve problem posed in Problem 10. To avoid some degeneracies we will assume that
for all . Restricting to upper bound sieve coefficients of the Selberg form, the quantity (14) becomes
Observe that any can be expressed as , with , in which case . By multiplicativity, we may thus write (30) as
which we can rearrange as as
The inner sum almost factorises as a square, but we have the coprimality constraint to deal with. The standard trick for dealing with this constraint is Möbius inversion:
inserting this and writing , , we can now write (30) as
which now does factor as a sum of squares:
Writing , this becomes
and is defined on factors of by the Dirichlet convolution
The coefficients are a transform of the original coefficients . For instance, if was replaced by a single prime , then the coefficients would be related to the coefficients by the transform
which is inverted as
More generally, we have the following inversion formula:
where , with equality precisely when for . Inserting this back into (33), we arrive at the optimised Selberg sieve coefficients
and the quantity (14) is equal to .
We have a pleasant size bound on the coefficients :
Lemma 29 We have for all .
Proof: Writing , we have
since one can compute that . The claim follows.
The upper bound sieve coefficients in (28) is then bounded by
Theorem 30 (Selberg sieve upper bound) Let be a squarefree natural number. For each prime dividing , let be a set of integers. Let be a multiplicative function with for all , and define the multiplicative function by (32), extending by zero outside of the factors of . Let for some . Suppose that is a finitely supported sequence of non-negative reals, such that
Remark 31 To compare this bound against the “expected” value of , observe from Euler products that
The quantity can often be computed in practice using standard methods for controlling sums of multiplicative functions, such as Theorem 27 of Notes 1. We illustrate this with the following sample application of the Selberg sieve:
and in particular
By the divisor bound and the choice of we have
so it will suffice to show (after adjusting ) that
But from Theorem 27(ii) of Notes 1 (with replaced by ), the left-hand side is
Since , we see that , and the claim follows (after adjusting appropriately).
Exercise 33 (Brun-Titchmarsh inequality) If , and is sufficiently large depending on , establish the Brun-Titchmarsh inequality
for all , where is the number of primes less than . More generally, if , is a primitive residue class, and is sufficiently large depending on , establish the Brun-Titchmarsh inequality
for all , where is the number of primes less than that lie in . Comparing this with the prime number theorem in arithmetic progressions, we see that the Brun-Titchmarsh inequality is off from the truth by a factor of two when ; however, the Brun-Titchmarsh inequality is applicable even when is extremely large compared with , whereas the prime number theorem gives no non-trivial bound in this regime. The can in fact be deleted in these inequalities, a result of Montgomery and Vaughan (using a refinement of the large sieve, discussed below).
As with the fundamental lemma, one can use the Selberg sieve together with the Bombieri-Vinogradov theorem to sift out a set of primes, rather than a set of intervals:
Exercise 35 (Sieving primes) Let be fixed natural numbers, and let . For each prime number , let be the union of primitive residue classes modulo , where for all , and for all . Then for any , one has
where is the singular series
(You will need Exercise 23 of Notes 3 to control the terms arising from the remainder term.) If one assumes the Elliott-Halberstam conjecture (see Exercise 22 of Notes 3), show that the factor of here can be lowered to .
Exercise 36 Show that the quantity in (39) may be replaced by , which is a superior bound for . (It remains an open question to improve upon the bound without assuming any further unproven conjectures.) Assuming the Elliott-Halberstam conjecture, show that one can replace instead by .
We have approached the analysis of the Selberg sieve as a “discrete” optimisation problem, in which one is trying to optimise the parameters or that are indexed by the discrete set . It is however also instructive to consider a “continuous” version of the optimisation problem, in which the coefficients or are described by a continuous function of or ; this is a slightly more restrictive class of sieves, but turns out to lead to numerical bounds that come very close to the discrete optimum, and allow for the use of methods from calculus of variations to come into play when using the Selberg sieve in more sophisticated ways. (See these lecture notes of Selberg for further discussion of how the continuous problem serves as a good approximation of the discrete one.) We give a basic illustration of this by presenting an alternate proof of (a very slightly weaker form of) Theorem 32 in which we do not use the choice (35) for the sieve weights , but instead choose weights of the form
for some smooth compactly supported function that vanishes on , and which equals at the origin; compare this smoothly truncated version of the Möbius function with the more abrupt truncations of the Möbius function that arise in combinatorial sieving. Such cases of the Selberg sieve were studied by several authors, most notably Goldston, Pintz, and Yildirim (although they made a polynomial on , rather than a smooth function). As in the previous proof of Theorem 32, we take , , , , , and . The quantity (30) (or (14)) is then equal to
We can remove the constraints , since the summands vanish outside of this range. To estimate this sum we use an argument related to that used to prove Proposition 9 from Notes 2. We first use Fourier inversion, applied to the function , to obtain a representation of the form
and so by Fubini’s theorem (using the divisor bound ) we may write (30) as
We have the crude upper bound
which by the asymptotic for (see equation (22) of Notes 1) implies that
Since , we can factor
where is the local Euler factor
The point of this factorisation is that, thanks to a routine but somewhat tedious Taylor expansion, one has the asymptotics
for any complex with imaginary part less than in magnitude, and hence (by the Cauchy integral formula)
for any complex with imaginary part less than in magnitude. By the fundamental theorem of calculus, we thus have
when are real. Also, observe from (38) that
From this and the standard asymptotic for close to , we have
and thus we can write (44) as
From the rapid decrease of , we may now remove the constraints on and write this as
To handle the weight , we observe from dividing (42) by and then differentiating times that
and hence by Fubini’s theorem
Using the Gamma function identity for any with , we conclude from another application of Fubini’s theorem that
Recall that we are requiring to equal , which after applications of the fundamental theorem of calculus (or integration by parts) is equivalent to the constraint
Since , we conclude from Cauchy-Schwarz that
with equality occurring when . Equality is not quite attainable with smooth, but by a mollification one can select an smoothly supported on (say) with and
for any . The expression (14) now takes the form
Remark 37 The Selberg sieve is most naturally used as an upper bound sieve. However, it is possible to modify the Selberg sieve to produce a lower bound sieve in a number of ways. One way is to simply insert the Selberg sieve into the Buchstab identity (16); see for instance this paper of Ankeny and Onishi for an investigation of this approach. Another method is to multiply the Selberg upper bound sieve by a lower bound sieve ; somewhat surprisingly, even a very crude lower bound sieve such as the Bonferroni sieve can give reasonably good results for this purpose. See Friedlander and Iwaniec for further analysis of this lower bound sieve (sometimes known as the sieve).
— 3. A multidimensional Selberg sieve, and bounded gaps between primes —
We now discuss a multidimensional variant of the Selberg sieve that establishes a recent result of Maynard (and independently by myself), giving further partial progress towards the Hardy-Littlewood prime tuples conjecture. For sake of exposition we shall omit some details; see this previous blog post for a more careful treatment; see also this survey of Granville.
More precisely, we sketch the proof of
Theorem 38 (Maynard’s theorem) Let be a natural number, and let be sufficiently large depending on . Then for any admissible -tuple , there exist infinitely many natural numbers such that at least of the numbers are prime.
Exercise 39 Show that Maynard’s theorem implies that the quantity is finite for each natural number , where is the prime, is finite. (For the most recent bounds on , see this Polymath paper.) In particular, the case of Theorem 38 yields “bounded gaps between primes”, in that there is a finite for which for infinitely many .
Even the case of this theorem was only proven in 2013 by Zhang, building upon earlier work of Goldston, Pintz, and Yildirim. Zhang’s original value of in the “bounded gaps” application was ; this has since been lowered several times, with the current record being , due to the Polymath project. The original arguments of Zhang and Goldston-Pintz-Yildirim used a “one-dimensional” Selberg sieve, with coefficients given by (41), and required quite strong equidistribution hypotheses on the primes (stronger than what the Bombieri-Vinogradov theorem provides). However, the argument of Maynard relies on a more general and flexible “multi-dimensional” Selberg sieve, which has better numerical performance, and as such does not need any equidistribution result beyond the Bombieri-Vinogradov theorem. This sieve has since been used for a number of further applications in analytic number theory; see this web page for a list of papers in this direction.
Let be as in Theorem 38. Suppose that, for sufficiently large , we can find a non-negative function supported on with the property that
Then by the pigeonhole principle, there exists at least one such that , that is to say at least of the numbers are prime. Letting go to infinity (keeping fixed), we will obtain Theorem 38.
It remains to establish the existence of a function with the required properties. Prior to the work of Goldston, Pintz, and Yildirim, it was only known (using tools such as the fundamental lemma of sieve theory, or off-the-shelf Selberg sieves) how to construct sieve weights obeying (45) with replaced by a small constant less than . The earlier arguments of Zhang and Goldston-Pintz-Yildirim chose a weight which was essentially of the Selberg sieve form
(compare with (41)), where for some fixed , and was a smooth fixed compactly supported function. Omitting many calculations, their conclusion was that one could prove the case of Theorem 38 once one had some sort of distribution result on the primes at level (as defined in Exercise 22 of Notes 3) with . Unfortunately, this just falls short of what the Bombieri-Vinogradov theorem provides, which is a level of distribution for any . One of the key contributions of Zhang was to obtain a partial distribution result at a value of slightly above , which when combined with (a modification of) the previous work of Goldston, Pintz, and Yildirim, was able to yield the case of Theorem 38.
The approach of Maynard and myself was a little different, based on multidimensional Selberg sieves such as
for some multidimensional smooth function supported on the simplex ; the one-dimensional sieve considered by Zhang and Goldston-Pintz-Yildirim is then essentially the case when is a function just of on this simplex. Actually, to avoid some (very minor) issues concerning common factors between the , and also to essentially eliminate the role of the singular series, it is convenient to work with a slight modification
of the above sieve, where for some slowly growing to infinity with (e.g. will do), and is a residue class such that is primitive for all (such a exists thanks to the admissibility of and the Chinese remainder theorem). Note that for large enough, the have no common factors, and so the are automatically coprime to each other and to . (In Maynard’s paper, the coefficients were replaced by a sequence which was not directly arising from a smooth function , but instead the analogue of the coefficients from the preceding section were chosen to be of the form for some suitable function . However, the differences between the two approaches are fairly minor.)
It is possible to estimate the right-hand side of (45) using a modification of the arguments of the previous section:
as , where
and denotes the mixed partial derivative in each of the variables .
If , one can control the effect of the weight using the Bombieri-Vinogradov theorem, and after some moderately complicated calculations one obtains
as , where
and denotes the mixed partial derivative in the variables . Similarly for permutations of the indices.
In view of these two calculations, one can establish Maynard’s theorem as soon as one can find and for which
where is defined similarly to but with the role of the and indices swapped. This is now a problem purely in calculus of variations rather than number theory. It is thus natural to take close to (though, for the qualitative analysis performed here, actually any non-zero choice of independent of will suffice). Our task is now to find with
The optimal choice of for a given is not known for any , but one can generate a “good enough” choice of by the following set of calculations. Firstly, we restrict to the case when is symmetric in the , so that all the are equal and our objective is now to satisfy the inequality
Next, we make the change of variables , thus is smooth on and supported on the simplex but is otherwise arbitrary on this simplex. From the fundamental theorem of calculus, the desired inequality now becomes
We select a function of the truncated product form
and our objective (after a rescaling) is now to get
The emergence of the function as a parameter in the multidimensional sieve is a feature that is not present in previous sieves (at least not without sharply curtailing the support of ), and is the main reason why this sieve performs better than previous sieves for this problem.
We interpret the above optimisation problem probabilistically. Let be independent real values in with probability density function . Our task is now to obtain the inequality
Suppose that we can locate a fixed function (independent of ) obeying (47) with
and the claim follows from (49) if is sufficiently large depending on .
(barely) obeys (49) with and both finite, and the claim follows by a routine rescaling.
Exercise 42 By working through a more quantitative version of the above argument, establish Theorem 38 with as . (Hint: One can replace the use of the law of large numbers with Chebyshev’s inequality.) It is not currently known how to obtain this theorem with any choice of that grows faster than logarithmically in ; the current record is , due to the Polymath project. It is known, though, that bounds on the order of are the limit of the Selberg sieve method, and one either needs new sieves, or new techniques beyond sieve theory, to increase beyond this rate; see the Polymath paper for further discussion.
One can use the multidimensional sieve to study the classical sieving problem of counting prime -tuples, but unfortunately the numerology it gives is precisely the same as the numerology given by previous sieves:
Exercise 43 Let be an admissible tuple, and let go to infinity sufficiently slowly as .
— 4. The large sieve —
In the general discussion of sieving problems such as Problem 3 in previous sections, there was no structure imposed on the sets beyond the fact that one could control the sums (9). However, as we have seen in practice, is typically the union of some number of residue classes modulo . It is possible to exploit this structure using Fourier analysis, leading to an approach to sieving known as the large sieve, and relying crucially on the analytic large sieve inequality from Notes 3. The large sieve tends to give results similar to that of the Selberg sieve; indeed, the two sieves are in some sense “dual” to each other and are powered by the same sort of -based techniques (such as the Cauchy-Schwarz inequality). However, the large sieve methods can offer sharper control on error terms than the Selberg sieve if carried out carefully. As the name suggests, the large sieve is particularly useful in the regime when the number of residue classes modulo being sieved out is large, for instance as large as a constant multiple of . (When is very large – close to , then there is another sieve that gives even sharper results, namely the larger sieve of Gallagher, which we do not discuss further here.)
To apply the analytic large sieve to sieving problems, we need to exploit an “uncertainty principle” of Montgomery (discussed further in this previous blog post), that shows that functions which avoid certain residue classes in the physical domain are necessarily spread out in the frequency domain. More precisely, we have
Theorem 44 (Montgomery uncertainty principle) Let be a finitely supported function, and let be the Fourier transform , where . Suppose that for each prime , there are residue classes modulo on which vanishes identically for some . Then for any and any natural number , one has
Proof: We may assume that is squarefree, since otherwise we have and the claim (50) is trivial. Observe from the Chinese remainder theorem that if the inequality (50) holds for two coprime values of , then it also holds for (as every fraction in with can be uniquely written as with ). Thus it suffices to verify the claim when is prime. By multiplying by we can also reduce to the case , thus our task is now to show that
By the Plancherel identity in , the left-hand side can be written as
and the right-hand side is
Since the sum is only non-vanishing for at most choices of , the claim follows from the Cauchy-Schwarz inequality.
and is given by (51).
Now observe from the basic identity
But from construction of , the left-hand side of (52) is equal to both and , and the claim follows.
Setting , one can then recover sharper versions of Theorem 32 and its consequences, for instance by working along these lines (and using a very sharp version of the analytic large sieve inequality that exploits the variable spacing of the set of fractions ), result of Montgomery and Vaughan were able to completely delete the error term in the Brun-Titchmarsh inequality (Exercise 33).
Remark 46 It is a general phenomenon that large sieve methods tend to give the same main term as the Selberg sieve; see Section 9.5 of Friedlander-Iwaniec for further discussion.
— 5. The parity problem —
We have seen that sieve theoretic techniques are reasonably good at obtaining upper bounds to various sieving problems, and can also provide lower bounds if the sifting limit is low. However, these methods have proven to be frustratingly ineffective at providing non-trivial lower bounds for the original problems that motivated the development of sieve theory in the first place, namely counting patterns of primes such as twin primes. While we do not fully understand the exact limitations of what sieve theory can and cannot do, we do have a convincing heuristic barrier, known as the parity barrier or parity problem and first raised explicitly by Selberg, which explains why sieves have so much difficulty in lower bounding (or efficiently upper bounding) counts involving primes; in particular they explain why sieve-theoretic methods have thus far been unable to fully resolve any of the four Landau problems. The barrier is not completely rigorous, as it relies on the Möbius pseudorandomness principle (or Liouville pseudorandomness principle) discussed in Section 2 of Supplement 4. As such, our discussion in this section will be somewhat informal in nature; for instance, we will use vague symbols such as without defining their meaning precisely.
Consider the general sieving problem in Problem 3, in which one tries to control the quantity via knowledge of the sums , together with the hypothesis that the are non-negative. In most situations, one expects this problem to be underdetermined; in particular, there will probably exist two different finitely supported sequences , of non-negative reals which are essentially indistinguishable with regards to the inputs of this sieve problem, in the sense that
for all , but which have very different outputs for this sieve problem, in that
Then any upper bound for Problem 3 with this choice of must exceed (at least approximately) the greater of and ; similarly, any lower bound for this problem must be (approximately) less than the lesser of and .
As a special case of this observation, we have
Proposition 47 (Abstract parity barrier) (Informal) Let be as in Problem 3. Let be a finitely supported sequence of non-negative reals such that
for all . Suppose that we have an additional “parity function” with the property that
for all in the support of , but such that
for all . Then any upper bound for Problem 3 cannot be significantly smaller than , and any lower bound for Problem 3 cannot be significantly larger than zero. In particular, one does not expect to be able to show using such sieve methods that the support of contains any appreciable number of elements outside of .
Proof: (Non-rigorous!) For the statement about the upper bound, apply the previous discussion to the sequence . For the statement about the lower bound, apply the previous discussion to the sequence .
Informally, the proposition asserts that if the sifted set is sensitive to the parity function , but original pre-sifted sequence and the sets are not, then traditional sieve methods cannot give any non-trivial lower bounds, and any upper bounds must be off from the truth by a factor of at least two.
In practice, one can heuristically obey the hypotheses of this proposition if one takes to be some variant of the Liouville function , which counts the parity of the number of prime factors of , in which case the parity problem asserts (roughly speaking) that the sifting problem must allow both numbers with an even number of prime factors and numbers with an odd number of prime factors to survive if one is to have any hope of good lower bounds. For instance:
Corollary 48 (Parity barrier for twin primes) (Very informal) One cannot use traditional sieve methods to resolve the twin prime conjecture.
Proof: (Very non-rigorous! Also assumes Liouville pseudorandomness) As mentioned in the introduction, there are a number of different ways to try to approach the twin prime conjecture by sieve methods. Consider for instance the approach in which one starts with the primes in and removes the residue class for each . The quantity will then be on the sifted set (since will be prime on that set), while the Liouville pseudorandomness heuristic predicts that
is negligible for all (say). The claim then follows from Proposition 47.
Exercise 49 (Informal) Develop similar parity obstructions for the other sieve-theoretic approaches to the twin prime conjecture. In the case when one is sieving out the integers by two residue classes for each prime , argue that any sieve-theoretic bound must in fact be off from the truth by a factor of at least four. (Hint: experiment with weights such as .)
Exercise 50 (Informal) Argue why any sieve-theoretic upper bound on prime -tuples that is based on equidistribution estimates for the natural numbers and primes must be off from the truth by a factor of at least . (This should be compared with the known upper bounds of and discussed in previous sections. It is not clear at present which of these bounds is closest to the true limit of sieve theory.)
The above argument is far from watertight, and there have been successful attempts to circumvent the parity barrier. For instance:
- (i) The parity barrier does not prohibit the possibility of producing a very small number of prime (or near-prime) patterns, even if one cannot hope to get lower bounds that are of the “right” order of magnitude. For instance, it was shown unconditionally by Goldston, Graham, Pintz, and Yildirim that there exist infinitely many pairs of consecutive numbers , each of which are the product of exactly four primes. Cramér type random models (see Supplement 4) predict the number of such pairs up to to be approximately (ignoring some factors of ), and the parity barrier prevents one from getting lower bounds of this magnitude; but the construction of Goldston, Graham, Pintz, and Yildirim only produces a quite sparse set of pairs (about or so) and so evades the parity barrier. Using a very different class of arguments, it is also possible to produce analogues of patterns like twin primes in the function field setting (in which the role of primes are replaced by irreducible polynomials over a finite field) by using very sparse sequences of polynomials for which irreducibility can be tested easily; see for instance this paper of Hall for the case of twin irreducible polynomials.
- (ii) The parity barrier may disappear if one is somehow in a situation in which the Möbius or Liouville pseudorandomness principles break down. An extreme example of this is when one assumes the existence of one (or many) Siegel zeroes, which imply extremely high correlation between the Möbius or Liouville functions and a Dirichlet character, in strong violation of the pseudorandomness principle. A striking example of this case occurs in this paper of Heath-Brown, which demonstrates the truth of the twin prime conjecture if one assumes the existence of an infinite sequence of Siegel zeroes.
- (iii) Traditional sieve theoretic methods only assume that one can control “linear” or “Type I” sums of the form . However, in some cases it is also possible to control “bilinear” or “Type II” sums such as for various coefficients of controlled size and support but uncontrolled phase. Note that multiplicative parity functions such as will be distinguishable with such sums, since will not exhibit any cancellation if both oscillate with the sign of respectively; as such, the parity barrier is no longer in effect. In such situations it can be possible to go beyond the limits of traditional sieve theory, for instance by using decompositions similar to the Vaughan identity employed in Notes 3. Of course, one still has to actually establish the required bilinear sum asymptotics, and this typically requires some deep and non-trivial input outside of sieve theory. A model example here is the work of Friedlander and Iwaniec establishing the infinitude of primes of the form . Here, the relevant bilinear input is obtained using the algebraic structure of the Gaussian integers and some non-trivial results on lattice counting problems.
- (iv) Sieve theory is focused on “one-dimensional” sums, such as , or perhaps . The presence of the parity problem relies on the ability to modify weights such as or in such sums in such a fashion that the value of the sum dramatically changes. If one instead is trying to control a multi-dimensional sum, such as (which roughly speaking counts arithmetic progressions of primes of length four), then there is no easy way to modify any of the factors of such a sum in the fashion required for the parity barrier to be in effect. For such multi-dimensional sums one can then hope to use additional estimates to produce non-trivial lower bounds. For instance, with Ben Green, we used Selberg sieves combined with the additional tool of Szemerédi’s theorem to obtain lower bounds for multi-dimensional sums similar to the one above, proving in particular that the primes contained arbitrarily long arithmetic progressions.
It is one of the major problems in analytic prime number theory to continue to find more ways to defeat the parity barrier, so that we can produce prime patterns in many more situations than what is currently possible. However, this has proven to be a surprisingly challenging task, and one which typically requires the injection of new ideas and methods external to sieve theory.
Remark 51 As mentioned previously, it is not known in general if the parity problem represents the only obstruction to sieve methods. However, under the additional hypothesis of the Elliott-Halberstam conjecture, and when focusing on “binary” problems such as the twin prime conjecture, there is a more detailed analysis of the parity problem due to Bombieri which indicates that, in some sense, the parity problem is indeed the only information not captured by sieve-theoretic methods, in that the asymptotic distribution of primes or almost primes in pairs is completely determined by a single parameter that represents the ratio between the actual number of twin primes and the expected number of twin primes; the parity problem prevents us from obtaining any further information about other than that it lies between and , but once is known, essentially all other asymptotics regarding the prime factorisation of and will also be known. See Chapter 16 of Friedlander-Iwaniec for further discussion.
Remark 52 For a slightly different way to formalise the parity problem, see this previous blog post. Among other things, it is shown in that post that the parity problem prevents one from proving Theorem 38 with any choice of with using purely sieve-theoretic methods, in particular ruling out a solution to the twin prime conjecture by sieve methods.