Many problems in non-multiplicative prime number theory can be recast as sieving problems. Consider for instance the problem of counting the number of pairs of twin primes
contained in
for some large
; note that the claim that
for arbitrarily large
is equivalent to the twin prime conjecture. One can obtain this count by any of the following variants of the sieve of Eratosthenes:
- Let
be the set of natural numbers in
. For each prime
, let
be the union of the residue classes
and
. Then
is the cardinality of the sifted set
.
- Let
be the set of primes in
. For each prime
, let
be the residue class
. Then
is the cardinality of the sifted set
.
- Let
be the set of primes in
. For each prime
, let
be the residue class
. Then
is the cardinality of the sifted set
.
- Let
be the set
. For each prime
, let
be the residue class
Then
is the cardinality of the sifted set
.
Exercise 1 Develop similar sifting formulations of the other three Landau problems.
In view of these sieving interpretations of number-theoretic problems, it becomes natural to try to estimate the size of sifted sets for various finite sets
of integers, and subsets
of integers indexed by primes
dividing some squarefree natural number
(which, in the above examples, would be the product of all primes up to
). As we see in the above examples, the sets
in applications are typically the union of one or more residue classes modulo
, but we will work at a more abstract level of generality here by treating
as more or less arbitrary sets of integers, without caring too much about the arithmetic structure of such sets.
It turns out to be conceptually more natural to replace sets by functions, and to consider the more general the task of estimating sifted sums
for some finitely supported sequence of non-negative numbers; the previous combinatorial sifting problem then corresponds to the indicator function case
. (One could also use other index sets here than the integers
if desired; for much of sieve theory the index set and its subsets
are treated as abstract sets, so the exact arithmetic structure of these sets is not of primary importance.)
Continuing with twin primes as a running example, we thus have the following sample sieving problem:
Problem 2 (Sieving problem for twin primes) Let
, and let
denote the number of natural numbers
which avoid the residue classes
for all primes
. In other words, we have
where
,
is the product of all the primes strictly less than
(we omit
itself for minor technical reasons), and
is the union of the residue classes
. Obtain upper and lower bounds on
which are as strong as possible in the asymptotic regime where
goes to infinity and the sifting level
grows with
(ideally we would like
to grow as fast as
).
From the preceding discussion we know that the number of twin prime pairs in
is equal to
, if
is not a perfect square; one also easily sees that the number of twin prime pairs in
is at least
, again if
is not a perfect square. Thus we see that a sufficiently good answer to Problem 2 would resolve the twin prime conjecture, particularly if we can get the sifting level
to be as large as
.
We return now to the general problem of estimating (1). We may expand
where (with the convention that
). We thus arrive at the Legendre sieve identity
Specialising to the case of an indicator function , we recover the inclusion-exclusion formula
Such exact sieving formulae are already satisfactory for controlling sifted sets or sifted sums when the amount of sieving is relatively small compared to the size of . For instance, let us return to the running example in Problem 2 for some
. Observe that each
in this example consists of
residue classes modulo
, where
is defined to equal
when
and
when
is odd. By the Chinese remainder theorem, this implies that for each
,
consists of
residue classes modulo
. Using the basic bound
for any and any residue class
, we conclude that
for any , where
is the multiplicative function
Since and there are at most
primes dividing
, we may crudely bound
, thus
Also, the number of divisors of is at most
. From the Legendre sieve (3), we thus conclude that
We can factorise the main term to obtain
This is compatible with the heuristic
coming from the equidistribution of residues principle (Section 3 of Supplement 4), bearing in mind (from the modified Cramér model, see Section 1 of Supplement 4) that we expect this heuristic to become inaccurate when becomes very large. We can simplify the right-hand side of (7) by recalling the twin prime constant
(see equation (7) from Supplement 4); note that
so from Mertens’ third theorem (Theorem 42 from Notes 1) one has
as . Bounding
crudely by
, we conclude in particular that
when with
. This is somewhat encouraging for the purposes of getting a sufficiently good answer to Problem 2 to resolve the twin prime conjecture, but note that
is currently far too small: one needs to get
as large as
before one is counting twin primes, and currently
can only get as large as
.
The problem is that the number of terms in the Legendre sieve (3) basically grows exponentially in , and so the error terms in (4) accumulate to an unacceptable extent once
is significantly larger than
. An alternative way to phrase this problem is that the estimate (4) is only expected to be truly useful in the regime
; on the other hand, the moduli
appearing in (3) can be as large as
, which grows exponentially in
by the prime number theorem.
To resolve this problem, it is thus natural to try to truncate the Legendre sieve, in such a way that one only uses information about the sums for a relatively small number of divisors
of
, such as those
which are below a certain threshold
. This leads to the following general sieving problem:
Problem 3 (General sieving problem) Let
be a squarefree natural number, and let
be a set of divisors of
. For each prime
dividing
, let
be a set of integers, and define
for all
(with the convention that
). Suppose that
is an (unknown) finitely supported sequence of non-negative reals, whose sums
are known for all
. What are the best upper and lower bounds one can conclude on the quantity (1)?
Here is a simple example of this type of problem (corresponding to the case ,
,
,
, and
):
Exercise 4 Let
be a finitely supported sequence of non-negative reals such that
,
, and
. Show that
and give counterexamples to show that these bounds cannot be improved in general, even when
is an indicator function sequence.
Problem 3 is an example of a linear programming problem. By using linear programming duality (as encapsulated by results such as the Hahn-Banach theorem, the separating hyperplane theorem, or the Farkas lemma), we can rephrase the above problem in terms of upper and lower bound sieves:
Theorem 5 (Dual sieve problem) Let
be as in Problem 3. We assume that Problem 3 is feasible, in the sense that there exists at least one finitely supported sequence
of non-negative reals obeying the constraints in that problem. Define an (normalised) upper bound sieve to be a function
of the form
for some coefficients
, and obeying the pointwise lower bound
for all
(in particular
is non-negative). Similarly, define a (normalised) lower bound sieve to be a function
of the form
for some coefficients
, and obeying the pointwise upper bound
for all
. Thus for instance
and
are (trivially) upper bound sieves and lower bound sieves respectively.
- (i) The supremal value of the quantity (1), subject to the constraints in Problem 3, is equal to the infimal value of the quantity
, as
ranges over all upper bound sieves.
- (ii) The infimal value of the quantity (1), subject to the constraints in Problem 3, is equal to the supremal value of the quantity
, as
ranges over all lower bound sieves.
Proof: We prove part (i) only, and leave part (ii) as an exercise. Let be the supremal value of the quantity (1) given the constraints in Problem 3, and let
be the infimal value of
. We need to show that
.
We first establish the easy inequality . If the sequence
obeys the constraints in Problem 3, and
is an upper bound sieve, then
and hence (by the non-negativity of and
)
taking suprema in and infima in
we conclude that
.
Now suppose for contradiction that , thus
for some real number
. We will argue using the hyperplane separation theorem; one can also proceed using one of the other duality results mentioned above. (See this previous blog post for some discussion of the connections between these various forms of linear duality.) Consider the affine functional
on the vector space of finitely supported sequences of reals. On the one hand, since
, this functional is positive for every sequence
obeying the constraints in Problem 3. Next, let
be the space of affine functionals
of the form
for some real numbers , some non-negative function
which is a finite linear combination of the
for
, and some non-negative
. This is a closed convex cone in a finite-dimensional vector space
; note also that
lies in
. Suppose first that
, thus we have a representation of the form
for any finitely supported sequence . Comparing coefficients, we conclude that
for any (i.e.,
is an upper bound sieve), and also
and thus , a contradiction. Thus
lies outside of
. But then by the hyperplane separation theorem, we can find an affine functional
on
that is non-negative on
and negative on
. By duality, such an affine functional takes the form
for some finitely supported sequence
and
(indeed,
can be supported on a finite set consisting of a single representative for each atom of the finite
-algebra generated by the
). Since
is non-negative on the cone
, we see (on testing against multiples of the functionals
or
) that the
and
are non-negative, and that
for all
; thus
is feasible for Problem 3. Since
is negative on
, we see that
and thus , giving the desired contradiction.
Exercise 6 Prove part (ii) of the above theorem.
Exercise 7 Show that the infima and suprema in the above theorem are actually attained (so one can replace “infimal” and “supremal” by “minimal” and “maximal” if desired).
Exercise 8 What are the optimal upper and lower bound sieves for Exercise 4?
In the case when consists of all the divisors of
, we see that the Legendre sieve
is both the optimal upper bound sieve and the optimal lower bound sieve, regardless of what the quantities
are. However, in most cases of interest,
will only be some strict subset of the divisors of
, and there will be a gap between the optimal upper and lower bounds.
Observe that a sequence of real numbers will form an upper bound sieve
if one has the inequalities
and
for all ; we will refer to such sequences as upper bound sieve coefficients. (Conversely, if the sets
are in “general position” in the sense that every set of the form
for
is non-empty, we see that every upper bound sieve arises from a sequence of upper bound sieve coefficients.) Similarly, a sequence
of real numbers will form a lower bound sieve
if one has the inequalities
and
for all with
; we will refer to such sequences as lower bound sieve coefficients.
Exercise 9 (Brun pure sieve) Let
be a squarefree number, and
a non-negative integer. Show that the sequence
defined by
where (in contrast with the rest of this set of notes)
denotes the number of prime factors of
, is a sequence of upper bound sieve coefficients for even
, and a sequence of lower bound sieve coefficients for odd
. Deduce the Bonferroni inequalities
when
is odd, whenever one is in the situation of Problem 3 (and
contains all
with
). The resulting upper and lower bound sieves are sometimes known as Brun pure sieves. The Legendre sieve can be viewed as the limiting case when
.
In many applications the sums in (9) take the form
for some quantity independent of
, some multiplicative function
with
, and some remainder term
whose effect is expected to be negligible on average if
is restricted to be small, e.g. less than a threshold
; note for instance that (5) is of this form if
for some fixed
(note from the divisor bound, Lemma 23 of Notes 1, that
if
). We are thus led to the following idealisation of the sieving problem, in which the remainder terms
are ignored:
Problem 10 (Idealised sieving) Let
(we refer to
as the sifting level and
as the level of distribution), let
be a multiplicative function with
, and let
. How small can one make the quantity
for a sequence
of upper bound sieve coefficients, and how large can one make the quantity
for a sequence
of lower bound sieve coefficients?
Thus, for instance, the trivial upper bound sieve and the trivial lower bound sieve
show that (14) can equal
and (15) can equal
. Of course, one hopes to do better than these trivial bounds in many situations; usually one can improve the upper bound quite substantially, but improving the lower bound is significantly more difficult, particularly when
is large compared with
.
If the remainder terms in (13) are indeed negligible on average for
, then one expects the upper and lower bounds in Problem 3 to essentially be the optimal bounds in (14) and (15) respectively, multiplied by the normalisation factor
. Thus Problem 10 serves as a good model problem for Problem 3, in which all the arithmetic content of the original sieving problem has been abstracted into two parameters
and a multiplicative function
. In many applications,
will be approximately
on the average for some fixed
, known as the sieve dimension; for instance, in the twin prime sieving problem discussed above, the sieve dimension is
. The larger one makes the level of distribution
compared to
, the more choices one has for the upper and lower bound sieves; it is thus of interest to obtain equidistribution estimates such as (13) for
as large as possible. When the sequence
is of arithmetic origin (for instance, if it is the von Mangoldt function
), then estimates such as the Bombieri-Vinogradov theorem, Theorem 17 from Notes 3, turn out to be particularly useful in this regard; in other contexts, the required equidistribution estimates might come from other sources, such as homogeneous dynamics, or the theory of expander graphs (the latter arises in the recent theory of the affine sieve, discussed in this previous blog post). However, the sieve-theoretic tools developed in this post are not particularly sensitive to how a certain level of distribution is attained, and are generally content to use sieve axioms such as (13) as “black boxes”.
In some applications one needs to modify Problem 10 in various technical ways (e.g. in altering the product , the set
, or the definition of an upper or lower sieve coefficient sequence), but to simplify the exposition we will focus on the above problem without such alterations.
As the exercise below (or the heuristic (7)) suggests, the “natural” size of (14) and (15) is given by the quantity (so that the natural size for Problem 3 is
):
Exercise 11 Let
be as in Problem 10, and set
.
- (i) Show that the quantity (14) is always at least
when
is a sequence of upper bound sieve coefficients. Similarly, show that the quantity (15) is always at most
when
is a sequence of lower bound sieve coefficients. (Hint: compute the expected value of
when
is a random factor of
chosen according to a certain probability distribution depending on
.)
- (ii) Show that (14) and (15) can both attain the value of
when
. (Hint: translate the Legendre sieve to this setting.)
The problem of finding good sequences of upper and lower bound sieve coefficients in order to solve problems such as Problem 10 is one of the core objectives of sieve theory, and has been intensively studied. This is more of an optimisation problem rather than a genuinely number theoretic problem; however, the optimisation problem is sufficiently complicated that it has not been solved exactly or even asymptotically, except in a few special cases. (It can be reduced to a optimisation problem involving multilinear integrals of certain unknown functions of several variables, but this problem is rather difficult to analyse further; see these lecture notes of Selberg for further discussion.) But while we do not yet have a definitive solution to this problem in general, we do have a number of good general-purpose upper and lower bound sieve coefficients that give fairly good values for (14), (15), often coming within a constant factor of the idealised value , and which work well for sifting levels
as large as a small power of the level of distribution
. Unfortunately, we also know of an important limitation to the sieve, known as the parity problem, that prevents one from taking
as large as
while still obtaining non-trivial lower bounds; as a consequence, sieve theory is not able, on its own, to sift out primes for such purposes as establishing the twin prime conjecture. However, it is still possible to use these sieves, in conjunction with additional tools, to produce various types of primes or prime patterns in some cases; examples of this include the theorem of Ben Green and myself in which an upper bound sieve is used to demonstrate the existence of primes in arbitrarily long arithmetic progressions, or the more recent theorem of Zhang in which an upper bound sieve (among other things) was used to demonstrate the existence of infinitely many pairs of primes whose difference was bounded. In such arguments, the upper bound sieve was used not so much to count the primes or prime patterns directly, but to serve instead as a sort of “container” to efficiently envelop such prime patterns; when used in such a manner, the upper bound sieves are sometimes known as enveloping sieves. If the original sequence was supported on primes, then the enveloping sieve can be viewed as a “smoothed out indicator function” that is concentrated on almost primes, which in this context refers to numbers with no small prime factors.
In a somewhat different direction, it can be possible in some cases to break the parity barrier by assuming additional equidistribution axioms on the sequence than just (13), in particular controlling certain bilinear sums involving
rather than just linear sums of the
. This approach was in particular pursued by Friedlander and Iwaniec, leading to their theorem that there are infinitely many primes of the form
.
The study of sieves is an immense topic; see for instance the recent 527-page text by Friedlander and Iwaniec. We will limit attention to two sieves which give good general-purpose results, if not necessarily the most optimal ones:
- (i) The beta sieve (or Rosser-Iwaniec sieve), which is a modification of the classical combinatorial sieve of Brun. (A collection of sieve coefficients
is called combinatorial if its coefficients lie in
.) The beta sieve is a family of upper and lower bound combinatorial sieves, and are particularly useful for efficiently sieving out all primes up to a parameter
from a set of integers of size
, in the regime where
is moderately large, leading to what is sometimes known as the fundamental lemma of sieve theory.
- (ii) The Selberg upper bound sieve, which is a general-purpose sieve that can serve both as an upper bound sieve for classical sieving problems, as well as an enveloping sieve for sets such as the primes. (One can also convert the Selberg upper bound sieve into a lower bound sieve in a number of ways, but we will only touch upon this briefly.) A key advantage of the Selberg sieve is that, due to the “quadratic” nature of the sieve, the difficult optimisation problem in Problem 10 is replaced with a much more tractable quadratic optimisation problem, which can often be solved for exactly.
Remark 12 It is possible to compose two sieves together, for instance by using the observation that the product of two upper bound sieves is again an upper bound sieve, or that the product of an upper bound sieve and a lower bound sieve is a lower bound sieve. Such a composition of sieves is useful in some applications, for instance if one wants to apply the fundamental lemma as a “preliminary sieve” to sieve out small primes, but then use a more precise sieve like the Selberg sieve to sieve out medium primes. We will see an example of this in later notes, when we discuss the linear beta-sieve.
We will also briefly present the (arithmetic) large sieve, which gives a rather different approach to Problem 3 in the case that each consists of some number (typically a large number) of residue classes modulo
, and is powered by the (analytic) large sieve inequality of the preceding section. As an application of these methods, we will utilise the Selberg upper bound sieve as an enveloping sieve to establish Zhang’s theorem on bounded gaps between primes. Finally, we give an informal discussion of the parity barrier which gives some heuristic limitations on what sieve theory is able to accomplish with regards to counting prime patters such as twin primes.
These notes are only an introduction to the vast topic of sieve theory; more detailed discussion can be found in the Friedlander-Iwaniec text, in these lecture notes of Selberg, and in many further texts.
— 1. Combinatorial sieving and the fundamental lemma of sieve theory —
We begin with a discusion of combinatorial upper and lower bound sieves, in which the sieve coefficients take values in
. These sieves, first introduced by Brun, can be viewed as truncations of the Legendre sieve (which corresponds to the choice of coefficients
for all
). The Legendre sieve, and more generally the Brun pure sieves in Exercise 9, give basic examples of combinatorial sieves.
We loosely follow the discussion of Friedlander and Iwaniec. The combinatorial sieves may be motivated by observing the Buchstab identity
for any and any collection of sets
for
. This identity reflects the basic fact that if
does lie in
, then there is a unique prime
such that
lies in
, but does not lie in
for any
.
There is an analogous identity for the function :
Exercise 13 For any arithmetic function
, show that
How is this identity related to (16)?
The Buchstab identity (16) allows one to convert upper bound sieves into lower bound sieves and vice versa. Indeed, if is a family of lower bound sieves for each
, then (16) tells us that
is an upper bound sieve; similarly, if is a family of upper bound sieves for each
, then
is a lower bound sieve. In terms of sieve coefficients, we see that if are lower bound sieve coefficients for each
, then
is a sequence of upper bound sieve coefficients (where denotes the largest prime factor of
). Similarly, if
are upper bound sieve coefficients for each
, then
is a sequence of lower bound sieve coefficients.
One can iterate this procedure to produce an alternating sequence of upper and lower bound sieves. If one iterates out the Buchstab formula completely, one ends up back at the Legendre sieve. However, one can truncate this iteration by using the trivial lower bound sieve of for some of the primes
. For instance, suppose one seeks an upper bound for
. Applying (16), we only save some of the summands, say those
which obey some predicate
to be chosen later. For the remaining summands, we use the trivial lower bound sieve of
, giving
For the surviving summands, we apply (16) again. With the sign change, the trivial lower bound sieve is not applicable, so we do not discard any further summands and arrive at
For the summands in the second sum on the right, we apply (16) once again. We once again have a favourable sign and can use the trivial lower bound sieve of to discard some of the resulting summands, keeping only those triples
that obey some predicate
, giving
Since one cannot have an infinite descent of primes, this process will eventually terminate to produce an upper bound sieve. A similar argument (but with the signs reversed) gives a lower bound sieve. We formalise this discussion as follows:
Proposition 14 (General combinatorial sieve) Let
. For each natural number
, let
be a predicate pertaining to a sequence
of
decreasing primes, thus
is either true or false for a given choice of such primes. Let
(resp.
) denote the set of
which, when factored as
for
, are such that
holds for all odd (resp. even)
. Thus
and
- (i) For any sets
,
is an upper bound sieve, and
is a lower bound sieve. In fact, we have the more precise identities
and
where
denotes the least prime factor of
and, for each natural number
,
is the collection of products
with
such that
holds for all
with the same parity as
, but fails for
. Thus for instance
and
- (ii) The sequences
and
are upper and lower sieve coefficient sequences respectively.
- (iii) For any multiplicative function
, one has the identities
and
where
.
Note that the Legendre sieve is the special case of the combinatorial sieve when all the predicates are true for all choices of inputs
(i.e. one does not ever utilise the trivial lower bound sieve); more generally, the Brun pure sieve with parameter
corresponds to the case when
is always true for
and always false for
. The predicates
for odd
are only used for the upper bound sieve and not the lower bound sieve; conversely, the predicates
for even
are only used for the lower bound sieve and not the upper bound sieve.
Exercise 15 Prove Proposition 14.
Exercise 16 Interpret the sieves in Exercise 8 as combinatorial sieves. What are the predicates
in this case?
We now use the above proposition to attack Problem 10 for a given choice of parameters and
; we will focus on the regime when
for some , thus we are sifting out primes up to a small power of the sieve level
. To do this, one needs to select the predicates
in such a fashion that the sets
only consist of divisors
that are less than or equal to
. There are a number of ways to achieve this, but experience has shown that a good general-purpose choice in this regard is to define
to be the predicate
to be chosen later. The combinatorial sieve with this choice of predicate is known as the beta sieve or Rosser-Iwaniec sieve at sieve level . Observe that if
lies in
for some
and
, then at least one of
or
will hold, and hence
since
. If
, we have the same conclusion thanks to (19) since
. Thus under the hypotheses (21), (19), the beta sieves are indeed upper and lower bound sieves at level
.
Now we estimate the quantities (14), (15) for the beta sieve coefficients for some multiplicative function
with
for all
. In order to actually compute asymptotics, we assume the Mertens-like axiom
whenever , where
and some fixed
, which we refer to as the sieve dimension. From Mertens’ theorems, we see that these axioms will in particular be obeyed if
and
for all primes and some fixed
depending only on
.
Let be as above. From (17), (18), (22) we see that the quantities (14), (15) are both of the form
We now estimate the error term; our estimates will be somewhat crude in order to simplify the calculations, but this will already suffice for many applications. First observe that if then
We conclude that the condition is automatically satisfied when
, and so the
summand in (23) can be restricted to the range
.
Next, note that if for some
, then from (20) we have
for all with the same parity as
, which implies (somewhat crudely) that
for all , regardless of parity (note that the
case follows since
). We rearrange this inequality as
which iterates to give
On the other hand, from the definition of we have
which leads to a lower bound on :
so in particular
We can thus estimate (23) somewhat crudely as
We can estimate
thanks to (22) and the crude bound coming from the Taylor expansion of
. If we choose
large enough so that
(note the left-hand side decays like as
), we thus can estimate (23) by
which sums to
We conclude
Lemma 17 (Fundamental lemma of sieve theory) Let
. If
for some
, then there exist combinatorial upper and lower sieve coefficients
supported in
such that
for any multiplicative function
with
for all primes
, obeying the bounds (22).
Informally, the fundamental lemma shows that one can get exponentially close to the Legendre sieve limit of as long as the sieve level
is a large power of the sifting range
.
Proof: Without loss of generality we may assume that is sufficiently large depending on
, since for small
we may simply reduce
(for the purposes of the upper bound sieve) or use the trivial lower bound sieve
. But then we may find
obeying (24) and less than
, and the claim follows.
Exercise 18 Improve the
error in (25) to
for any
. (It is possible to show that this error term is essentially best possible, at least in the linear dimension case
, by using asymptotics for rough numbers (Exercise 28 of Supplement 4), but we will not do so here.)
The fundamental lemma is often used as a preliminary sifting device to remove small primes, in preparation for a more sophisticated sieve to deal with medium primes. But one can of course use it directly. Indeed, thanks to Theorem 5, we now have an adequate answer to Problem 3 in the regime where the sieve dimension is fixed and the sifting level is small compared to the level of distribution:
Corollary 19 (Fundamental lemma, applied directly) Let
for some
, and suppose that
is a multiplicative function obeying the bounds (22) for all
, where
. For each
, let
be a set of integers, and let
is a finitely supported sequence of non-negative reals such that
for all square-free
, where
and
, and
. Then
We now specialise to the twin prime problem in Problem 2. Here, , and
where
for odd
, with
. In particular we may take
. We set
for some fixed
, and
for some quantity
going sufficiently slowly to infinity, so
. From (8) we have
and from (5) and the divisor bound we have
for any , as
. From the fundamental lemma, we conclude that
If we let the term in
decay sufficiently slowly, then the final error term in the above expression can be absorbed into the second error term, and so
Setting , we conclude in particular that
which implies from the sieve of Eratosthenes that there are pairs of twin primes in
. This implies a famous theorem of Brun:
Exercise 20 (Brun’s theorem) Show that the sum
is convergent.
Alternatively, if one applies (26) with a sufficiently large natural number, we see that
for all sufficiently large . This implies that there are
numbers
in
with the property that
are both not divisible by any prime less than or equal to
. In particular (if
),
and
both have at most
prime factors. We conclude a version of the twin prime conjecture for almost primes:
Corollary 21 There is an absolute constant
such that there are infinitely many natural numbers
such that
are each the product of at most
primes.
Exercise 22 By going through the beta sieve estimates carefully, show that one can take
in the above corollary. (One can do substantially better if one uses more sophisticated sieve estimates, and of course Chen’s theorem tells us that
can be taken as low as
; we will review the proof of Chen’s theorem in subsequent notes.)
In the above corollary, both elements of the pair are permitted to be almost prime rather than prime. It is possible to use the fundamental lemma to improve upon this by forcing
(for instance) to be prime. To do this, we run the twin prime sieve differently; rather than start with all integers and sieve out two residue classes modulo each prime
, we instead start with the primes and sieve out one residue class modulo
. More precisely, we consider the quantity
where is the residue class
and
(say). Observe that the sum is only as large as
at worst for even
. For odd
, we expect from the prime number theorem in arithmetic progressions that
for some small . Thus it is natural to apply Corollary 19 with
and
, so that
. From the Bombieri-Vinogradov theorem (see Theorem 17 of Notes 3), we see that
for any fixed . We thus take
for some small fixed
and
, and apply Corollary 19 to conclude that
For a sufficiently large fixed constant, we conclude from Mertens’ theorem that
for large enough. The contribution for those
for which
is a power of a prime, rather than a prime, is easily seen to be negligible, and we conclude that there are
primes
less than
such that
has no factors less than
, and thus is the product of at most
primes. We thus have the following improvement of Corollary 21:
Proposition 23 There is an absolute constant
such that there are infinitely many primes
such that
is the product of at most
primes.
More generally, the fundamental lemma (combined with equidistribution results such as the Bombieri-Vinogradov theorem) lets one prove analogues of various unsolved conjectures about primes, in which one either (a) is content with upper bounds of the right order of magnitude, or (b) is willing to replace primes with a suitable notion of almost prime. We give some examples in the exercises below.
Exercise 24 (Approximations to the prime
-tuples conjecture) Let
be an admissible
-tuple of integers for some
, thus the
are all distinct, and for any
, the number
of residue classes occupied by
is never equal to
. Let
denote the singular series
- (i) Show that
for all sufficiently large
. (This should be compared with the Hardy-Littlewood prime tuples conjecture
as
, see Exercise 13 of Supplement 4.) Thus the fundamental lemma tells us that the prime tuples conjecture is only off from the truth by a multiplicative factor of
, at least as far as upper bounds are concerned.
- (ii) Show that if
is sufficiently large depending on
, then there are infinitely many
such that
are each the product of at most
prime factors.
- (iii) Strengthen (ii) to include the requirement that
(say) is prime.
Exercise 25 (Approximations to the even Goldbach conjecture) If
is an even natural number, define the singular series
- (i) Show that the number of ways to write
as the sum of two primes
is
if
is sufficiently large. (This should be compared with the prediction of
as
, see Exercise 12 of Supplement 4.)
- (ii) Show that if
is a sufficiently large natural number, then every sufficiently large even number
is expressible as the sum of two natural numbers
, each of which are the product of at most
prime factors.
- (iii) Strengthen (ii) to include the requirement that
is prime.
Exercise 26 (Approximations to Legendre’s conjecture)
- (i) Show that for any
, the number of primes in the interval
is at most
.
- (ii) Show that if
is a sufficiently large natural number, then for all sufficiently large
, there is at least one natural number
between
and
that is the product of at most
primes.
Exercise 27 (Approximations to Landau’s fourth problem) Define the singular series
where
is the number of residue classes
such that
.
- (i) Show that the number of primes of the form
with
is
for sufficiently large
. (This should be compared with the prediction
, see Exercise 15 of Supplement 4.)
- (ii) Show that if
is a sufficiently large natural number, then there are an infinite number of numbers of the form
that are the product of at most
primes.
— 2. The Selberg upper bound sieve —
We now turn to the Selberg upper bound sieve, which is a simple but remarkably useful general-purpose upper bound sieve. We again follow the discussion of Friedlander and Iwaniec.
The idea of the sieve comes from the following trivial observation: if is a squarefree number,
is a set of integers for each
, and
are arbitrary real numbers with
, then the function
is an upper bound sieve, since it is clearly non-negative and equals outside of
. Equivalently, the sequence
for is a sequence of upper bound sieve coefficients, where
is the least common multiple of
. If we set
and assume that the
are only non-zero for
, then this sequence of sieve coefficients will only be non-zero on
. We will refer to sieves (and sieve coefficients) constructed by this method as Selberg upper bound sieves (or Selberg upper bound sieve coefficients); they are also referred to as
sieves in the literature.
A key advantage of the Selberg sieve is that the coefficients for
are completely unconstrained within the real numbers, and the sieve depends in a quadratic fashion on these coefficients, so the problem of optimising the Selberg sieve for a given application usually reduces to an unconstrained quadratic optimisation problem, which can often be solved exactly in many cases. Of course, the optimal Selberg sieve need not be the globally optimal sieve amongst all upper bound sieves; but in practice the Selberg sieve tends to give quite good results even if it might not be the globally optimal choice.
Selberg sieves can be used for many purposes, but we begin with the classical sieve problem posed in Problem 10. To avoid some degeneracies we will assume that
for all . Restricting to upper bound sieve coefficients of the Selberg form, the quantity (14) becomes
This is a quadratic form in the coefficients ; our task is to minimise this amongst all choices of coefficients with
, with
vanishing for
.
Observe that any can be expressed as
,
with
, in which case
. By multiplicativity, we may thus write (30) as
which we can rearrange as as
The inner sum almost factorises as a square, but we have the coprimality constraint to deal with. The standard trick for dealing with this constraint is Möbius inversion:
inserting this and writing ,
, we can now write (30) as
which now does factor as a sum of squares:
Writing , this becomes
where
and is defined on factors of
by the Dirichlet convolution
Thus is the multiplicative function with
for all ; we extend
by zero outside of the factors of
. In particular
is non-negative.
The coefficients are a transform of the original coefficients
. For instance, if
was replaced by a single prime
, then the coefficients
would be related to the coefficients
by the transform
which is inverted as
More generally, we have the following inversion formula:
Exercise 28 (Möbius-type inversion formula) Verify that
for any
.
From this formula, we see in particular that is supported on
if and only if
is supported on
. The constraint
can now be written in terms of the
coefficients as
Our task is now to minimise the quadratic form (31) amongst all coefficients supported on
obeying (34). The Cauchy-Schwarz inequality gives the lower bound
where , with equality precisely when
for
. Inserting this back into (33), we arrive at the optimised Selberg sieve coefficients
and the quantity (14) is equal to .
We have a pleasant size bound on the coefficients :
Lemma 29 We have
for all
.
Proof: Writing , we have
but
since one can compute that . The claim follows.
The upper bound sieve coefficients in (28) is then bounded by
where is the third divisor function. Applying Theorem 5, we thus have a general upper bound for Problem 3:
Theorem 30 (Selberg sieve upper bound) Let
be a squarefree natural number. For each prime
dividing
, let
be a set of integers. Let
be a multiplicative function with
for all
, and define the multiplicative function
by (32), extending
by zero outside of the factors of
. Let
for some
. Suppose that
is a finitely supported sequence of non-negative reals, such that
for all
and some reals
. Then
Remark 31 To compare this bound against the “expected” value of
, observe from Euler products that
The quantity can often be computed in practice using standard methods for controlling sums of multiplicative functions, such as Theorem 27 of Notes 1. We illustrate this with the following sample application of the Selberg sieve:
Theorem 32 (Sieving an interval) Let
be fixed natural numbers, and let
. For each prime number
, let
be the union of
residue classes modulo
, where
for all
, and
for all
. Then for any
, one has
whenever
is sufficiently large depending on
, where
is the singular series
Proof: We apply Theorem 30 with ,
,
, and
. We set
and
. For any
,
is the union of
residue classes modulo
, and thus by (36) we have
and in particular
From Theorem 30, we thus can upper bound the left-hand side of (37) by
By the divisor bound and the choice of we have
so it will suffice to show (after adjusting ) that
But from Theorem 27(ii) of Notes 1 (with replaced by
), the left-hand side is
where
Since , we see that
, and the claim follows (after adjusting
appropriately).
The upper bound in (37) is off by a factor of about from what one expects to be the truth. The following exercises give some quick applications of Theorem 32.
Exercise 33 (Brun-Titchmarsh inequality) If
, and
is sufficiently large depending on
, establish the Brun-Titchmarsh inequality
for all
, where
is the number of primes less than
. More generally, if
,
is a primitive residue class, and
is sufficiently large depending on
, establish the Brun-Titchmarsh inequality
for all
, where
is the number of primes less than
that lie in
. Comparing this with the prime number theorem in arithmetic progressions, we see that the Brun-Titchmarsh inequality is off from the truth by a factor of two when
; however, the Brun-Titchmarsh inequality is applicable even when
is extremely large compared with
, whereas the prime number theorem gives no non-trivial bound in this regime. The
can in fact be deleted in these inequalities, a result of Montgomery and Vaughan (using a refinement of the large sieve, discussed below).
Exercise 34 (Upper bound for
-tuple conjecture) If
is an admissible
-tuple (as in Exercise 24), and let
be the singular series (27). Show that
as
. (This should be compared with the prime tuples conjecture, which predicts
as
.)
As with the fundamental lemma, one can use the Selberg sieve together with the Bombieri-Vinogradov theorem to sift out a set of primes, rather than a set of intervals:
Exercise 35 (Sieving primes) Let
be fixed natural numbers, and let
. For each prime number
, let
be the union of
primitive residue classes modulo
, where
for all
, and
for all
. Then for any
, one has
where
is the singular series
(You will need Exercise 23 of Notes 3 to control the
terms arising from the remainder term.) If one assumes the Elliott-Halberstam conjecture (see Exercise 22 of Notes 3), show that the factor of
here can be lowered to
.
Exercise 36 Show that the quantity
in (39) may be replaced by
, which is a superior bound for
. (It remains an open question to improve upon the bound
without assuming any further unproven conjectures.) Assuming the Elliott-Halberstam conjecture, show that one can replace
instead by
.
We have approached the analysis of the Selberg sieve as a “discrete” optimisation problem, in which one is trying to optimise the parameters or
that are indexed by the discrete set
. It is however also instructive to consider a “continuous” version of the optimisation problem, in which the coefficients
or
are described by a continuous function of
or
; this is a slightly more restrictive class of sieves, but turns out to lead to numerical bounds that come very close to the discrete optimum, and allow for the use of methods from calculus of variations to come into play when using the Selberg sieve in more sophisticated ways. (See these lecture notes of Selberg for further discussion of how the continuous problem serves as a good approximation of the discrete one.) We give a basic illustration of this by presenting an alternate proof of (a very slightly weaker form of) Theorem 32 in which we do not use the choice (35) for the sieve weights
, but instead choose weights of the form
for some smooth compactly supported function that vanishes on
, and which equals
at the origin; compare this smoothly truncated version of the Möbius function with the more abrupt truncations of the Möbius function that arise in combinatorial sieving. Such cases of the Selberg sieve were studied by several authors, most notably Goldston, Pintz, and Yildirim (although they made
a polynomial on
, rather than a smooth function). As in the previous proof of Theorem 32, we take
,
,
,
,
, and
. The quantity (30) (or (14)) is then equal to
We can remove the constraints , since the summands vanish outside of this range. To estimate this sum we use an argument related to that used to prove Proposition 9 from Notes 2. We first use Fourier inversion, applied to the function
, to obtain a representation of the form
for some smooth, rapidly decreasing function , thus
for any . We then have
and
and so by Fubini’s theorem (using the divisor bound ) we may write (30) as
and then by Euler products and the definition of we may factorise this as
where
We have the crude upper bound
which by the asymptotic for
(see equation (22) of Notes 1) implies that
From this and the rapid decrease of , we see that the contribution of the cases
or
(say) to (44) is
for any
. Thus we can write (44) as
Since , we can factor
where is the local Euler factor
The point of this factorisation is that, thanks to a routine but somewhat tedious Taylor expansion, one has the asymptotics
for any complex with imaginary part less than
in magnitude, and hence (by the Cauchy integral formula)
for any complex with imaginary part less than
in magnitude. By the fundamental theorem of calculus, we thus have
when are real. Also, observe from (38) that
and hence
From this and the standard asymptotic for
close to
, we have
and thus we can write (44) as
From the rapid decrease of , we may now remove the constraints on
and write this as
To handle the weight , we observe from dividing (42) by
and then differentiating
times that
and hence by Fubini’s theorem
Using the Gamma function identity for any
with
, we conclude from another application of Fubini’s theorem that
Note from the support of that we can restrict the
integral to
. Putting this all together, we see that with the choice of weights (41), the expression (14) takes the form
Recall that we are requiring to equal
, which after
applications of the fundamental theorem of calculus (or integration by parts) is equivalent to the constraint
Since , we conclude from Cauchy-Schwarz that
with equality occurring when . Equality is not quite attainable with
smooth, but by a mollification one can select an
smoothly supported on (say)
with
and
for any . The expression (14) now takes the form
Meanwhile, from (41) we have and thus
much as before. We can then apply Theorem 5 as before to recover Theorem 32.
Remark 37 The Selberg sieve is most naturally used as an upper bound sieve. However, it is possible to modify the Selberg sieve to produce a lower bound sieve in a number of ways. One way is to simply insert the Selberg sieve into the Buchstab identity (16); see for instance this paper of Ankeny and Onishi for an investigation of this approach. Another method is to multiply the Selberg upper bound sieve by a lower bound sieve
; somewhat surprisingly, even a very crude lower bound sieve such as the
Bonferroni sieve
can give reasonably good results for this purpose. See Friedlander and Iwaniec for further analysis of this lower bound sieve (sometimes known as the
sieve).
— 3. A multidimensional Selberg sieve, and bounded gaps between primes —
We now discuss a multidimensional variant of the Selberg sieve that establishes a recent result of Maynard (and independently by myself), giving further partial progress towards the Hardy-Littlewood prime tuples conjecture. For sake of exposition we shall omit some details; see this previous blog post for a more careful treatment; see also this survey of Granville.
More precisely, we sketch the proof of
Theorem 38 (Maynard’s theorem) Let
be a natural number, and let
be sufficiently large depending on
. Then for any admissible
-tuple
, there exist infinitely many natural numbers
such that at least
of the numbers
are prime.
Exercise 39 Show that Maynard’s theorem implies that the quantity
is finite for each natural number
, where
is the
prime, is finite. (For the most recent bounds on
, see this Polymath paper.) In particular, the
case of Theorem 38 yields “bounded gaps between primes”, in that there is a finite
for which
for infinitely many
.
Even the case of this theorem was only proven in 2013 by Zhang, building upon earlier work of Goldston, Pintz, and Yildirim. Zhang’s original value of
in the “bounded gaps” application was
; this has since been lowered several times, with the current record being
, due to the Polymath project. The original arguments of Zhang and Goldston-Pintz-Yildirim used a “one-dimensional” Selberg sieve, with coefficients given by (41), and required quite strong equidistribution hypotheses on the primes (stronger than what the Bombieri-Vinogradov theorem provides). However, the argument of Maynard relies on a more general and flexible “multi-dimensional” Selberg sieve, which has better numerical performance, and as such does not need any equidistribution result beyond the Bombieri-Vinogradov theorem. This sieve has since been used for a number of further applications in analytic number theory; see this web page for a list of papers in this direction.
Let be as in Theorem 38. Suppose that, for sufficiently large
, we can find a non-negative function
supported on
with the property that
Then by the pigeonhole principle, there exists at least one such that
, that is to say at least
of the numbers
are prime. Letting
go to infinity (keeping
fixed), we will obtain Theorem 38.
It remains to establish the existence of a function with the required properties. Prior to the work of Goldston, Pintz, and Yildirim, it was only known (using tools such as the fundamental lemma of sieve theory, or off-the-shelf Selberg sieves) how to construct sieve weights
obeying (45) with
replaced by a small constant less than
. The earlier arguments of Zhang and Goldston-Pintz-Yildirim chose a weight
which was essentially of the Selberg sieve form
(compare with (41)), where for some fixed
, and
was a smooth fixed compactly supported function. Omitting many calculations, their conclusion was that one could prove the
case of Theorem 38 once one had some sort of distribution result on the primes at level
(as defined in Exercise 22 of Notes 3) with
. Unfortunately, this just falls short of what the Bombieri-Vinogradov theorem provides, which is a level of distribution
for any
. One of the key contributions of Zhang was to obtain a partial distribution result at a value of
slightly above
, which when combined with (a modification of) the previous work of Goldston, Pintz, and Yildirim, was able to yield the
case of Theorem 38.
The approach of Maynard and myself was a little different, based on multidimensional Selberg sieves such as
for some multidimensional smooth function supported on the simplex
; the one-dimensional sieve considered by Zhang and Goldston-Pintz-Yildirim is then essentially the case when
is a function just of
on this simplex. Actually, to avoid some (very minor) issues concerning common factors between the
, and also to essentially eliminate the role of the singular series, it is convenient to work with a slight modification
of the above sieve, where for some
slowly growing to infinity with
(e.g.
will do), and
is a residue class such that
is primitive for all
(such a
exists thanks to the admissibility of
and the Chinese remainder theorem). Note that for
large enough, the
have no common factors, and so the
are automatically coprime to each other and to
. (In Maynard’s paper, the coefficients
were replaced by a sequence
which was not directly arising from a smooth function
, but instead the analogue of the
coefficients from the preceding section were chosen to be of the form
for some suitable function
. However, the differences between the two approaches are fairly minor.)
It is possible to estimate the right-hand side of (45) using a modification of the arguments of the previous section:
Exercise 40 Let
be as in the above discussion. Assume that
. If
goes to infinity at a sufficiently slow rate at
, establish the asymptotic
as
, where
and
denotes the mixed partial derivative
in each of the
variables
.
Now we consider a term on the left-hand side, say . When
is prime, the quantity
in (46) is necessarily
, so (46) simplifies to a
-dimensional version of itself:
If , one can control the effect of the weight
using the Bombieri-Vinogradov theorem, and after some moderately complicated calculations one obtains
Exercise 41 Let
be as in the above discussion. Assume that
. If
goes to infinity at a sufficiently slow rate at
, establish the asymptotic
as
, where
and
denotes the mixed partial derivative
in the
variables
. Similarly for permutations of the
indices.
In view of these two calculations, one can establish Maynard’s theorem as soon as one can find and
for which
where is defined similarly to
but with the role of the
and
indices swapped. This is now a problem purely in calculus of variations rather than number theory. It is thus natural to take
close to
(though, for the qualitative analysis performed here, actually any non-zero choice of
independent of
will suffice). Our task is now to find
with
The optimal choice of for a given
is not known for any
, but one can generate a “good enough” choice of
by the following set of calculations. Firstly, we restrict to the case when
is symmetric in the
, so that all the
are equal and our objective is now to satisfy the inequality
Next, we make the change of variables , thus
is smooth on
and supported on the simplex
but is otherwise arbitrary on this simplex. From the fundamental theorem of calculus, the desired inequality now becomes
We select a function of the truncated product form
where is a smooth non-negative function normalised so that
so that
and our objective (after a rescaling) is now to get
The emergence of the function as a parameter in the multidimensional sieve is a feature that is not present in previous sieves (at least not without sharply curtailing the support of
), and is the main reason why this sieve performs better than previous sieves for this problem.
We interpret the above optimisation problem probabilistically. Let be independent real values in
with probability density function
. Our task is now to obtain the inequality
Suppose that we can locate a fixed function (independent of
) obeying (47) with
Then from the law of large numbers, goes to infinity almost surely as
to
, and so from dominated convergence one has
and the claim follows from (49) if is sufficiently large depending on
.
We are thus reduced to the easy one-dimensional problem of producing a smooth function obeying the constraints (47), (48), (49). However, one can verify that the choice
(barely) obeys (49) with and
both finite, and the claim follows by a routine rescaling.
Exercise 42 By working through a more quantitative version of the above argument, establish Theorem 38 with
as
. (Hint: One can replace the use of the law of large numbers with Chebyshev’s inequality.) It is not currently known how to obtain this theorem with any choice of
that grows faster than logarithmically in
; the current record is
, due to the Polymath project. It is known, though, that bounds on the order of
are the limit of the Selberg sieve method, and one either needs new sieves, or new techniques beyond sieve theory, to increase
beyond this rate; see the Polymath paper for further discussion.
One can use the multidimensional sieve to study the classical sieving problem of counting prime -tuples, but unfortunately the numerology it gives is precisely the same as the numerology given by previous sieves:
Exercise 43 Let
be an admissible tuple, and let
go to infinity sufficiently slowly as
.
— 4. The large sieve —
In the general discussion of sieving problems such as Problem 3 in previous sections, there was no structure imposed on the sets beyond the fact that one could control the sums (9). However, as we have seen in practice,
is typically the union of some number
of residue classes modulo
. It is possible to exploit this structure using Fourier analysis, leading to an approach to sieving known as the large sieve, and relying crucially on the analytic large sieve inequality from Notes 3. The large sieve tends to give results similar to that of the Selberg sieve; indeed, the two sieves are in some sense “dual” to each other and are powered by the same sort of
-based techniques (such as the Cauchy-Schwarz inequality). However, the large sieve methods can offer sharper control on error terms than the Selberg sieve if carried out carefully. As the name suggests, the large sieve is particularly useful in the regime when the number of residue classes modulo
being sieved out is large, for instance as large as a constant multiple of
. (When
is very large – close to
, then there is another sieve that gives even sharper results, namely the larger sieve of Gallagher, which we do not discuss further here.)
To apply the analytic large sieve to sieving problems, we need to exploit an “uncertainty principle” of Montgomery (discussed further in this previous blog post), that shows that functions which avoid certain residue classes in the physical domain are necessarily spread out in the frequency domain. More precisely, we have
Theorem 44 (Montgomery uncertainty principle) Let
be a finitely supported function, and let
be the Fourier transform
, where
. Suppose that for each prime
, there are
residue classes modulo
on which
vanishes identically for some
. Then for any
and any natural number
, one has
where
is the multiplicative function
Proof: We may assume that is squarefree, since otherwise we have
and the claim (50) is trivial. Observe from the Chinese remainder theorem that if the inequality (50) holds for two coprime values
of
, then it also holds for
(as every fraction
in
with
can be uniquely written as
with
). Thus it suffices to verify the claim when
is prime. By multiplying
by
we can also reduce to the case
, thus our task is now to show that
or equivalently
By the Plancherel identity in , the left-hand side can be written as
and the right-hand side is
Since the sum is only non-vanishing for at most
choices of
, the claim follows from the Cauchy-Schwarz inequality.
We then obtain the following variant of Theorem 32 or Theorem 30:
Theorem 45 (Arithmetic large sieve) Let
. For each prime number
, let
be the union of
residue classes modulo
, where
for all
. Then for any
and natural number
, we have
where
and
is given by (51).
Proof: Let be the indicator function of the set in (52). From Theorem 44 with
, and
summed up to
, one has
Now observe from the basic identity
that the Farey sequence is
-separated in
. We may thus invoke the analytic large sieve inequality (see Proposition 6 of Notes 3) to conclude that
But from construction of , the left-hand side of (52) is equal to both
and
, and the claim follows.
Note that if one uses the sharp form of the large sieve inequality (Remark 7 from Notes 3) then one can sharpen (52) to
Setting , one can then recover sharper versions of Theorem 32 and its consequences, for instance by working along these lines (and using a very sharp version of the analytic large sieve inequality that exploits the variable spacing of the set of fractions
), result of Montgomery and Vaughan were able to completely delete the error term in the Brun-Titchmarsh inequality (Exercise 33).
Remark 46 It is a general phenomenon that large sieve methods tend to give the same main term as the Selberg sieve; see Section 9.5 of Friedlander-Iwaniec for further discussion.
— 5. The parity problem —
We have seen that sieve theoretic techniques are reasonably good at obtaining upper bounds to various sieving problems, and can also provide lower bounds if the sifting limit is low. However, these methods have proven to be frustratingly ineffective at providing non-trivial lower bounds for the original problems that motivated the development of sieve theory in the first place, namely counting patterns of primes such as twin primes. While we do not fully understand the exact limitations of what sieve theory can and cannot do, we do have a convincing heuristic barrier, known as the parity barrier or parity problem and first raised explicitly by Selberg, which explains why sieves have so much difficulty in lower bounding (or efficiently upper bounding) counts involving primes; in particular they explain why sieve-theoretic methods have thus far been unable to fully resolve any of the four Landau problems. The barrier is not completely rigorous, as it relies on the Möbius pseudorandomness principle (or Liouville pseudorandomness principle) discussed in Section 2 of Supplement 4. As such, our discussion in this section will be somewhat informal in nature; for instance, we will use vague symbols such as without defining their meaning precisely.
Consider the general sieving problem in Problem 3, in which one tries to control the quantity via knowledge of the sums
, together with the hypothesis that the
are non-negative. In most situations, one expects this problem to be underdetermined; in particular, there will probably exist two different finitely supported sequences
,
of non-negative reals which are essentially indistinguishable with regards to the inputs of this sieve problem, in the sense that
for all , but which have very different outputs for this sieve problem, in that
Then any upper bound for Problem 3 with this choice of must exceed (at least approximately) the greater of
and
; similarly, any lower bound for this problem must be (approximately) less than the lesser of
and
.
As a special case of this observation, we have
Proposition 47 (Abstract parity barrier) (Informal) Let
be as in Problem 3. Let
be a finitely supported sequence of non-negative reals such that
for all
. Suppose that we have an additional “parity function”
with the property that
for all
in the support of
, but such that
for all
. Then any upper bound for Problem 3 cannot be significantly smaller than
, and any lower bound for Problem 3 cannot be significantly larger than zero. In particular, one does not expect to be able to show using such sieve methods that the support of
contains any appreciable number of elements outside of
.
Proof: (Non-rigorous!) For the statement about the upper bound, apply the previous discussion to the sequence . For the statement about the lower bound, apply the previous discussion to the sequence
.
Informally, the proposition asserts that if the sifted set is sensitive to the parity function
, but original pre-sifted sequence
and the sets
are not, then traditional sieve methods cannot give any non-trivial lower bounds, and any upper bounds must be off from the truth by a factor of at least two.
In practice, one can heuristically obey the hypotheses of this proposition if one takes to be some variant of the Liouville function
, which counts the parity of the number of prime factors of
, in which case the parity problem asserts (roughly speaking) that the sifting problem must allow both numbers with an even number of prime factors and numbers with an odd number of prime factors to survive if one is to have any hope of good lower bounds. For instance:
Corollary 48 (Parity barrier for twin primes) (Very informal) One cannot use traditional sieve methods to resolve the twin prime conjecture.
Proof: (Very non-rigorous! Also assumes Liouville pseudorandomness) As mentioned in the introduction, there are a number of different ways to try to approach the twin prime conjecture by sieve methods. Consider for instance the approach in which one starts with the primes in
and removes the residue class
for each
. The quantity
will then be
on the sifted set
(since
will be prime on that set), while the Liouville pseudorandomness heuristic predicts that
is negligible for all (say). The claim then follows from Proposition 47.
Exercise 49 (Informal) Develop similar parity obstructions for the other sieve-theoretic approaches to the twin prime conjecture. In the case when one is sieving out the integers by two residue classes
for each prime
, argue that any sieve-theoretic bound must in fact be off from the truth by a factor of at least four. (Hint: experiment with weights such as
.)
Exercise 50 (Informal) Argue why any sieve-theoretic upper bound on prime
-tuples that is based on equidistribution estimates for the natural numbers and primes must be off from the truth by a factor of at least
. (This should be compared with the known upper bounds of
and
discussed in previous sections. It is not clear at present which of these bounds is closest to the true limit of sieve theory.)
The above argument is far from watertight, and there have been successful attempts to circumvent the parity barrier. For instance:
- (i) The parity barrier does not prohibit the possibility of producing a very small number of prime (or near-prime) patterns, even if one cannot hope to get lower bounds that are of the “right” order of magnitude. For instance, it was shown unconditionally by Goldston, Graham, Pintz, and Yildirim that there exist infinitely many pairs of consecutive numbers
, each of which are the product of exactly four primes. Cramér type random models (see Supplement 4) predict the number of such pairs up to
to be approximately
(ignoring some factors of
), and the parity barrier prevents one from getting lower bounds of this magnitude; but the construction of Goldston, Graham, Pintz, and Yildirim only produces a quite sparse set of pairs (about
or so) and so evades the parity barrier. Using a very different class of arguments, it is also possible to produce analogues of patterns like twin primes in the function field setting (in which the role of primes are replaced by irreducible polynomials over a finite field) by using very sparse sequences of polynomials for which irreducibility can be tested easily; see for instance this paper of Hall for the case of twin irreducible polynomials.
- (ii) The parity barrier may disappear if one is somehow in a situation in which the Möbius or Liouville pseudorandomness principles break down. An extreme example of this is when one assumes the existence of one (or many) Siegel zeroes, which imply extremely high correlation between the Möbius or Liouville functions and a Dirichlet character, in strong violation of the pseudorandomness principle. A striking example of this case occurs in this paper of Heath-Brown, which demonstrates the truth of the twin prime conjecture if one assumes the existence of an infinite sequence of Siegel zeroes.
- (iii) Traditional sieve theoretic methods only assume that one can control “linear” or “Type I” sums of the form
. However, in some cases it is also possible to control “bilinear” or “Type II” sums such as
for various coefficients
of controlled size and support but uncontrolled phase. Note that multiplicative parity functions such as
will be distinguishable with such sums, since
will not exhibit any cancellation if
both oscillate with the sign of
respectively; as such, the parity barrier is no longer in effect. In such situations it can be possible to go beyond the limits of traditional sieve theory, for instance by using decompositions similar to the Vaughan identity employed in Notes 3. Of course, one still has to actually establish the required bilinear sum asymptotics, and this typically requires some deep and non-trivial input outside of sieve theory. A model example here is the work of Friedlander and Iwaniec establishing the infinitude of primes of the form
. Here, the relevant bilinear input is obtained using the algebraic structure of the Gaussian integers and some non-trivial results on lattice counting problems.
- (iv) Sieve theory is focused on “one-dimensional” sums, such as
, or perhaps
. The presence of the parity problem relies on the ability to modify weights such as
or
in such sums in such a fashion that the value of the sum dramatically changes. If one instead is trying to control a multi-dimensional sum, such as
(which roughly speaking counts arithmetic progressions of primes of length four), then there is no easy way to modify any of the factors of such a sum in the fashion required for the parity barrier to be in effect. For such multi-dimensional sums one can then hope to use additional estimates to produce non-trivial lower bounds. For instance, with Ben Green, we used Selberg sieves combined with the additional tool of Szemerédi’s theorem to obtain lower bounds for multi-dimensional sums similar to the one above, proving in particular that the primes contained arbitrarily long arithmetic progressions.
It is one of the major problems in analytic prime number theory to continue to find more ways to defeat the parity barrier, so that we can produce prime patterns in many more situations than what is currently possible. However, this has proven to be a surprisingly challenging task, and one which typically requires the injection of new ideas and methods external to sieve theory.
Remark 51 As mentioned previously, it is not known in general if the parity problem represents the only obstruction to sieve methods. However, under the additional hypothesis of the Elliott-Halberstam conjecture, and when focusing on “binary” problems such as the twin prime conjecture, there is a more detailed analysis of the parity problem due to Bombieri which indicates that, in some sense, the parity problem is indeed the only information not captured by sieve-theoretic methods, in that the asymptotic distribution of primes or almost primes in pairs
is completely determined by a single parameter
that represents the ratio between the actual number of twin primes and the expected number of twin primes; the parity problem prevents us from obtaining any further information about
other than that it lies between
and
, but once
is known, essentially all other asymptotics regarding the prime factorisation of
and
will also be known. See Chapter 16 of Friedlander-Iwaniec for further discussion.
Remark 52 For a slightly different way to formalise the parity problem, see this previous blog post. Among other things, it is shown in that post that the parity problem prevents one from proving Theorem 38 with any choice of
with
using purely sieve-theoretic methods, in particular ruling out a solution to the twin prime conjecture by sieve methods.
41 comments
Comments feed for this article
21 January, 2015 at 7:27 pm
Fred Lunnon
Remark 46 — missing reference?
[Corrected, thanks – T.]
22 January, 2015 at 12:17 am
observer
Remark 46: sieve sieve.
[Corrected, thanks – T.]
22 January, 2015 at 2:37 am
Bo Jacoby
Thank you. When I read from the email all formulas appear superscripted like exponents. When I read from https://terrytao.wordpress.com/2015/01/21/254a-notes-4-some-sieve-theory/#comments it looks all right. Can I do something to make the email look all right?
22 January, 2015 at 5:09 am
Anonymous
In Problem 2, it seems (in line 3) that “
” should be “
“.
should appear as the LHS of the displayed formula.
Also, it seems that
[Corrected, thanks – T.]
22 January, 2015 at 6:40 am
tomcircle
Reblogged this on Math Online Tom Circle.
22 January, 2015 at 7:38 am
arch1
In Problem 2, if we omit z itself in the product defining P(z), how does this avoid the residue classes 0, -2 (p) when p=z?
[Corrected, thanks – T.]
22 January, 2015 at 12:28 pm
arch1
In (6), you need to incorporate the immediately preceding crude bound.
[Corrected, thanks – T.]
22 January, 2015 at 4:50 pm
arch1
Proposition 14(i): The superscripts in
and
should instead be subscripts.
Proposition 23: n+2 -> p+2
[Corrected, thanks – T.]
23 January, 2015 at 6:32 pm
arch1
In the 1st sentence after (3), should
instead be
?
[Corrected, thanks – T.]
23 January, 2015 at 9:01 pm
kodlu
“… theorem of Zhang which (among other things) used an upper bound sieve was used to demonstrate the existence of infinitely many pairs of primes whose difference was bounded.”
could be rewritten
“…theorem of Zhang in which (among other things) an upper bound sieve was used to demonstrate the existence of infinitely many pairs of primes whose difference was bounded.”
[Corrected, thanks – T.]
26 January, 2015 at 6:01 am
arch1@gmail.com
As a beginner it took me a while to see that the terms in the expansion of (2)ās RHS correspond to the subsets of Pās prime factors, which in turn correspond to the divisors d of P on the following line.
[Additional step added to clarify – T.]
27 January, 2015 at 2:23 am
Anonymous
In exercise 13 should’t it be
in the last productory, instead of
? Very nice notes, by the way, hope I get some time to look at them carefully.
[Corrected, thanks – T.]
29 January, 2015 at 5:34 am
arch1
Maybe overkill, but it would have helped me if (4) had started with
and (5) with 
29 January, 2015 at 11:32 am
254A, Supplement 5: The linear sieve and Chen’s theorem (optional) | What's new
[…] continue the discussion of sieve theory from Notes 4, but now specialise to the case of the linear sieve in which the sieve dimension is equal to , […]
22 February, 2015 at 9:02 am
254A, Notes 7: Linnik’s theorem on primes in arithmetic progressions | What's new
[…] options here; we will use a variant of the “continuous Selberg sieve” from Section 2 of Notes 4. Fix a smooth function that equals on and is supported on ; we allow implied constants to depend […]
25 February, 2015 at 6:01 am
arch1
In the first sentence after Problem 2, should
instead be
?
[Corrected, thanks – T.]
30 March, 2015 at 12:49 pm
254A, Notes 8: The Hardy-Littlewood circle method and Vinogradov’s theorem | What's new
[…] function is somewhat similar to the continuous Selberg sieve weights studied in Notes 4, with the main difference being that we did not square the divisor sum as we will not need to take […]
28 April, 2015 at 4:23 am
Karaskas
Theorem 44: The condition about q being squre-free looks unnecessary thanks to the definition of h(q), doesn’t it?
In the definition of Fourier transform shouldn’t it be
instead of
? It seems that if we define the transform without minus sign, then we will have to multiply the function f by
during the reducing to the
case.
[Corrected, thanks – T.]
4 August, 2015 at 3:13 am
Anonymous
Are these notes available as pdf? I want to take a printout.
18 September, 2015 at 4:46 pm
The logarithmically averaged Chowla and Elliott conjectures for two-point correlations; the Erdos discrepancy problem | What's new
[…] such as (2) or (3) are known to be subject to the “parity problem” (discussed numerous times previously on this blog), which roughly speaking means that they cannot be proven solely using […]
23 November, 2015 at 9:24 pm
A cheap version of Halasz’s inequality | What's new
[…] This follows for instance from the fundamental lemma of sieve theory (see e.g. Corollary 19 of this blog post). (One can also use the Selberg sieve or the large […]
17 July, 2016 at 8:54 am
Notes on the Bombieri asymptotic sieve | What's new
[…] where denotes the derivative of . Note the loss of that had previously been pointed out. In the arguments that follows I will be a little brief with the details, as they are standard (see e.g. this previous post). […]
25 September, 2016 at 8:19 am
manhtuankhtn
In the definition of $G(t_1,t_2)$ after equation (44): should the term $dt_1dt_2$ removed?
[Corrected, thanks – T.]
25 September, 2016 at 11:44 am
Anonymous
In the definition of $G(t_1,t_2)$ (after (44)) and $E_p(t_1,t_2)$: $1+p^{\frac{1+it_2}{\log R}}$ should be $p^{1+\frac{1+it_2}{\log R}}$.
[Corrected, thanks – T.]
25 September, 2016 at 11:54 am
Anonymous
”which by the asymptotic $\sum_p \frac{1}{p^s} = \log(s-1) + O(1)$ for $s > 1$” should be ”…. $-\log(s-1)$…” (missing minus sign).
[Corrected, thanks – T.]
28 September, 2016 at 12:44 am
Anonymous
I have a question which is somewhat related to the Brun-Titchmarsh inequality. It would be great if you can give me a reference. Does the following statement hold: Almost all natural numbers
in the interval
has the property that all prime factors of
are at most
, where
grows slowly to infinity with
. Please ignore (delete) my question if you think it is inappropriate for this post. Thanks!
28 September, 2016 at 8:02 am
Terence Tao
There has been a lot of work on counting smooth numbers in short intervals, see e.g. this paper of Croot http://people.math.gatech.edu/~ecroot/smooth6.pdf for some results and references.
18 December, 2018 at 6:31 am
Anonymous
Exercise 26 (Approximations to Legendreās conjecture):
I guess it should be O(y/logx) instead of O(y/logy), shouldn’t it?
18 December, 2018 at 3:56 pm
Terence Tao
No, one can only hope to get
here (one can only make good use of the portion of the sieve of Eratosthenes up to primes of size about
). Note for instance that the number of primes in
does not go to zero in the limit
(indeed, we believe it equal to 2 infinitely often!)
8 November, 2019 at 5:23 pm
254A, Notes 2: Complex-analytic multiplicative number theory | What's new
[…] 10 In Notes 4 we will use a similar method to that used to prove Theorem 9 to estimate sums such […]
16 June, 2021 at 1:25 pm
Anonymous
Dear Prof. Tao,
if you Ctrl+F for “taking s”, you’ll find a stray
.
16 June, 2021 at 1:27 pm
Webspinner
Also, for the proof of Brun’s theorem, one can’t just set
, because the
-notation does not yield a bound on the entire real line, but only a complement-bounded segment of it. One instead needs the containment of prime numbers within the set of (suitably) rough numbers.
16 June, 2021 at 1:33 pm
Webspinner
Finally, right after eqn. (19),
is probably meant to be the level of distribution, and not the sieve level.
16 June, 2021 at 1:35 pm
Webspinner
Oh, and regarding social matters: Perhaps my comment about the German saying has been percieved as somewhat grumpy. Yet, I observed that you, Prof. Tao, usually implement suggestions within the day, and I was put off by the fact that for my suggestions, that hasn’t happened yet.
17 June, 2021 at 7:54 am
Webspinner
Regarding lemma 17, I wonder whether the formulation might be optimised. Right now there is some class of functions obeying (22), and for this class one has the dependency on the sieve dimension. However, in the subsequent exercises one does not use the class of functions and the dependency on
at all. As it stands, it might be misunderstood to state that the bound depends on
and nothing else, which may be true but is not proven here, because (22) is assumed as an axiom.
Also, I’d be happy to get some kind of feedback from you whether my comments are welcome here or whether I’m just wasting my time.
17 June, 2021 at 8:21 am
Webspinner
Also, in the paragraph after (22), I’m not sure at all whether the inequality is sufficient for the desired bound, which in
needs to hold even for the small primes, if I’m not mistaken. I worry that a weird fluctuation in the beginning might throw the bound off. Also, I’m not quite sure whether the inequality is enough, because one bound is from above and the other from below.
17 June, 2021 at 8:51 am
Anonymous
Just to be clear: I mean fluctuations in
, not
(even though it best not be zero for small primes).
17 June, 2021 at 1:21 pm
Webspinner
In the proof of lemma 17, for the lower bound of
, the third equation should probably not be strict, because the case
is not yet ruled out. Two equations later, there is a
missing in the exponent of
.
Dude, I’m not trying to piss anyone off. But it would just be nice to have a freely available reference, and with a bit of collaboration, we can make it happen!
24 June, 2021 at 9:31 am
Webspinner
Hello there,
this is a gentle reminder kindly asking you to implement the above suggestions.
30 July, 2021 at 12:30 pm
Webspinner
I give up. I’ve made some good faith suggestions, and all I got was an estranging remark by a certain “Anonymous” that said “Spin somewhere else” (see the comment section to chapter 2 of this lecture). I’m going to take the advice of this “Anonymous” seriously. Farewell.
30 July, 2021 at 12:32 pm
Webspinner
I should add that this comment has not been removed yet, as of today.