The parity problem is a notorious problem in sieve theory: this theory was invented in order to count prime patterns of various types (e.g. twin primes), but despite superb success in obtaining upper bounds on the number of such patterns, it has proven to be somewhat disappointing in obtaining lower bounds. [Sieves can also be used to study many other things than primes, of course, but we shall focus only on primes in this post.] Even the task of reproving Euclid’s theorem – that there are infinitely many primes – seems to be extremely difficult to do by sieve theoretic means, unless of course one injects into the theory an estimate at least as strong as Euclid’s theorem (e.g. the prime number theorem). The main obstruction is the parity problem: even assuming such strong hypotheses as the Elliott-Halberstam conjecture (a sort of “super-generalised Riemann Hypothesis” for sieves), sieve theory is largely (but not completely) unable to distinguish numbers with an odd number of prime factors from numbers with an even number of prime factors. This “parity barrier” has been broken for some select patterns of primes by injecting some powerful non-sieve theory methods into the subject, but remains a formidable obstacle in general.

I’ll discuss the parity problem in more detail later in this post, but I want to first discuss how sieves work [drawing in part on some excellent unpublished lecture notes of Iwaniec]; the basic ideas are elementary and conceptually simple, but there are many details and technicalities involved in actually executing these ideas, which I will try to suppress for the sake of exposition.

Let’s consider a basic question in prime number theory, namely how to count the number of primes in a given range, say between N and 2N for some large integer N. (This problem is more or less equivalent to that of counting primes between 1 and N, thanks to dyadic decomposition, but by keeping the magnitude of all numbers comparable to N we can simplify some (very minor) technicalities.) Of course, we know that this particular question can be settled fairly satisfactorily (the answer is (1+o(1)) \frac{N}{\log N}) using known facts about the Riemann zeta function, but let us pretend for now that we do not know about this function. (Once one moves to slightly more complicated additive questions about the primes, such as counting twin primes, the theory of the zeta function and its relatives becomes much less powerful, even assuming such things as the Riemann hypothesis; the problem is that these functions are measuring the multiplicative structure of the primes rather than the additive structure.)
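
As a quick aside, this asymptotic is easy to test numerically; the short Python computation below, with an arbitrarily chosen value of N, is purely illustrative and of course proves nothing.

```python
# A quick numerical sanity check (not a proof, of course): compare the number
# of primes in [N, 2N] with the predicted main term N / log N for an
# arbitrarily chosen N.
from math import log

def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if flags[p]:
            flags[p * p :: p] = [False] * len(flags[p * p :: p])
    return [n for n, is_p in enumerate(flags) if is_p]

N = 10 ** 6
count = sum(1 for p in primes_up_to(2 * N) if p > N)
print(count, N / log(N))   # same order of magnitude; the ratio tends to 1 (slowly)
```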

The set of primes does not appear to have enough usable structure in order to perform such counts quickly. However, one can count other sets of numbers between N and 2N with much more ease. For instance, the set of integers between N and 2N can be easily counted with small error:

| \{ n \in [N,2N]: n \hbox{ integer} \}| = N + O(1);

the error term O(1) in this case is in fact just 1. Similarly, we can count, say, the number of odd numbers between N and 2N,

| \{ n \in [N,2N]: n \hbox{ odd} \}| = \frac{1}{2} N + O(1),

simply because the set of odd numbers has density \frac{1}{2} and is periodic of period 2. The error term O(1) now depends on the parity of N. More generally, we can count any given residue class in [N,2N] to a reasonable accuracy:

| \{ n \in [N,2N]: n = a \mod q \}| = \frac{1}{q} N + O(1),

where the error term is now more complicated, and depends on what N is doing modulo q. This estimate is quite good as long as q is small compared with N, but once q is very large, the error term O(1) can begin to overwhelm the main term (especially if the main term is going to appear in a delicate summation with lots of cancellation). In general, any summation involving the main term N/q will be relatively easy to manipulate (because it is essentially multiplicative in q, and thus amenable to all the methods of multiplicative number theory, in particular Euler products and zeta functions); it is the error term O(1) which causes all the difficulty.
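
These residue class counts are easy to check empirically; the following throwaway computation (with arbitrary choices of N and of the pairs (q,a)) simply displays the O(1) errors.

```python
# Empirical illustration of |{ n in [N,2N] : n = a (mod q) }| = N/q + O(1):
# the discrepancy from N/q stays bounded (by 2, in fact) however q and a are chosen.
N = 10 ** 6                                           # arbitrary illustrative value
for q, a in [(2, 1), (3, 0), (97, 5), (1000, 123)]:   # arbitrary moduli and classes
    count = sum(1 for n in range(N, 2 * N + 1) if n % q == a)
    print(q, a, count, N / q, count - N / q)          # last column is the O(1) error
```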

Once we have figured out how to count these basic sets, we can also count some combinations of these sets, as long as these combinations are simple enough. For instance, suppose we want to count

| \{ n \in [N,2N]: n \hbox{ coprime to } 2,3 \}| (1).

Well, we know that the total number of integers in [N,2N] is N+O(1). Of this set, we know that \frac{1}{2} N + O(1) are not coprime to 2 (i.e. divisible by 2), and that \frac{1}{3} N + O(1) are not coprime to 3. So we should subtract those two sets from the original set, leaving \frac{1}{6}N + O(1). But the numbers which are divisible by both 2 and 3 (i.e. divisible by 6) have been subtracted twice, so we have to put them back in; this adds in another \frac{1}{6} N + O(1), giving a final count of \frac{1}{3} N + O(1) for the quantity (1); this is of course a simple instance of the inclusion-exclusion principle in action. An alternative way to estimate (1) is to use the Chinese remainder theorem to rewrite (1) as

|\{ n \in [N,2N]: n = 1,5 \mod 6 \}|

and use our ability to count residue classes modulo 6 to get the same final count of \frac{1}{3} N + O(1) (though the precise bound on the error term will be slightly different). For very small moduli such as 2 and 3, the Chinese remainder theorem is quite efficient, but it is somewhat rigid, and for higher moduli (e.g. for moduli much larger than \log N) it turns out that the more flexible inclusion-exclusion principle gives much better results (after applying some tricks to optimise the efficiency of that principle).
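
For concreteness, here is a quick computation (with an arbitrary N) that carries out both the inclusion-exclusion count and the Chinese remainder theorem count of (1), and confirms that both agree with \frac{1}{3} N up to O(1).

```python
# Three ways to arrive at the count (1) of integers in [N, 2N] coprime to 6,
# all agreeing with N/3 up to O(1).  The value of N is arbitrary.
from math import gcd

N = 10 ** 6

# Direct brute-force count.
exact = sum(1 for n in range(N, 2 * N + 1) if gcd(n, 6) == 1)

# Inclusion-exclusion: all integers, minus multiples of 2, minus multiples of 3,
# plus multiples of 6 (which were subtracted twice).
def multiples(d):
    return sum(1 for n in range(N, 2 * N + 1) if n % d == 0)

incl_excl = (N + 1) - multiples(2) - multiples(3) + multiples(6)

# Chinese remainder theorem viewpoint: the residue classes 1 and 5 mod 6.
crt = sum(1 for n in range(N, 2 * N + 1) if n % 6 in (1, 5))

print(exact, incl_excl, crt, N / 3)   # the first three agree; all are N/3 + O(1)
```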

We can of course continue the example in (1), counting the numbers in [N,2N] which are coprime to 2,3,5,7, etc., which by the sieve of Eratosthenes will eventually give us a count for the primes in [N,2N], but let us pause for a moment to look at the larger picture. We have seen that some sets in [N,2N] are fairly easy to count accurately (e.g. residue classes with small modulus), and others are not (e.g. primes, twin primes). What is the defining characteristic of the former types of sets? One reasonable answer is that the sets that are easy to count are low-complexity, but this is a rather vaguely defined term. I would like to propose instead that sets (or more generally, weight functions – see below) are easy to count (or at least estimate) whenever they are smooth in a certain sense to be made more precise shortly. This terminology comes from harmonic analysis rather than from number theory (though number theory does have the related concept of a smooth number), so I will now digress a little bit to talk about smoothness, as it seems to me that this concept implicitly underlies the basic strategy of sieve theory.

Instead of talking about the problem of (approximately) counting a given set in [N,2N], let us consider instead the analogous problem of (approximately) computing the area of a given region E (e.g. a solid ellipse) in the unit square {}[0,1]^2. As we are taught in high school, one way to do this is to subdivide the square into smaller squares, e.g. squares of side length 10^{-n} for some n, and count how many of these small squares lie completely or partially in the set E, and multiply by the area of each square; this is of course the prelude to the Riemann integral. It works well as long as the set E is “smooth” in the sense that most of the small squares are either completely inside or completely outside the set E, with few borderline cases; this notion of smoothness can also be viewed as a quantitative version of Lebesgue measurability. Another way of saying this is that if one wants to determine whether a given point (x,y) lies in E, it is usually enough just to compute x and y to the first n significant digits in the decimal expansion.

Now we return to counting sets in [N,2N]. One can also define the notion of a “smooth set” here by again using the most significant digits of the numbers n in the interval [N,2N]; for instance, the set [1.1 N, 1.2 N] would be quite smooth, as one would be fairly confident whether n would lie in this set or not after looking at just the top two or three significant digits. However, with this “Euclidean” or “Archimedean” notion of smoothness, sets such as the primes or the odd numbers are certainly not smooth. On the other hand, things look a lot better if we change the metric, or (more informally) if we redefine what “most significant digit” is. For instance, if we view the last digit in the base 10 expansion of a number n (i.e. the value of n \mod 10) as the most significant one, rather than the first – or more precisely, if we use the 10-adic metric instead of the Euclidean one, thus embedding the integers into {\Bbb Z}_{10} rather than into {\Bbb R} – then the odd numbers become quite smooth (the most significant digit completely determines membership in this set). The primes in [N,2N] are not fully smooth, but they do exhibit some partial smoothness; indeed, if the most significant digit is 0, 2, 4, 5, 6, or 8, this fully determines membership in the set (such n are divisible by 2 or 5, and hence composite), though if the most significant digit is 1, 3, 7, or 9 then one only has partial information on membership in the set.

Now, the 10-adic metric is not fully satisfactory for characterising the elusive concept of number-theoretic “smoothness”. For instance, the multiples of 3 should be a smooth set, but this is not the case in the 10-adic metric (one really needs all the digits before one can be sure whether a number is a multiple of 3!). Also, we have the problem that the set [N,2N] itself is now no longer smooth. This can be fixed by working not with just the Euclidean metric or a single n-adic metric, but with the product of all the n-adic metrics and the Euclidean metric at once. Actually, thanks to the Chinese remainder theorem, it is enough to work with the product of the p-adic metrics for primes p and the Euclidean metric, thus embedding the integers in the integer adele ring {\Bbb R} \times \prod_p {\Bbb Z}_p. For some strange reason, this adele ring is not explicitly used in most treatments of sieve theory, despite its obvious relevance (and despite the amply demonstrated usefulness of this ring in algebraic number theory or in the theory of L-functions, as exhibited for instance by Tate’s thesis). At any rate, we are only using the notion of “smoothness” in a very informal sense, and so we will not need the full formalism of the adeles here. Suffice it to say that a set of integers in [N,2N] is “smooth” if membership in that set can be largely determined by its most significant digits in the Euclidean sense, and also in the p-adic senses for small p; roughly speaking, this means that this set is approximately the pullback of some “low complexity” set in the adele ring – a set which can be efficiently fashioned out of a few of the basic sets which generate the topology and \sigma-algebra of that ring. (Actually, in many applications of sieve theory, we only need to deal with moduli q which are square-free, which means that we can replace the p-adics {\Bbb Z}_p with the cyclic group {\Bbb Z}/p{\Bbb Z}, and so it is now just the residues mod p for small p, together with the Euclidean most significant digits, which should control what smooth sets are; thus the adele ring has been replaced by the product {\Bbb R} \times \prod_p ({\Bbb Z}/p{\Bbb Z}).)

Let us now return to sieve theory, and the task of counting “rough” sets such as the primes in [N,2N]. Since we know how to accurately count “smooth” sets such as \{ n \in [N,2N]: n = a \mod q \} with q small, one can try to describe the rough set of primes as some sort of combination of smooth sets. The most direct implementation of this idea is the sieve of Eratosthenes; if one then tries to compute the number of primes using the inclusion-exclusion principle, one obtains the Legendre sieve; we implicitly used this idea previously when counting the quantity (1). However, the number of terms in the inclusion-exclusion formula is very large; if one runs the sieve of Eratosthenes for k steps (i.e. sieving out multiples of the first k primes), there are basically 2^k terms in the inclusion-exclusion formula, leading to an error term which in the worst case could be of size O(2^k). A related issue is that the modulus q in many of the terms in the Legendre sieve becomes quite large – as large as the product of the first k primes (which turns out to be roughly e^{k \log k} in size). Since the set one is trying to count is only of size N, we thus see that the Legendre sieve becomes useless after just \log N or so steps of the Eratosthenes sieve, which is well short of what one needs to accurately count primes (which requires about N^{1/2}/\log N steps). More generally, “exact” sieves such as the Legendre sieve are useful for any situation involving only a logarithmically small number of moduli, but are unsuitable for sieving with much larger numbers of moduli.

One can describe the early development of sieve theory as a concerted effort to rectify the drawbacks of the Legendre sieve. The first main idea here is to not try to compute the size of the rough set exactly – as this is too “expensive” in terms of the number of smooth sets required to fully describe the rough set – but instead to just settle for upper or lower bounds on the size of this set, which use fewer smooth sets. There is thus a tradeoff between how well the bounds approximate the original set, and how well one can compute the bounds themselves; by selecting various parameters appropriately one can optimise this tradeoff and obtain a final bound which is non-trivial but not completely exact. For instance, in using the Legendre sieve to try to count primes between N and 2N, one can instead use that sieve to count the much larger set of numbers between N and 2N which are coprime to the first k primes, thus giving an upper bound for the number of primes between N and 2N. It turns out that the optimal value of k here is roughly \log N or so (after this, the error terms in the Legendre sieve get out of hand), and gives an upper bound of O( N / \log \log N ) for the number of primes between N and 2N – somewhat far from the truth (which is \sim N/\log N), but still non-trivial.
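
To get a feel for the numbers involved, here is a small illustrative computation of the Legendre sieve, with an arbitrary N and with only the first seven primes used as sieving primes.

```python
# Legendre sieve illustration (arbitrary parameters): count the integers in
# [N, 2N] coprime to the first k primes, and compare with the main term
# N * prod(1 - 1/p).  Each of the 2^k inclusion-exclusion terms carries its own
# O(1) error, so the worst-case accumulated error is of size 2^k.
from itertools import combinations
from math import gcd, prod

N = 10 ** 6
first_primes = [2, 3, 5, 7, 11, 13, 17]       # k = 7 sieving primes
k = len(first_primes)

# Brute-force truth.
P = prod(first_primes)
exact = sum(1 for n in range(N, 2 * N + 1) if gcd(n, P) == 1)

# Keeping only the main term N/d of each of the 2^k terms (d ranging over the
# squarefree products of the sieving primes) gives exactly N * prod(1 - 1/p).
main_term = sum((-1) ** len(s) * N / prod(s)
                for r in range(k + 1) for s in combinations(first_primes, r))

print(exact, main_term, abs(exact - main_term), 2 ** k)
# The discrepancy is at most 2^k (and in practice much smaller); once k grows
# to about log N, this worst-case error starts to compete with the main term.
```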

In a similar spirit, one can work with various truncated and approximate versions of the inclusion-exclusion formula which involve fewer terms. For instance, to estimate the cardinality |\bigcup_{j=1}^k A_j| of the union of k sets, one can replace the inclusion-exclusion formula

|\bigcup_{j=1}^k A_j| = \sum_{j=1}^k |A_j| - \sum_{1 \leq j_1 < j_2 \leq k} |A_{j_1} \cap A_{j_2}| + \sum_{1 \leq j_1 < j_2 < j_3 \leq k} |A_{j_1}\cap A_{j_2} \cap A_{j_3}| - \ldots (2)

by the obvious upper bound

|\bigcup_{j=1}^k A_j| \leq \sum_{j=1}^k |A_j|

(also known as the union bound), or by the slightly less obvious lower bound

|\bigcup_{j=1}^k A_j| \geq \sum_{j=1}^k |A_j| - \sum_{1 \leq j_1 < j_2 \leq k} |A_{j_1} \cap A_{j_2}|.

More generally, if one takes the first n terms on the right-hand side of (2), this will be an upper bound for the left-hand side for odd n and a lower bound for even n. These inequalities, known as the Bonferroni inequalities, are a nice exercise to prove: they are equivalent to the observation that in the binomial identity

0 = (1-1)^m = \binom{m}{0} - \binom{m}{1} + \binom{m}{2} - \binom{m}{3} + \ldots + (-1)^m \binom{m}{m}

for any m \geq 1, the partial sums on the right-hand side alternate in sign between non-negative and non-positive. If one inserts these inequalities into the Legendre sieve and optimises the parameter, one can improve the upper bound for the number of primes in [N,2N] to O( N \log \log N / \log N ), which is significantly closer to the truth. Unfortunately, this method does not provide any lower bound other than the trivial bound of 0; either the main term is negative, or the error term swamps the main term. A similar argument was used by Brun to show that the number of twin primes in [N,2N] was O( N (\log \log N / \log N)^2 ) (again, the truth is conjectured to be \sim N / \log^2 N), which implied his famous theorem that the sum of reciprocals of the twin primes is convergent.
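
The Bonferroni inequalities are also easy to watch in action; the toy computation below (with randomly generated sets and arbitrary parameters) just displays the alternating over- and under-counts as more levels of (2) are included.

```python
# Bonferroni inequalities on random test sets: truncating the
# inclusion-exclusion expansion (2) after an odd number of levels
# overestimates the union, and after an even number underestimates it.
import random
from itertools import combinations

random.seed(0)                      # arbitrary seed, for reproducibility
universe = range(200)
k = 6
sets = [set(random.sample(universe, 60)) for _ in range(k)]   # arbitrary test sets

union_size = len(set().union(*sets))

partial = 0
for level in range(1, k + 1):
    term = sum(len(set.intersection(*combo))
               for combo in combinations(sets, level))
    partial += (-1) ** (level + 1) * term
    relation = ">=" if level % 2 == 1 else "<="
    print(f"first {level} levels: {partial} {relation} {union_size}")
# The final value (level = k) recovers the size of the union exactly.
```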

The full inclusion-exclusion expansion is a sum over 2^k terms, which one can view as binary strings of 0s and 1s of length k. In the Bonferroni inequalities, one only sums over a smaller collection of strings, namely the Hamming ball of strings which only involve n or fewer 1s. There are other collections of strings one can use which lead to upper or lower bounds; one can imagine revealing such a string one digit at a time and then deciding whether to keep or toss out this string once some threshold rule is reached. There are various ways to select these thresholding rules, leading to the family of combinatorial sieves. One particularly efficient such rule is similar to that given by the Bonferroni inequalities, but instead of using the number of 1s in a string to determine membership in the summation, one uses a weighted number of 1s (giving large primes more weight than small primes, because they tend to increase the modulus too quickly and thus should be removed from the sum sooner than the small primes). This leads to the beta sieve, which for instance gives the correct order of magnitude of O( N / \log N ) for the number of primes in [N,2N] or O( N / \log^2 N ) for the number of twin primes in [N,2N]. This sieve is also powerful enough to give lower bounds, but only if one stops the sieve somewhat early, thus enlarging the set of primes to a set of almost primes (numbers which are coprime to all numbers less than a certain threshold, and thus have a bounded number of prime factors). For instance, this sieve can show that there are an infinite number of twins n, n+2, each of which has at most nine prime factors (the nine is not optimal, but to get better results requires much more work).
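
To see concretely what “stopping the sieve somewhat early” buys us, here is a tiny experiment (with arbitrary choices of N and of the threshold z) exhibiting the almost primes that survive the truncated sieve and the bound on their number of prime factors.

```python
# "Stopping the sieve early": the survivors of sieving out the primes below a
# threshold z are exactly the integers with no prime factor below z, and hence
# each has at most log(2N)/log(z) prime factors.  Parameters are arbitrary.
from math import log

N = 10 ** 5
z = 10                            # sieve out the primes below z (here 2, 3, 5, 7)
limit = 2 * N

# Smallest prime factor of every n <= 2N, by a sieve.
spf = list(range(limit + 1))
for p in range(2, int(limit ** 0.5) + 1):
    if spf[p] == p:
        for m in range(p * p, limit + 1, p):
            if spf[m] == m:
                spf[m] = p

def num_prime_factors(n):         # counted with multiplicity
    count = 0
    while n > 1:
        n //= spf[n]
        count += 1
    return count

rough = [n for n in range(N, limit + 1) if spf[n] >= z]     # the surviving "almost primes"
print(len(rough), max(num_prime_factors(n) for n in rough), log(limit) / log(z))
# Every survivor has at most log(2N)/log(z) prime factors, all of them large.
```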

There seems however to be a limit as to what can be accomplished by purely combinatorial sieves. The problem stems from the “binary” viewpoint of such sieves: any given term in the inclusion-exclusion expansion is either included or excluded from the sieve upper or lower bound, and there is no middle ground. This leads to the next main idea in modern sieve theory, which is to work not with the cardinalities of sets in [N,2N], but rather with the more flexible notion of sums of weight functions (real-valued functions on [N,2N]). The starting point is the obvious formula

|A| = \sum_{n \in [N, 2N]} 1_A(n)

for the cardinality of a set A in [N,2N], where 1_A is the indicator function of the set A. Applying this to smooth sets such as \{ n \in [N,2N]: n = a \mod q\}, we obtain

\sum_{n \in [N, 2N]} 1_{n=a \mod q}(n) = \frac{N}{q} + O(1);

in particular, specialising to the 0 residue class a = 0 (which is the residue class of importance for counting primes) we have

\sum_{n \in [N,2N]} 1_{d|n}(n) = \frac{N}{d} + O(1)

for any d. Thus if we can obtain a pointwise upper bound on 1_A by a divisor sum (which is a number-theoretic analogue of a smooth function), say

1_A(n) \leq \sum_{d} c_d 1_{d|n}(n) (3)

for all n and some real constants c_d (which could be positive or negative), then on summing we obtain the upper bound

|A| \leq N \sum_{d} \frac{c_d}{d} + O( \sum_d |c_d| ). (4)

One can also hope to obtain lower bounds on |A| by a similar procedure (though in practice, lower bounds for primes have proven to be much more difficult to obtain, due to the parity problem which we will discuss below). These strategies are suited for the task of bounding the number of primes in [N,2N]; if one wants to do something fancier such as counting twin primes n, n+2, one has to either involve more residue classes (e.g. the class a = -2 will play a role in the twin prime problem) or else insert additional weights in the summation (e.g. weighting all summations in n by an additional factor of \Lambda(n+2), where \Lambda is the von Mangoldt function). To simplify the exposition let us just stick with the plainer problem of counting primes.
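
Here is a minimal instance of this weighted set-up, using purely illustrative choices: the crude Legendre-type weights c_d = \mu(d) supported on the divisors of 30, and an arbitrary N. The script checks the pointwise bound (3) and then evaluates the resulting bound (4), which is weak but genuine.

```python
# A minimal instance of the divisor-sum machinery (3)-(4), with the crude
# choice c_d = mu(d) for d dividing 30 = 2*3*5 (and c_d = 0 otherwise); these
# particular weights and the value of N are purely illustrative.
N = 10 ** 5
limit = 2 * N

# c_d = mu(d) on the divisors of 30.
weights = {1: 1, 2: -1, 3: -1, 5: -1, 6: 1, 10: 1, 15: 1, 30: -1}

def divisor_sum(n):
    # sum_d c_d 1_{d|n}(n); equals 1 if gcd(n, 30) = 1 and 0 otherwise
    return sum(c for d, c in weights.items() if n % d == 0)

# Primes up to 2N, for checking (3) and for comparison with the bound (4).
is_prime = [True] * (limit + 1)
is_prime[0] = is_prime[1] = False
for p in range(2, int(limit ** 0.5) + 1):
    if is_prime[p]:
        is_prime[p * p :: p] = [False] * len(is_prime[p * p :: p])

# The pointwise bound (3): every prime in [N, 2N] is coprime to 30.
assert all(divisor_sum(n) >= 1 for n in range(N, limit + 1) if is_prime[n])

# The resulting bound (4): main term N * sum_d c_d / d = (4/15) N,
# error term at most sum_d |c_d| = 8.
main_term = N * sum(c / d for d, c in weights.items())
print(main_term, sum(is_prime[N:]))   # a weak but genuine upper bound on the prime count
```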

These strategies generalise the combinatorial sieve strategy, which is a special case in which the constants c_d are restricted to be +1, 0, or -1. In practice, the sum \sum_{d} \frac{c_d}{d} in (4) is relatively easy to evaluate by multiplicative number theory techniques (the coefficients c_d, in applications, usually involve the Möbius function (not surprising, since they are encoding some sort of inclusion-exclusion principle) and are often related to the coefficients of a Hasse-Weil zeta function, as they basically count solutions modulo d to some set of algebraic equations), and the main task is to ensure that the error term in (4) does not swamp the main term. To do this, one basically needs the weights c_d to be concentrated on those d which are relatively small compared with N, for instance they might be restricted to some range d \leq R where the sieve level R = N^\theta is some small power of N. Thus for instance, starting with the identity

\Lambda(n) = \sum_{d|n} \mu(d) \log\frac{n}{d} = -\sum_{d|n} \mu(d) \log(d), (5)

which corresponds to the zeta-function identity

-\frac{\zeta'(s)}{\zeta(s)} = \zeta(s) \frac{d}{ds} \frac{1}{\zeta(s)},

where \Lambda is the von Mangoldt function and \mu is the Möbius function, we obtain the upper bound

1_A(n) \leq - \sum_{d \leq 2N} \mu(d) \frac{\log d}{\log N} 1_{d|n}

where A denotes the primes from N to 2N. This is already enough (together with the elementary asymptotic \sum_{d \leq 2N} \frac{\mu(d)}{d} \log d = O(1)) to obtain the weak prime number theorem |A| = O( N / \log N ), but unfortunately this method does not give a nontrivial lower bound for |A|. However, a variant of the method does give a nice asymptotic for P_2 almost primes – products of at most two (large) primes [e.g. primes larger than N^\epsilon for some fixed \epsilon > 0]. Indeed, if one introduces the second von Mangoldt function

\Lambda_2(n) = \sum_{d|n} \mu(d) \log^2(\frac{n}{d}) = \Lambda(n) \log n + \sum_{d|n} \Lambda(d) \Lambda(\frac{n}{d}) (6)

which is mostly supported on P_2 almost primes (indeed, \Lambda_2(p) = \log^2 p and \Lambda_2(pq) = 2\log p \log q for distinct primes p, q, and \Lambda_2 is mostly zero otherwise), and uses the elementary asymptotic

\sum_{d \leq N} \frac{\mu(d) }{d} \log^2 \frac{N}{d} = 2 \log N + O(1)

then one obtains the Selberg symmetry formula

\sum_{n \leq N} \Lambda_2(n) = 2 N \log N + O(N).

This formula (together with the weak prime number theorem mentioned earlier) easily implies a “P_2 almost prime number theorem”, namely that the number of P_2 almost primes less than N is (2+o(1)) \frac{N}{\log N}. [This fact is much easier to prove than the prime number theorem itself. In terms of zeta functions, the reason why the prime number theorem is difficult is that the simple pole of \frac{\zeta'(s)}{\zeta(s)} at s=1 could conceivably be counteracted by other simple poles on the line \hbox{Re}(s)=1. On the other hand, the P_2 almost prime number theorem is much easier because the effect of the double pole of \frac{\zeta''(s)}{\zeta(s)} at s=1 is not counteracted by the other poles on the line \hbox{Re}(s)=1, which are at most simple.]
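
The symmetry formula is also easy to test numerically (which of course says nothing about the asymptotics); the sketch below, with an arbitrary cutoff N, computes \Lambda_2 via the convolution form of (6) and exhibits a discrepancy of size O(N) from the main term 2N \log N.

```python
# Numerical check of the Selberg symmetry formula
#     sum_{n <= N} Lambda_2(n) = 2 N log N + O(N),
# computing Lambda_2 via the convolution form of the identity (6).
# The cutoff N is an arbitrary illustrative choice.
from math import log

N = 10 ** 5

# von Mangoldt function up to N, via a smallest-prime-factor sieve.
spf = list(range(N + 1))
for p in range(2, int(N ** 0.5) + 1):
    if spf[p] == p:
        for m in range(p * p, N + 1, p):
            if spf[m] == m:
                spf[m] = p

def von_mangoldt(n):
    if n < 2:
        return 0.0
    p = spf[n]
    while n % p == 0:
        n //= p
    return log(p) if n == 1 else 0.0          # log p on prime powers, 0 otherwise

Lam = [von_mangoldt(n) for n in range(N + 1)]

# The Dirichlet convolution Lambda * Lambda.
conv = [0.0] * (N + 1)
for d in range(2, N + 1):
    if Lam[d]:
        for m in range(2, N // d + 1):
            if Lam[m]:
                conv[d * m] += Lam[d] * Lam[m]

total = sum(Lam[n] * log(n) + conv[n] for n in range(2, N + 1))
print(total, 2 * N * log(N), total - 2 * N * log(N))
# The discrepancy is of size O(N), far smaller than the main term 2 N log N.
```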

The P_2 almost prime number theorem establishes the prime number theorem “up to a factor of 2”. It is surprisingly difficult to improve upon this factor of 2 by elementary methods, though once one can replace 2 by 2-\epsilon for some \epsilon > 0 (a fact which is roughly equivalent to the absence of zeroes of \zeta(s) on the line \hbox{Re}(s)=1), one can iterate the Selberg symmetry formula (together with the tautological fact that a P_2 almost prime is either a prime or the product of two primes) to get the prime number theorem; this is essentially the Erdős-Selberg elementary proof of that theorem.

One can obtain other divisor bounds of the form (3) by various tricks, for instance by modifying the weights in the above formulae (5) and (6). A surprisingly useful upper bound for the primes between N and 2N is obtained by the simple observation that

1_A(n) \leq (\sum_{d < N} \lambda_d 1_{d|n}(n))^2

whenever \lambda_d are arbitrary real numbers with \lambda_1 = 1, basically because the square of any real number is non-negative. This leads to the Selberg sieve, which suffices for many applications; for instance, it can prove the Brun-Titchmarsh inequality, which asserts that the number of primes between N and N+M is at most (2+o(1)) M / \log M, which is again off by a factor of 2 from the truth when N and M are reasonably comparable. There are also some useful lower bounds for the indicator function of the almost primes of divisor sum type, which can be used for instance to derive Chen’s theorem that there are infinitely many primes p such that p+2 is a P_2 almost prime, or the theorem that there are infinitely many P_2 almost primes of the form n^2+1.
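
Here is a quick numerical illustration of this quadratic (Selberg sieve) mechanism. The weights \lambda_d = \mu(d)(1 - \log d/\log R) used below are a simple non-optimised choice (Selberg's actual optimisation does somewhat better), and the parameters are arbitrary; the one rigorous point is that the pointwise bound, and hence the resulting upper bound, is valid for any weights with \lambda_1 = 1 supported on d \leq R < N.

```python
# Selberg-sieve-style upper bound.  For ANY real lambda_d with lambda_1 = 1,
# supported on d <= R < N, one has the pointwise bound
#     1_prime(n) <= ( sum_{d | n, d <= R} lambda_d )^2     for n in [N, 2N],
# since a prime in [N, 2N] has no divisor in (1, R].  The weights below,
# lambda_d = mu(d) * (1 - log d / log R), are a simple illustrative choice,
# not Selberg's optimised one.
from math import log

N = 10 ** 5                       # arbitrary illustrative value
R = int(N ** 0.5)                 # sieve level R = N^{1/2} (illustrative choice)

# Mobius function up to R.
mu = [1] * (R + 1)
composite = [False] * (R + 1)
for p in range(2, R + 1):
    if not composite[p]:
        for m in range(2 * p, R + 1, p):
            composite[m] = True
        for m in range(p, R + 1, p):
            mu[m] *= -1
        for m in range(p * p, R + 1, p * p):
            mu[m] = 0

lam = [0.0] * (R + 1)
lam[1] = 1.0
for d in range(2, R + 1):
    lam[d] = mu[d] * (1 - log(d) / log(R))

# w(n) = sum_{d | n, d <= R} lambda_d, accumulated for all n in [N, 2N].
w = [0.0] * (N + 1)               # index i corresponds to n = N + i
for d in range(1, R + 1):
    if lam[d]:
        start = ((N + d - 1) // d) * d            # first multiple of d in [N, 2N]
        for n in range(start, 2 * N + 1, d):
            w[n - N] += lam[d]

selberg_bound = sum(x * x for x in w)

# True prime count in [N, 2N], for comparison.
sieve = [True] * (2 * N + 1)
sieve[0] = sieve[1] = False
for p in range(2, int((2 * N) ** 0.5) + 1):
    if sieve[p]:
        sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
true_count = sum(sieve[N:])

print(selberg_bound, true_count, 2 * N / log(N))
# With these non-optimised weights the bound lands in the vicinity of 2N/log N,
# i.e. roughly twice the true count -- consistent with the factor-of-2 loss
# discussed later in the post.
```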

In summary, sieve theory methods can provide good upper bounds, lower bounds, and even asymptotics for almost primes, which lead to upper bounds for primes which tend to be off by a constant factor such as 2. Rather frustratingly, though, sieve methods have proven largely unable to count or even lower bound the primes themselves, thus leaving the twin prime conjecture (or the conjecture about infinitely many primes of the form n^2+1) still out of reach. The reason for this – the parity problem – was first clarified by Selberg. Roughly speaking, it asserts:

Parity problem. If A is a set whose elements are all products of an odd number of primes (or are all products of an even number of primes), then (without injecting additional ingredients), sieve theory is unable to provide non-trivial lower bounds on the size of A. Also, any upper bounds must be off from the truth by a factor of 2 or more.

Thus we can hope to count P_2 almost primes (because they can have either an odd or an even number of factors), or to count numbers which are the product of 6 or 7 primes (which can for instance be done by a sieve of Bombieri), but we cannot hope to use plain sieve theory to just count primes, or just count semiprimes (the product of exactly two primes).

To explain this problem, we introduce the Liouville function \lambda(n) (a close relative of the Möbius function), which is equal to +1 when n is the product of an even number of primes and -1 otherwise. Thus the parity problem applies whenever \lambda is identically +1 or identically -1 on the set A of interest.

The Liouville function oscillates quite randomly between +1 and -1. Indeed, the prime number theorem turns out to be equivalent to the assertion that \lambda is asymptotically of mean zero,

\sum_{n \leq N} \lambda(n) = o(N)

(a fact first observed by Landau), and if the Riemann hypothesis is true then we have a much better estimate

\sum_{n \leq N} \lambda(n) = O_\epsilon(N^{1/2+\epsilon}) for all \epsilon > 0.

Assuming the generalised Riemann hypothesis, we have a similar claim for residue classes:

\sum_{n \leq N} 1_{n = a \mod q} \lambda(n) = O_\epsilon(N^{1/2+\epsilon}) for all \epsilon > 0.
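
These cancellations are very visible numerically (though, again, numerics prove nothing about the asymptotics); a quick experiment along the following lines shows that the partial sums of \lambda, both over all n up to N and along an arithmetic progression, are dramatically smaller than N.

```python
# The Liouville function in numerical practice: its partial sums, both over all
# n and over an arithmetic progression, are tiny compared with the trivial
# bound N.  (All parameters here are arbitrary illustrative choices.)
N = 10 ** 6

# Smallest prime factor of every n <= N, by a sieve; lambda(n) = (-1)^Omega(n).
spf = list(range(N + 1))
for p in range(2, int(N ** 0.5) + 1):
    if spf[p] == p:
        for m in range(p * p, N + 1, p):
            if spf[m] == m:
                spf[m] = p

def liouville(n):
    omega = 0                     # number of prime factors, with multiplicity
    while n > 1:
        n //= spf[n]
        omega += 1
    return 1 if omega % 2 == 0 else -1

lam = [0] + [liouville(n) for n in range(1, N + 1)]

print(sum(lam[1:]), N ** 0.5)     # the full sum is o(N), empirically comparable to sqrt(N)
q, a = 7, 3                       # an arbitrary residue class
print(sum(lam[n] for n in range(1, N + 1) if n % q == a))   # also far smaller than N/q
```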

What these estimates basically mean is that the Liouville function is essentially orthogonal to all smooth sets, or all smooth functions. Since sieve theory attempts to estimate everything in terms of smooth sets and functions, it thus cannot eliminate an inherent ambiguity coming from the Liouville function. More concretely, let A be a set where \lambda is constant (e.g. \lambda is identically -1, which would be the case if A consisted of primes) and suppose we attempt to establish a lower bound for the size of this set A in, say, [N,2N] by setting up a divisor sum lower bound

1_A(n) \geq \sum_{d} c_d 1_{d|n}(n), (7)

where the divisors d are concentrated in d \leq R for some reasonably small sieve level R. If we sum in n we obtain a lower bound of the form

|A| \geq \sum_{d} c_d \frac{N}{d} + ... (8)

and we can hope that the main term \sum_{d} c_d \frac{N}{d} will be strictly positive and the error term of lesser order, thus giving a non-trivial lower bound on |A|. Unfortunately, if we multiply both sides of (7) by the non-negative weight 1 + \lambda(n), we obtain

0 \geq \sum_{d} c_d 1_{d|n}(n) (1 + \lambda(n))

since we are assuming \lambda to equal -1 on A. If we sum this in n, and use the fact that \lambda is essentially orthogonal to divisor sums, we obtain

0 \geq \sum_{d} c_d \frac{N}{d} + ...

which basically means that the bound (8) cannot improve upon the trivial bound |A| \geq 0. A similar argument using the weight 1 - \lambda(n) also shows that any upper bound on |A| obtained via sieve theory has to essentially be at least as large as 2|A|.

Despite this parity problem, there are a few results in which sieve theory, in conjunction with other methods, can be used to count primes. The first of these is the elementary proof of the prime number theorem alluded to earlier, using the multiplicative structure of the primes inside the almost primes. This method unfortunately does not seem to generalise well; for instance, the product of twin primes is not a twin almost prime. Other examples arise if one starts counting certain special two-parameter families of primes; for instance, Friedlander and Iwaniec showed that there are infinitely many primes of the form a^2 + b^4 by a lengthy argument which started with Vaughan’s identity, which is sort of like an exact sieve, but with a (non-smooth) error term which has the form of a bilinear sum, which captures correlation with the Liouville function. The main difficulty is to control this bilinear error term, which after a number of (non-trivial) arithmetic manipulations (in particular, factorising a^2 + b^4 over the Gaussian integers) reduces to understanding some correlations between the Möbius function and the Jacobi symbol, which is then achieved by a variety of number-theoretic tools. The method was then modified by Heath-Brown to also show infinitely many primes of the form a^3 + 2b^3. Related results for other cubic forms using similar methods have since been obtained by Heath-Brown and Moroz and by Helfgott (analogous claims for quadratic forms date back to Iwaniec). These methods all seem to require that the form be representable as a norm over some number field and so they do not seem as yet to yield a general procedure to resolve the parity problem.

The parity problem can also sometimes be overcome when there is an exceptional Siegel zero, which basically means that there is a quadratic character \chi(n) = \left( \frac{n}{q} \right) which correlates very strongly with the primes. Morally speaking, this means that the primes can be largely recovered from the P_2 almost primes as being those almost primes which are quadratic non-residues modulo the conductor q of \chi, and this additional information seems (in principle, at least) to overcome the parity problem obstacle (related to this is the fact that Siegel zeroes, if they exist, disprove GRH, and so the Liouville function is no longer as uniformly distributed on smooth sets as Selberg’s analysis assumed). For instance, Heath-Brown showed that if a Siegel zero existed, then there are infinitely many prime twins. Of course, assuming GRH, there are no Siegel zeroes, in which case these results would be technically vacuous; however, they do suggest that to break the parity barrier, we may assume without loss of generality that there are no Siegel zeroes.

Another known way to partially get around the parity problem is to combine precise asymptotics on almost primes (or on weight functions concentrated near the almost primes) with a lower bound on the number of primes, and then use combinatorial tools to parlay the lower bound on primes into lower bounds on prime patterns. For instance, suppose you could count the set

A := \{ n \in [N,2N]:  n, n+2, n+6 \in P_2\}

accurately (where P_2 is the set of P_2-almost primes), and also obtain sufficiently good lower bounds on the sets

A_1 := \{ n \in A: n \hbox{ prime} \}

A_2 := \{ n \in A: n+2 \hbox{ prime} \}

A_3 := \{ n \in A: n+6 \hbox{ prime} \},

and more precisely that one obtains

|A_1| + |A_2| + |A_3| > |A|.

(For comparison, the parity problem predicts that one cannot hope to do any better than showing that |A_1|, |A_2|, |A_3| \geq |A|/2, so the above inequality is not ruled out by the parity problem obstruction.)

Then, just from the pigeonhole principle, one deduces the existence of n \in [N,2N] such that at least two of n, n+2, n+6 are prime, thus yielding a pair of primes whose gap is at most 6. This naive approach does not quite work directly, but by carefully optimising the argument (for instance, replacing the condition n, n+2, n+6 \in P_2 with something more like n(n+2)(n+6) \in P_6), Goldston, Yildirim, and Pintz were able to show unconditionally that prime gaps in {}[N, 2N] could be as small as o(\log N), and could in fact be as small as 16 infinitely often if one assumes the Elliott-Halberstam conjecture.
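
The pigeonhole step itself can be illustrated with a crude toy model (this is emphatically not the actual Goldston-Yildirim-Pintz argument, which uses carefully optimised weights rather than a brute-force almost-primality condition); with the arbitrary parameters below one can check that the three prime-counting subsets are collectively larger than the ambient set, and the pigeonhole principle then produces many pairs of primes differing by at most 6.

```python
# Toy illustration of the pigeonhole step only (this is NOT the actual
# Goldston-Yildirim-Pintz argument, which uses carefully optimised weights):
# among n in [N, 2N] with n, n+2, n+6 all "rough" (no prime factor below z),
# count how often each of the three is genuinely prime.
N = 10 ** 5                       # arbitrary toy parameters
z = 50

limit = 2 * N + 6
sieve = [True] * (limit + 1)
sieve[0] = sieve[1] = False
for p in range(2, int(limit ** 0.5) + 1):
    if sieve[p]:
        sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
small_primes = [p for p in range(2, z) if sieve[p]]

def rough(n):                     # no prime factor below z
    return all(n % p for p in small_primes)

A  = [n for n in range(N, 2 * N + 1) if rough(n) and rough(n + 2) and rough(n + 6)]
A1 = [n for n in A if sieve[n]]
A2 = [n for n in A if sieve[n + 2]]
A3 = [n for n in A if sieve[n + 6]]

total = len(A1) + len(A2) + len(A3)
print(len(A), total)
if total > len(A):
    # Pigeonhole: some n in A has at least two of n, n+2, n+6 prime,
    # i.e. there is a pair of primes differing by at most 6.
    close_pairs = [n for n in A if sieve[n] + sieve[n + 2] + sieve[n + 6] >= 2]
    print(len(close_pairs), close_pairs[:5])
```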

In a somewhat similar spirit, my result with Ben Green establishing that the primes contain arbitrarily long progressions proceeds by first using sieve theory methods to show that the almost primes (or more precisely, a suitable weight function \nu concentrated near the almost primes) are very pseudorandomly distributed, in the sense that several self-correlations of \nu can be computed and agree closely with what one would have predicted if the almost primes were distributed randomly (after accounting for some irregularities caused by small moduli). Because of the parity problem, the primes themselves are not known to be as pseudorandomly distributed as the almost primes; however, the prime number theorem does at least tell us that the primes have a positive relative density in the almost primes. The main task is then to show that any set of positive relative density in a sufficiently pseudorandom set contains arithmetic progressions of any specified length; this combinatorial result (a “relative Szemerédi theorem”) plays roughly the same role that the pigeonhole principle did in the work of Goldston-Yildirim-Pintz. (On the other hand, the relative Szemerédi theorem works even for arbitrarily low density, whereas the pigeonhole principle does not; because of this, our sieve theory analysis is far less delicate than that in Goldston-Yildirim-Pintz.)

It is probably premature, with our current understanding, to try to find a systematic way to get around the parity problem in general, but it seems likely that we will be able to find some further ways to get around the parity problem in special cases, and perhaps once we have assembled enough of these special cases, it will become clearer what to do in general.

[Update, June 6: definition of almost prime modified, to specify that all prime factors are large. Reference to profinite integers deleted due to conflict with established notation.]