The twin prime conjecture is one of the oldest unsolved problems in analytic number theory. There are several reasons why this conjecture remains out of reach of current techniques, but the most important obstacle is the parity problem which prevents purely sieve-theoretic methods (or many other popular methods in analytic number theory, such as the circle method) from detecting pairs of prime twins in a way that can distinguish them from other twins of almost primes. The parity problem is discussed in these previous blog posts; this obstruction is ultimately powered by the Möbius pseudorandomness principle that asserts that the Möbius function is asymptotically orthogonal to all “structured” functions (and in particular, to the weight functions constructed from sieve theory methods).
However, there is an intriguing “alternate universe” in which the Möbius function is strongly correlated with some structured functions, and specifically with some Dirichlet characters, leading to the existence of the infamous “Siegel zero”. In this scenario, the parity problem obstruction disappears, and it becomes possible, in principle, to attack problems such as the twin prime conjecture. In particular, we have the following result of Heath-Brown:
Theorem 1 At least one of the following two statements is true:
- (Twin prime conjecture) There are infinitely many primes $p$ such that $p+2$ is also prime.
- (No Siegel zeroes) There exists a constant $c > 0$ such that for every real Dirichlet character $\chi$ of conductor $q > 1$, the associated Dirichlet $L$-function $s \mapsto L(s,\chi)$ has no zeroes in the interval $[1 - \frac{c}{\log q}, 1]$.
Informally, this result asserts that if one had an infinite sequence of Siegel zeroes, one could use this to generate infinitely many twin primes. See this survey of Friedlander and Iwaniec for more on this “illusory” or “ghostly” parallel universe in analytic number theory that should not actually exist, but is surprisingly self-consistent and has to date proven impossible to banish from the realm of possibility.
The usual starting point for detecting twin primes is to try to lower bound the sum

$$\sum_{x \leq n \leq 2x} \Lambda(n) \Lambda(n+2)$$

for some large value of $x$, where $\Lambda$ is the von Mangoldt function. Actually, in this post we will work with the slight variant

$$\sum_{x \leq n \leq 2x} \Lambda_2(n(n+2)) \nu(n(n+2)) \qquad (1)$$

where

$$\Lambda_2(n) = (\mu * L^2)(n) = \sum_{d|n} \mu(d) \log^2 \frac{n}{d}$$

is the second von Mangoldt function (with $L(n) := \log n$), $*$ denotes Dirichlet convolution, and $\nu$ is an (unsquared) Selberg sieve that damps out small prime factors. This sum also detects twin primes, but will lead to slightly simpler computations. For technical reasons we will also smooth out the interval $x \leq n \leq 2x$ and remove very small primes from $n$, but we will skip over these steps for the purpose of this informal discussion. (In Heath-Brown’s original paper, the Selberg sieve is essentially replaced by the more combinatorial restriction $(n(n+2), q^{1/C}\#) = 1$ for some large $C$, where $q^{1/C}\#$ is the primorial of $q^{1/C}$, but I found the computations to be slightly easier if one works with a Selberg sieve, particularly if the sieve is not squared to make it nonnegative.)
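To make the two twin-prime detectors concrete, here is a minimal Python sketch that evaluates $\Lambda$ and $\Lambda_2 = \mu * L^2$ by brute force. It omits the sieve weight $\nu$ and the smoothing entirely, and all function names are hypothetical choices made for this illustration. Without the sieve, $\Lambda_2(n(n+2))$ is supported on those $n$ for which $n(n+2)$ has at most two distinct prime factors, so it also picks up prime powers.

```python
from math import isqrt, log

def prime_factors(n):
    """Prime factorisation of n as a dict {prime: exponent} (trial division)."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def mobius(n):
    """Moebius function mu(n)."""
    f = prime_factors(n)
    if any(e > 1 for e in f.values()):
        return 0
    return -1 if len(f) % 2 else 1

def divisors(n):
    """All positive divisors of n (unsorted)."""
    ds = []
    for d in range(1, isqrt(n) + 1):
        if n % d == 0:
            ds.append(d)
            if d != n // d:
                ds.append(n // d)
    return ds

def von_mangoldt(n):
    """Lambda(n) = log p if n is a power of the prime p, else 0."""
    f = prime_factors(n)
    return log(next(iter(f))) if len(f) == 1 else 0.0

def lambda2(n):
    """Second von Mangoldt function: sum over d | n of mu(d) * log(n/d)^2."""
    return sum(mobius(d) * log(n // d) ** 2 for d in divisors(n))

def twin_sum(x):
    """Sum of Lambda(n) * Lambda(n+2) over x <= n <= 2x."""
    return sum(von_mangoldt(n) * von_mangoldt(n + 2) for n in range(x, 2 * x + 1))

def variant_sum(x):
    """Sum of Lambda_2(n(n+2)) over x <= n <= 2x, with NO sieve weight."""
    return sum(lambda2(n * (n + 2)) for n in range(x, 2 * x + 1))
```

For a twin prime pair $n, n+2$ one has $\Lambda_2(n(n+2)) = 2\Lambda(n)\Lambda(n+2)$, so comparing `variant_sum(x)` with `2 * twin_sum(x)` for moderate `x` shows how much of the unsieved variant comes from prime powers rather than from genuine prime pairs.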
If there is a Siegel zero $L(\beta,\chi) = 0$ with $\beta$ close to $1$ and $\chi$ a Dirichlet character of conductor $q$, then multiplicative number theory methods can be used to show that the Möbius function $\mu$ “pretends” to be like the character $\chi$ in the sense that $\mu(p) \approx \chi(p)$ for “most” primes $p$ near $q$ (e.g. in the range $q^\varepsilon \leq p \leq q^C$ for some small $\varepsilon > 0$ and large $C > 0$). Traditionally, one uses complex-analytic methods to demonstrate this, but one can also use elementary multiplicative number theory methods to establish these results (qualitatively at least), as will be shown below the fold.
The fact that $\mu$ pretends to be like $\chi$ can be used to construct a tractable approximation (after inserting the sieve weight $\nu$) in the range $[x,2x]$ (where $x = q^C$ for some large $C$) for the second von Mangoldt function $\Lambda_2$, namely the function

$$\tilde \Lambda_2(n) := (\chi * L)(n) = \sum_{d|n} \chi(d) \log \frac{n}{d}.$$

Roughly speaking, we think of the periodic function $\chi$ and the slowly varying function $\log$ as being of about the same “complexity” as the constant function $1$, so that $\tilde \Lambda_2$ is roughly of the same “complexity” as the divisor function

$$\tau(n) := (1*1)(n) = \sum_{d|n} 1,$$

which is considerably simpler to obtain asymptotics for than the von Mangoldt function, as the Möbius function is no longer present. (For instance, note from the Dirichlet hyperbola method that one can estimate $\sum_{x \leq n \leq 2x} \tau(n)$ to accuracy $O(\sqrt{x})$ with little difficulty, whereas to obtain a comparable level of accuracy for $\sum_{x \leq n \leq 2x} \Lambda(n)$ or $\sum_{x \leq n \leq 2x} \Lambda_2(n)$ is essentially the Riemann hypothesis.)
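The remark about the Dirichlet hyperbola method can be checked numerically. The sketch below (function names are hypothetical, and the classical main term $x \log x + (2\gamma - 1)x$ is quoted as a standard fact rather than taken from this post) computes $\sum_{n \leq x} \tau(n)$ in $O(\sqrt{x})$ steps by counting lattice points under the hyperbola $ab \leq x$, using the symmetry across the diagonal.

```python
from math import isqrt, log

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def divisor_sum_naive(x):
    """Sum of tau(n) for n <= x, as the number of pairs (d, m) with d*m <= x."""
    return sum(x // d for d in range(1, x + 1))

def divisor_sum_hyperbola(x):
    """Same sum in O(sqrt(x)) steps.

    Pairs (a, b) with a*b <= x are counted twice over a <= sqrt(x)
    (once for each role of the small factor), and the square
    a, b <= sqrt(x) that was double-counted is subtracted.
    """
    s = isqrt(x)
    return 2 * sum(x // d for d in range(1, s + 1)) - s * s

def divisor_sum_main_term(x):
    """Classical main term x*log(x) + (2*gamma - 1)*x."""
    return x * log(x) + (2 * EULER_GAMMA - 1) * x

def divisor_sum_on_interval(x):
    """Sum of tau(n) over x <= n <= 2x."""
    return divisor_sum_hyperbola(2 * x) - divisor_sum_hyperbola(x - 1)
```

Comparing `divisor_sum_hyperbola(x)` with `divisor_sum_main_term(x)` for growing `x` gives a discrepancy well within the $O(\sqrt{x})$ accuracy mentioned above; no comparably cheap computation is available for the sums involving $\Lambda$ or $\Lambda_2$.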
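One expects $\tilde \Lambda_2(n)$ to be a good approximant to $\Lambda_2(n)$ if $n$ is of size $O(x)$ and has no prime factors less than $q^{1/C}$ for some large constant $C$. The Selberg sieve $\nu$ will be mostly supported on numbers with no prime factor less than $q^{1/C}$. As such, one can hope to approximate (1) by the expression

$$\sum_{x \leq n \leq 2x} \tilde \Lambda_2(n(n+2)) \nu(n(n+2)); \qquad (2)$$

as it turns out, the error between this expression and (1) is easily controlled by sieve-theoretic techniques. Let us ignore the Selberg sieve for now and focus on the slightly simpler sum

$$\sum_{x \leq n \leq 2x} \tilde \Lambda_2(n(n+2)).$$

As discussed above, this sum should be thought of as a slightly more complicated version of the sum

$$\sum_{x \leq n \leq 2x} \tau(n(n+2)), \qquad (3)$$

where $\tau$ is the divisor function.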
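Accordingly, let us look (somewhat informally) at the task of estimating the model sum (3). One can think of this problem as basically that of counting solutions to the equation $ab+2=cd$ with $a,b,c,d$ in various ranges; this is clearly related to understanding the equidistribution of the hyperbola $\{ (a,b) \in (\mathbf{Z}/d\mathbf{Z})^2: ab + 2 = 0 \hbox{ mod } d \}$ in $(\mathbf{Z}/d\mathbf{Z})^2$. Taking Fourier transforms, the latter problem is closely related to estimation of the Kloosterman sums

$$\sum_{m \in (\mathbf{Z}/r\mathbf{Z})^\times} e\left( \frac{a_1 m + a_2 \overline{m}}{r} \right)$$

where $\overline{m}$ denotes the inverse of $m$ in $(\mathbf{Z}/r\mathbf{Z})^\times$ and $e(\theta) := e^{2\pi i \theta}$. One can then use the Weil bound

$$\sum_{m \in (\mathbf{Z}/r\mathbf{Z})^\times} e\left( \frac{am+b\overline{m}}{r} \right) \ll r^{1/2 + o(1)} (a,b,r)^{1/2} \qquad (4)$$

where $(a,b,r)$ is the greatest common divisor of $a,b,r$ (with the convention that this is equal to $r$ if $a,b$ vanish), and the $o(1)$ decays to zero as $r \rightarrow \infty$. The Weil bound yields good enough control on error terms to estimate (3), and as it turns out the same method also works to estimate (2) (provided that $x=q^C$ with $C$ large enough).

Actually one does not need the full strength of the Weil bound here; any power savings over the trivial bound of $r$ will do. In particular, it will suffice to use the weaker, but easier to prove, bounds of Kloosterman:

Lemma 2 (Kloosterman bound) One has

$$\sum_{m \in (\mathbf{Z}/r\mathbf{Z})^\times} e\left( \frac{am+b\overline{m}}{r} \right) \ll r^{3/4 + o(1)} (a,b,r)^{1/4} \qquad (5)$$

whenever $r \geq 1$ and $a,b \in \mathbf{Z}/r\mathbf{Z}$, where the $o(1)$ is with respect to the limit $r \rightarrow \infty$ (and is uniform in $a,b$).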
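Proof: Observe from change of variables that the Kloosterman sum $\sum_{m \in (\mathbf{Z}/r\mathbf{Z})^\times} e( \frac{am+b\overline{m}}{r} )$ is unchanged if one replaces $(a,b)$ with $(\lambda a, \lambda^{-1} b)$ for $\lambda \in (\mathbf{Z}/r\mathbf{Z})^\times$. For fixed $a,b$, the number of such pairs $(\lambda a, \lambda^{-1} b)$ is at least $r^{1-o(1)} / (a,b,r)$, thanks to the divisor bound. Thus it will suffice to establish the fourth moment bound

$$\sum_{a,b \in \mathbf{Z}/r\mathbf{Z}} \left|\sum_{m \in (\mathbf{Z}/r\mathbf{Z})^\times} e\left( \frac{am+b\overline{m}}{r} \right)\right|^4 \ll r^{4+o(1)}.$$

The left-hand side can be rearranged as

$$\sum_{m_1,m_2,m_3,m_4 \in (\mathbf{Z}/r\mathbf{Z})^\times} \sum_{a,b \in \mathbf{Z}/r\mathbf{Z}} e\left( \frac{a(m_1+m_2-m_3-m_4) + b(\overline{m_1}+\overline{m_2}-\overline{m_3}-\overline{m_4})}{r} \right)$$

which by Fourier summation is equal to

$$r^2 \, \# \{ (m_1,m_2,m_3,m_4) \in ((\mathbf{Z}/r\mathbf{Z})^\times)^4: m_1+m_2-m_3-m_4 = \overline{m_1}+\overline{m_2}-\overline{m_3}-\overline{m_4} = 0 \hbox{ mod } r \}.$$

Observe from the quadratic formula and the divisor bound that each pair $(x,y) \in (\mathbf{Z}/r\mathbf{Z})^2$ has at most $O(r^{o(1)})$ solutions $(m_1,m_2)$ to the system of equations $m_1+m_2=x$, $\overline{m_1} + \overline{m_2} = y$. Hence the number of quadruples $(m_1,m_2,m_3,m_4)$ of the desired form is $r^{2+o(1)}$, and the claim follows.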
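The two facts driving this proof can be sanity-checked by brute force for small moduli: the Kloosterman sum is invariant under $(a,b) \mapsto (\lambda a, \lambda^{-1} b)$, and the fourth moment over all $(a,b)$ equals $r^2$ times the number of quadruples of units with matching sums and matching inverse sums. The following is a minimal sketch with hypothetical function names. It assumes $r \geq 2$ and Python 3.8+ (for modular inverses via `pow(m, -1, r)`). The final check in the usage note relies on the classical Weil bound $|S| \leq 2\sqrt{p}$ for prime $p$ and $ab \neq 0$, which is quoted as a standard fact.

```python
import cmath
from itertools import product
from math import gcd

def units(r):
    """Invertible residues modulo r (for r >= 2)."""
    return [m for m in range(1, r) if gcd(m, r) == 1]

def kloosterman(a, b, r):
    """Kloosterman sum: sum of e((a*m + b*m^{-1}) / r) over units m mod r."""
    total = 0j
    for m in units(r):
        phase = (a * m + b * pow(m, -1, r)) % r
        total += cmath.exp(2j * cmath.pi * phase / r)
    return total

def fourth_moment(r):
    """Sum of |S(a, b; r)|^4 over all residues a, b mod r."""
    return sum(abs(kloosterman(a, b, r)) ** 4
               for a, b in product(range(r), repeat=2))

def count_quadruples(r):
    """Quadruples of units with m1+m2 = m3+m4 and matching sums of inverses."""
    u = units(r)
    inv = {m: pow(m, -1, r) for m in u}
    return sum(1 for m1, m2, m3, m4 in product(u, repeat=4)
               if (m1 + m2 - m3 - m4) % r == 0
               and (inv[m1] + inv[m2] - inv[m3] - inv[m4]) % r == 0)
```

For instance, `fourth_moment(7)` agrees with `49 * count_quadruples(7)` up to floating-point error, and for a prime such as $p = 31$ every `abs(kloosterman(a, b, 31))` with $a, b$ nonzero stays below $2\sqrt{31}$.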
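We will also need another easy case of the Weil bound to handle some other portions of (2):

Lemma 3 (Easy Weil bound) Let $\chi$ be a primitive real Dirichlet character of conductor $q$, and let $a,b,c,d \in \mathbf{Z}/q\mathbf{Z}$. Then

$$\sum_{n \in \mathbf{Z}/q\mathbf{Z}} \chi(an+b) \chi(cn+d) \ll q^{o(1)} (ad-bc, q).$$

Proof: As $q$ is the conductor of a primitive real Dirichlet character, $q$ is equal to $2^j$ times a squarefree odd number for some $j \leq 3$. By the Chinese remainder theorem, it thus suffices to establish the claim when $q$ is an odd prime. We may assume that $ad-bc$ is not divisible by this prime $q$, as the claim is trivial otherwise. If $a$ vanishes then $c$ does not vanish, and the claim follows from the mean zero nature of $\chi$; similarly if $c$ vanishes. Hence we may assume that $a,c$ do not vanish, and then we can normalise them to equal $1$. By completing the square it now suffices to show that

$$\sum_{n \in \mathbf{Z}/q\mathbf{Z}} \chi( n^2 - b ) \ll 1$$

whenever $b \neq 0 \hbox{ mod } q$. As $1+\chi$ is $2$ on the quadratic residues and $0$ on the non-residues, it now suffices to show that

$$\# \{ (m,n) \in (\mathbf{Z}/q\mathbf{Z})^2: n^2 - b = m^2 \} = q + O(1).$$

But by making the change of variables $(x,y) = (n+m,n-m)$, the left-hand side becomes $\# \{ (x,y) \in (\mathbf{Z}/q\mathbf{Z})^2: xy=b\} = q-1$, and the claim follows.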
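The prime case of this argument is easy to test numerically. The sketch below takes $\chi$ to be the Legendre symbol modulo an odd prime $p$ (an assumption made purely for illustration; the function names are hypothetical) and checks three steps. The hyperbola $xy = b$ has exactly $p - 1$ points. The completed-square sum $\sum_n \chi(n^2 - b)$ therefore equals $-1$ for $b \neq 0$. The general two-factor sum stays bounded when $ad - bc \neq 0$.

```python
def legendre(a, p):
    """Legendre symbol (a | p) for an odd prime p, via Euler's criterion."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

def shifted_square_sum(b, p):
    """Sum of chi(n^2 - b) over n mod p."""
    return sum(legendre(n * n - b, p) for n in range(p))

def hyperbola_count(b, p):
    """Number of (x, y) mod p with x*y = b."""
    return sum(1 for x in range(p) for y in range(p) if (x * y - b) % p == 0)

def two_factor_sum(a, b, c, d, p):
    """Sum of chi(a*n + b) * chi(c*n + d) over n mod p."""
    return sum(legendre(a * n + b, p) * legendre(c * n + d, p) for n in range(p))
```

Since the number of solutions to $n^2 - b = m^2$ is $\sum_n (1 + \chi(n^2 - b))$, the count `hyperbola_count(b, p) == p - 1` forces `shifted_square_sum(b, p) == -1`, which is the $O(1)$ bound used in the proof.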
While the basic strategy of Heath-Brown’s argument is relatively straightforward, implementing it requires a large amount of computation to control both main terms and error terms. I experimented for a while with rearranging the argument to try to reduce the amount of computation; I did not fully succeed in arriving at a satisfactorily minimal amount of superfluous calculation, but I was able to at least reduce this amount a bit, mostly by replacing a combinatorial sieve with a Selberg-type sieve (which was not needed to be positive, so I dispensed with the squaring aspect of the Selberg sieve to simplify the calculations a little further; also for minor reasons it was convenient to retain a tiny portion of the combinatorial sieve to eliminate extremely small primes). Also some modest reductions in complexity can be obtained by using the second von Mangoldt function $\Lambda_2(n(n+2))$ in place of $\Lambda(n) \Lambda(n+2)$. These exercises were primarily for my own benefit, but I am placing them here in case they are of interest to some other readers.
— 1. Consequences of a Siegel zero —
It is convenient to phrase Heath-Brown’s theorem in the following equivalent form:
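Theorem 4 Suppose one has a sequence $\chi_n$ of real Dirichlet characters of conductor $q_n$ going to infinity, and a sequence $\beta_n$ of real zeroes $L(\beta_n,\chi_n)=0$ with $\beta_n = 1 - o(\frac{1}{\log q_n})$ as $n \rightarrow \infty$. Then there are infinitely many prime twins.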
Henceforth, we omit the dependence on $n$ from all of our quantities (unless they are explicitly declared to be “fixed”), and the asymptotic notation $X = O(Y)$, $X = o(Y)$, $X \ll Y$, etc. will always be understood to be with respect to the $n$ parameter, e.g. $X \ll Y$ means that $|X| \leq CY$ for some fixed $C$. (In the language of this previous blog post, we are thus implicitly using “cheap nonstandard analysis”, although we will not explicitly use nonstandard analysis notation (other than the asymptotic notation mentioned above) further in this post.) With this convention, we now have a single (but not fixed) Dirichlet character $\chi$ of some conductor $q$ with a Siegel zero
which can be proven by elementary means (see e.g. Exercise 57 of this post), although one can use Siegel’s theorem to obtain the better bound . Standard arguments (see also Lemma 59 of this blog post) then give
We now use this Siegel zero to show that $\mu$ pretends to be like $\chi$ for primes that are comparable (in log-scale) to $q$:
For more precise estimates on the error, see the paper of Heath-Brown (particularly Lemma 3).
Proof: It suffices to show, for sufficiently large fixed , that
for each fixed natural number .
for some large (which we will eventually take to be a power of ); we will exploit the fact that this sum is very stable for comparable to in log-scale. By the Dirichlet hyperbola method, we can write this as
Since , one can show through summation by parts (see Lemma 71 of this previous post) that
for any , while from the integral test (see Lemma 2 of this previous post) we have
We can thus estimate (9) as
From summation by parts we again have
and we have the crude bound
so by using (7) and we arrive at
On the other hand, observe that is always non-negative, and that whenever and , with primes with . Since any number with has at most representations of the form with and , and no outside of the range has such a representation, we thus see that
Comparing this with (10), we conclude that
since , the claim follows.
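The non-negativity step in the proof above can be illustrated numerically, on the assumption that the non-negative function in question is the Dirichlet convolution $1 * \chi$ of a real character (the symbol did not survive in the text above, so this reading is an assumption). In the sketch below $\chi$ is the Legendre symbol modulo $7$, chosen only as a convenient example of a real character, and the function names are hypothetical. By multiplicativity, $(1*\chi)(n)$ is a product over prime powers $p^k \| n$ of $1 + \chi(p) + \dots + \chi(p)^k \geq 0$, and it equals $2^k$ when $n$ is a product of $k$ distinct primes with $\chi(p) = 1$.

```python
def legendre(a, p):
    """Legendre symbol (a | p) for an odd prime p, via Euler's criterion."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

def one_conv_chi(n, p):
    """(1 * chi)(n): the sum of chi(d) over divisors d of n, chi = (. | p)."""
    return sum(legendre(d, p) for d in range(1, n + 1) if n % d == 0)

def min_value(limit, p):
    """Smallest value of (1 * chi)(n) over 1 <= n <= limit."""
    return min(one_conv_chi(n, p) for n in range(1, limit + 1))
```

Here `min_value(500, 7)` is never negative, while `one_conv_chi(2 * 11 * 23, 7)` is $2^3 = 8$ because $2$, $11$ and $23$ are all quadratic residues modulo $7$; this is the mechanism by which many primes with $\chi(p) = 1$ would force the sum of $1 * \chi$ to be large.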
— 2. Main argument —
We let be a large absolute constant ( will do) and set to be the primorial of . Set for some large fixed (large compared to or ). Let be a smooth non-negative function supported on and equal to at . Set
Thus is a smooth cutoff to the region , and is a smooth cutoff to the region . It will suffice to establish the lower bound
because the non-twin primes contribute at most to the left-hand side. The weight is an unsquared Selberg sieve designed to damp out those for which or have somewhat small prime factors; we did not square this weight as is customary with the Selberg sieve in order to simplify the calculations slightly (the fact that the weight can be negative sometimes will not be a serious concern for us).
Thus is non-negative, and supported on those products of primes with and . Convolving (11) by and using the identity , we have
the intuition here is that Lemma 5 is showing that is “sparse” and so the contribution of should be relatively small
We begin with (13). Let be a small fixed quantity to be chosen later. Observe that if is non-zero, then must have a factor on which is non-zero, which implies that is either divisible by a prime with , or by the square of a prime. If the former case occurs, then either or is divisible by ; since , this implies that either is divisible by a prime with , or that is divisible by a prime less than . To summarise, at least one of the following three statements must hold:
- is divisible by a prime .
- is divisible by the square of a prime .
- is divisible by a prime with .
as the claim then follows by summing and sending slowly to zero.
and the claim follows.
Next we turn to (14). We can very crudely bound
for all .
We use a modification of the argument used to prove Proposition 4.2 of this Polymath8b paper. By Fourier inversion, we may write
for some rapidly decreasing function , so that
and hence by the triangle inequality
for any fixed . Since , we can thus (after substituting ) bound the left-hand side of (18) by
for any and .
We factor where are primes, and then write where and is the largest index for which . Clearly and with , and the least prime factor of is such that
we have on the support of , and so
and thus . Clearly we have
We write , where denotes the number of prime factors of counting multiplicity. We can thus bound the left-hand side of (19) by
We may replace the weight with a restriction of to the interval . The constraint removes two residue classes modulo every odd prime less than , while the constraint restricts to residue classes modulo . Standard sieve theory then gives
and so we are reduced to showing that
Factoring , we can bound the left-hand side by
which (for large enough) is bounded by
which by Mertens’ theorem is bounded by
and the claim follows.
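The sieve heuristic used in this step, namely that requiring $n(n+2)$ to be coprime to a primorial removes two residue classes modulo every odd prime and one class modulo $2$, can be verified exactly over one period, and the resulting density can be compared against Mertens’ theorem. This is a minimal sketch with hypothetical names. It takes the primorial over primes $p \leq w$ (the text uses primes less than a threshold, so the endpoint convention is an assumption), and the Mertens asymptotic $\prod_{p \leq w} (1 - 1/p) \sim e^{-\gamma}/\log w$ is quoted as a standard fact.

```python
from math import exp, gcd, log, prod

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def primes_up_to(w):
    """Primes p <= w by the sieve of Eratosthenes."""
    is_prime = [True] * (w + 1)
    is_prime[0:2] = [False, False]
    for p in range(2, int(w ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, w + 1, p):
                is_prime[multiple] = False
    return [p for p in range(2, w + 1) if is_prime[p]]

def primorial(w):
    """Product of all primes p <= w."""
    return prod(primes_up_to(w))

def admissible_count(w):
    """Residues n mod primorial(w) with n(n+2) coprime to primorial(w)."""
    P = primorial(w)
    return sum(1 for n in range(P) if gcd(n * (n + 2), P) == 1)

def predicted_count(w):
    """One class mod 2 and p - 2 classes mod each odd prime p <= w."""
    return prod(p - 2 for p in primes_up_to(w) if p > 2)

def mertens_ratio(w):
    """Product of (1 - 1/p) over p <= w, divided by exp(-gamma) / log(w)."""
    return prod(1 - 1 / p for p in primes_up_to(w)) * log(w) * exp(EULER_GAMMA)
```

For example, modulo $210 = 2 \cdot 3 \cdot 5 \cdot 7$ there are $1 \cdot 1 \cdot 3 \cdot 5 = 15$ admissible classes, and `mertens_ratio(w)` approaches $1$ as `w` grows, which is the source of the logarithmic factors that appear when such sieve densities are summed.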
for all .
It remains to prove (12), which we write as
On the support of , we can write
The contribution of the error term can be bounded by
applying (20), this is bounded by which is acceptable for large enough. Thus it suffices to show that
which we write as
We begin with (24), which is a relatively easy consequence of the cancellation properties of . We may rewrite the left-hand side as
The summand vanishes unless , , and is coprime to , so that . For fixed , the constraints , restrict to residue classes of the form , with , in particular and for some with . Let us fix and consider the sum
Writing , this becomes
From Lemma 3, we have
since is coprime to . From summation by parts we thus have
(noting that if is large enough) and so we can bound the left-hand side of (24) in magnitude by
and (24) follows.
Now we prove (23), which is where we need nontrivial bounds on Kloosterman sums. Expanding out and using the triangle inequality, it suffices (for large enough) to show that
for all . By Fourier expansion of the and constraints (retaining only the restriction that is odd), it suffices to show that
for every .
Actually, we may delete the condition since this is implied by the constraints and odd.
We first dispose of the case when is large in the sense that . Making the change of variables , we may rewrite the left-hand side as
We can assume is coprime to and odd with coprime to and , as the contribution of all other cases vanishes. The constraints that is odd and then restrict to a single residue class modulo , with restricted to a single residue class modulo . We split this into residue classes modulo to make the phase constant on each residue class. The modulus is not divisible by , since is coprime to and . As such, has mean zero on every consecutive elements in each residue class modulo under consideration, and from summation by parts we then have
and hence the contribution of the case to (25) is
which is acceptable.
It remains to control the contribution of the case to (25). By the triangle inequality, it suffices to show that
for all coprime to . We can of course restrict to be coprime to each other and to . Writing , the constraint is equivalent to
and so we can rewrite the left-hand side as
By Fourier expansion, we can write as a linear combination of with bounded coefficients and , so it suffices to show that
Next, by Fourier expansion of the constraint , we write the left-hand side as
for some integer , where denotes the distance from to the nearest integer. The contribution of the which do not satisfy this relation is easily seen to be acceptable. From the support of we see in particular that there are only remaining choices for . Thus it suffices by the triangle inequality to show that
for each of the form (26).
We rearrange the left-hand side as
Suppose first that is of the form for some integer . Then the phase is periodic with period and has mean zero here (since ). From this, we can estimate the inner sum by ; since is restricted to be of size , this contribution is certainly acceptable. Thus we may assume that is not of the form . A similar argument works when (say), so we may assume that , so that .
for any , so from Poisson summation we have
since is constrained to be , the claim follows.
Finally, we prove (22), which is a routine sieve-theoretic calculation. We rewrite the left-hand side as
The summand vanishes unless are coprime to with and . From Poisson summation one then has
The error term is certainly negligible, so it suffices to show that
We can control the left-hand side by Fourier analysis. Writing
for some rapidly decreasing functions , the left-hand side may be expressed as
for , and
for . From Mertens’ theorem we have the crude bound
which by the rapid decrease of allows one to restrict to the range with an error of . In particular, we now have .
for , we can factor
(the restriction being to prevent vanishing for and small) and one has
for , and
for odd . In particular, from the Cauchy integral formula we see that
for . Since we also have in this region, we thus can write (27) as
and our task is now to show that
when (even when have negative real part); since , we conclude from the Cauchy integral formula that
when . For the remaining primes , we have
when and . Summing in using Lemma 5 to handle those between and , and Mertens’ theorem and the trivial bound for all other , we conclude that
From this and the rapid decrease of , we may restrict the range of even further to for any that goes to infinity arbitrarily slowly with . For sufficiently slow , the above estimates on and Lemma 5 (now used to handle those between and for some going sufficiently slowly to zero) give
and so we are reduced to establishing that
We may once again use the rapid decrease of to remove the prefactor as well as the restrictions , and reduce to showing that
For large enough, it will suffice to show that
with the implied constant independent of . But the left-hand side evaluates to , and the claim follows.