For any , let denote the assertion that there are infinitely many pairs of consecutive primes whose difference is at most , or equivalently that
thus for instance is the notorious twin prime conjecture. While this conjecture remains unsolved, we have the following recent breakthrough result of Zhang, building upon earlier work of Goldston-Pintz-Yildirim, Bombieri, Fouvry, Friedlander, and Iwaniec, and others:
Theorem 1 (Zhang’s theorem) is true for some finite .
In fact, Zhang’s paper shows that is true with .
About a month ago, the Polymath8 project was launched with the objective of reading through Zhang’s paper, clarifying the arguments, and then making them more efficient, in order to improve the value of . This project is still ongoing, but we have made significant progress; currently, we have confirmed that holds for as low as , and provisionally for as low as subject to certain lengthy arguments being checked. For several reasons, our methods (which are largely based on Zhang’s original argument structure, though with numerous refinements and improvements) will not be able to attain the twin prime conjecture , but there is still scope to lower the value of a bit further than what we have currently.
The precise arguments here are quite technical, and are discussed at length on other posts on this blog. In this post, I would like to give a “high level” summary of how Zhang’s argument works, and give some impressions of the improvements we have made so far; these would already be familiar to the active participants of the Polymath8 project, but perhaps may be of value to people who are following this project on a more casual basis.
While Zhang’s arguments (and our refinements of it) are quite lengthy, they are fortunately also very modular, that is to say they can be broken up into several independent components that can be understood and optimised more or less separately from each other (although we have on occasion needed to modify the formulation of one component in order to better suit the needs of another). At the top level, Zhang’s argument looks like this:
- Statements of the form are deduced from weakened versions of the Hardy-Littlewood prime tuples conjecture, which we have denoted (the stands for “Dickson-Hardy-Littlewood”), by locating suitable narrow admissible tuples (see below). Zhang’s paper establishes for the first time an unconditional proof of for some finite ; in his initial paper, was , but we have lowered this value to (and provisionally to ). Any reduction in the value of leads directly to reductions in the value of ; a web site to collect the best known values of in terms of has recently been set up here (and is accepting submissions for anyone who finds narrower admissible tuples than are currently known).
- Next, by adapting sieve-theoretic arguments of Goldston, Pintz, and Yildirim, the Dickson-Hardy-Littlewood type assertions are deduced in turn from weakened versions of the Elliott-Halberstam conjecture that we have denoted (the stands for “Motohashi-Pintz-Zhang”). More recently, we have replaced the conjecture by a slightly stronger conjecture to significantly improve the efficiency of this step (using some recent ideas of Pintz). Roughly speaking, these statements assert that the primes are more or less evenly distributed along many arithmetic progressions, including those that have relatively large spacing. A crucial technical fact here is that in contrast to the older Elliott-Halberstam conjecture, the Motohashi-Pintz-Zhang estimates only require one to control progressions whose spacings have a lot of small prime factors (the original conjecture requires the spacing to be smooth, but the newer variant has relaxed this to “densely divisible” as this turns out to be more efficient). The parameter is more important than the technical parameter ; we would like to be as large as possible, as any increase in this parameter should lead to a reduced value of . In Zhang’s original paper, was taken to be ; we have now increased this to be almost as large as (and provisionally ).
- By a certain amount of combinatorial manipulation (combined with a useful decomposition of the von Mangoldt function due Heath-Brown), estimates such as can be deduced from three subestimates, the “Type I” estimate , the “Type II” estimate , and the “Type III” estimate , which all involve the distribution of certain Dirichlet convolutions in arithmetic progressions. Here is an adjustable parameter that demarcates the border between the Type I and Type III estimates; raising makes it easier to prove Type III estimates but harder to prove Type I estimates, and lowering of course has the opposite effect. There is a combinatorial lemma that asserts that as long as one can find some between and for which all three estimates , , hold, one can prove . (The condition arises from the combinatorics, and appears to be rather essential; in fact, it is currently a major obstacle to further improvement of and hence and .)
- The Type I estimates are asserting good distribution properties of convolutions of the form , where are moderately long sequences which have controlled magnitude and length but are otherwise arbitrary. Estimates that are roughly of this type first appeared in a series of papers by Bombieri, Fouvry, Friedlander, Iwaniec, and other authors, and Zhang’s arguments here broadly follow those of previous authors, but with several new twists that take advantage of the many factors of the spacing . In particular, the dispersion method of Linnik is used (which one can think of as a clever application of the Cauchy-Schwarz inequality) to ultimately reduce matters (after more Cauchy-Schwarz, as well as treatment of several error terms) to estimation of incomplete Kloosterman-type sums such as
Zhang’s argument uses classical estimates on this Kloosterman sum (dating back to the work of Weil), but we have improved this using the “-van der Corput -process” introduced by Heath-Brown and Ringrose.
- The Type II estimates are similar to the Type I estimates, but cover a small hole in the coverage of the Type I estimates which comes up when the two sequences are almost equal in length. It turns out that one can modify the Type I argument to cover this case also. In practice, these estimates give less stringent conditions on than the other two estimates, and so as a first approximation one can ignore the need to treat these estimates, although recently our Type I and Type III estimates have become so strong that it has become necessary to tighten the Type II estimates as well.
- The Type III estimates are an averaged variant of the classical problem of understanding the distribution of the ternary divisor function in arithmetic progressions. There are various ways to attack this problem, but most of them ultimately boil down (after the use of standard devices such as Cauchy-Schwarz and completion of sums) to the task of controlling certain higher-dimensional Kloosterman-type sums such as
In principle, any such sum can be controlled by invoking Deligne’s proof of the Weil conjectures in arbitrary dimension (which, roughly speaking, establishes the analogue of the Riemann hypothesis for arbitrary varieties over finite fields), although in the higher dimensional setting some algebraic geometry is needed to ensure that one gets the full “square root cancellation” for these exponential sums. (For the particular sum above, the necessary details were worked out by Birch and Bombieri.) As such, this part of the argument is by far the least elementary component of the whole. Zhang’s original argument cleverly exploited some additional cancellation in the above exponential sums that goes beyond the naive square root cancellation heuristic; more recently, an alternate argument of Fouvry, Kowalski, Michel, and Nelson uses bounds on a slightly different higher-dimensional Kloosterman-type sum to obtain results that give better values of . We have also been able to improve upon these estimates by exploiting some additional averaging that was left unused by the previous arguments.
As of this time of writing, our understanding of the first three stages of Zhang’s argument (getting from to , getting from or to , and getting to or from Type I, Type II, and Type III estimates) are quite satisfactory, with the implications here being about as efficient as one could hope for with current methods, although one could still hope to get some small improvements in parameters by wringing out some of the last few inefficiencies. The remaining major sources of improvements to the parameters are then coming from gains in the Type I, II, and III estimates; we are currently in the process of making such improvements, but it will still take some time before they are fully optimised.
Below the fold I will discuss (mostly at an informal, non-rigorous level) the six steps above in a little more detail (full details can of course be found in the other polymath8 posts on this blog). This post will also serve as a new research thread, as the previous threads were getting quite lengthy.
— 1. Finding narrow prime tuples —
For any , we define an admissible -tuple to be a tuple of distinct integers , such that avoids at least one residue class modulo for each prime . Thus for instance is an admissible -tuple, and is an admissible -tuple, but is not admissible (it does not avoid any residue classes modulo ).
The prime tuples conjecture asserts that whenever is an admissible -tuple, then there are infinitely many translates that consist entirely of primes. This conjecture is still out of reach of current methods, but we can hope to establish results of the following form for a given :
Conjecture 2 () If is an admissible -tuple, then there are infinitely many translates of that contain at least two primes.
Let be the minimal diameter of an admissible -tuple: thus for instance , , and . It is clear that if holds for some , then is true for , since we have an admissible -tuple of diameter , and any translate of that tuple that contains two primes will generate a pair of consecutive primes of distance at most apart. So we can use statements of the form to deduce estimates of the form , provided that we can find admissible -tuples that are as narrow as possible, in order to obtain as good a value of that one can hope to get.
Perhaps the easiest way to obtain an admissible -tuple of reasonably narrow width is to take “the first primes past “, i.e. the tuple , where is the first prime larger than . It is not difficult to show that this is an admissible tuple. Zhang used this construction with to obtain ; as observed later by Trudgian, this construction actually has diameter , giving the bound .
For large , this construction gives an asymptotic of the form
as can be seen by a careful application of the prime number theorem.
One can do better than this in a number of ways. Firstly, it turns out that one can reduce a bit and still have admissible; for , it turns out after some computation that this trick lowers the diameter a little bit to ; asymptotically this gives
(because can be taken to be ). A larger improvement comes from the work of Hensley and Richards, namely to shift the tuple to cover positive and negative primes around the origin, in order to exploit the slightly increased density of the primes in this interval. Specifically, if one uses a tuple of the form
and optimises in , one can bound . Asymptotically, this gives the slight improvement
One can get a little more improvement by deleting a few primes from one end of this tuple and adding them to another, although the gains here are modest. Incidentally, Hensley and Richards used this improved asymptotic to demonstrate that the prime tuples conjecture was inconsistent with the second Hardy-Littlewood conjecture, which is now widely believed to be false.
One can view the Hensley-Richard construction as a “sieve” similar to the sieve of Eratosthenes, formed by starting with the integers in an interval and striking out the residue class for all primes up to . (In principle, this sieve could leave a few extra survivors, namely powers of primes larger than , together with their negations, though in practice such powers fall outside the interval of interest.) This sieve already does most of the “work” of establishing admissibility as it handles all the small moduli, leaving only the larger moduli to be checked computationally. It was observed by Schinzel that one could do very slightly better than this by sieving out the odd integers rather than the even integers , leaving behind numbers of the form for various primes ; this is conjecturally believed to give the bound
By similarly sieving out for other small primes , it is in fact conjectured that
(see this paper of Gordon and Rodemich for a heuristic justification) although for the values of that we have worked with, these asymptotically superior constructions are actually numerically inferior to what one can get just from the first version of the Schinzel sieve. As before, some slight further gains can be obtained by shifting the interval that one is sieving over.
More recently, we have worked with tuples that initially started from partially performing the sieving from a Schinzel construction, but at some point switching to a “greedy” algorithm in which one deletes, for each prime, the residue class that removes the fewest elements from the tuple. The next technical innovation was a “merging” approach in which one took two reasonably good tuples already constructed, merged them together by taking their union, and then greedily pruning away residue classes to obtain a merged tuple that is hopefully superior to either of its two “parents”. This procedure can be iterated, and also combined with local optimisations in which the sieving interval is shifted or a small number of elements are otherwise added and deleted. This has produced many of the best constructions so far, typically about 5% narrower than what previous constructions such as the Schinzel sieve construction gave. (For small values of , say , the older work of Engelsma on finding narrow tuples, which used slightly different methods, is often still the record-holder.
Very recently, a web page has been set up to record the best known tuples of cardinality up to (including the older data of Engelsma), and to accept submissions for new “record holders”. This will help with further improvements to , but the database of narrow tuples is likely to have other uses in effective prime number theory as well.
Numerical regression suggests that is roughly for small or medium values of . There is no theoretical justification for this bound; the best lower bounds we have on asymptotically come from the Brun-Titchmarsh theorem and its refinements (and in particular various versions of the large sieve), and take the form
It looks very difficult to improve upon this loss of (this problem may possibly be related to the parity problem in sieve theory, although the precise connection is unclear).
Further information on this part of the Polymath8 project may be found at this page.
— 2. Sieving and Dickson-Hardy-Littlewood type results —
Now we informally describe how to obtain results of the form . Zhang’s approach (and thus ours) follows the basic strategy given by the breakthrough paper of Goldston, Pintz, and Yildirim establishing small (but not bounded) gaps between primes.
Let be an admissible -tuple. The basic difficulty in trying to locate translates that contain two or more primes is that the primes get sparser and sparser when one goes out to infinity, so if one chooses to be an integer chosen at random from (say) the interval , one would expect most such tuples to in fact contain no primes.
The trick of Goldston, Pintz, and Yildirim is not to pick uniformly from , but instead to pick in a weighted fashion that significantly increases the probability of each member of the translated tuple being prime. To oversimplify, suppose we could construct a moderately large subset of with the property that for each , we have that is prime for more than of the elements of . Then just from the pigeonhole principle, one could conclude the existence of at least one such that is prime for at least two choices of , which give .
Of course, one has to choose a set for which one can prove the desired claim, that each is prime for more than of the elements of . Intuitively, the strategy of Goldston, Pintz, and Yildirim is to let consist of those for which the quantity is almost prime – that it only has a relatively small number of prime factors (not much more than ).
Actually, it turns out that it is best not to try to use a set for the above strategy, but rather a non-negative weight function on which is concentrated on those for which is almost prime. One then seeks to maximise the ratio
for each ; in particular, if one can get these ratios all above , one has proven .
One then needs to construct a non-negative weight for which the ratio (1) is (a) computable, and (b) as large as possible. For this one uses a version of the Selberg sieve, choosing of the form
for some weights to be optimised later. In order to be able to compute both the numerator and denominator of (1), one needs to limit the sieve weights to be supported on the range for some sieve level , which one would like to be as large as possible to give maximum flexibility to the sieve. The denominator of (1) is relatively easy to compute (as long as does not exceed , but the numerator is more difficult. Expanding out the formula for , the numerator becomes
As is standard in analytic number theory, it is slightly more convenient to replace this sum by the closely related sum
where is the von Mangoldt function, which serves as a useful proxy for the prime numbers.
The summation is restricted to a fairly small number of residue classes modulo , which is a modulus of size at most . To control the numerator of (1), one thus needs some sort of distribution result for the primes (or the von Mangoldt function) in arithmetic progressions of modulus at least as large as . If one had to get good distribution for all moduli up to this level, this would be a task comparable to the generalised Riemann hypothesis, and thus hopeless; but fortunately we only need such distribution results on average for such moduli . For this, standard tools such as the Bomberi-Vinogradov theorem are available, at least for up to about or so. This was already strong enough for Goldston, Pintz, and Yildirim to obtain reasonably narrow prime gaps (in which , for instance), but to get bounded prime gaps it turns out to be necessary to obtain distribution results up to a slightly higher exponent for some . The standard distribution conjecture in this direction is the Elliott-Halberstam conjecture
for any fixed , where is a primitive residue class modulo for each , and is the Euler totient function. This estimate is conjectured to hold for arbitrarily close to , and Goldston, Pintz and Yildirim showed that the assertion (2) for even a small was sufficient to imply for some (the condition they obtained was asymptotically of the form , and if one could get sufficiently close to , one could in fact obtain ). Unfortunately, the Elliott-Halberstam conjecture (2) remains open for any choice of .
Zhang made a number of observations to get around this obstacle. First, he observed that the residue classes needed for the application to were not completely arbitrary, but were linked to each other, basically obeying a Chinese remainder theorem
for any coprime . (These congruences basically come from the roots of the polynomial modulo .) Secondly, he observed that by suitably truncating the Selberg sieve, one did not need to establish (2) for all moduli , but only those moduli which were squarefree and smooth, in the sense that all prime divisors of were at most for some small . (This observation had also been made previously by Motohashi and Pintz.) One had the freedom to select as one wished, although if one set to be too small, the value of one obtained would increase.
One can then formulate a weakened version of (2) which we have called (the MPZ standing for Motohashi, Pintz, and Zhang), in which the are constrained to obey the Chinese remainder theorem and the are constrained to be square-free and smooth. Zhang was able to establish for , which he could then use to establish for (note that is roughly of the order of as mentioned previously).
We have improved Zhang’s argument in a number of ways. Firstly, we have made the sieving argument that deduces from results of the form (2) more efficient by optimising a certain cutoff function involved in the construction of the Selberg sieve; the optimal cutoff turns out to be given in terms of a certain Bessel function, and improves the relationship to ; see this post for details. Also, a modification of the truncated Selberg sieve due to Pintz allows one to take to be extremely small with almost no impact on (but at a certain technical cost, namely that the property of being smooth has to be weakened to another property which we have called “dense divisibility”, replacing the conjecture by a variant that we have called ); see this post for details. Finally, by optimising the remaining steps in Zhang’s argument (see below), we have managed to increase significantly, with the currently largest value of that we can use being close to (and provisionally to ). The combination of these improvements has allowed us to take as low as (and provisionally to ).
At this stage we believe our arguments converting (or variants thereof) to is about as efficient as can be hoped for given the methods we have, and that the most significant further improvements will be coming from raising beyond the currently available range.
— 3. The Heath-Brown identity and combinatorics —
Now we discuss how to establish estimates such as (2) in the case of smooth moduli, and in the case of congruences obeying a Chinese remainder theorem. Prior to Zhang’s paper, there were a number of results of this type established in a series of papers by (various combinations of) Bombieri, Fouvry, Friedlander, and Iwaniec, as well as some followup work by later authors, although the prior results were not directly applicable for the current problem (either because the function was replaced by a simpler arithmetic function, or the absolute values in (2) were replaced instead with a “well-factorable” weight, or the congruence classes involved were more specialised, e.g. of the form where was bounded and independent of ). However, Zhang was able to adapt and improve the analysis in the previous literature in a number of ways, to the point where he could in fact obtain an estimate of the required strength.
We won’t describe the entire proof here, but give some of the important ideas in these arguments. The first observation is that one can decompose the von Mangoldt function as a combination of simpler functions, such as the order divisor functions
the modified order divisor functions
or more generally certain combinations
of “nice” arithmetic functions (we will be vague about what “nice” means here), where is the usual Dirichlet convolution. The point is that functions with this convolution structure are easier to sum and estimate. For instance, the basic estimate
is of course the prime number theorem, which is already non-trivial, and improving the error term here to for some fixed is a major open problem (and would be a huge breakthrough towards the Riemann hypothesis). In contrast, the divisor function
is much easier to sum, and one can easily obtain the estimate
by the elementary Dirichlet hyperbola method, which ultimately relies on the Dirichlet convolution structure of . Similarly for the higher order divisor functions and their variants , although the error terms become harder to estimate well when increases.
Why would we expect to be expressible in terms of functions such as or ? To illustrate this sort of decomposition, let us make the major cheat that all numbers are square-free and have at most (say) four prime factors. (This is clearly quite a cheat; but note that a positive fraction of integers are square-free, and if we restrict attention to large prime factors, say factors larger than , then we do indeed have at most four such prime factors, so the cheat actually is not so terrible in some ways.) Then the quantity is equal to when has one prime factor and zero when has two, three, or four prime factors. Meanwhile, is equal to when has one, two, three, or four prime factors respectively. Similarly is equal to and is equal to . Thus we see in all cases that
Note that while a little computation is required to verify this identity, it is immediate from linear algebra that some identity of this type must exist, at least for square-free numbers with a bounded number of factors.
which is valid for all natural numbers ; this identity can be proven by starting with the formal identity
for the Riemann zeta function , differentiating this identity, then comparing Dirichlet series coefficients. (It is also a fun exercise to find a purely combinatorial proof of this identity.)
Morally speaking, Linnik’s identity reduces the task of proving (2) (or variants thereof) to the task of proving
for . However, this implication is not quite rigorous, because the parameter on the right-hand side is unbounded. There is however a clever truncation of Linnik’s identity due to Heath-Brown, which only has finitely many terms, each one of which resembles a divisor function , albeit one which is averaged by an additional Dirichlet convolution. The precise identity is a little complicated (although not difficult to prove); see this post for details. Using this identity, we can indeed reduce (2) to slightly more complicated versions of the identities (5). By decomposing the versions of (5) obtained in this fashion into dyadic pieces (actually, one uses a decomposition that is slightly finer than the dyadic decomposition, but never mind this technical detail), it turns out that one can organise the resulting estimates to be proven into three classes, which Zhang calls “Type I estimates”, “Type II estimates”, and “Type III estimates”. The precise formulation of such estimates depends on the two parameters alluded to earlier, as well as an additional parameter demarcating the boundary between the Type I and Type III estimates. (The lower bound is needed to effect the combinatorial partition into Type I, Type II, Type III sums; it would be nice to remove this bound, but this seems to necessitate the introduction of further types of estimates (“Type IV” and “Type V”) which we have so far been unable to deal with effectively.) A precise definition of these estimates may be found at this page, but let us for now give an oversimplified caricature of what these estimates look like. A model case of a Type I estimate takes the form
where are two sequences localised to scales and respectively, where one should think of and as being like and respectively; the functions also obey certain bounds similar to those enjoyed by the divisor functions (e.g. their average value should be ) which we do not give in detail here. The modulus should be understood to be squarefree and smooth as in the statement of (or if one is trying to prove instead, “smooth” needs to be replaced by “densely divisible”). The Type II sums are similar, except that one should now think of and as both being like . A model case of a Type III estimate takes the form
where , and again should be understood to be square-free and smooth. (This is an oversimplification; the true Type III estimate involves a smoothly truncated, averaged version of , and it is to compensate for this oversimplification that the upper bound for has changed.) The Type I estimates are harder to prove when gets larger, while the Type III estimates become easier. It is thus necessary to work out what estimates one can prove on both the Type I and Type III sides before one can optimise in . The Type II estimates are less important (and in practice, in all the regimes in which we can prove Type I and Type III estimates so far, we have also been able to prove Type II estimates), but still need to be dealt of course. Thus far we have not been able to improve upon this portion of Zhang’s arguments much; we rely on the same Heath-Brown identity and combinatorial decomposition that Zhang does, as they seem to be more or less optimally efficient for decompositions of this type, at least for the appliction at hand. The task of proving (or ) and hence and for various choices of parameters is thus reduced to that of verifying Type I, Type II, and Type III estimates.
— 4. Type I estimates —
Now we give a very informal sketch of Zhang’s argument for establishing Type I estimates (details may be found at these two posts). It is based on previous arguments of Bombieri, Friedlander, and Iwaniec, relying on various analytic manipulations such as the use of the Cauchy-Schwarz inequality (and in particular on the dispersion method of Linnik that is based on this inequality) and Fourier identities (such as the Pólya-Vinogradov method of completion of sums), in order to eliminate the poorly understood sequences , until one is only left with the task of estimating a certain incomplete Kloosterman-type sum, to which one uses standard tools from analytic number theory (such as the Weil bound for completed Kloosterman sums). We have continued to follow the same general strategy, but have found some improvements based on better bounds for incomplete Kloosterman sums.
We now give a little more detail. We will ignore the role of the technical parameter , and assume simply that all moduli are as smooth as needed for the argument. We change notation a little bit, replacing and by and , and consider a dyadic component of (8), namely
One now takes advantage of the square-free smooth nature of to factor , where are coprime, and is a little bit smaller than , so that is a little bit larger than ; the reason for this choice of factorisation is that it will turn out that we want to be as large as possible while still staying a bit less than . There is also a technical step in which we shift all the tiny prime factors of to and not to (where “tiny” means something like “less than “); this is needed to ensure that most pairs of ‘s that we encounter become coprime at a crucial juncture later in the argument. See this post for more details of this step. The congruence similarly factors into and by (3), so we now have to prove something like
where for some small and .
Roughly speaking, this estimate is asserting that we can estimate a quantity such as
to accuracy whenever are arbitrary bounded coefficients (where we will be a bit vague on what it means to estimate something to a given accuracy). Splitting as and ignoring the constraint , we may rearrange this expression as
We are unable to take much advantage of the averaging, and will thus seek to estimate
to accuracy for a fixed .
Now we start eliminating some of the poorly controlled expressions . Applying a Cauchy-Schwarz, it (morally) suffices to estimate
to accuracy , i.e. a little bit better than . Note that because is a little bit less than , we have a non-trivial number of obeying the congruence condition for any given . This prevents the summation inside the absolute values from being so trivial that Cauchy-Schwarz could not gain anything.
The above expression can be rearranged as
The constraint forces . Since is so close to , this basically forces to equal (or to equal a shift of for some small ). To simplify the exposition we will just look at the diagonal case (the off-diagonal terms are treated similarly). Our task is now to control
with accuracy better than . Using averaging notation, this is basically the same as controlling
with accuracy better than .
Recall that we had the foresight to remove all the tiny primes from . This has the effect of making typically coprime. We will discard the non-coprime contributions (it turns out that they can be controlled with relatively easy estimates) and insert the condition in the above summation. We can also make coprime to since the summation vanishes otherwise.
The summation is over an interval of length over a single congruence class of modulus . This sum can then be inverted via the Fourier transform to a dual sum over a range ; morally speaking we may replace
to accuracy better than .
We can eliminate the next by performing a Cauchy-Schwarz in , but it turns out to be a bit more efficient to Cauchy-Schwarz in and , not just in . This places us in the position of having to control
to accuracy better than , where we have suppressed the various coprimality conditions on the for brevity. So it suffices to get bounds on the exponential sum
for some modulus of typical size and a coefficient which turns out to typically be more or less coprime to except in a certain diagonal case , but it turns out in the Type I case that this diagonal contribution is manageable. A standard “Pólya-Vinogradov” type bound for such sums, ultimately based on Weil’s famous bound
for completed Kloosterman sums, allows one to bound (11) by (ignoring some logarithmic type errors and lower order terms). This gives us an acceptable estimate if
which may be rearranged as
(Recall that we are treating as negligible, otherwise we have to insert some factors of in this constraint as well.) Crucially, this constraint allows to exceed , and thus gives a good range of Type I estimates if is small enough. Further improvements have been obtained by using more refined estimates on incomplete Kloosterman sums (first discussed here); to date we have managed to (provisionally) replace the above constraint to
which turns out to give superior numerical results. This reveals a tradeoff, in that more advanced exponential sum estimates can improve at the expense of ; we have not yet located what the optimal balance for this tradeoff is.
— 5. Type II estimates —
The Type II estimates are treated quite similarly to the Type I estimates and will only be discussed briefly here. If one repeats the Type I estimates verbatim, most of the same arguments go through (in fact they become better because the terms are not present); however it turns out that the diagonal contribution that was manageable in the Type I setting, no longer becomes manageable in the Type II setting. To get around this one has to reduce the influence of the diagonal by taking a narrower Cauchy-Schwarz when eliminating from (10), namely that one only performs a Cauchy-Schwarz in rather than . The price one pays for this is that the off-diagonal terms become slightly worse (conceding a few more powers of ); the best estimate we currently have here (again ignoring issues) is
— 6. Type III estimates —
The Type III estimates (7) were originally treated by Zhang using the method of Weyl differencing. More recently, a simpler method which gives better numerology was introduced by Fouvry, Kowalski, Michel, and Nelson, and may be sketched as follows. Our goal here is to control something like
to accuracy better than , where are bounded coefficients, and is a quantity larger than (which we would like to take to be as large as possible). Expanding out the function and using dyadic decomposition, this basically amounts to controlling
to accuracy for some that multiply to (e.g. one can take for sake of discussion, although the precise choice of turns out to not be so important). Applying Fourier decomposition as in (9), one can write this expression roughly speaking as
where are the dual scales . It turns out that the contributions when vanish, or more generally share a common factor with , can be dealt with, so we ignore them and assume that are coprime with . In this case, we can write the inner sum after a change of variables as
The normalisation in (12) reflects the expected square root cancellation in this sum, which is a sum of about phases that one expects to behave more or less independently of each other. Indeed, using the deep results of Deligne on the Weil conjectures, one can show that
for coprime to (ignoring some log factors).
Replacing by (which can be done by adjusting the a little bit), our task is now to control
to accuracy better than . If we apply Deligne’s bound (13) right away, we get a bound of , which only works for , which recovers the Bombieri-Vinogradov theorem for but no better. To improve upon this we perform a Cauchy-Schwarz. We first perform the useful substitution , which rewrites the above sum as something like
where is a truncated version of (counting factorisations with ). Note that is given by , so that . We interchange the sum, so we are now asking to bound
to accuracy better than .
At this point it turns out to be optimal to use the smoothness of the modulus to split where , so we wish to control something like
to accuracy better than . If we Cauchy-Schwarz in and , we end up having to bound
to accuracy better than . The diagonal contribution is thanks to (13). Now we turn to the opposite case when are coprime. The goal is then to establish the estimate
for an incomplete Kloosterman correlation with net modulus . It turns out that there is a Pólya-Vinogradov type bound of for this sum (though this requires a non-trivial amount of Deligne’s Weil conjecture machinery to actually prove). So we are done if
which simplifies to , or equivalently . So this argument allows one to obtain (7) for up to , which corresponds to the constraint
which, happily, overlaps enough with the Type I bounds and the combinatorial constraint to give (or ) for relatively good values of (and , which we have ignored in this discussion). We have managed to (provisionally) improve upon this constraint to
by exploiting an additional averaging that is present in the true Type III estimate (and which was suppressed in (7)). Combining this with the best Type I estimate we currently have (and with the combinatorial constraint ) gives us the best value of currently available, namely (note that this is well below the Type II constraint, which may thus be ignored). Actually, the Type III estimates are now so strong that they are no longer critical for further improvements in the numerology; it is either the Type I estimate or the combinatorial constraint that needs improving instead. (Alternatively, if one is willing to use a worse value of , namely , one can use the Type I estimates to raise up to , which closes off the Type III case completely and allows for a slightly more elementary proof of Zhang’s theorem in that the full strength of Deligne’s proof of the Weil conjectures is no longer needed (although Weil’s classical bound on Kloosterman sums is still required).)