For any {H \geq 2}, let {B[H]} denote the assertion that there are infinitely many pairs of consecutive primes {p_n, p_{n+1}} whose difference {p_{n+1}-p_n} is at most {H}, or equivalently that

\displaystyle  \liminf_{n \rightarrow \infty} p_{n+1} - p_n \leq H;

thus for instance {B[2]} is the notorious twin prime conjecture. While this conjecture remains unsolved, we have the following recent breakthrough result of Zhang, building upon earlier work of Goldston-Pintz-Yildirim, Bombieri, Fouvry, Friedlander, and Iwaniec, and others:

Theorem 1 (Zhang’s theorem) {B[H]} is true for some finite {H}.

In fact, Zhang’s paper shows that {B[H]} is true with {H = 70,000,000}.

About a month ago, the Polymath8 project was launched with the objective of reading through Zhang’s paper, clarifying the arguments, and then making them more efficient, in order to improve the value of {H}. This project is still ongoing, but we have made significant progress; currently, we have confirmed that {B[H]} holds for {H} as low as {12,006}, and provisionally for {H} as low as {6,966} subject to certain lengthy arguments being checked. For several reasons, our methods (which are largely based on Zhang’s original argument structure, though with numerous refinements and improvements) will not be able to attain the twin prime conjecture {B[2]}, but there is still scope to lower the value of {H} a bit further than what we have currently.

The precise arguments here are quite technical, and are discussed at length on other posts on this blog. In this post, I would like to give a “high level” summary of how Zhang’s argument works, and give some impressions of the improvements we have made so far; these would already be familiar to the active participants of the Polymath8 project, but perhaps may be of value to people who are following this project on a more casual basis.

While Zhang’s arguments (and our refinements of them) are quite lengthy, they are fortunately also very modular, that is to say they can be broken up into several independent components that can be understood and optimised more or less separately from each other (although we have on occasion needed to modify the formulation of one component in order to better suit the needs of another). At the top level, Zhang’s argument looks like this:

  1. Statements of the form {B[H]} are deduced from weakened versions of the Hardy-Littlewood prime tuples conjecture, which we have denoted {DHL[k_0,2]} (the {DHL} stands for “Dickson-Hardy-Littlewood”), by locating suitable narrow admissible tuples (see below). Zhang’s paper establishes for the first time an unconditional proof of {DHL[k_0,2]} for some finite {k_0}; in his initial paper, {k_0} was {3,500,000}, but we have lowered this value to {1,466} (and provisionally to {902}). Any reduction in the value of {k_0} leads directly to reductions in the value of {H}; a web site to collect the best known values of {H} in terms of {k_0} has recently been set up here (and is accepting submissions for anyone who finds narrower admissible tuples than are currently known).
  2. Next, by adapting sieve-theoretic arguments of Goldston, Pintz, and Yildirim, the Dickson-Hardy-Littlewood type assertions {DHL[k_0,2]} are deduced in turn from weakened versions of the Elliott-Halberstam conjecture that we have denoted {MPZ[\varpi,\delta]} (the {MPZ} stands for “Motohashi-Pintz-Zhang”). More recently, we have replaced the conjecture {MPZ[\varpi,\delta]} by a slightly stronger conjecture {MPZ'[\varpi,\delta]} to significantly improve the efficiency of this step (using some recent ideas of Pintz). Roughly speaking, these statements assert that the primes are more or less evenly distributed along many arithmetic progressions, including those that have relatively large spacing. A crucial technical fact here is that in contrast to the older Elliott-Halberstam conjecture, the Motohashi-Pintz-Zhang estimates only require one to control progressions whose spacings {q} have a lot of small prime factors (the original {MPZ[\varpi,\delta]} conjecture requires the spacing {q} to be smooth, but the newer variant {MPZ'[\varpi,\delta]} has relaxed this to “densely divisible” as this turns out to be more efficient). The {\varpi} parameter is more important than the technical parameter {\delta}; we would like {\varpi} to be as large as possible, as any increase in this parameter should lead to a reduced value of {k_0}. In Zhang’s original paper, {\varpi} was taken to be {1/1168}; we have now increased this to be almost as large as {1/148} (and provisionally {1/108}).
  3. By a certain amount of combinatorial manipulation (combined with a useful decomposition of the von Mangoldt function due to Heath-Brown), estimates such as {MPZ[\varpi,\delta]} can be deduced from three subestimates, the “Type I” estimate {Type_I[\varpi,\delta,\sigma]}, the “Type II” estimate {Type_{II}[\varpi,\delta]}, and the “Type III” estimate {Type_{III}[\varpi,\delta,\sigma]}, which all involve the distribution of certain Dirichlet convolutions in arithmetic progressions. Here {1/10 < \sigma < 1/2} is an adjustable parameter that demarcates the border between the Type I and Type III estimates; raising {\sigma} makes it easier to prove Type III estimates but harder to prove Type I estimates, and lowering {\sigma} of course has the opposite effect. There is a combinatorial lemma that asserts that as long as one can find some {\sigma} between {1/10} and {1/2} for which all three estimates {Type_I[\varpi,\delta,\sigma]}, {Type_{II}[\varpi,\delta]}, {Type_{III}[\varpi,\delta,\sigma]} hold, one can prove {MPZ[\varpi,\delta]}. (The condition {\sigma > 1/10} arises from the combinatorics, and appears to be rather essential; in fact, it is currently a major obstacle to further improvement of {\varpi} and hence {k_0} and {H}.)
  4. The Type I estimates {Type_I[\varpi,\delta,\sigma]} are asserting good distribution properties of convolutions of the form {\alpha * \beta}, where {\alpha,\beta} are moderately long sequences which have controlled magnitude and length but are otherwise arbitrary. Estimates that are roughly of this type first appeared in a series of papers by Bombieri, Fouvry, Friedlander, Iwaniec, and other authors, and Zhang’s arguments here broadly follow those of previous authors, but with several new twists that take advantage of the many factors of the spacing {q}. In particular, the dispersion method of Linnik is used (which one can think of as a clever application of the Cauchy-Schwarz inequality) to ultimately reduce matters (after more Cauchy-Schwarz, as well as treatment of several error terms) to estimation of incomplete Kloosterman-type sums such as

    \displaystyle  \sum_{n \leq N} e_d( \frac{c}{n} ).

    Zhang’s argument uses classical estimates on this Kloosterman sum (dating back to the work of Weil), but we have improved this using the “{q}-van der Corput {A}-process” introduced by Heath-Brown and Ringrose.

  5. The Type II estimates {Type_{II}[\varpi,\delta]} are similar to the Type I estimates, but cover a small hole in the coverage of the Type I estimates which comes up when the two sequences {\alpha,\beta} are almost equal in length. It turns out that one can modify the Type I argument to cover this case also. In practice, these estimates give less stringent conditions on {\varpi,\delta} than the other two estimates, and so as a first approximation one can ignore the need to treat these estimates, although recently our Type I and Type III estimates have become so strong that it has become necessary to tighten the Type II estimates as well.
  6. The Type III estimates {Type_{III}[\varpi,\delta,\sigma]} are an averaged variant of the classical problem of understanding the distribution of the ternary divisor function {\tau_3(n) := \sum_{abc=n} 1} in arithmetic progressions. There are various ways to attack this problem, but most of them ultimately boil down (after the use of standard devices such as Cauchy-Schwarz and completion of sums) to the task of controlling certain higher-dimensional Kloosterman-type sums such as

    \displaystyle  \sum_{t,t' \in ({\bf Z}/d{\bf Z})^\times} \sum_{l \in {\bf Z}/d{\bf Z}: (l,d)=(l+k,d)=1} e_d( \frac{t}{l} - \frac{t'}{l+k} + \frac{m}{t} - \frac{m'}{t'} ).

    In principle, any such sum can be controlled by invoking Deligne’s proof of the Weil conjectures in arbitrary dimension (which, roughly speaking, establishes the analogue of the Riemann hypothesis for arbitrary varieties over finite fields), although in the higher dimensional setting some algebraic geometry is needed to ensure that one gets the full “square root cancellation” for these exponential sums. (For the particular sum above, the necessary details were worked out by Birch and Bombieri.) As such, this part of the argument is by far the least elementary component of the whole. Zhang’s original argument cleverly exploited some additional cancellation in the above exponential sums that goes beyond the naive square root cancellation heuristic; more recently, an alternate argument of Fouvry, Kowalski, Michel, and Nelson uses bounds on a slightly different higher-dimensional Kloosterman-type sum to obtain results that give better values of {\varpi,\delta,\sigma}. We have also been able to improve upon these estimates by exploiting some additional averaging that was left unused by the previous arguments.

As of this time of writing, our understanding of the first three stages of Zhang’s argument (getting from {DHL[k_0,2]} to {B[H]}, getting from {MPZ[\varpi,\delta]} or {MPZ'[\varpi,\delta]} to {DHL[k_0,2]}, and getting to {MPZ[\varpi,\delta]} or {MPZ'[\varpi,\delta]} from Type I, Type II, and Type III estimates) is quite satisfactory, with the implications here being about as efficient as one could hope for with current methods, although one could still hope to get some small improvements in parameters by wringing out some of the last few inefficiencies. The remaining major sources of improvements to the parameters are then coming from gains in the Type I, II, and III estimates; we are currently in the process of making such improvements, but it will still take some time before they are fully optimised.

Below the fold I will discuss (mostly at an informal, non-rigorous level) the six steps above in a little more detail (full details can of course be found in the other Polymath8 posts on this blog). This post will also serve as a new research thread, as the previous threads were getting quite lengthy.

— 1. Finding narrow prime tuples —

For any {k_0 \geq 2}, we define an admissible {k_0}-tuple to be a tuple {{\mathcal H} = (h_1,\ldots,h_{k_0})} of {k_0} distinct integers {h_1 < \ldots < h_{k_0}}, such that {{\mathcal H}} avoids at least one residue class modulo {p} for each prime {p}. Thus for instance {(0,2)} is an admissible {2}-tuple, and {(0,2,6)} is an admissible {3}-tuple, but {(0,2,4)} is not admissible (it does not avoid any residue classes modulo {3}).
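As an aside, admissibility is easy to test by brute force, since only primes {p \leq k_0} need to be checked (a tuple of {k_0} elements can occupy at most {k_0 < p} residue classes modulo any larger prime {p}). Here is a short Python sketch of such a test (purely illustrative; the function names are ours and not taken from any Polymath8 code):

```python
def primes_up_to(n):
    """Primes <= n by the sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(sieve[p * p::p])
    return [p for p, b in enumerate(sieve) if b]

def is_admissible(tup):
    """A k_0-tuple is admissible iff it misses a residue class mod p for every
    prime p; only p <= k_0 need checking, since k_0 elements cannot occupy
    all p classes when p > k_0."""
    k0 = len(tup)
    for p in primes_up_to(k0):
        if len({h % p for h in tup}) == p:    # every class mod p is occupied
            return False
    return True

print(is_admissible((0, 2)), is_admissible((0, 2, 6)), is_admissible((0, 2, 4)))
# True True False
```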

The prime tuples conjecture asserts that whenever {{\mathcal H} = (h_1,\ldots,h_{k_0})} is an admissible {k_0}-tuple, then there are infinitely many translates {n+{\mathcal H} = (n+h_1,\ldots,n+h_{k_0})} that consist entirely of primes. This conjecture is still out of reach of current methods, but we can hope to establish results of the following form for a given {k_0 \geq 2}:

Conjecture 2 ({DHL[k_0,2]}) If {{\mathcal H}} is an admissible {k_0}-tuple, then there are infinitely many translates {n + {\mathcal H}} of {{\mathcal H}} that contain at least two primes.

Let {H(k_0)} be the minimal diameter {h_{k_0}-h_1} of an admissible {k_0}-tuple: thus for instance {H(2) = 2}, {H(3) = 6}, and {H(4)=8}. It is clear that if {DHL[k_0,2]} holds for some {k_0}, then {B[H]} is true for {H = H(k_0)}, since we have an admissible {k_0}-tuple {{\mathcal H}} of diameter {H(k_0)}, and any translate of that tuple that contains two primes will generate a pair of consecutive primes of distance at most {H} apart. So we can use statements of the form {DHL[k_0,2]} to deduce estimates of the form {B[H]}, provided that we can find admissible {k_0}-tuples that are as narrow as possible, in order to obtain as good a value of {H} as one can hope to get.

Perhaps the easiest way to obtain an admissible {k_0}-tuple of reasonably narrow width is to take “the first {k_0} primes past {k_0}”, i.e. the tuple {(p_{m+1},\ldots,p_{m+k_0})}, where {p_{m+1}} is the first prime larger than {k_0}. It is not difficult to show that this is an admissible tuple. Zhang used this construction with {k_0 = 3,500,000} to obtain {H(3,500,000) \leq 70,000,000}; as observed later by Trudgian, this construction actually has diameter {59,874,594}, giving the bound {H(3,500,000) \leq 59,874,594}.
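Here is a quick Python sketch of this construction for a toy value of {k_0} (the actual {k_0 = 3,500,000} computation of course requires a proper prime sieve rather than trial division, but the logic is the same); admissibility can be double-checked with the test from the previous sketch:

```python
def is_prime(n):
    """Trial division; adequate for a toy example."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def first_primes_past(k0):
    """The tuple (p_{m+1}, ..., p_{m+k0}) with p_{m+1} the first prime > k0.
    Each element is a prime exceeding k0, hence nonzero mod every prime
    p <= k0, so the residue class 0 is always missed and the tuple is
    admissible (primes p > k0 are never an obstruction for a k0-tuple)."""
    tup, n = [], k0
    while len(tup) < k0:
        n += 1
        if is_prime(n):
            tup.append(n)
    return tuple(tup)

H10 = first_primes_past(10)                   # (11, 13, 17, ..., 43)
print(H10, "diameter =", H10[-1] - H10[0])    # diameter 32
```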

For large {k_0}, this construction gives an asymptotic of the form

\displaystyle  H(k_0) \leq k_0 \log k_0 + k_0 \log\log k_0 + O(k_0)

as can be seen by a careful application of the prime number theorem.

One can do better than this in a number of ways. Firstly, it turns out that one can reduce {m} a bit and still have {(p_{m+1},\ldots,p_{m+k_0})} admissible; for {k_0=3,500,000}, it turns out after some computation that this trick lowers the diameter a little bit to {59,093,364}; asymptotically this gives

\displaystyle  H(k_0) \leq k_0 \log k_0 + k_0 \log\log k_0 - k_0 + o(k_0)

(because {m} can be taken to be {o(k_0/\log k_0)}). A larger improvement comes from the work of Hensley and Richards, namely to shift the tuple to cover positive and negative primes around the origin, in order to exploit the slightly increased density of the primes in this interval. Specifically, if one uses a tuple of the form

\displaystyle  (-p_{m+\lfloor (k_0-2)/2\rfloor}, \ldots, -p_{m+1},-1,+1,p_{m+1},\ldots,p_{m+\lfloor (k_0-1)/2 \rfloor})

and optimises in {m}, one can bound {H(3,500,000) \leq 57,554,068}. Asymptotically, this gives the slight improvement

\displaystyle  H(k_0) \leq k_0 \log k_0 + k_0 \log\log k_0 - (1 + \log 2) k_0 + o(k_0).

One can get a little more improvement by deleting a few primes from one end of this tuple and adding a few to the other end, although the gains here are modest. Incidentally, Hensley and Richards used this improved asymptotic to demonstrate that the prime tuples conjecture was inconsistent with the second Hardy-Littlewood conjecture (which is now widely believed to be false).

One can view the Hensley-Richards construction as a “sieve” similar to the sieve of Eratosthenes, formed by starting with the integers in an interval {[-p_{m+\lfloor (k_0-2)/2\rfloor}, p_{m+\lfloor (k_0-1)/2 \rfloor}]} and striking out the residue class {0\ (p)} (that is, {0 \hbox{ mod } p}) for all primes {p} up to {p_m}. (In principle, this sieve could leave a few extra survivors, namely powers of primes larger than {p_m}, together with their negations, though in practice such powers fall outside the interval of interest.) This sieve already does most of the “work” of establishing admissibility as it handles all the small moduli, leaving only the larger moduli to be checked computationally. It was observed by Schinzel that one could do very slightly better than this by sieving out the odd integers {1\ (2)} rather than the even integers {0\ (2)}, leaving behind numbers of the form {\pm 2^j p} for various primes {p}; this is conjecturally believed to give the bound

\displaystyle  H(k_0) \leq k_0 \log k_0 + k_0 \log\log k_0 - (1 + 2\log 2) k_0 + o(k_0).

By similarly sieving out {1\ (p)} for other small primes {p}, it is in fact conjectured that

\displaystyle  H(k_0) \leq k_0 \log k_0 + k_0 \log\log k_0 - (1+o(1)) k_0 \log\log\log k_0

(see this paper of Gordon and Rodemich for a heuristic justification), although for the values of {k_0} that we have worked with, these asymptotically superior constructions are actually numerically inferior to what one can get just from the first version of the Schinzel sieve. As before, some slight further gains can be obtained by shifting the interval that one is sieving over.

More recently, we have worked with tuples that are initially constructed by partially performing the sieving from a Schinzel construction, but then switch at some point to a “greedy” algorithm in which one deletes, for each prime, the residue class that removes the fewest elements from the tuple. The next technical innovation was a “merging” approach in which one took two reasonably good tuples already constructed, merged them together by taking their union, and then greedily pruned away residue classes to obtain a merged tuple that is hopefully superior to either of its two “parents”. This procedure can be iterated, and also combined with local optimisations in which the sieving interval is shifted or a small number of elements are otherwise added and deleted. This has produced many of the best constructions so far, typically about 5% narrower than what previous constructions such as the Schinzel sieve construction gave. (For small values of {k_0}, say {k_0 \leq 4000}, the older work of Engelsma on finding narrow tuples, which used slightly different methods, is often still the record-holder.)
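To give a flavour of the greedy sieving idea, here is a heavily simplified Python sketch (an illustration only, not the actual Polymath8 search code, which starts from a Schinzel-type sieve, merges parent tuples, and performs the local optimisations described above): one sieves an interval greedily, always removing the residue class that costs the fewest survivors, and then takes the narrowest window of {k_0} consecutive survivors.

```python
def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(sieve[p * p::p])
    return [p for p, b in enumerate(sieve) if b]

def greedy_tuple(k0, L):
    """Greedy sieve sketch: start with {0, 1, ..., L}; for each prime p <= k0
    remove the residue class mod p containing the fewest survivors (doing
    nothing if some class is already empty); the surviving set is then
    admissible, and we return its narrowest window of k0 consecutive elements."""
    survivors = set(range(L + 1))
    for p in primes_up_to(k0):
        counts = {r: 0 for r in range(p)}
        for n in survivors:
            counts[n % p] += 1
        if min(counts.values()) > 0:              # no class missed yet
            r = min(counts, key=counts.get)       # cheapest class to delete
            survivors = {n for n in survivors if n % p != r}
    s = sorted(survivors)
    if len(s) < k0:
        return None                               # enlarge L and retry
    i = min(range(len(s) - k0 + 1), key=lambda j: s[j + k0 - 1] - s[j])
    return tuple(s[i:i + k0])

H = greedy_tuple(10, 60)
print(H, "diameter =", H[-1] - H[0])
```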

Very recently, a web page has been set up to record the best known tuples of cardinality up to {5,000} (including the older data of Engelsma), and to accept submissions for new “record holders”. This will help with further improvements to {B[H]}, but the database of narrow tuples is likely to have other uses in effective prime number theory as well.

Numerical regression suggests that {H(k_0)} is roughly {k_0 \log k_0 + k_0} for small or medium values of {k_0}. There is no theoretical justification for this bound; the best lower bounds we have on {H(k_0)} asymptotically come from the Brun-Titchmarsh theorem and its refinements (and in particular various versions of the large sieve), and take the form

\displaystyle  H(k_0) \geq (\frac{1}{2} + o(1)) k_0 \log k_0.

It looks very difficult to improve upon this loss of {1/2} (this problem may possibly be related to the parity problem in sieve theory, although the precise connection is unclear).

Further information on this part of the Polymath8 project may be found at this page.

— 2. Sieving and Dickson-Hardy-Littlewood type results —

Now we informally describe how to obtain results of the form {DHL[k_0,2]}. Zhang’s approach (and thus ours) follows the basic strategy given by the breakthrough paper of Goldston, Pintz, and Yildirim establishing small (but not bounded) gaps between primes.

Let {{\mathcal H} = (h_1,\ldots,h_{k_0})} be an admissible {k_0}-tuple. The basic difficulty in trying to locate translates {n + {\mathcal H} = (n+h_1,\ldots,n+h_{k_0})} that contain two or more primes is that the primes get sparser and sparser when one goes out to infinity, so if one chooses {n} to be an integer chosen at random from (say) the interval {[x,2x]}, one would expect most such tuples to in fact contain no primes.

The trick of Goldston, Pintz, and Yildirim is not to pick {n} uniformly from {[x,2x]}, but instead to pick {n} in a weighted fashion that significantly increases the probability of each member {n+h_i} of the translated tuple being prime. To oversimplify, suppose we could construct a moderately large subset {A} of {[x,2x]} with the property that for each {1 \leq i \leq k_0}, we have that {n+h_i} is prime for more than {1/k_0} of the elements {n} of {A}. Then just from the pigeonhole principle, one could conclude the existence of at least one {n \in A} such that {n+h_i} is prime for at least two choices of {i}, which gives {DHL[k_0,2]}.
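Indeed, summing the assumed lower bound over {i} gives

\displaystyle  \sum_{n \in A} \sum_{i=1}^{k_0} 1_{n+h_i \hbox{ prime}} = \sum_{i=1}^{k_0} |\{ n \in A: n+h_i \hbox{ prime} \}| > k_0 \cdot \frac{|A|}{k_0} = |A|,

so the inner sum must be at least {2} for at least one {n \in A}.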

Of course, one has to choose a set {A} for which one can prove the desired claim, that each {n+h_i} is prime for more than {1/k_0} of the elements {n} of {A}. Intuitively, the strategy of Goldston, Pintz, and Yildirim is to let {A} consist of those {n \in [x,2x]} for which the quantity {P(n) := \prod_{i=1}^{k_0} (n+h_i)} is almost prime – that is, it has only a relatively small number of prime factors (not much more than {k_0}).

Actually, it turns out that it is best not to try to use a set {A} for the above strategy, but rather a non-negative weight function {\nu} on {[x,2x]} which is concentrated on those {n} for which {P(n)} is almost prime. One then seeks to maximise the ratio

\displaystyle  \frac{ \sum_n \nu(n) 1_{n+h_i \hbox{ prime}} }{\sum_n \nu(n) } \ \ \ \ \ (1)

for each {i}; in particular, if one can get these ratios all above {1/k_0}, one has proven {DHL[k_0,2]}.

One then needs to construct a non-negative weight {\nu} for which the ratio (1) is (a) computable, and (b) as large as possible. For this one uses a version of the Selberg sieve, choosing {\nu} of the form

\displaystyle  \nu(n) := 1_{[x,2x]}(n) (\sum_{d|P(n)} \lambda_d)^2

for some weights {\lambda_d} to be optimised later. In order to be able to compute both the numerator and denominator of (1), one needs to limit the sieve weights {\lambda_d} to be supported on the range {d \leq R} for some sieve level {R}, which one would like to be as large as possible to give maximum flexibility to the sieve. The denominator of (1) is relatively easy to compute (as long as {R} does not exceed {x^{1/2}}), but the numerator is more difficult. Expanding out the formula for {\nu}, the numerator becomes

\displaystyle  \sum_{d_1,d_2} \lambda_{d_1} \lambda_{d_2} \sum_{n \in [x,2x]: [d_1,d_2] | P(n)} 1_{n+h_i \hbox{ prime}}.

As is standard in analytic number theory, it is slightly more convenient to replace this sum by the closely related sum

\displaystyle  \sum_{d_1,d_2} \lambda_{d_1} \lambda_{d_2} \sum_{n \in [x,2x]: [d_1,d_2] | P(n)} \Lambda(n+h_i)

where {\Lambda} is the von Mangoldt function, which serves as a useful proxy for the prime numbers.
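Before turning to the distributional input needed to estimate this numerator, here is a toy Python illustration of how the ratio (1) is computed. The weight below is a crude GPY-type choice {\lambda_d = \mu(d) \log(R/d)^{k_0}} rather than the optimised cutoffs used in the project, and the parameters are far too small to be meaningful (in particular {R} needs to be a power of {x} for the ratios to say anything), so no conclusions should be drawn from the output; the point is only to make the objects in (1) concrete.

```python
import math

def mobius(d):
    """Mobius function mu(d) by trial factorisation (fine for small d)."""
    mu, p = 1, 2
    while p * p <= d:
        if d % p == 0:
            d //= p
            if d % p == 0:
                return 0            # d is not squarefree
            mu = -mu
        p += 1
    if d > 1:
        mu = -mu
    return mu

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def gpy_ratios(H, x, R):
    """Ratios (1) for the tuple H on [x, 2x], with crude GPY-type weights
    lambda_d = mu(d) * log(R/d)**k0 on squarefree d <= R, and
    nu(n) = (sum_{d | P(n), d <= R} lambda_d)**2 as in the text."""
    k0 = len(H)
    lam = {}
    for d in range(1, R + 1):
        mu = mobius(d)
        if mu != 0:
            lam[d] = mu * math.log(R / d) ** k0
    num, den = [0.0] * k0, 0.0
    for n in range(x, 2 * x + 1):
        P = 1
        for h in H:
            P *= n + h
        nu = sum(c for d, c in lam.items() if P % d == 0) ** 2
        den += nu
        for i, h in enumerate(H):
            if is_prime(n + h):
                num[i] += nu
    return [v / den for v in num]

H = (0, 2, 6)
print(gpy_ratios(H, 10 ** 4, 30), "versus 1/k_0 =", 1 / len(H))
```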

The {n} summation is restricted to a fairly small number of residue classes {a_q\ (q)} modulo {q := [d_1,d_2]}, which is a modulus of size at most {R^2}. To control the numerator of (1), one thus needs some sort of distribution result for the primes (or the von Mangoldt function) in arithmetic progressions of modulus {q} as large as {R^2}. If one had to get good distribution for all moduli {q} up to this level, this would be a task comparable to the generalised Riemann hypothesis, and thus hopeless; but fortunately we only need such distribution results on average for such moduli {q}. For this, standard tools such as the Bombieri-Vinogradov theorem are available, at least for {R^2} up to about {x^{1/2}} or so. This was already strong enough for Goldston, Pintz, and Yildirim to obtain reasonably narrow prime gaps (in which {p_{n+1}-p_n = o(\log p_n)}, for instance), but to get bounded prime gaps it turns out to be necessary to obtain distribution results up to a slightly higher exponent {x^{1/2+2\varpi}} for some {\varpi > 0}. The standard distribution conjecture in this direction is the Elliott-Halberstam conjecture

\displaystyle  \sum_{q \leq x^{1/2+2\varpi}} | \sum_{n = a_q\ (q): n \in [x,2x]} \Lambda(n) - \frac{1}{\phi(q)} \sum_{n \in [x,2x]} \Lambda(n) | \ll x \log^{-A} x \ \ \ \ \ (2)

for any fixed {A>0}, where {a_q\ (q)} is a primitive residue class modulo {q} for each {q}, and {\phi} is the Euler totient function. This estimate is conjectured to hold for {\varpi} arbitrarily close to {1/4}, and Goldston, Pintz and Yildirim showed that the assertion (2) for even a small {\varpi>0} was sufficient to imply {DHL[k_0,2]} for some {k_0} (the condition they obtained was asymptotically of the form {k_0 \sim \varpi^{-2}}, and if one could get {\varpi} sufficiently close to {1/4}, one could in fact obtain {DHL[6,2]}). Unfortunately, the Elliott-Halberstam conjecture (2) remains open for any choice of {\varpi>0}.

Zhang made a number of observations to get around this obstacle. First, he observed that the residue classes {a_q\ (q)} needed for the application to {DHL[k_0,2]} were not completely arbitrary, but were linked to each other, basically obeying a Chinese remainder theorem

\displaystyle  a_{qr}\ (qr) = (a_q\ (q)) \cap (a_r\ (r)) \ \ \ \ \ (3)

for any coprime {q,r}. (These congruences {a_q\ (q)} basically come from the roots of the polynomial {P} modulo {q}.) Secondly, he observed that by suitably truncating the Selberg sieve, one did not need to establish (2) for all moduli {q}, but only those moduli which were squarefree and smooth, in the sense that all prime divisors of {q} were at most {x^\delta} for some small {\delta>0}. (This observation had also been made previously by Motohashi and Pintz.) One had the freedom to select {\delta} as one wished, although if one set {\delta} to be too small, the value of {k_0} one obtained would increase.

One can then formulate a weakened version of (2) which we have called {MPZ[\varpi,\delta]} (the MPZ standing for Motohashi, Pintz, and Zhang), in which the {a_q\ (q)} are constrained to obey the Chinese remainder theorem and the {q} are constrained to be square-free and smooth. Zhang was able to establish {MPZ[\varpi,\delta]} for {\varpi = \delta = 1/1,168}, which he could then use to establish {DHL[k_0,2]} for {k_0 = 3,500,000} (note that {k_0} is roughly of the order of {\varpi^{-2}} as mentioned previously).

We have improved Zhang’s argument in a number of ways. Firstly, we have made the sieving argument that deduces {DHL[k_0,2]} from results of the form (2) more efficient by optimising a certain cutoff function involved in the construction of the Selberg sieve; the optimal cutoff turns out to be given in terms of a certain Bessel function, and improves the relationship {k_0 \sim \varpi^{-2}} to {k_0 \sim \varpi^{-3/2}}; see this post for details. Also, a modification of the truncated Selberg sieve due to Pintz allows one to take {\delta} to be extremely small with almost no impact on {k_0} (but at a certain technical cost, namely that the property of being smooth has to be weakened to another property which we have called “dense divisibility”, replacing the conjecture {MPZ[\varpi,\delta]} by a variant that we have called {MPZ'[\varpi,\delta]}); see this post for details. Finally, by optimising the remaining steps in Zhang’s argument (see below), we have managed to increase {\varpi} significantly, with the currently largest value of {\varpi} that we can use being close to {1/148} (and provisionally to {1/108}). The combination of these improvements has allowed us to take {k_0} as low as {1,466} (and provisionally to {902}).

At this stage we believe our arguments converting {MPZ[\varpi,\delta]} (or variants thereof) to {DHL[k_0,2]} are about as efficient as can be hoped for given the methods we have, and that the most significant further improvements will be coming from raising {\varpi} beyond the currently available range.

— 3. The Heath-Brown identity and combinatorics —

Now we discuss how to establish estimates such as (2) in the case of smooth moduli, and in the case of congruences obeying a Chinese remainder theorem. Prior to Zhang’s paper, there were a number of results of this type established in a series of papers by (various combinations of) Bombieri, Fouvry, Friedlander, and Iwaniec, as well as some followup work by later authors, although the prior results were not directly applicable for the current problem (either because the function {\Lambda} was replaced by a simpler arithmetic function, or the absolute values in (2) were replaced instead with a “well-factorable” weight, or the congruence classes {a_q\ (q)} involved were more specialised, e.g. of the form {a\ (q)} where {a} was bounded and independent of {q}). However, Zhang was able to adapt and improve the analysis in the previous literature in a number of ways, to the point where he could in fact obtain an estimate {MPZ[\varpi,\delta]} of the required strength.

We won’t describe the entire proof here, but give some of the important ideas in these arguments. The first observation is that one can decompose the von Mangoldt function {\Lambda} as a combination of simpler functions, such as the {k^{th}} order divisor functions

\displaystyle  \tau_k(n) := \sum_{n_1,\ldots,n_k \geq 1: n_1 \ldots n_k = n} 1,

the modified {k^{th}} order divisor functions

\displaystyle  \tau'_k(n) := \sum_{n_1,\ldots,n_k > 1: n_1 \ldots n_k = n} 1,

or more generally certain combinations

\displaystyle  \psi_1 \ast \ldots \ast \psi_k

of “nice” arithmetic functions {\psi_1,\ldots,\psi_k} (we will be vague about what “nice” means here), where {f \ast g(n) := \sum_{d|n} f(d) g(\frac{n}{d})} is the usual Dirichlet convolution. The point is that functions with this convolution structure are easier to sum and estimate. For instance, the basic estimate

\displaystyle  \sum_{n \leq x} \Lambda(n) = x + o(x)

is of course the prime number theorem, which is already non-trivial, and improving the error term {o(x)} here to {O(x^{1-\epsilon})} for some fixed {\epsilon>0} is a major open problem (and would be a huge breakthrough towards the Riemann hypothesis). In contrast, the divisor function

\displaystyle  \tau(n) = \tau_2(n) = \sum_{n_1,n_2 \geq 1: n_1 n_2 = n} 1

is much easier to sum, and one can easily obtain the estimate

\displaystyle  \sum_{n \leq x} \tau(n) = x \log x + (2\gamma-1) x + O( x^{1/2} )

by the elementary Dirichlet hyperbola method, which ultimately relies on the Dirichlet convolution structure {\tau = 1 \ast 1} of {\tau}. Similarly for the higher order divisor functions {\tau_k} and their variants {\tau'_k}, although the error terms become harder to estimate well when {k} increases.
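To illustrate how the convolution structure {\tau = 1 \ast 1} is exploited, here is a small Python check of the hyperbola method, using the elementary identity {\sum_{n \leq x} \tau(n) = 2\sum_{d \leq \sqrt{x}} \lfloor x/d \rfloor - \lfloor \sqrt{x}\rfloor^2} (which comes from counting lattice points {(d,m)} with {dm \leq x} in two ways); the code is only a sanity check and plays no role in the actual arguments.

```python
import math

def divisor_sum_direct(x):
    """sum_{n <= x} tau(n), computed by counting the multiples of each d (O(x) steps)."""
    return sum(x // d for d in range(1, x + 1))

def divisor_sum_hyperbola(x):
    """Dirichlet hyperbola method:
    sum_{n <= x} tau(n) = 2 * sum_{d <= sqrt(x)} floor(x/d) - floor(sqrt(x))**2,
    using only O(sqrt(x)) steps."""
    s = math.isqrt(x)
    return 2 * sum(x // d for d in range(1, s + 1)) - s * s

x = 10 ** 6
gamma = 0.5772156649015329            # Euler-Mascheroni constant
S = divisor_sum_hyperbola(x)
print(S == divisor_sum_direct(x))                      # the two methods agree
print(S - (x * math.log(x) + (2 * gamma - 1) * x))     # error term, of size O(sqrt(x))
```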

Why would we expect {\Lambda} to be expressible in terms of functions such as {\tau_k} or {\tau'_k}? To illustrate this sort of decomposition, let us make the major cheat that all numbers {n} are square-free and have at most (say) four prime factors. (This is clearly quite a cheat; but note that a positive fraction of integers are square-free, and if we restrict attention to large prime factors, say factors larger than {n^{1/5}}, then we do indeed have at most four such prime factors, so the cheat actually is not so terrible in some ways.) Then the quantity {\Lambda(n)} is equal to {\log n} when {n} has one prime factor and zero when {n} has two, three, or four prime factors. Meanwhile, {\tau'_2(n)} is equal to {0, 2, 6, 14} when {n} has one, two, three, or four prime factors respectively. Similarly {\tau'_3(n)} is equal to {0, 0, 6, 36} and {\tau'_4(n)} is equal to {0,0,0,24}. Thus we see in all cases that

\displaystyle  \Lambda(n) = \log(n) ( 1 - \frac{1}{2} \tau'_2(n) + \frac{1}{3} \tau'_3(n) - \frac{1}{4} \tau'_4(n) ).

Note that while a little computation is required to verify this identity, it is immediate from linear algebra that some identity of this type must exist, at least for square-free numbers with a bounded number of factors.

More generally, we have Linnik’s identity

\displaystyle  \Lambda(n) = \log n \sum_{k=1}^\infty \frac{(-1)^{k-1}}{k} \tau'_k(n), \ \ \ \ \ (4)

which is valid for all natural numbers {n}; this identity can be proven by starting with the formal identity

\displaystyle  \log \zeta(s) = \sum_{k=1}^\infty \frac{(-1)^{k-1}}{k} (\zeta(s) - 1)^k

for the Riemann zeta function {\zeta(s) := \sum_{n=1}^\infty \frac{1}{n^s}}, differentiating this identity, then comparing Dirichlet series coefficients. (It is also a fun exercise to find a purely combinatorial proof of this identity.)
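For the sceptical reader, here is a short Python verification of Linnik's identity (4) for small {n}; note that the sum in (4) is actually finite for each {n}, since {\tau'_k(n)} vanishes once {k} exceeds the number of prime factors of {n} (counted with multiplicity).

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def tau_prime(k, n):
    """tau'_k(n): the number of ways to write n = n_1 * ... * n_k with every n_i > 1."""
    if k == 0:
        return 1 if n == 1 else 0
    return sum(tau_prime(k - 1, n // d) for d in range(2, n + 1) if n % d == 0)

def von_mangoldt(n):
    """Lambda(n) = log p if n = p^j is a prime power, and 0 otherwise."""
    for p in range(2, n + 1):
        if n % p == 0:                 # p is the smallest prime factor of n
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return 0.0

def linnik_rhs(n):
    """log n * sum_{k >= 1} (-1)^(k-1)/k * tau'_k(n); the sum is finite since
    tau'_k(n) = 0 once k exceeds the number of prime factors of n."""
    total, k = 0.0, 1
    while 2 ** k <= n:
        total += (-1) ** (k - 1) / k * tau_prime(k, n)
        k += 1
    return math.log(n) * total

for n in range(2, 200):
    assert abs(von_mangoldt(n) - linnik_rhs(n)) < 1e-9
print("Linnik's identity (4) verified for 2 <= n < 200")
```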

Morally speaking, Linnik’s identity reduces the task of proving (2) (or variants thereof) to the task of proving

\displaystyle  \sum_{q \leq x^{1/2+2\varpi}} | \sum_{n = a_q\ (q): n \in [x,2x]} \tau'_k(n) - \frac{1}{\phi(q)} \sum_{n \in [x,2x]} \tau'_k(n) | \ll x \log^{-A} x \ \ \ \ \ (5)

for {k=1,2,3,\ldots}. However, this implication is not quite rigorous, because the {k} parameter on the right-hand side is unbounded. There is however a clever truncation of Linnik’s identity due to Heath-Brown, which only has finitely many terms, each one of which resembles a divisor function {\tau_k}, albeit one which is averaged by an additional Dirichlet convolution. The precise identity is a little complicated (although not difficult to prove); see this post for details. Using this identity, we can indeed reduce (2) to slightly more complicated versions of the identities (5). By decomposing the versions of (5) obtained in this fashion into dyadic pieces (actually, one uses a decomposition that is slightly finer than the dyadic decomposition, but never mind this technical detail), it turns out that one can organise the resulting estimates to be proven into three classes, which Zhang calls “Type I estimates”, “Type II estimates”, and “Type III estimates”. The precise formulation of such estimates depends on the two parameters {\varpi,\delta} alluded to earlier, as well as an additional parameter {1/10 \leq \sigma \leq 1/2} demarcating the boundary between the Type I and Type III estimates. (The lower bound {\sigma \geq 1/10} is needed to effect the combinatorial partition into Type I, Type II, Type III sums; it would be nice to remove this bound, but this seems to necessitate the introduction of further types of estimates (“Type IV” and “Type V”) which we have so far been unable to deal with effectively.) A precise definition of these estimates may be found at this page, but let us for now give an oversimplified caricature of what these estimates look like. A model case of a Type I estimate takes the form

\displaystyle  \sum_{q \leq x^{1/2+2\varpi}} | \sum_{n = a_q\ (q)} \alpha \ast \beta(n) - \frac{1}{\phi(q)} \sum_n \alpha\ast \beta(n) | \ll x \log^{-A} x \ \ \ \ \ (6)

where {\alpha,\beta: {\bf N} \rightarrow {\bf C}} are two sequences localised to scales {\{ m: m \sim M\}} and {\{ n: n \sim N\}} respectively, where one should think of {M} and {N} as being like {x^{1/2+\sigma}} and {x^{1/2-\sigma}} respectively; the functions {\alpha,\beta} also obey certain bounds similar to those enjoyed by the divisor functions {\tau_k} (e.g. their average value should be {O(\log^{O(1)} x)}) which we do not give in detail here. The modulus {q} should be understood to be squarefree and smooth as in the statement of {MPZ[\varpi,\delta]} (or if one is trying to prove {MPZ'[\varpi,\delta]} instead, “smooth” needs to be replaced by “densely divisible”). The Type II sums are similar, except that one should now think of {M} and {N} as both being like {x^{1/2}}. A model case of a Type III estimate takes the form

\displaystyle  \sum_{q \leq x^{\alpha}} | \sum_{n = a_q\ (q): n \in [x,2x]} \tau_3(n) - \frac{1}{\phi(q)} \sum_{n \in [x,2x]} \tau_3(n) | \ll x \log^{-A} x \ \ \ \ \ (7)

where {\alpha = \frac{1/2 + 2 \varpi}{\frac{3}{2}(\frac{1}{2}-\sigma)}}, and again {q} should be understood to be square-free and smooth. (This is an oversimplification; the true Type III estimate involves a smoothly truncated, averaged version of {\tau_3}, and it is to compensate for this oversimplification that the upper bound for {q} has changed.) The Type I estimates are harder to prove when {\sigma} gets larger, while the Type III estimates become easier. It is thus necessary to work out what estimates one can prove on both the Type I and Type III sides before one can optimise in {\sigma}. The Type II estimates are less important (and in practice, in all the regimes in which we can prove Type I and Type III estimates so far, we have also been able to prove Type II estimates), but still need to be dealt with, of course. Thus far we have not been able to improve upon this portion of Zhang’s arguments much; we rely on the same Heath-Brown identity and combinatorial decomposition that Zhang does, as they seem to be more or less optimally efficient for decompositions of this type, at least for the application at hand. The task of proving {MPZ[\varpi,\delta]} (or {MPZ'[\varpi,\delta]}) and hence {DHL[k_0,2]} and {B[H]} for various choices of parameters {\varpi,\delta,k_0,H} is thus reduced to that of verifying Type I, Type II, and Type III estimates.

— 4. Type I estimates —

Now we give a very informal sketch of Zhang’s argument for establishing Type I estimates (details may be found at these two posts). It is based on previous arguments of Bombieri, Friedlander, and Iwaniec, relying on various analytic manipulations such as the use of the Cauchy-Schwarz inequality (and in particular on the dispersion method of Linnik that is based on this inequality) and Fourier identities (such as the Pólya-Vinogradov method of completion of sums), in order to eliminate the poorly understood sequences {\alpha,\beta}, until one is only left with the task of estimating a certain incomplete Kloosterman-type sum, to which one applies standard tools from analytic number theory (such as the Weil bound for completed Kloosterman sums). We have continued to follow the same general strategy, but have found some improvements based on better bounds for incomplete Kloosterman sums.

We now give a little more detail. We will ignore the role of the technical parameter {\delta}, and assume simply that all moduli are as smooth as needed for the argument. We change notation a little bit, replacing {n} and {q} by {\tilde n} and {\tilde q}, and consider a dyadic component of (6), namely

\displaystyle  \sum_{\tilde q \sim x^{1/2+2\varpi}} | \sum_{\tilde n = a_{\tilde q}\ (\tilde q)} \alpha \ast \beta(\tilde n) - \frac{1}{\phi(\tilde q)} \sum_{\tilde n} \alpha\ast \beta(\tilde n) | \ll x \log^{-A} x \ \ \ \ \ (8)

One now takes advantage of the square-free smooth nature of {\tilde q} to factor {\tilde q = qr}, where {q,r} are coprime, and {r} is a little bit smaller than {N = x^{1/2-\sigma}}, so that {q} is a little bit larger than {x^{1/2+2\varpi}/N = x^{\sigma+2\varpi}}; the reason for this choice of factorisation is that it will turn out that we want {r} to be as large as possible while still staying a bit less than {N}. There is also a technical step in which we shift all the tiny prime factors of {qr} to {r} and not to {q} (where “tiny” means something like “less than {\exp(\log^{1/3} x)}“); this is needed to ensure that most pairs of {q}‘s that we encounter become coprime at a crucial juncture later in the argument. See this post for more details of this step. The congruence {\tilde n = a_{\tilde q}\ (\tilde q)} similarly factors into {\tilde n = a_q\ (q)} and {\tilde n = a_r\ (r)} by (3), so we now have to prove something like

\displaystyle  \sum_{r \sim R} \sum_{q \sim Q: (q,r)=1} | \sum_{\tilde n = a_{q}\ (q); \tilde n = a_r\ (r); \tilde n \in [x,2x]} \alpha \ast \beta(\tilde n)

\displaystyle  - \frac{1}{\phi(q) \phi(r)} \sum_{\tilde n \in [x,2x]} \alpha\ast \beta(\tilde n) | \ll x \log^{-A} x

where {R = x^{1/2-\sigma-\epsilon}} for some small {\epsilon>0} and {Q = x^{\sigma+2\varpi+\epsilon}}.

Roughly speaking, this estimate is asserting that we can estimate a quantity such as

\displaystyle  \sum_{r \sim R} \sum_{q \sim Q: (q,r)=1} c_{q,r} \sum_{\tilde n = a_{q}\ (q); \tilde n = a_r\ (r); \tilde n \in [x,2x]} \alpha \ast \beta(\tilde n)

to accuracy {O( x \log^{-A} x )} whenever {c_{q,r}} are arbitrary bounded coefficients (where we will be a bit vague on what it means to estimate something to a given accuracy). Splitting {\alpha \ast \beta(\tilde n)} as {\sum_{m,n: mn = \tilde n} \alpha(m) \beta(n)} and ignoring the constraint {\tilde n \in [x,2x]}, we may rearrange this expression as

\displaystyle  \sum_m \alpha(m) \sum_{r \sim R} \sum_{q \sim Q: (q,r)=1} c_{q,r} \sum_{n: mn = a_{q}\ (q); mn = a_r\ (r)} \beta(n).

We are unable to take much advantage of the {r} averaging, and will thus seek to estimate

\displaystyle  \sum_m \alpha(m) \sum_{q \sim Q: (q,r)=1} c_{q,r} \sum_{n: mn = a_{q}\ (q); mn = a_r\ (r)} \beta(n)

to accuracy {O( x R^{-1} \log^{-A} x)} for a fixed {r \sim R}.

Now we start eliminating some of the poorly controlled expressions {\alpha, c_{q,r}, \beta}. Applying a Cauchy-Schwarz, it (morally) suffices to estimate

\displaystyle  \sum_{m \sim M} | \sum_{q \sim Q: (q,r)=1} c_{q,r} \sum_{n: mn = a_{q}\ (q); mn = a_r\ (r)} \beta(n)|^2

to accuracy {O( x^2 R^{-2} M^{-1} \log^{-A} x )}, i.e. a little bit better than {O( x^{1/2 + \sigma} )}. Note that because {r} is a little bit less than {N}, we have a non-trivial number of {n \sim N} obeying the congruence condition {mn = a_r\ (r)} for any given {m}. This prevents the summation inside the absolute values from being so trivial that Cauchy-Schwarz could not gain anything.
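(To spell out the Cauchy-Schwarz step schematically: writing {X_m} for the inner sum over {q} and {n}, one has

\displaystyle  |\sum_{m \sim M} \alpha(m) X_m| \leq (\sum_{m \sim M} |\alpha(m)|^2)^{1/2} (\sum_{m \sim M} |X_m|^2)^{1/2},

and since {\alpha} obeys divisor-type bounds one has {\sum_{m \sim M} |\alpha(m)|^2 \ll M \log^{O(1)} x}, so it suffices (after adjusting {A} slightly) to bound the displayed sum of squares {\sum_{m \sim M} |X_m|^2} to the stated accuracy.)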

The above expression can be rearranged as

\displaystyle  \sum_{q_1,q_2 \sim Q: (q_1q_2,r)=1} c_{q_1,r} \overline{c_{q_2,r}} \sum_{n,n'} \beta(n) \overline{\beta(n')}

\displaystyle  \sum_{m \sim M: mn=a_{q_1}\ (q_1); mn' = a_{q_2}\ (q_2); mn=mn' = a_r\ (r)} 1.

The constraint {mn=mn' = a_r\ (r)} forces {n=n'\ (r)}. Since {r} is so close to {N}, this basically forces {n'} to equal {n} (or to equal a shift {n+kr} of {n} for some small {k}). To simplify the exposition we will just look at the diagonal case {n=n'} (the off-diagonal terms are treated similarly). Our task is now to control

\displaystyle  \sum_{q_1,q_2 \sim Q: (q_1q_2,r)=1} c_{q_1,r} \overline{c_{q_2,r}} \sum_{n} |\beta(n)|^2 \sum_{m \sim M: mn = a_{q_1}\ (q_1); mn = a_{q_2}\ (q_2); mn = a_r\ (r)} 1

with accuracy better than {O( x^{1/2+\sigma})}. Using averaging notation, this is basically the same as controlling

\displaystyle  {\bf E}_{q_1,q_2 \sim Q: (q_1q_2,r)=1} c_{q_1,r} \overline{c_{q_2,r}} {\bf E}_{n \sim N} |\beta(n)|^2

\displaystyle  \sum_{m \sim M: mn = a_{q_1}\ (q_1); mn = a_{q_2}\ (q_2); mn = a_r\ (r)} 1

with accuracy better than {O( x^{-4\varpi})}.

Recall that we had the foresight to remove all the tiny primes from {q_1,q_2}. This has the effect of making {q_1,q_2} typically coprime. We will discard the non-coprime contributions (it turns out that they can be controlled with relatively easy estimates) and insert the condition {(q_1,q_2)=1} in the above summation. We can also make {n} coprime to {q_1q_2r} since the {m} summation vanishes otherwise.

The {m} summation is over an interval of length {M = x^{1/2+\sigma}} over a single congruence class {a_{q_1 q_2 r}/n\ (q_1 q_2 r)} of modulus {q_1 q_2 r \sim x^{1/2+\sigma+4\varpi}}. This sum can then be inverted via the Fourier transform to a dual sum over a range {H \sim x^{4\varpi}}; morally speaking we may replace

\displaystyle  \sum_{m \sim M: m = a_{q_1 q_2 r} / n\ (q_1 q_2 r)} 1

with something like

\displaystyle  {\bf E}_{h = O(H)} e_{q_1 q_2 r}( \frac{a_{q_1 q_2 r}h}{n} ) \ \ \ \ \ (9)

where {e_q(n) := e^{2\pi in/q}} for {n \in {\bf Z}/q{\bf Z}}, and the reciprocal {\frac{1}{n}} here is taken inside the ring {{\bf Z}/q_1q_2 r{\bf Z}}. The zero frequency {h=0} can be controlled directly; we ignore it and work with non-zero {h}. We are now trying to control

\displaystyle  {\bf E}_{n \sim N; h = O(H); q_1,q_2 \sim Q: (q_1 q_2,r) = (q_1,q_2)=1} |\beta(n)|^2 c_{q_1,r} \overline{c_{q_2,r}} e_{q_1 q_2 r}( \frac{a_{q_1 q_2 r}h}{n} ) \ \ \ \ \ (10)

to accuracy better than {O( x^{-4\varpi})}.

We can eliminate the {\beta} next by performing a Cauchy-Schwarz in {n}, but it turns out to be a bit more efficient to Cauchy-Schwarz in {n} and {q_1}, not just in {n}. This places us in the position of having to control

\displaystyle  {\bf E}_{n \sim N; h,h' = O(H); q_1,q_2,q'_2 \sim Q} \overline{c_{q_2,r}} c_{q'_2,r} e_{q_1 q_2 r}( \frac{a_{q_1 q_2 r}h}{n} ) e_{q_1 q'_2 r}( - \frac{a_{q_1 q'_2 r}h'}{n} )

to accuracy better than {O( x^{-8\varpi} )}, where we have suppressed the various coprimality conditions on the {q_1,q_2,q'_2} for brevity. So it suffices to get bounds on the exponential sum

\displaystyle  {\bf E}_{n \sim N} e_{q_1 q_2 r}( \frac{a_{q_1 q_2 r}h}{n} ) e_{q_1 q'_2 r}( - \frac{a_{q_1 q'_2 r}h'}{n} )

which beat the trivial bound of {O(1)} by about {x^{8\varpi}} on average, for “typical” values of {h,h',q_1,q_2,q'_2}. This is basically an incomplete Kloosterman sum of the shape

\displaystyle  \frac{1}{N} \sum_{n \sim N} e_q( \frac{c}{n} ) \ \ \ \ \ (11)

for some modulus {q} of typical size {q \sim q_1 q_2 q'_2 r \sim x^{1/2+2\sigma+6\varpi}} and a coefficient {c} which turns out to typically be more or less coprime to {q} except in a certain diagonal case {hq'_2 = h'q_2}, but it turns out in the Type I case that this diagonal contribution is manageable. A standard “Pólya-Vinogradov” type bound for such sums, ultimately based on Weil’s famous bound

\displaystyle  |\sum_{n \in {\bf Z}/p{\bf Z}: n \neq 0} e_p( \frac{a}{n} + bn)| \leq 2\sqrt{p}

for completed Kloosterman sums, allows one to bound (11) by {O( \frac{1}{N} \sqrt{q} )} (ignoring some logarithmic type errors and lower order terms). This gives us an acceptable estimate if

\displaystyle  \frac{1}{ x^{1/2-\sigma} } \sqrt{ x^{1/2+2\sigma+6\varpi} } \ll x^{-8\varpi}

which may be rearranged as

\displaystyle  8 \sigma + 44 \varpi < 1.

(Recall that we are treating {\delta} as negligible, otherwise we have to insert some factors of {\delta} in this constraint as well.) Crucially, this constraint allows {\sigma} to exceed {1/10}, and thus gives a good range of Type I estimates if {\varpi} is small enough. Further improvements have been obtained by using more refined estimates on incomplete Kloosterman sums (first discussed here); to date we have managed to (provisionally) replace the above constraint by

\displaystyle  5 \sigma + 54 \varpi < 1

which turns out to give superior numerical results. This reveals a tradeoff, in that more advanced exponential sum estimates can improve {\sigma} at the expense of {\varpi}; we have not yet located what the optimal balance for this tradeoff is.
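As a sanity check on the Weil bound quoted above, one can verify it numerically by brute force for a small prime modulus (this is purely an illustration for one modulus, not of course a proof, and the helper names are ours):

```python
import cmath
import math

def kloosterman(a, b, p):
    """Completed Kloosterman sum sum_{n=1}^{p-1} e_p(a/n + b*n) for prime p,
    with 1/n the inverse of n mod p (pow(n, -1, p) needs Python 3.8+)."""
    total = 0j
    for n in range(1, p):
        nbar = pow(n, -1, p)
        total += cmath.exp(2j * math.pi * ((a * nbar + b * n) % p) / p)
    return total

p = 61
worst = max(abs(kloosterman(a, b, p)) for a in range(1, p) for b in range(1, p))
print(round(worst, 3), "<=", round(2 * math.sqrt(p), 3))   # Weil: |S(a,b;p)| <= 2*sqrt(p)
```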

— 5. Type II estimates —

The Type II estimates are treated quite similarly to the Type I estimates and will only be discussed briefly here. If one repeats the Type I analysis verbatim, most of the same arguments go through (in fact they become better because the {\sigma} terms are not present); however it turns out that the diagonal contribution {hq'_2=h'q_2} that was manageable in the Type I setting is no longer manageable in the Type II setting. To get around this one has to reduce the influence of the diagonal by taking a narrower Cauchy-Schwarz when eliminating {\beta} from (10), namely one only performs a Cauchy-Schwarz in {n} rather than in {n,q_1}. The price one pays for this is that the off-diagonal terms become slightly worse (conceding a few more powers of {x^\varpi}); the best estimate we currently have here (again ignoring {\delta} issues) is

\displaystyle  68 \varpi < 1.

— 6. Type III estimates —

The Type III estimates (7) were originally treated by Zhang using the method of Weyl differencing. More recently, a simpler method which gives better numerology was introduced by Fouvry, Kowalski, Michel, and Nelson, and may be sketched as follows. Our goal here is to control something like

\displaystyle  \sum_{q \sim Q} c_q \sum_{n = a_q\ (q): n \in [x,2x]} \tau_3(n)

to accuracy better than {O( x )}, where {c_q} are bounded coefficients, and {Q} is a quantity larger than {x^{1/2}} (which we would like to take to be as large as possible). Expanding out the {\tau_3} function and using dyadic decomposition, this basically amounts to controlling

\displaystyle  \sum_{q \sim Q} c_q \sum_{n_1 \sim N_1; n_2 \sim N_2; n_3 \sim N_3} 1_{n_1 n_2 n_3 = a_q\ (q)}

to accuracy {O( x \log^{-A} x)} for some {N_1,N_2,N_3} that multiply to {x} (e.g. one can take {N_1=N_2=N_3=x^{1/3}} for sake of discussion, although the precise choice of {N_1,N_2,N_3} turns out to not be so important). Applying Fourier decomposition as in (9), one can write this expression roughly speaking as

\displaystyle  \sum_{q \sim Q} c_q {\bf E}_{h_1 = O(H_1), h_2 = O(H_2), h_3 = O(H_3)} \sum_{x,y,z \in {\bf Z}/q{\bf Z}: xyz=a_q} e_q( h_1 x + h_2 y + h_3 z )

where {H_1,H_2,H_3} are the dual scales {H_i := Q / N_i}. It turns out that the contributions when {h_1,h_2,h_3} vanish, or more generally share a common factor with {q}, can be dealt with, so we ignore them and assume that {h_1,h_2,h_3} are coprime with {q}. In this case, we can write the inner sum after a change of variables as

\displaystyle  \sum_{x,y,z \in {\bf Z}/q{\bf Z}: xyz=a_q} e_q( h_1 x + h_2 y + h_3 z ) = q K_3( a_q h_1 h_2 h_3; q )

where {K_3} is the hyper-Kloosterman sum

\displaystyle  K_3(a;q) := \frac{1}{q} \sum_{x,y,z \in {\bf Z}/q{\bf Z}: xyz = a} e_q(x+y+z). \ \ \ \ \ (12)

The normalisation {\frac{1}{q}} in (12) reflects the expected square root cancellation in this sum, which is a sum of about {O(q^2)} phases that one expects to behave more or less independently of each other. Indeed, using the deep results of Deligne on the Weil conjectures, one can show that

\displaystyle  |K_3(a;q)| \ll 1 \ \ \ \ \ (13)

for {a} coprime to {q} (ignoring some log factors).
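Again purely as a numerical illustration of (13), here is a brute-force Python computation of {K_3(a;p)} for a small prime {p}, checking the Deligne bound {|K_3(a;p)| \leq 3}; the code is ours, is only feasible for small {p}, and plays no role in the actual arguments.

```python
import cmath
import math

def K3(a, p):
    """Normalised hyper-Kloosterman sum (12) for a prime modulus p:
    K_3(a;p) = (1/p) * sum over x,y,z in Z/pZ with xyz = a of e_p(x+y+z)."""
    inv = [0] + [pow(t, -1, p) for t in range(1, p)]   # inverse table mod p
    total = 0j
    for x in range(1, p):
        for y in range(1, p):
            z = (a * inv[(x * y) % p]) % p             # z is forced by xyz = a
            total += cmath.exp(2j * math.pi * (x + y + z) / p)
    return total / p

p = 61
worst = max(abs(K3(a, p)) for a in range(1, p))
print(round(worst, 3), "<= 3")       # Deligne: |K_3(a;p)| <= 3 for (a,p) = 1
```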

Replacing {q} by {Q} (which can be done by adjusting the {c_q} a little bit), our task is now to control

\displaystyle  \sum_{q \sim Q} c_q {\bf E}_{h_1 = O(H_1), h_2 = O(H_2), h_3 = O(H_3)} K_3(a_q h_1 h_2 h_3; q)

to accuracy better than {O( x / Q )}. If we apply Deligne’s bound (13) right away, we get a bound of {O(Q)}, which only works for {Q < x^{1/2}}, which recovers the Bombieri-Vinogradov theorem for {\tau_3} but no better. To improve upon this we perform a Cauchy-Schwarz. We first perform the useful substitution {h=h_1h_2h_3}, which rewrites the above sum as something like

\displaystyle  \sum_{q \sim Q} c_q {\bf E}_{h = O(H_1 H_2 H_3)} \tilde \tau_3(h) K_3(a_q h; q)

where {\tilde \tau_3} is a truncated version of {\tau_3} (counting factorisations {h=h_1h_2h_3} with {h_1 = O(H_1), h_2 = O(H_2), h_3=O(H_3)}). Note that {H := H_1 H_2 H_3} is given by {H = Q^3 / N_1 N_2 N_3 \sim Q^3 / x}, so that {x/Q \sim Q^2 / H}. We interchange the sum, so we are now asking to bound

\displaystyle  {\bf E}_{h = O(H)} \tilde \tau_3(h) {\bf E}_{q \sim Q} c_q K_3(a_q h; q)

to accuracy better than {Q/H}.

At this point it turns out to be optimal to use the smoothness of the modulus {q} to split {q=rs} where {r,s \sim Q^{1/2}}, so we wish to control something like

\displaystyle  {\bf E}_{h = O(H); r \sim Q^{1/2}} \tilde \tau_3(h) {\bf E}_{s \sim Q^{1/2}} c_{rs} K_3(a_{rs} h; rs).

to accuracy better than {Q/H}. If we Cauchy-Schwarz in {h} and {r}, we end up having to bound

\displaystyle  {\bf E}_{h = O(H); r \sim Q^{1/2}} {\bf E}_{s,s' \sim Q^{1/2}} c_{rs} c_{rs'} K_3(a_{rs} h; rs) \overline{K_3(a_{rs'} h; rs')}

to accuracy better than {Q^2/H^2}. The diagonal contribution {s=s'} is {O( Q^{-1/2} )} thanks to (13). Now we turn to the opposite case when {s,s'} are coprime. The goal is then to establish the estimate

\displaystyle  | {\bf E}_{h= O(H)} K_3(a_{rs} h; rs) \overline{K_3(a_{rs'} h; rs')} | \ll Q^2 / H^2

for an incomplete Kloosterman correlation with net modulus {rss'}. It turns out that there is a Pólya-Vinogradov type bound of {O( (rss')^{1/2} / H ) \sim Q^{3/4} / H} for this sum (though this requires a non-trivial amount of Deligne’s Weil conjecture machinery to actually prove). So we are done if

\displaystyle  Q^{-1/2}, Q^{3/4}/H \ll Q^2 / H^2

which simplifies to {H \ll Q^{5/4}}, or equivalently {Q \ll x^{4/7}}. So this argument allows one to obtain (7) for {\alpha} up to {4/7}, which corresponds to the constraint

\displaystyle  12 \sigma > 1 + 28 \varpi

which, happily, overlaps enough with the Type I bounds and the combinatorial constraint {\sigma > 1/10} to give {MPZ[\varpi,\delta]} (or {MPZ'[\varpi,\delta]}) for relatively good values of {\varpi} (and {\delta}, which we have ignored in this discussion). We have managed to (provisionally) improve upon this constraint to

\displaystyle  18 \sigma > 1 + 56 \varpi

by exploiting an additional averaging that is present in the true Type III estimate (and which was suppressed in (7)). Combining this with the best Type I estimate we currently have (and with the combinatorial constraint {\sigma > 1/10}) gives us the best value of {\varpi} currently available, namely {1/108} (note that this is well below the Type II constraint, which may thus be ignored). Actually, the Type III estimates are now so strong that they are no longer critical for further improvements in the numerology; it is either the Type I estimate or the combinatorial constraint {\sigma>1/10} that needs improving instead. (Alternatively, if one is willing to use a worse value of {\varpi}, namely {\varpi < 1/324}, one can use the Type I estimates to raise {\sigma} up to {1/6}, which closes off the Type III case completely and allows for a slightly more elementary proof of Zhang’s theorem in that the full strength of Deligne’s proof of the Weil conjectures is no longer needed (although Weil’s classical bound on Kloosterman sums is still required).)