Suppose one is given a {k_0}-tuple {{\mathcal H} = (h_1,\ldots,h_{k_0})} of {k_0} distinct integers for some {k_0 \geq 1}, arranged in increasing order. When is it possible to find infinitely many translates {n + {\mathcal H} =(n+h_1,\ldots,n+h_{k_0})} of {{\mathcal H}} which consist entirely of primes? The case {k_0=1} is just Euclid’s theorem on the infinitude of primes, but the case {k_0=2} is already open in general, with the {{\mathcal H} = (0,2)} case being the notorious twin prime conjecture.

On the other hand, there are some tuples {{\mathcal H}} for which one can easily answer the above question in the negative. For instance, the only translate of {(0,1)} that consists entirely of primes is {(2,3)}, basically because each translate of {(0,1)} must contain an even number, and the only even prime is {2}. More generally, if there is a prime {p} such that {{\mathcal H}} meets each of the {p} residue classes {0 \hbox{ mod } p, 1 \hbox{ mod } p, \ldots, p-1 \hbox{ mod } p}, then every translate of {{\mathcal H}} contains at least one multiple of {p}; since {p} is the only multiple of {p} that is prime, this shows that there are only finitely many translates of {{\mathcal H}} that consist entirely of primes.

To avoid this obstruction, let us call a {k_0}-tuple {{\mathcal H}} admissible if it avoids at least one residue class {\hbox{ mod } p} for each prime {p}. It is easy to check for admissibility in practice, since a {k_0}-tuple is automatically admissible at every prime {p} larger than {k_0} (as {k_0} elements cannot cover all {p} residue classes), so one only needs to check a finite number of primes in order to decide on the admissibility of a given tuple. For instance, {(0,2)} or {(0,2,6)} are admissible, but {(0,2,4)} is not (because it covers all the residue classes modulo {3}). We then have the famous Hardy-Littlewood prime tuples conjecture:

Conjecture 1 (Prime tuples conjecture, qualitative form) If {{\mathcal H}} is an admissible {k_0}-tuple, then there exist infinitely many translates of {{\mathcal H}} that consist entirely of primes.

This conjecture is extremely difficult (containing the twin prime conjecture, for instance, as a special case), and in fact there is no explicitly known example of an admissible {k_0}-tuple with {k_0 \geq 2} for which we can verify this conjecture (although, thanks to the recent work of Zhang, we know that {(0,d)} satisfies the conclusion of the prime tuples conjecture for some {0 < d < 70,000,000}, even if we can’t yet say what the precise value of {d} is).
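Returning for a moment to admissibility: the finite check described before Conjecture 1 (one only needs to examine primes {p \leq k_0}) is easy to carry out by machine. Here is a small sketch in Python; the function names are mine, chosen for illustration.

```python
def primes_up_to(n):
    """List all primes <= n by the sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(range(p * p, n + 1, p))
    return [p for p, flag in enumerate(sieve) if flag]

def is_admissible(H):
    """H avoids some residue class mod p for every prime p iff it does so
    for every prime p <= len(H); for p > len(H), the k_0 elements of H
    cannot occupy all p residue classes."""
    for p in primes_up_to(len(H)):
        if len({h % p for h in H}) == p:  # H covers every class mod p
            return False
    return True

print(is_admissible((0, 2)), is_admissible((0, 2, 6)), is_admissible((0, 2, 4)))
# prints: True True False
```

This confirms the examples given above: {(0,2)} and {(0,2,6)} pass, while {(0,2,4)} fails at the prime {3}.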

Actually, Hardy and Littlewood conjectured a more precise version of Conjecture 1. Given an admissible {k_0}-tuple {{\mathcal H} = (h_1,\ldots,h_{k_0})}, for each prime {p} let {\nu_p = \nu_p({\mathcal H}) := |{\mathcal H} \hbox{ mod } p|} denote the number of residue classes modulo {p} that {{\mathcal H}} meets; thus we have {1 \leq \nu_p \leq p-1} for all {p} by admissibility, and also {\nu_p = k_0} for all {p>h_{k_0}-h_1}. We then define the singular series {{\mathfrak G} = {\mathfrak G}({\mathcal H})} associated to {{\mathcal H}} by the formula

\displaystyle {\mathfrak G} := \prod_{p \in {\mathcal P}} \frac{1-\frac{\nu_p}{p}}{(1-\frac{1}{p})^{k_0}}

where {{\mathcal P} = \{2,3,5,\ldots\}} is the set of primes; by the previous discussion we see that the infinite product in {{\mathfrak G}} converges to a finite non-zero number, since the factor at {p} is {1 + O(1/p^2)} once {p > h_{k_0}-h_1}.

We will also need some asymptotic notation (in the spirit of “cheap nonstandard analysis”). We will need a parameter {x} that one should think of as going to infinity. Some mathematical objects (such as {{\mathcal H}} and {k_0}) will be independent of {x} and referred to as fixed; but unless otherwise specified we allow all mathematical objects under consideration to depend on {x}. If {X} and {Y} are two such quantities, we say that {X = O(Y)} if one has {|X| \leq CY} for some fixed {C}, and {X = o(Y)} if one has {|X| \leq c(x) Y} for some function {c(x)} of {x} (and of any fixed parameters present) that goes to zero as {x \rightarrow \infty} (for each choice of fixed parameters).

Conjecture 2 (Prime tuples conjecture, quantitative form) Let {k_0 \geq 1} be a fixed natural number, and let {{\mathcal H}} be a fixed admissible {k_0}-tuple. Then the number of natural numbers {n < x} such that {n+{\mathcal H}} consists entirely of primes is {({\mathfrak G} + o(1)) \frac{x}{\log^{k_0} x}}.

Thus, for instance, if Conjecture 2 holds, then the number of twin primes less than {x} should equal {(2 \Pi_2 + o(1)) \frac{x}{\log^2 x}}, where {\Pi_2} is the twin prime constant

\displaystyle \Pi_2 := \prod_{p \in {\mathcal P}: p>2} (1 - \frac{1}{(p-1)^2}) = 0.6601618\ldots.
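For readers who like to see the number {0.6601618\ldots} emerge, here is a quick numerical sketch (my code, not part of the argument): truncating the product at {10^6} already pins down {\Pi_2} to about seven decimal places, since the omitted tail multiplies the answer by {\exp(-O(\sum_{p > 10^6} 1/p^2))}.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(range(p * p, n + 1, p))
    return [p for p, flag in enumerate(sieve) if flag]

# Partial product of the twin prime constant over odd primes up to 10^6.
Pi2 = 1.0
for p in primes_up_to(10**6):
    if p > 2:
        Pi2 *= 1 - 1 / (p - 1) ** 2
print(Pi2)  # approaches 0.6601618... from above
```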

As this conjecture is stronger than Conjecture 1, it is of course open. However there are a number of partial results on this conjecture. For instance, this conjecture is known to be true if one introduces some additional averaging in {{\mathcal H}}; see for instance this previous post. From the methods of sieve theory, one can obtain an upper bound of {(C_{k_0} {\mathfrak G} + o(1)) \frac{x}{\log^{k_0} x}} for the number of {n < x} with {n + {\mathcal H}} all prime, where {C_{k_0}} depends only on {k_0}. Sieve theory can also give analogues of Conjecture 2 if the primes are replaced by a suitable notion of almost prime (or more precisely, by a weight function concentrated on almost primes).

Another type of partial result towards Conjectures 1 and 2 comes from the results of Goldston-Pintz-Yildirim, of Motohashi-Pintz, and of Zhang. Following the notation of this recent paper of Pintz, for each {k_0>2}, let {DHL[k_0,2]} denote the following assertion (DHL stands for “Dickson-Hardy-Littlewood”):

Conjecture 3 ({DHL[k_0,2]}) Let {{\mathcal H}} be a fixed admissible {k_0}-tuple. Then there are infinitely many translates {n+{\mathcal H}} of {{\mathcal H}} which contain at least two primes.

This conjecture gets harder as {k_0} gets smaller. Note for instance that {DHL[2,2]} would imply all the {k_0=2} cases of Conjecture 1, including the twin prime conjecture. More generally, if one knew {DHL[k_0,2]} for some {k_0}, then one would immediately conclude that there are an infinite number of pairs of consecutive primes of separation at most {H(k_0)}, where {H(k_0)} is the minimal diameter {h_{k_0}-h_1} amongst all admissible {k_0}-tuples {{\mathcal H}}. Values of {H(k_0)} for small {k_0} can be found at this link (with {H(k_0)} denoted {w} in that page). For large {k_0}, the best upper bounds on {H(k_0)} have been found by using admissible {k_0}-tuples {{\mathcal H}} of the form

\displaystyle {\mathcal H} = ( - p_{m+\lfloor k_0/2\rfloor - 1}, \ldots, - p_{m+1}, -1, +1, p_{m+1}, \ldots, p_{m+\lfloor (k_0+1)/2\rfloor - 1} )

where {p_n} denotes the {n^{th}} prime and {m} is a parameter to be optimised over (in practice it is an order of magnitude or two smaller than {k_0}); see this blog post for details. The upshot is that one can bound {H(k_0)} for large {k_0} by a quantity slightly smaller than {k_0 \log k_0} (and the large sieve inequality shows that this is sharp up to a factor of two, see e.g. this previous post for more discussion).
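As an illustration of this construction (my code; the choice {k_0 = 100} and the search range for {m} are arbitrary), one can generate tuples of the above form, test admissibility by checking the primes {p \leq k_0}, and record the diameter of the first admissible choice:

```python
def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(range(p * p, n + 1, p))
    return [p for p, flag in enumerate(sieve) if flag]

def is_admissible(H):
    for p in primes_up_to(len(H)):
        if len({h % p for h in H}) == p:
            return False
    return True

def prime_block_tuple(k0, m, primes):
    """The tuple (-p_{m+floor(k0/2)-1}, ..., -p_{m+1}, -1, +1,
    p_{m+1}, ..., p_{m+floor((k0+1)/2)-1}); note primes[j] is p_{j+1}."""
    left = [-primes[j] for j in range(m + k0 // 2 - 2, m - 1, -1)]
    right = [primes[j] for j in range(m, m + (k0 + 1) // 2 - 1)]
    return left + [-1, 1] + right

k0 = 100
primes = primes_up_to(600)            # more than enough primes for these indices
found = None
for m in range(1, 30):                # in practice m is much smaller than k0
    H = prime_block_tuple(k0, m, primes)
    if is_admissible(H):
        found = (m, max(H) - min(H))  # (first good m, diameter of the tuple)
        break
print(found)
```

Note that once {p_{m+1} > k_0} (here {m = 25} suffices, as {p_{26} = 101}), every prime {p \leq k_0} fails to divide any entry, so the class {0 \hbox{ mod } p} is avoided and admissibility is automatic; the search simply looks for smaller {m} that happen to work.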

In a key breakthrough, Goldston, Pintz, and Yildirim were able to establish the following conditional result a few years ago:

Theorem 4 (Goldston-Pintz-Yildirim) Suppose that the Elliott-Halberstam conjecture {EH[\theta]} is true for some {1/2 < \theta < 1}. Then {DHL[k_0,2]} is true for some finite {k_0}. In particular, this establishes an infinite number of pairs of consecutive primes of separation {O(1)}.

The dependence of constants between {k_0} and {\theta} given by the Goldston-Pintz-Yildirim argument is basically of the form {k_0 \sim (\theta-1/2)^{-2}}. (UPDATE: as recently observed by Farkas, Pintz, and Revesz, this relationship can be improved to {k_0 \sim (\theta-1/2)^{-3/2}}.)

Unfortunately, the Elliott-Halberstam conjecture (which we will state properly below) is only known for {\theta<1/2}, an important result known as the Bombieri-Vinogradov theorem. If one uses the Bombieri-Vinogradov theorem instead of the Elliott-Halberstam conjecture, Goldston, Pintz, and Yildirim were still able to show the highly non-trivial result that there were infinitely many pairs {p_{n+1},p_n} of consecutive primes with {(p_{n+1}-p_n) / \log p_n \rightarrow 0} (actually they showed more than this; see e.g. this survey of Soundararajan for details).

Actually, the full strength of the Elliott-Halberstam conjecture is not needed for these results. There is a technical specialisation of the Elliott-Halberstam conjecture which does not presently have a commonly accepted name; I will call it the Motohashi-Pintz-Zhang conjecture {MPZ[\varpi]} in this post, where {0 < \varpi < 1/4} is a parameter. We will define this conjecture more precisely later, but let us remark for now that {MPZ[\varpi]} is a consequence of {EH[\frac{1}{2}+2\varpi]}.

We then have the following two theorems. Firstly, we have the following strengthening of Theorem 4:

Theorem 5 (Motohashi-Pintz-Zhang) Suppose that {MPZ[\varpi]} is true for some {0 < \varpi < 1/4}. Then {DHL[k_0,2]} is true for some {k_0}.

A version of this result (with a slightly different formulation of {MPZ[\varpi]}) appears in this paper of Motohashi and Pintz, and in the paper of Zhang, Theorem 5 is proven for the concrete values {\varpi = 1/1168} and {k_0 = 3,500,000}. We will supply a self-contained proof of Theorem 5 below the fold, which improves the constants upon those in Zhang’s paper (in particular, for {\varpi = 1/1168}, we can take {k_0} as low as {341,640}, with further improvements on the way). As with Theorem 4, we have an inverse quadratic relationship {k_0 \sim \varpi^{-2}}.

In his paper, Zhang obtained for the first time an unconditional advance on {MPZ[\varpi]}:

Theorem 6 (Zhang) {MPZ[\varpi]} is true for all {0 < \varpi \leq 1/1168}.

This is a deep result, building upon the work of Fouvry-Iwaniec, Friedlander-Iwaniec and Bombieri-Friedlander-Iwaniec which established results of a similar nature to {MPZ[\varpi]} but simpler in some key respects. We will not discuss this result further here, except to say that Zhang’s arguments rely on the (higher-dimensional case of the) Weil conjectures, which were famously proven by Deligne using methods from l-adic cohomology. Also, it was believed among at least some experts that the methods of Bombieri, Fouvry, Friedlander, and Iwaniec were not quite strong enough to obtain results of the form {MPZ[\varpi]}, making Theorem 6 a particularly impressive achievement.

Combining Theorem 6 with Theorem 5 we obtain {DHL[k_0,2]} for some finite {k_0}; Zhang obtains this for {k_0 = 3,500,000} but as detailed below, this can be lowered to {k_0 = 341,640}. This in turn gives infinitely many pairs of consecutive primes of separation at most {H(k_0)}. Zhang gives a simple argument that bounds {H(3,500,000)} by {70,000,000}, giving his famous result that there are infinitely many pairs of primes of separation at most {70,000,000}; by being a bit more careful (as discussed in this post) one can lower the upper bound on {H(3,500,000)} to {57,554,086}, and if one instead uses the newer value {k_0 = 341,640} for {k_0} one can instead use the bound {H(341,640) \leq 4,982,086}. (Many thanks to Scott Morrison for these numerics.) UPDATE: These values are now obsolete; see this web page for the latest bounds.

In this post we would like to give a self-contained proof of both Theorem 4 and Theorem 5, which are both sieve-theoretic results that are mainly elementary in nature. (But, as stated earlier, we will not discuss the deepest new result in Zhang’s paper, namely Theorem 6.) Our presentation will deviate a little bit from the traditional sieve-theoretic approach in a few places. Firstly, there is a portion of the argument that is traditionally handled using contour integration and properties of the Riemann zeta function; we will present a “cheaper” approach (which Ben Green and I used in our papers, e.g. in this one) using Fourier analysis, with the only property used about the zeta function {\zeta(s)} being the elementary fact that it blows up like {\frac{1}{s-1}} as one approaches {1} from the right. To deal with the contribution of small primes (which is the source of the singular series {{\mathfrak G}}), it will be convenient to use the “{W}-trick” (introduced in this paper of mine with Ben), passing to a single residue class mod {W} (where {W} is the product of all the small primes) to end up in a situation in which all small primes have been “turned off”, which leads to better pseudorandomness properties (for instance, once one eliminates all multiples of small primes, almost all pairs of remaining numbers will be coprime).

— 1. The {W}-trick —

In this section we introduce the “{W}-trick”, which is a simple but useful device that automatically takes care of local factors arising from small primes, such as the singular series {{\mathfrak G}}. The price one pays for this trick is that the explicit decay rates in various {o(1)} terms can be rather poor, but for the applications here, we will not need to know any information on these decay rates and so the {W}-trick may be freely applied.

Let {w} be a natural number, which should be thought of as either fixed and large, or as a very slowly growing function of {x}. Actually, the two viewpoints are basically equivalent for the purposes of asymptotic analysis (at least at the qualitative level of {o(1)} decay rates), thanks to the following basic principle:

Lemma 7 (Overspill principle) Let {F(w,x)} be a quantity depending on {w} and {x}. Then the following are equivalent:

  • (i) For every fixed {\epsilon>0} there exists a fixed {w_\epsilon > 0} such that

    \displaystyle |F(w,x)| \leq \epsilon + o(1)

    for all fixed {w \geq w_\epsilon}.

  • (ii) We have

    \displaystyle F(w,x) = o(1)

whenever {w = w(x)} is a function of {x} going to infinity that is sufficiently slowly growing. (In other words, there exists a function {w_0: {\bf R}^+ \rightarrow {\bf N}} going to infinity with the property that {F(w,x)=o(1)} whenever {w = w(x)} is a natural number-valued function of {x} such that {w(x) \rightarrow \infty} as {x \rightarrow \infty} and {w(x) \leq w_0(x)} for all sufficiently large {x}.)

This principle is closely related to the overspill principle from nonstandard analysis, though we will not explicitly adopt a nonstandard perspective here. It is also similar in spirit to the diagonalisation trick used to prove the Arzela-Ascoli theorem.

Proof: We first show that (i) implies (ii). By (i), we see that for every natural number {n}, we can find a real number {x_n} with the property that

\displaystyle |F(w,x)| \leq \frac{2}{m}

whenever {1 \leq m \leq n}, {1 \leq w \leq n}, and {x \geq x_n} are such that {w \geq w_{1/m}}. By increasing the {x_n} as necessary we may assume that they are increasing and go to infinity as {n \rightarrow \infty}. If we then define {w_0(x)} to equal the largest natural number {n} for which {x \geq x_n}, or equal to {1} if no such number exists, then one easily verifies that {F(w,x)=o(1)} whenever {w= w(x)} goes to infinity and is bounded by {w_0} for sufficiently large {x}.

Now we show that (ii) implies (i). Suppose for contradiction that (i) failed, then we can find a fixed {\epsilon>0} with the property that for any natural number {n}, there exist {w_n \geq n} such that {|F(w_n,x_n)| \geq \epsilon} for arbitrarily large {x_n}. We can select the {w_n} to be increasing to infinity, and then we can find a sequence {x_n} increasing to infinity such that {|F(w_n,x_n)| \geq \epsilon} for all {n}; by increasing {x_n} as necessary, we can also ensure that {w_0(x) \geq w_n} for all {x \geq x_n} and {n}. If we then define {w(x)} to be {w_n} when {x_n \leq x < x_{n+1}}, and {w(x)=1} for {x < x_1}, we see that {|F(w,x)| \geq \epsilon} whenever {x=x_n}, contradicting (ii). \Box

Henceforth we will usually think of {w} as a sufficiently slowly growing function of {x}, although we will on occasion take advantage of Lemma 7 to switch to thinking of {w} as a large fixed quantity instead. In either case, we should think of {w} as exceeding the size of fixed quantities such as {k_0} or {h_{k_0}-h_1}, at least in the limit where {x} is large; in particular, for a fixed {k_0}-tuple {{\mathcal H}}, we will have

\displaystyle \nu_p = k_0 \hbox{ for all } p > w \ \ \ \ \ (1)


if {x} is large enough. A particular consequence of the growing nature of {w} is that

\displaystyle \sum_{p > w} \frac{1}{p^2} = o(1) \ \ \ \ \ (2)


this follows from the absolute convergence of the sum {\sum_{n=1}^\infty \frac{1}{n^2}}, and hence of {\sum_p \frac{1}{p^2}}. As a consequence, once we “turn off” all the primes less than {w}, any errors in our sieve-theoretic analysis which are quadratic or higher in {1/p} can be essentially ignored, which will be very convenient for us. In a similar vein, for any fixed {k_0}-tuple {{\mathcal H}}, one has

\displaystyle \prod_{p>w} \frac{1-\frac{\nu_p}{p}}{(1-\frac{1}{p})^{k_0}} = 1+o(1) \ \ \ \ \ (3)


which allows one to truncate the singular series:

\displaystyle {\mathfrak G} = \prod_{p \leq w} \frac{1-\frac{\nu_p}{p}}{(1-\frac{1}{p})^{k_0}} + o(1). \ \ \ \ \ (4)
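To get a quantitative feel for how small the tails in (2) are once the small primes have been turned off, here is a numerical sketch (the cutoffs {w = 1000} and {10^6} are arbitrary choices of mine):

```python
def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(range(p * p, n + 1, p))
    return [p for p, flag in enumerate(sieve) if flag]

w = 1000
# Tail of sum_p 1/p^2 beyond w, truncated at 10^6; the remainder past
# 10^6 is below 10^{-6}, since sum_{n > N} 1/n^2 < 1/N.
tail = sum(1 / (p * p) for p in primes_up_to(10**6) if p > w)
print(tail)  # comfortably below the crude bound 1/w = 0.001
```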


In order to “turn off” all the small primes, we introduce the quantity {W}, defined as the product of all the primes up to {w} (i.e. the primorial of {w}):

\displaystyle W := \prod_{p \leq w} p.

As {w} is going to infinity, {W} is going to infinity also (but as slowly as we please). The idea of the {W}-trick is to search for prime patterns in a single residue class {b \hbox{ mod } W}, which as mentioned earlier will “turn off” all the primes less than {w} in the sieve-theoretic analysis.

Using (4) and the Chinese remainder theorem, we may thus approximate the singular series as

\displaystyle {\mathfrak G} = \frac{|C(W)|}{W} (\frac{\phi(W)}{W})^{-k_0} + o(1) \ \ \ \ \ (5)


where {\phi(W)} is the Euler totient function of {W}, and {C(W) \subset {\bf Z}/W{\bf Z}} is the set of residue classes {b \hbox{ mod } W} such that all of the shifts {b+h_1,\ldots,b+h_{k_0}} are coprime to {W}. Note that if {n+{\mathcal H}} consists purely of primes and {n} is sufficiently large, then {n} must lie in one of the residue classes in {C(W)}. Thus we can count tuples with {n+{\mathcal H}} all prime by working in each residue class in {C(W)} separately. We conclude that Conjecture 2 is equivalent to the following “{W}-tricked version” in which the singular series is no longer present (or, more precisely, has been replaced by some natural normalisation factors depending on {W}, such as {(\phi(W)/W)^{-k_0}}):

Conjecture 8 (Prime tuples conjecture, W-tricked quantitative form) Let {k_0 \geq 1} be a fixed natural number, and let {{\mathcal H}} be a fixed admissible {k_0}-tuple. Assume {w} is a sufficiently slowly growing function of {x}. Then for any residue class {b \hbox{ mod } W} in {C(W)}, the number of natural numbers {n < x} with {n=b \hbox{ mod } W} such that {n+{\mathcal H}} consists entirely of primes is {(\frac{1}{W} (\frac{W}{\phi(W)})^{k_0} + o(1)) \frac{x}{\log^{k_0} x}}.
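To illustrate the objects appearing in (5) and Conjecture 8, here is a sketch with the concrete choices {{\mathcal H} = (0,2,6)} and {w = 13} (both choices mine). By the Chinese remainder theorem, {\frac{|C(W)|}{W} (\frac{\phi(W)}{W})^{-k_0}} agrees exactly with the truncated singular series {\prod_{p \leq w} \frac{1-\nu_p/p}{(1-1/p)^{k_0}}}, and a brute-force enumeration of {C(W)} confirms this:

```python
from math import gcd, prod

H = (0, 2, 6)                          # an admissible 3-tuple
k0 = len(H)
small_primes = [2, 3, 5, 7, 11, 13]    # the primes up to w = 13
W = prod(small_primes)                 # W = 30030
phiW = prod(p - 1 for p in small_primes)

# C(W): residue classes b mod W with b+h coprime to W for every h in H.
C = [b for b in range(W) if all(gcd(b + h, W) == 1 for h in H)]

lhs = (len(C) / W) * (phiW / W) ** (-k0)        # main term of (5)
rhs = prod((1 - len({h % p for h in H}) / p) / (1 - 1 / p) ** k0
           for p in small_primes)               # truncated singular series
print(len(C), lhs, rhs)  # |C(W)| = prod_{p<=w} (p - nu_p) = 640; lhs == rhs
```

Summing the count in Conjecture 8 over the {|C(W)|} residue classes in {C(W)} then recovers the main term {{\mathfrak G} \frac{x}{\log^{k_0} x}} of Conjecture 2, as claimed.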

We will work with similarly {W}-tricked asymptotics in the analysis below.

— 2. Sums of multiplicative functions —

As a result of the sieve-theoretic computations to follow, we will frequently need to estimate sums of the form

\displaystyle S_{0,R,I}( f, g ) := \sum_{d \in {\mathcal S}_I} \frac{f(d)}{d} g( \frac{\log d}{\log R} )


\displaystyle S_{1,R,I}( f, g ) := \sum_{d \in {\mathcal S}_I} \mu(d) \frac{f(d)}{d} g( \frac{\log d}{\log R} )

where {f: {\bf N} \rightarrow {\bf C}} is a multiplicative function, the sieve level {R>0} (also denoted {D} in some literature) is a fixed power of {x} (such as {x^{1/4}} or {x^{1/4+\varpi}}), {\mu} is the Möbius function, {g: {\bf R} \rightarrow {\bf C}} is a fixed smooth compactly supported function, {I} is a (possibly half-infinite) interval in {(w,+\infty)}, and {{\mathcal S}_I} is the set of square-free numbers that are products {p_1 \ldots p_j} of distinct primes {p_1,\ldots,p_j} in {I}. (Actually, in applications {g} won’t quite be smooth, but instead have some high order of differentiability (e.g. {k_0+l_0-1} times continuously differentiable for some {l_0>0}), but we can extend the analysis of smooth {g} to sufficiently differentiable {g} by various standard limiting or approximation arguments which we will not dwell on here.) We will also need to control the more complicated variant

\displaystyle S_{2,R,I}(f,g_1,g_2) := \sum_{d_1,d_2 \in {\mathcal S}_I} \frac{\mu(d_1) \mu(d_2) f([d_1,d_2])}{[d_1,d_2]} g_1( \frac{\log d_1}{\log R} ) g_2( \frac{\log d_2}{\log R} )

where {g_1,g_2:{\bf R} \rightarrow {\bf C}} are also smooth compactly supported functions. In practice, the interval {I} will be something like {(w, x^{1/4+\varpi})}, {(w, x^\varpi)}, or {[x^\varpi,x^{1/4+\varpi}]}. In particular, thanks to the {W}-trick we will be able to turn off all the primes up to {w}, so that {I} only contains primes larger than {w}, allowing us to take advantage of bounds such as (2).

Once {d} is restricted to {{\mathcal S}_I}, the quantity {f(d)} is determined entirely by the values of the multiplicative function {f} at primes in {I}:

\displaystyle f(d) = \prod_{p \in {\mathcal P} \cap I: p | d} f(p).

In applications, {f} will have the size bound

\displaystyle f(p) = k + O( \frac{1}{p} ) \ \ \ \ \ (6)


for all {p \in I} and some fixed positive {k} (note that we allow the implied constants in the {O()} notation to depend on quantities such as {k}); we refer to {k} as the dimension of the multiplicative function {f}. Henceforth we assume that {f} has a fixed dimension {k}. We remark that we could unify the treatment of {S_{0,R,I}} and {S_{1,R,I}} in what follows by allowing multiplicative functions of negative dimension, but we will avoid doing so here. In our applications {k} will be an integer; one could also generalise much of the discussion below to the fractional dimension case, but we will not need to do so here.

Traditionally the above expressions are handled by complex analysis, starting with Perron’s formula. We will instead take a slightly different Fourier-analytic approach. We perform a Fourier expansion of the smooth compactly supported function {e^x g(x)} to obtain a representation

\displaystyle e^x g(x) = \int_{\bf R} e^{-itx} \hat g(t)\ dt \ \ \ \ \ (7)


for some Schwartz function {\hat g}; in particular, {\hat g} is rapidly decreasing. (Strictly speaking, {\hat g} is the Fourier transform of {g} shifted in the complex domain by {i}, rather than the true Fourier transform of {g}, but we will ignore this distinction for the purposes of this discussion.) In particular we have

\displaystyle g(\frac{\log d}{\log R}) = \int_{\bf R} \frac{1}{d^{\frac{1+it}{\log R}}} \hat g(t)\ dt

for any {d}. By Fubini’s theorem, we can thus write {S_{0,R,I}} as

\displaystyle S_{0,R,I}(f,g) = \int_{\bf R} \sum_{d \in {\mathcal S}_I} \frac{f(d)}{d^{1+\frac{1+it}{\log R}}} \hat g(t)\ dt,

which factorises as

\displaystyle S_{0,R,I}(f,g) = \int_{\bf R} (\prod_{p \in I} (1 + \frac{f(p)}{p^{1+\frac{1+it}{\log R}}})) \hat g(t)\ dt.

Similarly one has

\displaystyle S_{1,R,I}(f,g) = \int_{\bf R} (\prod_{p \in I} (1 - \frac{f(p)}{p^{1+\frac{1+it}{\log R}}})) \hat g(t)\ dt.


\displaystyle S_{2,R,I}(f,g_1,g_2) = \int_{\bf R} \int_{\bf R} (\prod_{p \in I} (1 - \frac{f(p)}{p^{1+\frac{1+it_1}{\log R}}} - \frac{f(p)}{p^{1+\frac{1+it_2}{\log R}}} + \frac{f(p)}{p^{1+\frac{1+it_1+1+it_2}{\log R}}} ))

\displaystyle \hat g_1(t_1) \hat g_2(t_2)\ dt_1 dt_2.
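The representation (7), and the resulting formula for {g(\frac{\log d}{\log R})}, can be sanity-checked numerically. As a sketch I use the Schwartz function {g(x) = e^{-x^2}} (my choice for illustration; it is not compactly supported, but its shifted transform has the closed form {\hat g(t) = e^{(1+it)^2/4}/(2\sqrt{\pi})}), and verify that {\int_{\bf R} e^{-(1+it)u} \hat g(t)\ dt} recovers {g(u)}:

```python
import cmath, math

def g(x):
    # Test function (my choice for illustration): a Gaussian.
    return math.exp(-x * x)

def g_hat(t):
    # Shifted transform of g: (1/2pi) * integral of e^y g(y) e^{ity} dy,
    # which for this Gaussian has the closed form below.
    return cmath.exp((1 + 1j * t) ** 2 / 4) / (2 * math.sqrt(math.pi))

def reconstruct(u, T=40.0, steps=40000):
    """Midpoint-rule approximation to int_R e^{-(1+it)u} g_hat(t) dt;
    the integrand decays like e^{-t^2/4}, so [-T, T] captures it all."""
    h = 2 * T / steps
    total = 0j
    for i in range(steps):
        t = -T + (i + 0.5) * h
        total += cmath.exp(-(1 + 1j * t) * u) * g_hat(t)
    return total * h

u = 0.7  # u plays the role of log d / log R
print(abs(reconstruct(u) - g(u)))  # essentially zero
```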

In order to use asymptotics of the Riemann zeta function near the pole {s=1}, it is convenient to temporarily truncate the above integrals to the region {|t| \leq \sqrt{\log R}} or {|t_1|, |t_2| \leq \sqrt{\log R}}:

Lemma 9 For any fixed {A>0}, we have

\displaystyle S_{0,R,I}(f,g) = \int_{|t| \leq \sqrt{\log R}} (\prod_{p \in I} (1 + \frac{f(p)}{p^{1+\frac{1+it}{\log R}}})) \hat g(t)\ dt + O( \log^{-A} R)


\displaystyle S_{1,R,I}(f,g) = \int_{|t| \leq \sqrt{\log R}} (\prod_{p \in I} (1 - \frac{f(p)}{p^{1+\frac{1+it}{\log R}}})) \hat g(t)\ dt + O( \log^{-A} R)


\displaystyle S_{2,R,I}(f,g_1,g_2) = \int_{|t_1|, |t_2| \leq \sqrt{\log R}}

\displaystyle (\prod_{p \in I} (1 - \frac{f(p)}{p^{1+\frac{1+it_1}{\log R}}} - \frac{f(p)}{p^{1+\frac{1+it_2}{\log R}}} + \frac{f(p)}{p^{1+\frac{1+it_1+1+it_2}{\log R}}} ))

\displaystyle \hat g_1(t_1) \hat g_2(t_2)\ dt_1 dt_2 + O(\log^{-A} R).

Also we have the crude bound

\displaystyle S_{0,R,I}(f,g) = O( \log^k R ).

Proof: We begin with the bounds on {S_{0,R,I}}. From (6) we have

\displaystyle \log |1 + \frac{f(p)}{p^{1+\frac{1+it}{\log R}}}| \leq k p^{-1-\frac{1}{\log R}} + O( p^{-2} )

for {p \in I} (which forces {p>w}, so there is no issue with the singularity of the logarithm) and thus

\displaystyle \prod_{p \in I} (1 + \frac{f(p)}{p^{1+\frac{1+it}{\log R}}}) = O( \exp( k \sum_p p^{-1-\frac{1}{\log R}} ) ).


Since

\displaystyle \prod_p (1-\frac{1}{p^{1+1/\log R}}) = \frac{1}{\zeta(1+1/\log R)} = \frac{1}{\log R + O(1)},

we see on taking logarithms that

\displaystyle \sum_p p^{-1-\frac{1}{\log R}} = \log\log R + O(1)

and thus

\displaystyle \prod_{p \in I} (1 + \frac{f(p)}{p^{1+\frac{1+it}{\log R}}}) = O( \log^k R ).

The bounds on {S_{0,R,I}(f,g)} then follow from the rapid decrease of {\hat g}. The bounds for {S_{1,R,I}} and {S_{2,R,I}} are proven similarly. \Box

From (6) and the restriction of {I} to quantities larger than {w}, we see that

\displaystyle (\prod_{p \in I} (1 + \frac{f(p)}{p^{1+\frac{1+it}{\log R}}})) = (1+o(1)) \zeta_I(1+\frac{1+it}{\log R})^k


\displaystyle (\prod_{p \in I} (1 - \frac{f(p)}{p^{1+\frac{1+it}{\log R}}})) = (1+o(1)) \zeta_I(1+\frac{1+it}{\log R})^{-k}


\displaystyle (\prod_{p \in I} (1 - \frac{f(p)}{p^{1+\frac{1+it_1}{\log R}}} - \frac{f(p)}{p^{1+\frac{1+it_2}{\log R}}} + \frac{f(p)}{p^{1+\frac{1+it_1+1+it_2}{\log R}}} ))

\displaystyle = (1+o(1)) \zeta_I(1+\frac{1+it_1}{\log R})^{-k} \zeta_I(1+\frac{1+it_2}{\log R})^{-k}

\displaystyle \zeta_I(1+\frac{1+it_1+1+it_2}{\log R})^{k}

where {\zeta_I} is the restricted Euler product

\displaystyle \zeta_I(s) := \prod_{p \in I} (1-\frac{1}{p^s})^{-1},

which is well-defined for {\hbox{Re}(s)>1} at least (and this is the only region of {s} for which we will need {\zeta_I}).

We now specialise to the model case {I = (w,+\infty)}, in which case

\displaystyle \zeta_I(s) = \zeta(s) \prod_{p \leq w} (1 - \frac{1}{p^s})

where {\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s}} is the Riemann zeta function. Using the basic (and easily proven) asymptotic {\zeta(s) = \frac{1}{s-1} + O(1)} for {s} near {1}, we conclude that

\displaystyle \zeta_I(s) = (1+o(1)) \frac{\phi(W)}{W} \frac{1}{s-1}

for {s = 1+O(1/\sqrt{\log R})}, if {w} is sufficiently slowly growing (this can be seen by first working with a fixed large {W} and then using Lemma 7). Note that because of the above truncation, we do not need any deeper bounds on {\zeta} than what one can obtain from the simple pole at {s=1}; in particular no zero-free regions near the line {\{ 1+it: t \in {\bf R} \}} are needed here. (This is ultimately because of the smooth nature of {g}, which is sufficient for the applications in this post; if one wanted rougher cutoff functions here then the situation is closer to that of the prime number theorem, and non-trivial zero-free regions would be required.)
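The asymptotic {\zeta(s) = \frac{1}{s-1} + O(1)} is easy to observe numerically. Here is a sketch (my code; the tail correction in the approximation is a standard Euler-Maclaurin-style device, and the constant the difference approaches is Euler's {\gamma \approx 0.577}, a fact not needed in the argument above):

```python
import math

def zeta(s, N=100000):
    """Approximate zeta(s) for real s > 1: partial sum plus the
    integral tail N^{1-s}/(s-1) and a half-term correction."""
    return (sum(n ** (-s) for n in range(1, N + 1))
            + N ** (1 - s) / (s - 1) - 0.5 * N ** (-s))

for s in (1.2, 1.1, 1.05):
    print(s, zeta(s) - 1 / (s - 1))  # stays bounded as s -> 1+
```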

We conclude in the case {I = (w,+\infty)} that

\displaystyle S_{0,R,I}(f,g) = (\frac{\phi(W)}{W} \log R)^k \int_{|t| \leq \sqrt{\log R}} (1+o(1)) (1+it)^{-k} \hat g(t)\ dt

\displaystyle + O( \log^{-A} R)


\displaystyle S_{1,R,I}(f,g) = (\frac{\phi(W)}{W} \log R)^{-k} \int_{|t| \leq \sqrt{\log R}} (1+o(1)) (1+it)^k \hat g(t)\ dt

\displaystyle + O( \log^{-A} R)


\displaystyle S_{2,R,I}(f,g_1,g_2) = (\frac{\phi(W)}{W} \log R)^{-k} \int_{|t_1|, |t_2| \leq \sqrt{\log R}}

\displaystyle (1+o(1)) (1+it_1)^k (1+it_2)^k (1+it_1+1+it_2)^{-k} \hat g_1(t_1) \hat g_2(t_2)\ dt_1 dt_2

\displaystyle + O(\log^{-A} R);

using the rapid decrease of {\hat g, \hat g_1, \hat g_2}, we thus have

\displaystyle S_{0,R,I}(f,g) = (\frac{\phi(W)}{W} \log R)^k (\int_{\bf R} (1+it)^{-k} \hat g(t)\ dt + o(1))


\displaystyle S_{1,R,I}(f,g) = (\frac{\phi(W)}{W} \log R)^{-k} (\int_{\bf R} (1+it)^k \hat g(t)\ dt + o(1))


\displaystyle S_{2,R,I}(f,g_1,g_2) = (\frac{\phi(W)}{W} \log R)^{-k} (\int_{\bf R} \int_{\bf R}

\displaystyle (1+it_1)^k (1+it_2)^k (1+it_1+1+it_2)^{-k}

\displaystyle \hat g_1(t_1) \hat g_2(t_2)\ dt_1 dt_2 + o(1)).

We can rewrite these expressions in terms of {g} instead of {\hat g}. Using the Gamma function identity

\displaystyle (1+it)^{-k} = \int_0^\infty e^{-x(1+it)} \frac{x^{k-1}}{(k-1)!}\ dx

and (7) we see that

\displaystyle \int_{\bf R} (1+it)^{-k} \hat g(t)\ dt = \int_0^\infty g(x) \frac{x^{k-1}}{(k-1)!}\ dx

whilst from differentiating (7) {k} times at the origin (after first dividing by {e^x}) we see that

\displaystyle \int_{\bf R} (1+it)^k \hat g(t)\ dt = (-1)^k g^{(k)}(0).

Combining these two methods, we also see that

\displaystyle \int_{\bf R} \int_{\bf R} (1+it_1)^k (1+it_2)^k (1+it_1+1+it_2)^{-k} \hat g_1(t_1) \hat g_2(t_2)\ dt_1 dt_2

\displaystyle = \int_0^\infty g^{(k)}_1(x) g^{(k)}_2(x) \frac{x^{k-1}}{(k-1)!}\ dx.
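As a numerical sanity check of the Gamma function identity above (a sketch; the parameters {t=1.7}, {k=3} and the integration window are arbitrary choices of mine):

```python
import cmath, math

def gamma_side(t, k, X=60.0, steps=200000):
    """Midpoint-rule value of int_0^X e^{-x(1+it)} x^{k-1}/(k-1)! dx;
    the integrand decays like e^{-x}, so X = 60 loses essentially nothing."""
    h = X / steps
    fact = math.factorial(k - 1)
    total = 0j
    for i in range(steps):
        x = (i + 0.5) * h
        total += cmath.exp(-x * (1 + 1j * t)) * x ** (k - 1) / fact
    return total * h

t, k = 1.7, 3
lhs = (1 + 1j * t) ** (-k)
rhs = gamma_side(t, k)
print(abs(lhs - rhs))  # essentially zero
```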

We have thus obtained the following asymptotics:

Proposition 10 (Asymptotics without prime truncation) Suppose that {I = (w,+\infty)}, and that {f} has dimension {k} for some fixed natural number {k}. Then we have

\displaystyle S_{0,R,I}(f,g) = (\frac{\phi(W)}{W} \log R)^k (\int_0^\infty g(x) \frac{x^{k-1}}{(k-1)!}\ dx + o(1))


\displaystyle S_{1,R,I}(f,g) = (\frac{\phi(W)}{W} \log R)^{-k} ((-1)^k g^{(k)}(0) + o(1))


\displaystyle S_{2,R,I}(f,g_1,g_2) = (\frac{\phi(W)}{W} \log R)^{-k}

\displaystyle (\int_0^\infty g^{(k)}_1(x) g^{(k)}_2(x) \frac{x^{k-1}}{(k-1)!}\ dx + o(1)).

These asymptotics will suffice for the treatment of the Goldston-Pintz-Yildirim theorem (Theorem 4). For the Motohashi-Pintz-Zhang theorem (Theorem 5) we will also need to deal with truncated intervals {I}, such as {(w,x^{\varpi})}; we will discuss how to deal with these truncations later.

— 3. The Goldston-Pintz-Yildirim theorem —

We are now ready to state and prove the Goldston-Pintz-Yildirim theorem. We first need to state the Elliott-Halberstam conjecture properly.

Let {\Lambda: {\bf N} \rightarrow {\bf R}} be the von Mangoldt function, thus {\Lambda(n)} equals {\log p} when {n} is equal to a prime {p} or a power of that prime, and equal to zero otherwise. The prime number theorem in arithmetic progressions tells us that

\displaystyle \sum_{n < x: n = a \hbox{ mod } q} \Lambda(n) = (1 + o(1)) \frac{x}{\phi(q)}

for any fixed arithmetic progression {a \hbox{ mod } q} with {a} coprime to {q}. In particular,

\displaystyle \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\sum_{n < x: n = a \hbox{ mod } q} \Lambda(n) - \frac{1}{\phi(q)} \sum_{n < x} \Lambda(n)| = o( \frac{x}{\phi(q)} )

where {({\bf Z}/q{\bf Z})^\times} are the residue classes mod {q} that are coprime to {q}. By invoking the Siegel-Walfisz theorem one can obtain the improvement

\displaystyle \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\sum_{n < x: n = a \hbox{ mod } q} \Lambda(n) - \frac{1}{\phi(q)} \sum_{n < x} \Lambda(n)| = O( \frac{x}{\phi(q) \log^A x} )

for any fixed {A>0} (though, annoyingly, the implied constant here is only ineffectively bounded with current methods; see this previous post for further discussion).

The above error term is only useful when {q} is fixed (or is of logarithmic size in {x}). For larger values of {q}, it is very difficult to get good error terms for each {q} separately, unless one assumes powerful hypotheses such as the generalised Riemann hypothesis. However, it is possible to obtain good control on the error term if one averages in {q}. More precisely, for any {0 < \theta < 1}, let {EH[\theta]} denote the following assertion:

Conjecture 11 ({EH[\theta]}) One has

\displaystyle \sum_{1 \leq q \leq x^\theta} \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\sum_{n < x: n = a \hbox{ mod } q} \Lambda(n) - \frac{1}{\phi(q)} \sum_{n < x} \Lambda(n)|

\displaystyle = O( \frac{x}{\log^A x} )

for all fixed {A>0}.

This should be compared with the asymptotic {\sum_{1 \leq q \leq x^\theta} \frac{x}{\phi(q)} = (C+o(1)) x \log x^\theta} for some absolute constant {C>0}, as can be deduced for instance from Proposition 10. The Elliott-Halberstam conjecture is the assertion that {EH[\theta]} holds for all {0 < \theta < 1}. This remains open, but the important Bombieri-Vinogradov theorem establishes {EH[\theta]} for all {0 < \theta < 1/2}. Remarkably, the threshold {1/2} is also the limit of what one can establish if one directly invokes the generalised Riemann hypothesis, so the Bombieri-Vinogradov theorem is often referred to as an assertion that the generalised Riemann hypothesis (or at least the Siegel-Walfisz theorem) holds “on the average”, which is often good enough for sieve-theoretic purposes.

We may replace the von Mangoldt function {\Lambda(n)} with the slight variant {\theta(n)}, defined to equal {\log p} when {n} is a prime {p} and zero otherwise. Using this replacement, as well as the prime number theorem (with {O(x / \log^A x)} error term), it is not difficult to show that {EH[\theta]} is equivalent to the estimate

\displaystyle \sum_{1 \leq q \leq x^\theta} \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\sum_{n < x: n = a \hbox{ mod } q} \theta(n) - \frac{x}{\phi(q)}| = O( \frac{x}{\log^A x} ). \ \ \ \ \ (8)


Now we establish Theorem 4. Suppose that {EH[\theta]} holds for some fixed {1/2 < \theta < 1}, let {k_0} be sufficiently large depending on {\theta} but otherwise fixed, and let {{\mathcal H}} be a fixed admissible {k_0}-tuple. We would like to show that there are infinitely many {n} such that {n + {\mathcal H}} contains at least two primes. We will begin with the {W}-trick, restricting {n} to a residue class {b \hbox{ mod } W} with {b \in C(W)} (note that {C(W)} is non-empty because {{\mathcal H}} is admissible).

The general strategy will be as follows. We will introduce a weight function {\nu: {\bf Z} \rightarrow {\bf R}^+} that obeys the upper bound

\displaystyle \sum_{x \leq n \leq 2x: n = b \hbox{ mod } W} \nu(n) \leq (\alpha+o(1)) (\frac{W}{\phi(W)})^{k_0} \frac{x}{W \log^{k_0} R} \ \ \ \ \ (9)


and lower bound

\displaystyle \sum_{x \leq n \leq 2x: n = b \hbox{ mod } W} \theta(n+h) \nu(n)

\displaystyle \geq (\beta-o(1)) (\frac{W}{\phi(W)})^{k_0} \frac{x}{W \log^{k_0-1} R} \ \ \ \ \ (10)


for all {h \in {\mathcal H}} and some fixed {\alpha,\beta > 0}, where {R} is a fixed power of {x} (we will eventually take {R = x^{\theta/2}}). (The factors of {W, \phi(W)}, {x}, and {\log R} on the right-hand side are natural normalisations coming from sieve theory and the reader should not pay too much attention to them.) Informally, (9) says that {\nu} has some normalised density at most {\alpha}, and then (10) roughly speaking asserts that relative to the weight {\nu}, {n+h} has a probability of at least {\beta/\alpha -o(1)} of being prime. If we sum (10) for all {h \in {\mathcal H}} and then subtract off {\log 3x} copies of (9), we conclude that

\displaystyle \sum_{x \leq n \leq 2x: n = b \hbox{ mod } W} (\sum_{h \in {\mathcal H}} \theta(n+h) - \log 3x) \nu(n)

\displaystyle \geq (k_0\beta - \alpha \frac{\log x}{\log R}-o(1)) (\frac{W}{\phi(W)})^{k_0} \frac{x}{W \log^{k_0-1} R}.

In particular, if we have the crucial inequality

\displaystyle k_0 \beta > \alpha \frac{\log x}{\log R} \ \ \ \ \ (11)


we conclude that

\displaystyle \sum_{x \leq n \leq 2x: n = b \hbox{ mod } W} (\sum_{h \in {\mathcal H}} \theta(n+h) - \log 3x) \nu(n) \gg \frac{x}{W \log^{k_0-1} x}

and so {\sum_{h \in {\mathcal H}} \theta(n+h) - \log 3x} is positive for at least one value of {n} between {x} and {2x}. This can only occur if {n+{\mathcal H}} contains two or more primes. Thus we must have {n+{\mathcal H}} containing at least two primes for some {n} between {x} and {2x}; sending {x} off to infinity then gives {DHL[k_0,2]} as desired.
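This pigeonhole can be illustrated with a toy computation (the tuple and range below are hypothetical small samples; the real content of the theorem is of course producing the weight {\nu} that guarantees such {n} exist):

```python
from math import log

# Toy illustration of the pigeonhole: for x <= n <= 2x and small h one has
# theta(n+h) = log(n+h) < log(3x) when n+h is prime, so the sum of
# theta(n+h) over h in H can only exceed log(3x) if at least two of the
# n+h are prime.  H = (0, 2, 6) and x = 1000 are sample choices.
H = (0, 2, 6)
x = 1000

def is_prime(m):
    return m > 1 and all(m % d for d in range(2, int(m ** 0.5) + 1))

for n in range(x, 2 * x):
    prime_shifts = [h for h in H if is_prime(n + h)]
    if sum(log(n + h) for h in prime_shifts) > log(3 * x):
        assert len(prime_shifts) >= 2   # pigeonhole forces two primes
```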

It thus suffices to find a weight function {\nu} obeying the required properties (9), (10) with parameters {\alpha,\beta,R} obeying the key inequality (11). It is thus of interest to make {R} as large a power of {x} as possible, and to minimise the ratio between {\alpha} and {\beta}. It is in the former task that the Elliott-Halberstam hypothesis will be crucial.

The key is to find a good choice of {\nu}, and the selection of this weight is arguably the main contribution of Goldston, Pintz, and Yildirim, who use a carefully modified version of the Selberg sieve. Following (a slight modification of) the Goldston-Pintz-Yildirim argument, we will take a weight of the form {\nu(n) = \lambda(n)^2}, where

\displaystyle \lambda(n) := \sum_{d \in {\mathcal S}_I: d|P(n)} \mu(d) g( \frac{\log d}{\log R} ) \ \ \ \ \ (12)


where {g: {\bf R} \rightarrow {\bf R}} is a fixed smooth non-negative function supported on {[-1,1]} to be chosen later, {I := (w,+\infty)}, and {P(n)} is the polynomial

\displaystyle P(n) := \prod_{h \in {\mathcal H}} (n+h).

The intuition here is that {\lambda} is a truncated approximation to a function of the form

\displaystyle \lambda_a(n) := \sum_{d \in {\mathcal S}_I: d|P(n)} \mu(d) \log^a \frac{P(n)}{d}

for some natural number {a}, which one can check is only non-vanishing when {P(n)} has at most {a} distinct prime factors in {I}. So {\nu(n)} is concentrated on those numbers {n} for which {n+h} already has few prime factors for {h \in {\mathcal H}}, which will assist in making the ratio {\alpha/\beta} as small as possible.

Clearly {\nu} is non-negative. Now we consider the task of estimating the left-hand side of (9). Expanding out {\nu = \lambda^2} using (12) and interchanging summations, we may write this expression as

\displaystyle \sum_{d_1,d_2 \in {\mathcal S}_I} \mu(d_1) g(\frac{\log d_1}{\log R}) \mu(d_2) g(\frac{\log d_2}{\log R}) \sum_{x \leq n \leq 2x: n = b \hbox{ mod } W: d_1,d_2 | P(n)} 1.

The constraint {d_1,d_2 | P(n)} is equivalent to requiring that for each prime {p} dividing {[d_1,d_2]}, {n} lies in one of the residue classes {-h_i \hbox{ mod } p} for {i=1,\ldots,k_0}. By choice of {I}, {p > w}, so the {h_i} are all distinct modulo {p}, and so we are constraining {n} to lie in one of {k_0} residue classes modulo {p} for each {p|[d_1,d_2]}; together with the constraint {n = b \hbox{ mod } W} and the Chinese remainder theorem, we are thus constraining {n} to {k_0^{\Omega([d_1,d_2])}} residue classes modulo {W [d_1,d_2]}, where {\Omega([d_1,d_2])} is the number of prime factors of {[d_1,d_2]}. We thus have

\displaystyle \sum_{x \leq n \leq 2x: n = b \hbox{ mod } W: d_1,d_2 | P(n)} 1 = k_0^{\Omega([d_1,d_2])} \frac{x}{W[d_1,d_2]} + O( k_0^{\Omega([d_1,d_2])} ).
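The residue-class count above can be verified concretely in a toy case (a sketch with a hypothetical small tuple and primes, using nothing beyond the Chinese remainder theorem):

```python
from math import prod

# Illustration of the residue-class count: for a prime p larger than the
# diameter of the tuple, the shifts -h are distinct mod p, so P(n) = 0
# mod p has exactly k0 roots, and CRT multiplies the counts.  The tuple
# H = (0, 2, 6) and the primes below are hypothetical small samples.
H = (0, 2, 6)
k0 = len(H)

def P(n):
    return prod(n + h for h in H)

for p in (7, 11, 13):
    assert len([n for n in range(p) if P(n) % p == 0]) == k0

d = 11 * 13                  # squarefree modulus with Omega(d) = 2
assert len([n for n in range(d) if P(n) % d == 0]) == k0 ** 2
```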

Note from the support of {g} that {d_1,d_2} may be constrained to be at most {R}, so that {d_1d_2} is at most {R^2}. We can thus express the left-hand side of (9) as the main term

\displaystyle \sum_{d_1,d_2 \in {\mathcal S}_I} \mu(d_1) g(\frac{\log d_1}{\log R}) \mu(d_2) g(\frac{\log d_2}{\log R}) k_0^{\Omega([d_1,d_2])} \frac{x}{W[d_1,d_2]}

plus an error

\displaystyle O( R^2 \sum_{d_1,d_2 \in {\mathcal S}_I} g(\frac{\log d_1}{\log R}) g(\frac{\log d_2}{\log R}) \frac{k_0^{\Omega(d_1)} k_0^{\Omega(d_2)} }{d_1 d_2} ).

By Proposition 10, the error term is {O( R^2 \log^{2k_0} R )}. So if we set

\displaystyle R := x^{\theta/2}

then the error term will certainly give a negligible contribution to (9) with plenty of room to spare. (But when we come to the more difficult sum (10), we will have much less room – only a superlogarithmic amount of room, in fact.) To show (9), it thus suffices to show that

\displaystyle \sum_{d_1,d_2 \in {\mathcal S}_I} \mu(d_1) g(\frac{\log d_1}{\log R}) \mu(d_2) g(\frac{\log d_2}{\log R}) \frac{k_0^{\Omega([d_1,d_2])}}{[d_1,d_2]}

\displaystyle \leq (\alpha+o(1)) (\frac{W}{\phi(W)})^{k_0} \log^{-k_0} R.

But by Proposition 10 (applied to the {k_0}-dimensional multiplicative function {k_0^{\Omega([d_1,d_2])}}) and the support of {g}, this bound holds with {\alpha} equal to the quantity

\displaystyle \alpha = \int_0^1 g^{(k_0)}(t)^2 \frac{t^{k_0-1}}{(k_0-1)!}\ dt.

Now we turn to (10). Fix {h \in {\mathcal H}}. Repeating the arguments for (9), we may expand the left-hand side of (10) as

\displaystyle \sum_{d_1,d_2 \in {\mathcal S}_I} \mu(d_1) g(\frac{\log d_1}{\log R}) \mu(d_2) g(\frac{\log d_2}{\log R})

\displaystyle \sum_{x \leq n \leq 2x: n = b \hbox{ mod } W: d_1,d_2 | P(n)} \theta(n+h).

Now we consider the inner sum

\displaystyle \sum_{x \leq n \leq 2x: n = b \hbox{ mod } W: d_1,d_2 | P(n)} \theta(n+h).

As discussed earlier, the conditions {n=b \hbox{ mod } W} and {d_1,d_2 | P(n)} split into {k_0^{\Omega([d_1,d_2])}} residue classes {n = a \hbox{ mod } W [d_1,d_2]}. However, if {n = -h \hbox{ mod } p} for one of the primes {p} dividing {[d_1,d_2]}, then {\theta(n+h)} must vanish (since such a prime {p} is at most {R^2 = x^\theta}, which is much less than {n+h}). So there are actually only {(k_0-1)^{\Omega([d_1,d_2])}} residue classes {a \hbox{ mod } W[d_1,d_2]} for which {a+h} is coprime to {W[d_1,d_2]}. We thus have

\displaystyle \sum_{x \leq n \leq 2x: n = a \hbox{ mod } W[d_1,d_2]} \theta(n+h) = (k_0-1)^{\Omega([d_1,d_2])} \frac{x}{\phi(W[d_1,d_2])}

\displaystyle + O( (k_0-1)^{\Omega([d_1,d_2])} E(x; W[d_1,d_2]) )

where {E(x;q)} denotes the quantity

\displaystyle E(x;q) := \sup_{a \in ({\bf Z}/q{\bf Z})^\times} |\sum_{x \leq n \leq 2x: n = a \hbox{ mod } q} \theta(n) - \frac{x}{\phi(q)}|. \ \ \ \ \ (13)


Remark 1 There is an inefficiency here; the supremum in (13) is over all primitive residue classes {a \hbox{ mod } q}, but actually one only needs to take the supremum over the {(k_0-1)^{\Omega(q)}} residue classes {a \hbox{ mod } q} for which {P_h(a) = 0 \hbox{ mod } q}, where {P_h(a) := \prod_{h' \in {\mathcal H}:h' \neq h} (a+h'-h)}. This inefficiency is not exploitable if we insist on using the Elliott-Halberstam conjecture as the starting hypothesis, but will be used in the arguments of the next section in which a more lightweight hypothesis is utilised.

The left-hand side of (10) is thus equal to the main term

\displaystyle \sum_{d_1,d_2 \in {\mathcal S}_I} \mu(d_1) g(\frac{\log d_1}{\log R}) \mu(d_2) g(\frac{\log d_2}{\log R}) (k_0-1)^{\Omega([d_1,d_2])} \frac{x}{\phi(W[d_1,d_2])}

plus an error term

\displaystyle O( \sum_{d_1,d_2 \in {\mathcal S}_I} g(\frac{\log d_1}{\log R}) g(\frac{\log d_2}{\log R}) (k_0-1)^{\Omega([d_1,d_2])} E(x; W[d_1,d_2]) ).

We first deal with the error term. Since {[d_1,d_2]} is in {{\mathcal S}_I} and is bounded by {R^2} on the support of this function, and each {d \in {\mathcal S}_I} has {3^{\Omega(d)}} representations of the form {d = [d_1,d_2]}, we can bound this expression by

\displaystyle O( \sum_{d \in {\mathcal S}_I: d \leq R^2} 3^{\Omega(d)} (k_0-1)^{\Omega(d)} E(x;Wd) ).
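The counting fact used here, that a squarefree {d} has {3^{\Omega(d)}} representations {d = [d_1,d_2]}, can be checked directly in a toy computation (the moduli below are hypothetical small samples):

```python
from math import lcm

# Each squarefree d has 3^{Omega(d)} representations d = [d1, d2]: every
# prime p | d divides d1 only, d2 only, or both.  Small squarefree samples:
def divisors(n):
    return [k for k in range(1, n + 1) if n % k == 0]

for d, omega in ((6, 2), (30, 3), (210, 4)):
    reps = [(a, b) for a in divisors(d) for b in divisors(d)
            if lcm(a, b) == d]
    assert len(reps) == 3 ** omega
```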

Note that we are assuming {g} to be a fixed smooth compactly supported function and so it has magnitude {O(1)}. On the other hand, from Proposition 10 and the trivial bound {E(x;Wd) = O( \frac{x \log x}{Wd} + \frac{x}{\phi(W) \phi(d)} )} we have

\displaystyle \sum_{d \in {\mathcal S}_I: d \leq R^2} 3^{2\Omega(d)} (k_0-1)^{2\Omega(d)} E(x;Wd) = O( x \log^{O(1)} x )

while from (8) (and here we crucially use the choice {R = x^{\theta/2}} of {R}) one easily verifies that

\displaystyle \sum_{d \in {\mathcal S}_I: d \leq R^2} E(x;Wd) = O( x \log^{-A} x )

for any fixed {A}. By the Cauchy-Schwarz inequality we see that the error term to (10) is negligible (assuming {w} sufficiently slowly growing of course). Meanwhile, the main term can be rewritten as

\displaystyle \frac{x}{\phi(W)} \sum_{d_1,d_2 \in {\mathcal S}_I} \mu(d_1) g(\frac{\log d_1}{\log R}) \mu(d_2) g(\frac{\log d_2}{\log R}) \frac{f([d_1,d_2])}{[d_1,d_2]}

where {f} is the {k_0-1}-dimensional multiplicative function

\displaystyle f(d) := \prod_{p|d} (k_0-1) \frac{p}{p-1}.

Applying Proposition 10, we obtain (10) with

\displaystyle \beta = \int_0^1 g^{(k_0-1)}(t)^2 \frac{t^{k_0-2}}{(k_0-2)!}\ dt.

To obtain the crucial inequality (11), we thus need to locate a fixed smooth non-negative function supported on {[-1,1]} obeying the inequality

\displaystyle k_0 \int_0^1 g^{(k_0-1)}(t)^2 \frac{t^{k_0-2}}{(k_0-2)!}\ dt > \frac{2}{\theta} \int_0^1 g^{(k_0)}(t)^2 \frac{t^{k_0-1}}{(k_0-1)!}\ dt . \ \ \ \ \ (14)


In principle one can use calculus of variations to optimise the choice of {g} here (it will be the ground state of a certain one-dimensional Schrödinger operator), but one can already get a fairly good result here by a specific choice of {g} that is amenable for computations, namely a polynomial of the form {g(t) := \frac{1}{(k_0+l_0)!} (1-t)^{k_0+l_0}} for {t \in [0,1]} and some integer {l_0>0}, with {g} vanishing for {t>1} and smoothly truncated to {[-1,1]} somehow at negative values of {t}. Strictly speaking, this {g} is not admissible here because it is not infinitely smooth at {1}, being only {k_0+l_0-1} times continuously differentiable instead, but one can regularise this function to be smooth without significantly affecting either side of (14), so we will go ahead and test (14) with this function and leave the regularisation details to the reader. The inequality then becomes (after cancelling some factors)

\displaystyle k_0 \int_0^1 (1-t)^{2l_0+2} \frac{t^{k_0-2}}{(k_0-2)!}\ dt > \frac{2}{\theta} \int_0^1 (l_0+1)^2 (1-t)^{2l_0} \frac{t^{k_0-1}}{(k_0-1)!}\ dt .

Using the Beta function identity

\displaystyle \int_0^1 (1-t)^a t^b\ dt = \frac{a! b!}{(a+b+1)!}

we have

\displaystyle \alpha = \frac{(2l_0)!}{(l_0!)^2 (k_0+2l_0)!} \ \ \ \ \ (15)



\displaystyle \beta = \frac{(2l_0+2)!}{((l_0+1)!)^2 (k_0+2l_0+1)!} \ \ \ \ \ (16)


and the preceding equation now becomes

\displaystyle k_0 \frac{(2l_0+2)!}{(2l_0+k_0+1)!} > \frac{2}{\theta} (l_0+1)^2 \frac{(2l_0)!}{(2l_0+k_0)!}

which simplifies to

\displaystyle 2\theta > (1 + \frac{1}{2l_0+1}) (1 + \frac{2l_0+1}{k_0}).

Actually, the same inequality is also applicable when {l_0} is real instead of integer, using Gamma functions in place of factorials; we leave the details to the interested reader. We can then optimise in {l_0} by setting {2l_0+1 = \sqrt{k_0}}, arriving at the inequality

\displaystyle 2\theta > (1 + \frac{1}{\sqrt{k_0}})^2.

But as long as {\theta > 1/2}, this inequality is satisfiable for any {k_0} larger than {(\sqrt{2\theta}-1)^{-2}}. This concludes the proof of Theorem 4.
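As a numerical sanity check on this computation (restricted to the monomial choice of {g} above, so the resulting {k_0} need not be optimal), one can search for the smallest {k_0} admitting an integer {l_0} that satisfies the final inequality:

```python
from math import isqrt

def satisfiable(theta, k0, l0):
    # the inequality 2*theta > (1 + 1/(2*l0+1)) * (1 + (2*l0+1)/k0)
    return 2 * theta > (1 + 1 / (2 * l0 + 1)) * (1 + (2 * l0 + 1) / k0)

def min_k0(theta):
    # smallest k0 for which some integer l0 > 0 makes the inequality hold
    k0 = 2
    while True:
        if any(satisfiable(theta, k0, l0)
               for l0 in range(1, 10 * isqrt(k0) + 10)):
            return k0
        k0 += 1

# For theta near 1 the continuous bound k0 > (sqrt(2*theta) - 1)^{-2}
# is about 6, and the integer search lands just above it.
print(min_k0(0.99))
```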

Remark 2 One can obtain slightly better dependencies of {k_0} in terms of {\theta} by using more general functions for {g} than the monomials {\frac{1}{(k_0+l_0)!} (1-x)^{k_0+l_0}}, for instance one can take linear combinations of such functions. See the paper of Goldston, Pintz, and Yildirim for details. Unfortunately, as noted in this survey of Soundararajan, one has the general inequality

\displaystyle k_0 \int_0^1 g^{(k_0-1)}(t)^2 \frac{t^{k_0-2}}{(k_0-2)!}\ dt \leq 4 \int_0^1 g^{(k_0)}(t)^2 \frac{t^{k_0-1}}{(k_0-1)!}\ dt \ \ \ \ \ (17)


which defeats any attempt to directly use this method using only the Bombieri-Vinogradov result that {EH[\theta]} holds for all {\theta < 1/2}. We show (17) in the case when {k_0} is large. Write {f(t) := g^{(k_0-1)}(t) t^{k_0/2-1}}, then (17) simplifies to

\displaystyle \frac{k_0 (k_0-1)}{4} \int_0^1 f(t)^2\ dt \leq \int_0^1 (f'(t) - (k_0/2-1) t^{-1} f(t))^2 t\ dt.

The right-hand side simplifies after some integration by parts to

\displaystyle \int_0^1 f'(t)^2 t + \frac{(k_0-2)^2}{4} f(t)^2 t^{-1}\ dt.

Subtracting off {\int_0^1 \frac{(k_0-2)^2}{4} f(t)^2\ dt} from both sides, one is left with

\displaystyle \frac{3k_0-4}{4} \int_0^1 f(t)^2\ dt \leq \int_0^1 f'(t)^2 t + \frac{(k_0-2)^2}{4} f(t)^2 (t^{-1} - 1)\ dt.

From the fundamental theorem of calculus (writing {f(y) = -\int_y^1 f'(t)\ dt}, which is permissible since {f(1)=0}) and Cauchy-Schwarz, one has the bound

\displaystyle |f(y)|^2 \leq (\int_0^1 f'(t)^2 t\ dt) (\log(1/y)).

Using this bound for {y} close to {1} and dominating {\frac{3k_0-4}{4}} by {\frac{(k_0-2)^2}{4} (y^{-1} - 1)} for {y} far from {1}, we obtain the claim (at least if {k_0} is large enough). There is some slack in this argument; it would be of interest to calculate exactly what the best constants are in (17), so that one can obtain the optimal relationship between {\theta} and {k_0}.
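For the monomial choice of {g}, the obstruction is easy to verify exactly: with {\alpha,\beta} as in (15), (16), the ratio {k_0 \beta/\alpha} simplifies to {2k_0(2l_0+1)/((l_0+1)(k_0+2l_0+1))}, which is always strictly less than {4}. A short exact-arithmetic check of this (a sketch, not part of the proof):

```python
from fractions import Fraction
from math import factorial

# Exact check of the obstruction (17) for the monomial choice of g:
# with alpha, beta as in (15), (16), the ratio k0*beta/alpha equals
# 2*k0*(2*l0+1)/((l0+1)*(k0+2*l0+1)), which is always < 4, so the
# requirement 2/theta < k0*beta/alpha can never be met with theta < 1/2.
def alpha(k0, l0):
    return Fraction(factorial(2 * l0),
                    factorial(l0) ** 2 * factorial(k0 + 2 * l0))

def beta(k0, l0):
    return Fraction(factorial(2 * l0 + 2),
                    factorial(l0 + 1) ** 2 * factorial(k0 + 2 * l0 + 1))

for k0 in range(2, 60):
    for l0 in range(1, 40):
        ratio = k0 * beta(k0, l0) / alpha(k0, l0)
        assert ratio < 4
        assert ratio == Fraction(2 * k0 * (2 * l0 + 1),
                                 (l0 + 1) * (k0 + 2 * l0 + 1))
```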

To get around this obstruction (17) in the unconditional setting when one only has {EH[\theta]} for {\theta<1/2}, Goldston, Pintz, and Yildirim also considered sums of the form {\sum_{x \leq n \leq 2x: n = b \hbox{ mod } W} \theta(n+h) \nu(n)} in which {h} was now outside (but close to) {{\mathcal H}}. While the bounds here were significantly inferior to those in (10), they were still sufficient to prove a variant of the inequality (11) to get reasonably small gaps between primes.

— 4. The Motohashi-Pintz-Zhang theorem —

We now modify the above argument to give Theorem 5. Our treatment here is different from that of Zhang in that it employs the method of Buchstab iteration; a related argument also appears in the paper of Motohashi and Pintz. This arrangement of the argument leads to a more efficient dependence of {k_0} on {\varpi} than in the paper of Zhang. (The argument of Motohashi and Pintz is a bit more complicated and uses a slightly different formulation of the base conjecture {MPZ[\varpi]}, but the final bounds are similar to those given here, albeit with non-explicit constants in the {O()} notation.)

The main idea here is to truncate the interval {I} of relevant primes from {(w,\infty)} to {(w,x^\varpi)} for some small {\varpi}. Somewhat remarkably, it turns out that this apparently severe truncation does not affect the sums (9), (10) here as long as {k_0 \varpi} is large (which is going to be the case in practice, with {k_0} being comparable to {\varpi^{-2}}). The intuition is that {\nu} was already concentrated on those {n} for which {P(n)} had about {O(k_0)} factors, and it is too “expensive” for one of these factors to be as large as {x^\varpi} or more, as it forces many of the other factors to be smaller than they “want” to be. The advantage of truncating the set of primes this way is that the version of the Elliott-Halberstam conjecture needed also acquires the same truncation, which gives that version a certain “well-factored” form (in the spirit of the work of Bombieri, Fouvry, Friedlander, and Iwaniec) which is essential in being able to establish that conjecture unconditionally for some suitably small {\varpi}.

To make this more precise, we first formalise the conjecture {MPZ[\varpi]} for {0 < \varpi < 1/4} mentioned earlier.

Conjecture 12 ({MPZ[\varpi]}) Let {{\mathcal H}} be a fixed {k}-tuple (not necessarily admissible) for some fixed {k \geq 2}, and let {b \hbox{ mod } W} be a primitive residue class. Then

\displaystyle \sum_{q \in {\mathcal S}_I: q< x^{1/2+2\varpi}} \sum_{a \in ({\bf Z}/q{\bf Z})^\times: P(a) = 0 \hbox{ mod } q} |\Delta_{b,W}(\theta; q,a)| = O( x \log^{-A} x)

for any fixed {A>0}, where {I = (w,x^{\varpi})} and {\Delta_{b,W}(\theta;q,a)} is the quantity

\displaystyle \Delta_{b,W}(\theta;q,a) := | \sum_{x \leq n \leq 2x: n=b \hbox{ mod } W; n = a \hbox{ mod } q} \theta(n) \ \ \ \ \ (18)


\displaystyle - \frac{1}{\phi(q)} \sum_{x \leq n \leq 2x: n = b \hbox{ mod } W} \theta(n)|.


and {P(a)} is the polynomial

\displaystyle P(a) := \prod_{h \in {\mathcal H}} (a+h).

This is the {W}-tricked formulation of the conjecture as (implicitly) stated in Zhang’s paper, which did not have the restriction {n = b \hbox{ mod } W} present (and with the interval {I} enlarged from {(w,x^\varpi)} to {(1,x^\varpi)}, and {{\mathcal H} \cup \{0\}} was required to be admissible). However the two formulations are morally equivalent (and Zhang’s arguments establish Theorem 6 with {MPZ[\varpi]} as stated). From the prime number theorem in arithmetic progressions (with {O( x \log^{-A} x)} error term) together with Proposition 10 we observe that we may replace (18) here by the slight variant

\displaystyle \Delta'_{b,W}(\theta;q,a) := | \sum_{x \leq n \leq 2x: n=b \hbox{ mod } W; n = a \hbox{ mod } q} \theta(n) \ \ \ \ \ (19)


\displaystyle - \frac{1}{\phi(Wq)} x|

without affecting the truth of {MPZ[\varpi]}.

It is also not difficult to deduce {MPZ[\varpi]} from {EH[1/2 + 2 \varpi]} after using a Cauchy-Schwarz argument to dispose of the {k^{\Omega(d)}} residue classes {a} in the above sum (cf. the treatment of the error term in (10) in the previous section); we leave the details to the interested reader. Note however that whilst {EH[1/2+2\varpi]} demands control over all primitive residue classes {a} in a given modulus {q}, the conjecture {MPZ[\varpi]} only requires control of a much smaller number of residue classes (roughly polylogarithmic in number, on average). Thus {MPZ[\varpi]} is simpler than {EH[1/2+2\varpi]}, though it is still far from trivial.

We now begin the proof of Theorem 5. Let {0 < \varpi < 1/4} be such that {MPZ[\varpi]} holds, and let {k_0} be a sufficiently large quantity depending on {\varpi} but which is otherwise fixed. As before, it suffices to locate a non-negative sieve weight {\nu} that obeys the estimates (9), (10) for parameters {\alpha,\beta,R} that obey the key inequality (11), and with {g} smooth and supported on {[-1,1]}. The choice of weight {\nu} is almost the same as before; it is also given as a square {\nu(n) = \lambda(n)^2} with {\lambda} given by (12), but now the interval {I} is truncated to {(w,x^\varpi)} instead of {(w,+\infty)}. Also, in this argument we take

\displaystyle R = x^{1/4 + \varpi}.

We now establish (9). By repeating the previous arguments, the left-hand side of (9) is equal to a main term

\displaystyle \sum_{d_1,d_2 \in {\mathcal S}_I} \mu(d_1) g(\frac{\log d_1}{\log R}) \mu(d_2) g(\frac{\log d_2}{\log R}) k_0^{\Omega([d_1,d_2])} \frac{x}{W[d_1,d_2]} \ \ \ \ \ (20)


plus an error term which continues to be acceptable (indeed, the error term is slightly smaller than in the previous case due to the truncated nature of {I}). At this point in the previous section we applied Proposition 10, but that proposition was only available for the untruncated interval {(w,+\infty)} instead of the truncated interval {(w,x^\varpi)}. One could try to adapt the proof of that proposition to the truncated case, but then one is faced with the problem of controlling the truncated zeta function {\zeta_I}. While one can eventually get some reasonable asymptotics for this function, it seems to be more efficient to eschew Fourier analysis and work entirely in “physical space” by the following partial Möbius inversion argument. Write {J := [x^\varpi,\infty)}, thus {I \cup J = (w,+\infty)}. Observe that for any {d \in {\mathcal S}_{I \cup J}}, the quantity {\sum_{a \in {\mathcal S}_J: a|d} \mu(a)} equals {1} when {d} lies in {{\mathcal S}_I} and vanishes otherwise. Hence, for any function {F(d_1,d_2)} of {d_1} and {d_2} supported on squarefree numbers we have the partial Möbius inversion formula

\displaystyle \sum_{d_1,d_2 \in {\mathcal S}_I} F(d_1,d_2) = \sum_{a_1, a_2 \in {\mathcal S}_J} \mu(a_1) \mu(a_2) \sum_{d_1,d_2 \in {\mathcal S}_{I \cup J}} F(a_1 d_1, a_2 d_2)

and so the main term (20) can be expressed as

\displaystyle \sum_{a_1, a_2 \in {\mathcal S}_J} \mu(a_1) \mu(a_2) \sum_{d_1,d_2 \in {\mathcal S}_{I \cup J}} \mu(a_1 d_1) g(\frac{\log a_1d_1}{\log R}) \mu(a_2 d_2) g(\frac{\log a_2 d_2}{\log R}) \ \ \ \ \ (21)


\displaystyle k_0^{\Omega([a_1d_1,a_2d_2])} \frac{x}{W [a_1d_1,a_2d_2]}.

We first dispose of the contribution to (21) when {a_i,d_j} share a common prime factor {p_* \in J} for some {i,j=1,2}. For any fixed {i,j}, we can bound this contribution by

\displaystyle \ll \frac{x}{W} \sum_{p_* \in J} \sum_{a_1,a_2 \in {\mathcal S}_J} \sum_{d_1,d_2 \in {\mathcal S}_{I \cup J}} 1_{p^2_*|a_id_j} (a_1 d_1 a_2 d_2)^{-1/\log R} \frac{k_0^{\Omega([a_1d_1,a_2d_2])}}{[a_1d_1,a_2d_2]}.

Factorising the inner two sums as an Euler product, this becomes

\displaystyle \ll \frac{x}{W} \sum_{p_* \in J} \frac{1}{p_*^2} \prod_{p \in I \cup J} (1 + O(\frac{1}{p^{1+1/\log R}})).

[UPDATE: The above argument is not quite correct; a corrected (and improved) version is given at this newer post.] The product is {O(\log^{O(1)} R)} by e.g. Mertens’ theorem, while {\sum_{p_* \in J} \frac{1}{p_*^2} \ll x^{-\varpi}}. So the contribution of this case is negligible.

If {a_i,d_j} do not share a common factor {p_* \in J} for any {i,j=1,2}, then we can factor {[a_1d_1,a_2d_2]} as {[a_1,a_2][d_1,d_2]}. Rearranging this portion of (21) and then reinserting the case when {a_i,d_j} have a common factor {p_* \in J} for some {i,j}, we may write (21) up to negligible errors as

\displaystyle \frac{x}{W} \sum_{a_1, a_2 \in {\mathcal S}_J} \frac{k_0^{\Omega([a_1,a_2])}}{[a_1,a_2]} \sum_{d_1,d_2 \in {\mathcal S}_{I \cup J}} \mu(d_1) g(\frac{\log d_1}{\log R} + \frac{\log a_1}{\log R})

\displaystyle \mu(d_2) g(\frac{\log d_2}{\log R} + \frac{\log a_2}{\log R}) \frac{k_0^{\Omega([d_1,d_2])}}{[d_1,d_2]}.

Note that we can restrict {a_1,a_2} to be at most {R} as otherwise the {g} factors vanish. The inner sum

\displaystyle \sum_{d_1,d_2 \in {\mathcal S}_{I \cup J}} \mu(d_1) g(\frac{\log d_1}{\log R} + \frac{\log a_1}{\log R}) \mu(d_2) g(\frac{\log d_2}{\log R} + \frac{\log a_2}{\log R}) \frac{k_0^{\Omega([d_1,d_2])}}{[d_1,d_2]}

is now of the form that can be treated by Proposition 10, and takes the form

\displaystyle (\frac{\phi(W)}{W} \log R)^{-k_0} (\int_0^\infty g^{(k_0)}(x + \frac{\log a_1}{\log R}) g^{(k_0)}(x + \frac{\log a_2}{\log R}) \frac{x^{k_0-1}}{(k_0-1)!}\ dx

\displaystyle + o(1)).

Here we make the technical remark that the translates of {g} by shifts between {0} and {1} are uniformly controlled in smooth norms, which means that the {o(1)} error here is uniform in the choices of {a_1, a_2}.

Let us first deal with the contribution of the {o(1)} error term. This is bounded by

\displaystyle o( \frac{x}{W} (\frac{\phi(W)}{W} \log R)^{-k_0} \sum_{a_1,a_2 \in {\mathcal S}_{(x^\varpi, R]}} \frac{k_0^{\Omega([a_1,a_2])}}{[a_1,a_2]} ).

The inner sum factorises as

\displaystyle \prod_{x^\varpi < p \leq R} (1 + \frac{3 k_0}{p})

which by Mertens’ theorem is {O(1)} (albeit with a rather large implied constant!), so this error is negligible for the purposes of (9). Indeed, (9) is now reduced to the inequality

\displaystyle \sum_{a_1,a_2 \in {\mathcal S}_J} \frac{k_0^{\Omega([a_1,a_2])}}{[a_1,a_2]}

\displaystyle \int_0^\infty g^{(k_0)}(x + \frac{\log a_1}{\log R}) g^{(k_0)}(x + \frac{\log a_2}{\log R}) \frac{x^{k_0-1}}{(k_0-1)!}\ dx \ \ \ \ \ (22)


\displaystyle \leq \alpha+o(1).

Note that the factor {\frac{x^{k_0-1}}{(k_0-1)!}} increases very rapidly with {x} when {k_0} is large, which basically means that any non-trivial shift of the {g^{(k_0)}} factors to the left by {\frac{\log a_1}{\log R}} or {\frac{\log a_2}{\log R}} will cause the integral in (22) to decrease dramatically. Since all the {a_1,a_2} in {{\mathcal S}_J} are either equal to {1} or bounded below by {x^\varpi}, this will cause the {a_1=a_2=1} term to dominate in the regime when {k_0 \varpi} is large (or more precisely {k_0 \varpi \gg \log k_0}), which is the case in applications.

At this point, in order to perform the computations cleanly, we will mimic the arguments from the previous section and take the explicit choice

\displaystyle g(x) := \frac{1}{(k_0+l_0)!} (1-x)_+^{k_0+l_0}

for some integer {l_0>0} and {x>0} (and some smooth continuation to {[-1,1]} for negative {x}), and so

\displaystyle g^{(k_0)}(x) = (-1)^{k_0} \frac{1}{l_0!} (1-x)^{l_0}_+

for positive {x}. (Again, this function is not quite smooth at {1}, but this issue can be dealt with by an infinitesimal regularisation argument which we omit here.) The left-hand side of (22) now becomes

\displaystyle \frac{1}{(l_0!)^2} \sum_{a_1,a_2 \in {\mathcal S}_J} \frac{k_0^{\Omega([a_1,a_2])}}{[a_1,a_2]} \int_0^\infty (1-x-\frac{\log a_1}{\log R})_+^{l_0} (1-x-\frac{\log a_2}{\log R})_+^{l_0}

\displaystyle \frac{x^{k_0-1}}{(k_0-1)!}\ dx.

The integral here is a little bit more complicated than a beta integral. To estimate it, we use the beta function identity to observe that

\displaystyle \int_0^\infty (1-x-\frac{\log a_1}{\log R})_+^{2l_0} \frac{x^{k_0-1}}{(k_0-1)!}\ dx = (1 - \frac{\log a_1}{\log R})_+^{k_0+2l_0} \frac{(2l_0)!}{(k_0+2l_0)!}


\displaystyle \int_0^\infty (1-x-\frac{\log a_2}{\log R})_+^{2l_0} \frac{x^{k_0-1}}{(k_0-1)!}\ dx = (1 - \frac{\log a_2}{\log R})_+^{k_0+2l_0} \frac{(2l_0)!}{(k_0+2l_0)!}

and hence by Cauchy-Schwarz

\displaystyle \int_0^\infty (1-x-\frac{\log a_1}{\log R})_+^{l_0} (1-x-\frac{\log a_2}{\log R})_+^{l_0} \frac{x^{k_0-1}}{(k_0-1)!}\ dx

\displaystyle \leq (1 - \frac{\log a_1}{\log R})_+^{k_0/2+l_0} (1 - \frac{\log a_2}{\log R})_+^{k_0/2+l_0} \frac{(2l_0)!}{(k_0+2l_0)!}.

This Cauchy-Schwarz step is a bit wasteful when {a_1,a_2} are far apart, but this seems to lead only to a minor loss of efficiency in the estimates. We have thus bounded the left-hand side of (22) by

\displaystyle \frac{(2l_0)!}{(l_0!)^2 (k_0+2l_0)!} \sum_{a_1,a_2 \in {\mathcal S}_J} \frac{k_0^{\Omega([a_1,a_2])}}{[a_1,a_2]}

\displaystyle (1 - \frac{\log a_1}{\log R})_+^{k_0/2+l_0} (1 - \frac{\log a_2}{\log R})_+^{k_0/2+l_0}.

It is now convenient to collapse the double summation to a single summation. We may bound

\displaystyle (1 - \frac{\log a_1}{\log R})_+^{k_0/2+l_0} (1 - \frac{\log a_2}{\log R})_+^{k_0/2+l_0} \leq (1 - \frac{\log [a_1,a_2]}{\log R^2})_+^{k_0/2+l_0}

(since {\frac{\log [a_1,a_2]}{\log R^2}} is at most the greater of {\frac{\log a_1}{\log R}} and {\frac{\log a_2}{\log R}}) and observe that each {a \in {\mathcal S}_J} has {3^{\Omega(a)}} representations of the form {a = [a_1,a_2]}, so we may now bound the left-hand side of (22) by

\displaystyle \frac{(2l_0)!}{(l_0!)^2 (k_0+2l_0)!} \sum_{a \in {\mathcal S}_J} \frac{(3k_0)^{\Omega(a)}}{a} (1 - \frac{\log a}{\log R^2})_+^{k_0/2+l_0}.

Note that an element {a} of {{\mathcal S}_J} is either equal to {1}, or lies in the interval {[x^{n\varpi}, x^{(n+1)\varpi})} for some natural number {n \geq 1}. In the latter case, we have

\displaystyle (1 - \frac{\log a}{\log R^2})_+ \leq (1 - \frac{2n \varpi}{1 + 4\varpi})_+.

In particular, this expression vanishes if {n \geq 2 + \frac{1}{2\varpi}}. We can thus bound the left-hand side of (22) by

\displaystyle \frac{(2l_0)!}{(l_0!)^2 (k_0+2l_0)!} (1 + \sum_{1 \leq n < 2 + \frac{1}{2\varpi}} (1 - \frac{2n \varpi}{1 + 4\varpi})^{k_0/2 + l_0}

\displaystyle \sum_{a \in {\mathcal S}_J: a < x^{(n+1)\varpi}} \frac{(3k_0)^{\Omega(a)}}{a} ).

If we introduce the quantity

\displaystyle \Phi_{3k_0}(z,y) := \sum_{j=0}^\infty (3k_0)^j \sum_{y \leq p_1 < \ldots < p_j: p_1 \ldots p_j < z} \frac{1}{p_1 \ldots p_j} \ \ \ \ \ (23)


then we have thus bounded the left-hand side of (22) by

\displaystyle \frac{(2l_0)!}{(l_0!)^2 (k_0+2l_0)!} (1 + \sum_{j=1}^\infty (1 - \frac{2j \varpi}{1 + 4\varpi})_+^{k_0/2 + l_0} \Phi_{3k_0}( x^{(j+1)\varpi}, x^\varpi )).

We observe that

\displaystyle \Phi_{3k_0}(z,y) = 1 \ \ \ \ \ (24)


when {y \geq z}, while in general we have the Buchstab identity

\displaystyle \Phi_{3k_0}(z,y) \leq 1 + 3k_0 \sum_{y \leq p < z} \frac{1}{p} \Phi_{3k_0}(\frac{z}{p}, p) \ \ \ \ \ (25)


as can be seen by isolating the smallest prime {p_1} in all the terms in (23) with {j \geq 1}. (This inequality is very close to being an identity, the only loss coming from the possibility of the prime factor {p} being repeated in a term associated to {\frac{1}{p} \Phi_{3k_0}(\frac{z}{p},p)}.) We can iterate this inequality to obtain the following conclusion:

Lemma 13 For any {n \geq 1}, we have

\displaystyle \Phi_{3k_0}(z,y) \leq \prod_{j=1}^{n-1} (1 + 3k_0 \log(1+\frac{1}{j})) + o(1)

whenever {z \leq y^n} and {y \geq x^\varpi} for some fixed {\varpi > 0}, with the error term being uniform in the choice of {z,y}.

Proof: Write {A_n := \prod_{j=1}^{n-1} (1 + 3k_0 \log(1+\frac{1}{j}))}. We prove the bound {\Phi_{3k_0}(z,y) \leq A_n + o(1)} by strong induction on {n}. The case {n=1} follows from (24). Now suppose that {n>1} and that the claim has already been proven for smaller {n}. Let {z \leq y^n} and {y > x^\varpi}. Note that {\frac{z}{p} \leq p^j} whenever {p \geq z^{\frac{1}{j+1}}}. We thus have from (25) and the induction hypothesis that

\displaystyle \Phi_{3k_0}(z,y) \leq 1 + 3k_0 \sum_{j=1}^{n-1} \sum_{z^{\frac{1}{j+1}} \leq p < z^{\frac{1}{j}}} \frac{1}{p} (A_j + o(1) );

applying Mertens’ theorem (or the prime number theorem) we have

\displaystyle \sum_{z^{\frac{1}{j+1}} \leq p < z^{\frac{1}{j}}} \frac{1}{p} ( A_j + o(1) ) = A_j \log(1 + \frac{1}{j}) + o(1)

and the claim follows from the telescoping identity

\displaystyle A_n = 1 + 3k_0 \sum_{j=1}^{n-1} A_j \log(1+\frac{1}{j}). \ \Box


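The product and recursive forms of {A_n} are easy to compare numerically; here is a quick Python check (the value {k_0 = 10} is arbitrary):

```python
import math

def A(n, k0):
    # A_n = prod_{j=1}^{n-1} (1 + 3*k0*log(1 + 1/j)); the empty product A_1 is 1
    prod = 1.0
    for j in range(1, n):
        prod *= 1 + 3 * k0 * math.log(1 + 1 / j)
    return prod

def A_recursive(n, k0):
    # right-hand side of the telescoping identity in Lemma 13
    return 1 + 3 * k0 * sum(A(j, k0) * math.log(1 + 1 / j) for j in range(1, n))

# the two expressions agree for every n
for n in range(1, 9):
    assert abs(A(n, 10) - A_recursive(n, 10)) <= 1e-9 * A(n, 10)
```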
Applying this inequality, we have established (22) with

\displaystyle \alpha := \frac{(2l_0)!}{(l_0!)^2 (k_0+2l_0)!} (1 + \kappa) \ \ \ \ \ (26)



\displaystyle \kappa := \sum_{1 \leq n < 2 + \frac{1}{2\varpi}} (1 - \frac{2n \varpi}{1 + 4\varpi})^{k_0/2 + l_0} \prod_{j=1}^{n} (1 + 3k_0 \log(1+\frac{1}{j})). \ \ \ \ \ (27)


We remark that as a first approximation we have

\displaystyle \prod_{j=1}^{n} (1 + 3k_0 \log(1+\frac{1}{j})) \approx \frac{(3k_0)^{n}}{n!}


\displaystyle (1 - \frac{2n \varpi}{1 + 4\varpi})^{k_0/2 + l_0} \approx \exp( - n k_0 \varpi )

so in the regime {k_0 \varpi \gg \log k_0}, {\kappa} is roughly {3k_0 \exp( - k_0 \varpi )}, which will be negligible for the parameter ranges of {k_0,\varpi} of interest. Thus the {\alpha} in this argument is quite close to the {\alpha} from (15) in practice.
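One can get a feel for the size of {\kappa} by evaluating (27) directly. The following Python sketch computes each summand in log-space to avoid overflow in the product and underflow in the power; the parameter values {k_0 = 10^5}, {l_0 = 50}, {\varpi = 1/300} are purely illustrative, chosen to lie in the regime {k_0 \varpi \gg \log k_0}:

```python
import math

def kappa27(k0, l0, varpi):
    # the quantity kappa from (27), each summand computed in log-space
    total = 0.0
    n = 1
    while n < 2 + 1 / (2 * varpi):
        base = 1 - 2 * n * varpi / (1 + 4 * varpi)
        if base > 0:
            log_term = (k0 / 2 + l0) * math.log(base)
            for j in range(1, n + 1):
                log_term += math.log1p(3 * k0 * math.log1p(1 / j))
            if log_term > -700:      # math.exp underflows to 0 near -745
                total += math.exp(log_term)
        n += 1
    return total

# here k0*varpi ~ 333 >> log(k0) ~ 11.5, and kappa is utterly negligible
assert 0 < kappa27(10 ** 5, 50, 1 / 300) < 1e-100
```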

Now we turn to (10). Fix {h \in {\mathcal H}}. As in the previous section, we can bound the left-hand side of (10) as the sum of the main term

\displaystyle \sum_{d_1,d_2 \in {\mathcal S}_I} \mu(d_1) g(\frac{\log d_1}{\log R}) \mu(d_2) g(\frac{\log d_2}{\log R}) (k_0-1)^{\Omega([d_1,d_2])} \frac{x}{\phi(W[d_1,d_2])}

plus an error term

\displaystyle O( \sum_{d_1,d_2 \in {\mathcal S}_I} g(\frac{\log d_1}{\log R}) g(\frac{\log d_2}{\log R}) (k_0-1)^{\Omega([d_1,d_2])} E'(x; [d_1,d_2]) )

where {E'(x;q)} is the quantity

\displaystyle E'(x;q) := \sum_{a \in ({\bf Z}/q{\bf Z})^\times: P_h(a) = 0 \hbox{ mod } q} |\Delta'_{b,W}(\theta; q,a)|,

{P_h} is the polynomial {P_h(a) := \prod_{h' \in {\mathcal H} \backslash \{h\}} (a+h'-h)}, and {\Delta'_{b,W}} was defined in (19). Using the hypothesis {MPZ[\varpi]} and Cauchy-Schwarz as in the previous section we see that the error term is negligible for the purposes of establishing (10). As for the main term, the same argument used to reduce (9) to (22) shows that (10) reduces to

\displaystyle \sum_{a_1,a_2 \in {\mathcal S}_J} \frac{(k_0-1)^{\Omega([a_1,a_2])}}{\phi([a_1,a_2])} \int_0^\infty g^{(k_0-1)}(x + \frac{\log a_1}{\log R}) g^{(k_0-1)}(x + \frac{\log a_2}{\log R}) \frac{x^{k_0-2}}{(k_0-2)!}\ dx

\displaystyle \geq \beta-o(1).

Here, we can do something a bit crude; with our choice of {g}, the integrand is non-negative, so we can simply discard all but the {a_1=a_2=1} term and reduce to

\displaystyle \int_0^\infty g^{(k_0-1)}(x) g^{(k_0-1)}(x) \frac{x^{k_0-2}}{(k_0-2)!}\ dx \geq \beta.

(The intuition here is that by refusing to sieve using primes larger than {x^\varpi}, we have enlarged the sieve {\nu}, which makes the upper bound (9) more difficult but the lower bound (10) actually becomes easier.) So we can take the same choice (16) of {\beta} as in the previous section:

\displaystyle \beta := \frac{(2l_0+2)!}{((l_0+1)!)^2 (k_0+2l_0+1)!}.

Inserting this and (26) into (11) and simplifying, we see that we can obtain {DHL[k_0,2]} once we can verify the inequality

\displaystyle 1+4\varpi > (1 + \frac{1}{2l_0+1}) (1 + \frac{2l_0+1}{k_0}) (1+\kappa).

As before, {l_0} can be taken to be non-integer if desired. Setting {k_0} to be slightly larger than {(\sqrt{1+4\varpi}-1)^{-2} \approx (2\varpi)^{-2}} we obtain Theorem 5.
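To make this concrete: treating {m = 2l_0+1} as a continuous parameter, the middle two factors above are minimised at {m = \sqrt{k_0}}, where they become {(1+1/\sqrt{k_0})^2}, and one can then search numerically for the smallest acceptable {k_0}. The Python sketch below neglects {\kappa} by default, and the value {\varpi = 0.01} is illustrative:

```python
import math

def min_k0(varpi, kappa=0.0):
    # smallest k0 with (1 + 1/sqrt(k0))**2 * (1 + kappa) < 1 + 4*varpi,
    # i.e. the displayed criterion after optimizing 2*l0 + 1 = sqrt(k0)
    k0 = 2
    while (1 + 1 / math.sqrt(k0)) ** 2 * (1 + kappa) >= 1 + 4 * varpi:
        k0 += 1
    return k0

# matches the prediction: k0 is the first integer above (sqrt(1+4*varpi)-1)**(-2)
assert min_k0(0.01) == math.ceil((math.sqrt(1.04) - 1) ** (-2))
```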

— 5. Using optimal values of {g} (NEW, June 5, 2013) —

We can do better than given above by using an optimal value of {g}. The following result was obtained recently by Farkas, Pintz, and Revesz, and independently worked out by commenters on this blog:

Theorem 14 (Optimal GPY weight) Let {k_0 > 2} be an integer. Then the ratio

\displaystyle \frac{\int_0^1 f'(x)^2 x^{k_0-1}\ dx}{\int_0^1 f(x)^2 x^{k_0-2}\ dx}

where {f: [0,1] \rightarrow {\bf R}} is a smooth function with {f(1)=0} that is not identically vanishing, has a minimal value of

\displaystyle \lambda := \frac{j_{k_0-2}^2}{4}

where {j_{k_0-2}} is the first zero of the Bessel function {J_{k_0-2}}. Furthermore, this minimum is attained if (and only if) {f} is a scalar multiple of the function

\displaystyle f_0(x) = x^{1-k_0/2} J_{k_0-2}(2\sqrt{x\lambda}).

Proof: The function {J_{k_0-2}}, by definition, obeys the Bessel differential equation

\displaystyle x^2 \frac{d^2}{dx^2} J_{k_0-2} + x \frac{d}{dx} J_{k_0-2} + (x^2 - (k_0-2)^2) J_{k_0-2} = 0

and also vanishes to order {k_0-2} at the origin. From this and routine computations it is easy to see that {f_0} is smooth, strictly positive on {[0,1)}, and obeys the differential equation

\displaystyle \frac{d}{dx} (x^{k_0-1} \frac{d}{dx} f_0(x)) + \lambda x^{k_0-2} f_0(x) = 0. \ \ \ \ \ (28)


If we write {g_0(x) := \frac{f'_0}{f_0}(x)}, which is well-defined away from {1} since {f_0} is non-vanishing on {[0,1)}, then {g_0} obeys the Riccati-type equation

\displaystyle (k_0-1) g_0(x) + x g'_0(x) + x g_0(x)^2 + \lambda = 0. \ \ \ \ \ (29)


Now consider the quadratic form

\displaystyle Q( f ) := \int_0^1 f'(x)^2 x^{k_0-1}\ dx - \lambda \int_0^1 f(x)^2 x^{k_0-2}\ dx

for smooth functions {f: [0,1] \rightarrow {\bf R}} with {f(1)=0}. A calculation using (29) and integration by parts shows that

\displaystyle \int_0^1 (f'(x)-g_0(x)f(x))^2 x^{k_0-1}\ dx = Q(f)

and so {Q(f) \geq 0}, giving the first claim; the second claim follows by noting that {f'-g_0 f} vanishes if and only if {f} is a scalar multiple of {f_0}. (Note that the integration by parts is a little subtle, because {f_0} vanishes to first order at {x=1} and so {g_0} blows up to first order; but {g_0 f^2} still vanishes to first order at {x=1}, allowing one to justify the integration by parts by a standard limiting argument.) \Box
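To see how much Theorem 14 gains over the monomial weights used earlier, one can compute the ratio exactly for the trial functions {f(x) = (1-x)^a} via the Beta integral {\int_0^1 (1-x)^p x^q\ dx = \frac{p! q!}{(p+q+1)!}}. A short exact computation in Python (the choice {k_0 = 4}, for which the optimal constant is {j_2^2/4 = 6.5936\ldots}, is illustrative):

```python
from fractions import Fraction
from math import factorial

def ratio(a, k0):
    # for f(x) = (1-x)^a, both integrals in Theorem 14 are Beta integrals:
    #   int_0^1 (1-x)^p x^q dx = p! q! / (p+q+1)!
    num = a * a * Fraction(factorial(2 * a - 2) * factorial(k0 - 1),
                           factorial(2 * a + k0 - 2))
    den = Fraction(factorial(2 * a) * factorial(k0 - 2),
                   factorial(2 * a + k0 - 1))
    return num / den

# for k0 = 4 the monomials stay strictly above the optimal constant 6.5936...,
# with the best exponent a = 2 giving exactly 7
best = min(ratio(a, 4) for a in range(1, 8))
assert best == 7 and ratio(2, 4) == 7
```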

If we now test (14) with a function {g: [0,1] \rightarrow {\bf R}} which is smooth, vanishes to order {k_0} at {x=1}, and has a {(k_0-1)^{th}} derivative equal to {f_0}, we see that we can deduce {DHL[k_0,2]} from {EH[\theta]} whenever

\displaystyle 2\theta > \frac{j_{k_0-2}^2}{k_0(k_0-1)}.

Using the known asymptotic

\displaystyle j_n = n + c n^{1/3} + O( n^{-1/3} )

for {c := 1.8557571\ldots} and large {n} (see e.g. Abramowitz and Stegun), this is asymptotically of the form

\displaystyle 2 \theta > 1 + 2c k_0^{-2/3} + O( k_0^{-1} ),

or equivalently

\displaystyle k_0 > (1+o(1)) (\frac{2c}{2\theta-1})^{3/2},

thus giving a relationship of the form {k_0 \sim (\theta-1/2)^{-3/2}} that is superior to the previous relationship {k_0 \sim (\theta-1/2)^{-2}}.
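This asymptotic is easy to test numerically. The Python sketch below evaluates {J_n} via Bessel's integral {J_n(x) = \frac{1}{\pi} \int_0^\pi \cos(nt - x \sin t)\ dt} (valid for integer {n}) and locates {j_n} by bisection; the quadrature resolution and the bracketing interval are heuristic choices:

```python
import math

def bessel_j(n, x, m=4000):
    # J_n(x) = (1/pi) * int_0^pi cos(n*t - x*sin(t)) dt, by composite Simpson
    h = math.pi / m
    total = 0.0
    for i in range(m + 1):
        w = 1 if i in (0, m) else (4 if i % 2 else 2)
        t = i * h
        total += w * math.cos(n * t - x * math.sin(t))
    return total * h / (3 * math.pi)

def first_zero(n):
    # j_n lies in (n, n + 3*n**(1/3) + 0.5) for n >= 1 (heuristic bracket);
    # J_n is positive on (0, j_n), so bisect on the sign
    lo, hi = float(n), n + 3 * n ** (1 / 3) + 0.5
    for _ in range(40):
        mid = (lo + hi) / 2
        if bessel_j(n, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# j_1 = 3.8317..., and (j_n - n)/n**(1/3) approaches c = 1.8557... from above
assert abs(first_zero(1) - 3.8317) < 1e-3
assert abs((first_zero(50) - 50) / 50 ** (1 / 3) - 1.8557571) < 0.1
# with theta = 0.6, the criterion 2*theta > j_{k0-2}^2/(k0*(k0-1)) fails at
# k0 = 50 but holds at k0 = 80
assert first_zero(48) ** 2 / (50 * 49) > 1.2 > first_zero(78) ** 2 / (80 * 79)
```

The last assertion checks the criterion {2\theta > j_{k_0-2}^2/(k_0(k_0-1))} at {\theta = 0.6} for two sample values of {k_0}.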

A similar argument can be given for Theorem 5, using {g} of the form above rather than a monomial {\frac{(1-x)^{k_0+l_0}}{(k_0+l_0)!}} (and extended by zero to {[1,+\infty)}). For future optimisation we consider a generalisation {MPZ[\varpi,\delta]} of {MPZ[\varpi]} in which the interval {I} is of the form {[w,x^\delta)} rather than {[w,x^\varpi)}, so that {J} is now {[x^\delta,\infty)} rather than {[x^\varpi,\infty)}. As before, the key point is the estimation of {\alpha}. The arguments leading to (22) go through for any test function {g}, so we have to show

\displaystyle \sum_{a_1,a_2 \in {\mathcal S}_J} \frac{k_0^{\Omega([a_1,a_2])}}{[a_1,a_2]} \int_0^\infty g^{(k_0)}(x + \frac{\log a_1}{\log R}) g^{(k_0)}(x + \frac{\log a_2}{\log R}) \frac{x^{k_0-1}}{(k_0-1)!}\ dx \leq \alpha+o(1).

As {g} has {(k_0-1)^{th}} derivative equal to {f_0}, this is

\displaystyle \sum_{a_1,a_2 \in {\mathcal S}_J} \frac{k_0^{\Omega([a_1,a_2])}}{[a_1,a_2]} \int_0^{1-\frac{\log \max(a_1,a_2)}{\log R}} f'_0(x + \frac{\log a_1}{\log R}) f'_0(x + \frac{\log a_2}{\log R}) \frac{x^{k_0-1}}{(k_0-1)!}\ dx \leq \alpha+o(1). \ \ \ \ \ (30)

We need some sign information on {f'_0}:

Lemma 15 On {[0,1)}, {f_0} is positive, {f'_0} is negative and {f''_0} is positive.

Proof: From (28) we have

\displaystyle x f''_0(x) + (k_0-1) f'_0(x) + \lambda f_0(x) = 0.

From construction we already know that {f_0} is positive on {[0,1)}. The above equation then shows that {f'_0} is negative at {x=0}, and that {f_0} cannot have any local minimum in {(0,1)}, so {f'_0} is negative throughout. To obtain the final claim {f''_0>0} we use an argument provided by Gergely Harcos in the comments: from the recursive relations for Bessel functions we can check that {f''_0} is a positive multiple of {x^{-k_0/2} J_{k_0}(2 \sqrt{x\lambda})}, and the claim then follows from the interlacing properties of the zeroes of Bessel functions. \Box

Write {A := \int_0^1 f_0(x)^2 \frac{x^{k_0-2}}{(k_0-2)!}\ dx}, so {A} is positive and by Theorem 14 we have

\displaystyle \int_0^1 f'_0(x)^2 \frac{x^{k_0-1}}{(k_0-1)!}\ dx = \frac{j_{k_0-2}^2}{4(k_0-1)} A.

If {a_1 < R}, then as {f'_0} is negative and increasing we have

\displaystyle -f'_0(x + \frac{\log a_1}{\log R}) \leq -f'_0( x / (1-\frac{\log a_1}{\log R}) )

for {0 \leq x \leq 1 - \frac{\log a_1}{\log R}}, and thus by change of variable

\displaystyle \int_0^{1-\frac{\log a_1}{\log R}} f'_0(x + \frac{\log a_1}{\log R})^2 \frac{x^{k_0-1}}{(k_0-1)!}\ dx \leq (1-\frac{\log a_1}{\log R})^{k_0} \frac{j_{k_0-2}^2}{4(k_0-1)} A

for {a_1 < R}, and thus

\displaystyle \int_0^{1-\frac{\log a_1}{\log R}} f'_0(x + \frac{\log a_1}{\log R})^2 \frac{x^{k_0-1}}{(k_0-1)!}\ dx \leq (1-\frac{\log a_1}{\log R})_+^{k_0} \frac{j_{k_0-2}^2}{4(k_0-1)} A

for all {a_1}. Similarly

\displaystyle \int_0^{1-\frac{\log a_2}{\log R}} f'_0(x + \frac{\log a_2}{\log R})^2 \frac{x^{k_0-1}}{(k_0-1)!}\ dx \leq (1-\frac{\log a_2}{\log R})^{k_0}_+ \frac{j_{k_0-2}^2}{4(k_0-1)} A

for all {a_2}. By Cauchy-Schwarz we can thus bound the integral in (30) by

\displaystyle (1-\frac{\log a_1}{\log R})_+^{k_0/2} (1-\frac{\log a_2}{\log R})_+^{k_0/2} \frac{j_{k_0-2}^2}{4(k_0-1)} A

and so (30) reduces to

\displaystyle \frac{j_{k_0-2}^2}{4(k_0-1)} A \sum_{a_1,a_2 \in {\mathcal S}_J} \frac{k_0^{\Omega([a_1,a_2])}}{[a_1,a_2]} (1-\frac{\log a_1}{\log R})_+^{k_0/2} (1-\frac{\log a_2}{\log R})_+^{k_0/2} \leq \alpha + o(1).

Repeating the arguments of the previous section, we can reduce this to

\displaystyle \frac{j_{k_0-2}^2}{4(k_0-1)} A \sum_{a \in {\mathcal S}_J} \frac{(3k_0)^{\Omega(a)}}{a} (1-\frac{\log a}{\log R})_+^{k_0/2} \leq \alpha + o(1)

and by further continuing the arguments of the previous section we end up being able to take

\displaystyle \alpha = \frac{j_{k_0-2}^2}{4(k_0-1)} A (1 + \kappa)


\displaystyle \kappa := \sum_{1 \leq n < \frac{1+4\varpi}{2\delta}} (1 - \frac{2n \delta}{1 + 4\varpi})^{k_0/2} \prod_{j=1}^{n} (1 + 3k_0 \log(1+\frac{1}{j})). \ \ \ \ \ (31)


Also, the previous arguments allow us to take

\displaystyle \beta = A.

The key inequality (11) now becomes

\displaystyle 1 + 4\varpi > \frac{j_{k_0-2}^2}{k_0(k_0-1)} (1+\kappa), \ \ \ \ \ (32)


thus {MPZ[\varpi,\delta]} implies {DHL[k_0,2]} whenever (32) is obeyed with the value (31) of {\kappa}.
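As with (27), this {\kappa} is only negligible when {k_0 \delta} is large compared to {\log k_0}, which is the price one pays for lowering {\delta}. A log-space evaluation of (31) in Python (all parameter values illustrative) makes the sensitivity to {\delta} visible:

```python
import math

def kappa31(k0, varpi, delta):
    # the quantity kappa from (31), each summand computed in log-space
    total, n = 0.0, 1
    while n < (1 + 4 * varpi) / (2 * delta):
        base = 1 - 2 * n * delta / (1 + 4 * varpi)
        if base > 0:
            log_term = (k0 / 2) * math.log(base)
            for j in range(1, n + 1):
                log_term += math.log1p(3 * k0 * math.log1p(1 / j))
            if log_term > -700:      # math.exp underflows to 0 near -745
                total += math.exp(log_term)
        n += 1
    return total

# illustrative: k0 = 10**4 and varpi = 1/500; with delta = 1/200 (k0*delta = 50)
# kappa is negligible, while delta = 1/1000 (k0*delta = 10 ~ log k0) ruins it
assert kappa31(10 ** 4, 1 / 500, 1 / 200) < 1e-15
assert kappa31(10 ** 4, 1 / 500, 1 / 1000) > 0.5
```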