This is the third thread for the Polymath8b project to obtain new bounds for the quantity

\displaystyle H_m := \liminf_{n \rightarrow\infty} (p_{n+m} - p_n),

either for small values of {m} (in particular {m=1,2}) or asymptotically as {m \rightarrow \infty}. The previous thread may be found here. The currently best known bounds on {H_m} are:

  • (Maynard) Assuming the Elliott-Halberstam conjecture, {H_1 \leq 12}.
  • (Polymath8b, tentative) {H_1 \leq 330}. Assuming Elliott-Halberstam, {H_2 \leq 330}.
  • (Polymath8b, tentative) {H_2 \leq 484{,}126}. Assuming Elliott-Halberstam, {H_4 \leq 493{,}408}.
  • (Polymath8b) {H_m \leq \exp( 3.817 m )} for sufficiently large {m}. Assuming Elliott-Halberstam, {H_m \ll e^{2m} m \log m} for sufficiently large {m}.

Much of the current focus of the Polymath8b project is on the quantity

\displaystyle M_k = M_k({\cal R}_k) := \sup_F \frac{\sum_{m=1}^k J_k^{(m)}(F)}{I_k(F)}

where {F} ranges over square-integrable functions on the simplex

\displaystyle {\cal R}_k := \{ (t_1,\ldots,t_k) \in [0,+\infty)^k: t_1+\ldots+t_k \leq 1 \}

with {I_k, J_k^{(m)}} being the quadratic forms

\displaystyle I_k(F) := \int_{{\cal R}_k} F(t_1,\ldots,t_k)^2\ dt_1 \ldots dt_k


\displaystyle J_k^{(m)}(F) := \int_{{\cal R}_{k-1}} (\int_0^{1-\sum_{i \neq m} t_i} F(t_1,\ldots,t_k)\ dt_m)^2

\displaystyle dt_1 \ldots dt_{m-1} dt_{m+1} \ldots dt_k.

It was shown by Maynard that one has {H_m \leq H(k)} whenever {M_k > 4m}, where {H(k)} is the narrowest diameter of an admissible {k}-tuple. As discussed in the previous post, we have slight improvements to this implication, but they are currently difficult to implement, due to the need to perform high-dimensional integration. The quantity {M_k} does seem however to be close to the theoretical limit of what the Selberg sieve method can achieve for implications of this type (at the Bombieri-Vinogradov level of distribution, at least); it seems of interest to explore more general sieves, although we have not yet made much progress in this direction.

The best asymptotic bounds for {M_k} we have are

\displaystyle \log k - \log\log\log k + O(1) \leq M_k \leq \frac{k}{k-1} \log k \ \ \ \ \ (1)


which we prove below the fold. The upper bound holds for all {k > 1}; the lower bound is only valid for sufficiently large {k}, and gives the upper bound {H_m \ll e^{2m} \log m} on Elliott-Halberstam.

For small {k}, the upper bound is quite competitive, for instance it provides the upper bound in the best values

\displaystyle 1.845 \leq M_4 \leq 1.848


\displaystyle 2.001162 \leq M_5 \leq 2.011797

we have for {M_4} and {M_5}. The situation is a little less clear for medium values of {k}, for instance we have

\displaystyle 3.95608 \leq M_{59} \leq 4.148

and so it is not yet clear whether {M_{59} > 4} (which would imply {H_1 \leq 300}). See this wiki page for some further upper and lower bounds on {M_k}.

The best lower bounds are not obtained through the asymptotic analysis, but rather through quadratic programming (extending the original method of Maynard). This has given significant numerical improvements to our best bounds (in particular lowering the {H_1} bound from {600} to {330}), but we have not yet been able to combine this method with the other potential improvements (enlarging the simplex, using MPZ distributional estimates, and exploiting upper bounds on two-point correlations) due to the computational difficulty involved.

— 1. Upper bound —

We now prove the upper bound in (1). The key estimate is

\displaystyle (\int_0^{1-t_2-\ldots-t_k} F(t_1,\ldots,t_k)\ dt_1)^2 \ \ \ \ \ (2)


\displaystyle \leq \frac{\log k}{k-1} \int_0^{1-t_2-\ldots-t_k} F(t_1,\ldots,t_k)^2 (1 - t_1-\ldots-t_k+ kt_1)\ dt_1.

Assuming this estimate, we may integrate in {t_2,\ldots,t_k} to conclude that

\displaystyle J_k^{(1)}(F) \leq \frac{\log k}{k-1} \int F^2 (1-t_1-\ldots-t_k+kt_1)\ dt_1 \ldots dt_k

which symmetrises to

\displaystyle \sum_{m=1}^k J_k^{(m)}(F) \leq k \frac{\log k}{k-1} \int F^2\ dt_1 \ldots dt_k

giving the upper bound in (1).

It remains to prove (2). By Cauchy-Schwarz, it suffices to show that

\displaystyle \int_0^{1-t_2-\ldots-t_k} \frac{dt_1}{1 - t_1-\ldots-t_k+ kt_1} \leq \frac{\log k}{k-1}.

But writing {s = t_2+\ldots+t_k}, the left-hand side evaluates to

\displaystyle \frac{1}{k-1} (\log k(1-s) - \log (1-s) ) = \frac{\log k}{k-1}

as required.

This also suggests that extremal {F(t_1,\ldots,t_k)} behave like {\tilde F(t_2,\ldots,t_k) / (1-t_1-\ldots-t_k + kt_1)} for some function {\tilde F}, and similarly for permutations. However, it is not possible to have exact equality in all variables simultaneously, which indicates that the upper bound (1) is not optimal, although in practice it does remarkably well for small {k}.

— 2. Lower bound —

Using the notation of this previous post, we have the lower bound

\displaystyle M_k \geq \frac{m_1^2}{m_2} {\bf P}( X_1 + \ldots + X_{k-1} \leq k - T )

whenever {g} is supported on {[0,T]}, {m_i := \int_0^T g(t)^i\ dt}, and {X_1,\ldots,X_k} are independent random variables on {[0,T]} with density {\frac{1}{m_2} g(t)^2\ dt}. We select the function

\displaystyle g(t) := \frac{1}{1 + A t} 1_{[0,T]}(t)

with {A := \log k}, and {T := \varepsilon \frac{k}{A}} for some {0 < \varepsilon < 1} to be chosen later. We have

\displaystyle m_1 = \log(1+AT) = \log( 1 + \varepsilon k)


\displaystyle m_2 = \int_0^T \frac{1}{(1+At)^2}\ dt

\displaystyle = \frac{1}{A} - \frac{1}{A (1+AT)}

\displaystyle \leq \frac{1}{A} = \log k

and so

\displaystyle M_k \geq \frac{\log^2(1+\varepsilon k)}{\log k} {\bf P}( X_1 + \ldots + X_{k-1} \leq k - T ).

Observe that the random variables {X_i} have mean

\displaystyle \mu = \frac{1}{m_2} \frac{1}{A^2} (\log(1+AT)-1+\frac{1}{1+AT})

\displaystyle = (A + \frac{1}{\varepsilon k}) (\log(1+\varepsilon k)-1+\frac{1}{1+\varepsilon k})

\displaystyle \leq 1 - \frac{1}{\log k} + O( \frac{\log k}{\varepsilon k} ).

The variance { \sigma^2} may be bounded crudely by

\displaystyle \sigma^2 \leq \frac{1}{m_2} \int_0^T \frac{t^2}{(1+At)^2}\ dt

\displaystyle = O( A \frac{T}{A^2} ) = O( \frac{\varepsilon k}{\log^2 k} ).

Thus the random variable {X_1 + \ldots + X_{k-1}} has mean at most {k - \frac{k}{\log k} + O( \frac{\log k}{\varepsilon} )} and variance {O( \frac{\varepsilon k^2}{\log^2 k} )}, with each variable bounded in magnitude by {T = \frac{\varepsilon k}{\log k}}. By Hoeffding’s inequality, this implies that {X_1 + \ldots + X_{k-1}} is at least { k - T} with probability at most {O( \exp(- c / \varepsilon^2 )} for some absolute constant {c}. If we set { \varepsilon := C / (\log \log k)^{1/2}} for a sufficiently large absolute constant {C}, we thus have

\displaystyle {\bf P}( X_1 + \ldots + X_{k-1} \leq k - T ) = 1 - O( 1 / \log k)

and thus

\displaystyle M_k \geq \log k - \log\log\log k + O(1)

giving the lower bound in (1).

Hoeffding’s bound is proven by the exponential moment method, which more generally gives the bound

\displaystyle M_k \geq \frac{m_1^2}{m_2} (1 - e^{-s(k-T)} (\frac{1}{m_2} \int_0^T g(t)^2 e^{st}\ dt)^{k-1} )

for any {s > 0}. However, this bound is inferior to the linear algebra method for small {k}; for instance, we can only obtain {DHL[k,2]} for {k=339} by this method (compared to {k=64} from quadratic programming), even if one uses the best available MPZ estimate.

Using {MPZ[\varpi,\delta]}, one can modify this lower bound, obtaining {DHL[k_0,m+1]} whenever we can find {\varpi, \delta, A, T, s, s'} obeying

\displaystyle \frac{m_1^2}{m_2} ( 1 - \kappa_1 - \kappa_2 ) > \frac{m}{1/4+\varpi}


\displaystyle \kappa_1 := e^{-s(k-T)} (\frac{1}{m_2}\int_0^T e^{st} g(t)^2\ dt)^{k-1}


\displaystyle \kappa_2 := e^{s' \theta k} (\frac{1}{m_2}\int_0^{\tilde \delta k} e^{-s't} g(t)^2\ dt)^{k-1}


\displaystyle \tilde \delta' := \frac{T}{k}

\displaystyle \delta' = \tilde \delta' (\frac{1}{4}+\varpi)

\displaystyle \tilde \delta := \frac{\delta}{1/4+\varpi}.

\displaystyle \theta := \frac{i(\delta'-\delta)/2 + \varpi}{1/4+\varpi}

\displaystyle = i (\tilde \delta' - \tilde \delta)/2 + \frac{\varpi}{1/4+\varpi}.

Details can be found at this comment.

— 3. Quadratic programming —

The expressions {\sum_{m=1}^k J_k^{(m)}(F)}, {I_k(F)} are quadratic forms in {F}, so one can (in principle, at least) obtain lower bounds for {M_k} by restricting to a finite-dimensional space of {F} and performing quadratic programming on the resulting finite-dimensional quadratic forms.

One can take {F} to be symmetric without loss of generality. One can then look at functions {F} that are linear combinations of symmetric polynomials

\displaystyle m(\alpha) = \sum_{\sigma \in S_k} x_{\sigma(1)}^{\alpha_1} \ldots x_{\sigma(k)}^{\alpha_k}

for various signatures {\alpha_1,\alpha_2,\ldots}; actually it turns out numerically that one can get more efficient results by working with combinations of {m(\alpha)} for {\alpha} up to a fixed degree, and {(1-P_1)^a m(\alpha)} for {\alpha} at that degree afterwords, where {P_1(x_1,\ldots,x_k) = x_1+\ldots+x_k} is the first symmetric polynomial. (Note that the GPY type sieves come from the case when {F} is purely a function of {P_1}.)

It may be that other choices of bases here are more efficient still (perhaps choosing bases that reflect the belief that {F} should behave something like {\prod_{i=1}^k \frac{1}{1+A k^{1/2} t_i}} for some smallish {A}), but one numerical obstacle is that we are only able to accurately compute the required coefficients for {J_k^{(m)}(F), I_k(F)} in the case of polynomials.