Polymath8b, II: Optimising the variational problem and the sieve

22 November, 2013 in math.NT, polymath | Tags: polymath8 | by Terence Tao

This is the second thread for the Polymath8b project to obtain new bounds for the quantity

$\displaystyle H_m := \liminf_{n \rightarrow\infty} (p_{n+m} - p_n),$

either for small values of ${m}$ (in particular ${m=1,2}$ ) or asymptotically as ${m \rightarrow \infty}$ . The previous thread may be found here. The currently best known bounds on ${H_m}$ are:

(Maynard) ${H_1 \leq 600}$ .
(Polymath8b, tentative) ${H_2 \leq 484,276}$ .
(Polymath8b, tentative) ${H_m \leq \exp( 3.817 m )}$ for sufficiently large ${m}$ .
(Maynard) Assuming the Elliott-Halberstam conjecture, ${H_1 \leq 12}$ , ${H_2 \leq 600}$ , and ${H_m \ll m^3 e^{2m}}$ .

Following the strategy of Maynard, the bounds on ${H_m}$ proceed by combining four ingredients:

Distribution estimates ${EH[\theta]}$ or ${MPZ[\varpi,\delta]}$ for the primes (or related objects);
Bounds for the minimal diameter ${H(k)}$ of an admissible ${k}$ -tuple;
Lower bounds for the optimal value ${M_k}$ to a certain variational problem;
Sieve-theoretic arguments to convert the previous three ingredients into a bound on ${H_m}$ .

Accordingly, the most natural routes to improve the bounds on ${H_m}$ are to improve one or more of the above four ingredients.

Ingredient 1 was studied intensively in Polymath8a. The following results are known or conjectured (see the Polymath8a paper for notation and proofs):

(Bombieri-Vinogradov) ${EH[\theta]}$ is true for all ${0 < \theta < 1/2}$ .
(Polymath8a) ${MPZ[\varpi,\delta]}$ is true for ${\frac{600}{7} \varpi + \frac{180}{7}\delta < 1}$ .
(Polymath8a, tentative) ${MPZ[\varpi,\delta]}$ is true for ${\frac{1080}{13} \varpi + \frac{330}{13} \delta < 1}$ .
(Elliott-Halberstam conjecture) ${EH[\theta]}$ is true for all ${0 < \theta < 1}$ .

Ingredient 2 was also studied intensively in Polymath8a, and is more or less a solved problem for the values of ${k}$ of interest (with exact values of ${H(k)}$ for ${k \leq 342}$ , and quite good upper bounds for ${H(k)}$ for ${k < 5000}$ , available at this page). So the main focus currently is on improving Ingredients 3 and 4.

For Ingredient 3, the basic variational problem is to understand the quantity

$\displaystyle M_k({\cal R}_k) := \sup_F \frac{\sum_{m=1}^k J_k^{(m)}(F)}{I_k(F)}$

for ${F: {\cal R}_k \rightarrow {\bf R}}$ bounded measurable functions, not identically zero, on the simplex

$\displaystyle {\cal R}_k := \{ (t_1,\ldots,t_k) \in [0,+\infty)^k: t_1+\ldots+t_k \leq 1 \}$

with ${I_k, J_k^{(m)}}$ being the quadratic forms

$\displaystyle I_k(F) := \int_{{\cal R}_k} F(t_1,\ldots,t_k)^2\ dt_1 \ldots dt_k$

and

$\displaystyle J_k^{(m)}(F) := \int_{{\cal R}_{k-1}} (\int_0^{1-\sum_{i \neq m} t_i} F(t_1,\ldots,t_k)\ dt_m)^2 dt_1 \ldots dt_{m-1} dt_{m+1} \ldots dt_k.$

Equivalently, one has

$\displaystyle M_k({\cal R}_k) := \sup_F \frac{\int_{{\cal R}_k} F {\cal L}_k F}{\int_{{\cal R}_k} F^2}$

where ${{\cal L}_k: L^2({\cal R}_k) \rightarrow L^2({\cal R}_k)}$ is the positive semi-definite bounded self-adjoint operator

$\displaystyle {\cal L}_k F(t_1,\ldots,t_k) = \sum_{m=1}^k \int_0^{1-\sum_{i \neq m} t_i} F(t_1,\ldots,t_{m-1},s,t_{m+1},\ldots,t_k)\ ds,$

so ${M_k}$ is the operator norm of ${{\cal L}_k}$ . Another interpretation of ${M_k({\cal R}_k)}$ is that the probability that a rook moving randomly in the unit cube ${[0,1]^k}$ stays in the simplex ${{\cal R}_k}$ for ${n}$ moves is asymptotically ${(M_k({\cal R}_k)/k + o(1))^n}$ .
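The rook interpretation lends itself to a quick Monte Carlo sanity check. The sketch below assumes one specific reading of "moving randomly" (pick a coordinate uniformly at random and resample it uniformly in ${[0,1]}$ ) and starts the rook uniformly in ${{\cal R}_k}$ ; the function `rook_survival_rate` and its parameters are hypothetical. Since the survival probability after ${n}$ moves decays like ${(M_k/k)^n}$ , the ratio of successive survivor counts estimates ${M_k/k}$ . This is a heuristic check only, not a rigorous bound.

```python
import random

def rook_survival_rate(k, n_walkers=100_000, n_moves=8, seed=0):
    """Monte Carlo estimate of M_k(R_k)/k via the random-rook picture.

    Assumed move rule (hypothetical reading of "moving randomly"): choose a
    coordinate uniformly at random and replace it by a fresh Uniform[0, 1]
    sample. A walker survives a move if the new point still lies in R_k.
    """
    rng = random.Random(seed)
    # Start walkers uniformly in R_k by rejection from the cube (fine for small k).
    walkers = []
    while len(walkers) < n_walkers:
        t = [rng.random() for _ in range(k)]
        if sum(t) <= 1:
            walkers.append(t)
    alive = [len(walkers)]
    for _ in range(n_moves):
        survivors = []
        for t in walkers:
            t[rng.randrange(k)] = rng.random()
            if sum(t) <= 1:
                survivors.append(t)
        walkers = survivors
        alive.append(len(walkers))
    if alive[-2] == 0:
        raise ValueError("all walkers left the simplex; increase n_walkers")
    # The ratio of successive survivor counts approximates M_k(R_k)/k.
    return alive[-1] / alive[-2]

rate = rook_survival_rate(2)
print(f"estimated M_2 ~ {2 * rate:.3f}")
```

For ${k=2}$ , the test function ${F \equiv 1}$ already gives ${M_2 \geq 4/3}$ , while the bound ${M_k \leq \frac{k}{k-1}\log k}$ proved in the comments gives ${M_2 \leq 2\log 2 \approx 1.386}$ , so the estimate should land roughly in that window up to sampling noise. Rejection sampling from the cube becomes impractical for larger ${k}$ , since the acceptance rate is ${1/k!}$ .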

We now have a fairly good asymptotic understanding of ${M_k({\cal R}_k)}$ , with the bounds

$\displaystyle \log k - 2 \log\log k -2 \leq M_k({\cal R}_k) \leq \log k + \log\log k + 2$

holding for sufficiently large ${k}$ . There is however still room to tighten the bounds on ${M_k({\cal R}_k)}$ for small ${k}$ ; I’ll summarise some of the ideas discussed so far below the fold.

For Ingredient 4, the basic tool is this:

Theorem 1 (Maynard) If ${EH[\theta]}$ is true and ${M_k({\cal R}_k) > \frac{2m}{\theta}}$ , then ${H_m \leq H(k)}$ .

Thus, for instance, it is known that ${M_{105} > 4}$ and ${H(105)=600}$ , and this together with the Bombieri-Vinogradov inequality gives ${H_1\leq 600}$ . This result is proven in Maynard’s paper and an alternate proof is also given in the previous blog post.

We have a number of ways to relax the hypotheses of this result, which we also summarise below the fold.

— 1. Improved sieving —

A direct modification of the proof of Theorem 1 also shows:

Theorem 2 If ${MPZ[\varpi,\delta]}$ is true and ${M_k({\cal R}_k \cap [0,\frac{\delta}{1/4+\varpi}]^k) > \frac{m}{1/4+\varpi}}$ , then ${H_m \leq H(k)}$ .

Here ${M_k}$ is defined for the truncated simplex ${{\cal R}_k \cap [0,\frac{\delta}{1/4+\varpi}]^k}$ in the obvious fashion. This allows us to use the MPZ-type bounds obtained in Polymath8a, at the cost of requiring the test functions ${F}$ to have somewhat truncated support. Fortunately, in the large ${k}$ setting, the functions we were using had such a truncated support anyway. It looks likely that we can replace the cube ${[0,\frac{\delta}{1/4+\varpi}]^k}$ by significantly larger regions by using the (multiple) dense divisibility versions of ${MPZ}$ , but we have not yet looked into this.

It also appears that if one generalises the Elliott-Halberstam conjecture ${EH[\theta]}$ to also encompass more general Dirichlet convolutions ${\alpha * \beta}$ than the von Mangoldt function ${\Lambda}$ (see e.g. Conjecture 1 of Bombieri-Friedlander-Iwaniec), then one can enlarge the simplex ${{\cal R}_k}$ in Theorem 1 (and probably for Theorem 2 also) to the slightly larger region

$\displaystyle {\cal R}'_k := \{ (t_1,\ldots,t_k) \in [0,+\infty)^k: \sum_{i \neq m} t_i \leq 1 \hbox{ for all } m=1,\ldots,k \}.$

Basically, the reason for this is that the restriction to the simplex ${{\cal R}_k}$ (as opposed to ${{\cal R}'_k}$ ) is only needed to control the sum ${\sum_n \nu(n)}$ , but by splitting ${\nu}$ into products of simpler divisor sums, and using the Elliott-Halberstam hypothesis to control one of the factors, it looks like one can still control error terms in the larger region ${{\cal R}'_k}$ (but this will have to be checked at some point, if we end up using this refinement). This is only likely to give a slight improvement, except when ${k}$ is small; from the inclusions

$\displaystyle {\cal R}_k \subset {\cal R}'_k \subset \frac{k}{k-1} \cdot {\cal R}_k$

and a scaling argument we see that

$\displaystyle M_k({\cal R}_k) \leq M_k({\cal R}'_k) \leq \frac{k}{k-1} M_k( {\cal R}_k ).$
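One way to fill in the scaling argument is sketched below. For a general region ${\Omega}$ , the notation ${J_k^{(m)}(F;\Omega)}$ (the same quadratic form, with the inner ${t_m}$ -integral taken over the fibre of ${\Omega}$ ) is introduced here only for illustration.

```latex
% Dilation: for F supported on \Omega and \lambda > 0, put F_\lambda(t) := F(t/\lambda) on \lambda\Omega.
\int_{\lambda\Omega} F_\lambda^2 = \lambda^{k} \int_{\Omega} F^2,
\qquad
J_k^{(m)}(F_\lambda; \lambda\Omega) = \lambda^{2}\cdot\lambda^{k-1}\, J_k^{(m)}(F; \Omega),
\qquad\text{hence}\qquad
M_k(\lambda\Omega) = \lambda\, M_k(\Omega).
% Extending F by zero shows that \Omega \subset \Omega' implies M_k(\Omega) \le M_k(\Omega').
% Summing \sum_{i \neq m} t_i \le 1 over m gives (k-1)(t_1+\cdots+t_k) \le k,
% i.e. {\cal R}'_k \subset \frac{k}{k-1}\cdot{\cal R}_k, and therefore
M_k({\cal R}_k) \le M_k({\cal R}'_k) \le M_k\bigl(\tfrac{k}{k-1}{\cal R}_k\bigr) = \tfrac{k}{k-1}\, M_k({\cal R}_k).
```

The factor ${\lambda^2}$ comes from the squared inner integral in ${t_m}$ , and ${\lambda^{k-1}}$ from the remaining ${k-1}$ outer variables.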

Assume EH. To improve the bound ${H_1 \leq 12}$ to ${H_1 \leq 10}$ , it suffices to obtain a bound of the form

$\displaystyle P_0 + P_2 + P_6 + P_8 + P_{12} > 1 + P_{0,12}$

where

$\displaystyle P_h = \sum_n 1_{n+h \hbox{ prime}} \nu(n) / \sum_n \nu(n)$

and

$\displaystyle P_{0,12} = \sum_n 1_{n,n+12 \hbox{ prime}} \nu(n) / \sum_n \nu(n).$

With ${\nu}$ given in terms of a cutoff function ${F}$ , the left-hand side ${P_0 + P_2 + P_6 + P_8 + P_{12}}$ can be computed as usual as

$\displaystyle P_0 + P_2 + P_6 + P_8 + P_{12} = \frac{1}{2} \sum_{m=1}^5 J_5^{(m)}(F) / I_5(F) + o(1)$

while we have the upper bound

$\displaystyle P_{0,12} \leq \frac{1}{2} \int \frac{(\int_{t_1+t_5 \leq 1-t_2-t_3-t_4} F(t_1,t_2,t_3,t_4,t_5)\ dt_1 dt_5)^2}{1-t_2-t_3-t_4}\ dt_2 dt_3 dt_4 / I_5(F) + o(1)$

and other bounds may be possible. (This is discussed in this comment.)

For higher ${k}$ , it appears that similar maneuvers will have a relatively modest impact, perhaps shaving ${\sqrt{k}}$ or so off of the current values of ${k}$ .

— 2. Upper bound on ${M_k}$ —

We have the upper bound

$\displaystyle M_k \leq (1 + \frac{1}{A}) \log(1+Ak)$

for any ${A>0}$ . To see this, observe from Cauchy-Schwarz that

$\displaystyle (\int_0^{1-\sum_{i \neq m} t_i} F(t_1,\ldots,t_k)\ dt_m)^2 \leq$

$\displaystyle (\int_0^{1-\sum_{i \neq m} t_i} (1+Akt_m) F(t_1,\ldots,t_k)^2\ dt_m)$

$\displaystyle \times (\int_0^1 \frac{1}{1+Akt_m}\ dt_m).$

The final factor is ${\frac{1}{Ak} \log (1+Ak)}$ , and so

$\displaystyle J_k^{(m)}(F) \leq \frac{1}{Ak} \log (1+Ak) \int_{{\cal R}_k} (1+Akt_m) F(t_1,\ldots,t_k)^2\ dt_1 \ldots dt_k.$

Summing in ${m}$ and noting that ${t_1+\ldots+t_k \leq 1}$ on the simplex, we have

$\displaystyle \sum_{m=1}^k J_k^{(m)}(F) \leq \frac{1}{Ak} \log (1+Ak) \int_{{\cal R}_k} (k+Ak) F(t_1,\ldots,t_k)^2\ dt_1 \ldots dt_k,$

and since ${\frac{k+Ak}{Ak} = 1 + \frac{1}{A}}$ , the claim follows.

Setting ${A = \log k}$ , we conclude that

$\displaystyle M_k \leq \log k + \log\log k + 2$

for sufficiently large ${k}$ . There may be some room to improve these bounds a bit further.
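A small numerical sketch of the bound ${(1+\frac{1}{A})\log(1+Ak)}$ : it evaluates the bound at ${A = \log k}$ , minimises it crudely over a log-spaced grid of ${A}$ , and compares both with ${\log k + \log\log k + 2}$ . The helper names are hypothetical and the grid search is only illustrative.

```python
import math

def cs_upper_bound(k, A):
    """The bound M_k <= (1 + 1/A) log(1 + A k), valid for any A > 0."""
    return (1 + 1 / A) * math.log(1 + A * k)

def best_cs_upper_bound(k):
    """Crude minimisation over the grid A = 10^(-3), ..., 10^3 (log-spaced)."""
    grid = [10 ** (j / 100) for j in range(-300, 301)]
    return min(cs_upper_bound(k, A) for A in grid)

def asymptotic_form(k):
    """log k + log log k + 2, the simplified form for large k."""
    return math.log(k) + math.log(math.log(k)) + 2

for k in (105, 10**6):
    print(k, cs_upper_bound(k, math.log(k)), best_cs_upper_bound(k), asymptotic_form(k))
```

For ${k=105}$ the optimised bound is about ${7.5}$ , noticeably weaker than the bound ${\frac{k}{k-1}\log k \approx 4.70}$ obtained later in the comments.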

— 3. Lower bounds on ${M_k}$ —

For small ${k}$ , one can optimise the quadratic form

$\displaystyle \frac{\int F {\cal L} F}{\int F^2}$

by specialising ${F}$ to a finite-dimensional space and then performing the appropriate linear algebra. It is known that we may restrict without loss of generality to symmetric ${F}$ ; one could in principle also restrict to the functions of the form

$\displaystyle F(t_1,\ldots,t_k) = \sum_{m=1}^k G( t_1,\ldots,t_{m-1},t_{m+1},\ldots,t_k)$

for some symmetric function ${G: {\cal R}_{k-1} \rightarrow {\bf R}}$ (indeed, morally at least ${F}$ should be an eigenfunction of ${{\cal L}}$ ), although we have not been able to take much advantage of this yet.

For large ${k}$ , we can use the bounds

$\displaystyle M_k({\cal R}_k)^m \geq \int_{{\cal R}_k} F {\cal L}^m F$

for any ${m \geq 1}$ and any ${F}$ with ${\int_{{\cal R}_k} F^2 \leq 1}$ ; we can also start with a given ${F}$ and improve it by replacing it with ${{\cal L} F}$ (normalising in ${L^2}$ if desired), and perhaps even iterating and accelerating this process.
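For ${k=2}$ the iteration ${F \mapsto {\cal L} F}$ can be carried out in exact rational arithmetic, because ${{\cal L}_2}$ maps polynomials to polynomials and ${\int_{{\cal R}_2} t_1^a t_2^b = \frac{a!\,b!}{(a+b+2)!}}$ . The sketch below (all helper names hypothetical) starts from ${F=1}$ and records the Rayleigh quotients of ${{\cal L}^n 1}$ ; each is a rigorous lower bound for ${M_2}$ , and for a positive semi-definite operator they are non-decreasing in ${n}$ .

```python
from fractions import Fraction
from math import comb, factorial, log

# A polynomial in (t1, t2) is stored as {(a, b): c}, meaning sum of c * t1^a * t2^b.

def simplex_moment(a, b):
    """Exact integral of t1^a t2^b over R_2 = {t1, t2 >= 0, t1 + t2 <= 1}."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 2))

def add_term(poly, key, coeff):
    poly[key] = poly.get(key, Fraction(0)) + coeff

def multiply(p, q):
    out = {}
    for (a1, b1), c1 in p.items():
        for (a2, b2), c2 in q.items():
            add_term(out, (a1 + a2, b1 + b2), c1 * c2)
    return out

def inner(p, q):
    """<p, q> = integral of p * q over R_2."""
    return sum((c * simplex_moment(a, b) for (a, b), c in multiply(p, q).items()), Fraction(0))

def apply_L(p):
    """L_2 F(t1, t2) = int_0^{1-t2} F(s, t2) ds + int_0^{1-t1} F(t1, s) ds."""
    out = {}
    for (a, b), c in p.items():
        # t2^b * (1 - t2)^(a+1) / (a+1), expanded binomially in t2
        for j in range(a + 2):
            add_term(out, (0, b + j), c * comb(a + 1, j) * (-1) ** j / (a + 1))
        # t1^a * (1 - t1)^(b+1) / (b+1), expanded binomially in t1
        for j in range(b + 2):
            add_term(out, (a + j, 0), c * comb(b + 1, j) * (-1) ** j / (b + 1))
    return out

def rayleigh(p):
    return inner(p, apply_L(p)) / inner(p, p)

F = {(0, 0): Fraction(1)}
quotients = []
for _ in range(4):
    quotients.append(rayleigh(F))
    F = apply_L(F)

for q in quotients:
    print(q, float(q))
print("upper bound 2 log 2 =", 2 * log(2))
```

The first two quotients are ${4/3}$ and ${76/55 \approx 1.3818}$ , already close to the upper bound ${2 \log 2 \approx 1.3863}$ from the comments.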

The basic functions ${F}$ we have been using take the form

$\displaystyle F(t_1,\ldots,t_k) := 1_{t_1+\ldots+t_k \leq 1} \prod_{i=1}^k \frac{k^{1/2}}{m_2^{1/2}} g(k t_i)$

where

$\displaystyle g(t) := 1_{[0,T]}(t) \frac{1}{1+At}$

and

$\displaystyle m_1 := \int_0^T g(t)\ dt = \frac{1}{A} \log(1+AT)$

$\displaystyle m_2 := \int_0^T g(t)^2\ dt = \frac{1}{A} (1 - \frac{1}{1+AT}) = \frac{T}{1+AT}.$

Then ${\int_{{\cal R}_k} F^2 \leq 1}$ , and

$\displaystyle M_k \geq \int F {\cal L} F \geq \frac{m_1^2}{m_2} {\bf P}( X_1 + \ldots + X_{k-1} \leq k - T )$

where ${X_1,\ldots,X_k}$ are iid random variables on ${[0,T]}$ with density ${\frac{1}{m_2} g(t)^2\ dt}$ . By Chebyshev’s inequality we then have

$\displaystyle M_k \geq \frac{m_1^2}{m_2} ( 1 - \frac{(k-1)\sigma^2}{(k-T-(k-1)\mu)^2} )$

if ${k-T-(k-1)\mu>0}$ , where

$\displaystyle \mu := \frac{1}{m_2} \int_0^T t g(t)^2\ dt$

$\displaystyle = \frac{1}{m_2} \frac{1}{A^2}( \log(1+AT) - 1 + \frac{1}{1+AT} )$

and

$\displaystyle \sigma^2 := \frac{1}{m_2} \int_0^T t^2 g(t)^2\ dt - \mu^2$

$\displaystyle = \frac{1}{m_2} (\frac{T}{A^2} + \frac{T}{A^2 (1+AT)} - \frac{2\log(1+AT)}{A^3}) - \mu^2.$
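A sketch that cross-checks the closed forms for ${m_1, m_2, \mu, \sigma^2}$ (with ${g(t) = 1/(1+At)}$ on ${[0,T]}$ , which is what the formulas for ${m_1}$ and ${m_2}$ correspond to) against direct quadrature, and then evaluates the Chebyshev lower bound. Simpson's rule and the parameter choices ${A=3, T=10}$ and ${k=1000, A=\log k, T=20}$ are illustrative assumptions, as are the helper names.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def closed_form_moments(A, T):
    """m_1, m_2, mu, sigma^2 for g(t) = 1/(1 + A t) on [0, T], via the closed forms."""
    L = math.log(1 + A * T)
    m1 = L / A
    m2 = T / (1 + A * T)
    mu = (L - 1 + 1 / (1 + A * T)) / (m2 * A ** 2)
    var = (T / A ** 2 + T / (A ** 2 * (1 + A * T)) - 2 * L / A ** 3) / m2 - mu ** 2
    return m1, m2, mu, var

def chebyshev_lower_bound(k, A, T):
    """Chebyshev lower bound on M_k; returns None when k - T - (k-1) mu <= 0."""
    m1, m2, mu, var = closed_form_moments(A, T)
    gap = k - T - (k - 1) * mu
    if gap <= 0:
        return None
    return m1 ** 2 / m2 * (1 - (k - 1) * var / gap ** 2)

# Cross-check the closed forms against direct quadrature.
A, T = 3.0, 10.0

def g(t):
    return 1 / (1 + A * t)

m1_q = simpson(g, 0.0, T)
m2_q = simpson(lambda t: g(t) ** 2, 0.0, T)
mu_q = simpson(lambda t: t * g(t) ** 2, 0.0, T) / m2_q
var_q = simpson(lambda t: t * t * g(t) ** 2, 0.0, T) / m2_q - mu_q ** 2
print(closed_form_moments(A, T))
print((m1_q, m2_q, mu_q, var_q))

print(chebyshev_lower_bound(1000, math.log(1000), 20.0))
```

Returning `None` when the hypothesis ${k-T-(k-1)\mu>0}$ fails keeps the function from silently producing a meaningless value.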

A lengthier computation for ${\int F {\cal L}^2 F}$ gives

$\displaystyle M_k^2 \geq (1-\frac{1}{k}) \frac{m_1^4}{m_2^2} (1 - \frac{(k-2)\sigma^2}{(k-2T-(k-2)\mu)^2})$

$\displaystyle + \frac{1}{k} \frac{m_1^2}{m_2} (k-T-(k-1)\mu)$

assuming ${k-2T-(k-2)\mu > 0}$ .

129 comments


22 November, 2013 at 7:51 pm

Terence Tao

It might be time to start collecting upper and lower bounds for $M_k$ for small k (say, up to k=105) to get a sense of how much room there is for improvement here, and what one can hope to achieve from the MPZ estimates (or from using fancier symmetric functions). The fact that $M_k$ is now known to not grow any faster than $\log k$ is a little disappointing for the large $m$ theory, but perhaps there is yet another way to modify the sieve to break this barrier (much as the GPY sieve, for which M_k never exceeded 4, was modified into the current sieve introduced by James). The only “hard” obstruction I know of is the parity obstruction, which basically stems from the belief that $\sum_n \nu(n) \mu(n+h)$ is negligible for just about any sieve under consideration, which means in probabilistic terms that

${\bf P}( n+h \hbox{ has an odd number of prime factors} ) = 1/2+o(1)$

and in particular

${\bf P}( n+h \hbox{ is prime} ) \leq 1/2+o(1).$

This translates to the trivial bound $M_k \leq k$ in the current context, which is well short of the truth; it seems there is a substantial gap between “having an odd number of prime factors” and “is prime” when one is working with k-tuple sieves.

22 November, 2013 at 9:24 pm

Pace Nielsen

I thought I’d play around with your heuristics for $P_{0,12}$ .

If we plug in Maynard’s function

$F = (1-P_1)P_2 + \frac{7}{10}(1-P_1)^2 + \frac{1}{14}P_2 - \frac{3}{14}(1-P_1)$

(where $P_1,P_2$ are the first two power sum symmetric polynomials) into the upper bound for $P_{0,12}$ (and dropping little-o terms) we get that $1+P_{0,12}\leq 1.11662$ . We needed to beat $1.00058$ to force $k$ down by one.

I imagine that there are better choices for $F$ in this context, and I’ll start thinking about that.

22 November, 2013 at 10:11 pm

Terence Tao

One can presumably get some gain by enlarging the support of F from the simplex to the larger (but more complicated) region $\{ t_1+\ldots+t_5 \leq 1 + \min(t_1,\ldots,t_5)\}$ . (The cutoff $t_1+t_5 \leq 1-t_2-t_3-t_4$ in the formula for $P_{0,12}$ would be similarly modified.) One would have to do a certain amount of fiddly numerics though to integrate on this more messy polytope.

Also, we’ve broken the symmetry a bit, and it’s no longer necessarily optimal to use F that are symmetric in all five variables; instead, it should just be symmetric in $t_1,t_5$ and in $t_2,t_3,t_4$ . But perhaps we get a bit more flexibility with this broken symmetry.

23 November, 2013 at 8:07 am

Terence Tao

In a previous comment, James indicated that if we simply inserted our best confirmed value $\varpi = 7/600$ into the low k machinery, ignoring delta, we could potentially reduce k from 105 to 68 (and presumably we could do a little better than this if we used the tentative value $\varpi = 13/1080$ ). The main thing blocking us from making this a rigorous argument is that in order to use $MPZ[\varpi,\delta]$ , we have to use test functions F that are restricted not only to the simplex, but also to the cube $[0, \frac{\delta}{1/4+\varpi}]^k$ . This is not a serious problem for the large k argument (because the test functions were already by design supported on the cube $[0,T/k]^k$ ) but is one for the low k argument (because there we are instead using polynomials in the symmetric polynomials $P_1$ and $P_2$ ).

However, it could be, as was the experience in Zhang and Polymath8a, that the contributions outside of this cube are exponentially small and thus hopefully negligible (although we have to be careful because our values of k are now small enough that “exponentially small” does not _automatically_ imply negligibility any longer). Basically we could follow arguments similar to that in Section 4 of the Polymath8a paper, writing the truncated F as $F_1 - F_2$ where $F_1$ is untruncated and $F_2$ is the portion outside of the cube, and use the inequality

$\int F {\cal L} F \geq \int F_1 {\cal L} F_1 - 2 \int F {\cal L} F_2 - M_k \int F_2^2$

and upper bound the error term $\int F {\cal L} F_2$ by controlling the contribution when a given coordinate $t_i$ exceeds $\frac{\delta}{1/4+\varpi}$ . Here the point is that this is a small portion of the simplex; by volume, it is something like $k (1 - \frac{\delta}{1/4+\varpi} )^k$ .

At present, $\delta,k$ are so small that this is actually not so negligible, but we can then pull out the other Polymath8a trick, which is to exploit dense divisibility and introduce another parameter $\delta'$ . This allows us to enlarge the cube $[0, \frac{\delta}{1/4+\varpi}]^k$ to something like the portion of $[0, \frac{\delta'}{1/4+\varpi}]^k$ in which $\sum_{t_j > \frac{\delta}{1/4+\varpi}} t_j > i \delta' - \delta$ (this is basically Lemma 4.12(iv) from Polymath8a), which should lead to better error terms as per Polymath8a.

23 November, 2013 at 12:50 pm

Terence Tao

A little bit more elaboration of the above idea, using some of the Cauchy-Schwarz arguments from the upper bound theory.

Let $F_1$ be some symmetric test function on the simplex, and let $F$ be the restriction of $F_1$ to the cube $[0,\tilde \delta]^k$ where $\tilde \delta = \frac{\delta}{1/4+\varpi}$ . Then $F = F_1 - F_2$ where $F_2$ is the portion of $F_1$ outside of the cube. We then have

$\int F {\cal L} F = \int F_1 {\cal L} F_1 - 2 \int F {\cal L} F_2 - \int F_2 {\cal L} F_2.$

The last term we can bound by $M_k \int F_2^2$ . Now we try to upper bound $\int F {\cal L} F_2$ . By symmetry this is

$k \int_{{\cal R}_{k-1}} (\int F(t_1,t_2,\ldots,t_k)\ dt_1) (\int F_2(t'_1,t_2,\ldots,t_k)\ dt'_1)\ dt_2 \ldots dt_k$ . (1)

The integrand is only non-zero when $t'_1 \geq \tilde \delta \geq t_1$ and $t_2,\ldots,t_k \leq \tilde \delta$ . By Cauchy-Schwarz we have

$\displaystyle \int F(t_1,t_2,\ldots,t_k)\ dt_1 \leq \frac{\log^{1/2}(1+kA\tilde \delta)}{(kA)^{1/2}}$

$\displaystyle ( \int F(t_1,\ldots,t_k)^2 (1+kAt_1)\ dt_1 )^{1/2}$

and

$\displaystyle \int F_2(t_1,t_2,\ldots,t_k)\ dt_1 \leq (1-\tilde \delta)^{1/2} ( \int F_2(t_1,\ldots,t_k)^2\ dt_1 )^{1/2}$

and so by the AM-GM inequality we can bound (1) by

$\displaystyle \frac{1}{2} \frac{k^{1/2} \log^{1/2}(1+kA \tilde \delta)}{A^{1/2}} (1-\tilde \delta)^{1/2} (k^{-1/2} \varepsilon \int F^2 (1+kAt_1)\ dt_1 \ldots dt_k$
$\displaystyle + k^{1/2} \varepsilon^{-1} \int_{t_2,\ldots,t_k \leq \tilde \delta} F_2^2\ dt_1 \ldots dt_k )$

for any $\varepsilon$ , which symmetrises to

$\displaystyle \frac{1}{2} \frac{\log^{1/2}(1+kA \tilde \delta)}{A^{1/2}} (1-\tilde \delta)^{1/2} (\varepsilon \int F^2 (1+A)\ dt_1 \ldots dt_k$
$\displaystyle + \varepsilon^{-1} \int F_2^2\ dt_1 \ldots dt_k )$

(here we use the disjoint supports of the various $F_2$ integrals) so on optimising in $\varepsilon$ we obtain a bound of

$\displaystyle \log^{1/2}(1+kA \tilde \delta) (1+\frac{1}{A})^{1/2} (1-\tilde \delta)^{1/2} (\int F^2)^{1/2} (\int F_2^2)^{1/2}$

so one just needs a reasonably good bound on $\int F_2^2$ (compared with $\int F^2$ ) to be able to neglect these errors. As I said before, the ratio of $\int F_2^2$ to $\int F^2$ is likely to be something like $k (1 - \tilde \delta)^k$ ; this doesn’t look too good for the values of k we are considering (i.e. below 105), but maybe if we split into two deltas as before, then it won’t be so bad.
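The optimisation in $\varepsilon$ is the elementary AM-GM step below, written with $X = (1+A)\int F^2$ and $Y = \int F_2^2$ (labels introduced here only for readability).

```latex
% For X, Y >= 0 and every \varepsilon > 0, with equality at \varepsilon = \sqrt{Y/X}:
\tfrac12\bigl(\varepsilon X + \varepsilon^{-1} Y\bigr) \ \ge\ \sqrt{XY}.
% Applied to the symmetrised bound with X = (1+A)\int F^2 and Y = \int F_2^2:
\frac{\log^{1/2}(1+kA\tilde\delta)}{A^{1/2}}\,(1-\tilde\delta)^{1/2}\,
\Bigl((1+A)\textstyle\int F^2 \cdot \int F_2^2\Bigr)^{1/2}
= \log^{1/2}(1+kA\tilde\delta)\,\Bigl(1+\tfrac{1}{A}\Bigr)^{1/2}(1-\tilde\delta)^{1/2}
\Bigl(\textstyle\int F^2\Bigr)^{1/2}\Bigl(\textstyle\int F_2^2\Bigr)^{1/2}.
```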

23 November, 2013 at 8:32 am

Eytan Paldi

In the above definition of $J_k^{(m)}$ , the subscript $i$ in
$d t_i, d t_{i-1}, d t _{i+1}$ should be replaced by $m$ .
(with similar correction in the proof of the upper bound on $M_k$ .)

[Corrected, thanks – T.]

23 November, 2013 at 10:29 am

Eytan Paldi

This typo still appears in several places

1. In the definition of $J_k^{(m)}$ , $d t_i$ should be $d t_m$ and the last $d t$ in the integral should be $d t_k$ .

2. In the proof of the upper bound on $M_k$ , in the first two integrals
$d t_i$ should be $d t_m$ .

[Corrected, thanks – T.]

23 November, 2013 at 9:30 am

Eytan Paldi

In the proof of the upper bound on $M_k$ , it should be
“noting that $t_1 + \ldots + t_k \leq 1$ ”.

[Corrected, thanks – T.]

23 November, 2013 at 9:58 am

Terence Tao

Here is an idea that might not pan out, but I’m throwing it out there anyways.

Take k=3 for brevity. Right now, we need to control sums such as $\sum_n \Lambda(n) \nu(n)$ , where $\nu(n)$ is a divisor sum basically of the form

$\sum_{d_1|n, d_2|n+2, d_3|n+6} \lambda_{d_1,d_2,d_3}.$

We can then set $d_1=1$ and are looking at a sum of the form

$\sum_{d_2,d_3} \lambda_{1,d_2,d_3} \sum_{n: n=-2\ (d_2), n = -6\ (d_3)} \Lambda(n).$

We can compare the inner sum against its expected value $E_{d_2,d_3} = \frac{x}{\phi([d_2,d_3])}$ , and are left with dealing with an error term

$\sum_{d_2,d_3} \lambda_{1,d_2,d_3} ( \sum_{n: n=-2\ (d_2), n = -6\ (d_3)} \Lambda(n) - E_{d_2,d_3} ).$

Now at this point, what Zhang and Polymath8a do is to take absolute values and start looking at something roughly of the shape

$\sum_d | \sum_{n=a\ (d)} \Lambda(n) - E_d|$

where $d$ came from $[d_2,d_3]$ and $a\ (d)$ is something formed from the Chinese remainder theorem (and thus could potentially be quite large). But this throws away two things: firstly, that the residue class $a\ (d)$ came from two residue classes which were bounded, and secondly, that the coefficients $\lambda_{1,d_2,d_3}$ were not completely arbitrary, but instead had a somewhat multiplicative structure (basically of the form $\mu( d_2 d_3 )$ ). It may be that if we delve into the Type I/II estimates, we can exploit this extra structure to go beyond the exponents that we currently do. In particular, the coefficients $c_{q,r}$ that show up in Polymath8a roughly seem to have the shape $\mu(q) \mu(r)$ (times smooth factors), and so $c_{q,r} c_{q',r}$ has some smoothness in r which looks helpful (but somewhat difficult to exploit).

23 November, 2013 at 11:38 am

Aubrey de Grey

Am I right in assuming that the recent “Deligne-free” values from Pace that are shown on the “world records” page are derived assuming only B-V, following James’s paper? If so, perhaps it would be of interest to do as in Polymath8a and have a look at what k0 values the Deligne-free [varpi,delta] derived on July 30th gives rise to in the post-Maynard universe?

25 November, 2013 at 8:19 am

Terence Tao

This is something we can certainly do without much difficulty once our results have stabilised, but at present the four benchmarks we have (m=1 from BV, m=1 using Polymath8a, m=2 from BV, m=2 using Polymath8a) are covering enough of the “parameter space” that any further benchmarking would probably just be confusing (although perhaps a m=3 case, say using the best Polymath8a result, might be worth looking at).


23 November, 2013 at 6:44 pm

Eytan Paldi

The upper and lower bounds on $M_k$ may be improved (in particular for small $k$ ) by generalizing the basic function $1+A t$ (e.g. by $1+ A t + B t^2$ ) and optimizing over the additional parameters (e.g. $B$ ) as well.

24 November, 2013 at 5:05 am

Eytan Paldi

I don’t understand (8.9) in Maynard’s paper (the integral is divergent.)

24 November, 2013 at 5:39 am

Pace Nielsen

There is a minor typo. He just dropped $\prod_{i=2}^{k} g(u_i)^2$ from the integral.

24 November, 2013 at 5:50 am

Pace Nielsen

To see the equality in (8.9) after that factor is reinserted, just integrate each summand separately, and count.

By the way, I had your very question earlier. There seems to be a second typo in the line above (8.13). It says that $\int_{0}^{T} u g(u)^2 du=\mu$ , but I believe the RHS should be $\mu\gamma$ .

24 November, 2013 at 6:02 am

Eytan Paldi

Thanks for the explanation!

25 November, 2013 at 2:53 pm

James Maynard

Just to confirm: Everything that Pace says is correct. Both of these are typos in the paper – sorry for the confusion this caused.

If anyone happens to spot similar typos I’d quite like to hear – either emailing me or mentioning on here would be appreciated!

24 November, 2013 at 6:49 pm

Terence Tao

I can now modify the Cauchy-Schwarz argument to prove the clean upper bound

$\displaystyle M_k \leq \frac{k}{k-1} \log k$ (1)

which saves a $\log\log k$ or so from the previous bound.

The key estimate is

$\displaystyle (\int_0^{1-t_2-\ldots-t_k} F(t_1,\ldots,t_k)\ dt_1)^2$

$\displaystyle \leq \frac{\log k}{k-1} \int_0^{1-t_2-\ldots-t_k} F(t_1,\ldots,t_k)^2 (1 - t_1-\ldots-t_k+ kt_1)\ dt_1.$ (2)

Assuming this estimate, we may integrate in $t_2,\ldots,t_k$ to conclude that

$\displaystyle J_k^{(1)}(F) \leq \frac{\log k}{k-1} \int F^2 (1-t_1-\ldots-t_k+kt_1)\ dt_1 \ldots dt_k$

which symmetrises to

$\displaystyle \sum_{m=1}^k J_k^{(m)}(F) \leq k \frac{\log k}{k-1} \int F^2\ dt_1 \ldots dt_k$

giving (1).

It remains to prove (2). By Cauchy-Schwarz, it suffices to show that

$\displaystyle \int_0^{1-t_2-\ldots-t_k} \frac{dt_1}{1 - t_1-\ldots-t_k+ kt_1} \leq \frac{\log k}{k-1}.$

But writing $s = t_2+\ldots+t_k$ , the left-hand side evaluates to

$\frac{1}{k-1} (\log (k(1-s)) - \log (1-s) ) = \frac{\log k}{k-1}$

as required.
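A quick numerical check of the identity behind (2): the weight integral equals $\frac{\log k}{k-1}$ whatever the value of $s = t_2+\ldots+t_k$ . Simpson's rule and the sample values of $k$ and $s$ are illustrative choices; `weight_integral` is a hypothetical helper name.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def weight_integral(k, s):
    """int_0^{1-s} dt_1 / (1 - t_1 - s + k t_1), with s = t_2 + ... + t_k < 1."""
    return simpson(lambda t: 1 / (1 - t - s + k * t), 0.0, 1.0 - s)

for k in (2, 5, 105):
    target = math.log(k) / (k - 1)
    for s in (0.0, 0.3, 0.9):
        print(k, s, weight_integral(k, s), target)
```

The independence from $s$ is exactly the scale invariance visible in the closed form: substituting $t_1 = (1-s)u$ removes $s$ entirely.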

This also suggests that extremal F behave like multiples of $1 / (1-t_1-\ldots-t_k + kt_1)$ in the $t_1$ variable for typical fixed choices of $t_2,\ldots,t_k$ .

In the converse direction, I have an asymptotic bound

$\displaystyle M_k \geq \log k - \log\log\log k + O(1)$

for sufficiently large k. Using the notation of the previous post, we have the lower bound

$\displaystyle M_k \geq \frac{m_1^2}{m_2} {\bf P}( X_1 + \ldots + X_{k-1} \leq k - T )$

whenever $g$ is supported on $[0,T]$ , $m_i = \int_0^T g(t)^i\ dt$ , and $X_1,\ldots,X_k$ are independent random variables on [0,T] with density $\frac{1}{m_2} g(t)^2\ dt$ . We select the function

$g(t) = \frac{1}{1 + A t}$

with $A := \log k$ , and $T := \varepsilon \frac{k}{A}$ for some $0 < \varepsilon < 1$ to be chosen later. We have

$m_1 = \frac{1}{A} \log(1+AT) = \frac{1}{\log k} \log( 1 + \varepsilon k)$

and

$m_2 = \int_0^T \frac{1}{(1+At)^2}\ dt$
$= \frac{1}{A} - \frac{1}{A (1+AT)}$
$\leq \frac{1}{A} = \frac{1}{\log k}$

and so

$\displaystyle M_k \geq \frac{\log^2(1+\varepsilon k)}{\log k} {\bf P}( X_1 + \ldots + X_{k-1} \leq k - T ).$

Observe that the random variables $X_i$ have mean

$\mu = \frac{1}{m_2} \frac{1}{A^2} (\log(1+AT)-1+\frac{1}{1+AT})$

$= \frac{1}{\log k} (1 + \frac{1}{\varepsilon k}) (\log(1+\varepsilon k)-1+\frac{1}{1+\varepsilon k})$

$\leq 1 - \frac{1}{\log k} + O( \frac{\log k}{\varepsilon k} ).$

The variance $\sigma^2$ may be bounded crudely by

$\sigma^2 \leq \frac{1}{m_2} \int_0^T \frac{t^2}{(1+At)^2}\ dt$
$= O( A \frac{T}{A^2} ) = O( \frac{\varepsilon k}{\log^2 k} ).$

Thus the random variable $X_1 + \ldots + X_{k-1}$ has mean at most $k - \frac{k}{\log k} + O( \frac{\log k}{\varepsilon} )$ and variance $O( \frac{\varepsilon k^2}{\log^2 k} )$ , with each variable bounded in magnitude by $T = \frac{\varepsilon k}{\log k}$ . By Hoeffding’s inequality, this implies that $X_1 + \ldots + X_{k-1}$ exceeds $k - T$ with probability at most $O( \exp(- c / \varepsilon^2 ) )$ for some absolute constant $c$ . If we set $\varepsilon = C / (\log \log k)^{1/2}$ for a sufficiently small absolute constant $C$ , we thus have

$\displaystyle {\bf P}( X_1 + \ldots + X_{k-1} \leq k - T ) = 1 - O( 1 / \log k)$

and thus

$\displaystyle M_k \geq \log k - \log\log\log k + O(1)$

as claimed.

Hoeffding’s bound is proven by the exponential moment method, which more generally gives the bound

$\displaystyle M_k \geq \frac{m_1^2}{m_2} (1 - e^{-s(k-T)} (\frac{1}{m_2} \int_0^T g(t)^2 e^{st}\ dt)^{k-1} )$

for any $s > 0$ , which is a somewhat complicated, but feasible-looking, optimisation problem in A, T, s. (Roughly speaking, one expects to take A close to $\log k$ , T a bit smaller than $k/\log k$ , and $s$ roughly of the order of $\log k / k$ .)
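A sketch of this exponential-moment bound for the choice $g(t) = 1/(1+At)$ on $[0,T]$ , with the integral $\int_0^T g(t)^2 e^{st}\ dt$ done by Simpson's rule and $s$ chosen by a crude grid search. The parameters $k=1000$ , $A=\log k$ , $T=20$ and the helper names are illustrative assumptions, not tuned values.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def exp_moment_lower_bound(k, A, T, s):
    """(m1^2/m2) * (1 - e^{-s(k-T)} * ((1/m2) int_0^T g^2 e^{st} dt)^(k-1))
    for the (assumed) choice g(t) = 1/(1 + A t) on [0, T]."""
    m1 = math.log(1 + A * T) / A
    m2 = T / (1 + A * T)
    lam = simpson(lambda t: math.exp(s * t) / (1 + A * t) ** 2, 0.0, T)
    # Work with the logarithm of the tail bound to avoid overflow for large k.
    log_tail = -s * (k - T) + (k - 1) * math.log(lam / m2)
    # A tail bound >= 1 says nothing; clamping returns the trivial bound M_k >= 0.
    tail = math.exp(min(log_tail, 0.0))
    return m1 ** 2 / m2 * (1 - tail)

k, A, T = 1000, math.log(1000), 20.0
s_grid = [0.01 * j for j in range(1, 101)]
best_s = max(s_grid, key=lambda s: exp_moment_lower_bound(k, A, T, s))
best = exp_moment_lower_bound(k, A, T, best_s)
print(f"k={k}: best s ~ {best_s:.2f}, lower bound M_k >= {best:.4f}")
print(f"upper bound k/(k-1) log k = {k / (k - 1) * math.log(k):.4f}")
```

Once $s$ is well chosen the tail term is negligible, so the result essentially saturates at $m_1^2/m_2$ ; further gains would have to come from optimising $A$ and $T$ as well.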

25 November, 2013 at 5:14 am

Eytan Paldi

As an interesting application of this upper bound, we have

1. $M_4 \leq 1.848... < 2$ , so under EH the current method is limited to $k_0 \geq 5$ .

2. $M_{50} \le 3.9918... < 4$ , so under EH[1/2] the current method is limited to $k_0 \geq 51$ .

25 November, 2013 at 8:49 am

Aubrey de Grey

Or 42 under the polymath8a results, yes? (I believe the confirmed and tentative versions both give this value.) Or am I overinterpreting earlier posts concerning the derivability of M from varpi?
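The numbers in the two comments above can be reproduced from the bound $M_k \leq \frac{k}{k-1}\log k$ . The sketch assumes the thresholds $M_k > 2m/\theta$ from Theorem 1 (with $m=1$ ) and, for the Polymath8a values, $M_k > m/(1/4+\varpi)$ from Theorem 2 with the truncation to the cube ignored, as in the question above. The helper names are hypothetical.

```python
import math

def cs_log_bound(k):
    """The upper bound M_k <= k/(k-1) * log k."""
    return k / (k - 1) * math.log(k)

def smallest_admissible_k(threshold):
    """Smallest k >= 2 whose upper bound exceeds `threshold`; since the bound
    increases with k, no smaller k can satisfy M_k > threshold."""
    k = 2
    while cs_log_bound(k) <= threshold:
        k += 1
    return k

print(cs_log_bound(4), cs_log_bound(5), cs_log_bound(50))
print(smallest_admissible_k(2.0))                      # EH: theta = 1, m = 1
print(smallest_admissible_k(4.0))                      # Bombieri-Vinogradov: theta = 1/2
print(smallest_admissible_k(1 / (1 / 4 + 7 / 600)))    # confirmed varpi, delta ignored
print(smallest_admissible_k(1 / (1 / 4 + 13 / 1080)))  # tentative varpi, delta ignored
```

Both the confirmed and the tentative values of $\varpi$ give 42, consistent with the comment above.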

27 November, 2013 at 2:09 pm

Eytan Paldi

Unfortunately, the particular $F$ above (used for the derivation of the upper bound) is not in $L^2(\mathcal R_k)$ (since the integration of $F^2$ with respect to $t_1$ gives $1/(k(1- t_2 - ... - t_k))$ – which is not integrable over $\mathcal R_{k-1}$ ) – so this $F$ can’t be used as a basis function for lower bounds evaluation.

It should be remarked that $\mathcal L F \in L^2(\mathcal R_k)$ – since it has only logarithmic singularity.

7 December, 2013 at 9:47 am

Eytan Paldi

For $k=5$ the upper bound is $M_5 \le 2.011797...$ , while from (7.19) in Maynard’s paper $M_5 \ge 2.001162...$ – showing that the upper bound is surprisingly good even for $k=5$ .

14 December, 2014 at 7:28 pm

Eytan Paldi

It seems that there is an error in the above derivation of the lower bound since $m_2 \leq 1/A = 1/\log k$ (not $\log k$ as in the above derivation!)

[Corrected, thanks – there was a similar factor missing in the $m_1$ bound which cancels out the error. -T.]

1 February, 2015 at 7:44 pm

Eytan Paldi

There is another error in the lower bound derivation:
It seems that Hoeffding’s inequality (which is independent of the variance) gives $O(\exp(-c/(k \varepsilon^2)))$ for the probability estimate – which is clearly too crude!
The problem in using this inequality is that the denominator in the exponent is $(k-1) T^2$ while the (much smaller!) numerator is only $O((k/\log k)^2)$ .

Fortunately, by using the above variance estimate, it follows from Bennett’s inequality (or even from Bernstein’s inequality) the probability estimate $O(\exp(-c/ \varepsilon))$ which implies the (slightly weaker) lower bound

$M_k \geq \log k - 2 \log \log \log k + O(1)$

30 December, 2014 at 3:25 pm

Eytan Paldi

For the last lower bound on $M_k$ , it is easy to verify that the lower bound is maximized by (temporarily) fixing $T, s > 0$ and maximizing the integral $m_1$ over $g \in L^2(0, T)$ under the two (integral) constraints that $m_2$ and $\lambda := \int_0^T g(t)^2 e^{s t}dt$ are fixed. It is easy to show that the optimal function $g(t)$ has the form $g(t) = 1/(e^{s t}+C)$ for some $C > -1$ .
Fortunately, the resulting integrals representing $m_1, m_2, \lambda$ (needed for the lower bound) are simple elementary functions of $T, s, C$ – enabling simpler optimization of the lower bound over the parameters $T, s, C$ .
For large $k$ , it seems that the parameters $T, s, C$ can be reparametrized by
$s = \alpha \log k \log \log k /k$
$T = \beta k / \log k$
$C = -1 + \gamma \log \log k /k$

where $1 \ll \alpha, \beta, \gamma \ll 1$ are the new parameters.
(Unfortunately, it seems that asymptotically we still get the above lower bound implied by Hoeffding inequality.)

24 November, 2013 at 9:56 pm

xfxie

Seems k0 can drop some, assuming the implementation is correct.

With $\frac{1080}{13}\varpi + \frac{330}{13}\delta <1$ : $k$ =339

Without using the $\varpi, \delta$ inequalities: $k$ =385

25 November, 2013 at 10:07 am

Terence Tao

Two small thoughts:

1. The Cauchy-Schwarz argument giving upper bounds on $M_k$ might more generally be able to give upper bounds for the quantity

$\sum_{m=1}^k \sum_n \nu(n) 1_{n+h_m \hbox{ prime}}/\sum_n \nu(n)$

for any Selberg sieve

$\nu(n) = (\sum_{d_1|n+h_1,\ldots,d_k|n+h_k} \lambda_{d_1,\ldots,d_k})^2$

regardless of the choice of coefficients $\lambda_{d_1,\ldots,d_k}$ , assuming that the error terms are manageable (which presumably means something like $d_1 \ldots d_k \leq x^{\theta/2}$ where $\theta$ is the level of distribution) and also some mild size conditions on the coefficients (e.g. all of size $x^{o(1)}$ ). Upper bounds are of course less directly satisfying than lower bounds (particularly for the purposes of advancing the “world records” table) but would help give a picture of what more one can expect to squeeze out of these methods. (And we also have some possible ways to evade these upper bounds, e.g. by utilising the two-point correlations $\sum_n \nu(n) 1_{n+h_i, n+h_j \hbox{ prime}}$ as discussed previously; one could also hope to use other sieves than the Selberg sieve, though I don’t see at present what alternatives would be viable.)

2. As mentioned in Maynard’s paper, the method here would also allow one to produce small gaps in other sets than the primes, as long as one had a non-trivial level of distribution $\theta$ . For instance, if one replaced primes by semiprimes (products of exactly two primes) Goldston-Graham-Pintz-Yildirim obtained the bounds $H_1 \leq 6$ and $H_m \leq m e^{m-\gamma} (1+o(1))$ unconditionally. These bounds are already quite good (semiprimes are significantly denser than primes, which is why the results here are better than those for primes) but perhaps there is further room for improvement.

25 November, 2013 at 10:48 am

Terence Tao

Just did a quick calculation: using Maynard’s sieve (with T equal to, say, $k^{1/2}$ ) the quantity $\sum_n \nu(n) 1_{n+h_i \hbox{ semiprime}} / \sum_n \nu(n)$ looks to be roughly on the order of $\log^2 k /k$ (compared to $\log k/k$ in the case of primes), by counting products of two primes each of size at least $x^{T/k}$ . (For semiprimes with smaller factors than this, the sieve weight $\nu(n)$ becomes smaller.) So this would give something like $H_k \ll \exp( O( \sqrt{k} ) )$ for semiprimes, and more generally $H_k \ll \exp( O( k^{1/r} ) )$ for products of exactly r primes (with implied constants depending on r).

Actually, in many ways, the numerology for our current problem with primes is very similar to the GGPY numerology for semiprimes.

25 November, 2013 at 8:06 pm

Terence Tao

Looks like the Cauchy-Schwarz argument does indeed block the Selberg sieve from doing much better than $\log k$ primes in a k-tuple, assuming that the weights are fairly uniform in magnitude, which, among other things, allows one to discard the small fraction of weights for which the heuristic $g(r) \approx \varphi(r) \approx r$ mentioned in Section 6 of Maynard’s paper fails. The analogue of the ratio $\sum_m J_k^{(m)}(F) / I_k(F)$ is then (in the notation of the paper)

$\displaystyle \frac{W}{\phi(W) \log N} ( \sum_{m=1}^k \sum_{r_1,\ldots,r_k: r_m=1} \frac{1}{r_1 \ldots r_k} (\sum_{a_m} \frac{y_{r_1,\ldots,r_{m-1},a_m,r_{m+1},\ldots,r_k}}{a_m} )^2 )$
$\displaystyle / \sum_{r_1,\ldots,r_k} y_{r_1,\ldots,r_k}^2 / r_1 \ldots r_k.$

Cauchy-Schwarz then gives

$\displaystyle (\sum_{a_m} \frac{y_{r_1,\ldots,r_{m-1},a_m,r_{m+1},\ldots,r_k}}{a_m} )^2$
$\displaystyle \ll \frac{\phi(W) \log N}{W} \frac{\log(1+kA)}{kA} \sum_{a_m} \frac{y_{r_1,\ldots,r_{m-1},a_m,r_{m+1},\ldots,r_k}^2}{a_m} (1 + k A \frac{\log a_m}{\log N} )$

for any $A>0$ (using the fact that $a_m$ is constrained to be coprime to $W$ ); using this and summing in $m$ (using the constraint $r_1 \ldots r_{m-1} a_m r_{m+1} \ldots r_k \ll R$ ) shows that the ratio is of size $\ll \frac{A+1}{A} \log(1+kA)$ , which for bounded choices of A gives a bound of $O(\log k)$ . Working through this argument more carefully, one gets the same upper bounds as for the integral variational problem $M_k$ .

There is a potential loophole in using sieve weights that focus on those moduli r with an unusually large number of prime factors (so that $g(r), \varphi(r), r$ are no longer close to each other), but this is a very sparse set of moduli, to the point where the error terms in the sieve analysis now overwhelm the main terms. Also, intuitively one should not be able to get a good sieve by using only a sparse set of the available moduli.

To me, this suggests that the main remaining hope for breaking the logarithmic barrier is to work with a wider class of sieves than the Selberg sieve. In principle there are a lot of other possible sieves to use (e.g. beta sieve or other combinatorial sieves), but numerically these sieves tend to underperform the Selberg sieve for these sorts of problems (enveloping the primes in a sieve weighted on almost primes in order to make the density of the primes as large as possible). It’s tempting to try to perturb the Selberg sieve in some combinatorial fashion, but the trick is to keep the sieve non-negative (which is crucial in our argument, unless one can somehow get excellent control on the negative portion of the sieve); once one leaves the Selberg sieve form $\nu(n) = \gamma(n)^2$ this becomes quite non-trivial.

26 November, 2013 at 3:08 am

Eytan Paldi

Perhaps there is some generalization to a sieve combining translates of several admissible tuples.

26 November, 2013 at 11:06 am

Terence Tao

I’m now leaning towards the view that any replacement for the Selberg sieve $\nu$ will also encounter a logarithmic type barrier, in that the density $\sum_n \nu(n) 1_{n+h_i\hbox{ prime}}/\sum_n \nu(n)$ is unlikely to be much higher than $O(\log k/k)$ . Actually it is rather miraculous that we get the logarithmic gain at all; a naive heuristic computation (which I give below) suggests that $O(1/k)$ should be the threshold.

To motivate things, suppose we start with a one-dimensional sieve $\nu(n) = \sum_{d|n} \alpha_d$ , where $\alpha_d$ are some coefficients for $d$ up to some height $x^c$ . In the Selberg sieve case this divisor sum is the square of some other divisor sum $\sum_{d|n: d \leq R} \lambda_d$ to ensure non-negativity, but we will not assume here that the sieve is of Selberg type.

The primality density $\sum_n \nu(n) 1_{n \hbox{ prime}} / \sum_n \nu(n)$ of such a sieve cannot be much larger than $O(c)$ . This is because all $x^c$ -rough numbers (numbers with no prime factor less than $x^c$ ) are assigned the same value by the sieve $\nu$ , and by the asymptotics for the Dickman-de Bruijn function, the primes in $[x,2x]$ form a proportion of about $c$ of all the $x^c$ -rough numbers in $[x,2x]$ .
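A rough numerical illustration of the one-dimensional heuristic (a sketch, not part of the argument above; the height $x = 10^5$ and the sample exponents $c$ are arbitrary assumptions): count the $x^c$-rough integers in $[x,2x]$ and the share of them that are prime. At such a small height the share only tracks $c$ up to a bounded factor, but it should shrink as $c$ decreases.

```python
import math


def smallest_prime_factors(limit):
    """spf[n] = smallest prime factor of n, for 2 <= n <= limit."""
    spf = list(range(limit + 1))
    for p in range(2, math.isqrt(limit) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    return spf


def prime_share_of_rough(x, c, spf):
    """Share of primes among the x^c-rough integers in [x, 2x].

    n is x^c-rough if it has no prime factor below x^c,
    i.e. its smallest prime factor is at least x^c.
    """
    z = x ** c
    rough = primes = 0
    for n in range(x, 2 * x + 1):
        if spf[n] >= z:
            rough += 1
            if spf[n] == n:
                primes += 1
    return primes / rough


x = 10**5
spf = smallest_prime_factors(2 * x)
for c in (0.1, 0.25, 0.5):
    share = prime_share_of_rough(x, c, spf)
    print(f"c = {c}: prime share = {share:.3f}, share / c = {share / c:.2f}")
```

The ratio `share / c` printed in the last column is the quantity the heuristic predicts to stay bounded.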

Now we look at a multidimensional sieve $\nu(n) = \sum_{d_1|n+h_1;\ldots;d_k|n+h_k} \alpha_{d_1,\ldots,d_k}$ , where again we do not assume the sieve to be of Selberg type, but we impose a restriction of the shape $d_1 \ldots d_k \leq x^{O(1)}$ since we are unlikely to control the sieve error terms without such a hypothesis, even assuming the strongest Elliott-Halberstam type conjectures. Heuristically, this limits each of the $d_j$ to be typically of size $x^{O(1/k)}$ , and then the one-dimensional argument suggests that the primality density $\sum_n \nu(n) 1_{n+h_i\hbox{ prime}}/\sum_n \nu(n)$ should not be much larger than $O(1/k)$ for each $i$ .

Now the Maynard sieve does do a bit better than this, because each factor $d_i$ is allowed to be a bit higher than $x^{O(1/k)}$ on occasion (there is a weight roughly of the shape $\frac{1}{1 + k \frac{\log d_j}{\log x}}$ in the sieve), since one can rely on concentration of measure (or the law of large numbers, if you wish) to keep the product $d_1 \ldots d_k$ of a reasonable size $x^{O(1)}$ . This occasional use of higher moduli seems to be just barely sufficient to diminish the sieve $\nu(n)$ on some of the $x^{O(1/k)}$ -rough numbers other than the primes, boosting the prime density from $O(1/k)$ to $O(\log k/k)$ . But it seems unlikely that a clever choice of sieve weights could do much better than this.

26 November, 2013 at 12:51 pm

James Maynard

Do you have a heuristic which explains the gain in the log, without looking too much at the sieve functions themselves?

I ask because I’d been working under the same heuristic that you mention above, and it surprised me that the modified weights were able to win by a full factor of a log. I don’t have any good heuristic explanation as to how one might be able to ‘see’ that these weights do this well a priori. Obviously, although logs are critical in the small-gaps argument, they are very small in comparison to $k$ , and so we shouldn’t necessarily be surprised if a heuristic breaks down at this level of accuracy.

That said, I liked this heuristic because it seemed to work well with similar problems. It seems to explain quite naturally the results of GGPY/Thorne on small gaps between numbers with $r$ prime factors, results on sifting limits and results on the number of prime factors of $\prod (n+h_i)$ . (Some of these are more ‘robust’ to a log factor than others.) The ‘superiority’ of the Selberg sieve over other sieves for larger $k$ seemed to me just that the implied constants in the $O$ -terms were better.

26 November, 2013 at 9:17 pm

Terence Tao

Unfortunately I don’t have much of an explanation for this – at a formal level it shares a lot of similarities with the logarithmic divergences that show up in the endpoint Sobolev inequalities in harmonic analysis, in that one gets contributions from logarithmically many dyadic scales (which, in this case, would be something like the contribution of moduli between $x^{2^j/k}$ and $x^{2^{j+1}/k}$ for $j=1,\ldots,\log k$ ) but the way in which these scales interact seems rather specific to the Selberg sieve, and it is possible that some other sieve might give a different power of log (or more likely, no logarithmic gain at all).

25 November, 2013 at 11:55 am

Aubrey de Grey

Asking the converse question: is there any N for which there are already known to be infinitely many x such that both x and x+2 are products of exactly N primes? If not, is it plausible that Maynard’s methods can be used to identify such an N? (Or if so, to reduce the known N?)

25 November, 2013 at 12:09 pm

Terence Tao

No, this is blocked by the parity problem. Given any sieve candidate $\nu(n)$ , we expect $\sum_n \mu(n) \nu(n)$ and $\sum_n \mu(n+2) \nu(n)$ to be negligible (this is part of the “Möbius pseudorandomness heuristic”), which basically means that if $n$ is drawn at random from the $\nu(n) / \sum_m \nu(m)$ distribution, then $n$ has a probability $1/2+o(1)$ of having an even number of prime factors, and $1/2+o(1)$ of having an odd number of prime factors. Similarly for $n+2$ . The upshot of this is that one cannot use purely sieve-theoretic methods to prescribe the parity of the number of prime factors of both $n$ and $n+2$ . But it turns out (as worked out by Bombieri) that on EH, one can basically do anything that does not run into this parity obstruction; for instance one can find infinitely many primes n such that n+2 is the product of either 2013 or 2014 primes. (This should be discussed in Friedlander-Iwaniec’s “Opera de Cribro”, though I don’t have this reference handy at present.)

Breaking the parity barrier for twins would be an absolutely huge breakthrough, even if conditional on Elliott-Halberstam. But it’s going to take a radically new idea; it’s not just a matter of tweaking the sieves.

25 November, 2013 at 11:21 pm

Anonymous

I was wondering if you could elaborate on the intuition for why it will take a radically new idea (similarly, I think I remember you saying that proving or disproving P versus NP will require a new idea. How do you know?) Is it just that a lot of people have tried to solve these problems using many different techniques and failed, or is it something else? Thanks.

26 November, 2013 at 9:13 am

Terence Tao

I discuss the parity problem further in my old blog post https://terrytao.wordpress.com/2007/06/05/open-question-the-parity-problem-in-sieve-theory/

27 November, 2013 at 12:00 am

Anonymous

Thanks.

25 November, 2013 at 12:43 pm

Pace Nielsen

Three days ago I started running some computations. Here are the results. Throughout, I will be assuming EH.

————————–

Take $k=5$ . Let $F(t_1,t_2,t_3,t_4,t_5)$ be an arbitrary polynomial function of total degree $d$ , which is symmetric in $t_1,t_5$ and in $t_2,t_3,t_4$ separately. Working over the original simplex $\mathcal{R}_{5}$ of Maynard, set

$P:= \displaystyle \frac{1}{2} \sum_{m=1}^5 J_5^{(m)}(F) / I_5(F) - \frac{1}{2} \int \frac{(\int_{t_1+t_5 \leq 1-t_2-t_3-t_4} F(t_1,t_2,t_3,t_4,t_5)\ dt_1 dt_5)^2}{1-t_2-t_3-t_4}\ dt_2 dt_3 dt_4 / I_5(F)$ .

Using a numerical optimizer (so results are not necessarily optimal, but should be close), we get:

For $d=3$ , $P\approx -0.0860$ .
For $d=4$ , $P\approx -0.0799$ .

It is not difficult to increase $d$ a little more. Probably up to $d=10$ is feasible.

————————–

Take $k=4$ . Let $F(t_1,t_2,t_3,t_4)$ be an arbitrary polynomial function of total degree $d$ , which is symmetric in the variables. Working over the original simplex $\mathcal{R}_{4}$ of Maynard, and again using a numerical optimizer, we get:

For $d=4$ , $(1/2)M_4(F)\gtrapprox 0.9222$ .
For $d=5$ , $(1/2)M_4(F)\gtrapprox 0.9225$ .
For $d=6$ , $(1/2)M_4(F)\gtrapprox 0.9226$ .

This is already very close to the upper bound of $0.924$ recently discovered by Terry.

————————-

I then tried to do the $k=4$ case for the new simplex $\mathcal{R}_{4}'$ , but the numerics give weird answers. For instance, when $d=4$ , we get $\approx 0.522$ . The reason I didn’t post these computations earlier is that I’ve been trying to figure out where my error is. After two days I still have not been able to track down my error. I wrote up a LaTeX document explaining the computation, and I’m willing to email it (and the Mathematica file) to anyone who is interested.

25 November, 2013 at 2:34 pm

James Maynard

I reran the computations I alluded to at the beginning of the previous post.

My calculations agree with yours for the original simplex $\mathcal{R}_k$ (I get the slightly better bound $(1/2) M_4>0.9226$ when $d=5$ .)

This is for $k=4$ , using the enlarged simplex $\mathcal{R}_4'$ . As you have done, I’m taking $F$ to be an arbitrary symmetric polynomial of degree $d$ on $\mathcal{R}_k'$ , and then I’m using the linear-programming method from my paper to calculate the eigenvector which corresponds to the optimal choice of coefficients of $F$ .

I then get the bounds
$d=4:$ $(1/2) M_4>0.9612$
$d=5:$ $(1/2) M_4>0.9623$
$d=6:$ $(1/2) M_4>0.9629$
This was my reason for saying using the enlarged simplex gets ‘about half way’ to obtaining gaps of size 8: we’ve improved from a lower bound of about 0.92 to about 0.96, and we want to get it to 1.

It appears that the convergence as $d$ increases is noticeably slower for the enlarged simplex than for the original simplex.

I can give more details of how I did the calculations if interested.

25 November, 2013 at 4:03 pm

Pace Nielsen

Now that I know what numerics to look for, it should be easier to figure out where my computational error is. Thank you!

P.S. The reason I didn’t send you those typos I mentioned above is that I’m reading through your paper with Roger Baker (here at BYU) and he is compiling a list of the typos we run across. He will be sending that to you, probably in a couple of weeks.

25 November, 2013 at 7:35 pm

James Maynard

Oops, there was a small typo in my program. The corrected bounds I get are
$d=4:$ $(1/2)M_4>0.9686$
$d=5:$ $(1/2)M_4>0.9688$
$d=6:$ $(1/2)M_4>0.9689$
and the convergence with $d$ seems good.

Summary of my calculation: writing
$\displaystyle J=\int \dots \int (\int F(t_1,t_2,t_3,t_4)dt_1)^2dt_2 dt_3 dt_4$ ,
we can notice that if $F$ is symmetric, then the integrand $(\int F\,dt_1)^2$ is symmetric in $t_2,t_3,t_4$ . This means the region with $t_2>t_3>t_4$ will contribute $1/6$ of the total. In this region, it is straightforward to put exact limits on each integration variable. We get
$\displaystyle J=6\int_0^{1/3}\int_{t_4}^{(1-t_4)/2}\int_{t_3}^{1-t_3-t_4} (\int_0^{1-t_2-t_3} F(t_1,t_2,t_3,t_4)dt_1)^2 dt_2 dt_3 dt_4.$
We also get a similar expression for $I$ . Again, these are quadratic forms in the coefficients of $F$ , and so we can find the optimal choice of $F$ of given degree by calculating the largest eigenvalue of a matrix.
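As an illustration of the "largest eigenvalue" step, here is a minimal sketch in a deliberately simplified setting, which is an assumption and not the computation described above: the original simplex $\mathcal{R}_k$ instead of $\mathcal{R}'_k$, and only cutoffs of the form $F(t_1,\ldots,t_k) = g(t_1+\ldots+t_k)$ with $g(s) = \sum_{j=0}^d c_j (1-s)^j$. For this family both quadratic forms have closed-form Gram matrices, $I_{ij} = \frac{(i+j)!}{(k+i+j)!}$ and $J_{ij} = \frac{(i+j+2)!}{(i+1)(j+1)(k+i+j+1)!}$ (integrate out $t_1$ first, then use that $t_1+\ldots+t_r$ has density $s^{r-1}/(r-1)!$ on $\mathcal{R}_r$), and the best ratio $\sum_m J_k^{(m)}(F)/I_k(F) = kJ/I$ is $k$ times the top eigenvalue of the generalized problem $Jc = \lambda Ic$. With $d=0$ (i.e. $F=1$) it equals $2k/(k+1)$. Since this family is much smaller than the space of all symmetric polynomials, it can only give weaker lower bounds for $M_k$; the printed values can be compared with those above.

```python
import math

import numpy as np


def gram_matrices(k, d):
    """Gram matrices of I_k and J_k on the basis g_j(s) = (1 - s)^j, j = 0..d,
    for F(t_1, ..., t_k) = g(t_1 + ... + t_k) supported on the simplex R_k."""
    n = d + 1
    I = np.empty((n, n))
    J = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            m = i + j
            I[i, j] = math.factorial(m) / math.factorial(k + m)
            J[i, j] = math.factorial(m + 2) / (
                (i + 1) * (j + 1) * math.factorial(k + m + 1)
            )
    return I, J


def best_ratio(k, d):
    """max over coefficient vectors c of k * (c^T J c) / (c^T I c).

    Reduces the generalized symmetric eigenproblem J c = lam I c to an
    ordinary one via the Cholesky factor of the positive definite matrix I.
    """
    I, J = gram_matrices(k, d)
    L = np.linalg.cholesky(I)     # I = L L^T
    L_inv = np.linalg.inv(L)
    A = L_inv @ J @ L_inv.T       # symmetric, same eigenvalues as I^{-1} J
    return k * np.linalg.eigvalsh(A)[-1]  # eigvalsh sorts ascending


for d in range(6):
    print(d, round(best_ratio(4, d) / 2, 4))  # (1/2) * ratio, as in the thread
```

The Cholesky reduction is used only to stay within NumPy; a generalized symmetric eigensolver would do the same job.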

26 November, 2013 at 2:14 am

Bogdan

I am a bit confused: the upper bound of 0.924 is reported in the comment of 25 November, 2013 at 5:14 am, with the conclusion that “so under EH the current method is limited to k_0=5”, and now you get the lower bound of 0.9689! Is this because of “using the enlarged simplex”?

26 November, 2013 at 11:24 am

James Maynard

Bogdan: Yes, the upper bound of 0.924 was for functions whose support was restricted to the smaller simplex $\mathcal{R}_k$ . By comparison, we are getting a lower bound of 0.922 in this case, so these seem to be in good agreement.

If we use a modification of the original method (such as extending the support to the slightly larger simplex $\mathcal{R}'_k$ or incorporating the $1_{n,n+12 \hbox{ prime}}$ terms) then this upper bound no longer applies. In particular, the lower bound of 0.96 using the larger simplex isn’t in contradiction to the earlier bound.

26 November, 2013 at 5:29 pm

Eytan Paldi

The fast convergence (using polynomials) indicates the (possible) existence of a smooth maximizer.

25 November, 2013 at 3:23 pm

Anonymous

Comments (from a non-mathematician who is interested in mathematics):

In Maynard’s paper, it is stated that $H_m \ll m^3e^{4m}$ , but in the post, it is stated that $H_m \ll m^3e^{2m}$ .

25 November, 2013 at 4:28 pm

Terence Tao

The bound $H_m \ll m^3 e^{4m}$ is unconditional, but the improved bound $H_m \ll m^3 e^{2m}$ is available if one assumes the Elliott-Halberstam conjecture; this isn’t explicitly stated in Maynard’s paper but it follows easily from the methods of the paper (so if James is revising the paper anyway, one may as well add in that remark).

25 November, 2013 at 3:52 pm

Anonymous

@Maynard:

* The subscript “ $max$ ” should be “ $\max$ ”.
* When typing maps, one should use “\colon” instead of “:” in order to get the correct spacing around the colon.

(Typographical comments, valid throughout the paper.)

26 November, 2013 at 8:28 am

Aubrey de Grey

I’m wondering whether it is possible (or desirable) to reformulate the progress being made here on M_k in a manner that more closely resembles prior work, in particular Polymath8a. Taking things down to their barest essentials, I gather that we can informally say that Maynard and polymath8b have improved the k for which DHL[k,m+1] is implied by EH[theta] (but see below) from an exponentially decreasing function of (theta – m/2) (i.e. GPY) to something on the order of e^(2m/theta). I appreciate that there are good mathematical reasons for working centrally with the quantity M_k, but it’s not clear to me whether the evolving bounds on M as a function of k can be “inverted” to give bounds on k as a direct function of m and theta. If that were possible, I think that for expository purposes it would be of great interest.

Also, I have a somewhat similar question relating to polymath8a. Is it possible to state an explicit version of EH[theta] (let’s call it EH’[theta]) that is implied (up to a value of theta that is a function of i. varpi and delta) by MPZ(i)[varpi,delta], and that also suffices to imply DHL[k,m+1] for sufficiently large k? If that were possible, it seems like it would constitute a nicely clean way, again for expository purposes, to separate the progress in reducing delta from the progress in reducing k for a given theta (and m). Can someone please clarify this?

26 November, 2013 at 8:43 am

Terence Tao

The various upper and lower bounds on Maynard’s sieve all correspond to asymptotics of the form $DHL[ k, (\frac{\theta}{2} + o(1)) \log k ]$ for large $k$ , which in particular implies $H_m \leq \exp( (\frac{2}{\theta} + o(1)) m)$ for large $m$ ; the only difference between them lies in the precise nature of the o(1) error term. Once the bounds stabilise, one can of course work these things out more accurately.

The Bombieri-Vinogradov theorem already implies $DHL[k,m+1]$ for sufficiently large k, and is a known theorem and is thus implied by any MPZ type statement. We do have some implications that do a little better than this by exploiting an MPZ hypothesis, by using a cutoff function F for Maynard’s sieve that is supported in an appropriate cube; again, the asymptotics for large m or large k are as above (in particular, the role of delta is asymptotically negligible), but the situation seems to be more complicated in the non-asymptotic regime, e.g. when m=1 or m=2.

26 November, 2013 at 9:36 am

Aubrey de Grey

Thanks! On the first point, absolutely – I just couldn’t exclude the possibility that this inversion might be so easy that there would be no reason to defer it. On the second point, sure, I was of course referring to an EH’ that would do the job for theta > 1/2; however I was mainly referring simply to the pre-Maynard universe and wondering whether this kind of condensation of i, varpi and delta down to a single parameter theta is possible at all.

26 November, 2013 at 11:33 am

Aubrey de Grey

In other words, when I wrote “for sufficiently large k” I should have written “for the same k that MPZ(i)[varpi,delta] itself does”.

26 November, 2013 at 1:13 pm

Terence Tao

A minor (and standard) remark: the Selberg sieve $\nu(n) = (\sum_{d_i | n+h_i} \lambda_{d_1,\ldots,d_k})^2$ can be rewritten as

$\displaystyle \nu(n) = (\sum_{d_1,\ldots,d_k} \frac{\mu(d_1) \ldots \mu(d_k)}{\phi(d_1)\ldots \phi(d_k)} y_{d_1,\ldots,d_k} c_{d_1}(n+h_1) \ldots c_{d_k}(n+h_k))^2$

where $y_{d_1,\ldots,d_k}$ are as in Maynard’s paper and $c_d(n)$ are the Ramanujan sums

$c_d(n) = \sum_{a \in ({\bf Z}/d{\bf Z})^\times} e_d(an) = \prod_{p|d} (p 1_{p|n} - 1)$

(restricting to squarefree $d$ ); this is basically equation (5.8) of Maynard’s paper after some rearranging.

The functions $n \mapsto c_d(n+h_i)$ behave approximately like independent random variables as $d, i$ vary, which heuristically explains why the mean value of $\nu$ is basically $\sum_{d_1,\ldots,d_k} \frac{y_{d_1,\ldots,d_k}^2}{\phi(d_1) \ldots \phi(d_k)}$
and can also give heuristic justification for the asymptotic for $\sum_n \nu(n) 1_{n+h_i \hbox{ prime}}$ in terms of the $y_{d_1,\ldots,d_k}$ (Lemma 5.2 of Maynard’s paper). I wasn’t able to use this representation of the Selberg sieve for much else, though.
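A quick numerical check of the product formula for Ramanujan sums at squarefree $d$ (a sketch; the helper names and the ranges of $d$ and $n$ are arbitrary choices): compute $c_d(n)$ directly as an exponential sum over reduced residues and compare it with $\prod_{p|d}(p 1_{p|n} - 1)$.

```python
import cmath
import math


def ramanujan_direct(d, n):
    """c_d(n) as the exponential sum of e(an/d) over reduced residues a mod d."""
    s = sum(
        cmath.exp(2j * math.pi * a * n / d)
        for a in range(1, d + 1)
        if math.gcd(a, d) == 1
    )
    return round(s.real)  # c_d(n) is an integer; the imaginary parts cancel


def prime_divisors(d):
    """Distinct prime divisors of d, by trial division."""
    ps, p = [], 2
    while p * p <= d:
        if d % p == 0:
            ps.append(p)
            while d % p == 0:
                d //= p
        p += 1
    if d > 1:
        ps.append(d)
    return ps


def is_squarefree(d):
    return all(d % (p * p) for p in prime_divisors(d))


def ramanujan_product(d, n):
    """prod over p | d of (p * 1_{p | n} - 1), valid for squarefree d."""
    value = 1
    for p in prime_divisors(d):
        value *= (p if n % p == 0 else 0) - 1
    return value


print([ramanujan_direct(30, n) for n in range(6)])
print([ramanujan_product(30, n) for n in range(6)])
```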

26 November, 2013 at 2:51 pm

Anonymous

Typographical comment to Terry: If you use $\dots$ instead of $\ldots$, LaTeX will take care of correct placement of the dots; they shouldn’t always be lowered to the base of the line.

26 November, 2013 at 3:34 pm

Eytan Paldi

In the definition of $M_k$ , the functions $F$ need only to be in
$L^2$ (i.e. not necessarily bounded.)

26 November, 2013 at 5:10 pm

Pace Nielsen

In the spirit of “A poor man’s improvement on Zhang’s result”, I was playing around with Maynard’s Mathematica notebook. I found that increasing “p[5]” to “p[6]” (and increasing the number of variables to 56, and precomputing polynomials up to degree 13) allows one to push down to $k=102$ . This was confirmed by Maynard. [It appears that no further gain is made in such an easy manner by just increasing the degree further.]

Obviously, further improvements are available if we incorporate some of the higher order power-symmetric polynomials like $P_3$ .

26 November, 2013 at 9:57 pm

Sniffnoy

102 or 103? It says 102 here but 103 on the wiki and I’m wondering whether that is a typo here or a copying error there (the H value listed there is for 103).

[Corrected, thanks – T.]

27 November, 2013 at 8:44 am

Terence Tao

Here is a possible way to generalise the Selberg sieve. We’ll use the “multidimensional GPY” formulation

$\displaystyle \nu(n) = (\sum_{d_1|n+h_1,\ldots,d_k|n+h_k} \mu(d_1)\ldots \mu(d_k) F(\frac{\log d_1}{\log R},\ldots,\frac{\log d_k}{\log R}))^2$

for the sieve, where F is a smooth function supported on the simplex (or the enlarged simplex). We can square this out as

$\displaystyle \nu(n) = \sum_{d_1,d'_1|n+h_1,\ldots,d_k,d'_k|n+h_k} \mu(d_1)\mu(d'_1)\ldots \mu(d_k)\mu(d'_k)$
$\displaystyle F(\frac{\log d_1}{\log R},\ldots,\frac{\log d_k}{\log R}) F(\frac{\log d'_1}{\log R},\ldots,\frac{\log d'_k}{\log R}).$

One can then consider the more general sieve

$\displaystyle \nu(n) = \sum_{d_1,d'_1|n+h_1,\ldots,d_k,d'_k|n+h_k} \mu(d_1)\mu(d'_1)\ldots \mu(d_k)\mu(d'_k)$
$\displaystyle \tilde F(\frac{\log d_1}{\log R},\ldots,\frac{\log d_k}{\log R}, \frac{\log d'_1}{\log R},\ldots,\frac{\log d'_k}{\log R})$

for some smooth function $\tilde F$ supported on the product of two simplices. This sieve obeys similar asymptotics to the original sieve (this can be seen for instance by using Stone-Weierstrass to approximate $\tilde F$ by a finite linear combination of tensor products, or else by direct computation); for instance one has

$\displaystyle \sum_n \nu(n) 1_{n+h_1 \hbox{ prime}}/\sum_n \nu(n)$

$\displaystyle = \int \tilde F_{2,\ldots,k,2',\ldots,k'}( 0, t_2,\ldots,t_k,0,t_2,\ldots,t_k)\ dt_2 \ldots dt_k$
$/ \int \tilde F_{1,\ldots,k,1',\ldots,k'}(t_1,\ldots,t_k,t_1,\ldots,t_k)\ dt_1 \ldots dt_k + o(1)$

if the denominator is non-zero, where the subscripts on $\tilde F$ denote differentiation in one or more of the variables $t_1,\ldots,t_k,t'_1,\ldots,t'_k$ . The main problem is that without further hypotheses on $\tilde F$ , the sieve weight $\nu$ is not necessarily non-negative, which is a crucial property in the argument. In the case that $\tilde F$ is positive semidefinite (viewed as a “matrix” or “quadratic form” on the simplex), one obtains the non-negativity; but in this case one can diagonalise $\tilde F$ into a non-negative combination of Selberg sieves, which cannot give better ratios than a pure Selberg sieve. However, it may be possible to have $\nu$ non-negative even without $\tilde F$ being positive semidefinite, leading to a genuinely more general class of sieve. The positivity requirement is a bit complicated though, even in the base case k=1; a typical condition is

$\tilde F(0,0) - \tilde F(t_1,0) - \tilde F(t_2,0) + \tilde F(t_1+t_2,0)$
$-\tilde F(0,t_1) + \tilde F(t_1,t_1) + \tilde F(t_2,t_1) - \tilde F(t_1+t_2,t_1)$
$-\tilde F(0,t_2) + \tilde F(t_1,t_2) + \tilde F(t_2,t_2) - \tilde F(t_1+t_2,t_2)$
$+\tilde F(0,t_1+t_2) - \tilde F(t_1,t_1+t_2) - \tilde F(t_2,t_1+t_2) + \tilde F(t_1+t_2,t_1+t_2) \geq 0$

whenever $t_1+t_2 =\frac{\log x}{\log R}$ . This condition ensures the non-negativity of $\nu$ on products of two primes; one needs similar conditions for products of $r$ primes for any bounded r, and this should suffice (note that these sieves are concentrated on numbers with only a bounded number of prime factors).
These conditions are implied by positive semi-definiteness of $\tilde F$ , but hopefully they are a bit weaker, allowing for more test functions $\tilde F$ to play with.
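To see why positive semidefiniteness suffices in the base case $k=1$, one can evaluate the sixteen-term expression above for a rank-one kernel $\tilde F(a,b) = f(a) f(b)$: it collapses to the square $(f(0) - f(t_1) - f(t_2) + f(t_1+t_2))^2$, which is just the Selberg sieve on a product of two primes. The sketch below checks this numerically; the profiles `f`, `g` and the value $\log x/\log R = 2$ are hypothetical choices made only for illustration.

```python
def two_prime_expression(F, t1, t2):
    """Sum over d, d' dividing p1*p2 of mu(d) mu(d') F(log d/log R, log d'/log R),
    where log p_i / log R = t_i: the sixteen-term expression above."""
    # (log d / log R, mu(d)) for d = 1, p1, p2, p1*p2
    divisors = [(0.0, 1), (t1, -1), (t2, -1), (t1 + t2, 1)]
    return sum(
        mu_a * mu_b * F(a, b) for a, mu_a in divisors for b, mu_b in divisors
    )


def f(s):
    """Hypothetical one-variable cutoff profile."""
    return max(1.0 - s, 0.0) ** 2


def g(s):
    """A second hypothetical profile."""
    return max(1.0 - 2.0 * s, 0.0)


def rank_one(a, b):
    """Selberg case: F~(a, b) = f(a) f(b)."""
    return f(a) * f(b)


def rank_two(a, b):
    """A sum of two rank-one kernels, still positive semidefinite."""
    return f(a) * f(b) + g(a) * g(b)


L = 2.0  # hypothetical value of log x / log R, e.g. R = x^{1/2}
for t1 in (0.25, 0.5, 0.75, 1.0):
    t2 = L - t1
    square = (f(0) - f(t1) - f(t2) + f(t1 + t2)) ** 2
    print(t1, two_prime_expression(rank_one, t1, t2), square)
```

A kernel that is not positive semidefinite would have to be checked against this expression directly, which is the point of the weaker conditions discussed above.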

27 November, 2013 at 3:22 pm

Terence Tao

A cute inequality related to the above considerations: for any smooth function $F: [0,1] \to {\bf R}$ , one has

$\sum_{r=1}^\infty \frac{1}{r!} \int_{{\cal R}_{r-1}} |\Delta_{t_1} \ldots \Delta_{t_r} F(0)|^2\ \frac{dt_1 \ldots dt_{r-1}}{t_1 \ldots t_r}$

$\leq \int_0^1 |F'(t)|^2\ dt$

where $t_r := 1-t_1-\ldots-t_{r-1}$ and $\Delta_h$ is the difference operator $\Delta_h F(t):= F(t+h)-F(t)$ . Thus for instance

$\frac{1}{2} \int_0^1 \frac{|F(0)-F(t)-F(1-t)+F(1)|^2}{t}\ dt \leq \int_0^1 |F'(t)|^2\ dt.$

The sieve theoretic interpretation is that the right-hand side is an appropriately normalised version of $\sum_n \nu(n)$ if $\nu(n) = (\sum_{d|n} \mu(d) F(\frac{\log d}{\log R}))^2$ , and the $r$-th term of the left-hand side is the contribution to $\sum_n \nu(n)$ coming from products of exactly $r$ primes, for $r=1,2,\ldots$ . For sufficiently smooth functions F (e.g. finite linear combinations of exponentials), the inequality is in fact an identity, as can be seen by a rather fun computation. The claim can also be proven from the recursive distributional identity

$S(F)(T) = F(0)^2 \delta_0(T) + \frac{1}{T} \int_0^T S( \Delta_\tau F)(T-\tau)\ d\tau$

for $T \in [0,+\infty)$ , where $S$ is the nonlinear operator

$S(F)(T) :=F(0)^2 \delta_0(T) + \int_0^T F'(t)^2\ dt.$
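For $F(t) = e^{at}$ the identity case can be checked numerically. Here $\Delta_{t_1}\ldots\Delta_{t_r}F(0) = \prod_{i=1}^r (e^{at_i}-1)$, so the $r$-th term on the left is $\frac{1}{r!}$ times the $r$-fold convolution of $h(t) := (e^{at}-1)^2/t$ evaluated at $1$, while the right-hand side is $a(e^{2a}-1)/2$. (Taking Laplace transforms, $\exp(\hat h(s)) - 1 = a^2/(s(s-2a))$, which is one way to confirm the identity for this $F$.) The sketch below is a numerical check under these assumptions; the test values of $a$, the grid size and the truncation at $r \leq 20$ are arbitrary choices.

```python
import numpy as np


def lhs_sum(a, n_grid=2001, r_max=20):
    """Left-hand side of the inequality for F(t) = exp(a t).

    The r-th term equals (1/r!) * h^{*r}(1), where h(t) = (e^{at} - 1)^2 / t
    and h^{*r} is the r-fold convolution of h on [0, 1].
    """
    t = np.linspace(0.0, 1.0, n_grid)
    dt = t[1] - t[0]
    h = np.zeros_like(t)
    h[1:] = np.expm1(a * t[1:]) ** 2 / t[1:]  # h(0) = 0 by continuity
    term = h.copy()  # the r = 1 term, as a function of the endpoint
    total = term[-1]
    for r in range(2, r_max + 1):
        # Trapezoid rule for (term * h)(t_n); the endpoint corrections vanish
        # because both factors are zero at t = 0.  Dividing by r turns
        # h^{*(r-1)}/(r-1)! into h^{*r}/r!.
        term = np.convolve(term, h)[:n_grid] * dt / r
        total += term[-1]
    return total


def rhs(a):
    """Right-hand side: integral of |F'(t)|^2 over [0, 1] for F(t) = exp(a t)."""
    return a * np.expm1(2.0 * a) / 2.0


for a in (1.0, -2.0):
    print(a, lhs_sum(a), rhs(a))
```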

27 November, 2013 at 5:57 pm

James Maynard

A couple of calculations regarding attempts to modify the $k_0=5$ case to obtain gaps of size 8:

Using the enlarged simplex, I get that $(1/2)M_5>1.029...$ (and I would expect this to be close to the truth). This means that the ‘probability’ that one of the elements of a 5-tuple is prime is $\approx 1.029/5\approx 0.205$ . If we assume independence, then we would expect the probability that both $n$ and $n+12$ are prime to be $\approx (1.029/5)^2\approx 0.0424$ (since we don’t expect to be able to get an asymptotic for this quantity, in reality we have to satisfy ourselves with an upper bound, which would be worse than this). This means that the modification of taking away the contribution from the times when $n$ and $n+12$ are simultaneously prime would (in the most optimistic scenario) leave us with a total of $1.029... - 0.0424\approx 0.987<1$ . This means we need at least one new idea to improve the situation. We'd need a lower bound on $(1/2)M_5$ which is certainly greater than $5(5-\sqrt{21})/2\approx 1.04...$ for this modification to get us over the edge.
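The threshold $5(5-\sqrt{21})/2$ comes from solving the optimistic model $x - (x/5)^2 = 1$ for $x = (1/2)M_5$, assuming (as in the estimate above) that each element of the 5-tuple is prime with probability $x/5$ and that $n$ and $n+12$ behave independently. A short sketch of that arithmetic for a general tuple size $k$ (the function names are hypothetical):

```python
import math


def optimistic_total(half_M, k=5):
    """(1/2) M_k minus the optimistic estimate (half_M / k)^2 for the
    contribution of n and n+12 being simultaneously prime."""
    return half_M - (half_M / k) ** 2


def threshold(k=5):
    """Smaller root of x - (x / k)^2 = 1, i.e. of x^2 - k^2 x + k^2 = 0."""
    return k * (k - math.sqrt(k * k - 4)) / 2


print(optimistic_total(1.029))  # about 0.987, still below 1
print(threshold())              # 5 (5 - sqrt 21) / 2, about 1.0436
```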

I also tested the upper bound to see how it fared numerically. My calculations should be treated as very provisional, since I haven't checked them properly. In the scenario above, I find that taking a suitable polynomial for $F$ , we get a lower bound for the modified version $M'_5$ which gives $(1/2)M_5'\ge 0.966$ . This compares well with the most optimistic bound of $0.987$ from above. This also corresponds to an upper bound for $1_{n,n+12 \hbox{ prime}}$ of size $\approx 0.062$ , which again is close to the guessed asymptotic value of $\approx 0.042$ .

Since these bounds performed quite well, it is perhaps reasonable to expect that we can get decent upper bounds when we are considering $k=102$ or similar. I don't know how we would evaluate the integrals that arise, however.

27 November, 2013 at 6:40 pm

Terence Tao

Thanks for these calculations! Due to the parity problem, I expect that any upper bound we obtain on $1_{n,n+12\hbox{ prime}}$ is going to be off by a factor of at least 2 from the truth (so in this case, it would be like $0.085$ rather than $0.042$ ). This isn’t consistent with the $0.062$ number that you are provisionally reporting, but perhaps the numerology for the $M'_5$ optimiser is a bit different from that for the $M_5$ optimiser. For instance, it may be advantageous to break symmetry a bit and work with polynomials F that behave differently in the $t_1,t_5$ variables than in the $t_2,t_3,t_4$ variables, lowering the probability that n is prime and n+12 is prime (in order to also lower the probability that n,n+12 are both prime) while also raising the probability that n+2,n+6,n+8 are prime.

0.987 is tantalisingly close to 1, so there is still hope :) Certainly things should look better for larger k. One relatively cheap thing to try is just to take the best polynomial we have for $M_{102}$ and compute the upper bounds on $1_{n+h_i,n+h_j \hbox{ prime}}$ (note though that if we retreat from EH to Bombieri-Vinogradov, some of the 1/2 factors in the previous formula for this quantity degrade to 1/4).

27 November, 2013 at 6:52 pm

Terence Tao

A small observation: somewhat frustratingly, the Zhang/Maynard methods do not quite seem to be able to establish Goldbach’s conjecture up to bounded error (i.e. all sufficiently large numbers are within O(1) of the sum of two primes). But one can “split the difference” between bounds on H and establishing Goldbach conjecture with bounded error. For instance, assuming Elliott-Halberstam, one can show that at least one of the following statements hold:

1. $H_1 \leq 6$ (thus improving over the current bound of $H_1 \leq 12$ ); or

2. Every sufficiently large even number lies within 6 of a sum of two primes.

To prove this, let N be a large multiple of 6, and consider the tuple n, n+2, n+6, N-n, N-n-2. One can check that this tuple is admissible in the sense that for every prime p, there is an n such that all five elements of the tuple are coprime to p. A slight modification of the proof of DHL[5,2] then shows that there are lots of n for which at least two of the five elements of this tuple are prime; either these two elements are within 6 of each other, or they sum to a number between N-2 and N+6.

If we can ever get DHL[4,2] (or more precisely, a variant of this assertion for the tuple n, n+2, N-n, N-n-2), we get a more appealing dichotomy of this type: either the twin prime conjecture is true, or every sufficiently large multiple of six lies within 2 of a sum of two primes.
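The admissibility claims can be checked mechanically. The sketch below (hypothetical helper names) represents each tuple element as a linear form $an+b$ with $a = \pm 1$ and, for each small prime $p$, looks for a residue $n \bmod p$ at which no form vanishes; primes $p$ larger than the number of forms need no check, since each such form vanishes on exactly one residue class. It also shows that the divisibility of $N$ by 6 matters: for $N = 8$ both tuples fail at $p=3$.

```python
def primes_up_to(n):
    return [p for p in range(2, n + 1) if all(p % q for q in range(2, p))]


def is_admissible(forms):
    """forms: list of pairs (a, b) standing for the linear forms a*n + b, a = +-1.

    Checks every prime p <= len(forms) for a residue n mod p at which no form
    vanishes.  Larger primes need no check: each form vanishes on exactly one
    residue class mod p, so at least one class survives.
    """
    for p in primes_up_to(len(forms)):
        if not any(all((a * n + b) % p != 0 for a, b in forms) for n in range(p)):
            return False
    return True


def five_tuple(N):
    """n, n+2, n+6, N-n, N-n-2"""
    return [(1, 0), (1, 2), (1, 6), (-1, N), (-1, N - 2)]


def four_tuple(N):
    """n, n+2, N-n, N-n-2"""
    return [(1, 0), (1, 2), (-1, N), (-1, N - 2)]


print(all(is_admissible(five_tuple(N)) for N in range(6, 601, 6)))  # True
print(is_admissible(five_tuple(8)), is_admissible(four_tuple(8)))    # False False
```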

28 November, 2013 at 8:54 am

Aubrey de Grey

Wow! This immediately raises two questions that would surely be of wide interest:

1) For k=5, can one split the difference unevenly, i.e. lower the value 6 in one statement at the expense of using a number larger than 6 in the other? If so, how unevenly – all the way to TP or Goldbach?

2) Can analogous statements be made for larger k? In particular, for k large enough to require only BV (or MPZ) rather than EH?

28 November, 2013 at 11:19 am

Aubrey de Grey

In fact, much better would be to unify the above questions in the obvious way, as follows:

If we denote by “GBE[n]” (Goldbach up to bounded error n – is there an established name for this conjecture?) the assertion that all even numbers are within n of an even sum of two primes (I’m omitting “sufficiently large” here only because I presume that “sufficiently” can easily be shown in this context to be within the range for which GC itself has already been computationally verified – apologies if that’s not true), then for which [a,b,c] can we say that DHL[a,2] implies DHL[b,2] or GBE[c] ? Terry’s observations constitute this statement for [a,b,c] = [5,3,6] and [4,2,4]. How far can the allowed values be broadened?

27 November, 2013 at 9:33 pm

Anonymous

I feel that Goldbach’s conjecture / twin primes conjecture are very close to being proved.
All originated from Zhang’s work!

27 November, 2013 at 10:17 pm

Gil Kalai

What I am curious about is the following. Consider a random subset of primes (taking every prime with probability p, independently, and say p=1/2). Now consider only integers involving these primes. I think that it is known that this system of “integers” satisfies (almost surely) PNT but not at all RH. We can consider for such systems the properties BV (Bombieri-Vinogradov), or more generally EH(θ), and the quantities $H_m$. For such systems, does BV typically hold, or is it rare, like RH? Does Maynard’s implication apply in this generality? Nicely, here we can even hope for infinitely many pairs of consecutive primes.

28 November, 2013 at 8:26 am

Terence Tao

Strictly speaking, if one only takes half of the primes, then the remaining integers form a very sparse set – about $x^{1/2}$ elements up to $x$ – which makes all the numerology very different from that of the full primes.

You may be thinking instead of the Beurling integers, in which the primes are replaced by some other set of positive reals with the same asymptotic density properties. These integers enjoy many of the same “multiplicative number theory” properties as the usual integers (e.g. the prime number theorem), but one usually loses all of the “additive number theory” properties, basically because the Beurling integers are not translation-invariant: if n is a Beurling integer, then n+2 usually isn’t, and so asking for things like twin Beurling primes is not really a natural question; similarly for questions about distributions of Beurling primes in residue classes (Bombieri-Vinogradov, EH, etc.) (unless one also somehow creates a theory of “Beurling Dirichlet characters” to go with the Beurling primes, but I do not know how to make this work for anything that isn’t already coming from a number field).

28 November, 2013 at 10:32 am

Gil Kalai

Terry, how can it be $x^{1/2}$ if you leave half the primes alive?

28 November, 2013 at 10:51 am

Terence Tao

Oops, I miscalculated; the density is now $1/\log^{1/2} x$ rather than $1/x^{1/2}$, so I drop my previous objection to the question :). (My error was thinking that the average number of divisors of n would remain comparable to $\log x$, when instead it drops to $\log^{1/2} x$.) The main technical difficulty now is with counting such quantities as the number of integers n less than x such that n and n+2 are both divisible only by the surviving primes; offhand I don’t see a promising technique to answer this question (as well as more general questions in which several shifts of n are involved, and some congruence class conditions on n are also imposed) but it looks like an interesting problem.

28 November, 2013 at 11:21 am

Gil Kalai

Indeed, I was thinking about Beurling integers, and my questions can be asked about them, but I wanted a very specific model which would make computations easier. I think that it is not that easy to work with general Beurling primes, and finding concrete stochastic models (like random graphs, sort of) may be more fruitful.

If we want to keep the density of primes on the nose, we can possibly still take p_n with independent but not identical probabilities, so as not to corrupt the asymptotic density properties, or do some other “correction”; but perhaps taking half the primes is also of interest.

All the questions I asked refer only to the relative ordering of the Beurling integers and *not* to their additive properties. So “twin primes” refers to two Beurling primes with a single composite Beurling integer in between. And “consecutive primes” refers to two Beurling primes without any composite Beurling integer in between. (Alas, we don’t have them often in the usual case.) I think that PNT, RH, BV (Bombieri-Vinogradov), or more generally EH(θ), and $H_m$ (in the *ordinal*, not *additive* sense) continue to make sense (perhaps even in more than one sense) for Beurling primes, and it is interesting to see if Maynard’s implication extends, but it looks to me that considering a stochastic model based on the primes will make the issue more tractable (and perhaps useful).

28 November, 2013 at 12:28 pm

James Maynard

If I understand your question correctly, then I would have thought for such a system you would get (almost surely) $EH(\theta)$ for this subset of the primes, whenever $EH(\theta)$ holds for the primes themselves. (And so bounded gaps with $m$ primes).

If $X_{a,q}$ is the number of chosen primes in the residue class $a \pmod{q}$ which are at most $x$, then $X_{a,q}$ is distributed as a binomial random variable $B(\pi(x;q,a),1/2)$. Therefore, we should get, for some constant $C>0$,

$\displaystyle\mathbb{P}(|X_{a,q}-\pi(x;q,a)/2|\ge t\sqrt{\pi(x;q,a)})\ll \exp(-C t^2)$.

In particular, over all $a,q$ for which $\pi(x;q,a)\gg x/\phi(q)$ we have almost surely $X_{a,q}=\pi(x;q,a)/2+O(\log^2{x} (x/q)^{1/2})$. (There are $O(x^2)$ such pairs $a,q$, and we take $t=\log{x}$ above.) The errors from these $O$-terms are negligible. The errors from those pairs $a,q$ for which $\pi(x;q,a)\gg x/\phi(q)$ does not hold, and the errors $\pi(x;q,a)/2-\pi(x)/(2\phi(q))$, can be bounded by regular $EH(\theta)$ for all primes.
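To make the concentration estimate concrete, here is a small Python sketch. It compares the exact two-sided binomial tail with Hoeffding's inequality, which gives the bound with the explicit value C = 2, with the constant in front absorbed by the ≪. The choice of Hoeffding's inequality is ours, since the comment does not name a specific concentration inequality. Here n plays the role of $\pi(x;q,a)$.

```python
from math import comb, exp, sqrt


def binomial_tail(n, t):
    """Exact P(|X - n/2| >= t*sqrt(n)) for X ~ Binomial(n, 1/2)."""
    threshold = t * sqrt(n)
    count = sum(comb(n, j) for j in range(n + 1) if abs(j - n / 2) >= threshold)
    return count / 2 ** n


def hoeffding(t):
    """Hoeffding's inequality for n fair coin flips: the same probability is at
    most 2*exp(-2 t^2), i.e. the estimate above holds with C = 2."""
    return 2 * exp(-2 * t * t)
```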

28 November, 2013 at 1:52 pm

Terence Tao

I agree that this form of $EH[\theta]$ will hold almost surely, but I think for applications to bounded gaps between these Beurling-type integers one needs a significantly stronger version of $EH[\theta]$ in which one not only counts those $n = a\ (q)$ for which $n$ is a Beurling prime, but also $n+h_j-h_i$ is a Beurling integer for the other elements $h_j$ in the tuple.

Actually, never mind bounded gaps between Beurling primes: bounded gaps between Beurling integers is already a somewhat non-trivial problem, particularly if one wants the correct asymptotic for how often this occurs; this is probably the main obstacle to extending the various results for gaps between primes to this Beurling setting.

28 November, 2013 at 3:16 pm

James Maynard

Can you not use the sieve as before, comparing $\sum_{i=1}^k \mathbf{1}_{Beurling\ prime}(n+h_i)$ to just the constant function 1 (supported on all integers, Beurling-type or not)?

Since you have $EH(\theta)$ for these primes, you should be able to calculate an asymptotic for the sieve weighted by these terms (which will be 1/2 of what we get for primes). Since $M_k$ tends to infinity, I would have thought we should then get $m$ primes in a $k$ -tuple whenever we would have got $2m$ primes before.

28 November, 2013 at 4:24 pm

Terence Tao

Well, if one uses the existing sieve $\nu$ without any modification, the density $\sum_n \nu(n) 1_{Beurling\ prime}(n+h_i) / \sum_n \nu(n)$ will only be of the order of $1 /\log^{1/2} x$, which is too sparse to pick up more than one prime in the tuple. Things are presumably better if one replaces $\nu(n)$ by $\nu(n) \prod_{j=1}^k 1_{Beurling\ integer}(n+h_j)$, but then one needs asymptotics for things like $\sum_{n = a\ (q)} \prod_{j=1}^k 1_{Beurling\ integer}(n+h_j)$, which is presumably doable but requires some calculation. (It is vaguely reminiscent of the Titchmarsh-type divisor problem of finding integers n, n+h with specified divisor function.)

28 November, 2013 at 1:50 pm

Terence Tao

A new paper on the arXiv by William D. Banks, Tristan Freiberg, and Caroline L. Turnage-Butterbaugh: “Consecutive primes in tuples“. Basically they show that Maynard’s result can be boosted to guarantee that the m primes involved are consecutive, although they have to increase k somewhat (to be double exponential in m rather than single exponential) to do this. (It may be, though, that one can avoid this by computing asymptotics for $\sum_n \nu(n) 1_{n+h \hbox{ prime}}$ for h not in the tuple, much as in the original GPY paper, or by using the ideas of Pintz as is done for instance in http://arxiv.org/abs/1305.6289 .)

28 November, 2013 at 2:59 pm

Eytan Paldi

Perhaps this new type of $H_m$ (for consecutive primes) may be studied here as well.

30 November, 2013 at 7:38 pm

Kalpesh Muchhal

Hi, Corollary 3 of the Banks-Freiberg-TB paper shows that consecutive primes can be part of an AP, while the Green-Tao theorem shows that primes can be consecutive terms of an AP. Can these results be somehow combined to prove consecutive primes can be consecutive terms of an AP infinitely often?

30 November, 2013 at 9:39 pm

Terence Tao

Unfortunately, the primes that we can capture in short intervals are so sparse (one roughly has only $\log H$ or so primes in an interval of length H) that it is not presently possible to force them into arithmetic progression. (My arguments with Ben can produce arithmetic progressions of primes in $[x,x+x^{o(1)}]$ – this was explicitly noted in a later paper of myself with Ziegler – and in view of more recent work (particularly that of Conlon, Fox, and Zhao) it is likely that one can narrow this to $[x,x+\log^{O(1)} x]$ – but this is still a bit too wide to expect that the primes produced in this fashion will be consecutive.)

29 November, 2013 at 7:53 am

Andrew Granville

I do not see why one needs to increase k. If the m-prime result is obtained from the k-tuple {x+0, x+a_2,…, x+a_k}, then we can restrict ourselves to a congruence class (for x) so that x+j has a given smallish prime factor p_j for each j<a_k which is not an a_i. The congruences x = -j mod p_j combine to mean we can write x = Qn + b (where Q = prod_j p_j). Hence we work instead with the k-tuple Qn+b, Qn+b+a_2,…,Qn+b+a_k.
I think that one can apply the main theorem directly to this k-tuple, so k remains of (single) exponential size in m.
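The congruence construction can be sketched as follows. This is a hypothetical illustration, not code from the comment. It takes each p_j larger than a_k, which is our own assumption: it guarantees that no p_j divides a difference a_i - j, so the new tuple stays admissible. It then solves x = -j mod p_j by the Chinese remainder theorem and returns b and Q.

```python
def primes_above(start, count):
    """The first `count` primes strictly greater than `start` (trial division)."""
    found, p = [], start + 1
    while len(found) < count:
        if p > 1 and all(p % q for q in range(2, int(p ** 0.5) + 1)):
            found.append(p)
        p += 1
    return found


def crt(residues, moduli):
    """Solve x = r_i (mod m_i) for pairwise coprime m_i; return (b, Q) with
    0 <= b < Q = prod m_i."""
    x, M = 0, 1
    for r, m in zip(residues, moduli):
        t = ((r - x) * pow(M, -1, m)) % m  # modular inverse; needs Python 3.8+
        x, M = x + M * t, M * m
    return x % M, M


def force_composite_gaps(shifts):
    """For a tuple 0 = a_1 < ... < a_k, choose x = Q*n + b so that x + j is
    divisible by a prime p_j for every 0 < j < a_k outside the tuple."""
    a_k = max(shifts)
    gaps = [j for j in range(1, a_k) if j not in shifts]
    # Assumption (ours): p_j > a_k, so p_j never divides a_i - j and the
    # shifted tuple Q*n + b + a_i stays admissible.
    ps = primes_above(a_k, len(gaps))
    b, Q = crt([(-j) % p for j, p in zip(gaps, ps)], ps)
    return b, Q, dict(zip(gaps, ps))
```

Smaller primes would also work, as long as p_j does not divide any a_i - j.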

29 November, 2013 at 8:08 am

Andrew Granville

Beurling primes do not satisfy RH in general… You should look for the beautiful 2006 paper of Diamond, Montgomery and Vorhauer. In fact they show that one can have a Siegel-type zero (which perhaps explains why these are so hard to deal with in the actual prime case).

There was also some mention, amongst these comments, of probabilistic models… of course the Gauss-Cramér model, where a random number n is “prime” with probability 1/log n, predicts that there are many prime pairs n, n+1, so it needs some adjusting to take account of correlations arising from local divisibility. Sound and I showed some years ago that all of the obvious models proposed to make these sorts of corrections have serious flaws (see our paper “An arithmetic uncertainty principle”).

29 November, 2013 at 10:32 am

Gil Kalai

More about systems of Beurling primes:

About my initial suggestion to take half the primes at random: since a quarter of all pairs of primes survive anyway, it does not look like the implication (if it applies) from BV (Bombieri-Vinogradov) or EH(θ) to bounded gaps is useful here.
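The "quarter of all pairs" heuristic is easy to check numerically. The following Python sketch is a toy simulation, and the limit of 10^5 and the fixed seed are our own choices. It keeps each prime independently with probability 1/2 and reports the fraction of twin prime pairs whose two members both survive. This fraction should be close to 1/4.

```python
import random


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if is_prime[i]]


def twin_survival_fraction(limit, keep_prob=0.5, seed=0):
    """Keep each prime independently with probability keep_prob and return the
    fraction of twin pairs (p, p + 2), p + 2 <= limit, with both members kept."""
    rng = random.Random(seed)
    primes = primes_up_to(limit)
    prime_set = set(primes)
    kept = {p for p in primes if rng.random() < keep_prob}
    twins = [p for p in primes if p + 2 in prime_set]
    survivors = sum(1 for p in twins if p in kept and p + 2 in kept)
    return survivors / len(twins)
```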

Regarding Beurling primes in general. I see three issues

1) Does (or when does) Maynard’s implication from EH(θ) (or just BV) to bounded gaps apply to general systems of Beurling primes with the right density?

2) Can you either show it more easily or apply it more easily for stochastic systems based on ordinary primes?

3) If (or when) the implication is correct, can you get something useful from it for the current project (or even just interesting consequences otherwise)?

I don’t know about 1) but let me just assume it.

Regarding 2), for most natural models pairs of primes with small gaps will survive. (Maybe for k-tuples with large k interesting things can be said, since their rate of survival decays exponentially.) Another interesting stochastic model is to take all primes *with* repetitions, taking every prime a random number of times with (say) a Poisson or exponential distribution of mean 1. (For applying it we may need bounds on gaps between 2 to m, so that we can ignore repeated primes.) (One of the most beautiful and useful ideas in modern statistics is to study sequences by sampling from them with repetitions.)

Regarding 3), there are various things to think about. You can delete from your Beurling system all pairs of primes with gaps at most 1000 (say). Now, what we know about the number of primes involved in a gap of at most 1000 makes it too high for the statement of Bombieri-Vinogradov to remain correct automatically. (But it is interesting if it remains correct.) However, we may try using it as part of an argument showing that there are many primes with gaps of 1000 or less: if the density of those is small, then when we delete them EH is not harmed, and we can apply the reduction again. This is not a complete argument, since new pairs of primes with small gaps (with respect to the new system) may emerge.

Since the estimates for k-tuple densities are especially weak compared to reality, this type of hypothetical argument may work better for them.

Another similar suggestion is to choose for our Beurling system a random prime from every second interval of integers of length 1000. This automatically eliminates (additive) gaps below 1000.

(If such considerations can “just” lead to new examples of Beurling primes which violate RH, BV, or EH this would also be nice, of course.)

1 December, 2013 at 5:01 pm

Gil Kalai

On more thought, I don’t see how to formulate the above questions for general Beurling primes without the additive structure.

1 December, 2013 at 10:41 am

Terence Tao

I’m recording how to use multiple dense divisibility to (in principle at least) improve the value of k, although at present we can only effectively calculate the various integrals involved when the function F is basically built from a tensor product $\prod_i g(t_i)$ , which isn’t the case for our best value 102 for k at m=2, but only for the larger value 339 (for m=2) or 43,134 (for m=3).

To review the situation: under $EH[\theta]$ , we can obtain $DHL[k,m+1]$ (and hence $H_m \leq H(k)$ ) whenever we can find a (bounded measurable) symmetric function $F$ supported on the simplex ${\cal R}_{k}$ with

$\int_{{\cal R}_{k}} |F|^2\ dt_1 \ldots dt_k \leq 1$ (1)

and

$k \int_{{\cal R}_{k-1}} |\int F\ dt_k|^2\ dt_1 \ldots dt_{k-1} > \frac{2m}{\theta}$. (2)

A good choice for F here is a function of the form

$F(t_1,\ldots,t_k) = 1_{{\cal R}_k}(t_1,\ldots,t_k) \prod_{i=1}^k \frac{k^{1/2}}{m_2^{1/2}} g(k t_i)$

where $g(t) = \frac{1}{1+At} 1_{[0,T]}(t)$ for some parameters $A,T>0$ (in practice we take $A$ close to $\log k$ and $T$ close to $k/\log k$ ), and $m_i := \int_0^T g(t)^i\ dt$ . Hypothesis (1) is automatic, and the left-hand side of (2) can be bounded from below by

$\frac{m_1^2}{m_2} ( 1 - {\bf P}( X_1 + \ldots + X_{k-1} \geq k - T ) )$ (3)

where $X_1,\ldots,X_{k-1}$ are iid random variables with distribution $\frac{1}{m_2} g(t)^2\ dt$ .
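The quantities in (3) are explicit enough to estimate by simulation. In the sketch below, which is illustrative only, $m_1 = \log(1+AT)/A$ and $m_2 = T/(1+AT)$ are computed in closed form. The $X_i$ are drawn by inverting the distribution function $t(1+AT)/(T(1+At))$ of the density $g(t)^2/m_2$, and the probability in (3) is estimated by Monte Carlo. The parameters A, T are the caller's choice; they are not the tuned values used in the project.

```python
import math
import random


def moments(A, T):
    """Closed forms of m_1 = int_0^T g and m_2 = int_0^T g^2 for g(t) = 1/(1+At)."""
    return math.log1p(A * T) / A, T / (1 + A * T)


def sample_X(rng, A, T):
    """Inverse-CDF sample from the density g(t)^2/m_2 on [0, T].
    The CDF is u = t(1+AT) / (T(1+At)), which inverts to the formula below."""
    u = rng.random()
    return u * T / (1 + A * T * (1 - u))


def bound3_monte_carlo(k, A, T, trials=2000, seed=0):
    """Monte Carlo estimate of (m_1^2/m_2) (1 - P(X_1+...+X_{k-1} >= k - T))."""
    m1, m2 = moments(A, T)
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if sum(sample_X(rng, A, T) for _ in range(k - 1)) >= k - T:
            hits += 1
    return m1 ** 2 / m2 * (1 - hits / trials)
```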

If one wishes to use $MPZ[\varpi,\delta]$ as the starting hypothesis instead of $EH[\theta]$, one replaces $\frac{2m}{\theta}$ by $\frac{m}{1/4+\varpi}$, and one has to add the additional hypothesis that $F$ is supported in the cube $[0,\tilde \delta]^k$ with $\tilde \delta := \frac{\delta}{1/4+\varpi}$. The latter hypothesis is obeyed if

$\frac{T}{k} \leq \tilde \delta.$ (4)

If one uses the stronger hypothesis $MPZ^{(i)}[\varpi,\delta]$ instead, one can relax the support condition on $F$ to the intersection of the larger cube $[0, \tilde \delta']^k$ and the set of $(t_1,\ldots,t_k)$ such that

$\sum_{1 \leq j \leq k: t_j < \tilde \delta} t_j > \theta$

where

$\theta := \frac{ (i\delta'-\delta)/2+ \varpi}{1/4 + \varpi}$

and $\delta' > \delta$ is arbitrary with $\tilde \delta' := \frac{\delta'}{1/4+\varpi}$ ; this follows from Lemma 4.13 of the Polymath8a paper. This has the effect of relaxing (4) to

$\frac{T}{k} \leq \tilde \delta'.$ (4′)

at the cost of worsening (3) to

$\frac{m_1^2}{m_2} ( 1 - {\bf P}( X_1 + \ldots + X_{k-1} \geq k - T ) - {\bf P}( \sum_{1 \leq j \leq k-1: X_j< \tilde \delta k} X_j < \theta k ) )$. (3′)

We may optimise $\tilde \delta'$ by making (4′) an equality.

Using the exponential moment method, we can lower bound (3′) by

$\frac{m_1^2}{m_2} ( 1 - \kappa_1 - \kappa_2 )$

where

$\kappa_1 := e^{-s(k-T)} (\frac{1}{m_2}\int_0^T e^{st} g(t)^2\ dt)^{k-1}$

and

$\kappa_2 := e^{s' \theta k} (\frac{1}{m_2}\int_0^{\tilde \delta k} e^{-s't} g(t)^2\ dt + \frac{1}{m_2}\int_{\tilde \delta k}^T g(t)^2\ dt)^{k-1}$

and $s, s' > 0$ may be chosen as we please. Thus: we obtain $DHL[k,m+1]$ whenever we can find $\varpi, \delta, A, T, s, s'$ obeying

$\frac{m_1^2}{m_2} ( 1 - \kappa_1 - \kappa_2 ) > \frac{m}{1/4+\varpi}$

with $\delta'$ given by equality in (4′), with $\delta \leq \delta'$ . Our best estimate for $MPZ^{(i)}[\varpi,\delta]$ is with $i=4$ and $\frac{1080}{13} \varpi + \frac{330}{13} {\delta} < 1$ .
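This criterion can be evaluated numerically once the parameters are fixed. The following Python sketch uses composite Simpson quadrature, which is our own choice of quadrature. It computes $\kappa_1$ and $\kappa_2$ for $g(t) = 1/(1+At)$ and then tests the inequality against $m/(1/4+\varpi)$. In $\kappa_2$, the moment generating function includes the contribution of the $X_j \geq \tilde\delta k$. These add nothing to the truncated sum in (3′), so they enter with weight 1. All parameter values ($\tilde\delta$, the $\theta$ defined above from $\delta'$ and $\varpi$, $s$, $s'$, and so on) are left to the caller; none of them are the project's tuned values.

```python
import math


def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def kappas(k, A, T, delta_tilde, theta, s, s2):
    """Exponential-moment (Chernoff) bounds kappa_1, kappa_2 for g(t) = 1/(1+At)
    on [0, T]; s, s2 > 0 are the free parameters s, s'."""
    g2 = lambda t: 1.0 / (1.0 + A * t) ** 2
    m2 = simpson(g2, 0.0, T)  # same rule as below, so kappa_1 = 1 exactly at s = 0
    mgf1 = simpson(lambda t: math.exp(s * t) * g2(t), 0.0, T) / m2
    log_k1 = -s * (k - T) + (k - 1) * math.log(mgf1)
    # X_j >= delta_tilde*k add nothing to the truncated sum, so they enter with weight 1.
    cut = min(delta_tilde * k, T)
    mgf2 = (simpson(lambda t: math.exp(-s2 * t) * g2(t), 0.0, cut)
            + simpson(g2, cut, T)) / m2
    log_k2 = s2 * theta * k + (k - 1) * math.log(mgf2)
    # cap the exponents to avoid overflow; a kappa above 1 is useless anyway
    return math.exp(min(log_k1, 700.0)), math.exp(min(log_k2, 700.0))


def criterion(k, m, varpi, A, T, delta_tilde, theta, s, s2):
    """Return (lhs, lhs > m/(1/4 + varpi)) with lhs = (m_1^2/m_2)(1 - kappa_1 - kappa_2)."""
    m1, m2 = math.log1p(A * T) / A, T / (1 + A * T)
    k1, k2 = kappas(k, A, T, delta_tilde, theta, s, s2)
    lhs = m1 ** 2 / m2 * (1 - k1 - k2)
    return lhs, lhs > m / (0.25 + varpi)
```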

1 December, 2013 at 5:04 pm

Eytan Paldi

$\kappa_2$ above can be made arbitrarily small (no lower bound on $s'$ .)

[Oops, there was a typo in the definition of $\kappa_2$ , now corrected. -T.]

2 December, 2013 at 9:36 am

Eytan Paldi

There is a typo in the last inequality (last line.) [Fixed, thanks – T.]

2 December, 2013 at 9:52 am

Terence Tao

I spent some time this weekend reading up on the beta sieve (and on its role in the proof of Chen’s theorem that there are infinitely many primes p such that p+2 is an almost prime (the product of at most two primes)), in the hope that it could be a viable substitute for the Selberg sieve for this project; in particular, I think it would be a great result if one could find an alternate proof of bounded gaps between primes that did not rely fundamentally on a Selberg-type sieve. Unfortunately, while just about any of the standard sieves (including the beta sieve) can produce probability distributions for which one has ${\bf P}( n + h_i \hbox{ prime } ) > \frac{c}{k}$ for some small constant c and all i=1,…,k, it seems that thus far only the Selberg sieve can get c larger than 1 (which gives bounded gaps between primes), and even make c grow logarithmically in k (which gives bounds on $H_m$ for all m). I still do not have a clean conceptual understanding of why the Selberg sieve can do this; I have a vague sense that there is some interplay between the “L^2” nature of the Selberg sieve and the “L^1” nature of the density of primes in that sieve, but my intuition on this is still lacking. For the beta sieve, the best I can do is to multiply together k one-dimensional beta sieves of level $x^{1/k}$; it is not obvious to me how to create a genuinely k-dimensional beta sieve, although this is perhaps worth pursuing. (I was intending to write up some notes on the beta sieve, but they would basically duplicate a chapter of “Opera de Cribro”; in particular, there are some rather tricky calculations involving delayed difference equations which I have thus far been unable to avoid.)

Incidentally, in “Opera de Cribro” it is noted that one can prove Chen’s theorem using a combination of beta and Selberg sieves if one uses the FI distribution theorems (which roughly speaking give a level of distribution of 4/7 (or in our notation, $\varpi=1/28$) when specialised specifically to the twin prime problem). Roughly speaking, the beta sieve gives a lower bound on the number of numbers $n \leq x$ with $n+2$ prime such that $n$ has no prime factors less than $x^{3/11}$, and the Selberg sieve gives an upper bound on the number of primes $m$ such that $m-2$ is the product of three primes greater than $x^{3/11}$; subtracting the two, one gets a positive lower bound for the pairs needed for Chen’s theorem. It may be worth revisiting the proof of the FI distribution theorem to see if it can somehow be combined with some of Zhang’s ideas to get new distribution theorems that are usable for the k-tuple problem and not just the twin prime problem. (I did check to see if one can use the special structure of the coefficients in the error terms in the Zhang/Polymath8a arguments, but these coefficients get eliminated as a by-product of the van der Corput process and other necessary applications of Cauchy-Schwarz, so it is not obvious how to exploit the structure. Though FI, BFI, etc. did somehow manage this…)

3 December, 2013 at 12:04 am

Terence Tao

Here is a side benefit of the multidimensional sieve in Maynard’s paper: it gives a new upper bound on the number of $n \leq x$ such that $n+h_1,\ldots,n+h_k$ are all prime, i.e. the number of fully prime k-tuplets. The standard sieves (Selberg sieve, large sieve, beta sieve) all give an upper bound of $(Ck)^k {\frak G} \frac{x}{\log^k x}$ for this quantity for some absolute constant C and sufficiently large x, as can be seen by looking up Friedlander-Iwaniec’s book; here ${\frak G}$ is the singular series, so this bound is basically $(Ck)^k$ worse than what the prime tuples conjecture predicts. (The large sieve and Selberg sieve give a slightly better value for C here than the beta sieve.) If one inserts the multidimensional sieve instead, one can improve this to $(\frac{Ck}{\log k})^k {\frak G} \frac{x}{\log^k x}$ (in fact one gets this bound whenever one sieves k residue classes modulo p for all large primes p from an interval of length x). Basically, the problem is now to minimise the ratio

$\displaystyle \frac{\int_{{\cal R}_k} F_{1\dots k}^2}{|F(0,\ldots,0)|^2}$

rather than maximise the ratio

$\displaystyle \frac{ \sum_{i=1}^k \int_{{\cal R}_{k-1}} F_{1 \dots i-1,i+1 \dots k}(t_1,\ldots,t_{i-1},0,t_{i+1},\ldots,t_k)^2}{\int_{{\cal R}_k} F_{1\dots k}^2}$

but the two variational problems are comparable in much the same way that the root test and ratio test are comparable in undergraduate analysis.

In the converse direction, if one can find another sieve that gives a bound of $o(k)^k {\frak G} \frac{x}{\log^k x}$ for the number of fully prime k-tuplets, then this sieve is likely to also be able to obtain bounded gaps between primes, or to find short intervals with many primes in them.

3 December, 2013 at 7:18 am

Terence Tao

Oops, turns out I made an arithmetic error :(. The sieve in Maynard’s paper actually gives the same bound $(Ck)^k$ as existing sieves, and this is the limit of the Selberg sieve method (with $C = \frac{2}{e\theta}$ , $\theta$ being the level of distribution), basically because the volume of the simplex is $1/k!$ . As with the root and ratio tests, one only has one direction of the implication: a sieve that can beat the $(Ck)^k$ bound for the k-tuple problem will also likely give many primes in short intervals, but the converse is not necessarily true.

Still it looks like an interesting problem to get $o(k)^k$ for the k-tuple problem. In the converse direction, the parity problem obstruction suggests a lower bound of $2^k$ .

3 December, 2013 at 7:29 am

Andre

Is there still a need for your new simplified minimization problem?

4 December, 2013 at 2:26 pm

Anonymous

I was asking, because I think I have found polynomials $P_{k,m}$ of degree $m>0$ with
$\int_{{\mathcal R}_k}P_{k,m}^2 = 2m^2 \prod_{l=-1}^k (2m+l)^{-1}$
and $P_{k,m}(0) = 1$ ,
which would have given better bounds than the trivial $F := 1$ .

4 December, 2013 at 10:18 pm

Terence Tao

I should have mentioned that in the simplified variational problem, F has to vanish to sufficiently high order on the boundary of the simplex to be admissible. With that assumption, one obtains from the fundamental theorem of calculus that $F(0) = \int_{{\cal R}_k} F_{1\dots k}$ , and so the Cauchy-Schwarz inequality tells us that the extremum occurs when $F_{1\ldots k} = 1$ on the simplex (strictly speaking, one has to smoothly truncate this near the boundary of the simplex, but the error incurred from this can be made arbitrarily small), so the infimal value of this ratio is the reciprocal of the volume of the simplex, i.e. $k!$ .
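The Cauchy-Schwarz step can be checked exactly for simple test functions. The Python sketch below uses the Dirichlet integral $\int_{{\cal R}_k} \prod_i t_i^{a_i} = \prod_i a_i! / (k + \sum_i a_i)!$ with exact rational arithmetic. It takes hypothetical test functions $h = 1 + c\,t_1$ in the role of $F_{1\dots k}$, so that $\int h$ plays the role of $F(0)$. The ratio $\int h^2 / (\int h)^2$ then equals $k!$ when $c = 0$ and exceeds it otherwise.

```python
from fractions import Fraction
from math import factorial


def simplex_monomial_integral(exponents, k):
    """Exact integral over R_k = {t_i >= 0, t_1 + ... + t_k <= 1} of
    prod_i t_i^{a_i} (Dirichlet): prod_i a_i! / (k + sum_i a_i)!.
    Variables not listed in `exponents` have exponent 0."""
    num = 1
    for a in exponents:
        num *= factorial(a)
    return Fraction(num, factorial(k + sum(exponents)))


def ratio_for_affine(k, c):
    """(int h^2) / (int h)^2 over R_k for the test function h = 1 + c*t_1.
    Cauchy-Schwarz gives ratio >= 1/vol(R_k) = k!, with equality iff c = 0."""
    def moment(a):  # int over R_k of t_1^a
        return simplex_monomial_integral([a], k)
    int_h = moment(0) + c * moment(1)
    int_h2 = moment(0) + 2 * c * moment(1) + c * c * moment(2)
    return int_h2 / int_h ** 2
```

For instance, `ratio_for_affine(4, 3)` is `Fraction(105, 4)`, that is 26.25, which exceeds 4! = 24.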

4 December, 2013 at 4:13 am

The new Prime gap | On the learning curve...

[…] discussed within the community. Terrance Tao again has this nicely articulated and maintains a polymath page for us. As an onlooker, I am as excited as many others to see this […]

4 December, 2013 at 4:48 pm

Pace Nielsen

I thought I would give a quick update on some recent progress in optimizing the methods described in James’ preprint. (As these ideas are already implicit in James’ paper, I’ve told him he is free to use these bounds in the final version of his paper, so they are not properly part of the polymath8b project, but they still might be of interest to people here.)

As usual, let $k$ be fixed. Let $a_1\geq a_2 \geq \ldots \geq a_k\geq 0$ be a sequence of integers. We say that $\alpha=(a_1,a_2,\ldots, a_k)$ is the signature for the monomial symmetric polynomial $m(\alpha)=\sum_{\sigma \in S_k}x_{\sigma(1)}^{a_1}x_{\sigma(2)}^{a_2}\cdots x_{\sigma(k)}^{a_k}$ , and that $d=\sum_{i=1}^{k}a_i$ is the degree. The monomial symmetric polynomials form a vector space basis for the space of symmetric polynomials, and James has a good way of integrating these polynomials over $\mathcal{R}_k$ , with a formula already appearing in his preprint.
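For readers who want to experiment, here is one way to carry out such integrations exactly. This sketch rests on the standard Dirichlet integral $\int_{{\cal R}_k} (1-P_1)^b \prod_i t_i^{a_i} = b!\prod_i a_i!/(k+b+\sum_i a_i)!$, with $P_1 = t_1 + \ldots + t_k$. We assume, but have not checked, that it agrees with the formula in James's preprint. With $m(\alpha)$ defined as a sum over all of $S_k$ as above, each of the $k!$ terms has the same integral. A normalisation that lists each distinct monomial only once differs by a constant factor for each signature.

```python
from fractions import Fraction
from math import factorial


def shifted_monomial_integral(exponents, b, k):
    """Exact integral over R_k of (1 - P_1)^b * prod_i t_i^{a_i}, where
    P_1 = t_1 + ... + t_k, via b! * prod_i a_i! / (k + b + sum_i a_i)!."""
    num = factorial(b)
    for a in exponents:
        num *= factorial(a)
    return Fraction(num, factorial(k + b + sum(exponents)))


def integral_m_alpha(signature, k, b=0):
    """Integral over R_k of (1 - P_1)^b * m(alpha), with m(alpha) the sum over
    all sigma in S_k: each of the k! terms integrates to the same value."""
    if len(signature) > k:
        raise ValueError("a signature has at most k parts")
    return factorial(k) * shifted_monomial_integral(signature, b, k)
```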

So, I wrote a program to search for an optimal solution using such polynomials, up to a given degree. It turns out that there are significant gains over previous computations: for degree $d=10$ one can get $k=74$, and this is optimal for that degree. Increasing the degree improves $k$ further, but because of the way I coded the program it runs into RAM issues for $d=11$. [It has problems computing the permutations of the set $\{(1,0),(1,0),\ldots, (1,0),(0,1),(0,1),\ldots,(0,1)\}$ (where $(1,0)$ and $(0,1)$ each occur exactly 11 times).]

To overcome this, I started doing what James does in his paper, working with polynomials of the form $\{m(\alpha) : {\rm deg}(\alpha)\leq 10\} \cup \{(1-P_1)^a m(\alpha) : {\rm deg}(\alpha)=10, 0\leq a\leq a_0\}$. Taking $a_0=3$ allows one to get down to $k=64$.

We are hopeful that eventually these computations will get the optimal $k$ , but there is still a lot of optimizing to do. Currently, taking $a_0=5$ , the computation takes about 1 day for a single value of $k$ .

James and I have some ideas about how to fix the RAM issue and so further improvements are forthcoming. [For example, one option is to restrict to signatures $\alpha$ for which the number of nonzero terms is at most 10.] Anyone interested in running the Mathematica code on their own computer can email me. (Running it shouldn’t be a problem, but I didn’t make many comments in the code, so it may take a bit of work to understand it.)

4 December, 2013 at 10:33 pm

Terence Tao

Very nice to hear! For comparison, the upper bound $M_k \leq \frac{k}{k-1} \log k$ shows that k should not be able to go below 50, and for k=64, we have $M_k \leq 4.22$ (we need $M_k > 4$ to win here on Bombieri-Vinogradov). So things are pretty consistent here (and the upper bound is looking remarkably sharp, all things considered – this may suggest some sort of clue as to what the extremising F are behaving like, and possibly to suggest a better theoretical ansatz for F than the one we are currently using (a truncated tensor product) that has only managed to get down to k=339 thus far).
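The numerology in the first sentence can be reproduced directly from the upper bound $M_k \leq \frac{k}{k-1}\log k$. The short Python sketch below finds the smallest k for which this upper bound exceeds 4, the threshold needed on Bombieri-Vinogradov. It does not compute $M_k$ itself.

```python
import math


def mk_upper_bound(k):
    """The upper bound M_k <= k/(k-1) * log k quoted above (k >= 2)."""
    return k / (k - 1) * math.log(k)


def smallest_k_exceeding(target=4.0):
    """Smallest k >= 2 whose upper bound exceeds target; below it, M_k > target
    is impossible. (The bound is increasing in k, so the first hit is the answer.)"""
    k = 2
    while mk_upper_bound(k) <= target:
        k += 1
    return k
```

This gives 51 for the threshold 4, and `mk_upper_bound(64)` is about 4.2249.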

One might hope that the computations get a little bit faster as k drops. Hopefully it becomes small enough that we can begin computing the effect of truncating the simplex a bit to use MPZ (and also to enlarge the simplex to the slightly larger convex body that is available, and perhaps also to try to use the upper bounds on $P_{ij}$ to shave a little bit more off of H).

5 December, 2013 at 4:41 am

Eytan Paldi

For $m=2$ , this implies that under EH the best $k$ under the current method is in $[51, 64]$ .

5 December, 2013 at 7:24 am

Pace Nielsen

Currently, the dependence of the timing of the computation on the value of $k$ is minimal–it just changes the coefficients a bit but the number of coefficients is unchanged. Eventually $k$ would have an effect, by limiting the lengths of signatures, but as I suggested above, we may need to artificially impose a limit on such lengths to make the computations feasible. (And for $k=51$ this effect would only naturally start taking place when $d=25$ or so.)

I too have thought about whether or not these computations would give us a better idea where to look for the optimal $F$ . I’ll try to look for a pattern in the structure of the optimal $F$ for a given degree as $d$ increases, and report back if I see anything.

5 December, 2013 at 7:44 am

Aubrey de Grey

Pace, could you post a list of the optimal F you’ve found for each degree up to 10? I think it would be fun for everyone to look for such a pattern.

5 December, 2013 at 12:19 pm

Pace Nielsen

Aubrey,

How would you like the data? Would the coefficients of the monomial symmetric polynomials $m(\alpha)$ work for you?

5 December, 2013 at 12:34 pm

Aubrey de Grey

Well, I’m not equipped to make nearly so much use of the data as others here, so I’d prefer to defer to the group as to format. I guess I was assuming that you would essentially emulate equation 7.18 from James’s paper, but please use whatever format seems most convenient and we can all complain as needed :-)

5 December, 2013 at 9:26 am

Eytan Paldi

For a given polynomial $F$, let $F_j := \mathcal L ^j F$. My suggestion is to use $F_1, \ldots, F_N$ (for some N) as a new basis (or add them to the current basis).

4 December, 2013 at 10:36 pm

Childman

Since it may get increasingly computationally intensive to get lower values of k, it may be a good idea to rewrite the optimization code for some kind of grid or distributed computing.

4 December, 2013 at 6:38 pm

Fan

For $k = 64$ , we have a bound $H = 330$ .

5 December, 2013 at 12:58 pm

Pace Nielsen

As suggested by Aubrey and Roger Baker here at BYU, I’ve now looked at the coefficients of the maximizing function $F$ of a given degree. I have not seen any pattern, but that may be due to a number of different factors. Here is a partial list of the data. For the full list just email me and I’ll send you the file.

Throughout, take $k=59$ (which gives gap sizes of $H=300$ , so it would be really nice to reach this point). First, let me list the lower bounds on $M_k$ for each degree.

For $d=1$ , $M_k \geq 2.8560278...$ .
For $d=2$ , $M_k \geq 3.1819534...$ .
For $d=3$ , $M_k \geq 3.3686854...$ .
For $d=4$ , $M_k \geq 3.5046840...$ .
For $d=5$ , $M_k \geq 3.6092828...$ .
For $d=6$ , $M_k \geq 3.6922649...$ .
For $d=7$ , $M_k \geq 3.7596139...$ .
For $d=8$ , $M_k \geq 3.8147156...$ .
For $d=9$ , $M_k \geq 3.8604234...$ .
For $d=10$ , $M_k \geq 3.8984021...$ .

Any idea if this sequence is converging below 4?

To give the coefficients of $F$ , we first need a way to order the signatures. First, to save space we can drop any zeros from the end of a signature, so the signatures correspond to the partitions of the degree. I then order the signatures first according to total degree, and then according to how Mathematica orders the partitions (which is by taking the maximal $a_1$ , then the maximal $a_2$ etc…).

So, my signatures start out as: $\{\}, \{1\},\{2\},\{1,1\},\{3\},\{2,1\},\{1,1,1\},\ldots$ .
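The ordering of signatures can be reproduced with a few lines of Python. The sketch below is our own code, not Pace's Mathematica program. It lists partitions with the largest $a_1$ first, then the largest $a_2$, and so on, grouped by degree and capped at $k$ parts. The counts 2, 4 and 7 for $d = 1, 2, 3$ match the numbers of coefficients listed below.

```python
def partitions_desc(n, max_part=None):
    """Partitions of n as non-increasing tuples: largest a_1 first, then
    largest a_2, etc. (reverse lexicographic order)."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions_desc(n - first, first):
            yield (first,) + rest


def signatures(max_degree, k):
    """All signatures of degree <= max_degree with at most k nonzero parts,
    ordered by total degree, then reverse-lexicographically."""
    out = []
    for d in range(max_degree + 1):
        out.extend(p for p in partitions_desc(d) if len(p) <= k)
    return out
```

For $k = 59$ and $d = 10$ this yields 139 signatures.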

For $d=1$ , the coefficients are $0.7074087...,-0.7068046...$ .
For $d=2$ , the coefficients are $0.3116618...,-0.6281882...,0.3290948...,0.6324104...$ .
For $d=3$ , the coefficients are $-0.0932515...,0.2857225...,-0.3188407...,-0.5819451...,0.1238405...,0.3236689...,0.5909730...$ .

The entries continue to decrease as $d$ increases.

5 December, 2013 at 1:00 pm

Pace Nielsen

(Of course, the above was also suggested by Terry.)

When I said that the entries decrease, I meant in absolute value. They continue to change signs.

5 December, 2013 at 1:21 pm

Andre

perhaps with some more digits, the inverse symbolic calculator might help
http://oldweb.cecm.sfu.ca/projects/ISC/ISCmain.html

5 December, 2013 at 1:57 pm

Aubrey de Grey

Thanks!

I doubt, tentatively, that the sequence M_k[d] which you give converges below 4. By simplistic inspection, I notice that the ratio of (M_k[d+1] – M_k[d]) to (M_k[d+2] – M_k[d+1]) decreases monotonically with increasing d, being around 1.2 for the largest d you list; this makes it look highly unlikely that M_k[15] < 4. However, there is a disconcerting "wobble" in the sequence, in that the last few values of this ratio decrease by 0.029, 0.008, 0.017 and 0.003, inviting apprehension that it might start to increase again, conceivably sharply enough to allow M_k[d] to converge below 4. Let's hope you can push your code that far! (Though, actually the wobble would almost entirely go away if M_k[7] were actually around 3.761 rather than 3.759 – worth a check?)

5 December, 2013 at 2:27 pm

Eytan Paldi

The ratios of the last four consecutive differences are quite stable:
$0.8309..., 0.8295..., 0.8181..., 0.8116...$ – indicating linear convergence (with a rate of about 0.8) to a limit of about $4.06$ .
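Here is a minimal sketch of this kind of extrapolation. It assumes the lower bounds behave like $M_{59}[d] \approx L - C r^d$. The ratio of consecutive differences estimates $r$, and summing the remaining geometric tail estimates $L$. When $r$ is taken from the last ratio, this is Aitken's $\Delta^2$ extrapolation. The actual $M_{59}[d]$ values are not repeated here. The example therefore uses a synthetic sequence with $L = 4.06$ and $r = 0.8$, chosen only to mirror the figures above. The helper names are hypothetical.

```python
def difference_ratios(values):
    """Ratios (s[i+2] - s[i+1]) / (s[i+1] - s[i]) of consecutive differences."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return [d2 / d1 for d1, d2 in zip(diffs, diffs[1:])]


def geometric_limit(values, rate=None):
    """Estimate lim s[n], assuming s[n] = L - C * rate**n (linear convergence).

    If rate is None, the last ratio of consecutive differences is used.
    The unseen tail is then delta * (rate + rate**2 + ...) = delta * rate / (1 - rate),
    where delta is the last computed difference.
    """
    if rate is None:
        rate = difference_ratios(values)[-1]
    delta = values[-1] - values[-2]
    return values[-1] + delta * rate / (1 - rate)


# Synthetic data only: limit 4.06, rate 0.8 (not the real M_59[d] values).
synthetic = [4.06 - 0.8 ** d for d in range(1, 8)]
print([round(r, 4) for r in difference_ratios(synthetic)])
print(round(geometric_limit(synthetic), 6))
```

On real data the ratios drift, as in the four values above. The estimate then depends on which ratio is used, so it is worth comparing the last few.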

5 December, 2013 at 2:34 pm

Aubrey de Grey

Yes indeed – my numbers are of course just the reciprocals of yours. The wobble is still evident (differences of 0.014, 0.114, 0.065). I’m still betting Pace will update M_k[7]!

5 December, 2013 at 2:46 pm

Aubrey de Grey

I meant 0.0014, 0.0114, 0.0065 of course.

5 December, 2013 at 4:08 pm

Pace Nielsen

I double-checked it, and the value I gave for M_k[7] is correct (and will not improve, unless there is a bug in my program).

5 December, 2013 at 5:12 pm

Aubrey de Grey

Thanks. I actually made a decimal-place error in my own calculations, and the M_59[7] that I should have said I was expecting was actually 3.7594 – but if the wobble is real, so be it!

5 December, 2013 at 3:30 pm

Eytan Paldi

Under this linear-convergence hypothesis for the lower bounds, it seems that the minimal $d$ needed to conclude that $M_{59} > 4$ is $d = 15$ (or $d = 16$, allowing for small extrapolation errors).
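The same geometric model can be pushed forward to ask at which degree the projected bound first exceeds 4. The sketch below is hypothetical. It is exercised on a synthetic sequence (limit 4.06, rate 0.8) rather than the actual $M_{59}[d]$ data, so the degree it returns is not a prediction for $M_{59}$.

```python
def first_degree_exceeding(values, last_degree, threshold=4.0, rate=None, max_steps=200):
    """Return the first degree d at which the projected value exceeds threshold.

    values[-1] is the value at last_degree. Later values are projected by
    assuming each new difference is `rate` times the previous one (rate
    defaults to the last observed ratio). Returns None if the projection
    never crosses threshold within max_steps.
    """
    diffs = [b - a for a, b in zip(values, values[1:])]
    if rate is None:
        rate = diffs[-1] / diffs[-2]
    value, delta, degree = values[-1], diffs[-1], last_degree
    for _ in range(max_steps + 1):
        if value > threshold:
            return degree
        delta *= rate   # next difference under the geometric model
        value += delta
        degree += 1
    return None


synthetic = [4.06 - 0.8 ** d for d in range(1, 8)]  # degrees 1..7, synthetic
print(first_degree_exceeding(synthetic, last_degree=7))
```

Because the projected limit sits only slightly above 4, small changes in the assumed rate can shift the crossing degree by one or more. This is the kind of sensitivity the "(or $d = 16$)" caveat allows for.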

5 December, 2013 at 4:12 pm

Pace Nielsen

Just to give one more data point: Using the notation in my earlier post, and taking $k=59,d=10,a_0=5$ yields the bound $M_k\geq 3.95608...$ . (One can view this as an approximate $d=15$ computation.) So it looks like there can be a big difference between using all symmetric polynomials of a given degree, and just using those of the form $(1-P_1)^a m(\alpha)$ for $\alpha$ of medium degree.

6 December, 2013 at 1:14 pm

Eytan Paldi

Assuming (as described above) that $M_{59}$ is about $4.06$, and (from the current upper and lower bounds on $M_k$) that $M_k - M_{k-1}$ is about $1/k$, it seems that the minimal $k$ for which $M_k > 4$ is $k = 56$.
Therefore, I suggest concentrating in particular on $k = 55, 56$.
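This arithmetic can be checked with a few lines of Python. The sketch covers the heuristic only. It takes $M_{59} \approx 4.06$ and $M_{k-1} \approx M_k - 1/k$ as given and steps down in $k$. The function name is hypothetical.

```python
def minimal_k(m_ref=4.06, k_ref=59, threshold=4.0):
    """Smallest k whose heuristic estimate of M_k still exceeds threshold.

    Starts from M_{k_ref} ~= m_ref and steps down with M_{k-1} ~= M_k - 1/k.
    Returns (k, estimated M_k), or None if m_ref itself is not above threshold.
    This is an extrapolation heuristic, not a bound.
    """
    if m_ref <= threshold:
        return None
    k, m = k_ref, m_ref
    while k > 1 and m - 1 / k > threshold:
        m -= 1 / k
        k -= 1
    return k, m


print(minimal_k())
```

With the default inputs it returns $k = 56$ with an estimate of about $4.008$. One further step gives about $3.990$ for $k = 55$, so both values sit close to the threshold.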

6 December, 2013 at 1:23 pm

Aubrey de Grey

That makes sense – but while there remains a major issue around CPU time, and especially while our speculations as to the sequence for d>10 remain so tentative, I can also see a case for working down from k=64 one step at a time. There is also, of course, the consideration that at any moment Terry or others may chime in with a rigorisation of MPZ in the small-k setting that renders all this BV-context work largely peripheral.

Additionally, I hope people haven’t forgotten too much about m > 1. I guess xfxie has been busy elsewhere lately, but it would be great (especially for Drew!) if there could be an update on the best obtainable k for m=2 arising from Terry’s progress in exploiting the exponential moment method.

5 December, 2013 at 2:16 pm

Anonymous

First pair looks like $\pm 1/\sqrt{2}$. Is it?

5 December, 2013 at 4:12 pm

Pace Nielsen

No. They are accurate to the decimals I gave, so they are not $\pm 1/\sqrt{2}$.

6 December, 2013 at 8:18 am

Aubrey de Grey

Hm, but they differ from +/- 1/sqrt(2) by very nearly exactly the same amount – the average of their absolute values is 0.70710665, and 1/sqrt(2) is 0.70710678. Can that really be a coincidence?

9 December, 2013 at 7:20 am

Pace Nielsen

I just realized something silly.

The switching of signs, the decreasing of absolute values, and the pair above looking like $\pm 1/\sqrt{2}$ are all artifacts of Mathematica normalizing the optimal $F$.

Recall that $F$ is really only determined up to a constant multiple. What we should have been doing this whole time is normalizing so that the constant term is exactly 1. Sorry for not realizing this earlier.
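As a quick consistency check (not part of the original computation), each posted coefficient vector appears to have Euclidean norm 1. This fits the explanation that Mathematica returned $F$ scaled to unit length. For the $d=1$ vector, $a^2 + b^2 = 1$ with $|a| \approx |b|$ forces both entries close to $1/\sqrt{2}$, which would account for the coincidence noted above. The sketch re-enters the truncated values posted earlier and rescales each vector so that the coefficient of the empty signature $\{\}$ is exactly 1. The names are hypothetical.

```python
import math

# Coefficients as posted for d = 1, 2, 3 (truncated to 7 decimals),
# in the signature order {}, {1}, {2}, {1,1}, {3}, {2,1}, {1,1,1}.
COEFFS = {
    1: [0.7074087, -0.7068046],
    2: [0.3116618, -0.6281882, 0.3290948, 0.6324104],
    3: [-0.0932515, 0.2857225, -0.3188407, -0.5819451,
        0.1238405, 0.3236689, 0.5909730],
}


def euclidean_norm(coeffs):
    """Square root of the sum of squared coefficients."""
    return math.sqrt(sum(c * c for c in coeffs))


def normalize_constant_term(coeffs):
    """Rescale so the coefficient of the empty signature {} is exactly 1."""
    c0 = coeffs[0]
    return [c / c0 for c in coeffs]


for d, cs in COEFFS.items():
    print(d, round(euclidean_norm(cs), 5),
          [round(c, 4) for c in normalize_constant_term(cs)])
```

For $d=1$ the rescaled vector is approximately $(1, -0.99915)$.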

9 December, 2013 at 8:17 am

Aubrey de Grey

Ha! OK: can you post revised values?

5 December, 2013 at 3:01 pm

Aubrey de Grey

Pace: it might be instructive if we could do this kind of intuitive analysis for smaller k, not least to see what sort of value of d (in other words, how much work in optimising your code) seems likely to be needed to determine what is the truly minimal k obtainable using current ideas. Do you yet have an M_k[d] sequence for k = 58, 57 etc?

5 December, 2013 at 3:16 pm

Aubrey de Grey

PS: Naturally I’m also thinking about k < 51, in anticipation of progress in rigorising the post-Maynard validity of MPZ (and thereby lowering the required M_k below 4).

Also: I'm unclear as to the impact of raising a_0. You mentioned earlier that you obtained k=64 using a_0=3, but also that you are currently experimenting with a_0=5. Can you give us a sense of what you're seeing in terms of the dependence of k on a_0 (rather than just d)? The magnitude of that impact would somewhat inform whether it is worth taking the time to derive a set of M_k sequences for k = 58, 57 … for a_0 = 3 (or 4) in order to enable the above analysis.

5 December, 2013 at 4:21 pm

Pace Nielsen

I’m seeing the following: Increasing $a_0$ by 1 doubles the run-time, but also seems to decrease $k$ by one or two (in the short term). So taking $a_0=4$ should let us get $k=63$ or $62$.

On the other hand, increasing $d$ by $1$ multiplies the run-time in a quadratic fashion. So $d=6$ takes 8 seconds, $d=7$ takes 21 seconds, $d=8$ takes 110 seconds, $d=9$ takes 14 minutes, and $d=10$ takes a little over 2 hours. I expect $d=11$ to take about a day, etc…

When I get some time, I’ll try to derive the sequence $M_{59}$ for different $a_0$ . Currently, my computer processor is busy doing a few other computations.
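The step-to-step growth in the reported run-times can be tabulated directly. In this sketch the $d=10$ entry is taken as exactly 2 hours, whereas the post says a little over 2 hours, so its factor is slightly understated. The names are hypothetical.

```python
# Reported run-times in seconds; d = 10 is rounded down to 2 hours (assumption).
RUNTIMES = {6: 8, 7: 21, 8: 110, 9: 14 * 60, 10: 2 * 3600}


def growth_factors(runtimes):
    """Return {d: T(d) / T(d-1)} for consecutive degrees d."""
    ds = sorted(runtimes)
    return {d: runtimes[d] / runtimes[prev] for prev, d in zip(ds, ds[1:])}


for d, f in growth_factors(RUNTIMES).items():
    print(f"d={d}: x{f:.1f}")

# Step factor that would take d = 10 (about 2 h) to about one day.
print(f"one day / T(10) = x{24 * 3600 / RUNTIMES[10]:.1f}")
```

The factors come out at roughly 2.6, 5.2, 7.6 and 8.6, so they themselves grow with $d$. A one-day run for $d=11$ would correspond to a factor of about 12.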

6 December, 2013 at 1:28 pm

Aubrey de Grey

Pace: could you please re-post that sequence for d=3? – it is appearing truncated after the first five values, and I now have a feeling that the last two could be quite informative. Thanks!

6 December, 2013 at 2:43 pm

Pace Nielsen

The last few are -0.5819451…,0.1238405…,0.3236689…,0.5909730….

(You can also see them by scrolling over the LaTeX.)

6 December, 2013 at 3:43 pm

Aubrey de Grey

Many thanks. (The scroll-over doesn’t work for me – browser issue I guess.) I think I’m ready for the complete file – please send it to aubrey@sens.org – thanks!

8 December, 2013 at 10:22 am

Phil Dee

So what are the tentative values for H_1 at the moment?

8 December, 2013 at 4:47 pm

Polymath8b, III: Numerical optimisation of the variational problem, and a search for new sieves | What's new

[…] for small values of ${m}$ (in particular ${m=1,2}$ ) or asymptotically as ${m \rightarrow \infty}$ . The previous thread may be found here. The currently best known bounds on […]


129 comments
