This is the second thread for the Polymath8b project to obtain new bounds for the quantity

either for small values of (in particular ) or asymptotically as . The previous thread may be found here. The currently best known bounds on are:

- (Maynard) .
- (Polymath8b, tentative) .
- (Polymath8b, tentative) for sufficiently large .
- (Maynard) Assuming the Elliott-Halberstam conjecture, , , and .

Following the strategy of Maynard, the bounds on proceed by combining four ingredients:

- Distribution estimates or for the primes (or related objects);
- Bounds for the minimal diameter of an admissible -tuple;
- Lower bounds for the optimal value to a certain variational problem;
- Sieve-theoretic arguments to convert the previous three ingredients into a bound on .

Accordingly, the most natural routes to improve the bounds on are to improve one or more of the above four ingredients.

Ingredient 1 was studied intensively in Polymath8a. The following results are known or conjectured (see the Polymath8a paper for notation and proofs):

- (Bombieri-Vinogradov) is true for all .
- (Polymath8a) is true for .
- (Polymath8a, tentative) is true for .
- (Elliott-Halberstam conjecture) is true for all .

Ingredient 2 was also studied intensively in Polymath8a, and is more or less a solved problem for the values of of interest (with exact values of for , and quite good upper bounds for for , available at this page). So the main focus currently is on improving Ingredients 3 and 4.

For Ingredient 3, the basic variational problem is to understand the quantity

for bounded measurable functions, not identically zero, on the simplex

with being the quadratic forms

and

Equivalently, one has

where is the positive semi-definite bounded self-adjoint operator

so is the operator norm of . Another interpretation of is that the probability that a rook moving randomly in the unit cube stays in simplex for moves is asymptotically .

We now have a fairly good asymptotic understanding of , with the bounds

holding for sufficiently large . There is however still room to tighten the bounds on for small ; I’ll summarise some of the ideas discussed so far below the fold.

For Ingredient 4, the basic tool is this:

Thus, for instance, it is known that and , and this together with the Bombieri-Vinogradov inequality gives . This result is proven in Maynard’s paper and an alternate proof is also given in the previous blog post.

We have a number of ways to relax the hypotheses of this result, which we also summarise below the fold.

** — 1. Improved sieving — **

A direct modification of the proof of Theorem 1 also shows:

Here is defined for the truncated simplex in the obvious fashion. This allows us to use the MPZ-type bounds obtained in Polymath8a, at the cost of requiring the test functions to have somewhat truncated support. Fortunately, in the large setting, the functions we were using had such a truncated support anyway. It looks likely that we can replace the cube by significantly larger regions by using the (multiple) dense divisibility versions of , but we have not yet looked into this.

It also appears that if one generalises the Elliott-Halberstam conjecture to also encompass more general Dirichlet convolutions than the von Mangoldt function (see e.g. Conjecture 1 of Bombieri-Friedlander-Iwaniec), then one can enlarge the simplex in Theorem 1 (and probably for Theorem 2 also) to the slightly larger region

Basically, the reason for this is that the restriction to the simplex (as opposed to ) is only needed to control the sum , but by splitting into products of simpler divisor sums, and using the Elliott-Halberstam hypothesis to control one of the factors, it looks like one can still control error terms in the larger region (but this will have to be checked at some point, if we end up using this refinement). This is only likely to give a slight improvement, except when is small; from the inclusions

and a scaling argument we see that

Assume EH. To improve the bound to , it suffices to obtain a bound of the form

where

and

With given in terms of a cutoff function , the left-hand side can be computed as usual as

while we have the upper bound

and other bounds may be possible. (This is discussed in this comment.)

For higher , it appears that similar maneuvers will have a relatively modest impact, perhaps shaving or so off of the current values of .

** — 2. Upper bound on — **

We have the upper bound

for any . To see this, observe from Cauchy-Schwarz that

The final factor is , and so

Summing in and noting that on the simplex we have

and the claim follows.

Setting , we conclude that

for sufficiently large . There may be some room to improve these bounds a bit further.

** — 3. Lower bounds on — **

For small , one can optimise the quadratic form

by specialising to a finite-dimensional space and then performing the appropriate linear algebra. It is known that we may restrict without loss of generality to symmetric ; one could in principle also restrict to the functions of the form

for some symmetric function (indeed, morally at least should be an eigenfunction of ), although we have not been able to take much advantage of this yet.

For large , we can use the bounds

for any and any with ; we can also start with a given and improve it by replacing it with (normalising in if desired), and perhaps even iterating and accelerating this process.

The basic functions we have been using take the form

where

and

Then , and

where are iid random variables on with density . By Chebyshev’s inequality we then have

if , where

and

A lengthier computation for gives

assuming .

## 129 comments

Comments feed for this article

22 November, 2013 at 7:51 pm

Terence TaoIt might be time to start collecting upper and lower bounds for for small k (say, up to k=105) to get a sense of how much room there is for improvement here, and what one can hope to achieve from the MPZ estimates (or from using fancier symmetric functions). The fact that is now known to not grow any faster than is a little disappointing for the large theory, but perhaps there is yet another way to modify the sieve to break this barrier (much as the GPY sieve, for which M_k never exceeded 4, was modified into the current sieve introduced by James). The only “hard” obstruction I know of is the parity obstruction, which basically stems from the belief that is negligible for just about any sieve under consideration, which means in probabilistic terms that

and in particular

This translates to the trivial bound in the current context, which is well short of the truth; it seems there is a substantial gap between “having an odd number of prime factors” and “is prime” when is working with k-tuple sieves.

22 November, 2013 at 9:24 pm

Pace NielsenI thought I’d play around with your heuristics for .

If we plug in Maynard’s function

(where are the first two power-symmetric polynomials) into the upper-bound for (and dropping little-o terms) we get that . We needed to beat to force down one.

I imagine that there are better choices for in this context, and I’ll start thinking about that.

22 November, 2013 at 10:11 pm

Terence TaoOne can presumably get some gain by enlarging the support of F from the simplex to the larger (but more complicated) region . (The cutoff in the formula for would be similarly modified.) One would have to do a certain amount of fiddly numerics though to integrate on this more messy polytope.

Also, we’ve broken the symmetry a bit, and it’s no longer necessarily optimal to use F that are symmetric in all five variables; instead, it should just be symmetric in and in . But perhaps we get a bit more flexibility with this broken symmetry.

23 November, 2013 at 8:07 am

Terence TaoIn a previous comment, James indicated that if we simply inserted our best confirmed value into the low k machinery, ignoring delta, we could potentially reduce k from 105 to 68 (and presumably we do a little better than this if we use the tentative value ). The main thing blocking us from making this a rigorous argument is that in order to use , we have to use test functions F that are not only restricted to the simplex, but also to the cube . This is not a serious problem for the large k argument (because the test functions were already by design supported on the cube ) but is so for the low k argument (because we are instead using polynomials of the symmetric polynomials P_1 and P_2.

However, it could be, as was the experience in Zhang and Polymath8a, that the contributions outside of this cube are exponentially small and thus hopefully negligible (although we have to be careful because our values of k are now small enough that “exponentially small” does not _automatically_ imply negligibility any longer). Basically we could follow arguments similar to that in Section 4 of the Polymath8a paper, writing the truncated F as where is untruncated and is the portion outside of the cube, and use the inequality

and upper bound the error term by controlling the contribution when a given coordinate exceeds . Here the point is that this is a small portion of the simplex; by volume, it is something like .

At present, are so small that this is actually not so negligible, but we can then pull out the other Polymath8a trick, which is to exploit dense divisibility and introduce another parameter . This allows us to enlarge the cube to something like the portion of in which (this is basically Lemma 4.12(iv) from Polymath8a), which should lead to better error terms as per Polymath8a.

23 November, 2013 at 12:50 pm

Terence TaoA little bit more elaboration of the above idea, using some of the Cauchy-Schwarz arguments from the upper bound theory.

Let be some symmetric test function on the simplex, and let be the restriction of to the cube where . Then where is the portion of outside of the cube. We then have

The last term we can bound by . Now we try to upper bound . By symmetry this is

. (1)

The integrand is only non-zero when and . By Cauchy-Schwarz we have

and

and so by the AM-GM inequality we can bound (1) by

for any , which symmetrises to

(here we use the disjoint supports of the various integrals) so on optimising in we obtain a bound of

so one just need a reasonably good bound on (compared with ) to be able to neglect these errors. As I said before, the ratio of to is likely to be something like ; this doesn’t look too good for the values of k we are considering (i.e. below 105), but maybe if we split into two deltas as before, then it won’t be so bad.

23 November, 2013 at 8:32 am

Eytan PaldiIn the above definition of , the subscript in

should be replaced by .

(with similar correction in the proof of the upper bound on .)

[Corrected, thanks – T.]23 November, 2013 at 10:29 am

Eytan PaldiThis typo still appears in several places

1. In the definition of , should be and the last in the integral should be .

2. In the proof of the upper bound on , in the first two integrals

should be .

[Corrected, thanks – T.]23 November, 2013 at 9:30 am

Eytan PaldiIn the proof of the upper bound on , it should be

“noting that “.

[Corrected, thanks – T.]23 November, 2013 at 9:58 am

Terence TaoHere is an idea that might not pan out, but I’m throwing it out there anyways.

Take k=3 for brevity. Right now, we need to control sums such as , where is a divisor sum basically of the form

We can then set and are looking at a sum of the form

We can compare the inner sum against its expected value , and are left with dealing with an error term

Now at this point, what Zhang and Polymath8a do is to take absolute values and start looking at something roughly of the shape

where came from and is something formed from the Chinese remainder theorem (and thus could potentially be quite large). But this throws away two things: firstly, that the residue class came from two residue classes which were bounded, and secondly, that the coefficients were not completely arbitrary, but instead had a somewhat multiplicative structure (basically of the form ). It may be that if we delve into the Type I/II estimates, we can exploit this extra structure to go beyond the exponents that we currently do. In particular, the coefficients that show up in Polymath8a roughly seem to have the shape (times smooth factors), and so has some smoothness in r which looks helpful (but somewhat difficult to exploit).

23 November, 2013 at 11:38 am

Aubrey de GreyAm I right in assuming that the recent “Deligne-free” values from Pace that are shown on the “world records” page are derived assuming only B-V, following James’s paper? If so, perhaps it would be of interest to do as in Polymath8a and have a look at what k0 values the Deligne-free [varpi,delta] derived on July 30th gives rise to in the post-Maynard universe?

25 November, 2013 at 8:19 am

Terence TaoThis is something we can certainly do without much difficulty once our results have stabilised, but at present the four benchmarks we have (m=1 from BV, m=1 using Polymath8a, m=2 from BV, m=2 using Polymath8a) are covering enough of the “parameter space” that any further benchmarking would probably just be confusing (although perhaps a m=3 case, say using the best Polymath8a result, might be worth looking at).

23 November, 2013 at 2:40 pm

Terence Tao’s ”What’s New”- The Polymath. | Edgar's Creative[…] Polymath8b, II: Optimising the variational problem and the sieve. […]

23 November, 2013 at 6:44 pm

Eytan PaldiThe upper and lower bounds on may be improved (in particular for small ) by generalizing the basic function (e.g. by ) and optimizing over the additional parameters (e.g. ) as well.

24 November, 2013 at 5:05 am

Eytan PaldiI don’t understand (8.9) in Maynard’s paper (the integral is divergent.)

24 November, 2013 at 5:39 am

Pace NielsenThere is a minor typo. He just dropped from the integral.

24 November, 2013 at 5:50 am

Pace NielsenTo see the equality in (8.9) after that factor is reinserted, just integrate each summand separately, and count.

By the way, I had your very question earlier. There seems to be a second typo in the line above (8.13). It says that , but I believe the RHS should be .

24 November, 2013 at 6:02 am

Eytan PaldiThanks for the explanation!

25 November, 2013 at 2:53 pm

James MaynardJust to confirm: Everything that Pace says is correct. Both of these are typos in the paper – sorry for the confusion this caused.

If anyone happens to spot similar typos I’d quite like to hear – either emailing me or mentioning on here would be appreciated!

24 November, 2013 at 6:49 pm

Terence TaoI can now modify the Cauchy-Schwarz argument to prove the clean upper bound

(1)

which saves a or so from the previous bound.

The key estimate is

(2)

Assuming this estimate, we may integrate in to conclude that

which symmetrises to

giving (1).

It remains to prove (2). By Cauchy-Schwarz, it suffices to show that

But writing , the left-hand side evaluates to

as required.

This also suggests that extremal F behave like multiples of in the variable for typical fixed choices of .

In the converse direction, I have an asymptotic bound

for sufficiently large k. Using the notation of the previous post, we have the lower bound

whenever is supported on , , and are independent random variables on [0,T] with density . We select the function

with , and for some to be chosen later. We have

and

and so

Observe that the random variables have mean

The variance may be bounded crudely by

Thus the random variable has mean at most and variance , with each variable bounded in magnitude by . By Hoefding’s inequality, this implies that is at most with probability at most for some absolute constant . If we set for a sufficiently large absolute constant , we thus have

and thus

as claimed.

Hoeffding’s bound is proven by the exponential moment method, which more generally gives the bound

for any , which is a somewhat complicated, but feasible-looking, optimisation problem in A, T, s. (Roughly speaking, one expects to take A close to , T a bit smaller than , and roughly of the order of .)

7 December, 2013 at 9:47 am

Eytan PaldiFor the upper bound is , while from (7.19) in Maynard’s paper – showing that the upper bound is surprisingly good even for .

25 November, 2013 at 5:14 am

Eytan PaldiAs an interesting application of this upper bound, we have

1. , so under EH the current method is limited to

.

2. , so Under EH[1/2] the current method is limited to .

25 November, 2013 at 8:49 am

Aubrey de GreyOr 42 under the polymath8a results, yes? (I believe the confirmed and tentative versions both give this value.) Or am I overinterpreting earlier posts concerning the derivability of M from varpi?

27 November, 2013 at 2:09 pm

Eytan PaldiUnfortunately, the particular above (used for the derivation of the upper bound) is not in (since the integration of with respect to gives – which is not integrable over ) – so this can’t be used as a basis function for lower bounds evaluation.

It should be remarked that – since it has only logarithmic singularity.

14 December, 2014 at 7:28 pm

Eytan PaldiIt seems that there is an error in the above derivation of the lower bound since (not as in the above derivation!)

[Corrected, thanks – there was a similar factor missing in the bound which cancels out the error. -T.]1 February, 2015 at 7:44 pm

Eytan PaldiThere is another error in the lower bound derivation:

It seems that Hoeffding’s inequality (which is independent of the variance) gives for the probability estimate – which is clearly too crude!

The problem in using this inequality is that the denominator in the exponent is while the (much smaller!) numerator is only .

Fortunately, by using the above variance estimate, it follows from Bennett’s inequality (or even from Bernstein’s inequality) the probability estimate which implies the (slightly weaker) lower bound

30 December, 2014 at 3:25 pm

Eytan PaldiFor the last lower bound on , it is easy to verify that the lower bound is maximized by (temporarily) fixing and maximizing the integral over under the two (integral) constraints that and are fixed. It is easy to show that the optimal function has the form for some .

Fortunately, the resulting integrals representing (needed for the lower bound) are simple elementary functions of – enabling simpler optimization of the lower bound over the parameters .

For large , it seems that the paremeters can be reparametrized by

Where are the new parameters.

(Unfortunately, it seems that asymptotically we still get the above lower bound implied by Hoeffding inequality.)

24 November, 2013 at 9:56 pm

xfxieSeems k0 can drop some, assuming the implementation is correct.

With : =339

Without using the inequalities: =385

25 November, 2013 at 10:07 am

Terence TaoTwo small thoughts:

1. The Cauchy-Schwarz argument giving upper bounds on might more generally be able to give upper bounds for the quantity

for any Selberg sieve

regardless of the choice of coefficients , assuming that the error terms are manageable (which presumably means something like where is the level of distribution) and also some mild size conditions on the coefficients (e.g. all of size ). Upper bounds are of course less directly satisfying than lower bounds (particularly for the purposes of advancing the “world records” table) but would help give a picture of what more one can expect to squeeze out of these methods. (And we also have some possible ways to evade these upper bounds, e.g. by utilising the two-point correlations as discussed previously; one could also hope to use other sieves than the Selberg sieve, though I don’t see at present what alternatives would be viable.)

2. As mentioned in Maynard’s paper, the method here would also allow one to produce small gaps in other sets than the primes, as long as one had a non-trivial level of distribution . For instance, if one replaced primes by semiprimes (products of exactly two primes) Goldston-Graham-Pintz-Yildirim obtained the bounds and unconditionally. These bounds are already quite good (semiprimes are significantly denser than primes, which is why the results here are better than those for primes) but perhaps there is further room for improvement.

25 November, 2013 at 10:48 am

Terence TaoJust did a quick calculation: using Maynard’s sieve (with T equal to, say, ) the quantity looks to be roughly on the order of (compared to in the case of primes), by counting products of two primes each of size at least . (For semiprimes with smaller factors than this, the sieve weight becomes smaller.) So this would give something like for semiprimes, and more generally for products of exactly r primes (with implied constants depending on r).

Actually, in many ways, the numerology for our current problem with primes is very similar to the GGPY numerology for semiprimes.

25 November, 2013 at 8:06 pm

Terence TaoLooks like the Cauchy-Schwarz argument does indeed block the Selberg sieve from doing much better than primes in a k-tuple, assuming that the weights are fairly uniform in magnitude, which among other things, allows one to discard the small fraction of weights in which the heuristic mentioned in Section 6 of Maynard’s paper fails. The analogue of the ratio is then (in the notation of the paper)

Cauchy-Schwarz then gives

for any (using the fact that is constrained to be coprime to ); using this and summing in (using the constraint ) shows that the ratio is of size , which for bounded choices of A gives a bound of . If one works this argument more carefully we get the same upper bounds that we do for the integral variational problem .

There is a potential loophole in using sieve weights that focus on those moduli r with a unusually large number of prime factors (so that are no longer close to each other), but this is a very sparse set of moduli, to the point where the error terms in the sieve analysis now overwhelm the main terms. Also, intuitively one should not be able to get a good sieve by using only a sparse set of the available moduli.

To me, this suggests that the main remaining hope for breaking the logarithmic barrier is to work with a wider class of sieves than the Selberg sieve. In principle there are a lot of other possible sieves to use (e.g. beta sieve or other combinatorial sieves), but numerically these sieves tend to underperform the Selberg sieve for these sorts of problems (enveloping the primes in a sieve weighted on almost primes in order to make the density of the primes as large as possible). It’s tempting to try to perturb the Selberg sieve in some combinatorial fashion, but the trick is to keep the sieve non-negative (which is crucial in our argument, unless one can somehow get excellent control on the negative portion of the sieve); once one leaves the Selberg sieve form this becomes quite non-trivial.

26 November, 2013 at 11:06 am

Terence TaoI’m now leaning towards the view that any replacement for the Selberg sieve will also encounter a logarithmic type barrier, in that the density is unlikely to be much higher than . Actually it is rather miraculous that we get the logarithmic gain at all, a naive heuristic computation (which I give below) suggests that should be the threshold.

To motivate things, suppose we start with a one-dimensional sieve , where are some coefficients for up to some height . In the Selberg sieve case this divisor sum is the square of some other divisor sum to ensure non-negativity, but we will not assume here that the sieve is of Selberg type.

The primality density of such a sieve cannot be much larger than . This is because all -rough numbers (numbers with no prime factor less than ) are assigned the same value by the sieve , and by the asymptotics for the Dickman-de Bruijn function, the primes in form a proportion of about of all the -rough numbers in .

Now we look at a multidimensional sieve , where again we do not assume the sieve to be of Selberg type, but we impose a restriction of the shape since we are unlikely to control the sieve error terms without such a hypothesis, even assuming the strongest Elliott-Halberstam type conjectures. Heuristically, this limits each of the to be typically of size , and then the one-dimensional argument suggests that the primality density should not be much larger than for each .

Now the Maynard sieve does do a bit better than this, because each factor is allowed to be a bit higher than on occasion (there is a weight roughly of the shape in the sieve), since one can rely on concentration of measure (or the law of large numbers, if you wish) to keep the product of a reasonable size . This occasional use of higher moduli seems to just barely be sufficient to diminish the sieve on some of the other -rough numbers than the primes to boost the prime density from to . But it seems unlikely that a clever choice of sieve weights could do much better than this.

26 November, 2013 at 12:51 pm

James MaynardDo you have a heuristic which explains the gain in the log, without looking too much at the sieve functions themselves?

I ask because I’d been working under the same heuristic that you mention above, and it surprised me that the modified weights were able to win by a full factor of a log. I don’t have any good heuristic explanation as to how one might be able to `see’ that these weights do this well a priori. Obviously, although logs are critical in the small-gaps argument, they are very small in comparison to , and so we shouldn’t necessarily be surprised a heuristic breaks down at this level of accuracy.

That said, I liked this heuristic because it seemed to work well with similar problems. It seems to explain quite naturally the results of GGPY/Thorne on small gaps between numbers with prime factors, results on sifting limits and results on the number of prime factors of . (Some of these are more `robust’ to a log factor than others). The `superiority’ of the Selberg sieve over other sieves for larger seemed to me just that the implied constants in the -terms were better.

26 November, 2013 at 9:17 pm

Terence TaoUnfortunately I don’t have much of an explanation for this – at a formal level it shares a lot of similarities with the logarithmic divergences that show up in the endpoint Sobolev inequalities in harmonic analysis, in that one gets contributions from logarithmically many dyadic scales (which, in this case, would be something like the contribution of moduli between and for ) but the way in which these scales interact seems rather specific to the Selberg sieve, and it is possible that some other sieve might give a different power of log (or more likely, no logarithmic gain at all).

26 November, 2013 at 3:08 am

Eytan PaldiPerhaps there is some generalization to a sieve combining translates of several admissible tuples.

25 November, 2013 at 11:55 am

Aubrey de GreyAsking the converse question: is there any N for which there are already known to be infinitely many x such that both x and x+2 are products of exactly N primes? If not, is it plausible that Maynard’s methods can be used to identify such an N? (Or if so, to reduce the known N?)

25 November, 2013 at 12:09 pm

Terence TaoNo, this is blocked by the parity problem. Given any sieve candidate , we expect and to be negligible (this is part of the “Mobius pseudorandomness heuristic”), which basically means that if is drawn at random from the distribution, then has a probability of having an even number of factors, and of having an odd number of factors. Similarly for . The upshot of this is that one cannot use purely sieve-theoretic methods to prescribe the parity of the number of prime factors of both and . But it turns out (as worked out by Bombieri) that on EH, one can basically do anything that does not run into this parity obstruction; for instance one can find infinitely many prime n such that n+2 is the product of either 2013 or 2014 primes. (This should be discussed in Friedlander-Iwaniec’s “Opera del Cribro”, though I don’t have this reference handy at present.)

Breaking the parity barrier for twins would be an absolutely huge breakthrough, even if conditional on Elliott-Halberstam. But it’s going to take a radically new idea; it’s not just a matter of tweaking the sieves.

25 November, 2013 at 11:21 pm

AnonymousI was wondering if you could elaborate on the intuition for why it will take a radically new idea (similarly, I think I remember you saying that proving or disproving $P versus NP$ will require a new idea. how do you know?) Is it just that a lot of people have tried to solve these problems using many different techniques and failed or is it something else? Thanks.

26 November, 2013 at 9:13 am

Terence TaoI discuss the parity problem further in my old blog post https://terrytao.wordpress.com/2007/06/05/open-question-the-parity-problem-in-sieve-theory/

27 November, 2013 at 12:00 am

AnonymousThanks.

25 November, 2013 at 12:43 pm

Pace NielsenThree days ago I started running some computations. Here are the results. Throughout, I will be assuming EH.

————————–

Take . Let be an arbitrary polynomial function of total degree , which is symmetric in and in separately. Working over the original simplex of Maynard, set

.

Using a numerical optimizer (so results are not necessarily optimal, but should be close), we get:

For , .

For , .

It is not difficult to increase a little more. Probably up to is feasible.

————————–

Take . Let be an arbitrary polynomial function of total degree , which is symmetric in the variables. Working over the original simplex of Maynard, and again using a numerical optimizer, we get:

For , .

For , .

For , .

This is already very close to the upper bound of recently discovered by Terry.

————————-

I then tried to do the case for the new simplex , but the numerics give weird answers. For instance, when , we get . The reason I didn’t post these computations earlier is that I’ve been trying to figure our where my error is. After two days I still have not been able to track down my error. I wrote up a LaTeX document explaining the computation, and I’m willing to email it (and the mathematica file) to anyone who is interested.

25 November, 2013 at 2:34 pm

James MaynardI reran the computations I alluded to at the beginning of the previous post.

My calculations agree with yours for the original simplex (I get the slightly better bound when .)

This is for , using the enlarged simplex . As you have done, I’m taking to be an arbitrary symmetric polynomial of degree on , and then I’m using the linear-programming method from my paper to calculate the eigenvector which corresponds to the optimal choice of coefficients of .

I then get the bounds

This was my reason for saying using the enlarged simplex gets `about half way’ to obtaining gaps of size 8: we’ve improved from a lower bound of about 0.92 to about 0.96, and we want to get it to 1.

It appears that the convergence as increases is for the enlarged simplex is noticeably slower than for the original simplex.

I can give more details of how I did the calculations if interested.

25 November, 2013 at 4:03 pm

Pace NielsenWhen I figure out where my computational error is, now I’ll know what numerics to look for. Thank you!

P.S. The reason I didn’t send those typos I mentioned above to you, is that I’m reading through your paper with Roger Baker (here at BYU) and he is compiling a list of the typos we run across. He will be sending that to you, probably in a couple of weeks.

25 November, 2013 at 7:35 pm

James MaynardOops, there was a small typo in my program. The corrected bounds I get are

and the convergence with seems good.

Summary of my calculation:

,

we can notice that if is symmetric, then is symmetric in . This means the region with will contribute of the total. In this region, it is straightforward to put exact limits on each integration variable. We get

We also get a similar expression for . Again, these are quadratic forms in the coefficients of , and so we can find the optimal choice of of given degree by calculating the largest eigenvalue of a matrix.

26 November, 2013 at 11:24 am

James MaynardBogdan: Yes, the upper bound of 0.924 was for functions whose support was restricted to the smaller simplex . By comparison, we are getting a lower bound of 0.922 in this case, so these seem to be in good agreement.

If we use a modification of the original method (such as extending the support to the slightly large simplex or incorporating the terms) then this upper bound no longer applies. In particular, the lower bound of 0.96 using the larger simplex isn’t in contradiction to the earlier bound.

26 November, 2013 at 2:14 am

BogdanI am a bit confused: the upper bound of 0.924 is reported in post for 25 November, 2013 at 5:14 am, and the conclusion was that “so under EH the current method is limited to k_0=5”, and now you get the lower bound of 0.9689! Is this because of “using the enlarged simplex”?

26 November, 2013 at 5:29 pm

Eytan PaldiThe fast convergence (using polynomials) indicates the (possible) existence of a smooth maximizer.

25 November, 2013 at 3:23 pm

AnonymousComments (from a non-mathematician who is interested in mathematics):

In Maynard’s paper, it is stated that , but in the post, it is stated that .

25 November, 2013 at 4:28 pm

Terence TaoThe bound is unconditional, but the improved bound is available if one assumes the Elliott-Halberstam conjecture; this isn’t explicitly stated in Maynard’s paper but it follows easily from the methods of the paper (so if James is revising the paper anyway, one may as well add in that remark).

25 November, 2013 at 3:52 pm

Anonymous@Maynard:

* The subscript “” should be “”.

* When typing maps, one should use “\colon” instead of “:” in order to get the correct spacing aroung the colon.

(Typographical comments, valid throughout the paper.)

26 November, 2013 at 8:28 am

Aubrey de GreyI’m wondering whether it is possible (or desirable) to reformulate the progress being made here on M_k in a manner that more closely resembles prior work, in particular Polymath8a. Taking things down to their barest essentials, I gather that we can informally say that Maynard and polymath8b have improved the k for which DHL[k,m+1] is implied by EH[theta] (but see below) from an exponentially decreasing function of (theta – m/2) (i.e. GPY) to something on the order of e^(2m/theta). I appreciate that there are good mathematical reasons for working centrally with the quantity M_k, but it’s not clear to me whether the evolving bounds on M as a function of k can be “inverted” to give bounds on k as a direct function of m and theta. If that were possible, I think that for expository purposes it would be of great interest.

Also, I have a somewhat similar question relating to polymath8a. Is it possible to state an explicit version of EH[theta] (let’s call it EH’[theta]) that is implied (up to a value of theta that is a function of i. varpi and delta) by MPZ(i)[varpi,delta], and that also suffices to imply DHL[k,m+1] for sufficiently large k? If that were possible, it seems like it would constitute a nicely clean way, again for expository purposes, to separate the progress in reducing delta from the progress in reducing k for a given theta (and m). Can someone please clarify this?

26 November, 2013 at 8:43 am

Terence TaoThe various upper and lower bounds on Maynard’s sieve all correspond to asymptotics of the form for large , which in particular implies for large ; the only difference between them lies in the precise nature of the o(1) error term. Once the bounds stabilise, one can of course work these things out more accurately.

The Bombieri-Vinogradov theorem already implies for sufficiently large k, and is a known theorem and is thus implied by any MPZ type statement. We do have some implications that do a little better than this by exploiting an MPZ hypothesis, by using a cutoff function F for Maynard’s sieve that is supported in an appropriate cube; again, the asymptotics for large m or large k are as above (in particular, the role of delta is asymptotically negligible), but the situation seems to be more complicated in the non-asymptotic regime, e.g. when m=1 or m=2.

26 November, 2013 at 9:36 am

Aubrey de GreyThanks! On the first point, absolutely – I just couldn’t exclude the possibility that this inversion might be so easy that there would be no reason to defer it. On the second point, sure, I was of course referring to an EH’ that would do the job for theta > 1/2; however I was mainly referring simply to the pre-Maynard universe and wondering whether this kind of condensation of i, varpi and delta down to a single parameter theta is possible at all.

26 November, 2013 at 11:33 am

Aubrey de GreyIn other words, when I wrote “for sufficiently large k” I should have written “for the same k that MPZ(i)[varpi,delta] itself does”.

26 November, 2013 at 1:13 pm

Terence TaoA minor (and standard) remark: the Selberg sieve can be rewritten as

where are as in Maynard’s paper and are the Ramanujan sums

(restricting to squarefree ); this is basically equation (5.8) of Maynard’s paper after some rearranging.

The functions behave approximately like independent random variables as vary, which heuristically explains why the mean value of is basically

and can also give heuristic justification for the asymptotic for in terms of the (Lemma 5.2 of Maynard’s paper). I wasn’t able to use this representation of the Selberg sieve for much else, though.

26 November, 2013 at 2:51 pm

AnonymousTypographical comment to Terry: If you use $\dots$ instead of $\ldots$, LaTeX will take case of correct placement of the dots; they shouldn’t always be lowered to the base of the line.

26 November, 2013 at 3:34 pm

Eytan PaldiIn the definition of , the functions need only to be in

(i.e. not necessarily bounded.)

26 November, 2013 at 5:10 pm

Pace NielsenIn the spirit of “A poor man’s improvement on Zhang’s result”, I was playing around with Maynards mathematica notebook. I found that increasing “p[5]” to “p[6]” (and increasing the number of variables to 56, and precomputing polynomials up to degree 13) allows one to push . This was confirmed by Maynard. [It appears that no further gain is made in such an easy manner by just increasing the degree further.]

Obviously, further improvements are available if we incorporate some of the higher order power-symmetric polynomials like P_3.

26 November, 2013 at 9:57 pm

Sniffnoy102 or 103? It says 102 here but 103 on the wiki and I’m wondering whether that is a typo here or a copying error there (the H value listed there is for 103).

[Corrected, thanks – T.]27 November, 2013 at 8:44 am

Terence TaoHere is a possible way to generalise the Selberg sieve. We’ll use the “multiimensional GPY” formulation

for the sieve, where F is a smooth function supported on the simplex (or the enlarged simplex). We can square this out as

One can then consider the more general sieve

for some smooth function supported on the product of two simplices. This function obeys similar asymptotics to the original sieve (this can be seen for instance by using Stone-Weierstrass to approximate by a finite linear combination of tensor products, or else by direct computation); for instance one has

if the denominator is non-zero, where the subscripts on F denote differentiation in one or more of the variables . The main problem is that without further hypotheses on , the sieve weight is not necessarily non-negative, which is a crucial property in the argument. In the case that is positive semidefinite (viewed as a “matrix” or “quadratic form” on the simplex), one obtains the non-negativity; but in this case one can diagonalise F into a non-negative combination of Selberg sieves, which cannot give better ratios than a pure Selberg sieve. However, it may be possible to have non-negative even without being positive semidefinite, leading to a genuinely more general class of sieve. The positivity requirement is a bit complicated though, even in the base case k=1; a typical condition is

whenever . This condition demonstrates the non-negativity of for products of two primes; one needs similar conditions for products of primes for any bounded r, and this should suffice (note that these sieves are concentrated on numbers with only a bounded number of prime factors).

These conditions are implied by positive semi-definiteness of , but hopefully they are a bit weaker, allowing for more test functions to play with.

27 November, 2013 at 3:22 pm

Terence TaoA cute inequality related to the above considerations: for any smooth function , one has

where and is the difference operator . Thus for instance

The sieve theoretic interpretation is that the right-hand side is an appropriately normalised version of if , and the left-hand side is the contribution to coming from products of exactly primes for . For sufficiently smooth functions F (e.g. finite linear combinations of exponentials), the inequality is in fact an identity, as can be seen by a rather fun computation. The claim can also be proven from the recursive distributional identity

for , where is the nonlinear operator

27 November, 2013 at 5:57 pm

James MaynardA couple of calculations regarding attempts to modify the case to obtain gaps of size 8:

Using the enlarged simplex, I get that (and I would expect this to be close to the truth). This means that the `probability’ that one of the elements of a 5-tuple is prime is . If we assume independence, then we would expect the probability that both and are prime to be (since we don’t expect to be able to get an asymptotic for this quantity, in reality we have to satisfy ourselves with an upper bound, which would be worse than this). This means that the modification of taking away the contribution form the times when and are simultaneously prime would (in the most optimistic scenario) leave us with a total of . This means we need at least one new idea to improve the situation. We'd need a lower bound on which is certainly greater than for this modification to get us over the edge.

I also tested the upper bound to see how it fared numerically. My calculations should be treated as very provisional, since I haven't checked them properly. In the scenario above, I find that taking a suitable polynomial for , we get a lower bound for the modified version of which gives . This compares well with the most optimistic bound of from above. This also corresponds to an upper bound for of size , which again is close to the guessed asymptotic value of .

Since these bounds performed quite well, it is maybe reasonable to expect us to get decent upper bounds when we are considering or similar. I don't know how we would evaluate the integrals that arise, however.

27 November, 2013 at 6:40 pm

Terence TaoThanks for these calculations! Due to the parity problem, I expect that any upper bound we obtain on is going to be off by a factor of at least 2 from the truth (so in this case, it would be like rather than ). This isn’t consistent with the number that you are provisionally reporting, but perhaps the numerology for the optimiser is a bit different from that for the optimiser. For instance, it may be advantageous to break symmetry a bit and work with polynomials F that behave differently in the variables than in the , lowering the probability that n is prime and n+12 is prime (in order to also lower the probability that n,n+12 are both prime) while also raising the probability that n+2,n+6,n+8 are prime.

0.987 is tantalisingly close to 1, so there is still hope :) Certainly things should look better for larger k. One relatively cheap thing to try is just to take the best polynomial we have for and compute the upper bounds on (note though that if we retreat from EH to Bombieri-Vinogradov, some of the 1/2 factors in the previous formula for this quantity degrade to 1/4).

27 November, 2013 at 6:52 pm

Terence TaoA small observation: somewhat frustratingly, the Zhang/Maynard methods do not quite seem to be able to establish Goldbach’s conjecture up to bounded error (i.e. all sufficiently large numbers are within O(1) of the sum of two primes). But one can “split the difference” between bounds on H and establishing Goldbach conjecture with bounded error. For instance, assuming Elliott-Halberstam, one can show that at least one of the following statements hold:

1. (thus improving over the current bound of ); or

2. Every sufficiently large even number lies within 6 of a sum of two primes.

To prove this, let N be a large multiple of 6, and consider the tuple n, n+2, n+6, N-n, N-n-2. One can check that this tuple is admissible in the sense that for every prime p, there is an n such that all five elements of the tuple are coprime to p. A slight modification of the proof of DHL[5,2] then shows that there lots of n for which at least two of the five elements of this tuple are prime; either these two elements are within 6 of each other, or sum to a number between N-2 and N+6.

If we can ever get DHL[4,2] (or more precisely, a variant of this assertion for the tuple n, n+2, N-n, N-n-2), we get a more appealing dichotomy of this type: either the twin prime conjecture is true, or every sufficiently multiple of six lies within 2 of a sum of two primes.

28 November, 2013 at 8:54 am

Aubrey de GreyWow! This immediately raises two questions that would surely be of wide interest:

1) For k=5, can one split the difference unevenly, i.e. lower the value 6 in one statement at the expense of using a number larger than 6 in the other? If so, how unevenly – all the way to TP or Goldbach?

2) Can analogous statements be made for larger k? In particular, for k large enough to require only BV (or MPZ) rather than EH?

28 November, 2013 at 11:19 am

Aubrey de GreyIn fact, much better would be to unify the above questions in the obvious way, as follows:

If we denote by “GBE[n]” (Goldbach up to bounded error n – is there an established name for this conjecture?) the assertion that all even numbers are within n of an even sum of two primes (I’m omitting “sufficiently large” here only because I presume that “sufficiently” can easily be shown in this context to be within the range for which GC itself has already been computationally verified – apologies if that’s not true), then for which [a,b,c] can we say that DHL[a,2] implies DHL[b,2] or GBE[c] ? Terry’s observations constitute this statement for [a,b,c] = [5,3,6] and [4,2,4]. How far can the allowed values be broadened?

27 November, 2013 at 9:33 pm

AnonymousI feel that Goldbach’s conjecture / twin primes conjecture are very close to be proved.

All originated from Zhang’s work!

27 November, 2013 at 10:17 pm

Gil KalaiI am curious about is the following. Consider a random subset of primes (taking every prime with probability p, independently, and say p=1/2). Now consider only integers involving these primes. I think that it is known that this system of “integers” satisfies (almost surely)

PNTbut not at allRH. We can consider for such systems the propertiesBV(Bombieri Vinogradov), or more generallyEH(θ)and the quantities . For such systems does BV typically hold? or it is rare like RH. Is Meynard’s implication applies in this generality? Nicely, here we can hope even for infinite consecutive primes.28 November, 2013 at 8:26 am

Terence TaoStrictly speaking, if only takes half of the primes, then the remaining integers is a very sparse set – about elements up to – which makes all the numerology very different from that of the full primes.

You may be thinking instead of the Beurling integers, in which the primes are replaced by some other set of positive reals with the same asymptotic density properties. These integers enjoy many of the same “multiplicative number theory” properties as the usual integers (e.g. the prime number theorem), but one usually loses all of the “additive number theory” properties, basically because the Beurling integers are not translation-invariant: if n is a Beurling integer, then n+2 usually isn’t, and so asking for things like twin Beurling primes is not really a natural question; similarly for questions about distributions of Beurling primes in residue classes (Bombieri-Vinogradov, EH, etc.) (unless one also somehow creates a theory of “Beurling Dirichlet characters” to go with the Beurling primes, but I do not know how to make this work for anything that isn’t already coming from a number field).

28 November, 2013 at 10:32 am

Gil KalaiTerry, how can it be if you leave half the primes alive?

28 November, 2013 at 10:51 am

Terence TaoOops, I miscalculated; the density is now rather than , so I drop my previous objection to the question :). (My error was thinking that the average number of divisors of n would remain comparable to , when instead it drops to .) The main technical difficulty now is with counting such quantities as the number of integers n less than x such that n and n+2 are both divisible by the surviving primes; offhand I don’t see a promising technique to answer this question (as well as more general questions in which several shifts of n are involved, and some congruence class conditions on n are also imposed) but it looks like an interesting problem.

28 November, 2013 at 11:21 am

Gil KalaiIndeed, I was thinking about Beurling integers, and my questions can be asked about them but I wanted a very very specific model which will make computations easier. I think that it is not that easy to work with general Beurling primes and finding a concrete stochastic models (like random graphs, sort of) may be more fruitful.

If we want to keep the density of primes on the nose we can possibly still take p_n with independent but not identical probabilities not to corrupt asymptotic density properties, or do some other “correction”, but perhaps taking half the primes is also of interest.

All the questions I asked refer only to the relative ordering of the Beurling integers and *not* to their additive properties. So “twin primes” refers to two Beurling primes with a single composite Beurling integer in between. And “consequtive primes” refers to two Beurling primes without any composite Beurling integer in between (Alas, we don’t have them often in the usual case.) I think that PNT, RH, BV (Bombieri Vinogradov), or more generally EH(θ), and (in the *ordinal*, not *additive* sense) continue to make sense (perhaps even more than one sense) for Beurling primes and it is interesting to see if Meynard implication extends, but it looks to me that considering a stochastic model based on the primes will make the issue more tractable (and perhaps useful).

28 November, 2013 at 12:28 pm

James MaynardIf I understand your question correctly, then I would have thought for such a system you would get (almost surely) for this subset of the primes, whenever holds for the primes themselves. (And so bounded gaps with primes).

If is the number of chosen primes in the residue class which are at most , then is distributed as a binomial random variable . Therefore, we should get, for some constant

.

In particular, over all for which we have almost surely . (There are such pairs , and we take above). The errors from these -terms are negligible. The errors from those pairs for which `‘ does not hold, and the errors can be bounded by regular for all primes.

28 November, 2013 at 1:52 pm

Terence TaoI agree that this form of will hold almost surely, but I think for applications to bounded gaps between these Beurling-type integers one needs a significantly stronger version of in which one not only counts those for which is a Beurling prime, but also is a Beurling integer for the other elements in the tuple.

Actually, never mind bounded gaps between Beurling primes: bounded gaps between Beurling

integersis already somewhat non-trivial problem, particularly if one wants the correct asymptotic for how often this occurs; this is probably the main obstacle to extending the various results for gaps between primes to this Beurling setting.28 November, 2013 at 3:16 pm

James MaynardCan you not use the sieve as before, comparing to just the constant function 1 (supported on all integers, Beurling-type or not)?

Since you have for these primes, you should be able to calculate an asymptotic for the sieve weighted by these terms (which will be 1/2 of what we get for primes). Since tends to infinity, I would have thought we should then get primes in a -tuple whenever we would have got primes before.

28 November, 2013 at 4:24 pm

Terence TaoWell, if one uses the existing sieve without any modification, the density will only be of the order of , which is too sparse to pick up more than one prime in the tuple. Things are presumably better if one replaces by , but then one needs asymptotics for things like , which is presumably doable but requires some calculation. (It is vaguely reminiscent of the Titchmarsh-type divisor problem of finding integers n, n+h with specified divisor function.)

28 November, 2013 at 1:50 pm

Terence TaoA new paper on the arXiv by William D. Banks, Tristan Freiberg, and Caroline L. Turnage-Butterbaugh: “Consecutive primes in tuples“. Basically they show that Maynard’s result can be boosted to guarantee that the m primes involved are consecutive, although they have to increase k somewhat (to be double exponential in m rather than single exponential) to do this. (It may be, though, that one can avoid this by computing asymptotics for for h not in the tuple, much as in the original GPY paper, or by using the ideas of Pintz as is done for instance in http://arxiv.org/abs/1305.6289 .)

28 November, 2013 at 2:59 pm

Eytan PaldiPerhaps this new type of (for consecutive primes) may be studied here as well.

30 November, 2013 at 7:38 pm

Kalpesh MuchhalHi, Corollary 3 of the Banks-Freiberg-TB paper shows that consecutive primes can be part of an AP, while the Green-Tao theorem shows that primes can be consecutive terms of an AP. Can these results be somehow combined to prove consecutive primes can be consecutive terms of an AP infinitely often?

30 November, 2013 at 9:39 pm

Terence TaoUnfortunately, the primes that we can capture in short intervals are so sparse (one roughly has only or so primes in an interval of length H) that it is not presently possible to force them into arithmetic progression. (My arguments with Ben can produce arithmetic progressions of primes in – this was explicitly noted in a later paper of myself with Ziegler – and in view of more recent work (particularly that of Conlon, Fox, and Zhao) it is likely that one can narrow this to – but this is still a bit too wide to expect that the primes produced in this fashion will be consecutive.)

29 November, 2013 at 7:53 am

Andrew GranvilleI do not see why one needs to increase k. If the m-prime result is obtained from the k-tuple, {x+0, x+a_2,…, x+a_k} then we can restrict ourselves to a congruence class (for x) so that x+j has a given smallish prime factor p_j for each j<a_k, which is not an a_i. The congruence x=-j mod p_j, combine to mean we can write x = Q n +b (where Q = prod_j a_j). Hence we work instead with the k-tuple Qn+b, Qn+b+a_2,…,Qn+b+a_k

I think that one can apply the main theorem directly to this k-tuple, so it has (single) exponential size in m

29 November, 2013 at 8:08 am

Andrew GranvilleBeurling primes do not satisfy RH in general… You should look for the beautiful 2006 paper of Diamond, Montgomery and Vorhauer. In fact they show that one can have a Siegel-type zero (which perhaps explains why these are so hard to deal with in the actual prime case).

There was also some mention, amongst these comments, of probabilistic models… of course the Gauss-Cramer model, where a random number n is “prime” with probability 1/log n, predicts that there are many prime pairs n, n+1, so needs some adjusting to take account of correlations arising from local divisibility. Sound and i showed some years ago that all of the obvious models proposed to make these sort of corrections have serious flaws (see our paper “An arithmetic uncertainty principle”).

29 November, 2013 at 10:32 am

Gil KalaiMore about systems of Beurling primes:

About my initial suggestion to take half the primes at random, since a quarter of all pairs of primes survive anyway it does not look that the implication (if it applies) from BV(Bombieri-Vinogradov) or EH(θ) to bounded gap is useful here.

Regarding Beurling primes in general. I see three issues

1) Does (or when) Meynard implication from EH(θ) (or just BV) to bounded gaps applies to general systems of Beurling primes with the right density.

2) Can you either show it more easily or apply it more easily for stochastic systems based on ordinary primes.

3) If (or when) the implication is correct, can you get something useful from it for the current project (or even just interesting consequences otherwise.)

I don’t know about 1) but let me just assume it.

Regarding 2) for most natural models small gap pair of primes will survive. (maybe for k-tuples for large k interesting things can be said. since their rate of survival decay exponentially.) Another stochastic model that is interesting is to take all primes *with* repetitions and to take every prime with (say) Poisson or exponential distribution of mean 1. (For applying it we may need bounds on gaps between 2 to m, so that we can ignore repeated primes.) (One of the most beautiful and useful ideas in modern statistics is to study sequences by sampling from them with repetitions.)

Regarding 3) there are various things to think about. You can delete for your Beurling system all pair of primes of gaps at most 1000 (say). Now, for what we know on the number of primes involved with a gap at most 1000 is too high for the statement of Bombieri-Vinogradov to remain correct automatically. (But it is interesting if it remains correct.) However, we may try using it as part of an argument showing that there are many primes of gaps 1000 or less: If the density of those is small then when we delete them EH is not harmed and we can apply the reduction again. This is not a complete argument since new pairs of primes with small gaps (with respect to the new system) may emerge.

Since the estimates for k-tuples density are especially week compared to reality this type of hypothetical argument may work for them better.

Another similar suggestion is to choose for our Beurling system a random prime from every second interval of integers of length 1000. This automatically eliminate (additive) gaps below 1000.

(If such considerations can “just” lead to new examples of Beurling primes which violate RH, BV, or EH this would also be nice, of course.)

1 December, 2013 at 5:01 pm

Gil KalaiOn more thought, I don’t see how to formulate the above questions for general Beurling primes without the additive structure.

1 December, 2013 at 10:41 am

Terence TaoI’m recording how to use multiple dense divisibility to (in principle at least) improve the value of k, although at present we can only effectively calculate the various integrals involved when the function F is basically built from a tensor product , which isn’t the case for our best value 102 for k at m=2, but only for the larger value 339 (for m=2) or 43,134 (for m=3).

To review the situation: under , we can obtain (and hence ) whenever we can find a (bounded measurable) symmetric function supported on the simplex with

(1)

and

. (2)

A good choice for F here is a function of the form

where for some parameters (in practice we take close to and close to ), and . Hypothesis (1) is automatic, and the left-hand side of (2) can be bounded from below by

(3)

where are iid random variables with distribution .

If one wishes to use as the starting hypothesis instead of , one replaces by , and one has to add the additional hypothesis that is supported in the cube with . The latter hypothesis is obeyed if

(4)

If one uses the stronger hypothesis instead, one can relax the support condition on to the intersection of the larger cube and the set of such that

where

and is arbitrary with ; this follows from Lemma 4.13 of the Polymath8a paper. This has the effect of relaxing (4) to

(4)’

at the cost of worsening (3) to

. (3′)

We may optimise by making (4)’ an equality.

Using the exponential moment method, we can lower bound (3′) by

where

and

and s,s’ may be chosen as we please. Thus: we obtain whenever we can find obeying

with given by equality in (4′), with . Our best estimate for is with and .

1 December, 2013 at 5:04 pm

Eytan Paldiabove can be made arbitrarily small (no lower bound on .)

[Oops, there was a typo in the definition of , now corrected. -T.]2 December, 2013 at 9:36 am

Eytan PaldiThere is a typo in the last inequality (last line.)

[Fixed, thanks – T.]2 December, 2013 at 9:52 am

Terence TaoI spent some time this weekend reading up on the beta sieve (and on its role in the proof of Chen’s theorem that there are infinitely many primes p such that p+2 is an almost prime (the product of at most two primes)), in the hope that it could be a viable substitute for the Selberg sieve for this project; in particular, I think it would be a great result if one could find an alternate proof of bounded gaps between primes that did not rely fundamentally on a Selberg-type sieve. Unfortunately, it seems that while just about any of the standard sieves (including the beta sieve) can produce probability distributions for which one has for some small constant c and all i=1,…,k, it seems thus far that only the Selberg sieve can get c larger than 1 (which gives bounded gaps between primes) and to even grow logarithmically in k (which gives bounds on for all m). I still do not have a clean conceptual understanding of why the Selberg sieve can do this; I have a vague sense that there is some interplay between the “L^2” nature of the Selberg sieve, and the “L^1” nature of density of primes in that sieve, but my intuition on this is still lacking. For the beta sieve, the best I can do is to multiply together k one-dimensional beta sieves of level ; it is not obvious to me how to create a genuinely k-dimensional beta sieve, although this is perhaps worth pursuing. (I was intending to write up some notes on the beta sieve but they would basically duplicate a chapter of “Opera del Cribro”, in particular there are some rather tricky calculations involving delayed difference equations which I have thus far been unable to avoid.)

Incidentally, in “Opera del Cribro” it is noted that one can prove Chen’s theorem using a combination of beta and Selberg sieves if one uses the FI distribution theorems (which roughly speaking give a level of distribution of 4/7 (or in our notation, ) when specialised specifically to the twin prime problem). Roughly speaking, the beta sieve gives a lower bound on the number of numbers with prime such that has no prime factors less than , and the Selberg sieve gives an upper bound on the number of primes such that is the product of three primes greater than , and subtracting the two one gets a positive lower bound for the pairs needed for Chen’s theorem. It may be worth revisiting the proof of the FI distribution theorem and see if it may somehow be combined with some of Zhang’s ideas to get new distribution theorems that are usable for the k-tuple problem and not just the twin prime problem. (I did check to see if one can use the special structure of the coefficients in the error terms in the Zhang/Polymath8a arguments, but these coefficients get eliminated as a by-product of the van der Corput process and other necessary applications of Cauchy-Schwarz, so it is not obvious how to exploit the structure. Though FI, BFI, etc. did somehow manage this…)

3 December, 2013 at 12:04 am

Terence TaoHere is a side benefit of the multidimensional sieve in Maynard’s paper: it gives a new upper bound on the number of such that are all prime, i.e. the number of fully prime k-tuplets. The standard sieves (Selberg sieve, large sieve, beta sieve) all give an upper bound of for this quantity for some absolute constant C and sufficiently large x, as can be seen by looking up Friedlander-Iwaniec’s book; here is the singular series, thus this bound is basically worse than what the prime tuples conjecture predicts. (The large sieve and Selberg sieve give a slightly better value for C here than the beta sieve.) If one inserts in the multidimensional sieve, one can improve this to (in fact one gets this bound whenever one sieves k residue classes modulo p for all large primes p from an interval of length x). Basically, the problem is now to minimise the ratio

rather than maximise the ratio

but the two variational problems are comparable in much the same way that the root test and ratio test are comparable in undergraduate analysis.

In the converse direction, if one can find another sieve that gives a bound of for the number of fully prime k-tuplets, then this sieve is likely to also be able to obtain bounded gaps between primes, or to find short intervals with many primes in them.

3 December, 2013 at 7:18 am

Terence TaoOops, turns out I made an arithmetic error :(. The sieve in Maynard’s paper actually gives the same bound as existing sieves, and this is the limit of the Selberg sieve method (with , being the level of distribution), basically because the volume of the simplex is . As with the root and ratio tests, one only has one direction of the implication: a sieve that can beat the bound for the k-tuple problem will also likely give many primes in short intervals, but the converse is not necessarily true.

Still it looks like an interesting problem to get for the k-tuple problem. In the converse direction, the parity problem obstruction suggests a lower bound of .

3 December, 2013 at 7:29 am

AndreIs there still a need for your new simplified minimization problem?

4 December, 2013 at 2:26 pm

AnonymousI was asking, because I think I have found polynomials of degree with

and ,

which would have given better bounds than the trivial .

4 December, 2013 at 10:18 pm

Terence TaoI should have mentioned that in the simplified variational problem, F has to vanish to sufficiently high order on the boundary of the simplex to be admissible. With that assumption, one obtains from the fundamental theorem of calculus that , and so the Cauchy-Schwarz inequality tells us that the extremum occurs when on the simplex (strictly speaking, one has to smoothly truncate this near the boundary of the simplex, but the error incurred from this can be made arbitrarily small), so the infimal value of this ratio is the reciprocal of the volume of the simplex, i.e. .

4 December, 2013 at 4:13 am

The new Prime gap | On the learning curve...[…] discussed within the community. Terrance Tao again has this nicely articulated and maintains a polymath page for us. As an onlooker, I am as excited as many others to see this […]

4 December, 2013 at 4:48 pm

Pace NielsenI thought I would give a quick update on some recent progress in optimizing the methods described in James’ preprint. (As these ideas are already implicit in James’ paper, I’ve told him he is free to use these bounds in the final version of his paper, so they are not properly part of the polymath8b project, but they still might be of interest to people here.)

As usual, let be fixed. Let be a sequence of integers. We say that is the signature for the monomial symmetric polynomial , and that is the degree. The monomial symmetric polynomials form a vector space basis for the space of symmetric polynomials, and James has a good way of integrating these polynomials over , with a formula already appearing in his preprint.

So, I wrote a program to search for an optimal solution using such polynomials, up to a given degree. It turns out that there are significant gains over previous computations, and for degree one can get and this is optimal for that degree. Increasing the degree further increases further, but because of the way I coded the program it runs into RAM issues for . [It has problems computing the permutations of the set (where each occur exactly 11 times.]

To overcome this, I starting doing what James does in his paper, working with polynomials of the form . Taking allows one to get down to .

We are hopeful that eventually these computations will get the optimal , but there is still a lot of optimizing to do. Currently, taking , the computation takes about 1 day for a single value of .

James and I have some ideas about how to fix the RAM issue and so further improvements are forthcoming. [For example, one option is to restrict to signatures for which the number of nonzero terms is at most 10.] Anyone interested in running the Mathematica code on their own computer can email me. (Running it shouldn’t be a problem, but I didn’t make many comments in the code, so it may take a bit of work to understand it.)

4 December, 2013 at 10:36 pm

ChildmanSince it may get increasingly computationally intensive to get lower values of k, it may be a good idea to rewrite the optimization code for some kind of grid or distributed computing..

4 December, 2013 at 10:33 pm

Terence TaoVery nice to hear! For comparison, the upper bound shows that k should not be able to go below 50, and for k=64, we have (we need to win here on Bombieri-Vinogradov). So things are pretty consistent here (and the upper bound is looking remarkably sharp, all things considered – this may suggest some sort of clue as to what the extremising F are behaving like, and possibly to suggest a better theoretical ansatz for F than the one we are currently using (a truncated tensor product) that has only managed to get down to k=339 thus far).

One might hope that the computations get a little bit faster as k drops. Hopefully it becomes small enough that we can begin computing the effect of truncating the simplex a bit to use MPZ (and also to enlarge the simplex to the slightly larger convex body that is available, and perhaps also to try to use the upper bounds on to shave a little bit more off of H).

5 December, 2013 at 7:24 am

Pace NielsenCurrently, the dependence of the timing of the computation on the value of is minimal–it just changes the coefficients a bit but the number of coefficients is unchanged. Eventually would have an effect, by limiting the lengths of signatures, but as I suggested above, we may need to artificially impose a limit on such lengths to make the computations feasible. (And for this effect would only naturally start taking place when or so.)

I too have thought about whether or not these computations would give us a better idea where to look for the optimal . I’ll try to look for a pattern in the structure of the optimal for a given degree as increases, and report back if I see anything.

5 December, 2013 at 9:26 am

Eytan PaldiFor a given polynomial , let . My suggestion is to use (for some N) as a new basis (or add them to the current basis.)

5 December, 2013 at 7:44 am

Aubrey de GreyPace, could you post a list of the optimal F you’ve found for each degree up to 10? I think it would be fun for everyone to look for such a pattern.

5 December, 2013 at 12:19 pm

Pace NielsenAubrey,

How would you like the data? Would the coefficients of the monomial symmetric polynomials work for you?

5 December, 2013 at 12:34 pm

Aubrey de GreyWell, I’m not equipped to make nearly so much use of the data as others here, so I’d prefer to defer to the group as to format. I guess I was assuming that you would essentially emulate equation 7.18 from James’s paper, but please use whatever format seems most convenient and we can all complain as needed :-)

5 December, 2013 at 4:41 am

Eytan PaldiFor , this implies that under EH the best under the current method is in .

4 December, 2013 at 6:38 pm

FanFor , we have a bound .

5 December, 2013 at 12:58 pm

Pace NielsenAs suggested by Aubrey and Roger Baker here at BYU, I’ve now looked at the coefficients of the maximizing function of a given degree. I have not seen any pattern, but that may be due to a number of different factors. Here is a partial list of the data. For the full list just email me and I’ll send you the file.

Throughout, take (which gives gap sizes of , so it would be really nice to reach this point). First, let me list the lower bounds on for each degree.

For , .

For , .

For , .

For , .

For , .

For , .

For , .

For , .

For , .

For , .

Any idea if this sequence is converging below 4?

To give the coefficients of , we first need a way to order the signatures. First, to save space we can drop any zeros from the end of a signature, so the signatures correspond to the partitions of the degree. I then order the signatures first according to total degree, and then according to how Mathematica orders the partitions (which is by taking the maximal , then the maximal etc…).

So, my signatures start out as: .

For , the coefficients are .

For , the coefficients are .

For , the coefficients are .

The entries continue to decrease as increases.

5 December, 2013 at 1:00 pm

Pace Nielsen(Of course, the above was also suggested by Terry.)

When I said that the entries decrease, I meant in absolute value. They continue to change signs.

5 December, 2013 at 1:21 pm

Andreperhaps with some more digits, the inverse symbolic calculator might help

http://oldweb.cecm.sfu.ca/projects/ISC/ISCmain.html

5 December, 2013 at 1:57 pm

Aubrey de GreyThanks!

I doubt, tentatively, that the sequence M_k[d] which you give converges below 4. By simplistic inspection, I notice that the ratio of (M_k[d+1] – M_k[d]) to (M_k[d+2] – M_k[d+1]) decreases monotonically with increasing d, being around 1.2 for the largest k you list; this makes it look highly unlikely that M_k[15] < 4. However, there is a disconcerting "wobble" in the sequence, in that the last few values of this ratio decrease by 0.029, 0,008, 0.017 and 0.003, inviting apprehension that it might start to increase again, conceivably sharply enough to allow M_k[d] to converge below 4. Let's hope you can push your code that far! (Though, actually the wobble would almost entirely go away if M_k[7] were actually around 3.761 rather than 3.759 – worth a check?)

5 December, 2013 at 2:27 pm

Eytan PaldiThe ratios of the last four consecutive differences are quite stable:

– indicating linear convergence (with a rate of about 0.8) to a limit of about .

5 December, 2013 at 2:34 pm

Aubrey de GreyYes indeed – my numbers are of course just the reciprocals of yours. The wobble is still evident (differences of 0.014, 0.114, 0.065). I’m still betting Pace will update M_k[7] !

5 December, 2013 at 5:12 pm

Aubrey de GreyThanks. I actually made a decimal-place error in my own calculations, and the M_59[7] that I should have said I was expecting was actually 3.7594 – but if the wobble is real, so be it!

5 December, 2013 at 2:46 pm

Aubrey de GreyI meant 0.0014, 0.0114, 0.0065 of course.

5 December, 2013 at 4:08 pm

Pace NielsenI double checked it, and the value I gave for M_k[7] is correct (and will not improve, unless there is a bug in my program).

5 December, 2013 at 3:30 pm

Eytan PaldiAccording to this (linear convergence) hypothesis of the lower bounds, it seems that the minimal needed to conclude that is (or – in case of small extrapolation errors.)

5 December, 2013 at 4:12 pm

Pace NielsenJust to give one more data point: Using the notation in my earlier post, and taking yields the bound . (One can view this as an approximate computation.) So it looks like there can be a big difference between using all symmetric polynomials of a given degree, and just using those of the form for of medium degree.

6 December, 2013 at 1:14 pm

Eytan PaldiAssuming (as described above) that is about and (by the current upper and lower bounds on ) that

is about , it seems that the minimal for which is .

Therefore, I suggest to concentrate in particular on .

6 December, 2013 at 1:23 pm

Aubrey de GreyThat makes sense – but while there remains a major issue around CPU time, and especially while our speculations as to the sequence for d>10 remain so tentative, I can also see a case for working down from k=64 one step at a time. There is also, of course, the consideration that at any moment Terry or others may chime in with a rigorisation of MPZ in the small-k setting that renders all this BV-context work largely peripheral.

Additionally, I hope people haven’t forgotten too much about m > 1. I guess xfxie has been busy elsewhere lately, but it would be great (especially for Drew!) if there could be an update on the best obtainable k for m=2 arising from Terry’s progress in exploiting the exponential moment method.

5 December, 2013 at 2:16 pm

AnonymousFirst pair looks like . Is it?

5 December, 2013 at 4:12 pm

Pace NielsenNo. They are accurate to the decimals I gave, so they are not .

6 December, 2013 at 8:18 am

Aubrey de GreyHm, but they differ from +/- 1/sqrt(2) by very very nearly exactly the same amount – their average is 0.70710665, and sqrt(2) is 0.70710678. Can that really be a coincidence?

9 December, 2013 at 8:17 am

Aubrey de GreyHa! OK: can you post revised values?

9 December, 2013 at 7:20 am

Pace NielsenI just realized something silly.

The switching of signs, the decreasing of absolute values, and the pair above looking like are all artifacts of Mathematica normalizing the optimal .

Recall that is really only determined up to a constant. What we should have been doing this whole time is normalizing so that the constant term is exactly 1. Sorry for not realizing this earlier.

5 December, 2013 at 3:01 pm

Aubrey de GreyPace: it might be instructive if we could do this kind of intuitive analysis for smaller k, not least to see what sort of value of d (in other words, how much work in optimising your code) seems likely to be needed to determine what is the truly minimal k obtainable using current ideas. Do you yet have a M_k[d] sequence for k = 58, 57 etc?

5 December, 2013 at 3:16 pm

Aubrey de GreyPS: Naturally I’m also thinking about k < 51, in anticipation of progress in rigorising the post-Maynard validity of MPZ (and thereby lowering the required M_k below 4).

Also: I'm unclear as to the impact of raising a_0. You mentioned earlier that you obtained k=64 using a_0=3, but also that you are currently experimenting with a_0=5. Can you give us a sense of what you're seeing in terms of the dependence of k on a_0 (rather than just d)? The magnitude of that impact would somewhat inform whether it is worth taking the time to derive a set of M_k sequences for k = 58, 57 … for a_0 = 3 (or 4) in order to enable the above analysis.

5 December, 2013 at 4:21 pm

Pace NielsenI’m seeing the following: Increasing by 1 increases the run-time by double, but also seems to decrease by one or two (in the short term). So taking should let us get or .

On the other hand, increasing by $1$ multiplies the run-time in a quadratic fashion. So takes 8 seconds, takes 21 seconds, takes 110 seconds, takes 14 minutes, and takes a little over 2 hours. I expect to take about a day, etc…

When I get some time, I’ll try to derive the sequence for different . Currently, my computer processor is busy doing a few other computations.

6 December, 2013 at 1:28 pm

Aubrey de GreyPace: could you please re-post that sequence for d=3? – it is appearing truncated after the first five values, and I now have a feeling that the last two could be quite informative. Thanks!

6 December, 2013 at 2:43 pm

Pace NielsenThe last few are -0.5819451…,0.1238405…,0.3236689…,0.5909730….

(You can also see them by scrolling over the LaTeX.)

6 December, 2013 at 3:43 pm

Aubrey de GreyMany thanks. (The scroll-over doesn’t work for me – browser issue I guess.) I think I’m ready for the complete file – please send it to aubrey@sens.org – thanks!

8 December, 2013 at 10:22 am

Phil DeeSo what are the tentative values for H_1 at the moment ?

8 December, 2013 at 4:47 pm

Polymath8b, III: Numerical optimisation of the variational problem, and a search for new sieves | What's new[…] for small values of (in particular ) or asymptotically as . The previous thread may be found here. The currently best known bounds on […]