This is the fifth thread for the Polymath8b project to obtain new bounds for the quantity

either for small values of (in particular ) or asymptotically as . The previous thread may be found here. The currently best known bounds on can be found at the wiki page (which has recently returned to full functionality, after a partial outage). In particular, the upper bound for has been shaved a little from to , and we have very recently achieved the bound on the generalised Elliott-Halberstam conjecture GEH, formulated as Conjecture 1 of this paper of Bombieri, Friedlander, and Iwaniec. We also have explicit bounds for for , both with and without the assumption of the Elliott-Halberstam conjecture, as well as slightly sharper asymptotics for the upper bound for as .

The basic strategy for bounding still follows the general paradigm first laid out by Goldston, Pintz, Yildirim: given an admissible -tuple , one needs to locate a non-negative sieve weight , supported on an interval for a large , such that the ratio

is asymptotically larger than as ; this will show that . Thus one wants to locate a sieve weight for which one has good lower bounds on the numerator and good upper bounds on the denominator.

One can modify this paradigm slightly, for instance by adding the additional term to the numerator, or by subtracting the term from the numerator (which allows one to reduce the bound to ); however, the numerical impact of these tweaks have proven to be negligible thus far.

Despite a number of experiments with other sieves, we are still relying primarily on the Selberg sieve

where is the divisor sum

with , is the level of distribution ( if relying on Bombieri-Vinogradov, if assuming Elliott-Halberstam, and (in principle) if using Polymath8a technology), and is a smooth, compactly supported function. Most of the progress has come by enlarging the class of cutoff functions one is permitted to use.

The baseline bounds for the numerator and denominator in (1) (as established for instance in this previous post) are as follows. If is supported on the simplex

and we define the mixed partial derivative by

then the denominator in (1) is

and

Similarly, the numerator of (1) is

Thus, if we let be the supremum of the ratio

whenever is supported on and is non-vanishing, then one can prove whenever

We can improve this baseline in a number of ways. Firstly, with regards to the denominator in (1), if one upgrades the Elliott-Halberstam hypothesis to the generalised Elliott-Halberstam hypothesis (currently known for , thanks to Motohashi, but conjectured for ), the asymptotic (2) holds under the more general hypothesis that is supported in a polytope , as long as obeys the inclusion

examples of polytopes obeying this constraint include the modified simplex

the prism

the dilated simplex

and the truncated simplex

See this previous post for a proof of these claims.

With regards to the numerator, the asymptotic (3) is valid whenever, for each , the marginals vanish outside of . This is automatic if is supported on , or on the slightly larger region , but is an additional constraint when is supported on one of the other polytopes mentioned above.

More recently, we have obtained a more flexible version of the above asymptotic: if the marginals vanish outside of for some , then the numerator of (1) has a *lower bound* of

where

A proof is given here. Putting all this together, we can conclude

Theorem 1Suppose we can find and a function supported on a polytope obeying (4), not identically zero and with all marginals vanishing outside of , and withThen implies .

In principle, this very flexible criterion for upper bounding should lead to better bounds than before, and in particular we have now established on GEH.

Another promising direction is to try to improve the analysis at medium (more specifically, in the regime ), which is where we are currently at without EH or GEH through numerical quadratic programming. Right now we are only using and using the baseline analysis, basically for two reasons:

- We do not have good numerical formulae for integrating polynomials on any region more complicated than the simplex in medium dimension.
- The estimates produced by Polymath8a involve a parameter, which introduces additional restrictions on the support of (conservatively, it restricts to where and ; it should be possible to be looser than this (as was done in Polymath8a) but this has not been fully explored yet). This then triggers the previous obstacle of having to integrate on something other than a simplex.

However, these look like solvable problems, and so I would expect that further unconditional improvement for should be possible.

## 145 comments

Comments feed for this article

8 January, 2014 at 12:55 pm

Aubrey de GreyI’m struggling to understand the nature of the current obstacle to exploring the dilated simplex in the BV setting. Terry noted in http://terrytao.wordpress.com/2013/12/20/polymath8b-iv-enlarging-the-sieve-support-more-efficient-numerics-and-explicit-asymptotics/#comment-258264 that no combinatorially-explosive dissection of that polytope was needed for this calculation, and Pace implied in http://terrytao.wordpress.com/2013/12/20/polymath8b-iv-enlarging-the-sieve-support-more-efficient-numerics-and-explicit-asymptotics/#comment-258471 that it was relatively straightforward, but that’s the last I’ve seen. Has a previously-overlooked obstacle emerged? I had thought it was just a matter of getting around to it, but Terry’s description above (“We do not have good numerical formulae…”) implies a deeper issue. Could someone please clarify? – thanks in advance.

8 January, 2014 at 1:53 pm

Pace NielsenOn my end, the obstacle is that even for moderately sized the polynomial is hard to work with (especially when multiplying two such polynomials together and integrating).

8 January, 2014 at 2:42 pm

Aubrey de GreyI see – thanks. How big, at present, is “moderately sized” in this sense? – is it possible to work your way up from very small k far enough to give the chance of discerning a trend?

8 January, 2014 at 6:10 pm

Pace NielsenThat’s a good question, which I don’t know the answer to. At present, I’m going to focus on the small k computations, but I may come back to this later.

8 January, 2014 at 6:23 pm

Reid AtchesonWould it be helpful if the basis formed an orthogonal system on the simplex? Such a basis has been in use by the spectral methods community for a while, and has an analytic recurrence formula in terms of 1D Jacobi polynomials. The details of these posts are otherwise too advanced for my full comprehension.

8 January, 2014 at 6:50 pm

Terence TaoGood question! The problem though is that there are two quadratic forms (or, if one wishes, inner products) one has to work with – one associated to the denominator (which is the usual inner product on the simplex) and one associated to the numerator (which is more complicated, basically an inner product on the marginals of the original function). One could pick a basis which diagonalises one of these two quadratic forms, but unless the basis is very cleverly chosen, the other quadratic form would not be in any particularly simple form. So it would likely only lead to a modest gain in performance at best.

Of course, if we found a basis for which the numerator and denominator were simultaneously diagonalised, or nearly so, this would indeed be a game-changer, but this may be unlikely. For instance, in the k=2 case, the extremal function is explicitly computable as , which does not look like it has a particularly simple representation in any natural basis that comes to mind, which suggests that one cannot simplify the optimisation problem too much by working in a standard basis.

8 January, 2014 at 2:09 pm

Eytan PaldiIt seems (as remarked in a previous comment) that the only obstacle is that the (repeated) partial derivatives are generating many monomials from a single one (so a careful choice of the basis functions is needed in order to have low complexity integrals – needed for the numerical optimization program.)

8 January, 2014 at 3:54 pm

Eytan PaldiAs I understand, the only regularity condition on is its square integrability which implies, for the one dimensional sieve that needs only to be absolutely continuous. So there is a possibility that by imposing

– smoothness condition on we don’t get the true optimal value.

For example, in the derivation, I explicitly assumed the continuity of at but if is allowed to have a jump at that point, perhaps a better solution exists…

8 January, 2014 at 4:05 pm

Terence TaoI think that any rough candidate for f can be regularised into a smooth one while only perturbing f and its derivatives infinitesimally in L^2 type norms, which should not lead to any change in the supremum of the ratio being optimised over.

I would expect that even for rough extremisers, one should still have the Euler-Lagrange equation hold in a distributional sense (i.e. using weak derivatives). As a is continuous and bounded away from zero and b only has jump discontinuities, this suggests to me that f’ should remain continuous for the extremiser.

8 January, 2014 at 4:56 pm

Eytan PaldiI agree! (I think that the needed properties are that a is absolutely continuous and non-vanishing inside the interval, and that b is measurable and bounded.)

8 January, 2014 at 8:22 pm

Eytan PaldiThe Euler-Lagrange equation for the variational problem is quite interesting (coupling four sub-intervals) and seems to be analytically solvable (I need to check the details.)

8 January, 2014 at 9:26 pm

Pace NielsenI’m a little confused by Theorem 1. Previously, we were allowed to take the support of F to be the polytope which doesn’t seem to satisfy condition (4).

In the theorem, perhaps the support should be (rather than just ), still for a region satisfying (4)?

8 January, 2014 at 9:36 pm

Pace NielsenOr rather, .

8 January, 2014 at 9:56 pm

Terence TaoThis polytope obeys (4) as long as . Basically, the point is that full sum is times the average of the partial sums for , so that if , then at least one of the partial sums will be less than 2. (Actually, I think the fact that this polytope works originated from a comment of yours: http://terrytao.wordpress.com/2013/12/08/polymath8b-iii-numerical-optimisation-of-the-variational-problem-and-a-search-for-new-sieves/#comment-257443 .)

9 January, 2014 at 7:11 am

Pace NielsenYou are right, I see what I was missing now.

9 January, 2014 at 8:27 am

Terence TaoTwo small remarks:

Firstly, strictly speaking our new results rely on the “generalised” Elliott-Halberstam hypothesis (GEH) rather than the original Elliott-Halberstam hypothesis, namely the EH type distribution result for a general class of Dirichlet convolutions as opposed to just for the von Mangoldt function. (A precise statement of GEH can be found as Conjecture 1 of this paper of Bombieri, Friedlander, and Iwaniec; the Bombieri-Vinogradov analogue of GEH was first established by Motohashi.) If one only has EH rather than GEH, one can still take , but the function F is now constrained to the original simplex , so one cannot do better than with our current bag of tricks. But I think GEH is close enough to EH (especially when viewed relative to the presumed difficulty of EH) that this is not too much of a “cheat”.

Secondly, as mentioned previously in http://terrytao.wordpress.com/2013/11/22/polymath8b-ii-optimising-the-variational-problem-and-the-sieve/#comment-253452 , the fact that we now can get k=4 on (generalised) EH means that we have a twin prime/Goldbach dichotomy on GEH: either the twin prime conjecture is true, or every large multiple of six lies within 2 of the sum of two primes (and thus large even natural number lies within 4 of the sum of two primes), by consideration of the tuple n, n+2, N-n, N-n-2, which is admissible when N is a multiple of 6.

9 January, 2014 at 8:46 am

Pace NielsenDoes the remark on page 12 of James’ preprint allow us to use under EH, or do we need GEH there as well?

9 January, 2014 at 9:06 am

Terence TaoThat remark shows that to treat the numerator in (1) one can live on (indeed the same argument shows that one can control the numerator as long as all marginals of F live on ), but the issue is with control of the denominator (Lemma 5.1 in James’s paper, rather than Lemma 5.2) which, in order to go beyond , needs EH for other divisor sums than the von Mangoldt function.

9 January, 2014 at 9:02 am

Terence TaoA variant on the second remark: on GEH, either the twin prime conjecture is true, or there are infinitely many Cunningham chains of length 2, as can be seen by consideration of the admissible tuple n-1, n+1, 2n-1, 2n+1.

9 January, 2014 at 9:56 am

Terence TaoOops, I withdraw that assertion: if, for instance, it is n-1 and 2n+1 that are simultaneously prime then one doesn’t get either a twin prime or a Cunningham chain.

9 January, 2014 at 9:44 am

jmcIn that previous comment, you sketched two similar statements (about “prime gaps” on the one hand, and “bounded Goldbach” on the other) , and there you said that *at least one* of them is true (logical OR). Now you say it is a dichotomy (*either* one *or* the other, exclusive OR). Where does this difference come from?

And do I understand you correctly that this implies that the Twin Prime Conjecture and the Goldbach Conjecture cannot be both true?

9 January, 2014 at 9:48 am

Terence TaoSorry, I misspoke; it should be “disjunction” rather than “dichotomy”.

9 January, 2014 at 10:26 am

Terence TaoAnother variant on the second remark (this one, hopefully, more correct than the previous variant): on GEH, either the twin prime conjecture is true, or every multiple N of 6 lies within two of a Polignac number (a number that is the difference of two consecutive primes infinitely often). This comes from consideration of the admissible tuple n, n+2, n+N, n+N+2. (To force the two primes to be consecutive, one needs some additional arguments of Pintz in the paper linked above, first to force all four members of the tuple to be almost prime (by counting in the Selberg sieve the contribution for a given member to not be almost prime), and then to use an additional upper bound sieve to eliminate the contribution of those n for which there is an additional prime between two elements of the tuple.)

If we end up achieving k=3 rather than k=4, one would imagine that even more results of this type will become available.

9 January, 2014 at 11:37 am

Aubrey de GreyTerry: does the need to go beyond von Mangoldt have any impact on the anticipated difficulty of adapting MPZ to the current context?

9 January, 2014 at 12:06 pm

Terence TaoGood question! I don’t foresee any difficulties extending the MPZ results from von Mangoldt to the specific convolutions (basically, sums of the form ) needed to generate the various prisms in (4) above; the Type I/II sums should already suffice here, and in fact there might be a little bit of gain over the von Mangoldt numerology due to the smoothness of one of the convolution factors. But actually it is all probably moot, because we get “for free” the simplex in (4) (coming from the fact that the constant function 1 has level of distribution 1 for trivial reasons) and unconditionally, we have to keep close to 1/2 (at best we can do 1/2+13/540 currently) and this should give us plenty of room for the support of F. (But even with F restricted to , we should be able to beat the current value k=54 of k with the MPZ estimates we have, once we have some idea of how to numerically handle the truncation.)

9 January, 2014 at 12:42 pm

Eytan PaldiSince by theorem 1 above, in order to have we may relax GEH to for any “good” satisfying it would be interesting to get (as small as possible) such “good” .

Moreover, If for some such “good” , is implied by EH than we have under EH.

9 January, 2014 at 5:54 pm

Terence TaoYes, our criteria are open, so any result that uses actually uses for some absolute constant , and similarly for GEH. (In the original Goldston-Pintz-Yildirim paper, they observe that the bound on EH actually just needs , see page 12). So there is some residual value in pushing the various M_k type values beyond 2, as any increase will presumably decrease the amount of EH one needs. (Also, accurate values of M_k may help with trying to extrapolate to medium-sized k.)

It is conceivable that EH could imply GEH for the type of sequences we actually use – they are something like convolutions of truncated Mobius functions and 1. It looks a bit unlikely though; the best I could reasonably expect is that one could replace GEH by an EH for a family of von Mangoldt-like functions (e.g. functions whose Dirichlet series has some nice properties). Given that GEH is already stated in the literature, it may be simplest just to use it as is, and not try too hard to replace it with other unproven conjectures.

9 January, 2014 at 9:04 am

Eytan PaldiIt seems that in theorem 1 above, should be modified. (with similar modifications in the list of records.)

[Good point – I’ve changed the text accordingly.]9 January, 2014 at 2:38 pm

Eytan PaldiThe timeline of bounds should also be modified to include the GEH.

[Corrected, thanks – T.]9 January, 2014 at 10:04 am

Polymath 8 – a Success! | Combinatorics and more[…] are infinitely many primes of bounded gap below 16. Maynard improved it to 12. Polymath8b have just improved it further to 8. (The unconditional bound for gaps stands now on […]

9 January, 2014 at 2:22 pm

Aubrey de GreyI may be missing something obvious here, but in the world records table I’m not seeing why M_k,epsilon has an exact value of 2 for k=2. How does the lower bound of 2 arise here when it doesn’t for the original M_2? – and what is the optimum epsilon for this?

9 January, 2014 at 3:02 pm

Terence TaoAh, good point; in truth the only lower bound for computed so far is the value 1.385.. for M_2. Presumably one can do better than this, but one should not be able to get all the way to 2.

9 January, 2014 at 4:52 pm

Aubrey de GreyThanks. Eytan posted something about M_2,epsilon (http://terrytao.wordpress.com/2013/12/20/polymath8b-iv-enlarging-the-sieve-support-more-efficient-numerics-and-explicit-asymptotics/#comment-262267) that I don’t think received any followup yet – maybe that’s a place to start?

10 January, 2014 at 2:50 pm

Eytan PaldiIn theorem 1 above, I suggest to include also the case .

[Done – T.]10 January, 2014 at 3:01 pm

Terence TaoFor purposes of large k asymptotics, I now have the upper bound

so one does not get a substantial gain in the large k regime from the epsilon flexibility.

To see this, we define the weight to be when , and to be when , and similarly define by cyclic permutations. Observe that the are nonnegative and

on the simplex . Thus, for F on this simplex we have by Cauchy-Schwarz

If , then setting

since . Integrating we get

and summing over cyclic permutations one gets the claim.

10 January, 2014 at 5:11 pm

Eytan PaldiBy using , in order to satisfy the marginals vanishing constraints, we may take any in the interval . Since the upper bound for is quite tight, the additive gain is lower bounded by about , and upper bounded by about .

For , this additive gain is in . This implies (approximately) the reduction of the unconditional current by a factor between 2 and 4. (i.e. unconditional !)

(but perhaps this is too optimistic …)

10 January, 2014 at 5:58 pm

Eytan PaldiThe above analysis is indeed optimistic because the vanishing marginals constraints should reduce – making its upper bound less tight than the bound for .

11 January, 2014 at 5:26 am

Eytan PaldiBy taking (i.e. making dependent on for any in ), the vanishing marginals condition is automatically satisfied with the same conclusion (not necessarily optimistic) as above on the possible reduction of unconditional .

11 January, 2014 at 8:18 am

Aubrey de GreyEytan: isn’t there still also the constraint that epsilon must not exceed 1/(k-1)?

11 January, 2014 at 9:07 am

Terence TaoThis constraint is only needed in the EH case . In general, one can take advantage of the additional region at the end of (4), which makes the polytope admissible for any .

11 January, 2014 at 9:14 am

Eytan PaldiAccording to theorem 1 above, it is only restricted to be in (the earlier restriction to is for the truncated simplex – the last example above – in order to satisfy the vanishing marginals condition for that particular choice of .)

12 January, 2014 at 4:27 pm

Eytan PaldiPerhaps an explicit lower bound on may be derived from this upper bound (as done for the case). Such a bound may be used to improve the current values of for .

13 January, 2014 at 3:15 am

Eytan PaldiIf the optimal is (for large k) proportional to for , then the additive gain of the upper bound on is upper bounded by about – implying reduction of the unconditional current by a factor of about at most.

(This gain due to the flexibility, is still without using the vanishing marginals flexibility,)

10 January, 2014 at 3:30 pm

Eytan PaldiThe definition of above should be for a closed simplex.

11 January, 2014 at 9:15 am

Terence TaoI’ve been somewhat inconsistent on this. For the purposes of optimising M_k etc., it does not matter whether the simplex is closed or open; but for some of the compactness arguments in the previous post, it is convenient to assume that the function F is compactly supported inside the open simplex (i.e. it vanishes near the boundary of the simplex). Of course, even if F doesn’t vanish at the simplex, one can perturb it infinitesimally to vanish near the simplex (and to make it smooth) without significantly affecting the Rayleigh quotient, so this is only a minor technicality.

11 January, 2014 at 9:31 am

Eytan PaldiI made this comment because in the page on the variational problem, was defined as the closed simplex. Moreover, since is needed only to be in , its vanishing on the boundary is not really needed.

11 January, 2014 at 10:24 am

Pace NielsenFollowing Aubrey’s recommendation, I computed the bounds we get using the simplex where no marginal conditions need to be considered. Using degree 11 polynomials and taking , we get . To make the obvious comparison, taking degree 11 we have without playing the epsilon trick. This is still a very good increase, although not nearly as nice as when we were considering the gain that the epsilon trick gave over the original simplex .

11 January, 2014 at 10:30 am

Aubrey de GreyMany thanks Pace. Is that the best epsilon, though? – previously for R you used 0.18 for k=4 but 0.243 for k=3. Is it just coincidence that the best value for R’_3 is the same as for R_4 ?

11 January, 2014 at 10:56 am

Pace NielsenIt is approximately the best value of for the new simplex . (The closeness of the two epsilon values is accidental I think.)

11 January, 2014 at 11:11 am

Eytan PaldiSince this polytope is contained in , it seems better (and seemingly easier) to take the enlarged simplex and optimize over .

11 January, 2014 at 11:27 am

Pace NielsenOptimizing over is indeed easier (and this was the computation I did previously for ), but it is not better. Indeed, it is exactly because the new computation is harder that we get better results. The new simplex contains for any fixed and thus yields better numerics.

11 January, 2014 at 12:10 pm

Eytan PaldiI can see now that the three constraints are added especially to get rid of the vanishing marginals condition! Can you please give some data (as function of the maximal degree) of (for optimized ) ? (it may help estimating its limit.)

11 January, 2014 at 12:26 pm

Aubrey de GreyIt would also be interesting to know whether the optimal epsilon varies with the maximal degree.

Additionally, in Terry’s original description of the epsilon idea there were k distinct epsilons, but so far all computations have assumed just one. Might it be worth playing around a bit with (for example) making all but one of the epsilons zero, bearing in mind Terry’s observation about M”_3 collapsing to M”_2 ?

11 January, 2014 at 1:43 pm

Eytan PaldiThere is no reason for the optimal epsilon to remain fixed. For unsymmetrical regions it seems that several epsilons should give better results.

11 January, 2014 at 11:28 am

Pace NielsenI should also point out that not .

11 January, 2014 at 11:48 am

Eytan PaldiThanks for the correction! (in addition, the simplex should be truncated by the unit cube).

12 January, 2014 at 9:09 am

Terence TaoI’m recording a Fourier-analytic way to think about the “epsilon” trick allowing us to push the support of F out to times the simplex, at the cost of only getting a lower bound coming from times the simplex.

For simplicity I’ll deal with “denominator” sums rather than “numerator” sums , even though it is the latter for which the epsilon trick is actually useful for our application. (This gives us an additional factor of to play with, because 1 has better distribution properties than the von Mangoldt function. As mentioned previously, the Selberg sieve can be written Fourier-analytically as

for some coefficients , where is a Ramanujan sum. From a Fourier perspective, is supported on the fractions with , so each summand is supported in those reduced fractions with denominator .

In the baseline approach, we require to be supported in , which makes less than . As such, different reduced fractions with different denominators are separated by at least , which allows for the Plancherel theorem (or the large sieve inequality) to get good L^2 control on the sum of the , i.e. to control . (This connection between the Selberg sieve and the large sieve is well known, see e.g. the recent book by Friedlander and Iwaniec.)

Now suppose we enlarge the support of to , then the products can now be as large as . This means that two different fractions can be as close as , which is bad for the large sieve inequality. However, observe that if one of the fractions, say a/q, has denominator less than , then it still stays a distance at least from the other fractions in this Farey sequence. (Incidentally, this inhomogeneity in the Farey sequence is a crucial ingredient in the Montgomery-Vaughan large sieve inequalities that appeared tangentially in Polymath8a.) Because of this, if one focuses on just the “major arc” contributions to the L^2 norm that come from the neighbourhood of fractions of denominator at most , the large sieve inequality still works efficiently and gives a lower bound on the full L^2 norm that comes from square-integrating F on .

12 January, 2014 at 10:05 am

Eytan PaldiDoes it means that can be supported on

without satisfying the inclusion condition (4)?

12 January, 2014 at 10:25 am

Terence TaoUnfortunately, this argument only gives lower bounds rather than upper bounds, which makes it suitable for bounding the numerator of (1), but not the denominator. I have been trying to see if one can also get good upper bounds in the regime but it looks very difficult, one has to somehow extract subtle cancellations from the minor arcs. For k=2 it may be that automorphic methods can do something, but for k>2 it looks very daunting.

12 January, 2014 at 11:17 am

Terence TaoThere is a cheap scaling trick to convert any good cutoff function F for the M_k problem to a cutoff for the , just by scaling it by . The upshot is that

for any symmetric F supported on the original simplex . It seems that one should be able to compute the integral as a polynomial function of r without much difficulty for a given polynomial F, so that one can the numerically optimise in for a given F. So one could optimise for M_k to find a good candidate F, and then apply this procedure to see if one gets any additional improvement in epsilon.

12 January, 2014 at 12:15 pm

Eytan PaldiA possible difficulty is that for the problem, is supported on a truncated simplex (so the marginals are, in general, only piecewise polynomials.)

12 January, 2014 at 12:18 pm

Terence TaoOops, this is indeed an issue, if one is dealing with (the more interesting quantity) rather than . One can fix it by truncating F to , but this makes the denominator and numerator significantly more complicated, unfortunately.

12 January, 2014 at 12:52 pm

Aubrey de GreyWell… is really “more interesting” any more? Arguably, the only outstanding question in the setting is whether can surpass 2, and after Pace’s recent result for it’s looking awfully likely that it will, even without any additional ideas. (I guess there is also the question of whether k can be reduced to 4 or 3 without having to resort to GEH rather than EH, but I’m not sure that any of the current directions are leading that way anyway.) Accordingly I would argue that we have now come to a point where the unconditional setting is more of a focus than the [G]EH setting – especially since the k predicted to be achievable from current ideas in the unconditional setting seems to be in the teens or twenties, which may be approaching the regime in which the computational methods currently being applied to k=3 or 4 are conceivably applicable. Thus, the obvious question arises: can this new trick work for M”_k, or is it limited to M_k ? If it can work for M”_k, it seems that it could be a great short-cut to determining good values for for low two-digit k.

12 January, 2014 at 3:14 pm

Terence TaoFair enough. As a first data point, I applied the above rescaling to the k=2 explicit optimiser for M_2, to get a lower bound of 1.690608 for with .

M2 := evalf(1/(1-LambertW(exp(-1))));

f := 1/(M2 - 1+ x) + log( (M2-x)/(M2-1+x) ) / (2*M2 - 1);

denominator := evalf(int( f^2, x=0..1));

numerator := M2 * (1+eps) * int( f^2, x=0..(1-eps)/(1+eps) );

ratio := numerator/denominator;

evalf(subs(eps=0.36,ratio));

12 January, 2014 at 3:24 pm

Aubrey de GreyNice!

Many thanks also for the new columns in the Selberg wiki page table (which is getting a bit wide; maybe it could be reformatted to show upper bounds below the corresponding lower bounds, rather than beside them?).

12 January, 2014 at 3:31 pm

Aubrey de GreyConjecture: the best epsilon for M_k is 0.72/k, and the best for M’_k is 0.72/(k+1). I have no prediction in relation to M” !

12 January, 2014 at 6:22 pm

Aubrey de GreyI should have noted that this conjectured relation is independent of theta. Perhaps four data points (namely [k.theta] = [3,1], [4,1], [2,1/2] for M and [3,1] for M’) is too few to make such a stretch, but, well, maybe not :-)

12 January, 2014 at 7:33 pm

Terence TaoSome back of the envelope calculations suggest that asymptotically the best value of epsilon is proportional to , basically because if one considers the truncated tensor product functions F that show up in the asymptotic analysis, then they are concentrated at a distance about from the boundary of the simplex (this is the quantity in the asymptotic analysis, and is also comparable to ), and so it becomes highly counterproductive to shrink the simplex beyond that distance. So unfortunately, one does not get to significantly improve the current bound with the epsilon trick in the large k regime (indeed, this calculation suggests that the value of k only improves by an unimpressive O(1) for a fixed value of m).

I also see now why there is

somegain to be had by increasing epsilon away from zero, even if this gain decays fairly rapidly in k; the marginal function vanishes to first order as one approaches the boundary of the simplex , simply because the length of interval that one is integrating on shrinks to zero. (I am implicitly assuming here that F is bounded, but from Cauchy-Schwarz one can see a similar vanishing in an L^2 sense if F is just assumed to be square-integrable.) Because of this, the cost of retracting the integrals by an epsilon is only cubic in epsilon (if F is bounded, or slightly better than linear if F is just assumed square-integrable). So the gain coming from dilating by outweighs the loss in shrinking the simplex by if is small enough.12 January, 2014 at 8:05 pm

Aubrey de GreyDoes this take account of the differences between M, M’, M” ? I notice that k+1/k – namely, the ratio between the only so-far-computed pair of best epsilons for M_k and M’_k for the same theta – is precisely the ratio of volume between R’_k and R_k for that k (namely k=3). Going out on a major limb I’m predicting on this basis that the best epsilon for M”_k (with theta=1/2, and with R being the dilated simplex with edge 2) will be . I’m not clear as to what is entailed in determining the optimal epsilon in a given context – maybe it is easy enough that a prediction is not useful – but I’m putting it out there in case it is.

12 January, 2014 at 8:19 pm

Aubrey de GreyBah – apologies – I obviously misstated those volumes. The prediction of the best epsilon for based on volume of R and extrapolating from M and M’ should be .

13 January, 2014 at 8:31 am

Aubrey de GreyActually, a little more thought results in the much more optimistic suspicion that the best epsilon for (with R’’ being the dilated simplex of edge 2) converges to half that of as k increases – in other words, to 0.36/k based on existing data points (though I certainly don’t exclude the possibility that there is a factor as Terry suggests). This is based on realising that the ratio of volumes is actually , not , and running with the same idea as above.

12 January, 2014 at 1:42 pm

Eytan PaldiI think that there is a possibility that the optimal for (or any other truncated simplex on which is supported) or its marginals can be non-smooth (or even discontinuous) on certain hyperplanes (determined by the truncating constraints) which may result by slow convergence of - polynomial approximation to the optimal .

A possible fix is to use piecewise polynomial approximation to the optimal (where the polytope partition for the piecewise polynomial approximation is determined so that and its marginals should be polynomials on each predetermined subdomain.)

12 January, 2014 at 2:26 pm

Eytan PaldiThe above suggestion for piecewise polynomial approximation was given only to solve a possible problem of slow convergence (wrt polynomial degree) of due to possible non-smoothness of the optimal .

This is without vanishing marginal constraints (which obviously may added as needed.)

13 January, 2014 at 9:45 am

Eytan PaldiIn the wiki page on the variational problem, and should also be defined.

[Done – T.]13 January, 2014 at 10:37 am

Eytan PaldiIt seems that as computed by Pace is for supported on a somewhat larger domain (i.e. ) than the one appearing in the wiki page.

[Smaller, actually; there was an additional restriction to which I forgot to put on the wiki, but is now there (though for it has no effect). -T.]13 January, 2014 at 11:37 am

Eytan PaldiIn this definition, is it possible to relax the factor to the maximal (i.e. )?

[Yes; I’ve adjusted the definition of accordingly. -T.]13 January, 2014 at 12:35 pm

AnonymousTypographical comment: For “GEH”, “MPZ”, ect., the letters should be upright. In math mode this means typing it as, say “”.

P.S. This project is really interesting to follow as a non-mathematician.

13 January, 2014 at 3:25 pm

Pace NielsenI’m nearly ready to write the code to compute the M_3 value over the half-cube cut out by the plane . I’ve created a file with some pretty pictures which helped me write down the necessary components of the calculation. The basic picture came from Terry’s decomposition that he found using his son’s toys. (I need to get my own sons similar toys!) You can find the file at: http://www.math.byu.edu/~pace/HalfCube.pdf

I wanted to make this available so as to get a double-check on all of the integral formulas. (I’m also planning to double-check these decompositions tonight. I keep finding minor errors, so there is no guarantee it is correct yet.)

13 January, 2014 at 4:29 pm

Terence TaoNice pictures! I’ll double-check the computations later today. One simple checksum is to make sure that I(1) is equal to 1/2, and more generally that I(F) is independent of epsilon if F is a polynomial independent of epsilon.

13 January, 2014 at 4:29 pm

Pace NielsenFirst error I found:

The second marginal condition should instead read:

13 January, 2014 at 4:35 pm

Pace NielsenSilly error: in the I-integral section I left some squares off integrands.

13 January, 2014 at 7:35 pm

Terence TaoEverything else seems to check out – it’s remarkably complicated, though!

This partition should technically exhaust a dense class of F when the degree is large enough, but convergence may be slow. One reason to suspect this is that variational arguments (using Lagrange multipliers) tell us that the optimal F should be of the form where is supported on the weirdly disconnected region . This suggests jump discontinuities when . The discontinuities associated to are already accounted for by the partition, but the discontinuities associated to are not, which does not bode well for polynomial approximation.

On the other hand, we only need to scrape together a gain of less than 0.05, so we may not need further subdivision. If that becomes necessary, it may be worthwhile to collapse to a (messy) two-dimensional problem, and write everything in terms of G instead of F. One side benefit of this is that the domain partitioning will be easier to visualise.

13 January, 2014 at 8:10 pm

Eytan PaldiI think that such slow convergence must be sublinear (e.g. like a negative power of the maximal degree). Fortunately, we don’t need to get very close to the limit (unless it is very close to 2.)

14 January, 2014 at 6:01 am

Eytan Paldi3D Jump discontinuity on a plane may lead to an even slower convergence than jump discontinuity at a point in the 1D case. (e.g. the M-value approximation error may decay as a negative fractional power of the polynomial degree !)

13 January, 2014 at 4:23 pm

Eytan PaldiPace: Nice picture! I have two questions:

1. Do you intend to compute (not )?

2. Why do you take in (instead of )?

13 January, 2014 at 4:27 pm

Pace Nielsen1. I intend to compute the M value we get using , using the epsilon expansion trick, on the stated half-cube. I don’t know if is the correct notation or not.

2. The decomposition in the file slightly changes in the regime . If we need a larger , I can work out the change in decomposition, but I doubt that we will need to do this.

13 January, 2014 at 4:41 pm

Eytan PaldiThe definition of (including the range of is in the wiki page on the variational problem (by now we have many types of M values!)

13 January, 2014 at 5:35 pm

Eytan PaldiIt seems that the optimal polynomials on the various polytope pieces are not necessarily symmetric (I don’t see any reason for that), so by removing (if needed) the (simplifying) symmetric polynomial constraint, we have the potential to make further improvement to the M-value.

Note also that the optimal should increase because of the added flexibility from vanishing marginals (implying another improvement by increasing the range of to if needed.)

13 January, 2014 at 7:14 pm

Pace NielsenAll of the symmetry follows by an averaging argument (see, for example, the argument given on page 17 of Maynard’s preprint).

14 January, 2014 at 5:20 am

Eytan PaldiI agree that the symmetry follows from this argument (also with the presence of the additional constraints) and that the optimal can be represented as a symmetrical sum of functions of two variables.

Note, however, that all such conclusions are dependent on the (seemingly plausible, yet unproved!) existence of an optimal solution in L^2 for the (constrained) variational problem.

14 January, 2014 at 8:24 am

Terence TaoActually I found an argument that shows that symmetric functions suffice even without the existence of an extremiser. Let’s work in the original simplex for simplicity, thus is the supremum of for non-trivial F. In particular, the quadratic form

is positive semi-definite, and is thus associated to a positive semi-definite inner product and norm. If is an extremising sequence for , normalised so that , then goes to zero in this semi-definite norm, as does any reflection of caused by permuting the coordinates. Also note that by replacing with if necessary, one can take to be non-negative. Applying the triangle inequality for semi-definite inner products, we conclude that the symmetrisation of also goes to zero, while staying bounded away from zero in L^2 by non-negativity. So the symmetrisations are also an extremising sequence, and so it suffices to extremise over symmetric functions.

The situation is a little trickier with vanishing marginals because we can’t force non-negativity, but I think the argument shows that the only competitor to a symmetric extremising sequence would be an antisymmetric extremising sequence (corresponding to some other irreducible representation of the permutation group) which looks very unlikely, but I don’t see how to formally exclude it.

14 January, 2014 at 8:54 am

Eytan PaldiNice argument ! (obviously it depends on the symmetry of the domain and the constraints).

14 January, 2014 at 1:24 pm

Eytan PaldiIt should also be remarked that the supremum is not attained (which is the interesting case!) If and only if the above quadratic form is positive definite.

14 January, 2014 at 10:26 am

Terence TaoCalculus of variations (and Lagrange multipliers) tell us that any symmetric extremiser F to a three-dimensional problem such as M_3, M’_3, M”_3, , etc. must have the form . In terms of polynomials, this means that the only monomials one expects F to have are those in which at least one of a,b,c is zero. So for degree D, the number of monomials in play should be O(D^2) rather than O(D^3), which may end up being a significant speed boost that allows us to explore higher degrees (at the cost of some accuracy, because if one caps the degree there might be a little bit of need to use the monomials that don’t show up in the true extremiser).

It might be worth testing the idea of restricting to monomials of just two of the three variables in the M_3 and M’_3 problems (and maybe M”_3, if the code is already available for easy modification) to see if any significant improvements in performance are obtained by doing this.

14 January, 2014 at 10:34 am

Aubrey de GreyGreat idea! Does this type of argument also work for larger k? If so, it could be a big help in finding the k for which reaches 4, even if it turns out not to be needed to make reach 2.

14 January, 2014 at 11:36 am

Eytan PaldiI also think that this is a good idea to select only monomials with the right properties. Moreover (following your previous suggestion) is it possible to iterate this idea (by inserting the new structured solution in Euler-Lagrange equation) to infer a greater structure for the solution (and consequently a smaller class of allowed monomials)?

14 January, 2014 at 1:32 pm

Terence TaoJames and I discussed this a bit in the first Polymath thread. Unfortunately things get very messy and there isn’t an obvious dimensional reduction.

Let’s work on the M_3 problem for simplicity, so one is looking at the eigenfunction equation

(1)

where

(2)

is the marginal distribution. This writes the three-variable function F in terms of the two-variable function G. If we substitute in this ansatz into (2), we get

but this doesn’t seem to allow one to easily express G in terms of one-variable functions.

14 January, 2014 at 10:27 am

Aubrey de GreyPace, let me add my praise for this awesome document.

Speaking of symmetry of F, am I right in saying that it is still open whether a symmetric R” is optimal? I’m presuming that you went with the truncated simplex rather than the prism because the simplifications afforded by symmetry outweigh the need to use 13 components rather than 7, but if the truncated simplex fails to make it to M”=2 then maybe it’s worth also looking at the prism? Or is there another reason not to – for example, does the seven-component dissection of the prism no longer work when epsilon is introdced?

14 January, 2014 at 10:33 am

Terence TaoWell, once Pace’s code is running one can test the epsilon=0 case and see whether M”_3 for the symmetric polytope is any better or worse than M”_3 for the prism (which James computed to be about 1.914); this may give some indication of whether it is better to switch to the prism. (In that case, it may also make sense to have at least two independent epsilons rather than one, given the reduced symmetry of the prism.)

14 January, 2014 at 10:47 am

Aubrey de GreyRight. Indeed, if it’s different (either better or worse), and if this isn’t all rendered moot by success with the truncated simplex, there’s probably a case for exploring other half-cubes so as to ensure that the optimal one is being used. I guess the obvious third one that is “as different as possible” from either the prism or the truncated simplex would be the one bounded by the plane x+y+2z = 2.

14 January, 2014 at 1:56 pm

Pace NielsenIt took a while, but I got the code working. Contrary to my hopes, the symmetric half-cube apparently gives worse numerics than the prism. For and taking all polynomial pieces of degree 4, we only get .

The calculation gets slow very quickly as the total degree increases. For example, with degree 5 it takes 35 minutes to show . It appears that the optimal epsilon is close to 1/6. So we could probably get better numbers going a little past this epsilon, using a slightly different decomposition. However, since the values we are getting are so far away from 2 (indeed, they are barely better than the bounds for ) I don’t think it is worth the effort. We probably want to focus on a different polytope.

14 January, 2014 at 2:15 pm

Terence TaoWell, that’s a bit disappointing, but I guess we just have to roll with it.

So the next thing to try would be the prism. Here, F would be supported on with marginal supported on and supported on for some . (There is no point enlarging the simplex for the third marginal since it is already constrained to .) Our task is now to maximise the ratio

One nice thing here is that in the limit , , the ratio approaches 2, as can be seen by taking F to be the indicator function of the box . So we are guaranteed to hit at least 2, and we only need an epsilon more than this! (But it’s not that simple; I think this limit is a local maximum, and we need a separate local maximum to beat this local maximum. Still, it’s a good sign…)

Actually, a simpler intermediate thing to calculate would be , in which F is supported in but the numerator is as before. This is also guaranteed to be at least 2 by the same example as above.

Note that we’ve broken all the symmetry here, and so will have to work with all the monomials, which may cost us a bit performancewise. On the other hand, the number of polytopes involved in the decomposition may be a bit smaller than in the symmetric case.

14 January, 2014 at 2:28 pm

Pace NielsenI’ll do this next. Might be in a day or two, given that I have to write my lecture for tomorrow, etc…

14 January, 2014 at 2:23 pm

Pace NielsenSped up the code a bit. With degree 6, we get , and it takes about 10 minutes.

Given the recent comments, I also looked at what happens when restricting to monomials where only two variables occur. It appears that there is very little loss. For instance, doing degree 6 with this restriction we get , and it takes about 3 minutes.

14 January, 2014 at 3:20 pm

Eytan PaldiThe very small improvement from degree 5 to degree 6 suggests that either

1. Even for these small degrees, the lower bound is very close to the limit.

or

2. The convergence is very slow (e.g. sub-linear) because of possible discontinuities of the optimal (or some of its small order derivatives) on certain planes.

It seems to me that the second possibility is the right one (so the limit can be larger than 2 but one needs a very large degree to come close to it!), but of course, given the M-values as a function of the degree, it is not difficult to verify the convergence type (i.e. linear with some stable rate, or sub-linear).

14 January, 2014 at 4:06 pm

Pace NielsenIf it is helpful at all, I get 1.95866 for degree 7 and 1.958774 for degree 8 (using the restriction I mentioned to monomials with only two variables). That is as far as I plan to compute on this simplex.

14 January, 2014 at 4:51 pm

Eytan PaldiAssuming linear convergence, the last two estimated rates (using the divided differences method) are about 0.98 and 0.093 (far from being stable!) which seems to rule out the linear convergence type, indicating that we have the problem of sub-linear convergence! (probably due to discontinuities of the optimal ). A possible fix (as previously remarked) is to refine the polytope decomposition such that the discontinuities should appear only on the boundaries of the polytopes (this may be simplified to polygonal partition by using the two variables representation of ).

14 January, 2014 at 5:03 pm

Aubrey de GreyWell, but the difference between d=5 and 6 is almost the same as between d=6 and 7… I’m not optimistic. (Degree 8 is so much closer to 7 than 7 is to 6 or 6 to 5 that I’m actually tempted to ask whether there’s a typo for d=8 and it should be 1.959774.) What’s worse is that the value for epsilon=0, d=4 is not all that much smaller for the truncated simplex than for the prism, suggesting that the prism won’t get us beyond about 1.97. The main remaining sliver of hope seems to be that allowing two epsilons would improve things, but it’s looking pretty forlorn to me.

14 January, 2014 at 5:33 pm

Eytan PaldiIt seems that the only way to retain linear convergence (with stable rate) is to identify the potential (planar) sets of discontinuities of the optimal and refine the partition accordingly (to keep all discontinuities on the boundaries of the polytopes).

15 January, 2014 at 6:11 am

Eytan PaldiIt is important to observe that the very slow convergence of M” lower bounds for degrees with FIXED still has unstable(!) rate (which is uncharacteristic of sub-linear convergence whose rate should tend to 1) – indicating that these degrees are still in a transitional interval (well below the final stage of convergence with stable rates.)

In addition to possible discontinuities of the optimal F in the interior of some polytopes, this strange behavior may be explained by

1. The optimal should be optimized for each degree.

2. More importantly, note that the optimal F can be approximated using a restricted class of monomials of order for each degree , it means that (effectively) there are only degrees of freedom with (vanishing marginals) constraints in the optimization program. The number of constraints may be reduced to

by adapting the constraints to the particular class of allowed monomials. It seems (since the number of polytopes is relatively large) that only for sufficiently large degree , the remaining effective number of degrees of freedom (i.e. after subtracting the number of constraints) should give a regular (i.e. stable rate) convergence behavior.

3. Implementation errors.

15 January, 2014 at 8:45 am

James MaynardI thought its worth mentioning that (at least on the assumption my code was working as intended) when I looked at the 4-variable prism I also encountered rather erratic convergence. The convergence was good on , but not on the polytopes outside of it with the vanishing conditions. For very small degrees (say 4 or less) the vanishing marginals gave very few degrees of freedom, and so there wasn’t much benefit. Even for higher degree (I think I went up to degree 9) the convergence didn’t seem regular enough to predict what it might be converging to.

15 January, 2014 at 8:29 am

Pace NielsenI performed the computation for on the simplex .

As expected, when the value is going to 2 pretty quickly. For degrees 1 to 6, we get the values 1.7549, 1.8540, 1.9029, 1.9308, 1.9481, 1.9597.

For degree 1, to within jumps of 1/100 there is a local maximum when , which gives the value 1.90704. Similarly for degree 2, the local maximum is when or . I wasn’t sure if this meant that the local maximum had split in two, or whether my partition was not fine enough. So taking jumps of 1/1000, we get that the local max is at , giving a value of 1.910399.

By degree 4, this position starts losing ground to the “corner” epsilon values, . I’ll keep running the calculations and keep you posted if I notice anything new.

15 January, 2014 at 8:38 am

Pace NielsenI should mention that this computation is in the domain where .

15 January, 2014 at 9:22 am

Eytan PaldiAt least we have a slightly better record (even without the vanishing marginals flexibility!)

15 January, 2014 at 9:48 am

Terence TaoThanks for this, Pace! It’s good to see various intuitions about the epsilon behaviour confirmed, namely the local maxima at (1,0), (0,1) (which, unfortunately, I don’t think we can directly use to try to beat 2), and the competing maxima in which the epsilons are equal and comparable to 1/6. So perhaps for the M” calculation we can focus on the case (abandoning the corners (1,0), (0,1)), which allows one to take symmetric polynomials and thus perhaps speed up the calculations a little bit. (The Lagrange multiplier argument is also still in force, allowing one to restrict attention to monomials with one exponent vanishing.)

On the other hand, we are starting from here, which is further back than that we had in the fully symmetric case. This makes sense – we only have two of the three epsilons in play, so the improvement over had to be smaller by a third or so, which is indeed the case – but is still a little unnerving; we have to somehow make up for it with the rest of the prism, which we suspect from the M”_k evidence to be a little better than the extra space in the symmetric polytope (1.914 vs 1.902), but will it be enough? From the scant evidence we have so far, it looks like a bit of a long shot, but we should at least get a clearer picture of whether the prism or the symmetric polytope is superior, even if both may end up falling short of the goal. (Frustrating, though, that we only need a 2% improvement in our bounds to get to H_1 <= 6 on GEH!)

15 January, 2014 at 10:41 am

Aubrey de GreyHm – I think it might be premature to restrict to the single-epsilon case when considering the prism, since it may be that the local maximum for M’ has the epsilons equal only because M’ is symmetric. It’s good, though, that with M’ the local maximum is evident when d=1: since low-degree calculations are fast, maybe any promising local maxima with M” can be identified when d=1 or 2 with both epsilons ranging freely and then explored for higher degree in the context of performance-enhancing pre-set constraints on the epsilons.

Also, I’m not seeing why the epsilons need to sum to at most 1. Is there a clear reason? If there is, but if it can be determined that violating that constraint would be a win, I guess one potential way forward would be to explore whether Terry’s original concept that introduced epsilon can be enhanced to allow a larger sum-of-epsilons.

15 January, 2014 at 12:03 pm

Pace NielsenHere M’ is not symmetric (or, more precisely, it is only symmetric when .

The restriction to the sum being at most 1 was because in the other case we get a different decomposition of the prism. I can do that case as well, and probably will a bit later.

15 January, 2014 at 12:18 pm

Pace NielsenOr even more precisely, the polytope on which M’ is computed is not symmetric. Although, it is symmetric in the sense that switching the roles of the epsilons gives the same numerics. Did you have a candidate polytope without this property that you wanted me to try?

15 January, 2014 at 5:10 pm

Aubrey de GreyWell, I was thinking that since the base-case prism (with no epsilons) is asymmetric, the maximum M may involve unequal epsilons whose disparity in some sense “cancels out” the asymmetry of the base case. I’m not seeing that this is nullified by your point about switching the roles of the epsilons. As for other polytopes, I’m not sure – I think I need a better understanding of your reasoning on this point first.

However, actually I think having the sum-of-epsilons exceed 1 seems by far the most promising option as things stand – there seems no obvious reason (someone please say if there is one) why we would expect (1,0) to give better performance that (1,0.1) for example.

15 January, 2014 at 5:33 pm

Terence TaoThe prism has partial symmetry – it is symmetric with respect to swapping x and y, but not with respect to swapping z with either x or y. The x-y symmetry corresponds to swapping and , so we have the symmetry .

Also note that . Indeed, with , , and so we are now just maximising subject to being supported on . At this point, there is no interaction between different slices of F, and the problem decouples into two-dimensional ones. Indeed, if we define to be the slice of , then from Fubini’s theorem we have

and

and from Cauchy-Schwarz one has

giving the bound .

In the number-theoretic setting, when trying to get at least two of n,n+2,n+6 prime, setting forgoes any chance of forcing n to be prime, so one is left with the k=2 problem of getting n+2,n+6 prime, for which we know we cannot get more than one prime at a time due to the parity problem.

16 January, 2014 at 7:50 am

Aubrey de GreyMany thanks Terry – that does seem to robustly eliminate the option of a win near the corners. As for the partial symmetry, though, I still don’t see that the swappability of the epsilons tells us for sure that equal epsilons will always be optimal – I think a saddleback situation (where the best equal-epsilons value can be beaten by increasing one epsilon and decreasing the other) is hard to exclude for asymmetric (even if partially symmetric) base case (i.e. epsilon=0) choices of R”.

15 January, 2014 at 1:02 pm

Pace NielsenA little more data. For degrees 4, 5, 6, 7, the optimal values in the symmetric case when are (to within 1/1000) 0.183, 0.183, 0.189, 0.184. The corresponding M values are 1.916121, 1.916198, 1.917474, 1.917865. I do not know the reason for the jump when the degree is 6.

15 January, 2014 at 1:20 pm

Eytan PaldiThis is a relatively small jump (both in the epsilons and the M-values), but the convergence rate is still unstable.

15 January, 2014 at 10:58 am

Eytan PaldiIn the wiki page, the current lower bound for (prism) should be 2 (due to the corners local maxima.)

15 January, 2014 at 11:06 am

Terence TaoActually, that’s a lower bound for (prism) rather than (prism), because one has to set one of the epsilons equal to 1 rather than 0.

When one of the epsilons is really big, say , then one is getting almost no benefit at all from the corresponding marginal, in this case , as it decays quadratically as approaches 1. On the other hand, the constraint (for the marginal at least) should also cost something quadratic as compared to . My feeling is that the cost outweighs the benefit, which seems to be corroborated by Pace’s most recent numerics, and that one should focus on the epsilon values closer to 1/6, which is what we have been seeing in all other analyses.

15 January, 2014 at 11:07 am

Eytan PaldiIt seems that this lower bound in the wiki page is still without the several epsilons flexibility.

15 January, 2014 at 12:59 pm

Terence TaoFor k=2 one can get a lower bound of 2 without epsilons, but I don’t see how to do it for k=3, because when all three marginals are restricted to the simplex , one cannot create an example that is (an infinitesimally thickened version of) the indicator of the unit square , which is the 2D extremiser. One can do this though if two of the three epsilons are set to zero and the third is set to 1.

16 January, 2014 at 9:01 am

Terence TaoI just realised that there is a version of the epsilon trick that can be used to reduce the denominator in our key ratio (1) by an amount that is exponentially small in k – but for our k=3 problem, an exponentially small gain in k is more or less exactly what we need! The idea goes back to the observation of James back in Nov 20 that we should be able to subtract off the contribution to when all of the are composite. On Jan 2 I managed to make James’s idea quantitative in the BV case by multiplying by (which is a non-negative weight that equals one unless all of the n+h_i are composite), but in order to absorb this new factor I needed the extra room available in the BV case (basically, the ability to stretch the sieve into instead of ).

However, the epsilon trick (or more precisely, a variant of something that I had tried in Dec 14 that ended up not working, because I was only enforcing one compositeness condition rather than all k compositeness conditions . Namely, we will work with the sieve

where is the usual Selberg sieve

with (in the EH case) and f obeying the usual conditions (e.g. supported on the prism R with marginals supported on various simplices)

and

with obeying similar constraints to f but also obeying the compatibility condition when pokes outside of the polytope R.

Note that is still non-negative (basically because when ). Also, and agree unless all of the n+h_i are composite, so replacing one by the other has no effect on the numerator of (1), only the denominator. For the denominator we can take advantage of the epsilon trick: the first term in the formula for contains divisor sums that correspond to points outside of the prism, but they are exactly canceled by the corresponding divisor sums from the second term in the formula for . So one can still obtain an asymptotic for the denominator in (1). In the case , we have and we recover the old bound (for, say, , since this trick seems to be compatible with all our previous tricks), but we should be able to squeeze an additional gain (hopefully at least 2%!) by optimising each of the separately (as was done in the numerator epsilon trick from Jan 3.) I’ll try to work out the details for the next blog post (which we are due for in any case).

If this works out, it will be rather satisfying: by throwing “everything except the kitchen sink” at the EH problem (advanced numerics, moving from EH to GEH to enlarge the sieve support, polytope decomposition, and using the epsilon trick for both the numerator and denominator), to get an essentially optimal-for-the-method bound of . (There is still the very remote chance of getting by also throwing in the “kitchen sink” of bounding the probability that n,n+6 are both prime, but this looks like it would need a huge improvement on M_3 beyond 2 that we are not close to getting at all.)

16 January, 2014 at 9:49 am

Terence TaoAh, I got overly optimistic, and this idea has the same issue that my Dec 14 idea had – the Euler-Lagrange equation for optimising turns out to be a consequence of (or more precisely, an iterated difference of) the Euler-Lagrange equation for , so that the optimal choice for is the same as the optimal choice of and so no improvement is obtained in the end. Somehow the Selberg sieve is already optimised in these directions and there is no further fat to trim here.

Ah well, back to the drawing board…

16 January, 2014 at 9:55 am

Terence TaoHere is an idea that doesn’t improve the optimal value for , but may help approach that value faster. Right now, we are partitioning the big polytope R into pieces, consisting of a big piece R’ plus a lot of smaller pieces with various marginal properties, putting polynomials on each piece, and optimising a large quadratic program, which gets slow in higher degree. But somehow the R’ piece, which is easier to optimise over, is dominant, with the other polytopes only adding small additional contributions. So perhaps we could weight our algorithm towards the R’ piece, for instance by first doing a preliminary low-degree optimisation to obtain a first guess for the optimal F that is a low degree polynomial in all polytope components, and then performing a further optimisation by replacing the R’ piece by a high degree polynomial while keeping the other terms fixed. The point is that the R’ component has no vanishing marginal conditions and so has a cleaner and faster optimisation. As we expect some discontinuities within the R’ region, this may help accelerate the convergence.

A variant of this approach (which, now that I think about it, is probably superior) is to do a single optimisation, but with R’ allowed to have high degree while the other pieces have lower degree; again the point is that we don’t lose any degrees of freedom by increasing the degree on R’ because there are no vanishing marginal conditions there.

16 January, 2014 at 10:54 am

Pace NielsenI had thought about doing this earlier, but for some reason never gave it a try. My code already allows for this approach. Since the polytopes are already entered (and quadruple checked!) for the symmetric half-plane let me give you some numerics there. Throughout, I’m limiting the degree to 4 on all the non-R’ polytopes, and also assuming that every monomial involves only two variables. I let d denote the degree used on the R’ part. I find an epsilon which gives a higher value for M” than two surrounding epsilon values (so I’m not showing their are no other local maxs [in the given mesh], but that seems likely).

d=6, epsilon=0.160, M”=1.957334, single run 21 seconds

d=7, epsilon=0.158, M”=1.958612, single run 31 seconds

d=8, epsilon=0.157, M”=1.958656, single run 51 seconds

d=9, epsilon=0.162, M”=1.959200, single run 1 minute 20 seconds

d=10, epsilon=0.160, M”=1.959633, single run 3 minutes

My next goal is to run similar computations for the prism. I still have to work out the J-integral decompositions.

16 January, 2014 at 12:33 pm

Eytan PaldiPace: I suggest to check if degree 4 outside R’ is sufficient (i.e. enough degrees of freedom are left outside R’ after satisfying the marginal constraints).

16 January, 2014 at 11:15 am

Eytan PaldiAt least we know that without the vanishing marginals condition, the Euler-Lagrange equation implies that the optimal (which is proportional to ) should be absolutely continuous on , thus (using standard results of L^2 polynomial approximation theory) is it possible to estimate the rate of convergence (as a function of the polynomial degree) to the M-value?

16 January, 2014 at 7:37 pm

Pace NielsenI think I stumbled onto something we had missed. I computed on the prism and got 1.894793, which is below what happens on the symmetric half-cube, and also below James’ computation on the same prism. I believe that the discrepancy between these values may have arisen from the fact that I used a different decomposition of the prism than James did. I think he may have used the decomposition that Terry described earlier, while I used the same decomposition (with epsilons added) except that I combined the components into one component (since that reduces the number of integrals needed a little, but preserves the convexity of each component).

If this holds true, it would suggest that the symmetric half-cube is actually the right place to look for the best local maximum (which is what my intuition suggested initially). It also confirms what a few people have asserted; there is some gain to be had from breaking some of the convex components of the symmetric half-cube decomposition into smaller pieces, and optimizing polynomial approximations on those pieces. Especially if we take into account the calculus of variations (Lagrange multipliers) information.

I have to admit that these calculations are a lot of fun, but I may have to put further work off until early next week. If I have time then I’ll start testing this by breaking into different pieces and redoing the computations. There are two obvious options I see. First, simply break the simplex into the six pieces given from the symmetry. The second is to think in terms of the support of the G functions arising from Lagrange multipliers that Terry described above. These supports overlap, but it shouldn’t be hard to write down the integrals using the given support.

By the way, on the prism [with my decomposition] I get that with degree 4 approximations on all components and with the two epsilons equal, the optimal epsilon is 0.156 and we get , which is worse than what we find on the symmetric half-cube.

16 January, 2014 at 8:36 pm

Terence TaoInteresting! So an inefficient polytope partitioning costs us about 0.02 for for the prism; if we had been losing the same amount for the for the symmetric polytope, we could get up to 1.98 or so – about halfway to our goal (Zeno’s paradox, once again…). Then it could come down to the wire as we refine the degree and partition into even more pieces than before…

I guess this is all consistent with the strange convergence behaviour that we’ve been experiencing.

I’ve begun working out the two-dimensional formulation, writing everything in terms of the G(x,y) function, which is supported on the disjoint union of a triangle and the trapezoid . Here, it turns out that the vanishing marginal conditions can be reformulated as a formula for G on the trapezoid in terms of G on the triangle, so the only independent variable is the value of G on the triangle. The naive thing to do is to then set G to be polynomial (and symmetric) on the entire triangle, but actually it’s looking like there are some mild singularities, and so to get the most optimal results one should partition into quite a lot of pieces (a preliminary analysis suggests that there are singularities on the lines , and also ). This looks like a scary number of singularities, but in some sense that’s a good thing: the more singularities there are, the worse we expect our current numerics to be, and so the more hope there is that we can do better by using a better adapted partition!

I think I will use the next blog post to talk about the two-dimensional formulation.

17 January, 2014 at 5:44 am

Eytan PaldiIn the two-dimensional formulation, the continuity (but not necessarily smoothness) properties of the function G(x,y) should also be imposed by appropriate (minimal) system of additional constraints.

17 January, 2014 at 7:58 am

Terence TaoAfter looking at the eigenfunction equation for the (putative) extremiser, I think that F is continuous and piecewise smooth, with lack of differentiability at the interior faces coming from taking an edge on the boundary of (specifically, one of the three edges joining (1,0,0), (0,1,0), or (0,0,1) to (1/2,1/2,1/2)) and extending them in one of the coordinate directions, so the interior singularities should lie on the trapezoid with corners (1,0,0), (1/2,1/2,1/2), (0,0,0), (0,1/2,1/2) (plus the two reflections). I haven’t worked out exactly how this decomposes R’_3 though. On the two-dimensional side, this suggests that G(x,y) is continuous and piecewise smooth on the triangle , with a lack of differentiability on the bisecting line joining (0,0) with (1/2,1/2). I guess the continuity and singularity can be detected once we have located good near-extremisers. (Continuity means that we have some interior matching conditions which we should probably enforce when we work on the more difficult problem.)

17 January, 2014 at 5:20 am

Pace NielsenI woke up with two questions, one technical and one numerical.

Question 1: Suppose that we achieved good enough numerics to pull off the trick of passing from to . Would similar methods allow us to look at the admissible 3-tuple and allow us to conclude that there are infinitely many prime gaps of size exactly 6? I know that Maynard’s sieve treats all admissible -tuples the same, but I wasn’t sure if the same is true of the trick of forcing the endpoints not simultaneously prime.

This would seem to overcome the parity problem for the pair , so would seem to preclude the ability to pass from to .

Question 2: We easily got , using a linear polynomial on a trivial decomposition, so there is a lot of room for improvement. What sort of bound would we need to push this to in order to play the game of forcing not simultaneously prime?

17 January, 2014 at 7:51 am

Terence TaoOh, very nice observation! I agree that your Q1 argument shows that the parity problem blocks us from reaching by our methods.

For Q2, the heuristic I’ve been using is that the probability that a given member is prime should be , and that the probability that two of the are prime is bounded above by (the 2 being there for parity reasons). So if one starts with and deletes the three pairs which are separated by distance 16 or 24, we would get a prime pair separated by 8 if

which is in fact not attainable. The corresponding calculation for k=3 gives the condition

which requires , which is also blocked by parity.

17 January, 2014 at 8:28 am

Pace NielsenI like that heuristic! If I’m computing correctly, for the tuple , to get we would need something like . That seems quite doable.

17 January, 2014 at 1:16 pm

Aubrey de GreyI’m not all that optimistic that it is. Presumably the best comparison we have to work from is the d=1 values for the basic simplex R_4 with and without epsilon, which are 1.820 and 2.0023 differing by 0.28, so adding epsilon into the high-degree R” mix we may be lucky to get beyond 1.951 (James’s best number for the prism without epsilon), plus 0.28 (see above), plus 0.02 (the benefit of the symmetric half-cube over the prism – though for k=4 it’s also very possible that the prism will beat the symmetric polytope because in contrast to k=3 the prism is a factor of 9/7 larger), plus an unknown but probably modest quantity because James’s 1.951 was for the relatively small d=5.

17 January, 2014 at 11:01 am

Terence TaoHere is a more formal Selberg-style argument that we can’t get from a sieve-theoretic analysis of the tuple (n,n+2,n+6) (the same argument would work for other tuples as well). It relies on an unproven hypothesis (the Mobius pseudorandomness conjecture), but this conjecture is widely believed, and indeed one would not even be trying to use sieve theoretic methods to produce twin primes etc. if one did not already believe this conjecture!

Let us informally call a “divisor sum” some function of the form where are some coefficients and are restricted to not be too large (how large depends on what polytope we are using and what distribution hypothesis we would like to assume; I ignore these issues here). We normalise these sums so that is comparable to 1 (this is not our usual normalisation, being more of a “probability theoretic” normalisation, but never mind this). Our basic sieve-theoretic strategy is to obtain upper and lower bounds for quantities of the form

(*)

and hope to parlay this into a proof that at least one of the pairs (n,n+2), (n+2,n+6) are simultaneously prime, giving a proof of . (We are of course also interested in auxiliary quantities such as , but we can’t control these directly; the only way we can proceed within sieve theory is to upper bound this quantity by something like for some other divisor sum , so that it becomes of the form in (*). [Basically, even GEH only allows one to asymptotically control sums with at most one copy of the von Mangoldt function in them; asymptotics for sums such as , which basically count twin primes, are out of reach even on GEH, with the only option being to get upper or lower bounds by some combination of sums (*) for which asymptotics are available, together with the non-negativity properties of sieves.]

Now, the Mobius pseudorandomness conjectures predict that the sieve is asymptotically orthogonal to the Mobius functions , thus for instance . (This is because we expect the Mobius function to exhibit significant cancellation on even very short arithmetic progressions.) The same pseudorandomness heuristic also predicts is asymptotically orthogonal to products .

What about ? This function is no longer orthogonal to , since of course is when n is prime, but from Mobius pseudorandomness we still expect this function to be orthogonal to higher products , because the constraint that n is prime can kill at most one of the two or more Mobius factors that are present.

In particular, if one introduces the weight function

(ignoring the case when n+2 is not squarefree, which can be sieved out by the W-trick (or else we replace Mobius with the Liouville function)), we see that a and 1 are indistinguishable for the purposes of correlations (*), thus for instance

Also, observe that the weight sequence is non-negative; it equals 4 when has the opposite Mobius function to both and , and is zero otherwise. Because of this, any sieve-theoretic manipulation we perform on the sums (*) to produce lots of tuples with a desired property as weighted by the constant function , will also produce lots of tuples with the same property as weighted by the function . For instance, if we can get a lower bound for by some clever combination of asymptotics for (*) (and exploiting non-negativity), we should also be able to get the same lower bound for . In particular, we would find an n in the support of a(n) for which either n,n+2 are both prime, or n+2,n+6 are both prime. But this contradicts the support of a, which forces n+2 to have a Mobius function of opposing sign to both n and n+6. So we cannot use sieve theory to force the event “n,n+2 both prime or n+2,n+6 both prime”, which indicates that we cannot establish through a sieve-theoretic analysis of the tuple n,n+2,n+6.

EDIT: The intuition here is that sieve theory cannot prevent the two “conspiracies” and from simultaneously occurring for (almost) all n for which n, n+2, n+6 are almost prime. (Of course, formally two applications of shifted versions of would imply , contradicting what I just said, but the space of n for which n,n+2,n+6 are almost prime is not shift-invariant, and one cannot force incompatibility this way. Alternatively, one can consider conspiracies in which for “almost all sufficiently large n with , which I believe cannot be ruled out by any easy “local” argument, and would mean that prime gaps less than 6 occur extremely infrequently.)

17 January, 2014 at 2:56 pm

Polymath8b, VI: A low dimensional variational problem | What's new[…] for small values of (in particular ) or asymptotically as . The previous thread may be found here. The currently best known bounds on can be found at the wiki page (which has recently returned to […]

28 January, 2014 at 9:19 pm

Polymath8b, VII: Using the generalised Elliott-Halberstam hypothesis to enlarge the sieve support yet further | What's new[…] smooth compactly supported functions. We use this as a replacement for the denominator estimate in this previous blog post, we obtain the criteria described […]

8 February, 2014 at 5:19 pm

Eytan PaldiIn the numerator of (1), the outer summation index should be .

[Corrected, thanks – T.]