This is the fourth thread for the Polymath8b project to obtain new bounds for the quantity

either for small values of (in particular ) or asymptotically as . The previous thread may be found here. The currently best known bounds on are:

- (Maynard) Assuming the Elliott-Halberstam conjecture, .
- (Polymath8b, tentative) . Assuming Elliott-Halberstam, .
- (Polymath8b, tentative) . Assuming Elliott-Halberstam, .
- (Polymath8b, tentative) . (Presumably a comparable bound also holds for on Elliott-Halberstam, but this has not been computed.)
- (Polymath8b) for sufficiently large . Assuming Elliott-Halberstam, for sufficiently large .

While the bound on the Elliott-Halberstam conjecture has not improved since the start of the Polymath8b project, there is reason to hope that it will soon fall, hopefully to . This is because we have begun to exploit more fully the fact that when using “multidimensional Selberg-GPY” sieves of the form

with

where , it is not necessary for the smooth function to be supported on the simplex

but can in fact be allowed to range on larger sets. First of all, may instead be supported on the slightly larger polytope

However, it turns out that more is true: given a sufficiently general version of the Elliott-Halberstam conjecture at the given value of , one may work with functions supported on more general domains , so long as the sumset is contained in the non-convex region

and also provided that the restriction

More precisely, if is a smooth function, not identically zero, with the above properties for some , and the ratio

is larger than , then the claim holds (assuming ), and in particular .

I’ll explain why one can do this below the fold. Taking this for granted, we can rewrite this criterion in terms of the mixed derivative , the upshot being that if one can find a smooth function supported on that obeys the vanishing marginal conditions

and

then holds. (To equate these two formulations, it is convenient to assume that is a downset, in the sense that whenever , the entire box lie in , but one can easily enlarge to be a downset without destroying the containment of in the non-convex region (1).) One initially requires to be smooth, but a limiting argument allows one to relax to bounded measurable . (To approximate a rough by a smooth while retaining the required moment conditions, one can first apply a slight dilation and translation so that the marginals of are supported on a slightly smaller version of the simplex , and then convolve by a smooth approximation to the identity to make smooth, while keeping the marginals supported on .)

We are now exploring various choices of to work with, including the prism

and the symmetric region

By suitably subdividing these regions into polytopes, and working with piecewise polynomial functions that are polynomial of a specified degree on each subpolytope, one can phrase the problem of optimising (4) as a quadratic program, which we have managed to work with for . Extending this program to , there is a decent chance that we will be able to obtain on EH.

We have also been able to numerically optimise quite accurately for medium values of (e.g. ), which has led to improved values of without EH. For large , we now also have the asymptotic with explicit error terms (details here) which have allowed us to slightly improve the numerology, and also to get explicit numerology for the first time.

** — 1. More details on the relaxed support conditions — **

Fix as above, which we take to be a polytope that is also a downset, and let be the class of smooth functions , not identically zero, supported on whose restrictions (2) are all supported on . We need a technical analysis result, that any in can be approximated by a finite linear combination of tensor products , each of which supported in a cube in the interior of of arbitrarily small specified side length, such that the ratio (3) for and are arbitrary close. This is almost a standard application of Stone-Weierstrass, but the vanishing properties of mean that some care has to be taken.

First, by using a slight dilation to shrink a bit, we can assume that is supported on for some small , and all the restrictions (2) supported on , while only modifying (4) by an epsilon. We extend from to by setting

is now Lipschitz and piecewise smooth instead of smooth. Applying a translation (and then truncating back to , we can keep supported in while also vanishing in a neighbourhood of the regions

for each . This does not affect (3), but is now piecewise smooth and Lipschitz rather than smooth. However, by convolving with an appropriate approximation to the identity, we can make smooth again, while still supported in a slightly smaller version of obeying the above-mentioned vanishing property. By using the Lebesgue differentiation theorem one can show that (3) is almost unaffected by this if we make the approximation to the identity sufficiently narrow. We can then smoothly partition into smooth functions supported on small cubes, each contained in and avoiding the regions (5) even after a slight dilation, and then by using Fourier series one can approximate each such component of by a finite combination of tensor products of smooth functions, that are still supported in and avoiding (5), and the claim follows.

From all this (and the fact that the property of (3) being larger than is an open condition) we may assume that is a finite sum of tensor products supported on small cubes in avoiding (5). For the GPY argument to work (as laid out in this previous post), we need the asymptotics

and

to hold, where

We can generalise these bounds slightly to the depolarised bounds

and are linear combinations of smooth tensor products supported on small cubes in . By linearity, we may assume that are each supported on a small cube respectively.

For the sums (7), the “” calculations in the previous blog post already suffice, because avoids the (5) and so divisors appearing in have magnitude at most (or for some small ). For (6), a compactness argument and the inclusion of in (1) shows that if the cubes are small enough, we have either

or

for some and some . In the former case, we can compute much as in the “” calculation in the previous blog post. Now suppose that we are in the latter case. Without loss of generality let us take , thus

If we use the tensor product structure to write

and

for smooth supported in appropriate cubes or intervals, we can refactor

One can write as a Dirichlet convolution , where

As has component in , is supported on values of that are at most . Assuming an Elliott-Halberstam conjecture for such Dirichlet convolutions, we see that is well-distributed in most residue classes of modulus up to , and one can then obtain asymptotics for (8) by a computation very similar to that used to establish (7); we omit the details.

## 149 comments

Comments feed for this article

20 December, 2013 at 2:56 pm

Derek O.As an amateur mathematician and number theorist, I’ve been keeping up with this project since May. I know Tao states the possibilities for H_1 in the very beginning; however, I just want to confirm. What’s the lowest unconditional bound we have on H_1? I know Maynard got it down to 600 but since then, I’ve seen 576, 330, 300, and 272. I know that H_1 is 16 if we assume EH. But under no assumptions, what’s the lowest H_1 we have? Thanks in advance for your response!

20 December, 2013 at 3:58 pm

Pace NielsenWe are currently at 272 under no assumptions other than my program is computing correctly. (I would place the accuracy very high, because even after working with a completely new basis, I still get the same results.)

21 December, 2013 at 5:10 am

Klaus LangeIf I understand right, under EH it seems like H_1 = 8.

20 December, 2013 at 3:12 pm

Aubrey de GreyA couple of small observations, relating to the case but possibly generalisable to , about the search for the ideal R:

1) When , the “Pace polytope” (the subset of the unit cube in which ) is strictly smaller than the prism in which . I think all other polytopes defined by slicing the cube through (1/k-1,1/k-1…) also are.

2) No admissible R can be created by adding material to the prism. This can be seen for any candidate point t = (t_1,..t_k) where by considering the point u = (u_1,..u_k) where u_k = 1 and the other are : the midpoint between t and u does not lie within the union-of-prisms, so t+u fails the sumset constraint.

This seems to suggest, though clearly it does not rigorously prove, that the prism is the optimal R’’ for (assuming the Selberg sieve). I wonder if this can be rigorised.

20 December, 2013 at 3:24 pm

Eytan PaldiIt seems that there is a typo in the definition of above.

[Reformatted -T.]20 December, 2013 at 5:14 pm

Eytan PaldiThis definition is somewhat different from that in the wiki page (with the additional constraint on to be the minimal one.)

20 December, 2013 at 5:31 pm

Terence TaoThe two are equivalent: requiring that for all is equivalent to requiring that .

20 December, 2013 at 5:52 pm

Eytan PaldiThanks! I understand (it is somewhat confusing.)

20 December, 2013 at 6:33 pm

Gergely HarcosDear Terry, when you say the bound will hopefully fall to soon, do you mean it under Elliott-Halberstam, or unconditionally?

[Um, the former; I’ve added a clarification. -T.]20 December, 2013 at 8:13 pm

Gergely HarcosThanks! Also, there is a typo in the formula for : the argument of should be fractions, not just one.

[Corrected, thanks – T.]21 December, 2013 at 1:24 am

AnonymousIf you use “\dots” instead of “\ldots”, you will (automatically) get the correct placement of the ellipsis.

21 December, 2013 at 2:29 pm

Aubrey de GreyIn advance of a computed value for M”_4 (let alone higher M”_k), and indeed while there remains decided uncertainty as to whether current ideas can feasibly support calculation of M”_k for double-digit k, I was just looking at the data for M’_k, and I noticed an intriguing detail – the current lower and upper bounds on M’_k do not diverge, being (albeit only slightly) closer for k=5 than for k=4. I’m presuming that combinatorial explosions prevent James’s or Pace’s code from deriving M’_k (even for small d) for k in the range of current interest for H_1 on BV, but, in the recent spirit of examining numerical trends in order to guess key values to investigate, I’m wondering whether extrapolation from single-digit k could still be a cheap route to predicting (and then, of course, hopefully confirming) a k less than 54 for which M’_k exceeds 4.. The starting-point would be to calculate just a few more values of M’_k, for k = 6, 7 etc, and see whether the sequence of shortfalls relative to the corresponding upper bounds fits some simple curve or other tightly enough (and, of course, converges fast enough) to lead to a confident (though approximate) prediction of the threshold k. James, Pace: I’m unclear as to how far you can push your M’_k computations in feasible CPU times. What can you tell us?

21 December, 2013 at 3:48 pm

Eytan PaldiIt seems that in (1), a union with (which appears in the wiki page) is missing.

[Ah, I had forgotten about that additional region; I’ve modified the argument in the post to accomodate it. -T.]21 December, 2013 at 5:10 pm

Eytan PaldiTherefore we can replace the symmetric region above by its union with .

21 December, 2013 at 5:57 pm

Terence TaoGood point! In particular, in the Bombieri-Vinogradov case , one can take , so the problem is now to maximise the usual Rayleigh quotient over functions on whose marginals are supported on . Note that has the volume of , so in principle we have gained a huge amount of extra room to work with here (though even in the best case in which marginal conditions are ignored, only improves by a factor of 2).

One nice feature of this problem is that we can in principle attack it without subdividing the polytope, as follows. Observe from the fundamental theorem of calculus that if is a polynomial on that vanishes to order k on the slanted boundary and to first order at the remaining boundary (i.e. it is a multiple of ), then the k^th partial derivative has vanishing marginals. Thus, we can work with functions F on of the form

for arbitrary smooth , and the vanishing marginal property will be automatically satisfied (indeed the marginals for F are just the marginals for ). One can then try to optimise this numerically by selecting from symmetric polynomials of bounded degree as before; I’m not sure that this covers all possible F, so it may not give the optimal M”_k, but it should at least beat M_k, and might be computable for medium values of k, say k=50, allowing us to improve the current value of k=55 that we have on Bombieri-Vinogradov.

22 December, 2013 at 8:20 pm

Pace NielsenI’m a little confused. In the decomposition above, take . Thus . For simplicity, take (but any will do); in this case we have . Thus, , which is not merely supported on , hence does not have the vanishing marginal condition. I’m probably just missing something simple here.

22 December, 2013 at 9:09 pm

Pace NielsenI just figured out where I went wrong, but you beat me to the punch. Thanks for the answer. (Another way to put it is that one can integrate from 0 to 2, but then the integral has to break into two pieces, due to presence of the indicator function.)

It appears that this new term is making no contribution at all the the integrals, so the hope is to somehow use it to make the integral smaller.

Write where is supported on and is supported on . Then . It looks like the extra area in is both working for and against us, because it both allows us to try and make small, but at the same time makes the integral of larger. I’ll try to do some computations tomorrow if I can.

If anyone has a specific they want me to look at, just post it and I’ll look tomorrow morning.

22 December, 2013 at 8:32 pm

Terence TaoOne has to integrate from 0 to , rather than from 0 to 2.

23 December, 2013 at 2:49 am

Aubrey de GreySince no one else has suggested a specific k, I’ll go out on a limb and suggest 43 and 24, since they give H = exactly 200 and 100 respectively.

21 December, 2013 at 9:41 pm

Pace NielsenI’m unfamiliar with the notation . Is this just the partial derivative of with respect to each of the variables ?

Also, do you not need an indicator function on the second summand (in your decomposition of ), so it is only supported on ? [I imagine the answer is that it was implicitly assumed that is only supported on , but I just want to double-check.]

[Yes to both -T.]22 December, 2013 at 4:37 am

Eytan PaldiI think that this nice parametrization of with the vanishing marginals property should considerably decrease the computational complexity.

It seems however, that for real analytic (in particular, polynomial) on , any marginal which vanish outside should vanish identically (which is more than required.)

A possible generalization is to dissect the outside of into a small number of polytopes and try to find similar parametrizations (perhaps with additional small number of linear constraints) for the vanishing marginals on them.

Perhaps it is possible to estimate (by similar upper and lowers bounds) the resulting (which may improve the current asymptotic bounds for .)

22 December, 2013 at 4:02 am

Aubrey de GreyMight it be worth adding a column to the table at http://michaelnielsen.org/polymath1/index.php?title=Selberg_sieve_variational_problem to distinguish the M” bounds for theta = 1 versus theta = 1/2 ? It currently seems a little misleading that k seems to need to reach 20 or so in order for the upper bound on M”_k to reach 4, whereas when theta = 1/2 the threshold k would be only 6. Or can a better upper bound on M”_k,1/2 be given now, since the current one was based on R”_k being the prism?

22 December, 2013 at 2:17 pm

Eytan PaldiIt seems that the only difficulty in applying this method is the choice of a basis of symmetric polynomials such that the coefficients of the two matrices needed for the optimization program should be evaluated efficiently (this could be difficult because of the repeated partial derivatives – which for medium-sized k can produce many monomials from a single one.)

22 December, 2013 at 9:14 am

Eytan PaldiInterestingly, condition (2) above does not include the slanted boundary vanishing condition.

22 December, 2013 at 9:18 am

Terence TaoThe function F need not vanish on the slanted boundary, but in order for to have vanishing marginals, one needs f to vanish to order k at this boundary (but this does not give any vanishing of at this boundary).

22 December, 2013 at 1:40 am

Andrew SutherlandSome further improvements to bounds on H:

395,154 for m=2 using k=35,146.

24,490,410 for m=3 using k=1,630,680.

485,825,850 for m=4 using k=25,589,558.

The last bound is obtained by taking k consecutive primes starting with 2,504,011.

22 December, 2013 at 2:49 pm

Wouter CastryckThere appears to be a precision issue with the value of k for m = 4. One can increase the precision by putting e.g. Digits := 50; on top of the maple file (the default value is 10).

For m = 2 and m = 3 I think the values are correct, although it seems that for m = 3 one can improve k to 1,628,944 by letting c := 1.0043/log(k) and T := 0.8008/log(k).

For m = 4 one can obtain k = 75,000,000 by letting c := 0.999/log(k) and T := 0.838/log(k).

For m = 5 one can similarly find k = 3,400,000,000 by taking c := 1.012/log(k) and T := 0.710/log(k).

Note that the ratios

all lie close to the asymptotic upper bound .

22 December, 2013 at 6:58 pm

Andrew SutherlandUsing Wouter’s improved k=1,628,944 for m=3 (with Deligne, no EH) the bound on H can be improved to 24,462,790.

22 December, 2013 at 3:45 pm

Andrew SutherlandTaking the first k primes greater than k gives the bounds 1,523,781,850

82,575,303,678 on H for m=4 and m=5, respectively.

22 December, 2013 at 3:26 pm

Terence TaoThanks for catching that! I had noticed that the m=4 values of k were curiously small, but I trusted in Maple without investigating it further.

I’ve moved the enormous table to a subpage of the wiki ( http://michaelnielsen.org/polymath1/index.php?title=Timeline_of_prime_gap_bounds ), leaving only the current “world records” for the main page. For anyone interested in filling in the blanks:

* to get a value of k for a given m on EH, one can run the previous Maple code (with enough digit precision, of course), with the aim of getting M_k (which in the maple code is called M0-Delta) greater than 2m. For instance, if one can get M0-Delta larger than 6, then this will be a good value of k for the m=3 EH problem. (The portion of the code relating to varpi and delta may be deleted in this case.)

* to get a value of k for a given m without EH but with Deligne, run the code as is.

* to get a value of k for a given m without EH or Deligne, replace 1080*varpi/13+ 330*delta/13 in the last line of the code with 168*varpi+48*delta.

22 December, 2013 at 7:17 pm

Andrew SutherlandFor m=5 under EH it looks like k=309954, c=0.9821/log(k),T=0.95/log(k) works.

Hensley-Richards then gives the bound 4,316,466.

23 December, 2013 at 3:32 am

Andrew SutherlandUsing a greedy sieve, the m=5 EH bound with k=309954 can be improved to 4,146,936.

Also, the m=3 EH bound with k=5511 can be improved to 52,116.

23 December, 2013 at 11:34 am

Andrew SutherlandJust for the sake of filling in the last blank spot in the table, for m=5 without EH or Deligne it looks like we can get k=5,300,000,000.

Taking the first k primes greater than k then gives the bound 131,161,149,090 on H.

22 December, 2013 at 8:45 pm

xfxieFor m=4, k can be decreased to 74487363.

For m=3, k can only be decreased 1 to 1628943.

23 December, 2013 at 6:54 am

Andrew SutherlandThere is a slight error in what I wrote above, for m=4, k=74487363 (Deligne, no EH) the bound you get by taking the first k primes greater than k is 1,512,832,950.

23 December, 2013 at 3:16 am

Andrew SutherlandFor m=3, k=1628943 (Deligne, no EH) we can get 24,462,774.

For m=4, k=74487363 (Deligne, no EH) we can get 1,512,832,958 by taking the first k primes greater than k.

22 December, 2013 at 4:57 pm

Andrew SutherlandFor m=3 under EH, it appears that with k=5511, c=0.965 and T=0.973, we get M0-Delta > 6. This then gives the bound 52,130.

22 December, 2013 at 6:49 pm

Andrew SutherlandFor m=3 without EH or Deligne, it appears that k=2114964, c=1.00/log(k), T=0.83/log(k) works.

Hensley-Richards then gives the bound 33,661,442.

(also, the values for c and T given for m=3 with EH above should of course be divided by log(k)).

23 December, 2013 at 9:08 am

Andrew SutherlandOf course, this makes sense (I hadn’t actually noticed that varpi was negative), thanks for the clarification.

23 December, 2013 at 7:11 am

Andrew SutherlandFor k=2,114,964 (m=3 no EH or Deligne) we can get H down to 32,313,942.

23 December, 2013 at 4:27 am

Andrew SutherlandI was independently optimizing the parameters in the two cases, per Terry’s post above. I certainly don’t claim to have found the optimal values of c, T, k, so possibly there are parameter choices that give the same value of k in both cases, but I don’t think so. This doesn’t contradict what you are saying, because the maple computations are giving upper bounds on k that aren’t necessarily tight.

If this is indeed the case, it suggests that optimizing the EH case for m=2N might give slightly better bounds on k for the BV case with m=N.

23 December, 2013 at 8:51 am

Terence TaoIf one takes values of c,T that work for m=2N under EH, then they will automatically work for m=N under BV; the value of varpi computed will be negative, but that’s OK, BV basically is the assertion that all negative values of varpi are admissible.

23 December, 2013 at 3:39 am

Andrew SutherlandActually, both values can be reduced.

For m=2 without EH or Deligne we can get k=41709 using c=0.9972/log(k) and T=0.84/log(k). This gives the bound 476,322.

For m=4 with EH we can get k=41589 using c=0.9793/log(k) and T=0.94/log(k). This gives the bound 474,600.

23 December, 2013 at 3:46 am

Aubrey de GreyThat’s odd; I thought that any k that works for m=2N under EH should also work for m=N under BV, and a smaller k should always be achievable when we only ask to avoid EH or Deligne (except when N=1 because k is now so small). Is that not so?

23 December, 2013 at 2:43 am

Aubrey de GreyThe value for m=2 without EH or Deligne is still the value under just BV, I think (identical to the value for m=4 under EH); presumably it can be (slightly) reduced.

23 December, 2013 at 8:20 am

Andrew SutherlandFor m=4 without EH or Deligne it looks like k=105,754,838 works. Taking the first k primes greater than k gives the bound 2,186,561,568 on H.

23 December, 2013 at 5:51 am

xfxieFor m=4 with EH, k can be 41588.

23 December, 2013 at 10:07 am

Andrew SutherlandWith k=41588 we can get H down to 474,372, which is relevant to both the m=4 EH and m=2 BV cases.

24 December, 2013 at 8:09 am

Andrew SutherlandYes, I actually took a look at this yesterday. But, in contrast to the m=4 EH and m=2 BV cases, the parameters k=2114964, m=3, c=1.0/log(k), T=0.83/log(k) (which surely can be optimized further) used in the BV case actually don’t give M_k > 12 and I couldn’t easily find a choice of parameters that works for the EH case with m=6 with k smaller than 2114964 (it’s a bit time consuming to check with k this large and the precision set higher, so I admittedly did not look all that hard).

24 December, 2013 at 4:24 am

Aubrey de GreyBy the same token, there’s a good chance that the current m=3 value avoiding EH and Deligne can be reduced by exploring m=6 under EH.

23 December, 2013 at 6:58 am

Andrew SutherlandWith k=41588 we can get H down to 474,460.

23 December, 2013 at 6:08 am

xfxieFor m=5 with EH, k can be 309661.

23 December, 2013 at 7:02 am

Andrew SutherlandWith k=309661 we can get H down to 4,143,140.

23 December, 2013 at 3:00 am

Aubrey de GreyTerry: is there a simple function of varpi and delta that gives rise to the quantities 52/283 and 4/43 in the asymptotic formulae shown in the new table?

23 December, 2013 at 5:27 am

Aubrey de GreyAh, of course: 4*varpi / (1/4 + varpi).

23 December, 2013 at 2:59 am

Eytan PaldiIt seems better to place several explanations and links to wiki pages (now appearing above the large table) above the current records in the main page.

[Explanation and links moved – T.]22 December, 2013 at 9:52 am

Andrew SutherlandThe bound for m=2 can be improved to 395,122. The bound for m=4 can be improved to 473,244,502.

The last bound comes from the (symmetric) Hensley-Richards sieve, which yields the admissible 25,589,558-tuple consisting of the primes in [2623573,236622251], their negations, and +/-1.

23 December, 2013 at 8:09 am

James MaynardI thought I’d give a quick update on the computation for using the prism , I get the bound , which is only a modest improvement on the previous bound .

It is possible that one needs to take rather higher degree approximations, since the vanishing conditions severely restrict the number of degrees of freedom of small degree polynomials (with degree 5, they reduce about 200 degrees of freedom for the coefficients of polynomials outside of to only 20, and its much more restrictive for degree 3 or 4). There are currently some RAM issues using higher degree polynomials, however. There’s also a pretty good chance there are typos in my code since the decompositions are in quite a few regions, although the results seem to pass some simple sanity checks.

23 December, 2013 at 9:22 am

Aubrey de GreyAlso, James, how does M”_4 vary with degree? The recent analysis of M_k (with medium k) in the BV setting suggests that one might be able to use extrapolation to get a pretty good feel for the magnitude of the benefit for M”_4 from increasing d. Maybe that’s overoptimistic if the furthest you’ve been able to go in this case is d=5, but it can’t hurt to have a look at the d<5 values for a speculative trend.

23 December, 2013 at 11:12 am

James MaynardI think its difficult to extrapolate much because of the vanishing constraints. If I remember correctly, when the degree was 2 or less then the polynomials outside of had to just be 0. When the degree was 3 there was only one degree of freedom, and it was only when the degree was 4 that one was really able to overcome the constraints and have enough flexibility to really get an improvement. (By comparison, the polynomial approximations on the region seem to converge quickly, and one can guess quite easily what we might expect here)

23 December, 2013 at 9:59 am

Terence TaoThat’s slightly disappointing, but at least it is still progress. (Though the k=4 problem is beginning to feel like a real-life Zeno’s paradox, in that we keep halving the distance to the goal without actually reaching it…)

I was trying to see how much this extra room (especially in the BV regime ) gives us in the asymptotic analysis for large m, but thus far the gains I could get in were exponentially small in k, which is perhaps consistent with the degradation we are experiencing as we pass from k=3 to k=4.

Actually this reminds me of your earlier idea to try to compute as an additional term that can essentially be added on top of the M”_k quantity; roughly speaking, we win (in the m=1 case) when

Previously, I had thought this trick was not useful in the EH setting, but now that we have this additional room beyond to play with, and it’s not proving to be super-useful in directly improving M_k, one could perhaps “borrow” some of this room to instead get some nontrivial lower bound on . This is another quantity that is likely to decay exponentially in k, but it could possibly be enough, combined with everything else, to push us over the top. (But there is a tradeoff; in order to get a nontrivial lower bound for this quantity, one has to retract the support of F to be a bit away from the slanted boundaries of R.)

I’m not sure how to best compute this quantity though, and the optimisation could be tricky (it won’t be a simple quadratic program any more). I guess we still have some simpler things to try right now (switching to a different polytope, or increasing the degree), so maybe we can wait until the other ideas are exhausted before taking a harder look at this one.

23 December, 2013 at 10:10 am

Terence TaoA variant of the above idea is to modify the Selberg sieve combinatorially to remove small prime factors, e.g. by multiplying it with a Brun sieve, say

where is a small power of x. But then one has to keep the support of F at distance or so away from the slanted boundary of R, so it's not clear that this is a win. (But combinatorial sieves can perform slightly better than Selberg sieves when one is sifting very small primes, so there is some hope here.)

23 December, 2013 at 8:47 am

Pace NielsenJames, did you try the case for the simplex ? What happens as the degree increases in the simpler case? Do you run into RAM issues as quickly?

23 December, 2013 at 11:04 am

James MaynardI haven’t got round to that yet, but I don’t think there should be the same issues in the case, since I’d guess it should work more-or-less like the prism.

28 December, 2013 at 4:29 am

Aubrey de Grey3/2 not 4/3 ! (You’ll almost certainly have caught that, but I guess it can’t hurt to make sure.)

23 December, 2013 at 8:20 am

Aubrey de GreyMany thanks James. Is your code easily modifiable to explore the theta = 1/2 case, and in particular the double-size simplex that has just arisen as an option in that setting? I note that it is larger than the prism even when k=3, so the possibilities are most tantalising.

23 December, 2013 at 8:38 am

Aubrey de GreyTo clarify: I realise that when k = 4 we are only interested in theta = 1, but since the double-size simplex so hugely expands R in the BV setting I’m thinking that the “low k” and “medium k” analyses are now effectively merging, such that your and Pace’s computational approaches will increasingly synergise.

23 December, 2013 at 8:57 am

Pace NielsenI think one of the differences between our computations is that unless we come up with a better characterization of the disappearing marginal conditions, my decompositions will be restricted to the form Terry came up with above [to avoid having to break the simplex into an enormous number of subcomponents], whereas it appears that James can take advantage of all degrees of freedom. So it still may be worthwhile to see how the numerics behave for small .

23 December, 2013 at 11:06 am

Eytan PaldiThe two choices for above should be replaced by their corresponding unions with .

23 December, 2013 at 11:37 am

Terence TaoUnfortunately it’s a bit more complicated than this, due to non-convexity: the condition is that

but one cannot simply take R to be the union of, say, the prism and the simplex , because of the cross terms r+r’ where r lies in the prism and r’ lies in the simplex.

23 December, 2013 at 12:14 pm

Aubrey de GreyRight. Generalising my earlier comment about the impossibility to add to the prism when theta = 1 (which is still appearing at the very bottom of the page, presumably as a side-effect of Terry’s kind repair of my broken LaTeX): the only way that a non-convex polytope can work is if the half-size version of any convex subregion of it, when translated half-way towards any point in it that is not in that subregion, always lies within the half-size version of the polytope Terry describes for R+R. That’s a very onerous criterion, which I’m not sure can ever hold for an R that is not itself contained within an admissible convex R.

23 December, 2013 at 12:13 pm

Terence TaoI’m recording a comment which is not directly helpful for the immediate problems at hand, but might be useful later:

For the non-EH cases, in which is close to 1/2, it may be possible to enlarge the region that R+R can range in a little bit more (although it is beginning to appear that such enlargements do not offer a huge amount of gain in the large k case). There are at least two ways to squeeze a bit of further improvement. Firstly, an inspection of the argument in the blog post shows that for the purposes of determining this region, we do not need the EH hypothesis for arbitrary Dirichlet convolutions , but rather for convolutions of the form . Such sums are better behaved than the “Type I/II/III” sums considered in Polymath8a, and it is likely that we can improve the numerology a bit here, which would correspond to adding some additional space to the R+R region (but unfortunately does not improve the support condition on the marginals, as this requires EH for the von Mangoldt function).

Another possible gain comes from trying to get non-trivial estimates on expressions such as

where each lives in a range , and EMT is the expected main term (basically ). Currently, we can control this expression either when (so that and the inner sum is then easy to compute) or when (in which case we can apply to the divisor sum). There may however be a way to control this sum more directly, by completion of sums and then somehow eliminating the sieve weights, either by Cauchy-Schwarz or by using the special structure of these weights (they behave very roughly like smoothly truncated Mobius functions).

Again, this only enlarges the region that R+R is allowed to lie in,and does not improve the condition on the marginals of F, so any gain here is likely to be very modest (and only likely to be useful in the non-EH setting; it's hard for unconditional arguments such as completion of sums to compete with the consequences of the powerful EH conjecture.)

24 December, 2013 at 7:02 am

hcl891017Reblogged this on hcl891017's Blog.

24 December, 2013 at 8:00 am

Andrew SutherlandSome further improvements to bounds on H:

Under EH:

For m=4 with k=41,588: 474,320.

For m=5 with k=309,661: 4,137,872.

Without EH:

For m=3 with k=1,628,943: 24,462,654.

For m=4 with k=74,487,363: 1,497,901,734 is obtained by taking k consecutive primes beginning with 6,663,067 and ending with 1,504,564,801.

Without EH or Deligne:

For m=2 with k=41,588: 474,320 (per EH, m=4 bound above)

For m=3 with k=2,114,964: 32,313,878.

28 December, 2013 at 6:48 pm

Andrew SutherlandA few more improvements under EH:

For m=4 with k=41,588: 474,296.

For m=5 with k=309,661: 4,137,854.

2 January, 2014 at 8:45 am

Andrew SutherlandFor k=41,588 we can bound H by 474,290. This improves both the m=4 with EH and m=2 without EH or Deligne bounds.

9 January, 2014 at 10:06 am

Andrew SutherlandFurther improvement: 474,266.

24 December, 2013 at 9:41 am

Eytan PaldiIn the current records table, is it possible to estimate the orders of the two terms in the exponents? (i.e. their rates of decay for large m.)

24 December, 2013 at 2:25 pm

Terence TaoI did some calculations and it looks like that the o(1) terms are in fact , so for instance we have using Deligne. The point is that the lower bound is applicable with T comparable to , which means that we can take comparable to also. Using our best MPZ estimate, this allows us to take as large as . We win when ; putting this all together, it appears that k can be taken to be of size , which on taking the first k primes past k gives .

26 December, 2013 at 3:20 am

Aubrey de GreyTerry, does this mean there should be refinements to your Maple code that Drew and xfxie are using to improve the world records? Since the expressions with and without EH are now equivalent (setting varpi = 1/4 in the EH case), I’m wondering whether the anomaly of sometimes obtaining lower k for m=2N under EH than for m=N without EH or Deligne will go away (or if not, why not).

24 December, 2013 at 3:30 pm

Eytan PaldiThat’s good! (I asked that because for the EH case there was no term.)

28 December, 2013 at 6:52 am

Eytan PaldiIf for some , (for the region ), then the current exponents in the asymptotic bounds for can be improved.

29 December, 2013 at 9:42 am

Eytan PaldiThe definition of under the the title “current records” in the main page should be for primes (not “ consecutive primes”.)

(perhaps the modified for consecutive primes may also be studied here.)

29 December, 2013 at 9:51 am

Eytan PaldiI see now my error: according to the current definition of , it is indeed for consecutive primes!

30 December, 2013 at 6:42 am

AnonymousIn the last bullet at the beginning of the post: Maybe should be changed to .

30 December, 2013 at 9:35 am

Eytan PaldiA simple constraints formulation for :

Let . Let on and on (for polynomials .)

Observe that if is outside then is outside .

Therefore, the marginals vanishing constraints are (for )

(1)

Note that the constraints in (1) are only for the polynomial Q.

Remarks:

1. Under the restriction of to be symmetric, only one constraint in (1) (e.g. for ) is required.

2. In general (as seen by the above remark) the resulting constraints may be dependent.

3. Let be a polynomial satisfying

than the constraint for in (1) is

Which is equivalent to

i.e. there is a polynomial such that

Now, by taking the derivative with respect to we get the representation

(2)

With obvious generalization for the other values of (note that the representing polynomial is dependent on !)

30 December, 2013 at 9:42 am

Aubrey de GreyVery interesting! But please let’s not use “H” for this purpose, given its other use here for the past many months… and the same goes for “m”…

30 December, 2013 at 3:24 pm

Eytan PaldiThe third remark above is for on the region (on which ) – i.e. each “” appearing in that remark should be understood as the polynomial .

Note also that the computational complexity of the constrained problem (using the above formulation) should have the same order of magnitude as for the original unconstrained one (for approximation) – enabling approximation of for medium-sized (e.g. ) for which the original optimization program for is practical.

31 December, 2013 at 8:27 am

Terence TaoIt occurs to me that in order to get some understanding of how much additional gain one gets by enlarging the simplex to a larger region such as (while retaining the vanishing marginal conditions of the original simplex), one can work in the more tractable situation of cubes, in particular one can study how the analogue of improves when one replaces a cube with a larger cube , while requiring that the function have marginals supported on . As a crude zeroth approximation (ignoring log factors), is a bit like (although this approximation is too crude – the log factor is critical in making grow in k!).

First observe that the analogue of the original , namely the supremum of the ratio for supported on , is equal to , as can be seen by a routine Cauchy-Schwarz argument (and the extremiser occurs when F is the indicator function of ).

The analogue of is the supremum of the ratio for supported on , but having marginals supported on . Restricting to symmetric functions, we are now trying to maximise

(*)

subject to being supported in .

Another application of Cauchy-Schwarz shows that cannot exceed , and it is certainly at least , but the key question is which of these two bounds is closer to the truth.

We can convert this to a variational problem on as follows. Fix a symmetric function supported on , and consider the problem of maximising (*) subject to F being symmetric and having marginals G. This problem can be solved by Fourier analysis on the cube , which we identify (discontinuously) with the torus . The marginal conditions (and symmetry) constrain all the Fourier coefficients of that have at least one zero frequency ; the remaining Fourier coefficients of F (when all the are non-zero) serve to increase the denominator of (*) (by Plancherel’s theorem) but do not affect the numerator. Thus, the extremum occurs when those coefficients are set to zero. After a bit of fiddling, the extremum of (*) for a given choice of marginals G is then

(**)

where is the orthogonal projection of to those functions on whose Fourier coefficients are only non-vanishing for those frequencies with non-zero and zero. Thus for instance

At this point I’ll resort to back of the envelope calculations. In one dimension, if we set , then and . In this case, absorbs a fraction of the total energy , with absorbing the other . Tensoring this and using the law of large numbers, we expect in high dimension that the with will be absorbing the bulk of the energy of G. For such i, we have . So the second ratio in (**) is approximately , suggesting that is unfortunately close to . (If one considers the tensor power of other functions on than the indicator function, the situation gets worse.)

This suggests that the gain from the additional room is going to be modest. I’ll try and compute (**) more precisely in the case to get a better idea of what the gain will be (my initial guess is that it is exponentially small in k, but I have not yet done a real calculation here yet).

31 December, 2013 at 9:16 am

Terence TaoI’ve now done the calculation and the gain is indeed exponentially small in this case :(. We have , so by the binomial theorem

and so it appears that

with extremiser

which is basically shaving off an exponentially small piece from . (I’ve not yet fully verified that this is indeed the extremiser, but I’m reasonably sure of this.)

The situation appears to be very similar to that of James’ old suggestion of trying to lower bound the portion of the sieve where all of the are simultaneously composite. We expect this contribution to be exponentially small. On the other hand, any modification of the sieve that does not affect the marginals is only going to change the sieve on precisely those for which none of the are prime.

There is still a small ray of hope though that in the simplex case things are a little better. One thing about the cube is that every symmetric function on can be written as the marginal of a symmetric function on (this can be seen by Fourier series), however it is not clear to me yet that every symmetric function on can be written as the marginal of a symmetric function on . If this is not the case, then there could be additional gain coming from enlarging the simplex that is not present in the cube case. (But the pessimist in me thinks that this additional gain will also be exponentially small.)

1 January, 2014 at 6:44 am

John NicholsonIs equivilent to The reason for the question is the Ramanujan prime corollary found here:

https://en.wikipedia.org/wiki/Ramanujan_prime#Ramanujan_prime_corollary

It is similar to what is stated above, but brings the n values lower.

2 January, 2014 at 7:47 pm

Terence TaoI managed to work out the gain from James’ idea of subtracting off the contribution of when all the are simultaneously composite. I unfortunately don’t know how to get any nontrivial bound on this in the EH case (unless one pulls the support of F away from the boundary of the simplex, but then one will almost certainly lose more from the reduced support than one gains from the simultaneously composite term). In the BV case it appears one can shave about from , which unfortunately is too tiny to be of much use (unless we have the exceptional coincidence that is really, really, really close to 4).

The cleanest way to subtract appears to be to replace the sieve (which, in the BV case, is a divisor sum with moduli going up to about ) by the modified sieve

Note that the big sum here is bounded by , which is basically 1 by the fundamental theorem of arithmetic, so the sieve remains positive (modulo negligible errors). The cap is needed to keep the total divisor less than x, otherwise one can’t estimate the sum accurately. The additional factor does not impact the sums , but reduces slightly, by approximately

which by Mertens theorem is roughly . (Actually the presence of the sieve distorts this calculation a bit, but not significantly.

3 January, 2014 at 2:20 pm

Terence TaoHere is an idea that seems to give some flexibility in going beyond the simplex that the marginals of F are currently restricted to, at the cost of degrading the lower bounds that end up in the numerator of the M_k-type optimisation problem.

For sake of concreteness let us take k=3 and assume EH. We would like to lower bound quantities such as

(*)

where is the GPY Selberg sieve

for some smooth function , with in the EH case; we usually write things in terms of the mixed derivative of , for instance is proportional to (i.e. ) if F is supported in a suitable polytope (e.g. will do, we can also work with some larger regions R as discussed in the past).

For (*), the cutoff ensures that the factor in the definition of must be 1, and the expression simplifies a bit to

where is the (signed) divisor sum

.

If is supported on the simplex (which, in terms of F, means that the marginal vanishes for ), then the divisor sum in only involves moduli up to , and so on EH we can compute an asymptotic for (*) which is proportional to the familiar expression

Now suppose instead that is supported on a slightly larger simplex for some . Then now involves moduli larger than , and even EH can no longer help us get asymptotics for (*). However, we don’t truly need asymptotics here; a lower bound might suffice. For instance, we have the trivial lower bound of 0 no matter what the support of is. Less trivially, we split , where

and is chosen to agree with for but is otherwise arbitrary. The point is then that only involves moduli up to (the moduli larger than appear in both and , but they cancel each other out) and so EH can again be used to get an asymptotic for this part, which is proportional to . As for the component, we apply the trivial lower bound of 0. The net lower bound then is proportional to

where is the mixed derivative of .

We can choose to vanish for and agree with otherwise (well actually we need to smooth this function infinitesimally, but never mind this technicality), so we get a final lower bound of

The upshot is that we can now allow the marginal to be supported on the larger region , at the cost of replacing the functional appearing in the numerator of the variational problem with the slightly smaller quantity . Similarly for the other two functionals and . This leads to the more general variational problem of maximising

subject to the constraints that the marginals , , are supported on , , respectively, and are parameters that can be chosen arbitrarily, while F also needs to be constrained to a polytope R with (e.g. one can take the prism or the truncated simplex ). The case is our existing M”_3 optimisation problem, so this new problem can’t do worse than what we already have. The question is whether it actually does better, in that the increased flexibility given by relaxing the vanishing marginal conditions “beats” the loss in the numerator; I plan to do a perturbative analysis of the case when the are infinitesimal to see if this is the case. One reason for hope is that (in the k=3 case) if one sets all the way to 1 while keeping at zero (and taking R to be something like for some small ), then one collapses to the M”_2 problem, for which we already have a bound of 2 which is better than the current bound on M”_3 we have.

So these epsilon parameters help explain the puzzling lack of monotonicity of the M”_k that was pointed out some time ago, I think by Pace. Indeed the way I discovered this more flexible lower bound was by trying to reconcile the k=2 theory (in which we have the extensive analysis of Bombieri on EH, for instance) with the k=3 theory by trying to smoothly interpolate between the two.

3 January, 2014 at 6:08 pm

Eytan PaldiIs it still possible to get for some enlarged region ?

3 January, 2014 at 6:41 pm

Eytan PaldiI meant that arbitrarily small improvement of the current result by arbitrarily small perturbation of (if possible) is sufficient to show that EH implies (i.e. the twin primes conjecture!)

3 January, 2014 at 7:11 pm

Terence TaoUnfortunately, the parity problem blocks this: given any sieve coming from this sort of framework (as a divisor sum), one expects and to be small (this is because of the Mobius randomness heuristic, that predicts the Mobius function to fluctuate as if it were random on short progressions), which implies that neither of and can be much larger than , which basically means that none of the M”_2 type quantities can be larger than 2.

The analysis of Bombieri on the twin prime problem suggests that the parity problem is the only obstruction to the twin prime conjecture assuming EH; roughly speaking, Bombieri shows that if EH is true and twin prime is false, then for “almost all” n, the Mobius functions and have opposite sign (so if n has an odd number of prime factors, then n+2 almost always has an even number of prime factors). Similarly, if , then for “almost all” n, and have opposite sign, and for “almost all” n, have opposite sign. Since it is not possible for to all have opposite sign, this looks like a proof that . Unfortunately, the three notions of “almost all” here are quite different; roughly speaking, we only get opposite sign for most n with almost prime (but arbitrary), and similarly for other permutations of . I played around for quite some time today to try to salvage something from this approach, but without much success unfortunately.

4 January, 2014 at 11:00 am

Terence TaoI have tried a perturbative analysis of the effect of the epsilon parameters in my previous comment, but the results are frustratingly inconclusive: it seems that this additional freedom _can_ improve M”_k if the extremisers to that problem obey a rather random-looking inequality (roughly speaking, that the directional derivative of M”_k in epsilon is positive), but that this inequality need not hold in all cases. (For instance, perturbing off of the M”_2 extremiser in the k=3 setting appears to make the constant dip below 2, thus preventing a cheap proof of H_1 <= 6. On the other hand, since M”_3 is currently less than 2 when all epsilons are set to zero, and we can reach 2 by sending one of the epsilons to 1, the mean value theorem tells us that there must be at least some situation in which increasing epsilon helps.)

It may well be that the best way to see whether this flexibility has promise is to do some numerics, e,g. to recompute M'_3 but with the region enlarged to for some small (e.g. 0.1), but with the functionals replaced with . (Ideally we should perturb off of M''_k instead of M'_k, but this might be harder to code because we have to redo all the polytope partitioning.)

4 January, 2014 at 5:56 pm

Aubrey de GreyMight it be preferable to do such numerics for the k=4 case rather than k=3? I’m thinking that not only is k=4 the case of most current interest in the EH setting, but also that unless the non-monotonicity of M is particularly sadistic the result will give a more reliable indication of what would be worth trying for two-digit k, and these considerations may outweigh the drawback of the greater complexity than the k=3 case.

3 January, 2014 at 4:31 pm

Terence TaoI’ve been notified that the polymath wiki is currently experiencing some issues that make it either load very slowly, or not at all. They should be fixed soon, however.

4 January, 2014 at 6:58 pm

Eytan PaldiA necessary condition for solutions of the constrained optimization problem:

Let for be measurable subsets of and let be a closed subset of .

Consider the (somewhat generalized) problem

Where

Subject to

(1)

(2) is supported on

(for )

(3) is supported on .

Remark: w.l.o.g. we may assume that for where denotes the projection of on the hyperplane .

The functional can be represented as

(4) , where

(4′)

Observe that the constraint (2) is equivalent to

(2′) for each vanishing on (for ).

Motivated by this equivalence, denote (for ) by the subspace of functions in and vanishing on .

Therefore (by a standard calculus of variations argument), under the constraints (1), (2) (i.e. (1), (2′)) any variation of should be orthogonal to and to the functions in the subspaces . Therefore if is a solution to the optimization problem, it follows (from (4)) that is orthogonal to any such variation, implying

Theorem: If is a solution to the optimization problem, then

Moreover, is the optimal value of the objective function.

Remarks:

1. This theorem shows that an optimal solution may be discontinuous on the boundaries of and – with some implications to polytopes partition for piecewise polynomial approximation (the approximation usually converges slowly with the polynomials degrees if the solution is not smooth inside some polytope.)

2. It seems possible to find the optimal solution also for the region .

5 January, 2014 at 11:53 am

Eytan PaldiIs it possible to apply the idea of enlarging the sieve support also to the original one dimensional sieve (e.g. by using of type over the enlarge simplex )?

5 January, 2014 at 2:32 pm

Terence TaoYes, I believe so, and this is a good model problem to work on since one can in principle solve the optimisation problems explicitly in this case.

The one-dimensional analogue of M_k is the supremum of

as f ranges over smooth functions that are compactly supported in [0,1] (and not identically zero). This supremum was calculated to be , attained when

(Strictly speaking one should smooth this out at t=1, but never mind this technicality.) The modified optimisation problem in one dimension is the supremum of

as f ranges over smooth functions compactly supported in . At present, I do not know what this supremum is, but it looks like it can be computed explicitly, presumably in terms of Bessel functions. It does not seem obvious to me which way things will go: on the one hand we have more functions to optimise over (being supported on instead of ), but on the other hand the numerator has been shaved off by an epsilon.

Incidentally, the one-dimensional lower bounds for M_k are surprisingly competitive for small k. For instance, for k=2, the Bessel bound is 1.383, compared with the true optimum of 1.386; similarly for k=3 it is 1.635 versus 1.646, and for k=4 it is 1.820 versus 1.845.

6 January, 2014 at 12:58 am

Eytan PaldiThe modified problem for can be solved as follows:

1. Let , subject to

where ,

and ranges over the same smooth functions as above.

Clearly, .

2. In order to find the above infimum , we assume that it is attained by some which satisfies the Euler-Lagrange equation

(1) on .

and the regularity conditions

(1′) has zero limits at the endpoints of .

It is easy to verify (using integration by parts) that in the equation is the same which was defined as an infimum.

In order to show that the infimum is attained by certain solution of (1), we use the identity

(2)

Which follows (via integration by parts) for any smooth and any non-vanishing solution of (1) satisfying the regularity conditions

(2′) has zero limits at the endpoints of .

3. The equation (1) has on the (familiar) solution

, on .

While on the remaining interval we have

, on (for some constants )

The constants are determined such that the solution and its derivative are continuous at .

This gives as explicit linear combinations of at . It is easy to verify (e.g. by Taylor series) that

, on .

This gives the solution on the whole interval in terms of as the only parameter. It remains only to use this expression for the remaining constraint to get the final result

Theorem: , where x is the minimal positive zero of the function

Finally is determined from (as described at the end of step 1).

Remark: Now it is possible to optimize .

7 January, 2014 at 9:41 pm

Eytan PaldiOn the evaluation of :

The theorem in my comment above (representation of x as a minimal positive zero for a certain function) holds only for and should be slightly modified for the case because for this case the solution of the corresponding Euler-Lagrange equation on is given by for some constants . Using the same solution method as before we get almost the same theorem as before except that now x is the minimal positive zero of the modified function

6 January, 2014 at 9:30 am

Terence TaoA preliminary perturbative analysis looks encouraging: for small, we expect , where is the first zero of and is bounded. We Taylor expand

to arrive at

.

Standard Bessel identities tell us that , so that

and hence

and so

indicating that there is an improvement to be had (for one-dimensional sieving at least) by taking to be positive! Of course it would be good to back this up with numerical data.

Incidentally I have a somewhat more geometric way to explain why we have the “graceful failing” of the Selberg sieve in that when one enlarges the sieve cutoff support to the simplex of sidelength , one still gets a lower bound restricted to the smaller simplex of sidelength (as opposed to getting no nontrivial lower bound whatsoever). As I remarked previously in https://terrytao.wordpress.com/2013/11/22/polymath8b-ii-optimising-the-variational-problem-and-the-sieve/#comment-253205 , the Selberg sieve is the square of a linear combination of products of Ramanujan sums. Distribution hypotheses such as Elliott-Halberstam tell us, roughly speaking, that two such products , behave orthogonally (on average) if lies below a given threshold (or, if one is weighting against (say) , if lies below a given threshold). A model problem is then to lower bound the norm of a sum of vectors in an abstract Hilbert space, with the property that distinct and are orthogonal if for some threshold . (One should think of as corresponding to a linear combination of the in which (or ) is of a certain logarithmic size proportional to , e.g. .)

If one is only summing values of j up to , then all the are orthogonal, and by Pythagoras we have

which corresponds to the standard Selberg sieve calculations. But if we sum up to , there are some inner products between for which we do not have good control. However, all the with are orthogonal to those with , so we have

which corresponds to the gracefully degrading bound we have when we push beyond the classical Selberg sieve support region. In fact this bound is sharp if we happen to have the odd symmetry for .

6 January, 2014 at 10:44 am

AnonymousTypographical comment: If \dots is used instead of \ldots, the correct formatting of the ellipsis is automatically obtained. It shouldn’t always be lowered dots.

6 January, 2014 at 5:44 pm

Eytan PaldiThere are two typos in the perturbation analysis (with the same conclusions):

1. In the line above the standard Bessel identities, the sign of the second term should be “+”.

2. The main term of should be . (i.e. should be less than )

[Corrected, thanks – T.]6 January, 2014 at 6:36 pm

Terence TaoRegarding second order expansion: this can in principle be done, but is a little messy; it may be faster simply to work things out numerically for small k. (For large k, the one-dimensional version of M_k never exceeds 4; after optimising in epsilon, it is potentially possible that one could breach 4 (giving yet another proof of Maynard’s theorem!), but it is unlikely to be competitive with the multidimensional sieving.)

Incidentally, there is an annoying technicality when trying to apply the one-dimensional improvements to the EH problems. Basically, the one-dimensional functions f supported on are only admissible when . Thus, strictly speaking, in the EH case we don’t get to increase epsilon past 0 at all. But the problem is only at the “corners” of the simplex ; if one truncates away the portion of this simplex that pokes outside of then the cutoff is admissible again, and a back-of-the-envelope calculation indicates that this only costs to the M_k quantity. So the perturbative analysis still suggests a gain in this case, though actually calculating it would be messy.

Unfortunately, if one insists on the one-dimensional ansatz then there is no gain by enlarging the region and imposing additional vanishing marginal conditions; these marginal conditions basically collapse to requiring that the antiderivative of G vanishes outside of the original interval [0,1] (or , if one is playing the epsilon game) and so we don’t actually get any new functions this way. One can play with more complicated ansatze to deal with this, but this gives up the key advantage the one-dimensional analysis had, namely that it was simple enough to solve exactly, and we may as well return to the numerical analysis of the multidimensional problem at that point.

6 January, 2014 at 8:00 pm

Eytan PaldiI think that in the EH case we still have for the enlarged simplex (so we can increase up to .)

In general, we can increase up to .

7 January, 2014 at 9:18 am

Terence TaoIn principle yes, but there is a truncation problem: the function F cannot range in the entire simplex , but only in the truncated simplex . For instance when k=3 on EH one can be supported in , as identified by Pace. So a function of the form doesn’t directly work if G is supported on . One can try working with truncated functions , with G supported on for some , and the marginals of F being integrated up to (and no vanishing marginal conditions required), but the variational problem becomes a bit trickier. Let’s see: with the setup given, the denominator is (viewing the support of F as the simplex with three copies of removed), and a numerator being equal to

writing as the antiderivative of G, the problem is now to maximise

This looks too messy to optimise exactly, but perhaps a numerical optimisation is still feasible (but in that case we may as well work with arbitrary polynomials on this polytope, rather than those coming from the one-dimensional ansatz.

6 January, 2014 at 11:54 am

Eytan PaldiIt seems that a second order expansion of should give an initial approximation for the optimal value of as well as an estimate of the gain in . (this may also be used to derive a Newton-type iteration for the optimal .)

Another possibility is to apply the enlarged simplex

with the vanishing marginals constraints to the one dimensional sieve and see if these constraints can be satisfied by a one dimensional sieve of the form . It seems that since in this case is symmetric, only one constraint is needed.

7 January, 2014 at 9:19 am

Pace NielsenAs in Eytan’s post, letting where with the minimal positive zero of the function , numerical calculations show that when the maximum is achieved at:

, with .

7 January, 2014 at 9:31 am

Pace NielsenAnd if I’m doing this correctly, we also get for and that . So, assuming this can be verified, we just passed the $k=5$ barrier under EH.

7 January, 2014 at 10:19 am

Terence TaoThat’s very promising! Unfortunately, there is a technical speedbump right now: Eytan’s analysis is only directly applicable when (because the cutoff F cannot range freely in , but must also stay inside ). So strictly speaking we don’t get anything new under EH (in which ) yet. However it should be possible to develop a truncated version of this analysis. The situation is getting a bit complicated, I will try to lay out where we’re at exactly in the next blog post (which we’re about due for anyway), though it may have to wait until tomorrow, as I have some other things to attend to today.

7 January, 2014 at 10:56 am

Terence TaoIn a bit more detail:

given any function of compact support, define the functionals

and

Define to be the supremum of the ratio

as F ranges over non-zero functions supported on the polytope . Note one can restrict to symmetric F, in which case the numerator may be simplified to .

Then we can prove that implies whenever we can find such that . For instance on EH we have whenever .

In the range , collapses to , defined as above but with F allowed to range over the entire simplex . Otherwise, could be smaller than , but the key question is how much smaller (in particular, can we still get larger than 2 for close to 0.18).

Define be the same quantity as , but with the function F restricted to have the one-dimensional form for some G supported on . This quantity , which is a lower bound for , was computed by Eytan as with a moderately complicated formula for .

In a previous comment https://terrytao.wordpress.com/2013/12/20/polymath8b-iv-enlarging-the-sieve-support-more-efficient-numerics-and-explicit-asymptotics/#comment-262154 I worked out the variational problem for , where is the same as but with F restricted to be of the form . It may well be that is already greater than 2 for some , in which case we win.

Perhaps one could modify the previous code for computing M’_k for small values of k to now numerically compute lower bounds on or by testing various symmetric polynomials on the polytope . (As a warmup and a checksum, one could check that for one recovers the numbers obtained by Eytan’s exact analysis.)

7 January, 2014 at 1:52 pm

Terence Taop.s. it looks like one can also enlarge the domain of F, the same way that it was enlarged to get from M_k to M’_k or M”_k. More precisely, rather than F being supported in the region , one can have F supported in a larger polytope R, so long as R+R is contained in

and the marginals of F are supported in . For instance, one could have F supported in

which gives a slight enlargement of the region on which is optimised over (similar to how the polytope gives a slightly larger value M’_k to the variational problem than the value M_k associated to the simplex ). So there is a lot of opportunity here to bump up the ratio to be above 2 in the k=4 case!

6 January, 2014 at 4:59 pm

Pace NielsenComputations now show that one can take which gives bounded gaps of size . In particular, searching over the space of symmetric polynomials of the form where the total degree is and is a signature which involves only even entries with total degree (and dropping the last three such signatures, due to RAM issues) yields

There were a few things that made it difficult for me to do this computation. First, I went on vacation and when I returned I found out that the computer with 64GB of RAM had gone to sleep. Also, I found that I ran into another RAM issue when trying to compute the inverses of or larger matrices (and a similar problem when finding the largest eigenvector of similarly sized matrices). Fortunately, after quite a few hits and misses I obtained the stated result.

It seems very suprising to me that restricting to symmetric polynomials of the given form works so well. Other choices for the basis don’t give very good results.

For , the following are the bounds I get using all symmetric polynomials of a given degree .

For ,

For ,

For ,

For ,

For ,

For ,

For ,

For ,

For ,

For ,

For ,

For ,

For ,

It would be an easy computation to push this up to . It appears that, at present, the convergence rate is somewhere in the interval (just as it was for ). If the numbers continue at the rate then it doesn’t appear that we ever get , but if the rate is then we would need about to finally get .

7 January, 2014 at 5:16 am

Eytan Paldiis needed for EH[1/2], but with our current it seems that we need only to have which is satisfied by (and expected to hold even for ) but not for (since the upper bound on is 3.806…)

7 January, 2014 at 6:27 am

Eytan PaldiIt should be remarked that the above is not rigorous because

1. It is based on MPZ[] for which the current estimates are only for the asymptotic case of .

2. It is based on a tentative value of .

7 January, 2014 at 9:37 am

Terence TaoYes, we still have to figure out how to effectively apply to these low-k situations. The main problem is that is no longer negligible in this regime – even with our most optimistic MPZ estimate, we cannot get larger than , and so the quantity cannot be taken any larger than . Now, a naive truncation of the simplex to allow us to use MPZ would leave us with . For large k, this is exponentially negligible, but with k being about 50, it’s somewhat noticeable; the fraction of volume of that is lost when passing to is roughly , which for the best-case value of , is about 0.01, so in principle this changes M_k by about 1% which corresponds to worsening k by one or two here.

In principle we can use the more advanced truncation used in Polymath8a, but it will be harder to estimate the errors; in 8a we had the advantage of some convexity properties of the Bessel function which allowed one to get civilised bounds on the errors, but from the numerical polynomials we have here, it is unlikely that we will be able to detect any analogous convexity.

An alternative approach would be to try to redo the Type I/II/III estimates with the aim of getting delta as large as possible (rather than varpi). We can take for free with the Selberg sieve construction; our Type III estimates can cope with such large values of delta (taking sigma=1/10), but not the Type I or Type II estimates, with the Type I being the bottleneck.

I guess what one can do is to start with the extremisers for the untruncated polytope that Pace located for a test value of k, say k=50, and try to compute what happens to M_k when one applies naive truncation for a given value of delta; i’ll try to work out some relevant formulae at some point (but today is busy for other reasons for me…).

6 January, 2014 at 5:53 pm

Aubrey de GreyThank you for your perseverance, Pace! Let us hope that this (or a similarly efficient) choice of signatures is similarly effective for M’ and especially M”.

6 January, 2014 at 6:11 pm

Eytan PaldiThis confirms our earlier expectations about (it was expected that should be within 0.001 from 4.)

7 January, 2014 at 1:59 pm

Aubrey de GreyInspired by the accuracy of my earlier prediction that M_54 would hit 4 when d=22, I’ve looked a little harder and found a way to quantify more precisely the trajectory of M_k with increasing d, which takes account of the fact that the convergence rate C_k,d = (M_k,d – M_k,d-1)/(M_k,d-1 – M_k,d-2) creeps up with d, and also of the fluctuation we have noted, i.e. that C_k,d is slightly larger than the trend when d is odd and smaller than the trend when d is even. Using the numbers that Pace just provided for M_53,d for d up to 13, I looked at the trend of C_k,d for even d and for odd d separately, and it seems that (C_k,d+2 – C_k,d) is very close to half of (C_k,d – C_k,d-2) when d exceeds 8; this also holds for the k=54 numbers that Pace provided before Christmas. I conclude that C_53,d converges to about 0.832, and that M_53,d is expected to surpass 4 at d=31. This may be of lower priority to explore than looking at M’ and M”, but the validity of the technique may be worth checking (in due course) in those settings too so that one can quickly home in on the right range of d to get the optimal k.

7 January, 2014 at 2:47 pm

Aubrey de GreyIn order to explore whether the above extrapolation is likely to hold up, let me give the values it predicts for M_53 for d=14 through 18, since Pace notes that those are easy to compute. I get 3.925720, 3.938891, 3.949805, 3.958898, 3.966443.

7 January, 2014 at 4:50 pm

Aubrey de GreyPS: this extrapolation does not take into account the feature, mentioned by Pace earlier, that M may rise more slowly with d once d exceeds k/2. However, the extrapolation has M reaching 3.995 by the time d=26, so there’s reason for optimism that the remaining 0.005 can be bridged, even though d=31 may not quite suffice.

8 January, 2014 at 10:59 am

Pace NielsenHere are the actual values: 3.925704, 3.938848, 3.949699, 3.958708, 3.966121. [I’m now leaving these sorts of computations in favor of the idea Terry introduced. I should have some results there later today.]

8 January, 2014 at 8:50 am

Eytan PaldiConsidering the one dimensional approximation of by a function represented as a one dimensional integral (using e.g. probabilistic interpretation of the integral), is it possible to generalize it for approximation of type by a (sufficiently simple) two dimensional integral ?

8 January, 2014 at 9:12 am

Eytan PaldiI meant that perhaps there is a sufficiently simple corresponding variational problem involving ratio of two dimensional integrals with optimal solution which may be found exactly.

8 January, 2014 at 9:51 am

Terence TaoWell, in some sense this is what James and Pace have been doing numerically for moderate k: in James’ orginal paper he worked with polynomial combinations of the first two symmetric polynomials , .

In principle, with the ansatz the problem does collapse to a two-dimensional one, but I am not sure it is so easily solvable. The thing is that the marginal distribution of becomes an integral of f along a parabola. If one replaces P_2 by the elementary symmetric polynomial then the marginal becomes an integral of f along a line, which looks a bit like a Radon transform so perhaps there is some chance of a clean formula there. On the other hand, the image of the simplex could be a little weird in these coordinates (I haven't tried to do the calculations yet).

Pace mentioned the phenomenon that the (partial) basis consisting of with all coefficients of even was a surprisingly efficient basis. I wonder if a similar phenomenon happens with James' original choice of partial basis. If so, this may suggest further investigation of the ansatz.

8 January, 2014 at 11:18 am

Pace NielsenFollowing Terry’s description of the expansion, I think I have worked out the case, but I want to double-check here that the formulas I’m using for the integrals are correct. I used:

and

If those are correct, my computations yield that using an arbitrary polynomial symmetric of degree 3. Increasing the degree further allows for slight improvements in the value of .

8 January, 2014 at 11:35 am

Terence TaoYes, this looks like the correct formulae to me (assuming , of course).

The fact that M_3 improved from 1.646 to 1.8615 (close to the 1.87459 that you got earlier from the 1D analysis without truncating to the unit cube) is a very promising sign, given that the separate improvement of getting from M_3 to M”_3 moves from 1.646 to 1.914. It gives hope that a combination of the two ways to “think outside the simplex” could even crack 2 in the k=3 case, giving on EH (which I think would be a good point to declare victory for the EH case ;-) – if we somehow manage to get a fantastic value of M_3 like 2.5 or so then we might think about using the trick to even hope for , but I doubt that all our ideas put together will get that far yet.)

8 January, 2014 at 12:01 pm

Pace NielsenI worked out the case and get that . If someone would like to look over my Mathematica code to verify the computation just email me. (It involves a mixture of stuff I wrote, and stuff I borrowed from James; but it is very straightforward.)

I’ll start thinking about how to do the case with vanishing marginals and expansion together. For some reason the vanishing marginals condition is still somewhat mysterious to me, and I have a few other jobs to finish this week, but I hope to get to it soon.

8 January, 2014 at 12:10 pm

Aubrey de GreyBravo! As Terry has noted in the new thread, the vanishing is automatic for M’ (though not for M”), so it may be useful to start with M’, especially since (in the pre-epsilon world) M’_3 beats M_3 by almost 2/3 as much as M”_3 does.

8 January, 2014 at 6:08 pm

Pace NielsenTerry is correct, I was using the wrong notation. Those bounds are for .

8 January, 2014 at 12:20 pm

Eytan Paldican you please give more information on the maximal degree and the number of monomials used for this new record?

8 January, 2014 at 12:31 pm

Aubrey de GreyActually this seems even more promising when noting that (again in the pre-epsilon setting) M’_3 is only 0.003 less than M_4 – hence you only need a similar improvement of M’ from introducing epsilon as you achieved for M, in order to reach 2 without any need to explore vanishing marginals.

8 January, 2014 at 12:39 pm

Pace NielsenEytan, I only used degree 1. It appears that when using larger degree, you can get close to the number predicted by your work.

8 January, 2014 at 1:00 pm

Eytan PaldiIt is really surprising (almost “too good to be true”) to get this remarkable record using only degree 1 monomials!

8 January, 2014 at 1:48 pm

Pace NielsenI agree it is almost too good to be true, which is why I want someone to double check my computation. (But given that the computations agree with the old ones when , and that the numerics seem to be converging below the bound you obtained, I think everything is okay.)

8 January, 2014 at 2:35 pm

Eytan PaldiThe bound is not sufficient (a modified bound corresponding to a variational problem for a simplex truncated by a cube is needed – I’m trying to solve it.)

8 January, 2014 at 2:41 pm

Terence TaoI think there may be some confusion with regards to notation: I believe (based on his description of his k=3 analysis) that when Pace is referring to , he is in fact referring to what we’ve also called , in which the function F is supported on the truncated simplex , rather than the untruncated simplex . For the latter, Pace got a lower bound of 2.01869 for k=4 using the 1D ansatz at https://terrytao.wordpress.com/2013/12/20/polymath8b-iv-enlarging-the-sieve-support-more-efficient-numerics-and-explicit-asymptotics/#comment-262158, which is consistent with the slightly smaller bound of 2.0023 reported for (what I believe is) the truncated simplex problem .

8 January, 2014 at 2:36 pm

Terence TaoExcellent! So it looks like we have k=4 (and hence ) on EH, finally dethroning that stubborn bound of 12 on EH that we had not been able to improve upon since the start of Polymath8b. (If you have explicit coefficients for the polynomial optimiser, we would be able to check the various four-dimensional integrals separately with just about any computational mathematics package.)

8 January, 2014 at 6:09 pm

Pace NielsenI’ll get those coefficients to you tomorrow.

8 January, 2014 at 9:10 pm

Pace NielsenI planned to give you the polynomial from my work computer tomorrow, but knowing that we only need a linear solution it was a simple task to solve for the polynomial another way. So take .

8 January, 2014 at 10:25 pm

Terence TaoMaple confirms the value of 2.002351848 for these choices of parameters – looks like on EH is solid.

eps := 0.18;

a := 0.784;

denominator := int( (1-a*t)^2 * t^3/6, t=0..1+eps) - 4 * int( (1-a*(t+1))^2 * t^3/6, t=0..eps );

numerator := int( int( 1-a*(t+s), s = 0..1)^2 * t^2/2, t=0..eps) + int( int( 1-a*(t+s), s=0..1+eps-t)^2 * t^2/2, t=eps..1-eps);

M := 4 * numerator / denominator;

evalf(M);

9 January, 2014 at 6:28 am

Eytan PaldiIt is really impressive to get this new record using only linear one dimensional sieve! this shows that the -relaxation method is very powerful!

(now it seems that even is possible.)

17 April, 2014 at 9:05 am

Tony FengI apologize if I am misunderstanding, but this requires only EH and not GEH, right? I am wondering why the wiki page still says that 12 is the best result known on EH only.

17 April, 2014 at 9:17 am

Terence TaoBack in January we didn’t fully appreciate the difference between EH and GEH, and referred to the two hypotheses interchangeably. As it turns out, if we wish to leave the standard simplex (i.e. to rely on any M-quantity other than the “vanilla” ) then (when ) we have to use GEH to control the “denominator” in the sieve analysis; it *may* be possible to rely just on EH using Bombieri’s asymptotic sieve, but this has not been attempted.

8 January, 2014 at 11:21 am

Polymath8b, V: Stretching the sieve support further | What's new[…] for small values of (in particular ) or asymptotically as . The previous thread may be found here. The currently best known bounds on can be found at the wiki page (which has recently returned to […]