This is the fourth thread for the Polymath8b project to obtain new bounds for the quantity

either for small values of (in particular ) or asymptotically as . The previous thread may be found here. The currently best known bounds on are:

- (Maynard) Assuming the Elliott-Halberstam conjecture, .
- (Polymath8b, tentative) . Assuming Elliott-Halberstam, .
- (Polymath8b, tentative) . Assuming Elliott-Halberstam, .
- (Polymath8b, tentative) . (Presumably a comparable bound also holds for on Elliott-Halberstam, but this has not been computed.)
- (Polymath8b) for sufficiently large . Assuming Elliott-Halberstam, for sufficiently large .

While the bound on the Elliott-Halberstam conjecture has not improved since the start of the Polymath8b project, there is reason to hope that it will soon fall, perhaps to . This is because we have begun to exploit more fully the fact that when using “multidimensional Selberg-GPY” sieves of the form

with

where , it is not necessary for the smooth function to be supported on the simplex

but can in fact be allowed to range on larger sets. First of all, may instead be supported on the slightly larger polytope

However, it turns out that more is true: given a sufficiently general version of the Elliott-Halberstam conjecture at the given value of , one may work with functions supported on more general domains , so long as the sumset is contained in the non-convex region

and also provided that the restriction

More precisely, if is a smooth function, not identically zero, with the above properties for some , and the ratio

is larger than , then the claim holds (assuming ), and in particular .

I’ll explain why one can do this below the fold. Taking this for granted, we can rewrite this criterion in terms of the mixed derivative , the upshot being that if one can find a smooth function supported on that obeys the vanishing marginal conditions

and

then holds. (To equate these two formulations, it is convenient to assume that is a downset, in the sense that whenever , the entire box lies in , but one can easily enlarge to be a downset without destroying the containment of in the non-convex region (1).) One initially requires to be smooth, but a limiting argument allows one to relax to bounded measurable . (To approximate a rough by a smooth while retaining the required moment conditions, one can first apply a slight dilation and translation so that the marginals of are supported on a slightly smaller version of the simplex , and then convolve by a smooth approximation to the identity to make smooth, while keeping the marginals supported on .)
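The mollification step just described (convolving a rough function with a narrow approximation to the identity, which smooths it while barely enlarging its support and leaving its integral essentially unchanged) can be illustrated numerically in one dimension. This is only a toy sketch: the grid, the indicator function and the triangular kernel are all invented for illustration, and the triangular bump merely stands in for a genuinely smooth mollifier.

```python
import numpy as np

# Grid on [0, 1) and a rough (indicator) function supported in [0.3, 0.6].
n = 1000
dx = 1.0 / n
x = np.linspace(0.0, 1.0, n, endpoint=False)
f = np.where((x >= 0.3) & (x <= 0.6), 1.0, 0.0)

# A narrow triangular bump, normalised to sum to 1, standing in for a smooth
# approximation to the identity.
m = 20
kernel = np.concatenate([np.arange(1.0, m + 1), np.arange(m - 1.0, 0.0, -1.0)])
kernel /= kernel.sum()

# Convolving smooths f while enlarging its support by at most m*dx = 0.02 on
# each side and leaving its integral unchanged (up to floating-point error).
f_smooth = np.convolve(f, kernel, mode="same")

print(abs(f.sum() * dx - f_smooth.sum() * dx) < 1e-9)  # integral preserved
print(f_smooth[x < 0.27].max() == 0.0)                 # support barely grows
```

The same mechanism applies coordinate-wise in the multidimensional setting, which is why the marginals can be kept supported on a slightly enlarged simplex.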

We are now exploring various choices of to work with, including the prism

and the symmetric region

By suitably subdividing these regions into polytopes, and working with piecewise polynomial functions that are polynomial of a specified degree on each subpolytope, one can phrase the problem of optimising (4) as a quadratic program, which we have managed to work with for . Extending this program to , there is a decent chance that we will be able to obtain on EH.
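For readers curious how such an optimisation is set up in practice: once a (piecewise) polynomial basis is fixed, maximising a ratio of two quadratic forms in the coefficient vector reduces to a generalised eigenvalue problem. The sketch below uses invented 3x3 Gram matrices in place of the actual integrals over subpolytopes; it demonstrates only the mechanism, not the real computation.

```python
import numpy as np

# Toy Gram matrices: A plays the role of the numerator quadratic form and B
# the denominator form (B must be positive definite).  In the real computation
# the entries would be integrals of products of basis polynomials over the
# subpolytopes; these values are invented purely for illustration.
A = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.0, 0.1],
              [0.0, 0.1, 0.5]])
B = np.array([[1.0, 0.1, 0.0],
              [0.1, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

# Maximising (c^T A c) / (c^T B c) over coefficient vectors c is a generalised
# eigenvalue problem A c = M B c.  Reduce it to an ordinary symmetric
# eigenproblem via the Cholesky factorisation B = L L^T.
L = np.linalg.cholesky(B)
Linv = np.linalg.inv(L)
C = Linv @ A @ Linv.T
w, V = np.linalg.eigh(C)      # eigenvalues in ascending order

M = w[-1]                     # the maximal ratio
c_opt = Linv.T @ V[:, -1]     # optimising coefficient vector

# Sanity check: the Rayleigh quotient at c_opt equals M.
ratio = (c_opt @ A @ c_opt) / (c_opt @ B @ c_opt)
print(abs(ratio - M) < 1e-10)
```

The vanishing marginal conditions enter as linear constraints on the coefficient vector, which is what turns the full problem into a quadratic program rather than a bare eigenproblem.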

We have also been able to numerically optimise quite accurately for medium values of (e.g. ), which has led to improved values of without EH. For large , we now also have the asymptotic with explicit error terms (details here) which have allowed us to slightly improve the numerology, and also to get explicit numerology for the first time.

** — 1. More details on the relaxed support conditions — **

Fix as above, which we take to be a polytope that is also a downset, and let be the class of smooth functions , not identically zero, supported on whose restrictions (2) are all supported on . We need a technical analysis result: that any in can be approximated by a finite linear combination of tensor products , each of which is supported in a cube in the interior of of arbitrarily small specified side length, such that the ratios (3) for and are arbitrarily close. This is almost a standard application of Stone-Weierstrass, but the vanishing properties of mean that some care has to be taken.

First, by using a slight dilation to shrink a bit, we can assume that is supported on for some small , and that all the restrictions (2) are supported on , while only modifying (4) by an epsilon. We extend from to by setting

is now Lipschitz and piecewise smooth instead of smooth. Applying a translation (and then truncating back to ), we can keep supported in while also vanishing in a neighbourhood of the regions

for each . This does not affect (3), but is now piecewise smooth and Lipschitz rather than smooth. However, by convolving with an appropriate approximation to the identity, we can make smooth again, while still supported in a slightly smaller version of obeying the above-mentioned vanishing property. By using the Lebesgue differentiation theorem one can show that (3) is almost unaffected by this if we make the approximation to the identity sufficiently narrow. We can then smoothly partition into smooth functions supported on small cubes, each contained in and avoiding the regions (5) even after a slight dilation, and then by using Fourier series one can approximate each such component of by a finite combination of tensor products of smooth functions that are still supported in and avoiding (5), and the claim follows.

From all this (and the fact that the property of (3) being larger than is an open condition) we may assume that is a finite sum of tensor products supported on small cubes in avoiding (5). For the GPY argument to work (as laid out in this previous post), we need the asymptotics

and

to hold, where

We can generalise these bounds slightly to the depolarised bounds

and are linear combinations of smooth tensor products supported on small cubes in . By linearity, we may assume that are each supported on a small cube respectively.

For the sums (7), the “” calculations in the previous blog post already suffice, because avoids the regions (5) and so the divisors appearing in have magnitude at most (or for some small ). For (6), a compactness argument and the inclusion of in (1) shows that if the cubes are small enough, we have either

or

for some and some . In the former case, we can compute much as in the “” calculation in the previous blog post. Now suppose that we are in the latter case. Without loss of generality let us take , thus

If we use the tensor product structure to write

and

for smooth supported in appropriate cubes or intervals, we can refactor

One can write as a Dirichlet convolution , where

As has component in , is supported on values of that are at most . Assuming an Elliott-Halberstam conjecture for such Dirichlet convolutions, we see that is well-distributed in most residue classes of modulus up to , and one can then obtain asymptotics for (8) by a computation very similar to that used to establish (7); we omit the details.
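As a reminder of the operation being used here, the Dirichlet convolution of two arithmetic functions is (alpha * beta)(n) = sum over d dividing n of alpha(d) beta(n/d). A minimal sketch with generic arithmetic functions (nothing below is specific to the sieve weights in question):

```python
def dirichlet_convolve(alpha, beta, n):
    """(alpha * beta)(n) = sum over divisors d of n of alpha(d) * beta(n // d)."""
    return sum(alpha(d) * beta(n // d) for d in range(1, n + 1) if n % d == 0)

one = lambda k: 1  # the constant function 1

# Convolving 1 with itself counts divisors: d(12) = #{1,2,3,4,6,12} = 6.
print(dirichlet_convolve(one, one, 12))  # -> 6
```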

## 149 comments


20 December, 2013 at 2:56 pm

Derek O.: As an amateur mathematician and number theorist, I’ve been keeping up with this project since May. I know Tao states the possibilities for H_1 in the very beginning; however, I just want to confirm. What’s the lowest unconditional bound we have on H_1? I know Maynard got it down to 600 but since then, I’ve seen 576, 330, 300, and 272. I know that H_1 is 16 if we assume EH. But under no assumptions, what’s the lowest H_1 we have? Thanks in advance for your response!

20 December, 2013 at 3:58 pm

Pace Nielsen: We are currently at 272, under no assumptions other than that my program is computing correctly. (I would place the accuracy very high, because even after working with a completely new basis, I still get the same results.)

21 December, 2013 at 5:10 am

Klaus Lange: If I understand right, under EH it seems like H_1 = 8.

20 December, 2013 at 3:12 pm

Aubrey de Grey: A couple of small observations, relating to the case but possibly generalisable to , about the search for the ideal R:

1) When , the “Pace polytope” (the subset of the unit cube in which ) is strictly smaller than the prism in which . I think all other polytopes defined by slicing the cube through (1/k-1,1/k-1…) also are.

2) No admissible R can be created by adding material to the prism. This can be seen for any candidate point t = (t_1,..t_k) where by considering the point u = (u_1,..u_k) where u_k = 1 and the others are : the midpoint between t and u does not lie within the union-of-prisms, so t+u fails the sumset constraint.

This seems to suggest, though clearly it does not rigorously prove, that the prism is the optimal R’’ for (assuming the Selberg sieve). I wonder if this can be rigorised.

20 December, 2013 at 3:24 pm

Eytan Paldi: It seems that there is a typo in the definition of above.

[Reformatted -T.]

20 December, 2013 at 5:14 pm

Eytan Paldi: This definition is somewhat different from that in the wiki page (with the additional constraint on to be the minimal one.)

20 December, 2013 at 5:31 pm

Terence Tao: The two are equivalent: requiring that for all is equivalent to requiring that .

20 December, 2013 at 5:52 pm

Eytan Paldi: Thanks! I understand (it is somewhat confusing.)

20 December, 2013 at 6:33 pm

Gergely Harcos: Dear Terry, when you say the bound will hopefully fall to soon, do you mean it under Elliott-Halberstam, or unconditionally?

[Um, the former; I’ve added a clarification. -T.]

20 December, 2013 at 8:13 pm

Gergely Harcos: Thanks! Also, there is a typo in the formula for : the argument of should be fractions, not just one.

[Corrected, thanks – T.]

21 December, 2013 at 1:24 am

Anonymous: If you use “\dots” instead of “\ldots”, you will (automatically) get the correct placement of the ellipsis.

21 December, 2013 at 2:29 pm

Aubrey de Grey: In advance of a computed value for M”_4 (let alone higher M”_k), and indeed while there remains decided uncertainty as to whether current ideas can feasibly support calculation of M”_k for double-digit k, I was just looking at the data for M’_k, and I noticed an intriguing detail – the current lower and upper bounds on M’_k do not diverge, being (albeit only slightly) closer for k=5 than for k=4. I’m presuming that combinatorial explosions prevent James’s or Pace’s code from deriving M’_k (even for small d) for k in the range of current interest for H_1 on BV, but, in the recent spirit of examining numerical trends in order to guess key values to investigate, I’m wondering whether extrapolation from single-digit k could still be a cheap route to predicting (and then, of course, hopefully confirming) a k less than 54 for which M’_k exceeds 4. The starting-point would be to calculate just a few more values of M’_k, for k = 6, 7 etc., and see whether the sequence of shortfalls relative to the corresponding upper bounds fits some simple curve or other tightly enough (and, of course, converges fast enough) to lead to a confident (though approximate) prediction of the threshold k. James, Pace: I’m unclear as to how far you can push your M’_k computations in feasible CPU times. What can you tell us?

21 December, 2013 at 3:48 pm

Eytan Paldi: It seems that in (1), a union with (which appears in the wiki page) is missing.

[Ah, I had forgotten about that additional region; I’ve modified the argument in the post to accommodate it. -T.]

21 December, 2013 at 5:10 pm

Eytan Paldi: Therefore we can replace the symmetric region above by its union with .

21 December, 2013 at 5:57 pm

Terence Tao: Good point! In particular, in the Bombieri-Vinogradov case , one can take , so the problem is now to maximise the usual Rayleigh quotient over functions on whose marginals are supported on . Note that has the volume of , so in principle we have gained a huge amount of extra room to work with here (though even in the best case in which marginal conditions are ignored, only improves by a factor of 2).

One nice feature of this problem is that we can in principle attack it without subdividing the polytope, as follows. Observe from the fundamental theorem of calculus that if is a polynomial on that vanishes to order k on the slanted boundary and to first order at the remaining boundary (i.e. it is a multiple of ), then the k^th partial derivative has vanishing marginals. Thus, we can work with functions F on of the form

for arbitrary smooth , and the vanishing marginal property will be automatically satisfied (indeed the marginals for F are just the marginals for ). One can then try to optimise this numerically by selecting from symmetric polynomials of bounded degree as before; I’m not sure that this covers all possible F, so it may not give the optimal M”_k, but it should at least beat M_k, and might be computable for medium values of k, say k=50, allowing us to improve the current value of k=55 that we have on Bombieri-Vinogradov.

21 December, 2013 at 9:41 pm

Pace Nielsen: I’m unfamiliar with the notation . Is this just the partial derivative of with respect to each of the variables ?

Also, do you not need an indicator function on the second summand (in your decomposition of ), so it is only supported on ? [I imagine the answer is that it was implicitly assumed that is only supported on , but I just want to double-check.]

[Yes to both -T.]

22 December, 2013 at 4:37 am

Eytan Paldi: I think that this nice parametrization of with the vanishing marginals property should considerably decrease the computational complexity.

It seems, however, that for real analytic (in particular, polynomial) on , any marginal which vanishes outside should vanish identically (which is more than required.)

A possible generalization is to dissect the outside of into a small number of polytopes and try to find similar parametrizations (perhaps with an additional small number of linear constraints) for the vanishing marginals on them.

Perhaps it is possible to estimate (by similar upper and lower bounds) the resulting (which may improve the current asymptotic bounds for .)

22 December, 2013 at 4:02 am

Aubrey de Grey: Might it be worth adding a column to the table at http://michaelnielsen.org/polymath1/index.php?title=Selberg_sieve_variational_problem to distinguish the M” bounds for theta = 1 versus theta = 1/2 ? It currently seems a little misleading that k seems to need to reach 20 or so in order for the upper bound on M”_k to reach 4, whereas when theta = 1/2 the threshold k would be only 6. Or can a better upper bound on M”_k,1/2 be given now, since the current one was based on R”_k being the prism?

22 December, 2013 at 9:14 am

Eytan Paldi: Interestingly, condition (2) above does not include the slanted boundary vanishing condition.

22 December, 2013 at 9:18 am

Terence Tao: The function F need not vanish on the slanted boundary, but in order for to have vanishing marginals, one needs f to vanish to order k at this boundary (but this does not give any vanishing of at this boundary).

22 December, 2013 at 2:17 pm

Eytan Paldi: It seems that the only difficulty in applying this method is the choice of a basis of symmetric polynomials such that the coefficients of the two matrices needed for the optimization program can be evaluated efficiently (this could be difficult because of the repeated partial derivatives – which for medium-sized k can produce many monomials from a single one.)

22 December, 2013 at 8:20 pm

Pace Nielsen: I’m a little confused. In the decomposition above, take . Thus . For simplicity, take (but any will do); in this case we have . Thus, , which is not merely supported on , hence does not have the vanishing marginal condition. I’m probably just missing something simple here.

22 December, 2013 at 8:32 pm

Terence Tao: One has to integrate from 0 to , rather than from 0 to 2.

22 December, 2013 at 9:09 pm

Pace Nielsen: I just figured out where I went wrong, but you beat me to the punch. Thanks for the answer. (Another way to put it is that one can integrate from 0 to 2, but then the integral has to break into two pieces, due to presence of the indicator function.)

It appears that this new term is making no contribution at all to the integrals, so the hope is to somehow use it to make the integral smaller.

Write where is supported on and is supported on . Then . It looks like the extra area in is both working for and against us, because it both allows us to try and make small, but at the same time makes the integral of larger. I’ll try to do some computations tomorrow if I can.

If anyone has a specific they want me to look at, just post it and I’ll look tomorrow morning.

23 December, 2013 at 2:49 am

Aubrey de Grey: Since no one else has suggested a specific k, I’ll go out on a limb and suggest 43 and 24, since they give H = exactly 200 and 100 respectively.

22 December, 2013 at 1:40 am

Andrew Sutherland: Some further improvements to bounds on H:

395,154 for m=2 using k=35,146.

24,490,410 for m=3 using k=1,630,680.

485,825,850 for m=4 using k=25,589,558.

The last bound is obtained by taking k consecutive primes starting with 2,504,011.
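The construction mentioned here is easy to sketch at toy scale: a tuple of k primes that all exceed k is admissible (no prime p <= k divides any element, so the residue class 0 mod p is avoided, while a prime p > k cannot have all its residue classes covered by only k elements), and the resulting bound on H is the tuple’s diameter. The following sketch uses a tiny k; the actual records use k in the millions together with further refinements such as Hensley-Richards and greedy sieves.

```python
def is_prime(n):
    """Trial-division primality test; fine for toy-scale demonstrations."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def diameter_first_k_primes_above(k, start):
    """Diameter of the tuple of the first k primes greater than `start`.

    When start >= k, the resulting k-tuple is admissible, so its diameter
    is a valid bound on H for the corresponding problem.
    """
    primes, n = [], start
    while len(primes) < k:
        n += 1
        if is_prime(n):
            primes.append(n)
    return primes[-1] - primes[0]

# Toy example: the first 10 primes above 10 are 11, 13, ..., 43.
print(diameter_first_k_primes_above(10, 10))  # -> 32
```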

22 December, 2013 at 9:52 am

Andrew Sutherland: The bound for m=2 can be improved to 395,122. The bound for m=4 can be improved to 473,244,502.

The last bound comes from the (symmetric) Hensley-Richards sieve, which yields the admissible 25,589,558-tuple consisting of the primes in [2623573,236622251], their negations, and +/-1.

22 December, 2013 at 2:49 pm

Wouter Castryck: There appears to be a precision issue with the value of k for m = 4. One can increase the precision by putting e.g. Digits := 50; at the top of the Maple file (the default value is 10).

For m = 2 and m = 3 I think the values are correct, although it seems that for m = 3 one can improve k to 1,628,944 by letting c := 1.0043/log(k) and T := 0.8008/log(k).

For m = 4 one can obtain k = 75,000,000 by letting c := 0.999/log(k) and T := 0.838/log(k).

For m = 5 one can similarly find k = 3,400,000,000 by taking c := 1.012/log(k) and T := 0.710/log(k).

Note that the ratios

all lie close to the asymptotic upper bound .

22 December, 2013 at 3:26 pm

Terence Tao: Thanks for catching that! I had noticed that the m=4 values of k were curiously small, but I trusted in Maple without investigating it further.

I’ve moved the enormous table to a subpage of the wiki ( http://michaelnielsen.org/polymath1/index.php?title=Timeline_of_prime_gap_bounds ), leaving only the current “world records” for the main page. For anyone interested in filling in the blanks:

* to get a value of k for a given m on EH, one can run the previous Maple code (with enough digits of precision, of course), with the aim of getting M_k (which in the Maple code is called M0-Delta) greater than 2m. For instance, if one can get M0-Delta larger than 6, then this will be a good value of k for the m=3 EH problem. (The portion of the code relating to varpi and delta may be deleted in this case.)

* to get a value of k for a given m without EH but with Deligne, run the code as is.

* to get a value of k for a given m without EH or Deligne, replace 1080*varpi/13+ 330*delta/13 in the last line of the code with 168*varpi+48*delta.

22 December, 2013 at 4:57 pm

Andrew Sutherland: For m=3 under EH, it appears that with k=5511, c=0.965 and T=0.973, we get M0-Delta > 6. This then gives the bound 52,130.

22 December, 2013 at 6:49 pm

Andrew Sutherland: For m=3 without EH or Deligne, it appears that k=2114964, c=1.00/log(k), T=0.83/log(k) works.

Hensley-Richards then gives the bound 33,661,442.

(also, the values for c and T given for m=3 with EH above should of course be divided by log(k)).

23 December, 2013 at 2:43 am

Aubrey de Grey: The value for m=2 without EH or Deligne is still the value under just BV, I think (identical to the value for m=4 under EH); presumably it can be (slightly) reduced.

23 December, 2013 at 3:39 am

Andrew Sutherland: Actually, both values can be reduced.

For m=2 without EH or Deligne we can get k=41709 using c=0.9972/log(k) and T=0.84/log(k). This gives the bound 476,322.

For m=4 with EH we can get k=41589 using c=0.9793/log(k) and T=0.94/log(k). This gives the bound 474,600.

23 December, 2013 at 3:46 am

Aubrey de Grey: That’s odd; I thought that any k that works for m=2N under EH should also work for m=N under BV, and a smaller k should always be achievable when we only ask to avoid EH or Deligne (except when N=1 because k is now so small). Is that not so?

23 December, 2013 at 4:27 am

Andrew Sutherland: I was independently optimizing the parameters in the two cases, per Terry’s post above. I certainly don’t claim to have found the optimal values of c, T, k, so possibly there are parameter choices that give the same value of k in both cases, but I don’t think so. This doesn’t contradict what you are saying, because the Maple computations are giving upper bounds on k that aren’t necessarily tight.

If this is indeed the case, it suggests that optimizing the EH case for m=2N might give slightly better bounds on k for the BV case with m=N.

23 December, 2013 at 8:51 am

Terence Tao: If one takes values of c,T that work for m=2N under EH, then they will automatically work for m=N under BV; the value of varpi computed will be negative, but that’s OK: BV is basically the assertion that all negative values of varpi are admissible.

23 December, 2013 at 7:11 am

Andrew Sutherland: For k=2,114,964 (m=3 no EH or Deligne) we can get H down to 32,313,942.

23 December, 2013 at 9:08 am

Andrew Sutherland: Of course, this makes sense (I hadn’t actually noticed that varpi was negative), thanks for the clarification.

22 December, 2013 at 7:17 pm

Andrew Sutherland: For m=5 under EH it looks like k=309954, c=0.9821/log(k), T=0.95/log(k) works.

Hensley-Richards then gives the bound 4,316,466.

23 December, 2013 at 3:32 am

Andrew Sutherland: Using a greedy sieve, the m=5 EH bound with k=309954 can be improved to 4,146,936.

Also, the m=3 EH bound with k=5511 can be improved to 52,116.

22 December, 2013 at 8:45 pm

xfxie: For m=4, k can be decreased to 74487363.

For m=3, k can only be decreased by 1, to 1628943.

23 December, 2013 at 3:16 am

Andrew Sutherland: For m=3, k=1628943 (Deligne, no EH) we can get 24,462,774.

For m=4, k=74487363 (Deligne, no EH) we can get 1,512,832,958 by taking the first k primes greater than k.

23 December, 2013 at 6:54 am

Andrew Sutherland: There is a slight error in what I wrote above: for m=4, k=74487363 (Deligne, no EH) the bound you get by taking the first k primes greater than k is 1,512,832,950.

23 December, 2013 at 2:59 am

Eytan Paldi: It seems better to place the several explanations and links to wiki pages (now appearing above the large table) above the current records on the main page.

[Explanation and links moved – T.]

23 December, 2013 at 3:00 am

Aubrey de Grey: Terry, is there a simple function of varpi and delta that gives rise to the quantities 52/283 and 4/43 in the asymptotic formulae shown in the new table?

23 December, 2013 at 5:27 am

Aubrey de Grey: Ah, of course: 4*varpi / (1/4 + varpi).

23 December, 2013 at 5:51 am

xfxie: For m=4 with EH, k can be 41588.

23 December, 2013 at 6:58 am

Andrew Sutherland: With k=41588 we can get H down to 474,460.

23 December, 2013 at 10:07 am

Andrew Sutherland: With k=41588 we can get H down to 474,372, which is relevant to both the m=4 EH and m=2 BV cases.

24 December, 2013 at 4:24 am

Aubrey de Grey: By the same token, there’s a good chance that the current m=3 value avoiding EH and Deligne can be reduced by exploring m=6 under EH.

24 December, 2013 at 8:09 am

Andrew Sutherland: Yes, I actually took a look at this yesterday. But, in contrast to the m=4 EH and m=2 BV cases, the parameters k=2114964, m=3, c=1.0/log(k), T=0.83/log(k) (which surely can be optimized further) used in the BV case actually don’t give M_k > 12, and I couldn’t easily find a choice of parameters that works for the EH case with m=6 with k smaller than 2114964 (it’s a bit time consuming to check with k this large and the precision set higher, so I admittedly did not look all that hard).

23 December, 2013 at 6:08 am

xfxie: For m=5 with EH, k can be 309661.

23 December, 2013 at 7:02 am

Andrew Sutherland: With k=309661 we can get H down to 4,143,140.

23 December, 2013 at 8:20 am

Andrew Sutherland: For m=4 without EH or Deligne it looks like k=105,754,838 works. Taking the first k primes greater than k gives the bound 2,186,561,568 on H.

23 December, 2013 at 11:34 am

Andrew Sutherland: Just for the sake of filling in the last blank spot in the table, for m=5 without EH or Deligne it looks like we can get k=5,300,000,000.

Taking the first k primes greater than k then gives the bound 131,161,149,090 on H.

22 December, 2013 at 3:45 pm

Andrew Sutherland: Taking the first k primes greater than k gives the bounds 1,523,781,850 and 82,575,303,678 on H for m=4 and m=5, respectively.

22 December, 2013 at 6:58 pm

Andrew Sutherland: Using Wouter’s improved k=1,628,944 for m=3 (with Deligne, no EH) the bound on H can be improved to 24,462,790.

23 December, 2013 at 8:09 am

James Maynard: I thought I’d give a quick update on the computation for using the prism . I get the bound , which is only a modest improvement on the previous bound .

It is possible that one needs to take rather higher degree approximations, since the vanishing conditions severely restrict the number of degrees of freedom of small degree polynomials (with degree 5, they reduce about 200 degrees of freedom for the coefficients of polynomials outside of to only 20, and it’s much more restrictive for degree 3 or 4). There are currently some RAM issues using higher degree polynomials, however. There’s also a pretty good chance there are typos in my code, since the decompositions are in quite a few regions, although the results seem to pass some simple sanity checks.

23 December, 2013 at 8:20 am

Aubrey de Grey: Many thanks James. Is your code easily modifiable to explore the theta = 1/2 case, and in particular the double-size simplex that has just arisen as an option in that setting? I note that it is larger than the prism even when k=3, so the possibilities are most tantalising.

23 December, 2013 at 8:38 am

Aubrey de Grey: To clarify: I realise that when k = 4 we are only interested in theta = 1, but since the double-size simplex so hugely expands R in the BV setting I’m thinking that the “low k” and “medium k” analyses are now effectively merging, such that your and Pace’s computational approaches will increasingly synergise.

23 December, 2013 at 8:57 am

Pace Nielsen: I think one of the differences between our computations is that unless we come up with a better characterization of the disappearing marginal conditions, my decompositions will be restricted to the form Terry came up with above [to avoid having to break the simplex into an enormous number of subcomponents], whereas it appears that James can take advantage of all degrees of freedom. So it still may be worthwhile to see how the numerics behave for small .

23 December, 2013 at 8:47 am

Pace Nielsen: James, did you try the case for the simplex ? What happens as the degree increases in the simpler case? Do you run into RAM issues as quickly?

23 December, 2013 at 11:04 am

James Maynard: I haven’t got round to that yet, but I don’t think there should be the same issues in the case, since I’d guess it should work more-or-less like the prism.

28 December, 2013 at 4:29 am

Aubrey de Grey: 3/2, not 4/3! (You’ll almost certainly have caught that, but I guess it can’t hurt to make sure.)

23 December, 2013 at 9:22 am

Aubrey de Grey: Also, James, how does M”_4 vary with degree? The recent analysis of M_k (with medium k) in the BV setting suggests that one might be able to use extrapolation to get a pretty good feel for the magnitude of the benefit for M”_4 from increasing d. Maybe that’s overoptimistic if the furthest you’ve been able to go in this case is d=5, but it can’t hurt to have a look at the d<5 values for a speculative trend.

23 December, 2013 at 11:12 am

James Maynard: I think it’s difficult to extrapolate much because of the vanishing constraints. If I remember correctly, when the degree was 2 or less then the polynomials outside of had to just be 0. When the degree was 3 there was only one degree of freedom, and it was only when the degree was 4 that one was really able to overcome the constraints and have enough flexibility to really get an improvement. (By comparison, the polynomial approximations on the region seem to converge quickly, and one can guess quite easily what we might expect here.)

23 December, 2013 at 9:59 am

Terence Tao: That’s slightly disappointing, but at least it is still progress. (Though the k=4 problem is beginning to feel like a real-life Zeno’s paradox, in that we keep halving the distance to the goal without actually reaching it…)

I was trying to see how much this extra room (especially in the BV regime ) gives us in the asymptotic analysis for large m, but thus far the gains I could get in were exponentially small in k, which is perhaps consistent with the degradation we are experiencing as we pass from k=3 to k=4.

Actually this reminds me of your earlier idea to try to compute as an additional term that can essentially be added on top of the M”_k quantity; roughly speaking, we win (in the m=1 case) when

Previously, I had thought this trick was not useful in the EH setting, but now that we have this additional room beyond to play with, and it’s not proving to be super-useful in directly improving M_k, one could perhaps “borrow” some of this room to instead get some nontrivial lower bound on . This is another quantity that is likely to decay exponentially in k, but it could possibly be enough, combined with everything else, to push us over the top. (But there is a tradeoff; in order to get a nontrivial lower bound for this quantity, one has to retract the support of F to be a bit away from the slanted boundaries of R.)

I’m not sure how to best compute this quantity though, and the optimisation could be tricky (it won’t be a simple quadratic program any more). I guess we still have some simpler things to try right now (switching to a different polytope, or increasing the degree), so maybe we can wait until the other ideas are exhausted before taking a harder look at this one.

23 December, 2013 at 10:10 am

Terence Tao: A variant of the above idea is to modify the Selberg sieve combinatorially to remove small prime factors, e.g. by multiplying it with a Brun sieve, say

where is a small power of x. But then one has to keep the support of F at distance or so away from the slanted boundary of R, so it's not clear that this is a win. (But combinatorial sieves can perform slightly better than Selberg sieves when one is sifting very small primes, so there is some hope here.)

23 December, 2013 at 11:06 am

Eytan Paldi: The two choices for above should be replaced by their corresponding unions with .

23 December, 2013 at 11:37 am

Terence Tao: Unfortunately it’s a bit more complicated than this, due to non-convexity: the condition is that

but one cannot simply take R to be the union of, say, the prism and the simplex , because of the cross terms r+r’ where r lies in the prism and r’ lies in the simplex.

23 December, 2013 at 12:14 pm

Aubrey de Grey: Right. Generalising my earlier comment about the impossibility of adding to the prism when theta = 1 (which is still appearing at the very bottom of the page, presumably as a side-effect of Terry’s kind repair of my broken LaTeX): the only way that a non-convex polytope can work is if the half-size version of any convex subregion of it, when translated half-way towards any point in it that is not in that subregion, always lies within the half-size version of the polytope Terry describes for R+R. That’s a very onerous criterion, which I’m not sure can ever hold for an R that is not itself contained within an admissible convex R.

23 December, 2013 at 12:13 pm

Terence Tao: I’m recording a comment which is not directly helpful for the immediate problems at hand, but might be useful later:

For the non-EH cases, in which is close to 1/2, it may be possible to enlarge the region that R+R can range in a little bit more (although it is beginning to appear that such enlargements do not offer a huge amount of gain in the large k case). There are at least two ways to squeeze a bit of further improvement. Firstly, an inspection of the argument in the blog post shows that for the purposes of determining this region, we do not need the EH hypothesis for arbitrary Dirichlet convolutions , but rather for convolutions of the form . Such sums are better behaved than the “Type I/II/III” sums considered in Polymath8a, and it is likely that we can improve the numerology a bit here, which would correspond to adding some additional space to the R+R region (but unfortunately does not improve the support condition on the marginals, as this requires EH for the von Mangoldt function).

Another possible gain comes from trying to get non-trivial estimates on expressions such as

where each lives in a range , and EMT is the expected main term (basically ). Currently, we can control this expression either when (so that and the inner sum is then easy to compute) or when (in which case we can apply to the divisor sum). There may however be a way to control this sum more directly, by completion of sums and then somehow eliminating the sieve weights, either by Cauchy-Schwarz or by using the special structure of these weights (they behave very roughly like smoothly truncated Mobius functions).

Again, this only enlarges the region that R+R is allowed to lie in, and does not improve the condition on the marginals of F, so any gain here is likely to be very modest (and only likely to be useful in the non-EH setting; it's hard for unconditional arguments such as completion of sums to compete with the consequences of the powerful EH conjecture).


24 December, 2013 at 8:00 am

Andrew Sutherland: Some further improvements to bounds on H:

Under EH:

For m=4 with k=41,588: 474,320.

For m=5 with k=309,661: 4,137,872.

Without EH:

For m=3 with k=1,628,943: 24,462,654.

For m=4 with k=74,487,363: 1,497,901,734 is obtained by taking k consecutive primes beginning with 6,663,067 and ending with 1,504,564,801.

Without EH or Deligne:

For m=2 with k=41,588: 474,320 (matching the m=4 under EH bound above).

For m=3 with k=2,114,964: 32,313,878.

28 December, 2013 at 6:48 pm

Andrew Sutherland: A few more improvements under EH:

For m=4 with k=41,588: 474,296.

For m=5 with k=309,661: 4,137,854.

2 January, 2014 at 8:45 am

Andrew Sutherland: For k=41,588 we can bound H by 474,290. This improves both the m=4 with EH and m=2 without EH or Deligne bounds.

9 January, 2014 at 10:06 am

Andrew Sutherland: Further improvement: 474,266.

24 December, 2013 at 9:41 am

Eytan Paldi: In the current records table, is it possible to estimate the orders of the two terms in the exponents? (i.e. their rates of decay for large m.)

24 December, 2013 at 2:25 pm

Terence Tao: I did some calculations and it looks like the o(1) terms are in fact , so for instance we have using Deligne. The point is that the lower bound is applicable with T comparable to , which means that we can take comparable to also. Using our best MPZ estimate, this allows us to take as large as . We win when ; putting this all together, it appears that k can be taken to be of size , which on taking the first k primes past k gives .

24 December, 2013 at 3:30 pm

Eytan Paldi: That’s good! (I asked that because for the EH case there was no term.)

26 December, 2013 at 3:20 am

Aubrey de Grey: Terry, does this mean there should be refinements to your Maple code that Drew and xfxie are using to improve the world records? Since the expressions with and without EH are now equivalent (setting varpi = 1/4 in the EH case), I’m wondering whether the anomaly of sometimes obtaining lower k for m=2N under EH than for m=N without EH or Deligne will go away (or if not, why not).

28 December, 2013 at 6:52 am

Eytan Paldi: If for some , (for the region ), then the current exponents in the asymptotic bounds for can be improved.

29 December, 2013 at 9:42 am

Eytan Paldi: The definition of under the title “current records” on the main page should be for primes (not “ consecutive primes”).

(Perhaps the modified for consecutive primes may also be studied here.)

29 December, 2013 at 9:51 am

Eytan Paldi: I see now my error: according to the current definition of , it is indeed for consecutive primes!

30 December, 2013 at 6:42 am

Anonymous: In the last bullet at the beginning of the post: maybe should be changed to .

30 December, 2013 at 9:35 am

Eytan Paldi: A simple constraint formulation for :

Let . Let on and on (for polynomials .)

Observe that if is outside then is outside .

Therefore, the vanishing marginal constraints are (for )

(1)

Note that the constraints in (1) are only for the polynomial Q.

Remarks:

1. Under the restriction of to be symmetric, only one constraint in (1) (e.g. for ) is required.

2. In general (as seen by the above remark) the resulting constraints may be dependent.

3. Let be a polynomial satisfying

then the constraint for in (1) is

which is equivalent to

i.e. there is a polynomial such that

Now, by taking the derivative with respect to we get the representation

(2)

with the obvious generalization for the other values of (note that the representing polynomial depends on !)

30 December, 2013 at 9:42 am

Aubrey de Grey: Very interesting! But please let’s not use “H” for this purpose, given its other use here for the past many months… and the same goes for “m”…

30 December, 2013 at 3:24 pm

Eytan Paldi: The third remark above is for on the region (on which ) – i.e. each “” appearing in that remark should be understood as the polynomial .

Note also that the computational complexity of the constrained problem (using the above formulation) should be of the same order of magnitude as for the original unconstrained one (for approximation) – enabling approximation of for medium-sized (e.g. ) for which the original optimization program for is practical.

31 December, 2013 at 8:27 am

Terence Tao: It occurs to me that in order to get some understanding of how much additional gain one gets by enlarging the simplex to a larger region such as (while retaining the vanishing marginal conditions of the original simplex), one can work in the more tractable situation of cubes, in particular one can study how the analogue of improves when one replaces a cube with a larger cube , while requiring that the function have marginals supported on . As a crude zeroth approximation (ignoring log factors), is a bit like (although this approximation is too crude – the log factor is critical in making grow in k!).

First observe that the analogue of the original , namely the supremum of the ratio for supported on , is equal to , as can be seen by a routine Cauchy-Schwarz argument (and the extremiser occurs when F is the indicator function of ).

The analogue of is the supremum of the ratio for supported on , but having marginals supported on . Restricting to symmetric functions, we are now trying to maximise

(*)

subject to being supported in .

Another application of Cauchy-Schwarz shows that cannot exceed , and it is certainly at least , but the key question is which of these two bounds is closer to the truth.

We can convert this to a variational problem on as follows. Fix a symmetric function supported on , and consider the problem of maximising (*) subject to F being symmetric and having marginals G. This problem can be solved by Fourier analysis on the cube , which we identify (discontinuously) with the torus . The marginal conditions (and symmetry) constrain all the Fourier coefficients of that have at least one zero frequency ; the remaining Fourier coefficients of F (when all the are non-zero) serve to increase the denominator of (*) (by Plancherel’s theorem) but do not affect the numerator. Thus, the extremum occurs when those coefficients are set to zero. After a bit of fiddling, the extremum of (*) for a given choice of marginals G is then

(**)

where is the orthogonal projection of to those functions on whose Fourier coefficients are only non-vanishing for those frequencies with non-zero and zero. Thus for instance

At this point I’ll resort to back of the envelope calculations. In one dimension, if we set , then and . In this case, absorbs a fraction of the total energy , with absorbing the other . Tensoring this and using the law of large numbers, we expect in high dimension that the with will be absorbing the bulk of the energy of G. For such i, we have . So the second ratio in (**) is approximately , suggesting that is unfortunately close to . (If one considers the tensor power of other functions on than the indicator function, the situation gets worse.)

This suggests that the gain from the additional room is going to be modest. I’ll try to compute (**) more precisely in the case to get a better idea of what the gain will be (my initial guess is that it is exponentially small in k, but I have not done a real calculation here yet).

31 December, 2013 at 9:16 am

Terence Tao: I’ve now done the calculation and the gain is indeed exponentially small in this case :(. We have , so by the binomial theorem

and so it appears that

with extremiser

which is basically shaving off an exponentially small piece from . (I’ve not yet fully verified that this is indeed the extremiser, but I’m reasonably sure of this.)

The situation appears to be very similar to that of James’ old suggestion of trying to lower bound the portion of the sieve where all of the are simultaneously composite. We expect this contribution to be exponentially small. On the other hand, any modification of the sieve that does not affect the marginals is only going to change the sieve on precisely those for which none of the are prime.

There is still a small ray of hope though that in the simplex case things are a little better. One thing about the cube is that every symmetric function on can be written as the marginal of a symmetric function on (this can be seen by Fourier series), however it is not clear to me yet that every symmetric function on can be written as the marginal of a symmetric function on . If this is not the case, then there could be additional gain coming from enlarging the simplex that is not present in the cube case. (But the pessimist in me thinks that this additional gain will also be exponentially small.)

1 January, 2014 at 6:44 am

John Nicholson: Is equivalent to ? The reason for the question is the Ramanujan prime corollary found here:

https://en.wikipedia.org/wiki/Ramanujan_prime#Ramanujan_prime_corollary

It is similar to what is stated above, but brings the n values lower.

2 January, 2014 at 7:47 pm

Terence Tao: I managed to work out the gain from James’ idea of subtracting off the contribution of when all the are simultaneously composite. I unfortunately don’t know how to get any nontrivial bound on this in the EH case (unless one pulls the support of F away from the boundary of the simplex, but then one will almost certainly lose more from the reduced support than one gains from the simultaneously composite term). In the BV case it appears one can shave about from , which unfortunately is too tiny to be of much use (unless we have the exceptional coincidence that is really, really, really close to 4).

The cleanest way to subtract appears to be to replace the sieve (which, in the BV case, is a divisor sum with moduli going up to about ) by the modified sieve

Note that the big sum here is bounded by , which is basically 1 by the fundamental theorem of arithmetic, so the sieve remains positive (modulo negligible errors). The cap is needed to keep the total divisor less than x, otherwise one can’t estimate the sum accurately. The additional factor does not impact the sums , but reduces slightly, by approximately

which by Mertens’ theorem is roughly . (Actually the presence of the sieve distorts this calculation a bit, but not significantly.)
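The Mertens’-theorem step can be sanity-checked numerically: the partial sums of 1/p over primes track ln ln x plus the Mertens constant M ≈ 0.2615 very closely. A minimal sketch (nothing here is specific to the sieve calculation above):

```python
import math

def sum_inv_primes(n):
    """Sum of 1/p over primes p <= n, using a sieve of Eratosthenes."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, int(n**0.5) + 1):
        if is_prime[p]:
            is_prime[p*p::p] = bytearray(len(range(p*p, n + 1, p)))
    return sum(1.0 / p for p in range(2, n + 1) if is_prime[p])

n = 10**6
mertens_approx = math.log(math.log(n)) + 0.2614972128  # ln ln n + Mertens constant
print(sum_inv_primes(n), mertens_approx)  # these agree closely (Mertens' theorem)
```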

3 January, 2014 at 2:20 pm

Terence Tao: Here is an idea that seems to give some flexibility in going beyond the simplex that the marginals of F are currently restricted to, at the cost of degrading the lower bounds that end up in the numerator of the M_k-type optimisation problem.

For sake of concreteness let us take k=3 and assume EH. We would like to lower bound quantities such as

(*)

where is the GPY Selberg sieve

for some smooth function , with in the EH case; we usually write things in terms of the mixed derivative of , for instance is proportional to (i.e. ) if F is supported in a suitable polytope (e.g. will do, we can also work with some larger regions R as discussed in the past).

For (*), the cutoff ensures that the factor in the definition of must be 1, and the expression simplifies a bit to

where is the (signed) divisor sum

.

If is supported on the simplex (which, in terms of F, means that the marginal vanishes for ), then the divisor sum in only involves moduli up to , and so on EH we can compute an asymptotic for (*) which is proportional to the familiar expression

Now suppose instead that is supported on a slightly larger simplex for some . Then now involves moduli larger than , and even EH can no longer help us get asymptotics for (*). However, we don’t truly need asymptotics here; a lower bound might suffice. For instance, we have the trivial lower bound of 0 no matter what the support of is. Less trivially, we split , where

and is chosen to agree with for but is otherwise arbitrary. The point is then that only involves moduli up to (the moduli larger than appear in both and , but they cancel each other out) and so EH can again be used to get an asymptotic for this part, which is proportional to . As for the component, we apply the trivial lower bound of 0. The net lower bound then is proportional to

where is the mixed derivative of .

We can choose to vanish for and agree with otherwise (well actually we need to smooth this function infinitesimally, but never mind this technicality), so we get a final lower bound of

The upshot is that we can now allow the marginal to be supported on the larger region , at the cost of replacing the functional appearing in the numerator of the variational problem with the slightly smaller quantity . Similarly for the other two functionals and . This leads to the more general variational problem of maximising

subject to the constraints that the marginals , , are supported on , , respectively, and are parameters that can be chosen arbitrarily, while F also needs to be constrained to a polytope R with (e.g. one can take the prism or the truncated simplex ). The case is our existing M”_3 optimisation problem, so this new problem can’t do worse than what we already have. The question is whether it actually does better, in that the increased flexibility given by relaxing the vanishing marginal conditions “beats” the loss in the numerator; I plan to do a perturbative analysis of the case when the are infinitesimal to see if this is the case. One reason for hope is that (in the k=3 case) if one sets all the way to 1 while keeping at zero (and taking R to be something like for some small ), then one collapses to the M”_2 problem, for which we already have a bound of 2 which is better than the current bound on M”_3 we have.

So these epsilon parameters help explain the puzzling lack of monotonicity of the M”_k that was pointed out some time ago, I think by Pace. Indeed the way I discovered this more flexible lower bound was by trying to reconcile the k=2 theory (in which we have the extensive analysis of Bombieri on EH, for instance) with the k=3 theory by trying to smoothly interpolate between the two.

3 January, 2014 at 6:08 pm

Eytan Paldi: Is it still possible to get for some enlarged region ?

3 January, 2014 at 6:41 pm

Eytan Paldi: I meant that an arbitrarily small improvement of the current result by an arbitrarily small perturbation of (if possible) is sufficient to show that EH implies (i.e. the twin primes conjecture!)

3 January, 2014 at 7:11 pm

Terence Tao: Unfortunately, the parity problem blocks this: given any sieve coming from this sort of framework (as a divisor sum), one expects and to be small (this is because of the Mobius randomness heuristic, which predicts that the Mobius function fluctuates as if it were random on short progressions), which implies that neither of and can be much larger than , which basically means that none of the M”_2 type quantities can be larger than 2.

The analysis of Bombieri on the twin prime problem suggests that the parity problem is the only obstruction to the twin prime conjecture assuming EH; roughly speaking, Bombieri shows that if EH is true and twin prime is false, then for “almost all” n, the Mobius functions and have opposite sign (so if n has an odd number of prime factors, then n+2 almost always has an even number of prime factors). Similarly, if , then for “almost all” n, and have opposite sign, and for “almost all” n, have opposite sign. Since it is not possible for to all have opposite sign, this looks like a proof that . Unfortunately, the three notions of “almost all” here are quite different; roughly speaking, we only get opposite sign for most n with almost prime (but arbitrary), and similarly for other permutations of . I played around for quite some time today to try to salvage something from this approach, but without much success unfortunately.

4 January, 2014 at 11:00 am

Terence Tao: I have tried a perturbative analysis of the effect of the epsilon parameters in my previous comment, but the results are frustratingly inconclusive: it seems that this additional freedom _can_ improve M”_k if the extremisers to that problem obey a rather random-looking inequality (roughly speaking, that the directional derivative of M”_k in epsilon is positive), but that this inequality need not hold in all cases. (For instance, perturbing off of the M”_2 extremiser in the k=3 setting appears to make the constant dip below 2, thus preventing a cheap proof of H_1 <= 6. On the other hand, since M”_3 is currently less than 2 when all epsilons are set to zero, and we can reach 2 by sending one of the epsilons to 1, the mean value theorem tells us that there must be at least some situation in which increasing epsilon helps.)

It may well be that the best way to see whether this flexibility has promise is to do some numerics, e.g. to recompute M'_3 but with the region enlarged to for some small (e.g. 0.1), but with the functionals replaced with . (Ideally we should perturb off of M''_k instead of M'_k, but this might be harder to code because we have to redo all the polytope partitioning.)

4 January, 2014 at 5:56 pm

Aubrey de Grey: Might it be preferable to do such numerics for the k=4 case rather than k=3? I’m thinking that not only is k=4 the case of most current interest in the EH setting, but also that unless the non-monotonicity of M is particularly sadistic the result will give a more reliable indication of what would be worth trying for two-digit k, and these considerations may outweigh the drawback of the greater complexity than the k=3 case.

3 January, 2014 at 4:31 pm

Terence Tao: I’ve been notified that the polymath wiki is currently experiencing some issues that make it either load very slowly, or not at all. They should be fixed soon, however.

4 January, 2014 at 6:58 pm

Eytan Paldi: A necessary condition for solutions of the constrained optimization problem:

Let for be measurable subsets of and let be a closed subset of .

Consider the (somewhat generalized) problem

where

subject to

(1)

(2) is supported on

(for )

(3) is supported on .

Remark: w.l.o.g. we may assume that for where denotes the projection of on the hyperplane .

The functional can be represented as

(4) , where

(4′)

Observe that the constraint (2) is equivalent to

(2′) for each vanishing on (for ).

Motivated by this equivalence, denote (for ) by the subspace of functions in and vanishing on .

Therefore (by a standard calculus of variations argument), under the constraints (1), (2) (i.e. (1), (2′)) any variation of should be orthogonal to and to the functions in the subspaces . Therefore if is a solution to the optimization problem, it follows (from (4)) that is orthogonal to any such variation, implying

Theorem: If is a solution to the optimization problem, then

Moreover, is the optimal value of the objective function.

Remarks:

1. This theorem shows that an optimal solution may be discontinuous on the boundaries of and – with some implications for the polytope partition used in piecewise polynomial approximation (the approximation usually converges slowly in the polynomial degrees if the solution is not smooth inside some polytope.)

2. It seems possible to find the optimal solution also for the region .

5 January, 2014 at 11:53 am

Eytan Paldi: Is it possible to apply the idea of enlarging the sieve support also to the original one dimensional sieve (e.g. by using of type over the enlarged simplex )?

5 January, 2014 at 2:32 pm

Terence Tao: Yes, I believe so, and this is a good model problem to work on since one can in principle solve the optimisation problems explicitly in this case.

The one-dimensional analogue of M_k is the supremum of

as f ranges over smooth functions that are compactly supported in [0,1] (and not identically zero). This supremum was calculated to be , attained when

(Strictly speaking one should smooth this out at t=1, but never mind this technicality.) The modified optimisation problem in one dimension is the supremum of

as f ranges over smooth functions compactly supported in . At present, I do not know what this supremum is, but it looks like it can be computed explicitly, presumably in terms of Bessel functions. It does not seem obvious to me which way things will go: on the one hand we have more functions to optimise over (being supported on instead of ), but on the other hand the numerator has been shaved off by an epsilon.

Incidentally, the one-dimensional lower bounds for M_k are surprisingly competitive for small k. For instance, for k=2, the Bessel bound is 1.383, compared with the true optimum of 1.386; similarly for k=3 it is 1.635 versus 1.646, and for k=4 it is 1.820 versus 1.845.
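The quoted approximations 1.383, 1.635 and 1.820 are consistent with the closed form 4k(k−1)/j_{k−2}², where j_n denotes the first positive zero of the Bessel function J_n. Assuming that formula (it reproduces all three quoted values), here is a self-contained numerical sketch; the series truncation and scan-and-bisect zero-finding are my own choices:

```python
import math

def bessel_j(n, x, terms=40):
    """J_n(x) via its power series (ample accuracy for the small x used here)."""
    return sum((-1)**m / (math.factorial(m) * math.factorial(m + n))
               * (x / 2)**(2*m + n) for m in range(terms))

def first_positive_zero(n):
    """First positive zero j_n of J_n: scan for a sign change, then bisect."""
    x = 0.5
    while bessel_j(n, x) > 0:
        x += 0.1
    lo, hi = x - 0.1, x
    for _ in range(60):
        mid = (lo + hi) / 2
        if bessel_j(n, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def one_dim_bound(k):
    """Conjectured Bessel bound 4k(k-1)/j_{k-2}^2 for the 1-D analogue of M_k."""
    return 4 * k * (k - 1) / first_positive_zero(k - 2)**2

for k in (2, 3, 4):
    print(k, round(one_dim_bound(k), 3))  # 1.383, 1.635, 1.82
```

For comparison, the true multidimensional optima quoted above are 1.386, 1.646 and 1.845.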

6 January, 2014 at 12:58 am

Eytan Paldi: The modified problem for can be solved as follows:

1. Let , subject to

where ,

and ranges over the same smooth functions as above.

Clearly, .

2. In order to find the above infimum , we assume that it is attained by some which satisfies the Euler-Lagrange equation

(1) on .

and the regularity conditions

(1′) has zero limits at the endpoints of .

It is easy to verify (using integration by parts) that in equation (1) is the same as the one defined above as an infimum.

In order to show that the infimum is attained by a certain solution of (1), we use the identity

(2)

which follows (via integration by parts) for any smooth and any non-vanishing solution of (1) satisfying the regularity conditions

(2′) has zero limits at the endpoints of .

3. The equation (1) has on the (familiar) solution

, on .

While on the remaining interval we have

, on (for some constants )

The constants are determined such that the solution and its derivative are continuous at .

This gives as explicit linear combinations of at . It is easy to verify (e.g. by Taylor series) that

, on .

This gives the solution on the whole interval in terms of as the only parameter. It remains only to use this expression for the remaining constraint to get the final result

Theorem: , where x is the minimal positive zero of the function

Finally is determined from (as described at the end of step 1).

Remark: Now it is possible to optimize .

6 January, 2014 at 9:30 am

Terence Tao: A preliminary perturbative analysis looks encouraging: for small, we expect , where is the first zero of and is bounded. We Taylor expand

to arrive at

.

Standard Bessel identities tell us that , so that

and hence

and so

indicating that there is an improvement to be had (for one-dimensional sieving at least) by taking to be positive! Of course it would be good to back this up with numerical data.

Incidentally I have a somewhat more geometric way to explain why we have the “graceful failing” of the Selberg sieve in that when one enlarges the sieve cutoff support to the simplex of sidelength , one still gets a lower bound restricted to the smaller simplex of sidelength (as opposed to getting no nontrivial lower bound whatsoever). As I remarked previously in https://terrytao.wordpress.com/2013/11/22/polymath8b-ii-optimising-the-variational-problem-and-the-sieve/#comment-253205 , the Selberg sieve is the square of a linear combination of products of Ramanujan sums. Distribution hypotheses such as Elliott-Halberstam tell us, roughly speaking, that two such products , behave orthogonally (on average) if lies below a given threshold (or, if one is weighting against (say) , if lies below a given threshold). A model problem is then to lower bound the norm of a sum of vectors in an abstract Hilbert space, with the property that distinct and are orthogonal if for some threshold . (One should think of as corresponding to a linear combination of the in which (or ) is of a certain logarithmic size proportional to , e.g. .)

If one is only summing values of j up to , then all the are orthogonal, and by Pythagoras we have

which corresponds to the standard Selberg sieve calculations. But if we sum up to , there are some inner products between for which we do not have good control. However, all the with are orthogonal to those with , so we have

which corresponds to the gracefully degrading bound we have when we push beyond the classical Selberg sieve support region. In fact this bound is sharp if we happen to have the odd symmetry for .
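A toy finite-dimensional check of this Pythagoras argument. The threshold and indexing below are my own discretisation of the elided condition: distinct v_j, v_{j'} are assumed orthogonal whenever j + j' ≤ 1, with indices running up to 1/2 + ε for ε = 0.1, so only the pair (v_{0.5}, v_{0.6}) may correlate:

```python
import numpy as np

def make_vectors(a, b):
    """Vectors indexed by j = 0.1, ..., 0.6 in R^6; the orthogonality pattern
    forces v_0.5 and v_0.6 (the only pair with j + j' > 1) into a common plane,
    with free coefficients a, b for v_0.6."""
    e = np.eye(6)
    return {0.1: e[0], 0.2: e[1], 0.3: e[2], 0.4: e[3],
            0.5: e[4], 0.6: a * e[4] + b * e[5]}

def bound_sides(a, b):
    v = make_vectors(a, b)
    S = sum(v.values())
    lhs = S @ S                                           # |sum_j v_j|^2
    rhs = sum(v[j] @ v[j] for j in (0.1, 0.2, 0.3, 0.4))  # energy with j <= 1/2 - eps
    return lhs, rhs

lhs, rhs = bound_sides(a=0.7, b=-1.3)
print(lhs >= rhs)      # True: the gracefully degrading lower bound holds
lhs, rhs = bound_sides(a=-1.0, b=0.0)
print(lhs - rhs)       # odd symmetry v_0.6 = -v_0.5: the bound is attained
```

The point of the construction is that the vectors with small index must be orthogonal to everything else, so their energy always survives in the square of the sum.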

6 January, 2014 at 10:44 am

Anonymous: Typographical comment: if \dots is used instead of \ldots, the correct formatting of the ellipsis is automatically obtained. It shouldn’t always be lowered dots.

6 January, 2014 at 11:54 am

Eytan Paldi: It seems that a second order expansion of should give an initial approximation for the optimal value of as well as an estimate of the gain in . (This may also be used to derive a Newton-type iteration for the optimal .)

Another possibility is to apply the enlarged simplex

with the vanishing marginals constraints to the one dimensional sieve and see if these constraints can be satisfied by a one dimensional sieve of the form . It seems that since in this case is symmetric, only one constraint is needed.

6 January, 2014 at 5:44 pm

Eytan Paldi: There are two typos in the perturbation analysis (with the same conclusions):

1. In the line above the standard Bessel identities, the sign of the second term should be “+”.

2. The main term of should be . (i.e. should be less than )

[Corrected, thanks – T.]

6 January, 2014 at 6:36 pm

Terence Tao: Regarding second order expansion: this can in principle be done, but is a little messy; it may be faster simply to work things out numerically for small k. (For large k, the one-dimensional version of M_k never exceeds 4; after optimising in epsilon, it is potentially possible that one could breach 4 (giving yet another proof of Maynard’s theorem!), but it is unlikely to be competitive with the multidimensional sieving.)

Incidentally, there is an annoying technicality when trying to apply the one-dimensional improvements to the EH problems. Basically, the one-dimensional functions f supported on are only admissible when . Thus, strictly speaking, in the EH case we don’t get to increase epsilon past 0 at all. But the problem is only at the “corners” of the simplex ; if one truncates away the portion of this simplex that pokes outside of then the cutoff is admissible again, and a back-of-the-envelope calculation indicates that this only costs to the M_k quantity. So the perturbative analysis still suggests a gain in this case, though actually calculating it would be messy.

Unfortunately, if one insists on the one-dimensional ansatz then there is no gain by enlarging the region and imposing additional vanishing marginal conditions; these marginal conditions basically collapse to requiring that the antiderivative of G vanishes outside of the original interval [0,1] (or , if one is playing the epsilon game) and so we don’t actually get any new functions this way. One can play with more complicated ansatze to deal with this, but this gives up the key advantage the one-dimensional analysis had, namely that it was simple enough to solve exactly, and we may as well return to the numerical analysis of the multidimensional problem at that point.

6 January, 2014 at 8:00 pm

Eytan Paldi: I think that in the EH case we still have for the enlarged simplex (so we can increase up to .)

In general, we can increase up to .

7 January, 2014 at 9:18 am

Terence Tao: In principle yes, but there is a truncation problem: the function F cannot range in the entire simplex , but only in the truncated simplex . For instance when k=3 on EH one can be supported in , as identified by Pace. So a function of the form doesn’t directly work if G is supported on . One can try working with truncated functions , with G supported on for some , and the marginals of F being integrated up to (and no vanishing marginal conditions required), but the variational problem becomes a bit trickier. Let’s see: with the setup given, the denominator is (viewing the support of F as the simplex with three copies of removed), and a numerator being equal to

writing as the antiderivative of G, the problem is now to maximise

This looks too messy to optimise exactly, but perhaps a numerical optimisation is still feasible (but in that case we may as well work with arbitrary polynomials on this polytope, rather than those coming from the one-dimensional ansatz).

7 January, 2014 at 9:41 pm

Eytan Paldi: On the evaluation of :

The theorem in my comment above (representation of x as the minimal positive zero of a certain function) holds only for and should be slightly modified for the case because in that case the solution of the corresponding Euler-Lagrange equation on is given by for some constants . Using the same solution method, we get almost the same theorem as before, except that now x is the minimal positive zero of the modified function

7 January, 2014 at 9:19 am

Pace Nielsen: As in Eytan’s post, letting where with the minimal positive zero of the function , numerical calculations show that when the maximum is achieved at:

, with .

7 January, 2014 at 9:31 am

Pace Nielsen: And if I’m doing this correctly, we also get for and that . So, assuming this can be verified, we just passed the k=5 barrier under EH.

7 January, 2014 at 10:19 am

Terence Tao: That’s very promising! Unfortunately, there is a technical speedbump right now: Eytan’s analysis is only directly applicable when (because the cutoff F cannot range freely in , but must also stay inside ). So strictly speaking we don’t get anything new under EH (in which ) yet. However it should be possible to develop a truncated version of this analysis. The situation is getting a bit complicated, I will try to lay out where we’re at exactly in the next blog post (which we’re about due for anyway), though it may have to wait until tomorrow, as I have some other things to attend to today.

7 January, 2014 at 10:56 am

Terence Tao: In a bit more detail:

given any function of compact support, define the functionals

and

Define to be the supremum of the ratio

as F ranges over non-zero functions supported on the polytope . Note one can restrict to symmetric F, in which case the numerator may be simplified to .

Then we can prove that implies whenever we can find such that . For instance on EH we have whenever .

In the range , collapses to , defined as above but with F allowed to range over the entire simplex . Otherwise, could be smaller than , but the key question is how much smaller (in particular, can we still get larger than 2 for close to 0.18).

Define to be the same quantity as , but with the function F restricted to have the one-dimensional form for some G supported on . This quantity , which is a lower bound for , was computed by Eytan as with a moderately complicated formula for .

In a previous comment https://terrytao.wordpress.com/2013/12/20/polymath8b-iv-enlarging-the-sieve-support-more-efficient-numerics-and-explicit-asymptotics/#comment-262154 I worked out the variational problem for , where is the same as but with F restricted to be of the form . It may well be that is already greater than 2 for some , in which case we win.

Perhaps one could modify the previous code for computing M’_k for small values of k to now numerically compute lower bounds on or by testing various symmetric polynomials on the polytope . (As a warmup and a checksum, one could check that for one recovers the numbers obtained by Eytan’s exact analysis.)

7 January, 2014 at 1:52 pm

Terence Tao: P.S. It looks like one can also enlarge the domain of F, the same way that it was enlarged to get from M_k to M’_k or M”_k. More precisely, rather than F being supported in the region , one can have F supported in a larger polytope R, so long as R+R is contained in

and the marginals of F are supported in . For instance, one could have F supported in

which gives a slight enlargement of the region on which is optimised over (similar to how the polytope gives a slightly larger value M’_k to the variational problem than the value M_k associated to the simplex ). So there is a lot of opportunity here to bump up the ratio to be above 2 in the k=4 case!

6 January, 2014 at 4:59 pm

Pace Nielsen: Computations now show that one can take which gives bounded gaps of size . In particular, searching over the space of symmetric polynomials of the form where the total degree is and is a signature which involves only even entries with total degree (and dropping the last three such signatures, due to RAM issues) yields

There were a few things that made it difficult for me to do this computation. First, I went on vacation and when I returned I found out that the computer with 64GB of RAM had gone to sleep. Also, I found that I ran into another RAM issue when trying to compute the inverses of or larger matrices (and a similar problem when finding the largest eigenvector of similarly sized matrices). Fortunately, after quite a few hits and misses I obtained the stated result.

It seems very surprising to me that restricting to symmetric polynomials of the given form works so well. Other choices for the basis don’t give very good results.

For , the following are the bounds I get using all symmetric polynomials of a given degree .

For ,

For ,

For ,

For ,

For ,

For ,

For ,

For ,

For ,

For ,

For ,

For ,

For ,

It would be an easy computation to push this up to . It appears that, at present, the convergence rate is somewhere in the interval (just as it was for ). If the numbers continue at the rate then it doesn’t appear that we ever get , but if the rate is then we would need about to finally get .

6 January, 2014 at 5:53 pm

Aubrey de Grey: Thank you for your perseverance, Pace! Let us hope that this (or a similarly efficient) choice of signatures is similarly effective for M’ and especially M”.

6 January, 2014 at 6:11 pm

Eytan Paldi: This confirms our earlier expectations about (it was expected that should be within 0.001 from 4.)

7 January, 2014 at 5:16 am

Eytan Paldi: is needed for EH[1/2], but with our current it seems that we need only to have which is satisfied by (and expected to hold even for ) but not for (since the upper bound on is 3.806…)

7 January, 2014 at 6:27 am

Eytan Paldi: It should be remarked that the above is not rigorous because

1. It is based on MPZ[] for which the current estimates are only for the asymptotic case of .

2. It is based on a tentative value of .

7 January, 2014 at 9:37 am

Terence Tao: Yes, we still have to figure out how to effectively apply to these low-k situations. The main problem is that is no longer negligible in this regime – even with our most optimistic MPZ estimate, we cannot get larger than , and so the quantity cannot be taken any larger than . Now, a naive truncation of the simplex to allow us to use MPZ would leave us with . For large k, this is exponentially negligible, but with k being about 50, it’s somewhat noticeable; the fraction of volume of that is lost when passing to is roughly , which for the best-case value of , is about 0.01, so in principle this changes M_k by about 1%, which corresponds to worsening k by one or two here.
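The 1% figure can be sanity-checked by Monte Carlo. In the sketch below the truncation is modelled as capping each coordinate of the simplex at a hypothetical threshold theta (chosen here only so that k(1-theta)^k comes out near 0.01; the actual threshold depends on delta), and the lost fraction is compared against the union-bound approximation k(1-theta)^k.

```python
# Monte Carlo sanity check of the "fraction of simplex volume lost to truncation"
# estimate. theta is a hypothetical stand-in for the true truncation parameter.
import random

def lost_fraction(k, theta, n=100_000, seed=8):
    """Fraction of uniform samples on {t_i >= 0, sum t_i <= 1} with some t_i > theta."""
    rng = random.Random(seed)
    lost = 0
    for _ in range(n):
        # k+1 exponential spacings, normalized: the first k coordinates are
        # then uniformly distributed on the solid simplex {sum t_i <= 1}
        e = [rng.expovariate(1.0) for _ in range(k + 1)]
        s = sum(e)
        if max(e[:k]) > theta * s:
            lost += 1
    return lost / n

k, theta = 50, 0.1566          # hypothetical values for illustration only
est = lost_fraction(k, theta)
approx = k * (1 - theta) ** k  # union bound; essentially exact when this is small
print(est, approx)  # both close to 0.01
```

The union bound is essentially exact here because the probability of two coordinates simultaneously exceeding theta is (1-2*theta)^k, which is negligible at this scale.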

In principle we can use the more advanced truncation used in Polymath8a, but it will be harder to estimate the errors; in 8a we had the advantage of some convexity properties of the Bessel function which allowed one to get civilised bounds on the errors, but from the numerical polynomials we have here, it is unlikely that we will be able to detect any analogous convexity.

An alternative approach would be to try to redo the Type I/II/III estimates with the aim of getting delta as large as possible (rather than varpi). We can take for free with the Selberg sieve construction; our Type III estimates can cope with such large values of delta (taking sigma=1/10), but not the Type I or Type II estimates, with the Type I being the bottleneck.

I guess what one can do is to start with the extremisers for the untruncated polytope that Pace located for a test value of k, say k=50, and try to compute what happens to M_k when one applies naive truncation for a given value of delta; I’ll try to work out some relevant formulae at some point (but today is busy for me for other reasons…).

7 January, 2014 at 1:59 pm

Aubrey de Grey: Inspired by the accuracy of my earlier prediction that M_54 would hit 4 when d=22, I’ve looked a little harder and found a way to quantify more precisely the trajectory of M_k with increasing d, which takes account of the fact that the convergence rate C_k,d = (M_k,d – M_k,d-1)/(M_k,d-1 – M_k,d-2) creeps up with d, and also of the fluctuation we have noted, i.e. that C_k,d is slightly larger than the trend when d is odd and smaller than the trend when d is even. Using the numbers that Pace just provided for M_53,d for d up to 13, I looked at the trend of C_k,d for even d and for odd d separately, and it seems that (C_k,d+2 – C_k,d) is very close to half of (C_k,d – C_k,d-2) when d exceeds 8; this also holds for the k=54 numbers that Pace provided before Christmas. I conclude that C_53,d converges to about 0.832, and that M_53,d is expected to surpass 4 at d=31. This may be of lower priority to explore than looking at M’ and M”, but the validity of the technique may be worth checking (in due course) in those settings too so that one can quickly home in on the right range of d to get the optimal k.

7 January, 2014 at 2:47 pm

Aubrey de Grey: In order to explore whether the above extrapolation is likely to hold up, let me give the values it predicts for M_53 for d=14 through 18, since Pace notes that those are easy to compute. I get 3.925720, 3.938891, 3.949805, 3.958898, 3.966443.

7 January, 2014 at 4:50 pm

Aubrey de Grey: PS: this extrapolation does not take into account the feature, mentioned by Pace earlier, that M may rise more slowly with d once d exceeds k/2. However, the extrapolation has M reaching 3.995 by the time d=26, so there’s reason for optimism that the remaining 0.005 can be bridged, even though d=31 may not quite suffice.

8 January, 2014 at 10:59 am

Pace Nielsen: Here are the actual values: 3.925704, 3.938848, 3.949699, 3.958708, 3.966121. [I’m now leaving these sorts of computations in favor of the idea Terry introduced. I should have some results there later today.]
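(For what it’s worth, a constant-ratio version of Aubrey’s extrapolation can be run directly on these five values. This simplification ignores the odd/even fluctuation and the creep of the ratio with d that Aubrey models, so the resulting limit is only indicative.)

```python
# Geometric-tail extrapolation (simplified): assume the increments
# M_{53,d} - M_{53,d-1} shrink by a roughly constant factor C, and sum the tail.
# Input: the actual M_{53,d} values for d = 14..18 reported above.

M = [3.925704, 3.938848, 3.949699, 3.958708, 3.966121]

diffs = [b - a for a, b in zip(M, M[1:])]
ratios = [b / a for a, b in zip(diffs, diffs[1:])]
C = sum(ratios) / len(ratios)  # crude constant-ratio estimate, about 0.826

# if the increments keep shrinking by the factor C, the sequence converges to:
limit = M[-1] + diffs[-1] * C / (1 - C)
print(C, limit)  # the extrapolated limit just clears 4
```

Even this crude version puts the limit a hair above 4, consistent with the hope expressed above, though well within the error of the model.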

8 January, 2014 at 8:50 am

Eytan Paldi: Considering the one dimensional approximation of by a function represented as a one dimensional integral (using e.g. a probabilistic interpretation of the integral), is it possible to generalize it for an approximation of type by a (sufficiently simple) two dimensional integral?

8 January, 2014 at 9:12 am

Eytan Paldi: I meant that perhaps there is a sufficiently simple corresponding variational problem involving a ratio of two dimensional integrals, whose optimal solution may be found exactly.

8 January, 2014 at 9:51 am

Terence Tao: Well, in some sense this is what James and Pace have been doing numerically for moderate k: in James’ original paper he worked with polynomial combinations of the first two symmetric polynomials , .

In principle, with the ansatz the problem does collapse to a two-dimensional one, but I am not sure it is so easily solvable. The thing is that the marginal distribution of becomes an integral of f along a parabola. If one replaces P_2 by the elementary symmetric polynomial then the marginal becomes an integral of f along a line, which looks a bit like a Radon transform, so perhaps there is some chance of a clean formula there. On the other hand, the image of the simplex could be a little weird in these coordinates (I haven't tried to do the calculations yet).

Pace mentioned the phenomenon that the (partial) basis consisting of with all coefficients of even was surprisingly efficient. I wonder if a similar phenomenon happens with James' original choice of partial basis. If so, this may suggest further investigation of the ansatz.

8 January, 2014 at 11:18 am

Pace Nielsen: Following Terry’s description of the expansion, I think I have worked out the case, but I want to double-check here that the formulas I’m using for the integrals are correct. I used:

and

If those are correct, my computations yield that using an arbitrary symmetric polynomial of degree 3. Increasing the degree further allows for slight improvements in the value of .

8 January, 2014 at 11:35 am

Terence Tao: Yes, these look like the correct formulae to me (assuming , of course).

The fact that M_3 improved from 1.646 to 1.8615 (close to the 1.87459 that you got earlier from the 1D analysis without truncating to the unit cube) is a very promising sign, given that the separate improvement of getting from M_3 to M”_3 moves from 1.646 to 1.914. It gives hope that a combination of the two ways to “think outside the simplex” could even crack 2 in the k=3 case, giving on EH (which I think would be a good point to declare victory for the EH case ;-) – if we somehow manage to get a fantastic value of M_3 like 2.5 or so then we might think about using the trick to even hope for , but I doubt that all our ideas put together will get that far yet.)

8 January, 2014 at 12:01 pm

Pace Nielsen: I worked out the case and get that . If someone would like to look over my Mathematica code to verify the computation, just email me. (It involves a mixture of stuff I wrote and stuff I borrowed from James, but it is very straightforward.)

I’ll start thinking about how to do the case with vanishing marginals and expansion together. For some reason the vanishing marginals condition is still somewhat mysterious to me, and I have a few other jobs to finish this week, but I hope to get to it soon.

8 January, 2014 at 12:10 pm

Aubrey de Grey: Bravo! As Terry has noted in the new thread, the vanishing is automatic for M’ (though not for M”), so it may be useful to start with M’, especially since (in the pre-epsilon world) M’_3 beats M_3 by almost 2/3 as much as M”_3 does.

8 January, 2014 at 12:20 pm

Eytan Paldi: Can you please give more information on the maximal degree and the number of monomials used for this new record?

8 January, 2014 at 12:31 pm

Aubrey de Grey: Actually, this seems even more promising when noting that (again in the pre-epsilon setting) M’_3 is only 0.003 less than M_4 – hence you only need a similar improvement of M’ from introducing epsilon as you achieved for M, in order to reach 2 without any need to explore vanishing marginals.

8 January, 2014 at 12:39 pm

Pace Nielsen: Eytan, I only used degree 1. It appears that when using larger degree, you can get close to the number predicted by your work.

8 January, 2014 at 1:00 pm

Eytan Paldi: It is really surprising (almost “too good to be true”) to get this remarkable record using only degree 1 monomials!

8 January, 2014 at 1:48 pm

Pace Nielsen: I agree it is almost too good to be true, which is why I want someone to double-check my computation. (But given that the computations agree with the old ones when , and that the numerics seem to be converging below the bound you obtained, I think everything is okay.)

8 January, 2014 at 2:35 pm

Eytan Paldi: The bound is not sufficient (a modified bound, corresponding to a variational problem for a simplex truncated by a cube, is needed – I’m trying to solve it).

8 January, 2014 at 2:41 pm

Terence Tao: I think there may be some confusion with regard to notation: I believe (based on his description of the k=3 analysis) that when Pace is referring to , he is in fact referring to what we’ve also called , in which the function F is supported on the truncated simplex , rather than the untruncated simplex . For the latter, Pace got a lower bound of 2.01869 for k=4 using the 1D ansatz at https://terrytao.wordpress.com/2013/12/20/polymath8b-iv-enlarging-the-sieve-support-more-efficient-numerics-and-explicit-asymptotics/#comment-262158, which is consistent with the slightly smaller bound of 2.0023 reported for (what I believe is) the truncated simplex problem .

8 January, 2014 at 6:08 pm

Pace Nielsen: Terry is correct; I was using the wrong notation. Those bounds are for .

8 January, 2014 at 2:36 pm

Terence Tao: Excellent! So it looks like we have k=4 (and hence ) on EH, finally dethroning that stubborn bound of 12 on EH that we had not been able to improve upon since the start of Polymath8b. (If you have explicit coefficients for the polynomial optimiser, we would be able to check the various four-dimensional integrals separately with just about any computational mathematics package.)

8 January, 2014 at 6:09 pm

Pace Nielsen: I’ll get those coefficients to you tomorrow.

8 January, 2014 at 9:10 pm

Pace Nielsen: I planned to give you the polynomial from my work computer tomorrow, but knowing that we only need a linear solution, it was a simple task to solve for the polynomial another way. So take .

8 January, 2014 at 10:25 pm

Terence Tao: Maple confirms the value of 2.002351848 for these choices of parameters – looks like on EH is solid.

eps := 0.18;
a := 0.784;
denominator := int( (1-a*t)^2 * t^3/6, t=0..1+eps) - 4 * int( (1-a*(t+1))^2 * t^3/6, t=0..eps );
numerator := int( int( 1-a*(t+s), s = 0..1)^2 * t^2/2, t=0..eps) + int( int( 1-a*(t+s), s=0..1+eps-t)^2 * t^2/2, t=eps..1-eps);
M := 4 * numerator / denominator;
evalf(M);
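For anyone without Maple, the same computation can be reproduced in plain Python. The integrands below are transcribed directly from the Maple script above, with composite Simpson quadrature (essentially exact for these polynomial integrands) standing in for symbolic integration.

```python
# Pure-Python cross-check of the Maple computation above (eps = 0.18, a = 0.784).
def simpson(f, lo, hi, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3

eps, a = 0.18, 0.784

def inner(t, U):
    # closed form of int_0^U (1 - a*(t+s)) ds
    return (1 - a * t) * U - a * U * U / 2

denominator = (simpson(lambda t: (1 - a * t) ** 2 * t ** 3 / 6, 0, 1 + eps)
               - 4 * simpson(lambda t: (1 - a * (t + 1)) ** 2 * t ** 3 / 6, 0, eps))
numerator = (simpson(lambda t: inner(t, 1) ** 2 * t ** 2 / 2, 0, eps)
             + simpson(lambda t: inner(t, 1 + eps - t) ** 2 * t ** 2 / 2, eps, 1 - eps))

M = 4 * numerator / denominator
print(M)  # ~2.0023518, matching the Maple value
```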

9 January, 2014 at 6:28 am

Eytan Paldi: It is really impressive to get this new record using only a linear one-dimensional sieve! This shows that the -relaxation method is very powerful!

(now it seems that even is possible.)

17 April, 2014 at 9:05 am

Tony Feng: I apologize if I am misunderstanding, but this requires only EH and not GEH, right? I am wondering why the wiki page still says that 12 is the best result known on EH only.

17 April, 2014 at 9:17 am

Terence Tao: Back in January we didn’t fully appreciate the difference between EH and GEH, and referred to the two hypotheses interchangeably. As it turns out, if we wish to leave the standard simplex (i.e. to rely on any M-quantity other than the “vanilla” ) then (when ) we have to use GEH to control the “denominator” in the sieve analysis; it *may* be possible to rely just on EH using Bombieri’s asymptotic sieve, but this has not been attempted.

8 January, 2014 at 11:21 am

Polymath8b, V: Stretching the sieve support further | What's new[…] for small values of (in particular ) or asymptotically as . The previous thread may be found here. The currently best known bounds on can be found at the wiki page (which has recently returned to […]