This is the fifth thread for the Polymath8b project to obtain new bounds for the quantity
either for small values of (in particular
) or asymptotically as
. The previous thread may be found here. The currently best known bounds on
can be found at the wiki page (which has recently returned to full functionality, after a partial outage). In particular, the upper bound for
has been shaved a little from
to
, and we have very recently achieved the bound
on the generalised Elliott-Halberstam conjecture GEH, formulated as Conjecture 1 of this paper of Bombieri, Friedlander, and Iwaniec. We also have explicit bounds for
for
, both with and without the assumption of the Elliott-Halberstam conjecture, as well as slightly sharper asymptotics for the upper bound for
as
.
The basic strategy for bounding still follows the general paradigm first laid out by Goldston, Pintz, and Yildirim: given an admissible
-tuple
, one needs to locate a non-negative sieve weight
, supported on an interval
for a large
, such that the ratio
is asymptotically larger than as
; this will show that
. Thus one wants to locate a sieve weight
for which one has good lower bounds on the numerator and good upper bounds on the denominator.
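Schematically, and with only indicative normalisations (this is a sketch of the criterion, not the exact form of the ratio (1) displayed above): one wants

$$\frac{\sum_{x \le n \le 2x} \big(\sum_{i=1}^k 1_{n+h_i \text{ prime}}\big)\, \nu(n)}{\sum_{x \le n \le 2x} \nu(n)} > m + o(1) \qquad (x \to \infty),$$

since then some $n \in [x,2x]$ has at least $m+1$ of the shifts $n+h_1,\dots,n+h_k$ prime, giving $H_m \le h_k - h_1$.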
One can modify this paradigm slightly, for instance by adding the additional term to the numerator, or by subtracting the term
from the numerator (which allows one to reduce the bound
to
); however, the numerical impact of these tweaks has proven to be negligible thus far.
Despite a number of experiments with other sieves, we are still relying primarily on the Selberg sieve
where is the divisor sum
with ,
is the level of distribution (
if relying on Bombieri-Vinogradov,
if assuming Elliott-Halberstam, and (in principle)
if using Polymath8a technology), and
is a smooth, compactly supported function. Most of the progress has come by enlarging the class of cutoff functions
one is permitted to use.
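For orientation, the divisor sum in question has, up to normalisation, the familiar Maynard–Selberg shape (a sketch only; the precise normalisation in the previous posts may differ slightly):

$$\nu(n) := \Big( \sum_{d_1 \mid n+h_1,\ \dots,\ d_k \mid n+h_k} \mu(d_1)\cdots\mu(d_k)\, f\Big(\tfrac{\log d_1}{\log R},\dots,\tfrac{\log d_k}{\log R}\Big) \Big)^2, \qquad R := x^{\theta/2},$$

where $f$ is the cutoff function just mentioned and $\theta$ is the level of distribution.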
The baseline bounds for the numerator and denominator in (1) (as established for instance in this previous post) are as follows. If is supported on the simplex
and we define the mixed partial derivative by
then the denominator in (1) is
where
and
Similarly, the numerator of (1) is
where
Thus, if we let be the supremum of the ratio
whenever is supported on
and is non-vanishing, then one can prove
whenever
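For reference, the quantities entering this baseline criterion are, up to normalisation, the ones from Maynard's paper: writing $F$ for the mixed partial derivative defined above and $\mathcal{R}_k := \{(t_1,\dots,t_k): t_i \ge 0,\ t_1+\cdots+t_k \le 1\}$ for the unit simplex,

$$I(F) := \int_{\mathcal{R}_k} F(t_1,\dots,t_k)^2\, dt_1 \cdots dt_k, \qquad J^{(m)}(F) := \int \Big( \int_0^1 F(t_1,\dots,t_k)\, dt_m \Big)^2 dt_1 \cdots dt_{m-1}\, dt_{m+1} \cdots dt_k,$$

$$M_k := \sup_F \frac{\sum_{m=1}^k J^{(m)}(F)}{I(F)},$$

and the criterion is that bounded gaps (with $m+1$ primes in the tuple) follow whenever $M_k > 2m/\theta$, where $\theta$ is the level of distribution. This is a sketch of the standard definitions rather than a verbatim transcription of the formulas elided above.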
We can improve this baseline in a number of ways. Firstly, with regards to the denominator in (1), if one upgrades the Elliott-Halberstam hypothesis to the generalised Elliott-Halberstam hypothesis
(currently known for
, thanks to Motohashi, but conjectured for
), the asymptotic (2) holds under the more general hypothesis that
is supported in a polytope
, as long as
obeys the inclusion
examples of polytopes obeying this constraint include the modified simplex
the prism
the dilated simplex
and the truncated simplex
See this previous post for a proof of these claims.
With regards to the numerator, the asymptotic (3) is valid whenever, for each , the marginals
vanish outside of
. This is automatic if
is supported on
, or on the slightly larger region
, but is an additional constraint when
is supported on one of the other polytopes
mentioned above.
More recently, we have obtained a more flexible version of the above asymptotic: if the marginals vanish outside of
for some
, then the numerator of (1) has a lower bound of
where
A proof is given here. Putting all this together, we can conclude
Theorem 1 Suppose we can find
and a function
supported on a polytope
obeying (4), not identically zero and with all marginals
vanishing outside of
, and with
Then
implies
.
In principle, this very flexible criterion for upper bounding should lead to better bounds than before, and in particular we have now established
on GEH.
Another promising direction is to try to improve the analysis at medium (more specifically, in the regime
), which is where we currently stand without EH or GEH, via numerical quadratic programming. Right now we are only using
and using the baseline
analysis, basically for two reasons:
- We do not have good numerical formulae for integrating polynomials on any region more complicated than the simplex
in medium dimension.
- The estimates
produced by Polymath8a involve a
parameter, which introduces additional restrictions on the support of
(conservatively, it restricts
to
where
and
; it should be possible to be looser than this (as was done in Polymath8a) but this has not been fully explored yet). This then triggers the previous obstacle of having to integrate on something other than a simplex.
However, these look like solvable problems, and so I would expect that further unconditional improvement for should be possible.
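Regarding the first obstacle: on the unit simplex itself monomials integrate in closed form, via the Dirichlet formula $\int_{\Delta_k} t_1^{a_1}\cdots t_k^{a_k}\, dt = a_1!\cdots a_k!/(k+a_1+\cdots+a_k)!$, which is what makes the baseline computations tractable; no equally clean formula is available for the truncated or dilated regions. A minimal illustration of the simplex case (Python, purely illustrative and not part of the project's code):

from math import factorial

def simplex_monomial_integral(exponents):
    # Integral of t_1^{a_1} ... t_k^{a_k} over the unit simplex
    # {t_i >= 0, t_1 + ... + t_k <= 1}, via the Dirichlet formula
    # a_1! ... a_k! / (k + a_1 + ... + a_k)!.
    k = len(exponents)
    num = 1
    for a in exponents:
        num *= factorial(a)
    return num / factorial(k + sum(exponents))

# Example: the integral of t_1^2 t_2 over the 2-simplex is 2!*1!/5! = 1/60.
print(simplex_monomial_integral([2, 1]))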
145 comments
8 January, 2014 at 12:55 pm
Aubrey de Grey
I’m struggling to understand the nature of the current obstacle to exploring the dilated simplex in the BV setting. Terry noted in https://terrytao.wordpress.com/2013/12/20/polymath8b-iv-enlarging-the-sieve-support-more-efficient-numerics-and-explicit-asymptotics/#comment-258264 that no combinatorially-explosive dissection of that polytope was needed for this calculation, and Pace implied in https://terrytao.wordpress.com/2013/12/20/polymath8b-iv-enlarging-the-sieve-support-more-efficient-numerics-and-explicit-asymptotics/#comment-258471 that it was relatively straightforward, but that’s the last I’ve seen. Has a previously-overlooked obstacle emerged? I had thought it was just a matter of getting around to it, but Terry’s description above (“We do not have good numerical formulae…”) implies a deeper issue. Could someone please clarify? – thanks in advance.
8 January, 2014 at 1:53 pm
Pace Nielsen
On my end, the obstacle is that even for moderately sized
the polynomial
is hard to work with (especially when multiplying two such polynomials together and integrating).
8 January, 2014 at 2:42 pm
Aubrey de Grey
I see – thanks. How big, at present, is “moderately sized” in this sense? – is it possible to work your way up from very small k far enough to give the chance of discerning a trend?
8 January, 2014 at 6:10 pm
Pace Nielsen
That’s a good question, which I don’t know the answer to. At present, I’m going to focus on the small k computations, but I may come back to this later.
8 January, 2014 at 6:23 pm
Reid Atcheson
Would it be helpful if the basis formed an orthogonal system on the simplex? Such a basis has been in use by the spectral methods community for a while, and has an analytic recurrence formula in terms of 1D Jacobi polynomials. The details of these posts are otherwise too advanced for my full comprehension.
8 January, 2014 at 6:50 pm
Terence Tao
Good question! The problem though is that there are two quadratic forms (or, if one wishes, inner products) one has to work with – one associated to the denominator (which is the usual inner product on the simplex) and one associated to the numerator (which is more complicated, basically an inner product on the marginals of the original function). One could pick a basis which diagonalises one of these two quadratic forms, but unless the basis is very cleverly chosen, the other quadratic form would not be in any particularly simple form. So it would likely only lead to a modest gain in performance at best.
Of course, if we found a basis for which the numerator and denominator were simultaneously diagonalised, or nearly so, this would indeed be a game-changer, but this may be unlikely. For instance, in the k=2 case, the extremal function is explicitly computable as
, which does not look like it has a particularly simple representation in any natural basis that comes to mind, which suggests that one cannot simplify the optimisation problem too much by working in a standard basis.
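To make the "two quadratic forms" point concrete: once a finite polynomial basis is fixed, the numerator and denominator each become a quadratic form in the coefficient vector, and the optimisation is a generalised Rayleigh quotient, i.e. a generalised eigenvalue problem. A minimal numerical sketch (Python/NumPy, with random placeholder matrices standing in for the actual forms):

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
n = 5                        # size of a hypothetical polynomial basis
B = rng.standard_normal((n, n))
A = B @ B.T + np.eye(n)      # placeholder "numerator" quadratic form (symmetric)
C = rng.standard_normal((n, n))
D = C @ C.T + np.eye(n)      # placeholder "denominator" quadratic form (positive definite)

# Maximise x^T A x / x^T D x: the optimum is the largest generalised eigenvalue of (A, D).
vals, vecs = eigh(A, D)
print(vals[-1], vecs[:, -1])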
8 January, 2014 at 2:09 pm
Eytan Paldi
It seems (as remarked in a previous comment) that the only obstacle is that the (repeated) partial derivatives are generating many monomials from a single one (so a careful choice of the basis functions is needed in order to have low complexity integrals – needed for the numerical optimization program.)
8 January, 2014 at 3:54 pm
Eytan Paldi
As I understand, the only regularity condition on
is its square integrability, which implies, for the one dimensional sieve, that
needs only to be absolutely continuous. So there is a possibility that by imposing the
– smoothness condition on
we don’t get the true optimal value.
For example, in the
derivation, I explicitly assumed the continuity of
at
but if
is allowed to have a jump at that point, perhaps a better solution exists…
8 January, 2014 at 4:05 pm
Terence Tao
I think that any rough candidate for f can be regularised into a smooth one while only perturbing f and its derivatives infinitesimally in L^2 type norms, which should not lead to any change in the supremum of the ratio being optimised over.
I would expect that even for rough extremisers, one should still have the Euler-Lagrange equation
hold in a distributional sense (i.e. using weak derivatives). As a is continuous and bounded away from zero and b only has jump discontinuities, this suggests to me that f’ should remain continuous for the extremiser.
8 January, 2014 at 4:56 pm
Eytan Paldi
I agree! (I think that the needed properties are that a is absolutely continuous and non-vanishing inside the interval, and that b is measurable and bounded.)
8 January, 2014 at 8:22 pm
Eytan Paldi
The Euler-Lagrange equation for the
variational problem is quite interesting (coupling four sub-intervals) and seems to be analytically solvable (I need to check the details.)
8 January, 2014 at 9:26 pm
Pace Nielsen
I’m a little confused by Theorem 1. Previously, we were allowed to take the support of F to be the polytope
which doesn’t seem to satisfy condition (4).
In the theorem, perhaps the support should be
(rather than just
), still for a region
satisfying (4)?
8 January, 2014 at 9:36 pm
Pace Nielsen
Or rather,
.
8 January, 2014 at 9:56 pm
Terence Tao
This polytope obeys (4) as long as
. Basically, the point is that the full sum
is
times the average of the partial sums
for
, so that if
, then at least one of the partial sums
will be less than 2. (Actually, I think the fact that this polytope works originated from a comment of yours: https://terrytao.wordpress.com/2013/12/08/polymath8b-iii-numerical-optimisation-of-the-variational-problem-and-a-search-for-new-sieves/#comment-257443 .)
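(For completeness, the averaging identity being invoked is simply

$$\frac{1}{k} \sum_{i=1}^k \big( t_1 + \cdots + t_k - t_i \big) = \frac{k-1}{k}\,(t_1 + \cdots + t_k),$$

so a bound on the full sum of the form $\frac{k}{k-1} \cdot \lambda$ forces the average of the $k$ partial sums, and hence at least one of them, to be at most $\lambda$; this is a sketch of the averaging step only, with the specific thresholds being the elided ones above.)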
9 January, 2014 at 7:11 am
Pace Nielsen
You are right, I see what I was missing now.
9 January, 2014 at 8:27 am
Terence Tao
Two small remarks:
Firstly, strictly speaking our new results rely on the “generalised” Elliott-Halberstam hypothesis (GEH) rather than the original Elliott-Halberstam hypothesis, namely the EH type distribution result for a general class of Dirichlet convolutions
as opposed to just for the von Mangoldt function. (A precise statement of GEH can be found as Conjecture 1 of this paper of Bombieri, Friedlander, and Iwaniec; the Bombieri-Vinogradov analogue of GEH was first established by Motohashi.) If one only has EH rather than GEH, one can still take
, but the function F is now constrained to the original simplex
, so one cannot do better than
with our current bag of tricks. But I think GEH is close enough to EH (especially when viewed relative to the presumed difficulty of EH) that this is not too much of a “cheat”.
Secondly, as mentioned previously in https://terrytao.wordpress.com/2013/11/22/polymath8b-ii-optimising-the-variational-problem-and-the-sieve/#comment-253452 , the fact that we now can get k=4 on (generalised) EH means that we have a twin prime/Goldbach dichotomy on GEH: either the twin prime conjecture is true, or every large multiple of six lies within 2 of the sum of two primes (and thus large even natural number lies within 4 of the sum of two primes), by consideration of the tuple n, n+2, N-n, N-n-2, which is admissible when N is a multiple of 6.
9 January, 2014 at 8:46 am
Pace Nielsen
Does the remark on page 12 of James’ preprint allow us to use
under EH, or do we need GEH there as well?
9 January, 2014 at 9:06 am
Terence Tao
That remark shows that to treat the numerator in (1) one can live on
(indeed the same argument shows that one can control the numerator as long as all marginals of F live on
), but the issue is with control of the denominator (Lemma 5.1 in James’s paper, rather than Lemma 5.2) which, in order to go beyond
, needs EH for other divisor sums than the von Mangoldt function.
9 January, 2014 at 9:02 am
Terence Tao
A variant on the second remark: on GEH, either the twin prime conjecture is true, or there are infinitely many Cunningham chains of length 2, as can be seen by consideration of the admissible tuple n-1, n+1, 2n-1, 2n+1.
9 January, 2014 at 9:56 am
Terence Tao
Oops, I withdraw that assertion: if, for instance, it is n-1 and 2n+1 that are simultaneously prime then one doesn’t get either a twin prime or a Cunningham chain.
9 January, 2014 at 9:44 am
jmc
In that previous comment, you sketched two similar statements (about “prime gaps” on the one hand, and “bounded Goldbach” on the other), and there you said that *at least one* of them is true (logical OR). Now you say it is a dichotomy (*either* one *or* the other, exclusive OR). Where does this difference come from?
And do I understand you correctly that this implies that the Twin Prime Conjecture and the Goldbach Conjecture cannot be both true?
9 January, 2014 at 9:48 am
Terence Tao
Sorry, I misspoke; it should be “disjunction” rather than “dichotomy”.
9 January, 2014 at 10:26 am
Terence Tao
Another variant on the second remark (this one, hopefully, more correct than the previous variant): on GEH, either the twin prime conjecture is true, or every multiple N of 6 lies within two of a Polignac number (a number that is the difference of two consecutive primes infinitely often). This comes from consideration of the admissible tuple n, n+2, n+N, n+N+2. (To force the two primes to be consecutive, one needs some additional arguments of Pintz in the paper linked above, first to force all four members of the tuple to be almost prime (by counting in the Selberg sieve the contribution for a given member to not be almost prime), and then to use an additional upper bound sieve to eliminate the contribution of those n for which there is an additional prime between two elements of the tuple.)
If we end up achieving k=3 rather than k=4, one would imagine that even more results of this type will become available.
9 January, 2014 at 11:37 am
Aubrey de Grey
Terry: does the need to go beyond von Mangoldt have any impact on the anticipated difficulty of adapting MPZ to the current context?
9 January, 2014 at 12:06 pm
Terence Tao
Good question! I don’t foresee any difficulties extending the MPZ results from von Mangoldt to the specific convolutions (basically, sums of the form
) needed to generate the various prisms in (4) above; the Type I/II sums should already suffice here, and in fact there might be a little bit of gain over the von Mangoldt numerology due to the smoothness of one of the convolution factors. But actually it is all probably moot, because we get “for free” the simplex
in (4) (coming from the fact that the constant function 1 has level of distribution 1 for trivial reasons) and unconditionally, we have to keep
close to 1/2 (at best we can do 1/2+13/540 currently) and this should give us plenty of room for the support of F. (But even with F restricted to
, we should be able to beat the current value k=54 of k with the MPZ estimates we have, once we have some idea of how to numerically handle the truncation.)
9 January, 2014 at 12:42 pm
Eytan Paldi
Since by theorem 1 above, in order to have
we may relax GEH to
for any “good”
satisfying
it would be interesting to get (as small as possible) such “good”
.
Moreover, if for some such “good”
,
is implied by EH then we have
under EH.
9 January, 2014 at 5:54 pm
Terence Tao
Yes, our criteria are open, so any result that uses
actually uses
for some absolute constant
, and similarly for GEH. (In the original Goldston-Pintz-Yildirim paper, they observe that the bound
on EH actually just needs
, see page 12). So there is some residual value in pushing the various M_k type values beyond 2, as any increase will presumably decrease the amount of EH one needs. (Also, accurate values of M_k may help with trying to extrapolate to medium-sized k.)
It is conceivable that EH could imply GEH for the type of sequences we actually use – they are something like convolutions of truncated Mobius functions and 1. It looks a bit unlikely though; the best I could reasonably expect is that one could replace GEH by an EH for a family of von Mangoldt-like functions (e.g. functions whose Dirichlet series has some nice properties). Given that GEH is already stated in the literature, it may be simplest just to use it as is, and not try too hard to replace it with other unproven conjectures.
9 January, 2014 at 9:04 am
Eytan Paldi
It seems that
in theorem 1 above, should be modified. (with similar modifications in the list of records.)
[Good point – I’ve changed the text accordingly.]
9 January, 2014 at 2:38 pm
Eytan Paldi
The timeline of bounds should also be modified to include the GEH.
[Corrected, thanks – T.]
9 January, 2014 at 10:04 am
Polymath 8 – a Success! | Combinatorics and more
[…] are infinitely many primes of bounded gap below 16. Maynard improved it to 12. Polymath8b have just improved it further to 8. (The unconditional bound for gaps stands now on […]
9 January, 2014 at 2:22 pm
Aubrey de Grey
I may be missing something obvious here, but in the world records table I’m not seeing why M_k,epsilon has an exact value of 2 for k=2. How does the lower bound of 2 arise here when it doesn’t for the original M_2? – and what is the optimum epsilon for this?
9 January, 2014 at 3:02 pm
Terence Tao
Ah, good point; in truth the only lower bound for
computed so far is the value 1.385.. for M_2. Presumably one can do better than this, but one should not be able to get all the way to 2.
9 January, 2014 at 4:52 pm
Aubrey de Grey
Thanks. Eytan posted something about M_2,epsilon (https://terrytao.wordpress.com/2013/12/20/polymath8b-iv-enlarging-the-sieve-support-more-efficient-numerics-and-explicit-asymptotics/#comment-262267) that I don’t think received any followup yet – maybe that’s a place to start?
10 January, 2014 at 2:50 pm
Eytan Paldi
In theorem 1 above, I suggest also including the case
.
[Done – T.]
10 January, 2014 at 3:01 pm
Terence Tao
For purposes of large k asymptotics, I now have the upper bound
so one does not get a substantial gain in the large k regime from the epsilon flexibility.
To see this, we define the weight
to be
when
, and to be
when
, and similarly define
by cyclic permutations. Observe that the
are nonnegative and
on the simplex
. Thus, for F on this simplex we have by Cauchy-Schwarz
If
, then setting 
since
. Integrating we get
and summing over cyclic permutations one gets the claim.
10 January, 2014 at 5:11 pm
Eytan Paldi
By using
, in order to satisfy the vanishing marginals constraints, we may take any
in the interval
. Since the upper bound for
is quite tight, the additive gain is lower bounded by about
, and upper bounded by about
.
For
, this additive gain is in
. This implies (approximately) the reduction of the current unconditional
by a factor between 2 and 4. (i.e. unconditional
!)
(but perhaps this is too optimistic …)
10 January, 2014 at 5:58 pm
Eytan Paldi
The above analysis is indeed optimistic because the vanishing marginals constraints should reduce
– making its upper bound less tight than the bound for
.
11 January, 2014 at 5:26 am
Eytan Paldi
By taking
(i.e. making
dependent on
for any
in
), the vanishing marginals condition is automatically satisfied with the same conclusion (not necessarily optimistic) as above on the possible reduction of unconditional
.
11 January, 2014 at 8:18 am
Aubrey de Grey
Eytan: isn’t there still also the constraint that epsilon must not exceed 1/(k-1)?
11 January, 2014 at 9:07 am
Terence Tao
This constraint is only needed in the EH case
. In general, one can take advantage of the additional region
at the end of (4), which makes the polytope
admissible for any
.
11 January, 2014 at 9:14 am
Eytan Paldi
According to theorem 1 above, it is only restricted to be in
(the earlier restriction to
is for the truncated simplex – the last example above – in order to satisfy the vanishing marginals condition for that particular choice of
.)
12 January, 2014 at 4:27 pm
Eytan Paldi
Perhaps an explicit lower bound on
may be derived from this upper bound (as done for the
case). Such a bound may be used to improve the current values of
for
.
13 January, 2014 at 3:15 am
Eytan Paldi
If the optimal
is (for large k) proportional to
for
, then the additive gain of the upper bound on
is upper bounded by about
– implying a reduction of the current unconditional
by a factor of about
at most.
(This gain, due to the
flexibility, is still without using the vanishing marginals flexibility.)
10 January, 2014 at 3:30 pm
Eytan Paldi
The definition of
above should be for a closed simplex.
11 January, 2014 at 9:15 am
Terence Tao
I’ve been somewhat inconsistent on this. For the purposes of optimising M_k etc., it does not matter whether the simplex is closed or open; but for some of the compactness arguments in the previous post, it is convenient to assume that the function F is compactly supported inside the open simplex (i.e. it vanishes near the boundary of the simplex). Of course, even if F doesn’t vanish at the boundary of the simplex, one can perturb it infinitesimally to vanish near the boundary (and to make it smooth) without significantly affecting the Rayleigh quotient, so this is only a minor technicality.
11 January, 2014 at 9:31 am
Eytan Paldi
I made this comment because in the page on the variational problem,
was defined as the closed simplex. Moreover, since
is needed only to be in
, its vanishing on the boundary is not really needed.
11 January, 2014 at 10:24 am
Pace Nielsen
Following Aubrey’s recommendation, I computed the bounds we get using the simplex
where no marginal conditions need to be considered. Using degree 11 polynomials and taking
, we get
. To make the obvious comparison, taking degree 11 we have
without playing the epsilon trick. This is still a very good increase, although not nearly as nice as when we were considering the gain that the epsilon trick gave over the original simplex
.
11 January, 2014 at 10:30 am
Aubrey de Grey
Many thanks Pace. Is that the best epsilon, though? – previously for R you used 0.18 for k=4 but 0.243 for k=3. Is it just coincidence that the best value for R’_3 is the same as for R_4 ?
11 January, 2014 at 10:56 am
Pace Nielsen
It is approximately the best value of
for the new simplex
. (The closeness of the two epsilon values is accidental I think.)
11 January, 2014 at 11:11 am
Eytan Paldi
Since this polytope is contained in
, it seems better (and seemingly easier) to take the enlarged simplex
and optimize over
.
11 January, 2014 at 11:27 am
Pace Nielsen
Optimizing over
is indeed easier (and this was the computation I did previously for
), but it is not better. Indeed, it is exactly because the new computation is harder that we get better results. The new simplex contains
for any fixed
and thus yields better numerics.
11 January, 2014 at 12:10 pm
Eytan Paldi
I can see now that the three constraints are added especially to get rid of the vanishing marginals condition! Can you please give some data (as a function of the maximal degree) of
(for optimized
)? (It may help in estimating its limit.)
11 January, 2014 at 12:26 pm
Aubrey de Grey
It would also be interesting to know whether the optimal epsilon varies with the maximal degree.
Additionally, in Terry’s original description of the epsilon idea there were k distinct epsilons, but so far all computations have assumed just one. Might it be worth playing around a bit with (for example) making all but one of the epsilons zero, bearing in mind Terry’s observation about M”_3 collapsing to M”_2 ?
11 January, 2014 at 1:43 pm
Eytan Paldi
There is no reason for the optimal epsilon to remain fixed. For unsymmetrical regions
it seems that several epsilons should give better results.
11 January, 2014 at 11:28 am
Pace Nielsen
I should also point out that
not
.
11 January, 2014 at 11:48 am
Eytan Paldi
Thanks for the correction! (in addition, the simplex
should be truncated by the unit cube).
12 January, 2014 at 9:09 am
Terence Tao
I’m recording a Fourier-analytic way to think about the “epsilon” trick allowing us to push the support of F out to
times the simplex, at the cost of only getting a lower bound coming from
times the simplex.
For simplicity I’ll deal with “denominator” sums
rather than “numerator” sums
, even though it is the latter for which the epsilon trick is actually useful for our application. (This gives us an additional factor of
to play with, because 1 has better distribution properties than the von Mangoldt function.) As mentioned previously, the Selberg sieve
can be written Fourier-analytically as
for some coefficients
, where
is a Ramanujan sum. From a Fourier perspective,
is supported on the fractions
with
, so each summand
is supported in those reduced fractions with denominator
.
In the baseline approach, we require
to be supported in
, which makes
less than
. As such, different reduced fractions with different denominators
are separated by at least
, which allows for the Plancherel theorem (or the large sieve inequality) to get good L^2 control on the sum of the
, i.e. to control
. (This connection between the Selberg sieve and the large sieve is well known, see e.g. the recent book by Friedlander and Iwaniec.)
Now suppose we enlarge the support of
to
, then the products
can now be as large as
. This means that two different fractions
can be as close as
, which is bad for the large sieve inequality. However, observe that if one of the fractions, say a/q, has denominator less than
, then it still stays a distance at least
from the other fractions in this Farey sequence. (Incidentally, this inhomogeneity in the Farey sequence is a crucial ingredient in the Montgomery-Vaughan large sieve inequalities that appeared tangentially in Polymath8a.) Because of this, if one focuses on just the “major arc” contributions to the L^2 norm that come from the neighbourhood of fractions of denominator at most
, the large sieve inequality still works efficiently and gives a lower bound on the full L^2 norm that comes from square-integrating F on
.
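(The elementary spacing fact underlying the major-arc claim: for distinct reduced fractions $a/q \ne a'/q'$ one has

$$\Big| \frac{a}{q} - \frac{a'}{q'} \Big| = \frac{|a q' - a' q|}{q q'} \ge \frac{1}{q q'},$$

so a fraction whose denominator is small compared with the largest denominator appearing in the Farey sequence stays well separated from all the other fractions. This is only a sketch of the mechanism, not the precise numerology used above.)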
12 January, 2014 at 10:05 am
Eytan Paldi
Does it mean that
can be supported on
without satisfying the inclusion condition (4)?
12 January, 2014 at 10:25 am
Terence Tao
Unfortunately, this argument only gives lower bounds rather than upper bounds, which makes it suitable for bounding the numerator of (1), but not the denominator. I have been trying to see if one can also get good upper bounds in the regime
but it looks very difficult; one has to somehow extract subtle cancellations from the minor arcs. For k=2 it may be that automorphic methods can do something, but for k>2 it looks very daunting.
12 January, 2014 at 11:17 am
Terence Tao
There is a cheap scaling trick to convert any good cutoff function F for the M_k problem to a cutoff for the
, just by scaling it by
. The upshot is that
for any symmetric F supported on the original simplex
. It seems that one should be able to compute the integral
as a polynomial function of r without much difficulty for a given polynomial F, so that one can then numerically optimise
in
for a given F. So one could optimise for M_k to find a good candidate F, and then apply this procedure to see if one gets any additional improvement in epsilon.
12 January, 2014 at 12:15 pm
Eytan Paldi
A possible difficulty is that for the
problem,
is supported on a truncated simplex (so the marginals are, in general, only piecewise polynomials.)
12 January, 2014 at 12:18 pm
Terence Tao
Oops, this is indeed an issue, if one is dealing with (the more interesting quantity)
rather than
. One can fix it by truncating F to
, but this makes the denominator and numerator significantly more complicated, unfortunately.
12 January, 2014 at 12:52 pm
Aubrey de Grey
Well… is
really “more interesting” any more? Arguably, the only outstanding question in the
setting is whether
can surpass 2, and after Pace’s recent result for
it’s looking awfully likely that it will, even without any additional ideas. (I guess there is also the question of whether k can be reduced to 4 or 3 without having to resort to GEH rather than EH, but I’m not sure that any of the current directions are leading that way anyway.) Accordingly I would argue that we have now come to a point where the unconditional setting is more of a focus than the [G]EH setting – especially since the k predicted to be achievable from current ideas in the unconditional setting seems to be in the teens or twenties, which may be approaching the regime in which the computational methods currently being applied to k=3 or 4 are conceivably applicable. Thus, the obvious question arises: can this new trick work for M”_k, or is it limited to M_k ? If it can work for M”_k, it seems that it could be a great short-cut to determining good values for
for low two-digit k.
12 January, 2014 at 3:14 pm
Terence Tao
Fair enough. As a first data point, I applied the above rescaling to the k=2 explicit optimiser for M_2, to get a lower bound of 1.690608 for
with
.
# Maple worksheet commands; f below is the explicit k=2 extremiser for the M_2 problem
M2 := evalf(1/(1-LambertW(exp(-1))));   # M_2 = 1/(1 - W(e^{-1})) ~ 1.386
f := 1/(M2 - 1+ x) + log( (M2-x)/(M2-1+x) ) / (2*M2 - 1);
denominator := evalf(int( f^2, x=0..1));   # L^2 norm of f over [0,1]
numerator := M2 * (1+eps) * int( f^2, x=0..(1-eps)/(1+eps) );   # rescaled numerator, truncated at (1-eps)/(1+eps)
ratio := numerator/denominator;
evalf(subs(eps=0.36,ratio));   # evaluate the ratio at eps = 0.36
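For anyone without Maple, the same computation can be mirrored in Python/SciPy (a sketch that simply transcribes the worksheet above, not an independent verification):

import numpy as np
from scipy.special import lambertw
from scipy.integrate import quad

M2 = 1.0 / (1.0 - np.real(lambertw(np.exp(-1.0))))   # M_2 = 1/(1 - W(e^{-1})) ~ 1.386

def f(x):
    # the explicit k=2 extremiser quoted above
    return 1.0 / (M2 - 1.0 + x) + np.log((M2 - x) / (M2 - 1.0 + x)) / (2.0 * M2 - 1.0)

def ratio(eps):
    den, _ = quad(lambda x: f(x) ** 2, 0.0, 1.0)
    num, _ = quad(lambda x: f(x) ** 2, 0.0, (1.0 - eps) / (1.0 + eps))
    return M2 * (1.0 + eps) * num / den

print(ratio(0.36))   # should reproduce the 1.6906... quoted above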
12 January, 2014 at 3:24 pm
Aubrey de Grey
Nice!
Many thanks also for the new columns in the Selberg wiki page table (which is getting a bit wide; maybe it could be reformatted to show upper bounds below the corresponding lower bounds, rather than beside them?).
12 January, 2014 at 3:31 pm
Aubrey de Grey
Conjecture: the best epsilon for M_k is 0.72/k, and the best for M’_k is 0.72/(k+1). I have no prediction in relation to M” !
12 January, 2014 at 6:22 pm
Aubrey de Grey
I should have noted that this conjectured relation is independent of theta. Perhaps four data points (namely [k,theta] = [3,1], [4,1], [2,1/2] for M and [3,1] for M’) are too few to make such a stretch, but, well, maybe not :-)
12 January, 2014 at 7:33 pm
Terence Tao
Some back of the envelope calculations suggest that asymptotically the best value of epsilon is proportional to
, basically because if one considers the truncated tensor product functions F that show up in the asymptotic analysis, then they are concentrated at a distance about
from the boundary of the simplex (this is the quantity
in the asymptotic analysis, and is also comparable to
), and so it becomes highly counterproductive to shrink the simplex beyond that distance. So unfortunately, one does not get to significantly improve the current bound
with the epsilon trick in the large k regime (indeed, this calculation suggests that the value of k only improves by an unimpressive O(1) for a fixed value of m).
I also see now why there is some gain to be had by increasing epsilon away from zero, even if this gain decays fairly rapidly in k; the marginal function
vanishes to first order as one approaches the boundary of the simplex
, simply because the length of interval
that one is integrating on shrinks to zero. (I am implicitly assuming here that F is bounded, but from Cauchy-Schwarz one can see a similar vanishing in an L^2 sense if F is just assumed to be square-integrable.) Because of this, the cost of retracting the
integrals by an epsilon is only cubic in epsilon (if F is bounded, or slightly better than linear if F is just assumed square-integrable). So the gain coming from dilating by
outweighs the loss in shrinking the simplex by
if
is small enough.
12 January, 2014 at 8:05 pm
Aubrey de Grey
Does this take account of the differences between M, M’, M”? I notice that (k+1)/k – namely, the ratio between the only so-far-computed pair of best epsilons for M_k and M’_k for the same theta – is precisely the ratio of volume between R’_k and R_k for that k (namely k=3). Going out on a major limb I’m predicting on this basis that the best epsilon for M”_k (with theta=1/2, and with R being the dilated simplex with edge 2) will be
. I’m not clear as to what is entailed in determining the optimal epsilon in a given context – maybe it is easy enough that a prediction is not useful – but I’m putting it out there in case it is.
12 January, 2014 at 8:19 pm
Aubrey de Grey
Bah – apologies – I obviously misstated those volumes. The prediction of the best epsilon for
based on volume of R and extrapolating from M and M’ should be
.
13 January, 2014 at 8:31 am
Aubrey de Grey
Actually, a little more thought results in the much more optimistic suspicion that the best epsilon for
(with R’’ being the dilated simplex of edge 2) converges to half that of
as k increases – in other words, to 0.36/k based on existing data points (though I certainly don’t exclude the possibility that there is a
factor as Terry suggests). This is based on realising that the ratio of volumes
is actually
, not
, and running with the same idea as above.
12 January, 2014 at 1:42 pm
Eytan Paldi
I think that there is a possibility that the optimal
for
(or any other truncated simplex on which
is supported) or its marginals can be non-smooth (or even discontinuous) on certain hyperplanes (determined by the truncating constraints), which may result in slow convergence of the
– polynomial approximation to the optimal
.
A possible fix is to use piecewise polynomial approximation to the optimal
(where the polytope partition for the piecewise polynomial approximation is determined so that
and its marginals should be polynomials on each predetermined subdomain.)
12 January, 2014 at 2:26 pm
Eytan Paldi
The above suggestion for piecewise polynomial approximation was given only to solve a possible problem of slow convergence (wrt polynomial degree) of
due to possible non-smoothness of the optimal
.
This is without vanishing marginal constraints (which obviously may be added as needed.)
13 January, 2014 at 9:45 am
Eytan Paldi
In the wiki page on the variational problem,
and
should also be defined.
[Done – T.]
13 January, 2014 at 10:37 am
Eytan Paldi
It seems that
as computed by Pace is for
supported on a somewhat larger domain (i.e.
) than the one appearing in the wiki page.
[Smaller, actually; there was an additional restriction to
which I forgot to put on the wiki, but is now there (though for
it has no effect). -T.]
13 January, 2014 at 11:37 am
Eytan Paldi
In this definition, is it possible to relax the factor
to the maximal
(i.e.
)?
[Yes; I’ve adjusted the definition of
accordingly. -T.]
13 January, 2014 at 12:35 pm
Anonymous
Typographical comment: For “GEH”, “MPZ”, etc., the letters should be upright. In math mode this means typing it as, say “
”.
P.S. This project is really interesting to follow as a non-mathematician.
13 January, 2014 at 3:25 pm
Pace Nielsen
I’m nearly ready to write the code to compute the M_3 value over the half-cube cut out by the plane
. I’ve created a file with some pretty pictures which helped me write down the necessary components of the calculation. The basic picture came from Terry’s decomposition that he found using his son’s toys. (I need to get my own sons similar toys!) You can find the file at: http://www.math.byu.edu/~pace/HalfCube.pdf
I wanted to make this available so as to get a double-check on all of the integral formulas. (I’m also planning to double-check these decompositions tonight. I keep finding minor errors, so there is no guarantee it is correct yet.)
13 January, 2014 at 4:29 pm
Terence Tao
Nice pictures! I’ll double-check the computations later today. One simple checksum is to make sure that I(1) is equal to 1/2, and more generally that I(F) is independent of epsilon if F is a polynomial independent of epsilon.
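(A quick numerical sanity check of the I(1) = 1/2 checksum is easy to script. The sketch below, which is not part of Pace's code, assumes purely for illustration that the half-cube is the portion of the unit cube on one side of a plane through the cube's centre, such as x+y+z <= 3/2, so that its volume is exactly 1/2.)

import numpy as np

# Monte Carlo estimate of the volume of a half-cube; the cutting plane
# x + y + z <= 3/2 here is an illustrative assumption, not taken from the post.
rng = np.random.default_rng(0)
pts = rng.random((10**6, 3))
print(np.mean(pts.sum(axis=1) <= 1.5))   # should be close to 0.5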
13 January, 2014 at 4:29 pm
Pace Nielsen
First error I found:
The second marginal condition should instead read:
13 January, 2014 at 4:35 pm
Pace Nielsen
Silly error: in the I-integral section I left some squares off integrands.
13 January, 2014 at 7:35 pm
Terence Tao
Everything else seems to check out – it’s remarkably complicated, though!
This partition should technically exhaust a dense class of F when the degree is large enough, but convergence may be slow. One reason to suspect this is that variational arguments (using Lagrange multipliers) tell us that the optimal F should be of the form
where
is supported on the weirdly disconnected region
. This suggests jump discontinuities when
. The discontinuities associated to
are already accounted for by the partition, but the discontinuities associated to
are not, which does not bode well for polynomial approximation.
On the other hand, we only need to scrape together a gain of less than 0.05, so we may not need further subdivision. If that becomes necessary, it may be worthwhile to collapse to a (messy) two-dimensional problem, and write everything in terms of G instead of F. One side benefit of this is that the domain partitioning will be easier to visualise.
13 January, 2014 at 8:10 pm
Eytan Paldi
I think that such slow convergence must be sublinear (e.g. like a negative power of the maximal degree). Fortunately, we don’t need to get very close to the limit (unless it is very close to 2.)
14 January, 2014 at 6:01 am
Eytan Paldi
A 3D jump discontinuity on a plane may lead to even slower convergence than a jump discontinuity at a point in the 1D case (e.g. the M-value approximation error may decay as a negative fractional power of the polynomial degree!)
13 January, 2014 at 4:23 pm
Eytan Paldi
Pace: Nice picture! I have two questions:
1. Do you intend to compute
(not
)?
2. Why do you take
in
(instead of
)?
13 January, 2014 at 4:27 pm
Pace Nielsen
1. I intend to compute the M value we get using
, using the epsilon expansion trick, on the stated half-cube. I don’t know if
is the correct notation or not.
2. The decomposition in the file slightly changes in the regime
. If we need a larger
, I can work out the change in decomposition, but I doubt that we will need to do this.
13 January, 2014 at 4:41 pm
Eytan Paldi
The definition of
(including the range of
) is in the wiki page on the variational problem (by now we have many types of M values!)
13 January, 2014 at 5:35 pm
Eytan Paldi
It seems that the optimal polynomials on the various polytope pieces are not necessarily symmetric (I don’t see any reason for that), so by removing (if needed) the (simplifying) symmetric polynomial constraint, we have the potential to make further improvement to the M-value.
Note also that the optimal
should increase because of the added flexibility from vanishing marginals (implying another improvement by increasing the range of
to
if needed.)
13 January, 2014 at 7:14 pm
Pace Nielsen
All of the symmetry follows by an averaging argument (see, for example, the argument given on page 17 of Maynard’s preprint).
14 January, 2014 at 5:20 am
Eytan Paldi
I agree that the symmetry follows from this argument (also with the presence of the additional constraints) and that the optimal
can be represented as a symmetrical sum of functions of two variables.
Note, however, that all such conclusions are dependent on the (seemingly plausible, yet unproved!) existence of an optimal solution in L^2 for the (constrained) variational problem.
14 January, 2014 at 8:24 am
Terence Tao
Actually I found an argument that shows that symmetric functions suffice even without the existence of an extremiser. Let’s work in the original simplex for simplicity, thus
is the supremum of
for non-trivial F. In particular, the quadratic form
is positive semi-definite, and is thus associated to a positive semi-definite inner product and norm. If
is an extremising sequence for
, normalised so that
, then
goes to zero in this semi-definite norm, as does any reflection of
caused by permuting the coordinates. Also note that by replacing
with
if necessary, one can take
to be non-negative. Applying the triangle inequality for semi-definite inner products, we conclude that the symmetrisation of
also goes to zero, while staying bounded away from zero in L^2 by non-negativity. So the symmetrisations are also an extremising sequence, and so it suffices to extremise over symmetric functions.
The situation is a little trickier with vanishing marginals because we can’t force non-negativity, but I think the argument shows that the only competitor to a symmetric extremising sequence would be an antisymmetric extremising sequence (corresponding to some other irreducible representation of the permutation group) which looks very unlikely, but I don’t see how to formally exclude it.
14 January, 2014 at 8:54 am
Eytan Paldi
Nice argument ! (obviously it depends on the symmetry of the domain and the constraints).
14 January, 2014 at 1:24 pm
Eytan Paldi
It should also be remarked that the supremum
is not attained (which is the interesting case!) if and only if the above quadratic form is positive definite.
14 January, 2014 at 10:26 am
Terence Tao
Calculus of variations (and Lagrange multipliers) tell us that any symmetric extremiser F to a three-dimensional problem such as M_3, M’_3, M”_3,
, etc. must have the form
. In terms of polynomials, this means that the only monomials
one expects F to have are those in which at least one of a,b,c is zero. So for degree D, the number of monomials in play should be O(D^2) rather than O(D^3), which may end up being a significant speed boost that allows us to explore higher degrees (at the cost of some accuracy, because if one caps the degree there might be a little bit of need to use the monomials that don’t show up in the true extremiser).
It might be worth testing the idea of restricting to monomials of just two of the three variables in the M_3 and M’_3 problems (and maybe M”_3, if the code is already available for easy modification) to see if any significant improvements in performance are obtained by doing this.
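To quantify the saving, one can simply count the monomials in question; the sketch below (Python, illustrative only) compares the number of monomials $x^a y^b z^c$ of total degree at most $D$ with the number of those having at least one exponent equal to zero.

from itertools import product

def monomial_counts(D):
    full = 0
    reduced = 0
    for a, b, c in product(range(D + 1), repeat=3):
        if a + b + c <= D:
            full += 1                # all monomials of total degree <= D
            if min(a, b, c) == 0:
                reduced += 1         # those allowed in the extremiser ansatz above
    return full, reduced

for D in (5, 10, 20, 40):
    print(D, monomial_counts(D))     # full grows cubically in D, reduced only quadratically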
14 January, 2014 at 10:34 am
Aubrey de Grey
Great idea! Does this type of argument also work for larger k? If so, it could be a big help in finding the k for which
reaches 4, even if it turns out not to be needed to make
reach 2.
14 January, 2014 at 11:36 am
Eytan Paldi
I also think that it is a good idea to select only monomials with the right properties. Moreover (following your previous suggestion), is it possible to iterate this idea (by inserting the new structured solution into the Euler-Lagrange equation) to infer greater structure for the solution (and consequently a smaller class of allowed monomials)?
14 January, 2014 at 1:32 pm
Terence Tao
James and I discussed this a bit in the first Polymath thread. Unfortunately things get very messy and there isn’t an obvious dimensional reduction.
Let’s work on the M_3 problem for simplicity, so one is looking at the eigenfunction equation
where
is the marginal distribution. This writes the three-variable function F in terms of the two-variable function G. If we substitute this ansatz into (2), we get
but this doesn’t seem to allow one to easily express G in terms of one-variable functions.
14 January, 2014 at 10:27 am
Aubrey de Grey
Pace, let me add my praise for this awesome document.
Speaking of symmetry of F, am I right in saying that it is still open whether a symmetric R” is optimal? I’m presuming that you went with the truncated simplex rather than the prism because the simplifications afforded by symmetry outweigh the need to use 13 components rather than 7, but if the truncated simplex fails to make it to M”=2 then maybe it’s worth also looking at the prism? Or is there another reason not to – for example, does the seven-component dissection of the prism no longer work when epsilon is introduced?
14 January, 2014 at 10:33 am
Terence Tao
Well, once Pace’s code is running one can test the epsilon=0 case and see whether M”_3 for the symmetric polytope is any better or worse than M”_3 for the prism (which James computed to be about 1.914); this may give some indication of whether it is better to switch to the prism. (In that case, it may also make sense to have at least two independent epsilons rather than one, given the reduced symmetry of the prism.)
14 January, 2014 at 10:47 am
Aubrey de Grey
Right. Indeed, if it’s different (either better or worse), and if this isn’t all rendered moot by success with the truncated simplex, there’s probably a case for exploring other half-cubes so as to ensure that the optimal one is being used. I guess the obvious third one that is “as different as possible” from either the prism or the truncated simplex would be the one bounded by the plane x+y+2z = 2.
14 January, 2014 at 1:56 pm
Pace Nielsen
It took a while, but I got the code working. Contrary to my hopes, the symmetric half-cube apparently gives worse numerics than the prism. For
and taking all polynomial pieces of degree 4, we only get
.
The calculation gets slow very quickly as the total degree increases. For example, with degree 5 it takes 35 minutes to show
. It appears that the optimal epsilon is close to 1/6. So we could probably get better numbers going a little past this epsilon, using a slightly different decomposition. However, since the values we are getting are so far away from 2 (indeed, they are barely better than the bounds for
) I don’t think it is worth the effort. We probably want to focus on a different polytope.
14 January, 2014 at 2:15 pm
Terence Tao
Well, that’s a bit disappointing, but I guess we just have to roll with it.
So the next thing to try would be the prism. Here, F would be supported on
with marginal
supported on
and
supported on
for some
. (There is no point enlarging the simplex for the third marginal
since it is already constrained to
.) Our task is now to maximise the ratio
One nice thing here is that in the limit
,
, the ratio approaches 2, as can be seen by taking F to be the indicator function of the box
. So we are guaranteed to hit at least 2, and we only need an epsilon more than this! (But it’s not that simple; I think this limit is a local maximum, and we need a separate local maximum to beat this local maximum. Still, it’s a good sign…)
Actually, a simpler intermediate thing to calculate would be
, in which F is supported in
but the numerator is as before. This is also guaranteed to be at least 2 by the same example as above.
Note that we’ve broken all the symmetry here, and so will have to work with all the monomials, which may cost us a bit performancewise. On the other hand, the number of polytopes involved in the decomposition may be a bit smaller than in the symmetric case.
14 January, 2014 at 2:28 pm
Pace Nielsen
I’ll do this next. Might be in a day or two, given that I have to write my lecture for tomorrow, etc…
14 January, 2014 at 2:23 pm
Pace Nielsen
Sped up the code a bit. With degree 6, we get
, and it takes about 10 minutes.
Given the recent comments, I also looked at what happens when restricting to monomials where only two variables occur. It appears that there is very little loss. For instance, doing degree 6 with this restriction we get
, and it takes about 3 minutes.
14 January, 2014 at 3:20 pm
Eytan Paldi
The very small improvement from degree 5 to degree 6 suggests that either
1. Even for these small degrees, the lower bound is very close to the limit.
or
2. The convergence is very slow (e.g. sub-linear) because of possible discontinuities of the optimal
(or some of its small order derivatives) on certain planes.
It seems to me that the second possibility is the right one (so the limit can be larger than 2 but one needs a very large degree to come close to it!), but of course, given the M-values as a function of the degree, it is not difficult to verify the convergence type (i.e. linear with some stable rate, or sub-linear).
14 January, 2014 at 4:06 pm
Pace Nielsen
If it is helpful at all, I get 1.95866 for degree 7 and 1.958774 for degree 8 (using the restriction I mentioned to monomials with only two variables). That is as far as I plan to compute on this simplex.
14 January, 2014 at 4:51 pm
Eytan Paldi
Assuming linear convergence, the last two estimated rates (using the divided differences method) are about 0.98 and 0.093 (far from being stable!) which seems to rule out the linear convergence type, indicating that we have the problem of sub-linear convergence! (probably due to discontinuities of the optimal
). A possible fix (as previously remarked) is to refine the polytope decomposition such that the discontinuities should appear only on the boundaries of the polytopes (this may be simplified to polygonal partition by using the two variables representation of
).
14 January, 2014 at 5:03 pm
Aubrey de Grey
Well, but the difference between d=5 and 6 is almost the same as between d=6 and 7… I’m not optimistic. (Degree 8 is so much closer to 7 than 7 is to 6 or 6 to 5 that I’m actually tempted to ask whether there’s a typo for d=8 and it should be 1.959774.) What’s worse is that the value for epsilon=0, d=4 is not all that much smaller for the truncated simplex than for the prism, suggesting that the prism won’t get us beyond about 1.97. The main remaining sliver of hope seems to be that allowing two epsilons would improve things, but it’s looking pretty forlorn to me.
14 January, 2014 at 5:33 pm
Eytan Paldi
It seems that the only way to retain linear convergence (with stable rate) is to identify the potential (planar) sets of discontinuities of the optimal
and refine the partition accordingly (to keep all discontinuities on the boundaries of the polytopes).
15 January, 2014 at 6:11 am
Eytan Paldi
It is important to observe that the very slow convergence of M” lower bounds for degrees
with FIXED
still has unstable(!) rate (which is uncharacteristic of sub-linear convergence whose rate should tend to 1) – indicating that these degrees are still in a transitional interval (well below the final stage of convergence with stable rates.)
In addition to possible discontinuities of the optimal F in the interior of some polytopes, this strange behavior may be explained by
1. The optimal
should be optimized for each degree.
2. More importantly, note that the optimal F can be approximated using a restricted class of monomials of order
for each degree
; this means that (effectively) there are only
degrees of freedom with
(vanishing marginals) constraints in the optimization program. The number of constraints may be reduced to
by adapting the constraints to the particular class of
allowed monomials. It seems (since the number of polytopes is relatively large) that only for sufficiently large degree
, the remaining effective number of degrees of freedom (i.e. after subtracting the number of constraints) should give a regular (i.e. stable rate) convergence behavior.
3. Implementation errors.
15 January, 2014 at 8:45 am
James Maynard
I thought it’s worth mentioning that (at least on the assumption my code was working as intended) when I looked at the 4-variable prism I also encountered rather erratic convergence. The convergence was good on
, but not on the polytopes outside of it with the vanishing conditions. For very small degrees (say 4 or less) the vanishing marginals gave very few degrees of freedom, and so there wasn’t much benefit. Even for higher degree (I think I went up to degree 9) the convergence didn’t seem regular enough to predict what it might be converging to.
15 January, 2014 at 8:29 am
Pace Nielsen
I performed the computation for
on the simplex
.
As expected, when
the value is going to 2 pretty quickly. For degrees 1 to 6, we get the values 1.7549, 1.8540, 1.9029, 1.9308, 1.9481, 1.9597.
For degree 1, to within jumps of 1/100 there is a local maximum when
, which gives the value 1.90704. Similarly for degree 2, the local maximum is when
or
. I wasn’t sure if this meant that the local maximum had split in two, or whether my partition was not fine enough. So taking jumps of 1/1000, we get that the local max is at
, giving a value of 1.910399.
By degree 4, this position starts losing ground to the “corner” epsilon values,
. I’ll keep running the calculations and keep you posted if I notice anything new.
15 January, 2014 at 8:38 am
Pace Nielsen
I should mention that this computation is in the domain where
.
15 January, 2014 at 9:22 am
Eytan Paldi
At least we have a slightly better record (even without the vanishing marginals flexibility!)
15 January, 2014 at 9:48 am
Terence Tao
Thanks for this, Pace! It’s good to see various intuitions about the epsilon behaviour confirmed, namely the local maxima at (1,0), (0,1) (which, unfortunately, I don’t think we can directly use to try to beat 2), and the competing maxima in which the epsilons are equal and comparable to 1/6. So perhaps for the M” calculation we can focus on the
case (abandoning the corners (1,0), (0,1)), which allows one to take symmetric polynomials and thus perhaps speed up the calculations a little bit. (The Lagrange multiplier argument is also still in force, allowing one to restrict attention to monomials with one exponent vanishing.)
On the other hand, we are starting from
here, which is further back than
that we had in the fully symmetric case. This makes sense – we only have two of the three epsilons in play, so the improvement over
had to be smaller by a third or so, which is indeed the case – but is still a little unnerving; we have to somehow make up for it with the rest of the prism, which we suspect from the M”_k evidence to be a little better than the extra space in the symmetric polytope (1.914 vs 1.902), but will it be enough? From the scant evidence we have so far, it looks like a bit of a long shot, but we should at least get a clearer picture of whether the prism or the symmetric polytope is superior, even if both may end up falling short of the goal. (Frustrating, though, that we only need a 2% improvement in our bounds to get to H_1 <= 6 on GEH!)
15 January, 2014 at 10:41 am
Aubrey de Grey
Hm – I think it might be premature to restrict to the single-epsilon case when considering the prism, since it may be that the local maximum for M’ has the epsilons equal only because M’ is symmetric. It’s good, though, that with M’ the local maximum is evident when d=1: since low-degree calculations are fast, maybe any promising local maxima with M” can be identified when d=1 or 2 with both epsilons ranging freely and then explored for higher degree in the context of performance-enhancing pre-set constraints on the epsilons.
Also, I’m not seeing why the epsilons need to sum to at most 1. Is there a clear reason? If there is, but if it can be determined that violating that constraint would be a win, I guess one potential way forward would be to explore whether Terry’s original concept that introduced epsilon can be enhanced to allow a larger sum-of-epsilons.
15 January, 2014 at 12:03 pm
Pace Nielsen
Here M’ is not symmetric (or, more precisely, it is only symmetric when
).
The restriction to the sum being at most 1 was because in the other case we get a different decomposition of the prism. I can do that case as well, and probably will a bit later.
15 January, 2014 at 12:18 pm
Pace Nielsen
Or even more precisely, the polytope on which M’ is computed is not symmetric. Although, it is symmetric in the sense that switching the roles of the epsilons gives the same numerics. Did you have a candidate polytope without this property that you wanted me to try?
15 January, 2014 at 5:10 pm
Aubrey de Grey
Well, I was thinking that since the base-case prism (with no epsilons) is asymmetric, the maximum M may involve unequal epsilons whose disparity in some sense “cancels out” the asymmetry of the base case. I’m not seeing that this is nullified by your point about switching the roles of the epsilons. As for other polytopes, I’m not sure – I think I need a better understanding of your reasoning on this point first.
However, actually I think having the sum-of-epsilons exceed 1 seems by far the most promising option as things stand – there seems no obvious reason (someone please say if there is one) why we would expect (1,0) to give better performance than (1,0.1) for example.
15 January, 2014 at 5:33 pm
Terence Tao
The prism
has partial symmetry – it is symmetric with respect to swapping x and y, but not with respect to swapping z with either x or y. The x-y symmetry corresponds to swapping
and
, so we have the symmetry
.
Also note that
. Indeed, with
,
, and so we are now just maximising
subject to
being supported on
. At this point, there is no interaction between different
slices of F, and the problem decouples into two-dimensional ones. Indeed, if we define
to be the slice of
, then from Fubini’s theorem we have
and
and from Cauchy-Schwarz one has
giving the bound
.
In the number-theoretic setting, when trying to get at least two of n,n+2,n+6 prime, setting
forgoes any chance of forcing n to be prime, so one is left with the k=2 problem of getting n+2,n+6 prime, for which we know we cannot get more than one prime at a time due to the parity problem.
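To spell out the averaging part of this argument in symbols (a hedged partial reconstruction, since the displayed formulas above have not survived; the notation $F_z(x,y) := F(x,y,z)$ for the slices and $I_2$, $J_2^{(i)}$ for the two-dimensional denominator and numerator functionals is introduced here only for illustration): Fubini gives
$\displaystyle I(F) = \int_0^1 I_2(F_z)\,dz, \qquad J^{(i)}(F) = \int_0^1 J_2^{(i)}(F_z)\,dz$
for the surviving numerator terms, and hence
$\displaystyle \frac{\sum_i J^{(i)}(F)}{I(F)} = \frac{\int_0^1 \sum_i J_2^{(i)}(F_z)\,dz}{\int_0^1 I_2(F_z)\,dz} \le \sup_z \frac{\sum_i J_2^{(i)}(F_z)}{I_2(F_z)},$
which is bounded by the best constant of the two-dimensional problem; so, as claimed, the corner configuration cannot beat the k=2 problem (the Cauchy-Schwarz step above would then be what controls each two-dimensional ratio explicitly).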
16 January, 2014 at 7:50 am
Aubrey de Grey
Many thanks Terry – that does seem to robustly eliminate the option of a win near the corners. As for the partial symmetry, though, I still don’t see that the swappability of the epsilons tells us for sure that equal epsilons will always be optimal – I think a saddleback situation (where the best equal-epsilons value can be beaten by increasing one epsilon and decreasing the other) is hard to exclude for asymmetric (even if partially symmetric) base case (i.e. epsilon=0) choices of R”.
15 January, 2014 at 1:02 pm
Pace Nielsen
A little more data. For degrees 4, 5, 6, 7, the optimal values in the symmetric case when
are (to within 1/1000) 0.183, 0.183, 0.189, 0.184. The corresponding M values are 1.916121, 1.916198, 1.917474, 1.917865. I do not know the reason for the jump when the degree is 6.
15 January, 2014 at 1:20 pm
Eytan Paldi
This is a relatively small jump (both in the epsilons and the M-values), but the convergence rate is still unstable.
15 January, 2014 at 10:58 am
Eytan Paldi
On the wiki page, the current lower bound for
(prism) should be 2 (due to the local maxima at the corners).
15 January, 2014 at 11:06 am
Terence Tao
Actually, that’s a lower bound for
(prism) rather than
(prism), because one has to set one of the epsilons equal to 1 rather than 0.
When one of the epsilons is really big, say
, then one is getting almost no benefit at all from the corresponding marginal, in this case
, as it decays quadratically as
approaches 1. On the other hand, the constraint
(for the marginal at least) should also cost something quadratic as compared to
. My feeling is that the cost outweighs the benefit, which seems to be corroborated by Pace’s most recent numerics, and that one should focus on the epsilon values closer to 1/6, which is what we have been seeing in all other analyses.
15 January, 2014 at 11:07 am
Eytan Paldi
It seems that this lower bound on the wiki page still does not make use of the several-epsilons flexibility.
15 January, 2014 at 12:59 pm
Terence Tao
For k=2 one can get a lower bound of 2 without epsilons, but I don’t see how to do it for k=3, because when all three marginals are restricted to the simplex
, one cannot create an example that is (an infinitesimally thickened version of) the indicator of the unit square
, which is the 2D extremiser. One can do this though if two of the three epsilons are set to zero and the third is set to 1.
16 January, 2014 at 9:01 am
Terence Tao
I just realised that there is a version of the epsilon trick that can be used to reduce the denominator in our key ratio (1) by an amount that is exponentially small in k – but for our k=3 problem, an exponentially small gain in k is more or less exactly what we need! The idea goes back to the observation of James back in Nov 20 that we should be able to subtract off the contribution to
when all of the
are composite. On Jan 2 I managed to make James’s idea quantitative in the BV case by multiplying
by
(which is a non-negative weight that equals one unless all of the n+h_i are composite), but in order to absorb this new factor I needed the extra room available in the BV case (basically, the ability to stretch the sieve into
instead of
).
However, the epsilon trick – more precisely, a variant of something that I had tried in Dec 14 that ended up not working, because I was only enforcing one compositeness condition
rather than all k compositeness conditions – should achieve this. Namely, we will work with the sieve
where
is the usual Selberg sieve
with
(in the EH case) and f obeying the usual conditions (e.g. supported on the prism R with marginals supported on various simplices)
and
with
obeying similar constraints to f but also obeying the compatibility condition
when
pokes outside of the polytope R.
Note that
is still non-negative (basically because
when
). Also,
and
agree unless all of the n+h_i are composite, so replacing one by the other has no effect on the numerator of (1), only the denominator. For the denominator we can take advantage of the epsilon trick: the first term in the formula for
contains divisor sums that correspond to points outside of the prism, but they are exactly canceled by the corresponding divisor sums from the second term in the formula for
. So one can still obtain an asymptotic for the denominator in (1). In the case
, we have
and we recover the old bound (for, say,
, since this trick seems to be compatible with all our previous tricks), but we should be able to squeeze an additional gain (hopefully at least 2%!) by optimising each of the
separately (as was done in the numerator epsilon trick from Jan 3.) I’ll try to work out the details for the next blog post (which we are due for in any case).
If this works out, it will be rather satisfying: by throwing "everything except the kitchen sink" at the EH problem (advanced numerics, moving from EH to GEH to enlarge the sieve support, polytope decomposition, and using the epsilon trick for both the numerator and denominator), we would get an essentially optimal-for-the-method bound of
. (There is still the very remote chance of getting
by also throwing in the “kitchen sink” of bounding the probability that n,n+6 are both prime, but this looks like it would need a huge improvement on M_3 beyond 2 that we are not close to getting at all.)
16 January, 2014 at 9:49 am
Terence Tao
Ah, I got overly optimistic, and this idea has the same issue that my Dec 14 idea had – the Euler-Lagrange equation for optimising
turns out to be a consequence of (or more precisely, an iterated difference of) the Euler-Lagrange equation for
, so that the optimal choice for
is the same as the optimal choice of
and so no improvement is obtained in the end. Somehow the Selberg sieve is already optimised in these directions and there is no further fat to trim here.
Ah well, back to the drawing board…
16 January, 2014 at 9:55 am
Terence Tao
Here is an idea that doesn’t improve the optimal value for
, but may help approach that value faster. Right now, we are partitioning the big polytope R into pieces, consisting of a big piece R’ plus a lot of smaller pieces with various marginal properties, putting polynomials on each piece, and optimising a large quadratic program, which gets slow in higher degree. But somehow the R’ piece, which is easier to optimise over, is dominant, with the other polytopes only adding small additional contributions. So perhaps we could weight our algorithm towards the R’ piece, for instance by first doing a preliminary low-degree optimisation to obtain a first guess for the optimal F that is a low degree polynomial in all polytope components, and then performing a further optimisation by replacing the R’ piece by a high degree polynomial while keeping the other terms fixed. The point is that the R’ component has no vanishing marginal conditions and so has a cleaner and faster optimisation. As we expect some discontinuities within the R’ region, this may help accelerate the convergence.
A variant of this approach (which, now that I think about it, is probably superior) is to do a single optimisation, but with R’ allowed to have high degree while the other pieces have lower degree; again the point is that we don’t lose any degrees of freedom by increasing the degree on R’ because there are no vanishing marginal conditions there.
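To illustrate the linear-algebra shape of this suggestion (only a toy sketch, not the actual code being used; the matrices below are fabricated stand-ins for the true numerator and denominator forms): once the J- and I-integrals are assembled as quadratic forms in a coefficient basis that uses high degree on R' and low degree on the other pieces, the optimisation is a generalised Rayleigh quotient, whose optimum is the largest generalised eigenvalue.

import numpy as np
from scipy.linalg import eigh

# Hypothetical basis sizes: many monomials on the R' piece (no vanishing-marginal
# constraints there), far fewer on the remaining pieces of the partition.
n_Rprime, n_other = 60, 15
n = n_Rprime + n_other

# Fabricated stand-ins for the assembled quadratic forms; in the real computation
# the entries of A (numerator, sum of J-integrals) and B (denominator, I-integral)
# would be exact integrals of products of basis monomials over the polytope pieces.
rng = np.random.default_rng(0)
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)   # symmetric positive definite stand-in for the I form
A = 0.5 * (M + M.T)           # symmetric stand-in for the J form

# The supremum over coefficient vectors c of (c^T A c)/(c^T B c) is the largest
# generalised eigenvalue of the pencil (A, B).
vals = eigh(A, B, eigvals_only=True)
print("optimal ratio for this basis:", vals[-1])

The point of the mixed basis is that enlarging n_Rprime adds rows and columns only where no vanishing-marginal constraints have to be imposed, so the quadratic program grows where it is cheapest.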
16 January, 2014 at 10:54 am
Pace Nielsen
I had thought about doing this earlier, but for some reason never gave it a try. My code already allows for this approach. Since the polytopes are already entered (and quadruple checked!) for the symmetric half-cube, let me give you some numerics there. Throughout, I'm limiting the degree to 4 on all the non-R' polytopes, and also assuming that every monomial involves only two variables. I let d denote the degree used on the R' part. I find an epsilon which gives a higher value for M” than the two surrounding epsilon values (so I'm not showing that there are no other local maxima [in the given mesh], but that seems likely).
d=6, epsilon=0.160, M”=1.957334, single run 21 seconds
d=7, epsilon=0.158, M”=1.958612, single run 31 seconds
d=8, epsilon=0.157, M”=1.958656, single run 51 seconds
d=9, epsilon=0.162, M”=1.959200, single run 1 minute 20 seconds
d=10, epsilon=0.160, M”=1.959633, single run 3 minutes
My next goal is to run similar computations for the prism. I still have to work out the J-integral decompositions.
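(For anyone who wants to reproduce the flavour of these integrals, here is a toy symbolic computation of one I-integral and one J-integral on the plain simplex, with the simple test function F = 1 - x - y - z; this is only a sanity check of the integral definitions, not a stand-in for the actual prism decomposition.)

from sympy import symbols, integrate

x, y, z = symbols('x y z')
F = 1 - x - y - z   # simple GPY-style test function on the simplex x + y + z <= 1

# Denominator piece: I(F) = integral of F^2 over the simplex.
I = integrate(F**2, (z, 0, 1 - x - y), (y, 0, 1 - x), (x, 0, 1))

# One numerator piece: J_3(F) = integral over x + y <= 1 of (integral of F dz)^2.
inner = integrate(F, (z, 0, 1 - x - y))
J3 = integrate(inner**2, (y, 0, 1 - x), (x, 0, 1))

print(I, J3, 3*J3/I)   # gives 1/60, 1/120, 3/2 for this F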
16 January, 2014 at 12:33 pm
Eytan Paldi
Pace: I suggest checking whether degree 4 outside R' is sufficient (i.e. whether enough degrees of freedom are left outside R' after satisfying the marginal constraints).
16 January, 2014 at 11:15 am
Eytan Paldi
At least we know that without the vanishing marginals condition, the Euler-Lagrange equation implies that the optimal
(which is proportional to
) should be absolutely continuous on
, so is it possible (using standard results from L^2 polynomial approximation theory) to estimate the rate of convergence, as a function of the polynomial degree, to the M-value?
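(For reference, the sort of standard one-variable estimate I have in mind – whether it transfers to the piecewise multivariate setting here is exactly the question: if $G$ is absolutely continuous on $[0,1]$ with $G' \in L^2$, then the best $L^2$ approximation by polynomials of degree at most $d$ has error $O(\|G'\|_{L^2}/d)$, and since the M-value is a stationary value of a smooth ratio of quadratic forms, one would heuristically expect the error in the computed M-value to decay like $O(1/d^2)$.)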
16 January, 2014 at 7:37 pm
Pace Nielsen
I think I stumbled onto something we had missed. I computed
on the prism and got 1.894793, which is below what happens on the symmetric half-cube, and also below James’ computation on the same prism. I believe that the discrepancy between these values may have arisen from the fact that I used a different decomposition of the prism than James did. I think he may have used the decomposition that Terry described earlier, while I used the same decomposition (with epsilons added) except that I combined the
components into one component (since that reduces the number of integrals needed a little, but preserves the convexity of each component).
If this holds true, it would suggest that the symmetric half-cube is actually the right place to look for the best local maximum (which is what my intuition suggested initially). It also confirms what a few people have asserted: there is some gain to be had from breaking some of the convex components of the symmetric half-cube decomposition into smaller pieces and optimizing polynomial approximations on those pieces, especially if we take into account the calculus of variations (Lagrange multiplier) information.
I have to admit that these calculations are a lot of fun, but I may have to put further work off until early next week. If I have time then I’ll start testing this by breaking
into different pieces and redoing the computations. There are two obvious options I see. The first is simply to break the simplex into the six pieces given by the symmetry. The second is to think in terms of the supports of the G functions arising from the Lagrange multipliers that Terry described above. These supports overlap, but it shouldn't be hard to write down the integrals using the given supports.
By the way, on the prism [with my decomposition] I get that with degree 4 approximations on all components and with the two epsilons equal, the optimal epsilon is 0.156 and we get
, which is worse than what we find on the symmetric half-cube.
16 January, 2014 at 8:36 pm
Terence Tao
Interesting! So an inefficient polytope partitioning costs us about 0.02 for
for the prism; if we had been losing the same amount for the
for the symmetric polytope, we could get up to 1.98 or so – about halfway to our goal (Zeno’s paradox, once again…). Then it could come down to the wire as we refine the degree and partition into even more pieces than before…
I guess this is all consistent with the strange convergence behaviour that we’ve been experiencing.
I’ve begun working out the two-dimensional formulation, writing everything in terms of the G(x,y) function, which is supported on the disjoint union of a triangle
and the trapezoid
. Here, it turns out that the vanishing marginal conditions can be reformulated as a formula for G on the trapezoid in terms of G on the triangle, so the only independent variable is the value of G on the triangle. The naive thing to do is then to set G to be polynomial (and symmetric) on the entire triangle, but actually it's looking like there are some mild singularities, and so to get the best results one should partition into quite a lot of pieces (a preliminary analysis suggests that there are singularities on the lines
, and also
). This looks like a scary number of singularities, but in some sense that’s a good thing: the more singularities there are, the worse we expect our current numerics to be, and so the more hope there is that we can do better by using a better adapted partition!
I think I will use the next blog post to talk about the two-dimensional formulation.
17 January, 2014 at 5:44 am
Eytan Paldi
In the two-dimensional formulation, the continuity (but not necessarily smoothness) properties of the function G(x,y) should also be imposed via an appropriate (minimal) system of additional constraints.
17 January, 2014 at 7:58 am
Terence Tao
After looking at the eigenfunction equation
for the (putative)
extremiser, I think that F is continuous and piecewise smooth, with lack of differentiability at the interior faces coming from taking an edge on the boundary of
(specifically, one of the three edges joining (1,0,0), (0,1,0), or (0,0,1) to (1/2,1/2,1/2)) and extending it in one of the coordinate directions, so the interior singularities should lie on the trapezoid with corners (1,0,0), (1/2,1/2,1/2), (0,0,0), (0,1/2,1/2) (plus the two reflections). I haven't worked out exactly how this decomposes R'_3, though. On the two-dimensional side, this suggests that G(x,y) is continuous and piecewise smooth on the triangle
, with a lack of differentiability on the bisecting line joining (0,0) with (1/2,1/2). I guess the continuity and singularity can be detected once we have located good near-extremisers. (Continuity means that we have some interior matching conditions which we should probably enforce when we work on the more difficult
problem.)
17 January, 2014 at 5:20 am
Pace Nielsen
I woke up with two questions, one technical and one numerical.
Question 1: Suppose that we achieved good enough numerics to pull off the trick of passing from
to
. Would similar methods allow us to look at the admissible 3-tuple
and conclude that there are infinitely many prime gaps of size exactly 6? I know that Maynard's sieve treats all admissible
-tuples the same, but I wasn’t sure if the same is true of the trick of forcing the endpoints not simultaneously prime.
This would seem to overcome the parity problem for the pair
, so would seem to preclude the ability to pass from
to
.
Question 2: We easily got
, using a linear polynomial on a trivial decomposition, so there is a lot of room for improvement. What sort of bound would we need to push this to in order to play the game of forcing
not simultaneously prime?
17 January, 2014 at 7:51 am
Terence Tao
Oh, very nice observation! I agree that your Q1 argument shows that the parity problem blocks us from reaching
by our methods.
For Q2, the heuristic I’ve been using is that the probability
that a given member
is prime should be
, and that the probability
that two of the
are prime is bounded above by
(the 2 being there for parity reasons). So if one starts with
and deletes the three pairs which are separated by distance 16 or 24, we would get a prime pair separated by 8 if
which is in fact not attainable. The corresponding calculation for k=3 gives the condition
which requires
, which is also blocked by parity.
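To record the bookkeeping behind this heuristic (a hedged reconstruction, since the displayed inequalities above have not survived: the values $p_i \approx M_k/k$ for the sieve-weighted probability that $n+h_i$ is prime, and the pair bound $q_{ij} \le 2 p_i p_j$, are the assumptions being inserted): if
$\displaystyle \sum_{i=1}^k p_i - 1 - \sum_{(i,j)\ \text{excluded}} q_{ij} > 0,$
then the sieve-weighted expectation of (number of primes among the $n+h_i$) $-\ 1\ -$ (number of excluded prime pairs) is positive, so some $n$ contributes positively; for such an $n$ there are at least two primes among the $n+h_i$ and at least one pair of them is not excluded, since a configuration with $m$ primes all of whose pairs are excluded contributes at most $m - 1 - \binom{m}{2} \le 0$. This is the same positivity trick as in the "not simultaneously prime" tweak discussed above, just with the heuristic values of $p_i$ and $q_{ij}$ inserted.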
17 January, 2014 at 8:28 am
Pace Nielsen
I like that heuristic! If I’m computing correctly, for the tuple
, to get
we would need something like
. That seems quite doable.
17 January, 2014 at 1:16 pm
Aubrey de Grey
I’m not all that optimistic that it is. Presumably the best comparison we have to work from is the d=1 values for the basic simplex R_4 with and without epsilon, which are 1.820 and 2.0023 differing by 0.28, so adding epsilon into the high-degree R” mix we may be lucky to get beyond 1.951 (James’s best number for the prism without epsilon), plus 0.28 (see above), plus 0.02 (the benefit of the symmetric half-cube over the prism – though for k=4 it’s also very possible that the prism will beat the symmetric polytope because in contrast to k=3 the prism is a factor of 9/7 larger), plus an unknown but probably modest quantity because James’s 1.951 was for the relatively small d=5.
17 January, 2014 at 11:01 am
Terence Tao
Here is a more formal Selberg-style argument that we can’t get
from a sieve-theoretic analysis of the tuple (n,n+2,n+6) (the same argument would work for other tuples as well). It relies on an unproven hypothesis (the Mobius pseudorandomness conjecture), but this conjecture is widely believed, and indeed one would not even be trying to use sieve theoretic methods to produce twin primes etc. if one did not already believe this conjecture!
Let us informally call a “divisor sum” some function of the form
where
are some coefficients and
are restricted to not be too large (how large depends on what polytope we are using and what distribution hypothesis we would like to assume; I ignore these issues here). We normalise these sums so that
is comparable to 1 (this is not our usual normalisation, being more of a “probability theoretic” normalisation, but never mind this). Our basic sieve-theoretic strategy is to obtain upper and lower bounds for quantities of the form
and hope to parlay this into a proof that at least one of the pairs (n,n+2), (n+2,n+6) is simultaneously prime, giving a proof of
. (We are of course also interested in auxiliary quantities such as
, but we can’t control these directly; the only way we can proceed within sieve theory is to upper bound this quantity by something like
for some other divisor sum
, so that it becomes of the form in (*). [Basically, even GEH only allows one to asymptotically control sums with at most one copy of the von Mangoldt function in them; asymptotics for sums such as
, which basically count twin primes, are out of reach even on GEH, with the only option being to get upper or lower bounds by some combination of sums (*) for which asymptotics are available, together with the non-negativity properties of sieves.])
Now, the Mobius pseudorandomness conjectures predict that the sieve
is asymptotically orthogonal to the Mobius functions
, thus for instance
. (This is because we expect the Mobius function to exhibit significant cancellation on even very short arithmetic progressions.) The same pseudorandomness heuristic also predicts
is asymptotically orthogonal to products
.
What about
? This function is no longer orthogonal to
, since of course
is
when n is prime, but from Mobius pseudorandomness we still expect this function to be orthogonal to higher products
, because the constraint that n is prime can kill at most one of the two or more Mobius factors that are present.
In particular, if one introduces the weight function
(ignoring the case when n+2 is not squarefree, which can be sieved out by the W-trick (or else we replace Mobius with the Liouville function)), we see that a and 1 are indistinguishable for the purposes of correlations (*), thus for instance
Also, observe that the weight sequence
is non-negative; it equals 4 when
has the opposite Mobius function to both
and
, and is zero otherwise. Because of this, any sieve-theoretic manipulation we perform on the sums (*) to produce lots of tuples
with a desired property as weighted by the constant function
, will also produce lots of tuples with the same property as weighted by the function
. For instance, if we can get a lower bound for
by some clever combination of asymptotics for (*) (and exploiting non-negativity), we should also be able to get the same lower bound for
. In particular, we would find an n in the support of a(n) for which either n,n+2 are both prime, or n+2,n+6 are both prime. But this contradicts the support of a, which forces n+2 to have a Mobius function of opposing sign to both n and n+6. So we cannot use sieve theory to force the event “n,n+2 both prime or n+2,n+6 both prime”, which indicates that we cannot establish
through a sieve-theoretic analysis of the tuple n,n+2,n+6.
EDIT: The intuition here is that sieve theory cannot prevent the two “conspiracies”
and
from simultaneously occurring for (almost) all n for which n, n+2, n+6 are almost prime. (Of course, formally two applications of shifted versions of
would imply
, contradicting what I just said, but the space of n for which n,n+2,n+6 are almost prime is not shift-invariant, and one cannot force incompatibility this way. Alternatively, one can consider conspiracies in which
for “almost all sufficiently large n with
", which I believe cannot be ruled out by any easy "local" argument, and would mean that prime gaps less than 6 occur extremely infrequently.)
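To spell out the non-negativity claim for the weight above concretely (assuming the product form $a(n) = (1 - \mu(n)\mu(n+2))(1 - \mu(n+2)\mu(n+6))$ at squarefree values, which matches the description given but is written out here only as an illustrative reconstruction), a quick enumeration of sign patterns confirms the values claimed:

from itertools import product

# m0, m1, m2 stand for mu(n), mu(n+2), mu(n+6) at squarefree values, so each is +1 or -1.
for m0, m1, m2 in product([-1, 1], repeat=3):
    a = (1 - m0 * m1) * (1 - m1 * m2)
    assert a in (0, 4)
    assert (a == 4) == (m1 == -m0 and m1 == -m2)
print("a takes only the values 0 and 4, and equals 4 exactly when mu(n+2) opposes both mu(n) and mu(n+6)")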
8 February, 2014 at 5:19 pm
Eytan Paldi
In the numerator of (1), the outer summation index should be
.
[Corrected, thanks – T.]