Polymath8b, III: Numerical optimisation of the variational problem, and a search for new sieves

8 December, 2013 in math.NA, math.NT, polymath | Tags: polymath8 | by Terence Tao

This is the third thread for the Polymath8b project to obtain new bounds for the quantity

$\displaystyle H_m := \liminf_{n \rightarrow\infty} (p_{n+m} - p_n),$

either for small values of ${m}$ (in particular ${m=1,2}$ ) or asymptotically as ${m \rightarrow \infty}$ . The previous thread may be found here. The currently best known bounds on ${H_m}$ are:

(Maynard) Assuming the Elliott-Halberstam conjecture, ${H_1 \leq 12}$ .
(Polymath8b, tentative) ${H_1 \leq 330}$ . Assuming Elliott-Halberstam, ${H_2 \leq 330}$ .
(Polymath8b, tentative) ${H_2 \leq 484{,}126}$ . Assuming Elliott-Halberstam, ${H_4 \leq 493{,}408}$ .
(Polymath8b) ${H_m \leq \exp( 3.817 m )}$ for sufficiently large ${m}$ . Assuming Elliott-Halberstam, ${H_m \ll e^{2m} m \log m}$ for sufficiently large ${m}$ .

Much of the current focus of the Polymath8b project is on the quantity

$\displaystyle M_k = M_k({\cal R}_k) := \sup_F \frac{\sum_{m=1}^k J_k^{(m)}(F)}{I_k(F)}$

where ${F}$ ranges over square-integrable functions on the simplex

$\displaystyle {\cal R}_k := \{ (t_1,\ldots,t_k) \in [0,+\infty)^k: t_1+\ldots+t_k \leq 1 \}$

with ${I_k, J_k^{(m)}}$ being the quadratic forms

$\displaystyle I_k(F) := \int_{{\cal R}_k} F(t_1,\ldots,t_k)^2\ dt_1 \ldots dt_k$

and

$\displaystyle J_k^{(m)}(F) := \int_{{\cal R}_{k-1}} (\int_0^{1-\sum_{i \neq m} t_i} F(t_1,\ldots,t_k)\ dt_m)^2$

$\displaystyle dt_1 \ldots dt_{m-1} dt_{m+1} \ldots dt_k.$

It was shown by Maynard that one has ${H_m \leq H(k)}$ whenever ${M_k > 4m}$ , where ${H(k)}$ is the narrowest diameter of an admissible ${k}$ -tuple. As discussed in the previous post, we have slight improvements to this implication, but they are currently difficult to implement, due to the need to perform high-dimensional integration. The quantity ${M_k}$ does seem however to be close to the theoretical limit of what the Selberg sieve method can achieve for implications of this type (at the Bombieri-Vinogradov level of distribution, at least); it seems of interest to explore more general sieves, although we have not yet made much progress in this direction.

The best asymptotic bounds for ${M_k}$ we have are

$\displaystyle \log k - \log\log\log k + O(1) \leq M_k \leq \frac{k}{k-1} \log k \ \ \ \ \ (1)$

which we prove below the fold. The upper bound holds for all ${k > 1}$ ; the lower bound is only valid for sufficiently large ${k}$ , and gives the upper bound ${H_m \ll e^{2m} m \log m}$ on Elliott-Halberstam.

For small ${k}$ , the upper bound is quite competitive, for instance it provides the upper bound in the best values

$\displaystyle 1.845 \leq M_4 \leq 1.848$

and

$\displaystyle 2.001162 \leq M_5 \leq 2.011797$

we have for ${M_4}$ and ${M_5}$ . The situation is a little less clear for medium values of ${k}$ , for instance we have

$\displaystyle 3.95608 \leq M_{59} \leq 4.148$

and so it is not yet clear whether ${M_{59} > 4}$ (which would imply ${H_1 \leq 300}$ ). See this wiki page for some further upper and lower bounds on ${M_k}$ .
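The upper bounds quoted above are just the right-hand side of (1) evaluated at the relevant ${k}$ . A minimal sketch for reproducing them (the function name `maynard_upper_bound` is a hypothetical label chosen for this illustration):

```python
import math

def maynard_upper_bound(k: int) -> float:
    """Right-hand side of (1): k/(k-1) * log k, valid for k > 1."""
    if k <= 1:
        raise ValueError("the bound requires k > 1")
    return k / (k - 1) * math.log(k)

# Print the upper bound for the values of k discussed in the text.
for k in (4, 5, 59):
    print(k, maynard_upper_bound(k))
```

This only evaluates the closed-form upper bound; the lower bounds in the text come from the quadratic programming computations and are not reproduced here.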

The best lower bounds are not obtained through the asymptotic analysis, but rather through quadratic programming (extending the original method of Maynard). This has given significant numerical improvements to our best bounds (in particular lowering the ${H_1}$ bound from ${600}$ to ${330}$ ), but we have not yet been able to combine this method with the other potential improvements (enlarging the simplex, using MPZ distributional estimates, and exploiting upper bounds on two-point correlations) due to the computational difficulty involved.

— 1. Upper bound —

We now prove the upper bound in (1). The key estimate is

$\displaystyle (\int_0^{1-t_2-\ldots-t_k} F(t_1,\ldots,t_k)\ dt_1)^2 \ \ \ \ \ (2)$

$\displaystyle \leq \frac{\log k}{k-1} \int_0^{1-t_2-\ldots-t_k} F(t_1,\ldots,t_k)^2 (1 - t_1-\ldots-t_k+ kt_1)\ dt_1.$

Assuming this estimate, we may integrate in ${t_2,\ldots,t_k}$ to conclude that

$\displaystyle J_k^{(1)}(F) \leq \frac{\log k}{k-1} \int F^2 (1-t_1-\ldots-t_k+kt_1)\ dt_1 \ldots dt_k$

which symmetrises to

$\displaystyle \sum_{m=1}^k J_k^{(m)}(F) \leq k \frac{\log k}{k-1} \int F^2\ dt_1 \ldots dt_k$

giving the upper bound in (1).

It remains to prove (2). By Cauchy-Schwarz, it suffices to show that

$\displaystyle \int_0^{1-t_2-\ldots-t_k} \frac{dt_1}{1 - t_1-\ldots-t_k+ kt_1} \leq \frac{\log k}{k-1}.$

But writing ${s = t_2+\ldots+t_k}$ , the left-hand side evaluates to

$\displaystyle \frac{1}{k-1} (\log k(1-s) - \log (1-s) ) = \frac{\log k}{k-1}$

as required.
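For readers who want the Cauchy-Schwarz step written out, here is one way to see it. The shorthand $w(t_1) := 1 - t_1 - \ldots - t_k + k t_1 = 1 - s + (k-1) t_1$ is introduced only for this sketch; it is positive on the range of integration.

```latex
\left( \int_0^{1-s} F \, dt_1 \right)^2
  = \left( \int_0^{1-s} \bigl( F w^{1/2} \bigr) \, w^{-1/2} \, dt_1 \right)^2
  \leq \left( \int_0^{1-s} F^2 \, w \, dt_1 \right)
       \left( \int_0^{1-s} \frac{dt_1}{w} \right),
\qquad
\int_0^{1-s} \frac{dt_1}{1 - s + (k-1) t_1}
  = \frac{\log\bigl(k(1-s)\bigr) - \log(1-s)}{k-1}
  = \frac{\log k}{k-1}.
```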

This also suggests that extremal ${F(t_1,\ldots,t_k)}$ behave like ${\tilde F(t_2,\ldots,t_k) / (1-t_1-\ldots-t_k + kt_1)}$ for some function ${\tilde F}$ , and similarly for permutations. However, it is not possible to have exact equality in all variables simultaneously, which indicates that the upper bound (1) is not optimal, although in practice it does remarkably well for small ${k}$ .

— 2. Lower bound —

Using the notation of this previous post, we have the lower bound

$\displaystyle M_k \geq \frac{m_1^2}{m_2} {\bf P}( X_1 + \ldots + X_{k-1} \leq k - T )$

whenever ${g}$ is supported on ${[0,T]}$ , ${m_i := \int_0^T g(t)^i\ dt}$ , and ${X_1,\ldots,X_k}$ are independent random variables on ${[0,T]}$ with density ${\frac{1}{m_2} g(t)^2\ dt}$ . We select the function

$\displaystyle g(t) := \frac{1}{1 + A t} 1_{[0,T]}(t)$

with ${A := \log k}$ , and ${T := \varepsilon \frac{k}{A}}$ for some ${0 < \varepsilon < 1}$ to be chosen later. We have

$\displaystyle m_1 = \frac{1}{A} \log(1+AT) = \frac{1}{A} \log( 1 + \varepsilon k)$

and

$\displaystyle m_2 = \int_0^T \frac{1}{(1+At)^2}\ dt$

$\displaystyle = \frac{1}{A} - \frac{1}{A (1+AT)}$

$\displaystyle \leq \frac{1}{A} = \frac{1}{\log k}$

and so

$\displaystyle M_k \geq \frac{\log^2(1+\varepsilon k)}{\log k} {\bf P}( X_1 + \ldots + X_{k-1} \leq k - T ).$

Observe that the random variables ${X_i}$ have mean

$\displaystyle \mu = \frac{1}{m_2} \frac{1}{A^2} (\log(1+AT)-1+\frac{1}{1+AT})$

$\displaystyle = \frac{1}{A} (1 + \frac{1}{\varepsilon k}) (\log(1+\varepsilon k)-1+\frac{1}{1+\varepsilon k})$

$\displaystyle \leq 1 - \frac{1}{\log k} + O( \frac{\log k}{\varepsilon k} ).$

The variance ${ \sigma^2}$ may be bounded crudely by

$\displaystyle \sigma^2 \leq \frac{1}{m_2} \int_0^T \frac{t^2}{(1+At)^2}\ dt$

$\displaystyle = O( A \frac{T}{A^2} ) = O( \frac{\varepsilon k}{\log^2 k} ).$

Thus the random variable ${X_1 + \ldots + X_{k-1}}$ has mean at most ${k - \frac{k}{\log k} + O( \frac{\log k}{\varepsilon} )}$ and variance ${O( \frac{\varepsilon k^2}{\log^2 k} )}$ , with each variable bounded in magnitude by ${T = \frac{\varepsilon k}{\log k}}$ . By Hoeffding’s inequality, this implies that ${X_1 + \ldots + X_{k-1}}$ is at least ${ k - T}$ with probability at most ${O( \exp(- c / \varepsilon^2 ))}$ for some absolute constant ${c > 0}$ . If we set ${ \varepsilon := C / (\log \log k)^{1/2}}$ for a sufficiently small absolute constant ${C > 0}$ , we thus have

$\displaystyle {\bf P}( X_1 + \ldots + X_{k-1} \leq k - T ) = 1 - O( 1 / \log k)$

and thus

$\displaystyle M_k \geq \log k - \log\log\log k + O(1)$

giving the lower bound in (1).
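The last step can be unpacked as follows (a sketch, valid once ${k}$ is large enough that ${\varepsilon k \geq 1}$ ). Since ${\log(1+\varepsilon k) \geq \log k + \log \varepsilon \geq 0}$ , the main factor satisfies

```latex
\frac{\log^2(1+\varepsilon k)}{\log k}
  \geq \frac{(\log k + \log \varepsilon)^2}{\log k}
  \geq \log k + 2 \log \varepsilon
  = \log k - \log\log\log k + 2 \log C,
\qquad
\varepsilon = \frac{C}{(\log\log k)^{1/2}}.
```

Multiplying by the probability factor ${1 - O(1/\log k)}$ changes a quantity of size ${O(\log k)}$ by only ${O(1)}$ , which is absorbed into the error term.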

Hoeffding’s bound is proven by the exponential moment method, which more generally gives the bound

$\displaystyle M_k \geq \frac{m_1^2}{m_2} (1 - e^{-s(k-T)} (\frac{1}{m_2} \int_0^T g(t)^2 e^{st}\ dt)^{k-1} )$

for any ${s > 0}$ . However, this bound is inferior to the linear algebra method for small ${k}$ ; for instance, we can only obtain ${DHL[k,2]}$ for ${k=339}$ by this method (compared to ${k=64}$ from quadratic programming), even if one uses the best available MPZ estimate.
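The exponential moment bound is easy to evaluate numerically for the specific choice ${g(t) = \frac{1}{1+At} 1_{[0,T]}(t)}$ . The following is a rough sketch, not the code used in the project: the function names, the use of Simpson's rule, and the sample parameters are all assumptions made for illustration, and no attempt is made to optimise in ${A, T, s}$ .

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def exp_moment_lower_bound(k, A, T, s):
    """(m_1^2/m_2) * (1 - e^{-s(k-T)} * mgf^{k-1}) for g(t) = 1/(1+At) on [0,T]."""
    m1 = math.log(1 + A * T) / A          # integral of g over [0, T]
    m2 = 1 / A - 1 / (A * (1 + A * T))    # integral of g^2 over [0, T]
    # Moment generating function of X_i, whose density is g^2 / m_2.
    mgf = simpson(lambda t: math.exp(s * t) / (1 + A * t) ** 2, 0.0, T) / m2
    # Work with logarithms to avoid overflow in the (k-1)-th power.
    log_tail = -s * (k - T) + (k - 1) * math.log(mgf)
    if log_tail >= 0:
        return 0.0                        # the bound is trivial for this s
    return m1 ** 2 / m2 * (1 - math.exp(log_tail))

# Illustrative (unoptimised) parameters: A = log k, T = eps * k / A with eps = 0.1.
k = 1000
A = math.log(k)
T = 0.1 * k / A
print(exp_moment_lower_bound(k, A, T, s=0.1))
```

In practice one would search over ${A, T, s}$ for each ${k}$ ; the point of the sketch is only that each evaluation needs a single one-dimensional integral.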

Using ${MPZ[\varpi,\delta]}$ , one can modify this lower bound, obtaining ${DHL[k_0,m+1]}$ whenever we can find ${\varpi, \delta, A, T, s, s'}$ obeying

$\displaystyle \frac{m_1^2}{m_2} ( 1 - \kappa_1 - \kappa_2 ) > \frac{m}{1/4+\varpi}$

where

$\displaystyle \kappa_1 := e^{-s(k-T)} (\frac{1}{m_2}\int_0^T e^{st} g(t)^2\ dt)^{k-1}$

and

$\displaystyle \kappa_2 := e^{s' \theta k} (\frac{1}{m_2}\int_0^{\tilde \delta k} e^{-s't} g(t)^2\ dt)^{k-1}$

with

$\displaystyle \tilde \delta' := \frac{T}{k}$

$\displaystyle \delta' = \tilde \delta' (\frac{1}{4}+\varpi)$

$\displaystyle \tilde \delta := \frac{\delta}{1/4+\varpi}.$

$\displaystyle \theta := \frac{i(\delta'-\delta)/2 + \varpi}{1/4+\varpi}$

$\displaystyle = i (\tilde \delta' - \tilde \delta)/2 + \frac{\varpi}{1/4+\varpi}.$

Details can be found at this comment.

— 3. Quadratic programming —

The expressions ${\sum_{m=1}^k J_k^{(m)}(F)}$ , ${I_k(F)}$ are quadratic forms in ${F}$ , so one can (in principle, at least) obtain lower bounds for ${M_k}$ by restricting to a finite-dimensional space of ${F}$ and performing quadratic programming on the resulting finite-dimensional quadratic forms.

One can take ${F}$ to be symmetric without loss of generality. One can then look at functions ${F}$ that are linear combinations of symmetric polynomials

$\displaystyle m(\alpha) = \sum_{\sigma \in S_k} x_{\sigma(1)}^{\alpha_1} \ldots x_{\sigma(k)}^{\alpha_k}$

for various signatures ${\alpha_1,\alpha_2,\ldots}$ ; actually it turns out numerically that one can get more efficient results by working with combinations of ${m(\alpha)}$ for ${\alpha}$ up to a fixed degree, and ${(1-P_1)^a m(\alpha)}$ for ${\alpha}$ at that degree afterwards, where ${P_1(x_1,\ldots,x_k) = x_1+\ldots+x_k}$ is the first symmetric polynomial. (Note that the GPY type sieves come from the case when ${F}$ is purely a function of ${P_1}$ .)

It may be that other choices of bases here are more efficient still (perhaps choosing bases that reflect the belief that ${F}$ should behave something like ${\prod_{i=1}^k \frac{1}{1+A k^{1/2} t_i}}$ for some smallish ${A}$ ), but one numerical obstacle is that we are only able to accurately compute the required coefficients for ${J_k^{(m)}(F), I_k(F)}$ in the case of polynomials.

154 comments

8 December, 2013 at 5:08 pm

Andrew Sutherland

A couple of minor typos in the bulleted list of bounds: I think $H_2 \le 330$ should be $H_1\le 330$ and $H_4\le 493,408$ should be $H_2\le 493,408$ .

8 December, 2013 at 5:19 pm

Terence Tao

Actually, the Elliott-Halberstam conjecture essentially allows us to double the value of $m$ for any bound previously obtained using only the Bombieri-Vinogradov theorem (such as the bound $H_2 \leq 493,408$ ).

8 December, 2013 at 5:11 pm

Terence Tao

This doesn’t directly help the numerical optimisation, but here is an alternate way to compute the integrals of polynomials on the simplex ${\cal R}_k$ , giving a slightly different way to establish Lemma 7.1 from Maynard’s paper. The starting point is the identity

$\displaystyle \int_{t_1,\ldots,t_k \geq 0} e^{-t_1-\ldots-t_k} F(t_1,\ldots,t_k)\ dt_1 \ldots dt_k$

$\displaystyle = \int_0^\infty e^{-t} \int_{t {\cal R}_k} F(t_1,\ldots,t_k)\ dt_1 \ldots dt_k\ dt$

which is a simple application of Fubini’s theorem. If F is homogeneous of degree d, then

$\displaystyle \int_{t {\cal R}_k} F(t_1,\ldots,t_k)\ dt_1 \ldots dt_k = t^{k+d} \int_{{\cal R}_k} F(t_1,\ldots,t_k)\ dt_1 \ldots dt_k;$

from the Gamma function identity $\int_0^\infty e^{-t} t^{k+d}\ dt = (k+d)!$ , we thus see that

$\displaystyle \int_{t_1,\ldots,t_k \geq 0} e^{-t_1-\ldots-t_k} F(t_1,\ldots,t_k)\ dt_1 \ldots dt_k$

$\displaystyle = (d+k)! \int_{{\cal R}_k} F(t_1,\ldots,t_k)\ dt_1 \ldots dt_k.$

Now, from the gamma function identity again we have

$\displaystyle \int_{t_1,\ldots,t_k \geq 0} e^{-t_1-\ldots-t_k} t_1^{a_1} \ldots t_k^{a_k}\ dt_1 \ldots dt_k = a_1! \ldots a_k!$

and thus

$\displaystyle \int_{{\cal R}_k} t_1^{a_1} \ldots t_k^{a_k}\ dt_1 \ldots dt_k = \frac{a_1! \ldots a_k!}{(k+a_1+\ldots+a_k)!}.$

Replacing k with $k+1$ , and then performing the $t_{k+1}$ integral, one also has the generalisation

$\displaystyle \int_{{\cal R}_k} t_1^{a_1} \ldots t_k^{a_k} (1-t_1-\ldots-t_k)^{a_{k+1}}\ dt_1 \ldots dt_k = \frac{a_1! \ldots a_{k+1}!}{(k+a_1+\ldots+a_{k+1})!}$

which is equation (7.4) in Maynard’s paper.

This calculation suggests that Laplace transforms in the $t_i$ variables may possibly be useful.
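The closed form for monomial integrals on the simplex can be evaluated exactly with rational arithmetic. Here is a small sketch; the function name and its `slack_exponent` parameter (playing the role of $a_{k+1}$ ) are hypothetical names chosen for this illustration.

```python
from fractions import Fraction
from math import factorial

def simplex_monomial_integral(exponents, slack_exponent=0):
    """Integral over R_k of t_1^{a_1} ... t_k^{a_k} (1 - t_1 - ... - t_k)^{a_{k+1}}.

    `exponents` is the list [a_1, ..., a_k]; `slack_exponent` is a_{k+1}.
    """
    k = len(exponents)
    numerator = factorial(slack_exponent)
    for a in exponents:
        numerator *= factorial(a)
    denominator = factorial(k + sum(exponents) + slack_exponent)
    return Fraction(numerator, denominator)

# Volume of R_3, and the integral of t_1 over R_2.
print(simplex_monomial_integral([0, 0, 0]))
print(simplex_monomial_integral([1, 0]))
```

Since any polynomial is a linear combination of such monomials, this is enough to compute ${I_k(F)}$ exactly for polynomial ${F}$ , at least in principle.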

8 December, 2013 at 5:51 pm

Terence Tao

I’ve been experimenting with the problem of optimising a sieve, to get some sense as to how sharp the Selberg sieve is. A model problem is the following: suppose one has a non-negative sequence $a_n$ for which one has the exact bounds

$\sum_{n: d|n} a_n = g(d) X$

for all $d \leq R$ and some multiplicative function $g(d)$ taking values in [0,1] and some quantity $X$ (in practice there are error terms also, but I will ignore them here). Let $P$ be the product of some finite number of primes. What is the best upper bound $\alpha X$ one can then place on the quantity

$\sum_{n: (n,P) = 1} a_n$ ?

(This isn’t quite the sieve problem that is of interest when trying to go beyond the Selberg sieve to prove results of the form DHL[k,m+1], but it is a much more well-studied problem and is perhaps a place to start first.)

By the Hahn-Banach theorem, $\alpha$ is also the minimal value of the quotient

$\sum_{d \leq R} \lambda_d g(d) / \lambda_1$

subject to the constraint that $\sum_{d|n; d \leq R} \lambda_d \geq 0$ for all $n$ (and that $\lambda_1$ is non-zero, of course).

Another equivalent definition of $\alpha$ is that it is the maximal value of ${\bf P} ( \overline{\bigvee_{p|P} E_p} )$ , if $E_p$ are events in a probability space such that

${\bf P} ( \bigwedge_{p|d} E_p ) = g(d)$

for all $d \leq R$ . For instance, by taking the $E_p$ to be independent events of probability $g(p)$ , we obtain the lower bound

$\alpha \geq \prod_{p|P} (1-g(p))$ . (1)

The Selberg sieve, in contrast, gives the upper bound

$\alpha \leq (\sum_{d|P: d \leq \sqrt{R}} h(d))^{-1}$

where $h(d) = \prod_{p|d} \frac{g(p)}{1-g(p)}$ . For comparison, (1) can be rewritten as

$\alpha \geq (\sum_{d|P} h(d))^{-1}$ .

But the Selberg sieve is not always optimal; for instance, when $P$ is a product of many small primes then often the beta sieve is superior (and in the model one-dimensional case when $g(p) = \frac{1+o(1)}{p}$ , the beta sieve in fact gives asymptotically optimal results).

I’ve started playing around with a model case when the primes dividing P are very close to each other, and $g(p)$ is basically constant, $g(p) \approx k/p$ for some fixed $k$ , where the question simplifies to a low-dimensional linear programming problem, with the intent of understanding the precise strength of the Selberg sieve in this particular setting. The situation is then as follows: if we have events $E_1,\ldots,E_N$ for some large $N$ with the bounds

${\bf P}( E_{i_1} \wedge \ldots \wedge E_{i_d} ) = \frac{k^d}{N^d}$

for all $d \leq d_0$ and $1 \leq i_1 < \ldots < i_d \leq N$ , what is the best upper bound $\alpha( k, d_0 )$ on

${\bf P}( \overline{E_1 \vee \ldots \vee E_N} )$

in the asymptotic limit $N \to \infty$ ? There are some interesting combinatorics emerging here and I might detail them in a future post.

8 December, 2013 at 9:44 pm

Terence Tao

Some preliminary computations for the $\alpha(k,d_0)$ problem mentioned previously, where $k \geq 0$ is real and $d_0 \geq 1$ is a natural number:

The lower bound (1) becomes $\alpha(k,d_0) \geq \exp(-k)$ . The Selberg sieve upper bound is

$\alpha(k,d_0) \leq ( \sum_{j \leq d_0/2} \frac{k^j}{j!} )^{-1}$ .

The Bonferroni inequalities give

$\alpha(k,d_0) \leq \sum_{j \leq d} (-1)^j \frac{k^j}{j!}$

whenever $d \leq d_0$ is even. When $k \leq 1$ and $d$ is the largest even number less than or equal to $d_0$ , I can show that this is in fact an equality, by explicitly constructing events $E_i$ .

One can also define $\alpha(k,d_0)$ to be the infimal value of $\sum_{d \leq d_0} \frac{\lambda_d k^d}{d!} / \lambda_1$ subject to the constraints $\sum_{d \leq n} \binom{n}{d} \lambda_d \geq 0$ for all $n$ ; it is also the supremal value of $p_0$ whenever $p_0,p_1,\ldots$ are non-negative numbers obeying the constraints $\sum_{n \geq d} \frac{1}{(n-d)!} p_n = k^d$ for all $d \leq d_0$ . Unlike the general sieve problem, there is so much symmetry here that the linear programming problem can actually be solved exactly for small values of $d_0$ .

The first non-trivial value of $d_0$ is $d_0=2$ : here we have ${\bf P}(E_i) = k/N$ and ${\bf P}(E_i \wedge E_j ) = k^2 / N^2$ , and want an asymptotic upper bound for ${\bf P}( \overline{E_1 \vee \ldots \vee E_N} )$ . It turns out that the exact value can be computed here as

$\alpha(k,2) = 1 - \frac{2}{n+1} k + \frac{1}{n(n+1)} k^2$

whenever $n \geq 1$ is a natural number and $n-1 \leq k \leq n$ . Thus for instance the Bonferroni bound

$\alpha(k,2) = 1 - k + \frac{1}{2} k^2$

is sharp for $0 \leq k \leq 1$ , then we have

$\alpha(k,2) = 1 - \frac{2}{3} k + \frac{1}{6} k^2$

for $1 \leq k \leq 2$ , and so forth. The Selberg sieve bound is

$\alpha(k,2) \leq \frac{1}{k+1}$

which happens to be sharp exactly when $k$ is a natural number – so in this case the Selberg sieve is fairly efficient. This latter bound can also be seen by observing that the counting random variable $X := \sum_i 1_{E_i}$ has mean $k$ and second moment ${\bf E} X^2 = k + k^2 + o(1)$ , so that it has to be non-zero on an event of probability at least $1-\frac{1}{k+1}+o(1)$ by the Cauchy-Schwarz inequality.
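The comparison between the exact value $\alpha(k,2)$ , the Selberg bound $\frac{1}{k+1}$ and the independent-events lower bound $\exp(-k)$ can be tabulated directly from the formulas above. A minimal sketch (function names are hypothetical, and the piecewise formula is taken as stated in this comment rather than re-derived):

```python
import math

def alpha_2(k: float) -> float:
    """Exact alpha(k, 2) as stated above: n is the natural number with n-1 <= k <= n."""
    if k < 0:
        raise ValueError("k must be non-negative")
    n = max(1, math.ceil(k))
    return 1 - 2 * k / (n + 1) + k * k / (n * (n + 1))

def selberg_2(k: float) -> float:
    """Selberg sieve upper bound for d_0 = 2."""
    return 1 / (k + 1)

for k in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
    print(k, math.exp(-k), alpha_2(k), selberg_2(k))
```

The gap between the two is $\frac{(n-k)(k-n+1)}{n(n+1)(k+1)}$ on $n-1 \leq k \leq n$ , which vanishes exactly at the endpoints, consistent with the Selberg bound being sharp at natural numbers.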

The $d_0=3$ situation appears to be almost identical with the $d_0=2$ situation, though I haven’t nailed this down completely. The $d_0=4$ case may be more interesting; I suspect here that the Selberg sieve bound $\alpha(k,4) \leq \frac{1}{1+k+k^2/2}$ is now inefficient, although I don’t know for sure yet. This is quite an oversimplified problem compared to our true sieve problem, but may help shed some light on that problem.

9 December, 2013 at 2:18 pm

Terence Tao

I confirmed that the $d_0=3$ numerology is identical to the $d_0=2$ numerology. For the $d_0=4$ case, the Selberg sieve bound $\alpha(k,4) \leq (1+k+\frac{k^2}{2})^{-1}$ can be derived as follows. For a random variable of mean zero and variance 1, one has the bound ${\bf E} X^4 - ({\bf E} X^3)^2 - 1 \geq 0$ relating the third and fourth moments, which arises from the Selberg-sieve type manipulation of starting with the trivial inequality ${\bf E} (X^2+aX+b)^2 \geq 0$ and then optimising in a and b. In our situation, the counting function $X := \sum_i 1_{E_i}$ has falling moments ${\bf E} X (X-1) \ldots (X-i+1) = k^i$ for $i=0,\ldots,4$ , or equivalently ${\bf E} X^i = \sum_{j=0}^i S(i,j) k^j$ where the $S(i,j)$ are Stirling numbers of the second kind. Conditioning onto the event where $X$ is non-zero, normalising to have mean zero and unit variance, and using the previous inequality, one obtains (as I verified in Maple) the bound $\alpha(k,4) \leq (1+k+\frac{k^2}{2})^{-1}$ .

The inequality ${\bf E} X^4 - ({\bf E} X^3)^2 - 1 \geq 0$ is sharp if and only if X takes at most two values; thus the Selberg sieve is sharp exactly when X takes at most two non-zero values. But I believe I can show by hand that this cannot actually occur for the situation at hand, and so the Selberg bound is never sharp in this case. However, I don’t yet know of an easy way to improve the bound (although in the $d_0=4$ case, brute force computation should eventually give the optimal bounds).

9 December, 2013 at 5:22 pm

Terence Tao

Actually, it turns out that in some cases (specifically, when $k=j(j+1)$ for some natural number $j$ ) it is possible for $X$ to take precisely two non-zero values, in which case the Selberg sieve bound $\alpha(k,4) \leq (1+k+\frac{k^2}{2})^{-1}$ is indeed sharp. (For instance, when $k=2$ , one can concoct a scenario in which $X$ takes the value 0 with probability 1/5, 2 with probability 2/3, and 5 with probability 2/15, and the Selberg sieve bound is sharp in this case.)
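The $k=2$ scenario can be checked by hand or with exact arithmetic: the falling moments ${\bf E} X(X-1)\ldots(X-i+1)$ (a product of $i$ factors) should equal $k^i$ for $i = 0,\ldots,4$ , and the probability that $X=0$ should match the Selberg bound. A short verification sketch (the helper name `falling_moment` is hypothetical):

```python
from fractions import Fraction

def falling_moment(dist, i):
    """E[X (X-1) ... (X-i+1)] for a finite distribution {value: probability}."""
    total = Fraction(0)
    for value, prob in dist.items():
        term = Fraction(1)
        for j in range(i):
            term *= (value - j)
        total += prob * term
    return total

k = 2
# The distribution described above: X = 0, 2, 5 with probabilities 1/5, 2/3, 2/15.
dist = {0: Fraction(1, 5), 2: Fraction(2, 3), 5: Fraction(2, 15)}

print([falling_moment(dist, i) for i in range(5)])
# Selberg bound (1 + k + k^2/2)^{-1}, written with integer numerator and denominator.
print(dist[0], Fraction(2, 2 + 2 * k + k * k))
```

This only checks the moment constraints on the counting variable $X$ ; it does not by itself construct the underlying events $E_i$ .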

Similar scenarios are possible for larger $d_0$ ; in principle, the Selberg sieve bound is sharp if $X$ takes precisely $d_0/2$ non-zero values, although it becomes difficult to ensure that these values are all natural numbers. But the Selberg sieve nevertheless performs substantially better than I had expected on this model problem, which is perhaps a hint that one cannot hope to do much better than that sieve for the bounded gaps problem.

12 December, 2013 at 9:52 am

Terence Tao

I found a way to convert the general sieve problem into a Selberg-type problem, by taking advantage of the positivstellensatz of Putinar. Among other things, this theorem asserts that a polynomial of n variables $Q(x) = Q(x_1,\ldots,x_n)$ is non-negative on the cube $[0,1]^n$ if and only if it is the sum of polynomials of the form $x_i R(x)^2$ and $(1-x_i) R(x)^2$ for various polynomials $R$ (but with the caveat that R can have degree larger than that of Q, due to cancellation).

Anyway, the sieve problem is to minimise $\sum_{d|P} \lambda_d g(d)$ for $\lambda_d$ supported on $d \leq R$ , given that $\lambda_1 = 1$ and $\sum_{d|n; d|P} \lambda_d \geq 0$ . Writing $P = p_1 \ldots p_n$ and introducing the n-variate polynomial

$Q(x_1,\ldots,x_n) := \sum_{d|P} \lambda_d \prod_{i: p_i|d} x_i$

we see that we are trying to minimise the linear functional

$\sum_{x \in \{0,1\}^n} Q(x) \prod_{i=1}^n ((1-g(p_i))(1-x_i) + g(p_i) x_i)$ (*)

subject to the constraints that Q is non-negative on the corners $\{0,1\}^n$ of the cube with $Q(0)=1$ , and only involves monomials $x_{i_1} \ldots x_{i_k}$ with $p_{i_1} \ldots p_{i_k} \leq R$ . Since $x_i^2 = x_i$ on the corners of the cube, one can extend these monomials to $x_{i_1}^{a_1} \ldots x_{i_k}^{a_k}$ for arbitrary $a_1,\ldots,a_k \geq 0$ . By adding large positive multiples of $x_i(1-x_i)$ to Q, we may then assume that Q is non-negative on the entire unit cube $[0,1]^n$ , not just on the corners.

One can then use the positivstellensatz to replace the non-negativity of Q with a sum-of-squares representation

$Q = \sum_i \sum_j x_i R_{i,j}(x)^2 + (1-x_i) S_{i,j}(x)^2$

and one is now trying to minimise (*) subject to the linear constraints that Q(0)=1 and that all $x_{i_1}^{a_1} \ldots x_{i_k}^{a_k}$ coefficients of Q with $p_{i_1} \ldots p_{i_k} > R$ vanish. In principle this is a Selberg-type optimisation problem – the Selberg case corresponds to the case when Q is a single square, $Q = R^2$ . Unfortunately the degrees of the $R_{i,j}, S_{i,j}$ can be large, as can the number of $R_{i,j}, S_{i,j}$ involved (I think it can be something like $2^n$ ). But it suggests a way to go a little bit beyond the Selberg case, for instance replacing Selberg sieves

$(\sum_{d|n} \alpha_d)^2$

(where $d$ has to be restricted to be at most $\sqrt{R}$ ) with slightly more general sieves such as

$\sum_p 1_{p|n} (\sum_{d|n} \alpha_{d,p})^2 + (1-1_{p|n}) (\sum_{d|n} \beta_{d,p})^2$

for coefficients $\alpha_{d,p}, \beta_{d,p}$ which are arbitrary except for the linear constraint that the net coefficient of $1_{d|n}$ in the final sum vanishes for $d>R$ . But it’s not yet clear to me whether this leads to a substantial improvement in the Selberg sieve numerology (from the toy cases in previous comments, there are already a lot of cases when Selberg is basically sharp already).

12 December, 2013 at 9:53 pm

Terence Tao

Here is a slight modification of the above scheme, adapted for our current sieve problem. For sake of concreteness let us take k=3 (with the tuple (0,2,6)) and assume EH. Right now, we are working with sieves of the form

$\nu(n) = (\sum_{d_1|n; d_2|n+2; d_3|n+6} \lambda_{d_1,d_2,d_3})^2$

for some weights $\lambda_{d_1,d_2,d_3}$ which we are free to choose, as long as $d_1 d_2 d_3 \leq x^{1/2-\varepsilon}$ (actually we can stretch this a bit to the region $d_1 d_2, d_2 d_3, d_3 d_1 \leq x^{1/2-\varepsilon}$ , which corresponds to the enlarged region ${\cal R}_3$ , but let us ignore this enlargement for now). One possible generalisation of the above sieve is to

$\nu(n) = (1 - \sum_{p|n: p < x^{\sigma}} \frac{\log p}{\log x}) (\sum_{d_1|n; d_2|n+2; d_3|n+6} \lambda_{d_1,d_2,d_3})^2$
$+ (\sum_{p|n: p < x^{\sigma}} \frac{\log p}{\log x}) (\sum_{d_1|n; d_2|n+2; d_3|n+6} \lambda'_{d_1,d_2,d_3})^2$

where $0 < \sigma < 1/2$ is a parameter one can optimise in, and $\lambda'_{d_1,d_2,d_3}$ obeys the same support conditions as $\lambda_{d_1,d_2,d_3}$ , and also agrees with $\lambda_{d_1,d_2,d_3}$ for $d_1 d_2 d_3 \geq x^{1/2-\sigma-\varepsilon}$ . Note that the expression $1 - \sum_{p|n: p < x^\sigma} \frac{\log p}{\log x}$ is non-negative for $n \leq x$ , so $\nu$ remains non-negative, and note that when one expands out $\nu$ as a divisor sum, one only obtains terms $1_{d|n}$ with $d \leq x^{1-\varepsilon}$ (all larger values of $d$ end up cancelling each other out). So one can still use EH to control error terms here.

In the case when $\lambda=\lambda'$ , this new sieve collapses to the old Selberg sieve. But hopefully the extra flexibility here allows us to improve the quadratic optimisation problem. One can also play with other variants of the above scheme of course.

14 December, 2013 at 10:42 am

Terence Tao

I’ve been trying to see if this new sieve actually improves the values of $M_k, M'_k$ , etc. over the existing theoretical optimal values, but the results are frustratingly inconclusive: for model problems, no gain occurs, and for the original problem, the Selberg sieve is a stationary point of the optimisation problem, but it is still not clear whether it is a global extremum.

To focus the discussion, let us look at the problem of bounding the number of $n \leq x$ such that $n, n+2, n+6$ are all prime. Hardy-Littlewood predicts an asymptotic $(1+o(1)) {\mathfrak G} \frac{x}{\log^3 x}$ for this quantity, but the best upper bounds we can get are only within a large constant of this. For instance, by using a Selberg sieve

$\nu(n) = \nu_F(n) := (\sum_{d_1|n, d_2|n+2,d_3|n+6} \mu(d_1d_2d_3) F(\frac{\log d_1}{\log x}, \ldots, \frac{\log d_3}{\log x}))^2$

for $F$ smooth and supported on the interior of the simplex $\frac{1}{2} \cdot {\cal R}_3 = \{ (t_1,t_2,t_3) \in [0,1]^3: t_1+t_2+t_3 \leq \frac{1}{2} \}$ , we get an upper bound of $(C+o(1)) {\mathfrak G} \frac{x}{\log^3 x}$ where

$C := \frac{\int F_{123}(t_1,t_2,t_3)^2\ dt_1 dt_2 dt_3}{F(0,0,0)^2}.$

Writing $F(0,0,0)= \int F_{123}(t_1,t_2,t_3)\ dt_1 dt_2 dt_3$ and using Cauchy-Schwarz, we see that the best value of C is 48 (the reciprocal of the volume of the simplex $\frac{1}{2} \cdot {\cal R}_3$ ), attained when $F_{123}$ is the indicator function of the simplex (in practice one has to infinitesimally smooth this a bit). (One can do a bit better than this by using distribution theorems on the primes; using Bombieri-Vinogradov I think we can lower C to 32, and on Elliott-Halberstam I think we can get 8, but let me ignore these directions here for simplicity.)

Now let us try a modified sieve

$\nu'(n) = (1 - \sum_{p \leq x^{1/2}: p|n} \frac{\log p}{\log x}) \nu_F(n) + \sum_{p \leq x^{1/2}: p|n} \nu_{F_p}(n)$

where F is as before, and for each $p \leq x^{1/2}$ , $F_p$ has the same support properties as F with $F_p(t_1,t_2,t_3)=F(t_1,t_2,t_3)$ for $t_1+t_2+t_3 \geq \frac{1}{2} - \frac{\log p}{\log x}$ . This sieve is still non-negative for $n \leq x$ , and one can still calculate $\sum_{n \leq x} \nu'(n)$ : indeed the numerator of C is adjusted from $\int F_{123}(t_1,t_2,t_3)^2\ dt_1 dt_2 dt_3$ to

$\int F_{123}^2 + \frac{1}{\log x} \sum_{p \leq x^{1/2}} \frac{\log p}{p}$
$[(\Delta_{\log p/\log x}^1 (F_p)_{123})^2 - (\Delta_{\log p/\log x}^1 F_{123})^2]\ dt_1 dt_2 dt_3$

where $\Delta_u^1 F(t_1,t_2,t_3) := F(t_1,t_2,t_3) - F(t_1+u,t_2,t_3)$ . The summand vanishes when $t_1+t_2+t_3 \geq \frac{1}{2}-\frac{\log p}{\log x}$ , and one can make $\Delta_{\log p/\log x}^1 (F_p)_{123}$ vanish for $t_1+t_2+t_3 < \frac{1}{2} - \frac{\log p}{\log x}$ , and then by using Mertens' theorem, C gets reduced to

$\int F_{123}^2\ dt_1 dt_2 dt_3 - \int_{u+t_1+t_2+t_3 \leq 1/2} \Delta_u^1 F_{123}^2\ dt_1 dt_2 dt_3 du$ . (*)

At first glance it looks like this is a strict improvement, as we have subtracted a new non-negative term from C. However, this new term vanishes when F is the previous extremiser (when $F_{123}$ is the indicator of $\frac{1}{2} \cdot {\cal R}_3$ ); furthermore, using the identity

$\int_{u+t_1+t_2+t_3 \leq 1/2} \Delta_u^1 F_{123}^2\ dt_1 dt_2 dt_3 du$
$= (1/2-t_2-t_3) \int F_{123}^2\ dt_1 dt_2 dt_3 - \int F_{23}^2(0,t_2,t_3)\ dt_2 dt_3$

and some Cauchy-Schwarz, one can lower bound (*) by

$\int \frac{1}{1/2-t_2-t_3} F_{23}^2(0,t_2,t_3)\ dt_2 dt_3$

and by some further Cauchy-Schwarz, we get exactly the same extremum C=48 as before. So unfortunately in this model problem at least, the new sieve doesn't move the needle as far as bounds are concerned. However the analysis for the prime gaps problem is substantially more complicated, and I can't tell whether the extra flexibility here actually improves the global extremum; all I can say so far is that the previous extremiser remains a critical point.

7 May, 2014 at 5:29 am

Anonymous

It seems that this preprint is related to the problem.

8 December, 2013 at 9:49 pm

Terence Tao

Incidentally, the upper bound $M_k \leq \frac{k}{k-1} \log k$ suggests that $M_k > 8$ could occur for k as low as 2973, which is a significant improvement over the value of 43,134 we currently have. Given how tight the upper bound is for small k (and for k=5 it is ridiculously tight – I would like to understand better why that is so) this suggests that there is quite a bit of room to improve our m=2 analysis. One idea I had was to replace the Cauchy-Schwarz inequality with its defect version (the Lagrange identity) and then try to obtain upper bounds on the defect $\frac{k}{k-1}\log k - M_k$ , which may be more efficient than just crudely obtaining lower bounds on $M_k$ .
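The threshold 2973 is simply the first $k$ at which the upper bound $\frac{k}{k-1} \log k$ exceeds 8. Since $\frac{k}{k-1}\log k$ is increasing in $k$ for $k > 1$ , a linear scan finds it; the sketch below uses a hypothetical helper name.

```python
import math

def smallest_k_exceeding(target: float) -> int:
    """Smallest integer k >= 2 with k/(k-1) * log k > target."""
    k = 2
    while k / (k - 1) * math.log(k) <= target:
        k += 1
    return k

print(smallest_k_exceeding(8))  # threshold relevant to M_k > 8
print(smallest_k_exceeding(4))  # threshold relevant to M_k > 4
```

These are only necessary conditions on $k$ coming from the upper bound; whether $M_k$ actually exceeds the target at such $k$ is a separate question.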

9 December, 2013 at 8:50 am

Eytan Paldi

This difference seems to be $o(1)$ (or even $O(1/k)$ ).

9 December, 2013 at 12:24 am

Anonymous

Typographical comment: If you wrap the comma, used as ‘1000-separator’ in a large number, in curly braces, the space after the comma disappears, as it should; “ $484,126$ ” –> “ $484{,}126$ ”.

[Corrected, thanks – T.]

9 December, 2013 at 4:41 am

Eytan Paldi

There is a typo in the upper bound for $M_5$ above (it should be $2.011797...$ ).

[Corrected, thanks! This helps explain why the M_5 bounds were suspiciously tight; they’re still quite good, but not unbelievably so any more… -T.]

9 December, 2013 at 5:00 am

Eytan Paldi

In order to give a precise generalized definition for $M_k(D)$ for a domain $D$ (not necessarily $\mathcal R_k$ ), it is not sufficiently clear how to define the inner integral in the expression for $J_k^{(m)}$ .

9 December, 2013 at 8:50 pm

Pace Nielsen

True, but this can be done for the domain $\mathcal{R}_k'$ defined in Maynard’s preprint. I wonder if there are some nice functions $F$ (which can take the place of our monomial symmetric polynomials) for which $\int_{\mathcal{R}_k'} F(t_1,\ldots, t_k)\ dt_1\ldots dt_k$ is easily computed.

9 December, 2013 at 9:14 pm

Terence Tao

Well, ${\cal R}'_k$ is the union (up to measure zero sets) of $k$ pieces, all of which are reflections of

$\displaystyle \{ (t_1,\ldots,t_k): t_k,\ldots,t_2 \geq t_1 \geq 0; t_2 + \ldots + t_k \leq 1 \}$

or equivalently (after setting $t_1 := \frac{s_1}{k-1}$ and $t_i := \frac{s_1}{k-1} + s_i$ for $i=2,\ldots,k$ )

$\displaystyle \{ (\frac{s_1}{k-1},\frac{s_1}{k-1}+s_2,\ldots,\frac{s_1}{k-1}+s_k): (s_1,\ldots,s_k) \in {\cal R}_k \}$

and so for symmetric F we have

$\displaystyle \int_{{\cal R}'_k} F = \frac{k}{k-1} \int_{{\cal R}_k} F( \frac{s_1}{k-1}, \frac{s_1}{k-1}+s_2,\ldots,\frac{s_1}{k-1}+s_k)\ ds_1 \ldots ds_k$ .

In principle, this means that we may integrate on ${\cal R}'_k$ using the existing integration formulae on ${\cal R}_k$ (symmetrising the integrand on the RHS if desired). For instance

$\displaystyle \int_{{\cal R}'_k} 1 = \frac{k}{k-1} \frac{1}{k!}$

and

$\displaystyle \int_{{\cal R}'_k} P_1 = \frac{k^2-k+1}{(k-1)^2} \frac{k}{(k+1)!}$

(I think). The formulae get a bit more complicated beyond this, but look computable.
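As a sanity check on the two closed forms above, here is a minimal sketch in exact rational arithmetic. It assumes only Dirichlet's formula $\int_{{\cal R}_k} s_1^{a_1} \ldots s_k^{a_k} = \frac{a_1! \ldots a_k!}{(k+a_1+\ldots+a_k)!}$ and the change of variables described above; the function names are hypothetical and not part of any project code.

```python
from fractions import Fraction
from math import factorial

def dirichlet(exponents):
    """Exact integral of prod s_i**a_i over the simplex R_k (Dirichlet's formula)."""
    num = 1
    for a in exponents:
        num *= factorial(a)
    return Fraction(num, factorial(len(exponents) + sum(exponents)))

def integral_one(k):
    """Integral of 1 over R'_k: (k/(k-1)) times the volume of R_k."""
    return Fraction(k, k - 1) * dirichlet([0] * k)

def integral_P1(k):
    """Integral of P_1 = t_1 + ... + t_k over R'_k via the substitution.

    After substituting, P_1 = (k/(k-1)) s_1 + s_2 + ... + s_k, and every
    single s_i has the same integral over R_k by symmetry.
    """
    single = dirichlet([1] + [0] * (k - 1))
    linear = Fraction(k, k - 1) * single + (k - 1) * single
    return Fraction(k, k - 1) * linear

def closed_form_P1(k):
    """The closed form (k^2-k+1)/(k-1)^2 * k/(k+1)! stated in the comment."""
    return Fraction(k * k - k + 1, (k - 1) ** 2) * Fraction(k, factorial(k + 1))

for k in range(2, 8):
    print(k, integral_one(k), integral_P1(k), closed_form_P1(k))
```

For $k=2$ the region ${\cal R}'_2$ is the unit square, so both integrals should equal 1, which gives an independent check of the normalisation.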

[EDIT: On the other hand, the functionals $J^{(m)}_k(F)$ are a little bit tricky to compute here; will have to think about this a bit more.]

10 December, 2013 at 6:57 am

Pace Nielsen

For the functionals $J_{k}^{(m)}(F)$ , one may want to break up the simplex in a different way. Let’s focus on $J_{k}^{(k)}$ for a moment (the others being similar). One option (following something James sent me) is to break $\mathcal{R}_{k}'$ into the $k-1$ sets

$\mathcal{R}_k'(j) =\{(t_1,\ldots, t_k)\ :\ t_1,\ldots, t_{k-1}\geq t_j,$

$0\leq t_k\leq 1-\sum_{i=1}^{k-1}t_i +t_j \}$

for $1\leq j\leq k-1$ . I think one can then follow the analysis you gave above. It seems that the formulas are a bit complicated; there may be a cleaner method here.

10 December, 2013 at 7:13 am

Terence Tao

Actually, I realised just now that we can use symmetry to write

$J_k^{(k)}(F) = 2 \int_{{\cal R}'_k} F V_k F$

where $V_k$ is the Volterra operator in the k^th direction:

$V_k F(t_1,\ldots,t_k) := \int_0^{t_k} F(t_1,\ldots,t_{k-1},s_k)\ ds_k.$

In particular if F is a polynomial then $V_k F$ is a polynomial of one higher degree (without having to involve tropical expressions such as $\min(t_1,\ldots,t_k)$ ). So any formula that allows one to integrate polynomials on ${\cal R}'_k$ will also be able to calculate $J_k^{(k)}(F)$ for polynomials F, and similarly for the other $J_k^{(m)}(F)$ .
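A small exact check of this identity in the simplest case $k=2$, where ${\cal R}'_2$ is the unit square, can be sketched as follows. Polynomials are stored as dictionaries mapping exponent pairs to rational coefficients; this representation and all function names are illustrative assumptions, not the project's code.

```python
from fractions import Fraction

def mul(p, q):
    """Product of two polynomials in (x, y), stored as {(i, j): coefficient}."""
    out = {}
    for (a, b), c in p.items():
        for (d, e), f in q.items():
            out[(a + d, b + e)] = out.get((a + d, b + e), 0) + c * f
    return out

def volterra_y(p):
    """V_2 p(x, y): integrate p(x, s) in s from 0 to y."""
    return {(a, b + 1): c / (b + 1) for (a, b), c in p.items()}

def integrate_square(p):
    """Integral of p over the unit square [0,1]^2."""
    return sum(c / ((a + 1) * (b + 1)) for (a, b), c in p.items())

def J2(p):
    """J_2^{(2)}(p): integrate out y over [0,1], square, then integrate in x."""
    marginal = {}
    for (a, b), c in p.items():
        marginal[(a, 0)] = marginal.get((a, 0), 0) + c / (b + 1)
    return integrate_square(mul(marginal, marginal))

# An arbitrary test polynomial F(x, y) = 1 - 2x + 3xy + 5y^2.
F = {(0, 0): Fraction(1), (1, 0): Fraction(-2), (1, 1): Fraction(3), (0, 2): Fraction(5)}
lhs = J2(F)
rhs = 2 * integrate_square(mul(F, volterra_y(F)))
print(lhs, rhs)
```

The two sides agree because $2 F V_k F$ is the $t_k$-derivative of $(V_k F)^2$, so integrating in $t_k$ recovers the square of the full inner integral.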

9 December, 2013 at 10:40 am

Smetalka

Possible inconsistency: on the wiki page the lower bound for k=59 (3.96508) differs from the one you mentioned in your original post (3.898)

[Corrected; the 3.898 referred to a previous lower bound that has since been superseded. -T.]

9 December, 2013 at 11:49 am

Eytan Paldi

There is still a typo (also in the wiki page): The value (as given by Pace) is $3.95608...$

[Corrected, thanks – T.]

10 December, 2013 at 11:49 am

Eytan Paldi

Another possibility to approximate $M_k$ is by trying to express the coefficients of $\mathcal L ^j 1; j = 1, 2, ...$ and then to see if by the power method $|| \mathcal L ||$ may be estimated.
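To illustrate the power method in the smallest case, here is a rough sketch for $k=2$. Note that it swaps in a crude midpoint-grid discretisation of the triangle ${\cal R}_2$ rather than tracking polynomial coefficients of ${\cal L}^j 1$ as suggested above; the grid size, iteration count and function names are assumptions chosen for illustration.

```python
def apply_L(F, n):
    """Apply a discretised L_2 to F, sampled on cells (i, j) with i + j <= n - 1."""
    h = 1.0 / n
    # Marginal over y at fixed x-cell i, and over x at fixed y-cell j.
    row = [h * sum(F[i][j] for j in range(n - i)) for i in range(n)]
    col = [h * sum(F[i][j] for i in range(n - j)) for j in range(n)]
    return [[row[i] + col[j] for j in range(n - i)] for i in range(n)]

def inner(F, G):
    """Plain dot product of two triangular arrays."""
    return sum(a * b for r, s in zip(F, G) for a, b in zip(r, s))

def power_method(n=40, iterations=200):
    """Estimate the top eigenvalue of the discretised operator, starting from F = 1."""
    F = [[1.0] * (n - i) for i in range(n)]
    for _ in range(iterations):
        G = apply_L(F, n)
        norm = inner(G, G) ** 0.5
        F = [[g / norm for g in r] for r in G]
    # Rayleigh quotient of the final iterate.
    return inner(F, apply_L(F, n)) / inner(F, F)

print(power_method())
```

With this discretisation the estimate should land near the exact value of $M_2$ derived later in this thread, up to an error of order $1/n$ coming from the staircase approximation of the triangle.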

10 December, 2013 at 12:55 pm

Eytan Paldi

Three lines above lemma 4.9, I suggest replacing “the first non-zero derivative is positive” by $f(0) > 0$ .

[Corrected, thanks – although these sorts of comments should go in the Polymath8a thread. -T.]

10 December, 2013 at 1:28 pm

Fan

The link to this newest thread is not on the polymath wiki.

[Corrected, thanks – T.]

10 December, 2013 at 1:50 pm

Terence Tao

I was looking again at the Selberg sieve to see if there was any other way to move beyond our current range. It occurred to me that we could exploit more fully the difficulty gap between the task of computing the sum $\sum_n \nu(n)$ (which is really easy, since we understand how integers distribute in arithmetic progressions) and the task of computing the sum $\sum_n \nu(n) 1_{n+h_i \hbox{ prime}}$ (which is difficult, since we need to understand how primes distribute in arithmetic progressions).

Let us use the multidimensional GPY sieve as discussed in this blog post (rather than the elementary Selberg sieve from Maynard’s paper). Assuming all error terms are under control, and the underlying cutoff f is symmetric, we obtain $DHL[k,m+1]$ (assuming $EH[\theta]$ ) whenever the ratio

$\displaystyle \frac{ k \int f_{1,\ldots,k-1}(t_1,\ldots,t_{k-1},0)^2\ dt_1 \ldots dt_{k-1}}{\int f_{1,\ldots,k}(t_1,\ldots,t_k)^2\ dt_1 \ldots dt_k}$

exceeds $\frac{2m}{\theta}$ .

Now what constraints are there on the cutoff function f, other than it be smooth and compactly supported? Right now, we are requiring that f is supported on the simplex ${\cal R}_k$ , or on the enlarged simplex ${\cal R}'_k$ . But actually, if one looks at the proof, in order to control the numerator it is only necessary that the slice $(t_1,\ldots,t_{k-1}) \to f(t_1,\ldots,t_{k-1},0)$ be supported on the simplex ${\cal R}_{k-1}$ , while to control the denominator, f can be supported in the larger simplex $\frac{1}{\theta} \cdot {\cal R}_k$ to reflect the fact that the integers trivially obey the full Elliott-Halberstam conjecture. (Here let us work in the “unconditional” setting in which $\theta$ is equal to or only slightly larger than 1/2.) It may be that this gives some extra flexibility to the variational problem. In terms of the differentiated cutoff function $F = f_{1,\ldots,k}$ that appears in Maynard’s paper, the constraint is then essentially that F is allowed to be supported in the larger simplex $\frac{1}{\theta} \cdot {\cal R}_k$ , so long as the marginal distribution

$(t_1,\ldots,t_{k-1}) \mapsto \int F(t_1,\ldots,t_k)\ dt_k$

is still supported on the simplex ${\cal R}_{k-1}$ . For non-negative F, this restricts F to the region ${\cal R}'_k$ , but there is some possibility (perhaps unlikely) that one gains something numerically by considering some oscillatory F that pokes a bit outside of ${\cal R}'_k$ . I’ll think about this a bit more…

10 December, 2013 at 2:26 pm

Terence Tao

Indeed, it appears that this relaxation of constraints does allow one to increase the ratio one is trying to maximise. Among other things, the optimum in the new variational problem is only achieved if the function $F$ is a sum of functions that depend only on k-1 of the k variables, thus (by symmetry)

$F(t_1,\ldots,t_k) = \tilde F(t_1,\ldots,t_{k-1}) + \tilde F(t_1,\ldots,t_{k-2},t_k)$

$+ \ldots + \tilde F(t_2,\ldots,t_k)$ .

Whereas in the previous optimisation problem this decomposition held only inside ${\cal R}_k$ or ${\cal R}'_k$ , in the new problem it must hold throughout $\frac{1}{\theta} {\cal R}_k$ ; equivalently, the alternating sum of $F(t_1,\ldots,t_k)$ on the vertices of any box in $\frac{1}{\theta} {\cal R}_k$ with sides parallel to the axes needs to vanish. This is an additional constraint that is unlikely to have been satisfied by the original extremiser, suggesting that an improvement in the analogue for $M_k$ in this setting is available.

I’m not yet sure though how to convert this to an effectively computable optimisation problem; one needs a good basis for the space of symmetric functions of k variables on $\frac{1}{\theta} \cdot {\cal R}_k$ whose marginal distribution on k-1 variables is supported on the simplex ${\cal R}_{k-1}$ , and I can’t see an obvious candidate for such a basis offhand…

10 December, 2013 at 3:17 pm

Pace Nielsen

Can you clarify what it means for the marginal distribution on $k-1$ variables to be supported on the simplex $\mathcal{R}_{k-1}$ , perhaps with a simple example of such a function? [Specifically, I thought we could choose our function to be piece-wise smooth, so doesn’t that essentially let us choose the support? I’m probably missing something simple here.]

10 December, 2013 at 4:25 pm

Terence Tao

Saying that a symmetric function $F(t_1,\ldots,t_k)$ has k-1-marginals supported on ${\cal R}_{k-1}$ means that the k-1-dimensional function $(t_1,\ldots,t_{k-1}) \mapsto \int_0^\infty F(t_1,\ldots,t_{k-1},t_k)\ dt_k$ vanishes whenever $t_1+\ldots+t_{k-1} > 1$ . An example would be the tensor product $F(t_1,\ldots,t_k) = h(t_1) \ldots h(t_k)$ for any function $h: [0,+\infty) \to {\bf R}$ of mean zero; this symmetric oscillating function in fact has all marginals vanishing everywhere.

For sake of notation let us take k=3 and $\theta=1/2$ (in practice of course we need to take k to be more like 59 with this value of $\theta$ , but I don’t want to write so many variables); then the new variational problem (replacing $M_3$ ) is that of maximising the ratio

$\displaystyle 3 \frac{\int_0^\infty \int_0^\infty (\int_0^\infty F(x,y,z)\ dz)^2\ dx dy}{\int_0^\infty \int_0^\infty \int_0^\infty F(x,y,z)^2\ dx dy dz}$

for piecewise smooth symmetric $F: [0,+\infty)^3 \to {\bf R}$ supported in the simplex $\{ x+y+z \leq 2\}$ , and such that $\int_0^\infty F(x,y,z)\ dz$ vanishes whenever $x+y \geq 1$ . The latter condition is automatic if F is supported on ${\cal R}'_3 = \{ (x,y,z): x+y,y+z,z+x \leq 1 \}$ , but also allows for some additional oscillatory functions F.

10 December, 2013 at 3:26 pm

Eytan Paldi

For $k=2$ it seems that such oscillatory $F$ (having the above symmetric form) is not possible.

10 December, 2013 at 4:30 pm

Terence Tao

If $\theta = 1/2$ , then one can find oscillating symmetric functions $F(x,y)$ supported on the triangle $\{x+y \leq 2\}$ whose marginals are supported on $[0,1]$ , e.g. $F(x,y) = g(x) h(y) + g(y) h(x)$ where $g, h$ are supported on $[0,1/2], [1,3/2]$ respectively with g having mean zero. (Actually, the $k=2, \theta = 1/2$ case looks like a good one to analyse even if it won’t give us any nontrivial value of m, as this is one of the few places where we can hope for an exact solution to the optimisation problem.)
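A concrete instance of this construction can be checked numerically. The sketch below picks step functions for $g$ and $h$ (an arbitrary illustrative choice, as are the function names) and evaluates the marginal $x \mapsto \int_0^2 F(x,y)\ dy$ by a midpoint rule whose grid is aligned with the jump points.

```python
def g(x):
    """Mean-zero step function supported on [0, 1/2]."""
    if 0.0 <= x < 0.25:
        return 1.0
    if 0.25 <= x < 0.5:
        return -1.0
    return 0.0

def h(x):
    """Indicator of [1, 3/2]."""
    return 1.0 if 1.0 <= x < 1.5 else 0.0

def F(x, y):
    """Symmetric function supported where x + y <= 2."""
    return g(x) * h(y) + g(y) * h(x)

def marginal(x, n=2000):
    """Midpoint-rule approximation of the integral of F(x, y) over y in [0, 2]."""
    step = 2.0 / n
    return sum(F(x, (j + 0.5) * step) for j in range(n)) * step

for x in (0.1, 0.3, 0.75, 1.25, 1.75):
    print(x, marginal(x))
```

Since $g$ has mean zero, the term $h(x) \int g$ drops out, so the marginal reduces to $g(x) \int h$, which is supported on $[0,1/2] \subset [0,1]$ even though $F$ itself is non-zero for $x \in [1,3/2]$.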

10 December, 2013 at 5:16 pm

Eytan Paldi

I meant that it can’t be done if $F$ is of the form $f(x)+f(y)$ (symmetric sum of two functions of one variable) – which (as previously remarked) is the only possibility for an optimum.

10 December, 2013 at 7:38 pm

Terence Tao

To be precise, F will be of the form $F(x,y) = (f(x)+f(y)) 1_{x+y \leq 2}$ . It is possible for functions of this form to have vanishing marginals outside of [0,1], although if F is an extremiser then by a Lagrange multiplier argument one can show that ${\cal L} F(x,y)$ is a function of y only when $0 \leq x \leq 1 \leq y \leq 2-x$ , and similarly when x and y are reversed, which shows after a little computation that $F$ is a scalar multiple of $1_{[0,1] \times [0,1]}(x,y)$ . This shows that in the k=2 case, we don’t gain anything over the bound $M'_2 = 2$ that we already have from the $1_{[0,1] \times [0,1]}$ example.

For k=3, though, I believe a similar argument shows that the extremiser is not supported on ${\cal R}'_3$ , suggesting that we can improve upon $M'_3$ with this expanded variational problem.

11 December, 2013 at 8:55 am

Terence Tao

Just to expand upon the final paragraph: Lagrange multipliers tell us that a symmetric extremiser F to the extended variational problem (in which F is supported on $\frac{1}{\theta} \cdot {\cal R}_k$ but has marginals supported on ${\cal R}_{k-1}$ ) solves the modified eigenfunction equation ${\cal L} F = M''_k F + \sum_{i=1}^k f( t_1,\ldots,t_{i-1},t_{i+1},\ldots,t_k )$ on $\frac{1}{\theta} \cdot {\cal R}_k$ , where ${\cal L}$ is the usual operator (but now defined on $\frac{1}{\theta} \cdot {\cal R}_k$ ), $M''_k$ is the optimal ratio (larger than or equal to the optimal ratio $M'_k$ on ${\cal R}'_k$ , which is in turn larger than or equal to $M_k$ ), and $f$ is symmetric, supported on $\frac{1}{\theta} \cdot {\cal R}_{k-1}$ but vanishes on ${\cal R}_{k-1}$ .

Note also from the vanishing marginal condition that ${\cal L} F(t_1,\ldots,t_k) = \sum_{i=1}^k g(t_1,\ldots,t_{i-1},t_{i+1},\ldots,t_k)$ where $g$ is symmetric and supported on ${\cal R}_{k-1}$ .

In the k=3 case, we have

$g(x,y)+g(y,z)+g(z,x) = M''_3 F(x,y,z) + f(x,y) + f(y,z) + f(z,x).$

Suppose that F is supported on ${\cal R}'_3$ and $\theta=1/2$ . Then in the regime $x+y,y+z \geq 1 \geq x+z$ , we have

$g(z,x) = f(x,y) + f(y,z)$

which (setting y=1) shows that $g(z,x) = h(z)+h(x)$ for $z+x \leq 1$ and some function $h$ . In the regime $x+z \geq 1 \geq x+y,y+z$ , we have

$g(x,y) + g(y,z) = f(x,z);$

the left-hand side is $h(x)+h(z)+2h(y)$ while the RHS is independent of y, which forces h to be constant, which then forces f to be constant, and then F is a scalar multiple of the indicator function on ${\cal R}'_3$ , which is not the extremiser. I think a similar argument works for all $k \geq 3$ (for $k=2$ , though, the indicator on ${\cal R}'_k$ happens to be the extremiser).

In the Elliott-Halberstam case $\theta=1$ , the restriction of F to $\frac{1}{\theta} \cdot {\cal R}_k$ collapses the problem back to the M_k problem, but I think one can replace $\frac{1}{\theta} \cdot {\cal R}_k$ by the larger region

$\frac{1}{\theta} \cdot {\cal R}_k$
$\cup \{ (t_1,\ldots,t_k): t_1+\ldots+t_{i-1}+t_{i+1}+\ldots+t_k \leq 1 \hbox{ for some } i \}$

(by using the generalised EH to control the distribution of the $d_i$ summations) which is still larger than ${\cal R}'_k$ . There is now hope of getting $M''_4$ greater than 2 (it seems that $M'_4$ is about 1.92, so we only need a little bit of additional boost to get over the top).

10 December, 2013 at 5:31 pm

Eytan Paldi

BTW, perhaps Kolmogorov’s superposition theorem may be used here?

11 December, 2013 at 11:08 am

Pace Nielsen

Just to clarify, is that new range (in the $\theta=1$ case) equal to

$\mathcal{R}_k''=\{(t_1,\ldots, t_k)\ :\ \sum_{i=1}^{k}t_i-\max(t_1,\ldots,t_k)\leq 1 \}$ ?

If so, I can do the computation to see if $k=4$ now works.

11 December, 2013 at 11:47 am

Terence Tao

A bit more precisely, for $\theta=1$ the region is

${\cal R}''_k = \{ (t_1,\ldots,t_k) \in [0,1]^k: \sum_{i=1}^k t_i - \max(t_1,\ldots,t_k) \leq 1 \}$

(in particular I believe this region is now non-convex for k>2), and one has to optimise $\sum_{m=1}^k J_k^{(m)}(F)/I_k(F)$ for $F$ symmetric and supported in ${\cal R}''_k$ , subject to the constraint that the marginals $(t_1,\ldots,t_{k-1}) \mapsto \int_0^1 F(t_1,\ldots,t_k)\ dt_k$ are supported in ${\cal R}_{k-1}$ . (In the general theta case, one has to restrict $(t_1,\ldots,t_k)$ to $[0,1/\theta]^k$ , otherwise some of the divisors $d_i$ will exceed x.)

I’m not sure how to implement the vanishing marginal constraint computationally. One crude way would be to add a large penalty function based on the L^2 mass of the marginals outside of the simplex ${\cal R}_{k-1}$ , but this is likely to significantly degrade convergence of the algorithm.

11 December, 2013 at 11:15 am

Aubrey de Grey

Is it yet possible to derive the minimal k for which M’_k passes 4? (The mention of a value for M’_4 arouses irresistible curiosity that it might be!)

11 December, 2013 at 11:39 am

Aubrey de Grey

Ah, I apologise – I now realise that M’_4 = 1.92 is simply twice the value of 0.96 that James provided two weeks ago (https://terrytao.wordpress.com/2013/11/22/polymath8b-ii-optimising-the-variational-problem-and-the-sieve/#comment-253116), using R’ but before the “M'” notation had been introduced. (Notation is evolving quickly at present!) But it would still be of interest to learn how feasible/close a calculation of M’_k for k in the 50 range is felt to be.

12 December, 2013 at 9:58 pm

Terence Tao

I need to correct my previous comment here; actually, I don’t think we can take the support of F in the $\theta=1$ case (taking $k=3$ for sake of concreteness) to be as large as

$\{ (x,y,z): \min( x+y, y+z, z+x ) \leq 1 \}$

as I had previously claimed.

The problem comes from lack of convexity; we have to deal with the square $(\sum_{d_1|n,d_2|n+2,d_3|n+6}\lambda_{d_1,d_2,d_3})^2$ of the divisor sum, and if one factor in the square comes from a tuple $(x,y,z)$ with (say) $x+y \leq 1$ , and if another comes from a tuple with $y+z \leq 1$ , then we don’t get any good control on the combined factor, and in particular cannot use EH in any of the three resulting moduli. The best I can see right now is to break the symmetry and work with a region such as

$\{ (x,y,z): x+y \leq 1 \}.$

While this in principle is still an improvement over the previous situation in which we restricted ourselves to the region ${\cal R}'_3 = \{ (x,y,z): x+y,y+z,z+x \leq 1\}$ , the loss of symmetry is going to make the numerical optimisation messier (especially since we still have the condition on the (y,z) and (z,x) marginals being restricted to $y+z \leq 1$ and $z+x \leq 1$ respectively).

But in the $\theta < 1$ case, we can still work with the simplex $\frac{1}{\theta} \cdot {\cal R}_k$ as before, because it is convex and so this issue does not occur.

11 December, 2013 at 2:43 am

Eytan Paldi

Such a basis can’t be real analytic (in particular, polynomial) on $(1/ \theta) \mathcal R_k$ (otherwise, the marginals are also real analytic on $(1/ \theta) \mathcal R_{k-1}$ – and can’t be supported on $\mathcal R_{k-1}$ without vanishing identically!)

11 December, 2013 at 2:26 am

AndrewVSutherland

This paper of Friedlander and Iwaniec posted to arXiv this morning may be of interest (and should probably be added to the bibliography).

[Reference added, thanks – T.]

11 December, 2013 at 8:38 am

Terence Tao

Thanks for the link! From a quick read through, the paper has a few innovations that aren’t in Polymath8a; for instance, in the MPZ estimates they are able to let the residue class $a\ (q)$ vary with $q$ (in our notation, it would be $a_q\ (q)$ ). By doing so, they give up some of the more advanced estimates of Polymath8a that come from averaging in modulus parameters such as q and r, but enough of the original arguments of Zhang (including Zhang’s original Type III argument, based on the Birch-Bombieri bound from Deligne’s theorem) can be adapted to still obtain a nontrivial estimate of this type.

There is also a passing remark that the Bessel function optimisation was actually performed first by Brian Conrey in an unpublished 2005 computation, thus predating the work of Farkas, Pintz, and Revesz; I’m guessing this was at an AIM workshop. I suppose we should note this also in Polymath8a, I’ll see if I can get some confirmation of this remark. [UPDATE: Brian has confirmed the calculation and showed me some of his emails and handwritten notes on the topic. I’ll put in a remark in the paper to this effect. -T.]

11 December, 2013 at 11:09 am

Emmanuel Kowalski

I had also heard that Brian had done this optimization around the time of the first GPY paper, but I didn’t get around to asking him for details. I actually thought that Soundararajan had a remark to this effect in his Bulletin of the AMS survey, but I couldn’t find it there, so that was probably a misremembrance. I agree that it is good to add a mention of this in the paper.

11 December, 2013 at 2:44 pm

Terence Tao

I’ve added a remark to this effect in Section 4.3. I also found a reference to Conrey’s computation in p. 129 of “Opera del Cribro”, and now that I look again at the Farkas-Pintz-Revesz paper, this is also mentioned in the introduction; I remember looking at this paper before, so I’m surprised I didn’t catch this previously.

On looking through “Opera del Cribro” I also found a conjecture on page 408 which is close to what we have been calling the Motohashi-Pintz-Zhang estimate (but the authors do cite Motohashi-Pintz for further details). This should probably also be mentioned in our paper.

14 December, 2013 at 5:44 am

Donal Lyons

“Opera de Cribro” :-)

11 December, 2013 at 11:08 am

Eytan Paldi

It would be helpful to update the wiki page about the variational problem by adding details on the associated problem for $\mathcal R'_k$ (and perhaps for larger domains as discussed above.)

[Added some text for this – T.]

12 December, 2013 at 8:22 am

Eytan Paldi

On the possibility of a “piecewise polynomial” basis for the variational problem:

The idea is to represent $\mathcal R''_{k, \theta}$ as a disjoint (up to boundaries) union of $\mathcal R_k$ and finitely many convex polytopes $D_j$ such that each basis function is a polynomial over the interior of $\mathcal R_k$ and each $D_j$ . Note that (because of $D_j$ convexity), each marginal is a sum of integrals over intervals, with at most one interval for each $D_j$ .

It follows that each marginal of each monomial is “piecewise polynomial” outside $\mathcal R_{k-1}$ (over a similar partition $E_l$ of polytopes outside $\mathcal R_{k-1}$ .)

We need the condition: For each FIXED $m$ , the marginal (as an integral with respect to $t_m$ ) of each monomial is a polynomial over each $E_l$ above (where the polytopes $E_l$ are “monomial independent” but may depend on $m$ ).

It seems that this condition is satisfied (for a sufficiently refined partition into polytopes $D_j$ .)

Under the above condition, for each basis function, the vanishing of each marginal outside $\mathcal R_{k-1}$ is described by a system of linear constraints on the coefficients of the basis function.

Therefore (for an appropriate partition $D_j$ ), we return (for each maximal degree $d$ ) to a similar optimization problem (with the additional system of linear constraints on the coefficients describing the “piecewise polynomial” $F$ ).

14 December, 2013 at 11:13 am

Terence Tao

I like this idea. It may well be that by applying this with k=4 on EH, we may finally be able to get $M''_4$ above 2 and thus get $H_1 \leq 8$ on EH.

But it may make sense to warm up on k=3 first, which is already somewhat complicated combinatorially (and k=4 is much worse). We have, in increasing order, the quantities $M_3,M'_3,M''_3$ , where $M_3$ is the maximal value of

$\frac{J_3^{(1)}(F) + J_3^{(2)}(F) + J_3^{(3)}(F)}{I_3(F)}$ (*)

for F supported on the simplex ${\cal R}_3 = \{ (x,y,z): x+y+z \leq 1 \}$ , with

$I_3(F) = \int\int\int F(x,y,z)^2\ dx dy dz$
$J_3^{(1)}(F) = \int\int (\int F(x,y,z)\ dx)^2\ dy dz$

and similarly for cyclic permutations, $M'_3$ is the maximum value of the same ratio but with F supported on the larger region ${\cal R}'_3 = \{ (x,y,z): x+y,y+z,z+x \leq 1\}$ , and $M''_3$ is the maximum value of the same ratio, but F now supported on the prism ${\cal R}''_3 = \{ (x,y,z): x+y, z \leq 1 \}$ and with marginal distributions $\int F(x,y,z)\ dx$ and $\int F(x,y,z)\ dy$ supported on $y+z \leq 1$ and $x+z \leq 1$ respectively. (The third marginal $\int F(x,y,z)\ dz$ is already supported on the correct simplex.) Note that while the first two problems are completely symmetric in x,y,z, this third optimisation problem is only symmetric in x,y, so we have to work with functions F that are symmetric in x,y only.

I’m not sure if the numerically computed values of M_3 or M’_3 have already been posted here but presumably they can be calculated easily using existing algorithms, to serve as benchmarks for the potential gain M”_3 offers over M’_3.

As Eytan suggests, one can split the prism ${\cal R}''_3$ into convex polytopes on which the marginal conditions become polynomial. It seems that the following partition into seven regions works:

$D_0 := \{ x+y, y+z, z+x \leq 1 \}$

$D_1 := \{ x+y \leq y+z \leq 1 \leq z+x \}$

$D_2 := \{ y+z \leq x+y \leq 1 \leq z+x \}$

$D'_1 := \{ x+y \leq z+x \leq 1 \leq y+z \}$

$D'_2 := \{ z+x \leq x+y \leq 1 \leq y+z \}$

$D_3 := \{ x+y \leq 1 \leq z+x \leq y+z; z \leq 1 \}$

$D'_3 := \{ x+y \leq 1 \leq y+z \leq x+z; z \leq 1 \}$

We take $F$ to be piecewise polynomial on each of these seven pieces, e.g. $F = F_0$ on $D_0$ , $F = F'_2$ on $D'_2$ , etc. By symmetry we may assume that $F'_i(x,y,z) = F_i(y,x,z)$ for $i=0,1,2,3$ (with $F'_0=F_0$ ), so we have four polynomials $F_0,F_1,F_2,F_3$ to independently optimise over. The M’_3 problem corresponds to the case $F_1=F_2=F_3=0$ .

By symmetry, the only marginal condition is that $\int F(x,y,z)\ dy = 0$ when $x+z \geq 1$ . When $z \leq x$ , this condition is just

$\int_0^{1-x} F_2\ dy = 0$

but when $z \geq x \geq 1/2$ , the condition becomes

$\int_0^{1-z} F_1\ dy + \int_{1-z}^{1-x} F'_3\ dy= 0$

and when $x \leq z, 1/2$ , the condition becomes

$\int_0^{1-z} F_1\ dy + \int_{1-z}^x F'_3\ dy + \int_x^{1-x} F_3\ dy = 0$ .

In principle, these are finite-dimensional linear constraints on $F_1,F_2,F_3$ that allow for a non-trivial solution when the degrees D of $F_1,F_2,F_3$ are large enough (one has about $O(D^3)$ degrees of freedom and $O(D^2)$ constraints). The quadratic program is moderately messy though.

14 December, 2013 at 11:22 am

Pace Nielsen

In your definition of $I_3, J_3$ did you get the correct powers on $F$ ? I thought in $I_3$ we only used the square, and in $J_3$ only the square on the outside of the first integral.

[Oops, that was weird; fixed now. -T.]

14 December, 2013 at 4:16 pm

Eytan Paldi

Some remarks:

1. To find the coefficients of the linear constraints and quadratic forms for the optimization, it is perhaps easier to use (iterated) symbolic integration.

2. If the linear constraints are represented by $C a = 0$ where $C$ is the constraint matrix and $a$ is a vector representing the coefficients of $F$ , then we have $a = D b$ for some matrix $D$ and vector $b$ with $\dim b < \dim a$ , so we return to the original optimization problem (without linear constraints) for $b$ .
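The reduction in remark 2 can be sketched with exact row reduction: the columns of $D$ are a basis of the null space of $C$ , and a quadratic form $a^T A a$ becomes $b^T (D^T A D) b$ . The code below is only an illustration on a toy constraint matrix; the helper names `nullspace` and `reduce_form` are hypothetical, and a real computation would build $C$ and the quadratic forms from the marginal conditions.

```python
from fractions import Fraction

def nullspace(C):
    """Basis vectors (the columns of D) of {a : C a = 0}, by exact Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in r] for r in C]
    ncols = len(rows[0])
    pivots = []
    r = 0
    for c in range(ncols):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column, so it is a free variable
        rows[r], rows[p] = rows[p], rows[r]
        piv = rows[r][c]
        rows[r] = [x / piv for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    free = [c for c in range(ncols) if c not in pivots]
    basis = []
    for fc in free:
        v = [Fraction(0)] * ncols
        v[fc] = Fraction(1)
        for i, pc in enumerate(pivots):
            v[pc] = -rows[i][fc]
        basis.append(v)
    return basis

def reduce_form(A, basis):
    """Return D^T A D, i.e. the form a^T A a restricted to a = D b."""
    Av = [[sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(v))] for v in basis]
    return [[sum(u[i] * w[i] for i in range(len(u))) for w in Av] for u in basis]

# Toy example: two constraints on four coefficients.
C = [[1, 1, 1, 0], [0, 1, -1, 2]]
D_columns = nullspace(C)
print(D_columns)
```

After this reduction the constrained problem becomes an ordinary generalised eigenvalue problem in $b$ for the pair of reduced forms, of dimension $\dim a - \mathrm{rank}\, C$ .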

14 December, 2013 at 7:36 pm

Aubrey de Grey

Is there any prospect of adapting Terry’s proof from three weeks ago of a sharp upper bound on M_k (https://terrytao.wordpress.com/2013/11/22/polymath8b-ii-optimising-the-variational-problem-and-the-sieve/#comment-253731) to give a corresponding upper bound on M’_k or M”_k ? I don’t think there has been any discussion of that (other than James’s computed upper bound for M’_4), and I’m thinking that in view of the daunting combinatorial obstacle which may now be emerging in relation to extending Eytan’s idea to k > 3 (and especially to k for which M”_k could get anywhere near 4), there may be a case for attempting to determine in advance how much progress beyond M_k could be achieved for a given k in the best-case scenario.

15 December, 2013 at 8:39 am

Terence Tao

For M’_k, the best bounds I know of in general are

$M_k \leq M'_k \leq \frac{k}{k-1} M_{k-1}$

(improving slightly over the previously observed upper bound $M'_k \leq \frac{k}{k-1} M_k$ ). The lower bound is trivial. For the upper bound, note for any F supported on the polytope ${\cal R}'_k$ that one has

$J_k^{(1)}(F) + \ldots + J_k^{(k-1)}(F) \leq M_{k-1} I_k(F)$

by looking at each fixed- $t_k$ slice of F (which is a k-1-dimensional function supported on ${\cal R}_{k-1}$ ) and applying the definition of $M_{k-1}$ , and then integrating in $t_k$ . Averaging over permutations we then obtain

$J_k^{(1)}(F) + \ldots + J_k^{(k)}(F) \leq \frac{k}{k-1} M_{k-1} I_k(F)$

giving the claim.

Note that we do not necessarily have monotonicity in $M'_k$ (for instance, $M'_2 = 2$ ). So it may well be that M’_3 exceeds 2 even if M’_4 does not. The above bound, combined with the existing bound $M_k \leq \frac{k}{k-1} \log k$ , gives an upper bound of 2.0794 for $M'_3$ , so there is hope, though admittedly a slim one since the upper bound was somewhat crude.

For $M''_k = M''_{k,\theta}$ , the best bounds I can obtain easily are

$M'_k \leq M''_{k,\theta} \leq M_{k-1} + \frac{1}{\theta}$ .

The lower bound is trivial, and the upper bound comes from observing that the function F for $M''_{k,\theta}$ is supported in the prism ${\cal R}_{k-1} \times [0,1/\theta]$ , and ignoring all the vanishing marginal conditions on F. From the slicing argument we again have

$J_k^{(1)}(F) + \ldots + J_k^{(k-1)}(F) \leq M_{k-1} I_k(F)$

and from Cauchy-Schwarz one has

$J_k^{(k)}(F) \leq \frac{1}{\theta} I_k(F)$

giving the claim. This is a very poor bound for small k, but it shows asymptotically that we do not break the logarithmic barrier $M_k \leq \log k + O(1)$ through any of these tricks, unfortunately.
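For concreteness, the two crude upper bounds above can be evaluated numerically once the bound $M_k \leq \frac{k}{k-1} \log k$ is inserted. This is only arithmetic on the inequalities stated in this comment; the function names are made up for the illustration, and the formulas need $k \geq 3$ so that $M_{k-1}$ is bounded by a finite quantity.

```python
from math import log

def M_upper(k):
    """Upper bound M_k <= k/(k-1) * log k quoted above (k >= 2)."""
    return k / (k - 1) * log(k)

def M_prime_upper(k):
    """Upper bound M'_k <= k/(k-1) * M_{k-1}, with M_{k-1} bounded as above (k >= 3)."""
    return k / (k - 1) * M_upper(k - 1)

def M_double_prime_upper(k, theta):
    """Upper bound M''_{k,theta} <= M_{k-1} + 1/theta, with M_{k-1} bounded as above (k >= 3)."""
    return M_upper(k - 1) + 1 / theta

for k in (3, 4, 5):
    print(k, M_prime_upper(k), M_double_prime_upper(k, 1.0), M_double_prime_upper(k, 0.5))
```

For $k=3$ the first bound is $3 \log 2$ , which reproduces the figure 2.0794 mentioned above.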

15 December, 2013 at 9:11 am

Aubrey de Grey

Thank you Terry! The absence of an (easily-obtainable, anyway) upper bound on $M''_k$ that is even close to that for $M'_k$ seems most encouraging. Out of interest, is $M''_k$ also not necessarily monotonic (for a given $\theta$ )?

16 December, 2013 at 11:23 pm

James Maynard

I had a look at this this afternoon.

Provisionally, my calculations are getting the bound
$M_3''\geq 1.914.$
This is from taking $F_0,F_1,F_2,F_3$ to be degree 4 polynomials in $x,y,z$ and following the approach outlined above.

This should be compared with the bounds
$M_3'\geq 1.842$
$M_3\geq 1.646.$
I’ll give a couple details tomorrow.

Frustratingly, if the above calculations hold up then it looks like we might come just short of improving our current bound $M_4' \geq 1.937$ to something above 2.

17 December, 2013 at 9:27 am

Terence Tao

Thanks for the computations! At least we now do know (at least numerically) that $M''_k$ can be strictly greater than $M'_k$ , so the idea of going outside ${\cal R}'_k$ does have some potential. I’ve started putting up the data on the wiki at http://michaelnielsen.org/polymath1/index.php?title=Selberg_sieve_variational_problem#World_records

I’m still missing a few cells (notably the lower bounds for M_2 and M’_5 – presumably you have these available or easily computable from your code?) but it already shows that the crude upper bounds $M'_k \leq \frac{k}{k-1} M_{k-1}$ and $M''_k \leq M_{k-1}+1$ are nowhere near as sharp as the bound $M_k \leq \frac{k}{k-1} \log k$ , although this is not terribly surprising.

It would be somewhat amusing (though suspenseful) if $M''_4 \geq 2$ was just at the edge of provability, so that a massive computer effort would be required to get high enough degree to verify it. If that doesn’t work, another option is to turn to $M''_5$ and shave off $P_{0,12}$ as discussed a couple weeks ago, to get $H_1$ to 10 rather than to 8.

One further option is the following (let’s take k=3 for sake of discussion). Right now, on EH, the function F is assumed to lie in the prism

$\{ (x,y,z): x+y, z \leq 1 \}$

with appropriate vanishing marginal conditions. This is not quite the only choice available: in principle, we can have F supported on any set $R \subset [0,1]^3$ with the property that the sumset $R+R = \{ v+w: v,w \in R \}$ lies in the non-convex region

$\{ (x,y,z): x+y, z \leq 2 \} \cup \{ (x,y,z): y+z, x \leq 2\}$
$\cup \{ (x,y,z): z+x, y \leq 2 \}$

as this represents all the ranges of moduli $d_1,d_2,d_3$ occurring in the Selberg sieve (which is the square of a divisor sum over a region associated to R) for which we can use Elliott-Halberstam in one of the three moduli (to make this work we have to first approximate F to be a finite linear combination of tensor products of one-dimensional functions, but this should be easily achievable from a Stone-Weierstrass argument, though there is a minor technicality that one has to make sure that the vanishing boundary conditions are still obeyed). Anyway, it is conceivable that there is some other choice of R which is superior to the prism $\{ (x,y,z): x+y, z \leq 1 \}$ , although I don’t obviously see one (my 3D geometry visualisation is not that great!).

17 December, 2013 at 10:17 am

James Maynard

I get that $M_2\geq 1.385$ and $M_5'\geq 2.059$ . (Actually for $M_2$ we can solve the Eigenfunction equation, and the answer is the solution $x$ of the equation
$2-\frac{1}{x}+\log\Bigl(1-\frac{1}{x}\Bigr)=0$ )

17 December, 2013 at 11:25 am

Terence Tao

Thanks for this! Just for the record, I’m writing down the calculations for the eigenfunction equation ${\cal L}_2 F = M_2 F$ on the triangle ${\cal R}_2 = \{ (x,y): x+y \leq 1\}$ . Assuming symmetry and writing ${\cal L}_2 F = f(x) + f(y)$ , where $f(x) = \int_0^{1-x} F(x,y)\ dy$ , we see that

$M_2 f(x) = (1-x) f(x) + \int_0^{1-x} f(y)\ dy$

which in particular implies that f(1)=0. Also, differentiating we see that

$M_2 f'(x) = (1-x) f'(x) - f(x) - f(1-x)$

and thus

$f'(x) = -\frac{G(x)}{M_2-1+x}$ (1)

where $G(x) := f(x)+f(1-x)$ ; reflecting, we obtain

$f'(1-x) = -\frac{G(x)}{M_2-x}$

and hence

$G'(x) = -\frac{G(x)}{M_2-1+x} + \frac{G(x)}{M_2-x}$

which can be solved as

$G(x) = C( \frac{1}{M_2-x} + \frac{1}{M_2-1+x}).$

We may normalise C=1. Substituting back into (1), we conclude after some calculation that

$f(x) = \frac{1}{M_2-1+x} + \frac{1}{2M_2-1} \log \frac{M_2-x}{M_2-1+x} + C'$

but then since $G(x)=f(x)+f(1-x)$ we have C’=0. Using the boundary condition f(1)=0 we then have

$\frac{1}{M_2} + \frac{1}{2M_2-1} \log \frac{M_2-1}{M_2} = 0$

which rearranges to $2 - \frac{1}{M_2} + \log( 1 - \frac{1}{M_2} ) = 0$ as required. Up to scalar multiples, the eigenfunction is then

$F(x,y) = \frac{1}{M_2-1+x} + \frac{1}{M_2-1+y} + \frac{1}{2M_2-1} \log \left( \frac{M_2-x}{M_2-1+x} \cdot \frac{M_2-y}{M_2-1+y} \right).$

Wolfram alpha tells me (and one can easily confirm) that $M_2 = 1 / (1- W(1/e)) = 1.38593...$ where $W$ is the Lambert W function.
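The numerical value is easy to reproduce without a computer algebra system. The sketch below solves $2 - \frac{1}{x} + \log(1 - \frac{1}{x}) = 0$ by bisection on $x > 1$ and compares the root with $1/(1-W(1/e))$ , where $W$ is computed by a few Newton steps; the helper names, bracketing interval and iteration counts are illustrative assumptions.

```python
from math import exp, log

def lambert_w(z, iterations=50):
    """Principal branch W(z) for z >= 0, by Newton's method on w * exp(w) = z."""
    w = 0.5
    for _ in range(iterations):
        w -= (w * exp(w) - z) / (exp(w) * (1 + w))
    return w

def bisect(f, lo, hi, iterations=100):
    """Root of f on [lo, hi], assuming f changes sign exactly once there."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

M2_closed = 1 / (1 - lambert_w(exp(-1)))
M2_root = bisect(lambda x: 2 - 1 / x + log(1 - 1 / x), 1.01, 2.0)
print(M2_closed, M2_root)
```

The two values should agree to machine precision, since substituting $1/x = 1 - W(1/e)$ and using $\log W(1/e) = -1 - W(1/e)$ makes the left-hand side vanish identically.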

17 December, 2013 at 12:17 pm

James Maynard

I thought I’d record for completeness the solution to the Eigenfunction equation (even though it seems we can’t solve it when $k>2$ and for $k=2$ we have simpler and stronger results from using the enlarged region). The comment above was mine – I hadn’t intended it to be anonymised. [Name restored to previous comment – T.]

We wish to solve the eigenfunction equation (for smooth $F$ supported on $\mathcal{R}_2$ )
$\lambda F(x,y)=\mathcal{L}_2 F=\int_0^{1-x}F(x,t)dt+\int_0^{1-y}F(t,y)dt.$
We see from this equation (and by the symmetry of $F$ ) that $F$ is of the form $F(x,y)=P(x)+P(y)$ . This gives us an eigenfunction equation for $P$ (having simplified slightly):
$(\lambda-1+t)P(t)=\int_0^{1-t}P(u)du.$
Differentiating with respect to $t$ gives
$(\lambda-1+t)P'(t)=-(P(t)+P(1-t)), (*)$
and since the right hand side is unchanged by the substitution $t\rightarrow 1-t$ we see that
$(\lambda-1+t)P'(t)=(\lambda-t)P'(1-t).$
Differentiating (*) again and substituting our above expression for $P'(1-t)$ gives
$P''(t)=P'(t)\Bigl(\frac{1}{\lambda-t}-\frac{2}{\lambda-1+t}\Bigr).$
This is a standard differential equation which can be solved by separation of variables. The solution is
$P(t)=A+B\Bigl(\frac{2\lambda-1}{\lambda-1+t}-\log\Bigl(\frac{\lambda-1+t}{\lambda-t}\Bigr)\Bigr)$
for some constants $A,B$ . We see from substituting this expression into (*) that we must have $A=0$ . Since any scalar multiple of an eigenfunction is also an eigenfunction, WLOG we may take $B=1$ . To calculate $\lambda$ we notice that the eigenfunction equation for $P$ requires that $P(1)=0$ . This gives the equation
$2-\frac{1}{\lambda}+\log\Bigl(1-\frac{1}{\lambda}\Bigr)=0$
mentioned above. For such a $\lambda$ , we have then found the unique eigenfunction (it is easy to verify that this is an eigenfunction) given by $F(x,y)=P(x)+P(y)$ with
$P(t)=\frac{2\lambda-1}{\lambda-1+t}-\log\Bigl(\frac{\lambda-1+t}{\lambda-t}\Bigr).$
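The claim that this $P$ satisfies $(\lambda-1+t)P(t)=\int_0^{1-t}P(u)du$ can be checked numerically. The following Python sketch is only a sanity check, not a proof: it solves for $\lambda$ by bisection and evaluates the residual of the integral equation with composite Simpson quadrature. The names `simpson` and `residual` are illustrative.

```python
import math

def solve_lambda():
    # Bisection on 2 - 1/x + log(1 - 1/x) = 0 for x in (1, 2).
    lo, hi = 1.01, 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 2.0 - 1.0 / mid + math.log(1.0 - 1.0 / mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam = solve_lambda()

def P(t):
    # Candidate one-variable eigenfunction (with A = 0, B = 1).
    return (2 * lam - 1) / (lam - 1 + t) - math.log((lam - 1 + t) / (lam - t))

def simpson(f, a, b, n=2000):
    # Composite Simpson rule with an even number n of subintervals.
    if a == b:
        return 0.0
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def residual(t):
    # (lambda - 1 + t) P(t) - int_0^{1-t} P(u) du, which should vanish.
    return (lam - 1 + t) * P(t) - simpson(P, 0.0, 1.0 - t)

max_residual = max(abs(residual(i / 10)) for i in range(11))
print(lam, max_residual)
```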

17 December, 2013 at 8:12 pm

Pace Nielsen

Along the lines of finding a set $R\subset [0,1]^{3}$ for which $R+R$ lies in the non-convex region Terry described above, the following is a choice which is symmetric in all three variables. (It’s not clear to me whether this domain is better than the one already being used, but it may be worth a shot.)

The region is simply

$R= \{(x,y,z)\ :\ x+y+z\leq 3/2\}\cap [0,1]^{3}$ .

I like to visualize the increasing sequence of domains (in the $k=3$ case) as follows. The domain used for computing $M_3$ is obtained by slicing the cube $[0,1]^{3}$ along the plane $x+y+z=1$ (containing three corners of the cube), leaving behind a tetrahedron.

The domain used for computing $M_3'$ is found by taking the tetrahedron we just described, and gluing another tetrahedron along the same plane $x+y+z=1$ , with the new tetrahedron having a pinnacle at the point $(1/2,1/2,1/2)$ .

The new region described above also contains the point $(1/2,1/2,1/2)$ , and indeed we slice the cube $[0,1]^{3}$ along the plane $x+y+z=3/2$ , which contains $(1/2,1/2,1/2)$ .

17 December, 2013 at 8:25 pm

Pace Nielsen

Alternatively, one can view this new region as cutting a cube exactly in half (with the cutting plane perpendicular to the diagonal joining two opposite corners). So the volume of this new region matches that of the other one currently being used.

17 December, 2013 at 9:39 pm

Aubrey de Grey

I’m not sure whether this was Pace’s motivation, but it’s worth noting that even if Pace’s new suggestion for $R$ is no larger than $R''$, the fact that it is symmetric in $1,\ldots,k$ may very well make it much easier to identify an algorithm for dissecting it appropriately when $k>3$.

17 December, 2013 at 11:17 pm

Terence Tao

It took a while to figure out how to dissect Pace’s half-cube $R = \{ (x,y,z) \in [0,1]^3: x+y+z \leq\frac{3}{2}\}$ – in fact I eventually had to use my son’s construction toy set to visualise it – but I think I have a dissection into 13 pieces so that all the constraints become polynomial. First there is the base polytope

$R'_3 := \{ (x,y,z): x+y,y+z,z+x \leq 1 \}$ .

Then there are three simplices $A_x, A_y, A_z$ , where

$A_x := \{ x \leq 1 \leq x+y, x+z; x+y+z \leq \frac{3}{2} \}$

and similarly for cyclic permutations of x,y,z; there are also three simplices $B_x,B_y,B_z$ , where

$B_x:= \{ x \geq 0; y,z \geq \frac{1}{2}; x+y+z \leq \frac{3}{2} \}$

and similarly for cyclic permutations; and finally there are six simplices $C_{xyz}, C_{xzy}, C_{yxz}, C_{yzx}, C_{zxy}, C_{zyx}$ , where

$C_{xyz} := \{ x \leq 1/2; z \geq 0; x+y \geq 1; y+z \leq 1 \}$

and similarly for arbitrary permutations.
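One way to sanity-check that these 13 pieces really tile the half-cube is to sample random points and count how many pieces contain each one. The Python sketch below does this under the reading that $A_x, B_x$ are extended by cyclic permutations and $C_{xyz}$ by all six permutations, with every piece implicitly intersected with $[0,1]^3$; the function names are illustrative. Random points almost surely avoid the shared boundaries, so each count should be exactly 1.

```python
import itertools
import random

def in_base(x, y, z):
    # R'_3 = { x+y, y+z, z+x <= 1 }
    return x + y <= 1 and y + z <= 1 and z + x <= 1

def in_A(a, b, c):
    # A_a = { a <= 1 <= a+b, a+c ; a+b+c <= 3/2 }
    return a <= 1 <= a + b and a + c >= 1 and a + b + c <= 1.5

def in_B(a, b, c):
    # B_a = { a >= 0 ; b, c >= 1/2 ; a+b+c <= 3/2 }
    return a >= 0 and b >= 0.5 and c >= 0.5 and a + b + c <= 1.5

def in_C(a, b, c):
    # C_abc = { a <= 1/2 ; c >= 0 ; a+b >= 1 ; b+c <= 1 }
    return a <= 0.5 and c >= 0 and a + b >= 1 and b + c <= 1

def pieces_containing(p):
    x, y, z = p
    cyclic = [(x, y, z), (y, z, x), (z, x, y)]
    count = int(in_base(x, y, z))
    count += sum(in_A(*q) for q in cyclic)
    count += sum(in_B(*q) for q in cyclic)
    count += sum(in_C(*q) for q in itertools.permutations(p))
    return count

rng = random.Random(0)
samples = [(rng.random(), rng.random(), rng.random()) for _ in range(20000)]
counts = [pieces_containing(p) for p in samples if sum(p) <= 1.5]
print(len(counts), set(counts))
```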

A symmetric function $F$ can be specified by its values on $R'_3$ (which are arbitrary), together with its values $F_{A_x}, F_{B_x}, F_{C_{xyz}}$ on $A_x, B_x, C_{xyz}$ respectively, with $F_{A_x}, F_{B_x}$ being symmetric with respect to interchange of $y$ and $z$. The vanishing marginal constraints, by symmetry, just assert that $\int F(x,y,z)\ dz = 0$ for $x+y \geq 1$ and $y \leq x$ , which become

$\int_0^{1-x} F_{C_{xyz}}(y,x,z)\ dz + \int_{1-x}^{3/2 - x - y} F_{A_x}(x,y,z)\ dz = 0$

when $y \leq 1/2$ , and

$\int_0^{3/2-x-y} F_{B_x}(z,y,x)\ dz = 0$

when $y > 1/2$ . (But this should probably be checked.)

With this new polytope, I have an upper bound $\tilde M''_3 \leq \frac{3}{2} M_3 \approx 2.46$ , just because $R \subset \frac{3}{2} \cdot {\cal R}_3$ . This is larger than the bound $M_2 + 1 \approx 2.386$ for the prism $\{ x+y,z \leq 1\}$ , which is perhaps a hopeful sign, although the upper bound here is cruder because the role of the corners of $\frac{3}{2}\cdot {\cal R}_3$ poking out of $[0,1]^3$ is ignored.

I don’t see yet though how to dissect either of the two four-dimensional candidate polytopes, namely the prism $\{ x+y+z, w \leq 1 \}$ and the truncated simplex $\{ x,y,z,w \leq 1; x+y+z+w \leq \frac{4}{3} \}$ . (Certainly my son’s construction set is no longer of use here!)

18 December, 2013 at 8:50 am

Aubrey de Grey

Is it possible to state in purely geometrical terms what it means for a polyhedron (or polytope, but let’s start with k=3) to have polynomial moment conditions? I would like to play with possibilities for doing the chopping-up of candidate R’s systematically for k > 3, and I’m thinking that there may be alternative dissections that are superficially inferior (involve more pieces, in particular) but that turn out to be easier to generalise to larger k, but this is hard to get one’s head around without a geometrical sense of what makes a dissection admissible.

18 December, 2013 at 9:38 am

Terence Tao

The precise problem is this: given a polytope $R \subset [0,1]^k$ , subdivide it into subpolytopes $P_1,\ldots,P_m$ , such that for each polytope $P_i$ and each $1 \leq j \leq k$ , at least one of the following is true:

(1) The projection of $P_i$ to ${\bf R}^{k-1}$ under the operation $\pi_j: (t_1,\ldots,t_k) \to (t_1,\ldots,t_{j-1},t_{j+1},\ldots,t_k)$ of deleting the $t_j$ coordinate maps $P_i$ to a subset of ${\cal R}_{k-1}$ (or equivalently, $P_i$ lies in the prism $\pi_j^{-1}({\cal R}_{k-1})$ , which is some permutation of ${\cal R}_{k-1} \times [0,1]$ ); or

(2) The polytope $P_i$ can be viewed as a region of the form

$\{ t \in \pi_j^{-1}(\pi_j(P_i)): f(\pi_j(t)) \leq t_j \leq g(\pi_j(t)) \}$

for some affine-linear functionals $f, g$ . In other words, all but two of the faces of $P_i$ are parallel to the $e_j$ coordinate direction. (This makes the vanishing marginal condition in the $e_j$ direction a linear one in the coefficients of $F$, if $F$ is piecewise polynomial on each of the $P_i$.)

In order to ensure that we recover at least $M'_k$ in our optimisation problem, we also need to require that one of the polytopes in the partition is equal to ${\cal R}'_k$ . (In particular, the trivial partition in which $m=1$ is useless, as it only gives the trivial lower bound of 0 for $M''_k$.)

At present it is not obvious to me that an arbitrary polytope R actually admits such a finite subdivision; for any fixed j one can apply various slicing operations parallel to $e_j$ to make the subdivision good for that particular value of j, but this may introduce new problematic faces that ruin other values of j. The subdivisions I found so far were obtained by ad hoc means, and I don’t immediately see how to generalise them.

18 December, 2013 at 9:49 am

Terence Tao

Actually, it just occurred to me that we don’t, strictly speaking, need the polytopes to obey condition (2); if there are more faces that are transverse to $e_j$ , then we get more vanishing conditions, which reduces the number of degrees of freedom for a given degree of piecewise polynomial, but still in principle allows us to approach the optimal $M''_k$ by using a sufficiently high degree. So one may be able to get by with a coarser partition that has more transverse faces, if the degrees of freedom tradeoff is favorable.

James: perhaps it may be a good idea to organise your code in such a way that one can easily swap in different partitions, polytopes, degree parameters, etc., to give some comparative numerical study of performance of different partitions in the k=3 case, which may give us some useful guidance before plunging into the all-important k=4 case.

17 December, 2013 at 3:15 pm

Aubrey de Grey

Many thanks James – and, in advance, for the details. In particular, can you elaborate on the basis of your remark concerning $M''_4$ ? – I was presuming that there is no simple algorithm for dissecting $R''_k$ with $k > 3$ according to Eytan’s idea, which is surely a prerequisite for making any such estimate, but maybe there is one?

17 December, 2013 at 4:53 pm

James Maynard

Aubrey: I agree there is no simple algorithm at present. The same sort of decompositions should work for $M_4''$ , although the decomposition will probably be in many more pieces. I think it should be tedious but probably relatively straightforward to do it by hand.

My comment regarding this improvement potentially not getting all the way to $M_4''>2$ was because we need an improvement of about 0.063 on the current bound for $M_4'$ . When $k=3$ we appear to get an improvement of 0.072 in going from $M_3'$ to $M_3''$ . My guess was that the improvement in moving from $M_k'$ to $M_k''$ would decrease quite quickly in $k$ (as it does moving from $M_k$ to $M_k'$ ), and so the current improvement of 0.072 would probably decrease to below the required 0.063. This is just my guess though (I have no proof of this) – it would be nice if I was wrong on this front.

17 December, 2013 at 5:09 pm

Terence Tao

Well, from a volume perspective at least, ${\cal R}''_k$ has much bigger volume ( $\frac{1}{(k-1)!}$ , compared with $\frac{1}{(k-1) (k-1)!}$ for ${\cal R}'_k$ and $\frac{1}{k (k-1)!}$ for ${\cal R}_k$ ), which gives one hope. Of course, much of this extra volume is somehow “cancelled out” by the additional vanishing marginal conditions, so things are inconclusive.

This is a degenerate data point, but the improvement from $M'_2$ to $M''_2$ is precisely zero, so the improvement certainly got better going from k=2 to k=3. Perhaps going from k=3 to k=4 we still have a little bit of upward monotonicity – well, we can hope, anyway :)
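For reference, the three volume formulas quoted above can be tabulated exactly for small $k$. This is a minimal Python sketch that simply evaluates the stated formulas (it does not recompute the volumes from the geometry); the function name `volumes` is illustrative.

```python
from fractions import Fraction
from math import factorial

def volumes(k):
    # Volumes as stated in the comment above, as exact fractions.
    f = factorial(k - 1)
    return {
        "R''_k": Fraction(1, f),            # 1/(k-1)!
        "R'_k": Fraction(1, (k - 1) * f),   # 1/((k-1)(k-1)!)
        "R_k": Fraction(1, k * f),          # 1/(k(k-1)!) = 1/k!
    }

for k in range(2, 7):
    v = volumes(k)
    print(k, v["R''_k"], v["R'_k"], v["R_k"])
```

For $k=2$ the first two volumes coincide, in line with the zero improvement in that case, while for larger $k$ the ratio of the first to the second is $k-1$.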

18 December, 2013 at 12:19 pm

James Maynard

I’ll try to give some more details of my calculations (and hopefully if I’ve made a mistake someone will spot it). I’m not sure that I’m explaining this in the clearest language, but I hope it makes sense.

As described in Terry’s post, we are looking at polynomials $F_0,F_1,F_1',F_2,F_2',F_3,F_3'$ supported on $D_0,D_1,D_1',D_2,D_2',D_3,D_3'$ respectively.

Let’s concentrate on the calculation of $J_x=\int\int\Bigl(\int F dx\Bigr)^2dydz$ , the calculations of $J_y,J_z,I$ being similar.

I considered the projections of $D_0,\dots,D_3'$ onto the $y,z$ plane. For example, the projection of $D_1$ is given by $\{(y,z):y\leq 1/2\leq z\leq 1-y\}$ . Based on these projections, I split the $y,z$ plane into 6 disjoint polytopes given by
$E_0=\{0 \leq z \leq 1/2, z\leq y\leq 1-z \}$
$E_1=\{1/2\leq z \leq 1 , 0\leq y\leq 1-z \}$
$E_2=\{0 \leq z \leq 1/2, 0\leq y\leq z \}$
$E_3=\{1/2\leq z \leq 1, 1-z\leq y\leq 1/2 \}$
$E_4=\{1/2\leq z \leq 1, 1/2\leq y\leq z\}$
$E_5=\{1/2 \leq y \leq 1, 1-y\leq z\leq y\}$
These were formed by just splitting the projections so that each region $E_j$ had a fixed set of the $D_i$ regions which projected to it (and those regions projected onto the whole of $E_j$ ). Since the $D_i$ regions were given by a set of linear inequalities, so too were the $E_j$ regions (and in particular they are convex).
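A quick check that the six regions tile the unit square in the $(y,z)$ plane can be done on a grid of points chosen to avoid the boundary lines. The Python sketch below takes $E_5=\{1/2\leq y\leq 1,\ 1-y\leq z\leq y\}$, which is the reading under which the six regions cover the square without overlap (an assumption of this sketch); the function name is illustrative.

```python
def regions_containing(y, z):
    """Return the indices j for which (y, z) lies in E_j."""
    tests = [
        0 <= z <= 0.5 and z <= y <= 1 - z,      # E_0
        0.5 <= z <= 1 and 0 <= y <= 1 - z,      # E_1
        0 <= z <= 0.5 and 0 <= y <= z,          # E_2
        0.5 <= z <= 1 and 1 - z <= y <= 0.5,    # E_3
        0.5 <= z <= 1 and 0.5 <= y <= z,        # E_4
        0.5 <= y <= 1 and 1 - y <= z <= y,      # E_5 (assumed reading)
    ]
    return [j for j, hit in enumerate(tests) if hit]

N = 50
# Offsets 0.37 and 0.61 keep the grid off the lines y=z, y+z=1, y=1/2, z=1/2.
points = [((i + 0.37) / N, (j + 0.61) / N) for i in range(N) for j in range(N)]
hits = [regions_containing(y, z) for (y, z) in points]

print(all(len(h) == 1 for h in hits))
# E_3, E_4, E_5 should be exactly the part of the square with y + z > 1.
print(all((h[0] >= 3) == (y + z > 1) for h, (y, z) in zip(hits, points)))
```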

The point of these regions is that for $y,z$ in any region the integration over $x\in D_i$ can then be written as integration over an interval. We see that the integration over $x$ for $y,z\in E_3,E_4,E_5$ must be zero from our marginal conditions (since $y+z\ge1$ ).

Over each region, we can write down the integration $\int F dx$ :
$E_0: \int_0^{1-y}F_0dx$
$E_1: \int_0^{1-z}F_0dx+\int_{1-z}^zF_1dx+\int_z^{1-y}F_2dx$
$E_2: \int_0^{1-z}F_0dx+\int_{1-z}^{1-y}F_2dx$
The regions of integration for $x$ over $E_3,E_4,E_5$ correspond exactly to the vanishing conditions Terry wrote above.

Thus we can perform the above integrations for $E_0,E_1,E_2$ with respect to $x$ , square, and then integrate over the regions $E_0,E_1,E_2$ . This gives our expression for $J_x$ as a quadratic form in the coefficients of $F_0,F_1,F_1',F_2,F_2',F_3,F_3'$ .

To enforce the vanishing marginal conditions, we simply calculate the polynomials given by Terry above (i.e. the 2-variable polynomials obtained by integrating with respect to $y$ ). Each coefficient of a monomial $x^i z^j$ gives a (homogeneous) linear constraint on our coefficients. By simply using each linear constraint to eliminate one variable at a time from the quadratic form $J_x$ , we arrive at a reduced quadratic form which satisfies all the vanishing marginal conditions. Doing the same for $I,J_y,J_z$ then gives us the ratio of two quadratic forms, which is our quadratic programming problem from before.

It should be relatively straightforward to modify my code so that given a decomposition into polytopes given by inequalities $L_1\le t_1\le L_1', L_2(t_1)\leq t_2\leq L_2'(t_1),\dots , L_n(t_1,\dots,t_{n-1})\leq t_n\leq L_n'(t_1,\dots,t_{n-1})$ for linear functions $L_j,L'_j:\mathbb{R}^{j-1}\rightarrow \mathbb{R}$ it can calculate the quadratic form corresponding to the integration over the polytope.

18 December, 2013 at 12:43 pm

James Maynard

Also, unless I’m missing something, I think in principle there should be a fairly simple algorithm for decomposing any region given by linear inequalities to polytopes of the type we desire.

Assume we are given some region $\mathcal{R}''$ which is given by a set of linear inequalities in the variables $t_1,\dots,t_k$ . We can split this into $2^k$ subregions $D_1,\dots$ depending on which of the forms $g_j=\sum_{i\ne j}t_i$ are greater than 1, and which are less than 1. Each such subregion will also just be given by a set of linear inequalities. We let $F$ be a fixed polynomial $F_i$ on each such region $D_i$ . Since we know exactly which of the functions $g_j$ are greater than 1 on $D_i$ , each region comes with a set of $m$ vanishing marginal conditions, where $m$ is the number of the $g_j$ which are greater than 1.

Each region $D_i$ can be further decomposed into regions $E_{i,j}$ which are of the form $L_1\leq t_1\leq L_1',\dots, L_k(t_1,\dots,t_{k-1})\leq t_k\leq L_k'(t_1,\dots,t_{k-1})$ for linear functions $L_j,L_j':\mathbb{R}^{j-1}\rightarrow\mathbb{R}$ . (Proof: induction on $k$ . Since $D_i$ is given by a set of linear inequalities, the lower and upper bounds for $t_k$ are given by the maximum and minimum, respectively, of a set of linear functions in $t_1,\dots t_{k-1}$ . For each ordering of these functions, this gives an upper bound and a lower bound for $t_k$ which is linear. Then we use the induction hypothesis, since the remaining variables are constrained only by linear inequalities).

We then decompose the projections in the $t_k$ coordinate of the $E_{i,j}$ regions as above (so we have regions where each region is projected to from a fixed set of the $E_{i,j}$ ). This means on each of these subregions (which are also just given by linear inequalities) the integral of $F$ with respect to $t_k$ is given by a finite sum of integrals of $F_i$ over intervals. Thus we can calculate all these integrals explicitly and calculate the quadratic form $J_{t_k}$ . Similarly for the other $J$ regions.

We still need to apply the linear constraints, but by following exactly the same method above, we can calculate the various degree $k-1$ polynomials corresponding to the vanishing marginals, which gives a set of homogeneous linear constraints on the coefficients of the $F_i$ . Eliminating variables from the quadratic forms then gives reduced quadratic forms satisfying the vanishing conditions.

Since everything is just given by linear constraints at each stage, everything remains convex so we avoid issues to do with a lack of convexity.

Obviously the above would give some huge number of decompositions if done directly. We can save a lot for small $k$ by using symmetry and sometimes switching the orderings of the variables to mean we only have to consider fewer regions. Thus it is probably the case that it is still better to do things by hand.

18 December, 2013 at 1:18 pm

Aubrey de Grey

Splendid! Hm, but if an economical decomposition is hard to find even for k=4, maybe it is worth seeking an algorithm to convert your decomposition into something with a more manageable number of components. The first approach that springs to mind is to do the decomposition as you describe and then iteratively examine each pair of component polytopes that share a (k-1)-dimensional polytope, to see whether their union satisfies the necessary conditions, and merge them if it does. Is there any reason why that would be expected to run into the sand unhelpfully quickly?

18 December, 2013 at 1:35 pm

James Maynard

In case my first post isn’t clear, I thought I’d try to write out explicitly what I get for some of the integrals. (Also it might make any mistakes I made clearer – there are enough terms that I can’t rule out I’ve made a small mistake somewhere).

Not assuming symmetry, I find that
$J_x=J_{x,1}+J_{x,2}+J_{x,3},$
where
$J_{x,1}=\int_0^{1/2}\int_{z}^{1-z}\Bigl(\int_0^{1-y}F_0dx\Bigr)^2dydz$
$J_{x,2}=\int_{1/2}^1\int_0^{1-z}\Bigl(\int_0^{1-z}F_0dx+\int_{1-z}^zF_1dx+\int_z^{1-y}F_2dx\Bigr)^2dydz$
$J_{x,3}=\int_0^{1/2}\int_{0}^z\Bigl(\int_0^{1-z}F_0dx+\int_{1-z}^{1-y}F_2dx\Bigr)^2dydz$
And this is subject to the three vanishing marginal conditions
$\int_y^{1-y} F'_3dx+\int_0^{1-z}F_1'dx+\int_{1-z}^yF_3dx=0$
$\int_0^{1-z}F_1'dx+\int_{1-z}^{1-y}F_3dx=0$
$\int_0^{1-y} F'_2dx=0$
The expression for $J_y$ is essentially the same, swapping all instances of $y$ with $x$ in the integration variables and limits, and swapping the functions $F_i$ with $F_i'$ everywhere.

For $J_z$ I obtained
$J_z=J_{z,1}+J_{z,2}+J_{z,3}+J_{z,4}$
where
$J_{z,1}=\int_0^{1/2}\int_{1/2}^{1-y}\Bigl(\int_0^{1-x}F_0+\int_x^{1-y}F_1+\int_{1-x}^xF_2+\int_{1-y}^1F_3' dz\Bigr)^2dxdy$
$J_{z,2}=\int_{1/2}^1\int_0^{1-y}\Bigl(\int_0^{1-y}F_0+\int_y^{1-x}F_1'+\int_{1-y}^yF_2'+\int_{1-x}^1F_3dz\Bigr)^2dxdy$
$J_{z,3}=\int_0^{1/2}\int_y^{1/2}\Bigl(\int_0^{1-x}F_0+\int_{1-x}^{1-y}F_1+\int_{1-y}^1F_3'dz\Bigr)^2dxdy$
$J_{z,4}=\int_0^{1/2}\int_0^{y}\Bigl(\int_0^{1-y}F_0+\int_{1-y}^{1-x}F_1'+\int_{1-x}^1F_3dz\Bigr)^2dxdy$
There are no vanishing conditions for $J_z$ since we are restricting to $x+y\leq 1$ anyway.

18 December, 2013 at 4:32 pm

Pace Nielsen

James, did you get a chance to try the computation for $k=3$ on the new domain I described above (which Terry was kind enough to work out the linear constraints for)?

18 December, 2013 at 4:43 pm

James Maynard

I think my comments above can give a relatively straightforward decomposition of the region
$\mathcal{R}''_4=\{(x,y,z,w)\in[0,1]^4:x+y+z\leq 1\}.$
We split $\mathcal{R}''_4$ into 8 subregions which I’ll label $D_0,D_x,D_y,D_z,D_{xy},D_{xz},D_{yz},D_{xyz}$ . To ease notation let $L_x=y+z+w$ and similarly define $L_y,L_z,L_w$ which are the forms which are the sum of all variables except for $y,z$ or $w$ respectively. Then
$D_0=\{L_x,L_y,L_z,L_w\leq 1\}$
$D_x=\{L_y,L_z,L_w\leq 1\leq L_x\}$
$D_{xy}=\{L_z,L_w\leq 1\leq L_x,L_y\}$
$D_{xyz}=\{L_w\leq 1\leq L_x,L_y,L_z\}$
and similarly for $D_y,D_z,D_{xz},D_{yz}$ (the subscripts are indicating which of the $L_i$ are greater than 1).

We choose our function $F$ to be a polynomial $F_i$ on each region $D_i$ . (For the time being I won’t assume any symmetry, although this can be used to simplify things later).

To calculate $J_w=\int\int\int\Bigl(\int F dw\Bigr)^2dxdydz$ :
We split the $x,y,z$ space into 6 subregions given by $E_{x,y,z}$ and all permutations of the subscripts. (For the $E$ regions the subscripts are intended to be ordered, whilst for the $D$ regions the subscripts are unordered). The region $E_{x,y,z}$ is given by
$E_{x,y,z}=\{x\leq y\leq z, x+y+z\leq 1\},$
and similarly for all other $E_i$ regions.

If we restrict to $(x,y,z)\in E_j$ , then the inner integration $\int F dw$ is a sum of integrals of the functions $F_i$ over intervals. For example, for $(x,y,z)\in E_{x,y,z}$ we find $\int F dw$ is given by
$\int_0^{1-y-z}F_0+\int_{1-y-z}^{1-x-z}F_x+\int_{1-x-z}^{1-x-y}F_{xy}+\int_{1-x-y}^1F_{xyz}\,dw,$
where the breakpoints are ordered as $1-y-z\leq 1-x-z\leq 1-x-y$ because $x\leq y\leq z$ .
It is easy to write explicit limits for the integration with respect to $x,y,z$ for any region $E_{i}$ . Thus we can calculate the quadratic form $J_w$ . There are no vanishing marginal restrictions here.
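The bookkeeping of which $F_i$ applies on which $w$-interval can be generated mechanically: $L_x=y+z+w$ exceeds 1 once $w>1-y-z$, and similarly for $L_y, L_z$, so sorting the three thresholds gives the intervals and their labels. The Python sketch below does this for a single point $(x,y,z)$ with $x+y+z\leq 1$; the function name `w_breakpoints` and the string labels are illustrative.

```python
def w_breakpoints(x, y, z):
    """List (lower, upper, label) for the w-intervals in [0, 1]
    on which a fixed D-region applies, given x + y + z <= 1."""
    # L_x = y+z+w > 1 iff w > 1-y-z, and likewise for L_y and L_z.
    thresholds = sorted([(1 - y - z, "x"), (1 - x - z, "y"), (1 - x - y, "z")])
    pieces, lo, active = [], 0.0, ""
    for t, name in thresholds:
        pieces.append((lo, t, "D_" + (active or "0")))
        lo, active = t, "".join(sorted(active + name))
    pieces.append((lo, 1.0, "D_" + (active or "0")))
    return pieces

# A point with x <= y <= z, i.e. in E_{x,y,z}.
for lo, hi, label in w_breakpoints(0.1, 0.2, 0.3):
    print(f"{lo:.2f} <= w <= {hi:.2f}: {label}")
```

For a point with $x\leq y\leq z$ the labels come out in the order $D_0, D_x, D_{xy}, D_{xyz}$; other orderings of the coordinates give the corresponding permuted labels.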

To calculate $J_x$ we follow a similar approach. We again split the $y,z,w$ space into regions $E_{y,z,w}$ etc giving the orderings of the sizes of $z,y,w$ . However, we further split $E_{y,z,w}$ into $E_{y,z,w}^+$ and $E_{y,z,w}^-$ depending on whether $y+z+w\geq 1$ or $y+z+w\leq 1$ . For example, we obtain for $(y,z,w)\in E_{y,w,z}^-$
$\int F dx=\int_0^{1-w-z}F_0dx+\int_{1-w-z}^{1-z-y}F_y dx$
whilst for $(y,z,w)\in E_{y,w,z}^+$ we obtain the vanishing condition
$\int F dx=\int_0^{1-w-z}F_x dx+\int_{1-w-z}^{1-z-y}F_{xy} dx=0.$

By symmetry, in this case we only need to consider functions $F$ for which $F(x,y,z,w)=F(\sigma(x),\sigma(y),\sigma(z),w)$ for any permutation $\sigma$ of $x,y,z$ . This should reduce the search space significantly. For example, $F_0$ and $F_{xyz}$ will be symmetric in $x,y,z$ , whilst $F_x,F_{yz}$ will be symmetric in $y,z$ but equal to one of the other functions under transpositions of $x,z$ or $y,x$ .

I haven’t had a chance to actually try to compute anything with this yet, though. I also haven’t had a chance to compute with Pace’s alternative region. (Work to do, although I’ll be busy with other things for most of tomorrow.)

16 December, 2013 at 2:32 pm

Terence Tao

I can get rid of the final $\log\log\log k$ error in the asymptotic lower bound for $M_k$ , thus

$M_k \geq \log k - O(1)$ .

I’m recording a sketch of the argument below; I may give a more detailed proof for the next blog post.

The basic idea is to do a defect analysis of the upper bound $M_k \leq \frac{k}{k-1} \log k$ . We first observe that the Cauchy-Schwarz upper bound

$(\int F)^2 \leq (\int F^2 / h) (\int h),$ (1)

valid for reasonable non-negative functions $F, h$ , can be matched with the lower bound

$(\int F)^2 \geq (\int F^2 / h) (\int h) - (\int (F-ah)^2/h) (\int h)$ (2)

for any scalar a (independent of the variable of integration). Indeed, the right-hand side can be calculated to be $(\int F)^2 - (\int (F-ah))^2$ .
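The identity behind (2) is easy to test numerically. The Python sketch below picks an arbitrary positive test function $F$, the weight $h(t)=1/(\theta+(k-1)t)$ and an arbitrary scalar $a$ (all three choices are assumptions made purely for illustration), and compares the right-hand side of (2) with $(\int F)^2 - (\int (F-ah))^2$ using Simpson quadrature.

```python
import math

def simpson(f, a, b, n=2000):
    # Composite Simpson rule with an even number n of subintervals.
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

theta, k, a_scalar = 0.3, 5, 0.7            # illustrative parameters

def h(t):
    return 1.0 / (theta + (k - 1) * t)

def F(t):
    return math.exp(-t)                     # arbitrary positive test function

int_F = simpson(F, 0.0, theta)
int_h = simpson(h, 0.0, theta)
int_F2_over_h = simpson(lambda t: F(t) ** 2 / h(t), 0.0, theta)
defect = simpson(lambda t: (F(t) - a_scalar * h(t)) ** 2 / h(t), 0.0, theta)

rhs_of_2 = int_F2_over_h * int_h - defect * int_h
closed_form = int_F ** 2 - simpson(lambda t: F(t) - a_scalar * h(t), 0.0, theta) ** 2

print(int_F ** 2, int_F2_over_h * int_h)    # Cauchy-Schwarz: first <= second
print(rhs_of_2, closed_form)                # these two should agree
```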

Now, recall that the upper bound $M_k \leq \frac{k}{k-1} \log k$ comes from the inequality

$(\int_0^\theta F\ dt_k)^2 \leq \frac{\log k}{k-1} \int_0^\theta F^2 (\theta + (k-1)t_k)\ dt_k$

for any $(t_1,\ldots,t_{k-1}) \in {\cal R}_{k-1}$ , where $\theta := 1-t_1-\ldots-t_{k-1}$ and $F$ is short for $F(t_1,\ldots,t_k)$ ; this comes from applying (1) with $h(t_k) := \frac{1}{\theta + (k-1)t_k}$ . Integrating this in the variables $t_1,\ldots,t_{k-1}$ and then symmetrising gives the upper bound.
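To spell out where the constant comes from: with this choice of $h$ ,

$\int_0^\theta h(t_k)\ dt_k = \frac{1}{k-1} \log \frac{\theta + (k-1)\theta}{\theta} = \frac{\log k}{k-1},$

while $\int_0^\theta F^2/h\ dt_k = \int_0^\theta F^2 (\theta + (k-1)t_k)\ dt_k$ , so (1) gives exactly the displayed inequality.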

Now we make the usual choice of testing

$F(t_1,\ldots,t_k) := 1_{t_1+\ldots+t_k \leq 1} g(t_1) \ldots g(t_k)$

where $g = G 1_{[0,T]}$ and $G(t) := \frac{1}{c+(k-1)t}$ , where $c,T>0$ are parameters at our disposal; a good choice here is $c = 1/\log k$ and $T = \varepsilon/\log k$ for a small absolute constant $\varepsilon$ . Applying (2) with $a := g(t_1) \ldots g(t_{k-1})$ , we obtain that

$(\int_0^\theta F\ dt_k)^2 \geq \frac{\log k}{k-1} \int_0^\theta F^2 (\theta + (k-1)t_k)\ dt_k - g(t_1)^2 \ldots g(t_{k-1})^2 \frac{\log k}{k-1} \int_0^\theta (g - h)^2/h$ .

If we symmetrise, and let $X_1,\ldots,X_k$ be iid random variables with law $\frac{1}{m_2} g(t)^2\ dt$ , where $m_2 := \int g^2$ , then

$\int_{{\cal R}_k} F^2 = m_2^k P$

where $P := {\bf P}( X_1 + \ldots + X_k \leq 1 )$ , and hence

$M_k \geq \frac{k}{k-1} \log k - \frac{1}{P m_2} \frac{k \log k}{k-1} {\bf E} \int_0^\theta (g - h)^2/h$

where $\theta$ is now interpreted as $\theta := 1 - X_1 - \ldots - X_{k-1}$ , and the integral is interpreted as vanishing if $\theta$ is negative. One can compute that $m_2 \sim \frac{\log k}{k}$ , and from the second moment $P \sim 1$ if $\varepsilon$ is small enough, so

$M_k \geq \frac{k}{k-1} \log k - O( k {\bf E} \int_0^\theta (g - h)^2/h )$

The integral $\int_0^\theta (g - h)^2/h$ may be bounded by the sum of

$\int_T^\theta h$

(which by convention vanishes if $\theta \leq T$ ) and

$\int_0^\theta (G-h)^2/h$ .

The first term may be bounded by

$\int_T^\theta \frac{1}{(k-1)t}\ dt = \frac{1}{k-1} \log \frac{\theta}{T}.$

For the second term, we use the identity $G-h = (\theta-c) Gh$ to write this integral as

$(\theta-c)^2 \int_0^\theta G^2 h$

when $\theta \geq c$ , and

$(c-\theta) \int_0^\theta G (h-G)$

when $\theta \leq c$ .

When $\theta \geq c$ , we crudely bound $(\theta-c)^2 \leq \theta^2$ , $h \leq 1/\theta$ to bound this integral by $\theta \int G^2 = O( \frac{1}{k} \frac{\theta}{c} )$ . When $\theta \leq c$ , we bound $h-G \leq h$ and $(c-\theta) Gh = h-G$ to bound this integral by $\int h-G = O( \frac{1}{k} \log \frac{c}{\theta} )$ . Putting all this together, we see that

$M_k \geq \frac{k}{k-1} \log k - O( {\bf E} [ \log_+ \frac{\theta}{T} + \max( \frac{\theta}{c}, \log \frac{c}{\theta} ) ] )$

where we are implicitly reducing to the event that $\theta$ is positive.

The second moment method shows that $\theta$ has mean $O( 1/\log k )$ and variance $O( 1/\log^2 k )$ ; this makes all the error terms $O(1)$ except possibly for the $\log \frac{c}{\theta}$ term, but this can also be made to be O(1) by an averaging argument (dilating the g’s by $1 + O(1/\log x)$ and using the pigeonhole principle).

In principle this analysis might give better numerics for the m=2 bound, but it’s a bit messy.

18 December, 2013 at 12:45 am

Andrew Granville

Terry: please can you give more details when you write “the $\log \frac{c}{\theta}$ term … can also be made to be O(1) by an averaging argument (dilating the g’s by $1 + O(1/\log x)$ and using the pigeonhole principle)”?
In my own failed efforts at proving something like this, I got stuck precisely when the range for $t_k$ is tiny (i.e. in reducing this term). Thanks!

18 December, 2013 at 7:42 am

Terence Tao

I wrote up some more details of the argument (with explicit error terms, which unfortunately makes things messy) at http://michaelnielsen.org/polymath1/index.php?title=Selberg_sieve_variational_problem#Lower_bounds

The averaging argument ends up being easier to run as follows: if we let $F_r$ be the truncation $g(t_1) \ldots g(t_k) 1_{t_1+\ldots+t_k \leq r}$ of a tensor product to the dilated simplex $r \cdot {\cal R}_k$ , then we have

$J_k^{(1)}(F_r) \leq \frac{r}{k} M_k I_k(F_r).$

We manipulate this inequality as discussed previously, converting it into an upper bound for the defect $\frac{k}{k-1} \log k - M_k$ in terms of various integrals involving g and r. When one arrives at the problematic logarithmic singularity, one can then average in r (noting that the defect is independent of r).

19 December, 2013 at 12:41 pm

Eytan Paldi

Can the $O(1)$ estimate be made effectively computable ? (or even be optimized for each given $k$ ?)

19 December, 2013 at 2:06 pm

Terence Tao

It is certainly effective; I don’t know yet whether it is competitive with our existing bounds for m=2, but it should certainly give reasonable bounds for higher m for the first time. I’ll try to work out some more details later today.

19 December, 2013 at 6:07 pm

Terence Tao

After playing around with Maple, I was only able to get the O(1) loss in $M_k \geq \frac{k}{k-1} \log k - O(1)$ down to about 4 or 5 for medium-sized values of k, which unfortunately does not beat the current estimates for m=1,2, and gives a pretty lousy estimate for m=3; I can get $k=10^7$ but not much better than this. (I was hoping to recover Zhang’s original value of k=3,500,000, so that we could now stuff four primes into this tuple instead of two, but fell short of this even with our best MPZ estimate.)

I’m recording the Maple code that gave $k=10^7$ in the m=3 case below. There are three free parameters $c,T,\tau$ that one can optimise over, although I found in practice that a (somewhat artificial) constraint $1-k\mu-\tau \geq 0$ that I had to impose at some point was dominant in the sense that it set the optimal value of $\tau$ for a given c, T.

k := 10000000; m := 3;


# These three parameters may be chosen arbitrarily
c := 1.08/log(k);

T := 0.41/log(k);

# tau := 1.14/log(k);
M0 := k*log(k)/(k-1);

g := 1/(c+(k-1)*t);

m2 := int(g*g, t=0..T );

mu := int(t*g*g, t=0..T )/m2;

sigma2 := int(t*t*g*g, t=0..T)/m2 - mu*mu;
tau := 1 - k*mu;
denominator := (1+tau/2)*(1 - (k*sigma2)/((1+tau-k*mu)^2));
a := (r - k*mu) / T;

S := r * (log((r-k*mu)/T) +(k*sigma2)/(4*a*a*T*T*log(a)) ) + r*r/(4*k*T);

Sav := int(S, r=1..1+tau, numeric)/tau;
Z3 := int( g*g*k*t*log(1+t/T), t=0..T, numeric) / m2;

W := int( g*g*log(1+tau/(k*t)), t=0..T, numeric) / m2;

X:= (log(k)/tau) * ( (1-(k-1)*mu+max(tau-c,0))^2 +(k-1)*sigma2+ c^2 );
numerator := Sav + Z3 + W*X;
Delta := (k/(k-1)) * numerator/ denominator;
# these three quantities need to be positive

evalf( 1 - k*mu - T );

evalf( 1 - k*mu - tau );

evalf(denominator);
# this is the lower bound for M_k

evalf(M0-Delta);
varpi := m/(M0-Delta) - 1/4;

delta := T * ((1/4) + varpi);

# if this is less than 1, we win

evalf(1080*varpi/13+ 330*delta/13);
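As an independent cross-check of the first few quantities in the Maple listing above, the moments of $g^2$ on $[0,T]$ have closed forms via the substitution $s = c+(k-1)t$. The Python sketch below (not part of the original computation) evaluates `m2`, `mu`, `sigma2`, `tau` and `denominator` for the same `k`, `c`, `T`; it does not attempt the numerically integrated terms `Sav`, `Z3`, `W`.

```python
import math

k = 10_000_000
c = 1.08 / math.log(k)
T = 0.41 / math.log(k)

s_end = c + (k - 1) * T        # value of s = c + (k-1) t at t = T
log_ratio = math.log(s_end / c)

# Closed forms for int_0^T t^j g(t)^2 dt with g(t) = 1/(c + (k-1) t):
m2 = (1.0 / c - 1.0 / s_end) / (k - 1)
first = (log_ratio + c / s_end - 1.0) / (k - 1) ** 2
second = ((s_end - c) - 2.0 * c * log_ratio - c * c / s_end + c) / (k - 1) ** 3

mu = first / m2
sigma2 = second / m2 - mu * mu
tau = 1.0 - k * mu
denominator = (1 + tau / 2) * (1 - (k * sigma2) / ((1 + tau - k * mu) ** 2))

print("m2 =", m2)
print("k*mu =", k * mu)
print("1 - k*mu - T =", 1 - k * mu - T)      # needs to be positive
print("denominator =", denominator)          # needs to be positive
```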

19 December, 2013 at 10:47 pm

Terence Tao

I found a more efficient way to control one of the error terms (which I called Z_4 in the wiki page http://michaelnielsen.org/polymath1/index.php?title=Selberg_sieve_variational_problem#Lower_bounds ), which gets the O(1) error down to about 2 or 3, which gets the m=3 value of k down to 1,700,000. The code below also gives a value of k=38,000 for m=2 (by setting c = 1.07/log(k) and T = 0.48/log(k)); presumably a little bit more optimisation is possible.

k := 1700000; m := 3;


# These three parameters may be chosen arbitrarily
c := 1.06/log(k);

T := 0.46/log(k);

# tau := 1.14/log(k);
M0 := k*log(k)/(k-1);

g := 1/(c+(k-1)*t);

m2 := int(g*g, t=0..T );

mu := int(t*g*g, t=0..T )/m2;

sigma2 := int(t*t*g*g, t=0..T)/m2 - mu*mu;
tau := 1 - k*mu;
denominator := (1+tau/2)*(1 - (k*sigma2)/((1+tau-k*mu)^2));
a := (r - k*mu) / T;

S := r * (log((r-k*mu)/T) +(k*sigma2)/(4*a*a*T*T*log(a)) ) + r*r/(4*k*T);

Sav := int(S, r=1..1+tau, numeric)/tau;
Z3 := int( g*g*k*t*log(1+t/T), t=0..T, numeric) / m2;

W := int( g*g*log(1+tau/(k*t)), t=0..T, numeric) / m2;

X:= (log(k)/tau) * c^2;
V := (c/m2) * int( g^2 / (2*c+(k-1)*t) , t=0..T, numeric);

U := (log(k)/c) * int( (1+u*tau - (k-1)*mu - c)^2 + sigma2, u=0..1, numeric);
numerator := Sav + Z3 + W*X + V*U;
Delta := (k/(k-1)) * numerator/ denominator;
# this needs to be positive

evalf( 1 - k*mu - T );

evalf( 1 - k*mu - tau );
# this needs to  be positive too

evalf(denominator);
# this is the lower bound for M_k

evalf(M0-Delta);
varpi := m/(M0-Delta) - 1/4;

delta := T * ((1/4) + varpi);

# if this is less than 1, we win

evalf(1080*varpi/13+ 330*delta/13);

19 December, 2013 at 6:25 pm

Andrew Sutherland

Taking the first $k=10^7$ primes greater than $k$ gives $H(k) \le {182,087,070}$ for m=3.

19 December, 2013 at 6:42 pm

Andrew Sutherland

Taking 10,000,000 consecutive primes starting with 1,040,407 gives an admissible tuple of diameter 179,933,380 for m=3.

(and 1,040,407 is the smallest prime for which this works).
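The "k consecutive primes" construction in the last few comments can be tried at toy scale. The following is a minimal Python sketch, not the code used in the thread; the helper names (`primes_up_to`, `is_admissible`, `consecutive_prime_tuple`, `diameter`) and the search limit are hypothetical choices made for illustration.

```python
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(n + 1) if sieve[i]]

def is_admissible(h):
    """True if, for every prime p, the tuple h misses some residue class mod p.
    Only primes p <= len(h) need checking: k integers cannot cover p > k classes."""
    for p in primes_up_to(len(h)):
        if len({x % p for x in h}) == p:
            return False
    return True

def consecutive_prime_tuple(k, start, limit):
    """The k consecutive primes >= start, searched below an assumed-large-enough limit."""
    run = [p for p in primes_up_to(limit) if p >= start][:k]
    if len(run) < k:
        raise ValueError("limit too small to contain k primes >= start")
    return run

def diameter(h):
    return max(h) - min(h)

# Toy example with k = 10: the first 10 primes greater than 10.
h = consecutive_prime_tuple(10, 11, 100)
print(h, is_admissible(h), diameter(h))
```

Starting the run at a prime larger than k always gives an admissible tuple, since no element is then divisible by any prime up to k. Starting earlier may or may not work, which is why the smallest valid starting prime has to be searched for.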

20 December, 2013 at 2:32 am

Andrew Sutherland

Using the Hensley-Richards sieve for k=10,000,000 gives an admissible sequence of diameter 175,225,874 for m=3.

20 December, 2013 at 2:59 am

Andrew Sutherland

For k=1,700,000, taking the first k primes greater than k gives the bound $H\le {27,790,250}$ for m=3.

Taking 1,700,000 consecutive primes starting with 205,297 gives an admissible tuple with diameter 27,398,976 for m=3.

20 December, 2013 at 3:07 am

Andrew Sutherland

Using the Hensley-Richards sieve for k=1,700,000 gives an admissible tuple of diameter 26,682,014 for m=3.

20 December, 2013 at 3:16 am

Andrew Sutherland

For k=38,000 a greedy sieve yields an admissible tuple of diameter 431,682 for m=2.

20 December, 2013 at 3:38 am

Andrew Sutherland

For k=38,000 the bound on H can be improved to 430,448.

20 December, 2013 at 8:44 am

Andrew Sutherland

For m=2, k=38000 we can further improve the bound on H to 429,822.

20 December, 2013 at 8:49 am

Wouter Castryck

Terence’s Maple code with the (hopefully legal?) input m := 2, c := 0.975/log(k), T := 0.97/log(k) seems to work for k = 25819. [This seems to check out – T.]

20 December, 2013 at 11:31 am

Terence Tao

Unfortunately I made a typo when entering the formula for U; the sigma2 term should in fact be (k-1)*sigma2, which makes things worse and invalidates Wouter’s optimisations – sorry about that! I was able at least to recover the k=38000 value using Wouter’s values (incidentally I have realised that it is slightly better to view c as a perturbation of 1/log(k), so I now write c = 1/log(k) - 0.25/log(k)^2); similarly I can recover k=1700000 in the m=3 case by taking c = 1/log(k) - 0.07/log(k)^2 and T = 0.877/log(k)^2.

k := 38000; m := 2;


# These two parameters may be chosen arbitrarily (tau is now fixed below as 1 - k*mu)
c := 1/log(k) - 0.25/log(k)^2;

T := 0.97/log(k);

# tau := 1.14/log(k);
M0 := k*log(k)/(k-1);

g := 1/(c+(k-1)*t);

m2 := int(g*g, t=0..T );

mu := int(t*g*g, t=0..T )/m2;

sigma2 := int(t*t*g*g, t=0..T)/m2 - mu*mu;
tau := 1 - k*mu;
denominator := (1+tau/2)*(1 - (k*sigma2)/((1+tau-k*mu)^2));
a := (r - k*mu) / T;

S := r * (log((r-k*mu)/T) +(k*sigma2)/(4*a*a*T*T*log(a)) ) + r*r/(4*k*T);

Sav := int(S, r=1..1+tau, numeric)/tau;
Z3 := int( g*g*k*t*log(1+t/T), t=0..T, numeric) / m2;

W := int( g*g*log(1+tau/(k*t)), t=0..T, numeric) / m2;

X:= (log(k)/tau) * c^2;
V := (c/m2) * int( g^2 / (2*c+(k-1)*t) , t=0..T, numeric);

U := (log(k)/c) * int( (1+u*tau - (k-1)*mu - c)^2 + (k-1)*sigma2, u=0..1, numeric);
numerator := Sav + Z3 + W*X + V*U;
Delta := (k/(k-1)) * numerator/ denominator;
# this needs to be positive

evalf( 1 - k*mu - T );

evalf( 1 - k*mu - tau );
# this needs to  be positive too

evalf(denominator);
# this is the lower bound for M_k

evalf(M0-Delta);
varpi := m/(M0-Delta) - 1/4;

delta := T * ((1/4) + varpi);

# if this is less than 1, we win

evalf(1080*varpi/13+ 330*delta/13);
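For readers without Maple, the first few quantities of the worksheet above can be reproduced in plain Python, since the integrals defining m2, mu and sigma2 have elementary closed forms under the substitution u = c + (k-1)t. This is only a partial sketch: the function names `moments` and `win_value` are hypothetical, and the numerically integrated terms (Sav, Z3, W, V, U) are deliberately not ported, so it does not by itself reproduce the lower bound on M_k.

```python
import math

def moments(k, c, T):
    """Closed forms for m2, mu, sigma2 with g(t) = 1/(c + (k-1)*t) on [0, T]."""
    a = k - 1
    u0, u1 = c, c + a * T                  # endpoints after u = c + a*t
    L = math.log(u1 / u0)
    int_g2 = (1 / u0 - 1 / u1) / a                                   # int g^2 dt
    int_tg2 = (L + c / u1 - c / u0) / a**2                           # int t g^2 dt
    int_t2g2 = ((u1 - u0) - 2 * c * L - c * c * (1 / u1 - 1 / u0)) / a**3
    m2 = int_g2
    mu = int_tg2 / m2
    sigma2 = int_t2g2 / m2 - mu * mu
    return m2, mu, sigma2

def win_value(M_lower, m, T):
    """The final quantity of the worksheet; the worksheet's criterion is value < 1."""
    varpi = m / M_lower - 0.25
    delta = T * (0.25 + varpi)
    return 1080 * varpi / 13 + 330 * delta / 13

# Parameters taken from the worksheet above (k = 38000, m = 2).
k = 38000
c = 1 / math.log(k) - 0.25 / math.log(k) ** 2
T = 0.97 / math.log(k)
m2, mu, sigma2 = moments(k, c, T)
M0 = k * math.log(k) / (k - 1)
tau = 1 - k * mu
print(m2, mu, sigma2, M0, tau)
```

Once a lower bound for M_k is available from the full computation, `win_value(M_lower, m, T)` evaluates the last line of the worksheet.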

20 December, 2013 at 9:02 am

Andrew Sutherland

For k=25819 we can bound H by 283,242.

20 December, 2013 at 10:11 am

Andrew Sutherland

The bound for k=25819 can be improved to 282,792.

20 December, 2013 at 10:46 am

Wouter Castryck

For m = 3, the Maple code seems to prove k = 1214999 on input c = 0.995/log(k) and T = 0.877/log(k). This is probably not yet optimal.

20 December, 2013 at 10:56 am

Andrew Sutherland

For k=1700000, m=3, a greedy sieve yields an admissible tuple of diameter 25,602,510.

I can take a look at k=1214999 shortly.

20 December, 2013 at 11:03 am

Andrew Sutherland

For k=1,214,999, taking k consecutive primes starting with 157,831 gives the bound 19,152,478. Using Hensley-Richards gives 18,641,812.

20 December, 2013 at 8:17 pm

xfxie

For m=2, seems k = 36000 is valid at:

c := 0.9785232473863417/log(k);
T := 0.9205181897139338/log(k);

[Seems to check out – T.]

21 December, 2013 at 1:51 am

Andrew Sutherland

Some further improvements:

For m=2, k=38000 we can get 429,798.
For m=3,k=1,700,000 we can get 25,602,438.

21 December, 2013 at 4:32 am

Andrew Sutherland

For k=36,000 we can bound H by 405,528.

20 December, 2013 at 10:14 pm

xfxie

Here is a solution with k = 35146, based on the code here (the only differences are in the four lines for the inputs k0, m, c, T).

21 December, 2013 at 4:37 am

Andrew Sutherland

For k=35146 we can bound H by 395,542.

Have you tried optimizing for m=3 (or even m=4)?

21 December, 2013 at 9:03 am

Andrew Sutherland

For m=2, k=35,146 the bound on H can be improved to 395,264.

21 December, 2013 at 10:04 am

xfxie

For k=35146 [m=2], H can be brought down to 395234.

21 December, 2013 at 10:34 am

Andrew Sutherland

395,178 [m=2].

21 December, 2013 at 10:20 am

xfxie

I just tried m=3: k can be brought down to 1630680.

21 December, 2013 at 10:49 am

Andrew Sutherland

Thanks! Hensley-Richards then gives 25,527,718 as a bound on H.

21 December, 2013 at 6:34 pm

Andrew Sutherland

A greedy sieve brings the m=3 bound with k=1,630,680 down to 24,490,758.

21 December, 2013 at 10:31 am

xfxie

For m=4, k can be brought down to at least 36000000.

21 December, 2013 at 11:00 am

Andrew Sutherland

I was able to get k down to 35,127,242 for m=4, but you may be able to improve this further.

Taking the first k primes greater than k gets the bound on H down to 685,833,596 for m=4.

21 December, 2013 at 11:04 am

Andrew Sutherland

Hmm, it doesn’t seem to want to post the link. In any case, the parameters I used for m=4 to get k=35,127,242 are

c := evalf(0.9529804688/log(k));
T := evalf(0.9999999999/log(k));

21 December, 2013 at 4:23 pm

xfxie

For m=4, seems k can be decreased to 25589558.

21 December, 2013 at 6:25 pm

Andrew Sutherland

Nice improvement. Taking the first k primes greater than k=25589558 then gives the bound 491149914 on H for m=4.

Out of curiosity, have you tried m=5? It might be interesting to get a sense of how close to the asymptotic prediction we can get with explicit bounds.

21 December, 2013 at 4:51 am

Andrew Sutherland

So if I just change m to 3 in xfxie’s worksheet and then naively optimize k without changing anything else it appears that we can get 1,640,042 for m=3.

21 December, 2013 at 8:53 am

Andrew Sutherland

After fiddling with the parameters a bit, it appears that for m=3 we can get k down to 1,631,027.

Hensley-Richards then gives 25,533,684 as an upper bound on H.

21 December, 2013 at 5:04 am

Andrew Sutherland

Doing the same thing with m=4 gives k=41,862,295. Taking the first k primes greater than k gives the bound 825,018,354 on H.

18 December, 2013 at 3:46 am

Eytan Paldi

The above computation of $M_2$ (the operator norm of $\mathcal L_2$ ) relies on the (yet unproved) assumption that it is attained as the corresponding eigenvalue for some smooth eigenfunction of $\mathcal L_2$ .

18 December, 2013 at 8:21 am

Terence Tao

That’s a good point. Smoothness is unlikely to be an issue, because everything is at least L^2 in regularity, and all the operations involved are linear in the unknown F, so that the theory of distributions will probably be able to make all the ODE manipulations rigorous. Attainment of the extremum at an eigenfunction should be possible through calculus of variations arguments, because the strict gap $M_2 > M_1$ (which we can prove rigorously, since $M_1=1$ and the explicit eigenfunction we have tells us that $M_2 \geq 1.38593\ldots$ ) prevents escape to high frequencies. More precisely, let $F_n \in L^2({\cal R}_2)$ be an extremising sequence for the ratio $[J_2^{(1)}(F_n) + J_2^{(2)}(F_n)] / I_2(F_n)$ ; we can symmetrise using the parallelogram law (as mentioned in https://terrytao.wordpress.com/2013/11/19/polymath8b-bounded-intervals-with-many-primes-after-maynard/#comment-251870 ) to ensure wlog that the $F_n$ are symmetric; we can also normalise them in L^2 and make them non-negative.

By passing to a subsequence, we can make the $F_n$ converge weakly in L^2. We want to upgrade this to strong convergence in L^2, so that the limit $F$ will be a true eigenfunction at the extremal eigenvalue $M_2$ . Because the $F_n$ have uniformly compact support and are uniformly bounded in L^2, the only way in which strong convergence can fail is escape of mass to high frequencies (a sort of failure of L^2 equicontinuity, in the spirit of Arzela-Ascoli). However, in the high frequency limit, at least one of the two components in ${\cal L}$ goes to zero, and the contribution of the high frequency component is controlled by M_1, which is strictly less than M_2. This can be used to show that escape to high frequencies does not occur. (There are some technical details here due to the fact that the high and low frequency components of F_n may leak outside the simplex ${\cal R}_2$ , but I believe that these issues are manageable by sending the cutoff between high and low frequency slowly to infinity.)

18 December, 2013 at 10:51 am

Eytan Paldi

Concerning Aubrey’s question (on systematic dissection of $\mathcal R''_k$ ), a sufficient condition is that for each $m = 1, ..., k$ and each $t_1, ... , t_{m-1}, t_{m+1}, ... , t_k$ , the interval

$I_{m,j} := \{t_m : (t_1, ..., t_k) \in D_j\}$

for each polytope $D_j$ is either empty or has endpoints which are FIXED linear combinations of $1, t_1, ... , t_{m-1}, t_{m+1}, ... , t_k$ (i.e. each endpoint belongs to exactly one bounding plane of $D_j$ ).

18 December, 2013 at 1:27 pm

Eytan Paldi

A possible algorithm for systematic dissection of $\mathcal R''_k$ is to check, for each $m$ and each polytope $D_j$ , whether every $(t_1, ... , t_{m-1}, t_{m+1}, ... , t_k)$ in the projection $D_{j, m}$ of $D_j$ onto the plane orthogonal to the $t_m$ axis satisfies the above condition (i.e. each endpoint of the interval $I_{m, j}$ above belongs to a FIXED bounding plane of $D_j$ ). Otherwise, $D_j$ should be dissected (by appropriate planes parallel to the $t_m$ axis) so as to satisfy the above condition. I think that by repeating this dissection refinement process we get a desired dissection after finitely many steps.

18 December, 2013 at 8:22 pm

Pace Nielsen

I thought I’d give a quick update on the computational work I’ve been doing, to improve the $k$ bound we have.

I’ve tried a number of different things. One was to limit the lengths of signatures being used. It turns out that this still quickly leads to memory limitations, and also does not give good numerics in the ranges where RAM issues are gone.

Second, Frederik Ziebell sent me a very nice program to compute permutations, which is a bit slower than Mathematica, but not nearly as RAM intensive. Unfortunately, that program still eventually runs into memory issues as well!

Third, I tried pushing the degree 10 computations further, but it is clear that $k=64$ is the best result using the methods I’ve already tried.

Fourth, I found a computer nearby which has 64GB of RAM, and ran the program again. Surprisingly, I kept running into problems. It turns out that there is still trouble when we use signatures $\alpha=\beta=(2,2,1,1,1,1,1,1,1)$ . In the next day or two, I will throw out this case, and maybe two other problem cases, and get a partial degree 11 result (which should push $k$ a bit lower).

It might be the case that I’m just not approaching the problem very well. Here is the computational hang-up:

Let $\alpha,\beta$ be signatures (non-increasing sequences of positive integers). Define the monomial symmetric polynomials $m(\alpha),m(\beta)$ as before.

We wish to compute $m(\alpha)m(\beta)=\sum_{\gamma} c_{\alpha,\beta,\gamma}m(\gamma)$ . Once the coefficients $c_{\alpha,\beta,\gamma}$ are found, I put them into a look-up table and the rest of the computation does not present much difficulty.

So, the difficulty is in computing the $c$ ‘s. For anyone who wants to try to write a better algorithm than mine, a good test case is when $\alpha=\beta=(2,2,1,1,1,1,1,1,1)$ . [I’m a little surprised there is not already a large look-up table of these coefficients out there (at least I can’t find one). I would think this sort of thing would be extremely useful in other contexts.]
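To make the definition of the coefficients $c_{\alpha,\beta,\gamma}$ concrete, here is a brute-force Python sketch. It is emphatically not a better algorithm: it enumerates all permutations and is only usable for very small signatures (the test case above is far out of reach). The function names are hypothetical, and it assumes at least len(alpha)+len(beta) variables are available.

```python
from collections import Counter
from itertools import permutations

def orbit(alpha, n):
    """All distinct exponent vectors obtained by permuting alpha padded with zeros to length n."""
    padded = tuple(alpha) + (0,) * (n - len(alpha))
    return set(permutations(padded))

def product_coefficients(alpha, beta):
    """Return {gamma: c} with m(alpha) * m(beta) = sum over gamma of c * m(gamma)."""
    n = len(alpha) + len(beta)          # enough variables for every gamma to appear
    coeffs = Counter()
    for a in orbit(alpha, n):
        for b in orbit(beta, n):
            s = tuple(x + y for x, y in zip(a, b))
            # The coefficient of m(gamma) equals the coefficient of one fixed monomial
            # of m(gamma); use the one whose exponent vector is non-increasing.
            if all(s[i] >= s[i + 1] for i in range(n - 1)):
                coeffs[tuple(e for e in s if e > 0)] += 1
    return dict(coeffs)

print(product_coefficients((1,), (1,)))      # m(1)^2 in terms of m(2) and m(1,1)
print(product_coefficients((1, 1), (1,)))
```

The design choice here is to count pairs of monomials that multiply to one fixed representative of each $m(\gamma)$, which avoids expanding the whole product.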

18 December, 2013 at 9:26 pm

James Maynard

In an earlier post I think you mentioned looking at polynomials
$(1-P_1)^a P_{\alpha}$
where $P_1=\sum_{i=1}^kt_i$ and $P_\alpha$ is a polynomial of signature $\alpha$ .

Is it not possible to use these where $\alpha$ has no 1s in its signature? This should form a basis for all symmetric polynomials, and (if I understand correctly) should avoid the RAM issues (for a bit) since you only have to add signatures which have no 1s in them, and so they should be about half the length.

This still is a bit of a hack; I’m sure there is probably a neater solution.
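A quick dimension count is consistent with the claim that these products form a basis. The Python sketch below counts pairs $(a,\alpha)$ with $a + |\alpha| = d$ and no 1s in $\alpha$, and compares with the number of all signatures of degree $d$. It is a sanity check rather than a proof, it assumes $k \geq d$ so that every signature is available, and the function names are hypothetical.

```python
def partitions(n, max_part=None, min_part=1):
    """Yield the partitions of n as non-increasing tuples with parts in [min_part, max_part]."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), min_part - 1, -1):
        for rest in partitions(n - first, first, min_part):
            yield (first,) + rest

def basis_size_no_ones(d):
    """Number of pairs (a, alpha) with a + |alpha| = d and no part of alpha equal to 1."""
    return sum(sum(1 for _ in partitions(d - a, min_part=2)) for a in range(d + 1))

d = 10
all_signatures = list(partitions(d))
no_ones = list(partitions(d, min_part=2))
print(len(all_signatures), basis_size_no_ones(d))
print(max(map(len, all_signatures)), max(map(len, no_ones)))
```

The second line of output compares the longest signature in each family, which is the "about half the length" saving mentioned above.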

18 December, 2013 at 10:02 pm

Terence Tao

Perhaps one could convert between the basis of symmetrised monomials $m(\alpha)$ and the basis of monomials $e_1^{a_1} \ldots e_k^{a_k}$ of the elementary symmetric polynomials (in your notation, $e_j = m(1,\ldots,1)$ where there are j 1’s)? The point is that multiplication is trivial in the latter basis (particularly from a memory standpoint, which seems to be the bottleneck). Also the operation of integrating out one of the k variables, say t_k, is not too bad in this basis, because of the recursive identities

$e_j(t_1,\ldots,t_k) = e_j(t_1,\ldots,t_{k-1}) + t_k e_{j-1}(t_1,\ldots,t_{k-1})$

and because t_k is being integrated from 0 to $1-e_1(t_1,\ldots,t_{k-1})$ .
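The recursive identity also gives the standard way to evaluate all the $e_j$ at a point, adding one variable at a time. A minimal Python sketch (the function name is a hypothetical choice):

```python
def elementary_symmetric(ts):
    """Return [e_0, e_1, ..., e_k] evaluated at the point ts, using
    e_j(t_1..t_i) = e_j(t_1..t_{i-1}) + t_i * e_{j-1}(t_1..t_{i-1})."""
    e = [1] + [0] * len(ts)
    for i, t in enumerate(ts, start=1):
        for j in range(i, 0, -1):       # descend so e[j-1] still holds the old value
            e[j] += t * e[j - 1]
    return e

print(elementary_symmetric([1, 2, 3, 4]))
```

This costs $O(k^2)$ operations and $O(k)$ memory per evaluation point.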

The price one pays for this of course is that integrating on the entire simplex is unpleasant in this basis, but one could now convert back to the symmetrised monomial basis $m(\alpha)$ in which integration is easy. It seems to be that computing the structure constants $c_{\alpha,\beta,\gamma}$ for $m(\alpha) m(\beta)$ is relatively easy if $\beta = (1,\ldots,1)$ is elementary, so the conversion from the monomials-of-elementary-functions basis to the symmetrised-monomials basis should be fairly straightforward. (The reverse conversion may not actually be necessary, if one stores F in the former basis rather than the latter.) In any case, these conversions should occupy significantly less memory (if one has an N-dimensional space of polynomials, any conversion matrices will take up $O(N^2)$ memory, as opposed to the $O(N^3)$ memory required for the multiplication structure constants $c_{\alpha,\beta,\gamma}$ .)

For your specific example m(221111111), it looks like this polynomial is an explicit linear combination of $e_2 e_9, e_1 e_{10}, e_{11}$ , so $m(221111111)^2$ is a combination of $e_2^2 e_9^2, e_1 e_2 e_9 e_{10}, e_2 e_9 e_{11}, e_1^2 e_{10}^2, e_1 e_{10} e_{11}, e_{11}^2$ , and each of these should be easily expandable back into $m(\gamma)$ ‘s.
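The linear combination can be made explicit. Counting how often each monomial occurs in $e_2 e_9$ and $e_1 e_{10}$ suggests $m(2,2,1,1,1,1,1,1,1) = e_2 e_9 - 9 e_1 e_{10} + 44 e_{11}$. These coefficients are not stated in the thread; they come from a hand count, and the Python sketch below only checks them at random integer points in 12 variables (a sanity check, not a proof). The helper names are hypothetical.

```python
import random
from itertools import combinations
from math import prod

def e(j, ts):
    """Elementary symmetric polynomial e_j evaluated at ts (direct sum over j-subsets)."""
    return sum(prod(c) for c in combinations(ts, j))

def m_22_ones(ts, ones):
    """m(2,2,1,...,1) with `ones` trailing 1s: choose the two squared variables,
    then multiply by e_ones of the remaining variables."""
    n = len(ts)
    total = 0
    for i, j in combinations(range(n), 2):
        rest = [ts[r] for r in range(n) if r != i and r != j]
        total += ts[i] ** 2 * ts[j] ** 2 * e(ones, rest)
    return total

random.seed(0)
for _ in range(3):
    ts = [random.randint(-3, 3) for _ in range(12)]
    lhs = m_22_ones(ts, 7)
    rhs = e(2, ts) * e(9, ts) - 9 * e(1, ts) * e(10, ts) + 44 * e(11, ts)
    print(lhs == rhs)
```

Integer inputs keep the comparison exact, so no floating-point tolerance is needed.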

19 December, 2013 at 1:46 pm

Pace Nielsen

James and Terry, these are both great ideas. I went with James’ idea first since it is incredibly easy to implement inside the code I have already written, and it essentially doubles the degree without much extra work. [The degree 10 case takes only about 1 minute now.]

I first accidentally ran the computation on polynomials of the form $(1-P_1)^a m(\alpha)$ , where $\alpha$ is a signature of degree $\leq 20$ with only *even* entries. So this set of polynomials doesn’t quite span the space of all symmetric polynomials of degree 20, but it still seems to give very good numerics (perhaps better time-wise than using the full basis–I’ll experiment with this). I obtained the lower bound $M_{59}\geq 4.06$ .

I’ll continue running the code, lift the restriction on signatures with only even entries, and report more results tomorrow.

19 December, 2013 at 2:01 pm

Terence Tao

Ah, so we have hit $H_1 \leq 300$ , excellent! It’s remarkable how close the $\frac{k}{k-1} \log k$ upper bound is holding up (and also reassuring that we are not contradicting that bound).

Incidentally, I changed the attribution for H(k) for small k (up to 171) on the wiki to Clark and Jarvis rather than Engelsma, as their work was earlier. (Engelsma gets optimal values up to k=342, and not necessarily optimal values up to k=800 or so.)

19 December, 2013 at 2:20 pm

Andrew Sutherland

The change in attribution makes sense. I’ll add a note to the entries in the admissible tuples database for k < 172.

If we manage to get k down to 26 we can go back further to Smith's 1957 paper.

19 December, 2013 at 2:37 pm

Eytan Paldi

This agrees with my estimate above (based on a linear convergence assumption) that $M_{59}$ should be about $4.06$ . According to that assumption (with a convergence rate of about $0.8$ ), the minimal $k$ for which $M_k > 4$ was estimated to be $k=55$ or $56$ .

20 December, 2013 at 9:46 am

Pace Nielsen

Doing as above (considering $(1-P_1)^a m(\alpha)$ where $\alpha$ is a signature with only even entries, and the total degree is at most 20) yields $M_{55}\geq 4.0051689$ . [This method is definitely faster than using all monomial symmetric polynomials of a given degree, but that can still be done.] If Eytan’s predictions hold true, this will be the smallest $k$ value for which $M_{k}\geq 4$ on $\mathcal{R}_k$ .

But just to give more data, here are the lower bounds on $M_{54}$ using all monomials of a given degree, for degrees $d$ from 1 to 14.

$d=1,\ M_{54}\geq 2.843513...$
$d=2,\ M_{54}\geq 3.162473...$
$d=3,\ M_{54}\geq 3.346057...$
$d=4,\ M_{54}\geq 3.478512...$
$d=5,\ M_{54}\geq 3.579515...$
$d=6,\ M_{54}\geq 3.658674...$
$d=7,\ M_{54}\geq 3.722270...$
$d=8,\ M_{54}\geq 3.773642...$
$d=9,\ M_{54}\geq 3.815790...$
$d=10,\ M_{54}\geq 3.850349...$
$d=11,\ M_{54}\geq 3.878997...$
$d=12,\ M_{54}\geq 3.902667...$
$d=13,\ M_{54}\geq 3.922386...$
$d=14,\ M_{54}\geq 3.938729...$

If any further computation is necessary, let me know.

20 December, 2013 at 9:59 am

Eytan Paldi

Since the difference $M_k - M_{k-1}$ should be very close to $\log k - \log (k-1)$ (i.e. very nearly $1/k$ ), we expect $M_{54}$ to be very close to $M_{55} - 1/55$ – i.e. about $3.987$ .
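Spelled out, and using Pace's lower bound $4.0051689$ as a stand-in for $M_{55}$ (an assumption, since it is only a lower bound), the arithmetic is:

$\displaystyle M_{54} \approx 4.0051689 - \log\frac{55}{54} \approx 4.0051689 - 0.0183 = 3.9868, \qquad 4.0051689 - \frac{1}{55} \approx 3.9870.$

Both versions round to the same three decimal places.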

20 December, 2013 at 10:42 am

Aubrey de Grey

3.987 is indeed very close to what our previous method (extrapolating differences with a 0.8 convergence rate) predicts for d=20. However, allowing d to rise to 26 appears to let M_54 creep over 4 – I trust Pace will be able to test this prediction – so it looks as though 0.8 is slightly over-pessimistic. It seems pretty certain that the corresponding series for k=53 will converge below 4, however (though of course that must also be checked).

20 December, 2013 at 11:04 am

Aubrey de Grey

Actually I retract my pessimism for k=53. Using a convergence rate of 0.8 as above, the limit of M_54 as d tends to infinity is about 4.004, but the ratio of differences for d > 10 is actually more like 0.83, which translates to a limit extremely close to 4 + 1/54, i.e. to a prediction that M_53 may creep over 4 but only for terrifyingly large d.

20 December, 2013 at 11:19 am

Aubrey de Grey

Pace – how much difference are you typically seeing for a given k and (double-digit) d between M calculated with all monomials versus only even entries? If the difference (or ratio, or some other simple function) is relatively constant, it may allow the even-entry version to act as a highly accurate estimator of what d (if any) would be needed to get M_53 > 4 using all monomials.

20 December, 2013 at 12:16 pm

Pace Nielsen

First, I should mention that using only even entries in the signature was a mistake on my part (which fortunately sped up the computation quite a bit). My intuition is that it should be better to use signatures with entries limited to $\{2,3,\ldots, d_0\}$ for some small $d_0$ , which will keep the speed increase but not be quite so ad hoc.

Here are two data points about using limited signatures: For degree 4, $k=59$ , using all monomials we get $M_{59}\geq 3.504684\ldots$ while with restricted signatures we get $M_{59}\geq 3.49235105\ldots$ . Similarly, in degree 10 we get the bounds $M_{59}\geq 3.898402\ldots$ and $M_{59}\geq 3.88376877\ldots$ . So it doesn’t appear that there is a significant difference in the outcome.

Regarding your $k=53$ pessimism, it should be kept in mind that these computations do not continue on at the same convergence rate once the degree gets to be about $d=k/2$ , because the monomials involved start overlapping more often. (There are fewer monomial symmetric polynomials in $F^2$ due to the lack of available variables.) So I would guess that $k=53$ is not within the realm of possibility–but I may be wrong.

I’ll see if I can’t put together some code to try degree 26 for $k=54$ . (Unfortunately, I do have finals to grade, and a family vacation to take this next week. So it may be a little while before it gets done.)

20 December, 2013 at 12:24 pm

Eytan Paldi

I agree with Aubrey’s conclusions. I give more details below.

The last three ratios among consecutive differences are:

$0.82623... , 0.83307... , 0.82879...$ – showing good stability near Aubrey’s value (0.83) for the estimated convergence rate. Using this rate we have the prediction that $M_{54}$ should be about $4.018$ (which is safely above the threshold $4$ , even allowing for the small fluctuations of the ratios around $0.83$ ). It seems however that $M_{54} - 1/54$ is “too close” to $4$ to reliably decide if $M_{53}$ is above or below $4$ (because of the small fluctuations in the above ratios).
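These ratios and the extrapolated limit can be recomputed from Pace's table for $d=10,\ldots,14$. The Python sketch below assumes, as in the discussion, that each further increment is a fixed multiple of the previous one. The table entries are truncated to six decimals, so the results agree with the quoted figures only to about four digits, and the function names are hypothetical.

```python
# Lower bounds on M_54 by degree d, copied from the table earlier in the thread.
M54 = {10: 3.850349, 11: 3.878997, 12: 3.902667, 13: 3.922386, 14: 3.938729}

def difference_ratios(bounds):
    """Ratios of consecutive differences of the bounds, ordered by degree."""
    ds = sorted(bounds)
    diffs = [bounds[b] - bounds[a] for a, b in zip(ds, ds[1:])]
    return [y / x for x, y in zip(diffs, diffs[1:])]

def geometric_limit(last_value, last_diff, rate):
    """Limit of the sequence if every later increment is `rate` times the previous one."""
    return last_value + last_diff * rate / (1 - rate)

ratios = difference_ratios(M54)
last_diff = M54[14] - M54[13]
print(ratios)
print(geometric_limit(M54[14], last_diff, 0.83))
print(geometric_limit(M54[14], last_diff, 0.80))
```

Changing the assumed rate from 0.80 to 0.83 moves the extrapolated limit by roughly 0.014, which is why the $k=53$ question is so sensitive to it.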

20 December, 2013 at 2:12 pm

Aubrey de Grey

Thank you Pace. The point about d > k/2 is well taken. The difference between the two versions of M_k for a given d is not so small, though – in fact I think it is enough, especially since it is bigger for d=10 than for d=4, that M_54 will probably not reach 4 for any d (certainly not d=26), if the even-entries version of your code is used. (That assumes that the difference is relatively independent of k, though.) Your new idea of a hybrid that includes only small odd numbers sounds ideal.

20 December, 2013 at 3:44 pm

Aubrey de Grey

Pace – before you write new code I should mention that if the convergence rate of 0.83 (rather than 0.8) is accurate, M_54 should actually reach 4 with d as small as 19.

20 December, 2013 at 3:50 pm

Aubrey de Grey

Damn, apologies – actually not until d=22.
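The degree at which the extrapolated sequence first reaches 4 can be found by stepping the geometric model forward from $d=14$. A small Python sketch under the same assumption (constant ratio 0.83 between consecutive increments); the starting values are the $d=13$ and $d=14$ entries of Pace's table, and the function name is hypothetical.

```python
def first_degree_reaching(target, d_last, value, diff, rate, d_max=200):
    """Smallest degree d > d_last at which the modelled bound reaches `target`,
    or None if that does not happen up to d_max."""
    d = d_last
    while value < target and d < d_max:
        diff *= rate        # next increment under the geometric model
        value += diff
        d += 1
    return d if value >= target else None

value_14 = 3.938729
diff_14 = 3.938729 - 3.922386
print(first_degree_reaching(4.0, 14, value_14, diff_14, 0.83))
```

The model only crosses 4 by a margin of order $10^{-3}$ at that degree, so the answer should be read as an estimate.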

19 December, 2013 at 4:33 am

Aubrey de Grey

The search for an ideal R has thus far proceeded by identifying extra volume that can be added to an R that is already known to be usable. I’m wondering whether there might be promise in the reciprocal option of first identifying a region that provably contains every possible R and then removing polytopes to arrive at a region that works. Would a suitable starting-point for such an approach be the union of k prisms of the form currently being investigated, i.e. a half-sized version of the region Terry describes in https://terrytao.wordpress.com/2013/12/08/polymath8b-iii-numerical-optimisation-of-the-variational-problem-and-a-search-for-new-sieves/#comment-257354 as the (Selberg-specific) minimal constraint for the sumset R+R ? (I’m actually having trouble seeing what prevents the use of that entire region itself as the maximal R, but I’m sure I’m missing something.)

19 December, 2013 at 7:58 am

Terence Tao

I had initially used this very region before realising that R+R would be too big, due to the contribution of sums r+r’ where r,r’ lie in different prisms. Pace’s symmetrised region $\{ x,y,z \in [0,1]^3: x+y+z \leq 3/2 \}$ is the largest symmetric convex region available, because if $(x,y,z)$ lies in a symmetric R then so does $\frac{1}{3}(x+y+z,x+y+z,x+y+z)$ , and so R+R contains $\frac{2}{3}(x+y+z,x+y+z,x+y+z)$ , forcing $x+y+z\leq\frac{3}{2}$ . There is a possibility of using a non-convex R, though one should proceed with caution in this case and recheck all the other steps of the argument.

19 December, 2013 at 8:42 am

Eytan Paldi

It seems that the generalization of Pace’s region for $\mathcal R''_k$ is

$c_k \mathcal R_k$ with $c_k = \max (k/(k-1), 1/ \theta)$

19 December, 2013 at 9:18 am

Eytan Paldi

For the above region we (obviously) have the upper bound
$M''_k \le c_k M_k \le (k/(k-1))^2 \log k$

19 December, 2013 at 11:20 am

Eytan Paldi

The second inequality above is implied (of course) only for $c_k = k/(k-1)$ – i.e. $\theta \ge 1-1/k$ .

19 December, 2013 at 10:56 am

Aubrey de Grey

Ah yes, of course – indeed, for k=3 it’s immediate that no R, convex or otherwise, can be bigger than half the unit cube, simply because no pair of points whose midpoint is epsilon away from (1/2,1/2,1/2) can both be in R. I’m not sure whether this generalises to larger k, though, because I’m not clear what the equivalent sumset condition is – I see various possibilities. Could you please state it explicitly?

19 December, 2013 at 11:36 am

Terence Tao

For general values of $k,\theta$ , the constraint is that R+R must lie in the set

$\bigcup_{j=1}^k \{ (t_1,\ldots,t_k) \in [0,2/\theta]^k:$
$t_1+\ldots+t_{j-1}+t_{j+1}+\ldots+t_k \leq 2 \}$

which is the union of k prisms. For the current focus of improving the bounds on H_1 on EH, we are taking $\theta=1$ , but one could imagine that the $\theta=1/2$ case could be useful for improving the Bombieri-Vinogradov level of k (currently at 64), e.g. by using the longer prism $\{ (t_1,\ldots,t_k): t_1+\ldots+t_{k-1} \leq 1; t_k \leq 2 \}$ , which looks like a promisingly large amount of extra room over ${\cal R}_k$ .
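The sumset constraint is easy to test pointwise. The Python sketch below checks membership in the union of prisms and illustrates, for $k=3$ and $\theta=1$, the earlier remark that two points from different half-sized prisms can have a sum outside the union. The function name, tolerance and sample points are illustrative choices, not taken from the thread.

```python
def in_prism_union(t, theta=1.0, tol=1e-12):
    """True if t lies in the union over j of
    { t in [0, 2/theta]^k : sum of t_i over i != j is <= 2 }."""
    if any(x < -tol or x > 2 / theta + tol for x in t):
        return False
    total = sum(t)
    return any(total - x <= 2 + tol for x in t)

# r lies in the half-sized prism for j=1, r2 in the one for j=2
# (equivalently, 2*r and 2*r2 lie in the full-sized union).
r = (1.0, 0.5, 0.5)
r2 = (0.5, 1.0, 0.5)
s = tuple(x + y for x, y in zip(r, r2))
print(in_prism_union([2 * x for x in r]), in_prism_union([2 * x for x in r2]))
print(s, in_prism_union(s))
```

Since the condition only involves the total and the largest coordinate, the check amounts to `sum(t) - max(t) <= 2` together with the box constraint.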

20 December, 2013 at 2:46 am

Anonymous

Dear Prof. Tao,
Is proving H=2 possible at this point? My intuition keeps telling me that there is a 95% chance that you can do it. I would be very happy to see your answer.
Thank you very much,

19 December, 2013 at 2:59 pm

Eytan Paldi

The lower bound for $M''_3$ in the wiki page should be $1.914$ (as given by James.)

[Corrected, thanks- T.]

20 December, 2013 at 8:18 am

Eytan Paldi

Currently there are several candidates for $\mathcal R''_k, \theta$ (and perhaps more future ones) for which this notation (as well as $M''_k$ ) is somewhat vague – so this notation should be made clearer.

20 December, 2013 at 1:20 pm

Eytan Paldi

In the asymptotic analysis for the lower bound for $M_k$ in the wiki page, the integral in the numerator of the last upper bound expression for $\Delta$ is not sufficiently clear.

[Clarification added -T.]

20 December, 2013 at 2:21 pm

Eytan Paldi

It seems clearer to replace $a$ by $a(u)$ in the integrand[OK, -T.].

20 December, 2013 at 2:05 pm

Polymath8b, IV: Enlarging the sieve support, more efficient numerics, and explicit asymptotics. | What's new


21 December, 2013 at 9:07 am

Eytan Paldi

There are typos in the values of the upper bounds for $M_{40}, M_{59}$ appearing in the table in the wiki page. There is also a typo in the approximate value of $M_2$ in the last line of that page.

[Corrected, thanks – T.]

31 December, 2013 at 11:38 am

John Nicholson

Can someone explain the link of 272 to here, from the table with m=1 at: http://michaelnielsen.org/polymath1/index.php?title=Bounded_gaps_between_primes

31 December, 2013 at 12:06 pm

Andrew Sutherland

The link is to Pace Nielsen’s comment in which he notes that M_55 > 4 = 4m. By Maynard’s result (summarized above), this implies that every admissible 55-tuple has infinitely many translates that contain at least 2 primes. The bound 272 comes from the fact that there exists an admissible 55-tuple of diameter 272, e.g. this one.

2 April, 2014 at 12:30 pm

Gergely Harcos

In Section 2 of this post, $X_1+\ldots X_{k-1}\geq k-T$ should be $X_1+\ldots X_{k-1}\leq k-T$ (3 occurrences). Correspondingly, in the text, “at most $k-T$ ” should be “at least $k-T$ ”.

[Corrected, thanks – T.]
