This is the final continuation of the online reading seminar of Zhang’s paper for the polymath8 project. (There are two other continuations; this previous post, which deals with the combinatorial aspects of the second part of Zhang’s paper, and this previous post, that covers the Type I and Type II sums.) The main purpose of this post is to present (and hopefully, to improve upon) the treatment of the final and most innovative of the key estimates in Zhang’s paper, namely the Type III estimate.

The main estimate was already stated as Theorem 17 in the previous post, but we quickly recall the relevant definitions here. As in other posts, we always take to be a parameter going off to infinity, with the usual asymptotic notation associated to this parameter.

Definition 1 (Coefficient sequences)Acoefficient sequenceis a finitely supported sequence that obeys the boundsfor all , where is the divisor function.

- (i) If is a coefficient sequence and is a primitive residue class, the (signed)
discrepancyof in the sequence is defined to be the quantity- (ii) A coefficient sequence is said to be
at scalefor some if it is supported on an interval of the form .- (iii) A coefficient sequence at scale is said to be
smoothif it takes the form for some smooth function supported on obeying the derivative boundsfor all fixed (note that the implied constant in the notation may depend on ).

For any , let denote the square-free numbers whose prime factors lie in . The main result of this post is then the following result of Zhang:

Theorem 2 (Type III estimate)Let be fixed quantities, and let be quantities such thatand

and

for some fixed . Let be coefficient sequences at scale respectively with smooth. Then for any we have

In fact we have the stronger “pointwise” estimate

(This is very slightly stronger than previously claimed, in that the condition has been dropped.)

It turns out that Zhang does not exploit any averaging of the factor, and matters reduce to the following:

Theorem 3 (Type III estimate without )Let be fixed, and let be quantities such thatand

and

for some fixed . Let be smooth coefficient sequences at scales respectively. Then we have

for all and some fixed .

Let us quickly see how Theorem 3 implies Theorem 2. To show (4), it suffices to establish the bound

for all , where denotes a quantity that is independent of (but can depend on other quantities such as ). The left-hand side can be rewritten as

From Theorem 3 we have

where the quantity does not depend on or . Inserting this asymptotic and using crude bounds on (see Lemma 8 of this previous post) we conclude (4) as required (after modifying slightly).

It remains to establish Theorem 3. This is done by a set of tools similar to that used to control the Type I and Type II sums:

- (i) completion of sums;
- (ii) the Weil conjectures and bounds on Ramanujan sums;
- (iii) factorisation of smooth moduli ;
- (iv) the Cauchy-Schwarz and triangle inequalities (Weyl differencing).

The specifics are slightly different though. For the Type I and Type II sums, it was the classical Weil bound on Kloosterman sums that were the key source of power saving; Ramanujan sums only played a minor role, controlling a secondary error term. For the Type III sums, one needs a significantly deeper consequence of the Weil conjectures, namely the estimate of Bombieri and Birch on a three-dimensional variant of a Kloosterman sum. Furthermore, the Ramanujan sums – which are a rare example of sums that actually exhibit *better* than square root cancellation, thus going beyond even what the Weil conjectures can offer – make a crucial appearance, when combined with the factorisation of the smooth modulus (this new argument is arguably the most original and interesting contribution of Zhang).

** — 1. A three-dimensional exponential sum — **

The power savings in Zhang’s Type III argument come from good estimates on the three-dimensional exponential sum

defined for positive integer and (or ). The key estimate is

Theorem 4 (Bombieri-Birch bound)Let be square-free. Then for any we havewhere is the greatest common divisor of (and we adopt the convention that ). (Here, the denotes a quantity that goes to zero as , rather than as .)

Note that the square root cancellation heuristic predicts as the size for , thus we can achieve *better* than square root cancellation if has a common factor with that is not shared with . This improvement over the square root heuristic, which is ultimately due to the presence of a Ramanujan sum inside this three-dimensional exponential sum in certain degenerate cases, is crucial to Zhang’s argument.

*Proof:* Suppose that factors as , thus are coprime. Then we have

(see Lemma 7 of this previous post). From this and the Chinese remainder theorem we see that factorises as

where . Dilating by , we conclude the multiplicative law

Iterating this law, we see that to prove Theorem 4 it suffices to do so in the case when is prime, or more precisely that

We first consider the case when , so our objective is now to show that

Making the change of variables , , this becomes

Performing the sums this becomes

where is the Ramanujan sum

Basic Fourier analysis tells us that equals when and when . The expression (6) then follows from direct computation.

Next, suppose that and . Making the change of variables , becomes

Performing the summation, this becomes

For each , the summation is a Kloosterman sum and is thus by the classical Weil bound (Theorem 8 from previous notes). This gives a net estimate of as desired. Similarly if .

The only remaining case is when . Here one cannot proceed purely through Ramanujan and Weil bounds, and we need to invoke the deep result of Bombieri and Birch, proven in Theorem 1 of the the appendix to this paper of Friedlander and Iwaniec. This bound can be proven by applying Deligne’s proof of the Weil conjectures to a certain -function attached to the surface ; an elementary but somewhat lengthy second proof is also given in the above appendix.

To deal with factors such as , the following simple lemma will be useful.

Lemma 5For any and any we havein particular

As in the previous theorem, here denotes a quantity that goes to zero as , rather than as .

Note that it is important that the term is excluded from the first sum, otherwise one acquires an additional term. In particular,

*Proof:* Estimating

we can bound

** — 2. Cauchy-Schwarz — **

We now prove Theorem 3. The reader may wish to track the exponents involved in the model regime

where is any fixed power of (e.g. , in which case can be slightly larger than ).

Let be as in Theorem 3, and let be a sufficiently small fixed quantity. It will suffice to show that

where does not depend on . We rewrite the left-hand side as

and then apply completion of sums (Lemma 6 from this previous post) to rewrite this expression as the sum of the main term

plus the error terms

and

where is any fixed quantity and

The first term does not depend on , and the third term is clearly acceptable, so it suffices to show that

It will be convenient to reduce to the case when and are coprime. More precisely, it will suffice to prove the following claim:

Proposition 6Let be fixed, and letfor some fixed , and let be smooth coefficient sequences at scale respectively. Then

for some fixed .

Let us now see why the above proposition implies (8). To prove (8), we may of course assume as the claim is trivial otherwise. We can split

for any function of , so that (8) can be written as

which we expand as

In order to apply Proposition (6) we need to modify the , constraints. By Möbius inversion one has

for any function , and similarly for , so by the triangle inequality we may bound the previous expression by

We may discard those values of for which is less than one, as the summation is vacuous in that case. We then apply Proposition (6) with replaced by respectively and set equal to , and replaced by and . One can check that all the hypotheses of Proposition 6 are obeyed, so we may bound (12) by

which by the divisor bound is , which is acceptable (after shrinking slightly).

It remains to prove Proposition 6. Continuing (7), the reader may wish to keep in mind the model case

Expanding out the convolution, our task is to show that

As before, our aim is to obtain a power savings better than over the trivial bound of .

The next step is Weyl differencing. We will need a step size which we will optimise in later. We set

we will make the hypothesis that

and save this condition to be verified later.

By shifting by for and then averaging, we may write the left-hand side of (14) as

By the triangle inequality, it thus suffices to show that

Next, we combine the and summations into a single summation over . We first use a Taylor expansion and (15) to write

for any fixed . If is large enough, then the error term will be acceptable, so it suffices to establish (17) with replaced by for any fixed . We can rewrite

where is such that and

Thus we can estimate the left-hand side of (17) by

where

Here we have bounded by .

We will eliminate the expression via Cauchy-Schwarz. Observe from the smoothness of that

and thus

Note that implies . But from (10) we have , so in fact we have . Thus

From the divisor bound, we see that for each fixed there are choices for , thus

From this, (18), and Cauchy-Schwarz, we see that to prove (17) it will suffice to show that

Comparing with the trivial bound of , our task is now to gain a factor of more than over the trivial bound.

We square out (19) as

If we shift by , then relabel by , and use the fact that , we can reduce this to

Next we perform another completion of sums, this time in the variables, to bound

by

(the prime is there to distinguish this quantity from in the introduction) and

Making the change of variables and and comparing with(5), we see that

Applying Theorem 4 (and recalling that ) we reduce to showing that

We now choose to be a factor of , thus

for some coprime to . We compute the sum on the left-hand side:

Lemma 7We have

*Proof:* We first consider the contribution of the diagonal case . This term may be estimated by

The term gives , while the contribution of the non-zero are acceptable by Lemma 5.

For the non-diagonal case , we see from Lemma 5 that

since , we obtain a bound of from this case as required.

From this lemma, we see that we are done if we can find obeying

as well as the previously recorded condition (16). We can split the condition (21) into three subconditions:

Substituting the definitions (15), (20) of , we can rewrite all of these conditions as lower and upper bounds on . Indeed, (16) follows from (say)

while the other three conditions rearrange to

We can combine (23), (24) into a single condition

Also, from (9), (13) we see that this new condition also implies (22). Thus we are done as soon as we find a factor of such that

where

and

From (11) one has

if is sufficiently small. Also, from (11), (9) one also sees that

and . As is -smooth, we can thus find with the desired properties by the greedy algorithm. (In view of Corollary 12 from this previous post, one could also have ensured that has no tiny factors, although this does not seem to be of much actual use in the Type III analysis.)

## 83 comments

Comments feed for this article

14 June, 2013 at 9:38 am

Terence TaoWhile writing up the argument I found that the factorisation of the Bombieri-Birch exponential sum for a mdulus d=qr into a modulus q component (which is still of Bombieri-Birch type) and a modulus r component (which is of Ramanujan type) could be more cleanly performed by noting a multiplicativity property of the Bombieri-Birch sum (see the third display in the proof of Theorem 4). This in particular leads to a refinement of Lemma 12 of Zhang that “bakes in” the Ramanujan sum decay more efficiently. This ended up cleaning up the proof (as did application of completion of sums as a black box estimate, without any further need to track Fourier coefficients) but unfortunately did not lead to improved bounds.

As before, we can record the numerics of the critical case:

14 June, 2013 at 2:13 pm

bengreenThis multiplicativity property is a nice observation. It reduces some of the mystery.

16 June, 2013 at 3:01 am

v08ltuThere is something wrong in the above. I get that is less than 1. Its exponent is .

16 June, 2013 at 3:02 am

v08ltuOh, duh I mixed N and x, sorry!

14 June, 2013 at 10:03 am

Terence TaoThree thoughts:

1. The triple exponential sum in Theorem 4 has the unusual property that certain degenerate cases of this sum are

smallerthan the non-degenerate cases (in particular, beating square root cancellation) – usually the degenerate cases are much larger. This leads to certain “diagonal” terms in the Cauchy-Schwarz analysis being significantly smaller than what one would naively expect them to be. The rarity of this phenomenon (degenerate sums being better behaved than non-degenerate sums) may limit the applicability of this very nice trick of Zhang.2. Zhang’s trick should improve the numerical values of the recent preprint of Fouvry, Kowalski, and Iwaniec http://arxiv.org/abs/1304.3199 on the level of distribution of , at least when restricted to smooth moduli. But this is outside the scope of the polymath project and I would presume that FKM are already in the process of working out the details, so I propose that we avoid pursuing this direction here. (But it might be worth leafing through FKM to see if there are any tricks that can also be applied to the Type III sums here.)

3. The discarding of the convolution to reduce Theorem 2 to Theorem 3 looks lossy. In particular it seems to me that Theorem 2 only requires an averaged version of Theorem 3 in which the parameter is averaged over a set of length M (the inverse of an arithmetic progression in ). I have a suspicion that completion of sums is less lossy when there is such an averaging, in principle one might gain an additional factor of something like . This could lead to some noticeable numerical improvements in the exponents. However, I have not yet worked out the details of this.

14 June, 2013 at 2:27 pm

v08ltuRegarding #1, I too thought this mysterious. I mean, so you Weyl shift by multiples of – why doesn’t that just give you something like (and thus no cancellation) rather than a Ramanujan sum (with residue-class differences as an argument)? This is still a philosophy disparity for me.

14 June, 2013 at 9:24 pm

Terence TaoActually, now that I think about it, I know another context in which partly degenerate exponential sums have better cancellation than non-degenerate sums. Consider a quadratic exponential sum for some odd prime . When is non-zero this is a Gauss sum and has square root cancellation – the magnitude is exactly . But when degenerates to zero while stays non-zero, we get perfect cancellation – the magnitude is now zero! Of course, when and both degenerate, there is no cancellation whatsoever and the sum jumps up to .

With the Birch-Bombieri sum there is a qualitatively similar phenomenon: square root cancellation in the fully non-degenerate case, better than square root cancellation in the partially degenerate case ( has a common factor with but does not), and huge in the completely degenerate case.

17 June, 2013 at 12:13 am

Emmanuel KowalskiFKI should be FKM…. (Fouvry-K.-Michel)

[Corrected, thanks – T.]18 June, 2013 at 6:08 am

Ph.MRegarding # 2.

In fact, when an averaging over conveniently factorable moduli is available (as in this case here) we checked is it possible to get a quite high exponent of distribution (up to 4/7). The argument is simpler than Zhang’s treatment and in fact should improve Zhang type III sums quite a bit. See one of Emmanuel’s forth coming blog.

19 June, 2013 at 1:15 am

v08ltuIf I take this comment in the most direct sense, we have when . With no or this would give , but perhaps other aspects will break.

19 June, 2013 at 4:49 am

Andrew SutherlandWe would then have H(110)=628 (from Engelsma’s tables).

19 June, 2013 at 6:47 am

Terence TaoIf the Type III sums became that good then the bottleneck then comes from the Type I/II sums (which currently has a constraint of the form ) and also the combinatorial analysis (which has a constraint , which when combined with the constraint coming from the Type I analysis gives that ). The second barrier is a bit stronger, so there is a barrier to getting past at present, but we have basically made no efforts to attack these barriers yet as they have not come close to the battlefront thus far, and presumably they can be knocked back a bit. But in any event I foresee that the Type III sums will cease to become critical in the near future, and we will have to turn our attention elsewhere to get gains.

19 June, 2013 at 7:18 am

Ph.MActually, the exponent is for the distribution of which is “smoother” than the actual type III which includes some small non smooth factor ( the variable ) so the exponent is relative to rather than . We will write this up.

19 June, 2013 at 7:56 am

Terence TaoNaively, if one assumes a level of distribution for the component of the type III sums, one expects a condition of the form

(ignoring epsilons) which on plugging in the critical case from the combinatorial analysis gives

thus

.

From Type I analysis (and ignoring and infinitesimals) we can take so this gives a constraint , but this is unfortunately occluded by the existing constraint coming from the requirement . It may well be that one can also exploit the averaging as we are currently doing but this will only serve to weaken the threshold and not the one. So my prediction is that the Type III improvement will be strong enough that it no longer provides the dominant obstruction to bounds, with the combinatorial constraint (combined with the constraint from the Type I analysis) dominating instead, and basically placing close to .

14 June, 2013 at 12:51 pm

Pace Nielsen“Chiense” should be “Chinese”

In equation (8) there is an extra left-hand absolute value sign. [It might be more readable to make some of the absolute value signs bigger, so it is clear they encompass some of the sums.]

In the third offset equation above (8), it appears that a right-hand absolute value sign is missing.

Sorry for not having any useful things to say about the mathematics. I am enjoying reading these posts.

[Corrected, thanks – T.]14 June, 2013 at 1:40 pm

Pace NielsenI did have one question, about the definition of “coefficient sequence”. Isn’t every finitely supported sequence a coefficient sequence? From the finite support, we know that the image of the sequence is bounded. But for sufficiently large x, log(x) then dominates. (Or is x not independent of n and somehow?)

14 June, 2013 at 2:42 pm

v08ltuI had the same fret, reading this definition before. It’s finitely supported, but has some asymptotic bounds? I guess this needs to be interpreted, as saying that the constants in the have been picked before hand?

14 June, 2013 at 5:55 pm

Terence TaoEvery

fixed(i.e. independent of x) finitely supported sequence is a coefficient sequence, but a finitely supported sequence that depends on x need not be a coefficient sequence. (The distinction between quantities that depend on the asymptotic parameter , and quantities independent of that parameter (which are called “fixed”) here, forms the basis of what I call “cheap nonstandard analysis“, which allows one to work rigorously with asymptotic notation without having to deploy the full machinery of nonstandard analysis. It is briefly introduced in the earliest of this series of posts, back at https://terrytao.wordpress.com/2013/06/03/the-prime-tuples-conjecture-sieve-theory-and-the-work-of-goldston-pintz-yildirim-motohashi-pintz-and-zhang/ .15 June, 2013 at 12:07 pm

Gaston RachlouIn my humble opinion, this terminology is not a very happy innovation, and could be replaced with no harm by its definition, accessible to any mathematician. One has to understand that your is in fact a . Moreover, it is not immediately clear if the support of this sequence (supposed to be finite for fixed) depends on or not.

15 June, 2013 at 12:51 pm

Terence TaoTo quote from the first post in this series where this notation is introduced:

The alternative would be to explicitly subscript almost every quantity (other than the fixed ones) in these posts by , which would be quite distracting, since this dependence occurs for most of the objects introduced in the post and thus is better suited to be the default assumption in one’s notation as specified above, rather than the other way around. (For instance, in the above post, the objects , , , , and associated structures (e.g. the support of ) are allowed to depend on or to be bound to a range depending on (as indicated by the default convention that is in force throughout these posts), with only declared as fixed.)

15 June, 2013 at 8:39 am

meditationataeThe Bombieri-Birch bound in Theorem 4 is given in long outline form above, although presumably this wasn’t re-argued in Zhang’s paper, am I right? In expression (6), it seems that is the same thing as the assumed prime about four lines above, in the “case when is prime” . It seems Zhang’s proof of Theorem 3 involves invoking the Bombieri-Birch triple exponential sum bound of Theorem 4, and all that lies in Section 2 of the post, starting at expression (7) till the end of Section 2; is that about right? Finally, my intuition is that Theorem 2 is not so hard, given Theorem 3, and the “hard theorem” would be Theorem 3, Zhang’s Type III estimate without …

15 June, 2013 at 9:22 am

Terence TaoYes, this is basically correct. Theorem 4 above corresponds to Lemma 11 in Zhang, although Theorem 4 is also recording an additional cancellation in the degenerate case when k has a shared factor with q but not m-m’; this cancellation was implicit in Section 14 but it seems simpler to place it in Theorem 4 instead.

15 June, 2013 at 1:47 pm

Terence TaoI think I can indeed get some gain here from working with an averaged version of Theorem 3. More precisely, if one inspects the derivation of Theorem 2 from Theorem 3, we see that it actually suffices to replace the conclusion of Theorem 3 by the weaker

where for some and the coefficients are . (In the regime of interest we have .)

If we run the arguments of Section 2 carrying this new averaging in A with us, (8) becomes

and the conclusion of Proposition 6 is similarly replaced with

.

Continuing the argument through to (19), we see that (19) is replaced with

Squaring this out and continuing the arguments of the main post, we see that it suffices to establish the bound

note crucially that the third argument of the exponential sum now acquires a factor of a’ instead of a. Using Theorem 4, the left-hand side is essentially bounded by

The key point here is that generic are usually coprime, making the diagonal case somewhat rarer than the diagonal case considered in Lemma 7. Writing , for , one can rewrite

and by using the divisor bound this is basically back to the situation in Lemma 7 but with m, m’ now ranging at scale O(MM’) rather than just O(M’). As such, the first two terms on the RHS of Lemma 7 acquire an additional factor of M^{-1} after performing the averaging in a,a’. According to my calculations, this allows one to loosen the constraint

in Proposition 6 with the pair of constraints

(the latter condition comes from (22) which is no longer so obviously dominated by (24), although it will ultimately still be redundant in the final analysis after optimising with the Type I/II analysis). This means that we can replace

in Theorem 2 with

and

The former constraint is equivalent to

Returning to the combinatorial analysis, the necessary condition

is now relaxed to the combination of

and

which, upon setting close to (and also recalling the condition ) becomes (if all my calculations are correct)

and

and

The second and third conditions are weaker than the first so we are left with , which would be a significant improvement over the previous bound of .

Needless to say, this should all be checked, and should be currently viewed as provisional, but I'm optimistic that we have a good gain here.

15 June, 2013 at 4:04 pm

v08ltuI have been too busy to be PolyMathing recently, but I think that the work of Ping Xi on Burgess for Kloosterman sums that I mentioned in the other thread has good hope of some improvement on the other end of things. The main technical issue is to pass from prime to composite moduli (and write the details of the proof in the first place) w/o losing too much, which is not always trivial in a Burgess setting (it is not simply CRT) vis-a-vis coprimality accounting.

I contend that the main “problem” with 13&14 was that the diagonal was so dominant (that K needed to be so big), and so your idea seems to quell it via averaging. What is the critical K with these provisional parameters?

15 June, 2013 at 10:55 pm

v08ltuI agree that this $m$-averaging should in principle work. But to verify all the details I still need to change your notation into my notation (and compare to Zhang’s notation, particularly ensuring that “m” is not confused with “m” any where).

16 June, 2013 at 4:41 am

MichaelI think the redundant inequality near the end should have instead of

[Corrected, thanks; also I added the additional constraint from the combinatorial analysis. Fortunately, neither of these corrections ended up affecting the headline bound. -T]15 June, 2013 at 3:59 pm

Andrew SutherlandDo we know approximately what value this would imply for ?

15 June, 2013 at 4:11 pm

v08ltuI think and works, but I have to run.

15 June, 2013 at 6:25 pm

Terence TaoHere’s what I get with this choice of parameters (ignoring infinitesimals):

and

which gives us about of room for and . Using Gergely’s first improved value

of (the derivation of which is now in the main post of https://terrytao.wordpress.com/2013/06/11/further-analysis-of-the-truncated-gpy-sieve/ ), I get and , so the bar looks like it is cleared (though I do not have an error analysis on the numerical integration I performed). With Gergely’s provisional values of I get essentially the same result. So I am happy to confirm the value once the condition is confirmed.

For the above values of , we can recalculate the critical numerology as follows:

So K is looking a lot smaller than it used to be (in the previous critical case, it was closer to ).

I have to run, but I will try to confirm Gergely’s new provisional argument for soon (and maybe sneak a look at the Ping Xi paper, though this may have to wait a bit). I’ll also have to double-check the argument, though if someone else could also have a look at this, that would be great.

15 June, 2013 at 6:33 pm

v08ltuI get allows .

15 June, 2013 at 6:37 pm

v08ltuIn fact, also allows (just barely), not sure why I missed it the first time around, but I was in a hurry. I will try to sort through the new argument for 13&14. The condition is “87” omega, correct? In the latter comment you twice have “84”. I will work thru it in any case.

15 June, 2013 at 8:06 pm

Terence TaoOops, sorry about that, that was a typo and I have just corrected it. Yes, the correct condition (I believe) is .

15 June, 2013 at 6:42 pm

Gergely HarcosSounds great. For I get , while the provisional is a bit smaller, about . BTW I am more confident about the provisional and now, but I also apologize for any failures, past and future.

15 June, 2013 at 8:10 pm

Eytan PaldiNote that by decreasing by 1, the change in the denominator of the LHS ratio seems to be since the asymptotic expansion of starts with n. Hence, small changes in results in LARGER relative changes of .

So even this small value of is perhaps still not negligible, and further work on its reduction may be important for such values of

.

16 June, 2013 at 3:21 am

v08ltuI think the gain from the above is in some sense from the betterment of the last line of page 50 of Zhang on the diagonal estimation, replacing the crude worst-case bound of with an on-average bound, perhaps not as good as (I am still checking details, this is the length of the sum, so you can’t beat that) but enough to make rather harmless.

16 June, 2013 at 3:33 am

v08ltuAccording to my ansatz, there are three terms, namely the Weyl shift error which is , the -diagonal error which is dominated by giving , and the BB-Deligne term which is , the savings being from the Zhang trick. Previously the second (diagonal term) was , so the -average saved (i.e., the -diagonal now dominates the double -sum, sqrt due to Cauchy of course). Oh, there is also the -modulus error which I think is the same as the Weyl shift, but with instead of . Now to check the final numbers…

16 June, 2013 at 4:02 am

v08ltuNoting that in the equal case, I get , then , and plugging in for gives me . Check.

16 June, 2013 at 4:10 am

v08ltuSo I can be clear, I did not check every detail about how much the -average saved (computing all the gcd’s that is), just did enough to convince my self the main term “should” dominate and compute its effect.

16 June, 2013 at 7:29 am

Terence TaoThanks for this! The fact that you used a slightly different approach lends more confidence to the confirmation.

It looks like the main condition is still much stronger than all the other necessary conditions that are needed in the analysis, so it should be safe to list this condition as confirmed, as well as the corollary that (and hence ) are confirmed as well.

15 June, 2013 at 6:03 pm

Gergely HarcosWith the provisional improved values for and (see https://terrytao.wordpress.com/2013/06/11/further-analysis-of-the-truncated-gpy-sieve/#comment-234683) I get that and is best.

15 June, 2013 at 6:02 pm

Andrew SutherlandThanks. We then have . But I’m sure this bound will improve.

15 June, 2013 at 9:47 pm

Terence TaoLooks like Polymath8 has made three orders of magnitude of improvement then! (But we’ve cleaned out almost all the low hanging fruit… I’d be surprised if we get more than one more order of magnitude from the rest of the project.)

16 June, 2013 at 4:48 am

v08ltuIf we could make the (sieve) estimates be “-free” and “-free, via some impossible sieving, that would leave into the , which I find permits . So any gains there might appear slight.

I should probably directly compute how much a prospective Burgess-Kloosterman bound would achieve, but I instead will speculate it could be an or two in . Each reduces the “87” by 10, and one such reduction yields the above 5446 down to 4518, and two reductions to 3651. Even still, that is not a halving from currently 6329.

16 June, 2013 at 5:08 am

Andrew SutherlandFor reference, I get , , and (I expect these can all be improved, but they should be reasonably close to the optimal values).

16 June, 2013 at 5:17 am

Andrew SutherlandI notice that 3651 is in the range considered by Engelsma (see http://www.opertech.com/primes/summary.txt), and his results imply an upper bound of 33076 (which I suspect involved a lot of computation), whereas the bound 33070 I gave above took less than 5 minutes to find. This suggests that the polymath8 techniques for computing upper bounds on H(k0) are getting pretty good.

16 June, 2013 at 7:24 am

Terence TaoThanks for this – it’s good to get some benchmarks for best-case scenarios, even if they don’t project fantastic new gains.

Incidentally, as we work with larger and larger values of and (and smaller values of ), other obstacles begin to appear (a bit like how rocks submerged far beneath the ocean surface do not pose a problem for navigation until the ocean level drops too low); for instance the combinatorial analysis has a condition which previously has not been any difficulty whatsoever but will for instance block from exceeding even in the absence of unless the Type I analysis improves or if we can somehow improve the combinatorics.

Still, we are not completely out of things to try yet, even if I do agree that further progress is going to require increasing effort and we are not going to see the dramatic order-of-magnitude improvements that we had once or twice in the past. At some point the progress will reach a natural plateau, at which point it will be time to turn to the writing up stage of the project.

16 June, 2013 at 9:22 am

Terence TaoI haven’t checked any details yet, but there may be a way to do the Weyl differencing more efficiently. Right now we shift to for ; this limits the size of to roughly . The presence of the allows one to achieve the cancellation

where . But what if we don’t bother putting in the shift, i.e. only shifting to ? This enlarges K significantly, to about , which should help attenuate the diagonal contributions even further. The cost is that the cancellation now looks slightly worse,

but I think the factor attached to , being coprime to , is ultimately harmless in the rest of the argument (at least as far as a quick glance at Theorem 4 suggests), though this certainly needs to be checked. But this may lead to a further noticeable gain in the numerology. (I think someone pointed out in one of the earlier threads that there was potentially more slack in the Type III estimates than the Type I/II estimates, as the latter have already been optimised quite a bit by BFI and other very strong previous authors, whereas the Type III argument is original to Zhang and thus not nearly as optimised.)

16 June, 2013 at 9:57 am

Terence TaoA quick calculation of the numerology:

With the previous improvement to the arguments, the conditions to verify for Theorem 2 take the shape (ignoring epsilon factors)

and

(the extra factors of improving over Lemma 7 is the gain coming from exploiting the averaging from the factor). With the new idea, gets increased from to . The conditions on now become

.

The new gain here is in the third inequality which has lost one power of on the RHS. (The second inequality is likely to still be redundant as before.) So we end up with the conditions

for the analogue of Proposition 6; I think the main inequality is still the last one, in which one power of H has now been saved over before. The analogue of Theorem 2 then acquires the conditions

or equivalently

.

The first is majorised by the third and can be deleted. This leads to the constraints

which on substituting gives (if my arithmetic is correct)

The second condition is redundant, so we end up with

.

The condition coming from combinatorics is now also relevant and gives an additional constraint

. (2)

This is extremely rough though and there may be errors, I need to double check this.

16 June, 2013 at 10:18 am

Terence TaoOne interesting new phenomenon: if one tries to move past 1/10 (i.e. to go past (2)) then we will have to contend with a new type of sum – a “Type IV sum” if you will – a model case of which is where all are smooth and at scale about .

Gotta run, more later…

16 June, 2013 at 10:34 am

Andrew Sutherland@v08ltu, Harcos: any estimates of what this would yield for ?

16 June, 2013 at 2:17 pm

v08ltuI get for , but this breaks condition (2). From that I get and .

16 June, 2013 at 1:08 pm

Terence TaoRechecked the numerology. For the analogue of Prop 6, there are actually a few more constraints coming from the requirements , which take the form

but these look very mild; they transform to

which become the extremely weak

). The long-quiescent Type I/II border is also giving a constraint

but this is still majorised by . So the conditions (1), (2) are still the only necessary conditions in this analysis, representing the need to control the Type I/III and a new Type I/IV border. But if we improve things much more, we may have to fight battles on many fronts at once, including the Type I/II border…

I will probably write a new blog post on the latest Type III estimate since it now has two major alterations from what is posted above. But this may take a day or two.

16 June, 2013 at 2:53 pm

Hannes155 varpi + 31 delta < 1 (is this what you call (1)?) is dominated by 11 varpi + 3 delta < 1/20

16 June, 2013 at 2:03 pm

v08ltuI considered forgetting the from the Weyl shift awhile back, but (after looking at FI) somehow convinced myself that it should not work. Maybe in our context it turns out Ok for some reason.

16 June, 2013 at 3:41 pm

v08ltuIn the notation of Zhang S14, I think the issue is that you don’t have modulo anymore in the of the Cauchy, but now just . Thus the contribution is not , but . At least that’s what I think happens, when comparing to what I determined about FI.

16 June, 2013 at 4:39 pm

Terence TaoHmm, but it seems to me that changing the shift doesn’t change the relation because one is still canceling by in the first display of S14.

In the notation of Zhang S14, one is starting with a phase of the form

if one shifts by kr instead of hkr. We can rewrite this as

where is exactly the same as before, i.e. . So when one does the Cauchy-Schwarz in using , I don’t think any of the multiplicity analysis changes; instead, the only thing that is affected (other than the larger value of ) is that all occurrences of in S14 of Zhang get replaced by , which seems to change absolutely nothing (the gets carried by all the way to the first line of page 53 of Zhang, when it is killed by Theorem 12).

But perhaps I am missing something…

16 June, 2013 at 4:56 pm

Terence TaoOops, I see the problem now – we can’t have terms that depend on if we want to change variables to and take advantage of the Cauchy-Schwarz in . That was silly of me. So the idea doesn’t work directly, and I’ll have to retract the claim :-(. I’ll still play around to see if something else can be salvaged though.

16 June, 2013 at 5:19 pm

v08ltuMaybe the point is that the definition of is really a double sum over , so I don’t see how you separate the -variable in the Cauchy, considering that you are now carrying it also in ?

16 June, 2013 at 5:24 pm

v08ltuOK, so you found the same thing I did. On the “sociological” level, the reason why I disdained the idea is that FI shifted by -multiples, and I would think they were aiming for more optimisation, and this would be the first “obvious” place to manuever. On the theoretical level, it would mean that you could just pull out of the exponential, for instance and its effect is then prevalent (you are back at with Fourier).

16 June, 2013 at 9:45 am

Thomas J EngelsmaNice to see ‘tuples’ active again !!

Would love to read the Zhang paper, but it is hidden behind wall and I am not affiliated with an university.

Couple of quick questions, believe the proof is for an infinite number of prime pairings separated by some value less than 70 million. Can the proof be adapted for three primes in a larger interval? Can it then be generalized for ‘n’ primes in some massive interval?

—

Reading thru the above, some off my work should be clarified. The chart at http://www.opertech.com/primes/varcount.bmp shows the number of variations that are archived for each width.

There are 3 colors shown

green — variations found from true exhaustive search.

blue — variations found using targeted search routines – close to densest

red — variations generated from smaller tuples – just best found

‘Exhaustive search’ was true pruning exhaustive search.

‘Targeted search’ was recursive searching of a specific width

‘Generated’ was creating tuples using existing patterns (mostly smaller)

This was done due to the evidence that most tuples had smaller tuples embedded.

The tuples in the red zone were investigated mainly to see if k(w) stayed above pi(w) or oscillated around pi(w).

See http://www.opertech.com/primes/trophy.bmp Amazingly, k(w) is concave upward, and every improvement causes it to rise faster.

(current work implies that there is no inflection point, showing ever growing localized order amid the global chaos of the primes)

Also, the improvements of k()-pi() were not evenly scattered as w grew, but clustered (the series of red and blue dots) Whenever a series of red dots does not have a corresponding series of blue dots, those tuple widths should be able to be easily improved.

The headings of the table at http://www.opertech.com/primes/summary.txt are:

1st col — w width of interval, number of consecutive integers

2nd col — k [has value if first occurrence of a k]

3rd col — k number of admissible elements

4th col — k(w) – pi(w) [has value if first occurrence of k()-pi()]

5th col — k(w) – pi(w) over pack/under pack quantity relative to prime count

6th col — number of variations currently archived

The number of variations gives a clue if a tuple can be improved. When tuples in the red-zone have 1000s of variations, that width should be able to be improved. When the searching was active, if an improved tuple was found, it was tested to generate 1000 variations.

—

Searching has been idle for 4 years, but could be revived.

Currently, combinatorics is being used to create a counting function F(w,k) that counts the number of unique patterns of ‘k’ primes in an interval of ‘w’ integers. With the function ‘w’ could be fixed and ‘k’ could be varied, thereby providing the ability to know the densest tuple possible without knowing a single prime location within any pattern.

16 June, 2013 at 9:55 am

andrescaicedoThomas, you may want to take a look at http://mathoverflow.net/questions/132731/does-zhangs-theorem-generalize-to-3-or-more-primes-in-an-interval-of-fixed-len

16 June, 2013 at 10:15 am

Terence TaoThanks for the info! You may also be interested in the discussion at

http://sbseminar.wordpress.com/2013/06/05/more-narrow-admissible-sets

which is more specifically focused on finding narrow prime tuples of a given length.

16 June, 2013 at 10:07 am

Andrew SutherlandHi Thomas, thanks for the chart, and the explanation.

16 June, 2013 at 5:14 pm

Terence TaoA small comment (which does not directly improve bounds, unfortunately): of the three coefficient sequences appearing in the Type III arguments, it is only and that need to have any sort of smoothness (because we apply completion of sums in these variables); in contrast, the sequence is eventually eliminated by Cauchy-Schwarz. So one could in principle replace by other coefficient sequences, such as Dirichlet convolutions of lower scale things. This enlarges the class of combinatorial configurations that are treated by the Type III analysis, but unfortunately does not seem to help in the most important cases (e.g. when and ).

16 June, 2013 at 5:43 pm

Terence TaoGah, I need to retract this also; there is actually a Taylor expansion of (which of course exploits smoothness) in the argument that I did not properly document in the blog post above before, I have fixed this now.

16 June, 2013 at 5:52 pm

v08ltuYes, this smoothness is exploited in estimating the error from Weyl shifts (below 13.13 in Zhang, with his variable ).

16 June, 2013 at 8:19 pm

Derek OJust a side comment, I’m following this thread and polymath8, and this is just amazing. I’m a math/physics major at University of Pittsburgh with a strong interest in number theory. Unfortunately, I don’t know what half this stuff means, but it’s still awesome! I was blown away when H went from 4.5 million to 400,000 then the most recent drop to 60,000; it’s just incredible to me. Anyway, keep up the great work you guys

16 June, 2013 at 10:58 pm

AnonymousAll that’s needed now is Zhang giving the thumbs up!

16 June, 2013 at 9:39 pm

Terence TaoOK, here is another attempt to squeeze a further gain in the Type III sums. I am beginning to suspect that the treatment of the diagonal terms (or after shifting by ) in Zhang (or in the post above) is suboptimal (in a different way from the suboptimality previously identified by not taking advantage of the averaging, which for simplicity I will not try to exploit here). Right now, the diagonal terms in are treated together with the off-diagonal terms via expansion into exponential sums, then at some point one applies an exponential sum bound for an inner sum (in the diagonal case case, one uses the bound on Ramanujan sums, see bottom of page 50 of Zhang) and then uses absolute values and the triangle inequality for the outer sum.

However if one peels off the diagonal case earlier in the argument then one can do better than the triangle inequality, in particular Plancherel’s theorem (which, being an identity, is 100% efficient) appears to be available. Let me first describe this using Zhang’s original paper. In the bottom of page 48 a key quantity is defined, which can be expanded as a sum over pairs . Let us now deviate from Zhang by immediately isolating the diagonal terms in this sum (as opposed to waiting until the bottom of page 50 to try to control these terms); this expression can be written as

If we make the change of variables this becomes

This is an expression which can be estimated almost exactly by Plancherel (the only loss coming from those with a common factor with ), giving a bound of

which surely must be more accurate than the triangle inequality-based treatment of the diagonal case as is currently present in Zhang’s paper. Since the final bound in the Type III case comes from balancing the diagonal case against the off-diagonal case, this should lead to a gain.

In the language of the blog post above, one would similarly extract out the diagonal term from (19) (or the display after (19)) and evaluate it separately using Plancherel.

Tomorrow I will try to work out the precise details of this idea and see if it actually works. Hopefully after two wrong arguments, the third time will be the charm…

17 June, 2013 at 12:06 am

v08ltuWell, on line 4 of page 51, “it follows that the contribution from the terms with on the right side of (14.6) is , so I don’t think you gained anything? Note that Zhang gets Ramanujan in both and on the diagonal (and you improved this in the -aspect via the -average), so it might be hard to beat.

17 June, 2013 at 7:23 am

Terence TaoAh, yes, you’re right about this, both with and without the averaging. Using Plancherel basically replaces Ramanujan by a (normalised) Kronecker delta function but yes, the behaviour is basically the same. Back to the drawing board…

17 June, 2013 at 3:19 pm

v08ltuThe three tools of the mathematician: the paper, the pencil, and the wastebasket…

18 June, 2013 at 2:40 pm

AnonymousGood!

17 June, 2013 at 4:28 pm

Terence TaoWell, this isn’t any direct progress towards improving the bounds, but at least I understand Zhang’s argument a bit better.

We want to understand how the convolution is distributed in the residue class , so we would like to compute the sum

.

with accuracy . Fourier jnversion allows one to rewrite this expression as

where

is essentially of size when and small elsewhere.

If we ignore all that are not coprime to , we may rearrange this as

(*)

where

and

.

The function has a computable norm, basically

and similarly has a computable norm by Plancherel:

(**)

One could apply Cauchy-Schwarz directly to (*) to obtain an upper bound

(***)

however this gives an unusable bound (unless we are in the Bombieri-Vinogradov range ). To do better, Zhang notes that has some smoothness, roughly speaking

whenever is an integer of size . This reflects the fact that the Farey sequence is more or less stable with respect to shifts by integers of size . This is basically all of the regularity that enjoys; one can compute the Fourier transform of in to be spread out more or less evenly across frequencies of size , save for a big concentration at the zero frequency mode of course.

The smoothness of allows one to approximately write (*) as

for any non-empty set of integers of size , so we may replace (***) by the improved bound

. (****)

where is the translate of by . On the other hand, Bombieri-Birch/Ramanujan and completion of sums gives us a bound on the inner product

,

which is basically of the form

, (*****)

which is a bound which is strangely non-monotonic in : when is divisible by d (and in particular in the diagonal case ) it matches what one gets from Cauchy-Schwarz and (**) (this is basically why I got essentially no improvement by trying to treat the case directly), then decreases as becomes increasingly coprime with , until one reaches a transition and the bound worsens again as approaches 1. So to optimise in (****) one would like the differences for to have some common factor with , but not too much of a factor, while also keeping reasonably large to avoid the diagonal terms from swamping everything; and so Zhang factors for a controlled value of and sets to be the multiples of of size . I think I’ve convinced myself that this is more or less the optimal choice of given the bounds available.

So it's the non-monotonicity of the bound (*****) that makes the argument slightly strange, but it appears difficult to improve upon (*****) without somehow gaining the ability to get bounds on short averages of Bombieri-Birch sums that improve upon the triangle inequality, which one can conjecture to be true (in the spirit of Hooley's conjecture) but it looks very difficult (and for this one should start with the Type I/II sums where one "only" has to deal with short averages of Kloosterman rather than short averages of Bombieri-Birch. I poked for a while on the Fourier side (replacing by its Fourier dual variable) to see if there was anything better than Weyl differencing available, but the only thing that seemed to suggest itself was bounds on higher moments (e.g. moment) of the Fourier transform of (or equivalently the Gowers norm of ), which didn't look very enticing. (In principle one gets more square root cancellation from increasingly higher-dimensional cases of the Weil conjectures, but this seems to be more than compensated for by the increased amount of completion of sums and Cauchy-Schwarz one has to perform.)

So, in summary, perhaps we've cleaned up all the low hanging fruit from Type III for now, and it's time to look again at Type I. I had a little look at the Ping Xi preprint giving some power savings on short Kloosterman sums, but have not yet worked out the numerology to see if the ranges in which Xi's bound are non-trivial are relevant here (and it's more "short averages of Kloosterman sums" that we need non-trivial bounds for rather than "short Kloosterman sums" per se).

18 June, 2013 at 6:19 pm

A truncated elementary Selberg sieve of Pintz | What's new[…] that holds for as small as , but currently we are only able to establish this result for (see this comment). However, with the new truncated sieve of Pintz described in this post, we expect to be able to […]

20 June, 2013 at 9:45 am

OktawianTheorem 1 (Bounded gaps between primes) There exists a natural number {H} such that there are infinitely many pairs of distinct primes {p,q} with {|p-q| \leq H}.

I want to add some small observation about this theorem. If H is different than 3 that means there is not infinitly many pairs of even numbers like (n, n+2) and this unproven statement will be proven (or denied) if we will know the smallest H.

22 June, 2013 at 7:39 am

Bounding short exponential sums on smooth moduli via Weyl differencing | What's new[…] in (using Lemma 5 from this previous post) we see that the total contribution to the off-diagonal case […]

22 June, 2013 at 8:01 am

Terence TaoThere is a new blog post of Emmanuel Kowalski at http://blogs.ethz.ch/kowalski/2013/06/22/bounded-gaps-between-primes-the-dawn-of-some-enlightenment/ announcing an improvement in the d_3 bounds on smooth moduli (which should also lead to improvements in the Type III bounds). Interestingly, Emmanuel claims that Weyl differencing may be avoided. He also mentions an older paper of Heath-Brown http://matwbn.icm.edu.pl/ksiazki/aa/aa47/aa4713.pdf that uses some similar ideas, I think I will try to look at that paper first.

23 June, 2013 at 9:14 pm

The distribution of primes in densely divisible moduli | What's new[…] improves upon previous constraints of (see this blog comment) and (see Theorem 13 of this previous post), albeit for instead of . Inserting Theorem 4 into the […]

23 June, 2013 at 10:23 pm

Terence TaoThis thread and the companion Type I/II thread are being rolled over to

https://terrytao.wordpress.com/2013/06/23/the-distribution-of-primes-in-densely-divisible-moduli/

From past experience with polymath projects, now that our understanding of the project is more mature, the pace should settle down a bit from the crazily hectic pace of the last few weeks; I think we’re getting near the finish line and perhaps in a couple more weeks we will find a good place to “declare victory” and turn to the writing part of the project.

25 June, 2013 at 8:34 am

Zhang’s theorem on bounded prime gaps | random number theory generator[…] Crucially, if is composite then we can surpass square root cancellation just slightly. […]

7 July, 2013 at 11:17 pm

The distribution of primes in doubly densely divisible moduli | What's new[…] by Lemma 5 of this previous post and the bound is bounded […]

11 July, 2013 at 8:15 pm

Gergely HarcosMinor correction: in the proof of Theorem 4, “ equals […] when ” should be “ equals […] when “.

[Corrected, thanks – T.]