In a recent paper, Yitang Zhang has proven the following theorem:

Theorem 1 (Bounded gaps between primes)There exists a natural number such that there are infinitely many pairs of distinct primes with .

Zhang obtained the explicit value of for . A polymath project has been proposed to lower this value and also to improve the understanding of Zhang’s results; as of this time of writing, the current “world record” is (and the link given should stay updated with the most recent progress.

Zhang’s argument naturally divides into three steps, which we describe in reverse order. The last step, which is the most elementary, is to deduce the above theorem from the following weak version of the Dickson-Hardy-Littlewood (DHL) conjecture for some :

Theorem 2() Let be an admissible -tuple, that is to say a tuple of distinct integers which avoids at least one residue class mod for every prime . Then there are infinitely many translates of that contain at least two primes.

Zhang obtained for . The current best value of is , as discussed in this previous blog post. To get from to Theorem 1, one has to exhibit an admissible -tuple of diameter at most . For instance, with , the narrowest admissible -tuple that we can construct has diameter , which explains the current world record. There is an active discussion on trying to improve the constructions of admissible tuples at this blog post; it is conceivable that some combination of computer search and clever combinatorial constructions could obtain slightly better values of for a given value of . The relationship between and is approximately of the form (and a classical estimate of Montgomery and Vaughan tells us that we cannot make much narrower than , see this previous post for some related discussion).

The second step in Zhang’s argument, which is somewhat less elementary (relying primarily on the sieve theory of Goldston, Yildirim, Pintz, and Motohashi), is to deduce from a certain conjecture for some . Here is one formulation of the conjecture, more or less as (implicitly) stated in Zhang’s paper:

Conjecture 3() Let be an admissible tuple, let be an element of , let be a large parameter, and definefor any natural number , and

for any function . Let equal when is a prime , and otherwise. Then one has

for any fixed .

Note that this is slightly different from the formulation of in the previous post; I have reverted to Zhang’s formulation here as the primary purpose of this post is to read through Zhang’s paper. However, I have distinguished two separate parameters here instead of one, as it appears that there is some room to optimise by making these two parameters different.

In the previous post, I described how one can deduce from . Ignoring an exponentially small error , it turns out that one can deduce from whenever one can find a smooth function vanishing to order at least at such that

By selecting for a real parameter to optimise over, and ignoring the technical term alluded to previously (which is the only quantity here that depends on ), this gives from whenever

It may be possible to do better than this by choosing smarter choices for , or performing some sort of numerical calculus of variations or spectral theory; people interested in this topic are invited to discuss it in the previous post.

The final, and deepest, part of Zhang’s work is the following theorem (Theorem 2 from Zhang’s paper, whose proof occupies Sections 6-13 of that paper, and is about 32 pages long):

The significance of the fraction is that Zhang’s argument proceeds for a general choice of , but ultimately the argument only closes if one has

(see page 53 of Zhang) which is equivalent to . Plugging in this choice of into (1) then gives with as stated previously.

Improving the value of in Theorem 4 would lead to improvements in and then as discussed above. The purpose of this reading seminar is then twofold:

- Going through Zhang’s argument in order to improve the value of (perhaps by decreasing ); and
- Gaining a more holistic understanding of Zhang’s argument (and perhaps to find some more “global” improvements to that argument), as well as related arguments such as the prior work of Bombieri, Fouvry, Friedlander, and Iwaniec that Zhang’s work is based on.

In addition to reading through Zhang’s paper, the following material is likely to be relevant:

- A recent blog post of Emmanuel Kowalski on the technical details of Zhang’s argument.
- Scanned notes from a talk related to the above blog post.
- A recent expository note by Fouvry, Kowalski, and Michel on a Friedlander-Iwaniec character sum relevant to this argument.
- This 1981 paper of Fouvry and Iwaniec which is the first result in the literature which is roughly of the type . (This paper seems to give a related result for and , if I read it correctly; I don’t yet understand what prevents this result or modifications thereof from being used in place of Theorem 4.)

I envisage a loose, unstructured format for the reading seminar. In the comments below, I am going to post my own impressions, questions, and remarks as I start going through the material, and I encourage other participants to do the same. The most obvious thing to do is to go through Zhang’s Sections 6-13 in linear order, but it may make sense for some participants to follow a different path. One obvious near-term goal is to carefully go through Zhang’s arguments for instead of , and record exactly how various exponents depend on , and what inequalities these parameters need to obey for the arguments to go through. It may be that this task can be done at a fairly superficial level without the need to carefully go through the analytic number theory estimates in that paper, though of course we should also be doing that as well. This may lead to some “cheap” optimisations of which can then propagate to improved bounds on and thanks to the other parts of the Polymath project.

Everyone is welcome to participate in this project (as per the usual polymath rules); however I would request that “meta” comments about the project that are not directly related to the task of reading Zhang’s paper and related works be placed instead on the polymath proposal page. (Similarly, comments regarding the optimisation of given and should be placed at this post, while comments on the optimisation of given should be given at this post. On the other hand, asking questions about Zhang’s paper, even (or especially!) “dumb” ones, would be very appropriate for this post and such questions are encouraged.

## 137 comments

Comments feed for this article

4 June, 2013 at 12:52 pm

Terence TaoI haven’t yet read Sections 6-13 of Zhang’s paper in detail yet, but plan to begin very shortly. A few obvious questions come to mind: firstly, how exactly does Zhang’s results differ from previous results of Fouvry-Iwaniec, Friedlander-Iwaniec, and Bombieri-Friedlander-Iwaniec? (Later on I guess we can ask the question of what is the new twist in Zhang’s work that is not present in the previous works, but first I want to understand why the results are different. For instance Fouvry-Iwaniec’s result already looks extremely similar to Zhang’s result; Zhang has more than one residue class mod q whereas FI has just one residue class mod q but otherwise it looks about the same.)

4 June, 2013 at 1:47 pm

v08ltuOne issue with Fouvry-Iwaniec is that they do not detect primes, but almost-883-primes. This is in general much easier. On a more technical basis, the relevant counting function akin to Lambda decomposes into suitable bilinear convolutions. This is not true for Lambda, as they state in the middle of page 136 (the decomposition is unsuited).

Another issue you state is that the residue is fixed as the modulus varies, though Zhang has a different method to deal with that.

For BFI, they are able to detect primes, but not with the correct sort of arithmetic in the convolutions to get true error bounds. They describe this well in their introduction. For instance, in their Theorem 8, one lacks the absolute value in the PNT error. In their Theorem 9, they are able to handle some products of moduli, but again the error over some of them is cancellative, not in absolute value. In their Theorem 10, they are not able to obtain that the sign function is well-factorable (as they indicate page 207), so again they don’t get this with absolute value.

I am not too familiar with Friedlander-Iwaniec (at least this paper), and it seems written in a different ansatz. Perhaps one can just say that Zhang (among all other things) imitates and enlarges their work for the ternary divisor function to Lambda.

4 June, 2013 at 1:00 pm

Terence TaoWhen reading through a technical portion of a paper for the first time I find it convenient to activate some “cheats” to factor out some difficulties in the paper. Here are some such “cheats” that I plan to use (some of them suggested to me by Ben Green):

1. In order to get the basic ideas of the argument we may want to not keep track of exactly how many multiples of we are losing at each stage, and just use notation such as instead. Of course, this conflicts with our other goal of trying to numerically optimise as much as possible, but perhaps we can run these two goals in parallel, with one set of comments trying to get the “big picture” without being too careful with explicit constants, and the other set of comments focusing instead on the numerical details.

2. Given that the final bound in MPZ gains an arbitrary power of a logarithm, we should be very happy to lose factors of at various places in the argument. So factors such as the divisor function , which has logarithmic order on the average, are likely to be harmless. So we might pretend as a first approximation that all numbers of interest have only divisors.

3. Ben Green suggests replacing (or ) by the Mobius function , as this has slightly better cancellation properties and the Heath-Brown type identities for this function are a bit cleaner. Up to logarithmic factors (which by 2. should be practically irrelevant), the von Mangoldt and Mobius functions should morally be about the same for this analysis.

4. Let’s pretend that rough cutoffs (such as restricting n to ) behave "as if" they are smooth, in particular "boundary issues" coming from the edges of a rough cutoff should be negligible in practice.

5. If we ignore all small primes (e.g. by the W-trick), then almost any pair of "independent" natural numbers should be coprime/squarefree, so it should be safe to pretend that this in fact occurs all the time (except when dealing with pairs of natural numbers which are forced by design to have a common factor or a square factor, of course).

4 June, 2013 at 1:26 pm

v08ltuSections 6-12 of Zhang are basically BFI, which I consider the most readable of that genre (specially the intro, though they usually describe what they are doing quite well). The most pressing trick Zhang has to take account of is, particularly Section 10, that he not dealing with just one residue class per modulus, per those with a multiplicative CRT structure.

Zhang 6 is the “endgame” decomposition that BFI applies to various target problems in their 15-17.

Zhang 7 is BFI 3, Zhang 8 is BFI 4, Zhang 9 is BFI 5, Zhang 10 is BFI 6-7 (page 37 and 10.11 is the 6/7 divider). Zhang 11&12 probably are best read directly, as BFI have multiple sections to handle R_1 in various ways and I see no absolute correlative. Zhang’s methods largely follow his previous parts (he calls it analogous himself). He uses a slight Kloostermann variant (Lemma 11) than the literature ascribes, but I don’t feel there is a lot new in that.

I might actually suggest starting reading 13 and 14 first (after 6 perhaps, to give motivation about why one cares about trliinear estimates), as they are independent, and that is more what is new. I extended a post on Mathoverflow to give a brief overview of what he does.

http://mathoverflow.net/questions/131185/philosophy-behind-yitang-zhangs-work-on-the-twin-primes-conjecture/132701#132701

See the second half, starting with “Let me try again”. Maybe it can be copied here somehow.

I have actually not perused the earlier sections (4&5) too carefully, but I take it they have already been sorted out.

4 June, 2013 at 1:42 pm

Terence TaoThanks for this! Yes, Sections 4 and 5 seem pretty well understood, my previous blog post gives an alternate presentation for this material that leads to better constants for a given value of .

One thing that puzzles me is why having to deal with many residue classes per modulus is such a big deal. Suppose for instance that one could prove with just a single residue class mod d for every d, more precisely that

for every residue class . If one then selects the residue class at random over all classes in and then takes expectations, one concludes that

On the other hand, from crude estimates one has

so from Cauchy-Schwarz one gets Zhang's bound . So it seems to me that one could reduce to the case of a single residue class per modulus pretty easily. (Even if there was something wrong with the above argument, I would certainly use the "one residue class per modulus" hypothesis as a "cheat" when reading through the paper for the first time.)

4 June, 2013 at 1:57 pm

v08ltuI am sorry, I did not explain this well (I am not really in this field, just did analytic number theory may be a decade ago). First I should specify to make clear (though you have it right), It is not just a “single residue class” per modulus as I announced, but a *fixed* value as the modulus varies. The issue, then I suspect (did not think completely) is that the -uniformity in your first display is horrid, perhaps nonexistent as you allow it to be as big as .

4 June, 2013 at 5:34 pm

Terence TaoAh, got it – the dependence on a is key. Extrapolating from this, it suggests that Zhang’s argument must use at some point that the coefficients of the polynomial are fixed rather than growing in ; I guess we’ll see how the polynomial P is used (beyond just the Chinese Remainder Theorem) as we continue reading through the paper.

4 June, 2013 at 2:05 pm

Emmanuel KowalskiSome remarks:

(1) The oldest Fouvry-Iwaniec paper was written before Heath-Brown’s identity was known, so there is some inefficiency there.

(2) For the later papers and Bombieri-Friedlander-Iwaniec, there is a dependency in the residue class , which is seen as a positive integer, when applying spectral theory of automorphic forms; this dependency is polynomial in (so one can handle all up to a a power of logarithm, but not more).

(3) Combining the two: in the “old” Fouvry-Iwaniec paper, the result is expressed for a single class , but the implied constant is uniform for (where looking at integers up to ), so summing over residue classes is OK, wherever they are located.

4 June, 2013 at 1:52 pm

Alastair IrvingIn the definition of in Conjecture 3 I think it should be .

[Corrected, thanks - T.]4 June, 2013 at 1:54 pm

Terence TaoJust starting to look at Section 6 of Zhang. There are two easy reductions: by using Bombieri-Vinogradov, one restricts the range of from to for some arbitrarily small , and one also replaces by the von Mangoldt function . Another reduction is to let mostly avoid all small primes, in that . This latter restriction seems analogous to the W-trick; as a "cheat", I take it to mean that we can assume that d is in fact not divisible by any small prime (e.g. smaller than ).

Then comes the first tricky thing, which is some Heath-Brown type decomposition of into many different Dirichlet convolutions, which are then classified as "Type I", "Type II", and "Type III". It will take some time for me to digest what this decomposition is doing and what Type I, Type II, and Type III are (presumably they are related to the Type I and Type II sums in the Vinogradov-Vaughan estimation of exponential sums in terms of linear and bilinear sums; and presumably the earlier papers of FI and BFI will be helpful here). All the Dirichlet convolutions have to obey a certain Siegel-Walfisz theorem (hypothesis A_2 on page 7) but this is presumably standard. That's about as far as I've got for now; I have to do some other things for a few hours but will return to this.

4 June, 2013 at 2:29 pm

v08ltuI don’t think Type I and II are as different here as they are in some works in this genre, and indeed dispersion handles them on an equal basis for awhile.

The idea behind the decomposition is to write Lambda as a convolution of (say) 20 things, with each thing in a dyadic (or such) interval at a certain size parameter. Then just by combinatorics, some subproduct of the 20 things must (hopefully) have a factor in a desirable range, that is, one to which a bilinear form can be admirably applied. You group the subproduct and its complement, calling these and with the A-conditons on pages 6&7 met. The nuance between whether something is Type I or II only comes up a lot later, and indeed Zhang just declares this parameter to be the dividing line.

Type III is sort of the “leftovers” from the bilinear parts, where you could not force a divisor in a desired range. Maybe the way you go thinking about it is: I know I can prove a suitable bound for such-and-such bilinear forms with these parameters – now, what is left in the decomposition? In fact (top page 26), the only time you get Type III (that is, don’t get Type I/II) is when but , which is rather restricted, given that the decrease.

Section 15 of BFI has a picture/cartoon of what is going in their case.

I’ve seen Type II estimates dealt with by bounding them in terms of linear ones (for instance Duke/Friedlander/Iwaniec, Annals paper on equidistrbution of roots to prime moduli), and here is seems Type I estimates are dealt with “bilinearly”, and the whole nomenclature is confusing to me at least, for authors differ.

4 June, 2013 at 3:26 pm

v08ltuThe closest that BFI has to a trilinear type estimate is their section 11 (Theorem 4), and if you read their Section 15 carefully (bottom half of page 247), you can see that in a similar sense to Zhang, this is the “leftover” of to what bilinear does not apply. I am no expert on BFI though, only looking at this maybe a decade ago on a request from someone else. They handle

The picture for Zhang is actually much easier than in BFI (for they use a multitude of methods in different ranges). If there is a partial sum in the range you win (either directly or by reflection in switching the labels about 1/2) via applying the dispersion method to the relevant bilinear form. Also if (maybe rather than 3) the desired result is trivial. So you need everything in the decomposition to be in rather specific ranges for this to go bad. Essentially they either have to be small, or to be at least (the scope of that interval). They have to total 1, and having 4 of them large puts the sum of 2 of them near enough 1/2. Similarly, if only 2 are large, then will exceed 3/8 by enough (upon defining “small” appropriately, which also has to do with how many items are in the decomposition, 20 in Zhang’s case), and if only is large, as noted above the result is trivial. This leaves the middleroad case as there being 3 large values.

4 June, 2013 at 2:53 pm

David RobertsShould equation (1) have a \varpi instead of \pi?

[Corrected, thanks - T.]4 June, 2013 at 5:21 pm

v08ltuOK, I think I have chased down much of the error analysis of versus in the later sections. The notation of 7-12 conflicts with that of 13&14 to partial degree. In the former, I think one needs on a distribution basis, and the worst bound seems to come near the bottom of page 43, where I think Zhang’s display (third from bottom, with ) actually is wrong (too small) by a factor of , and I end up getting the necessitated , or in essence . This bothers , as one must be able to choose in a sufficient interval via Lemma 4, and this cannot be done while keeping small unless the -region is large. I get from all this (maybe there is a quadratic correction, I was not that careful).

I have not totally checked 13&14 yet (where one swaps Q and R to some flavor). I think the main -dependence should be here. The upper bound from (14.8) seems though this depends on the precise Section 6 decomposition. You also need or the like to win in the first place. This means Lemma 4 must produce a divisor in a range of length to win, or and with Lemma 4 as/is (the latter from condition). The first limitation is more stringent for of probable interest, even if 46 becomes 20 or 30. Namely, even with this is zero at , which is what I suggested the argument could handle elsewhere. So there could be a dual optimization with and if this is in fact correct.

4 June, 2013 at 5:39 pm

Terence TaoGreat! I am several sections behind with the reading but hopefully I or someone else) will be able to confirm your calculations in the near future when I catch up. In the meantime, I’ll enter your value for provisionally on the wiki page (http://michaelnielsen.org/polymath1/index.php?title=Bounded_gaps_between_primes#World_records).

4 June, 2013 at 6:13 pm

v08ltuWell, there is also an issue with the term in Section 14, as I get that one needs . Again if I assume various optimizations can be made, it is around that there is no room for or . But I would say this is more of a “limit” of what I could see working in the current form, rather than a “record”, even provisionally. I suspect that the “record” as in what I could currently write down – is maybe three times smaller.

It could also be the case, if is more prevalent for prime gaps, that one could aim to make smaller so as to enlarge reflexively . In this aspect, I find the above to be the worst, though maybe the Fourier analysis with the in Section 14 can be improved (the -bound there is a bit unneeded IMO, but I have yet to detail it out). That would leave something like as the limit. Is there an automated method to determine a prime gap size from a given and yet? I don’t want to aim to optimize the wrong one, though I admit all I have done so far is catalogue the error terms of Zhang, to see what looks feasible, and maybe level of distribution is in interest by its own.

4 June, 2013 at 6:22 pm

Terence TaoOK, I will annotate these values accordingly onthe wiki.

Basically, the dependence of on and is almost independent of : is basically (up to a small correction) as long as is not incredibly small (e.g. should be OK). So as a first approximation one could even treat as being negligible and see if that helps increase the value of . (The prime gap H one gets at the end of the day is roughly .)

4 June, 2013 at 7:12 pm

v08ltuI see some improvement room in 13&14. Essentially, in 14.8 Zhang uses while in 13.11 he uses as the worst possible value. But since : when the first condition holds, then is small. So in fact, one gets an improved 14.8. If I did this correctly, one achieves that (else a trivial bound applies), and then in the worst case one gets 14.8 as from 13.11 (which is itself improvable, as Zhang saves rather than epsilon). Zhang essentially bounds instead. So one only needs in (14.8). This is still worse that the bound I computed wth (14.5). I am still also not sure that (14.8) is all needed with the Fourier bound below.

4 June, 2013 at 9:09 pm

v08ltuIf I have computed correctly, with my assumed improvements to how Zhang wins with his choice of , you will need in Sections 13 and 14 to have and , so the ballpark is . This includes my reworking of his definition to be rather than . Again I say this is the “goal”, not something I have written down. It might be that the could be reduced to , I don’t really know. Zhang right now has essentially there as the margin that the needs to beat the Fourier losses, but I can see the room to improve.

5 June, 2013 at 12:55 am

v08ltuIn fact it might be a bit better than this. You can also use a 2-variable estimate in 13.7 in some cases (highly nontrivial Deligne result, but applied directly), see (2.4) of Friedlander/Iwaniec. I think this means that when you are Ok, which again pokes up the bound of (14.8), to . This then now gives , and by now this term is better in all ways that the from (14.5) and congruence sums.

5 June, 2013 at 1:46 pm

v08ltuIn reply to my comment about the 2-variable Deligne estimate as in (2.4) of FI for Zhang’s 13.7: you can also try to use the FI Theorem 4. This produces nontrivial results for some more -ranges, though by now I think the Section 14 -limit is elsewhere. However, with (this worst case appears on Kowalski blog too), the bound from FI Theorem 4 is too large. All too, this seems to be what Zhang means near the very end of introduction, as he says FI gives an efficient estimate valid only for while he needs as large as , and the difference in exponents is exactly so.

4 June, 2013 at 8:46 pm

Terence TaoBack to reading Section 6 of Zhang. The basic decomposition of the von Mangoldt function is via the Heath-Brown identity (Lemma 6 of Zhang). I have not had occasion to use this identity before (relying instead on the Vaughan identity when I needed to decompose von Mangoldt into truncated Dirichlet convolutions), so I am recording here how the identity is proven. If , then we have the Dirichlet convolution identities

and

and hence

(*)

where is the n-fold Dirichlet convolution of f.

Now let be a little bit bigger than (it seems to me by the way that there is a tiny misprint in Lemma 6 of Zhang in that needs to be larger than rather than but this is surely irrelevant). We split where is restricted to the interval and similarly for . Then on the interval we have

since one cannot factor an element of [x,2x] as the product of 10 elements larger than times a bunch of other factors. Splitting and expanding out using the binomial formula and (*), one eventually gets to write on [x_*,2x_*] as a whole bunch of convolutions:

.

This is the Heath-Brown identity of order 10. (Why does Zhang use 10? Could this be a parameter to optimise over?)

After some routine dyadic decomposition and dealing with some minor error terms, we can thus approximately express as a combination of a polylogarithmic number of convolutions basically of the form

(**)

for some and parameters with (roughly speaking) and . Here is a dyadic (or slightly finer than dyadic) localisation to scale whose precise definition does not seem to be terribly important. Presumably the log factor is irrelevant since we are going to be gaining lots of log factors eventually.

There is one class of convolution (**) which is easy to dispose of, and that is the contributions when one of the , say , is very large, say bigger than . This is because in this case (**) becomes a combination of indicator functions of quite long arithmetic progressions of spacing less than , and so from the Chinese remainder theorem they are going to be very well distributed modulo for (almost) every . (Here it is clear, as v08ltu said, that there are several places where Zhang has been using a where he could have used an infinitesimal epsilon instead, so there should be some cheap opportunities to save some factors of here and there. I won’t try to seriously optimise these things though until maybe a second pass through these sections when I understand things better.)

For the remaining convolutions of the form (*), Zhang classifies them into “Type I”, “Type II”, or “Type III”, depending on the various sizes of the . I have not yet digested what these types are, but that is probably what I will turn to next in my reading.

4 June, 2013 at 9:17 pm

v08ltuZhang uses 10 to make be small. This is used in the last sentence on page 25, when he gets from it. I don’t think there is much to be had from fiddling with this, if you take 10 to be 1000, the argument is the same in the convolutions. You can get by with 9, if is small enough.

4 June, 2013 at 9:32 pm

castoverReblogged this on riemannian hunger and commented:

Dr. Terence Tao has arranged for an online reading seminar to go through Dr. Yitang Zhang’s recent proof of the “Bounded Gaps Conjecture.” To say that this is a wonderful opportunity to pick something valuable up regarding a field that’s very hot right now in the research community would be the ultimate understatement.

4 June, 2013 at 11:43 pm

The polymath project: massively collaborative mathematics | Adsu's Blog[…] raiz del post de Tao Online reading seminar for Zhang’s “bounded gaps between primes”, he sabido de la existencia del proyecto polymath […]

5 June, 2013 at 6:04 am

Polymath Project | Math Online Tom Circle[…] http://terrytao.wordpress.com/2013/06/04/online-reading-seminar-for-zhangs-bounded-gaps-between-prim… […]

5 June, 2013 at 6:49 am

Klaus LangeWonderful work so far. But with this methods I think H < 10^3 isn't reachable.

5 June, 2013 at 8:57 am

Gergely HarcosI think is realistic in the light of comments on http://terrytao.wordpress.com/2013/06/03/the-prime-tuples-conjecture-sieve-theory-and-the-work-of-goldston-pintz-yildirim-motohashi-pintz-and-zhang/ and http://sbseminar.wordpress.com/2013/05/30/i-just-cant-resist-there-are-infinitely-many-pairs-of-primes-at-most-59470640-apart/

5 June, 2013 at 9:08 am

Johan CommelinIt seems that after the serious attacks began (Morrison, Tao, et al.) and the polymath project launched they may have reduced the bound by a factor 200 in about 1 week. It will probably not go that fast from now on, but it looks very promising in my eyes.

5 June, 2013 at 11:49 am

Gil KalaiI have a very naive background question. Suppose instead of “primes” you ask about integers with odd number of (all) distinct prime factors (namely mobius function=1). What is known then about gaps?

5 June, 2013 at 1:44 pm

Terence TaoI think one can get gaps of size at most 2 by the following argument. Let be a moderately large number, and let be the product of all the primes less than w. By the Chinese Remainder Theorem we can find three consecutive residue classes which are square-free mod W^2, which implies that about of each of these classes will consist of square-free numbers. On the other hand from the prime number theorem in arithmetic progressions we know that takes the values +1 and -1 asymptotically equally often in each of these three residue classes. Thus the density of in each of these three classes is , which is greater than 1/3 for w large enough (actually w only needs to be as big as 11 or so), and the claim then follows from the pigeonhole principle.

This argument comes really really close to the optimal gap of size 1, but I don't see a way to prevent the absurd possibility that mu alternates in sign on every connected block of squarefree numbers.

5 June, 2013 at 6:12 pm

GhaithI don’t follow the first part of the argument where one finds three consecutive residue classes each consisting mostly of square-free numbers. Can you clarify that part please? Thanks.

5 June, 2013 at 6:51 pm

Terence TaoOne can take for instance the residue classes , , and ; in each such class the density of square-free numbers is .

7 June, 2013 at 2:17 am

GhaithThanks, though my question was really about the density (whether it was simple to see it somehow); sorry I wasn’t being more precise. In any case, I had figured something shortly after.

5 June, 2013 at 6:30 pm

v08ltuMaybe I am misunderstanding the question. For $\lambda$, Hildebrand showed that all eight sequences of length 3 occur infinitely often, while Buttkewitz and Elsholtz improved this to length 4 in AP.

http://mathoverflow.net/questions/78623/fluctuations-of-liouville-function

http://jlms.oxfordjournals.org/content/84/3/578

5 June, 2013 at 1:48 pm

v08ltuI want to clairfy, is this a question about or (the Carmichael function)?

9 June, 2013 at 12:54 am

Gil KalaiDear v08ltu, I asked about but my main point was to ask if it may provide a simpler “toy question” for some of the issues/techniques for the prime case, (so also qualifies for that).

5 June, 2013 at 12:59 pm

A (partial) answer to my Goedelian conundrum? | cartesian product[…] Online reading seminar for Zhang’s “bounded gaps between primes” (terrytao.wordpress.com) […]

5 June, 2013 at 1:47 pm

Yitang Zhang latest… | cartesian product[…] Online reading seminar for Zhang’s “bounded gaps between primes” (terrytao.wordpress.com) […]

5 June, 2013 at 3:00 pm

bengreenA comment from having a look at Section 6 today, without any books to hand, is that it was deeply unclear to me how these “Siegel-Walfisz” statements are supposed to hold for such large moduli (the Siegel-Walfisz result for primes only allows moduli up to ). It seems, however, that all of the for which this is claimed are Dirichlet convolutions of two things, and I believe that gets you out of jail as you can expand into characters modulo and use Parseval. I haven’t checked the details of that, but it definitely makes Section 6 tough going for the uninitiated! I may also have overlooked something obvious. Terry, in answer to your comment above, I’m not sure there’s any good reason he applies HB with parameter 10; I think it could equally well be 10000. Maybe 10 is the smallest workable value.

5 June, 2013 at 3:29 pm

v08ltuIn the middle of page 246 of BFI, they similarly give 0 details about applying S-W. I had wondered about this at one point, but then I thought you just executed an innermost sum and got cancellation from that. Maybe I need to think more.

In another comment, I pointed out that Zhang has 10 so that (bottom page 25) when , making his decomposition go through as-is. Here 9 also works if is small enough.

5 June, 2013 at 3:41 pm

Mark LewkoWhere in section 6 is he making use of a Siegel-Walfisz-type estimate for large moduli? (not that I’m not having my own issues with the section)

5 June, 2013 at 3:45 pm

v08ltuI think the point is: for moduli larger than , the bound is in A2 is trivial.

6 June, 2013 at 4:34 am

bengreenYes, I see that in my comment on SW I was mistaken in thinking that large moduli are required. I was confused by the fact that no restrictions are replaced on r – however, as pointed out by v08ltu, the way the bound is stated means that it is trivial unless r is less than . So I’m happy that nothing at all underhand is going on here! I guess that for the purposes of initial discussion we should pretty much ignore this SW condition.

5 June, 2013 at 4:43 pm

Terence TaoReturning to Section 6 again. Zhang is using the Heath-Brown identity to decompose (on ) into components that factor as Dirichlet convolutions of factors supported at moderate scales (in particular, one doesn’t want to sum “rough” functions like over very large scales, e.g at scales close to ).

I’m more used to the Vaughan identity, which in this context would split

for some with . This splits into Type I sums and (which contain one smoothly varying factor at a large scale, so that linear exponential sum or equidistribution estimates may be applied) and a Type II sum (which contain two rough factors , that both span large scales and for which one can hope to use bilinear estimates)

With Heath-Brown, one instead splits into convolutions of the shape

(*)

where and are small (less than , say); also we can easily dispose of those sums where one of the is huge (larger than ). There are some log factors here which do not appear to be important and so I omit them here.

The remaining components of are then classified into three types, which are roughly as follows (pages 6-7 of Zhang):

* TYPE I: convolutions that can be expressed as , where lives at some scale between and , which forces to live at the opposite scale between and (ignoring some small factors).

* TYPE II: These are similar to Type I, i.e. expressions of the form , but now lives at scales very close to (between and ), so also lives close to (between and ).

* TYPE III: These convolutions factor as where is supported at some scale M with for all i (or equivalently for all .

In the Type I and Type II cases, the factor is required to obey a Siegel-Walfisz property (A_2), namely good equidistribution properties in progressions of very small moduli (polylogarithmic in size), but this property is automatic for any of the factors or once it is of non-trivial size (with the former coming from the Siegel-Walfisz theorem), and the property is preserved by convolution, so this does not seem to be an issue.

Now we claim that all convolutions (*) can be factored into one of the three types I,II,III (after excluding the case when one of the is larger than ).

If there is any subset of the factors in (*) that multiply to something at scale between and then one is going to be in Type I or II by calling that product either or as appropriate. So the only remaining case is in the rather unusual situation in which all subproducts in (*) have scale either less than or . Let’s call this the “gap property”.

Now order . If then we are in type III (taking to be everything except for the terms). So by the previous discussion we may assume that . In particular this makes for , and also . But if we remove all the small factors one by one using the gap property we conclude that , a contradiction. So everybody gets to be in Type I, Type II, or Type III. I begin to see what v08ltu and Ben were saying that the 10 in Heath-Brown could be replaced by any larger constant without much impact on this decomposition (other than to pile on some more logarithmic factors which are going to eventually become irrelevant anyway).

Zhang says that Type I and Type II sums are to be controlled by the dispersion method and Type III sums require the Birch-Bombieri result on some sort of Kloosterman sum (Lemma 12). I guess this will become clearer in later sections…

5 June, 2013 at 5:03 pm

v08ltuIt might be better to think of it: we can bound well-enough the relevant ensuing sums, bilinear or trilinear – in range A via some method, in range B for via some other method (here closely related), and in range C by some third method (and maybe in range D for trivial reasons). Fortunately these ranges cover all possibilities. Compare Section 15 of BFI (where the methods for the applied Theorem 1-3 differ, and they need a couple of additional “special” cases a la Zhang’s trilinear work to make it all glue).

When going through the later parts, it is worthy to pin down exactly what ranges the various arguments indeed work (they return nontrivial estimates in larger ranges than merely A-C as Zhang demands). Some of the FI papers in this genre deliberately leave the combinatorial arguments to the end, trying to shine on the other parts, and write them to maximum flexibility in the parameters. Zhang does not gives this emphasis. For instance the Type I/II splitting point is essentially freely-chosen by him to be as close as possible to with Type I still valid (evident in Section 11), while he instead could have determined how low the Type II estimates really work. Away from endpoints of ranges, he is winning by a power of in each Type, so there is room to phrase it either way.

5 June, 2013 at 7:09 pm

Terence TaoThat’s not a bad philosophy to keep in mind going forward. I guess as a cartoon, the Type I/II/III distinction is as follows:

* Type I/II components take the form where are supported at medium sized scales (more precisely, between to , but presumably this precise range will be negotiable);

* Type III components take the form where are supported at medium sized scales and are “smoothly varying” on their support (indeed they are constant on their support).

* Type 0 components (which were disposed of very quickly) take the form where is huge (at least , to be precise).

I got confused for a while because I thought Type I and Type II here somehow corresponded to Type I and Type II in Vingradov-Vaughan exponential sums, but there does not seem to be such a correspondence after all (here Type I and Type II look to be much closer to each other than to Type III). If anything, Type I in Vinogradov-Vaughan is a bit like Type 0 here, and Type II in Vinogradov-Vaughan is like Type I/II here.

Incidentally, for my own edification, I extracted the precise combinatorial lemma Zhang is relying on in Section 6 (though as you say we may end up doing a different combinatorial optimisation once we figure out what the limits of the Type I, Type II, and Type III analyses are):

LemmaLet , and let be non-negative numbers summing to 1. Then at least one of the following three statements hold:(Type I/II): There is a partial sum that lies in .

(Type 0): There is an that is larger than .

(Type III): There exist distinct with and . (This case can only occur for .)

Zhang basically applies this lemma for . (This already restricts to , but there may be room to maneuver here.)

ProofSuppose Type I/II never occurs, then every partial sum is either “small” in the sense that it is less than , or “large” in the sense that it is greater than .Call a summand “useless” if it cannot be used to turn a small partial sum into a large partial sum, thus there are no such that is small and is large. We then split where are the useless elements and are the useful elements.

By induction we see that if and is small, then is also small. Thus every sum of useful elements is either less than or larger than . Since a useful element must be able to convert a small sum to a large sum, we conclude that every useful element has size greater than . We may assume we are not in Type 0, then every useful element is larger than and is less than . In particular, there have to be at least three useful elements, otherwise cannot be as large as 1. As , we have , and we conclude that the sum of any two useful elements is large. Taking to be three useful elements we land in Type I/II.

5 June, 2013 at 7:32 pm

v08ltuIn the statement of (Type I/II) I perceive the interval should be , not .

5 June, 2013 at 7:38 pm

Terence TaoWell, it’s equivalent: if lies in then the complementary sum lies in . (I guess I should have explicitly mentioned that I was relying on that symmetry though.)

5 June, 2013 at 5:15 pm

v08ltuJust a note, the dispersion method also requires Kloosterman sums (well known ones), this will be clearer later… The idea is that you approximate the characteristic function of an interval smoothly, and with congruence conditions modulo this gives a sum artistically like with small error when is of decent size. In this dispersion context, what you do is square the discrepancy via Cauchy, giving and the above contributions from the 3 parts all cancel out. So you are left to handle the higher harmonics, and this is where an appeal to (incomplete) Kloostermann sums shall avail.

5 June, 2013 at 5:20 pm

v08ltuBah, that triple sum is written wrong (and is perhaps more complicated than need be). It should also contain of course, where is the smooth approximant. That’s what Fourier analysis is all about (Lemma 7) changing a sum over into the transform.

5 June, 2013 at 7:50 pm

More narrow admissible sets | Secret Blogging Seminar[…] A reading seminar on Zhang’s Theorem 2. 2) A discussion on sieve theory, bridging the gap begin Zhang’s […]

6 June, 2013 at 4:39 am

E.L. WistyReblogged this on Pink Iguana.

6 June, 2013 at 5:24 am

Américo TavaresThere’s a typo in Polymath1 World records: instead of 397,974 should be 387,974.

[Hmm, I can't locate this error - maybe it has already been fixed. In any event, anyone can register to edit on the wiki. -T.]6 June, 2013 at 5:50 am

bengreenIf you increase the number of terms in Heath-Brown’s identity upwards from 10, it looks ever more like Linnik’s identity, which asserts that , where is the number of representations with each . In fact, the two identities may be proven in a very similar way by looking at Dirichlet series: for Linnik, one looks at

,

whereas for Heath-Brown one replaces here by , where . Zhang takes .

One advantage of Heath-Brown over Linnik is that every term in Heath-Brown is going to be bounded by a fixed power of the divisor function (20, in fact, as in the discussion at the bottom of page 6 of Zhang). The disadvantage is that one has all those terms that are destined for nothing more than to be triangle-inequalitied or Cauchy-Schwarzed away, but otherwise seem somewhat harmless.

I wonder whether, for the purposes of gaining a handle on what Zhang is doing, we should pretend we are working with the conceptually simpler Linnik identity, but have magically truncated it after (say) 100 terms.

I guess one starts analysing by first dividing into (sub)dyadic ranges for the factors , let’s say . The critical case for Zhang would appear to be when and . Emmanuel Kowalski mentioned this on his blog, and also mentioned the paper of Friedlander and Iwaniec on level of distribution of the ternary divisor function. I haven’t looked at all at that paper, but I wonder if anyone could say why the techniques there aren’t in fact directly applicable here?

6 June, 2013 at 9:41 am

v08ltuI think I vaguely answered part of this in a couple of comments above (1:46pm June 5).

http://terrytao.wordpress.com/2013/06/04/online-reading-seminar-for-zhangs-bounded-gaps-between-primes/#comment-232892

Note that the “main” term of Theorem 4 of FI (page 335) is the first one, and also when compared to Zhang’s 13.7, there needs to be a term included in of FI, which is of size . In their remark on Theorem 4, FI say that one might do better with Holder than Cauchy. I am not sure, but I do not think the main term would be improved by this.

Then for comparison plug in to Theorem 4 for the values with comparing to a desired Zhang goal of . I get that , and we want times this to be less than . So when you lose by , which is (essentially?!) what Zhang will save from a convenient factor of , that is (see 13.12) he needs to be at least the fourth power of this (he wins by , and there is a Cauchy).

Via Lemma 4, we will be able to choose in a ratio-interval of size , and we need or something to get the rest of Sect 14 to work (namely expanding the sum modulo and later Fourier analysis). This is where I get my estimates BTW, namely to choose desirously we need vaguely . Incidentally, I expect that , which sits the borderline between Type I/III for Zhang, can be lowered to , maybe , which allows larger to be taken. But writing down the optimal conditions will be easier when everything is spread.

6 June, 2013 at 7:04 am

bengreenA continuation of my last post. First of all, there was a misprint: in the last paragraph I meant .

Next, a Google search for ternary divisor problem revealed that Emmanuel Kowalski (and Fouvry and Michel) very recently (April 2013) wrote a paper

http://arxiv.org/abs/1304.3199

on the distribution of in progressions. Looking at the paper, I guess these three must be seriously up on the Bombieri-Fouvry-Friedlander-Iwaniec stuff. They obtain a level of distribution for this problem of , without any assumption of the modulus being well-factorable. Indeed it works for prime moduli.

This led me to re-examine my question above, namely why these techniques aren’t directly applicable here, and actually the answer is reasonably obvious I think (though please correct me): there are Type III sums coming from the terms with in Linnik’s formula. A particularly sensitive case would seem to be (say) and . This is Type III (take ). This actually seems to me to be a pretty good test case in which to examine Zhang’s Type III argument.

By the way, I completely agree with v08ltu that the correct way to work out the best value of is to write out each estimate in Zhang with all the parameters initially unspecified, then optimise at the end. On the other hand (for me at least) this is somehow the opposite of the right way to understand the paper on a conceptual level.

Finally, one could perhaps venture that the Fouvry-Kowalski-Michel exponent of is an absolute upper limit of what one can expect for (and most probably we won’t get a value this good). I guess that if things could be taken that far then, by the discussion on the other blogs, the value of would decrease by something like , and hence the gap between primes by a factor of maybe .

6 June, 2013 at 10:26 pm

v08ltuIt might be well to say why the sensitive case is well-poised. Here it is, from the decomposition you need 4 pieces in 2 parts, and by philosophy the sizes in a part should be the same at worst, so . The FI work of Deligne bound for 13.7 returns from Cauchy needing to beat , and is critical. Writing , this is and , so .

8 June, 2013 at 9:22 pm

Emmanuel KowalskiActually, we can (currently) only handle prime moduli for the distribution of the ternary divisor, although an extension to squarefree moduli should be relatively straightforward. (The reason is that we rely on earlier work depending on different relatively fine aspects of the theory of sums over finite fields, as in my recent Cambridge seminar talk…)

6 June, 2013 at 8:50 am

Terence TaoJust starting to look at Section 7, which introduces the dispersion method to treat Type I/II sums. As I understand it, this part is more or less standard (similar to BFI), with the Type III analysis being the most original, but I am not so familiar with the dispersion method so I thought I would go through these sections first. But I might perhaps skip the Type II analysis which looks a tiny bit trickier than the Type I analysis.

Anyway, let’s take say a Type I component to , which takes the form where is supported on some scale M that is somewhat less (but not too much less) than , and on the opposite scale that is somewhat larger than . These weights also obey some basic bounds (controlled by some power of the divisor function, and also has to obey some sort of Siegel-Walfisz type theorem) but I will ignore these technicalities here. (My guess is that the are destined to be eliminated fairly soon by Cauchy-Schwarz, so their finer structure won’t be so relevant.)

Our objective is to get a saving over the trivial bound for

where is basically constrained to be a little bit larger than (think of ), and is also squarefree and -smooth (no prime factor larger than ), and having essentially no really tiny prime factor (as a model case, let’s say no prime factor less than ). Here is the irregularity of in the residue class :

.

The first step is to factor as where is now a little bit less than (think of something like ) which makes q something like , say; one can also ensure that the small factor q has no tiny prime factors. Applying the Chinese remainder theorem, it then suffices to gain a saving over the trivial bound for

where ranges over squarefree numbers of size something like , ranges over squarefree -smooth numbers with no tiny prime factors (and coprime to r), and measures the irregularity of in the joint residue class .

The rest of Section 7 is concerned with some standard-looking Cauchy-Schwarz maneuvers designed to eliminate the role of , but at the cost of introducing a new factor, the sign of . The sharp cutoff of to scale M is also replaced by a smoother cutoff (constructed in Lemma 7) which will have better equidistribution in arithmetic progressions (basically thanks to Poisson summation).

Sections 8-12 are then devoted to estimating the resulting sums that come out of this Cauchy-Schwarz. It appears that the Type I and Type II analysis is unified all the way through to Section 10, but then there is a tricky sum R_1 that needs separate treatment in the Type I and Type II cases (Sections 11 and 12 respectively). Given the standard-looking nature of the moves in the rest of Section 7, I will probably skip over this part of the paper and start reading Sections 8-12 instead (but I might skip Section 12 for now).

6 June, 2013 at 10:09 am

v08ltuOne comment that might be made about the signs (I might have called them , not ): you can’t get any cancellation from them, because you know nothing about them. Stare at the display above 7.7, you can only get cancellation from the inner parenthesis, which is indeed squared via Cauchy in 7.7. Similarly, you only have and to see cancellative effect in the . These signs just come along for the ride, and the main terms (frequency 0 in the Fourier expansion of ) in the expansion will all contain them and cancel out. The higher frequencies are bounded by absolute values in any event, so there they shall not matter either.

Also, one nicety (that BFI exploit too) is for each estimation to implore a gcd-condition on , which wins because beats all logs. Same again, to say , when achieving main terms of to cancel exactly.

You have a typo, it should be that ranges over squarefree smooth numbers, not .

I find Section 11 and 12 to be quite similar, the only difference is that (surprise) one uses a joint “bilinear” bound on pairs for “Type II” in 12, while in 11 for “Type I” one estimates over “linearly”. Both follow 9, if you can grok that, they should be easy (perhaps even repetitive). At the end of day, one uses a bound for incomplete Kloostermann sums, sometimes Lemma 11 in its full power, or sometimes just the more standard (3.13). Note that 9 has no Type I contribution, as the Fourier approximation is quite short here: , so unless is near , this is less than 1. But 11 and 12 have larger (see 10.7). Perhaps this point should be explained more when it is reached.

The hardest (and longest) part of 8-12 might be 10, which BFI split into 2 parts. Essentially you have to show that is on average what you want it to be so that the main terms cancel in , up to errors that are then estimated in 11 and 12. And Zhang has to worry about his structure here.

[Corrected, thanks. - T.]6 June, 2013 at 10:29 am

v08ltuTypo: I meant in (10.7). Still, the effect of rather than is evident between these values (and 13 and 14 will see yet another specification of , which is really just handed to us as the length one needs to reach infinitesimal error in the Fourier expansion, given a modulus).

Note that will be small for Type II in any event, since (via ) is so close to . This what in reality forces a border between these Types, namely that when is near enough to one can estimate bilinearly in with little loss because is small, while otherwise when is small, the diagonal terms (see above 11.3) can be estimated trivially in the sum over the -variable, because this sum is so “short” comparatively.

Also, to again complain about the nomenclature, although S9 is concerned with only “Type II” as I say, the -effect is done “linearly”, that is one inner -variable by itself. I guess the difference in entanglement with “linear” and “bilinear” is that in S9 and S11 Zhang has a “diagonal” and in S12 he has the diagonal as .

6 June, 2013 at 11:47 am

bengreenThe aim of these notes is to follow Sections 13 and 14 of Zhang’s paper in a particular somewhat critical case. Namely, we shall look at the estimation of the Type III sum

where and , and . Here denotes a suitably smooth cutoff to a (sub-) dyadic interval around . I’m using the additive combinatorialists’ convention of using for averaging, because I find it helps keep track of what the trivial estimate is that we’re trying to beat. I write for the expectation over all mod .

In this first post I will try and evaluate this without any condition on . We shall see that we just fail to get a nontrivial estimate, even with the Bombieri-Birch bound. The crucial saving comes from factoring . I’ll try and say why that helps in the next post.

I will make all manner of unwarranted assumptions and simplifications. In particular, I will use the following “fact”: If is an arithmetic progression of length , and if is some function, then we have

A rigorous version of this follows from a suitable smoothing of and some Fourier analysis; I've assumed here that the Fourier transform of takes the constant value for , and is zero otherwise, obviously an oversimplification of the truth. Zhang calls this a truncated Poisson summation formula, and it corresponds to his Lemma 7. I'll call it completion of sums: it's inefficient, and in fact is the main source of inefficiency in the argument.

Let's begin. The sum we're interested in is

By completion of sums, we get a main term coming from plus an error of

We want to show that this is (in fact we want to win by a small power of ). The completion of sums has already hurt us, because the trivial bound is .

Thus we want

Let's delete the averaging over ; thus we are shooting for the stronger bound

uniformly in .

We'll introduce shifts over the range ; thus it suffices to show that

The key point here is that , so shifting to doesn't change much (I guess perhaps should be a tiny bit smaller; I don't think this is very important).

We're going to make the substitution . Now observe that as ranges over and ranges over the ratio ranges over all pretty much uniformly. Thus we'd like to replace by . In fact this turns out to be a little bit too much of a fudge, and Zhang writes for the number of ways in which in this manner, then shows that the square mean of is bounded. By Cauchy-Schwarz, our task is then to show that

(Is it too much to hope to remove this application of Cauchy?)

Expanding this out, matters reduce to showing that

One can simplify this a little to pretty much

by substituting and (and then removing the tildes).

Now we take the hit of completion of sums once again, replacing the averages over with averages over all of and . This reduces things to showing that

The somewhat trivial manoeuvre of replacing the sum over with an expectation reduces this to

But, lo and behold, the inner average over was estimated by Bombieri and Birch using the full force of Deligne and it is . Thus the -adic machinery just fails to help us beat the trivial bound.

6 June, 2013 at 11:57 am

bengreenThis doesn’t render so well for me… I can try another version without strict inequality signs – or did I do something else wrong? Also, in the last display and should be and respectively.

[Ben - I've cleaned up what I could; if there's anything else to fix you can email me and I'll deal with it. By the way, very nice writeup! -T.]6 June, 2013 at 12:21 pm

v08ltuBTW, you should compare the proof of Theorem 4 in FI (p335), it gets the same results as you annuciated, and indeed is a bit simplified from Zhang’s setting.

6 June, 2013 at 12:36 pm

v08ltuAnother detail of 13&14: regarding the choice of (number of shifts), this is completely transponded from the error in Weyl shifts (13.14) where it is made as large as possible. In the gyrations of S14, generically the errors have , and there is a diagonal contributant (bottom page 50 and top page 51) with only – when is as large as (13.14) allows, this diagonal part is contained sufficiently, as it is a factor less than the goal (up to smaller losses elsewhere).

6 June, 2013 at 12:15 pm

v08ltuThis is a fairly good rehearsal of why applying Deligne fails w/o the extra trick that will be seen later (of shifting not generically by , but by a multiple of divisor namely ). I don’t think one can avoid Cauchy. To my view in these shifts by products (which FI p333 say go back to Vinogradov, and were used close to this context by Karatsuba for Burgess) you are always going to have that. There is a (very) nice PAMS paper of FI, indeed in this similar circular of ideas, that derives Burgess (a bit beyond even) via akin techniques and again you have to digress via Cauchy/Holder, counting multiplicative solutions to a modulus.

http://www.jstor.org/stable/2159916

6 June, 2013 at 1:04 pm

bengreenThanks for these comments! You seem to know your way around this stuff amazingly well. I didn’t quite understand what you say about , by the way – are you saying that the choice is delicate, or not? I didn’t work it through, but obviously you’ll need to take at least somewhat large to avoid the diagonal terms becoming important, the Bombieri-Birch estimate only being effective when .

6 June, 2013 at 1:39 pm

v08ltuI’m just saying that there is a balance for , and was trying to verbalize what that was. This parameter needs to be of some size to stop the diagonal from dominating (I guess this is a bit obvious a statement, but still should be made I suppose). Other than that, you can take it as large as the (congruential?) losses from Weyl shifts allow.

So there has to be room to choose large enough so as to beat any prior losses (like from in the Fourier expansion, though by the time you get to p50-51 he has been rewritten by his value ) when the diagonal is considered, but small enough so that the amount you lose at each shift (which is in essence , the last in your special case and I included multiplied throughout, so we are trying to beat ) does not total up.

I think Zhang just takes as large as allowed, namely in your special case, subtracting off enough multiples in the exponent to allow space for . Then the losses from the diagonal in S14 are only (if I compute correctly) in your special case (again I included the -effect throughout here). Hmm, that was a bit different than I expected, as it seems that you really do need to take a bit largish, like , to win here (I had mentally expected something in terms of multiples).

6 June, 2013 at 2:10 pm

v08ltuOK, it was good for me to do that calculation, in the critical case. The fact that is not all that small changes the optimization. Namely I get that we need something like , which is essentially Zhang’s value, up to middling constants. I figure we can save some multiples of , maybe even halve them, but the looks less movable (I had before, but now the distance to 1 should be split with ). So all my previous guesses probably need to be cut in half or so.

6 June, 2013 at 2:32 pm

bengreenHere are some further comments on Section 14 of Zhang. I may expand on this a little in due course. In my earlier post we saw a convoluted argument which, even with the application of an optimal bound coming from Deligne’s work, failed (just) to recover the trivial bound.

Recall that we had reduced our goal (of estimating a certain Type III sum – I refer the reader to the previous post for details) to proving that

Suppose now that has the specific form , with for some very small . At the expense of reducing by a tiny bit, we could have averaged over shifts that are multiples of instead of over all shifts as above. Thus, it suffices to show that

Now the idea is to write , with the remainders upon division by . The average will be replaced by , and we shall (by the Chinese remainder theorem) replace the average with averages , where . There is then a short calculation at the bottom of page 49 of Zhang, which reveals that the expression factors as

(or something morally equivalent to this!)

I shan’t write out the whole expression that comes from substituting this into what we had before. The argument henceforth is (pretty roughly) as follows: the expectation over is zero unless , so the expectation gives us a factor of . (In the paper, this step is quite a bit more complicated; in particular the expectation is not over all , but only over those coprime to , so one has instead a Ramanujan sum. But this still has very good cancellation properties, better than square root, albeit not quite as good as the sum over all . The elaboration of this crucial estimate deserves a further post at some point.

We compute the average over everything else more-or-less exactly as before, except that where we had before we now have . Note that the incomplete sums now have shorter length , but to compensate the modulus has been reduced too: it is now . Proceeding pretty much as before, using the Bombieri-Birch bound, we pull out a factor of .

The upshot of this is an estimate which, miraculously, saves the all-important factor of over what went before. My guess is that somehow this is the heart of Zhang’s new argument?

6 June, 2013 at 3:16 pm

v08ltuI would simply not say that “for some very small ” – in fact, one whole idea is that should be as *large* as you can take it elsewhere, for indeed this is where the savings are. I might say, “where exceeds a small power and satisfies in this special testcase” – then indicate that is mandated and suffices with these parameters to quell the diagonal, so there is room for , if one can factor with some .

7 June, 2013 at 4:45 am

bengreenYou’re right, of course. All this hopefully addressed in my latest post.

6 June, 2013 at 2:51 pm

v08ltuMy understanding is that for this application, the length of an incomplete sum does not matter too much. See for instance 3.13, the -dependence is almost nil unless you have a large gcd. Though this is maybe this is a difference in phrasing, from what you are saying with the Fourier expansion, as the higher harmonics will dirty the error terms.

On the other hand, the FI work (also BFI) is (at least more directly) delved toward such balanced estimates, with sundry nontrivial results for choices of sizes of the sum-length and modulus.

I think the savings in Zhang, which as you say is the heart, is from the arithmetization of the factor, in that when you Weyl shift by multiples of , this Ramanujan sum of size O(1) rather than pops out. But AFAIK, even experts are puzzling over exactly why it all works, and how it can be applied elsewhere.

To compare (I have not done it explicitly), see if you just shift by and still try to factor down to as page 50 – it perhaps should work, but the will not be anything of speciality, so will arise. You would still have shorter incomplete sums, but any effect would just be lost in the roundabouts of the three sums.

6 June, 2013 at 4:18 pm

Terence TaoReading Section 8 now, which estimates the third (and easiest) of the three terms coming out of the Cauchy-Schwarz performed in Section 7. (The reason for the three terms is that the discrepancy is a difference of two terms, so when you square it you get three terms.) More precisely, we seek a good bound on

(*)

where “good bound” means “gains at least over the trivial bound”, ranges over squarefree numbers in a scale a bit less than , and are certain complicated sums. However for all three sums one expects an approximation of the form

(**)

for i=1,2,3, where is a certain expression that comes from approximately performing the summation (this is the variable that used before it was killed by Cauchy-Schwarz). The approximation (**) can be established in a pointwise sense for the easier sums but only in an average sense for the more difficult sum , but this is all still good enough to get a good bound for (*) which is good enough to control Type I/II sums. It seems that the precise value of is not going to be important as it will cancel itself out when we get to . (I suspect that by making sure that has at least one factor of in its Dirichlet convolution, one could in fact ensure that was very small, that the second term in was negligible, and that one could skip over the S_3,S_2 analysis and jump straight to S_1, but this would be a relatively minor simplification.)

~~Section 7~~Section 8 is devoted to establishing (**) for S_3, which is the easiest expression as one does not have congruence conditions such as or here. is the quantitywhere

* are squarefree, coprime to mr, -smooth, have no tiny prime factors, and range over a scale that is something like ,

* are coprime to respectively;

* is a smooth localisation to a scale M somewhat less than , thus is also localised to this scale;

* are quantities that are bounded but otherwise uncontrolled,

* is controlled by some power of the divisor function and obeys Siegel-Walfisz type estimates.

From Poisson summation the innermost sum is very well controlled, basically

.

Applying this approximation we get the desired representation (**) (with defined to basically be all the mess that remains after applying this approximation and factoring out .)

I will probably skip Section 9 (which treats the cross term S_2) and plunge straight into Section 10, which tackles the S_3 term which is the heart of the Type I/II analysis.

6 June, 2013 at 4:53 pm

v08ltuFor your first bullet point, I think can be as large as and can be as small as . So they can actually be of some size (and indeed this reasons why Type I has a lower limit on ).

Zhang is terse on the Mobius inversion to get 8.2 (as are BFI), but you spin out the gcd condition as , swap variables, clip any large ahead of time and apply Lemma 7 to the resulting progressions which thus have an extra multiplier of on the free variable. Standard perhaps, but can be quizzical if never seen. BFI do explain a bit more something vaguely similar (but much more complicated) near the bottom of p240.

You also seem to have renumbered Zhang’s sections.

6 June, 2013 at 6:55 pm

Gergely HarcosJános Pintz increased Zhang’s , see http://arxiv.org/abs/1306.1497

7 June, 2013 at 1:46 am

bengreenIt looks as though Pintz has gone through Zhang’s argument and worked out a better (possibly optimal) exponent : he seems to get the value . I guess this will be a useful check for our reading seminar once we get to the point where we fully understand the interdependence of parameters. It seems to me as though Pintz is unaware of the work on the other blog at the GPY ends of things, because his arguments are not so optimised there (so far as I can tell).

7 June, 2013 at 3:15 am

Alastair IrvingMore generally it looks like Pintz shows we can take any satisfying . He then chooses , giving the specific value of . I don't see why he made that choice for .

7 June, 2013 at 11:53 am

v08ltuOn the other page, I computed that works, and that this minimizes (at least locally). One simply has determined by the constraint , as this Type I/III barrier dominates the analysis (in fact, I find Zhang has an error or two in Type II statements, but these do not bother the Pintz analysis as far as I can see). From the critical case analysis I (now) do get the same 1/16 in 2.39 as Pintz, and I trust that he has counted/optimized multiples correctly. One annoyance is that “Q” and “R” are parameters that are re-used in 7-12 and 13&14, but are different in the two parts (as you split differently).

7 June, 2013 at 4:41 am

bengreen(Edited)

This is the third of my postings on Zhang’s treatment of the Type III sums in Sections 13 and 14 of his papers. I’m now going to sketch out the argument with a more general choice of parameters, rather than simply in a model critical case as before. Thus we’ll consider now the problem of estimating

where is supported on the subdyadic range and , , , , , and . Here, is the quantity we’re really interested in, and we’d like to take it as large as possible. On a first run through the calculation, detailed below, I get the constraint

Here, is the parameter used in the smoothed truncation of the GPY sieve, thus is comprised only of prime factors less than . This is going to give flexibility in factoring as later on.

There are a couple of possible sources of error here: first my arithmetic could be dodgy, though this should be easily ironed out. Secondly, it’s possible I’ve overlooked some important feature of the argument that places some nontrivial further constraint. Hopefully other readers of this blog can flag up any likely issues in this regard.

I’m broadly following the lines of my earlier pair of posts, so will be a little brief with explanations here.

The quantity to be estimated is

Completing sums and taking out the main term with , the task is then to show that

uniformly in . I also took out the sum over , by the way, which serves no further purpose in proceedings.

Now we introduce a shift. Later on we’re going to factor , with . We average over shifts of by , for . Thus the preceding is replaced by

Now this is only permissible if , and this gives us our first key constraint, which I’ll call (1):

Later on, will also be required to satisfy a lower bound (so that a certain diagonal contribution is negligible) and this will ultimately put a strain on how big can be.

Now as varies over and varies over , varies over values of modulo . Write for the number of representations of in this form; thus is roughly for values of , and zero otherwise. The estimate we now want is

(The weight has been included because we are now taking expectations over all of mod ; this was the error in the previous version).

Applying Cauchy in , and using the estimate (coming from the remarks above)

we see that the task is reduced to showing that

Expanding out the square, this is

Now (and this is a point I glossed over without proper comment in my previous posts) we need to remove the diagonal contribution , because the Bombieri-Birch bound does not give as much cancellation in that case. As I’ve written things in this rough workthrough, the contribution from is

The inner expectation will be zero unless , in which case it’s 1, so we get an estimate of for this diagonal term. In a rigorous writeup (such as Zhang’s!) this step is more complicated and one must estimate a Ramanujan sum, but I think the constraint is pretty much the same.

Comparing this with our aims, we see that in order for this diagonal contribution not to ruin our hopes we must have

We now have upper and lower bounds for , but for the argument to work there must be some legal choice for . Thus, comparing with constraint (1), we see that

which tells us that

I’ll call this constraint (2). Note that in the model case discussed before, namely and and , we didn’t see this issue; could have been chosen to be anything in the range .

Let’s suppose from now on that constraint (2) is satisfied, and that we have chosen an appropriate value of (it doesn’t matter which). Thus our task now is to show

Now we complete sums again to replace the averages over by averages over all , and must prove

thus

The inner sum is by Bombieri-Birch (BB) but, as we saw before, this does not give a nontrivial estimate. To get one, employ Zhang’s trick of writing and factoring

The estimate is now the crucially stronger , which is .

Comparing this with our aim, we see that victory may be declared if

Finally, we compare this with constraint (2), which gave an upper bound on of

Now we are assuming that is comprised of primes less than , so we are free to choose in any range of length at least . Thus in order for there to be room for us to be able to factor with being of an appropriate size we need

which gives

We remark that under the constraints and the right hand side is at least . Equality occurs when . This appears to set an upper limit of for what we can hope for, but more stringent upper bounds may be required in the Type I and II analyses. I am told by v08ltu that the condition needs to be weakened still further, to , to accommodate the limits of the Type I/II analysis; this will weaken the bound still further.

7 June, 2013 at 12:21 pm

v08ltuFirst statement: you say , but I think our rendition is that . Multiply by 2 throughout.

“Comparing this with our aim we see victory may be declared”: I think you mulitplied the by 2, but not the other parts (possibly thinking of )? I just get

.

7 June, 2013 at 12:25 pm

v08ltuOK, now I see, the next big displays make the same typographical error then rectify it. In the “appropriate size we need” display the RHS inside parenthesis should be divided by 2, but in the very next line you get the math correct I think.

7 June, 2013 at 12:35 pm

v08ltuYou have not stated the final constraints in most precision. It is not just , but (in current form). A worst case of might be closer to valid, though as I say, you have multiplied by 2 from its definition. And the Type I/III barrier will be optimized itself, not being but in Pintz.

7 June, 2013 at 2:36 pm

v08ltuIn the other thread (about optimizing ), I noted that Pintz appears to have made a math error, which I detected when trying to reconcile your estimates (which I determined matched what he wrote in 2.30 and 2.35, but then he screws up in 2.39). I am getting after correcting his work, and indeed get the same from your estimates with the Pintz choice (optimal I think) of the Type I/III split. Plug in into your final bound, multiply by 2 for the right normalization, and get , or . It remains to see that Pintz is correct about Type I/III splitting, which is the top line of page 4, coming from Section 11.

7 June, 2013 at 3:05 pm

bengreenThanks for these – corrected to throughout.

8 June, 2013 at 5:43 am

Bound on prime gaps bound decreasing by leaps and bounds | The Aperiodical[…] Interested followers should take a look at an online reading seminar conducted by Terry Tao which is going through the original paper. […]

8 June, 2013 at 7:00 am

List of Misconceptions and Prime Gap Checkin | Pink Iguana[…] Interested followers should take a look at an online reading seminar conducted by Terry Tao which is going through the original paper. […]

8 June, 2013 at 7:05 am

bengreenMaybe I should make the point that a lot of features of the Type I analysis will be very familiar to people who have worked with the method of bilinear forms, or in other areas such additive combinatorics where one has an expression such as with the functions and unknown. Two applications of Cauchy-Schwarz will destroy them, leaving a quantity like

.

The argument above was like this, but with an intermediate step of Fourier expanding a cutoff before the second Cauchy.

8 June, 2013 at 7:15 am

Terence TaoI’ve started a section on the wiki on Errata at

http://michaelnielsen.org/polymath1/index.php?title=Bounded_gaps_between_primes#Errata

v08ltu, I was unable to see the issue in (9.9) of Zhang you reported. Could you elaborate on why the factor there should be instead? Also, if you have found any other errata for Zhang’s paper it could be useful to state them explicitly there (or as a comment here), even if they end up having no significant impact on the final bounds.

8 June, 2013 at 12:49 pm

v08ltuOk, I am wrong, I rewrote the definition in (7.2) wrong (I thought the parameter was , it made more sense to write it in terms of rather than ).

8 June, 2013 at 9:33 am

bengreen(Edited version of an earlier post, now deleted, clarifying a few points.)

In this post I’m going to examine the Type I case. Combining this with our previous analysis of the Type III case, we shall see what happens at the border between these two cases, which seems to be critical in determining the efficiency of Zhang’s argument. In particular I believe I can confirm the analysis of v08ltu, who showed that Zhang’s argument in fact leads to the constraint

The point of this very lengthy post, in addition to trying to demonstrate this bound, is to explain the analysis of the Type I case.

Thus I’m going to examine how to get bounds on

where is supported on a dyadic interval and is supported near . Here, is the set of residue classes mod at which the polynomial vanishes.

Our conclusion will be that the Type I analysis works if

and if

.

Remember that the parameter is coming from the assumed smoothness of : is comprised only of primes less than .

Thus we see that the analysis is effective (for some , and if is sufficiently small) provided that lies in the open interval . To deal with the case of very small (and in particular ) we need the Type II analysis. I plan to post on this soon, but I do not expect it to be a quantitative bottleneck in the argument. To deal with we need the Type III analysis, which seems to be the major new input of Zhang’s work.

Let’s get down to business. The above expression is already not quite what Zhang looks at, because from our term

he subtracts a term

inside the modulus signs. As Terry has been remarking in his study of this Type I situation, taking account of these terms basically takes up Sections 8 and 9 of Zhang’s paper. In fact, because Zhang looks at the average of the square , one ends up having to analyse terms with , and . In the notation of his paper, these correspond to the sums and respectively on page 28. As Terry notes, it seems as though the business end of proceedings is the analysis of , and indeed a glance at the relative lengths of/seriousness of input to the corresponding sections would seem to back this up. Thus, following Terry’s lead, I’m going to completely ignore and proceed straight to Section 10 of Zhang’s paper.

Set and .

Now Zhang cannot say anything about the average

itself. He must restrict only to those that are nicely factorable as with and in some favoured ranges. I’m going to suppose that this can be done with and , where is a quantity to be specified later. For the benefit of anyone reading through this, let me say that if we assume then we shall take . A factorisation like this also appeared in the analysis of the Type III method. Emmanuel Kowalski made the nice remark that the factorisation appears to be a technical trick (albeit a completely crucial one) in the analysis of the Type III sums, but is absolutely intrinsic to the analysis of the Type I sums. We will see this now.

What we need to estimate, then, is

The averages over and are rather minor, being of essentially bounded size, so I’m going to delete them in the ensuing discussion and just assume that and are fixed. It is important (perhaps this is the only thing that is important in this regard) that (by the Chinese remainder theorem). (A very minor issue is that it seems lexicographically eccentric that Zhang pairs with and with , but I don’t want to deviate from broadly following the notation of his paper here).

Now we begin properly. Let me warn the reader that, even with many simplifying cheats, things are going to get a bit horrific: quite a bit worse than the analysis of the Type III sums. This really is unavoidable. Thus we look at

We want to show that this is (in fact we want to win by a small power of ). Opening up the convolution, this is

where here is just some sign. We shan’t be able to say anything useful about it (there was a discussion between Terry and v08ltu about this) and it is destined only to be annihilated by Cauchy. Was it valid to invert modulo and ? We simply won’t care about this kind of issue and many others like it, which add a lot of additional complexity to Zhang’s paper but do not affect the substance.

Now we Cauchy to eliminate , reducing our task to

Expand out the square, and ditch the averaging over . This reduces things to showing that

for all . Call this expression (1). Here I wrote mod for the solution to the congruences . Note that the signs also depend on but we have now supressed this.

I should say at this point that pages 34, 35 and the first half of page 36 of Zhang are devoted to showing that we can basically neglect the contribution from everything in which have a common factor; I’m going to completely ignore this point, hoping that it isn’t the point at which serious constraints are placed on the analysis (I think this is right).

In the preceding expression, I want to draw attention to the fact that in the crucial range of parameters we’ll have , so typically the inside expectation over is empty. Otherwise, the analysis would actually be quite easy (and somehow this is one of the main points of why sieve theory is difficult: if then the number of with is either 0 or 1, but it’s hard to know which).

Somewhat schematically, we have the Fourier expansion

Call this expression (2). Here, and so

, and for the argument to work we need this to be positive.

This Fourier expansion is an important step (it’s the first place one sees a hint of things that may lead to Kloosterman sums) so I’ll elaborate a bit. Those wanting to get to the point can skip this section. Writing mod for the solution to the congruences , we detect

by orthogonality. Taking expectations over , the left hand side of (2) can then be expressed as

But the inner expecation is the Fourier transform mod of the dyadic range , and roughly speaking it is for and zero else. This is an oversimplification, of course, and also the dyadic range will need to be smoothed appropriately. Details are in Zhang’s Lemma 7; this is basically the same thing as the “completion of sums” I talked about in the Type III analysis.

Finally I factored as a product ; for the routine details see Zhang, top of page 38.

Now let’s return to the main line of argument. We combine expressions (1) and (2) above to see that our task is reduced to showing that

Note that the trivial bound is now only , not ; we’ve seen this issue that completion of sums can hurt before, in the analysis of the Type III sums. The term with is a kind of main term, which (as it turns out) cancels out with other similar terms coming from the analysis of and , a part of Zhang’s argument that we’re glossing over. Thus we’ll assume from now on that .

Make ths substitutions and and replace by the average .

This is only permissible if , which imposes the constraint

This trumps our earlier constraint coming from the requirement than be positive, so we may now forget that constraint.

Then delete the average over , reducing our task to showing that

Now we know that we’re going to have to Cauchy away the unknown coefficients as some point. Taking a deep breath, by Cauchy with the variables on the “outside” and on the “inside” this reduces matters to

Call this expression (3).

The case is singular here, and it is this feature which makes the analysis ineffective in the limit as and necessitates the Type II analysis.

The number of solutions to with and is bounded by .

Thus we can estimate the contribution from these singular terms by essentially . For this to be a win, we thus need . Since , this implies the constraint

.

This shows why the analysis breaks down as (the Type I/II border), but this constraint is irrelevant for the Type I/III border.

Assume henceforth that we have successfully despatched this singular case. Then our inner expectation has the form

where and , and . This is, I believe, a fairly standard type of sum and it is estimated using completion techniques and the Weil estimates for Kloosterman sums. This is Lemma 11 of Zhang and I shall write a short post on it presently, in the simplified language used so far in these posts. Anyhow, the bound obtained is .

Finally the whole of expression (3) is bounded by times this, that is to say by .

Hence we win if

. Call this (A).

This condition is, of course, least damaging when is taken as small as possible.

Our constraint on is that

.

Clearly we should try and take .

Remember, however, that we cannot necessarily factor with having exactly any desired size: we are assuming that the divisors of are smooth. However we can take some

.

Constraint (A) now becomes

. Call this (B).

Finally, let’s use this to confirm v08ltu’s constraint on and coming from the Type I/III border. Here will be in the vicinity of .

Recall that the Type III estimate was

(Here, we were looking at averages over progressions of some , with supported on the dyadic range and .)

I think the worst situation is when

Then we have a choice whether to use Type III or Type I. If we use Type III we obtain the bound

Alternatively, we could use Type I (with and ) with parameter .

then we obtain the bound

The worst case is when these bounds are equal, which appear to confirm the constraint

found by v08ltu.

8 June, 2013 at 1:08 pm

v08ltuIt is correct the with a nontrivial gcd contribute nothing, due to his imposition that they have no prime factors smaller than beating all logs. Compare S6 of BFI. However, I think 34-36 with this estimation, are the only places where he uses the multiplicative structure of his residue system.

8 June, 2013 at 1:32 pm

v08ltuAnother point is that the estimates in S10 are where the Siegel-Walfisz bound A2 is used. This is used to show that the principal terms () asymptote the same (it is here where small moduli matter). BFI calls this a Barban-Davenport-Halberstam theorem, and split Zhang 10 into S6&7, which is perhaps easier to digest (the borderline being about 10.11 of Zhang).

8 June, 2013 at 1:42 pm

bengreenInteresting points, both of these. I’ll try and edit my posts to mention them in due course. I have to say I’d been complete ignoring the SW condition on my first reading.

8 June, 2013 at 2:26 pm

v08ltuThere is a discussion of dispersion (I actually found this link from an earlier 2012(?) post on this blog) at the Encyclopedia of Math.

http://www.encyclopediaofmath.org/index.php/Dispersion_method

[An informative link, thanks! (I may not have read that link in full when I encountered it last, or maybe it was not as well developed as it is now. I've added the link to the wiki. -T.]8 June, 2013 at 2:42 pm

bengreenI spotted this during a recent Google search. It’s not incredibly easy to see how it sheds light on the present situation, I have to say, except for the formal similarity with the sum and some notion that setting up multiplicative convolutions and hence bilinear forms is a good thing. But I would be interested to learn more about the dispersion method in general.

8 June, 2013 at 2:46 pm

v08ltuOne rationale for why you never have to worry about whether it was “valid” to invert modulo whatever, is that I guess you get a gcd condition when it is not (in Lemma 11), but these are “small” gcds compared to the “singular” contribution (which is a gcd with 0, as it were). Zhang’s phrase in S11 (after 11.5) is “a much sharper bound for the second term … can be obtained”, and similarly with 9.11-9.15 and 12.5-12.6 the same is true.

8 June, 2013 at 10:54 am

bengreenFor the errata: in Zhang’s paper equation (12.3) has a the divisor function to the power ; I think . I guess he does say that can be any positive constant, not the same at each occurrence, but I think it would be perverse not to take it to be at this particular point.

[Added, thanks - T.]8 June, 2013 at 11:52 am

bengreenIn this post I shall finally get to the Type II sums, the analysis of which is Section 12 of Zhang’s paper. Notationally, this might be the most forbidding of the three cases.

Similarly to the Type I case, we shall be interested in estimates for

where is supported on a dyadic interval near and is supported near .

I aim to show that the Type II analysis is nontrivial for (and, crucially, at the point left out from the Type I analysis). Moreover in this range we shall establish that the constraint imposed on and is

We follow the Type I analysis, defining parameters and (the aim being to factor with and ).

Continue to the point just before the second application of Cauchy, where we had reduced matters to establishing that

We can assume . Here, is some parameter, and

To make matters proceed smoothly to this point, we needed to assume the constraint

At this point the analysis diverges from the Type I case, and we instead despatch the unknown (and unwanted) quantities by the triangle inequality, reducing things to proving that

Now apply Cauchy, so we need to prove that

(It might have been better to say that we Cauchy with just on the “outside” and all of on the inside; before, we Cauchy’d with and on the outside.)

As before, there is a "singular" case in which . The number of solutions to this equation is bounded by

which is essentially . Thus the contribution to our main expression from these singular cases is basically at most , and we win if . Note that, crucially, this is a weaker condition than in the Type I case, where we need . Unfortunately the bounds imposed by subsequent parts of the argument become weaker as increases, and for the Type II argument gives nothing whereas the Type I argument is nontrivial. This condition is automatically satisfied given our earlier stipulation that .

Thus we can ignore these singular cases. In the nonsingular cases the inner expectation over is, after a certain amount of messing around, basically of the form

where and (and ). We can apply the same estimate for incomplete Kloosterman sums as before (Lemma 11 in Zhang) to conclude that this is bounded by . I calculated this to be .

Not forgetting that we still need to sum over , we see that the problem surrenders if

Recalling that

this gives

We want to take as small as possible subject to the constraint , but remember we only get to specify up to a tolerance of . Thus the constraint is now

which can be rearranged as

as claimed.

Note that improving this part of the argument is going to be pretty futile, because this dependence is vastly better than that imposed by the Type I/III border.

8 June, 2013 at 3:33 pm

v08ltuTo spell it out a bit differently, the unfortunate reason why the bounds become weaker as increases, is that you took two ‘s in the Cauchy here (rather than one in “Type I”), and so have rather than appearing. As , this becomes problematic as progresses larger. On the other hand, in the limit, you don’t have an extra term outside the Cauchy, sothereby saving (a needed) on the diagonal. In other words for Type II, the sum over is short enough that you can square its length without bother, and doing so saves (relatively) on the diagonal.

In other contexts there are similar instances with such short sums and Holder can profit more (see the proof of Burgess in “Estimates for Character Sums” by FI), but here that is not needed.

8 June, 2013 at 12:00 pm

Gergely HarcosAnother small correction to Zhang’s paper: in the first display on page 5 should be multiplied by , because in (2.2) can be that large, cf. (2.4)

[Added, thanks - T.]8 June, 2013 at 1:23 pm

v08ltuSome of Zhang’s most annoying errors occur right at the end. For instance, in 14.17 he garbly puts , when he only wants this a single sum over (same on RHS). Three displays before the should be in a denominator. The penultimate display on page 51 has a trivial typo (two commas), as does the top line of page 49. The display below 10.5 has an unnecessary subscript on the , the last line of the definition of below 10.14 has the in the 8-fold in the wrong place (compare next display).

One previous problem I had was following (12.5), but I think that is Ok now (I had used a generic bound on , but by the definition on line 3 of page 43 it cancels an lcm).

8 June, 2013 at 1:52 pm

Terence TaoI recorded all of these errata except for the (14.17) one, since it appears to me that what he is doing is splitting the variable of summation in (14.15) as , and then is relabeled and is simply denoted .

Thanks to your and Ben’s exposition I think I might be able to read through all of Zhang and summarise it for the next rollover post (it’s about time for this thread to roll over at any rate). It may take a few days though.

8 June, 2013 at 2:02 pm

bengreenSounds like a plan! I’m more or less done at this point, or at least I’m prepared to let the dust settle for a while. I’ll pen a short post about the incomplete Kloosterman sum estimate (Lemma 11) and possibly also revisit Section 6, but I think Sections 7 to 14 are covered now at least to the first level of detail (which, for the most part, is probably as far as things need to go on this blog except for the occasional interesting comment like v08ltu’s one above about getting coprime). Sections 8 and 9 have been given short shrift, but they seem to be similar to, but strictly easier than, Section 10.

One observation is that one could make the bound completely rigorous just by reference to Zhang’s paper (because, indeed, that bound comes from working out the optimal exponents everywhere in what he does). If any conceptual improvements come to light, though, it might be much more painful as the second (and third…) levels of detail may need to be penetrated.

8 June, 2013 at 3:23 pm

Terence TaoI’m slowly beginning the writing up of our numerically optimised version of Zhang’s Theorem 2 with various notational simplifications. One question: now that you have gone through the paper, what properties of the residue classes (or ) do you see used by Zhang, other than cardinality bounds and the Chinese remainder theorem ? Is the fixed nature of the polynomial used to define the classes used at some point? Presumably yes (otherwise one is very close to proving Elliot-Halberstam, at least on the smooth moduli) but I don’t yet see where it is actually used.

8 June, 2013 at 3:36 pm

v08ltuYes, this is in Section 10, see (10.6) and the display above in particular, and the resulting divisor-sum estimates.

8 June, 2013 at 4:27 pm

Terence TaoThanks! That’s a relatively crude use of the bounded coefficient nature of the polynomial P – it seems to me that it would suffice for the coefficients of P to be of polynomial size and one would still get comparable results. If this were the case (and the bounded coefficient nature of the polynomial P is not used elsewhere in the proof of Theorem 2), this would allow one to delete Sections 8 and 9 of Zhang by the following trick: instead of looking at discrepancies such as

that compare one congruence class to the mean, instead look at discrepancies such as

that compare one congruence class to a shift of that class by some integer t of polynomial size, which is essentially amounting to translating the polynomial P by a polynomial size shift (which keeps all coefficients of polynomial size). If one squares this sort of thing out then one only gets S_3 type terms and no S_1,S_2 terms and the key point is that all the cross terms should end up giving a contribution that is essentially independent of the shift t. Averaging over all t of size up to, say, x^10, we should get back the classical discrepancies (there is a bit of book-keeping to ensure that one only works with primitive residues, but I think this can be managed by some relatively simple combinatorial manipulations).

8 June, 2013 at 8:12 pm

Terence TaoIncidentally, related to the above discussion, I believe that the estimate on the third display of page 36 needs some factors of because it is possible that has some large common factor with , but this does not significantly impact the later analysis.

8 June, 2013 at 8:19 pm

v08ltuI must admit that I found the top of page 36 too hard to parse, and made my own argument there.

8 June, 2013 at 8:48 pm

v08ltuNote that Zhang does not win by a power in this part, but only by . Maybe this can be , or whatever what you call the -trick can handle. For a divisor-bound, this might demand more uniformity than varying the -coefficients allows.

His method essentially says , and then estimates this over in AP.

8 June, 2013 at 2:11 pm

v08ltuAnother minor error, on the display after Mobius inversion on page 47, the condition on the inner sum should have I think (the inversion replaced by ).

[Added, thanks. -T]8 June, 2013 at 2:20 pm

v08ltuThe display on the bottom of page 14 is missing a right parenthesis on the second condition in the summation.

[Corrected, thanks -T.]I just (14.17) is Ok after your explanation, though I have not convinced myself that what he does is a bit superfluous compared to something more direct (given his goals).

9 June, 2013 at 12:35 am

v08ltuSince Zhang wins in the “main” term in S13&14, it may be useful to think about how to improve the secondary terms (eg the diagonal), to optimize the error. In their rendition, FI Theorem 4, they get the main term that Zhang would get w/o the factor-trick, and various secondary terms. In their remarks, they say that one might improve, for instance by using Holder instead of Cauchy. An idea along these lines might be to shift by instead of (the idea of shifting by a product in essence), where but can be in convenient subranges, say $k_1$ small and Holder that variable (somehow). This might help quell the diagonal, though I suspect the losses elsewhere (completion?) will ruin it.

9 June, 2013 at 6:09 am

bengreenIt seems as though the “Level 1″ structure of Zhang’s paper has now been mostly discussed on this blog. I thought I’d start looking at the “Level 2″ structure of the paper, just to check that no unforeseen constraints or difficulties are lurking there. By “Level 2″ structure I mean certain technical issues that one would definitely not mention in a seminar on the subject, but which are more important than “Level 3″ issues such as checking that cutoffs can be smoothed out, book-keeping of coprimality constraints, and so on.

By the way, I should say that this top down method is my favoured one for reading papers, and is probably something to be borne in mind when writing them (though I am certainly guilty of not paying attention to this principle).

The most worrying piece of Level 2 structure for me is the material on pages 34, 35 and 36 of Zhang’s paper. It has been discussed to some extent by Terry and v08ltu above. In my discussion of the Type I case, I had mentioned how things are reduced to showing that

Call this (1).

(I am assuming familiarity with that post, though the notation there is similar to Zhang’s paper).

I then remarked that pages 34, 35 and 36 of Zhang are devoted to showing that one may restrict the expectation just to those that are coprime.

I now realise, following comments from others, that to make this happen requires a few modifications that make the argument look even more complicated than it already is.

Remember that , and that we have attained this factorisation using the fact that is -smooth. It turns out that one should also make sure that does not have any small prime factors: the right notion is that we should have . I will explain this more in a subsequent post. Now this cannot be attained automatically; one instead has to go right back to the start of what we are trying to do, and observe that in our goal of trying to estimate

,

we can in fact ignore the contribution of those for which

,

essentially because there are very few such . Zhang notes that this can be done by “trivial estimation”, but the definition of trivial here is a little niche and that is why a MathOverflow question was devoted to this point:

http://mathoverflow.net/questions/131825/a-technical-question-related-to-zhangs-result-of-bounded-prime-gaps

The question was answered by Sungjin Kim of UCLA, and it *is* trivial, at least if you have really known your way around estimates for divisor-like functions for a few years.

Incidentally Zhang performs this step before he decomposes using Heath-Brown’s identity, but I think it is only required in the analysis of the Type I and II sums, and is technical, so perhaps it should be moved over there.

So now we know that we can assume , with and being within of any given power we like, and furthermore with having no small prime factors (smaller than . The point about doing this is that the variables in the expression (1) now satisfy a dichotomy: either they’re coprime (the case we saw, at least roughly, how to analyse before) or else they share a common factor . In this case we end up winning by a factor of which, because it beats all powers of , is fine for our purposes.

I’ll post details of this calculation in a second post, which I do not have time to write right now. There is an additional key point, which is that for once it is a bit too simplistic to simply delete (as I did right at the start) the extra averagings over and , which I dismissed before as being “essentially bounded”. The problem is that taking account of these sets leads to functions behaving a bit like the divisor function, which can grow as big as , so if we’re not careful the factor of won’t be enough to kill these contributions. (This was not an issue anywhere else, where we always got power savings. As noted by v08ltu, we do not get power savings from this part of the argument).

As it turns out the factor of does still kill these potentially unruly contributions, but we must check this carefully and here (and only here) we use something of the specific structure of the residue sets beyond mere multiplicativity. Recall that these are defined to be those for which and .

I’ll say more about this in the next post.

9 June, 2013 at 8:26 am

Terence TaoDear Ben,

I was grappling with this issue too as I try to write up a post on Zhang’s theorem (and am currently trying to deal with Section 10). The issue isn’t so much the cardinality of the (actually I have a randomisation trick that allows one to reduce to the case ), but rather the size of the “divisor function” . As you say, sup-norm bounds on this function (i.e. the divisor bound) are too weak to be absorbed by the tiny gain of . Instead one needs type bounds on this quantity along any non-trivial arithmetic progression (length ) with bounds that are basically polylogarithmic in nature. For this Zhang appeals to a standard bound in the literature (Brun-Titchmarsh for multiplicative functions, see Lemma 8 and [11]).

Abstractly, the only axioms for C(q) = C_i(q) that appear to be necessary in Zhang’s argument are:

* Primitivity:

* Chinese remainder theorem

* Bounded size at primes:

* divisor bound:

where . For the application at hand can be controlled by a polynomial combination of functions of the form for various , but one could imagine other applications of Zhang’s theorem involving fancier systems of congruence classes .

Anyway, if one abstracts the above axioms on , one gets a convenient consequence: one can then pass for free to the singleton case . The idea is to just pick randomly one representative of for each prime and use Chinese remainder to then form a singleton congruence class for each . If one can then prove Zhang’s theorem for these singleton classes, take expectations and one recovers Zhang’s theorem for the original classes (there is now a weight of but this weight may be easily disposed of by the usual Cauchy-Schwarz with a trivial bound with the opposite weight of and no gain).

On the other hand, the other trick I was proposing before, namely to replace the usual discrepancy with a more symmetric discrepancy , does not interact well with the stage of the argument where one is trying to get q_1 and q_2 to be coprime. My trick kills off the S_2 and S_3 terms, and replaces the S_1 term with a more general version of this term in which the moduli b_1, b_2 are completely unrelated (one lies in C_1(q_1) and one lies in C_2(q_2) for different systems of congruence classes C_1(q), C_2(q). I had thought that this generalisation was harmless but actually it seems to screw up the argument that eliminates the case when q_1, q_2 have a common factor. (But there is a chance that the screw up actually works in one’s favor. Still working to clear this up…)

9 June, 2013 at 8:44 am

Terence TaoRegarding ways to modify the discrepancy: I did find that when computing a discrepancy for a combined residue class as is done in the treatment of Type I/II sums, it was slightly convenient to split up the full discrepancy

as the sum of two subdiscrepancies, namely

and

The second discrepancy can be handled by Bombieri-Vinogradov (because the r parameter lies safely below ); it is the first discrepancy which is the main one. But the point is that when one Cauchy-Schwarz’es this discrepancy instead of the full discrepancy, one gets the constraint for free (which seems to be rather essential to the analysis, since we need the key reparameterisation for the Type I argument). (Doing things this way substitutes for the arguments on pages 37-38 of Zhang and demystified for me why we had two slightly different h=0 terms, X(r,a) and X^*(r,a), in the analysis.)

9 June, 2013 at 1:24 pm

v08ltuYes, in 37&38 (or BFI S7) the idea is that the asymptotic terms cancel up to a small error (distinction between X and X*), and this extra bit is bounded via a B-V style argument with the large sieve (see the proof of BFI Theorem 0, cited in Zhang Lemma 10) ending with more of a Barban-Davenport-Halberstam result (Zhang Lemma 10, BFI S7) upon the Cauchy.

The input to make this B-V machinery work is the small moduli Siegel-Walfisz bound, and then the large sieve induces what is wanted for moduli up to sqrt. I think Ben Green in his prior confusion over A2 ended up rederiving the same, that once one has the bootstrap from S-W, the rest is a large sieve (in some guise).

9 June, 2013 at 9:22 am

bengreenTerry: thanks, yes, you’re right that it isn’t the size of these sets themselves that causes the problem. I edited my post slightly in the light of your comment. The size of the set does come into play – I think this leads to the term (for whose definition you have to go back to Sections 1 and 3) on page 36. But this particular term will not harm the gain from ,

10 June, 2013 at 3:12 am

bengreenGiven my mathematical background I was certainly proposing to take the estimate of Bombieri and Birch, namely

,

as a black box.

However while looking at Friedlander-Iwaniec “Incomplete Kloosterman sums and a divisor problem”, which seems to have been a big inspiration for Zhang, I did realise what I suppose is a pretty obvious point: that after the substitution , , , this exponential sum may be written in the more suggestive form

,

where is the variety with , . This then links things to a much more general discussion of exponential sums of this type, to which Chapter 11 of Iwaniec-Kowalski seems to me to be a superb introduction for the nonspecialist.

The discussion of this sum by Fouvry, Kowalski and Michel here

http://www.math.ethz.ch/~kowalski/friedlander-iwaniec-sum.pdf

doesn’t seem to want to write it in this way, though: I’m not sure why.

10 June, 2013 at 11:20 am

Emmanuel KowalskiWe prefer writing the sum as in the note essentially because writing it as a sum of products of two Kloosterman sums at different arguments allows us to have a very simple conceptual argument (quasi-orthogonality) for the Birch-Bombieri estimate, instead of involving the geometry of algebraic threefolds, which is a rather delicate matter (though this is the way Birch and Bombieri give the first of their two proofs of the square-root cancellation estimate).

For instance, it’s not immediately obvious — to me — that the sum you write is of size at most p^2, whereas this is the “trivial” bound from the Kloosterman sum point of view.

Also, the Kloosterman sum expression (with slightly different arguments) turns out to appear in quite a few other papers of analytic number theory (Pitt, Munshi, our — F-K-M — own work on algebraic twists of modular forms). This seems unlikely to be a coincidence.

10 June, 2013 at 11:45 am

bengreenEmmanuel, but would you say that writing things as , even if it’s not the quickest way to proceed in this context, affords one the most general access to the deep theory? I ask mainly in case we start trying to mess about with Zhang’s argument, trying different Cauchys, etc. By the way I do enjoy your use of the word trivial here… are you sure the quotation marks are even necessary?

10 June, 2013 at 9:12 pm

Emmanuel KowalskiI think it is harder to prove square-root cancellation in many-variable character sums, absent extra information, than in one-variable sums with more complicated arguments than character values. More importantly here, the fact that the sums arise from Kloosterman sums seems to be such a piece of extra information that should be kept in view.

10 June, 2013 at 4:57 am

Online reading seminar for Zhang’s “bounded gaps between primes” | Apparel ego[…] Read more… […]

10 June, 2013 at 7:54 am

Terence TaoJust a note to say that I am about halfway through the writeup of Zhang’s Theorem 2 (in our notation, and with dependencies on and optimised and made explicit). More specifically, I have confirmed (modulo verifying the exponential sum estimates) that the Type I/II analysis works whenever

and

.

The first condition was already identified by both Ben and v08ltu (and in the corrected version of Pintz). The second one is needed for the Type II analysis to work, but in practice this condition is a bit weaker than the current ranges of that we are currently faced with. (But if we want to break past, say, , then we will need to tighten the Type II analysis somehow.) A minor remark: in the Type II analysis, it turns out that the second term in the exponential sum bound ends up dominating over the first term for the purposes of determining constraints on parameters, in particular even when this term ends up killing the estimate once exceeds 1/14 rather than 1/10 (which is what Ben computed as the threshold using the first term in the exponential sum estimate instead). It's fortunate though that the Type II analysis is not critical to any immediate optimisation goals, and one can afford to be a bit sloppy here.

Also the Cauchy-Schwarz for Type I and Type II is actually very similar. In both cases one is estimating a bilinear expression of the form

where are more or less arbitrary coefficients of controlled size, and is a certain explicit phase function (with some technical localisations to enforce some coprimality constraints). In Type II one Cauchy-Schwarz'es in n and then applies the triangle inequality to end up with estimating

whereas in Type I one Cauchy-Schwarz'es in and then applies the triangle inequality to instead estimate

i.e. one keeps equal. This constraint improves the exponential sum bound in the non-diagonal case, but makes the diagonal case denser, and it is this tradeoff which is causing the Type I/Type II barrier.

It just occurred to me that one might be able to squeeze a better bound than the trivial bound out of the treatment of the diagonal case in Type I, but then I realised that this just improves the Type I/II boundary and has no discernible effect on the final constraints on .

Starting on the Type III analysis now. The trick of factoring a Ramanujan sum (which has better than square root cancellation!) out of a Bombieri-Birch sum is pretty cool, and I guess is ultimately the most powerful new idea in Zhang's paper (though one which can only be seen after applying all the powerful old ideas).

10 June, 2013 at 9:30 pm

Terence TaoA small comment: the coefficients that get killed by the triangle inequality in the above analysis are not completely arbitrary; they basically factor as times a smooth cutoff on (admittedly there is this that ostensibly provides some more coupling here, but this is a smoothly varying factor and I think it can be more or less ignored). Potentially this extra structure could be used to squeeze out a bit more cancellation here by working with a Gowers box norm which corresponds to a higher dimensional exponential sum (and thus, in principle, a higher dimensional square root gain) but perhaps the gain here is worse than the price one would have to pay from completion of sums.

10 June, 2013 at 8:28 am

bengreenThanks Terry. I’ll go back and modify what I wrote on Type II so that there’s nothing (too grossly) incorrect on your blog. Am currently preparing to give a general audience lecture on this tomorrow in Cambridge…..

10 June, 2013 at 2:27 pm

Polymath8: Bounded Gaps Between Primes | Combinatorics and more[…] Zhang’s proof and improving the upper bound for the gaps. Here are links for three posts (I, II, III) on Terry Tao’s blog and for a post on the polymath blog. And here is the table for […]

11 June, 2013 at 7:30 am

Klaus LangeNow on arxiv I found a new paper from Pintz et. al. about finding small gaps between consecutive primes (regarding optimal weight function): http://arxiv.org/pdf/1306.2133.pdf

12 June, 2013 at 1:03 pm

Estimation of the Type I and Type II sums | What's new[…] is one of the continuations of the online reading seminar of Zhang’s paper for the polymath8 project. (There are two other continuations; this previous […]

14 June, 2013 at 8:47 am

Estimation of the Type III sums | What's new[…] is the final continuation of the online reading seminar of Zhang’s paper for the polymath8 project. (There are two other continuations; this previous […]

14 June, 2013 at 9:05 am

Terence TaoNow that all the followup posts have been written, it’s time to close this particular thread and continue the polymath discussion at one of the followup threads:

* discussion of the Type I/II sums at

http://terrytao.wordpress.com/2013/06/12/estimation-of-the-type-i-and-type-ii-sums/

* discussion of the Type III sums at

http://terrytao.wordpress.com/2013/06/14/estimation-of-the-type-iii-sums/

* everything else about the Zhang paper at

http://terrytao.wordpress.com/2013/06/10/a-combinatorial-subset-sum-problem-associated-with-bounded-prime-gaps/

21 June, 2013 at 1:37 pm

The proof that frightened Pythagoras | cartesian product[…] Online reading seminar for Zhang’s “bounded gaps between primes” (terrytao.wordpress.com) […]

9 July, 2013 at 11:23 am

The p-group fixed point theorem | Annoying Precision[…] into bounds on the former, and this has various applications, for example the original proof of Zhang’s theorem on bounded gaps between primes (although my understanding is that this step is unnecessary just to […]

4 October, 2013 at 1:53 pm

zhang twin prime breakthru vs academic track/grind | Turing Machine[…] 7. Online reading seminar for Zhang’s “bounded gaps between primes” | What’s new […]

19 November, 2013 at 4:13 pm

An online reading seminar from Terry Tao’s blog | The Conscious Mathematician[…] http://terrytao.wordpress.com/2013/06/04/online-reading-seminar-for-zhangs-bounded-gaps-between-prim… […]

25 June, 2014 at 9:52 pm

大数学家张益唐的唐诗舒怀 | Chinoiseries 《汉瀚》[…] http://terrytao.wordpress.com/2013/06/04/online-reading-seminar-for-zhangs-bounded-gaps-between-prim… […]