The purpose of this post is to isolate a combinatorial optimisation problem regarding subset sums; any improvement upon the current known bounds for this problem would lead to numerical improvements for the quantities pursued in the Polymath8 project. (UPDATE: Unfortunately no purely combinatorial improvement is possible, see comments.) We will also record the number-theoretic details of how this combinatorial problem is used in Zhang’s argument establishing bounded prime gaps.

First, some (rough) motivational background, omitting all the number-theoretic details and focusing on the combinatorics. (But readers who just want to see the combinatorial problem can skip the motivation and jump ahead to Lemma 5.) As part of the Polymath8 project we are trying to establish a certain estimate called for as wide a range of as possible. Currently the best result we have is:

Theorem 1 (Zhang’s theorem, numerically optimised)holds whenever .

Enlarging this region would lead to a better value of certain parameters , which in turn control the best bound on asymptotic gaps between consecutive primes. See this previous post for more discussion of this. At present, the best value of is applied by taking sufficiently close to , so improving Theorem 1 in the neighbourhood of this value is particularly desirable.

I’ll state exactly what is below the fold. For now, suffice to say that it involves a certain number-theoretic function, the von Mangoldt function . To prove the theorem, the first step is to use a certain identity (the Heath-Brown identity) to decompose into a lot of pieces, which take the form

for some bounded (in Zhang’s paper never exceeds ) and various weights supported at various scales that multiply up to approximately :

We can write , thus ignoring negligible errors, are non-negative real numbers that add up to :

A key technical feature of the Heath-Brown identity is that the weight associated to sufficiently large values of (e.g. ) are “smooth” in a certain sense; this will be detailed below the fold.

The operation is Dirichlet convolution, which is commutative and associative. We can thus regroup the convolution (1) in a number of ways. For instance, given any partition into disjoint sets , we can rewrite (1) as

where is the convolution of those with , and similarly for .

Zhang’s argument splits into two major pieces, in which certain classes of (1) are established. Cheating a little bit, the following three results are established:

Theorem 2 (Type 0 estimate, informal version)The term (1) gives an acceptable contribution to wheneverfor some .

Theorem 3 (Type I/II estimate, informal version)The term (1) gives an acceptable contribution to whenever one can find a partition such thatwhere is a quantity such that

Theorem 4 (Type III estimate, informal version)The term (1) gives an acceptable contribution to whenever one can find with distinct withand

The above assertions are oversimplifications; there are some additional minor smallness hypotheses on that are needed but at the current (small) values of under consideration they are not relevant and so will be omitted.

The deduction of Theorem 1 from Theorems 2, 3, 4 is then accomplished from the following, purely combinatorial, lemma:

Lemma 5 (Subset sum lemma)Let be such thatLet be non-negative reals such that

Then at least one of the following statements hold:

- (Type 0) There is such that .
- (Type I/II) There is a partition such that
where is a quantity such that

- (Type III) One can find distinct with
and

The purely combinatorial question is whether the hypothesis (2) can be relaxed here to a weaker condition. This would allow us to improve the ranges for Theorem 1 (and hence for the values of and alluded to earlier) without needing further improvement on Theorems 2, 3, 4 (although such improvement is also going to be a focus of Polymath8 investigations in the future).

Let us review how this lemma is currently proven. The key sublemma is the following:

Lemma 6Let , and let be non-negative numbers summing to . Then one of the following three statements hold:

- (Type 0) There is a with .
- (Type I/II) There is a partition such that
- (Type III) There exist distinct with and .

*Proof:* Suppose Type I/II never occurs, then every partial sum is either “small” in the sense that it is less than or equal to , or “large” in the sense that it is greater than or equal to , since otherwise we would be in the Type I/II case either with as is and the complement of , or vice versa.

Call a summand “powerless” if it cannot be used to turn a small partial sum into a large partial sum, thus there are no such that is small and is large. We then split where are the powerless elements and are the powerful elements.

By induction we see that if and is small, then is also small. Thus every sum of powerful summand is either less than or larger than . Since a powerful element must be able to convert a small sum to a large sum (in fact it must be able to convert a small sum of powerful summands to a large sum, by stripping out the powerless summands), we conclude that every powerful element has size greater than . We may assume we are not in Type 0, then every powerful summand is at least and at most . In particular, there have to be at least three powerful summands, otherwise cannot be as large as . As , we have , and we conclude that the sum of any two powerful summands is large (which, incidentally, shows that there are *exactly* three powerful summands). Taking to be three powerful summands in increasing order we land in Type III.

Now we see how Lemma 6 implies Lemma 5. Let be as in Lemma 5. We take almost as large as we can for the Type I/II case, thus we set

for some sufficiently small . We observe from (2) that we certainly have

and

with plenty of room to spare. We then apply Lemma 6. The Type 0 case of that lemma then implies the Type 0 case of Lemma 5, while the Type I/II case of Lemma 6 also implies the Type I/II case of Lemma 5. Finally, suppose that we are in the Type III case of Lemma 6. Since

we thus have

and so we will be done if

Inserting (3) and taking small enough, it suffices to verify that

but after some computation this is equivalent to (2).

It seems that there is some slack in this computation; some of the conclusions of the Type III case of Lemma 5, in particular, ended up being “wasted”, and it is possible that one did not fully exploit all the partial sums that could be used to create a Type I/II situation. So there may be a way to make improvements through purely combinatorial arguments. (UPDATE: As it turns out, this is sadly not the case: consderation of the case when , , and shows that one cannot obtain any further improvement without actually improving the Type I/II and Type III analysis.)

A technical remark: for the application to Theorem 1, it is possible to enforce a bound on the number of summands in Lemma 5. More precisely, we may assume that is an even number of size at most for any natural number we please, at the cost of adding the additioal constraint to the Type III conclusion. Since is already at least , which is at least , one can safely take , so can be taken to be an even number of size at most , which in principle makes the problem of optimising Lemma 5 a fixed linear programming problem. (Zhang takes , but this appears to be overkill. On the other hand, does not appear to be a parameter that overly influences the final numerical bounds.)

Below the fold I give the number-theoretic details of the combinatorial aspects of Zhang’s argument that correspond to the combinatorial problem described above.

** — 1. Coefficient sequences — **

We now give some number-theoretic background material that will serve two purposes. The most immediate purpose is to enable one to understand the precise statement of Theorems 1, 2, 3, 4, as well as the deduction of the first theorem from the other three. A secondary purpose is to establish some reference material which will be used in subsequent posts on the Type I/II and Type III analysis in Zhang’s arguments.

As in previous posts, we let be an asymptotic parameter tending to infinity and define the usual asymptotic notation relative to this parameter. It is also convenient to have a large fixed quantity to be chosen later, in order to achieve a fine localisation of scales.

It will be convenient to set up some notation regarding certain types of sequences, abstracting some axioms appearing in the work of Bombieri-Friedlander-Iwaniec and Zhang:

Definition 7Acoefficient sequenceis a finitely supported sequence that obeys the boundsfor all , where is the divisor function. (In particular, any sequence that is pointwise dominated by a coefficient sequence is again a coefficient sequence.)

- (i) If is a coefficient sequence and is a primitive residue class, the (signed)
discrepancyof in the sequence is defined to be the quantityNote that this expression is linear in , so in particular we have the triangle inequality

- (ii) A coefficient sequence is said to be
at scalefor some if it is supported on an interval of the form .- (iii) A coefficient sequence at scale is said to
obey the Siegel-Walfisz theoremif one has- (iv) A coefficient sequence at scale is said to
obey the Elliott-Halberstam conjecture up to scalefor some if one hasfor any fixed . Thus for instance the Siegel-Walfisz theorem implies the Elliott-Halberstam conjecture up to scale for any fixed .

- (v) A coefficient sequence at scale is said to be
smoothif it takes the form for some smooth function supported on obeying the derivative boundsfor all fixed (note that the implied constant in the notation may depend on ).

To control error terms, we will frequently use the following crude bounds:

Lemma 8 (Crude estimates)Let be a coefficient sequence, let be fixed, and let . Then we havefor any congruence class (not necessarily primitive). In particular, if is at scale for some , one has

and hence from (5) we have the crude discrepancy bound

(In particular, the Siegel-Walfisz estimate (7) is trivial unless .) Finally, we have the crude bound

*Proof:* To prove (10) it suffices from (4) to show that

but this follows from e.g. Lemma 8 of this previous post. From (10) we also deduce (11). To show (12), it suffices to show that

(i.e. a Brun-Titchmarsh type inequality for powers of the divisor function); this estimate follows from this paper of Shiu or this paper of Barban-Vehov, and can also be proven using the methods in this previous blog post. (The factor of is needed to account for the possibility that is not primitive, while the term accounts for the possibility that is as large as .) Finally, (16) follows from the standard divisor bound; see this previous post.

As a general rule, we may freely use (16) when we are expecting a net power savings to come from another part of the analysis, and we may freely use the other estimates in Lemma 8 when we have a net super-logarithmic savings from elsewhere in the analysis. When we have neither a net power savings or a net super-logarithmic savings from other sources then we usually cannot afford to use any of the bounds in Lemma 8.

The concept of a coefficient sequence is stable under the operation of Dirichlet convolution operation

Lemma 9Let be coefficient sequences. Then:

- (i) is also a coefficient sequence.
- (ii) If are coefficient sequences at scales respectively, then is a coefficient sequence at scale .
- (iii) If are coefficient sequences at scales respectively, with for some fixed , and obeys a Siegel-Walfisz theorem, then also obeys a Siegel-Walfisz theorem (at scale ).
- (iv) If are coefficient sequences at scales respectively, with for some fixed , and obeys the Elliott-Halberstam conjecture up to some scale , then (viewed as a coefficient sequence at scale ) also obeys the Elliott-Halberstam conjecture up to scale .
- (v) If is the logarithm function, then is a coefficient sequence. Furthermore, if is smooth at some scale , then is smooth at that scale also.

*Proof:* To verify (i), observe from (4) that for any one has

so that is again a coefficient sequence. The claim (ii) then follows by considering the support of .

Now we verify (iii). We need to show that

for any and any residue class . By restricting to integers coprime to (and noting that the restricted version of still obeys the Siegel-Walfisz property if one divides out by a suitable power of ) we may assume that are supported on integers coprime to , at which point we may drop the constraint. As noted in Lemma 8 we may also assume that .

We now have

and

and by the triangle inequality so we may upper bound by

Applying (7) for and (14) for , we may bound this by

for any fixed , which is acceptable using the bound on .

Now we verify (iv), which is similar to (iii). Arguing as before, we have the inequality

The claim then follows from (13) and the Elliott-Halberstam hypothesis for .

Finally, (v) is an easy consequence of the product rule and is left to the reader.

We also have some basic sequences that obey the Siegel-Walfisz property:

Lemma 10Let be a smooth coefficient sequence at some scale with for some fixed . Then

- (i) obeys the Siegel-Walfisz theorem.
- (ii) In fact, obeys the Elliott-Halberstam conjecture up to scale for any fixed .
- (iii) obeys the Siegel-Walfisz theorem, where is the Möbius function.

Note that some lower bound on is necessary here, since one cannot hope for a sequence to be equidistributed to the extent predicted by the Siegel-Walfisz theorem (7) if the sequence is only supported on a logarithmic range!

*Proof:* We first prove (i). Our task is to show that

for any and any residue class . By applying the Möbius inversion formula

and the triangle inequality (6) we thus see that it suffices to show that

for any . We may assume that is coprime to as the left-hand side vanishes otherwise. Our task is now to show that

For this it will suffice to establish the claim

for any residue class (not necessarily primitive). Note that we may assume that as the claim follows from (14) otherwise. We use the Fourier expansion

where for or (by abuse of notation) for . We can thus write the left-hand side of (17) as

The term here is the first term on the right-hand side of (17). Thus it will suffice to show that

for all non-zero . But if we write , then by the Poisson summation formula the left-hand side is equal to

where is the Fourier transform of . From the smoothness bounds (9) and integration by parts we have

for any fixed , so for large enough (actually is enough) we obtain the claim thanks to the lower bound and the upper bound . We remark that the same argument (now with as large as needed) in fact shows the much stronger bound

for any fixed and any , which also gives (ii).

To prove (iii), we can argue similarly to before and reduce to showing that

for any , but this follows from the Siegel-Walfisz theorem for the Möbius function (which can be deduced from the more usual Siegel-Walfisz theorem for the von Mangoldt function, or else proven by essentially the same method) and summation by parts (if one wishes, one can also reduce to the case of primitive residue classes using the multiplicativity of ).

We remark that the Siegel-Walfisz theorem for the Möbius function is ineffective, although in practice one can obtain effective substitutes for this theorem that can make applications (such as the one for bounded prime gaps) effective, see e.g. the last section of this article of Pintz for further discussion. One amusing remark in this regard is that if there do happen to be infinitely many Siegel zeroes, then an old result of Heath-Brown shows that there are infinitely mnay twin primes already, so the ineffective case of Siegel-Walfisz’s theorem is in some sense the best case scenario for us!

Lemmas 9 and 10 combine to give plenty of coefficient sequences obeying the Siegel-Walfisz property. This will become useful when the time comes to deduce Theorem 1 from (the precise versions of) Theorems 2, 3, 4.

** — 2. Congruence class systems — **

The Elliott-Halberstam conjecture on a sequence at scale requires taking a supremum over all primitive residue classes. At present, we do not know how to achieve such a strong claim for non-smooth arithmetic sequences such as or once the modulus goes beyond the level , even if one restricts to smooth moduli. However, Zhang’s work (as well as some of the precursor work of Bombieri, Fouvry, Friedlander, and Iwaniec) allow one to get a restricted version of the Elliott-Halberstam conjecture if one restricts the congruences one is permitted to work with.

Much as with the coefficient sequence axioms, it is convenient to abstract the axioms that a given system of congruence classes will be obeying. For any set , let denote the set of squarefree natural numbers whose prime divisors lie in .

Definition 11Let . Acongruence class systemon is a collection of sets residue classes for each obeying the following axioms:

- (i) (Primitivity) For each , is a subset of .
- (ii) (Chinese remainder theorem) For any coprime , we have , using the canonical identification between and .
- (iii) (Uniform bound) There is a fixed such that for all primes .
If for all , thus , we say that is a

singleton congruence class system.For any integer , let denote the multiplicity function

or equivalently

A congruence class system is said to have

controlled multiplicityif for any congruence class with , we have

Note from Axiom (ii) that a congruence class system can be specified through its values at primes . A simple example of a singleton congruence class system with controlled multiplicity is a fixed congruence class for some fixed non-zero , with avoiding all the prime divisors of . This is a special case of the following more general fact:

Lemma 12Let be a fixed admissible -tuple for some fixed , and let . For any modulus , setwhere . Then for any , is a congruence class system on with controlled multiplicity.

Similarly, if is such that is coprime to for all and , then the system defined by setting when is coprime to , and when divides , with then defined for arbitrary by the Chinese remainder theorem, is also a congruence system on with controlled multiplicity.

*Proof:* We just prove the first claim, as the second is similar. Axioms (i)-(iii) are obvious; it remains only to verify the controlled multiplicity. Let be a congruence class with .

We can split

and hence

for each . Writing , this becomes

The claim then follows Lemma 8.

The more precise statement of Theorem 1 is now as follows:

Theorem 13 (Zhang’s theorem, numerically optimised)Let be fixed parameters such thatLet , and let be a congruence class system with controlled multiplicity. Then

for any fixed , where is the von Mangoldt function.

For the application to prime gaps we only need to apply Theorem 13 to the congruence class system associated to a fixed -tuple through Lemma 12, but one could imagine that this theorem could have future application with some other congruence class systems.

One advantage of the abstract formulation of a congruence class systems is that we get a cheap reduction to the singleton case:

Proposition 14 (Reduction to singletons)In order to prove Theorem 13, it suffices to do so for good singleton class congruence systems.

*Proof:* Let be a good congruence class system. By removing all primes with from we may assume without loss of generality that for all and some fixed .

We use the probabilistic method. Construct a random singleton congruence class system by selecting uniformly at random from independently for all , and then set and extend by the Chinese remainder theorem. Writing , we observe that the property that enjoys of being a congruence class system of controlled multiplicity is inherited by . From hypothesis we then have that

for any fixed ; taking expectations we conclude that

On the other hand, from Lemma 8 we have

and the claim then follows from Cauchy-Schwarz.

This reduction is not essential to Zhang’s argument (indeed, it is not used in Zhang’s paper), but it does allow for a slightly simpler notation (since the summation over is eliminated).

We can now also state the precise versions of Theorems 2, 3, 4 that we need:

Theorem 15 (Type 0 estimate, precise version)Let be fixed, and let be coefficient sequences at scales respectively with smooth,and

for some fixed . Then obeys the Elliott-Halberstam conjecture up to scale . In particular, for any and any singleton congruence class system we have

*Proof:* This is immediate from Lemma 10(ii) and Lemma 9(iv).

Theorem 16 (Type I/II estimate, precise version)Let be fixed quantities such thatand let be coefficient sequences at scales respectively with

and

with obeying a Siegel-Walfisz theorem. Then for any and any singleton congruence class system with controlled multiplicity we have

The condition (22) is dominated by (20) and can thus be ignored, at least at our current numerical ranges of parameters. We remark that this theorem is the only theorem that actually uses the controlled multiplicity hypothesis.

Theorem 17 (Type III estimate, precise version)Let be fixed quantities. Let be scales obeying the relationsand

for some fixed . Let be coefficient sequences at scales respectively, with smooth. Then for any , and any singleton congruence class system we have

for any fixed .

As we saw, Theorem 15 is easy to establish. On the other hand, Theorem 16 and Theorem 17 are far deeper and will be the subject of future blog posts. We will not discuss them further here, but now turn to the question of how to deduce Theorem 13 from Theorems 15, 16, 17 using Lemma 5.

** — 3. Decomposing the von Mangoldt function — **

The basic strategy is to decompose the von Mangoldt function as a combination of Dirichlet convolutions of other functions such as , the constant function , and the logarithmic function . The simplest identity of this form is

but this is unsuitable for our purposes because when we localise to scale , the factor could also be localised at scales as large as , and our understanding of the equidistribution properties of the Möbius function is basically no better than that of the von Mangoldt function, so we have not gained anything with this decomposition.

To get around this we need to find decompositions that don’t let rough functions such as the Möbius function get up to scales anywhere close to . One promising identity in this regard is *Linnik’s identity*, which takes the form

where is the restriction of the constant function to numbers larger than ; this is just the coefficient version of the formal geometric series identity

There are no Möbius functions in sight on the right-hand side, which is promising. However, Linnik’s identity has an unbounded number of terms, which render it unsuitable for our argument (we have various factors of and whose exponent would become unbounded if we had to deal with Dirichlet convolutions of unbounded length, which would swamp any gain of that we are trying to detect). So we will rely on a truncated variant of Linnik’s identity, known as the Heath-Brown identity.

We will need a fixed positive natural number (Zhang takes ; the precise choice here does not seem to be terribly important). From the identities and , where is the Dirichlet convolution identity, we see that we can write as a -fold convolution

where denotes the convolution of copies of .

As with (23), the identity (24) is not directly useful for our strategy because one of the factors can still get as large as the scale that we are studying at. However, as Heath-Brown observed, we can manipulate (24) into a useful form by truncating the Möbius function . More precisely, we split the Möbius function as

where is the Möbius function restricted to the interval , and is the Möbius function restricted to the interval . The reason for this splitting is that the -fold Möbius convolution vanishes on , and in particular we have

on . Expanding out and using the binomial formula, we conclude that

By (24), the term of (25) is just . For all the other terms, we can cancel against using the Möbius inversion formula to conclude the *Heath-Brown identity*

on . Now we see that each of the Möbius factors cannot reach scales much larger than , although the factors may *collectively* still get close to when is close to . From the triangle inequality and Proposition 14, we thus see that to establish Theorem 13, it suffices to establish the bounds

whenever are fixed and obey (20), , , is fixed, and is a singleton congruence class system with controlled multiplicity.

This is now looking closer to the type of estimates that can be handled by Theorems 15, 16, 17, but there are still some technical issues to resolve, namely the presence of the cutoff and also the fact that the functions , , are not localised to any given scale, but are instead spread out at across many scales. This however can be dealt with by a routine dyadic decomposition (which, in harmonic analysis, is sometimes referred to as Littlewood-Paley decomposition, at least when applied in the frequency domain), though here instead of using the usual dyadic range of scales, one uses instead a sub-dyadic range to eliminate edge effects. (This trick dates back at least to the work of Fouvry; thanks to Emmanuel Kowalski for this reference.)

More precisely, let be a large fixed number to be chosen later, and let be a smooth function supported on that equals on and obeys the derivative estimates

for any fixed (note that the implied constant here can depend on ). We then have a smooth partition of unity

indexed by the multiplicative semigroup for any natural number , where

We can thus decompose

and similarly

and

For , the expression

can thus be split as

which we can rewrite as

where is a variant of .

Observe that the summand vanishes unless

and

and the factor can be eliminated except in the boundary cases when

or

Let us deal with the contribution of the boundary cases to (26). If we let be the sum of all the boundary summands, then we have

From Lemma 8 we have that

and

for any fixed and any . On the other hand, is supported on two intervals of length . Thus by Cauchy-Schwarz one has

and

for any fixed . We thus have

and hence by Lemma 8 again

which is acceptable for (26) if we take large enough. Thus it suffices to deal with the contribution of the interior summands, in which the cutoff may be dropped. There are such summands, and the factor is bounded by , so it will suffice to show that

whenever and are such that each takes the form , , or . In particular, from Lemmas 9, (10) we observe the following facts for any and :

- (i) Each is a coefficient sequence at scale . More generally the convolution of the for is a coefficient sequence at scale .
- (ii) If for some fixed , then is smooth.
- (iii) If for some fixed , then obeys a Siegel-Walfisz theorem. More generally, obeys a Siegel-Walfisz theorem if for some fixed .
- (iv) .

Now we can prove (27). We can write for , where the are non-negative reals that sum to . We apply Lemma 5 and conclude that the obey one of the three conclusions (Type 0), (Type I/II), (Type III) of that lemma. Furthermore, in the Type III case, an inspection of Lemma 6 reveals that we have an additional lower bound available, which in particular implies that for some fixed if is large enough ( will certainly do).

In the Type 0 case, we can write in a form in which Theorem 15 applies. Similarly, in the Type I/II case we can write in a form in which Theorem 16 applies, and in the Type III case we can write in a form in which Theorem 17 applies. Thus in all cases we can establish (27), and Theorem 13 follows.

## 63 comments

Comments feed for this article

10 June, 2013 at 11:49 am

Omar Aboura“…we land in Type I/II” should be Type III

[Corrected, thanks – T.]10 June, 2013 at 12:02 pm

bengreenTerry, a couple of quick comments: first, I think the term coming from Heath-Brown is a bit of a red herring and should be ignored; my impression is that Heath-Brown is a kind of cut off version of Linnik’s identity and that one should think conceptually in terms of the latter (see my earlier post on this). The only reason you can’t actually use Linnik is that the terms resulting form it do not obey the requisite bounds.

Second, I convinced myself that the hard case is and , , with small. I thought that in this case you can either use Type III directly (in Zhang’s notation, with , or you use Type I by grouping the terms and the terms. Is there some other way to proceed?

10 June, 2013 at 12:40 pm

v08ltuIt the statement of Theorem 4, it should be subscripts that are distinct, not the themselves. More roundly, there is a restriction in (A4), which says something about how large the can be, namely I think, assuming one orders the -values decreasingly. This is stronger than the stated bound of 1/2 in your Theorem 4.

10 June, 2013 at 12:49 pm

Terence TaoThat condition is indeed stated in Zhang’s paper, but I think it is not actually used in the Type III argument, it is just what comes out of the combinatorial analysis (represented here by Lemma 6). In particular, the quantity is associated to the Type I analysis rather than the Type III analysis.

10 June, 2013 at 12:55 pm

v08ltuThere is also the issue that, in Zhang’s notation, only can go into a Type III convolution, as it involves just the characteristic function while the involve the Mobius function. I am not sure how this is portrayed in your -values, perhaps it falls out somewhere.

10 June, 2013 at 1:01 pm

Terence TaoYes, this is the “K” issue mentioned at the bottom of the post. As Ben says, one should morally be using the Linnik decomposition in which there are no Mobius factors present (this corresponds to setting ). But the Linnik decomposition has an unbounded number of terms and is not directly available, so we use the Heath-Brown decomposition at some level K (Zhang takes ) instead. As such, any factor that is less than or equal to could potentially be coming from a Mobius factor rather than a (piecewise) smooth factor . But in practice the in the Type III analysis will be larger than if we set large enough (basically because of the lower bound in Lemma 6) so this is not an issue.

I plan to expand this post shortly with the formal number-theoretic versions of the above statements (rather than just the oversimplified combinatorial cartoon) which will mention these various technicalities.

10 June, 2013 at 1:01 pm

meditationataeI’ve thought of negating the Conclusion of the Subset sum lemma. It would start: There exist non-negative reals such that such that none of the following statements hold , etc. If we are given the , it seems to me that a straightforward simple computation will show whether or not any of the three statements hold. In particular, for the statement of Type I/II, one only needs to look at as large as possible while still satisfying the inequality That's why I think that "ruling-out" candidate pairs from the Conclusion of the Subset sum lemma could be easy, given the right choice of The second observation is as follows: Let satisfy the conclusion of the Subset sum lemma, so does . I was thinking of the case , and I’m curious if might give better bounds. David Bernier

10 June, 2013 at 1:15 pm

pavferHmm, funny,

So,

in each election, in which all voted, there is

either

one party with 55% or more,

or

all parties groupable into two coalitions, each between 45% and 55%,

or

at least three possible 2-party coalitions with 55% or more,

with both coalition partners “non-marginal” and “not-almost-winning”.

Nice :-)

10 June, 2013 at 2:27 pm

Polymath8: Bounded Gaps Between Primes | Combinatorics and more[…] proof and improving the upper bound for the gaps. Here are links for three posts (I, II, III) on Terry Tao’s blog and for a post on the polymath blog. And here is the table for the world […]

10 June, 2013 at 2:35 pm

qianghua“Mor precisely” should be More precisely.

[Corrected, thanks – T.]10 June, 2013 at 2:54 pm

v08ltuI do realise that you are trying to simplify the picture, but there is some subtlety in why in Zhang is the smallest (not the largest, as might be suggested by selecting the biggest in your Theorem 4).

The point is that Zhang dispenses (via 1-variable Kloos bound) the small in 13.10-13.11 via a bound with in it. This is then used later in comparison to in the middle of page 47 (namely that ). I don’t know how much of this is mandatory, the arguments could be arranged, and as I pointed out a few days ago, Zhang could also use the Deligne 2-variable bound (treating trivially) on 13.7 to handle smaller , see the remark of FI upon their Theorem 4.

It might be that taking the smallest simply means that Zhang can work uniformly, w/o worrying about side issues of the variable ranges.

10 June, 2013 at 2:59 pm

v08ltuI mean selecting , in the last line of your Theorem 4. In the previous line, you do indeed put the inequalities correct.

10 June, 2013 at 3:04 pm

v08ltuLet me try to explain my thought, over again. If you look at the bottom line of Theorem 4, you would want to take (with the 5 factor) as the largest. But the previous line in Theorem 4 prohibits this. However, does Zhang’s methodology in these parts (S13&14) really require one to take , and if not, to what extent can it be modified? Though, perhaps the “critical” case near 5/16 could dominate the analysis anyway.

10 June, 2013 at 5:35 pm

Terence TaoI’ve now expanded the post significantly to give the number-theoretic details of how Zhang deduces his distribution theorem on the von Mangoldt function from the Type I/II and Type III estimates. (This was primarily achieved by cannibalising part of an enormous post I was writing for all of Zhang’s Theorem 2, before I decided to break it up into more manageable pieces.) The material here can basically be viewed as an exposition of Section 6 of Zhang.

10 June, 2013 at 9:23 pm

Emmanuel KowalskiNote: the use of intervals of type [x,x(1+L^{-A})^m] is completely standard in this game (it’s already done in Fouvry-Iwaniec or even Fouvry’s “Autour du théorème de Bombieri-Vinogradov”, which is a bit earlier, and I’m sure it was not new then.)

[Article updated to reflect this reference, thanks – T.]11 June, 2013 at 12:32 am

Nick Cook(Along the lines of what Ben Green suggested.) If we’re not in type 0 or I/II in the proof of Lemma 6, then the three largest, are in ; call them , and let denote the sum of the rest of the . We could do better than bounding for all three pairs if is somewhat smaller than . That is, if for some , we can pick to ensure Type III of lemma 5 as long as . If this is not the case, and if is small enough (smaller than for some positive ) then we can again conclude Type III of lemma 5 if (using the relation ). This leaves the case that is too large. But then we could pick small enough that this puts one of the terms in , i.e. Type I/II, which we already excluded.

Unfortunately, fiddling with this has not yielded improvement around the special point you mention (but does seem to increase the feasible region for in the region ). Perhaps something along these lines could be made to work.

11 June, 2013 at 12:39 am

MichaelI have done the question as a linear programming exercise.

Let and , where The conditions I can glean from Lemma 5 are (treating as fixed)

then I minimize as the change.

With the value of I get ,

but if I use instead of get

I don’t have a connection between and .

11 June, 2013 at 12:56 am

MichaelI just got – that is, just let

What have I missed?

11 June, 2013 at 9:56 am

Terence TaoDear Michael,

I think the main issue is that the negation of the Type I/II case is actually rather complicated; in principle there are as many as different constraints coming out of this negation, due to all the possible ways to form the subset S. In practice many of these cases are redundant and can be ignored, but it is not immediately obvious that one can discard all but a handful of them as you seem to be doing.

11 June, 2013 at 4:29 pm

MichaelThe order of the coordinates doesn’t matter, so I assume .

You showed that, if it isn’t Type 0 or i/ii, there are exactly three powerful coordinates, any one of which is between and . This is covered by and

. Any two of the powerful coordinates is more than , and that is covered by

.

11 June, 2013 at 2:54 am

Gergely HarcosCondition (2) in Lemma 5 is optimal, at least for which is implicit in the Type I/II condition. More precisely, when , we have the counterexample , , where . In other words, if we want a better condition than (2), then we have to improve the existing bounds for . (This response was inspired by Ben’s comment above.)

11 June, 2013 at 6:03 am

Dustin G. MixonTo get strict inequality in , you actually want , but your counterexample is good. Is the implicit requirement that a fundamental limit in Zhang's technique, or can this be weakened as well?

11 June, 2013 at 12:27 pm

Gergely HarcosMy is not the same as in Lemma 5. My (perhaps I should have called it ) is a convenient variable for defining the counterexample, and the example is ok (you can check by hand that it is neither of the three types in Lemma 5). About Zhang’s technique: the condition in Lemma 5 is a certain natural limit, as shown by my example, but hopefully this limit is not a fundamental one.

11 June, 2013 at 10:13 am

Terence TaoAh well, so no cheap numerical pickup here :). Still, at least we now know what the enemy is. Plugging in the explicit numerical values (ignoring infinitesimals)

which are our current optimal values we have

.

The Type I/II analysis lets one win when any partial sum lies between and , which at this endpoint case are also the values of and respectively. On the number theory side, this case corresponds to the component of that takes the form

where is smooth and at scale for . I was hoping that maybe the smoothness of the first factor (called in the Type III analysis) could be exploitable, but then I realised that one could make the above counterexample more complicated by splitting the term up arbitrarily into many small pieces and so one does not necessarily get to assume is smooth in the critical case.

Still, at least we have some explicit numerical values to work with when we try to improve either the Type I/II or Type III analysis: specifically, one needs to get some power gain in Theorem 16 with and , or Theorem 17 with and .

11 June, 2013 at 4:28 am

meditationataeIn the definition of

singleton congruence class system, which goes with Definition 11, I think the sentence should begin with: “If for all primes , […]” . David Bernier[Corrected, thanks – T.]11 June, 2013 at 4:32 am

Dustin G. MixonSometimes you write (22) when I think you mean (2).

[Corrected, thanks – T.]11 June, 2013 at 11:45 am

Eytan PaldiIn fact, it is possible (for each given n) to replace (2) by finding explicitly the EXACT necessary and sufficient conditions on and

such that lemma 5 holds. (It may happen, however, that for very small values of n, the admissible set for defined by the necessary and sufficient conditions is empty, but as we already know for sufficiently large n it is nonempty.) A systematic way of doing this is as follows:

1. To find the COMPLEMENT of the above admissible set, containing exactly the pairs for which the conclusion of lemma 5 does not hold (i.e. there is a non-negative non-decreasing sequence

with sum 1, which does NOT belong to any type.)

Note that for type (0 ) there are n conditions, negating the lemma condition for each , for type (I / II) there are conditions (negating the lemma conditions for each partition), along with the existence of some satisfying its lemma conditions, and for type (III) there are conditions (one for each triple

such that $t_i < t_j < t_k < 1/2$ and negating the last inequality in lemma 5 for this triple.)

2. Use any method of quantifier elimination (see e.g. chapter 14 in "algorithms in real algebraic geometry" by S. Basu, R. Pollack and M-F Roy, Springer -Verlag, 2003.) to get the desired necessary and sufficient conditions for the complement of the above admissible set for the pairs .

Remark: In general such (symbolic) quantifier elimination are used for general systems of polynomial inequalities, but in our case of systems of linear inequalities I think that the computational complexity should be manageable – the resulting admissible set should be a finite union of certain polygons. As I know, there should be several software implementations of quantifier elimination algorithms.

11 June, 2013 at 12:40 pm

Gergely HarcosDear Terry, perhaps it would be useful to emphasize in the post that (2) is optimal for Lemma 6, so that others don’t try to improve it. On the other front, I try to catch up with Zhang’s paper.

Some other small suggestions:

1. In the proof of Lemma 5 it would be useful to emphasize (for the sake of the reader) that a powerful summand is not only able to convert a small sum to a large sum, but it can do this for a small sum with powerful summands only (as powerless summands can be omitted).

2. In the sentence “We then apply Lemma 5” and the next one, the numbers 5 and 6 should be flipped.

[Corrections and suggestions implemented, thanks – T.]11 June, 2013 at 2:31 pm

Eytan PaldiIs it possible that for larger n, (2) is not optimal for lemma 5?

11 June, 2013 at 4:26 pm

Dustin G. MixonNo – Use the same counterexample, except take the other ‘s to be zero.

11 June, 2013 at 6:08 pm

Eytan PaldiIn fact, your suggestion is the same counter example with since the other parameters can be ignored!

But your example can be modified to have the remaining parameters equal to a sufficiently small number to still get a non-degenerate counter example!

12 June, 2013 at 1:03 pm

Estimation of the Type I and Type II sums | What's new[…] seminar of Zhang’s paper for the polymath8 project. (There are two other continuations; this previous post, which deals with the combinatorial aspects of the second part of Zhang’s paper, and a post […]

12 June, 2013 at 10:43 pm

Terence TaoI have an idea which may be able to largely eliminate the role of in Zhang’s analysis. The main reason that we can’t take to be too large is because of the following inefficiency: when we want to factor a -smooth number into two factors with chose to some preselected with , we can’t quite hit our targets with perfect accuracy, and must instead settle for loose bounds like

. (*)

This gap of between where we would like our parameters to be, and where they eventually end up, leads to most of the dependencies on the parameter in Zhang’s theorem . (There are a couple places where needs to be small relative to other quantities, but these conditions on are very mild compared to the main effect of on the conditions for .)

Now, if all we know about is that it is -smooth, then we can’t do much better than (*), at least when are close to integer powers of , since could simply be the product of primes that are close to . (One could eke out an incredibly tiny improvement to (*) in the case that are not integer powers of , but this is not the direction I am pursuing here.)

On the other hand, if we return to the sieve theory side of things, most of the moduli n for which we actually want an Elliott-Halberstam estimate are not of the form “product of primes that are close to ” – most of them also contain a bunch of prime factors that are significantly less than . I have to do the sieve calculations carefully, but there is a chance that one could show that “most” of the moduli d in the sieve have enough prime factors that their factors are spread much more densely between 1 and d (as measured in log-scale) than (*) suggests, and that one should in fact be able to find a factor r that is basically within of rather than within . If so, this should largely eliminate the influence that has on the Type I and Type II sums (so long as it is reasonably small, say 1/100 or so, as opposed to the current level of 1/2000 or so). (Amusingly, one may need to use some inverse Littlewood-Offord theory to make this intuition precise, which is something that I happen to be quite familiar with.)

This is all contingent though on checking that “most” moduli in the sieve have this desired property (in the sense that the contribution of the exceptional moduli is small). Will think about this…

12 June, 2013 at 11:22 pm

v08ltuMy impression would be that the which are “bad” in this sense are a positive proportion in the contribution, but it would be like the Dickman function at or something, so quite quite small. But this is just a guess.

12 June, 2013 at 11:49 pm

v08ltuIf I parse Ford’s paper correctly: http://annals.math.princeton.edu/wp-content/uploads/annals-v168-n2-p01.pdf

His theorem p371 states that the proportion of numbers with a factor in [y,z] looks like where and . For say, this is 2.7%, so most numbers don’t have a factor in such a range.

13 June, 2013 at 12:50 am

bengreenTerry, I think you’re quite likely to be right about this. The calculation does need to be done *inside* the sieve, though, because there are “many” in with all prime factors pretty near , where “many” means at least or so. And the sieve weights may more than cancel that denominator of a power of . But certainly the prime factors of a typical , even a typical smooth one, are going to be quite spread out and hence arrangeable to be more-or-less whatever size one wants. One could even get interdisciplinary about this and cite this play:

https://www.msri.org/general_events/19160

or, more seriously

http://www.dms.umontreal.ca/~andrew/PDF/Anatomy.pdf

In fact it seems to me that we don’t really care about being -smooth at all. The key property is that if then the set of subset sums of is -dense: this is what lets you arrange factorisations of any desired size (within ). We could even redefine to be this parameter, couldn’t we? It’s obviously less than or equal to the old . And, now you are talking about ruling out sets whose subset sums are strangely distributed, I see why inverse Littlewood-Offord comes into the game. Nice observation!

I suppose one really (for the moment) only cares about being able to arrange factorisations nicely in the vicinity of the important threshold at the Type I/III border; this may give some extra saving.

13 June, 2013 at 3:43 am

v08ltu“But certainly the prime factors of a typical d, even a typical smooth one, are going to be quite spread out and hence arrangeable to be more-or-less whatever size one wants.”

For a typical isn’t the th prime factor typically like , which doesn’t allow one that much wiggle room, if you want to force divisors where you want? You may typically have two divisors within a factor or two of each other, but that doesn’t mean that you can force a divisor to be in (too short) an interval wherever you want. If I read the above quoted theorem of Ford (following Tenenbaum) correctly, the percentage of numbers with a factor in a specific power-range like is quite small (2.7%). Of course we have the smoothness hypothesis, but I don’t know whether this will be a panacea (it will allow us to reduce , but not to an infinitesimal, or irrevelancy). Just my opinion.

13 June, 2013 at 4:02 am

bengreenI take your point. But it should certainly allow one to reduce (or, rather, perhaps we should change the definition of ).

13 June, 2013 at 9:40 am

Terence TaoThanks for all the feedback! The issue is a bit more subtle than I had expected: even after restricting to -smooth numbers, the proportion of numbers which fail to be, say, -divisor-dense (i.e. their set of divisors manages to miss an interval of the form for some medium-sized ) does have a very small but positive density. But because the Selberg sieve has a lot of cancellation in it (as evidenced by the Mobius function factor), “very small density” does not automatically translate to “negligible for the purposes of sieve theory” – a naive attempt to estimate the error by taking absolute values everywhere sets one back a few factors of a logarithm (if one uses the GPY sieve, one in fact loses the horrific amount of , but I am hoping with the elementary Selberg sieve one only loses something like or , which is too large to be absorbed by a smallness of density but it makes it more plausible that one could exploit the cancellation of the Mobius function to control things).

Of course, one does not necessarily need to compare a new sieve with an old sieve to show that the new sieve is effective – one might be able to compute all the relevant sieve asymptotics for a GPY sieve restricted to “dense-divisor” numbers directly. One thing that works in our favor here is that the property of having a dense set of divisors is sort of multiplicative: if q and r have dense sets of divisors, then qr does as well. But one may have to select the precise definition of having “dense divisors” carefully; as I said before, there is a lot of cancellation going on in the sieve calculations and one cannot afford to disrupt that.

It seems that if one is willing to just halve the influence of rather than eliminate it entirely (roughly speaking, this would replace the condition by , at least in principle) then one can get a fair amount of control on the exceptional -smooth numbers that avoid having any divisor in , basically by a modification of the combinatorial arguments that happen to be in the main post here. Indeed, call a factor of "small" if it is less than and "large" if it is larger than . Call a prime dividing "powerless" if it cannot turn a small factor into a large factor and "powerful" otherwise; in particular, powerful primes lie between and . Now look at the minimal large factor of , it has to consist entirely of powerful primes as we can strike out all the powerless primes without making the factor small. Furthermore, if one orders the powerful primes as , then the minimal large factor has to consist of an initial segment since otherwise one could swap one powerful prime by a smaller one which reduces the product by at most and thus cannot make the product small. Thus every product of i or more powerful primes is large, and every product of i-1 or less primes is small (otherwise the above arguments would also show that is large, contradicting minimality of ). Thus (roughly speaking) all the powerful primes lie between and , which in practice is quite a narrow band. Also all the powerless primes when multiplied together cannot exceed (since otherwise no prime less than could be powerful enough to bridge the gap once the effect of the powerless primes are factored in). This is quite a special configuration (the density of such numbers inside the -smooth numbers appears to be something like from a back-of-the-envelope calculation). Of course as I said before this doesn't immediately allow one to declare victory due to the cancellation in the sieve but it is a promising sign.

13 June, 2013 at 4:00 pm

v08ltuI guess my concern was: the for an “average” d are like (this is a very vague statement, the geometric series should be to some extent, and I made it add up to 1). So that is more sparse than powers of 2, and you cover little with subset sums. But if you are pre-assuming that numbers have no large prime factors, say more than , then I don’t know, but would guess that the above set is looking more like typically, which looks much denser in its subset sums.

13 June, 2013 at 7:48 pm

v08ltuActually it should be in the numerator, which is about the which is an upper bound of with these smooths.

13 June, 2013 at 8:58 pm

meditationataeFor some reason, the link to Anatomy.pdf by Granville, when clicked, yields “gibberish”. But searching/googling for: The Anatomy of Integers and Permutations Andrew Granville should work …

13 June, 2013 at 9:46 pm

andrescaicedoThere is a small typo (some extra spaces), just remove %E2%80%8E

14 June, 2013 at 8:47 am

Estimation of the Type III sums | What's new[…] seminar of Zhang’s paper for the polymath8 project. (There are two other continuations; this previous post, which deals with the combinatorial aspects of the second part of Zhang’s paper, and this […]

18 June, 2013 at 8:46 am

Terence TaoA quick note: Janos Pintz has kindly sent me some handwritten notes (with permission to share them with this project) in which he has (somewhat independently of our own discussion) worked out a way to implement the idea mentioned in comments above of relaxing the condition that the modulus be -smooth. Basically, the idea is to restrict instead to the wider class of squarefree moduli which are -smooth for some large C and also obey the property

since the greedy algorithm then shows that such a will contain a factor between and for any . His notes show that the truncation error in the the elementary Selberg sieve (discussed back in https://terrytao.wordpress.com/2013/06/08/the-elementary-selberg-sieve-and-bounded-prime-gaps/ , and is a little different from the analytic Selberg sieve which we are currently using) caused by restricting to such numbers is negligible; I’ll try to write up a blog post on this shortly.

In practice the effect of this will come close to the ideal of being able to ignore delta completely (e.g. the condition morally becomes ).

With this and the improvement on Type III sums that Michel has just announced at https://terrytao.wordpress.com/2013/06/14/estimation-of-the-type-iii-sums/#comment-235003 , it looks like we will have some more progress on these bounds in a few days…

18 June, 2013 at 10:30 am

Anonymousi don’t see any announcement in the last link…

[Link corrected, thanks – T.]18 June, 2013 at 6:19 pm

A truncated elementary Selberg sieve of Pintz | What's new[…] a more general systetm of congruence classes obeying a certain controlled multiplicity axiom; see this post. Secondly, and more importantly for this post, the requirement that the modulus lies in can be […]

21 June, 2013 at 10:52 am

Pace NielsenIn (18), I think that the on the RHS of the formula should be an (as is the summation variable).

[Corrected, thanks – T.]23 June, 2013 at 9:14 pm

The distribution of primes in densely divisible moduli | What's new[…] improves upon previous constraints of (see this blog comment) and (see Theorem 13 of this previous post), albeit for instead of . Inserting Theorem 4 into the Pintz sieve from this previous post gives […]

25 June, 2013 at 10:53 am

Gergely HarcosThree comments and some typos:

1. Before (27) you say there are summands. I think the number is a bit larger, namely . This is because there are about elements of in every dyadic range, hence about elements in .

2. In the proof of (iv) in Lemma 9, there is no need to introduce and proceed by Cauchy-Schwarz (in fact I had difficulty following the original argument). By (13) and the first display in this part of the proof, we have directly

,

so that, using the assumption on ,

.

3. At the end of the proof of Lemma 12, there is no need to reduce to primitive, since (12) does not assume primitivity.

4a. In the proof of Lemma 9, third line, should be .

4b. In Section 3, should be .

4c. In Section 3, should be , and should be .

4d. should be .

4e. In (21) and in the proof of Proposition 14, should be .

4f. In the proof of Theorem 15, Lemma 7 should be Lemma 9, and the link to it should be updated.

4g. In Theorem 17, "any be a singleton" should be "for any singleton".

[Corrected, thanks! – T.]30 June, 2013 at 12:39 pm

Bounded gaps between primes (Polymath8) – a progress report | What's new[…] convolution. The precise identity is a little complicated (although not difficult to prove); see this post for details. Using this identity, we can indeed reduce (2) to slightly more complicated versions of […]

1 July, 2013 at 10:01 pm

Avi LevyIn the sentence before (26), there is a citation to an unidentified “Proposition”

[Corrected, thanks – T.]7 July, 2013 at 11:17 pm

The distribution of primes in doubly densely divisible moduli | What's new[…] 4, 5, 6 will be given below the fold, while the proof of Lemma 7 follows from the arguments in this previous post. We remark that in our current arguments, the double dense divisibility is only fully used in the […]

27 July, 2013 at 6:55 pm

An improved Type I estimate | What's new[…] Again, these conditions are implied by (8). The claim then follows from the Heath-Brown identity and dyadic decomposition as in this previous post. […]

17 October, 2013 at 11:46 am

Stijn HansonIn Lemma 8 it proves equation (10) using a simple bound and then Proposition 5 from “The elementary Selberg sieve and bounded prime gaps” but that’s a proof of MPZ from EH and doesn’t appear to contain any sums of the required form (requiring either squarefree-ness or some coprimality condition on the summands). Is there some numbering/linking mix-up here or am I just not reading it properly?

17 October, 2013 at 12:03 pm

Terence TaoHmm, that is a numbering issue; one can use for instance Lemma 8 from the previous post. (These estimates can also be found in standard multiplicative number theory texts, e.g. Corollary 2.15 of Montgomery-Vaughan. A good rule of thumb regarding these sorts of estimates is that the number of divisors of a large number is typically of size ; this is closely related to the Erdos-Kac theorem, which is a precise version of the assertion that the number of prime factors of a number is typically of size .)

9 February, 2014 at 3:03 pm

Stijn HansonSorry, could you expand a little on how the “crude bounds” give the last displayed equation of Proposition 14?

9 February, 2014 at 3:41 pm

Terence TaoOne can bound by and by .

10 February, 2014 at 3:56 pm

Stijn Hansonok, I roughly see how those bounds help (presumably by using for all but how exactly do you get ? I’m looking at the bounds in the definition and the closest asymptotic bound I can easily see is . Sorry, I’m probably being really thick.

10 February, 2014 at 7:28 pm

Terence TaoUse hypotheses (ii) and (iii) of Definition 11, rather than hypothesis (i). (iii) gives the case when q is prime, and (ii) then gives the general squarefree case as a consequence.

More generally, bounds on multiplicative functions should usually be proven by first considering the prime case (or the prime power case, if one is not restricted to squarefree numbers).

24 September, 2014 at 2:29 pm

Derived multiplicative functions | What's new[…] to , where and are arbitrary parameters and denotes the -fold convolution of , and discussed in this previous blog post; this is the top order component […]

2 December, 2015 at 1:51 pm

A conjectural local Fourier-uniformity of the Liouville function | What's new[…] (e.g. Vaughan’s identity or Heath-Brown’s identity), as discussed for instance in this previous post (which is focused more on the von Mangoldt function, but analogous identities exist for the […]

6 July, 2017 at 12:07 am

Correlations of the von Mangoldt and higher divisor functions I. Long shift ranges | What's new[…] step, which is again standard, is the use of the Heath-Brown identity (as discussed for instance in this previous blog post) to split up into a number of components that have a Dirichlet convolution structure. Because the […]