The purpose of this post is to isolate a combinatorial optimisation problem regarding subset sums; any improvement upon the current known bounds for this problem would lead to numerical improvements for the quantities pursued in the Polymath8 project. (UPDATE: Unfortunately no purely combinatorial improvement is possible, see comments.) We will also record the number-theoretic details of how this combinatorial problem is used in Zhang’s argument establishing bounded prime gaps.
First, some (rough) motivational background, omitting all the number-theoretic details and focusing on the combinatorics. (But readers who just want to see the combinatorial problem can skip the motivation and jump ahead to Lemma 5.) As part of the Polymath8 project we are trying to establish a certain estimate called for as wide a range of as possible. Currently the best result we have is:
Enlarging this region would lead to a better value of certain parameters , which in turn control the best bound on asymptotic gaps between consecutive primes. See this previous post for more discussion of this. At present, the best value of is obtained by taking sufficiently close to , so improving Theorem 1 in the neighbourhood of this value is particularly desirable.
I’ll state exactly what is below the fold. For now, suffice to say that it involves a certain number-theoretic function, the von Mangoldt function . To prove the theorem, the first step is to use a certain identity (the Heath-Brown identity) to decompose into a lot of pieces, which take the form
for some bounded (in Zhang’s paper never exceeds ) and various weights supported at various scales whose product is approximately :
We can write , thus ignoring negligible errors, are non-negative real numbers that add up to :
A key technical feature of the Heath-Brown identity is that the weights associated to sufficiently large values of (e.g. ) are “smooth” in a certain sense; this will be detailed below the fold.
The operation is Dirichlet convolution, which is commutative and associative. We can thus regroup the convolution (1) in a number of ways. For instance, given any partition into disjoint sets , we can rewrite (1) as
where is the convolution of those with , and similarly for .
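Since Dirichlet convolution at each integer is just a finite double sum, the regrouping freedom described above is easy to illustrate with a short computation. Here is a minimal sketch in Python with toy sequences (the weights, the cutoff `N`, and the particular bracketing are all just for illustration):

```python
N = 200

def dirichlet(a, b):
    """Dirichlet convolution (a * b)(n) = sum over d | n of a(d) b(n/d);
    sequences are stored as arrays indexed 1..N, with index 0 unused."""
    c = [0.0] * (N + 1)
    for d in range(1, N + 1):
        for m in range(1, N // d + 1):
            c[d * m] += a[d] * b[m]
    return c

# three toy weights (any bounded sequences will do for this check)
alpha = [0.0] + [1.0] * N
beta  = [0.0] + [(-1.0) ** n for n in range(1, N + 1)]
gamma = [0.0] + [1.0 / n for n in range(1, N + 1)]

# commutativity and associativity permit arbitrary regrouping of the factors:
lhs = dirichlet(dirichlet(alpha, beta), gamma)    # (alpha * beta) * gamma
rhs = dirichlet(alpha, dirichlet(gamma, beta))    # alpha * (gamma * beta)
assert all(abs(lhs[n] - rhs[n]) < 1e-9 for n in range(1, N + 1))
```

The same mechanism justifies collapsing the factors indexed by a set and its complement into two grouped convolutions, as in the partition above.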
Zhang’s argument splits into two major pieces, in which estimates for certain classes of the terms (1) are established. Cheating a little bit, the following three results are established:
Theorem 2 (Type 0 estimate, informal version) The term (1) gives an acceptable contribution to whenever
for some .
Theorem 3 (Type I/II estimate, informal version) The term (1) gives an acceptable contribution to whenever one can find a partition such that
where is a quantity such that
Theorem 4 (Type III estimate, informal version) The term (1) gives an acceptable contribution to whenever one can find with distinct with
The above assertions are oversimplifications; there are some additional minor smallness hypotheses on that are needed, but at the current (small) values of under consideration they are not relevant and so will be omitted.
Let be non-negative reals such that
Then at least one of the following statements holds:
- (Type 0) There is such that .
- (Type I/II) There is a partition such that
where is a quantity such that
- (Type III) One can find distinct with
The purely combinatorial question is whether the hypothesis (2) can be relaxed here to a weaker condition. This would allow us to improve the ranges for Theorem 1 (and hence for the values of and alluded to earlier) without needing further improvement on Theorems 2, 3, 4 (although such improvement is also going to be a focus of Polymath8 investigations in the future).
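For readers who prefer to experiment, the trichotomy can be explored by brute force over subset sums. The following toy Python classifier uses the illustrative thresholds 1/2 - sigma and 1/2 + sigma for “small” and “large” subset sums; these are stand-ins for the precise parameters of the lemma (which are suppressed above), so the code is a sketch of the combinatorial shape of the problem, not the exact statement:

```python
from itertools import chain, combinations

def classify(t, sigma):
    """Toy classifier for the trichotomy of Lemma 5.  The thresholds
    0.5 - sigma and 0.5 + sigma are illustrative stand-ins for the
    lemma's precise (suppressed) parameters."""
    n = len(t)
    assert abs(sum(t) - 1.0) < 1e-9
    # Type I/II: some subset sum lands strictly between the two thresholds,
    # so that subset and its complement give the desired balanced partition.
    for S in chain.from_iterable(combinations(range(n), k) for k in range(n + 1)):
        if 0.5 - sigma < sum(t[i] for i in S) < 0.5 + sigma:
            return "Type I/II"
    # Type 0: a single summand already reaches the large threshold.
    if any(ti >= 0.5 + sigma for ti in t):
        return "Type 0"
    # Otherwise the sublemma's analysis forces the Type III configuration.
    return "Type III"

assert classify((0.5, 0.5), 0.05) == "Type I/II"       # balanced partition exists
assert classify((0.6, 0.4), 0.05) == "Type 0"          # one dominant summand
assert classify((1/3, 1/3, 1/3), 0.05) == "Type III"   # three medium summands
```

The exhaustive loop over subsets is of course exponential in the number of summands, but as noted below the number of summands can be taken bounded, so this is harmless for experimentation.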
Let us review how this lemma is currently proven. The key sublemma is the following:
- (Type 0) There is a with .
- (Type I/II) There is a partition such that
- (Type III) There exist distinct with and .
Proof: Suppose that the Type I/II case never occurs; then every partial sum is either “small” in the sense that it is less than or equal to , or “large” in the sense that it is greater than or equal to , since otherwise we would be in the Type I/II case, either with as is and the complement of , or vice versa.
Call a summand “powerless” if it cannot be used to turn a small partial sum into a large one; thus there are no such that is small and is large. We then split , where are the powerless elements and are the powerful elements.
By induction we see that if and is small, then is also small. Thus every sum of powerful summands is either less than or larger than . Since a powerful element must be able to convert a small sum to a large sum (in fact it must be able to convert a small sum of powerful summands to a large sum, by stripping out the powerless summands), we conclude that every powerful element has size greater than . We may assume we are not in Type 0, so that every powerful summand is at least and at most . In particular, there have to be at least three powerful summands, since otherwise could not be as large as . As , we have , and we conclude that the sum of any two powerful summands is large (which, incidentally, shows that there are exactly three powerful summands). Taking to be three powerful summands in increasing order, we land in Type III.
for some sufficiently small . We observe from (2) that we certainly have
with plenty of room to spare. We then apply Lemma 6. The Type 0 case of that lemma then implies the Type 0 case of Lemma 5, while the Type I/II case of Lemma 6 also implies the Type I/II case of Lemma 5. Finally, suppose that we are in the Type III case of Lemma 6. Since
we thus have
and so we will be done if
Inserting (3) and taking small enough, it suffices to verify that
but after some computation this is equivalent to (2).
It seems that there is some slack in this computation; some of the conclusions of the Type III case of Lemma 5, in particular, ended up being “wasted”, and it is possible that one did not fully exploit all the partial sums that could be used to create a Type I/II situation. So there may be a way to make improvements through purely combinatorial arguments. (UPDATE: As it turns out, this is sadly not the case: consideration of the case when , , and shows that one cannot obtain any further improvement without actually improving the Type I/II and Type III analysis.)
A technical remark: for the application to Theorem 1, it is possible to enforce a bound on the number of summands in Lemma 5. More precisely, we may assume that is an even number of size at most for any natural number we please, at the cost of adding the additional constraint to the Type III conclusion. Since is already at least , which is at least , one can safely take , so can be taken to be an even number of size at most , which in principle makes the problem of optimising Lemma 5 a fixed linear programming problem. (Zhang takes , but this appears to be overkill. On the other hand, does not appear to be a parameter that overly influences the final numerical bounds.)
Below the fold I give the number-theoretic details of the combinatorial aspects of Zhang’s argument that correspond to the combinatorial problem described above.
— 1. Coefficient sequences —
We now give some number-theoretic background material that will serve two purposes. The most immediate purpose is to enable one to understand the precise statement of Theorems 1, 2, 3, 4, as well as the deduction of the first theorem from the other three. A secondary purpose is to establish some reference material which will be used in subsequent posts on the Type I/II and Type III analysis in Zhang’s arguments.
As in previous posts, we let be an asymptotic parameter tending to infinity and define the usual asymptotic notation relative to this parameter. It is also convenient to have a large fixed quantity to be chosen later, in order to achieve a fine localisation of scales.
for all , where is the divisor function. (In particular, any sequence that is pointwise dominated by a coefficient sequence is again a coefficient sequence.)
- (i) If is a coefficient sequence and is a primitive residue class, the (signed) discrepancy of in the sequence is defined to be the quantity
Note that this expression is linear in , so in particular we have the triangle inequality
- (ii) A coefficient sequence is said to be at scale for some if it is supported on an interval of the form .
- (iii) A coefficient sequence at scale is said to obey the Siegel-Walfisz theorem if one has
for any , any fixed , and any primitive residue class .
- (iv) A coefficient sequence at scale is said to obey the Elliott-Halberstam conjecture up to scale for some if one has
for any fixed . Thus for instance the Siegel-Walfisz theorem implies the Elliott-Halberstam conjecture up to scale for any fixed .
- (v) A coefficient sequence at scale is said to be smooth if it takes the form for some smooth function supported on obeying the derivative bounds
for all fixed (note that the implied constant in the notation may depend on ).
To control error terms, we will frequently use the following crude bounds:
and hence from (5) we have the crude discrepancy bound
(In particular, the Siegel-Walfisz estimate (7) is trivial unless .) Finally, we have the crude bound
for all .
(i.e. a Brun-Titchmarsh type inequality for powers of the divisor function); this estimate follows from this paper of Shiu or this paper of Barban-Vehov, and can also be proven using the methods in this previous blog post. (The factor of is needed to account for the possibility that is not primitive, while the term accounts for the possibility that is as large as .) Finally, (16) follows from the standard divisor bound; see this previous post.
As a general rule, we may freely use (16) when we are expecting a net power savings to come from another part of the analysis, and we may freely use the other estimates in Lemma 8 when we have a net super-logarithmic savings from elsewhere in the analysis. When we have neither a net power savings nor a net super-logarithmic savings from other sources, we usually cannot afford to use any of the bounds in Lemma 8.
The concept of a coefficient sequence is stable under the operation of Dirichlet convolution
- (i) is also a coefficient sequence.
- (ii) If are coefficient sequences at scales respectively, then is a coefficient sequence at scale .
- (iii) If are coefficient sequences at scales respectively, with for some fixed , and obeys a Siegel-Walfisz theorem, then also obeys a Siegel-Walfisz theorem (at scale ).
- (iv) If are coefficient sequences at scales respectively, with for some fixed , and obeys the Elliott-Halberstam conjecture up to some scale , then (viewed as a coefficient sequence at scale ) also obeys the Elliott-Halberstam conjecture up to scale .
- (v) If is the logarithm function, then is a coefficient sequence. Furthermore, if is smooth at some scale , then is smooth at that scale also.
Proof: To verify (i), observe from (4) that for any one has
so that is again a coefficient sequence. The claim (ii) then follows by considering the support of .
Now we verify (iii). We need to show that
for any and any residue class . By restricting to integers coprime to (and noting that the restricted version of still obeys the Siegel-Walfisz property if one divides out by a suitable power of ) we may assume that are supported on integers coprime to , at which point we may drop the constraint. As noted in Lemma 8 we may also assume that .
We now have
and so by the triangle inequality we may upper bound by
for any fixed , which is acceptable using the bound on .
Now we verify (iv), which is similar to (iii). Arguing as before, we have the inequality
The claim then follows from (13) and the Elliott-Halberstam hypothesis for .
Finally, (v) is an easy consequence of the product rule and is left to the reader.
We also have some basic sequences that obey the Siegel-Walfisz property:
- (i) obeys the Siegel-Walfisz theorem.
- (ii) In fact, obeys the Elliott-Halberstam conjecture up to scale for any fixed .
- (iii) obeys the Siegel-Walfisz theorem, where is the Möbius function.
Note that some lower bound on is necessary here, since one cannot hope for a sequence to be equidistributed to the extent predicted by the Siegel-Walfisz theorem (7) if the sequence is only supported on a logarithmic range!
Proof: We first prove (i). Our task is to show that
for any and any residue class . By applying the Möbius inversion formula
and the triangle inequality (6) we thus see that it suffices to show that
for any . We may assume that is coprime to as the left-hand side vanishes otherwise. Our task is now to show that
for any residue class (not necessarily primitive). Note that we may assume that as the claim follows from (14) otherwise. We use the Fourier expansion
where for or (by abuse of notation) for . We can thus write the left-hand side of (17) as
The term here is the first term on the right-hand side of (17). Thus it will suffice to show that
for all non-zero . But if we write , then by the Poisson summation formula the left-hand side is equal to
where is the Fourier transform of . From the smoothness bounds (9) and integration by parts we have
for any fixed , so for large enough (actually is enough) we obtain the claim thanks to the lower bound and the upper bound . We remark that the same argument (now with as large as needed) in fact shows the much stronger bound
for any fixed and any , which also gives (ii).
To prove (iii), we can argue similarly to before and reduce to showing that
for any , but this follows from the Siegel-Walfisz theorem for the Möbius function (which can be deduced from the more usual Siegel-Walfisz theorem for the von Mangoldt function, or else proven by essentially the same method) and summation by parts (if one wishes, one can also reduce to the case of primitive residue classes using the multiplicativity of ).
We remark that the Siegel-Walfisz theorem for the Möbius function is ineffective, although in practice one can obtain effective substitutes for this theorem that can make applications (such as the one for bounded prime gaps) effective, see e.g. the last section of this article of Pintz for further discussion. One amusing remark in this regard is that if there do happen to be infinitely many Siegel zeroes, then an old result of Heath-Brown shows that there are infinitely many twin primes already, so the ineffective case of the Siegel-Walfisz theorem is in some sense the best case scenario for us!
Lemmas 9 and 10 combine to give plenty of coefficient sequences obeying the Siegel-Walfisz property. This will become useful when the time comes to deduce Theorem 1 from (the precise versions of) Theorems 2, 3, 4.
— 2. Congruence class systems —
The Elliott-Halberstam conjecture on a sequence at scale requires taking a supremum over all primitive residue classes. At present, we do not know how to achieve such a strong claim for non-smooth arithmetic sequences such as or once the modulus goes beyond the level , even if one restricts to smooth moduli. However, Zhang’s work (as well as some of the precursor work of Bombieri, Fouvry, Friedlander, and Iwaniec) allows one to get a restricted version of the Elliott-Halberstam conjecture if one restricts the congruences one is permitted to work with.
Much as with the coefficient sequence axioms, it is convenient to abstract the axioms that a given system of congruence classes will be obeying. For any set , let denote the set of squarefree natural numbers whose prime divisors lie in .
Definition 11 Let . A congruence class system on is a collection of sets of residue classes for each obeying the following axioms:
- (i) (Primitivity) For each , is a subset of .
- (ii) (Chinese remainder theorem) For any coprime , we have , using the canonical identification between and .
- (iii) (Uniform bound) There is a fixed such that for all primes .
If for all , thus , we say that is a singleton congruence class system.
For any integer , let denote the multiplicity function
for any fixed .
Note from Axiom (ii) that a congruence class system can be specified through its values at primes . A simple example of a singleton congruence class system with controlled multiplicity is a fixed congruence class for some fixed non-zero , with avoiding all the prime divisors of . This is a special case of the following more general fact:
where . Then for any , is a congruence class system on with controlled multiplicity.
Similarly, if is such that is coprime to for all and , then the system defined by setting when is coprime to , and when divides , with then defined for arbitrary by the Chinese remainder theorem, is also a congruence class system on with controlled multiplicity.
Proof: We just prove the first claim, as the second is similar. Axioms (i)-(iii) are obvious; it remains only to verify the controlled multiplicity. Let be a congruence class with .
We can split
for each . Writing , this becomes
The claim then follows from Lemma 8.
The more precise statement of Theorem 1 is now as follows:
for any fixed , where is the von Mangoldt function.
For the application to prime gaps we only need to apply Theorem 13 to the congruence class system associated to a fixed -tuple through Lemma 12, but one could imagine that this theorem could have future application with some other congruence class systems.
One advantage of the abstract formulation of a congruence class systems is that we get a cheap reduction to the singleton case:
Proposition 14 (Reduction to singletons) In order to prove Theorem 13, it suffices to do so for good singleton congruence class systems.
Proof: Let be a good congruence class system. By removing all primes with from we may assume without loss of generality that for all and some fixed .
We use the probabilistic method. Construct a random singleton congruence class system by selecting uniformly at random from independently for all , and then set and extend by the Chinese remainder theorem. Writing , we observe that the property that enjoys of being a congruence class system of controlled multiplicity is inherited by . From hypothesis we then have that
for any fixed ; taking expectations we conclude that
On the other hand, from Lemma 8 we have
and the claim then follows from Cauchy-Schwarz.
This reduction is not essential to Zhang’s argument (indeed, it is not used in Zhang’s paper), but it does allow for a slightly simpler notation (since the summation over is eliminated).
for some fixed . Then obeys the Elliott-Halberstam conjecture up to scale . In particular, for any and any singleton congruence class system we have
and let be coefficient sequences at scales respectively with
with obeying a Siegel-Walfisz theorem. Then for any and any singleton congruence class system with controlled multiplicity we have
The condition (22) is dominated by (20) and can thus be ignored, at least at our current numerical ranges of parameters. We remark that this theorem is the only theorem that actually uses the controlled multiplicity hypothesis.
for some fixed . Let be coefficient sequences at scales respectively, with smooth. Then for any , and any singleton congruence class system we have
for any fixed .
As we saw, Theorem 15 is easy to establish. On the other hand, Theorem 16 and Theorem 17 are far deeper and will be the subject of future blog posts. We will not discuss them further here, but now turn to the question of how to deduce Theorem 13 from Theorems 15, 16, 17 using Lemma 5.
— 3. Decomposing the von Mangoldt function —
The basic strategy is to decompose the von Mangoldt function as a combination of Dirichlet convolutions of other functions such as , the constant function , and the logarithmic function . The simplest identity of this form is
but this is unsuitable for our purposes because when we localise to scale , the factor could also be localised at scales as large as , and our understanding of the equidistribution properties of the Möbius function is basically no better than that of the von Mangoldt function, so we have not gained anything with this decomposition.
To get around this we need to find decompositions that don’t let rough functions such as the Möbius function get up to scales anywhere close to . One promising identity in this regard is Linnik’s identity, which takes the form
where is the restriction of the constant function to numbers larger than ; this is just the coefficient version of the formal geometric series identity
There are no Möbius functions in sight on the right-hand side, which is promising. However, Linnik’s identity has an unbounded number of terms, which renders it unsuitable for our argument (we have various factors of and whose exponent would become unbounded if we had to deal with Dirichlet convolutions of unbounded length, which would swamp any gain of that we are trying to detect). So we will rely on a truncated variant of Linnik’s identity, known as the Heath-Brown identity.
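Before turning to the truncation, it may be reassuring to see Linnik's identity checked numerically. In the standard coefficient form, the k-fold convolution of the indicator of integers larger than 1 counts ordered factorisations of n into k factors each at least 2, and the identity reads Λ(n)/log n = Σ_k (-1)^(k-1) t_k(n)/k. A toy verification (cutoff `N` arbitrary):

```python
import math

N = 300
K = N.bit_length()        # t_k(n) vanishes once 2^k > n, so k <= K suffices

# t[k][n] = number of ordered factorisations of n into k factors, each >= 2
t = [None, [0, 0] + [1] * (N - 1)]
for k in range(2, K + 1):
    row = [0] * (N + 1)
    for d in range(2, N + 1):
        for m in range(2, N // d + 1):
            row[d * m] += t[k - 1][d]
    t.append(row)

def von_mangoldt(n):
    """log p if n is a power of the prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)

# Linnik's identity: Lambda(n)/log n = sum_k (-1)^(k-1) t_k(n) / k
for n in range(2, N + 1):
    s = sum((-1) ** (k - 1) * t[k][n] / k for k in range(1, K + 1))
    assert abs(math.log(n) * s - von_mangoldt(n)) < 1e-9
```

Note that the number of nonvanishing terms for a given n grows like log n, which is exactly the unboundedness that the Heath-Brown truncation is designed to tame.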
We will need a fixed positive natural number (Zhang takes ; the precise choice here does not seem to be terribly important). From the identities and , where is the Dirichlet convolution identity, we see that we can write as a -fold convolution
where denotes the convolution of copies of .
As with (23), the identity (24) is not directly useful for our strategy because one of the factors can still get as large as the scale that we are studying at. However, as Heath-Brown observed, we can manipulate (24) into a useful form by truncating the Möbius function . More precisely, we split the Möbius function as
where is the Möbius function restricted to the interval , and is the Möbius function restricted to the interval . The reason for this splitting is that the -fold Möbius convolution vanishes on , and in particular we have
on . Now we see that each of the Möbius factors cannot reach scales much larger than , although the factors may collectively still get close to when is close to . From the triangle inequality and Proposition 14, we thus see that to establish Theorem 13, it suffices to establish the bounds
whenever are fixed and obey (20), , , is fixed, and is a singleton congruence class system with controlled multiplicity.
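As a concrete sanity check on the decomposition just performed, one can verify the Heath-Brown identity numerically with tiny illustrative parameters (far smaller, of course, than the ranges used in the actual argument). The shape assumed below is the standard formulation Λ = Σ_{j=1}^{K} (-1)^(j-1) C(K,j) μ_{≤z}^{*j} * 1^{*(j-1)} * L on [1, z^K], where μ_{≤z} is the Möbius function truncated to [1, z]; the exact display conventions in the post may differ slightly:

```python
import math
from math import comb

K, z = 3, 4
N = z ** K                      # the truncated identity is valid for n <= z^K

def mobius(n):
    mu, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            mu = -mu
        p += 1
    return -mu if n > 1 else mu

def dirichlet(a, b):
    """Dirichlet convolution of arrays indexed 1..N (index 0 unused)."""
    c = [0.0] * (N + 1)
    for d in range(1, N + 1):
        if a[d]:
            for m in range(1, N // d + 1):
                c[d * m] += a[d] * b[m]
    return c

mu    = [0] + [mobius(n) for n in range(1, N + 1)]
mu_le = [m if n <= z else 0 for n, m in enumerate(mu)]   # Möbius truncated to [1, z]
one   = [0] + [1] * N
L     = [0.0] + [math.log(n) for n in range(1, N + 1)]

Lam = dirichlet(mu, L)          # Lambda = mu * L, exact for n <= N

# Heath-Brown: Lambda = sum_j (-1)^(j-1) C(K,j) mu_le^{*j} * 1^{*(j-1)} * L on [1, z^K]
hb = [0.0] * (N + 1)
for j in range(1, K + 1):
    term = L[:]
    for _ in range(j):
        term = dirichlet(mu_le, term)
    for _ in range(j - 1):
        term = dirichlet(one, term)
    for n in range(N + 1):
        hb[n] += (-1) ** (j - 1) * comb(K, j) * term[n]

assert all(abs(hb[n] - Lam[n]) < 1e-9 for n in range(1, N + 1))
```

The point, visible in the code, is that each truncated Möbius factor lives below z = x^(1/K), while the error incurred by the truncation is supported beyond z^K and hence never seen in the range of interest.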
This is now looking closer to the type of estimates that can be handled by Theorems 15, 16, 17, but there are still some technical issues to resolve, namely the presence of the cutoff and also the fact that the functions , , are not localised to any given scale, but are instead spread out across many scales. This however can be dealt with by a routine dyadic decomposition (which, in harmonic analysis, is sometimes referred to as Littlewood-Paley decomposition, at least when applied in the frequency domain), though here instead of using the usual dyadic range of scales, one uses instead a sub-dyadic range to eliminate edge effects. (This trick dates back at least to the work of Fouvry; thanks to Emmanuel Kowalski for this reference.)
More precisely, let be a large fixed number to be chosen later, and let be a smooth function supported on that equals on and obeys the derivative estimates
for any fixed (note that the implied constant here can depend on ). We then have a smooth partition of unity
indexed by the multiplicative semigroup for any natural number , where
We can thus decompose
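The sub-dyadic smooth partition of unity above can be sketched numerically. In the toy version below, a smooth cutoff equal to 1 on (0, 1] and vanishing beyond a ratio `lam` is differenced at consecutive scales, and the pieces telescope to 1; the ratio `lam = 1.1` and the particular bump construction are illustrative choices, not the post's precise normalisations:

```python
import math

lam = 1.1    # sub-dyadic spacing ratio -- an illustrative stand-in

def smooth_step(u):
    """C-infinity step function: 0 for u <= 0, 1 for u >= 1."""
    if u <= 0:
        return 0.0
    if u >= 1:
        return 1.0
    a = math.exp(-1.0 / u)
    b = math.exp(-1.0 / (1.0 - u))
    return a / (a + b)

def phi(x):
    """Smooth cutoff: equals 1 on (0, 1], vanishes on [lam, infinity)."""
    return 1.0 - smooth_step((x - 1.0) / (lam - 1.0))

def psi(j, x):
    """The j-th piece, supported on the sub-dyadic range (lam^(j-1), lam^(j+1))."""
    return phi(x / lam ** j) - phi(x / lam ** (j - 1))

# telescoping: the pieces sum to 1 on any compact set of x > 0
J = 200
for x in (0.7, 1.0, 3.14159, 10.0, 123.456):
    assert abs(sum(psi(j, x) for j in range(-J, J + 1)) - 1.0) < 1e-9
```

Each piece is supported on an interval of multiplicative width lam², which is the "sub-dyadic" localisation of scales used to eliminate edge effects.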
For , the expression
can thus be split as
which we can rewrite as
where is a variant of .
Observe that the summand vanishes unless
and the factor can be eliminated except in the boundary cases when
Let us deal with the contribution of the boundary cases to (26). If we let be the sum of all the boundary summands, then we have
From Lemma 8 we have that
for any fixed and any . On the other hand, is supported on two intervals of length . Thus by Cauchy-Schwarz one has
for any fixed . We thus have
and hence by Lemma 8 again
which is acceptable for (26) if we take large enough. Thus it suffices to deal with the contribution of the interior summands, in which the cutoff may be dropped. There are such summands, and the factor is bounded by , so it will suffice to show that
- (i) Each is a coefficient sequence at scale . More generally the convolution of the for is a coefficient sequence at scale .
- (ii) If for some fixed , then is smooth.
- (iii) If for some fixed , then obeys a Siegel-Walfisz theorem. More generally, obeys a Siegel-Walfisz theorem if for some fixed .
- (iv) .
Now we can prove (27). We can write for , where the are non-negative reals that sum to . We apply Lemma 5 and conclude that the obey one of the three conclusions (Type 0), (Type I/II), (Type III) of that lemma. Furthermore, in the Type III case, an inspection of Lemma 6 reveals that we have an additional lower bound available, which in particular implies that for some fixed if is large enough ( will certainly do).
In the Type 0 case, we can write in a form in which Theorem 15 applies. Similarly, in the Type I/II case we can write in a form in which Theorem 16 applies, and in the Type III case we can write in a form in which Theorem 17 applies. Thus in all cases we can establish (27), and Theorem 13 follows.