As in all previous posts in this series, we adopt the following asymptotic notation: is a parameter going off to infinity, and all quantities may depend on unless explicitly declared to be “fixed”. The asymptotic notation is then defined relative to this parameter. A quantity is said to be of polynomial size if one has , and bounded if . We also write for , and for .
The purpose of this (rather technical) post is both to roll over the polymath8 research thread from this previous post, and also to record the details of the latest improvement to the Type I estimates (based on exploiting additional averaging and using Deligne’s proof of the Weil conjectures), which leads to a slight improvement in the numerology.
In order to obtain this new Type I estimate, we need to strengthen the previously used properties of “dense divisibility” or “double dense divisibility” as follows.
Definition 1 (Multiple dense divisibility) Let . For each natural number , we define a notion of -tuply -dense divisibility recursively as follows:
- Every natural number is -tuply -densely divisible.
- If and is a natural number, we say that is -tuply -densely divisible if, whenever are natural numbers with , and , one can find a factorisation with such that is -tuply -densely divisible and is -tuply -densely divisible.
We let denote the set of -tuply -densely divisible numbers. We abbreviate “1-tuply densely divisible” as “densely divisible”, “2-tuply densely divisible” as “doubly densely divisible”, and so forth; we also abbreviate as .
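The recursion in Definition 1 can be tested by brute force on small numbers. The sketch below is illustrative only. It assumes the definition reads as in the earlier posts of this series: n is k-tuply y-densely divisible if, for all i + j = k - 1 and every real R with 1 <= R <= n, there is a factorisation n = qr with R/y <= r <= R, with q being i-tuply and r being j-tuply y-densely divisible. The exact range of R and the function names are assumptions of this sketch.

```python
from functools import lru_cache
from math import isqrt


def divisors(n):
    """Return the sorted list of positive divisors of n."""
    small = [d for d in range(1, isqrt(n) + 1) if n % d == 0]
    return sorted(set(small + [n // d for d in small]))


def covers(rs, n, y):
    """Check that the intervals [r, y*r], for r in the sorted list rs, cover [1, n].

    A real R admits a divisor r with R/y <= r <= R exactly when r <= R <= y*r.
    """
    if not rs or rs[0] != 1:
        return False  # R = 1 can only be served by r = 1
    reach = y * rs[0]
    for r in rs[1:]:
        if r > reach:
            return False  # the reals strictly between reach and r are missed
        reach = max(reach, y * r)
    return reach >= n


@lru_cache(maxsize=None)
def is_densely_divisible(n, k, y):
    """Brute-force test of k-tuply y-dense divisibility (assumed definition)."""
    if k == 0:
        return True  # base case: every natural number qualifies
    for i in range(k):
        j = k - 1 - i
        good = [r for r in divisors(n)
                if is_densely_divisible(n // r, i, y)
                and is_densely_divisible(r, j, y)]
        if not covers(good, n, y):
            return False
    return True
```

For instance, with y = 2 the prime 7 already fails at k = 1, since it has no divisor in [R/2, R] for R = 3, whereas 12 passes because its divisors 1, 2, 3, 4, 6, 12 never jump by more than a factor of 2.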
Given any finitely supported sequence and any primitive residue class , we define the discrepancy
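As a concrete illustration, the following sketch computes a discrepancy for a finitely supported sequence stored as a dictionary. It assumes the convention used in the earlier posts of this series, namely that the discrepancy is the sum of the sequence over the residue class a mod q, minus 1/phi(q) times the sum over all n coprime to q. The names `discrepancy` and `euler_phi` are hypothetical helpers for this sketch.

```python
from fractions import Fraction
from math import gcd


def euler_phi(q):
    """Euler totient function, computed naively."""
    return sum(1 for n in range(1, q + 1) if gcd(n, q) == 1)


def discrepancy(alpha, a, q):
    """Discrepancy of alpha (a dict n -> value) in the class a mod q.

    Assumed convention: (sum over n = a mod q) minus
    (1/phi(q)) * (sum over n coprime to q).
    """
    in_class = sum((Fraction(v) for n, v in alpha.items()
                    if (n - a) % q == 0), Fraction(0))
    coprime = sum((Fraction(v) for n, v in alpha.items()
                   if gcd(n, q) == 1), Fraction(0))
    return in_class - coprime / euler_phi(q)
```

With this convention the discrepancies over all primitive residue classes of a fixed modulus sum to zero, so the quantity measures how far the sequence is from being equidistributed among those classes.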
We now recall the key concept of a coefficient sequence, with some slight tweaks in the definitions that are technically convenient for this post.
for all , where is the divisor function.
- (i) A coefficient sequence is said to be located at scale for some if it is supported on an interval of the form for some .
- (ii) A coefficient sequence located at scale for some is said to obey the Siegel-Walfisz theorem if one has
for any , any fixed , and any primitive residue class .
- (iii) A coefficient sequence is said to be smooth at scale for some if it takes the form for some smooth function supported on an interval of size and obeying the derivative bounds
for all fixed (note that the implied constant in the notation may depend on ).
Note that we allow sequences to be smooth at scale without being located at scale ; for instance if one arbitrarily translates a sequence that is both smooth and located at scale , it will remain smooth at this scale but may not necessarily be located at this scale any more. Note also that we allow the smoothness scale of a coefficient sequence to be less than one. This is to allow for the following convenient rescaling property: if is smooth at scale , , and is an integer, then is smooth at scale , even if is less than one.
Now we adapt the Type I estimate to the -tuply densely divisible setting.
Definition 3 (Type I estimates) Let , , and be fixed quantities, and let be a fixed natural number. We let be an arbitrary bounded subset of , let , and let be a primitive congruence class. We say that holds if, whenever are quantities with
for any fixed . Here, as in previous posts, denotes the square-free natural numbers whose prime factors lie in .
The main theorem of this post is then
In practice, the first condition here is dominant. Except for weakening double dense divisibility to quadruple dense divisibility, this improves upon the previous Type I estimate that established under the stricter hypothesis
As in previous posts, Type I estimates (when combined with existing Type II and Type III estimates) lead to distribution results of Motohashi-Pintz-Zhang type. For any fixed and , we let denote the assertion that
for any fixed , any bounded , and any primitive , where is the von Mangoldt function.
Proof: Setting sufficiently close to , we see from the above theorem that holds whenever
The second condition is implied by the first and can be deleted.
From this previous post we know that (which we define analogously to from previous sections) holds whenever
while holds with sufficiently close to whenever
As before, we let denote the claim that given any admissible -tuple , there are infinitely many translates of that contain at least two primes.
This follows from the Pintz sieve, as discussed below the fold. Combining this with the best known prime tuples, we obtain that there are infinitely many prime gaps of size at most , improving slightly over the previous record of .
— 1. Multiple dense divisibility —
We record some useful properties of dense divisibility.
- (i) If is -tuply -densely divisible, and is a factor of , then is -tuply -densely divisible. Similarly, if is a multiple of , then is -densely divisible.
- (ii) If are -densely divisible, then is also -densely divisible.
- (iii) Any -smooth number is -tuply -densely divisible.
- (iv) If is -smooth and square-free for some , and , then is -tuply -densely divisible.
Proof: (i) is easily established by induction on , the idea being to start with a good factorisation of and perturb it into a factorisation of or by dividing or multiplying by a small number. To prove (ii), we may assume without loss of generality that , so that . If we set , then the factors of , as well as the factors of multiplied by , are both factors of . From this we can deduce the -dense divisibility of from the -dense divisibility of .
The claim (iii) is easily established by induction on and a greedy algorithm, so we turn to (iv). The claim is trivial for . Next, we consider the case. Our task is to show that for any , one can find a factorisation with . If , we can achieve this factorisation by initialising to equal and then greedily multiplying the remaining factors of until one exceeds , so we may assume instead that . Then by the greedy algorithm we can find a factor of with ; if we then greedily multiply by factors with we obtain the claim.
Finally we consider the case. We assume inductively that the claim has already been proven for smaller values of . Let be such that . By hypothesis, the -smooth quantity is at least . By the greedy algorithm, we may thus factor where
Now we divide into several cases. Suppose first that . Then , so by the case, we may find a factorisation with . Setting and , the claim then follows from the induction hypothesis.
Now suppose that . By the greedy algorithm, we may then find a factor of the -smooth quantity with ; setting , we see that is a multiple of and hence . The claim now follows from the induction hypothesis and (iii).
Finally, suppose that . By the greedy algorithm, we may then find a factor of the -smooth quantity with ; setting , we see that is a multiple of and hence . The claim now follows from the induction hypothesis and (iii).
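The greedy algorithm invoked repeatedly in this proof can be sketched as follows. The code is a minimal illustration rather than the argument itself: it assumes n is y-smooth and 1 <= R <= n, and builds a divisor r of n by multiplying in prime factors for as long as the product stays at most R. If some prime p is skipped, then the running product times p exceeded R at that moment, so r > R/p >= R/y; if no prime is skipped then r = n = R. The function names are hypothetical.

```python
def prime_factors(n):
    """Prime factors of n with multiplicity, by trial division."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def greedy_factor(n, R):
    """Greedily build a divisor r of n with r <= R.

    When n is y-smooth and 1 <= R <= n, the result also satisfies
    R/y <= r, because any skipped prime is at most y.
    """
    r = 1
    for p in prime_factors(n):
        if r * p <= R:
            r *= p
    return r
```

For example, 360 = 2^3 * 3^2 * 5 is 5-smooth, so for every integer R between 1 and 360 the routine returns a divisor r of 360 with R/5 <= r <= R.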
Now we record the criterion for using to deduce .
Proposition 8 (Criterion for DHL) Let be such that holds. Suppose that one can find a natural number and real numbers and such that
Proof: We use the Pintz sieve from this post, repeating the proof of Theorem 5 from that post (and using the explicit formulae for and from this comment thread). The main difference is that the exponent in equation (10) of that post needs to be replaced with (and similarly for the displays up to (11)), and needs to be replaced with .
Applying this proposition with , , , sufficiently close to , and we obtain as claimed.
— 2. van der Corput estimates —
In this section we generalise the van der Corput estimates from Section 1 of this previous post to wider classes of “structured functions” than rational phases. We will adopt an axiomatic approach, laying out the precise axioms that we need a given class of structured functions to obey:
Definition 9 (Structured functions) Let be a bounded subset of . A class of structured functions is a family of collections of functions defined on subsets of for each prime and every ; an element of is then said to be a structured function of complexity at most and modulus . Furthermore we place an equivalence relation on each class with sufficiently large depending on . This class and this equivalence relation are assumed to obey the following axioms:
- (i) (Monotonicity) One has whenever . Furthermore, if is sufficiently large depending on , the equivalence relations on and agree on their common domain of definition.
- (ii) (Near-total definition) If , then the domain of consists of with at most points removed.
- (iii) (Pointwise bound) If , then for all in the domain of .
- (iv) (Conjugacy) If , then for some .
- (v) (Multiplication) If , then the pointwise product (on the common domain of definition) can be expressed as the sum of functions (which we will call the components of ) in for some .
- (vi) (Translation invariance) If , and , then the function (defined on the translation of the domain of definition of ) lies in for some .
- (vii) (Dilation invariance) If , and , then the function (defined on the dilation of the domain of definition of ) lies in for some .
- (viii) (Polynomial phases) If is a polynomial of degree at most , then the function lies in for some . More generally, if , then the product lies in for some . Furthermore, if is sufficiently large depending on , this operation respects the equivalence relation : if and only if . Finally, if and is not identically zero, then .
- (ix) (Almost orthogonality) If have domains of definition respectively, one has for an algebraic integer , with the error term being Galois-absolute in the sense that all Galois conjugates of the error term are also . Furthermore, if is sufficiently large depending on , then vanishes whenever .
- (x) (Integration) Suppose that is such that contains a component equivalent to for some . Suppose also that is sufficiently large depending on . Then there exists such that .
Example 1 (Polynomial phases) Let be a bounded subset of . If, for every prime and , we define to be the set of all functions of the form , where are polynomials of degree at most with integer coefficients, defined on all of , then this is a class of structured functions (note that the almost orthogonality axiom requires the Weil conjectures for curves). Two polynomial phases will be declared equivalent if differ only in the constant term. Note from the Chinese remainder theorem that the function is then also a structured function of complexity at most and modulus .
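The almost orthogonality axiom for polynomial phases can be checked numerically for small primes. The sketch below assumes the phases take the standard form x -> exp(2*pi*i*P(x)/p) and uses the unnormalised correlation sum over all x mod p; both the normalisation and the function names are assumptions of this illustration. When P and Q differ only in the constant term the sum has modulus exactly p, and otherwise the Weil bound gives modulus at most (d - 1) * sqrt(p), where d is the degree of P - Q (assumed not divisible by p).

```python
import cmath
import math


def e_p(a, p):
    """Additive character exp(2*pi*i*a/p) on the integers mod p."""
    return cmath.exp(2j * cmath.pi * (a % p) / p)


def correlation(P, Q, p):
    """Unnormalised correlation: sum over x mod p of e_p(P(x)) * conj(e_p(Q(x)))."""
    return sum(e_p(P(x) - Q(x), p) for x in range(p))


p = 101
# Equivalent phases: the polynomials differ only in the constant term.
equivalent = correlation(lambda x: x**3 + 2 * x, lambda x: x**3 + 2 * x + 5, p)
# Inequivalent phases: P - Q = x^3 - x is a genuine cubic.
inequivalent = correlation(lambda x: x**3, lambda x: x, p)
```

Here `abs(equivalent)` equals p up to rounding, while `abs(inequivalent)` is at most 2 * sqrt(p), which is the square-root cancellation that the axiom encodes.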
Example 2 (Polynomial phases twisted by characters) Let be a bounded subset of . If, for every prime and , we define to be the set of all functions of the form , where is a phase, are polynomials of degree at most with integer coefficients, , and the are Dirichlet characters of order , with the non-standard convention that is undefined (instead of vanishing) at zero, then this is a class of structured functions (again, the almost orthogonality axiom requires the Weil conjectures for curves). We declare two structured functions to be equivalent if they agree up to a constant phase on their common domain of definition. Note from the Chinese remainder theorem that the function is then also a structured function of complexity at most if the are Dirichlet characters of period (and conductor dividing ), again with the convention that is undefined (instead of vanishing) when .
Example 3 (Rational phases) Let be a bounded subset of . If, for every prime and , we define to be the set of all functions of the form , where are polynomials of degree at most with integer coefficients and with monic, with the function only defined when , then this is a class of structured functions (again, the almost orthogonality axiom requires the Weil conjectures for curves). We declare two structured functions to be equivalent if they agree up to a constant phase on their common domain of definition. Note from the Chinese remainder theorem that the function is then also a structured function of complexity at most and modulus .
Example 4 (Trace weights) Let be a bounded subset of . We fix a prime not in , and we fix an embedding of the -adics into . If, for every prime and , we define to be the set of all functions of the form
where is with at most points removed, and is a lisse -adic sheaf on that is pure of weight and geometrically isotypic with conductor at most (see this previous post for definitions of these terms), then this is a class of structured functions. We declare two trace weights to be equivalent if one has and for some geometrically isotypic sheaves whose geometrically irreducible components are isomorphic. The almost orthogonality now is deeper, being a consequence of Deligne’s second proof of the Weil conjectures, and also using a form of Schur’s lemma for sheaves; see Section 5 of this paper of Fouvry, Kowalski, and Michel. The integration axiom follows from Lemma 5.3 of the same paper. This class of structured functions includes the previous three classes, but also includes Kloosterman-type objects such as (and many other exponential sums) besides. (Indeed, it is basically closed under the operations of Fourier transforms, convolution, and pullback, as long as certain degenerate cases are avoided.)
We now turn to the problem of obtaining non-trivial bounds for the expression
where , is a structured function of bounded complexity and modulus , and is a smooth function at scale . The trivial bound here is
since one has from the divisor bound. In some cases we cannot hope to improve upon this bound; for instance, if is a constant phase then there is clearly no improvement available. Similarly, if is the linear phase , then there is no improvement in the regime ; if is the quadratic phase then there is no improvement in the regime ; if is the cubic phase then there is no improvement in the regime ; and so forth. On the other hand, we will be able to establish a van der Corput estimate which roughly speaking asserts that as long as these polynomial obstructions are avoided, and is smooth, one gets a non-trivial gain.
We first need a lemma:
Lemma 10 (Fundamental theorem of calculus) Let be a class of structured functions. Let , let , and let be a structured function of complexity at most with modulus . Assume that is sufficiently large depending on . Let , and suppose that there is a polynomial of degree at most such that for all for which this identity is well-defined. Then there exists a polynomial of degree at most such that for all for which this identity is well-defined.
Proof: By dilating by and using the dilation invariance of structured functions, we may assume without loss of generality that . We can write in terms of the binomial functions for (which are well-defined if ) as
for some coefficients . If we then define
then is a polynomial of degree at most (if ) and by Pascal’s identity. So if we multiply by (using the polynomial phase invariance of structured functions) we may assume without loss of generality that , thus . But then the claim follows from the integration axiom.
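The Pascal identity step in this proof can be illustrated over the integers. The sketch below is not the modular argument itself (there the binomial functions are only available when the prime exceeds the degree), and the names `c`, `Q`, and the helper functions are assumptions. It expands a polynomial P in the binomial basis via forward differences, shifts each basis index up by one to get Q, and then Q(n+1) - Q(n) = P(n) follows from C(n+1, j+1) - C(n, j+1) = C(n, j).

```python
from math import comb


def binomial_coeffs(P, degree):
    """Coefficients c_j with P(n) = sum_j c_j * C(n, j), via forward differences."""
    values = [P(n) for n in range(degree + 1)]
    coeffs = []
    for _ in range(degree + 1):
        coeffs.append(values[0])  # c_j is the j-th forward difference at 0
        values = [b - a for a, b in zip(values, values[1:])]
    return coeffs


def discrete_antiderivative(P, degree):
    """Return Q with Q(n + 1) - Q(n) = P(n) for all integers n >= 0."""
    c = binomial_coeffs(P, degree)

    def Q(n):
        # Shift each binomial index up by one; Pascal's identity does the rest.
        return sum(cj * comb(n, j + 1) for j, cj in enumerate(c))

    return Q
```

For P(n) = 3n^2 - n + 7 the binomial coefficients are 7, 2, 6, so Q(n) = 7*C(n,1) + 2*C(n,2) + 6*C(n,3), a polynomial of one degree higher whose forward difference recovers P.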
Now we can state the van der Corput estimate.
Proposition 11 (van der Corput) Let be a class of structured functions. Let be of polynomial size, and let be a structured function of modulus and complexity at most . Let be fixed, and let denote the set of sufficiently large primes dividing with the property that there exists a polynomial of degree at most such that , and let . Then for any of polynomial size, any factorisation , and any coefficient sequence function which is smooth at scale , one has
where , , and , where the sum is implicitly assumed to range over those for which is defined.
The parameter is technical, as is the term; heuristically one should view this estimate as asserting that
under reasonable non-degeneracy conditions. Assuming sufficient dense divisibility and in the regime , the optimal value of the right-hand side is , where , which is attained when for and .
Proof: We induct on , assuming that the claim has already been proven for all smaller values of .
We may factor where and . Then we may write
Observe that any given , has magnitude (from the divisor bound), the function is of the form for some supported on an interval of length and obeying the bounds , and the function is a structured function of modulus and complexity at most (here we use the dilation and translation invariance properties of structured functions). From this we see that to prove the proposition for a given value of , it suffices to do so under the assumption , in which case the objective is to prove that
The claim is trivial (from the divisor bound) with , so we may assume , in which case we will show that
By applying a reduction similar to the one above, we may also assume that all prime factors of are larger than some large fixed constant , which we will assume to be sufficiently large for the arguments below to work.
We begin with the base case . In this case it will suffice to establish the bound
By completion of sums, it will suffice to show that
for all . By the Chinese remainder theorem and the divisor bound, it will suffice to show that
for all and all . However, by the hypothesis , , and the claim now follows from the almost orthogonality properties of structured functions.
Now suppose that , and the claim has already been proven for smaller values of . If then the claim follows from the bound, so we may assume that , in which case we will establish
If we have , then
and the claim then follows by the induction hypothesis (concatenating and ). Similarly, if , then , and the claim follows from the triangle inequality. Thus we may assume that
Let . We can rewrite as
and by the divisor bound , and so by the triangle inequality and the Cauchy-Schwarz inequality
since the summand is only non-zero when is supported on an interval of length . This last expression may be rearranged as
We observe that is the sum of structured functions of modulus and complexity , each of which is the product of one of the components of of modulus and complexity for all . We can of course delete any components that vanish identically. Suppose that for one of these primes , one of the components of the function is equivalent to for some polynomial of degree at most . Then by Lemma 10, if is sufficiently large (larger than a fixed constant), either , or else is equivalent for some polynomial of degree at most , but by the hypothesis the latter case cannot occur since is non-vanishing and . Thus if we set to be the product of all the primes with this property, we see that .
Applying the induction hypothesis, we may thus bound
We may bound
The first term is dominated by the term appearing as the summand in (9), while the contribution of the second term may be bounded using another application of Lemma 5 of this previous post and the bound by
which is acceptable.
Remark 1 The above arguments relied on a q-version of the van der Corput A-process, and in the case of Dirichlet characters is essentially due to Graham and Ringrose (see also Heath-Brown). If we work with a class of structured functions that is closed under Fourier transforms (such as the trace weights), then the q-version of the van der Corput B-process also becomes available (in principle, at least), thus potentially giving a slightly larger range of “exponent pairs”; however this looks complicated to implement (the role of polynomial phases now needs to be replaced by a more complicated class that involves things like the Fourier transforms of polynomial phases, as well as their “antiderivatives”) and will likely only produce rather small improvements in the final numerology.
We isolate a special case of the above result:
Corollary 12 Let the notation and assumptions be as in Proposition 11 with , , and -densely divisible. Then for any , one has the bounds
The dependence on in the first bound can be improved, but we will not need this improvement here.
Proof: From the case of the above proposition we have
giving the first claim of the corollary.
Similarly, from the case of the above proposition we have
for any factorisation of . As is -densely divisible, we may select so that
and the second claim follows.
— 3. A two-dimensional exponential sum —
We now apply the above theory to obtain a new bound on a certain two-dimensional exponential sum that will show up in the Type I estimate.
Here the summations are implicitly restricted to those for which the denominator in the phase is non-zero. We also have the bound
The main term here is , which in certain regimes improves upon the bound of that one obtains by completing the sums in the variable but not exploiting any additional cancellation in the variable.
Proof: We first claim that it suffices to verify the proposition when . Indeed, if we set
(where one computes the reciprocal of inside ), we see that is -densely divisible (thanks to Lemma 7), squarefree, and of polynomial size, that , and that
By the inclusion-exclusion formula and divisor bound, it thus suffices to show that for all , one has
where is either of the two right-hand sides in the proposition, i.e. either
By the divisor bound, we see that there are pairs such that . Thus it will suffice to show that
Making the change of variables , and using the case of the proposition, we can bound the left-hand side by
and one verifies that these two quantities bound the two possible values of respectively.
Henceforth . Note that the above reduction also allows us to assume that has no prime factors less than a sufficiently large fixed constant to be chosen later. Our task is now to show that
From completion of sums we have
so it will suffice to show that
By the Chinese remainder theorem, this function factors as , where
and . Note that for any prime dividing (and thus larger than ), the rational function is not divisible by . From the Weil conjectures for curves this implies that . In fact, from Deligne’s theorem (and in particular the fact that cohomology groups of sheaves are again sheaves), we have the stronger assertion that is a sum of boundedly many trace weights at modulus with complexity in the sense of Example 4. (In the Grothendieck-Lefschetz trace formula, only the first cohomology is non-trivial; the second cohomology disappears because the rational function is not divisible by , and the zeroth cohomology disappears because the underlying curve is affine, although in any event the contribution of the zeroth cohomology could be absorbed into the term in (10).) By the divisor bound, this implies that is the sum of trace weights at modulus with complexity . We can of course delete any components that vanish identically.
We claim that for any dividing (and hence larger than ), none of the components of are equivalent to a quadratic phase . Assuming this claim for the moment, the required bound (10) then follows from Corollary 12. It thus suffices to verify the claim. If the claim failed, then we would have
for some algebraic integer , which is non-zero since is equivalent to and is non-zero. Since all non-zero algebraic integers have at least one Galois conjugate of modulus at least , it will suffice (for large enough) to establish that all Galois conjugates of the left-hand side of (27) are . In other words, it suffices to establish the bound
for all . Setting and and concatenating parameters, it suffices to show that
whenever and .
We now use a result of Hooley, which asserts that for any rational function of two variables and bounded degree, one has
is a geometrically generically irreducible curve (i.e. irreducible over an algebraic closure of ) and also that
is a (possibly reducible or empty) curve for any . We apply this result to the rational function
For any , it is clear that is not identically zero, so the second condition of Hooley is satisfied. It remains to verify the first. (Thanks to Brian Conrad for fixing some errors in the argument that follows.) Suppose that the claim failed, thus is reducible for generic , or equivalently that the polynomial
is reducible in . Being linear in , this polynomial is clearly irreducible in ; since does not lie in , it remains irreducible in the larger ring by Gauss’s lemma.
We now perform a technical reduction to deal with the problem that the field is not perfect. Since involves the nonzero term as its only -term, over it cannot be a constant multiple of a -power. Hence, if it is irreducible over the separable closure of then it remains irreducible over the perfect closure of , so it suffices to check irreducibility over the separable closure.
Assuming is reducible over the separable closure, then up to constant multipliers (i.e. multiples in ) its irreducible factors in must be Galois conjugate to each other with respect to . Thus, none of these factors can lie in or , as otherwise all the factors would and hence so would their product (a contradiction since ). Thus, the irreducible factorization over remains an irreducible factorization in and over . Since has nonzero constant term and degree at most in either or , this implies that the irreducible factors of in are linear in both and , thus
for some and . But is visibly constant, so all vanish and hence , an absurdity.
— 4. Type I estimate —
We begin the proof of Theorem 4, closely following the arguments from Section 5 of this previous post or Section 2 of this previous post. One difference however will be that we will not discard the averaging as we will need it near the end of the argument. Let be as in the theorem. We can restrict to the range
for some sufficiently slowly decaying , since otherwise we may use the Bombieri-Vinogradov theorem (Theorem 4 from this previous post). Thus, by dyadic decomposition, we need to show that
for any fixed and for any in the range
Let be a sufficiently small fixed exponent.
By Lemma 11 of this previous post, we know that for all in outside of a small number of exceptions, we have
Specifically, the number of exceptions in the interval is for any fixed . The contribution of the exceptional can be shown to be acceptable by Cauchy-Schwarz and trivial estimates (see Section 5 of this previous post), so we restrict attention to those for which (14) holds. In particular, as is restricted to be quadruply -densely divisible, we may factor
with coprime and square-free, with -densely divisible with , doubly -densely divisible, and
Here we use the easily verified fact that , and we have also used Lemma 7 to ensure that dense divisibility is essentially preserved when transferring a factor of from (namely, the portion of coming from primes up to ) to .
By dyadic decomposition, it thus suffices to show that
Fix . We abbreviate and by and respectively, thus our task is to show that
We now split the discrepancy
as the sum of the subdiscrepancies
In Section 5 of this previous post, it was established (using the Bombieri-Vinogradov theorem) that
It will suffice to prove the slightly stronger statement
for all coprime to , since if one then specialises to the case when and averages over all primitive we obtain (18) from the triangle inequality.
We use the dispersion method. We write the left-hand side of (19) as
for some bounded sequence . This expression may be rearranged as
where is subject to the same constraints as (thus and for ), and is some quantity that is independent of .
Observe that must be coprime to and coprime to , with , to have a non-zero contribution to (21). We then rearrange the left-hand side as
note that these inverses in the various rings , , are well-defined thanks to the coprimality hypotheses.
for some independent of , .
At this stage in previous posts we isolated the coprime case as the dominant case, using a controlled multiplicity hypothesis to deal with the non-coprime case. Here, we will carry the non-coprime case with us for a little longer so as not to rely on a controlled multiplicity hypothesis; this introduces some additional factors of into the analysis but they should be ignored on a first reading.
Let us first deal with the main term (23). The contribution of the coprime case does not depend on and can thus be absorbed into the term. Now we consider the contribution of the non-coprime case when . We may estimate the contribution of this case by
We may estimate by . We just estimate the contribution of , as the other case is treated similarly (after shifting by ). We rearrange this contribution as
The summation is . Evaluating the summations, we obtain a bound of
Since and , we have , and so we may evaluate the summation as
It remains to control (24). We may assume that , as the claim is trivial otherwise. It will suffice to obtain the bound
Using (25), it will suffice to show that
for each .
for each .
Henceforth we work with a single choice of . We pause to verify the relationship
As is -densely divisible, we may now factor where
Factoring out , we may then write where
By dyadic decomposition, it thus suffices to show that
whenever are such that
We rearrange this estimate as
for some bounded sequence which is only non-zero when
By Cauchy-Schwarz and crude estimates, it then suffices to show that
The contribution of the diagonal case is by the divisor bound, which is acceptable since . Thus it suffices to control the off-diagonal case .
Note that need to lie in for the summand to be non-vanishing. We use the following elementary lemma:
Proof: Setting , it suffices to show that
for each fixed . Since
it suffices to show that
for all coprime of polynomial size.
If divides both and , then for each dividing , must divide one of , , or . Thus we can factor and , , , , which implies that . For fixed , we see from the divisor bound that there are choices for . Fixing , we see that have magnitude , so there are possible pairs of whose difference is non-zero and divisible by . The claim then follows from the divisor bound.
From this lemma, we see that for each fixed choice of in the above sum, it suffices to show that
Thus far the arguments have been essentially identical to those in the previous post, except that we have retained the averaging (and crucially, this averaging is inside the absolute values rather than outside). We now exploit the doubly dense divisibility of to factor where
which are conditions which we will verify later. By dyadic decomposition, and the triangle inequality in , it thus suffices to show that
Note that if the factors are to be non-vanishing, , are to be -densely divisible, and so is -densely divisible as well thanks to Lemma 7.
We write the above estimate as
We now perform Weyl differencing. Set , then and we can rewrite
and so it suffices to show that
By Cauchy-Schwarz, it suffices to show that
We restrict and to individual residue classes and ; it then suffices to show that
From (26) we see that the quantity vanishes unless
is square-free, and in that case it takes the form
when restricted to , , where are quantities that may depend on but are independent of with
adopting the convention that vanishes when , and is a bounded quantity depending on but otherwise independent of . If we let be such that and , and let , we can simplify the above as
and note that
We thus have
It thus suffices to show that
The contribution of the diagonal case is , which is acceptable thanks to (29) (which implies ; we have a factor of to spare which we will simply discard). It thus suffices to control the off-diagonal case . It then suffices to show that
for each non-zero .
Performing a Taylor expansion, we can write
for any fixed , where
Absorbing the factor into , and taking large enough, it suffices to show that
for coefficient sequences which are smooth at scales respectively. But by applying Proposition 13, and making the substitutions , , we may bound the left-hand side by
Using the former bound when and the latter bound when , we obtain the upper bound of
and , it suffices to show that
Inserting these bounds and discarding the remaining powers of , we reduce to
We rearrange these as
Applying the bounds on from (30), these reduce to
The third bound follows since , and so may be dropped. We also recall the two bounds assumed from (29):
so the remaining three bounds may be rewritten as
Since , these three bounds reduce to
From (16) we have , so the third bound is automatic, and the other two bounds become
Since , these two bounds become
which we rearrange as
and the claim follows.