
The twin prime conjecture, still unsolved, asserts that there are infinitely many primes $p$ such that $p+2$ is also prime. A more precise form of this conjecture is (a special case of) the Hardy-Littlewood prime tuples conjecture, which asserts that

$$\sum_{n \leq x} \Lambda(n) \Lambda(n+2) = (2\Pi_2 + o(1)) x \qquad (1)$$

as $x \to \infty$, where $\Lambda$ is the von Mangoldt function and $\Pi_2 = 0.66016\dots$ is the twin prime constant

$$\Pi_2 := \prod_{p > 2} \left(1 - \frac{1}{(p-1)^2}\right).$$

Because $\Lambda$ is almost entirely supported on the primes, it is not difficult to see that (1) implies the twin prime conjecture.
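As a purely numerical illustration of (1) (which of course proves nothing about the asymptotic), here is a small Python sketch that tabulates $\Lambda$ by sieving, evaluates $\sum_{n \le x} \Lambda(n)\Lambda(n+2)$, and divides by $2\Pi_2 x$, with the product defining $\Pi_2$ truncated to primes up to a finite bound. The cutoffs `x = 10**5` and `10**6` are arbitrary choices, and the function names are illustrative only.

```python
import math


def primes_up_to(n_max):
    """Sieve of Eratosthenes: list of primes p <= n_max."""
    sieve = bytearray([1]) * (n_max + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n_max) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n_max + 1, p)))
    return [p for p in range(n_max + 1) if sieve[p]]


def von_mangoldt_table(n_max):
    """lam[n] = Lambda(n) for 0 <= n <= n_max: log p on powers of p, 0 elsewhere."""
    lam = [0.0] * (n_max + 1)
    for p in primes_up_to(n_max):
        q = p
        while q <= n_max:
            lam[q] = math.log(p)
            q *= p
    return lam


def twin_prime_constant(p_max):
    """Truncation of Pi_2 = prod_{p > 2} (1 - 1/(p-1)^2) to primes p <= p_max."""
    return math.prod(1.0 - 1.0 / (p - 1) ** 2 for p in primes_up_to(p_max) if p > 2)


def twin_sum(x):
    """Left-hand side of (1): sum_{n <= x} Lambda(n) Lambda(n+2)."""
    lam = von_mangoldt_table(x + 2)
    return sum(lam[n] * lam[n + 2] for n in range(1, x + 1))


if __name__ == "__main__":
    x = 10**5
    pi2 = twin_prime_constant(10**6)
    print(f"Pi_2 (truncated) = {pi2:.7f}")
    print(f"sum / (2 Pi_2 x) = {twin_sum(x) / (2 * pi2 * x):.4f}")
```

At this height one should expect the ratio to be close to, but not equal to, $1$; the convergence in (1) is slow.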

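One can give a heuristic justification of the asymptotic (1) (and hence the twin prime conjecture) via sieve theoretic methods. Recall that the von Mangoldt function can be decomposed as a Dirichlet convolution

$$\Lambda(n) = \sum_{d|n} \mu(d) \log \frac{n}{d}$$

where $\mu$ is the Möbius function. Because of this, we can rewrite the left-hand side of (1) as

$$\sum_{d \leq x} \mu(d) \sum_{n \leq x: d|n} \log \frac{n}{d}\, \Lambda(n+2). \qquad (2)$$

To compute this double sum, it is thus natural to consider sums such as

$$\sum_{n \leq x: d|n} \log \frac{n}{d}\, \Lambda(n+2)$$

or (to simplify things by removing the logarithm)

$$\sum_{n \leq x: d|n} \Lambda(n+2).$$

The prime number theorem in arithmetic progressions suggests that one has an asymptotic of the form

$$\sum_{n \leq x: d|n} \Lambda(n+2) = \frac{g(d)}{d} x + o\left(\frac{x}{d}\right) \qquad (3)$$

where $g$ is the multiplicative function with $g(d) = 0$ for $d$ even and

$$g(d) = \frac{d}{\phi(d)} = \prod_{p|d} \frac{p}{p-1}$$

for $d$ odd (here $\phi$ is the Euler totient function). Summing by parts, one then expects

$$\sum_{n \leq x: d|n} \log \frac{n}{d}\, \Lambda(n+2) \approx \frac{g(d)}{d} x \left(\log \frac{x}{d} - 1\right)$$

and so we heuristically have

$$\sum_{n \leq x} \Lambda(n) \Lambda(n+2) \approx x \sum_{d \leq x} \frac{\mu(d) g(d)}{d} \left(\log \frac{x}{d} - 1\right).$$

The Dirichlet series

$$\sum_{d=1}^\infty \frac{\mu(d) g(d)}{d^s}$$

has an Euler product factorisation

$$\sum_{d=1}^\infty \frac{\mu(d) g(d)}{d^s} = \prod_{p > 2} \left(1 - \frac{1}{(p-1) p^{s-1}}\right)$$

for $\Re s > 1$; comparing this with the Euler product factorisation

$$\zeta(s) = \prod_p \left(1 - \frac{1}{p^s}\right)^{-1}$$

for the Riemann zeta function, and recalling that $\zeta$ has a simple pole of residue $1$ at $s=1$, we see that

$$\sum_{d=1}^\infty \frac{\mu(d) g(d)}{d^s} = \frac{1}{\zeta(s)} \left(1 - \frac{1}{2^s}\right)^{-1} \prod_{p > 2} \frac{1 - \frac{1}{(p-1) p^{s-1}}}{1 - \frac{1}{p^s}}$$

(where the product over $p > 2$ converges absolutely near $s=1$, its factors being $1 + O(p^{-2})$) has a simple zero at $s=1$ with first derivative

$$2 \prod_{p > 2} \frac{1 - \frac{1}{p-1}}{1 - \frac{1}{p}} = 2 \prod_{p > 2} \left(1 - \frac{1}{(p-1)^2}\right) = 2\Pi_2.$$

From this and standard multiplicative number theory manipulations, one can calculate the asymptotic

$$\sum_{d \leq x} \frac{\mu(d) g(d)}{d} \left(\log \frac{x}{d} - 1\right) = 2\Pi_2 + o(1)$$

which concludes the heuristic justification of (1).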
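The final heuristic asymptotic can also be checked numerically. The sketch below (an illustration, not part of the argument) computes $\mu$ and the totient $\phi$ with a linear sieve and uses $\mu(d) g(d)/d = \mu(d)/\phi(d)$ for odd $d$, taking $g(d) = d/\phi(d)$ for odd $d$ and $g(d) = 0$ for even $d$ (the choice dictated by the prime number theorem in arithmetic progressions). It then compares $\sum_{d \leq x} \frac{\mu(d) g(d)}{d}(\log\frac{x}{d} - 1)$ with $2\Pi_2$; the cutoff and function names are illustrative.

```python
import math


def mobius_and_totient(n_max):
    """Linear sieve: return (mu, phi) with mu[n] = mu(n), phi[n] = phi(n) for 1 <= n <= n_max."""
    mu = [0] * (n_max + 1)
    phi = [0] * (n_max + 1)
    mu[1], phi[1] = 1, 1
    primes = []
    is_composite = bytearray(n_max + 1)
    for i in range(2, n_max + 1):
        if not is_composite[i]:
            primes.append(i)
            mu[i], phi[i] = -1, i - 1
        for p in primes:
            if i * p > n_max:
                break
            is_composite[i * p] = 1
            if i % p == 0:  # p^2 divides i * p
                mu[i * p] = 0
                phi[i * p] = phi[i] * p
                break
            mu[i * p] = -mu[i]
            phi[i * p] = phi[i] * (p - 1)
    return mu, phi


def heuristic_sum(x, mu, phi):
    """sum_{d <= x} mu(d) g(d)/d * (log(x/d) - 1), using mu(d) g(d)/d = mu(d)/phi(d) for odd d."""
    total = 0.0
    for d in range(1, x + 1, 2):  # g(d) = 0 for even d
        if mu[d]:
            total += mu[d] / phi[d] * (math.log(x / d) - 1.0)
    return total


def twin_prime_constant(phi, p_max):
    """Truncated Pi_2, reading off the primes p >= 3 as those with phi(p) = p - 1."""
    return math.prod(1.0 - 1.0 / (p - 1) ** 2 for p in range(3, p_max + 1) if phi[p] == p - 1)


if __name__ == "__main__":
    x = 10**5
    mu, phi = mobius_and_totient(x)
    print(heuristic_sum(x, mu, phi), 2 * twin_prime_constant(phi, x))
```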

What prevents us from making the above heuristic argument rigorous, and thus proving (1) and the twin prime conjecture? Note that the variable $d$ in (2) ranges all the way up to $x$. On the other hand, the prime number theorem in arithmetic progressions (3) is not expected to hold for $d$ anywhere that large (for instance, the left-hand side of (3) vanishes as soon as $d$ exceeds $x$). The best unconditional result known of the type (3) is the Siegel-Walfisz theorem, which allows $d$ to be as large as $\log^{O(1)} x$. Even the powerful generalised Riemann hypothesis (GRH) only lets one prove an estimate of the form (3) for $d$ up to about $x^{1/2-o(1)}$.

However, because of the averaging effect of the summation in $d$ in (2), we don’t need the asymptotic (3) to be true for *all* $d$ in a particular range; having it true for *almost all* $d$ in that range would suffice. Here the situation is much better; the celebrated Bombieri-Vinogradov theorem (sometimes known as “GRH on the average”) implies, roughly speaking, that the approximation (3) is valid for *almost all* $d \leq x^{1/2-\varepsilon}$ for any fixed $\varepsilon > 0$. While this is not enough to control (2) or (1), the Bombieri-Vinogradov theorem can at least be used to control variants of (1) such as

$$\sum_{n \leq x} \left(\sum_{d|n} \lambda_d\right) \Lambda(n+2)$$

for various sieve weights $\lambda_d$ whose associated divisor function $\sum_{d|n} \lambda_d$ is supposed to approximate the von Mangoldt function $\Lambda$, although that theorem only lets one do this when the weights $\lambda_d$ are supported on the range $d \leq x^{1/2-\varepsilon}$. This is still enough to obtain some partial results towards (1); for instance, by selecting weights according to the Selberg sieve, one can use the Bombieri-Vinogradov theorem to establish the upper bound

$$\sum_{n \leq x} \Lambda(n) \Lambda(n+2) \leq (4 + o(1))\, 2\Pi_2 x \qquad (4)$$

which is off from (1) by a factor of about $4$. See for instance this blog post for details.

It has been difficult to improve upon the Bombieri-Vinogradov theorem in its full generality, although there are various improvements to certain restricted versions of the Bombieri-Vinogradov theorem, for instance in the famous work of Zhang on bounded gaps between primes. Nevertheless, it is believed that the Elliott-Halberstam conjecture (EH) holds, which roughly speaking would mean that (3) now holds for almost all $d \leq x^{1-\varepsilon}$ for any fixed $\varepsilon > 0$. (Unfortunately, the $x^{-\varepsilon}$ factor cannot be removed, as investigated in a series of papers by Friedlander, Granville, and also Hildebrand and Maier.) This comes tantalisingly close to having enough distribution to control all of (1). Unfortunately, it still falls short. Using this conjecture in place of the Bombieri-Vinogradov theorem leads to various improvements to sieve theoretic bounds; for instance, the factor of $4$ in (4) can now be improved to $2$.

In two papers from the 1970s (which can be found online here and here respectively, the latter starting on page 255 of the pdf), Bombieri developed what is now known as the *Bombieri asymptotic sieve* to clarify the situation more precisely. First, he showed that on the Elliott-Halberstam conjecture, while one still could not establish the asymptotic (1), one could prove the generalised asymptotic

$$\sum_{n \leq x} \Lambda_k(n) \Lambda(n+2) = (2\Pi_2 k + o(1))\, x \log^{k-1} x \qquad (5)$$

for all natural numbers $k \geq 2$, where the generalised von Mangoldt functions $\Lambda_k$ are defined by the formula

$$\Lambda_k(n) := \sum_{d|n} \mu(d) \log^k \frac{n}{d}.$$

These functions behave like the von Mangoldt function, but are concentrated on $k$-almost primes (numbers with at most $k$ prime factors) rather than primes. The right-hand side of (5) corresponds to what one would expect if one ran the same heuristics used to justify (1). Sadly, the $k=1$ case of (5), which is just (1), is just barely excluded from Bombieri’s analysis.
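To get a feel for the generalised von Mangoldt functions $\Lambda_k(n) = \sum_{d|n}\mu(d)\log^k\frac{n}{d}$, the following Python sketch computes them with a divisor sieve and checks three standard facts: $\Lambda_1 = \Lambda$; $\Lambda_2$ vanishes on numbers with at least three distinct prime factors (so it is concentrated on $2$-almost primes); and Selberg's estimate $\sum_{n \le x}\Lambda_2(n) = 2x\log x + O(x)$. The secondary term $-2(1+\gamma)x$ used below (with $\gamma$ Euler's constant) comes from the residue of $\frac{\zeta''(s)}{\zeta(s)}\frac{x^s}{s}$ at $s=1$. The names, cutoffs and tolerances are illustrative choices.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant gamma


def mobius(n_max):
    """mu[n] = mu(n) for 1 <= n <= n_max (mu[0] is an unused placeholder)."""
    mu = [1] * (n_max + 1)
    mu[0] = 0
    is_composite = bytearray(n_max + 1)
    for p in range(2, n_max + 1):
        if is_composite[p]:
            continue
        for m in range(2 * p, n_max + 1, p):
            is_composite[m] = 1
        for m in range(p, n_max + 1, p):
            mu[m] = -mu[m]  # one more prime factor
        for m in range(p * p, n_max + 1, p * p):
            mu[m] = 0  # not squarefree
    return mu


def generalised_von_mangoldt(k, n_max, mu):
    """Lambda_k(n) = sum_{d | n} mu(d) log^k(n/d) for 0 <= n <= n_max."""
    lam = [0.0] * (n_max + 1)
    for d in range(1, n_max + 1):
        if mu[d]:
            for m in range(d, n_max + 1, d):
                lam[m] += mu[d] * math.log(m // d) ** k
    return lam


def distinct_prime_factors(n):
    """Distinct prime factors of n >= 1, by trial division."""
    ps, p = [], 2
    while p * p <= n:
        if n % p == 0:
            ps.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        ps.append(n)
    return ps


def selberg_main_term(x):
    """2 x log x - 2 (1 + gamma) x: residue prediction for sum_{n <= x} Lambda_2(n)."""
    return 2 * x * math.log(x) - 2 * (1 + EULER_GAMMA) * x


if __name__ == "__main__":
    N = 10**5
    lam2 = generalised_von_mangoldt(2, N, mobius(N))
    print(sum(lam2) / selberg_main_term(N))
```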

More generally, on the assumption of EH, the Bombieri asymptotic sieve provides the asymptotic

for any fixed and any tuple of natural numbers other than , where

is a further generalisation of the von Mangoldt function (now concentrated on -almost primes). By combining these asymptotics with some elementary identities involving the , together with the Weierstrass approximation theorem, Bombieri was able to control a wide family of sums including (1), except for one undetermined scalar . Namely, he was able to show (again on EH) that for any fixed and any continuous function on the simplex that had suitable vanishing at the boundary, the sum

when was even, where the integral on is with respect to the measure (this is Dirac measure in the case ). In particular, we have

and the twin prime conjecture would be proved if one could show that is bounded away from zero, while (1) is equivalent to the assertion that is equal to . Unfortunately, no additional bound beyond the inequalities provided by the Bombieri asymptotic sieve is known, even if one assumes all major conjectures in number theory other than the prime tuples conjecture and its variants (e.g. GRH, GEH, GUE, abc, Chowla, …).

To put it another way, the Bombieri asymptotic sieve is able (on EH) to compute asymptotics for sums

without needing to know the unknown scalar , when is a function supported on almost primes of the form

for and some fixed , with vanishing elsewhere and for some continuous (symmetric) functions obeying some vanishing at the boundary, so long as the parity condition

is obeyed (informally: gives the same weight to products of an odd number of primes as to products of an even number of primes, or to put it another way, is asymptotically orthogonal to the Möbius function ). But when violates the parity condition, the asymptotic involves the unknown . This scalar thus embodies the “parity problem” for the twin prime conjecture (discussed in these previous blog posts).

Because the obstruction to the parity problem is only one-dimensional (on EH), one can replace any parity-violating weight (such as ) with any other parity-violating weight and obtain a logically equivalent estimate. For instance, to prove the twin prime conjecture on EH, it would suffice to show that

for some fixed , or equivalently that there are solutions to the equation in primes with and . (In some cases, this sort of reduction can also be made using other sieves than the Bombieri asymptotic sieve, as was observed by Ng.) As another example, the Bombieri asymptotic sieve can be used to show that the asymptotic (1) is equivalent to the asymptotic

where is the set of numbers that are *rough* in the sense that they have no prime factors less than for some fixed (the function clearly correlates with and so must violate the parity condition). One can replace with similar sieve weights (e.g. a Selberg sieve) that concentrate on almost primes if desired.

As it turns out, if one is willing to strengthen the assumption of the Elliott-Halberstam (EH) conjecture to the assumption of the *generalised Elliott-Halberstam (GEH) conjecture* (as formulated for instance in Claim 2.6 of the Polymath8b paper), one can also swap the factor in the above asymptotics with other parity-violating weights and obtain a logically equivalent estimate, as the Bombieri asymptotic sieve also applies to weights such as under the assumption of GEH. For instance, on GEH one can use two such applications of the Bombieri asymptotic sieve to show that the twin prime conjecture would follow if one could show that there are solutions to the equation

in primes with and , for some . Similarly, on GEH the asymptotic (1) is equivalent to the asymptotic

for some fixed , and similarly with replaced by other sieves. This form of the quantitative twin primes conjecture is appealingly similar to the (special case)

of the Chowla conjecture, for which there has been some recent progress (discussed for instance in these recent posts). Informally, the Bombieri asymptotic sieve lets us (on GEH) view the twin prime conjecture as a sort of Chowla conjecture restricted to almost primes. Unfortunately, the recent progress on the Chowla conjecture relies heavily on the multiplicativity of at small primes, which is completely destroyed by inserting a weight such as , so this does not yet yield a viable path towards the twin prime conjecture even assuming GEH. Still, the similarity is striking, and one can hope that further ways to attack the Chowla conjecture may emerge that could impact the twin prime conjecture. (Alternatively, if one assumes a sufficiently optimistic version of the GEH, one could perhaps relax the notion of “almost prime” to the extent that one could start usefully using multiplicativity at smallish primes, though this seems rather wishful at present, particularly since the most optimistic versions of GEH are known to be false.)

The Bombieri asymptotic sieve is already well explained in the original two papers of Bombieri; there is also a slightly different treatment of the sieve by Friedlander and Iwaniec, as well as a simplified version in the book of Friedlander and Iwaniec (in which the distribution hypothesis is strengthened in order to shorten the arguments). I’ve decided, though, to write up my own notes on the sieve below the fold; this is primarily for my own benefit, but may be useful to some readers also. I largely follow the treatment of Bombieri, with the one idiosyncratic twist of replacing the usual “elementary” Selberg sieve with the “analytic” Selberg sieve used in particular in many of the breakthrough works in small gaps between primes; I prefer working with the latter due to its Fourier-analytic flavour.

** — 1. Controlling generalised von Mangoldt sums — **

To prove (5), we shall first generalise it, by replacing the sequence by a more general sequence obeying the following axioms:

- (i) (Non-negativity) One has for all .
- (ii) (Crude size bound) One has for all , where is the divisor function.
- (iii) (Size) We have for some constant .
- (iv) (Elliott-Halberstam type conjecture) For any , one has
where is a multiplicative function with for all primes and .

These axioms are a little bit stronger than what is actually needed to make the Bombieri asymptotic sieve work, but we will not attempt to work with the weakest possible axioms here.

We introduce the function

which is analytic for ; in particular it can be evaluated at to yield

There are two model examples of data to keep in mind. The first, discussed in the introduction, is when , then and is as in the introduction; one of course needs EH to justify axiom (iv) in this case. The other is when , in which case and for all . We will later take advantage of the second example to avoid doing some (routine, but messy) main term computations.

The main result of this section is then

Theorem 1. Let be as above. Let be a tuple of natural numbers (independent of ) that is not equal to . Then one has the asymptotic

as , where .

Note that this recovers (5) (on EH) as a special case.

We now begin the proof of this theorem. Henceforth we allow implied constants in the or notation to depend on and .

It will be convenient to replace the range by a shorter range by the following standard localisation trick. Let be a large quantity depending on to be chosen later, and let denote the interval . We will show the estimate

from which the original claim follows by a routine summation argument. Observe from axiom (iv) and the triangle inequality that

for any .

Write for the logarithm function , thus for any . Without loss of generality we may assume that ; we then factor , where

This function is just when . When the function is more complicated, but we at least have the following crude bound:

*Proof:* We induct on . The case is obvious, so suppose and the claim has already been proven for . Since , we see from induction hypothesis and the triangle inequality that

Since by Möbius inversion, the claim follows.

We can write

In the region , we have . Thus

for . The contribution of the error term to (10) is easily seen to be negligible if is large enough, so we may freely replace with with little difficulty.

If we insert this replacement directly into the left-hand side of (10) and rearrange, we get

We can’t quite control this using axiom (iv) because the range of is a bit too big, as explained in the introduction. So let us introduce a truncated function

where is a small quantity to be chosen later, and is a smooth function that equals on and equals on . Suppose one could establish the following two estimates for any fixed :

where is a quantity that depends on but not on . Then on combining the two estimates we would have

One could in principle compute explicitly from the proof of (13), but one can avoid doing so by the following comparison trick. In the special case , standard multiplicative number theory (noting that the Dirichlet series has a pole of order at , with top Laurent coefficient ) gives the asymptotic

which when compared with (14) for (recalling that in this case) gives the formula

Inserting this back into (14) and recalling that can be made arbitrarily small, we obtain (10).

As it turns out, the estimate (13) is easy to establish, but the estimate (12) is not, roughly speaking because the typical number in has too many divisors in the range , each of which gives a contribution to the error term. (In the book of Friedlander and Iwaniec, the estimate (12) is established anyway, but only after assuming a stronger version of (iv), roughly speaking in which is allowed to be as large as .) To resolve this issue, we will insert a preliminary sieve that will remove most of the potential divisors in the range (leaving only about such divisors on the average for typical ), making the analogue of (12) easier to prove (at the cost of making the analogue of (13) more difficult). Namely, if one can find a function for which one has the estimates

for some quantity that depends on but not on , then by repeating the previous arguments we will again be able to establish (10).

The key estimate is (16). As we shall see, when comparing with , the weight will cost us a factor of , but the term in the definitions of and will recover a factor of , which will give the desired bound since we are assuming .

One has some flexibility in how to select the weight : basically any standard sieve that uses divisors of size at most to localise (at least approximately) to numbers that are rough in the sense that they have no (or at least very few) factors less than , will do. We will use the analytic Selberg sieve choice

where is a smooth function supported on that equals on .
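For concreteness, here is a small Python sketch of an analytic Selberg sieve weight of the shape $\nu(n) = \left(\sum_{d|n}\mu(d)\,\eta\left(\frac{\log d}{\log R}\right)\right)^2$, with a level parameter $R$ and a $C^\infty$ cutoff $\eta$ equal to $1$ on $(-\infty,0]$ and to $0$ on $[1,\infty)$. The specific $\eta$ (built from $e^{-1/u}$), the name `R`, and the numerical values are illustrative assumptions, not the choices made in the argument. The sketch shows the localisation property: $\nu(n) = 1$ whenever $n$ has no prime factor less than $R$, while $\nu$ is tiny on numbers with a very small prime factor. It also gives a rough comparison of the mean of $\nu$ with the classical prediction $\frac{1}{\log R}\int_0^1 \eta'(t)^2\,dt$.

```python
import math
from itertools import combinations


def eta(t):
    """A C^infinity cutoff: 1 for t <= 0 and 0 for t >= 1 (illustrative choice)."""
    def psi(u):
        return math.exp(-1.0 / u) if u > 0 else 0.0
    a, b = psi(1.0 - t), psi(t)
    return a / (a + b)


def distinct_primes(n):
    """Distinct prime factors of n >= 1, by trial division."""
    ps, p = [], 2
    while p * p <= n:
        if n % p == 0:
            ps.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        ps.append(n)
    return ps


def selberg_nu(n, R):
    """nu(n) = (sum_{d | n} mu(d) eta(log d / log R))^2; only squarefree d contribute."""
    ps = distinct_primes(n)
    total = 0.0
    for r in range(len(ps) + 1):
        for subset in combinations(ps, r):
            d = math.prod(subset)  # squarefree divisor of n, with mu(d) = (-1)^r
            total += (-1) ** r * eta(math.log(d) / math.log(R))
    return total * total


def eta_prime_energy(steps=10**4):
    """Approximate int_0^1 eta'(t)^2 dt by central differences and the midpoint rule."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        slope = (eta(t + h / 2) - eta(t - h / 2)) / h
        total += slope * slope * h
    return total


if __name__ == "__main__":
    x, R = 10**4, 50.0
    mean = sum(selberg_nu(n, R) for n in range(1, x + 1)) / x
    print(f"mean of nu: {mean:.4f}; prediction: {eta_prime_energy() / math.log(R):.4f}")
```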

It remains to establish the bounds (15), (16), (17). To warm up and introduce the various methods needed, we begin with the standard bound

where denotes the derivative of . Note the loss of that had previously been pointed out. In the arguments that follow, I will be a little brief with the details, as they are standard (see e.g. this previous post).

We now prove (19). The left-hand side can be expanded as

where denotes the least common multiple of and . From the support of we see that the summand is only non-vanishing when . We now use axiom (iv) and split the left-hand side into a main term

and an error term that is at most

From axiom (ii) and elementary multiplicative number theory, we have the bound

so from axiom (iv) and Cauchy-Schwarz we see that the error term (20) is acceptable. Thus it will suffice to establish the bound

The summand here is almost, but not quite, multiplicative in . To make it genuinely multiplicative, we perform a (shifted) Fourier expansion

for some rapidly decreasing function (essentially the Fourier transform of ). Thus

and so the left-hand side of (21) can be rearranged using Fubini’s theorem as

We can factorise as an Euler product:

Taking absolute values and using Mertens’ theorem leads to the crude bound

which when combined with the rapid decrease of , allows us to restrict the region of integration in (23) to the square (say) with negligible error. Next, we use the Euler product

for to factorise

where

For with nonnegative real part, one has

and so by the Weierstrass $M$-test, is continuous at . Since

we thus have

Also, since has a pole of order at with residue , we have

and thus

The quantity (23) can thus be written, up to errors of , as

Using the rapid decrease of , we may remove the restriction on , and it will now suffice to prove the identity

But on differentiating and then squaring (22) we have

and the claim follows by integrating in from zero to infinity (noting that vanishes for ).

We have the following variant of (19):

for any . We also have the variant

If in addition has no prime factors less than for some fixed , one has

Roughly speaking, the above estimates assert that is concentrated on those numbers with no prime factors much less than , but factors without such small prime divisors occur with about the same relative density as they do in the integers.

*Proof:* The left-hand side of (24) can be expanded as

If we define

then the previous expression can be written as

while one has

which gives (25) from Axiom (iv). To prove (24), it now suffices to show that

Arguing as before, the left-hand side is

where

From Mertens’ theorem we have

when , so the contribution of the terms where can be absorbed into the error (after increasing that error slightly). For the remaining contributions, we see that

where if does not divide , and

if divides times for some . In the latter case, Taylor expansion gives the bounds

and the claim (28) follows. When and we have

and (27) follows by repeating the previous calculations. Finally, (26) is proven similarly to (24) (using in place of ).
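The heuristic that numbers free of small prime factors occur with the same relative density as in the integers, and Mertens' theorem used in the proof above, are easy to illustrate numerically. In the Python sketch below (illustrative names and cutoffs), `rough_count(x, w)` counts the $n \le x$ with no prime factor less than $w$; inclusion-exclusion gives this count as $x\prod_{p<w}(1-1/p)$ up to an error smaller than $2^{\pi(w-1)}$, and Mertens' third theorem says that $\prod_{p\le y}(1-1/p) \sim e^{-\gamma}/\log y$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant gamma


def primes_up_to(n_max):
    """Sieve of Eratosthenes: list of primes p <= n_max."""
    sieve = bytearray([1]) * (n_max + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n_max) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n_max + 1, p)))
    return [p for p in range(n_max + 1) if sieve[p]]


def rough_count(x, w):
    """Number of 1 <= n <= x having no prime factor less than w."""
    keep = bytearray([1]) * (x + 1)
    keep[0] = 0
    for p in primes_up_to(w - 1):
        keep[p::p] = bytearray(len(range(p, x + 1, p)))  # strike out multiples of p
    return sum(keep)


def mertens_ratio(y):
    """e^gamma * log y * prod_{p <= y} (1 - 1/p), which tends to 1 (Mertens' third theorem)."""
    product = math.prod(1.0 - 1.0 / p for p in primes_up_to(y))
    return math.exp(EULER_GAMMA) * math.log(y) * product


if __name__ == "__main__":
    x, w = 10**6, 14
    density = math.prod(1.0 - 1.0 / p for p in primes_up_to(w - 1))
    print(rough_count(x, w), x * density)
    print(mertens_ratio(10**5))
```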

Now we can prove (15), (16), (17). We begin with (15). Using the Leibniz rule applied to the identity and using and Möbius inversion (and the associativity and commutativity of Dirichlet convolution) we see that

Next, by applying the Leibniz rule to for some and using (29) we see that

and hence we have the recursive identity

In particular, from induction we see that is supported on numbers with at most distinct prime factors, and hence is supported on numbers with at most distinct prime factors. In particular, from (18) we see that on the support of . Thus it will suffice to show that

If and , then has at most distinct prime factors , with . If we factor , where is the contribution of those with , and is the contribution of those with , then at least one of the following two statements hold:

- (a) (and hence ) is divisible by a square number of size at least .
- (b) .

The contribution of case (a) is easily seen to be acceptable by axiom (ii). For case (b), we observe from (30) and induction that

and so it will suffice to show that

where ranges over numbers bounded by with at most distinct prime factors, the smallest of which is at most , and consists of those numbers with no prime factor less than or equal to . Applying (26) (with replaced by ) gives the bound

so by (25) it suffices to show that

subject to the same constraints on as before. The contribution of those with distinct prime factors can be bounded by

applying Mertens’ theorem and summing over , one obtains the claim.
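The recursive identity used above is not reproduced in this text, but a standard identity of this type is $\Lambda_{k+1} = L\Lambda_k + \Lambda * \Lambda_k$, where $L(n) = \log n$ acts by pointwise multiplication. It follows from the Leibniz rule $L(f*g) = (Lf)*g + f*(Lg)$ applied to $\Lambda_k = \mu * L^k$, together with $L\mu = -\mu * \Lambda$ (obtained by applying the Leibniz rule to $\mu * 1 = \delta$). The Python sketch below checks this identity numerically for small $k$; that it coincides exactly with (30) is an assumption on our part, and the helper names are illustrative.

```python
import math


def mobius(n_max):
    """mu[n] = mu(n) for 1 <= n <= n_max (mu[0] is an unused placeholder)."""
    mu = [1] * (n_max + 1)
    mu[0] = 0
    is_composite = bytearray(n_max + 1)
    for p in range(2, n_max + 1):
        if is_composite[p]:
            continue
        for m in range(2 * p, n_max + 1, p):
            is_composite[m] = 1
        for m in range(p, n_max + 1, p):
            mu[m] = -mu[m]
        for m in range(p * p, n_max + 1, p * p):
            mu[m] = 0
    return mu


def lambda_k_table(k, n_max, mu):
    """Lambda_k(n) = sum_{d | n} mu(d) log^k(n/d) for 0 <= n <= n_max."""
    lam = [0.0] * (n_max + 1)
    for d in range(1, n_max + 1):
        if mu[d]:
            for m in range(d, n_max + 1, d):
                lam[m] += mu[d] * math.log(m // d) ** k
    return lam


def dirichlet_convolution(f, g, n_max):
    """(f * g)(n) = sum_{ab = n} f(a) g(b) for 1 <= n <= n_max."""
    h = [0.0] * (n_max + 1)
    for a in range(1, n_max + 1):
        if f[a]:
            for b in range(1, n_max // a + 1):
                h[a * b] += f[a] * g[b]
    return h


def recursion_defect(k, n_max):
    """max over n <= n_max of |Lambda_{k+1}(n) - (log n * Lambda_k(n) + (Lambda * Lambda_k)(n))|."""
    mu = mobius(n_max)
    lam1 = lambda_k_table(1, n_max, mu)
    lam_k = lambda_k_table(k, n_max, mu)
    lam_k1 = lambda_k_table(k + 1, n_max, mu)
    conv = dirichlet_convolution(lam1, lam_k, n_max)
    return max(abs(lam_k1[n] - (math.log(n) * lam_k[n] + conv[n])) for n in range(1, n_max + 1))


if __name__ == "__main__":
    for k in (1, 2):
        print(k, recursion_defect(k, 3000))
```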

Now we show (16). As discussed previously in this section, we can replace by with negligible error. Comparing this with (16) and (11), we see that it suffices to show that

From the support of , the summand on the left-hand side is only non-zero when , which makes , where we use the crucial hypothesis to gain enough powers of to make the argument here work. Applying Lemma 2, we reduce to showing that

We can make the change of variables to flip the sum

and then swap the sums to reduce to showing that

By Lemma 3, it suffices to show that

To prove this, we use the Rankin trick, bounding the implied weight by . We can then bound the left-hand side by the Euler product

which can be bounded by

and the claim follows from Mertens’ theorem.
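Rankin's trick is easy to see in action on a model problem (a generic illustration, not the specific sum treated above). To bound a tail $\sum_{n > x:\ p|n \Rightarrow p \leq y} \frac{1}{n}$ over $y$-smooth numbers, one inserts the weight $(n/x)^\varepsilon \geq 1$ and extends to all $y$-smooth $n$, giving the Euler product bound $x^{-\varepsilon}\prod_{p\leq y}(1-p^{\varepsilon-1})^{-1}$; with $\varepsilon = 1/\log y$, Mertens' theorem shows this is $\ll \log y \cdot e^{-\log x/\log y}$. The Python sketch below compares a finite truncation of the tail (which can only be smaller than the full tail) with this bound; the parameters are arbitrary.

```python
import math


def primes_up_to(n_max):
    """Sieve of Eratosthenes: list of primes p <= n_max."""
    sieve = bytearray([1]) * (n_max + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n_max) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n_max + 1, p)))
    return [p for p in range(n_max + 1) if sieve[p]]


def smooth_numbers(y, n_max):
    """All 1 <= n <= n_max whose prime factors are all <= y."""
    out = [1]
    for p in primes_up_to(y):
        new = []
        for m in out:
            q = m * p
            while q <= n_max:  # multiply by p, p^2, ... while staying below n_max
                new.append(q)
                q *= p
        out.extend(new)
    return out


def rankin_bound(x, y, eps):
    """x^(-eps) * prod_{p <= y} (1 - p^(eps - 1))^(-1); valid for 0 < eps < 1."""
    bound = x ** (-eps)
    for p in primes_up_to(y):
        bound /= 1.0 - p ** (eps - 1.0)
    return bound


if __name__ == "__main__":
    x, y, n_max = 10**4, 50, 10**6
    eps = 1.0 / math.log(y)
    tail = sum(1.0 / n for n in smooth_numbers(y, n_max) if n > x)
    print(tail, rankin_bound(x, y, eps))
```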

Finally, we show (17). By (11), the left-hand side expands as

We let be a small constant to be chosen later. We divide the outer sum into two ranges, depending on whether only has prime factors greater than or not. In the former case, we can apply (27) to write this contribution as

plus a negligible error, where the is implicitly restricted to numbers with all prime factors greater than . The main term is messy, but it is of the required form up to an acceptable error, so there is no need to compute it any further. It remains to consider those that have at least one prime factor less than . Here we use (24) instead of (27) as well as Lemma 3 to dominate this contribution by

up to negligible errors, where is now restricted to have at least one prime factor less than . This makes at least one of the factors to be at most . A routine application of Rankin’s trick shows that

and so the total contribution of this case is . Since can be made arbitrarily small, (17) follows.

** — 2. Weierstrass approximation — **

Having proved Theorem 1, we now take linear combinations of this theorem, combined with the Weierstrass approximation theorem, to give the asymptotics (7), (8) described in the introduction.

Let , , , be as in that theorem. It will be convenient to normalise the weights by to make their mean value comparable to . From Theorem 1 and summation by parts we have

whenever does not consist entirely of ones.

We now take a closer look at what happens when does consist entirely of ones. Let denote the -tuple . Convolving the case of (30) with copies of for some and using the Leibniz rule, we see that

and hence

Multiplying by and summing over , and using (31) to control the term, one has

If we define (up to an error of ) by the formula

then an induction shows that

for odd , and

for even . In particular, after adjusting by if necessary, we have since the left-hand sides are non-negative.

If we now define the comparison sequence , standard multiplicative number theory shows that the above estimates also hold when is replaced by ; thus

for both odd and even . The bound (31) also holds for when does not consist entirely of ones, and hence

for any fixed (which may or may not consist entirely of ones).

Next, from induction (on ), the Leibniz rule, and (30), we see that for any and , , the function

is a finite linear combination of functions of the form for tuples that may possibly consist entirely of ones. We thus have

whenever is one of these functions (32). Specialising to the case , we thus have

where . The contribution of those that are powers of primes can be easily seen to be negligible, leading to

where now . The contribution of the case where two of the primes agree can also be seen to be negligible, as can the error when replacing with , and then by symmetry

By linearity, this implies that

for any polynomial that vanishes on the coordinate hyperplanes . The right-hand side can also be evaluated by Mertens’ theorem as

when is odd and

when is even. Using the Weierstrass approximation theorem, we then have

for any continuous function that is compactly supported in the interior of . Computing the right-hand side using Mertens’ theorem as before, we obtain the claimed asymptotics (7), (8).
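The Weierstrass approximation step can be made concrete with Bernstein polynomials. The sketch below is only a one-variable analogue (the argument above works on a simplex in several variables): it approximates a continuous tent function compactly supported in $(0,1)$ by its Bernstein polynomials $B_n f(t) = \sum_{k=0}^n f(k/n)\binom{n}{k}t^k(1-t)^{n-k}$. Since $f(0) = f(1) = 0$, each $B_n f$ vanishes at the endpoints, which mirrors the use of polynomials vanishing on the boundary. The tent function and the degrees are arbitrary choices.

```python
import math


def bernstein(f, n, t):
    """Degree-n Bernstein polynomial B_n f evaluated at t in [0, 1]."""
    return sum(f(k / n) * math.comb(n, k) * t**k * (1.0 - t) ** (n - k) for k in range(n + 1))


def tent(t):
    """Continuous, compactly supported in (0.2, 0.8), with peak value 1 at t = 0.5."""
    return max(0.0, 1.0 - abs(t - 0.5) / 0.3)


def sup_error(f, n, grid=201):
    """Maximum of |B_n f - f| over an equally spaced grid in [0, 1]."""
    pts = [i / (grid - 1) for i in range(grid)]
    return max(abs(bernstein(f, n, t) - f(t)) for t in pts)


if __name__ == "__main__":
    for n in (50, 100, 200, 400):
        print(n, round(sup_error(tent, n), 4))
```

The error decays only like $n^{-1/2}$ here, reflecting the kinks of the tent; this slow rate is harmless for a purely qualitative limiting argument.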

Remark 4. The Bombieri asymptotic sieve has to use the full power of EH (or GEH); there are constructions due to Ford that show that if one only has a distributional hypothesis up to for some fixed constant , then the asymptotics of sums such as (5), or more generally (9), are not determined by a single scalar parameter , but can also vary in other ways as well. Thus the Bombieri asymptotic sieve really is asymptotic; in order to get type error terms one needs the level of distribution to be asymptotically equal to as . Related to this, the quantitative decay of the error terms in the Bombieri asymptotic sieve is extremely poor; in particular, it depends on the dependence of the implied constant in axiom (iv) on the parameters , for which there is no consensus on what one should conjecturally expect.

I’ve just uploaded to the arXiv my paper “Equivalence of the logarithmically averaged Chowla and Sarnak conjectures”, submitted to the Festschrift “Number Theory – Diophantine problems, uniform distribution and applications” in honour of Robert F. Tichy. This paper is a spinoff of my previous paper establishing a logarithmically averaged version of the Chowla (and Elliott) conjectures in the two-point case. In that paper, the estimate

$$\sum_{n \leq x} \frac{\lambda(n) \lambda(n+h)}{n} = o(\log x)$$

as $x \to \infty$ was demonstrated, where $h$ was any positive integer and $\lambda$ denoted the Liouville function. The proof proceeded using a method I call the “entropy decrement argument”, which ultimately reduced matters to establishing a bound of the form

whenever was a slowly growing function of . This was in turn established in a previous paper of Matomaki, Radziwill, and myself, using the recent breakthrough of Matomaki and Radziwill.
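The logarithmically averaged correlations are easy to compute numerically, although no finite computation says anything about an $o(\log x)$ bound. The Python sketch below tabulates the Liouville function $\lambda(n) = (-1)^{\Omega(n)}$ with a smallest-prime-factor sieve and evaluates $\frac{1}{\log x}\sum_{n\le x}\frac{\lambda(n+h_1)\cdots\lambda(n+h_k)}{n}$ for a tuple of non-negative shifts; the function names and the cutoff are illustrative.

```python
import math


def liouville_table(n_max):
    """lam[n] = (-1)^Omega(n) for 1 <= n <= n_max, via smallest prime factors."""
    spf = list(range(n_max + 1))
    for p in range(2, math.isqrt(n_max) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, n_max + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (n_max + 1)
    if n_max >= 1:
        lam[1] = 1
    for n in range(2, n_max + 1):
        lam[n] = -lam[n // spf[n]]  # Omega(n) = Omega(n / p) + 1
    return lam


def log_averaged_correlation(shifts, x, lam):
    """(1/log x) * sum_{n <= x} lam(n + h_1) ... lam(n + h_k) / n, for shifts h_j >= 0."""
    if min(shifts) < 0 or x + max(shifts) >= len(lam):
        raise ValueError("need non-negative shifts and a table covering x + max(shifts)")
    total = 0.0
    for n in range(1, x + 1):
        prod = 1
        for h in shifts:
            prod *= lam[n + h]
        total += prod / n
    return total / math.log(x)


if __name__ == "__main__":
    x = 10**5
    lam = liouville_table(x + 10)
    for shifts in [(0, 1), (0, 2), (0, 1, 2)]:
        print(shifts, round(log_averaged_correlation(shifts, x, lam), 4))
```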

It is natural to see to what extent the arguments can be adapted to attack the higher-point cases of the logarithmically averaged Chowla conjecture (ignoring for this post the more general Elliott conjecture for other bounded multiplicative functions than the Liouville function). That is to say, one would like to prove that

$$\sum_{n \leq x} \frac{\lambda(n+h_1) \cdots \lambda(n+h_k)}{n} = o(\log x)$$

as $x \to \infty$ for any fixed distinct integers $h_1, \dots, h_k$. As it turns out (and as is detailed in the current paper), the entropy decrement argument extends to this setting (after using some known facts about linear equations in primes), and allows one to reduce the above estimate to an estimate of the form

for a slowly growing function of and some fixed (in fact we can take for ), where is the (normalised) local Gowers uniformity norm. (In the case , , this becomes the Fourier-uniformity conjecture discussed in this previous post.) If one then applied the (now proven) inverse conjecture for the Gowers norms, this estimate is in turn equivalent to the more complicated looking assertion

where the supremum is over all possible choices of *nilsequences* of controlled step and complexity (see the paper for definitions of these terms).

The main novelty in the paper (elaborating upon a previous comment I had made on this blog) is to observe that this latter estimate in turn follows from the logarithmically averaged form of Sarnak’s conjecture (discussed in this previous post), namely that

whenever is a zero entropy (i.e. deterministic) sequence. Morally speaking, this follows from the well-known fact that nilsequences have zero entropy, but the presence of the supremum in (1) means that we need a little bit more; roughly speaking, we need the *class* of nilsequences of a given step and complexity to have “uniformly zero entropy” in some sense.
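For a simple instance of the logarithmically averaged Sarnak conjecture, one can take the deterministic (zero entropy) sequence $n \mapsto e(\alpha n^2)$, a basic example of a nilsequence, where $e(t) := e^{2\pi i t}$. The sketch below evaluates $\frac{1}{\log x}\sum_{n \le x}\frac{\lambda(n)e(\alpha n^2)}{n}$ for $\alpha = \sqrt{2}$; the choice of $\alpha$ and of the cutoff is arbitrary, and the output is only a numerical illustration.

```python
import cmath
import math


def liouville_table(n_max):
    """lam[n] = (-1)^Omega(n) for 1 <= n <= n_max, via smallest prime factors."""
    spf = list(range(n_max + 1))
    for p in range(2, math.isqrt(n_max) + 1):
        if spf[p] == p:
            for m in range(p * p, n_max + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (n_max + 1)
    lam[1] = 1
    for n in range(2, n_max + 1):
        lam[n] = -lam[n // spf[n]]
    return lam


def sarnak_log_average(alpha, x, lam):
    """(1/log x) * sum_{n <= x} lam(n) e(alpha n^2) / n, with e(t) = exp(2 pi i t)."""
    total = 0j
    for n in range(1, x + 1):
        phase = (alpha * (n * n)) % 1.0  # reduce mod 1 before exponentiating
        total += lam[n] * cmath.exp(2j * math.pi * phase) / n
    return total / math.log(x)


if __name__ == "__main__":
    x = 10**5
    print(abs(sarnak_log_average(math.sqrt(2), x, liouville_table(x))))
```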

On the other hand, it was already known (see previous post) that the Chowla conjecture implied the Sarnak conjecture, and similarly for the logarithmically averaged form of the two conjectures. Putting all these implications together, we obtain the pleasant fact that the logarithmically averaged Sarnak and Chowla conjectures are equivalent, which is the main result of the current paper. There have been a large number of special cases of the Sarnak conjecture worked out (when the deterministic sequence involved came from a special dynamical system), so these results can now also be viewed as partial progress towards the Chowla conjecture (at least with logarithmic averaging). However, my feeling is that the full resolution of these conjectures will not come from these sorts of special cases; instead, conjectures like the Fourier-uniformity conjecture in this previous post look more promising to attack.

It would also be nice to get rid of the pesky logarithmic averaging, but this seems to be an inherent requirement of the entropy decrement argument method, so one would probably have to find a way to avoid that argument if one were to remove the log averaging.

Over the last few years, a large group of mathematicians have been developing an online database to systematically collect the known facts, numerical data, and algorithms concerning some of the most central types of objects in modern number theory, namely the L-functions associated to various number fields, curves, and modular forms, as well as further data about these modular forms. This of course includes the most famous examples of L-functions and modular forms respectively, namely the Riemann zeta function and the discriminant modular form $\Delta$, but there are countless other examples of both. The connections between these classes of objects lie at the heart of the Langlands programme.

As of today, the “L-functions and modular forms database” is now out of beta, and open to the public; at present the database is mostly geared towards specialists in computational number theory, but will hopefully develop into a more broadly useful resource as time goes on. An article by John Cremona summarising the purpose of the database can be found here.

(Thanks to Andrew Sutherland and Kiran Kedlaya for the information.)

Tamar Ziegler and I have just uploaded to the arXiv two related papers: “Concatenation theorems for anti-Gowers-uniform functions and Host-Kra characteristic factors” and “Polynomial patterns in primes”, with the former developing a “quantitative Bessel inequality” for local Gowers norms that is crucial in the latter.

We use the term “concatenation theorem” to denote results in which structural control of a function in two or more “directions” can be “concatenated” into structural control in a *joint* direction. A trivial example of such a concatenation theorem is the following: if a function $f: \mathbb{R}^2 \to \mathbb{R}$ is constant in the first variable (thus $x \mapsto f(x,y)$ is constant for each $y$), and also constant in the second variable (thus $y \mapsto f(x,y)$ is constant for each $x$), then it is constant in the joint variable $(x,y)$. A slightly less trivial example: if a function $f: \mathbb{R}^2 \to \mathbb{R}$ is affine-linear in the first variable (thus, for each $y$, there exist $\alpha(y), \beta(y)$ such that $f(x,y) = \alpha(y) x + \beta(y)$ for all $x$) and affine-linear in the second variable (thus, for each $x$, there exist $\gamma(x), \delta(x)$ such that $f(x,y) = \gamma(x) y + \delta(x)$ for all $y$) then $f$ is a quadratic polynomial in $x,y$; in fact it must take the form

$$f(x,y) = \epsilon x y + \zeta x + \eta y + \theta$$

for some real numbers $\epsilon, \zeta, \eta, \theta$. (This can be seen for instance by using the affine linearity in $y$ to show that the coefficients $\alpha(y), \beta(y)$ are also affine linear.)

The same phenomenon extends to higher degree polynomials. Given a function $P: G \to K$ from one additive group $G$ to another $K$, we say that $P$ is of *degree less than $d$* along a subgroup $H$ of $G$ if all the $d$-fold iterated differences of $P$ along directions in $H$ vanish, that is to say

$$\partial_{h_1} \cdots \partial_{h_d} P(x) = 0$$

for all $x \in G$ and $h_1, \dots, h_d \in H$, where $\partial_h$ is the difference operator

$$\partial_h P(x) := P(x+h) - P(x).$$

(We adopt the convention that the only $P$ of degree less than $0$ is the zero function.)
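The definition of degree along a subgroup translates directly into code. The Python sketch below (illustrative names; integer-valued functions on $\mathbb{Z}^2$, so that all arithmetic is exact) implements $\partial_h$ and tests whether all $d$-fold differences along a finite list of directions vanish on a finite sample of points. Such a test only gives evidence, not a proof, for the subgroup generated by those directions. It is run on the affine-linear example $f(x,y) = 3xy + 2x - 5y + 7$, which has degree less than $2$ along each coordinate axis, degree less than $3$ jointly, but not degree less than $2$ jointly.

```python
from itertools import product


def diff(P, h):
    """Difference operator: (partial_h P)(x) = P(x + h) - P(x), for points of Z^2."""
    return lambda x: P((x[0] + h[0], x[1] + h[1])) - P(x)


def has_degree_less_than(P, d, directions, sample):
    """True if every d-fold difference of P along `directions` vanishes on `sample`.

    With d = 0 this tests P == 0 on the sample, matching the convention that
    only the zero function has degree less than 0.
    """
    for hs in product(directions, repeat=d):
        Q = P
        for h in hs:
            Q = diff(Q, h)
        if any(Q(x) != 0 for x in sample):
            return False
    return True


SAMPLE = [(a, b) for a in range(-3, 4) for b in range(-3, 4)]


def f(x):
    """The concatenation example: affine-linear in each variable separately."""
    return 3 * x[0] * x[1] + 2 * x[0] - 5 * x[1] + 7


if __name__ == "__main__":
    ex, ey = [(1, 0), (2, 0)], [(0, 1), (0, -2)]
    print(has_degree_less_than(f, 2, ex, SAMPLE), has_degree_less_than(f, 2, ey, SAMPLE))
    print(has_degree_less_than(f, 3, ex + ey + [(1, 1)], SAMPLE))
```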

We then have the following simple proposition:

Proposition 1 (Concatenation of polynomiality). Let $P: G \to K$ be of degree less than $d_1$ along one subgroup $H_1$ of $G$, and of degree less than $d_2$ along another subgroup $H_2$ of $G$, for some $d_1, d_2 \geq 1$. Then $P$ is of degree less than $d_1 + d_2 - 1$ along the subgroup $H_1 + H_2$ of $G$.

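Note the previous example was basically the case when $G = {\bf Z} \times {\bf Z}$, $H_1 = {\bf Z} \times \{0\}$, $H_2 = \{0\} \times {\bf Z}$, $K = {\bf R}$, and $d_1 = d_2 = 2$.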

*Proof:* The claim is trivial for $d_1=1$ or $d_2=1$ (in which case $P$ is constant along $H_1$ or $H_2$ respectively), so suppose inductively that $d_1,d_2 \geq 2$ and the claim has already been proven for smaller values of $d_1+d_2$.

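We take a derivative in a direction $h_1 \in H_1$ to obtain

$$ T^{-h_1} P = P + \partial_{h_1} P$$

where $T^{-h_1} P(x) := P(x+h_1)$ is the shift of $P$ by $-h_1$. Then we take a further shift by a direction $h_2 \in H_2$ to obtain

$$ T^{-h_1-h_2} P = T^{-h_2} P + T^{-h_2} \partial_{h_1} P = P + \partial_{h_2} P + T^{-h_2} \partial_{h_1} P$$

leading to the *cocycle equation*

$$ \partial_{h_1+h_2} P = \partial_{h_2} P + T^{-h_2} \partial_{h_1} P.$$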

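Since $P$ has degree less than $d_1$ along $H_1$ and degree less than $d_2$ along $H_2$, $\partial_{h_1} P$ has degree less than $d_1-1$ along $H_1$ and less than $d_2$ along $H_2$, so is of degree less than $d_1+d_2-2$ along $H_1+H_2$ by the induction hypothesis; since shifts preserve degree along a subgroup, the same is true of $T^{-h_2} \partial_{h_1} P$. Similarly $\partial_{h_2} P$ is also of degree less than $d_1+d_2-2$ along $H_1+H_2$. Combining this with the cocycle equation we see that $\partial_{h_1+h_2} P$ is of degree less than $d_1+d_2-2$ along $H_1+H_2$ for any $h_1 \in H_1$, $h_2 \in H_2$, and hence $P$ is of degree less than $d_1+d_2-1$ along $H_1+H_2$, as required.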
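The recursive definition of degree is easy to test by machine. The following Python sketch (our own illustration, not code from the papers) implements the difference operator $\partial_h$ on ${\bf Z}^2$ and checks, on a finite window and for a finite sample of directions, that a function of the form $\epsilon xy + \zeta x + \eta y + \theta$ has degree less than $2$ along each coordinate axis and degree less than $3 = 2+2-1$ along ${\bf Z}^2$, as Proposition 1 predicts. The helper names, the sampled directions and the particular coefficients are hypothetical choices, and a finite check is evidence rather than proof.

```python
from itertools import product

def diff(P, h):
    """Difference operator on Z^2: (diff(P, h))(x) = P(x + h) - P(x)."""
    def dP(x):
        return P((x[0] + h[0], x[1] + h[1])) - P(x)
    return dP

def iterated_diff(P, hs):
    """Apply the difference operator along each direction in hs in turn."""
    for h in hs:
        P = diff(P, h)
    return P

def has_degree_less_than(P, d, directions, window):
    """Finite check that every d-fold difference of P along the sampled
    directions vanishes on the window (evidence only, not a proof).
    For d = 0 this tests whether P vanishes, matching the convention."""
    for hs in product(directions, repeat=d):
        dP = iterated_diff(P, hs)
        if any(dP(x) != 0 for x in window):
            return False
    return True

def f(p):
    """Illustrative instance of eps*x*y + zeta*x + eta*y + theta."""
    x, y = p
    return 3 * x * y - 2 * x + 5 * y + 7

window = list(product(range(-3, 4), repeat=2))
H1 = [(1, 0), (2, 0), (-1, 0)]           # sampled directions in Z x {0}
H2 = [(0, 1), (0, 3), (0, -2)]           # sampled directions in {0} x Z
H12 = [(1, 0), (0, 1), (1, 1), (2, -1)]  # sampled directions in H1 + H2 = Z^2

print(has_degree_less_than(f, 2, H1, window))   # affine in x
print(has_degree_less_than(f, 2, H2, window))   # affine in y
print(has_degree_less_than(f, 3, H12, window))  # degree < 2 + 2 - 1 jointly
print(has_degree_less_than(f, 2, H12, window))  # fails: the xy term survives
```

The last line fails because $\partial_{(1,0)} \partial_{(0,1)} f$ is the nonzero constant coming from the $xy$ term, which is exactly why the joint degree bound is $d_1+d_2-1$ rather than $\max(d_1,d_2)$.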

While this proposition is simple, it already illustrates some basic principles regarding how one would go about proving a concatenation theorem:

- (i) One should perform induction on the degrees involved, and take advantage of the recursive nature of degree (in this case, the fact that a function is of degree less than $d$ along some subgroup $H$ of directions iff all of its first derivatives along $H$ are of degree less than $d-1$).
- (ii) Structure is preserved by operations such as addition, shifting, and taking derivatives. In particular, if a function $P$ is of degree less than $d$ along some subgroup $H$, then any derivative $\partial_h P$ of $P$ is also of degree less than $d$ along $H$, *even if $h$ does not belong to $H$*.

Here is another simple example of a concatenation theorem. Suppose an at most countable additive group $G$ acts by measure-preserving shifts $T: g \mapsto T^g$ on some probability space $(X, {\mathcal X}, \mu)$; we call the pair $(X,T)$ (or more precisely $(X, {\mathcal X}, \mu, T)$) a *$G$-system*. We say that a function $f \in L^\infty(X)$ is a *generalised eigenfunction of degree less than $d$* along some subgroup $H$ of $G$ and some $d \geq 1$ if one has

$$ T^h f = \lambda_h f$$

almost everywhere for all $h \in H$, and some functions $\lambda_h \in L^\infty(X)$ of degree less than $d-1$ along $H$, with the convention that a function has degree less than $0$ if and only if it is equal to $1$. Thus for instance, a function $f$ is a generalised eigenfunction of degree less than $1$ along $H$ if it is constant on almost every $H$-ergodic component of $X$, and is a generalised eigenfunction of degree less than $2$ along $H$ if it is an eigenfunction of the shift action on almost every $H$-ergodic component of $X$. A basic example of a higher order eigenfunction is the function $f(x,y) := e(y)$ (where $e(y) := e^{2\pi i y}$) on the *skew shift* $({\bf R}/{\bf Z})^2$ with ${\bf Z}$ action given by the generator $T(x,y) := (x+\alpha, y+x)$ for some irrational $\alpha$. One can check that $T^h f = \lambda_h f$ for every integer $h$, where $\lambda_h(x,y) := e(hx + \frac{h(h-1)}{2} \alpha)$ is a generalised eigenfunction of degree less than $2$ along ${\bf Z}$, so $f$ is of degree less than $3$ along ${\bf Z}$.

We then have

Proposition 2 (Concatenation of higher order eigenfunctions) Let $X$ be a $G$-system, and let $f$ be a generalised eigenfunction of degree less than $d_1$ along one subgroup $H_1$ of $G$, and a generalised eigenfunction of degree less than $d_2$ along another subgroup $H_2$ of $G$, for some $d_1,d_2 \geq 1$. Then $f$ is a generalised eigenfunction of degree less than $d_1+d_2-1$ along the subgroup $H_1+H_2$ of $G$.

The argument is almost identical to that of the previous proposition and is left as an exercise to the reader. The key point is the point (ii) identified earlier: the space of generalised eigenfunctions of degree less than $d$ along $H$ is preserved by multiplication and shifts, as well as the operation of “taking derivatives” even along directions $k$ that do not lie in $H$. (To prove this latter claim, one should restrict to the region where $f$ is non-zero, and then divide $T^k f$ by $f$ to locate $\lambda_k$.)

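A typical example of this proposition in action is as follows: consider the ${\bf Z}^2$-system given by the $3$-torus $({\bf R}/{\bf Z})^3$ with generating shifts

$$ T^{(1,0)}(x,y,z) := (x+\alpha, y, z+y), \qquad T^{(0,1)}(x,y,z) := (x, y+\alpha, z+x)$$

for some irrational $\alpha$, which can be checked to give a ${\bf Z}^2$ action

$$ T^{(n_1,n_2)}(x,y,z) := (x+n_1\alpha, y+n_2\alpha, z+n_2 x+n_1 y+n_1 n_2 \alpha).$$

The function $f(x,y,z) := e(z)$ can then be checked to be a generalised eigenfunction of degree less than $2$ along ${\bf Z} \times \{0\}$, and also less than $2$ along $\{0\} \times {\bf Z}$, and less than $3$ along ${\bf Z}^2$. One can view this example as the dynamical systems translation of the example (1) (see this previous post for some more discussion of this sort of correspondence).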
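Here is a small exact-arithmetic check of the claims about this example, written in Python as a sketch. It assumes a rational stand-in for $\alpha$ (the identities being verified are algebraic and do not use irrationality), represents points of the torus by `Fraction` values reduced mod $1$, and tracks the phase of the multiplier $\lambda_n$ in $T^n f = \lambda_n f$ for $f = e(z)$ instead of working with complex exponentials; all function names are hypothetical.

```python
from fractions import Fraction

# Rational stand-in for the irrational alpha (an assumption of this sketch):
# the identities checked below are algebraic, and Fractions keep "mod 1" exact.
ALPHA = Fraction(3, 11)

def T1(p):
    """Generator T^{(1,0)}(x, y, z) = (x + alpha, y, z + y) on (R/Z)^3."""
    x, y, z = p
    return ((x + ALPHA) % 1, y, (z + y) % 1)

def T2(p):
    """Generator T^{(0,1)}(x, y, z) = (x, y + alpha, z + x) on (R/Z)^3."""
    x, y, z = p
    return (x, (y + ALPHA) % 1, (z + x) % 1)

def act(n1, n2, p):
    """Claimed closed form for T^{(n1, n2)}."""
    x, y, z = p
    return ((x + n1 * ALPHA) % 1,
            (y + n2 * ALPHA) % 1,
            (z + n2 * x + n1 * y + n1 * n2 * ALPHA) % 1)

def iterate(n1, n2, p):
    """T^{(n1, n2)} p computed as T1^{n1} T2^{n2} p, for n1, n2 >= 0."""
    for _ in range(n2):
        p = T2(p)
    for _ in range(n1):
        p = T1(p)
    return p

def multiplier_phase(n1, n2, p):
    """Phase of lambda_n at p, where T^n f = lambda_n f and f(x, y, z) = e(z)."""
    return (act(n1, n2, p)[2] - p[2]) % 1

points = [(Fraction(1, 7), Fraction(2, 5), Fraction(3, 13)),
          (Fraction(0), Fraction(4, 5), Fraction(11, 13))]
for p in points:
    assert T1(T2(p)) == T2(T1(p))  # the generators commute
    assert all(iterate(a, b, p) == act(a, b, p) for a in range(4) for b in range(4))
    # Along Z x {0}: lambda_(n,0) = e(n y) is invariant, so f has degree < 2 there.
    assert multiplier_phase(2, 0, act(3, 0, p)) == multiplier_phase(2, 0, p)
    # Along Z^2: lambda_n shifts by the constant phase (n1 m2 + n2 m1) alpha,
    # so lambda_n is an eigenfunction and f has degree < 3 there.
    shift = (multiplier_phase(2, 1, act(1, 3, p)) - multiplier_phase(2, 1, p)) % 1
    assert shift == ((2 * 3 + 1 * 1) * ALPHA) % 1
print("all checks passed")
```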

The main results of our concatenation paper are analogues of these propositions concerning a more complicated notion of “polynomial-like” structure that are of importance in additive combinatorics and in ergodic theory. On the ergodic theory side, the notion of structure is captured by the *Host-Kra characteristic factors* $Z^{<d}_H(X)$ of a $G$-system $X$ along a subgroup $H$. These factors can be defined in a number of ways. One is by duality, using the *Gowers-Host-Kra uniformity seminorms* $\|f\|_{U^d_H(X)}$ (defined for instance here). Namely, $Z^{<d}_H(X)$ is the factor of $X$ defined up to equivalence by the requirement that

$$ \|f\|_{U^d_H(X)} = 0 \iff {\bf E}( f | Z^{<d}_H(X) ) = 0.$$

An equivalent definition is in terms of the *dual functions* of along , which can be defined recursively by setting and

where denotes the ergodic average along a Følner sequence in (in fact one can also define these concepts in non-amenable abelian settings as per this previous post). The factor can then be alternately defined as the factor generated by the dual functions for .

In the case when $G = {\bf Z}$ and $X$ is $G$-ergodic, a deep theorem of Host and Kra shows that the factor $Z^{<d}_G(X)$ is equivalent to the inverse limit of nilsystems of step less than $d$. A similar statement holds with ${\bf Z}$ replaced by any finitely generated group by Griesmer, while the case of an infinite vector space over a finite field was treated in this paper of Bergelson, Ziegler, and myself. The situation is more subtle when $X$ is not $G$-ergodic, or when $X$ is $G$-ergodic but $H$ is a proper subgroup of $G$ acting non-ergodically, when one has to start considering measurable families of directional nilsystems; see for instance this paper of Austin for some of the subtleties involved (for instance, higher order group cohomology begins to become relevant!).

One of our main theorems is then

Proposition 3 (Concatenation of characteristic factors) Let $X$ be a $G$-system, and let $f$ be measurable with respect to the factor $Z^{<d_1}_{H_1}(X)$ and with respect to the factor $Z^{<d_2}_{H_2}(X)$ for some $d_1,d_2 \geq 1$ and some subgroups $H_1,H_2$ of $G$. Then $f$ is also measurable with respect to the factor $Z^{<d_1+d_2-1}_{H_1+H_2}(X)$.

We give two proofs of this proposition in the paper; an ergodic-theoretic proof using the Host-Kra theory of “cocycles of type $<d$ (along a subgroup $H$)”, which can be used to inductively describe the factors $Z^{<d}_H(X)$, and a combinatorial proof based on a combinatorial analogue of this proposition which is harder to state (but which roughly speaking asserts that a function which is nearly orthogonal to all bounded functions of small $U^{d_1}_{H_1}$ norm, and also to all bounded functions of small $U^{d_2}_{H_2}$ norm, is also nearly orthogonal to all bounded functions of small $U^{d_1+d_2-1}_{H_1+H_2}$ norm). The combinatorial proof parallels the proof of Proposition 2. A key point is that dual functions obey a property analogous to being a generalised eigenfunction, namely that

where and is a “structured function of order ” along . (In the language of this previous paper of mine, this is an assertion that dual functions are uniformly almost periodic of order .) Again, the point (ii) above is crucial, and in particular it is key that any structure that has is inherited by the associated functions and . This sort of inheritance is quite easy to accomplish in the ergodic setting, as there is a ready-made language of factors to encapsulate the concept of structure, and the shift-invariance and -algebra properties of factors make it easy to show that just about any “natural” operation one performs on a function measurable with respect to a given factor, returns a function that is still measurable in that factor. In the finitary combinatorial setting, though, encoding the fact (ii) becomes a remarkably complicated notational nightmare, requiring a huge amount of “epsilon management” and “second-order epsilon management” (in which one manages not only scalar epsilons, but also function-valued epsilons that depend on other parameters). In order to avoid all this we were forced to utilise a nonstandard analysis framework for the combinatorial theorems, which made the arguments greatly resemble the ergodic arguments in many respects (though the two settings are still not equivalent, see this previous blog post for some comparisons between the two settings). Unfortunately the arguments are still rather complicated.

For combinatorial applications, dual formulations of the concatenation theorem are more useful. A direct dualisation of the theorem yields the following decomposition theorem: a bounded function which is small in $U^{d_1+d_2-1}_{H_1+H_2}$ norm can be split into a component that is small in $U^{d_1}_{H_1}$ norm, and a component that is small in $U^{d_2}_{H_2}$ norm. (One may wish to understand this type of result by first proving the following baby version: any function that has mean zero on every coset of $H_1+H_2$ can be decomposed as the sum of a function that has mean zero on every $H_1$ coset, and a function that has mean zero on every $H_2$ coset. This is dual to the assertion that a function that is constant on every $H_1$ coset and constant on every $H_2$ coset is constant on every $H_1+H_2$ coset.) Combining this with some standard “almost orthogonality” arguments (i.e. Cauchy-Schwarz) gives the following Bessel-type inequality: if one has a lot of subgroups $H_1,\dots,H_k$ and a bounded function is small in $U^{2d-1}_{H_i+H_j}$ norm for most $i,j$, then it is also small in $U^d_{H_i}$ norm for most $i$. (Here is a baby version one may wish to warm up on: if a function $f$ has small mean on $({\bf Z}/p{\bf Z})^2$ for some large prime $p$, then it has small mean on most of the cosets of most of the one-dimensional subgroups of $({\bf Z}/p{\bf Z})^2$.)
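The baby Bessel inequality at the end of this paragraph can be made exact. Assuming $f$ is real-valued and using Plancherel on $({\bf Z}/p{\bf Z})^2$, the mean square of the coset averages of $f$ along a line $L$ through the origin is the sum of $|\hat f(\xi)|^2$ over the frequencies $\xi$ annihilating $L$; since every nonzero frequency annihilates exactly one of the $p+1$ lines, averaging over lines gives $(p\,|{\bf E} f|^2 + {\bf E} |f|^2)/(p+1)$, which is at most $|{\bf E} f|^2 + 1/(p+1)$ when $|f| \leq 1$. The Python sketch below (our own illustration, with hypothetical function names, assuming an integer-valued $f$ so that exact rational arithmetic can be used) verifies this identity by brute force.

```python
from fractions import Fraction

def line_coset_energy(f, p):
    """Average over the p + 1 lines L through 0 in (Z/pZ)^2 of
    E_x |mean of f over x + L|^2, computed by brute force (f integer-valued)."""
    directions = [(1, t) for t in range(p)] + [(0, 1)]
    total = Fraction(0)
    for a, b in directions:
        # The cosets of the line spanned by (a, b) are the level sets of the
        # linear form (x, y) -> b*x - a*y mod p, which vanishes exactly on the line.
        coset_sums = [0] * p
        for x in range(p):
            for y in range(p):
                coset_sums[(b * x - a * y) % p] += f(x, y)
        # p cosets of p points each: E_x |coset mean|^2 = (1/p) * sum of squares.
        total += sum(Fraction(s, p) ** 2 for s in coset_sums) / p
    return total / (p + 1)

def predicted_energy(f, p):
    """Plancherel prediction (p * (E f)^2 + E f^2) / (p + 1)."""
    values = [f(x, y) for x in range(p) for y in range(p)]
    mean = Fraction(sum(values), p * p)
    second_moment = Fraction(sum(v * v for v in values), p * p)
    return (p * mean ** 2 + second_moment) / (p + 1)

def g(x, y):
    """An arbitrary bounded test function with values in {-1, +1}."""
    return 1 if (x * y) % 3 == 0 else -1

p = 11
print(line_coset_energy(g, p), predicted_energy(g, p))
```

Since the left-hand side is an average over lines of an average over cosets, the bound $|{\bf E} f|^2 + 1/(p+1)$ forces the coset means to be small for most cosets of most lines once $p$ is large.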

There is also a generalisation of the above Bessel inequality (as well as several of the other results mentioned above) in which the subgroups are replaced by more general *coset progressions* (of bounded rank), so that one has a Bessel inequality controlling “local” Gowers uniformity norms such as by “global” Gowers uniformity norms such as . This turns out to be particularly useful when attempting to compute polynomial averages such as

for various functions . After repeated use of the van der Corput lemma, one can control such averages by expressions such as

(actually one ends up with more complicated expressions than this, but let’s use this example for sake of discussion). This can be viewed as an average of various Gowers uniformity norms of along arithmetic progressions of the form for various . Using the above Bessel inequality, this can be controlled in turn by an average of various Gowers uniformity norms along rank two generalised arithmetic progressions of the form for various . But for generic , this rank two progression is close in a certain technical sense to the “global” interval (this is ultimately due to the basic fact that two randomly chosen large integers are likely to be coprime, or at least have a small gcd). As a consequence, one can use the concatenation theorems from our first paper to control expressions such as (2) in terms of *global* Gowers uniformity norms. This is important in number theoretic applications, when one is interested in computing sums such as

or

where and are the Möbius and von Mangoldt functions respectively. This is because we are able to control global Gowers uniformity norms of such functions (thanks to results such as the proof of the inverse conjecture for the Gowers norms, the orthogonality of the Möbius function with nilsequences, and asymptotics for linear equations in primes), but much less control is currently available for local Gowers uniformity norms, even with the assistance of the generalised Riemann hypothesis (see this previous blog post for some further discussion).

By combining these tools and strategies with the “transference principle” approach from our previous paper (as improved using the recent “densification” technique of Conlon, Fox, and Zhao, discussed in this previous post), we are able in particular to establish the following result:

Theorem 4 (Polynomial patterns in the primes) Let $P_1,\dots,P_k: {\bf Z} \rightarrow {\bf Z}$ be polynomials of degree at most $d$, whose degree $d$ coefficients are all distinct, for some $d \geq 1$. Suppose that $P_1,\dots,P_k$ is admissible in the sense that for every prime $p$, there are $n, m$ such that $n+P_1(m),\dots,n+P_k(m)$ are all coprime to $p$. Then there exist infinitely many pairs $n,m$ of natural numbers such that $n+P_1(m),\dots,n+P_k(m)$ are prime.
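Admissibility only needs to be checked at finitely many primes: if $p > k$, then for any fixed $m$ the values $-P_1(m),\dots,-P_k(m)$ occupy at most $k < p$ residue classes mod $p$, so some $n$ avoids all of them. The following Python sketch (an illustration, not code from the paper) uses this observation to test admissibility. Polynomials are passed as hypothetical integer coefficient lists $[c_0, c_1, \dots]$, and the sketch does not check the other hypotheses of Theorem 4 (the degree bound and the distinct degree $d$ coefficients).

```python
def eval_poly(coeffs, m):
    """Evaluate c_0 + c_1 m + c_2 m^2 + ... for coeffs = [c_0, c_1, ...]."""
    return sum(c * m ** j for j, c in enumerate(coeffs))

def primes_up_to(k):
    """Primes p <= k by trial division (k is small here)."""
    return [p for p in range(2, k + 1)
            if all(p % q for q in range(2, int(p ** 0.5) + 1))]

def is_admissible(polys):
    """True if for every prime p there are n, m with every n + P_i(m) coprime to p.
    Only primes p <= k = len(polys) can obstruct, and residues of n and m mod p suffice."""
    k = len(polys)
    for p in primes_up_to(k):
        if not any(all((n + eval_poly(P, m)) % p != 0 for P in polys)
                   for n in range(p) for m in range(p)):
            return False
    return True

# n, n + m^2, n + 2m^2: distinct leading coefficients 0, 1, 2 (degree d = 2).
print(is_admissible([[0], [0, 0, 1], [0, 0, 2]]))
# n, n + 2m + 1: the two values always have opposite parity.
print(is_admissible([[0], [1, 2]]))
```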

Furthermore, we obtain an asymptotic for the number of such pairs in the range , (actually for minor technical reasons we reduce the range of to be very slightly less than ). In fact one could in principle obtain asymptotics for smaller values of , and relax the requirement that the degree $d$ coefficients be distinct to the weaker requirement that no two of the $P_i$ differ by a constant, provided one had good enough local uniformity results for the Möbius or von Mangoldt functions. For instance, we can obtain an asymptotic for triplets of the form unconditionally for , and conditionally on GRH for all , using known results on primes in short intervals on average.

The case of this theorem was obtained in a previous paper of myself and Ben Green (using the aforementioned conjectures on the Gowers uniformity norm and the orthogonality of the Möbius function with nilsequences, both of which are now proven). For higher , an older result of Tamar and myself was able to tackle the case when (though our results there only give lower bounds on the number of pairs , and no asymptotics). Both of these results generalise my older theorem with Ben Green on the primes containing arbitrarily long arithmetic progressions. The theorem also extends to multidimensional polynomials, in which case there are some additional previous results; see the paper for more details. We also get a technical refinement of our previous result on narrow polynomial progressions in (dense subsets of) the primes by making the progressions just a little bit narrower in the case when the density of the set one is using is small.

There is a very nice recent paper by Lemke Oliver and Soundararajan (complete with a popular science article about it by the consistently excellent Erica Klarreich for Quanta) about a surprising (but now satisfactorily explained) bias in the distribution of pairs of consecutive primes when reduced to a small modulus $q$.

This phenomenon is superficially similar to the better-known Chebyshev bias concerning the reduction of a single prime to a small modulus $q$, but is in fact a rather different (and much stronger) bias than the Chebyshev bias, and seems to arise from a completely different source. The Chebyshev bias asserts, roughly speaking, that a randomly selected prime $p$ of a large magnitude $x$ will typically (though not always) be slightly more likely to be a quadratic non-residue modulo $q$ than a quadratic residue, but the bias is small (the difference in probabilities is only about $O(1/\sqrt{x})$ for typical choices of $x$), and certainly consistent with known or conjectured positive results such as Dirichlet’s theorem or the generalised Riemann hypothesis. The reason for the Chebyshev bias can be traced back to the von Mangoldt explicit formula which relates the distribution of the von Mangoldt function $\Lambda$ modulo $q$ with the zeroes of the $L$-functions with period $q$. This formula predicts (assuming some standard conjectures like GRH) that the von Mangoldt function $\Lambda$ is quite unbiased modulo $q$. The von Mangoldt function is *mostly* concentrated in the primes, but it also has a medium-sized contribution coming from *squares* of primes, which are of course all located in the quadratic residues modulo $q$. (Cubes and higher powers of primes also make a small contribution, but these are quite negligible asymptotically.) To balance everything out, the contribution of the primes must then exhibit a small preference towards quadratic non-residues, and this is the Chebyshev bias. (See this article of Rubinstein and Sarnak for a more technical discussion of the Chebyshev bias, and this survey of Granville and Martin for an accessible introduction. The story of the Chebyshev bias is also related to Skewes’ number, once considered the largest explicit constant to naturally appear in a mathematical argument.)
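One can see the Chebyshev bias numerically for $q=3$, where $2 \hbox{ mod } 3$ is the only quadratic non-residue class. The Python sketch below (a quick empirical illustration with hypothetical helper names; it does not implement the explicit formula) tracks $\pi(x;3,2) - \pi(x;3,1)$ for $x$ up to $10^5$, counting the prime $2$ in the non-residue class as usual. The difference should stay non-negative throughout, consistent with the known fact that the first $x$ with $\pi(x;3,1) > \pi(x;3,2)$ is $x = 608981813029$.

```python
def primes_below(n):
    """Sieve of Eratosthenes: all primes p < n (n >= 2)."""
    is_prime = bytearray([1]) * n
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            is_prime[p * p::p] = bytearray(len(range(p * p, n, p)))
    return [p for p in range(n) if is_prime[p]]

def race_mod3(n):
    """Values of pi(x; 3, 2) - pi(x; 3, 1), recorded at each prime x < n.
    (The difference only changes at primes, so this covers every x < n.)"""
    lead, history = 0, []
    for p in primes_below(n):
        if p % 3 == 1:
            lead -= 1
        elif p % 3 == 2:
            lead += 1
        history.append(lead)
    return history

history = race_mod3(10 ** 5)
print("final lead of the non-residue class:", history[-1])
print("smallest lead seen:", min(history))
```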

The paper of Lemke Oliver and Soundararajan considers instead the distribution of the pairs $(p_n \hbox{ mod } q, p_{n+1} \hbox{ mod } q)$ for small $q$ and for large consecutive primes $p_n, p_{n+1}$, say drawn at random from the primes comparable to some large $x$. For sake of discussion let us just take $q=3$. Then all primes $p_n$ larger than $3$ are either $1 \hbox{ mod } 3$ or $2 \hbox{ mod } 3$; Chebyshev’s bias gives a very slight preference to the latter (of order $O(1/\sqrt{x})$, as discussed above), but apart from this, we expect the primes to be more or less equally distributed in both classes. For instance, assuming GRH, the probability that $p_n$ lands in $1 \hbox{ mod } 3$ would be $1/2 + O(x^{-1/2+o(1)})$, and similarly for $2 \hbox{ mod } 3$.

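In view of this, one would expect that up to errors of $O(x^{-1/2+o(1)})$ or so, the pair $(p_n \hbox{ mod } 3, p_{n+1} \hbox{ mod } 3)$ should be equally distributed amongst the four options $(1 \hbox{ mod } 3, 1 \hbox{ mod } 3)$, $(1 \hbox{ mod } 3, 2 \hbox{ mod } 3)$, $(2 \hbox{ mod } 3, 1 \hbox{ mod } 3)$, $(2 \hbox{ mod } 3, 2 \hbox{ mod } 3)$, thus for instance the probability that this pair is $(1 \hbox{ mod } 3, 1 \hbox{ mod } 3)$ would naively be expected to be $1/4 + O(x^{-1/2+o(1)})$, and similarly for the other three tuples. These assertions are not yet proven (although some non-trivial upper and lower bounds for such probabilities can be obtained from recent work of Maynard).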

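However, Lemke Oliver and Soundararajan argue (backed by both plausible heuristic arguments (based ultimately on the Hardy-Littlewood prime tuples conjecture), as well as substantial numerical evidence) that there is a significant bias away from the tuples $(1 \hbox{ mod } 3, 1 \hbox{ mod } 3)$ and $(2 \hbox{ mod } 3, 2 \hbox{ mod } 3)$ – informally, adjacent primes don’t like being in the same residue class! For instance, they predict that the probability of attaining $(1 \hbox{ mod } 3, 1 \hbox{ mod } 3)$ is in fact

$$ \frac{1}{4} - \frac{1}{8} \frac{\log\log x}{\log x} + O\left( \frac{1}{\log x} \right)$$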

with similar predictions for the other three pairs (in fact they give a somewhat more precise prediction than this). The magnitude of this bias, being comparable to $\frac{\log\log x}{\log x}$, is significantly stronger than the Chebyshev bias of $O(1/\sqrt{x})$.

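One consequence of this prediction is that the prime gaps $p_{n+1}-p_n$ are slightly less likely to be divisible by $3$ than naive random models of the primes would predict. Indeed, if the four options $(1 \hbox{ mod } 3, 1 \hbox{ mod } 3)$, $(1 \hbox{ mod } 3, 2 \hbox{ mod } 3)$, $(2 \hbox{ mod } 3, 1 \hbox{ mod } 3)$, $(2 \hbox{ mod } 3, 2 \hbox{ mod } 3)$ all occurred with equal probability $1/4$, then $p_{n+1}-p_n$ should equal $0 \hbox{ mod } 3$ with probability $1/2$, and $1 \hbox{ mod } 3$ and $2 \hbox{ mod } 3$ with probability $1/4$ each (as would be the case when taking the difference of two random numbers drawn from those integers not divisible by $3$); but the Lemke Oliver-Soundararajan bias predicts that the probability of $p_{n+1}-p_n$ being divisible by three should be slightly lower, being approximately $1/2 - \frac{1}{4} \frac{\log\log x}{\log x}$.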
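The bias in consecutive pairs is already visible at very modest heights. The following Python sketch (an empirical illustration with hypothetical helper names, not the Lemke Oliver-Soundararajan computation) tallies the pairs $(p_n \hbox{ mod } 3, p_{n+1} \hbox{ mod } 3)$ for consecutive primes $3 < p_n < p_{n+1} < 10^5$. The fraction of same-class pairs, which is the same as the fraction of gaps divisible by $3$, should come out noticeably below the naive value $1/2$.

```python
from collections import Counter

def primes_below(n):
    """Sieve of Eratosthenes: all primes p < n (n >= 2)."""
    is_prime = bytearray([1]) * n
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            is_prime[p * p::p] = bytearray(len(range(p * p, n, p)))
    return [p for p in range(n) if is_prime[p]]

def consecutive_pairs_mod3(n):
    """Counter of (p_k mod 3, p_{k+1} mod 3) over consecutive primes 3 < p_k < p_{k+1} < n."""
    ps = [p for p in primes_below(n) if p > 3]
    return Counter((a % 3, b % 3) for a, b in zip(ps, ps[1:]))

counts = consecutive_pairs_mod3(10 ** 5)
total = sum(counts.values())
for pair in sorted(counts):
    print(pair, counts[pair], round(counts[pair] / total, 4))
same = counts[(1, 1)] + counts[(2, 2)]
print("same-class fraction:", same / total)
```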

Below the fold we will give a somewhat informal justification of (a simplified version of) this phenomenon, based on the Lemke Oliver-Soundararajan calculation using the prime tuples conjecture.

In this blog post, I would like to specialise the arguments of Bourgain, Demeter, and Guth from the previous post to the two-dimensional case of the Vinogradov main conjecture, namely

Theorem 1 (Two-dimensional Vinogradov main conjecture) One has

$$ \int_{[0,1]^2} |\sum_{j=1}^N e( j x + j^2 y)|^6\ dx dy \ll N^{3+o(1)}$$

as $N \rightarrow \infty$.

This particular case of the main conjecture has a classical proof using some elementary number theory. Indeed, the left-hand side can be viewed as the number of solutions to the system of equations

$$ j_1 + j_2 + j_3 = j_4 + j_5 + j_6, \qquad j_1^2 + j_2^2 + j_3^2 = j_4^2 + j_5^2 + j_6^2$$

with $j_1,\dots,j_6 \in \{1,\dots,N\}$. These two equations can combine (using the algebraic identity applied to ) to imply the further equation

which, when combined with the divisor bound, shows that each is associated to choices of excluding diagonal cases when two of the collide, and this easily yields Theorem 1. However, the Bourgain-Demeter-Guth argument (which, in the two dimensional case, is essentially contained in a previous paper of Bourgain and Demeter) does not require the divisor bound, and extends for instance to the more general case where ranges in a -separated set of reals between to .
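For small $N$ the count in Theorem 1 can simply be computed. The Python sketch below (our own brute-force illustration; it uses neither the divisor bound nor decoupling, and the name `vinogradov_count` for the solution count $J(N)$ is hypothetical) groups triples $(j_1,j_2,j_3)$ by the pair $(j_1+j_2+j_3, j_1^2+j_2^2+j_3^2)$, so that the number of solutions is the sum of the squared group sizes, and cross-checks this against a direct six-fold loop. Printing $J(N)/N^3$ for a few $N$ gives a feel for the slowly growing factor hidden in the $N^{o(1)}$.

```python
from collections import Counter
from itertools import product

def vinogradov_count(N):
    """Number of (j1, ..., j6) in {1..N}^6 with equal sums and equal sums of squares,
    computed as the sum over (sum, sum of squares) classes of (class size)^2."""
    reps = Counter()
    for j in product(range(1, N + 1), repeat=3):
        reps[(sum(j), sum(t * t for t in j))] += 1
    return sum(c * c for c in reps.values())

def vinogradov_count_naive(N):
    """Direct six-fold loop, only for cross-checking small N."""
    return sum(
        1
        for j in product(range(1, N + 1), repeat=6)
        if j[0] + j[1] + j[2] == j[3] + j[4] + j[5]
        and j[0] ** 2 + j[1] ** 2 + j[2] ** 2 == j[3] ** 2 + j[4] ** 2 + j[5] ** 2
    )

for N in (5, 10, 20, 40):
    J = vinogradov_count(N)
    print(N, J, J / N ** 3)
```

By orthogonality of the characters $e(jx + j^2 y)$ on $[0,1]^2$, this count equals the sixth moment in Theorem 1 exactly.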

In this special case, the Bourgain-Demeter argument simplifies, as the lower dimensional inductive hypothesis becomes a simple almost orthogonality claim, and the multilinear Kakeya estimate needed is also easy (collapsing to just Fubini’s theorem). Also one can work entirely in the context of the Vinogradov main conjecture, and not turn to the increased generality of decoupling inequalities (though this additional generality is convenient in higher dimensions). As such, I am presenting this special case as an introduction to the Bourgain-Demeter-Guth machinery.

We now give the specialisation of the Bourgain-Demeter argument to Theorem 1. It will suffice to establish the bound

for all , (where we keep fixed and send to infinity), as the bound then follows by combining the above bound with the trivial bound . Accordingly, for any and , we let denote the claim that

as . Clearly, for any fixed , holds for some large , and it will suffice to establish

Proposition 2 Let , and let be such that holds. Then there exists (depending continuously on ) such that holds.

Indeed, this proposition shows that for , the infimum of the for which holds is zero.

We prove the proposition below the fold, using a simplified form of the methods discussed in the previous blog post. To simplify the exposition we will be a bit cavalier with the uncertainty principle, for instance by essentially ignoring the tails of rapidly decreasing functions.

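Given any finite collection of elements $(f_i)_{i \in I}$ in some Banach space $X$, the triangle inequality tells us that

$$ \| \sum_{i \in I} f_i \|_X \leq \sum_{i \in I} \|f_i\|_X.$$

However, when the $f_i$ all “oscillate in different ways”, one expects to improve substantially upon the triangle inequality. For instance, if $X$ is a Hilbert space and the $f_i$ are mutually orthogonal, we have the Pythagorean theorem

$$ \| \sum_{i \in I} f_i \|_X = \left(\sum_{i \in I} \|f_i\|_X^2\right)^{1/2}.$$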

For sake of comparison, from the triangle inequality and Cauchy-Schwarz one has the general inequality

$$ \| \sum_{i \in I} f_i \|_X \leq (\# I)^{1/2} \left(\sum_{i \in I} \|f_i\|_X^2\right)^{1/2}$$

for any finite collection $(f_i)_{i \in I}$ in any Banach space $X$, where $\# I$ denotes the cardinality of $I$. Thus orthogonality in a Hilbert space yields “square root cancellation”, saving a factor of $(\# I)^{1/2}$ or so over the trivial bound coming from the triangle inequality.

More generally, let us somewhat informally say that a collection $(f_i)_{i \in I}$ exhibits *decoupling in $X$* if one has the Pythagorean-like inequality

$$ \| \sum_{i \in I} f_i \|_X \ll_\varepsilon (\# I)^\varepsilon \left(\sum_{i \in I} \|f_i\|_X^2\right)^{1/2}$$

for any $\varepsilon > 0$, thus one obtains almost the full square root cancellation in the $X$ norm. The theory of *almost orthogonality* can then be viewed as the theory of decoupling in Hilbert spaces such as $L^2$. In $L^p$ spaces for $p < 2$ one usually does not expect this sort of decoupling; for instance, if the $f_i$ are disjointly supported one has

$$ \| \sum_{i \in I} f_i \|_{L^p} = \left(\sum_{i \in I} \|f_i\|_{L^p}^p\right)^{1/p}$$

and the right-hand side can be much larger than $(\sum_{i \in I} \|f_i\|_{L^p}^2)^{1/2}$ when $p < 2$. At the opposite extreme, one usually does not expect to get decoupling in $L^\infty$, since one could conceivably align the $f_i$ to all attain a maximum magnitude at the same location with the same phase, at which point the triangle inequality in $L^\infty$ becomes sharp.

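However, in some cases one can get decoupling for certain $2 < p < \infty$. For instance, suppose we are in $L^4$, and that $f_1,\dots,f_N$ are *bi-orthogonal* in the sense that the products $f_i f_j$ for $1 \leq i < j \leq N$ are pairwise orthogonal in $L^2$. Then, using $\|f_i^2\|_{L^2} = \|f_i\|_{L^4}^2$, the Pythagorean theorem for the $f_i f_j$, and the bound $\sum_{i<j} \|f_i f_j\|_{L^2}^2 \leq \| \sum_i |f_i|^2 \|_{L^2}^2 \leq (\sum_i \|f_i\|_{L^4}^2)^2$, we have

$$ \| \sum_{i=1}^N f_i \|_{L^4}^2 = \| \sum_{i=1}^N f_i^2 + 2 \sum_{1 \leq i < j \leq N} f_i f_j \|_{L^2} \leq \sum_{i=1}^N \|f_i\|_{L^4}^2 + 2 \left(\sum_{1 \leq i < j \leq N} \|f_i f_j\|_{L^2}^2\right)^{1/2} \leq 3 \sum_{i=1}^N \|f_i\|_{L^4}^2,$$

giving decoupling in $L^4$. (Similarly if each of the $f_i f_j$ is orthogonal to all but $O_\varepsilon(N^\varepsilon)$ of the other $f_{i'} f_{j'}$.) A similar argument also gives $L^6$ decoupling when one has tri-orthogonality (with the $f_i f_j f_k$ mostly orthogonal to each other), and so forth. As a slight variant, Khintchine’s inequality also indicates that decoupling should occur for any fixed $2 < p < \infty$ if one multiplies each of the $f_i$ by an independent random sign $\epsilon_i \in \{-1,+1\}$.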
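Characters give a concrete test case. For a finite set $S$ of integers and $f_s(x) := e(sx)$ on $[0,1]$, orthogonality gives $\| \sum_{s \in S} f_s \|_{L^4}^4 = E(S)$, the number of solutions to $a+b=c+d$ with $a,b,c,d \in S$, while $\|f_s\|_{L^4} = 1$; bi-orthogonality of the $f_s$ amounts to the sums $a+b$ rarely coinciding. The Python sketch below (an illustration with hypothetical function names) compares the ratio $E(S)^{1/4}/(\# S)^{1/2}$ for $S = \{1,\dots,N\}$, where $E(S) = (2N^3+N)/3$ and the ratio grows like $N^{1/4}$, with the squares $\{1^2,\dots,N^2\}$, where the divisor-type bound $N^{o(1)}$ on representations as a sum of two squares keeps the ratio of size $N^{o(1)}$.

```python
from collections import Counter

def additive_energy(S):
    """E(S) = #{(a, b, c, d) in S^4 : a + b = c + d}, for S a list of distinct integers."""
    sums = Counter(a + b for a in S for b in S)
    return sum(c * c for c in sums.values())

def l4_decoupling_ratio(S):
    """||sum_s e(s x)||_{L^4([0,1])} / (sum_s ||e(s x)||_{L^4}^2)^{1/2}
    = E(S)^{1/4} / |S|^{1/2}, using orthogonality of the characters."""
    return additive_energy(S) ** 0.25 / len(S) ** 0.5

N = 100
linear = list(range(1, N + 1))
squares = [k * k for k in range(1, N + 1)]
print("linear frequencies:", l4_decoupling_ratio(linear))
print("square frequencies:", l4_decoupling_ratio(squares))
```

The linear frequencies are the degenerate case in which the products $f_a f_b$ collide very often, which is why no decoupling is available for them in $L^4$.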

In recent years, Bourgain and Demeter have been establishing *decoupling theorems* in $L^p$ spaces for various key exponents of $p$, in the “restriction theory” setting in which the $f_i$ are Fourier transforms of measures supported on different portions of a given surface or curve; this builds upon the earlier decoupling theorems of Wolff. In a recent paper with Guth, they established the following decoupling theorem for the curve $\gamma([0,1])$ in ${\bf R}^n$ parameterised by the polynomial curve

$$ \gamma(t) := (t, t^2, \dots, t^n).$$

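For any ball $B = B(x_0,r)$ in ${\bf R}^n$, let $w_B: {\bf R}^n \rightarrow {\bf R}^+$ denote the weight

$$ w_B(x) := \frac{1}{(1 + \frac{|x-x_0|}{r})^{100n}},$$

which should be viewed as a smoothed out version of the indicator function $1_B$ of $B$. In particular, the space $L^p(w_B) = L^p({\bf R}^n, w_B(x)\ dx)$ can be viewed as a smoothed out version of the space $L^p(B)$. For future reference we observe a fundamental self-similarity of the curve $\gamma$: any arc $\gamma(I)$ in this curve, with $I$ a compact interval, is affinely equivalent to the standard arc $\gamma([0,1])$.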

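Theorem 1 (Decoupling theorem) Let $n \geq 1$. Subdivide the unit interval $[0,1]$ into $N$ equal subintervals $I_i$ of length $1/N$, and for each such $I_i$, let $f_i: {\bf R}^n \rightarrow {\bf C}$ be the Fourier transform

$$ f_i(x) = \int_{\gamma(I_i)} e(x \cdot \xi)\ d\mu_i(\xi)$$

of a finite Borel measure $\mu_i$ on the arc $\gamma(I_i)$, where $e(\theta) := e^{2\pi i \theta}$. Then the $f_i$ exhibit decoupling in $L^{n(n+1)}(w_B)$ for any ball $B$ of radius $N^n$.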

Orthogonality gives the $n=1$ case of this theorem. The bi-orthogonality type arguments sketched earlier only give decoupling in $L^p$ up to the range $2 \leq p \leq 2n$; the point here is that we can now get a much larger value of $p$. The $n=2$ case of this theorem was previously established by Bourgain and Demeter (who obtained in fact an analogous theorem for any curved hypersurface). The exponent $n(n+1)$ (and the radius $N^n$) is best possible, as can be seen by the following basic example. If

where is a bump function adapted to , then standard Fourier-analytic computations show that will be comparable to on a rectangular box of dimensions (and thus volume ) centred at the origin, and exhibit decay away from this box, with comparable to

On the other hand, is comparable to on a ball of radius comparable to centred at the origin, so is , which is just barely consistent with decoupling. This calculation shows that decoupling will fail if is replaced by any larger exponent, and also if the radius of the ball is reduced to be significantly smaller than .

This theorem has the following consequence of importance in analytic number theory:

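Corollary 2 (Vinogradov main conjecture) Let $s, n \geq 1$ be integers, and let $N \geq 1$. Then

$$ \int_{[0,1]^n} |\sum_{j=1}^N e( j x_1 + j^2 x_2 + \dots + j^n x_n)|^{2s}\ dx_1 \dots dx_n \ll_{s,n,\varepsilon} N^{s+\varepsilon} + N^{2s - \frac{n(n+1)}{2}+\varepsilon}$$

for any $\varepsilon > 0$.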

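*Proof:* By the Hölder inequality (and the trivial bound of $N$ for the exponential sum), it suffices to treat the critical case $s = n(n+1)/2$, that is to say to show that

$$ \int_{[0,1]^n} |\sum_{j=1}^N e( j x_1 + j^2 x_2 + \dots + j^n x_n)|^{n(n+1)}\ dx_1 \dots dx_n \ll_{n,\varepsilon} N^{\frac{n(n+1)}{2}+\varepsilon}.$$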

We can rescale this as

As the integrand is periodic along the lattice , this is equivalent to

The left-hand side may be bounded by , where and . Since

the claim now follows from the decoupling theorem and a brief calculation.

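Using the Plancherel formula, one may equivalently (when $s$ is an integer) write the Vinogradov main conjecture in terms of solutions $j_1,\dots,j_s,k_1,\dots,k_s \in \{1,\dots,N\}$ to the system of equations

$$ j_1^i + \dots + j_s^i = k_1^i + \dots + k_s^i \qquad \forall i=1,\dots,n,$$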

but we will not use this formulation here.

A history of the Vinogradov main conjecture may be found in this survey of Wooley; prior to the Bourgain-Demeter-Guth theorem, the conjecture was solved completely for , or for and either below or above , with the bulk of recent progress coming from the *efficient congruencing* technique of Wooley. It has numerous applications to exponential sums, Waring’s problem, and the zeta function; to give just one application, the main conjecture implies the predicted asymptotic for the number of ways to express a large number as the sum of fifth powers (the previous best result required fifth powers). The Bourgain-Demeter-Guth approach to the Vinogradov main conjecture, based on decoupling, is ostensibly very different from the efficient congruencing technique, which relies heavily on the arithmetic structure of the program, but it appears (as I have been told from second-hand sources) that the two methods are actually closely related, with the former being a sort of “Archimedean” version of the latter (with the intervals in the decoupling theorem being analogous to congruence classes in the efficient congruencing method); hopefully there will be some future work making this connection more precise. One advantage of the decoupling approach is that it generalises to non-arithmetic settings in which the set that is drawn from is replaced by some other similarly separated set of real numbers. (A random thought – could this allow the Vinogradov-Korobov bounds on the zeta function to extend to Beurling zeta functions?)

Below the fold we sketch the Bourgain-Demeter-Guth argument proving Theorem 1.

I thank Jean Bourgain and Andrew Granville for helpful discussions.

Let $\lambda$ denote the Liouville function. The prime number theorem is equivalent to the estimate

$$ \sum_{n \leq x} \lambda(n) = o(x)$$

as , that is to say that exhibits cancellation on large intervals such as . This result can be improved to give cancellation on shorter intervals. For instance, using the known zero density estimates for the Riemann zeta function, one can establish that

as if for some fixed ; I believe this result is due to Ramachandra (see also Exercise 21 of this previous blog post), and in fact one could obtain a better error term on the right-hand side that for instance gained an arbitrary power of . On the Riemann hypothesis (or the weaker density hypothesis), it was known that the could be lowered to .
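A quick numerical look at this kind of cancellation is easy to set up. The Python sketch below (an illustration only: no finite computation says anything about the asymptotic statements here) builds a table of $\lambda(n)$ from smallest prime factors, computes the summatory function $\sum_{n \leq x} \lambda(n)$, and averages $|\sum_{x \leq n < x+H} \lambda(n)|/H$ over $x \in [X, 2X)$ for a few interval lengths $H$. The choices $X = 50000$ and $H \in \{10, 100, 1000\}$, and all helper names, are arbitrary assumptions of the sketch.

```python
def liouville_table(n_max):
    """lam[n] = (-1)^Omega(n) for 1 <= n <= n_max (lam[0] = 0), via smallest prime factors."""
    spf = list(range(n_max + 1))
    for p in range(2, int(n_max ** 0.5) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, n_max + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (n_max + 1)
    if n_max >= 1:
        lam[1] = 1
    for n in range(2, n_max + 1):
        lam[n] = -lam[n // spf[n]]  # removing one prime factor flips the sign
    return lam

def prefix_sums(values):
    """out[k] = values[0] + ... + values[k - 1]."""
    out = [0]
    for v in values:
        out.append(out[-1] + v)
    return out

def mean_short_sum(prefix, X, H):
    """Average over integers x in [X, 2X) of |sum_{x <= n < x + H} lam(n)| / H."""
    total = sum(abs(prefix[x + H] - prefix[x]) for x in range(X, 2 * X))
    return total / (X * H)

X = 50_000
lam = liouville_table(2 * X + 1000)
prefix = prefix_sums(lam)  # sum_{n <= x} lam(n) is prefix[x + 1]
print("sum of lambda(n) for n <= 10^5:", prefix[100_001])
for H in (10, 100, 1000):
    print(H, mean_short_sum(prefix, X, H))
```

On this range the summatory function stays $\leq 0$ for every $x \geq 2$; Pólya’s conjecture that this persists forever first fails at $x = 906150257$.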

Early this year, there was a major breakthrough by Matomaki and Radziwill, who (among other things) showed that the asymptotic (1) was in fact valid for *any* with that went to infinity as , thus yielding cancellation on extremely short intervals. This has many further applications; for instance, this estimate, or more precisely its extension to other “non-pretentious” bounded multiplicative functions, was a key ingredient in my recent solution of the Erdös discrepancy problem, as well as in obtaining logarithmically averaged cases of Chowla’s conjecture, such as

It is of interest to twist the above estimates by phases such as the linear phase . In 1937, Davenport showed that

which of course improves the prime number theorem. Recently with Matomaki and Radziwill, we obtained a common generalisation of this estimate with (1), showing that

as , for any that went to infinity as . We were able to use this estimate to obtain an averaged form of Chowla’s conjecture.

In that paper, we asked whether one could improve this estimate further by moving the supremum inside the integral, that is to say to establish the bound

as , for any that went to infinity as . This bound is asserting that is locally Fourier-uniform on most short intervals; it can be written equivalently in terms of the “local Gowers norm” as

from which one can see that this is another averaged form of Chowla’s conjecture (stronger than the one I was able to prove with Matomaki and Radziwill, but a consequence of the unaveraged Chowla conjecture). If one inserted such a bound into the machinery I used to solve the Erdös discrepancy problem, it should lead to further averaged cases of Chowla’s conjecture, such as

though I have not fully checked the details of this implication. It should also have a number of new implications for sign patterns of the Liouville function, though we have not explored these in detail yet.

One can write (4) equivalently in the form

uniformly for all $x$-dependent phases $\alpha(x)$. In contrast, (3) is equivalent to the subcase of (6) when the linear phase coefficient $\alpha(x)$ is independent of $x$. This dependency of $\alpha$ on $x$ seems to necessitate some highly nontrivial additive combinatorial analysis of the function $x \mapsto \alpha(x)$ in order to establish (4) when $H$ is small. To date, this analysis has proven to be elusive, but I would like to record what one can do with more classical methods like Vaughan’s identity, namely:

Proposition 1The estimate (4) (or equivalently (6)) holds in the range for any fixed . (In fact one can improve the right-hand side by an arbitrary power of in this case.)

The values of in this range are far too large to yield implications such as new cases of the Chowla conjecture, but it appears that the exponent is the limit of “classical” methods (at least as far as I was able to apply them), in the sense that one does not do any combinatorial analysis on the function , nor does one use modern equidistribution results on “Type III sums” that require deep estimates on Kloosterman-type sums. The latter may shave a little bit off of the exponent, but I don’t see how one would ever hope to go below without doing some non-trivial combinatorics on the function . UPDATE: I have come across this paper of Zhan which uses mean-value theorems for L-functions to lower the exponent to .

Let me now sketch the proof of the proposition, omitting many of the technical details. We first remark that known estimates on sums of the Liouville function (or similar functions such as the von Mangoldt function) in short arithmetic progressions, based on zero-density estimates for Dirichlet -functions, can handle the “major arc” case of (4) (or (6)) where is restricted to be of the form for (the exponent here being of the same numerology as the exponent in the classical result of Ramachandra, tied to the best zero density estimates currently available); for instance a modification of the arguments in this recent paper of Koukoulopoulos would suffice. Thus we can restrict attention to “minor arc” values of (or , using the interpretation of (6)).

Next, one breaks up (or the closely related Möbius function) into Dirichlet convolutions using one of the standard identities (e.g. Vaughan’s identity or Heath-Brown’s identity), as discussed for instance in this previous post (which is focused more on the von Mangoldt function, but analogous identities exist for the Liouville and Möbius functions). The exact choice of identity is not terribly important, but the upshot is that can be decomposed into terms, each of which is either of the “Type I” form

for some coefficients that are roughly of logarithmic size on the average, and scales with and , or else of the “Type II” form

for some coefficients that are roughly of logarithmic size on the average, and scales with and . As discussed in the previous post, the exponent is a natural barrier in these identities if one is unwilling to also consider “Type III” type terms which are roughly of the shape of the third divisor function .
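The post deliberately does not commit to one identity. As one concrete instance (our choice, not necessarily the identity used in the argument), the relation $\mu * 1 = \delta$, with $\delta$ the indicator of $n = 1$, gives for any cutoff $U \geq 1$ the exact Vaughan-type identity $\mu = 2\mu_{\leq U} - \mu_{\leq U} * \mu_{\leq U} * 1 + \mu_{>U} * \mu_{>U} * 1$, where $\mu_{\leq U}$ and $\mu_{>U}$ are $\mu$ restricted to $n \leq U$ and $n > U$. The middle term has a long smooth factor $1$ (a Type I shape), while in the last term both factors $\mu_{>U}$ and $\mu_{>U} * 1$ are supported on $n > U$ (a Type II shape). The sketch below checks the identity numerically; the cutoffs and ranges are arbitrary.

```python
def mobius_table(N):
    """mu[n] for 0 <= n <= N (mu[0] is a dummy 0), by a simple sieve."""
    mu = [1] * (N + 1)
    mu[0] = 0
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if not is_prime[p]:
            continue
        for m in range(p, N + 1, p):
            if m > p:
                is_prime[m] = False
            mu[m] = -mu[m]          # one sign flip per distinct prime divisor
        for m in range(p * p, N + 1, p * p):
            mu[m] = 0               # n divisible by p^2
    return mu


def dirichlet(f, g, N):
    """Dirichlet convolution (f * g)(n) for 1 <= n <= N; lists are 1-indexed."""
    h = [0] * (N + 1)
    for d in range(1, N + 1):
        if f[d]:
            for m in range(1, N // d + 1):
                h[d * m] += f[d] * g[m]
    return h


def vaughan_identity_holds(N=400, U=8):
    """Check mu = 2 mu_le - mu_le*mu_le*1 + mu_gt*mu_gt*1 on 1..N."""
    mu = mobius_table(N)
    one = [0] + [1] * N
    mu_le = [mu[n] if n <= U else 0 for n in range(N + 1)]
    mu_gt = [mu[n] if n > U else 0 for n in range(N + 1)]
    type_I = dirichlet(dirichlet(mu_le, mu_le, N), one, N)
    type_II = dirichlet(mu_gt, dirichlet(mu_gt, one, N), N)
    return all(mu[n] == 2 * mu_le[n] - type_I[n] + type_II[n]
               for n in range(1, N + 1))
```

The identity is purely algebraic (it follows from $\mu_{>U} * 1 = \delta - \mu_{\leq U} * 1$), so it holds for every choice of $U$.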

A Type I sum makes a contribution to that can be bounded (via Cauchy-Schwarz) in terms of an expression such as

The inner sum exhibits a lot of cancellation unless is within of an integer. (Here, “a lot” should be loosely interpreted as “gaining many powers of over the trivial bound”.) Since is significantly larger than , standard Vinogradov-type manipulations (see e.g. Lemma 13 of these previous notes) show that this bad case occurs for many only when is “major arc”, which is the case we have specifically excluded. This lets us dispose of the Type I contributions.
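The cancellation in such inner sums ultimately rests on the standard geometric-series estimate $|\sum_{n=1}^N e(\alpha n)| \leq \min(N, \frac{1}{2\|\alpha\|})$, where $e(x) = e^{2\pi i x}$ and $\|\alpha\|$ is the distance from $\alpha$ to the nearest integer; it follows by summing the geometric series and using $|\sin(\pi \alpha)| \geq 2\|\alpha\|$. The sketch below checks the bound numerically for a few arbitrarily chosen frequencies; the sum is only large when $\alpha$ is close to an integer.

```python
import cmath
import math


def e(x):
    """Additive character e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * math.pi * x)


def dist_to_nearest_int(x):
    """||x||, the distance from x to the nearest integer."""
    return abs(x - round(x))


def exp_sum(alpha, N):
    """|sum_{n=1}^N e(alpha n)|, summed directly."""
    return abs(sum(e(alpha * n) for n in range(1, N + 1)))


def geometric_bound(alpha, N):
    """min(N, 1/(2||alpha||)), read as N when alpha is an integer."""
    d = dist_to_nearest_int(alpha)
    return N if d == 0 else min(N, 1 / (2 * d))


if __name__ == "__main__":
    N = 1000
    for alpha in (0.0, 0.001, 0.25, 1 / 3, math.sqrt(2) - 1):
        print(f"alpha={alpha:.4f}  |sum|={exp_sum(alpha, N):9.3f}  "
              f"bound={geometric_bound(alpha, N):9.3f}")
```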

A Type II sum makes a contribution to roughly of the form

We can break this up into a number of sums roughly of the form

for ; note that the range is non-trivial because is much larger than . Applying the usual bilinear sum Cauchy-Schwarz methods (e.g. Theorem 14 of these notes) we conclude that there is a lot of cancellation unless one has for some . But with , is well below the threshold for the definition of major arc, so we can exclude this case and obtain the required cancellation.
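The opening move of the bilinear Cauchy-Schwarz argument can be checked numerically. For coefficients $a_m, b_n$ (random real numbers here, purely for illustration) one has $|\sum_m \sum_n a_m b_n e(\alpha m n)|^2 \leq (\sum_m |a_m|^2) \sum_m |\sum_n b_n e(\alpha m n)|^2$, and opening the square turns the right-hand side into $(\sum_m |a_m|^2) \sum_{n,n'} b_n \overline{b_{n'}} \sum_m e(\alpha m (n-n'))$, whose inner sum is a geometric series bounded by $\min(M, \frac{1}{2\|\alpha(n-n')\|})$. This sketch covers only that first step, not the full statement of the cited Theorem 14.

```python
import cmath
import math
import random


def e(x):
    """Additive character e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * math.pi * x)


def bilinear_sum(a, b, alpha):
    """sum_{m,n} a_m b_n e(alpha m n), with m, n starting at 1."""
    return sum(am * bn * e(alpha * m * n)
               for m, am in enumerate(a, start=1)
               for n, bn in enumerate(b, start=1))


def cs_bound(a, b, alpha):
    """(sum_m |a_m|^2) * sum_m |sum_n b_n e(alpha m n)|^2."""
    norm_a = sum(abs(am) ** 2 for am in a)
    inner = sum(abs(sum(bn * e(alpha * m * n)
                        for n, bn in enumerate(b, start=1))) ** 2
                for m in range(1, len(a) + 1))
    return norm_a * inner


def cs_bound_expanded(a, b, alpha):
    """Same quantity after opening the square and summing over m first."""
    norm_a = sum(abs(am) ** 2 for am in a)
    M = len(a)
    total = 0j
    for n, bn in enumerate(b, start=1):
        for n2, bn2 in enumerate(b, start=1):
            # geometric series in m with frequency alpha * (n - n2)
            geometric = sum(e(alpha * m * (n - n2)) for m in range(1, M + 1))
            total += bn * bn2.conjugate() * geometric
    return norm_a * total.real
```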

A basic estimate in multiplicative number theory (particularly if one is using the Granville-Soundararajan “pretentious” approach to this subject) is the following inequality of Halasz (formulated here in a quantitative form introduced by Montgomery and Tenenbaum).

Theorem 1 (Halasz inequality). Let be a multiplicative function bounded in magnitude by , and suppose that , , and are such that

As a qualitative corollary, we conclude (by standard compactness arguments) that if

as . In this more recent paper of Granville and Soundararajan, the sharper bound

is obtained (with a more precise description of the term).
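Halasz-type inequalities in the Montgomery-Tenenbaum form take as hypothesis a lower bound, uniform in $|t| \leq T$, on the sum $\sum_{p \leq x} \frac{1 - \mathrm{Re}(f(p) p^{-it})}{p}$, which measures how far $f$ is from "pretending" to be $n^{it}$ at the primes. The sketch below evaluates this sum (only the values of $f$ at primes matter) and minimises it over a finite grid of $t$; the grid is a crude stand-in for the true infimum, and the Liouville function, with $f(p) = -1$, is only a sample input.

```python
import cmath
import math


def primes_up_to(x):
    """Primes p <= x via the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytes(len(range(p * p, x + 1, p)))
    return [p for p in range(2, x + 1) if sieve[p]]


def halasz_sum(f_at_prime, primes, t):
    """sum over the given primes of (1 - Re(f(p) p^{-it})) / p."""
    total = 0.0
    for p in primes:
        z = f_at_prime(p) * cmath.exp(-1j * t * math.log(p))
        total += (1.0 - z.real) / p
    return total


def min_halasz_sum(f_at_prime, x, T, steps=2001):
    """Minimum of halasz_sum over an evenly spaced grid of t in [-T, T]."""
    primes = primes_up_to(x)
    grid = [-T + 2 * T * k / (steps - 1) for k in range(steps)]
    return min(halasz_sum(f_at_prime, primes, t) for t in grid)


def liouville_at_prime(p):
    """The Liouville function at a prime: lambda(p) = -1."""
    return -1
```

For $f = 1$ the sum vanishes at $t = 0$, reflecting that $f$ "pretends" to be $n^{i0}$; for the Liouville function each term is $\frac{1 + \cos(t \log p)}{p} \geq 0$.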

The usual proofs of Halasz’s theorem are somewhat lengthy (though a recent simplification appears in forthcoming work of Granville, Harper, and Soundararajan). Below the fold I would like to give a relatively short proof of the following “cheap” version of the inequality, which has slightly weaker quantitative bounds but still suffices to give qualitative conclusions such as (2).

Theorem 2 (Cheap Halasz inequality). Let be a multiplicative function bounded in magnitude by . Let and , and suppose that is sufficiently large depending on . If (1) holds for all , then

The non-optimal exponent can probably be improved a bit by being more careful with the exponents, but I did not try to optimise it here. A similar bound appears in the first paper of Halasz on this topic.

The idea of the argument is to split as a Dirichlet convolution where is the portion of coming from “small”, “medium”, and “large” primes respectively (with the dividing line between the three types of primes being given by various powers of ). Using a Perron-type formula, one can express this convolution in terms of the product of the Dirichlet series of respectively at various complex numbers with . One can use based estimates to control the Dirichlet series of , while using the hypothesis (1) one can get estimates on the Dirichlet series of . (This is similar to the Fourier-analytic approach to ternary additive problems, such as Vinogradov’s theorem on representing large odd numbers as the sum of three primes.) This idea was inspired by a similar device used in the work of Granville, Harper, and Soundararajan. A variant of this argument also appears in unpublished work of Adam Harper.
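For a multiplicative function, splitting by prime size is an exact Dirichlet convolution identity: if $f_1, f_2, f_3$ agree with $f$ on integers all of whose prime factors are small, medium or large respectively (and vanish otherwise), then $f = f_1 * f_2 * f_3$, since each $n$ factors uniquely as $n_1 n_2 n_3$ with $n_i$ built from primes of the $i$-th type, and these factors are coprime. The sketch below checks this for the Liouville function; the cutoffs `P1` and `P2` are arbitrary placeholders, not the powers of $x$ used in the argument.

```python
def smallest_prime_factor(N):
    """spf[n] = smallest prime factor of n for 2 <= n <= N."""
    spf = list(range(N + 1))
    for p in range(2, int(N ** 0.5) + 1):
        if spf[p] == p:
            for m in range(p * p, N + 1, p):
                if spf[m] == m:
                    spf[m] = p
    return spf


def prime_factors(n, spf):
    """Prime factors of n, with multiplicity."""
    out = []
    while n > 1:
        out.append(spf[n])
        n //= spf[n]
    return out


def dirichlet(f, g, N):
    """Dirichlet convolution (f * g)(n) for 1 <= n <= N; lists are 1-indexed."""
    h = [0] * (N + 1)
    for d in range(1, N + 1):
        if f[d]:
            for m in range(1, N // d + 1):
                h[d * m] += f[d] * g[m]
    return h


def restrict_to_primes(f, spf, lo, hi):
    """Keep f(n) only if every prime factor p of n has lo < p <= hi."""
    return [0] + [f[n] if all(lo < p <= hi for p in prime_factors(n, spf)) else 0
                  for n in range(1, len(f))]


def split_is_exact(N=300, P1=5, P2=30):
    """Check lambda = f1 * f2 * f3 on 1..N for the Liouville function."""
    spf = smallest_prime_factor(N)
    lam = [0] + [(-1) ** len(prime_factors(n, spf)) for n in range(1, N + 1)]
    f1 = restrict_to_primes(lam, spf, 1, P1)    # small primes p <= P1
    f2 = restrict_to_primes(lam, spf, P1, P2)   # medium primes P1 < p <= P2
    f3 = restrict_to_primes(lam, spf, P2, N)    # large primes p > P2
    return dirichlet(dirichlet(f1, f2, N), f3, N)[1:] == lam[1:]
```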

I thank Andrew Granville for helpful comments which led to significant simplifications of the argument.

Kevin Ford, James Maynard, and I have uploaded to the arXiv our preprint “Chains of large gaps between primes”. This paper was announced in our previous paper with Konyagin and Green, which was concerned with the largest gap

between consecutive primes up to , in which we improved the Rankin bound of

to

for large (where we use the abbreviations , , and ). Here, we obtain an analogous result for the quantity

which measures how far apart the gaps between chains of consecutive primes can be. Our main result is

whenever is sufficiently large depending on , with the implied constant here absolute (and effective). The factor of is inherent to the method, and related to the basic probabilistic fact that if one selects numbers at random from the unit interval , then one expects the minimum gap between adjacent numbers to be about (i.e. smaller than the mean spacing of by an additional factor of ).
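The probabilistic fact quoted above has an exact form: if $k \geq 2$ points are chosen independently and uniformly from $[0,1]$, the probability that all $k-1$ gaps between adjacent points exceed $s$ is $(1 - (k-1)s)^k$ for $0 \leq s \leq \frac{1}{k-1}$, so the expected minimum adjacent gap is $\frac{1}{(k-1)(k+1)} = \frac{1}{k^2-1}$, smaller than the mean spacing $\frac{1}{k+1}$ by a factor of about $k$. A quick Monte Carlo check (seed and trial count chosen arbitrarily):

```python
import random


def min_adjacent_gap(k, rng):
    """Smallest gap between adjacent points among k uniform samples in [0, 1]."""
    pts = sorted(rng.random() for _ in range(k))
    return min(b - a for a, b in zip(pts, pts[1:]))


def mean_min_gap(k, trials=20000, seed=1):
    """Monte Carlo estimate of the expected minimum adjacent gap."""
    rng = random.Random(seed)
    return sum(min_adjacent_gap(k, rng) for _ in range(trials)) / trials


def exact_mean_min_gap(k):
    """Exact expectation 1 / (k^2 - 1) for k >= 2 i.i.d. uniform points."""
    return 1 / (k * k - 1)


if __name__ == "__main__":
    for k in (5, 10, 40):
        # simulated mean, exact mean, and mean spacing 1/(k+1)
        print(k, mean_min_gap(k), exact_mean_min_gap(k), 1 / (k + 1))
```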

Our arguments combine those from the previous paper with the matrix method of Maier, who (in our notation) showed that

for an infinite sequence of going to infinity. (Maier needed to restrict to an infinite sequence to avoid Siegel zeroes, but we are able to resolve this issue by the now standard technique of simply eliminating a prime factor of an exceptional conductor from the sieve-theoretic portion of the argument. As a byproduct, this also makes all of the estimates in our paper effective.)
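For experimenting with small numbers, the sketch below computes these gap statistics directly. Since the displayed definitions did not survive, it assumes that $G_k(X)$ is the largest value, over runs of $k+1$ consecutive primes $p_n < \dots < p_{n+k} \leq X$, of the smallest of the $k$ gaps $p_{n+1} - p_n, \dots, p_{n+k} - p_{n+k-1}$; with $k = 1$ this reduces to the largest prime gap up to $X$. Small values of $X$ of course say nothing about the asymptotic bounds.

```python
import math


def primes_up_to(x):
    """Primes p <= x via the sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytes(len(range(p * p, x + 1, p)))
    return [p for p in range(2, x + 1) if sieve[p]]


def chain_gap(X, k=1):
    """Assumed reading of G_k(X): max over p_{n+k} <= X of
    min(p_{n+1} - p_n, ..., p_{n+k} - p_{n+k-1}); k = 1 is the largest gap."""
    ps = primes_up_to(X)
    gaps = [b - a for a, b in zip(ps, ps[1:])]
    return max((min(gaps[i:i + k]) for i in range(len(gaps) - k + 1)),
               default=0)
```

For instance, `chain_gap(1000)` returns 20 (the gap from 887 to 907), while `chain_gap(100, 2)` returns 6 (from the chain 47, 53, 59).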

As its name suggests, the Maier matrix method is usually presented by imagining a matrix of numbers, and using information about the distribution of primes in the columns of this matrix to deduce information about the primes in at least one of the rows of the matrix. We found it convenient to interpret this method in an equivalent probabilistic form as follows. Suppose one wants to find an interval which contains a block of at least primes, each separated from each other by at least (ultimately, will be something like and something like ). One can do this by the probabilistic method: pick to be a random large natural number (with the precise distribution to be chosen later), and try to lower bound the probability that the interval contains at least primes, no two of which are within of each other.

By carefully choosing the residue class of with respect to small primes, one can eliminate several of the from consideration of being prime immediately. For instance, if is chosen to be large and even, then the with even have no chance of being prime and can thus be eliminated; similarly if is large and odd, then cannot be prime for any odd . Using the methods of our previous paper, we can find a residue class (where is a product of a large number of primes) such that, if one chooses to be a large random element of (that is, for some large random integer ), then the set of shifts for which still has a chance of being prime has size comparable to something like ; furthermore this set is fairly well distributed in in the sense that it does not concentrate too strongly in any short subinterval of . The main new difficulty, not present in the previous paper, is to get *lower* bounds on the size of in addition to upper bounds, but this turns out to be achievable by a suitable modification of the arguments.
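A toy version of the elimination mechanism: fixing $n$ modulo a few small primes forces $n+h$ to be divisible by one of them for many shifts $h$, and the Chinese remainder theorem packages these choices into a single residue class $n \equiv b \pmod W$. In the sketch below, the dictionary `residues` (a hypothetical input mapping each prime $p$ to the class of $n$ mod $p$) stands in for that choice; the actual construction uses a very large set of primes and far more delicate choices of residues.

```python
def surviving_shifts(residues, y):
    """Shifts h in [1, y] such that, when n = a (mod p) for each (p, a) in
    `residues`, no listed prime p is forced to divide n + h."""
    return [h for h in range(1, y + 1)
            if all((a + h) % p != 0 for p, a in residues.items())]


def crt(residues):
    """Combine n = a (mod p) for distinct primes p into n = b (mod W)."""
    b, W = 0, 1
    for p, a in residues.items():
        # choose t with b + W*t = a (mod p); W is invertible mod p
        t = ((a - b) * pow(W, -1, p)) % p
        b, W = b + W * t, W * p
    return b % W, W


if __name__ == "__main__":
    residues = {2: 1, 3: 2, 5: 3}
    b, W = crt(residues)                    # (23, 30)
    n = b + W * 1000                        # a large element of the class
    print(n, surviving_shifts(residues, 20))  # 30023 [6, 8, 14, 18, 20]
```

With `residues = {2: 0}` (that is, $n$ even) only the odd shifts survive, matching the first example in the text. Note that `pow` with exponent `-1` and a modulus requires Python 3.8 or later.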

Using a version of the prime number theorem in arithmetic progressions due to Gallagher, one can show that for each remaining shift , is going to be prime with probability comparable to , so one expects about primes in the set . An upper bound sieve (e.g. the Selberg sieve) also shows that for any distinct , the probability that and are both prime is . Using this and some routine second moment calculations, one can then show that with large probability, the set will indeed contain about primes, no two of which are closer than to each other; with no other numbers in this interval being prime, this gives a lower bound on .
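The shape of this final step can be imitated in a crude random model, which is an assumption made purely for illustration and not the model used in the paper: let each surviving shift $h$ independently yield a prime with probability $q$. With $X$ the number of primes and $\mu = q|H|$, Chebyshev's inequality gives $\mathbb{P}(X < \mu/2) \leq \mathrm{Var}(X)/(\mu/2)^2$, and Markov's inequality bounds the probability of two primes within $D$ of each other by $q^2$ times the number of close pairs of shifts. In the actual argument, Gallagher's theorem and the Selberg sieve supply the analogous one-point and two-point estimates. The parameters `H`, `q`, `D` below are hypothetical.

```python
import random


def close_pairs(H, D):
    """Number of pairs h < h' in the sorted list H with h' - h < D (D > 0)."""
    count, j = 0, 0
    for i, h in enumerate(H):
        while h - H[j] >= D:
            j += 1
        count += i - j          # H[j..i-1] all lie within D of h
    return count


def failure_bound(H, q, D):
    """Chebyshev bound for X < mu/2 plus Markov bound for a close pair."""
    mu = q * len(H)
    variance = len(H) * q * (1 - q)          # independent Bernoulli model
    chebyshev = variance / (mu / 2) ** 2
    markov = q * q * close_pairs(H, D)       # expected number of close pairs
    return chebyshev + markov


def success_rate(H, q, D, trials=1000, seed=0):
    """Fraction of trials with >= mu/2 'primes', no two closer than D."""
    rng = random.Random(seed)
    mu = q * len(H)
    wins = 0
    for _ in range(trials):
        P = [h for h in H if rng.random() < q]
        if len(P) >= mu / 2 and all(b - a >= D for a, b in zip(P, P[1:])):
            wins += 1
    return wins / trials
```

The union bound `failure_bound` is deliberately wasteful; its role, as in the text, is only to show that the good event has probability bounded away from zero.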
