
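Let $\lambda$ be the Liouville function, thus $\lambda(n)$ is defined to equal $+1$ when $n$ is the product of an even number of primes, and $-1$ when $n$ is the product of an odd number of primes. The Chowla conjecture asserts that $\lambda$ has the statistics of a random sign pattern, in the sense that

$$\lim_{N \to \infty} \mathbb{E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) = 0 \qquad (1)$$

for all $k \geq 1$ and all distinct natural numbers $h_1,\dots,h_k$, where we use the averaging notation

$$\mathbb{E}_{n \leq N} f(n) := \frac{1}{N} \sum_{n \leq N} f(n).$$

For $k=1$, this conjecture is equivalent to the prime number theorem (as discussed in this previous blog post), but the conjecture remains open for any $k \geq 2$.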

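In recent years, it has been realised that one can make more progress on this conjecture if one works instead with the logarithmically averaged version

$$\lim_{N \to \infty} \mathbb{E}^{\log}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) = 0 \qquad (2)$$

of the conjecture, where we use the logarithmic averaging notation

$$\mathbb{E}^{\log}_{n \leq N} f(n) := \frac{\sum_{n \leq N} \frac{f(n)}{n}}{\sum_{n \leq N} \frac{1}{n}}.$$

Using the summation by parts (or telescoping series) identity

$$\sum_{n \leq N} \frac{f(n)}{n} = \sum_{M < N} \frac{1}{M(M+1)} \Big( \sum_{n \leq M} f(n) \Big) + \frac{1}{N} \sum_{n \leq N} f(n) \qquad (3)$$

it is not difficult to show that the Chowla conjecture (1) for a given $k, h_1, \dots, h_k$ implies the logarithmically averaged conjecture (2). However, the converse implication is not at all clear. For instance, for $k=1$, we have already mentioned that the Chowla conjecture

$$\lim_{N \to \infty} \mathbb{E}_{n \leq N} \lambda(n) = 0$$

is equivalent to the prime number theorem; but the logarithmically averaged analogue

$$\lim_{N \to \infty} \mathbb{E}^{\log}_{n \leq N} \lambda(n) = 0$$

is significantly easier to show (a proof with the Liouville function replaced by the closely related Möbius function is given in this previous blog post). And indeed, significantly more is now known for the logarithmically averaged Chowla conjecture; in this paper of mine I had proven (2) for $k=2$, and in this recent paper with Joni Teräväinen, we proved the conjecture for all odd $k$ (with a different proof also given here).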
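As a purely numerical illustration (it proves nothing about the limits), here is a minimal Python sketch that tabulates the Liouville function, evaluates the plain average $\frac{1}{N}\sum_{n \leq N} f(n)$ and the logarithmic average $\sum_{n \leq N} \frac{f(n)}{n} / \sum_{n \leq N} \frac{1}{n}$ of a correlation $f(n) = \lambda(n+h_1)\dots\lambda(n+h_k)$, and checks the summation by parts identity $\sum_{n \leq N} \frac{f(n)}{n} = \sum_{M < N} \frac{1}{M(M+1)} \sum_{n \leq M} f(n) + \frac{1}{N}\sum_{n \leq N} f(n)$ in exact rational arithmetic. The helper names (`liouville_up_to`, `correlation`, `plain_average`, `log_average`, `summation_by_parts_rhs`) and the choice of shifts are assumptions made for this example only.

```python
from fractions import Fraction
from math import isqrt

def liouville_up_to(n_max):
    """Return lam with lam[n] = Liouville function at n for 1 <= n <= n_max (lam[0] unused)."""
    spf = list(range(n_max + 1))              # spf[n] will hold the smallest prime factor of n
    for p in range(2, isqrt(n_max) + 1):
        if spf[p] == p:                       # p is prime
            for m in range(p * p, n_max + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (n_max + 1)
    if n_max >= 1:
        lam[1] = 1
    for n in range(2, n_max + 1):
        lam[n] = -lam[n // spf[n]]            # removing one prime factor flips the sign
    return lam

def correlation(lam, shifts):
    """Return the function n -> lam(n+h_1) ... lam(n+h_k)."""
    def f(n):
        value = 1
        for h in shifts:
            value *= lam[n + h]
        return value
    return f

def plain_average(f, N):
    return Fraction(sum(f(n) for n in range(1, N + 1)), N)

def log_average(f, N):
    numerator = sum(Fraction(f(n), n) for n in range(1, N + 1))
    denominator = sum(Fraction(1, n) for n in range(1, N + 1))
    return numerator / denominator

def summation_by_parts_rhs(f, N):
    """Right-hand side of the summation by parts identity, computed exactly."""
    partial, total = 0, Fraction(0)
    for M in range(1, N):
        partial += f(M)                       # partial = sum_{n <= M} f(n)
        total += Fraction(partial, M * (M + 1))
    partial += f(N)
    return total + Fraction(partial, N)

if __name__ == "__main__":
    N, shifts = 1000, (0, 1)                  # illustrative choices only
    lam = liouville_up_to(N + max(shifts))
    f = correlation(lam, shifts)
    print("plain average:", float(plain_average(f, N)))
    print("log average:  ", float(log_average(f, N)))
    lhs = sum(Fraction(f(n), n) for n in range(1, N + 1))
    print("identity holds exactly:", lhs == summation_by_parts_rhs(f, N))
```

Exact fractions are used so that the identity check is an equality rather than a floating point comparison; for large $N$ one would switch to floats.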

In view of this emerging consensus that the logarithmically averaged Chowla conjecture was easier than the ordinary Chowla conjecture, it was thus somewhat of a surprise for me to read a recent paper of Gomilko, Kwietniak, and Lemanczyk who (among other things) established the following statement:

Theorem 1. Assume that the logarithmically averaged Chowla conjecture (2) is true for all $k$. Then there exists a sequence $N_i$ going to infinity such that the Chowla conjecture (1) is true for all $k$ along that sequence, that is to say

$$\lim_{i \to \infty} \mathbb{E}_{n \leq N_i} \lambda(n+h_1) \dots \lambda(n+h_k) = 0$$

for all $k$ and all distinct $h_1,\dots,h_k$.

This implication does not use any special properties of the Liouville function (other than that it is bounded), and in fact proceeds by ergodic theoretic methods, focusing in particular on the ergodic decomposition of invariant measures of a shift into ergodic measures. Ergodic methods have proven remarkably fruitful in understanding these sorts of number theoretic and combinatorial problems, as could already be seen by the ergodic theoretic proof of Szemerédi’s theorem by Furstenberg, and more recently by the work of Frantzikinakis and Host on Sarnak’s conjecture. (My first paper with Teräväinen also uses ergodic theory tools.) Indeed, many other results in the subject were first discovered using ergodic theory methods.

On the other hand, many results in this subject that were first proven ergodic theoretically have since been reproven by more combinatorial means; my second paper with Teräväinen is an instance of this. As it turns out, one can also prove Theorem 1 by a standard combinatorial (or probabilistic) technique known as the second moment method. In fact, one can prove slightly more:

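Theorem 2. Let $k$ be a natural number. Assume that the logarithmically averaged Chowla conjecture (2) is true for $2k$. Then there exists a set ${\mathcal N}$ of natural numbers of logarithmic density $1$ (that is, $\lim_{N \to \infty} \mathbb{E}^{\log}_{n \leq N} 1_{\mathcal N}(n) = 1$) such that

$$\lim_{N \to \infty;\ N \in {\mathcal N}} \mathbb{E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) = 0$$

for any distinct $h_1,\dots,h_k$.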

It is not difficult to deduce Theorem 1 from Theorem 2 using a diagonalisation argument. Unfortunately, the known cases of the logarithmically averaged Chowla conjecture ($k=2$ and odd $k$) are currently insufficient to use Theorem 2 for any purpose other than to reprove what is already known to be true from the prime number theorem. (Indeed, the even cases of Chowla, in either logarithmically averaged or non-logarithmically averaged forms, seem to be far more powerful than the odd cases; see Remark 1.7 of this paper of myself and Teräväinen for a related observation in this direction.)

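We now sketch the proof of Theorem 2. For any distinct $h_1,\dots,h_k$, we take a large number $H$ and consider the limiting second moment

$$\limsup_{N \to \infty} \mathbb{E}^{\log}_{n \leq N} \left| \mathbb{E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k) \right|^2.$$

We can expand this as

$$\limsup_{N \to \infty} \mathbb{E}_{m,m' \leq H} \mathbb{E}^{\log}_{n \leq N} \lambda(n+m+h_1) \dots \lambda(n+m+h_k) \lambda(n+m'+h_1) \dots \lambda(n+m'+h_k).$$

If all the $m+h_1,\dots,m+h_k,m'+h_1,\dots,m'+h_k$ are distinct, the hypothesis (2) tells us that the inner average goes to zero as $N \to \infty$. The remaining averages are $O(1)$, and there are $O(k^2 H)$ of these averages (out of the $H^2$ pairs $(m,m')$). We conclude that

$$\limsup_{N \to \infty} \mathbb{E}^{\log}_{n \leq N} \left| \mathbb{E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k) \right|^2 \ll k^2/H.$$

By Markov’s inequality (and (3)), we conclude that for any fixed $h_1,\dots,h_k,H$, there exists a set ${\mathcal N}_{h_1,\dots,h_k,H}$ of upper logarithmic density at least $1 - k/H^{1/2}$, thus

$$\limsup_{N \to \infty} \mathbb{E}^{\log}_{n \leq N} 1_{{\mathcal N}_{h_1,\dots,h_k,H}}(n) \geq 1 - k/H^{1/2},$$

such that

$$\mathbb{E}_{n \leq N} \left| \mathbb{E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k) \right|^2 \ll k/H^{1/2}$$

for all $N \in {\mathcal N}_{h_1,\dots,h_k,H}$. By deleting at most finitely many elements, we may assume that ${\mathcal N}_{h_1,\dots,h_k,H}$ consists only of elements of size at least $H^2$ (say).

For any $H_0$, if we let ${\mathcal N}_{h_1,\dots,h_k,\geq H_0}$ be the union of ${\mathcal N}_{h_1,\dots,h_k,H}$ for $H \geq H_0$, then ${\mathcal N}_{h_1,\dots,h_k,\geq H_0}$ has logarithmic density $1$. By a diagonalisation argument (using the fact that the set of tuples $(h_1,\dots,h_k,H_0)$ is countable), we can then find a set ${\mathcal N}$ of natural numbers of logarithmic density $1$, such that for every $h_1,\dots,h_k,H_0$, every sufficiently large element of ${\mathcal N}$ lies in ${\mathcal N}_{h_1,\dots,h_k,\geq H_0}$. Thus for every sufficiently large $N$ in ${\mathcal N}$, one has

$$\mathbb{E}_{n \leq N} \left| \mathbb{E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k) \right|^2 \ll k/H^{1/2}$$

for some $H \geq H_0$ with $N \geq H^2$. By Cauchy-Schwarz, this implies that

$$\mathbb{E}_{n \leq N} \mathbb{E}_{m \leq H} \lambda(n+m+h_1) \dots \lambda(n+m+h_k) \ll k^{1/2}/H^{1/4};$$

interchanging the sums and using $N \geq H^2$ and $H \geq H_0$, this implies that

$$\mathbb{E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) \ll k^{1/2}/H^{1/4} \leq k^{1/2}/H_0^{1/4}.$$

We conclude on taking $H_0$ to infinity that

$$\lim_{N \to \infty;\ N \in {\mathcal N}} \mathbb{E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) = 0$$

as required.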
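The counting step in the second moment expansion can be checked by brute force. The sketch below (an illustration under assumed names, not taken from the papers discussed) counts the pairs $(m,m')$ with $1 \leq m,m' \leq H$ for which the $2k$ shifts $m+h_i$, $m'+h_j$ fail to be distinct, which happens exactly when $m - m'$ is a difference $h_j - h_i$, and compares the count with the bound $k^2 H$. The function name `count_colliding_pairs` and the sample shifts are hypothetical.

```python
def count_colliding_pairs(shifts, H):
    """Count pairs (m, m') in [1, H]^2 with m + h_i = m' + h_j for some i, j.

    `shifts` is assumed to be a tuple of distinct integers h_1, ..., h_k.
    """
    differences = {hj - hi for hi in shifts for hj in shifts}   # always contains 0
    return sum(
        1
        for m in range(1, H + 1)
        for m_prime in range(1, H + 1)
        if (m - m_prime) in differences
    )

if __name__ == "__main__":
    shifts, H = (0, 1, 3), 50                 # illustrative choices only
    k = len(shifts)
    bad = count_colliding_pairs(shifts, H)
    print("colliding pairs:", bad)
    print("bound k^2 * H:  ", k * k * H)
    print("fraction of all H^2 pairs:", bad / H**2)
```

Each admissible difference contributes at most $H$ pairs and there are at most $k^2$ differences, which is where the $O(k^2 H)$ count, and hence the $k^2/H$ saving after dividing by $H^2$, comes from.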

Joni Teräväinen and I have just uploaded to the arXiv our paper “Odd order cases of the logarithmically averaged Chowla conjecture”, submitted to J. Numb. Thy. Bordeaux. This paper gives an alternate route to one of the main results of our previous paper, and more specifically reproves the asymptotic

$$\mathbb{E}^{\log}_{n \leq x} \lambda(n+h_1) \dots \lambda(n+h_k) = o(1) \qquad (1)$$

for all odd $k$ and all integers $h_1,\dots,h_k$ (that is to say, all the odd order cases of the logarithmically averaged Chowla conjecture). Our previous argument relies heavily on some deep ergodic theory results of Bergelson-Host-Kra, Leibman, and Le (and was applicable to more general multiplicative functions than the Liouville function $\lambda$); here we give a shorter proof that avoids ergodic theory (but instead requires the Gowers uniformity of the ($W$-tricked) von Mangoldt function, established in several papers of Ben Green, Tamar Ziegler, and myself). The proof follows the lines sketched in the previous blog post. In principle, due to the avoidance of ergodic theory, the arguments here have a greater chance to be made quantitative; however, at present the known bounds on the Gowers uniformity of the von Mangoldt function are qualitative, except at the $U^2$ level, which is unfortunate since the first non-trivial odd case $k=3$ requires quantitative control on the $U^4$ level. (But it may be possible to make the Gowers uniformity bounds for $U^4$ quantitative if one assumes GRH, although when one puts everything together, the actual decay rate obtained in (1) is likely to be poor.)

Joni Teräväinen and I have just uploaded to the arXiv our paper “The structure of logarithmically averaged correlations of multiplicative functions, with applications to the Chowla and Elliott conjectures”, submitted to Duke Mathematical Journal. This paper builds upon my previous paper in which I introduced an “entropy decrement method” to prove the two-point (logarithmically averaged) cases of the Chowla and Elliott conjectures. A bit more specifically, I showed that

whenever were sequences going to infinity, were distinct integers, and were -bounded multiplicative functions which were *non-pretentious* in the sense that

for all Dirichlet characters and for . Thus, for instance, one had the logarithmically averaged two-point Chowla conjecture

for any fixed non-zero $h$, where $\lambda$ was the Liouville function.

One would certainly like to extend these results to higher order correlations than the two-point correlations. This looks to be difficult (though perhaps not completely impossible if one allows for logarithmic averaging): in a previous paper I showed that achieving this in the context of the Liouville function would be equivalent to resolving the logarithmically averaged Sarnak conjecture, as well as establishing logarithmically averaged local Gowers uniformity of the Liouville function. However, in this paper we are able to avoid having to resolve these difficult conjectures to obtain partial results towards the (logarithmically averaged) Chowla and Elliott conjectures. For the Chowla conjecture, we can obtain all odd order correlations, in that

for all odd $k$ and all integers $h_1,\dots,h_k$ (which, in the odd order case, are no longer required to be distinct). (Superficially, this looks like we have resolved “half of the cases” of the logarithmically averaged Chowla conjecture; but it seems the odd order correlations are significantly easier than the even order ones. For instance, because of the Katai-Bourgain-Sarnak-Ziegler criterion, one can basically deduce the odd order cases of (2) from the even order cases, after allowing for some dilations in the argument.)

For the more general Elliott conjecture, we can show that

for any , any integers and any bounded multiplicative functions , unless the product *weakly pretends to be a Dirichlet character * in the sense that

This can be seen to imply (2) as a special case. Even when *does* pretend to be a Dirichlet character , we can still say something: if the limits

exist for each (which can be guaranteed if we pass to a suitable subsequence), then is the uniform limit of periodic functions , each of which is –isotypic in the sense that whenever are integers with coprime to the periods of and . This does not pin down the value of any single correlation , but does put significant constraints on how these correlations may vary with .

Among other things, this allows us to show that all possible length four sign patterns of the Liouville function occur with positive density, and all possible length four sign patterns occur with the conjectured logarithmic density. (In a previous paper with Matomaki and Radziwill, we obtained comparable results for length three patterns of Liouville and length two patterns of Möbius.)

To describe the argument, let us focus for simplicity on the case of the Liouville correlations

assuming for sake of discussion that all limits exist. (In the paper, we instead use the device of generalised limits, as discussed in this previous post.) The idea is to combine together two rather different ways to control this function . The first proceeds by the entropy decrement method mentioned earlier, which roughly speaking works as follows. Firstly, we pick a prime and observe that for any , which allows us to rewrite (3) as

Making the change of variables , we obtain

The difference between and is negligible in the limit (here is where we crucially rely on the log-averaging), hence

and thus by (3) we have

The entropy decrement argument can be used to show that the latter limit is small for most $p$ (roughly speaking, this is because the factors behave like independent random variables as $p$ varies, so that concentration of measure results such as Hoeffding’s inequality can apply, after using entropy inequalities to decouple somewhat these random variables from the factors). We thus obtain the approximate isotypy property

On the other hand, by the Furstenberg correspondence principle (as discussed in these previous posts), it is possible to express as a multiple correlation

for some probability space equipped with a measure-preserving invertible map . Using results of Bergelson-Host-Kra, Leibman, and Le, this allows us to obtain a decomposition of the form

where is a nilsequence, and goes to zero in density (even along the primes, or constant multiples of the primes). The original work of Bergelson-Host-Kra required ergodicity on , which is very definitely a hypothesis that is not available here; however, the later work of Leibman removed this hypothesis, and the work of Le refined the control on so that one still has good control when restricting to primes, or constant multiples of primes.

Ignoring the small error , we can now combine (5) to conclude that

Using the equidistribution theory of nilsequences (as developed in this previous paper of Ben Green and myself), one can break up further into a periodic piece and an “irrational” or “minor arc” piece . The contribution of the minor arc piece can be shown to mostly cancel itself out after dilating by primes and averaging, thanks to Vinogradov-type bilinear sum estimates (transferred to the primes). So we end up with

which already shows (heuristically, at least) the claim that can be approximated by periodic functions which are isotypic in the sense that

But if $k$ is odd, one can use Dirichlet’s theorem on primes in arithmetic progressions to restrict to primes $p$ that are $1$ modulo the period of , and conclude now that vanishes identically, which (heuristically, at least) gives (2).

The same sort of argument works to give the more general bounds on correlations of bounded multiplicative functions. But for the specific task of proving (2), we initially used a slightly different argument that avoids using the ergodic theory machinery of Bergelson-Host-Kra, Leibman, and Le, but replaces it instead with the Gowers uniformity norm theory used to count linear equations in primes. Basically, by averaging (4) in using the “$W$-trick”, as well as known facts about the Gowers uniformity of the von Mangoldt function, one can obtain an approximation of the form

where ranges over a large range of integers coprime to some primorial . On the other hand, by iterating (4) we have

for most semiprimes , and by again averaging over semiprimes one can obtain an approximation of the form

For odd, one can combine the two approximations to conclude that . (This argument is not given in the current paper, but we plan to detail it in a subsequent one.)

Kaisa Matomaki, Maksym Radziwill, and I have uploaded to the arXiv our paper “Correlations of the von Mangoldt and higher divisor functions I. Long shift ranges”, submitted to Proceedings of the London Mathematical Society. This paper is concerned with the estimation of correlations such as

$$\sum_{X < n \leq 2X} \Lambda(n) \Lambda(n+h) \qquad (1)$$

for medium-sized $h$ and large $X$, where $\Lambda$ is the von Mangoldt function; we also consider variants of this sum in which one of the von Mangoldt functions is replaced with a (higher order) divisor function, but for sake of discussion let us focus just on the sum (1). Understanding this sum is very closely related to the problem of finding pairs of primes that differ by $h$; for instance, if one could establish a lower bound

$$\sum_{X < n \leq 2X} \Lambda(n) \Lambda(n+2) \gg X$$

then this would easily imply the twin prime conjecture.

The (first) Hardy-Littlewood conjecture asserts an asymptotic

$$\sum_{X < n \leq 2X} \Lambda(n) \Lambda(n+h) = {\mathfrak S}(h) X + o(X) \qquad (2)$$

as $X \to \infty$ for any fixed positive $h$, where the *singular series* ${\mathfrak S}(h)$ is an arithmetic factor arising from the irregularity of distribution of $\Lambda$ at small moduli, defined explicitly by

$${\mathfrak S}(h) := 2 \Pi_2 \prod_{p | h;\ p > 2} \frac{p-1}{p-2}$$

when $h$ is even, and ${\mathfrak S}(h) := 0$ when $h$ is odd, where

$$\Pi_2 := \prod_{p > 2} \left(1 - \frac{1}{(p-1)^2}\right) = 0.66016\dots$$

is (half of) the twin prime constant. See for instance this previous blog post for a heuristic explanation of this conjecture. From the previous discussion we see that (2) for $h=2$ would imply the twin prime conjecture. Sieve theoretic methods are only able to provide an upper bound of the form $O({\mathfrak S}(h) X)$.
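For readers who want to see the singular series numerically, here is a small Python sketch. It truncates the product for (half of) the twin prime constant $\prod_{p>2} (1 - \frac{1}{(p-1)^2})$ at an assumed cutoff, evaluates the singular series (twice that constant times $\prod_{p | h,\ p > 2} \frac{p-1}{p-2}$ for even $h$, and $0$ for odd $h$), and compares the prediction with a brute-force value of $\sum_{X < n \leq 2X} \Lambda(n)\Lambda(n+h)$. All function names, the cutoff `p_max`, and the sample values of $X$ and $h$ are choices made for this illustration; the agreement at small $X$ is only suggestive.

```python
from math import isqrt, log

def primes_up_to(n):
    """Sieve of Eratosthenes; assumes n >= 2."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, isqrt(n) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [p for p in range(n + 1) if sieve[p]]

def half_twin_prime_constant(p_max=10**5):
    """Truncation of prod_{p > 2} (1 - 1/(p-1)^2) to primes p <= p_max."""
    product = 1.0
    for p in primes_up_to(p_max):
        if p > 2:
            product *= 1.0 - 1.0 / (p - 1) ** 2
    return product

def singular_series(h, p_max=10**5):
    """Singular series for a positive integer shift h (zero when h is odd)."""
    if h % 2 == 1:
        return 0.0
    value = 2.0 * half_twin_prime_constant(p_max)
    m = h
    while m % 2 == 0:
        m //= 2
    p = 3
    while p * p <= m:                         # trial division over odd primes dividing h
        if m % p == 0:
            value *= (p - 1) / (p - 2)
            while m % p == 0:
                m //= p
        p += 2
    if m > 1:                                 # leftover odd prime factor
        value *= (m - 1) / (m - 2)
    return value

def von_mangoldt_up_to(n):
    """Table with entry log p at prime powers p^j <= n and 0 elsewhere."""
    table = [0.0] * (n + 1)
    for p in primes_up_to(n):
        q = p
        while q <= n:
            table[q] = log(p)
            q *= p
    return table

def correlation_sum(h, X):
    """Brute-force value of sum_{X < n <= 2X} Lambda(n) Lambda(n+h)."""
    table = von_mangoldt_up_to(2 * X + h)
    return sum(table[n] * table[n + h] for n in range(X + 1, 2 * X + 1))

if __name__ == "__main__":
    X = 100_000                               # illustrative size only
    for h in (1, 2, 6, 30):
        print(h, correlation_sum(h, X), singular_series(h) * X)
```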

Needless to say, apart from the trivial case of odd $h$, there are no values of $h$ for which the Hardy-Littlewood conjecture is known. However there are some results that say that this conjecture holds “on the average”: in particular, if $H$ is a quantity depending on $X$ that is somewhat large, there are results that show that (2) holds for most (i.e. for $(1-o(1)) H$) of the $h$ between $0$ and $H$. Ideally one would like to get $H$ as small as possible, in particular one can view the full Hardy-Littlewood conjecture as the endpoint case when $H$ is bounded.

The first results in this direction were by van der Corput and by Lavrik, who established such a result with $H = X$ (with a subsequent refinement by Balog); Wolke lowered $H$ to $X^{5/8+\varepsilon}$, and Mikawa lowered $H$ further to $X^{1/3+\varepsilon}$. The main result of this paper is a further lowering of $H$ to $X^{8/33+\varepsilon}$. In fact (as in the preceding works) we get a better error term than $o(X)$, namely an error of the shape $O_A(X \log^{-A} X)$ for any $A$.

Our arguments initially proceed along standard lines. One can use the Hardy-Littlewood circle method to express the correlation in (2) as an integral involving exponential sums . The contribution of “major arc” is known by a standard computation to recover the main term plus acceptable errors, so it is a matter of controlling the “minor arcs”. After averaging in and using the Plancherel identity, one is basically faced with establishing a bound of the form

for any “minor arc” . If is somewhat close to a low height rational (specifically, if it is within of such a rational with ), then this type of estimate is roughly of comparable strength (by another application of Plancherel) to the best available prime number theorem in short intervals on the average, namely that the prime number theorem holds for most intervals of the form , and we can handle this case using standard mean value theorems for Dirichlet series. So we can restrict attention to the “strongly minor arc” case where is far from such rationals.

The next step (following some ideas we found in a paper of Zhan) is to rewrite this estimate not in terms of the exponential sums , but rather in terms of the Dirichlet polynomial . After a certain amount of computation (including some oscillatory integral estimates arising from stationary phase), one is eventually reduced to the task of establishing an estimate of the form

for any (with sufficiently large depending on ).

The next step, which is again standard, is the use of the Heath-Brown identity (as discussed for instance in this previous blog post) to split up $\Lambda$ into a number of components that have a Dirichlet convolution structure. Because the exponent $8/33$ we are shooting for is less than $1/4$, we end up with five types of components that arise, which we call “Type $d_1$”, “Type $d_2$”, “Type $d_3$”, “Type $d_4$”, and “Type II”. The “Type II” sums are Dirichlet convolutions involving a factor supported on a certain range, and are quite easy to deal with; the “Type $d_j$” terms are Dirichlet convolutions that resemble (non-degenerate portions of) the $j^{th}$ divisor function, formed from convolving together $j$ portions of $1$. The “Type $d_1$” and “Type $d_2$” terms can be estimated satisfactorily by standard moment estimates for Dirichlet polynomials; this already recovers the result of Mikawa (and our argument is in fact slightly more elementary in that no Kloosterman sum estimates are required). It is the treatment of the “Type $d_3$” and “Type $d_4$” sums that requires some new analysis, with the Type $d_3$ terms turning out to be the most delicate. After using an existing moment estimate of Jutila for Dirichlet L-functions, matters reduce to obtaining a family of estimates, a typical one of which (relating to the more difficult Type $d_3$ sums) is of the form

for “typical” ordinates of size , where is the Dirichlet polynomial (a fragment of the Riemann zeta function). The precise definition of “typical” is a little technical (because of the complicated nature of Jutila’s estimate) and will not be detailed here. Such a claim would follow easily from the Lindelof hypothesis (which would imply that ) but of course we would like to have an unconditional result.

At this point, having exhausted all the Dirichlet polynomial estimates that are usefully available, we return to “physical space”. Using some further Fourier-analytic and oscillatory integral computations, we can estimate the left-hand side of (3) by an expression that is roughly of the shape

The phase can be Taylor expanded as the sum of and a lower order term , plus negligible errors. If we could discard the lower order term then we would get quite a good bound using the exponential sum estimates of Robert and Sargos, which control averages of exponential sums with purely monomial phases, with the averaging allowing us to exploit the hypothesis that is “typical”. Figuring out how to get rid of this lower order term caused some inefficiency in our arguments; the best we could do (after much experimentation) was to use Fourier analysis to shorten the sums, estimate a one-parameter average exponential sum with a binomial phase by a two-parameter average with a monomial phase, and then use the van der Corput process followed by the estimates of Robert and Sargos. This rather complicated procedure works up to $H = X^{8/33+\varepsilon}$; it may be possible that some alternate way to proceed here could improve the exponent somewhat.

In a sequel to this paper, we will use a somewhat different method to reduce to a much smaller value of , but only if we replace the correlations by either or , and also we now only save a in the error term rather than .

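Given a function $f: \mathbb{N} \to \{-1,+1\}$ on the natural numbers taking values in $\{-1,+1\}$, one can invoke the Furstenberg correspondence principle to locate a measure preserving system $T \circlearrowright (X,\mu)$ – a probability space $(X,\mu)$ together with a measure-preserving shift $T: X \to X$ (or equivalently, a measure-preserving $\mathbb{Z}$-action on $(X,\mu)$) – together with a measurable function (or “observable”) $F: X \to \{-1,+1\}$ that has essentially the same statistics as $f$ in the sense that

$$\liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^N f(n+h_1) \dots f(n+h_k) \leq \int_X F(T^{h_1} x) \dots F(T^{h_k} x)\ d\mu(x) \leq \limsup_{N \to \infty} \frac{1}{N} \sum_{n=1}^N f(n+h_1) \dots f(n+h_k)$$

for any integers $h_1,\dots,h_k$. In particular, one has

$$\int_X F(T^{h_1} x) \dots F(T^{h_k} x)\ d\mu(x) = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N f(n+h_1) \dots f(n+h_k) \qquad (1)$$

whenever the limit on the right-hand side exists. We will refer to the system $T \circlearrowright (X,\mu)$ together with the designated function $F$ as a *Furstenberg limit* of the sequence $f$. These Furstenberg limits capture some, but not all, of the asymptotic behaviour of $f$; roughly speaking, they control the typical “local” behaviour of $f$, involving correlations such as $\frac{1}{N} \sum_{n=1}^N f(n+h_1) \dots f(n+h_k)$ in the regime where $h_1,\dots,h_k$ are much smaller than $N$. However, the control on error terms here is usually only qualitative at best, and one usually does not obtain non-trivial control on correlations in which the $h_1,\dots,h_k$ are allowed to grow at some significant rate with $N$ (e.g. like some power $N^\theta$ of $N$).

The correspondence principle is discussed in these previous blog posts. One way to establish the principle is by introducing a Banach limit $p\text{-}\lim: \ell^\infty(\mathbb{N}) \to \mathbb{R}$ that extends the usual limit functional on the subspace of $\ell^\infty(\mathbb{N})$ consisting of convergent sequences while still having operator norm one. Such functionals cannot be constructed explicitly, but can be proven to exist (non-constructively and non-uniquely) using the Hahn-Banach theorem; one can also use a non-principal ultrafilter here if desired. One can then seek to construct a system $T \circlearrowright (X,\mu)$ and a measurable function $F: X \to \{-1,+1\}$ for which one has the statistics

$$\int_X F(T^{h_1} x) \dots F(T^{h_k} x)\ d\mu(x) = p\text{-}\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N f(n+h_1) \dots f(n+h_k) \qquad (2)$$

for all $h_1,\dots,h_k$. One can explicitly construct such a system as follows. One can take $X$ to be the Cantor space $\{-1,+1\}^{\mathbb{Z}}$ with the product $\sigma$-algebra and the shift

$$T ( (x_n)_{n \in \mathbb{Z}} ) := (x_{n+1})_{n \in \mathbb{Z}}$$

with the function $F: X \to \{-1,+1\}$ being the coordinate function at zero:

$$F( (x_n)_{n \in \mathbb{Z}} ) := x_0$$

(so in particular $F( T^h (x_n)_{n \in \mathbb{Z}} ) = x_h$ for any $h$). The only thing remaining is to construct the invariant measure $\mu$. In order to be consistent with (2), one must have

$$\mu( \{ (x_n)_{n \in \mathbb{Z}} : x_{h_j} = \epsilon_j \ \forall j=1,\dots,k \} ) = p\text{-}\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N 1_{f(n+h_1) = \epsilon_1} \dots 1_{f(n+h_k) = \epsilon_k}$$

for any distinct integers $h_1,\dots,h_k$ and signs $\epsilon_1,\dots,\epsilon_k$. One can check that this defines a premeasure on the Boolean algebra of $\{-1,+1\}^{\mathbb{Z}}$ defined by cylinder sets, and the existence of $\mu$ then follows from the Hahn-Kolmogorov extension theorem (or the closely related Kolmogorov extension theorem). One can then check that the correspondence (2) holds, and that $\mu$ is translation-invariant; the latter comes from the translation invariance of the (Banach-)Cesàro averaging operation $f \mapsto p\text{-}\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N f(n)$. A variant of this construction shows that the Furstenberg limit is unique up to equivalence if and only if all the limits appearing in (1) actually exist.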
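The cylinder-set formula can be approximated at finite $N$ by simply counting sign patterns. The following sketch is a finite-$N$ stand-in for the Banach limit (which itself is not computable): it tabulates the empirical frequency of each pattern $(f(n+h_1),\dots,f(n+h_k))$ for $1 \leq n \leq N$, with an optional $1/n$ weighting to mimic logarithmic averaging. The names `pattern_frequencies` and `liouville_up_to`, the restriction to non-negative shifts, and the sample parameters are assumptions of this example.

```python
from collections import Counter
from math import isqrt

def liouville_up_to(n_max):
    """Return lam with lam[n] = Liouville function at n for 1 <= n <= n_max (lam[0] unused)."""
    spf = list(range(n_max + 1))              # smallest prime factor table
    for p in range(2, isqrt(n_max) + 1):
        if spf[p] == p:
            for m in range(p * p, n_max + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (n_max + 1)
    if n_max >= 1:
        lam[1] = 1
    for n in range(2, n_max + 1):
        lam[n] = -lam[n // spf[n]]
    return lam

def pattern_frequencies(f, shifts, N, log_weighted=False):
    """Empirical weight of each pattern (f[n+h_1], ..., f[n+h_k]) over 1 <= n <= N.

    `f` is a list indexed by n, `shifts` are non-negative integers, and f must be
    defined up to index N + max(shifts). With log_weighted=True each n gets weight 1/n.
    """
    weights = Counter()
    for n in range(1, N + 1):
        w = 1.0 / n if log_weighted else 1.0
        weights[tuple(f[n + h] for h in shifts)] += w
    total = sum(weights.values())
    return {pattern: w / total for pattern, w in weights.items()}

if __name__ == "__main__":
    N, shifts = 100_000, (0, 1, 2)            # illustrative choices only
    lam = liouville_up_to(N + max(shifts))
    for pattern, freq in sorted(pattern_frequencies(lam, shifts, N).items()):
        print(pattern, round(freq, 4))
```

If the Chowla conjecture holds, each of the $2^k$ patterns should have frequency tending to $2^{-k}$, matching the Bernoulli system; the printed values are only finite-$N$ evidence.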

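One can obtain a slightly tighter correspondence by using a smoother average than the Cesàro average. For instance, one can use the logarithmic Cesàro averages $\lim_{N \to \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{f(n)}{n}$ in place of the Cesàro average $\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N f(n)$, thus one replaces (2) by

$$\int_X F(T^{h_1} x) \dots F(T^{h_k} x)\ d\mu(x) = p\text{-}\lim_{N \to \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{f(n+h_1) \dots f(n+h_k)}{n}.$$

Whenever the Cesàro average of a bounded sequence $f$ exists, then the logarithmic Cesàro average exists and is equal to the Cesàro average. Thus, a Furstenberg limit constructed using logarithmic Banach-Cesàro averaging still obeys (1) for all $h_1,\dots,h_k$ when the right-hand side limit exists, but also obeys the more general assertion

$$\int_X F(T^{h_1} x) \dots F(T^{h_k} x)\ d\mu(x) = \lim_{N \to \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{f(n+h_1) \dots f(n+h_k)}{n}$$

whenever the limit of the right-hand side exists.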

In a recent paper of Frantzikinakis, the Furstenberg limits of the Liouville function $\lambda$ (with logarithmic averaging) were studied. Some (but not all) of the known facts and conjectures about the Liouville function can be interpreted in the Furstenberg limit. For instance, in a recent breakthrough result of Matomaki and Radziwill (discussed previously here), it was shown that the Liouville function exhibited cancellation on short intervals in the sense that

$$\lim_{H \to \infty} \limsup_{N \to \infty} \frac{1}{N} \sum_{n=1}^N \left| \frac{1}{H} \sum_{h=1}^H \lambda(n+h) \right| = 0.$$

In terms of Furstenberg limits of the Liouville function, this assertion is equivalent to the assertion that

$$\lim_{H \to \infty} \int_X \left| \frac{1}{H} \sum_{h=1}^H F(T^h x) \right|\ d\mu(x) = 0$$

for all Furstenberg limits $T \circlearrowright (X,\mu), F$ of Liouville (including those without logarithmic averaging). Invoking the mean ergodic theorem (discussed in this previous post), this assertion is in turn equivalent to the observable $F$ that corresponds to the Liouville function being orthogonal to the invariant factor of $X$; equivalently, the first Gowers-Host-Kra seminorm $\|F\|_{U^1(X)}$ of $F$ (as defined for instance in this previous post) vanishes. The Chowla conjecture, which asserts that

$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^N \lambda(n+h_1) \dots \lambda(n+h_k) = 0$$

for all distinct integers $h_1,\dots,h_k$, is equivalent to the assertion that all the Furstenberg limits of Liouville are equivalent to the Bernoulli system ($\{-1,+1\}^{\mathbb{Z}}$ with the product measure arising from the uniform distribution on $\{-1,+1\}$, with the shift $T$ and observable $F$ as before). Similarly, the logarithmically averaged Chowla conjecture

$$\lim_{N \to \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{\lambda(n+h_1) \dots \lambda(n+h_k)}{n} = 0$$

is equivalent to the assertion that all the Furstenberg limits of Liouville with logarithmic averaging are equivalent to the Bernoulli system. Recently, I was able to prove the two-point version

$$\lim_{N \to \infty} \frac{1}{\log N} \sum_{n=1}^N \frac{\lambda(n) \lambda(n+h)}{n} = 0 \qquad (3)$$

of the logarithmically averaged Chowla conjecture, for any non-zero integer $h$; this is equivalent to the perfect strong mixing property

$$\int_X F(x) F(T^h x)\ d\mu(x) = 0$$

for any Furstenberg limit of Liouville with logarithmic averaging, and any $h \neq 0$.

The situation is more delicate with regards to the Sarnak conjecture, which is equivalent to the assertion that

for any zero-entropy sequence (see this previous blog post for more discussion). Morally speaking, this conjecture should be equivalent to the assertion that any Furstenberg limit of Liouville is disjoint from any zero entropy system, but I was not able to formally establish an implication in either direction due to some technical issues regarding the fact that the Furstenberg limit does not directly control long-range correlations, only short-range ones. (There are however ergodic theoretic interpretations of the Sarnak conjecture that involve the notion of generic points; see this paper of El Abdalaoui, Lemanczyk, and de la Rue.) But the situation is currently better with the logarithmically averaged Sarnak conjecture

as I was able to show that this conjecture was equivalent to the logarithmically averaged Chowla conjecture, and hence to all Furstenberg limits of Liouville with logarithmic averaging being Bernoulli; I also showed the conjecture was equivalent to local Gowers uniformity of the Liouville function, which is in turn equivalent to the function having all Gowers-Host-Kra seminorms vanishing in every Furstenberg limit with logarithmic averaging. In this recent paper of Frantzikinakis, this analysis was taken further, showing that the logarithmically averaged Chowla and Sarnak conjectures were in fact equivalent to the much milder seeming assertion that all Furstenberg limits with logarithmic averaging were ergodic.

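Actually, the logarithmically averaged Furstenberg limits have more structure than just a $\mathbb{Z}$-action on a measure preserving system $(X,\mu)$ with a single observable $F$. Let $\mathrm{Aff}_+(\mathbb{Z})$ denote the semigroup of affine maps $n \mapsto an+b$ on the integers with $a,b \in \mathbb{Z}$ and $a$ positive. Also, let $\hat{\mathbb{Z}}$ denote the profinite integers (the inverse limit of the cyclic groups $\mathbb{Z}/q\mathbb{Z}$). Observe that $\mathrm{Aff}_+(\mathbb{Z})$ acts on $\hat{\mathbb{Z}}$ by taking the inverse limit of the obvious actions of $\mathrm{Aff}_+(\mathbb{Z})$ on $\mathbb{Z}/q\mathbb{Z}$.

Proposition 1 (Enriched logarithmically averaged Furstenberg limit of Liouville). Let $p\text{-}\lim$ be a Banach limit. Then there exists a probability space $(X,\mu)$ with an action of the affine semigroup $\mathrm{Aff}_+(\mathbb{Z})$, as well as measurable functions $F: X \to \{-1,+1\}$ and $M: X \to \hat{\mathbb{Z}}$, with the following properties: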

- (i) (Affine Furstenberg limit) For any , and any congruence class , one has
- (ii) (Equivariance of ) For any , one has
for -almost every .

- (iii) (Multiplicativity at fixed primes) For any prime , one has
for -almost every , where is the dilation map .

- (iv) (Measure pushforward) If is of the form and is the set , then the pushforward of by is equal to , that is to say one has
for every measurable .

Note that can be viewed as the subgroup of consisting of the translations . If one only keeps the -portion of the action and forgets the rest (as well as the function ) then the action becomes measure-preserving, and we recover an ordinary Furstenberg limit with logarithmic averaging. However, the additional structure here can be quite useful; for instance, one can transfer the proof of (3) to this setting, which we sketch below the fold, after proving the proposition.

The observable , roughly speaking, means that points in the Furstenberg limit constructed by this proposition are still “virtual integers” in the sense that one can meaningfully compute the residue class of modulo any natural number modulus , by first applying and then reducing mod . The action of means that one can also meaningfully multiply by any natural number, and translate it by any integer. As with other applications of the correspondence principle, the main advantage of moving to this more “virtual” setting is that one now acquires a probability measure , so that the tools of ergodic theory can be readily applied.

Given a random variable $X$ that takes on only finitely many values, we can define its Shannon entropy by the formula

$$H(X) := \sum_x \mathbf{P}(X=x) \log \frac{1}{\mathbf{P}(X=x)}$$

with the convention that $0 \log \frac{1}{0} = 0$. (In some texts, one uses the logarithm to base $2$ rather than the natural logarithm, but the choice of base will not be relevant for this discussion.) This is clearly a nonnegative quantity. Given two random variables $X,Y$ taking on finitely many values, the joint variable $(X,Y)$ is also a random variable taking on finitely many values, and also has an entropy $H(X,Y)$. It obeys the *Shannon inequalities*

$$H(X), H(Y) \leq H(X,Y) \leq H(X) + H(Y)$$

so we can define some further nonnegative quantities, the mutual information

$$I(X:Y) := H(X) + H(Y) - H(X,Y)$$

and the conditional entropies

$$H(X|Y) := H(X,Y) - H(Y); \quad H(Y|X) := H(X,Y) - H(X).$$

More generally, given three random variables $X,Y,Z$, one can define the conditional mutual information

$$I(X:Y|Z) := H(X|Z) + H(Y|Z) - H(X,Y|Z)$$

and the final of the Shannon entropy inequalities asserts that this quantity is also non-negative.

The mutual information $I(X:Y)$ is a measure of the extent to which $X$ and $Y$ fail to be independent; indeed, it is not difficult to show that $I(X:Y)$ vanishes if and only if $X$ and $Y$ are independent. Similarly, $I(X:Y|Z)$ vanishes if and only if $X$ and $Y$ are *conditionally* independent relative to $Z$. At the other extreme, $H(X|Y)$ is a measure of the extent to which $X$ fails to depend on $Y$; indeed, it is not difficult to show that $H(X|Y)=0$ if and only if $X$ is determined by $Y$ in the sense that there is a deterministic function $f$ such that $X = f(Y)$. In a related vein, if $X$ and $X'$ are equivalent in the sense that there are deterministic functional relationships $X = f(X')$, $X' = g(X)$ between the two variables, then $X$ is interchangeable with $X'$ for the purposes of computing the above quantities, thus for instance $H(X) = H(X')$, $H(X,Y) = H(X',Y)$, $I(X:Y) = I(X':Y)$, $I(X:Y|Z) = I(X':Y|Z)$, etc.
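The definitions above translate directly into a few lines of Python. In this sketch a joint distribution is assumed to be given as a dictionary from outcome tuples to probabilities, and each random variable is specified by a tuple of coordinate indices; the function names and the noisy-copy example are hypothetical choices made for illustration.

```python
from collections import defaultdict
from math import log

def entropy(joint, coords):
    """Shannon entropy (natural logarithm) of the coordinates `coords` of a joint pmf.

    `joint` maps outcome tuples to probabilities; `coords` is a tuple of indices.
    """
    marginal = defaultdict(float)
    for outcome, prob in joint.items():
        marginal[tuple(outcome[i] for i in coords)] += prob
    return sum(p * log(1.0 / p) for p in marginal.values() if p > 0)

def mutual_information(joint, xs, ys):
    # I(X:Y) = H(X) + H(Y) - H(X,Y)
    return entropy(joint, xs) + entropy(joint, ys) - entropy(joint, xs + ys)

def conditional_entropy(joint, xs, ys):
    # H(X|Y) = H(X,Y) - H(Y)
    return entropy(joint, xs + ys) - entropy(joint, ys)

def conditional_mutual_information(joint, xs, ys, zs):
    # I(X:Y|Z) = H(X|Z) + H(Y|Z) - H(X,Y|Z)
    return (conditional_entropy(joint, xs, zs)
            + conditional_entropy(joint, ys, zs)
            - conditional_entropy(joint, xs + ys, zs))

if __name__ == "__main__":
    # Hypothetical example: Z is a fair bit, and X, Y are independent noisy copies of Z.
    joint = {}
    for z in (0, 1):
        for x in (0, 1):
            for y in (0, 1):
                px = 0.9 if x == z else 0.1
                py = 0.8 if y == z else 0.2
                joint[(x, y, z)] = 0.5 * px * py
    X, Y, Z = (0,), (1,), (2,)
    print("H(X)     =", entropy(joint, X))
    print("I(X:Y)   =", mutual_information(joint, X, Y))
    print("I(X:Y|Z) =", conditional_mutual_information(joint, X, Y, Z))
```

In the example $X$ and $Y$ are correlated through $Z$, so $I(X:Y)$ is positive, but they are conditionally independent relative to $Z$, so $I(X:Y|Z)$ vanishes up to rounding.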

One can get some initial intuition for these information-theoretic quantities by specialising to a simple situation in which all the random variables $X$ being considered come from restricting a single random (and uniformly distributed) boolean function $F: \Omega \to \{0,1\}$ on a given finite domain $\Omega$ to some subset $A$ of $\Omega$:

$$X = F|_A.$$

In this case, $X$ has the law of a random uniformly distributed boolean function from $A$ to $\{0,1\}$, and the entropy here can be easily computed to be $|A| \log 2$, where $|A|$ denotes the cardinality of $A$. If $X$ is the restriction of $F$ to $A$, and $Y$ is the restriction of $F$ to $B$, then the joint variable $(X,Y)$ is equivalent to the restriction of $F$ to $A \cup B$. If one discards the normalisation factor $\log 2$, one then obtains the following dictionary between entropy and the combinatorics of finite sets:

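| Random variables $X, Y, Z$ | Finite sets $A, B, C$ |
| --- | --- |
| Entropy $H(X)$ | Cardinality $\lvert A \rvert$ |
| Joint variable $(X,Y)$ | Union $A \cup B$ |
| Mutual information $I(X:Y)$ | Intersection cardinality $\lvert A \cap B \rvert$ |
| Conditional entropy $H(X \mid Y)$ | Set difference cardinality $\lvert A \backslash B \rvert$ |
| Conditional mutual information $I(X:Y \mid Z)$ | $\lvert (A \cap B) \backslash C \rvert$ |
| $X, Y$ independent | $A, B$ disjoint |
| $X$ determined by $Y$ | $A$ a subset of $B$ |
| $X, Y$ conditionally independent relative to $Z$ | $A \cap B \subset C$ |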

Every (linear) inequality or identity about entropy (and related quantities, such as mutual information) then specialises to a combinatorial inequality or identity about finite sets that is easily verified. For instance, the Shannon inequality becomes the union bound , and the definition of mutual information becomes the inclusion-exclusion formula

For a more advanced example, consider the data processing inequality that asserts that if $X, Z$ are conditionally independent relative to $Y$, then $I(X:Z) \leq I(X:Y)$. Specialising to sets, this now says that if $A, C$ are disjoint outside of $B$, then $|A \cap C| \leq |A \cap B|$; this can be made apparent by considering the corresponding Venn diagram. This dictionary also suggests how to *prove* the data processing inequality using the existing Shannon inequalities. Firstly, if $A$ and $C$ are not necessarily disjoint outside of $B$, then a consideration of Venn diagrams gives the more general inequality

$$|A \cap C| \leq |A \cap B| + |(A \cap C) \backslash B|$$

and a further inspection of the diagram then reveals the more precise identity

$$|A \cap C| + |(A \cap B) \backslash C| = |A \cap B| + |(A \cap C) \backslash B|.$$

Using the dictionary in the reverse direction, one is then led to conjecture the identity

$$I(X:Z) + I(X:Y|Z) = I(X:Y) + I(X:Z|Y),$$

which (together with non-negativity of conditional mutual information) implies the data processing inequality, and this identity is in turn easily established from the definition of mutual information.
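The set dictionary can be checked by brute force on a small domain. The following is only a sketch: the domain size and the subsets `A`, `B`, `C` are arbitrary illustrative choices, and entropies are computed in nats by enumerating all boolean functions with equal weight. It confirms that $H(F|_A) = |A| \log 2$, that mutual information matches intersection cardinality, and that $I(X:Z) + I(X:Y|Z) = I(X:Y) + I(X:Z|Y)$ holds for these restrictions.

```python
import itertools
import math
from collections import Counter

def entropy(samples):
    """Shannon entropy (in nats) of the law that weights each listed sample equally."""
    counts = Counter(samples)
    total = len(samples)
    return -sum(c / total * math.log(c / total) for c in counts.values())

OMEGA = range(5)  # hypothetical small domain, chosen only for illustration
# Every boolean function F: OMEGA -> {0,1}, each equally likely.
FUNCTIONS = list(itertools.product((0, 1), repeat=len(OMEGA)))

def H(*subsets):
    """Entropy of the joint variable formed by restricting F to each given subset."""
    return entropy([tuple(tuple(F[i] for i in sorted(S)) for S in subsets)
                    for F in FUNCTIONS])

def I(A, B, given=frozenset()):
    """Mutual information of F|A and F|B, conditioned on F|given."""
    return H(A, given) + H(B, given) - H(A, B, given) - H(given)

A, B, C = {0, 1, 2}, {1, 2, 3}, {2, 3, 4}
LOG2 = math.log(2)

print(H(A) / LOG2, len(A))                   # entropy vs cardinality
print(I(A, B) / LOG2, len(A & B))            # mutual information vs intersection
print(I(A, B, given=C) / LOG2, len((A & B) - C))
print(I(A, C) + I(A, B, given=C), I(A, B) + I(A, C, given=B))
```

Enumeration costs $2^{|\Omega|}$ samples, so this is only practical for very small domains; it is meant as a sanity check of the dictionary rather than a proof.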

On the other hand, not every assertion about cardinalities of sets generalises to entropies of random variables that are not arising from restricting random boolean functions to sets. For instance, a basic property of sets is that disjointness from a given set $C$ is preserved by unions:

$$A \cap C = B \cap C = \emptyset \implies (A \cup B) \cap C = \emptyset.$$

Indeed, one has the union bound

$$|(A \cup B) \cap C| \leq |A \cap C| + |B \cap C|. \qquad (1)$$

Applying the dictionary in the reverse direction, one might now conjecture that if $X$ was independent of $Z$ and $Y$ was independent of $Z$, then $(X,Y)$ should also be independent of $Z$, and furthermore that

$$I(X,Y:Z) \leq I(X:Z) + I(Y:Z),$$

but these statements are well known to be false (for reasons related to pairwise independence of random variables being strictly weaker than joint independence). For a concrete counterexample, one can take $X, Y \in \mathbf{F}_2$ to be independent, uniformly distributed random elements of the finite field $\mathbf{F}_2$ of two elements, and take $Z := X + Y$ to be the sum of these two field elements. One can easily check that each of $X$ and $Y$ is separately independent of $Z$, but the joint variable $(X,Y)$ determines $Z$ and thus is not independent of $Z$.
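The counterexample is small enough to compute exactly. In this sketch (the helper names `H` and `I` are illustrative, and entropies are in nats), $X, Y$ are independent uniform bits and $Z = X + Y$ in the field of two elements; the output shows $I(X:Z) = I(Y:Z) = 0$ while $I(X,Y:Z) = \log 2$.

```python
import itertools
import math
from collections import Counter

def entropy(samples):
    """Shannon entropy (in nats) of the law that weights each listed sample equally."""
    counts = Counter(samples)
    total = len(samples)
    return -sum(c / total * math.log(c / total) for c in counts.values())

# Each outcome is (x, y, z) with x, y independent uniform bits and z = x + y mod 2.
outcomes = [(x, y, (x + y) % 2) for x, y in itertools.product((0, 1), repeat=2)]

def H(*coords):
    """Joint entropy of the listed coordinates (0 = X, 1 = Y, 2 = Z)."""
    return entropy([tuple(w[i] for i in coords) for w in outcomes])

def I(a, b):
    """Mutual information between two groups of coordinates."""
    return H(*a) + H(*b) - H(*a, *b)

I_X_Z = I((0,), (2,))
I_Y_Z = I((1,), (2,))
I_XY_Z = I((0, 1), (2,))
print(I_X_Z, I_Y_Z, I_XY_Z, math.log(2))
```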

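From the inclusion-exclusion identities

$$|A \cap C| = |A| + |C| - |A \cup C|$$

$$|B \cap C| = |B| + |C| - |B \cup C|$$

$$|(A \cup B) \cap C| = |A \cup B| + |C| - |A \cup B \cup C|$$

$$|A \cap B \cap C| = |A| + |B| + |C| - |A \cup B| - |B \cup C| - |A \cup C| + |A \cup B \cup C|$$

one can check that (1) is equivalent to the trivial lower bound $|A \cap B \cap C| \geq 0$. The basic issue here is that in the dictionary between entropy and combinatorics, there is no satisfactory entropy analogue of the notion of a triple intersection $A \cap B \cap C$. (Even the double intersection $A \cap B$ only exists information theoretically in a “virtual” sense; the mutual information $I(X:Y)$ allows one to “compute the entropy” of this “intersection”, but does not actually describe this intersection itself as a random variable.)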

However, this issue only arises with three or more variables; it is not too difficult to show that the only linear equalities and inequalities that are necessarily obeyed by the information-theoretic quantities $H(X), H(Y), H(X,Y), I(X:Y), H(X|Y), H(Y|X)$ associated to just two variables $X, Y$ are those that are also necessarily obeyed by their combinatorial analogues $|A|, |B|, |A \cup B|, |A \cap B|, |A \backslash B|, |B \backslash A|$. (See for instance the Venn diagram at the Wikipedia page for mutual information for a pictorial summary of this statement.)

One can work with a larger class of special cases of Shannon entropy by working with random *linear* functions rather than random *boolean* functions. Namely, let $V$ be some finite-dimensional vector space over a finite field $\mathbf{F}$, and let $F: V \to \mathbf{F}$ be a random linear functional on $V$, selected uniformly among all such functions. Every subspace $U$ of $V$ then gives rise to a random variable $X = X_U: U \to \mathbf{F}$ formed by restricting $F$ to $U$. This random variable is also distributed uniformly amongst all linear functions on $U$, and its entropy can be easily computed to be $\dim(U) \log |\mathbf{F}|$. Given two random variables $X, Y$ formed by restricting $F$ to $V_1, V_2$ respectively, the joint random variable $(X,Y)$ determines the random linear function $F$ on the union $V_1 \cup V_2$ of the two spaces, and thus by linearity on the Minkowski sum $V_1 + V_2$ as well; thus $(X,Y)$ is equivalent to the restriction of $F$ to $V_1 + V_2$. In particular, $H(X,Y) = \dim(V_1 + V_2) \log |\mathbf{F}|$. This implies that $I(X:Y) = \dim(V_1 \cap V_2) \log |\mathbf{F}|$ and also $H(X|Y) = \dim(\pi_{V_2}(V_1)) \log |\mathbf{F}|$, where $\pi_{V_2}: V \to V/V_2$ is the quotient map. After discarding the normalising constant $\log |\mathbf{F}|$, this leads to the following dictionary between information theoretic quantities and linear algebra quantities, analogous to the previous dictionary:

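Random variables $X, Y, Z$ | Subspaces $V_1, V_2, V_3$ |

Entropy $H(X)$ | Dimension $\dim(V_1)$ |

Joint variable $(X,Y)$ | Sum $V_1 + V_2$ |

Mutual information $I(X:Y)$ | Dimension of intersection $\dim(V_1 \cap V_2)$ |

Conditional entropy $H(X \mid Y)$ | Dimension of projection $\dim(\pi_{V_2}(V_1))$ |

Conditional mutual information $I(X:Y \mid Z)$ | $\dim(\pi_{V_3}(V_1) \cap \pi_{V_3}(V_2))$ |

$X, Y$ independent | $V_1, V_2$ transverse ($V_1 \cap V_2 = \{0\}$) |

$X$ determined by $Y$ | $V_1$ a subspace of $V_2$ |

$X, Y$ conditionally independent relative to $Z$ | $\pi_{V_3}(V_1)$, $\pi_{V_3}(V_2)$ transverse. |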

The combinatorial dictionary can be regarded as a specialisation of the linear algebra dictionary, by taking $V$ to be the vector space $\mathbf{F}_2^\Omega$ over the finite field $\mathbf{F}_2$ of two elements, and only considering those subspaces that are coordinate subspaces $\mathbf{F}_2^A$ associated to various subsets $A$ of $\Omega$.

As before, every linear inequality or equality that is valid for the information-theoretic quantities discussed above is automatically valid for the linear algebra counterparts for subspaces of a vector space over a finite field $\mathbf{F}$ by applying the above specialisation (and dividing out by the normalising factor of $\log |\mathbf{F}|$). In fact, the requirement that the field be finite can be removed by applying the compactness theorem from logic (or one of its relatives, such as Łoś’s theorem on ultraproducts, as done in this previous blog post).

The linear algebra model captures more of the features of Shannon entropy than the combinatorial model. For instance, in contrast to the combinatorial case, it is possible in the linear algebra setting to have subspaces $V_1, V_2, V_3$ such that $V_1$ and $V_2$ are separately transverse to $V_3$, but their sum $V_1 + V_2$ is not; for instance, in a two-dimensional vector space $\mathbf{F}^2$, one can take $V_1, V_2, V_3$ to be the one-dimensional subspaces spanned by $(0,1)$, $(1,0)$, and $(1,1)$ respectively. Note that this is essentially the same counterexample from before (which took $\mathbf{F}$ to be the field of two elements). Indeed, one can show that any necessarily true linear inequality or equality involving the dimensions of three subspaces $V_1, V_2, V_3$ (as well as the various other quantities on the above table) will also be necessarily true when applied to the entropies of three discrete random variables $X, Y, Z$ (as well as the corresponding quantities on the above table).

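However, the linear algebra model does not completely capture the subtleties of Shannon entropy once one works with *four* or more variables (or subspaces). This was first observed by Ingleton, who established the dimensional inequality

$$\dim(V_1 \cap V_2) \leq \dim(\pi_{V_3}(V_1) \cap \pi_{V_3}(V_2)) + \dim(\pi_{V_4}(V_1) \cap \pi_{V_4}(V_2)) + \dim(V_3 \cap V_4)$$

for any subspaces $V_1, V_2, V_3, V_4$. This is easiest to see when the three terms on the right-hand side vanish; then $\pi_{V_3}(V_1), \pi_{V_3}(V_2)$ are transverse, which implies that $V_1 \cap V_2 \subset V_3$; similarly $V_1 \cap V_2 \subset V_4$. But $V_3$ and $V_4$ are transverse, and this clearly implies that $V_1$ and $V_2$ are themselves transverse. To prove the general case of Ingleton’s inequality, one can define $U := V_1 \cap V_2$ and use $\dim(\pi_{V_3}(V_1) \cap \pi_{V_3}(V_2)) \geq \dim(\pi_{V_3}(U))$ (and similarly for $V_4$ instead of $V_3$) to reduce to establishing the inequality

$$\dim(U) \leq \dim(\pi_{V_3}(U)) + \dim(\pi_{V_4}(U)) + \dim(V_3 \cap V_4) \qquad (3)$$

which can be rearranged using $\dim(\pi_{V_3}(U)) = \dim(U) - \dim(U \cap V_3) = \dim(U + V_3) - \dim(V_3)$ (and similarly for $V_4$ instead of $V_3$) and $\dim(V_3 \cap V_4) = \dim(V_3) + \dim(V_4) - \dim(V_3 + V_4)$ as

$$\dim(V_3 + V_4) \leq \dim(V_3 + U) + \dim(V_4 + U) - \dim(U)$$

but this is clear since $\dim(V_3 + V_4) \leq \dim((V_3 + U) + (V_4 + U)) = \dim(V_3 + U) + \dim(V_4 + U) - \dim((V_3 + U) \cap (V_4 + U))$ and $U \subset (V_3 + U) \cap (V_4 + U)$.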

Returning to the entropy setting, the analogue

$$H(V) \leq H(V|Z) + H(V|W) + I(Z:W)$$

of (3) is true (exercise!), but the analogue

$$I(X:Y) \leq I(X:Y|Z) + I(X:Y|W) + I(Z:W) \qquad (4)$$

of Ingleton’s inequality is false in general. Again, this is easiest to see when all the terms on the right-hand side vanish; then $X, Y$ are conditionally independent relative to $Z$, and relative to $W$, and $Z$ and $W$ are independent, and the claim (4) would then be asserting that $X$ and $Y$ are independent. While there is no linear counterexample to this statement, there are simple non-linear ones: for instance, one can take $Z, W$ to be independent uniform variables from $\mathbf{F}_2$, and take $X$ and $Y$ to be (say) $ZW$ and $(1-Z)(1-W)$ respectively (thus $X, Y$ are the indicators of the events $Z = W = 1$ and $Z = W = 0$ respectively). Once one conditions on either $Z$ or $W$, one of $X, Y$ has positive conditional entropy and the other has zero entropy, and so $X, Y$ are conditionally independent relative to either $Z$ or $W$; also, $Z$ and $W$ are independent of each other. But $X$ and $Y$ are not independent of each other (they cannot be simultaneously equal to $1$). Somehow, the feature of the linear algebra model that is not present in general is that in the linear algebra setting, every pair of subspaces $V_1, V_2$ has a well-defined intersection $V_1 \cap V_2$ that is also a subspace, whereas for arbitrary random variables $X, Y$, there does not necessarily exist the analogue of an intersection, namely a “common information” random variable $V$ that has the entropy of $I(X:Y)$ and is determined either by $X$ or by $Y$.
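The non-linear counterexample can be verified by direct enumeration. This sketch (helper names are illustrative; entropies are in nats) takes $Z, W$ to be independent uniform bits with $X = ZW$ and $Y = (1-Z)(1-W)$, and evaluates the four quantities appearing in the entropy analogue $I(X:Y) \leq I(X:Y|Z) + I(X:Y|W) + I(Z:W)$ of Ingleton's inequality.

```python
import itertools
import math
from collections import Counter

def entropy(samples):
    """Shannon entropy (in nats) of the law that weights each listed sample equally."""
    counts = Counter(samples)
    total = len(samples)
    return -sum(c / total * math.log(c / total) for c in counts.values())

# Z, W independent uniform bits; X and Y indicate the events Z=W=1 and Z=W=0.
outcomes = []
for z, w in itertools.product((0, 1), repeat=2):
    outcomes.append({"X": z * w, "Y": (1 - z) * (1 - w), "Z": z, "W": w})

def H(names):
    """Joint entropy of the variables named by the characters of `names`."""
    return entropy([tuple(o[k] for k in names) for o in outcomes])

def I(a, b, given=""):
    """Mutual information of variable groups a and b, conditioned on `given`."""
    return H(a + given) + H(b + given) - H(a + b + given) - H(given)

lhs = I("X", "Y")
rhs = I("X", "Y", "Z") + I("X", "Y", "W") + I("Z", "W")
print(lhs, rhs)  # the left-hand side is strictly positive, the right-hand side vanishes
```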

I do not know if there is any simpler model of Shannon entropy that captures all the inequalities available for four variables. One significant complication is that there exist some information inequalities in this setting that are not of Shannon type, such as the Zhang-Yeung inequality

$$2 I(Z:W) \leq I(X:Y) + I(X:Z,W) + 3 I(Z:W|X) + I(Z:W|Y).$$

One can however still use these simpler models of Shannon entropy to be able to guess arguments that would work for general random variables. An example of this comes from my paper on the logarithmically averaged Chowla conjecture, in which I showed among other things that

whenever was sufficiently large depending on , where is the Liouville function. The information-theoretic part of the proof was as follows. Given some intermediate scale between and , one can form certain random variables . The random variable is a sign pattern of the form where is a random number chosen from to (with logarithmic weighting). The random variable was a tuple of reductions of to primes comparable to . Roughly speaking, what was implicitly shown in the paper (after using the multiplicativity of , the circle method, and the Matomäki-Radziwiłł theorem on short averages of multiplicative functions) is that if the inequality (5) fails, then there was a lower bound

on the mutual information between and . From translation invariance, this also gives the more general lower bound

for any , where denotes the shifted sign pattern . On the other hand, one had the entropy bounds

and from concatenating sign patterns one could see that is equivalent to the joint random variable for any . Applying these facts and using an “entropy decrement” argument, I was able to obtain a contradiction once was allowed to become sufficiently large compared to , but the bound was quite weak (coming ultimately from the unboundedness of as the interval of values of under consideration becomes large), something of the order of ; the quantity needs at various junctures to be less than a small power of , so the relationship between and becomes essentially quadruple exponential in nature, . The basic strategy was to observe that the lower bound (6) causes some slowdown in the growth rate of the mean entropy, in that this quantity decreased by as increased from to , basically by dividing into components , and observing from (6) each of these shares a bit of common information with the same variable . This is relatively clear when one works in a set model, in which is modeled by a set of size , and is modeled by a set of the form

for various sets of size (also there is some translation symmetry that maps to a shift while preserving all of the ).

However, on considering the set model recently, I realised that one can be a little more efficient by exploiting the fact (basically the Chinese remainder theorem) that the random variables are basically jointly independent as ranges over dyadic values that are much smaller than , which in the set model corresponds to the all being disjoint. One can then establish a variant

of (6), which in the set model roughly speaking asserts that each claims a portion of the of cardinality that is not claimed by previous choices of . This leads to a more efficient contradiction (relying on the unboundedness of rather than ) that looks like it removes one order of exponential growth, thus the relationship between and is now . Returning to the entropy model, one can use (7) and Shannon inequalities to establish an inequality of the form

for a small constant , which on iterating and using the boundedness of gives the claim. (A modification of this analysis, at least on the level of the back of the envelope calculation, suggests that the Matomaki-Radziwill theorem is needed only for ranges greater than or so, although at this range the theorem is not significantly simpler than the general case).

I’ve just uploaded to the arXiv my paper “Some remarks on the lonely runner conjecture”, submitted to Contributions to Discrete Mathematics. I had blogged about the lonely runner conjecture in this previous blog post, and I returned to the problem recently to see if I could obtain anything further. The results obtained were more modest than I had hoped, but they did at least seem to indicate a potential strategy to make further progress on the problem, and also highlight some of the difficulties of the problem.

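One can rephrase the lonely runner conjecture as the following covering problem. Given any integer “velocity” $v$ and radius $0 < \delta < 1/2$, define the *Bohr set* $B(v,\delta)$ to be the subset of the unit circle $\mathbf{R}/\mathbf{Z}$ given by the formula

$$B(v,\delta) := \{ t \in \mathbf{R}/\mathbf{Z} : \|vt\| \leq \delta \},$$

where $\|x\|$ denotes the distance of $x$ to the nearest integer. Thus, for $v$ positive, $B(v,\delta)$ is simply the union of the $v$ intervals $[\frac{a-\delta}{v}, \frac{a+\delta}{v}]$ for $a = 0, \dots, v-1$, projected onto the unit circle $\mathbf{R}/\mathbf{Z}$; in the language of the usual formulation of the lonely runner conjecture, $B(v,\delta)$ represents those times in which a runner moving at speed $v$ returns to within $\delta$ of his or her starting position. For any non-zero integers $v_1, \dots, v_n$, let $\delta(v_1,\dots,v_n)$ be the smallest radius $\delta$ such that the $n$ Bohr sets $B(v_1,\delta), \dots, B(v_n,\delta)$ cover the unit circle:

$$\mathbf{R}/\mathbf{Z} = \bigcup_{i=1}^n B(v_i,\delta). \qquad (1)$$

Then define $\delta_n$ to be the smallest value of $\delta(v_1,\dots,v_n)$, as $v_1,\dots,v_n$ ranges over tuples of distinct non-zero integers. The Dirichlet approximation theorem quickly gives that

$$\delta(1,\dots,n) = \frac{1}{n+1}$$

and hence

$$\delta_n \leq \frac{1}{n+1}$$

for any $n \geq 1$. The lonely runner conjecture is equivalent to the assertion that this bound is in fact optimal:

Conjecture 1 (Lonely runner conjecture) For any $n \geq 1$, one has $\delta_n = \frac{1}{n+1}$.

This conjecture is currently known for $n \leq 6$ (see this paper of Barajas and Serra), but remains open for higher $n$.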
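For a specific tuple of velocities, the covering radius $\delta(v_1,\dots,v_n)$ equals the maximum over times $t$ of $\min_i \|v_i t\|$, which can be computed exactly with rational arithmetic. The sketch below is an illustration under stated assumptions, not the method of the paper: it assumes distinct positive integer velocities, and the function names are hypothetical. It uses the fact that the minimum of the piecewise linear functions $t \mapsto \|v_i t\|$ can only peak where two of them cross (at $t = k/(v_i \pm v_j)$) or where one of them reaches $1/2$.

```python
from fractions import Fraction

def dist_to_nearest_integer(x: Fraction) -> Fraction:
    """Return ||x||, the distance from x to the nearest integer."""
    frac = x - (x.numerator // x.denominator)
    return min(frac, 1 - frac)

def covering_radius(velocities):
    """Exact value of max_t min_i ||v_i t|| for distinct positive integer velocities."""
    vs = sorted(set(velocities))
    candidates = set()
    for v in vs:
        # Peaks of a single function ||v t||, where it equals 1/2.
        candidates.update(Fraction(2 * k + 1, 2 * v) for k in range(v))
    for i, a in enumerate(vs):
        for b in vs[i + 1:]:
            # Crossing points where ||a t|| = ||b t||.
            for m in (a + b, b - a):
                candidates.update(Fraction(k, m) for k in range(m))
    return max(min(dist_to_nearest_integer(v * t) for v in vs) for t in candidates)

for n in range(1, 7):
    print(n, covering_radius(range(1, n + 1)))  # equals 1/(n+1) for the tuple (1,...,n)
```

The candidate set has size roughly $O(n^2 \max_i v_i)$, so this brute-force approach is only suitable for small tuples.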

It is natural to try to attack the problem by establishing lower bounds on the quantity . We have the following “trivial” bound, that gets within a factor of two of the conjecture:

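Proposition 2 (Trivial bound) For any $n \geq 1$, one has $\delta_n \geq \frac{1}{2n}$.

*Proof:* It is not difficult to see that for any non-zero velocity $v$ and any $0 < \delta < 1/2$, the Bohr set $B(v,\delta)$ has Lebesgue measure $m(B(v,\delta)) = 2\delta$. In particular, by the union bound

$$m\Big(\bigcup_{i=1}^n B(v_i,\delta)\Big) \leq \sum_{i=1}^n m(B(v_i,\delta)) \qquad (2)$$

we see that the covering (1) is only possible if $1 \leq 2n\delta$, giving the claim.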

So, in some sense, all the difficulty is coming from the need to improve upon the trivial union bound (2) by a factor of two.

Despite the crudeness of the union bound (2), it has proven surprisingly hard to make substantial improvements on the trivial bound . In 1994, Chen obtained the slight improvement

which was improved a little by Chen and Cusick in 1999 to

when was prime. In a recent paper of Perarnau and Serra, the bound

was obtained for arbitrary . These bounds only improve upon the trivial bound by a multiplicative factor of . Heuristically, one reason for this is as follows. The union bound (2) would of course be sharp if the Bohr sets were all disjoint. Strictly speaking, such disjointness is not possible, because all the Bohr sets have to contain the origin as an interior point. However, it is possible to come up with a large number of Bohr sets which are *almost* disjoint. For instance, suppose that we had velocities that were all prime numbers between and , and that was equal to (and in particular was between and ). Then each set can be split into a “kernel” interval , together with the “petal” intervals . Roughly speaking, as the prime varies, the kernel interval stays more or less fixed, but the petal intervals range over disjoint sets, and from this it is not difficult to show that

so that the union bound is within a multiplicative factor of of the truth in this case.

This does not imply that is within a multiplicative factor of of , though, because there are not enough primes between and to assign to distinct velocities; indeed, by the prime number theorem, there are only about such velocities that could be assigned to a prime. So, while the union bound could be close to tight for up to Bohr sets, the above counterexamples don’t exclude improvements to the union bound for larger collections of Bohr sets. Following this train of thought, I was able to obtain a logarithmic improvement to previous lower bounds:

Theorem 3 For sufficiently large $n$, one has $\delta_n \geq \frac{1}{2n} + \frac{c \log n}{n^2 (\log\log n)^2}$ for some absolute constant $c > 0$.

The factors of $\log\log n$ in the denominator are for technical reasons and might perhaps be removable by a more careful argument. However it seems difficult to adapt the methods to improve the $\log n$ in the numerator, basically because of the obstruction provided by the near-counterexample discussed above.

Roughly speaking, the idea of the proof of this theorem is as follows. If we have the covering (1) for very close to , then the multiplicity function will then be mostly equal to , but occasionally be larger than . On the other hand, one can compute that the norm of this multiplicity function is significantly larger than (in fact it is at least ). Because of this, the norm must be very large, which means that the triple intersections must be quite large for many triples . Using some basic Fourier analysis and additive combinatorics, one can deduce from this that the velocities must have a large structured component, in the sense that there exists an arithmetic progression of length that contains of these velocities. For simplicity let us take the arithmetic progression to be , thus of the velocities lie in . In particular, from the prime number theorem, most of these velocities will not be prime, and will in fact likely have a “medium-sized” prime factor (in the precise form of the argument, “medium-sized” is defined to be “between and ”). Using these medium-sized prime factors, one can show that many of the will have quite a large overlap with many of the other , and this can be used after some elementary arguments to obtain a more noticeable improvement on the union bound (2) than was obtained previously.

A modification of the above argument also allows for the improved estimate

if one knows that *all* of the velocities are of size .

In my previous blog post, I showed that in order to prove the lonely runner conjecture, it suffices to do so under the additional assumption that all of the velocities are of size ; I reproduce this argument (slightly cleaned up for publication) in the current preprint. There is unfortunately a huge gap between and , so the above bound (3) does not immediately give any new bounds for . However, one could perhaps try to start attacking the lonely runner conjecture by increasing the range for which one has good results, and by decreasing the range that one can reduce to. For instance, in the current preprint I give an elementary argument (using a certain amount of case-checking) that shows that the lonely runner bound

holds if all the velocities are assumed to lie between and . This upper threshold of is only a tiny improvement over the trivial threshold of , but it seems to be an interesting sub-problem of the lonely runner conjecture to increase this threshold further. One key target would be to get up to , as there are actually a number of -tuples in this range for which (4) holds with equality. The Dirichlet approximation theorem of course gives the tuple , but there is also the double of this tuple, and furthermore there is an additional construction of Goddyn and Wong that gives some further examples such as , or more generally one can start with the standard tuple and accelerate one of the velocities to ; this turns out to work as long as shares a common factor with every integer between and . There are a few more examples of this type in the paper of Goddyn and Wong, but all of them can be placed in an arithmetic progression of length at most, so if one were very optimistic, one could perhaps envision a strategy in which the upper bound of mentioned earlier was reduced all the way to something like , and then a separate argument deployed to treat this remaining case, perhaps isolating the constructions of Goddyn and Wong (and possible variants thereof) as the only extreme cases.

Let $\tau(n) := \sum_{d|n} 1$ be the divisor function. A classical application of the Dirichlet hyperbola method gives the asymptotic

$$\sum_{n \leq x} \tau(n) \sim x \log x \qquad (1)$$

where $X \sim Y$ denotes the estimate $X = (1+o(1)) Y$ as $x \to \infty$. Much better error estimates are possible here, but we will not focus on the lower order terms in this discussion. For somewhat idiosyncratic reasons I will interpret this estimate (and the other analytic number theory estimates discussed here) through the probabilistic lens. Namely, if $n$ is a random number selected uniformly between $1$ and $x$, then the above estimate can be written as

$$\mathbf{E} \tau(n) \sim \log x,$$

that is to say the random variable $\tau(n)$ has mean approximately $\log x$. (But, somewhat paradoxically, this is not the median or mode behaviour of this random variable, which instead concentrates near $\log^{\log 2} x$, basically thanks to the Hardy-Ramanujan theorem.)

Now we turn to the pair correlations $\sum_{n \leq x} \tau(n) \tau(n+h)$ for a fixed positive integer $h$. There is a classical computation of Ingham that shows that

$$\sum_{n \leq x} \tau(n) \tau(n+h) \sim \frac{6}{\pi^2} \sigma_{-1}(h) x \log^2 x, \qquad (2)$$

where $\sigma_{-1}(h) := \sum_{d|h} \frac{1}{d}$.

The error term in (2) has been refined by many subsequent authors, as has the uniformity of the estimates in the $h$ aspect, as these topics are related to other questions in analytic number theory, such as fourth moment estimates for the Riemann zeta function; but we will not consider these more subtle features of the estimate here. However, we will look at the next term in the asymptotic expansion for (2) below the fold.

Using our probabilistic lens, the estimate (2) can be written as

$$\mathbf{E} \tau(n) \tau(n+h) \sim \frac{6}{\pi^2} \sigma_{-1}(h) \log^2 x. \qquad (3)$$

From (1) (and the asymptotic negligibility of the shift by $h$) we see that the random variables $\tau(n)$ and $\tau(n+h)$ both have a mean of $\sim \log x$, so the additional factor of $\frac{6}{\pi^2} \sigma_{-1}(h)$ represents some arithmetic coupling between the two random variables.

Ingham’s formula can be established in a number of ways. Firstly, one can expand out $\tau(n) = \sum_{d|n} 1$ and use the hyperbola method (splitting into the cases $d \leq \sqrt{x}$ and $n/d \leq \sqrt{x}$ and removing the overlap). If one does so, one soon arrives at the task of having to estimate sums of the form

$$\sum_{n \leq x: d|n} \tau(n+h)$$

for various $d \leq \sqrt{x}$. For $d$ much less than $\sqrt{x}$ this can be achieved using a further application of the hyperbola method, but for $d$ comparable to $\sqrt{x}$ things get a bit more complicated, necessitating the use of non-trivial estimates on Kloosterman sums in order to obtain satisfactory control on error terms. A more modern approach proceeds using automorphic form methods, as discussed in this previous post. A third approach, which unfortunately is only heuristic at the current level of technology, is to apply the Hardy-Littlewood circle method (discussed in this previous post) to express (2) in terms of exponential sums $\sum_{n \leq x} \tau(n) e(\alpha n)$ for various frequencies $\alpha$. The contribution of the “major arc” frequencies $\alpha$ can be computed after a moderately lengthy calculation which yields the right-hand side of (2) (as well as the correct lower order terms that are currently being suppressed), but there does not appear to be an easy way to show directly that the “minor arc” contributions are of lower order, although the methods discussed previously do indirectly show that this is ultimately the case.

Each of the methods outlined above requires a fair amount of calculation, and it is not obvious while performing them that the factor $\frac{6}{\pi^2} \sigma_{-1}(h)$ will emerge at the end. One can at least explain the $\frac{6}{\pi^2}$ as a normalisation constant needed to balance the $\sigma_{-1}(h)$ factor (at a heuristic level, at least). To see this through our probabilistic lens, introduce an independent copy $n'$ of $n$, then

$$\mathbf{E} \tau(n) \tau(n') = (\mathbf{E} \tau(n)) (\mathbf{E} \tau(n')) \sim \log^2 x; \qquad (4)$$

using symmetry to order $n' > n$ (discarding the diagonal case $n = n'$) and making the change of variables $n' = n + h$, we see that (4) is heuristically consistent with (3) as long as the asymptotic mean of $\frac{6}{\pi^2} \sigma_{-1}(h)$ in $h$ is equal to $1$. (This argument is not rigorous because there was an implicit interchange of limits present, but still gives a good heuristic “sanity check” of Ingham’s formula.) Indeed, if $\mathbf{E}_h$ denotes the asymptotic mean in $h$, then we have (heuristically at least)

$$\mathbf{E}_h \sigma_{-1}(h) = \sum_d \frac{1}{d} \mathbf{E}_h 1_{d|h} = \sum_d \frac{1}{d^2} = \frac{\pi^2}{6}$$

and we obtain the desired consistency after multiplying by $\frac{6}{\pi^2}$.
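The normalisation claim, namely that $\frac{6}{\pi^2} \sigma_{-1}(h)$ has asymptotic mean $1$ in $h$, is easy to test numerically. The following sketch tabulates $\sigma_{-1}(h) = \sum_{d|h} 1/d$ with a divisor sieve; the cutoff `LIMIT` is an arbitrary illustrative choice, and floating point arithmetic is used throughout.

```python
import math

def sigma_minus_one_table(limit):
    """Return a list whose entry h (for 1 <= h <= limit) is the sum of 1/d over d | h."""
    table = [0.0] * (limit + 1)
    for d in range(1, limit + 1):
        for multiple in range(d, limit + 1, d):
            table[multiple] += 1.0 / d
    return table

LIMIT = 20000  # arbitrary cutoff for the empirical mean
sigma = sigma_minus_one_table(LIMIT)
mean_sigma = sum(sigma[1:]) / LIMIT
normalised_mean = 6 / math.pi ** 2 * mean_sigma
print(mean_sigma, math.pi ** 2 / 6, normalised_mean)
```

The empirical mean falls slightly short of $\pi^2/6$ at any finite cutoff, since divisors $d$ larger than the cutoff are not counted.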

This still however does not explain the presence of the $\sigma_{-1}(h)$ factor. Intuitively it is reasonable that if $h$ has many prime factors, and $n$ has a lot of factors, then $n+h$ will have slightly more factors than average, because any common factor to $h$ and $n$ will automatically be acquired by $n+h$. But how to quantify this effect?

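One heuristic way to proceed is through analysis of local factors. Observe from the fundamental theorem of arithmetic that we can factor

$$\tau(n) = \prod_p \tau_p(n)$$

where the product is over all primes $p$, and $\tau_p(n) := \sum_{p^j | n} 1$ is the local version of $\tau(n)$ at $p$ (which in this case, is just one plus the $p$-valuation $v_p(n)$ of $n$: $\tau_p(n) = 1 + v_p(n)$). Note that all but finitely many of the terms in this product will equal $1$, so the infinite product is well-defined. In a similar fashion, we can factor

$$\sigma_{-1}(h) = \prod_p \sigma_{-1,p}(h)$$

where

$$\sigma_{-1,p}(h) := \sum_{p^j | h} \frac{1}{p^j}$$

(or in terms of valuations, $\sigma_{-1,p}(h) = (1 - p^{-v_p(h)-1}) / (1 - p^{-1})$). Heuristically, the Chinese remainder theorem suggests that the various factors $\tau_p(n)$ behave like independent random variables, and so the correlation between $\tau(n)$ and $\tau(n+h)$ should approximately decouple into the product of correlations between the local factors $\tau_p(n)$ and $\tau_p(n+h)$. And indeed we do have the following local version of Ingham’s asymptotics:

Proposition 1 (Local Ingham asymptotics) For fixed $p$ and integer $h$, we have

$$\mathbf{E} \tau_p(n) = \frac{p}{p-1} + o(1)$$

and

$$\mathbf{E} \tau_p(n) \tau_p(n+h) = \Big(1 - \frac{1}{p^2}\Big) \sigma_{-1,p}(h) \Big(\frac{p}{p-1}\Big)^2 + o(1).$$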

From the Euler formula

$$\prod_p \Big(1 - \frac{1}{p^2}\Big) = \frac{1}{\zeta(2)} = \frac{6}{\pi^2}$$

we see that

$$\frac{6}{\pi^2} \sigma_{-1}(h) = \prod_p \Big(1 - \frac{1}{p^2}\Big) \sigma_{-1,p}(h)$$

and so one can “explain” the arithmetic factor $\frac{6}{\pi^2} \sigma_{-1}(h)$ in Ingham’s asymptotic as the product of the arithmetic factors $(1 - \frac{1}{p^2}) \sigma_{-1,p}(h)$ in the (much easier) local Ingham asymptotics. Unfortunately we have the usual “local-global” problem in that we do not know how to rigorously derive the global asymptotic from the local ones; this problem is essentially the same issue as the problem of controlling the minor arc contributions in the circle method, but phrased in “physical space” language rather than “frequency space”.

Remark 2 The relation between the local means $\frac{p}{p-1}$ and the global mean $\log x$ can also be seen heuristically through the application

$$\prod_{p \leq x^{1/e^\gamma}} \frac{p}{p-1} \sim \log x$$

of Mertens’ theorem, where $1/e^\gamma$ is Pólya’s magic exponent, which serves as a useful heuristic limiting threshold in situations where the product of local factors is divergent.

Let us now prove this proposition. One could brute-force the computations by observing that for any fixed $j$, the valuation $v_p(n)$ is equal to $j$ with probability $\sim \frac{p-1}{p} \frac{1}{p^j}$, and with a little more effort one can also compute the joint distribution of $v_p(n)$ and $v_p(n+h)$, at which point the proposition reduces to the calculation of various variants of the geometric series. I however find it cleaner to proceed in a more recursive fashion (similar to how one can prove the geometric series formula by induction); this will also make visible the vague intuition mentioned previously about how common factors of $n$ and $h$ force $n+h$ to have a factor also.

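It is first convenient to get rid of error terms by observing that in the limit $x \to \infty$, the random variable $n$ converges vaguely to a uniform random variable $\mathbf{n}$ on the profinite integers $\hat{\mathbf{Z}}$, or more precisely that the pair $(v_p(n), v_p(n+h))$ converges vaguely to $(v_p(\mathbf{n}), v_p(\mathbf{n}+h))$. Because of this (and because of the easily verified uniform integrability properties of $\tau_p(n), \tau_p(n+h)$ and their powers), it suffices to establish the exact formulae

$$\mathbf{E} \tau_p(\mathbf{n}) = \frac{p}{p-1} \qquad (5)$$

$$\mathbf{E} \tau_p(\mathbf{n}) \tau_p(\mathbf{n}+h) = \Big(1 - \frac{1}{p^2}\Big) \sigma_{-1,p}(h) \Big(\frac{p}{p-1}\Big)^2 \qquad (6)$$

in the profinite setting (this setting will make it easier to set up the recursion).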

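We begin with (5). Observe that $\mathbf{n}$ is coprime to $p$ with probability $1 - \frac{1}{p}$, in which case $\tau_p(\mathbf{n})$ is equal to $1$. Conditioning to the complementary probability $\frac{1}{p}$ event that $\mathbf{n}$ is divisible by $p$, we can factor $\mathbf{n} = p \mathbf{n}'$ where $\mathbf{n}'$ is also uniformly distributed over the profinite integers, in which event we have $\tau_p(\mathbf{n}) = 1 + \tau_p(\mathbf{n}')$. We arrive at the identity

$$\mathbf{E} \tau_p(\mathbf{n}) = \Big(1 - \frac{1}{p}\Big) + \frac{1}{p} \big(1 + \mathbf{E} \tau_p(\mathbf{n}')\big).$$

As $\mathbf{n}$ and $\mathbf{n}'$ have the same distribution, the quantities $\mathbf{E} \tau_p(\mathbf{n})$ and $\mathbf{E} \tau_p(\mathbf{n}')$ are equal, and (5) follows by a brief amount of high-school algebra.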

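We use a similar method to treat (6). First treat the case when $h$ is coprime to $p$. Then we see that with probability $1 - \frac{2}{p}$, $\mathbf{n}$ and $\mathbf{n}+h$ are simultaneously coprime to $p$, in which case $\tau_p(\mathbf{n}) = \tau_p(\mathbf{n}+h) = 1$. Furthermore, with probability $\frac{1}{p}$, $\mathbf{n}$ is divisible by $p$ and $\mathbf{n}+h$ is not; in which case we can write $\mathbf{n} = p \mathbf{n}'$ as before, with $\tau_p(\mathbf{n}) = 1 + \tau_p(\mathbf{n}')$ and $\tau_p(\mathbf{n}+h) = 1$. Finally, in the remaining event with probability $\frac{1}{p}$, $\mathbf{n}+h$ is divisible by $p$ and $\mathbf{n}$ is not; we can then write $\mathbf{n}+h = p \mathbf{n}'$, so that $\tau_p(\mathbf{n}+h) = 1 + \tau_p(\mathbf{n}')$ and $\tau_p(\mathbf{n}) = 1$. Putting all this together, we obtain

$$\mathbf{E} \tau_p(\mathbf{n}) \tau_p(\mathbf{n}+h) = \Big(1 - \frac{2}{p}\Big) + \frac{2}{p} \big(1 + \mathbf{E} \tau_p(\mathbf{n}')\big)$$

and the claim (6) in this case follows from (5) and a brief computation (noting that $\sigma_{-1,p}(h) = 1$ in this case).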

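Now suppose that $h$ is divisible by $p$, thus $h = p h'$ for some integer $h'$. Then with probability $1 - \frac{1}{p}$, $\mathbf{n}$ and $\mathbf{n}+h$ are simultaneously coprime to $p$, in which case $\tau_p(\mathbf{n}) = \tau_p(\mathbf{n}+h) = 1$. In the remaining event, we can write $\mathbf{n} = p \mathbf{n}'$, and then $\tau_p(\mathbf{n}) = 1 + \tau_p(\mathbf{n}')$ and $\tau_p(\mathbf{n}+h) = 1 + \tau_p(\mathbf{n}'+h')$. Putting all this together we have

$$\mathbf{E} \tau_p(\mathbf{n}) \tau_p(\mathbf{n}+h) = \Big(1 - \frac{1}{p}\Big) + \frac{1}{p} \mathbf{E} \big(1 + \tau_p(\mathbf{n}')\big) \big(1 + \tau_p(\mathbf{n}'+h')\big)$$

which by (5) (and replacing $\mathbf{n}'$ by $\mathbf{n}$) leads to the recursive relation

$$\mathbf{E} \tau_p(\mathbf{n}) \tau_p(\mathbf{n}+h) = \frac{p+1}{p-1} + \frac{1}{p} \mathbf{E} \tau_p(\mathbf{n}) \tau_p(\mathbf{n}+h')$$

and (6) then follows by induction on the number of powers of $p$.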
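The local asymptotics can be checked numerically by averaging over a full period. This sketch compares the predicted value $(1 - \frac{1}{p^2}) \sigma_{-1,p}(h) (\frac{p}{p-1})^2$ (computed exactly with rationals) against the empirical mean of $\tau_p(n) \tau_p(n+h)$ over $1 \leq n \leq p^K$. The function names and the choices $p = 3$, $K = 9$ are illustrative assumptions; the finite average only approximates the profinite expectation, with an error that shrinks as $K$ grows.

```python
from fractions import Fraction

def tau_p(n, p):
    """Local divisor function: one plus the p-adic valuation of n (for n >= 1)."""
    count = 1
    while n % p == 0:
        n //= p
        count += 1
    return count

def sigma_local(h, p):
    """Sum of 1/p^j over all j >= 0 with p^j dividing h."""
    return sum(Fraction(1, p ** j) for j in range(tau_p(h, p)))

def predicted_correlation(h, p):
    """Right-hand side of the local Ingham formula, as an exact rational."""
    return (1 - Fraction(1, p ** 2)) * sigma_local(h, p) * Fraction(p, p - 1) ** 2

def empirical_correlation(h, p, exponent):
    """Mean of tau_p(n) * tau_p(n + h) over 1 <= n <= p**exponent."""
    N = p ** exponent
    return sum(tau_p(n, p) * tau_p(n + h, p) for n in range(1, N + 1)) / N

for h in (1, 3, 6, 9):
    print(h, float(predicted_correlation(h, 3)), empirical_correlation(h, 3, 9))
```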

The estimate (2) of Ingham was refined by Estermann, who obtained the more accurate expansion

for certain complicated but explicit coefficients . For instance, is given by the formula

where is the Euler-Mascheroni constant,

The formula for is similar but even more complicated. The error term was improved by Heath-Brown to ; it is conjectured (for instance by Conrey and Gonek) that one in fact has square root cancellation here, but this is well out of reach of current methods.

These lower order terms are traditionally computed either from a Dirichlet series approach (using Perron’s formula) or a circle method approach. It turns out that a refinement of the above heuristics can also predict these lower order terms, thus keeping the calculation purely in physical space as opposed to the “multiplicative frequency space” of the Dirichlet series approach, or the “additive frequency space” of the circle method, although the computations are arguably as messy as the latter computations for the purposes of working out the lower order terms. We illustrate this just for the term below the fold.

The twin prime conjecture, still unsolved, asserts that there are infinitely many primes $p$ such that $p+2$ is also prime. A more precise form of this conjecture is (a special case of) the Hardy-Littlewood prime tuples conjecture, which asserts that

$$\sum_{n \leq x} \Lambda(n) \Lambda(n+2) = (2\Pi_2 + o(1)) x \qquad (1)$$

as $x \to \infty$, where $\Lambda$ is the von Mangoldt function and $\Pi_2 = 0.6601\dots$ is the twin prime constant

$$\Pi_2 := \prod_{p > 2} \Big(1 - \frac{1}{(p-1)^2}\Big).$$

Because $\Lambda$ is almost entirely supported on the primes, it is not difficult to see that (1) implies the twin prime conjecture.

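One can give a heuristic justification of the asymptotic (1) (and hence the twin prime conjecture) via sieve theoretic methods. Recall that the von Mangoldt function can be decomposed as a Dirichlet convolution

$$\Lambda(n) = \sum_{d|n} \mu(d) \log \frac{n}{d}$$

where $\mu$ is the Möbius function. Because of this, we can rewrite the left-hand side of (1) as

$$\sum_{d \leq x} \mu(d) \sum_{n \leq x: d|n} \log \frac{n}{d} \Lambda(n+2). \qquad (2)$$

To compute this double sum, it is thus natural to consider sums such as

$$\sum_{n \leq x: d|n} \log \frac{n}{d} \Lambda(n+2)$$

or (to simplify things by removing the logarithm)

$$\sum_{n \leq x: d|n} \Lambda(n+2).$$

The prime number theorem in arithmetic progressions suggests that one has an asymptotic of the form

$$\sum_{n \leq x: d|n} \Lambda(n+2) \approx \frac{g(d)}{d} x \qquad (3)$$

where $g$ is the multiplicative function with $g(d) = 0$ for $d$ even and

$$g(d) := \prod_{p|d} \frac{p}{p-1}$$

for $d$ odd. Summing by parts, one then expects

$$\sum_{n \leq x: d|n} \log \frac{n}{d} \Lambda(n+2) \approx \frac{g(d)}{d} x \log \frac{x}{d}$$

and so we heuristically have

$$\sum_{n \leq x} \Lambda(n) \Lambda(n+2) \approx x \sum_{d \leq x} \frac{\mu(d) g(d)}{d} \log \frac{x}{d}.$$

The Dirichlet series

$$\sum_n \frac{\mu(n) g(n)}{n^s}$$

has an Euler product factorisation

$$\sum_n \frac{\mu(n) g(n)}{n^s} = \prod_{p > 2} \Big(1 - \frac{p}{(p-1) p^s}\Big)$$

for $\mathrm{Re}(s) > 1$; comparing this with the Euler product factorisation

$$\zeta(s) = \prod_p \Big(1 - \frac{1}{p^s}\Big)^{-1}$$

for the Riemann zeta function, and recalling that $\zeta$ has a simple pole of residue $1$ at $s = 1$, we see that

$$\sum_n \frac{\mu(n) g(n)}{n^s}$$

has a simple zero at $s = 1$ with first derivative

$$2 \prod_{p > 2} \Big(1 - \frac{1}{(p-1)^2}\Big) = 2 \Pi_2.$$

From this and standard multiplicative number theory manipulations, one can calculate the asymptotic

$$\sum_{d \leq x} \frac{\mu(d) g(d)}{d} \log \frac{x}{d} = 2 \Pi_2 + o(1)$$

which concludes the heuristic justification of (1).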
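As a numerical sanity check on the prediction $\sum_{n \leq x} \Lambda(n) \Lambda(n+2) \approx 2 \Pi_2 x$, one can tabulate the von Mangoldt function with a sieve and compare. This is only a sketch: the cutoff `X` is an arbitrary illustrative choice, the twin prime constant is approximated by truncating its Euler product at `X`, and agreement at such small scales is only to within a few percent.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes returning all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(limit + 1) if sieve[n]]

def von_mangoldt_table(limit):
    """Table with entry n equal to log p if n is a power of a prime p, else 0."""
    table = [0.0] * (limit + 1)
    for p in primes_up_to(limit):
        q = p
        while q <= limit:
            table[q] = math.log(p)
            q *= p
    return table

X = 200_000  # arbitrary cutoff for the experiment
lam = von_mangoldt_table(X + 2)
correlation = sum(lam[n] * lam[n + 2] for n in range(1, X + 1))
# Truncated Euler product over odd primes for the twin prime constant.
twin_prime_constant = math.prod(1 - 1 / (p - 1) ** 2 for p in primes_up_to(X) if p > 2)
ratio = correlation / (2 * twin_prime_constant * X)
print(twin_prime_constant, ratio)
```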

What prevents us from making the above heuristic argument rigorous, and thus proving (1) and the twin prime conjecture? Note that the variable $d$ in (2) ranges to be as large as $x$. On the other hand, the prime number theorem in arithmetic progressions (3) is not expected to hold for $d$ anywhere that large (for instance, the left-hand side of (3) vanishes as soon as $d$ exceeds $x$). The best unconditional result known of the type (3) is the Siegel-Walfisz theorem, which allows $d$ to be as large as $\log^{O(1)} x$. Even the powerful generalised Riemann hypothesis (GRH) only lets one prove an estimate of the form (3) for $d$ up to about $x^{1/2 - o(1)}$.

However, because of the averaging effect of the summation in $d$ in (2), we don’t need the asymptotic (3) to be true for *all* $d$ in a particular range; having it true for *almost all* $d$ in that range would suffice. Here the situation is much better; the celebrated Bombieri-Vinogradov theorem (sometimes known as “GRH on the average”) implies, roughly speaking, that the approximation (3) is valid for *almost all* $d \leq x^{1/2 - \varepsilon}$ for any fixed $\varepsilon > 0$. While this is not enough to control (2) or (1), the Bombieri-Vinogradov theorem can at least be used to control variants of (1) such as

$$\sum_{n \leq x} \Big(\sum_{d|n} \lambda_d\Big) \Lambda(n+2)$$

for various sieve weights $\lambda_d$ whose associated divisor function $\sum_{d|n} \lambda_d$ is supposed to approximate the von Mangoldt function $\Lambda$, although that theorem only lets one do this when the weights $\lambda_d$ are supported on the range $d \leq x^{1/2 - \varepsilon}$. This is still enough to obtain some partial results towards (1); for instance, by selecting weights according to the Selberg sieve, one can use the Bombieri-Vinogradov theorem to establish the upper bound

$$\sum_{n \leq x} \Lambda(n) \Lambda(n+2) \leq (4 + o(1)) \, 2 \Pi_2 x, \qquad (4)$$

which is off from (1) by a factor of about $4$. See for instance this blog post for details.

It has been difficult to improve upon the Bombieri-Vinogradov theorem in its full generality, although there are various improvements to certain restricted versions of the Bombieri-Vinogradov theorem, for instance in the famous work of Zhang on bounded gaps between primes. Nevertheless, it is believed that the Elliott-Halberstam conjecture (EH) holds, which roughly speaking would mean that (3) now holds for almost all for any fixed . (Unfortunately, the factor cannot be removed, as investigated in a series of papers by Friedlander, Granville, and also Hildebrand and Maier.) This comes tantalisingly close to having enough distribution to control all of (1). Unfortunately, it still falls short. Using this conjecture in place of the Bombieri-Vinogradov theorem leads to various improvements to sieve theoretic bounds; for instance, the factor of in (4) can now be improved to .

In two papers from the 1970s (which can be found online here and here respectively, the latter starting on page 255 of the pdf), Bombieri developed what is now known as the *Bombieri asymptotic sieve* to clarify the situation more precisely. First, he showed that on the Elliott-Halberstam conjecture, while one still could not establish the asymptotic (1), one could prove the generalised asymptotic

for all natural numbers , where the generalised von Mangoldt functions are defined by the formula

These functions behave like the von Mangoldt function, but are concentrated on -almost primes (numbers with at most prime factors) rather than primes. The right-hand side of (5) corresponds to what one would expect if one ran the same heuristics used to justify (1). Sadly, the case of (5), which is just (1), is just barely excluded from Bombieri’s analysis.
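For readers who want to experiment, here is a small sketch of these functions. It assumes the standard definition Λ_k = μ * L^k, that is, Λ_k(n) is the sum over divisors d of n of μ(d) log^k(n/d); the function names and the cutoff are hypothetical. The script checks that Λ_1 agrees with the von Mangoldt function at a few points and that Λ_k vanishes (up to rounding) on numbers with more than k distinct prime factors.

```python
import math

def mobius_table(limit):
    """mu[n] for 0 <= n <= limit (with mu[0] set to 0 as a placeholder)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(p, limit + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0
    return mu

def generalised_von_mangoldt(k, limit, mu):
    """Lambda_k(n) = sum over d | n of mu(d) * log(n/d)^k (assumed definition)."""
    lam_k = [0.0] * (limit + 1)
    for d in range(1, limit + 1):
        if mu[d] == 0:
            continue
        for m in range(1, limit // d + 1):
            lam_k[d * m] += mu[d] * math.log(m) ** k
    return lam_k

def distinct_prime_factors(n):
    """Number of distinct primes dividing n."""
    count, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            count += 1
            while n % p == 0:
                n //= p
        p += 1
    return count + (1 if n > 1 else 0)

LIMIT = 2000  # illustrative cutoff
mu = mobius_table(LIMIT)
lam = {k: generalised_von_mangoldt(k, LIMIT, mu) for k in (1, 2, 3)}

# Largest value of |Lambda_k(n)| over n with more than k distinct prime factors.
off_support = max(
    abs(lam[k][n])
    for k in (1, 2, 3)
    for n in range(1, LIMIT + 1)
    if distinct_prime_factors(n) > k
)
print("Lambda_1(8) =", lam[1][8], " log 2 =", math.log(2))
print("Lambda_2(6) =", lam[2][6], " 2 log2 log3 =", 2 * math.log(2) * math.log(3))
print("max |Lambda_k| off the expected support:", off_support)
```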

More generally, on the assumption of EH, the Bombieri asymptotic sieve provides the asymptotic

for any fixed and any tuple of natural numbers other than , where

is a further generalisation of the von Mangoldt function (now concentrated on -almost primes). By combining these asymptotics with some elementary identities involving the , together with the Weierstrass approximation theorem, Bombieri was able to control a wide family of sums including (1), except for one undetermined scalar . Namely, he was able to show (again on EH) that for any fixed and any continuous function on the simplex that had suitable vanishing at the boundary, the sum

when was even, where the integral on is with respect to the measure (this is Dirac measure in the case ). In particular, we have

and the twin prime conjecture would be proved if one could show that is bounded away from zero, while (1) is equivalent to the assertion that is equal to . Unfortunately, no additional bound beyond the inequalities provided by the Bombieri asymptotic sieve is known, even if one assumes all the major conjectures in number theory other than the prime tuples conjecture and its variants (e.g. GRH, GEH, GUE, abc, Chowla, …).

To put it another way, the Bombieri asymptotic sieve is able (on EH) to compute asymptotics for sums

without needing to know the unknown scalar , when is a function supported on almost primes of the form

for and some fixed , with vanishing elsewhere and for some continuous (symmetric) functions obeying some vanishing at the boundary, so long as the parity condition

is obeyed (informally: gives the same weight to products of an odd number of primes as to products of an even number of primes, or to put it another way, is asymptotically orthogonal to the Möbius function ). But when violates the parity condition, the asymptotic involves the unknown . This scalar thus embodies the “parity problem” for the twin prime conjecture (discussed in these previous blog posts).

Because the obstruction to the parity problem is only one-dimensional (on EH), one can replace any parity-violating weight (such as ) with any other parity-violating weight and obtain a logically equivalent estimate. For instance, to prove the twin prime conjecture on EH, it would suffice to show that

for some fixed , or equivalently that there are solutions to the equation in primes with and . (In some cases, this sort of reduction can also be made using other sieves than the Bombieri asymptotic sieve, as was observed by Ng.) As another example, the Bombieri asymptotic sieve can be used to show that the asymptotic (1) is equivalent to the asymptotic

where is the set of numbers that are *rough* in the sense that they have no prime factors less than for some fixed (the function clearly correlates with and so must violate the parity condition). One can replace with similar sieve weights (e.g. a Selberg sieve) that concentrate on almost primes if desired.

As it turns out, if one is willing to strengthen the assumption of the Elliott-Halberstam (EH) conjecture to the assumption of the *generalised Elliott-Halberstam (GEH) conjecture* (as formulated for instance in Claim 2.6 of the Polymath8b paper), one can also swap the factor in the above asymptotics with other parity-violating weights and obtain a logically equivalent estimate, as the Bombieri asymptotic sieve also applies to weights such as under the assumption of GEH. For instance, on GEH one can use two such applications of the Bombieri asymptotic sieve to show that the twin prime conjecture would follow if one could show that there are solutions to the equation

in primes with and , for some . Similarly, on GEH the asymptotic (1) is equivalent to the asymptotic

for some fixed , and similarly with replaced by other sieves. This form of the quantitative twin primes conjecture is appealingly similar to the (special case)

of the Chowla conjecture, for which there has been some recent progress (discussed for instance in these recent posts). Informally, the Bombieri asymptotic sieve lets us (on GEH) view the twin prime conjecture as a sort of Chowla conjecture restricted to almost primes. Unfortunately, the recent progress on the Chowla conjecture relies heavily on the multiplicativity of at small primes, which is completely destroyed by inserting a weight such as , so this does not yet yield a viable path towards the twin prime conjecture even assuming GEH. Still, the similarity is striking, and one can hope that further ways to attack the Chowla conjecture may emerge that could impact the twin prime conjecture. (Alternatively, if one assumes a sufficiently optimistic version of the GEH, one could perhaps relax the notion of “almost prime” to the extent that one could start usefully using multiplicativity at smallish primes, though this seems rather wishful at present, particularly since the most optimistic versions of GEH are known to be false.)
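For comparison, the unweighted correlation in the special case of the Chowla conjecture is easy to examine numerically. The sketch below assumes that the special case in question is the statement that the average of λ(n)λ(n+2) over n up to x tends to zero, with λ the Liouville function; the function name and the cutoff are illustrative, and a finite computation is of course only suggestive.

```python
def liouville_table(limit):
    """lam[n] = (-1)^(number of prime factors of n, counted with multiplicity)."""
    # Smallest-prime-factor sieve.
    spf = list(range(limit + 1))
    for p in range(2, int(limit ** 0.5) + 1):
        if spf[p] == p:
            for m in range(p * p, limit + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0] * (limit + 1)
    lam[1] = 1
    for n in range(2, limit + 1):
        # Removing one prime factor flips the sign.
        lam[n] = -lam[n // spf[n]]
    return lam

x = 100_000  # illustrative cutoff
lam = liouville_table(x + 2)
average = sum(lam[n] * lam[n + 2] for n in range(1, x + 1)) / x
print(f"average of lambda(n) lambda(n+2) for n <= {x}: {average:+.5f}")
```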

The Bombieri asymptotic sieve is already well explained in the original two papers of Bombieri; there is also a slightly different treatment of the sieve by Friedlander and Iwaniec, as well as a simplified version in the book of Friedlander and Iwaniec (in which the distribution hypothesis is strengthened in order to shorten the arguments). I’ve decided though to write up my own notes on the sieve below the fold; this is primarily for my own benefit, but may be useful to some readers also. I largely follow the treatment of Bombieri, with the one idiosyncratic twist of replacing the usual “elementary” Selberg sieve with the “analytic” Selberg sieve used in particular in many of the breakthrough works in small gaps between primes; I prefer working with the latter due to its Fourier-analytic flavour.

** — 1. Controlling generalised von Mangoldt sums — **

To prove (5), we shall first generalise it, by replacing the sequence by a more general sequence obeying the following axioms:

- (i) (Non-negativity) One has for all .
- (ii) (Crude size bound) One has for all , where is the divisor function.
- (iii) (Size) We have for some constant .
- (iv) (Elliott-Halberstam type conjecture) For any , one has
where is a multiplicative function with for all primes and .

These axioms are a little bit stronger than what is actually needed to make the Bombieri asymptotic sieve work, but we will not attempt to work with the weakest possible axioms here.

We introduce the function

which is analytic for ; in particular it can be evaluated at to yield

There are two model examples of data to keep in mind. The first, discussed in the introduction, is when , then and is as in the introduction; one of course needs EH to justify axiom (iv) in this case. The other is when , in which case and for all . We will later take advantage of the second example to avoid doing some (routine, but messy) main term computations.

The main result of this section is then

Theorem 1 Let be as above. Let be a tuple of natural numbers (independent of ) that is not equal to . Then one has the asymptotic as , where .

Note that this recovers (5) (on EH) as a special case.

We now begin the proof of this theorem. Henceforth we allow implied constants in the or notation to depend on and .

It will be convenient to replace the range by a shorter range by the following standard localisation trick. Let be a large quantity depending on to be chosen later, and let denote the interval . We will show the estimate

from which the original claim follows by a routine summation argument. Observe from axiom (iv) and the triangle inequality that

for any .

Write for the logarithm function , thus for any . Without loss of generality we may assume that ; we then factor , where

This function is just when . When the function is more complicated, but we at least have the following crude bound:

*Proof:* We induct on . The case is obvious, so suppose and the claim has already been proven for . Since , we see from induction hypothesis and the triangle inequality that

Since by Möbius inversion, the claim follows.

We can write

In the region , we have . Thus

for . The contribution of the error term to (10) is easily seen to be negligible if is large enough, so we may freely replace with with little difficulty.

If we insert this replacement directly into the left-hand side of (10) and rearrange, we get

We can’t quite control this using axiom (iv) because the range of is a bit too big, as explained in the introduction. So let us introduce a truncated function

where is a small quantity to be chosen later, and is a smooth function that equals on and equals on . Suppose one could establish the following two estimates for any fixed :

where is a quantity that depends on but not on . Then on combining the two estimates we would have

One could in principle compute explicitly from the proof of (13), but one can avoid doing so by the following comparison trick. In the special case , standard multiplicative number theory (noting that the Dirichlet series has a pole of order at , with top Laurent coefficient ) gives the asymptotic

which when compared with (14) for (recalling that in this case) gives the formula

Inserting this back into (14) and recalling that can be made arbitrarily small, we obtain (10).

As it turns out, the estimate (13) is easy to establish, but the estimate (12) is not, roughly speaking because the typical number in has too many divisors in the range , each of which gives a contribution to the error term. (In the book of Friedlander and Iwaniec, the estimate (12) is established anyway, but only after assuming a stronger version of (iv), roughly speaking in which is allowed to be as large as .) To resolve this issue, we will insert a preliminary sieve that will remove most of the potential divisors in the range (leaving only about such divisors on the average for typical ), making the analogue of (12) easier to prove (at the cost of making the analogue of (13) more difficult). Namely, if one can find a function for which one has the estimates

for some quantity that depends on but not on , then by repeating the previous arguments we will again be able to establish (10).

The key estimate is (16). As we shall see, when comparing with , the weight will cost us a factor of , but the term in the definitions of and will recover a factor of , which will give the desired bound since we are assuming .

One has some flexibility in how to select the weight : basically any standard sieve that uses divisors of size at most to localise (at least approximately) to numbers that are rough in the sense that they have no (or at least very few) factors less than , will do. We will use the analytic Selberg sieve choice

where is a smooth function supported on that equals on .
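A toy version of such a weight can be computed directly. The sketch below assumes the usual shape of the analytic Selberg sieve, namely ν(n) equal to the square of the sum over divisors d of n of μ(d) ψ(log d / log R), with ψ a smooth cutoff supported on [-1,1] that equals 1 on [-1/2,1/2]. The sieve level `R`, the particular cutoff `smooth_cutoff` and the other names are illustrative assumptions rather than the choices made in the text.

```python
import math

def mobius_table(limit):
    """mu[n] for 0 <= n <= limit (with mu[0] set to 0 as a placeholder)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(p, limit + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0
    return mu

def smooth_cutoff(t):
    """Smooth even function equal to 1 for |t| <= 1/2 and 0 for |t| >= 1."""
    t = abs(t)
    if t <= 0.5:
        return 1.0
    if t >= 1.0:
        return 0.0
    u = (t - 0.5) / 0.5  # rescaled to the open interval (0, 1)
    a = math.exp(-1.0 / (1.0 - u))
    b = math.exp(-1.0 / u)
    return a / (a + b)

def selberg_weights(limit, R):
    """nu[n] = (sum over d | n of mu(d) * psi(log d / log R))^2 (assumed form)."""
    mu = mobius_table(limit)
    log_R = math.log(R)
    inner = [0.0] * (limit + 1)
    for d in range(1, limit + 1):
        if mu[d] == 0:
            continue
        coeff = mu[d] * smooth_cutoff(math.log(d) / log_R)
        if coeff == 0.0:
            continue  # only divisors d < R contribute
        for m in range(d, limit + 1, d):
            inner[m] += coeff
    return [v * v for v in inner]

def least_prime_factor(n):
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p
        p += 1
    return n

LIMIT, R = 5000, 20  # illustrative parameters
nu = selberg_weights(LIMIT, R)
rough = [n for n in range(2, LIMIT + 1) if least_prime_factor(n) > R]
mean_weight = sum(nu[1:]) / LIMIT
print("mean of nu over n <= LIMIT:", round(mean_weight, 4))
print("nu is identically 1 on R-rough numbers:", all(nu[n] == 1.0 for n in rough))
```

The weight is non-negative by construction and equals 1 on numbers with no prime factor up to R, because for such numbers only the divisor d = 1 survives the cutoff.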

It remains to establish the bounds (15), (16), (17). To warm up and introduce the various methods needed, we begin with the standard bound

where denotes the derivative of . Note the loss of that had previously been pointed out. In the arguments that follow I will be a little brief with the details, as they are standard (see e.g. this previous post).

We now prove (19). The left-hand side can be expanded as

where denotes the least common multiple of and . From the support of we see that the summand is only non-vanishing when . We now use axiom (iv) and split the left-hand side into a main term

and an error term that is at most

From axiom (ii) and elementary multiplicative number theory, we have the bound

so from axiom (iv) and Cauchy-Schwarz we see that the error term (20) is acceptable. Thus it will suffice to establish the bound

The summand here is almost, but not quite, multiplicative in . To make it genuinely multiplicative, we perform a (shifted) Fourier expansion

for some rapidly decreasing function (essentially the Fourier transform of ). Thus

and so the left-hand side of (21) can be rearranged using Fubini’s theorem as

We can factorise as an Euler product:

Taking absolute values and using Mertens’ theorem leads to the crude bound

which when combined with the rapid decrease of , allows us to restrict the region of integration in (23) to the square (say) with negligible error. Next, we use the Euler product

for to factorise

where

For with nonnegative real part, one has

and so by the Weierstrass -test, is continuous at . Since

we thus have

Also, since has a pole of order at with residue , we have

and thus

The quantity (23) can thus be written, up to errors of , as

Using the rapid decrease of , we may remove the restriction on , and it will now suffice to prove the identity

But on differentiating and then squaring (22) we have

and the claim follows by integrating in from zero to infinity (noting that vanishes for ).
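For the reader's convenience, here is one way the final computation can be written out. This is a sketch under an assumed normalisation: the cutoff is denoted by ψ and the rapidly decreasing function by f, and the Fourier expansion (22) is taken in the form shown in the first line. The signs and constants in the text's own displays may differ.

```latex
% Assumed form of the (shifted) Fourier expansion (22):
\psi(u) = \int_{\mathbb{R}} f(t)\, e^{-(1+it)u}\, dt .
% Differentiating in u:
\psi'(u) = - \int_{\mathbb{R}} (1+it)\, f(t)\, e^{-(1+it)u}\, dt .
% Squaring:
\psi'(u)^2 = \int_{\mathbb{R}} \int_{\mathbb{R}} (1+it)(1+it')\, f(t) f(t')\,
             e^{-(2+it+it')u}\, dt\, dt' .
% Integrating over u from 0 to infinity, using
% \int_0^\infty e^{-(2+it+it')u}\, du = \frac{1}{2+it+it'} :
\int_0^\infty \psi'(u)^2\, du
  = \int_{\mathbb{R}} \int_{\mathbb{R}} \frac{(1+it)(1+it')}{2+it+it'}\,
    f(t) f(t')\, dt\, dt' .
```

The interchange of integrals is justified by the rapid decrease of f, and the left-hand integral is really over a bounded range because ψ' vanishes outside the support of ψ.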

We have the following variant of (19):

for any . We also have the variant

If in addition has no prime factors less than for some fixed , one has

Roughly speaking, the above estimates assert that is concentrated on those numbers with no prime factors much less than , but factors without such small prime divisors occur with about the same relative density as they do in the integers.

*Proof:* The left-hand side of (24) can be expanded as

If we define

then the previous expression can be written as

while one has

which gives (25) from Axiom (iv). To prove (24), it now suffices to show that

Arguing as before, the left-hand side is

where

From Mertens’ theorem we have

when , so the contribution of the terms where can be absorbed into the error (after increasing that error slightly). For the remaining contributions, we see that

where if does not divide , and

if divides times for some . In the latter case, Taylor expansion gives the bounds

and the claim (28) follows. When and we have

and (27) follows by repeating the previous calculations. Finally, (26) is proven similarly to (24) (using in place of ).

Now we can prove (15), (16), (17). We begin with (15). Using the Leibniz rule applied to the identity and using and Möbius inversion (and the associativity and commutativity of Dirichlet convolution) we see that

Next, by applying the Leibniz rule to for some and using (29) we see that

and hence we have the recursive identity
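These convolution identities are easy to test numerically. The sketch below assumes that (29) is the identity Lμ = -μ * Λ and that the recursive identity is Λ_{k+1} = LΛ_k + Λ * Λ_k, with Λ_k = μ * L^k; these are the standard forms, but the names and the cutoff used here are illustrative.

```python
import math

def mobius_table(limit):
    """mu[n] for 0 <= n <= limit (with mu[0] set to 0 as a placeholder)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(p, limit + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0
    return mu

def dirichlet_convolve(f, g, limit):
    """(f * g)(n) = sum over ab = n of f(a) g(b), for 1 <= n <= limit."""
    h = [0.0] * (limit + 1)
    for a in range(1, limit + 1):
        if f[a] == 0:
            continue
        for b in range(1, limit // a + 1):
            h[a * b] += f[a] * g[b]
    return h

LIMIT = 1000  # illustrative cutoff

def log_power(k):
    """The arithmetic function n -> (log n)^k, with a placeholder at index 0."""
    return [0.0] + [math.log(n) ** k for n in range(1, LIMIT + 1)]

mu = [float(v) for v in mobius_table(LIMIT)]
L = log_power(1)
# Lambda_k = mu * L^k (assumed definition); lam[1] is the von Mangoldt function.
lam = {k: dirichlet_convolve(mu, log_power(k), LIMIT) for k in (1, 2, 3)}

# Identity A (assumed form of (29)): L mu = -(mu * Lambda).
mu_conv_lam = dirichlet_convolve(mu, lam[1], LIMIT)
err_a = max(abs(L[n] * mu[n] + mu_conv_lam[n]) for n in range(1, LIMIT + 1))

# Identity B: Lambda_{k+1} = L Lambda_k + Lambda * Lambda_k.
def recursion_error(k):
    conv = dirichlet_convolve(lam[k], lam[1], LIMIT)
    return max(abs(lam[k + 1][n] - (L[n] * lam[k][n] + conv[n]))
               for n in range(1, LIMIT + 1))

err_b = max(recursion_error(1), recursion_error(2))
print("max error in L mu = -(mu * Lambda):", err_a)
print("max error in the recursion for k = 1, 2:", err_b)
```

Both errors should be at the level of floating-point rounding.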

In particular, from induction we see that is supported on numbers with at most distinct prime factors, and hence is supported on numbers with at most distinct prime factors. In particular, from (18) we see that on the support of . Thus it will suffice to show that

If and , then has at most distinct prime factors , with . If we factor , where is the contribution of those with , and is the contribution of those with , then at least one of the following two statements holds:

- (a) (and hence ) is divisible by a square number of size at least .
- (b) .

The contribution of case (a) is easily seen to be acceptable by axiom (ii). For case (b), we observe from (30) and induction that

and so it will suffice to show that

where ranges over numbers bounded by with at most distinct prime factors, the smallest of which is at most , and consists of those numbers with no prime factor less than or equal to . Applying (26) (with replaced by ) gives the bound

so by (25) it suffices to show that

subject to the same constraints on as before. The contribution of those with distinct prime factors can be bounded by

applying Mertens’ theorem and summing over , one obtains the claim.

Now we show (16). As discussed previously in this section, we can replace by with negligible error. Comparing this with (16) and (11), we see that it suffices to show that

From the support of , the summand on the left-hand side is only non-zero when , which makes , where we use the crucial hypothesis to gain enough powers of to make the argument here work. Applying Lemma 2, we reduce to showing that

We can make the change of variables to flip the sum

and then swap the sums to reduce to showing that

By Lemma 3, it suffices to show that

To prove this, we use the Rankin trick, bounding the implied weight by . We can then bound the left-hand side by the Euler product

which can be bounded by

and the claim follows from Mertens’ theorem.
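Since Mertens' theorem is invoked repeatedly in this section, a quick numerical illustration may be helpful. The sketch below checks two standard forms: the product of (1 - 1/p) over primes p up to x is approximately e^{-γ}/log x, and the sum of (log p)/p over the same primes is log x + O(1). The cutoff and the names are illustrative.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, limit + 1, p):
                is_prime[m] = False
    return [n for n in range(2, limit + 1) if is_prime[n]]

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

x = 100_000  # illustrative cutoff
primes = primes_up_to(x)

product = 1.0
log_sum = 0.0
for p in primes:
    product *= 1.0 - 1.0 / p
    log_sum += math.log(p) / p

normalised_product = product * math.log(x)  # should be close to exp(-gamma)
log_sum_gap = log_sum - math.log(x)         # should stay bounded as x grows

print("product * log x =", round(normalised_product, 5),
      " exp(-gamma) =", round(math.exp(-EULER_GAMMA), 5))
print("sum of log p / p minus log x =", round(log_sum_gap, 5))
```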

Finally, we show (17). By (11), the left-hand side expands as

We let be a small constant to be chosen later. We divide the outer sum into two ranges, depending on whether only has prime factors greater than or not. In the former case, we can apply (27) to write this contribution as

plus a negligible error, where the is implicitly restricted to numbers with all prime factors greater than . The main term is messy, but it is of the required form up to an acceptable error, so there is no need to compute it any further. It remains to consider those that have at least one prime factor less than . Here we use (24) instead of (27) as well as Lemma 3 to dominate this contribution by

up to negligible errors, where is now restricted to have at least one prime factor less than . This forces at least one of the factors to be at most . A routine application of Rankin’s trick shows that

and so the total contribution of this case is . Since can be made arbitrarily small, (17) follows.

** — 2. Weierstrass approximation — **

Having proved Theorem 1, we now take linear combinations of this theorem, combined with the Weierstrass approximation theorem, to give the asymptotics (7), (8) described in the introduction.

Let , , , be as in that theorem. It will be convenient to normalise the weights by to make their mean value comparable to . From Theorem 1 and summation by parts we have

whenever does not consist entirely of ones.

We now take a closer look at what happens when does consist entirely of ones. Let denote the -tuple . Convolving the case of (30) with copies of for some and using the Leibniz rule, we see that

and hence

Multiplying by and summing over , and using (31) to control the term, one has

If we define (up to an error of ) by the formula

then an induction shows that

for odd , and

for even . In particular, after adjusting by if necessary, we have since the left-hand sides are non-negative.

If we now define the comparison sequence , standard multiplicative number theory shows that the above estimates also hold when is replaced by ; thus

for both odd and even . The bound (31) also holds for when does not consist entirely of ones, and hence

for any fixed (which may or may not consist entirely of ones).

Next, from induction (on ), the Leibniz rule, and (30), we see that for any and , , the function

is a finite linear combination of functions of the form for tuples that may possibly consist entirely of ones. We thus have

whenever is one of these functions (32). Specialising to the case , we thus have

where . The contribution of those that are powers of primes can be easily seen to be negligible, leading to

where now . The contribution of the case where two of the primes agree can also be seen to be negligible, as can the error when replacing with , and then by symmetry

By linearity, this implies that

for any polynomial that vanishes on the coordinate hyperplanes . The right-hand side can also be evaluated by Mertens’ theorem as

when is odd and

when is even. Using the Weierstrass approximation theorem, we then have

for any continuous function that is compactly supported in the interior of . Computing the right-hand side using Mertens’ theorem as before, we obtain the claimed asymptotics (7), (8).

Remark 4 The Bombieri asymptotic sieve has to use the full power of EH (or GEH); there are constructions due to Ford that show that if one only has a distributional hypothesis up to for some fixed constant , then the asymptotics of sums such as (5), or more generally (9), are not determined by a single scalar parameter , but can also vary in other ways as well. Thus the Bombieri asymptotic sieve really is asymptotic; in order to get type error terms one needs the level of distribution to be asymptotically equal to as . Related to this, the quantitative decay of the error terms in the Bombieri asymptotic sieve is extremely poor; in particular, it depends on the dependence of the implied constant in axiom (iv) on the parameters , for which there is no consensus on what one should conjecturally expect.
