The basic objects of study in multiplicative number theory are the arithmetic functions: functions $f: \mathbf{N} \to \mathbf{C}$ from the natural numbers to the complex numbers. Some fundamental examples of such functions include

- The constant function $1: n \mapsto 1$;
- The Kronecker delta function $\delta: n \mapsto 1_{n=1}$;
- The natural logarithm function $L: n \mapsto \log n$;
- The divisor function $d_2: n \mapsto \sum_{d|n} 1$;
- The von Mangoldt function $\Lambda$, with $\Lambda(n)$ defined to equal $\log p$ when $n$ is a power $p^j$ of a prime $p$ for some $j \geq 1$, and defined to equal zero otherwise; and
- The Möbius function $\mu$, with $\mu(n)$ defined to equal $(-1)^k$ when $n$ is the product of $k$ distinct primes, and defined to equal zero otherwise.
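
The examples in this list are easy to experiment with numerically. The following is a minimal Python sketch, assuming the standard definitions of these functions; the helper `factorize` and all function names are illustrative choices rather than notation from the post, and trial division is used only because it is short, not because it is efficient.

```python
import math

def factorize(n):
    """Return the prime factorisation of n >= 1 as a dict {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def one(n):
    """The constant function 1."""
    return 1

def delta(n):
    """The Kronecker delta: 1 at n = 1, zero elsewhere."""
    return 1 if n == 1 else 0

def log_fn(n):
    """The natural logarithm function L."""
    return math.log(n)

def d2(n):
    """The divisor function: the number of divisors of n."""
    return math.prod(e + 1 for e in factorize(n).values())

def mangoldt(n):
    """The von Mangoldt function: log p if n is a prime power p^j, else 0."""
    factors = factorize(n)
    if len(factors) == 1:
        return math.log(next(iter(factors)))
    return 0.0

def mobius(n):
    """The Moebius function: (-1)^k if n is a product of k distinct primes, else 0."""
    factors = factorize(n)
    if any(e > 1 for e in factors.values()):
        return 0
    return (-1) ** len(factors)

# Tabulate the integer-valued examples for small n.
for n in range(1, 13):
    print(n, delta(n), d2(n), mobius(n), round(mangoldt(n), 4))
```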

Given an arithmetic function $f$, we are often interested in statistics such as the summatory function

$$\sum_{n \leq x} f(n), \ \ \ \ \ (1)$$

the logarithmically (or harmonically) weighted summatory function

$$\sum_{n \leq x} \frac{f(n)}{n}, \ \ \ \ \ (2)$$

or the Dirichlet series

$$\mathcal{D} f(s) := \sum_n \frac{f(n)}{n^s}.$$

In the latter case, one typically has to first restrict to those complex numbers $s$ whose real part is large enough in order to ensure the series on the right converges; but in many important cases, one can then extend the Dirichlet series to almost all of the complex plane by analytic continuation. One is also interested in correlations involving additive shifts, such as $\sum_{n \leq x} f(n) g(n+1)$, but these are significantly more difficult to study and cannot be easily estimated by the methods of classical multiplicative number theory.

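A key operation on arithmetic functions is that of Dirichlet convolution, which when given two arithmetic functions $f, g$, forms a new arithmetic function $f*g$, defined by the formula

$$f*g(n) := \sum_{d|n} f(d) g\left(\frac{n}{d}\right).$$

Thus for instance $1*1 = d_2$, $1*\Lambda = L$, $1*\mu = \delta$, and $\delta*f = f$ for any arithmetic function $f$. Dirichlet convolution and Dirichlet series are related by the fundamental formula

$$\mathcal{D}(f*g)(s) = \mathcal{D}f(s) \mathcal{D}g(s), \ \ \ \ \ (3)$$

at least when the real part of $s$ is large enough that all sums involved become absolutely convergent (but in practice one can use analytic continuation to extend this identity to most of the complex plane). There is also the identity

$$\mathcal{D}(Lf)(s) = -\frac{d}{ds} \mathcal{D}f(s),$$

at least when the real part of $s$ is large enough to justify interchange of differentiation and summation. As a consequence, many Dirichlet series can be expressed in terms of the Riemann zeta function $\zeta = \mathcal{D}1$, thus for instance

$$\mathcal{D}\delta(s) = 1, \quad \mathcal{D}L(s) = -\zeta'(s), \quad \mathcal{D}d_2(s) = \zeta^2(s), \quad \mathcal{D}\Lambda(s) = -\frac{\zeta'(s)}{\zeta(s)}, \quad \mathcal{D}\mu(s) = \frac{1}{\zeta(s)}.$$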
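
The standard Dirichlet convolution identities $1*1 = d_2$, $1*\mu = \delta$, $1*\Lambda = L$ and $\delta*f = f$ can be checked by brute force on an initial segment of the natural numbers. The sketch below is only an illustration: the cutoff `N`, the table layout (lists indexed from 1) and the helper `dirichlet_convolve` are assumptions made here, not anything prescribed by the post.

```python
import math

N = 200  # check the identities for all n <= N

def dirichlet_convolve(f, g, N):
    """Return [None, (f*g)(1), ..., (f*g)(N)] for tables f, g indexed from 1."""
    h = [None] + [0] * N
    for d in range(1, N + 1):
        for m in range(1, N // d + 1):
            h[d * m] += f[d] * g[m]
    return h

primes = [p for p in range(2, N + 1)
          if all(p % q for q in range(2, math.isqrt(p) + 1))]

# Tables of the basic arithmetic functions, each computed independently.
one = [None] + [1] * N
delta = [None] + [1 if n == 1 else 0 for n in range(1, N + 1)]
log_table = [None] + [math.log(n) for n in range(1, N + 1)]
d2 = [None] + [sum(1 for d in range(1, n + 1) if n % d == 0)
               for n in range(1, N + 1)]

mu = [None] + [1] * N
for p in primes:
    for n in range(p, N + 1, p):
        mu[n] = -mu[n]          # one sign flip per prime factor
    for n in range(p * p, N + 1, p * p):
        mu[n] = 0               # not squarefree

mangoldt = [None] + [0.0] * N
for p in primes:
    q = p
    while q <= N:
        mangoldt[q] = math.log(p)
        q *= p

print(dirichlet_convolve(one, one, N)[1:] == d2[1:])      # 1 * 1 = d_2
print(dirichlet_convolve(one, mu, N)[1:] == delta[1:])    # 1 * mu = delta
print(max(abs(a - b) for a, b in
          zip(dirichlet_convolve(one, mangoldt, N)[1:], log_table[1:])))  # 1 * Lambda = L
```

The last line prints the largest discrepancy between $1*\Lambda$ and $L$ on the range checked, which should be at the level of floating-point rounding.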

Much of the difficulty of multiplicative number theory can be traced back to the discrete nature of the natural numbers $\mathbf{N}$, which form a rather complicated abelian semigroup with respect to multiplication (in particular the set of generators is the set of prime numbers). One can obtain a simpler analogue of the subject by working instead with the half-infinite interval $[1,+\infty)$, which is a much simpler abelian semigroup under multiplication (being a one-dimensional Lie semigroup). (I will think of this as a sort of “completion” of $\mathbf{N}$ at the infinite place $\infty$, hence the terminology.) Accordingly, let us define a *continuous arithmetic function* to be a locally integrable function $f_\infty: [1,+\infty) \to \mathbf{C}$. The analogue of the summatory function (1) is then an integral

$$\int_1^x f_\infty(t)\ dt,$$

and similarly the analogue of (2) is

$$\int_1^x \frac{f_\infty(t)}{t}\ dt.$$

The analogue of the Dirichlet series is the Mellin-type transform

$$\mathcal{D}_\infty f_\infty(s) := \int_1^\infty \frac{f_\infty(t)}{t^s}\ dt,$$

which will be well-defined at least if the real part of $s$ is large enough and if the continuous arithmetic function $f_\infty$ does not grow too quickly, and hopefully will also be defined elsewhere in the complex plane by analytic continuation.

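For instance, the continuous analogue of the discrete constant function $1: \mathbf{N} \to \mathbf{C}$ would be the constant function $t \mapsto 1$, which maps any $t \in [1,+\infty)$ to $1$, and which we will denote by $1_\infty$ in order to keep it distinct from $1$. The two functions $1_\infty$ and $1$ have approximately similar statistics; for instance one has

$$\sum_{n \leq x} 1 = \lfloor x \rfloor \approx x-1 = \int_1^x 1\ dt$$

and

$$\sum_{n \leq x} \frac{1}{n} = H_{\lfloor x \rfloor} \approx \log x = \int_1^x \frac{1}{t}\ dt$$

where $H_n$ is the $n^{\mathrm{th}}$ harmonic number, and we are deliberately vague as to what the symbol $\approx$ means. Continuing this analogy, we would expect

$$\mathcal{D}1(s) = \zeta(s) \approx \frac{1}{s-1} = \mathcal{D}_\infty 1_\infty(s)$$

which reflects the fact that $\zeta$ has a simple pole at $s=1$ with residue $1$, and no other poles. Note that the identity $\mathcal{D}_\infty 1_\infty(s) = \frac{1}{s-1}$ is initially only valid in the region $\mathrm{Re}(s) > 1$, but clearly the right-hand side can be continued analytically to the entire complex plane except for the pole at $s=1$, and so one can define $\mathcal{D}_\infty 1_\infty$ in this region also.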

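In a similar vein, the logarithm function $L: \mathbf{N} \to \mathbf{C}$ is approximately similar to the logarithm function $L_\infty: [1,+\infty) \to \mathbf{C}$, giving for instance the crude form

$$\sum_{n \leq x} L(n) = \log \lfloor x \rfloor! \approx x \log x - x + 1 = \int_1^x L_\infty(t)\ dt$$

of Stirling’s formula, or the Dirichlet series approximation

$$\mathcal{D}L(s) = -\zeta'(s) \approx \frac{1}{(s-1)^2} = \mathcal{D}_\infty L_\infty(s).$$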

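The continuous analogue of Dirichlet convolution is multiplicative convolution using the multiplicative Haar measure $\frac{dt}{t}$: given two continuous arithmetic functions $f_\infty, g_\infty: [1,+\infty) \to \mathbf{C}$, one can define their convolution $f_\infty * g_\infty: [1,+\infty) \to \mathbf{C}$ by the formula

$$f_\infty * g_\infty(t) := \int_1^t f_\infty(u) g_\infty\left(\frac{t}{u}\right) \frac{du}{u}.$$

Thus for instance $1_\infty * 1_\infty = L_\infty$. A short computation using Fubini’s theorem shows the analogue

$$\mathcal{D}_\infty(f_\infty * g_\infty)(s) = \mathcal{D}_\infty f_\infty(s) \mathcal{D}_\infty g_\infty(s)$$

of (3) whenever the real part of $s$ is large enough that Fubini’s theorem can be justified; similarly, differentiation under the integral sign shows that

$$\mathcal{D}_\infty(L_\infty f_\infty)(s) = -\frac{d}{ds} \mathcal{D}_\infty f_\infty(s) \ \ \ \ \ (5)$$

again assuming that the real part of $s$ is large enough that differentiation under the integral sign (or some other tool like this, such as the Cauchy integral formula for derivatives) can be justified.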

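Direct calculation shows that for any complex number $\rho$, one has

$$\frac{1}{s-\rho} = \mathcal{D}_\infty( t \mapsto t^{\rho-1} )(s)$$

(at least for the real part of $s$ large enough), and hence by several applications of (5)

$$\frac{1}{(s-\rho)^{k+1}} = \mathcal{D}_\infty\left( t \mapsto \frac{1}{k!} t^{\rho-1} \log^k t \right)(s)$$

for any natural number $k$. This can lead to the following heuristic: if a Dirichlet series $\mathcal{D}f(s)$ behaves like a linear combination of poles $\frac{1}{(s-\rho)^{k+1}}$, in that

$$\mathcal{D}f(s) \approx \sum_\rho \frac{c_\rho}{(s-\rho)^{k_\rho+1}}$$

for some set of poles $\rho$ and some coefficients $c_\rho$ and natural numbers $k_\rho$ (where we again are vague as to what $\approx$ means, and how to interpret the sum if the set of poles is infinite), then one should expect the arithmetic function $f$ to behave like the continuous arithmetic function

$$t \mapsto \sum_\rho \frac{c_\rho}{k_\rho!} t^{\rho-1} \log^{k_\rho} t.$$

In particular, if we only have simple poles,

$$\mathcal{D}f(s) \approx \sum_\rho \frac{c_\rho}{s-\rho}$$

then we expect $f$ to behave like the continuous arithmetic function

$$t \mapsto \sum_\rho c_\rho t^{\rho-1}.$$

Integrating this from $1$ to $x$, this heuristically suggests an approximation

$$\sum_{n \leq x} f(n) \approx \sum_\rho c_\rho \frac{x^\rho-1}{\rho}$$

for the summatory function, and similarly

$$\sum_{n \leq x} \frac{f(n)}{n} \approx \sum_\rho c_\rho \frac{x^{\rho-1}-1}{\rho-1},$$

with the convention that $\frac{x^\rho-1}{\rho}$ is $\log x$ when $\rho=0$, and similarly $\frac{x^{\rho-1}-1}{\rho-1}$ is $\log x$ when $\rho=1$. One can make these sorts of approximations more rigorous by means of Perron’s formula (or one of its variants) combined with the residue theorem, provided that one has good enough control on the relevant Dirichlet series, but we will not pursue these rigorous calculations here. (But see for instance this previous blog post for some examples.)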
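
For the reader who wants the base case of this heuristic written out, here is the routine calculation, under the assumption that the Mellin-type transform meant above is $\mathcal{D}_\infty f_\infty(s) = \int_1^\infty f_\infty(t) t^{-s}\ dt$. For $\mathrm{Re}(s) > \mathrm{Re}(\rho)$ one has

$$\mathcal{D}_\infty( t \mapsto t^{\rho-1} )(s) = \int_1^\infty t^{\rho-1-s}\ dt = \left[ \frac{t^{\rho-s}}{\rho-s} \right]_{t=1}^{t=\infty} = \frac{1}{s-\rho},$$

while integrating a single term of the continuous model from $1$ to $x$ gives

$$\int_1^x t^{\rho-1}\ dt = \frac{x^\rho-1}{\rho} \quad (\rho \neq 0), \qquad \int_1^x t^{-1}\ dt = \log x = \lim_{\rho \to 0} \frac{x^\rho-1}{\rho},$$

which is where the convention for the degenerate exponent comes from.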

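For instance, using the more refined approximation

$$\zeta(s) \approx \frac{1}{s-1} + \gamma$$

to the zeta function near $s=1$ (with $\gamma$ the Euler-Mascheroni constant), we have

$$\mathcal{D}d_2(s) = \zeta^2(s) \approx \frac{1}{(s-1)^2} + \frac{2\gamma}{s-1},$$

so we would expect that

$$d_2 \approx L_\infty + 2\gamma$$

and thus for instance

$$\sum_{n \leq x} d_2(n) \approx x \log x - x + 2\gamma x$$

which matches what one actually gets from the Dirichlet hyperbola method (see e.g. equation (44) of this previous post).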
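
The quality of this kind of heuristic for the divisor function can be tested numerically. The sketch below compares the exact sum $\sum_{n \leq x} d_2(n)$, computed through the elementary identity $\sum_{n \leq x} d_2(n) = \sum_{d \leq x} \lfloor x/d \rfloor$, against the standard main term $x \log x - x + 2\gamma x$; the function names and the sample values of $x$ are arbitrary choices for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def divisor_summatory(x):
    """Exact value of sum_{n <= x} d_2(n), via sum_{d <= x} floor(x / d)."""
    return sum(x // d for d in range(1, x + 1))

def heuristic_main_term(x):
    """The main term x log x - x + 2 * gamma * x suggested by the pole structure."""
    return x * math.log(x) - x + 2 * EULER_GAMMA * x

for x in (10**3, 10**4, 10**5, 10**6):
    exact = divisor_summatory(x)
    approx = heuristic_main_term(x)
    # The last column is the discrepancy measured against sqrt(x).
    print(x, exact, round(approx, 1), round((exact - approx) / math.sqrt(x), 3))
```

The discrepancy is reported relative to $\sqrt{x}$ because the Dirichlet hyperbola method gives an error term of that order.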

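Or, noting that $\zeta$ has a simple pole at $s=1$ and assuming simple zeroes elsewhere, the log derivative $-\zeta'/\zeta$ will have simple poles of residue $+1$ at $s=1$ and $-1$ at all the zeroes $\rho$, leading to the heuristic

$$\mathcal{D}\Lambda(s) = -\frac{\zeta'(s)}{\zeta(s)} \approx \frac{1}{s-1} - \sum_\rho \frac{1}{s-\rho}$$

suggesting that $\Lambda$ should behave like the continuous arithmetic function

$$t \mapsto 1 - \sum_\rho t^{\rho-1}$$

leading for instance to the summatory approximation

$$\sum_{n \leq x} \Lambda(n) \approx x - \sum_\rho \frac{x^\rho-1}{\rho}$$

which is a heuristic form of the Riemann-von Mangoldt explicit formula (see Exercise 45 of these notes for a rigorous version of this formula).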

Exercise 1 Go through some of the other explicit formulae listed at this Wikipedia page and give heuristic justifications for them (up to some lower order terms) by similar calculations to those given above.

Given the “adelic” perspective on number theory, I wonder if there are also -adic analogues of arithmetic functions to which a similar set of heuristics can be applied, perhaps to study sums such as . A key problem here is that there does not seem to be any good interpretation of the expression when is complex and is a -adic number, so it is not clear that one can analyse a Dirichlet series -adically. For similar reasons, we don’t have a canonical way to define for a Dirichlet character (unless its conductor happens to be a power of ), so there doesn’t seem to be much to say in the -aspect either.

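Let $\lambda$ be the Liouville function, thus $\lambda(n)$ is defined to equal $+1$ when $n$ is the product of an even number of primes, and $-1$ when $n$ is the product of an odd number of primes. The Chowla conjecture asserts that $\lambda$ has the statistics of a random sign pattern, in the sense that

$$\lim_{N \to \infty} \mathbb{E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) = 0 \ \ \ \ \ (1)$$

for all $k \geq 1$ and all distinct natural numbers $h_1,\dots,h_k$, where we use the averaging notation

$$\mathbb{E}_{n \leq N} f(n) := \frac{1}{N} \sum_{n \leq N} f(n).$$

For $k=1$, this conjecture is equivalent to the prime number theorem (as discussed in this previous blog post), but the conjecture remains open for any $k \geq 2$.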

In recent years, it has been realised that one can make more progress on this conjecture if one works instead with the logarithmically averaged version

$$\lim_{N \to \infty} \mathbb{E}^{\log}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) = 0 \ \ \ \ \ (2)$$

of the conjecture, where we use the logarithmic averaging notation

$$\mathbb{E}^{\log}_{n \leq N} f(n) := \frac{\sum_{n \leq N} \frac{f(n)}{n}}{\sum_{n \leq N} \frac{1}{n}}.$$

Using the summation by parts (or telescoping series) identity

$$\sum_{n \leq N} \frac{f(n)}{n} = \sum_{M < N} \frac{1}{M(M+1)} \sum_{n \leq M} f(n) + \frac{1}{N} \sum_{n \leq N} f(n)$$

it is not difficult to show that the Chowla conjecture (1) for a given $k$ implies the logarithmically averaged conjecture (2). However, the converse implication is not at all clear. For instance, for $k=1$, we have already mentioned that the Chowla conjecture

$$\lim_{N \to \infty} \mathbb{E}_{n \leq N} \lambda(n) = 0$$

is equivalent to the prime number theorem; but the logarithmically averaged analogue

$$\lim_{N \to \infty} \mathbb{E}^{\log}_{n \leq N} \lambda(n) = 0$$

is significantly easier to show (a proof with the Liouville function replaced by the closely related Möbius function is given in this previous blog post). And indeed, significantly more is now known for the logarithmically averaged Chowla conjecture; in this paper of mine I had proven (2) for $k=2$, and in this recent paper with Joni Teravainen, we proved the conjecture for all odd $k$ (with a different proof also given here).
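
To get a feel for the two kinds of averaging, one can compute both for the Liouville function at a finite cutoff, and at the same time confirm the summation by parts identity exactly. The sketch below is an illustration only: the function names and cutoffs are assumptions, the identity is checked in the form $\sum_{n \leq N} \frac{f(n)}{n} = \sum_{M<N} \frac{1}{M(M+1)} \sum_{n \leq M} f(n) + \frac{1}{N} \sum_{n \leq N} f(n)$ for natural numbers $N$, and finite computations of this kind say nothing rigorous about the limits.

```python
import math
from fractions import Fraction

def liouville_upto(N):
    """Return lam with lam[n] = (-1)^Omega(n) for 1 <= n <= N (lam[0] is unused)."""
    spf = list(range(N + 1))  # smallest prime factor of each n
    for p in range(2, math.isqrt(N) + 1):
        if spf[p] == p:
            for m in range(p * p, N + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0, 1] + [0] * (N - 1)
    for n in range(2, N + 1):
        lam[n] = -lam[n // spf[n]]  # removing one prime factor flips the sign
    return lam

def cesaro_mean(f, N):
    """The ordinary average (1/N) * sum_{n <= N} f(n)."""
    return sum(f[1:N + 1]) / N

def log_mean(f, N):
    """The logarithmic average: sum f(n)/n divided by sum 1/n over n <= N."""
    return (sum(f[n] / n for n in range(1, N + 1))
            / sum(1 / n for n in range(1, N + 1)))

def summation_by_parts_gap(f, N):
    """Left side minus right side of the summation by parts identity (exact)."""
    partial = [0] * (N + 1)
    for n in range(1, N + 1):
        partial[n] = partial[n - 1] + f[n]
    lhs = sum(Fraction(f[n], n) for n in range(1, N + 1))
    rhs = (sum(Fraction(partial[M], M * (M + 1)) for M in range(1, N))
           + Fraction(partial[N], N))
    return lhs - rhs

lam = liouville_upto(10**5)
for N in (10**2, 10**3, 10**4, 10**5):
    print(N, cesaro_mean(lam, N), log_mean(lam, N))
print(summation_by_parts_gap(lam, 500))  # the identity holds exactly
```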

In view of this emerging consensus that the logarithmically averaged Chowla conjecture was easier than the ordinary Chowla conjecture, it was thus somewhat of a surprise for me to read a recent paper of Gomilko, Kwietniak, and Lemanczyk who (among other things) established the following statement:

Theorem 1 Assume that the logarithmically averaged Chowla conjecture (2) is true for all $k$. Then there exists a sequence $N_i$ going to infinity such that the Chowla conjecture (1) is true for all $k$ along that sequence, that is to say

$$\lim_{i \to \infty} \mathbb{E}_{n \leq N_i} \lambda(n+h_1) \dots \lambda(n+h_k) = 0$$

for all $k$ and all distinct $h_1,\dots,h_k$.

This implication does not use any special properties of the Liouville function (other than that it is bounded), and in fact proceeds by ergodic theoretic methods, focusing in particular on the ergodic decomposition of invariant measures of a shift into ergodic measures. Ergodic methods have proven remarkably fruitful in understanding these sorts of number theoretic and combinatorial problems, as could already be seen by the ergodic theoretic proof of Szemerédi’s theorem by Furstenberg, and more recently by the work of Frantzikinakis and Host on Sarnak’s conjecture. (My first paper with Teravainen also uses ergodic theory tools.) Indeed, many other results in the subject were first discovered using ergodic theory methods.

On the other hand, many results in this subject that were first proven ergodic theoretically have since been reproven by more combinatorial means; my second paper with Teravainen is an instance of this. As it turns out, one can also prove Theorem 1 by a standard combinatorial (or probabilistic) technique known as the second moment method. In fact, one can prove slightly more:

Theorem 2 Let $k$ be a natural number. Assume that the logarithmically averaged Chowla conjecture (2) is true for $2k$. Then there exists a set ${\mathcal N}$ of natural numbers of logarithmic density $1$ (that is, $\lim_{N \to \infty} \mathbb{E}^{\log}_{n \leq N} 1_{\mathcal N}(n) = 1$) such that

$$\lim_{N \to \infty; N \in {\mathcal N}} \mathbb{E}_{n \leq N} \lambda(n+h_1) \dots \lambda(n+h_k) = 0$$

for any distinct $h_1,\dots,h_k$.

It is not difficult to deduce Theorem 1 from Theorem 2 using a diagonalisation argument. Unfortunately, the known cases of the logarithmically averaged Chowla conjecture ($k=2$ and odd $k$) are currently insufficient to use Theorem 2 for any purpose other than to reprove what is already known to be true from the prime number theorem. (Indeed, the even cases of Chowla, in either logarithmically averaged or non-logarithmically averaged forms, seem to be far more powerful than the odd cases; see Remark 1.7 of this paper of myself and Teravainen for a related observation in this direction.)

We now sketch the proof of Theorem 2. For any distinct $h_1,\dots,h_k$, we take a large number $H$ and consider the limiting second moment

We can expand this as

If all the are distinct, the hypothesis (2) tells us that the inner averages go to zero as . The remaining averages are , and there are of these averages. We conclude that

By Markov’s inequality (and (3)), we conclude that for any fixed , there exists a set of upper logarithmic density at least , thus

such that

By deleting at most finitely many elements, we may assume that consists only of elements of size at least (say).

For any , if we let be the union of for , then has logarithmic density . By a diagonalisation argument (using the fact that the set of tuples is countable), we can then find a set of natural numbers of logarithmic density , such that for every , every sufficiently large element of lies in . Thus for every sufficiently large in , one has

for some with . By Cauchy-Schwarz, this implies that

interchanging the sums and using and , this implies that

We conclude on taking to infinity that

as required.
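
The second moment at the heart of this sketch can be explored numerically. The following Python sketch is not taken from the paper: it computes a logarithmically weighted second moment of short averages of products of shifted Liouville values at a finite cutoff, and the function names, the cutoffs, the window length `H` and the exact normalisation are all assumptions made for illustration. If the correlations of the Liouville function behave as conjectured, the only terms that survive when the square is expanded are the near-diagonal ones, so the printed values should shrink as `H` grows.

```python
import math

def liouville_upto(N):
    """Return lam with lam[n] = (-1)^Omega(n) for 1 <= n <= N (lam[0] is unused)."""
    spf = list(range(N + 1))  # smallest prime factor of each n
    for p in range(2, math.isqrt(N) + 1):
        if spf[p] == p:
            for m in range(p * p, N + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0, 1] + [0] * (N - 1)
    for n in range(2, N + 1):
        lam[n] = -lam[n // spf[n]]
    return lam

def log_second_moment(lam, x, H, shifts):
    """Log-weighted average over n <= x of |(1/H) sum_{m <= H} prod_i lam(n+m+h_i)|^2."""
    total = 0.0
    weight = 0.0
    for n in range(1, x + 1):
        inner = 0
        for m in range(1, H + 1):
            product = 1
            for h in shifts:
                product *= lam[n + m + h]
            inner += product
        total += (inner / H) ** 2 / n
        weight += 1 / n
    return total / weight

x, shifts = 2000, (0, 1)
lam = liouville_upto(x + 200 + max(shifts))
for H in (1, 10, 50, 200):
    print(H, log_second_moment(lam, x, H, shifts))
```

With `H = 1` the inner average is a single sign, so the second moment is exactly one; this gives a quick consistency check on the normalisation.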

Joni Teräväinen and I have just uploaded to the arXiv our paper “Odd order cases of the logarithmically averaged Chowla conjecture“, submitted to J. Numb. Thy. Bordeaux. This paper gives an alternate route to one of the main results of our previous paper, and more specifically reproves the asymptotic

for all odd and all integers (that is to say, all the odd order cases of the logarithmically averaged Chowla conjecture). Our previous argument relies heavily on some deep ergodic theory results of Bergelson-Host-Kra, Leibman, and Le (and was applicable to more general multiplicative functions than the Liouville function ); here we give a shorter proof that avoids ergodic theory (but instead requires the Gowers uniformity of the (W-tricked) von Mangoldt function, established in several papers of Ben Green, Tamar Ziegler, and myself). The proof follows the lines sketched in the previous blog post. In principle, due to the avoidance of ergodic theory, the arguments here have a greater chance to be made quantitative; however, at present the known bounds on the Gowers uniformity of the von Mangoldt function are qualitative, except at the level, which is unfortunate since the first non-trivial odd case requires quantitative control on the level. (But it may be possible to make the Gowers uniformity bounds for quantitative if one assumes GRH, although when one puts everything together, the actual decay rate obtained in (1) is likely to be poor.)

Joni Teräväinen and I have just uploaded to the arXiv our paper “The structure of logarithmically averaged correlations of multiplicative functions, with applications to the Chowla and Elliott conjectures“, submitted to Duke Mathematical Journal. This paper builds upon my previous paper in which I introduced an “entropy decrement method” to prove the two-point (logarithmically averaged) cases of the Chowla and Elliott conjectures. A bit more specifically, I showed that

whenever were sequences going to infinity, were distinct integers, and were -bounded multiplicative functions which were *non-pretentious* in the sense that

for all Dirichlet characters and for . Thus, for instance, one had the logarithmically averaged two-point Chowla conjecture

for fixed any non-zero , where was the Liouville function.

One would certainly like to extend these results to higher order correlations than the two-point correlations. This looks to be difficult (though perhaps not completely impossible if one allows for logarithmic averaging): in a previous paper I showed that achieving this in the context of the Liouville function would be equivalent to resolving the logarithmically averaged Sarnak conjecture, as well as establishing logarithmically averaged local Gowers uniformity of the Liouville function. However, in this paper we are able to avoid having to resolve these difficult conjectures to obtain partial results towards the (logarithmically averaged) Chowla and Elliott conjecture. For the Chowla conjecture, we can obtain all odd order correlations, in that

for all odd $k$ and all integers $h_1,\dots,h_k$ (which, in the odd order case, are no longer required to be distinct). (Superficially, this looks like we have resolved “half of the cases” of the logarithmically averaged Chowla conjecture; but it seems the odd order correlations are significantly easier than the even order ones. For instance, because of the Katai-Bourgain-Sarnak-Ziegler criterion, one can basically deduce the odd order cases of (2) from the even order cases, after allowing for some dilations in the argument $n$.)

For the more general Elliott conjecture, we can show that

for any , any integers and any bounded multiplicative functions , unless the product *weakly pretends to be a Dirichlet character * in the sense that

This can be seen to imply (2) as a special case. Even when *does* pretend to be a Dirichlet character , we can still say something: if the limits

exist for each (which can be guaranteed if we pass to a suitable subsequence), then is the uniform limit of periodic functions , each of which is –isotypic in the sense that whenever are integers with coprime to the periods of and . This does not pin down the value of any single correlation , but does put significant constraints on how these correlations may vary with .

Among other things, this allows us to show that all possible length four sign patterns of the Liouville function occur with positive density, and all possible length three sign patterns occur with the conjectured logarithmic density. (In a previous paper with Matomaki and Radziwill, we obtained comparable results for length three patterns of Liouville and length two patterns of Möbius.)
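
Sign pattern statistics of this kind are easy to tabulate at a finite cutoff. The sketch below counts how often each length three (and length four) sign pattern of the Liouville function occurs among the starting points $n \leq x$, with both natural and logarithmic weighting; the names and the cutoff are illustrative assumptions, and a finite count is of course only suggestive and proves nothing about densities.

```python
import math
from collections import Counter

def liouville_upto(N):
    """Return lam with lam[n] = (-1)^Omega(n) for 1 <= n <= N (lam[0] is unused)."""
    spf = list(range(N + 1))  # smallest prime factor of each n
    for p in range(2, math.isqrt(N) + 1):
        if spf[p] == p:
            for m in range(p * p, N + 1, p):
                if spf[m] == m:
                    spf[m] = p
    lam = [0, 1] + [0] * (N - 1)
    for n in range(2, N + 1):
        lam[n] = -lam[n // spf[n]]
    return lam

def pattern_densities(lam, x, length):
    """Map each sign pattern (lam(n), ..., lam(n+length-1)), n <= x, to a pair
    (natural frequency, logarithmically weighted frequency)."""
    counts = Counter()
    log_weights = Counter()
    for n in range(1, x + 1):
        pattern = tuple(lam[n:n + length])
        counts[pattern] += 1
        log_weights[pattern] += 1 / n
    harmonic = sum(1 / n for n in range(1, x + 1))
    return {p: (counts[p] / x, log_weights[p] / harmonic) for p in counts}

x = 50000
lam = liouville_upto(x + 4)
for pattern, (natural, logarithmic) in sorted(pattern_densities(lam, x, 3).items()):
    print(pattern, round(natural, 4), round(logarithmic, 4))
print(len(pattern_densities(lam, x, 4)), "distinct length four patterns seen")
```

For length three there are eight possible patterns, so the conjectured density of each is $1/8$.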

To describe the argument, let us focus for simplicity on the case of the Liouville correlations

assuming for sake of discussion that all limits exist. (In the paper, we instead use the device of generalised limits, as discussed in this previous post.) The idea is to combine together two rather different ways to control this function . The first proceeds by the entropy decrement method mentioned earlier, which roughly speaking works as follows. Firstly, we pick a prime and observe that for any , which allows us to rewrite (3) as

Making the change of variables , we obtain

The difference between and is negligible in the limit (here is where we crucially rely on the log-averaging), hence

and thus by (3) we have

The entropy decrement argument can be used to show that the latter limit is small for most (roughly speaking, this is because the factors behave like independent random variables as varies, so that concentration of measure results such as Hoeffding’s inequality can apply, after using entropy inequalities to decouple somewhat these random variables from the factors). We thus obtain the approximate isotypy property

On the other hand, by the Furstenberg correspondence principle (as discussed in these previous posts), it is possible to express as a multiple correlation

for some probability space equipped with a measure-preserving invertible map . Using results of Bergelson-Host-Kra, Leibman, and Le, this allows us to obtain a decomposition of the form

where is a nilsequence, and goes to zero in density (even along the primes, or constant multiples of the primes). The original work of Bergelson-Host-Kra required ergodicity on , which is very definitely a hypothesis that is not available here; however, the later work of Leibman removed this hypothesis, and the work of Le refined the control on so that one still has good control when restricting to primes, or constant multiples of primes.

Ignoring the small error , we can now combine (5) to conclude that

Using the equidistribution theory of nilsequences (as developed in this previous paper of Ben Green and myself), one can break up further into a periodic piece and an “irrational” or “minor arc” piece . The contribution of the minor arc piece can be shown to mostly cancel itself out after dilating by primes and averaging, thanks to Vinogradov-type bilinear sum estimates (transferred to the primes). So we end up with

which already shows (heuristically, at least) the claim that can be approximated by periodic functions which are isotypic in the sense that

But if is odd, one can use Dirichlet’s theorem on primes in arithmetic progressions to restrict to primes that are modulo the period of , and conclude now that vanishes identically, which (heuristically, at least) gives (2).

The same sort of argument works to give the more general bounds on correlations of bounded multiplicative functions. But for the specific task of proving (2), we initially used a slightly different argument that avoids using the ergodic theory machinery of Bergelson-Host-Kra, Leibman, and Le, but replaces it instead with the Gowers uniformity norm theory used to count linear equations in primes. Basically, by averaging (4) in using the “$W$-trick”, as well as known facts about the Gowers uniformity of the von Mangoldt function, one can obtain an approximation of the form

where ranges over a large range of integers coprime to some primorial . On the other hand, by iterating (4) we have

for most semiprimes , and by again averaging over semiprimes one can obtain an approximation of the form

For odd, one can combine the two approximations to conclude that . (This argument is not given in the current paper, but we plan to detail it in a subsequent one.)

Kaisa Matomaki, Maksym Radziwill, and I have uploaded to the arXiv our paper “Correlations of the von Mangoldt and higher divisor functions I. Long shift ranges”, submitted to Proceedings of the London Mathematical Society. This paper is concerned with the estimation of correlations such as

$$\sum_{n \leq X} \Lambda(n) \Lambda(n+h) \ \ \ \ \ (1)$$

for medium-sized $h$ and large $X$, where $\Lambda$ is the von Mangoldt function; we also consider variants of this sum in which one of the von Mangoldt functions is replaced with a (higher order) divisor function, but for sake of discussion let us focus just on the sum (1). Understanding this sum is very closely related to the problem of finding pairs of primes that differ by $h$; for instance, if one could establish a lower bound

$$\sum_{n \leq X} \Lambda(n) \Lambda(n+2) \gg X$$

then this would easily imply the twin prime conjecture.

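The (first) Hardy-Littlewood conjecture asserts an asymptotic

$$\sum_{n \leq X} \Lambda(n) \Lambda(n+h) = \mathfrak{S}(h) X + o(X) \ \ \ \ \ (2)$$

as $X \to \infty$ for any fixed positive $h$, where the *singular series* $\mathfrak{S}(h)$ is an arithmetic factor arising from the irregularity of distribution of $\Lambda$ at small moduli, defined explicitly by

$$\mathfrak{S}(h) := 2 \Pi_2 \prod_{p|h; p>2} \frac{p-1}{p-2}$$

when $h$ is even, and $\mathfrak{S}(h) := 0$ when $h$ is odd, where

$$\Pi_2 := \prod_{p>2} \left(1 - \frac{1}{(p-1)^2}\right) = 0.66016\dots$$

is (half of) the twin prime constant. See for instance this previous blog post for a heuristic explanation of this conjecture. From the previous discussion we see that (2) for $h=2$ would imply the twin prime conjecture. Sieve theoretic methods are only able to provide an upper bound of the form $\sum_{n \leq X} \Lambda(n) \Lambda(n+h) \ll \mathfrak{S}(h) X$.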
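
The conjectured asymptotic is easy to test numerically for small shifts. The sketch below uses the standard form of the singular series, namely $2 \Pi_2 \prod_{p|h; p>2} \frac{p-1}{p-2}$ for even $h$ and zero for odd $h$, with $\Pi_2 = \prod_{p>2} (1 - \frac{1}{(p-1)^2})$; this formula is written in from the standard literature and should be treated as an assumption about what the post displays. The function names, the truncation of the Euler product at `prime_limit`, and the sample values of $X$ and $h$ are likewise illustrative, and the truncation implicitly assumes that every prime factor of $h$ lies below `prime_limit`.

```python
import math

def primes_upto(N):
    """Sieve of Eratosthenes: list of primes p <= N."""
    sieve = bytearray([1]) * (N + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(N) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(range(p * p, N + 1, p)))
    return [p for p in range(2, N + 1) if sieve[p]]

def mangoldt_upto(N):
    """Table with entry n equal to log p if n = p^j is a prime power, else 0."""
    table = [0.0] * (N + 1)
    for p in primes_upto(N):
        q = p
        while q <= N:
            table[q] = math.log(p)
            q *= p
    return table

def singular_series(h, prime_limit=10**5):
    """Truncated Euler product for the singular series of the shift h."""
    if h % 2:
        return 0.0
    value = 2.0
    for p in primes_upto(prime_limit):
        if p == 2:
            continue
        value *= 1 - 1 / (p - 1) ** 2
        if h % p == 0:
            value *= (p - 1) / (p - 2)
    return value

def correlation(X, h):
    """The sum over n <= X of Lambda(n) * Lambda(n + h)."""
    table = mangoldt_upto(X + h)
    return sum(table[n] * table[n + h] for n in range(1, X + 1))

X = 10**5
for h in (2, 4, 6, 30):
    predicted = singular_series(h) * X
    print(h, round(correlation(X, h)), round(predicted), correlation(X, h) / predicted)
```

The last column is the ratio of the actual correlation to the conjectured main term, which one would expect to approach one as $X$ grows.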

Needless to say, apart from the trivial case of odd $h$, there are no values of $h$ for which the Hardy-Littlewood conjecture is known. However there are some results that say that this conjecture holds “on the average”: in particular, if $H$ is a quantity depending on $X$ that is somewhat large, there are results that show that (2) holds for most (i.e. for $1-o(1)$) of the $h$ between $0$ and $H$. Ideally one would like to get $H$ as small as possible; in particular one can view the full Hardy-Littlewood conjecture as the endpoint case when $H$ is bounded.

The first results in this direction were by van der Corput and by Lavrik, who established such a result with (with a subsequent refinement by Balog); Wolke lowered to , and Mikawa lowered further to . The main result of this paper is a further lowering of to . In fact (as in the preceding works) we get a better error term than , namely an error of the shape for any .

Our arguments initially proceed along standard lines. One can use the Hardy-Littlewood circle method to express the correlation in (2) as an integral involving exponential sums . The contribution of the “major arcs” is known by a standard computation to recover the main term plus acceptable errors, so it is a matter of controlling the “minor arcs”. After averaging in and using the Plancherel identity, one is basically faced with establishing a bound of the form

for any “minor arc” . If is somewhat close to a low height rational (specifically, if it is within of such a rational with ), then this type of estimate is roughly of comparable strength (by another application of Plancherel) to the best available prime number theorem in short intervals on the average, namely that the prime number theorem holds for most intervals of the form , and we can handle this case using standard mean value theorems for Dirichlet series. So we can restrict attention to the “strongly minor arc” case where is far from such rationals.

The next step (following some ideas we found in a paper of Zhan) is to rewrite this estimate not in terms of the exponential sums , but rather in terms of the Dirichlet polynomial . After a certain amount of computation (including some oscillatory integral estimates arising from stationary phase), one is eventually reduced to the task of establishing an estimate of the form

for any (with sufficiently large depending on ).

The next step, which is again standard, is the use of the Heath-Brown identity (as discussed for instance in this previous blog post) to split up into a number of components that have a Dirichlet convolution structure. Because the exponent we are shooting for is less than , we end up with five types of components that arise, which we call “Type “, “Type “, “Type “, “Type “, and “Type II”. The “Type II” sums are Dirichlet convolutions involving a factor supported on a range and are quite easy to deal with; the “Type ” terms are Dirichlet convolutions that resemble (non-degenerate portions of) the divisor function, formed from convolving together portions of . The “Type ” and “Type ” terms can be estimated satisfactorily by standard moment estimates for Dirichlet polynomials; this already recovers the result of Mikawa (and our argument is in fact slightly more elementary in that no Kloosterman sum estimates are required). It is the treatment of the “Type ” and “Type ” sums that requires some new analysis, with the Type terms turning out to be the most delicate. After using an existing moment estimate of Jutila for Dirichlet L-functions, matters reduce to obtaining a family of estimates, a typical one of which (relating to the more difficult Type sums) is of the form

for “typical” ordinates of size , where is the Dirichlet polynomial (a fragment of the Riemann zeta function). The precise definition of “typical” is a little technical (because of the complicated nature of Jutila’s estimate) and will not be detailed here. Such a claim would follow easily from the Lindelof hypothesis (which would imply that ) but of course we would like to have an unconditional result.

At this point, having exhausted all the Dirichlet polynomial estimates that are usefully available, we return to “physical space”. Using some further Fourier-analytic and oscillatory integral computations, we can estimate the left-hand side of (3) by an expression that is roughly of the shape

The phase can be Taylor expanded as the sum of and a lower order term , plus negligible errors. If we could discard the lower order term then we would get quite a good bound using the exponential sum estimates of Robert and Sargos, which control averages of exponential sums with purely monomial phases, with the averaging allowing us to exploit the hypothesis that is “typical”. Figuring out how to get rid of this lower order term caused some inefficiency in our arguments; the best we could do (after much experimentation) was to use Fourier analysis to shorten the sums, estimate a one-parameter average exponential sum with a binomial phase by a two-parameter average with a monomial phase, and then use the van der Corput process followed by the estimates of Robert and Sargos. This rather complicated procedure works up to ; it may be possible that some alternate way to proceed here could improve the exponent somewhat.

In a sequel to this paper, we will use a somewhat different method to reduce to a much smaller value of , but only if we replace the correlations by either or , and also we now only save a in the error term rather than .

Given a function on the natural numbers taking values in , one can invoke the Furstenberg correspondence principle to locate a measure preserving system – a probability space together with a measure-preserving shift (or equivalently, a measure-preserving -action on ) – together with a measurable function (or “observable”) that has essentially the same statistics as in the sense that

for any integers . In particular, one has

whenever the limit on the right-hand side exists. We will refer to the system together with the designated function as a *Furstenberg limit* of the sequence . These Furstenberg limits capture some, but not all, of the asymptotic behaviour of ; roughly speaking, they control the typical “local” behaviour of , involving correlations such as in the regime where are much smaller than . However, the control on error terms here is usually only qualitative at best, and one usually does not obtain non-trivial control on correlations in which the are allowed to grow at some significant rate with (e.g. like some power of ).

The correspondence principle is discussed in these previous blog posts. One way to establish the principle is by introducing a Banach limit that extends the usual limit functional on the subspace of consisting of convergent sequences while still having operator norm one. Such functionals cannot be constructed explicitly, but can be proven to exist (non-constructively and non-uniquely) using the Hahn-Banach theorem; one can also use a non-principal ultrafilter here if desired. One can then seek to construct a system and a measurable function for which one has the statistics

for all . One can explicitly construct such a system as follows. One can take to be the Cantor space with the product -algebra and the shift

with the function being the coordinate function at zero:

(so in particular for any ). The only thing remaining is to construct the invariant measure . In order to be consistent with (2), one must have

for any distinct integers and signs . One can check that this defines a premeasure on the Boolean algebra of defined by cylinder sets, and the existence of then follows from the Hahn-Kolmogorov extension theorem (or the closely related Kolmogorov extension theorem). One can then check that the correspondence (2) holds, and that is translation-invariant; the latter comes from the translation invariance of the (Banach-)Cesàro averaging operation . A variant of this construction shows that the Furstenberg limit is unique up to equivalence if and only if all the limits appearing in (1) actually exist.

One can obtain a slightly tighter correspondence by using a smoother average than the Cesàro average. For instance, one can use the logarithmic Cesàro averages in place of the Cesàro average , thus one replaces (2) by

Whenever the Cesàro average of a bounded sequence exists, then the logarithmic Cesàro average exists and is equal to the Cesàro average. Thus, a Furstenberg limit constructed using logarithmic Banach-Cesàro averaging still obeys (1) for all when the right-hand side limit exists, but also obeys the more general assertion

whenever the limit of the right-hand side exists.
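
The gain from logarithmic averaging can be seen on a toy example, which is not from the post but is a standard illustration: the indicator function of the union of the dyadic blocks $[4^j, 2 \cdot 4^j)$. Its Cesàro averages oscillate between roughly $1/3$ and $2/3$ and so do not converge, while its logarithmic Cesàro averages settle down towards $1/2$. The sketch below, with hypothetical function names and cutoffs, evaluates both kinds of average at the ends of consecutive dyadic blocks.

```python
def block_indicator(n):
    """1 if n lies in some block [4^j, 2 * 4^j), else 0 (a toy test sequence)."""
    return 1 if (n.bit_length() - 1) % 2 == 0 else 0

def cesaro_average(a, N):
    """The ordinary average (1/N) * sum_{n <= N} a(n)."""
    return sum(a(n) for n in range(1, N + 1)) / N

def log_cesaro_average(a, N):
    """The logarithmic average: sum a(n)/n divided by sum 1/n over n <= N."""
    return (sum(a(n) / n for n in range(1, N + 1))
            / sum(1 / n for n in range(1, N + 1)))

# Evaluate at N = 2^k - 1, i.e. at the end of each dyadic block.
for k in range(10, 19):
    N = 2 ** k - 1
    print(N, round(cesaro_average(block_indicator, N), 4),
          round(log_cesaro_average(block_indicator, N), 4))
```

The convergence of the logarithmic averages is slow (the error decays like the reciprocal of $\log N$), so the second column drifts towards $1/2$ rather than reaching it quickly.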

In a recent paper of Frantzikinakis, the Furstenberg limits of the Liouville function (with logarithmic averaging) were studied. Some (but not all) of the known facts and conjectures about the Liouville function can be interpreted in the Furstenberg limit. For instance, in a recent breakthrough result of Matomaki and Radziwill (discussed previously here), it was shown that the Liouville function exhibited cancellation on short intervals in the sense that

In terms of Furstenberg limits of the Liouville function, this assertion is equivalent to the assertion that

for all Furstenberg limits of Liouville (including those without logarithmic averaging). Invoking the mean ergodic theorem (discussed in this previous post), this assertion is in turn equivalent to the observable that corresponds to the Liouville function being orthogonal to the invariant factor of ; equivalently, the first Gowers-Host-Kra seminorm of (as defined for instance in this previous post) vanishes. The Chowla conjecture, which asserts that

for all distinct integers , is equivalent to the assertion that all the Furstenberg limits of Liouville are equivalent to the Bernoulli system ( with the product measure arising from the uniform distribution on , with the shift and observable as before). Similarly, the logarithmically averaged Chowla conjecture

is equivalent to the assertion that all the Furstenberg limits of Liouville with logarithmic averaging are equivalent to the Bernoulli system. Recently, I was able to prove the two-point version

of the logarithmically averaged Chowla conjecture, for any non-zero integer ; this is equivalent to the perfect strong mixing property

for any Furstenberg limit of Liouville with logarithmic averaging, and any .

The situation is more delicate with regards to the Sarnak conjecture, which is equivalent to the assertion that

for any zero-entropy sequence (see this previous blog post for more discussion). Morally speaking, this conjecture should be equivalent to the assertion that any Furstenberg limit of Liouville is disjoint from any zero entropy system, but I was not able to formally establish an implication in either direction due to some technical issues regarding the fact that the Furstenberg limit does not directly control long-range correlations, only short-range ones. (There are however ergodic theoretic interpretations of the Sarnak conjecture that involve the notion of generic points; see this paper of El Abdalaoui, Lemańczyk, and de la Rue.) But the situation is currently better with the logarithmically averaged Sarnak conjecture

as I was able to show that this conjecture was equivalent to the logarithmically averaged Chowla conjecture, and hence to all Furstenberg limits of Liouville with logarithmic averaging being Bernoulli; I also showed the conjecture was equivalent to local Gowers uniformity of the Liouville function, which is in turn equivalent to the function having all Gowers-Host-Kra seminorms vanishing in every Furstenberg limit with logarithmic averaging. In this recent paper of Frantzikinakis, this analysis was taken further, showing that the logarithmically averaged Chowla and Sarnak conjectures were in fact equivalent to the much milder seeming assertion that all Furstenberg limits with logarithmic averaging were ergodic.

Actually, the logarithmically averaged Furstenberg limits have more structure than just a -action on a measure preserving system with a single observable . Let denote the semigroup of affine maps on the integers with and positive. Also, let denote the profinite integers (the inverse limit of the cyclic groups ). Observe that acts on by taking the inverse limit of the obvious actions of on .

Proposition 1 (Enriched logarithmically averaged Furstenberg limit of Liouville) Let be a Banach limit. Then there exists a probability space with an action of the affine semigroup , as well as measurable functions and , with the following properties:

- (i) (Affine Furstenberg limit) For any , and any congruence class , one has
- (ii) (Equivariance of ) For any , one has
for -almost every .

- (iii) (Multiplicativity at fixed primes) For any prime , one has
for -almost every , where is the dilation map .

- (iv) (Measure pushforward) If is of the form and is the set , then the pushforward of by is equal to , that is to say one has
for every measurable .

Note that can be viewed as the subgroup of consisting of the translations . If one only keeps the -portion of the action and forgets the rest (as well as the function ) then the action becomes measure-preserving, and we recover an ordinary Furstenberg limit with logarithmic averaging. However, the additional structure here can be quite useful; for instance, one can transfer the proof of (3) to this setting, which we sketch below the fold, after proving the proposition.

The observable , roughly speaking, means that points in the Furstenberg limit constructed by this proposition are still “virtual integers” in the sense that one can meaningfully compute the residue class of modulo any natural number modulus , by first applying and then reducing mod . The action of means that one can also meaningfully multiply by any natural number, and translate it by any integer. As with other applications of the correspondence principle, the main advantage of moving to this more “virtual” setting is that one now acquires a probability measure , so that the tools of ergodic theory can be readily applied.

Given a random variable that takes on only finitely many values, we can define its Shannon entropy by the formula

with the convention that . (In some texts, one uses the logarithm to base rather than the natural logarithm, but the choice of base will not be relevant for this discussion.) This is clearly a nonnegative quantity. Given two random variables taking on finitely many values, the joint variable is also a random variable taking on finitely many values, and also has an entropy . It obeys the *Shannon inequalities*

so we can define some further nonnegative quantities, the mutual information

and the conditional entropies

More generally, given three random variables , one can define the conditional mutual information

and the final of the Shannon entropy inequalities asserts that this quantity is also non-negative.

The mutual information is a measure of the extent to which and fail to be independent; indeed, it is not difficult to show that vanishes if and only if and are independent. Similarly, vanishes if and only if and are *conditionally* independent relative to . At the other extreme, is a measure of the extent to which fails to depend on ; indeed, it is not difficult to show that if and only if is determined by in the sense that there is a deterministic function such that . In a related vein, if and are equivalent in the sense that there are deterministic functional relationships , between the two variables, then is interchangeable with for the purposes of computing the above quantities, thus for instance , , , , etc.
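These quantities are easy to compute for small examples. The sketch below assumes the standard definitions (mutual information as the sum of the two entropies minus the joint entropy, conditional entropy as the joint entropy minus the entropy of the conditioning variable); the representation of a joint law as a dictionary of outcome tuples, and all function names, are illustrative choices rather than anything from the text.

```python
import math
from collections import defaultdict


def entropy(joint, coords):
    """Shannon entropy (natural logarithm) of the marginal of `joint` on `coords`.

    `joint` maps outcome tuples to probabilities; `coords` is a tuple of indices.
    Terms with probability zero are skipped, matching the convention 0 log 0 = 0.
    """
    marginal = defaultdict(float)
    for outcome, prob in joint.items():
        marginal[tuple(outcome[i] for i in coords)] += prob
    return -sum(p * math.log(p) for p in marginal.values() if p > 0)


def mutual_information(joint, xs, ys):
    """H(X) + H(Y) - H(X, Y)."""
    return entropy(joint, xs) + entropy(joint, ys) - entropy(joint, xs + ys)


def conditional_entropy(joint, xs, ys):
    """H(X | Y) = H(X, Y) - H(Y)."""
    return entropy(joint, xs + ys) - entropy(joint, ys)


def conditional_mutual_information(joint, xs, ys, zs):
    """H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z)."""
    return (entropy(joint, xs + zs) + entropy(joint, ys + zs)
            - entropy(joint, xs + ys + zs) - entropy(joint, zs))


if __name__ == "__main__":
    # Coordinates (0, 1, 2): a fair bit, an exact copy of it, and an independent fair bit.
    joint = {(x, x, z): 0.25 for x in (0, 1) for z in (0, 1)}
    print(mutual_information(joint, (0,), (2,)))   # independent pair
    print(conditional_entropy(joint, (1,), (0,)))  # the copy is determined by the original
```

In this toy law the independent pair has vanishing mutual information, and the copy has vanishing conditional entropy relative to the original, as the paragraph above predicts.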

One can get some initial intuition for these information-theoretic quantities by specialising to a simple situation in which all the random variables being considered come from restricting a single random (and uniformly distributed) boolean function on a given finite domain to some subset of :

In this case, has the law of a random uniformly distributed boolean function from to , and the entropy here can be easily computed to be , where denotes the cardinality of . If is the restriction of to , and is the restriction of to , then the joint variable is equivalent to the restriction of to . If one discards the normalisation factor , one then obtains the following dictionary between entropy and the combinatorics of finite sets:

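| Random variables $X, Y, Z$ | Finite sets $A, B, C$ |
| --- | --- |
| Entropy $H(X)$ | Cardinality $\lvert A \rvert$ |
| Joint variable $(X, Y)$ | Union $A \cup B$ |
| Mutual information $I(X : Y)$ | Intersection cardinality $\lvert A \cap B \rvert$ |
| Conditional entropy $H(X \mid Y)$ | Set difference cardinality $\lvert A \setminus B \rvert$ |
| Conditional mutual information $I(X : Y \mid Z)$ | $\lvert (A \cap B) \setminus C \rvert$ |
| $X, Y$ independent | $A, B$ disjoint |
| $X$ determined by $Y$ | $A$ a subset of $B$ |
| $X, Y$ conditionally independent relative to $Z$ | $A \cap B \subset C$ |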

Every (linear) inequality or identity about entropy (and related quantities, such as mutual information) then specialises to a combinatorial inequality or identity about finite sets that is easily verified. For instance, the Shannon inequality becomes the union bound , and the definition of mutual information becomes the inclusion-exclusion formula

For a more advanced example, consider the data processing inequality that asserts that if are conditionally independent relative to , then . Specialising to sets, this now says that if are disjoint outside of , then ; this can be made apparent by considering the corresponding Venn diagram. This dictionary also suggests how to *prove* the data processing inequality using the existing Shannon inequalities. Firstly, if and are not necessarily disjoint outside of , then a consideration of Venn diagrams gives the more general inequality

and a further inspection of the diagram then reveals the more precise identity

Using the dictionary in the reverse direction, one is then led to conjecture the identity

which (together with non-negativity of conditional mutual information) implies the data processing inequality, and this identity is in turn easily established from the definition of mutual information.

On the other hand, not every assertion about cardinalities of sets generalises to entropies of random variables that are not arising from restricting random boolean functions to sets. For instance, a basic property of sets is that disjointness from a given set is preserved by unions:

Indeed, one has the union bound

Applying the dictionary in the reverse direction, one might now conjecture that if was independent of and was independent of , then should also be independent of , and furthermore that

but these statements are well known to be false (for reasons related to pairwise independence of random variables being strictly weaker than joint independence). For a concrete counterexample, one can take to be independent, uniformly distributed random elements of the finite field of two elements, and take to be the sum of these two field elements. One can easily check that each of and is separately independent of , but the joint variable determines and thus is not independent of .
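The counterexample can be checked by direct computation. This is a small sketch with illustrative names; entropies are measured in bits, and the joint law is that of two independent fair bits together with their sum in the field of two elements.

```python
import math
from collections import defaultdict
from itertools import product


def entropy_bits(joint, coords):
    """Shannon entropy in bits of the marginal of `joint` on the index tuple `coords`."""
    marginal = defaultdict(float)
    for outcome, prob in joint.items():
        marginal[tuple(outcome[i] for i in coords)] += prob
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def mutual_information_bits(joint, xs, ys):
    """H(X) + H(Y) - H(X, Y), in bits."""
    return (entropy_bits(joint, xs) + entropy_bits(joint, ys)
            - entropy_bits(joint, xs + ys))


def xor_joint_distribution():
    """Law of (first bit, second bit, their sum mod 2) for independent fair bits."""
    return {(x, y, (x + y) % 2): 0.25 for x, y in product((0, 1), repeat=2)}


if __name__ == "__main__":
    joint = xor_joint_distribution()
    print(mutual_information_bits(joint, (0,), (2,)))    # first bit vs. sum
    print(mutual_information_bits(joint, (1,), (2,)))    # second bit vs. sum
    print(mutual_information_bits(joint, (0, 1), (2,)))  # joint pair vs. sum
```

Each summand alone shares no information with the sum, while the pair shares a full bit with it.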

From the inclusion-exclusion identities

one can check that (1) is equivalent to the trivial lower bound . The basic issue here is that in the dictionary between entropy and combinatorics, there is no satisfactory entropy analogue of the notion of a triple intersection . (Even the double intersection only exists information theoretically in a “virtual” sense; the mutual information allows one to “compute the entropy” of this “intersection”, but does not actually describe this intersection itself as a random variable.)

However, this issue only arises with three or more variables; it is not too difficult to show that the only linear equalities and inequalities that are necessarily obeyed by the information-theoretic quantities associated to just two variables are those that are also necessarily obeyed by their combinatorial analogues . (See for instance the Venn diagram at the Wikipedia page for mutual information for a pictorial summary of this statement.)

One can work with a larger class of special cases of Shannon entropy by working with random *linear* functions rather than random *boolean* functions. Namely, let be some finite-dimensional vector space over a finite field , and let be a random linear functional on , selected uniformly among all such functions. Every subspace of then gives rise to a random variable formed by restricting to . This random variable is also distributed uniformly amongst all linear functions on , and its entropy can be easily computed to be . Given two random variables formed by restricting to respectively, the joint random variable determines the random linear function on the union of the two spaces, and thus by linearity on the Minkowski sum as well; thus is equivalent to the restriction of to . In particular, . This implies that and also , where is the quotient map. After discarding the normalising constant , this leads to the following dictionary between information theoretic quantities and linear algebra quantities, analogous to the previous dictionary:

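| Random variables $X, Y, Z$ | Subspaces $U, V, W$ |
| --- | --- |
| Entropy $H(X)$ | Dimension $\dim U$ |
| Joint variable $(X, Y)$ | Sum $U + V$ |
| Mutual information $I(X : Y)$ | Dimension of intersection $\dim(U \cap V)$ |
| Conditional entropy $H(X \mid Y)$ | Dimension of projection $\dim \pi_V(U)$ |
| Conditional mutual information $I(X : Y \mid Z)$ | $\dim(\pi_W(U) \cap \pi_W(V))$ |
| $X, Y$ independent | $U, V$ transverse ($U \cap V = \{0\}$) |
| $X$ determined by $Y$ | $U$ a subspace of $V$ |
| $X, Y$ conditionally independent relative to $Z$ | $\pi_W(U)$, $\pi_W(V)$ transverse |

(Here $\pi_V$ and $\pi_W$ denote the quotient maps by $V$ and $W$ respectively.)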

The combinatorial dictionary can be regarded as a specialisation of the linear algebra dictionary, by taking to be the vector space over the finite field of two elements, and only considering those subspaces that are coordinate subspaces associated to various subsets of .

As before, every linear inequality or equality that is valid for the information-theoretic quantities discussed above, is automatically valid for the linear algebra counterparts for subspaces of a vector space over a finite field by applying the above specialisation (and dividing out by the normalising factor of ). In fact, the requirement that the field be finite can be removed by applying the compactness theorem from logic (or one of its relatives, such as Łoś’s theorem on ultraproducts, as done in this previous blog post).

The linear algebra model captures more of the features of Shannon entropy than the combinatorial model. For instance, in contrast to the combinatorial case, it is possible in the linear algebra setting to have subspaces such that and are separately transverse to , but their sum is not; for instance, in a two-dimensional vector space , one can take to be the one-dimensional subspaces spanned by , , and respectively. Note that this is essentially the same counterexample from before (which took to be the field of two elements). Indeed, one can show that any necessarily true linear inequality or equality involving the dimensions of three subspaces (as well as the various other quantities on the above table) will also be necessarily true when applied to the entropies of three discrete random variables (as well as the corresponding quantities on the above table).

However, the linear algebra model does not completely capture the subtleties of Shannon entropy once one works with *four* or more variables (or subspaces). This was first observed by Ingleton, who established the dimensional inequality

for any subspaces . This is easiest to see when the three terms on the right-hand side vanish; then are transverse, which implies that ; similarly . But and are transverse, and this clearly implies that and are themselves transverse. To prove the general case of Ingleton’s inequality, one can define and use (and similarly for instead of ) to reduce to establishing the inequality

which can be rearranged using (and similarly for instead of ) and as

but this is clear since .

Returning to the entropy setting, the analogue

of (3) is true (exercise!), but the analogue

of Ingleton’s inequality is false in general. Again, this is easiest to see when all the terms on the right-hand side vanish; then are conditionally independent relative to , and relative to , and and are independent, and the claim (4) would then be asserting that and are independent. While there is no linear counterexample to this statement, there are simple non-linear ones: for instance, one can take to be independent uniform variables from , and take and to be (say) and respectively (thus are the indicators of the events and respectively). Once one conditions on either or , one of has positive conditional entropy and the other has zero entropy, and so are conditionally independent relative to either or ; also, and are independent of each other. But and are not independent of each other (they cannot be simultaneously equal to ). Somehow, the feature of the linear algebra model that is not present in general is that in the linear algebra setting, every pair of subspaces has a well-defined intersection that is also a subspace, whereas for arbitrary random variables , there does not necessarily exist the analogue of an intersection, namely a “common information” random variable that has the entropy of and is determined either by or by .
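The failure of the entropy analogue can be verified numerically. The sketch below makes two labelled assumptions: that the entropy form of Ingleton's inequality bounds the mutual information of the first two variables by the sum of their two conditional mutual informations (relative to the third and to the fourth variable) plus the mutual information of the last two; and that the counterexample takes the last two variables to be independent fair bits, with the first two being the indicators that both bits equal one and that both equal zero. The variable ordering and function names are illustrative.

```python
import math
from collections import defaultdict
from itertools import product


def entropy_bits(joint, coords):
    """Shannon entropy in bits of the marginal of `joint` on the index tuple `coords`."""
    marginal = defaultdict(float)
    for outcome, prob in joint.items():
        marginal[tuple(outcome[i] for i in coords)] += prob
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def cond_mutual_info_bits(joint, xs, ys, zs=()):
    """I(X : Y | Z) in bits; with zs empty this is the plain mutual information."""
    return (entropy_bits(joint, xs + zs) + entropy_bits(joint, ys + zs)
            - entropy_bits(joint, xs + ys + zs) - entropy_bits(joint, zs))


def counterexample_joint():
    """Outcomes (x1, x2, x3, x4): x3, x4 fair independent bits (assumed labelling),
    x1 = indicator of x3 = x4 = 1, x2 = indicator of x3 = x4 = 0."""
    return {(x3 * x4, (1 - x3) * (1 - x4), x3, x4): 0.25
            for x3, x4 in product((0, 1), repeat=2)}


def ingleton_sides(joint):
    """Return (left-hand side, right-hand side) of the assumed entropy Ingleton inequality."""
    lhs = cond_mutual_info_bits(joint, (0,), (1,))
    rhs = (cond_mutual_info_bits(joint, (0,), (1,), (2,))
           + cond_mutual_info_bits(joint, (0,), (1,), (3,))
           + cond_mutual_info_bits(joint, (2,), (3,)))
    return lhs, rhs


if __name__ == "__main__":
    print(ingleton_sides(counterexample_joint()))
```

The right-hand side vanishes while the left-hand side is strictly positive (about 0.12 bits), so the inequality fails for this joint law.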

I do not know if there is any simpler model of Shannon entropy that captures all the inequalities available for four variables. One significant complication is that there exist some information inequalities in this setting that are not of Shannon type, such as the Zhang-Yeung inequality

One can however still use these simpler models of Shannon entropy to be able to guess arguments that would work for general random variables. An example of this comes from my paper on the logarithmically averaged Chowla conjecture, in which I showed among other things that

whenever was sufficiently large depending on , where is the Liouville function. The information-theoretic part of the proof was as follows. Given some intermediate scale between and , one can form certain random variables . The random variable is a sign pattern of the form where is a random number chosen from to (with logarithmic weighting). The random variable was a tuple of reductions of to primes comparable to . Roughly speaking, what was implicitly shown in the paper (after using the multiplicativity of , the circle method, and the Matomäki-Radziwiłł theorem on short averages of multiplicative functions) is that if the inequality (5) fails, then there was a lower bound

on the mutual information between and . From translation invariance, this also gives the more general lower bound

for any , where denotes the shifted sign pattern . On the other hand, one had the entropy bounds

and from concatenating sign patterns one could see that is equivalent to the joint random variable for any . Applying these facts and using an “entropy decrement” argument, I was able to obtain a contradiction once was allowed to become sufficiently large compared to , but the bound was quite weak (coming ultimately from the unboundedness of as the interval of values of under consideration becomes large), something of the order of ; the quantity needs at various junctures to be less than a small power of , so the relationship between and becomes essentially quadruple exponential in nature, . The basic strategy was to observe that the lower bound (6) causes some slowdown in the growth rate of the mean entropy, in that this quantity decreased by as increased from to , basically by dividing into components , and observing from (6) that each of these shares a bit of common information with the same variable . This is relatively clear when one works in a set model, in which is modeled by a set of size , and is modeled by a set of the form

for various sets of size (also there is some translation symmetry that maps to a shift while preserving all of the ).

However, on considering the set model recently, I realised that one can be a little more efficient by exploiting the fact (basically the Chinese remainder theorem) that the random variables are basically jointly independent as ranges over dyadic values that are much smaller than , which in the set model corresponds to the all being disjoint. One can then establish a variant

of (6), which in the set model roughly speaking asserts that each claims a portion of the of cardinality that is not claimed by previous choices of . This leads to a more efficient contradiction (relying on the unboundedness of rather than ) that looks like it removes one order of exponential growth, thus the relationship between and is now . Returning to the entropy model, one can use (7) and Shannon inequalities to establish an inequality of the form

for a small constant , which on iterating and using the boundedness of gives the claim. (A modification of this analysis, at least on the level of the back of the envelope calculation, suggests that the Matomäki-Radziwiłł theorem is needed only for ranges greater than or so, although at this range the theorem is not significantly simpler than the general case.)

I’ve just uploaded to the arXiv my paper “Some remarks on the lonely runner conjecture”, submitted to Contributions to Discrete Mathematics. I had blogged about the lonely runner conjecture in this previous blog post, and I returned to the problem recently to see if I could obtain anything further. The results obtained were more modest than I had hoped, but they did at least seem to indicate a potential strategy to make further progress on the problem, and also highlight some of the difficulties of the problem.

One can rephrase the lonely runner conjecture as the following covering problem. Given any integer “velocity” and radius , define the *Bohr set* to be the subset of the unit circle given by the formula

where denotes the distance of to the nearest integer. Thus, for positive, is simply the union of the intervals for , projected onto the unit circle ; in the language of the usual formulation of the lonely runner conjecture, represents those times in which a runner moving at speed returns to within of his or her starting position. For any non-zero integers , let be the smallest radius such that the Bohr sets cover the unit circle:

Then define to be the smallest value of , as ranges over tuples of distinct non-zero integers. The Dirichlet approximation theorem quickly gives that

and hence

for any . The lonely runner conjecture is equivalent to the assertion that this bound is in fact optimal:

Conjecture 1 (Lonely runner conjecture) For any , one has .

This conjecture is currently known for (see this paper of Barajas and Serra), but remains open for higher .
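For small tuples the covering radius can be computed exactly. The sketch below is an illustration under stated assumptions rather than anything from the paper: it takes the Bohr set of a velocity and radius to be the set of times whose product with the velocity lies within the radius of an integer, so that the smallest covering radius is the maximum over times of the minimum distance to the nearest integer among the velocities. That maximum of a piecewise linear function is attained at a breakpoint, and every breakpoint is a rational whose denominator is twice a velocity, or a sum or difference of two velocities. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def dist_to_nearest_integer(x):
    """Distance from the rational number x to the nearest integer."""
    frac = x % 1
    return min(frac, 1 - frac)


def covering_radius(velocities):
    """Smallest radius r such that the Bohr sets of the given non-zero integer
    velocities (at radius r) cover the unit circle, computed exactly."""
    denominators = {2 * abs(v) for v in velocities}
    for v, w in combinations(velocities, 2):
        for d in (abs(v - w), abs(v + w)):
            if d != 0:
                denominators.add(d)
    best = Fraction(0)
    # Evaluate the min-distance function at every candidate breakpoint k/d in [0, 1).
    for d in denominators:
        for k in range(d):
            t = Fraction(k, d)
            gap = min(dist_to_nearest_integer(v * t) for v in velocities)
            best = max(best, gap)
    return best


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, covering_radius(list(range(1, n + 1))))
```

For the standard tuple of the first few positive integers this reproduces the value given by the Dirichlet approximation theorem, namely the reciprocal of one more than the number of velocities.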

It is natural to try to attack the problem by establishing lower bounds on the quantity . We have the following “trivial” bound, that gets within a factor of two of the conjecture:

Proposition 2 (Trivial bound) For any , one has .

*Proof:* It is not difficult to see that for any non-zero velocity and any , the Bohr set has Lebesgue measure . In particular, by the union bound

we see that the covering (1) is only possible if , giving the claim.

So, in some sense, all the difficulty is coming from the need to improve upon the trivial union bound (2) by a factor of two.

Despite the crudeness of the union bound (2), it has proven surprisingly hard to make substantial improvements on the trivial bound . In 1994, Chen obtained the slight improvement

which was improved a little by Chen and Cusick in 1999 to

when was prime. In a recent paper of Perarnau and Serra, the bound

was obtained for arbitrary . These bounds only improve upon the trivial bound by a multiplicative factor of . Heuristically, one reason for this is as follows. The union bound (2) would of course be sharp if the Bohr sets were all disjoint. Strictly speaking, such disjointness is not possible, because all the Bohr sets have to contain the origin as an interior point. However, it is possible to come up with a large number of Bohr sets which are *almost* disjoint. For instance, suppose that we had velocities that were all prime numbers between and , and that was equal to (and in particular was between and ). Then each set can be split into a “kernel” interval , together with the “petal” intervals . Roughly speaking, as the prime varies, the kernel interval stays more or less fixed, but the petal intervals range over disjoint sets, and from this it is not difficult to show that

so that the union bound is within a multiplicative factor of of the truth in this case.

This does not imply that is within a multiplicative factor of of , though, because there are not enough primes between and to assign to distinct velocities; indeed, by the prime number theorem, there are only about such velocities that could be assigned to a prime. So, while the union bound could be close to tight for up to Bohr sets, the above counterexamples don’t exclude improvements to the union bound for larger collections of Bohr sets. Following this train of thought, I was able to obtain a logarithmic improvement to previous lower bounds:

Theorem 3 For sufficiently large , one has for some absolute constant .

The factors of in the denominator are for technical reasons and might perhaps be removable by a more careful argument. However it seems difficult to adapt the methods to improve the in the numerator, basically because of the obstruction provided by the near-counterexample discussed above.

Roughly speaking, the idea of the proof of this theorem is as follows. If we have the covering (1) for very close to , then the multiplicity function will then be mostly equal to , but occasionally be larger than . On the other hand, one can compute that the norm of this multiplicity function is significantly larger than (in fact it is at least ). Because of this, the norm must be very large, which means that the triple intersections must be quite large for many triples . Using some basic Fourier analysis and additive combinatorics, one can deduce from this that the velocities must have a large structured component, in the sense that there exists an arithmetic progression of length that contains of these velocities. For simplicity let us take the arithmetic progression to be , thus of the velocities lie in . In particular, from the prime number theorem, most of these velocities will not be prime, and will in fact likely have a “medium-sized” prime factor (in the precise form of the argument, “medium-sized” is defined to be “between and ”). Using these medium-sized prime factors, one can show that many of the will have quite a large overlap with many of the other , and this can be used after some elementary arguments to obtain a more noticeable improvement on the union bound (2) than was obtained previously.

A modification of the above argument also allows for the improved estimate

if one knows that *all* of the velocities are of size .

In my previous blog post, I showed that in order to prove the lonely runner conjecture, it suffices to do so under the additional assumption that all of the velocities are of size ; I reproduce this argument (slightly cleaned up for publication) in the current preprint. There is unfortunately a huge gap between and , so the above bound (3) does not immediately give any new bounds for . However, one could perhaps try to start attacking the lonely runner conjecture by increasing the range for which one has good results, and by decreasing the range that one can reduce to. For instance, in the current preprint I give an elementary argument (using a certain amount of case-checking) that shows that the lonely runner bound

holds if all the velocities are assumed to lie between and . This upper threshold of is only a tiny improvement over the trivial threshold of , but it seems to be an interesting sub-problem of the lonely runner conjecture to increase this threshold further. One key target would be to get up to , as there are actually a number of -tuples in this range for which (4) holds with equality. The Dirichlet approximation theorem of course gives the tuple , but there is also the double of this tuple, and furthermore there is an additional construction of Goddyn and Wong that gives some further examples such as , or more generally one can start with the standard tuple and accelerate one of the velocities to ; this turns out to work as long as shares a common factor with every integer between and . There are a few more examples of this type in the paper of Goddyn and Wong, but all of them can be placed in an arithmetic progression of length at most, so if one were very optimistic, one could perhaps envision a strategy in which the upper bound of mentioned earlier was reduced all the way to something like , and then a separate argument deployed to treat this remaining case, perhaps isolating the constructions of Goddyn and Wong (and possible variants thereof) as the only extreme cases.

Let be the divisor function. A classical application of the Dirichlet hyperbola method gives the asymptotic

where denotes the estimate as . Much better error estimates are possible here, but we will not focus on the lower order terms in this discussion. For somewhat idiosyncratic reasons I will interpret this estimate (and the other analytic number theory estimates discussed here) through the probabilistic lens. Namely, if is a random number selected uniformly between and , then the above estimate can be written as

that is to say the random variable has mean approximately . (But, somewhat paradoxically, this is not the median or mode behaviour of this random variable, which instead concentrates near , basically thanks to the Hardy-Ramanujan theorem.)
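A quick numerical experiment illustrates both points: the mean of the divisor function up to a threshold is close to the natural logarithm of the threshold, while the median is noticeably smaller. This is a minimal sketch with illustrative function names; the sieve simply adds one to the divisor count of every multiple of each integer.

```python
import math
import statistics


def divisor_counts(x):
    """Return a list tau with tau[n] = number of divisors of n for 1 <= n <= x
    (tau[0] is an unused placeholder)."""
    tau = [0] * (x + 1)
    for d in range(1, x + 1):
        for multiple in range(d, x + 1, d):
            tau[multiple] += 1
    return tau


def mean_and_median_divisor_count(x):
    """Mean and median of the divisor function over the integers 1, ..., x."""
    values = divisor_counts(x)[1:]
    return sum(values) / x, statistics.median(values)


if __name__ == "__main__":
    x = 10 ** 5
    mean, median = mean_and_median_divisor_count(x)
    print(mean, math.log(x), median)
```

At this range the ratio of the mean to the logarithm is only slightly above one, the discrepancy coming from the lower order terms that are being suppressed in the discussion.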

Now we turn to the pair correlations for a fixed positive integer . There is a classical computation of Ingham that shows that

The error term in (2) has been refined by many subsequent authors, as has the uniformity of the estimates in the aspect, as these topics are related to other questions in analytic number theory, such as fourth moment estimates for the Riemann zeta function; but we will not consider these more subtle features of the estimate here. However, we will look at the next term in the asymptotic expansion for (2) below the fold.

Using our probabilistic lens, the estimate (2) can be written as

From (1) (and the asymptotic negligibility of the shift by ) we see that the random variables and both have a mean of , so the additional factor of represents some arithmetic coupling between the two random variables.

Ingham’s formula can be established in a number of ways. Firstly, one can expand out and use the hyperbola method (splitting into the cases and and removing the overlap). If one does so, one soon arrives at the task of having to estimate sums of the form

for various . For much less than this can be achieved using a further application of the hyperbola method, but for comparable to things get a bit more complicated, necessitating the use of non-trivial estimates on Kloosterman sums in order to obtain satisfactory control on error terms. A more modern approach proceeds using automorphic form methods, as discussed in this previous post. A third approach, which unfortunately is only heuristic at the current level of technology, is to apply the Hardy-Littlewood circle method (discussed in this previous post) to express (2) in terms of exponential sums for various frequencies . The contribution of the “major arcs” can be computed after a moderately lengthy calculation which yields the right-hand side of (2) (as well as the correct lower order terms that are currently being suppressed), but there does not appear to be an easy way to show directly that the “minor arc” contributions are of lower order, although the methods discussed previously do indirectly show that this is ultimately the case.

Each of the methods outlined above requires a fair amount of calculation, and it is not obvious while performing them that the factor will emerge at the end. One can at least explain the as a normalisation constant needed to balance the factor (at a heuristic level, at least). To see this through our probabilistic lens, introduce an independent copy of , then

using symmetry to order (discarding the diagonal case ) and making the change of variables , we see that (4) is heuristically consistent with (3) as long as the asymptotic mean of in is equal to . (This argument is not rigorous because there was an implicit interchange of limits present, but still gives a good heuristic “sanity check” of Ingham’s formula.) Indeed, if denotes the asymptotic mean in , then we have (heuristically at least)

and we obtain the desired consistency after multiplying by .

This still however does not explain the presence of the factor. Intuitively it is reasonable that if has many prime factors, and has a lot of factors, then will have slightly more factors than average, because any common factor to and will automatically be acquired by . But how to quantify this effect?

One heuristic way to proceed is through analysis of local factors. Observe from the fundamental theorem of arithmetic that we can factor

where the product is over all primes , and is the local version of at (which in this case, is just one plus the -valuation of : ). Note that all but finitely many of the terms in this product will equal , so the infinite product is well-defined. In a similar fashion, we can factor

where

(or in terms of valuations, ). Heuristically, the Chinese remainder theorem suggests that the various factors behave like independent random variables, and so the correlation between and should approximately decouple into the product of correlations between the local factors and . And indeed we do have the following local version of Ingham’s asymptotics:

Proposition 1 (Local Ingham asymptotics) For fixed and integer , we have and

From the Euler formula

we see that

and so one can “explain” the arithmetic factor in Ingham’s asymptotic as the product of the arithmetic factors in the (much easier) local Ingham asymptotics. Unfortunately we have the usual “local-global” problem in that we do not know how to rigorously derive the global asymptotic from the local ones; this problem is essentially the same issue as the problem of controlling the minor arc contributions in the circle method, but phrased in “physical space” language rather than “frequency space”.
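The factorisation of the arithmetic factor into local factors can be sanity-checked numerically. The explicit formulas are not displayed above, so the following sketch rests on assumptions: it takes the global factor in Ingham's asymptotic to be 6/π² times the sum of the reciprocals of the divisors of the shift, and the local factor at a prime p to be (1 - 1/p²) times the sum of the reciprocals of the powers of p dividing the shift. All function names are illustrative.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: list of all primes not exceeding `limit` (limit >= 2)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [p for p in range(2, limit + 1) if sieve[p]]


def local_factor(p, h):
    """Assumed local factor at p: (1 - 1/p^2) * sum of 1/p^j over powers p^j dividing h.
    Requires h to be a non-zero integer."""
    total, power = 0.0, 1
    while h % power == 0:
        total += 1.0 / power
        power *= p
    return (1.0 - 1.0 / p ** 2) * total


def sigma_minus_one(h):
    """Sum of the reciprocals of the positive divisors of the positive integer h."""
    return sum(1.0 / d for d in range(1, h + 1) if h % d == 0)


def truncated_product(h, prime_limit):
    """Product of the assumed local factors over all primes up to `prime_limit`."""
    product = 1.0
    for p in primes_up_to(prime_limit):
        product *= local_factor(p, h)
    return product


if __name__ == "__main__":
    h = 12
    print(truncated_product(h, 10 ** 5), 6 / math.pi ** 2 * sigma_minus_one(h))
```

The truncated product converges quickly here because the local factors differ from one only by a quantity of size about the reciprocal of the square of the prime.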

Remark 2 The relation between the local means $\frac{1}{1-\frac{1}{p}}$ and the global mean $\log x$ can also be seen heuristically through the application

$$\prod_{p \leq x^{1/e^\gamma}} \frac{1}{1-\frac{1}{p}} \approx \log x$$

of Mertens’ theorem, where $1/e^\gamma$ is Pólya’s magic exponent, which serves as a useful heuristic limiting threshold in situations where the product of local factors is divergent.
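The heuristic in this remark can be tested numerically: by Mertens’ theorem the product of $\frac{1}{1-\frac{1}{p}}$ over primes $p \leq y$ is asymptotic to $e^\gamma \log y$, so truncating at $y = x^{1/e^\gamma}$ should give roughly $\log x$. The Python sketch below compares the two quantities; the function names and the sample values of $x$ are illustrative assumptions.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def primes_up_to(limit: int) -> list:
    """Sieve of Eratosthenes (limit >= 1)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            # Zero out the multiples of i starting from i * i.
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def local_mean_product(x: float) -> float:
    """Product of 1 / (1 - 1/p) over primes p <= x ** (1 / e^gamma)."""
    cutoff = int(x ** math.exp(-EULER_GAMMA))
    return math.prod(1 / (1 - 1 / p) for p in primes_up_to(cutoff))


for x in (10**4, 10**6, 10**8):  # illustrative sample values
    print(x, local_mean_product(x), math.log(x))
```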

Let us now prove this proposition. One could brute-force the computations by observing that for any fixed $j$, the valuation $v_p(n)$ is equal to $j$ with probability $\sim \frac{p-1}{p} \frac{1}{p^j}$, and with a little more effort one can also compute the joint distribution of $v_p(n)$ and $v_p(n+h)$, at which point the proposition reduces to the calculation of various variants of the geometric series. I however find it cleaner to proceed in a more recursive fashion (similar to how one can prove the geometric series formula by induction); this will also make visible the vague intuition mentioned previously about how common factors of $n$ and $h$ force $n+h$ to have a factor also.

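It is first convenient to get rid of error terms by observing that in the limit $x \to \infty$, the random variable $n = n_x$ converges vaguely to a uniform random variable $n_\infty$ on the profinite integers $\hat{\mathbb{Z}}$, or more precisely that the pair $(v_p(n_x), v_p(n_x+h))$ converges vaguely to $(v_p(n_\infty), v_p(n_\infty+h))$. Because of this (and because of the easily verified uniform integrability properties of $\tau_p(n_x)$ and their powers), it suffices to establish the exact formulae

$$\mathbb{E} \tau_p(n) = \frac{1}{1-\frac{1}{p}} \qquad (5)$$

$$\mathbb{E} \tau_p(n) \tau_p(n+h) = \left(1 - \frac{1}{p^2}\right) \sigma_{-1,p}(h) \left(\frac{1}{1-\frac{1}{p}}\right)^2 \qquad (6)$$

in the profinite setting (this setting will make it easier to set up the recursion).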

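We begin with (5). Observe that $n$ is coprime to $p$ with probability $1 - \frac{1}{p}$, in which case $\tau_p(n)$ is equal to $1$. Conditioning to the complementary probability $\frac{1}{p}$ event that $n$ is divisible by $p$, we can factor $n = p n'$ where $n'$ is also uniformly distributed over the profinite integers, in which event we have $\tau_p(n) = 1 + \tau_p(n')$. We arrive at the identity

$$\mathbb{E} \tau_p(n) = \left(1 - \frac{1}{p}\right) + \frac{1}{p} \left(1 + \mathbb{E} \tau_p(n')\right).$$

As $n$ and $n'$ have the same distribution, the quantities $\mathbb{E} \tau_p(n)$ and $\mathbb{E} \tau_p(n')$ are equal, and (5) follows by a brief amount of high-school algebra.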
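For the record, here is one way the high-school algebra can go. Write $E$ for the common value of the two expectations (a symbol introduced only for this computation); the conditioning argument says that $E$ equals $1$ with weight $1 - \frac{1}{p}$ plus $1 + E$ with weight $\frac{1}{p}$, so that

```latex
\begin{align*}
E &= \left(1 - \frac{1}{p}\right) + \frac{1}{p}\,(1 + E) = 1 + \frac{E}{p}, \\
E \left(1 - \frac{1}{p}\right) &= 1, \\
E &= \frac{1}{1 - \frac{1}{p}} = \frac{p}{p-1}.
\end{align*}
```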

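We use a similar method to treat (6). First treat the case when $h$ is coprime to $p$. Then we see that with probability $1 - \frac{2}{p}$, $n$ and $n+h$ are simultaneously coprime to $p$, in which case $\tau_p(n) = \tau_p(n+h) = 1$. Furthermore, with probability $\frac{1}{p}$, $n$ is divisible by $p$ and $n+h$ is not; in which case we can write $n = p n'$ as before, with $\tau_p(n) = 1 + \tau_p(n')$ and $\tau_p(n+h) = 1$. Finally, in the remaining event with probability $\frac{1}{p}$, $n+h$ is divisible by $p$ and $n$ is not; we can then write $n + h = p n'$, so that $\tau_p(n+h) = 1 + \tau_p(n')$ and $\tau_p(n) = 1$. Putting all this together, we obtain

$$\mathbb{E} \tau_p(n) \tau_p(n+h) = 1 - \frac{2}{p} + \frac{2}{p} \left(1 + \mathbb{E} \tau_p(n')\right)$$

and the claim (6) in this case follows from (5) and a brief computation (noting that $\sigma_{-1,p}(h) = 1$ in this case).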
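The brief computation can be spelled out as follows, assuming (5) in the form that the local factor $\tau_p(n') = 1 + v_p(n')$ has mean $\frac{p}{p-1}$. The three events (both $n$ and $n+h$ coprime to $p$; only $n$ divisible by $p$; only $n+h$ divisible by $p$) have probabilities $1 - \frac{2}{p}$, $\frac{1}{p}$, $\frac{1}{p}$, and combine to give

```latex
\begin{align*}
\mathbb{E}\, \tau_p(n) \tau_p(n+h)
 &= 1 - \frac{2}{p} + \frac{2}{p} \left(1 + \frac{p}{p-1}\right)
  = 1 + \frac{2}{p-1}
  = \frac{p+1}{p-1} \\
 &= \frac{1 + \frac{1}{p}}{1 - \frac{1}{p}}
  = \left(1 - \frac{1}{p^2}\right) \left(\frac{1}{1 - \frac{1}{p}}\right)^2,
\end{align*}
```

which is the right-hand side of (6) when the local factor of $\sigma_{-1}(h)$ at $p$ is equal to $1$.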

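Now suppose that $h$ is divisible by $p$, thus $h = p h'$ for some integer $h'$. Then with probability $1 - \frac{1}{p}$, $n$ and $n+h$ are simultaneously coprime to $p$, in which case $\tau_p(n) = \tau_p(n+h) = 1$. In the remaining event, we can write $n = p n'$, and then $\tau_p(n) = 1 + \tau_p(n')$ and $\tau_p(n+h) = 1 + \tau_p(n'+h')$. Putting all this together we have

$$\mathbb{E} \tau_p(n) \tau_p(n+h) = 1 - \frac{1}{p} + \frac{1}{p} \mathbb{E} \left(1 + \tau_p(n')\right) \left(1 + \tau_p(n'+h')\right)$$

which by (5) (and replacing $n'$ by $n$) leads to the recursive relation

$$\mathbb{E} \tau_p(n) \tau_p(n+h) = \frac{p+1}{p-1} + \frac{1}{p} \mathbb{E} \tau_p(n) \tau_p(n+h')$$

and (6) then follows by induction on the number of powers of $p$.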
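The induction can also be checked mechanically in exact arithmetic. In the Python sketch below, $k$ denotes the number of powers of $p$ dividing $h$; the recursion is taken in the form "value at $k$ equals $\frac{p+1}{p-1}$ plus $\frac{1}{p}$ times the value at $k-1$", with base case $\frac{p+1}{p-1}$ at $k = 0$ (the coprime case), and it is compared with the closed form $(1 - \frac{1}{p^2})(1 + \frac{1}{p} + \dots + \frac{1}{p^k})(1 - \frac{1}{p})^{-2}$. The function names and the tested ranges of $p$ and $k$ are illustrative assumptions.

```python
from fractions import Fraction


def closed_form(p: int, k: int) -> Fraction:
    """Right-hand side of (6) when exactly k powers of p divide h."""
    sigma_local = sum(Fraction(1, p**j) for j in range(k + 1))
    return (1 - Fraction(1, p * p)) * sigma_local / (1 - Fraction(1, p)) ** 2


def by_recursion(p: int, k: int) -> Fraction:
    """Unroll the recursive relation k times, starting from the coprime case."""
    base = Fraction(p + 1, p - 1)
    value = base  # k = 0: h coprime to p
    for _ in range(k):
        value = base + value / p
    return value


for p in (2, 3, 5, 7):
    for k in range(6):
        assert by_recursion(p, k) == closed_form(p, k)
print("recursion agrees with the closed form")
```

The agreement reflects the identity $1 + \frac{1}{p} + \dots + \frac{1}{p^k} = 1 + \frac{1}{p} (1 + \frac{1}{p} + \dots + \frac{1}{p^{k-1}})$ satisfied by the local factor of $\sigma_{-1}$.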

The estimate (2) of Ingham was refined by Estermann, who obtained the more accurate expansion

for certain complicated but explicit coefficients . For instance, is given by the formula

where $\gamma$ is the Euler-Mascheroni constant,

The formula for is similar but even more complicated. The error term was improved by Heath-Brown to ; it is conjectured (for instance by Conrey and Gonek) that one in fact has square root cancellation here, but this is well out of reach of current methods.

These lower order terms are traditionally computed either from a Dirichlet series approach (using Perron’s formula) or a circle method approach. It turns out that a refinement of the above heuristics can also predict these lower order terms, thus keeping the calculation purely in physical space as opposed to the “multiplicative frequency space” of the Dirichlet series approach, or the “additive frequency space” of the circle method, although the computations are arguably as messy as the latter computations for the purposes of working out the lower order terms. We illustrate this just for the term below the fold.
