Let be the divisor function. A classical application of the Dirichlet hyperbola method gives the asymptotic
where denotes the estimate as . Much better error estimates are possible here, but we will not focus on the lower order terms in this discussion. For somewhat idiosyncratic reasons I will interpret this estimate (and the other analytic number theory estimates discussed here) through the probabilistic lens. Namely, if is a random number selected uniformly between and , then the above estimate can be written as
that is to say the random variable has mean approximately . (But, somewhat paradoxically, this is not the median or mode behaviour of this random variable, which instead concentrates near , basically thanks to the Hardy-Ramanujan theorem.)
Now we turn to the pair correlations for a fixed positive integer . There is a classical computation of Ingham that shows that
The error term in (2) has been refined by many subsequent authors, as has the uniformity of the estimates in the aspect, as these topics are related to other questions in analytic number theory, such as fourth moment estimates for the Riemann zeta function; but we will not consider these more subtle features of the estimate here. However, we will look at the next term in the asymptotic expansion for (2) below the fold.
Using our probabilistic lens, the estimate (2) can be written as
From (1) (and the asymptotic negligibility of the shift by ) we see that the random variables and both have a mean of , so the additional factor of represents some arithmetic coupling between the two random variables.
Ingham’s formula can be established in a number of ways. Firstly, one can expand out and use the hyperbola method (splitting into the cases and and removing the overlap). If one does so, one soon arrives at the task of having to estimate sums of the form
for various . For much less than this can be achieved using a further application of the hyperbola method, but for comparable to things get a bit more complicated, necessitating the use of non-trivial estimates on Kloosterman sums in order to obtain satisfactory control on error terms. A more modern approach proceeds using automorphic form methods, as discussed in this previous post. A third approach, which unfortunately is only heuristic at the current level of technology, is to apply the Hardy-Littlewood circle method (discussed in this previous post) to express (2) in terms of exponential sums for various frequencies . The contribution of “major arc” can be computed after a moderately lengthy calculation which yields the right-hand side of (2) (as well as the correct lower order terms that are currently being suppressed), but there does not appear to be an easy way to show directly that the “minor arc” contributions are of lower order, although the methods discussed previously do indirectly show that this is ultimately the case.
Each of the methods outlined above requires a fair amount of calculation, and it is not obvious while performing them that the factor will emerge at the end. One can at least explain the as a normalisation constant needed to balance the factor (at a heuristic level, at least). To see this through our probabilistic lens, introduce an independent copy of , then
using symmetry to order (discarding the diagonal case ) and making the change of variables , we see that (4) is heuristically consistent with (3) as long as the asymptotic mean of in is equal to . (This argument is not rigorous because there was an implicit interchange of limits present, but still gives a good heuristic “sanity check” of Ingham’s formula.) Indeed, if denotes the asymptotic mean in , then we have (heuristically at least)
and we obtain the desired consistency after multiplying by .
This still however does not explain the presence of the factor. Intuitively it is reasonable that if has many prime factors, and has a lot of factors, then will have slightly more factors than average, because any common factor to and will automatically be acquired by . But how to quantify this effect?
One heuristic way to proceed is through analysis of local factors. Observe from the fundamental theorem of arithmetic that we can factor
where the product is over all primes , and is the local version of at (which in this case, is just one plus the –valuation of : ). Note that all but finitely many of the terms in this product will equal , so the infinite product is well-defined. In a similar fashion, we can factor
(or in terms of valuations, ). Heuristically, the Chinese remainder theorem suggests that the various factors behave like independent random variables, and so the correlation between and should approximately decouple into the product of correlations between the local factors and . And indeed we do have the following local version of Ingham’s asymptotics:
From the Euler formula
we see that
and so one can “explain” the arithmetic factor in Ingham’s asymptotic as the product of the arithmetic factors in the (much easier) local Ingham asymptotics. Unfortunately we have the usual “local-global” problem in that we do not know how to rigorously derive the global asymptotic from the local ones; this problem is essentially the same issue as the problem of controlling the minor arc contributions in the circle method, but phrased in “physical space” language rather than “frequency space”.
Remark 2 The relation between the local means and the global mean can also be seen heuristically through the application
Let us now prove this proposition. One could brute-force the computations by observing that for any fixed , the valuation is equal to with probability , and with a little more effort one can also compute the joint distribution of and , at which point the proposition reduces to the calculation of various variants of the geometric series. I however find it cleaner to proceed in a more recursive fashion (similar to how one can prove the geometric series formula by induction); this will also make visible the vague intuition mentioned previously about how common factors of and force to have a factor also.
It is first convenient to get rid of error terms by observing that in the limit , the random variable converges vaguely to a uniform random variable on the profinite integers , or more precisely that the pair converges vaguely to . Because of this (and because of the easily verified uniform integrability properties of and their powers), it suffices to establish the exact formulae
We begin with (5). Observe that is coprime to with probability , in which case is equal to . Conditioning to the complementary probability event that is divisible by , we can factor where is also uniformly distributed over the profinite integers, in which event we have . We arrive at the identity
As and have the same distribution, the quantities and are equal, and (5) follows by a brief amount of high-school algebra.
We use a similar method to treat (6). First treat the case when is coprime to . Then we see that with probability , and are simultaneously coprime to , in which case . Furthermore, with probability , is divisible by and is not; in which case we can write as before, with and . Finally, in the remaining event with probability , is divisible by and is not; we can then write , so that and . Putting all this together, we obtain
Now suppose that is divisible by , thus for some integer . Then with probability , and are simultaneously coprime to , in which case . In the remaining event, we can write , and then and . Putting all this together we have
which by (5) (and replacing by ) leads to the recursive relation
and (6) then follows by induction on the number of powers of .
for certain complicated but explicit coefficients . For instance, is given by the formula
where is the Euler-Mascheroni constant,
The formula for is similar but even more complicated. The error term was improved by Heath-Brown to ; it is conjectured (for instance by Conrey and Gonek) that one in fact has square root cancellation here, but this is well out of reach of current methods.
These lower order terms are traditionally computed either from a Dirichlet series approach (using Perron’s formula) or a circle method approach. It turns out that a refinement of the above heuristics can also predict these lower order terms, thus keeping the calculation purely in physical space as opposed to the “multiplicative frequency space” of the Dirichlet series approach, or the “additive frequency space” of the circle method, although the computations are arguably as messy as the latter computations for the purposes of working out the lower order terms. We illustrate this just for the term below the fold.
— 1. More refined heuristics —
To begin with we revisit the heuristic justification of (2) and modify it a bit, in the spirit of the “modified Cramér models” from this previous blog post. Our arguments here will be even less rigorous than in the preceding section, for instance we shall be quite cavalier with the use of the symbol without precisely quantifying error terms.
For this section, it will be convenient to set not to be the uniform distribution from to as before, but rather the uniform distribution from to for some fairly small (e.g. will suffice). The reason for this is that we need to start keeping track of the magnitude of to a relative accuracy of at least , and with this new choice of the magnitude is simply (up to this accuracy). By “differentiating” asymptotics such as (7) in we see that
where we ignore terms of lower order than the term and
so that we have the factorisation . From the Chinese remainder theorem we expect and to behave approximately independently, at least when is fixed and is large.
We can statistics of to deduce statistics of . This is traditionally done using the device of Möbius inversion, but we will use a recursive probabilistic approach here similar to that used to prove Proposition 1. Let’s start with understanding for some prime . With probability about , is coprime to , so that . In the remaining event of probability about , we may write where is uniformly distributed near , and then . This leads to the approximate identity
To work with this equation we make two further heuristic assumptions, which turn out to be reasonable at the level of accuracy required for justifying (10), but not for (9). Firstly, we assume that the distribution of is largely unchanged after conditioning to the event , which is plausible from the Chinese remainder theorem. Secondly, we assume that the distribution of is largely the same as that of , and similarly for . With these assumptions, (11) simplifies to
(note that this is consistent with Proposition 1 and the presumed independence of and ). Iterating this argument gives
for any fixed squarefree . Applying (1) we conclude
Now let be a moderately large number (growing very slowly with , e.g. ), and let be the primorial of . We factor
There is arithmetic coupling between the and factors; indeed from Proposition 1 and the Chinese remainder theorem we have
Meanwhile, from (13) we have
In analogy with the modified Cramér model for the primes, we now assume that the random variables , , and behave like independent random variables; this turns out to be sufficiently accurate for predicting (10), but not (9). This leads to the prediction
and if goes to infinity with then we recover (10).
We now repeat the above analysis but expanding to the next order in , which will require keeping better track of the parameter. The Dirichlet hyperbola method actually gives the refinement
We previously computed the expectation . Now that we want to be more accurate, it turns out that it is better to compute the slightly different conditional expectation . Now we compute the mean of a bit more carefully. Again we begin with for prime. From (11) and (16), and by replacing with , we have
Previously we simplified this equation by assuming that the conditioning had negligible impact on , and that had similar distribution to . As it turns out, we can no longer afford to assume these simplifying assumptions at this level of accuracy, and must argue more carefully. With probability about is coprime to , and in the remaining event of probability about , we have and . This gives the approximate identity
If we refine the approximation (12) to give the ansatz
for some constants to be worked out momentarily, the above two equations give after some computation (and canceling out all the terms
which can be solved as
Thus we have
It is instructive (albeit lengthy) to deduce this asymptotic rigorously for fixed square-free and large by using Möbius inversion and the hyperbola method; one can also proceed a bit more quickly by using Perron’s formula and inspecting the pole of the relevant Dirichlet series at . (Not surprisingly, similar calculations appear when computing major arc contributions to (2).)
Now we are ready to predict the value of . Our starting point is again the expansion (14), except that we now strip out the factors within that divide , in order to use (19). More precisely, we adopt the factorisation of arbitrary natural numbers by the formulae
and observe that
Previously we had assumed that the factors , , and behaved independently. However this is not quite true at the current level of accuracy, because the magnitude of the natural numbers and are influenced by the size of and respectively, and these in turn are certainly correlated with . Because of the non-trivial dependence on on the right-hand side of (19) at this level of accuracy, the size of and influences the expected value of and , causing a coupling between the three factors of (20). However, if we first condition the magnitude of and to be fixed, then we expect the three factors to be conditionally independent (at this level of accuracy). Heuristically, with this conditioning, behaves like a random element of size about that is coprime to , and hence by (19)
We can thus heuristically estimate (20) by
This can be expanded as
and is something complicated that we do not compute here. Using (15) we have
together with the analogue of (21) with replaced by . (One in fact has the approximation , which reflects the Laurent approximation near , but we will not need this.)
we can write the left-hand side of (21) as a sum over primes,
The and terms decouple, so by Proposition 1 this becomes
Extracting a common factor of , we reduce to showing that
The left-hand side is a sum over primes , so we work on making the right-hand side a sum over primes as well. This is achieved by the following two lemmas:
Lemma 3 We have .
Proof: While this identity can be proven elementarily, it is fastest to proceed using Dirichlet series. From Taylor expansion and (8) we have
for close to . Inverting, we have
The left-hand side is , giving the claim.
Lemma 4 We have .
Proof: Again it is fastest to proceed using Dirichlet series. Setting and , we have from the fundamental theorem of arithmetic that
Differentiating this at we obtain the claim.
We remark that is an example of a derived multiplicative function, discussed in this previous blog post.
In view of these expansions, we can reduce matters to establishing a purely local estimate at a single prime , namely that
(compare with Proposition 1).
We need two preliminary estimates to handle lower order terms:
Proof: Once again we use recursion. With probability , is coprime to , so vanishes. Otherwise, , , and . We conclude that
From Proposition 1 we have
and the claim follows.
Proof: First suppose that is coprime to . Then is equal to whenever is non-zero, so the claim follows from (23) in this case. Now suppose that for some integer .
When is coprime to the left-hand side is equal to when is coprime to and otherwise, giving the claim. Now suppose that for some integer . With probability , is coprime to , so vanishes; otherwise,, , and . We conclude that
Arguing as before we conclude that
and the claim follows by induction on the number of times divides (noting that ).
Now we can prove (22). First suppose is coprime to . In this case, the random variable is only non-zero when is divisible by , which forces to equal . By Lemma 5, the left-hand side of (22) is thus equal to , thus proving (22) in this case.
Now suppose that is divisible by . With probability , is coprime to , so vanishes. Otherwise, we write and observe that
thus the left-hand side of (22) expands as
The claim then follows by induction on the number of times divides , together with a tedious computation using the identities