Let be the divisor function. A classical application of the Dirichlet hyperbola method gives the asymptotic

where denotes the estimate as . Much better error estimates are possible here, but we will not focus on the lower order terms in this discussion. For somewhat idiosyncratic reasons I will interpret this estimate (and the other analytic number theory estimates discussed here) through the probabilistic lens. Namely, if is a random number selected uniformly between and , then the above estimate can be written as

that is to say the random variable has mean approximately . (But, somewhat paradoxically, this is not the median or mode behaviour of this random variable, which instead concentrates near , basically thanks to the Hardy-Ramanujan theorem.)

Now we turn to the pair correlations for a fixed positive integer . There is a classical computation of Ingham that shows that

The error term in (2) has been refined by many subsequent authors, as has the uniformity of the estimates in the aspect, as these topics are related to other questions in analytic number theory, such as fourth moment estimates for the Riemann zeta function; but we will not consider these more subtle features of the estimate here. However, we will look at the next term in the asymptotic expansion for (2) below the fold.

Using our probabilistic lens, the estimate (2) can be written as

From (1) (and the asymptotic negligibility of the shift by ) we see that the random variables and both have a mean of , so the additional factor of represents some arithmetic coupling between the two random variables.

Ingham’s formula can be established in a number of ways. Firstly, one can expand out and use the hyperbola method (splitting into the cases and and removing the overlap). If one does so, one soon arrives at the task of having to estimate sums of the form

for various . For much less than this can be achieved using a further application of the hyperbola method, but for comparable to things get a bit more complicated, necessitating the use of non-trivial estimates on Kloosterman sums in order to obtain satisfactory control on error terms. A more modern approach proceeds using automorphic form methods, as discussed in this previous post. A third approach, which unfortunately is only heuristic at the current level of technology, is to apply the Hardy-Littlewood circle method (discussed in this previous post) to express (2) in terms of exponential sums for various frequencies . The contribution of “major arc” can be computed after a moderately lengthy calculation which yields the right-hand side of (2) (as well as the correct lower order terms that are currently being suppressed), but there does not appear to be an easy way to show directly that the “minor arc” contributions are of lower order, although the methods discussed previously do indirectly show that this is ultimately the case.

Each of the methods outlined above requires a fair amount of calculation, and it is not obvious while performing them that the factor will emerge at the end. One can at least explain the as a normalisation constant needed to balance the factor (at a heuristic level, at least). To see this through our probabilistic lens, introduce an independent copy of , then

using symmetry to order (discarding the diagonal case ) and making the change of variables , we see that (4) is heuristically consistent with (3) as long as the asymptotic mean of in is equal to . (This argument is not rigorous because there was an implicit interchange of limits present, but still gives a good heuristic “sanity check” of Ingham’s formula.) Indeed, if denotes the asymptotic mean in , then we have (heuristically at least)

and we obtain the desired consistency after multiplying by .

This still however does not explain the presence of the factor. Intuitively it is reasonable that if has many prime factors, and has a lot of factors, then will have slightly more factors than average, because any common factor to and will automatically be acquired by . But how to quantify this effect?

One heuristic way to proceed is through analysis of local factors. Observe from the fundamental theorem of arithmetic that we can factor

where the product is over all primes , and is the local version of at (which in this case, is just one plus the –valuation of : ). Note that all but finitely many of the terms in this product will equal , so the infinite product is well-defined. In a similar fashion, we can factor

where

(or in terms of valuations, ). Heuristically, the Chinese remainder theorem suggests that the various factors behave like independent random variables, and so the correlation between and should approximately decouple into the product of correlations between the local factors and . And indeed we do have the following local version of Ingham’s asymptotics:

Proposition 1 (Local Ingham asymptotics)For fixed and integer , we haveand

From the Euler formula

we see that

and so one can “explain” the arithmetic factor in Ingham’s asymptotic as the product of the arithmetic factors in the (much easier) local Ingham asymptotics. Unfortunately we have the usual “local-global” problem in that we do not know how to rigorously derive the global asymptotic from the local ones; this problem is essentially the same issue as the problem of controlling the minor arc contributions in the circle method, but phrased in “physical space” language rather than “frequency space”.

Remark 2The relation between the local means and the global mean can also be seen heuristically through the applicationof Mertens’ theorem, where is Pólya’s magic exponent, which serves as a useful heuristic limiting threshold in situations where the product of local factors is divergent.

Let us now prove this proposition. One could brute-force the computations by observing that for any fixed , the valuation is equal to with probability , and with a little more effort one can also compute the joint distribution of and , at which point the proposition reduces to the calculation of various variants of the geometric series. I however find it cleaner to proceed in a more recursive fashion (similar to how one can prove the geometric series formula by induction); this will also make visible the vague intuition mentioned previously about how common factors of and force to have a factor also.

It is first convenient to get rid of error terms by observing that in the limit , the random variable converges vaguely to a uniform random variable on the profinite integers , or more precisely that the pair converges vaguely to . Because of this (and because of the easily verified uniform integrability properties of and their powers), it suffices to establish the exact formulae

in the profinite setting (this setting will make it easier to set up the recursion).

We begin with (5). Observe that is coprime to with probability , in which case is equal to . Conditioning to the complementary probability event that is divisible by , we can factor where is also uniformly distributed over the profinite integers, in which event we have . We arrive at the identity

As and have the same distribution, the quantities and are equal, and (5) follows by a brief amount of high-school algebra.

We use a similar method to treat (6). First treat the case when is coprime to . Then we see that with probability , and are simultaneously coprime to , in which case . Furthermore, with probability , is divisible by and is not; in which case we can write as before, with and . Finally, in the remaining event with probability , is divisible by and is not; we can then write , so that and . Putting all this together, we obtain

and the claim (6) in this case follows from (5) and a brief computation (noting that in this case).

Now suppose that is divisible by , thus for some integer . Then with probability , and are simultaneously coprime to , in which case . In the remaining event, we can write , and then and . Putting all this together we have

which by (5) (and replacing by ) leads to the recursive relation

and (6) then follows by induction on the number of powers of .

The estimate (2) of Ingham was refined by Estermann, who obtained the more accurate expansion

for certain complicated but explicit coefficients . For instance, is given by the formula

where is the Euler-Mascheroni constant,

The formula for is similar but even more complicated. The error term was improved by Heath-Brown to ; it is conjectured (for instance by Conrey and Gonek) that one in fact has square root cancellation here, but this is well out of reach of current methods.

These lower order terms are traditionally computed either from a Dirichlet series approach (using Perron’s formula) or a circle method approach. It turns out that a refinement of the above heuristics can also predict these lower order terms, thus keeping the calculation purely in physical space as opposed to the “multiplicative frequency space” of the Dirichlet series approach, or the “additive frequency space” of the circle method, although the computations are arguably as messy as the latter computations for the purposes of working out the lower order terms. We illustrate this just for the term below the fold.

** — 1. More refined heuristics — **

To begin with we revisit the heuristic justification of (2) and modify it a bit, in the spirit of the “modified Cramér models” from this previous blog post. Our arguments here will be even less rigorous than in the preceding section, for instance we shall be quite cavalier with the use of the symbol without precisely quantifying error terms.

For this section, it will be convenient to set not to be the uniform distribution from to as before, but rather the uniform distribution from to for some fairly small (e.g. will suffice). The reason for this is that we need to start keeping track of the magnitude of to a relative accuracy of at least , and with this new choice of the magnitude is simply (up to this accuracy). By “differentiating” asymptotics such as (7) in we see that

where we ignore terms of lower order than the term and

and so our task is now to heuristically justify the statement

Let us begin by revisiting the derivation of the simpler asymptotic

without passing immediately to local factors, but instead using a modified Cramér type model that will be more amenable to keeping careful track of the magnitude . For any square-free , write

and

so that we have the factorisation . From the Chinese remainder theorem we expect and to behave approximately independently, at least when is fixed and is large.

We can use statistics of to deduce statistics of . This is traditionally done using the device of Möbius inversion, but we will use a recursive probabilistic approach here similar to that used to prove Proposition 1. Let’s start with understanding for some prime . With probability about , is coprime to , so that . In the remaining event of probability about , we may write where is uniformly distributed near , and then . This leads to the approximate identity

To work with this equation we make two further heuristic assumptions, which turn out to be reasonable at the level of accuracy required for justifying (10), but not for (9). Firstly, we assume that the distribution of is largely unchanged after conditioning to the event , which is plausible from the Chinese remainder theorem. Secondly, we assume that the distribution of is largely the same as that of , and similarly for . With these assumptions, (11) simplifies to

(note that this is consistent with Proposition 1 and the presumed independence of and ). Iterating this argument gives

for any fixed squarefree . Applying (1) we conclude

Now let be a moderately large number (growing very slowly with , e.g. ), and let be the primorial of . We factor

There is arithmetic coupling between the and factors; indeed from Proposition 1 and the Chinese remainder theorem we have

Meanwhile, from (13) we have

and

In analogy with the modified Cramér model for the primes, we now assume that the random variables , , and behave like independent random variables; this turns out to be sufficiently accurate for predicting (10), but not (9). This leads to the prediction

and if goes to infinity with then we recover (10).

We now repeat the above analysis but expanding to the next order in , which will require keeping better track of the parameter. The Dirichlet hyperbola method actually gives the refinement

and hence

We previously computed the expectation . Now that we want to be more accurate, it turns out that it is better to compute the slightly different *conditional expectation* . Now we compute the mean of a bit more carefully. Again we begin with for prime. From (11) and (16), and by replacing with , we have

Previously we simplified this equation by assuming that the conditioning had negligible impact on , and that had similar distribution to . As it turns out, we can no longer afford to assume these simplifying assumptions at this level of accuracy, and must argue more carefully. With probability about is coprime to , and in the remaining event of probability about , we have and . This gives the approximate identity

If we refine the approximation (12) to give the ansatz

and

for some constants to be worked out momentarily, the above two equations give after some computation (and canceling out all the terms

which can be solved as

Thus we have

It is instructive (albeit lengthy) to deduce this asymptotic rigorously for fixed square-free and large by using Möbius inversion and the hyperbola method; one can also proceed a bit more quickly by using Perron’s formula and inspecting the pole of the relevant Dirichlet series at . (Not surprisingly, similar calculations appear when computing major arc contributions to (2).)

Now we are ready to predict the value of . Our starting point is again the expansion (14), except that we now strip out the factors within that divide , in order to use (19). More precisely, we adopt the factorisation of arbitrary natural numbers by the formulae

and observe that

and similarly

We are now faced with computing the expression

Previously we had assumed that the factors , , and behaved independently. However this is not quite true at the current level of accuracy, because the magnitude of the natural numbers and are influenced by the size of and respectively, and these in turn are certainly correlated with . Because of the non-trivial dependence on on the right-hand side of (19) at this level of accuracy, the size of and influences the expected value of and , causing a coupling between the three factors of (20). However, if we first condition the magnitude of and to be fixed, then we expect the three factors to be *conditionally* independent (at this level of accuracy). Heuristically, with this conditioning, behaves like a random element of size about that is coprime to , and hence by (19)

and similarly

We can thus heuristically estimate (20) by

This can be expanded as

where

and is something complicated that we do not compute here. Using (15) we have

which recovers the Ingham estimate (10). To obtain the finer approximation (9) we need to show that

It will suffice to obtain the asymptotic

together with the analogue of (21) with replaced by . (One in fact has the approximation , which reflects the Laurent approximation near , but we will not need this.)

We will just prove (21), as the analogue of (21) for is proven similarly. As this is a local calculation we may replace by . Splitting

we can write the left-hand side of (21) as a sum over primes,

The and terms decouple, so by Proposition 1 this becomes

Extracting a common factor of , we reduce to showing that

The left-hand side is a sum over primes , so we work on making the right-hand side a sum over primes as well. This is achieved by the following two lemmas:

Lemma 3We have .

*Proof:* While this identity can be proven elementarily, it is fastest to proceed using Dirichlet series. From Taylor expansion and (8) we have

for close to . Inverting, we have

and hence

The left-hand side is , giving the claim.

Lemma 4We have .

*Proof:* Again it is fastest to proceed using Dirichlet series. Setting and , we have from the fundamental theorem of arithmetic that

Differentiating this at we obtain the claim.

We remark that is an example of a derived multiplicative function, discussed in this previous blog post.

In view of these expansions, we can reduce matters to establishing a purely local estimate at a single prime , namely that

(compare with Proposition 1).

We need two preliminary estimates to handle lower order terms:

*Proof:* Once again we use recursion. With probability , is coprime to , so vanishes. Otherwise, , , and . We conclude that

From Proposition 1 we have

and from the identity we thus have

and the claim follows.

Note that one can view Lemma 5 as the (p-adic) limiting case of Lemma 6.

*Proof:* First suppose that is coprime to . Then is equal to whenever is non-zero, so the claim follows from (23) in this case. Now suppose that for some integer .

When is coprime to the left-hand side is equal to when is coprime to and otherwise, giving the claim. Now suppose that for some integer . With probability , is coprime to , so vanishes; otherwise,, , and . We conclude that

Arguing as before we conclude that

and the claim follows by induction on the number of times divides (noting that ).

Now we can prove (22). First suppose is coprime to . In this case, the random variable is only non-zero when is divisible by , which forces to equal . By Lemma 5, the left-hand side of (22) is thus equal to , thus proving (22) in this case.

Now suppose that is divisible by . With probability , is coprime to , so vanishes. Otherwise, we write and observe that

thus the left-hand side of (22) expands as

This expands into eight terms that can be computed using Proposition 1, (23), and Lemmas 5, (6) as

The claim then follows by induction on the number of times divides , together with a tedious computation using the identities

and

## 9 comments

Comments feed for this article

30 August, 2016 at 4:43 pm

AnonymousA simpler explanation for the local factorization of is by its multiplicativity.

30 August, 2016 at 5:27 pm

Marco BaroneHi, everybody, how can I open a new post? It is about a set of measures on N, that may lead to some new ways of weighing certain special sets of integers. It has something to do with this thread but not enough to be a comment to it. Please help, I really can’t figure this out. I can only comment here. Apologies for writing here!

31 August, 2016 at 7:12 am

Jhon ManugalI recall the day in Probability class the teacher explained that was sufficient to make all my computations work (and pass the class).

Here I am tempted to write but this became circular reasoning. Somehow we need to find the two-point function (or correlation) $$ and such elementary reasoning led nowhere.

31 August, 2016 at 12:28 pm

Heuristic computation of correlations of higher order divisor functions | What's new[…] is a postscript to the previous blog post which was concerned with obtaining heuristic asymptotic predictions for the […]

31 August, 2016 at 12:36 pm

Lior SilbermanIf all you want is the asymptotic behaviour then the estimate follows from changing the order of summation. The hyperbola method is needed to get the correct error term ().

1 September, 2016 at 8:27 pm

kodluVery nice post. Right above equation (4) is the phrase:

“introduce an independent copy ${{\bf n}’}$ is an independent copy of ${{\bf n}}$”

which probably could be changed to

“introduce an independent copy ${{\bf n}’}$ of ${{\bf n}}$”.

[Corrected, thanks – T.]3 September, 2016 at 6:19 am

Lior Bary-SorokerCan you say something about the heuristics of the correlation of twisted divisor function, e.g., replacing by , where is the quadratic character modulo 4?

3 September, 2016 at 8:45 am

Terence TaoYes, the same arguments apply (evaluating local factors and then multiplying the correlations together to predict the main term). In many cases the correlations will now have a vanishing main term and thus be very small, though not always. For instance will not be small because and so the summand is consistently negative.

2 October, 2016 at 7:07 pm

Ravi A. BajajThere is a typo in the first sentence of the paragraph where you “use a recursive probabilistic approach” instead of a Mobius transformation at that point of the article. It says, “We can statistics of” — a word is missing.

[Corrected, thanks – T.]