This is a postscript to the previous blog post which was concerned with obtaining heuristic asymptotic predictions for the correlation

$\displaystyle \sum_{n \leq x} \tau(n) \tau(n+h), \ \ \ \ \ (1)$

for the divisor function ${\tau(n) := \sum_{d|n} 1}$, in particular recovering the calculation of Ingham that obtained the asymptotic

$\displaystyle \sum_{n \leq x} \tau(n) \tau(n+h) \sim \frac{6}{\pi^2} \sigma_{-1}(h) x \log^2 x \ \ \ \ \ (2)$

when ${h}$ was fixed and non-zero and ${x}$ went to infinity. It is natural to consider the more general correlations

$\displaystyle \sum_{n \leq x} \tau_k(n) \tau_l(n+h)$

for fixed ${k,l \geq 1}$ and non-zero ${h}$, where

$\displaystyle \tau_k(n) := \sum_{d_1 \dots d_k = n} 1$

is the order ${k}$ divisor function. The sum (1) then corresponds to the case ${k=l=2}$. For ${l=1}$, ${\tau_1(n) = 1}$, and a routine application of the Dirichlet hyperbola method (or Perron’s formula) gives the asymptotic

$\displaystyle \sum_{n \leq x} \tau_k(n) \sim \frac{\log^{k-1} x}{(k-1)!} x,$

or more accurately

$\displaystyle \sum_{n \leq x} \tau_k(n) \sim P_k(\log x) x$

where ${P_k(t)}$ is a certain explicit polynomial of degree ${k-1}$ with leading coefficient ${\frac{1}{(k-1)!}}$; see e.g. Exercise 31 of this previous post for a discussion of the ${k=3}$ case (which is already typical). Similarly if ${k=1}$. For more general ${k,l \geq 1}$, there is a conjecture of Conrey and Gonek which predicts that

$\displaystyle \sum_{n \leq x} \tau_k(n) \tau_l(n+h) \sim P_{k,l,h}(\log x) x$

for some polynomial ${P_{k,l,h}(t)}$ of degree ${k+l-2}$ which is explicit but whose form is rather complicated (one has to compute residues of a various complicated products of zeta functions and local factors). This conjecture has been verified when ${k \leq 2}$ or ${l \leq 2}$, by the work of Linnik, Motohashi, Fouvry-Tenenbaum, and others, but all the remaining cases when ${k,l \geq 3}$ are currently open.

In principle, the calculations of the previous post should recover the predictions of Conrey and Gonek. In this post I would like to record this for the top order term:

Conjecture 1 If ${k,l \geq 2}$ and ${h \neq 0}$ are fixed, then

$\displaystyle \sum_{n \leq x} \tau_k(n) \tau_l(n+h) \sim \frac{\log^{k-1} x}{(k-1)!} \frac{\log^{l-1} x}{(l-1)!} x \prod_p {\mathfrak S}_{k,l,p}(h)$

as ${x \rightarrow \infty}$, where the product is over all primes ${p}$, and the local factors ${{\mathfrak S}_{k,l,p}(h)}$ are given by the formula

$\displaystyle {\mathfrak S}_{k,l,p}(h) := (\frac{p-1}{p})^{k+l-2} \sum_{j \geq 0: p^j|h} \frac{1}{p^j} P_{k,l,p}(j) \ \ \ \ \ (3)$

where ${P_{k,l,p}}$ is the degree ${k+l-4}$ polynomial

$\displaystyle P_{k,l,p}(j) := \sum_{k'=2}^k \sum_{l'=2}^l \binom{k-k'+j-1}{k-k'} \binom{l-l'+j-1}{l-l'} \alpha_{k',l',p}$

where

$\displaystyle \alpha_{k',l',p} := (\frac{p}{p-1})^{k'-1} + (\frac{p}{p-1})^{l'-1} - 1$

and one adopts the conventions that ${\binom{-1}{0} = 1}$ and ${\binom{m-1}{m} = 0}$ for ${m \geq 1}$.

For instance, if ${k=l=2}$ then

$\displaystyle P_{2,2,p}(h) = \frac{p}{p-1} + \frac{p}{p-1} - 1 = \frac{p+1}{p-1}$

and hence

$\displaystyle {\mathfrak S}_{2,2,p}(h) = (1 - \frac{1}{p^2}) \sum_{j \geq 0: p^j|h} \frac{1}{p^j}$

and the above conjecture recovers the Ingham formula (2). For ${k=2, l=3}$, we have

$\displaystyle P_{2,3,p}(h) =$

$\displaystyle (\frac{p}{p-1} + (\frac{p}{p-1})^2 - 1) + (\frac{p}{p-1} + \frac{p}{p-1} - 1) j$

$\displaystyle = \frac{p^2+p-1}{(p-1)^2} + \frac{p+1}{p-1} j$

and so we predict

$\displaystyle \sum_{n \leq x} \tau(n) \tau_3(n+h) \sim \frac{x \log^3 x}{2} \prod_p {\mathfrak S}_{2,3,p}(h)$

where

$\displaystyle {\mathfrak S}_{2,3,p}(h) = \sum_{j \geq 0: p^j|h} \frac{\frac{p^3 - 2p + 1}{p^3} + \frac{(p+1)(p-1)^2}{p^3} j}{p^j}.$

Similarly, if ${k=l=3}$ we have

$\displaystyle P_{3,3,p}(h) = ((\frac{p}{p-1})^2 + (\frac{p}{p-1})^2 - 1) + 2 (\frac{p}{p-1} + (\frac{p}{p-1})^2 - 1) j$

$\displaystyle + (\frac{p}{p-1} + \frac{p}{p-1} - 1) j^2$

$\displaystyle = \frac{p^2+2p-1}{(p-1)^2} + 2 \frac{p^2+p-1}{(p-1)^2} j + \frac{p+1}{p-1} j^2$

and so we predict

$\displaystyle \sum_{n \leq x} \tau_3(n) \tau_3(n+h) \sim \frac{x \log^4 x}{4} \prod_p {\mathfrak S}_{3,3,p}(h)$

where

$\displaystyle {\mathfrak S}_{3,3,p}(h) = \sum_{j \geq 0: p^j|h} \frac{\frac{p^4 - 4p^2 + 4p - 1}{p^4} + 2 \frac{(p^2+p-1)(p-1)^2}{p^4} j + \frac{(p+1)(p-1)^3}{p^4} j^2}{p^j}.$

and so forth.

As in the previous blog, the idea is to factorise

$\displaystyle \tau_k(n) = \prod_p \tau_{k,p}(n)$

where the local factors ${\tau_{k,p}(n)}$ are given by

$\displaystyle \tau_{k,p}(n) := \sum_{j_1,\dots,j_k \geq 0: p^{j_1+\dots+j_k} || n} 1$

(where ${p^j || n}$ means that ${p}$ divides ${n}$ precisely ${j}$ times), or in terms of the valuation ${v_p(n)}$ of ${n}$ at ${p}$,

$\displaystyle \tau_{k,p}(n) = \binom{k-1+v_p(n)}{k-1}. \ \ \ \ \ (4)$

We then have the following exact local asymptotics:

Proposition 2 (Local correlations) Let ${{\bf n}}$ be a profinite integer chosen uniformly at random, let ${h}$ be a profinite integer, and let ${k,l \geq 2}$. Then

$\displaystyle {\bf E} \tau_{k,p}({\bf n}) = (\frac{p}{p-1})^{k-1} \ \ \ \ \ (5)$

and

$\displaystyle {\bf E} \tau_{k,p}({\bf n}) \tau_{l,p}({\bf n}+h) = (\frac{p}{p-1})^{k+l-2} {\mathfrak S}_{k,l,p}(h). \ \ \ \ \ (6)$

(For profinite integers it is possible that ${v_p({\bf n})}$ and hence ${\tau_{k,p}({\bf n})}$ are infinite, but this is a probability zero event and so can be ignored.)

Conjecture 1 can then be heuristically justified from the local calculations (2) by various pseudorandomness heuristics, as discussed in the previous post.

I’ll give a short proof of the above proposition below, basically using the recursive methods of the previous post. This short proof actually took be quite a while to find; I spent several hours and a fair bit of scratch paper working out the cases ${k,l = 2,3}$ laboriously by hand (with some assistance and cross-checking from Maple). Here is an unorganised sample of some of this scratch, just to show how the sausage is actually made:

It was only after expending all this effort that I realised that it would be much more efficient to compute the correlations for all values of ${k,l}$ simultaneously by using generating functions. After performing this computation, it then became apparent that there would be a direct combinatorial proof of (6) that was shorter than even the generating function proof. (I will not supply the full generating function calculations here, but will at least show them for the easier correlation (5).)

I am confident that Conjecture 1 is consistent with the explicit asymptotic in the Conrey-Gonek conjecture, but have not yet rigorously established that the leading order term in the latter is indeed identical to the expression provided above.

We now prove (5). Here we will use the generating function method. From the binomial formula we have the formal expansion

$\displaystyle \sum_{k=1}^\infty \tau_{k,p}(n) t^{k-1} = (1-t)^{-v_p(n)-1}$

for any ${n}$. By the geometric series formula, it thus suffices to show that

$\displaystyle {\bf E} (1-t)^{-v_p({\bf n})-1} = (1 - \frac{p}{p-1} t)^{-1}$

in the ring of formal power series in ${t}$.

With probability ${\frac{p-1}{p}}$, ${{\bf n}}$ is coprime to ${p}$ and thus ${(1-t)^{-v_p({\bf n})-1} = \frac{1}{1-t}}$. In the remaining probability ${\frac{1}{p}}$ event, ${{\bf n} = p {\bf n}'}$ for some ${{\bf n}'}$ with the same distribution as the original random variable ${{\bf n}}$, and ${(1-t)^{-v_p({\bf n})-1} = \frac{1}{1-t} (1-t)^{-v_p({\bf n}')-1}}$. This leads to the identity

$\displaystyle {\bf E} (1-t)^{-v_p({\bf n})-1} = \frac{p-1}{p} \frac{1}{1-t} + \frac{1}{p} \frac{1}{1-t} {\bf E} (1-t)^{-v_p({\bf n})-1}$

and the claim then follows from a brief amount of high-school algebra.

Now we prove (6). By (3), our task is to show that

$\displaystyle {\bf E} \tau_{k,p}({\bf n}) \tau_{l,p}({\bf n}+h) = \sum_{k'=2}^k \sum_{l'=2}^l \sum_{j \geq 0: p^j|h} \frac{1}{p^j} \binom{k-k'+j-1}{k-k'} \binom{l-l'+j-1}{l-l'} \alpha_{k',l',p}. \ \ \ \ \ (7)$

The above generating function method will establish this (and, as I said above, this was how I originally discovered the formula), but we give here a combinatorial proof. We first deal with the case when ${h}$ is not divisible by ${p}$. In this case, at least one of ${\tau_{k,p}({\bf n})}$ and ${\tau_{l,p}({\bf n}+h)}$ must equal ${1}$; we can phrase this fact algebraically as the identity

$\displaystyle (\tau_{k,p}({\bf n})-1) (\tau_{l,p}({\bf n}+h)-1) = 0$

or equivalently

$\displaystyle \tau_{k,p}({\bf n}) \tau_{l,p}({\bf n}+h) = \tau_{k,p}({\bf n}) + \tau_{l,p}({\bf n}+h) - 1. \ \ \ \ \ (8)$

By linearity (and translation invariance) of expectation and (5) we thus have

$\displaystyle {\bf E} \tau_{k,p}({\bf n}) \tau_{l,p}({\bf n}+h) = (\frac{p}{p-1})^{k-1} + (\frac{p}{p-1})^{l-1} - 1$

$\displaystyle = \alpha_{k,l,p}$

which gives (7) in this case.

Now suppose that ${h = ph'}$ for some (profinite) integer ${h'}$. With probability ${\frac{p-1}{p}}$, ${{\bf n}}$ is coprime to ${p}$, so that ${\tau_{k,p}({\bf n}) = \tau_{l,p}({\bf n}+h) = 1}$; in particular (8) holds here. In the remaining probability ${\frac{1}{p}}$ event, we can write ${{\bf n} = p {\bf n}'}$. From (4) and Pascal triangle identities, we have

$\displaystyle \tau_{k,p}({\bf n}) = 1 + \sum_{k'=2}^k \tau_{k',p}({\bf n}')$

and similarly

$\displaystyle \tau_{l,p}({\bf n}+h) = 1 + \sum_{l'=2}^l \tau_{l',p}({\bf n}'+h')$

and hence on subtracting ${1}$ from both equations, multiplying, and rearranging we have a variant of (8):

$\displaystyle \tau_{k,p}({\bf n}) \tau_{l,p}({\bf n}+h) = \tau_{k,p}({\bf n}) + \tau_{l,p}({\bf n}+h) - 1$

$\displaystyle + \sum_{k'=2}^k \sum_{l'=2}^l \tau_{k',p}({\bf n}') \tau_{l',p}({\bf n}'+h').$

Putting this together, we see that

$\displaystyle \mathop{\bf E} \tau_{k,p}({\bf n}) \tau_{l,p}({\bf n}+h) = \mathop{\bf E} \tau_{k,p}({\bf n}) + \tau_{l,p}({\bf n}+h) - 1$

$\displaystyle + \frac{1}{p} \sum_{k'=2}^k \sum_{l'=2}^l \mathop{\bf E} \tau_{k',p}({\bf n}') \tau_{l',p}({\bf n}'+h')$

and hence by (5) as before

$\displaystyle \mathop{\bf E} \tau_{k,p}({\bf n}) \tau_{l,p}({\bf n}+h) = \alpha_{k,l,p} + \frac{1}{p} \sum_{k'=2}^k \sum_{l'=2}^l \mathop{\bf E} \tau_{k',p}({\bf n}) \tau_{l',p}({\bf n}+h').$

The claim is now easily verified from an induction on ${v_p(h)}$, as well as many applications of Pascal triangle identities.