You are currently browsing the tag archive for the ‘divisor function’ tag.

Kaisa Matomaki, Maksym Radziwill, and I have uploaded to the arXiv our paper “Correlations of the von Mangoldt and higher divisor functions II. Divisor correlations in short ranges“. This is a sequel of sorts to our previous paper on divisor correlations, though the proof techniques in this paper are rather different. As with the previous paper, our interest is in correlations such as

$\displaystyle \sum_{n \leq X} d_k(n) d_l(n+h) \ \ \ \ \ (1)$

for medium-sized ${h}$ and large ${X}$, where ${k \geq l \geq 1}$ are natural numbers and ${d_k(n) = \sum_{n = m_1 \dots m_k} 1}$ is the ${k^{th}}$ divisor function (actually our methods can also treat a generalisation in which ${k}$ is non-integer, but for simplicity let us stick with the integer case for this discussion). Our methods also allow for one of the divisor function factors to be replaced with a von Mangoldt function, but (in contrast to the previous paper) we cannot treat the case when both factors are von Mangoldt.

As discussed in this previous post, one heuristically expects an asymptotic of the form

$\displaystyle \sum_{n \leq X} d_k(n) d_l(n+h) = P_{k,l,h}( \log X ) X + O( X^{1/2+\varepsilon})$

for any fixed ${\varepsilon>0}$, where ${P_{k,l,h}}$ is a certain explicit (but rather complicated) polynomial of degree ${k+l-1}$. Such asymptotics are known when ${l \leq 2}$, but remain open for ${k \geq l \geq 3}$. In the previous paper, we were able to obtain a weaker bound of the form

$\displaystyle \sum_{n \leq X} d_k(n) d_l(n+h) = P_{k,l,h}( \log X ) X + O_A( X \log^{-A} X)$

for ${1-O_A(\log^{-A} X)}$ of the shifts ${-H \leq h \leq H}$, whenever the shift range ${H}$ lies between ${X^{8/33+\varepsilon}}$ and ${X^{1-\varepsilon}}$. But the methods become increasingly hard to use as ${H}$ gets smaller. In this paper, we use a rather different method to obtain the even weaker bound

$\displaystyle \sum_{n \leq X} d_k(n) d_l(n+h) = (1+o(1)) P_{k,l,h}( \log X ) X$

for ${1-o(1)}$ of the shifts ${-H \leq h \leq H}$, where ${H}$ can now be as short as ${H = \log^{10^4 k \log k} X}$. The constant ${10^4}$ can be improved, but there are serious obstacles to using our method to go below ${\log^{k \log k} X}$ (as the exceptionally large values of ${d_k}$ then begin to dominate). This can be viewed as an analogue to our previous paper on correlations of bounded multiplicative functions on average, in which the functions ${d_k,d_l}$ are now unbounded, and indeed our proof strategy is based in large part on that paper (but with many significant new technical complications).

We now discuss some of the ingredients of the proof. Unsurprisingly, the first step is the circle method, expressing (1) in terms of exponential sums such as

$\displaystyle S(\alpha) := \sum_{n \leq X} d_k(n) e(\alpha).$

Actually, it is convenient to first prune ${d_k}$ slightly by zeroing out this function on “atypical” numbers ${n}$ that have an unusually small or large number of factors in a certain sense, but let us ignore this technicality for this discussion. The contribution of ${S(\alpha)}$ for “major arc” ${\alpha}$ can be treated by standard techniques (and is the source of the main term ${P_{k,l,h}(\log X) X}$; the main difficulty comes from treating the contribution of “minor arc” ${\alpha}$.

In our previous paper on bounded multiplicative functions, we used Plancherel’s theorem to estimate the global ${L^2}$ norm ${\int_{{\bf R}/{\bf Z}} |S(\alpha)|^2\ d\alpha}$, and then also used the Katai-Bourgain-Sarnak-Ziegler orthogonality criterion to control local ${L^2}$ norms ${\int_I |S(\alpha)|^2\ d\alpha}$, where ${I}$ was a minor arc interval of length about ${1/H}$, and these two estimates together were sufficient to get a good bound on correlations by an application of Hölder’s inequality. For ${d_k}$, it is more convenient to use Dirichlet series methods (and Ramaré-type factorisations of such Dirichlet series) to control local ${L^2}$ norms on minor arcs, in the spirit of the proof of the Matomaki-Radziwill theorem; a key point is to develop “log-free” mean value theorems for Dirichlet series associated to functions such as ${d_k}$, so as not to wipe out the (rather small) savings one will get over the trivial bound from this method. On the other hand, the global ${L^2}$ bound will definitely be unusable, because the ${\ell^2}$ sum ${\sum_{n \leq X} d_k(n)^2}$ has too many unwanted factors of ${\log X}$. Fortunately, we can substitute this global ${L^2}$ bound with a “large values” bound that controls expressions such as

$\displaystyle \sum_{i=1}^J \int_{I_i} |S(\alpha)|^2\ d\alpha$

for a moderate number of disjoint intervals ${I_1,\dots,I_J}$, with a bound that is slightly better (for ${J}$ a medium-sized power of ${\log X}$) than what one would have obtained by bounding each integral ${\int_{I_i} |S(\alpha)|^2\ d\alpha}$ separately. (One needs to save more than ${J^{1/2}}$ for the argument to work; we end up saving a factor of about ${J^{3/4}}$.) This large values estimate is probably the most novel contribution of the paper. After taking the Fourier transform, matters basically reduce to getting a good estimate for

$\displaystyle \sum_{i=1}^J (\int_X^{2X} |\sum_{x \leq n \leq x+H} d_k(n) e(\alpha_i n)|^2\ dx)^{1/2},$

where ${\alpha_i}$ is the midpoint of ${I_i}$; thus we need some upper bound on the large local Fourier coefficients of ${d_k}$. These coefficients are difficult to calculate directly, but, in the spirit of a paper of Ben Green and myself, we can try to replace ${d_k}$ by a more tractable and “pseudorandom” majorant ${\tilde d_k}$ for which the local Fourier coefficients are computable (on average). After a standard duality argument, one ends up having to control expressions such as

$\displaystyle |\sum_{x \leq n \leq x+H} \tilde d_k(n) e((\alpha_i -\alpha_{i'}) n)|$

after various averaging in the ${x, i,i'}$ parameters. These local Fourier coefficients of ${\tilde d_k}$ turn out to be small on average unless ${\alpha_i -\alpha_{i'}}$ is “major arc”. One then is left with a mostly combinatorial problem of trying to bound how often this major arc scenario occurs. This is very close to a computation in the previously mentioned paper of Ben and myself; there is a technical wrinkle in that the ${\alpha_i}$ are not as well separated as they were in my paper with Ben, but it turns out that one can modify the arguments in that paper to still obtain a satisfactory estimate in this case (after first grouping nearby frequencies ${\alpha_i}$ together, and modifying the duality argument accordingly).

This is a postscript to the previous blog post which was concerned with obtaining heuristic asymptotic predictions for the correlation

$\displaystyle \sum_{n \leq x} \tau(n) \tau(n+h), \ \ \ \ \ (1)$

for the divisor function ${\tau(n) := \sum_{d|n} 1}$, in particular recovering the calculation of Ingham that obtained the asymptotic

$\displaystyle \sum_{n \leq x} \tau(n) \tau(n+h) \sim \frac{6}{\pi^2} \sigma_{-1}(h) x \log^2 x \ \ \ \ \ (2)$

when ${h}$ was fixed and non-zero and ${x}$ went to infinity. It is natural to consider the more general correlations

$\displaystyle \sum_{n \leq x} \tau_k(n) \tau_l(n+h)$

for fixed ${k,l \geq 1}$ and non-zero ${h}$, where

$\displaystyle \tau_k(n) := \sum_{d_1 \dots d_k = n} 1$

is the order ${k}$ divisor function. The sum (1) then corresponds to the case ${k=l=2}$. For ${l=1}$, ${\tau_1(n) = 1}$, and a routine application of the Dirichlet hyperbola method (or Perron’s formula) gives the asymptotic

$\displaystyle \sum_{n \leq x} \tau_k(n) \sim \frac{\log^{k-1} x}{(k-1)!} x,$

or more accurately

$\displaystyle \sum_{n \leq x} \tau_k(n) \sim P_k(\log x) x$

where ${P_k(t)}$ is a certain explicit polynomial of degree ${k-1}$ with leading coefficient ${\frac{1}{(k-1)!}}$; see e.g. Exercise 31 of this previous post for a discussion of the ${k=3}$ case (which is already typical). Similarly if ${k=1}$. For more general ${k,l \geq 1}$, there is a conjecture of Conrey and Gonek which predicts that

$\displaystyle \sum_{n \leq x} \tau_k(n) \tau_l(n+h) \sim P_{k,l,h}(\log x) x$

for some polynomial ${P_{k,l,h}(t)}$ of degree ${k+l-2}$ which is explicit but whose form is rather complicated (one has to compute residues of a various complicated products of zeta functions and local factors). This conjecture has been verified when ${k \leq 2}$ or ${l \leq 2}$, by the work of Linnik, Motohashi, Fouvry-Tenenbaum, and others, but all the remaining cases when ${k,l \geq 3}$ are currently open.

In principle, the calculations of the previous post should recover the predictions of Conrey and Gonek. In this post I would like to record this for the top order term:

Conjecture 1 If ${k,l \geq 2}$ and ${h \neq 0}$ are fixed, then

$\displaystyle \sum_{n \leq x} \tau_k(n) \tau_l(n+h) \sim \frac{\log^{k-1} x}{(k-1)!} \frac{\log^{l-1} x}{(l-1)!} x \prod_p {\mathfrak S}_{k,l,p}(h)$

as ${x \rightarrow \infty}$, where the product is over all primes ${p}$, and the local factors ${{\mathfrak S}_{k,l,p}(h)}$ are given by the formula

$\displaystyle {\mathfrak S}_{k,l,p}(h) := (\frac{p-1}{p})^{k+l-2} \sum_{j \geq 0: p^j|h} \frac{1}{p^j} P_{k,l,p}(j) \ \ \ \ \ (3)$

where ${P_{k,l,p}}$ is the degree ${k+l-4}$ polynomial

$\displaystyle P_{k,l,p}(j) := \sum_{k'=2}^k \sum_{l'=2}^l \binom{k-k'+j-1}{k-k'} \binom{l-l'+j-1}{l-l'} \alpha_{k',l',p}$

where

$\displaystyle \alpha_{k',l',p} := (\frac{p}{p-1})^{k'-1} + (\frac{p}{p-1})^{l'-1} - 1$

and one adopts the conventions that ${\binom{-1}{0} = 1}$ and ${\binom{m-1}{m} = 0}$ for ${m \geq 1}$.

For instance, if ${k=l=2}$ then

$\displaystyle P_{2,2,p}(h) = \frac{p}{p-1} + \frac{p}{p-1} - 1 = \frac{p+1}{p-1}$

and hence

$\displaystyle {\mathfrak S}_{2,2,p}(h) = (1 - \frac{1}{p^2}) \sum_{j \geq 0: p^j|h} \frac{1}{p^j}$

and the above conjecture recovers the Ingham formula (2). For ${k=2, l=3}$, we have

$\displaystyle P_{2,3,p}(h) =$

$\displaystyle (\frac{p}{p-1} + (\frac{p}{p-1})^2 - 1) + (\frac{p}{p-1} + \frac{p}{p-1} - 1) j$

$\displaystyle = \frac{p^2+p-1}{(p-1)^2} + \frac{p+1}{p-1} j$

and so we predict

$\displaystyle \sum_{n \leq x} \tau(n) \tau_3(n+h) \sim \frac{x \log^3 x}{2} \prod_p {\mathfrak S}_{2,3,p}(h)$

where

$\displaystyle {\mathfrak S}_{2,3,p}(h) = \sum_{j \geq 0: p^j|h} \frac{\frac{p^3 - 2p + 1}{p^3} + \frac{(p+1)(p-1)^2}{p^3} j}{p^j}.$

Similarly, if ${k=l=3}$ we have

$\displaystyle P_{3,3,p}(h) = ((\frac{p}{p-1})^2 + (\frac{p}{p-1})^2 - 1) + 2 (\frac{p}{p-1} + (\frac{p}{p-1})^2 - 1) j$

$\displaystyle + (\frac{p}{p-1} + \frac{p}{p-1} - 1) j^2$

$\displaystyle = \frac{p^2+2p-1}{(p-1)^2} + 2 \frac{p^2+p-1}{(p-1)^2} j + \frac{p+1}{p-1} j^2$

and so we predict

$\displaystyle \sum_{n \leq x} \tau_3(n) \tau_3(n+h) \sim \frac{x \log^4 x}{4} \prod_p {\mathfrak S}_{3,3,p}(h)$

where

$\displaystyle {\mathfrak S}_{3,3,p}(h) = \sum_{j \geq 0: p^j|h} \frac{\frac{p^4 - 4p^2 + 4p - 1}{p^4} + 2 \frac{(p^2+p-1)(p-1)^2}{p^4} j + \frac{(p+1)(p-1)^3}{p^4} j^2}{p^j}.$

and so forth.

As in the previous blog, the idea is to factorise

$\displaystyle \tau_k(n) = \prod_p \tau_{k,p}(n)$

where the local factors ${\tau_{k,p}(n)}$ are given by

$\displaystyle \tau_{k,p}(n) := \sum_{j_1,\dots,j_k \geq 0: p^{j_1+\dots+j_k} || n} 1$

(where ${p^j || n}$ means that ${p}$ divides ${n}$ precisely ${j}$ times), or in terms of the valuation ${v_p(n)}$ of ${n}$ at ${p}$,

$\displaystyle \tau_{k,p}(n) = \binom{k-1+v_p(n)}{k-1}. \ \ \ \ \ (4)$

We then have the following exact local asymptotics:

Proposition 2 (Local correlations) Let ${{\bf n}}$ be a profinite integer chosen uniformly at random, let ${h}$ be a profinite integer, and let ${k,l \geq 2}$. Then

$\displaystyle {\bf E} \tau_{k,p}({\bf n}) = (\frac{p}{p-1})^{k-1} \ \ \ \ \ (5)$

and

$\displaystyle {\bf E} \tau_{k,p}({\bf n}) \tau_{l,p}({\bf n}+h) = (\frac{p}{p-1})^{k+l-2} {\mathfrak S}_{k,l,p}(h). \ \ \ \ \ (6)$

(For profinite integers it is possible that ${v_p({\bf n})}$ and hence ${\tau_{k,p}({\bf n})}$ are infinite, but this is a probability zero event and so can be ignored.)

Conjecture 1 can then be heuristically justified from the local calculations (2) by various pseudorandomness heuristics, as discussed in the previous post.

I’ll give a short proof of the above proposition below, basically using the recursive methods of the previous post. This short proof actually took be quite a while to find; I spent several hours and a fair bit of scratch paper working out the cases ${k,l = 2,3}$ laboriously by hand (with some assistance and cross-checking from Maple). Here is an unorganised sample of some of this scratch, just to show how the sausage is actually made:

It was only after expending all this effort that I realised that it would be much more efficient to compute the correlations for all values of ${k,l}$ simultaneously by using generating functions. After performing this computation, it then became apparent that there would be a direct combinatorial proof of (6) that was shorter than even the generating function proof. (I will not supply the full generating function calculations here, but will at least show them for the easier correlation (5).)

I am confident that Conjecture 1 is consistent with the explicit asymptotic in the Conrey-Gonek conjecture, but have not yet rigorously established that the leading order term in the latter is indeed identical to the expression provided above.

Let ${\tau(n) := \sum_{d|n} 1}$ be the divisor function. A classical application of the Dirichlet hyperbola method gives the asymptotic

$\displaystyle \sum_{n \leq x} \tau(n) \sim x \log x$

where ${X \sim Y}$ denotes the estimate ${X = (1+o(1))Y}$ as ${x \rightarrow \infty}$. Much better error estimates are possible here, but we will not focus on the lower order terms in this discussion. For somewhat idiosyncratic reasons I will interpret this estimate (and the other analytic number theory estimates discussed here) through the probabilistic lens. Namely, if ${{\bf n} = {\bf n}_x}$ is a random number selected uniformly between ${1}$ and ${x}$, then the above estimate can be written as

$\displaystyle {\bf E} \tau( {\bf n} ) \sim \log x, \ \ \ \ \ (1)$

that is to say the random variable ${\tau({\bf n})}$ has mean approximately ${\log x}$. (But, somewhat paradoxically, this is not the median or mode behaviour of this random variable, which instead concentrates near ${\log^{\log 2} x}$, basically thanks to the Hardy-Ramanujan theorem.)

Now we turn to the pair correlations ${\sum_{n \leq x} \tau(n) \tau(n+h)}$ for a fixed positive integer ${h}$. There is a classical computation of Ingham that shows that

$\displaystyle \sum_{n \leq x} \tau(n) \tau(n+h) \sim \frac{6}{\pi^2} \sigma_{-1}(h) x \log^2 x, \ \ \ \ \ (2)$

where

$\displaystyle \sigma_{-1}(h) := \sum_{d|h} \frac{1}{d}.$

The error term in (2) has been refined by many subsequent authors, as has the uniformity of the estimates in the ${h}$ aspect, as these topics are related to other questions in analytic number theory, such as fourth moment estimates for the Riemann zeta function; but we will not consider these more subtle features of the estimate here. However, we will look at the next term in the asymptotic expansion for (2) below the fold.

Using our probabilistic lens, the estimate (2) can be written as

$\displaystyle {\bf E} \tau( {\bf n} ) \tau( {\bf n} + h ) \sim \frac{6}{\pi^2} \sigma_{-1}(h) \log^2 x. \ \ \ \ \ (3)$

From (1) (and the asymptotic negligibility of the shift by ${h}$) we see that the random variables ${\tau({\bf n})}$ and ${\tau({\bf n}+h)}$ both have a mean of ${\sim \log x}$, so the additional factor of ${\frac{6}{\pi^2} \sigma_{-1}(h)}$ represents some arithmetic coupling between the two random variables.

Ingham’s formula can be established in a number of ways. Firstly, one can expand out ${\tau(n) = \sum_{d|n} 1}$ and use the hyperbola method (splitting into the cases ${d \leq \sqrt{x}}$ and ${n/d \leq \sqrt{x}}$ and removing the overlap). If one does so, one soon arrives at the task of having to estimate sums of the form

$\displaystyle \sum_{n \leq x: d|n} \tau(n+h)$

for various ${d \leq \sqrt{x}}$. For ${d}$ much less than ${\sqrt{x}}$ this can be achieved using a further application of the hyperbola method, but for ${d}$ comparable to ${\sqrt{x}}$ things get a bit more complicated, necessitating the use of non-trivial estimates on Kloosterman sums in order to obtain satisfactory control on error terms. A more modern approach proceeds using automorphic form methods, as discussed in this previous post. A third approach, which unfortunately is only heuristic at the current level of technology, is to apply the Hardy-Littlewood circle method (discussed in this previous post) to express (2) in terms of exponential sums ${\sum_{n \leq x} \tau(n) e(\alpha n)}$ for various frequencies ${\alpha}$. The contribution of “major arc” ${\alpha}$ can be computed after a moderately lengthy calculation which yields the right-hand side of (2) (as well as the correct lower order terms that are currently being suppressed), but there does not appear to be an easy way to show directly that the “minor arc” contributions are of lower order, although the methods discussed previously do indirectly show that this is ultimately the case.

Each of the methods outlined above requires a fair amount of calculation, and it is not obvious while performing them that the factor ${\frac{6}{\pi^2} \sigma_{-1}(h)}$ will emerge at the end. One can at least explain the ${\frac{6}{\pi^2}}$ as a normalisation constant needed to balance the ${\sigma_{-1}(h)}$ factor (at a heuristic level, at least). To see this through our probabilistic lens, introduce an independent copy ${{\bf n}'}$ of ${{\bf n}}$, then

$\displaystyle {\bf E} \tau( {\bf n} ) \tau( {\bf n}' ) = ({\bf E} \tau ({\bf n}))^2 \sim \log^2 x; \ \ \ \ \ (4)$

using symmetry to order ${{\bf n}' > {\bf n}}$ (discarding the diagonal case ${{\bf n} = {\bf n}'}$) and making the change of variables ${{\bf n}' = {\bf n}+h}$, we see that (4) is heuristically consistent with (3) as long as the asymptotic mean of ${\frac{6}{\pi^2} \sigma_{-1}(h)}$ in ${h}$ is equal to ${1}$. (This argument is not rigorous because there was an implicit interchange of limits present, but still gives a good heuristic “sanity check” of Ingham’s formula.) Indeed, if ${{\bf E}_h}$ denotes the asymptotic mean in ${h}$, then we have (heuristically at least)

$\displaystyle {\bf E}_h \sigma_{-1}(h) = \sum_d {\bf E}_h \frac{1}{d} 1_{d|h}$

$\displaystyle = \sum_d \frac{1}{d^2}$

$\displaystyle = \frac{\pi^2}{6}$

and we obtain the desired consistency after multiplying by ${\frac{6}{\pi^2}}$.

This still however does not explain the presence of the ${\sigma_{-1}(h)}$ factor. Intuitively it is reasonable that if ${h}$ has many prime factors, and ${{\bf n}}$ has a lot of factors, then ${{\bf n}+h}$ will have slightly more factors than average, because any common factor to ${h}$ and ${{\bf n}}$ will automatically be acquired by ${{\bf n}+h}$. But how to quantify this effect?

One heuristic way to proceed is through analysis of local factors. Observe from the fundamental theorem of arithmetic that we can factor

$\displaystyle \tau(n) = \prod_p \tau_p(n)$

where the product is over all primes ${p}$, and ${\tau_p(n) := \sum_{p^j|n} 1}$ is the local version of ${\tau(n)}$ at ${p}$ (which in this case, is just one plus the ${p}$valuation ${v_p(n)}$ of ${n}$: ${\tau_p = 1 + v_p}$). Note that all but finitely many of the terms in this product will equal ${1}$, so the infinite product is well-defined. In a similar fashion, we can factor

$\displaystyle \sigma_{-1}(h) = \prod_p \sigma_{-1,p}(h)$

where

$\displaystyle \sigma_{-1,p}(h) := \sum_{p^j|h} \frac{1}{p^j}$

(or in terms of valuations, ${\sigma_{-1,p}(h) = (1 - p^{-v_p(h)-1})/(1-p^{-1})}$). Heuristically, the Chinese remainder theorem suggests that the various factors ${\tau_p({\bf n})}$ behave like independent random variables, and so the correlation between ${\tau({\bf n})}$ and ${\tau({\bf n}+h)}$ should approximately decouple into the product of correlations between the local factors ${\tau_p({\bf n})}$ and ${\tau_p({\bf n}+h)}$. And indeed we do have the following local version of Ingham’s asymptotics:

Proposition 1 (Local Ingham asymptotics) For fixed ${p}$ and integer ${h}$, we have

$\displaystyle {\bf E} \tau_p({\bf n}) \sim \frac{p}{p-1}$

and

$\displaystyle {\bf E} \tau_p({\bf n}) \tau_p({\bf n}+h) \sim (1-\frac{1}{p^2}) \sigma_{-1,p}(h) (\frac{p}{p-1})^2$

$\displaystyle = \frac{p+1}{p-1} \sigma_{-1,p}(h)$

From the Euler formula

$\displaystyle \prod_p (1-\frac{1}{p^2}) = \frac{1}{\zeta(2)} = \frac{6}{\pi^2}$

we see that

$\displaystyle \frac{6}{\pi^2} \sigma_{-1}(h) = \prod_p (1-\frac{1}{p^2}) \sigma_{-1,p}(h)$

and so one can “explain” the arithmetic factor ${\frac{6}{\pi^2} \sigma_{-1}(h)}$ in Ingham’s asymptotic as the product of the arithmetic factors ${(1-\frac{1}{p^2}) \sigma_{-1,p}(h)}$ in the (much easier) local Ingham asymptotics. Unfortunately we have the usual “local-global” problem in that we do not know how to rigorously derive the global asymptotic from the local ones; this problem is essentially the same issue as the problem of controlling the minor arc contributions in the circle method, but phrased in “physical space” language rather than “frequency space”.

Remark 2 The relation between the local means ${\sim \frac{p}{p-1}}$ and the global mean ${\sim \log^2 x}$ can also be seen heuristically through the application

$\displaystyle \prod_{p \leq x^{1/e^\gamma}} \frac{p}{p-1} \sim \log x$

of Mertens’ theorem, where ${1/e^\gamma}$ is Pólya’s magic exponent, which serves as a useful heuristic limiting threshold in situations where the product of local factors is divergent.

Let us now prove this proposition. One could brute-force the computations by observing that for any fixed ${j}$, the valuation ${v_p({\bf n})}$ is equal to ${j}$ with probability ${\sim \frac{p-1}{p} \frac{1}{p^j}}$, and with a little more effort one can also compute the joint distribution of ${v_p({\bf n})}$ and ${v_p({\bf n}+h)}$, at which point the proposition reduces to the calculation of various variants of the geometric series. I however find it cleaner to proceed in a more recursive fashion (similar to how one can prove the geometric series formula by induction); this will also make visible the vague intuition mentioned previously about how common factors of ${{\bf n}}$ and ${h}$ force ${{\bf n}+h}$ to have a factor also.

It is first convenient to get rid of error terms by observing that in the limit ${x \rightarrow \infty}$, the random variable ${{\bf n} = {\bf n}_x}$ converges vaguely to a uniform random variable ${{\bf n}_\infty}$ on the profinite integers ${\hat {\bf Z}}$, or more precisely that the pair ${(v_p({\bf n}_x), v_p({\bf n}_x+h))}$ converges vaguely to ${(v_p({\bf n}_\infty), v_p({\bf n}_\infty+h))}$. Because of this (and because of the easily verified uniform integrability properties of ${\tau_p({\bf n})}$ and their powers), it suffices to establish the exact formulae

$\displaystyle {\bf E} \tau_p({\bf n}_\infty) = \frac{p}{p-1} \ \ \ \ \ (5)$

and

$\displaystyle {\bf E} \tau_p({\bf n}_\infty) \tau_p({\bf n}_\infty+h) = (1-\frac{1}{p^2}) \sigma_{-1,p}(h) (\frac{p}{p-1})^2 = \frac{p+1}{p-1} \sigma_{-1,p}(h) \ \ \ \ \ (6)$

in the profinite setting (this setting will make it easier to set up the recursion).

We begin with (5). Observe that ${{\bf n}_\infty}$ is coprime to ${p}$ with probability ${\frac{p-1}{p}}$, in which case ${\tau_p({\bf n}_\infty)}$ is equal to ${1}$. Conditioning to the complementary probability ${\frac{1}{p}}$ event that ${{\bf n}_\infty}$ is divisible by ${p}$, we can factor ${{\bf n}_\infty = p {\bf n}'_\infty}$ where ${{\bf n}'_\infty}$ is also uniformly distributed over the profinite integers, in which event we have ${\tau_p( {\bf n}_\infty ) = 1 + \tau_p( {\bf n}'_\infty )}$. We arrive at the identity

$\displaystyle {\bf E} \tau_p({\bf n}_\infty) = \frac{p-1}{p} + \frac{1}{p} ( 1 + {\bf E} \tau_p( {\bf n}'_\infty ) ).$

As ${{\bf n}_\infty}$ and ${{\bf n}'_\infty}$ have the same distribution, the quantities ${{\bf E} \tau_p({\bf n}_\infty)}$ and ${{\bf E} \tau_p({\bf n}'_\infty)}$ are equal, and (5) follows by a brief amount of high-school algebra.

We use a similar method to treat (6). First treat the case when ${h}$ is coprime to ${p}$. Then we see that with probability ${\frac{p-2}{p}}$, ${{\bf n}_\infty}$ and ${{\bf n}_\infty+h}$ are simultaneously coprime to ${p}$, in which case ${\tau_p({\bf n}_\infty) = \tau_p({\bf n}_\infty+h) = 1}$. Furthermore, with probability ${\frac{1}{p}}$, ${{\bf n}_\infty}$ is divisible by ${p}$ and ${{\bf n}_\infty+h}$ is not; in which case we can write ${{\bf n} = p {\bf n}'}$ as before, with ${\tau_p({\bf n}_\infty) = 1 + \tau_p({\bf n}'_\infty)}$ and ${\tau_p({\bf n}_\infty+h)=1}$. Finally, in the remaining event with probability ${\frac{1}{p}}$, ${{\bf n}+h}$ is divisible by ${p}$ and ${{\bf n}}$ is not; we can then write ${{\bf n}_\infty+h = p {\bf n}'_\infty}$, so that ${\tau_p({\bf n}_\infty+h) = 1 + \tau_p({\bf n}'_\infty)}$ and ${\tau_p({\bf n}_\infty) = 1}$. Putting all this together, we obtain

$\displaystyle {\bf E} \tau_p({\bf n}_\infty) \tau_p({\bf n}_\infty+h) = \frac{p-2}{p} + 2 \frac{1}{p} (1 + {\bf E} \tau_p({\bf n}'_\infty))$

and the claim (6) in this case follows from (5) and a brief computation (noting that ${\sigma_{-1,p}(h)=1}$ in this case).

Now suppose that ${h}$ is divisible by ${p}$, thus ${h=ph'}$ for some integer ${h'}$. Then with probability ${\frac{p-1}{p}}$, ${{\bf n}_\infty}$ and ${{\bf n}_\infty+h}$ are simultaneously coprime to ${p}$, in which case ${\tau_p({\bf n}_\infty) = \tau_p({\bf n}_\infty+h) = 1}$. In the remaining ${\frac{1}{p}}$ event, we can write ${{\bf n}_\infty = p {\bf n}'_\infty}$, and then ${\tau_p({\bf n}_\infty) = 1 + \tau_p({\bf n}'_\infty)}$ and ${\tau_p({\bf n}_\infty+h) = 1 + \tau_p({\bf n}'_\infty+h')}$. Putting all this together we have

$\displaystyle {\bf E} \tau_p({\bf n}_\infty) \tau_p({\bf n}_\infty+h) = \frac{p-1}{p} + \frac{1}{p} {\bf E} (1+\tau_p({\bf n}'_\infty)(1+\tau_p({\bf n}'_\infty+h)$

which by (5) (and replacing ${{\bf n}'_\infty}$ by ${{\bf n}_\infty}$) leads to the recursive relation

$\displaystyle {\bf E} \tau_p({\bf n}_\infty) \tau_p({\bf n}_\infty+h) = \frac{p+1}{p-1} + \frac{1}{p} {\bf E} \tau_p({\bf n}_\infty) \tau_p({\bf n}_\infty+h)$

and (6) then follows by induction on the number of powers of ${p}$.

The estimate (2) of Ingham was refined by Estermann, who obtained the more accurate expansion

$\displaystyle \sum_{n \leq x} \tau(n) \tau(n+h) = \frac{6}{\pi^2} \sigma_{-1}(h) x \log^2 x + a_1(h) x \log x + a_2(h) x \ \ \ \ \ (7)$

$\displaystyle + O( x^{11/12+o(1)} )$

for certain complicated but explicit coefficients ${a_1(h), a_2(h)}$. For instance, ${a_1(h)}$ is given by the formula

$\displaystyle a_1(h) = (\frac{12}{\pi^2} (2\gamma-1) + 4 a') \sigma_{-1}(h) - \frac{24}{\pi^2} \sigma'_{-1}(h)$

where ${\gamma}$ is the Euler-Mascheroni constant,

$\displaystyle a' := - \sum_{r=1}^\infty \frac{\mu(r)}{r^2} \log r, \ \ \ \ \ (8)$

and

$\displaystyle \sigma'_{-1}(h) := \sum_{d|h} \frac{\log d}{d}.$

The formula for ${a_2(h)}$ is similar but even more complicated. The error term ${O( x^{11/12+o(1)})}$ was improved by Heath-Brown to ${O( x^{5/6+o(1)})}$; it is conjectured (for instance by Conrey and Gonek) that one in fact has square root cancellation ${O( x^{1/2+o(1)})}$ here, but this is well out of reach of current methods.

These lower order terms are traditionally computed either from a Dirichlet series approach (using Perron’s formula) or a circle method approach. It turns out that a refinement of the above heuristics can also predict these lower order terms, thus keeping the calculation purely in physical space as opposed to the “multiplicative frequency space” of the Dirichlet series approach, or the “additive frequency space” of the circle method, although the computations are arguably as messy as the latter computations for the purposes of working out the lower order terms. We illustrate this just for the ${a_1(h) x \log x}$ term below the fold.

In analytic number theory, an arithmetic function is simply a function ${f: {\bf N} \rightarrow {\bf C}}$ from the natural numbers ${{\bf N} = \{1,2,3,\dots\}}$ to the real or complex numbers. (One occasionally also considers arithmetic functions taking values in more general rings than ${{\bf R}}$ or ${{\bf C}}$, as in this previous blog post, but we will restrict attention here to the classical situation of real or complex arithmetic functions.) Experience has shown that a particularly tractable and relevant class of arithmetic functions for analytic number theory are the multiplicative functions, which are arithmetic functions ${f: {\bf N} \rightarrow {\bf C}}$ with the additional property that

$\displaystyle f(nm) = f(n) f(m) \ \ \ \ \ (1)$

whenever ${n,m \in{\bf N}}$ are coprime. (One also considers arithmetic functions, such as the logarithm function ${L(n) := \log n}$ or the von Mangoldt function, that are not genuinely multiplicative, but interact closely with multiplicative functions, and can be viewed as “derived” versions of multiplicative functions; see this previous post.) A typical example of a multiplicative function is the divisor function

$\displaystyle \tau(n) := \sum_{d|n} 1 \ \ \ \ \ (2)$

that counts the number of divisors of a natural number ${n}$. (The divisor function ${n \mapsto \tau(n)}$ is also denoted ${n \mapsto d(n)}$ in the literature.) The study of asymptotic behaviour of multiplicative functions (and their relatives) is known as multiplicative number theory, and is a basic cornerstone of modern analytic number theory.

There are various approaches to multiplicative number theory, each of which focuses on different asymptotic statistics of arithmetic functions ${f}$. In elementary multiplicative number theory, which is the focus of this set of notes, particular emphasis is given on the following two statistics of a given arithmetic function ${f: {\bf N} \rightarrow {\bf C}}$:

1. The summatory functions

$\displaystyle \sum_{n \leq x} f(n)$

of an arithmetic function ${f}$, as well as the associated natural density

$\displaystyle \lim_{x \rightarrow \infty} \frac{1}{x} \sum_{n \leq x} f(n)$

(if it exists).

2. The logarithmic sums

$\displaystyle \sum_{n\leq x} \frac{f(n)}{n}$

of an arithmetic function ${f}$, as well as the associated logarithmic density

$\displaystyle \lim_{x \rightarrow \infty} \frac{1}{\log x} \sum_{n \leq x} \frac{f(n)}{n}$

(if it exists).

Here, we are normalising the arithmetic function ${f}$ being studied to be of roughly unit size up to logarithms, obeying bounds such as ${f(n)=O(1)}$, ${f(n) = O(\log^{O(1)} n)}$, or at worst

$\displaystyle f(n) = O(n^{o(1)}). \ \ \ \ \ (3)$

A classical case of interest is when ${f}$ is an indicator function ${f=1_A}$ of some set ${A}$ of natural numbers, in which case we also refer to the natural or logarithmic density of ${f}$ as the natural or logarithmic density of ${A}$ respectively. However, in analytic number theory it is usually more convenient to replace such indicator functions with other related functions that have better multiplicative properties. For instance, the indicator function ${1_{\mathcal P}}$ of the primes is often replaced with the von Mangoldt function ${\Lambda}$.

Typically, the logarithmic sums are relatively easy to control, but the summatory functions require more effort in order to obtain satisfactory estimates; see Exercise 7 below.

If an arithmetic function ${f}$ is multiplicative (or closely related to a multiplicative function), then there is an important further statistic on an arithmetic function ${f}$ beyond the summatory function and the logarithmic sum, namely the Dirichlet series

$\displaystyle {\mathcal D}f(s) := \sum_{n=1}^\infty \frac{f(n)}{n^s} \ \ \ \ \ (4)$

for various real or complex numbers ${s}$. Under the hypothesis (3), this series is absolutely convergent for real numbers ${s>1}$, or more generally for complex numbers ${s}$ with ${\hbox{Re}(s)>1}$. As we will see below the fold, when ${f}$ is multiplicative then the Dirichlet series enjoys an important Euler product factorisation which has many consequences for analytic number theory.

In the elementary approach to multiplicative number theory presented in this set of notes, we consider Dirichlet series only for real numbers ${s>1}$ (and focusing particularly on the asymptotic behaviour as ${s \rightarrow 1^+}$); in later notes we will focus instead on the important complex-analytic approach to multiplicative number theory, in which the Dirichlet series (4) play a central role, and are defined not only for complex numbers with large real part, but are often extended analytically or meromorphically to the rest of the complex plane as well.

Remark 1 The elementary and complex-analytic approaches to multiplicative number theory are the two classical approaches to the subject. One could also consider a more “Fourier-analytic” approach, in which one studies convolution-type statistics such as

$\displaystyle \sum_n \frac{f(n)}{n} G( t - \log n ) \ \ \ \ \ (5)$

as ${t \rightarrow \infty}$ for various cutoff functions ${G: {\bf R} \rightarrow {\bf C}}$, such as smooth, compactly supported functions. See for instance this previous blog post for an instance of such an approach. Another related approach is the “pretentious” approach to multiplicative number theory currently being developed by Granville-Soundararajan and their collaborators. We will occasionally make reference to these more modern approaches in these notes, but will primarily focus on the classical approaches.

To reverse the process and derive control on summatory functions or logarithmic sums starting from control of Dirichlet series is trickier, and usually requires one to allow ${s}$ to be complex-valued rather than real-valued if one wants to obtain really accurate estimates; we will return to this point in subsequent notes. However, there is a cheap way to get upper bounds on such sums, known as Rankin’s trick, which we will discuss later in these notes.

The basic strategy of elementary multiplicative theory is to first gather useful estimates on the statistics of “smooth” or “non-oscillatory” functions, such as the constant function ${n \mapsto 1}$, the harmonic function ${n \mapsto \frac{1}{n}}$, or the logarithm function ${n \mapsto \log n}$; one also considers the statistics of periodic functions such as Dirichlet characters. These functions can be understood without any multiplicative number theory, using basic tools from real analysis such as the (quantitative version of the) integral test or summation by parts. Once one understands the statistics of these basic functions, one can then move on to statistics of more arithmetically interesting functions, such as the divisor function (2) or the von Mangoldt function ${\Lambda}$ that we will discuss below. A key tool to relate these functions to each other is that of Dirichlet convolution, which is an operation that interacts well with summatory functions, logarithmic sums, and particularly well with Dirichlet series.

This is only an introduction to elementary multiplicative number theory techniques. More in-depth treatments may be found in this text of Montgomery-Vaughan, or this text of Bateman-Diamond.

One of the basic problems in analytic number theory is to obtain bounds and asymptotics for sums of the form

$\displaystyle \sum_{n \leq x} f(n)$

in the limit ${x \rightarrow \infty}$, where ${n}$ ranges over natural numbers less than ${x}$, and ${f: {\bf N} \rightarrow {\bf C}}$ is some arithmetic function of number-theoretic interest. (It is also often convenient to replace this sharply truncated sum with a smoother sum such as ${\sum_n f(n) \psi(n/x)}$, but we will not discuss this technicality here.) For instance, the prime number theorem is equivalent to the assertion

$\displaystyle \sum_{n \leq x} \Lambda(n) = x + o(x)$

where ${\Lambda}$ is the von Mangoldt function, while the Riemann hypothesis is equivalent to the stronger assertion

$\displaystyle \sum_{n \leq x} \Lambda(n) = x + O(x^{1/2+o(1)}).$

It is thus of interest to develop techniques to estimate such sums ${\sum_{n \leq x} f(n)}$. Of course, the difficulty of this task depends on how “nice” the function ${f}$ is. The functions ${f}$ that come up in number theory lie on a broad spectrum of “niceness”, with some particularly nice functions being quite easy to sum, and some being insanely difficult.

At the easiest end of the spectrum are those functions ${f}$ that exhibit some sort of regularity or “smoothness”. Examples of smoothness include “Archimedean” smoothness, in which ${f(n)}$ is the restriction of some smooth function ${f: {\bf R} \rightarrow {\bf C}}$ from the reals to the natural numbers, and the derivatives of ${f}$ are well controlled. A typical example is

$\displaystyle \sum_{n \leq x} \log n.$

One can already get quite good bounds on this quantity by comparison with the integral ${\int_1^x \log t\ dt}$, namely

$\displaystyle \sum_{n \leq x} \log n = x \log x - x + O(\log x),$

with sharper bounds available by using tools such as the Euler-Maclaurin formula (see this blog post). Exponentiating such asymptotics, incidentally, leads to one of the standard proofs of Stirling’s formula (as discussed in this blog post).

One can also consider “non-Archimedean” notions of smoothness, such as periodicity relative to a small period ${q}$. Indeed, if ${f}$ is periodic with period ${q}$ (and is thus essentially a function on the cyclic group ${{\bf Z}/q{\bf Z}}$), then one has the easy bound

$\displaystyle \sum_{n \leq x} f(n) = \frac{x}{q} \sum_{n \in {\bf Z}/q{\bf Z}} f(n) + O( \sum_{n \in {\bf Z}/q{\bf Z}} |f(n)| ).$

In particular, we have the fundamental estimate

$\displaystyle \sum_{n \leq x: q|n} 1 = \frac{x}{q} + O(1). \ \ \ \ \ (1)$

This is a good estimate when ${q}$ is much smaller than ${x}$, but as ${q}$ approaches ${x}$ in magnitude, the error term ${O(1)}$ begins to overwhelm the main term ${\frac{n}{q}}$, and one needs much more delicate information on the fractional part of ${\frac{n}{q}}$ in order to obtain good estimates at this point.

One can also consider functions ${f}$ which combine “Archimedean” and “non-Archimedean” smoothness into an “adelic” smoothness. We will not define this term precisely here (though the concept of a Schwartz-Bruhat function is one way to capture this sort of concept), but a typical example might be

$\displaystyle \sum_{n \leq x} \chi(n) \log n$

where ${\chi}$ is periodic with some small period ${q}$. By using techniques such as summation by parts, one can estimate such sums using the techniques used to estimate sums of periodic functions or functions with (Archimedean) smoothness.

Another class of functions that is reasonably well controlled are the multiplicative functions, in which ${f(nm) = f(n) f(m)}$ whenever ${n,m}$ are coprime. Here, one can use the powerful techniques of multiplicative number theory, for instance by working with the Dirichlet series

$\displaystyle \sum_{n=1}^\infty \frac{f(n)}{n^s}$

which are clearly related to the partial sums ${\sum_{n \leq x} f(n)}$ (essentially via the Mellin transform, a cousin of the Fourier and Laplace transforms); for this post we ignore the (important) issue of how to make sense of this series when it is not absolutely convergent (but see this previous blog post for more discussion). A primary reason that this technique is effective is that the Dirichlet series of a multiplicative function factorises as an Euler product

$\displaystyle \sum_{n=1}^\infty \frac{f(n)}{n^s} = \prod_p (\sum_{j=0}^\infty \frac{f(p^j)}{p^{js}}).$

One also obtains similar types of representations for functions that are not quite multiplicative, but are closely related to multiplicative functions, such as the von Mangoldt function ${\Lambda}$ (whose Dirichlet series ${\sum_{n=1}^\infty \frac{\Lambda(n)}{n^s} = -\frac{\zeta'(s)}{\zeta(s)}}$ is not given by an Euler product, but instead by the logarithmic derivative of an Euler product).

Moving another notch along the spectrum between well-controlled and ill-controlled functions, one can consider functions ${f}$ that are divisor sums such as

$\displaystyle f(n) = \sum_{d \leq R; d|n} g(d) = \sum_{d \leq R} 1_{d|n} g(d)$

for some other arithmetic function ${g}$, and some level ${R}$. This is a linear combination of periodic functions ${1_{d|n} g(d)}$ and is thus technically periodic in ${n}$ (with period equal to the least common multiple of all the numbers from ${1}$ to ${R}$), but in practice this periodic is far too large to be useful (except for extremely small levels ${R}$, e.g. ${R = O(\log x)}$). Nevertheless, we can still control the sum ${\sum_{n \leq x} f(n)}$ simply by rearranging the summation:

$\displaystyle \sum_{n \leq x} f(n) = \sum_{d \leq R} g(d) \sum_{n \leq x: d|n} 1$

and thus by (1) one can bound this by the sum of a main term ${x \sum_{d \leq R} \frac{g(d)}{d}}$ and an error term ${O( \sum_{d \leq R} |g(d)| )}$. As long as the level ${R}$ is significantly less than ${x}$, one may expect the main term to dominate, and one can often estimate this term by a variety of techniques (for instance, if ${g}$ is multiplicative, then multiplicative number theory techniques are quite effective, as mentioned previously). Similarly for other slight variants of divisor sums, such as expressions of the form

$\displaystyle \sum_{d \leq R; d | n} g(d) \log \frac{n}{d}$

or expressions of the form

$\displaystyle \sum_{d \leq R} F_d(n)$

where each ${F_d}$ is periodic with period ${d}$.

One of the simplest examples of this comes when estimating the divisor function

$\displaystyle \tau(n) := \sum_{d|n} 1,$

which counts the number of divisors up to ${n}$. This is a multiplicative function, and is therefore most efficiently estimated using the techniques of multiplicative number theory; but for reasons that will become clearer later, let us “forget” the multiplicative structure and estimate the above sum by more elementary methods. By applying the preceding method, we see that

$\displaystyle \sum_{n \leq x} \tau(n) = \sum_{d \leq x} \sum_{n \leq x:d|n} 1$

$\displaystyle = \sum_{d \leq x} (\frac{x}{d} + O(1))$

$\displaystyle = x \log x + O(x). \ \ \ \ \ (2)$

Here, we are (barely) able to keep the error term smaller than the main term; this is right at the edge of the divisor sum method, because the level ${R}$ in this case is equal to ${x}$. Unfortunately, at this high choice of level, it is not always possible to always keep the error term under control like this. For instance, if one wishes to use the standard divisor sum representation

$\displaystyle \Lambda(n) = \sum_{d|n} \mu(d) \log \frac{n}{d},$

where ${\mu}$ is the Möbius function, to compute ${\sum_{n \leq x} \Lambda(n)}$, then one ends up looking at

$\displaystyle \sum_{n \leq x} \Lambda(n) = \sum_{d \leq x} \mu(d) \sum_{n \leq x:d|n} \log \frac{n}{d}$

$\displaystyle = \sum_{d \leq x} \mu(d) ( \frac{n}{d} \log \frac{n}{d} - \frac{n}{d} + O(\log \frac{n}{d}) )$

From Dirichlet series methods, it is not difficult to establish the identities

$\displaystyle \lim_{s\rightarrow 1^+} \sum_{n=1}^\infty \frac{\mu(n)}{n^s} = 0$

and

$\displaystyle \lim_{s \rightarrow 1^+} \sum_{n=1}^\infty \frac{\mu(n) \log n}{n^s} = -1.$

This suggests (but does not quite prove) that one has

$\displaystyle \sum_{n=1}^\infty \frac{\mu(n)}{n} = 0 \ \ \ \ \ (3)$

and

$\displaystyle \sum_{n=1}^\infty \frac{\mu(n)\log n}{n} = -1 \ \ \ \ \ (4)$

in the sense of conditionally convergent series. Assuming one can justify this (which, ultimately, requires one to exclude zeroes of the Riemann zeta function on the line ${\hbox{Re}(s)=1}$, as discussed in this previous post), one is eventually left with the estimate ${x + O(x)}$, which is useless as a lower bound (and recovers only the classical Chebyshev estimate ${\sum_{n \leq x} \Lambda(n) \ll x}$ as the upper bound). The inefficiency here when compared to the situation with the divisor function ${\tau}$ can be attributed to the signed nature of the Möbius function ${\mu(n)}$, which causes some cancellation in the divisor sum expansion that needs to be compensated for with improved estimates.

However, there are a number of tricks available to reduce the level of divisor sums. The simplest comes from exploiting the change of variables ${d \mapsto \frac{n}{d}}$, which can in principle reduce the level by a square root. For instance, when computing the divisor function ${\tau(n) = \sum_{d|n} 1}$, one can observe using this change of variables that every divisor of ${n}$ above ${\sqrt{n}}$ is paired with one below ${\sqrt{n}}$, and so we have

$\displaystyle \tau(n) = 2 \sum_{d \leq \sqrt{n}: d|n} 1 \ \ \ \ \ (5)$

except when ${n}$ is a perfect square, in which case one must subtract one from the right-hand side. Using this reduced-level divisor sum representation, one can obtain an improvement to (2), namely

$\displaystyle \sum_{n \leq x} \tau(n) = x \log x + (2\gamma-1) x + O(\sqrt{x}).$

This type of argument is also known as the Dirichlet hyperbola method. A variant of this argument can also deduce the prime number theorem from (3), (4) (and with some additional effort, one can even drop the use of (4)); this is discussed at this previous blog post.

Using this square root trick, one can now also control divisor sums such as

$\displaystyle \sum_{n \leq x} \tau(n^2+1).$

(Note that ${\tau(n^2+1)}$ has no multiplicativity properties in ${n}$, and so multiplicative number theory techniques cannot be directly applied here.) The level of the divisor sum here is initially of order ${x^2}$, which is too large to be useful; but using the square root trick, we can expand this expression as

$\displaystyle 2 \sum_{n \leq x} \sum_{d \leq n: d | n^2+1} 1$

which one can rewrite as

$\displaystyle 2 \sum_{d \leq x} \sum_{d \leq n \leq x: n^2+1 = 0 \hbox{ mod } d} 1.$

The constraint ${n^2+1=0 \hbox{ mod } d}$ is periodic in ${n}$ with period ${d}$, so we can write this as

$\displaystyle 2 \sum_{d \leq x} ( \frac{x}{d} \rho(d) + O(\rho(d)) )$

where ${\rho(d)}$ is the number of solutions in ${{\bf Z}/d{\bf Z}}$ to the equation ${n^2+1 = 0 \hbox{ mod } d}$, and so

$\displaystyle \sum_{n \leq x} \tau(n^2+1) = 2x \sum_{d \leq x} \frac{\rho(d)}{d} + O(\sum_{d \leq x} \rho(d)).$

The function ${\rho}$ is multiplicative, and can be easily computed at primes ${p}$ and prime powers ${p^j}$ using tools such as quadratic reciprocity and Hensel’s lemma. For instance, by Fermat’s two-square theorem, ${\rho(p)}$ is equal to ${2}$ for ${p=1 \hbox{ mod } 4}$ and ${0}$ for ${p=3 \hbox{ mod } 4}$. From this and standard multiplicative number theory methods (e.g. by obtaining asymptotics on the Dirichlet series ${\sum_d \frac{\rho(d)}{d^s}}$), one eventually obtains the asymptotic

$\displaystyle \sum_{d \leq x} \frac{\rho(d)}{d} = \frac{3}{2\pi} \log x + O(1)$

and also

$\displaystyle \sum_{d \leq x} \rho(d) = O(x)$

and thus

$\displaystyle \sum_{n \leq x} \tau(n^2+1) = \frac{3}{\pi} x \log x + O(x).$

Similar arguments give asymptotics for ${\tau}$ on other quadratic polynomials; see for instance this paper of Hooley and these papers by McKee. Note that the irreducibility of the polynomial will be important. If one considers instead a sum involving a reducible polynomial, such as ${\sum_{n \leq x} \tau(n^2-1)}$, then the analogous quantity ${\rho(n)}$ becomes significantly larger, leading to a larger growth rate (of order ${x \log^2 x}$ rather than ${x\log x}$) for the sum.

However, the square root trick is insufficient by itself to deal with higher order sums involving the divisor function, such as

$\displaystyle \sum_{n \leq x} \tau(n^3+1);$

the level here is initially of order ${x^3}$, and the square root trick only lowers this to about ${x^{3/2}}$, creating an error term that overwhelms the main term. And indeed, the asymptotic for such this sum has not yet been rigorously established (although if one heuristically drops error terms, one can arrive at a reasonable conjecture for this asymptotic), although some results are known if one averages over additional parameters (see e.g. this paper of Greaves, or this paper of Matthiesen).

Nevertheless, there is an ingenious argument of Erdös that allows one to obtain good upper and lower bounds for these sorts of sums, in particular establishing the asymptotic

$\displaystyle x \log x \ll \sum_{n \leq x} \tau(P(n)) \ll x \log x \ \ \ \ \ (6)$

for any fixed irreducible non-constant polynomial ${P}$ that maps ${{\bf N}}$ to ${{\bf N}}$ (with the implied constants depending of course on the choice of ${P}$). There is also the related moment bound

$\displaystyle \sum_{n \leq x} \tau^m(P(n)) \ll x \log^{O(1)} x \ \ \ \ \ (7)$

for any fixed ${P}$ (not necessarily irreducible) and any fixed ${m \geq 1}$, due to van der Corput; this bound is in fact used to dispose of some error terms in the proof of (6). These should be compared with what one can obtain from the divisor bound ${\tau(n) \ll n^{O(1/\log \log n)}}$ and the trivial bound ${\tau(n) \geq 1}$, giving the bounds

$\displaystyle x \ll \sum_{n \leq x} \tau^m(P(n)) \ll x^{1 + O(\frac{1}{\log \log x})}$

for any fixed ${m \geq 1}$.

The lower bound in (6) is easy, since one can simply lower the level in (5) to obtain the lower bound

$\displaystyle \tau(n) \geq \sum_{d \leq n^\theta: d|n} 1$

for any ${\theta>0}$, and the preceding methods then easily allow one to obtain the lower bound by taking ${\theta}$ small enough (more precisely, if ${P}$ has degree ${d}$, one should take ${\theta}$ equal to ${1/d}$ or less). The upper bounds in (6) and (7) are more difficult. Ideally, if we could obtain upper bounds of the form

$\displaystyle \tau(n) \ll \sum_{d \leq n^\theta: d|n} 1 \ \ \ \ \ (8)$

for any fixed ${\theta > 0}$, then the preceding methods would easily establish both results. Unfortunately, this bound can fail, as illustrated by the following example. Suppose that ${n}$ is the product of ${k}$ distinct primes ${p_1 \ldots p_k}$, each of which is close to ${n^{1/k}}$. Then ${n}$ has ${2^k}$ divisors, with ${\binom{n}{j}}$ of them close to ${n^{j/k}}$ for each ${0 \ldots j \leq k}$. One can think of (the logarithms of) these divisors as being distributed according to what is essentially a Bernoulli distribution, thus a randomly selected divisor of ${n}$ has magnitude about ${n^{j/k}}$, where ${j}$ is a random variable which has the same distribution as the number of heads in ${k}$ independently tossed fair coins. By the law of large numbers, ${j}$ should concentrate near ${k/2}$ when ${k}$ is large, which implies that the majority of the divisors of ${n}$ will be close to ${n^{1/2}}$. Sending ${k \rightarrow \infty}$, one can show that the bound (8) fails whenever ${\theta < 1/2}$.

This however can be fixed in a number of ways. First of all, even when ${\theta<1/2}$, one can show weaker substitutes for (8). For instance, for any fixed ${\theta > 0}$ and ${m \geq 1}$ one can show a bound of the form

$\displaystyle \tau(n)^m \ll \sum_{d \leq n^\theta: d|n} \tau(d)^C \ \ \ \ \ (9)$

for some ${C}$ depending only on ${m,\theta}$. This nice elementary inequality (first observed by Landreau) already gives a quite short proof of van der Corput’s bound (7).

For Erdös’s upper bound (6), though, one cannot afford to lose these additional factors of ${\tau(d)}$, and one must argue more carefully. Here, the key observation is that the counterexample discussed earlier – when the natural number ${n}$ is the product of a large number of fairly small primes – is quite atypical; most numbers have at least one large prime factor. For instance, the number of natural numbers less than ${x}$ that contain a prime factor between ${x^{1/2}}$ and ${x}$ is equal to

$\displaystyle \sum_{x^{1/2} \leq p \leq x} (\frac{x}{p} + O(1)),$

which, thanks to Mertens’ theorem

$\displaystyle \sum_{p \leq x} \frac{1}{p} = \log\log x + M+o(1)$

for some absolute constant ${M}$, is comparable to ${x}$. In a similar spirit, one can show by similarly elementary means that the number of natural numbers ${m}$ less than ${x}$ that are ${x^{1/m}}$-smooth, in the sense that all prime factors are at most ${x^{1/m}}$, is only about ${m^{-cm} x}$ or so. Because of this, one can hope that the bound (8), while not true in full generality, will still be true for most natural numbers ${n}$, with some slightly weaker substitute available (such as (7)) for the exceptional numbers ${n}$. This turns out to be the case by an elementary but careful argument.

The Erdös argument is quite robust; for instance, the more general inequality

$\displaystyle x \log^{2^m-1} x \ll \sum_{n \leq x} \tau(P(n))^m \ll x \log^{2^m-1} x$

for fixed irreducible ${P}$ and ${m \geq 1}$, which improves van der Corput’s inequality (8) was shown by Delmer using the same methods. (A slight error in the original paper of Erdös was also corrected in this latter paper.) In a forthcoming revision to my paper on the Erdös-Straus conjecture, Christian Elsholtz and I have also applied this method to obtain bounds such as

$\displaystyle \sum_{a \leq A} \sum_{b \leq B} \tau(a^2 b + 1) \ll AB \log(A+B),$

which turn out to be enough to obtain the right asymptotics for the number of solutions to the equation ${\frac{4}{p}= \frac{1}{x}+\frac{1}{y}+\frac{1}{z}}$.

Below the fold I will provide some more details of the arguments of Landreau and of Erdös.

Given a positive integer $n$, let $d(n)$ denote the number of divisors of n (including 1 and n), thus for instance d(6)=4, and more generally, if n has a prime factorisation

$n = p_1^{a_1} \ldots p_k^{a_k}$ (1)

then (by the fundamental theorem of arithmetic)

$d(n) = (a_1+1) \ldots (a_k+1)$. (2)

Clearly, $d(n) \leq n$.  The divisor bound asserts that, as $n$ gets large, one can improve this trivial bound to

$d(n) \leq C_\varepsilon n^\varepsilon$ (3)

for any $\varepsilon > 0$, where $C_\varepsilon$ depends only on $\varepsilon$; equivalently, in asymptotic notation one has $d(n) = n^{o(1)}$.  In fact one has a more precise bound

$\displaystyle d(n) \leq n^{O( 1/ \log \log n)} = \exp( O( \frac{\log n}{\log \log n} ) )$. (4)

The divisor bound is useful in many applications in number theory, harmonic analysis, and even PDE (on periodic domains); it asserts that for any large number n, only a “logarithmically small” set of numbers less than n will actually divide n exactly, even in the worst-case scenario when n is smooth.  (The average value of d(n) is much smaller, being about $\log n$ on the average, as can be seen easily from the double counting identity

$\sum_{n \leq N} d(n) = \# \{ (m,l) \in {\Bbb N} \times {\Bbb N}: ml \leq N \} = \sum_{m=1}^N \lfloor \frac{N}{m}\rfloor \sim N \log N$,

or from the heuristic that a randomly chosen number m less than n has a probability about 1/m of dividing n, and $\sum_{m.  However, (4) is the correct “worst case” bound, as I discuss below.)

The divisor bound is elementary to prove (and not particularly difficult), and I was asked about it recently, so I thought I would provide the proof here, as it serves as a case study in how to establish worst-case estimates in elementary multiplicative number theory.

[Update, Sep 24: some applications added.]