Given a positive integer $n$, let $d(n)$ denote the number of divisors of n (including 1 and n), thus for instance d(6)=4, and more generally, if n has a prime factorisation $n = p_1^{a_1} \ldots p_k^{a_k}$ (1)

then (by the fundamental theorem of arithmetic) $d(n) = (a_1+1) \ldots (a_k+1)$. (2)

Clearly, $d(n) \leq n$.  The divisor bound asserts that, as $n$ gets large, one can improve this trivial bound to $d(n) \leq C_\varepsilon n^\varepsilon$ (3)

for any $\varepsilon > 0$, where $C_\varepsilon$ depends only on $\varepsilon$; equivalently, in asymptotic notation one has $d(n) = n^{o(1)}$.  In fact one has a more precise bound $\displaystyle d(n) \leq n^{O( 1/ \log \log n)} = \exp( O( \frac{\log n}{\log \log n} ) )$. (4)

The divisor bound is useful in many applications in number theory, harmonic analysis, and even PDE (on periodic domains); it asserts that for any large number n, only a “logarithmically small” set of numbers less than n will actually divide n exactly, even in the worst-case scenario when n is smooth.  (The average value of d(n) is much smaller, being about $\log n$ on the average, as can be seen easily from the double counting identity $\sum_{n \leq N} d(n) = \# \{ (m,l) \in {\Bbb N} \times {\Bbb N}: ml \leq N \} = \sum_{m=1}^N \lfloor \frac{N}{m}\rfloor \sim N \log N$,

or from the heuristic that a randomly chosen number m less than n has a probability about 1/m of dividing n, and $\sum_{m.  However, (4) is the correct “worst case” bound, as I discuss below.)

The divisor bound is elementary to prove (and not particularly difficult), and I was asked about it recently, so I thought I would provide the proof here, as it serves as a case study in how to establish worst-case estimates in elementary multiplicative number theory.

[Update, Sep 24: some applications added.]

— Proof of (3) —

Let’s first prove the weak form of the divisor bound (3), which is already good enough for many applications (because a loss of $n^{o(1)}$ is often negligible if, say, the final goal is to extract some polynomial factor in n in one’s eventual estimates).  By rearranging a bit, our task is to show that for any $\varepsilon > 0$, the expression $\displaystyle \frac{d(n)}{n^\varepsilon}$ (5)

is bounded uniformly in n by some constant depending on $\varepsilon$.  Using (1) and (2), we can express (5) as a product $\displaystyle \prod_{j=1}^k \frac{a_j+1}{p_j^{\varepsilon a_j}}$ (6)

where each term involves a different prime $p_j$, and the $a_j$ are at least 1.  We have thus “localised” the problem to studying the effect of each individual prime independently.  (We are able to do this because d(n) is a multiplicative function.)

Let’s fix a prime $p_j$ and look at a single term $\displaystyle \frac{a_j+1}{p_j^{\varepsilon a_j}}$.  The numerator is linear in $a_j$, while the denominator is exponential.  Thus, as per Malthus, we expect the denominator to dominate, at least when $a_j$ is large.  But, because of the $\varepsilon$,  the numerator might be able to exceed the denominator when $a_j$ is small – but only if $p_j$ is also small.

Following these heuristics, we now divide into cases.  Suppose that $p_j$ is large, say $p_j \geq \exp(1/\varepsilon)$.  Then $p_j^{\varepsilon a_j} \geq \exp(a_j) \geq 1+a_j$ (by Taylor expansion), and so the contribution of $p_j$ to the product (6) is at most 1.  So all the large primes give a net contribution of at most 1 here.

What about the small primes, in which $p_j < \exp(1/\varepsilon)$?  Well, by Malthus, we know that the sequence $\frac{a+1}{p_j^{\varepsilon a}}$ goes to zero as $a \to \infty$.  Since convergent sequences are bounded, we therefore have some bound of the form $\frac{a_j+1}{p_j^{\varepsilon a_j}} \leq C_{p_j,\varepsilon}$ for some $C_{p_j,\varepsilon}$ depending only on $p_j$ and $\varepsilon$, but not on $a_j$.  So, each small prime gives a bounded contribution to (6) (uniformly in n).  But the number of small primes is itself bounded (uniformly in n).  Thus the total product in (6) is also bounded uniformly in n, and the claim follows.

— Proof of (4) —

One can refine the above analysis to get a more explicit value of $C_\varepsilon$, which will let us get (4), as follows.

Again consider the product (6) for some $\varepsilon > 0$.  As discussed previously, each prime larger than $\exp(1/\varepsilon)$ gives a contribution of at most 1.  What about the small primes?  Here we can estimate the denominator from below by Taylor expansion: $p_j^{\varepsilon a_j} = \exp( \varepsilon a_j \log p_j ) \geq 1 + \varepsilon a_j \log p_j$

and hence $\displaystyle \frac{a_j+1}{p_j^{\varepsilon a_j}} \leq \frac{a_j+1}{1+\varepsilon a_j \log p_j} \leq O( \frac{1}{\varepsilon \log p_j} )$

(the point here being that our bound is uniform in $a_j$).  One can of course use undergraduate calculus to try to sharpen the bound here, but it turns out not to improve by too much, basically because the Taylor approximation $\exp(x) \approx 1+x$ is quite accurate when x is small, which is the important case here.

Anyway, inserting this bound into (6), we see that (6) is in fact bounded by $\displaystyle \prod_{p < \exp(1/\varepsilon)} O( \frac{1}{\varepsilon \log p} ).$

Now let’s be very crude and bound $\log p$ from below by $\log 2$, and bound the number of primes less than $\exp(1/\varepsilon)$ by $\exp(1/\varepsilon)$.  (One can of course be more efficient about this, but again it turns out not to improve the final bounds too much.  A general principle is that when one estimating an expression such as $A^B$, or more generally the product of B terms, each of size about A, then it is far more important to get a good bound for B than to get a good bound for A, except in those cases when A is very close to 1.)  We thus conclude that $\displaystyle (6) \leq O( \frac{1}{\varepsilon} )^{\exp(1/\varepsilon)} = \exp( \exp( O(1/\varepsilon) ) )$;

unwinding what this means for (3), we obtain $d(n) \leq \exp( \exp( O( 1/\varepsilon ) ) ) n^\varepsilon$

for all $n \geq 1$ and $\varepsilon > 0$.  If we now set $\varepsilon = C/\log \log n$ for a sufficiently large C, then the second term of the RHS dominates the first (as can be seen by taking logarithms), and the claim (4) follows.

The above argument also suggests the counterexample that will demonstrate that (4) is basically sharp.  Pick $\varepsilon > 0$, and let n be the product of all the primes up to $\exp(1/\varepsilon)$.  The prime number theorem tells us that $\log n \sim \exp( 1/\varepsilon)$.  On the other hand, the prime number theorem also tells us that the number of primes dividing n is $\sim \varepsilon \exp( 1/\varepsilon )$, so by (2), $\log d(n) \sim \varepsilon \exp( 1 / \varepsilon )$.  Using these numbers we see that (4) is tight up to constants.  [If one does not care about the constants, then one does not need the full strength of the prime number theorem to show that (4) is sharp; the more elementary bounds of Chebyshev, that say that the number of primes up to N is comparable to $N/\log N$ up to constants, would suffice.]

— Why is the divisor bound useful? —

One principal application of the divisor bound (and some generalisations of that bound) is to control the number of solutions to a Diophantine equation.  For instance, (3) immediately implies that for any fixed positive n, the number of solutions to the equation $xy = n$

with x,y integer, is only $n^{o(1)}$ at most.  (For x and y real, the number of solutions is of course infinite.) This can be leveraged to some other Diophantine equations by high-school algebra.  For instance, thanks to the identity $x^2-y^2 = (x+y)(x-y)$, we conclude that the number of integer solutions to $x^2 - y^2 = n$

is also at most $n^{o(1)}$; similarly, the identity $x^3+y^3=(x+y)(x^2-xy+y^2)$ implies that the number of integer solutions to $x^3+y^3=n$

is at most $n^{o(1)}$.  (Note from Bezout’s theorem (or direct calculation) that $x+y$ and $x^2-xy+y^2$ determine $x,y$ up to at most a finite ambiguity.)

Now consider the number of solutions to the equation $x^2+y^2 = n$.

In this case, $x^2+y^2$ does not split over the rationals ${\Bbb Q}$, and so one cannot directly exploit the divisor bound for the rational integers ${\Bbb Z}$.  However, we can factor $x^2 + y^2 = (x+iy) (x-iy)$ over the Gaussian rationals ${\Bbb Q}[\sqrt{-1}]$.  Happily, the Gaussian integers ${\Bbb Z}[\sqrt{-1}]$ enjoy essentially the same divisor bound as the rational integers ${\Bbb Z}$; the Gaussian integers have unique factorisation, but perhaps more importantly they only have a finite number of units ( $\{-1,+1,-i,+i\}$ to be precise).  Because of this, one can easily check that $x^2+y^2=n$ also has at most $n^{o(1)}$ solutions.

One can similarly exploit the divisor bound on other number fields; for instance the divisor bound for ${\Bbb Z}[\sqrt{-3}]$ lets one count solutions to $x^2+xy+y^2=n$ or $x^2-xy+y^2=n$.  On the other hand, not all number fields have the divisor bound.  For instance, ${\Bbb Z}[\sqrt{2}]$ has an infinite number of units, which means that the number of solutions to Pell’s equation $x^2-2y^2 = 1$

is infinite.

Another application of the divisor bound comes up in sieve theory. Here, one is often dealing with functions of the form $\nu(n) := \sum_{d|n} a_d$, where the sieve weights $a_d$ typically have size $O(n^{o(1)})$, and the sum is over all d that divide n.  The divisor bound (3) then implies that the sieve function $\nu(n)$ also has size $O(n^{o(1)})$.  This bound is too crude to deal with the most delicate components of a sieve theory estimate, but is often very useful for dealing with error terms (especially those which have gained a factor of $n^{-c}$ relative to the main terms for some $c > 0$).