In this supplemental set of notes we derive some approximations for ${n!}$ when ${n}$ is large, and in particular Stirling’s formula. This formula (and related formulae for binomial coefficients ${\binom{n}{m}}$) will be useful for estimating a number of combinatorial quantities in this course, and also in allowing one to analyse discrete random walks accurately.

From Taylor expansion we have ${x^n/n! \leq e^x}$ for any ${x \geq 0}$. Specialising this to ${x=n}$ we obtain a crude lower bound

$\displaystyle n! \geq n^n e^{-n}. \ \ \ \ \ (1)$

In the other direction, we trivially have

$\displaystyle n! \leq n^n \ \ \ \ \ (2)$

so we know already that ${n!}$ is within an exponential factor of ${n^n}$. (One can obtain a cruder version of this fact without Taylor expansion, by observing the trivial lower bound ${n! \geq (n/2)^{\lfloor n/2\rfloor}}$ coming from considering the second half of the product ${n! = 1 \cdot \ldots \cdot n}$.)
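These crude bounds are easy to test numerically. The short Python snippet below (my illustration, not part of the original argument) checks (1) and (2) in logarithmic form, which avoids overflow for large ${n}$:

```python
import math

def log_factorial(n):
    """Compute log(n!) directly as the sum log 1 + ... + log n."""
    return sum(math.log(m) for m in range(1, n + 1))

for n in [10, 100, 1000]:
    lf = log_factorial(n)
    lower = n * math.log(n) - n   # log of the lower bound n^n e^{-n}
    upper = n * math.log(n)       # log of the upper bound n^n
    assert lower <= lf <= upper
    print(n, lower, lf, upper)
```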

One can do better by starting with the identity

$\displaystyle \log n! = \sum_{m=1}^n \log m$

and viewing the right-hand side as a Riemann integral approximation to ${\int_1^n \log x\ dx}$. Indeed a simple area comparison (cf. the integral test) yields the inequalities

$\displaystyle \int_1^n \log x\ dx \leq \sum_{m=1}^n \log m \leq \log n + \int_1^n \log x\ dx$

and hence, since ${\int_1^n \log x\ dx = n \log n - n + 1}$, exponentiating yields

$\displaystyle e n^n e^{-n} \leq n! \leq e n \times n^n e^{-n} \ \ \ \ \ (3)$

so the lower bound in (1) was only off by a factor of ${n}$ or so. (This illustrates a general principle, namely that one can often get a non-terrible bound for a series (in this case, the Taylor series for ${e^n}$) by using the largest term in that series (which is ${n^n/n!}$).)
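One can verify these sharper bounds numerically as well; the sketch below (my addition) checks (3) over a range of ${n}$, using the standard-library `lgamma` function for ${\log n!}$:

```python
import math

def bounds_hold(n):
    """Check e n^n e^{-n} <= n! <= e n * n^n e^{-n} in logarithmic form."""
    log_nfact = math.lgamma(n + 1)          # log(n!)
    log_lower = 1 + n * math.log(n) - n     # log(e n^n e^{-n})
    log_upper = log_lower + math.log(n)     # log(e n * n^n e^{-n})
    return log_lower <= log_nfact <= log_upper

assert all(bounds_hold(n) for n in range(2, 2000))
```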

One can do better by using the trapezoid rule. On any interval ${[m,m+1]}$, ${\log x}$ has a second derivative of ${O( 1 / m^2 )}$, which by Taylor expansion leads to the approximation

$\displaystyle \int_m^{m+1} \log x\ dx = \frac{1}{2} \log m + \frac{1}{2} \log(m+1) + \epsilon_m$

for some error ${\epsilon_m = O( 1/m^2 )}$.

The error terms are absolutely summable; since ${\epsilon_m = O(1/m^2)}$, the integral test gives the tail bound ${\sum_{m=n}^\infty \epsilon_m = O(1/n)}$, and hence ${\sum_{m=1}^{n-1} \epsilon_m = C + O(1/n)}$, where ${C := \sum_{m=1}^\infty \epsilon_m}$ is an absolute constant. Summing the trapezoid approximation over ${1 \leq m \leq n-1}$, we conclude that

$\displaystyle \int_1^n \log x\ dx = \sum_{m=1}^{n-1} \log m + \frac{1}{2} \log n + C + O(1/n)$

which after some rearranging leads to the asymptotic

$\displaystyle n! = (1 + O(1/n)) e^{1-C} \sqrt{n} n^n e^{-n} \ \ \ \ \ (4)$

so we see that ${n!}$ actually lies roughly at the geometric mean of the two bounds in (3).
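One can already probe the unknown constant ${e^{1-C}}$ numerically: the ratio ${n! / (\sqrt{n} n^n e^{-n})}$ should stabilise as ${n}$ grows. A quick Python sketch of this (my addition; the limiting value ${\sqrt{2\pi}}$ is only derived later in these notes):

```python
import math

def ratio(n):
    """n! / (sqrt(n) n^n e^{-n}), computed via logarithms to avoid overflow."""
    return math.exp(math.lgamma(n + 1) - (0.5 * math.log(n) + n * math.log(n) - n))

for n in [10, 100, 10000, 1000000]:
    print(n, ratio(n))   # stabilises near 2.5066... = sqrt(2*pi)
```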

This argument does not easily reveal what the constant ${C}$ actually is (though it can in principle be computed numerically to any specified level of accuracy by this method). To find this out, we take a different tack, interpreting the factorial via the Gamma function. Repeated integration by parts reveals the identity

$\displaystyle n! = \int_0^\infty t^n e^{-t}\ dt. \ \ \ \ \ (5)$
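This identity can be checked numerically by truncating and discretising the integral; the sketch below (my addition, with arbitrarily chosen cut-off and step parameters) recovers ${n!}$ to good accuracy for moderate ${n}$ via a simple trapezoid quadrature:

```python
import math

def gamma_integral(n, T=200.0, steps=200000):
    """Trapezoid approximation to the integral of t^n e^{-t} over [0, T],
    evaluating the integrand as exp(n log t - t) for numerical safety."""
    h = T / steps
    def f(t):
        return 0.0 if t == 0.0 else math.exp(n * math.log(t) - t)
    return h * (0.5 * (f(0.0) + f(T)) + sum(f(i * h) for i in range(1, steps)))

print(gamma_integral(10), math.factorial(10))  # both close to 3628800
```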

So to estimate ${n!}$, it suffices to estimate the integral in (5). Elementary calculus reveals that the integrand ${t^n e^{-t}}$ achieves its maximum at ${t=n}$, so it is natural to make the substitution ${t = n+s}$, obtaining

$\displaystyle n! = \int_{-n}^\infty (n+s)^n e^{-n-s}\ ds$

which we can simplify a little bit as

$\displaystyle n! = n^n e^{-n} \int_{-n}^\infty (1+\frac{s}{n})^n e^{-s}\ ds,$

pulling out the now-familiar factors of ${n^n e^{-n}}$. We combine the integrand into a single exponential,

$\displaystyle n! = n^n e^{-n} \int_{-n}^\infty \exp( n \log(1+\frac{s}{n}) - s )\ ds.$

From Taylor expansion we see that

$\displaystyle n \log(1+\frac{s}{n}) = s - \frac{s^2}{2n} + \ldots$

so we heuristically have

$\displaystyle \exp( n \log(1+\frac{s}{n}) - s ) \approx \exp( -s^2 / 2n ).$

To make this approximation rigorous, we first scale ${s}$ by ${\sqrt{n}}$ to remove the ${n}$ in the denominator. Making the substitution ${s = \sqrt{n} x}$, we obtain

$\displaystyle n! = \sqrt{n} n^n e^{-n} \int_{-\sqrt{n}}^\infty \exp( n \log(1 + \frac{x}{\sqrt{n}}) - \sqrt{n} x)\ dx,$

thus extracting the factor of ${\sqrt{n}}$ that we know from (4) has to be there.

Now, Taylor expansion tells us that for fixed ${x}$, we have the pointwise convergence

$\displaystyle \exp( n \log(1 + \frac{x}{\sqrt{n}}) - \sqrt{n} x) \rightarrow \exp( -x^2 / 2 ) \ \ \ \ \ (6)$

as ${n \rightarrow \infty}$. To be more precise: since the function ${n \log(1 + \frac{x}{\sqrt{n}})}$ vanishes at the origin with derivative ${\sqrt{n}}$ there, and has second derivative ${\frac{-1}{(1+x/\sqrt{n})^2}}$, two applications of the fundamental theorem of calculus (i.e. Taylor expansion with integral remainder) show that

$\displaystyle n \log(1 + \frac{x}{\sqrt{n}}) - \sqrt{n} x = -\int_0^x \frac{(x-y) dy}{(1+y/\sqrt{n})^2}.$

This gives a uniform upper bound

$\displaystyle n \log(1 + \frac{x}{\sqrt{n}}) - \sqrt{n} x \leq - c x^2$

for some ${c > 0}$ when ${|x| \leq \sqrt{n}}$, and

$\displaystyle n \log(1 + \frac{x}{\sqrt{n}}) - \sqrt{n} x \leq - c |x| \sqrt{n}$

for ${x > \sqrt{n}}$. This is enough to keep the integrands ${\exp( n \log(1 + \frac{x}{\sqrt{n}}) - \sqrt{n} x)}$ dominated by an absolutely integrable function. By (6) and the Lebesgue dominated convergence theorem, we thus have

$\displaystyle \int_{-\sqrt{n}}^\infty \exp( n \log(1 + \frac{x}{\sqrt{n}}) - \sqrt{n} x)\ dx \rightarrow \int_{-\infty}^\infty \exp(-x^2/2)\ dx.$
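This convergence can be seen numerically; the following sketch (my addition, with an arbitrary truncation of the upper limit) computes the rescaled integral by the trapezoid rule for a moderate ${n}$:

```python
import math

def scaled_integral(n, upper=30.0, steps=300000):
    """Trapezoid rule for exp(n log(1 + x/sqrt(n)) - sqrt(n) x) on
    (-sqrt(n), upper]; the integrand decays rapidly at both ends."""
    a = -math.sqrt(n) * (1 - 1e-12)   # stay just above the log singularity
    h = (upper - a) / steps
    def f(x):
        return math.exp(n * math.log1p(x / math.sqrt(n)) - math.sqrt(n) * x)
    return h * (0.5 * (f(a) + f(upper)) + sum(f(a + i * h) for i in range(1, steps)))

print(scaled_integral(100), math.sqrt(2 * math.pi))  # 2.508... vs 2.5066...
```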

A classical computation (based for instance on computing ${\int_{-\infty}^\infty \int_{-\infty}^\infty \exp(-(x^2+y^2)/2)\ dx dy}$ in both Cartesian and polar coordinates) shows that

$\displaystyle \int_{-\infty}^\infty \exp(-x^2/2)\ dx = \sqrt{2\pi}$

and so we conclude Stirling’s formula

$\displaystyle n! = (1+o(1)) \sqrt{2\pi n} n^n e^{-n}. \ \ \ \ \ (7)$

Remark 1 The dominated convergence theorem does not immediately give any effective rate on the decay ${o(1)}$ (though such a rate can eventually be extracted by a quantitative version of the above argument). But one can combine (7) with (4) to show that the error rate is of the form ${O(1/n)}$. By using fancier versions of the trapezoid rule (e.g. Simpson’s rule) one can obtain an asymptotic expansion of the error term in ${1/n}$, but we will not need such an expansion in this course.
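The ${O(1/n)}$ error rate can be observed numerically; in fact ${n}$ times the relative error settles near ${1/12}$, the first coefficient of the classical Stirling series. A sketch (my addition):

```python
import math

def rel_error(n):
    """Relative error of the approximation sqrt(2 pi n) n^n e^{-n} to n!."""
    log_stirling = 0.5 * math.log(2 * math.pi * n) + n * math.log(n) - n
    return math.exp(math.lgamma(n + 1) - log_stirling) - 1

for n in [10, 100, 1000]:
    print(n, n * rel_error(n))   # settles near 1/12 = 0.0833...
```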

Remark 2 The derivation of (7) demonstrates some general principles concerning the estimation of exponential integrals ${\int e^{\phi(x)}\ dx}$ when ${\phi}$ is large. Firstly, the integral is dominated by the local maxima of ${\phi}$. Then, near these maxima, ${e^{\phi(x)}}$ usually behaves like a rescaled Gaussian, as can be seen by Taylor expansion (though more complicated behaviour emerges if the second derivative of ${\phi}$ degenerates). So one can often understand the asymptotics of such integrals by a change of variables designed to reveal the Gaussian behaviour. A similar set of principles also holds for oscillatory exponential integrals ${\int e^{i \phi(x)}\ dx}$; these principles are collectively referred to as the method of stationary phase.

One can use Stirling’s formula to estimate binomial coefficients. Here is a crude bound:

Exercise 1 (Entropy formula) Let ${n}$ be large, let ${0 < \gamma < 1}$ be fixed, and let ${1 \leq m \leq n}$ be an integer of the form ${m = (\gamma+o(1)) n}$. Show that ${\binom{n}{m} = \exp( (h(\gamma)+o(1)) n )}$, where ${h(\gamma)}$ is the entropy function

$\displaystyle h(\gamma) := \gamma \log \frac{1}{\gamma} + (1-\gamma) \log \frac{1}{1-\gamma}.$
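The exercise can be sanity-checked numerically: ${\frac{1}{n} \log \binom{n}{m}}$ should approach ${h(\gamma)}$ as ${n}$ grows. A short sketch (my addition, taking ${\gamma = 0.3}$ as an arbitrary example):

```python
import math

def entropy(g):
    """The entropy function h(gamma)."""
    return g * math.log(1 / g) + (1 - g) * math.log(1 / (1 - g))

gamma = 0.3
for n in [100, 1000, 10000]:
    m = round(gamma * n)
    # math.log accepts arbitrarily large integers, so no overflow here
    print(n, math.log(math.comb(n, m)) / n, entropy(gamma))
```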

For ${m}$ near ${n/2}$, one also has the following more precise bound:

Exercise 2 (Refined entropy formula) Let ${n}$ be large, and let ${1 \leq m \leq n}$ be an integer of the form ${m = n/2 + k}$ for some ${k = o(n^{2/3})}$. Show that

$\displaystyle \binom{n}{m} = (\sqrt{\frac{2}{\pi}} + o(1)) \frac{2^n}{\sqrt{n}} \exp( - 2k^2/n ). \ \ \ \ \ (8)$

Note the gaussian-type behaviour in ${k}$. This can be viewed as an illustration of the central limit theorem when summing iid Bernoulli variables ${X_1,\ldots,X_n \in \{0,1\}}$, where each ${X_i}$ has a ${1/2}$ probability of being either ${0}$ or ${1}$. Indeed, from (8) we see that

$\displaystyle {\bf P}( X_1 +\ldots + X_n = n/2 + k ) = (\sqrt{\frac{2}{\pi}} + o(1)) \frac{1}{\sqrt{n}} \exp( - 2k^2/n )$

when ${k = o(n^{2/3})}$, which suggests that ${X_1+\ldots+X_n}$ is distributed roughly like the gaussian ${N( n/2, n/4 )}$ with mean ${n/2}$ and variance ${n/4}$.
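As a final numerical illustration (my addition), one can compare both sides of (8) directly, again working with logarithms since ${2^n}$ and ${\binom{n}{m}}$ are astronomically large:

```python
import math

n = 10000
for k in [0, 10, 50]:
    # log of binom(n, n/2 + k) * sqrt(n) / 2^n
    log_lhs = math.log(math.comb(n, n // 2 + k)) + 0.5 * math.log(n) - n * math.log(2)
    lhs = math.exp(log_lhs)
    rhs = math.sqrt(2 / math.pi) * math.exp(-2 * k * k / n)
    print(k, lhs, rhs)   # the two columns agree to several decimal places
```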

Update, Jan 4: Rafe Mazzeo pointed me to this short article of Joe Keller that gives a heuristic derivation of the full asymptotic expansion of Stirling’s formula from a Taylor expansion of the Gamma function.