In this supplemental set of notes we derive some approximations for {n!}, when {n} is large, and in particular Stirling’s formula. This formula (and related formulae for binomial coefficients {\binom{n}{m}} will be useful for estimating a number of combinatorial quantities in this course, and also in allowing one to analyse discrete random walks accurately.

From Taylor expansion we have {x^n/n! \leq e^x} for any {x \geq 0}. Specialising this to {x=n} we obtain a crude lower bound

\displaystyle n! \geq n^n e^{-n}. \ \ \ \ \ (1)


In the other direction, we trivially have

\displaystyle n! \leq n^n \ \ \ \ \ (2)


so we know already that {n!} is within an exponential factor of {n^n}. (One can obtain a cruder version of this fact without Taylor expansion, by observing the trivial lower bound {n! \geq (n/2)^{\lfloor n/2\rfloor}} coming from considering the second half of the product {n! = 1 \cdot \ldots \cdot n}.)

One can do better by starting with the identity

\displaystyle \log n! = \sum_{m=1}^n \log m

and viewing the right-hand side as a Riemann integral approximation to {\int_1^n \log x\ dx}. Indeed a simple area comparison (cf. the integral test) yields the inequalities

\displaystyle \int_1^n \log x\ dx \leq \sum_{m=1}^n \log m \leq \log n + \int_1^n \log x\ dx

which leads to the inequalities

\displaystyle e n^n e^{-n} \leq n! \leq e n \times n^n e^{-n} \ \ \ \ \ (3)


so the lower bound in (1) was only off by a factor of {n} or so. (This illustrates a general principle, namely that one can often get a non-terrible bound for a series (in this case, the Taylor series for {e^n}) by using the largest term in that series (which is {n^n/n!}).)

One can do better by using the trapezoid rule. On any interval {[m,m+1]}, {\log x} has a second derivative of {O( 1 / m^2 )}, which by Taylor expansion leads to the approximation

\displaystyle \int_m^{m+1} \log x\ dx = \frac{1}{2} \log m + \frac{1}{2} \log(m+1) + \epsilon_m

for some error {\epsilon_m = O( 1/m^2 )}.

The error is absolutely convergent; by the integral test, we have {\sum_{m=1}^n \epsilon_m = C + O(1/n)} for some absolute constant {C := \sum_{m=1}^\infty \epsilon_m}. Performing this sum, we conclude that

\displaystyle \int_1^n \log x\ dx = \sum_{m=1}^{n-1} \log m + \frac{1}{2} \log n + C + O(1/n)

which after some rearranging leads to the asymptotic

\displaystyle n! = (1 + O(1/n)) e^{1-C} \sqrt{n} n^n e^{-n} \ \ \ \ \ (4)


so we see that {n!} actually lies roughly at the geometric mean of the two bounds in (3).

This argument does not easily reveal what the constant {C} actually is (though it can in principle be computed numerically to any specified level of accuracy by this method). To find this out, we take a different tack, interpreting the factorial via the Gamma function. Repeated integration by parts reveals the identity

\displaystyle n! = \int_0^\infty t^n e^{-t}\ dt. \ \ \ \ \ (5)


So to estimate {n!}, it suffices to estimate the integral in (5). Elementary calculus reveals that the integrand {t^n e^{-t}} achieves its maximum at {t=n}, so it is natural to make the substitution {t = n+s}, obtaining

\displaystyle n! = \int_{-n}^\infty (n+s)^n e^{-n-s}\ ds

which we can simplify a little bit as

\displaystyle n! = n^n e^{-n} \int_{-n}^\infty (1+\frac{s}{n})^n e^{-s}\ ds,

pulling out the now-familiar factors of {n^n e^{-n}}. We combine the integrand into a single exponential,

\displaystyle n! = n^n e^{-n} \int_{-n}^\infty \exp( n \log(1+\frac{s}{n}) - s )\ ds.

From Taylor expansion we see that

\displaystyle n \log(1+\frac{s}{n}) = s - \frac{s^2}{2n} + \ldots

so we heuristically have

\displaystyle \exp( n \log(1+\frac{s}{n}) - s ) \approx \exp( -s^2 / 2n ).

To achieve this approximation rigorously, we first scale {s} by {\sqrt{n}} to remove the {n} in the denominator. Making the substitution {s = \sqrt{n} x}, we obtain

\displaystyle n! = \sqrt{n} n^n e^{-n} \int_{-\sqrt{n}}^\infty \exp( n \log(1 + \frac{x}{\sqrt{n}}) - \sqrt{n} x)\ dx,

thus extracting the factor of {\sqrt{n}} that we know from (4) has to be there.

Now, Taylor expansion tells us that for fixed {x}, we have the pointwise convergence

\displaystyle \exp( n \log(1 + \frac{x}{\sqrt{n}}) - \sqrt{n} x) \rightarrow \exp( -x^2 / 2 ) \ \ \ \ \ (6)


as {n \rightarrow \infty}. To be more precise, as the function {n \log(1 + \frac{x}{\sqrt{n}})} equals {0} with derivative {\sqrt{n}} at the origin, and has second derivative {\frac{-1}{(1+x/\sqrt{n})^2}}, we see from two applications of the fundamental theorem of calculus that

\displaystyle n \log(1 + \frac{x}{\sqrt{n}}) - \sqrt{n} x = -\int_0^x \frac{(x-y) dy}{(1+y/\sqrt{n})^2}.

This gives a uniform upper bound

\displaystyle n \log(1 + \frac{x}{\sqrt{n}}) - \sqrt{n} x \leq - c x^2

for some {c > 0} when {|x| \leq \sqrt{n}}, and

\displaystyle n \log(1 + \frac{x}{\sqrt{n}}) - \sqrt{n} x \leq - c |x| \sqrt{n}

for {x > \sqrt{n}}. This is enough to keep the integrands {\exp( n \log(1 + \frac{x}{\sqrt{n}}) - \sqrt{n} x)} dominated by an absolutely integrable function. By (6) and the Lebesgue dominated convergence theorem, we thus have

\displaystyle \int_{-\sqrt{n}}^\infty \exp( n \log(1 + \frac{x}{\sqrt{n}}) - \sqrt{n} x)\ dx \rightarrow \int_{-\infty}^\infty \exp(-x^2/2)\ dx.

A classical computation (based for instance on computing {\int_{-\infty}^\infty \int_{-\infty}^\infty \exp(-(x^2+y^2)/2)\ dx dy} in both Cartesian and polar coordinates) shows that

\displaystyle \int_{-\infty}^\infty \exp(-x^2/2)\ dx = \sqrt{2\pi}

and so we conclude Stirling’s formula

\displaystyle n! = (1+o(1)) \sqrt{2\pi n} n^n e^{-n}. \ \ \ \ \ (7)


Remark 1 The dominated convergence theorem does not immediately give any effective rate on the decay {o(1)} (though such a rate can eventually be extracted by a quantitative version of the above argument. But one can combine (7) with (4) to show that the error rate is of the form {O(1/n)}. By using fancier versions of the trapezoid rule (e.g. Simpson’s rule) one can obtain an asymptotic expansion of the error term in {1/n}, but we will not need such an expansion in this course.

Remark 2 The derivation of (7) demonstrates some general principles concerning the estimation of exponential integrals {\int e^{\phi(x)}\ dx} when {\phi} is large. Firstly, the integral is dominated by the local maxima of {\phi}. Then, near these maxima, {e^{\phi(x)}} usually behaves like a rescaled Gaussian, as can be seen by Taylor expansion (though more complicated behaviour emerges if the second derivative of {\phi} degenerates). So one can often understand the asymptotics of such integrals by a change of variables designed to reveal the Gaussian behaviour. A similar set of principles also holds for oscillatory exponential integrals {\int e^{i \phi(x)}\ dx}; these principles are collectively referred to as the method of stationary phase.

One can use Stirling’s formula to estimate binomial coefficients. Here is a crude bound:

Exercise 1 (Entropy formula) Let {n} be large, let {0 < \gamma < 1} be fixed, and let {1 \leq m \leq n} be an integer of the form {m = (\gamma+o(1)) n}. Show that {\binom{n}{m} = \exp( (h(\gamma)+o(1)) n )}, where {h(\gamma)} is the entropy function

\displaystyle h(\gamma) := \gamma \log \frac{1}{\gamma} + (1-\gamma) \log \frac{1}{1-\gamma}.

For {m} near {n/2}, one also has the following more precise bound:

Exercise 2 (Refined entropy formula) Let {n} be large, and let {1 \leq m \leq n} be an integer of the form {m = n/2 + k} for some {k = o(n^{2/3})}. Show that

\displaystyle \binom{n}{m} = (\sqrt{\frac{2}{\pi}} + o(1)) \frac{2^n}{\sqrt{n}} \exp( - 2k^2/n ). \ \ \ \ \ (8)


Note the gaussian-type behaviour in {k}. This can be viewed as an illustration of the central limit theorem when summing iid Bernoulli variables {X_1,\ldots,X_n \in \{0,1\}}, where each {X_i} has a {1/2} probability of being either {0} or {1}. Indeed, from (8) we see that

\displaystyle {\bf P}( X_1 +\ldots + X_n = n/2 + k ) = (\sqrt{\frac{2}{\pi}} + o(1)) \frac{1}{\sqrt{n}} \exp( - 2k^2/n )

when {k = o(n^{2/3})}, which suggests that {X_1+\ldots+X_n} is distributed roughly like the gaussian {N( n/2, n/4 )} with mean {n/2} and variance {n/4}.

Update, Jan 4: Rafe Mazzeo pointed me to this short article of Joe Keller that gives a heuristic derivation of the full asymptotic expansion of Stirling’s formula from a Taylor expansion of the Gamma function.