In this supplemental set of notes we derive some approximations for $n!$, when $n$ is large, and in particular Stirling's formula. This formula (and related formulae for binomial coefficients $\binom{n}{m}$) will be useful for estimating a number of combinatorial quantities in this course, and also in allowing one to analyse discrete random walks accurately.
From the Taylor expansion $e^n = \sum_{k=0}^\infty \frac{n^k}{k!} \geq \frac{n^n}{n!}$ we obtain the lower bound

$\displaystyle n! \geq n^n e^{-n}, \ \ \ \ \ (1)$

so we know already that $n!$ is within an exponential factor of $n^n$. (One can obtain a cruder version of this fact without Taylor expansion, by observing the trivial lower bound $n! \geq (n/2)^{n/2}$ coming from considering the second half of the product $n! = 1 \cdot 2 \cdots n$.)
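These bounds are easy to test numerically; for instance, the following Python snippet (an illustrative sketch, using `math.lgamma` so that $\log n!$ can be computed without overflow) verifies them for moderate $n$:

```python
import math

# Verify the lower bounds n! >= n^n e^{-n} and n! >= (n/2)^{n/2},
# together with the trivial upper bound n! <= n^n,
# comparing logarithms so that large n does not overflow.
for n in [10, 100, 1000]:
    log_fact = math.lgamma(n + 1)                  # log(n!)
    assert n * math.log(n) - n <= log_fact         # Taylor-series bound
    assert (n / 2) * math.log(n / 2) <= log_fact   # second-half-of-product bound
    assert log_fact <= n * math.log(n)             # trivial upper bound
```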
One can do better by starting with the identity

$\displaystyle \log n! = \sum_{m=1}^n \log m \ \ \ \ \ (2)$

and viewing the right-hand side as a Riemann integral approximation to $\int_1^n \log x\ dx = n \log n - n + 1$. Indeed a simple area comparison (cf. the integral test) yields the inequalities

$\displaystyle n \log n - n + 1 \leq \log n! \leq n \log n - n + 1 + \log n, \ \ \ \ \ (3)$
so the lower bound in (1) was only off by a factor of $n$ or so. (This illustrates a general principle, namely that one can often get a non-terrible bound for a series (in this case, the Taylor series for $e^n$) by using the largest term in that series (which is $\frac{n^n}{n!}$).)
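These area-comparison bounds, which in explicit form read $n \log n - n + 1 \leq \log n! \leq n \log n - n + 1 + \log n$, are easy to check numerically in Python:

```python
import math

# Check the integral-test sandwich:
#   n log n - n + 1  <=  log n!  <=  n log n - n + 1 + log n.
for n in [2, 10, 100, 1000]:
    log_fact = math.lgamma(n + 1)
    integral = n * math.log(n) - n + 1   # integral of log x over [1, n]
    assert integral <= log_fact <= integral + math.log(n)
```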
One can do better by using the trapezoid rule. On any interval $[m, m+1]$, $\log x$ has a second derivative of $O(1/m^2)$, which by Taylor expansion leads to the approximation

$\displaystyle \int_m^{m+1} \log x\ dx = \frac{1}{2} \log m + \frac{1}{2} \log(m+1) + \epsilon_m$

for some error $\epsilon_m = O(1/m^2)$.
The error $\epsilon_m$ is absolutely convergent; by the integral test, we have $\sum_{m=1}^{n-1} \epsilon_m = c + O(1/n)$ for some absolute constant $c := \sum_{m=1}^\infty \epsilon_m$. Performing this sum, we conclude that

$\displaystyle \int_1^n \log x\ dx = \log n! - \frac{1}{2} \log n + c + O(1/n),$

and thus

$\displaystyle n! = (1 + O(1/n))\, C \sqrt{n}\, n^n e^{-n} \ \ \ \ \ (4)$

for some absolute constant $C := e^{1-c}$; so we see that $n!$ actually lies roughly at the geometric mean of the two bounds in (3).
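The constant can indeed be approximated numerically from this expansion: the difference $\log n! - (n \log n - n + \frac{1}{2} \log n)$ settles down quickly as $n$ grows, as the following Python sketch shows (the limiting value will be identified below as $\log \sqrt{2\pi} \approx 0.9189$):

```python
import math

# Estimate the constant by computing log n! - (n log n - n + (1/2) log n);
# the differences stabilize as n grows.
for n in [10, 100, 1000, 10000]:
    diff = math.lgamma(n + 1) - (n * math.log(n) - n + 0.5 * math.log(n))
    print(n, diff)
```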
This argument does not easily reveal what the constant $C$ actually is (though it can in principle be computed numerically to any specified level of accuracy by this method). To find this out, we take a different tack, interpreting the factorial via the Gamma function. Repeated integration by parts reveals the identity

$\displaystyle n! = \int_0^\infty t^n e^{-t}\ dt. \ \ \ \ \ (5)$
So to estimate $n!$, it suffices to estimate the integral in (5). Elementary calculus reveals that the integrand $t^n e^{-t}$ achieves its maximum at $t = n$, so it is natural to make the substitution $t = n + s$, obtaining

$\displaystyle n! = \int_{-n}^\infty (n+s)^n e^{-n-s}\ ds,$
which we can simplify a little bit as

$\displaystyle n! = n^n e^{-n} \int_{-n}^\infty \left(1 + \frac{s}{n}\right)^n e^{-s}\ ds,$

pulling out the now-familiar factors of $n^n e^{-n}$. We combine the integrand into a single exponential,

$\displaystyle n! = n^n e^{-n} \int_{-n}^\infty \exp\left( n \log\left(1 + \frac{s}{n}\right) - s \right)\ ds.$
From Taylor expansion we see that

$\displaystyle n \log\left(1 + \frac{s}{n}\right) - s = -\frac{s^2}{2n} + O\left(\frac{|s|^3}{n^2}\right)$

for $s = o(n)$, so we heuristically have

$\displaystyle \exp\left( n \log\left(1 + \frac{s}{n}\right) - s \right) \approx e^{-s^2/2n}.$
To achieve this approximation rigorously, we first scale $s$ by $\sqrt{n}$ to remove the $n$ in the denominator. Making the substitution $s = \sqrt{n} x$, we obtain

$\displaystyle n! = \sqrt{n}\, n^n e^{-n} \int_{-\sqrt{n}}^\infty \exp\left( n \log\left(1 + \frac{x}{\sqrt{n}}\right) - \sqrt{n} x \right)\ dx, \ \ \ \ \ (6)$

thus extracting the factor of $\sqrt{n}$ that we know from (4) has to be there.
This gives a uniform upper bound

$\displaystyle \exp\left( n \log\left(1 + \frac{x}{\sqrt{n}}\right) - \sqrt{n} x \right) \leq e^{-cx^2}$

for some absolute constant $c > 0$ when $-\sqrt{n} < x \leq \sqrt{n}$, and

$\displaystyle \exp\left( n \log\left(1 + \frac{x}{\sqrt{n}}\right) - \sqrt{n} x \right) \leq e^{-cx}$

for $x > \sqrt{n}$. This is enough to keep the integrands dominated by an absolutely integrable function. Since the integrand converges pointwise to $e^{-x^2/2}$ for each fixed $x$, by (6) and the Lebesgue dominated convergence theorem we thus have

$\displaystyle n! = (1 + o(1))\, \sqrt{n}\, n^n e^{-n} \int_{-\infty}^\infty e^{-x^2/2}\ dx.$
A classical computation (based for instance on computing $\left(\int_{-\infty}^\infty e^{-x^2/2}\ dx\right)^2$ in both Cartesian and polar coordinates) shows that

$\displaystyle \int_{-\infty}^\infty e^{-x^2/2}\ dx = \sqrt{2\pi},$
and so we conclude Stirling's formula

$\displaystyle n! = (1 + o(1))\, \sqrt{2\pi n}\, n^n e^{-n}. \ \ \ \ \ (7)$
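As a numerical check of Stirling's formula, the ratio $n! / (\sqrt{2\pi n}\, n^n e^{-n})$ tends to $1$; a short Python computation (again via `math.lgamma`) illustrates the convergence:

```python
import math

# The ratio n! / (sqrt(2 pi n) n^n e^{-n}) should tend to 1.
for n in [1, 10, 100, 10000]:
    log_stirling = 0.5 * math.log(2 * math.pi * n) + n * math.log(n) - n
    ratio = math.exp(math.lgamma(n + 1) - log_stirling)
    print(n, ratio)  # decreases toward 1
```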
Remark 1 The dominated convergence theorem does not immediately give any effective rate on the decay $o(1)$ (though such a rate can eventually be extracted by a quantitative version of the above argument). But one can combine (7) with (4) to show that the error rate is of the form $O(1/n)$. By using fancier versions of the trapezoid rule (e.g. Simpson's rule) one can obtain an asymptotic expansion of the error term in powers of $1/n$, but we will not need such an expansion in this course.
Remark 2 The derivation of (7) demonstrates some general principles concerning the estimation of exponential integrals $\int e^{\phi(x)}\ dx$ when $\phi$ is large. Firstly, the integral is dominated by the local maxima of $\phi$. Then, near these maxima, $e^{\phi(x)}$ usually behaves like a rescaled Gaussian, as can be seen by Taylor expansion (though more complicated behaviour emerges if the second derivative of $\phi$ degenerates). So one can often understand the asymptotics of such integrals by a change of variables designed to reveal the Gaussian behaviour. A similar set of principles also holds for oscillatory exponential integrals $\int e^{i\phi(x)}\ dx$; these principles are collectively referred to as the method of stationary phase.
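This Laplace-method principle can be sketched in code: at a non-degenerate maximum $x_0$ of $\phi$, one approximates $\int e^{\phi(x)}\ dx$ by $e^{\phi(x_0)} \sqrt{2\pi / |\phi''(x_0)|}$. The helper below is an illustrative sketch of this recipe, tested on $\phi(x) = n \log x - x$ (whose exact integral over $(0,\infty)$ is $n!$, with maximum at $x_0 = n$ and $\phi''(x_0) = -1/n$), which recovers the Stirling approximation:

```python
import math

# Laplace's method: near a non-degenerate maximum x0 of phi, exp(phi)
# looks like a rescaled Gaussian, so
#   int exp(phi(x)) dx  ~  exp(phi(x0)) * sqrt(2 pi / |phi''(x0)|).
# This helper returns the logarithm of that Gaussian approximation.
def log_laplace_approx(phi_at_max, phi_second_deriv_at_max):
    return phi_at_max + 0.5 * math.log(2 * math.pi / abs(phi_second_deriv_at_max))

# phi(x) = n log x - x: maximum at x0 = n, phi(x0) = n log n - n,
# phi''(x0) = -1/n; the exact value of the integral is n!.
n = 1000
log_approx = log_laplace_approx(n * math.log(n) - n, -1.0 / n)
print(math.exp(math.lgamma(n + 1) - log_approx))  # close to 1
```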
One can use Stirling’s formula to estimate binomial coefficients. Here is a crude bound:
Exercise 1 (Entropy formula) Let $n$ be large, let $0 < \gamma < 1$ be fixed, and let $1 \leq m \leq n$ be an integer of the form $m = \gamma n + O(1)$. Show that $\binom{n}{m} = \exp\left( (h(\gamma) + o(1)) n \right)$, where $h(\gamma)$ is the entropy function

$\displaystyle h(\gamma) := \gamma \log \frac{1}{\gamma} + (1 - \gamma) \log \frac{1}{1 - \gamma}.$
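The entropy formula can be explored numerically: $\frac{1}{n} \log \binom{n}{m}$ approaches $h(\gamma)$ as $n$ grows. A Python sketch (with the illustrative choice $\gamma = 0.3$):

```python
import math

def entropy(g):
    """The entropy function h(gamma) = g log(1/g) + (1-g) log(1/(1-g))."""
    return g * math.log(1 / g) + (1 - g) * math.log(1 / (1 - g))

# (1/n) log C(n, m) should approach h(gamma) when m ~ gamma * n.
gamma = 0.3
for n in [100, 1000, 10000]:
    m = round(gamma * n)
    log_binom = math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
    print(n, log_binom / n, entropy(gamma))
```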
For $m$ near $n/2$, one also has the following more precise bound:

Exercise 2 (Refined entropy formula) Let $n$ be large, and let $m$ be an integer of the form $m = \frac{n}{2} + k$ for some $k = o(n^{2/3})$. Show that

$\displaystyle \binom{n}{m} = \left(\sqrt{\frac{2}{\pi}} + o(1)\right) \frac{2^n}{\sqrt{n}} e^{-2k^2/n}. \ \ \ \ \ (8)$
Note the gaussian-type behaviour $e^{-2k^2/n}$ in $k$. This can be viewed as an illustration of the central limit theorem when summing iid Bernoulli variables $X_1, \ldots, X_n \in \{0, 1\}$, where each $X_i$ has a probability of $1/2$ of being either $0$ or $1$. Indeed, from (8) we see that

$\displaystyle \mathbf{P}\left( X_1 + \cdots + X_n = \frac{n}{2} + k \right) = \left(\sqrt{\frac{2}{\pi}} + o(1)\right) \frac{1}{\sqrt{n}} e^{-2k^2/n}$

when $k = o(n^{2/3})$, which suggests that $X_1 + \cdots + X_n$ is distributed roughly like the gaussian $N(n/2, n/4)$ with mean $n/2$ and variance $n/4$.
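This gaussian approximation is easy to see numerically: the exact probabilities $\binom{n}{n/2+k} 2^{-n}$ track the $N(n/2, n/4)$ density $\sqrt{\frac{2}{\pi n}}\, e^{-2k^2/n}$ closely. A Python sketch:

```python
import math

# Compare the exact probabilities P(S_n = n/2 + k) = C(n, n/2 + k) / 2^n
# for a fair Bernoulli sum S_n with the N(n/2, n/4) density
# sqrt(2 / (pi n)) * exp(-2 k^2 / n).
n = 10000
for k in [0, 20, 50, 100]:
    m = n // 2 + k
    log_binom = math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
    exact = math.exp(log_binom - n * math.log(2))
    gaussian = math.sqrt(2 / (math.pi * n)) * math.exp(-2 * k * k / n)
    print(k, exact, gaussian)
```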
Update, Jan 4: Rafe Mazzeo pointed me to this short article of Joe Keller that gives a heuristic derivation of the full asymptotic expansion of Stirling’s formula from a Taylor expansion of the Gamma function.