254A, Notes 0a: Stirling’s formula

2 January, 2010 in 254A - random matrices, math.CA | Tags: factorial function, gamma function, Stirling's formula, trapezoid rule | by Terence Tao

In this supplemental set of notes we derive some approximations for ${n!}$ , when ${n}$ is large, and in particular Stirling’s formula. This formula (and related formulae for binomial coefficients ${\binom{n}{m}}$ ) will be useful for estimating a number of combinatorial quantities in this course, and also in allowing one to analyse discrete random walks accurately.

From Taylor expansion we have ${x^n/n! \leq e^x}$ for any ${x \geq 0}$ . Specialising this to ${x=n}$ we obtain a crude lower bound

$\displaystyle n! \geq n^n e^{-n}. \ \ \ \ \ (1)$

In the other direction, we trivially have

$\displaystyle n! \leq n^n \ \ \ \ \ (2)$

so we know already that ${n!}$ is within an exponential factor of ${n^n}$ . (One can obtain a cruder version of this fact without Taylor expansion, by observing the trivial lower bound ${n! \geq (n/2)^{\lfloor n/2\rfloor}}$ coming from considering the second half of the product ${n! = 1 \cdot \ldots \cdot n}$ .)
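As a quick numerical sanity check of (1) and (2), here is a minimal Python sketch. The helper names `log_factorial` and `crude_bounds_hold` are ours, introduced only for illustration, and we work with logarithms (via the standard library's log-gamma function) so that no huge integers appear.

```python
import math

def log_factorial(n: int) -> float:
    """Return log(n!) via the log-gamma function, avoiding huge integers."""
    return math.lgamma(n + 1)

def crude_bounds_hold(n: int) -> bool:
    """Check n^n e^{-n} <= n! <= n^n on a logarithmic scale."""
    lower = n * math.log(n) - n   # log of n^n e^{-n}, the bound (1)
    upper = n * math.log(n)       # log of n^n, the bound (2)
    return lower <= log_factorial(n) <= upper

for n in (2, 10, 100, 1000):
    # gap = (log n^n - log n!) / n, which must lie between 0 and 1
    gap = (n * math.log(n) - log_factorial(n)) / n
    print(n, crude_bounds_hold(n), round(gap, 4))
```

The printed gap measures where ${\log n!}$ sits between the two bounds, per unit of ${n}$; a value between ${0}$ and ${1}$ is exactly the statement that ${n!}$ is within an exponential factor of ${n^n}$.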

One can do better by starting with the identity

$\displaystyle \log n! = \sum_{m=1}^n \log m$

and viewing the right-hand side as a Riemann integral approximation to ${\int_1^n \log x\ dx}$ . Indeed a simple area comparison (cf. the integral test) yields the inequalities

$\displaystyle \int_1^n \log x\ dx \leq \sum_{m=1}^n \log m \leq \log n + \int_1^n \log x\ dx$

which leads to the inequalities

$\displaystyle e n^n e^{-n} \leq n! \leq e n \times n^n e^{-n} \ \ \ \ \ (3)$

so the lower bound in (1) was only off by a factor of ${n}$ or so. (This illustrates a general principle, namely that one can often get a non-terrible bound for a series (in this case, the Taylor series for ${e^n}$ ) by using the largest term in that series (which is ${n^n/n!}$ ).)
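To spell out the step from the area comparison to (3): since ${x \log x - x}$ is an antiderivative of ${\log x}$ , the fundamental theorem of calculus gives

$\displaystyle \int_1^n \log x\ dx = (n \log n - n) - (1 \cdot \log 1 - 1) = n \log n - n + 1,$

and exponentiating all three sides of the area comparison yields

$\displaystyle e^{n \log n - n + 1} \leq n! \leq n \cdot e^{n \log n - n + 1},$

which is (3), because ${e^{n \log n - n + 1} = e n^n e^{-n}}$ .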

One can do better by using the trapezoid rule. On any interval ${[m,m+1]}$ , ${\log x}$ has a second derivative of ${O( 1 / m^2 )}$ , which by Taylor expansion leads to the approximation

$\displaystyle \int_m^{m+1} \log x\ dx = \frac{1}{2} \log m + \frac{1}{2} \log(m+1) + \epsilon_m$

for some error ${\epsilon_m = O( 1/m^2 )}$ .

The series of errors ${\sum_{m=1}^\infty \epsilon_m}$ is absolutely convergent; by the integral test, we have ${\sum_{m=1}^n \epsilon_m = C + O(1/n)}$ for some absolute constant ${C := \sum_{m=1}^\infty \epsilon_m}$ . Summing the trapezoid approximation over ${1 \leq m \leq n-1}$ , we conclude that

$\displaystyle \int_1^n \log x\ dx = \sum_{m=1}^{n-1} \log m + \frac{1}{2} \log n + C + O(1/n)$

which after some rearranging leads to the asymptotic

$\displaystyle n! = (1 + O(1/n)) e^{1-C} \sqrt{n} n^n e^{-n} \ \ \ \ \ (4)$

so we see that ${n!}$ actually lies roughly at the geometric mean of the two bounds in (3).
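The constant ${C}$ can be approximated numerically by summing the trapezoid errors directly. The sketch below is only an illustration: the function names are ours, and it relies on the closed form ${\epsilon_m = (m+\frac{1}{2}) \log(1+\frac{1}{m}) - 1}$ , which follows from the antiderivative ${x \log x - x}$ of ${\log x}$ . It then compares ${e^{1-C}}$ with ${\sqrt{2\pi}}$ , the value that (7) shows it must equal.

```python
import math

def trapezoid_error(m: int) -> float:
    """epsilon_m = int_m^{m+1} log x dx - (log m + log(m+1)) / 2.

    With the antiderivative x log x - x this simplifies to
    (m + 1/2) log(1 + 1/m) - 1, which avoids subtracting large terms.
    """
    return (m + 0.5) * math.log1p(1.0 / m) - 1.0

def partial_constant(terms: int) -> float:
    """Partial sum of epsilon_m for m = 1..terms, approximating C."""
    return math.fsum(trapezoid_error(m) for m in range(1, terms + 1))

C_approx = partial_constant(100_000)
print("C ~", C_approx)
print("e^(1-C) ~", math.exp(1.0 - C_approx))
print("sqrt(2*pi) =", math.sqrt(2.0 * math.pi))
```

Since ${\epsilon_m = O(1/m^2)}$ , truncating the sum after ${N}$ terms leaves a tail of size ${O(1/N)}$ , so the agreement improves only slowly as more terms are added.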

This argument does not easily reveal what the constant ${C}$ actually is (though it can in principle be computed numerically to any specified level of accuracy by this method). To find this out, we take a different tack, interpreting the factorial via the Gamma function. Repeated integration by parts reveals the identity

$\displaystyle n! = \int_0^\infty t^n e^{-t}\ dt. \ \ \ \ \ (5)$

So to estimate ${n!}$ , it suffices to estimate the integral in (5). Elementary calculus reveals that the integrand ${t^n e^{-t}}$ achieves its maximum at ${t=n}$ , so it is natural to make the substitution ${t = n+s}$ , obtaining

$\displaystyle n! = \int_{-n}^\infty (n+s)^n e^{-n-s}\ ds$

which we can simplify a little bit as

$\displaystyle n! = n^n e^{-n} \int_{-n}^\infty (1+\frac{s}{n})^n e^{-s}\ ds,$

pulling out the now-familiar factors of ${n^n e^{-n}}$ . We combine the integrand into a single exponential,

$\displaystyle n! = n^n e^{-n} \int_{-n}^\infty \exp( n \log(1+\frac{s}{n}) - s )\ ds.$

From Taylor expansion we see that

$\displaystyle n \log(1+\frac{s}{n}) = s - \frac{s^2}{2n} + \ldots$

so we heuristically have

$\displaystyle \exp( n \log(1+\frac{s}{n}) - s ) \approx \exp( -s^2 / 2n ).$

To achieve this approximation rigorously, we first scale ${s}$ by ${\sqrt{n}}$ to remove the ${n}$ in the denominator. Making the substitution ${s = \sqrt{n} x}$ , we obtain

$\displaystyle n! = \sqrt{n} n^n e^{-n} \int_{-\sqrt{n}}^\infty \exp( n \log(1 + \frac{x}{\sqrt{n}}) - \sqrt{n} x)\ dx,$

thus extracting the factor of ${\sqrt{n}}$ that we know from (4) has to be there.

Now, Taylor expansion tells us that for fixed ${x}$ , we have the pointwise convergence

$\displaystyle \exp( n \log(1 + \frac{x}{\sqrt{n}}) - \sqrt{n} x) \rightarrow \exp( -x^2 / 2 ) \ \ \ \ \ (6)$

as ${n \rightarrow \infty}$ . To be more precise, as the function ${n \log(1 + \frac{x}{\sqrt{n}})}$ equals ${0}$ with derivative ${\sqrt{n}}$ at the origin, and has second derivative ${\frac{-1}{(1+x/\sqrt{n})^2}}$ , we see from two applications of the fundamental theorem of calculus that

$\displaystyle n \log(1 + \frac{x}{\sqrt{n}}) - \sqrt{n} x = -\int_0^x \frac{(x-y) dy}{(1+y/\sqrt{n})^2}.$

This gives a uniform upper bound

$\displaystyle n \log(1 + \frac{x}{\sqrt{n}}) - \sqrt{n} x \leq - c x^2$

for some ${c > 0}$ when ${|x| \leq \sqrt{n}}$ , and

$\displaystyle n \log(1 + \frac{x}{\sqrt{n}}) - \sqrt{n} x \leq - c |x| \sqrt{n}$

for ${x > \sqrt{n}}$ . This is enough to keep the integrands ${\exp( n \log(1 + \frac{x}{\sqrt{n}}) - \sqrt{n} x)}$ dominated by an absolutely integrable function. By (6) and the Lebesgue dominated convergence theorem, we thus have

$\displaystyle \int_{-\sqrt{n}}^\infty \exp( n \log(1 + \frac{x}{\sqrt{n}}) - \sqrt{n} x)\ dx \rightarrow \int_{-\infty}^\infty \exp(-x^2/2)\ dx.$
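One can watch this convergence numerically. The following is a rough sketch rather than a careful quadrature: the function names, the cutoff `upper = 60.0` and the number of Simpson steps are our own choices. The cutoff is harmless because, by the monotonicity in ${n}$ , the integrand is at most ${(1+x)e^{-x}}$ for ${x \geq 0}$ .

```python
import math

def integrand(x: float, n: int) -> float:
    """exp(n log(1 + x/sqrt(n)) - sqrt(n) x), extended by 0 at x = -sqrt(n)."""
    root = math.sqrt(n)
    ratio = x / root
    if ratio <= -1.0:
        # log(1 + ratio) is -infinity here, so the integrand vanishes
        return 0.0
    return math.exp(n * math.log1p(ratio) - root * x)

def rescaled_integral(n: int, upper: float = 60.0, steps: int = 20_000) -> float:
    """Composite Simpson approximation on [-sqrt(n), upper]; steps must be even."""
    a = -math.sqrt(n)
    h = (upper - a) / steps
    total = integrand(a, n) + integrand(upper, n)
    for i in range(1, steps):
        weight = 4.0 if i % 2 else 2.0
        total += weight * integrand(a + i * h, n)
    return total * h / 3.0

for n in (1, 10, 100, 1000):
    print(n, rescaled_integral(n))
print("sqrt(2*pi) =", math.sqrt(2.0 * math.pi))
```

By the identity preceding (6), the exact value of the integral is ${n! e^n / (n^n \sqrt{n})}$ , which gives an independent check on the quadrature; for ${n=1}$ this is just ${e}$ .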

A classical computation (based for instance on computing ${\int_{-\infty}^\infty \int_{-\infty}^\infty \exp(-(x^2+y^2)/2)\ dx dy}$ in both Cartesian and polar coordinates) shows that

$\displaystyle \int_{-\infty}^\infty \exp(-x^2/2)\ dx = \sqrt{2\pi}$

and so we conclude Stirling’s formula

$\displaystyle n! = (1+o(1)) \sqrt{2\pi n} n^n e^{-n}. \ \ \ \ \ (7)$

Remark 1 The dominated convergence theorem does not immediately give any effective rate on the decay ${o(1)}$ (though such a rate can eventually be extracted by a quantitative version of the above argument). But one can combine (7) with (4) to show that the error rate is of the form ${O(1/n)}$ . By using fancier versions of the trapezoid rule (e.g. Simpson’s rule) one can obtain an asymptotic expansion of the error term in ${1/n}$ , but we will not need such an expansion in this course.
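The ${O(1/n)}$ error rate is easy to observe numerically. Here is a small sketch (the function names are ours) that computes the relative error of the main term in (7) on a logarithmic scale and prints it alongside ${n}$ times the error.

```python
import math

def log_stirling(n: int) -> float:
    """Logarithm of sqrt(2 pi n) n^n e^{-n}, the main term in (7)."""
    return 0.5 * math.log(2.0 * math.pi * n) + n * math.log(n) - n

def relative_error(n: int) -> float:
    """n! / (sqrt(2 pi n) n^n e^{-n}) - 1, computed on a log scale."""
    return math.expm1(math.lgamma(n + 1) - log_stirling(n))

for n in (1, 10, 100, 1000, 10_000):
    err = relative_error(n)
    print(n, err, n * err)
```

The last column should settle near ${1/12}$ , which is the standard leading coefficient in the asymptotic expansion of the error; we quote that value as a known fact rather than something derived in these notes.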

Remark 2 The derivation of (7) demonstrates some general principles concerning the estimation of exponential integrals ${\int e^{\phi(x)}\ dx}$ when ${\phi}$ is large. Firstly, the integral is dominated by the local maxima of ${\phi}$ . Then, near these maxima, ${e^{\phi(x)}}$ usually behaves like a rescaled Gaussian, as can be seen by Taylor expansion (though more complicated behaviour emerges if the second derivative of ${\phi}$ degenerates). So one can often understand the asymptotics of such integrals by a change of variables designed to reveal the Gaussian behaviour. A similar set of principles also holds for oscillatory exponential integrals ${\int e^{i \phi(x)}\ dx}$ ; these principles are collectively referred to as the method of stationary phase.

One can use Stirling’s formula to estimate binomial coefficients. Here is a crude bound:

Exercise 1 (Entropy formula) Let ${n}$ be large, let ${0 < \gamma < 1}$ be fixed, and let ${1 \leq m \leq n}$ be an integer of the form ${m = (\gamma+o(1)) n}$ . Show that ${\binom{n}{m} = \exp( (h(\gamma)+o(1)) n )}$ , where ${h(\gamma)}$ is the entropy function

$\displaystyle h(\gamma) := \gamma \log \frac{1}{\gamma} + (1-\gamma) \log \frac{1}{1-\gamma}.$
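Before attempting the exercise, it may help to see the entropy formula numerically. The sketch below is an illustration only: the helper names and the choice ${\gamma = 0.3}$ are ours. It compares ${\frac{1}{n} \log \binom{n}{m}}$ with ${h(\gamma)}$ for ${m}$ the nearest integer to ${\gamma n}$ .

```python
import math

def entropy(gamma: float) -> float:
    """h(gamma) = gamma log(1/gamma) + (1 - gamma) log(1/(1 - gamma))."""
    return -gamma * math.log(gamma) - (1.0 - gamma) * math.log(1.0 - gamma)

def log_binomial(n: int, m: int) -> float:
    """log of binom(n, m), computed with log-gamma to avoid huge integers."""
    return math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)

gamma = 0.3
for n in (100, 10_000, 1_000_000):
    m = round(gamma * n)
    # first column after n: (1/n) log binom(n, m); last column: h(gamma)
    print(n, log_binomial(n, m) / n, entropy(gamma))
```

The two columns agree only up to an ${o(1)}$ error, which is consistent with the exercise: the polynomial factors coming from Stirling’s formula are invisible at the exponential scale.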

For ${m}$ near ${n/2}$ , one also has the following more precise bound:

Exercise 2 (Refined entropy formula) Let ${n}$ be large, and let ${1 \leq m \leq n}$ be an integer of the form ${m = n/2 + k}$ for some ${k = o(n^{2/3})}$ . Show that

$\displaystyle \binom{n}{m} = (\sqrt{\frac{2}{\pi}} + o(1)) \frac{2^n}{\sqrt{n}} \exp( - 2k^2/n ). \ \ \ \ \ (8)$

Note the gaussian-type behaviour in ${k}$ . This can be viewed as an illustration of the central limit theorem when summing iid Bernoulli variables ${X_1,\ldots,X_n \in \{0,1\}}$ , where each ${X_i}$ has a ${1/2}$ probability of being either ${0}$ or ${1}$ . Indeed, from (8) we see that

$\displaystyle {\bf P}( X_1 +\ldots + X_n = n/2 + k ) = (\sqrt{\frac{2}{\pi}} + o(1)) \frac{1}{\sqrt{n}} \exp( - 2k^2/n )$

when ${k = o(n^{2/3})}$ , which suggests that ${X_1+\ldots+X_n}$ is distributed roughly like the gaussian ${N( n/2, n/4 )}$ with mean ${n/2}$ and variance ${n/4}$ .
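The Gaussian approximation coming from (8) can also be checked directly against exact binomial probabilities. This is a minimal sketch assuming ${n}$ is even; the function names and the sample values of ${n}$ and ${k}$ are our own.

```python
import math

def exact_probability(n: int, k: int) -> float:
    """P(X_1 + ... + X_n = n/2 + k) for fair coin flips; n must be even."""
    return math.comb(n, n // 2 + k) / 2 ** n

def gaussian_prediction(n: int, k: int) -> float:
    """Main term of (8) divided by 2^n."""
    return math.sqrt(2.0 / (math.pi * n)) * math.exp(-2.0 * k * k / n)

n = 10_000
for k in (0, 10, 50, 100, 200):
    exact = exact_probability(n, k)
    predicted = gaussian_prediction(n, k)
    print(k, exact, predicted, exact / predicted)
```

The ratio in the last column should stay close to ${1}$ for small ${k}$ and drift away slowly as ${k}$ grows, in line with the restriction on ${k}$ in Exercise 2.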

Update, Jan 4: Rafe Mazzeo pointed me to this short article of Joe Keller that gives a heuristic derivation of the full asymptotic expansion of Stirling’s formula from a Taylor expansion of the Gamma function.

40 comments

3 January, 2010 at 12:57 am

Li Jing

hi, good lecture notes. There is one typo: “so we heuristically have.. ” the formula on right hand side, you miss one negative sign.

[Corrected, thanks – T.]

3 January, 2010 at 1:24 am

Georges Elencwajg

Dear Terry,
this post of yours makes me wonder why the hundreds of books on calculus I have browsed in my life never even came close to your remark at the beginning “so we know already that n! is within an exponential factor of n^n”.
There is obviously something flawed in the realm of pedagogy if it takes a mathematician of your caliber to make such childish remarks.
I am using the word “childish” deliberately, because of the child in Hans Christian Andersen’s tale who exclaims that the emperor has no clothes.
There is also the well-known Newton quotation
“I was like a boy playing on the sea-shore, and diverting myself now and then finding a smoother pebble or a prettier shell than ordinary, whilst the great ocean of truth lay all undiscovered before me”.
And Grothendieck likes to compare himself to a child in his autobiography “Récoltes et Semailles”.
It might seem strange to compare you to Grothendieck, who in a sense is completely antipodal to you (I’m not sure he knows Stirling’s formula!), but your obvious common love of simplicity and hatred of showing-off might reveal some conceptual similarity, appearances to the contrary notwithstanding.

But enough of this fuzzy philosophizing: it is high time to thank you heartily for this magnificent note: finally Stirling’s formula is not more mysterious, when explained by you, than the hypercohomology of the De Rham complex on a scheme…

7 January, 2010 at 12:26 am

Manjil P. Saikia

Greatness lies in knowing its boundaries and genius is just that boundary, I guess. It takes the genius and greatness of Prof. Tao to write such nice and illuminating posts.

10 January, 2010 at 9:54 am

Nathan Cook

I think that in (3) both of the derived bounds are too small by a factor of e. Certainly the upper bound doesn’t hold, for instance if n=2.

[Corrected, thanks – T.]

12 March, 2010 at 4:46 pm

Anonymous

Dear Prof. Tao,

How do you get (3) from the inequalities just above it? I could not figure it out

thanks

12 March, 2010 at 4:52 pm

Terence Tao

Compute the antiderivative of $\log x$ , use the fundamental theorem of calculus, then exponentiate all sides of the equation preceding (3).

12 March, 2010 at 5:01 pm

Anonymous

I see it now. Thank you. It is really a great post.

14 March, 2010 at 6:06 pm

Weiyu

In the equation before “A classical computation (based for instance on computing”, maybe a “dx” is missing.

[Corrected, thanks. -T]

8 April, 2010 at 2:34 pm

Pablo Lessa

Hi everyone,

I kept wondering about the appearance of entropy in the “entropy formula” (exercise 1), and I found the following probabilistic “almost proof” which I decided to share here (I’m hoping this is considered appropriate, if not I apologize). Does anybody know other interesting explanations as to why entropy occurs in the formula?

Take $X_1,\ldots, X_n,\ldots$ independent random variables with $P(X_n = 1) = \gamma, P(X_n = 0) = 1-\gamma$ . Define $S_n = X_1 + \cdots + X_n$ and let $p(n,k) = P(S_n = k) = \gamma^k(1-\gamma)^{n-k}\left(\begin{array}{c}n\\k\end{array}\right)$ for each $n,k$ .

Since $S_n$ takes only $n$ values its entropy is less than $\log(n)$ . From this one deduces (exchanging limit with expected value) that almost surely $\frac{1}{n}\log\left(\frac{1}{p(n,S_n)}\right)$ goes to zero. Substituting the expression for this and using the law of large numbers one obtains that almost surely:

$\frac{1}{n}\log\left(\begin{array}{c}n\\S_n\end{array}\right) = h(\gamma) + o(1)$

Which is to say that there are a bunch of sequences $S_n = \gamma n + o(1)$ that satisfy the entropy formula (but sadly, it doesn’t show that they all do).

25 April, 2010 at 6:54 pm

t8m8r

In the second paragraph, a formula appears to be missing after “we obtain a crude lower bound”.

[Corrected, thanks – T.]

29 June, 2010 at 6:26 am

Anonymous

Dear Prof.Tao
I am puzzled by Exercise 2. Can you give me some hints? Why $k=o(n^{2/3})$ ?
Thanks.

25 August, 2010 at 3:01 am

Anonymous

Dear Prof. Tao
I have a question about Entropy Formula in Exercise 1.
If $N$ is an integer which satisfies
$N\leq\mathrm{C}_n^0+\mathrm{C}_n^1+\cdots+\mathrm{C}_n^{\lfloor\rho n\rfloor}$
where $0<\rho<1$ , can we arrive at the conclusion
$\ln N\leq nh(\rho)(1+o(1))$
by the Entropy Formula? If it is right, can you tell me how to get it? Some hints will be OK.

25 August, 2010 at 8:15 am

Terence Tao

(Assuming $C^k_n$ means $\binom{n}{k}$ ) This formula is only valid for $\rho \leq 1/2$ ; for larger values of $\rho$ , the partial sum is of size $n (\log 2 + O(1))$ .

To estimate the order of magnitude of a sum of positive terms, one can often just bound the largest term in the sum, and then multiply by the number of terms, for a crude upper bound. In this case, the largest term will be $\binom{n}{\lfloor \rho n \rfloor}$ if $\rho \leq 1/2$ .

26 August, 2010 at 6:40 am

Anonymous

Dear Prof.Tao
Thanks for the quick reply. I see your point. As you said, since
$\binom{n}{\lfloor\rho n\rfloor}=\exp(n(h(\rho)+o(1)))$ ,
we can get the crude estimate
$N\leq\binom{n}{0}+\binom{n}{1}+\cdots+\binom{n}{\lfloor\rho n\rfloor}\leq \lfloor\rho n\rfloor\exp(n(h(\rho)+o(1)))$ ,
when $0<\rho\leq1/2$ ,that is,
$\ln N\leq n(h(\rho)+\frac{\ln \lfloor\rho n\rfloor}{n}+o(1))$ .
Since $\frac{\ln \lfloor\rho n\rfloor}{n}+o(1)=o(1)$ , then
$\ln N\leq n(h(\rho)+o(1))$ .
But when $1/2<\rho<1$ , we have
$\ln N\leq n(\log2+\frac{\ln \lfloor\rho n\rfloor}{n}+o(1))$ ,
that is, $\ln N\leq n(\log2+o(1))$ . Am I right? If it is so, there is a typo in the reply.

2 October, 2010 at 10:14 pm

Alon

Dan Romik published a short note with a slick, though perhaps less well-motivated, proof of Stirling’s formula: http://www.stat.berkeley.edu/~romik/paperfiles/stirling.pdf

8 October, 2013 at 5:40 pm

KCd

This proof using the dominated convergence theorem can be found in a 1989 Monthly article: J. M. Patin, A Very Short Proof of Stirling’s Formula, Amer. Math. Monthly 96 (1989), 41–42.

15 April, 2016 at 7:52 pm

Herman Jaramillo

You expanded the argument of the exponential function into a Taylor series. After the term of order $-s^2/2n$ there comes the term $s^3/3n^2$ . This term worries me. Taylor series are local; that is, the expansion should be good for $|s| \ll 1$ . Still, the integral runs from $-\infty$ to $\infty$ . Why does the error introduced by this next factor $\exp(s^3/3n^2)$ not count? The integral of that factor alone diverges.

16 April, 2016 at 5:52 am

Herman Jaramillo

I believe I have an answer to my own question above. As $n$ grows, the “Gaussian” shape of the integrand $t^n \exp(-t)$ narrows around its “center” or peak, which is at $t=n$ . Since we shift the function to have the origin at $t=n$ , the Taylor approximation makes sense around that location. Yes, the function extends between $-\infty$ and $\infty$ , but the bulk of the integral comes from the points near the maximum, since in the limit this “Gaussian” shape becomes a spike, or some kind of Dirac delta.

16 April, 2016 at 6:23 am

Anonymous

In the fifth line below (6), it should be “uniform upper bound” (instead of “uniform lower bound”). Also, in the RHS of the second estimate for this bound, “ $x$ ” should be replaced by “ $|x|$ “.

[Corrected, thanks – T.]

17 April, 2016 at 5:03 am

Anonymous

It is not difficult to verify (via maximization over $n$ ) that the integrand is dominated by $(1+|x|) e^{-|x|}$ , which is best possible for $x \geq 0$ (as it is attained by the integrand for $n=1$ ).

13 July, 2017 at 2:50 am

Bruno Bauwens

To go from pointwise convergence to convergence of the integrals below (6), one can use Lebesgue’s monotone convergence theorem: the values $n \log(1+x/\sqrt{n}) - \sqrt{n} x$ are increasing in $n$ if $x$ is negative, and are decreasing in $n$ if $x$ is positive. This is because for $x=0$ all values are $0$ , and the derivative in $x$ evaluates to

$\displaystyle \frac{-x}{1+x/\sqrt{n}},$

which is decreasing in $n$ both for negative and positive $x$ .

29 December, 2022 at 9:49 am

Stirling’s formula – On Math

[…] [ 2 ] Terence Tao’s blog article “254A, Notes 0a: Stirling’s formula" : https://terrytao.wordpress.com/2010/01/02/254a-notes-0a-stirlings-formula/ […]
