In the previous set of notes we established the central limit theorem, which we formulate here as follows:

Theorem 1 (Central limit theorem)Let be iid copies of a real random variable of mean and variance , and write . Then, for any fixed , we have

This is however not the end of the matter; there are many variants, refinements, and generalisations of the central limit theorem, and the purpose of this set of notes is to present a small sample of these variants.

First of all, the above theorem does not quantify the *rate* of convergence in (1). We have already addressed this issue to some extent with the Berry-Esséen theorem, which roughly speaking gives a convergence rate of uniformly in if we assume that has finite third moment. However there are still some quantitative versions of (1) which are not addressed by the Berry-Esséen theorem. For instance one may be interested in bounding the *large deviation probabilities*

in the setting where grows with . Chebyshev’s inequality gives an upper bound of for this quantity, but one can often do much better than this in practice. For instance, the central limit theorem (1) suggests that this probability should be bounded by something like ; however, this theorem only kicks in when is very large compared with . For instance, if one uses the Berry-Esséen theorem, one would need as large as or so to reach the desired bound of , even under the assumption of finite third moment. Basically, the issue is that convergence-in-distribution results, such as the central limit theorem, only really control the *typical* behaviour of statistics in ; they are much less effective at controlling the very rare *outlier* events in which the statistic strays far from its typical behaviour. Fortunately, there are large deviation inequalities (or *concentration of measure inequalities*) that do provide exponential type bounds for quantities such as (2), which are valid for both small and large values of . A basic example of this is the Chernoff bound that made an appearance in Exercise 47 of Notes 4; here we give some further basic inequalities of this type, including versions of the Bennett and Hoeffding inequalities.

In the other direction, we can also look at the fine scale behaviour of the sums by trying to control probabilities such as

where is now bounded (but can grow with ). The central limit theorem predicts that this quantity should be roughly , but even if one is able to invoke the Berry-Esséen theorem, one cannot quite see this main term because it is dominated by the error term in Berry-Esséen. There is good reason for this: if for instance takes integer values, then also takes integer values, and can vanish when is less than and is slightly larger than an integer. However, this turns out to essentially be the only obstruction; if does not lie in a lattice such as , then we can establish a *local limit theorem* controlling (3), and when does take values in a lattice like , there is a discrete local limit theorem that controls probabilities such as . Both of these limit theorems will be proven by the Fourier-analytic method used in the previous set of notes.

We also discuss other limit theorems in which the limiting distribution is something other than the normal distribution. Perhaps the most common example of these theorems is the Poisson limit theorems, in which one sums a large number of indicator variables (or approximate indicator variables), each of which is rarely non-zero, but which collectively add up to a random variable of medium-sized mean. In this case, it turns out that the limiting distribution should be a Poisson random variable; this again is an easy application of the Fourier method. Finally, we briefly discuss limit theorems for other stable laws than the normal distribution, which are suitable for summing random variables of infinite variance, such as the Cauchy distribution.

Finally, we mention a very important class of generalisations to the CLT (and to the variants of the CLT discussed in this post), in which the hypothesis of joint independence between the variables is relaxed, for instance one could assume only that the form a martingale. Many (though not all) of the proofs of the CLT extend to these more general settings, and this turns out to be important for many applications in which one does not expect joint independence. However, we will not discuss these generalisations in this course, as they are better suited for subsequent courses in this series when the theory of martingales, conditional expectation, and related tools are developed.

** — 1. Large deviation inequalities — **

We now look at some upper bounds for the large deviation probability (2). To get some intuition as to what kinds of bounds one can expect, we first consider some examples. First suppose that has the standard normal distribution, then , , and has the distribution of , so that has the distribution of . We thus have

which on using the inequality leads to the bound

Next, we consider the example when is a Bernoulli random variable drawn uniformly at random from . Then , , and has the standard binomial distribution on , thus . By symmetry, we then have

We recall Stirling’s formula, which we write crudely as

as , where denotes a quantity with as (and similarly for other uses of the notation in the sequel). If and is bounded away from zero and one, we then have the asymptotic

where is the entropy function

(compare with Exercise 17 of Notes 3). One can check that is decreasing for , and so one can compute that

as for any fixed . To compare this with (4), observe from Taylor expansion that

as .

Finally, consider the example where takes values in with and for some small , thus and . We have , and hence

with

Here, we see that the large deviation probability is somewhat larger than the gaussian prediction of . Instead, the exponent is approximately related to and by the formula

We now give a general large deviations inequality that is consistent with the above examples.

Proposition 2 (Cheap Bennett inequality)Let , and let be independent random variables, each of which takes values in an interval of length at most . Write , and write for the mean of . Let be such that has variance at most . Then for any , we have

There is more precise form of this inequality known as Bennett’s inequality, but we will not prove it here.

The first term in the minimum dominates when , and the second term dominates when . Sometimes it is convenient to weaken the estimate by discarding the logarithmic factor, leading to

(possibly with a slightly different choice of ); thus we have Gaussian type large deviation estimates for as large as , and (slightly better than) exponential decay after that.

In the case when are iid copies of a random variable of mean and variance taking values in an interval of length , we have and , and the above inequality simplifies slightly to

*Proof:* We first begin with some quick reductions. Firstly, by dividing the (and , , and ) by , we may normalise ; by subtracting the mean from each of the , we may assume that the have mean zero, so that as well. We also write for the variance of each , so that . Our task is to show that

for all . We will just prove the upper tail bound

the lower tail bound then follows by replacing all with their negations , and the claim then follows by summing the two estimates.

We use the “exponential moment method”, previously seen in proving the Chernoff bound (Exercise 47 of Notes 4), in which one uses the exponential moment generating function of . On the one hand, from Markov’s inequality one has

for any real parameter . On the other hand, from the joint independence of the one has

Since the take values in an interval of length at most and have mean zero, we have and so . This leads to the Taylor expansion

so on taking expectations we have

Putting all this together, we conclude that

If (say), one can then set to be a small multiple of to obtain a bound of the form

If instead , one can set to be a small multiple of to obtain a bound of the form

In either case, the claim follows.

The following variant of the above proposition is also useful, in which we get a simpler bound at the expense of worsening the quantity slightly:

Proposition 3 (Cheap Hoeffding inequality)Let be independent random variables, with each taking values in an interval with . Write , and write for the mean of , and write

In fact one can take , a fact known as Hoeffding’s inequality; see Exercise 6 below for a special case of this.

*Proof:* We again normalise the to have mean zero, so that . We then have for each , so by Taylor expansion we have for any real that

and thus

Multiplying in , we then have

and one can now repeat the previous arguments (but without the factor to deal with).

Remark 4In the above examples, the underlying random variable was assumed to either be restricted to an interval, or to be subgaussian. This type of hypothesis is necessary if one wishes to have estimates on (2) that are similarly subgaussian. For instance, suppose has a zeta distributionfor some and all natural numbers , where . One can check that this distribution has finite mean and variance for . On the other hand, since we trivially have , we have the crude lower bound

which shows that in this case the expression (2) only decays at a polynomial rate in rather than an exponential or subgaussian rate.

Exercise 5 (Khintchine inequality)Let be iid copies of a Bernoulli random variable drawn uniformly from .

- (i) For any non-negative reals and any , show that
for some constant depending only on . When , show that one can take and equality holds.

- (ii) With the hypotheses in (i), obtain the matching lower bound
for some depending only on . (

Hint:use (i) and Hölder’s inequality.)- (iii) For any and any functions on a measure space , show that
and

with the same constants as in (i), (ii). When , show that one can take and equality holds.

- (iv) (Marcinkiewicz-Zygmund theorem) The Khintchine inequality is very useful in real analysis; we give one example here. Let be measure spaces, let , and suppose is a linear operator obeying the bound
for all and some finite . Show that for any finite sequence , one has the bound

for some constant depending only on . (

Hint:test against a random sum .)- (v) By using gaussian sums in place of random signs, show that one can take the constant in (iv) to be one. (For simplicity, let us take the functions in to be real valued.)

In this set of notes we have not focused on getting explicit constants in the large deviation inequalities, but it is not too difficult to do so with a little extra work. We give just one example here:

Exercise 6Let be iid copies of a Bernoulli random variable drawn uniformly from .

- (i) Show that for any real . (
Hint:expand both sides as an infinite Taylor series in .)- (ii) Show that for any real numbers and any , we have
(Note that this is consistent with (6) with .)

There are many further large deviation inequalities than the ones presented here. For instance, the Azuma-Hoeffding inequality gives Hoeffding-type bounds when the random variables are not assumed to be jointly independent, but are instead required to form a martingale. Concentration of measure inequalities such as McDiarmid’s inequality handle the situation in which the sum is replaced by a more nonlinear function of the input variables . There are also a number of refinements of the Chernoff estimate from the previous notes, that are collectively referred to as “Chernoff bounds“. The Bernstein inequalities handle situations in which the underlying random variable is not bounded, but enjoys good moment bounds. See this previous blog post for these inequalities and some further discussion. Last, but certainly not least, there is an extensively developed theory of large deviations which is focused on the precise exponent in the exponential decay rate for tail probabilities such as (2) when is very large (of the order of ); there is also a complementary theory of *moderate deviations* that gives precise estimates in the regime where is much larger than one, but much less than , for which we generally expect gaussian behaviour rather than exponentially decaying bounds. These topics are beyond the scope of this course.

** — 2. Local limit theorems — **

Let be iid copies of a random variable of mean and variance , and write . On the one hand, the central limit theorem tells us that should behave like the normal distribution , which has probability density function . On the other hand, if is discrete, then must also be discrete. For instance, if takes values in the integers , then takes values in the integers as well. In this case, we would expect (much as we expect a Riemann sum to approximate an integral) the probability distribution of to behave like the probability density function predicted by the central limit theorem, thus we expect

for integer . This is not a direct consequence of the central limit theorem (which does not distinguish between continuous or discrete random variables ), and in any case is not true in some cases: if is restricted to an infinite subprogression of for some and integer , then is similarly restricted to the infinite subprogression , so that (7) totally fails when is outside of (and when does lie in , one would now expect the left-hand side of (7) to be about times larger than the right-hand side, to keep the total probability close to ). However, this turns out to be the only obstruction:

Theorem 7 (Discrete local limit theorem)Let be iid copies of an integer-valued random variable of mean and variance . Suppose furthermore that there is no infinite subprogression of with for which takes values almost surely in . Then one hasfor all and all integers , where the error term is uniform in . In other words, we have

as .

Note for comparison that the Berry-Esséen theorem (writing as, say, ) would give (assuming finite third moment) an error term of instead of , which would overwhelm the main term which is also of size .

*Proof:* Unlike previous arguments, we do not have the luxury here use an affine change of variables to normalise to mean zero and variance one, as this would disrupt the hypothesis that takes values in .

Fix and . Since and are integers, we have the Fourier identity

which upon taking expectations and using Fubini’s theorem gives

where is the characteristic function of . Expanding and noting that are iid copies of , we have

It will be convenient to make the change of variables , to obtain

As in the Fourier-analytic proof of the central limit theorem, we have

as , so by Taylor expansion we have

as for any fixed . This suggests (but does not yet prove) that

A standard Fourier-analytic calculation gives

so it will now suffice to establish that

uniformly in . From dominated convergence we have

so by the triangle inequality, it suffices to show that

This will follow from (9) and the dominated convergence theorem, as soon as we can dominate the integrands by an absolutely integrable function.

From (8), there is an such that

for all , and hence

for . This gives the required domination in the region , so it remains to handle the region .

From the triangle inequality we have for all . Actually we have the stronger bound for . Indeed, if for some such , this would imply that is a deterministic constant, which means that takes values in for some real , which implies that takes values in ; since also takes values in , this would place either in a singleton set or in an infinite subprogression of (depending on whether and are rational), a contradiction. As is continuous and the region is compact, there exists such that for all . This allows us to dominate by for , which is in turn bounded by for some independent of , giving the required domination.

Of course, if the random variable in Theorem 7 did take values almost surely in some subprogression , then either is almost surely constant, or there is a minimal progression with this property (since must divide the difference of any two integers that attains with positive probability). One can then make the affine change of variables (modifying and appropriately) and apply the above theorem to obtain a similar local limit theorem, which we will not write here. For instance, if is the uniform distribution on , then this argument gives

when is an integer of the same parity as (of course, will vanish otherwise). A further affine change of variables handles the case when is not integer valued, but takes values in some other lattice , where and are now real-valued.

We can complement these local limit theorems with the following result that handles the non-lattice case:

Theorem 8 (Continuous local limit theorem)Let be iid copies of an real-valued random variable of mean and variance . Suppose furthermore that there is no infinite progression with real for which takes values almost surely in . Then for any , one hasfor all and all , where the error term is uniform in (but may depend on ).

Equivalently, if we let be a normal random variable with mean and variance , then

uniformly in . Again, this can be compared with the Berry-Esséen theorem, which (assuming finite third moment) has an error term of which is uniform in both and .

*Proof:* Unlike the discrete case, we have the luxury here of normalising and , and we shall now do so.

We first observe that it will suffice to show that

whenever is a Schwartz function whose Fourier transform

is compactly supported, and where the error can depend on but is uniform in . Indeed, if this bound (10) holds, then (after replacing by to make it positive on some interval, and then rescaling) we obtain a bound of the form

for any and , where the term can depend on but not on . Then, by convolving by an approximation to the identity of some width much smaller than with compactly supported Fourier transform, applying (10) to the resulting function, and using (11) to control the error between that function and , we see that

uniformly in , where the term can depend on and but is uniform in . Letting tend slowly to zero, we obtain the claim.

It remains to establish (10). We adapt the argument from the discrete case. By the Fourier inversion formula we may write

By Fubini’s theorem as before, we thus have

and similarly

so it suffices by the triangle inequality and the boundedness and compact support of to show that

for any fixed (where the term can depend on ). We have and , so by making the change of variables , we now need to show that

as . But this follows from the argument used to handle the discrete case.

** — 3. The Poisson central limit theorem — **

The central limit theorem (after normalising the random variables to have mean zero) studies the fluctuations of sums where each individual term is quite small (typically of size ). Now we consider a variant situation, in which one considers a sum of random variables which are *usually* zero, but occasionally equal to a larger value such as . (This situation arises in many real-life situations when compiling aggregate statistics on rare events, e.g. the number of car crashes in a short period of time.) In these cases, one can get a different distribution than the gaussian distribution, namely a Poisson distribution with some intensity – that is to say, a random variable taking values in the non-negative integers with probability distribution

One can check that this distribution has mean and variance .

Theorem 9 (Poisson central limit theorem)Let be a triangular array of real random variables, where for each , the variables are jointly independent. Assume furthermore that

- (i) ( mostly ) One has as .
- (ii) ( rarely ) One has as .
- (iii) (Convergent expectation) One has as for some .
Then the random variables converge in distribution to a Poisson random variable of intensity .

*Proof:* From hypothesis (i) and the union bound we see that for each , we have that all of the lie in with probability as . Thus, if we replace each by the restriction , the random variable is only modified on an event of probability , which does not affect distributional limits (Slutsky’s theorem). Thus, we may assume without loss of generality that the take values in .

By Exercise 20 of Notes 4, a Poisson random variable of intensity has characteristic function . Applying the Lévy convergence theorem (Theorem 27 of Notes 4), we conclude that it suffices to show that

as for any fixed .

Fix . By the independence of the , we may write

Since only takes on the values and , we can write

where . By hypothesis (ii), we have , so by using a branch of the complex logarithm that is analytic near , we can write

By Taylor expansion we have

and hence by (ii), (iii)

as , and the claim follows.

Exercise 10Establish the conclusion of Theorem 9 directly from explicit computation of the probabilities in the case when each takes values in with for some fixed .

The Poisson central limit theorem can be viewed as a degenerate limit of the central limit theorem, as seen by the next two exercises.

Exercise 11Suppose we replace the hypothesis (iii) in Theorem 9 with the alternative hypothesis that the quantities go to infinity as , while leaving hypotheses (i) and (ii) unchanged. Show that converges in distribution to the normal distribution .

Exercise 12For each , let be a Poisson random variable with intensity . Show that as , the random variables converge in distribution to the normal distribution . Discuss how this is consistent with Theorem 9 and the previous exercise.

** — 4. Stable laws — **

Let be a real random variable. We say that has a *stable law* or a *stable distribution* if for any positive reals , there exists a positive real and a real such that whenever are iid copies of . In terms of the characteristic function of , we see that has a stable law if for any positive reals , there exist a positive real and a real for which we have the functional equation

for all real .

For instance, a normally distributed variable is stable thanks to Lemma 12 of Notes 4; one can also see this from the characteristic function . A Cauchy distribution , with probability density can also be seen to be stable, as is most easily seen from the characteristic function . As a more degenerate example, any deterministic random variable is stable. It is possible (though somewhat tedious) to completely classify all the stable distributions, see for instance the Wikipedia entry on these laws for the full classification.

If is stable, and are iid copies of , then by iterating the stable law hypothesis we see that the sums are all equal in distribution to some affine rescaling of . For instance, we have for some , and a routine induction then shows that

for all natural numbers (with the understanding that when ). In particular, the random variables all have the same distribution as .

More generally, given two real random variables and , we say that is in the *basin of attraction* for if, whenever are iid copies of and , there exist constants and such that converges in distribution to . Thus, any stable law is in its own basin of attraction, while the central limit theorem asserts that any random variable of finite variance is in the basin of attraction of a normal distribution. One can check that every random variable lies in the basin of attraction of a deterministic random variable such as , simply by letting go to infinity rapidly enough. To avoid this degenerate case, we now restrict to laws that are *non-degenerate*, in the sense that they are not almost surely constant. Then we have the following useful technical lemma:

Proposition 13 (Convergence of types)Let be a sequence of real random variables converging in distribution to a non-degenerate limit . Let and be real numbers such that converges in distribution to a non-degenerate limit . Then and converge to some finite limits respectively, and .

*Proof:* Suppose first that goes to zero. The sequence converges in distribution, hence is tight, hence converges in probability to zero. In particular, if is an independent copy of , then converges in probability to zero; but also converges in distribution to where is an independent copy of , and is not almost surely zero since is non-degenerate. This is a contradiction. Similarly if has any subsequence that goes to zero. We conclude that is bounded away from zero. Rewriting as and reversing the roles of and , we conclude also that is bounded away from zero, thus is bounded.

Since is tight and is bounded, is tight; since is also tight, this implies that is tight, that is to say is bounded.

Let be a limit point of the . By Slutsky’s theorem, a subsequence of the then converges in distribution to , thus . If the limit point is unique then we are done, so suppose there are two limit points , . Thus , which on rearranging gives for some and real with .

If then on iteration we have for any natural number , which clearly leads to a contradiction as since . If then iteration gives for any natural number , which on passing to the limit in distribution as gives , again a contradiction. If then we rewrite as and again obtain a contradiction, and the claim follows.

One can use this proposition to verify that basins of attraction of genuinely distinct laws are disjoint:

Exercise 14Let and be non-degenerate real random variables. Suppose that a random variable lies in the basin of attraction of both and . Then there exist and real such that .

If lies in the basin of attraction for a non-degenerate law , then converges in distribution to ; since is equal in distribution to the sum of two iid copies of , we see that converges in distribution to the sum of two iid copies of . On the other hand, converges in distribution to . Using Proposition 13 we conclude that for some and real . One can go further and conclude that in fact has a stable law; see the following exercise. Thus stable laws are the only laws that have a non-empty basin of attraction.

Exercise 15Let lie in the basin of attraction for a non-degenerate law .

- (i) Show that for any iid copies of , there exists a unique and such that . Also show that for all natural numbers .
- (ii) Show that the are strictly increasing, with for all natural numbers . (
Hint:study the absolute value of the characteristic function, using the non-degeneracy of to ensure that this absolute value is usually strictly less than one.) Also show that for all natural numbers .- (iii) Show that there exists such that for all . (
Hint:first show that is a Cauchy sequence in .)- (iv) If , and are iid copies of , show that for all natural numbers and some bounded real . Then show that has a stable law in this case.
- (v) If , show that for some real and all . Then show that has a stable law in this case.

Exercise 16 (Classification of stable laws)Let be a non-degenerate stable law, then lies in its own basin of attraction and one can then define as in the preceding exercise.

- (i) If , and is as in part (v) of the preceding exercise, show that for all and . Then show that
for some real . (One can use the identity to restrict attention to the case of positive .)

- (ii) Now suppose . Show that for all (where the implied constant in the notation is allowed to depend on ). Conclude that for all .
- (iii) We continue to assume . Show that for some real number . (
Hint:first show this when is a power of a fixed natural number (with possibly depending on ). Then use the estimates from part (ii) to show that does not actually depend on . (One may need to invoke the Dirichlet approximation theorem to show that for any given , one can find a power of that is somewhat close to a power of .)- (iv) We continue to assume . Show that for all and . Then show that
for all and some real .

It is also possible to determine which choices of parameters are actually achievable by some random variable , but we will not do so here.

It is possible to associate a central limit theorem to each stable law, which precisely determines their basin of attraction. We will not do this in full generality, but just illustrate the situation for the Cauchy distribution.

Exercise 17Let be a real random variable which is symmetric (that is, has the same distribution as ) and obeys the distribution identityfor all , where is a function which is

slowly varyingin the sense that as for all .

- (i) Show that
as , where denotes a quantity such that as . (You may need to establish the identity , which can be done by contour integration.)

- (ii) Let be iid copies of . Show that converges in distribution to a copy of the standard Cauchy distribution (i.e., to a random variable with probability density function ).

## 30 comments

Comments feed for this article

19 November, 2015 at 3:15 pm

Anonymousoh no i’m seeing ‘latex path not specified’ errors

[This is a very strange error. Currently I can get all the LaTeX rendered when viewing the blog (or other wordpress blogs with recent latex) on one machine, but when viewed on a different machine there are plenty of “latex path not specified” errors, regardless of what browser I use. I do not know what could be causing this. -T.]19 November, 2015 at 5:22 pm

AnonymousIn the second line of the proof of proposition 2, it seems that the normalization of should be (instead of ).

[Corrected, thanks – T.]20 November, 2015 at 2:57 pm

arch1I get this problem w/ this your latest blog entry, but not w/ your previous ones. I guess that a LaTeX version change occured between 10/16 and 10/19 which broke compatibility with my OS (Windows 7 Enterprise Service Pack 1), one of yours, and Anonymous #1’s.

But I’m sure someone here can do better than guess (come on, folks..:-)

20 November, 2015 at 3:21 pm

pauljungHi Terry, at the beginning of Section 4, I guess the pi in the Cauchy density should be in the denominator.

[Corrected, thanks – T.]22 November, 2015 at 9:22 pm

BruysGee, I wish WordPress would fix this “latex path not specified.” problem. It’s not exclusive to this blog. I have viewed this blog on different machines and different versions of Windows, and different browsers.

Thanks for your blogs, Terry. When I can understand them, they’re wonderfully lucid. It would be no fun if I could understand them all ;)

23 November, 2015 at 9:33 am

AnonymousThe explanation between proposition 3 and exercise 4 is not sufficiently clear (perhaps some lines are missing) – see e.g.

1. Large part of the three lines below proposition 3 appear in blue.

2. The eleventh line below proposition 3 is ended with “\end”,

3. The 7 lines above exercise 4 are seemingly related to some (unclear – perhaps due to missing lines) example with polynomial decay rate in .

[Corrected, thanks – T.]23 November, 2015 at 1:53 pm

AnonymousIn Hoeffding’s inequality, is optimal ?

23 November, 2015 at 2:07 pm

Terence TaoYes, this can be seen from the Bernoulli example discussed at the start of the section (together with the Taylor expansion of the entropy function).

23 November, 2015 at 3:36 pm

AnonymousThanks!

Possible typo: It seems that in the third line below the definition of the entropy function , the threshold (used to define the probability in the RHS) should be .

23 November, 2015 at 5:30 pm

Terence TaoActually, I believe the factor should be there (this is measuring a large deviation event, much larger than the standard deviation).

23 November, 2015 at 6:10 pm

AnonymousI see. Thanks! (I missed the factor n in the exponent in the RHS.)

23 November, 2015 at 3:11 pm

Lior SilbermanExercise 5: shouldn’t there be th roots on the LHS? It seems to me that otherwise the inequalities won’t scale properly with the .

[Corrected, thanks – T.]24 November, 2015 at 4:42 am

AnonymousThe integral in exercise 17(i) may also be evaluated by its interpretation as

– using Plancherel theorem and its (much simpler!) Fourier transform.

24 November, 2015 at 9:42 am

AnonymousIt seems quite surprising that the RHS of (6) has a Gaussian type decay (wrt ), while the RHS of (5) has a much slower (slightly faster than exponential) decay – in spite of using more(!) information (variance information) than (6). Is there any explanation for the slower decay (i.e. looser bound) in (5) ?

24 November, 2015 at 11:24 am

Terence TaoThe left-hand side of (5) is larger than that of (6) (if is held fixed), because is smaller, and so it is harder to bound and one expects to have weaker decay in on the right-hand side. The difference between (5) and (6) is particularly highlighted in the case analysed just before Proposition 2, in which the underlying random variable takes values in but is mostly 0, so that its variance is much smaller than the square of the length of the interval containing , and so the in (5) is much smaller than that in (6). In this case, the large deviations are predominantly caused by the event when most of the are equal to 1, which is only an exponentially rare event (e.g. the probability that they are all 1 is ) rather than a quadratically-exponential event.

24 November, 2015 at 1:28 pm

AnonymousI asked about the slower decay of the RHS of (5) wrt varying threshold with all the other parameters (including ) held fixed,

To me it seems natural to expect that the RHS of (5) (having the additional variance information) should decay no slower with (for sufficiently large ).

But now I think that perhaps, the explanation is that the decay of the RHS of (5) is starting to be slower than that of (6) only(!) when , but for such large values, the LHS of both (5) and (6) should be zero(!) (since – due to the uniform boundedness of ).

30 November, 2015 at 4:39 am

AnonymousIn the introduction: {\frac{S_n-n mu}{\sqrt{n} \sigma}}, mu is missing a \ (search for ‘mu’)

[Corrected, thanks -T.]30 November, 2015 at 5:13 am

AnonymousIn Theorem 7, the sup term, has a ‘=’ instead of a ‘-‘.

[Corrected, thanks -T.]30 November, 2015 at 5:18 am

Anonymous‘attains with positive probabilit’ should be ‘attains with positive probability’ (missing y at the end). (hope this typo checking helps)

[Corrected, thanks. I certainly appreciate the corrections, though if you could combine multiple corrections into a single comment that would be helpful. -T.]30 November, 2015 at 5:30 am

Anonymous‘Levy convergence theorem’ should be ‘Lévy convergence theorem’ (being annoying, I know)

[Corrected, thanks -T.]13 December, 2015 at 7:44 pm

275A, Notes 5: Variants of the central limit theorem | jianchengblog[…] 来源： 275A, Notes 5: Variants of the central limit theorem […]

27 March, 2016 at 10:25 pm

AnonymousShould replace the Uncategorized tag with math.CA

[Corrected, thanks – T.]18 November, 2016 at 5:42 pm

AnonymousIf is stable, and are iid copies of , then by iterating the stable law hypothesis we see that the sums are all equal in distribution to some affine rescaling of . For instance, we have for some , and a routine induction then shows that

for all natural numbers (with the understanding that when .

Shouldn’t depend on and in the definition of “stable”? How do you get in the induction?

[Yes, will depend on . If and , then a short computation will show that . -T.]18 November, 2016 at 5:52 pm

AnonymousDoes the concept “basin of attraction” here have anything to do with that in dynamical systems (say an ODE)?

26 January, 2017 at 4:55 am

AnonymousHello, I would appreciate it if anyone could shed some lights on how the error term in Theorem 8 depends on “h”. In particular, how quickly can “h” grow without changing the o(1/sqrt(n)) error, and how could one read this off from the proofs in this article? Thank you for your time.

26 January, 2017 at 5:05 am

AnonymousBased on the form of the leading term in Theorem 8, it should be that we need h = o(sqrt(n)) but I’d appreciate any further comments.

26 January, 2017 at 7:09 am

AnonymousInteresting. Suppose we look at the most basic case in which is normally distributed with mean and variance . Then direct computation via calculus seems to suggest that the error term in Theorem 8 behaves like

as . It this were correct, then it would contradict the fact that the error term is uniform in .

If I made a mistake, please let me know.

[You are missing a term of the form – T.]28 January, 2017 at 10:13 am

Terence TaoThe proof here is not optimised to give good dependence on . The precise convergence rate of local limit theorems has been extensively studied in the literature, but it is hard to give a succinct summary of the results, because the nature of the convergence often depends on the hypotheses one places on the . Unfortunately I don’t know of a good survey of the latest bounds in this area (but these sorts of issues were studied extensively by the Russian school, so texts by authors from that school may be helpful).

30 January, 2017 at 4:38 pm

AnonymousDear Prof. Tao, thank you for your reply! I have just looked up some classical papers by Petrov, etc.

7 April, 2021 at 9:44 am

Jonathan RothSay $X_i$ is continuous, $Y_i$ is discrete, and $X_i$ and $Y_i$ are uncorrelated. Let $S_{X,n}, S_{Y,n}$ denote the respective partial sums. Is there an extension to the discrete local limit theorem that you provided that says that $P( S_{X,n} \leq a, S_{Y,n} = b)$ can be approximated by the product of a normal CDF at $a$ and a normal PDF at $b$?