In the previous set of notes we established the central limit theorem, which we formulate here as follows:
Theorem 1 (Central limit theorem) Let
be iid copies of a real random variable
of mean
and variance
, and write
. Then, for any fixed
, we have
This is however not the end of the matter; there are many variants, refinements, and generalisations of the central limit theorem, and the purpose of this set of notes is to present a small sample of these variants.
First of all, the above theorem does not quantify the rate of convergence in (1). We have already addressed this issue to some extent with the Berry-Esséen theorem, which roughly speaking gives a convergence rate of uniformly in
if we assume that
has finite third moment. However there are still some quantitative versions of (1) which are not addressed by the Berry-Esséen theorem. For instance one may be interested in bounding the large deviation probabilities
in the setting where grows with
. Chebyshev’s inequality gives an upper bound of
for this quantity, but one can often do much better than this in practice. For instance, the central limit theorem (1) suggests that this probability should be bounded by something like
; however, this theorem only kicks in when
is very large compared with
. For instance, if one uses the Berry-Esséen theorem, one would need
as large as
or so to reach the desired bound of
, even under the assumption of finite third moment. Basically, the issue is that convergence-in-distribution results, such as the central limit theorem, only really control the typical behaviour of statistics in
; they are much less effective at controlling the very rare outlier events in which the statistic strays far from its typical behaviour. Fortunately, there are large deviation inequalities (or concentration of measure inequalities) that do provide exponential type bounds for quantities such as (2), which are valid for both small and large values of
. A basic example of this is the Chernoff bound that made an appearance in Exercise 47 of Notes 4; here we give some further basic inequalities of this type, including versions of the Bennett and Hoeffding inequalities.
In the other direction, we can also look at the fine scale behaviour of the sums by trying to control probabilities such as
where is now bounded (but
can grow with
). The central limit theorem predicts that this quantity should be roughly
, but even if one is able to invoke the Berry-Esséen theorem, one cannot quite see this main term because it is dominated by the error term
in Berry-Esséen. There is good reason for this: if for instance
takes integer values, then
also takes integer values, and
can vanish when
is less than
and
is slightly larger than an integer. However, this turns out to essentially be the only obstruction; if
does not lie in a lattice such as
, then we can establish a local limit theorem controlling (3), and when
does take values in a lattice like
, there is a discrete local limit theorem that controls probabilities such as
. Both of these limit theorems will be proven by the Fourier-analytic method used in the previous set of notes.
We also discuss other limit theorems in which the limiting distribution is something other than the normal distribution. Perhaps the most common example of these theorems is the Poisson limit theorems, in which one sums a large number of indicator variables (or approximate indicator variables), each of which is rarely non-zero, but which collectively add up to a random variable of medium-sized mean. In this case, it turns out that the limiting distribution should be a Poisson random variable; this again is an easy application of the Fourier method. Finally, we briefly discuss limit theorems for other stable laws than the normal distribution, which are suitable for summing random variables of infinite variance, such as the Cauchy distribution.
Finally, we mention a very important class of generalisations to the CLT (and to the variants of the CLT discussed in this post), in which the hypothesis of joint independence between the variables is relaxed, for instance one could assume only that the
form a martingale. Many (though not all) of the proofs of the CLT extend to these more general settings, and this turns out to be important for many applications in which one does not expect joint independence. However, we will not discuss these generalisations in this course, as they are better suited for subsequent courses in this series when the theory of martingales, conditional expectation, and related tools are developed.
— 1. Large deviation inequalities —
We now look at some upper bounds for the large deviation probability (2). To get some intuition as to what kinds of bounds one can expect, we first consider some examples. First suppose that has the standard normal distribution, then
,
, and
has the distribution of
, so that
has the distribution of
. We thus have
which on using the inequality leads to the bound
Next, we consider the example when is a Bernoulli random variable drawn uniformly at random from
. Then
,
, and
has the standard binomial distribution on
, thus
. By symmetry, we then have
We recall Stirling’s formula, which we write crudely as
as , where
denotes a quantity with
as
(and similarly for other uses of the
notation in the sequel). If
and
is bounded away from zero and one, we then have the asymptotic
where is the entropy function
(compare with Exercise 17 of Notes 3). One can check that is decreasing for
, and so one can compute that
as for any fixed
. To compare this with (4), observe from Taylor expansion that
as .
Finally, consider the example where takes values in
with
and
for some small
, thus
and
. We have
, and hence
with
Here, we see that the large deviation probability is somewhat larger than the gaussian prediction of
. Instead, the exponent
is approximately related to
and
by the formula
We now give a general large deviations inequality that is consistent with the above examples.
Proposition 2 (Cheap Bennett inequality) Let
, and let
be independent random variables, each of which takes values in an interval of length at most
. Write
, and write
for the mean of
. Let
be such that
has variance at most
. Then for any
, we have
There is more precise form of this inequality known as Bennett’s inequality, but we will not prove it here.
The first term in the minimum dominates when , and the second term dominates when
. Sometimes it is convenient to weaken the estimate by discarding the logarithmic factor, leading to
(possibly with a slightly different choice of ); thus we have Gaussian type large deviation estimates for
as large as
, and (slightly better than) exponential decay after that.
In the case when are iid copies of a random variable
of mean
and variance
taking values in an interval of length
, we have
and
, and the above inequality simplifies slightly to
Proof: We first begin with some quick reductions. Firstly, by dividing the (and
,
, and
) by
, we may normalise
; by subtracting the mean from each of the
, we may assume that the
have mean zero, so that
as well. We also write
for the variance of each
, so that
. Our task is to show that
for all . We will just prove the upper tail bound
the lower tail bound then follows by replacing all with their negations
, and the claim then follows by summing the two estimates.
We use the “exponential moment method”, previously seen in proving the Chernoff bound (Exercise 47 of Notes 4), in which one uses the exponential moment generating function of
. On the one hand, from Markov’s inequality one has
for any real parameter . On the other hand, from the joint independence of the
one has
Since the take values in an interval of length at most
and have mean zero, we have
and so
. This leads to the Taylor expansion
so on taking expectations we have
Putting all this together, we conclude that
If (say), one can then set
to be a small multiple of
to obtain a bound of the form
If instead , one can set
to be a small multiple of
to obtain a bound of the form
In either case, the claim follows.
The following variant of the above proposition is also useful, in which we get a simpler bound at the expense of worsening the quantity slightly:
Proposition 3 (Cheap Hoeffding inequality) Let
be independent random variables, with each
taking values in an interval
with
. Write
, and write
for the mean of
, and write
In fact one can take , a fact known as Hoeffding’s inequality; see Exercise 6 below for a special case of this.
Proof: We again normalise the to have mean zero, so that
. We then have
for each
, so by Taylor expansion we have for any real
that
and thus
Multiplying in , we then have
and one can now repeat the previous arguments (but without the factor to deal with).
Remark 4 In the above examples, the underlying random variable
was assumed to either be restricted to an interval, or to be subgaussian. This type of hypothesis is necessary if one wishes to have estimates on (2) that are similarly subgaussian. For instance, suppose
has a zeta distribution
for some
and all natural numbers
, where
. One can check that this distribution has finite mean and variance for
. On the other hand, since we trivially have
, we have the crude lower bound
which shows that in this case the expression (2) only decays at a polynomial rate in
rather than an exponential or subgaussian rate.
Exercise 5 (Khintchine inequality) Let
be iid copies of a Bernoulli random variable
drawn uniformly from
.
- (i) For any non-negative reals
and any
, show that
for some constant
depending only on
. When
, show that one can take
and equality holds.
- (ii) With the hypotheses in (i), obtain the matching lower bound
for some
depending only on
. (Hint: use (i) and Hölder’s inequality.)
- (iii) For any
and any functions
on a measure space
, show that
and
with the same constants
as in (i), (ii). When
, show that one can take
and equality holds.
- (iv) (Marcinkiewicz-Zygmund theorem) The Khintchine inequality is very useful in real analysis; we give one example here. Let
be measure spaces, let
, and suppose
is a linear operator obeying the bound
for all
and some finite
. Show that for any finite sequence
, one has the bound
for some constant
depending only on
. (Hint: test
against a random sum
.)
- (v) By using gaussian sums in place of random signs, show that one can take the constant
in (iv) to be one. (For simplicity, let us take the functions in
to be real valued.)
In this set of notes we have not focused on getting explicit constants in the large deviation inequalities, but it is not too difficult to do so with a little extra work. We give just one example here:
Exercise 6 Let
be iid copies of a Bernoulli random variable
drawn uniformly from
.
- (i) Show that
for any real
. (Hint: expand both sides as an infinite Taylor series in
.)
- (ii) Show that for any real numbers
and any
, we have
(Note that this is consistent with (6) with
.)
There are many further large deviation inequalities than the ones presented here. For instance, the Azuma-Hoeffding inequality gives Hoeffding-type bounds when the random variables are not assumed to be jointly independent, but are instead required to form a martingale. Concentration of measure inequalities such as McDiarmid’s inequality handle the situation in which the sum
is replaced by a more nonlinear function
of the input variables
. There are also a number of refinements of the Chernoff estimate from the previous notes, that are collectively referred to as “Chernoff bounds“. The Bernstein inequalities handle situations in which the underlying random variable
is not bounded, but enjoys good moment bounds. See this previous blog post for these inequalities and some further discussion. Last, but certainly not least, there is an extensively developed theory of large deviations which is focused on the precise exponent in the exponential decay rate for tail probabilities such as (2) when
is very large (of the order of
); there is also a complementary theory of moderate deviations that gives precise estimates in the regime where
is much larger than one, but much less than
, for which we generally expect gaussian behaviour rather than exponentially decaying bounds. These topics are beyond the scope of this course.
— 2. Local limit theorems —
Let be iid copies of a random variable
of mean
and variance
, and write
. On the one hand, the central limit theorem tells us that
should behave like the normal distribution
, which has probability density function
. On the other hand, if
is discrete, then
must also be discrete. For instance, if
takes values in the integers
, then
takes values in the integers as well. In this case, we would expect (much as we expect a Riemann sum to approximate an integral) the probability distribution of
to behave like the probability density function predicted by the central limit theorem, thus we expect
for integer . This is not a direct consequence of the central limit theorem (which does not distinguish between continuous or discrete random variables
), and in any case is not true in some cases: if
is restricted to an infinite subprogression
of
for some
and integer
, then
is similarly restricted to the infinite subprogression
, so that (7) totally fails when
is outside of
(and when
does lie in
, one would now expect the left-hand side of (7) to be about
times larger than the right-hand side, to keep the total probability close to
). However, this turns out to be the only obstruction:
Theorem 7 (Discrete local limit theorem) Let
be iid copies of an integer-valued random variable
of mean
and variance
. Suppose furthermore that there is no infinite subprogression
of
with
for which
takes values almost surely in
. Then one has
for all
and all integers
, where the error term
is uniform in
. In other words, we have
as
.
Note for comparison that the Berry-Esséen theorem (writing as, say,
) would give (assuming finite third moment) an error term of
instead of
, which would overwhelm the main term which is also of size
.
Proof: Unlike previous arguments, we do not have the luxury here use an affine change of variables to normalise to mean zero and variance one, as this would disrupt the hypothesis that takes values in
.
Fix and
. Since
and
are integers, we have the Fourier identity
which upon taking expectations and using Fubini’s theorem gives
where is the characteristic function of
. Expanding
and noting that
are iid copies of
, we have
It will be convenient to make the change of variables , to obtain
As in the Fourier-analytic proof of the central limit theorem, we have
as , so by Taylor expansion we have
as for any fixed
. This suggests (but does not yet prove) that
A standard Fourier-analytic calculation gives
so it will now suffice to establish that
uniformly in . From dominated convergence we have
so by the triangle inequality, it suffices to show that
This will follow from (9) and the dominated convergence theorem, as soon as we can dominate the integrands by an absolutely integrable function.
From (8), there is an such that
for all , and hence
for . This gives the required domination in the region
, so it remains to handle the region
.
From the triangle inequality we have for all
. Actually we have the stronger bound
for
. Indeed, if
for some such
, this would imply that
is a deterministic constant, which means that
takes values in
for some real
, which implies that
takes values in
; since
also takes values in
, this would place
either in a singleton set or in an infinite subprogression of
(depending on whether
and
are rational), a contradiction. As
is continuous and the region
is compact, there exists
such that
for all
. This allows us to dominate
by
for
, which is in turn bounded by
for some
independent of
, giving the required domination.
Of course, if the random variable in Theorem 7 did take values almost surely in some subprogression
, then either
is almost surely constant, or there is a minimal progression
with this property (since
must divide the difference of any two integers that
attains with positive probability). One can then make the affine change of variables
(modifying
and
appropriately) and apply the above theorem to obtain a similar local limit theorem, which we will not write here. For instance, if
is the uniform distribution on
, then this argument gives
when is an integer of the same parity as
(of course,
will vanish otherwise). A further affine change of variables handles the case when
is not integer valued, but takes values in some other lattice
, where
and
are now real-valued.
We can complement these local limit theorems with the following result that handles the non-lattice case:
Theorem 8 (Continuous local limit theorem) Let
be iid copies of an real-valued random variable
of mean
and variance
. Suppose furthermore that there is no infinite progression
with
real for which
takes values almost surely in
. Then for any
, one has
for all
and all
, where the error term
is uniform in
(but may depend on
).
Equivalently, if we let be a normal random variable with mean
and variance
, then
uniformly in . Again, this can be compared with the Berry-Esséen theorem, which (assuming finite third moment) has an error term of
which is uniform in both
and
.
Proof: Unlike the discrete case, we have the luxury here of normalising and
, and we shall now do so.
We first observe that it will suffice to show that
whenever is a Schwartz function whose Fourier transform
is compactly supported, and where the error can depend on
but is uniform in
. Indeed, if this bound (10) holds, then (after replacing
by
to make it positive on some interval, and then rescaling) we obtain a bound of the form
for any and
, where the
term can depend on
but not on
. Then, by convolving
by an approximation to the identity of some width
much smaller than
with compactly supported Fourier transform, applying (10) to the resulting function, and using (11) to control the error between that function and
, we see that
uniformly in , where the
term can depend on
and
but is uniform in
. Letting
tend slowly to zero, we obtain the claim.
It remains to establish (10). We adapt the argument from the discrete case. By the Fourier inversion formula we may write
By Fubini’s theorem as before, we thus have
and similarly
so it suffices by the triangle inequality and the boundedness and compact support of to show that
for any fixed (where the
term can depend on
). We have
and
, so by making the change of variables
, we now need to show that
as . But this follows from the argument used to handle the discrete case.
— 3. The Poisson central limit theorem —
The central limit theorem (after normalising the random variables to have mean zero) studies the fluctuations of sums where each individual term
is quite small (typically of size
). Now we consider a variant situation, in which one considers a sum
of random variables which are usually zero, but occasionally equal to a larger value such as
. (This situation arises in many real-life situations when compiling aggregate statistics on rare events, e.g. the number of car crashes in a short period of time.) In these cases, one can get a different distribution than the gaussian distribution, namely a Poisson distribution with some intensity
– that is to say, a random variable
taking values in the non-negative integers
with probability distribution
One can check that this distribution has mean and variance
.
Theorem 9 (Poisson central limit theorem) Let
be a triangular array of real random variables, where for each
, the variables
are jointly independent. Assume furthermore that
- (i) (
mostly
) One has
as
.
- (ii) (
rarely
) One has
as
.
- (iii) (Convergent expectation) One has
as
for some
.
Then the random variables
converge in distribution to a Poisson random variable of intensity
.
Proof: From hypothesis (i) and the union bound we see that for each , we have that all of the
lie in
with probability
as
. Thus, if we replace each
by the restriction
, the random variable
is only modified on an event of probability
, which does not affect distributional limits (Slutsky’s theorem). Thus, we may assume without loss of generality that the
take values in
.
By Exercise 20 of Notes 4, a Poisson random variable of intensity has characteristic function
. Applying the Lévy convergence theorem (Theorem 27 of Notes 4), we conclude that it suffices to show that
as for any fixed
.
Fix . By the independence of the
, we may write
Since only takes on the values
and
, we can write
where . By hypothesis (ii), we have
, so by using a branch of the complex logarithm that is analytic near
, we can write
By Taylor expansion we have
and hence by (ii), (iii)
as , and the claim follows.
Exercise 10 Establish the conclusion of Theorem 9 directly from explicit computation of the probabilities
in the case when each
takes values in
with
for some fixed
.
The Poisson central limit theorem can be viewed as a degenerate limit of the central limit theorem, as seen by the next two exercises.
Exercise 11 Suppose we replace the hypothesis (iii) in Theorem 9 with the alternative hypothesis that the quantities
go to infinity as
, while leaving hypotheses (i) and (ii) unchanged. Show that
converges in distribution to the normal distribution
.
Exercise 12 For each
, let
be a Poisson random variable with intensity
. Show that as
, the random variables
converge in distribution to the normal distribution
. Discuss how this is consistent with Theorem 9 and the previous exercise.
— 4. Stable laws —
Let be a real random variable. We say that
has a stable law or a stable distribution if for any positive reals
, there exists a positive real
and a real
such that
whenever
are iid copies of
. In terms of the characteristic function
of
, we see that
has a stable law if for any positive reals
, there exist a positive real
and a real
for which we have the functional equation
for all real .
For instance, a normally distributed variable is stable thanks to Lemma 12 of Notes 4; one can also see this from the characteristic function
. A Cauchy distribution
, with probability density
can also be seen to be stable, as is most easily seen from the characteristic function
. As a more degenerate example, any deterministic random variable
is stable. It is possible (though somewhat tedious) to completely classify all the stable distributions, see for instance the Wikipedia entry on these laws for the full classification.
If is stable, and
are iid copies of
, then by iterating the stable law hypothesis we see that the sums
are all equal in distribution to some affine rescaling
of
. For instance, we have
for some
, and a routine induction then shows that
for all natural numbers (with the understanding that
when
). In particular, the random variables
all have the same distribution as
.
More generally, given two real random variables and
, we say that
is in the basin of attraction for
if, whenever
are iid copies of
and
, there exist constants
and
such that
converges in distribution to
. Thus, any stable law is in its own basin of attraction, while the central limit theorem asserts that any random variable of finite variance is in the basin of attraction of a normal distribution. One can check that every random variable lies in the basin of attraction of a deterministic random variable such as
, simply by letting
go to infinity rapidly enough. To avoid this degenerate case, we now restrict to laws that are non-degenerate, in the sense that they are not almost surely constant. Then we have the following useful technical lemma:
Proposition 13 (Convergence of types) Let
be a sequence of real random variables converging in distribution to a non-degenerate limit
. Let
and
be real numbers such that
converges in distribution to a non-degenerate limit
. Then
and
converge to some finite limits
respectively, and
.
Proof: Suppose first that goes to zero. The sequence
converges in distribution, hence is tight, hence
converges in probability to zero. In particular, if
is an independent copy of
, then
converges in probability to zero; but
also converges in distribution to
where
is an independent copy of
, and
is not almost surely zero since
is non-degenerate. This is a contradiction. Similarly if
has any subsequence that goes to zero. We conclude that
is bounded away from zero. Rewriting
as
and reversing the roles of
and
, we conclude also that
is bounded away from zero, thus
is bounded.
Since is tight and
is bounded,
is tight; since
is also tight, this implies that
is tight, that is to say
is bounded.
Let be a limit point of the
. By Slutsky’s theorem, a subsequence of the
then converges in distribution to
, thus
. If the limit point
is unique then we are done, so suppose there are two limit points
,
. Thus
, which on rearranging gives
for some
and real
with
.
If then on iteration we have
for any natural number
, which clearly leads to a contradiction as
since
. If
then iteration gives
for any natural number
, which on passing to the limit in distribution as
gives
, again a contradiction. If
then we rewrite
as
and again obtain a contradiction, and the claim follows.
One can use this proposition to verify that basins of attraction of genuinely distinct laws are disjoint:
Exercise 14 Let
and
be non-degenerate real random variables. Suppose that a random variable
lies in the basin of attraction of both
and
. Then there exist
and real
such that
.
If lies in the basin of attraction for a non-degenerate law
, then
converges in distribution to
; since
is equal in distribution to the sum of two iid copies of
, we see that
converges in distribution to the sum
of two iid copies of
. On the other hand,
converges in distribution to
. Using Proposition 13 we conclude that
for some
and real
. One can go further and conclude that
in fact has a stable law; see the following exercise. Thus stable laws are the only laws that have a non-empty basin of attraction.
Exercise 15 Let
lie in the basin of attraction for a non-degenerate law
.
- (i) Show that for any iid copies
of
, there exists a unique
and
such that
. Also show that
for all natural numbers
.
- (ii) Show that the
are strictly increasing, with
for all natural numbers
. (Hint: study the absolute value
of the characteristic function, using the non-degeneracy of
to ensure that this absolute value is usually strictly less than one.) Also show that
for all natural numbers
.
- (iii) Show that there exists
such that
for all
. (Hint: first show that
is a Cauchy sequence in
.)
- (iv) If
, and
are iid copies of
, show that
for all natural numbers
and some bounded real
. Then show that
has a stable law in this case.
- (v) If
, show that
for some real
and all
. Then show that
has a stable law in this case.
Exercise 16 (Classification of stable laws) Let
be a non-degenerate stable law, then
lies in its own basin of attraction and one can then define
as in the preceding exercise.
- (i) If
, and
is as in part (v) of the preceding exercise, show that
for all
and
. Then show that
for some real
. (One can use the identity
to restrict attention to the case of positive
.)
- (ii) Now suppose
. Show that
for all
(where the implied constant in the
notation is allowed to depend on
). Conclude that
for all
.
- (iii) We continue to assume
. Show that
for some real number
. (Hint: first show this when
is a power of a fixed natural number
(with
possibly depending on
). Then use the estimates from part (ii) to show that
does not actually depend on
. (One may need to invoke the Dirichlet approximation theorem to show that for any given
, one can find a power of
that is somewhat close to a power of
.)
- (iv) We continue to assume
. Show that
for all
and
. Then show that
for all
and some real
.
It is also possible to determine which choices of parameters
are actually achievable by some random variable
, but we will not do so here.
It is possible to associate a central limit theorem to each stable law, which precisely determines their basin of attraction. We will not do this in full generality, but just illustrate the situation for the Cauchy distribution.
Exercise 17 Let
be a real random variable which is symmetric (that is,
has the same distribution as
) and obeys the distribution identity
for all
, where
is a function which is slowly varying in the sense that
as
for all
.
- (i) Show that
as
, where
denotes a quantity such that
as
. (You may need to establish the identity
, which can be done by contour integration.)
- (ii) Let
be iid copies of
. Show that
converges in distribution to a copy of the standard Cauchy distribution (i.e., to a random variable with probability density function
).
30 comments
Comments feed for this article
19 November, 2015 at 3:15 pm
Anonymous
oh no i’m seeing ‘latex path not specified’ errors
[This is a very strange error. Currently I can get all the LaTeX rendered when viewing the blog (or other wordpress blogs with recent latex) on one machine, but when viewed on a different machine there are plenty of “latex path not specified” errors, regardless of what browser I use. I do not know what could be causing this. -T.]
19 November, 2015 at 5:22 pm
Anonymous
In the second line of the proof of proposition 2, it seems that the normalization of
should be
(instead of
).
[Corrected, thanks – T.]
20 November, 2015 at 2:57 pm
arch1
I get this problem w/ this your latest blog entry, but not w/ your previous ones. I guess that a LaTeX version change occured between 10/16 and 10/19 which broke compatibility with my OS (Windows 7 Enterprise Service Pack 1), one of yours, and Anonymous #1’s.
But I’m sure someone here can do better than guess (come on, folks..:-)
20 November, 2015 at 3:21 pm
pauljung
Hi Terry, at the beginning of Section 4, I guess the pi in the Cauchy density should be in the denominator.
[Corrected, thanks – T.]
22 November, 2015 at 9:22 pm
Bruys
Gee, I wish WordPress would fix this “latex path not specified.” problem. It’s not exclusive to this blog. I have viewed this blog on different machines and different versions of Windows, and different browsers.
Thanks for your blogs, Terry. When I can understand them, they’re wonderfully lucid. It would be no fun if I could understand them all ;)
23 November, 2015 at 9:33 am
Anonymous
The explanation between proposition 3 and exercise 4 is not sufficiently clear (perhaps some lines are missing) – see e.g.
1. Large part of the three lines below proposition 3 appear in blue.
2. The eleventh line below proposition 3 is ended with “\end”,
3. The 7 lines above exercise 4 are seemingly related to some (unclear – perhaps due to missing lines) example with polynomial decay rate in
.
[Corrected, thanks – T.]
23 November, 2015 at 1:53 pm
Anonymous
In Hoeffding’s inequality, is
optimal ?
23 November, 2015 at 2:07 pm
Terence Tao
Yes, this can be seen from the Bernoulli example discussed at the start of the section (together with the Taylor expansion of the entropy function).
23 November, 2015 at 3:36 pm
Anonymous
Thanks!
, the threshold
(used to define the probability in the RHS) should be
.
Possible typo: It seems that in the third line below the definition of the entropy function
23 November, 2015 at 5:30 pm
Terence Tao
Actually, I believe the
factor should be there (this is measuring a large deviation event, much larger than the standard deviation).
23 November, 2015 at 6:10 pm
Anonymous
I see. Thanks! (I missed the factor n in the exponent in the RHS.)
23 November, 2015 at 3:11 pm
Lior Silberman
Exercise 5: shouldn’t there be
th roots on the LHS? It seems to me that otherwise the inequalities won’t scale properly with the
.
[Corrected, thanks – T.]
24 November, 2015 at 4:42 am
Anonymous
The integral in exercise 17(i) may also be evaluated by its interpretation as
– using Plancherel theorem and its (much simpler!) Fourier transform.
24 November, 2015 at 9:42 am
Anonymous
It seems quite surprising that the RHS of (6) has a Gaussian type decay (wrt
), while the RHS of (5) has a much slower (slightly faster than exponential) decay – in spite of using more(!) information (variance information) than (6). Is there any explanation for the slower decay (i.e. looser bound) in (5) ?
24 November, 2015 at 11:24 am
Terence Tao
The left-hand side of (5) is larger than that of (6) (if
is held fixed), because
is smaller, and so it is harder to bound and one expects to have weaker decay in
on the right-hand side. The difference between (5) and (6) is particularly highlighted in the case analysed just before Proposition 2, in which the underlying random variable
takes values in
but is mostly 0, so that its variance is much smaller than the square of the length of the interval containing
, and so the
in (5) is much smaller than that in (6). In this case, the large deviations are predominantly caused by the event when most of the
are equal to 1, which is only an exponentially rare event (e.g. the probability that they are all 1 is
) rather than a quadratically-exponential event.
24 November, 2015 at 1:28 pm
Anonymous
I asked about the slower decay of the RHS of (5) wrt varying threshold
with all the other parameters (including
) held fixed,
(for sufficiently large
).
, but for such large
values, the LHS of both (5) and (6) should be zero(!) (since
– due to the uniform boundedness of
).
To me it seems natural to expect that the RHS of (5) (having the additional variance information) should decay no slower with
But now I think that perhaps, the explanation is that the decay of the RHS of (5) is starting to be slower than that of (6) only(!) when
30 November, 2015 at 4:39 am
Anonymous
In the introduction: {\frac{S_n-n mu}{\sqrt{n} \sigma}}, mu is missing a \ (search for ‘mu’)
[Corrected, thanks -T.]
30 November, 2015 at 5:13 am
Anonymous
In Theorem 7, the sup term, has a ‘=’ instead of a ‘-‘.
[Corrected, thanks -T.]
30 November, 2015 at 5:18 am
Anonymous
‘attains with positive probabilit’ should be ‘attains with positive probability’ (missing y at the end). (hope this typo checking helps)
[Corrected, thanks. I certainly appreciate the corrections, though if you could combine multiple corrections into a single comment that would be helpful. -T.]
30 November, 2015 at 5:30 am
Anonymous
‘Levy convergence theorem’ should be ‘Lévy convergence theorem’ (being annoying, I know)
[Corrected, thanks -T.]
13 December, 2015 at 7:44 pm
275A, Notes 5: Variants of the central limit theorem | jianchengblog
[…] 来源: 275A, Notes 5: Variants of the central limit theorem […]
27 March, 2016 at 10:25 pm
Anonymous
Should replace the Uncategorized tag with math.CA
[Corrected, thanks – T.]
18 November, 2016 at 5:42 pm
Anonymous
If
is stable, and
are iid copies of
, then by iterating the stable law hypothesis we see that the sums
are all equal in distribution to some affine rescaling
of
. For instance, we have
for some
, and a routine induction then shows that
for all natural numbers
(with the understanding that
when
.
Shouldn’t
depend on
and
in the definition of “stable”? How do you get
in the induction?
[Yes,
will depend on
. If
and
, then a short computation will show that
. -T.]
18 November, 2016 at 5:52 pm
Anonymous
Does the concept “basin of attraction” here have anything to do with that in dynamical systems (say an ODE)?
26 January, 2017 at 4:55 am
Anonymous
Hello, I would appreciate it if anyone could shed some lights on how the error term in Theorem 8 depends on “h”. In particular, how quickly can “h” grow without changing the o(1/sqrt(n)) error, and how could one read this off from the proofs in this article? Thank you for your time.
26 January, 2017 at 5:05 am
Anonymous
Based on the form of the leading term in Theorem 8, it should be that we need h = o(sqrt(n)) but I’d appreciate any further comments.
26 January, 2017 at 7:09 am
Anonymous
Interesting. Suppose we look at the most basic case in which
is normally distributed with mean
and variance
. Then direct computation via calculus seems to suggest that the error term in Theorem 8 behaves like
as
. It this were correct, then it would contradict the fact that the error term is uniform in
.
If I made a mistake, please let me know.
[You are missing a term of the form
– T.]
28 January, 2017 at 10:13 am
Terence Tao
The proof here is not optimised to give good dependence on
. The precise convergence rate of local limit theorems has been extensively studied in the literature, but it is hard to give a succinct summary of the results, because the nature of the convergence often depends on the hypotheses one places on the
. Unfortunately I don’t know of a good survey of the latest bounds in this area (but these sorts of issues were studied extensively by the Russian school, so texts by authors from that school may be helpful).
30 January, 2017 at 4:38 pm
Anonymous
Dear Prof. Tao, thank you for your reply! I have just looked up some classical papers by Petrov, etc.
7 April, 2021 at 9:44 am
Jonathan Roth
Say $X_i$ is continuous, $Y_i$ is discrete, and $X_i$ and $Y_i$ are uncorrelated. Let $S_{X,n}, S_{Y,n}$ denote the respective partial sums. Is there an extension to the discrete local limit theorem that you provided that says that $P( S_{X,n} \leq a, S_{Y,n} = b)$ can be approximated by the product of a normal CDF at $a$ and a normal PDF at $b$?