Let $X$ be a real-valued random variable, and let $X_1, X_2, X_3, \ldots$ be an infinite sequence of independent and identically distributed copies of $X$. Let $\overline{X}_n := \frac{1}{n}(X_1 + \ldots + X_n)$ be the empirical averages of this sequence. A fundamental theorem in probability theory is the law of large numbers, which comes in both a weak and a strong form:
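Weak law of large numbers. Suppose that the first moment $\mathbb{E}|X|$ of $X$ is finite. Then $\overline{X}_n$ converges in probability to $\mathbb{E} X$, thus $\lim_{n \to \infty} \mathbb{P}( |\overline{X}_n - \mathbb{E} X| \geq \varepsilon ) = 0$ for every $\varepsilon > 0$.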
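Strong law of large numbers. Suppose that the first moment $\mathbb{E}|X|$ of $X$ is finite. Then $\overline{X}_n$ converges almost surely to $\mathbb{E} X$, thus $\mathbb{P}( \lim_{n \to \infty} \overline{X}_n = \mathbb{E} X ) = 1$.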
[The concepts of convergence in probability and almost sure convergence in probability theory are specialisations of the concepts of convergence in measure and pointwise convergence almost everywhere in measure theory.]
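As a purely illustrative aside (it plays no role in the proofs), one can watch the empirical averages settle down numerically. The sketch below assumes an exponential distribution with mean 1 as the law of $X$; the helper name `running_means` and the seed are arbitrary choices made for this example.

```python
import random

def running_means(samples):
    """Return the empirical averages (x_1 + ... + x_n) / n for n = 1, 2, ..."""
    means, total = [], 0.0
    for n, x in enumerate(samples, start=1):
        total += x
        means.append(total / n)
    return means

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [rng.expovariate(1.0) for _ in range(100_000)]  # Exp(1) has mean 1
means = running_means(samples)

# The averages should drift towards the mean 1 as n grows.
for n in (10, 1_000, 100_000):
    print(n, means[n - 1])
```

A single run like this only illustrates the statement along one sample path; it says nothing about the rate of convergence, which is what the tail bounds below are for.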
(If one strengthens the first moment assumption to that of finiteness of the second moment $\mathbb{E}|X|^2$, then we of course have a more precise statement than the (weak) law of large numbers, namely the central limit theorem, but I will not discuss that theorem here. With even more hypotheses on $X$, one similarly has more precise versions of the strong law of large numbers, such as the Chernoff inequality, which I will again not discuss here.)
The weak law is easy to prove, but the strong law (which of course implies the weak law, by Egoroff’s theorem) is more subtle, and in fact the proof of this law (assuming just finiteness of the first moment) usually only appears in advanced graduate texts. So I thought I would present a proof here of both laws, which proceeds by the standard techniques of the moment method and truncation. The emphasis in this exposition will be on motivation and methods rather than brevity and strength of results; there do exist proofs of the strong law in the literature that have been compressed down to the size of one page or less, but this is not my goal here.
– The moment method –
The moment method seeks to control the tail probabilities of a random variable (i.e. the probability that it fluctuates far from its mean) by means of moments, and in particular the zeroth, first or second moment. The reason that this method is so effective is that the first few moments can often be computed rather precisely. The first moment method usually employs Markov’s inequality
$$\mathbb{P}( |X| \geq \lambda ) \leq \frac{1}{\lambda} \mathbb{E}|X| \qquad (1)$$
(which follows by taking expectations of the pointwise inequality $\lambda I(|X| \geq \lambda) \leq |X|$), whereas the second moment method employs some version of Chebyshev’s inequality, such as
$$\mathbb{P}( |X| \geq \lambda ) \leq \frac{1}{\lambda^2} \mathbb{E}|X|^2 \qquad (2)$$
(note that (2) is just (1) applied to the random variable $|X|^2$ and to the threshold $\lambda^2$).
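Markov’s and Chebyshev’s inequalities hold for any probability distribution, and in particular for the empirical distribution of a finite sample, so they can be checked directly on data. The following is a minimal sketch under that reading; the function names and the choice of a Gaussian sample are assumptions made for illustration only.

```python
import random

def tail_fraction(xs, lam):
    """Fraction of samples with |x| >= lam (the empirical tail probability)."""
    return sum(1 for x in xs if abs(x) >= lam) / len(xs)

def markov_bound(xs, lam):
    """Markov bound E|X| / lam, computed for the empirical distribution."""
    return sum(abs(x) for x in xs) / len(xs) / lam

def chebyshev_bound(xs, lam):
    """Second moment bound E|X|^2 / lam^2, computed for the empirical distribution."""
    return sum(x * x for x in xs) / len(xs) / lam ** 2

rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# Columns: threshold, empirical tail, first moment bound, second moment bound.
for lam in (0.5, 1.0, 2.0, 3.0):
    print(lam, tail_fraction(xs, lam), markov_bound(xs, lam), chebyshev_bound(xs, lam))
```

For small thresholds both bounds exceed 1 and are vacuous; the second moment bound only overtakes the first moment bound once the threshold is large compared with the size of the sample values.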
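Generally speaking, to compute the first moment one usually employs linearity of expectation
$$\mathbb{E}( X_1 + \ldots + X_n ) = \mathbb{E} X_1 + \ldots + \mathbb{E} X_n,$$
whereas to compute the second moment one also needs to understand covariances (which are particularly simple if one assumes pairwise independence), thanks to identities such as
$$\mathbb{E}( X_1 + \ldots + X_n )^2 = \mathbb{E} X_1^2 + \ldots + \mathbb{E} X_n^2 + 2 \sum_{1 \leq i < j \leq n} \mathbb{E} X_i X_j$$
or the normalised variant
$$\mathrm{Var}( X_1 + \ldots + X_n ) = \mathrm{Var}(X_1) + \ldots + \mathrm{Var}(X_n) + 2 \sum_{1 \leq i < j \leq n} \mathrm{Cov}(X_i, X_j). \qquad (3)$$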
Higher moments can in principle give more precise information, but often require stronger assumptions on the objects being studied, such as joint independence.
Here is a basic application of the first moment method:
Borel-Cantelli lemma. Let $E_1, E_2, E_3, \ldots$ be a sequence of events such that $\sum_{n=1}^\infty \mathbb{P}(E_n)$ is finite. Then almost surely, only finitely many of the events $E_n$ are true.
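Proof. Let $I(E_n)$ denote the indicator function of the event $E_n$. Our task is to show that $\sum_{n=1}^\infty I(E_n)$ is almost surely finite. But by linearity of expectation, the expectation of this random variable is $\sum_{n=1}^\infty \mathbb{P}(E_n)$, which is finite by hypothesis. By Markov’s inequality (1) we conclude that
$$\mathbb{P}\Big( \sum_{n=1}^\infty I(E_n) \geq \lambda \Big) \leq \frac{1}{\lambda} \sum_{n=1}^\infty \mathbb{P}(E_n).$$
Letting $\lambda \to \infty$ we obtain the claim.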
Returning to the law of large numbers, the first moment method gives the following tail bound:
Lemma 1. (First moment tail bound) If $\mathbb{E}|X|$ is finite, then
$$\mathbb{P}( |\overline{X}_n| \geq \lambda ) \leq \frac{\mathbb{E}|X|}{\lambda}.$$
Proof. By the triangle inequality, $|\overline{X}_n| \leq \overline{|X|}_n$. By linearity of expectation, the expectation of $\overline{|X|}_n$ is $\mathbb{E}|X|$. The claim now follows from Markov’s inequality.
Lemma 1 is not strong enough by itself to prove the law of large numbers in either weak or strong form – in particular, it does not show any improvement as n gets large – but it will be useful to handle one of the error terms in those proofs.
We can get stronger bounds than Lemma 1 – in particular, bounds which improve with n – at the expense of stronger assumptions on X.
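Lemma 2. (Second moment tail bound) If $\mathbb{E}|X|^2$ is finite, then
$$\mathbb{P}( |\overline{X}_n - \mathbb{E} X| \geq \lambda ) \leq \frac{\mathbb{E}|X - \mathbb{E} X|^2}{n \lambda^2}.$$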
Proof. A standard computation, exploiting (3) and the pairwise independence of the $X_i$, shows that the variance $\mathbb{E}|\overline{X}_n - \mathbb{E} X|^2$ of the empirical averages $\overline{X}_n$ is equal to $\frac{1}{n}$ times the variance $\mathbb{E}|X - \mathbb{E} X|^2$ of the original variable $X$. The claim now follows from Chebyshev’s inequality (2).
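For completeness, the standard computation can be written out as follows. This is only a sketch of the routine step; it uses the variance identity for sums together with the fact that pairwise independent variables have zero covariance, and $\mathrm{Var}$, $\mathrm{Cov}$ denote variance and covariance.

```latex
\mathrm{Var}(\overline{X}_n)
  = \frac{1}{n^2} \mathrm{Var}(X_1 + \ldots + X_n)
  = \frac{1}{n^2} \Big( \sum_{i=1}^n \mathrm{Var}(X_i)
      + 2 \sum_{1 \leq i < j \leq n} \mathrm{Cov}(X_i, X_j) \Big)
  = \frac{1}{n^2} \cdot n \, \mathrm{Var}(X)
  = \frac{1}{n} \mathrm{Var}(X),
% since Cov(X_i, X_j) = 0 for i \neq j. Chebyshev's inequality,
% applied to \overline{X}_n - \mathbb{E} X (which has mean zero), then gives
\mathbb{P}( |\overline{X}_n - \mathbb{E} X| \geq \lambda )
  \leq \frac{1}{\lambda^2} \mathbb{E} |\overline{X}_n - \mathbb{E} X|^2
  = \frac{\mathrm{Var}(X)}{n \lambda^2} .
```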
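In the opposite direction, there is the zeroth moment method, more commonly known as the union bound
$$\mathbb{P}( E_1 \vee \ldots \vee E_n ) \leq \sum_{j=1}^n \mathbb{P}(E_j)$$
or equivalently (to explain the terminology “zeroth moment”)
$$\mathbb{E}( X_1 + \ldots + X_n )^0 \leq \mathbb{E} X_1^0 + \ldots + \mathbb{E} X_n^0$$
for any non-negative random variables $X_1, \ldots, X_n \geq 0$ (with the convention $0^0 = 0$). Applying this to the empirical means, we obtain the zeroth moment tail estimate
$$\mathbb{P}( \overline{X}_n \neq 0 ) \leq n \mathbb{P}( X \neq 0 ). \qquad (4)$$
Just as the second moment bound (Lemma 2) is only useful when one has good control on the second moment (or variance) of $X$, the zeroth moment tail estimate (4) is only useful when we have good control on the zeroth moment $\mathbb{E}|X|^0 = \mathbb{P}(X \neq 0)$, i.e. when $X$ is mostly zero.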
– Truncation –
The second moment tail bound (Lemma 2) already gives the weak law of large numbers in the case when $X$ has finite second moment (or equivalently, finite variance). In general, if all one knows about $X$ is that it has finite first moment, then we cannot conclude that $X$ has finite second moment. However, we can perform a truncation
$$X = X_{\leq N} + X_{>N} \qquad (5)$$
of $X$ at any desired threshold $N$, where $X_{\leq N} := X I(|X| \leq N)$ and $X_{>N} := X I(|X| > N)$. The first term $X_{\leq N}$ has finite second moment; indeed we clearly have
$$\mathbb{E}|X_{\leq N}|^2 \leq N \mathbb{E}|X|$$
and hence also we have finite variance
$$\mathbb{E}|X_{\leq N} - \mathbb{E} X_{\leq N}|^2 \leq N \mathbb{E}|X|. \qquad (6)$$
The second term $X_{>N}$ may have infinite second moment, but its first moment is well controlled. Indeed, by the monotone convergence theorem, we have
$$\mathbb{E}|X_{>N}| \to 0 \hbox{ as } N \to \infty. \qquad (7)$$
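By the triangle inequality, we conclude that the first term $X_{\leq N}$ has expectation close to $\mathbb{E} X$:
$$\mathbb{E} X_{\leq N} \to \mathbb{E} X \hbox{ as } N \to \infty. \qquad (8)$$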
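To make the truncation concrete, here is a small numerical sketch. It assumes a Pareto distribution with shape 1.5 as a stand-in for a variable with finite first moment but infinite second moment; the helper `truncate` and the thresholds are hypothetical choices for this illustration.

```python
import random

def truncate(x, N):
    """Split x as x_low + x_high, with x_low = x*1[|x| <= N] and x_high = x*1[|x| > N]."""
    return (x, 0.0) if abs(x) <= N else (0.0, x)

rng = random.Random(2)
# Pareto with shape 1.5: mean 3, infinite variance.
xs = [rng.paretovariate(1.5) for _ in range(50_000)]
mean_abs = sum(abs(x) for x in xs) / len(xs)

# Columns: N, empirical E|X_low|^2, the bound N * E|X|, empirical E|X_high|.
for N in (10, 100, 1000):
    parts = [truncate(x, N) for x in xs]
    mean_low_sq = sum(low * low for low, _ in parts) / len(xs)
    mean_high = sum(abs(high) for _, high in parts) / len(xs)
    print(N, mean_low_sq, N * mean_abs, mean_high)
```

The second column stays below the third for every sample (this is the pointwise inequality $|X_{\leq N}|^2 \leq N|X|$ averaged), while the last column shrinks as $N$ grows, in line with the monotone convergence statement.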
These are all the tools we need to prove the weak law of large numbers:
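Proof of weak law. Let $\varepsilon > 0$. It suffices to show that whenever $n$ is sufficiently large depending on $\varepsilon$, we have $\overline{X}_n = \mathbb{E} X + O(\varepsilon)$ with probability $1 - O(\varepsilon)$.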
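From (7), (8), we can find a threshold $N$ (depending on $\varepsilon$) such that $\mathbb{E}|X_{>N}| = O(\varepsilon^2)$ and $\mathbb{E} X_{\leq N} = \mathbb{E} X + O(\varepsilon)$. Now we use (5) to split
$$\overline{X}_n = (\overline{X_{>N}})_n + (\overline{X_{\leq N}})_n.$$
From the first moment tail bound (Lemma 1), we know that $(\overline{X_{>N}})_n = O(\varepsilon)$ with probability $1 - O(\varepsilon)$. From the second moment tail bound (Lemma 2) and (6), we know that $(\overline{X_{\leq N}})_n = \mathbb{E} X_{\leq N} + O(\varepsilon) = \mathbb{E} X + O(\varepsilon)$ with probability $1 - O(\varepsilon)$ if $n$ is sufficiently large depending on $N$ and $\varepsilon$. The claim follows.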
– The strong law –
The strong law can be proven by pushing the above methods a bit further, and using a few more tricks.
The first trick is to observe that to prove the strong law, it suffices to do so for non-negative random variables $X \geq 0$. Indeed, this follows immediately from the simple fact that any random variable $X$ with finite first moment can be expressed as the difference of two non-negative random variables $\max(X,0)$, $\max(-X,0)$ of finite first moment.
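Once $X$ is non-negative, we see that the empirical averages $\overline{X}_n$ cannot decrease too quickly in $n$. In particular we observe that
$$\overline{X}_m \leq (1 + O(\varepsilon)) \overline{X}_n \hbox{ whenever } (1-\varepsilon) n \leq m \leq n. \qquad (9)$$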
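The quasimonotonicity estimate (9) is a one-line computation once the summands are non-negative. A sketch, under the extra (harmless) assumption $0 < \varepsilon \leq 1/2$ so that the implied constant can be made explicit:

```latex
% For X_i \geq 0 and (1-\varepsilon) n \leq m \leq n, with 0 < \varepsilon \leq 1/2:
\overline{X}_m
  = \frac{1}{m} \sum_{i=1}^m X_i
  \leq \frac{1}{m} \sum_{i=1}^n X_i
  = \frac{n}{m} \overline{X}_n
  \leq \frac{1}{1-\varepsilon} \overline{X}_n
  \leq (1 + 2\varepsilon) \overline{X}_n .
```

The first inequality is the only place where non-negativity is used: adding the terms $X_{m+1}, \ldots, X_n$ cannot decrease the sum.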
Because of this quasimonotonicity, we can sparsify the set of n for which we need to prove the strong law. More precisely, it suffices to show
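Strong law of large numbers, reduced version. Let $X$ be a non-negative random variable with $\mathbb{E} X < \infty$, and let $1 \leq n_1 \leq n_2 \leq n_3 \leq \ldots$ be a sequence of integers which is lacunary in the sense that $n_{j+1}/n_j > c$ for some $c > 1$ and all sufficiently large $j$. Then $\overline{X}_{n_j}$ converges almost surely to $\mathbb{E} X$.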
Indeed, if we could prove the reduced version, then on applying that version to the lacunary sequence $n_j := \lfloor (1+\varepsilon)^j \rfloor$ and using (9) we would see that almost surely the empirical means $\overline{X}_n$ cannot deviate by more than a multiplicative error of $1 + O(\varepsilon)$ from the mean $\mathbb{E} X$. Setting $\varepsilon := 1/m$ for $m = 1, 2, 3, \ldots$ (and using the fact that a countable intersection of almost sure events remains almost sure) we obtain the full strong law.
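To get a feel for how much the sparsification saves, the sketch below lists the checkpoints of a geometric sequence, assuming the sequence is taken to be $\lfloor (1+\varepsilon)^j \rfloor$. Repeated small values are dropped so that the list is strictly increasing; the function name and the parameters are illustrative assumptions.

```python
import math

def lacunary_sequence(eps, limit):
    """Distinct values of floor((1 + eps)**j), j = 0, 1, 2, ..., that are <= limit."""
    seq, j = [], 0
    while True:
        n = math.floor((1 + eps) ** j)
        if n > limit:
            break
        if not seq or n > seq[-1]:  # skip repeats that occur for small j
            seq.append(n)
        j += 1
    return seq

checkpoints = lacunary_sequence(0.1, 10 ** 6)
# Only logarithmically many checkpoints are needed to cover n up to one million.
print(len(checkpoints), checkpoints[:10], checkpoints[-1])
```

Every other $n$ lies between two consecutive checkpoints whose ratio is close to $1+\varepsilon$, which is exactly the situation in which (9) applies.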
[This sparsification trick is philosophically related to the dyadic pigeonhole principle philosophy; see an old short story of mine on this latter topic. One could easily sparsify further, so that the lacunarity constant c is large instead of small, but this turns out not to help us too much in what follows.]
Now that we have sparsified the sequence, it becomes economical to apply the Borel-Cantelli lemma. Indeed, by many applications of that lemma we see that it suffices to show that
$$\sum_{j=1}^\infty \mathbb{P}( \overline{X}_{n_j} \neq \mathbb{E} X + O(\varepsilon) ) < \infty \qquad (10)$$
for non-negative $X$ of finite first moment, any lacunary sequence $1 \leq n_1 \leq n_2 \leq \ldots$ and any $\varepsilon > 0$. [This is a slight abuse of the O() notation, but it should be clear what is meant by this.]
[If we did not first sparsify the sequence, the Borel-Cantelli lemma would have been too expensive to apply; see Remark 2 below. Generally speaking, Borel-Cantelli is only worth applying when one expects the events $E_n$ to be fairly "disjoint" or "independent" of each other; in the non-lacunary case, the events $\overline{X}_n \neq \mathbb{E} X + O(\varepsilon)$ change very slowly in $n$, which makes the lemma very inefficient. We will not see how lacunarity is exploited until the punchline at the very end of the proof, but certainly there is no harm in taking advantage of this "free" reduction to the lacunary case now, even if it is not immediately clear how it will be exploited.]
At this point we go back and apply the methods that already worked to give the weak law. Namely, to estimate each of the tail probabilities $\mathbb{P}( \overline{X}_{n_j} \neq \mathbb{E} X + O(\varepsilon) )$, we perform a truncation (5) at some threshold $N_j$. It is not immediately obvious what truncation to perform, so we adopt the usual strategy of leaving $N_j$ unspecified for now and optimising in this parameter later.
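We should at least pick $N_j$ large enough so that $\mathbb{E} X_{\leq N_j} = \mathbb{E} X + O(\varepsilon)$. From the second moment tail estimate (Lemma 2) we conclude that $(\overline{X_{\leq N_j}})_{n_j}$ is also equal to $\mathbb{E} X + O(\varepsilon)$ with probability $1 - O( \frac{1}{\varepsilon^2 n_j} \mathbb{E}|X_{\leq N_j}|^2 )$. One could attempt to simplify this expression using (6), but this turns out to be a little wasteful, so let us hold off on that for now. However, (6) does strongly suggest that we want to take $N_j$ to be something like $n_j$, which is worth keeping in mind in what follows.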
Now we look at the contribution of $X_{>N_j}$. One could use the first moment tail estimate (Lemma 1), but it turns out that the first moment $\mathbb{E} X_{>N_j}$ decays too slowly in $j$ to be of much use (recall that we are expecting $N_j$ to be like the lacunary sequence $n_j$); the root problem here is that the decay (7) coming from the monotone convergence theorem is ineffective (one could effectivise this using the finite convergence principle, but this turns out to give very poor results here).
But there is one last card to play, which is the zeroth moment method tail estimate (4). As mentioned earlier, this bound is lousy in general – but is very good when $X$ is mostly zero, which is precisely the situation with $X_{>N_j}$. In particular, we see that $(\overline{X_{>N_j}})_{n_j}$ is zero with probability $1 - O( n_j \mathbb{P}(X > N_j) )$.
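Putting this all together, we see that
$$\mathbb{P}( \overline{X}_{n_j} \neq \mathbb{E} X + O(\varepsilon) ) \leq O\Big( \frac{1}{\varepsilon^2 n_j} \mathbb{E}|X_{\leq N_j}|^2 \Big) + O( n_j \mathbb{P}(X > N_j) ).$$
Summing this in $j$, we see that we will be done as soon as we figure out how to choose $N_j$ so that
$$\sum_{j=1}^\infty \frac{1}{n_j} \mathbb{E}|X_{\leq N_j}|^2 \qquad (11)$$
and
$$\sum_{j=1}^\infty n_j \mathbb{P}(X > N_j) \qquad (12)$$
are both finite. (As usual, we have a tradeoff: making the $N_j$ larger makes (12) easier to establish at the expense of (11), and vice versa when making $N_j$ smaller.)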
Based on the discussion earlier, it is natural to try setting $N_j := n_j$. Happily, this choice works cleanly; the lacunary nature of $n_j$ ensures (basically from the geometric series formula) that we have the pointwise estimates
$$\sum_{j=1}^\infty \frac{1}{n_j} |X_{\leq n_j}|^2 = O(X)$$
and
$$\sum_{j=1}^\infty n_j I( X > n_j ) = O(X)$$
(where the implied constant here depends on the sequence $n_1, n_2, \ldots$, and in particular on the lacunarity constant c). The finiteness of (11) and (12) then follows from one last application of linearity of expectation, which gives (10) and hence the strong law of large numbers.
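For readers who would like the geometric series step spelled out, here is a sketch. It assumes for simplicity that $n_{j+1}/n_j \geq c > 1$ holds for every $j$ rather than only for large $j$ (each of the finitely many exceptional terms is at most $X$, so they only affect the implied constant), and the indices $j_0$, $j_1$ are notation introduced just for this computation.

```latex
% Fix a value X > 0 and recall X_{\leq n} = X I(X \leq n) for non-negative X.
% Let j_0 be the first index with n_{j_0} \geq X, so n_j \geq c^{j-j_0} X for j \geq j_0:
\sum_{j=1}^\infty \frac{1}{n_j} |X_{\leq n_j}|^2
  = X^2 \sum_{j \geq j_0} \frac{1}{n_j}
  \leq X^2 \sum_{k=0}^\infty \frac{1}{c^k X}
  = \frac{c}{c-1} X .
% Let j_1 be the last index with n_{j_1} < X (the sum is empty if there is none),
% so n_j \leq c^{-(j_1 - j)} n_{j_1} for j \leq j_1:
\sum_{j=1}^\infty n_j I(X > n_j)
  = \sum_{j \leq j_1} n_j
  \leq n_{j_1} \sum_{k=0}^\infty c^{-k}
  \leq \frac{c}{c-1} X .
```

In both sums only one term is of size comparable to $X$, and the remaining terms decay geometrically away from it; this is the point at which lacunarity is finally used.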
Remark 1. The above proof in fact shows that the strong law of large numbers holds even if one only assumes pairwise independence of the $X_n$, rather than joint independence.
Remark 2. It is essential that the random variables $X_1, X_2, \ldots$ are “recycled” from one empirical average $\overline{X}_n$ to the next, in order to get the crucial quasimonotonicity property (9). If instead we took completely independent averages $\overline{X}_n = \frac{1}{n}( X_{n,1} + \ldots + X_{n,n} )$, where the $X_{i,j}$ are all iid, then the strong law of large numbers in fact breaks down with just a first moment assumption. (For a counterexample, consider a random variable $X$ which equals $2^m/m^2$ with probability $2^{-m}$ for $m = 1, 2, 3, \ldots$; this random variable (barely) has finite first moment, but for $n \sim 2^m/m^2$, we see that $\overline{X}_n$ deviates by at least an absolute constant from its mean with probability $\gg 1/m^2$. As the empirical means $\overline{X}_n$ for $n \sim 2^m/m^2$ are now jointly independent, the probability that one of them deviates significantly is now extremely close to 1 (super-exponentially close in $m$, in fact), leading to the total failure of the strong law in this setting.) Of course, if one restricts attention to a lacunary sequence of $n$ then the above proof goes through in the independent case (since the Borel-Cantelli lemma is insensitive to this independence). By exploiting the joint independence further (e.g. by using Chernoff’s inequality) one can also get the strong law for independent empirical means for the full sequence $n$ under second moment bounds.
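As a sanity check on the counterexample, and assuming it is the variable taking the value $2^m/m^2$ with probability $2^{-m}$ for $m = 1, 2, 3, \ldots$ (this reading of the example is an assumption of the sketch), the relevant moments can be computed directly:

```latex
\sum_{m=1}^\infty 2^{-m} = 1, \qquad
\mathbb{E} X = \sum_{m=1}^\infty \frac{2^m}{m^2} \cdot 2^{-m}
             = \sum_{m=1}^\infty \frac{1}{m^2} = \frac{\pi^2}{6} < \infty, \qquad
\mathbb{E} X^2 = \sum_{m=1}^\infty \frac{4^m}{m^4} \cdot 2^{-m}
               = \sum_{m=1}^\infty \frac{2^m}{m^4} = \infty .
```

So the first moment is finite only because of the $1/m^2$ factor, while the second moment diverges, which is consistent with the second moment bounds mentioned at the end of the remark being unavailable for this variable.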
Remark 3. From the perspective of interpolation theory, one can view the above argument as an interpolation argument, establishing an $L^1$ estimate (10) by interpolating between an $L^2$ estimate (Lemma 2) and the $L^0$ estimate (4).
Remark 4. By viewing the sequence $X_1, X_2, \ldots$ as a stationary process, and thus as a special case of a measure-preserving system, one can view the weak and strong law of large numbers as special cases of the mean and pointwise ergodic theorems respectively (see Exercise 9 from 254A Lecture 8 and Theorem 2 from 254A Lecture 9).
[Update, Jul 19: some corrections.]
[Update, Jul 20: Connections with ergodic theory discussed.]

33 comments
19 June, 2008 at 3:25 pm
toomuchcoffeeman
A very clear and educative post: it’s been a long time since I was made to learn about weak and strong LLN, but I don’t recall the proofs seeming so well-motivated at the time.
One question: you remark that
(If one strengthens the first moment assumption to that of finiteness of the second moment $\mathbb{E}|X|^2$, then we of course have a more precise statement, namely the central limit theorem, but I will not discuss that theorem here.)
I’m a bit confused: do you mean that the CLT can be viewed as a more precise version of the WLLN or the SLLN?
19 June, 2008 at 3:45 pm
Terence Tao
Dear toomuchcoffeeman,
I’ve clarified the text to reflect the fact that the CLT is a sharper version of the weak LLN. (There is no almost sure (or “strong”) version of the CLT, as the limit in the CLT is random rather than deterministic. One could argue though that Berry-Esseen quantitative version of the CLT is somewhat analogous to the quantitative versions of the LLN (such as (9)) that underlie the strong LLN.)
19 June, 2008 at 6:55 pm
nobelHubel
Prof. Tao, shouldn’t the inequality in the indicator function below equation (1) be reversed?
19 June, 2008 at 7:01 pm
Terence Tao
Thanks for the correction!
19 June, 2008 at 10:29 pm
Joshua Batson
Professor Tao,
Isn’t good control of the zeroth moment attained when P(X !=0) is low, that is, when X is mostly 0?
20 June, 2008 at 2:00 am
athreya
Dear Professor Tao,
In 1995 Mike Keane presented a (much) easier proof of the SLLN. The idea is essentially enclosed in the article:
M. Keane, The essence of large numbers, Algorithms, Fractals, and Dynamics (Okayama/Kyoto, 1992), 125-129, Plenum, New York, 1995.
It is a hands on proof that works with running average omega by omega.
best wishes
Siva
20 June, 2008 at 7:52 am
Terence Tao
Dear Joshua: Thanks for the correction!
Dear Siva: Thanks for the article, which I found at
http://repos.project.cwi.nl:8888/cwi_repository/docs/I/02/2627A.pdf
It appears to reinterpret the strong law of large numbers as a special case of the pointwise ergodic theorem for measure-preserving systems (or for stationary processes, which are much the same thing). But it seems the author has restricted himself to bounded random variables (such as those only taking the values 0 and 1); most of the difficulty in the strong law, as in the proof above, comes from the fact that X can be unbounded (which is why we need to mess around with truncations, as well as the use of the first or zeroth moment method to deal with the large values of X).
20 June, 2008 at 11:47 am
PROFESSOR CONSTANTINO DE SOUSA
VERY GOOD.
CUMPLIMENTS
PROFESSOR CONSTANTINO DE SOUSA
20 June, 2008 at 4:38 pm
Anonymous
There is some philosophy related to the laws of large numbers which I don’t really understand, and which I’d be happy to have explained to me. In theory, we are speaking about a sequence $X_1, X_2, \ldots$ of random variables on the same common probability space. In practice, we think of instantiating a random variable in a sequence of trials. This is a subtle point: each trial is itself a random variable, distributed exactly as the original one, and living in an identical, but separate and independent, probability space. When we toss a die $n$ times, there are $6^n$ possible outcomes. This means, we are no longer in our original six-element space, but in its cartesian power, with $6^n$ points. That is, tossing a die $n$ times is described by the probability space which is the product of $n$ copies of the space, corresponding to tossing just one die. So, our $X_i$ actually live in different spaces.
How does this agree with the theoretical assumption that the $X_i$’s are defined on the same space? To make this more specific: say, what is the “practical meaning” of pointwise convergence in the Strong law? When we think of trials, the variables $X_i$ live in distinct spaces; on what space do they converge pointwise?
On this occasion: the proof of Borel-Cantelli does not need Markov and indicators. What would it mean that an infinite number of $E_n$ are true with positive probability? The existence of an event $A$, with $\mathbb{P}(A) > 0$, such that $A$ is contained in infinitely many of the events $E_n$. But then the probability of each of these events is at least as large as that of $A$; hence the series, having infinitely many terms bounded away from 0, diverges.
20 June, 2008 at 5:29 pm
Terence Tao
Dear anonymous,
In elementary (finitary) probability theory, the sample space (or probability space) is often defined in textbooks as simply the set of all possible outcomes, as this is the simplest choice of space to work in for most finitary applications. When it comes to infinitary probability theory, though, it is better to take a more flexible and abstract viewpoint: the sample space is now allowed to be an abstract set, and each outcome corresponds to a separate event inside that set. For instance, if one is studying the flip of a single coin, the sample space $\Omega$ could be a two-element set {Heads, Tails}, but it doesn’t have to have just two elements; it could be a much larger set, partitioned into two disjoint subsets, the “Heads” event and the “Tails” event. For instance, the sample space could be the unit interval [0,1] with Lebesgue measure, and the Heads and Tails events could be the intervals [0,1/2) and [1/2,1] respectively.
For the purposes of probability theory, the exact size of the sample space does not matter; the only thing that matters is the algebra (or more precisely, $\sigma$-algebra) of events and the probability (or measure) which is assigned to each event; the actual points in the sample space are in fact largely irrelevant to probability theory. (See also the notion of equivalence of measure spaces, as defined for instance in Lecture 11 of my 254A course. Equivalent probability spaces may have very different cardinalities, but are indistinguishable from each other for the purposes of doing probability theory.)
To construct a suitable sample space to hold an infinite collection of random variables, such as an infinite sequence of independent die rolls, one can take an inverse limit of the sample spaces associated to finite sequences of die rolls. One could also resort to more ad hoc devices, such as taking the sample space to be the unit interval $[0,1]$ with Lebesgue measure (which is the standard sample space for selecting a random variable x uniformly at random from the unit interval) and then defining the value $X_i$ of the i^th die roll to be the i^th digit of x base 6, plus one. In this space, the law of large numbers has this interpretation: when one selects a number at random from the unit interval, then almost surely, its base 6 digits are uniformly distributed amongst 0,1,2,3,4,5.
Incidentally, your argument for the Borel-Cantelli lemma does not quite work; the assertion “with positive probability, an infinite number of the events $E_n$ are true” is not the same as “an infinite number of the events $E_n$ are simultaneously true with positive probability”, because at different points in the sample space, a different (but still infinite) collection of events could be true. For instance, to go back to the infinite die roll example, let $E_n$ be the event that the n^th die roll is a 1. Then any infinite collection of events $E_{n_1}, E_{n_2}, \ldots$ has an intersection that has zero probability; for instance, the probability that the odd die rolls are all 1 is zero. Equivalently, as you say, no event A of positive probability is contained in infinitely many of the $E_n$. Nevertheless, it is true almost surely that infinitely many of the $E_n$ are true; if one rolls an infinite number of dice, one will almost surely roll an infinite number of ones. But the exact set of rolls that yield these ones varies from outcome to outcome; indeed, there are an uncountable number of possible sets of rolls, which is why one can represent an event of probability 1 as the union of events of probability 0. (This is, incidentally, one way to prove that the reals are uncountable, though certainly not the most direct way, as it requires one to first construct Lebesgue measure.)
20 June, 2008 at 6:40 pm
Anonymous
Got both your points, many thanks!
21 June, 2008 at 7:28 pm
Anonymous
It’s me again – the very same Anonymous.
> Incidentally, your argument for the Borel-Cantelli lemma does not quite work
How about this? Fix $\varepsilon > 0$ and find $N$ such that $\sum_{n \geq N} \mathbb{P}(E_n) < \varepsilon$. For $N$ or more of the events $E_n$ to occur, at least one of $E_N, E_{N+1}, \ldots$ must occur, and the probability of this is less than $\varepsilon$. We are done!
21 June, 2008 at 7:58 pm
Terence Tao
Yes, this works too; in the notation of the above post, this is a “zeroth moment method” argument rather than a “first moment method” argument. (The first moment method argument gives a more precise upper bound on the probability that at least k events are true for any given k, though.)
24 June, 2008 at 4:00 am
Giovanni Peccati
Dear Terence,
thank you for this beautiful website!
Just a quick note: there exists indeed a series of asymptotic results for partial sums of iid random variables that are known in the literature as “Almost Sure Central Limit Theorems”. This kind of theorems involve some (weighted) “logarithmic” versions of the empirical measures associated with the underlying partial sums. They hold almost surely, and they have a Gaussian probability measure as a limit object. A paper containing several general results in this direction is the following
http://www.springerlink.com/content/v3u54005214j853n/
Best!
Giovanni Peccati
24 June, 2008 at 8:54 am
toomuchcoffeeman
Dear Giovanni,
As the person who (implicitly) asked if there were versions of the CLT akin to the strong law of large numbers: thank you for your post and the link (the results you describe, involving empirical measures, are what I was thinking of, although I didn’t know how to express it).
27 June, 2008 at 11:49 am
Yashar
Dear Terry,
I also share Anonymous’s philosophical/practical concern about the usefulness of the SLLN. Let me clarify: although I understand the mathematical meaning of almost sure pointwise convergence for functions in some abstract measure space, I don’t think SLLN has much value for the ‘usual’ application of the LLN, namely in relating the ‘empirical’ (i.e. trial-wise — or timewise in ergodic theory) averages to expectation values calculated in probability theory. This is so because in ‘reality’ we are only dealing with FINITELY many (say iid) samples from a probability distribution. In this setting, pointwise convergence is meaningless as you need the whole infinite series to be able to talk about convergence, but convergence in probability still makes sense (especially, if we interpret the probability of deviation in WLLN, in the Bayesian degree-of-belief sense as opposed to a frequentist sense).
However, I admit that the ‘usual’ applications may not comprise all useful applications. Your pretty example, where you map the infinite dice-rolling sequence to [0,1], is certainly one such application. However, any generalization of the intuition gained there to cases where the probability space (and thus the set of ‘digits’ of your, now, super-real numbers) are uncountable are beyond me. But I’m aware that these problems are pretty much the very reason why measure theory was invented and that e.g. Kolmogorov’s theorem resolves the issue of constructing the infinite cartesian product of uncountable spaces (am I right?).
Maybe all of these concerns are too ‘philosophical’, and somewhat smell of the ancient problems with infinity, uncountability, etc, which you might find boring. But i’ll be glad to read your views about them if you don’t. More importantly, I’ll be grateful if you could point out some (important) applications of SLLN, whether ‘usual’ in the above sense or otherwise, where WLLN is useless. I’d rather the applications be in some sense intrinsically probabilistic rather than purely ‘geometric’/functional-analytic though.
29 June, 2008 at 9:56 am
Terence Tao
Dear Yashar,
Of course, any mathematical statement involving infinity will not have any direct application to the finitary situations encountered in the physical world, but any “qualitative” or “infinitary” statement in mathematics often tends to have “quantitative” or “finitary” counterparts which do have such an application, or at least gives some useful intuition on how to view these finitary contexts. (See for instance my blog post on this issue.)
Returning specifically to the question of finitary interpretations of the SLLN, these basically have to do with the situation in which one is simultaneously considering multiple averages $\overline{X}_n$ of a single series of empirical samples, as opposed to considering just a single such average (which is basically the situation covered by the WLLN). For instance, if one had some random intensity field of grayscale pixels, and wanted to compare the average intensities at 10 x 10 blocks, 100 x 100 blocks, and 1000 x 1000 blocks, then the SLLN suggests that these intensities would be likely to be simultaneously close to the average intensity. (The WLLN only suggests that each of these spatial averages is individually likely to be close to the average intensity, but does not preclude the possibility that when one considers multiple such spatial averages at once, a few outlying spatial averages will deviate from the average intensity. In my example with only three different averages, there isn’t much difference here, as the union bound only loses a factor of three at most for the failure probability, but the SLLN begins to show its strength over the WLLN when one is considering a very large number of averages at once.)
9 December, 2008 at 11:40 am
Anonymous
Prof. Tao,
Suppose, for example, that $S=\sum_{i=1}^{N} X_i$, where N is a random variable taking values from a fairly large integer to infinity. Since S can be rewritten as $S = N \bar X_N$, is it safe to say that $S \approx N E(N)$?
4 January, 2009 at 2:15 pm
K
What is a good book to learn probability theory from if I already know measure theory (at the level of Rudin’s Real and Complex Analysis)?
4 January, 2009 at 3:34 pm
Terence Tao
Dear K,
I guess it depends on what you plan to use probability for. I myself tend to use probability for combinatorial purposes, and so I actually don’t use the measure-theoretic foundations of probability so heavily (in many combinatorial applications, one can in fact just work with discrete random variables). For such applications, Alon-Spencer’s “the probabilistic method” is very nice.
For the formal measure-theoretic foundations of probability, I have occasionally used Kallenberg’s “Foundations of modern probability”, which is quite thorough.
12 January, 2009 at 7:34 pm
Anonymous
Dear Professor Tao,
Would you know of a good book/resource where one can look up all the LLNs that exist under varying assumptions?
Thank you.
17 February, 2009 at 9:35 pm
Duc
Dear Professor Tao,
Would it still be sufficient if the X_i are not identically distributed?
Does the law still hold if we trade the identical distribution and independence of those X’s for sup E(X_i^2) bounded and the X’s uncorrelated?
I’m reading the book Hilbert Space Methods in Probability and Statistical Inference (http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471592811.html) and wonder if we could prove it by using vectors in Hilbert space.
Thank you
10 March, 2009 at 7:45 pm
Vivek
How does the weak law of large numbers follow from the strong law by the dominated convergence theorem? I always thought it was Egoroff’s theorem, but I think you have a simpler proof, Dr Tao.
Vivek
10 March, 2009 at 7:59 pm
Terence Tao
Dear Vivek,
You are right, Egoroff’s theorem is a better citation here. (Dominated convergence works directly when the random variables are bounded, and for unbounded random variables one can apply dominated convergence to the level sets to get the strong law from the weak, but one may as well just jump straight to Egoroff in the latter case.)
28 April, 2009 at 10:04 pm
lutfu
Dear Prof. Tao,
If we have i.i.d. rvs with mean zero, then what is the a.s. limit of $\frac{X_1 + \ldots + X_n}{n^a}$ for different values of a less than 1?
30 April, 2009 at 9:50 am
Terence Tao
Dear lutfu,
Assuming the random variables have bounded second moment, the limit will be almost surely zero for $a > 1/2$, converge (in a distributional sense) to a normal distribution for $a = 1/2$, and diverge to infinity almost surely for $a < 1/2$, all thanks to the (strong) law of large numbers. For heavy-tailed random variables with infinite second moment, the situation is going to be more complicated, but can be worked out for any specific distribution by a variety of tools (e.g. Fourier analysis).
30 April, 2009 at 1:39 pm
lutfu
Dear Prof Tao,
Thank you for your help, but I am wondering: can we explicitly characterize the a.s. limit of $\frac{X_1 + \ldots + X_n}{n^a}$ for any real number a, just assuming the existence of the first moment?
I know we have the law of the iterated logarithm, but it assumes existence of the second moment.
thanks