In Notes 0, we introduced the notion of a measure space , which includes as a special case the notion of a probability space. By selecting one such probability space
as a sample space, one obtains a model for random events and random variables, with random events
being modeled by measurable sets
in
, and random variables
taking values in a measurable space
being modeled by measurable functions
. We then defined some basic operations on these random events and variables:
- Given events
, we defined the conjunction
, the disjunction
, and the complement
. For countable families
of events, we similarly defined
and
. We also defined the empty event
and the sure event
, and what it meant for two events to be equal.
- Given random variables
in ranges
respectively, and a measurable function
, we defined the random variable
in range
. (As the special case
of this, every deterministic element
of
was also a random variable taking values in
.) Given a relation
, we similarly defined the event
. Conversely, given an event
, we defined the indicator random variable
. Finally, we defined what it meant for two random variables to be equal.
- Given an event
, we defined its probability
.
These operations obey various axioms; for instance, the boolean operations on events obey the axioms of a Boolean algebra, and the probability function obeys the Kolmogorov axioms. However, we will not focus on the axiomatic approach to probability theory here, instead basing the foundations of probability theory on the sample space models as discussed in Notes 0. (But see this previous post for a treatment of one such axiomatic approach.)
It turns out that almost all of the other operations on random events and variables we need can be constructed in terms of the above basic operations. In particular, this allows one to safely extend the sample space in probability theory whenever needed, provided one uses an extension that respects the above basic operations; this is an important operation when one needs to add new sources of randomness to an existing system of events and random variables, or to couple together two separate such systems into a joint system that extends both of the original systems. We gave a simple example of such an extension in the previous notes, but now we give a more formal definition:
Definition 1 Suppose that we are using a probability space
as the model for a collection of events and random variables. An extension of this probability space is a probability space
, together with a measurable map
(sometimes called the factor map) which is probability-preserving in the sense that
for all
. (Caution: this does not imply that
for all
– why not?)
An event
which is modeled by a measurable subset
in the sample space
, will be modeled by the measurable set
in the extended sample space
. Similarly, a random variable
taking values in some range
that is modeled by a measurable function
in
, will be modeled instead by the measurable function
in
. We also allow the extension
to model additional events and random variables that were not modeled by the original sample space
(indeed, this is one of the main reasons why we perform extensions in probability in the first place).
Thus, for instance, the sample space in Example 3 of the previous post is an extension of the sample space
in that example, with the factor map
given by the first coordinate projection
. One can verify that all of the basic operations on events and random variables listed above are unaffected by the above extension (with one caveat, see remark below). For instance, the conjunction
of two events can be defined via the original model
by the formula
or via the extension via the formula
The two definitions are consistent with each other, thanks to the obvious set-theoretic identity
Similarly, the assumption (1) is precisely what is needed to ensure that the probability of an event remains unchanged when one replaces a sample space model with an extension. We leave the verification of preservation of the other basic operations described above under extension as exercises to the reader.
Remark 2 There is one minor exception to this general rule if we do not impose the additional requirement that the factor map
is surjective. Namely, for non-surjective
, it can become possible that two events
are unequal in the original sample space model, but become equal in the extension (and similarly for random variables), although the converse never happens (events that are equal in the original sample space always remain equal in the extension). For instance, let
be the discrete probability space
with
and
, and let
be the discrete probability space
with
, and non-surjective factor map
defined by
. Then the event modeled by
in
is distinct from the empty event when viewed in
, but becomes equal to that event when viewed in
. Thus we see that extending the sample space by a non-surjective factor map can identify previously distinct events together (though of course, being probability preserving, this can only happen if those two events were already almost surely equal anyway). This turns out to be fairly harmless though; while it is nice to know if two given events are equal, or if they differ by a non-null event, it is almost never useful to know that two events are unequal if they are already almost surely equal. Alternatively, one can add the additional requirement of surjectivity in the definition of an extension, which is also a fairly harmless constraint to impose (this is what I chose to do in this previous set of notes).
Roughly speaking, one can define probability theory as the study of those properties of random events and random variables that are model-independent in the sense that they are preserved by extensions. For instance, the cardinality of the model
of an event
is not a concept within the scope of probability theory, as it is not preserved by extensions: continuing Example 3 from Notes 0, the event
that a die roll
is even is modeled by a set
of cardinality
in the original sample space model
, but by a set
of cardinality
in the extension. Thus it does not make sense in the context of probability theory to refer to the “cardinality of an event
”.
On the other hand, the supremum of a collection of random variables
in the extended real line
is a valid probabilistic concept. This can be seen by manually verifying that this operation is preserved under extension of the sample space, but one can also see this by defining the supremum in terms of existing basic operations. Indeed, note from Exercise 24 of Notes 0 that a random variable
in the extended real line is completely specified by the threshold events
for
; in particular, two such random variables
are equal if and only if the events
and
are surely equal for all
. From the identity
we thus see that one can completely specify in terms of
using only the basic operations provided in the above list (and in particular using the countable conjunction
.) Of course, the same considerations hold if one replaces supremum, by infimum, limit superior, limit inferior, or (if it exists) the limit.
In this set of notes, we will define some further important operations on scalar random variables, in particular the expectation of these variables. In the sample space models, expectation corresponds to the notion of integration on a measure space. As we will need to use both expectation and integration in this course, we will thus begin by quickly reviewing the basics of integration on a measure space, although we will then translate the key results of this theory into probabilistic language.
As the finer details of the Lebesgue integral construction are not the core focus of this probability course, some of the details of this construction will be left to exercises. See also Chapter 1 of Durrett, or these previous blog notes, for a more detailed treatment.
— 1. Integration on measure spaces —
Let be a measure space, and let
be a measurable function on
, taking values either in the reals
, the non-negative extended reals
, the extended reals
, or the complex numbers
. We would like to define the integral
of on
. (One could make the integration variable explicit, e.g. by writing
, but we will usually not do so here.) When integrating a reasonably nice function (e.g. a continuous function) on a reasonably nice domain (e.g. a box in
), the Riemann integral that one learns about in undergraduate calculus classes suffices for this task; however, for the purposes of probability theory, we need the much more general notion of a Lebesgue integral in order to properly define (2) for the spaces
and functions
we will need to study.
Not every measurable function can be integrated by the Lebesgue integral. There are two key classes of functions for which the integral exists and is well behaved:
- Unsigned measurable functions
, that take values in the non-negative extended reals
; and
- Absolutely integrable functions
or
, which are scalar measurable functions whose absolute value
has a finite integral:
. (Sometimes we also allow absolutely integrable functions to attain an infinite value
, so long as they only do so on a set of measure zero.)
One could in principle extend the Lebesgue integral to slightly more general classes of functions, e.g. to sums of absolutely integrable functions and unsigned functions. However, the above two classes already suffice for most applications (and as a general rule of thumb, it is dangerous to apply the Lebesgue integral to functions that are not unsigned or absolutely integrable, unless you really know what you are doing).
We will construct the Lebesgue integral in the following four stages. First, we will define the Lebesgue integral just for unsigned simple functions – unsigned measurable functions that take on only finitely many values. Then, by a limiting procedure, we extend the Lebesgue integral to unsigned functions. After that, by decomposing a real absolutely integrable function into unsigned components, we extend the integral to real absolutely integrable functions. Finally, by taking real and imaginary parts, we extend to complex absolutely integrable functions. (This is not the only order in which one could perform this construction; for instance, in Durrett, one first constructs integration of bounded functions on finite measure support before passing to arbitrary unsigned functions.)
First consider an unsigned simple function , thus
is measurable and takes only finitely many values. Then we can express
as a finite linear combination (in
) of indicator functions. Indeed, if we enumerate the values that
takes as
(avoiding repetitions) and set
for
, then it is clear that
(It should be noted at this point that the operations of addition and multiplication on are defined by setting
for all
, and
for all positive
, but that
is defined to equal
. To put it another way, multiplication is defined to be continuous from below, rather than from above:
. One can verify that the commutative, associative, and distributive laws continue to hold on
, but we caution that the cancellation laws do not hold when
is involved.)
Conversely, given any coefficients (not necessarily distinct) and measurable sets
in
(not necessarily disjoint), the sum
is an unsigned simple function.
A single simple function can be decomposed in multiple ways as a linear combination of indicator functions. For instance, on the real line , the function
can also be written as
or as
. However, there is an invariant of all these decompositions:
Exercise 3 Suppose that an unsigned simple function
has two representations as the linear combination of indicator functions:
where
are nonnegative integers,
lie in
, and
are measurable sets. Show that
(Hint: first handle the special case where the
are all disjoint and non-empty, and each of the
is expressible as the union of some subcollection of the
. Then handle the general case by considering the atoms of the finite boolean algebra generated by
and
.)
We capture this invariant by introducing the simple integral of an unsigned simple function by the formula
whenever admits a decomposition
. The above exercise is then precisely the assertion that the simple integral is well-defined as an element of
.
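As a concrete illustration, here is a short Python sketch (my own, not part of the original notes; the toy measure space, the weights and sets, and the helper simple_integral are all invented for the example). It evaluates the simple integral from two different decompositions of the same unsigned simple function and confirms that they agree, as Exercise 3 guarantees.

```python
import numpy as np

# A toy finite measure space: Omega = {0,...,9}, with mu({k}) = 0.1 * (k + 1).
omega = np.arange(10)
mu = 0.1 * (omega + 1)

def simple_integral(coeffs, sets):
    # simple integral of f = sum_i c_i 1_{E_i}, namely sum_i c_i * mu(E_i)
    return sum(c * mu[list(E)].sum() for c, E in zip(coeffs, sets))

# Two different decompositions of the same simple function
# (f = 1 on {0,...,4} and f = 3 on {5,...,9}):
decomp_a = ([1.0, 3.0], [set(range(0, 5)), set(range(5, 10))])
decomp_b = ([1.0, 2.0], [set(range(0, 10)), set(range(5, 10))])  # 1*1_Omega + 2*1_{5,...,9}

print(simple_integral(*decomp_a))  # 13.5
print(simple_integral(*decomp_b))  # 13.5 again, as Exercise 3 predicts
```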
Exercise 4 Let
be unsigned simple functions, and let
.
- (i) (Linearity) Show that
and
- (ii) Show that if
and
are equal almost everywhere, then
- (iii) Show that
, with equality if and only if
is zero almost everywhere.
- (iv) (Monotonicity) If
almost everywhere, show that
.
- (v) (Markov inequality) Show that
for any
.
Now we extend from unsigned simple functions to more general unsigned functions. If is an unsigned measurable function, we define the unsigned integral
as
where the supremum is over all unsigned simple functions such that for all
.
Many of the properties of the simple integral carry over to the unsigned integral easily:
Exercise 5 Let
be unsigned functions, and let
.
- (i) (Superadditivity) Show that
and
- (ii) Show that if
and
are equal almost everywhere, then
- (iii) Show that
, with equality if and only if
is zero almost everywhere.
- (iv) (Monotonicity) If
almost everywhere, show that
.
- (v) (Markov inequality) Show that
for any
. In particular, if
, then
is finite almost everywhere.
- (vi) (Compatibility with simple integral) If
is simple, show that
.
- (vii) (Compatibility with measure) For any measurable set
, show that
.
Exercise 6 If
is a discrete probability space (with the associated probability measure
), and
is a function, show that
(Note that the condition
in the definition of a discrete probability space is not required to prove this identity.)
The observant reader will notice that the linearity property of the simple integral has been weakened to superadditivity. This can be traced back to a breakdown of symmetry in the definition (3); the unsigned integral of is defined via approximation from below, but not from above. Indeed the opposite claim
can fail. For a counterexample, take to be the discrete probability space
with probabilities
, and let
be the function
. By Exercise 6 we have
. On the other hand, any simple function
with
must equal
on a set of positive measure (why?) and so the right-hand side of (4) can be infinite. However, one can get around this difficulty under some further assumptions on
, and thus recover full linearity for the unsigned integral:
Exercise 7 (Linearity of the unsigned integral) Let
be a measure space.
- (i) Let
be an unsigned measurable function which is both bounded (i.e., there is a finite
such that
for all
) and has finite measure support (i.e., there is a measurable set
with
such that
for all
). Show that (4) holds for this function
.
- (ii) Establish the additivity property
whenever
are unsigned measurable functions that are bounded with finite measure support.
- (iii) Show that
as
whenever
is unsigned measurable.
- (iv) Using (iii), extend (ii) to the case where
are unsigned measurable functions with finite measure support, but are not necessarily bounded.
- (v) Show that
as
whenever
is unsigned measurable.
- (vi) Using (iii) and (v), show that (ii) holds for any unsigned measurable
(which are not necessarily bounded or of finite measure support).
Next, we apply the integral to absolutely integrable functions. We call a scalar function or
absolutely integrable if it is measurable and the unsigned integral
is finite. A real-valued absolutely integrable function
can be expressed as the difference
of two unsigned absolutely integrable functions
; indeed, one can check that the choice
and
work for this. Conversely, any difference
of unsigned absolutely integrable functions
is absolutely integrable (this follows from the triangle inequality
). A single absolutely integrable function
may be written as a difference
of unsigned absolutely integrable functions in more than one way, for instance we might have
for unsigned absolutely integrable functions . But when this happens, we can rearrange to obtain
and thus by linearity of the unsigned integral
By the absolute integrability of , all the integrals are finite, so we may rearrange this identity as
This allows us to define the Lebesgue integral of a real-valued absolutely integrable function
to be the expression
for any given decomposition of
as the difference of two unsigned absolutely integrable functions. Note that if
is both unsigned and absolutely integrable, then the unsigned integral and the Lebesgue integral of
agree (as can be seen by using the decomposition
), and so there is no ambiguity in using the same notation
to denote both integrals. (By the same token, we may now drop the modifier
from the simple integral of a simple unsigned
, which we may now also denote by
.)
The Lebesgue integral also enjoys good linearity properties:
Exercise 8 Let
be real-valued absolutely integrable functions, and let
.
- (i) (Linearity) Show that
and
are also real-valued absolutely integrable functions, with
and
(For the second relation, one may wish to first treat the special cases
and
.)
- (ii) Show that if
and
are equal almost everywhere, then
- (iii) Show that
, with equality if and only if
is zero almost everywhere.
- (iv) (Monotonicity) If
almost everywhere, show that
.
- (v) (Markov inequality) Show that
for any
.
Because of part (iii) of the above exercise, we can extend the Lebesgue integral to real-valued absolutely integrable functions that are only defined and real-valued almost everywhere, rather than everywhere. In particular, we can apply the Lebesgue integral to functions that are sometimes infinite, so long as they are only infinite on a set of measure zero, and the function is absolutely integrable everywhere else.
Finally, we extend to complex-valued functions. If is absolutely integrable, observe that the real and imaginary parts
are also absolutely integrable (because
). We then define the (complex) Lebesgue integral
of
in terms of the real Lebesgue integral by the formula
Clearly, if is real-valued and absolutely integrable, then the real Lebesgue integral and the complex Lebesgue integral of
coincide, so it does not create ambiguity to use the same symbol
for both concepts. It is routine to extend the linearity properties of the real Lebesgue integral to its complex counterpart:
Exercise 9 Let
be complex-valued absolutely integrable functions, and let
.
- (i) (Linearity) Show that
and
are also complex-valued absolutely integrable functions, with
and
(For the second relation, one may wish to first treat the special cases
and
.)
- (ii) Show that if
and
are equal almost everywhere, then
- (iii) Show that
, with equality if and only if
is zero almost everywhere.
- (iv) (Markov inequality) Show that
for any
.
We record a simple, but incredibly fundamental, inequality concerning the Lebesgue integral:
Lemma 10 (Triangle inequality) If
is a complex-valued absolutely integrable function, then
Proof: We have
This looks weaker than what we want to prove, but we can “amplify” this inequality to the full strength triangle inequality as follows. Replacing by
for any real
, we have
Since we can choose the phase to make the expression
equal to
, the claim follows.
Finally, we observe that the Lebesgue integral extends the Riemann integral, which is particularly useful when it comes to actually computing some of these integrals:
Exercise 11 If
is a Riemann integrable function on a compact interval
, show that
is also absolutely integrable, and that the Lebesgue integral
(with
Lebesgue measure restricted to
) coincides with the Riemann integral
. Similarly if
is Riemann integrable on a box
.
— 2. Expectation of random variables —
We now translate the above notions of integration on measure spaces to the probabilistic setting.
A random variable taking values in the unsigned extended real line
is said to be simple if it takes on at most finitely many values. Equivalently,
can be expressed as a finite unsigned linear combination
of indicator random variables, where are unsigned and
are events. We then define the simple expectation
of
to be the quantity
and one checks that this definition is independent of the choice of decomposition of into indicator functions. Observe that if we model the random variable
using a probability space
, then the simple expectation of
is precisely the simple integral of the corresponding unsigned simple function
.
Next, given an arbitrary unsigned random variable taking values in
, modeled by a probability space
one defines its (unsigned) expectation
as
where ranges over all simple unsigned random variables modeled by the same space
such that
is surely true. There is a subtle issue that this definition might in principle depend on the choice of
(because this affects the pool of available simple unsigned random variables
), but this is not the case:
Exercise 12 Let
be a random variable taking values in
, and let
be a simple unsigned random variable such that
is surely true. Show that there exists a function
taking on finitely many values such that
is surely true. Conclude in particular that the above definition of expectation does not depend on the choice of model
.
The expectation extends the simple expectation (thus for all simple unsigned
), and in terms of a probability space model
, the expectation
is precisely the unsigned integral of
. The expectation of a random variable is also often referred to as the mean, particularly in applications connected to statistics. In some literature
is also called the expected value of
, but this is a somewhat misleading term as often one expects
to deviate above or below
.
A scalar random variable is said to be absolutely integrable if
, thus for instance any bounded random variable is absolutely integrable. If
is real-valued and absolutely integrable, we define its expectation by the formula
where is any representation of
as the difference of unsigned absolutely integrable random variables
; one can check that this definition does not depend on the choice of representation and is thus well-defined. For complex-valued absolutely integrable
, we then define
In all of these cases, the expectation of is equal to the integral of the representation
of
in any probability space model; in the case that
is given by a discrete probability model, one can check that this definition of expectation agrees with the one given in Notes 0. Using the former fact, we can translate the properties of integration already established to the probabilistic setting:
Proposition 13
- (i) (Unsigned linearity) If
are unsigned random variables, and
is a deterministic unsigned quantity, then
and
. (Note that these identities hold even when
are not absolutely integrable.)
- (ii) (Complex linearity) If
are absolutely integrable random variables, and
is a deterministic complex quantity, then
and
are also absolutely integrable, with
and
.
- (iii) (Compatibility with probability) If
is an event, then
. In particular,
.
- (iv) (Almost sure equivalence) If
are unsigned (resp. absolutely integrable) and
almost surely, then
.
- (v) If
is unsigned or absolutely integrable, then
, with equality if and only if
almost surely.
- (vi) (Monotonicity) If
are unsigned or real-valued absolutely integrable, and
almost surely, then
.
- (vii) (Markov inequality) If
is unsigned or absolutely integrable, then
for any deterministic
.
- (viii) (Triangle inequality) If
is absolutely integrable, then
.
As before, we can use part (iv) to define expectation of scalar random variables that are only defined and finite almost surely, rather than surely.
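Informally, one can also probe several of the properties in Proposition 13 by Monte Carlo simulation, replacing each expectation by an empirical average over many independent draws. The Python sketch below is my own illustration; the exponential and uniform distributions and the threshold are arbitrary choices, and the agreement is of course only up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
X = rng.exponential(scale=2.0, size=n)   # an unsigned random variable with E X = 2
Y = rng.uniform(0.0, 1.0, size=n)        # another unsigned variable with E Y = 1/2
c = 3.0

# (i)/(ii) linearity: E[X + Y] = E X + E Y and E[cX] = c E X
print(np.mean(X + Y), np.mean(X) + np.mean(Y))
print(np.mean(c * X), c * np.mean(X))

# (vii) Markov inequality: P(X >= lam) <= (E X) / lam
lam = 5.0
print(np.mean(X >= lam), np.mean(X) / lam)

# (viii) triangle inequality: |E Z| <= E|Z| for the signed variable Z = X - Y
Z = X - Y
print(abs(np.mean(Z)), np.mean(np.abs(Z)))
```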
From Exercise 12, the notion of expectation is automatically probabilistic in the same sense. Because of this, we will be easily able to manipulate expectations of random variables without having to explicitly mention an underlying probability space , and so one will now see such spaces fade from view starting from this point in the course.
— 3. Exchanging limits with integrals or expectations —
When performing analysis on measure spaces, it is important to know if one can interchange a limit with an integral:
Similarly, in probability theory, we often wish to interchange a limit with an expectation:
Of course, one needs the integrands or random variables to be either unsigned or absolutely integrable, and the limits to be well-defined to have any hope of doing this. Naively, one could hope that limits and integrals could always be exchanged when the expressions involved are well-defined, but this is unfortunately not the case. In the case of integration on, say, the real line using Lebesgue measure
, we already see four key examples:
- (Moving bump example) Take
. Then
, but
.
- (Spreading bump example) Take
. Then
, but
.
- (Concentrating bump example) Take
. Then
, but
.
- (Receding infinity example) Take
. Then
, but
.
In all these examples, the limit of the integral exceeds the integral of the limit; by replacing with
in the first three examples (which involve absolutely integrable functions) one can also build examples where the limit of the integral is less than the integral of the limit. Most of these examples rely on the infinite measure of the real line and thus do not directly have probabilistic analogues, but the concentrating bump example involves functions that are all supported on the unit interval
and thus also poses a problem in the probabilistic setting.
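For a crude numerical look at the first three examples, one can discretise the integrals by grid sums. The Python sketch below is my own; it assumes the standard choices f_n = 1_{[n,n+1]}, f_n = (1/n) 1_{[0,n]} and f_n = n 1_{(0,1/n]} for the moving, spreading and concentrating bumps (the exact formulas are not visible in the text above), and uses a Riemann sum as a stand-in for the Lebesgue integral.

```python
import numpy as np

def grid_integral(f, a, b, pts=200000):
    # crude Riemann-sum stand-in for the Lebesgue integral of f over [a, b]
    x = np.linspace(a, b, pts, endpoint=False)
    return f(x).sum() * (b - a) / pts

for n in [1, 10, 100, 1000]:
    moving = lambda x, n=n: ((x >= n) & (x <= n + 1)).astype(float)
    spreading = lambda x, n=n: (1.0 / n) * ((x >= 0) & (x <= n)).astype(float)
    concentrating = lambda x, n=n: n * ((x > 0) & (x <= 1.0 / n)).astype(float)
    print(n,
          round(grid_integral(moving, 0, 2000), 3),
          round(grid_integral(spreading, 0, 2000), 3),
          round(grid_integral(concentrating, 0, 2), 3))

# Each integral stays close to 1 for every n, while each f_n tends to 0 pointwise,
# so the limit of the integrals (1) exceeds the integral of the limit (0).
```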
Nevertheless, there are three important cases in which we can relate the limit (or, in the case of Fatou’s lemma, the limit inferior) of the integral to the integral of the limit (or limit inferior). Informally, they are:
- (Fatou’s lemma) For unsigned
, the integral of the limit inferior cannot exceed the limit inferior of the integral. “Limits (or more precisely, limits inferior) can destroy (unsigned) mass, but cannot create it.”
- (Monotone convergence theorem) For unsigned monotone increasing
, the limit of the integral equals the integral of the limit.
- (Dominated convergence theorem) For
that are uniformly dominated by an absolutely integrable function, the limit of the integral equals the integral of the limit.
These three results then have analogues for convergence of random variables. We will also mention a fourth useful tool in that setting, which allows one to exchange limits and expectations when one controls a higher moment. There are a few more such general results allowing limits to be exchanged with integrals or expectations, but my advice would be to work out such exchanges by hand rather than blindly cite (possibly incorrectly) an additional convergence theorem beyond the four mentioned above, as this is safer and will help strengthen one’s intuition on the situation.
We now state and prove these results more explicitly.
Lemma 14 (Fatou’s lemma) Let
be a measure space, and let
be a sequence of unsigned measurable functions. Then
An equivalent form of this lemma is that if one has
for some and all sufficiently large
, then one has
as well. That is to say, if the original unsigned functions eventually have “mass” less than or equal to
, then the limit (inferior)
also has “mass” less than or equal to
. The limit may have substantially less mass, as the four examples above show, but it can never have more mass (asymptotically) than the functions that comprise the limit. Of course, one can replace limit inferior by limit in the left or right hand side if one knows that the relevant limit actually exists (but one cannot replace limit inferior by limit superior if one does not already have convergence, see Example 16 below). On the other hand, it is essential that the
are unsigned for Fatou’s lemma to work, as can be seen by negating one of the first three key examples mentioned above.
Proof: By definition of the unsigned integral, it suffices to show that
whenever is an unsigned simple function with
. At present,
is allowed to take the infinite
, but it suffices to establish this claim for
that only take finite values, since the claim then follows for possibly infinite-valued
by applying the claim with
replaced by
and then letting
go to infinity.
Multiplying by , it thus suffices to show that
for any and any unsigned
as above.
We can write as the sum
for some strictly positive finite
and disjoint
; we allow the
and the measures
to be infinite. On each
, we have
. Thus, if we define
then the increase to
as
:
. By continuity from below (Exercise 23 of Notes 0), we thus have
as . Since
we conclude upon integration that
and thus on taking limit inferior
But the right-hand side is , and the claim follows.
Of course, Fatou’s lemma may be phrased probabilistically:
Lemma 15 (Fatou’s lemma for random variables) Let
be a sequence of unsigned random variables. Then
As a corollary, if are unsigned and converge almost surely to a random variable
, then
Example 16 We now give an example to show that limit inferior cannot be replaced with limit superior in Fatou’s lemma. Let
be drawn uniformly at random from
, and for each
, let
be the
binary digit of
, thus
when
has odd integer part, and
otherwise. (There is some ambiguity with the binary expansion when
is a terminating binary decimal, but this event almost surely does not occur and can thus be safely ignored.) One has
for all
(why?). It is then easy to see that
is almost surely
(which is consistent with Fatou’s lemma) but
is almost surely
(so Fatou’s lemma fails if one replaces limit inferior with limit superior).
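A quick simulation of this example (my own illustration; looking only at the first thirty digits can of course suggest, but not prove, the almost sure statements):

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.random(100000)   # X drawn uniformly from [0, 1)
N = 30                         # inspect the first N binary digits

# the n-th binary digit of x is floor(2^n x) mod 2
digits = (np.floor(samples[:, None] * 2.0 ** np.arange(1, N + 1)) % 2).astype(int)

print(digits.mean(axis=0)[:5])            # each digit has empirical mean close to 1/2
print(np.mean(digits.min(axis=1) == 0))   # ~1: almost every sample shows a digit 0 (liminf = 0)
print(np.mean(digits.max(axis=1) == 1))   # ~1: almost every sample shows a digit 1 (limsup = 1)
```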
Next, we establish the monotone convergence theorem.
Theorem 17 (Monotone convergence theorem) Let
be a measure space, and let
be a sequence of unsigned measurable functions which is monotone increasing, thus
for all
and
. Then
Note that the limits exist on both sides because monotone sequences always have limits. Indeed, the limit on either side is equal to the supremum. The receding infinity example shows that it is important that the functions here are monotone increasing rather than monotone decreasing. We also observe that it is enough for the to be increasing almost everywhere rather than everywhere, since one can then modify the
on a set of measure zero to be increasing everywhere, which does not affect the integrals on either side of this theorem.
Proof: From Fatou’s lemma we already have
On the other hand, from monotonicity we see that
for any natural number , and on taking limits as
we obtain the claim.
Note that continuity from below for measures (Exercise 23.3 of Notes 0) can be viewed as the special case of the monotone convergence theorem when the functions are all indicator functions.
An important corollary of the monotone convergence theorem is that one can freely interchange infinite sums with integrals for unsigned functions, that is to say
for any unsigned (not necessarily monotone). Indeed, to see this one simply applies the monotone convergence theorem to the partial sums
.
We of course can translate this into the probabilistic context:
Theorem 18 (Monotone convergence theorem for random variables) Let
be a monotone non-decreasing sequence of unsigned random variables. Then
Similarly, for any unsigned random variables
, we have
Again, it is sufficient for the to be non-decreasing almost surely. We note a basic but important corollary of this theorem, namely the (first) Borel-Cantelli lemma:
Lemma 19 (Borel-Cantelli lemma) Let
be a sequence of events with
. Then almost surely, at most finitely many of the events
hold; that is to say, one has
almost surely.
Proof: From the monotone convergence theorem, we have
By Markov’s inequality, this implies that is almost surely finite, as required.
As the above proof shows, the Borel-Cantelli lemma is almost a triviality if one has the machinery of expectation (or integration); but it is remarkably hard to prove the lemma without that machinery, and it is an instructive exercise to attempt to do so.
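As an informal illustration (not part of the notes), one can simulate independent events E_n with P(E_n) = 1/n^2, so that the probabilities are summable; independence is not needed for the lemma, but it makes the simulation easy. In every run only a handful of the events occur, consistent with the almost sure finiteness asserted by the lemma.

```python
import numpy as np

rng = np.random.default_rng(2)
runs, N = 1000, 5000
probs = 1.0 / np.arange(1, N + 1) ** 2      # P(E_n) = 1/n^2, a summable sequence
occurred = rng.random((runs, N)) < probs    # independent simulations of the events

counts = occurred.sum(axis=1)   # number of events occurring in each run
print(counts.mean())            # ~ sum_n 1/n^2 ~ 1.64, by linearity of expectation
print(counts.max())             # even the worst of the 1000 runs sees only a few events
```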
We will develop a partial converse to the above lemma (the “second” Borel-Cantelli lemma) in a subsequent set of notes. For now, we give a crude converse in which we assume not only that the sum to infinity, but they are in fact uniformly bounded from below:
Exercise 20 Let
be a sequence of events with
. Show that with positive probability, an infinite number of the
hold; that is to say,
. (Hint: if
for all
, establish the lower bound
for all
. Alternatively, one can apply Fatou’s lemma to the random variables
.)
Exercise 21 Let
be a sequence such that
. Show that there exist a sequence of events
modeled by some probability space
, such that
for all
, and such that almost surely infinitely many of the
occur. Thus we see that the hypothesis
in the Borel-Cantelli lemma cannot be relaxed.
Finally, we give the dominated convergence theorem.
Theorem 22 (Dominated convergence theorem) Let
be a measure space, and let
be measurable functions which converge pointwise to some limit. Suppose that there is an unsigned absolutely integrable function
which dominates the
in the sense that
for all
and all
. Then
In particular, the limit on the right-hand side exists.
Again, it will suffice for to dominate each
almost everywhere rather than everywhere, as one can upgrade this to everywhere domination by modifying each
on a set of measure zero. Similarly, pointwise convergence can be replaced with pointwise convergence almost everywhere. The domination of each
by a single function
implies that the integrals
are uniformly bounded in
, but this latter condition is not sufficient by itself to guarantee interchangeability of the limit and integral, as can be seen by the first three examples given at the start of this section.
Proof: By splitting into real and imaginary parts, we may assume without loss of generality that the are real-valued. As
is absolutely integrable, it is finite almost everywhere; after modification on a set of measure zero we may assume it is finite everywhere. Let
denote the pointwise limit of the
. From Fatou’s lemma applied to the unsigned functions
and
, we have
and
Rearranging this (taking crucial advantage of the finite nature of the , and hence
and
), we conclude that
and the claim follows.
Remark 23 Amusingly, one can use the dominated convergence theorem to give an (extremely indirect) proof of the divergence of the harmonic series
. For, if that series was convergent, then the function
would be absolutely integrable, and the spreading bump example described above would contradict the dominated convergence theorem. (Expert challenge: see if you can deconstruct the above argument enough to lower bound the rate of divergence of the harmonic series
.)
We again translate the above theorem to the probabilistic context:
Theorem 24 (Dominated convergence theorem for random variables) Let
be scalar random variables which converge almost surely to a limit
. Suppose there is an unsigned absolutely integrable random variable
such that
almost surely for each
. Then
As a corollary of the dominated convergence theorem for random variables we have the bounded convergence theorem: if are scalar random variables that converge almost surely to a limit
, and are almost surely bounded in magnitude by a uniform constant
, then we have
(In Durrett, the bounded convergence theorem is proven first, and then used to establish Fatou’s theorem and the dominated and monotone convergence theorems. The order in which one establishes these results – which are all closely related to each other – is largely a matter of personal taste.) A closely related corollary (which can also be established directly) is that if are scalar absolutely integrable random variables that converge uniformly to
(thus, for each
there is
such that
is surely true for all
), then
converges to
.
A further corollary of the dominated convergence theorem is that one has the identity
whenever are scalar random variables with
absolutely integrable (or equivalently, that
is finite).
Another useful variant of the dominated convergence theorem is
Theorem 25 (Convergence for random variables with bounded moment) Let
be scalar random variables which converge almost surely to a limit
. Suppose there is
and
such that
for all
. Then
This theorem fails for , as the concentrating bump example shows. The case
(that is to say, bounded second moment
) is already quite useful. The intuition here is that concentrating bumps are in some sense the only obstruction to interchanging limits and expectations, and these can be eliminated by hypotheses such as a bounded higher moment hypothesis or a domination hypothesis.
Proof: By taking real and imaginary parts we may assume that the (and hence
) are real-valued. For any natural number
, let
denote the truncation
of
to the interval
, and similarly define
. Then
converges pointwise to
, and hence by the bounded convergence theorem
On the other hand, we have
(why?) and thus on taking expectations and using the triangle inequality
where we are using the asymptotic notation to denote a quantity bounded in magnitude by
for an absolute constant
. Also, from Fatou’s lemma we have
so we similarly have
Putting all this together, we see that
Sending , we obtain the claim.
Remark 26 The essential point about the condition
was that the function
grew faster than linearly as
. One could accomplish the same result with any other function with this property, e.g. a hypothesis such as
would also suffice. The most natural general condition to impose here is that of uniform integrability, which encompasses the hypotheses already mentioned, but we will not focus on this condition here.
Exercise 27 (ScheffĂ©’s lemma) Let
be a sequence of absolutely integrable scalar random variables that converge almost surely to another absolutely integrable scalar random variable
. Suppose also that
converges to
as
. Show that
converges to zero as
. (Hint: there are several ways to prove this result, known as Scheffé’s lemma. One is to split
into two components
, such that
is dominated by
but converges almost surely to
, and
is such that
. Then apply the dominated convergence theorem.)
— 4. The distribution of a random variable —
We have seen that the expectation of a random variable is a special case of the more general notion of Lebesgue integration on a measure space. There is however another way to think of expectation as a special case of integration, which is particularly convenient for computing expectations. We first need the following definition.
Definition 28 Let
be a random variable taking values in a measurable space
. The distribution of
(also known as the law of
) is the probability measure
on
defined by the formula
for all measurable sets
; one easily sees from the Kolmogorov axioms that this is indeed a probability measure.
In the language of measure theory, the distribution on
is the push-forward of the probability measure
on the sample space
by the model
of
on that sample space.
Example 29 If
only takes on at most countably many values (and if every point in
is measurable), then the distribution
is the discrete measure that assigns each point
in the range of
a measure of
.
Example 30 If
is a real random variable with cumulative distribution function
, then
is the Lebesgue-Stieltjes measure associated to
. For instance, if
is drawn uniformly at random from
, then
is Lebesgue measure restricted to
. In particular, two real random variables are equal in distribution if and only if they have the same cumulative distribution function.
Example 31 If
and
are the results of two separate rolls of a fair die (as in Example 3 of Notes 0), then
and
are equal in distribution, but are not equal as random variables.
Remark 32 In the converse direction, given a probability measure
on a measurable space
, one can always build a probability space model and a random variable
represented by that model whose distribution is
. Indeed, one can perform the “tautological” construction of defining the probability space model to be
, and
to be the identity function
, and then one easily checks that
. Compare with Corollaries 26 and 29 of Notes 0. Furthermore, one can view this tautological model as a “base” model for random variables of distribution
as follows. Suppose one has a random variable
of distribution
which is modeled by some other probability space
, thus
is a measurable function such that
for all
. Then one can view the probability space
as an extension of the tautological probability space
using
as the factor map.
We say that two random variables are equal in distribution, and write
, if they have the same law:
, that is to say
for any measurable set
in the range. This definition makes sense even when
are defined on different sample spaces. Roughly speaking, the distribution captures the “size” and “shape” of the random variable, but not its “location” or how it relates to other random variables. We also say that
is a copy of
if they are equal in distribution. For instance, the two dice rolls in Example 3 of Notes 0 are copies of each other.
Theorem 33 (Change of variables formula) Let
be a random variable taking values in a measurable space
. Let
or
be a measurable scalar function (giving
or
the Borel
-algebra of course) such that either
, or that
. Then
Thus for instance, if is a real random variable, then
and more generally
for all ; furthermore, if
is unsigned or absolutely integrable, one has
The point here is that the integration is not over some unspecified sample space , but over a very explicit domain, namely the reals; we have “changed variables” to integrate over
instead of over
, with the distribution
representing the “Jacobian” factor that typically shows up in such change of variables formulae.
If is a scalar variable that only takes on at most countably many values
, the change of variables formula tells us that
if is unsigned or absolutely integrable.
Proof: First suppose that is unsigned and only takes on a finite number
of values. Then
and hence
as required.
Next, suppose that is unsigned but can take on infinitely many values. We can express
as the monotone increasing limit of functions
that only take a finite number of values; for instance we can define
to be
rounded down to the largest multiple of
less than both
and
. By the preceding computation, we have
and on taking limits as using the monotone convergence theorem we obtain the claim in this case.
Now suppose that is real-valued with
. We write
where
and
, then we have
and
for . Subtracting these two identities, we obtain the claim.
Finally, the case of complex-valued with
follows from the real-valued case by taking real and imaginary parts.
Example 34 Let
be the uniform distribution on
, then
for any Riemann integrable
; thus for instance
for any
.
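As a numerical aside (my own illustration), one can compare the "sample space side" and the "distribution side" of the change of variables formula for a uniform variable, using the monomials f(x) = x^k, whose integral over [0,1] against Lebesgue measure is 1/(k+1).

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.random(10**6)   # X drawn uniformly from [0, 1)

for k in range(1, 5):
    # sample-space side: empirical average of X^k
    # distribution side: integral of x^k over [0,1], i.e. 1/(k+1)
    print(k, np.mean(X**k), 1.0 / (k + 1))
```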
Remark 35 An alternate way to prove the change of variables formula is to observe that the formula is obviously true when one uses the tautological model
for
, and then the claim follows from the model-independence of expectation and the observation from Remark 32 that any other model for
is an extension of the tautological model.
Exercise 36 Let
be a measurable function with
. If one defines
for any Borel subset
of
by the formula
show that
is a probability measure on
with Stieltjes measure function
. If
is a real random variable with probability distribution
(in which case we call
a random variable with an absolutely continuous distribution, and
the probability density function (PDF) of
), show that
when either
is an unsigned measurable function, or
is measurable with
absolutely integrable (or equivalently, that
.
Exercise 37 Let
be a real random variable with the probability density function
of the standard normal distribution. Establish the Stein identity
whenever
is a continuously differentiable function with
and
both of polynomial growth (i.e., there exist constants
such that
for all
). There is a robust converse to this identity which underpins the basis of Stein’s method, discussed in this previous blog post. Use this identity recursively to establish the identities
when
is an odd natural number and
when
is an even natural number. (This quantity is also known as the double factorial
of
.)
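Here is a Monte Carlo check of the Stein identity and of the even Gaussian moments (my own illustration, not part of the exercise; the test function g(x) = x^3 is simply one convenient choice of polynomial growth).

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal(10**6)

# Stein identity E[X g(X)] = E[g'(X)] with g(x) = x^3, so g'(x) = 3 x^2; both sides ~ 3
print(np.mean(X * X**3), np.mean(3 * X**2))

# even moments of the standard normal: E[X^n] = (n - 1)!! = 1, 3, 15, 105, ...
for n in [2, 4, 6, 8]:
    double_factorial = int(np.prod(np.arange(n - 1, 0, -2)))
    print(n, np.mean(X**n), double_factorial)
```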
Exercise 38 Let
be a real random variable with cumulative distribution function
. Show that
for all
. If
is nonnegative, show that
for all
.
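The displayed formulas in Exercise 38 are not visible in the text above, so the following is only my reading of the second part, namely the "layer cake" identity E X = int_0^infty P(X > t) dt for nonnegative X; in any case that identity is a standard consequence of the theory and is easy to check numerically.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.exponential(scale=2.0, size=10**6)   # nonnegative, with E X = 2

# approximate the integral of P(X > t) over t by a Riemann sum on a truncated range
ts = np.linspace(0.0, 40.0, 4001)
Xs = np.sort(X)
tail = 1.0 - np.searchsorted(Xs, ts, side="right") / X.size   # empirical P(X > t)
print(tail.sum() * (ts[1] - ts[0]), np.mean(X))               # both close to 2
```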
— 5. Some basic inequalities —
The change of variables formula allows us, in principle at least, to compute the expectation of a scalar random variable as an integral. In very simple situations, for instance when
has one of the standard distributions (e.g. uniform, gaussian, binomial, etc.), this allows us to compute such expectations exactly. However, once one gets to more complicated situations, one usually does not expect to be able to evaluate the required integrals in closed form. In such situations, it is often more useful to have some general inequalities concerning expectation, rather than identities.
We therefore record here for future reference some basic inequalities concerning expectation that we will need in the sequel. We have already seen the triangle inequality
for absolutely integrable , and the Markov inequality
for arbitrary scalar and
(note the inequality is trivial if
is not absolutely integrable). Applying the triangle inequality to the difference
of two absolutely integrable random variables
, we obtain the variant
Thus, for instance, if is a sequence of absolutely integrable scalar random variables which converges in
to another absolutely integrable random variable
, in the sense that
as
, then
as
.
Similarly, applying the Markov inequality to the quantity we obtain the important Chebyshev inequality
for absolutely integrable and
, where the variance
of
is defined as
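A quick empirical look at the Chebyshev inequality in its standard form P(|X - E X| >= lambda) <= Var(X) / lambda^2 (the displayed inequality is not visible above; the normal distribution and thresholds below are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(loc=1.0, scale=2.0, size=10**6)

mean, var = np.mean(X), np.var(X)
for lam in [1.0, 2.0, 4.0]:
    lhs = np.mean(np.abs(X - mean) >= lam)   # P(|X - E X| >= lam)
    rhs = var / lam**2                       # Chebyshev bound
    print(lam, lhs, rhs, bool(lhs <= rhs))
```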
Next, we record
Lemma 39 (Jensen’s inequality) If
is a convex function,
is a real random variable with
and
both absolutely integrable, then
Proof: Let be a real number. Being convex, the graph of
must be supported by some line at
, that is to say there exists a slope
(depending on
) such that
for all
. (If
is differentiable at
, one can take
to be the derivative of
at
, but one always has a supporting line even in the non-differentiable case.) In particular
Taking expectations and using linearity of expectation, we conclude
and the claim follows from setting .
Exercise 40 (Complex Jensen inequality) Let
be a convex function (thus
for all complex
and all
), and let
be a complex random variable with
and
both absolutely integrable. Show that
Note that the triangle inequality is the special case of Jensen’s inequality (or the complex Jensen’s inequality, if
is complex-valued) corresponding to the convex function
on
(or
on
). Another useful example is
Applying Jensen’s inequality to the convex function and the random variable
for some
, we obtain the arithmetic mean-geometric mean inequality
assuming that and
are absolutely integrable.
As a related application of convexity, observe from the convexity of the function that
for any and
. This implies in particular Young’s inequality
for any scalar and any exponents
with
; note that this inequality is also trivially true if one or both of
are infinite. Taking expectations, we conclude that
if are scalar random variables and
are deterministic exponents with
. In particular, if
are absolutely integrable, then so is
, and
We can amplify this inequality as follows. Multiplying by some
and dividing
by the same
, we conclude that
optimising the right-hand side in , we obtain (after some algebra, and after disposing of some edge cases when
or
is almost surely zero) the important Hölder inequality
where we use the notation
for . Using the convention
(thus is the essential supremum of
), we also see from the triangle inequality that the Hölder inequality applies in the boundary case when one of
is allowed to be
(so that the other is equal to
):
The case is the important Cauchy-Schwarz inequality
valid whenever are square-integrable in the sense that
are finite.
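Numerically (my own illustration; the exponential and normal samples are arbitrary), both the Cauchy-Schwarz case and a non-symmetric Hölder pair are easy to observe:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10**6
X = rng.exponential(size=n)
Y = rng.normal(size=n)

# Cauchy-Schwarz: E|XY| <= (E X^2)^{1/2} (E Y^2)^{1/2}
print(np.mean(np.abs(X * Y)), np.sqrt(np.mean(X**2) * np.mean(Y**2)))

# Hölder with exponents p = 3 and q = 3/2 (so that 1/p + 1/q = 1)
p, q = 3.0, 1.5
print(np.mean(np.abs(X * Y)),
      np.mean(np.abs(X)**p)**(1 / p) * np.mean(np.abs(Y)**q)**(1 / q))
```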
Exercise 41 Show that the expressions
are non-decreasing in
for
. In particular, if
is finite for some
, then it is automatically finite for all smaller values of
.
Exercise 42 For any square-integrable
, show that
Exercise 43 If
and
are scalar random variables with
, use Hölder’s inequality to establish that
and
and then conclude the Minkowski inequality
Show that this inequality is also valid at the endpoint cases
and
.
Exercise 44 If
is non-negative and square-integrable, and
, establish the Paley-Zygmund inequality
(Hint: use the Cauchy-Schwarz inequality to upper bound
in terms of
and
.)
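Assuming the standard form of the Paley-Zygmund inequality, P(X > theta E X) >= (1 - theta)^2 (E X)^2 / E[X^2] for 0 <= theta <= 1 (the display in the exercise is not visible above, so this is my reading of it), here is a simulation check with an exponential variable.

```python
import numpy as np

rng = np.random.default_rng(8)
X = rng.exponential(size=10**6)   # nonnegative and square-integrable

EX, EX2 = np.mean(X), np.mean(X**2)
for theta in [0.1, 0.5, 0.9]:
    lhs = np.mean(X > theta * EX)            # P(X > theta * E X)
    rhs = (1 - theta)**2 * EX**2 / EX2       # Paley-Zygmund lower bound
    print(theta, lhs, rhs, bool(lhs >= rhs))
```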
Exercise 45 Let
be a non-negative random variable that is almost surely bounded but not identically zero; show that
56 comments
Comments feed for this article
3 October, 2015 at 7:47 pm
pauljung
Great notes so far. You have a stray * appearing as f*(omega) in the first bullet of Exercise 7. In the same exercise, second bullet, I think you want equality instead of greater or equal. In Proposition 12 (i), do you not need that X and Y are integrable?
[Corrected, thanks – T.]
3 October, 2015 at 8:02 pm
pauljung
By the way, I like your statement about Probability Theory being the study of random events which are model-independent. M. Loeve’s Vol. I (pg 173) has a similar take on this view.
3 October, 2015 at 9:46 pm
Anonymous
Great notes! One small correction: in the proof of theorem 16, you should have LHS >= RHS in the second displayed equation.
[Corrected, thanks – T.]
3 October, 2015 at 11:48 pm
PerryZhao
Reblogged this on 木秀于林.
3 October, 2015 at 11:56 pm
pauljung
Ignore my comment about Proposition 12, I didn’t realize unsigned there meant nonnegative.
4 October, 2015 at 4:12 am
Anonymous
Professor – it looks like your informal heuristic for Fatou is backwards.
[Looks like it is in the right direction to me – could you elaborate? -T.]
4 October, 2015 at 9:32 am
Anonymous
Sure – in your informal heuristic bullet list you state:
“For unsigned {f_n}, the limit inferior of the integral cannot exceed the integral of the limit inferior”
but the inequality in Fatou actually goes the other way.
[Got it now. Corrected, thanks -T.]
4 October, 2015 at 4:37 am
David Gonzales
Exercise 9(iv) talks about monotonicity of complex functions $f$ and $g$, which isn’t defined so should probably be changed.
[Oops, that should have been deleted, thanks – T.]
4 October, 2015 at 6:55 am
Jennifer
Thanks for providing these great notes for self-study! Do you by any chance also share pdf versions of them?
[One should be able to print to PDF (with headers and sidebar removed) from the “Print” feature on your browser. -T]
5 October, 2015 at 6:44 pm
obryant
The very first bullet has a typo: wrong notation for the sure event. Thanks for the notes, and also for the pre-work that surely went into creating them.
[Actually, I am using
(the complement of the empty event) to denote the sure event. -T.]
7 October, 2015 at 6:33 am
Not A Music Expert
What kind of music do you listen to?
7 October, 2015 at 9:52 am
John Mangual
I am going to go out on a limb here and say a lot of these complications arise from the non-compactness of
? Markov’s inequality is essentially the pigeonhole principle. Can the same be said of Chebyshev inequality? Or even Hölder inequality?
12 October, 2015 at 11:41 am
Ryan McNeive
Thanks (as always) for these great notes!
I wanted to ask about a typo. In the proof of Fatou’s lemma, on the RHS of the last three equations, surely the sum should be from 1 to n rather than 1 to N? Unless I am very confused.
[One can use the symbol
in place of
here (thus replacing
,
by
respectively) if desired. I chose not to do so as this makes the definition of
a bit confusing unless one also changes the symbol
appearing there to some other symbol. -T.]
[Added, Oct 14: oh, I see the problem now, I had used n for two unrelated things. Fixed now, thanks – T.]
12 October, 2015 at 12:34 pm
275A, Notes 2: Product measures and independence | What's new
[…] the previous set of notes, we constructed the measure-theoretic notion of the Lebesgue integral, and used this to set up the […]
14 October, 2015 at 11:30 am
Sam
Above exercise 3, isn’t there a mistake in {1 \times 1_{[0,2)} + 1 \times 1_{[1,3)}? Perhaps it should be {1 \times 1_{[0,1)} + 1 \times 1_{[0,3)}?
[Corrected, thanks – T.]
14 October, 2015 at 12:07 pm
Sam
I think there is still a typo in the second indicator function:
instead of
.
[Corrected, thanks -T.]
17 October, 2015 at 1:22 pm
Anonymous
Should the second centered equation in Theorem 23, be
? The 2 comes from considering the events
such that
and for these
, 
17 October, 2015 at 7:13 pm
Terence Tao
The factor of 2 is unnecessary; one has the pointwise bound
when
(checking the cases
and
separately) and
otherwise.
18 October, 2015 at 4:51 am
L.
In the proof of the Fatou’s lemma, why do we need the
“modification”? There is at least one point that I am not sure: since the
are allowed to be
, we should have
on
(instead of strict inequality in the note). But we know that on
,
. Now if we define
, then we have
, which is equal to
. The
are still increasing to
.
18 October, 2015 at 10:51 am
Terence Tao
Ah, I did not treat the case when some of the
were infinite; I’ve fixed the proof to address this.
One needs strict inequality in the bound
to ensure that
for sufficiently large n. The bound
does not ensure that
for even a single choice of
(for instance, the
could be increasing and converge to
in the limit).
19 October, 2015 at 10:58 am
L.
Oh, I see the point. Thanks very much for your comments.
2 November, 2015 at 7:06 pm
275A, Notes 4: The central limit theorem | What's new
[…] this and the Paley-Zygmund inequality (Exercise 39 of Notes 1) we also get some lower bound for of the […]
20 November, 2015 at 12:41 pm
Anonymous
Is Chebyshev inequality (8) optimal in the sense that if the first two moments of a random variable
are given with a threshold
, then (8) can be made arbitrarily close to equality by an appropriate selection of a probability distribution for
(depending of course on the given expectation and variance of
and the given threshold
) ?
20 November, 2015 at 1:50 pm
Terence Tao
Yes, this can be seen by experimenting with Bernoulli type random variables (e.g. ones which attain a threshold
with some probability
,
with probability
, and
with probability
, for various choices of parameters
).
20 November, 2015 at 5:54 pm
Anonymous
Thank you! Let me add some comments:
1. If
has mean
and variance
, without loss of generality we may assume that
and if the standard deviation
and the threshold
are given, consider the two cases:
(i)
: In this case the RHS of (8) is greater than 1, so (8) is trivial
but not(!) optimal.
(ii)
: In this case (using your suggestion) choose
and denote
(which is less than 1) . If
attain
with probability
,
with probability
, and
with probability
, then
has zero mean and (from this choice of
) variance
– as required. Moreover, (8) will be
.
arbitrarily close to
) Chebyshev inequality (8) an be made arbitrarily close to equality (i.e. (8) is optimal in this case.)
Hence (by choosing
2. Interestingly, this idea can be applied also for the one-sided chebyshev inequality (Chebyshev-Cantelli inequality):
Where
and
.
and define
.
Choose
If
attain
with probability
and
with probability
, it follows that
and
– as required.
Moreover, the one-sided Chebyshev inequality will be
Hence (by choosing
arbitrarily close to
) the one-sided Chebyshev inequality is also optimal.
21 November, 2015 at 4:51 pm
Anonymous
It is interesting to observe that for case (i) above (for which Chebyshev inequality (8) is non-optimal), the optimal bound on the LHS of (8) is the trivial bound 1. To see that, we use a simplified version of the proof of case (ii) above, in which the random variable
attain
with probability
and
with probability
. This gives
(as required) and
as the probability in the LHS of (8).
25 November, 2015 at 6:40 am
Anonymous
Hi Terry, there’s a small typo at the end of the proof dominated convergence theorem (Theorem 21). The final equation is missing integral signs.
[Corrected, thanks – T.]
28 November, 2015 at 3:26 pm
Anonymous
Just before Exercise 3, there is “can also be written as {1 \times 1_{[0,1)} + 1 \times 1_{[1,3)}}”, but the last subscript is probably meant to be
[Corrected, thanks -T.]
28 November, 2015 at 4:21 pm
Anonymous
‘real-valuked’ to ‘real-valued’ (it’s somewhere in the middle)
[Corrected, thanks -T.]
28 November, 2015 at 4:24 pm
Anonymous
Forgotten subscript:
\displaystyle {\bf E} f_i(X) = \int{\bf R} f_i(x)\ d\mu_X(x)
should be:
\displaystyle {\bf E} f_i(X) = \int_{\bf R} f_i(x)\ d\mu_X(x)
(just after valuked)
[Corrected, thanks -T.]
30 December, 2015 at 9:53 am
Anonymous
Below Remark 31, “… for any measurable set {R} in the range…” should read “… any measurable set {S}..”
[Corrected, thanks – T.]
6 January, 2016 at 5:12 am
Anonymous
Just a small typo, I guest. In the proof of Theorem 32, when the general case is considered and the functions f_n are considered, there is a missed expectation before letting n to infinite. Should be
[Corrected, thanks – T.]
6 January, 2016 at 7:48 am
Anonymous
In exercise 37 I get the integrals with
instead of just
in both cases. Maybe there is a typo here.
[Corrected, thanks – T.]
4 August, 2016 at 5:01 pm
Sebastien Zany
Thanks for these notes!
Is entropy not a probabilistic concept? If not, then how should we think about it?
6 August, 2016 at 12:50 pm
Terence Tao
There are several different notions of entropy used in mathematics, not all of which are directly tied to probability. But the Shannon entropy {{\bf H}(X)} of a discrete random variable {X} is a probabilistic concept, since it can be expressed in terms of probabilities as {{\bf H}(X) = \sum_x {\bf P}(X = x) \log \frac{1}{{\bf P}(X=x)}} (with the convention that {0 \log \frac{1}{0} = 0}).
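For example (a trivial worked instance of the formula just quoted): if {X} is uniformly distributed on a set of {n} values, then each probability equals {1/n} and
\displaystyle {\bf H}(X) = \sum_{i=1}^n \frac{1}{n} \log n = \log n,
so in particular a fair coin flip has Shannon entropy {\log 2}.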
15 November, 2016 at 6:16 am
Anonymous
I’m a little confused about the summary at the very beginning of this note. In Notes 0, a measure space was introduced to model some randomness (an “abstract” probability space), and this abstract probability space is a triple which satisfies Kolmogorov’s three axioms. The difference between the concrete model and the abstract probability space that it models is that
(1) the concrete model is a measure space, where the event space is a (concrete) {\sigma}-algebra on the sample space;
(2) while the abstract probability space does not have to be a measure space, and its “event space” is an abstract {\sigma}-algebra and hence does not have to be a collection of subsets of anything.
Shouldn’t we have the abstract probability space first, so that we can then talk about modelling it using a concrete model? Why do you define the “conjunction”, “disjunction” and the “probability” of an event only after a model is given? (Why is this not a circular argument?) Shouldn’t all these be defined first?
Are you in fact abstracting things out from a concrete model to get an abstract probability space, like one can get an abstract real vector space out of a concrete real space, and can one actually define an abstract probability space without being given any model in the first place?
15 November, 2016 at 10:10 am
Terence Tao
Yes, if one wants to, one can define abstract probability spaces first without reference to concrete measure spaces and only then talk about their representations by concrete measure spaces. This is discussed in Section 4 of Notes 0. But from a pedagogical point of view this is undesirable as this delays the point in the course where one actually does probability, rather than foundations of probability. This is one reason why most texts compress the foundations by working entirely with concrete probability spaces and not abstract ones (somewhat similarly to how in a first undergraduate linear algebra class, vectors are often _defined_ to be rows or columns of numbers, rather than as elements of an abstract vector space, in order to get on with the linear algebra rather than the foundations of linear algebra).
For the purposes of actually doing probability, the only relevant features of the concept of an abstract probability space are that (a) it can be modeled by at least one concrete probability space, and (b) the basic probabilistic notions (e.g. boolean operations, probability of an event, whether two events are equal etc.) are independent of the model. The existence of such a concept can be guaranteed by the formal definition of an abstract probability space as an abstract sigma algebra equipped with an abstract probability measure (or by the other alternative definitions of this concept in Section 4 of Notes 0). But one could also proceed by defining concrete probability spaces first, and defining an abstraction of that space to be anything isomorphic to the events of that space together with their probabilities and boolean operations; this would suffice for most practical purposes, and is basically the approach taken in Notes 0 before Section 4.
Incidentally, while the 1933 text of Kolmogorov does use a concrete sigma algebra (or, in the language of that era, a field of sets) to model the event space, it is clear from his writing that he would have used an abstraction of this if it had been available at the time (e.g. on page 1 of your linked translation he writes “what the elements of this set represent is of no importance”). One reason for this is that the correct axiomatisation of an abstract sigma algebra was not identified and justified until the work of Loomis and Sikorski in 1947, which was many years after the original text of Kolmogorov.
15 November, 2016 at 6:31 am
Anonymous
Right after Definition 27, I think you mean “the distribution {\mu_X} on {R} is the push-forward of the probability measure {{\bf P}}”.
[Corrected, thanks – T.]
17 February, 2017 at 9:57 pm
254A, Notes 2: The central limit theorem | What's new
[…] this and the Paley-Zygmund inequality (Exercise 42 of Notes 1) we also get some lower bound for of the […]
29 November, 2017 at 7:30 am
haonanz
Hello Prof. Tao, may I ask if you could provide a counterexample with two unsigned functions for which the integral is strictly superadditive? Based on the construction, I would assume the counterexample would require either function to take unbounded values or to have unbounded support, but I just could not come up with an explicit one. Thanks,
6 December, 2017 at 12:00 pm
Terence Tao
There is no such counterexample for measurable functions; see Exercise 7(vi). In the nonmeasurable case, a simple example would be given when {\Omega} is a two-element space {\{0,1\}} with the trivial algebra {\{\emptyset, \{0,1\}\}} and the obvious probability measure, with the two unsigned functions taken to be the indicators {1_{\{0\}}} and {1_{\{1\}}} of the two singletons (which are not measurable in this algebra).
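To spell out the strict superadditivity in an example of this shape (writing {\underline{\int}} for the lower integral, and taking the two unsigned functions to be the singleton indicators, which is one natural reading of the example above): with the trivial algebra the only measurable functions are the constants, so every unsigned simple measurable function lying below {1_{\{0\}}} or below {1_{\{1\}}} is identically {0}. Hence
\displaystyle \underline{\int}_\Omega 1_{\{0\}}\ d{\bf P} = \underline{\int}_\Omega 1_{\{1\}}\ d{\bf P} = 0, \qquad \underline{\int}_\Omega (1_{\{0\}} + 1_{\{1\}})\ d{\bf P} = \int_\Omega 1\ d{\bf P} = 1,
and the lower integral of the sum strictly exceeds the sum of the lower integrals.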
22 January, 2019 at 5:18 am
Weibo Shu
Dear Prof. Tao:
You say that any function {g} which satisfies that {g(x)/x \to \infty} is OK to substitute for the condition in Theorem 24. But I am a little skeptical, since from this condition I can only get a bound of the form {g(|X|) \geq N|X|} on the region where {|X|} is large (here we can adjust {m} to make {N} arbitrarily big). Without further information on {g}, I can’t amplify the expectation restricted to that region to the full expectation {{\bf E} g(|X|)}; consequently, the inequality can’t become the one needed. Hence I can’t deduce the uniform integrability.
I have some questions about Remark 25, where you say that any such function {g} works: is there anything wrong with my statement?
22 January, 2019 at 8:28 am
Terence Tao
[Regarding LaTeX formatting: wordpress interprets text between the < and > signs as HTML rather than LaTeX. For a workaround, see the discussion in https://terrytao.wordpress.com/about/ . I have repaired the issues in this particular comment.]
If {g(x)/x \to \infty} as {x \to \infty}, then for any {N} there exists {M} such that {g(|X|)/|X| \geq N} whenever {|X| \geq M}, which implies that {{\bf E}(|X| 1_{|X| \geq M}) \leq \frac{1}{N} {\bf E} g(|X|)}. (Here of course {g} should be understood to take values in the positive reals.)
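Spelling out the chain of inequalities implicit in this reply (under the stated assumption that {g} takes values in the positive reals, and writing {C} for the uniform bound {\sup_n {\bf E} g(|X_n|)} supplied by the hypothesis of Theorem 24; the symbol {C} is introduced here for convenience): given {\varepsilon > 0}, choose {N \geq C/\varepsilon} and then {M} so large that {g(x) \geq Nx} whenever {x \geq M}. Then for every {n},
\displaystyle {\bf E}\big(|X_n| 1_{|X_n| \geq M}\big) \leq \frac{1}{N} {\bf E}\big(g(|X_n|) 1_{|X_n| \geq M}\big) \leq \frac{1}{N} {\bf E} g(|X_n|) \leq \frac{C}{N} \leq \varepsilon,
where the middle inequality uses the nonnegativity of {g}; this is exactly the uniform integrability of the {X_n}.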
23 January, 2019 at 6:28 pm
Weibo Shu
Thanks for your reply. I know {g(|X|)/|X|} is positive when {|X| \geq M}, but I don’t know whether it’s positive on the whole positive real line. However, I want to amplify {{\bf E}(g(|X_n|) 1_{|X_n| \geq M})} to {{\bf E} g(|X_n|)}, for the purpose of proving that {X_n} is uniformly integrable. That’s because {{\bf E} g(|X_n|)} is uniformly bounded (that’s the condition of Theorem 24), but {{\bf E}(g(|X_n|) 1_{|X_n| \geq M})} is not necessarily uniformly bounded.
What I want to do is ‘{{\bf E}(|X_n| 1_{|X_n| \geq M}) \leq \frac{1}{N} {\bf E}(g(|X_n|) 1_{|X_n| \geq M}) \leq \frac{1}{N} {\bf E} g(|X_n|)}’, but without {g} being non-negative on the positive real line, the second inequality need not hold.
An example is a function which, though non-negative when {x} is at least 1, is negative when {x} is less than 1.
24 January, 2019 at 8:02 pm
Terence Tao
27 January, 2019 at 2:02 am
Weibo Shu
So, I think we at least need the condition that {g} is bounded from below by a constant {C}, since there is a function which is not bounded from below but for which the quantity in question is still bounded.
24 January, 2019 at 7:14 am
Weibo Shu
Hi Prof. Tao, I have some confusion about Exercise 37.
(a) Should the condition be ‘’ rather than ‘’?
(b) In question (1), I can rewrite the right-hand side as ‘’, since the relevant density is the Radon-Nikodym derivative of the distribution of {X}. But I can’t further rewrite it as ‘’, because F_X(x) is not necessarily absolutely continuous, and hence the first equality need not hold. The same issue arises in question (2). So my question is whether we need an extra condition that F_X(x) is absolutely continuous; if not, how does one prove the final equality?
24 January, 2019 at 8:06 pm
Terence Tao
The typo is now corrected.
Integration by parts for the Riemann-Stieltjes integral does not require absolute continuity; bounded variation is sufficient.
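For reference, the classical statement (a standard fact about the Riemann-Stieltjes integral rather than a quotation from the notes): if {f} is Riemann-Stieltjes integrable with respect to {G} on {[a,b]}, then {G} is Riemann-Stieltjes integrable with respect to {f} and
\displaystyle \int_a^b f\ dG + \int_a^b G\ df = f(b)G(b) - f(a)G(a).
In particular, if {G} has bounded variation and {f} is continuous, both integrals exist, and no absolute continuity of {G} is required.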
3 April, 2019 at 1:54 am
Anonymous
Dear Prof. Tao, I have a confusion about the last paragraph of Section 2.
I think the invariance of the expectation isn’t trivial, since the concrete probability space is defined first and the abstract space only afterwards. Let (Omega, F, P) be the original probability space and (Omega’, F’, P’) an extension of it.
Then the pull-back sigma-field on Omega’ can be coarser than F’, so there can be simple random variables modeled by (Omega’, F’, P’) which cannot be modeled by (Omega, F, P).
Thus extending the original probability space makes the abstract probability space bigger, so it is unclear to me over which space one should take the supremum when calculating the expectation.
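A minimal example of the phenomenon being described (a toy construction of my own, in the notation of the comment above): let Omega be a one-point space with its only probability measure, let Omega’ be a two-point space {\{a,b\}} with the discrete sigma-algebra and the uniform measure, and let {\pi} be the unique (constant) map from Omega’ to Omega. Then {\pi} is measurable and probability-preserving, so Omega’ is an extension of Omega, but the pulled-back sigma-algebra {\{\pi^{-1}(E): E \in F\} = \{\emptyset, \Omega'\}} is strictly coarser than the discrete sigma-algebra, and the simple random variable {1_{\{a\}}} is modeled by the extension but not by the original space. This is why the invariance of quantities such as the expectation under extension requires a short argument rather than being immediate.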
6 April, 2019 at 3:55 pm
Terence Tao
Ooh, that is a subtle point I had not noticed. The invariance is indeed slightly non-trivial (though not terribly difficult once one knows what to do to resolve it) and I have added an exercise to address it.
5 June, 2021 at 8:01 am
Anonymous
Hello Professor, Exercise 41 seems to be inconsistent with the definition of Lp norms for vectors. In particular, ||v||_{1} \geq ||v||_{\infty} while here ||X||_p is non-decreasing for increasing p. Should we think of these as two different types of Lp norms?
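For what it’s worth, here is the standard way to reconcile the two conventions (a general observation about {L^p} norms, not a quotation from the exercise): on a probability space the underlying measure has total mass {1}, so Hölder’s inequality gives, for {0 < p \leq q < \infty},
\displaystyle {\bf E} |X|^p \leq \big({\bf E} |X|^q\big)^{p/q} \big({\bf E} 1\big)^{1 - p/q} = \big({\bf E} |X|^q\big)^{p/q},
and hence {\|X\|_p \leq \|X\|_q}. The vector norms {\|v\|_p} are instead {L^p} norms with respect to counting measure on the {n} coordinates, which has total mass {n \geq 1}, and for such measures the monotonicity reverses (e.g. {\|v\|_1 \geq \|v\|_\infty}). So they are the same family of norms, taken with respect to measures of different total mass.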
7 June, 2021 at 2:01 pm
Terence Tao
7 June, 2021 at 9:22 am
Anonymous
In the third to last equation for the proof of Fatou’s lemma, the upper limit of the sum should be k and not N.
[Corrected, thanks – T.]
12 June, 2021 at 8:33 am
Anonymous
The second equation of exercise 43 should be ||Y||_p not ||X||_p.
[Corrected, thanks – T.]
21 March, 2022 at 5:42 am
J
Why would one call the example in Remark 2 an “extension”? After all, the extended sample space is smaller than the original one.
Can it be replaced with an example in which the extended space is larger than the original (with the factor map still being non-surjective)?
21 March, 2022 at 7:39 am
Terence Tao
This is an extension in which some points in the base sample space have no lifts whatsoever to the extension, making the extension potentially smaller; this of course does not happen in the surjective case. But one can certainly have non-surjective extensions that are larger than the base: for instance, one can take a further extension of the extension (and hence of the base) by taking its product with another probability space of arbitrarily large cardinality.